snowflake-ml-python

The machine learning client library used to interact with Snowflake to build machine learning solutions.

Rank: #3965 · Downloads: 1,158,571 (30 days) · Stars: 62 · Forks: 13

Description

Snowflake ML Python

Snowflake ML Python is a set of tools, including SDKs and underlying infrastructure, for building and deploying machine learning models. With Snowflake ML Python, you can preprocess data and train, manage, and deploy ML models all within Snowflake, benefiting from Snowflake's proven performance, scalability, stability, and governance at every stage of the machine learning workflow.

Key Components of Snowflake ML Python

The Snowflake ML Python SDK provides a number of APIs to support each stage of an end-to-end Machine Learning development and deployment process.

Snowflake ML Model Development

Snowflake ML Model Development provides a collection of Python APIs enabling efficient ML model development directly in Snowflake:

  1. Modeling API (snowflake.ml.modeling) for data preprocessing, feature engineering, and model training in Snowflake. This includes the snowflake.ml.modeling.preprocessing module for scalable data transformations on large datasets, using the compute resources of underlying Snowpark Optimized High Memory Warehouses, and a large collection of ML model development classes based on sklearn, xgboost, and lightgbm.

  2. Framework Connectors: Optimized, secure, and performant data provisioning for the PyTorch and TensorFlow frameworks in their native data loader formats.
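The modeling classes follow the familiar scikit-learn estimator interface, so the fit/transform pattern carries over directly. As a local illustration (this sketch uses plain scikit-learn on a pandas DataFrame; the Snowflake classes operate on Snowpark DataFrames and push the computation into a warehouse):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# A small local frame standing in for a Snowflake table.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0]})

# The fit/transform pattern mirrored by snowflake.ml.modeling.preprocessing.
scaler = StandardScaler()
scaled = scaler.fit_transform(df[["feature"]])

print(scaled.mean(), scaled.std())  # standardized to mean 0, std 1
```

The Snowflake counterparts additionally take column-name parameters so transformations can be expressed against table columns rather than in-memory arrays.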

Snowflake ML Ops

Snowflake ML Python also contains a suite of MLOps tools. It complements the Snowflake Modeling API and provides end-to-end development-to-deployment support within Snowflake. The Snowflake ML Ops suite consists of:

  1. Registry: A Python API that allows secure deployment and management of models in Snowflake, supporting models trained both inside and outside of Snowflake.
  2. Feature Store: A fully integrated solution for defining, managing, storing and discovering ML features derived from your data. The Snowflake Feature Store supports automated, incremental refresh from batch and streaming data sources, so that feature pipelines need be defined only once to be continuously updated with new data.
  3. Datasets: Datasets provide an immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.

Getting started

Learn about all Snowflake ML feature offerings in the Developer Guide.

Have your Snowflake account ready

If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.

Installation

Snowflake ML Python is pre-installed in Container Runtime notebook environments. Learn more.

In Snowflake Warehouse notebook environments, snowflake-ml-python can be installed using the "Packages" drop-down menu.

Follow the installation instructions in the Snowflake documentation.

Python versions 3.9 to 3.12 are supported. You can use miniconda or anaconda to create a Conda environment (recommended), or virtualenv to create a virtual environment.
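If you prefer virtualenv over conda, a minimal setup looks like the following (package name and Python version constraints are from the documentation above; snowflake-ml-python is also published on PyPI):

```shell
# Create and activate an isolated environment with a supported Python (3.9-3.12).
python3 -m venv .venv
source .venv/bin/activate

# Install the package from PyPI.
pip install snowflake-ml-python
```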

Conda channels

The Snowflake Anaconda Channel contains the official snowflake-ml-python package releases. To install snowflake-ml-python from this conda channel:

conda install \
  -c https://repo.anaconda.com/pkgs/snowflake \
  --override-channels \
  snowflake-ml-python

See the developer guide for detailed installation instructions.

The snowflake-ml-python package is also published on conda-forge. To install snowflake-ml-python from conda-forge:

conda install \
  -c https://conda.anaconda.org/conda-forge/ \
  --override-channels \
  snowflake-ml-python

Verifying the package

  1. Install cosign. This example uses the Go-based installation: installing-cosign-with-go.

  2. Download the release file from a repository such as PyPI.

  3. Download the signature files from the release tag.

  4. Verify signature on projects signed using Jenkins job:

    cosign verify-blob snowflake_ml_python-1.7.0.tar.gz --key snowflake-ml-python-1.7.0.pub --signature resources.linux.snowflake_ml_python-1.7.0.tar.gz.sig
    
    

NOTE: Version 1.7.0 is used as an example here. Please choose the latest version.

Release History

1.29.0

New Features

  • Model serving: Introducing InferenceEngine.PYTHON_GENERIC enum value. Users can pass InferenceEngine.PYTHON_GENERIC to use a Python-based inference server for model serving.

Bug Fixes

  • Registry: Fixed a bug where using inference parameters (ParamSpec) with table function or partitioned model methods would fail at runtime with a NameError.
  • Registry: Fixed a bug where MLflow models with string columns failed during inference with "Can not safely convert string to <U0>" errors due to MLflow's schema validation not handling pd.StringDtype correctly.
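The second fix concerns pandas' extension string dtype. As a minimal illustration of the underlying difference (this is plain pandas, not MLflow's validation code), pd.StringDtype columns report a different dtype than classic object-dtype string columns, which strict schema checks can reject:

```python
import pandas as pd

# Classic object-dtype strings vs. the pandas extension string dtype.
obj_col = pd.Series(["a", "b"])                          # dtype: object
ext_col = pd.Series(["a", "b"], dtype=pd.StringDtype())  # dtype: string

print(obj_col.dtype)  # object
print(ext_col.dtype)  # string

# Schema validation that only expects the classic object dtype can fail
# on the extension dtype; an explicit conversion restores the old form.
converted = ext_col.astype(object)
print(converted.dtype)  # object
```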

Behavior Changes

Deprecations

1.28.0 (2026-02-17)

New Features

Bug Fixes

Behavior Changes

Deprecations

1.27.0 (2026-02-12)

New Features

Bug Fixes

  • Registry: Fixed a bug where model_version.run() required READ privilege on the model instead of USAGE, causing inference to fail for users with only USAGE privilege granted (introduced in 1.21.0).

  • Feature Store: Fixed register_feature_view() with overwrite=True failing when changing between external and managed feature views.

Behavior Changes

Deprecations

1.26.0 (2026-02-05)

New Features

  • ML Job: Added support for creating MLJobDefinition (PrPr) and launching jobs with different arguments without re-uploading payloads.
# /path/to/repo/my_script.py

def main(*args):
    print("Hello world", *args)

if __name__ == '__main__':
    import sys
    main(*sys.argv[1:])

from snowflake.ml.jobs.job_definition import MLJobDefinition
job_def = MLJobDefinition.register(
  "/path/to/repo/my_script.py",
  # If you register a source directory, provide the entrypoint file:
  # entrypoint="/path/to/repo/my_script.py",
  compute_pool="test_compute_pool",
  stage_name="payload_stage",
)

job1 = job_def()
job2 = job_def(arg1="ML Job")


from snowflake.ml import jobs
@jobs.remote(compute_pool="test_compute_pool", stage_name="payload_stage")
def test_job(arg1: str = "world") -> None:
  print(f"hello {arg1}")

# this is a job definition handle
job_def_remote = test_job

job1 = job_def_remote()
job2 = job_def_remote(arg1="ML Job")


  • Job-based Batch Inference (PuPr): ModelVersion.run_batch for job-based batch inference in Snowpark Container Services is now in public preview.
from snowflake.ml.registry import Registry
from snowflake.ml.model import OutputSpec

registry = Registry(session)
mv = registry.log_model( ... )

job = mv.run_batch(
    compute_pool="SYSTEM_COMPUTE_POOL_GPU",
    X=input_df,
    output_spec=OutputSpec(stage_location="@my_db.my_schema.my_stage/output/"),
)
  • Registry: Added support for inference parameters via ParamSpec in model signatures. This allows you to define constant parameters that can be passed at inference time without being part of the input data.
import pandas as pd
from snowflake.ml.model import custom_model, model_signature
from snowflake.ml.registry import Registry

# Define a custom model with inference parameters
class MyModelWithParams(custom_model.CustomModel):
    @custom_model.inference_api
    def predict(
        self,
        input_df: pd.DataFrame,
        *,
        temperature: float = 1.0,  # keyword-only param with default
    ) -> pd.DataFrame:
        return pd.DataFrame({"output": input_df["feature"] * temperature})

# Create sample data
model = MyModelWithParams(custom_model.ModelContext())
sample_input = pd.DataFrame({"feature": [1.0, 2.0, 3.0]})
sample_output = model.predict(sample_input, temperature=1.0)

# Define ParamSpec for the inference parameter
params = [
    model_signature.ParamSpec(
        name="temperature",
        dtype=model_signature.DataType.FLOAT,
        default_value=1.0,
    ),
]

# Infer signature with params
sig = model_signature.infer_signature(
    input_data=sample_input,
    output_data=sample_output,
    params=params,
)

# Log model with the signature
registry = Registry(session)
mv = registry.log_model(
    model=model,
    model_name="my_model_with_params",
    version_name="v1",
    signatures={"predict": sig},
)

# Run inference with custom parameter value
result = mv.run(sample_input, function_name="predict", params={"temperature": 2.0})
  • Feature Store: Added auto_prefix parameter and with_name() method to avoid column name collisions when joining multiple feature views in dataset generation.

  • Feature Store: Added support for Dynamic Iceberg Tables as the backing storage for Feature Views. Use StorageConfig with StorageFormat.ICEBERG to create Iceberg-backed Feature Views that store data in the open Apache Iceberg format.
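The column-collision problem that auto_prefix addresses is easy to see with a plain pandas join (this is an analogy, not the Feature Store API; the frame and column names are made up): two feature views that each emit a column named "score" cannot be joined as-is, but prefixing each feature column with its view name keeps both.

```python
import pandas as pd

# Two "feature views" that both produce a column named "score".
fv_a = pd.DataFrame({"id": [1, 2], "score": [0.1, 0.2]})
fv_b = pd.DataFrame({"id": [1, 2], "score": [10, 20]})

# Prefix each view's feature columns with the view name (keeping the join
# key unprefixed), roughly what an auto-prefix option does on joins.
joined = fv_a.add_prefix("FV_A_").rename(columns={"FV_A_id": "id"}).merge(
    fv_b.add_prefix("FV_B_").rename(columns={"FV_B_id": "id"}),
    on="id",
)
print(list(joined.columns))  # ['id', 'FV_A_score', 'FV_B_score']
```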