Databricks SDK for Python (Beta)


Beta: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs. Please file issues, and we will address them.

See also: the SDK for Java, the SDK for Go, the Terraform Provider, the cloud-specific docs (AWS, Azure, GCP), and the API reference on readthedocs.

The Databricks SDK for Python includes functionality to accelerate development with Python for the Databricks Lakehouse. It covers all public Databricks REST API operations. The SDK's internal HTTP client is robust and handles failures on different levels by performing intelligent retries.


Getting started<a id="getting-started"></a>

  1. Install the Databricks SDK for Python via pip install databricks-sdk and instantiate WorkspaceClient:

from databricks.sdk import WorkspaceClient

# Authenticate via the default authentication flow
w = WorkspaceClient()

# List all clusters in the workspace
for c in w.clusters.list():
    print(c.cluster_name)

Databricks SDK for Python is compatible with Python 3.7 (until June 2023), 3.8, 3.9, 3.10, and 3.11.
Note: Databricks Runtime starting from version 13.1 includes a bundled version of the Python SDK.
It is highly recommended to upgrade to the latest version, which you can do by running the following in a notebook cell:

%pip install --upgrade databricks-sdk

followed by

dbutils.library.restartPython()

Code examples<a id="code-examples"></a>

The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for various common use cases.

These examples and more are located in the examples/ directory of the GitHub repository.


Authentication<a id="authentication"></a>

If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is the following code snippet, which instructs the Databricks SDK for Python to use its default authentication flow:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
w. # press <TAB> for autocompletion

The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is w, which is shorthand for workspace.


Default authentication flow

If you run the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, most likely they will all interoperate nicely together. By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until it succeeds:

  1. Databricks native authentication
  2. Azure native authentication
  3. If the SDK is unsuccessful at this point, it returns an authentication error and stops running.
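The ordering above can be sketched as a simple first-match loop. This is an illustrative model of the flow, not the SDK's actual internals; the provider names and lambdas are hypothetical stand-ins:

```python
# Simplified sketch of the default authentication flow described above:
# try each method in order, stop at the first that yields credentials.

def resolve_auth(providers):
    """Return (name, credentials) from the first provider that succeeds."""
    for name, provider in providers:
        credentials = provider()
        if credentials is not None:
            return name, credentials
    # No method succeeded: mirror the SDK's behavior of raising an error.
    raise ValueError("cannot configure default credentials")

# Hypothetical providers standing in for the real ones:
providers = [
    ("databricks-native", lambda: None),                # e.g. no PAT configured
    ("azure-native", lambda: {"token": "aad-token"}),   # e.g. Azure CLI login found
]

name, creds = resolve_auth(providers)
print(name)  # azure-native
```

The key property this models is that the search is ordered and short-circuiting: once one method succeeds, later methods are never consulted.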

You can instruct the Databricks SDK for Python to use a specific authentication method by setting the auth_type argument as described in the following sections.

For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:

  1. Credentials that are hard-coded into configuration arguments.

    :warning: Caution: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.

  2. Credentials in Databricks-specific environment variables.

  3. For Databricks native authentication, credentials in the .databrickscfg file's DEFAULT configuration profile from its default file location (~ for Linux or macOS, and %USERPROFILE% for Windows).

  4. For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.
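The .databrickscfg file is in INI format. As a sketch of its layout (the host and token values below are hypothetical placeholders, and the SDK uses its own loader rather than configparser), a minimal DEFAULT profile can be inspected with Python's standard library:

```python
import configparser

# Example ~/.databrickscfg contents (hypothetical host and token values):
DATABRICKSCFG = """\
[DEFAULT]
host  = https://my-workspace.cloud.databricks.com
token = dapi-example-token
"""

config = configparser.ConfigParser()
config.read_string(DATABRICKSCFG)

# The SDK reads the DEFAULT profile from this file unless another profile
# is selected (for example via the DATABRICKS_CONFIG_PROFILE variable).
print(config["DEFAULT"]["host"])
```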

Depending on the Databricks authentication method, the SDK uses the following information. Presented are the WorkspaceClient and AccountClient arguments (which have corresponding .databrickscfg file fields), their descriptions, and any corresponding environment variables.

Databricks native authentication

By default, the Databricks SDK for Python initially tries Databricks token authentication (auth_type='pat' argument). If the SDK is unsuccessful, it then tries Workload Identity Federation (WIF). See Supported WIF for the supported JWT token providers.

  • For Databricks token authentication, you must provide host and token; or their environment variable or .databrickscfg file field equivalents.
  • For Databricks OIDC authentication, you must provide the host, client_id and token_audience (optional) either directly, through the corresponding environment variables, or in your .databrickscfg configuration file.
  • For Azure DevOps OIDC authentication, the token_audience is irrelevant, as the audience is always set to api://AzureADTokenExchange. Also, the System.AccessToken pipeline variable required for the OIDC request must be exposed as the SYSTEM_ACCESSTOKEN environment variable, following Pipeline variables.
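For token authentication, DATABRICKS_HOST and DATABRICKS_TOKEN are the environment variable equivalents of host and token. The sketch below mimics, in simplified form, that lookup; the pat_credentials helper is illustrative, not part of the SDK:

```python
import os

# Databricks-specific environment variables for token (PAT) authentication.
# The values here are placeholders, not real credentials.
os.environ["DATABRICKS_HOST"] = "https://my-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-example-token"

def pat_credentials():
    """Simplified form of the env-var lookup for token authentication."""
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if not host or not token:
        return None  # fall through to the next authentication method
    return {"host": host, "token": token}

creds = pat_credentials()
print(creds["host"])
```

With both variables set, instantiating WorkspaceClient() with no arguments is enough for the SDK to pick up these credentials.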

| Argument | Description | Environment variable |
|----------|-------------|-----------------------|