# Databricks SDK for Python (Beta)
> **Beta**: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs. Please file issues, and we will address them.

See also the SDK for Java, the SDK for Go, the Terraform Provider, the cloud-specific docs (AWS, Azure, GCP), and the API reference on readthedocs.
The Databricks SDK for Python includes functionality to accelerate development with Python for the Databricks Lakehouse. It covers all public Databricks REST API operations. The SDK's internal HTTP client is robust and handles failures on different levels by performing intelligent retries.
## Contents
- Getting started
- Code examples
- Authentication
- Long-running operations
- Paginated responses
- Retries
- Single-sign-on with OAuth
- User Agent Request Attribution
- Error handling
- Logging
- Integration with `dbutils`
- Interface stability
## Getting started<a id="getting-started"></a>
Install the Databricks SDK for Python via `pip install databricks-sdk` and instantiate `WorkspaceClient`:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
for c in w.clusters.list():
    print(c.cluster_name)
```
Databricks SDK for Python is compatible with Python 3.7 (until June 2023), 3.8, 3.9, 3.10, and 3.11.
Note: Databricks Runtime starting from version 13.1 includes a bundled version of the Python SDK.
It is highly recommended to upgrade to the latest version, which you can do by running the following in a notebook cell:

```python
%pip install --upgrade databricks-sdk
```

followed by

```python
dbutils.library.restartPython()
```
## Code examples<a id="code-examples"></a>
The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for various common use cases, including:
- Using the SDK with OAuth from a webserver
- Using long-running operations
- Authenticating a client app using OAuth
These examples and more are located in the examples/ directory of the GitHub repository.
Some other examples of using the SDK include:
- Unity Catalog Automated Migration heavily relies on the Python SDK for working with Databricks APIs.
- ip-access-list-analyzer checks & prunes invalid entries from IP Access Lists.
## Authentication<a id="authentication"></a>
If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is the following code snippet, which instructs the Databricks SDK for Python to use its default authentication flow:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.  # press <TAB> for autocompletion
```
The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is `w`, which is shorthand for workspace.
In this section
- Default authentication flow
- Databricks native authentication
- Azure native authentication
- Overriding .databrickscfg
- Additional authentication configuration options
### Default authentication flow
If you run the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, most likely they will all interoperate nicely together. By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until it succeeds:
- Databricks native authentication
- Azure native authentication
- If the SDK is unsuccessful at this point, it returns an authentication error and stops running.
You can instruct the Databricks SDK for Python to use a specific authentication method by setting the `auth_type` argument, as described in the following sections.
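As a sketch of pinning the method: `DATABRICKS_AUTH_TYPE` is the environment variable equivalent of the `auth_type` argument, and the host and token in the comment below are placeholders, not real credentials.

```python
import os

# Pin the SDK to personal access token (PAT) authentication instead of
# letting it probe each method in order; "pat" mirrors auth_type='pat'.
os.environ["DATABRICKS_AUTH_TYPE"] = "pat"

# The same override in code (placeholder values shown; never hard-code
# real tokens):
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient(auth_type="pat", host="https://...", token="...")
print(os.environ["DATABRICKS_AUTH_TYPE"])  # → pat
```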
For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:
- Credentials that are hard-coded into configuration arguments.

  :warning: **Caution**: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.

- Credentials in Databricks-specific environment variables.

- For Databricks native authentication, credentials in the `.databrickscfg` file's `DEFAULT` configuration profile from its default file location (`~` for Linux or macOS, and `%USERPROFILE%` for Windows).

- For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.
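For illustration, a minimal `.databrickscfg` `DEFAULT` profile might look like the fragment below; the host is a placeholder and the token is truncated, and the exact set of fields depends on the authentication method you use.

```ini
[DEFAULT]
host  = https://my-workspace.cloud.databricks.com
token = dapi...
```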
Depending on the Databricks authentication method, the SDK uses the following information. Presented are the `WorkspaceClient` and `AccountClient` arguments (which have corresponding `.databrickscfg` file fields), their descriptions, and any corresponding environment variables.
### Databricks native authentication
By default, the Databricks SDK for Python initially tries Databricks token authentication (`auth_type='pat'` argument). If the SDK is unsuccessful, it then tries Workload Identity Federation (WIF). See Supported WIF for the supported JWT token providers.

- For Databricks token authentication, you must provide `host` and `token`, or their environment variable or `.databrickscfg` file field equivalents.
- For Databricks OIDC authentication, you must provide the `host`, `client_id`, and optionally `token_audience`, either directly, through the corresponding environment variables, or in your `.databrickscfg` configuration file.
- For Azure DevOps OIDC authentication, `token_audience` is irrelevant, as the audience is always set to `api://AzureADTokenExchange`. Also, the `System.AccessToken` pipeline variable required for the OIDC request must be exposed as the `SYSTEM_ACCESSTOKEN` environment variable, as described in Pipeline variables.
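As a sketch of exposing that pipeline variable, a minimal Azure Pipelines YAML fragment might look like this; the script name is hypothetical and stands in for whatever step runs your SDK-based code.

```yaml
steps:
  - script: python run_sdk_job.py  # hypothetical script using the SDK
    env:
      # Map the System.AccessToken pipeline variable to the environment
      # variable name the SDK expects.
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)
```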
| Argument | Description | Environment variable |
|----------|-------------|----------------------|