databricks-labs-blueprint
Common libraries for Databricks Labs
Databricks Labs Blueprint
Baseline for Databricks Labs projects written in Python. Sources are validated with mypy and pylint. See Contributing instructions if you would like to improve this project.
- Databricks Labs Blueprint
- Installation
- Batteries Included
- Python-native pathlib.Path-like interfaces
- Basic Terminal User Interface (TUI) Primitives
- Nicer Logging Formatter
- Parallel Task Execution
- Application and Installation State
- Install Folder
- Detecting Current Installation
- Detecting Installations From All Users
- Saving @dataclass configuration
- Saving CSV files
- Loading @dataclass configuration
- Brute-forcing SerdeError with as_dict() and from_dict()
- Configuration Format Evolution
- Uploading Untyped Files
- Listing All Files in the Install Folder
- Unit Testing Installation State
- Assert Rewriting with PyTest
- Application State Migrations
- Building Wheels
- Databricks CLI's databricks labs ... Router
- Python-native
- Notable Downstream Projects
- Project Support
Installation
You can install this project via pip:
pip install databricks-labs-blueprint
Batteries Included
This library contains a proven set of building blocks, tested in production through UCX and other projects.
Python-native pathlib.Path-like interfaces
This library exposes subclasses of pathlib.Path from Python's standard library that work with Databricks Workspace
paths. These classes provide a more intuitive and Pythonic way to work with Databricks Workspace paths than plain
string paths. They are designed as drop-in replacements for pathlib.Path and add functionality specific to
Databricks Workspace paths.
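Because the classes are meant as drop-in replacements, helper code can be typed against pathlib.Path and exercised on the local filesystem before it is pointed at a workspace. The sketch below assumes that WorkspacePath supports mkdir(parents=True), the / operator, write_text() and read_text(), as the examples later in this section suggest; write_marker is a hypothetical helper, not part of the library.

```python
import tempfile
from pathlib import Path


def write_marker(folder: Path, name: str, content: str) -> Path:
    """Hypothetical helper: create `folder` (with parents), write a text file, return its path.

    Only pathlib.Path methods are used, so a drop-in Path replacement that
    implements mkdir(), write_text() and the `/` operator could be passed in.
    """
    folder.mkdir(parents=True)
    target = folder / name
    target.write_text(content)
    return target


# Exercise the helper locally; against a workspace you would pass a WorkspacePath.
with tempfile.TemporaryDirectory() as tmp:
    marker = write_marker(Path(tmp) / "some-folder" / "a", "hello.txt", "Hello, World!")
    marker_text = marker.read_text()
    marker_name = marker.name

print(marker_text)  # Hello, World!
```

Keeping helpers typed as Path means the same function can be unit-tested against a temporary local directory.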
Working With User Home Folders
This code initializes a client to interact with a Databricks workspace, creates a relative workspace path
(~/some-folder/foo/bar/baz), verifies that the path is not absolute, and demonstrates that calling absolute() on a
relative path is not implemented and raises NotImplementedError. It then expands the path to the user's home directory,
creates the directory, and confirms that it exists under /Users/<user_name>/. Finally, it shows that rmdir() on a
non-empty folder raises BadRequest, while rmdir(recursive=True) removes the folder together with its contents.
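from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import BadRequest
from databricks.labs.blueprint.paths import WorkspacePath
name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
assert not wsp.is_absolute()
try:
    wsp.absolute()
except NotImplementedError:
    pass  # expected: a relative workspace path cannot be made absolute
else:
    raise AssertionError("expected NotImplementedError")
with_user = wsp.expanduser()  # ~ is replaced with /Users/<user_name>
with_user.mkdir()
user_name = ws.current_user.me().user_name
wsp_check = WorkspacePath(ws, f"/Users/{user_name}/{name}/foo/bar/baz")
assert wsp_check.is_dir()
try:
    wsp_check.parent.rmdir()
except BadRequest:
    pass  # expected: the parent folder still contains baz
else:
    raise AssertionError("expected BadRequest")
wsp_check.parent.rmdir(recursive=True)
assert not wsp_check.exists()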
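The expansion step maps ~ to the user's folder under /Users, as the wsp_check path above shows. The following is a minimal offline sketch of that mapping with PurePosixPath; expand_workspace_home is a hypothetical helper for illustration, not how WorkspacePath.expanduser() is implemented, and it only handles a bare leading ~.

```python
from pathlib import PurePosixPath


def expand_workspace_home(path: str, user_name: str) -> PurePosixPath:
    """Hypothetical helper: "~/x/y" -> "/Users/<user_name>/x/y".

    Illustrates the mapping only; it is not the library's implementation.
    """
    parts = PurePosixPath(path).parts
    if not parts or parts[0] != "~":
        return PurePosixPath(path)  # nothing to expand
    return PurePosixPath("/Users", user_name, *parts[1:])


expanded = expand_workspace_home("~/some-folder/foo/bar/baz", "jane@example.com")
print(expanded)  # /Users/jane@example.com/some-folder/foo/bar/baz
print(expanded.is_absolute())  # True
```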
Relative File Paths
This code expands the ~ symbol to the full path of the user's home directory, computes the relative path from this
home directory to the previously created directory (~/some-folder/foo/bar/baz), and verifies that it matches the expected
relative path (some-folder/foo/bar/baz). It then confirms that the expanded path is absolute, checks that
calling absolute() on this path returns the path itself, and converts the path to its FUSE-mounted form
(/Workspace/Users/username@example.com/some-folder/foo/bar/baz).
from pathlib import Path
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath
name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()
home = WorkspacePath(ws, "~").expanduser()
relative_name = with_user.relative_to(home)
assert relative_name.as_posix() == f"{name}/foo/bar/baz"
assert with_user.is_absolute()
assert with_user.absolute() == with_user
assert with_user.as_fuse() == Path("/Workspace") / with_user.as_posix().lstrip("/")  # strip "/" to keep the /Workspace prefix
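relative_to() and the FUSE prefix follow plain pathlib rules, which can be checked offline with PurePosixPath as a stand-in for WorkspacePath (this assumes WorkspacePath joins paths the same way pathlib does). The sketch also shows a pitfall when building a FUSE path by hand: joining an absolute string discards everything to its left. The user name is a placeholder.

```python
from pathlib import PurePosixPath

home = PurePosixPath("/Users/jane@example.com")
with_user = home / "some-folder" / "foo" / "bar" / "baz"

# relative_to() strips the home prefix
relative_name = with_user.relative_to(home)
print(relative_name)  # some-folder/foo/bar/baz

# Pitfall: an absolute right-hand operand replaces the left-hand path
naive = PurePosixPath("/Workspace") / with_user.as_posix()
print(naive)  # /Users/jane@example.com/some-folder/foo/bar/baz

# Make the right-hand side relative first to keep the /Workspace prefix
fuse = PurePosixPath("/Workspace") / with_user.relative_to("/")
print(fuse)  # /Workspace/Users/jane@example.com/some-folder/foo/bar/baz
```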
Browser URLs for Workspace Paths
The as_uri() method returns a browser-accessible URI for the workspace path. This example retrieves the current user's
username from the Databricks workspace client, builds the browser URI for the previously created directory
(~/some-folder/foo/bar/baz) from the host URL and the username (with @ percent-encoded as %40), and then verifies that
as_uri() on the with_user path object returns the same URI:
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath
name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()
user_name = ws.current_user.me().user_name
browser_uri = f'{ws.config.host}#workspace/Users/{user_name.replace("@", "%40")}/{name}/foo/bar/baz'
assert with_user.as_uri() == browser_uri
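To build such a link outside of WorkspacePath, urllib.parse.quote can replace the manual @ to %40 substitution. workspace_browser_uri below is a hypothetical helper that mirrors the URI format in the example above; note that quote() also encodes characters other than @ (for example spaces and +), so for unusual user or folder names its output is not guaranteed to match what as_uri() produces.

```python
from urllib.parse import quote


def workspace_browser_uri(host: str, user_name: str, relative: str) -> str:
    """Hypothetical helper: <host>#workspace/Users/<encoded user>/<relative path>."""
    # quote() leaves "/" alone by default but percent-encodes "@" as "%40"
    return f"{host.rstrip('/')}#workspace/Users/{quote(user_name)}/{quote(relative)}"


uri = workspace_browser_uri(
    "https://example.cloud.databricks.com", "jane@example.com", "some-folder/foo/bar/baz"
)
print(uri)
# https://example.cloud.databricks.com#workspace/Users/jane%40example.com/some-folder/foo/bar/baz
```

The rstrip('/') guards against a host configured with a trailing slash, which would otherwise produce "/#workspace".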
read/write_text(), read/write_bytes(), and glob() Methods
This code creates a WorkspacePath object for the path ~/some-folder/a/b/c, expands it to the full user path,
and creates the directory along with any necessary parent directories. It then creates a file named hello.txt within
this directory, writes "Hello, World!" to it, and verifies the content. The code lists all .txt files in the directory
and its subdirectories and ensures there is exactly one match, hello.txt. Finally, it deletes hello.txt and confirms
that the file no longer exists.
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath
name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/a/b/c")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)
hello_txt = with_user / "hello.txt"
hello_txt.write_text("Hello, World!")
assert hello_txt.read_text() == "Hello, World!"
files = list(with_user.glob("**/*.txt"))
assert len(files) == 1
assert hello_txt == files[0]
assert files[0].name == "hello.txt"
with_user.joinpath("hello.txt").unlink()
assert not hello_txt.exists()
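The glob("**/*.txt") pattern is recursive: in pathlib, ** matches the folder itself and every subfolder, so the len(files) == 1 check above holds only because hello.txt is the only .txt file in the tree. The sketch below rehearses this locally with pathlib.Path in a temporary directory, assuming WorkspacePath.glob() follows the same pattern semantics; the extra files are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "some-folder" / "a" / "b" / "c"
    root.mkdir(parents=True)
    (root / "hello.txt").write_text("Hello, World!")
    (root / "nested").mkdir()
    (root / "nested" / "notes.txt").write_text("nested")
    (root / "data.bin").write_bytes(b"\x00\x01")

    # "**/*.txt" matches .txt files in root itself and in every subfolder
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.txt"))
    # "*.txt" only looks at the immediate children of root
    top_level = sorted(p.name for p in root.glob("*.txt"))

print(recursive)  # ['hello.txt', 'nested/notes.txt']
print(top_level)  # ['hello.txt']
```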
The read_bytes() method works as expected:
from databricks.sdk import WorkspaceClient
from databricks