databricks-labs-blueprint

Common libraries for Databricks Labs

Description

Databricks Labs Blueprint

Baseline for Databricks Labs projects written in Python. Sources are validated with mypy and pylint. See Contributing instructions if you would like to improve this project.

Installation

You can install this project via pip:

pip install databricks-labs-blueprint

Batteries Included

This library contains a proven set of building blocks, tested in production through UCX and other projects.

Python-native pathlib.Path-like interfaces

This library exposes subclasses of pathlib.Path from Python's standard library that work with Databricks Workspace paths. They offer a more intuitive, Pythonic interface than plain str paths and are designed as drop-in replacements for pathlib.Path, with additional workspace-specific functionality.

Working With User Home Folders

This code initializes a client to interact with a Databricks workspace, creates a relative workspace path (~/some-folder/foo/bar/baz), verifies the path is not absolute, and then demonstrates that converting this relative path to an absolute path is not implemented and raises an error. Subsequently, it expands the relative path to the user's home directory and creates the specified directory if it does not already exist.

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import BadRequest
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
assert not wsp.is_absolute()

# absolute() is not implemented for relative workspace paths
try:
    wsp.absolute()
except NotImplementedError:
    pass

# expand ~ to the current user's home folder, then create the directory
with_user = wsp.expanduser()
with_user.mkdir()

user_name = ws.current_user.me().user_name
wsp_check = WorkspacePath(ws, f"/Users/{user_name}/{name}/foo/bar/baz")
assert wsp_check.is_dir()

# the parent still contains baz, so a non-recursive rmdir() raises BadRequest
try:
    wsp_check.parent.rmdir()
except BadRequest:
    pass
wsp_check.parent.rmdir(recursive=True)

assert not wsp_check.exists()
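
For readers without a workspace at hand, the ~ expansion can be mimicked locally with PurePosixPath. This is only a sketch of the idea: expand_home is a hypothetical helper, not part of the library, the user name is a placeholder, and the /Users/<user_name> home layout is taken from the example above.

```python
from pathlib import PurePosixPath


def expand_home(path: str, user_name: str) -> PurePosixPath:
    """Hypothetical helper: replace a leading '~' with /Users/<user_name>."""
    parts = PurePosixPath(path).parts
    if parts and parts[0] == "~":
        return PurePosixPath("/Users", user_name, *parts[1:])
    # paths that do not start with '~' are returned unchanged
    return PurePosixPath(path)


relative = PurePosixPath("~/some-folder/foo/bar/baz")
expanded = expand_home("~/some-folder/foo/bar/baz", "someone@example.com")

print(relative.is_absolute())  # a path starting with '~' is relative
print(expanded)                # the expanded path is rooted at /Users
```

The real expanduser() asks the workspace for the current user; the sketch takes the user name as an argument so that it runs offline.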

Relative File Paths

This code expands the ~ symbol to the full path of the user's home directory, computes the relative path from that home directory to the directory used in the previous example (~/some-folder/foo/bar/baz), and verifies that it matches the expected relative path (some-folder/foo/bar/baz). It then confirms that the expanded path is absolute, checks that calling absolute() on it returns the path itself, and converts it to a FUSE-compatible path (/Workspace/Users/username@example.com/some-folder/foo/bar/baz).

from pathlib import Path
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()

home = WorkspacePath(ws, "~").expanduser()
relative_name = with_user.relative_to(home)
assert relative_name.as_posix() == f"{name}/foo/bar/baz"

assert with_user.is_absolute()
assert with_user.absolute() == with_user
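# strip the leading slash: pathlib discards "/Workspace" when the right-hand operand is absolute
assert with_user.as_fuse() == Path("/Workspace") / with_user.as_posix().lstrip("/")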
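
The path arithmetic in this example can be tried offline with PurePosixPath. The sketch below uses a placeholder home directory and assumes that as_posix() on an expanded workspace path returns a string with a leading slash. It also shows a pathlib detail that matters when composing a FUSE path by hand: joining onto an absolute right-hand operand discards the left-hand side.

```python
from pathlib import PurePosixPath

# Placeholder home directory; a real one comes from expanduser().
home = PurePosixPath("/Users/someone@example.com")
with_user = home / "some-folder/foo/bar/baz"

# relative_to() strips the common prefix.
relative_name = with_user.relative_to(home)

# Joining with an absolute path discards everything to its left...
naive = PurePosixPath("/Workspace") / with_user.as_posix()
# ...so the leading slash has to be stripped before prefixing /Workspace.
fuse = PurePosixPath("/Workspace") / with_user.as_posix().lstrip("/")

print(relative_name)
print(naive)
print(fuse)
```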

Browser URLs for Workspace Paths

The as_uri() method returns a browser-accessible URI for the workspace path. This example retrieves the current user's username from the Databricks workspace client, builds the expected browser URI for the directory ~/some-folder/foo/bar/baz from the host URL and the encoded username, and then verifies that the URI generated by the with_user path object matches it:

from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()

user_name = ws.current_user.me().user_name
browser_uri = f'{ws.config.host}#workspace/Users/{user_name.replace("@", "%40")}/{name}/foo/bar/baz'

assert with_user.as_uri() == browser_uri
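
The expected URI can also be assembled without a workspace connection. In the sketch below, browser_uri is a hypothetical helper and the host and user name are placeholders. It percent-encodes the user name with urllib.parse.quote, which covers the @ replacement used above but also encodes other reserved characters; whether as_uri() does the same for those characters is not stated here, so treat that part as an assumption.

```python
from urllib.parse import quote


def browser_uri(host: str, user_name: str, relative: str) -> str:
    """Hypothetical helper mirroring the URI layout used in the example above."""
    # safe="" makes quote() encode every reserved character, including "@"
    encoded_user = quote(user_name, safe="")
    return f"{host}#workspace/Users/{encoded_user}/{relative}"


uri = browser_uri(
    "https://example.cloud.databricks.com",  # placeholder host
    "someone@example.com",                   # placeholder user name
    "some-folder/foo/bar/baz",
)
print(uri)
```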

read/write_text(), read/write_bytes(), and glob() Methods

This code creates a WorkspacePath object for the path ~/some-folder/a/b/c, expands it to the full user path, and creates the directory along with any necessary parent directories. It then creates a file named hello.txt within this directory, writes "Hello, World!" to it, and verifies the content. The code lists all .txt files in the directory and ensures there is exactly one file, which is hello.txt. Finally, it deletes hello.txt and confirms that the file no longer exists.

from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/a/b/c")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_txt = with_user / "hello.txt"
hello_txt.write_text("Hello, World!")
assert hello_txt.read_text() == "Hello, World!"

files = list(with_user.glob("**/*.txt"))
assert len(files) == 1
assert hello_txt == files[0]
assert files[0].name == "hello.txt"

with_user.joinpath("hello.txt").unlink()

assert not hello_txt.exists()
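
Because these classes are described as drop-in replacements for pathlib.Path, the same sequence of calls can be rehearsed against the local filesystem first. The following sketch uses only the standard library and a temporary directory. It is a local analogue of the example above, not a test of WorkspacePath itself.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # create the nested folder, including missing parents
    folder = Path(tmp) / "some-folder" / "a" / "b" / "c"
    folder.mkdir(parents=True)

    # write a text file and read it back as text and as bytes
    hello_txt = folder / "hello.txt"
    hello_txt.write_text("Hello, World!")
    content = hello_txt.read_text()
    raw = hello_txt.read_bytes()

    # a recursive glob from the top of the tree finds the single .txt file
    files = list(Path(tmp).glob("**/*.txt"))

    # delete the file and record whether it is gone
    hello_txt.unlink()
    still_exists = hello_txt.exists()

print(content, len(files), still_exists)
```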

The read_bytes() method works as expected:

from databricks.sdk import WorkspaceClient
from databricks