Prefect integrations with Databricks

prefect-databricks

Visit the full docs here to see additional examples and the API reference.

Welcome!

Prefect integrations for interacting with Databricks

The tasks within this collection were created by a code generator using the service's OpenAPI spec.

The service's REST API documentation can be found here.

Getting Started

Python setup

Requires an installation of Python 3.7+.

We recommend using a Python virtual environment manager such as pipenv, conda or virtualenv.

These tasks are designed to work with Prefect 2. For more information about how to use Prefect, please refer to the Prefect documentation.

Installation

Install prefect-databricks with pip:

pip install prefect-databricks
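
To confirm that the install succeeded, the installed version can be read from the package metadata. This is a minimal sketch, assuming Python 3.8+ (where `importlib.metadata` is part of the standard library); the helper name `installed_version` is hypothetical and not part of prefect-databricks.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("prefect-databricks")
    print(found if found is not None else "prefect-databricks is not installed")
```

Returning None instead of raising keeps the check easy to use in setup scripts that only need a yes/no answer.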

A list of available blocks in prefect-databricks and their setup instructions can be found here.

List jobs on the Databricks instance

from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_list


@flow
def example_execute_endpoint_flow():
    databricks_credentials = DatabricksCredentials.load("my-block")
    jobs = jobs_list(
        databricks_credentials,
        limit=5
    )
    return jobs

example_execute_endpoint_flow()

Use `with_options` to customize options on any existing task or flow

custom_example_execute_endpoint_flow = example_execute_endpoint_flow.with_options(
    name="My custom flow name",
    retries=2,
    retry_delay_seconds=10,
)
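
`with_options` returns a new flow or task object with the given options overridden and leaves the original untouched. The toy model below illustrates only that copy-with-overrides behavior using a plain dataclass; `FlowOptions` and its fields are hypothetical stand-ins, not Prefect's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlowOptions:
    """Hypothetical stand-in for the options a flow carries."""

    name: str
    retries: int = 0
    retry_delay_seconds: int = 0

    def with_options(self, **overrides) -> "FlowOptions":
        # Return a modified copy; the original instance is never mutated.
        return replace(self, **overrides)


base = FlowOptions(name="example-execute-endpoint-flow")
custom = base.with_options(
    name="My custom flow name",
    retries=2,
    retry_delay_seconds=10,
)

print(base.retries)    # 0
print(custom.retries)  # 2
```

Because the original is unchanged, the same flow definition can be reused with several different option sets.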

Launch a new cluster and run a Databricks notebook

Notebook named example.ipynb on Databricks which accepts a name parameter:

name = dbutils.widgets.get("name")
message = f"Don't worry {name}, I got your request! Welcome to prefect-databricks!"
print(message)

Prefect flow that launches a new cluster to run example.ipynb:

from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_runs_submit
from prefect_databricks.models.jobs import (
    AutoScale,
    AwsAttributes,
    JobTaskSettings,
    NotebookTask,
    NewCluster,
)


@flow
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    databricks_credentials = DatabricksCredentials.load("my-block")

    # specify new cluster settings
    aws_attributes = AwsAttributes(
        availability="SPOT",
        zone_id="us-west-2a",
        ebs_volume_type="GENERAL_PURPOSE_SSD",
        ebs_volume_count=3,
        ebs_volume_size=100,
    )
    auto_scale = AutoScale(min_workers=1, max_workers=2)
    new_cluster = NewCluster(
        aws_attributes=aws_attributes,
        autoscale=auto_scale,
        node_type_id="m4.large",
        spark_version="10.4.x-scala2.12",
        spark_conf={"spark.speculation": True},
    )

    # specify notebook to use and parameters to pass
    notebook_task = NotebookTask(
        notebook_path=notebook_path,
        base_parameters=base_parameters,
    )

    # compile job task settings
    job_task_settings = JobTaskSettings(
        new_cluster=new_cluster,
        notebook_task=notebook_task,
        task_key="prefect-task"
    )

    run = jobs_runs_submit(
        databricks_credentials=databricks_credentials,
        run_name="prefect-job",
        tasks=[job_task_settings]
    )

    return run


jobs_runs_submit_flow("/Users/username@gmail.com/example.ipynb", name="Marvin")

Note: instead of using the built-in models, you may also pass the equivalent JSON-style dictionaries. For example, AutoScale(min_workers=1, max_workers=2) is equivalent to {"min_workers": 1, "max_workers": 2}.
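
As a sketch of the dictionary form, the cluster, notebook, and task settings from the flow above could be written as plain dictionaries. This assumes the keys mirror the model field names used in that example; the helper `build_task_settings` is hypothetical, and the resulting list is what would be passed as `tasks` to `jobs_runs_submit`.

```python
import json


def build_task_settings(notebook_path: str, **base_parameters) -> dict:
    """Build the job task settings as plain dicts instead of model objects."""
    new_cluster = {
        "aws_attributes": {
            "availability": "SPOT",
            "zone_id": "us-west-2a",
            "ebs_volume_type": "GENERAL_PURPOSE_SSD",
            "ebs_volume_count": 3,
            "ebs_volume_size": 100,
        },
        # Equivalent to AutoScale(min_workers=1, max_workers=2)
        "autoscale": {"min_workers": 1, "max_workers": 2},
        "node_type_id": "m4.large",
        "spark_version": "10.4.x-scala2.12",
        "spark_conf": {"spark.speculation": True},
    }
    notebook_task = {
        "notebook_path": notebook_path,
        "base_parameters": base_parameters,
    }
    return {
        "new_cluster": new_cluster,
        "notebook_task": notebook_task,
        "task_key": "prefect-task",
    }


tasks = [build_task_settings("/Users/username@gmail.com/example.ipynb", name="Marvin")]
print(json.dumps(tasks, indent=2))
```

Printing the payload as JSON is a quick way to inspect what would be sent before submitting a run.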

For more tips on how to use tasks and flows in a Collection, check out Using Collections!

Resources

If you encounter any bugs while using prefect-databricks, feel free to open an issue in the prefect-databricks repository.

If you have any questions or issues while using prefect-databricks, you can find help in either the Prefect Discourse forum or the Prefect Slack community.

Feel free to star or watch prefect-databricks for updates too!

Contributing

If you'd like to fix an issue or add a feature to prefect-databricks, please propose changes through a pull request from a fork of the repository.

Here are the steps:

1. Fork the repository
2. Clone the forked repository
3. Install the repository and its dependencies:

pip install -e ".[dev]"

4. Make desired changes
5. Add tests
6. Insert an entry to CHANGELOG.md
7. Install pre-commit to perform quality checks prior to commit:

pre-commit install

8. git commit, git push, and create a pull request

Release history

0.2.5 (Apr 26, 2024)
0.2.4 (Apr 25, 2024)
0.2.3 (Nov 29, 2023)
0.2.2 (Nov 13, 2023)
0.2.1 (Oct 27, 2023)
0.2.0 (Oct 5, 2023)
0.1.6 (Jun 16, 2023)
0.1.5 (May 30, 2023)
0.1.4 (Jan 4, 2023)
0.1.3 (Sep 23, 2022)
0.1.2 (Sep 21, 2022)
0.1.1 (Aug 19, 2022)
0.1.0 (Aug 15, 2022)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prefect_databricks-0.2.5.tar.gz (132.5 kB)

Uploaded Apr 26, 2024 Source

Built Distribution

prefect_databricks-0.2.5-py3-none-any.whl (131.4 kB)

Uploaded Apr 26, 2024 Python 3

Hashes for prefect_databricks-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`e3f92572a69fb27101089f13273fe213c2f37e3392a54dd974505e42633be1fe`
MD5	`0a6e594ea29f42d8b98cd18a0e0482a4`
BLAKE2b-256	`2afc9c848b8b69643c4012d003ace416f97caad1d17a51411b897ba2ca05d5ad`

Hashes for prefect_databricks-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a8d1760218ac67d423ee67a3e37b0a9792b40d318957adaf3941afd97298d874`
MD5	`1c4021c8a00b1c4e185221d4515bc227`
BLAKE2b-256	`748de8a6c614c2afa7b9d5d71607ac63bd61f9afd823368ce26868da095d3cf9`
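
A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only the standard library; the helper names `sha256_of` and `matches_expected` are hypothetical, and the file path in the usage note assumes the sdist was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_expected(Path("prefect_databricks-0.2.5.tar.gz"), "e3f92572a69fb27101089f13273fe213c2f37e3392a54dd974505e42633be1fe")` should return True for an intact copy of the source distribution.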
