
A Spark entry point for Python

Project description

Changelog

v0.5.5

  • Added --proxy option to set a proxy for accessing Python package repositories.

v0.5.4

  • Added plugin-env section to the configuration file, to set environment variables for the plugin download process.

  • Added --plugin-env option (and its associated environment variable SPARPY_PLUGIN_ENVVARS) to set environment variables for the plugin download process. This can be necessary in some cases when using conda environments.

  • Added environment variable SPARPY_CONFIG for the --config option.

  • Added environment variable SPARPY_DEBUG for the --debug option.

v0.5.3

  • Fix isparpy.

v0.5.2

  • Fix ignoring all packages when the exclude-packages list is empty.

v0.5.1

  • Fix Python package regex.

  • Fix download script.

v0.5.0

  • Added --exclude-python-packages option to exclude Python packages.

  • Better parsing of plugin names.

  • Added --exclude-packages option to exclude Spark packages.

v0.4.5

  • Fix isparpy entrypoint. Allows the --class parameter.

  • Allow setting constraints files.

v0.4.4

  • Don’t set master and deploy_mode default values.

v0.4.3

  • Fix sparpy-submit entrypoint.

  • Fix --property-file option.

  • Fix --class option.

v0.4.2

  • Environment variables can now be used for most options.

v0.4.1

  • Support setting pip options as configuration using --conf sparpy.config-key=value, to allow using sparpy-submit in EMR-on-EKS images.

  • Allows --class, to allow using sparpy-submit in EMR-on-EKS images.

  • Allows --property-file, to allow using sparpy-submit in EMR-on-EKS images.

v0.4.0

  • Added --pre option to allow pre-release packages.

  • Added --env option to set environment variables for the Spark process.

  • Added spark-env config section to set environment variables for the Spark process.

  • Write pip output when pip fails.

  • Fixed problems with interactive sparpy.

  • Fixed no-self option in the config file.

  • Allow plugins that do not use click. They must be callable with a single argument of type Sequence[str], which receives the plugin's arguments.

  • Added --version option to print the sparpy version.

  • Fixed an error when a plugin requires a package that is already installed but whose version does not satisfy the requirement.

  • Sparpy no longer prints an error traceback when a subprocess fails.

v0.3.0

  • Enable --force-download option.

  • Added --find-links option to use a directory as a package repository.

  • Added --no-index option to avoid using external package repositories.

  • Added --queue option to set the YARN queue.

  • Ensure the driver's Python executable is the same Python as sparpy's.

  • Added new entry point sparpy-download, just to download packages to a specific directory.

  • Added new entry point isparpy to start an interactive session.

v0.2.1

  • Force the pyspark Python executable to be the same as sparpy's.

  • Fix unrecognized options.

  • Fix default configuration file names.

v0.2.0

  • Added configuration file option.

  • Added --debug option.

How to build a Sparpy plugin

In the package's setup.py, an entry point should be configured for Sparpy:

setup(
    name='yourpackage',
    ...

    entry_points={
        ...
        'sparpy.cli_plugins': [
            'my_command_1=yourpackage.module:command_1',
            'my_command_2=yourpackage.module:command_2',
        ]
    }
)
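
As a sketch of what yourpackage/module.py might contain (the module and command names here are the hypothetical ones from the setup.py above): per the changelog, a plugin command need not use click; it can be a plain callable taking a single Sequence[str] of arguments, which keeps this example dependency-free:

```python
# yourpackage/module.py -- illustrative sketch only; names mirror the
# hypothetical setup.py above. This uses the plain-callable plugin form
# (a callable taking one Sequence[str] argument) supported for plugins
# that do not use click.
import argparse
from typing import Sequence


def command_1(args: Sequence[str]) -> argparse.Namespace:
    """Entry point registered as 'my_command_1'. Receives the raw CLI args."""
    parser = argparse.ArgumentParser(prog='my_command_1')
    parser.add_argument('--myparam', type=int, default=0)
    ns = parser.parse_args(list(args))
    print(f'my_command_1 called with myparam={ns.myparam}')
    return ns


def command_2(args: Sequence[str]) -> None:
    """Entry point registered as 'my_command_2'."""
    print(f'my_command_2 called with args: {list(args)}')
```

Sparpy would then invoke command_1 with everything after the command name on the sparpy command line, e.g. ['--myparam', '1'].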

Install

It must be installed on a Spark edge node.

$ pip install sparpy[base]

How to use

Using default Spark submit parameters:

$ sparpy --plugin "mypackage>=0.1" my_plugin_command --myparam 1

Configuration files

sparpy and sparpy-submit accept the --config parameter, which sets a configuration file. If it is not set, sparpy tries to use $HOME/.sparpyrc; if that does not exist, it falls back to /etc/sparpy.conf.
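
This lookup order can be sketched as follows (illustrative only; resolve_config is our name for the behavior, not sparpy's API; SPARPY_CONFIG is the documented environment variable for --config):

```python
# Illustrative sketch of the documented config lookup order; not sparpy's code.
import os
from pathlib import Path
from typing import Optional


def resolve_config(cli_config: Optional[str] = None) -> Optional[Path]:
    """Return the config file to use: --config / SPARPY_CONFIG first,
    then $HOME/.sparpyrc, then /etc/sparpy.conf, else None."""
    explicit = cli_config or os.environ.get('SPARPY_CONFIG')
    if explicit:
        return Path(explicit)
    for candidate in (Path.home() / '.sparpyrc', Path('/etc/sparpy.conf')):
        if candidate.exists():
            return candidate
    return None
```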

Format:

[spark]

master=yarn
deploy-mode=client

queue=my_queue

spark-executable=/path/to/my-spark-submit
conf=
    spark.conf.1=value1
    spark.conf.2=value2

packages=
    maven:package_1:0.1.1
    maven:package_2:0.6.1

repositories=
    https://my-maven-repository-1.com/mvn
    https://my-maven-repository-2.com/mvn

reqs_paths=
    /path/to/dir/with/python/packages_1
    /path/to/dir/with/python/packages_2

[spark-env]

MY_ENV_VAR=value

[plugins]

extra-index-urls=
    https://my-pypi-repository-1.com/simple
    https://my-pypi-repository-2.com/simple

cache-dir=/path/to/cache/dir

plugins=
    my-package1
    my-package2==0.1.2

requirements-files=
    /path/to/requirement-1.txt
    /path/to/requirement-2.txt

find-links=
    /path/to/directory/with/packages_1
    /path/to/directory/with/packages_2

download-dir-prefix=my_prefix_

no-index=false
no-self=false
force-download=true

[plugin-env]

MY_ENV_VAR=value

[interactive]

pyspark-executable=/path/to/pyspark
python-interactive-driver=/path/to/interactive/driver
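
The file is standard INI, with multi-line values given as indented continuation lines, so it can be read with Python's configparser. A minimal sketch (the multiline helper is ours, not part of sparpy):

```python
# Minimal sketch: parse a sparpy-style INI config with the standard library.
from configparser import ConfigParser

SAMPLE = """
[spark]
master=yarn
conf=
    spark.conf.1=value1
    spark.conf.2=value2

[plugins]
plugins=
    my-package1
    my-package2==0.1.2
"""


def multiline(value: str) -> list:
    """Split a newline-separated option value into a list of entries."""
    return [line.strip() for line in value.splitlines() if line.strip()]


parser = ConfigParser()
parser.read_string(SAMPLE)
print(parser.get('spark', 'master'))           # yarn
print(multiline(parser.get('spark', 'conf')))  # ['spark.conf.1=value1', 'spark.conf.2=value2']
```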
