azkaban

Azkaban CLI


Project description

A lightweight Azkaban client providing:

A command line interface to run jobs, upload projects, and much more.

```
$ azkaban upload my_project.zip
Project my_project successfully uploaded (id: 1, size: 205kB, version: 1).
Details at https://azkaban.server.url/manager?project=my_project

$ azkaban run my_workflow
Flow my_workflow successfully submitted (execution id: 1).
Details at https://azkaban.server.url/executor?execid=1
```

A simple syntax to define workflows from a single Python file.

```
from azkaban import Job, Project

project = Project('my_project')
project.add_file('/path/to/bar.txt', 'bar.txt')
project.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))
```

Installation

Using pip:

```
$ pip install azkaban
```

Command line interface

Once installed, the azkaban executable provides the following commands:

```
azkaban (create | delete) [options]
azkaban run [options] FLOW [JOB ...]
azkaban upload [options] ZIP
```

Running azkaban --help shows the full list of options.

URLs and aliases

The previous commands all take a --url, or -u, option used to specify where to find the Azkaban server (and which user to connect as).

```
$ azkaban create -u http://url.to.foo.server:port
```

To avoid typing the entire URL every time, we can define aliases in ~/.azkabanrc:

```
[azkaban]
default.alias = foo
[alias]
foo = http://url.to.foo.server:port
bar = baruser@http://url.to.bar.server
```

We can now interact directly with each of these URLs using the --alias, or -a, option followed by the corresponding alias. Since we also specified a default alias, we can omit the option altogether. As a result, the commands below are all equivalent:

```
$ azkaban create -u http://url.to.foo.server:port
$ azkaban create -a foo
$ azkaban create
```

Finally, our session ID for a given URL is cached on each successful login, so that we don’t have to authenticate on every remote interaction.
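The alias lookup described above can be sketched with Python's standard configparser module. This is a hypothetical illustration, not the CLI's own code: resolve_url is a made-up helper, and it assumes an optional user prefix is separated from the URL by the first @ sign, as in the bar entry.

```python
import configparser


def resolve_url(rc_text, alias=None):
    """Hypothetical helper: return (user, url) for an alias in ~/.azkabanrc text."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    if alias is None:
        # No -a option given: fall back to default.alias from the [azkaban] section.
        alias = parser.get('azkaban', 'default.alias')
    target = parser.get('alias', alias)
    # Assumption: an entry may be prefixed with "user@" to pick the user.
    user, sep, url = target.partition('@')
    if not sep:
        return None, target
    return user, url


RC = """
[azkaban]
default.alias = foo
[alias]
foo = http://url.to.foo.server:port
bar = baruser@http://url.to.bar.server
"""

print(resolve_url(RC))         # → (None, 'http://url.to.foo.server:port')
print(resolve_url(RC, 'bar'))  # → ('baruser', 'http://url.to.bar.server')
```

Because resolve_url(RC) and resolve_url(RC, 'foo') return the same pair, this mirrors why the three create commands above are interchangeable.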

Examples

Creating and deleting projects:

```
$ azkaban create
Project name: my_project
Description [my_project]: Some interesting description.
Project my_project successfully created.
Details at https://azkaban.server.url/manager?project=my_project

$ azkaban delete -a bar
Project name: my_project
Project my_project successfully deleted.
```

Uploading an already built archive to an Azkaban server:
```
$ azkaban upload -p my_project my_project.zip
```
Running entire workflows or individual jobs:
```
$ azkaban run -p my_project my_workflow
```

Syntax

For medium- to large-sized projects, it quickly becomes tricky to manage the multitude of files required for each workflow. .properties files help, but they still lack the flexibility to generate jobs programmatically (e.g. using for loops). This approach also requires us to manually bundle and upload our project to the gateway every time.

We provide here a convenient framework to define jobs from a single Python file. This framework is entirely compatible with the command line interface above, and even provides additional functionality (e.g. building and uploading projects in a single command).

Quickstart

We start by creating a configuration file for our project. Let’s call it jobs.py, the default file name the command line tool will look for. Here’s a simple example of how we could define a project with a single job and static file:

```
from azkaban import Job, Project

project = Project('foo')
project.add_file('/path/to/bar.txt', 'bar.txt')
project.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))
```

The add_file method adds a file to the project archive (the optional second argument specifies the destination path inside the zip file). The add_job method triggers the creation of a .job file: the first argument is the file’s name and the second is a Job instance (cf. Job options).
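As a rough illustration of what a .job file contains, the sketch below renders a flat options dictionary as key=value lines, the properties-style format Azkaban job files use. render_job_file is a hypothetical helper; the library's actual key ordering and escaping may differ, and no escaping of special characters is attempted here.

```python
def render_job_file(options):
    """Hypothetical helper: render a flat options dict as .job file text."""
    # One key=value line per option; sorting keeps the output deterministic.
    return ''.join('%s=%s\n' % (key, options[key]) for key in sorted(options))


text = render_job_file({'type': 'command', 'command': 'cat bar.txt'})
print(text, end='')
# command=cat bar.txt
# type=command
```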

Once we’ve saved our jobs file, the following additional commands are available to us:

azkaban list: see the list of all jobs in the current project.
azkaban view: view the contents of the .job file for a given job.
azkaban build: build the project archive and store it locally.
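The build step can be pictured as bundling the static files and one .job file per job into a single zip archive. The sketch below does this in memory with the standard zipfile module; build_archive is hypothetical, takes file contents rather than local paths so that it stays self-contained, and assumes each job is written as <name>.job at the archive root.

```python
import io
import zipfile


def build_archive(files, jobs):
    """Hypothetical sketch. files: {archive path: bytes}; jobs: {job name: .job text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
        for name, text in jobs.items():
            # Assumption: each job becomes "<name>.job" at the root of the zip.
            archive.writestr(name + '.job', text)
    return buffer.getvalue()


data = build_archive(
    files={'bar.txt': b'hello\n'},
    jobs={'bar': 'type=command\ncommand=cat bar.txt\n'},
)
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(sorted(archive.namelist()))  # → ['bar.job', 'bar.txt']
```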

Job options

The Job class is a light wrapper which allows the creation of .job files using Python dictionaries.

It also provides a convenient way to handle options shared across multiple jobs: the constructor can take in multiple options dictionaries and the last definition of an option (i.e. later in the arguments) will take precedence over earlier ones.

We can use this to efficiently share default options among jobs, for example:

```
from azkaban import Job

defaults = {'user.to.proxy': 'boo', 'retries': 0}

jobs = [
  Job({'type': 'noop'}),
  Job(defaults, {'type': 'noop'}),
  Job(defaults, {'type': 'command', 'command': 'ls'}),
  Job(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1}),
]
```

All jobs except the first one will have their user.to.proxy property set. Note also that the last job overrides the retries property.

Alternatively, if we really don’t want to pass the defaults dictionary around, we can create a new Job subclass to do it for us:

```
class BooJob(Job):

  def __init__(self, *options):
    # Prepend the shared defaults; options passed explicitly still take precedence.
    super(BooJob, self).__init__(defaults, *options)
```
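The precedence rule for multiple options dictionaries behaves like successive dict.update calls. merge_options below is a hypothetical stand-in for what the Job constructor is described as doing, assuming flat dictionaries (it does not merge nested ones).

```python
def merge_options(*options):
    """Hypothetical stand-in: combine option dicts, later ones winning on conflicts."""
    merged = {}
    for opts in options:
        # update() overwrites existing keys, so the last definition of an option wins.
        merged.update(opts)
    return merged


defaults = {'user.to.proxy': 'boo', 'retries': 0}
print(merge_options(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1}))
# → {'user.to.proxy': 'boo', 'retries': 1, 'type': 'command', 'command': 'ls -l'}
```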

Nested options

Nested dictionaries can be used to group options concisely:

```
# e.g. this job
Job({
  'proxy.user': 'boo',
  'proxy.keytab.location': '/path',
  'param.input': 'foo',
  'param.output': 'bar',
})
# is equivalent to this one
Job({
  'proxy': {'user': 'boo', 'keytab.location': '/path'},
  'param': {'input': 'foo', 'output': 'bar'},
})
```
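The equivalence between nested and dotted keys amounts to flattening the dictionary, joining keys with dots. flatten is a hypothetical helper sketching this idea, not the library's implementation.

```python
def flatten(options, prefix=''):
    """Hypothetical helper: turn nested dicts into dotted keys."""
    flat = {}
    for key, value in options.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Recurse, carrying the dotted path built so far.
            flat.update(flatten(value, full_key + '.'))
        else:
            flat[full_key] = value
    return flat


nested = {
    'proxy': {'user': 'boo', 'keytab.location': '/path'},
    'param': {'input': 'foo', 'output': 'bar'},
}
print(flatten(nested))
# → {'proxy.user': 'boo', 'proxy.keytab.location': '/path', 'param.input': 'foo', 'param.output': 'bar'}
```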

Merging projects

If you have multiple projects, you can merge them together into a single project. The merge is done in place: calling merge_into on a project adds its files and jobs to the project passed as argument, which retains its original name.

```
from azkaban import Job, Project

project1 = Project('foo')
project1.add_file('/path/to/bar.txt', 'bar.txt')
project1.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))

project2 = Project('qux')
project2.add_file('/path/to/baz.txt', 'baz.txt')
project2.add_job('baz', Job({'type': 'command', 'command': 'cat baz.txt'}))

# project1 will now contain baz.txt and the baz job from project2
project2.merge_into(project1)
```
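To make the in-place semantics concrete, the sketch below uses ProjectSketch, a hypothetical stand-in for Project that only tracks files and jobs. How the real merge_into handles name clashes is not stated here, so this sketch simply lets the incoming entries overwrite existing ones.

```python
class ProjectSketch(object):
    """Hypothetical stand-in for Project, tracking only files and jobs."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # archive path -> local path
        self.jobs = {}   # job name -> options dict

    def add_file(self, path, archive_path=None):
        # Assumption: without a destination, the local path is reused.
        self.files[archive_path or path] = path

    def add_job(self, name, options):
        self.jobs[name] = options

    def merge_into(self, other):
        """Copy this project's files and jobs into `other`, modifying it in place."""
        # Assumption: on a name clash, entries from this project overwrite.
        other.files.update(self.files)
        other.jobs.update(self.jobs)


project1 = ProjectSketch('foo')
project1.add_file('/path/to/bar.txt', 'bar.txt')
project1.add_job('bar', {'type': 'command', 'command': 'cat bar.txt'})

project2 = ProjectSketch('qux')
project2.add_file('/path/to/baz.txt', 'baz.txt')
project2.add_job('baz', {'type': 'command', 'command': 'cat baz.txt'})

project2.merge_into(project1)
print(project1.name, sorted(project1.jobs))  # → foo ['bar', 'baz']
```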

Job details

The info command becomes quite powerful when combined with other Unix tools. Here are a few examples:

```
$ # To count the number of jobs per type
$ azkaban info -o type | cut -f 2 | sort | uniq -c
$ # To only view the list of jobs of a certain type with their dependencies
$ azkaban info -o type,dependencies | awk -F '\t' '($2 == "job_type")'
$ # To view the size of each file in the project
$ azkaban info -f | xargs -n 1 du -h
```
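The same per-type count can be done in Python when the output needs further processing. This sketch assumes, based on the cut -f 2 and awk examples above, that azkaban info -o type prints one job per line as the job name, a tab, then its type; count_job_types and the sample text are made up for illustration.

```python
from collections import Counter


def count_job_types(info_output):
    """Hypothetical helper: count jobs per type from 'name<TAB>type' lines."""
    counts = Counter()
    for line in info_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split('\t')
        counts[fields[1]] += 1  # second column, like `cut -f 2`
    return counts


# In practice the text could come from, e.g.:
# subprocess.run(['azkaban', 'info', '-o', 'type'], capture_output=True, text=True).stdout
sample = 'bar\tcommand\nbaz\tpig\nqux\tcommand\n'  # made-up sample output
print(count_job_types(sample))  # → Counter({'command': 2, 'pig': 1})
```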

Next steps

Any valid Python code can go inside the jobs configuration file. This includes using loops to add jobs, subclassing the base Job class to better suit a project’s needs (e.g. by implementing the on_add and on_build handlers), …
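For instance, a loop can generate a chain of similar jobs, each depending on the previous one through the dependencies option. To stay runnable without the library, this hypothetical sketch collects plain option dictionaries; the comment marks where project.add_job(name, Job(options)) would go in a real jobs.py. The dates, job names, and commands are made up.

```python
# Hypothetical example: one job per date, chained via the 'dependencies' option.
dates = ['2014-02-14', '2014-02-15', '2014-02-16']

jobs = {}
previous = None
for date in dates:
    name = 'load_%s' % date.replace('-', '_')
    options = {'type': 'command', 'command': 'echo loading %s' % date}
    if previous is not None:
        # Each job waits for the one generated in the previous iteration.
        options['dependencies'] = previous
    # In a real jobs.py: project.add_job(name, Job(options))
    jobs[name] = options
    previous = name

for name, options in jobs.items():
    print(name, options.get('dependencies'))
# load_2014_02_14 None
# load_2014_02_15 load_2014_02_14
# load_2014_02_16 load_2014_02_15
```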

Extensions

Pig

Because Pig jobs are so common, a PigJob class is provided which accepts a file path (to the Pig script) as its first constructor argument, optionally followed by job options. It then automatically sets the job type and adds the corresponding script file to the project.

```
from azkaban import PigJob

project.add_job('baz', PigJob('/.../baz.pig', {'dependencies': 'bar'}))
```

Using a custom pig type is as simple as changing the PigJob.type class variable.

This extension also comes with the azkabanpig executable to run pig scripts directly. azkabanpig --help will display the list of available options (using UDFs, substituting parameters, running several scripts in order, etc.).


Release history

0.9.14

Jan 28, 2020

0.9.13

Oct 27, 2019

0.9.12

Sep 23, 2019

0.9.11

May 7, 2019

0.9.10

Mar 14, 2019

0.9.9

Jan 19, 2019

0.9.8

Oct 30, 2018

0.9.7

Apr 4, 2017

0.9.6

Mar 31, 2017

0.9.5

Aug 26, 2016

0.9.4

Aug 23, 2016

0.9.3

Feb 12, 2016

0.9.2

Feb 12, 2016

0.9.1

Jul 18, 2015

0.9.0

Jul 13, 2015

0.8.7

Jun 26, 2015

0.8.6

Jun 26, 2015

0.8.5

Apr 29, 2015

0.8.4

Apr 29, 2015

0.8.3

Apr 29, 2015

0.8.2

Apr 25, 2015

0.8.1

Apr 24, 2015

0.8.0

Apr 23, 2015

0.7.2

Feb 13, 2015

0.7.1

Feb 5, 2015

0.7.0

Nov 10, 2014

0.6.45

Oct 16, 2014

0.6.44

Oct 16, 2014

0.6.43

Aug 25, 2014

0.6.42

Aug 14, 2014

0.6.41

Aug 14, 2014

0.6.40

Aug 4, 2014

0.6.39

Aug 2, 2014

0.6.38

Jul 31, 2014

0.6.37

Jul 30, 2014

0.6.36

Jul 30, 2014

0.6.35

Jul 30, 2014

0.6.34

Jul 30, 2014

0.6.33

Jul 29, 2014

0.6.32

Jul 29, 2014

0.6.31

Jul 29, 2014

0.6.30

Jul 29, 2014

0.6.29

Jul 27, 2014

0.6.28

Jul 27, 2014

0.6.27

Jul 27, 2014

0.6.26

Jul 27, 2014

0.6.25

Jul 26, 2014

0.6.24

Jul 26, 2014

0.6.23

Jul 25, 2014

0.6.22

Jul 25, 2014

0.6.21

Jul 9, 2014

0.6.20

Jul 9, 2014

0.6.19

Jul 8, 2014

0.6.18

Jul 8, 2014

0.6.16

Jun 2, 2014

0.6.15

Jun 1, 2014

0.6.14

May 22, 2014

0.6.13

May 16, 2014

0.6.12

May 16, 2014

0.6.11

May 10, 2014

0.6.10

May 10, 2014

0.6.9

May 10, 2014

0.6.8

May 9, 2014

0.6.7

May 9, 2014

0.6.6

May 2, 2014

0.6.5

May 1, 2014

0.6.4

Apr 28, 2014

0.6.3

Apr 28, 2014

0.6.2

Apr 27, 2014

0.6.1

Apr 27, 2014

0.6.0

Apr 26, 2014

0.5.6

Apr 17, 2014

0.5.5

Apr 17, 2014

0.5.4

Apr 16, 2014

0.5.2

Apr 16, 2014

0.5.1

Apr 16, 2014

0.5.0

Apr 15, 2014

0.4.2

Apr 15, 2014

0.4.1

Apr 14, 2014

0.4.0

Apr 10, 2014

0.3.11

Apr 10, 2014

0.3.10

Apr 4, 2014

0.3.9

Apr 2, 2014

0.3.8

Mar 7, 2014

0.3.7

Mar 6, 2014

0.3.6

Mar 6, 2014

0.3.5

Mar 6, 2014

0.3.4

Mar 4, 2014

0.3.3

Mar 4, 2014

0.3.2

Feb 28, 2014

0.3.1

Feb 18, 2014

0.3.0 (this version)

Feb 16, 2014

0.2.7

Feb 13, 2014

0.2.6

Feb 13, 2014

0.2.5

Feb 12, 2014

0.2.4

Feb 12, 2014

0.2.3

Feb 12, 2014

0.2.2

Feb 12, 2014

0.2.1

Feb 11, 2014

0.2.0

Feb 11, 2014

0.1.12

Feb 7, 2014

0.1.11

Feb 7, 2014

0.1.10

Feb 5, 2014

0.1.9

Jan 31, 2014

0.1.8

Nov 24, 2013

0.1.7

Nov 14, 2013

0.1.6

Oct 29, 2013

0.1.5

Oct 19, 2013

0.1.4

Oct 19, 2013

0.1.3

Oct 18, 2013

0.1.2

Oct 18, 2013

0.1.1

Oct 18, 2013

0.1.0

Oct 18, 2013

0.0.1

Oct 17, 2013

Download files


Source Distribution

azkaban-0.3.0.tar.gz (22.8 kB)

Uploaded Feb 16, 2014 Source

Hashes for azkaban-0.3.0.tar.gz

Algorithm	Hash digest
SHA256	`a1b2d33bb0ac13253fb531ad3c73998500ac722e80b74ca012dd25f08e2f5024`
MD5	`f6864ab48b7fe49c45c5172215051736`
BLAKE2b-256	`4b7ecb0824d08b307031e678fd35b3d598d60816a6bdfb83e4bf9ade84174fd3`