azkaban 0.1.9

Azkaban CLI


Lightweight command line interface (CLI) for Azkaban:

  • Define jobs from a single python file.
  • Build projects and upload to Azkaban from the command line.


Installation

Using pip:

$ pip install azkaban


Quickstart

We first create a configuration file for our project. Let’s call it jobs.py, although any name would work. Here’s a simple example of how we could define a project with a single job and a static file:

from azkaban import Job, Project

project = Project('foo')
project.add_file('/path/to/bar.txt', 'bar.txt')
project.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))

if __name__ == '__main__':
  project.main()  # run the command line interface for this project

The add_file method adds a file to the project archive (the second optional argument specifies the destination path inside the zip file). The add_job method will trigger the creation of a .job file. The first argument will be the file’s name, the second is a Job instance (cf. Job options).
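
To make the add_job step more concrete, here is a minimal sketch of how an options dictionary could be rendered as the text of a .job file. It assumes the usual Azkaban job format of one key=value pair per line; render_job is a hypothetical helper written for illustration and is not part of the azkaban package.

```python
def render_job(options):
  """Return .job file text: one key=value line per option, sorted by key."""
  # Hypothetical helper for illustration, not part of the azkaban package.
  return ''.join('%s=%s\n' % (key, options[key]) for key in sorted(options))

job_text = render_job({'type': 'command', 'command': 'cat bar.txt'})
print(job_text)
```

Sorting the keys is only a choice made here to keep the output deterministic.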

Once we’ve saved our jobs file, the following commands are available to us:

  • python jobs.py list, see the list of all jobs in the current project.
  • python jobs.py view, view the contents of the .job file for a given job.
  • python jobs.py build, build the project archive and store it locally.
  • python jobs.py upload, build and upload the project to an Azkaban server.

Running python jobs.py --help shows the list of options for each of the previous commands.

Job options

The Job class is a light wrapper which allows the creation of .job files using python dictionaries.

It also provides a convenient way to handle options shared across multiple jobs: the constructor can take in multiple options dictionaries and the last definition of an option (i.e. later in the arguments) will take precedence over earlier ones.

We can use this to efficiently share default options among jobs, for example:

defaults = {'user.to.proxy': 'boo', 'retries': 0}

jobs = [
  Job({'type': 'noop'}),
  Job(defaults, {'type': 'noop'}),
  Job(defaults, {'type': 'command', 'command': 'ls'}),
  Job(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1}),
]

All jobs except the first one will have their user.to.proxy property set. Note also that the last job overrides the retries property.
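
The precedence rule can be illustrated with plain dictionaries. The sketch below assumes nothing about the internals of the Job class; merge_options is a hypothetical stand-in that only mimics the documented behaviour, where later dictionaries win.

```python
def merge_options(*option_dicts):
  """Merge dictionaries left to right; later values override earlier ones."""
  # Hypothetical stand-in for the documented behaviour of the Job constructor.
  merged = {}
  for options in option_dicts:
    merged.update(options)
  return merged

defaults = {'retries': 0}
merged = merge_options(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1})
print(merged)
```

Building a fresh dictionary leaves the shared defaults untouched, so they can be reused safely for the next job.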

Alternatively, if we really don’t want to pass the defaults dictionary around, we can create a new Job subclass to do it for us:

class BooJob(Job):

  def __init__(self, *options):
    super(BooJob, self).__init__(defaults, *options)



Aliases

To avoid having to enter the server’s URL and our username on every upload (or hard-coding it into our project’s configuration file, ugh), we can define aliases in ~/.azkabanrc:

[foo]
url = <URL of the foo server>
[bar]
url = <URL of the bar server>
user = baruser

We can now upload directly to each of these URLs with the shorthand:

$ python jobs.py upload -a foo

This has the added benefit that we won’t have to authenticate on every upload. The session ID is cached and reused for later connections.
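
As a rough illustration of how an alias could be resolved, the sketch below parses INI-style text with Python’s standard configparser module. It assumes one section per alias; the section names, URLs and the resolve_alias helper are all hypothetical, and the azkaban package may read the file differently.

```python
import configparser

# Hypothetical rc file contents: section names are aliases, URLs are placeholders.
SAMPLE_RC = """
[foo]
url = http://foo.example.com:8081
[bar]
url = http://bar.example.com:8081
user = baruser
"""

def resolve_alias(rc_text, alias):
  """Return (url, user) for an alias; user is None when not configured."""
  parser = configparser.ConfigParser()
  parser.read_string(rc_text)
  if not parser.has_section(alias):
    raise KeyError('unknown alias: %s' % alias)
  return parser.get(alias, 'url'), parser.get(alias, 'user', fallback=None)

print(resolve_alias(SAMPLE_RC, 'bar'))
```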

Nested options

Nested dictionaries can be used to group options concisely:

# e.g. this job
Job({
  'proxy.user': 'boo',
  'proxy.keytab.location': '/path',
  'param.input': 'foo',
  'param.output': 'bar',
})
# is equivalent to this one
Job({
  'proxy': {'user': 'boo', 'keytab.location': '/path'},
  'param': {'input': 'foo', 'output': 'bar'},
})
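
The equivalence between the two forms can be sketched as a recursive flattening step that joins nested keys with dots. The flatten function below is a hypothetical illustration and not necessarily how the azkaban package implements it.

```python
def flatten(options, prefix=''):
  """Expand nested dictionaries into a flat dictionary with dotted keys."""
  flat = {}
  for key, value in options.items():
    full_key = prefix + key
    if isinstance(value, dict):
      # Recurse, extending the dotted prefix with the current key.
      flat.update(flatten(value, full_key + '.'))
    else:
      flat[full_key] = value
  return flat

nested = {
  'proxy': {'user': 'boo', 'keytab.location': '/path'},
  'param': {'input': 'foo', 'output': 'bar'},
}
print(flatten(nested))
```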

Pig jobs

Because pig jobs are so common, a PigJob class is provided which accepts a file path (to the pig script) as first constructor argument, optionally followed by job options. It then automatically sets the job type and adds the corresponding script file to the project.

from azkaban import PigJob

project.add_job('baz', PigJob('/.../baz.pig', {'dependencies': 'bar'}))

Using a custom pig type is as simple as changing the PigJob.type class variable.

Merging projects

If you have multiple projects, you can merge them into a single project. The merge is done in place on the project the method is called on, and that project retains its original name.

from azkaban import Job, Project

project1 = Project('foo')
project1.add_file('/path/to/bar.txt', 'bar.txt')
project1.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))

project2 = Project('qux')
project2.add_file('/path/to/baz.txt', 'baz.txt')
project2.add_job('baz', Job({'type': 'command', 'command': 'cat baz.txt'}))

# project1 will now contain baz.txt and the baz job from project2
project1.merge(project2)

if __name__ == '__main__':
  project1.main()

Next steps

Any valid python code can go inside the jobs configuration file. This includes using loops to add jobs, subclassing the base Job class to better suit a project’s needs (e.g. by implementing the on_add and on_build handlers), …
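
As one possible use of a loop, the sketch below generates the options for a chain of jobs in which each job depends on the previous one. It relies only on plain dictionaries; chain_options and the job names are hypothetical, and dependencies is the same option used in the Pig example above.

```python
def chain_options(names, base):
  """Return {job name: options} where each job depends on the one before it."""
  jobs = {}
  previous = None
  for name in names:
    options = dict(base)  # copy so the shared base is never mutated
    if previous is not None:
      options['dependencies'] = previous
    jobs[name] = options
    previous = name
  return jobs

jobs = chain_options(['extract', 'transform', 'load'], {'type': 'noop'})
for name, options in jobs.items():
  print(name, options)
```

In a jobs file, each pair could then be registered with project.add_job(name, Job(options)).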

