
Python packages for Hadoop Streaming

Project description

Ziggy provides a collection of Python methods for Hadoop Streaming. It is useful for building complex MapReduce programs, for using Hadoop to batch-process many files, and for Monte Carlo processes, graph algorithms, and common utility tasks (e.g. sort, search). Typical usage looks like this:

    #!/usr/bin/env python

    import ziggy.hdmc as hdmc
    from glob import glob

    # script_to_run, output_filename, and argument_string are supplied by the caller.
    files_to_process = glob("/some/path/*")
    results = hdmc.submit_checkpoint_inline(script_to_run, output_filename, files_to_process, argument_string)
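
For context, Hadoop Streaming runs any executable that reads lines from standard input and writes tab-separated key/value lines to standard output. The sketch below illustrates that general contract with a word count in Python 3; it does not use Ziggy's API, and the function names `map_lines` and `reduce_lines` are hypothetical, chosen only for this illustration. Whether a script passed to Ziggy must follow this exact shape is not stated here, so treat it as background rather than as Ziggy's required format.

```python
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each key; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort (shuffle) -> reduce on two input lines.
mapped = sorted(map_lines(["a b a", "b a"]))
counts = list(reduce_lines(mapped))
```

In a real streaming job the mapper and reducer would read from `sys.stdin` and print each record, and the framework would perform the sort between the two steps; the `sorted` call here only stands in for that shuffle.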

To install, run:

    python setup.py hadoop
    python setup.py install

Ziggy was authored by Dan McClary, Ph.D., and originates in the Amaral Lab at Northwestern University.


Download files


Source Distribution

Ziggy-0.1.1.tar.gz (128.8 kB)
