SAGA-Hadoop 0.17

Use SAGA to launch a Hadoop cluster as a normal batch job on Torque clusters

Latest Version: 0.20

# SAGA Hadoop

# Overview:

Use [SAGA](http://saga-project.github.io/saga-python/) to spawn a Hadoop cluster within an HPC batch job.

Currently supported SAGA adaptors:

  • Fork
  • Torque

Requirements:

  • PBS/Torque cluster
  • Working directory should be on a shared filesystem

By default, SAGA-Hadoop deploys a Hadoop 2.2.0 YARN cluster. To customize the cluster, adjust the templates for the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml) in hadoop2/bootstrap_hadoop2.py.
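To illustrate what adjusting such a template might look like, here is a minimal sketch in Python. It is not the code from hadoop2/bootstrap_hadoop2.py: the names `CORE_SITE_TEMPLATE` and `render_core_site`, the placeholder names, and the default port 9000 are assumptions made for this example. Only the `fs.defaultFS` property is a standard Hadoop 2 setting.

```python
from string import Template

# Hypothetical template for core-site.xml; the real template in
# hadoop2/bootstrap_hadoop2.py may be structured differently.
CORE_SITE_TEMPLATE = Template("""<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$namenode_host:$namenode_port</value>
  </property>
</configuration>
""")


def render_core_site(namenode_host, namenode_port=9000):
    """Fill in the NameNode address and return the XML text."""
    return CORE_SITE_TEMPLATE.substitute(
        namenode_host=namenode_host,
        namenode_port=namenode_port,
    )


if __name__ == "__main__":
    # "node01" is a placeholder hostname for the node running the NameNode.
    print(render_core_site("node01"))
```

Further properties can be added to a template of this kind as additional `<property>` elements.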

# Usage

Run a local Hadoop cluster (e.g. for development and testing):

easy_install saga-hadoop

saga-hadoop --resource fork://localhost

Run a Hadoop cluster inside a PBS/Torque job:

saga-hadoop --resource pbs+ssh://india.futuregrid.org --number_cores 8
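When launching clusters from a script, the command line can be assembled programmatically. The following sketch uses only the two flags shown above (`--resource` and `--number_cores`). The helper name `build_saga_hadoop_command` is hypothetical and not part of SAGA-Hadoop.

```python
import shlex


def build_saga_hadoop_command(resource, number_cores=None):
    """Return the saga-hadoop argument list for a given resource URL.

    Only the flags documented above are used; number_cores is left
    out when it is None, as in the local fork:// example.
    """
    cmd = ["saga-hadoop", "--resource", resource]
    if number_cores is not None:
        cmd += ["--number_cores", str(number_cores)]
    return cmd


if __name__ == "__main__":
    cmd = build_saga_hadoop_command("pbs+ssh://india.futuregrid.org", 8)
    # Print the command for inspection; to actually launch it, pass the
    # list to subprocess.check_call(cmd) on a machine with saga-hadoop installed.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if the resource URL ever contains special characters.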

Some Blog Posts about SAGA-Hadoop:

# Packages:

See hadoop1 for setting up a Hadoop 1.x.x cluster.

See hadoop2 for setting up a Hadoop 2.2.x cluster.

 
| File | Type | Py Version | Uploaded on | Size |
|---|---|---|---|---|
| SAGA-Hadoop-0.17.tar.gz (md5) | Source | | 2013-12-26 | 9KB |