Hi list,
I'm currently evaluating different scenarios to use Hadoop. I have
access to a Linux cluster running LSF as batch system. I have the idea
to write a small wrapper in Python which
+ generates a Hadoop configuration on a per-job basis
+ formats a per-job HDFS
+ brings up the NameNode and the JobTracker
+ copies all necessary files to HDFS
+ launches the actual Map/Reduce instances
+ when the job is finished, copies the produced files from HDFS
+ shuts down the daemons
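The steps above could be sketched roughly like this in Python. This is only a sketch, assuming the Hadoop 0.x/1.x command-line tools (hadoop namenode -format, hadoop-daemon.sh, hadoop fs, hadoop jar) are on PATH and HADOOP_CONF_DIR already points at the generated per-job configuration; all file names and class names below are made up for illustration:

```python
# Sketch of a per-job Hadoop wrapper: format HDFS, start daemons,
# stage data, run the job, fetch results, stop daemons.
# Assumes Hadoop 1.x-era commands (NameNode/JobTracker) on PATH.
import subprocess


def build_job_commands(jar, main_class, local_input, hdfs_input,
                       hdfs_output, local_output):
    """Return the full command sequence for one job, in order."""
    return [
        # format a fresh per-job HDFS
        ["hadoop", "namenode", "-format"],
        # bring up the daemons
        ["hadoop-daemon.sh", "start", "namenode"],
        ["hadoop-daemon.sh", "start", "datanode"],
        ["hadoop-daemon.sh", "start", "jobtracker"],
        ["hadoop-daemon.sh", "start", "tasktracker"],
        # copy all necessary files to HDFS
        ["hadoop", "fs", "-put", local_input, hdfs_input],
        # launch the actual Map/Reduce job
        ["hadoop", "jar", jar, main_class, hdfs_input, hdfs_output],
        # copy the produced files back from HDFS
        ["hadoop", "fs", "-get", hdfs_output, local_output],
        # shut down the daemons again
        ["hadoop-daemon.sh", "stop", "tasktracker"],
        ["hadoop-daemon.sh", "stop", "jobtracker"],
        ["hadoop-daemon.sh", "stop", "datanode"],
        ["hadoop-daemon.sh", "stop", "namenode"],
    ]


def run_job(*args):
    """Execute the whole sequence, aborting on the first failure."""
    for cmd in build_job_commands(*args):
        subprocess.check_call(cmd)
```

In practice the wrapper would also have to pick free ports and scratch directories per job so that several jobs can coexist on the same LSF nodes, but that bookkeeping is omitted here.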
My questions are:
1) Has someone already put some effort in a project similar to this?
2) Do you estimate the overhead of the Hadoop set-up to be too big to
get an actual performance gain?
I assume the answer to (2) depends on the job's running time and the
size of the input data. Thus,
3) What do you think are the characteristics of a job that would see a
performance improvement?
Regards,
Thomas.