Greetings,
I would like to announce the 0.1 release of RHIPE: R and Hadoop
Integrated Processing Environment. The website is located at
http://www.stat.purdue.edu/~sguha/rhipe
The download link is the bottommost link on the left side of the page.

RHIPE works on top of Hadoop, providing the R user a way to distribute
commands over the Hadoop computing framework.

The 0.1 release has one main command, rhlapply, which is a parallel
version of lapply. rhlapply writes its results to Sequence files (a
Hadoop file format) on the Hadoop Distributed Filesystem, and RHIPE
comes with commands to read R objects from these Sequence files.
rhlapply also has features to share files, load libraries, and execute
code on the cluster machines, and to collect side-effect files.
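
As a rough illustration of the pattern that rhlapply parallelizes, here
is the serial version in plain base R. This is only a sketch: the data
and the function name summarize are made up for the example, and the
exact rhlapply arguments are not shown here (see the website for those).

```r
# Base R only: the serial lapply pattern that rhlapply is described as
# running in parallel. Each list element is processed independently,
# which is what makes the work distributable across machines.

# Hypothetical input data (an assumption for this example)
inputs <- list(a = 1:10, b = 11:20, c = 21:30)

# Hypothetical per-element function (an assumption for this example)
summarize <- function(x) c(mean = mean(x), total = sum(x))

# lapply applies summarize to each element and returns a list
# with the same names as the input
results <- lapply(inputs, summarize)
str(results)
```

Since no element depends on another, the same call structure can in
principle be handed to a parallel backend without changing summarize.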

Since RHIPE uses Hadoop for distributing computation, it also benefits
from Hadoop's stability and features such as load balancing, recovery
from machine failures, and job scheduling.

RHIPE also implements a basic Shared Associative Space via IBM's
TSpaces, with commands like rhput, rhtake, and rhread. This is optional.

See the website for details and performance results. Unfortunately, I
haven't had the chance to compare RHIPE with Rmpi and snow.

TODO (will appear soon): another function, rhmr, for MapReduce using R.

Regards
Saptarshi Guha

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
