Hello,

I'd like to announce the release of version 0.1 of RHIPE (R and Hadoop Integrated Processing Environment). With RHIPE you can write map-reduce algorithms in R and launch them from within an R session. RHIPE is built on Hadoop, so it benefits from Hadoop's fault tolerance, distributed file system and job scheduling. For the R user, there is rhlapply, which runs an lapply across the cluster. For the Hadoop user, there is rhmr, which runs a general map-reduce program.
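To make the map-reduce contract concrete without a cluster, here is a minimal local sketch in plain R. It assumes the mapper/reducer convention used by the word-count example in this post: a mapper takes a key and a value and returns a list of list(key=, value=) pairs, and a reducer takes a key and a list of values and returns a list of such pairs. The names local_mapreduce, wc_map and wc_reduce are hypothetical. This is not how RHIPE or Hadoop run a job; it only mirrors the map, group-by-key and reduce steps inside one R session.

```r
# Hypothetical local driver: runs a mapper/reducer pair in-process (no Hadoop).
# Assumes mappers return a list of list(key=, value=) pairs and reducers
# receive (key, list_of_values) and return a list of such pairs.
local_mapreduce <- function(records, mapper, reducer) {
  # Map phase: call the mapper on every record, using its index as the key,
  # then flatten the per-record lists of pairs into one list of pairs
  emitted <- unlist(
    lapply(seq_along(records), function(i) mapper(i, records[[i]])),
    recursive = FALSE
  )
  # Shuffle phase: group the emitted values by their key
  keys <- vapply(emitted, function(p) as.character(p$key), character(1))
  values <- lapply(emitted, function(p) p$value)
  groups <- split(values, keys)
  # Reduce phase: call the reducer once per distinct key
  unlist(
    lapply(names(groups), function(k) reducer(k, groups[[k]])),
    recursive = FALSE
  )
}

# Hypothetical word-count mapper: emits one (word, count) pair per distinct word
wc_map <- function(key, val) {
  words <- strsplit(val, " +")[[1]]
  wc <- table(words)
  cln <- names(wc)
  lapply(seq_along(wc), function(r) list(key = cln[r], value = wc[[r]]))
}

# Hypothetical word-count reducer: sums the partial counts for one word
wc_reduce <- function(key, value) {
  list(list(key = key, value = sum(unlist(value))))
}

out <- local_mapreduce(c("the cat", "the dog"), wc_map, wc_reduce)
counts <- setNames(
  vapply(out, function(p) as.numeric(p$value), numeric(1)),
  vapply(out, function(p) p$key, character(1))
)
print(counts)
```

On a real cluster the shuffle step is done by Hadoop across machines, which is why the mapper and reducer only ever see one record or one key's values at a time.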
The tired example of counting words:

    # Mapper: split a line into words and emit one (word, count) pair per word
    m <- function(key, val) {
      words <- strsplit(val, " +")[[1]]
      wc <- table(words)
      cln <- names(wc)
      return(lapply(seq_along(wc), function(r) list(key = cln[r], value = wc[[r]])))
    }
    # Reducer: sum all partial counts emitted for the same word
    r <- function(key, value) {
      value <- do.call("rbind", value)
      return(list(list(key = key, value = sum(value))))
    }
    # Submit the job: read from folder "X", write results to folder "Y"
    rhmr(mapper = m, reduce = r, input.folder = "X", output.folder = "Y")

URL: http://ml.stat.purdue.edu/rhipe

There are some downsides to RHIPE, which are described at http://ml.stat.purdue.edu/rhipe/install.html#sec-5

Regards,
Saptarshi Guha

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.