Re: A new map reduce framework for iterative/pipelined jobs.

Arun C Murthy Tue, 27 Dec 2011 00:07:21 -0800

On Dec 26, 2011, at 10:30 PM, Kevin Burton wrote:

> One key point I wanted to mention for Hadoop developers (but then check out 
> the announcement).
> 
> I implemented a version of sysstat (iostat, vmstat, etc) in Peregrine and 
> would be more than happy to move it out and put it in another dedicated 
> project.
> 
> http://peregrine_mapreduce.bitbucket.org/xref/peregrine/sysstat/package-summary.html
> 
> I run this before and after major MR phases which makes it very easy to 
> understand the system throughput/performance for that iteration.
>


Thanks for sharing. I'd love to play with it. Do you have a README or user 
guide for sysstat?

> ...
> 
> I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework optimized
> for iterative and pipelined map reduce jobs.
> 
> http://peregrine_mapreduce.bitbucket.org/
> 

Sounds interesting. I briefly skimmed through the site.

Couple of questions: 
# How does Peregrine handle the case where there aren't enough resources 
available to start reduces while the maps are running? Is the map output 
buffered to disk before the reduces start?
# How does Peregrine deal with failure of in-flight reduces (potentially after 
they have received X% of the maps' outputs)?
# How much does Peregrine depend on PFS? One idea worth exploring might be to 
run Peregrine within YARN (MR2) as an application. Would you be interested in 
trying that?

Thanks again for sharing.

Arun
