On Dec 26, 2011, at 10:30 PM, Kevin Burton wrote:

> One key point I wanted to mention for Hadoop developers (but then check out
> the announcement).
>
> I implemented a version of sysstat (iostat, vmstat, etc) in Peregrine and
> would be more than happy to move it out and put it in another dedicated
> project.
>
> http://peregrine_mapreduce.bitbucket.org/xref/peregrine/sysstat/package-summary.html
>
> I run this before and after major MR phases which makes it very easy to
> understand the system throughput/performance for that iteration.
>

Thanks for sharing. I'd love to play with it. Do you have a README/user guide for sysstat?

> ...
>
> I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework optimized
> for iterative and pipelined map reduce jobs.
>
> http://peregrine_mapreduce.bitbucket.org/
>

Sounds interesting. I briefly skimmed through the site. A couple of questions:

# How does Peregrine deal with the case where you might not have resources available to start reduces while the maps are running? Is the map output buffered to disk before the reduces start?
# How does Peregrine deal with failure of in-flight reduces (potentially after they have received X% of the maps' outputs)?
# How much does Peregrine depend on PFS?

One idea worth exploring might be to run Peregrine within YARN (MR2) as an application. Would you be interested in trying that?

Thanks again for sharing.

Arun
