@Christoph: Thanks for replying. I will try with more nodes and a larger URL set to see how much improvement in processing time I get from the cluster.
@mapreduce-mailing-community: It would be great if anybody could help me with a Nutch benchmark on a small cluster, since it would help me determine the number of machines I would need for my application to scale up.

Regards:
Ashish Vyas

On Fri, Mar 30, 2012 at 2:16 PM, Christoph Schmitz <[email protected]> wrote:

> Hi Ashish,
>
> IMHO your numbers (2 machines, 10 URLs) are way too small to outweigh the
> natural overhead that occurs with a distributed computation (distributing
> the program code, coordinating the distributed file system, making sure
> everybody is starting and stopping, etc.). Also, if you're web crawling,
> the bottleneck might not even be the processing capacity of your machines,
> but rather some network component on the way between you and the web.
>
> I'm not aware of any Hadoop or Nutch benchmarks, but once you use larger
> data and/or CPU-intensive computations, you should actually see a more or
> less linear increase in throughput with more machines.
>
> Regards,
> Christoph
>
> -----Original Message-----
> From: ashish vyas [mailto:[email protected]]
> Sent: Friday, March 30, 2012 10:30
> To: [email protected]
> Subject: Performance improvement-Cluster vs Pseudo
>
> Hi,
>
> I have set up a Hadoop cluster (2 nodes) and I am running a Nutch
> crawl on it. I am trying to compare results and the improvement in processing
> time when I crawl with 10 URLs and depth 2. When I run the crawl on the
> cluster, it takes more time than on the pseudo-distributed cluster, which in
> turn takes more time than a standalone Nutch crawl.
>
> I would have expected the processing time to come down when running Nutch
> on a Hadoop cluster, since that is why Hadoop evolved out of the Nutch
> project. Please let me know if there is any benchmark test for pseudo vs
> cluster, and why the Nutch crawl is taking more time on the cluster.
>
> Please let me know if you need more info.
>
> Regards:
> Ashish Vyas
