On Fri, Apr 8, 2011 at 6:23 AM, Jens Mueller <supidupi...@googlemail.com> wrote:
> Hello all,
>
> thanks for your generous help.
>
> I think I now know everything: (What I want to do is to build a web crawler
> and index the documents found). I will start with the setup as suggested by
>

Writing a web crawler from scratch is... ambitious. Have you looked at Nutch (http://nutch.apache.org/)? It uses Solr for indexing, so it may help you get a head start.

Nutch runs on Hadoop, so if you've never used Hadoop before it may take some getting used to. That said, I have helped a customer implement it and brought a couple of their developers (medium seniority) up to speed, and it didn't take them long to get comfortable with it.

Andrea