Gena,

Besides Droids (simpler, smaller components you can put together), there is also Nutch, a bigger beast for large-scale crawling that indexes crawled pages into Solr: http://lucene.apache.org/nutch
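If you end up gluing your own pieces together instead, the "feed the crawled stuff as Solr documents" step can be fairly small. Below is a minimal sketch, not how Droids or Nutch do it. It assumes a Solr instance that accepts XML update messages at `http://localhost:8983/solr/update`, and a schema.xml that defines `id`, `title` and `content` fields. The URL and field names are assumptions, so adjust them to your setup. It pulls the title and visible text out of a fetched page and wraps them in an `<add><doc>` message.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET
import urllib.request


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    SKIP = {"script", "style"}  # tags whose contents are not visible text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_solr_add(url, html):
    """Builds a Solr XML <add> message for one crawled page.

    The field names id/title/content are assumptions: they must exist
    in your schema.xml. The page URL is used as the unique id.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", url),
                        ("title", parser.title.strip()),
                        ("content", " ".join(parser.chunks))):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_message, update_url="http://localhost:8983/solr/update"):
    """POSTs an XML update message; the default URL is an assumed single-core setup."""
    req = urllib.request.Request(
        update_url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Documents only become searchable after a commit, so after a batch of `post_to_solr(html_to_solr_add(url, html))` calls you would also send `post_to_solr("<commit/>")`. The crawling part (fetching, politeness, link extraction, deduplication) is exactly what Nutch already handles for you at scale.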
Otis

----- Original Message ----
> From: Gena Batsyan <gbat...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 3, 2009 6:09:36 AM
> Subject: indexing/crawling HTML + solr
>
> Hi!
>
> to be short, where to start with the subject?
>
> Any pointers to some [semi-]functional solutions that crawl the web as a
> normal crawler, take care about html parsing, etc, and feed the crawled stuff as
> solr-documents per ?
>
> regards!