Re: indexing/crawling HTML + solr

Otis Gospodnetic Wed, 03 Jun 2009 04:24:27 -0700

Gena,

Besides Droids (simpler, smaller components you can put together), there is 
also Nutch, a bigger beast for large-scale crawling that can index crawled 
pages into Solr - http://lucene.apache.org/nutch .
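
Whichever crawler you pick, the last step is the same: turn each fetched page 
into a Solr document and send it to the update handler. A minimal sketch of 
that step in Python follows; it is not how Droids or Nutch do it internally. 
The helper names (PageText, solr_add_xml) and the field names "id", "title" 
and "content" are assumptions for illustration, and the fields have to match 
whatever your schema.xml defines.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class PageText(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.chunks.append(data.strip())


def solr_add_xml(url, html):
    """Build a Solr <add> update message for one crawled page."""
    parser = PageText()
    parser.feed(html)
    parser.close()

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Hypothetical field names; adjust them to your own schema.xml.
    for name, value in (("id", url),
                        ("title", parser.title.strip()),
                        ("content", " ".join(parser.chunks))):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")
```

The returned string is what you would POST to Solr's XML update handler 
(typically /solr/update, with Content-Type text/xml), followed by a commit. 
The fetching, link extraction and politeness rules are the parts a real 
crawler such as Nutch or Droids takes care of for you.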


Otis


----- Original Message ----
> From: Gena Batsyan <gbat...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 3, 2009 6:09:36 AM
> Subject: indexing/crawling HTML + solr
> 
> Hi!
> 
> to be short, where to start with the subject?
> 
> Any pointers to some [semi-]functional solutions that crawl the web as a 
> normal crawler, take care about html parsing, etc, and feed the crawled 
> stuff as solr-documents per   ?
> 
> regards!
