Re: what crawler do you use for Solr indexing?

Baalman, Laura A. (ARC-TI)[QSS GROUP INC] Thu, 05 Mar 2009 16:00:37 -0800

We are using Heritrix, the Internet Archive’s open source crawler, which is 
very easy to extend. We have augmented it with a custom parser to crawl some 
specific data formats and written our own processors (Heritrix’s terminology for 
extensions) to link together different data sources and to output XML files 
in the format Solr expects. We have not yet automated the step that feeds 
those XML files into Solr, but we plan to.
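
For anyone unfamiliar with the format involved, here is a rough sketch of what the feed step could look like. It is not our Heritrix code (our processors are Java extensions inside Heritrix); it is a standalone Python illustration that builds a Solr XML <add> message and posts it to the update handler. The field names (id, title, links), the helper names, and the update URL are all assumptions for illustration; the real fields must match your schema.xml, and the URL depends on your Solr version and setup.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> message (bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes repeated <field> elements (multi-valued field).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_bytes, update_url="http://localhost:8983/solr/update"):
    """POST an update message to Solr. The default URL is an assumed local install."""
    req = urllib.request.Request(
        update_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical crawl output; field names must exist in your Solr schema.
    crawled = [
        {
            "id": "http://example.org/a",
            "title": "Page A",
            "links": ["http://example.org/b", "http://example.org/c"],
        }
    ]
    print(build_add_xml(crawled).decode("utf-8"))
    # With a Solr instance running, the documents could then be indexed with:
    #   post_to_solr(build_add_xml(crawled))
    #   post_to_solr(b"<commit/>")
```

Writing the XML to files first, as we do now, and posting them in a separate step keeps the crawl independent of Solr being up; posting directly from the crawler would remove the manual step at the cost of that independence.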


~LB



On 3/5/09 3:32 PM, "Tony Wang" <ivyt...@gmail.com> wrote:

Hi,

I wonder if there's any open source crawler product that could be integrated
with Solr. What crawler do you guys use? Or did you code one yourselves? I
have been trying to find a solution for Nutch/Solr integration, but
haven't had any luck yet.

Could someone shed some light on this?

Thanks!

Tony

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
