We are using Heritrix, the Internet Archive’s open source crawler, which is 
very easy to extend. We have augmented it with a custom parser to crawl some 
specific data formats, and we wrote our own processors (Heritrix’s terminology 
for extensions) to link different data sources together and to output XML 
files in the format Solr expects. We have not yet automated feeding those XML 
files into Solr, but we plan to.
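
For illustration only, here is a minimal sketch of what that missing feed step could look like. It is not the processor code described above: it assumes Solr's standard XML update message (<add><doc><field name="...">) and an update handler at http://localhost:8983/solr/update, and the field names (id, title, source) are hypothetical placeholders that would need to match the actual Solr schema.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field: value} dicts into a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes one <field> per item (multi-valued field).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                field.text = str(v)
    # ElementTree escapes &, < and > in the field text for us.
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, update_url):
    """POST an XML update message to Solr and return the HTTP status code."""
    req = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Assumed URL and hypothetical field names; adjust to the real setup.
    url = "http://localhost:8983/solr/update"
    crawled = [{"id": "doc-1", "title": "Example page", "source": ["a", "b"]}]
    post_to_solr(build_add_xml(crawled), url)
    post_to_solr("<commit/>", url)  # make the new documents searchable
```

Building the message with an XML library rather than string concatenation avoids malformed updates when crawled text contains characters such as & or <.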

~LB



On 3/5/09 3:32 PM, "Tony Wang" <ivyt...@gmail.com> wrote:

Hi,

I wonder if there's any open source crawler product that could be integrated
with Solr. What crawler do you guys use? Or did you code one yourselves? I
have been trying to find solutions for Nutch/Solr integration, but
haven't had any luck yet.

Could someone shed some light on this?

thanks!

Tony

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
