I have a similar requirement: I need to move to a good crawler. Currently I use my own custom crawler to crawl some public domains, and I use Lucene to index all the downloaded pages. After a lot of research I came across JSpider with Lucene. I also looked at Nutch for the crawling job, but I don't think that is feasible. - BR
"A. Banji Oyebisi" <[EMAIL PROTECTED]> wrote: I am interested in this too. any ideas? A. Banji Oyebisi Choicegen, LLC. Email: [EMAIL PROTECTED] Web URL: http://www.choicegen.com Choicegen... Helping you make better choices! Notice: This email message, together with any attachments, may contain information of Choicegen, LLC., its subsidiaries and affiliated entities, that may be confidential, proprietary, copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named in this message. If you are not the intended recipient, and have received this message in error, please immediately return this by email and then delete it. George Everitt wrote: I'm looking for a web crawler to use with Solr. The objective is to crawl about a dozen public web sites regarding a specific topic. After a lot of googling, I came across Heritrix, which seems to be the most robust well supported open source crawler out there. Heritrix has an integration with Nutch (NutchWax), but not with Solr. I'm wondering if anybody can share any experience using Heritrix with Solr. It seems that there are three options for integration: 1. Write a custom Heritrix "Writer" class which submits documents to Solr for indexing. 2. Write an ARC to Sol input XML format converter to import the ARC files. 3. Use the filesystem mirror writer and then another program to walk the downloaded files. Has anybody looked into this or have any suggestions on an alternative approach? The optimal answer would be "You dummy, just use XXX to crawl your web sites - there's no 'integration' required at all. Can you believe the temerity? What a poltroon." Yours in Revolution, George --------------------------------- Be a better sports nut! Let your teams follow you with Yahoo Mobile. Try it now.