On May 11, 2007, at 7:32 AM, David Xiao wrote:
Hello,
I am using a crawler to index and search some intranet web pages that
require authorization. I wrote my own crawler for this need.
But as the requirements evolve, I also need a crawler for
external web pages (on the Internet), so I am looking for a generic
crawler that can integrate with Solr.
The crawler should be easy to configure and able to customize its XML
output according to schema.xml.
Nutch with the SolrIndexer and the solrj client is wonderful for this.
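
For reference, here is a rough sketch of the kind of XML a crawler would need to emit for Solr, in case a custom crawler stays in the picture. It is not Nutch or solrj code; it only uses the Python standard library to build a Solr XML <add> message. The helper name build_add_message, the example URL, and the field names (id, title, text) are assumptions for illustration; the real field names have to match whatever schema.xml declares.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> update message from a list of field dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # A list or tuple is written as repeated <field> elements,
            # which is how multi-valued fields are expressed.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical crawled page; field names are assumed, not taken from a real schema.xml.
page = {
    "id": "http://intranet.example/page1",
    "title": "Example page",
    "text": "Body text extracted by the crawler",
}
xml_message = build_add_message([page])
```

The resulting string would then be POSTed to Solr's update handler, followed by a commit. Building the message from a plain dict keeps the mapping from crawler output to schema.xml fields in one place.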