Nutch can, as can ManifoldCF. But you should ask on the nutch-user list (this may also be documented on the Wiki).

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

>________________________________
> From: Tolga <to...@ozses.net>
>To: solr-user@lucene.apache.org
>Sent: Wednesday, May 16, 2012 8:11 AM
>Subject: Re: curl or nutch
>
>Can nutch crawl/index files as well?
>
>On 5/16/12 12:29 PM, findbestopensource wrote:
>> You could very well use Solr. It has support to index the PDF and XML
>> files. If you want to index websites and search using page rank then choose
>> Nutch.
>>
>> Regards
>> Aditya
>> www.findbestopensource.com
>>
>>
>> On Wed, May 16, 2012 at 1:13 PM, Tolga<to...@ozses.net> wrote:
>>
>>> Hi,
>>>
>>> I have been trying for a week. I really want to get a start, so what
>>> should I use? curl or nutch? I want to be able to index pdf, xml etc. and
>>> search within them as well.
>>>
>>> Regards,
>>>