It can, as can ManifoldCF. But you should ask on the nutch-user list (this may
also be documented on the wiki).

Otis 
----
Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>________________________________
> From: Tolga <to...@ozses.net>
>To: solr-user@lucene.apache.org 
>Sent: Wednesday, May 16, 2012 8:11 AM
>Subject: Re: curl or nutch
> 
>Can Nutch crawl/index files as well?
>
>On 5/16/12 12:29 PM, findbestopensource wrote:
>> You could very well use Solr. It supports indexing PDF and XML
>> files. If you want to crawl websites and search them using page rank,
>> choose Nutch.
>>
>> Regards
>> Aditya
>> www.findbestopensource.com
>>
>>
>> On Wed, May 16, 2012 at 1:13 PM, Tolga<to...@ozses.net>  wrote:
>>
>>> Hi,
>>>
>>> I have been trying for a week. I really want to get started, so what
>>> should I use: curl or Nutch? I want to be able to index PDF, XML, etc. and
>>> search within them as well.
>>>
>>> Regards,
>>>
>
>
>

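For the "curl" option Tolga asks about, here is a minimal sketch of posting a single PDF straight to Solr's extracting request handler (Solr Cell) over HTTP, written in Python. It assumes a Solr instance at http://localhost:8983/solr with the /update/extract handler enabled, as in the stock example configuration; the helper name build_extract_request, the document id, and the default URL are illustrative assumptions. The helper only builds the request and does not send it.

```python
import urllib.parse
import urllib.request


def build_extract_request(body, doc_id, content_type,
                          solr_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST to Solr's extracting request handler.

    Hypothetical helper: solr_url and the /update/extract path are
    assumptions based on the stock Solr example configuration.
    """
    params = urllib.parse.urlencode({
        "literal.id": doc_id,  # value stored in the schema's unique key field
        "commit": "true",      # commit so the document is searchable right away
    })
    url = f"{solr_url}/update/extract?{params}"
    # The raw file bytes are the request body; Content-Type tells Solr
    # (via Tika) what kind of document it is parsing.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending it is then a matter of calling `urllib.request.urlopen(req)` against a running Solr. This is roughly what a curl call with `--data-binary @report.pdf -H 'Content-type:application/pdf'` against the same URL does. Crawling whole sites is where Nutch (or ManifoldCF) comes in instead.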