If you use curl, you will need to keep track of every document yourself, recurse
into folders, etc.
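
To make the curl route concrete, here is a minimal Python sketch of the bookkeeping it leaves to you: walking a folder tree and building one curl upload per PDF/XML file. It assumes a Solr instance at http://localhost:8983/solr with the ExtractingRequestHandler (Solr Cell) mapped at /update/extract. SOLR_EXTRACT_URL, EXTENSIONS, find_documents and curl_command are hypothetical names chosen for illustration.

```python
import os
from urllib.parse import urlencode

# Assumption: a local Solr with the ExtractingRequestHandler (Solr Cell)
# mapped at /update/extract; adjust for your own installation.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"
# Hypothetical choice of file types to index (compared case-insensitively).
EXTENSIONS = {".pdf", ".xml"}


def find_documents(root):
    """Recursively walk root and return paths of files with a wanted extension."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subfolders in a stable order
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return found


def curl_command(path):
    """Build a curl argument list that uploads one file to Solr Cell.

    The file path doubles as the document id (literal.id), so re-sending the
    same file replaces the earlier copy instead of creating a duplicate.
    """
    query = urlencode({"literal.id": path, "commit": "true"})
    return ["curl", f"{SOLR_EXTRACT_URL}?{query}", "-F", f"myfile=@{path}"]
```

This only builds the commands; to actually send them you could call subprocess.run(curl_command(doc), check=True) for each doc in find_documents("/path/to/docs"). Committing on every file keeps the sketch simple but is slow for large trees; committing once at the end is a common alternative.
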
If you use Nutch, it takes care of incremental crawling of the configured
locations and submits only the documents that have changed since its previous run.
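
What Nutch saves you from here is roughly this: remembering what was sent last time and picking out only the new or modified files. The sketch below is a hand-rolled stand-in that compares file modification times against a JSON state file; it is not how Nutch tracks changes internally, and load_state, find_changed, save_state and the state file are hypothetical.

```python
import json
import os


def load_state(state_file):
    """Return {path: mtime} recorded by the previous run, or {} on the first run."""
    try:
        with open(state_file) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def find_changed(paths, previous):
    """Return (changed, current): paths that are new or whose mtime differs from
    the previous run, plus the current {path: mtime} map to save afterwards."""
    current = {p: os.path.getmtime(p) for p in paths}
    changed = [p for p in paths if previous.get(p) != current[p]]
    return changed, current


def save_state(state_file, current):
    """Record the modification times seen in this run."""
    with open(state_file, "w") as f:
        json.dump(current, f)
```

Saving the state only after the changed files have been posted successfully avoids losing updates when a run fails part-way. The sketch also does not notice deleted files, which would still need to be removed from the index.
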

The lack of a simple file system crawler for Solr is a big disadvantage. You
can also look at the Aperture and ManifoldCF frameworks and compare them with Nutch.
Thanks,
Tirthankar

----- Original Message -----
From: Tolga [mailto:to...@ozses.net]
Sent: Wednesday, May 16, 2012 03:43 AM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>; 
u...@nutch.apache.org <u...@nutch.apache.org>
Subject: curl or nutch

Hi,

I have been trying for a week. I really want to get started, so what
should I use: curl or nutch? I want to be able to index PDF, XML, etc.
files and search within them as well.

Regards,
