If you use curl, you will need to track every document yourself and recurse into folders, etc. If you use Nutch, it takes care of incremental crawling of the configured locations and submits only the documents that changed since its previous run.
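To give a rough idea of the bookkeeping the curl route implies, here is a minimal sketch in Python. It is only an illustration, not how Nutch does it: the function names (`find_changed`, `extract_url`) and the Solr base URL are made up for the example, and the URL assumes Solr's extracting request handler is mapped at `/update/extract`, as in the stock example configuration.

```python
import hashlib
import os
from urllib.parse import urlencode


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_changed(root, previous):
    """Walk `root` recursively and compare against the previous run.

    `previous` maps path -> digest from the last run (empty on first run).
    Returns (changed_paths, new_state); new and modified files both count
    as changed.
    """
    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = file_digest(path)
            current[path] = digest
            if previous.get(path) != digest:
                changed.append(path)
    return changed, current


def extract_url(solr_base, path):
    """Build the URL a changed file would be posted to.

    Assumption: the extract handler lives at /update/extract and the file
    path is used as the document id.
    """
    params = urlencode({"literal.id": path, "commit": "true"})
    return solr_base.rstrip("/") + "/update/extract?" + params
```

You would save the returned state between runs (for example as JSON) and post each changed file to the URL from `extract_url`, which is what the curl command would do by hand. The sketch does not handle deleted files; those would need a separate delete request to Solr.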
The lack of a simple file system crawler for Solr is a big disadvantage. You can also look at the Aperture and ManifoldCF frameworks to compare with Nutch.

Thanks,
Tirthankar

----- Original Message -----
From: Tolga [mailto:to...@ozses.net]
Sent: Wednesday, May 16, 2012 03:43 AM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>; u...@nutch.apache.org <u...@nutch.apache.org>
Subject: curl or nutch

Hi,

I have been trying for a week. I really want to get a start, so what should I use? curl or nutch? I want to be able to index pdf, xml etc. and search within them as well.

Regards,