You could very well use Solr. It supports indexing PDF and XML files. If you want to index websites and search using page rank, then choose Nutch.
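
To give you a starting point, here is a minimal sketch of sending a PDF to Solr from Python. It assumes Solr's extracting request handler (Solr Cell) is enabled at /update/extract, which depends on your solrconfig.xml. The base URL, the document id and the helper names extract_url and index_pdf are only illustrative, so adjust them to your setup.

```python
import urllib.parse
import urllib.request


def extract_url(solr_base, doc_id, commit=True):
    """Build the URL for Solr's extracting request handler.

    solr_base is an assumption, e.g. "http://localhost:8983/solr";
    newer Solr versions also need the core name in the path.
    """
    params = {"literal.id": doc_id}  # sets the "id" field of the document
    if commit:
        params["commit"] = "true"    # make the document searchable right away
    return solr_base.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def index_pdf(solr_base, doc_id, pdf_path):
    """POST the raw PDF bytes to Solr and return the HTTP status code."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (needs a running Solr, so it is left commented out):
# index_pdf("http://localhost:8983/solr", "doc1", "manual.pdf")
```

XML files in Solr's own update format (<add><doc>...</doc></add>) can be posted to /update in a similar way with the content type text/xml.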

Regards,
Aditya
www.findbestopensource.com

On Wed, May 16, 2012 at 1:13 PM, Tolga <to...@ozses.net> wrote:
> Hi,
>
> I have been trying for a week. I really want to get a start, so what
> should I use? curl or nutch? I want to be able to index pdf, xml etc. and
> search within them as well.
>
> Regards,
>