Pankaj,
Check this article out on how to get going with Nutch.
http://bit.ly/dbBdK4
It is a few months old, so note that there is now a new parameter,
called something like -SolrUrl, that lets you update your Solr index
with the crawled data.
For crawling your local file syste
Nutch is also a great option if you want a crawler. I have found that you
will need to use the latest version of PDFBox and its dependencies for
better results. Also, make sure to set JAVA_OPT to something really large so
that you don't run out of heap space.
Adam
On Fri, Dec 10, 2010 at 6:27
Hi Pankaj,
you can find the needed documentation right here [1].
Hope this helps,
Tommaso
[1] : http://wiki.apache.org/solr/ExtractingRequestHandler
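For a quick feel of what the page in [1] describes, here is a minimal sketch of sending one file to the ExtractingRequestHandler so that Tika extracts the text and Solr indexes it. It assumes the handler is mapped at /update/extract (as in the example solrconfig.xml) and that Solr runs at http://localhost:8983/solr. The function names, the document id and the file name are made up for illustration; only the literal.id and commit request parameters come from the handler itself.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the /update/extract URL for one document (hypothetical helper)."""
    # literal.id sets the unique key of the document being indexed.
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit right away so the document becomes searchable.
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def index_file(solr_base, path, doc_id, content_type="application/pdf"):
    """POST the raw file bytes; Solr hands them to Tika for text extraction."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_url(solr_base, doc_id),
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": content_type},
    )
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Assumed local Solr instance and sample file; adjust both to your setup.
    print(index_file("http://localhost:8983/solr", "sample.pdf", "doc1"))
```

Calling build_extract_url on its own is a handy way to check the request URL before you actually send any file to Solr.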
2010/12/10 pankaj bhatt
> Hi All,
> I am a newbie to SOLR and trying to integrate TIKA + SOLR.
> Can anyone please guide me, how to achieve