Nutch is also a great option if you want a crawler. I have found that you will need the latest version of PDFBox and its dependencies for better results. Also, make sure to set JAVA_OPT so that the JVM gets a large maximum heap (the -Xmx setting); otherwise big documents can exhaust the heap during parsing.
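If you would rather skip the crawler and push the files to Solr yourself, something along these lines may work as a starting point. It is only a rough sketch, not a tested setup: it assumes Solr is running at http://localhost:8983/solr with the ExtractingRequestHandler mapped at /update/extract, and that your schema's unique key field is `id`. The function names (`find_documents`, `build_extract_url`, `post_document`, `index_directory`) are made up for illustration.

```python
import mimetypes
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: adjust host, port and handler path to match your own Solr install.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"

# File types to pick up from the directory.
EXTENSIONS = {".pdf", ".doc"}


def find_documents(root):
    """Return the PDF/DOC files under root (searched recursively), sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in EXTENSIONS
    )


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL, commit=False):
    """Build the request URL; literal.id supplies the value of the id field."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_document(path, base_url=SOLR_EXTRACT_URL, commit=False):
    """POST one file's raw bytes to Solr and return the HTTP status code."""
    content_type = mimetypes.guess_type(str(path))[0] or "application/octet-stream"
    request = urllib.request.Request(
        build_extract_url(str(path), base_url, commit),
        data=Path(path).read_bytes(),  # a request with a body is sent as POST
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_directory(root, base_url=SOLR_EXTRACT_URL):
    """Post every matching file, asking Solr to commit only on the last one."""
    paths = find_documents(root)
    for position, path in enumerate(paths, start=1):
        post_document(path, base_url, commit=(position == len(paths)))
    return len(paths)
```

Calling `index_directory("/path/to/your/docs")` (a placeholder path) would then send each file in turn. Using the file path as the id is just one convenient choice; any value that is unique per document would do.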
Adam

On Fri, Dec 10, 2010 at 6:27 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:

> Hi Pankaj,
> you can find the needed documentation right here [1].
> Hope this helps,
> Tommaso
>
> [1] : http://wiki.apache.org/solr/ExtractingRequestHandler
>
> 2010/12/10 pankaj bhatt <panbh...@gmail.com>
>
> > Hi All,
> > I am a newbie to SOLR and trying to integrate TIKA + SOLR.
> > Can anyone please guide me, how to achieve this.
> >
> > * My Req is:* I have a directory containing a lot of PDF,DOC's and i need to
> > make a search within the documents. I am using SOLR web application.
> >
> > I just need some sample xml code both for solr-config.xml and the
> > directory-schema.xml
> > Awaiting eagerly for your response.
> >
> > Regards,
> > Pankaj Bhatt.