Nutch is also a great option if you want a crawler. I have found that you
will need to use the latest version of PDFBox and its dependencies for
better results. Also, make sure to set JAVA_OPT to a large maximum heap
(for example via -Xmx) so that you don't run out of heap space.
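
If you would rather skip the crawler and post the files straight to the
ExtractingRequestHandler that Tommaso links to below, a minimal Python sketch
could look like the following. It is only an illustration under some
assumptions: the base URL is a guess at a default local Solr install, the
handler is assumed to be mapped at /update/extract, the file path is used as
the unique id, and the helper names (find_documents, extract_url,
post_document) are made up for this example.

```python
import os
import urllib.request
from urllib.parse import urlencode

# Assumption: Solr runs locally and the handler is mapped at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def find_documents(root, extensions=(".pdf", ".doc")):
    """Yield paths under root whose extension matches (case-insensitive)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)


def extract_url(path, base_url=SOLR_EXTRACT_URL):
    """Build the request URL; literal.id supplies the document's unique key."""
    params = urlencode({"literal.id": path, "commit": "true"})
    return base_url + "?" + params


def post_document(path, base_url=SOLR_EXTRACT_URL):
    """POST the raw file bytes to Solr and return the HTTP status code."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        extract_url(path, base_url),
        data=body,  # supplying data makes this a POST request
        # Assumption: a generic content type is enough for Tika to detect the format.
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

To index a whole directory you would loop over find_documents("/path/to/docs")
and call post_document on each path. Committing on every request is simple
but slow; for a large directory it is probably better to drop commit=true and
commit once at the end.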

Adam

On Fri, Dec 10, 2010 at 6:27 AM, Tommaso Teofili
<tommaso.teof...@gmail.com> wrote:

> Hi Pankaj,
> you can find the needed documentation right here [1].
> Hope this helps,
> Tommaso
>
> [1] : http://wiki.apache.org/solr/ExtractingRequestHandler
>
> 2010/12/10 pankaj bhatt <panbh...@gmail.com>
>
> > Hi All,
> >      I am a newbie to SOLR and trying to integrate TIKA + SOLR.
> >  Can anyone please guide me, how to achieve this.
> >
> > *My requirement is:* I have a directory containing a lot of PDF and DOC
> > files, and I need to search within the documents. I am using the SOLR web
> > application.
> >
> >           I just need some sample XML code for both solr-config.xml and
> > the directory-schema.xml.
> >        Eagerly awaiting your response.
> >
> > Regards,
> > Pankaj Bhatt.
> >
>
