I did try the standalone version of Tika 0.7, and it extracted the PDF content
successfully. Then I replaced the Tika-related jars in contrib/extraction/lib of
the Solr 1.4 distribution with their newer versions, and now it doesn't extract
content from ANY PDF.
Earlier (with 0.4) it was throwing an exception for a few PDFs, but now there is
neither content nor an exception.
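
In case it helps anyone checking the same setup: one thing that can go wrong after swapping jars by hand is that an old and a new version of the same library both end up in the lib directory. I have not confirmed that this is the cause here. The following is only a minimal sketch, assuming jars follow the usual name-version.jar naming; the function name find_duplicate_jars is made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Heuristic for the common "name-version.jar" convention (an assumption,
# not a full parser): "tika-core-0.7.jar" -> ("tika-core", "0.7").
JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_jars(lib_dir):
    """Return {artifact: sorted versions} for artifacts present in more than one version."""
    versions = defaultdict(set)
    for jar in Path(lib_dir).glob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:  # jars without a recognizable version suffix are skipped
            versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# Example call (path taken from the message above):
#   find_duplicate_jars("contrib/extraction/lib")
```

An empty result only means no artifact appears twice under this naming pattern; it does not rule out a version mismatch between different libraries.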


On Fri, Apr 30, 2010 at 4:14 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> Can you share the PDF it is failing on?  FWIW, PDFs are notoriously hard to
> extract.  They come in all shapes and flavors and I've seen many a
> commercial extractor fail on them too.  Have you tried using either Tika
> standalone or PDFBox standalone?  Does the file work there?
>
> On Apr 26, 2010, at 8:35 AM, Marc Ghorayeb wrote:
>
> >
> > Okay, I've been digging a little through the Java code from SVN, and it
> > seems the load function inside the ExtractingDocumentLoader class does not
> > receive the ContentStream (it is set to null...). Maybe I should send this
> > to the developer mailing list?
> > Marc
> >
> >> From: dekay...@hotmail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: RE: Problem with pdf, upgrading Cell
> >> Date: Fri, 23 Apr 2010 16:03:28 +0200
> >>
> >>
> >> Seems like I'm not the only one with this "no extraction" problem:
> >> http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
> >> Apparently he tried the same thing, building from the trunk and indexing a
> >> PDF, and no extraction occurred... Strange.
> >> Marc G.
> >>
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
