Can you share the PDF it is failing on? FWIW, text is notoriously hard to
extract from PDFs. They come in all shapes and flavors, and I've seen many a
commercial extractor fail on them too. Have you tried running either Tika
standalone or PDFBox standalone on the file? Does it work there?
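A standalone Tika run can be sketched roughly as follows. This is a minimal sketch, not a tested recipe. It assumes the Tika jars (tika-core plus the parser modules, which pull in PDFBox) are on the classpath, and it uses the `org.apache.tika.Tika` facade's `detect` and `parseToString` methods. The class name `TikaStandaloneCheck` and the helper `extract` are hypothetical. If this also produces no text, the file itself is the likely problem. If it does produce text, the problem more likely lies in how Solr hands the file to Tika.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical standalone check: run Tika on a file directly, outside Solr.
// Assumes tika-core and the Tika parser modules (which bundle PDFBox) are on the classpath.
public class TikaStandaloneCheck {

    // Extract plain text from any stream Tika can auto-detect (closes the stream).
    static String extract(InputStream in) throws IOException, TikaException {
        return new Tika().parseToString(in);
    }

    public static void main(String[] args) throws Exception {
        File pdf = new File(args[0]);
        Tika tika = new Tika();
        // Print the MIME type Tika detects; anything other than application/pdf is worth a closer look.
        System.out.println("Detected type: " + tika.detect(pdf));
        String text = tika.parseToString(pdf);
        // Empty output suggests Tika/PDFBox cannot extract this file even outside Solr.
        System.out.println(text.trim().isEmpty() ? "(no text extracted)" : text);
    }
}
```

Run it as `java TikaStandaloneCheck failing.pdf`, with the Tika jars on the classpath.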

On Apr 26, 2010, at 8:35 AM, Marc Ghorayeb wrote:

> 
> Okay, I've been digging a little through the Java code from SVN, and it
> seems the load function inside the ExtractingDocumentLoader class does not
> receive the ContentStream (it is set to null...). Maybe I should send this
> to the developer mailing list?
> Marc
> 
>> From: dekay...@hotmail.com
>> To: solr-user@lucene.apache.org
>> Subject: RE: Problem with pdf, upgrading Cell
>> Date: Fri, 23 Apr 2010 16:03:28 +0200
>> 
>> 
>> Seems like I'm not the only one with this "no extraction" problem:
>> http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
>> Apparently he tried the same thing, building from the trunk and indexing a
>> PDF, and no extraction occurred... Strange.
>> Marc G.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
