I've integrated this into Solr's trunk: https://issues.apache.org/jira/browse/SOLR-1902
-Grant

On May 6, 2010, at 3:40 AM, Sandhya Agarwal wrote:

> Praveen,
>
> You can get the latest code, containing the fix, from here :
>
> http://lucene.apache.org/tika/source-repository.html
>
> Thanks,
> Sandhya
>
> -----Original Message-----
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Wednesday, May 05, 2010 10:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
>
> It reports that Jukka has resolved the issue (Tika-419), and now waiting for
> Grant to verify (Solr-1902). But it seems the resolution will be available
> in 0.8 version of Tika.
>
> If it solves the problem, Is there a way to get it now? Any SVN trunk access
> etc? All i see there is 0.7 src zip to download..
>
> Thanks.
> Praveen
>
> On Tue, May 4, 2010 at 3:59 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> Yes, it is loading the libraries, but they are in a different classloader
>> that apparently the new way Tika loads doesn't have access to.
>>
>> -Grant
>>
>> On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:
>>
>>> Hello,
>>>
>>> But I see that the libraries are being loaded :
>>>
>>> INFO: Adding specified lib dirs to ClassLoader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
>>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
>>> May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to classloader
>>> May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
>>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
>>>
>>> Thanks,
>>> Sandhya
>>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
>>> Sent: Tuesday, May 04, 2010 6:13 AM
>>> Cc: solr-user@lucene.apache.org
>>> Subject: Re: Problem with pdf, upgrading Cell
>>>
>>> Little more info... Seems to be a classloading issue. The tests pass,
>>> but they aren't loading the Tika libraries via the Solr ResourceLoader,
>>> whereas the example is. Marc, one thing to try is to unjar the Solr WAR
>>> file and put the Tika libs in there, as I bet it will then work. Note,
>>> however, I haven't tried this.
>>>
>>> On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:
>>>
>>>> I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track
>>>> this. It is indeed a bug somewhere (still investigating). It seems that
>>>> Tika is now picking an EmptyParser implementation when trying to determine
>>>> which parser to use, despite the fact that it properly identifies the MIME
>>>> Type.
>>>>
>>>> -Grant
>>>>
>>>> On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
>>>>
>>>>> I'm investigating.
>>>>>
>>>>> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
>>>>>
>>>>>> Hi,
>>>>>> Grant, i confirm what Praveen has said, any PDF i try does not work
>>>>>> with the new Tika and SVN versions. :(
>>>>>> Marc
>>>>>>
>>>>>>> From: sagar...@opentext.com
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Date: Mon, 3 May 2010 13:05:24 +0530
>>>>>>> Subject: RE: Problem with pdf, upgrading Cell
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Please let me know if anybody figured out a way out of this issue.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sandhya
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Praveen Agrawal [mailto:pkal...@gmail.com]
>>>>>>> Sent: Friday, April 30, 2010 11:14 PM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Re: Problem with pdf, upgrading Cell
>>>>>>>
>>>>>>> Grant,
>>>>>>> You can try any of the sample pdfs that come in /docs folder of Solr 1.4
>>>>>>> dist'n.
>>>>>>> I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
>>>>>>> metadata, i.e. stream_size and content_type, apart from my own literals
>>>>>>> is indexed, and the content is missing.
>>>>>>>
>>>>>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>
>>>>>>>> Praveen and Marc,
>>>>>>>>
>>>>>>>> Can you share the PDF (feel free to email my private email) that
>>>>>>>> fails in Solr?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Grant
>>>>>>>>
>>>>>>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> Nope, I didn't get it to work... Just like you, the command line
>>>>>>>>> version of Tika extracts the content correctly, but once included
>>>>>>>>> in Solr, no content is extracted.
>>>>>>>>> What I have tried until now is:
>>>>>>>>> - Updating the Tika libraries inside the Solr 1.4 public version,
>>>>>>>>>   no luck there.
>>>>>>>>> - Downloading the latest SVN version, compiling it, and starting
>>>>>>>>>   from a simple schema, still no luck.
>>>>>>>>> - Getting other versions compiled on Hudson (nightly builds) and
>>>>>>>>>   testing them also, still no extraction.
>>>>>>>>> I sent a mail to the developers mailing list but they told me I
>>>>>>>>> should just mail here. I hope some developer reads this, because
>>>>>>>>> it's quite an important feature of Solr and somehow it got broken
>>>>>>>>> between the 1.4 release and the latest version in SVN.
>>>>>>>>> Marc
>>>>>>>> --------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com/
>>>>>>>>
>>>>>>>> Search the Lucene ecosystem using Solr/Lucene:
>>>>>>>> http://www.lucidimagination.com/search
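
On the classloader point raised in this thread (the jars are added, yet Tika apparently cannot see them): a minimal sketch, using only the JDK, of how something added to a child class loader stays invisible to code that looks it up through the parent loader. The class name ClassLoaderVisibility, the resource name demo-parsers.txt and the parser name written into it are all hypothetical. This is not Solr or Tika code and makes no claim about how Tika 0.7 actually locates its parsers; it only illustrates the general visibility rule Grant describes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ClassLoaderVisibility {
    // Hypothetical resource name, standing in for whatever a library looks up at runtime.
    static final String RESOURCE = "demo-parsers.txt";

    /** True if the loader (or one of its parents) can find the named resource. */
    static boolean canSee(ClassLoader loader, String name) {
        return loader.getResource(name) != null;
    }

    /** Returns { childSees, parentSees } for a resource that exists only in a temp "lib" dir. */
    static boolean[] demo() {
        try {
            // Stand-in for a lib directory that is added to a class loader at runtime.
            Path libDir = Files.createTempDirectory("extraction-lib");
            Path file = libDir.resolve(RESOURCE);
            Files.write(file, "some.hypothetical.PdfParser".getBytes(StandardCharsets.UTF_8));

            ClassLoader parent = ClassLoaderVisibility.class.getClassLoader();
            boolean childSees;
            boolean parentSees;
            // The child delegates to the parent first, then searches libDir itself.
            try (URLClassLoader child =
                     new URLClassLoader(new URL[] { libDir.toUri().toURL() }, parent)) {
                childSees = canSee(child, RESOURCE);
                // The parent never searches the child's URLs.
                parentSees = canSee(parent, RESOURCE);
            }
            Files.deleteIfExists(file);
            Files.deleteIfExists(libDir);
            return new boolean[] { childSees, parentSees };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("child loader sees resource:  " + result[0]);
        System.out.println("parent loader sees resource: " + result[1]);
    }
}
```

If a library resolves its plugins through the parent (or some other) loader rather than the one the jars were added to, the log will still show every jar being added, as in Sandhya's output, while the lookup comes back empty.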