It reports that Jukka has resolved the issue (TIKA-419), and that it is now waiting for Grant to verify (SOLR-1902). But it seems the fix will only be available in version 0.8 of Tika.
If it solves the problem, is there a way to get it now? Is there SVN trunk access, for example? All I see there is the 0.7 source zip to download.

Thanks.
Praveen

On Tue, May 4, 2010 at 3:59 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> Yes, it is loading the libraries, but they are in a different classloader that apparently the new way Tika loads doesn't have access to.
>
> -Grant
>
> On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:
>
>> Hello,
>>
>> But I see that the libraries are being loaded:
>>
>> INFO: Adding specified lib dirs to ClassLoader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/fontbox-1.1.0.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/jempbox-1.1.0.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/metadata-extractor-2.4.0-beta-1.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/pdfbox-1.1.0.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-3.6.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-3.6.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-ooxml-schemas-3.6.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/poi-scratchpad-3.6.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tagsoup-1.2.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-core-0.7.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
>> May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
>> May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to classloader
>> May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
>> May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
>>
>> Thanks,
>> Sandhya
>>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
>> Sent: Tuesday, May 04, 2010 6:13 AM
>> Cc: solr-user@lucene.apache.org
>> Subject: Re: Problem with pdf, upgrading Cell
>>
>> Little more info... Seems to be a classloading issue. The tests pass, but they aren't loading the Tika libraries via the Solr ResourceLoader, whereas the example is. Marc, one thing to try is to unjar the Solr WAR file and put the Tika libs in there, as I bet it will then work. Note, however, I haven't tried this.
>>
>> On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:
>>
>>> I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this. It is indeed a bug somewhere (still investigating). It seems that Tika is now picking an EmptyParser implementation when trying to determine which parser to use, despite the fact that it properly identifies the MIME type.
>>>
>>> -Grant
>>>
>>> On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
>>>
>>>> I'm investigating.
>>>>
>>>> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
>>>>
>>>>> Hi,
>>>>> Grant, I confirm what Praveen has said: any PDF I try does not work with the new Tika and SVN versions. :(
>>>>> Marc
>>>>>
>>>>>> From: sagar...@opentext.com
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Date: Mon, 3 May 2010 13:05:24 +0530
>>>>>> Subject: RE: Problem with pdf, upgrading Cell
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Please let me know if anybody figured out a way out of this issue.
>>>>>>
>>>>>> Thanks,
>>>>>> Sandhya
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Praveen Agrawal [mailto:pkal...@gmail.com]
>>>>>> Sent: Friday, April 30, 2010 11:14 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Problem with pdf, upgrading Cell
>>>>>>
>>>>>> Grant,
>>>>>> You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4 distribution. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only metadata (i.e. stream_size and content_type), apart from my own literals, is indexed, and the content is missing.
>>>>>>
>>>>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>
>>>>>>> Praveen and Marc,
>>>>>>>
>>>>>>> Can you share the PDF (feel free to email my private email) that fails in Solr?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Grant
>>>>>>>
>>>>>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>> Nope, I didn't get it to work... Just like you, the command-line version of Tika extracts the content correctly, but once included in Solr, no content is extracted.
>>>>>>>> What I have tried until now is:
>>>>>>>> - Updating the Tika libraries inside the Solr 1.4 public version, no luck there.
>>>>>>>> - Downloading the latest SVN version, compiling it, and starting from a simple schema, still no luck.
>>>>>>>> - Getting other versions compiled on Hudson (nightly builds), and testing them also, still no extraction.
>>>>>>>> I sent a mail to the developers mailing list but they told me I should just mail here. I hope some developer reads this, because it's quite an important feature of Solr and somehow it got broken between the 1.4 release and the latest version in SVN.
>>>>>>>> Marc
>>>>>>>
>>>>>>> --------------------------
>>>>>>> Grant Ingersoll
>>>>>>> http://www.lucidimagination.com/
>>>>>>>
>>>>>>> Search the Lucene ecosystem using Solr/Lucene:
>>>>>>> http://www.lucidimagination.com/search
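
Grant's point in the quoted thread, that the jars are loaded but live in a classloader the calling code cannot see, is general Java behaviour and can be reproduced without Solr or Tika. The following is only a minimal sketch under that assumption: the class name `ClassLoaderVisibility` and the resource `demo-parser.txt` are hypothetical, and nothing here is taken from the actual Solr or Tika code. It puts a resource on the path of a child `URLClassLoader` only, then looks it up through both the parent and the child.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Solr or Tika.
class ClassLoaderVisibility {

    /**
     * Returns {visibleFromParent, visibleFromChild} for a resource that is
     * only on the child loader's search path.
     */
    static boolean[] check() throws IOException {
        // Stand-in for a lib directory added at runtime (hypothetical content).
        Path dir = Files.createTempDirectory("extra-libs");
        Files.write(dir.resolve("demo-parser.txt"),
                "hypothetical parser registration".getBytes(StandardCharsets.UTF_8));

        ClassLoader parent = ClassLoader.getSystemClassLoader();
        // The child delegates to the parent first, then searches its own URLs.
        try (URLClassLoader child =
                     new URLClassLoader(new URL[] { dir.toUri().toURL() }, parent)) {
            return new boolean[] {
                parent.getResource("demo-parser.txt") != null,
                child.getResource("demo-parser.txt") != null
            };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = check();
        System.out.println("visible from parent loader: " + result[0]);
        System.out.println("visible from child loader:  " + result[1]);
    }
}
```

Lookups delegate upwards from child to parent, never downwards, so code that resolves classes or resources through the parent loader will not find anything that was added only to the child. That would be consistent with the log showing the jars being added while extraction still fails, though whether it is exactly what happens inside Tika 0.7 is for SOLR-1902 to confirm.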