Grant,

You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4 distribution. I tried 'Installing Solr in Tomcat.pdf', 'index.pdf', etc. Only metadata (i.e. stream_size and content_type), apart from my own literals, is indexed; the content is missing.
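For anyone trying to reproduce this, a minimal sketch of the request involved may help. It only builds the URL for Solr's extracting handler, using the `literal.*`, `fmap.content`, `extractOnly`, and `commit` parameters. The host, port, `/update/extract` path, the `text` target field, and the `category` literal are assumptions for illustration, not details taken from the setup described above.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, literals=None, extract_only=False):
    """Build a request URL for Solr's extracting handler (Solr Cell).

    base_url and the /update/extract path are assumed defaults; adjust
    them to match the handler configured in solrconfig.xml.
    """
    # literal.<field> parameters attach caller-supplied values to the document.
    params = [("literal.id", doc_id)]
    for name, value in sorted((literals or {}).items()):
        params.append(("literal." + name, value))

    # Map Tika's extracted body ("content") onto a schema field.
    # "text" is a hypothetical field name; use one that exists in your schema.
    params.append(("fmap.content", "text"))

    if extract_only:
        # Ask Solr to return what Tika extracted instead of indexing it.
        params.append(("extractOnly", "true"))
    else:
        params.append(("commit", "true"))

    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr",
    "doc1",
    literals={"category": "manual"},
    extract_only=True,
)
print(url)
```

Posting the PDF to the `extractOnly=true` URL should show whether any body text comes back from Tika inside Solr at all, which separates an extraction problem from a schema or field-mapping problem.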
On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> Praveen and Marc,
>
> Can you share the PDF (feel free to email my private email) that fails in Solr?
>
> Thanks,
> Grant
>
> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>
> > Hi,
> >
> > Nope, I didn't get it to work... Just like you, the command-line version of Tika extracts the content correctly, but once included in Solr, no content is extracted.
> >
> > What I have tried so far:
> > - Updating the Tika libraries inside the Solr 1.4 public version, no luck there.
> > - Downloading the latest SVN version, compiling it, and starting from a simple schema, still no luck.
> > - Getting other versions compiled on Hudson (nightly builds) and testing them as well, still no extraction.
> >
> > I sent a mail to the developers mailing list, but they told me I should just mail here. I hope some developer reads this, because it's quite an important feature of Solr and somehow it got broken between the 1.4 release and the latest version in SVN.
> >
> > Marc
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search