Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Ramirez, Paul M (388J)
Hey Emyr, looking at your stack trace below, my guess is that you have two conflicting Apache POI jars in your classpath. The odd stack trace is indicative of that: the class loader is likely loading some other version of the DirectoryNode class, one that doesn't have the iterator method. > java
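One way to test the duplicate-jar theory is to look for every jar that ships the DirectoryNode class. The sketch below is a hypothetical helper, not part of Solr or POI: it scans a directory tree for loose `.jar` files and reports each one containing `org/apache/poi/poifs/filesystem/DirectoryNode.class`. It does not unpack jars nested inside a `.war`, so point it at whichever unpacked directories your Solr instance actually loads libraries from.

```python
import sys
import zipfile
from pathlib import Path

# Class named in the stack trace, as its entry path inside a jar.
TARGET = "org/apache/poi/poifs/filesystem/DirectoryNode.class"


def jars_containing(lib_dir, entry=TARGET):
    """Return every .jar under lib_dir (recursively) that contains `entry`."""
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid zip archives.
            continue
    return hits


if __name__ == "__main__":
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = jars_containing(lib_dir)
    for jar in hits:
        print(jar)
    if len(hits) > 1:
        print(f"WARNING: {len(hits)} jars provide {TARGET}; "
              "the class loader may pick up the wrong version.")
```

More than one hit suggests two POI versions are competing, and keeping only the one that matches your Tika release is likely the fix.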

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James
Hi, I'm not really sure how these can help with my problem. Can you give a bit more info on this? I think what I'm after is a fairly common request: http://lucene.472066.n3.nabble.com/Controlling-Tika-s-metadata-td2378677.html http://lucene.472066.n3.nabble.com/Select-tika-output-for-extract

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Anuj Kumar
Hi Emyr, you can try the XPath-based approach and see if that works. Also, see whether dynamic fields can help you with the metadata fields. References: http://wiki.apache.org/solr/SchemaXml#Dynamic_fields http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters http://wiki.apache.org/sol
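A minimal sketch of the request parameters this points at, assuming the ExtractingRequestHandler is mounted at `/update/extract` on a local Solr; the URL, the `text` target field and the `ignored_` prefix are illustrative assumptions. `xpath` restricts which parts of Tika's XHTML output are kept, and `uprefix` renames any field not defined in the schema, so a dynamic field such as `<dynamicField name="ignored_*" type="ignored" multiValued="true"/>` (assuming your schema has an `ignored` field type) can swallow the unwanted metadata without a second upload.

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical Solr location; adjust to your installation.
SOLR_UPDATE_EXTRACT = "http://localhost:8983/solr/update/extract"


def extract_params(doc_id, body_only=True):
    """Query parameters for one ExtractingRequestHandler upload."""
    params = {
        "literal.id": doc_id,     # unique key for the new document
        "fmap.content": "text",   # hypothetical field that receives the body text
        "uprefix": "ignored_",    # prefix for fields the schema does not define
        "commit": "true",
    }
    if body_only:
        # Keep only nodes under the XHTML body that Tika generates.
        params["xpath"] = "/xhtml:html/xhtml:body/descendant:node()"
    return params


def extract_url(doc_id, body_only=True):
    return SOLR_UPDATE_EXTRACT + "?" + urlencode(extract_params(doc_id, body_only))


def post_file(path, doc_id, content_type="application/octet-stream"):
    """Upload the raw file once; Solr extracts and indexes it server-side."""
    with open(path, "rb") as f:
        req = urllib.request.Request(
            extract_url(doc_id),
            data=f.read(),
            headers={"Content-Type": content_type},
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Note that `uprefix` only affects fields the schema does not already know; metadata that lands on existing fields would still need an `fmap.` mapping to be dropped.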

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James
Thanks for the suggestion, but surely there must be a better way to do it? I don't want to post the whole file up, have it extracted on the server, send the extracted text back to the client, and then send it all back up to the server again as plain text. On 05/05/11 14:55, Jay Luker wrote

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Jay Luker
Hi Emyr, you could try using the "extractOnly=true" parameter [1]. Of course, you'll then need to repost the extracted text manually. --jay [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only On Thu, May 5, 2011 at 9:36 AM, Emyr James wrote: > Hi All, > > I have solr and tika ins
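For completeness, here is a hedged sketch of the two-step round trip this suggestion implies (the extra traffic Emyr wants to avoid), using only the Python standard library. The Solr URL, the `text` field and the JSON update handler at `/update/json` (assumed available, as in Solr 3.1 and later) are assumptions. Because the extracted text is keyed by the uploaded stream's name, `extracted_text` simply takes the first key that is neither `responseHeader` nor a `*_metadata` entry.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust to your installation.
SOLR = "http://localhost:8983/solr"


def extract_only(path, content_type="application/octet-stream"):
    """Step 1: POST a file with extractOnly=true and return the parsed JSON."""
    params = urlencode({"extractOnly": "true", "extractFormat": "text", "wt": "json"})
    with open(path, "rb") as f:
        req = urllib.request.Request(
            f"{SOLR}/update/extract?{params}",
            data=f.read(),
            headers={"Content-Type": content_type},
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extracted_text(response):
    """Return the first value whose key is not the header or a metadata entry."""
    for key, value in response.items():
        if key != "responseHeader" and not key.endswith("_metadata"):
            return value
    raise KeyError("no extracted text in response")


def repost_as_text(doc_id, text):
    """Step 2: send the extracted text back as an ordinary document."""
    body = json.dumps({"add": {"doc": {"id": doc_id, "text": text}}}).encode("utf-8")
    req = urllib.request.Request(
        f"{SOLR}/update/json?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be `repost_as_text("doc1", extracted_text(extract_only("report.pdf")))`, which makes the double transfer of the content explicit.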