I am working with Solr 1.4 nightly and am running it on a Windows machine. Solr is running using the example folder that was installed from the zip file. The only alteration that I have made to this default installation is to add a simple Word document into the exampledocs folder.
I am trying to get Tika to work in Solr. When I run tika-0.3.jar against a Word document, it prints the extracted content to the screen as XML. However, I cannot get Solr to run Tika and index the contents of the sample Word document.

I have looked at the following resources: the Solr mailing list archive (although I could have missed something there); the Documentation and Getting Started pages on the Apache Tika website; and an article called "Content Extraction with Tika" at this address:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

That article talks about using curl. Is curl required, or does Solr already have something configured that does the same job?

I have modified the solrconfig.xml file to include the request handler for the ExtractingRequestHandler. I used the definition that was commented out in solrconfig.xml. Here it is for reference:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>

Do I need to make any further changes to this configuration? Can someone please direct me to a source that can help me get this to work?

Kevin Miller