I am working with a Solr 1.4 nightly build on a Windows machine.  Solr
is running from the example folder that was installed from the zip
file.  The only alteration that I have made to this default
installation is to add a simple Word document to the exampledocs
folder.

I am trying to get Tika to work in Solr.  When I run tika-0.3.jar
against a Word document, it prints the extracted content to the screen
in XML format.  However, I am not able to get Solr to run Tika and index
the contents of the sample Word document.

I have looked at the following resources:
the Solr mailing list archive (although I could have missed something here);
the Documentation and Getting Started pages on the Apache Tika website;
an article called "Content Extraction with Tika" at this website:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

This article talks about using curl.  Is curl required, or does Solr
already have something configured that does the same job as curl?
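
My understanding is that curl is only acting as an HTTP client in that
article, posting the file to the handler as a multipart form.  If that is
right, something like the following Python sketch should be equivalent.
It is untested against my setup and rests on assumptions: the handler is
mapped at /update/extract, and the field name "myfile" is a placeholder
of my own, as are the function names.

```python
import os
import urllib.parse
import urllib.request
import uuid


def build_extract_url(base_url, params):
    """Return the extract-handler URL with params encoded as a query string."""
    query = urllib.parse.urlencode(params)
    return base_url.rstrip("/") + "/update/extract?" + query


def encode_multipart(field_name, file_name, data, boundary):
    """Encode one file as a multipart/form-data body (what curl -F sends)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def post_document(base_url, path, params):
    """POST the file at path to the extract handler; return (status, body)."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as fh:
        # "myfile" is an arbitrary form field name (assumption).
        body = encode_multipart(
            "myfile", os.path.basename(path), fh.read(), boundary
        )
    request = urllib.request.Request(
        build_extract_url(base_url, params),
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8", "replace")
```

I would expect to call it as post_document("http://localhost:8983/solr",
"exampledocs/sample.doc", {"ext.literal.id": "doc1", "commit": "true"}),
where sample.doc stands in for my Word document and the ext.literal.id
parameter name is my guess from the ext.* naming in the configuration,
not something I have confirmed.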

I have modified the solrconfig.xml file to enable the
ExtractingRequestHandler.  I used the request handler definition that
was commented out in the solrconfig.xml file.  Here it is for reference:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>

Do I need to make any changes to this configuration?

Can someone please direct me to a source that can help me get this to
work?


Kevin Miller
