Re: Using Solr Cell to index a Word Document

2009-08-18 Thread Mark Miller
Solr defers to Tika for this. Tika uses getParagraphText() from the POI WordExtractor class: http://poi.apache.org/apidocs/org/apache/poi/hwpf/extractor/WordExtractor.html POI appears to be in limbo, and I'm not seeing anything in WordExtractor that looks like it might help you. I'd inquire at

Using Solr Cell to index a Word Document

2009-08-18 Thread Kevin Miller
I am using the Solr nightly build from 8/11/09. I have set the text field in the solrconfig.xml file to be stored. When I index an MS Word document and search for a word in the document's text, the result comes back in XML format. The text field shows the text of the document, but there are a