Hi there, I've got a Solr instance running and am feeding it rich binary documents to index from a Django application. The setup works just fine with PDFs etc., but no matter what type of MS Word document (doc or docx) I feed it, I can't get any results for content-related queries.
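To make the setup concrete, here is a minimal sketch of the kind of request URL involved. It is not the actual code from my Django app. It assumes the parameter names used by released Solr versions (`literal.*`, `fmap.content`, `extractOnly`, `commit`) and the `/update/extract` handler path, which may differ from the `ext.*` names in the build I'm running. The helper name `build_extract_url`, the base URL, and the `text` and `username` field names are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, content_field="text",
                      extract_only=False, literals=None):
    """Build a URL for Solr's extracting request handler.

    No request is sent; the file itself would be POSTed to this URL.
    Parameter names are assumed from released Solr versions.
    """
    # literal.id sets the document id; fmap.content maps Tika's
    # extracted body ("content") onto the chosen schema field.
    params = [("literal.id", doc_id), ("fmap.content", content_field)]

    # Extra metadata (e.g. the uploader's username) goes in as literals.
    for name, value in sorted((literals or {}).items()):
        params.append(("literal." + name, value))

    if extract_only:
        # Ask Solr to return the extracted XHTML instead of indexing it.
        params.append(("extractOnly", "true"))
    else:
        # Commit so the document becomes searchable right away.
        params.append(("commit", "true"))

    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


index_url = build_extract_url(
    "http://localhost:8983/solr", "doc1", literals={"username": "joe"}
)
check_url = build_extract_url(
    "http://localhost:8983/solr", "doc1", extract_only=True
)
```

The `text` default is only a stand-in for whichever field the extracted content is supposed to land in.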
I've curl'd with extract.only to verify that Solr (and Tika) can extract the contents, and it happily spits the extracted XHTML back to me. However, that content never seems to find its way into the field I specified via ext.def.fl. When I search for terms specific to the content of those documents, I get zero hits, yet I do get hits on metadata-related queries (e.g. I store the username of whoever uploaded the file). Is there some magical bit I forgot to flip?

cheers,
joe

--
View this message in context: http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24120125.html
Sent from the Solr - User mailing list archive at Nabble.com.