ExtractRequestHandler - not properly indexing office docs?

cloax Fri, 19 Jun 2009 16:20:46 -0700

Hi there, 

I've got a Solr instance running and am feeding it rich binary documents to
index from a Django application. The setup works just fine with PDFs, etc.,
but no matter what type of MS Word document (doc or docx) I feed it, I
can't get any results for content-related queries.


I've curl'd with extract.only to verify that Solr (and Tika) can extract
the contents, and it happily spits the extracted XHTML back to me.
However, that content never seems to find its way into the field that I have
specified with ext.def.fl.
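For readers following along, here is a minimal sketch of how such a request
URL could be put together from Python. The parameter names ext.def.fl and
extract.only are the ones mentioned above; the host, the /update/extract
handler path, the field name "text", and the helper name build_extract_url
are assumptions for illustration only, and parameter names may differ
between Solr versions.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, default_field, extract_only=False):
    """Build a URL for the extract handler (hypothetical helper).

    solr_base and the /update/extract path are assumptions; adjust them
    to match the handler registered in your own solrconfig.xml.
    """
    # ext.def.fl names the field that should receive the extracted content.
    params = {"ext.def.fl": default_field}
    if extract_only:
        # Ask Solr to return the extracted content instead of indexing it.
        params["extract.only"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


# Assumed host and field name, for illustration only.
index_url = build_extract_url("http://localhost:8983/solr", "text")
check_url = build_extract_url("http://localhost:8983/solr", "text", extract_only=True)
print(index_url)
print(check_url)
```

The document itself would then be sent to that URL as a multipart upload,
for example with curl's -F option or an HTTP client from the Django side.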

When I search for terms specific to the content of those documents, I get
zero hits. However, I do get hits on metadata-related queries (e.g. I store
the username of whoever uploaded it, etc.).
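One way to narrow this kind of problem down is to query the target field
directly rather than relying on the default search field. The sketch below
only builds such a query URL; the host, the /select path, the field name
"text", the search term, and the helper name build_field_query_url are all
assumptions, not details from my setup.

```python
from urllib.parse import urlencode


def build_field_query_url(solr_base, field, term):
    """Build a select URL that searches one specific field (hypothetical helper)."""
    # field:term restricts the match to the named field; rows=1 is enough
    # to see whether anything matches at all.
    params = {"q": f"{field}:{term}", "rows": 1}
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


# Assumed host, field, and term, for illustration only.
query_url = build_field_query_url("http://localhost:8983/solr", "text", "quarterly")
print(query_url)
```

If a field-specific query like this also returns nothing, that would suggest
the extracted content is not landing in that field in the first place.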

Is there some magical bit I forgot to flip?

cheers,
joe
-- 
View this message in context: 
http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24120125.html
Sent from the Solr - User mailing list archive at Nabble.com.
