Problem with Solr 3.6.1 extracting ODT content using SolrCell's ExtractingRequestHandler

Brett Melbourne Fri, 23 Nov 2012 15:26:18 -0800

Hi all,

I am encountering a problem where Solr 3.6.1 is not able to extract the text 
content from ODT (OpenDocument Text) files submitted to the 
ExtractingRequestHandler. I can reproduce this issue against the example schema 
running under Jetty.


Executing a simple index request (based on the example in the wiki):
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@testfile.odt"
returns no errors, and does not generate any exceptions in the log/console.

A query for doc1 returns an empty attr_content field:
<arr name="attr_content"> <str></str> </arr>

Oddly enough, executing an "extractOnly=true" request against the 
ExtractingRequestHandler with the same ODT file correctly returns the text of 
the file.
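
For reference, the three requests described above (index, query, and 
extract-only) can be sketched as a small shell script. This is only a minimal 
sketch, assuming the example Solr instance at http://localhost:8983/solr and 
the default /select handler; the helper function names are hypothetical and 
exist only for illustration.

```shell
#!/bin/sh
# Base URL of the example Solr instance (assumption: default Jetty port).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: build the index request URL for a given document id.
index_url() {
  printf '%s/update/extract?literal.id=%s&uprefix=attr_&fmap.content=attr_content&commit=true' "$SOLR_URL" "$1"
}

# Hypothetical helper: build the extract-only URL (nothing is indexed).
extract_only_url() {
  printf '%s/update/extract?extractOnly=true' "$SOLR_URL"
}

# Hypothetical helper: build the query URL for a single document id.
query_url() {
  printf '%s/select?q=id:%s' "$SOLR_URL" "$1"
}

# Index a file under the given id, then fetch the stored document.
# Usage: index_and_query FILE ID
index_and_query() {
  curl "$(index_url "$2")" -F "myfile=@$1"
  curl "$(query_url "$2")"
}

# Return the extracted text for a file without indexing it.
# Usage: extract_only FILE
extract_only() {
  curl "$(extract_only_url)" -F "myfile=@$1"
}
```

Running `index_and_query testfile.odt doc1` followed by 
`extract_only testfile.odt` puts the two results side by side: the first shows 
the attr_content field as stored in the index, the second shows the text 
returned by the extract-only request.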

I am wondering:

* Is this a known issue? (I couldn't find any mention of this particular 
  issue anywhere.)

* Are there any workarounds, or does anyone have any suggestions?

Thanks,

Brett.
