Re: Problem with Solr 3.6.1 extracting ODT content using SolrCell's ExtractingRequestHandler

Erick Erickson Tue, 27 Nov 2012 04:38:16 -0800

Not an issue that I know of. I expect you've got some obscure problem in
your definitions, but I'm guessing. Try modifying your schema so the glob
pattern maps to a stored field, something like:
<dynamicField name="*" type="string" multiValued="true" stored="true" />
Then remove all other fields except id, remove your mapping, and try it
again. If you query with fl=* you should see everything that was extracted.
That'll tell you whether it is a problem with Solr/Tika or something in how
you're using them.
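
As a rough sketch of that test (not something from the original exchange), the two requests could be built like this. The base URL and the id "doc1" come from the curl example quoted below; the helper names extract_url and select_url are made up for illustration, and the /select path assumes the stock example setup. The extract request deliberately omits uprefix and fmap.content so the catch-all dynamicField receives whatever Tika emits.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example in the quoted message; adjust as needed.
SOLR = "http://localhost:8983/solr"


def extract_url(doc_id, base=SOLR):
    """Build an /update/extract URL with no fmap/uprefix mapping.

    Hypothetical helper: with the catch-all dynamicField in place, every
    field Tika produces should land in a stored field of the same name.
    """
    params = {"literal.id": doc_id, "commit": "true"}
    return f"{base}/update/extract?{urlencode(params)}"


def select_url(doc_id, base=SOLR):
    """Build a query URL that returns every stored field (fl=*) for one id.

    Hypothetical helper: assumes the default /select handler is enabled.
    """
    params = {"q": f"id:{doc_id}", "fl": "*"}
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Print both URLs so they can be pasted into curl or a browser.
    print(extract_url("doc1"))
    print(select_url("doc1"))
```

The first URL would be used in place of the original one, e.g. curl "<extract URL>" -F "myfile=@testfile.odt", and the second fetched afterwards to see which fields actually came back populated.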


Best
Erick


On Mon, Nov 26, 2012 at 10:19 AM, Brett Melbourne <
bmelbou...@halogensoftware.com> wrote:

> Hi Erick,
>
> The document is committed successfully... it is just missing all the
> extracted content from Tika when I query for that document.
>
> i.e. the mapped content field attr_content is empty
> (fmap.content=attr_content)
>
> <result name="response" numFound="1" start="0" maxScore="1.9162908">
> <doc>
> <float name="score">1.9162908</float>
> <arr name="attr_character_count">
> <str>24</str>
> </arr>
> <arr name="attr_content">
> <str></str>
> </arr>
> <arr name="attr_creation_date">
> <str>2009-04-16T11:32:00</str>
> </arr>
> <arr name="attr_date">
> <str>2012-11-23T00:29:39.73</str>
> </arr>
>
> ...
>
> </result>
>
>
> Brett.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, November 25, 2012 9:27 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with Solr 3.6.1 extracting ODT content using
> SolrCell's ExtractingRequestHandler
>
> Did you commit after you added the document but before you tried the
> search?
>
> Best
> Erick
>
>
> On Fri, Nov 23, 2012 at 6:25 PM, Brett Melbourne <
> bmelbou...@halogensoftware.com> wrote:
>
> > Hi all,
> >
> > I am encountering a problem where Solr 3.6.1 is not able to extract
> > the text content from ODT (OpenDocument Text) files submitted to
> > the ExtractingRequestHandler. I can reproduce this issue against the
> > example schema running with jetty.
> >
> > Executing a simple index request (based on the example in the wiki):
> > curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@testfile.odt"
> > returns no errors, and does not generate any exceptions in the
> > log/console.
> >
> > A query for doc1 returns an empty attr_content field:
> > <arr name="attr_content"> <str></str> </arr>
> >
> > Oddly enough, executing an "extractOnly=true" request against the
> > ExtractingRequestHandler with the same ODT file correctly returns the
> > text of the file.
> >
> > I am wondering:
> >
> > * Is this a known issue? (I couldn't find any mention of this
> >   particular issue anywhere...)
> >
> > * Are there any workarounds, or does anyone have any suggestions?
> >
> > Thanks,
> >
> > Brett.
> >
> >
>
