Are you sure date_modified is a metadata field in the PDF document
you're extracting?
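
If it isn't, one option is to map whatever field Tika does produce onto date_modified with an fmap.* default. Below is a minimal sketch of the handler config. It assumes, without confirming it here, that Tika reports the PDF's modification date as last_modified once lowernames is applied. The same mechanism may also help with the empty contents field, because by default the handler writes Tika's extracted body text to a field named content, not contents:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <!-- Send Tika's extracted body text to the schema's "contents" field
         instead of the default field name "content". -->
    <str name="fmap.content">contents</str>
    <!-- Assumption: the modification date arrives as "last_modified"
         after lowernames; verify the real name with extractOnly=true. -->
    <str name="fmap.last_modified">date_modified</str>
  </lst>
</requestHandler>
```

To check the assumed name, sending the same request with extractOnly=true returns the text and metadata Tika found in Coding.pdf without indexing anything. Use whichever date field name appears there in place of last_modified.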

Best,
Erick

On Sat, Jan 11, 2014 at 3:00 AM, sweety <sweetyshind...@yahoo.com> wrote:
> I need to index rich-text documents. This is my solrconfig.xml for the extract
> handler:
> <requestHandler name="/update/extract"
> class="solr.extraction.ExtractingRequestHandler" >
> <lst name="defaults">
>
> <str name="lowernames">true</str>
> <str name="uprefix">ignored_</str>
> <str name="captureAttr">true</str>
> </lst>
> </requestHandler>
>
> My schema.xml is:
> <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW"
> multiValued="false"/>
> <field name="id" type="long" indexed="true" stored="true" required="true"
> multiValued="false"/>
> <field name="contents" type="text" indexed="true" stored="true"
> multiValued="false"/>
> <field name="author" type="title_text" indexed="true" stored="true"
> multiValued="true"/>
> <field name="title" type="title_text" indexed="true" stored="true"/>
> <field name="date_modified" type="date" indexed="true" stored="true"
> multiValued="true"/>
> <field name="_version_" type="long" indexed="true" stored="true"
> multiValued="false"/>
> <dynamicField name="ignored_*" type="text" indexed="true" stored="true"
> multiValued="true"/>
>
>
> But after indexing with this curl command:
> curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" \
>   -F "myfile=@Coding.pdf"
> and querying with q=id:12, the output is:
> <doc>
> <arr name="ignored_stream_source_info">
> <str>myfile</str>
> </arr>
> <arr name="ignored_stream_content_type">
> <str>application/octet-stream</str>
> </arr>
> <arr name="ignored_stream_size">
> <str>3336935</str>
> </arr>
> <arr name="ignored_stream_name">
> <str>Coding.pdf</str>
> </arr>
> <arr name="ignored_content_type">
> <str>application/pdf</str>
> </arr>
> <str name="contents"></str>   <!-- contents field is empty -->
> <long name="_version_">1456831756526157824</long>
> <str name="doc_id">8eb229e0-5f25-4d26-bba4-6cb67aab7f81</str>
> </doc>
>
> Why is the contents field empty?
>
> Also, why does the date_modified field not appear?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/using-extract-handler-data-not-extracted-tp4110850.html
> Sent from the Solr - User mailing list archive at Nabble.com.
