Afternoon,

After an upgrade to Solr 3.1, which has largely been very smooth and
painless, I'm having a minor issue with the ExtractingRequestHandler.

The problem is that it now inserts the document metadata into the
extracted content, as well as mapping it to a dynamic field.  Before
the upgrade, the same configuration only mapped the metadata to a
dynamic field, and I'm not sure how it is ending up in my content too.

The requestHandler configuration is as follows:

  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <!-- All the main content goes into "text"... if you need to return
           the extracted text or do highlighting, use a stored field. -->
      <str name="fmap.content_type">attr_source_content_type</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
    </lst>
  </requestHandler>

The schema has a dynamic field for attr_*:

  <dynamicField name="attr_*" type="textgen" indexed="true"
                stored="true" multiValued="true" />

The request being submitted is as follows (reformatted for
readability, extracted from the catalina log):

literal.ib_extension=blarg
literal.ib_date=2010-09-09T21:41:30Z
literal.ib_custom2=custom2
resource.name=test.txt
literal.ib_custom3=custom3
literal.ib_authorid=1
literal.ib_custom1=custom1
literal.ib_custom6=custom6
literal.ib_custom7=custom7
literal.ib_custom4=custom4
literal.ib_linkid=1
literal.ib_custom5=custom5
literal.ib_tags=foo
literal.ib_tags=bar
literal.ib_tags=blarg
commit=true
literal.ib_permissionid=1
literal.ib_filters=1
literal.ib_filters=2
literal.ib_filters=3
literal.ib_description=My+Description
literal.ib_title=My+Title
json.nl=map
wt=json
literal.ib_realid=1
literal.ib_custom9=custom9
literal.ib_id=fb1
fmap.content=ib_content
literal.ib_custom8=custom8
literal.ib_type=foobar
uprefix=attr_
literal.ib_clientid=1
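
For anyone wanting to reproduce this, here is a minimal sketch of how
an equivalent query string could be assembled in Python.  It is not
the client code that produced the request above.  The base URL (host,
port and path prefix) and the helper name build_extract_url are
assumptions for illustration, and only a subset of the parameters is
shown.  The file itself would still have to be sent as the request
body, which this sketch does not do.

```python
from urllib.parse import urlencode, parse_qs

# Assumed base URL: host, port and path prefix are placeholders,
# not taken from the original request.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, params):
    """Hypothetical helper: append an encoded query string to base_url.

    doseq=True expands list values into repeated keys, which is how
    multi-valued literals such as literal.ib_tags appear in the log.
    """
    return base_url + "?" + urlencode(params, doseq=True)


# A subset of the parameters from the logged request.
params = [
    ("resource.name", "test.txt"),
    ("fmap.content", "ib_content"),   # extracted body goes to ib_content
    ("uprefix", "attr_"),             # request value, differs from the handler default
    ("literal.ib_id", "fb1"),
    ("literal.ib_title", "My Title"),
    ("literal.ib_tags", ["foo", "bar", "blarg"]),
    ("commit", "true"),
    ("wt", "json"),
    ("json.nl", "map"),
]

url = build_extract_url(SOLR_EXTRACT_URL, params)
print(url)
```

Note that the request passes uprefix=attr_, whereas the handler
defaults above specify ignored_, so unmapped metadata fields end up
under attr_* rather than ignored_*.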

After indexing, the ib_content field contains the contents of the
file, prefixed with "stream_content_type application/octet-stream
stream_size 971 Content-Encoding UTF-8 Content-Type text/plain
resourceName test.txt".  The same metadata has also been mapped to
the dynamic field as expected, so I have attr_content_encoding,
attr_source_content_type, attr_stream_content_type and
attr_stream_size, all with their correct values.

There are no copyField directives in the schema that copy attr_*
fields into anything else, and I've had no luck tracking down where
this is coming from.  Has an option been added in 3.1 that controls
this behaviour?

Cheers,
Liam
