Afternoon,

After an upgrade to Solr 3.1, which has largely been very smooth and painless, I'm having a minor issue with the ExtractingRequestHandler.
The problem is that it's inserting metadata into the extracted content, as well as mapping it to a dynamic field. Previously the same configuration only mapped it to a dynamic field, and I'm not sure how it's managing to add it into my content as well.

The requestHandler configuration is as follows:

    <requestHandler name="/update/extract"
                    startup="lazy"
                    class="solr.extraction.ExtractingRequestHandler" >
      <lst name="defaults">
        <!-- All the main content goes into "text"... if you need to return
             the extracted text or do highlighting, use a stored field. -->
        <str name="fmap.content_type">attr_source_content_type</str>
        <str name="lowernames">true</str>
        <str name="uprefix">ignored_</str>
      </lst>
    </requestHandler>

The schema has a dynamic field for attr_*:

    <dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true" />

The request being submitted is (reformatted for readability, extracted from the catalina log):

    literal.ib_extension=blarg
    literal.ib_date=2010-09-09T21:41:30Z
    literal.ib_custom2=custom2
    resource.name=test.txt
    literal.ib_custom3=custom3
    literal.ib_authorid=1
    literal.ib_custom1=custom1
    literal.ib_custom6=custom6
    literal.ib_custom7=custom7
    literal.ib_custom4=custom4
    literal.ib_linkid=1
    literal.ib_custom5=custom5
    literal.ib_tags=foo
    literal.ib_tags=bar
    literal.ib_tags=blarg
    commit=true
    literal.ib_permissionid=1
    literal.ib_filters=1
    literal.ib_filters=2
    literal.ib_filters=3
    literal.ib_description=My+Description
    literal.ib_title=My+Title
    json.nl=map
    wt=json
    literal.ib_realid=1
    literal.ib_custom9=custom9
    literal.ib_id=fb1
    fmap.content=ib_content
    literal.ib_custom8=custom8
    literal.ib_type=foobar
    uprefix=attr_
    literal.ib_clientid=1

After indexing, the ib_content field contains the contents of the file, prefixed with:

    stream_content_type application/octet-stream stream_size 971 Content-Encoding UTF-8 Content-Type text/plain resourceName test.txt

These have all been mapped to the dynamic field, so I have attr_content_encoding, attr_source_content_type, attr_stream_content_type and attr_stream_size, all with their correct values as well.

There are no copyField directives that copy content from attr_* fields into anything else, and I've had no luck tracking down where this is coming from. Has some option been added that controls this behaviour?

Cheers,
Liam
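
P.S. In case it helps anyone reproduce this, here is a rough Python sketch of how the query string for the request could be rebuilt. It is only an illustration: the host, port and path are assumptions (a default Tomcat install), the helper name build_extract_query is made up, and only a subset of the literal.* parameters is included. The file itself would still have to be POSTed as the request body.

```python
from urllib.parse import urlencode, parse_qs

# A subset of the parameters from the logged request.
# Multi-valued parameters (e.g. literal.ib_tags) are given as lists.
params = {
    "literal.ib_id": "fb1",
    "literal.ib_title": "My Title",
    "literal.ib_tags": ["foo", "bar", "blarg"],
    "resource.name": "test.txt",
    "fmap.content": "ib_content",
    "uprefix": "attr_",  # overrides the "ignored_" default in solrconfig.xml
    "commit": "true",
    "wt": "json",
    "json.nl": "map",
}


def build_extract_query(params):
    """Encode params as a query string, repeating each multi-valued key."""
    return urlencode(params, doseq=True)


query = build_extract_query(params)

# Hypothetical base URL; adjust host, port and path to the real deployment.
url = "http://localhost:8080/solr/update/extract?" + query
print(url)
```

Passing doseq=True makes each list entry come out as its own key=value pair, which matches the repeated literal.ib_tags entries in the log.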