On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

> Just getting my feet wet with the text extraction using both schema and
> solrconfig settings from the example directory in the 1.4 distribution, so I
> might miss something obvious.
>
> Trying to provide my own title (and discarding the one received through
> Tika's metadata) wasn't straightforward. I had to use the following:
>
> fmap.title=tika_title (to discard the Tika title)
> literal.attr_title=New Title (to provide the correct one)
> fmap.attr_title=title (to map it back to the field as I would like to use
> title in searches)
>
> Is there anything easier than the above?
>
> How can this best be generalized to other metadata provided by Tika (which in
> our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs). I think it would be reasonable to patch the ExtractingRequestHandler to add a "no metadata" option, and it wouldn't be that hard to do.
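Until such an option exists, the parameter workaround from the question can at least be generated mechanically rather than typed out per field. The following is a minimal sketch, not Solr or SolrJ API. It only builds the query string for an /update/extract request. The class `ExtractParams`, its methods, and the `attr_` (staging) and `ignored_` (discard) prefixes are hypothetical. The sketch assumes the schema has `attr_*` and `ignored_*` dynamic fields, with `ignored_*` neither indexed nor stored. It also assumes `fmap.*` is applied to `literal.*` fields, as the workaround in the question implies. It needs Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class ExtractParams {

    // Hypothetical prefixes; assume attr_* and ignored_* dynamic fields exist.
    static final String STAGING_PREFIX = "attr_";
    static final String DISCARD_PREFIX = "ignored_";

    /**
     * Builds /update/extract parameters that (1) route each unwanted Tika
     * metadata field to a discard field and (2) supply our own value for each
     * overridden field via a staging literal that fmap maps back.
     */
    static Map<String, String> build(Map<String, String> overrides,
                                     List<String> discardedMetadata) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        for (String name : discardedMetadata) {
            params.put("fmap." + name, DISCARD_PREFIX + name);
        }
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            String field = e.getKey();
            String staging = STAGING_PREFIX + field;
            // Route Tika's own value away (the question used tika_title here).
            params.put("fmap." + field, DISCARD_PREFIX + field);
            // Send our value under a staging name, then map it to the real field.
            params.put("literal." + staging, e.getValue());
            params.put("fmap." + staging, field);
        }
        return params;
    }

    /** URL-encodes the parameters as key=value pairs joined by '&'. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner sj = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sj.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("title", "New Title");
        Map<String, String> p = build(overrides, List.of("author", "subject"));
        // prints fmap.author=ignored_author&fmap.subject=ignored_subject&fmap.title=ignored_title&literal.attr_title=New+Title&fmap.attr_title=title
        System.out.println(toQueryString(p));
    }
}
```

Each unwanted metadata name still needs its own list entry. That per-field bookkeeping is what a "no metadata" option in the handler would remove.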