On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

> Just getting my feet wet with the text extraction using both schema and 
> solrconfig settings from the example directory in the 1.4 distribution, so I 
> might be missing something obvious.
> 
> Trying to provide my own title (and discarding the one received through
> Tika's metadata) wasn't straightforward. I had to use the following:
> 
> fmap.title=tika_title (to discard the Tika title)
> literal.attr_title=New Title (to provide the correct one)
> fmap.attr_title=title (to map it back to the field as I would like to use
> title in searches)
> 
> Is there anything easier than the above?
> 
> How can this best be generalized to other metadata provided by Tika (which in 
> our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs).  I think it would
be reasonable to patch the ExtractingRequestHandler to add a no-metadata
option, and it wouldn't be that hard.
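
In the meantime, the three-parameter workaround from the quoted message can be
assembled into a single request URL roughly as follows. This is only a sketch:
the base URL is assumed to be the example server's default extract endpoint,
and the helper name build_extract_url, the literal.id parameter, and the "doc1"
value are hypothetical additions for illustration. Only the fmap.* and
literal.attr_title parameters come from the message above.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, new_title, doc_id):
    """Build an extract request URL that replaces Tika's title with our own.

    The helper name and the literal.id parameter are illustrative assumptions;
    the three title-related parameters are the ones quoted in the message.
    """
    params = [
        ("literal.id", doc_id),             # assumed: schema uses "id" as unique key
        ("fmap.title", "tika_title"),       # move Tika's title out of the way
        ("literal.attr_title", new_title),  # supply the title we actually want
        ("fmap.attr_title", "title"),       # map it back onto the "title" field
    ]
    # urlencode escapes spaces and other reserved characters in the values.
    return base_url + "?" + urlencode(params)


# Assumed endpoint of the example distribution; adjust for your own setup.
url = build_extract_url(
    "http://localhost:8983/solr/update/extract", "New Title", "doc1"
)
print(url)
```

The document itself would still need to be posted to that URL as the request
body. Building the parameter list in one place at least makes it easy to repeat
the same rename-and-replace pattern for other Tika metadata fields.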
