This is the same issue I brought up in this thread: 
http://search-lucene.com/m/s8sOH1YG1TP

As a workaround I wrote an UpdateProcessor to copy/move fields around 
(SOLR-2599). 
I think we need a separate fmap for Tika-generated fields (say tmap), so that 
the problem could be solved with:

tmap.title=tika_title
literal.title=My client provided title

This way we can cleanly rename or ignore Tika-generated metadata. Perhaps we 
should also add an option to prefix all Tika-generated fields:

tika.prefix=tika_
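
To make the intended semantics concrete, here is a rough sketch in Python. It only simulates the proposal: tmap.* and tika.prefix are the parameters suggested above and do not exist in Solr, and the function name and sample metadata values are made up for illustration.

```python
def apply_proposed_mapping(tika_fields, literals, tmap=None, tika_prefix=""):
    """Simulate the proposed tmap.* / tika.prefix handling.

    Hypothetical helper, not Solr code: it only models the field renaming.
    """
    tmap = tmap or {}
    doc = {}
    for name, value in tika_fields.items():
        # An explicit tmap entry wins; otherwise the optional prefix is applied.
        target = tmap.get(name, tika_prefix + name)
        doc[target] = value
    # Literals never pass through tmap, so a client-supplied title
    # cannot collide with the renamed Tika title.
    for name, value in literals.items():
        doc[name] = value
    return doc


# Made-up Tika metadata and the client-provided literal from the example above.
tika_fields = {"title": "Untitled-1", "author": "scanner"}
literals = {"title": "My client provided title"}

# tmap.title=tika_title
renamed = apply_proposed_mapping(tika_fields, literals, tmap={"title": "tika_title"})
# tika.prefix=tika_
prefixed = apply_proposed_mapping(tika_fields, literals, tika_prefix="tika_")

print(renamed)
print(prefixed)
```

The point of keeping the two maps separate is that literal.* values are never touched by the Tika renaming, so the three-step fmap trick quoted below would no longer be needed.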

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. feb. 2011, at 17.13, Grant Ingersoll wrote:

> 
> On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:
> 
>> Just getting my feet wet with the text extraction using both schema and 
>> solrconfig settings from the example directory in the 1.4 distribution, so I 
>> might miss something obvious.
>> 
>> Trying to provide my own title (and discarding the one received through 
>> Tika's metadata) wasn't straightforward. I had to use the following:
>> 
>> fmap.title=tika_title (to discard the Tika title)
>> literal.attr_title=New Title (to provide the correct one)
>> fmap.attr_title=title (to map it back to the field as I would like to use 
>> title in searches)
>> 
>> Is there anything easier than the above?
>> 
>> How can this best be generalized to other metadata provided by Tika (which 
>> in our use case will be mostly ignored, as it is provided separately)?
> 
> You can provide your own ContentHandler (see the wiki docs).  I think it 
> would be reasonable to patch the ExtractingRequestHandler to have a no 
> metadata option and it wouldn't be that hard.
