This is the same issue I brought up in this thread: http://search-lucene.com/m/s8sOH1YG1TP
As a workaround I wrote an UpdateProcessor to copy/move fields around (SOLR-2599).

I think we need a separate fmap for TIKA generated fields (say tmap), so the problem could be fixed by:

  tmap.title=tika_title
  literal.title=My client provided title

In this way we can cleanly rename or ignore TIKA-generated metadata.
Perhaps also an option to add a prefix to all Tika generated fields?

  tika.prefix=tika_

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. feb. 2011, at 17.13, Grant Ingersoll wrote:

> On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:
>
>> Just getting my feet wet with the text extraction using both schema and
>> solrconfig settings from the example directory in the 1.4 distribution, so I
>> might miss something obvious.
>>
>> Trying to provide my own title (and discarding the one received through
>> Tika's metadata) wasn't straightforward. I had to use the following:
>>
>> fmap.title=tika_title (to discard the Tika title)
>> literal.attr_title=New Title (to provide the correct one)
>> fmap.attr_title=title (to map it back to the field as I would like to use
>> title in searches)
>>
>> Is there anything easier than the above?
>>
>> How can this best be generalized to other metadata provided by Tika (which
>> in our use case will be mostly ignored, as it is provided separately)?
>
> You can provide your own ContentHandler (see the wiki docs). I think it
> would be reasonable to patch the ExtractingRequestHandler to have a no
> metadata option and it wouldn't be that hard.
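
To make the parameters discussed in this thread concrete, here is a minimal sketch that only builds the query strings for an extract request; it does not talk to Solr. The host, port, handler path and the `literal.id` value are assumptions for illustration. The `tmap.*` parameter is the proposal from this message, not something the ExtractingRequestHandler is known to support, so the second URL is purely hypothetical.

```python
from urllib.parse import urlencode, parse_qsl

# Assumption for illustration: a local Solr with the extracting
# handler registered at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def extract_url(params):
    """Build an extract request URL from (name, value) pairs."""
    return SOLR_EXTRACT_URL + "?" + urlencode(params)


# The three-step workaround from the quoted message.
workaround = [
    ("literal.id", "doc1"),               # hypothetical document id
    ("fmap.title", "tika_title"),         # move Tika's title out of the way
    ("literal.attr_title", "New Title"),  # supply the client's title
    ("fmap.attr_title", "title"),         # map it back onto "title"
]

# The proposed form. "tmap.*" is only a suggestion in this thread,
# not an existing parameter.
proposed = [
    ("literal.id", "doc1"),               # hypothetical document id
    ("tmap.title", "tika_title"),
    ("literal.title", "My client provided title"),
]

if __name__ == "__main__":
    print(extract_url(workaround))
    print(extract_url(proposed))
```

The point of the comparison is that the proposed form needs one mapping instead of two and never routes the client's value through an intermediate field.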