You can create your own "update processor" that gets control between the
output of Tika and the indexing of the document.
See:
http://wiki.apache.org/solr/UpdateRequestProcessor
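As a rough illustration of the kind of logic such a processor could hold, here is a minimal sketch in plain Java. It deliberately works on a plain Map instead of Solr's document classes so that it runs without Solr on the classpath. The class name FieldPostProcessor, the methods postProcess and normalize, and the normalization rules (trim, collapse whitespace, lowercase) are all assumptions made for this example, not part of Solr or Tika. In a real setup I would expect the same logic to sit inside the processAdd method of a custom UpdateRequestProcessor and operate on the incoming document.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: not a Solr class, just the post-processing logic
// that a custom update processor might apply to each document.
public class FieldPostProcessor {

    // Returns a new field map: extracted values are normalized first,
    // then explicitly passed literal values override them unchanged.
    static Map<String, String> postProcess(Map<String, String> extracted,
                                           Map<String, String> literals) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            result.put(e.getKey(), normalize(e.getValue()));
        }
        // Literal values win over whatever was extracted.
        for (Map.Entry<String, String> e : literals.entrySet()) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    // Assumed normalization: trim, collapse whitespace runs, lowercase.
    static String normalize(String value) {
        if (value == null) {
            return "";
        }
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("author", "  John   DOE ");
        extracted.put("content_type", "Application/PDF");

        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("author", "Jane Roe");

        System.out.println(postProcess(extracted, literals));
    }
}
```

Here the extracted "author" value is replaced by the literal one, while "content_type" is only normalized. Whether literal values should also be normalized is a design choice; this sketch leaves them untouched.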
-- Jack Krupansky
-----Original Message-----
From: Raphaël
Sent: Sunday, May 27, 2012 6:24 PM
To: solr-user@lucene.apache.org
Subject: Tika ExtractingRequestHandler and field postprocessing
Hi,
I use Tika through the Solr ExtractingRequestHandler and I face a very
common use case, namely postprocessing fields from Tika in order to
normalize their values or override them with explicitly passed "literal"
values.
With the exception of some vague statements about "ContentHandler", I
failed to find any good examples of this (although it appears to be
quite an important feature).
Does anyone know of some good resources/samples about the proper way to
"postprocess" fields from both Tika results and explicit values?
PS: I initially thought this was up to the Tika API, but I have been
redirected here, as Tika only deals with XML/XPath and fields are in the
scope of the Solr ExtractingRequestHandler only.
Thank you in advance.