You can create your own "update processor" that gets control between the output of Tika and the indexing of the document.

See:
http://wiki.apache.org/solr/UpdateRequestProcessor
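
As a rough illustration of the kind of logic such a processor might hold, here is a minimal sketch in plain Java. It does not use the Solr API, so it runs without any Solr jars: the class FieldPostProcessor, its methods, and the two-map model (Tika fields plus explicit values) are assumptions made for the example, not anything Solr provides. In a real processor the same logic would sit in an override of UpdateRequestProcessor.processAdd(AddUpdateCommand) and work on the SolrInputDocument carried by the command.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Solr or Tika.
final class FieldPostProcessor {
    private FieldPostProcessor() {}

    /**
     * Returns a new map holding the Tika fields with normalized values,
     * with every explicitly passed value overriding the extracted one.
     */
    static Map<String, Object> postProcess(Map<String, Object> tikaFields,
                                           Map<String, Object> explicitValues) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : tikaFields.entrySet()) {
            out.put(e.getKey(), normalize(e.getValue()));
        }
        // Explicit values are copied last, so they win over extracted ones.
        out.putAll(explicitValues);
        return out;
    }

    /** Trims strings and collapses runs of whitespace; other values pass through unchanged. */
    static Object normalize(Object value) {
        if (value instanceof String) {
            return ((String) value).trim().replaceAll("\\s+", " ");
        }
        return value;
    }
}
```

Calling FieldPostProcessor.postProcess(tikaFields, explicitValues) leaves both input maps untouched and returns the merged result, which keeps the normalization rules easy to test separately from the Solr plumbing.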

-- Jack Krupansky

-----Original Message-----
From: Raphaël
Sent: Sunday, May 27, 2012 6:24 PM
To: solr-user@lucene.apache.org
Subject: Tika ExtractingRequestHandler and field postprocessing

Hi,

I use Tika through the Solr ExtractingRequestHandler and I face a very
common use case, namely postprocessing fields from Tika in order to normalize
their values or override them with explicitly passed "literal" values.

With the exception of some vague statements about "ContentHandler", I
failed to find any good examples of this (even though it appears to be
quite an important feature).

Does anyone know of some good resources/samples about the proper way to
"postprocess" fields from both Tika results and explicit values?


PS: I initially thought this was up to the Tika API, but I have been
redirected here, as Tika only deals with XML/XPath and fields are in the
scope of the Solr ExtractingRequestHandler only.


Thank you in advance.
