I'd consider using a separate Java program that uses Tika directly, or one of the various extraction services. Then you can assemble whatever fields you please, a hash of the raw file bytes included, before sending the doc to Solr. There are multiple reasons to recommend this, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/
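As a rough illustration of the client-side approach, here is a minimal sketch that computes the SHA-256 over the untouched file bytes with the JDK's `MessageDigest`, then puts it in its own field alongside the extracted text. The class name, the `buildFields` helper, and the field names `content` and `filehash_sha256` are hypothetical. The Tika and SolrJ calls are deliberately left out so the sketch runs with the JDK alone. In a real program the text would come from Tika and the entries would be copied into a `SolrInputDocument`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class FileHashExample {

    // Streams the raw bytes through SHA-256 in 8 KB chunks, so large files
    // are never loaded into memory all at once.
    static String sha256Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        // Hex-encode the 32-byte digest (lowercase, two chars per byte).
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hypothetical field names: in a real client, extractedText would come from Tika
    // and these entries would be added to a SolrInputDocument before sending to Solr.
    static Map<String, Object> buildFields(String id, String extractedText, String sha256) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("id", id);
        fields.put("content", extractedText);
        fields.put("filehash_sha256", sha256);
        return fields;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            Path file = Paths.get(args[0]);
            try (InputStream in = Files.newInputStream(file)) {
                String hash = sha256Hex(in);
                // Placeholder text stands in for what Tika would extract.
                System.out.println(buildFields(file.toString(), "(text extracted by Tika)", hash));
            }
        } else {
            byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
            System.out.println(sha256Hex(new ByteArrayInputStream(sample)));
        }
    }
}
```

Run with a file path to print the assembled field map, or with no arguments to hash the string "abc". Because the hash is taken over the bytes as stored rather than over extracted text, it reflects the uploaded binary itself. Since an `InputStream` is consumed once read, a real client would either open the file a second time for Tika or wrap the stream in `java.security.DigestInputStream` so hashing and parsing share one pass.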
There are other reasons why using ExtractingRequestHandler in production is problematic, the biggest one being that it can blow up your server: Tika runs inside Solr's own JVM, so a document Tika chokes on can take Solr down with it. Tika has to try to cope with every variant of every document format it processes, and I personally guarantee that some company X (no longer in business) produced PDF files, against a spec that was current 10 years ago, that "interpret" that spec...er...freely ;) And Tika then has to try to cope. It does a brilliant job, but there's always going to be case N+1.

The inference, of course, is that IMO ExtractingRequestHandler is largely a proof-of-concept tool: it gets people going without having to write an external program, but it's not something I'd recommend for production.

Best,
Erick

On Thu, May 24, 2018 at 10:06 PM, Thomas Lustig <tm.lus...@gmail.com> wrote:
> Dear community,
>
> I would like to automatically add a sha256 file hash to a Document field
> after a binary file is posted to an ExtractingRequestHandler.
> At first I thought that the ExtractingRequestHandler had such a feature, but
> so far I have not found a configuration for it.
> It was mentioned that I should implement my own Update Request Processor
> to calculate the hash and add it to a field.
> The SignatureUpdateProcessor seemed to be an out-of-the-box option, but it
> only supports md5 and also does not access the raw binary stream.
>
> The important thing is that I do need the binary stream of the uploaded
> file to calculate a correct hash value (e.g. md5, sha256, ...).
> Is it possible to also arrange this with a ScriptUpdateProcessor and
> JavaScript?
>
> Thanks in advance for any help
>
> Tom