+1 as always to Erick’s advice. DIH is only a PoC.
We do have a DigestingParser in Tika, and when you combine that with the
RecursiveParserWrapper, you can get digests not only of the main file but
also of all embedded files/attachments, which can be pretty neat for some
use cases.
Operators are st
I'd consider using a separate Java program that uses Tika directly, or
one of various services. Then you can assemble whatever you please
before sending the doc to Solr. There are multiple reasons to
recommend this, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
There are other reason
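
As a rough illustration of the "separate Java program" route suggested above, here is a minimal sketch that computes a SHA-256 hex digest of a file on the client side, using only the JDK. The class name FileSha256 and the method name sha256Hex are made up for this example; nothing here is Tika or Solr API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Tika, Solr, or SolrJ.
final class FileSha256 {

    // Streams the file through a MessageDigest and returns the digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            // Reading through the DigestInputStream updates the digest as a side effect.
            while (in.read(buf) != -1) {
                // nothing else to do per chunk
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The resulting string could then be set on the document before it goes to Solr, for example as a field on a SolrJ SolrInputDocument next to the text extracted by Tika. The field name (say, sha256) is an assumption and would have to exist in your schema.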
Dear community,
I would like to automatically add a SHA-256 file hash to a document field
after a binary file is posted to an ExtractingRequestHandler.
At first I thought that the ExtractingRequestHandler had such a feature, but
so far I have not found a configuration option for it.
It was mentioned that I should impl