On 6/20/2018 9:05 AM, neotorand wrote:
I have a specific requirement where I need to index the following:
1. Metadata of any document
2. Parts of the document that match keywords that I configure
I am able to achieve the first part through ERH or FileListEntityProcessor.
I am struggling with the second part. I am looking for an effective and smart
approach to handle this.
Can anyone give me a pointer or help with this?
Write a custom indexing program to compile precisely the information
that you need and send that to Solr.
Yes, that is a serious suggestion. Solr itself is very capable, but it
can't do everything that every user's specific business requirements
dictate. A large percentage of Solr users have written custom indexing
programs.
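
As a rough sketch of what such a program could look like for the two requirements in the question, the Python below builds one document per file from filesystem metadata plus the sentences that contain configured keywords. The field names (`id`, `size_l`, `modified_dt`, `snippets_txt`), the `KEYWORDS` list, and the naive sentence splitting are all assumptions made for illustration, and it assumes the text has already been extracted from the file.

```python
import os
import re
from datetime import datetime, timezone

# Hypothetical keyword configuration; replace with your own list.
KEYWORDS = ["invoice", "contract"]


def matching_passages(text, keywords):
    """Return the sentences in text that contain any keyword (case-insensitive)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


def build_document(path, text, keywords=KEYWORDS):
    """Combine file metadata and keyword-matching passages into one dict."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "id": os.path.abspath(path),          # assumed unique key field
        "size_l": info.st_size,               # assumed field name
        "modified_dt": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "snippets_txt": matching_passages(text, keywords),
    }
```

Only the matching sentences are kept, so the full document body never has to be sent to Solr unless you also want it searchable.
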
It is strongly recommended that the ExtractingRequestHandler never be
used in production, because the Tika software it utilizes is prone to
serious problems that might extend as far as an actual program crash.
If Tika crashes and it's running inside Solr, then Solr crashes too.
Running Tika in a custom indexing program instead is recommended, so
that if it crashes, it's only the indexing program that dies, not Solr.
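
A minimal sketch of that arrangement, in Python, might look like the following. Extraction happens inside the indexing program, and only the finished documents are posted to Solr's JSON update endpoint. The URL, the collection name `mycollection`, and the `extract_text` placeholder (which just reads plain text where a real program would call Tika) are assumptions, not part of any actual setup.

```python
import json
import urllib.request

# Assumed Solr location and collection name; adjust for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def extract_text(path):
    """Placeholder for the real extraction step (for example, a call into Tika)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build a POST request carrying the documents as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def index_files(paths, make_doc, send=urllib.request.urlopen):
    """Extract each file here in the indexer, skip failures, then send the batch."""
    docs, failed = [], []
    for path in paths:
        try:
            docs.append(make_doc(path, extract_text(path)))
        except Exception as error:  # a bad file should not stop the whole run
            failed.append((path, str(error)))
    if docs:
        with send(build_update_request(docs)) as response:
            response.read()
    return docs, failed
```

Here `make_doc` is any function that takes a path and its extracted text and returns a dict of fields. Catching exceptions only handles ordinary extraction errors. If the extractor takes down the whole process, the indexing program dies, but Solr keeps running.
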
Thanks,
Shawn