On 07/03/2018 09:32, lala wrote:
> Thanks for your reply, Erick.
> Actually, I am using SolrJ to index files, among other operations with Solr,
> but to index a large number of different kinds of files, I'm sending a DIH
> request to Solr through the SolrJ API: FileListEntityProcessor with
> TikaEntityProcessor...
> Why not benefit from this technology if Solr offers it? It simplifies our
> work tremendously...
It may simplify your work, but it isn't good practice. Tika has some
heavy lifting to do to extract text from some formats and you should
consider how this load will affect Solr. We've often put Tika into a
different process for this reason.
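As a rough illustration of keeping Tika out of the Solr process, here is a minimal sketch. It assumes a standalone Tika Server (tika-server) running on its default port 9998, and it posts the extracted text to Solr's JSON update handler over plain HTTP. The thread uses SolrJ; Python's standard library is swapped in only to keep the sketch self-contained. The core name `mycore` and the `content` field are hypothetical.

```python
import json
import urllib.request

# Assumptions (hypothetical values, adjust to your setup):
# a standalone Tika Server on its default port, and a Solr core named "mycore".
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def tika_request(data, tika_url=TIKA_URL):
    """Build a PUT request asking Tika Server for the plain-text content of `data`."""
    return urllib.request.Request(
        tika_url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def build_solr_doc(doc_id, text):
    """Wrap extracted text in a Solr document; "content" is a hypothetical field name."""
    return {"id": doc_id, "content": text.strip()}


def solr_request(docs, solr_url=SOLR_UPDATE_URL):
    """Build a POST request that sends a JSON array of documents to Solr's update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def extract_and_index(path, doc_id):
    """Extract text in the Tika Server process, then index the result in Solr."""
    with open(path, "rb") as f:
        with urllib.request.urlopen(tika_request(f.read())) as resp:
            text = resp.read().decode("utf-8")
    with urllib.request.urlopen(solr_request([build_solr_doc(doc_id, text)])) as resp:
        return resp.status
```

Calling, say, `extract_and_index("report.pdf", "report-1")` (hypothetical file and id) does the expensive parsing inside the Tika Server process, so a slow or crashing parse affects that process rather than Solr itself.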
> Is there any way to extract inline images from PDF documents?
https://stackoverflow.com/questions/31303735/how-to-extract-images-from-a-file-using-apache-tika
has some useful suggestions.
Charlie
> Awaiting your reply. Best regards...
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk