Text in images is not extracted and indexed to content

Zheng Lin Edwin Yeo Mon, 09 Apr 2018 19:24:03 -0700

Hi,

I am currently facing an issue where the text in image files such as .jpg
and .bmp is not being extracted and indexed. After indexing, Tika does
extract all the metadata and indexes it under the attr_* fields.
However, the content field is always empty for image files. For other types
of document files, such as .doc, the content is extracted correctly.
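
To make the symptom concrete, here is a minimal sketch of how the affected
documents could be spotted in a Solr JSON response. It assumes the response
has already been parsed into a list of dicts, that each document has an "id"
field, and that the extracted text lands in a field named "content". The
function name and the sample documents are hypothetical and for illustration
only.

```python
def docs_missing_content(docs, content_field="content", meta_prefix="attr_"):
    """Return ids of docs that have metadata fields but no extracted text."""
    missing = []
    for doc in docs:
        # A doc "has metadata" if any field name starts with the attr_ prefix.
        has_meta = any(key.startswith(meta_prefix) for key in doc)
        content = doc.get(content_field)
        # Solr may return a multi-valued field as a list; join it to one string.
        if isinstance(content, list):
            content = " ".join(str(part) for part in content)
        if has_meta and not (content or "").strip():
            missing.append(doc.get("id"))
    return missing


# Hypothetical documents shaped like the "docs" array of a Solr JSON response.
sample_docs = [
    {"id": "scan.jpg", "attr_image_width": ["800"], "content": [""]},
    {"id": "report.doc", "attr_author": ["someone"], "content": ["Some report text"]},
]
print(docs_missing_content(sample_docs))
```

In my case every image file would show up in the returned list, while the
.doc files would not.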


I have already updated tika-parsers-1.17.jar, setting extractInlineImages to
true under \org\apache\tika\parser\pdf\.


What could be the reason?

I have just upgraded to Solr 7.3.0.

Regards,
Edwin
