TIKA OCR not working

trung.ht Wed, 22 Apr 2015 21:34:22 -0700

Hi,

I want to use Solr to index some scanned documents. After setting up a Solr
document with two fields, "content" and "filename", I tried to upload the
attached file, but the indexed content of the file is only "\n \n
\n....".
However, if I run tesseract from the command line, I get the correct result.


The log when Solr receives my request:
-----------
INFO  - 2015-04-23 03:49:25.941;
org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
webapp=/solr path=/update/extract params={literal.groupid=2&json.nl=flat&resource.name=phplNiPrs&literal.id=4&commit=true&extractOnly=false&literal.historyid=4&omitHeader=true&literal.userid=3&literal.createddate=2015-04-22T15:00:00Z&fmap.content=content&wt=json&literal.filename=\\trunght\test\tesseract_3.png}
------------
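
For reference, the request in the log above could be reproduced roughly as
follows. This is only a minimal Python sketch: the host and port
(localhost:8983) are an assumption, since they do not appear in the log, and
the helper names build_extract_url and post_file are hypothetical. Only a
subset of the logged parameters is included. Passing extract_only=True sets
extractOnly=true, which asks Solr to return the Tika output without indexing
it; that may help show whether any text was extracted at all.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: host and port are not in the log; adjust to your installation.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def build_extract_url(base=SOLR_CORE_URL, extract_only=False):
    """Build an /update/extract URL using parameters taken from the log."""
    params = {
        "literal.id": "4",
        "literal.filename": r"\\trunght\test\tesseract_3.png",
        "fmap.content": "content",
        "resource.name": "phplNiPrs",
        "commit": "true",
        "extractOnly": "true" if extract_only else "false",
        "wt": "json",
    }
    return base + "/update/extract?" + urlencode(params)


def post_file(path, url, content_type="image/png"):
    """Send the file as the raw request body and return the response text."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = Request(url, data=body, headers={"Content-Type": content_type})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, post_file("tesseract_3.png", build_extract_url(extract_only=True))
would send the image and return whatever Solr responds with, assuming a Solr
instance is reachable at that address.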

The document as shown on the Solr admin page:
-------------
{ "groupid": 2, "id": "4", "historyid": 4, "userid": 3, "createddate":
"2015-04-22T15:00:00Z", "filename": "\\\\trunght\\test\\tesseract_3.png",
"autocomplete_text": [ "\\\\trunght\\test\\tesseract_3.png" ], "content": "
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n \n \n \n \n \n \n \n \n \n \n \n ", "_version_": 1499213034586898400 }
-----------

Since I am a Solr newbie, I do not know where to look. Can anyone advise me
on where to look for errors, or which settings to change to make it work?
Thanks in advance.

Trung.
