Jakub Wilk:
> I believe this is now fixed in upstream VCS. Would you mind giving it a
> try? You can download the snapshot at:
> https://bitbucket.org/jwilk/ocrodjvu/get/6a8a22af7232.tar.gz
Hi Jakub,
I tested the current HEAD (9bf208e1f372) from the Mercurial repo and the
--fix-utf8 option works
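
For readers following the thread, here is a minimal sketch of the general idea of sanitizing invalid UTF-8 in OCR output before it reaches an HTML parser. It is not ocrodjvu's actual implementation of --fix-utf8; the function name fix_utf8 and the sample bytes are assumptions made up for illustration.

```python
def fix_utf8(raw: bytes) -> bytes:
    """Return `raw` with every invalid UTF-8 sequence replaced by U+FFFD.

    Hypothetical helper for illustration only; not taken from ocrodjvu.
    """
    # errors="replace" substitutes U+FFFD for bytes that cannot be decoded,
    # so decoding never raises UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")
    # Re-encoding yields a byte string that is guaranteed to be valid UTF-8.
    return text.encode("utf-8")


# Made-up sample: Latin-1 bytes for "ü" and "ß" embedded in hOCR-like markup.
broken = b"<span class='ocr_line'>Gr\xfc\xdfe</span>"
cleaned = fix_utf8(broken)

# The cleaned bytes now decode without raising an exception.
print(cleaned.decode("utf-8"))
```

Replacing the bad bytes loses the original characters, but it lets later processing continue instead of stopping with an exception.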
I believe this is now fixed in upstream VCS. Would you mind giving it a
try? You can download the snapshot at:
https://bitbucket.org/jwilk/ocrodjvu/get/6a8a22af7232.tar.gz
--
Jakub Wilk
* Thomas Koch , 2012-05-07, 08:35:
attached the minimal hOCR test file that Jeffrey Ratcliffe uses.
Thanks. Ideally the HTML parser would take care of handling such errors,
but it's not the case:
https://bugs.launchpad.net/lxml/+bug/690110
http://bugs.debian.org/671842
I'll probably implemen
Jakub Wilk:
> Could you attach the HTML files to the bug report (or, alternatively,
> send them to me in a private mail)?
Hi Jakub,
Thank you for responding so quickly. I reported the same issue to gscan2pdf
and attached the minimal hOCR test file that Jeffrey Ratcliffe uses.
The tesseract utf-
* Thomas Koch , 2012-05-06, 20:56:
Tesseract tends to produce non-UTF-8 characters from time to time. I
tried only German (deu) so far. Even if that seems to be an error with
Tesseract, it would be good if ocrodjvu could continue working.
First exception, without the --html5 option:
Traceback (mos
Package: ocrodjvu
Version: 0.7.9-1
Severity: normal
Hi,
I already reported the same problem on gscan2pdf. Tesseract tends to produce
non-UTF-8 characters from time to time. I tried only German (deu) so far. Even if
that seems to be an error with te