Package: ocrodjvu
Version: 0.4.2-1
Severity: normal

When processing a copy of 

http://fleksem.klf.uw.edu.pl/~jsbien/tmp/Trotz1/Trotz.djvu

with 

ocrodjvu --language deu-f --render all -o Troc_deu-f.djvu --word-segmentation uax29 Trotz1/Trotz.djvu

ocrodjvu crashed after 14 hours with the message:

--8<---------------cut here---------------start------------->8---

- Page #1284
ocroscript: /usr/share/ocropus/scripts//lib/hocr.lua:28: rectangle parsing error
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.5/threading.py", line 446, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 443, in page_thread
    result = self.process_page(page)
  File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 428, in process_page
    html_file.close()
  File "/usr/lib/python2.5/contextlib.py", line 33, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/share/ocrodjvu/lib/_ocrodjvu.py", line 189, in recognize
    ocropus.wait()
  File "/usr/share/ocrodjvu/lib/ipc.py", line 58, in wait
    raise CalledProcessError(return_code, self.__command)
CalledProcessError: Command 'ocroscript' returned non-zero exit status 1

--8<---------------cut here---------------end--------------->8---

I have several wishlist items related to the problem:

1. There should be a way to preserve ocrodjvu.djvused in case of a
   crash.

2. The user should be able to choose where debugging output is stored.
   My system partition is rather small, so debugging a large ocrodjvu
   job currently requires splitting it into smaller ones, which is
   cumbersome.

3. In case the temporary files are preserved, it would be useful to be
   able to resume processing.
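
To make the three items concrete, here is a minimal sketch of the
behaviour I have in mind. It is not ocrodjvu code: the function
run_resumable, the recognize callback, the workdir argument and the
page-NNNNNN.txt file names are all hypothetical and serve only as an
illustration. Each page result goes into a user-chosen directory, the
directory is left intact if any page fails, and a second run skips the
pages that are already done.

```python
import os
import shutil


def run_resumable(pages, recognize, workdir, keep=False):
    """Process pages one by one, storing each result in workdir.

    Hypothetical illustration only, not part of ocrodjvu.
    Pages whose result file already exists are skipped, so rerunning
    after a crash continues where the previous run stopped.
    """
    # Item 2: the caller decides where intermediate files live.
    os.makedirs(workdir, exist_ok=True)
    results = {}
    for page in pages:
        out = os.path.join(workdir, "page-%06d.txt" % page)
        # Item 3: a finished page is never recognized twice.
        if not os.path.exists(out):
            # Item 1: if recognize() raises, workdir stays on disk.
            text = recognize(page)
            part = out + ".part"
            with open(part, "w", encoding="utf-8") as f:
                f.write(text)
            # Rename only after the file is fully written, so a crash
            # cannot leave a truncated result that looks complete.
            os.replace(part, out)
        with open(out, encoding="utf-8") as f:
            results[page] = f.read()
    if not keep:
        # Reached only when every page succeeded.
        shutil.rmtree(workdir)
    return results
```

With something like this, a failure on page 1284 would cost only the
time spent on that page: after the cause is fixed, the same call with
the same workdir would continue from there instead of repeating 14
hours of work.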

Best regards

Janusz

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-trunk-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages ocrodjvu depends on:
ii  djvulibre-bin                 3.5.22-8   Utilities for the DjVu image forma
ii  python                        2.5.4-9    An interactive high-level object-o
ii  python-argparse               1.1-1      optparse-inspired command-line par
ii  python-djvu                   0.1.17-1   Python support for the DjVu image 
ii  python-lxml                   2.2.6-1    pythonic binding for the libxml2 a
ii  python-support                1.0.6.1    automated rebuilding support for P

Versions of packages ocrodjvu recommends:
ii  ocropus                       0.3.1-2    document analysis and OCR system
ii  python-pyicu                  0.9-2      Python extension wrapping the ICU 
ii  tesseract-ocr                 2.04-2     Command line OCR tool

Versions of packages ocrodjvu suggests:
pn  cuneiform                     <none>     (no description available)

-- no debconf information


-- 
                     ,   
dr hab. Janusz S. Bien, prof. UW -  Uniwersytet Warszawski (Katedra Lingwistyki 
Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


