Package: fuzzyocr3
Version: 3.5.1-2
Severity: normal
Tags: patch

Since version 2.03, tesseract requires the input image file to have a .tif
extension. However, FuzzyOcr names the maketiff output prep.maketiff.out, so
tesseract fails to read it, as this log excerpt shows:

Exec  : pnmtotiff -color -truecolor
Stdin : </tmp/.spamassassin15003RD32Twtmp/me.pnm
Stdout: >/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
Stderr: >/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.err
Exec  : /usr/bin/tesseract /tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out /tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.out
Stdout: >/dev/null
Stderr: >/tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.err
Elapsed [31574]: 0.233304 sec. (/usr/bin/tesseract: exit 31)
Unable to read output from "/tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.out.txt" for scanset tesseract
Errors in Scanset "tesseract"
Return code: 7936, Error: Tesseract Open Source OCR Engine
   name_to_image_type:Error:Unrecognized image type:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
   IMAGE::read_header:Error:Can't read this image type:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
   /usr/bin/tesseract:Error:Read of file /failed:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
   Signal_exit 31 ABORT. LocCode: 3  AbortCode: 3
Skipping scanset because of errors, trying next...

I worked around this by having the maketiff preprocessor write its output to
a file with a .tif extension, and by making the scanset code use that name
when it passes the maketiff output on to the next step.

--- Preprocessor.pm.ORIG        2008-05-15 18:24:22.000000000 +0200
+++ Preprocessor.pm     2008-05-15 18:51:03.000000000 +0200
@@ -15,6 +15,9 @@ sub run {
     my $tmpdir = FuzzyOcr::Config::get_tmpdir();
     my $label = $self->{label};
     my $output = "$tmpdir/prep.$label.out";
+    if ($label =~ /maketiff/) {
+        $output = "$tmpdir/prep.$label.tif";
+    }
     my $stderr = ">$tmpdir/prep.$label.err";
 
     my $stdin = undef;
--- Scanset.pm.ORIG     2008-05-15 18:56:11.000000000 +0200
+++ Scanset.pm  2008-05-15 19:03:26.000000000 +0200
@@ -63,7 +63,12 @@ sub run {
                 return ($retcode,@result);
             }
             # Input of next processor is output of last
-            $input = "$tmpdir/prep.$plabel.out";
+            # Output name of maketiff is special!
+            if ($plabel =~ /maketiff/) {
+                $input = "$tmpdir/prep.$plabel.tif";
+            } else {
+                $input = "$tmpdir/prep.$plabel.out";
+            }
         }
     }

It is not the nicest solution, but it works :). The alternative would be to
remove the .tif extension requirement from tesseract; I'll leave that
discussion to you Debian developers. :)
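If the workaround is kept, the duplicated special case could be moved into
one place. The following is only a rough sketch: `prep_output_name` is a
hypothetical helper, not part of FuzzyOcr, and it assumes (as the patch does)
that only labels matching /maketiff/ produce TIFF output.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FuzzyOcr: builds the path a
# preprocessor writes to, so Preprocessor.pm (its output) and
# Scanset.pm (input of the next step) always agree on the name.
# Assumption: as in the patch, only labels matching /maketiff/
# emit TIFF, which tesseract since 2.03 wants with a .tif extension.
sub prep_output_name {
    my ($tmpdir, $label) = @_;
    my $ext = ($label =~ /maketiff/) ? 'tif' : 'out';
    return "$tmpdir/prep.$label.$ext";
}

# Example calls; the directory and 'somelabel' are illustrative only.
print prep_output_name('/tmp/fuzzy', 'maketiff'), "\n";   # /tmp/fuzzy/prep.maketiff.tif
print prep_output_name('/tmp/fuzzy', 'somelabel'), "\n";  # /tmp/fuzzy/prep.somelabel.out
```

With such a helper, Preprocessor.pm would set `$output` to
`prep_output_name($tmpdir, $label)` and Scanset.pm would set `$input` to
`prep_output_name($tmpdir, $plabel)`, instead of each building the path
inline.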

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.23.14 (PREEMPT)
Locale: LANG=C, LC_CTYPE=en_US.ISO-8859-15 (charmap=ISO-8859-15)
Shell: /bin/sh linked to /bin/bash

Versions of packages fuzzyocr3 depends on:
ii  gifsicle                     1.49-1      Tool for manipulating GIF images
ii  gocr                         0.41-1+b1   A command line OCR
ii  libmldbm-sync-perl           0.30-2      Perl module for safe concurrent ac
ii  libstring-approx-perl        3.25-1+b1   Perl extension for approximate mat
ii  libungif-bin                 4.1.6-4     library for GIF images (transition
ii  netpbm                       2:10.0-11.1 Graphics conversion tools
ii  ocrad                        0.17-3      Optical Character Recognition prog
ii  perl [libdigest-md5-perl]    5.10.0-10   Larry Wall's Practical Extraction 
ii  spamassassin                 3.2.4-1     Perl-based spam filter using text 

fuzzyocr3 recommends no packages.

-- no debconf information