Package: fuzzyocr3
Version: 3.5.1-2
Severity: normal
Tags: patch

Since version 2.03, tesseract requires the image file to have a .tif
extension to work properly. However, FuzzyOcr uses the filename
prep.maketiff.out. This stops tesseract from working, as seen from this
log entry:

Exec : pnmtotiff -color -truecolor
Stdin : </tmp/.spamassassin15003RD32Twtmp/me.pnm
Stdout: >/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
Stderr: >/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.err
Exec : /usr/bin/tesseract /tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out /tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.out
Stdout: >/dev/null
Stderr: >/tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.err
Elapsed [31574]: 0.233304 sec. (/usr/bin/tesseract: exit 31)
Unable to read output from "/tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.out.txt" for scanset tesseract
Errors in Scanset "tesseract"
Return code: 7936, Error:
Tesseract Open Source OCR Engine
name_to_image_type:Error:Unrecognized image type:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
IMAGE::read_header:Error:Can't read this image type:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
/usr/bin/tesseract:Error:Read of file /failed:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3
Skipping scanset because of errors, trying next...

I patched around this by making sure the maketiff preprocessor uses a
different output name and by having the scanner part know this.

--- Preprocessor.pm.ORIG	2008-05-15 18:24:22.000000000 +0200
+++ Preprocessor.pm	2008-05-15 18:51:03.000000000 +0200
@@ -15,6 +15,9 @@ sub run {
     my $tmpdir = FuzzyOcr::Config::get_tmpdir();
     my $label = $self->{label};
     my $output = "$tmpdir/prep.$label.out";
+    if ($label =~ /maketiff/) {
+        $output = "$tmpdir/prep.$label.tif";
+    }
     my $stderr = ">$tmpdir/prep.$label.err";
     my $stdin = undef;

--- Scanset.pm.ORIG	2008-05-15 18:56:11.000000000 +0200
+++ Scanset.pm	2008-05-15 19:03:26.000000000 +0200
@@ -63,7 +63,12 @@ sub run {
                 return ($retcode,@result);
             }
             # Input of next processor is output of last
-            $input = "$tmpdir/prep.$plabel.out";
+            # Output name of maketiff is special!
+            if ($plabel =~ /maketiff/) {
+                $input = "$tmpdir/prep.$plabel.tif";
+            } else {
+                $input = "$tmpdir/prep.$plabel.out";
+            }
         }
     }

It is not the nicest solution, but it works :). The other solution would
be to have the .tif filename extension requirement removed from
tesseract. I'll leave that discussion to you Debian developers... :)

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.23.14 (PREEMPT)
Locale: LANG=C, LC_CTYPE=en_US.ISO-8859-15 (charmap=ISO-8859-15)
Shell: /bin/sh linked to /bin/bash

Versions of packages fuzzyocr3 depends on:
ii  gifsicle                     1.49-1       Tool for manipulating GIF images
ii  gocr                         0.41-1+b1    A command line OCR
ii  libmldbm-sync-perl           0.30-2       Perl module for safe concurrent ac
ii  libstring-approx-perl        3.25-1+b1    Perl extension for approximate mat
ii  libungif-bin                 4.1.6-4      library for GIF images (transition
ii  netpbm                       2:10.0-11.1  Graphics conversion tools
ii  ocrad                        0.17-3       Optical Character Recognition prog
ii  perl [libdigest-md5-perl]    5.10.0-10    Larry Wall's Practical Extraction
ii  spamassassin                 3.2.4-1      Perl-based spam filter using text

fuzzyocr3 recommends no packages.

-- no debconf information
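
P.S. A possibly tidier variant of the workaround would keep the naming
rule in one place, so that Preprocessor.pm and Scanset.pm cannot drift
apart. The following is only a sketch under assumptions:
prep_output_path is a hypothetical helper that does not exist in
FuzzyOcr, the label "example" is made up for illustration, and where
such a helper would live is left open.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FuzzyOcr): returns the output file
# name for a preprocessor label, so the rule is written only once.
sub prep_output_path {
    my ($tmpdir, $label) = @_;
    # Same test as in the patch: maketiff output gets a .tif extension
    # so that tesseract accepts it; everything else keeps .out.
    my $ext = ($label =~ /maketiff/) ? 'tif' : 'out';
    return "$tmpdir/prep.$label.$ext";
}

print prep_output_path('/tmp/x', 'maketiff'), "\n";  # /tmp/x/prep.maketiff.tif
print prep_output_path('/tmp/x', 'example'), "\n";   # /tmp/x/prep.example.out
```

Under that assumption, Preprocessor.pm would set $output and Scanset.pm
would set $input from the same call, instead of each repeating the
maketiff test.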