On Wed, 2001-11-14 at 06:35, Andrew Perrin wrote:
> Greetings.
>
> I've just received a grant for a project that will involve scanning and
> storing a substantial number (e.g., around 3000) of short documents. These
> documents will be analyzed as text, which means I'll have to use OCR
> software as well as a scanner with an automatic document feed.
>
> The possibility exists of purchasing a new machine to do this with, but my
> preference is to buy a scanner and use software (free preferred, but will
> buy if necessary) that will work with my current machine. I would be
> grateful for any advice or experiences others have had with scanning
> and/or OCR under linux, particularly debian.

I can only give advice on scanning hardware, since I scan photographs,
not documents. For the smallest possible headache, choose a SCSI
scanner. USB and IEEE-1394 support are both still immature in Linux,
while the SCSI code is very solid.

There is an image scanning package called VueScan at
http://www.hamrick.com/vsm.html . In my opinion VueScan is the best
scanning package for Linux, Mac, and Windows. It supports essentially
all SCSI and USB scanners, and a sprinkling of 1394 scanners. Perhaps
you could use it to image all of your documents and then use a separate
package to OCR the images.

There is also an OCR package from Mentalix called Pixel!FX. It supports
only SCSI scanners, and I believe it is very expensive.

-jwb