Re: Use of scanned documents for text extraction and indexing

Vikram Kumar Thu, 26 Feb 2009 18:21:42 -0800

Tesseract is a pure OCR engine. OCRopus builds on Tesseract.
Vikram

On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant <[email protected]> wrote:


> Another project worth investigating is Tesseract.
>
> http://code.google.com/p/tesseract-ocr/
>
> ----- Original Message ----
> From: Hannes Carl Meyer <[email protected]>
> To: [email protected]
> Sent: Thursday, February 26, 2009 11:35:14 AM
> Subject: Re: Use of scanned documents for text extraction and indexing
>
> Hi Sithu,
>
> There is a project called OCRopus, developed by the DFKI. Check the online
> demo here: http://demo.iupr.org/cgi-bin/main.cgi
>
> And also http://sites.google.com/site/ocropus/
>
> Regards
>
> Hannes
>
> [email protected]
> http://mimblog.de
>
> On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <
> [email protected]> wrote:
>
> >
> > Hi All:
> >
> > Has any study or research been done on using scanned paper documents as
> > images (perhaps PDF), then applying OCR or another technique to extract
> > the text, and on the quality of the resulting index?
> >
> >
> > Thanks in advance,
> > Sithu D Sudarsan
> >
> > [email protected]
> > [email protected]
> >
> >
> >
>
>
