RE: Use of scanned documents for text extraction and indexing

Sudarsan, Sithu D. Thu, 26 Feb 2009 08:51:41 -0800

Thanks Hannes,

The tool looks good.


Sincerely,
Sithu D Sudarsan

sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu

-----Original Message-----
From: hannesc...@googlemail.com [mailto:hannesc...@googlemail.com] On
Behalf Of Hannes Carl Meyer
Sent: Thursday, February 26, 2009 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Use of scanned documents for text extraction and indexing

Hi Sithu,

There is a project called OCRopus, developed by the DFKI. Check the online
demo here: http://demo.iupr.org/cgi-bin/main.cgi

And also http://sites.google.com/site/ocropus/

Regards

Hannes

m...@hcmeyer.com
http://mimblog.de

On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <
sithu.sudar...@fda.hhs.gov> wrote:

>
> Hi All:
>
> Has any study or research been done on taking scanned paper documents as
> images (maybe PDF), extracting the text with OCR or another technique,
> and assessing the resulting index quality?
>
>
> Thanks in advance,
> Sithu D Sudarsan
>
> sithu.sudar...@fda.hhs.gov
> sdsudar...@ualr.edu
>
>
>
