Thanks Hannes,

The tool looks good.

Sincerely,
Sithu D Sudarsan

sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu

-----Original Message-----
From: hannesc...@googlemail.com [mailto:hannesc...@googlemail.com] On Behalf Of Hannes Carl Meyer
Sent: Thursday, February 26, 2009 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Use of scanned documents for text extraction and indexing

Hi Sithu,

there is a project called ocropus done by the DFKI, check the online demo here:
http://demo.iupr.org/cgi-bin/main.cgi

And also http://sites.google.com/site/ocropus/

Regards

Hannes

m...@hcmeyer.com
http://mimblog.de

On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <sithu.sudar...@fda.hhs.gov> wrote:

> Hi All:
>
> Is there any study / research done on using scanned paper documents as
> images (may be PDF), and then use some OCR or other technique for
> extracting text, and the resultant index quality?
>
> Thanks in advance,
> Sithu D Sudarsan
>
> sithu.sudar...@fda.hhs.gov
> sdsudar...@ualr.edu