e.org; Shashi Kant
Subject: Re: Use of scanned documents for text extraction and indexing
Check this:
http://code.google.com/p/ocropus/wiki/FrequentlyAskedQuestions
> How well does it work?
>
> The character recognition accuracy of OCRopus right now (04/2007) is about
> like Tesseract. That'
Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant
> wrote:
>
> > Another project worth investigating is Tesseract.
> >
> > http://code.google.com/p/tesseract-ocr/
> >
> > ----- Original Message -----
> > From: Hannes Carl Meyer
> > To: solr-user@lucene.a
terested in this too.
--Renaud
-----Original Message-----
From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov]
Sent: Thursday, February 26, 2009 8:29 AM
To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
Subject: Use of scanned documents for text extraction and indexing
Hi All:
ebruary 26, 2009 9:21:07 PM
Subject: Re: Use of scanned documents for text extraction and indexing
Tesseract is pure OCR. OCRopus builds on Tesseract.
Vikram
On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant wrote:
> Another project worth investigating is Tesseract.
>
> http://code.google.
Carl Meyer
> To: solr-user@lucene.apache.org
> Sent: Thursday, February 26, 2009 11:35:14 AM
> Subject: Re: Use of scanned documents for text extraction and indexing
>
> Hi Sithu,
>
> There is a project called OCRopus, developed by the DFKI; check the online demo
> here: http
apache.org
Subject: Use of scanned documents for text extraction and indexing
Hi All:
Has any study or research been done on using scanned paper documents as
images (maybe PDF), then using OCR or some other technique to extract the
text, and on the resulting index quality?
Thanks in advanc
Another project worth investigating is Tesseract.
http://code.google.com/p/tesseract-ocr/
----- Original Message -----
From: Hannes Carl Meyer
To: solr-user@lucene.apache.org
Sent: Thursday, February 26, 2009 11:35:14 AM
Subject: Re: Use of scanned documents for text extraction and indexing
@lucene.apache.org
Subject: Re: Use of scanned documents for text extraction and indexing
Hi Sithu,
There is a project called OCRopus, developed by the DFKI; check the online demo
here: http://demo.iupr.org/cgi-bin/main.cgi
And also http://sites.google.com/site/ocropus/
Regards
Hannes
m
Hi Sithu,
There is a project called OCRopus, developed by the DFKI; check the online demo
here: http://demo.iupr.org/cgi-bin/main.cgi
And also http://sites.google.com/site/ocropus/
Regards
Hannes
m...@hcmeyer.com
http://mimblog.de
On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <
sithu.sudar...
Hi All:
Has any study or research been done on using scanned paper documents as
images (maybe PDF), then using OCR or some other technique to extract the
text, and on the resulting index quality?
Thanks in advance,
Sithu D Sudarsan
sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu