e.org; Shashi Kant
Subject: Re: Use of scanned documents for text extraction and indexing
Check this:
http://code.google.com/p/ocropus/wiki/FrequentlyAskedQuestions
> How well does it work?
>
> The character recognition accuracy of OCRopus right now (04/2007) is about
> like Tesseract. That'
On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant wrote:
>
> > Another project worth investigating is Tesseract.
> >
> > http://code.google.com/p/tesseract-ocr/
> >
> > - Original Message
> > From: Hannes Carl Meyer
> > To: solr-user@lucene.a
You can use Tesseract, an open-source OCR engine sponsored by Google. It is
native C/C++ code, so to use it from Java you need JNI or direct process
creation. There is no PDF support, but you can use ImageMagick to
convert those documents on the fly. The engine scans documents line by line
without trying
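
A minimal sketch of the "direct process creation" route described above, written in Python for brevity; the same two commands could be launched from Java with ProcessBuilder. It assumes the `convert` (ImageMagick, with Ghostscript installed for PDF input) and `tesseract` executables are on the PATH, and that the installed Tesseract build can read PNG input. The function names, the 300 dpi setting, and the `page-%04d.png` naming are illustrative choices, not anything prescribed by either tool.

```python
import subprocess
from pathlib import Path


def convert_cmd(pdf_path, out_pattern, dpi=300):
    """Build the ImageMagick command that rasterizes each PDF page to an image."""
    # -density placed before the input sets the rasterization resolution.
    return ["convert", "-density", str(dpi), str(pdf_path), str(out_pattern)]


def tesseract_cmd(image_path, out_base, lang="eng"):
    """Build the Tesseract command; the engine writes its text to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir):
    """Convert a scanned PDF to page images, OCR each page, return the joined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: PDF -> one image per page (hypothetical naming: page-0000.png, ...).
    subprocess.run(convert_cmd(pdf_path, work_dir / "page-%04d.png"), check=True)
    pages = []
    # Step 2: run Tesseract once per page image, in page order.
    for image in sorted(work_dir.glob("page-*.png")):
        out_base = image.with_suffix("")  # strip ".png"; Tesseract appends ".txt"
        subprocess.run(tesseract_cmd(image, out_base), check=True)
        pages.append(Path(str(out_base) + ".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

The string returned by `ocr_pdf("scan.pdf", "ocr-work")` could then be sent to Solr as an ordinary text field. Running one process per page keeps the sketch simple; for large batches a JNI binding or a long-running worker would avoid the repeated startup cost.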
February 26, 2009 9:21:07 PM
Subject: Re: Use of scanned documents for text extraction and indexing
Tesseract is pure OCR. Ocropus builds on Tesseract.
Vikram
On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant wrote:
> Another project worth investigating is Tesseract.
>
> http://code.google.
Carl Meyer
> To: solr-user@lucene.apache.org
> Sent: Thursday, February 26, 2009 11:35:14 AM
> Subject: Re: Use of scanned documents for text extraction and indexing
>
> Hi Sithu,
>
> there is a project called ocropus done by the DFKI, check the online demo
> here: http
There is quite a bit of literature available on this topic. This paper
presents a summary. Nothing immediately applicable, I'm afraid.
Retrieving OCR Text: A survey of current approaches
Steven M. Beitzel, Eric C. Jensen, David A. Grossman
Illinois Institute of Technology
It lists a number of othe
Another project worth investigating is Tesseract.
http://code.google.com/p/tesseract-ocr/
- Original Message
From: Hannes Carl Meyer
To: solr-user@lucene.apache.org
Sent: Thursday, February 26, 2009 11:35:14 AM
Subject: Re: Use of scanned documents for text extraction and indexing
Hi Sithu,
There is a project called OCRopus, developed by the DFKI; check the online demo
here: http://demo.iupr.org/cgi-bin/main.cgi
And also http://sites.google.com/site/ocropus/
Regards
Hannes
m...@hcmeyer.com
http://mimblog.de
On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <
sithu.sudar...