Re: Index scanned documents

Zheng Lin Edwin Yeo Sun, 26 Mar 2017 08:43:05 -0700

I'm also working on this right now: extracting the text from scanned
images in PDF files.


From what I know, we can use Tesseract OCR to extract the text in the image
through Apache Tika, which comes bundled with Solr.
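
As a rough sketch of how this could be tried out (not a tested recipe): the
snippet below posts a PDF to Solr's extracting request handler, which hands
the file to Tika. The host, the core name "mycore", the document id and the
file name are placeholders I made up; it also assumes the handler is
registered at /update/extract, that Tesseract is installed on the Solr
machine so Tika can call it, and that Tika's PDF parser is configured to look
at embedded images, which may need extra setup depending on the version.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, core, doc_id, extract_only=True):
    """Build a URL for Solr's extracting request handler.

    The handler path /update/extract is an assumption; check solrconfig.xml.
    """
    params = {"literal.id": doc_id, "wt": "json"}
    if extract_only:
        # Return the extracted text without indexing, handy for checking OCR.
        params["extractOnly"] = "true"
    else:
        # Index the document and commit right away.
        params["commit"] = "true"
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to Solr and return the response body as text."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    req = Request(url, data=body,
                  headers={"Content-Type": "application/pdf"},
                  method="POST")
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder host, core, id and file name.
    url = build_extract_url("http://localhost:8983/solr", "mycore", "scan-1")
    print(post_pdf(url, "scanned.pdf"))
```

Running it with extractOnly first shows whether any OCR text comes back
before anything is written to the index.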

By the way, which Solr version are you using?

Regards,
Edwin


On 26 March 2017 at 19:58, Waleed Raza <waleed.raza.parhi...@gmail.com>
wrote:

> Hello
> I want to ask how we can extract text in Solr from images which
> are inside PDF and MS Office documents.
> I looked at many websites but did not find an answer. Please guide me.
