I'm also working on this issue right now: extracting the text from scanned images in PDF files.
From what I know, we can use Tesseract OCR to extract the text from the images through Apache Tika, which is bundled with Solr. By the way, which Solr version are you using?

Regards,
Edwin

On 26 March 2017 at 19:58, Waleed Raza <waleed.raza.parhi...@gmail.com> wrote:
> Hello,
> I want to ask how we can extract text in Solr from images that are inside PDF and MS Office documents.
> I found many websites but did not get an answer. Please guide me.
>
> On Sun, Mar 26, 2017 at 2:57 PM, Waleed Raza <waleed.raza.parhiyar@gmail.com> wrote:
> > Hello,
> > I want to ask how we can extract text in Solr from images that are inside PDF and MS Office documents.
> > I found many websites but did not get an answer. Please guide me.
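For the Tika route, a minimal sketch of sending a file to Solr Cell (the ExtractingRequestHandler at `/update/extract`) might look like the following. It assumes Solr Cell is enabled for your core, Solr runs on the default `localhost:8983`, and the core is named `mycore` (a hypothetical name). It also assumes Tesseract is installed on the Solr host and on its PATH, since Tika only uses its Tesseract OCR parser when the `tesseract` binary is available. For images embedded inside PDFs, Tika's PDF parser may additionally need inline-image extraction enabled, depending on your Tika/Solr version, so please check that against your setup.

```python
import mimetypes
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed values: the default Solr port and a hypothetical core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"


def build_extract_url(base: str, core: str, doc_id: str,
                      commit: bool = True, extract_only: bool = False) -> str:
    """Build a URL for Solr Cell's /update/extract handler."""
    params = {"literal.id": doc_id}  # sets the document's unique id field
    if commit:
        params["commit"] = "true"  # make the document searchable right away
    if extract_only:
        params["extractOnly"] = "true"  # return Tika's output without indexing
    return f"{base.rstrip('/')}/{core}/update/extract?{urllib.parse.urlencode(params)}"


def index_file(path: str, doc_id: str, extract_only: bool = False) -> bytes:
    """POST the raw file bytes to Solr; Tika (and Tesseract, if installed) extracts the text."""
    data = Path(path).read_bytes()
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    req = urllib.request.Request(
        build_extract_url(SOLR_BASE, CORE, doc_id,
                          commit=not extract_only, extract_only=extract_only),
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # Solr's response body
```

A useful first check is `index_file("scanned.pdf", "doc1", extract_only=True)` (the file name is hypothetical): if the response contains no text from the scanned pages, OCR is not being applied and the Tesseract/Tika configuration is the place to look.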