Re: Indexing scanned PDFs

Alexandre Rafalovitch Mon, 05 May 2014 22:32:18 -0700

Nothing that I am aware of in Solr itself. You may have better luck
asking on the Tika mailing list, since Tika is what Solr uses under
the covers to extract text from PDFs. A quick search for Tika and OCR
brings up a number of links.
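
If nothing on the Tika side fits, one workaround is to run OCR outside
Solr and post the resulting text as an ordinary document. The following
is only a rough sketch of that idea, not something Solr or Tika ships.
It assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools
are installed, that your schema has a text field named `content`
(hypothetical), and that the update URL you pass in points at your own
core's update handler.

```python
import json
import subprocess
import tempfile
import urllib.request
from pathlib import Path


def ocr_pdf(pdf_path):
    """Rasterize a scanned PDF and OCR each page; return the joined text.

    Assumes the external `pdftoppm` and `tesseract` commands are on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        # Write one PNG per page at 300 DPI, named page-<number>.png.
        subprocess.run(
            ["pdftoppm", "-r", "300", "-png", str(pdf_path), prefix],
            check=True,
        )
        # Order pages by their numeric suffix rather than by string.
        images = sorted(
            Path(tmp).glob("page-*.png"),
            key=lambda p: int(p.stem.rsplit("-", 1)[1]),
        )
        pages = []
        for image in images:
            # Passing "stdout" as the output base prints the text.
            result = subprocess.run(
                ["tesseract", str(image), "stdout"],
                check=True,
                capture_output=True,
                text=True,
            )
            pages.append(result.stdout)
    return "\n".join(pages)


def build_update_body(doc_id, text):
    """Encode one document as a JSON array for Solr's update handler.

    The field name "content" is an assumption; use your schema's field.
    """
    docs = [{"id": doc_id, "content": text.strip()}]
    return json.dumps(docs).encode("utf-8")


def post_to_solr(update_url, body):
    """POST the JSON body to the given update URL; return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would be along the lines of
`post_to_solr(url, build_update_body("scan-1", ocr_pdf("scan.pdf")))`,
where `url` is your core's update URL with `commit=true` appended if
you want the document visible right away.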


Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 12:15 PM, Chandan Tamrakar
<chandan.tamra...@nepasoft.com> wrote:
> We are using Solr to index PDF documents, but in some cases the PDFs
> are scanned documents with no text to extract and index.
>
> Is there a plugin or module for Solr that we can integrate so that it
> would OCR the scanned pages, extract the text, and then index it?
>
>
> Thanks in advance
>
> Chandan Tamrakar
