Again, I think you are best off doing it outside of Solr. But even if you want to get it to work in Solr, I think you should start by getting it to work directly in Tika. Then, get the missing libraries and configuration into Solr.
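A rough sketch of the "outside of Solr" route, in case it helps: send the image to a standalone Tika Server, take the plain text it returns, and post that to Solr as an ordinary document. The assumptions here are mine, not from your setup: Tika Server listening on localhost:9998 with tesseract installed on that machine, a core named "mycore" (hypothetical), and a text field called "content_txt" (hypothetical; adjust to your schema).

```python
import json
import urllib.request

# Assumed addresses; change these to match your own installation.
TIKA_URL = "http://localhost:9998/tika"
SOLR_URL = "http://localhost:8983/solr/mycore/update?commit=true"  # "mycore" is hypothetical


def build_tika_request(image_bytes, content_type="image/png", tika_url=TIKA_URL):
    """Build a PUT request asking Tika Server for the plain-text extraction."""
    return urllib.request.Request(
        tika_url,
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": content_type, "Accept": "text/plain"},
    )


def build_solr_request(doc_id, text, solr_url=SOLR_URL):
    """Build a JSON update request carrying the extracted text to Solr."""
    # "content_txt" is a hypothetical field name; use one defined in your schema.
    payload = json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")
    return urllib.request.Request(
        solr_url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def index_image(path, doc_id):
    """Extract text from one image via Tika, then index it in Solr."""
    with open(path, "rb") as f:
        image_bytes = f.read()
    with urllib.request.urlopen(build_tika_request(image_bytes)) as resp:
        text = resp.read().decode("utf-8")
    with urllib.request.urlopen(build_solr_request(doc_id, text)) as resp:
        return resp.status
```

Calling index_image("scan.png", "img-1") would then do both steps for a single file. The point of splitting it this way is that the OCR work happens in the Tika process, so a slow or failing extraction does not tie up Solr.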
Regards,
   Alex

On Wed, Oct 23, 2019, 7:08 PM suresh pendap, <sureshpen...@gmail.com> wrote:

> Hi Alex,
> Thanks for your reply. How do we integrate tesseract with Solr? Do we have
> to implement Custom update processor or extend the
> ExtractingRequestProcessor?
>
> Regards
> Suresh
>
> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
> > I believe Tika that powers this can do so with extra libraries (tesseract?)
> > But Solr does not bundle those extras.
> >
> > In any case, you may want to run Tika externally to avoid the
> > conversion/extraction process be a burden to Solr itself.
> >
> > Regards,
> >    Alex
> >
> > On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I am reading the Solr documentation about integration with Tika and Solr
> > > Cell framework over here
> > >
> > > https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
> > >
> > > I would like to know if the can Solr Cell framework also be used to extract
> > > text from the image files?
> > >
> > > Regards
> > > Suresh