Re: regarding Extracting text from Images

Alexandre Rafalovitch Wed, 23 Oct 2019 16:17:06 -0700

Again, I think you are best to do it out of Solr.

But even of you want to get it to work in Solr, I think you start by
getting it to work directly in Tika. Then, get the missing libraries and
configuration into Solr.


Regards,
    Alex

On Wed, Oct 23, 2019, 7:08 PM suresh pendap, <sureshpen...@gmail.com> wrote:

> Hi Alex,
> Thanks for your reply. How do we integrate tesseract with Solr?  Do we have
> to implement Custom update processor or extend the
> ExtractingRequestProcessor?
>
> Regards
> Suresh
>
> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com
> >
> wrote:
>
> > I believe Tika that powers this can do so with extra libraries
> (tesseract?)
> > But Solr does not bundle those extras.
> >
> > In any case, you may want to run Tika externally to avoid the
> > conversion/extraction process be a burden to Solr itself.
> >
> > Regards,
> >      Alex
> >
> > On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I am reading the Solr documentation about integration with Tika and
> Solr
> > > Cell framework over here
> > >
> > >
> >
> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
> > >
> > > I would like to know if the can Solr Cell framework also be used to
> > extract
> > > text from the image files?
> > >
> > > Regards
> > > Suresh
> > >
> >
>

Re: regarding Extracting text from Images

Reply via email to