Re: Indexing PDF on SOLR 8.5

Fiz N Sun, 07 Jun 2020 14:21:19 -0700

Thanks Erick...

On Sun, Jun 7, 2020 at 1:50 PM Erick Erickson <erickerick...@gmail.com>
wrote:


> https://lucidworks.com/post/indexing-with-solrj/
>
>
> > On Jun 7, 2020, at 3:22 PM, Fiz N <fiznewy...@gmail.com> wrote:
> >
> > Thanks Jorn and Erick.
> >
> > Hi Erick, it looks like the skeletal SolrJ program attachment is missing.
> >
> > Thanks
> > Fiz
> >
> > On Sun, Jun 7, 2020 at 12:20 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Here’s a skeletal SolrJ program using Tika as another alternative.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Jun 7, 2020, at 2:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> >>>
> >>> You have to write an external application that creates multiple
> >>> threads, parses the PDFs, and indexes them in Solr. Ideally, you parse
> >>> the PDFs once, store the resulting text on a file system, and then
> >>> index it. The reason is that if you upgrade across two major versions
> >>> of Solr, you might need to reindex. You then save time because you
> >>> don’t need to parse the PDFs again.
> >>> This can also be useful if you are not yet sure about the final schema
> >>> and need to index several times with different schemas.
> >>>
> >>> You can also use Apache ManifoldCF.
> >>>
> >>>
> >>>
> >>>> Am 07.06.2020 um 19:19 schrieb Fiz N <fiznewy...@gmail.com>:
> >>>>
> >>>> Hello SOLR Experts,
> >>>>
> >>>> I am working on a POC to index millions of PDF documents stored in
> >>>> multiple folders on a file share.
> >>>>
> >>>> Could you please let me know the best practices and steps to implement it?
> >>>>
> >>>> Thanks
> >>>> Fiz Nadiyal.
> >>
> >>
>
>
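
For reference, Jörn's suggestion quoted above (parse the PDFs with multiple
threads, store the extracted text on disk, index from that text later) could
be sketched roughly as below in Java. This is only a minimal sketch under
stated assumptions, not a program from this thread: the class name
PdfTextCache, the method names, and the cache layout
(<cacheRoot>/<relative path>.txt) are all hypothetical, and the actual PDF
parsing is left as a pluggable function that in practice might wrap a parser
such as Apache Tika.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: extracts text from PDFs in parallel and caches it on disk.
class PdfTextCache {

    // Maps <sourceRoot>/a/b.pdf to <cacheRoot>/a/b.pdf.txt (assumed layout).
    static Path cachedTextPath(Path sourceRoot, Path pdf, Path cacheRoot) {
        Path relative = sourceRoot.relativize(pdf);
        return cacheRoot.resolve(relative.toString() + ".txt");
    }

    // Walks sourceRoot, runs the extractor on every PDF that has no cached
    // text yet, and returns the number of newly written text files.
    // The extractor is an assumption: any function that turns a PDF path
    // into plain text (for example, a wrapper around a PDF parser).
    static int extractAll(Path sourceRoot, Path cacheRoot, int threads,
                          Function<Path, String> extractor)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> pdfs;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            pdfs = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString()
                                     .toLowerCase().endsWith(".pdf"))
                       .collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path pdf : pdfs) {
                results.add(pool.submit(() -> {
                    Path out = cachedTextPath(sourceRoot, pdf, cacheRoot);
                    if (Files.exists(out)) {
                        return false; // already parsed in an earlier run
                    }
                    Files.createDirectories(out.getParent());
                    String text = extractor.apply(pdf);
                    Files.write(out, text.getBytes(StandardCharsets.UTF_8));
                    return true;
                }));
            }
            int extracted = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) { // rethrows any failure from a worker
                    extracted++;
                }
            }
            return extracted;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because text already present in the cache directory is skipped, rerunning the
job only parses new PDFs, and a later reindex (for example after a schema
change) can read the .txt files instead of the PDFs. Sending the cached text
to Solr, for instance with SolrJ, is a separate step not shown here.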
