Thanks, Erick.

On Sun, Jun 7, 2020 at 1:50 PM Erick Erickson <erickerick...@gmail.com> wrote:
> https://lucidworks.com/post/indexing-with-solrj/
>
> > On Jun 7, 2020, at 3:22 PM, Fiz N <fiznewy...@gmail.com> wrote:
> >
> > Thanks, Jörn and Erick.
> >
> > Hi Erick, it looks like the skeletal SolrJ program attachment is missing.
> >
> > Thanks,
> > Fiz
> >
> > On Sun, Jun 7, 2020 at 12:20 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Here’s a skeletal SolrJ program using Tika as another alternative.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Jun 7, 2020, at 2:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> >>>
> >>> You have to write an external application that creates multiple
> >>> threads, parses the PDFs, and indexes them in Solr. Ideally, you parse
> >>> the PDFs once, store the resulting text on some file system, and then
> >>> index it. The reason is that if you upgrade Solr by two major
> >>> versions, you might need to reindex, and then you save time because
> >>> you don’t need to parse the PDFs again.
> >>> It can also be useful if you are not yet sure about the final schema
> >>> and need to index several times with different schemas, etc.
> >>>
> >>> You can also use Apache ManifoldCF.
> >>>
> >>>> On 07.06.2020 at 19:19, Fiz N <fiznewy...@gmail.com> wrote:
> >>>>
> >>>> Hello Solr experts,
> >>>>
> >>>> I am working on a POC to index millions of PDF documents stored in
> >>>> multiple folders on a file share.
> >>>>
> >>>> Could you please let me know the best practices and steps to
> >>>> implement it?
> >>>>
> >>>> Thanks,
> >>>> Fiz Nadiyal.
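Jörn's advice (parse each PDF once, keep the extracted text on a file system, and index from that cache using several threads) can be sketched in Java, the language SolrJ is written for. This is a minimal, stdlib-only sketch under stated assumptions, not the program Erick referred to: `PdfPipelineSketch`, `TextExtractor`, `BatchIndexer`, and the `id`/`content` field names are all hypothetical. In a real program the extractor would presumably wrap Apache Tika (for example an `AutoDetectParser` with a `BodyContentHandler`), and the indexer would turn each map into a `SolrInputDocument` and send the batch with `SolrClient.add(...)`; neither dependency is shown here. It needs Java 11+ for `Files.readString`/`Files.writeString`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfPipelineSketch {

    /** Hypothetical hook: a real program would wrap Apache Tika here. */
    public interface TextExtractor {
        String extract(Path pdf) throws IOException;
    }

    /** Hypothetical hook: a real program would wrap a SolrJ SolrClient here. */
    public interface BatchIndexer {
        void index(List<Map<String, String>> docs) throws IOException;
    }

    /** Phase 1: parse every PDF under sourceRoot once, in parallel, caching text under cacheRoot. */
    public static List<Path> extractAll(Path sourceRoot, Path cacheRoot, TextExtractor extractor,
                                        int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> pdfs;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            pdfs = walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Path pdf : pdfs) {
                futures.add(pool.submit(() -> {
                    // Mirror the share's folder layout: <cacheRoot>/<relative path>.txt
                    Path out = cacheRoot.resolve(sourceRoot.relativize(pdf) + ".txt");
                    if (!Files.exists(out)) { // parsed in an earlier run: reuse the cached text
                        Files.createDirectories(out.getParent());
                        Files.writeString(out, extractor.extract(pdf), StandardCharsets.UTF_8);
                    }
                    return out;
                }));
            }
            List<Path> cached = new ArrayList<>();
            for (Future<Path> f : futures) {
                cached.add(f.get()); // rethrows any extraction failure
            }
            return cached;
        } finally {
            pool.shutdown();
        }
    }

    /** Phase 2: read cached text and hand it to the indexer in fixed-size batches. */
    public static int indexFromCache(Path cacheRoot, List<Path> cached, BatchIndexer indexer,
                                     int batchSize) throws IOException {
        List<Map<String, String>> batch = new ArrayList<>();
        int sent = 0;
        for (Path txt : cached) {
            Map<String, String> doc = new HashMap<>();
            // Hypothetical field names; they must match your actual Solr schema.
            doc.put("id", cacheRoot.relativize(txt).toString());
            doc.put("content", Files.readString(txt, StandardCharsets.UTF_8));
            batch.add(doc);
            if (batch.size() == batchSize) {
                indexer.index(batch);
                sent += batch.size();
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            indexer.index(batch);
            sent += batch.size();
        }
        return sent;
    }
}
```

Because the extracted text is cached as `.txt` files mirroring the share's folder layout, a later reindex (after a Solr upgrade or a schema change) only needs phase 2. A production version would also want to write each cache file atomically (for example to a temporary file followed by `Files.move`), so that a crash mid-write does not leave a truncated file that the `Files.exists` check then treats as done.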