Thanks, Jörn and Erick.

Hi Erick, it looks like the skeletal SolrJ program attachment is missing.
Thanks
Fiz

On Sun, Jun 7, 2020 at 12:20 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Here’s a skeletal SolrJ program using Tika as another alternative.
>
> Best,
> Erick
>
> > On Jun 7, 2020, at 2:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> >
> > You have to write an external application that creates multiple threads,
> > parses the PDFs, and indexes them in Solr. Ideally you parse the PDFs once,
> > store the resulting text on some file system, and then index it. The reason
> > is that if you upgrade Solr across two major versions you might need to
> > reindex. You then save time because you don’t need to parse the PDFs again.
> > It can also be useful if you are not yet sure about the final schema and
> > need to index several times with different schemas.
> >
> > You can also use Apache ManifoldCF.
> >
> >> On 07.06.2020 at 19:19, Fiz N <fiznewy...@gmail.com> wrote:
> >>
> >> Hello Solr Experts,
> >>
> >> I am working on a POC to index millions of PDF documents located in
> >> multiple folders on a file share.
> >>
> >> Could you please let me know the best practices and steps to implement it?
> >>
> >> Thanks
> >> Fiz Nadiyal.
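
For reference, the "parse once, store the text, then index" approach Jörn describes in the quoted message could be sketched roughly as below. This is only a minimal illustration using the Java standard library, not Erick's missing attachment. The class name `PdfTextCache`, its methods, and the `extractor` function are hypothetical names made up for this sketch. In a real setup the extractor would call a PDF parser such as Tika, and a separate step would read the cached `.txt` files and send them to Solr via SolrJ; neither is shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: parse each PDF once and cache the extracted text on disk. */
class PdfTextCache {

    /** Cache file for a PDF: same relative path under cacheDir, with ".txt" appended. */
    static Path cachePathFor(Path root, Path cacheDir, Path pdf) {
        return cacheDir.resolve(root.relativize(pdf).toString() + ".txt");
    }

    /**
     * Walks root for *.pdf files and extracts their text on a fixed thread pool.
     * PDFs that already have a cache entry are skipped, so a later reindex
     * (for example after a schema change) does not parse them again.
     * The extractor is a placeholder for a real parser and is an assumption of this sketch.
     * Returns the number of PDFs parsed during this run.
     */
    static int extractAll(Path root, Path cacheDir,
                          Function<Path, String> extractor, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> pdfs;
        try (Stream<Path> walk = Files.walk(root)) {
            pdfs = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                       .collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path pdf : pdfs) {
                results.add(pool.submit(() -> {
                    Path out = cachePathFor(root, cacheDir, pdf);
                    if (Files.exists(out)) {
                        return false; // already parsed in an earlier run
                    }
                    Files.createDirectories(out.getParent());
                    Files.write(out, extractor.apply(pdf).getBytes(StandardCharsets.UTF_8));
                    return true;
                }));
            }
            int parsed = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    parsed++;
                }
            }
            return parsed;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `PdfTextCache.extractAll(root, cacheDir, extractor, 8)` a second time on the same folders should parse nothing, because every PDF already has a cached text file. That is the time saving described above when you have to reindex.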