You have to write an external application that creates multiple threads, parses the PDFs, and indexes them in Solr. Ideally, you parse the PDFs once, store the resulting text on a file system, and then index from that stored text. The reason is that if you upgrade Solr across two major versions you might need to reindex. You then save time because you don't need to parse the PDFs again. It is also useful if you are not yet sure about the final schema and need to index several times with different schemas.
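A minimal sketch of that parse-once, index-many approach in Python, under some assumptions: `extract_text` is a hypothetical callable you supply (for example a wrapper around whatever PDF parser you choose), the field name `content_txt` and the update URL are placeholders for your own schema and collection, and the cache layout is just one possible choice.

```python
import hashlib
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def cache_text(pdf_path, cache_dir, extract_text):
    """Parse one PDF unless its text is already cached; return the cache file."""
    # Hash the full path so equally named PDFs in different folders don't collide.
    key = hashlib.sha1(str(pdf_path).encode("utf-8")).hexdigest()
    out = Path(cache_dir) / (key + ".txt")
    if not out.exists():  # skip the expensive parse on later runs
        out.write_text(extract_text(pdf_path), encoding="utf-8")
    return out


def parse_all(pdf_root, cache_dir, extract_text, workers=8):
    """Parse every PDF under pdf_root with a thread pool; map PDF path -> cache file."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    pdfs = sorted(Path(pdf_root).rglob("*.pdf"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cached = list(pool.map(lambda p: cache_text(p, cache_dir, extract_text), pdfs))
    return dict(zip(pdfs, cached))


def build_docs(parsed):
    """Turn cached text into Solr documents (field names are assumptions)."""
    return [
        {"id": str(pdf), "content_txt": txt.read_text(encoding="utf-8")}
        for pdf, txt in parsed.items()
    ]


def post_to_solr(docs, update_url):
    """POST a JSON array of documents to a Solr update handler URL."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Reindexing after an upgrade or a schema change then only needs `build_docs` and `post_to_solr` again, since `parse_all` skips every PDF whose text is already cached. For millions of documents you would post in batches rather than in one request.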
You can also use Apache ManifoldCF.

> On 07.06.2020 at 19:19, Fiz N <fiznewy...@gmail.com> wrote:
>
> Hello SOLR Experts,
>
> I am working on a POC to index millions of PDF documents present in
> multiple folders in a fileshare.
>
> Could you please let me know the best practices and steps to implement it.
>
> Thanks
> Fiz Nadiyal.