Hi Bruno,

Assuming you meant 30TB, the first step is to use the Tika parser to convert the rich documents into plain text.
We also need the number of documents. The unofficial word on the street is about 50 million documents per shard; of course a lot of parameters are involved in this. It's a simple question, but the answer is not so simple :). Hope this helps.

Thanks,
Sam
https://www.linkedin.com/in/skasimalla/

On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info <i...@matheo-software.com> wrote:
> Dear Solr User,
>
> My question is very simple :) I would like to know if Solr can process
> around 30To of data (PDF, text, Word, etc.)?
>
> What is the best way to index this huge amount of data? Several servers?
> Several shards? Something else?
>
> Many thanks for your information.
>
> Best Regards,
> Bruno Mannina
> www.matheo-software.com
> www.patent-pulse.com
> Tel. +33 0 970 738 743
> Mob. +33 0 634 421 817
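Since the rough guideline above is expressed in documents per shard while the question is stated in bytes, a back-of-the-envelope conversion is useful. A minimal sketch, assuming an average document size (the 300 KB figure is purely illustrative, not something stated in this thread):

```python
# Rough capacity sketch for the sizing question above.
# avg_doc_bytes is an assumption for illustration only; measure your
# real corpus (average size of the Tika-extracted text) before sizing.

def estimate_shards(corpus_bytes: int, avg_doc_bytes: int,
                    docs_per_shard: int = 50_000_000) -> tuple[int, int]:
    """Return (estimated document count, shard count) for a corpus."""
    doc_count = corpus_bytes // avg_doc_bytes
    # Round up: a partially filled shard still has to exist.
    shards = -(-doc_count // docs_per_shard)
    return doc_count, shards

# 30 TB of rich documents, assuming ~300 KB per document on average:
docs, shards = estimate_shards(30 * 10**12, 300 * 10**3)
print(docs, shards)  # 100_000_000 documents -> 2 shards
```

In practice the extracted plain text is far smaller than the original PDF/Word files, so the real driver is the document count and per-document field sizes, not the raw 30TB figure; this is why the answer "is not so simple."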