Hello Shawn,

Good news that Solr can do that.
I know that with 30 TB of data, hardware will be the first thing to sort out. Expertise is the real problem for me. First, I think I will run several tests to see how Solr works with non-XML documents (I only have experience with XML documents).

Thanks,
Bruno

On 6/21/2019 10:32 AM, Matheo Software Info wrote:
> My question is very simple. I would like to know if Solr can process
> around 30 TB of data (PDF, text, Word, etc.)?
>
> What is the best way to index this huge amount of data? Several servers?
> Several shards? Something else?

Sure, Solr can do that. Whether you have enough resources or expertise available to accomplish it is an entirely different question.

Handling that much data is likely going to require a LOT of expensive hardware. The index will almost certainly need to be sharded. Knowing exactly what numbers are involved is impossible with the information available ... and even with more information, it will most likely require experimentation with your actual data to find an optimal solution.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn