"LucidWorks achieved 150k docs/second"
This is only valid if you don't have replication. I don't know your use case, but a realistic deployment normally uses some kind of redundancy so you don't lose data on a hardware failure: at least 2 replicas, and more replicas imply a further reduction in throughput. Also don't forget that in a realistic use case you have to handle reads too.

Our cluster is small for the data we hold (12 machines with SSDs and 32G of RAM), but we don't need sub-second queries; we need facets with high cardinality (in the worst-case scenario we aggregate 5M unique string values).

As Shawn probably told you, sizing your cluster is a trial-and-error path. Our cluster is optimized to handle a low rate of reads, facet queries, and a high rate of inserts. At peak insert load we can handle around 25K docs per second without any issue with 2 replicas, without compromising reads or putting a node under stress. Nodes under stress can eject themselves from the ZooKeeper cluster due to a GC pause or a lack of CPU to keep up with the heartbeat. If you want accurate numbers you need to run your own tests.

Keep in mind what is, in my opinion, the most important thing about Solr: at terabyte scale, any field type change in the schema or any Lucene codec change will force you to do a full reindex. Every time I need to upgrade Solr to a major release it's a pain in the ass to convert the segments if they are not compatible with the newer version. This can take months, it will not guarantee that your data ends up identical to a cleanly built index (voodoo-magic things can happen, trust me), and it will drain a huge amount of hardware resources to do it without downtime.

(Two illustrative sketches of this kind of setup follow the quoted thread below.)

--
/Yago Riveiro

On Sep 24 2016, at 7:48 am, S G <sg.online.em...@gmail.com> wrote:

> Hey Yago,
>
> 12 T is very impressive.
>
> Can you also share some numbers about the shards, replicas, machine count/specs, and docs/second for your case? I think you would not be having a single index of 12 TB either, so some details on that would be really helpful too.
>
> https://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/ is a good post on how LucidWorks achieved 150k docs/second. If you have any similar blog post, that would be quite useful and popular too.
>
> --SG
>
> On Fri, Sep 23, 2016 at 5:00 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>
> > In my company we have a SolrCloud cluster with 12T.
> >
> > My advice:
> >
> > Be generous with CPU; you will need it at some point (very important if you have no control over the kind of queries hitting the cluster: clients are greedy, they want all results at the same time).
> >
> > SSDs and memory (as much as you can afford if you will do facets).
> >
> > Full recoveries are a pain; the network is important and should be as fast as possible, never less than 1Gbit.
> >
> > Divide and conquer, but too much division can lead to expensive overhead, since data travels over the network. Find the sweet spot (only by testing your use case will you know it).
> >
> > --
> > /Yago Riveiro
> >
> > On 23 Sep 2016, 23:44 +0100, Pushkar Raste <pushkar.ra...@gmail.com> wrote:
> > > Solr is RAM hungry. Make sure that you have enough RAM to hold most of the index of a core in RAM itself.
> > >
> > > You should also consider using really good SSDs.
> > >
> > > That would be a good start. Like others said, test and verify your setup.
> > >
> > > --Pushkar Raste
> > >
> > > On Sep 23, 2016 4:58 PM, "Jeffery Yuan" <yuanyun...@gmail.com> wrote:
> > > >
> > > > Thanks so much for your prompt reply.
> > > >
> > > > We are definitely going to use SolrCloud.
> > > >
> > > > I am just wondering whether SolrCloud can scale even at the TB data level and what kind of hardware configuration it would need.
> > > >
> > > > Thanks.
> > > >
> > > > --
> > > > View this message in context: http://lucene.472066.n3.nabble.com/Whether-solr-can-support-2-TB-data-tp4297790p4297800.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
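
First sketch: how a collection along the lines of the setup described above might be created with the standard Collections API over HTTP. This is only an illustration, not the actual configuration from the thread; the collection name, config name, and Solr URL are placeholders.

```python
# Minimal sketch, not from the thread: create a SolrCloud collection with
# 2 replicas per shard, similar in spirit to the 12-node / 2-replica setup
# described above. Collection name, config name, and URL are placeholders.
import requests

SOLR_URL = "http://localhost:8983/solr"  # any node of the cluster

params = {
    "action": "CREATE",
    "name": "events",                          # hypothetical collection name
    "numShards": 12,                           # e.g. one shard per machine
    "replicationFactor": 2,                    # every document is written to 2 replicas
    "maxShardsPerNode": 2,                     # 12 shards x 2 copies on 12 nodes
    "collection.configName": "events_config",  # configset already uploaded to ZooKeeper
}

resp = requests.get(f"{SOLR_URL}/admin/collections", params=params, timeout=120)
resp.raise_for_status()
print(resp.json())
```

With replicationFactor=2 each document is indexed on two nodes, which is one reason single-copy benchmark numbers like the 150k docs/second figure do not carry over directly.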
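
Second sketch: the kind of high-cardinality terms facet mentioned above, expressed with Solr's JSON Facet API. The collection name, field name, and bucket limit are made up for illustration, and the field is assumed to be a docValues string field so the facet does not live on the Java heap.

```python
# Minimal sketch, not from the thread: a terms facet over a hypothetical
# high-cardinality string field via the JSON Facet API. Collection and field
# names are placeholders; the field is assumed to have docValues enabled.
import requests

SOLR_URL = "http://localhost:8983/solr"

query = {
    "query": "*:*",
    "limit": 0,                      # rows=0, only the facet buckets are wanted
    "facet": {
        "by_user": {
            "type": "terms",
            "field": "user_id_s",    # hypothetical field with millions of unique values
            "limit": 100,            # return only the top 100 buckets
            "mincount": 1,
        }
    },
}

resp = requests.post(f"{SOLR_URL}/events/query", json=query, timeout=60)
resp.raise_for_status()
for bucket in resp.json()["facets"]["by_user"]["buckets"]:
    print(bucket["val"], bucket["count"])
```

Aggregating millions of unique values per shard is exactly where the RAM and SSD advice in the quoted thread starts to matter.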