We have been running Solr 5.4 in master-slave mode with ~4500 cores very
successfully for a couple of years.  The cores hold individual customers'
data, so they vary greatly in size, and some have grown too large to be
manageable.

We are trying to upgrade to Solr 7.3 in cloud mode, with ~4500 collections and
two NRT replicas per collection.  We have experimented with additional servers
and ZooKeeper nodes as part of this move.  We can create up to ~4000
collections, although creation slows to ~20s per collection; much beyond that,
collection creation time shoots up, some collections fail to be created, and
some of the nodes crash.  Autoscaling brings nodes back into the cluster, but
they don't have all the replicas created on them that they should.  We're
pretty sure this is related to the challenge of adding the large number of
collections on those nodes as they come up.
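For context, the bulk creation is just a loop over the Collections API; a
sketch of the kind of loop involved (host, collection names, and counts are
placeholders, not our actual setup), which can't be run without a live
cluster:

```shell
# Hypothetical bulk-create loop against the SolrCloud Collections API.
# Each CREATE call is timed so per-collection slowdown is visible.
for i in $(seq 1 4500); do
  start=$(date +%s)
  curl -s "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=customer_$i&numShards=1&replicationFactor=2" > /dev/null
  echo "customer_$i created in $(( $(date +%s) - start ))s"
done
```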

There are some approaches we could take that don't separate our customers into
collections, but we get benefits from this approach that we'd like to keep.
We'd also like to gain the benefits of cloud mode, such as control over where
collections are placed and the ability to split large collections.

Is anyone successfully running Solr 7.x in cloud mode with thousands of
collections or more?  Are there configurations we should be taking a closer
look at to make this feasible?  Should we try a different replica type?  (We
do want NRT-like query latency, but we also index heavily; this cluster will
hold tens of millions of documents.)
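On the replica-type question: the Collections API lets replica types be mixed
at creation time, so one option we've considered is TLOG replicas, where only
the leader does full indexing and the others replicate from its transaction
log.  A sketch (collection name, counts, and host are placeholders; this is a
config fragment, not our actual command):

```shell
# Hypothetical CREATE call using TLOG replicas instead of NRT ones.
# nrtReplicas / tlogReplicas / pullReplicas are the Solr 7.x parameters
# for choosing replica types at collection-creation time.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=customer_example&numShards=1&tlogReplicas=2"
```

The trade-off, as we understand it, is lower per-node indexing cost at the
price of potentially higher query latency after updates, which is why we're
unsure it fits our NRT-like requirement.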

I should note that the problems are not due to the number of documents: they
occur on a new cluster while we're creating the collections we know we'll
need.

Monica Skidmore
