On 4/8/2014 9:48 AM, KNitin wrote:
I am running SolrCloud 4.3.1 (we plan to upgrade to a later version, but that
will take a few months). I noticed some very peculiar behavior in Solr: beyond
*2496* cores, I am unable to create any more collections because of this
error:
*Could not get shard id for core.....*
I also noticed in the Solr admin "Tree" view that the Overseer's collection
work queue gets stuck:
  /overseer
    collection-queue-work
      qn-0000000360
      qn-0000000362
      qn-0000000364
These are the test results:
With 8 shards and 2 replicas, I can create 156 collections before hitting
the error above.
With 4 shards and 2 replicas, I can create 312 collections before hitting
the error above.
With 2 shards and 2 replicas, I can create 624 collections before hitting
the error above.
The total number of cores is 2496 in all of these cases.
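The ceiling seems to depend only on the total number of cores, not on how they are split into collections: in every test, shards × replicas × collections comes out the same. A quick check of that arithmetic, using only the numbers reported above (the variable names are illustrative):

```python
# (shards, replicas per shard, collections created before the error),
# taken from the three test results above
results = [(8, 2, 156), (4, 2, 312), (2, 2, 624)]

# Each replica of each shard is one core, so cores = shards * replicas * collections
core_counts = [shards * replicas * collections
               for shards, replicas, collections in results]

for (shards, replicas, collections), cores in zip(results, core_counts):
    print(f"{shards} shards x {replicas} replicas x {collections} collections = {cores} cores")
```

All three configurations land on the same total, which points at a cluster-wide limit rather than a per-collection one.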
After that point, every attempt to create another collection fails with the
"Could not get shard id" error. Is this a known bug, or is there a
workaround? Is it fixed in later releases?
You're probably hitting configuration limits whose defaults are high enough
for "typical" scalability requirements. Some of these need to be increased
for extreme scalability. I don't know about all of them, so this is likely an
incomplete list:
One of them, and most likely the one involved here, is the maximum size of
the data ZooKeeper will store in a single znode: the jute.maxbuffer system
property, which defaults to just under one megabyte. Another is the maximum
number of threads allowed by the servlet container; in Jetty, this is the
maxThreads parameter. There are also the various connection and thread pool
settings in the ShardHandler config.
http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Unsafe+Options
http://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches
https://cwiki.apache.org/confluence/display/solr/Moving+to+the+New+solr.xml+Format
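If jute.maxbuffer is the culprit, the relevant znode is most likely /clusterstate.json, which in Solr 4.3.1 holds the state of every collection in a single znode and so grows with the total core count. The sketch below is a rough, hypothetical helper (the function names are illustrative and not part of Solr or ZooKeeper). Given a copy of that znode saved byte-for-byte to a local file and the number of cores it describes, it reports how much of the limit is used and linearly extrapolates how many cores would fit, assuming each core adds roughly the same number of bytes.

```python
import os

# The ZooKeeper admin guide gives the jute.maxbuffer default as 0xfffff bytes (just under 1 MB).
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def estimated_core_ceiling(state_bytes: int, cores: int,
                           limit: int = DEFAULT_JUTE_MAXBUFFER) -> int:
    """Linear estimate of how many cores fit before the znode reaches `limit`.

    Assumption: every core adds about the same number of bytes to the cluster
    state. Integer arithmetic avoids floating-point rounding in the result.
    """
    if state_bytes <= 0 or cores <= 0:
        raise ValueError("state_bytes and cores must be positive")
    return limit * cores // state_bytes


def report(path: str, cores: int, limit: int = DEFAULT_JUTE_MAXBUFFER) -> str:
    """Summarize how close a saved copy of /clusterstate.json is to `limit`."""
    # Size on disk; only meaningful if the file holds the znode data unmodified.
    size = os.path.getsize(path)
    ceiling = estimated_core_ceiling(size, cores, limit)
    return (f"{size} of {limit} bytes used ({100 * size / limit:.1f}%); "
            f"roughly {ceiling} cores fit at the current bytes-per-core rate")
```

For example, `print(report("clusterstate.json", 2496))` against a hypothetical local copy. The `limit` argument only changes the estimate; to actually raise the limit, the ZooKeeper admin guide linked above says the property must be set on all servers and clients, i.e. passed as `-Djute.maxbuffer=<bytes>` to every ZooKeeper server and every Solr JVM.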
As usual, I could be entirely incorrect about everything I'm saying.
Thanks,
Shawn