Hello folks,

We are currently using SolrCloud 4.3.1. We have an 8-node SolrCloud cluster with 32 cores, 60 GB of RAM, and SSDs. We are using ZooKeeper to manage the solrconfig used by our collections.
We have many collections, and some of them are very large compared to the others. The shards of these big collections are on the order of gigabytes each. We decided to split the bigger collections evenly across all nodes (8 shards and 2 replicas) with maxShardsPerNode > 1.

We ran a test with a read load on just one big collection, and we still see only 2 nodes running at 100% CPU while the rest are blazing through the queries much faster (under 30% CPU), despite all of them being sharded across all nodes.

I checked the JVM usage and found that none of the memory pools has high utilization (except Survivor space, which is at 100%). The GC cycles are on the order of milliseconds and are mostly scavenges. Mark and sweep occurs once every 30 minutes.

A few questions:

1. Sharding all collections (small and large) across all nodes distributes the load evenly and makes the system characteristics of all machines similar. Is this a recommended approach?

2. SolrCloud does a distributed query by default. So if a node is at 100% CPU, does it slow down the response time for the other nodes waiting on this query? (Or is there a timeout if it cannot get a response from a node within x seconds?)

3. Our collections use MMapDirectory, but I haven't specifically enabled anything related to mmap (locked pages under ulimit). Does this adversely affect performance, or can it lock pages even without this?

Thanks a lot in advance.

Nitin
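
P.S. For reference, the layout described above (8 shards, 2 replicas, 8 nodes) corresponds roughly to a Collections API CREATE request like the one built below. This is only a sketch: the host, port, and collection name are placeholders, and it assumes "2 replicas" means replicationFactor=2, which gives 16 cores in total and therefore needs maxShardsPerNode of at least 2.

```python
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, replication_factor, node_count):
    """Build a Collections API CREATE URL for an evenly spread collection.

    base_url and name are placeholders for illustration only.
    """
    # Total cores to place = shards x copies of each shard.
    total_cores = num_shards * replication_factor
    # Smallest per-node limit that lets every core be placed (ceiling division).
    max_per_node = -(-total_cores // node_count)
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name; 8 shards, 2 copies, 8 nodes.
url = build_create_url("http://localhost:8983/solr", "big_collection", 8, 2, 8)
print(url)
```

The function only builds the URL and does not send it, so it is safe to run without a cluster.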