Hi,

Some suggestions:
* 64GB JVM heap
Are you sure you really need a heap that large? Did you check your GC logs (for example with gceasy.io)? A best practice is to keep the heap as small as possible, and never larger than 31GB: above roughly 32GB the JVM loses compressed object pointers. See the first sketch after this list.

* OS caching
Did you set vm.swappiness to 1? (Second sketch below.)

* Putting two instances of Solr on each node
Check resource usage first to evaluate whether it would actually help: CPU usage, load average, CPU iowait, heap usage, disk I/O reads and writes, mmap/OS page-cache usage, ... (monitoring sketch below). A high load average with low CPU usage suggests that disk I/O is the bottleneck. I would rather consider increasing the number of physical servers, with less CPU, RAM and disk space on each (but the same total amount of CPU, RAM and disk space overall); that increases the aggregate disk I/O capacity.

* Collection 4 is the trouble collection
Try to get smaller cores (more shards, if you increase the number of Solr instances).
Investigate time routed or category routed aliases, and whether they can fit your update strategy and/or your query profiles (alias sketch below).
Work on the schema again:
- For docValues=true fields, check whether you really need indexed=true and stored=true (there are a lot of considerations to take into account), ... (schema sketch below)
- Are you over-indexing with copyField?
Work on the queries: facets, group, collapse, fl=, rows=, ... (query sketch below)
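To illustrate the heap point, a minimal sketch of what I would try in solr.in.sh (sizes and paths are examples to adapt to your install; the rotated GC log files are what you would feed to gceasy.io):

    # solr.in.sh -- sketch only, adapt sizes and paths
    # Staying at or below ~31GB keeps compressed ordinary object pointers enabled
    SOLR_HEAP="31g"
    # Java 11 unified GC logging, rotated
    GC_LOG_OPTS="-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"

You can verify that compressed oops are still on with: java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops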
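For swappiness, something like this (the sysctl.d file name below is just an example):

    cat /proc/sys/vm/swappiness                                      # check the current value
    sudo sysctl -w vm.swappiness=1                                   # apply now
    echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-solr.conf   # persist across reboots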
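To confirm the disk I/O bottleneck hypothesis before adding a second instance per node, standard Linux tools already tell a lot, for example:

    iostat -x 5    # per-device %util and await (sysstat package)
    vmstat 5       # the 'wa' column is CPU time spent waiting for I/O
    sar -q 5       # run queue length and load average over time
    free -h        # how much RAM is left to the OS page cache next to the 64GB heap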
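If the collection-4 traffic is mostly new time-stamped documents, a time routed alias could be worth a test. A rough sketch with the Collections API; the alias name, router field, configset name and shard/replica counts are all made up, and the exact parameters should be double-checked against the reference guide for your Solr version:

    curl -G 'http://localhost:8983/solr/admin/collections' \
      --data-urlencode 'action=CREATEALIAS' \
      --data-urlencode 'name=collection-4-tra' \
      --data-urlencode 'router.name=time' \
      --data-urlencode 'router.field=update_date_dt' \
      --data-urlencode 'router.start=NOW/MONTH' \
      --data-urlencode 'router.interval=+1MONTH' \
      --data-urlencode 'create-collection.collection.configName=collection-4-config' \
      --data-urlencode 'create-collection.numShards=12' \
      --data-urlencode 'create-collection.replicationFactor=2'

A category routed alias is the same idea with router.name=category and a router.field holding the category value.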
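On the schema side, this is the kind of trade-off I mean; the field names are invented, and whether indexed/stored can really be dropped depends on how each field is queried and returned (searching a docValues-only field is possible but slow):

    <!-- facet/sort-only field: docValues alone may be enough -->
    <field name="publisher_s" type="string" indexed="false" stored="false" docValues="true"/>

    <!-- every copyField re-indexes its source; keep only the targets you actually query -->
    <copyField source="title" dest="_text_"/>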
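And on the query side, ask Solr only for what you need; a hypothetical example (field names invented):

    curl -G 'http://localhost:8983/solr/collection-4/select' \
      --data-urlencode 'q=subject:history' \
      --data-urlencode 'fl=id,title' \
      --data-urlencode 'rows=20' \
      --data-urlencode 'facet=true' \
      --data-urlencode 'facet.field=publisher_s' \
      --data-urlencode 'facet.limit=10'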
Regards,
Dominique

On Wed, Jan 27, 2021 at 14:53, Hollowell,Skip <hollo...@oclc.org> wrote:

> 30 Dedicated physical Nodes in the Solr Cloud Cluster, all of identical configuration
> Server01 RHEL 7.x
> 256GB RAM
> 10 2TB Spinning Disk in a RAID 10 Configuration (Leaving us 9.8TB usable per node)
> 64GB JVM Heap. Tried as high as 100GB, but it appeared that 64GB was faster. If we set a higher heap, do we starve the OS for caching?
> Huge Pages is off on the system, and thus UseLargePages is off on Solr Startup
> G1GC, Java 11 (ZGC with Java 15 and HugePages turned on was a disaster. We suspect it was due to the Huge Pages configuration)
> At one time we discussed putting two instances of Solr on each node, giving us a cloud of 60 instances instead of 30. Load Average is high on these nodes during certain types of queries or updates, but CPU Load is relatively low and should be able to accommodate a second instance, but all the data would still be on the same RAID10 group of disks.
> Collection 4 is the trouble collection. It has nearly a billion documents, and there are between 200 and 400 million updates every day. How do we get that kind of update performance, and still serve 10 million queries a day? Schemas have been reviewed and re-reviewed to ensure we are only indexing and storing what is absolutely necessary. What are we missing? Do we need to revisit our replica policy? Number of replicas or types of replicas (to ensure some are only used for reading, etc.)?
>
> [Grabbed from the Admin UI]
> 755.6Gb Index Size according to Solr Cloud UI
> Total #docs: 371.8mn
> Avg size/doc: 2.1Kb
>
> 90 Shards, 2 NRT Replicas per Shard, 1,750,612,476 documents, avg size/doc: 1.7Kb, uses nested documents
> collection-1_s69r317  31.1Gb
> collection-1_s49r96   30.7Gb
> collection-1_s78r154  30.2Gb
> collection-1_s40r259  30.1Gb
> collection-1_s9r197   29.1Gb
> collection-1_s18r34   28.9Gb
>
> 120 Shards, 2 TLOG Replicas per Shard, 2,230,207,046 documents, avg size/doc: 1.3Kb
> collection-2_s78r154  22.8Gb
> collection-2_s49r96   22.8Gb
> collection-2_s46r331  22.8Gb
> collection-2_s18r34   22.7Gb
> collection-2_s109r216 22.7Gb
> collection-2_s104r447 22.7Gb
> collection-2_s15r269  22.7Gb
> collection-2_s73r385  22.7Gb
>
> 120 Shards, 2 TLOG Replicas per Shard, 733,588,503 documents, avg size/doc: 1.9Kb
> collection-3_s19r277  10.6Gb
> collection-3_s108r214 10.6Gb
> collection-3_s48r94   10.6Gb
> collection-3_s109r457 10.6Gb
> collection-3_s47r333  10.5Gb
> collection-3_s78r154  10.5Gb
> collection-3_s18r34   10.5Gb
> collection-3_s77r393  10.5Gb
>
> 120 Shards, 2 TLOG Replicas per Shard, 864,372,654 documents, avg size/doc: 5.6Kb
> collection-4_s109r216 38.7Gb
> collection-4_s100r439 38.7Gb
> collection-4_s49r96   38.7Gb
> collection-4_s35r309  38.6Gb
> collection-4_s18r34   38.6Gb
> collection-4_s78r154  38.6Gb
> collection-4_s7r253   38.6Gb
> collection-4_s69r377  38.6Gb