Hi,

Some suggestions.

* 64GB JVM Heap
Are you sure you really need this heap size? Did you check your GC logs
(with gceasy.io)?
A best practice is to keep the heap as small as possible, and never more
than 31 GB (above that the JVM loses compressed ordinary object pointers).
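For instance, in solr.in.sh (a sketch only, the values are placeholders to
adjust to your nodes; recent Solr releases already write a GC log you can
upload to gceasy.io):

# solr.in.sh -- example values only
SOLR_HEAP="31g"
# or, equivalently:
# SOLR_JAVA_MEM="-Xms31g -Xmx31g"
# On Java 11, GC logging can be tuned with unified logging, e.g.:
# GC_LOG_OPTS="-Xlog:gc*"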

* OS Caching
Did you set swappiness to 1?
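If not, it is a one-line change (standard Linux sysctl; the 99-solr.conf
file name below is just an example):

# apply immediately
sysctl -w vm.swappiness=1
# persist across reboots
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-solr.conf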

* Put two instances of Solr on each node
You need to check resource usage to evaluate whether it would be worthwhile
(CPU usage, CPU load average, CPU iowait, heap usage, disk I/O read and
write, MMAP caching, ...).
A high load average with low CPU usage suggests that disk I/O is the
bottleneck. I would consider increasing the number of physical servers, with
less CPU, RAM and disk space on each (but the same total amount of CPU, RAM
and disk space overall). This will increase the aggregate disk I/O capacity.
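Standard tools are enough for a first picture of where the time goes (a
sketch; on RHEL 7, iostat and sar come from the sysstat package):

iostat -x 5     # per-device %util and await -> disk saturation
sar -q 5        # run queue and load average over time
top             # CPU user/system/iowait split
free -h         # RAM left for the page cache / MMAP of the index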

* Collection 4 is the trouble collection
Try to have smaller cores (more shards, if you increase the number of Solr
instances).
Investigate time routed or category routed aliases, to see whether they fit
your update strategy and/or your query profiles.
Work on the schema again:
- For docValues=true fields, check whether you really need indexed=true and
stored=true (there are many considerations to take into account; see the
sketch after this list), ...
- Are you over-indexing with copyField?
Work on the queries: facets, group, collapse, fl=, rows=, ...
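To illustrate the docValues point, a hypothetical field used only for
faceting/sorting could be declared like this in the managed schema (the
field and copyField names are made up, not taken from your setup):

<!-- facet/sort only: keep docValues, drop the inverted index and the stored value -->
<field name="category_code" type="string" indexed="false" stored="false" docValues="true"/>
<!-- and check that every copyField really feeds a field you query -->
<copyField source="title" dest="text_all"/>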

Regards

Dominique


On Wed, Jan 27, 2021 at 14:53, Hollowell,Skip <hollo...@oclc.org> wrote:

> 30 Dedicated physical Nodes in the Solr Cloud Cluster, all of identical
> configuration
> Server01   RHEL 7.x
> 256GB RAM
> 10 2TB Spinning Disks in a RAID 10 Configuration (Leaving us 9.8TB usable
> per node)
> 64GB JVM Heap, Tried as high as 100GB, but it appeared that 64GB was
> faster.  If we set a higher heap, do we starve the OS for caching?
> Huge Pages is off on the system, and thus UseLargePages is off on Solr
> Startup
> G1GC, Java 11  (ZGC with Java 15 and HugePages turned on was a disaster.
> We suspect it was due to the Huge Pages configuration)
> At one time we discussed putting two instances of Solr on each node,
> giving us a cloud of 60 instances instead of 30.  Load Average is high on
> these nodes during certain types of queries or updates, but CPU Load is
> relatively low and should be able to accommodate a second instance, but all
> the data would still be on the same RAID10 group of disks.
> Collection 4 is the trouble collection.  It has nearly a billion
> documents, and there are between 200 and 400 million updates every day.
> How do we get that kind of update performance, and still serve 10 million
> queries a day?  Schemas have been reviewed and re-reviewed to ensure we are
> only indexing and storing what is absolutely necessary.  What are we
> missing?  Do we need to revisit our replica policy?  Number of replicas or
> types of replicas (to ensure some are only used for reading, etc?)
> [Grabbed from the Admin UI]
> 755.6Gb Index Size according to Solr Cloud UI
> Total #docs: 371.8mn
> Avg size/doc: 2.1Kb
> 90 Shards, 2 NRT Replicas per Shard, 1,750,612,476 documents, avg
> size/doc: 1.7Kb, uses nested documents
> collection-1_s69r317       31.1Gb
> collection-1_s49r96         30.7Gb
> collection-1_s78r154       30.2Gb
> collection-1_s40r259       30.1Gb
> collection-1_s9r197         29.1Gb
> collection-1_s18r34         28.9Gb
> 120 Shards, 2 TLOG Replicas per Shard, 2,230,207,046 documents, avg
> size/doc: 1.3Kb
> collection-2_s78r154       22.8Gb
> collection-2_s49r96         22.8Gb
> collection-2_s46r331       22.8Gb
> collection-2_s18r34         22.7Gb
> collection-2_s109r216    22.7Gb
> collection-2_s104r447    22.7Gb
> collection-2_s15r269       22.7Gb
> collection-2_s73r385       22.7Gb
> 120 Shards, 2 TLOG Replicas per Shard, 733,588,503 documents, avg
> size/doc: 1.9Kb
> collection-3_s19r277       10.6Gb
> collection-3_s108r214    10.6Gb
> collection-3_s48r94         10.6Gb
> collection-3_s109r457    10.6Gb
> collection-3_s47r333       10.5Gb
> collection-3_s78r154       10.5Gb
> collection-3_s18r34         10.5Gb
> collection-3_s77r393       10.5Gb
>
> 120 Shards, 2 TLOG Replicas per Shard, 864,372,654 documents, avg
> size/doc: 5.6Kb
> collection-4_s109r216    38.7Gb
> collection-4_s100r439    38.7Gb
> collection-4_s49r96         38.7Gb
> collection-4_s35r309       38.6Gb
> collection-4_s18r34         38.6Gb
> collection-4_s78r154       38.6Gb
> collection-4_s7r253         38.6Gb
> collection-4_s69r377       38.6Gb
>
