Tuning for 500+ field schemas

Tim Robertson Wed, 18 Mar 2020 03:55:49 -0700

Hi all

We load Solr (8.4.1) from Spark and are trying to grow the schema with some
dynamic fields, which will result in around 500-600 indexed fields per doc.

Currently, ~300 fields/doc works very well on an 8-node Solr
cluster, with CPU nicely balanced across the cluster, and we saturate our
network.

However, when we grow to ~500-600 fields, incoming network traffic drops to
around a quarter of that level. In the Solr cluster we see low CPU on most
machines, but always one machine with high load (it is the Solr process).
That machine stays high for many minutes, and then another goes high;
see the CPU graph [1]. I've tried changing the shard count, but beyond 32
shards I didn't see any gains. There is only one replica per shard. Each
machine runs on AWS with an EFS-mounted disk and runs only Solr 8; ZK is on
a different set of machines.

Can anyone please suggest what you would do to tune Solr for
large numbers of dynamic fields?

Does anyone have a guess as to what the single high-CPU node is doing (some
kind of metrics aggregation, maybe)?

Thank you all,
Tim

[1] [image: image.png]
