The Apache mail server strips attachments pretty aggressively, so I can’t see 
your attachment.

About the only way to diagnose this would be to take a thread dump of the 
machine that’s running hot.
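
If it helps, here's a rough sketch (Python, stdlib only) for summarizing a 
thread dump taken with the JDK's `jstack <pid>` on the hot node. It counts the 
top stack frame of each RUNNABLE thread, so whatever the busy threads have in 
common floats to the top. It assumes the usual HotSpot dump layout (quoted 
thread header, a `java.lang.Thread.State:` line, then `at ...` frames, with 
blank lines between threads); `hot_frames` is just a name I made up.

```python
import re
from collections import Counter


def hot_frames(dump_text, top=10):
    """Count the topmost 'at ...' frame of each RUNNABLE thread.

    Assumes a jstack-style HotSpot dump: threads separated by blank lines,
    each with a 'java.lang.Thread.State: ...' line followed by 'at ...' frames.
    """
    counts = Counter()
    # Each blank-line-separated block is (roughly) one thread.
    for block in re.split(r"\n\s*\n", dump_text):
        if "java.lang.Thread.State: RUNNABLE" not in block:
            continue  # only threads actually burning CPU are interesting
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("at "):
                counts[line[3:]] += 1  # record only the top frame
                break
    return counts.most_common(top)
```

Take a few dumps a few seconds apart; if the same frames keep showing up on the 
hot node, that's what I'd chase.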

There are a couple of places I’d look:

1> What happens if you don’t return any non-docValues fields? To return stored 
fields, the doc must be fetched and decompressed. That doesn’t fit very well 
with your observation that only one node runs hot, but it’s worth checking.

2> Return one docValues=true field and search on only a single field (with 
different values, of course). Does that follow the same pattern? What I’m 
wondering about here is whether the delays are because you’re swapping index 
files in and out of memory. Again, that doesn’t really explain the high CPU 
utilization; if that were the case, I’d expect you to be I/O bound.

3> I’ve seen indexes with this many fields perform reasonably well BTW.
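
For experiments 1> and 2>, something like the following builds the /select 
requests. It's a sketch only: the host, collection (`mycollection`) and field 
names (`my_dv_field`, `my_single_field`) are placeholders I made up, and getting 
docValues-only fields back through `fl` assumes useDocValuesAsStored is in 
effect (the default in recent schema versions).

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field names: substitute your own.
SOLR = "http://localhost:8983/solr/mycollection/select"


def select_url(q, fl, rows=10):
    """Build a /select URL; 'fl' restricts which fields come back."""
    return SOLR + "?" + urlencode({"q": q, "fl": ",".join(fl), "rows": rows})


# Experiment 1: return only docValues fields, so no stored-field decompression.
exp1 = select_url("*:*", ["id", "my_dv_field"])

# Experiment 2: one docValues field back, query a single field, varying the value.
exp2 = [select_url(f"my_single_field:{v}", ["my_dv_field"]) for v in ("a", "b", "c")]
```

Fire these at the cluster while the load is running and compare QTime against 
your normal queries.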

How many fields are you returning? One thing that happens is that when a query 
comes in to a node, sub-queries are sent out to one replica of each shard, and 
the results from all the shards are merged and sorted by that one node before 
being returned to the client. Unless you’re returning lots and lots of fields 
and/or many rows, this shouldn’t run “for many minutes”, but it’s something to 
look for.
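
A toy model of that aggregation step: each shard hands back its own top hits 
already sorted, and the node that received the request merges them into the 
final page. The (id, score) tuples and `merge_shard_results` are illustrative 
only, not Solr's code; my understanding is that real Solr does this in two 
phases (IDs and sort values first, then the stored fields for the winners).

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists into one global top-`rows` list.

    Each shard's list holds (doc_id, score) tuples already sorted by score,
    highest first, which is what heapq.merge(reverse=True) requires.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, rows))
```

Each shard has to send back up to `rows` candidates, so the merge work on that 
one node grows with rows times the number of shards.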

When this happens, what is your query response time like? I’m assuming it’s 
very slow.

But these are all shots in the dark; some thread dumps would be where I’d start.

Best,
Erick

> On Mar 18, 2020, at 6:55 AM, Tim Robertson <timrobertson...@gmail.com> wrote:
> 
> Hi all
> 
> We load Solr (8.4.1) from Spark and are trying to grow the schema with some 
> dynamic fields that will result in around 500-600 indexed fields per doc.
> 
> Currently, ~300 fields/doc works very well into an 8-node Solr cluster, with 
> CPU nicely balanced across the cluster, and we saturate our network.
> 
> However, growing to ~500-600 fields, we see incoming network traffic drop to 
> around a quarter, and in the Solr cluster we see low CPU on most machines but 
> always one machine with high load (it is the Solr process). That machine will 
> stay high for many minutes, and then another will go high - see CPU graph 
> [1]. I've experimented with shard counts, but beyond 32 I didn't see any 
> gains. There is only one replica of each shard. Each machine runs on AWS with 
> an EFS-mounted disk and runs only Solr 8; ZK is on a different set of machines.
> 
> Can anyone please throw out ideas on how you would tune Solr for large 
> numbers of dynamic fields?
> 
> Does anyone have a guess on what the single high CPU node is doing (some kind 
> of metrics aggregation maybe?).
> 
> Thank you all,
> Tim
> 
> [1]
