What are your hard and soft commit settings? They can have a large
impact on write throughput.
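
For reference, they live under <updateHandler> in solrconfig.xml - a minimal
sketch (the values are illustrative only; for a bulk load the usual advice is a
hard commit with openSearcher=false and a long or disabled soft commit):

  <autoCommit>
    <!-- hard commit: flush segments to disk, but don't open a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit controls visibility of new docs; -1 disables it -->
    <maxTime>-1</maxTime>
  </autoSoftCommit>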

Best,
Edward

On Wed, Mar 18, 2020 at 11:43 AM Tim Robertson
<timrobertson...@gmail.com> wrote:
>
> Thank you Erick
>
> I should have been clearer that this is a bulk load job into a write-only
> cluster (it becomes read-only once loading finishes), and it is the write
> throughput I was chasing.
>
> I made some changes and have managed to get it working closer to what
> I expect.  I'll summarise them here in case anyone stumbles on
> this thread, but please note this was just the result of a few tuning
> experiments and is not definitive:
>
> - Increased shard count, so there were the same number of shards as virtual
> CPU cores on each machine
> - Set the ramBufferSizeMB to 2048 (see the snippet below)
> - Increased the parallelization in the loading job (i.e. ran the job across
> more Spark cores concurrently)
> - Dropped to batches of 500 docs sent instead of 1000
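>
> For anyone replicating this: the ramBufferSizeMB change is the <indexConfig>
> setting in solrconfig.xml - a minimal sketch (2048 is the value I used; size
> it to your heap):
>
>   <indexConfig>
>     <!-- flush in-memory updates to a new segment once they reach this size (MB) -->
>     <ramBufferSizeMB>2048</ramBufferSizeMB>
>   </indexConfig>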
>
>
> On Wed, Mar 18, 2020 at 1:19 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > The Apache mail server strips attachments pretty aggressively, so I can’t
> > see your attachment.
> >
> > About the only way to diagnose would be to take a thread dump of the
> > machine that’s running hot.
> >
> > There are a couple of places I’d look:
> >
> > 1> What happens if you don’t return any non-docValue fields? To return
> > stored fields, the doc must be fetched and decompressed. That doesn’t fit
> > very well with your observation that only one node runs hot, but it’s worth
> > checking. (A sketch of a docValues-only field follows 3> below.)
> >
> > 2> Return one docValues=true field and search only on a single field (with
> > different values of course). Does that follow this pattern? What I’m
> > wondering about here is whether the delays are because you’re swapping
> > index files in and out of memory. Again, that doesn’t really explain high
> > CPU utilization; if that were the case I’d expect you to be I/O bound.
> >
> > 3> I’ve seen indexes with this many fields perform reasonably well BTW.
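> >
> > (For 1> and 2>: a docValues-only field in the schema looks roughly like
> > this - the field name here is just illustrative:
> >
> >   <field name="some_field" type="string" indexed="true" stored="false"
> >          docValues="true"/>
> >
> > Returning only such fields via fl avoids fetching and decompressing stored
> > documents.)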
> >
> > How many fields are you returning? One thing that happens is that when a
> > query comes in to a node, sub-queries are sent out to one replica of each
> > shard, and the results from each shard are merged and sorted by that node
> > and returned to the client. Unless you’re returning lots and lots of fields
> > and/or many rows, this shouldn’t run “for many minutes”, but it’s something
> > to look for.
> >
> > When this happens, what is your query response time like? I’m assuming
> > it’s very slow.
> >
> > But these are all shots in the dark; some thread dumps would be where I’d
> > start.
> >
> > Best,
> > Erick
> >
> > > On Mar 18, 2020, at 6:55 AM, Tim Robertson <timrobertson...@gmail.com>
> > wrote:
> > >
> > > Hi all
> > >
> > > We load Solr (8.4.1) from Spark and are trying to grow the schema with
> > some dynamic fields that will result in around 500-600 indexed fields per
> > doc.
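> > >
> > > (The dynamic fields are declared along these lines - the pattern and type
> > > here are only illustrative:
> > >
> > >   <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
> > >
> > > with each doc supplying a few hundred concrete field names matching such
> > > patterns.)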
> > >
> > > Currently, we see ~300 fields/doc work very well into an 8-node Solr
> > cluster, with CPU nicely balanced across the cluster, and we saturate our
> > network.
> > >
> > > However, growing to ~500-600 fields we see incoming network traffic drop
> > to around a quarter, and in the Solr cluster we see low CPU on most
> > machines but always one machine with high load (it is the Solr process).
> > That machine will stay high for many minutes, and then another will go high
> > - see CPU graph [1]. I've played with changing shard counts but saw no
> > gains beyond 32. There is only one replica of each shard; each machine
> > runs on AWS with an EFS-mounted disk and runs only Solr 8; ZK is on a
> > different set of machines.
> > >
> > > Can anyone please throw out ideas of what you would do to tune Solr for
> > large amounts of dynamic fields?
> > >
> > > Does anyone have a guess at what the single high-CPU node is doing (some
> > kind of metrics aggregation maybe)?
> > >
> > > Thank you all,
> > > Tim
> > >
> > > [1] CPU graph (attachment stripped by the mail server)
