Re: Tuning for 500+ field schemas

2020-03-18 Thread Tim Robertson
Thank you Edward, Erick. In this environment, hard commits are set at 60s without openSearcher, and soft commits are off. We have the luxury of building the index first, then opening searchers and adding replicas afterward. We'll monitor the segment merging and lengthen the commit time as suggested. Thank you!
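
For readers following along, the commit settings described above (hard commit every 60s, openSearcher off, soft commits disabled) could be expressed roughly as a Solr Config API `set-property` payload. This is only a sketch: the helper name `commit_settings_payload` is hypothetical, the thread does not say how the settings were actually applied (they may well live in solrconfig.xml), and the snippet only builds the JSON body without sending it anywhere.

```python
import json


def commit_settings_payload(hard_commit_ms=60_000, open_searcher=False, soft_commit_ms=-1):
    """Build a Config API 'set-property' body for bulk-load commit settings.

    Hypothetical helper for illustration. A soft commit maxTime of -1
    leaves automatic soft commits disabled.
    """
    return {
        "set-property": {
            # Hard commit interval in milliseconds (60s in the thread).
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            # Do not open a new searcher on hard commit during the load.
            "updateHandler.autoCommit.openSearcher": open_searcher,
            # Soft commits off.
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }


if __name__ == "__main__":
    # Print the body that would be POSTed to a collection's /config endpoint.
    print(json.dumps(commit_settings_payload(), indent=2))
```

Lengthening the commit time, as suggested in the thread, would just mean passing a larger `hard_commit_ms`.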

Re: Tuning for 500+ field schemas

2020-03-18 Thread Erick Erickson
Ah, ok. Then your spikes were probably being caused by segment merging, which would account for it being on different machines and running for a long time. Segment merging is a very expensive operation... As Edward mentioned, your commit settings come into play. You could easily be creating segmen

Re: Tuning for 500+ field schemas

2020-03-18 Thread Edward Ribeiro
What are your hard and soft commit settings? This can have a large impact on the writing throughput. Best, Edward On Wed, Mar 18, 2020 at 11:43 AM Tim Robertson wrote: > > Thank you Erick > > I should have been clearer that this is a bulk load job into a write-only > cluster (until loaded when i

Re: Tuning for 500+ field schemas

2020-03-18 Thread Tim Robertson
Thank you Erick. I should have been clearer that this is a bulk load job into a write-only cluster (until loaded, when it becomes read-only), and it is the write throughput I was chasing. I made some changes and have managed to get it working more closely to what I expect. I'll summarise them here

Re: Tuning for 500+ field schemas

2020-03-18 Thread Erick Erickson
The Apache mail server strips attachments pretty aggressively, so I can’t see your attachment. About the only way to diagnose would be to take a thread dump of the machine that’s running hot. There are a couple of places I’d look: 1> what happens if you don’t return any non-docValue fields? To