Tuning for 500+ field schemas

2020-03-18 Thread Tim Robertson
Hi all, We load Solr (8.4.1) from Spark and are trying to grow the schema with some dynamic fields, which will result in around 500-600 indexed fields per doc. Currently we see ~300 fields/doc work very well into an 8-node Solr cluster, with CPU nicely balanced across the cluster, and we saturate our n…

Re: Tuning for 500+ field schemas

2020-03-18 Thread Erick Erickson
The Apache mail server strips attachments pretty aggressively, so I can’t see your attachment. About the only way to diagnose would be to take a thread dump of the machine that’s running hot. There are a couple of places I’d look: 1> what happens if you don’t return any non-docValue fields? To…
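A thread dump like the one Erick suggests can be captured with the JDK's `jstack` tool against the Solr JVM. This is a command sketch, not anything specified in the thread; the pid and output filename are illustrative:

```shell
# Find the Solr JVM's pid (the grep pattern is illustrative),
# then dump its threads, including lock information (-l).
# Run this on the node that is running hot while the spike is happening.
jps -l | grep -i solr        # e.g. prints "12345 start.jar"
jstack -l 12345 > solr-threads.txt
```

Taking two or three dumps a few seconds apart makes it easier to tell which threads are persistently busy rather than momentarily active.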

Re: Tuning for 500+ field schemas

2020-03-18 Thread Tim Robertson
Thank you Erick. I should have been clearer that this is a bulk-load job into a write-only cluster (until loaded, when it becomes read-only), and it is the write throughput I was chasing. I made some changes and have managed to get it working more closely to what I expect. I'll summarise them here…

Re: How do *you* restrict access to Solr?

2020-03-18 Thread Ryan W
On Tue, Mar 17, 2020 at 6:05 AM Jan Høydahl wrote: > You can consider upgrading to Solr 8.5 which is to be released in a couple > of days, which makes it easy to whitelist IP addresses in solr.in.sh: > Thanks. That is good news, though it won't help me this time around. My application framework (Drupal) doesn't support Solr 8.
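For reference, the 8.5 feature Jan mentions is an environment variable in `solr.in.sh`. A minimal sketch, assuming the variable name is `SOLR_IP_WHITELIST` (check the 8.5 release notes to confirm); the addresses shown are illustrative:

```shell
# solr.in.sh (Solr 8.5+): reject requests from any IP/range not listed here.
# Addresses below are examples only -- substitute your own clients.
SOLR_IP_WHITELIST="127.0.0.1, 192.168.1.0/24"
```

This filters at the Solr node itself, which is a useful second layer even when a firewall is also in place.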

Re: How do *you* restrict access to Solr?

2020-03-18 Thread Markus Kalkbrenner
> My application framework (Drupal) doesn't support Solr 8. That's not true. As with Solr itself, you just have to update to recent Drupal module versions. As you can see at https://travis-ci.org/github/mkalkbrenner/search_api_solr/builds/663153535 the automated tests run against Solr 6.6.6, 7.7.2…

Re: Tuning for 500+ field schemas

2020-03-18 Thread Edward Ribeiro
What are your hard and soft commit settings? These can have a large impact on write throughput. Best, Edward On Wed, Mar 18, 2020 at 11:43 AM Tim Robertson wrote: > > Thank you Erick > > I should have been clearer that this is a bulk load job into a write-only > cluster (until loaded when i…

Re: Tuning for 500+ field schemas

2020-03-18 Thread Erick Erickson
Ah, ok. Then your spikes were probably being caused by segment merging, which would account for them appearing on different machines and running for a long time. Segment merging is a very expensive operation... As Edward mentioned, your commit settings come into play. You could easily be creating segments…

Re: Tuning for 500+ field schemas

2020-03-18 Thread Tim Robertson
Thank you Edward, Erick. In this environment, hard commits are at 60s without openSearcher, and soft commits are off. We have the luxury of building the index, then opening searchers and adding replicas afterward. We'll monitor the segment merging and lengthen the commit time as suggested - thank you!
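The settings Tim describes map onto the `<updateHandler>` section of `solrconfig.xml` roughly as follows (a sketch reconstructed from his description, not his actual config):

```xml
<!-- solrconfig.xml, inside <updateHandler>:
     hard commit every 60s without reopening the searcher;
     soft commits disabled (maxTime -1) since nothing is
     searching the cluster during the bulk load. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```

With `openSearcher=false`, hard commits only flush and fsync segments for durability, which keeps them cheap during indexing-only workloads.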