Re: push to the limit without going over

2018-07-05 Thread Erick Erickson
Arturas: " it is becoming incredibly difficult to find working code" Yeah, I sympathize totally. What I usually do is go into the test code of whatever version of Solr I'm using and find examples there. _That_ code _must_ be kept up to date ;). About batching docs. What you gain basically more e

Re: push to the limit without going over

2018-07-05 Thread Shawn Heisey
On 7/4/2018 3:32 AM, Arturas Mazeika wrote: Details: I am benchmarking a SolrCloud setup on a single machine (an Intel i7 with 8 "CPU cores", an SSD as well as an HDD) using the German Wikipedia collection. I created a 4-node, 4-shard, replication factor 2 cluster on the same machine (and managed to push the
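
For reference, a sketch of creating such a collection through SolrJ's Collections API; the collection name "dewiki", the "_default" configset, and the embedded-ZooKeeper address localhost:9983 are assumptions, not details from the thread:

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateCluster {
        public static void main(String[] args) throws Exception {
            // Point SolrJ at the ZooKeeper instance coordinating the 4 nodes.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("localhost:9983"),
                    Optional.empty()).build()) {
                // 4 shards x replication factor 2 = 8 cores spread across the 4 nodes.
                CollectionAdminRequest
                    .createCollection("dewiki", "_default", 4, 2)
                    .process(client);
            }
        }
    }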

Re: push to the limit without going over

2018-07-05 Thread Arturas Mazeika
Hi Erick et al, Thanks a lot for the response. Your explanation seems very plausible and I'd love to investigate it further. Batching the docs (surprisingly, for me) improved the numbers:

Buffer size   secs   MB/s         Docs/s
N:500         1117   34.4077538   2400.72695
N:100         1073   35.8186962   2499.17241
N:10          1170
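
As a back-of-envelope check on the units in that table, a sketch that recomputes Docs/s and MB/s from run totals; the totals below are assumptions back-solved from the N:500 row, not figures reported in the thread:

    public class ThroughputReport {
        public static void main(String[] args) {
            // Assumed run totals (back-solved from the N:500 row above).
            long totalDocs = 2_681_612L;                  // documents indexed
            double totalMb = 38_433.5;                    // input volume in MB
            double elapsedSecs = 1117.0;                  // wall-clock seconds

            // Throughput figures in the same units as the table.
            double docsPerSec = totalDocs / elapsedSecs;  // ~2400.7 Docs/s
            double mbPerSec = totalMb / elapsedSecs;      // ~34.4 MB/s
            System.out.printf("Docs/s: %.2f, MB/s: %.2f%n", docsPerSec, mbPerSec);
        }
    }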

Re: push to the limit without going over

2018-07-04 Thread Erick Erickson
First, I usually prefer constructing CloudSolrClient using the ZooKeeper ensemble string rather than URLs, although that's probably not a cure for your problem. Here's what I _think_ is happening. If you're slamming Solr with a lot of updates, you're doing a lot of merging. At some point w
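
A minimal sketch of the ZooKeeper-ensemble form, assuming a three-host ensemble on the default port 2181 and a collection named "dewiki"; all of those names are placeholders:

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ZkEnsembleClient {
        public static void main(String[] args) throws Exception {
            // Build the client from the ZooKeeper ensemble (placeholder hosts)
            // instead of a fixed list of Solr URLs; the client then watches
            // cluster state and routes updates straight to shard leaders.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
                    Optional.empty() /* no chroot */).build()) {
                client.setDefaultCollection("dewiki");    // assumed collection name
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "example-1");
                client.add(doc);
                client.commit();
            }
        }
    }

Because the client learns the cluster layout from ZooKeeper, replicas can be added or moved without changing client configuration, which is the usual argument for this form over a hard-coded URL list.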