Arturas: "it is becoming incredibly difficult to find working code"

Yeah, I sympathize totally. What I usually do is go into the test code
of whatever version of Solr I'm using and find examples there. _That_
code _must_ be kept up to date ;).

About batching docs: what you gain is basically more efficient I/O. You
don't have to wait around for the client to connect/disconnect for
every doc. Here are some numbers:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
with all the caveats that YMMV.

Best,
Erick

On Thu, Jul 5, 2018 at 7:48 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 7/4/2018 3:32 AM, Arturas Mazeika wrote:
>>
>> Details:
>>
>> I am benchmarking solrcloud setup on a single machine (Intel 7 with 8 "cpu
>> cores", an SSD as well as a HDD) using the German Wikipedia collection. I
>> created 4 nodes, 4 shards, rep factor: 2 cluster on the same machine (and
>> managed to push the CPU or SSD to the hardware limits, i.e., ~200MB/s,
>> ~100% CPU). Now I wanted to see what happens if I push HDD to the limits.
>> Indexing the files from the SSD (I am able to scan the collection at the
>> actual rate 400-500MB/s) with 16 threads, I tried to send those to the
>> solr cluster with all indexes on the HDD.
>
> <snip>
>>
>> - 4 cores running 2gb ram
>
> If this is saying that the machine running Solr has 2GB of installed memory,
> that's going to be a serious problem.
>
> The default heap size that Solr starts with is 512MB. With 4 Solr nodes
> running on the machine, each with a 512MB heap, all of your 2GB of memory is
> going to be required by the heaps. Java requires memory beyond the heap to
> run. Your operating system and its other processes will also require some
> memory.
>
> This means that not only are you going to have no memory left for the OS
> disk cache, you're actually going to be allocating MORE than the 2GB of
> installed memory, which means the OS is going to start swapping to
> accommodate memory allocations.
>
> When you don't have enough memory for good disk caching, Solr performance is
> absolutely terrible. When Solr has to wait for data to be read off of disk,
> even if the disk is SSD, its performance will not be good.
>
> When the OS starts swapping, the performance of ANY software on the system
> drops SIGNIFICANTLY.
>
> You need a lot more memory than 2GB on your server.
>
> Thanks,
> Shawn
>
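
For what it's worth, the arithmetic in the quoted message can be written out as a rough sketch. The node count, the 512MB heap, and the 2GB of RAM come from the thread. The function name `memory_budget` and the per-JVM overhead and OS reserve in the second call are purely illustrative assumptions, not measured values, so treat the second result as a shape rather than a number.

```python
# Rough memory budget for the single-machine setup discussed in the thread.
# From the thread: 4 Solr nodes, 512MB default heap each, 2GB installed RAM.
# The overhead/reserve values used below are assumptions for illustration.

MB_PER_GB = 1024


def memory_budget(installed_gb, nodes, heap_mb, jvm_overhead_mb=0, os_reserve_mb=0):
    """Return (committed_mb, left_for_disk_cache_mb) for one machine.

    A negative second value means more memory is committed than is
    installed, i.e. the OS would have to swap.
    """
    installed_mb = installed_gb * MB_PER_GB
    committed_mb = nodes * (heap_mb + jvm_overhead_mb) + os_reserve_mb
    return committed_mb, installed_mb - committed_mb


# Heaps alone already consume all installed memory.
print(memory_budget(installed_gb=2, nodes=4, heap_mb=512))

# With an assumed 128MB of non-heap memory per JVM and an assumed 512MB
# for the OS and other processes, the machine is overcommitted.
print(memory_budget(installed_gb=2, nodes=4, heap_mb=512,
                    jvm_overhead_mb=128, os_reserve_mb=512))
```

The first call shows nothing left over for the OS disk cache from the heaps alone. The second goes negative under the assumed overheads, which is the swapping case described above.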