Arturas:

" it is becoming incredibly difficult to find working code"

Yeah, I sympathize totally. What I usually do is go into the test code
of whatever version of Solr I'm using and find examples there. _That_
code _must_ be kept up to date ;).

About batching docs: what you gain is basically more efficient I/O, since
you don't have to wait for the client to connect/disconnect for
every doc. Here are some numbers:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/ with
all the caveats that YMMV.
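As a rough sketch of what client-side batching looks like (the collection name `dewiki`, the URL, and the batch size are placeholders, and this uses only Python's stdlib against Solr's JSON update endpoint rather than SolrJ):

```python
import json
import urllib.request

def batches(docs, size=1000):
    """Yield successive slices of `size` docs from the full list."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_batched(docs, solr_url="http://localhost:8983/solr/dewiki/update",
                  batch_size=1000):
    """POST docs to Solr's JSON update handler in batches, so there is
    one HTTP round trip per batch instead of one per document.
    solr_url and collection name are example values."""
    for batch in batches(docs, batch_size):
        req = urllib.request.Request(
            solr_url,
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
```

The win is exactly the I/O point above: 2,500 docs become 3 requests instead of 2,500.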

Best,
Erick

On Thu, Jul 5, 2018 at 7:48 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 7/4/2018 3:32 AM, Arturas Mazeika wrote:
>>
>> Details:
>>
>> I am benchmarking a SolrCloud setup on a single machine (an Intel i7 with 8
>> "cpu cores", an SSD as well as an HDD) using the German Wikipedia collection.
>> I created a 4-node, 4-shard, replication factor 2 cluster on the same machine
>> (and managed to push the CPU or SSD to the hardware limits, i.e., ~200MB/s,
>> ~100% CPU). Now I wanted to see what happens if I push the HDD to the limits.
>> Indexing the files from the SSD (I am able to scan the collection at the
>> actual rate 400-500MB/s) with 16 threads, I tried to send those to the
>> solr
>> cluster with all indexes on the HDD.
>
> <snip>
>>
>> - 4 cores running 2gb ram
>
>
> If this is saying that the machine running Solr has 2GB of installed memory,
> that's going to be a serious problem.
>
> The default heap size that Solr starts with is 512MB.  With 4 Solr nodes
> running on the machine, each with a 512MB heap, all of your 2GB of memory is
> going to be required by the heaps.  Java requires memory beyond the heap to
> run.  Your operating system and its other processes will also require some
> memory.
>
> This means that not only are you going to have no memory left for the OS
> disk cache, you're actually going to be allocating MORE than the 2GB of
> installed memory, which means the OS is going to start swapping to
> accommodate memory allocations.
>
> When you don't have enough memory for good disk caching, Solr performance is
> absolutely terrible.  When Solr has to wait for data to be read off of disk,
> even if the disk is SSD, its performance will not be good.
>
> When the OS starts swapping, the performance of ANY software on the system
> drops SIGNIFICANTLY.
>
> You need a lot more memory than 2GB on your server.
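> For reference, once more memory is installed, the per-node heap can be raised
> when starting each node with bin/solr's -m option (the port and the 1g value
> below are examples only; the right heap size depends on the index and load):
>
> ```shell
> # Start a SolrCloud node with a 1GB heap instead of the 512MB default.
> # -p (port) and -m (heap size) values here are illustrative.
> bin/solr start -cloud -p 8983 -m 1g
> ```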
>
> Thanks,
> Shawn
>
