If you have fast disks and enough RAM, indexing is CPU-bound, so adjust the 
indexing load until the CPUs are busy but not overloaded.
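
A rough sketch of that kind of multi-threaded indexing client, assuming 
SolrJ's HttpSolrClient (the collection URL, field names, and batch size below 
are placeholders for illustration, not values from this thread):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            // Two indexing threads per CPU: one sends a batch while Solr
            // is busy processing the previous one.
            int threads = 2 * Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(threads);

            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    // One client per thread; the URL is illustrative.
                    try (HttpSolrClient solr = new HttpSolrClient.Builder(
                            "http://localhost:8983/solr/mycollection").build()) {
                        // Build and send one batch; a real job loops over its
                        // input and relies on autoCommit instead of committing
                        // after every batch.
                        List<SolrInputDocument> batch = new ArrayList<>();
                        for (int i = 0; i < 1000; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", UUID.randomUUID().toString());
                            doc.addField("body_t", "example document " + i);
                            batch.add(doc);
                        }
                        solr.add(batch);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
        }
    }

Watch CPU on the Solr nodes while that runs and raise or lower the thread 
count until they sit around 80% busy (or much lower if the cluster is also 
serving queries, as noted below).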

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 2, 2019, at 9:23 PM, Aroop Ganguly <aroopgang...@icloud.com> wrote:
> 
> That's an interesting scaling scheme you mention.
> I have been trying to devise a good scheme for our scale myself.
> 
> I will try to see how this works out for us.
> 
>> On Apr 2, 2019, at 9:15 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> 
>> Yeah, that would overload it. To get good indexing speed, I configure two 
>> clients per CPU on the indexing machine. With one shard on a 16-processor 
>> machine, that would be 32 threads. With four shards on four 16-processor 
>> machines, 128 clients. Basically, one thread is waiting while the CPU 
>> processes a batch and the other is sending the next batch.
>> 
>> That should get the cluster to about 80% CPU. If the cluster is handling 
>> queries at the same time, I cut that way back, like one client thread for 
>> every two CPUs.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Apr 2, 2019, at 8:13 PM, Aroop Ganguly <aroopgang...@icloud.com> wrote:
>>> 
>>> Multiple threads to the same index? And how many concurrent threads?
>>> 
>>> Our case is not merely multiple threads, but large-scale Spark indexer 
>>> jobs that index 1B records at a time with a concurrency of 400.
>>> In this case, multiple such jobs were indexing into the same index. 
>>> 
>>> 
>>>> On Apr 2, 2019, at 7:25 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>> 
>>>> We run multiple threads indexing to Solr all the time and have been doing 
>>>> so for years.
>>>> 
>>>> How big are your documents and how big are your batches?
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>>> On Apr 1, 2019, at 10:51 PM, Aroop Ganguly <aroopgang...@icloud.com> 
>>>>> wrote:
>>>>> 
>>>>> Turns out the cause was multiple indexing jobs writing into the same 
>>>>> index simultaneously, which, as one can imagine, can certainly put 
>>>>> heavy JVM load on certain replicas.
>>>>> Once this was found and only one job was run at a time, things were 
>>>>> back to normal.
>>>>> 
>>>>> Your comment about there being no correlation with the stack trace 
>>>>> seems right! 
>>>>> 
>>>>>> On Apr 1, 2019, at 5:32 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>> 
>>>>>> On 4/1/2019 5:40 PM, Aroop Ganguly wrote:
>>>>>>> Thanks Shawn, for the initial response.
>>>>>>> Digging into it a bit, I was wondering whether we should read the 
>>>>>>> innermost stack.
>>>>>>> The innermost stack seems to be telling us something about what 
>>>>>>> triggered it?
>>>>>>> Of course, the system could have been overloaded as well, but is the 
>>>>>>> exception telling us something, or is it of no use to consider this stack?
>>>>>> 
>>>>>> The stacktrace on OOME is rarely useful.  The memory allocation where 
>>>>>> the error is thrown probably has absolutely no connection to the part of 
>>>>>> the program where major amounts of memory are being used.  It could be 
>>>>>> ANY memory allocation that actually causes the error.
>>>>>> 
>>>>>> Thanks,
>>>>>> Shawn
>>>>> 
>>>> 
>>> 
>> 
> 
