When you're doing hard commits, is it with openSearcher=true or
false? It should probably be false...
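
Roughly what I have in mind, as a sketch only (these go in the
<updateHandler> section of solrconfig.xml; the 10-minute hard commit
matches the suggestion below, and your actual values may well differ):

    <autoCommit>
      <maxTime>600000</maxTime>           <!-- hard commit every 10 minutes -->
      <openSearcher>false</openSearcher>  <!-- flush segments, don't open a new searcher -->
    </autoCommit>

    <autoSoftCommit>
      <maxTime>5000</maxTime>             <!-- soft commit every 5 seconds for visibility -->
    </autoSoftCommit>

With openSearcher=false the hard commit just makes the data durable and
rolls the transaction log; the soft commit is what controls when new
documents become searchable.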

Here's a rundown of the soft/hard commit consequences:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

I suspect (but, of course, can't prove) that you're over-committing
and hitting segment merges without meaning to...

FWIW,
Erick

On Wed, Jan 22, 2014 at 1:46 PM, Software Dev <static.void....@gmail.com> wrote:
> A suggestion would be to hard commit much less often, i.e. every 10
> minutes, and see if there is a change.
>
> - Will try this
>
> How much system RAM ? JVM Heap ? Enough space in RAM for system disk cache ?
>
> - We have 18GB of RAM, 12GB dedicated to Solr, but as of right now the
> total index size is only 5GB
>
> What is the size of your documents ? A few KB, MB, ... ?
>
> - Under 1MB
>
> Ah, and what about network IO ? Could that be a limiting factor ?
>
> - Again, the total index size is only 5GB so I don't know if this would
> be a problem
>
> On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez
> <andre.b...@kelkoo.com> wrote:
>
>> The node with more load is most likely the leader (because of the extra
>> work of receiving and distributing updates), but my experience shows
>> only a bit more CPU usage, and no difference in disk IO.
>>
>> A suggestion would be to hard commit much less often, i.e. every 10
>> minutes, and see if there is a change.
>> How much system RAM ? JVM Heap ? Enough space in RAM for system disk
>> cache ?
>> What is the size of your documents ? A few KB, MB, ... ?
>> Ah, and what about network IO ? Could that be a limiting factor ?
>>
>>
>> André
>>
>>
>> On 2014-01-21 23:40, Software Dev wrote:
>>
>>> Any other suggestions?
>>>
>>>
>>> On Mon, Jan 20, 2014 at 2:49 PM, Software Dev <static.void....@gmail.com>
>>> wrote:
>>>
>>>> 4.6.0
>>>>
>>>> On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller <markrmil...@gmail.com>
>>>> wrote:
>>>>
>>>>> What version are you running?
>>>>>
>>>>> - Mark
>>>>>
>>>>> On Jan 20, 2014, at 5:43 PM, Software Dev <static.void....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> We also noticed that disk IO shoots up to 100% on 1 of the nodes.
>>>>>> Do all updates get sent to one machine or something?
>>>>>>
>>>>>> On Mon, Jan 20, 2014 at 2:42 PM, Software Dev
>>>>>> <static.void....@gmail.com> wrote:
>>>>>
>>>>>>> We have a soft commit every 5 seconds and a hard commit every 30.
>>>>>>> As far as docs/second, I would guess around 200/sec, which doesn't
>>>>>>> seem that high.
>>>>>>>
>>>>>>> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson
>>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Questions: How often do you commit your updates? What is your
>>>>>>>> indexing rate in docs/second?
>>>>>>>>
>>>>>>>> In a SolrCloud setup, you should be using a CloudSolrServer. If the
>>>>>>>> server is having trouble keeping up with updates, switching to CUSS
>>>>>>>> probably wouldn't help.
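
For reference, a bare-bones sketch of what indexing through CloudSolrServer
looks like (SolrJ 4.x assumed; the ZooKeeper hosts, collection name, and
field names below are placeholders, not your actual setup):

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
      public static void main(String[] args) throws Exception {
        // ZooKeeper-aware client; sends updates to the shard leaders.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");
        doc.addField("title", "example document");

        // No explicit commit from the indexer; let the server-side
        // autoCommit/autoSoftCommit settings control commit frequency.
        server.add(doc);

        server.shutdown();
      }
    }

The main thing is that the indexing clients don't need to issue commits
themselves; the autoCommit/autoSoftCommit settings can take care of that.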
>>>>>>>>
>>>>>>>> So I suspect there's something not optimal about your setup that's
>>>>>>>> the culprit.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Erick
>>>>>>>>
>>>>>>>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev
>>>>>>>> <static.void....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> We are testing our shiny new Solr Cloud architecture but we are
>>>>>>>>> experiencing some issues when doing bulk indexing.
>>>>>>>>>
>>>>>>>>> We have 5 solr cloud machines running and 3 indexing machines
>>>>>>>>> (separate from the cloud servers). The indexing machines pull ids
>>>>>>>>> off a queue, then they index and ship over a document via a
>>>>>>>>> CloudSolrServer. It appears that the indexers are too fast because
>>>>>>>>> the load (particularly disk io) on the solr cloud machines spikes
>>>>>>>>> through the roof, making the entire cluster unusable. It's kind of
>>>>>>>>> odd because the total index size is not even large, i.e. < 10GB.
>>>>>>>>> Are there any optimizations/enhancements I could try to help
>>>>>>>>> alleviate these problems?
>>>>>>>>>
>>>>>>>>> I should note that for the above collection we only have 1 shard
>>>>>>>>> that's replicated across all machines, so all machines have the
>>>>>>>>> full index.
>>>>>>>>>
>>>>>>>>> Would we benefit from switching to a ConcurrentUpdateSolrServer
>>>>>>>>> where all updates get sent to 1 machine and 1 machine only? We
>>>>>>>>> could then remove this machine from the part of our cluster that
>>>>>>>>> handles user requests.
>>>>>>>>>
>>>>>>>>> Thanks for any input.
>>>>>>>>>
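
And just to illustrate the ConcurrentUpdateSolrServer alternative asked
about above (again only a sketch; the URL, queue size, and thread count
are made-up example values, and as noted it probably wouldn't help, since
everything funnels through one node and CUSS also doesn't report indexing
errors back to the caller):

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CussIndexer {
      public static void main(String[] args) throws Exception {
        // All updates are queued and sent to this single node, which then
        // forwards them to the rest of the cluster.
        ConcurrentUpdateSolrServer server =
            new ConcurrentUpdateSolrServer("http://solr1:8983/solr/collection1", 10, 4);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-2");
        server.add(doc);

        // Drain the internal queue before shutting down.
        server.blockUntilFinished();
        server.shutdown();
      }
    }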
>> --
>> André Bois-Crettez
>>
>> Software Architect
>> Search Developer
>> http://www.kelkoo.com/
>>
>>
