This would be good! Especially for NRT, where this problem is somewhat harder. I think we may need to look at caching readers per HTTP session; the tricky part is expiring them before we run out of RAM.
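Roughly what I have in mind - just a sketch, nothing like this exists in Solr today, and the class and field names are invented - is to pin the reader a session first searched against and let a periodic sweep release readers that have gone idle, so they get dropped before RAM becomes the problem:

import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.index.IndexReader;

/**
 * Hypothetical per-session reader cache (not existing Solr code).
 * Each HTTP session pins the reader it first searched against, so
 * follow-up requests in that session see a consistent view of the
 * index.  A periodic sweep expires idle entries so readers are
 * released before they exhaust RAM.
 */
public class SessionReaderCache {

  private static class Entry {
    final IndexReader reader;
    volatile long lastAccess;
    Entry(IndexReader reader) {
      this.reader = reader;
      this.lastAccess = System.currentTimeMillis();
    }
  }

  private final Map<String, Entry> bySession = new ConcurrentHashMap<String, Entry>();
  private final long maxIdleMs;

  public SessionReaderCache(long maxIdleMs) {
    this.maxIdleMs = maxIdleMs;
  }

  /** Pin the given reader to a session (incRef keeps it open across commits). */
  public void bind(String sessionId, IndexReader reader) throws IOException {
    reader.incRef();
    Entry old = bySession.put(sessionId, new Entry(reader));
    if (old != null) {
      old.reader.decRef();   // release the previously pinned reader
    }
  }

  /** Return the session's pinned reader, or null if it expired or was never bound. */
  public IndexReader get(String sessionId) {
    Entry e = bySession.get(sessionId);
    if (e == null) return null;
    e.lastAccess = System.currentTimeMillis();
    return e.reader;
  }

  /** Drop entries idle longer than maxIdleMs; meant to be called periodically. */
  public void sweep() throws IOException {
    long now = System.currentTimeMillis();
    for (Iterator<Map.Entry<String, Entry>> it = bySession.entrySet().iterator(); it.hasNext();) {
      Entry e = it.next().getValue();
      if (now - e.lastAccess > maxIdleMs) {
        it.remove();
        e.reader.decRef();   // reader is actually closed once all refs are released
      }
    }
  }
}

The open question is the idle timeout: too long and readers from several index generations pile up in memory; too short and a slow client loses its consistent view mid-session.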
On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> Longer term, it might be nice to enable clients to specify what
> version of the index they were searching against. This could be used
> to prevent consistency issues across different slaves, even if they
> commit at different times. It could also be used in distributed
> search to make sure the index didn't change between phases.
>
> -Yonik
> http://www.lucidimagination.com
>
> 2009/8/14 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>> On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati <jimmoe...@gmail.com> wrote:
>>>
>>> Hey Noble,
>>> you are right in that this will solve the problem, however it implicitly
>>> assumes that commits to the master are infrequent enough (so that most
>>> polling operations yield no update and only every few polls lead to an
>>> actual commit).
>>> This is a relatively safe assumption in most cases, but one that couples the
>>> master update policy with the performance of the slaves - if the master gets
>>> updated (and committed to) frequently, slaves might face a commit on every
>>> 1-2 polls, much more than is feasible given new searcher warmup times.
>>> In effect, what this comes down to is that I must make the master commit
>>> frequency the same as I'd want the slaves to use - and this is markedly
>>> different from the previous behaviour, with which I could have the master
>>> get updated (+committed to) at one rate and the slaves commit those
>>> updates at a different rate.
>>
>> I see the argument. But isn't it better to keep both the master and
>> slave as consistent as possible? There is no use in committing on the
>> master if you do not plan to search on those docs. So the best thing
>> to do is to commit only as frequently as you wish to commit on a
>> slave.
>>
>> On a different track, if we can have an option of disabling commit
>> after replication, is it worth it? So the user can trigger a commit
>> explicitly.
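On the "disable commit after replication" idea: if such an option existed (it doesn't today - this is purely hypothetical), the explicit commit could come from one small external job that hits every slave at the same wall-clock moment, e.g. via the standard update handler's commit=true parameter. The slave host names below are made up:

import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Hypothetical external commit trigger.  Assumes a (non-existent today)
 * replication option that pulls index files without opening a new
 * searcher; a tiny job like this could then flip all slaves at once.
 */
public class SynchronizedCommit {

  private static final String[] SLAVES = {
      "http://slave1:8983/solr",
      "http://slave2:8983/solr"
  };

  public static void main(String[] args) throws Exception {
    for (String slave : SLAVES) {
      // The standard update handler accepts commit=true as a request parameter.
      URL url = new URL(slave + "/update?commit=true");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      System.out.println(slave + " -> HTTP " + conn.getResponseCode());
      conn.disconnect();
    }
  }
}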
>>>
>>> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>>>>
>>>> usually the pollInterval is kept to a small value like 10secs. there
>>>> is no harm in polling more frequently. This can ensure that the
>>>> replication happens at almost the same time
>>>>
>>>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati <jimmoe...@gmail.com> wrote:
>>>>>
>>>>> Hey Shalin,
>>>>> thanks for your prompt reply.
>>>>> To clarify:
>>>>> With the old script-based replication, I would snappull every x minutes
>>>>> (say, on the order of 5 minutes).
>>>>> Assuming no index optimize occurred (I optimize 1-2 times a day, so we can
>>>>> disregard it for the sake of argument), the snappull would take a few
>>>>> seconds to run on each iteration.
>>>>> I then have a crontab on all slaves that runs snapinstall at fixed times,
>>>>> say every 15 minutes from the start of a round hour, inclusive. (Slave
>>>>> machine times are synced, e.g. via ntp.) So essentially all slaves begin
>>>>> a snapinstall at exactly the same time - assuming uniform load and the
>>>>> fact that they all have the same snapshot at that point, since I snappull
>>>>> frequently - and this leads to fairly synchronized replication across
>>>>> the board.
>>>>>
>>>>> With the new replication, however, it seems that by binding the pulling
>>>>> and installing together, and by specifying the timing only as deltas
>>>>> (as opposed to "absolute time" as in crontab), we've essentially made it
>>>>> impossible to keep multiple slaves up to date and synchronized; e.g. if
>>>>> we set the poll interval to 15 minutes, a slight offset in the startup
>>>>> times of the slaves (which can very much happen after arbitrary
>>>>> resets/maintenance operations) can lead to deviations in
>>>>> snappull(+install) times. This is further made worse by the fact that
>>>>> the pollInterval is computed from when the last commit *finished* - and
>>>>> that number has higher variance, e.g. due to warmup, which can differ
>>>>> across machines based on the queries they've handled previously.
>>>>>
>>>>> To summarize, it seems to me it might be beneficial to introduce a
>>>>> second parameter that acts more like a crontab time table, in so far as
>>>>> it lets a user specify when an actual commit should occur - so we can
>>>>> set the pollInterval to a low value (e.g. 60 seconds) but only perform a
>>>>> commit at the 0-, 15-, 30- and 45-minute marks of every hour. This makes
>>>>> the commit times on the slaves fairly deterministic.
>>>>>
>>>>> Does this make sense, or am I missing something about the current
>>>>> in-process replication?
>>>>>
>>>>> Thanks,
>>>>> -Chak
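A crontab-style schedule like the one Chak describes is easy to compute on the client side. Something along these lines (just an illustration, not an existing Solr parameter) would fire at the 0/15/30/45-minute marks regardless of when the process started:

import java.util.Calendar;

/**
 * Sketch of an "absolute-time" schedule: poll frequently, but only
 * install/commit on fixed wall-clock marks (here 0/15/30/45 minutes
 * past the hour), so all slaves flip searchers together.
 */
public class QuarterHourSchedule {

  /** Milliseconds until the next 0/15/30/45-minute boundary. */
  public static long millisToNextSlot() {
    Calendar now = Calendar.getInstance();
    Calendar next = (Calendar) now.clone();
    int minute = now.get(Calendar.MINUTE);
    int slot = ((minute / 15) + 1) * 15;   // next quarter-hour mark
    next.set(Calendar.MINUTE, 0);
    next.set(Calendar.SECOND, 0);
    next.set(Calendar.MILLISECOND, 0);
    next.add(Calendar.MINUTE, slot);       // may roll over into the next hour
    return next.getTimeInMillis() - now.getTimeInMillis();
  }

  public static void main(String[] args) throws InterruptedException {
    while (true) {
      Thread.sleep(millisToNextSlot());
      System.out.println("install the latest pulled snapshot / trigger the commit here");
      // e.g. call each slave as in the SynchronizedCommit sketch above
    }
  }
}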
>>>>>
>>>>> Shalin Shekhar Mangar wrote:
>>>>>>
>>>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati <jimmoe...@gmail.com> wrote:
>>>>>>
>>>>>>> In the old replication, I could snappull with multiple slaves
>>>>>>> asynchronously but perform the snapinstall on each at the same time
>>>>>>> (+- epsilon seconds), so that production load-balanced query serving
>>>>>>> would always be consistent.
>>>>>>>
>>>>>>> With the new system it seems that I have no control over syncing them;
>>>>>>> rather, it polls every few minutes and then decides the next cycle based
>>>>>>> on the last time it *finished* updating, so in any case I lose control
>>>>>>> over the synchronization of snap installation across multiple slaves.
>>>>>>
>>>>>> That is true. How did you synchronize them with the script-based
>>>>>> solution? Assuming network bandwidth is equally distributed and all
>>>>>> slaves are equal in hardware/configuration, the time difference between
>>>>>> new searcher registration on any slave should not be more than
>>>>>> pollInterval, no?
>>>>>>
>>>>>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>>>>>> that for such a rapid interval, what I mentioned above is a non-issue;
>>>>>>> however, I am not clear how this works vis-a-vis the new searcher
>>>>>>> warmup. For a considerable index size (20 million docs+), the warmup
>>>>>>> itself is an expensive and somewhat lengthy process, and if a new
>>>>>>> searcher opens and warms up every minute, I am not at all sure I'll be
>>>>>>> able to serve queries with reasonable QTimes.
>>>>>>
>>>>>> If the pollInterval is 60 seconds, it does not mean that a new index is
>>>>>> fetched every 60 seconds. A new index is downloaded and installed on the
>>>>>> slave only if a commit happened on the master (i.e. the index was
>>>>>> actually changed on the master).
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul | Principal Engineer | AOL | http://aol.com
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>> --
>> -----------------------------------------------------
>> Noble Paul | Principal Engineer | AOL | http://aol.com
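P.S. On Shalin's last point - the slave only fetches when the master's index actually changed - the check amounts to comparing index versions. A much-simplified illustration (the real slave-side code does this properly through the replication handler's response; here we just compare the raw body returned by the indexversion command):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

/**
 * Simplified illustration of the slave-side decision: ask the master
 * for its current index version and only start a fetch when it differs
 * from what we last installed.
 */
public class VersionCheck {

  private static String lastSeen = "";

  public static boolean masterChanged(String masterUrl) throws Exception {
    URL url = new URL(masterUrl + "/replication?command=indexversion");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
    StringBuilder body = new StringBuilder();
    for (String line; (line = in.readLine()) != null;) {
      body.append(line);
    }
    in.close();
    boolean changed = !body.toString().equals(lastSeen);
    lastSeen = body.toString();
    return changed;   // fetch index files only when this is true
  }
}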