Re: Replication and soft commits for NRT searches

MOIS Martin (MORPHO) Wed, 14 Oct 2015 23:27:05 -0700

Hello,

the background for my question is that our injection tool must report that a 
new document has been successfully enrolled in the cluster only once it is 
available on all replicas. The automated integration test for this feature 
submits a document to the cluster and then checks whether it can be found with 
an appropriate query (that is why I have configured 
autoSoftCommit/maxDocs=1).
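
To make the test concrete, here is a minimal sketch of the check as I picture 
it, in Python. Instead of a fixed sleep it polls the query until a deadline. 
The function names, the select URL and the assumption that the uniqueKey field 
is called "id" are my own placeholders, not anything prescribed by Solr.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def solr_has_doc(select_url: str, doc_id: str) -> bool:
    """Return True if a query for the given id finds at least one document.

    select_url is a placeholder, e.g.
    "http://localhost:8983/solr/mycollection/select".
    Assumes the uniqueKey field is named "id".
    """
    # Escape backslashes and quotes so the id can be used as a quoted phrase.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = urllib.parse.urlencode(
        {"q": 'id:"%s"' % escaped, "rows": 0, "wt": "json"}
    )
    with urllib.request.urlopen(select_url + "?" + params, timeout=5) as resp:
        body = json.load(resp)
    return body["response"]["numFound"] > 0


def wait_until_visible(
    is_visible: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_visible() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Hypothetical usage in the integration test (requires a running cluster):
# url = "http://localhost:8983/solr/mycollection/select"
# assert wait_until_visible(lambda: solr_has_doc(url, "doc-1"))
```

The timeout bounds how long the test waits, so a slow soft commit would show 
up as a test failure instead of an occasional false negative.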


This raised the following question: what happens if the update request returns 
rf=1 and, directly after the update, I submit a query to a cluster with a 
replication factor of two (load balancing may route it to the replica)? Will 
the automated integration test fail intermittently? Will I have to insert an 
artificial wait between the update and the query and, if so, how long? And how 
can I implement the requirement that our injection tool reports success only 
if the document has been replicated to all replicas?
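
One approach I am considering is to ask every replica core directly, using 
distrib=false so that the query is answered by that core alone, and to report 
success only if each of them finds the document. The following is only a 
sketch under my own assumptions: the core URLs and function names are 
placeholders, the uniqueKey field is assumed to be "id", and an unreachable 
core is treated as "not enrolled".

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Sequence


def fetch_json(url: str) -> dict:
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def core_has_doc(
    core_url: str,
    doc_id: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Ask a single core (distrib=false) whether it can find the document.

    core_url is a placeholder such as
    "http://host1:8983/solr/mycollection_shard1_replica1".
    """
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = urllib.parse.urlencode(
        {"q": 'id:"%s"' % escaped, "distrib": "false", "rows": 0, "wt": "json"}
    )
    try:
        body = fetch(core_url + "/select?" + params)
    except OSError:
        # Unreachable core: do not count the document as enrolled there.
        return False
    return body["response"]["numFound"] > 0


def enrolled_on_all_replicas(
    core_urls: Sequence[str],
    doc_id: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Return True only if every listed core finds the document."""
    if not core_urls:
        return False
    return all(core_has_doc(url, doc_id, fetch) for url in core_urls)
```

Because each core only sees the document after its own soft commit, this check 
would probably have to be retried for a short while before the tool gives up 
and reports a failure.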

Best Regards,
Martin Mois

>bq: If a timeout between shard leader and replica can
>lead to a smaller rf value (because replication has
>timed out), is it possible to increase this timeout in the configuration?
>
>Why do you care? If it timed out, then the follower will
>no longer be active and will not serve queries. The Cloud view
>should show it in "down", "recovery" or the like. Before it
>goes back to the "active" state, it will synchronize from
>the leader automatically without you having to do anything and
>any docs that were indexed to the leader will be faithfully
>reflected on the follower  _before_ the recovering
>follower serves any new queries. So practically it makes no
>difference whether there was an update timeout or not.
>
>This is feeling a lot like an "XY" problem. You're asking detailed
>questions about "X" (in this case timeouts, what rf means and the like)
>without telling us what the problem you're concerned about is ("Y").
>
>So please back up and tell us what your higher level concern is.
>Do you have any evidence of Bad Things Happening?
>
>And do, please, change your commit intervals so they do not happen after
>every doc. That's a Really Bad Practice in Solr.
>
>Best,
>Erick
>
>On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
><martin.m...@morpho.com> wrote:
>> Hello,
>>
>> thank you for the detailed answer.
>>
>> If a timeout between shard leader and replica can lead to a smaller rf value
>> (because replication has timed out), is it possible to increase this timeout
>> in the configuration?
>>
>> Best Regards,
>> Martin Mois
>>
>> Comments inline:
>>
>> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
>> <martin.m...@morpho.com> wrote:
>>> Hello,
>>>
>>> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been
>>> created with replicationFactor=2, i.e. I have one replica for each shard.
>>> Beyond that I am using autoCommit/maxDocs=10000 and autoSoftCommit/maxDocs=1
>>> in order to achieve near realtime search behavior.
>>>
>>> As far as I understand from section "Write Side Fault Tolerance" in the
>>> documentation
>>> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>>> I cannot enforce that an update gets replicated to all replicas, but I can
>>> only get the achieved replication factor by requesting the return value rf.
>>>
>>> My question is now, what exactly does rf=2 mean? Does it only mean that the
>>> replica has written the update to its transaction log? Or has the replica
>>> also performed the soft commit as configured with autoSoftCommit/maxDocs=1?
>>> The answer is important for me: if the update were only written to the
>>> transaction log, I could not search for it reliably, as the replica may not
>>> yet have added it to the searchable index.
>>
>> rf=2 means that the update was successfully replicated to and
>> acknowledged by two replicas (including the leader). The rf only deals
>> with the durability of the update and has no relation to visibility of
>> the update to searchers. The auto(soft)commit settings are applied
>> asynchronously and do not block an update request.
>>
>>>
>>> My second question is, does rf=1 mean that the update was definitely not
>>> successful on the replica, or could it also represent a timeout of the
>>> replication request from the shard leader? If it could also represent a
>>> timeout, then there would be a small chance that the replication was
>>> successful despite the timeout.
>>
>> Well, rf=1 implies that the update was only applied on the leader's
>> index + tlog and either replicas weren't available or returned an
>> error or the request timed out. So yes, you are right that it can
>> represent a timeout and as such there is a chance that the replication
>> was indeed successful despite the timeout.
>>
>>>
>>> Is there a way to retrieve the replication factor for a specific document
>>> after the update in order to check if replication was successful in the
>>> meantime?
>>>
>>
>> No, there is no way to do that.
>>
>>> Thanks in advance.
>>>
>>> Best Regards,
>>> Martin Mois
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
#
" This e-mail and any attached documents may contain confidential or 
proprietary information. If you are not the intended recipient, you are 
notified that any dissemination, copying of this e-mail and any attachments 
thereto or use of their contents by any means whatsoever is strictly 
prohibited. If you have received this e-mail in error, please advise the sender 
immediately and delete this e-mail and all attached documents from your 
computer system."
#
