Thanks for pointing that out. I'm attaching a patch for the ref-guide
which summarizes what you said. Maybe other people will find this useful
as well?
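To check that I understood your third point, here is a toy Java model of per-replica autocommit timers. It is a minimal sketch and not Solr code: `ReplicaCommitSim` and `visibleAt` are made-up names. It assumes arrival times are sorted, that a replica's timer starts on its first uncommitted update, and that an update arriving at the exact instant of a commit misses that commit. It replays your doc1/doc2 timeline with a 60-second hard autocommit:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of per-replica hard autocommit timers, NOT Solr's CommitTracker.
// Assumptions: arrivals are sorted ascending; a timer starts on the first update
// received while no timer is pending; a commit at time C covers updates that
// arrived strictly before C (an update arriving exactly at C waits for the next one).
public class ReplicaCommitSim {

    // Returns, for each update, the time (ms) at which it becomes searchable on one replica.
    public static long[] visibleAt(long[] arrivalsMs, long maxTimeMs) {
        long[] visible = new long[arrivalsMs.length];
        List<Integer> batch = new ArrayList<>(); // indexes of uncommitted updates
        boolean timerRunning = false;
        long commitAt = 0;
        for (int i = 0; i < arrivalsMs.length; i++) {
            long t = arrivalsMs[i];
            if (timerRunning && t >= commitAt) {
                // The timer fired before this update arrived: commit the pending batch.
                for (int j : batch) visible[j] = commitAt;
                batch.clear();
                timerRunning = false;
            }
            if (!timerRunning) {
                // The first uncommitted update on this replica starts its own timer.
                commitAt = t + maxTimeMs;
                timerRunning = true;
            }
            batch.add(i);
        }
        // The last pending timer eventually fires too.
        for (int j : batch) visible[j] = commitAt;
        return visible;
    }

    public static void main(String[] args) {
        long maxTime = 60_000; // hard autocommit every 60 seconds
        // Leader: doc1 at T=0, doc2 indexed 5 ms before the leader's commit fires.
        long[] leader = visibleAt(new long[] {0, 59_995}, maxTime);
        // Follower: doc1 arrives 10 ms after the leader got it, doc2 arrives 15 ms after.
        long[] follower = visibleAt(new long[] {10, 59_995 + 15}, maxTime);
        System.out.println("leader:   doc1=" + leader[0] + " doc2=" + leader[1]);
        System.out.println("follower: doc1=" + follower[0] + " doc2=" + follower[1]);
    }
}
```

Running it prints doc2 as searchable at 60000 ms on the leader but only at 120010 ms on the follower, i.e. one full commit interval later, which matches your description.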

Oh and Erick, thanks for your ever thoughtful replies. Given all the
hours of your time I've soaked up over the years, you should probably
start invoicing me :-)

 - Bram

On 17/06/2020 13:55, Erick Erickson wrote:
> Each node has its own timer that starts when it receives an update.
> So in your situation, 60 seconds after any given replica gets its first
> update, all documents that have been received in the interval will
> be committed.
> 
> But note several things:
> 
> 1> commits will tend to cluster for a given shard. By that I mean
>    they’ll tend to happen within a few milliseconds of each other
>    ‘cause it doesn’t take that long for an update to get from the
>    leader to all the followers.
> 
> 2> this is per replica. So if you host replicas from multiple collections
>    on some node, their commits have no relation to each other. And
>    say for some reason you transmit exactly one document that lands
>    on shard1. Further, say nodeA contains replicas for shard1 and shard2.
>    Only the replica for shard1 would commit.
> 
> 3> Solr promises eventual consistency. In this case, due to all the
>    timing variables, it is not guaranteed that every replica of a single
>    shard has the same documents available for search at any given time.
>    Say doc1 hits the leader at time T and a follower at time T+10ms.
>    Say doc2 hits the leader and gets indexed 5ms before the
>    commit is triggered, but for some reason it takes 15ms for it to get
>    to the follower. The leader will be able to search doc2, but the
>    follower won’t until 60 seconds later.
> 
> Best,
> Erick
> 
>> On Jun 17, 2020, at 5:36 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
>>
>> 'morning :-)
>>
>> I'm wondering how autocommits work in Solr.
>>
>> Say I have a cluster with many nodes and many collections with many
>> shards. If each collection's config has a hard autocommit configured
>> every minute, does that mean that SolrCloud (presumably the leader?)
>> will dish out commit requests to each node on that schedule? Or does
>> each node have its own timed trigger?
>>
>> If it's the former, doesn't that mean the load will spike dramatically
>> across the whole cluster every minute?
>>
>> I tried reading the code, but I don't quite understand the way
>> CommitTracker and the UpdateHandlers interact with SolrCloud.
>>
>> Thanks,
>>
>> - Bram
> 

From 858406e5c322a96c82934a6477518f65c5c605cc Mon Sep 17 00:00:00 2001
From: Bram <bram.van...@intix.eu>
Date: Wed, 17 Jun 2020 22:54:46 +0200
Subject: [PATCH] Add a blurb about commit timings to the SolrCloud
 documentation

---
 .../src/shards-and-indexing-data-in-solrcloud.adoc              | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc b/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc
index 3aa07cbdae7..43828048383 100644
--- a/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc
+++ b/solr/solr-ref-guide/src/shards-and-indexing-data-in-solrcloud.adoc
@@ -122,6 +122,8 @@ More details on how to use shard splitting is in the section on the Collection A
 
 In most cases, when running in SolrCloud mode, indexing client applications should not send explicit commit requests. Rather, you should configure auto commits with `openSearcher=false` and auto soft-commits to make recent updates visible in search requests. This ensures that auto commits occur on a regular schedule in the cluster.
 
+TIP: Each replica has its own auto commit timer, which starts when it receives an update. Solr only promises eventual consistency: leaders will generally receive updates *before* the other replicas of their shard, so a replica may briefly lag behind its leader.
+
 To enforce a policy where client applications should not send explicit commits, you should update all client applications that index data into SolrCloud. However, that is not always feasible, so Solr provides the `IgnoreCommitOptimizeUpdateProcessorFactory`, which allows you to ignore explicit commits and/or optimize requests from client applications without having refactor your client application code.
 
 To activate this request processor you'll need to add the following to your `solrconfig.xml`:
-- 
2.20.1
