Re: IndexFingerprint and Leader Election Slowness

Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) Mon, 05 May 2025 11:39:58 -0700

Thanks for the reply Jason,

I think I am actually most curious about the TLOG case. Parallelizing is good 
but amounts to just throwing more hardware at the problem. Also, one of the 
tests mysteriously fails when you do this so it isn't a super quick win :-)


From reading the ref guide I see that we claim that TLOGs also don't do any 
local indexing: 
https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#tlog-replicas-plus-pull-replicas:~:text=TLOG%3A%20This%20type,type%20of%20replica.
Whether the reality is more nuanced would require looking at the code. Just 
curious, do you have any more specific reason to suspect why TLOG replicas 
would need fingerprinting? Perhaps I'm oversimplifying, but if TLOGs actually do 
*not* do local indexing as we claim, then I don't really see why taking a 
fingerprint over the index would be useful. I can take a deeper look at the 
code as well as run the tests that originally inspired this feature as a 
sanity check (a litmus test of whether tlog+pull needs this).

My point is it feels "silly" to run this fingerprint for gigantic TLOG clouds 
simply out of superstition. For reference, the fingerprinting feature was added 
*before* TLOG+PULL feature so it wouldn't be terribly surprising if this were 
an optimization that was still on the table.

Luke

From: dev@solr.apache.org At: 05/05/25 13:27:32 UTC-4:00 To: dev@solr.apache.org
Subject: Re: IndexFingerprint and Leader Election Slowness

Yeah, I think there's a lot of room for optimization here.

I can't answer question (2) offhand, but my hunch is that PULL
replicas shouldn't need fingerprinting, even if TLOG replicas might.

(3) sounds really useful, but as you mention it has some edge cases
that might be tricky to handle (e.g. in-place updates)

(1) IMO sounds like the best "bang for buck", in that the computation
would be relatively easy to parallelize and could have a really
drastic performance improvement.  My two cents would be to start there
and benchmark to see what sort of a speedup that gets you guys on your
larger indices.  It might be enough that you don't need to consider
other, more complex options.  If you go this route and propose it
upstream, feel free to tag me as a reviewer.  Happy to take a look!

Best,

Jason

On Thu, May 1, 2025 at 6:05 PM Matthew Biscocho (BLOOMBERG/ 919 3RD A)
<mbisco...@bloomberg.net> wrote:
>
> I would like to add a 4th point, which is how often the potential leader
> ends up recalculating its fingerprint during election. I noticed this when the
> leader asks all the replicas to "syncToMe" before establishing itself as leader.
>
> Let's say, for example, it asks the 4 other replicas to sync with it to become
> leader. Each replica ends up sending a request back to the potential
> leader asking for its fingerprint to check whether they are already in sync. We
> don't lock this function while recalculating the fingerprint, so the requests
> are processed concurrently, all of them miss the cache, and we now have 4
> threads potentially recalculating the fingerprint across all its segments and
> populating/overwriting the cache at the same time. This is
> probably fine since they are all going over the same segments, but it seems
> wasteful and gets worse as the number of replicas grows.
>
> Then, when this call finishes and it turns out the leader is not in sync
> with any of the replicas, it asks for the fingerprint once more when it tries
> to get its versions. Again, there is a cache in front, so technically this may
> not be as big of a problem the second time, since the first call already
> computed the fingerprint. But that means with 4 replicas it potentially just
> asked the leader to check its fingerprint 8 times. This makes me think maybe
> the leader should front-load the fingerprint calculation before asking replicas
> to "syncToMe" and pass the result to each replica, avoiding all this repeated
> work and saving the network calls.
>
> - Matt
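
A minimal sketch of one way to avoid the duplicate work described above, assuming the concurrent requests can be keyed by some shared value. The class name, the `get` method, and the `maxVersion` key are all hypothetical and are not Solr's API; the sketch only shows the general "single-flight" pattern in which the first caller computes and the others wait for its result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

final class SingleFlightFingerprintSketch {
    // One future per key; concurrent callers for the same key share it.
    private final ConcurrentHashMap<Long, CompletableFuture<Long>> inFlight =
            new ConcurrentHashMap<>();

    // maxVersion is a hypothetical cache key; expensiveCompute stands in for
    // the real fingerprint computation.
    long get(long maxVersion, LongSupplier expensiveCompute) {
        CompletableFuture<Long> mine = new CompletableFuture<>();
        CompletableFuture<Long> existing = inFlight.putIfAbsent(maxVersion, mine);
        if (existing != null) {
            return existing.join(); // someone else is (or was) computing: wait
        }
        try {
            mine.complete(expensiveCompute.getAsLong());
        } catch (RuntimeException e) {
            inFlight.remove(maxVersion, mine); // let a later caller retry
            mine.completeExceptionally(e);
            throw e;
        }
        return mine.join();
    }

    public static void main(String[] args) throws Exception {
        SingleFlightFingerprintSketch cache = new SingleFlightFingerprintSketch();
        AtomicInteger computations = new AtomicInteger();
        Thread[] replicas = new Thread[4]; // 4 replicas asking at once
        for (int i = 0; i < replicas.length; i++) {
            replicas[i] = new Thread(() -> cache.get(Long.MAX_VALUE, () -> {
                computations.incrementAndGet();
                return 42L; // placeholder fingerprint value
            }));
            replicas[i].start();
        }
        for (Thread t : replicas) t.join();
        System.out.println(computations.get()); // prints 1
    }
}
```

The sketch keeps completed results in the map indefinitely; a real cache would need invalidation tied to whatever actually changes the index, which is out of scope here.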
>
> From: dev@solr.apache.org At: 05/01/25 12:57:54 UTC-4:00 To: dev@solr.apache.org
> Subject: IndexFingerprint and Leader Election Slowness
>
> Hey all,
>
> Can we talk for a moment about index fingerprinting? For the uninitiated,
> here is the original ticket https://issues.apache.org/jira/browse/SOLR-8586 as
> well as the condition that motivated it
> https://issues.apache.org/jira/browse/SOLR-8129.
>
> In short, when a leader fails during distributed update fan-out, some replicas
> may get updates that others miss. Normally a new leader fills in any of its
> gaps from other replicas, but if soft-commits have flushed those updates from
> the t-log, comparing logs won’t catch the discrepancy. To avoid split-brain,
> Solr currently checksums all non-deleted document versions in the index, since
> versions are unique per shard.
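
As a rough illustration of the checksum idea described above, here is a minimal sketch that assumes the fingerprint is an order-independent wrapping sum of hashed document versions. The class and method names are illustrative and not taken from Solr's source; the mixer is the well-known 64-bit finalizer from MurmurHash3, chosen only for convenience.

```java
final class VersionFingerprintSketch {
    // 64-bit finalizer from MurmurHash3 (fmix64), used here as a simple mixer.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Wrapping sum of mixed versions: the result does not depend on the order
    // in which versions are visited.
    static long fingerprint(long[] liveVersions) {
        long hash = 0;
        for (long v : liveVersions) {
            hash += mix(v);
        }
        return hash;
    }

    public static void main(String[] args) {
        long[] replicaA = {101L, 102L, 103L};
        long[] replicaB = {103L, 101L, 102L}; // same versions, different order
        long[] replicaC = {101L, 102L};       // missed the update with version 103
        System.out.println(fingerprint(replicaA) == fingerprint(replicaB)); // true
        System.out.println(fingerprint(replicaA) == fingerprint(replicaC)); // false
    }
}
```

Two replicas holding the same set of live versions produce the same value regardless of segment layout, while a replica that missed an update produces a different one.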
>
> However, this fingerprinting can be a very slow process that blocks leader
> election. In our investigation of election latency, Matt Biscocho and I saw it
> take 10–60 seconds on large shards (hundreds of millions of docs).
>
> A few observations and questions:
>
>
>   1) Can we parallelize the checksum? It's tempting to just slap a
> parallelStream here
> https://github.com/apache/solr/blob/25309f64685a8b70a3bb79c4a07eb8e005724600/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L2551
> although we’re wary of using the ForkJoinPool commonPool in a giant project
> like Solr. I don't want to derail the main discussion, but I've seen some
> arguable misuse of parallelStream here, i.e. using it for blocking work.
>   2) Is fingerprinting relevant for TLOG+PULL? Since TLOG replicas only index
> when elected and otherwise bypass indexing, is full index fingerprinting still
> necessary? Segment downloads already include TCP-layer checksums.
>   3) Can fingerprinting be done more eagerly? Doing this work only at election
> time stalls everything. Could we subtract deletions incrementally (as Yonik
> originally suggested) as well as using a custom IndexReaderWarmer to hash new
> segments on open? In-place updates are trickier but are a minority use case
> (at least for us).
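
To make point (1) concrete, here is a minimal sketch of computing per-segment hashes on a dedicated executor rather than the common ForkJoinPool. It assumes the fingerprint is a wrapping sum of hashed versions, so per-segment results can be combined in any order. Segments are modeled as plain `long[]` arrays of live versions, and all class and method names are hypothetical, not Solr's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFingerprintSketch {
    // MurmurHash3 64-bit finalizer, used as a simple mixer.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Hash of one segment: wrapping sum of mixed versions.
    static long segmentHash(long[] versions) {
        long h = 0;
        for (long v : versions) h += mix(v);
        return h;
    }

    // One task per segment on a caller-supplied pool; results are summed.
    static long parallelFingerprint(List<long[]> segments, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (long[] seg : segments) {
            tasks.add(() -> segmentHash(seg));
        }
        long total = 0;
        for (Future<Long> f : pool.invokeAll(tasks)) {
            total += f.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<long[]> segments =
                List.of(new long[]{1, 2, 3}, new long[]{4, 5}, new long[]{6});
        ExecutorService pool = Executors.newFixedThreadPool(4); // dedicated pool
        try {
            long sequential = 0;
            for (long[] seg : segments) sequential += segmentHash(seg);
            System.out.println(parallelFingerprint(segments, pool) == sequential); // true
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing the executor in keeps the sizing and lifecycle decision with the caller, which sidesteps the shared commonPool concern raised above.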
>
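
Point (3) can be sketched in the same spirit. Assuming the fingerprint is a wrapping sum of hashed versions, a running value can be maintained by adding on insert and subtracting on delete. The names below are hypothetical, and the sketch deliberately ignores in-place updates and the question of where such hooks would live in Solr.

```java
final class IncrementalFingerprintSketch {
    private long hash; // running wrapping sum of mixed live versions

    // MurmurHash3 64-bit finalizer, used as a simple mixer.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    void onAdd(long version)    { hash += mix(version); }
    void onDelete(long version) { hash -= mix(version); } // undoes the earlier add
    long current()              { return hash; }

    public static void main(String[] args) {
        IncrementalFingerprintSketch running = new IncrementalFingerprintSketch();
        running.onAdd(1L);
        running.onAdd(2L);
        running.onAdd(3L);
        running.onDelete(2L);

        // A from-scratch fingerprint over the surviving versions {1, 3}.
        IncrementalFingerprintSketch fresh = new IncrementalFingerprintSketch();
        fresh.onAdd(1L);
        fresh.onAdd(3L);

        System.out.println(running.current() == fresh.current()); // true
    }
}
```

Because 64-bit addition wraps consistently, subtracting a version's mixed value exactly cancels its earlier contribution, so no rescan of the surviving documents is needed.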
> We're also eager to hear any other ideas—or hear how you’ve configured Solr to
> avoid this issue altogether as we're sort of surprised we didn't see more
> chatter about this on the mailing list.
>
> Thanks,
> Luke
>
>
