Re: IndexFingerprint and Leader Election Slowness

Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) Mon, 05 May 2025 15:14:14 -0700

> Agreed but in my mental model this still 
> means only one TLOG replica is indexing
> locally at any given time


For clarity, I'll add why this matters so much. The practical problem that 
*Solr's* fingerprint feature prevented (see the original ticket) was a corner 
case: an update gets flushed from the tlog (and indexed) on replica A but 
never reaches replica B, so the discrepancy is not caught during tlog 
comparison. 

But I think this scenario leads to permanent inconsistency only if *both* 
replicas are indexing independently. If only one tlog replica is indexing at 
any given time then each segment will have a golden source, that being the 
replica that created it. So the question of "is my index consistent" should 
only require looking at the segment files as well as any *Lucene*-managed 
versions associated with them like the delete and docValues "gens".
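
To make the idea concrete, here is a rough sketch of such a segment-level 
comparison. It is only an illustration under the assumption above (one tlog 
replica indexing at a time). `SegmentState`, `sameSegments`, and `onlyIn` are 
hypothetical names, not Solr or Lucene APIs, and a real implementation would 
presumably read the per-segment values from Lucene's commit metadata rather 
than build them by hand.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: compare two replicas by segment name plus Lucene-managed gens. */
final class SegmentStateComparisonSketch {

    /** Assumed per-segment state: name, delete generation, docValues generation. */
    record SegmentState(String name, long delGen, long docValuesGen) {}

    /** True when both replicas report exactly the same segments and gens, in any order. */
    static boolean sameSegments(List<SegmentState> a, List<SegmentState> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    /** Segment states present on replica a that replica b does not have. */
    static Set<SegmentState> onlyIn(List<SegmentState> a, List<SegmentState> b) {
        Set<SegmentState> diff = new HashSet<>(a);
        diff.removeAll(new HashSet<>(b));
        return diff;
    }

    public static void main(String[] args) {
        List<SegmentState> leader = List.of(
            new SegmentState("_0", 1, -1), new SegmentState("_1", -1, -1));
        // The follower has the same segments but missed a deletes update on _0.
        List<SegmentState> follower = List.of(
            new SegmentState("_0", -1, -1), new SegmentState("_1", -1, -1));
        System.out.println(sameSegments(leader, follower)); // prints false
        System.out.println(onlyIn(leader, follower).size()); // prints 1
    }
}
```

This check scales with the number of segments rather than the number of 
documents, which is the appeal compared with hashing every live version.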

----- Original Message -----
From: Luke Kot-Zaniewski <dev@solr.apache.org>
To: dev@solr.apache.org
At: 05/05/25 16:47:30 UTC-04:00


> I think the ref-guide is being pretty misleading here.  It's true that
> TLOG replicas don't index *under normal circumstances*, but they do
> index documents if they're ever called on to become a leader.

Agreed but in my mental model this still means only one TLOG replica is 
indexing locally at any given time, but maybe that model is wrong?

I would certainly be wrong if it turned out that, while a "successor" TLOG is 
working/indexing to become the leader, the *previous* leader might still be 
indexing while *other* TLOGs are actively pulling from it. This would create a 
possible divergence between the successor and the rest of the TLOGs, which the 
fingerprint would prevent. Is this along the lines of a scenario you might be 
concerned about? If no one knows "off-hand" then it might be worth doing some 
more digging on this.

I'm so fixated on the TLOG+PULL case because in my experience the segment 
structure is identical (if not 100% then at least 99%) which in my mind makes 
it a more likely candidate for optimization of the fingerprint.

From: dev@solr.apache.org At: 05/05/25 15:57:55 UTC-4:00 To: dev@solr.apache.org
Subject: Re: IndexFingerprint and Leader Election Slowness

Ah - I wasn't trying to say necessarily that I suspect TLOG replicas
require fingerprinting.  I was just trying to express that "PULL"
replicas seem less likely to need it of the two, since PULL replicas
never index documents locally and will never try to assume leadership.
In other words: "I don't know about TLOG replicas, but I'd be pretty
surprised if PULL replicas needed fingerprinting"

> From reading the ref guide I see that we claim that TLOGs also don't do any
> local indexing

I think the ref-guide is being pretty misleading here.  It's true that
TLOG replicas don't index *under normal circumstances*, but they do
index documents if they're ever called on to become a leader.

That little section of docs goes on to say: "This type of replica is
also eligible to become a shard leader; it would do so by first
processing its transaction log."  Another way to say that is that when
a TLOG replica is about to become a leader, it indexes the contents of
its transaction log.  These docs could use some rewording for sure...

On Mon, May 5, 2025 at 2:39 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD
A) <lkotzanie...@bloomberg.net> wrote:
>
> Thanks for the reply Jason,
>
> I think I am actually most curious about the TLOG case. Parallelizing is good
> but amounts to just throwing more hardware at the problem. Also, one of the
> tests mysteriously fails when you do this so it isn't a super quick win :-)
>
> From reading the ref guide I see that we claim that TLOGs also don't do any
> local indexing:
> https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#tlog-replicas-plus-pull-replicas:~:text=TLOG%3A%20This%20type,type%20of%20replica.
> Whether the reality is more nuanced would require looking at the code. Just
> curious, do you have any more specific reason to suspect that TLOG replicas
> would need fingerprinting? Perhaps I'm oversimplifying, but if TLOGs actually
> do *not* do local indexing as we claim then I don't really see why taking a
> fingerprint over the index would be useful. I can take a deeper look at the
> code as well as run the tests that originally inspired this feature as a
> sanity check (as a litmus test of whether tlog+pull needs this).
>
> My point is it feels "silly" to run this fingerprint for gigantic TLOG clouds
> simply out of superstition. For reference, the fingerprinting feature was added
> *before* the TLOG+PULL feature, so it wouldn't be terribly surprising if this
> were an optimization that was still on the table.
>
> Luke
>
> From: dev@solr.apache.org At: 05/05/25 13:27:32 UTC-4:00 To: dev@solr.apache.org
> Subject: Re: IndexFingerprint and Leader Election Slowness
>
> Yeah, I think there's a lot of room for optimization here.
>
> I can't answer question (2) offhand, but my hunch is that PULL
> replicas shouldn't need fingerprinting, even if TLOG replicas might.
>
> (3) sounds really useful, but as you mention it has some edge cases
> that might be tricky to handle (e.g. in-place updates)
>
> (1) IMO sounds like the best "bang for buck", in that the computation
> would be relatively easy to parallelize and could have a really
> drastic performance improvement.  My two-cents would be to start there
> and benchmark to see what sort of a speedup that gets you guys on your
> larger indices.  It might be enough that you don't need to consider
> other, more complex options.  If you go this route and propose it
> upstream, feel free to tag me as a reviewer.  Happy to take a look!
>
> Best,
>
> Jason
>
> On Thu, May 1, 2025 at 6:05 PM Matthew Biscocho (BLOOMBERG/ 919 3RD A)
> <mbisco...@bloomberg.net> wrote:
> >
> > I would like to add a 4th point, which is how often the potential leader
> > ends up recalculating its fingerprint during election. I noticed this when
> > the leader asks all the replicas to "syncToMe" before establishing itself as
> > leader.
> >
> > Let's say, for example, it asks the 4 other replicas to sync with it to
> > become leader. Each replica ends up sending a request back to the potential
> > leader asking for its fingerprint to check whether they are already in sync.
> > We don't lock the function that recalculates the fingerprint, so each of
> > these requests can miss the cache, and we may end up with 4 threads
> > recalculating the fingerprint across all of its segments and
> > populating/overwriting the cache at the same time. This is probably fine
> > since they all go over the same segments, but it seems wasteful and gets
> > worse as the number of replicas grows.
> >
> > Then, when this call finishes and the leader turns out not to be in sync
> > with any of the replicas, it asks for the fingerprint once more when it tries
> > to get its versions. Again, there is a cache in front, so this may not be as
> > big a problem the second time, since the first call already computed the
> > fingerprint. But that means that with 4 replicas the leader was potentially
> > asked to check its fingerprint 8 times. This makes me think the leader should
> > maybe front-load the fingerprint calculation before asking replicas to
> > "syncToMe" and pass the result to each replica, to avoid all this repeated
> > work and save the network calls.
> >
> > - Matt
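
One way to avoid the duplicated work described above is to let concurrent 
requests share a single in-flight computation. The following is a minimal 
sketch of that idea, not Solr's actual cache. `SingleFlightFingerprintCache`, 
`get`, and the `cacheKey` parameter are hypothetical names, and the fingerprint 
computation is passed in as a plain `LongSupplier`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/** Hypothetical sketch: at most one thread computes the value for a given key. */
final class SingleFlightFingerprintCache {
    private final ConcurrentHashMap<Long, CompletableFuture<Long>> entries =
        new ConcurrentHashMap<>();

    /** Returns the cached value, computing it once even under concurrent callers. */
    long get(long cacheKey, LongSupplier compute) {
        CompletableFuture<Long> mine = new CompletableFuture<>();
        CompletableFuture<Long> existing = entries.putIfAbsent(cacheKey, mine);
        if (existing != null) {
            // Another caller owns the computation; wait for its result.
            return existing.join();
        }
        try {
            mine.complete(compute.getAsLong());
        } catch (RuntimeException e) {
            // Drop the failed entry so a later caller can retry.
            entries.remove(cacheKey, mine);
            mine.completeExceptionally(e);
            throw e;
        }
        return mine.join();
    }

    public static void main(String[] args) {
        SingleFlightFingerprintCache cache = new SingleFlightFingerprintCache();
        AtomicInteger computations = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            cache.get(42L, () -> { computations.incrementAndGet(); return 7L; });
        }
        System.out.println(computations.get()); // prints 1
    }
}
```

Front-loading the calculation before "syncToMe", as suggested above, would go 
further by also removing the extra network round trips. This sketch only 
removes the redundant recomputation on the leader.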
> >
> > From: dev@solr.apache.org At: 05/01/25 12:57:54 UTC-4:00 To: dev@solr.apache.org
> > Subject: IndexFingerprint and Leader Election Slowness
> >
> > Hey all,
> >
> > Can we talk for a moment about index fingerprinting? For those uninitiated,
> > here is the original ticket, https://issues.apache.org/jira/browse/SOLR-8586,
> > as well as the condition that motivated it,
> > https://issues.apache.org/jira/browse/SOLR-8129.
> >
> > In short, when a leader fails during distributed update fan-out, some
> > replicas may get updates that others miss. Normally a new leader fills in any
> > of its gaps from other replicas, but if soft-commits have flushed those
> > updates from the t-log, comparing logs won’t catch the discrepancy. To avoid
> > split-brain, Solr currently checksums all non-deleted document versions in
> > the index, since versions are unique per shard.
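
As a rough illustration of what such a checksum looks like, here is a 
simplified, order-independent hash over the versions of live documents. This 
is a sketch, not Solr's implementation. The `Doc` record, `mix64`, and 
`fingerprint` are names made up for the example, and the mixing step (the 
64-bit MurmurHash3 finalizer) is an assumption chosen for illustration.

```java
import java.util.List;

/** Hypothetical, simplified model of an order-independent version fingerprint. */
final class VersionFingerprintSketch {

    /** One document as seen by the sketch: its version and whether it is deleted. */
    record Doc(long version, boolean deleted) {}

    /** 64-bit finalizer from MurmurHash3, used here only to spread version bits. */
    static long mix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Sums the mixed versions of live documents; addition makes the order irrelevant. */
    static long fingerprint(List<Doc> docs) {
        long hash = 0;
        for (Doc doc : docs) {
            if (!doc.deleted()) {
                hash += mix64(doc.version());
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        List<Doc> replicaA = List.of(new Doc(1L, false), new Doc(2L, false), new Doc(3L, true));
        List<Doc> replicaB = List.of(new Doc(2L, false), new Doc(1L, false));
        // Same live versions in a different order, so the fingerprints match.
        System.out.println(fingerprint(replicaA) == fingerprint(replicaB)); // prints true
    }
}
```

The cost is one pass over every live document, which is why the time grows 
with shard size.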
> >
> > However, this fingerprinting can be a very slow process that blocks leader
> > election. In our investigation of election latency, Matt Biscocho and I saw
> > it take 10–60 seconds on large shards (hundreds of millions of docs).
> >
> > A few observations and questions:
> >
> >
> >   1) Can we parallelize the checksum? It's tempting to just slap a
> > parallelStream here
> > https://github.com/apache/solr/blob/25309f64685a8b70a3bb79c4a07eb8e005724600/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L2551,
> > although we’re wary of using the ForkJoinPool commonPool in a giant project
> > like Solr. Don't want to derail the main discussion, but I've seen some
> > arguable misuse of parallelStream here, i.e. using it for blocking work.
> >   2) Is fingerprinting relevant for TLOG+PULL? Since TLOG replicas only index
> > when elected and otherwise bypass indexing, is full index fingerprinting
> > still necessary? Segment downloads already include TCP-layer checksums.
> >   3) Can fingerprinting be done more eagerly? Doing this work only at
> > election time stalls everything. Could we subtract deletions incrementally
> > (as Yonik originally suggested) as well as use a custom IndexReaderWarmer to
> > hash new segments on open? In-place updates are trickier but are a minority
> > use case (at least for us).
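
For point (1), one option that sidesteps the commonPool concern is to hash 
each segment as a separate task on a dedicated executor and then add up the 
per-segment results. The sketch below assumes the combined hash is a plain sum, 
so the order of completion does not matter. Segments are modeled as arrays of 
live versions, and `ParallelSegmentHashSketch`, `hashSegment`, and 
`combinedHash` are hypothetical names rather than Solr code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: per-segment hashing on a caller-supplied executor. */
final class ParallelSegmentHashSketch {

    /** 64-bit finalizer from MurmurHash3, used here only to spread version bits. */
    static long mix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Hashes one segment, modeled as the versions of its live documents. */
    static long hashSegment(long[] liveVersions) {
        long hash = 0;
        for (long version : liveVersions) {
            hash += mix64(version);
        }
        return hash;
    }

    /** Submits one task per segment, then sums the results as they are joined. */
    static long combinedHash(List<long[]> segments, ExecutorService executor) {
        List<CompletableFuture<Long>> parts = new ArrayList<>();
        for (long[] segment : segments) {
            parts.add(CompletableFuture.supplyAsync(() -> hashSegment(segment), executor));
        }
        long total = 0;
        for (CompletableFuture<Long> part : parts) {
            total += part.join();
        }
        return total;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<long[]> segments =
                List.of(new long[] {1L, 2L}, new long[] {3L}, new long[] {});
            long parallel = combinedHash(segments, executor);
            long sequential = hashSegment(new long[] {1L, 2L, 3L});
            System.out.println(parallel == sequential); // prints true
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the per-segment results are independent, the same structure would 
also fit point (3): a segment's hash could be computed when the segment is 
opened and only summed at election time.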
> >
> > We're also eager to hear any other ideas, or to hear how you’ve configured
> > Solr to avoid this issue altogether, as we're sort of surprised we didn't see
> > more chatter about this on the mailing list.
> >
> > Thanks,
> > Luke
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>

