mikemccand opened a new issue, #14148:
URL: https://github.com/apache/lucene/issues/14148

   ### Description
   
   TL;DR: `TieredMergePolicy` can create massive snapshots if you configure it 
with an aggressive `deletesPctAllowed`, which can hurt searchers (causing page-fault 
storms) in a near-real-time replication world.  Maybe we could add an optional 
(off by default) "rate limit" on how many amortized bytes/sec TMP is merging?  
This is just an idea / brainstorming / design discussion so far ... no PR.
   
   Full context:
   
   At Amazon (Product Search team) we use [near-real-time segment 
replication](https://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html)
 to efficiently distribute index updates to all searchers/replicas.
   
   Since we have many searchers per indexer shard, to scale to very high QPS, 
we intentionally tune `TieredMergePolicy` (TMP) to very aggressively reclaim 
deletions.  Burning extra CPU / bandwidth during indexing to save even a little 
bit of CPU during searching is a good tradeoff for us (and in general, for 
Lucene users with high QPS requirements).
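   For concreteness, a minimal sketch of the kind of tuning we mean (the values here 
are illustrative, not our production settings):

   ```java
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.index.TieredMergePolicy;

   public class AggressiveTmpConfig {
     public static IndexWriterConfig newConfig() {
       TieredMergePolicy tmp = new TieredMergePolicy();
       // Lower values make TMP reclaim deletions more aggressively (at the
       // cost of more merging); the exact value here is just an example.
       tmp.setDeletesPctAllowed(10.0);
       // Max-sized segments (~5 GB) are what end up being replicated out.
       tmp.setMaxMergedSegmentMB(5 * 1024);

       IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
       iwc.setMergePolicy(tmp);
       return iwc;
     }
   }
   ```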
   
   But we have a "fun" problem occasionally: sometimes we have an update storm 
(an upstream team reindexes large-ish portions of Amazon's catalog through the 
real-time indexing stream), and this leads to lots and lots of merging and many 
large (max-sized 5 GB) segments being replicated out to searchers in short 
order, sometimes over links (e.g. cross-region) that are not as crazy-fast as 
within-region networking fabric, and our searchers fall behind a bit.
   
   Falling behind is not the end of the world: the searchers simply skip some 
point-in-time snapshots and jump to the latest one, effectively sub-sampling 
checkpoints as best they can given the bandwidth constraints.  Index-to-search 
latency is hurt a bit, but recovers once the indexer catches up on the update 
storm.
   
   The bigger problem for us is that we size our shards, roughly, so that the 
important parts of the index (the parts hit often by query traffic) are fully 
"hot".  I.e. so the OS has enough RAM to hold the hot parts of the index.  But 
when it takes too long to copy and light (switch over to the new segments for 
searching) a given snapshot, and we skip the next one or two snapshots, the 
follow-on snapshot that we finally do load may have a sizable part of the index 
rewritten, and the snapshot size may be a big percentage of the overall index, 
and copying/lighting it will stress the OS into a paging storm, hurting our 
long-pole latencies.
   
   So one possible solution we thought of is to add an optional (off by 
default) `setMaxBandwidth` to TMP so that, on average (amortized over some 
time window), TMP would not produce so many merges that it exceeds that 
bandwidth cap.  With such a cap, during an update storm (war time), the index 
delete percentage would necessarily increase beyond what we ideally want / configured 
with `setDeletesPctAllowed`, but then during peace time, TMP could again catch 
up and push the deletes back below the target.
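
   To make the idea a bit more concrete, here is a rough, purely hypothetical sketch 
of the amortized bytes/sec accounting such a cap might consult before admitting a merge. 
`setMaxBandwidth` does not exist in TMP today, and this class and its names are made up 
for illustration; it is plain Java, not wired into the merge policy:

   ```java
   /**
    * Hypothetical sketch only: illustrates the "amortized bytes/sec over a
    * time window" budget a rate-limited merge policy might check before
    * selecting a merge. Not an existing Lucene API.
    */
   public class MergeBandwidthBudget {
     private final double maxBytesPerSec; // the proposed cap
     private final long windowNanos;      // window the rate is amortized over
     private long windowStartNanos = System.nanoTime();
     private long bytesMergedInWindow;

     public MergeBandwidthBudget(double maxBytesPerSec, long windowNanos) {
       this.maxBytesPerSec = maxBytesPerSec;
       this.windowNanos = windowNanos;
     }

     /**
      * Returns true if a merge writing {@code mergeBytes} still fits under the
      * amortized cap; false means defer the merge for now.
      */
     public synchronized boolean tryAcquire(long mergeBytes) {
       long now = System.nanoTime();
       if (now - windowStartNanos >= windowNanos) {
         // Start a fresh accounting window.
         windowStartNanos = now;
         bytesMergedInWindow = 0;
       }
       long budget = (long) (maxBytesPerSec * (windowNanos / 1_000_000_000.0));
       if (bytesMergedInWindow + mergeBytes > budget) {
         // Over budget: skip this merge and let the delete % drift above the
         // target until peace time, as described above.
         return false;
       }
       bytesMergedInWindow += mergeBytes;
       return true;
     }
   }
   ```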

