mikemccand opened a new issue, #14148:
URL: https://github.com/apache/lucene/issues/14148

   ### Description
   
   TL;DR: `TieredMergePolicy` can create massive snapshots if you configure it 
with an aggressive `deletesPctAllowed`, which can hurt searchers (causing page-fault 
storms) in a near-real-time replication world.  Maybe we could add an optional 
(off by default) "rate limit" on how many amortized bytes/sec TMP is merging?  
This is just an idea / brainstorming / design discussion so far ... no PR.
   
   Full context:
   
   At Amazon (Product Search team) we use [near-real-time segment 
replication](https://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html)
 to efficiently distribute index updates to all searchers/replicas.
   
   Since we have many searchers per indexer shard, to scale to very high QPS, 
we intentionally tune `TieredMergePolicy` (TMP) to very aggressively reclaim 
deletions.  Burning extra CPU / bandwidth during indexing to save even a little 
bit of CPU during searching is a good tradeoff for us (and in general, for 
Lucene users with high QPS requirements).
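   For concreteness, a minimal sketch of the kind of tuning we mean (the values here 
are illustrative, not our production settings):

   ```java
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.index.TieredMergePolicy;

   public class AggressiveTmpConfig {
     public static IndexWriterConfig newConfig() {
       TieredMergePolicy tmp = new TieredMergePolicy();
       // Lower values make TMP reclaim deletions more aggressively (at the
       // cost of more merging); the exact value here is just an example.
       tmp.setDeletesPctAllowed(10.0);
       // Max-sized segments (~5 GB) are what end up being replicated out.
       tmp.setMaxMergedSegmentMB(5 * 1024);

       IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
       iwc.setMergePolicy(tmp);
       return iwc;
     }
   }
   ```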
   
   But we have a "fun" problem occasionally: sometimes we have an update storm 
(an upstream team reindexes large-ish portions of Amazon's catalog through the 
real-time indexing stream), and this leads to lots and lots of merging and many 
large (max-sized 5 GB) segments being replicated out to searchers in short 
order, sometimes over links (e.g. cross-region) that are not as crazy-fast as 
within-region networking fabric, and our searchers fall behind a bit.
   
   Falling behind is not the end of the world: the searchers simply skip some 
point-in-time snapshots and jump to the latest one, effectively sub-sampling 
checkpoints as best they can given the bandwidth constraints.  Index-to-search 
latency is hurt a bit, but recovers once the indexer catches up on the update 
storm.
   
   The bigger problem for us is that we size our shards, roughly, so that the 
important parts of the index (the parts hit often by query traffic) are fully 
"hot".  I.e. so the OS has enough RAM to hold the hot parts of the index.  But 
when it takes too long to copy and light (switch over to the new segments for 
searching) a given snapshot, and we skip the next one or two snapshots, the 
follow-on snapshot that we finally do load may have a sizable part of the index 
rewritten, and the snapshot size may be a big percentage of the overall index, 
and copying/lighting it will stress the OS into a paging storm, hurting our 
long-pole latencies.
   
   So one possible solution we thought of is to add an optional (off by 
default) `setMaxBandwidth` to TMP so that, on average (amortized over some 
time window), TMP would not produce so many merges that it exceeds that 
bandwidth cap.  With such a cap, during an update storm (war time), the index 
delete percentage would necessarily increase beyond what we ideally want / configured 
with `setDeletesPctAllowed`, but then during peace time, TMP could again catch 
up and push the deletes back below the target.
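
   To make the idea a bit more concrete, here is a rough, purely hypothetical sketch 
of the amortized bytes/sec accounting such a cap might consult before admitting a merge. 
`setMaxBandwidth` does not exist in TMP today, and this class and its names are made up 
for illustration; it is plain Java, not wired into the merge policy:

   ```java
   /**
    * Hypothetical sketch only: illustrates the "amortized bytes/sec over a
    * time window" budget a rate-limited merge policy might check before
    * selecting a merge. Not an existing Lucene API.
    */
   public class MergeBandwidthBudget {
     private final double maxBytesPerSec; // the proposed cap
     private final long windowNanos;      // window the rate is amortized over
     private long windowStartNanos = System.nanoTime();
     private long bytesMergedInWindow;

     public MergeBandwidthBudget(double maxBytesPerSec, long windowNanos) {
       this.maxBytesPerSec = maxBytesPerSec;
       this.windowNanos = windowNanos;
     }

     /**
      * Returns true if a merge writing {@code mergeBytes} still fits under the
      * amortized cap; false means defer the merge for now.
      */
     public synchronized boolean tryAcquire(long mergeBytes) {
       long now = System.nanoTime();
       if (now - windowStartNanos >= windowNanos) {
         // Start a fresh accounting window.
         windowStartNanos = now;
         bytesMergedInWindow = 0;
       }
       long budget = (long) (maxBytesPerSec * (windowNanos / 1_000_000_000.0));
       if (bytesMergedInWindow + mergeBytes > budget) {
         // Over budget: skip this merge and let the delete % drift above the
         // target until peace time, as described above.
         return false;
       }
       bytesMergedInWindow += mergeBytes;
       return true;
     }
   }
   ```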

