mikemccand commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2612285079

   It's terrible that `TieredMergePolicy` was not merging these segments, 
either naturally or under `forceMerge`. Let's understand why it's failing to 
do so. It's like we need an `explain` API for its merge selection.
   
   `TMP` does have a `setForceMergeDeletesPctAllowed`, which defaults to 10%, 
meaning if a segment has <= 10% deletions, it won't be selected under 
`forceMerge`.  But if I'm reading it right you have a segment `_1btbuk` with 
~82.4% deleted docs (`12507939 / (2666453 + 12507939) = 0.8242794175872088`), 
which should have been selected.
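
   To make the arithmetic above concrete, here is a minimal sketch in plain Java (no Lucene dependency). The class and method names are hypothetical, and the strict greater-than comparison against the 10% threshold is an assumption taken from the description above, not a copy of `TieredMergePolicy`'s code:

```java
class DeletePct {
    // Percentage of deleted docs in a segment: deleted / (live + deleted) * 100.
    public static double pctDeleted(long liveDocs, long deletedDocs) {
        long maxDoc = liveDocs + deletedDocs;
        return maxDoc == 0 ? 0.0 : 100.0 * deletedDocs / maxDoc;
    }

    // Assumed rule: a segment qualifies only when its deletion percentage is
    // strictly greater than the allowed percentage (<= allowed is skipped).
    public static boolean exceedsThreshold(long liveDocs, long deletedDocs, double pctAllowed) {
        return pctDeleted(liveDocs, deletedDocs) > pctAllowed;
    }

    public static void main(String[] args) {
        // Doc counts quoted above for segment _1btbuk.
        long live = 2_666_453L;
        long deleted = 12_507_939L;
        System.out.printf("%.2f%% deleted, over the 10%% threshold: %b%n",
            pctDeleted(live, deleted), exceedsThreshold(live, deleted, 10.0));
    }
}
```

   With the counts quoted above, the segment sits far above the default threshold, which is why it looks like it should have qualified.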
   
   Have you changed `setMaxMergedSegmentMB` away from its default (5 GB)?
   
   Separately, you have crazy high segment names -- I'm curious if this is a 
very long-lived index?
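
   For a rough sense of scale: Lucene segment names are an underscore followed by the segment counter written in base 36 (`Character.MAX_RADIX`), so a name can be decoded back into a number. A small sketch, where the `SegmentNameDecoder` class and `segmentCounter` method are hypothetical helpers and not part of Lucene:

```java
class SegmentNameDecoder {
    // Decodes a segment name such as "_1btbuk" into its numeric counter.
    // Assumes the name is "_" followed by base-36 digits.
    public static long segmentCounter(String segmentName) {
        if (!segmentName.startsWith("_") || segmentName.length() < 2) {
            throw new IllegalArgumentException("not a segment name: " + segmentName);
        }
        return Long.parseLong(segmentName.substring(1), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Prints how many segment names had been handed out before this one.
        System.out.println(segmentCounter("_1btbuk"));
    }
}
```

   The counter only tells you how many segment names the index has handed out over its lifetime (flushes and merges both consume names), so a large value points to an old or very heavily updated index rather than to a specific age.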
   
   This PR reminds me of the Linux "direct IO" struggles.  Linus [really does 
not like the existence of "direct IO" (`O_DIRECT` flag to `open` 
API)](https://www.theregister.com/2019/06/21/linus_torvalds_rant/), because its 
existence means users may jump straight to that and take pressure off improving 
how Linux manages IO caching (the buffer cache).  I.e. rather than improving 
the kernel's IO caching, users can skip it altogether.  It's the same thing 
here: if we expose a merge policy where users can simply pick their own merges, 
we take pressure off of fixing the problems in our default `TieredMergePolicy`. 
That being said, `MergePolicy` is pluggable for exactly this reason: users 
(well, direct Lucene users) are free to customize merge selection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

