jpountz commented on issue #12140:
URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426119587

   I like medians better than averages in many cases, but would this require iterating over all segments in the index every time we need to make a caching decision? I worry this could be a bottleneck for indexes with many segments and cheap queries. My reasoning for using the average segment size was that it can be computed cheaply as `topLevelReader.maxDoc() / leaves.size()`. To @dnhatn's point, maybe the threshold should even be half the average segment size, to make sure it includes all segments from the upper tier? I'm going below 95% because I'd expect the next tier to have segments on the order of 10x smaller, so with a 50% threshold we'd cover segments from the upper tier with greater confidence while still excluding the next tier.
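
   A rough sketch of this heuristic (not the actual `LRUQueryCache` code; the `worthCaching` helper and the 0.5 factor are only illustrative):

   ```java
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.index.LeafReaderContext;

   final class SegmentCachingHeuristic {

     // Compute the average segment size cheaply from the top-level reader, then
     // only treat a segment as cache-worthy if it holds at least half of that
     // average: segments from the upper tier are included while the next tier
     // (~10x smaller) is excluded.
     static boolean worthCaching(IndexReader topLevelReader, LeafReaderContext leaf) {
       double averageSegmentSize =
           (double) topLevelReader.maxDoc() / topLevelReader.leaves().size();
       return leaf.reader().maxDoc() >= 0.5 * averageSegmentSize;
     }
   }
   ```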


