jpountz commented on issue #12140: URL: https://github.com/apache/lucene/issues/12140#issuecomment-1426119587
I like medians better than averages in many cases, but would this require iterating over all segments in the index every time we need to make a caching decision? I worry this could be a bottleneck for indexes with many segments and cheap queries.

My reasoning for the average segment size was that it's something that can be computed cheaply as `topLevelReader.maxDoc() / leaves.size()`. To @dnhatn's point, maybe it should even be half the average segment size to make sure it includes all segments from the upper tier? The reason I'd go below 95% is that I'd expect the next tier's segments to be on the order of 10x smaller, so a 50% threshold would cover segments from the upper tier with greater confidence while still excluding the next tier.
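As a rough illustration of that check, here is a minimal sketch of the half-the-average heuristic. The class and method names and the standalone-helper shape are assumptions for illustration only, not the shape of any actual patch:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;

/** Hypothetical helper sketching the proposed segment-size check. */
final class SegmentCachingHeuristic {

  private SegmentCachingHeuristic() {}

  /**
   * Returns true if the given segment holds at least half of the average
   * number of docs per segment, the idea being to include all segments
   * from the upper tier while excluding the next, roughly 10x smaller, tier.
   */
  static boolean isEligibleForCaching(IndexReader topLevelReader, LeafReaderContext context) {
    // Both values are available in constant time on the top-level reader,
    // so no per-decision iteration over all segments is needed.
    double averageSegmentSize =
        (double) topLevelReader.maxDoc() / topLevelReader.leaves().size();
    return context.reader().maxDoc() >= averageSegmentSize / 2;
  }
}
```

Unlike a median-based cutoff, this keeps the per-decision cost constant regardless of how many segments the index has.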