HoustonPutman opened a new pull request, #13914: URL: https://github.com/apache/lucene/pull/13914
Resolves #13760 ### Description <!-- If this is your first contribution to Lucene, please make sure you have reviewed the contribution guide. https://github.com/apache/lucene/blob/main/CONTRIBUTING.md --> This is using a similar approach to how Solr used to compute multiple percentiles at a single time. Basically utilize the quick select method, but instead of following a single path, follow a path for each of the `k`s that is requested. Multi-quickselect. That's what I originally made, until I realized that the `DynamicRangeUtil` is weighted, so I refactored it to choose by weights instead, and also capture the running-value-total and running-weight-total, because that information is used in the `DynamicRangeInfo`. My goal was to add this as a generic capability of the `Selector` (or `IntroSelector`) class, but because of the limitations above, it is currently a separate class to handle this. If there's any suggestions on how to make this generic enough to be put in the generic class, that would be great. But it might not be worth the effort if it wouldn't be used anywhere else. As for the original multi-quickSelect algorithm I mentioned, I looked for other multi-select use cases across Lucene, but I only found one instance (The quantized does two select calls in succession). If there's more instances we can find, I would be happy to add multiSelect as an option on the `Selector` class, and implement it in all provided classes. ### To-Do - The code needs to be cleaned up and better documented, this is just a POC - Benchmarks comparing this to the full-sorting implementation. ### Caveat The implement is slightly different, as it will pick the groups according to "The first value for which the running weight is <= weight-range-boundary". The old logic would start counting again after a weight range was complete, which removes information from the overflow of previous weight-ranges. I'm not sure either approach is right or wrong, but I wanted to explicitly state how the results would be different and why I had to alter a unit test to pass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org