stefanvodita opened a new pull request, #12966: URL: https://github.com/apache/lucene/pull/12966
### Note This is a large change, refactoring most of the taxonomy facets code and changing internal behavior, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets. ### What does this PR do well? 1. Moves most of the responsibility from `TaxonomyFacets` implementations to `TaxonomyFacets` itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in #12553. 2. As a consequence, it introduces sparse values to `FloatTaxonomyFacets`, which previously used dense values always. This issue is part of #12576. 3. It computes counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282. 4. As a consequence of having counts, we can check whether we encountered a label while faceting (`count > 0`), while previously we relied on the aggregation value to be positive. Closes #12585. 5. It introduces the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in #12546. 6. It doesn't change the API. The only change in behavior users should notice is the fix for non-positive aggregation values, which were previously discarded. 7. It adds tests which were missing for sparse/dense values and non-positive aggregations. ### What's not ideal about this approach? 1. We could see some performance decreases. The more critical part of the work, aggregating, should be unaffected. There are a few extra method calls / dispatches / branches. Ranking and collecting results might be impacted because we are boxing / unboxing results to / from `Number` to avoid the primitive types. 2. The way the `TopOrdAndNumberQueue`s work is a bit awkward and inefficient. It required small changes to classes outside the scope of this change. Maybe we can come up with something better. ## What is next? 1. I'd like to know if the approach makes sense to others. 2. We can try running some benchmarks to see if there are any performance changes. 3. Is it important to preserve a default aggregation value of the right type in the results (i.e. `-1` for int aggregations, `-1f` for float aggregations)? If not, we can make a small simplification to always return `-1`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org