[I] [Bug] Fix distributed TopN accuracy for SUM, COUNT, and MEAN aggregations [skywalking]

via GitHub Sat, 11 Apr 2026 03:58:45 -0700


eye-gu opened a new issue, #13811:
URL: https://github.com/apache/skywalking/issues/13811


   ### Search before asking
   
   - [x] I have searched the 
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Apache SkyWalking Component
   
   BanyanDB (apache/skywalking-banyandb)
   
   ### What happened
   
   ## Problem
   
   In distributed TopN queries, each data node sets `agg` to 
`AGGREGATION_FUNCTION_UNSPECIFIED` and applies the TopN limit locally before 
sending results to the liaison node. This produces incorrect results for SUM, 
COUNT, and MEAN aggregations in two ways:
   
   1. **Cross-node truncation**: An entity that ranks low on one data node 
might rank high globally after merging partial results across all nodes.
   
   2. **Intra-node truncation**: A single data node may have multiple data 
points for the same entity (e.g., across different shards), but only the 
largest one survives the local TopN truncation. For example, with COUNT 
aggregation and TopN=3, if a node has 2 data points for entity5, only the 
larger one is sent to liaison, causing the count to be 1 instead of 2.
   
   MIN and MAX are not affected since they are monotonic — the global MIN/MAX 
always appears in at least one node's local TopN, and truncation within a node 
doesn't discard the extremum.
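
The two truncation modes described above can be reproduced with a small simulation. This is only an illustrative sketch, not BanyanDB code: the helper names (`truncated_topn`, `exact_topn`) and the sample data are hypothetical, and it assumes each data node keeps its N largest raw data points before the liaison aggregates the survivors.

```python
from collections import defaultdict

def aggregate(points, agg):
    """Group (entity, value) points by entity and apply SUM or COUNT."""
    groups = defaultdict(list)
    for entity, value in points:
        groups[entity].append(value)
    if agg == "sum":
        return {e: sum(vs) for e, vs in groups.items()}
    if agg == "count":
        return {e: len(vs) for e, vs in groups.items()}
    raise ValueError(f"unsupported aggregation: {agg}")

def top_n(totals, n):
    """Rank entities by aggregated value, descending; ties broken by name."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def truncated_topn(nodes, n, agg):
    """Assumed current behaviour: each node keeps only its n largest points."""
    survivors = []
    for points in nodes:
        survivors += sorted(points, key=lambda p: p[1], reverse=True)[:n]
    return top_n(aggregate(survivors, agg), n)

def exact_topn(nodes, n, agg):
    """Reference result: aggregate every data point, then rank."""
    everything = [p for points in nodes for p in points]
    return top_n(aggregate(everything, agg), n)

# Cross-node truncation: "c" ranks third on each node but first globally.
node1 = [("a", 11), ("b", 9), ("c", 6)]
node2 = [("d", 10), ("e", 9), ("c", 6)]
print(truncated_topn([node1, node2], 2, "sum"))  # [('a', 11), ('d', 10)]
print(exact_topn([node1, node2], 2, "sum"))      # [('c', 12), ('a', 11)]

# Intra-node truncation: entity5 has two points, only the larger survives.
node3 = [("entity5", 9), ("entity1", 8), ("entity2", 7), ("entity5", 3)]
print(dict(truncated_topn([node3], 3, "count"))["entity5"])  # 1
print(dict(exact_topn([node3], 3, "count"))["entity5"])      # 2
```

In the SUM case the true leader `c` never reaches the liaison, and in the COUNT case `entity5` is reported with a count of 1 instead of 2, matching the example in item 2.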
   
   ## Proposed Solution
   
   - For MIN/MAX: keep applying TopN limit at data nodes (safe to truncate)
   - For SUM/COUNT/MEAN: push the TopN query to all data nodes **without 
truncation** (TopN=0) and with the actual `agg` function and 
`emit_partial=true`, so data nodes return partial aggregated values (e.g., 
`[sum, count]` for MEAN) along with `shard_id` metadata
   - At the liaison node, merge partial results by entity using a min-heap 
aggregator, then compute the final TopN
   - Include `shard_id` in the group-by key for non-MIN/MAX to prevent 
incorrect cross-shard merging
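
The proposed flow could be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual implementation: `node_partials`, `finalize`, and `liaison_topn` are hypothetical names, each partial is modelled as a `[sum, count]` pair, and keying the data-node partials by `(entity, shard_id)` is one reading of the last bullet.

```python
import heapq
from collections import defaultdict

def node_partials(points):
    """Data node side (no truncation): one [sum, count] partial per
    (entity, shard_id). Points are (entity, shard_id, value) tuples."""
    partials = defaultdict(lambda: [0.0, 0])
    for entity, shard_id, value in points:
        partial = partials[(entity, shard_id)]
        partial[0] += value
        partial[1] += 1
    return dict(partials)

def finalize(total, count, agg):
    """Turn a merged [sum, count] pair into the final aggregated value."""
    if agg == "sum":
        return total
    if agg == "count":
        return count
    if agg == "mean":
        return total / count
    raise ValueError(f"unsupported aggregation: {agg}")

def liaison_topn(partials_per_node, n, agg):
    """Liaison side: merge partials by entity, then keep the n largest
    final values using a bounded min-heap."""
    if n <= 0:
        return []
    merged = defaultdict(lambda: [0.0, 0])
    for partials in partials_per_node:
        for (entity, _shard_id), (total, count) in partials.items():
            merged[entity][0] += total
            merged[entity][1] += count
    heap = []  # min-heap of (value, entity); heap[0] is the current cutoff
    for entity, (total, count) in merged.items():
        item = (finalize(total, count, agg), entity)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return [(entity, value) for value, entity in sorted(heap, reverse=True)]

# Hypothetical data: entity "c" is split across two nodes and two shards.
p1 = node_partials([("a", 0, 11), ("b", 0, 9), ("c", 0, 6)])
p2 = node_partials([("d", 1, 10), ("e", 1, 9), ("c", 1, 6)])
print(liaison_topn([p1, p2], 2, "sum"))   # [('c', 12.0), ('a', 11.0)]
print(liaison_topn([p1, p2], 2, "mean"))  # [('a', 11.0), ('d', 10.0)]
```

Carrying both the sum and the count in every partial lets the liaison derive SUM, COUNT, and MEAN from the same merged pair. MEAN in particular cannot be merged correctly from per-node means alone.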
   
   
   ### What you expected to happen
   
   Distributed TopN queries should return accurate results for all aggregation 
types (MIN, MAX, SUM, COUNT, MEAN) regardless of the number of data nodes.
   
   For SUM/COUNT/MEAN, the final TopN should reflect the true aggregated value 
computed from all data points across all data nodes, not just the top-N 
per-node subset.
   
   
   ### How to reproduce
   
   1. Deploy BanyanDB in distributed mode with at least 2 data nodes.
   2. Ingest data points where the same entity has data on multiple data nodes 
or across multiple shards.
   3. Issue a TopN query with COUNT, SUM, or MEAN aggregation.
   4. Expected: an entity whose true aggregated value ranks within the top N 
appears in the result.
      Actual: the entity is missing because its per-node or per-shard partial 
value was below the local TopN threshold and was truncated.
   
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit a pull request to fix on your own?
   
   - [x] Yes I am willing to submit a pull request on my own!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

