Copilot commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2517082227
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -52,6 +54,9 @@ public class NoDictColumnStatisticsCollector extends AbstractColumnStatisticsCol
private boolean _sealed = false;
// HLL Plus generally returns approximate cardinality >= actual cardinality which is desired
private final HyperLogLogPlus _hllPlus;
+ // Track exact uniques up to a threshold to avoid small-N underestimation and test flakiness
+ private static final int EXACT_UNIQUE_TRACKING_THRESHOLD = 2048;
+ private Set<Object> _exactUniques = new HashSet<>();
Review Comment:
The `_exactUniques` field is reassigned to `null` once the threshold is
exceeded, but it is neither `volatile` nor guarded by synchronization. If this
class is used concurrently, another thread may not see the reassignment.
Consider documenting the thread-safety assumptions, or marking the field
`volatile` if concurrent access is expected. Note that `volatile` only makes
the reference update visible; it does not make concurrent mutation of the
underlying `HashSet` safe.
```suggestion
private volatile Set<Object> _exactUniques = new HashSet<>();
```
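If concurrent access does turn out to be possible, one alternative to
`volatile` is to guard the field with the object's monitor. The class below is
a minimal sketch of that idea, not the PR's code: `ExactUniqueTracker`, its
constructor parameter, and `exactCardinalityOrMinusOne` are hypothetical names,
and the HLL fallback is left out so the example stays self-contained.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Pinot's actual class): tracks exact unique values up
 * to a threshold, then drops the set so callers fall back to an estimate.
 */
final class ExactUniqueTracker {
  private final int _threshold;
  // Guarded by "this"; set to null once the threshold is exceeded.
  private Set<Object> _exactUniques = new HashSet<>();

  ExactUniqueTracker(int threshold) {
    _threshold = threshold;
  }

  /** Records a value; abandons exact tracking once more than _threshold uniques are seen. */
  synchronized void add(Object value) {
    if (_exactUniques == null) {
      return; // already past the threshold, nothing to track exactly
    }
    _exactUniques.add(value);
    if (_exactUniques.size() > _threshold) {
      _exactUniques = null; // release memory; the null write is published by the monitor
    }
  }

  /** Returns the exact unique count, or -1 if exact tracking was abandoned. */
  synchronized int exactCardinalityOrMinusOne() {
    return _exactUniques == null ? -1 : _exactUniques.size();
  }
}
```

Because both methods synchronize on the same object, the null check, the
`HashSet` mutation, and the reassignment happen atomically with respect to each
other. If the collector is only ever used from a single thread, a comment
stating that assumption is enough and neither `volatile` nor `synchronized` is
needed.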
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]