Copilot commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2517089823
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -52,6 +54,10 @@ public class NoDictColumnStatisticsCollector extends
AbstractColumnStatisticsCol
private boolean _sealed = false;
// HLL Plus generally returns approximate cardinality >= actual cardinality
which is desired
private final HyperLogLogPlus _hllPlus;
+ // Track exact uniques up to a threshold to avoid small-N underestimation
and test flakiness
+ private static final int EXACT_UNIQUE_TRACKING_THRESHOLD = 2048;
+ // Volatile because this reference can be nulled out once the threshold is
exceeded
+ private volatile Set<Object> _exactUniques = new HashSet<>();
Review Comment:
The `volatile` keyword on the `Set` reference doesn't provide thread-safety
for the `HashSet` operations themselves. Concurrent modifications to the
`HashSet` (via `add()`) can lead to race conditions. Consider using
`Collections.synchronizedSet()` or `ConcurrentHashMap.newKeySet()` if this
collector can be accessed by multiple threads.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -223,10 +238,34 @@ public void seal() {
private void updateHllPlus(Object value) {
if (value instanceof BigDecimal) {
- // Canonicalize BigDecimal as string to avoid scale-related equality
issues
+ // Canonicalize BigDecimal as string to avoid scale-related equality
issues:
+ // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal
by default,
+ // but their string representations normalize the value for cardinality
tracking.
_hllPlus.offer(((BigDecimal) value).toString());
} else {
_hllPlus.offer(value);
}
}
+
+ private void trackExactUnique(Object value) {
+ if (_exactUniques == null) {
+ return;
+ }
Review Comment:
Reading `_exactUniques` and then calling `add()` on it creates a
check-then-act race condition. Another thread could null out `_exactUniques`
between the null check and the `add()` operation, causing a
`NullPointerException`. Store the reference in a local variable at the start of
the method to ensure consistent access.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]