Copilot commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2517346783
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -223,10 +242,40 @@ public void seal() {
private void updateHllPlus(Object value) {
if (value instanceof BigDecimal) {
- // Canonicalize BigDecimal as string to avoid scale-related equality issues
+ // Canonicalize BigDecimal as string to avoid scale-related equality issues:
+ // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal by default,
+ // but their string representations normalize the value for cardinality tracking.
_hllPlus.offer(((BigDecimal) value).toString());
} else {
_hllPlus.offer(value);
}
}
+
+ private void trackExactUnique(Object value) {
+ Set<Object> exact = _exactUniquesRef.get(); // local snapshot to avoid check-then-act race
+ if (exact == null) {
+ return;
+ }
+ Object key;
+ if (value instanceof byte[]) {
+ key = new ByteArray((byte[]) value);
+ } else if (value instanceof BigDecimal) {
+ // Use string representation to avoid scale-related equality issues:
+ // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal by default,
+ // but their string representations normalize the value for cardinality tracking.
+ key = ((BigDecimal) value).toString();
+ } else {
+ key = value;
+ }
+ exact.add(key);
+ if (exact.size() > EXACT_UNIQUE_TRACKING_THRESHOLD) {
Review Comment:
A race between `add()` and the `size()` check can let the set grow past the
threshold. Multiple threads could add elements concurrently while the set is
at the threshold, each seeing size <= threshold because the other additions
are not yet visible. Use the return value of `add()` (true if the element was
newly added) to increment an `AtomicInteger` counter, and check the counter
instead of the set size for deterministic threshold enforcement.
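
A minimal sketch of the counter-based approach suggested above, not the PR's actual code. The class `ExactUniqueTracker`, the field `_uniqueCount`, and the methods `track`, `isExact`, and `uniqueCount` are hypothetical names for illustration. The sketch also assumes the set is a concurrent set from `ConcurrentHashMap.newKeySet()` and that exceeding the threshold disables exact tracking by nulling the reference.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the collector's exact-unique tracking logic.
final class ExactUniqueTracker {
  private final int _threshold;
  // Assumption: a concurrent set; the PR's actual set type is not shown in the hunk.
  private final AtomicReference<Set<Object>> _exactUniquesRef =
      new AtomicReference<>(ConcurrentHashMap.newKeySet());
  // Counts successful insertions, so the threshold check does not depend on size().
  private final AtomicInteger _uniqueCount = new AtomicInteger();

  ExactUniqueTracker(int threshold) {
    _threshold = threshold;
  }

  void track(Object key) {
    Set<Object> exact = _exactUniquesRef.get(); // local snapshot
    if (exact == null) {
      return; // exact tracking already disabled
    }
    // add() returns true only when the key was not present, so each distinct
    // key increments the counter exactly once.
    if (exact.add(key) && _uniqueCount.incrementAndGet() > _threshold) {
      _exactUniquesRef.set(null); // idempotent: safe if several threads get here
    }
  }

  boolean isExact() {
    return _exactUniquesRef.get() != null;
  }

  int uniqueCount() {
    return _uniqueCount.get();
  }
}
```

Because `add()` still runs before the counter check, the set can briefly hold a few more elements than the threshold (at most one extra per concurrent writer). The counter only makes the decision to stop tracking independent of a weakly consistent `size()`.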
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]