Copilot commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2517346783


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -223,10 +242,40 @@ public void seal() {
 
   private void updateHllPlus(Object value) {
     if (value instanceof BigDecimal) {
-      // Canonicalize BigDecimal as string to avoid scale-related equality issues
+      // Canonicalize BigDecimal as string to avoid scale-related equality issues:
+      // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal by default,
+      // but their string representations normalize the value for cardinality tracking.
       _hllPlus.offer(((BigDecimal) value).toString());
     } else {
       _hllPlus.offer(value);
     }
   }
+
+  private void trackExactUnique(Object value) {
+    Set<Object> exact = _exactUniquesRef.get(); // local snapshot to avoid check-then-act race
+    if (exact == null) {
+      return;
+    }
+    Object key;
+    if (value instanceof byte[]) {
+      key = new ByteArray((byte[]) value);
+    } else if (value instanceof BigDecimal) {
+      // Use string representation to avoid scale-related equality issues:
+      // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal by default,
+      // but their string representations normalize the value for cardinality tracking.
+      key = ((BigDecimal) value).toString();
+    } else {
+      key = value;
+    }
+    exact.add(key);
+    if (exact.size() > EXACT_UNIQUE_TRACKING_THRESHOLD) {

Review Comment:
   The `add()` call and the following `size()` check are not atomic, so under 
concurrent use the set can grow past `EXACT_UNIQUE_TRACKING_THRESHOLD`: several 
threads can add elements at the same time, and each may read a size that is 
still at or below the threshold before the other additions are visible. Use the 
return value of `add()` (`true` only when the element was newly added) to 
increment an `AtomicInteger` counter, and check that counter instead of the 
set's size so the threshold is enforced deterministically.
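
   A minimal sketch of the counter-based check, under stated assumptions: the 
class `ExactUniqueTracker`, the field `_exactCount`, and the methods 
`isExact()` and `countedUniques()` are hypothetical names that are not part of 
the PR. The sketch also assumes that the set is a concurrent set and that 
crossing the threshold disables exact tracking by clearing the reference, since 
the diff does not show the body of the `if`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the collector's exact-unique tracking (names are illustrative).
final class ExactUniqueTracker {
  private final int _threshold;
  // Assumption: a concurrent set; the PR's actual set implementation is not shown.
  private final AtomicReference<Set<Object>> _exactUniquesRef =
      new AtomicReference<>(ConcurrentHashMap.newKeySet());
  private final AtomicInteger _exactCount = new AtomicInteger();

  ExactUniqueTracker(int threshold) {
    _threshold = threshold;
  }

  void track(Object key) {
    Set<Object> exact = _exactUniquesRef.get(); // local snapshot
    if (exact == null) {
      return; // exact tracking was already abandoned
    }
    // add() returns true only for a new element, so each distinct key
    // increments the counter exactly once, regardless of interleaving.
    if (exact.add(key) && _exactCount.incrementAndGet() > _threshold) {
      // Assumption: exceeding the threshold disables exact tracking.
      // compareAndSet clears the reference only if it still points at this set.
      _exactUniquesRef.compareAndSet(exact, null);
    }
  }

  boolean isExact() {
    return _exactUniquesRef.get() != null;
  }

  int countedUniques() {
    return _exactCount.get();
  }
}
```

   Because `incrementAndGet()` hands each newly added element a distinct count, 
exactly one thread observes the value `threshold + 1`. Threads that took their 
snapshot before the reference was cleared can still add a few elements, so the 
set may briefly overshoot by at most the number of in-flight callers.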



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

