yashmayya commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2518905386


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -223,10 +242,40 @@ public void seal() {
 
   private void updateHllPlus(Object value) {
     if (value instanceof BigDecimal) {
-      // Canonicalize BigDecimal as string to avoid scale-related equality 
issues
+      // Canonicalize BigDecimal as string to avoid scale-related equality 
issues:
+      // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal 
by default,
+      // but their string representations normalize the value for cardinality 
tracking.
       _hllPlus.offer(((BigDecimal) value).toString());
     } else {
       _hllPlus.offer(value);
     }
   }
+
+  private void trackExactUnique(Object value) {
+    Set<Object> exact = _exactUniquesRef.get(); // local snapshot to avoid 
check-then-act race
+    if (exact == null) {
+      return;
+    }
+    Object key;
+    if (value instanceof byte[]) {
+      key = new ByteArray((byte[]) value);
+    } else if (value instanceof BigDecimal) {
+      // Use string representation to avoid scale-related equality issues:
+      // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal 
by default,
+      // but their string representations normalize the value for cardinality 
tracking.
+      key = ((BigDecimal) value).toString();
+    } else {
+      key = value;
+    }
+    exact.add(key);
+    if (exact.size() > EXACT_UNIQUE_TRACKING_THRESHOLD) {
+      // Drop exact tracking once the threshold is exceeded to avoid unbounded 
memory.
+      // Note on concurrency: we intentionally operate on the local snapshot 
'exact'.
+      // Another thread can successfully CAS the AtomicReference to null while 
this thread still holds
+      // the snapshot. That is safe: the snapshot remains valid for this call, 
and any additional adds
+      // are bounded and the snapshot becomes unreachable after this method 
returns (eligible for GC).
+      // Using CAS ensures only one thread disables tracking.
+      _exactUniquesRef.compareAndSet(exact, null);

Review Comment:
   Why is any of this needed? What exactly is the threading model and potential 
for contention or race conditions here?
   
   cc - @krishan1390 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to