Copilot commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2517203168


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -208,8 +222,17 @@ public int getMaxRowLengthInBytes() {
 
   @Override
   public int getCardinality() {
-    // Get approximate distinct count estimate using HLL++
-    // Increase by 10% to increase probability of not returning lower than 
actual cardinality
+    // If we are still tracking exact uniques, return exact cardinality
+    Set<Object> exact = _exactUniquesRef.get(); // local snapshot to avoid 
races
+    if (exact != null) {
+      int size = exact.size();
+      // Defense-in-depth: if exact tracking was concurrently disabled after 
we took the snapshot,
+      // prefer falling back to the HLL++ estimate to avoid any theoretical 
visibility concerns.
+      if (_exactUniquesRef.get() == exact) {
+        return size;
+      }

Review Comment:
   The double-check pattern here is incorrect. After calling `exact.size()` on 
line 228, the size could become stale if another thread exceeds the threshold 
and nulls out the reference. The second check on line 231 doesn't prevent 
returning a stale size. Consider reading the size only after the second 
reference check, or accept that exact cardinality might slightly exceed the 
threshold in race conditions.
   ```suggestion
       Set<Object> exact = _exactUniquesRef.get();
       if (exact != null) {
         // Only read the size after confirming the reference is still valid
         // This avoids returning a stale size if the reference was nulled out 
concurrently
         return exact.size();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to