Re: [PR] [enhancement]NoDict: hybrid cardinality (exact ≤ 2,048; HLL++ with +10% buffer) [pinot]

via GitHub Tue, 11 Nov 2025 21:59:36 -0800


Copilot commented on code in PR #17186:
URL: https://github.com/apache/pinot/pull/17186#discussion_r2516963829



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -229,4 +243,24 @@ private void updateHllPlus(Object value) {
       _hllPlus.offer(value);
     }
   }
+
+  private void trackExactUnique(Object value) {
+    if (_exactUniques == null) {
+      return;
+    }
+    Object key;
+    if (value instanceof byte[]) {
+      key = new ByteArray((byte[]) value);

Review Comment:
   The code references `ByteArray` class without importing it. Add the required 
import statement for `ByteArray`, likely from 
`org.apache.pinot.spi.utils.ByteArray` or equivalent package.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -74,6 +79,7 @@ public void collect(Object entry) {
           throw new UnsupportedOperationException();
         }
         updateMinMax(value);
+        trackExactUnique(value);

Review Comment:
   The `trackExactUnique()` method is called before `updateHllPlus()` in the 
single-value path, but after the threshold is exceeded, `_exactUniques` will be 
null while HLL++ continues to track values. Consider reordering to call 
`updateHllPlus()` first to ensure HLL++ state is consistent even when exact 
tracking is active, or document why the current ordering is intentional.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/NoDictColumnStatisticsCollector.java:
##########
@@ -229,4 +243,24 @@ private void updateHllPlus(Object value) {
       _hllPlus.offer(value);
     }
   }
+
+  private void trackExactUnique(Object value) {
+    if (_exactUniques == null) {
+      return;
+    }
+    Object key;
+    if (value instanceof byte[]) {
+      key = new ByteArray((byte[]) value);
+    } else if (value instanceof BigDecimal) {
+      // Use string representation to avoid scale-related equality issues

Review Comment:
   [nitpick] The comment mentions 'scale-related equality issues' but doesn't 
explain what specific problem this solves. Consider expanding the comment to 
clarify that BigDecimals with different scales (e.g., 1.0 vs 1.00) are not 
equal by default, and string representation normalizes this.
   ```suggestion
         // Use string representation to avoid scale-related equality issues:
         // BigDecimals with different scales (e.g., 1.0 vs 1.00) are not equal 
by default,
         // but their string representations normalize the value for 
cardinality tracking.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [enhancement]NoDict: hybrid cardinality (exact ≤ 2,048; HLL++ with +10% buffer) [pinot]

Reply via email to