Copilot commented on code in PR #17113:
URL: https://github.com/apache/pinot/pull/17113#discussion_r2488281790


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -105,6 +110,62 @@ public void collect(Object entry) {
             updatePartition(key);
           }
         }
+        if (keyStats instanceof NoDictColumnStatisticsCollector) {
+          keyStats.collect(value);
+          continue;
+        }
+        if (keyStats instanceof StringColumnPreIndexStatsCollector) {
+          if (value instanceof String || value instanceof Number || value 
instanceof Boolean) {
+            keyStats.collect(String.valueOf(value));
+            continue;
+          }
+          try {
+            keyStats.collect(JsonUtils.objectToString(value));
+            continue;
+          } catch (JsonProcessingException e) {
+            throw new RuntimeException("Failed to serialize value for key '" + 
key + "': " + value, e);
+          }
+        }
+        if (keyStats instanceof BigDecimalColumnPreIndexStatsCollector) {
+          
keyStats.collect(PinotDataType.STRING.toBigDecimal(value.toString()));

Review Comment:
   Using `PinotDataType.STRING.toBigDecimal()` is incorrect. The `STRING` 
type's `toBigDecimal()` method expects a String value but is being passed the 
result of `value.toString()`. This should use `new 
BigDecimal(value.toString())` directly or verify that the STRING type's 
toBigDecimal implementation handles string conversion properly.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -208,18 +267,23 @@ private AbstractColumnStatisticsCollector 
createKeyStatsCollector(String key, Ob
         return new DoubleColumnPreIndexStatsCollector(key, config);
       case BIG_DECIMAL:
         return new BigDecimalColumnPreIndexStatsCollector(key, config);
+      case BOOLEAN:
       case STRING:
+      case MAP:
+      case OBJECT:
         return new StringColumnPreIndexStatsCollector(key, config);
       default:
-        throw new UnsupportedOperationException(String.format("MAP column does 
not yet support '%s'", type));
+        LOGGER.warn("Unknown data type {} for key {} and value {}", type, key, 
value);

Review Comment:
   Logging the raw value object could expose sensitive data. Consider logging 
only the value's type or a sanitized representation instead of the actual value.
   ```suggestion
           LOGGER.warn("Unknown data type {} for key {} and value type {}", 
type, key, value.getClass().getName());
   ```



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollectorTest.java:
##########
@@ -249,4 +250,141 @@ public void 
testKeyCollectorsUseNoDictWhenEnabledAndMatchOutputs() {
       assertEquals(keyNoDict.isSorted(), keyDict.isSorted(), "sorted mismatch 
for key " + key);
     }
   }
+
+  @Test
+  public void testFrequenciesUniqueKeysAndLengths() {
+    Map<String, Object> r1 = new HashMap<>();
+    r1.put("kStr", "alpha");
+    r1.put("kInt", 3);
+    r1.put("kLong", 7L);
+    r1.put("kFloat", 1.5f);
+    r1.put("kDouble", 2.25d);
+    r1.put("kBigDec", new java.math.BigDecimal("10.01"));
+    r1.put("kNull", null); // ignored
+
+    Map<String, Object> r2 = new HashMap<>();
+    r2.put("kStr", "beta");
+    r2.put("kInt", 3);
+    r2.put("kLong", 2L);
+    r2.put("kFloat", 1.5f);
+    r2.put("kDouble", 0.75d);
+    r2.put("kBigDec", new java.math.BigDecimal("10.01"));
+
+    Map<String, Object> r3 = new HashMap<>();
+    r3.put("kStr", "alpha");
+    r3.put("kInt", 3);
+    r3.put("kFloat", 3.5f);
+    r3.put("kBigDec", new java.math.BigDecimal("5.25"));
+
+    StatsCollectorConfig cfg = newConfig(false);
+    MapColumnPreIndexStatsCollector col = new 
MapColumnPreIndexStatsCollector("col", cfg);
+    col.collect(r1);
+    col.collect(r2);
+    col.collect(r3);
+    col.seal();
+
+    // Frequencies per key
+    Map<String, Integer> freqs = col.getAllKeyFrequencies();
+    assertEquals(freqs.get("kStr").intValue(), 3);
+    assertEquals(freqs.get("kInt").intValue(), 3);
+    assertEquals(freqs.get("kLong").intValue(), 2);
+    assertEquals(freqs.get("kFloat").intValue(), 3);
+    assertEquals(freqs.get("kDouble").intValue(), 2);
+    assertEquals(freqs.get("kBigDec").intValue(), 3);
+    assertFalse(freqs.containsKey("kNull"));
+
+    // Unique key set is sorted
+    String[] keys = col.getUniqueValuesSet();
+    assertEquals(keys, new String[]{"kBigDec", "kDouble", "kFloat", "kInt", 
"kLong", "kStr"});
+
+    // Row length metrics
+    int l1 = org.apache.pinot.spi.utils.MapUtils.serializeMap(r1).length;
+    int l2 = org.apache.pinot.spi.utils.MapUtils.serializeMap(r2).length;
+    int l3 = org.apache.pinot.spi.utils.MapUtils.serializeMap(r3).length;
+    int expectedMin = Math.min(l1, Math.min(l2, l3));
+    int expectedMax = Math.max(l1, Math.max(l2, l3));
+    assertEquals(col.getLengthOfShortestElement(), expectedMin);
+    assertEquals(col.getLengthOfLargestElement(), expectedMax);
+    assertEquals(col.getMaxRowLengthInBytes(), expectedMax);
+  }
+
+  @Test
+  public void testTypeConversionAndJsonSerialization() {
+    // Create collectors based on first-seen types, then feed convertible 
values
+    Map<String, Object> r1 = new HashMap<>();
+    r1.put("kInt2", 5); // int collector
+    r1.put("kObj", new TreeMap<>(Map.of("k1", "v1", "k2", "v2"))); // will 
serialize to "[1]"

Review Comment:
   The comment states the TreeMap 'will serialize to \"[1]\"' which is 
incorrect. A TreeMap containing {\"k1\":\"v1\",\"k2\":\"v2\"} would serialize 
to a JSON object, not \"[1]\". The comment should be corrected or removed.
   ```suggestion
       r1.put("kObj", new TreeMap<>(Map.of("k1", "v1", "k2", "v2"))); // will 
serialize to a JSON object: {"k1":"v1","k2":"v2"}
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -105,6 +110,62 @@ public void collect(Object entry) {
             updatePartition(key);
           }
         }
+        if (keyStats instanceof NoDictColumnStatisticsCollector) {
+          keyStats.collect(value);
+          continue;
+        }
+        if (keyStats instanceof StringColumnPreIndexStatsCollector) {
+          if (value instanceof String || value instanceof Number || value 
instanceof Boolean) {
+            keyStats.collect(String.valueOf(value));
+            continue;
+          }
+          try {
+            keyStats.collect(JsonUtils.objectToString(value));
+            continue;
+          } catch (JsonProcessingException e) {
+            throw new RuntimeException("Failed to serialize value for key '" + 
key + "': " + value, e);
+          }
+        }
+        if (keyStats instanceof BigDecimalColumnPreIndexStatsCollector) {
+          
keyStats.collect(PinotDataType.STRING.toBigDecimal(value.toString()));
+          continue;
+        }
+        if (value instanceof Number) {
+          Number valueNumber = (Number) value;
+          if (keyStats instanceof IntColumnPreIndexStatsCollector) {
+            keyStats.collect(valueNumber.intValue());
+            continue;
+          }
+          if (keyStats instanceof LongColumnPreIndexStatsCollector) {
+            keyStats.collect(valueNumber.longValue());
+            continue;
+          }
+          if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
+            keyStats.collect(valueNumber.floatValue());
+            continue;
+          }
+          if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
+            keyStats.collect(valueNumber.doubleValue());
+            continue;
+          }
+        }
+        if (keyStats instanceof IntColumnPreIndexStatsCollector) {
+          keyStats.collect(PinotDataType.STRING.toInt(value.toString()));
+          continue;
+        }
+        if (keyStats instanceof LongColumnPreIndexStatsCollector) {
+          keyStats.collect(PinotDataType.STRING.toLong(value.toString()));
+          continue;
+        }
+        if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
+          keyStats.collect(PinotDataType.STRING.toFloat(value.toString()));
+          continue;
+        }
+        if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
+          keyStats.collect(PinotDataType.STRING.toDouble(value.toString()));

Review Comment:
   These conversion branches use 
`PinotDataType.STRING.toInt/toLong/toFloat/toDouble()` methods, but according 
to the context, `STRING.toInt()` throws `UnsupportedOperationException`. The 
code should either use direct parsing (e.g., `Integer.parseInt()`, 
`Long.parseLong()`) or verify that STRING type implements these conversion 
methods properly.
   ```suggestion
             keyStats.collect(Integer.parseInt(value.toString()));
             continue;
           }
           if (keyStats instanceof LongColumnPreIndexStatsCollector) {
             keyStats.collect(Long.parseLong(value.toString()));
             continue;
           }
           if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
             keyStats.collect(Float.parseFloat(value.toString()));
             continue;
           }
           if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
             keyStats.collect(Double.parseDouble(value.toString()));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to