Copilot commented on code in PR #17113:
URL: https://github.com/apache/pinot/pull/17113#discussion_r2488281790
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -105,6 +110,62 @@ public void collect(Object entry) {
updatePartition(key);
}
}
+ if (keyStats instanceof NoDictColumnStatisticsCollector) {
+ keyStats.collect(value);
+ continue;
+ }
+ if (keyStats instanceof StringColumnPreIndexStatsCollector) {
+ if (value instanceof String || value instanceof Number || value
instanceof Boolean) {
+ keyStats.collect(String.valueOf(value));
+ continue;
+ }
+ try {
+ keyStats.collect(JsonUtils.objectToString(value));
+ continue;
+ } catch (JsonProcessingException e) {
+ throw new RuntimeException("Failed to serialize value for key '" +
key + "': " + value, e);
+ }
+ }
+ if (keyStats instanceof BigDecimalColumnPreIndexStatsCollector) {
+
keyStats.collect(PinotDataType.STRING.toBigDecimal(value.toString()));
Review Comment:
Using `PinotDataType.STRING.toBigDecimal()` is incorrect. The `STRING`
type's `toBigDecimal()` method expects a String value but is being passed the
result of `value.toString()`. This should use `new
BigDecimal(value.toString())` directly or verify that the STRING type's
toBigDecimal implementation handles string conversion properly.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -208,18 +267,23 @@ private AbstractColumnStatisticsCollector
createKeyStatsCollector(String key, Ob
return new DoubleColumnPreIndexStatsCollector(key, config);
case BIG_DECIMAL:
return new BigDecimalColumnPreIndexStatsCollector(key, config);
+ case BOOLEAN:
case STRING:
+ case MAP:
+ case OBJECT:
return new StringColumnPreIndexStatsCollector(key, config);
default:
- throw new UnsupportedOperationException(String.format("MAP column does
not yet support '%s'", type));
+ LOGGER.warn("Unknown data type {} for key {} and value {}", type, key,
value);
Review Comment:
Logging the raw value object could expose sensitive data. Consider logging
only the value's type or a sanitized representation instead of the actual value.
```suggestion
LOGGER.warn("Unknown data type {} for key {} and value type {}",
type, key, value.getClass().getName());
```
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollectorTest.java:
##########
@@ -249,4 +250,141 @@ public void
testKeyCollectorsUseNoDictWhenEnabledAndMatchOutputs() {
assertEquals(keyNoDict.isSorted(), keyDict.isSorted(), "sorted mismatch
for key " + key);
}
}
+
+ @Test
+ public void testFrequenciesUniqueKeysAndLengths() {
+ Map<String, Object> r1 = new HashMap<>();
+ r1.put("kStr", "alpha");
+ r1.put("kInt", 3);
+ r1.put("kLong", 7L);
+ r1.put("kFloat", 1.5f);
+ r1.put("kDouble", 2.25d);
+ r1.put("kBigDec", new java.math.BigDecimal("10.01"));
+ r1.put("kNull", null); // ignored
+
+ Map<String, Object> r2 = new HashMap<>();
+ r2.put("kStr", "beta");
+ r2.put("kInt", 3);
+ r2.put("kLong", 2L);
+ r2.put("kFloat", 1.5f);
+ r2.put("kDouble", 0.75d);
+ r2.put("kBigDec", new java.math.BigDecimal("10.01"));
+
+ Map<String, Object> r3 = new HashMap<>();
+ r3.put("kStr", "alpha");
+ r3.put("kInt", 3);
+ r3.put("kFloat", 3.5f);
+ r3.put("kBigDec", new java.math.BigDecimal("5.25"));
+
+ StatsCollectorConfig cfg = newConfig(false);
+ MapColumnPreIndexStatsCollector col = new
MapColumnPreIndexStatsCollector("col", cfg);
+ col.collect(r1);
+ col.collect(r2);
+ col.collect(r3);
+ col.seal();
+
+ // Frequencies per key
+ Map<String, Integer> freqs = col.getAllKeyFrequencies();
+ assertEquals(freqs.get("kStr").intValue(), 3);
+ assertEquals(freqs.get("kInt").intValue(), 3);
+ assertEquals(freqs.get("kLong").intValue(), 2);
+ assertEquals(freqs.get("kFloat").intValue(), 3);
+ assertEquals(freqs.get("kDouble").intValue(), 2);
+ assertEquals(freqs.get("kBigDec").intValue(), 3);
+ assertFalse(freqs.containsKey("kNull"));
+
+ // Unique key set is sorted
+ String[] keys = col.getUniqueValuesSet();
+ assertEquals(keys, new String[]{"kBigDec", "kDouble", "kFloat", "kInt",
"kLong", "kStr"});
+
+ // Row length metrics
+ int l1 = org.apache.pinot.spi.utils.MapUtils.serializeMap(r1).length;
+ int l2 = org.apache.pinot.spi.utils.MapUtils.serializeMap(r2).length;
+ int l3 = org.apache.pinot.spi.utils.MapUtils.serializeMap(r3).length;
+ int expectedMin = Math.min(l1, Math.min(l2, l3));
+ int expectedMax = Math.max(l1, Math.max(l2, l3));
+ assertEquals(col.getLengthOfShortestElement(), expectedMin);
+ assertEquals(col.getLengthOfLargestElement(), expectedMax);
+ assertEquals(col.getMaxRowLengthInBytes(), expectedMax);
+ }
+
+ @Test
+ public void testTypeConversionAndJsonSerialization() {
+ // Create collectors based on first-seen types, then feed convertible
values
+ Map<String, Object> r1 = new HashMap<>();
+ r1.put("kInt2", 5); // int collector
+ r1.put("kObj", new TreeMap<>(Map.of("k1", "v1", "k2", "v2"))); // will
serialize to "[1]"
Review Comment:
The comment states the TreeMap 'will serialize to \"[1]\"' which is
incorrect. A TreeMap containing {\"k1\":\"v1\",\"k2\":\"v2\"} would serialize
to a JSON object, not \"[1]\". The comment should be corrected or removed.
```suggestion
r1.put("kObj", new TreeMap<>(Map.of("k1", "v1", "k2", "v2"))); // will
serialize to a JSON object: {"k1":"v1","k2":"v2"}
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java:
##########
@@ -105,6 +110,62 @@ public void collect(Object entry) {
updatePartition(key);
}
}
+ if (keyStats instanceof NoDictColumnStatisticsCollector) {
+ keyStats.collect(value);
+ continue;
+ }
+ if (keyStats instanceof StringColumnPreIndexStatsCollector) {
+ if (value instanceof String || value instanceof Number || value
instanceof Boolean) {
+ keyStats.collect(String.valueOf(value));
+ continue;
+ }
+ try {
+ keyStats.collect(JsonUtils.objectToString(value));
+ continue;
+ } catch (JsonProcessingException e) {
+ throw new RuntimeException("Failed to serialize value for key '" +
key + "': " + value, e);
+ }
+ }
+ if (keyStats instanceof BigDecimalColumnPreIndexStatsCollector) {
+
keyStats.collect(PinotDataType.STRING.toBigDecimal(value.toString()));
+ continue;
+ }
+ if (value instanceof Number) {
+ Number valueNumber = (Number) value;
+ if (keyStats instanceof IntColumnPreIndexStatsCollector) {
+ keyStats.collect(valueNumber.intValue());
+ continue;
+ }
+ if (keyStats instanceof LongColumnPreIndexStatsCollector) {
+ keyStats.collect(valueNumber.longValue());
+ continue;
+ }
+ if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
+ keyStats.collect(valueNumber.floatValue());
+ continue;
+ }
+ if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
+ keyStats.collect(valueNumber.doubleValue());
+ continue;
+ }
+ }
+ if (keyStats instanceof IntColumnPreIndexStatsCollector) {
+ keyStats.collect(PinotDataType.STRING.toInt(value.toString()));
+ continue;
+ }
+ if (keyStats instanceof LongColumnPreIndexStatsCollector) {
+ keyStats.collect(PinotDataType.STRING.toLong(value.toString()));
+ continue;
+ }
+ if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
+ keyStats.collect(PinotDataType.STRING.toFloat(value.toString()));
+ continue;
+ }
+ if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
+ keyStats.collect(PinotDataType.STRING.toDouble(value.toString()));
Review Comment:
These conversion branches use
`PinotDataType.STRING.toInt/toLong/toFloat/toDouble()` methods, but according
to the context, `STRING.toInt()` throws `UnsupportedOperationException`. The
code should either use direct parsing (e.g., `Integer.parseInt()`,
`Long.parseLong()`) or verify that STRING type implements these conversion
methods properly.
```suggestion
keyStats.collect(Integer.parseInt(value.toString()));
continue;
}
if (keyStats instanceof LongColumnPreIndexStatsCollector) {
keyStats.collect(Long.parseLong(value.toString()));
continue;
}
if (keyStats instanceof FloatColumnPreIndexStatsCollector) {
keyStats.collect(Float.parseFloat(value.toString()));
continue;
}
if (keyStats instanceof DoubleColumnPreIndexStatsCollector) {
keyStats.collect(Double.parseDouble(value.toString()));
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]