Copilot commented on code in PR #17284:
URL: https://github.com/apache/pinot/pull/17284#discussion_r2584580496


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/MapTypeTest.java:
##########
@@ -99,24 +98,23 @@ public List<File> createAvroFiles()
             new org.apache.avro.Schema.Field(INT_KEY_MAP_FIELD_NAME, intKeyMapAvroSchema, null, null));
     avroSchema.setFields(fields);
 
-    File avroFile = new File(_tempDir, "data.avro");
-    try (DataFileWriter<GenericData.Record> fileWriter = new DataFileWriter<>(new GenericDatumWriter<>(avroSchema))) {
-      fileWriter.create(avroSchema, avroFile);
-      for (int i = 0; i < NUM_DOCS; i++) {
+    try (AvroFilesAndWriters avroFilesAndWriters = createAvroFilesAndWriters(avroSchema)) {
+      for (int i = 0; i < NUM_DOCS_PER_SEGMENT; i++) {
         Map<String, Integer> stringKeyMap = new HashMap<>();
         stringKeyMap.put("k1", i);
-        stringKeyMap.put("k2", NUM_DOCS + i);
+        stringKeyMap.put("k2", NUM_DOCS_PER_SEGMENT + i);
         Map<Integer, String> intKeyMap = new HashMap<>();
         intKeyMap.put(95, Integer.toString(i));
-        intKeyMap.put(717, Integer.toString(NUM_DOCS + i));
+        intKeyMap.put(717, Integer.toString(NUM_DOCS_PER_SEGMENT + i));
         GenericData.Record record = new GenericData.Record(avroSchema);
         record.put(STRING_KEY_MAP_FIELD_NAME, stringKeyMap);
         record.put(INT_KEY_MAP_FIELD_NAME, intKeyMap);
-        fileWriter.append(record);
+        for (DataFileWriter<GenericData.Record> writer : avroFilesAndWriters.getWriters()) {
+          writer.append(record);
+        }

Review Comment:
   This loop appends the same record to every writer, so each segment ends up with identical data. Other test files distribute records with `writers.get(i % getNumAvroFiles())`; duplicating the data instead defeats the purpose of testing multi-segment scenarios.
   ```suggestion
           avroFilesAndWriters.getWriters().get(i % getNumAvroFiles()).append(record);
   ```
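   For illustration, here is a minimal, self-contained sketch of the round-robin distribution the other test files use. The schema, file names, `numFiles`, and `numDocs` are hypothetical stand-ins (`numFiles` plays the role of `getNumAvroFiles()`); only standard Avro writer APIs already used by these tests appear.
   ```java
   import java.io.File;
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.avro.Schema;
   import org.apache.avro.SchemaBuilder;
   import org.apache.avro.file.DataFileWriter;
   import org.apache.avro.generic.GenericData;
   import org.apache.avro.generic.GenericDatumWriter;

   public class RoundRobinAvroSketch {
     public static void main(String[] args) throws Exception {
       // Illustrative schema; the real tests use map-typed fields.
       Schema schema = SchemaBuilder.record("Row").fields().requiredInt("id").endRecord();

       int numFiles = 3;   // stand-in for getNumAvroFiles()
       int numDocs = 12;   // stand-in for the test's doc count

       // One writer per output Avro file / segment.
       List<DataFileWriter<GenericData.Record>> writers = new ArrayList<>();
       for (int f = 0; f < numFiles; f++) {
         DataFileWriter<GenericData.Record> writer = new DataFileWriter<>(new GenericDatumWriter<>(schema));
         writer.create(schema, new File("data-" + f + ".avro"));
         writers.add(writer);
       }

       for (int i = 0; i < numDocs; i++) {
         GenericData.Record record = new GenericData.Record(schema);
         record.put("id", i);
         // Round-robin: record i goes only to writer (i % numFiles), so each
         // file holds a disjoint, interleaved slice of the data rather than a
         // full copy.
         writers.get(i % numFiles).append(record);
       }

       for (DataFileWriter<GenericData.Record> writer : writers) {
         writer.close();
       }
     }
   }
   ```
   With this distribution each segment receives a distinct subset of rows, so queries spanning multiple segments are genuinely exercised.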



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/JsonPathTest.java:
##########
@@ -123,13 +120,14 @@ public List<File> createAvroFiles()
             Map.of("k4-k1", "value-k4-k1-" + i, "k4-k2", "value-k4-k2-" + i, 
"k4-k3", "value-k4-k3-" + i,
                 "met", i));
        record.put(COMPLEX_MAP_STR_FIELD_NAME, JsonUtils.objectToString(complexMap));
-        fileWriter.append(record);
+        for (DataFileWriter<GenericData.Record> writer : avroFilesAndWriters.getWriters()) {
+          writer.append(record);
+        }

Review Comment:
   This loop appends the same record to every writer instead of distributing records across segments. Other test files use `writers.get(i % getNumAvroFiles())` to distribute records properly.
   ```suggestion
           DataFileWriter<GenericData.Record> writer = avroFilesAndWriters.getWriters().get(i % getNumAvroFiles());
           writer.append(record);
   ```



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/MapFieldTypeTest.java:
##########
@@ -155,10 +152,12 @@ public List<File> createAvroFiles()
         GenericData.Record record = new GenericData.Record(avroSchema);
         record.put(STRING_MAP_FIELD_NAME, stringMap);
         record.put(INT_MAP_FIELD_NAME, intMap);
-        fileWriter.append(record);
+        for (DataFileWriter<GenericData.Record> writer : avroFilesAndWriters.getWriters()) {
+          writer.append(record);
+        }

Review Comment:
   This loop appends the same record to every writer instead of distributing records across segments. Other test files use `writers.get(i % getNumAvroFiles())` to distribute records across multiple segments.
   ```suggestion
           avroFilesAndWriters.getWriters().get(i % getNumAvroFiles()).append(record);
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountULLAggregationFunction.java:
##########
@@ -338,12 +341,18 @@ public UltraLogLog extractGroupByResult(GroupByResultHolder groupByResultHolder,
   }
 
   @Override
-  public UltraLogLog merge(UltraLogLog intermediateResult1, UltraLogLog intermediateResult2) {
-    int largerP = Math.max(intermediateResult1.getP(), intermediateResult2.getP());
-    UltraLogLog merged = UltraLogLog.create(largerP);
-    merged.add(intermediateResult1);
-    merged.add(intermediateResult2);
-    return merged;
+  public UltraLogLog merge(@Nullable UltraLogLog intermediateResult1, @Nullable UltraLogLog intermediateResult2) {
+    if (intermediateResult1 == null) {
+      return intermediateResult2;
+    }
+    if (intermediateResult2 == null) {
+      return intermediateResult1;
+    }
+    // UltraLogLog object doesn't support merge a smaller P object into a larger p object.

Review Comment:
   Inconsistent capitalization: 'P' and 'p' both refer to the same precision parameter, and 'support merge' should be 'support merging'. Suggested wording for consistency:
   ```suggestion
       // UltraLogLog object doesn't support merging a smaller p object into a larger p object.
   ```
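   For context, a minimal null-safe merge sketch that respects this constraint by always merging into the sketch with the smaller p. It assumes the hash4j `UltraLogLog` (`com.dynatrace.hash4j.distinctcount.UltraLogLog`) that Pinot's ULL functions build on, uses only the `getP()`/`add(...)` calls already visible in the diff, and mutates its arguments as intermediate-result merges typically do; it is an illustration, not necessarily the PR's exact implementation.
   ```java
   import javax.annotation.Nullable;

   import com.dynatrace.hash4j.distinctcount.UltraLogLog;

   public class UltraLogLogMergeSketch {

     // Null-safe merge: a null side means "no values seen", so return the other.
     @Nullable
     public static UltraLogLog merge(@Nullable UltraLogLog a, @Nullable UltraLogLog b) {
       if (a == null) {
         return b;
       }
       if (b == null) {
         return a;
       }
       // Per the comment in the diff, a smaller-p sketch cannot be merged into
       // a larger-p one, so always add into whichever sketch has the smaller p.
       if (a.getP() <= b.getP()) {
         a.add(b);
         return a;
       }
       b.add(a);
       return b;
     }
   }
   ```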



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

