Copilot commented on code in PR #16027:
URL: https://github.com/apache/pinot/pull/16027#discussion_r2132681948


##########
pinot-query-runtime/src/test/resources/queries/NullHandling.json:
##########
@@ -271,7 +271,10 @@
           ["bar", 2, 2, 2, "alice", "alice", "alice"],
           ["alice", 2, 2, 2, null, null, null],
           [null, 4, 4, 4, null, null, null],
-          ["bob", null, null, null, null, null, null]
+          ["bob", null, null, null, null, null, null],
+          ["alice", 2, 2, 2, null, null, "null"],
+          [null, 4, 4, 4, null, null, "null"],
+          ["bob", null, null, -2147483648, null, null, "null"]

Review Comment:
   This JSON mixes literal `null` and the string "null" to represent missing 
values. Use a consistent null representation to avoid confusion in test 
expectations.
   ```suggestion
             ["alice", 2, 2, 2, null, null, null],
             [null, 4, 4, 4, null, null, null],
             ["bob", null, null, -2147483648, null, null, null]
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/RecordReaderSegmentCreationDataSource.java:
##########
@@ -63,12 +64,17 @@ public SegmentPreIndexStatsCollector 
gatherStats(StatsCollectorConfig statsColle
               .getIngestionConfig().isContinueOnError();
       GenericRow reuse = new GenericRow();
       TransformPipeline.Result reusedResult = new TransformPipeline.Result();
+      Set<String> notNullColumns = 
SegmentCreatorUtils.extractNotNullColumns(statsCollectorConfig.getSchema());
       while (_recordReader.hasNext()) {
         reuse.clear();
         try {
           reuse = _recordReader.next(reuse);
           transformPipeline.processRow(reuse, reusedResult);
           for (GenericRow row : reusedResult.getTransformedRows()) {
+            if (SegmentCreatorUtils.shouldSkipRowForNotNull(notNullColumns, 
row)) {
+              throw new RuntimeException("Row contains null value in a 
not-null column, and "
+                  + "`TableConfig.IngestionConfig.continueOnError` is set to 
false. Row: " + row);

Review Comment:
   The real-time stats collector always throws on encountering a null in a 
not-null column regardless of `continueOnError`. Wrap this in a check so that 
it only throws when `continueOnError` is false, and skips the row otherwise.
   ```suggestion
                 if (!continueOnError) {
                   throw new RuntimeException("Row contains null value in a 
not-null column, and "
                       + "`TableConfig.IngestionConfig.continueOnError` is set 
to false. Row: " + row);
                 } else {
                   LOGGER.warn("Skipping row with null value in a not-null 
column due to `continueOnError` being true. Row: {}", row);
                   continue;
                 }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to