wombatu-kun commented on code in PR #18375:
URL: https://github.com/apache/hudi/pull/18375#discussion_r3071513724
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -251,8 +251,11 @@ public static Collection<Pair<String, Long>> filterKeysFromFile(StoragePath file
return Collections.emptyList();
}
log.info("Going to filter {} keys from file {}", candidateRecordKeys.size(), filePath);
Review Comment:
Filed as https://github.com/apache/hudi/issues/18496 and linked the TODO to it. Thanks for the nudge.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/UpdateProcessor.java:
##########
@@ -136,18 +140,54 @@ protected BufferedRecord<T> handleNonDeletes(BufferedRecord<T> previousRecord, B
Review Comment:
Good point. Replaced the schema-inequality heuristic with an explicit
`consumeLastAvroRecordFromCache()` flag on `RecordContext`.
`SparkFileFormatInternalRecordContext.convertToAvroRecord` sets the flag when it
returns a cached record, and `UpdateProcessor.handleNonDeletes` reads (and
resets) the flag instead of comparing schemas. The default implementation in
`RecordContext` returns `false`, so the Flink and Java engines are unaffected.
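For reviewers skimming the thread, here is a minimal, self-contained sketch of the read-and-reset flag pattern described above. Only `RecordContext`, `consumeLastAvroRecordFromCache()`, and `convertToAvroRecord` are names from this PR; the stand-in class, fields, and signatures below are simplified assumptions, not the actual Hudi implementation.

```java
// Simplified stand-in for org.apache.hudi.common.engine.RecordContext.
// The default returns false, so engines that never serve cached Avro
// records (e.g. Flink/Java) are unaffected.
interface RecordContext<T> {
  default boolean consumeLastAvroRecordFromCache() {
    return false;
  }
}

// Hypothetical stand-in for SparkFileFormatInternalRecordContext:
// marks the flag whenever a conversion is satisfied from a cache.
class CachingRecordContext implements RecordContext<Object> {
  private boolean lastFromCache = false;

  Object convertToAvroRecord(Object record, boolean servedFromCache) {
    lastFromCache = servedFromCache; // record where the result came from
    return record;
  }

  @Override
  public boolean consumeLastAvroRecordFromCache() {
    boolean result = lastFromCache;
    lastFromCache = false; // read-and-reset, as handleNonDeletes does
    return result;
  }
}

public class FlagPatternDemo {
  public static void main(String[] args) {
    CachingRecordContext ctx = new CachingRecordContext();
    ctx.convertToAvroRecord("rec", true);
    // First read observes the cached-record flag; the read resets it,
    // so a second read returns false.
    System.out.println(ctx.consumeLastAvroRecordFromCache()); // true
    System.out.println(ctx.consumeLastAvroRecordFromCache()); // false
  }
}
```

The read-and-reset semantics matter: the flag describes only the most recent conversion, so the consumer must clear it to avoid stale reads on the next record.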
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]