wombatu-kun commented on code in PR #18375:
URL: https://github.com/apache/hudi/pull/18375#discussion_r3071513724
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -251,8 +251,11 @@ public static Collection<Pair<String, Long>> filterKeysFromFile(StoragePath file
return Collections.emptyList();
}
log.info("Going to filter {} keys from file {}", candidateRecordKeys.size(), filePath);
Review Comment:
Filed as https://github.com/apache/hudi/issues/18496 and linked the TODO to it. Thanks for the nudge.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/UpdateProcessor.java:
##########
@@ -136,18 +140,54 @@ protected BufferedRecord<T> handleNonDeletes(BufferedRecord<T> previousRecord, B
Review Comment:
Good point. Replaced the schema-inequality heuristic with an explicit
`consumeLastAvroRecordFromCache()` flag on `RecordContext`.
`SparkFileFormatInternalRecordContext.convertToAvroRecord` sets the flag when it
returns a cached record, and `UpdateProcessor.handleNonDeletes` reads (and
resets) the flag instead of comparing schemas. The default implementation in
`RecordContext` returns `false`, so the Flink and Java engines are unaffected.
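For reviewers skimming the thread, here is a minimal, self-contained sketch of the read-and-reset flag pattern described above. Only `RecordContext`, `consumeLastAvroRecordFromCache()`, and `convertToAvroRecord` are names from this PR; the stand-in class, fields, and signatures below are simplified assumptions, not the actual Hudi implementation.

```java
// Simplified stand-in for org.apache.hudi.common.engine.RecordContext.
// The default returns false, so engines that never serve cached Avro
// records (e.g. Flink/Java) are unaffected.
interface RecordContext<T> {
  default boolean consumeLastAvroRecordFromCache() {
    return false;
  }
}

// Hypothetical stand-in for SparkFileFormatInternalRecordContext:
// marks the flag whenever a conversion is satisfied from a cache.
class CachingRecordContext implements RecordContext<Object> {
  private boolean lastFromCache = false;

  Object convertToAvroRecord(Object record, boolean servedFromCache) {
    lastFromCache = servedFromCache; // record where the result came from
    return record;
  }

  @Override
  public boolean consumeLastAvroRecordFromCache() {
    boolean result = lastFromCache;
    lastFromCache = false; // read-and-reset, as handleNonDeletes does
    return result;
  }
}

public class FlagPatternDemo {
  public static void main(String[] args) {
    CachingRecordContext ctx = new CachingRecordContext();
    ctx.convertToAvroRecord("rec", true);
    // First read observes the cached-record flag; the read resets it,
    // so a second read returns false.
    System.out.println(ctx.consumeLastAvroRecordFromCache()); // true
    System.out.println(ctx.consumeLastAvroRecordFromCache()); // false
  }
}
```

The read-and-reset semantics matter: the flag describes only the most recent conversion, so the consumer must clear it to avoid stale reads on the next record.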
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]