rahil-c commented on code in PR #18375:
URL: https://github.com/apache/hudi/pull/18375#discussion_r3069732092


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java:
##########
@@ -70,6 +70,22 @@ public String getFileExtension() {
     return extension;
   }
 
+  /**
+   * Returns true if this file format requires the SPARK record type for reading/writing.
+   * Lance only supports the Spark-native InternalRow representation, not Avro.
+   */
+  public boolean requiresSparkRecordType() {
+    return this == LANCE;
+  }
+
+  /**

Review Comment:
   **[High]** `resolveRecordType()` fixes the I/O layer (reader/writer factory selection), but every test must still manually set `hoodie.write.record.merge.custom.implementation.classes=DefaultSparkRecordMerger` because the **merger** is selected at config-init time, before the file format is known.
   
   This creates a usability gap: users who set `baseFileFormat=LANCE` but forget the merger property will hit a confusing runtime error. Could the merger selection logic (`HoodieRecordUtils.createRecordMerger`) also consult the table's base file format and auto-select a Spark merger when Lance is configured? That would make `resolveRecordType` a complete solution rather than a partial one.
   
   Note: I am fine with filing the above as a GitHub follow-up issue, as I think we will want to change this requirement at some point.
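   To illustrate the suggestion, here is a minimal sketch of what format-aware merger selection could look like. The class and method names (`MergerSelectionSketch`, `resolveMergerClass`) and the simplified enum are hypothetical stand-ins, not actual Hudi APIs; the real change would live inside `HoodieRecordUtils.createRecordMerger`:

```java
// Hypothetical sketch: auto-select a Spark-native merger when the table's base
// file format only supports the Spark InternalRow representation (e.g. Lance).
public class MergerSelectionSketch {

  // Simplified stand-in for org.apache.hudi.common.model.HoodieFileFormat.
  enum HoodieFileFormat {
    PARQUET, ORC, LANCE;

    boolean requiresSparkRecordType() {
      return this == LANCE;
    }
  }

  static final String SPARK_MERGER = "org.apache.hudi.DefaultSparkRecordMerger";

  // If the base file format mandates SPARK records, override the configured
  // (typically Avro-based) merger instead of failing later at runtime.
  static String resolveMergerClass(HoodieFileFormat baseFileFormat, String configuredMerger) {
    if (baseFileFormat.requiresSparkRecordType()) {
      return SPARK_MERGER;
    }
    return configuredMerger;
  }

  public static void main(String[] args) {
    String avroMerger = "org.apache.hudi.common.model.HoodieAvroRecordMerger";
    // Lance tables get the Spark merger regardless of the configured default.
    System.out.println(resolveMergerClass(HoodieFileFormat.LANCE, avroMerger));
    // Parquet tables keep whatever the user configured.
    System.out.println(resolveMergerClass(HoodieFileFormat.PARQUET, avroMerger));
  }
}
```

   With this shape, users setting `baseFileFormat=LANCE` would no longer need to also set the merger property by hand.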



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
