rahil-c commented on code in PR #18375:
URL: https://github.com/apache/hudi/pull/18375#discussion_r3069732092
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java:
##########
@@ -70,6 +70,22 @@ public String getFileExtension() {
return extension;
}
+ /**
+ * Returns true if this file format requires the SPARK record type for reading/writing.
+ * Lance only supports the Spark-native InternalRow representation, not Avro.
+ */
+ public boolean requiresSparkRecordType() {
+ return this == LANCE;
+ }
+
+ /**
Review Comment:
**[High]** `resolveRecordType()` fixes the I/O layer (reader/writer factory selection), but every test still has to manually set `hoodie.write.record.merge.custom.implementation.classes=DefaultSparkRecordMerger` because the **merger** is selected at config init time, before the file format is known.
This creates a usability gap: users who set `baseFileFormat=LANCE` but forget the merger property will hit a confusing runtime error. Could the merger selection logic (`HoodieRecordUtils.createRecordMerger`) also consult the table's base file format and auto-select a Spark merger when Lance is configured? That would make `resolveRecordType` a complete solution rather than a partial one.
I am OK with filing the above as a GitHub follow-up issue.
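To make the suggestion concrete, here is a minimal, self-contained sketch of what format-aware merger selection could look like. This is not Hudi's actual API: the enum, the `resolveMergerClass` helper, and the merger class names are simplified stand-ins for `HoodieFileFormat`, `HoodieRecordUtils.createRecordMerger`, and the real FQCNs.

```java
import java.util.List;

public class MergerSelectionSketch {

  // Simplified stand-in for org.apache.hudi.common.model.HoodieFileFormat.
  enum FileFormat {
    PARQUET, ORC, LANCE;

    // Mirrors the requiresSparkRecordType() method added in this PR:
    // Lance only supports the Spark-native InternalRow representation.
    boolean requiresSparkRecordType() {
      return this == LANCE;
    }
  }

  // Assumed class names for illustration only.
  static final String SPARK_MERGER = "DefaultSparkRecordMerger";
  static final String AVRO_MERGER = "HoodieAvroRecordMerger";

  // Hypothetical resolver: if the base file format requires the Spark
  // record representation, override the user-configured merger with a
  // Spark merger instead of failing later at runtime.
  static String resolveMergerClass(FileFormat baseFileFormat, List<String> configuredMergers) {
    if (baseFileFormat.requiresSparkRecordType()) {
      return SPARK_MERGER;
    }
    // Otherwise honor the user's config, falling back to the Avro merger.
    return configuredMergers.isEmpty() ? AVRO_MERGER : configuredMergers.get(0);
  }

  public static void main(String[] args) {
    // LANCE forces the Spark merger even when nothing is configured;
    // PARQUET keeps the configured/default behavior.
    System.out.println(resolveMergerClass(FileFormat.LANCE, List.of()));
    System.out.println(resolveMergerClass(FileFormat.PARQUET, List.of()));
  }
}
```

With this shape, a user setting only `baseFileFormat=LANCE` would get a working merger automatically, and the explicit merger property in the tests would become redundant rather than mandatory.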
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]