nsivabalan commented on code in PR #18132:
URL: https://github.com/apache/hudi/pull/18132#discussion_r3053538737
##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##########
@@ -777,8 +777,8 @@ private void assertBoundaryCounts(Dataset<Row> df, String
exprZero, String exprT
}
@ParameterizedTest
- @CsvSource(value = {"SIX,AVRO,CLUSTER", "CURRENT,AVRO,NONE",
"CURRENT,AVRO,CLUSTER", "CURRENT,SPARK,NONE", "CURRENT,SPARK,CLUSTER"})
- void testCOWLogicalRepair(String tableVersion, String recordType, String
operation) throws Throwable {
+ @CsvSource(value = {"CLUSTER", "NONE"})
+ void testCOWLogicalRepair(String operation) throws Throwable {
Review Comment:
lets include SPARK as well and not just AVRO
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -87,8 +87,12 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
HoodieFileReader bootstrapFileReader = null;
Schema writerSchema = mergeHandle.getWriterSchemaWithMetaFields();
- Schema readerSchema =
AvroSchemaUtils.getRepairedSchema(baseFileReader.getSchema(), writerSchema);
-
+ Schema readerSchema;
+ if (AvroSchemaUtils.isLogicalTimestampRepairNeeded(table.getHadoopConf(),
true)) {
Review Comment:
this line is executed in the executor. we should parse the schema in the
driver and set the boolean flag in hadoop conf at one of the caller site in the
driver.
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroParquetReader.java:
##########
@@ -174,7 +180,8 @@ private ClosableIterator<IndexedRecord>
getIndexedRecordIteratorInternal(Schema
AvroReadSupport.setRequestedProjection(conf, schema);
}
ParquetReader<IndexedRecord> reader =
- new HoodieAvroParquetReaderBuilder<IndexedRecord>(path)
+ new HoodieAvroParquetReaderBuilder<IndexedRecord>(path,
+ AvroSchemaUtils.isLogicalTimestampRepairNeeded(conf, false) ||
schema == null || AvroSchemaUtils.hasTimestampMillisField(schema))
Review Comment:
why the default value is false here, but true in all other places?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java:
##########
@@ -133,9 +133,7 @@ public HoodieData<WriteStatus> compact(
// Since we are using merge handle here, we can directly query the write
schema from conf
// Write handle provides an option to use overridden write schema as well
which is not used by merge handle
Schema writerSchema = HoodieAvroUtils.addMetadataFields(new
Schema.Parser().parse(config.getWriteSchema()),
config.allowOperationMetadataField());
- if (AvroSchemaUtils.hasTimestampMillisField(writerSchema)) {
- AvroSchemaUtils.setLogicalTimestampRepairNeeded(table.getHadoopConf());
- }
+ AvroSchemaUtils.setLogicalTimestampRepairIfNotSet(table.getHadoopConf(),
() -> AvroSchemaUtils.hasTimestampMillisField(writerSchema));
Review Comment:
do we need to check for mdt here and avoid setting the config?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]