nsivabalan commented on code in PR #18132:
URL: https://github.com/apache/hudi/pull/18132#discussion_r3048824555
##########
hudi-common/src/main/java/org/apache/parquet/avro/HoodieAvroParquetReaderBuilder.java:
##########
@@ -67,13 +74,19 @@ public HoodieAvroParquetReaderBuilder<T> withCompatibility(boolean enableCompatibility) {
return this;
}
+  public HoodieAvroParquetReaderBuilder<T> withTableSchema(Schema tableSchema) {
+ this.tableSchema = tableSchema;
+ return this;
+ }
+
@Override
protected ReadSupport<T> getReadSupport() {
if (isReflect) {
conf.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, false);
} else {
conf.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, enableCompatibility);
}
- return new HoodieAvroReadSupport<>(model);
+    return new HoodieAvroReadSupport<>(model,
+        Option.ofNullable(tableSchema).map(schema -> getAvroSchemaConverter(conf).convert(schema)),
Review Comment:
If hadoopConf already carries the value for `hasLogicalTsField`, we can also avoid the additional call in L90.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -86,7 +87,8 @@ public void runMerge(HoodieTable<?, ?, ?, ?> table,
HoodieFileReader bootstrapFileReader = null;
Schema writerSchema = mergeHandle.getWriterSchemaWithMetaFields();
- Schema readerSchema = baseFileReader.getSchema();
+    Schema readerSchema = AvroSchemaUtils.getRepairedSchema(baseFileReader.getSchema(), writerSchema);
Review Comment:
But why can't we add it to the hadoopConfiguration that's part of `table.getHadoopConf()` in the driver, and then fetch it from here?
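A minimal sketch of what I mean, with the config key and class names made up for illustration, and `java.util.Properties` standing in for Hadoop's `Configuration` (whose `setBoolean`/`getBoolean` behave analogously):

```java
import java.util.Properties;

// Hypothetical sketch: compute the logical-type check once on the driver and
// stash the answer in the shared configuration, so the reader side can skip
// the repair work. The key name is invented for illustration only.
public class LogicalTsFlag {
  static final String HAS_LOGICAL_TS_FIELD = "hoodie.internal.schema.has.logical.ts.field";

  // Driver side: record whether the table schema contains a logical timestamp field.
  public static void record(Properties conf, boolean hasLogicalTs) {
    conf.setProperty(HAS_LOGICAL_TS_FIELD, Boolean.toString(hasLogicalTs));
  }

  // Reader side: default to "true" (i.e. still do the repair) when the flag is
  // absent, so correctness is preserved if the driver never set it.
  public static boolean shouldRepair(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(HAS_LOGICAL_TS_FIELD, "true"));
  }
}
```

Defaulting to `true` keeps the change purely an optimization: a missing flag falls back to today's behavior.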
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroParquetReader.java:
##########
@@ -154,23 +158,70 @@ private static Configuration tryOverrideDefaultConfigs(Configuration conf) {
return conf;
}
-  private ClosableIterator<IndexedRecord> getIndexedRecordIteratorInternal(Schema schema, Option<Schema> requestedSchema) throws IOException {
+  private ClosableIterator<IndexedRecord> getIndexedRecordIteratorInternal(Schema schema, Option<Schema> renamedColumns) throws IOException {
// NOTE: We have to set both Avro read-schema and projection schema to make
     //       sure that in case the file-schema is not equal to read-schema we'd still
// be able to read that file (in case projection is a proper one)
- if (!requestedSchema.isPresent()) {
+ Schema repairedFileSchema = getRepairedSchema(getSchema(), schema);
Review Comment:
When we instantiate the base file reader at L84 in HoodieMergeHelper, we can embed a boolean flag in hadoopConf, fetch it again here, and avoid the repair calls entirely for tables w/o any logical type.
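To make the saving concrete, here is a hedged sketch of the reader-side guard. All names are illustrative: the flag key is invented, `Properties` stands in for hadoopConf, and a counting stub replaces the real `AvroSchemaUtils.getRepairedSchema` so the skipped call is visible:

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; none of these names come from the actual patch.
public class RepairGuard {
  static final AtomicInteger repairCalls = new AtomicInteger();

  // Stub for the real schema repair; the real method rewrites logical-type fields.
  static String getRepairedSchema(String fileSchema, String readerSchema) {
    repairCalls.incrementAndGet();
    return fileSchema;
  }

  // Only pay for the repair when the driver recorded a logical timestamp field;
  // default to repairing when the flag was never set, to stay correct.
  static String resolveFileSchema(Properties conf, String fileSchema, String readerSchema) {
    boolean hasLogicalTs = Boolean.parseBoolean(
        conf.getProperty("hoodie.internal.schema.has.logical.ts.field", "true"));
    return hasLogicalTs ? getRepairedSchema(fileSchema, readerSchema) : fileSchema;
  }
}
```

With the flag set to `false`, `resolveFileSchema` returns the file schema untouched and the repair stub is never invoked.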
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]