[I] Incorrect Metrics Calculation for Iceberg Table Due to Column Name Transformation with Special Characters [iceberg]

via GitHub Tue, 09 Apr 2024 20:41:30 -0700


lintingbin opened a new issue, #10115:
URL: https://github.com/apache/iceberg/issues/10115


   ### Apache Iceberg version
   
   1.3.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   ```
   CREATE TABLE tmp.iceberg_test3 (
     `log_type.string` STRING,
     `event_time.string` STRING,
     `version.string` STRING,
     `version.bigint` BIGINT)
   USING iceberg
   PARTITIONED BY (truncate(10, `event_time.string`), `log_type.string`)
   TBLPROPERTIES (
     'write.metadata.metrics.column.event_time.string' = 'truncate(16)',
     'write.metadata.metrics.default' = 'none');
   ```
   When a table is created with the DDL above, the metrics collected for the column `event_time.string` are wrong. The cause is the renaming applied to column names when they are stored in Parquet files: `AvroSchemaUtil.makeCompatibleName(originalName)` converts `event_time.string` to `event_time_x2Estring` (the `.` is replaced by `_x` followed by its hex code, `2E`).

   As a result, in `ParquetUtil.java`, the statement `MetricsMode metricsMode = MetricsUtil.metricsMode(fileSchema, metricsConfig, fieldId)` returns the wrong `MetricsMode`, because the field name it resolves (the one stored in the Parquet file) does not match the original column name used in the `write.metadata.metrics.column.event_time.string` property. The `truncate(16)` mode configured for this column is therefore not applied.
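
   The mismatch can be illustrated with a minimal, hypothetical Java sketch. `sanitizeName` is a simplified stand-in for `AvroSchemaUtil.makeCompatibleName` that only applies the `_x` + hex replacement shown above and does not attempt to cover every rule of the real method. `COLUMN_MODES` and `metricsModeFor` are hypothetical stand-ins for the metrics configuration lookup, assuming a column with no matching property falls back to the table default (`none` here). None of these names are Iceberg APIs.

   ```java
   import java.util.Map;

   // Hypothetical illustration only; these are not Iceberg classes or methods.
   final class MetricsNameMismatch {

       static final String DEFAULT_MODE = "none";

       // Stand-in for the per-column property from the DDL, keyed by the ORIGINAL column name.
       static final Map<String, String> COLUMN_MODES =
           Map.of("event_time.string", "truncate(16)");

       // Simplified renaming rule: any character outside [A-Za-z0-9_] becomes "_x" + upper-case hex.
       static String sanitizeName(String name) {
           StringBuilder sb = new StringBuilder(name.length());
           for (char c : name.toCharArray()) {
               boolean legal = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                   || (c >= '0' && c <= '9') || c == '_';
               if (legal) {
                   sb.append(c);
               } else {
                   // '.' is 0x2E, so it becomes "_x2E"
                   sb.append("_x").append(Integer.toHexString(c).toUpperCase());
               }
           }
           return sb.toString();
       }

       // Assumed lookup behavior: an unknown column name silently gets the default mode.
       static String metricsModeFor(String columnName) {
           return COLUMN_MODES.getOrDefault(columnName, DEFAULT_MODE);
       }

       public static void main(String[] args) {
           String icebergName = "event_time.string";
           String parquetName = sanitizeName(icebergName);
           System.out.println(parquetName);                 // event_time_x2Estring
           System.out.println(metricsModeFor(icebergName)); // truncate(16)
           System.out.println(metricsModeFor(parquetName)); // none
       }
   }
   ```

   Running `main` prints `event_time_x2Estring`, `truncate(16)` and `none`: a lookup keyed by the Parquet-side name misses the configured mode and quietly yields the default.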
   

