[GitHub] [iceberg] JonasJ-ap commented on pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

GitBox Fri, 06 Jan 2023 14:50:35 -0800


JonasJ-ap commented on PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1374229027


   > Regarding Delta name mapping that @findepi mentioned, looking at the spec,
   > 
   > ```
   > Write data files by using the physical name that is chosen for each column. The physical name of the column is static and can be different than the display name of the column, which is changeable.
   > 
   > Write the 32 bit integer column identifier as part of the field_id field of the SchemaElement struct in the [Parquet Thrift specification](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift).
   > 
   > Track partition values and column level statistics with the physical name of the column in the transaction log.
   > ```
   > 
   > Because the column name has changed in the underlying parquet file, migrating that requires not only Iceberg name mapping configuration, but also converting the statistics retrieved from Parquet files.
   > 
   > Sounds like something that can be added as the next step after this PR is merged.
   
   Also, according to the [Delta Lake roadmap](https://github.com/delta-io/delta/issues/1307), `delta-standalone` currently does not support `columnMapping` and other features of higher protocol versions. Maybe we can start adding support for these features once the new version of `delta-standalone` is published in the next few months.
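For illustration, the statistics conversion described in the quoted comment could look roughly like the minimal Python sketch below (the PR itself is Java; this is not its implementation). It assumes the Delta log schema JSON carries the `delta.columnMapping.physicalName` and `delta.columnMapping.id` field-metadata keys from the Delta protocol's column mapping section, and that an add action's `stats` JSON holds `numRecords`, `minValues`, `maxValues` and `nullCount` keyed by physical name. The helper names `physical_to_display` and `translate_stats`, and the sample columns and physical names, are hypothetical. Only top-level columns are handled.

```python
import json

# Field-metadata keys used by Delta column mapping (assumed per the Delta protocol).
PHYSICAL_NAME_KEY = "delta.columnMapping.physicalName"
FIELD_ID_KEY = "delta.columnMapping.id"


def physical_to_display(schema_json):
    """Map each top-level column's physical name to (display name, field id).

    Hypothetical helper; nested struct fields are not descended into.
    """
    mapping = {}
    for field in json.loads(schema_json)["fields"]:
        meta = field.get("metadata", {})
        # Without column mapping, the physical name is simply the display name.
        physical = meta.get(PHYSICAL_NAME_KEY, field["name"])
        mapping[physical] = (field["name"], meta.get(FIELD_ID_KEY))
    return mapping


def translate_stats(stats_json, mapping):
    """Re-key per-column stats from physical names to display names (hypothetical helper)."""
    stats = json.loads(stats_json)
    translated = {"numRecords": stats.get("numRecords")}
    for section in ("minValues", "maxValues", "nullCount"):
        translated[section] = {
            mapping[physical][0]: value
            for physical, value in stats.get(section, {}).items()
            if physical in mapping  # drop keys that do not match a schema column
        }
    return translated


# Hypothetical table whose columns were written under physical names "col-a1" and "col-b2".
schema = json.dumps({"type": "struct", "fields": [
    {"name": "customer_id", "type": "long", "nullable": True,
     "metadata": {FIELD_ID_KEY: 1, PHYSICAL_NAME_KEY: "col-a1"}},
    {"name": "email", "type": "string", "nullable": True,
     "metadata": {FIELD_ID_KEY: 2, PHYSICAL_NAME_KEY: "col-b2"}},
]})
stats = json.dumps({"numRecords": 3,
                    "minValues": {"col-a1": 10, "col-b2": "a@x.io"},
                    "maxValues": {"col-a1": 42, "col-b2": "z@x.io"},
                    "nullCount": {"col-a1": 0, "col-b2": 1}})

mapping = physical_to_display(schema)
print(translate_stats(stats, mapping))
```

The same physical-name map is also what an Iceberg name mapping would need, since Parquet files written under column mapping use the physical names rather than the display names.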




