JFinis opened a new issue, #8699:
URL: https://github.com/apache/iceberg/issues/8699

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I created some Iceberg tables with the latest Iceberg/Spark and checked 
whether the schemas of the generated Manifest files and Manifest Lists are in 
accordance with the Spec. 
   
   This issue is also based on the discussions [in this 
PR](https://github.com/apache/iceberg/pull/8672) and this [Slack 
Thread](https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569),
 where we discussed what the semantics should be if a field is labeled 
optional by the Spec:
   
   So far, the outcome of the discussion is that if a field is labeled optional 
for an Iceberg format version, then writers writing that format version 
**should** include a column for that field in the Avro file but mark that column 
as optional (i.e., nullable, which in Avro is the union `["null", T]`). They 
**should not** simply leave out the column.
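
To make the distinction concrete, here is a minimal sketch of a check for this rule. It works on the JSON form of an Avro record schema. The schema below is a heavily trimmed, hypothetical example and not a dump of a real manifest list; the helper names `is_nullable` and `check_optional_fields` are made up for illustration.

```python
import json

# Hypothetical, trimmed record schema (illustration only, not a real file dump).
# An optional field is present as a column whose type is a union with "null".
SCHEMA_JSON = """
{
  "type": "record",
  "name": "manifest_file",
  "fields": [
    {"name": "manifest_path", "type": "string"},
    {"name": "key_metadata", "type": ["null", "bytes"], "default": null}
  ]
}
"""


def is_nullable(field):
    """Return True if the field's Avro type is a union that includes "null"."""
    field_type = field["type"]
    return isinstance(field_type, list) and "null" in field_type


def check_optional_fields(schema, optional_names):
    """Classify each expected optional field as 'ok', 'missing', or 'not_nullable'."""
    by_name = {f["name"]: f for f in schema["fields"]}
    result = {}
    for name in optional_names:
        if name not in by_name:
            result[name] = "missing"        # column left out entirely
        elif not is_nullable(by_name[name]):
            result[name] = "not_nullable"   # column present but required
        else:
            result[name] = "ok"             # column present and nullable
    return result


schema = json.loads(SCHEMA_JSON)
print(check_optional_fields(schema, ["key_metadata"]))
```

Under the interpretation above, a writer that drops the `key_metadata` column would be reported as `missing`, while a writer that includes it as a nullable column would be reported as `ok`.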
   
   This issue contains all deviations from the Spec I could find. All 
deviations found were in Iceberg tables created with Spark 3.4.1 using Iceberg 
1.3.1 (latest release).
   
   In a format_version=**2** Manifest **List**:
   * column `key_metadata` is not written at all, even though the Spec tags it 
as optional in v2 (and v1).
   * The fields `added_files_count`, `existing_files_count`, 
`deleted_files_count` are not named correctly. They have an additional `data_` 
infix. This was already reported separately in [Issue 
8684](https://github.com/apache/iceberg/issues/8684), but I include it here as 
well for the sake of completeness.
   
   In a format_version=**1** Manifest **File**:
   * column `file_ordinal` is not written at all, even though the Spec tags it 
as optional in v1 (it is considered deprecated though)
   * column `sort_columns` is not written at all, even though the Spec tags it 
as optional in v1 (it is considered deprecated though)
   * [same as v2] column `distinct_counts` is not written at all, even though 
the Spec tags it as optional in v1 (and v2).
   
   In a format_version=**1** Manifest **List**:
   * The position of the column `507 partitions` in the `manifest_entry` struct 
is different from the Spec:
     * In the Spec, it is placed after `514 deleted_rows_count`.
     * In the file, it is placed after `506 deleted_files_count`.
   * [same as v2] column `key_metadata` is not written at all, even though the 
Spec tags it as optional in v2 (and v1).
   * [same as v2] The fields `added_files_count`, `existing_files_count`, 
`deleted_files_count` are not named correctly. They have an additional `data_` 
infix. This was already reported separately in [Issue 
8684](https://github.com/apache/iceberg/issues/8684), but I include it here as 
well for the sake of completeness.
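
The naming and ordering deviations above can be detected mechanically by matching fields on their field ID instead of their name. The following is a rough sketch under stated assumptions: the `(field_id, name)` lists contain only the three IDs mentioned in this issue, the written name `deleted_data_files_count` is inferred from the "additional `data_` infix" description, and `compare_by_field_id` is a hypothetical helper, not part of Iceberg.

```python
def compare_by_field_id(expected, actual):
    """Compare two ordered lists of (field_id, name) pairs.

    Returns renamed fields, field IDs missing from `actual`, and whether the
    relative order of the shared field IDs differs.
    """
    expected_names = dict(expected)
    actual_names = dict(actual)

    # Same field ID, different name.
    renamed = [
        (fid, expected_names[fid], actual_names[fid])
        for fid, _ in expected
        if fid in actual_names and actual_names[fid] != expected_names[fid]
    ]
    # Field IDs the Spec lists but the file does not contain.
    missing = [fid for fid, _ in expected if fid not in actual_names]

    # Compare the order of the IDs that appear in both lists.
    shared_expected = [fid for fid, _ in expected if fid in actual_names]
    shared_actual = [fid for fid, _ in actual if fid in expected_names]

    return {
        "renamed": renamed,
        "missing": missing,
        "order_differs": shared_expected != shared_actual,
    }


# Excerpt of the order described in the Spec (only the IDs named in this issue).
spec_fields = [
    (506, "deleted_files_count"),
    (514, "deleted_rows_count"),
    (507, "partitions"),
]

# Excerpt of what the issue reports for the written file (name is an assumption).
file_fields = [
    (506, "deleted_data_files_count"),
    (507, "partitions"),
    (514, "deleted_rows_count"),
]

report = compare_by_field_id(spec_fields, file_fields)
print(report)
```

Matching on field ID keeps the rename and the reordering as two separate findings, which mirrors how they are listed above.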
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
