Re: [PR] Spec: Clarify identity partition edge cases. [iceberg]

via GitHub Fri, 02 Aug 2024 14:17:57 -0700


findepi commented on code in PR #10835:
URL: https://github.com/apache/iceberg/pull/10835#discussion_r1702327165



##########
format/spec.md:
##########
@@ -241,7 +245,14 @@ Struct evolution requires the following rules for default 
values:
 
 #### Column Projection
 
-Columns in Iceberg data files are selected by field id. The table schema's 
column names and order may change after a data file is written, and projection 
must be done using field ids. If a field id is missing from a data file, its 
value for each row should be `null`.
+Columns in Iceberg data files are selected by field id. The table schema's 
column names and order may change after a data file is written, and projection 
must be done using field ids.
+
+Values for field ids which are not present in a data file must be resolved 
according the following rules:
+
+* Return the value from partition metadata if an [Identity 
Transform](#partition-transforms) exists for the field and the partition value 
is present in the `partition` struct on `data_file` object in the manifest. 

Review Comment:
   I was thinking about a situation like this (using Trino syntax):
   
   ```sql
   -- create a table with some data
   CREATE TABLE t AS SELECT 123 AS a;
   
   -- add new partitioning column
   ALTER TABLE t SET PROPERTIES partitioning = ARRAY['truncate(a, 10)'];
   ```
   
   Now we have two fields: the `a` data column and the `a_trunc` partition field derived from it.
   Trino doesn't provide a way to query the `a_trunc` field directly.
   However, if it did, the value of `a_trunc` could be derived from the data in `a`.
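
To make the last point concrete, here is a minimal Python sketch of how `a_trunc` could be computed from `a` alone. It assumes the integer `truncate` behaviour described in the Iceberg spec (round down to a multiple of the width); `truncate_int` is a hypothetical helper name used only for illustration, not an Iceberg or Trino API.

```python
def truncate_int(value: int, width: int) -> int:
    """Round `value` down to a multiple of `width` (assumed integer truncate semantics)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # For a positive width, Python's % always yields a non-negative remainder,
    # so subtracting it rounds toward negative infinity.
    return value - (value % width)


# The row written before the partition spec changed: a = 123.
a = 123
# The value that truncate(a, 10) would have produced for that row.
a_trunc = truncate_int(a, 10)
print(a_trunc)
```

For the row above this gives 120, so a reader would not need the value to be present in the `partition` struct to answer a query on `a_trunc`.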
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


