Re: [I] Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID [iceberg-python]

via GitHub Sun, 02 Jun 2024 11:52:39 -0700


Fokko commented on issue #788:
URL: https://github.com/apache/iceberg-python/issues/788#issuecomment-2143984768


   Thanks @syun64 for raising this, and it indeed looks inconsistent.
   
   There has been a lot of confusion around this in the past. The field IDs are internal to Iceberg and should only be used when:
   
    - Reading: Looking up the field in the requested schema
    - Writing: Aligning the fields with the table schema
   
    > Do we want to attach the parquet file ID attribute on all pyarrow schema returned by project_table?
   
   After `project_table`, the field IDs are no longer relevant, so it is best to remove them.
   
   > Or should we remove parquet file ID attached on the field metadata of the pyarrow schema? The idea here is that we would have two modes of creating schema_to_pyarrow, with or without parquet Field ID (write, versus read use cases)
   
   Yes, this makes sense to me. It would be good to have the option to omit field IDs.



