Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

via GitHub Mon, 22 Jan 2024 07:49:58 -0800


syun64 commented on issue #278:
URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1904289015


   > what do we do with the name-mapping created in step 1 after the table is
created? Do we just discard it or put it in schema.name-mapping.default? If the
latter, I think we need either to update
[new_table_metadata](https://github.com/apache/iceberg-python/blob/83306104a25a4ecd1f2185ec46cd9fda247544f4/pyiceberg/table/metadata.py#L399-L409)
 to not assign fresh ids when a name-mapping is present, or to update
_SetFreshIds to respect the name-mapping if given. I would appreciate any
thoughts on this matter!
   
   Great question, @HonahX. My understanding is that no operation puts a name
mapping into schema.name-mapping.default automatically; the user has to insert
the name-mapping JSON into the Iceberg table as a table property.
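
   As a rough illustration of what that manual step involves, here is a
minimal sketch that builds a name-mapping JSON string for flat, top-level
columns and places it under the schema.name-mapping.default key. It uses only
the standard library; the helper build_name_mapping and the column names are
hypothetical, and the sketch does not show how the property is actually set
on a table.

```python
import json


def build_name_mapping(columns, start_id=1):
    """Map each top-level column name to a sequential field ID.

    Hypothetical helper: handles flat schemas only (no nested "fields").
    """
    return [
        {"field-id": field_id, "names": [name]}
        for field_id, name in enumerate(columns, start=start_id)
    ]


# Hypothetical column names, for illustration only.
mapping = build_name_mapping(["id", "name", "price"])

# The name mapping is stored as a JSON string under the table property key.
table_properties = {"schema.name-mapping.default": json.dumps(mapping)}
```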
   
   I think that regardless of whether this visitor creates a name mapping
(which in turn is used to create an Iceberg schema) or creates an Iceberg
schema directly, it will need the ability to assign new ids incrementally by
position, because we are trying to create a new Iceberg schema from an Arrow
schema that does not have the field_id metadata.
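
   To make "assign a new id by position" concrete, here is a small
standard-library sketch over a toy schema representation (plain dicts, not
Arrow or PyIceberg types). The function assign_fresh_ids, the dict layout,
and the choice to number all fields of a struct before descending into
nested structs are assumptions made for illustration; this is not the
visitor proposed in this issue.

```python
from itertools import count


def assign_fresh_ids(fields, next_id=None):
    """Return a copy of `fields` with a positional "id" on every field.

    `fields` is a list of {"name": str, "type": str or list of fields};
    a list-valued "type" stands for a nested struct in this toy model.
    """
    if next_id is None:
        next_id = count(1)  # shared counter, starting at 1
    # First pass: number the fields of this struct in positional order.
    numbered = [{"id": next(next_id), **field} for field in fields]
    # Second pass: descend into nested structs with the same counter.
    for field in numbered:
        if isinstance(field["type"], list):
            field["type"] = assign_fresh_ids(field["type"], next_id)
    return numbered


# Hypothetical schema without any field ids.
schema = [
    {"name": "id", "type": "long"},
    {"name": "location", "type": [
        {"name": "lat", "type": "double"},
        {"name": "lon", "type": "double"},
    ]},
    {"name": "name", "type": "string"},
]

with_ids = assign_fresh_ids(schema)
```

   With this ordering the top-level columns receive ids 1 to 3 and the
nested lat/lon fields receive 4 and 5; the input schema is left unmodified.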
   
   Imagine we are trying to grab a 100-column Parquet file from a vendor and
create an Iceberg table based on it, and its columns don't have
PARQUET:FIELD_ID metadata. Currently, there's no way to create this Iceberg
table and ingest the data without manually coding and labelling every column
with the Iceberg schema types to create an Iceberg schema.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

