syun64 opened a new issue, #278:
URL: https://github.com/apache/iceberg-python/issues/278

   ### Feature Request / Improvement
   
   I see three ways a user would want to create an Iceberg table:
   1. Completely manual - by specifying the schema, field by field
   2. By inferring the schema from an existing strongly-typed file or pyarrow 
table
   3. By copying the schema of an existing Iceberg table (migration)
   
   The create_table function currently takes a pyiceberg.Schema as input. The 
existing visitors support patterns (1) and (3), but not (2).
   
   This is because the creation of a pyiceberg.Schema is only supported in the 
following two ways:
   1. From a pyarrow schema with valid field-id metadata
   2. Using a NameMapping that has field-ids. Currently, the only way to 
create a NameMapping is to construct it field-id by field-id, or to use a 
utility function on an existing Iceberg Schema.
   
   Therefore, we need to update an existing Visitor, or create a new Visitor in 
order to support the generation of a pyiceberg.Schema from a pyarrow Schema 
with no IDs.
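   
   To make the requirement concrete, here is a minimal sketch of what such a 
conversion has to do. It does not use the pyarrow or pyiceberg APIs: the Field 
class and the assign_fresh_ids function are hypothetical stand-ins, and the 
depth-first numbering is just one possible ordering, chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; not a pyarrow or pyiceberg class."""

    name: str
    type_name: str
    children: List["Field"] = field(default_factory=list)
    field_id: Optional[int] = None  # None models "no field-id metadata"


def assign_fresh_ids(
    fields: List[Field], counter: Optional[Iterator[int]] = None
) -> List[Field]:
    """Return a copy of `fields` with IDs assigned depth-first, in order of appearance."""
    counter = counter if counter is not None else count(1)
    result = []
    for f in fields:
        new_id = next(counter)  # the parent gets its ID before its children
        children = assign_fresh_ids(f.children, counter)
        result.append(Field(f.name, f.type_name, children, new_id))
    return result


# A schema with nested fields and no IDs at all.
schema_without_ids = [
    Field("id", "long"),
    Field("location", "struct", [Field("lat", "double"), Field("lon", "double")]),
    Field("name", "string"),
]

schema_with_ids = assign_fresh_ids(schema_without_ids)
```

   The input is left untouched and every field in the result, including nested 
ones, ends up with a unique ID. A real visitor would also have to handle list 
and map types.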
   
   On https://github.com/apache/iceberg-python/pull/219 the following 
approaches have been discussed so far:
   1. Update _ConvertToIceberg to create a pyiceberg.Schema from a pyarrow 
schema by assigning "-1" field_ids, and then use _SetFreshIDs to assign ordered 
fresh IDs. This idea unfortunately does not work, as _SetFreshIDs requires 
distinct IDs to track each column and assign new ones.
   2. Create a new Visitor _CreateMappingFromPyArrowSchema that creates a name 
mapping from a PyArrow schema and assigns a fresh ID to any field that does not 
have one. This differs from the existing _CreateMapping visitor, which is a 
pyiceberg Schema visitor.
   3. Use a separate visitor _ConvertToIcebergWithFreshIds which assigns fresh 
IDs based on the order of the fields' appearance in the pyarrow schema.
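   
   The problem with placeholder IDs in approach (1) can be illustrated with a 
small standalone sketch. It assumes, purely for illustration, that fresh IDs are 
tracked in an old-ID to new-ID lookup; the names below are hypothetical and this 
is not the actual _SetFreshIDs code.

```python
from itertools import count

# Hypothetical flat schema as (placeholder_id, name) pairs; every field uses -1.
fields_with_placeholder_ids = [(-1, "id"), (-1, "name"), (-1, "created_at")]

# An old-ID -> new-ID lookup, keyed on the ID each field had before reassignment.
counter = count(1)
old_to_new = {}
for old_id, _name in fields_with_placeholder_ids:
    old_to_new[old_id] = next(counter)  # each write overwrites the single -1 key

# Every field resolves to the same "fresh" ID, so columns can no longer be told apart.
resolved = [(old_to_new[old_id], name) for old_id, name in fields_with_placeholder_ids]

# Keying on position (order of appearance) instead keeps the fields distinct.
by_position = [
    (new_id, name)
    for new_id, (_old_id, name) in enumerate(fields_with_placeholder_ids, start=1)
]
```

   With a shared placeholder, the lookup ends up with a single entry, whereas 
numbering by position, in the spirit of approach (3), gives each field its own 
ID.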


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

