syun64 opened a new issue, #278: URL: https://github.com/apache/iceberg-python/issues/278
### Feature Request / Improvement

I see three ways a user would want to create an Iceberg table:

1. Completely manually, by specifying the schema field by field
2. By inferring the schema from an existing strongly typed file or pyarrow table
3. By copying the schema of an existing Iceberg table (migration)

The `create_table` function currently takes a `pyiceberg.Schema` as input. The existing visitors support patterns (1) and (3), but not (2). This is because a `pyiceberg.Schema` can currently be created in only two ways:

1. From a pyarrow schema with valid field-id metadata
2. Using a `NameMapping` that has field IDs

Currently, the only way to create a `NameMapping` is to construct it field ID by field ID, or to use a utility function on an existing Iceberg `Schema`. Therefore, we need to update an existing visitor, or create a new one, to support generating a `pyiceberg.Schema` from a pyarrow schema that has no IDs.

On https://github.com/apache/iceberg-python/pull/219 the following approaches have been discussed so far:

1. Update `_ConvertToIceberg` to create a `pyiceberg.Schema` from a pyarrow schema by assigning `-1` as every field ID, then use `_SetFreshIDs` to assign ordered fresh IDs. Unfortunately, this does not work, because `_SetFreshIDs` relies on distinct IDs to track each column and assign new ones.
2. Create a new visitor, `_CreateMappingFromPyArrowSchema`, that creates a name mapping from a pyarrow schema and assigns fresh IDs to fields that lack one. This differs from the existing `_CreateMapping` visitor, which is a pyiceberg `Schema` visitor.
3. Use a separate visitor, `_ConvertToIcebergWithFreshIds`, that assigns fresh IDs based on the order in which the fields appear in the pyarrow schema.
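The following sketch shows why approach 1 breaks down. It is a simplified stand-in and not pyiceberg's actual `_SetFreshIDs` code. The hypothetical `remap_ids` helper behaves like an ID-keyed visitor: it records a map from each old ID to its new ID. If every field carries the placeholder `-1`, all columns share a single key, and the map can no longer tell them apart.

```python
from typing import Dict, List, Tuple

# Hypothetical flat list of (name, field_id) pairs, as a converter might
# produce if every field lacking field-id metadata got the placeholder -1.
Fields = List[Tuple[str, int]]


def remap_ids(fields: Fields) -> Dict[int, int]:
    """Assign fresh sequential IDs and record old_id -> new_id.

    Simplified illustration of a visitor that tracks columns by their
    existing ID; not pyiceberg's real implementation.
    """
    old_to_new: Dict[int, int] = {}
    next_id = 1
    for _name, old_id in fields:
        # Each column is keyed by its current ID, so duplicate IDs overwrite.
        old_to_new[old_id] = next_id
        next_id += 1
    return old_to_new


with_ids = [("id", 10), ("name", 11), ("ts", 12)]
placeholders = [("id", -1), ("name", -1), ("ts", -1)]

print(remap_ids(with_ids))      # {10: 1, 11: 2, 12: 3}
print(remap_ids(placeholders))  # {-1: 3}: three columns collapse into one key
```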
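Approaches 2 and 3 both depend on numbering fields in a fixed, deterministic order. Here is a minimal sketch under stated assumptions:

- A hypothetical `Field` dataclass stands in for a pyarrow field that has no field-id metadata.
- A hypothetical `assign_fresh_ids` helper returns a map from name path to ID. This is roughly the information a name mapping carries.
- The sibling-first ordering is an assumption: every field of a struct is numbered before the helper descends into nested structs. A depth-first order would also work, as long as it is applied consistently.
- Lists and maps are omitted. In Iceberg these also need element, key, and value IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Field:
    """Hypothetical minimal field: a name plus optional nested struct children."""
    name: str
    children: List["Field"] = field(default_factory=list)


def assign_fresh_ids(fields: List[Field], start: int = 1) -> Dict[Tuple[str, ...], int]:
    """Map each field's name path to a fresh ID in order of appearance.

    Assumed ordering: all fields at one struct level are numbered before
    any of their nested children are visited.
    """
    mapping: Dict[Tuple[str, ...], int] = {}
    counter = start

    def visit(level: List[Field], prefix: Tuple[str, ...]) -> None:
        nonlocal counter
        # First pass: number every field at this level.
        for f in level:
            mapping[prefix + (f.name,)] = counter
            counter += 1
        # Second pass: descend into nested structs.
        for f in level:
            if f.children:
                visit(f.children, prefix + (f.name,))

    visit(fields, ())
    return mapping


schema = [Field("id"), Field("location", [Field("lat"), Field("lon")]), Field("name")]
ids = assign_fresh_ids(schema)
# id -> 1, location -> 2, name -> 3, location.lat -> 4, location.lon -> 5
```

Because the IDs depend only on field order, converting the same pyarrow schema twice yields the same IDs. Any visitor chosen from the options above would need that property.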