Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order, and support type promotion on write [iceberg-python]

via GitHub Mon, 15 Jul 2024 08:32:47 -0700


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1678021509



##########
pyiceberg/io/pyarrow.py:
##########
@@ -2214,13 +2199,16 @@ def _dataframe_to_data_files(
         property_name=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES,
         default=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT,
     )
+    name_mapping = table_metadata.schema().name_mapping

Review Comment:
   Good question - I'm not sure actually. 
   
   When we are writing a dataframe into an Iceberg table, I think we are making 
the assumption that its names match the current names of the Iceberg table, so 
I think using the `table_metadata.schema().name_mapping` is appropriate in 
supporting that.
   
   `table_metadata.name_mapping` is used to map the field names of written data 
files hat do not have field IDs to the Iceberg SChema, so I don't think we need 
to extend our write APIs to also support mapping other names in the name 
mapping. Since this will be a new feature, I'm of the opinion that we should 
leave it out until we can think of valid use cases for that from our user base.
   
   I'm curious to hear what others' thoughts are, and whether anyone has a 
workflow in mind that would benefit from this change!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order, and support type promotion on write [iceberg-python]

Reply via email to