syun64 commented on issue #281: URL: https://github.com/apache/iceberg-python/issues/281#issuecomment-1900741649
Thank you for the great points, @Fokko and @nicor88. As @nicor88 mentioned, I think RTAS will differ slightly from overwrite in that the schema, partitioning scheme, sort order, or any of the table properties can also be updated atomically with this operation. In short, the function needs to support updating any of the arguments currently supported by the [create_table](https://github.com/apache/iceberg-python/blob/94d7821cbc6b31b791e18d4f91c0991684616076/pyiceberg/catalog/__init__.py#L286) function, in addition to overwriting the Iceberg table data with the input pyarrow table.

I'm wondering if it would be better to have a separate function that achieves these goals in a single transaction?

```
class Table:
    ...

    def replace(
        self,
        schema: Schema,
        df: pa.Table,
        location: Optional[str] = None,
        partition_spec: PartitionSpec = UNPARTITIONED_PARTITION_SPEC,
        sort_order: SortOrder = UNSORTED_SORT_ORDER,
        properties: Properties = EMPTY_DICT,
    ) -> None:
        # update table properties, partition spec, sort_order and schema
        # overwrite all data in the table with new data from df
        # commit transaction in single metadata update
        ...
```
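
To make the "single metadata update" idea concrete, here is a toy sketch of the semantics I have in mind. It is plain Python and deliberately does not use pyiceberg: `ToyMetadata`, `ToyTable` and its `replace` method are hypothetical names invented for illustration, and columns are modelled as bare strings rather than real Iceberg types. It only shows the pattern: stage the new schema, partition spec, sort order, properties and data together, validate them, and then publish everything with one swap.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToyMetadata:
    """Hypothetical immutable snapshot of everything one commit covers."""

    schema: Tuple[str, ...]
    partition_spec: Tuple[str, ...] = ()
    sort_order: Tuple[str, ...] = ()
    properties: Dict[str, str] = field(default_factory=dict)
    rows: Tuple[Dict[str, Any], ...] = ()


class ToyTable:
    """Hypothetical stand-in for a table; not part of pyiceberg."""

    def __init__(self, metadata: ToyMetadata) -> None:
        self.metadata = metadata
        self.commits = 0

    def replace(
        self,
        schema: Sequence[str],
        rows: Sequence[Dict[str, Any]],
        partition_spec: Sequence[str] = (),
        sort_order: Sequence[str] = (),
        properties: Optional[Dict[str, str]] = None,
    ) -> None:
        # Stage 1: validate everything before touching the current metadata.
        for name in (*partition_spec, *sort_order):
            if name not in schema:
                raise ValueError(f"unknown column: {name}")
        for row in rows:
            if set(row) != set(schema):
                raise ValueError(f"row does not match schema: {row}")

        # Stage 2: build the complete new state off to the side.
        staged = ToyMetadata(
            schema=tuple(schema),
            partition_spec=tuple(partition_spec),
            sort_order=tuple(sort_order),
            properties=dict(properties or {}),
            rows=tuple(dict(r) for r in rows),
        )

        # Stage 3: publish all changes with a single reference swap.
        self.metadata = staged
        self.commits += 1


# Usage: schema, partitioning, properties and data change in one commit.
table = ToyTable(ToyMetadata(schema=("id",), rows=({"id": 1},)))
table.replace(
    schema=("id", "city"),
    rows=[{"id": 2, "city": "Oslo"}],
    partition_spec=("city",),
    properties={"owner": "demo"},
)
```

The point of the sketch is that a failed validation leaves the previous metadata untouched, so readers never observe a table whose schema has changed but whose data has not. How this would map onto the actual transaction and commit machinery is an open question for the real implementation.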