syun64 commented on PR #498: URL: https://github.com/apache/iceberg-python/pull/498#issuecomment-2033344103
> Shall we move "append", "overwrite", and "add_files" to `Transaction` class? This change would enable us to seamlessly chain these operations with other table updates in a single commit. This adjustment could be particularly beneficial in the context of `CreateTableTransaction`, as it would enable users to not only create a table but also populate it with initial data in one go.

I think this is a great question. I think we have two options here:

1. We move these actions into the Transaction class, and remove them from the Table class
2. We move them into the Transaction class, and also keep an implementation in the Table class

I'm not sure which of the two is better, but I keep asking myself whether there's a 'good' reason why we have two separate APIs that achieve similar results. For example, **update_spec** and **update_schema** can each be created from either the **Transaction** or the **Table**, and I feel like we might be creating work for ourselves by duplicating the feature in both classes.

What if we consolidated all of our actions into the Transaction class, and removed them from the Table class?

I think the upside would be that the API conveys a very clear message to the developer: a _transaction is committed to a table_, and a series of _actions_ can be chained onto the _same transaction_ as a single commit. In addition, we can avoid [issues like this](https://github.com/apache/iceberg-python/pull/508), where we roll out a feature to one API implementation but not the other.

```
with given_table.update_schema() as tx:
    tx.add_column(path="new_column1", field_type=IntegerType())
```

```
with given_table.transaction() as tx:
    with tx.update_schema() as update:
        update.add_column(path="new_column1", field_type=IntegerType())
```

To me, the bottom pattern feels more explicit than the top one, and I'm curious to hear others' opinions on this topic.
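To make the two options concrete, here is a toy sketch of the design, not PyIceberg's actual classes. `ToyTable` and `ToyTransaction` are hypothetical names invented for illustration. Actions are queued on the transaction and applied in one commit, and the table-level `append` (option 2) is only a thin wrapper that opens a single-action transaction, so the logic lives in one place.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; not PyIceberg's Table."""
    columns: List[str] = field(default_factory=list)
    rows: List[dict] = field(default_factory=list)
    commit_count: int = 0

    def transaction(self) -> "ToyTransaction":
        return ToyTransaction(self)

    def append(self, rows: List[dict]) -> None:
        # Option 2: a convenience method that delegates to the transaction,
        # so there is no second implementation to keep in sync.
        with self.transaction() as tx:
            tx.append(rows)


class ToyTransaction:
    """Hypothetical transaction: queues actions, applies them in one commit."""

    def __init__(self, table: ToyTable) -> None:
        self._table = table
        self._pending: List[Callable[[ToyTable], None]] = []

    def add_column(self, name: str) -> "ToyTransaction":
        self._pending.append(lambda t: t.columns.append(name))
        return self

    def append(self, rows: List[dict]) -> "ToyTransaction":
        staged = list(rows)  # copy so later caller mutations are not picked up
        self._pending.append(lambda t: t.rows.extend(staged))
        return self

    def commit(self) -> None:
        # Apply every queued action, then count it as a single commit.
        for action in self._pending:
            action(self._table)
        self._pending.clear()
        self._table.commit_count += 1

    def __enter__(self) -> "ToyTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Commit only on a clean exit; an exception discards queued actions.
        if exc_type is None:
            self.commit()
        return False


table = ToyTable()

# Schema change and data write chained onto the same transaction: one commit.
with table.transaction() as tx:
    tx.add_column("new_column1")
    tx.append([{"new_column1": 1}])

# Table-level shortcut: a second commit containing a single action.
table.append([{"new_column1": 2}])
```

With option 1 the `ToyTable.append` wrapper would simply be removed, and every write would go through `table.transaction()`.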