HonahX commented on PR #1246:
URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439772946

   
   @stevie9868 @kevinjqliu Thanks for the great PR and discussions! I agree 
that there is some issue with the current Transaction mechanism: the 
`commit_transaction` can be incorrectly called when we should just abandon 
everything
   
   The following pattern has been the most common practice of updating tables 
in pyiceberg since the beginning
   ```python
   with tbl.transaction() as txn:
               txn.overwrite(...)
               ....
   ```
   The "with" statement will ensure that the context manager---Transaction 
object's `__exit__()`(`commit_transaction`[) will always be called (even there 
is an exception)](https://docs.python.org/3/reference/compound_stmts.html#with) 
as long as the transaction object is successfully initialized. However, we 
should only call `commit_transaction` when there is no exception along the way. 
 
   
   A simpler example would be:
   
   ```python
   pa_table_with_column = pa.Table.from_pydict(
           {
               "foo": ["a", None, "z"],
               "bar": [19, None, 25],
           },
           schema=pa.schema([
               pa.field("foo", pa.large_string(), nullable=True),
               pa.field("bar", pa.int32(), nullable=True),
           ]),
       )
   
   tbl = catalog.create_table(identifier=identifier, 
schema=pa_table_with_column.schema)
   
       with pytest.raises(ValueError):
           with tbl.transaction() as txn:
               txn.append(pa_table_with_column)
               raise ValueError
               txn.append(pa_table_with_column)
   
       assert len(tbl.scan().to_pandas()) == 0
   ```
   Since I explicitly raise an error during the transaction, the whole 
transaction should be abandoned. But this code block still insert 3 rows (first 
append) to the table.
   
   Please let me know if these make sense. Would love to hear your thoughts on 
this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to