ForeverAngry commented on issue #2409:
URL:
https://github.com/apache/iceberg-python/issues/2409#issuecomment-3251396079
@QlikFrederic thanks for reporting this! I'm sorry you ran into this bug :/
I believe I reproduced it and found the cause.
I think the issue is here:
```python
class ExpireSnapshots(UpdateTableMetadata["ExpireSnapshots"]):
    _snapshot_ids_to_expire: Set[int] = set()  # ❌ SHARED ACROSS ALL INSTANCES!
    _updates: Tuple[TableUpdate, ...] = ()
    _requirements: Tuple[TableRequirement, ...] = ()
```
Here `_snapshot_ids_to_expire` is a **class-level attribute**, not an
instance-level one, so when Thread 1 does `table1.expire_snapshots().by_id(1001)`
and Thread 2 does `table2.expire_snapshots().by_id(2001)`, they're both adding
to the **same shared set**.
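To show the mechanism in isolation, here's a minimal sketch. `SharedExpire` is a hypothetical stand-in I made up for illustration, not the actual PyIceberg class; it only mimics the class-level `set()` default and a `by_id` method that mutates it.

```python
from typing import Set


class SharedExpire:
    """Hypothetical stand-in for illustration only; not the PyIceberg class."""

    # Mutable default defined on the class: one set object shared by every instance.
    _snapshot_ids_to_expire: Set[int] = set()

    def by_id(self, snapshot_id: int) -> "SharedExpire":
        # Attribute lookup falls through to the class, so .add() mutates the shared set.
        self._snapshot_ids_to_expire.add(snapshot_id)
        return self


table1_expire = SharedExpire().by_id(1001)
table2_expire = SharedExpire().by_id(2001)

# Both instances see both IDs because they hold the very same set object.
shared_ids = table1_expire._snapshot_ids_to_expire
same_object = (
    table1_expire._snapshot_ids_to_expire is table2_expire._snapshot_ids_to_expire
)
```

Note that this doesn't even need threads to go wrong; concurrency just makes the mix-up much more likely to show up in practice.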
The fix seems trivial, I think... 🤞 I moved those attributes to the
`__init__` method:
```python
def __init__(self, transaction: Transaction) -> None:
    super().__init__(transaction)
    # ✅ Instance-level now - each table gets its own!
    self._snapshot_ids_to_expire: Set[int] = set()
    self._updates: Tuple[TableUpdate, ...] = ()
    self._requirements: Tuple[TableRequirement, ...] = ()
```
I wrote tests to reproduce the bug (literally got the exact same error
message as in your issue), applied the fix, and thread safety seemed to be
restored! No more snapshot ID mix-ups between tables.
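For reference, the kind of check I mean can be sketched roughly like this. This is not the actual test from my branch; `IsolatedExpire`, `expire`, and `results` are hypothetical names, and the class only mimics the instance-level initialization in `__init__`.

```python
import threading
from typing import Dict, Set


class IsolatedExpire:
    """Hypothetical stand-in with per-instance state; not the PyIceberg class."""

    def __init__(self) -> None:
        # Created per instance, so nothing is shared between tables.
        self._snapshot_ids_to_expire: Set[int] = set()

    def by_id(self, snapshot_id: int) -> "IsolatedExpire":
        self._snapshot_ids_to_expire.add(snapshot_id)
        return self


results: Dict[str, Set[int]] = {}


def expire(table_name: str, snapshot_id: int) -> None:
    # Each thread builds its own instance and records the IDs it collected.
    results[table_name] = IsolatedExpire().by_id(snapshot_id)._snapshot_ids_to_expire


threads = [
    threading.Thread(target=expire, args=("table1", 1001)),
    threading.Thread(target=expire, args=("table2", 2001)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With instance-level state, each table should end up with only its own snapshot ID, regardless of how the threads interleave.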
That being said, if I'm right, the same issue exists in the
`ManageSnapshots` class as well.
I'm traveling at the moment; once I'm home, I can push a branch for you to
test (sometime in the next 24-48 hours). Let me know what you think!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]