kevinjqliu commented on issue #2409:
URL:
https://github.com/apache/iceberg-python/issues/2409#issuecomment-3254588993
Thanks everyone! This is a great find! @ForeverAngry looking forward to the
PR!
> That being said, if I'm right, the same issue exists in the
ManageSnapshots class as well.
i think you're right. `ManageSnapshots` also declares a mutate state as
class attributes
I generated a script which scans the entire codebase for class with mutable
states
```
import ast
from pathlib import Path
# Types we consider "mutable" at class scope
MUTABLE_TYPES = (ast.List, ast.Dict, ast.Set)
def is_mutable(node: ast.AST) -> bool:
"""Return True if the node represents a mutable literal."""
if isinstance(node, MUTABLE_TYPES):
return True
if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
# e.g. list(), dict(), set(), defaultdict()
return node.func.id in {"list", "dict", "set", "defaultdict"}
return False
def find_mutable_class_attributes(file_path):
with open(file_path, "r", encoding="utf-8") as f:
source = f.read()
try:
tree = ast.parse(source, filename=file_path)
except SyntaxError:
return []
results = []
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
class_name = node.name
mutable_attrs = []
for body_item in node.body:
if isinstance(body_item, ast.Assign):
if is_mutable(body_item.value):
for target in body_item.targets:
if isinstance(target, ast.Name):
mutable_attrs.append(target.id)
elif isinstance(body_item, ast.AnnAssign):
if is_mutable(body_item.value) and
isinstance(body_item.target, ast.Name):
mutable_attrs.append(body_item.target.id)
if mutable_attrs:
results.append((class_name, mutable_attrs))
return results
def scan_directory(directory="."):
for path in Path(directory).rglob("*.py"):
# Skip unwanted directories
if any(part in ("tests", "vendor") for part in path.parts):
continue
class_attrs = find_mutable_class_attributes(path)
for class_name, attributes in class_attrs:
print(f"{path}: class {class_name} → {attributes}")
if __name__ == "__main__":
scan_directory(".")
```
Here are the results,
```
➜ iceberg-python git:(main) ✗ poetry run python scan_class_attr.py
pyiceberg/catalog/sql.py: class IcebergNamespaceProperties →
['NAMESPACE_MINIMAL_PROPERTIES']
pyiceberg/utils/singleton.py: class Singleton → ['_instances']
pyiceberg/catalog/rest/auth.py: class AuthManagerFactory → ['_registry']
pyiceberg/table/update/spec.py: class UpdateSpec → ['_name_to_field',
'_name_to_added_field', '_transform_to_field', '_transform_to_added_field',
'_renames', '_added_time_fields']
pyiceberg/table/update/snapshot.py: class ExpireSnapshots →
['_snapshot_ids_to_expire']
pyiceberg/table/update/schema.py: class UpdateSchema → ['_adds', '_updates',
'_deletes', '_moves', '_added_name_to_id', '_id_to_parent']
```
Some of these are false positives. But it looks like `UpdateSpec`,
`ExpireSnapshots`, and `UpdateSchema` all have the same problem
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]