kevinjqliu commented on issue #2409:
URL: 
https://github.com/apache/iceberg-python/issues/2409#issuecomment-3254588993

   Thanks everyone! This is a great find! @ForeverAngry looking forward to the 
PR! 
   
   > That being said, if I'm right, the same issue exists in the 
ManageSnapshots class as well.
   
   i think you're right. `ManageSnapshots` also declares a mutate state as 
class attributes
   
   
   I generated a script which scans the entire codebase for class with mutable 
states 
   
   ```
   import ast
   from pathlib import Path
   
   # Types we consider "mutable" at class scope
   MUTABLE_TYPES = (ast.List, ast.Dict, ast.Set)
   
   def is_mutable(node: ast.AST) -> bool:
       """Return True if the node represents a mutable literal."""
       if isinstance(node, MUTABLE_TYPES):
           return True
       if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
           # e.g. list(), dict(), set(), defaultdict()
           return node.func.id in {"list", "dict", "set", "defaultdict"}
       return False
   
   def find_mutable_class_attributes(file_path):
       with open(file_path, "r", encoding="utf-8") as f:
           source = f.read()
   
       try:
           tree = ast.parse(source, filename=file_path)
       except SyntaxError:
           return []
   
       results = []
   
       for node in ast.walk(tree):
           if isinstance(node, ast.ClassDef):
               class_name = node.name
               mutable_attrs = []
   
               for body_item in node.body:
                   if isinstance(body_item, ast.Assign):
                       if is_mutable(body_item.value):
                           for target in body_item.targets:
                               if isinstance(target, ast.Name):
                                   mutable_attrs.append(target.id)
   
                   elif isinstance(body_item, ast.AnnAssign):
                       if is_mutable(body_item.value) and 
isinstance(body_item.target, ast.Name):
                           mutable_attrs.append(body_item.target.id)
   
               if mutable_attrs:
                   results.append((class_name, mutable_attrs))
   
       return results
   
   def scan_directory(directory="."):
       for path in Path(directory).rglob("*.py"):
           # Skip unwanted directories
           if any(part in ("tests", "vendor") for part in path.parts):
               continue
   
           class_attrs = find_mutable_class_attributes(path)
           for class_name, attributes in class_attrs:
               print(f"{path}: class {class_name} → {attributes}")
   
   if __name__ == "__main__":
       scan_directory(".")
   ```
   
   Here are the results, 
   ```
   ➜  iceberg-python git:(main) ✗ poetry run python scan_class_attr.py
   pyiceberg/catalog/sql.py: class IcebergNamespaceProperties → 
['NAMESPACE_MINIMAL_PROPERTIES']
   pyiceberg/utils/singleton.py: class Singleton → ['_instances']
   pyiceberg/catalog/rest/auth.py: class AuthManagerFactory → ['_registry']
   pyiceberg/table/update/spec.py: class UpdateSpec → ['_name_to_field', 
'_name_to_added_field', '_transform_to_field', '_transform_to_added_field', 
'_renames', '_added_time_fields']
   pyiceberg/table/update/snapshot.py: class ExpireSnapshots → 
['_snapshot_ids_to_expire']
   pyiceberg/table/update/schema.py: class UpdateSchema → ['_adds', '_updates', 
'_deletes', '_moves', '_added_name_to_id', '_id_to_parent']
   ```
   
   Some of these are false positives. But it looks like `UpdateSpec`, 
`ExpireSnapshots`, and `UpdateSchema` all have the same problem


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to