srilman opened a new issue, #1778: URL: https://github.com/apache/iceberg-python/issues/1778
### Apache Iceberg version

0.9.0 (latest release)

### Please describe the bug 🐞

When attempting to apply a filter, such as null / not-null, to a top-level struct column, an error occurs. For example:

```py
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StructType, IntegerType, StringType
import pyiceberg.expressions as pe
import pyarrow as pa

catalog = SqlCatalog("sql_catalog", uri="sqlite:///:memory:")
catalog.create_namespace("ns")
schema = Schema(
    NestedField(1, "structs", StructType(
        NestedField(2, "id", IntegerType(), required=True),
        NestedField(3, "name", StringType(), required=True),
    )),
)
table = catalog.create_table("ns.struct_table", schema, "/tmp/wh/ns/struct_table")

df = pa.Table.from_pydict({
    "structs": [
        {"id": 1, "name": "a"},
        {"id": 2, "name": "b"},
        {"id": 3, "name": "c"},
    ]
}, schema=schema.as_arrow())
table.append(df)

print(list(table.scan(row_filter=pe.NotNull("structs")).plan_files()))
```

```
Traceback (most recent call last):
  File "/Users/slade/bodo/mono/develop/test.py", line 27, in <module>
    print(list(table.scan(row_filter=pe.NotNull("structs")).plan_files()))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1697, in plan_files
    if manifest_evaluators[manifest_file.partition_spec_id](manifest_file)
       ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...
  File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/expressions/__init__.py", line 201, in bind
    accessor = schema.accessor_for_field(field.field_id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/schema.py", line 280, in accessor_for_field
    raise ValueError(f"Could not find accessor for field with id: {field_id}")
ValueError: Could not find accessor for field with id: 1
```

It looks like the cause is an intentional feature of the field-to-accessor map for schemas. See the docstring of class `_BuildPositionAccessors`, where the struct field with ID 3 has no entry of its own in the expected map:

```py
class _BuildPositionAccessors(SchemaVisitor[Dict[Position, Accessor]]):
    """A schema visitor for generating a field ID to accessor index.

    Example:
        >>> from pyiceberg.schema import Schema
        >>> from pyiceberg.types import *
        >>> schema = Schema(
        ...     NestedField(field_id=2, name="id", field_type=IntegerType(), required=False),
        ...     NestedField(field_id=1, name="data", field_type=StringType(), required=True),
        ...     NestedField(
        ...         field_id=3,
        ...         name="location",
        ...         field_type=StructType(
        ...             NestedField(field_id=5, name="latitude", field_type=FloatType(), required=False),
        ...             NestedField(field_id=6, name="longitude", field_type=FloatType(), required=False),
        ...         ),
        ...         required=True,
        ...     ),
        ...     schema_id=1,
        ...     identifier_field_ids=[1],
        ... )
        >>> result = build_position_accessors(schema)
        >>> expected = {
        ...     2: Accessor(position=0, inner=None),
        ...     1: Accessor(position=1, inner=None),
        ...     5: Accessor(position=2, inner=Accessor(position=0, inner=None)),
        ...     6: Accessor(position=2, inner=Accessor(position=1, inner=None))
        ... }
        >>> result == expected
        True
    """
```

But I'm not exactly sure why. Looking at all uses, I don't see a reason why the `id_to_accessor` map shouldn't include top-level structs. Is there a reason why, or is this just a bug? If it's just a bug, I think this is a 1-2 line fix in `_BuildPositionAccessors`.
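To make the question concrete, here is a toy model of the accessor map. It is a sketch under assumptions, not pyiceberg's actual visitor: `Field`, `Accessor`, `build_accessors`, and the `include_structs` flag are hypothetical stand-ins written only for this illustration. It reproduces the map from the docstring above and shows what the map would look like if struct fields also got their own entry.

```py
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Accessor:
    """Toy stand-in: a position within a row, optionally nested."""
    position: int
    inner: Optional["Accessor"] = None


@dataclass(frozen=True)
class Field:
    """Toy stand-in for a schema field; non-empty children means a struct."""
    field_id: int
    name: str
    children: Tuple["Field", ...] = ()


def build_accessors(fields: Sequence[Field], include_structs: bool) -> Dict[int, Accessor]:
    """Map field IDs to accessors (hypothetical helper, not the pyiceberg API)."""
    result: Dict[int, Accessor] = {}
    for position, field in enumerate(fields):
        if field.children:
            if include_structs:
                # The behaviour the issue asks about: the struct itself is addressable.
                result[field.field_id] = Accessor(position)
            # Nested fields are reached through the struct's position.
            nested = build_accessors(field.children, include_structs)
            for child_id, child_accessor in nested.items():
                result[child_id] = Accessor(position, inner=child_accessor)
        else:
            result[field.field_id] = Accessor(position)
    return result


# Same shape as the schema in the docstring example.
fields = (
    Field(2, "id"),
    Field(1, "data"),
    Field(3, "location", (Field(5, "latitude"), Field(6, "longitude"))),
)

documented = build_accessors(fields, include_structs=False)
with_structs = build_accessors(fields, include_structs=True)

print(3 in documented)    # the struct has no accessor in the documented map
print(with_structs[3])    # the struct gets a plain positional accessor
```

In this model the only difference between the two maps is the extra entry for field 3; the accessors of the nested fields 5 and 6 are unchanged. Whether the real visitor can be changed this simply is exactly the open question.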
### Willingness to contribute

- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time