srilman opened a new issue, #1778:
URL: https://github.com/apache/iceberg-python/issues/1778

   ### Apache Iceberg version
   
   0.9.0 (latest release)
   
   ### Please describe the bug 🐞
   
   When attempting to apply a filter such as null / not-null to a top-level struct column, an error occurs. For example:
   
   ```py
   from pyiceberg.catalog.sql import SqlCatalog
   from pyiceberg.schema import Schema
   from pyiceberg.types import NestedField, StructType, IntegerType, StringType
   import pyiceberg.expressions as pe
   import pyarrow as pa
   
   catalog = SqlCatalog("sql_catalog", uri="sqlite:///:memory:")
   catalog.create_namespace("ns")
   
   schema = Schema(
       NestedField(1, "structs", StructType(
           NestedField(2, "id", IntegerType(), required=True),
           NestedField(3, "name", StringType(), required=True),
       )),
   )
   table = catalog.create_table("ns.struct_table", schema, "/tmp/wh/ns/struct_table")
   
   df = pa.Table.from_pydict({
       "structs": [
           {"id": 1, "name": "a"},
           {"id": 2, "name": "b"},
           {"id": 3, "name": "c"},
       ]
   }, schema=schema.as_arrow())
   table.append(df)
   
   print(list(table.scan(row_filter=pe.NotNull("structs")).plan_files()))
   ```
   ```
   Traceback (most recent call last):
     File "/Users/slade/bodo/mono/develop/test.py", line 27, in <module>
       print(list(table.scan(row_filter=pe.NotNull("structs")).plan_files()))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1697, in plan_files
       if manifest_evaluators[manifest_file.partition_spec_id](manifest_file)
          ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ...
     File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/expressions/__init__.py", line 201, in bind
       accessor = schema.accessor_for_field(field.field_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/schema.py", line 280, in accessor_for_field
       raise ValueError(f"Could not find accessor for field with id: {field_id}")
   ValueError: Could not find accessor for field with id: 1
   ```
   
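   It looks like the cause is an intentional feature of the field-ID-to-accessor map for schemas: struct fields themselves get no accessor, only their children do. See the docstring of class `_BuildPositionAccessors`, where the expected map has no entry for the `location` struct (field ID 3):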
   ```py
   class _BuildPositionAccessors(SchemaVisitor[Dict[Position, Accessor]]):
       """A schema visitor for generating a field ID to accessor index.
   
       Example:
           >>> from pyiceberg.schema import Schema
           >>> from pyiceberg.types import *
           >>> schema = Schema(
           ...     NestedField(field_id=2, name="id", field_type=IntegerType(), required=False),
           ...     NestedField(field_id=1, name="data", field_type=StringType(), required=True),
           ...     NestedField(
           ...         field_id=3,
           ...         name="location",
           ...         field_type=StructType(
           ...             NestedField(field_id=5, name="latitude", field_type=FloatType(), required=False),
           ...             NestedField(field_id=6, name="longitude", field_type=FloatType(), required=False),
           ...         ),
           ...         required=True,
           ...     ),
           ...     schema_id=1,
           ...     identifier_field_ids=[1],
           ... )
           >>> result = build_position_accessors(schema)
           >>> expected = {
           ...     2: Accessor(position=0, inner=None),
           ...     1: Accessor(position=1, inner=None),
           ...     5: Accessor(position=2, inner=Accessor(position=0, inner=None)),
           ...     6: Accessor(position=2, inner=Accessor(position=1, inner=None))
           ... }
           >>> result == expected
           True
       """
   ```
   But I'm not exactly sure why. Looking at all of its uses, I don't see a reason why the `id_to_accessor` map shouldn't include top-level structs. Is this intentional, or is it just a bug? If it's just a bug, I think this is a 1-2 line fix in `_BuildPositionAccessors`.
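To illustrate the difference, here is a toy sketch in plain Python that does not use pyiceberg. `Accessor` only mimics the position/inner shape shown in the docstring; `Field`, `build_accessors`, and the `include_structs` flag are hypothetical names made up for this illustration, not pyiceberg's API and not the actual fix. With `include_structs=False` it reproduces the docstring's expected map (no entry for struct field 3); with `include_structs=True` the struct field itself also gets an accessor returning the whole nested value, so a lookup by the struct's field ID would no longer fail in this toy model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Accessor:
    """Toy stand-in (hypothetical) for an accessor: a row position plus an optional inner accessor."""

    position: int
    inner: Optional["Accessor"] = None

    def get(self, row: Any) -> Any:
        value = row[self.position]
        # Descend into the nested struct only when there is an inner accessor and a value.
        if self.inner is None or value is None:
            return value
        return self.inner.get(value)


@dataclass
class Field:
    """Hypothetical simplified schema field; `children` is non-empty only for structs."""

    field_id: int
    name: str
    children: List["Field"] = field(default_factory=list)


def build_accessors(fields: List[Field], include_structs: bool = False) -> Dict[int, Accessor]:
    """Map field IDs to position accessors (hypothetical helper).

    include_structs=False mirrors the docstring's expected output (leaves only);
    include_structs=True also registers an accessor for each struct field itself.
    """
    result: Dict[int, Accessor] = {}
    for pos, f in enumerate(fields):
        if f.children:
            if include_structs:
                # The struct column itself: return the whole nested value at `pos`.
                result[f.field_id] = Accessor(position=pos)
            # Children are reached by stepping into position `pos` first.
            for child_id, inner in build_accessors(f.children, include_structs).items():
                result[child_id] = Accessor(position=pos, inner=inner)
        else:
            result[f.field_id] = Accessor(position=pos)
    return result


# Same shape as the docstring's schema: id (2), data (1), location (3) {latitude (5), longitude (6)}.
schema = [
    Field(2, "id"),
    Field(1, "data"),
    Field(3, "location", [Field(5, "latitude"), Field(6, "longitude")]),
]
row = (7, "x", (1.5, 2.5))

current = build_accessors(schema)
proposed = build_accessors(schema, include_structs=True)
print(3 in current)          # False: no accessor for the struct field
print(proposed[3].get(row))  # (1.5, 2.5)
print(proposed[6].get(row))  # 2.5
```

Leaf accessors are identical in both modes; the only change is the extra entry per struct field.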
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

