sungwy commented on code in PR #1689: URL: https://github.com/apache/iceberg-python/pull/1689#discussion_r1963684910
########## pyiceberg/expressions/parser.py: ########## @@ -90,13 +89,7 @@ @column.set_parse_action def _(result: ParseResults) -> Reference: if len(result.column) > 1: - deprecation_message( - deprecated_in="0.8.0", - removed_in="0.9.0", - help_message="Parsing expressions with table name is deprecated. Only provide field names in the row_filter.", - ) - # TODO: Once this is removed, we will no longer take just the last index of parsed column result - # And introduce support for parsing filter expressions with nested fields. + raise ValueError(f"Cannot parse expressions with table names or nested fields, got: {".".join(result.column)}") return Reference(result.column[-1]) Review Comment: Hello! There's two missing features in PyIceberg right now with expression parsing: - Support for parsing field names with "." delimiters - Support for parsing nested fields I think @Fokko 's suggested change will make it work for both those cases, and I think there will be value in testing the Reference binding for both cases in `tests/expressions/test_expressions.py`. I can think of three test cases: ``` def test_nested_bind() -> None: schema = Schema(NestedField(2, "name", StructType(NestedField(3, "first", StringType()))), schema_id=1) bound = BoundIsNull(BoundReference(schema.find_field(3), schema.accessor_for_field(3))) assert IsNull(Reference("name.first")).bind(schema) == bound def test_bind_dot_name() -> None: schema = Schema(NestedField(2, "name.first", StringType()), schema_id=1) bound = BoundIsNull(BoundReference(schema.find_field(2), schema.accessor_for_field(2))) assert IsNull(Reference("name.first")).bind(schema) == bound def test_bind_ambiguous_name() -> None: with pytest.raises(ValueError) as exc_info: schema = Schema(NestedField(2, "name", StructType(NestedField(3, "first", StringType()))), NestedField(4, "name.first", StringType()), schema_id=1) assert "Invalid schema, multiple fields for name name.first: 3 and 4" in str(exc_info) bound = BoundIsNull(BoundReference(schema.find_field(3), schema.accessor_for_field(3))) assert IsNull(Reference("name.first")).bind(schema) == bound ``` Where the last one should throw an error that's propagated from the Schema having ambiguous names. I had originally thought that this work would be a lot more complicated because we would want to distinguish the nested fields from simple field names with "." delimiters, but I think we can just rely on PyIceberg not supporting ambiguous field names for now as we introduce nested parsing support -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org