sungwy commented on code in PR #1689:
URL: https://github.com/apache/iceberg-python/pull/1689#discussion_r1963684910
##########
pyiceberg/expressions/parser.py:
##########
@@ -90,13 +89,7 @@
@column.set_parse_action
def _(result: ParseResults) -> Reference:
if len(result.column) > 1:
- deprecation_message(
- deprecated_in="0.8.0",
- removed_in="0.9.0",
- help_message="Parsing expressions with table name is deprecated.
Only provide field names in the row_filter.",
- )
- # TODO: Once this is removed, we will no longer take just the last index
of parsed column result
- # And introduce support for parsing filter expressions with nested fields.
+ raise ValueError(f"Cannot parse expressions with table names or nested
fields, got: {".".join(result.column)}")
return Reference(result.column[-1])
Review Comment:
Hello!
There's two missing features in PyIceberg right now with expression parsing:
- Support for parsing field names with "." delimiters
- Support for parsing nested fields
I think @Fokko 's suggested change will make it work for both those cases,
and I think there will be value in testing the Reference binding for both cases
in `tests/expressions/test_expressions.py`.
I can think of three test cases:
```
def test_nested_bind() -> None:
schema = Schema(NestedField(2, "name", StructType(NestedField(3,
"first", StringType()))), schema_id=1)
bound = BoundIsNull(BoundReference(schema.find_field(3),
schema.accessor_for_field(3)))
assert IsNull(Reference("name.first")).bind(schema) == bound
def test_bind_dot_name() -> None:
schema = Schema(NestedField(2, "name.first", StringType()), schema_id=1)
bound = BoundIsNull(BoundReference(schema.find_field(2),
schema.accessor_for_field(2)))
assert IsNull(Reference("name.first")).bind(schema) == bound
def test_bind_ambiguous_name() -> None:
with pytest.raises(ValueError) as exc_info:
schema = Schema(NestedField(2, "name", StructType(NestedField(3,
"first", StringType()))), NestedField(4, "name.first", StringType()),
schema_id=1)
assert "Invalid schema, multiple fields for name name.first: 3 and 4" in
str(exc_info)
bound = BoundIsNull(BoundReference(schema.find_field(3),
schema.accessor_for_field(3)))
assert IsNull(Reference("name.first")).bind(schema) == bound
```
Where the last one should throw an error that's propagated from the Schema
having ambiguous names.
I had originally thought that this work would be a lot more complicated
because we would want to distinguish the nested fields from simple field names
with "." delimiters. But I think we can just rely on PyIceberg not supporting
ambiguous field names for now as we introduce nested parsing support
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]