Re: [PR] Deprecate for 0.9.0 release [iceberg-python]

via GitHub Thu, 20 Feb 2025 06:38:34 -0800


sungwy commented on code in PR #1689:
URL: https://github.com/apache/iceberg-python/pull/1689#discussion_r1963684910



##########
pyiceberg/expressions/parser.py:
##########
@@ -90,13 +89,7 @@
 @column.set_parse_action
 def _(result: ParseResults) -> Reference:
     if len(result.column) > 1:
-        deprecation_message(
-            deprecated_in="0.8.0",
-            removed_in="0.9.0",
-            help_message="Parsing expressions with table name is deprecated. 
Only provide field names in the row_filter.",
-        )
-    # TODO: Once this is removed, we will no longer take just the last index 
of parsed column result
-    # And introduce support for parsing filter expressions with nested fields.
+        raise ValueError(f"Cannot parse expressions with table names or nested 
fields, got: {".".join(result.column)}")
     return Reference(result.column[-1])

Review Comment:
   Hello!
   
   There's two missing features in PyIceberg right now with expression parsing:
   - Support for parsing field names with "." delimiters
   - Support for parsing nested fields
   
   I think @Fokko 's suggested change will make it work for both those cases, 
and I think there will be value in testing the Reference binding for both cases 
in `tests/expressions/test_expressions.py`.
   
   I can think of three test cases:
   
   ```
   def test_nested_bind() -> None:
       schema = Schema(NestedField(2, "name", StructType(NestedField(3, 
"first", StringType()))), schema_id=1)
       bound = BoundIsNull(BoundReference(schema.find_field(3), 
schema.accessor_for_field(3)))
       assert IsNull(Reference("name.first")).bind(schema) == bound
   
   
   def test_bind_dot_name() -> None:
       schema = Schema(NestedField(2, "name.first", StringType()), schema_id=1)
       bound = BoundIsNull(BoundReference(schema.find_field(2), 
schema.accessor_for_field(2)))
       assert IsNull(Reference("name.first")).bind(schema) == bound
   
   
   def test_bind_ambiguous_name() -> None:
       with pytest.raises(ValueError) as exc_info:
           schema = Schema(NestedField(2, "name", StructType(NestedField(3, 
"first", StringType()))), NestedField(4, "name.first", StringType()), 
schema_id=1)
       assert "Invalid schema, multiple fields for name name.first: 3 and 4" in 
str(exc_info)
       bound = BoundIsNull(BoundReference(schema.find_field(3), 
schema.accessor_for_field(3)))
       assert IsNull(Reference("name.first")).bind(schema) == bound
   ```
   
   Where the last one should throw an error that's propagated from the Schema 
having ambiguous names.
   
   I had originally thought that this work would be a lot more complicated 
because we would want to distinguish the nested fields from simple field names 
with "." delimiters. But I think we can just rely on PyIceberg not supporting 
ambiguous field names for now as we introduce nested parsing support



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Deprecate for 0.9.0 release [iceberg-python]

Reply via email to