amogh-jahagirdar commented on code in PR #2029:
URL: https://github.com/apache/iceberg-python/pull/2029#discussion_r2243422896
##########
pyiceberg/expressions/visitors.py:
##########
@@ -910,6 +915,14 @@ def visit_bound_predicate(self, predicate:
BoundPredicate[L]) -> BooleanExpressi
else:
raise ValueError(f"Unsupported predicate: {predicate}")
+ # In the order described by the "Column Projection" section of the
Iceberg spec:
+ # https://iceberg.apache.org/spec/#column-projection
+ # Evaluate column projection first if it exists
+ if projected_field_value :=
self.projected_field_values.get(field.name):
Review Comment:
Curious, what's the rationale for not including default value handling
inside the logic that produces the `projected_field_values`? Seems like the
intent of `_get_column_projection_values` is to apply all the projection rules
based on the comment but it looks like we apply most of them and then here we
fall through to applying the initial default on 928. May be better if all of
that logic is self contained in the function so that in case things move around
you don't have a separate place where default values are propagated
##########
pyiceberg/expressions/visitors.py:
##########
@@ -869,10 +871,14 @@ class
_ColumnNameTranslator(BooleanExpressionVisitor[BooleanExpression]):
file_schema: Schema
case_sensitive: bool
+ projected_field_values: Dict[str, Any]
- def __init__(self, file_schema: Schema, case_sensitive: bool) -> None:
+ def __init__(
+ self, file_schema: Schema, case_sensitive: bool,
projected_field_values: Optional[Dict[str, Any]] = None
Review Comment:
Not sure if it's idiomatic python, so up to you @kevinjqliu @Fokko , but
is it possible to just make the default value here an empty dictionary, and
then self.projected_field_values = self.projected_field_values
##########
tests/expressions/test_visitors.py:
##########
@@ -1623,3 +1625,282 @@ def test_expression_evaluator_null() -> None:
assert expression_evaluator(schema, LessThan("a", 1),
case_sensitive=True)(struct) is False
assert expression_evaluator(schema, StartsWith("a", 1),
case_sensitive=True)(struct) is False
assert expression_evaluator(schema, NotStartsWith("a", 1),
case_sensitive=True)(struct) is True
+
+
+def test_translate_column_names_simple_case(table_schema_simple: Schema) ->
None:
Review Comment:
Thanks for all the test cases, I've stepped through them and they cover the
cases I'd expect.
Some other additions that may be worth it:
1.) Disjunctive/Conjunctive cases (Or, and, etc) where one field is missing
from the file and one field is not. Maybe mix this in with the rename case
where the file on disk has the field but with a different name (I see that's
already tested in the single predicate case, but just to sanity check combined
cases)
2.) Maybe a nested field case though it's really no different
Down the line, when say Spark can support the DDL. for default values then
we can have end to end verification tests as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]