Erigara commented on code in PR #2029:
URL: https://github.com/apache/iceberg-python/pull/2029#discussion_r2098978290
##########
pyiceberg/expressions/visitors.py:
##########
@@ -894,12 +895,17 @@ def visit_unbound_predicate(self, predicate:
UnboundPredicate[L]) -> BooleanExpr
def visit_bound_predicate(self, predicate: BoundPredicate[L]) ->
BooleanExpression:
file_column_name =
self.file_schema.find_column_name(predicate.term.ref().field.field_id)
+ field_name = predicate.term.ref().field.name
if file_column_name is None:
# In the case of schema evolution, the column might not be present
# in the file schema when reading older data
if isinstance(predicate, BoundIsNull):
return AlwaysTrue()
+ # Projected fields are only available for identity partition fields
+ # Which mean that partition pruning excluded partition field which
evaluates to false
+ elif field_name in self.projected_missing_fields:
+ return AlwaysTrue()
Review Comment:
On second look, this approach can produce incorrect results for compound
predicates.
For example, given `(P = x AND F = a) OR (P = y AND F = b)`, replacing every
`P = ...` term with `AlwaysTrue()` yields the incorrect predicate
`F = a OR F = b`.
The correct approach would be to substitute `P` with the concrete value
extracted from the partition.
I'm not sure yet how to implement that.
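
To make the concern concrete, here is a minimal sketch using a toy tuple-based
expression model. It is not the pyiceberg API: `simplify`, `rewrite`,
`drop_partition_term` and `bind_partition_value` are hypothetical names used
only for illustration. It compares replacing every `P` term with true against
evaluating `P` with the file's partition value (assumed here to be `x`).

```python
# Toy expression model (hypothetical; not the pyiceberg API).
# An expression is True, False, ("eq", column, value),
# ("and", left, right) or ("or", left, right).

def simplify(expr):
    """Fold boolean constants out of AND/OR nodes."""
    if isinstance(expr, bool) or expr[0] == "eq":
        return expr
    op, left, right = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == "and":
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
    else:  # "or"
        if left is True or right is True:
            return True
        if left is False:
            return right
        if right is False:
            return left
    return (op, left, right)


def rewrite(expr, leaf_fn):
    """Apply leaf_fn to every equality leaf, then simplify the result."""
    def walk(e):
        if isinstance(e, bool):
            return e
        if e[0] == "eq":
            return leaf_fn(e)
        return (e[0], walk(e[1]), walk(e[2]))
    return simplify(walk(expr))


def drop_partition_term(expr, column):
    """Mimics the diff: any predicate on the missing column becomes True."""
    return rewrite(expr, lambda e: True if e[1] == column else e)


def bind_partition_value(expr, column, value):
    """Alternative: evaluate the predicate against the partition value."""
    return rewrite(expr, lambda e: (e[2] == value) if e[1] == column else e)


predicate = ("or",
             ("and", ("eq", "P", "x"), ("eq", "F", "a")),
             ("and", ("eq", "P", "y"), ("eq", "F", "b")))

dropped = drop_partition_term(predicate, "P")      # F = a OR F = b
bound = bind_partition_value(predicate, "P", "x")  # F = a
```

For a file in partition `P = x`, the first rewrite still accepts rows with
`F = b`, while the second keeps only `F = a`.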
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]