hemanthboyina commented on code in PR #16531:
URL: https://github.com/apache/iceberg/pull/16531#discussion_r3372393764
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/BaseSparkScanBuilder.java:
##########
@@ -164,9 +164,19 @@ public Predicate[] pushPredicates(Predicate[] predicates) {
Binder.bind(projection.asStruct(), expr, caseSensitive);
expressions.add(expr);
pushablePredicates.add(predicate);
+ } else {
+ // if the full predicate can't be converted, try to extract
convertible conjuncts
+ // from compound AND predicates for file-level pruning
+ Expression partialExpr = convertAndConjuncts(predicate);
Review Comment:
Thanks for the suggestion, I considered moving this to
SparkV2Filters.convert() initially, but chose pushPredicates() intentionally to
avoid a correctness issue.
In pushPredicates, there's this logic:
```java
if (expr == null || !ExpressionUtil.selectsPartitions(expr, table,
caseSensitive)) {
postScanPredicates.add(predicate);
} else {
LOG.info("Evaluating completely on Iceberg side");
}
```
If convert(AND(partition_col = 'US', udf(col) > 5)) returns just
partition_col = 'US', and that expression selects entire partitions, the
original AND predicate won't be returned to Spark as a post-scan filter. Spark
would then skip evaluating udf(col) > 5 — returning wrong results.
By handling this in pushPredicates() instead, expr remains null from
convert(), which guarantees the original predicate always stays in post-scan
filters for Spark to re-evaluate the full condition.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]