Re: [PR] Spark: Push supported filters from compound AND predicates that contain unsupported expressions [iceberg]

via GitHub Mon, 08 Jun 2026 03:17:25 -0700


hemanthboyina commented on code in PR #16531:
URL: https://github.com/apache/iceberg/pull/16531#discussion_r3372393764



##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/BaseSparkScanBuilder.java:
##########
@@ -164,9 +164,19 @@ public Predicate[] pushPredicates(Predicate[] predicates) {
           Binder.bind(projection.asStruct(), expr, caseSensitive);
           expressions.add(expr);
           pushablePredicates.add(predicate);
+        } else {
+          // if the full predicate can't be converted, try to extract 
convertible conjuncts
+          // from compound AND predicates for file-level pruning
+          Expression partialExpr = convertAndConjuncts(predicate);

Review Comment:
   Thanks for the suggestion, I considered moving this to 
SparkV2Filters.convert() initially, but chose pushPredicates() intentionally to 
avoid a correctness issue.
   
   In pushPredicates, there's this logic:
   
   ```java
   if (expr == null || !ExpressionUtil.selectsPartitions(expr, table, 
caseSensitive)) {
       postScanPredicates.add(predicate);
   } else {
       LOG.info("Evaluating completely on Iceberg side");
   }
   ```
   
   If convert(AND(partition_col = 'US', udf(col) > 5)) returns just 
partition_col = 'US', and that expression selects entire partitions, the 
original AND predicate won't be returned to Spark as a post-scan filter. Spark 
would then skip evaluating udf(col) > 5 — returning wrong results.
   
   By handling this in pushPredicates() instead, expr remains null from 
convert(), which guarantees the original predicate always stays in post-scan 
filters for Spark to re-evaluate the full condition.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Spark: Push supported filters from compound AND predicates that contain unsupported expressions [iceberg]

Reply via email to