[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #8446: Spark: value list of IN/NOT_IN containing null value should not be converted to Iceberg expression

via GitHub Tue, 19 Sep 2023 13:17:02 -0700


aokolnychyi commented on code in PR #8446:
URL: https://github.com/apache/iceberg/pull/8446#discussion_r1330656200



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java:
##########
@@ -161,10 +162,13 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
+          if (Stream.of(inFilter.values()).anyMatch(Objects::isNull)) {

Review Comment:
   I thought we already handled such cases in a special way. For instance,
`hasNoInFilter` is used in the negation branch to recursively check for a nested
NOT IN inside NOT. We do handle IN and NOT IN differently: there are separate
branches for them, each with its own null handling.
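   The separate IN and NOT IN branches follow from SQL three-valued logic. Below is a minimal, self-contained Java sketch of those semantics. The class and method names are hypothetical and this is not the SparkFilters code: `null` stands for UNKNOWN, and a filter keeps a row only when the predicate evaluates to TRUE.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of SQL three-valued logic for IN / NOT IN;
// a null Boolean stands for UNKNOWN. Not Iceberg or Spark code.
class InListSemantics {
  // x IN (list): TRUE on a match; UNKNOWN if x is null, or if nothing matched
  // and the list contains a null; FALSE otherwise.
  static Boolean inList(Integer x, List<Integer> list) {
    if (x == null) {
      return null;
    }
    boolean sawNull = false;
    for (Integer item : list) {
      if (item == null) {
        sawNull = true;
      } else if (item.equals(x)) {
        return Boolean.TRUE;
      }
    }
    return sawNull ? null : Boolean.FALSE;
  }

  // NOT swaps TRUE and FALSE and leaves UNKNOWN as UNKNOWN.
  static Boolean notInList(Integer x, List<Integer> list) {
    Boolean result = inList(x, list);
    return result == null ? null : Boolean.valueOf(!result);
  }

  // A WHERE clause keeps a row only when the predicate is TRUE.
  static boolean keeps(Boolean result) {
    return Boolean.TRUE.equals(result);
  }

  public static void main(String[] args) {
    List<Integer> withNull = Arrays.asList(1, 2, null);
    List<Integer> withoutNull = Arrays.asList(1, 2);
    for (Integer x : Arrays.asList(1, 3, null)) {
      System.out.printf("x=%s  IN kept: %b vs %b  NOT IN kept: %b vs %b%n",
          x,
          keeps(inList(x, withNull)), keeps(inList(x, withoutNull)),
          keeps(notInList(x, withNull)), keeps(notInList(x, withoutNull)));
    }
  }
}
```

For `x = 3`, `x IN (1, 2, NULL)` is UNKNOWN and `x IN (1, 2)` is FALSE, so both drop the row; but `x NOT IN (1, 2)` keeps it while `x NOT IN (1, 2, NULL)` does not. A null in the list only changes which rows are kept once the predicate sits under a negation.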
   
   My worry is that filters like `IN (1, 2, NULL)`, which are perfectly fine to
push down, will no longer be pushed down, causing silent performance issues. It
is unlikely that someone explicitly passes NULL inside IN, but such predicates
can be generated programmatically.
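   Because a null entry can never make a non-negated IN evaluate to TRUE, a conversion could drop the nulls and still push the rest of the list down, rather than refusing to convert the whole filter. A rough sketch of that idea, assuming hypothetical helpers `pushableInValues` and `notInMatchesNothing` (not existing Iceberg APIs) and reusing the same `Stream`/`Objects` calls as the diff:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helpers illustrating the idea; not Iceberg's SparkFilters code.
class NullAwareInRewrite {
  // For a non-negated IN, null entries can never make the predicate TRUE,
  // so x IN (1, 2, NULL) keeps the same rows as x IN (1, 2).
  static List<Object> pushableInValues(Object[] values) {
    return Stream.of(values).filter(Objects::nonNull).collect(Collectors.toList());
  }

  // For NOT IN, a single null entry makes the predicate FALSE or UNKNOWN
  // for every row, so the filter keeps no rows at all.
  static boolean notInMatchesNothing(Object[] values) {
    return Stream.of(values).anyMatch(Objects::isNull);
  }

  public static void main(String[] args) {
    Object[] values = {1, 2, null};
    System.out.println(pushableInValues(values));    // [1, 2]
    System.out.println(notInMatchesNothing(values)); // true
  }
}
```

The equivalence used by `pushableInValues` only holds outside a negation; under NOT the dropped nulls matter again, which is why the negation path needs its own check.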



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


