[
https://issues.apache.org/jira/browse/SPARK-43413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-43413:
-----------------------------------
Labels: pull-request-available (was: )
> IN subquery ListQuery has wrong nullability
> -------------------------------------------
>
> Key: SPARK-43413
> URL: https://issues.apache.org/jira/browse/SPARK-43413
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Jack Chen
> Assignee: Jack Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> IN subquery expressions are incorrectly marked as non-nullable, even when
> they are actually nullable. They correctly check the nullability of the
> left-hand-side, but the right-hand-side of a IN subquery, the ListQuery, is
> currently defined with nullability = false always. This is incorrect and can
> lead to incorrect query transformations.
> Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN
> expression returns NULL when the nullable_col is null, but our code marks it
> as non-nullable, and therefore SimplifyBinaryComparison transforms away the
> <=> TRUE, transforming the expression to non_nullable_col IN (select
> nullable_col) , which is an incorrect transformation because NULL values of
> nullable_col now cause the expression to yield NULL instead of FALSE.
> This bug can potentially lead to wrong results, but in most cases this
> doesn't directly cause wrong results end-to-end, because IN subqueries are
> almost always transformed to semi/anti/existence joins in
> RewritePredicateSubquery, and this rewrite can also incorrectly discard
> NULLs, which is another bug. But we can observe it causing wrong behavior in
> unit tests, and it could easily lead to incorrect query results if there are
> changes to the surrounding context, so it should be fixed regardless.
> This is a long-standing bug that has existed at least since 2016, as long as
> the ListQuery class has existed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]