Repository: spark
Updated Branches:
  refs/heads/branch-2.0 4c5e16f58 -> e68872f2e


[SPARK-16181][SQL] outer join with isNull filter may return wrong result

## What changes were proposed in this pull request?

The root cause is: the output attributes of outer join are derived from its 
children, while they are actually different attributes(outer join can return 
null).

We have already added some special logic to handle it, e.g. 
`PushPredicateThroughJoin` won't push down predicates through outer join side, 
`FixNullability`.

This PR adds one more special logic in `FoldablePropagation`.

## How was this patch tested?

new test in `DataFrameSuite`

Author: Wenchen Fan <[email protected]>

Closes #13884 from cloud-fan/bug.

(cherry picked from commit 1f2776df6e87a84991537ac20e4b8829472d3462)
Signed-off-by: Yin Huai <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e68872f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e68872f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e68872f2

Branch: refs/heads/branch-2.0
Commit: e68872f2ef89e85ab7a856bee82d16c938de6db0
Parents: 4c5e16f
Author: Wenchen Fan <[email protected]>
Authored: Tue Jun 28 10:26:01 2016 -0700
Committer: Yin Huai <[email protected]>
Committed: Tue Jun 28 10:28:57 2016 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 8 ++++++++
 .../test/scala/org/apache/spark/sql/DataFrameSuite.scala    | 9 +++++++++
 2 files changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e68872f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 6b10484..f24f8b7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -687,6 +687,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         case c: Command =>
           stop = true
           c
+        // For outer join, although its output attributes are derived from its 
children, they are
+        // actually different attributes: the output of outer join is not 
always picked from its
+        // children, but can also be null.
+        // TODO(cloud-fan): It seems more reasonable to use new attributes as 
the output attributes
+        // of outer join.
+        case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) =>
+          stop = true
+          j
         case p: LogicalPlan if !stop => p.transformExpressions {
           case a: AttributeReference if foldableMap.contains(a) =>
             foldableMap(a)

http://git-wip-us.apache.org/repos/asf/spark/blob/e68872f2/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index 1afee9f..5151532 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -1541,4 +1541,13 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
     val df = Seq(1, 1, 2).toDF("column.with.dot")
     checkAnswer(df.distinct(), Row(1) :: Row(2) :: Nil)
   }
+
+  test("SPARK-16181: outer join with isNull filter") {
+    val left = Seq("x").toDF("col")
+    val right = Seq("y").toDF("col").withColumn("new", lit(true))
+    val joined = left.join(right, left("col") === right("col"), "left_outer")
+
+    checkAnswer(joined, Row("x", null, null))
+    checkAnswer(joined.filter($"new".isNull), Row("x", null, null))
+  }
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to