spark git commit: [SPARK-22335][SQL] Clarify union behavior on Dataset of typed objects in the document

gurwls223 Sat, 28 Oct 2017 05:47:55 -0700

Repository: spark
Updated Branches:
  refs/heads/master d28d5732a -> 683ffe062



[SPARK-22335][SQL] Clarify union behavior on Dataset of typed objects in the document

## What changes were proposed in this pull request?

It seems that end users can be confused by the behavior of `union` on a Dataset of
typed objects. We can clarify it further in the documentation of the `union` function.
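A minimal sketch of the confusing case follows. It assumes Spark 2.3 or later (for `unionByName`) and a local `SparkSession`; the case class `Rec` and the object `UnionTypedDemo` are hypothetical names used only for illustration. Both Datasets hold `Rec` objects, but their underlying schemas order the columns differently, so `union` pairs columns by position and swaps the field values coming from `ds2`, while `unionByName` keeps them intact.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for this illustration; not part of the patch.
case class Rec(a: Int, b: Int)

object UnionTypedDemo {
  // Returns the rows produced by union and by unionByName.
  def run(spark: SparkSession): (Set[Rec], Set[Rec]) = {
    import spark.implicits._

    // Schema of ds1 is (a, b).
    val ds1 = Seq(Rec(1, 2)).toDS()
    // as[Rec] binds fields to columns by name, but the schema stays (b, a),
    // so the single object in ds2 is Rec(a = 4, b = 3).
    val ds2 = Seq((3, 4)).toDF("b", "a").as[Rec]

    // union resolves by position: ds2's "b" column (3) lands in "a".
    val byPosition = ds1.union(ds2).collect().toSet
    // unionByName (available since Spark 2.3) resolves by column name.
    val byName = ds1.unionByName(ds2).collect().toSet
    (byPosition, byName)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-typed-demo")
      .getOrCreate()
    try {
      val (byPosition, byName) = run(spark)
      println(s"union:       $byPosition") // contains Rec(1,2) and Rec(3,4)
      println(s"unionByName: $byName")     // contains Rec(1,2) and Rec(4,3)
    } finally spark.stop()
  }
}
```

The object originally stored in `ds2` is `Rec(4,3)`; only `unionByName` returns it unchanged.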

## How was this patch tested?

Only document change.

Author: Liang-Chi Hsieh <[email protected]>

Closes #19570 from viirya/SPARK-22335.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/683ffe06
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/683ffe06
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/683ffe06

Branch: refs/heads/master
Commit: 683ffe0620e69fd6e9f92c1037eef7996029aba8
Parents: d28d573
Author: Liang-Chi Hsieh <[email protected]>
Authored: Sat Oct 28 21:47:15 2017 +0900
Committer: hyukjinkwon <[email protected]>
Committed: Sat Oct 28 21:47:15 2017 +0900

----------------------------------------------------------------------
 .../scala/org/apache/spark/sql/Dataset.scala    | 21 +++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/683ffe06/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index fe4e192..bd99ec5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1747,7 +1747,26 @@ class Dataset[T] private[sql](
    * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
    * deduplication of elements), use this function followed by a [[distinct]].
    *
-   * Also as standard in SQL, this function resolves columns by position (not by name).
+   * Also as standard in SQL, this function resolves columns by position (not by name):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.union(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   4|   5|   6|
+   *   // +----+----+----+
+   * }}}
+   *
+   * Notice that the column positions in the schema aren't necessarily matched with the
+   * fields in the strongly typed objects in a Dataset. This function resolves columns
+   * by their positions in the schema, not the fields in the strongly typed objects. Use
+   * [[unionByName]] to resolve columns by field name in the typed objects.
    *
    * @group typedrel
    * @since 2.0.0

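The context lines of the hunk state that `union` is equivalent to `UNION ALL` and that a SQL-style set union needs a following `distinct`. A small sketch of that difference, assuming a local `SparkSession`; `UnionAllVsDistinctDemo` is a hypothetical name used only for illustration:

```scala
import org.apache.spark.sql.SparkSession

object UnionAllVsDistinctDemo {
  // Returns (row count of union, row count of union followed by distinct).
  def counts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val left = Seq(1, 2, 3).toDF("id")
    val right = Seq(3, 4).toDF("id")

    // Like SQL UNION ALL: the duplicate value 3 is kept.
    val unionAll = left.union(right)
    // Like SQL UNION: distinct removes the duplicate.
    val setUnion = unionAll.distinct()
    (unionAll.count(), setUnion.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-distinct-demo")
      .getOrCreate()
    try {
      val (all, deduped) = counts(spark)
      // prints "union: 5 rows, union + distinct: 4 rows"
      println(s"union: $all rows, union + distinct: $deduped rows")
    } finally spark.stop()
  }
}
```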
