Repository: spark
Updated Branches:
  refs/heads/master d28d5732a -> 683ffe062
[SPARK-22335][SQL] Clarify union behavior on Dataset of typed objects in the document

## What changes were proposed in this pull request?

End users can be confused by the behavior of `union` on a Dataset of typed objects. This change clarifies that behavior in the documentation of the `union` function.

## How was this patch tested?

Documentation-only change.

Author: Liang-Chi Hsieh <[email protected]>

Closes #19570 from viirya/SPARK-22335.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/683ffe06
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/683ffe06
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/683ffe06

Branch: refs/heads/master
Commit: 683ffe0620e69fd6e9f92c1037eef7996029aba8
Parents: d28d573
Author: Liang-Chi Hsieh <[email protected]>
Authored: Sat Oct 28 21:47:15 2017 +0900
Committer: hyukjinkwon <[email protected]>
Committed: Sat Oct 28 21:47:15 2017 +0900

----------------------------------------------------------------------
 .../scala/org/apache/spark/sql/Dataset.scala | 21 +++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/683ffe06/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index fe4e192..bd99ec5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1747,7 +1747,26 @@ class Dataset[T] private[sql](
    * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
    * deduplication of elements), use this function followed by a [[distinct]].
    *
-   * Also as standard in SQL, this function resolves columns by position (not by name).
+   * Also as standard in SQL, this function resolves columns by position (not by name):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.union(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   4|   5|   6|
+   *   // +----+----+----+
+   * }}}
+   *
+   * Notice that the column positions in the schema aren't necessarily matched with the
+   * fields in the strongly typed objects in a Dataset. This function resolves columns
+   * by their positions in the schema, not the fields in the strongly typed objects. Use
+   * [[unionByName]] to resolve columns by field name in the typed objects.
    *
    * @group typedrel
    * @since 2.0.0
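
The typed-object case described in the new Scaladoc can be sketched as follows. This is a minimal illustration, not part of the patch: the case class `AB`, the object name `UnionByPositionSketch`, the helper `firstFields`, and the local `SparkSession` settings are all hypothetical, and it assumes a Spark version that provides `unionByName` (2.3.0 or later).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class, used only for this illustration.
case class AB(a: String, b: String)

object UnionByPositionSketch {
  // Returns the `a` field of every row after `union` and after `unionByName`.
  def firstFields(): (Seq[String], Seq[String]) = {
    // Assumption: a throwaway local session is acceptable for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-by-position-sketch")
      .getOrCreate()
    import spark.implicits._

    // Both Datasets are typed as AB, but their schemas order the columns differently.
    val abDs = Seq(("aThing", "bThing")).toDF("a", "b").as[AB]
    val baDs = Seq(("bThing", "aThing")).toDF("b", "a").as[AB]

    // union matches columns by schema position, so the second row's
    // `a` field is filled from baDs's first column, which holds "bThing".
    val byPosition = abDs.union(baDs).map(_.a).collect().toSeq

    // unionByName matches columns by name, so every `a` field holds "aThing".
    val byName = abDs.unionByName(baDs).map(_.a).collect().toSeq

    spark.stop()
    (byPosition, byName)
  }

  def main(args: Array[String]): Unit = {
    val (byPosition, byName) = firstFields()
    println(s"union:       $byPosition")
    println(s"unionByName: $byName")
  }
}
```

Reordering the columns explicitly before the union, for example with `baDs.select("a", "b").as[AB]`, should give the same result as `unionByName` on versions where that method is not available.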
