This is an automated email from the ASF dual-hosted git repository.
maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new 9c83bf501cc [MINOR][DOCS] Fix documentation for
`spark.sql.legacy.doLooseUpcast` in SQL migration guide
9c83bf501cc is described below
commit 9c83bf501ccefa7c6c0ba071f69e2528f3504854
Author: Amy Tsai <[email protected]>
AuthorDate: Mon Dec 11 18:35:31 2023 +0300
[MINOR][DOCS] Fix documentation for `spark.sql.legacy.doLooseUpcast` in SQL
migration guide
### What changes were proposed in this pull request?
Fixes an error in the SQL migration guide documentation for
`spark.sql.legacy.doLooseUpcast`. I corrected the config name and moved it to
the section for migration from Spark 2.4 to 3.0 since it was not made available
until Spark 3.0.
### Why are the changes needed?
The config was documented as `spark.sql.legacy.looseUpcast` and was
inaccurately included in the Spark 2.4 to Spark 2.4.1 section.
I changed the docs to match what is implemented in
https://github.com/apache/spark/blob/20df062d85e80422a55afae80ddbf2060f26516c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3873
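For reference, a minimal sketch (not part of this patch) showing how the correctly named config can be applied; the local session below is purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, used only to illustrate the config name
// that this patch documents (matching its definition in SQLConf).
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Restore the lenient pre-3.0 up-cast behavior (the default is false).
spark.conf.set("spark.sql.legacy.doLooseUpcast", "true")
```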
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Docs-only change.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44262 from amytsai-stripe/fix-migration-docs-loose-upcast.
Authored-by: Amy Tsai <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit bab884082c0f82e3f9053adac6c7e8a3fcfab11c)
Signed-off-by: Max Gekk <[email protected]>
---
docs/sql-migration-guide.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 88635ee3d1f..2eba9500e90 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -251,6 +251,8 @@ license: |
- In Spark 3.0, the column metadata will always be propagated in the API
`Column.name` and `Column.as`. In Spark version 2.4 and earlier, the metadata
of `NamedExpression` is set as the `explicitMetadata` for the new column at the
time the API is called; it won't change even if the underlying
`NamedExpression` changes its metadata. To restore the behavior before Spark 3.0,
you can use the API `as(alias: String, metadata: Metadata)` with explicit
metadata.
+ - When converting a Dataset to another Dataset, Spark up casts the fields
in the original Dataset to the types of the corresponding fields in the target
Dataset. In version 2.4 and earlier, this up cast is not very strict, e.g.
`Seq("str").toDS.as[Int]` fails, but `Seq("str").toDS.as[Boolean]` works and
throws NPE during execution. In Spark 3.0, the up cast is stricter and turning
String into something else is not allowed, i.e. `Seq("str").toDS.as[Boolean]`
will fail during analysis. To res [...]
+
### DDL Statements
- In Spark 3.0, when inserting a value into a table column with a different
data type, the type coercion is performed as per ANSI SQL standard. Certain
unreasonable type conversions such as converting `string` to `int` and `double`
to `boolean` are disallowed. A runtime exception is thrown if the value is
out-of-range for the data type of the column. In Spark version 2.4 and below,
type conversions during table insertion are allowed as long as they are valid
`Cast`. When inserting an o [...]
@@ -464,8 +466,6 @@ license: |
need to specify a value with units like "30s" now, to avoid being
interpreted as milliseconds; otherwise,
the extremely short interval that results will likely cause applications
to fail.
- - When converting a Dataset to another Dataset, Spark up casts the fields
in the original Dataset to the types of the corresponding fields in the target
Dataset. In version 2.4 and earlier, this up cast is not very strict, e.g.
`Seq("str").toDS.as[Int]` fails, but `Seq("str").toDS.as[Boolean]` works and
throws NPE during execution. In Spark 3.0, the up cast is stricter and turning
String into something else is not allowed, i.e. `Seq("str").toDS.as[Boolean]`
will fail during analysis. To res [...]
-
## Upgrading from Spark SQL 2.3 to 2.4
- In Spark version 2.3 and earlier, the second parameter to the `array_contains`
function is implicitly promoted to the element type of the first array type
parameter. This type promotion can be lossy and may cause `array_contains`
to return a wrong result. This problem has been addressed in 2.4 by
employing a safer type promotion mechanism. This can cause some changes in
behavior, which are illustrated in the table below.
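For illustration, a self-contained sketch of the up-cast behavior described in the migration note moved by this diff; the local session, app name, and the use of `queryExecution.analyzed` are illustrative assumptions, not part of the commit:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object LooseUpcastSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("loose-upcast-sketch")
      .getOrCreate()
    import spark.implicits._

    // Spark 3.x default: up casting String to Boolean is rejected
    // during analysis, as the migration note describes.
    try {
      Seq("str").toDS.as[Boolean].collect()
    } catch {
      case e: AnalysisException =>
        println(s"strict up cast rejected: ${e.getMessage}")
    }

    // With the legacy flag enabled, the lenient 2.4-style behavior is
    // restored: analysis passes, and any failure surfaces only at runtime.
    spark.conf.set("spark.sql.legacy.doLooseUpcast", "true")
    val lenient = Seq("str").toDS.as[Boolean]
    println(lenient.queryExecution.analyzed)

    spark.stop()
  }
}
```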
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]