This is an automated email from the ASF dual-hosted git repository.
maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new 9c83bf501cc [MINOR][DOCS] Fix documentation for
`spark.sql.legacy.doLooseUpcast` in SQL migration guide
9c83bf501cc is described below
commit 9c83bf501ccefa7c6c0ba071f69e2528f3504854
Author: Amy Tsai <[email protected]>
AuthorDate: Mon Dec 11 18:35:31 2023 +0300
[MINOR][DOCS] Fix documentation for `spark.sql.legacy.doLooseUpcast` in SQL
migration guide
### What changes were proposed in this pull request?
Fixes an error in the SQL migration guide documentation for
`spark.sql.legacy.doLooseUpcast`. I corrected the config name and moved it to
the section for migration from Spark 2.4 to 3.0 since it was not made available
until Spark 3.0.
### Why are the changes needed?
The config was documented as `spark.sql.legacy.looseUpcast` and was
inaccurately included in the Spark 2.4 to Spark 2.4.1 section.
I changed the docs to match what is implemented in
https://github.com/apache/spark/blob/20df062d85e80422a55afae80ddbf2060f26516c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3873
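For reference, a minimal sketch (not part of this patch) showing how the correctly named config can be applied; the local session below is purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, used only to illustrate the config name
// that this patch documents (matching its definition in SQLConf).
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Restore the lenient pre-3.0 up-cast behavior (the default is false).
spark.conf.set("spark.sql.legacy.doLooseUpcast", "true")
```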
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Docs-only change.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44262 from amytsai-stripe/fix-migration-docs-loose-upcast.
Authored-by: Amy Tsai <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit bab884082c0f82e3f9053adac6c7e8a3fcfab11c)
Signed-off-by: Max Gekk <[email protected]>
---
docs/sql-migration-guide.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 88635ee3d1f..2eba9500e90 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -251,6 +251,8 @@ license: |
- In Spark 3.0, the column metadata will always be propagated in the API
`Column.name` and `Column.as`. In Spark version 2.4 and earlier, the metadata
of `NamedExpression` is set as the `explicitMetadata` for the new column at the
time the API is called; it won't change even if the underlying
`NamedExpression` changes its metadata. To restore the behavior before Spark 3.0,
you can use the API `as(alias: String, metadata: Metadata)` with explicit
metadata.
+ - When converting a Dataset to another Dataset, Spark up casts the fields
in the original Dataset to the types of the corresponding fields in the target
Dataset. In version 2.4 and earlier, this up cast is not very strict, e.g.
`Seq("str").toDS.as[Int]` fails, but `Seq("str").toDS.as[Boolean]` works and
throws NPE during execution. In Spark 3.0, the up cast is stricter and turning
String into something else is not allowed, i.e. `Seq("str").toDS.as[Boolean]`
will fail during analysis. To res [...]
+
### DDL Statements
- In Spark 3.0, when inserting a value into a table column with a different
data type, the type coercion is performed as per ANSI SQL standard. Certain
unreasonable type conversions such as converting `string` to `int` and `double`
to `boolean` are disallowed. A runtime exception is thrown if the value is
out-of-range for the data type of the column. In Spark version 2.4 and below,
type conversions during table insertion are allowed as long as they are valid
`Cast`. When inserting an o [...]
@@ -464,8 +466,6 @@ license: |
need to specify a value with units like "30s" now, to avoid being
interpreted as milliseconds; otherwise,
the extremely short interval that results will likely cause applications
to fail.
- - When converting a Dataset to another Dataset, Spark up casts the fields
in the original Dataset to the types of the corresponding fields in the target
Dataset. In version 2.4 and earlier, this up cast is not very strict, e.g.
`Seq("str").toDS.as[Int]` fails, but `Seq("str").toDS.as[Boolean]` works and
throws NPE during execution. In Spark 3.0, the up cast is stricter and turning
String into something else is not allowed, i.e. `Seq("str").toDS.as[Boolean]`
will fail during analysis. To res [...]
-
## Upgrading from Spark SQL 2.3 to 2.4
- In Spark version 2.3 and earlier, the second parameter to the `array_contains`
function is implicitly promoted to the element type of the first array type
parameter. This type promotion can be lossy and may cause `array_contains`
to return a wrong result. This problem has been addressed in 2.4 by
employing a safer type promotion mechanism. This can cause some changes in
behavior, which are illustrated in the table below.
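For illustration, a self-contained sketch of the up-cast behavior described in the migration note moved by this diff; the local session, app name, and the use of `queryExecution.analyzed` are illustrative assumptions, not part of the commit:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object LooseUpcastSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("loose-upcast-sketch")
      .getOrCreate()
    import spark.implicits._

    // Spark 3.x default: up casting String to Boolean is rejected
    // during analysis, as the migration note describes.
    try {
      Seq("str").toDS.as[Boolean].collect()
    } catch {
      case e: AnalysisException =>
        println(s"strict up cast rejected: ${e.getMessage}")
    }

    // With the legacy flag enabled, the lenient 2.4-style behavior is
    // restored: analysis passes, and any failure surfaces only at runtime.
    spark.conf.set("spark.sql.legacy.doLooseUpcast", "true")
    val lenient = Seq("str").toDS.as[Boolean]
    println(lenient.queryExecution.analyzed)

    spark.stop()
  }
}
```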
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]