This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new d50e9b7fee5d [SPARK-54454][SQL] Enable variant shredding and variant logical type annotation configs by default
d50e9b7fee5d is described below
commit d50e9b7fee5d1cdb7452461c1d48074353bac133
Author: Harsh Motwani <[email protected]>
AuthorDate: Fri Nov 28 13:59:06 2025 -0800
[SPARK-54454][SQL] Enable variant shredding and variant logical type annotation configs by default
### What changes were proposed in this pull request?
This PR enables, by default, the annotation of variant data with the Parquet variant logical type, as well as shredded variant writes and reads.
### Why are the changes needed?
1. Annotating variant data with the variant logical type is required by the Parquet variant spec ([source](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#variant-in-parquet)), so this change is necessary to adhere to the spec.
2. Variant shredding brings significant performance improvements over regular unshredded variants, so it should be the default mode.
### Does this PR introduce _any_ user-facing change?
Yes. Variant data written by Spark will be annotated with the variant logical type, and variant shredding will be enabled by default.
### How was this patch tested?
Existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No
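Users who need the previous behavior (for example, when a downstream reader cannot yet consume shredded variant data) can flip the affected configs back to `false`. A minimal sketch using Spark SQL's `SET` command; of the four configs changed in this patch, only the two names fully visible in the diff below are shown here, and the restored values assume the pre-4.1 defaults shown in the removed lines:

```sql
-- Restore pre-4.1 behavior: do not write shredded variant columns.
SET spark.sql.variant.writeShredding.enabled=false;

-- Unchanged by this patch, but related: ignore the variant logical type
-- annotation when reading, if a reader needs to treat such columns as plain data.
SET spark.sql.parquet.ignoreVariantAnnotation=true;
```

The same keys can also be set via `--conf` on `spark-submit` or in `spark-defaults.conf`.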
Closes #53164 from harshmotw-db/harshmotw-db/enable_variant_shredding.
Lead-authored-by: Harsh Motwani <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3a06297fb9a66dca9bd5597630e34b4b057e893f)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 4b82966b2b6d..951bdb30c701 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1598,7 +1598,7 @@ object SQLConf {
"variant logical type.")
.version("4.1.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val PARQUET_IGNORE_VARIANT_ANNOTATION =
buildConf("spark.sql.parquet.ignoreVariantAnnotation")
@@ -5526,7 +5526,7 @@ object SQLConf {
"requested fields.")
.version("4.0.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val VARIANT_WRITE_SHREDDING_ENABLED =
buildConf("spark.sql.variant.writeShredding.enabled")
@@ -5534,7 +5534,7 @@ object SQLConf {
.doc("When true, the Parquet writer is allowed to write shredded variant. ")
.version("4.0.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val VARIANT_FORCE_SHREDDING_SCHEMA_FOR_TEST =
buildConf("spark.sql.variant.forceShreddingSchemaForTest")
@@ -5567,7 +5567,7 @@ object SQLConf {
.doc("Infer shredding schema when writing Variant columns in Parquet tables.")
.version("4.1.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val LEGACY_CSV_ENABLE_DATE_TIME_PARSING_FALLBACK =
buildConf("spark.sql.legacy.csv.enableDateTimeParsingFallback")