pan3793 commented on code in PR #657:
URL: https://github.com/apache/spark-website/pull/657#discussion_r2670684966


##########
releases/_posts/2025-12-16-spark-release-4.1.0.md:
##########
@@ -11,8 +11,444 @@ meta:
   _wpas_done_all: '1'
 ---
 
-Apache Spark 4.1.0 is a new feature release. It introduces new functionality and improvements. We encourage users to try it and provide feedback.
+Apache Spark 4.1.0 is the second release in the 4.x series. This release addressed over 1,800 Jira tickets with contributions from more than 230 individuals across the open-source community.
 
-You can find the list of resolved issues and detailed changes in the [JIRA release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12355581).
+This release continues the Spark 4.x momentum and focuses on higher-level data engineering, lower-latency streaming, faster and easier PySpark, and a more capable SQL surface.
 
-We would like to acknowledge all community members for contributing patches and features to this release.
+This release adds Spark Declarative Pipelines (SDP): a new declarative framework where you define datasets and queries, and Spark handles the execution graph, dependency ordering, parallelism, checkpoints, and retries.
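
Conceptually, a declarative pipeline engine derives the run order from the dataset definitions rather than from an imperative script. A minimal sketch of that idea in plain Python (the dataset names and the tiny `execution_order` helper below are illustrative only, not SDP's actual API):

```python
from graphlib import TopologicalSorter

# Hypothetical dataset definitions: name -> (upstream dependencies, query).
# Purely illustrative; SDP's real decorators and engine are not shown here.
datasets = {
    "raw_events":   (set(),            "SELECT * FROM source"),
    "clean_events": ({"raw_events"},   "SELECT ... FROM raw_events"),
    "daily_rollup": ({"clean_events"}, "SELECT ... FROM clean_events GROUP BY day"),
}

def execution_order(defs):
    """Derive a dependency-respecting run order from declarative definitions."""
    ts = TopologicalSorter({name: deps for name, (deps, _) in defs.items()})
    return list(ts.static_order())

order = execution_order(datasets)
# Upstream datasets always run first: raw_events before clean_events,
# clean_events before daily_rollup.
```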
+
+This release supports Structured Streaming Real-Time Mode (RTM): the first official support for Structured Streaming queries running in real-time mode for continuous, sub-second latency processing. For stateless tasks, latency can even drop to single-digit milliseconds.
+
+PySpark UDFs and Data Sources have been improved: new Arrow-native UDF and UDTF decorators enable efficient PyArrow execution without Pandas conversion overhead, and Python Data Source filter pushdown reduces data movement.
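
The idea behind filter pushdown can be sketched in plain Python: the engine hands simple predicates down to the data source, which applies them while scanning so non-matching rows never leave the source. The `read_with_pushdown` helper below is a conceptual illustration only, not the `pyspark.sql.datasource` API:

```python
# A tiny in-memory "source"; in practice this would be files or a remote store.
ROWS = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "DE"},
]

def read_with_pushdown(rows, filters):
    """Yield only rows satisfying every pushed-down (column, value) equality
    filter, so filtered-out rows are dropped at the source instead of being
    shipped to the engine and filtered there."""
    for row in rows:
        if all(row.get(col) == val for col, val in filters):
            yield row

matched = list(read_with_pushdown(ROWS, [("country", "DE")]))
```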
+
+Spark ML on Connect is GA for the Python client, with smarter model caching and memory management. Spark 4.1 also improves stability for large workloads with zstd-compressed protobuf plans, chunked Arrow result streaming, and enhanced support for large local relations.
+
+SQL Scripting is GA and enabled by default, with improved error handling and cleaner declarations. VARIANT is GA with shredding for faster reads on semi-structured data, plus recursive CTE support and new approximate data sketches (KLL and Theta).
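
Recursive CTEs follow the SQL-standard `WITH RECURSIVE` form. As a runnable illustration of the shape of such a query (using SQLite's stdlib binding here for portability; Spark SQL's exact syntax and limits are not shown):

```python
import sqlite3

con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH RECURSIVE countdown(n) AS (
        SELECT 5                      -- anchor member
        UNION ALL
        SELECT n - 1 FROM countdown   -- recursive member
        WHERE n > 1
    )
    SELECT n FROM countdown
""").fetchall()
# rows == [(5,), (4,), (3,), (2,), (1,)]
```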
+
+To download Apache Spark 4.1.0, please visit the [downloads](https://spark.apache.org/downloads.html) page. For [detailed changes](https://issues.apache.org/jira/projects/SPARK/versions/12355581), you can consult JIRA. We have also curated a list of high-level changes here, grouped by major components.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+
+### Highlights
- **[[SPARK-51727]](https://issues.apache.org/jira/browse/SPARK-51727)** A new component to define and run data pipelines: **Declarative Pipelines**

Review Comment:
   Could you revise all the Highlights' ticket titles to use a unified, polished format? e.g., this one should be "SPIP: Declarative Pipelines"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

