This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new b6b30387444c [SPARK-57072][PYTHON][DOC] Add missing 4.2 methods to
PySpark API reference
b6b30387444c is described below
commit b6b30387444ca0a5868c732911b12d99a94a404b
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Wed May 27 13:33:39 2026 +0800
[SPARK-57072][PYTHON][DOC] Add missing 4.2 methods to PySpark API reference
### What changes were proposed in this pull request?
Add public PySpark APIs that were added in Spark 4.2 but missing from the
rendered Python API reference. This PR is documentation-only.
`python/docs/source/reference/pyspark.sql/dataframe.rst`:
- `DataFrame.zipWithIndex`
`python/docs/source/reference/pyspark.sql/datasource.rst`:
- `DataSourceStreamReader.getDefaultReadLimit`
- `DataSourceStreamReader.reportLatestOffset`
`python/docs/source/reference/pyspark.sql/io.rst`:
- `DataFrameReader.changes`
`python/docs/source/reference/pyspark.ss/io.rst`:
- `DataStreamReader.changes`
- `DataStreamReader.name`
### Why are the changes needed?
All of the above are public, marked `.. versionadded:: 4.2.0`, and
reachable through their respective public modules, but the autosummary entries
were never added so they do not appear in the rendered API reference.
Original JIRAs:
- `DataFrame.zipWithIndex` — SPARK-55229 / SPARK-55231
- `DataSourceStreamReader.getDefaultReadLimit` / `reportLatestOffset` —
SPARK-55304
- `DataFrameReader.changes` / `DataStreamReader.changes` — SPARK-55950
- `DataStreamReader.name` — SPARK-55121
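
The gap described above (public methods that exist but are absent from an autosummary block) could be detected with a small check like the following. This is only a minimal sketch and is not part of this PR: the helper names `autosummary_entries` and `missing_entries` and the `SAMPLE_RST` text are hypothetical, and the parser assumes the simple layout used in these reference files (an `.. autosummary::` directive followed by indented options and entries).

```python
def autosummary_entries(rst_text):
    """Collect entry names listed under every ``.. autosummary::`` directive."""
    entries = set()
    in_block = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. autosummary::"):
            in_block = True
            continue
        if not in_block:
            continue
        if not stripped or stripped.startswith(":"):
            # Skip blank lines and directive options such as ":toctree: api/".
            continue
        if not line[0].isspace():
            # A non-indented line ends the directive body.
            in_block = False
            continue
        entries.add(stripped)
    return entries


def missing_entries(rst_text, expected):
    """Return the expected names that no autosummary block lists, sorted."""
    return sorted(set(expected) - autosummary_entries(rst_text))


# Hypothetical excerpt shaped like pyspark.ss/io.rst before this change.
SAMPLE_RST = """\
Input/Output
------------

.. autosummary::
    :toctree: api/

    DataStreamReader.csv
    DataStreamReader.format
    DataStreamReader.load
"""

print(
    missing_entries(
        SAMPLE_RST,
        [
            "DataStreamReader.changes",
            "DataStreamReader.load",
            "DataStreamReader.name",
        ],
    )
)
# prints ['DataStreamReader.changes', 'DataStreamReader.name']
```

In practice the list of expected names would have to come from somewhere else, for example from inspecting the public classes; that step is left out here.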
### Does this PR introduce _any_ user-facing change?
Documentation-only change; the methods themselves are unchanged.
### How was this patch tested?
Docs-only change. New entries are inserted alphabetically within each
autosummary block (`DataFrame.zipWithIndex` is appended after the existing
trailing `DataFrame.pandas_api` because it sorts last alphabetically).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (model: claude-opus-4-7)
Closes #56116 from zhengruifeng/spark-doc-methods-dev2.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 64a8b517dee5d88a09add97dc04416e2a32614b3)
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/docs/source/reference/pyspark.sql/dataframe.rst | 1 +
python/docs/source/reference/pyspark.sql/datasource.rst | 2 ++
python/docs/source/reference/pyspark.sql/io.rst | 1 +
python/docs/source/reference/pyspark.ss/io.rst | 2 ++
4 files changed, 6 insertions(+)
diff --git a/python/docs/source/reference/pyspark.sql/dataframe.rst
b/python/docs/source/reference/pyspark.sql/dataframe.rst
index 91cf0961318b..e61100435664 100644
--- a/python/docs/source/reference/pyspark.sql/dataframe.rst
+++ b/python/docs/source/reference/pyspark.sql/dataframe.rst
@@ -141,6 +141,7 @@ DataFrame
DataFrame.writeTo
DataFrame.mergeInto
DataFrame.pandas_api
+ DataFrame.zipWithIndex
DataFrameNaFunctions.drop
DataFrameNaFunctions.fill
DataFrameNaFunctions.replace
diff --git a/python/docs/source/reference/pyspark.sql/datasource.rst
b/python/docs/source/reference/pyspark.sql/datasource.rst
index 453875de9336..bb52ef26d94f 100644
--- a/python/docs/source/reference/pyspark.sql/datasource.rst
+++ b/python/docs/source/reference/pyspark.sql/datasource.rst
@@ -35,10 +35,12 @@ Python Data Source
DataSourceReader.read
DataSourceRegistration.register
DataSourceStreamReader.commit
+ DataSourceStreamReader.getDefaultReadLimit
DataSourceStreamReader.initialOffset
DataSourceStreamReader.latestOffset
DataSourceStreamReader.partitions
DataSourceStreamReader.read
+ DataSourceStreamReader.reportLatestOffset
DataSourceStreamReader.stop
DataSourceWriter.abort
DataSourceWriter.commit
diff --git a/python/docs/source/reference/pyspark.sql/io.rst
b/python/docs/source/reference/pyspark.sql/io.rst
index 0554e4bea89d..3aafb9571314 100644
--- a/python/docs/source/reference/pyspark.sql/io.rst
+++ b/python/docs/source/reference/pyspark.sql/io.rst
@@ -24,6 +24,7 @@ Input/Output
.. autosummary::
:toctree: api/
+ DataFrameReader.changes
DataFrameReader.csv
DataFrameReader.format
DataFrameReader.jdbc
diff --git a/python/docs/source/reference/pyspark.ss/io.rst
b/python/docs/source/reference/pyspark.ss/io.rst
index 7a20777fdc7c..38e15cb23f89 100644
--- a/python/docs/source/reference/pyspark.ss/io.rst
+++ b/python/docs/source/reference/pyspark.ss/io.rst
@@ -25,10 +25,12 @@ Input/Output
.. autosummary::
:toctree: api/
+ DataStreamReader.changes
DataStreamReader.csv
DataStreamReader.format
DataStreamReader.json
DataStreamReader.load
+ DataStreamReader.name
DataStreamReader.option
DataStreamReader.options
DataStreamReader.orc