(spark-connect-swift) branch main updated: [SPARK-57090] Make `Documentation` up-to-date

dongjoon Tue, 26 May 2026 15:48:45 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon-hyun pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-swift.git



The following commit(s) were added to refs/heads/main by this push:
     new ffe3de2  [SPARK-57090] Make `Documentation` up-to-date
ffe3de2 is described below

commit ffe3de220f97cb363d27a1cd091597b96e6b7169
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Tue May 26 15:48:31 2026 -0700

    [SPARK-57090] Make `Documentation` up-to-date
    
    ### What changes were proposed in this pull request?
    
    This PR updates `Sources/SparkConnect/Documentation.docc` to surface the public APIs added during SPARK-57044 ~ SPARK-57087.
    
    - Add `Catalog.md`, `DataFrameReader.md`, `DataFrameWriter.md` topic-curation pages.
    - Expand the top-level `## Topics` in `SparkConnect.md` to cover all major public types.
    - Add missing members (`version`, `newSession()`, `table(_:)`, `readStream`, `addArtifact`, `executeCommand`, `time`, `streams`, etc.) to `SparkSession.md`.
    - Add a "Catalog Operations" example to `GettingStarted.md`.
    
    ### Why are the changes needed?
    
    The DocC bundle had not been updated for the recently added `Catalog` methods and `DataFrameReader` overloads, leaving them either unlisted or auto-rendered without category structure.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    - `swift build` passes.
    - Every DocC symbol reference in the new/edited `.md` files was cross-checked against the actual public signatures in the source.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (claude-opus-4-7)
    
    Closes #396 from dongjoon-hyun/SPARK-57090.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 Sources/SparkConnect/Documentation.docc/Catalog.md | 86 ++++++++++++++++++++++
 .../Documentation.docc/DataFrameReader.md          | 62 ++++++++++++++++
 .../Documentation.docc/DataFrameWriter.md          | 67 +++++++++++++++++
 .../Documentation.docc/GettingStarted.md           | 19 +++++
 .../Documentation.docc/SparkConnect.md             | 27 ++++++-
 .../Documentation.docc/SparkSession.md             | 22 ++++++
 6 files changed, 282 insertions(+), 1 deletion(-)

diff --git a/Sources/SparkConnect/Documentation.docc/Catalog.md b/Sources/SparkConnect/Documentation.docc/Catalog.md
new file mode 100644
index 0000000..fb8b338
--- /dev/null
+++ b/Sources/SparkConnect/Documentation.docc/Catalog.md
@@ -0,0 +1,86 @@
+# ``SparkConnect/Catalog``
+
+Interface for managing catalogs, databases, tables, views, functions, partitions, and caching.
+
+## Overview
+
+`Catalog` is accessible via ``SparkSession/catalog`` and provides a programmatic way to query and manipulate the metadata layer of a Spark cluster — including catalog selection, database lifecycle, table/view discovery, function lookup, partition recovery, and table-level caching and analysis.
+
+```swift
+let spark = try await SparkSession.builder.getOrCreate()
+
+// Discover and switch
+let dbs = try await spark.catalog.listDatabases()
+try await spark.catalog.setCurrentDatabase("analytics")
+
+// Create / drop
+try await spark.catalog.createDatabase("demo", ifNotExists: true)
+try await spark.catalog.dropDatabase("demo", ifExists: true, cascade: true)
+
+// Inspect a function
+let fn = try await spark.catalog.getFunction("to_date")
+```
+
+## Topics
+
+### Catalog Management
+
+- ``currentCatalog()``
+- ``setCurrentCatalog(_:)``
+- ``listCatalogs(pattern:)``
+
+### Database Operations
+
+- ``currentDatabase()``
+- ``setCurrentDatabase(_:)``
+- ``createDatabase(_:ifNotExists:properties:)``
+- ``dropDatabase(_:ifExists:cascade:)``
+- ``listDatabases(pattern:)``
+- ``getDatabase(_:)``
+- ``databaseExists(_:)``
+
+### Table Operations
+
+- ``listTables(dbName:pattern:)``
+- ``getTable(_:)``
+- ``getTableProperties(_:)``
+- ``getCreateTableString(_:asSerde:)``
+- ``tableExists(_:)``
+- ``tableExists(_:_:)``
+- ``createTable(_:_:source:description:options:)``
+- ``dropTable(_:ifExists:purge:)``
+- ``truncateTable(_:)``
+
+### View Operations
+
+- ``listViews(dbName:pattern:)``
+- ``dropView(_:ifExists:)``
+- ``dropTempView(_:)``
+- ``dropGlobalTempView(_:)``
+
+### Function Operations
+
+- ``listFunctions(dbName:pattern:)``
+- ``getFunction(_:)``
+- ``getFunction(_:_:)``
+- ``functionExists(_:)``
+- ``functionExists(_:_:)``
+
+### Column & Partition Operations
+
+- ``listColumns(_:)``
+- ``listPartitions(_:)``
+- ``recoverPartitions(_:)``
+
+### Caching
+
+- ``cacheTable(_:_:)``
+- ``isCached(_:)``
+- ``uncacheTable(_:)``
+- ``clearCache()``
+- ``refreshTable(_:)``
+- ``refreshByPath(_:)``
+
+### Table Analysis
+
+- ``analyzeTable(_:noScan:)``
diff --git a/Sources/SparkConnect/Documentation.docc/DataFrameReader.md b/Sources/SparkConnect/Documentation.docc/DataFrameReader.md
new file mode 100644
index 0000000..8d32426
--- /dev/null
+++ b/Sources/SparkConnect/Documentation.docc/DataFrameReader.md
@@ -0,0 +1,62 @@
+# ``SparkConnect/DataFrameReader``
+
+Interface for loading a ``DataFrame`` from external storage systems.
+
+## Overview
+
+`DataFrameReader` is obtained via ``SparkSession/read``. Configure it with ``format(_:)``, ``option(_:_:)``, and ``schema(_:)``, then call a format-specific loader (e.g., ``csv(_:)``, ``orc(_:)``) or the generic ``load()`` / ``load(_:)``.
+
+```swift
+// Format-specific reader
+let csvDf = spark.read
+    .option("header", "true")
+    .option("inferSchema", "true")
+    .csv("path/to/data.csv")
+
+// Read from another DataFrame (CSV strings per row)
+let lines: DataFrame = ...
+let parsed = await spark.read.option("header", "true").csv(lines)
+
+// Generic reader
+let df = spark.read
+    .format("orc")
+    .load("path/to/data")
+```
+
+## Topics
+
+### Configuration
+
+- ``format(_:)``
+- ``option(_:_:)``
+- ``schema(_:)``
+
+### Generic Loading
+
+- ``load()``
+- ``load(_:)``
+- ``table(_:)``
+
+### CSV
+
+- ``csv(_:)``
+
+### JSON
+
+- ``json(_:)``
+
+### XML
+
+- ``xml(_:)``
+
+### Parquet
+
+- ``parquet(_:)``
+
+### ORC
+
+- ``orc(_:)``
+
+### JDBC
+
+- ``jdbc(_:_:_:)``
diff --git a/Sources/SparkConnect/Documentation.docc/DataFrameWriter.md b/Sources/SparkConnect/Documentation.docc/DataFrameWriter.md
new file mode 100644
index 0000000..0c47d38
--- /dev/null
+++ b/Sources/SparkConnect/Documentation.docc/DataFrameWriter.md
@@ -0,0 +1,67 @@
+# ``SparkConnect/DataFrameWriter``
+
+Interface for writing a ``DataFrame`` to external storage systems.
+
+## Overview
+
+`DataFrameWriter` is obtained via ``DataFrame/write``. Configure it with ``format(_:)``, ``mode(_:)``, ``option(_:_:)``, and partitioning helpers, then call a format-specific writer (e.g., ``orc(_:)``, ``csv(_:)``), ``save()``, ``saveAsTable(_:)``, or ``insertInto(_:)``.
+
+```swift
+// Format-specific writer
+try await df.write
+    .mode("overwrite")
+    .partitionBy("year", "month")
+    .orc("path/to/output")
+
+// Save as a managed table
+try await df.write
+    .mode("append")
+    .saveAsTable("events")
+```
+
+## Topics
+
+### Configuration
+
+- ``format(_:)``
+- ``mode(_:)``
+- ``option(_:_:)``
+- ``partitionBy(_:)``
+- ``bucketBy(numBuckets:_:)``
+- ``sortBy(_:)``
+- ``clusterBy(_:)``
+
+### Saving Data
+
+- ``save()``
+- ``save(_:)``
+- ``saveAsTable(_:)``
+- ``insertInto(_:)``
+
+### CSV
+
+- ``csv(_:)``
+
+### JSON
+
+- ``json(_:)``
+
+### XML
+
+- ``xml(_:)``
+
+### ORC
+
+- ``orc(_:)``
+
+### Parquet
+
+- ``parquet(_:)``
+
+### Text
+
+- ``text(_:)``
+
+### JDBC
+
+- ``jdbc(_:_:_:)``
diff --git a/Sources/SparkConnect/Documentation.docc/GettingStarted.md b/Sources/SparkConnect/Documentation.docc/GettingStarted.md
index 7397690..78e3300 100644
--- a/Sources/SparkConnect/Documentation.docc/GettingStarted.md
+++ b/Sources/SparkConnect/Documentation.docc/GettingStarted.md
@@ -99,3 +99,22 @@ csvDf.write
     .mode("overwrite")
     .orc("path/to/output")
 ```
+
+### 5. Catalog Operations
+
+```swift
+// Create / drop databases
+try await spark.catalog.createDatabase("demo", ifNotExists: true)
+
+// Discover tables, views, and functions
+let tables = try await spark.catalog.listTables(pattern: "*")
+let views  = try await spark.catalog.listViews()
+let funcs  = try await spark.catalog.listFunctions(pattern: "to_*")
+
+// Inspect a specific function
+let fn = try await spark.catalog.getFunction("to_date")
+
+// Partition maintenance and table statistics
+try await spark.catalog.recoverPartitions("my_partitioned_table")
+try await spark.catalog.analyzeTable("my_table", noScan: true)
+```
diff --git a/Sources/SparkConnect/Documentation.docc/SparkConnect.md b/Sources/SparkConnect/Documentation.docc/SparkConnect.md
index 6c1f49d..2aff7da 100644
--- a/Sources/SparkConnect/Documentation.docc/SparkConnect.md
+++ b/Sources/SparkConnect/Documentation.docc/SparkConnect.md
@@ -18,8 +18,33 @@ SparkConnect is a modern Swift library that provides a native interface to Apach
 ### Getting Started
 
 - <doc:GettingStarted>
+- <doc:Examples>
+
+### Sessions
+
 - ``SparkSession``
 
-### DataFrame Operations
+### DataFrames
 
 - ``DataFrame``
+- ``GroupedData``
+- ``Row``
+- ``StorageLevel``
+
+### Data I/O
+
+- ``DataFrameReader``
+- ``DataFrameWriter``
+- ``MergeIntoWriter``
+
+### Catalog & Configuration
+
+- ``Catalog``
+- ``RuntimeConf``
+
+### Streaming
+
+- ``DataStreamReader``
+- ``DataStreamWriter``
+- ``StreamingQuery``
+- ``StreamingQueryManager``
diff --git a/Sources/SparkConnect/Documentation.docc/SparkSession.md b/Sources/SparkConnect/Documentation.docc/SparkSession.md
index 7c2482c..914a686 100644
--- a/Sources/SparkConnect/Documentation.docc/SparkSession.md
+++ b/Sources/SparkConnect/Documentation.docc/SparkSession.md
@@ -34,17 +34,25 @@ let csvDf = spark.read.csv("path/to/file.csv")
 ### Creating Sessions
 
 - ``builder``
+- ``newSession()``
 - ``stop()``
 
+### Session Information
+
+- ``version``
+
 ### DataFrame Operations
 
 - ``emptyDataFrame``
+- ``range(_:)``
 - ``range(_:_:_:)``
 - ``sql(_:)``
+- ``table(_:)``
 
 ### Data I/O
 
 - ``read``
+- ``readStream``
 
 ### Configuration
 
@@ -63,3 +71,17 @@ let csvDf = spark.read.csv("path/to/file.csv")
 - ``interruptAll()``
 - ``interruptTag(_:)``
 - ``interruptOperation(_:)``
+
+### Artifacts & External Commands
+
+- ``addArtifact(_:)``
+- ``addArtifacts(_:)``
+- ``executeCommand(_:_:_:)``
+
+### Streaming
+
+- ``streams``
+
+### Utilities
+
+- ``time(_:)``


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
