(spark-connect-swift) branch main updated: [SPARK-57146] Add `AGENTS.md` and symlink `CLAUDE.md` to it

dongjoon Fri, 29 May 2026 10:04:32 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon-hyun pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-swift.git



The following commit(s) were added to refs/heads/main by this push:
     new 1282233  [SPARK-57146] Add `AGENTS.md` and symlink `CLAUDE.md` to it
1282233 is described below

commit 12822332efe87759df51ea750a828d862d76c6e0
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Fri May 29 10:04:17 2026 -0700

    [SPARK-57146] Add `AGENTS.md` and symlink `CLAUDE.md` to it
    
    ### What changes were proposed in this pull request?
    
    This PR adds `AGENTS.md`, a guide for AI coding agents (project overview, layout, build/test, conventions, contribution workflow), and adds `CLAUDE.md` as a relative symlink to it so both share one source of truth.
    
    ### Why are the changes needed?
    
    To give AI coding agents accurate, project-specific context in a single place.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Documentation-only change. Verified the `CLAUDE.md` symlink resolves to `AGENTS.md`.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Opus 4.8)
    
    Closes #398 from dongjoon-hyun/SPARK-57146.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 AGENTS.md | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 CLAUDE.md |   1 +
 2 files changed, 129 insertions(+)
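
The commit message reports that the `CLAUDE.md` symlink was verified to resolve to `AGENTS.md`, but the exact commands are not recorded. A minimal sketch of such a check, assuming a POSIX shell with `readlink` and `cmp` (both available on Linux and macOS), run in a throwaway directory with placeholder file contents rather than the committed `AGENTS.md`:

```bash
# Work in a throwaway directory so no real checkout is touched.
workdir=$(mktemp -d)
cd "$workdir"
printf '# Agent Guide\n' > AGENTS.md   # placeholder content, not the real guide

# Relative target: the link stores the bare name "AGENTS.md", so it keeps
# resolving no matter where the repository is cloned.
ln -s AGENTS.md CLAUDE.md

# 1) CLAUDE.md is a symlink, 2) its stored target is the relative name,
# 3) reading through it yields exactly the bytes of AGENTS.md.
[ -L CLAUDE.md ] && echo "CLAUDE.md is a symlink"
echo "target: $(readlink CLAUDE.md)"
cmp -s CLAUDE.md AGENTS.md && echo "contents match"
```

An absolute target (for example `ln -s "$PWD/AGENTS.md" CLAUDE.md`) would also pass these checks locally but dangle in any clone at a different path, which is why the relative form matters.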

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..9ef64b4
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,128 @@
+# Agent Guide — Apache Spark Connect Client for Swift
+
+Guidance for AI coding agents working in this repository. For end-user usage,
+see [README.md](README.md).
+
+## Project overview
+
+`SparkConnect` is a modern Swift client library for the
+[Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html)
+protocol, developed as a subproject of Apache Spark. It lets Swift applications
+drive a remote Apache Spark cluster (DataFrame/SQL operations, streaming,
+catalog, ML) over gRPC, exchanging results in Apache Arrow format.
+
+- Single library product: `SparkConnect`.
+- Public API mirrors PySpark/Spark SQL (`SparkSession`, `DataFrame`,
+  `DataFrameReader`/`Writer`, `Catalog`, `RuntimeConf`, streaming, etc.).
+- License: Apache-2.0. JIRA project: `SPARK` (issues.apache.org/jira).
+
+## Tech stack & requirements
+
+- **Swift 6.3.2+** with Swift Package Manager (`swift-tools-version: 6.3.2`).
+- **Platforms**: macOS 15+, iOS 18+, watchOS 11+, tvOS 18+, and Linux
+  (built and tested on Ubuntu x86_64 & arm64 with the Swift 6.3.2 toolchain).
+- **Server**: Apache Spark 4.x Connect server (tested against 4.0.2 / 4.1.2 / 4.2.0-preview).
+- **Dependencies** (all pinned with `exact` in [Package.swift](Package.swift)):
+  `grpc-swift-2`, `grpc-swift-protobuf`, `grpc-swift-nio-transport`,
+  `flatbuffers`, `swift-system`. Keep version bumps as `exact` pins, never
+  `branch`/`from` (see `git log` for `SPARK-57094`).
+
+## Repository layout
+
+- `Sources/SparkConnect/` — the library.
+  - Hand-written API: `SparkSession.swift`, `DataFrame*.swift`, `Catalog.swift`,
+    `RuntimeConf.swift`, `DataStream*.swift`, `StreamingQuery*.swift`,
+    `MergeIntoWriter.swift`, `SparkConnectClient.swift`, etc.
+  - **Generated — do not hand-edit**: `*.pb.swift` (protobuf), `base.grpc.swift`
+    (gRPC), `Flight.pb.swift`, and FlatBuffers `*_generated.swift`
+    (`File_generated`, `Message_generated`, `Schema_generated`, `Tensor_generated`,
+    `SparseTensor_generated`). These derive from upstream Spark Connect protos
+    and Arrow schemas.
+  - `Arrow*.swift` — a vendored Arrow implementation tracking
+    [apache/arrow-swift](https://github.com/apache/arrow-swift); treat as upstream.
+  - `Documentation.docc/` — DocC docs (published to Swift Package Index).
+- `Tests/SparkConnectTests/` — test suite (Swift Testing). `Resources/queries/`
+  holds golden SQL result files.
+- `Examples/` — runnable sample apps (`pi`, `spark-sql`, `stream`, `web`, `app`,
+  `pyspark-connect`), each with its own `Package.swift` and `Dockerfile`.
+- `dev/` — Python maintainer scripts (JIRA + PR merge tooling).
+- `.github/workflows/build_and_test.yml` — CI: license check, multi-platform
+  build, and integration tests.
+
+## Build
+
+```bash
+swift build                 # debug build
+swift build -c release      # what CI builds
+```
+
+## Test
+
+Tests are **integration tests**: they require a live Spark Connect server and
+run **serially**. Without a reachable server they will fail.
+
+```bash
+# Build the test target without running (quick compile check):
+swift test --filter NOTHING -c release
+
+# Run the full suite (needs a running server, see below):
+swift test --no-parallel -c release
+
+# Run a single suite:
+swift test --no-parallel --filter DataFrameTests
+```
+
+Start a local Spark Connect server first (default endpoint `sc://localhost:15002`):
+
+```bash
+docker run -it --rm -p 15002:15002 -e SPARK_NO_DAEMONIZE=1 \
+  apache/spark:4.2.0-preview5 bash -c /opt/spark/sbin/start-connect-server.sh
+```
+
+Environment variables that gate behavior:
+
+- `SPARK_REMOTE` — Spark Connect connection string (default `sc://localhost:15002`).
+- `SPARK_CONNECT_AUTHENTICATE_TOKEN` — exercises bearer-token auth.
+- `SPARK_ICEBERG_TEST_ENABLED` — enables Iceberg tests (server needs Iceberg packages).
+- `SPARK_GENERATE_GOLDEN_FILES` — regenerates golden files under `Tests/.../Resources/queries`.
+
+Use the Swift Testing framework (`import Testing`, `@Test`, `@Suite`,
+`#expect`/`#require`) — **not** XCTest. Tests use `@testable import SparkConnect`,
+and `SQLHelper` provides `withTable`/`withDatabase` scoping helpers.
+
+## Coding conventions
+
+- **ASF license header is mandatory** on source files. CI's "License Check"
+  (`skywalking-eyes`) enforces it. Swift files use the `//`-style header — copy
+  it from the top of any existing `.swift` file. Exempt paths (see
+  `.github/.licenserc.yaml`): `**/*.md`, `Package.swift`, `**/*pb.swift`,
+  `.github/**`, `Tests/.../Resources/queries/**`, `LICENSE`, `NOTICE`, `.asf.yaml`.
+- Match the surrounding style: 2-space indentation, no reformatting of untouched code.
+- Markdown: `markdownlint` config in `.markdownlint.yaml` (only `MD013`/line-length
+  is disabled).
+- Follow the project's own coding philosophy — minimal, surgical changes; no
+  speculative abstractions; every changed line should trace to the task.
+
+## Contribution workflow
+
+This repo follows standard Apache Spark process.
+
+- **One JIRA per change.** Titles use `[SPARK-XXXXX] Summary` for both commits and
+  PRs (PRs auto-link `SPARK-` ids via `.asf.yaml`).
+- **PR description in English**, filling out the
+  [.github/PULL_REQUEST_TEMPLATE](.github/PULL_REQUEST_TEMPLATE) sections: *What
+  changes*, *Why*, *Does this PR introduce any user-facing change*, *How was this
+  patch tested*, and the generative-AI disclosure.
+- **Merge style**: squash or rebase only — merge commits are disabled.
+- Maintainer scripts in `dev/` (require `JIRA_ACCESS_TOKEN`):
+  - `python dev/create_jira_and_branch.py "Title" [-p PARENT] [-t TYPE] [-v VERSION]`
+    — creates a SPARK JIRA issue and a local branch named after its id.
+  - `python dev/merge_spark_pr.py` — committer PR-merge tool.
+
+## Gotchas
+
+- Never edit generated `*.pb.swift` / `*_generated.swift` / `base.grpc.swift`
+  by hand — changes belong upstream (Spark Connect protos / Arrow).
+- Test failures are usually "no server" or version mismatch, not code bugs —
+  confirm a Spark Connect server is running and reachable at `SPARK_REMOTE`.
+- Keep dependency versions as `exact` pins.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
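
In the `CLAUDE.md` diff above, `new file mode 120000` and the lone `+AGENTS.md` line followed by `\ No newline at end of file` are how git records a symbolic link: the mode marks the index entry as a symlink, and the blob holds nothing but the target path. A minimal sketch reproducing this in a scratch repository, assuming `git` is on the `PATH`; the file contents are placeholders, not the committed guide:

```bash
# Scratch repository so no real checkout is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf '# Agent Guide\n' > AGENTS.md   # placeholder content
ln -s AGENTS.md CLAUDE.md
git add AGENTS.md CLAUDE.md

# `git ls-files -s` prints "<mode> <object> <stage>\t<path>".
# The symlink's entry uses mode 120000 instead of 100644 for regular files.
mode=$(git ls-files -s CLAUDE.md | cut -d' ' -f1)
echo "mode: $mode"

# The blob behind the link is exactly the bytes "AGENTS.md" with no trailing
# newline, hence the "\ No newline at end of file" marker in the patch.
link_blob=$(git ls-files -s CLAUDE.md | cut -d' ' -f2)
path_blob=$(printf 'AGENTS.md' | git hash-object --stdin)
[ "$link_blob" = "$path_blob" ] && echo "blob content is the bare path AGENTS.md"
```

Because the blob stores only the path, later edits to `AGENTS.md` never change `CLAUDE.md` in history. On checkouts where `core.symlinks` is `false` (common on Windows), git instead writes `CLAUDE.md` as a plain file containing the text `AGENTS.md`.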

