rahulsmahadev opened a new pull request, #16751:
URL: https://github.com/apache/iceberg/pull/16751
## Summary
First step toward Spark 4.2 support: this PR adds `spark/v4.2` as a
**mechanical, byte-identical copy of `spark/v4.1`** — zero content changes, no
version bumps, no build wiring. `spark/v4.1` is untouched.
## Why a copy-only PR
The work is intentionally split into two PRs so that the follow-up PR, which
contains the actual Spark-4.2-specific changes (version bumps, API fixes, build
wiring), has a small, reviewable diff instead of being buried in ~150k lines of
copied code.
Because the copy is byte-identical, git's copy/rename detection (`git log
--follow -C`, `git blame -C -C`) links every `spark/v4.2` file back to its full
`v4.1`/`v4.0`/`v3.5` history — and this holds even under squash-merge, which
does not preserve the `git mv` + copy-back commit pairs used previously.
Verified on this branch: `git blame -C -C` on
`spark/v4.2/.../SparkCatalog.java` attributes lines to the original 2020–2024
commits, not to the copy commit.
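
The blame behaviour described above can be reproduced in a throwaway repository. The following is a minimal sketch, not the exact check run on this branch: the helper name `blame_sees_through_copy`, the `v4.1/Example.java` path, and the file contents are made up for illustration, and it assumes `git` is on the `PATH`.

```shell
# Throwaway demo: does `git blame -C -C` attribute a byte-identical copy
# to the commit that wrote the original lines? All paths here are made up.
blame_sees_through_copy() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Demo"
    git config user.email "demo@example.invalid"
    git config commit.gpgsign false

    # Original file; long enough to clear blame's default 40-character
    # threshold for cross-file copy detection.
    mkdir v4.1
    for i in 1 2 3 4 5 6 7 8; do
      echo "public static final int EXAMPLE_CONSTANT_$i = $i;"
    done > v4.1/Example.java
    git add v4.1
    git commit -q -m "original"
    original=$(git rev-parse HEAD)

    # Byte-identical copy in its own commit; v4.1 is left untouched.
    mkdir v4.2
    cp v4.1/Example.java v4.2/Example.java
    git add v4.2
    git commit -q -m "copy v4.1 as v4.2"

    # Collect the distinct commits that blame assigns lines to.
    blamed=$(git blame -C -C --line-porcelain v4.2/Example.java |
      grep -E '^[0-9a-f]{40,64} ' | cut -d' ' -f1 | sort -u)

    # Succeed only if every line maps to the original commit.
    [ -n "$original" ] && [ "$blamed" = "$original" ]
  )
  status=$?
  rm -rf "$repo"
  return $status
}

if blame_sees_through_copy; then
  echo "copy attributed to original commit"
else
  echo "copy not attributed to original commit (or git unavailable)"
fi
```

The second `-C` matters here: with it, blame also searches other files in the parent of the commit that creates the file, which is what lets it find the unmodified `v4.1` source.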
Precedent note: Spark 4.0 (#13059) and Spark 4.1 (#14155) were introduced as
`Move X as Y` + `Copy back Y as X` + `Initial support` commit triplets,
rebase-merged to preserve the rename pair. This PR deliberately uses a plain
byte-identical copy instead, which achieves the same history preservation
independent of merge strategy; the "initial support" content will come as the
follow-up PR. Related: #14984 takes the established single-PR approach for
4.2.0 (RC) — happy to coordinate or defer to whichever structure maintainers
prefer.
## Build impact: none
The new directory is invisible to the build until explicitly registered:
- `gradle.properties` gates versions via
`systemProp.knownSparkVersions=3.5,4.0,4.1` (and `defaultSparkVersions=4.1`) —
`4.2` is not listed.
- `settings.gradle` only includes spark subprojects inside explicit `if
(sparkVersions.contains("X"))` blocks; there is no globbing of `spark/*`.
- `spark/build.gradle` only does `apply from:
file("$projectDir/vX/build.gradle")` for enabled versions, so
`spark/v4.2/build.gradle` is never applied.
- The Spark CI matrix is hardcoded to `spark: ['3.5', '4.0', '4.1']`.
CI and releases are therefore unaffected until the follow-up PR wires the
module up. The RAT license check passes since every file is an identical copy
of an already-licensed file (`dev/.rat-excludes` is glob-based, not
path-specific).
## Verification
- `diff -r spark/v4.1 spark/v4.2` → empty (exit 0)
- File counts equal: 627 files in `spark/v4.1`, 627 in `spark/v4.2`, no
symlinks, all tracked
- Single commit touching only `spark/v4.2/**` (627 files, +149,822 lines)
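
The first two checks can be bundled into a small script. The sketch below is one possible form, not the commands actually used for this PR; `verify_copy` is a hypothetical helper name.

```shell
# Sketch of the copy checks; verify_copy is a hypothetical helper name.
# Usage: verify_copy SOURCE_DIR COPY_DIR
verify_copy() {
  src=$1
  dst=$2
  # Byte-for-byte comparison of both trees; diff exits 0 only if identical.
  diff -r "$src" "$dst" >/dev/null || return 1
  # Same number of regular files on both sides.
  src_count=$(find "$src" -type f | wc -l)
  dst_count=$(find "$dst" -type f | wc -l)
  [ "$src_count" -eq "$dst_count" ] || return 1
  # No symlinks in the copy.
  [ -z "$(find "$dst" -type l)" ] || return 1
  echo "identical: $((dst_count)) files"
}
```

From the repository root this would be invoked as `verify_copy spark/v4.1 spark/v4.2`. The "all tracked" check is separate: `git ls-files --others --exclude-standard -- spark/v4.2` printing nothing indicates there are no untracked files under the new directory.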
## Follow-up PR
Spark-4.2-specific changes come next, mirroring the v4.1 "initial support"
commit: add `4.2` to `knownSparkVersions`/`defaultSparkVersions`,
`settings.gradle` + `spark/build.gradle` + `jmh.gradle` wiring,
`gradle/libs.versions.toml` entries, `.github/workflows/spark-ci.yml` matrix,
`.gitignore` benchmark paths, `dev/stage-binaries.sh`, version-string bumps
inside `spark/v4.2`, and any API fixes Spark 4.2 requires.
This pull request and its description were written by Isaac.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]