rahulsmahadev opened a new pull request, #16751:
URL: https://github.com/apache/iceberg/pull/16751

   ## Summary
   
   First step toward Spark 4.2 support: this PR adds `spark/v4.2` as a 
**mechanical, byte-identical copy of `spark/v4.1`** — zero content changes, no 
version bumps, no build wiring. `spark/v4.1` is untouched.
   
   ## Why a copy-only PR
   
   The work is intentionally split into two PRs so that the follow-up PR containing the 
actual Spark-4.2-specific changes (version bumps, API fixes, build wiring) has 
a small, reviewable diff instead of being buried in ~150k lines of copied code.
   
   Because the copy is byte-identical, git's copy/rename detection (`git log 
--follow -C`, `git blame -C -C`) links every `spark/v4.2` file back to its full 
`v4.1`/`v4.0`/`v3.5` history — and this holds even under squash-merge, which 
does not preserve the `git mv` + copy-back commit pairs used previously. 
Verified on this branch: `git blame -C -C` on 
`spark/v4.2/.../SparkCatalog.java` attributes lines to the original 2020–2024 
commits, not to the copy commit.
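
The copy-detection behavior described above can be tried in a throwaway repository. The following is a minimal sketch and not part of this PR: the paths (`v4.1/Example.java`, `v4.2/Example.java`), the generated file contents, and the committer identity are placeholders, and it assumes `git` is on the `PATH`.

```shell
#!/bin/sh
# Sketch: check that `git blame -C -C` attributes the lines of a plain copy
# to the original file. All paths and identities below are placeholders.

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit with a throwaway identity so no global git config is required.
commit() {
  git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

# Commit 1: the "original" module. Copy detection in blame ignores very
# small chunks, so generate a file with a reasonable amount of text.
mkdir v4.1
i=1
while [ "$i" -le 20 ]; do
  echo "public static final int FIELD_$i = $i; // original line number $i"
  i=$((i + 1))
done > v4.1/Example.java
git add v4.1
commit "Add v4.1"

# Commit 2: a plain copy, with v4.1 left untouched.
cp -R v4.1 v4.2
git add v4.2
commit "Copy v4.1 as v4.2"

# --line-porcelain emits a `filename` header for every line, naming the
# path in the commit that the line is attributed to.
blamed_paths=$(git blame -C -C --line-porcelain v4.2/Example.java |
  grep '^filename ' | sort -u)
echo "$blamed_paths"
```

In this sketch the only path reported should be the `v4.1` one, meaning every line was traced through the copy commit back to the original file.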
   
   Precedent note: Spark 4.0 (#13059) and Spark 4.1 (#14155) were introduced as 
`Move X as Y` + `Copy back Y as X` + `Initial support` commit triplets, 
rebase-merged to preserve the rename pair. This PR deliberately uses a plain 
byte-identical copy instead, which achieves the same history preservation 
independent of merge strategy; the "initial support" content will come in the 
follow-up PR. Related: #14984 takes the established single-PR approach for 
4.2.0 (RC) — happy to coordinate or defer to whichever structure maintainers 
prefer.
   
   ## Build impact: none
   
   The new directory is invisible to the build until explicitly registered:
   
   - `gradle.properties` gates versions via 
`systemProp.knownSparkVersions=3.5,4.0,4.1` (and `defaultSparkVersions=4.1`) — 
`4.2` is not listed.
   - `settings.gradle` only includes spark subprojects inside explicit `if 
(sparkVersions.contains("X"))` blocks; there is no globbing of `spark/*`.
   - `spark/build.gradle` only does `apply from: 
file("$projectDir/vX/build.gradle")` for enabled versions, so 
`spark/v4.2/build.gradle` is never applied.
   - The Spark CI matrix is hardcoded to `spark: ['3.5', '4.0', '4.1']`.
   
   CI and releases are therefore unaffected until the follow-up PR wires the 
module up. The RAT license check passes since every file is an identical copy 
of an already-licensed file (`dev/.rat-excludes` is glob-based, not 
path-specific).
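
As a rough illustration of the version gate described above, the sketch below checks whether a version appears in the `systemProp.knownSparkVersions` line of a properties file. The helper name `is_known_spark_version` is hypothetical, the sample file is created on the fly, and the parsing is deliberately naive (a single `key=value` line, comma-separated, no whitespace handling). It is not how the Gradle build itself reads the property.

```shell
#!/bin/sh
# Sketch: succeed if version $2 is listed in systemProp.knownSparkVersions
# inside properties file $1. Hypothetical helper, not part of the build.
is_known_spark_version() {
  props_file=$1
  version=$2
  # Print only the value of the matching line (text after the '=').
  known=$(sed -n 's/^systemProp\.knownSparkVersions=//p' "$props_file")
  # Wrap in commas so that "4.1" cannot match inside e.g. "14.1".
  case ",$known," in
    *",$version,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample file mirroring the value quoted in this PR description.
props=$(mktemp)
echo 'systemProp.knownSparkVersions=3.5,4.0,4.1' > "$props"

for v in 4.1 4.2; do
  if is_known_spark_version "$props" "$v"; then
    echo "$v: registered"
  else
    echo "$v: not registered"
  fi
done
```

Against the real repository the same function could be pointed at `gradle.properties` in the checkout root.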
   
   ## Verification
   
   - `diff -r spark/v4.1 spark/v4.2` → empty (exit 0)
   - File counts equal: 627 files in `spark/v4.1`, 627 in `spark/v4.2`, no 
symlinks, all tracked
   - Single commit touching only `spark/v4.2/**` (627 files, +149,822 lines)
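
The directory-level checks above can be bundled into one function. This is a sketch assuming a POSIX shell with `diff`, `find`, `wc`, and `tr`; the name `verify_copy` is illustrative, and the "all tracked" and single-commit checks are left out because they need `git`.

```shell
#!/bin/sh
# Sketch: verify that directory $2 is a byte-identical copy of directory $1.
# Returns 0 and prints the file count on success, 1 otherwise.
verify_copy() {
  src=$1
  dst=$2

  # Recursive comparison; diff exits 0 only when nothing differs.
  if ! diff -r "$src" "$dst" > /dev/null; then
    echo "contents differ"
    return 1
  fi

  # Regular-file counts. diff -r already implies they match; the explicit
  # count mirrors the check listed above. tr strips wc's padding.
  src_count=$(find "$src" -type f | wc -l | tr -d ' ')
  dst_count=$(find "$dst" -type f | wc -l | tr -d ' ')
  if [ "$src_count" -ne "$dst_count" ]; then
    echo "file counts differ: $src_count vs $dst_count"
    return 1
  fi

  # The copy must not contain symlinks.
  if [ -n "$(find "$dst" -type l)" ]; then
    echo "symlinks found in $dst"
    return 1
  fi

  echo "identical: $dst_count files"
}
```

From the repository root it would be invoked as `verify_copy spark/v4.1 spark/v4.2`.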
   
   ## Follow-up PR
   
   Spark-4.2-specific changes come next, mirroring the v4.1 "initial support" 
commit: add `4.2` to `knownSparkVersions`/`defaultSparkVersions`, 
`settings.gradle` + `spark/build.gradle` + `jmh.gradle` wiring, 
`gradle/libs.versions.toml` entries, `.github/workflows/spark-ci.yml` matrix, 
`.gitignore` benchmark paths, `dev/stage-binaries.sh`, version-string bumps 
inside `spark/v4.2`, and any API fixes Spark 4.2 requires.
   
   This pull request and its description were written by Isaac.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

