This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 94a0cf7b9fe7 [SPARK-56866][INFRA] Pin downstream actions/checkout to a single resolved SHA
94a0cf7b9fe7 is described below
commit 94a0cf7b9fe789ce07714200bb245e3febb7626e
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue May 19 08:58:23 2026 +0800
[SPARK-56866][INFRA] Pin downstream actions/checkout to a single resolved SHA
### What changes were proposed in this pull request?
In `.github/workflows/build_and_test.yml`, add a step to the `precondition`
job that captures `git rev-parse HEAD` right after the apache/spark checkout,
exposes it as a `head_sha` output, and switch every downstream
`actions/checkout` from `ref: ${{ inputs.branch }}` to `ref: ${{
needs.precondition.outputs.head_sha }}`. The `precondition` job's own checkout
still resolves `inputs.branch`; the 11 downstream checkouts (`build`,
`infra-image`, `precompile`, `pyspark`, `sparkr`, `buf`, ` [...]
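The capture step can be tried outside GitHub Actions. The sketch below is only an illustration under stated assumptions: the throwaway repository stands in for the apache/spark checkout, `GITHUB_OUTPUT` falls back to a temp file because no runner sets it locally, and the `grep`/`cut` read-back is a stand-in for the runner's own parsing of `key=value` lines, not a description of how the runner does it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Outside GitHub Actions GITHUB_OUTPUT is unset; use a temp file in its place (assumption).
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Throwaway repository with a single empty commit, standing in for the checkout.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
  commit -q --allow-empty -m "demo commit"

# Same shape as the new workflow step: append a key=value line to the output file.
echo "head_sha=$(git -C "$repo" rev-parse HEAD)" >> "$GITHUB_OUTPUT"

# Read the value back, only to show what downstream jobs would receive.
head_sha="$(grep '^head_sha=' "$GITHUB_OUTPUT" | tail -n 1 | cut -d= -f2)"
echo "captured head_sha: $head_sha"
```

In the workflow itself the value is read through `needs.precondition.outputs.head_sha` rather than by parsing the file.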
### Why are the changes needed?
Today each `actions/checkout` step independently re-resolves `ref: ${{
inputs.branch }}` (default `master`) at the moment the runner picks it up.
Different jobs in the same workflow run can therefore end up testing different
commits.
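The drift can be reproduced locally with plain `git`. This is a minimal sketch, assuming a throwaway repository and a hypothetical branch name `demo-branch` in place of `master`; it only shows that a branch name resolves to whatever commit is current at lookup time, while a captured SHA does not move.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository standing in for apache/spark (demo only).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/demo-branch   # fixed, hypothetical branch name

commit() {
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q --allow-empty -m "$1"
}

commit "commit A"
# What an early job gets when it resolves the branch name.
early_sha="$(git rev-parse demo-branch)"

commit "commit B"   # the branch advances while the run is still in flight
# What a later job gets when it resolves the same branch name again.
late_sha="$(git rev-parse demo-branch)"

# Resolving the captured SHA instead always yields the original commit.
pinned_sha="$(git rev-parse "$early_sha")"

echo "early:  $early_sha"
echo "late:   $late_sha"
echo "pinned: $pinned_sha"
```

The `early` and `late` values differ, while `pinned` matches `early`, which is the property the downstream checkouts rely on after this change.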
**This is a long-standing issue.** `ref: ${{ inputs.branch }}` has been in
`build_and_test.yml` since commit `9e468cf010f` (SPARK-39521, 2022-06-21),
nearly four years ago. The race has existed the entire time. It usually goes
unnoticed because a typical master commit doesn't cross the JVM/Python
boundary, so even when jobs do see different commits the tests stay consistent
within each job.
**It becomes a real problem during merge bursts.** Commits per hour on
master vary wildly; release-prep windows, end-of-week merges, and APAC + EU
overlap regularly push 3–6 commits in 20 minutes. The drift window for
`pyspark` jobs is structurally ~17 minutes (`precompile` time) plus runner
queue wait — so during a merge burst the probability that at least one commit
lands inside that window approaches 1. When the unlucky commit happens to add a
tightly-coupled change — new Spark Con [...]
```
[CONNECT_INVALID_PLAN.INVALID_ONE_OF_FIELD_NOT_SET]
The Spark Connect plan is invalid. This oneOf field in
spark.connect.Relation is not set: RELTYPE_NOT_SET
```
Concrete example from 2026-05-14:
- Run
[25835824862](https://github.com/apache/spark/actions/runs/25835824862)
triggered by `e19bc35c` (SPARK-56844) — `pyspark-connect` failed with 19
NEAREST BY errors.
- Run
[25835929554](https://github.com/apache/spark/actions/runs/25835929554)
triggered ~3 minutes later by the next commit `13380e78` (SPARK-56395, which
added the NEAREST BY feature) — same job passed.
The first run's `precompile` checked out `e19bc35c` (no NEAREST BY server
code), but by the time its `pyspark-connect` job actually started 17 minutes
later, master was at `13380e78` and `actions/checkout` resolved that newer
commit (with the new Python test files). Pinning every job to the SHA
`precondition` saw makes this impossible.
The fix is also forward-leaning: as Spark's release cadence and contributor
count grow, the merge-burst probability only goes up; without pinning,
"spurious red CI on the previous PR every time someone merges a Connect
feature" will keep recurring.
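To put a rough number on the merge-burst probability, the sketch below assumes commits arrive as a Poisson process (a modeling assumption, not something measured on master) and uses the figures quoted earlier: 3 to 6 commits per 20 minutes and a ~17 minute drift window. Runner queue wait is ignored, so the results are lower bounds under that model. The helper name `p_drift` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# P(at least one commit lands in the window) = 1 - exp(-rate * window),
# where rate = commits / period. Poisson arrivals are assumed.
p_drift() {
  local commits="$1" period_min="$2" window_min="$3"
  LC_ALL=C awk -v c="$commits" -v p="$period_min" -v w="$window_min" \
    'BEGIN { printf "%.3f\n", 1 - exp(-(c / p) * w) }'
}

low="$(p_drift 3 20 17)"    # 3 commits per 20 minutes, 17-minute window
high="$(p_drift 6 20 17)"   # 6 commits per 20 minutes, 17-minute window

echo "3 commits / 20 min: $low"    # prints 0.922
echo "6 commits / 20 min: $high"   # prints 0.994
```

Under these assumptions, a burst at the quoted rates makes drift in at least one late-starting job close to certain.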
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
YAML syntax validated locally. CI will exercise the change end-to-end.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)
Closes #55879 from zhengruifeng/ci-pin-checkout-sha.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 869adad659f8ce5c449daba4123f779f76b41ba6)
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_and_test.yml | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 4d7f246360d9..6c5929ad6ae6 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -65,6 +65,8 @@ jobs:
GITHUB_PREV_SHA: ${{ github.event.before }}
outputs:
required: ${{ steps.set-outputs.outputs.required }}
+ # Pinned so every downstream job checks out the same snapshot, even if `master` advances mid-run.
+ head_sha: ${{ steps.resolve-sha.outputs.head_sha }}
image_url: ${{ steps.infra-image-outputs.outputs.image_url }}
image_docs_url: ${{ steps.infra-image-docs-outputs.outputs.image_docs_url }}
image_docs_url_link: ${{ steps.infra-image-link.outputs.image_docs_url_link }}
@@ -81,6 +83,9 @@ jobs:
fetch-depth: 0
repository: apache/spark
ref: ${{ inputs.branch }}
+ - name: Resolve apache/spark HEAD SHA
+ id: resolve-sha
+ run: echo "head_sha=$(git rev-parse HEAD)" >> $GITHUB_OUTPUT
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -346,7 +351,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -464,7 +469,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -558,7 +563,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -680,7 +685,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -830,7 +835,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -919,7 +924,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -981,7 +986,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -1173,7 +1178,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -1346,7 +1351,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -1463,7 +1468,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -1531,7 +1536,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |