This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new e3e87efa115c [SPARK-57154][INFRA] Harden `notify_test_workflow` against fork run/check-run race
e3e87efa115c is described below
commit e3e87efa115c7e166a43e8d52f907f1df2eb1f1d
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Fri May 29 17:29:23 2026 -0700
[SPARK-57154][INFRA] Harden `notify_test_workflow` against fork run/check-run race
### What changes were proposed in this pull request?
This PR makes `notify_test_workflow.yml` resilient to two timing races between
the upstream `pull_request_target` notify run and the fork's CI:
1. When listing the fork's workflow runs, instead of blindly taking the most
   recent run (`workflow_runs[0]`) and throwing if its `head_sha` does not match
   the PR head SHA, the script now retries (up to 3 times, 3s apart) looking for
   the run whose `head_sha` matches the PR head SHA. The listing endpoint orders
   by most recent, so the run for the just-pushed SHA may not be registered yet
   and a stale run from a previous push could be returned.
2. When resolving the `Run / Check changes` check-run id (used only to render a
   Check-run view link instead of the Actions view, see SPARK-37879), a missing
   check-run no longer throws. The check-run materializes later than the workflow
   run, especially when the matrix is queued, so this is now best-effort: if it
   cannot be found, the `Build` check is still created pointing at the Actions
   run URL.
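The retry described in item 1 can be sketched as a small standalone function. This is a minimal illustration, not the workflow's actual code: `findRunForSha`, `listRuns`, `sleep`, `attempts`, and `delayMs` are hypothetical names, and `listRuns` stands in for the `github.request(endpoint, params)` call.

```javascript
// Hypothetical sketch: `listRuns` and `sleep` are injected stand-ins for the
// GitHub listing call and the 3-second delay, so the logic can run offline.
async function findRunForSha(listRuns, headSha, options = {}) {
  const attempts = options.attempts ?? 3
  const delayMs = options.delayMs ?? 3000
  const sleep = options.sleep ?? (ms => new Promise(r => setTimeout(r, ms)))

  let anyRuns = false
  for (let attempt = 1; attempt <= attempts; attempt++) {
    let runs = []
    try {
      runs = await listRuns()
    } catch (error) {
      // Treat a failed listing as "no runs found" for this attempt.
      console.error(error)
    }
    if (runs.length > 0) {
      anyRuns = true
      // Look for the run of this exact SHA instead of trusting runs[0].
      const matched = runs.find(r => r.head_sha === headSha)
      if (matched) {
        return { matched, anyRuns }
      }
    }
    // Only wait when another attempt will follow.
    if (attempt < attempts) {
      await sleep(delayMs)
    }
  }
  return { matched: undefined, anyRuns }
}
```

Injecting `sleep` is a choice made for this sketch only; it lets the loop be exercised without real delays.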
Behavior is otherwise preserved: when no runs exist at all, the `action_required`
("workflow run detection failed") check is still created; when runs exist but
none match the PR head SHA, the script still throws so a fresh notify run handles
the newer commit.
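The three outcomes above can be summarized as a pure function. This is a sketch under assumptions: `classifyOutcome` and the returned labels are hypothetical and do not appear in the workflow, which acts on these cases inline.

```javascript
// Hypothetical sketch of the decision the script makes after the retry loop.
// `anyRuns`: whether any listing returned at least one run.
// `matchedRun`: the run whose head_sha equals the PR head SHA, if found.
function classifyOutcome(anyRuns, matchedRun) {
  if (matchedRun) {
    // A run for this SHA exists: create the queued `Build` check.
    return 'create_build_check'
  }
  if (anyRuns) {
    // Runs exist but none match: a newer commit was pushed, so throw and
    // let the fresh notify run report on that commit.
    return 'throw_unsynced_commit'
  }
  // No runs at all: report "workflow run detection failed".
  return 'create_action_required_check'
}
```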
### Why are the changes needed?
Previously these races caused the notify run to `throw`, leaving the PR with no
`Build` check at all. Because the scheduled `update_build_status.yml` only syncs
existing `Build` checks, a PR that hit this race had no status reported and no
way for the updater to recover until the next push. Creating the check (falling
back to the Actions URL when needed) lets the updater take over.
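The URL fallback mentioned above can be illustrated with a short helper. The function name `summaryUrl` and its parameters are hypothetical; the URL shapes follow the ones built in the diff below.

```javascript
// Hypothetical sketch: prefer the Check-run view when the check run was found,
// otherwise fall back to the Actions run URL so a `Build` check can still be
// created.
function summaryUrl(forkFullName, runId, checkRun) {
  const base = 'https://github.com/' + forkFullName
  if (checkRun) {
    return base + '/runs/' + checkRun.id
  }
  return base + '/actions/runs/' + runId
}
```

For example, with a fork named `user/spark` (an illustrative value), a missing check run yields the Actions URL, while a found one yields the `/runs/` link.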
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
Static verification: the embedded `actions/github-script` body passes
`node --check`, and the workflow YAML parses. The behavior paths (matching run
found, runs present but unmatched SHA, no runs, check-run present, check-run
absent) were traced against the existing control flow.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)
Closes #56212 from viirya/SPARK-57154-notify-race.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 316b05b8b9cc6b8335cec387266b43792c9a5481)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
.github/workflows/notify_test_workflow.yml | 94 +++++++++++++++++++-----------
1 file changed, 60 insertions(+), 34 deletions(-)
diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml
index b2d0578e7b6c..1ca9e14c408b 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -61,22 +61,48 @@ jobs:
console.log('Ref: ' + context.payload.pull_request.head.ref)
console.log('SHA: ' + context.payload.pull_request.head.sha)
+ const name = 'Build'
+ const head_sha = context.payload.pull_request.head.sha
+ let status = 'queued'
+
// Wait 3 seconds to make sure the fork repository triggered a workflow.
await new Promise(r => setTimeout(r, 3000))
- let runs
- try {
- runs = await github.request(endpoint, params)
- } catch (error) {
- console.error(error)
- // Assume that runs were not found.
+ // The workflow run for this exact SHA may not be registered yet, and the
+ // listing endpoint orders by most recent, so blindly taking the first run
+ // can return a stale run from a previous push. Re-query a few times looking
+ // for the run whose head_sha matches this PR's head SHA before giving up.
+ let matched_run
+ let any_runs = false
+ let retryCount = 0
+ while (retryCount < 3) {
+ let runs
+ try {
+ runs = await github.request(endpoint, params)
+ } catch (error) {
+ console.error(error)
+ // Assume that runs were not found.
+ }
+ if (runs && runs.data.workflow_runs.length > 0) {
+ any_runs = true
+ matched_run = runs.data.workflow_runs.find(r => r.head_sha == head_sha)
+ if (matched_run) {
+ break
+ }
+ }
+ retryCount++
+ if (retryCount < 3) {
+ await new Promise(resolve => setTimeout(resolve, 3000))
+ }
}
- const name = 'Build'
- const head_sha = context.payload.pull_request.head.sha
- let status = 'queued'
+ // If we saw runs but none matched the PR head SHA, a newer commit was pushed
+ // and a fresh notify run will handle it; nothing useful to report for this SHA.
+ if (any_runs && !matched_run) {
+ throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
+ }
- if (!runs || runs.data.workflow_runs.length === 0) {
+ if (!matched_run) {
status = 'completed'
const conclusion = 'action_required'
@@ -108,18 +134,26 @@ jobs:
}
})
} else {
- const run_id = runs.data.workflow_runs[0].id
+ const run_id = matched_run.id
- if (runs.data.workflow_runs[0].head_sha != context.payload.pull_request.head.sha) {
- throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
- }
+ const actions_url = 'https://github.com/'
+ + context.payload.pull_request.head.repo.full_name
+ + '/actions/runs/'
+ + run_id
+ console.log('Actions URL: ' + actions_url)
- // Here we get check run ID to provide Check run view instead of Actions view, see also SPARK-37879.
+ // Here we get the check run ID to provide a Check run view instead of the
+ // Actions view, see also SPARK-37879. The check run may not have materialized
+ // yet (it is created later than the workflow run, especially when the matrix
+ // is queued), so this is best-effort: if it cannot be found, we fall back to
+ // the Actions run URL rather than failing and leaving the PR with no Build
+ // check for the scheduled updater to sync.
let retryCount = 0;
let check_run_head;
while (retryCount < 3) {
const check_runs = await github.request(check_run_endpoint, check_run_params);
- check_run_head = check_runs.data.check_runs.find(r => r.name === "Run / Check changes");
+ check_run_head = check_runs.data.check_runs.find(
+ r => r.name === "Run / Check changes" && r.head_sha == head_sha);
if (check_run_head) {
break;
}
@@ -128,26 +162,18 @@ jobs:
await new Promise(resolve => setTimeout(resolve, 3000));
}
}
- if (!check_run_head) {
- throw new Error('Failed to retrieve check_run_head after 3 attempts');
- }
- if (check_run_head.head_sha != context.payload.pull_request.head.sha) {
- throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
+ let summary_url = actions_url
+ if (check_run_head) {
+ summary_url = 'https://github.com/'
+ + context.payload.pull_request.head.repo.full_name
+ + '/runs/'
+ + check_run_head.id
+ console.log('Check run URL: ' + summary_url)
+ } else {
+ console.log('Check run not found; falling back to Actions URL: ' + actions_url)
}
- const check_run_url = 'https://github.com/'
- + context.payload.pull_request.head.repo.full_name
- + '/runs/'
- + check_run_head.id
- console.log('Check run URL: ' + check_run_url)
-
- const actions_url = 'https://github.com/'
- + context.payload.pull_request.head.repo.full_name
- + '/actions/runs/'
- + run_id
- console.log('Actions URL: ' + actions_url)
-
github.rest.checks.create({
owner: context.repo.owner,
repo: context.repo.repo,
@@ -156,7 +182,7 @@ jobs:
status: status,
output: {
title: 'Test results',
- summary: '[See test results](' + check_run_url + ')\n\n'
+ summary: '[See test results](' + summary_url + ')\n\n'
+ 'If the tests fail for reasons unrelated to this pull request, '
+ 'please rerun the workflow in your forked repository.\n'
+ 'If the failures are related to this pull request, '