This is an automated email from the ASF dual-hosted git repository.

dongjoon-hyun pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
     new 87fdd88  [SPARK-57157] Harden fork CI status workflows against run/check-run race and pagination
87fdd88 is described below

commit 87fdd88259d8c25e1c50ab961436bf7297b2085b
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Fri May 29 18:42:52 2026 -0700

    [SPARK-57157] Harden fork CI status workflows against run/check-run race and pagination
    
    ### What changes were proposed in this pull request?
    
    Port the hardening fixes from Apache Spark (SPARK-57154, SPARK-57155) to this
    repository's `notify_test_workflow.yml` and `update_build_status.yml`, which were
    introduced here as a copy of Spark's fork-based CI status mechanism (SPARK-57153).
    
    `notify_test_workflow.yml`:
    
    1. When listing the fork's workflow runs, instead of blindly taking the most
       recent run (`workflow_runs[0]`) and throwing if its `head_sha` does not match
       the PR head SHA, retry (up to 3 times, 3s apart) looking for the run whose
       `head_sha` matches the PR head SHA. The listing endpoint orders by most recent,
       so the run for the just-pushed SHA may not be registered yet and a stale run
       from a previous push could be returned.
    
    2. When resolving the `Run / License Check` check-run id (used only to render a
       Check-run view link instead of the Actions view, see SPARK-37879), a missing
       check-run no longer throws. The check-run materializes later than the workflow
       run, especially when the matrix is queued, so this is now best-effort: if it
       cannot be found, the `Build` check is still created pointing at the Actions
       run URL.
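
    The two behaviors above can be sketched in isolation as follows. This is a
    minimal illustration, not the workflow's actual code: `findMatchingRun`,
    `summaryUrl`, and the injected `listRuns` callback are hypothetical names
    standing in for the inline `github.request(endpoint, params)` logic, and
    the attempt count and delay are parameters only so the sketch can be
    exercised without waiting.

    ```javascript
    // Hypothetical helper: listRuns() stands in for the fork's run-listing
    // request and is assumed to resolve to an array of { id, head_sha } objects.
    async function findMatchingRun(listRuns, headSha, attempts = 3, delayMs = 3000) {
      let sawRuns = false
      for (let attempt = 1; attempt <= attempts; attempt++) {
        let runs = []
        try {
          runs = await listRuns()
        } catch (error) {
          // Treat a failed listing as "no runs found" for this attempt.
          console.error(error)
        }
        if (runs.length > 0) {
          sawRuns = true
          // Look for the run of this exact SHA instead of taking runs[0].
          const matched = runs.find(r => r.head_sha === headSha)
          if (matched) {
            return { matched, sawRuns }
          }
        }
        if (attempt < attempts) {
          await new Promise(resolve => setTimeout(resolve, delayMs))
        }
      }
      return { matched: undefined, sawRuns }
    }

    // Hypothetical helper: prefer the check-run view when the check run was
    // found, otherwise fall back to the Actions run view.
    function summaryUrl(forkFullName, runId, checkRun) {
      const base = 'https://github.com/' + forkFullName
      return checkRun
        ? base + '/runs/' + checkRun.id
        : base + '/actions/runs/' + runId
    }
    ```

    In this sketch a caller would treat `sawRuns && !matched` as the "new
    unsynced commit" case and `!sawRuns` as the case where the fork has no
    workflow run at all, mirroring the two branches in the diff below.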
    
    `update_build_status.yml`:
    
    3. List a commit's check-runs with `github.paginate(..., per_page: 100)` instead
       of a single un-paginated request, matching `notify_test_workflow.yml`. The
       default page size is 30, so the target `Build` check could fall off the first
       page on a SHA that accumulates more check-runs than that.
    
    4. Wrap `JSON.parse(cr.output.text)` in try/catch and `continue` on failure, so a
       `Build` check with empty or malformed output text does not abort the whole
       scheduled run and block updates for every PR queued behind it.
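
    A rough, self-contained sketch of why both changes matter. The helper
    names `listAll` and `buildParams` and the `fetchPage(page, perPage)`
    callback are hypothetical; `listAll` is a hand-written stand-in that only
    imitates what paginating the check-runs listing achieves and is not the
    Octokit implementation.

    ```javascript
    // Hypothetical stand-in for pagination: fetchPage(page, perPage) is assumed
    // to resolve to an array holding at most perPage items for that page.
    async function listAll(fetchPage, perPage = 100) {
      const all = []
      for (let page = 1; ; page++) {
        const items = await fetchPage(page, perPage)
        all.push(...items)
        // A short (or empty) page means there is nothing left to fetch.
        if (items.length < perPage) {
          return all
        }
      }
    }

    // Collect the request parameters stored as JSON in each Build check,
    // skipping checks whose output text cannot be parsed.
    function buildParams(checkRuns) {
      const result = []
      for (const cr of checkRuns) {
        if (cr.name !== 'Build' || cr.conclusion === 'action_required') {
          continue
        }
        try {
          result.push(JSON.parse(cr.output.text))
        } catch (error) {
          // One bad check must not abort processing of the remaining ones.
          console.error('Skipping Build check ' + cr.id + ' with unparseable output text')
          continue
        }
      }
      return result
    }
    ```

    With 45 check-runs on a SHA and the `Build` check listed last, a single
    request at the default page size of 30 never sees it, while collecting
    every page does; a `Build` check whose text is empty or not JSON is logged
    and skipped rather than ending the loop.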
    
    ### Why are the changes needed?
    
    The race conditions previously left a PR with no `Build` check at all, and the
    scheduled updater only syncs existing checks, so the PR had no status reported
    until the next push. The pagination and parsing issues silently block status
    updates, leaving PRs stuck in `queued`.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI infrastructure only.
    
    ### How was this patch tested?
    
    Static verification: the embedded `actions/github-script` bodies pass
    `node --check`, and the workflow YAML parses.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Claude Opus 4.8)
    
    Closes #699 from viirya/SPARK-57157-harden-ci-status.
    
    Authored-by: Liang-Chi Hsieh <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .github/workflows/notify_test_workflow.yml | 94 +++++++++++++++++++-----------
 .github/workflows/update_build_status.yml  | 36 +++++++++---
 2 files changed, 88 insertions(+), 42 deletions(-)

diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml
index 03debf0..2a99587 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -61,22 +61,48 @@ jobs:
             console.log('Ref: ' + context.payload.pull_request.head.ref)
             console.log('SHA: ' + context.payload.pull_request.head.sha)
 
+            const name = 'Build'
+            const head_sha = context.payload.pull_request.head.sha
+            let status = 'queued'
+
             // Wait 3 seconds to make sure the fork repository triggered a workflow.
             await new Promise(r => setTimeout(r, 3000))
 
-            let runs
-            try {
-              runs = await github.request(endpoint, params)
-            } catch (error) {
-              console.error(error)
-              // Assume that runs were not found.
+            // The workflow run for this exact SHA may not be registered yet, and the
+            // listing endpoint orders by most recent, so blindly taking the first run
+            // can return a stale run from a previous push. Re-query a few times looking
+            // for the run whose head_sha matches this PR's head SHA before giving up.
+            let matched_run
+            let any_runs = false
+            let retryCount = 0
+            while (retryCount < 3) {
+              let runs
+              try {
+                runs = await github.request(endpoint, params)
+              } catch (error) {
+                console.error(error)
+                // Assume that runs were not found.
+              }
+              if (runs && runs.data.workflow_runs.length > 0) {
+                any_runs = true
+                matched_run = runs.data.workflow_runs.find(r => r.head_sha == head_sha)
+                if (matched_run) {
+                  break
+                }
+              }
+              retryCount++
+              if (retryCount < 3) {
+                await new Promise(resolve => setTimeout(resolve, 3000))
+              }
             }
 
-            const name = 'Build'
-            const head_sha = context.payload.pull_request.head.sha
-            let status = 'queued'
+            // If we saw runs but none matched the PR head SHA, a newer commit was pushed
+            // and a fresh notify run will handle it; nothing useful to report for this SHA.
+            if (any_runs && !matched_run) {
+              throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
+            }
 
-            if (!runs || runs.data.workflow_runs.length === 0) {
+            if (!matched_run) {
               status = 'completed'
               const conclusion = 'action_required'
 
@@ -108,18 +134,26 @@ jobs:
                 }
               })
             } else {
-              const run_id = runs.data.workflow_runs[0].id
+              const run_id = matched_run.id
 
-              if (runs.data.workflow_runs[0].head_sha != context.payload.pull_request.head.sha) {
-                throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
-              }
+              const actions_url = 'https://github.com/'
+                + context.payload.pull_request.head.repo.full_name
+                + '/actions/runs/'
+                + run_id
+              console.log('Actions URL: ' + actions_url)
 
-              // Here we get check run ID to provide Check run view instead of Actions view, see also SPARK-37879.
+              // Here we get the check run ID to provide a Check run view instead of the
+              // Actions view, see also SPARK-37879. The check run may not have materialized
+              // yet (it is created later than the workflow run, especially when the matrix
+              // is queued), so this is best-effort: if it cannot be found, we fall back to
+              // the Actions run URL rather than failing and leaving the PR with no Build
+              // check for the scheduled updater to sync.
               let retryCount = 0;
               let check_run_head;
               while (retryCount < 3) {
                 const check_runs = await github.request(check_run_endpoint, check_run_params);
-                check_run_head = check_runs.data.check_runs.find(r => r.name === "Run / License Check");
+                check_run_head = check_runs.data.check_runs.find(
+                  r => r.name === "Run / License Check" && r.head_sha == head_sha);
                 if (check_run_head) {
                   break;
                 }
@@ -128,26 +162,18 @@ jobs:
                   await new Promise(resolve => setTimeout(resolve, 3000));
                 }
               }
-              if (!check_run_head) {
-                throw new Error('Failed to retrieve check_run_head after 3 attempts');
-              }
 
-              if (check_run_head.head_sha != context.payload.pull_request.head.sha) {
-                throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
+              let summary_url = actions_url
+              if (check_run_head) {
+                summary_url = 'https://github.com/'
+                  + context.payload.pull_request.head.repo.full_name
+                  + '/runs/'
+                  + check_run_head.id
+                console.log('Check run URL: ' + summary_url)
+              } else {
+                console.log('Check run not found; falling back to Actions URL: ' + actions_url)
               }
 
-              const check_run_url = 'https://github.com/'
-                + context.payload.pull_request.head.repo.full_name
-                + '/runs/'
-                + check_run_head.id
-              console.log('Check run URL: ' + check_run_url)
-
-              const actions_url = 'https://github.com/'
-                + context.payload.pull_request.head.repo.full_name
-                + '/actions/runs/'
-                + run_id
-              console.log('Actions URL: ' + actions_url)
-
               github.rest.checks.create({
                 owner: context.repo.owner,
                 repo: context.repo.repo,
@@ -156,7 +182,7 @@ jobs:
                 status: status,
                 output: {
                   title: 'Test results',
-                  summary: '[See test results](' + check_run_url + ')\n\n'
+                  summary: '[See test results](' + summary_url + ')\n\n'
                     + 'If the tests fail for reasons unrelated to this pull request, '
                     + 'please rerun the workflow in your forked repository.\n'
                     + 'If the failures are related to this pull request, '
diff --git a/.github/workflows/update_build_status.yml b/.github/workflows/update_build_status.yml
index 26ab78f..6b16c59 100644
--- a/.github/workflows/update_build_status.yml
+++ b/.github/workflows/update_build_status.yml
@@ -53,17 +53,37 @@ jobs:
                 console.log('SHA: ' + pr.head.sha)
                 console.log('  Mergeable status: ' + pr.mergeable_state)
                 if (pr.mergeable_state == null || maybeReady.includes(pr.mergeable_state)) {
-                  const checkRuns = await github.request('GET /repos/{owner}/{repo}/commits/{ref}/check-runs', {
-                    owner: context.repo.owner,
-                    repo: context.repo.repo,
-                    ref: pr.head.sha
-                  })
+                  // Paginate with per_page=100 to match notify_test_workflow.yml. The default
+                  // page size is 30, and a SHA can accumulate more check-runs than that (CI
+                  // matrix, external checks, duplicate Build checks from reopened PRs), which
+                  // could push the target Build check off the first page and leave the PR
+                  // stuck in 'queued' forever.
+                  const checkRuns = await github.paginate(
+                    'GET /repos/{owner}/{repo}/commits/{ref}/check-runs',
+                    {
+                      owner: context.repo.owner,
+                      repo: context.repo.repo,
+                      ref: pr.head.sha,
+                      per_page: 100
+                    }
+                  )
 
                   // Iterator GitHub Checks in the PR
-                  for await (const cr of checkRuns.data.check_runs) {
+                  for await (const cr of checkRuns) {
                     if (cr.name == 'Build' && cr.conclusion != "action_required") {
-                      // text contains parameters to make request in JSON.
-                      const params = JSON.parse(cr.output.text)
+                      // text contains parameters to make request in JSON. A Build check
+                      // created by something other than notify_test_workflow.yml (an older
+                      // version, a manual run, or another app) may have empty or malformed
+                      // output text; skip it instead of aborting the whole scheduled run,
+                      // which would block updates for every PR queued behind it.
+                      let params
+                      try {
+                        params = JSON.parse(cr.output.text)
+                      } catch (error) {
+                        console.error('Skipping Build check ' + cr.id + ' with unparseable output text')
+                        console.error(error)
+                        continue
+                      }
 
                       // Get the workflow run in the forked repository
                       let run

