This is an automated email from the ASF dual-hosted git repository.

zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 5ec09fee3bf2 [SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job
5ec09fee3bf2 is described below

commit 5ec09fee3bf2696e672fe1f9cd141a60c20bcd25
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Jun 4 13:38:23 2026 +0800

    [SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job
    
    ### What changes were proposed in this pull request?
    
    This PR wires the `tpcds-1g` job in `.github/workflows/build_and_test.yml` to consume the shared `precompile` artifact, extending the pattern already applied to `docker-integration-tests` and `k8s-integration-tests` ([SPARK-57069](https://issues.apache.org/jira/browse/SPARK-57069); parent [SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830)).
    
    Concretely:
    
    - The `precompile` job's `if:` gate is extended to also fire when `tpcds-1g == 'true'` in the precondition output, so the artifact is available whenever the `tpcds-1g` job runs.
    - `tpcds-1g`:
      - `needs: precondition` -> `needs: [precondition, precompile]`
      - `if:` extended with `(!cancelled()) &&` so the job still runs when `precompile` fails or is skipped, unless the workflow run itself is cancelled.
      - Adds "Download precompiled artifact" + "Extract precompiled artifact" steps after Java install, with graceful fallback (`continue-on-error: true`).
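
The effect of the new `tpcds-1g` `if:` condition can be modelled with a small shell function. This is only a sketch of the gating logic under stated assumptions, not how GitHub Actions evaluates expressions: the function name `tpcds_job_decision` and its three arguments are hypothetical stand-ins for the workflow's cancellation state, the `tpcds-1g` flag from the precondition output, and `needs.precompile.result`.

```shell
# Hypothetical model of:
#   if: (!cancelled()) && fromJson(...).tpcds-1g == 'true'
# Prints "run" or "skip".
tpcds_job_decision() {
  workflow_cancelled="$1"   # "true" if the workflow run was cancelled
  tpcds_required="$2"       # value of the tpcds-1g flag from precondition
  precompile_result="$3"    # success | failure | skipped (deliberately ignored)

  # The precompile result is never consulted here: the job is gated only on
  # the run not being cancelled and on tpcds-1g being required.
  if [ "$workflow_cancelled" != "true" ] && [ "$tpcds_required" = "true" ]; then
    echo "run"
  else
    echo "skip"
  fi
}
```

For example, `tpcds_job_decision false true failure` prints `run`, while `tpcds_job_decision true true success` prints `skip`.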
    
    The `tpcds-1g` job drives SBT directly via `build/sbt "sql/testOnly ..."` (and `build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..."` on a TPC-DS data cache miss), so it does not go through `dev/run-tests.py` and needs no `SKIP_SCALA_BUILD` flag -- the same situation as `k8s-integration-tests`. The first SBT invocation otherwise compiles `sql/core` (main + test) from scratch. The `precompile` job already runs `Test/package`, which compiles the `sql/core` test classes thi [...]
    
    ### Optional: graceful fallback if precompile fails
    
    Same pattern as the prior consumers:
    - `precompile` keeps `continue-on-error: true`.
    - The "Download precompiled artifact" step is gated on `needs.precompile.result == 'success'` and has `continue-on-error: true`.
    - "Extract precompiled artifact" is gated on the download succeeding and has `continue-on-error: true`.
    - If extraction fails or the artifact is missing, SBT compiles from scratch exactly as before.
    
    In the worst case, the job degrades to the pre-PR behavior instead of failing the workflow.
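
The fallback chain described above can be sketched as a standalone shell function. This is an illustrative model only, not the workflow's implementation: the function name `restore_precompiled` and its arguments are hypothetical, and a file-existence check stands in for the download step's outcome. The archive name `compile-artifact.tar.gz` is taken from the diff in this commit.

```shell
# Prints "reuse" when the precompiled archive was extracted into the current
# directory, or "scratch" when SBT would have to compile everything itself.
# Always returns 0, mirroring continue-on-error: true.
restore_precompiled() {
  precompile_result="$1"                     # stand-in for needs.precompile.result
  archive="${2:-compile-artifact.tar.gz}"    # archive name used in the diff

  # Gate 1: only attempt a restore when the precompile job succeeded.
  if [ "$precompile_result" != "success" ]; then
    echo "scratch"
    return 0
  fi

  # Gate 2: a failed download leaves no archive behind (assumption of this sketch).
  if [ ! -f "$archive" ]; then
    echo "scratch"
    return 0
  fi

  # Gate 3: a corrupt archive must not fail the job.
  if tar -xzf "$archive" 2>/dev/null; then
    rm -f "$archive"
    echo "reuse"
  else
    echo "scratch"
  fi
}
```

Every path that does not end in a successful extraction prints `scratch`, which corresponds to the job compiling with SBT exactly as it did before this change.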
    
    Note: the existing `# Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen job of benchmark.yml as well` comment refers to TPC-DS data-generation parameters (scale factor, `tpcds-kit` ref, `GenTPCDSData` args). This PR changes none of those -- it only adds build-artifact reuse, and `benchmark.yml` is a standalone workflow with no shared `precompile` job -- so no corresponding change is needed there.
    
    ### Why are the changes needed?
    
    Today every run of `build_and_test.yml` that requires `tpcds-1g` repeats the `sql/core` SBT compile that the `precompile` job has already performed for `pyspark` / `sparkr` / `build` / docker / k8s. Wiring `tpcds-1g` to the existing artifact removes that duplicate compile at no extra cost, because `precompile` is already running.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI infrastructure change only.
    
    ### How was this patch tested?
    
    The change is exercised by the CI run of this PR itself. The Download/Extract steps log the artifact size; if the precompile job is forced to fail (or its artifact is missing), the job falls back to the original local SBT build.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Opus 4.7)
    
    Closes #56200 from zhengruifeng/precompile-tpcds-ci-share-dev5.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_and_test.yml | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 860ef27447f9..cda9636a92e4 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -580,7 +580,8 @@ jobs:
         fromJson(needs.precondition.outputs.required).pyspark-install == 'true' ||
         fromJson(needs.precondition.outputs.required).sparkr == 'true' ||
         fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true' ||
-        fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true')
+        fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true' ||
+        fromJson(needs.precondition.outputs.required).tpcds-1g == 'true')
     name: "Precompile Spark"
     runs-on: ubuntu-latest
     timeout-minutes: 60
@@ -1450,8 +1451,8 @@ jobs:
 
   # Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen job of benchmark.yml as well
   tpcds-1g:
-    needs: precondition
-    if: fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
+    needs: [precondition, precompile]
+    if: (!cancelled()) && fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
     name: Run TPC-DS queries with SF=1
     runs-on: ubuntu-latest
     timeout-minutes: 120
@@ -1494,6 +1495,20 @@ jobs:
       with:
         distribution: zulu
         java-version: ${{ inputs.java }}
+    - name: Download precompiled artifact
+      id: download-precompiled
+      if: needs.precompile.result == 'success'
+      continue-on-error: true
+      uses: actions/download-artifact@v8
+      with:
+        name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+    - name: Extract precompiled artifact
+      id: extract-precompiled
+      if: steps.download-precompiled.outcome == 'success'
+      continue-on-error: true
+      run: |
+        tar -xzf compile-artifact.tar.gz
+        rm compile-artifact.tar.gz
     - name: Cache TPC-DS generated data
       id: cache-tpcds-sf-1
       uses: actions/cache@v5


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
