This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 5ec09fee3bf2 [SPARK-57142][INFRA] Share SBT precompile artifact with
tpcds-1g CI job
5ec09fee3bf2 is described below
commit 5ec09fee3bf2696e672fe1f9cd141a60c20bcd25
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Jun 4 13:38:23 2026 +0800
[SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job
### What changes were proposed in this pull request?
This PR wires the `tpcds-1g` job in `.github/workflows/build_and_test.yml`
to consume the shared `precompile` artifact, extending the pattern already
applied to `docker-integration-tests` and `k8s-integration-tests`
([SPARK-57069](https://issues.apache.org/jira/browse/SPARK-57069); parent
[SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830)).
Concretely:
- The `precompile` job's `if:` gate is extended to also fire when the
precondition output has `tpcds-1g == 'true'`, so the artifact is available
whenever `tpcds-1g` runs.
- `tpcds-1g`:
- `needs: precondition` -> `needs: [precondition, precompile]`
- `if:` extended with `(!cancelled()) &&` so the job still runs when
`precompile` does not succeed (failed, skipped, or cancelled), as long as the
workflow run itself is not cancelled.
- Adds "Download precompiled artifact" + "Extract precompiled artifact"
steps after Java install, with graceful fallback (`continue-on-error: true`).
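The two conditions in the list above can be modeled in plain shell to make the resulting behavior easier to follow. This is only an illustrative sketch: `job_runs` and `download_runs` are hypothetical helpers that do not exist in the workflow, and their arguments stand in for values that GitHub Actions evaluates itself.

```shell
#!/usr/bin/env bash
# Sketch only: a plain-shell model of the two `if:` conditions.
# job_runs and download_runs are hypothetical names, not part of the workflow.

# job_runs WORKFLOW_CANCELLED TPCDS_REQUIRED
# Models: (!cancelled()) && required.tpcds-1g == 'true'
job_runs() {
  [ "$1" != "true" ] && [ "$2" = "true" ]
}

# download_runs PRECOMPILE_RESULT
# Models: needs.precompile.result == 'success'
download_runs() {
  [ "$1" = "success" ]
}

# Example: precompile failed, workflow not cancelled, tpcds-1g required.
if job_runs false true; then
  if download_runs failure; then
    echo "tpcds-1g runs and reuses the artifact"
  else
    echo "tpcds-1g runs and compiles from scratch"
  fi
fi
```

In this model the job-level gate never looks at the `precompile` result; only the download step does, which is what lets the job proceed without the artifact.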
The `tpcds-1g` job drives SBT directly via `build/sbt "sql/testOnly ..."`
(and `build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..."` on a
TPC-DS data cache miss), so it does not go through `dev/run-tests.py` and needs
no `SKIP_SCALA_BUILD` flag -- the same situation as `k8s-integration-tests`.
The first SBT invocation otherwise compiles `sql/core` (main + test) from
scratch. The `precompile` job already runs `Test/package`, which compiles the
`sql/core` test classes thi [...]
### Optional: graceful fallback if precompile fails
Same pattern as the prior consumers:
- `precompile` keeps `continue-on-error: true`.
- The "Download precompiled artifact" step is gated on
`needs.precompile.result == 'success'` and has `continue-on-error: true`.
- "Extract precompiled artifact" is gated on the download succeeding and
has `continue-on-error: true`.
- If extraction fails or the artifact is missing, SBT compiles from scratch
exactly as before.
The worst case degrades to the pre-PR behavior rather than a workflow failure.
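The extraction fallback can be approximated in a single shell function. This is a hedged sketch rather than what the workflow does: the real job relies on step-level `continue-on-error: true` instead of shell branching, and `extract_or_fallback` is a hypothetical helper introduced here for illustration. The archive name `compile-artifact.tar.gz` is taken from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the "Extract precompiled artifact" step plus its fallback.
# extract_or_fallback is a hypothetical helper, not part of the Spark workflow.

# extract_or_fallback ARCHIVE
# Prints "reused" if the archive exists and unpacks cleanly,
# otherwise prints "fallback" (SBT would then compile from scratch).
extract_or_fallback() {
  local archive="$1"
  if [ -f "$archive" ] && tar -xzf "$archive"; then
    rm -f "$archive"
    echo "reused"
  else
    echo "fallback"
  fi
}

# Usage (assumed to run from the repository root):
#   extract_or_fallback compile-artifact.tar.gz
```

Either branch returns success, which matches the intent that a missing or unreadable artifact never fails the job.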
Note: the existing `# Any TPC-DS related updates on this job need to be
applied to tpcds-1g-gen job of benchmark.yml as well` comment refers to TPC-DS
data-generation parameters (scale factor, `tpcds-kit` ref, `GenTPCDSData`
args). This PR changes none of those -- it only adds build-artifact reuse, and
`benchmark.yml` is a standalone workflow with no shared `precompile` job -- so
no corresponding change is needed there.
### Why are the changes needed?
Today every run of `build_and_test.yml` that requires `tpcds-1g` re-runs
the same `sql/core` SBT compile that the `precompile` job already produced for
`pyspark` / `sparkr` / `build` / docker / k8s. Wiring `tpcds-1g` to the
existing artifact removes that duplicate compile for free (precompile is
already running).
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure change only.
### How was this patch tested?
The change is exercised by the CI run of this PR itself. The
Download/Extract steps log the artifact size; if the precompile job is forced
to fail (or its artifact is missing), the job falls back to the original local
SBT build.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #56200 from zhengruifeng/precompile-tpcds-ci-share-dev5.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_and_test.yml | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/.github/workflows/build_and_test.yml
b/.github/workflows/build_and_test.yml
index 860ef27447f9..cda9636a92e4 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -580,7 +580,8 @@ jobs:
fromJson(needs.precondition.outputs.required).pyspark-install ==
'true' ||
fromJson(needs.precondition.outputs.required).sparkr == 'true' ||
fromJson(needs.precondition.outputs.required).docker-integration-tests
== 'true' ||
- fromJson(needs.precondition.outputs.required).k8s-integration-tests ==
'true')
+ fromJson(needs.precondition.outputs.required).k8s-integration-tests ==
'true' ||
+ fromJson(needs.precondition.outputs.required).tpcds-1g == 'true')
name: "Precompile Spark"
runs-on: ubuntu-latest
timeout-minutes: 60
@@ -1450,8 +1451,8 @@ jobs:
# Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen
job of benchmark.yml as well
tpcds-1g:
- needs: precondition
- if: fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
+ needs: [precondition, precompile]
+ if: (!cancelled()) &&
fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
name: Run TPC-DS queries with SF=1
runs-on: ubuntu-latest
timeout-minutes: 120
@@ -1494,6 +1495,20 @@ jobs:
with:
distribution: zulu
java-version: ${{ inputs.java }}
+ - name: Download precompiled artifact
+ id: download-precompiled
+ if: needs.precompile.result == 'success'
+ continue-on-error: true
+ uses: actions/download-artifact@v8
+ with:
+ name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+ - name: Extract precompiled artifact
+ id: extract-precompiled
+ if: steps.download-precompiled.outcome == 'success'
+ continue-on-error: true
+ run: |
+ tar -xzf compile-artifact.tar.gz
+ rm compile-artifact.tar.gz
- name: Cache TPC-DS generated data
id: cache-tpcds-sf-1
uses: actions/cache@v5