This is an automated email from the ASF dual-hosted git repository.
gengliangwang pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new fefb7f47dcc1 [SPARK-57108][INFRA] Skip core/utils build matrix entry for unrelated changes
fefb7f47dcc1 is described below
commit fefb7f47dcc19a565eaba7487d08f1abfec44285
Author: Gengliang Wang <[email protected]>
AuthorDate: Wed May 27 14:15:56 2026 -0700
[SPARK-57108][INFRA] Skip core/utils build matrix entry for unrelated changes
### What changes were proposed in this pull request?
Splits the `core, unsafe, kvstore, avro, utils, utils-java, network-common,
network-shuffle, repl, launcher, examples, sketch, variant` build matrix entry
in `build_and_test.yml` into two groups:
- `core, unsafe, kvstore, utils, utils-java, network-common,
network-shuffle, sketch, variant, launcher`: foundational modules under
`common/` (plus `launcher/`) whose only dependency is `tags`. These keep
their own matrix entry.
- `avro, repl, examples`: modules that are transitive dependents of
`sql`/`hive` and naturally need to run when those change. In the diff below,
`repl` and `examples` join the `mllib-local, mllib, graphx, profiler,
pipelines` entry, and `avro` joins the entry that begins with `streaming`.
Adds a new `build-core-utils` precondition (computed via `is-changed.py`
against just the foundational module list) and a matrix `exclude` rule that
drops the first entry when `build-core-utils == 'false'`. The check is opt-out:
missing/unset means run, so periodic full-build workflows that only set
`"build": "true"` continue to run this entry unchanged.
### Why are the changes needed?
Today, any PR that triggers the `build` job runs every matrix entry,
including the foundational core/utils group, even when the PR only touches SQL
or PySpark code. Because `is-changed.py` propagates changes forward (to
dependents), SQL/PySpark changes never make core/utils tests stale, so the
runner spend is wasted.
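
The forward-propagation argument above can be illustrated with a toy model. This is only a sketch: the `DEPENDENTS` graph and the `affected` helper are hypothetical and heavily simplified, and are not the actual logic or module graph used by `is-changed.py`.

```python
from collections import deque

# Hypothetical, simplified edges: module -> modules that depend on it.
DEPENDENTS = {
    "tags": ["core", "sketch"],
    "core": ["sql"],
    "sql": ["hive", "avro"],
    "hive": ["repl", "examples"],
}

# A subset of the foundational group, for illustration only.
CORE_UTILS = {"core", "sketch"}


def affected(changed):
    """Return the changed modules plus everything that depends on them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        module = queue.popleft()
        for dependent in DEPENDENTS.get(module, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# A SQL-only change reaches its dependents but never flows back to core/utils.
sql_only = affected({"sql"})
# A core change marks core itself and everything downstream of it.
core_change = affected({"core"})

print(bool(sql_only & CORE_UTILS))
print(bool(core_change & CORE_UTILS))
```

In this toy graph, a change to `sql` marks `hive`, `avro`, `repl`, and `examples` but leaves the foundational group untouched, which is the case the new precondition is meant to detect.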
After this change:
- SQL-only / PySpark-only PR → `build-core-utils=false` → the core/utils
runner is skipped. `avro`, `repl`, and `examples` still run in the matrix
entries they now belong to, because those modules are transitive dependents
of `sql`/`hive`.
- Core/utils PR → `build-core-utils=true` → entry runs as before.
- Periodic full-build workflows (e.g. `build_java21.yml`) → no
`build-core-utils` key, so the opt-out semantics keep the entry running.
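
The three cases above can be sketched as a small decision function. This is a minimal illustration of the opt-out rule, not the workflow's own code; the helper name `core_utils_entry_runs` is hypothetical, and the JSON strings stand in for the precondition output.

```python
import json


def core_utils_entry_runs(required_json: str) -> bool:
    """Return True if the core/utils matrix entry should run."""
    required = json.loads(required_json)
    # Opt-out: only an explicit "false" skips the entry.
    # A missing key (periodic full-build workflows) means run.
    return required.get("build-core-utils") != "false"


# SQL-only or PySpark-only PR: the entry is skipped.
print(core_utils_entry_runs('{"build": "true", "build-core-utils": "false"}'))
# Core/utils PR: the entry runs.
print(core_utils_entry_runs('{"build": "true", "build-core-utils": "true"}'))
# Periodic workflow that only sets "build": the entry runs.
print(core_utils_entry_runs('{"build": "true"}'))
```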
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Validated the workflow YAML parses and that the literal string used in the
`exclude` expression matches the matrix entry string exactly (so the exclude
rule actually fires).
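
A check along these lines could be sketched as follows. It is an assumption about how such a comparison might be done, not the validation the author actually ran; the `fold` helper is hypothetical and only approximates how a YAML `>-` folded scalar joins equally indented lines with single spaces.

```python
def fold(lines):
    """Approximate YAML '>-' folding for simple, equally indented lines."""
    return " ".join(line.strip() for line in lines)


# The matrix entry as written across two lines under '- >-' in the workflow.
matrix_entry = fold([
    "core, unsafe, kvstore, utils, utils-java,",
    "network-common, network-shuffle, sketch, variant, launcher",
])

# The literal used on the right-hand side of the exclude expression.
exclude_literal = (
    "core, unsafe, kvstore, utils, utils-java, network-common, "
    "network-shuffle, sketch, variant, launcher"
)

# The exclude rule only matches if both strings are identical.
print(matrix_entry == exclude_literal)
```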
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code
Closes #56152 from gengliangwang/infra-skip-core-utils-build.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 6fabcef2ff12f75aed18a8309a9efc3b465bf4ad)
Signed-off-by: Gengliang Wang <[email protected]>
---
.github/workflows/build_and_test.yml | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 531eddfc4d31..f698f78a279d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -149,9 +149,11 @@ jobs:
java25=false
fi
build=`./dev/is-changed.py -m "core,unsafe,kvstore,avro,utils,utils-java,network-common,network-shuffle,repl,launcher,examples,sketch,variant,api,catalyst,hive-thriftserver,mllib-local,mllib,graphx,streaming,sql-kafka-0-10,streaming-kafka-0-10,streaming-kinesis-asl,kubernetes,hadoop-cloud,spark-ganglia-lgpl,profiler,protobuf,yarn,connect,sql,hive,pipelines"`
+ build_core_utils=`./dev/is-changed.py -m "core,unsafe,kvstore,utils,utils-java,network-common,network-shuffle,sketch,variant,launcher"`
precondition="
{
\"build\": \"$build\",
+ \"build-core-utils\": \"$build_core_utils\",
\"pyspark\": \"$pyspark\",
\"pyspark-pandas\": \"$pandas\",
\"pyspark-install\": \"$pyspark_install\",
@@ -280,16 +282,15 @@ jobs:
# Note that the modules below are from sparktestsupport/modules.py.
modules:
- >-
- core, unsafe, kvstore, avro, utils, utils-java,
- network-common, network-shuffle, repl, launcher,
- examples, sketch, variant
+ core, unsafe, kvstore, utils, utils-java,
+ network-common, network-shuffle, sketch, variant, launcher
- >-
api, catalyst, hive-thriftserver
- >-
- mllib-local, mllib, graphx, profiler, pipelines
+ mllib-local, mllib, graphx, profiler, pipelines, repl, examples
- >-
streaming, sql-kafka-0-10, streaming-kafka-0-10, streaming-kinesis-asl,
- kubernetes, hadoop-cloud, spark-ganglia-lgpl, protobuf, connect
+ kubernetes, hadoop-cloud, spark-ganglia-lgpl, protobuf, connect, avro
- yarn
# Here, we split Hive and SQL tests into some of slow ones and the rest of them.
included-tags: [""]
@@ -333,6 +334,10 @@ jobs:
# In practice, the build will run in individual PR, but not against the individual commit
# in Apache Spark repository.
- modules: ${{ fromJson(needs.precondition.outputs.required).yarn != 'true' && 'yarn' }}
+ # Skip the core/utils group when a PR doesn't touch those modules (e.g. SQL-only or
+ # PySpark-only changes). The precondition is opt-out: omitted means run (so periodic
+ # full-build workflows that set only "build": "true" keep running this entry).
+ - modules: ${{ fromJson(needs.precondition.outputs.required).build-core-utils == 'false' && 'core, unsafe, kvstore, utils, utils-java, network-common, network-shuffle, sketch, variant, launcher' }}
env:
MODULES_TO_TEST: ${{ matrix.modules }}
EXCLUDED_TAGS: ${{ matrix.excluded-tags }}