andygrove opened a new pull request, #4097: URL: https://github.com/apache/datafusion-comet/pull/4097
## Which issue does this PR close?

<!-- no tracking issue yet -->

## Rationale for this change

Adds the infrastructure needed to compile Comet against Spark 4.1 without enabling it in any CI workflow. This is a preparation PR that lets the bigger Spark 4.1 enablement work (diff file, test matrix entries, plan-stability golden files, docs) land in subsequent PRs against a code base that already has the profile and shims in place.

After this PR, `./mvnw -Pspark-4.1 test-compile` succeeds locally; the existing 3.4 / 3.5 / 4.0 profiles continue to build unchanged.

## What changes are included in this PR?

**New `spark-4.1` Maven profile** in `pom.xml` and `spark/pom.xml`. Pins versions to match what Spark 4.1.1 itself ships with (`scala.version=2.13.17`, `parquet.version=1.16.0`, `slf4j.version=2.0.17`, `jetty=11.0.26`). The iceberg-spark-runtime artifact for Spark 4.1 isn't published yet, so the profile reuses `iceberg-spark-runtime-4.0_2.13:1.10.0` until it is.

**New shim source trees** at `spark/src/main/spark-4.1/`, `common/src/main/spark-4.1/`, and `spark/src/test/spark-4.1/`, mirroring their `spark-4.0` counterparts. Three 4.1-specific deltas inside the trees:

- `CometExprShim.binaryOutputStyle` no longer parses `BINARY_OUTPUT_STYLE` as a string. Spark 4.1 made it an `enumConf`, so `getConf` already returns the enum.
- `CometEvalModeUtil.sumEvalMode` reads `s.evalContext.evalMode`. Spark 4.1 wrapped `Sum`'s `EvalMode` in a `NumericEvalContext`.
- `CometTypeShim.isVariantStruct` wires through to `VariantMetadata.isVariantStruct`, matching the helper #4084 added to the spark-4.0 shim.

**No-op refactors that prepare existing profiles for the new shim** (these compile to identical bytecode on 3.x/4.0 today):

- `CometEvalModeUtil.sumEvalMode` helper added to the spark-3.4 / spark-3.5 / spark-4.0 type shims. `aggregates.scala` updated to call `CometEvalModeUtil.sumEvalMode(sum)` instead of `sum.evalMode`.
  Lets the spark-4.1 shim provide the alternate `s.evalContext.evalMode` form.
- New `MapStatusHelper.scala` wraps `MapStatus.apply`, so the three Java shuffle-writer call sites (in `CometBypassMergeSortShuffleWriter` and `CometUnsafeShuffleWriter`) don't need to see Scala default parameters. Required because Spark 4.1 added a `checksumVal: Long = 0` default.
- `CometShuffleManager` swaps `Collections.emptyMap()` for `new ConcurrentHashMap[Int, OpenHashSet[Long]]()` in the reflective `IndexShuffleBlockResolver` constructor lookup. `ConcurrentHashMap` is also a `java.util.Map`, so 3.x/4.0 still resolve; required because Spark 4.1 changed the third constructor parameter type from `Map` to `ConcurrentMap`.
- `isSpark41Plus` added to `CometSparkSessionExtensions`. Not yet referenced anywhere; lands here so downstream PRs can use it without churn.

**Out of scope** (intentionally deferred to follow-up PRs):

- `dev/diffs/4.1.1.diff` for Spark SQL Tests
- Any `.github/workflows/*` matrix entries for 4.1
- Plan-stability golden files for 4.1 and the corresponding `CometPlanStabilitySuite` branch
- Docs additions for the new-version workflow
- Runtime-only Spark 4.1 fixes (e.g. `CometNativeWriteExec` `FileNameSpec`, `REMAINDER_BY_ZERO` test branching)

## How are these changes tested?

Locally:

- `./mvnw test-compile -Pspark-4.1` passes.
- `./mvnw test-compile -Pspark-3.5 -Pscala-2.13` passes.
- `./mvnw test-compile -Pspark-4.0` passes.

CI runs the existing 3.4 / 3.5 / 4.0 PR-build matrices, which exercise the no-op refactors. `-Pspark-4.1` is intentionally not added to any matrix in this PR.
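As a side note on the `CometShuffleManager` change above: the swap works because a single `ConcurrentHashMap` instance satisfies both the pre-4.1 (`Map`) and 4.1 (`ConcurrentMap`) constructor shapes. The sketch below illustrates the idea; the `ResolverV40` / `ResolverV41` stand-in classes and the lookup loop are hypothetical, not Comet's actual reflection code:

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReflectiveLookupDemo {

    // Hypothetical stand-ins for the two IndexShuffleBlockResolver
    // constructor shapes: pre-4.1 accepts a java.util.Map, while 4.1
    // requires a java.util.concurrent.ConcurrentMap.
    static class ResolverV40 {
        ResolverV40(Map<Integer, Object> taskIds) {}
    }

    static class ResolverV41 {
        ResolverV41(ConcurrentMap<Integer, Object> taskIds) {}
    }

    /** Try the 4.1 signature first, then fall back to the pre-4.1 one. */
    static Object construct(Class<?> cls) throws Exception {
        // One ConcurrentHashMap argument works for both lookups, because
        // ConcurrentHashMap implements ConcurrentMap, which extends Map.
        ConcurrentHashMap<Integer, Object> arg = new ConcurrentHashMap<>();
        for (Class<?> paramType : new Class<?>[] {ConcurrentMap.class, Map.class}) {
            try {
                Constructor<?> ctor = cls.getDeclaredConstructor(paramType);
                return ctor.newInstance(arg);
            } catch (NoSuchMethodException ignored) {
                // this class doesn't declare that signature; keep trying
            }
        }
        throw new NoSuchMethodException("no matching constructor on " + cls);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(construct(ResolverV40.class).getClass().getSimpleName());
        System.out.println(construct(ResolverV41.class).getClass().getSimpleName());
    }
}
```

This is also why `Collections.emptyMap()` had to go: an empty immutable `Map` is not a `ConcurrentMap`, so it can only ever match the older signature.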
To unsubscribe, e-mail: [email protected]. For queries about this service, please contact Infrastructure at: [email protected].
