andygrove opened a new pull request, #4097: URL: https://github.com/apache/datafusion-comet/pull/4097
## Which issue does this PR close?

<!-- no tracking issue yet -->

## Rationale for this change

Adds the infrastructure needed to compile Comet against Spark 4.1 without enabling it in any CI workflow. This is a preparation PR that lets the bigger Spark 4.1 enablement work (diff file, test matrix entries, plan-stability golden files, docs) land in subsequent PRs against a code base that already has the profile and shims in place.

After this PR, `./mvnw -Pspark-4.1 test-compile` succeeds locally; the existing 3.4 / 3.5 / 4.0 profiles continue to build unchanged.

## What changes are included in this PR?

**New `spark-4.1` Maven profile** in `pom.xml` and `spark/pom.xml`. Pins versions to match what Spark 4.1.1 itself ships with (`scala.version=2.13.17`, `parquet.version=1.16.0`, `slf4j.version=2.0.17`, `jetty=11.0.26`). The iceberg-spark-runtime artifact for Spark 4.1 isn't published yet, so the profile reuses `iceberg-spark-runtime-4.0_2.13:1.10.0` until it is.

**New shim source trees** at `spark/src/main/spark-4.1/`, `common/src/main/spark-4.1/`, and `spark/src/test/spark-4.1/`, mirroring their `spark-4.0` counterparts. Three 4.1-specific deltas inside the trees:

- `CometExprShim.binaryOutputStyle` no longer parses `BINARY_OUTPUT_STYLE` as a string. Spark 4.1 made it an `enumConf`, so `getConf` already returns the enum.
- `CometEvalModeUtil.sumEvalMode` reads `s.evalContext.evalMode`. Spark 4.1 wrapped `Sum`'s `EvalMode` in a `NumericEvalContext`.
- `CometTypeShim.isVariantStruct` wires through to `VariantMetadata.isVariantStruct`, matching the helper #4084 added to the spark-4.0 shim.

**No-op refactors that prepare existing profiles for the new shim** (these compile to identical bytecode on 3.x/4.0 today):

- `CometEvalModeUtil.sumEvalMode` helper added to the spark-3.4 / spark-3.5 / spark-4.0 type shims. `aggregates.scala` updated to call `CometEvalModeUtil.sumEvalMode(sum)` instead of `sum.evalMode`.
  Lets the spark-4.1 shim provide the alternate `s.evalContext.evalMode` form.
- New `MapStatusHelper.scala` wraps `MapStatus.apply`, so the three Java shuffle-writer call sites (in `CometBypassMergeSortShuffleWriter` and `CometUnsafeShuffleWriter`) don't need to see Scala default parameters. Required because Spark 4.1 added a `checksumVal: Long = 0` default.
- `CometShuffleManager` swaps `Collections.emptyMap()` for `new ConcurrentHashMap[Int, OpenHashSet[Long]]()` in the reflective `IndexShuffleBlockResolver` constructor lookup. `ConcurrentHashMap` is also a `java.util.Map`, so 3.x/4.0 still resolve; required because Spark 4.1 changed the third constructor parameter type from `Map` to `ConcurrentMap`.
- `isSpark41Plus` added to `CometSparkSessionExtensions`. Not yet referenced anywhere; lands here so downstream PRs can use it without churn.

**Out of scope** (intentionally deferred to follow-up PRs):

- `dev/diffs/4.1.1.diff` for Spark SQL Tests
- Any `.github/workflows/*` matrix entries for 4.1
- Plan-stability golden files for 4.1 and the corresponding `CometPlanStabilitySuite` branch
- Docs additions for the new-version workflow
- Runtime-only Spark 4.1 fixes (e.g. `CometNativeWriteExec` `FileNameSpec`, `REMAINDER_BY_ZERO` test branching)

## How are these changes tested?

Locally:

- `./mvnw test-compile -Pspark-4.1` passes.
- `./mvnw test-compile -Pspark-3.5 -Pscala-2.13` passes.
- `./mvnw test-compile -Pspark-4.0` passes.

CI runs the existing 3.4 / 3.5 / 4.0 PR-build matrices, which exercise the no-op refactors. `-Pspark-4.1` is intentionally not added to any matrix in this PR.
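As a side note on the `CometShuffleManager` change above: the swap works because a single `ConcurrentHashMap` instance satisfies both the pre-4.1 (`Map`) and 4.1 (`ConcurrentMap`) constructor shapes. The sketch below illustrates the idea; the `ResolverV40` / `ResolverV41` stand-in classes and the lookup loop are hypothetical, not Comet's actual reflection code:

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReflectiveLookupDemo {

    // Hypothetical stand-ins for the two IndexShuffleBlockResolver
    // constructor shapes: pre-4.1 accepts a java.util.Map, while 4.1
    // requires a java.util.concurrent.ConcurrentMap.
    static class ResolverV40 {
        ResolverV40(Map<Integer, Object> taskIds) {}
    }

    static class ResolverV41 {
        ResolverV41(ConcurrentMap<Integer, Object> taskIds) {}
    }

    /** Try the 4.1 signature first, then fall back to the pre-4.1 one. */
    static Object construct(Class<?> cls) throws Exception {
        // One ConcurrentHashMap argument works for both lookups, because
        // ConcurrentHashMap implements ConcurrentMap, which extends Map.
        ConcurrentHashMap<Integer, Object> arg = new ConcurrentHashMap<>();
        for (Class<?> paramType : new Class<?>[] {ConcurrentMap.class, Map.class}) {
            try {
                Constructor<?> ctor = cls.getDeclaredConstructor(paramType);
                return ctor.newInstance(arg);
            } catch (NoSuchMethodException ignored) {
                // this class doesn't declare that signature; keep trying
            }
        }
        throw new NoSuchMethodException("no matching constructor on " + cls);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(construct(ResolverV40.class).getClass().getSimpleName());
        System.out.println(construct(ResolverV41.class).getClass().getSimpleName());
    }
}
```

This is also why `Collections.emptyMap()` had to go: an empty immutable `Map` is not a `ConcurrentMap`, so it can only ever match the older signature.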
To unsubscribe, e-mail: [email protected]. For queries about this service, please contact Infrastructure at: [email protected].
