andygrove opened a new pull request, #4328: URL: https://github.com/apache/datafusion-comet/pull/4328
## Which issue does this PR close?

Closes #3900.

## Rationale for this change

Comet provides limited benefit when its shuffle manager is not registered, because shuffle typically dominates query runtime. When users enable Comet but forget to set `spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager`, they may believe Comet is accelerating their workload when most of the runtime is actually spent in Spark's default shuffle.

The discussion in #3900 noted that for testing (for example, measuring scan performance in isolation) it is sometimes useful to run Comet with Spark's default shuffle manager. To preserve that workflow, this change is gated by an opt-out config rather than being unconditional.

## What changes are included in this PR?

- New config `spark.comet.exec.shuffle.required` (default `true`).
- `CometSparkSessionExtensions.isCometLoaded` returns `false` and logs a warning when `spark.comet.exec.shuffle.required=true` and `CometShuffleManager` is not registered.
- Updated the existing `isCometLoaded` test (it now sets `spark.comet.exec.shuffle.required=false`) and added a new test covering all three states (required-but-missing → disabled, opted-out → enabled, required-and-set → enabled).

## How are these changes tested?

- `CometSparkSessionExtensionsSuite`: 7/7 pass, including the new test.
- `CometExpressionSuite` smoke run: 127/127 pass (`CometTestBase` already sets the shuffle manager, so default behavior is unaffected).
- `configs.md` is auto-generated at doc-build time, so the new config will appear in the user-facing config reference automatically.
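For readers who want the three states spelled out, here is a minimal Python sketch of the decision described above. It is not the PR's Scala code. The function name `is_comet_loaded`, the dict-based config, and the warning text are illustrative assumptions, and it models only the shuffle-manager check, not any other condition the real `isCometLoaded` may evaluate.

```python
import logging

logger = logging.getLogger("comet_shuffle_sketch")

# Class name quoted in the PR description.
COMET_SHUFFLE_MANAGER = (
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
)


def is_comet_loaded(conf):
    """Hypothetical model of the shuffle gate; `conf` is a plain dict of strings."""
    # The PR states the new config defaults to true.
    required = conf.get("spark.comet.exec.shuffle.required", "true").lower() == "true"
    manager = conf.get("spark.shuffle.manager")
    if required and manager != COMET_SHUFFLE_MANAGER:
        # Assumed wording; the real warning message is not shown in the PR text.
        logger.warning(
            "Comet disabled: spark.shuffle.manager is not set to %s",
            COMET_SHUFFLE_MANAGER,
        )
        return False
    return True


# The three states covered by the new test.
required_but_missing = {}
opted_out = {"spark.comet.exec.shuffle.required": "false"}
required_and_set = {"spark.shuffle.manager": COMET_SHUFFLE_MANAGER}

for name, conf in [
    ("required-but-missing", required_but_missing),
    ("opted-out", opted_out),
    ("required-and-set", required_and_set),
]:
    print(name, "->", "enabled" if is_comet_loaded(conf) else "disabled")
```

Under these assumptions, a session that sets neither config falls into the first state. That is the case the PR is meant to surface with a warning instead of letting Comet run silently on Spark's default shuffle.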
