nabuskey commented on code in PR #4086:
URL: https://github.com/apache/datafusion-comet/pull/4086#discussion_r3150123645


##########
docs/source/user-guide/latest/understanding-comet-plans.md:
##########
@@ -0,0 +1,225 @@
+<!---
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Understanding Comet Plans
+
+This guide explains how to read a Spark query plan once Comet is enabled, what
+happens when parts of a plan fall back to Spark, and which configs to use to
+inspect that behavior.
+
+## Overview
+
+When Comet is enabled, the `CometSparkSessionExtensions` rules walk the
+physical plan bottom-up and replace Spark operators with Comet equivalents
+where possible. Consecutive native operators are combined into a single block
+that is serialized as protobuf and executed by DataFusion on the executor.
+Operators that Comet does not support remain in their original Spark form.
+
+As a result, a plan can mix three kinds of nodes:
+
+- **`Comet*` nodes that run natively in Rust** (for example `CometProject`,
+  `CometHashAggregate`).
+- **`Comet*` nodes that run on the JVM** but are still part of the Comet
+  pipeline (for example `CometBroadcastExchange`, `CometColumnarExchange`).
+- **Standard Spark nodes** (for example `Project`, `HashAggregate`) where
+  Comet either does not support the operator or has fallen back due to an
+  unsupported expression, data type, or configuration.
+
+Wherever data crosses between columnar and row-based execution, Comet inserts
+a transition node such as `CometColumnarToRow` or `CometSparkRowToColumnar`.
+
+## Reading a Plan
+
+You can print a plan with `df.explain(true)` or `EXPLAIN FORMATTED <sql>` (see
+the sketch after this list), and the same plan is shown in the Spark SQL UI.
+When reading a plan, look for:
+
+- **Node prefix.** `Comet*` nodes are accelerated by Comet. Anything without
+  the prefix is unmodified Spark.
+- **Transitions.** `CometColumnarToRow`, `CometNativeColumnarToRow`, and
+  `CometSparkRowToColumnar` mark boundaries between columnar Comet execution
+  and row-based Spark execution. Frequent transitions usually indicate
+  fallback inside the plan.
+- **Exchange type.** `CometExchange` is the native shuffle path,
+  `CometColumnarExchange` is the JVM columnar shuffle path, and a plain
+  `Exchange` means Spark shuffle. See [Shuffle Operators](#shuffle-operators)
+  below.
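+
+A minimal way to produce such a plan from `spark-shell`, assuming a session
+with Comet already enabled (the input path and column names here are
+hypothetical; any DataFrame works the same way):
+
+```scala
+import org.apache.spark.sql.functions.col
+
+// A query that exercises a scan, a filter, and an aggregate.
+val df = spark.read.parquet("/data/events")
+  .filter(col("status") === "ok")
+  .groupBy(col("status"))
+  .count()
+
+// Prints the parsed, analyzed, optimized, and physical plans; Comet-accelerated
+// operators appear in the physical plan with the `Comet` prefix.
+df.explain(true)
+
+// SQL equivalent via EXPLAIN FORMATTED:
+df.createOrReplaceTempView("events")
+spark.sql("EXPLAIN FORMATTED SELECT status, COUNT(*) FROM events GROUP BY status")
+  .show(truncate = false)
+```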
+
+## Fallback
+
+A "fallback" happens when Comet cannot translate part of a plan into native
+execution. Fallback can be partial (a subtree falls back while the rest stays
+native) or full (no Comet nodes appear).
+
+Common reasons:
+
+- The Spark operator is not supported by Comet.
+- An expression inside an otherwise supported operator is not supported, or
+  is marked incompatible and the per-expression opt-in
+  `spark.comet.expression.<ExpressionName>.allowIncompatible=true` is not
+  set. Operators have an equivalent
+  `spark.comet.operator.<OperatorName>.allowIncompatible` opt-in (see the
+  sketch after this list).
+- A data type is not supported by the operator.
+- A configuration setting disables a specific operator or expression.
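+
+For example, once you have confirmed that an incompatibility is acceptable,
+the opt-ins can be passed as ordinary Spark confs (the bracketed names are
+placeholders; use the expression or operator named in the fallback reason):
+
+```shell
+--conf spark.comet.expression.<ExpressionName>.allowIncompatible=true
+--conf spark.comet.operator.<OperatorName>.allowIncompatible=true
+```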
+
+See [Supported Spark Operators](operators.md) and [Supported Expressions](expressions.md)
+for current coverage, and the [Compatibility Guide](compatibility/index.md) for
+incompatibility details.
+
+## Configs for Inspecting Plans and Fallback
+
+Comet provides four configs for understanding what is happening in a plan.
+They serve different purposes and produce output in different places.
+
+| Config                                   | Output destination                     | What you see                                                                                      |
+| ---------------------------------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------- |
+| `spark.comet.explainFallback.enabled`    | Driver log (only when fallback occurs) | A WARN with the list of reasons each query stage could not run natively.                          |
+| `spark.comet.logFallbackReasons.enabled` | Driver log                             | One WARN per fallback reason as it is encountered, without surrounding plan context.              |
+| `spark.comet.explain.format`             | Spark SQL UI (Spark 4.0 and newer)     | Annotated plan or fallback-reason list, depending on the `verbose` (default) or `fallback` value. |
+| `spark.comet.explain.native.enabled`     | Executor logs, per task                | The DataFusion native plan with metrics, useful for inspecting native execution.                  |
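+
+All four are ordinary Spark confs. As a sketch, a low-noise starting point for
+debugging fallback might be to pass the two logging configs to `spark-submit`:
+
+```shell
+--conf spark.comet.explainFallback.enabled=true
+--conf spark.comet.logFallbackReasons.enabled=true
+```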
+
+### `spark.comet.explainFallback.enabled`
+
+Logs a single WARN listing the reasons each query stage could not be executed
+natively. Nothing is logged when the entire stage runs in Comet, which makes
+this a low-noise way to confirm whether fallback is happening at all.
+
+### `spark.comet.logFallbackReasons.enabled`
+
+Logs every fallback reason as it is encountered, one WARN per reason. Use this
+when you want to see all reasons, including ones that
+`spark.comet.explainFallback.enabled` may aggregate or omit. The output does
+not include the surrounding plan, so it is best suited to accumulating
+diagnostics across many queries.
+
+### `spark.comet.explain.format`
+
+This config is read by `org.apache.comet.ExtendedExplainInfo`, which Spark
+loads via the `spark.sql.extendedExplainProviders` mechanism added in Spark
+4.0. Add the provider:
+
+```shell
+--conf spark.sql.extendedExplainProviders=org.apache.comet.ExtendedExplainInfo
+```
+
+The Spark SQL UI then shows an additional section under the detailed plan.
+The format of that section is controlled by `spark.comet.explain.format`:
+
+- `verbose` (default): the full plan annotated with fallback reasons, plus a
+  summary of how much of the plan is accelerated.
+- `fallback`: a list of fallback reasons only.
+
+This is the most convenient option on Spark 4.0 because the output is shown
+inline in the UI. Earlier Spark versions do not have the
+`extendedExplainProviders` extension point, so this provider is not used and
+the config has no effect there.
+
+### `spark.comet.explain.native.enabled`
+
+When enabled, each executor task logs the DataFusion native plan it executes,
+along with metrics. This is verbose because there is one plan per task, but it
+is the only way to see the native plan as DataFusion sees it (including how
+operators were arranged after Comet's serialization). See the
+[Metrics Guide](metrics.md) for details on the native metrics that appear in
+this output.
+
+## Comet Operator Reference
+
+The following sections describe the Comet nodes you will see in plans, grouped
+by role. Names match what is shown in the plan output.
+
+### Scans
+
+| Node                     | Description                                                                                       |
+| ------------------------ | ------------------------------------------------------------------------------------------------- |
+| `CometScan`              | V1 Parquet file scan with native execution.                                                       |

Review Comment:
   What is the difference between `CometScan` and `CometNativeScan`? Are some things still done on the JVM with `CometScan`?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
