andygrove opened a new issue, #4077:
URL: https://github.com/apache/datafusion-comet/issues/4077
## Describe the bug
Version-specific expressions registered through
`CometExprShim.versionSpecificExprToProtoInternal` (in
`spark/src/main/spark-{3.4,3.5,4.0}/org/apache/comet/shims/CometExprShim.scala`)
bypass the `CometExpressionSerde` framework. As a result they don't
participate in:
- per-expression enable/disable config (`spark.comet.expression.<X>.enabled`)
- the support-level gate (`Compatible` / `Incompatible` / `Unsupported`) and
the corresponding `spark.comet.expression.<X>.allowIncompatible` flag
- the auto-generated compatibility docs that `GenerateDocs` builds from
`getCompatibleNotes` / `getIncompatibleReasons` / `getUnsupportedReasons`
Expressions affected today include `MapSort` (Spark 4.0), `WidthBucket`,
`ToPrettyString`, and `StringDecode`. They are wired up directly inside the
shim's pattern match instead of as `CometExpressionSerde[T]` objects.
## Proposed solution
Refactor each shim so version-specific expressions that are real Spark
`Expression` classes are implemented as `CometExpressionSerde[T]` objects
living under `spark/src/main/spark-<version>/org/apache/comet/serde/`, and have
`CometExprShim` contribute category-keyed maps
(`versionSpecificMapExpressions`, `versionSpecificTemporalExpressions`, etc.)
that `QueryPlanSerde` merges into the corresponding global maps. Once that's
done they flow through the same `convert()` path as everything else and pick up
the framework features for free.
A few shim cases can't be keyed by `Class[_]` and should stay as ad-hoc
handlers in `versionSpecificExprToProtoInternal`:
- `Invoke` matched on `StructsToJsonEvaluator` target (runtime polymorphism
on the target object)
- `KnownNotContainsNull`, which is a tagging expression that simply forwards
to its child
## Additional context
This came up while reviewing #4076 (MapSort support for Spark 4.0). MapSort
is currently registered via the shim and therefore has no entry in the
compatibility guide and no per-expression config.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]