andygrove opened a new issue, #4077:
URL: https://github.com/apache/datafusion-comet/issues/4077

   ## Describe the bug
   
   Version-specific expressions registered through 
`CometExprShim.versionSpecificExprToProtoInternal` (in 
`spark/src/main/spark-{3.4,3.5,4.0}/org/apache/comet/shims/CometExprShim.scala`)
 bypass the `CometExpressionSerde` framework. As a result they don't 
participate in:
   
   - per-expression enable/disable config (`spark.comet.expression.<X>.enabled`)
   - the support-level gate (`Compatible` / `Incompatible` / `Unsupported`) and 
the corresponding `spark.comet.expression.<X>.allowIncompatible` flag
   - the auto-generated compatibility docs that `GenerateDocs` builds from 
`getCompatibleNotes` / `getIncompatibleReasons` / `getUnsupportedReasons`
   
   Expressions affected today include `MapSort` (Spark 4.0), `WidthBucket`, 
`ToPrettyString`, and `StringDecode`. They are wired up directly inside the 
shim's pattern match instead of as `CometExpressionSerde[T]` objects.
   
   ## Proposed solution
   
   Refactor each shim so version-specific expressions that are real Spark 
`Expression` classes are implemented as `CometExpressionSerde[T]` objects 
living under `spark/src/main/spark-<version>/org/apache/comet/serde/`, and have 
`CometExprShim` contribute category-keyed maps 
(`versionSpecificMapExpressions`, `versionSpecificTemporalExpressions`, etc.) 
that `QueryPlanSerde` merges into the corresponding global maps. Once that's 
done they flow through the same `convert()` path as everything else and pick up 
the framework features for free.
   
   A few shim cases can't be keyed by `Class[_]` and should stay as ad-hoc 
handlers in `versionSpecificExprToProtoInternal`:
   
   - `Invoke` matched on `StructsToJsonEvaluator` target (runtime polymorphism 
on the target object)
   - `KnownNotContainsNull`, which is a tagging expression that simply forwards 
to its child
   
   ## Additional context
   
   This came up while reviewing #4076 (MapSort support for Spark 4.0). MapSort 
is currently registered via the shim and therefore has no entry in the 
compatibility guide and no per-expression config.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to