andygrove opened a new issue, #4575:
URL: https://github.com/apache/datafusion-comet/issues/4575
### Describe the bug
`CometFromUnixTime` currently reports a single support level of
`Incompatible(None)` and folds two unrelated limitations into one
`getIncompatibleReasons()` string:
```scala
override def getIncompatibleReasons(): Seq[String] = Seq(
"Only supports the default datetime format pattern `yyyy-MM-dd HH:mm:ss`."
+
" DataFusion's valid timestamp range differs from Spark" +
" (https://github.com/apache/datafusion/issues/16594)")
override def getSupportLevel(expr: FromUnixTime): SupportLevel =
Incompatible(None)
```
(`spark/src/main/scala/org/apache/comet/serde/unixtime.scala`)
These are two different things:
1. **DataFusion's valid timestamp range differs from Spark.** This is a
genuine incompatibility. It applies even to the default format pattern, where
Comet executes natively but can produce different results outside DataFusion's
supported range. `Incompatible` is the correct classification for this.
2. **Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported.**
A non-default pattern cannot be executed at all. `convert` falls back to Spark
via `withFallbackReason`. This is an *unsupported* limitation, not an
incompatibility.
Reporting the unsupported format pattern as an incompatible reason is the
misclassification noted in the third bullet of #4074. There is no
`getUnsupportedReasons()` to surface the format limitation properly, so the
generated compatibility docs describe a fall-back-only case as an
incompatibility.
This is a classification and documentation accuracy problem, not a
correctness bug. Runtime behavior is already safe. By default
(`spark.comet.expr.FromUnixTime.allowIncompatible=false`) the whole expression
falls back to Spark, and with that flag enabled the default format runs
natively while non-default formats fall back.
### Steps to reproduce
Look at `CometFromUnixTime` in
`spark/src/main/scala/org/apache/comet/serde/unixtime.scala` and the generated
`from_unixtime` entry on the datetime compatibility page. The format limitation
is rendered as an incompatibility bullet rather than as an unsupported
limitation.
### Expected behavior
`getSupportLevel` receives the concrete `FromUnixTime`, so it can
distinguish the two cases. For example:
```scala
override def getSupportLevel(expr: FromUnixTime): SupportLevel =
if (expr.format != Literal(TimestampFormatter.defaultPattern)) {
Unsupported(Some("Only the default datetime pattern `yyyy-MM-dd
HH:mm:ss` is supported"))
} else {
Incompatible(None) // DataFusion's valid timestamp range differs from
Spark
}
```
Then split the reason strings. Keep the DataFusion timestamp-range
difference in `getIncompatibleReasons()`, and add a `getUnsupportedReasons()`
for the non-default-format limitation.
One thing to confirm during implementation: check how `GenerateDocs` invokes
`getSupportLevel`. If it calls it without a concrete `FromUnixTime` instance, a
format-dependent `getSupportLevel` needs to degrade sensibly for the docs path.
### Additional context
Follow-up to the third item in #4074. The `CometSum` item from that issue is
addressed in #4111. The `CometStringRepeat` item has been fixed separately.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]