comphead commented on code in PR #4067: URL: https://github.com/apache/datafusion-comet/pull/4067#discussion_r3140012942
##########
docs/source/user-guide/latest/expressions.md:
##########
@@ -19,204 +19,200 @@
# Supported Spark Expressions
-Comet supports the following Spark expressions. Expressions that are marked as Spark-compatible will either run
-natively in Comet and provide the same results as Spark, or will fall back to Spark for cases that would not
-be compatible.
+Comet supports the following Spark expressions. See the [Comet Compatibility Guide] for details on known
+incompatibilities and unsupported cases.
All expressions are enabled by default, but most can be disabled by setting
`spark.comet.expression.EXPRNAME.enabled=false`, where `EXPRNAME` is the expression name as specified in
the following tables, such as `Length`, or `StartsWith`. See the [Comet Configuration Guide] for a full list
of expressions that be disabled.
-Expressions that are not Spark-compatible will fall back to Spark by default and can be enabled by setting
-`spark.comet.expression.EXPRNAME.allowIncompatible=true`.
-
## Conditional Expressions
-| Expression | SQL | Spark-Compatible? |
-| ---------- | ------------------------------------------- | ----------------- |
-| CaseWhen | `CASE WHEN expr THEN expr ELSE expr END` | Yes |
-| If | `IF(predicate_expr, true_expr, false_expr)` | Yes |
+| Expression | SQL |
+| ---------- | ------------------------------------------- |
+| CaseWhen | `CASE WHEN expr THEN expr ELSE expr END` |
+| If | `IF(predicate_expr, true_expr, false_expr)` |
## Predicate Expressions
-| Expression | SQL | Spark-Compatible? |
-| ------------------ | ------------- | ----------------- |
-| And | `AND` | Yes |
-| EqualTo | `=` | Yes |
-| EqualNullSafe | `<=>` | Yes |
-| GreaterThan | `>` | Yes |
-| GreaterThanOrEqual | `>=` | Yes |
-| LessThan | `<` | Yes |
-| LessThanOrEqual | `<=` | Yes |
-| In | `IN` | Yes |
-| IsNotNull | `IS NOT NULL` | Yes |
-| IsNull | `IS NULL` | Yes |
-| InSet | `IN (...)` | Yes |
-| Not | `NOT` | Yes |
-| Or | `OR` | Yes |
+| Expression | SQL |
+| ------------------ | ------------- |
+| And | `AND` |
+| EqualTo | `=` |
+| EqualNullSafe | `<=>` |
+| GreaterThan | `>` |
+| GreaterThanOrEqual | `>=` |
+| LessThan | `<` |
+| LessThanOrEqual | `<=` |
+| In | `IN` |
+| IsNotNull | `IS NOT NULL` |
+| IsNull | `IS NULL` |
+| InSet | `IN (...)` |
+| Not | `NOT` |
+| Or | `OR` |
## String Functions
-| Expression | Spark-Compatible? | Compatibility Notes |
-| --------------- | ----------------- | ---------------------------------------------------------------------------------------------------------- |
-| Ascii | Yes | |
-| BitLength | Yes | |
-| Chr | Yes | |
-| Concat | Yes | Only string inputs are supported |
-| ConcatWs | Yes | |
-| Contains | Yes | |
-| EndsWith | Yes | |
-| InitCap | No | Behavior is different in some cases, such as hyphenated names. |
-| Left | Yes | Length argument must be a literal value |
-| Length | Yes | |
-| Like | Yes | |
-| Lower | No | Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` |
-| OctetLength | Yes | |
-| Reverse | Yes | |
-| RLike | No | Uses Rust regexp engine, which has different behavior to Java regexp engine |
-| StartsWith | Yes | |
-| StringInstr | Yes | |
-| StringRepeat | Yes | Negative argument for number of times to repeat causes exception |
-| StringReplace | Yes | |
-| StringLPad | Yes | |
-| StringRPad | Yes | |
-| StringSpace | Yes | |
-| StringTranslate | Yes | |
-| StringTrim | Yes | |
-| StringTrimBoth | Yes | |
-| StringTrimLeft | Yes | |
-| StringTrimRight | Yes | |
-| Substring | Yes | |
-| Upper | No | Results can vary depending on locale and character set. Requires `spark.comet.caseConversion.enabled=true` |
+| Expression |
+| --------------- |
+| Ascii |
+| BitLength |
+| Chr |
+| Concat |
+| ConcatWs |
+| Contains |
+| EndsWith |
+| InitCap |
+| Left |
+| Length |
+| Like |
+| Lower |
+| OctetLength |
+| Reverse |
+| RLike |
+| StartsWith |
+| StringInstr |
+| StringRepeat |
+| StringReplace |
+| StringLPad |
+| StringRPad |
+| StringSpace |
+| StringTranslate |
+| StringTrim |
+| StringTrimBoth |
+| StringTrimLeft |
+| StringTrimRight |
+| Substring |
+| Upper |
## JSON Functions
-| Expression | Spark-Compatible? | Compatibility Notes |
-| ------------- | ----------------- | --------------------------------------------------------------------------------------------- |
-| GetJsonObject | No | Spark allows single-quoted JSON and unescaped control characters which Comet does not support |
+| Expression |
+| ------------- |
+| GetJsonObject |
## Date/Time Functions
-| Expression | SQL | Spark-Compatible? | Compatibility Notes |
-| -------------- | ---------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------- |
-| DateAdd | `date_add` | Yes | |
-| DateDiff | `datediff` | Yes | |
-| DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. |
-| DateSub | `date_sub` | Yes | |
-| DatePart | `date_part(field, source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` |
-| Days | `days` | Yes | V2 partition transform. Supports DateType and TimestampType inputs. |
-| Extract | `extract(field FROM source)` | Yes | Supported values of `field`: `year`/`month`/`week`/`day`/`dayofweek`/`dayofweek_iso`/`doy`/`quarter`/`hour`/`minute` |
-| FromUnixTime | `from_unixtime` | No | Does not support format, supports only -8334601211038 <= sec <= 8210266876799 |
-| Hour | `hour` | No | Incorrectly applies timezone conversion to TimestampNTZ inputs ([#3180](https://github.com/apache/datafusion-comet/issues/3180)) |
-| LastDay | `last_day` | Yes | |
-| Minute | `minute` | No | Incorrectly applies timezone conversion to TimestampNTZ inputs ([#3180](https://github.com/apache/datafusion-comet/issues/3180)) |
-| Second | `second` | No | Incorrectly applies timezone conversion to TimestampNTZ inputs ([#3180](https://github.com/apache/datafusion-comet/issues/3180)) |
-| TruncDate | `trunc` | Yes | |
-| TruncTimestamp | `date_trunc` | No | Incorrect results in non-UTC timezones ([#2649](https://github.com/apache/datafusion-comet/issues/2649)) |
-| UnixDate | `unix_date` | Yes | |
-| UnixTimestamp | `unix_timestamp` | Yes | |
-| Year | `year` | Yes | |
-| Month | `month` | Yes | |
-| DayOfMonth | `day`/`dayofmonth` | Yes | |
-| DayOfWeek | `dayofweek` | Yes | |
-| WeekDay | `weekday` | Yes | |
-| DayOfYear | `dayofyear` | Yes | |
-| WeekOfYear | `weekofyear` | Yes | |
-| Quarter | `quarter` | Yes | |
+| Expression | SQL |
+| -------------- | ---------------------------- |
+| DateAdd | `date_add` |
+| DateDiff | `datediff` |
+| DateFormat | `date_format` |
+| DateSub | `date_sub` |
+| DatePart | `date_part(field, source)` |
+| Days | `days` |
+| Extract | `extract(field FROM source)` |
+| FromUnixTime | `from_unixtime` |
+| Hour | `hour` |
+| LastDay | `last_day` |
+| Minute | `minute` |
+| Second | `second` |
+| TruncDate | `trunc` |
+| TruncTimestamp | `date_trunc` |
+| UnixDate | `unix_date` |
+| UnixTimestamp | `unix_timestamp` |
+| Year | `year` |
+| Month | `month` |
+| DayOfMonth | `day`/`dayofmonth` |
+| DayOfWeek | `dayofweek` |
+| WeekDay | `weekday` |
+| DayOfYear | `dayofyear` |
+| WeekOfYear | `weekofyear` |
+| Quarter | `quarter` |
## Math Expressions
-| Expression | SQL | Spark-Compatible? | Compatibility Notes |
-| -------------- | --------- | ----------------- | --------------------------------- |
-| Abs | `abs` | Yes | |
-| Acos | `acos` | Yes | |
-| Add | `+` | Yes | |
-| Asin | `asin` | Yes | |
-| Atan | `atan` | Yes | |
-| Atan2 | `atan2` | Yes | |
-| BRound | `bround` | Yes | |
-| Ceil | `ceil` | Yes | |
-| Cos | `cos` | Yes | |
-| Cosh | `cosh` | Yes | |
-| Cot | `cot` | Yes | |
-| Divide | `/` | Yes | |
-| Exp | `exp` | Yes | |
-| Expm1 | `expm1` | Yes | |
-| Floor | `floor` | Yes | |
-| Hex | `hex` | Yes | |
-| IntegralDivide | `div` | Yes | |
-| IsNaN | `isnan` | Yes | |
-| Log | `log` | Yes | |
-| Log2 | `log2` | Yes | |
-| Log10 | `log10` | Yes | |
-| Multiply | `*` | Yes | |
-| Pow | `power` | Yes | |
-| Rand | `rand` | Yes | |
-| Randn | `randn` | Yes | |
-| Remainder | `%` | Yes | |
-| Round | `round` | Yes | |
-| Signum | `signum` | Yes | |
-| Sin | `sin` | Yes | |
-| Sinh | `sinh` | Yes | |
-| Sqrt | `sqrt` | Yes | |
-| Subtract | `-` | Yes | |
-| Tan | `tan` | Yes | |
-| Tanh | `tanh` | Yes | |
-| TryAdd | `try_add` | Yes | Only integer inputs are supported |
-| TryDivide | `try_div` | Yes | Only integer inputs are supported |
-| TryMultiply | `try_mul` | Yes | Only integer inputs are supported |
-| TrySubtract | `try_sub` | Yes | Only integer inputs are supported |
-| UnaryMinus | `-` | Yes | |
-| Unhex | `unhex` | Yes | |
+| Expression | SQL |
+| -------------- | --------- |
+| Abs | `abs` |
+| Acos | `acos` |
+| Add | `+` |
+| Asin | `asin` |
+| Atan | `atan` |
+| Atan2 | `atan2` |
+| BRound | `bround` |
+| Ceil | `ceil` |
+| Cos | `cos` |
+| Cosh | `cosh` |
+| Cot | `cot` |
+| Divide | `/` |
+| Exp | `exp` |
+| Expm1 | `expm1` |
+| Floor | `floor` |
+| Hex | `hex` |
+| IntegralDivide | `div` |
+| IsNaN | `isnan` |
+| Log | `log` |
+| Log2 | `log2` |
+| Log10 | `log10` |
+| Multiply | `*` |
+| Pow | `power` |
+| Rand | `rand` |
+| Randn | `randn` |
+| Remainder | `%` |
+| Round | `round` |
+| Signum | `signum` |
+| Sin | `sin` |
+| Sinh | `sinh` |
+| Sqrt | `sqrt` |
+| Subtract | `-` |
+| Tan | `tan` |
+| Tanh | `tanh` |
+| TryAdd | `try_add` |
+| TryDivide | `try_div` |
+| TryMultiply | `try_mul` |
+| TrySubtract | `try_sub` |
+| UnaryMinus | `-` |
+| Unhex | `unhex` |
## Hashing Functions
-| Expression | Spark-Compatible? |
-| ----------- | ----------------- |
-| Md5 | Yes |
-| Murmur3Hash | Yes |
-| Sha1 | Yes |
-| Sha2 | Yes |
-| XxHash64 | Yes |
+| Expression |
+| ----------- |
+| Md5 |
+| Murmur3Hash |
+| Sha1 |
+| Sha2 |
+| XxHash64 |
## Bitwise Expressions
-| Expression | SQL | Spark-Compatible? |
-| ------------ | ---- | ----------------- |
-| BitwiseAnd | `&` | Yes |
-| BitwiseCount | | Yes |
-| BitwiseGet | | Yes |
-| BitwiseOr | `\|` | Yes |
-| BitwiseNot | `~` | Yes |
-| BitwiseXor | `^` | Yes |
-| ShiftLeft | `<<` | Yes |
-| ShiftRight | `>>` | Yes |
+| Expression | SQL |
+| ------------ | ---- |
+| BitwiseAnd | `&` |
+| BitwiseCount | |
+| BitwiseGet | |
+| BitwiseOr | `\|` |
+| BitwiseNot | `~` |
+| BitwiseXor | `^` |
+| ShiftLeft | `<<` |
+| ShiftRight | `>>` |
## Aggregate Expressions
-| Expression | SQL | Spark-Compatible? | Compatibility Notes |
-| ------------- | ---------- | ------------------------- | ---------------------------------------------------------------- |
-| Average | | Yes, except for ANSI mode | |
-| BitAndAgg | | Yes | |
-| BitOrAgg | | Yes | |
-| BitXorAgg | | Yes | |
-| BoolAnd | `bool_and` | Yes | |
-| BoolOr | `bool_or` | Yes | |
-| CollectSet | | No | NaN dedup differs from Spark. See compatibility guide. |
-| Corr | | Yes | |
-| Count | | Yes | |
-| CovPopulation | | Yes | |
-| CovSample | | Yes | |
-| First | | No | This function is not deterministic. Results may not match Spark. |
-| Last | | No | This function is not deterministic. Results may not match Spark. |
-| Max | | Yes | |
-| Min | | Yes | |
-| StddevPop | | Yes | |
-| StddevSamp | | Yes | |
-| Sum | | Yes, except for ANSI mode | |
-| VariancePop | | Yes | |
-| VarianceSamp | | Yes | |
+| Expression | SQL |
+| ------------- | ---------- |
+| Average | |
+| BitAndAgg | |
+| BitOrAgg | |
+| BitXorAgg | |
+| BoolAnd | `bool_and` |
+| BoolOr | `bool_or` |
+| CollectSet | |
Review Comment:
We probably need to address these gaps later. For example, `count`, `corr`, and
`collect_set` are supported and have SQL expressions, yet their SQL column is blank.
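
To make the gap concrete, here is a minimal sketch of how the blank SQL cells could be filled from a lookup. It is not part of the PR: the `SQL_NAMES` mapping and the `render_row` helper are hypothetical names introduced for illustration, and only the three expressions named in this comment are mapped, using their standard Spark SQL function names.

```python
# Hypothetical lookup from Comet expression name to Spark SQL function name.
# Only the three expressions mentioned in the review comment are listed.
SQL_NAMES = {
    "CollectSet": "collect_set",
    "Corr": "corr",
    "Count": "count",
}


def render_row(expression: str) -> str:
    """Render one Markdown table row, leaving the SQL cell blank if unknown."""
    sql = SQL_NAMES.get(expression)
    cell = f"`{sql}`" if sql else ""
    return f"| {expression} | {cell} |"


if __name__ == "__main__":
    # Average has no entry in the sketch, so its SQL cell stays blank.
    for name in ["Average", "CollectSet", "Corr", "Count"]:
        print(render_row(name))
```

Expressions without an entry keep an empty SQL cell, so the remaining gaps stay visible in the rendered table.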
