andygrove commented on code in PR #4461:
URL: https://github.com/apache/datafusion-comet/pull/4461#discussion_r3319454646


##########
docs/source/contributor-guide/spark_expressions_support.md:
##########
@@ -523,40 +523,109 @@
 ### string_funcs
 
 - [x] ascii
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `StringType -> IntegerType`; 
`nullSafeEval` returns `codePointAt(0)` of the first char, or `0` for the empty 
string. Wired via `CometScalarFunction("ascii")` and resolved to DataFusion 
`ascii` (`chars().next() as i32`); first-code-point semantics match for ASCII, 
BMP, and supplementary code points.
+  - Spark 4.0.1 (audited 2026-05-27): `inputTypes` widened to 
`StringTypeWithCollation(supportsTrimCollation = true)`; behaviour unchanged 
for `UTF8_BINARY`. Comet does not propagate collation, so non-default 
collations may diverge silently.
 - [ ] base64
 - [x] bit_length
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `(StringType|BinaryType) -> 
IntegerType`; eval returns `numBytes * 8` for strings and `.length * 8` for 
binary.
+  - Spark 4.0.1 (audited 2026-05-27): `inputTypes` widened to 
`StringTypeWithCollation(supportsTrimCollation = true)`; semantics unchanged.
+  - Known limitation: wired as a raw `CometScalarFunction("bit_length")` with 
no `BinaryType` guard. DataFusion's `BitLengthFunc` signature only accepts 
string types, so `bit_length(<binary>)` execute-fails on the native side 
instead of falling back cleanly 
(https://github.com/apache/datafusion-comet/issues/4464).
 - [x] btrim
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `StringTrimBoth` is 
`RuntimeReplaceable` and rewritten to `StringTrim(srcStr, trimStr)` before 
serde runs, so the explicit `CometScalarFunction("btrim")` mapping is 
unreachable.
+  - Spark 4.0.1 (audited 2026-05-27): `StringTrim` (the rewrite target) routes 
through `CollationSupport.StringTrim.exec` and uses 
`StringTypeNonCSAICollation(supportsTrimCollation = true)`; semantics unchanged 
for `UTF8_BINARY`. Non-default collations may diverge in Comet.

Review Comment:
   I filed https://github.com/apache/datafusion-comet/issues/4496 for a 
follow-on audit focusing on collation issues once this PR is merged.
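
For reference, the `ascii` and `bit_length` semantics described in the quoted hunk can be sketched in plain Rust. This is an illustrative sketch only, assuming the `UTF8_BINARY` behaviour from the audit notes. The function names are hypothetical and this is not Comet or DataFusion code.

```rust
/// Hypothetical helper mirroring the `chars().next() as i32` behaviour
/// quoted in the audit notes: the first Unicode code point, or 0 when
/// the string is empty.
fn ascii_first_code_point(s: &str) -> i32 {
    // `chars()` iterates over Unicode scalar values, so a supplementary
    // code point is returned whole rather than as a surrogate half.
    s.chars().next().map_or(0, |c| c as i32)
}

/// Hypothetical helper: bit length of a string is its UTF-8 byte
/// count times 8 (the `numBytes * 8` rule from the audit notes).
fn bit_length_str(s: &str) -> i32 {
    // `str::len` returns the length in bytes, not in characters.
    (s.len() * 8) as i32
}

/// Hypothetical helper: bit length of a binary value is its byte
/// count times 8 (the `.length * 8` rule from the audit notes).
fn bit_length_bytes(b: &[u8]) -> i32 {
    (b.len() * 8) as i32
}
```

Under these assumptions, `ascii_first_code_point("A")` is `65`, the empty string gives `0`, and `bit_length_str("\u{e9}")` is `16` because U+00E9 takes two bytes in UTF-8. Collation is not modelled here, which is the gap the follow-on audit is meant to cover.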



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
