Myx778 opened a new pull request, #4105:
URL: https://github.com/apache/datafusion-comet/pull/4105

   ## Which issue does this PR close?
   
   Closes #3084
   
   ## What changes are included in this PR?
   
   Implements Spark's `levenshtein(str1, str2)` function as a native Comet scalar UDF, so that it runs via DataFusion instead of falling back to the JVM.
   
   ### Implementation Details
   
   **Rust (native/spark-expr/src/string_funcs/levenshtein.rs)**
   - Standard dynamic-programming (DP) algorithm, optimized to use O(min(m, n)) space, where m and n are the input lengths in characters
   - Unicode character-level distance (not byte-level), matching Spark semantics
   - Proper NULL propagation: any NULL input → NULL output
   - Unit tests for basic cases, Unicode, and NULL handling
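
To illustrate the points above, here is a minimal sketch of a character-level edit distance that stores only two rows sized to the shorter input. It is not the code from this PR: the names `levenshtein` and `levenshtein_nullable` are hypothetical, the two-row layout is one common way to reach O(min(m, n)) space, and the `i32` result type is an assumption based on Spark returning an integer. The real kernel operates on Arrow arrays rather than on `Option<&str>`.

```rust
/// Illustrative sketch only (not the PR's implementation).
/// Character-level Levenshtein distance using two rolling rows, so the
/// extra memory is proportional to the shorter input's character count.
fn levenshtein(a: &str, b: &str) -> usize {
    // Work on Unicode scalar values, not bytes.
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Lay the shorter string along the row to bound the row length.
    let (short, long) = if a.len() <= b.len() { (a, b) } else { (b, a) };

    // prev[j] = distance between the empty prefix of `long` and short[..j].
    let mut prev: Vec<usize> = (0..=short.len()).collect();
    let mut curr = vec![0usize; short.len() + 1];

    for (i, &lc) in long.iter().enumerate() {
        // Distance from long[..=i] to the empty string.
        curr[0] = i + 1;
        for (j, &sc) in short.iter().enumerate() {
            let cost = if lc == sc { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // match or substitution
                .min(prev[j + 1] + 1)      // drop a character of `long`
                .min(curr[j] + 1);         // drop a character of `short`
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[short.len()]
}

/// Hypothetical wrapper showing NULL propagation: any NULL input
/// (modelled here as `None`) yields a NULL output.
fn levenshtein_nullable(a: Option<&str>, b: Option<&str>) -> Option<i32> {
    match (a, b) {
        (Some(a), Some(b)) => Some(levenshtein(a, b) as i32),
        _ => None,
    }
}
```

Under these assumptions, `levenshtein("kitten", "sitting")` evaluates to 3, and `levenshtein("日本", "日本語")` evaluates to 1 even though the inputs differ by three UTF-8 bytes.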
   
   **Scala Serde (strings.scala + QueryPlanSerde.scala)**
   - Registered via `CometScalarFunction("levenshtein")`, which reuses the existing ScalarFunc proto pathway
   - No new protobuf message needed (reuses generic ScalarFunc)
   
   **UDF Registration (comet_scalar_funcs.rs)**
   - Registered `spark_levenshtein` in the `create_comet_physical_fun_with_eval_mode` match
   
   **Tests (CometStringExpressionSuite.scala)**
   - `test("levenshtein")` — basic edit distance computation
   - `test("levenshtein with nulls")` — NULL propagation
   - `test("levenshtein with unicode")` — character-level distance for CJK and emoji
   
   ## How are these changes tested?
   
   - Rust unit tests: `cargo test -p datafusion-comet-spark-expr`
   - Spark integration tests: `CometStringExpressionSuite` (3 new test cases)
   - All tests use `checkSparkAnswerAndOperator` to verify that the result matches Spark and that the expression runs natively


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

