Myx778 commented on PR #4105:
URL: https://github.com/apache/datafusion-comet/pull/4105#issuecomment-4334273434
## Benchmark Results
Ran `CometStringExpressionBenchmark` on a GitHub Codespace (4-core AMD EPYC
7763, 16 GB RAM, Ubuntu 24.04, JDK 17.0.18).
### levenshtein (2-arg)
```
OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux 6.8.0-1044-azure
AMD EPYC 7763 64-Core Processor
levenshtein:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                21             24           3          0.0       20689.0       1.0X
Comet (Scan)                                         22             24           2          0.0       21926.5       0.9X
Comet (Scan + Exec)                                  22             25           3          0.0       21696.1       1.0X
```
### levenshtein_threshold (3-arg)
```
OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux 6.8.0-1044-azure
AMD EPYC 7763 64-Core Processor
levenshtein_threshold:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                16             18           2          0.1       15537.4       1.0X
Comet (Scan)                                         17             20           3          0.1       16607.5       0.9X
Comet (Scan + Exec)                                  17             19           3          0.1       16605.2       0.9X
```
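A quick note on reading the tables: the Relative column appears to be the Spark row's time divided by each row's time, and dividing the Per Row(ns) values reproduces every printed factor. The small Rust calculation below is only an illustrative check over numbers copied from the two tables; the helper names `relative` and `slowdown_pct` are hypothetical and not part of the benchmark harness.

```rust
/// Relative speed as the benchmark appears to print it: baseline time
/// divided by candidate time, formatted with one decimal and an "X".
fn relative(baseline_ns: f64, candidate_ns: f64) -> String {
    format!("{:.1}X", baseline_ns / candidate_ns)
}

/// Percentage by which the candidate is slower (positive) than the baseline.
fn slowdown_pct(baseline_ns: f64, candidate_ns: f64) -> f64 {
    (candidate_ns / baseline_ns - 1.0) * 100.0
}

fn main() {
    // Per Row(ns) values copied from the two benchmark tables.
    let cases = [
        ("levenshtein", 20689.0, [("Comet (Scan)", 21926.5), ("Comet (Scan + Exec)", 21696.1)]),
        ("levenshtein_threshold", 15537.4, [("Comet (Scan)", 16607.5), ("Comet (Scan + Exec)", 16605.2)]),
    ];
    for (name, spark_ns, comets) in cases {
        for (label, ns) in comets {
            println!(
                "{name} / {label}: {} ({:.1}% slower than Spark)",
                relative(spark_ns, ns),
                slowdown_pct(spark_ns, ns)
            );
        }
    }
}
```

Working from per-row times rather than the rounded Best Time(ms) column avoids rounding noise: 21 ms vs 22 ms would suggest a gap of almost 5%, while the per-row figures put the four Comet cases at roughly 5 to 7% slower than Spark.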
### Analysis
Comet is roughly on par with Spark for both variants (0.9X to 1.0X, i.e. about
5 to 7% slower per row). This is expected because:
1. The benchmark dataset is small (1024 rows of short, repeated strings),
so fixed overheads such as JVM↔native transitions dominate the actual computation time
2. `levenshtein` is a character-by-character comparison, so any real gains
from native execution would more likely show up on larger datasets, where SIMD
auto-vectorization and a cache-efficient memory layout have more impact
3. The benchmark prints `WARNING: Benchmark plan is NOT fully Comet native!`,
indicating that execution falls back through `ColumnarToRow`, which adds conversion overhead
The key benefit is **correctness**: the native path produces results
identical to Spark's, as validated by the unit tests and SLT tests.
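Point 2 above describes `levenshtein` as a character-by-character comparison. For reference, here is a minimal sketch of the textbook two-row dynamic-programming edit distance, with an optional threshold modeled on Spark's documented 3-arg behavior (return -1 when the distance exceeds the threshold). This is an illustration under those assumptions, not Comet's or Spark's actual kernel; the function name and the `Option<i32>` threshold parameter are hypothetical.

```rust
/// Two-row dynamic-programming edit distance (insert, delete and substitute
/// each cost 1), computed over Unicode scalar values rather than bytes.
/// If `threshold` is Some(t) and the distance exceeds t, returns -1
/// (assumed to mirror Spark's 3-arg `levenshtein`).
fn levenshtein(a: &str, b: &str, threshold: Option<i32>) -> i32 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev holds the previous DP row: prev[j] = distance(a[..i], b[..j]).
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        // Turning a[..=i] into an empty string takes i + 1 deletions.
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j + 1] + 1) // deletion
                .min(curr[j] + 1) // insertion
                .min(prev[j] + cost); // substitution or match
        }
        // The row just computed becomes the "previous" row for the next char.
        std::mem::swap(&mut prev, &mut curr);
    }
    let dist = prev[b.len()] as i32;
    match threshold {
        Some(t) if dist > t => -1,
        _ => dist,
    }
}

fn main() {
    println!("{}", levenshtein("kitten", "sitting", None)); // 3
    println!("{}", levenshtein("kitten", "sitting", Some(2))); // -1
}
```

Each cell reads its left neighbour in the same row, so a row is filled sequentially, and each pair costs O(len(a) × len(b)) work. With the short strings used in this benchmark, that is very little computation per row, which is consistent with fixed overheads dominating the timings.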
--
This is an automated message from the Apache Git Service.