2010YOUY01 commented on code in PR #21821:
URL: https://github.com/apache/datafusion/pull/21821#discussion_r3141639368
##########
benchmarks/src/hj.rs:
##########
@@ -303,6 +303,198 @@ const HASH_QUERIES: &[HashJoinQuery] = &[
build_size: "100K_(20%_dups)",
probe_size: "60M",
},
+ // RightSemi Join benchmarks with Int32 keys
+ // Q16: RightSemi, 100% Density, 100% Hit rate
+ HashJoinQuery {
+ sql: r###"SELECT l.k
Review Comment:
It might be clearer to express these directly using `RIGHT SEMI JOIN`, for
example:
```sh
DataFusion CLI v53.1.0
> select count(*)
from generate_series(100) as t1(v1)
right semi join generate_series(100000) as t2(v1)
on t1.v1 > t2.v1;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row(s) fetched.
Elapsed 0.077 seconds.
> select count(*)
from generate_series(100) as t1(v1)
right anti join generate_series(100000) as t2(v1)
on t1.v1 > t2.v1;
+----------+
| count(*) |
+----------+
| 99901 |
+----------+
1 row(s) fetched.
Elapsed 0.007 seconds.
```
I'm not sure whether this is standard SQL 🤔, but DataFusion supports it and
it's easier to read.
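
As a sanity check on the semantics, here is a minimal Rust sketch (not part of the PR) that recomputes the two counts above by brute force. It assumes that one-argument `generate_series(n)` yields `0..=n`, which is consistent with the CLI output; the function names `series`, `right_semi_count`, and `right_anti_count` are hypothetical helpers for illustration only.

```rust
/// Values of a one-argument `generate_series(end)`, assumed here to be 0..=end.
fn series(end: i64) -> impl Iterator<Item = i64> {
    0..=end
}

/// RIGHT SEMI JOIN ... ON t1.v1 > t2.v1:
/// counts right-side rows that have at least one matching left-side row.
fn right_semi_count(left_end: i64, right_end: i64) -> usize {
    series(right_end)
        .filter(|r| series(left_end).any(|l| l > *r))
        .count()
}

/// RIGHT ANTI JOIN ... ON t1.v1 > t2.v1:
/// counts right-side rows that have no matching left-side row.
fn right_anti_count(left_end: i64, right_end: i64) -> usize {
    series(right_end)
        .filter(|r| !series(left_end).any(|l| l > *r))
        .count()
}
```

Under that assumption, `right_semi_count(100, 100_000)` gives 100 and `right_anti_count(100, 100_000)` gives 99901, matching the CLI session. Every right-side row falls into exactly one of the two results, so the counts sum to the 100001 rows of `t2`.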
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]