adriangb opened a new pull request, #22760: URL: https://github.com/apache/datafusion/pull/22760
## Which issue does this PR close?

- Relates to https://github.com/apache/datafusion/issues/11900

## Rationale for this change

This splits the test and benchmark scaffolding out of #21621 so the `PushDownTopKThroughJoin` optimizer rule itself can be reviewed in isolation, with a small, focused diff.

The benchmark and SLT files here do not depend on the rule. They are committed first so that:

1. The benchmark can measure the rule's effect against a baseline that does not register it.
2. The follow-up rule PR's diff shows exactly which plans change, since the EXPLAIN plans here capture the current (pre-rule) behavior.

## What changes are included in this PR?

- A `push_down_topk` benchmark (`dfbench push-down-topk`) that runs `ORDER BY <cols> LIMIT N` queries over outer joins against TPC-H `customer`/`orders`/`nation`, plus its query files under `benchmarks/queries/push_down_topk/`.
- `push_down_topk_through_join.slt` covering the scenarios the rule handles: preserved-side sort keys, ineligible join types (inner/full/semi/anti), `ON`-clause filters, projection and `SubqueryAlias` resolution, existing child sorts, ties, multi-level joins, `OFFSET`, and volatile expressions.

The EXPLAIN plans assert current behavior (TopK not yet pushed through the join). The follow-up PR that adds the rule updates those plans in place; the query-result checks hold regardless of whether the rule is enabled.

The new optimizer rule, the `push_down_limit.rs` changes, and the `optimizer_rule_reference.md` update from #21621 are intentionally left for the follow-up PR.

## Are these changes tested?

Yes: this PR is the tests. `push_down_topk_through_join.slt` passes against `main`, and the benchmark binary compiles and runs.

## Are there any user-facing changes?

No. No API changes; only new benchmark and test files plus benchmark CLI wiring.

--
This is an automated message from the Apache Git Service.
