xudong963 opened a new pull request, #21945: URL: https://github.com/apache/datafusion/pull/21945
## Which issue does this PR close?

N/A. This is a benchmark follow-up for #21637.

## Rationale for this change

This adds a ClickBench extended query that exercises Parquet filter pushdown when row group statistics can prove a string range predicate matches every row in the row group.

This case is useful for validating the optimization in #21637: when Parquet statistics prove a row group is fully matched, DataFusion can avoid evaluating the pushed-down RowFilter for that row group.

## What changes are included in this PR?

- Add `benchmarks/queries/clickbench/extended/q13.sql`.
- Document Q13 in the ClickBench query README.
- Keep `clickbench_extended` behavior unchanged. This query should be run with an explicit `dfbench clickbench --pushdown --query 13`.

## Are these changes tested?

Ran:

```bash
cargo fmt --all
bash -n benchmarks/bench.sh
git diff --check -- benchmarks/bench.sh benchmarks/queries/clickbench/README.md benchmarks/queries/clickbench/extended/q13.sql
```

I also ran a local synthetic-data comparison for this query. With `target_partitions=1`, the #21637 branch reduced scan processing time from about 85.82ms to 24.89ms, reduced `bytes_scanned` from 26.12M to 400.6K, and reduced `row_pushdown_eval_time` from 4.12ms to effectively zero.

I attempted:

```bash
cargo clippy --all-targets --all-features -- -D warnings
```

That command fails on the existing benchmarks allocator feature conflict where `snmalloc` and `mimalloc` are both enabled for `benchmarks/src/bin/imdb.rs`.

## Are there any user-facing changes?

No public API changes. This adds a benchmark query and benchmark documentation.
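
For readers unfamiliar with the case described in the rationale, here is a minimal sketch of how min/max statistics can prove that a string range predicate matches every row in a row group. It is not the implementation in #21637: `RowGroupStats`, `StringRange`, `RowGroupDecision`, and `classify` are hypothetical names made up for illustration, and the predicate is assumed to be an inclusive range on a single string column compared bytewise.

```rust
/// Hypothetical, simplified per-row-group statistics for one string column.
/// These names are illustrative only; they are not DataFusion or parquet APIs.
#[derive(Debug, Clone)]
struct RowGroupStats {
    min: Option<String>,
    max: Option<String>,
    null_count: Option<u64>,
}

/// Assumed predicate shape: lo <= value AND value <= hi (bytewise order).
struct StringRange<'a> {
    lo: &'a str,
    hi: &'a str,
}

#[derive(Debug, PartialEq)]
enum RowGroupDecision {
    /// No row can match, so the row group can be pruned.
    Skip,
    /// Every row matches, so the pushed-down filter need not be evaluated.
    FullyMatched,
    /// Statistics are inconclusive, so the filter must run row by row.
    NeedsFilter,
}

fn classify(stats: &RowGroupStats, pred: &StringRange) -> RowGroupDecision {
    // Without both bounds nothing can be proven.
    let (min, max) = match (stats.min.as_deref(), stats.max.as_deref()) {
        (Some(min), Some(max)) => (min, max),
        _ => return RowGroupDecision::NeedsFilter,
    };
    // The value range and the predicate range do not overlap.
    if max < pred.lo || min > pred.hi {
        return RowGroupDecision::Skip;
    }
    // The value range lies inside the predicate range. Nulls never satisfy
    // the comparison, so a known null count of zero is also required.
    if stats.null_count == Some(0) && min >= pred.lo && max <= pred.hi {
        return RowGroupDecision::FullyMatched;
    }
    RowGroupDecision::NeedsFilter
}
```

Under these assumptions, a row group with `min = "b"`, `max = "d"`, and no nulls is classified as `FullyMatched` for the range `"a"..="z"`, which is the situation Q13 is meant to exercise. The same row group with a nonzero or unknown null count falls back to `NeedsFilter`.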
