xudong963 opened a new pull request, #21945:
URL: https://github.com/apache/datafusion/pull/21945

   ## Which issue does this PR close?
   
   N/A. This is a benchmark follow-up for #21637.
   
   ## Rationale for this change
   
   This adds a ClickBench extended query that exercises Parquet filter pushdown 
when row group statistics can prove a string range predicate matches every row 
in the row group.
   
   This case is useful for validating the optimization in #21637: when Parquet 
statistics prove a row group is fully matched, DataFusion can avoid evaluating 
the pushed-down RowFilter for that row group.
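   
   The decision this query exercises can be sketched roughly as follows. This is 
   only an illustration, not the code from #21637: `StringStats`, `RowGroupMatch`, 
   and `classify` are hypothetical names, the predicate is assumed to be a 
   half-open range `lo <= col AND col < hi`, and the min/max statistics are 
   assumed to be valid bounds for the column.
   
   ```rust
   /// Outcome of comparing a row group's min/max statistics with a range predicate.
   #[derive(Debug, PartialEq)]
   enum RowGroupMatch {
       /// No row can match, so the row group can be skipped entirely.
       Pruned,
       /// Every row matches, so the row filter does not need to be evaluated.
       FullyMatched,
       /// Some rows may match, so the row filter still has to run.
       Partial,
   }

   /// Hypothetical per-row-group statistics for a single string column.
   struct StringStats<'a> {
       min: &'a str,
       max: &'a str,
       null_count: u64,
   }

   /// Classify a row group against the predicate `lo <= col AND col < hi`.
   /// Strings are compared byte-wise, which is how Rust orders `&str`.
   fn classify(stats: &StringStats<'_>, lo: &str, hi: &str) -> RowGroupMatch {
       if stats.max < lo || stats.min >= hi {
           // The whole value range lies outside the predicate range.
           RowGroupMatch::Pruned
       } else if stats.null_count == 0 && stats.min >= lo && stats.max < hi {
           // The whole value range lies inside the predicate range and no
           // NULLs are present (a NULL would not satisfy the predicate).
           RowGroupMatch::FullyMatched
       } else {
           RowGroupMatch::Partial
       }
   }
   ```
   
   In this sketch only the `Partial` case pays the per-row filter cost, which is 
   the cost the new query is meant to make visible.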
   
   ## What changes are included in this PR?
   
   - Add `benchmarks/queries/clickbench/extended/q13.sql`.
   - Document Q13 in the ClickBench query README.
   - Keep `clickbench_extended` behavior unchanged. This query should be run 
with an explicit `dfbench clickbench --pushdown --query 13`.
   
   ## Are these changes tested?
   
   Ran:
   
   ```bash
   cargo fmt --all
   bash -n benchmarks/bench.sh
   git diff --check -- benchmarks/bench.sh \
     benchmarks/queries/clickbench/README.md \
     benchmarks/queries/clickbench/extended/q13.sql
   ```
   
   I also ran a local synthetic-data comparison for this query. With 
`target_partitions=1`, the #21637 branch reduced scan processing time from 
about 85.82ms to 24.89ms, reduced `bytes_scanned` from 26.12M to 400.6K, and 
reduced `row_pushdown_eval_time` from 4.12ms to effectively zero.
   
   I attempted:
   
   ```bash
   cargo clippy --all-targets --all-features -- -D warnings
   ```
   
   That command fails because of a pre-existing allocator feature conflict in 
the benchmarks, where `snmalloc` and `mimalloc` are both enabled for 
`benchmarks/src/bin/imdb.rs`. The failure is unrelated to this change.
   
   ## Are there any user-facing changes?
   
   No public API changes. This adds a benchmark query and benchmark 
documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]