Omega359 commented on PR #22001: URL: https://github.com/apache/datafusion/pull/22001#issuecomment-4373963246
> I think I am missing some of the larger plan for this code. My hope was for a binary that could run the same benchmark setup as the criterion runner (e.g. the content of benchmarks/sql_benchmarks) rather than a new set of definitions

Ah, yes, I really do need to work on explaining what I'm planning and/or doing. I tend to just go ahead and do things and tell others why later :/

Since using environment variables to pass options into sql benchmarks isn't exactly simple to figure out (you need to read the docs, typos are easy to make, etc.), and the issue this PR partially resolves asked for a better approach, I went ahead and wrote a whole runner that can dynamically discover sql benchmarks (via suite files) and their options, and provide help, required options, and so on. I've just added support for native benchmarks locally as well, so `benchmark_runner list` would show all of the benchmarks, not just the sql ones.

My plan is a benchmark runner that replaces both dfbench and the criterion benchmark runner, so that datafusion has a single benchmark runner that can run either the sql benchmarks or the native ones, with built-in discovery, help, and information on suites and individual benchmarks. I've got a full working solution locally, but I need to tighten it up (I need to add criterion baseline support, for example). I suppose we could just rename the benchmark runner to dfbench at that point.

In any case, I split the work into 3 separate branches to make review easier. This is the first of the 3; the other two add support for the info and query subcommands (query only applies to sql benchmarks) and finally the run subcommand (a rough sketch of the overall subcommand surface is at the end of this comment). The last one comes with documentation updates but doesn't touch bench.sh or dfbench. Run supports both the dfbench-style simple '# of iterations' approach and using criterion. Per-suite arguments are supported, coming either from the .suite file or from the RunOpts of the native benchmark.

If all of these are approved, I plan on submitting PRs to update benchmarks/README.md, to update bench.sh to use the new runner, to remove or update sql.rs (for cargo bench - the update would have it just delegate to the new runner, though I haven't yet looked into that), and to remove dfbench.rs. After all that I plan on submitting PRs for each migration of a native benchmark to be sql based (tpcds, h2o, etc.). Once those are done, I think the only native benchmark left would be the cancellation one.

I hope that clarifies things a bit.
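To make the subcommand layout concrete, here's a minimal sketch using clap's derive API. This is just an illustration of the shape I have in mind; the struct names, flags, and argument types here are placeholders, not the final CLI.

```rust
// Sketch of the runner's subcommand surface (list / info / query / run).
// Names and flags are illustrative only.
use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "benchmark_runner")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// List all discovered benchmarks (sql suites and native ones)
    List,
    /// Show help and required options for a benchmark or suite
    Info { name: String },
    /// Print the queries of a sql benchmark (sql benchmarks only)
    Query { name: String },
    /// Run a benchmark, either with a simple iteration count or via criterion
    Run {
        name: String,
        /// dfbench-style '# of iterations' mode
        #[arg(long)]
        iterations: Option<usize>,
        /// use criterion instead of plain iterations
        #[arg(long)]
        criterion: bool,
    },
}

fn main() {
    let cli = Cli::parse();
    match cli.command {
        Command::List => { /* discover .suite files + native benchmarks */ }
        Command::Info { name } => { let _ = name; /* print options */ }
        Command::Query { name } => { let _ = name; /* print queries */ }
        Command::Run { name, iterations, criterion } => {
            let _ = (name, iterations, criterion);
            /* dispatch to the sql or native runner */
        }
    }
}
```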
