Omega359 commented on PR #22001: URL: https://github.com/apache/datafusion/pull/22001#issuecomment-4373963246
> I think I am missing some of the larger plan for this code. My hope was for a binary that could run the same benchmark setup as the criterion runner (e.g. the content of benchmarks/sql_benchmarks) rather than a new set of definitions

Ah, yes, I really do need to work on explaining what I'm planning and/or doing. I tend to just go ahead and do things and tell others why later :/

Since using environment variables to pass options into sql benchmarks isn't exactly simple to figure out (you need to read the docs, typos are easy to make, etc.), and the issue this PR partially resolves asked for a better approach, I went ahead and wrote a whole runner that can dynamically discover sql benchmarks (via suite files) and their options, and provide help, required options, and so on. I've just added support for native benchmarks locally as well, so `benchmark_runner list` would show all of the benchmarks, not just the sql ones.

My plan is a benchmark runner that replaces both dfbench and the criterion benchmark runner, so that datafusion has a single benchmark runner that can run either the sql benchmarks or the native ones, with built-in discovery, help, and information on suites and individual benchmarks. I've got a full working solution locally, but I need to tighten it up (I need to add criterion baseline support, for example). I suppose we could just rename the benchmark runner to dfbench at that point.

In any case, I split the work into 3 separate branches to make review easier. This is the first of the 3; the other two add support for the info and query subcommands (query only applies to sql benchmarks) and finally the run subcommand (a rough sketch of the overall subcommand surface is at the end of this comment). The last one comes with documentation updates but doesn't touch bench.sh or dfbench. Run supports both the dfbench-style simple '# of iterations' approach and using criterion. Per-suite arguments are supported, coming either from the .suite file or from the RunOpts of the native benchmark.

If all of these are approved, I plan on submitting PRs to update benchmarks/README.md, to update bench.sh to use the new runner, to remove or update sql.rs (for cargo bench - the update would have it just delegate to the new runner, though I haven't yet looked into that), and to remove dfbench.rs. After all that I plan on submitting PRs for each migration of a native benchmark to be sql based (tpcds, h2o, etc.). Once those are done, I think the only native benchmark left would be the cancellation one.

I hope that clarifies things a bit.
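To make the subcommand layout concrete, here's a minimal sketch using clap's derive API. This is just an illustration of the shape I have in mind; the struct names, flags, and argument types here are placeholders, not the final CLI.

```rust
// Sketch of the runner's subcommand surface (list / info / query / run).
// Names and flags are illustrative only.
use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "benchmark_runner")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// List all discovered benchmarks (sql suites and native ones)
    List,
    /// Show help and required options for a benchmark or suite
    Info { name: String },
    /// Print the queries of a sql benchmark (sql benchmarks only)
    Query { name: String },
    /// Run a benchmark, either with a simple iteration count or via criterion
    Run {
        name: String,
        /// dfbench-style '# of iterations' mode
        #[arg(long)]
        iterations: Option<usize>,
        /// use criterion instead of plain iterations
        #[arg(long)]
        criterion: bool,
    },
}

fn main() {
    let cli = Cli::parse();
    match cli.command {
        Command::List => { /* discover .suite files + native benchmarks */ }
        Command::Info { name } => { let _ = name; /* print options */ }
        Command::Query { name } => { let _ = name; /* print queries */ }
        Command::Run { name, iterations, criterion } => {
            let _ = (name, iterations, criterion);
            /* dispatch to the sql or native runner */
        }
    }
}
```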
