mbutrovich commented on PR #21499: URL: https://github.com/apache/datafusion/pull/21499#issuecomment-4307022282
Thanks for working on this @rluvaton. As a downstream consumer maintaining [Comet's DataFusion upgrade branch](https://github.com/apache/datafusion-comet/pull/3916), I can confirm this would be valuable. The crates that generate the most upgrade churn for us are `physical_plan`, `physical_expr`, and `logical_expr` (trait changes cascade across dozens of impl sites), so catching those automatically would help a lot. A few things I noticed: **Workspace member parsing is fragile.** The `sed`-based TOML parsing in `changed_crates.sh` will break if the format drifts, and the hardcoded exclusion list misses several unpublished crates (`datafusion/wasmtest`, `datafusion-examples`, `datafusion/proto/gen`, `datafusion/proto-common/gen`). `cargo metadata --no-deps --format-version 1 | jq` can give you workspace members with their `publish` field, which handles this correctly and doesn't need a manual exclusion list. **Double output in `semver-check`.** The `tee` writes to stdout *and* the file, then the `sed` strip also writes to stdout, so the workflow captures both concatenated. Either `tee /dev/stderr > /tmp/semver-output.txt` or just write to the file without tee. **`cargo install cargo-semver-checks` on every PR run** will be slow (3-5 min compile). Consider caching the cargo install dir or using a pre-built binary. **Unquoted `$args`** (line 83) works by accident via word-splitting. A bash array would be more robust. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
