andygrove opened a new pull request, #4206: URL: https://github.com/apache/datafusion-comet/pull/4206
## Which issue does this PR close?

Closes #.

## Rationale for this change

Comet emulates Spark behavior across many subsystems: expressions, the optimizer, Parquet read and write, shuffle, joins, aggregates, and more. When Spark changes behavior on `master`, Comet may need to follow. Today there is no documented, repeatable process for the community to notice those changes commit-by-commit. This PR introduces that process so the project can stay aware of upstream Spark changes since `branch-4.2` was cut and not silently diverge.

The work was scaffolded with the project `superpowers:brainstorming` skill, with the spec and plan kept on disk only.

## What changes are included in this PR?

- `docs/source/contributor-guide/spark_commit_audit.md`: human-facing process page with rubric, scope, states, and workflow. Linked from the contributor guide index.
- `dev/spark-commit-audit.md`: the audit log itself, populated with the 2 in-scope `sql/` commits on `apache/spark` `master` since `branch-4.2` was cut. Each line carries a short hash, date, state, and subject.
- `dev/regenerate-spark-audit.py`: bootstrap and incremental update script. Idempotent; preserves existing verdicts and prose notes by short hash. Reuses the existing `dev/release/venv` (PyGithub).
- `dev/test_regenerate_spark_audit.py`: 15 unit tests over the script's pure helpers (`parse_existing_block`, `format_new_line`, `is_in_scope`, `merge_lines`, `replace_block`).
- `.claude/skills/audit-spark-commit/SKILL.md`: thin Claude skill that audits one commit per invocation, reads the contributor guide for the rubric, proposes a verdict, and updates the audit log line in place after user review.

## How are these changes tested?

- `python3 dev/test_regenerate_spark_audit.py`: 15 unit tests over the script's pure functions, all pass.
- Smoke test via `python dev/regenerate-spark-audit.py --dry-run --limit 5`, then full bootstrap.
- Manual idempotency check: edited a populated line, re-ran the script, confirmed the manual edit was preserved by short hash, then reverted.
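
For readers unfamiliar with the "preserve by short hash" idea, the following is a minimal sketch of how such an idempotent merge could work. It is not the code from `dev/regenerate-spark-audit.py`: the line layout, the state names (`pending`, `relevant`), the sample hashes and dates, and the helpers `AuditLine`, `format_line`, `parse_block`, and `merge` are all hypothetical and chosen only for illustration. It also simplifies by treating each entry as a single line, so multi-line prose notes are out of scope here.

```python
import re
from typing import Dict, List, NamedTuple


class AuditLine(NamedTuple):
    """One upstream commit, as it would appear in the audit log (hypothetical shape)."""
    short_hash: str
    date: str
    state: str
    subject: str


# Assumed line layout, for illustration only:
#   - `abc1234` 2025-01-02 [pending] Subject text
LINE_RE = re.compile(
    r"^- `(?P<hash>[0-9a-f]{7,40})` (?P<date>\d{4}-\d{2}-\d{2}) "
    r"\[(?P<state>[^\]]+)\] (?P<subject>.*)$"
)


def format_line(entry: AuditLine) -> str:
    """Render a commit that has no existing entry in the log."""
    return f"- `{entry.short_hash}` {entry.date} [{entry.state}] {entry.subject}"


def parse_block(text: str) -> Dict[str, str]:
    """Map short hash -> existing line, kept verbatim so manual edits survive."""
    existing: Dict[str, str] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            existing[match.group("hash")] = line
    return existing


def merge(existing: Dict[str, str], upstream: List[AuditLine]) -> List[str]:
    """Keep the existing line when the hash is already known, else emit a fresh one."""
    return [existing.get(c.short_hash, format_line(c)) for c in upstream]


if __name__ == "__main__":
    upstream = [
        AuditLine("abc1234", "2025-01-02", "pending", "Hypothetical commit A"),
        AuditLine("def5678", "2025-01-03", "pending", "Hypothetical commit B"),
    ]
    first = merge({}, upstream)
    # Simulate a reviewer recording a verdict by hand.
    edited = first[0].replace("[pending]", "[relevant]")
    rerun = merge(parse_block("\n".join([edited, first[1]])), upstream)
    print("\n".join(rerun))
```

Because the existing line is looked up by hash and reused verbatim, re-running the merge over its own output changes nothing, which is the property the manual idempotency check above exercises.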
