[PR] docs: add audit-expression-page skill for auditing the expression support page [datafusion-comet]

via GitHub Tue, 02 Jun 2026 11:22:36 -0700


andygrove opened a new pull request, #4571:
URL: https://github.com/apache/datafusion-comet/pull/4571


   ## Which issue does this PR close?
   
   No dedicated issue. This adds tooling that supports keeping the expression 
support page accurate, complementing the release-prep step that calls for 
verifying that page.
   
   ## Rationale for this change
   
   `docs/source/user-guide/latest/expressions.md` is the source of truth for 
which Spark expressions Comet supports and at what status. It is 
hand-maintained, so it drifts: a newly registered expression can be missing, a 
status can be stale after a serde change, or a row can linger after a serde is 
removed. There was no repeatable procedure for checking the whole page against 
the registered serdes. The existing `audit-comet-expression` skill audits one 
expression deeply, but nothing swept the page for coverage and status accuracy.
   
   ## What changes are included in this PR?
   
   A new project skill at `.claude/skills/audit-expression-page/SKILL.md`. It 
guides a whole-page audit of `expressions.md` along three dimensions and offers 
to fix the page:
   
   - **Missing coverage:** checks that every expression registered in 
`QueryPlanSerde` (the per-category maps that build `exprSerdeMap`, plus 
`aggrSerdeMap`) appears on the page, resolving serde classes to SQL names via 
Spark's function registry. Operator-injected and shim-wired expressions are 
handled explicitly.
   - **Status accuracy:** checks that each row's status matches the runtime 
behavior, classified from `getSupportLevel`, the `allowIncompatible` default, 
and the `convert` fallback branches. The skill is explicit that 
`getIncompatibleReasons` / `getCompatibleNotes` only generate documentation 
text and do not by themselves drive fallback, so classification is anchored on 
`getSupportLevel`, and a disagreement between those methods and 
`getSupportLevel` is itself reported.
   - **Stale entries:** flags rows marked supported that no longer resolve to a 
registered serde, or whose status contradicts registration.
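
The three checks above reduce to comparisons between two name-to-status mappings: one derived from the registered serdes and one parsed from the page. The following is a minimal sketch of that comparison, not the skill's actual procedure (the skill is a Markdown guide, not a script). The `PageRow` type, the `audit` function, and the status labels are all hypothetical and chosen only for illustration; the real labels come from the legend on `expressions.md`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRow:
    """One row of the expression page (hypothetical shape)."""
    name: str    # SQL function name as listed on the page
    status: str  # status label as written on the page


def audit(registered: dict[str, str],
          rows: list[PageRow],
          supported_labels: set[str]) -> dict[str, list[str]]:
    """Compare code-derived statuses against page rows.

    `registered` maps SQL name -> status inferred from the code
    (assumed to be already classified, e.g. from getSupportLevel).
    `supported_labels` is the set of page labels that claim support.
    """
    on_page = {row.name: row.status for row in rows}

    # Missing coverage: registered in code but absent from the page.
    missing = sorted(set(registered) - set(on_page))

    # Status accuracy: present in both, but the labels disagree.
    mismatched = sorted(
        name for name in registered.keys() & on_page.keys()
        if registered[name] != on_page[name]
    )

    # Stale entries: page claims support, but nothing is registered.
    stale = sorted(
        name for name, status in on_page.items()
        if status in supported_labels and name not in registered
    )

    return {"missing": missing, "status_mismatch": mismatched, "stale": stale}


# Illustrative data only; these names and labels are made up.
registered = {"abs": "supported", "ceil": "supported", "sum": "incompatible"}
rows = [
    PageRow("abs", "supported"),
    PageRow("sum", "supported"),
    PageRow("old_fn", "supported"),
]
report = audit(registered, rows, supported_labels={"supported"})
```

Keeping the three results separate mirrors the PR's framing: a missing row, a wrong status, and a lingering row each call for a different fix to the page.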
   
   The skill reads the status legend from the page at runtime, so it stays 
correct as the legend evolves. It takes an optional `[category]` argument to 
scope a run to one registry category.
   
   ## How are these changes tested?
   
   This adds one Markdown skill file and changes no code. It was verified by 
dry-running the skill end to end on two categories (`agg_funcs` and 
`math_funcs`) against the actual repository, confirming that each of the three 
checks can be followed as written, that the classification anchors on the right 
methods, and that the skill surfaces real coverage and status discrepancies 
with code evidence. Gaps found in the first dry run (category-name mapping, the 
`getIncompatibleReasons` versus `getSupportLevel` distinction, alias and 
operator-injected handling) were fixed and re-verified. `prettier --check` 
passes on the new file.


