andygrove opened a new pull request, #4132:
URL: https://github.com/apache/datafusion-comet/pull/4132

   ## Which issue does this PR close?
   
   N/A. Documentation-only follow-up to issues observed while reviewing scalar 
function PRs (e.g. #4105 for `levenshtein`).
   
   ## Rationale for this change
   
   When a Comet scalar function shares its name with a `datafusion-functions` 
built-in (e.g. `levenshtein`, `concat`, `coalesce`, `sha2`, `regexp_replace`), 
the serde **must** set the return type explicitly on the protobuf `ScalarFunc`. 
If it does not, the native planner consults DataFusion's UDF registry first to 
resolve the return type, and any arity or input-type difference between the 
Spark version and the DataFusion built-in fails native execution at runtime 
with errors like:
   
   ```
   org.apache.comet.CometNativeException: Error from DataFusion:
   Function 'levenshtein' expects 2 arguments but received 3.
   ```
   
   This is subtle because the 2-arg path happens to work even without an 
explicit return type (DF's signature accepts 2 args, the lookup succeeds, then 
Comet's UDF is swapped in). It only blows up when Spark's signature is wider 
than DF's, which is exactly when a contributor is most likely to hit it without 
warning.
   
   The contributor guide already mentions 
`scalarFunctionExprToProtoWithReturnType` in passing but does not explain when 
it is required versus optional. This PR adds an explicit subsection.
   
   ## What changes are included in this PR?
   
   - `docs/source/contributor-guide/adding_a_new_expression.md`: new "When to 
set the return type explicitly" subsection that explains the failure mode, 
lists known colliding names, and shows a `CometLevenshtein`-style example using 
`scalarFunctionExprToProtoWithReturnType`.
   - `.claude/skills/review-comet-pr/SKILL.md`: matching bullet under "Common 
Comet Review Issues" so reviewers catch this on incoming PRs.
   
   ## How are these changes tested?
   
   Documentation-only change. No tests required. The Sphinx build will validate 
the Markdown on the next docs publish.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to