comphead commented on issue #15914:
URL: https://github.com/apache/datafusion/issues/15914#issuecomment-4383429117
Thanks @Jefffrey and @coderfender
There are definitely pros and cons to using `simplify` over execution.
From what I can see:
#### Pros
- Less code to maintain. No bespoke Arrow kernel per function.
- Better downstream optimization. Once rewritten, constant folding, cast
pushdown, and redundant-cast elimination apply naturally. A UDF call is opaque
to those rules.
- Lower risk of correctness drift between the Spark shim and DataFusion's
native type semantics.
#### Cons
- Hidden coupling to the optimizer. `datafusion-spark` consumers must run
`SimplifyExpressions` before execution, which is not obvious from the crate's
surface, and running it is not always possible.
- Inconsistent UDF contract. Most UDFs in the codebase are executable via
`invoke_with_args`; these silently aren't. A caller that builds a physical plan
directly from a logical plan (bypassing optimization) hits a surprising
failure.
- No compile-time or registration-time signal. Nothing tells a downstream
user "this UDF only works after simplification."
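To make the failure mode in the cons above concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's API: `Expr`, `SimplifyOnlyUdf`, `simplify_expr`, and `execute` are hypothetical stand-ins for a logical expression, a UDF that only has a rewrite rule, the simplifier pass, and physical execution.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    /// Stand-in for a native operation the engine can always execute.
    Negate(Box<Expr>),
    /// Call to the user-defined function with a single argument.
    Call(Box<Expr>),
}

/// Hypothetical simplify-only UDF: it has a rewrite rule but no kernel.
struct SimplifyOnlyUdf;

impl SimplifyOnlyUdf {
    /// Rewrite `udf(arg)` into an equivalent native expression.
    fn simplify(&self, arg: Expr) -> Expr {
        Expr::Negate(Box::new(arg))
    }

    /// No executable kernel: direct invocation always fails.
    fn invoke(&self, _arg: i64) -> Result<i64, String> {
        Err("this UDF must be rewritten by the simplifier before execution".to_string())
    }
}

/// Stand-in for the optimizer pass: rewrites every UDF call bottom-up.
fn simplify_expr(expr: Expr, udf: &SimplifyOnlyUdf) -> Expr {
    match expr {
        Expr::Literal(v) => Expr::Literal(v),
        Expr::Negate(inner) => Expr::Negate(Box::new(simplify_expr(*inner, udf))),
        Expr::Call(arg) => udf.simplify(simplify_expr(*arg, udf)),
    }
}

/// Stand-in for physical execution: any remaining UDF call goes to `invoke`.
fn execute(expr: &Expr, udf: &SimplifyOnlyUdf) -> Result<i64, String> {
    match expr {
        Expr::Literal(v) => Ok(*v),
        Expr::Negate(inner) => Ok(-execute(inner, udf)?),
        Expr::Call(arg) => udf.invoke(execute(arg, udf)?),
    }
}
```

In this toy model, executing `Expr::Call(..)` directly returns an error, while running `simplify_expr` first produces a plan that executes normally. Nothing in the types distinguishes the two cases up front.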
Perhaps we can think about unifying the contract, so that when a downstream
project registers a UDF it knows in advance whether the function should be
called as is or via an alternate method. In particular, if a DF release changes
a function's behavior from `invoke` to `simplify`, that is a breaking change
for downstream even though the API itself is unchanged.
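One possible shape for such a registration-time signal, sketched in plain Rust under loud assumptions: `ExecutionContract`, `UdfInfo`, and `Registry` are hypothetical names invented for illustration and do not exist in DataFusion. The idea is only that a UDF declares how it must be evaluated, and registration fails early if the context cannot honor that.

```rust
use std::collections::HashMap;

/// Hypothetical declaration of how a UDF must be evaluated.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecutionContract {
    /// The function can be invoked directly at execution time.
    Invoke,
    /// The function only works after the simplifier has rewritten it.
    SimplifyOnly,
}

/// Hypothetical registration record for a UDF.
struct UdfInfo {
    name: &'static str,
    contract: ExecutionContract,
}

/// Hypothetical registry that knows whether its context runs the simplifier.
struct Registry {
    udfs: HashMap<&'static str, ExecutionContract>,
    runs_simplifier: bool,
}

impl Registry {
    fn new(runs_simplifier: bool) -> Self {
        Registry { udfs: HashMap::new(), runs_simplifier }
    }

    /// Reject simplify-only UDFs up front when the simplifier is not run,
    /// instead of failing later at execution time.
    fn register(&mut self, udf: UdfInfo) -> Result<(), String> {
        if udf.contract == ExecutionContract::SimplifyOnly && !self.runs_simplifier {
            return Err(format!(
                "UDF `{}` only works after simplification, but this context does not run the simplifier",
                udf.name
            ));
        }
        self.udfs.insert(udf.name, udf.contract);
        Ok(())
    }
}
```

With something like this, a switch from `invoke` to `simplify` would surface as a change in the declared contract rather than as a runtime surprise.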
The simplification itself is useful: as a downstream project, we can simplify
our own code too by rewriting the call to a particular function into the same
simplified expression that DF provides.
So, regarding action items, I would highlight that changing the behavior from
`invoke` to `simplify` should be registered as a breaking change and needs to
be covered in the Migration Guide. Contract unification is nice to have but not
urgent.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]