comphead commented on issue #15914:
URL: https://github.com/apache/datafusion/issues/15914#issuecomment-4383429117

   Thanks @Jefffrey and @coderfender 
   
There are definitely pros and cons to having `simplify` rewrite these functions instead of executing them directly. From what I can see:
   
#### Pros

- Less code to maintain: no bespoke Arrow kernel per function.
- Better downstream optimization: once rewritten, constant folding, cast pushdown, and redundant-cast elimination apply naturally, whereas a UDF call is opaque to those rules.
- Less risk of correctness drift between the Spark shim and DataFusion's native type semantics.
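The optimization point in the Pros list can be made concrete with a toy expression tree. This is a minimal sketch, not DataFusion's `Expr` or `ScalarUDFImpl`: the `Expr` enum, `simplify`, `fold_constants`, and the `spark_to_int` function are all hypothetical. It shows that a constant-folding pass can evaluate a cast of a literal (and drop a cast of a value that is already an integer), but has to leave an opaque UDF call untouched until a rewrite turns it into a cast.

```rust
// Toy model only: none of these types are DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Str(String),
    // Native cast to integer: the optimizer understands its semantics.
    CastToInt(Box<Expr>),
    // Opaque user-defined function call, known only by name.
    Udf { name: String, args: Vec<Expr> },
}

/// Hypothetical `simplify` for a hypothetical `spark_to_int` UDF:
/// rewrite the call into a native cast, recursing into children.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Udf { name, mut args } if name == "spark_to_int" && args.len() == 1 => {
            Expr::CastToInt(Box::new(simplify(args.remove(0))))
        }
        Expr::Udf { name, args } => Expr::Udf {
            name,
            args: args.into_iter().map(simplify).collect(),
        },
        Expr::CastToInt(inner) => Expr::CastToInt(Box::new(simplify(*inner))),
        other => other,
    }
}

/// Constant folding: evaluates casts of literals, but must treat UDF
/// calls as opaque because it knows nothing about their semantics.
fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::CastToInt(inner) => match fold_constants(*inner) {
            Expr::Str(s) => {
                let parsed = s.trim().parse::<i64>();
                match parsed {
                    Ok(v) => Expr::Int(v),
                    // Not foldable: keep the cast for runtime.
                    Err(_) => Expr::CastToInt(Box::new(Expr::Str(s))),
                }
            }
            // Redundant-cast elimination: the value is already an integer.
            Expr::Int(v) => Expr::Int(v),
            other => Expr::CastToInt(Box::new(other)),
        },
        Expr::Udf { name, args } => Expr::Udf {
            name,
            args: args.into_iter().map(fold_constants).collect(),
        },
        other => other,
    }
}

fn main() {
    let call = Expr::Udf {
        name: "spark_to_int".to_string(),
        args: vec![Expr::Str("42".to_string())],
    };
    // Without the rewrite, folding cannot see inside the UDF call.
    assert_eq!(fold_constants(call.clone()), call);
    // After the rewrite, the same call folds to a literal.
    assert_eq!(fold_constants(simplify(call)), Expr::Int(42));
    println!("ok");
}
```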
                                                 
                                                                                
                                                                                
                                              
#### Cons

- Hidden coupling to the optimizer: `datafusion-spark` consumers must run `SimplifyExpressions` before execution, which is not obvious from the crate's surface, and this is not always possible.
- Inconsistent UDF contract: most UDFs in the codebase are executable via `invoke_with_args`; these silently aren't. A caller that builds a physical plan directly from a logical plan (bypassing optimization) hits a surprising failure.
- No compile-time or registration-time signal: nothing tells a downstream user "this UDF only works after simplification."
   
   
Perhaps we can think about unifying the contract, so that when a downstream project registers a UDF it knows in advance whether the function should be called as is or via an alternate path. This matters especially across releases: if a DataFusion release changes a function's behavior from `invoke` to `simplify`, that is a breaking change for downstream users even though the API itself has not changed.
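One possible shape for the contract unification suggested above, sketched under loud assumptions: `ExecutionContract`, `ToyUdf`, `Session`, and the example UDFs (`upper`, `spark_to_int`) are hypothetical and do not exist in DataFusion. Each UDF declares up front whether it can be invoked directly or only after simplification, and registration fails immediately when a session that skips simplification tries to register a simplify-only function.

```rust
// Hypothetical sketch: none of these names exist in DataFusion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionContract {
    /// Executable directly via the invoke path.
    Invoke,
    /// Only usable after the simplifier rewrites the call away.
    SimplifyOnly,
}

trait ToyUdf {
    fn name(&self) -> &str;
    fn contract(&self) -> ExecutionContract;
}

/// Hypothetical UDF with its own kernel.
struct NativeUpper;
impl ToyUdf for NativeUpper {
    fn name(&self) -> &str { "upper" }
    fn contract(&self) -> ExecutionContract { ExecutionContract::Invoke }
}

/// Hypothetical UDF implemented only as a rewrite.
struct RewriteOnly;
impl ToyUdf for RewriteOnly {
    fn name(&self) -> &str { "spark_to_int" }
    fn contract(&self) -> ExecutionContract { ExecutionContract::SimplifyOnly }
}

/// A session that may or may not run expression simplification.
struct Session {
    simplifier_enabled: bool,
    registered: Vec<String>,
}

impl Session {
    /// Reject a simplify-only UDF at registration instead of failing at execution.
    fn register(&mut self, udf: &dyn ToyUdf) -> Result<(), String> {
        if udf.contract() == ExecutionContract::SimplifyOnly && !self.simplifier_enabled {
            return Err(format!(
                "UDF `{}` only works after expression simplification, which this session skips",
                udf.name()
            ));
        }
        self.registered.push(udf.name().to_string());
        Ok(())
    }
}

fn main() {
    let mut no_simplify = Session { simplifier_enabled: false, registered: Vec::new() };
    assert!(no_simplify.register(&NativeUpper).is_ok());
    assert!(no_simplify.register(&RewriteOnly).is_err());

    let mut with_simplify = Session { simplifier_enabled: true, registered: Vec::new() };
    assert!(with_simplify.register(&RewriteOnly).is_ok());
    assert_eq!(with_simplify.registered, vec!["spark_to_int".to_string()]);
    println!("ok");
}
```

Failing at registration rather than at execution means that if a release moves a function from `invoke` to `simplify`, downstream users get an explicit error when they register it instead of a surprising failure in a hand-built physical plan.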
   
The simplification itself is useful, and as a downstream project we can make our code simpler as well, by rewriting calls to a particular function into the same simplified expression that DataFusion produces.
   
So, regarding action items, I would highlight that changing a function's behavior from `invoke` to `simplify` should be registered as a breaking change and documented in the Migration Guide. Contract unification is nice to have, but not urgent.