Geethapranay1 commented on issue #4131:
URL: 
https://github.com/apache/datafusion-comet/issues/4131#issuecomment-4386430150

   Hi @comphead, I just went through the merged pr #4003  and noticed the 
fallback for `FIRST/LAST` under `PartialMerge` is already in place doConvert 
rejects them at operators.scala. So the correctness issue from the original bug 
report is handled.
   
   I think what's left here is removing that fallback and actually supporting 
`FIRST/LAST` natively in `PartialMerge` mode. The underlying problem is that 
datafusion's hash table doesn't guarantee the same group visitation order as 
spark during `merge_batch`, which breaks order-dependent aggregates like 
`FIRST` and `LAST`.
   
   one way i think to fix this would be to add a monotonic ordinal to the 
accumulator state something like `(value, ordinal)` so that `merge_batch` can 
always pick the right value regardless of processing order:
   * `FIRST` would keep the smallest ordinal
   * `LAST` the largest
   
   that would mean touching the state schema and merge logic in 
`native/spark-expr/src/agg_funcs/`, and then removing the guard in 
operators.scala once native execution is correct.
   
   does this direction make sense before I start working on it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to