RatulDawar commented on issue #21625:
URL: https://github.com/apache/datafusion/issues/21625#issuecomment-4247114703

   @adriangb  @Omega359  I was able to find the issue here. The 
[commit](https://github.com/apache/datafusion/commit/6c5e241e6298e70077259b3a12840c3adab3c810)
 introduced a short-circuit optimization: if the build phase of a local 
partition produces zero rows, the partition doesn't need to run the probe 
phase (as obviously there won't be any matches). 
   So this optimization moves the partition's state directly from `build` -> 
`completed`. 
   
   The problem arises when dynamic filtering is in use. Since the dynamic 
filter is shared across partitions (it's used to filter the probe-side table), 
we synchronize across partitions so that, before the filter is passed to the 
probe side, filter collection has completed for all the build-side 
partitions. 
   
   Now when we significantly increase the target partition count, some 
partitions get zero rows allocated, which triggers the short-circuit and 
marks the empty partitions as completed. 
   
   These empty partitions never report their dynamic filters to the shared 
accumulator, which is waiting for a report from all the partitions. This 
leads to an indefinite wait. 
   
   
   What I propose as a fix is that empty partitions still report their 
filters to the shared accumulator and are only then marked as completed. This 
would prevent the indefinite wait. 
   If you guys are aligned I will raise a PR with this fix.  
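
To make the proposal concrete, here is a minimal standalone sketch of the idea, not the actual DataFusion code. `SharedAccumulator`, `report`, `wait_for_all`, and `run_partitions` are hypothetical names, and the "filter" is assumed to be just a min/max bound over `i64` build keys. Each partition reports to the accumulator (empty ones report `None`) before it is allowed to short-circuit, so the non-empty partitions are never left waiting.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for the shared dynamic-filter accumulator.
struct AccState {
    pending: usize,             // partitions that have not reported yet
    bounds: Option<(i64, i64)>, // merged (min, max) over all reports so far
}

struct SharedAccumulator {
    state: Mutex<AccState>,
    all_reported: Condvar,
}

impl SharedAccumulator {
    fn new(partitions: usize) -> Self {
        Self {
            state: Mutex::new(AccState { pending: partitions, bounds: None }),
            all_reported: Condvar::new(),
        }
    }

    /// Record one partition's build-side bounds; `None` means the partition was empty.
    fn report(&self, local: Option<(i64, i64)>) {
        let mut st = self.state.lock().unwrap();
        if let Some((lo, hi)) = local {
            st.bounds = Some(match st.bounds {
                Some((a, b)) => (a.min(lo), b.max(hi)),
                None => (lo, hi),
            });
        }
        st.pending -= 1;
        if st.pending == 0 {
            self.all_reported.notify_all();
        }
    }

    /// Block until every partition has reported, then return the merged bounds.
    fn wait_for_all(&self) -> Option<(i64, i64)> {
        let st = self.state.lock().unwrap();
        let st = self.all_reported.wait_while(st, |s| s.pending > 0).unwrap();
        st.bounds
    }
}

/// Run one thread per partition. Returns, per partition, the filter it would
/// hand to the probe side, or `None` if it short-circuited to completed.
fn run_partitions(build_sides: Vec<Vec<i64>>) -> Vec<Option<(i64, i64)>> {
    let acc = Arc::new(SharedAccumulator::new(build_sides.len()));
    let handles: Vec<_> = build_sides
        .into_iter()
        .map(|rows| {
            let acc = Arc::clone(&acc);
            thread::spawn(move || {
                let local = match (rows.iter().min(), rows.iter().max()) {
                    (Some(&lo), Some(&hi)) => Some((lo, hi)),
                    _ => None,
                };
                // Proposed fix: report even when the build side is empty...
                acc.report(local);
                if local.is_none() {
                    // ...and only then short-circuit (build -> completed).
                    return None;
                }
                // Non-empty partitions wait for all reports before probing.
                acc.wait_for_all()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

In this sketch, if the empty partitions returned before calling `report`, `pending` would never reach zero and every non-empty partition would block in `wait_for_all` forever, which is the hang described above.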
   
   
   **TL;DR:**
   The short-circuit optimization skips probe execution for empty partitions, 
but those partitions never report their dynamic filters. Since the shared 
accumulator waits for all partitions, this causes an indefinite wait. Fix: even 
empty partitions should report (empty) filters before marking themselves as 
completed.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

