RatulDawar commented on issue #21625: URL: https://github.com/apache/datafusion/issues/21625#issuecomment-4247114703
@adriangb @Omega359 I was able to find the issue here. The [commit](https://github.com/apache/datafusion/commit/6c5e241e6298e70077259b3a12840c3adab3c810) introduced a short-circuit optimization: if the local partition has zero rows after the build phase, it does not need to run the probe phase (there obviously won't be any matches). So this optimization moves the state directly from `build` to `completed`.

The problem arises when dynamic filtering is in use. The dynamic filter is shared across partitions (it is used to filter the probe-side table), so we synchronize across partitions to make sure filter collection from all build-side partitions has finished before the filter is passed to the probe side. When we significantly increase the target partition count, some partitions get zero rows allocated, which triggers the short circuit and marks those empty partitions as completed. These empty partitions then never report their dynamic filters to the shared accumulator, which is waiting for every partition to report its status. This leads to an indefinite wait.

What I propose as a fix is that empty partitions still report their filters to the shared accumulator and are only then marked as completed. This would stop the indefinite wait. If you are aligned, I will raise a PR with this fix.

**TL;DR:** The short-circuit optimization skips probe execution for empty partitions, but those partitions never report their dynamic filters. Since the shared accumulator waits for all partitions, this causes an indefinite wait. Fix: even empty partitions should report (empty) filters before marking themselves as completed.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
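The hang described in the comment can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: `SharedAccumulator`, `JoinState`, `finish_build`, and `simulate` are hypothetical names invented for illustration and are not DataFusion's actual types or API, and the real accumulator waits asynchronously rather than being polled with `is_ready`.

```rust
// Hypothetical model of the hash join state after the build phase.
#[derive(Debug, PartialEq)]
enum JoinState {
    Probe,
    Completed,
}

// Hypothetical stand-in for the accumulator shared by all partitions.
// It only releases the dynamic filter once every partition has reported.
struct SharedAccumulator {
    expected: usize,
    reported: usize,
    bounds: Option<(i64, i64)>,
}

impl SharedAccumulator {
    fn new(expected: usize) -> Self {
        Self { expected, reported: 0, bounds: None }
    }

    // Record one partition's report; `None` means "no build rows, no bounds".
    fn report(&mut self, partition_bounds: Option<(i64, i64)>) {
        self.reported += 1;
        self.bounds = match (self.bounds, partition_bounds) {
            (Some((lo, hi)), Some((plo, phi))) => Some((lo.min(plo), hi.max(phi))),
            (current, None) => current,
            (None, incoming) => incoming,
        };
    }

    // In a real system the probe side would block until this is true.
    fn is_ready(&self) -> bool {
        self.reported == self.expected
    }
}

// End of the build phase for one partition. With `report_when_empty == false`
// this models the short circuit that skips reporting (the hang); with `true`
// it models the proposed fix.
fn finish_build(
    acc: &mut SharedAccumulator,
    build_keys: &[i64],
    report_when_empty: bool,
) -> JoinState {
    if build_keys.is_empty() {
        if report_when_empty {
            acc.report(None);
        }
        return JoinState::Completed;
    }
    let lo = *build_keys.iter().min().unwrap();
    let hi = *build_keys.iter().max().unwrap();
    acc.report(Some((lo, hi)));
    JoinState::Probe
}

// Run the build phase for every partition and return whether the accumulator
// heard from all of them, together with the merged bounds.
fn simulate(partitions: &[Vec<i64>], report_when_empty: bool) -> (bool, Option<(i64, i64)>) {
    let mut acc = SharedAccumulator::new(partitions.len());
    for keys in partitions {
        finish_build(&mut acc, keys, report_when_empty);
    }
    (acc.is_ready(), acc.bounds)
}
```

With three partitions where the middle one is empty, `simulate(&parts, false)` never becomes ready (the analogue of the indefinite wait), while `simulate(&parts, true)` does, and the merged bounds are the same in both cases because an empty report contributes nothing to the filter.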
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
