Re: [PR] Parallel bounded RANGE-frame window functions without PARTITION BY (draft) [datafusion]

via GitHub Mon, 22 Jun 2026 18:00:54 -0700


2010YOUY01 commented on PR #23026:
URL: https://github.com/apache/datafusion/pull/23026#issuecomment-4774549328


   I think it would be easier to ship intra-operator parallelism first. It 
would support the segment-tree parallelism mentioned by @alamb in 
https://github.com/apache/datafusion/pull/23026#issuecomment-4770385196, and it 
can also support the workload targeted by this PR. We can then implement this 
PR’s repartition-based approach afterward.
   
   I’ll write more about the intra-operator parallelism approach later and work 
on the implementation. For now, I’m trying to make the case that we should do 
intra-operator parallelism first, and I’m happy to help with this PR’s approach 
afterward. TL;DR: this order likely requires less total engineering effort.
   
   I agree this PR’s approach is necessary for scaling out, especially for 
running certain window workloads in a distributed way. The tradeoff is that it 
is more complex, because it requires multiple `ExecutionPlan` nodes to 
cooperate to evaluate a single logical window node:
   
   - Temporary inconsistent intermediate states are introduced.
   - Optimizer rules become harder to implement. For example, we need to handle 
how filters are pushed down below the two newly introduced operators.
   
   By contrast, intra-operator parallelism is simpler and more broadly 
applicable. The same architecture can be extended to range-partitioned 
parallelism, segment-tree-based parallelism, and potentially other forms of 
parallelism.
   
   My suggestion is to implement this approach after we have architectural 
support for intra-node parallelism in the window operator. Alternatively, we 
could consider implementing this approach outside the core first to make 
maintenance easier.
   
   The main reason is the implementation order:
   
   - Intra-operator parallelism → repartition-based parallelism: the latter 
remains easy to add, since it mostly introduces new operators and does not 
require significant changes inside the existing window operator.
   - Repartition-based parallelism → intra-operator parallelism: this is more 
subtle, but I have thought about the intra-operator approach before. To support 
it properly, we will eventually need to refactor most window-function-related 
traits and structs because of some historical structural issues. If we land the 
repartition-based approach first, that refactor will likely become harder.
   
   It is also more common to scale up before scaling out, and to prioritize the 
more broadly applicable solution first.
   
   I’ll write more about the alternative intra-operator parallelism approach 
later. Here is an existing PoC:
   
   - https://github.com/apache/datafusion/pull/22356


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Parallel bounded RANGE-frame window functions without PARTITION BY (draft) [datafusion]

Reply via email to