2010YOUY01 commented on PR #23026: URL: https://github.com/apache/datafusion/pull/23026#issuecomment-4774549328
I think it would be easier to ship intra-operator parallelism first. It would support the segment-tree parallelism mentioned by @alamb in https://github.com/apache/datafusion/pull/23026#issuecomment-4770385196, and it can also support the workload targeted by this PR. We can then implement this PR’s repartition-based approach afterward. I’ll write more about the intra-operator parallelism approach later and work on the implementation. For now, I’m trying to make the case that we should do intra-operator parallelism first, and I’m happy to help with this PR’s approach afterward. TL;DR: this order likely requires less total engineering effort. I agree this PR’s approach is necessary for scaling out, especially for running certain window workloads in a distributed way. The tradeoff is that it is more complex, because it requires multiple `ExecutionPlan` nodes to cooperate to evaluate a single logical window node: - Temporary inconsistent intermediate states are introduced. - Optimizer rules become harder to implement. For example, we need to handle how filters are pushed down below the two newly introduced operators. By contrast, intra-operator parallelism is simpler and more broadly applicable. The same architecture can be extended to range-partitioned parallelism, segment-tree-based parallelism, and potentially other forms of parallelism. My suggestion is to implement this approach after we have architectural support for intra-node parallelism in the window operator. Alternatively, we could consider implementing this approach outside the core first to make maintenance easier. The main reason is the implementation order: - Intra-operator parallelism → repartition-based parallelism: the latter remains easy to add, since it mostly introduces new operators and does not require significant changes inside the existing window operator. - Repartition-based parallelism → intra-operator parallelism: this is more subtle, but I have thought about the intra-operator approach before. To support it properly, we will eventually need to refactor most window-function-related traits and structs because of some historical structural issues. If we land the repartition-based approach first, that refactor will likely become harder. It is also more common to scale up before scaling out, and to prioritize the more broadly applicable solution first. I’ll write more about the alternative intra-operator parallelism approach later. Here is an existing PoC: - https://github.com/apache/datafusion/pull/22356 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
