protze.joachim added inline comments.
================ Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346 + // link node as successor of all nodes in the prev_set if any + npredecessors += + __kmp_depnode_link_successor(gtid, thread, task, node, prev_set); ---------------- AndreyChurbanov wrote: > protze.joachim wrote: > > If I understand this code right, this has O(n^2) complexity for two sets of > > size n? > > > > Consider: > > ``` > > for (int i=0; i<n; i++) { > > #pragma omp task depend(in:A) > > work(A,i); > > } > > for (int i=0; i<n; i++) { > > #pragma omp task depend(inoutset:A) > > work(A,i); > > } > > ``` > > All n tasks in the first loop would be predecessor for each of the tasks in > > the second loop. This will also result in n^2 releases/notifications. > > > > > > I'd suggest to model the dependencies like: > > ``` > > for (int i=0; i<n; i++) { > > #pragma omp task depend(in:A) > > work(A,i); > > } > > #pragma omp taskwait depend(inout:A) nowait > > for (int i=0; i<n; i++) { > > #pragma omp task depend(inoutset:A) > > work(A,i); > > } > > ``` > > I.e., add a dummy dependency node between the two sets, which has n > > predecessors and n successors. You probably understand the depnode code > > better, than I do, but I think you would only need to add some code in line > > 357, where `last_set` is moved to `prev_set`. > > This dummy node would reduce linking and releasing complexity to O(n). > This could be a separate research project. Because such change may cause > performance regressions on some real codes, so it needs thorough > investigation. I mean insertion of an auxiliary dummy node into dependency > graph is definitely an overhead, which may or may not be overridden by > reduction in number of edges in the graph. Or it can be made optional > optimization under an external control, if there is a significant gain in > some particular case. I don't suggest to insert the auxiliary node in the general case. Just for the case that you see a transition of `in` -> `inoutset` or vice versa. With current task dependencies, you always have 1 node notifying n nodes (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an amortized O(1) operation per task in the graph. Here you introduce a code pattern, where n nodes each will need to notify m other nodes. This leads to an O(n) operation per task. I'm really worried about the complexity of this pattern. If there is no use case with large n, I don't understand, why `inoutset` was introduced in the first place. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D97085/new/ https://reviews.llvm.org/D97085 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits