protze.joachim added inline comments.

================
Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+        // link node as successor of all nodes in the prev_set if any
+        npredecessors +=
+            __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
----------------
AndreyChurbanov wrote:
> protze.joachim wrote:
> > If I understand this code right, this has O(n^2) complexity for two sets of 
> > size n?
> > 
> > Consider:
> > ```
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(in:A)
> > work(A,i);
> > }
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(inoutset:A)
> > work(A,i);
> > }
> > ```
> > All n tasks in the first loop would be predecessor for each of the tasks in 
> > the second loop. This will also result in n^2 releases/notifications.
> > 
> > 
> > I'd suggest to model the dependencies like:
> > ```
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(in:A)
> > work(A,i);
> > }
> > #pragma omp taskwait depend(inout:A) nowait
> > for (int i=0; i<n; i++) {
> > #pragma omp task depend(inoutset:A)
> > work(A,i);
> > }
> > ```
> > I.e., add a dummy dependency node between the two sets, which has n 
> > predecessors and n successors. You probably understand the depnode code 
> > better, than I do, but I think you would only need to add some code in line 
> > 357, where `last_set` is moved to `prev_set`.
> > This dummy node would reduce linking and releasing complexity to O(n).
> This could be a separate research project.  Because such change may cause 
> performance regressions on some real codes, so it needs thorough 
> investigation.  I mean insertion of an auxiliary dummy node into dependency 
> graph is definitely an overhead, which may or may not be overridden by 
> reduction in number of edges in the graph.  Or it can be made optional 
> optimization under an external control, if there is a significant gain in 
> some particular case.
I don't suggest to insert the auxiliary node in the general case. Just for the 
case that you see a transition of `in` -> `inoutset` or vice versa.

With current task dependencies, you always have 1 node notifying n nodes (`out` 
-> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an 
amortized O(1) operation per task in the graph.

Here you introduce a code pattern, where n nodes each will need to notify m 
other nodes. This leads to an O(n) operation per task. I'm really worried about 
the complexity of this pattern.
If there is no use case with large n, I don't understand, why `inoutset` was 
introduced in the first place.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to