AndreyChurbanov added inline comments.

================
Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+        // link node as successor of all nodes in the prev_set if any
+        npredecessors +=
+            __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
----------------
protze.joachim wrote:
> AndreyChurbanov wrote:
> > protze.joachim wrote:
> > > If I understand this code right, this has O(n^2) complexity for two sets 
> > > of size n?
> > > 
> > > Consider:
> > > ```
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(in:A)
> > > work(A,i);
> > > }
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(inoutset:A)
> > > work(A,i);
> > > }
> > > ```
> > > All n tasks in the first loop would be predecessors of each of the tasks
> > > in the second loop. This will also result in n^2 releases/notifications.
> > > 
> > > 
> > > I'd suggest to model the dependencies like:
> > > ```
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(in:A)
> > > work(A,i);
> > > }
> > > #pragma omp taskwait depend(inout:A) nowait
> > > for (int i=0; i<n; i++) {
> > > #pragma omp task depend(inoutset:A)
> > > work(A,i);
> > > }
> > > ```
> > > I.e., add a dummy dependency node between the two sets, which has n
> > > predecessors and n successors. You probably understand the depnode code
> > > better than I do, but I think you would only need to add some code at
> > > line 357, where `last_set` is moved to `prev_set`.
> > > This dummy node would reduce the linking and releasing complexity to O(n).
> > This could be a separate research project, because such a change may cause
> > performance regressions on some real codes, so it needs thorough
> > investigation. I mean, inserting an auxiliary dummy node into the
> > dependency graph is definitely an overhead, which may or may not be
> > outweighed by the reduction in the number of edges in the graph. Or it
> > could be made an optional optimization under external control, if there is
> > a significant gain in some particular case.
> I don't suggest inserting the auxiliary node in the general case, just for
> the case where you see a transition `in` -> `inoutset`, or vice versa.
> 
> With current task dependencies, you always have 1 node notifying n nodes 
> (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see 
> this as an amortized O(1) operation per task in the graph.
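> 
> For concreteness, a minimal illustration of these two patterns (my own
> sketch with placeholder `produce`/`work`/`consume` functions, not code from
> the patch):
> ```
> #pragma omp task depend(out:A)   // 1 writer ...
> produce(A);
> for (int i=0; i<n; i++) {
> #pragma omp task depend(in:A)    // ... notifies each of n readers: 1 -> n
> work(A,i);
> }
> #pragma omp task depend(out:A)   // 1 writer collects n notifications: n -> 1
> consume(A);
> ```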
> 
> Here you introduce a code pattern where n nodes each need to notify m other
> nodes. This leads to an O(n) operation per task. I'm really worried about
> the complexity of this pattern.
> If there is no use case with large n, I don't understand why `inoutset` was
> introduced in the first place.
No, I don't introduce this pattern; it already existed for a set of tasks with
`in` dependencies following a set of tasks with `mutexinoutset` dependencies,
or vice versa.
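
For reference, the pre-existing pattern I mean, as a minimal sketch (my own
illustration with a placeholder `work` function, assuming the usual
`mutexinoutset` dependence rules):
```
for (int i=0; i<n; i++) {
#pragma omp task depend(in:A)
work(A,i);
}
for (int j=0; j<m; j++) {
#pragma omp task depend(mutexinoutset:A)  // each of m tasks waits for all n `in` tasks
work(A,j);
}
```
So the n-to-m linking (and n*m notifications) already occurs without `inoutset`.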

> If there is no use case with large n, I don't understand why `inoutset` was
> introduced in the first place.
I didn't like that `inoutset` was introduced as a clone of `in` in order to
separate two (big) sets of mutually independent tasks, but it has already been
added to the specification, so we have to implement it. Of course, your
suggestion can dramatically speed up dependency processing in a trivial case
with two huge sets of tasks, one with `in` dependencies and the other with
`inoutset`; but it slightly changes the semantics of the user's code, and it
can actually slow down other cases. I would let users do such optimizations.
Or the compiler could do it once it sees big sets of tasks that are not
separated by a barrier-like "taskwait nowait". For the user this is one line
of code; for the compiler it is a dozen lines of code; for the runtime library
it is a pretty big change that is not worth the effort to implement, I think.
Especially given that such an "optimization" will definitely slow down some
cases. E.g. a small set of tasks with `in` dependencies can be followed by a
single task with `inoutset`, all inside a loop. Why should we slow down this
case? The library has no idea of the structure of the user's code.
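
For example, the looped case I mean, as a minimal sketch (my own illustration
with placeholder `work`/`combine` functions):
```
for (int iter=0; iter<iters; iter++) {
for (int i=0; i<k; i++) {   // k is small
#pragma omp task depend(in:A)
work(A,i);
}
#pragma omp task depend(inoutset:A)  // k edges today; a dummy node would mean
combine(A);                          // k+1 edges plus an extra allocation
}
```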

In general, I don't like the idea of the runtime library trying to optimize
the user's code, especially along with other changes in the same patch.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085
