On Tue, Jan 12, 2016 at 02:46:52PM +0100, Martin Jambor wrote:
> diff --git a/libgomp/task.c b/libgomp/task.c
> index ab5df51..828c1fb 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -584,8 +592,34 @@ GOMP_PLUGIN_target_task_completion (void *data)
>        gomp_mutex_unlock (&team->task_lock);
>      }
>    ttask->state = GOMP_TARGET_TASK_FINISHED;
> -  free (ttask->firstprivate_copies);
> -  gomp_target_task_completion (team, task);
> +
> +  if (ttask->devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)

First of all, I'm surprised you've changed GOMP_PLUGIN_target_task_completion
rather than gomp_target_task_completion.  The difference between those two is
that the latter is run not just from the async event, but also when
GOMP_PLUGIN_target_task_completion happens to run before the
gomp_mutex_lock (&team->task_lock); is acquired in the various spots that
precede child_task->kind = GOMP_TASK_ASYNC_RUNNING;.  The point is that if
the async completion happens too early for the thread spawning it to notice,
we want to complete it only when the spawning thread is ready for that.

But looking at GOMP_PLUGIN_target_task_completion, I see we have a bug in
there:

  gomp_mutex_lock (&team->task_lock);
  if (ttask->state == GOMP_TARGET_TASK_READY_TO_RUN)
    {
      ttask->state = GOMP_TARGET_TASK_FINISHED;
      gomp_mutex_unlock (&team->task_lock);
    }
  ttask->state = GOMP_TARGET_TASK_FINISHED;
  gomp_target_task_completion (team, task);
  gomp_mutex_unlock (&team->task_lock);

I think there was meant to be a return; after the first unlock; otherwise it
unlocks the same lock twice and performs gomp_target_task_completion without
the lock held, which may cause great havoc.  I'll test the obvious change
here.
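For reference, a minimal sketch of that obvious change, assuming the rest of
GOMP_PLUGIN_target_task_completion stays as it currently is; the only
modification is the added return; (plus a comment) in the early-completion
branch:

  gomp_mutex_lock (&team->task_lock);
  if (ttask->state == GOMP_TARGET_TASK_READY_TO_RUN)
    {
      /* The spawning thread has not acquired task_lock yet; just mark
         the task finished and let that thread perform the completion
         once it is ready.  */
      ttask->state = GOMP_TARGET_TASK_FINISHED;
      gomp_mutex_unlock (&team->task_lock);
      return;
    }
  ttask->state = GOMP_TARGET_TASK_FINISHED;
  gomp_target_task_completion (team, task);
  gomp_mutex_unlock (&team->task_lock);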
> +    {
> +      free (ttask->firstprivate_copies);
> +      size_t new_tasks
> +        = gomp_task_run_post_handle_depend (task, team);
> +      gomp_task_run_post_remove_parent (task);
> +      gomp_clear_parent (&task->children_queue);
> +      gomp_task_run_post_remove_taskgroup (task);
> +      team->task_count--;
> +      int do_wake = 0;
> +      if (new_tasks)
> +        {
> +          do_wake = team->nthreads - team->task_running_count
> +                    - !task->in_tied_task;
> +          if (do_wake > new_tasks)
> +            do_wake = new_tasks;
> +        }
> +      /* Unlike other places, the following will be also run with the
> +         task_lock held, but there is nothing to do about it.  See the
> +         comment in gomp_target_task_completion.  */
> +      gomp_finish_task (task);
> +      free (task);
> +      gomp_team_barrier_set_task_pending (&team->barrier);

This one really looks weird.  Setting the pending flag should be done only
if we increase the number of the team's tasks, and
gomp_task_run_post_handle_depend should do that itself when it adds new
tasks (IMHO it does); but if new_tasks is 0, then there is no new task to
schedule and therefore it should not be set.

> +      gomp_team_barrier_wake (&team->barrier, do_wake ? do_wake : 1);
> +    }
> +  else
> +    gomp_target_task_completion (team, task);
>    gomp_mutex_unlock (&team->task_lock);
>  }
> 
> @@ -1275,7 +1309,12 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
>           thr->task = task;
>         }
>        else
> -        return;
> +        {
> +          if (team->task_count == 0
> +              && gomp_team_barrier_waiting_for_tasks (&team->barrier))
> +            gomp_team_barrier_done (&team->barrier, state);
> +          return;
> +        }
>        gomp_mutex_lock (&team->task_lock);
>        if (child_task)
>         {

And this hunk looks wrong too.  gomp_team_barrier_done should not be called
without the lock held, there is no waking, and I don't understand the
rationale for why you think the current gomp_barrier_handle_tasks is wrong.

Anyway, if you make the HSA branch work the less efficient way, by creating
a task that just frees the firstprivate copies, and post a WIP patch that
includes this after the merge into trunk, plus if there are clear
instructions on the wiki for how to build the HSA stuff, then my son has a
box with an AMD Kaveri, so I'll try to debug it there.

        Jakub