https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122356

--- Comment #2 from Matthew Malcomson <matmal01 at gcc dot gnu.org> ---
(In reply to Matthew Malcomson from comment #1)
> I am leaning towards approach (B) because it feels like the most robust
> (always using the same code flow to ensure the synchronization).
> 
> I don't think that performance would be hit much by the last thread going
> into `gomp_barrier_handle_tasks` and seeing no tasks to perform there
> instead of seeing that directly in `gomp_team_barrier_wait_end`.

Uhh, I just actually tested this on the minimal benchmark that I have access
to, which does nothing with tasks (so I reasoned it would be least likely to
show any effect) and which I've been trying to optimize for the last few months.

Turns out that going into `gomp_barrier_handle_tasks` has a noticeable (though
not huge) effect, and I would now lean towards option (A).
