On Tue, Nov 29, 2011 at 04:15:36PM +1030, Alan Modra wrote:
> On Mon, Nov 28, 2011 at 05:42:15PM -0800, Richard Henderson wrote:
> > On 11/28/2011 06:02 AM, Alan Modra wrote:
> > > -  unsigned int ret = bar->generation & ~3;
> > > -  /* Do we need any barrier here or is __sync_add_and_fetch acting
> > > -     as the needed LoadLoad barrier already?  */
> > > -  ret += __sync_add_and_fetch (&bar->awaited, -1) == 0;
> > > +  unsigned int ret = __atomic_load_4 (&bar->generation, MEMMODEL_ACQUIRE) & ~3;
> > > +  ret += __atomic_add_fetch (&bar->awaited, -1, MEMMODEL_ACQ_REL) == 0;
> >
> > Given that the read from bar->generation is ACQ, we don't need a
> > duplicate barrier from the REL on the atomic add.  I believe both can
> > be MEMMODEL_ACQUIRE both in order to force the ordering of these two
> > memops, as well as force these to happen before anything subsequent.
>
> I tried with MEMMODEL_ACQUIRE and ran force-parallel-6.exe, the test
> that seems most sensitive to barrier problems, many times, and it hangs
> occasionally in futex_wait called via gomp_team_barrier_wait_end.
>
> I believe that threads can't be allowed to exit from
> gomp_{,team_}barrier_wait without hitting a release barrier, and
> perhaps from gomp_barrier_wait_last too.  gomp_barrier_wait_start is a
> convenient point to insert the barrier, and a minimal change from the
> old code using __sync_add_and_fetch.  I can add a comment.  ;-)

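For reference, here is a minimal standalone sketch of the ordering discussed above: an acquire load of the generation followed by an acquire-release decrement of the awaited count. It assumes GCC's public __atomic builtins and __ATOMIC_* constants rather than libgomp's internal MEMMODEL_* names. struct sketch_barrier and sketch_barrier_wait_start are hypothetical names used only for illustration; this is not the committed code.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for libgomp's barrier type.  Only the
   two fields touched by the quoted patch are modelled.  */
struct sketch_barrier
{
  unsigned int generation;  /* low two bits are masked off below */
  unsigned int awaited;     /* threads still expected at the barrier */
};

/* Returns the generation with its low two bits cleared, plus 1 if the
   caller is the last thread to arrive (awaited dropped to zero).  */
unsigned int
sketch_barrier_wait_start (struct sketch_barrier *bar)
{
  assert (bar != 0);

  /* Acquire load: later memory operations cannot move above it.  */
  unsigned int ret
    = __atomic_load_n (&bar->generation, __ATOMIC_ACQUIRE) & ~3u;

  /* Acquire-release read-modify-write.  The release half is what the
     thread above argues must be hit before a thread can leave the
     barrier wait; with plain acquire here the test was seen to hang.  */
  ret += __atomic_add_fetch (&bar->awaited, -1, __ATOMIC_ACQ_REL) == 0;
  return ret;
}
```

With two threads expected, the first caller gets back the masked generation and the second gets the masked generation plus one, which marks it as the last arrival.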
Committed rev 181833.

-- 
Alan Modra
Australia Development Lab, IBM