https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
--- Comment #17 from Thiago Macieira <thiago at kde dot org> ---
(In reply to Thomas Rodgers from comment #16)
> The original implementation came from Olivier Giroux and is part of libc++.
> The libc++ implementation also does not use a type that futex or
> ulock_wait/wake (uint64_t) can handle. I have discussed this in the past
> with Olivier, the choice of char was deliberate on his part. The
> implementation has been tested on a number of platforms (including time on
> ORNL's Summit).

I remember our discussion on this. But libc++ isn't trying to be optimal and it never supports direct futex. The fact that they chose this path does not mean libstdc++ must too.

> The following comment, preserved from libc++ should be
> considered carefully before any change here -
>
> " 2. A great deal of attention has been paid to avoid cache line thrashing
> by flattening the tree structure into cache-line sized arrays, that
> are indexed in an efficient way."
>
> It is my opinion that the bar for making a change here is high. I would need
> to see benchmark numbers that illustrate the performance differences under
> various contention scenarios vs impact on caches by being able to fit the
> entire tree in a single cache line using char, vs four or eight cache lines
> using the type favored by futex or ulock_wait/wake.

Indeed. My other $DAYJOB involves a lot of cacheline thrashing up to and including current 480-core machines, so I appreciate the thought there. In any case, we can't change the design even if we turn up new data showing that there's benefit or a bottleneck somewhere.