https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277

--- Comment #17 from Thiago Macieira <thiago at kde dot org> ---
(In reply to Thomas Rodgers from comment #16)
> The original implementation came from Olivier Giroux and is part of libc++.
> The libc++ implementation also does not use a type that futex or
> ulock_wait/wake (uint64_t) can handle. I have discussed this in the past
> with Olivier, the choice of char was deliberate on his part. The
> implementation has been tested on a number of platforms (including time on
> ORNL's Summit). 

I remember our discussion on this. But libc++ isn't trying to be optimal and it
never supports direct futex. The fact that they chose this path does not mean
libstdc++ must too.


> The following comment, preserved from libc++ should be
> considered carefully before any change here -
> 
> " 2. A great deal of attention has been paid to avoid cache line thrashing
>     by flattening the tree structure into cache-line sized arrays, that
>     are indexed in an efficient way."
> 
> It is my opinion that the bar for making a change here is high. I would need
> to see benchmark numbers that illustrate the performance differences under
> various contention scenarios vs impact on caches by being able to fit the
> entire tree in a single cache line using char, vs four or eight cache lines
> using the type favored by futex or ulock_wait/wake.

Indeed. My other $DAYJOB involves a lot of cacheline thrashing up to and
including current 480-core machines, so I appreciate the thought there.
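
For reference, the footprint comparison quoted above can be sketched as
simple arithmetic. This is only an illustration: the 64-byte cache line and
the 64-slot tree size are assumptions chosen so the numbers line up with the
"one vs. four or eight cache lines" figures, and the names kCacheLine, kSlots
and cache_lines_needed are hypothetical, not taken from libstdc++ or libc++.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed values for illustration only; neither comes from libstdc++ or libc++.
constexpr std::size_t kCacheLine = 64;  // bytes per cache line (common on x86-64)
constexpr std::size_t kSlots = 64;      // hypothetical number of tree slots

// Number of cache lines needed to store kSlots elements of type T, rounded up.
template <typename T>
constexpr std::size_t cache_lines_needed() {
    return (kSlots * sizeof(T) + kCacheLine - 1) / kCacheLine;
}

// char slots: the whole array fits in one line.
static_assert(cache_lines_needed<char>() == 1, "char");
// 32-bit slots (the word size Linux futex operates on): four lines.
static_assert(cache_lines_needed<std::uint32_t>() == 4, "uint32_t");
// 64-bit slots (the uint64_t mentioned for ulock_wait/wake): eight lines.
static_assert(cache_lines_needed<std::uint64_t>() == 8, "uint64_t");
```

Under those assumptions the trade-off is between waiting directly on a
futex-sized word and keeping every slot within a single cache line.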

In any case, we can't change the design even if we turn up new data showing
that there's benefit or a bottleneck somewhere.
