On Mon, 8 May 2023 17:44:43 +0800 Lipeng Zhu <lipeng....@intel.com> wrote:
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
>
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT

See commentary typos below.
You did not state whether you regression tested the patch.
Other than that it LGTM, but I cannot approve it.

> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..0033cc74252 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
>      DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n",
>                    aio_prefix, #mutex, mutex); \
>    } while (0)
>
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do { \
> +    aio_rwlock_debug *n; \
> +    n = xmalloc (sizeof(aio_rwlock_debug)); \

Missing space before the open parenthesis: sizeof (

> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 82664dc5f98..62f1db21d34 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.
> If not, see
>
>
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   And use the rwlock to spilt read and write phase to UNIT_ROOT tree
> +   and UNIT_CACHE to increase CPU efficiency.

s/spilt/split/. Maybe:

Using an rwlock improves efficiency by allowing us to separate readers
and writers of both UNIT_ROOT and UNIT_CACHE.

> @@ -350,6 +356,17 @@ retry:
>        if (c == 0)
>          break;
>      }
> +  /* We did not find a unit in the cache nor in the unit list, create a new
> +    (locked) unit and insert into the unit list and cache.
> +    Manipulating either or both the unit list and the unit cache requires to
> +    hold a write-lock [for obvious reasons]:
> +    1. By separating the read/write lock, it will greatly reduce the contention
> +       at the read part, while write part is not always necessary or most
> +       unlikely once the unit hit in cache.

+       By separating the read/write lock, we will greatly reduce the contention
+       on the read part, while the write part is unlikely once the unit hits
+       the cache.

> +    2. We try to balance the implementation complexity and the performance
> +       gains that fit into current cases we observed by just using a
> +       pthread_rwlock. */

Let's drop 2.

thanks,