RE: [PATCH v4] libgfortran: Replace mutex with rwlock
Hi Thomas, > > Hi Lipeng, > > > May I know any comment or concern on this patch, thanks for your time > > 😄 > > Thanks for your patience in getting this reviewed. > > A few remarks / questions. > > Which strategy is used in this implementation, read-preferring or write- > preferring? And if read-preferring is used, is there a danger of deadlock if > people do unreasonable things? > Maybe you could explain that, also in a comment in the code. > Yes, the implementation use the read-preferring strategy, and comments in code. When adding the test cases, I didn’t meet the situation which may cause the deadlock. Maybe you can give more guidance about that. > Can you add some sort of torture test case(s) which does a lot of > opening/closing/reading/writing, possibly with asynchronous I/O and/or > pthreads, to catch possible problems? If there is a system dependency or > some race condition, chances are that regression testers will catch this. > Sure, as your comments, in the patch V6, I added 3 test cases with OpenMP to test different cases in concurrency respectively: 1. find and create unit very frequently to stress read lock and write lock. 2. only access the unit which exist in cache to stress read lock. 3. access the same unit in concurrency. For the third test case, it also help to find a bug: When unit can't be found in cache nor unit list in read phase, then threads will try to acquire write lock to insert the same unit, this will cause duplicate key error. To fix this bug, I get the unit from unit list once again before insert in write lock. More details you can refer the patch v6. > With this, the libgfortran parts are OK, unless somebody else has more > comments, so give this a couple of days. I cannot approve the libgcc parts, > that would be somebody else (Jakub?) > > Best regards > > Thomas > Best Regards, Lipeng Zhu
[PATCH v6] libgfortran: Replace mutex with rwlock
From: Lipeng Zhu This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. v2 -> v3: Rebase the patch with trunk branch. v3 -> v4: Update the comments. v4 -> v5: Fix typos and code formatter. v5 -> v6: Add unit tests. Reviewed-by: Hongjiu Lu Reviewed-by: Bernhard Reutner-Fischer Reviewed-by: Thomas Koenig Signed-off-by: Zhu, Lipeng --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 154 +- libgfortran/io/io.h | 15 +- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 116 - libgfortran/io/unix.c | 16 +- .../testsuite/libgomp.fortran/rwlock_1.f90| 33 .../testsuite/libgomp.fortran/rwlock_2.f90| 22 +++ .../testsuite/libgomp.fortran/rwlock_3.f90| 18 ++ 10 files changed, 387 insertions(+), 59 deletions(-) create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90 diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); +