Re: [PATCH v2] libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD
On 1/10/2024 7:52 PM, Richard Earnshaw wrote: On 05/01/2024 01:43, Lipeng Zhu wrote: This patch try to fix the bug when HAVE_ATOMIC_FETCH_ADD is not defined in dec_waiting_unlocked function. As io.h does not include async.h, the WRLOCK and RWUNLOCK macros are undefined. libgfortran/ChangeLog: * io/io.h (dec_waiting_unlocked): Use __gthread_rwlock_wrlock/__gthread_rwlock_unlock or __gthread_mutex_lock/__gthread_mutex_unlock functions to replace WRLOCK and RWUNLOCK macros. Signed-off-by: Lipeng Zhu Has this been committed yet? R. Hi Richard, The patch is waiting for community's review. Hi Tobias, Any concern about this patch? Best Regards, Lipeng Zhu --- libgfortran/io/io.h | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif }
Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On 1/3/2024 5:14 PM, Lipeng Zhu wrote: On 2023/12/21 19:42, Thomas Schwinge wrote: Hi! On 2023-12-13T21:52:29+0100, I wrote: On 2023-12-12T02:05:26+, "Zhu, Lipeng" wrote: On 2023/12/12 1:45, H.J. Lu wrote: On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng wrote: On 2023/12/9 23:23, Jakub Jelinek wrote: On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT Ok for trunk, thanks. Thanks! Looking forward to landing to trunk. Pushed for you. I've just filed <https://gcc.gnu.org/PR113005> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts". Would you be able to look into that? See my update in there. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Updated in https://gcc.gnu.org/PR113005. Could you help to verify if the draft patch would fix the execution test timeout issue on your side? Hi Thomas, Any feedback from your side? Regards, Lipeng Zhu
Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On 2023/12/14 23:50, Richard Earnshaw (lists) wrote: On 09/12/2023 15:39, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. It looks like this has broken builds on arm-none-eabi when using newlib: In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran /runtime/error.c:27: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error : implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration] 1023 | WRLOCK (&unit_rwlock); | ^~ /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error : implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration] 1025 | RWUNLOCK (&unit_rwlock); | ^~~~ R. Hi Richard, The root cause is that the macro WRLOCK and RWUNLOCK are not defined in io.h. The reason of x86 platform not failed is that HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never been used. Code logic show as below: #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else WRLOCK (&unit_rwlock); u->waiting--; RWUNLOCK (&unit_rwlock); #endif I just draft a patch try to fix this bug, because I didn't have arm platform, would you help to validate if it was fixed on arm platform? diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif } Lipeng Zhu
[PATCH] libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD
This patch try to fix the bug when HAVE_ATOMIC_FETCH_ADD is not defined in dec_waiting_unlocked function. libgfortran/ChangeLog: * io/io.h (dec_waiting_unlocked): Use __gthread_rwlock_wrlock/__gthread_rwlock_unlock or __gthread_mutex_lock/__gthread_mutex_unlock functions to replace WRLOCK and RWUNLOCK macros. Signed-off-by: Lipeng Zhu --- libgfortran/io/io.h | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif } -- 2.39.3
Re: RE: [PATCH v7] libgfortran: Replace mutex with rwlock
Hi Thomas, On 2023/12/21 19:42, Thomas Schwinge wrote: Hi! On 2023-12-13T21:52:29+0100, I wrote: On 2023-12-12T02:05:26+, "Zhu, Lipeng" wrote: On 2023/12/12 1:45, H.J. Lu wrote: On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng wrote: On 2023/12/9 23:23, Jakub Jelinek wrote: On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT Ok for trunk, thanks. Thanks! Looking forward to landing to trunk. Pushed for you. I've just filed <https://gcc.gnu.org/PR113005> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts". Would you be able to look into that? See my update in there. Grüße Thomas -- > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Since I don't have gcc bugzilla account. Reply in this thread: Limit themselves to some lower 'OMP_NUM_THREADS' should be an option or increase the execution timeout? But I can't reproduce the execution timeout failure on both powerpc9 and powerpc10 arch machine. And I also tried to decrease the CPU frequency from 2.6G to 800M, these test cases still can run successfully. > so only a little bit of an improvement of the new "rwlock" libgfortran vs. old "mutex" GCC 10 one, curiously. (But supposedly that depends on the hardware or other factors?) The rwlock can increase the IPC a lot, maybe the wall time you listed is not obvious. $ grep ^cpu < /proc/cpuinfo | uniq -c 192 cpu : POWER10 (architected), altivec supported Native configuration is powerpc64le-unknown-linux-gnu Schedule of variations: unix PASS: libgomp.fortran/rwlock_1.f90 -O0 (test for excess errors) PASS: libgomp.fortran/rwlock_1.f90 -O0 execution test PASS: libgomp.fortran/rwlock_1.f90 -O1 (test for excess errors) PASS: libgomp.fortran/rwlock_1.f90 -O1 execution test PASS: libgomp.fortran/rwlock_1.f90 -O2 (test for excess errors) PASS: libgomp.fortran/rwlock_1.f90 -O2 execution test PASS: libgomp.fortran/rwlock_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) PASS: libgomp.fortran/rwlock_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/rwlock_1.f90 -O3 -g (test for excess errors) PASS: libgomp.fortran/rwlock_1.f90 -O3 -g execution test PASS: libgomp.fortran/rwlock_1.f90 -Os (test for excess errors) PASS: libgomp.fortran/rwlock_1.f90 -Os execution test PASS: libgomp.fortran/rwlock_2.f90 -O0 (test for excess errors) PASS: libgomp.fortran/rwlock_2.f90 -O0 execution test PASS: libgomp.fortran/rwlock_2.f90 -O1 (test for excess errors) PASS: libgomp.fortran/rwlock_2.f90 -O1 execution test PASS: libgomp.fortran/rwlock_2.f90 -O2 (test for excess errors) PASS: libgomp.fortran/rwlock_2.f90 -O2 execution test PASS: libgomp.fortran/rwlock_2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) PASS: libgomp.fortran/rwlock_2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/rwlock_2.f90 -O3 -g (test for excess errors) PASS: libgomp.fortran/rwlock_2.f90 -O3 -g execution test PASS: libgomp.fortran/rwlock_2.f90 -Os (test for excess errors) PASS: libgomp.fortran/rwlock_2.f90 -Os execution test PASS: libgomp.fortran/rwlock_3.f90 -O0 (test for excess errors) PASS: libgomp.fortran/rwlock_3.f90 -O0 execution test PASS: libgomp.fortran/rwlock_3.f90 -O1 (test for excess errors) PASS: libgomp.fortran/rwlock_3.f90 -O1 execution test PASS: libgomp.fortran/rwlock_3.f90 -O2 (test for excess errors) PASS: libgomp.fortran/rwlock_3.f90 -O2 execution test PASS: libgomp.fortran/rwlock_3.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) PASS: libgomp.fortran/rwlock_3.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test Lipeng Zhu
Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On 2024/1/2 11:57, Vaseeharan Vinayagamoorthy wrote: Hi Lipeng, It looks like your draft patch to fix the builds for arm-none-eabi target is not merged yet, because our arm-none-eabi builds are still broken. Are you waiting for additional information, or would you be able to fix this issue? Kind regards, Vasee Hi Vasee, Actually I already sent a patch https://inbox.sourceware.org/gcc-patches/20231222023605.3894839-1-lipeng@intel.com/ to fix the build failure issue, now it is waiting for community to review. Lipeng Zhu From: Richard Earnshaw Sent: 15 December 2023 19:23 To: Lipeng Zhu ; Richard Earnshaw ; ja...@redhat.com Cc: fort...@gcc.gnu.org ; gcc-patches@gcc.gnu.org ; hongjiu...@intel.com ; pan.d...@intel.com ; rep.dot@gmail.com ; tianyou...@intel.com ; tkoe...@netcologne.de ; wangyang@intel.com Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock On 15/12/2023 11:31, Lipeng Zhu wrote: On 2023/12/14 23:50, Richard Earnshaw (lists) wrote: On 09/12/2023 15:39, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. It looks like this has broken builds on arm-none-eabi when using newlib: In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran /runtime/error.c:27: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’: /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error : implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration] 1023 | WRLOCK (&unit_rwlock); | ^~ /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error : implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration] 1025 | RWUNLOCK (&unit_rwlock); | ^~~~ R. Hi Richard, The root cause is that the macro WRLOCK and RWUNLOCK are not defined in io.h. The reason of x86 platform not failed is that HAVE_ATOMIC_FETCH_ADD is defined then caused above macros were never been used. Code logic show as below: #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else WRLOCK (&unit_rwlock); u->waiting--; RWUNLOCK (&unit_rwlock); #endif I just draft a patch try to fix this bug
Re: RE: [PATCH v7] libgfortran: Replace mutex with rwlock
On 2023/12/21 19:42, Thomas Schwinge wrote: Hi! On 2023-12-13T21:52:29+0100, I wrote: On 2023-12-12T02:05:26+, "Zhu, Lipeng" wrote: On 2023/12/12 1:45, H.J. Lu wrote: On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng wrote: On 2023/12/9 23:23, Jakub Jelinek wrote: On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT Ok for trunk, thanks. Thanks! Looking forward to landing to trunk. Pushed for you. I've just filed <https://gcc.gnu.org/PR113005> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts". Would you be able to look into that? See my update in there. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Updated in https://gcc.gnu.org/PR113005. Could you help to verify if the draft patch would fix the execution test timeout issue on your side? diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 index f90ecbeb00f..1c271ae031d 100644 --- a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 +++ b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 @@ -7,13 +7,12 @@ program main use omp_lib implicit none integer:: unit_number, v1, v2, i - character(11) :: file_name + character(16) :: file_name character(3) :: async = "no" !$omp parallel private (unit_number, v1, v2, file_name, async, i) do i = 0, 100 unit_number = 10 + omp_get_thread_num () - write (file_name, "(I3, A)") unit_number, "_tst.dat" - file_name = adjustl(file_name) + write (file_name, "(I5.5, A)") unit_number, "_tst.dat" open (unit_number, file=file_name, asynchronous="yes") ! call inquire with file parameter to test find_file in unix.c inquire (file=file_name, asynchronous=async) Lipeng Zhu
Re: [PATCH] libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD
On 2024/1/3 19:12, Tobias Burnus wrote: On 22.12.23 03:36, Lipeng Zhu wrote: This patch try to fix the bug when HAVE_ATOMIC_FETCH_ADD is not defined in dec_waiting_unlocked function. libgfortran/ChangeLog: * io/io.h (dec_waiting_unlocked): Use __gthread_rwlock_wrlock/__gthread_rwlock_unlock or __gthread_mutex_lock/__gthread_mutex_unlock functions to replace WRLOCK and RWUNLOCK macros. Signed-off-by: Lipeng Zhu The change looks good to me + I assume it will work, but have not tested it myself. Downside is that it slightly breaks with the abstraction done with all the macros, but it seems to be the simplest solution. Hi Tobias, Thanks for your review, the reason I changed like this is because I found when LOCK macro was first introduced in this patch: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2b4c90656132abb8b8ad155d345c7d4fbf1687c9, it replaced __gthread_mutex_lock method with LOCK macro in other files like io/unit.c, but remained __gthread_mutex_lock in io/io.h. I am not sure if this is intentional or not, to avoid potential risk, I used the most straightforward solution. What is really missing - and should be included in the commit message (before the ChangeLog block) - is the following information: As io.h does not include async.h, the WRLOCK and RWUNLOCK macros are undefined. (Or something similar in other words.) I think that helps others when looking at "git log" and wondering *why* that change was needed. Thanks, Tobias Thanks, I will update the commit message as suggested. Lipeng Zhu libgfortran/io/io.h | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif } - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[PATCH v2] libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD
This patch try to fix the bug when HAVE_ATOMIC_FETCH_ADD is not defined in dec_waiting_unlocked function. As io.h does not include async.h, the WRLOCK and RWUNLOCK macros are undefined. libgfortran/ChangeLog: * io/io.h (dec_waiting_unlocked): Use __gthread_rwlock_wrlock/__gthread_rwlock_unlock or __gthread_mutex_lock/__gthread_mutex_unlock functions to replace WRLOCK and RWUNLOCK macros. Signed-off-by: Lipeng Zhu --- libgfortran/io/io.h | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h index 15daa0995b1..c7f0f7d7d9e 100644 --- a/libgfortran/io/io.h +++ b/libgfortran/io/io.h @@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u) #ifdef HAVE_ATOMIC_FETCH_ADD (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED); #else - WRLOCK (&unit_rwlock); +#ifdef __GTHREAD_RWLOCK_INIT + __gthread_rwlock_wrlock (&unit_rwlock); + u->waiting--; + __gthread_rwlock_unlock (&unit_rwlock); +#else + __gthread_mutex_lock (&unit_rwlock); u->waiting--; - RWUNLOCK (&unit_rwlock); + __gthread_mutex_unlock (&unit_rwlock); +#endif #endif } -- 2.39.3
[PATCH v7] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. v2 -> v3: Rebase the patch with trunk branch. v3 -> v4: Update the comments. v4 -> v5: Fix typos and code formatter. v5 -> v6: Add unit tests. v6 -> v7: Update ChangeLog and code formatter. Reviewed-by: Hongjiu Lu Reviewed-by: Bernhard Reutner-Fischer Reviewed-by: Thomas Koenig Reviewed-by: Jakub Jelinek Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 +- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 117 +- libgfortran/io/unix.c | 16 +- .../testsuite/libgomp.fortran/rwlock_1.f90| 33 .../testsuite/libgomp.fortran/rwlock_2.f90| 22 +++ .../testsuite/libgomp.fortran/rwlock_3.f90| 18 +++ 10 files changed, 386 insertions(+), 58 deletions(-) create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90 diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_IN
[PATCH] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can increase from 0.25 to 2.2 in the Intel SRP server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- libgcc/gthr-posix.h | 52 + libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 65 libgfortran/io/unix.c | 16 ++-- 7 files changed, 265 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index f1a5ab8e075..358948e8ae8 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,7 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +typedef pthread_rwlock_t __gthread_rwlock_t; typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +59,7 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +137,11 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +892,51 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else +return 0; +} + #endif /* _LIBOBJC */ #endif /* ! GCC_GTHR_POSIX_H */ diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c index 912b39ea302..f0bde979da4 100644 --- a/libgfortran/io/async.c +++ b/libgfortran/io/async.c @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX); DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __
[PATCH] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- libgcc/gthr-posix.h | 52 + libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 65 libgfortran/io/unix.c | 16 ++-- 7 files changed, 265 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index f1a5ab8e075..358948e8ae8 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,7 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +typedef pthread_rwlock_t __gthread_rwlock_t; typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +59,7 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +137,11 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +892,51 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else +return 0; +} + #endif /* _LIBOBJC */ #endif /* ! GCC_GTHR_POSIX_H */ diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c index 912b39ea302..f0bde979da4 100644 --- a/libgfortran/io/async.c +++ b/libgfortran/io/async.c @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX); DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_IN
[PATCH v2] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 65 libgfortran/io/unix.c | 16 ++-- 7 files changed, 273 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index f1a5ab8e075..cbe7cba20d0 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else +return 0; +} +#endif + #endif /* _LIBOBJC */ #endif /* ! GCC_GTHR_POSIX_H */ diff --git a/libgfortran/io/async.c b/libgfortran/io
[PATCH v3] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. v2 -> v3: Rebase the patch with trunk branch. Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 65 libgfortran/io/unix.c | 16 ++-- 7 files changed, 273 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else +return 0; +} +#endif + #endif /* _LIBOBJC */ #endif /* ! GCC_GTHR_POSIX_H */ d
[PATCH v4] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. v2 -> v3: Rebase the patch with trunk branch. v3 -> v4: Update the comments. Reviewed-by: Hongjiu Lu Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 75 +++ libgfortran/io/unix.c | 16 ++-- 7 files changed, 283 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else +return 0; +} +#e
[PATCH v5] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. v2 -> v3: Rebase the patch with trunk branch. v3 -> v4: Update the comments. v4 -> v5: Fix typos and code formatter. Reviewed-by: Hongjiu Lu Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 154 +- libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 71 +++--- libgfortran/io/unix.c | 16 ++-- 7 files changed, 281 insertions(+), 47 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (_