[PATCH 0/5] amdgcn: Improve TImode support
This patch series extends TImode support for AMD GCN (see e.g. PR96306 and PR95730). This fixes several test failures that appear at present, and enables use of a 128-bit integer "omp_depend_kind" for OpenMP 5.0.

Tested with offloading to AMD GCN. Further commentary is in the individual patches.

Thanks,

Julian

Julian Brown (5):
  amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return
  amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders
  amdgcn: Add clrsbsi2/clrsbdi2 implementation
  amdgcn: Enable support for TImode for AMD GCN
  Fortran: Re-enable 128-bit integers for AMD GCN

 gcc/config/gcn/gcn.c               | 30 ++
 gcc/config/gcn/gcn.h               | 11 ++--
 gcc/config/gcn/gcn.md              | 95 +++---
 libgcc/config/gcn/lib2-bswapti2.c  | 47 +++
 libgcc/config/gcn/lib2-divmod-di.c | 35 +++
 libgcc/config/gcn/lib2-divmod.c    |  8 +--
 libgcc/config/gcn/lib2-gcn.h       | 12 +++-
 libgcc/config/gcn/t-amdgcn         |  2 +
 libgfortran/configure              | 22 ++-
 libgfortran/configure.ac           |  4 --
 10 files changed, 226 insertions(+), 40 deletions(-)
 create mode 100644 libgcc/config/gcn/lib2-bswapti2.c
 create mode 100644 libgcc/config/gcn/lib2-divmod-di.c

-- 
2.29.2
[PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return
This patch changes the argument and return types for the libgcc __udivsi3 and __umodsi3 helper functions for GCN to USItype instead of SItype. This is probably just cosmetic in practice.

I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment.

Thanks,

Julian

2021-06-18 Julian Brown

libgcc/
	* config/gcn/lib2-divmod.c (__udivsi3, __umodsi3): Change argument and
	return types to USItype.
	* config/gcn/lib2-gcn.h (__udivsi3, __umodsi3): Update prototypes.
---
 libgcc/config/gcn/lib2-divmod.c | 8
 libgcc/config/gcn/lib2-gcn.h    | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libgcc/config/gcn/lib2-divmod.c b/libgcc/config/gcn/lib2-divmod.c
index 0d6ca44f521..7c72e24e0c3 100644
--- a/libgcc/config/gcn/lib2-divmod.c
+++ b/libgcc/config/gcn/lib2-divmod.c
@@ -102,15 +102,15 @@ __modsi3 (SItype a, SItype b)
 }


-SItype
-__udivsi3 (SItype a, SItype b)
+USItype
+__udivsi3 (USItype a, USItype b)
 {
   return udivmodsi4 (a, b, 0);
 }


-SItype
-__umodsi3 (SItype a, SItype b)
+USItype
+__umodsi3 (USItype a, USItype b)
 {
   return udivmodsi4 (a, b, 1);
 }
diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h
index 11476c4cda8..9223d73b8e7 100644
--- a/libgcc/config/gcn/lib2-gcn.h
+++ b/libgcc/config/gcn/lib2-gcn.h
@@ -38,8 +38,8 @@ typedef int word_type __attribute__ ((mode (__word__)));
 /* Exported functions.  */
 extern SItype __divsi3 (SItype, SItype);
 extern SItype __modsi3 (SItype, SItype);
-extern SItype __udivsi3 (SItype, SItype);
-extern SItype __umodsi3 (SItype, SItype);
+extern USItype __udivsi3 (USItype, USItype);
+extern USItype __umodsi3 (USItype, USItype);
 extern HItype __divhi3 (HItype, HItype);
 extern HItype __modhi3 (HItype, HItype);
 extern UHItype __udivhi3 (UHItype, UHItype);
-- 
2.29.2
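For readers who have not seen the helper that both wrappers call, here is a minimal sketch of a classic unsigned shift-and-subtract divide, written in the style of the generic `udivmodsi4` routine some libgcc targets use. It is an illustration under assumptions, not the contents of `libgcc/config/gcn/lib2-divmod.c` (whose body the diff does not show); the `_sketch` names and the use of `uint32_t` in place of `USItype` are hypothetical. The loop compares and shifts its operands as unsigned values, so `0xffffffff / 2` only works out if the top bit is treated as magnitude rather than sign.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift-and-subtract unsigned divide.  Returns the quotient,
   or the remainder if MODWANTED is nonzero.  Division by zero is not
   handled.  */
static uint32_t
udivmodsi4_sketch (uint32_t num, uint32_t den, int modwanted)
{
  uint32_t bit = 1;
  uint32_t res = 0;

  /* Scale the divisor up until it is no smaller than the dividend (or its
     top bit is set), tracking the matching quotient bit.  */
  while (den < num && bit && !(den & UINT32_C (0x80000000)))
    {
      den <<= 1;
      bit <<= 1;
    }

  /* Subtract the scaled divisor wherever it fits, building the quotient
     one bit at a time from the top.  */
  while (bit)
    {
      if (num >= den)
        {
          num -= den;
          res |= bit;
        }
      bit >>= 1;
      den >>= 1;
    }
  return modwanted ? num : res;
}

/* Wrappers with the post-patch (unsigned) signatures.  */
static uint32_t
udivsi3_sketch (uint32_t a, uint32_t b)
{
  return udivmodsi4_sketch (a, b, 0);
}

static uint32_t
umodsi3_sketch (uint32_t a, uint32_t b)
{
  return udivmodsi4_sketch (a, b, 1);
}
```

Because a signed argument converts to the same unsigned bit pattern on its way into such a helper, this is consistent with the remark above that the type change is probably cosmetic.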
[PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders
This patch improves 64-bit multiplication for AMD GCN: patterns for unsigned and signed 32x32->64 bit multiplication have been added, and also 64x64->64 bit multiplication is now open-coded rather than calling a library function (which may be a win for code size as well as speed: the function calling sequence isn't particularly concise for GCN). The mulsi3_highpart pattern has also been extended for GCN5+, since that ISA version supports high-part result multiply instructions with SGPR operands.

The DImode multiply implementation is lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment.

Thanks,

Julian

2021-06-18 Julian Brown

gcc/
	* config/gcn/gcn.md (mulsi3_highpart): Add SGPR alternatives for GCN5+.
	(mulsidi3, muldi3): Add expanders.
---
 gcc/config/gcn/gcn.md | 55 ++-
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index b5f895a93e2..70655ca4b8b 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1392,19 +1392,62 @@
 (define_code_attr e [(sign_extend "e") (zero_extend "")])

 (define_insn "mulsi3_highpart"
-  [(set (match_operand:SI 0 "register_operand" "= v")
+  [(set (match_operand:SI 0 "register_operand" "=Sg, Sg, v")
         (truncate:SI
           (lshiftrt:DI
             (mult:DI
               (any_extend:DI
-                (match_operand:SI 1 "register_operand" "% v"))
+                (match_operand:SI 1 "register_operand" "%SgA,SgA, v"))
               (any_extend:DI
-                (match_operand:SI 2 "register_operand" "vSv")))
+                (match_operand:SI 2 "register_operand" "SgA, B,vSv")))
             (const_int 32))))]
   ""
-  "v_mul_hi0\t%0, %2, %1"
-  [(set_attr "type" "vop3a")
-   (set_attr "length" "8")])
+  "@
+   s_mul_hi0\t%0, %1, %2
+   s_mul_hi0\t%0, %1, %2
+   v_mul_hi0\t%0, %2, %1"
+  [(set_attr "type" "sop2,sop2,vop3a")
+   (set_attr "length" "4,8,8")
+   (set_attr "gcn_version" "gcn5,gcn5,*")])
+
+(define_expand "mulsidi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+        (mult:DI
+          (any_extend:DI (match_operand:SI 1 "register_operand" ""))
+          (any_extend:DI (match_operand:SI 2 "register_operand" ""))))]
+  ""
+  {
+    rtx dst = gen_reg_rtx (DImode);
+    rtx dstlo = gen_lowpart (SImode, dst);
+    rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+    emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2]));
+    emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2]));
+    emit_move_insn (operands[0], dst);
+    DONE;
+  })
+
+(define_expand "muldi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+        (mult:DI (match_operand:DI 1 "register_operand" "")
+                 (match_operand:DI 2 "register_operand" "")))]
+  ""
+  {
+    rtx tmp0 = gen_reg_rtx (SImode);
+    rtx tmp1 = gen_reg_rtx (SImode);
+    rtx dst = gen_reg_rtx (DImode);
+    rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+    rtx op1lo = gen_lowpart (SImode, operands[1]);
+    rtx op1hi = gen_highpart_mode (SImode, DImode, operands[1]);
+    rtx op2lo = gen_lowpart (SImode, operands[2]);
+    rtx op2hi = gen_highpart_mode (SImode, DImode, operands[2]);
+    emit_insn (gen_umulsidi3 (dst, op1lo, op2lo));
+    emit_insn (gen_mulsi3 (tmp0, op1lo, op2hi));
+    emit_insn (gen_addsi3 (dsthi, dsthi, tmp0));
+    emit_insn (gen_mulsi3 (tmp1, op1hi, op2lo));
+    emit_insn (gen_addsi3 (dsthi, dsthi, tmp1));
+    emit_move_insn (operands[0], dst);
+    DONE;
+  })

 (define_insn "mulhisi3"
   [(set (match_operand:SI 0 "register_operand" "=v")
-- 
2.29.2
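The arithmetic behind the two new expanders can be checked in plain C. This sketch covers only the unsigned (zero_extend) flavour; `umulhi32`, `umulsidi3_sketch` and `muldi3_sketch` are hypothetical stand-ins for the RTL patterns, and `umulhi32` emulates with a 64-bit host multiply what GCN does with a single high-part multiply instruction. The signed flavour would differ only in using a signed high-part multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the unsigned mulsi3_highpart: the high 32 bits of a
   32x32->64 product, emulated here with a 64-bit multiply.  */
static uint32_t
umulhi32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* Mirrors the mulsidi3 expansion: low word from an ordinary 32-bit
   multiply (mulsi3, which wraps modulo 2^32), high word from the
   high-part multiply.  */
static uint64_t
umulsidi3_sketch (uint32_t a, uint32_t b)
{
  uint32_t lo = a * b;
  uint32_t hi = umulhi32 (a, b);
  return ((uint64_t) hi << 32) | lo;
}

/* Mirrors the muldi3 expansion: one widening multiply of the low halves,
   then the two cross products added into the high word with wrapping
   32-bit adds.  */
static uint64_t
muldi3_sketch (uint64_t op1, uint64_t op2)
{
  uint32_t op1lo = (uint32_t) op1, op1hi = (uint32_t) (op1 >> 32);
  uint32_t op2lo = (uint32_t) op2, op2hi = (uint32_t) (op2 >> 32);

  uint64_t dst = umulsidi3_sketch (op1lo, op2lo);
  uint32_t dstlo = (uint32_t) dst;
  uint32_t dsthi = (uint32_t) (dst >> 32);

  dsthi += op1lo * op2hi;  /* tmp0 */
  dsthi += op1hi * op2lo;  /* tmp1 */
  return ((uint64_t) dsthi << 32) | dstlo;
}
```

The `op1hi * op2hi` partial product is never formed because it only affects bits 64 and above, which a 64x64->64 multiply discards; that is why the expander needs just one widening multiply, two 32-bit multiplies and two 32-bit adds.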
[PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation
This patch adds an open-coded implementation of the clrsb2 (count leading redundant sign bit) standard names using the GCN flbit_i* instructions for SImode and DImode. Those don't count exactly as we need, so we need a couple of other instructions to fix up the result afterwards.

These patterns are lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment.

Thanks,

Julian

2021-06-18 Julian Brown

gcc/
	* config/gcn/gcn.md (UNSPEC_FLBIT_INT): New unspec constant.
	(s_mnemonic): Add clrsb.
	(gcn_flbit_int): Add insn pattern for SImode/DImode.
	(clrsb2): Add expander for SImode/DImode.
---
 gcc/config/gcn/gcn.md | 40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 70655ca4b8b..0fa7f86702e 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -81,7 +81,8 @@
    UNSPEC_MOV_FROM_LANE63
    UNSPEC_GATHER
    UNSPEC_SCATTER
-   UNSPEC_RCP])
+   UNSPEC_RCP
+   UNSPEC_FLBIT_INT])

 ;; }}}
 ;; {{{ Attributes
@@ -338,7 +339,8 @@
   [(not "not%b")
    (popcount "bcnt1_i32%b")
    (clz "flbit_i32%b")
-   (ctz "ff1_i32%b")])
+   (ctz "ff1_i32%b")
+   (clrsb "flbit_i32%i")])

 (define_code_attr revmnemonic
   [(minus "subrev%i")
@@ -1509,6 +1511,40 @@
   [(set_attr "type" "sop1")
    (set_attr "length" "4,8")])

+(define_insn "gcn_flbit_int"
+  [(set (match_operand:SI 0 "register_operand" "=Sg,Sg")
+        (unspec:SI [(match_operand:SIDI 1 "gcn_alu_operand" "SgA, B")]
+                   UNSPEC_FLBIT_INT))]
+  ""
+  {
+    if (mode == SImode)
+      return "s_flbit_i32\t%0, %1";
+    else
+      return "s_flbit_i32_i64\t%0, %1";
+  }
+  [(set_attr "type" "sop1")
+   (set_attr "length" "4,8")])
+
+(define_expand "clrsb2"
+  [(set (match_operand:SI 0 "register_operand" "")
+        (clrsb:SI (match_operand:SIDI 1 "gcn_alu_operand" "")))]
+  ""
+  {
+    rtx tmp = gen_reg_rtx (SImode);
+    /* FLBIT_I* counts sign or zero bits at the most-significant end of the
+       input register (and returns -1 for 0/-1 inputs).  We want the number of
+       *redundant* bits (i.e. that value minus one), and an answer of 31/63 for
+       0/-1 inputs.  We can do that in three instructions...  */
+    emit_insn (gen_gcn_flbit_int (tmp, operands[1]));
+    emit_insn (gen_uminsi3 (tmp, tmp,
+                            gen_int_mode (GET_MODE_BITSIZE (mode),
+                                          SImode)));
+    /* If we put this last, it can potentially be folded into a subsequent
+       arithmetic operation.  */
+    emit_insn (gen_subsi3 (operands[0], tmp, const1_rtx));
+    DONE;
+  })
+
 ;; }}}
 ;; {{{ ALU: generic 32-bit binop
-- 
2.29.2
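The three-instruction fix-up in the clrsb expander can be checked against a plain-C model. This is a sketch under stated assumptions: `flbit_model` is a hypothetical emulation of S_FLBIT_I32 / S_FLBIT_I32_I64 that follows the behaviour described in the patch comment (the count of leading bits equal to the sign bit, or -1 for 0/-1 inputs), and `clrsb_sketch` is a hypothetical name for the flbit, unsigned-min, subtract-one sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of S_FLBIT_I32 (WIDTH == 32) and S_FLBIT_I32_I64
   (WIDTH == 64): the position, counted from the MSB, of the first bit that
   differs from the sign bit, i.e. the number of leading bits equal to the
   sign bit (sign bit included).  Returns 0xffffffff (-1) when all WIDTH
   bits equal the sign bit, as for inputs 0 and -1.  Only the low WIDTH
   bits of X are examined.  */
static uint32_t
flbit_model (uint64_t x, unsigned width)
{
  uint64_t sign = (x >> (width - 1)) & 1;
  for (unsigned n = 1; n < width; n++)
    if (((x >> (width - 1 - n)) & 1) != sign)
      return n;
  return UINT32_MAX;
}

/* The expander's fix-up: an unsigned minimum against WIDTH (uminsi3)
   turns -1 into WIDTH, then subtracting one (subsi3) drops the sign bit
   itself, leaving only the redundant copies.  */
static int
clrsb_sketch (uint64_t x, unsigned width)
{
  uint32_t tmp = flbit_model (x, width);
  if (tmp > width)
    tmp = width;
  return (int) tmp - 1;
}
```

For the 0 and -1 inputs the clamp yields 32 or 64, so the result is the 31/63 answer the patch comment asks for.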
[PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN
This patch enables support for TImode for AMD GCN, the lack of which is currently causing a number of test failures for the target and which is also needed to support "omp_depend_kind" for OpenMP 5.0, since that is implemented as a 128-bit integer.

Several libgcc support routines are built by default for the "word size" of a machine, and also for "2 * word size" of the machine. The libgcc build for AMD GCN is changed so that it builds for a "word size" of 64 bits, in order to better match the (64-bit) host compiler. However, it isn't really true that we have 64-bit words -- GCN has 32-bit registers, so changing UNITS_PER_WORD unconditionally would be the wrong thing to do.

Changing this setting for libgcc (only) means that support routines are built for "single word" operations that are DImode (64 bits), and those for "double word" operations are built for TImode (128 bits). That leaves some gaps regarding previous operations that were built for a "single word" size of 32 bits and a "double word" size of 64 bits (generic code doesn't cover both alternatives for all operations that might be needed). Those gaps are filled in by this patch, or by the preceding patches in the series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment.

Thanks,

Julian

2021-06-18 Julian Brown

gcc/
	* config/gcn/gcn.c (gcn_init_libfuncs): New function.
	(TARGET_INIT_LIBFUNCS): Define target hook using above function.
	* config/gcn/gcn.h (UNITS_PER_WORD): Define to 8 for IN_LIBGCC2, 4
	otherwise.
	(LIBGCC2_UNITS_PER_WORD, BITS_PER_WORD): Remove definitions.
	(MAX_FIXED_MODE_SIZE): Change to 128.

libgcc/
	* config/gcn/lib2-bswapti2.c: New file.
	* config/gcn/lib2-divmod-di.c: New file.
	* config/gcn/lib2-gcn.h (DItype, UDItype, TItype, UTItype): Add
	typedefs.
	(__divdi3, __moddi3, __udivdi3, __umoddi3): Add prototypes.
	* config/gcn/t-amdgcn (LIB2ADD): Add lib2-divmod-di.c and
	lib2-bswapti2.c.
---
 gcc/config/gcn/gcn.c               | 30 +++
 gcc/config/gcn/gcn.h               | 11 ---
 libgcc/config/gcn/lib2-bswapti2.c  | 47 ++
 libgcc/config/gcn/lib2-divmod-di.c | 35 ++
 libgcc/config/gcn/lib2-gcn.h       |  8 +
 libgcc/config/gcn/t-amdgcn         |  2 ++
 6 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100644 libgcc/config/gcn/lib2-bswapti2.c
 create mode 100644 libgcc/config/gcn/lib2-divmod-di.c

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91fe50a..45f37d5310d 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3610,6 +3610,34 @@ gcn_init_builtins (void)
 #endif
 }

+/* Implement TARGET_INIT_LIBFUNCS.  */
+
+static void
+gcn_init_libfuncs (void)
+{
+  /* BITS_PER_UNIT * 2 is 64 bits, which causes
+     optabs-libfuncs.c:gen_int_libfunc to omit TImode (i.e 128 bits)
+     libcalls that we need to support operations for that type.  Initialise
+     them here instead.  */
+  set_optab_libfunc (udiv_optab, TImode, "__udivti3");
+  set_optab_libfunc (umod_optab, TImode, "__umodti3");
+  set_optab_libfunc (sdiv_optab, TImode, "__divti3");
+  set_optab_libfunc (smod_optab, TImode, "__modti3");
+  set_optab_libfunc (smul_optab, TImode, "__multi3");
+  set_optab_libfunc (addv_optab, TImode, "__addvti3");
+  set_optab_libfunc (subv_optab, TImode, "__subvti3");
+  set_optab_libfunc (negv_optab, TImode, "__negvti2");
+  set_optab_libfunc (absv_optab, TImode, "__absvti2");
+  set_optab_libfunc (smulv_optab, TImode, "__mulvti3");
+  set_optab_libfunc (ffs_optab, TImode, "__ffsti2");
+  set_optab_libfunc (clz_optab, TImode, "__clzti2");
+  set_optab_libfunc (ctz_optab, TImode, "__ctzti2");
+  set_optab_libfunc (clrsb_optab, TImode, "__clrsbti2");
+  set_optab_libfunc (popcount_optab, TImode, "__popcountti2");
+  set_optab_libfunc (parity_optab, TImode, "__parityti2");
+  set_optab_libfunc (bswap_optab, TImode, "__bswapti2");
+}
+
 /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
    not require taking the address of any object, other than the memory
    cell being operated on.
@@ -6336,6 +6364,8 @@ gcn_dwarf_register_span (rtx rtl)
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS gcn_init_builtins
+#undef TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS gcn_init_libfuncs
 #undef TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
 #define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \
   gcn_ira_change_pseudo_allocno_class
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index eba4646f1bf..540835b81cc 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -46,9 +46,12 @@
 #define BYTES_BIG_ENDIAN 0
 #define WORDS_BIG_ENDIAN 0

-#define BITS_PER_WORD 32
-#define UNITS_PER_WORD (BITS_PER_WORD/BITS_PER_UNIT)
-#define LIBGCC2_UNITS_PER_WORD 4
+#ifdef IN_LIBGCC2
+/* We want DImod
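`gcn_init_libfuncs` registers `__bswapti2` for `bswap_optab` in TImode, and the ChangeLog adds a new `lib2-bswapti2.c`, but its body is not shown in this excerpt. One plausible approach, given only as a hypothetical sketch (the `bswapti2_sketch` name is invented, and it assumes a compiler providing `unsigned __int128` and `__builtin_bswap64`, as GCC and Clang do on 64-bit hosts), is to byte-reverse each 64-bit half and swap the halves:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit byte swap: reversing all 16 bytes is the same as
   byte-reversing each 64-bit half and exchanging the two halves.  */
static unsigned __int128
bswapti2_sketch (unsigned __int128 x)
{
  uint64_t lo = (uint64_t) x;
  uint64_t hi = (uint64_t) (x >> 64);
  return ((unsigned __int128) __builtin_bswap64 (lo) << 64)
         | __builtin_bswap64 (hi);
}
```

Splitting into 64-bit halves fits the libgcc-side view in this patch, where a "single word" is DImode and TImode is the "double word".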
[PATCH 5/5] Fortran: Re-enable 128-bit integers for AMD GCN
This patch reverts the part of Tobias's patch for PR target/96306 that disables 128-bit integer support for AMD GCN.

OK for mainline (assuming the previous patches are in first)?

Thanks,

Julian

2021-06-18 Julian Brown

libgfortran/
	PR target/96306
	* configure.ac: Remove stanza that removes KIND=16 integers for AMD GCN.
	* configure: Regenerate.
---
 libgfortran/configure    | 22 --
 libgfortran/configure.ac |  4
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/libgfortran/configure b/libgfortran/configure
index f3634389cf8..886216f69d4 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6017,7 +6017,7 @@ case "$host" in
     case "$enable_cet" in
       auto)
         # Check if target supports multi-byte NOPs
-        # and if assembler supports CET insn.
+        # and if compiler and assembler support CET insn.
         cet_save_CFLAGS="$CFLAGS"
         CFLAGS="$CFLAGS -fcf-protection"
         cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -6216,10 +6216,6 @@ fi

 LIBGOMP_CHECKED_INT_KINDS="1 2 4 8 16"
 LIBGOMP_CHECKED_REAL_KINDS="4 8 10 16"
-if test "x${target_cpu}" = xamdgcn; then
-  # amdgcn only has limited support for __int128.
-  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8"
-fi



@@ -12731,7 +12727,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12744 "configure"
+#line 12730 "configure"
 #include "confdefs.h"

 #if HAVE_DLFCN_H
@@ -12837,7 +12833,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12850 "configure"
+#line 12836 "configure"
 #include "confdefs.h"

 #if HAVE_DLFCN_H
@@ -15532,16 +15528,6 @@ freebsd* | dragonfly*)
   esac
   ;;

-gnu*)
-  version_type=linux
-  need_lib_prefix=no
-  need_version=no
-  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}'
-  soname_spec='${libname}${release}${shared_ext}$major'
-  shlibpath_var=LD_LIBRARY_PATH
-  hardcode_into_libs=yes
-  ;;
-
 haiku*)
   version_type=linux
   need_lib_prefix=no
@@ -15663,7 +15649,7 @@ linux*oldld* | linux*aout* | linux*coff*)
 # project, but have not yet been accepted: they are GCC-local changes
 # for the time being.  (See
 # https://lists.gnu.org/archive/html/libtool-patches/2018-05/msg0.html)
-linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu* | uclinuxfdpiceabi)
   version_type=linux
   need_lib_prefix=no
   need_version=no
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 8961e314d82..523eb24bca1 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -222,10 +222,6 @@ AM_CONDITIONAL(LIBGFOR_MINIMAL, [test "x${target_cpu}" = xnvptx])

 LIBGOMP_CHECKED_INT_KINDS="1 2 4 8 16"
 LIBGOMP_CHECKED_REAL_KINDS="4 8 10 16"
-if test "x${target_cpu}" = xamdgcn; then
-  # amdgcn only has limited support for __int128.
-  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8"
-fi

 AC_SUBST(LIBGOMP_CHECKED_INT_KINDS)
 AC_SUBST(LIBGOMP_CHECKED_REAL_KINDS)
-- 
2.29.2
Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders
On 18/06/2021 15:19, Julian Brown wrote:
> [...]

Most of the rest of the backend expands 64-bit operations to 32-bit pairs much later, using define_insn_and_split, because there were lots of issues with splitting it early. I don't recall exactly what right now, unfortunately. (It might have been related to spilling only half the value to the stack?) It also makes it hard to debug, I think.

Andrew
Re: [PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation
On 18/06/2021 15:19, Julian Brown wrote:
> [...]

OK.

Andrew
Re: [PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN
On 18/06/2021 15:19, Julian Brown wrote:
> [...]
Re: [PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return
On 18/06/2021 15:19, Julian Brown wrote:
> [...]

OK, this seems to match what some other targets have. Except NIOS2 though, which is probably where this file was copied from.

Andrew
Help,Guidance required-Reg.
Dear Sir,

Please let me know where I can find GFortran syntax, keywords/reserved words, and graphics for developing menu examples. I am new to GFortran and also to the NetBeans IDE.

How can I get help and guidance on the GFortran language from experts, pioneers and senior users? Please tell me frankly if there is any error in this mail message.

Please also let me know whether any reference or user manuals are available.

Thanks,

Regards,
D.SundarChand
[Patch, committed] PR fortran/101123 - [11/12 Regression] Invalid code for MAX0 with -fdefault-integer-8
As confirmed in the PR by Jakub, there was a bad conversion of the result of min0/max0 to the result type. We should just unconditionally convert in all cases. As a benefit, this also fixes pr100283.

Committed after regtesting.

Thanks,
Harald


Fortran - fix conversion to result type for the min/max intrinsic

gcc/fortran/ChangeLog:

	PR fortran/100283
	PR fortran/101123
	* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Unconditionally
	convert result of min/max to result type.

gcc/testsuite/ChangeLog:

	PR fortran/100283
	PR fortran/101123
	* gfortran.dg/min0_max0_1.f90: New test.
	* gfortran.dg/min0_max0_2.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 73b0bcc9dea..e578449995a 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -4147,10 +4147,7 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
                              build_empty_stmt (input_location));
       gfc_add_expr_to_block (&se->pre, tmp);
     }
-  if (TREE_CODE (type) == INTEGER_TYPE)
-    se->expr = fold_build1_loc (input_location, FIX_TRUNC_EXPR, type, mvar);
-  else
-    se->expr = convert (type, mvar);
+  se->expr = convert (type, mvar);
 }


diff --git a/gcc/testsuite/gfortran.dg/min0_max0_1.f90 b/gcc/testsuite/gfortran.dg/min0_max0_1.f90
new file mode 100644
index 000..118b0f03b52
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min0_max0_1.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-std=gnu" }
+! PR fortran/100283
+
+subroutine s ()
+  integer(kind=8) :: i,j,k
+  i = min0 (j,k)
+  i = max0 (-127_8, min0 (j,127_8))
+end subroutine s
diff --git a/gcc/testsuite/gfortran.dg/min0_max0_2.f90 b/gcc/testsuite/gfortran.dg/min0_max0_2.f90
new file mode 100644
index 000..3fe4fcd3609
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min0_max0_2.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-fdefault-integer-8 -std=gnu" }
+! PR fortran/101123
+
+SUBROUTINE TEST
+  IMPLICIT INTEGER*4 (I-N)
+  MAXMN=MAX0(M,N)
+  MINMN=MIN0(M,0_4)
+  MAXRS=MAX1(R,S)
+END SUBROUTINE TEST
PING [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514
*PING*

> Sent: Wednesday, 09 June 2021 at 23:39
> From: "Harald Anlauf"
> To: "fortran" , "gcc-patches"
> Subject: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514
>
> Dear Fortranners,
>
> we should be able to simplify the length of a substring with known
> constant bounds.  The attached patch adds this.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for mainline?  Since this should be rather safe, to at least 11-branch?
>
> Thanks,
> Harald
>
>
> Fortran - simplify length of substring with constant bounds
>
> gcc/fortran/ChangeLog:
>
> 	PR fortran/100950
> 	* simplify.c (substring_has_constant_len): New.
> 	(gfc_simplify_len): Handle case of substrings with constant
> 	bounds.
>
> gcc/testsuite/ChangeLog:
>
> 	PR fortran/100950
> 	* gfortran.dg/pr100950.f90: New test.
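The simplification described above rests on the Fortran rule that the length of a substring S(LO:HI) is MAX(HI - LO + 1, 0), so with constant bounds it is itself a constant. A minimal sketch of that arithmetic in C follows; `substring_len_sketch` is a hypothetical name, not the patch's `substring_has_constant_len`, whose body the quoted mail does not show, and it deliberately ignores any checks of the bounds against the parent string's length.

```c
#include <assert.h>

/* Hypothetical helper: length of the Fortran substring S(LO:HI) for
   constant bounds.  A zero-length (empty) substring results whenever
   HI < LO, never a negative length.  */
static long
substring_len_sketch (long lo, long hi)
{
  long len = hi - lo + 1;
  return len > 0 ? len : 0;
}
```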