date:20210618

[PATCH 0/5] amdgcn: Improve TImode support

2021-06-18 Thread Julian Brown

This patch series extends TImode support for AMD GCN (see e.g. PR96306
and PR95730). This fixes several test failures that appear at present,
and enables use of a 128-bit integer "omp_depend_kind" for OpenMP 5.0.

Tested with offloading to AMD GCN. Further commentary is in the individual
patches.

Thanks,

Julian

Julian Brown (5):
  amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper
    args/return
  amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsidi3/muldi3
    expanders
  amdgcn: Add clrsbsi2/clrsbdi2 implementation
  amdgcn: Enable support for TImode for AMD GCN
  Fortran: Re-enable 128-bit integers for AMD GCN

 gcc/config/gcn/gcn.c   | 30 ++
 gcc/config/gcn/gcn.h   | 11 ++--
 gcc/config/gcn/gcn.md  | 95 +++---
 libgcc/config/gcn/lib2-bswapti2.c  | 47 +++
 libgcc/config/gcn/lib2-divmod-di.c | 35 +++
 libgcc/config/gcn/lib2-divmod.c|  8 +--
 libgcc/config/gcn/lib2-gcn.h   | 12 +++-
 libgcc/config/gcn/t-amdgcn |  2 +
 libgfortran/configure  | 22 ++-
 libgfortran/configure.ac   |  4 --
 10 files changed, 226 insertions(+), 40 deletions(-)
 create mode 100644 libgcc/config/gcn/lib2-bswapti2.c
 create mode 100644 libgcc/config/gcn/lib2-divmod-di.c

-- 
2.29.2

[PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return

2021-06-18 Thread Julian Brown

This patch changes the argument and return types for the libgcc __udivsi3
and __umodsi3 helper functions for GCN to USItype instead of SItype.
This is probably just cosmetic in practice.
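
For context, both helpers forward to a udivmodsi4 routine. The sketch below is a classic shift-and-subtract unsigned division written with uint32_t throughout. It illustrates why unsigned argument and return types describe what the helpers compute: 0xFFFFFFFF must be divided as 4294967295, not as -1. It follows the general shape of such a helper but is not copied from lib2-divmod.c, and the udivmod32/my_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract unsigned 32-bit division (illustrative sketch).
   den must be nonzero.  Returns the remainder if modwanted is nonzero,
   otherwise the quotient.  */
static uint32_t udivmod32 (uint32_t num, uint32_t den, int modwanted)
{
  uint32_t bit = 1;
  uint32_t res = 0;

  /* Shift the divisor up until it reaches the dividend or its top bit
     is set, tracking the matching quotient bit.  */
  while (den < num && bit && !(den & 0x80000000u))
    {
      den <<= 1;
      bit <<= 1;
    }

  /* Walk back down, subtracting wherever the shifted divisor fits.  */
  while (bit)
    {
      if (num >= den)
        {
          num -= den;
          res |= bit;
        }
      bit >>= 1;
      den >>= 1;
    }

  return modwanted ? num : res;
}

/* Hypothetical stand-ins for __udivsi3/__umodsi3 using unsigned types.  */
uint32_t my_udivsi3 (uint32_t a, uint32_t b) { return udivmod32 (a, b, 0); }
uint32_t my_umodsi3 (uint32_t a, uint32_t b) { return udivmod32 (a, b, 1); }

/* Cross-check against the host's native unsigned division.  */
void check_udivmod (uint32_t a, uint32_t b)
{
  assert (my_udivsi3 (a, b) == a / b);
  assert (my_umodsi3 (a, b) == a % b);
}
```

Since the register bits passed are identical either way, the type change mostly documents intent, which matches the "cosmetic" remark above.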

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

libgcc/
* config/gcn/lib2-divmod.c (__udivsi3, __umodsi3): Change argument and
return types to USItype.
* config/gcn/lib2-gcn.h (__udivsi3, __umodsi3): Update prototypes.
---
 libgcc/config/gcn/lib2-divmod.c | 8 
 libgcc/config/gcn/lib2-gcn.h| 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libgcc/config/gcn/lib2-divmod.c b/libgcc/config/gcn/lib2-divmod.c
index 0d6ca44f521..7c72e24e0c3 100644
--- a/libgcc/config/gcn/lib2-divmod.c
+++ b/libgcc/config/gcn/lib2-divmod.c
@@ -102,15 +102,15 @@ __modsi3 (SItype a, SItype b)
 }
 
 
-SItype
-__udivsi3 (SItype a, SItype b)
+USItype
+__udivsi3 (USItype a, USItype b)
 {
   return udivmodsi4 (a, b, 0);
 }
 
 
-SItype
-__umodsi3 (SItype a, SItype b)
+USItype
+__umodsi3 (USItype a, USItype b)
 {
   return udivmodsi4 (a, b, 1);
 }
diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h
index 11476c4cda8..9223d73b8e7 100644
--- a/libgcc/config/gcn/lib2-gcn.h
+++ b/libgcc/config/gcn/lib2-gcn.h
@@ -38,8 +38,8 @@ typedef int word_type __attribute__ ((mode (__word__)));
 /* Exported functions.  */
 extern SItype __divsi3 (SItype, SItype);
 extern SItype __modsi3 (SItype, SItype);
-extern SItype __udivsi3 (SItype, SItype);
-extern SItype __umodsi3 (SItype, SItype);
+extern USItype __udivsi3 (USItype, USItype);
+extern USItype __umodsi3 (USItype, USItype);
 extern HItype __divhi3 (HItype, HItype);
 extern HItype __modhi3 (HItype, HItype);
 extern UHItype __udivhi3 (UHItype, UHItype);
-- 
2.29.2

[PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsidi3/muldi3 expanders

2021-06-18 Thread Julian Brown

This patch improves 64-bit multiplication for AMD GCN: patterns for
unsigned and signed 32x32->64 bit multiplication have been added, and
also 64x64->64 bit multiplication is now open-coded rather than calling
a library function (which may be a win for code size as well as speed:
the function calling sequence isn't particularly concise for GCN).
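
To make the open-coded 64x64->64 multiply concrete, here is a plain C model of the arithmetic the muldi3 expander emits. It forms one full 32x32->64 product of the low halves, then adds the low 32 bits of the two cross products into the high word. This models the math only, not the RTL, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for umulsidi3: full unsigned 32x32->64 product.  */
uint64_t umul32x32_64 (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}

/* 64x64->64 multiply from 32-bit pieces, following the expander's steps.  */
uint64_t mul64_from_32 (uint64_t x, uint64_t y)
{
  uint32_t xlo = (uint32_t) x, xhi = (uint32_t) (x >> 32);
  uint32_t ylo = (uint32_t) y, yhi = (uint32_t) (y >> 32);

  /* dst = xlo * ylo as a full 64-bit product.  */
  uint64_t dst = umul32x32_64 (xlo, ylo);
  uint32_t dstlo = (uint32_t) dst;
  uint32_t dsthi = (uint32_t) (dst >> 32);

  /* Only the low 32 bits of each cross product reach the result.  */
  dsthi += (uint32_t) (xlo * yhi);
  dsthi += (uint32_t) (xhi * ylo);

  /* xhi * yhi only affects bits 64 and up, so it is never computed.  */
  return ((uint64_t) dsthi << 32) | dstlo;
}

/* Compare with the host compiler's native 64-bit multiply.  */
void check_mul64 (uint64_t x, uint64_t y)
{
  assert (mul64_from_32 (x, y) == x * y);
}
```

That is three multiplies and two adds, with no call sequence.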

The mulsi3_highpart pattern has also been extended for GCN5+, since
that ISA version supports high-part result multiply instructions with
SGPR operands.
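
The signed and unsigned high-part multiplies differ only in the top 32 bits of the product. The low 32 bits are the same either way. Below is a minimal C model of both high-part flavours, and of how a 32x32->64 (mulsidi3-style) result can be assembled from a low-part multiply plus a high-part multiply. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a 32x32 product (mulsi3): identical for signed and
   unsigned operands.  */
uint32_t mul_lo32 (uint32_t a, uint32_t b)
{
  return a * b;
}

/* High 32 bits of the unsigned 64-bit product (umulsi3_highpart).  */
uint32_t umul_hi32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* High 32 bits of the signed 64-bit product (smulsi3_highpart), returned
   as a raw bit pattern.  */
uint32_t smul_hi32 (int32_t a, int32_t b)
{
  uint64_t p = (uint64_t) ((int64_t) a * b);  /* two's complement bits */
  return (uint32_t) (p >> 32);
}

/* mulsidi3-style results: place the high part above the low part.  */
uint64_t umulsidi_sketch (uint32_t a, uint32_t b)
{
  return ((uint64_t) umul_hi32 (a, b) << 32) | mul_lo32 (a, b);
}

uint64_t smulsidi_sketch (int32_t a, int32_t b)
{
  return ((uint64_t) smul_hi32 (a, b) << 32)
         | mul_lo32 ((uint32_t) a, (uint32_t) b);
}

/* Cross-check both flavours against native widening multiplies.  */
void check_mulsidi (int32_t a, int32_t b)
{
  assert (smulsidi_sketch (a, b) == (uint64_t) ((int64_t) a * b));
  assert (umulsidi_sketch ((uint32_t) a, (uint32_t) b)
          == (uint64_t) (uint32_t) a * (uint32_t) b);
}
```

For example, 0xFFFFFFFF times 2 has a high part of 1 when the operands are read as unsigned, but all ones when they are read as signed (-1 * 2). That difference is why the high-part pattern needs both extension variants.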

The DImode multiply implementation is lost from libgcc if we build it
for DImode/TImode rather than SImode/DImode, a change we make in a later
patch in this series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.md (mulsi3_highpart): Add SGPR alternatives for
GCN5+.
(mulsidi3, muldi3): Add expanders.
---
 gcc/config/gcn/gcn.md | 55 ++-
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index b5f895a93e2..70655ca4b8b 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1392,19 +1392,62 @@
 (define_code_attr e [(sign_extend "e") (zero_extend "")])
 
 (define_insn "mulsi3_highpart"
-  [(set (match_operand:SI 0 "register_operand""= v")
+  [(set (match_operand:SI 0 "register_operand" "=Sg, Sg,  v")
(truncate:SI
  (lshiftrt:DI
(mult:DI
  (any_extend:DI
-   (match_operand:SI 1 "register_operand" "% v"))
+   (match_operand:SI 1 "register_operand" "%SgA,SgA,  v"))
  (any_extend:DI
-   (match_operand:SI 2 "register_operand" "vSv")))
+   (match_operand:SI 2 "register_operand"  "SgA,  B,vSv")))
(const_int 32]
   ""
-  "v_mul_hi0\t%0, %2, %1"
-  [(set_attr "type" "vop3a")
-   (set_attr "length" "8")])
+  "@
+  s_mul_hi0\t%0, %1, %2
+  s_mul_hi0\t%0, %1, %2
+  v_mul_hi0\t%0, %2, %1"
+  [(set_attr "type" "sop2,sop2,vop3a")
+   (set_attr "length" "4,8,8")
+   (set_attr "gcn_version" "gcn5,gcn5,*")])
+
+(define_expand "mulsidi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI
+ (any_extend:DI (match_operand:SI 1 "register_operand" ""))
+ (any_extend:DI (match_operand:SI 2 "register_operand" ""]
+  ""
+  {
+rtx dst = gen_reg_rtx (DImode);
+rtx dstlo = gen_lowpart (SImode, dst);
+rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2]));
+emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2]));
+emit_move_insn (operands[0], dst);
+DONE;
+  })
+
+(define_expand "muldi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI (match_operand:DI 1 "register_operand" "")
+(match_operand:DI 2 "register_operand" "")))]
+  ""
+  {
+rtx tmp0 = gen_reg_rtx (SImode);
+rtx tmp1 = gen_reg_rtx (SImode);
+rtx dst = gen_reg_rtx (DImode);
+rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+rtx op1lo = gen_lowpart (SImode, operands[1]);
+rtx op1hi = gen_highpart_mode (SImode, DImode, operands[1]);
+rtx op2lo = gen_lowpart (SImode, operands[2]);
+rtx op2hi = gen_highpart_mode (SImode, DImode, operands[2]);
+emit_insn (gen_umulsidi3 (dst, op1lo, op2lo));
+emit_insn (gen_mulsi3 (tmp0, op1lo, op2hi));
+emit_insn (gen_addsi3 (dsthi, dsthi, tmp0));
+emit_insn (gen_mulsi3 (tmp1, op1hi, op2lo));
+emit_insn (gen_addsi3 (dsthi, dsthi, tmp1));
+emit_move_insn (operands[0], dst);
+DONE;
+  })
 
 (define_insn "mulhisi3"
   [(set (match_operand:SI 0 "register_operand" "=v")
-- 
2.29.2

[PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation

2021-06-18 Thread Julian Brown

This patch adds an open-coded implementation of the clrsb2
(count leading redundant sign bit) standard names using the GCN flbit_i*
instructions for SImode and DImode.  Those don't compute exactly what we
need, so a couple of extra instructions fix up the result afterwards.
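
As a sketch of the fix-up, the C below models flbit as the expander comment in this patch describes it. The model counts the bits at the MSB end that match the sign bit, and returns -1 for 0/-1 inputs. It then applies the same unsigned-min and subtract-one steps and compares the result with a naive reference. The model is based on that comment rather than on the ISA manual, and flbit_model, clrsb_via_flbit and clrsb_reference are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of flbit per the expander comment: count the bits at the
   most-significant end of a WIDTH-bit value that match the sign bit,
   or return -1 (0xFFFFFFFF) if every bit matches (input 0 or -1).  */
uint32_t flbit_model (uint64_t x, unsigned width)
{
  unsigned sign = (unsigned) (x >> (width - 1)) & 1u;
  for (int i = (int) width - 2; i >= 0; i--)
    if (((x >> i) & 1u) != sign)
      return (uint32_t) ((int) width - 1 - i);
  return 0xFFFFFFFFu;
}

/* The expander's three steps: flbit, unsigned min with the bit width,
   then subtract one.  */
int clrsb_via_flbit (uint64_t x, unsigned width)
{
  uint32_t tmp = flbit_model (x, width);
  if (tmp > width)   /* umin: maps -1 (0xFFFFFFFF) to WIDTH */
    tmp = width;
  return (int) tmp - 1;
}

/* Reference: count the bits directly below the sign bit that equal it.  */
int clrsb_reference (uint64_t x, unsigned width)
{
  unsigned sign = (unsigned) (x >> (width - 1)) & 1u;
  int n = 0;
  for (int i = (int) width - 2; i >= 0 && ((x >> i) & 1u) == sign; i--)
    n++;
  return n;
}

void check_clrsb (uint64_t x, unsigned width)
{
  assert (clrsb_via_flbit (x, width) == clrsb_reference (x, width));
}
```

The umin step is what turns the all-ones "no opposite bit" answer into 32 or 64, so 0 and -1 come out as 31 or 63 after the final subtraction.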

These patterns are lost from libgcc if we build it for DImode/TImode
rather than SImode/DImode, a change we make in a later patch in this
series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.md (UNSPEC_FLBIT_INT): New unspec constant.
(s_mnemonic): Add clrsb.
(gcn_flbit_int): Add insn pattern for SImode/DImode.
(clrsb2): Add expander for SImode/DImode.
---
 gcc/config/gcn/gcn.md | 40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 70655ca4b8b..0fa7f86702e 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -81,7 +81,8 @@
   UNSPEC_MOV_FROM_LANE63
   UNSPEC_GATHER
   UNSPEC_SCATTER
-  UNSPEC_RCP])
+  UNSPEC_RCP
+  UNSPEC_FLBIT_INT])
 
 ;; }}}
 ;; {{{ Attributes
@@ -338,7 +339,8 @@
   [(not "not%b")
(popcount "bcnt1_i32%b")
(clz "flbit_i32%b")
-   (ctz "ff1_i32%b")])
+   (ctz "ff1_i32%b")
+   (clrsb "flbit_i32%i")])
 
 (define_code_attr revmnemonic
   [(minus "subrev%i")
@@ -1509,6 +1511,40 @@
   [(set_attr "type" "sop1")
(set_attr "length" "4,8")])
 
+(define_insn "gcn_flbit_int"
+  [(set (match_operand:SI 0 "register_operand"  "=Sg,Sg")
+   (unspec:SI [(match_operand:SIDI 1 "gcn_alu_operand" "SgA, B")]
+  UNSPEC_FLBIT_INT))]
+  ""
+  {
+if (mode == SImode)
+  return "s_flbit_i32\t%0, %1";
+else
+  return "s_flbit_i32_i64\t%0, %1";
+  }
+  [(set_attr "type" "sop1")
+   (set_attr "length" "4,8")])
+
+(define_expand "clrsb2"
+  [(set (match_operand:SI 0 "register_operand" "")
+   (clrsb:SI (match_operand:SIDI 1 "gcn_alu_operand" "")))]
+  ""
+  {
+rtx tmp = gen_reg_rtx (SImode);
+/* FLBIT_I* counts sign or zero bits at the most-significant end of the
+   input register (and returns -1 for 0/-1 inputs).  We want the number of
+   *redundant* bits (i.e. that value minus one), and an answer of 31/63 for
+   0/-1 inputs.  We can do that in three instructions...  */
+emit_insn (gen_gcn_flbit_int (tmp, operands[1]));
+emit_insn (gen_uminsi3 (tmp, tmp,
+   gen_int_mode (GET_MODE_BITSIZE (mode),
+ SImode)));
+/* If we put this last, it can potentially be folded into a subsequent
+   arithmetic operation.  */
+emit_insn (gen_subsi3 (operands[0], tmp, const1_rtx));
+DONE;
+  })
+
 ;; }}}
 ;; {{{ ALU: generic 32-bit binop
 
-- 
2.29.2

[PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN

2021-06-18 Thread Julian Brown

This patch enables support for TImode for AMD GCN, the lack of which
is currently causing a number of test failures for the target and which
is also needed to support "omp_depend_kind" for OpenMP 5.0, since that
is implemented as a 128-bit integer.

Several libgcc support routines are built by default for the "word size"
of a machine, and also for "2 * word size" of the machine.  The libgcc
build for AMD GCN is changed so that it builds for a "word size" of 64
bits, in order to better match the (64-bit) host compiler.  However, it
isn't really true that we have 64-bit words -- GCN has 32-bit registers,
so changing UNITS_PER_WORD unconditionally would be the wrong thing to do.
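
A simplified model of the size window involved: integer support routines are provided for modes between one word and two words wide. This sketch ignores the other adjustments GCC makes, and width_in_word_window is a hypothetical name. It does show why a 32-bit word leaves TImode out, and why moving libgcc to a 64-bit word drops the 32-bit (SImode) cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: one-word and two-word integer support routines exist
   for modes whose width lies between one word and two words.  */
bool width_in_word_window (unsigned mode_bits, unsigned bits_per_word)
{
  return mode_bits >= bits_per_word && mode_bits <= 2 * bits_per_word;
}

/* The two word sizes discussed above.  */
void check_word_windows (void)
{
  /* 32-bit words: SImode and DImode are covered, TImode is not.  */
  assert (width_in_word_window (32, 32));
  assert (width_in_word_window (64, 32));
  assert (!width_in_word_window (128, 32));

  /* 64-bit words (libgcc only): DImode and TImode are covered, but
     SImode falls outside the window.  */
  assert (!width_in_word_window (32, 64));
  assert (width_in_word_window (64, 64));
  assert (width_in_word_window (128, 64));
}
```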

Changing this setting for libgcc (only) means that support routines
are built for "single word" operations that are DImode (64 bits), and
those for "double word" operations are built for TImode (128 bits).
That leaves some gaps regarding previous operations that were built
for a "single word" size of 32 bits and a "double word" size of 64 bits
(generic code doesn't cover both alternatives for all operations that
might be needed).  Those gaps are filled in by this patch, or by the
preceding patches in the series.
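
One of the gap-fillers is the new lib2-bswapti2.c, whose contents are not shown here. One plausible way to write a 128-bit byte swap is to byte-swap each 64-bit half and exchange the halves. The sketch below assumes GCC/Clang's unsigned __int128 and __builtin_bswap64 on a 64-bit host. It is not the file's actual code, and bswap128_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension */

/* Byte-swap each 64-bit half, then exchange the halves.  */
u128 bswap128_sketch (u128 x)
{
  uint64_t lo = (uint64_t) x;
  uint64_t hi = (uint64_t) (x >> 64);
  return ((u128) __builtin_bswap64 (lo) << 64) | __builtin_bswap64 (hi);
}

/* Check against a byte-by-byte reversal.  */
void check_bswap128 (u128 x)
{
  u128 ref = 0;
  for (int i = 0; i < 16; i++)
    ref |= ((x >> (8 * i)) & 0xFF) << (8 * (15 - i));
  assert (bswap128_sketch (x) == ref);
}
```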

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.c (gcn_init_libfuncs): New function.
(TARGET_INIT_LIBFUNCS): Define target hook using above function.
* config/gcn/gcn.h (UNITS_PER_WORD): Define to 8 for IN_LIBGCC2, 4
otherwise.
(LIBGCC2_UNITS_PER_WORD, BITS_PER_WORD): Remove definitions.
(MAX_FIXED_MODE_SIZE): Change to 128.

libgcc/
* config/gcn/lib2-bswapti2.c: New file.
* config/gcn/lib2-divmod-di.c: New file.
* config/gcn/lib2-gcn.h (DItype, UDItype, TItype, UTItype): Add
typedefs.
(__divdi3, __moddi3, __udivdi3, __umoddi3): Add prototypes.
* config/gcn/t-amdgcn (LIB2ADD): Add lib2-divmod-di.c and
lib2-bswapti2.c.
---
 gcc/config/gcn/gcn.c   | 30 +++
 gcc/config/gcn/gcn.h   | 11 ---
 libgcc/config/gcn/lib2-bswapti2.c  | 47 ++
 libgcc/config/gcn/lib2-divmod-di.c | 35 ++
 libgcc/config/gcn/lib2-gcn.h   |  8 +
 libgcc/config/gcn/t-amdgcn |  2 ++
 6 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100644 libgcc/config/gcn/lib2-bswapti2.c
 create mode 100644 libgcc/config/gcn/lib2-divmod-di.c

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91fe50a..45f37d5310d 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3610,6 +3610,34 @@ gcn_init_builtins (void)
 #endif
 }
 
+/* Implement TARGET_INIT_LIBFUNCS.  */
+
+static void
+gcn_init_libfuncs (void)
+{
+  /* BITS_PER_WORD * 2 is 64 bits, which causes
+ optabs-libfuncs.c:gen_int_libfunc to omit TImode (i.e. 128 bits)
+ libcalls that we need to support operations for that type.  Initialise
+ them here instead.  */
+  set_optab_libfunc (udiv_optab, TImode, "__udivti3");
+  set_optab_libfunc (umod_optab, TImode, "__umodti3");
+  set_optab_libfunc (sdiv_optab, TImode, "__divti3");
+  set_optab_libfunc (smod_optab, TImode, "__modti3");
+  set_optab_libfunc (smul_optab, TImode, "__multi3");
+  set_optab_libfunc (addv_optab, TImode, "__addvti3");
+  set_optab_libfunc (subv_optab, TImode, "__subvti3");
+  set_optab_libfunc (negv_optab, TImode, "__negvti2");
+  set_optab_libfunc (absv_optab, TImode, "__absvti2");
+  set_optab_libfunc (smulv_optab, TImode, "__mulvti3");
+  set_optab_libfunc (ffs_optab, TImode, "__ffsti2");
+  set_optab_libfunc (clz_optab, TImode, "__clzti2");
+  set_optab_libfunc (ctz_optab, TImode, "__ctzti2");
+  set_optab_libfunc (clrsb_optab, TImode, "__clrsbti2");
+  set_optab_libfunc (popcount_optab, TImode, "__popcountti2");
+  set_optab_libfunc (parity_optab, TImode, "__parityti2");
+  set_optab_libfunc (bswap_optab, TImode, "__bswapti2");
+}
+
 /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
not require taking the address of any object, other than the memory
cell being operated on.
@@ -6336,6 +6364,8 @@ gcn_dwarf_register_span (rtx rtl)
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 #undef  TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS gcn_init_builtins
+#undef  TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS gcn_init_libfuncs
 #undef  TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
 #define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \
   gcn_ira_change_pseudo_allocno_class
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index eba4646f1bf..540835b81cc 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -46,9 +46,12 @@
 #define BYTES_BIG_ENDIAN 0
 #define WORDS_BIG_ENDIAN 0
 
-#define BITS_PER_WORD 32
-#define UNITS_PER_WORD (BITS_PER_WORD/BITS_PER_UNIT)
-#define LIBGCC2_UNITS_PER_WORD 4
+#ifdef IN_LIBGCC2
+/* We want DImod

[PATCH 5/5] Fortran: Re-enable 128-bit integers for AMD GCN

2021-06-18 Thread Julian Brown

This patch reverts the part of Tobias's patch for PR target/96306 that
disables 128-bit integer support for AMD GCN.

OK for mainline (assuming the previous patches are in first)?

Thanks,

Julian

2021-06-18  Julian Brown  

libgfortran/
PR target/96306
* configure.ac: Remove stanza that removes KIND=16 integers for AMD GCN.
* configure: Regenerate.
---
 libgfortran/configure| 22 --
 libgfortran/configure.ac |  4 
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/libgfortran/configure b/libgfortran/configure
index f3634389cf8..886216f69d4 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6017,7 +6017,7 @@ case "$host" in
 case "$enable_cet" in
   auto)
# Check if target supports multi-byte NOPs
-   # and if assembler supports CET insn.
+   # and if compiler and assembler support CET insn.
cet_save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS -fcf-protection"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -6216,10 +6216,6 @@ fi
 LIBGOMP_CHECKED_INT_KINDS="1 2 4 8 16"
 LIBGOMP_CHECKED_REAL_KINDS="4 8 10 16"
 
-if test "x${target_cpu}" = xamdgcn; then
-  # amdgcn only has limited support for __int128.
-  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8"
-fi
 
 
 
@@ -12731,7 +12727,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12744 "configure"
+#line 12730 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12837,7 +12833,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12850 "configure"
+#line 12836 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15532,16 +15528,6 @@ freebsd* | dragonfly*)
   esac
   ;;
 
-gnu*)
-  version_type=linux
-  need_lib_prefix=no
-  need_version=no
-  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}'
-  soname_spec='${libname}${release}${shared_ext}$major'
-  shlibpath_var=LD_LIBRARY_PATH
-  hardcode_into_libs=yes
-  ;;
-
 haiku*)
   version_type=linux
   need_lib_prefix=no
@@ -15663,7 +15649,7 @@ linux*oldld* | linux*aout* | linux*coff*)
 # project, but have not yet been accepted: they are GCC-local changes
 # for the time being.  (See
 # https://lists.gnu.org/archive/html/libtool-patches/2018-05/msg0.html)
-linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu* | uclinuxfdpiceabi)
   version_type=linux
   need_lib_prefix=no
   need_version=no
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 8961e314d82..523eb24bca1 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -222,10 +222,6 @@ AM_CONDITIONAL(LIBGFOR_MINIMAL, [test "x${target_cpu}" = xnvptx])
 LIBGOMP_CHECKED_INT_KINDS="1 2 4 8 16"
 LIBGOMP_CHECKED_REAL_KINDS="4 8 10 16"
 
-if test "x${target_cpu}" = xamdgcn; then
-  # amdgcn only has limited support for __int128.
-  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8"
-fi
 AC_SUBST(LIBGOMP_CHECKED_INT_KINDS)
 AC_SUBST(LIBGOMP_CHECKED_REAL_KINDS)
 
-- 
2.29.2

Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsidi3/muldi3 expanders

2021-06-18 Thread Andrew Stubbs


On 18/06/2021 15:19, Julian Brown wrote:

This patch improves 64-bit multiplication for AMD GCN: patterns for
unsigned and signed 32x32->64 bit multiplication have been added, and
also 64x64->64 bit multiplication is now open-coded rather than calling
a library function (which may be a win for code size as well as speed:
the function calling sequence isn't particularly concise for GCN).

The mulsi3_highpart pattern has also been extended for GCN5+, since
that ISA version supports high-part result multiply instructions with
SGPR operands.

The DImode multiply implementation is lost from libgcc if we build it
for DImode/TImode rather than SImode/DImode, a change we make in a later
patch in this series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.md (mulsi3_highpart): Add SGPR alternatives for
GCN5+.
(mulsidi3, muldi3): Add expanders.
---
  gcc/config/gcn/gcn.md | 55 ++-
  1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index b5f895a93e2..70655ca4b8b 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1392,19 +1392,62 @@
  (define_code_attr e [(sign_extend "e") (zero_extend "")])
  
  (define_insn "mulsi3_highpart"

-  [(set (match_operand:SI 0 "register_operand"  "= v")
+  [(set (match_operand:SI 0 "register_operand"   "=Sg, Sg,  v")
(truncate:SI
  (lshiftrt:DI
(mult:DI
  (any_extend:DI
-   (match_operand:SI 1 "register_operand" "% v"))
+   (match_operand:SI 1 "register_operand" "%SgA,SgA,  v"))
  (any_extend:DI
-   (match_operand:SI 2 "register_operand" "vSv")))
+   (match_operand:SI 2 "register_operand"  "SgA,  B,vSv")))
(const_int 32]
""
-  "v_mul_hi0\t%0, %2, %1"
-  [(set_attr "type" "vop3a")
-   (set_attr "length" "8")])
+  "@
+  s_mul_hi0\t%0, %1, %2
+  s_mul_hi0\t%0, %1, %2
+  v_mul_hi0\t%0, %2, %1"
+  [(set_attr "type" "sop2,sop2,vop3a")
+   (set_attr "length" "4,8,8")
+   (set_attr "gcn_version" "gcn5,gcn5,*")])
+
+(define_expand "mulsidi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI
+ (any_extend:DI (match_operand:SI 1 "register_operand" ""))
+ (any_extend:DI (match_operand:SI 2 "register_operand" ""]
+  ""
+  {
+rtx dst = gen_reg_rtx (DImode);
+rtx dstlo = gen_lowpart (SImode, dst);
+rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2]));
+emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2]));
+emit_move_insn (operands[0], dst);
+DONE;
+  })
+
+(define_expand "muldi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI (match_operand:DI 1 "register_operand" "")
+(match_operand:DI 2 "register_operand" "")))]
+  ""
+  {
+rtx tmp0 = gen_reg_rtx (SImode);
+rtx tmp1 = gen_reg_rtx (SImode);
+rtx dst = gen_reg_rtx (DImode);
+rtx dsthi = gen_highpart_mode (SImode, DImode, dst);
+rtx op1lo = gen_lowpart (SImode, operands[1]);
+rtx op1hi = gen_highpart_mode (SImode, DImode, operands[1]);
+rtx op2lo = gen_lowpart (SImode, operands[2]);
+rtx op2hi = gen_highpart_mode (SImode, DImode, operands[2]);
+emit_insn (gen_umulsidi3 (dst, op1lo, op2lo));
+emit_insn (gen_mulsi3 (tmp0, op1lo, op2hi));
+emit_insn (gen_addsi3 (dsthi, dsthi, tmp0));
+emit_insn (gen_mulsi3 (tmp1, op1hi, op2lo));
+emit_insn (gen_addsi3 (dsthi, dsthi, tmp1));
+emit_move_insn (operands[0], dst);
+DONE;
+  })
  
  (define_insn "mulhisi3"

[(set (match_operand:SI 0 "register_operand"  "=v")



Most of the rest of the backend expands 64-bit operations to 32-bit 
pairs much later, using define_insn_and_split, because there were lots 
of issues with splitting it early. I don't recall exactly what right 
now, unfortunately. (It might have been related to spilling only half 
the value to the stack?) It also makes it hard to debug, I think.


Andrew

Re: [PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation

2021-06-18 Thread Andrew Stubbs


On 18/06/2021 15:19, Julian Brown wrote:

[...]



OK.

Andrew

Re: [PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN

2021-06-18 Thread Andrew Stubbs


On 18/06/2021 15:19, Julian Brown wrote:

This patch enables support for TImode for AMD GCN, the lack of which
is currently causing a number of test failures for the target and which
is also needed to support "omp_depend_kind" for OpenMP 5.0, since that
is implemented as a 128-bit integer.

Several libgcc support routines are built by default for the "word size"
of a machine, and also for "2 * word size" of the machine.  The libgcc
build for AMD GCN is changed so that it builds for a "word size" of 64
bits, in order to better match the (64-bit) host compiler.  However it
isn't really true that we have 64-bit words -- GCN has 32-bit registers,
so changing UNITS_PER_WORD unconditionally would be the wrong thing to do.

Changing this setting for libgcc (only) means that support routines
are built for "single word" operations that are DImode (64 bits), and
those for "double word" operations are built for TImode (128 bits).
That leaves some gaps regarding previous operations that were built
for a "single word" size of 32 bits and a "double word" size of 64 bits
(generic code doesn't cover both alternatives for all operations that
might be needed).  Those gaps are filled in by this patch, or by the
preceding patches in the series.

I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.

Thanks,

Julian

2021-06-18  Julian Brown  

gcc/
* config/gcn/gcn.c (gcn_init_libfuncs): New function.
(TARGET_INIT_LIBFUNCS): Define target hook using above function.
* config/gcn/gcn.h (UNITS_PER_WORD): Define to 8 for IN_LIBGCC2, 4
otherwise.
(LIBGCC2_UNITS_PER_WORD, BITS_PER_WORD): Remove definitions.
(MAX_FIXED_MODE_SIZE): Change to 128.

libgcc/
* config/gcn/lib2-bswapti2.c: New file.
* config/gcn/lib2-divmod-di.c: New file.
* config/gcn/lib2-gcn.h (DItype, UDItype, TItype, UTItype): Add
typedefs.
(__divdi3, __moddi3, __udivdi3, __umoddi3): Add prototypes.
* config/gcn/t-amdgcn (LIB2ADD): Add lib2-divmod-di.c and
lib2-bswapti2.c.
---
  gcc/config/gcn/gcn.c   | 30 +++
  gcc/config/gcn/gcn.h   | 11 ---
  libgcc/config/gcn/lib2-bswapti2.c  | 47 ++
  libgcc/config/gcn/lib2-divmod-di.c | 35 ++
  libgcc/config/gcn/lib2-gcn.h   |  8 +
  libgcc/config/gcn/t-amdgcn |  2 ++
  6 files changed, 129 insertions(+), 4 deletions(-)
  create mode 100644 libgcc/config/gcn/lib2-bswapti2.c
  create mode 100644 libgcc/config/gcn/lib2-divmod-di.c

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91fe50a..45f37d5310d 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3610,6 +3610,34 @@ gcn_init_builtins (void)
  #endif
  }
  
+/* Implement TARGET_INIT_LIBFUNCS.  */

+
+static void
+gcn_init_libfuncs (void)
+{
+  /* BITS_PER_WORD * 2 is 64 bits, which causes
+ optabs-libfuncs.c:gen_int_libfunc to omit TImode (i.e. 128 bits)
+ libcalls that we need to support operations for that type.  Initialise
+ them here instead.  */
+  set_optab_libfunc (udiv_optab, TImode, "__udivti3");
+  set_optab_libfunc (umod_optab, TImode, "__umodti3");
+  set_optab_libfunc (sdiv_optab, TImode, "__divti3");
+  set_optab_libfunc (smod_optab, TImode, "__modti3");
+  set_optab_libfunc (smul_optab, TImode, "__multi3");
+  set_optab_libfunc (addv_optab, TImode, "__addvti3");
+  set_optab_libfunc (subv_optab, TImode, "__subvti3");
+  set_optab_libfunc (negv_optab, TImode, "__negvti2");
+  set_optab_libfunc (absv_optab, TImode, "__absvti2");
+  set_optab_libfunc (smulv_optab, TImode, "__mulvti3");
+  set_optab_libfunc (ffs_optab, TImode, "__ffsti2");
+  set_optab_libfunc (clz_optab, TImode, "__clzti2");
+  set_optab_libfunc (ctz_optab, TImode, "__ctzti2");
+  set_optab_libfunc (clrsb_optab, TImode, "__clrsbti2");
+  set_optab_libfunc (popcount_optab, TImode, "__popcountti2");
+  set_optab_libfunc (parity_optab, TImode, "__parityti2");
+  set_optab_libfunc (bswap_optab, TImode, "__bswapti2");
+}
+
  /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
 not require taking the address of any object, other than the memory
 cell being operated on.
@@ -6336,6 +6364,8 @@ gcn_dwarf_register_span (rtx rtl)
  #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
  #undef  TARGET_INIT_BUILTINS
  #define TARGET_INIT_BUILTINS gcn_init_builtins
+#undef  TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS gcn_init_libfuncs
  #undef  TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
  #define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \
gcn_ira_change_pseudo_allocno_class
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index eba4646f1bf..540835b81cc 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -46,9 +46,12 @@
  #define BYTES_BIG_ENDIAN 0
  #define WORDS_BIG_ENDIAN 0
  
-#define BITS_PER_WORD 32

-#define UNITS_PER_WORD (BITS_PER_WORD/BITS_PER_UNIT)
-

Re: [PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return

2021-06-18 Thread Andrew Stubbs


On 18/06/2021 15:19, Julian Brown wrote:

[...]


OK, this seems to match what some other targets have. Except NIOS2 
though, which is probably where this file was copied from.


Andrew

Help,Guidance required-Reg.

2021-06-18 Thread sunderchand via Fortran

Dear Sir,

Please let me know where I can find documentation on GFortran syntax,
keywords/reserved words and graphics, so that I can develop menu examples.
I am new to GFortran and also to the NetBeans IDE.

How can I get help and guidance on the GFortran language from seers,
pioneers and seniors?

Please let me know frankly if there is any error in this mail message.

Please let me know whether any references or user manuals are available.

Thanks

Regards,
D.SundarChand

[Patch, committed] PR fortran/101123 - [11/12 Regression] Invalid code for MAX0 with -fdefault-integer-8

2021-06-18 Thread Harald Anlauf via Fortran

As confirmed in the PR by Jakub, there was a bad conversion of the result
of min0/max0 to the result type.  We should just unconditionally convert
in all cases.

As a benefit, this also fixes pr100283.

Committed after regtesting.

Thanks,
Harald

Fortran - fix conversion to result type for the min/max intrinsic

gcc/fortran/ChangeLog:

PR fortran/100283
PR fortran/101123
* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Unconditionally
convert result of min/max to result type.

gcc/testsuite/ChangeLog:

PR fortran/100283
PR fortran/101123
* gfortran.dg/min0_max0_1.f90: New test.
* gfortran.dg/min0_max0_2.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 73b0bcc9dea..e578449995a 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -4147,10 +4147,7 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 			build_empty_stmt (input_location));
   gfc_add_expr_to_block (&se->pre, tmp);
 }
-  if (TREE_CODE (type) == INTEGER_TYPE)
-se->expr = fold_build1_loc (input_location, FIX_TRUNC_EXPR, type, mvar);
-  else
-se->expr = convert (type, mvar);
+  se->expr = convert (type, mvar);
 }


diff --git a/gcc/testsuite/gfortran.dg/min0_max0_1.f90 b/gcc/testsuite/gfortran.dg/min0_max0_1.f90
new file mode 100644
index 000..118b0f03b52
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min0_max0_1.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-std=gnu" }
+! PR fortran/100283
+
+subroutine s ()
+  integer(kind=8) :: i,j,k
+  i = min0 (j,k)
+  i = max0 (-127_8, min0 (j,127_8))
+end subroutine s
diff --git a/gcc/testsuite/gfortran.dg/min0_max0_2.f90 b/gcc/testsuite/gfortran.dg/min0_max0_2.f90
new file mode 100644
index 000..3fe4fcd3609
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min0_max0_2.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-fdefault-integer-8 -std=gnu" }
+! PR fortran/101123
+
+SUBROUTINE TEST
+  IMPLICIT INTEGER*4 (I-N)
+  MAXMN=MAX0(M,N)
+  MINMN=MIN0(M,0_4)
+  MAXRS=MAX1(R,S)
+END SUBROUTINE TEST

PING [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-06-18 Thread Harald Anlauf via Fortran

*PING*

> Sent: Wednesday, 09 June 2021 at 23:39
> From: "Harald Anlauf"
> To: "fortran" , "gcc-patches"
> Subject: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514
>
> Dear Fortranners,
>
> we should be able to simplify the length of a substring with known
> constant bounds.  The attached patch adds this.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for mainline?  Since this should be rather safe, to at least 11-branch?
>
> Thanks,
> Harald
>
>
> Fortran - simplify length of substring with constant bounds
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/100950
>   * simplify.c (substring_has_constant_len): New.
>   (gfc_simplify_len): Handle case of substrings with constant
>   bounds.
>
> gcc/testsuite/ChangeLog:
>
>   PR fortran/100950
>   * gfortran.dg/pr100950.f90: New test.
>
>
