Re: [PATCH] RISC-V: Enable builtin __riscv_mul with Zmmul extension.

2024-10-10 Thread Tsung Chun Lin
Hi Jeff,

Thanks for reviewing, and sorry that the missing testsuite update in
my patch caused the regression failures.

I will re-submit the patches with the updated testsuite.

Tsung chun

Patrick O'Neill wrote on Thursday, 10 October 2024 at 7:35 AM:
>
>
> On 10/9/24 14:50, Jeff Law wrote:
> >
> >
> > On 10/9/24 3:21 PM, Patrick O'Neill wrote:
> >>
> >> On 10/9/24 14:07, Jeff Law wrote:
> >>> 
> >>>
> >>> Also note that if you use the tag "[RISC-V]" in your subject line
> >>> your patch will be automatically picked up by a pre-commit tester
> >>> that can be subsequently examined to verify behavior.
> >>>
> >> This patch's subject line looks good to me. It would've been picked
> >> up as-is since it mentions riscv/risc-v.
> >>
> >> The patch doesn't show up in patchworks, so that's what stopped the
> >> risc-v pre-commit from finding it.
> >>
> >> Sadly I don't have much insight into what stopped patchworks from
> >> seeing it. :-/
> > I'd assumed it wasn't [RISC-V], but you know that aspect better than I
> > :-)
> >
> That's a safe first guess :)
> The flow for precommit gets new patches from the Patchworks API, so if
> it isn't in patchworks then precommit won't see it.
> We have patchworks to handle parsing emails/extracting patches for us :)
>
>  From poking around the patchworks source code my new best guess is that
> the Content-Type header of the attachment in the original email threw it
> off:
>
> --79e1d00623f13532
> Content-Type: application/octet-stream;
>   name="0001-RISC-V-Enable-builtin-__riscv_mul-with-Zmmul-extensi.patch"
>
> Seems like patchworks ignores all attachments that aren't `*/x-patch`,
> `*/x-diff`, `text/*`?
> https://github.com/getpatchwork/patchwork/blob/4dfe6991a7bcdb11fd878a087aba314e9fdaa2db/patchwork/parser.py#L686
> https://github.com/getpatchwork/patchwork/blob/4dfe6991a7bcdb11fd878a087aba314e9fdaa2db/patchwork/parser.py#L639
>
> Patrick
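
For reference, the check in the linked parser.py amounts to roughly the
following predicate (a C++ paraphrase of the Python logic, added here for
illustration; the names are ours, not Patchwork's):

#include <string>

// An attachment only counts as a patch candidate when its Content-Type
// is */x-patch, */x-diff, or text/*.  The application/octet-stream type
// from the original mail matches none of these.
static bool
patch_candidate (const std::string &ct)
{
  auto ends_with = [&] (const std::string &s)
    {
      return ct.size () >= s.size ()
             && ct.compare (ct.size () - s.size (), s.size (), s) == 0;
    };
  return ends_with ("/x-patch") || ends_with ("/x-diff")
         || ct.rfind ("text/", 0) == 0;
}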


[PATCH] This is a test, please ignore

2024-10-10 Thread Christophe Lyon
This is a test patch, please ignore.

---
ci-tag: skip
---
 README | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README b/README
index be15bc2b44e..7a3d7cfeb74 100644
--- a/README
+++ b/README
@@ -1,3 +1,5 @@
+THIS IS A TEST -- IGNORE
+
 This directory contains the GNU Compiler Collection (GCC).
 
 The GNU Compiler Collection is free software.  See the files whose
-- 
2.34.1



[PATCH 1/2] libstdc++: Enable memcpy optimizations for distinct integral types [PR93059]

2024-10-10 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

Currently we only optimize std::copy, std::copy_n etc. to memmove when
the source and destination types are the same. This means that we fail
to optimize copying between distinct 1-byte types, e.g. copying from a
buffer of std::byte to a buffer of unsigned char.

This patch adds more partial specializations of the __memcpyable trait
so that we allow memcpy between integers of equal widths. This will
enable memmove for copying std::byte to unsigned char, int to unsigned,
and long to long long (for I32LP64) or long to int (for ILP32).

Enabling the optimization needs to be based on the width of the integer
type, not just the size in bytes. This is because some targets define
non-standard integral types such as __int20 in msp430, which has padding
bits. It would not be safe to memcpy between e.g. __int20 and int32_t,
even though sizeof(__int20) == sizeof(int32_t). A new trait is
introduced to define the width, __memcpyable_integer, and then the
__memcpyable trait compares the widths.

It's safe to copy between signed and unsigned integers of the same
width, because GCC only supports two's complement integers.

We can also add the specialization __memcpyable_integer<byte> to enable
copying between narrow character types and std::byte.
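
For illustration (an editorial sketch, not part of the patch), a copy
like the following can now lower to a single memmove, because int and
unsigned have equal width and two's complement makes the conversion
value-preserving:

#include <algorithm>
#include <cstddef>

// With the new __memcpyable partial specialization, this element-wise
// copy between distinct but equal-width integer types can use
// __builtin_memmove instead of a loop.
void
to_unsigned (unsigned *dst, const int *src, std::size_t n)
{
  std::copy (src, src + n, dst);
}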

libstdc++-v3/ChangeLog:

PR libstdc++/93059
* include/bits/cpp_type_traits.h (__memcpyable): Add partial
specialization for pointers to distinct types.
(__memcpyable_integer): New trait to control which types can use
cross-type memcpy optimizations.
---
 libstdc++-v3/include/bits/cpp_type_traits.h | 88 -
 1 file changed, 85 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 19bf1edf647..8d386a36e62 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -414,7 +414,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   typedef __true_type __type;
 };
 
-#if __cplusplus >= 201703L
+#ifdef __glibcxx_byte // C++ >= 17
   enum class byte : unsigned char;
 
   template<>
@@ -434,8 +434,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 };
 #endif
 
-  template<typename> struct iterator_traits;
-
   // A type that is safe for use with memcpy, memmove, memcmp etc.
   template<typename _Tp>
 struct __is_nonvolatile_trivially_copyable
@@ -459,16 +457,100 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   enum { __value = 0 };
 };
 
+  // Allow memcpy when source and destination are pointers to the same type.
   template<typename _Tp>
 struct __memcpyable<_Tp*, _Tp*>
 : __is_nonvolatile_trivially_copyable<_Tp>
 { };
 
+  // Source pointer can be const.
   template<typename _Tp>
 struct __memcpyable<_Tp*, const _Tp*>
 : __is_nonvolatile_trivially_copyable<_Tp>
 { };
 
  template<typename _Tp> struct __memcpyable_integer;
+
+  // For heterogeneous types, allow memcpy between equal-sized integers.
+  template<typename _Tp, typename _Up>
+struct __memcpyable<_Tp*, _Up*>
+{
+  enum {
+   __value = __memcpyable_integer<_Tp>::__width != 0
+   && ((int)__memcpyable_integer<_Tp>::__width
+ == (int)__memcpyable_integer<_Up>::__width)
+  };
+};
+
+  // Specialization for const U* because __is_integer is never true.
+  template<typename _Tp, typename _Up>
+struct __memcpyable<_Tp*, const _Up*>
+: __memcpyable<_Tp*, _Up*>
+{ };
+
+  template<typename _Tp>
+struct __memcpyable_integer
+{
+  enum {
+   __width = __is_integer<_Tp>::__value ? (sizeof(_Tp) * __CHAR_BIT__) : 0
+  };
+};
+
+  // Cannot memcpy volatile memory.
+  template<typename _Tp>
+struct __memcpyable_integer<volatile _Tp>
+{ enum { __width = 0 }; };
+
+#ifdef __glibcxx_byte // C++ >= 17
+  // std::byte is not an integer, but is safe to memcpy to/from char.
+  template<>
+struct __memcpyable_integer<byte>
+{ enum { __width = __CHAR_BIT__ }; };
+#endif
+
+  // Specializations for __intNN types with padding bits.
+#if defined __GLIBCXX_TYPE_INT_N_0 && __GLIBCXX_BITSIZE_INT_N_0 % __CHAR_BIT__
+  template<>
+struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_0>
+{ enum { __width = __GLIBCXX_BITSIZE_INT_N_0 }; };
+  template<>
+struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_0>
+{ enum { __width = __GLIBCXX_BITSIZE_INT_N_0 }; };
+#endif
+#if defined __GLIBCXX_TYPE_INT_N_1 && __GLIBCXX_BITSIZE_INT_N_1 % __CHAR_BIT__
+  template<>
+struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_1>
+{ enum { __width = __GLIBCXX_BITSIZE_INT_N_1 }; };
+  template<>
+struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_1>
+{ enum { __width = __GLIBCXX_BITSIZE_INT_N_1 }; };
+#endif
+#if defined __GLIBCXX_TYPE_INT_N_2 && __GLIBCXX_BITSIZE_INT_N_2 % __CHAR_BIT__
+  template<>
+struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_2>
+{ enum { __width = __GLIBCXX_BITSIZE_INT_N_2 }; };
+  template<>
+struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_2>
+{ enum { __width = __GLIBCXX_BITSIZE_INT_N_2 }; };
+#endif
+#if defined __GLIBCXX_TYPE_INT_N_3 && __GLIBCXX_BITSIZE_INT_N_3 % __CHAR_BIT__
+  template<>
+s

[PATCH 2/2] libstdc++: Enable memset optimizations for distinct character types [PR93059]

2024-10-10 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

Currently we only optimize std::fill to memset when the source and
destination types are the same byte-sized type. This means that we fail
to optimize when the fill character is another integer (e.g. a literal
int value), even though assigning an int to a char would produce the
same value as memset would (after converting the fill value to unsigned
char).

This patch enables the optimized code path when the fill character is a
memcpy-able integer (using the new __memcpyable_integer trait).
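
For illustration (an editorial sketch, not part of the patch): here the
fill value is an int literal and the destination a char buffer, so the
assignment char = int is well-formed and the memset path can now be
taken directly:

#include <algorithm>
#include <cstddef>

// Previously only an exact type match reached the memset specialization;
// with __memcpyable_integer the int fill value qualifies as well.
void
fill_ones (char *buf, std::size_t n)
{
  std::fill (buf, buf + n, 1);
}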

libstdc++-v3/ChangeLog:

PR libstdc++/93059
* include/bits/stl_algobase.h (__fill_a1(T*, T*, const T&)):
Change template parameters and enable_if condition to allow the
fill value to be an integer or std::byte.
---
 libstdc++-v3/include/bits/stl_algobase.h | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 9e92211c124..dacbeaf5f64 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -967,23 +967,26 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #pragma GCC diagnostic pop
 
   // Specialization: for char types we can use memset.
-  template<typename _Tp>
+  template<typename _Up, typename _Tp>
 _GLIBCXX20_CONSTEXPR
 inline typename
-__gnu_cxx::__enable_if<__is_byte<_Tp>::__value, void>::__type
-__fill_a1(_Tp* __first, _Tp* __last, const _Tp& __c)
+__gnu_cxx::__enable_if<__is_byte<_Up>::__value
+&& __memcpyable_integer<_Tp>::__value,
+  void>::__type
+__fill_a1(_Up* __first, _Up* __last, const _Tp& __x)
 {
-  const _Tp __tmp = __c;
+  // This hoists the load out of the loop and also ensures that we don't
+  // use memset for cases where the assignment would be ill-formed.
+  const _Up __val = __x;
 #if __cpp_lib_is_constant_evaluated
   if (std::is_constant_evaluated())
{
  for (; __first != __last; ++__first)
-   *__first = __tmp;
- return;
+   *__first = __val;
}
 #endif
   if (const size_t __len = __last - __first)
-   __builtin_memset(__first, static_cast<unsigned char>(__tmp), __len);
+   __builtin_memset(__first, static_cast<unsigned char>(__val), __len);
 }
 
   template
-- 
2.46.2



[PATCH] libstdc++: Rearrange std::move_iterator helpers in stl_iterator.h

2024-10-10 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

The __niter_base(move_iterator) overload and __is_move_iterator trait
were originally immediately after the definition of move_iterator. The
addition of C++20 features after move_iterator meant that those helpers
were no longer anywhere near move_iterator.

This change puts them back where they used to be, before all the new
C++20 additions.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (__niter_base(move_iterator))
(__is_move_iterator, __miter_base, _GLIBCXX_MAKE_MOVE_ITERATOR)
(_GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR): Move earlier in the
file.
---
 libstdc++-v3/include/bits/stl_iterator.h | 63 
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 20c0319f3a7..28a600c81cb 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1349,9 +1349,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_NOEXCEPT_IF(std::is_nothrow_copy_constructible<_Iterator>::value)
 { return __it.base(); }
 
-#if __cplusplus >= 201103L
-
-#if __cplusplus <= 201703L
+#if __cplusplus >= 201103L && __cplusplus <= 201703L
   // Need to overload __to_address because the pointer_traits primary template
   // will deduce element_type of __normal_iterator as T* rather than T.
   template
@@ -1362,6 +1360,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return std::__to_address(__it.base()); }
 #endif
 
+#if __cplusplus >= 201103L
   /**
* @addtogroup iterators
* @{
@@ -1821,6 +1820,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __make_move_if_noexcept_iterator(_Tp* __i)
 { return _ReturnType(__i); }
 
+  template<typename _Iterator>
+_GLIBCXX20_CONSTEXPR
+auto
+__niter_base(move_iterator<_Iterator> __it)
+-> decltype(make_move_iterator(__niter_base(__it.base())))
+{ return make_move_iterator(__niter_base(__it.base())); }
+
+  template<typename _Iterator>
+struct __is_move_iterator<move_iterator<_Iterator> >
+{
+  enum { __value = 1 };
+  typedef __true_type __type;
+};
+
+  template<typename _Iterator>
+_GLIBCXX20_CONSTEXPR
+auto
+__miter_base(move_iterator<_Iterator> __it)
+-> decltype(__miter_base(__it.base()))
+{ return __miter_base(__it.base()); }
+
+#define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter)
+#define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) \
+  std::__make_move_if_noexcept_iterator(_Iter)
+#else
+#define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) (_Iter)
+#define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) (_Iter)
+#endif // C++11
+
 #if __cplusplus > 201703L && __glibcxx_concepts
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 3736.  move_iterator missing disable_sized_sentinel_for specialization
@@ -2957,35 +2985,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @} group iterators
 
-  template<typename _Iterator>
-_GLIBCXX20_CONSTEXPR
-auto
-__niter_base(move_iterator<_Iterator> __it)
--> decltype(make_move_iterator(__niter_base(__it.base())))
-{ return make_move_iterator(__niter_base(__it.base())); }
-
-  template<typename _Iterator>
-struct __is_move_iterator<move_iterator<_Iterator> >
-{
-  enum { __value = 1 };
-  typedef __true_type __type;
-};
-
-  template<typename _Iterator>
-_GLIBCXX20_CONSTEXPR
-auto
-__miter_base(move_iterator<_Iterator> __it)
--> decltype(__miter_base(__it.base()))
-{ return __miter_base(__it.base()); }
-
-#define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter)
-#define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) \
-  std::__make_move_if_noexcept_iterator(_Iter)
-#else
-#define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) (_Iter)
-#define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) (_Iter)
-#endif // C++11
-
 #if __cpp_deduction_guides >= 201606
   // These helper traits are used for deduction guides
   // of associative containers.
-- 
2.46.2



Re: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions

2024-10-10 Thread Richard Biener
On Thu, 10 Oct 2024, Richard Sandiford wrote:

> Jennifer Schmitz  writes:
> > This patch implements the optabs reduc_and_scal_<mode>,
> > reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
> > V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise logical
> > vector reduction operations.
> > Previously, either only vector registers or only general purpose
> > registers (GPR) were used. Now, vector registers are used for the
> > reduction from 128 to 64 bits; 64-bit GPR are used for the reduction
> > from 64 to 32 bits; and 32-bit GPR are used for the rest of the
> > reduction steps.
> >
> > For example, the test case (V8HI)
> > int16_t foo (int16_t *a)
> > {
> >   int16_t b = -1;
> >   for (int i = 0; i < 8; ++i)
> > b &= a[i];
> >   return b;
> > }
> >
> > was previously compiled to (-O2):
> > foo:
> > ldr q0, [x0]
> > movi v30.4s, 0
> > ext v29.16b, v0.16b, v30.16b, #8
> > and v29.16b, v29.16b, v0.16b
> > ext v31.16b, v29.16b, v30.16b, #4
> > and v31.16b, v31.16b, v29.16b
> > ext v30.16b, v31.16b, v30.16b, #2
> > and v30.16b, v30.16b, v31.16b
> > umov w0, v30.h[0]
> > ret
> >
> > With patch, it is compiled to:
> > foo:
> > ldr q31, [x0]
> > ext v30.16b, v31.16b, v31.16b, #8
> > and v31.8b, v30.8b, v31.8b
> > fmov x0, d31
> > and x0, x0, x0, lsr 32
> > and w0, w0, w0, lsr 16
> > ret
> >
> > For modes V4SI and V2DI, the pattern was not implemented, because the
> > current codegen (using only base instructions) is already efficient.
> >
> > Note that the PR initially suggested to use SVE reduction ops. However,
> > they have higher latency than the proposed sequence, which is why using
> > neon and base instructions is preferable.
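
In scalar terms, the intended halving strategy for a V8HI AND reduction
is roughly the following (an editorial sketch matching the sequence
above, not part of the patch):

#include <cstdint>

// One step in vector registers (EXT + AND), then 64-bit and 32-bit GPR steps.
static int16_t
and_reduce_v8hi (uint64_t lo_half, uint64_t hi_half)
{
  uint64_t x = lo_half & hi_half;   // 128 -> 64 bits (vector AND)
  x &= x >> 32;                     // 64 -> 32 bits (and x0, x0, x0, lsr 32)
  uint32_t w = (uint32_t) x;
  w &= w >> 16;                     // 32 -> 16 bits (and w0, w0, w0, lsr 16)
  return (int16_t) w;
}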
> >
> > Test cases were added for 8/16-bit integers for all implemented modes
> > and all three operations to check the produced assembly.
> >
> > We also added [istarget aarch64*-*-*] to the selector vect_logical_reduc,
> > because for aarch64 vector types, either the logical reduction optabs are
> > implemented or the codegen for reduction operations is good as it is.
> > This was motivated by failure of a scan-tree-dump directive in the
> > test cases gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.
> >
> > The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> > regression.
> > OK for mainline?
> >
> > Signed-off-by: Jennifer Schmitz 
> >
> > gcc/
> > PR target/113816
> > * config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
> > Implement for logical bitwise operations for VDQV_E.
> >
> > gcc/testsuite/
> > PR target/113816
> > * lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
> > * gcc.target/aarch64/simd/logical_reduc.c: New test.
> > * gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
> > ---
> >  gcc/config/aarch64/aarch64-simd.md|  55 +
> >  .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++
> >  .../gcc.target/aarch64/vect-reduc-or_1.c  |   2 +-
> >  gcc/testsuite/lib/target-supports.exp |   4 +-
> >  4 files changed, 267 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md 
> > b/gcc/config/aarch64/aarch64-simd.md
> > index 23c03a96371..00286b8b020 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3608,6 +3608,61 @@
> >}
> >  )
> >  
> > +;; Emit a sequence for bitwise logical reductions over vectors for V8QI,
> > +;; V16QI, V4HI, and V8HI modes.  The reduction is achieved by iteratively
> > +;; operating on the two halves of the input.
> > +;; If the input has 128 bits, the first operation is performed in vector
> > +;; registers.  From 64 bits down, the reduction steps are performed in
> > +;; general purpose registers.
> > +;; For example, for V8HI and operation AND, the intended sequence is:
> > +;; EXT  v1.16b, v0.16b, v0.16b, #8
> > +;; AND  v0.8b, v1.8b, v0.8b
> > +;; FMOV x0, d0
> > +;; AND  x0, x0, x0, lsr 32
> > +;; AND  w0, w0, w0, lsr 16
> > +;;
> > +;; For V8QI and operation AND, the sequence is:
> > +;; AND  x0, x0, x0, lsr 32
> > +;; AND  w0, w0, w0, lsr 16
> > +;; AND  w0, w0, w0, lsr 8
> > +
> > +(define_expand "reduc_<optab>_scal_<mode>"
> > + [(match_operand:<VEL> 0 "register_operand")
> > +  (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
> > +  "TARGET_SIMD"
> > +  {
> > +rtx dst = operands[1];
> > +rtx tdi = gen_reg_rtx (DImode);
> > +rtx tsi = lowpart_subreg (SImode, tdi, DImode);
> > +rtx op1_lo;
> > +if (known_eq (GET_MODE_SIZE (<MODE>mode), 16))
> > +  {
> > +   rtx t0 = gen_reg_rtx (mode);
> > +   rtx t1 = gen_reg_rtx (DImode);
> > +   rtx t2 = gen_reg_rtx (DImode);
> > +   rtx idx = GEN_INT (8 / GET_MODE_UNIT_SIZE (mode));
> > +   emi

[committed] libstdc++: Fix some test failures with -fno-char8_t

2024-10-10 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/20_util/duration/io.cc [!__cpp_lib_char8_t]: Define
char8_t as a typedef for unsigned char.
* testsuite/std/format/parse_ctx_neg.cc: Skip for -fno-char8_t.
---
 libstdc++-v3/testsuite/20_util/duration/io.cc  | 10 --
 libstdc++-v3/testsuite/std/format/parse_ctx_neg.cc |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/duration/io.cc 
b/libstdc++-v3/testsuite/20_util/duration/io.cc
index 383fb60afe2..0117673dbdc 100644
--- a/libstdc++-v3/testsuite/20_util/duration/io.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/io.cc
@@ -5,6 +5,10 @@
 #include 
 #include 
 
+#ifndef __cpp_lib_char8_t
+using char8_t = unsigned char; // Prevent errors if -fno-char8_t is used.
+#endif
+
 void
 test01()
 {
@@ -173,12 +177,14 @@ test_format()
 
 #if __cplusplus > 202002L
   static_assert( ! std::formattable, char> );
-  static_assert( ! std::formattable, char> );
   static_assert( ! std::formattable, char> );
   static_assert( ! std::formattable, char> );
-  static_assert( ! std::formattable, wchar_t> );
   static_assert( ! std::formattable, wchar_t> 
);
   static_assert( ! std::formattable, wchar_t> 
);
+#ifdef __cpp_lib_char8_t
+  static_assert( ! std::formattable, char> );
+  static_assert( ! std::formattable, wchar_t> );
+#endif
 #endif
 }
 
diff --git a/libstdc++-v3/testsuite/std/format/parse_ctx_neg.cc 
b/libstdc++-v3/testsuite/std/format/parse_ctx_neg.cc
index d6a4366d7d0..f19107c886f 100644
--- a/libstdc++-v3/testsuite/std/format/parse_ctx_neg.cc
+++ b/libstdc++-v3/testsuite/std/format/parse_ctx_neg.cc
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++26 } }
+// { dg-skip-if "" { *-*-* } { "-fno-char8_t" } }
 
 #include 
 
-- 
2.46.2



[PATCH v1 1/4] Match: Support form 1 for vector signed integer SAT_SUB

2024-10-10 Thread pan2 . li
From: Pan Li 

This patch adds support for form 1 of the vector signed
integer SAT_SUB, as in the example below:

Form 1:
  #define DEF_VEC_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
  void __attribute__((noinline))   \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
T minus = (UT)x - (UT)y;   \
out[i] = (x ^ y) >= 0  \
  ? minus  \
  : (minus ^ x) >= 0   \
? minus\
: x < 0 ? MIN : MAX;   \
  }\
  }

DEF_VEC_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)
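
The branch structure works because signed subtraction can only overflow
when the operands' signs differ, and the overflow is detected when the
wrapped result's sign differs from x's.  As a scalar sketch (editorial
illustration, not part of the patch):

#include <cstdint>

static int8_t
sat_sub_s8 (int8_t x, int8_t y)
{
  int8_t minus = (int8_t) ((uint8_t) x - (uint8_t) y); // wrapping subtract
  if ((x ^ y) >= 0)       // same sign: overflow is impossible
    return minus;
  if ((minus ^ x) >= 0)   // result kept x's sign: no overflow occurred
    return minus;
  return x < 0 ? INT8_MIN : INT8_MAX; // saturate toward x's sign
}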

Before this patch:
  91   │   _108 = .SELECT_VL (ivtmp_106, POLY_INT_CST [16, 16]);
  92   │   vect_x_16.11_80 = .MASK_LEN_LOAD (vectp_op_1.9_78, 8B, { -1, ... }, 
_108, 0);
  93   │   _69 = vect_x_16.11_80 >> 7;
  94   │   vect_x.12_81 = VIEW_CONVERT_EXPR(vect_x_16.11_80);
  95   │   vect_y_18.15_85 = .MASK_LEN_LOAD (vectp_op_2.13_83, 8B, { -1, ... }, 
_108, 0);
  96   │   vect__7.21_91 = vect_x_16.11_80 ^ vect_y_18.15_85;
  97   │   mask__44.22_92 = vect__7.21_91 < { 0, ... };
  98   │   vect_y.16_86 = VIEW_CONVERT_EXPR(vect_y_18.15_85);
  99   │   vect__6.17_87 = vect_x.12_81 - vect_y.16_86;
 100   │   vect_minus_19.18_88 = VIEW_CONVERT_EXPR(vect__6.17_87);
 101   │   vect__8.19_89 = vect_x_16.11_80 ^ vect_minus_19.18_88;
 102   │   mask__42.20_90 = vect__8.19_89 < { 0, ... };
 103   │   mask__41.23_93 = mask__42.20_90 & mask__44.22_92;
 104   │   _4 = .COND_XOR (mask__41.23_93, _69, { 127, ... }, 
vect_minus_19.18_88);
 105   │   .MASK_LEN_STORE (vectp_out.31_102, 8B, { -1, ... }, _108, 0, _4);
 106   │   vectp_op_1.9_79 = vectp_op_1.9_78 + _108;
 107   │   vectp_op_2.13_84 = vectp_op_2.13_83 + _108;
 108   │   vectp_out.31_103 = vectp_out.31_102 + _108;
 109   │   ivtmp_107 = ivtmp_106 - _108;

After this patch:
  81   │   _102 = .SELECT_VL (ivtmp_100, POLY_INT_CST [16, 16]);
  82   │   vect_x_16.11_89 = .MASK_LEN_LOAD (vectp_op_1.9_87, 8B, { -1, ... }, 
_102, 0);
  83   │   vect_y_18.14_93 = .MASK_LEN_LOAD (vectp_op_2.12_91, 8B, { -1, ... }, 
_102, 0);
  84   │   vect_patt_38.15_94 = .SAT_SUB (vect_x_16.11_89, vect_y_18.14_93);
  85   │   .MASK_LEN_STORE (vectp_out.16_96, 8B, { -1, ... }, _102, 0, 
vect_patt_38.15_94);
  86   │   vectp_op_1.9_88 = vectp_op_1.9_87 + _102;
  87   │   vectp_op_2.12_92 = vectp_op_2.12_91 + _102;
  88   │   vectp_out.16_97 = vectp_out.16_96 + _102;
  89   │   ivtmp_101 = ivtmp_100 - _102;

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add case 1 matching pattern for vector signed SAT_SUB.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 8a7569ce387..a3c298d3a22 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3401,6 +3401,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
  && types_match (type, @0, @1))))
 
+/* Signed saturation sub, case 4:
+   T minus = (T)((UT)X - (UT)Y);
+   SAT_S_SUB = (X ^ Y) < 0 & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
+
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+(match (signed_integer_sat_sub @0 @1)
+ (cond^ (bit_and:c (lt (bit_xor @0 (nop_convert@2 (minus (nop_convert @0)
+(nop_convert @1))))
+  integer_zerop)
+  (lt (bit_xor:c @0 @1) integer_zerop))
+   (bit_xor:c (nop_convert (negate (nop_convert (convert
+ (lt @0 integer_zerop)))))
+  max_value)
+   @2)
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))))
+
 /* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
 (match (unsigned_integer_sat_trunc @0)
-- 
2.43.0



[PATCH v1 2/4] Vect: Try the pattern of vector signed integer SAT_SUB

2024-10-10 Thread pan2 . li
From: Pan Li 

Almost the same as the vector unsigned integer SAT_SUB: try to match
the signed version during vector pattern matching.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* tree-vect-patterns.cc (gimple_signed_integer_sat_sub): Add new
func decl for signed SAT_SUB.
(vect_recog_sat_sub_pattern_transform): Update comments.
(vect_recog_sat_sub_pattern): Try the vector signed SAT_SUB
pattern.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-patterns.cc | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 9bf8526ac99..746f100a084 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4538,6 +4538,7 @@ extern bool gimple_unsigned_integer_sat_sub (tree, tree*, 
tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
 
 extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree));
+extern bool gimple_signed_integer_sat_sub (tree, tree*, tree (*)(tree));
 
 static gimple *
 vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
@@ -4684,6 +4685,7 @@ vect_recog_sat_sub_pattern_transform (vec_info *vinfo,
 
 /*
  * Try to detect saturation sub pattern (SAT_ADD), aka below gimple:
+ * Unsigned:
  *   _7 = _1 >= _2;
  *   _8 = _1 - _2;
  *   _10 = (long unsigned int) _7;
@@ -4691,6 +4693,27 @@ vect_recog_sat_sub_pattern_transform (vec_info *vinfo,
  *
 * And then simplified to
  *   _9 = .SAT_SUB (_1, _2);
+ *
+ * Signed:
+ *   x.0_4 = (unsigned char) x_16;
+ *   y.1_5 = (unsigned char) y_18;
+ *   _6 = x.0_4 - y.1_5;
+ *   minus_19 = (int8_t) _6;
+ *   _7 = x_16 ^ y_18;
+ *   _8 = x_16 ^ minus_19;
+ *   _44 = _7 < 0;
+ *   _23 = x_16 < 0;
+ *   _24 = (signed char) _23;
+ *   _58 = (unsigned char) _24;
+ *   _59 = -_58;
+ *   _25 = (signed char) _59;
+ *   _26 = _25 ^ 127;
+ *   _42 = _8 < 0;
+ *   _41 = _42 & _44;
+ *   iftmp.2_11 = _41 ? _26 : minus_19;
+ *
+ * And then simplified to
+ *   iftmp.2_11 = .SAT_SUB (x_16, y_18);
  */
 
 static gimple *
@@ -4705,7 +4728,8 @@ vect_recog_sat_sub_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
   tree ops[2];
   tree lhs = gimple_assign_lhs (last_stmt);
 
-  if (gimple_unsigned_integer_sat_sub (lhs, ops, NULL))
+  if (gimple_unsigned_integer_sat_sub (lhs, ops, NULL)
+  || gimple_signed_integer_sat_sub (lhs, ops, NULL))
 {
   vect_recog_sat_sub_pattern_transform (vinfo, stmt_vinfo, lhs, ops);
   gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
-- 
2.43.0



[PATCH v1 3/4] RISC-V: Implement vector SAT_SUB for signed integer

2024-10-10 Thread pan2 . li
From: Pan Li 

This patch implements the sssub standard name for vector signed integers.

Form 1:
  #define DEF_VEC_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
  void __attribute__((noinline))   \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
T minus = (UT)x - (UT)y;   \
out[i] = (x ^ y) >= 0  \
  ? minus  \
  : (minus ^ x) >= 0   \
? minus\
: x < 0 ? MIN : MAX;   \
  }\
  }

DEF_VEC_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
  28   │ vle8.v  v1,0(a1)
  29   │ vle8.v  v2,0(a2)
  30   │ sub a3,a3,a5
  31   │ add a1,a1,a5
  32   │ add a2,a2,a5
  33   │ vsra.vi v4,v1,7
  34   │ vsub.vv v3,v1,v2
  35   │ vxor.vv v2,v1,v2
  36   │ vxor.vv v0,v1,v3
  37   │ vmslt.vi v2,v2,0
  38   │ vmslt.vi v0,v0,0
  39   │ vmand.mm v0,v0,v2
  40   │ vxor.vv v3,v4,v5,v0.t
  41   │ vse8.v  v3,0(a0)
  42   │ add a0,a0,a5

After this patch:
  25   │ vle8.v  v1,0(a1)
  26   │ vle8.v  v2,0(a2)
  27   │ sub a3,a3,a5
  28   │ add a1,a1,a5
  29   │ add a2,a2,a5
  30   │ vssub.vv v1,v1,v2
  31   │ vse8.v  v1,0(a0)
  32   │ add a0,a0,a5

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (sssub<mode>3): Add new pattern for
signed SAT_SUB.
* config/riscv/riscv-protos.h (expand_vec_sssub): Add new func
decl to expand sssub to vssub.
* config/riscv/riscv-v.cc (expand_vec_sssub): Add new func
impl to expand sssub to vssub.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md | 11 +++
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-v.cc |  9 +
 3 files changed, 21 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 836cdd4491f..7dc78a48874 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2734,6 +2734,17 @@ (define_expand "ussub<mode>3"
   }
 )
 
+(define_expand "sssub<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_sssub (operands[0], operands[1], operands[2], 
<MODE>mode);
+DONE;
+  }
+)
+
 (define_expand "ustrunc2"
   [(match_operand: 0 "register_operand")
(match_operand:VWEXTI   1 "register_operand")]
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 1e6d10a1402..b2f5d72f494 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -649,6 +649,7 @@ void expand_vec_lfloor (rtx, rtx, machine_mode, 
machine_mode);
 void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 void expand_vec_ssadd (rtx, rtx, rtx, machine_mode);
 void expand_vec_ussub (rtx, rtx, rtx, machine_mode);
+void expand_vec_sssub (rtx, rtx, rtx, machine_mode);
 void expand_vec_double_ustrunc (rtx, rtx, machine_mode);
 void expand_vec_quad_ustrunc (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_oct_ustrunc (rtx, rtx, machine_mode, machine_mode,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index ca3a80cceb9..fba35652cc2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4902,6 +4902,15 @@ expand_vec_ussub (rtx op_0, rtx op_1, rtx op_2, 
machine_mode vec_mode)
   emit_vec_binary_alu (op_0, op_1, op_2, US_MINUS, vec_mode);
 }
 
+/* Expand the standard name sssub<mode>3 for vector mode,  we can leverage
+   the vector fixed point vector single-width saturating subtract directly.  */
+
+void
+expand_vec_sssub (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_binary_alu (op_0, op_1, op_2, SS_MINUS, vec_mode);
+}
+
 /* Expand the standard name ustrunc2 for double vector mode,  like
DI => SI.  we can leverage the vector fixed point vector narrowing
fixed-point clip directly.  */
-- 
2.43.0



[PATCH] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

2024-10-10 Thread Richard Biener
The following temporarily reverts the support of permuted .MASK_LOAD for the
case of non-grouped accesses.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/117050
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not support
permutes of non-grouped .MASK_LOAD.

* gcc.dg/vect/pr117050.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr117050.c | 18 ++
 gcc/tree-vect-slp.cc |  3 ++-
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr117050.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr117050.c 
b/gcc/testsuite/gcc.dg/vect/pr117050.c
new file mode 100644
index 000..7b12cbc9ef4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr117050.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+typedef struct {
+  char *data;
+} song_sample_t;
+typedef struct {
+  int right_ramp;
+  int left_ramp;
+} song_voice_t;
+song_sample_t *csf_stop_sample_smp, *csf_stop_sample_v_3;
+song_voice_t *csf_stop_sample_v;
+void csf_stop_sample()
+{
+  for (int i; i; i++, csf_stop_sample_v++)
+if (csf_stop_sample_v_3 || csf_stop_sample_smp->data)
+  csf_stop_sample_v->left_ramp = csf_stop_sample_v->right_ramp = 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3024b87a1f8..914b0b61b4d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2031,7 +2031,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 loads with gaps.  */
  if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
   && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
- || STMT_VINFO_STRIDED_P (stmt_info))
+ || STMT_VINFO_STRIDED_P (stmt_info)
+ || (!STMT_VINFO_GROUPED_ACCESS (stmt_info) && any_permute))
{
  load_permutation.release ();
  matches[0] = false;
-- 
2.43.0


[PATCH v1 4/4] RISC-V: Add testcases for form 1 of vector signed SAT_SUB

2024-10-10 Thread pan2 . li
From: Pan Li 

Form 1:
  #define DEF_VEC_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
  void __attribute__((noinline))   \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
T minus = (UT)x - (UT)y;   \
out[i] = (x ^ y) >= 0  \
  ? minus  \
  : (minus ^ x) >= 0   \
? minus\
: x < 0 ? MIN : MAX;   \
  }\
  }

DEF_VEC_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper
macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i16.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i32.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i64.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_data.h| 264 ++
 .../rvv/autovec/binop/vec_sat_s_sub-1-i16.c   |   9 +
 .../rvv/autovec/binop/vec_sat_s_sub-1-i32.c   |   9 +
 .../rvv/autovec/binop/vec_sat_s_sub-1-i64.c   |   9 +
 .../rvv/autovec/binop/vec_sat_s_sub-1-i8.c|   9 +
 .../autovec/binop/vec_sat_s_sub-run-1-i16.c   |  17 ++
 .../autovec/binop/vec_sat_s_sub-run-1-i32.c   |  17 ++
 .../autovec/binop/vec_sat_s_sub-run-1-i64.c   |  17 ++
 .../autovec/binop/vec_sat_s_sub-run-1-i8.c|  17 ++
 .../riscv/rvv/autovec/vec_sat_arith.h |  25 ++
 10 files changed, 393 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-1-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h
index 99d618168f3..32edc358a08 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h
@@ -598,4 +598,268 @@ int64_t TEST_BINARY_DATA_NAME(int64_t, int64_t, 
ssadd)[][3][N] =
   },
 };
 
+int8_t TEST_BINARY_DATA_NAME(int8_t, int8_t, sssub)[][3][N] =
+{
+  {
+{
+ 0,0,0,0,
+ 2,2,2,2,
+   126,  126,  126,  126,
+   127,  127,  127,  127,
+},
+{
+ 0,0,0,0,
+ 4,4,4,4,
+-2,   -2,   -2,   -2,
+  -127, -127, -127, -127,
+},
+{
+ 0,0,0,0,
+-2,   -2,   -2,   -2,
+   127,  127,  127,  127,
+   127,  127,  127,  127,
+},
+  },
+
+  {
+{
+-7,   -7,   -7,   -7,
+  -128, -128, -128, -128,
+  -127, -127, -127, -127,
+  -128, -128, -128, -128,
+},
+{
+-4,   -4,   -4,   -4,
+ 1,1,1,1,
+ 1,1,1,1,
+   127,  127,  127,  127,
+},
+{
+-3,   -3,   -3,   -3,
+  -128, -128, -128, -128,
+  -128, -128, -128, -128,
+  -128, -128,

[PATCH] [RFC] This is a test, please ignore

2024-10-10 Thread Christophe Lyon
This is a test patch, please ignore.
---
 README | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README b/README
index be15bc2b44e..7a3d7cfeb74 100644
--- a/README
+++ b/README
@@ -1,3 +1,5 @@
+THIS IS A TEST -- IGNORE
+
 This directory contains the GNU Compiler Collection (GCC).
 
 The GNU Compiler Collection is free software.  See the files whose
-- 
2.34.1



[PATCH] [RFC] This is a test, please ignore

2024-10-10 Thread Christophe Lyon
This is a test patch, please ignore.
---
 README | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README b/README
index be15bc2b44e..7a3d7cfeb74 100644
--- a/README
+++ b/README
@@ -1,3 +1,5 @@
+THIS IS A TEST -- IGNORE
+
 This directory contains the GNU Compiler Collection (GCC).
 
 The GNU Compiler Collection is free software.  See the files whose
-- 
2.34.1



[PATCH] This is a test2, please ignore

2024-10-10 Thread Christophe Lyon
ci-tag: skip
-- >8 --

This is a test patch, please ignore.

---
 README | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README b/README
index be15bc2b44e..7a3d7cfeb74 100644
--- a/README
+++ b/README
@@ -1,3 +1,5 @@
+THIS IS A TEST -- IGNORE
+
 This directory contains the GNU Compiler Collection (GCC).
 
 The GNU Compiler Collection is free software.  See the files whose
-- 
2.34.1



[PATCH] vect: Avoid divide by zero for permutes of extern VLA vectors

2024-10-10 Thread Richard Sandiford
My recent VLA SLP patches caused a regression with cross compilers
in gcc.dg/torture/neon-sve-bridge.c.  There we have a VEC_PERM_EXPR
created from two BIT_FIELD_REFs, with the child node being an
external VLA vector:

note:   node 0x3704a70 (max_nunits=1, refcnt=2) vector(2) long int
note:   op: VEC_PERM_EXPR
note:  stmt 0 val1Return_9 = BIT_FIELD_REF ;
note:  stmt 1 val2Return_10 = BIT_FIELD_REF ;
note:  lane permutation { 0[0] 0[1] }
note:  children 0x3704b08
note:   node (external) 0x3704b08 (max_nunits=1, refcnt=1) svint64_t
note:  { }

For this kind of external node, the SLP_TREE_LANES is normally
the total number of lanes in the vector, but it is zero if the
vector has variable length:

  auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
  unsigned HOST_WIDE_INT const_nunits;
  if (nunits.is_constant (&const_nunits))
SLP_TREE_LANES (vnode) = const_nunits;

This led to division by zero in:

  /* Check whether the output has N times as many lanes per vector.  */
  else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
SLP_TREE_LANES (child) * nunits,
&this_unpack_factor)
   && (i == 0 || unpack_factor == this_unpack_factor))
unpack_factor = this_unpack_factor;

No repetition takes place for this kind of external node, so this
patch goes with Richard's suggestion to check for external nodes
that have no scalar statements.

This didn't show up for my native testing since division by zero
doesn't trap on AArch64.

Bootstrapped & regression-tested on aarch64-linux-gnu and spot-checked
with a cross compiler.  OK to install?

gcc/
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Set repeating_p
to false if we have an external node for a pre-existing vector.
---
 gcc/tree-vect-slp.cc | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9bb765e2cba..1991fb1d3b6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10288,10 +10288,19 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
}
   auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype);
   unsigned int this_unpack_factor;
+  /* Detect permutations of external, pre-existing vectors.  The external
+node's SLP_TREE_LANES stores the total number of units in the vector,
+or zero if the vector has variable length.
+
+We are expected to keep the original VEC_PERM_EXPR for such cases.
+There is no repetition to model.  */
+  if (SLP_TREE_DEF_TYPE (child) == vect_external_def
+ && SLP_TREE_SCALAR_OPS (child).is_empty ())
+   repeating_p = false;
   /* Check whether the input has twice as many lanes per vector.  */
-  if (children.length () == 1
- && known_eq (SLP_TREE_LANES (child) * nunits,
-  SLP_TREE_LANES (node) * op_nunits * 2))
+  else if (children.length () == 1
+  && known_eq (SLP_TREE_LANES (child) * nunits,
+   SLP_TREE_LANES (node) * op_nunits * 2))
pack_p = true;
   /* Check whether the output has N times as many lanes per vector.  */
   else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
-- 
2.25.1



Re: [PATCH] vect: Avoid divide by zero for permutes of extern VLA vectors

2024-10-10 Thread Richard Biener
On Thu, 10 Oct 2024, Richard Sandiford wrote:

> My recent VLA SLP patches caused a regression with cross compilers
> in gcc.dg/torture/neon-sve-bridge.c.  There we have a VEC_PERM_EXPR
> created from two BIT_FIELD_REFs, with the child node being an
> external VLA vector:
> 
> note:   node 0x3704a70 (max_nunits=1, refcnt=2) vector(2) long int
> note:   op: VEC_PERM_EXPR
> note:  stmt 0 val1Return_9 = BIT_FIELD_REF ;
> note:  stmt 1 val2Return_10 = BIT_FIELD_REF ;
> note:  lane permutation { 0[0] 0[1] }
> note:  children 0x3704b08
> note:   node (external) 0x3704b08 (max_nunits=1, refcnt=1) svint64_t
> note:  { }
> 
> For this kind of external node, the SLP_TREE_LANES is normally
> the total number of lanes in the vector, but it is zero if the
> vector has variable length:
> 
>   auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
>   unsigned HOST_WIDE_INT const_nunits;
>   if (nunits.is_constant (&const_nunits))
>   SLP_TREE_LANES (vnode) = const_nunits;
> 
> This led to division by zero in:
> 
>   /* Check whether the output has N times as many lanes per vector.  */
>   else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
>   SLP_TREE_LANES (child) * nunits,
>   &this_unpack_factor)
>  && (i == 0 || unpack_factor == this_unpack_factor))
>   unpack_factor = this_unpack_factor;
> 
> No repetition takes place for this kind of external node, so this
> patch goes with Richard's suggestion to check for external nodes
> that have no scalar statements.
> 
> This didn't show up for my native testing since division by zero
> doesn't trap on AArch64.
> 
> Bootstrapped & regression-tested on aarch64-linux-gnu and spot-checked
> with a cross compiler.  OK to install?

OK.

Thanks,
Richard.

> gcc/
>   * tree-vect-slp.cc (vectorizable_slp_permutation_1): Set repeating_p
>   to false if we have an external node for a pre-existing vector.
> ---
>  gcc/tree-vect-slp.cc | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9bb765e2cba..1991fb1d3b6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -10288,10 +10288,19 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
> gimple_stmt_iterator *gsi,
>   }
>auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype);
>unsigned int this_unpack_factor;
> +  /* Detect permutations of external, pre-existing vectors.  The external
> +  node's SLP_TREE_LANES stores the total number of units in the vector,
> +  or zero if the vector has variable length.
> +
> +  We are expected to keep the original VEC_PERM_EXPR for such cases.
> +  There is no repetition to model.  */
> +  if (SLP_TREE_DEF_TYPE (child) == vect_external_def
> +   && SLP_TREE_SCALAR_OPS (child).is_empty ())
> + repeating_p = false;
>/* Check whether the input has twice as many lanes per vector.  */
> -  if (children.length () == 1
> -   && known_eq (SLP_TREE_LANES (child) * nunits,
> -SLP_TREE_LANES (node) * op_nunits * 2))
> +  else if (children.length () == 1
> +&& known_eq (SLP_TREE_LANES (child) * nunits,
> + SLP_TREE_LANES (node) * op_nunits * 2))
>   pack_p = true;
>/* Check whether the output has N times as many lanes per vector.  */
>else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH v6 0/2] Add support for SVE2 faminmax

2024-10-10 Thread saurabh.jha
From: Saurabh Jha 

This patch series is a revised version of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664912.html

In particular, the only changes are in the first patch, where in the
test cases of intrinsics, we removed unnecessary capture of regular
expression of operands. The second patch has been reviewed already.

Regression tested on aarch64-unknown-linux-gnu and found no regressions.

Ok for master?

Regards,
Saurabh

Saurabh Jha (2):
  aarch64: Add SVE2 faminmax intrinsics
  aarch64: Add codegen support for SVE2 faminmax

 .../aarch64/aarch64-sve-builtins-base.cc  |   4 +
 .../aarch64/aarch64-sve-builtins-base.def |   5 +
 .../aarch64/aarch64-sve-builtins-base.h   |   2 +
 gcc/config/aarch64/aarch64-sve2.md|  37 ++
 gcc/config/aarch64/aarch64.h  |   1 +
 gcc/config/aarch64/iterators.md   |  24 +-
 .../gcc.target/aarch64/sve/faminmax_1.c   |  44 ++
 .../gcc.target/aarch64/sve/faminmax_2.c   |  60 +++
 .../aarch64/sve2/acle/asm/amax_f16.c  | 437 ++
 .../aarch64/sve2/acle/asm/amax_f32.c  | 437 ++
 .../aarch64/sve2/acle/asm/amax_f64.c  | 437 ++
 .../aarch64/sve2/acle/asm/amin_f16.c  | 437 ++
 .../aarch64/sve2/acle/asm/amin_f32.c  | 437 ++
 .../aarch64/sve2/acle/asm/amin_f64.c  | 437 ++
 14 files changed, 2798 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f64.c

-- 
2.34.1



[PATCH v6 1/2] aarch64: Add SVE2 faminmax intrinsics

2024-10-10 Thread saurabh.jha

The AArch64 FEAT_FAMINMAX extension introduces instructions for
computing the floating point absolute maximum and minimum of the
two vectors element-wise.

This patch introduces SVE2 faminmax intrinsics. The intrinsics of this
extension are implemented as the following builtin functions:
* sva[max|min]_[m|x|z]
* sva[max|min]_[f16|f32|f64]_[m|x|z]
* sva[max|min]_n_[f16|f32|f64]_[m|x|z]
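
A minimal usage sketch (editorial, following the naming scheme listed
above; the exact target options are an assumption):

#include <arm_sve.h>
#pragma GCC target "+sve+faminmax"

// Element-wise maximum of |a| and |b|; the _x form leaves inactive
// lanes unspecified.
svfloat32_t
absmax (svbool_t pg, svfloat32_t a, svfloat32_t b)
{
  return svamax_f32_x (pg, a, b);
}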

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc
(svamax): Absolute maximum declaration.
(svamin): Absolute minimum declaration.
* config/aarch64/aarch64-sve-builtins-base.def
(REQUIRED_EXTENSIONS): Add faminmax intrinsics behind a flag.
(svamax): Absolute maximum declaration.
(svamin): Absolute minimum declaration.
* config/aarch64/aarch64-sve-builtins-base.h: Declaring function
bases for the new intrinsics.
* config/aarch64/aarch64.h
(TARGET_SVE_FAMINMAX): New flag for SVE2 faminmax.
* config/aarch64/iterators.md: New unspecs, iterators, and attrs
for the new intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve2/acle/asm/amax_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f32.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f64.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f32.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f64.c: New test.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |   4 +
 .../aarch64/aarch64-sve-builtins-base.def |   5 +
 .../aarch64/aarch64-sve-builtins-base.h   |   2 +
 gcc/config/aarch64/aarch64.h  |   1 +
 gcc/config/aarch64/iterators.md   |  18 +-
 .../aarch64/sve2/acle/asm/amax_f16.c  | 437 ++
 .../aarch64/sve2/acle/asm/amax_f32.c  | 437 ++
 .../aarch64/sve2/acle/asm/amax_f64.c  | 437 ++
 .../aarch64/sve2/acle/asm/amin_f16.c  | 437 ++
 .../aarch64/sve2/acle/asm/amin_f32.c  | 437 ++
 .../aarch64/sve2/acle/asm/amin_f64.c  | 437 ++
 11 files changed, 2651 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f64.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 4b33585d981..b189818d643 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -3071,6 +3071,10 @@ FUNCTION (svadrb, svadr_bhwd_impl, (0))
 FUNCTION (svadrd, svadr_bhwd_impl, (3))
 FUNCTION (svadrh, svadr_bhwd_impl, (1))
 FUNCTION (svadrw, svadr_bhwd_impl, (2))
+FUNCTION (svamax, cond_or_uncond_unspec_function,
+	  (UNSPEC_COND_FAMAX, UNSPEC_FAMAX))
+FUNCTION (svamin, cond_or_uncond_unspec_function,
+	  (UNSPEC_COND_FAMIN, UNSPEC_FAMIN))
 FUNCTION (svand, rtx_code_function, (AND, AND))
 FUNCTION (svandv, reduction, (UNSPEC_ANDV))
 FUNCTION (svasr, rtx_code_function, (ASHIFTRT, ASHIFTRT))
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index 65fcba91586..95e04e4393d 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -379,3 +379,8 @@ DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 #undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_FAMINMAX
+DEF_SVE_FUNCTION (svamax, binary_opt_single_n, all_float, mxz)
+DEF_SVE_FUNCTION (svamin, binary_opt_single_n, all_float, mxz)
+#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.h b/gcc/config/aarch64/aarch64-sve-builtins-base.h
index 5bbf3569c4b..978cf7013f9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.h
@@ -37,6 +37,8 @@ namespace aarch64_sve
 extern const function_base *const svadrd;
 extern const function_base *const svadrh;
 extern const function_base *const svadrw;
+extern const function_base *const svamax;
+extern const function_base *const svamin;
 extern const function_base *const svand;
 extern const function_base *const svandv;
 extern const function_base *const svasr;
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 030cffb1760..593319fd472 100644
--- a/gcc/c

[PATCH v6 2/2] aarch64: Add codegen support for SVE2 faminmax

2024-10-10 Thread saurabh.jha

The AArch64 FEAT_FAMINMAX extension introduces instructions for
computing the floating point absolute maximum and minimum of the
two vectors element-wise.

This patch adds code generation for famax and famin in terms of existing
unspecs. With this patch:
1. famax can be expressed as taking UNSPEC_COND_SMAX of the two operands
   and then taking absolute value of their result.
2. famin can be expressed as taking UNSPEC_COND_SMIN of the two operands
   and then taking absolute value of their result.

This fusion of operators is only possible when the
-march=armv9-a+faminmax+sve flags are passed. We also need to pass the
-ffast-math flag; this is what enables the compiler to use
UNSPEC_COND_SMAX and UNSPEC_COND_SMIN.

This code generation is only available on -O2 or -O3 as that is when
auto-vectorization is enabled.
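
For example, a loop of the following shape should now fuse to FAMAX
(an editorial sketch, assuming -march=armv9-a+faminmax+sve -ffast-math
-O2 as described above):

// fabs of both inputs feeding fmax is the shape the new pattern fuses.
void
famax_loop (float *r, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = __builtin_fmaxf (__builtin_fabsf (a[i]),
                            __builtin_fabsf (b[i]));
}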

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md
(*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
codegen.
* config/aarch64/iterators.md: Iterator and attribute for
faminmax codegen.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/faminmax_1.c: New test.
* gcc.target/aarch64/sve/faminmax_2.c: New test.
---
 gcc/config/aarch64/aarch64-sve2.md| 37 
 gcc/config/aarch64/iterators.md   |  6 ++
 .../gcc.target/aarch64/sve/faminmax_1.c   | 44 ++
 .../gcc.target/aarch64/sve/faminmax_2.c   | 60 +++
 4 files changed, 147 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_2.c

diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 725092cc95f..5f2697c3179 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2467,6 +2467,43 @@
   [(set_attr "movprfx" "yes")]
 )
 
+;; -
+;; -- [FP] Absolute maximum and minimum
+;; -
+;; Includes:
+;; - FAMAX
+;; - FAMIN
+;; -
+;; Predicated floating-point absolute maximum and minimum.
+(define_insn_and_rewrite "*aarch64_pred_faminmax_fused"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
	  [(match_operand:<VPRED> 1 "register_operand")
+	   (match_operand:SI 4 "aarch64_sve_gp_strictness")
+	   (unspec:SVE_FULL_F
+	 [(match_operand 5)
+	  (const_int SVE_RELAXED_GP)
+	  (match_operand:SVE_FULL_F 2 "register_operand")]
+	 UNSPEC_COND_FABS)
+	   (unspec:SVE_FULL_F
+	 [(match_operand 6)
+	  (const_int SVE_RELAXED_GP)
+	  (match_operand:SVE_FULL_F 3 "register_operand")]
+	 UNSPEC_COND_FABS)]
+	  SVE_COND_SMAXMIN))]
+  "TARGET_SVE_FAMINMAX"
+  {@ [ cons: =0 , 1   , 2  , 3 ; attrs: movprfx ]
 [ w, Upl , %0 , w ; *  ] <faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
 [ ?&w  , Upl , w  , w ; yes] movprfx\t%0, %2\;<faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+  }
+  "&& (!rtx_equal_p (operands[1], operands[5])
+   || !rtx_equal_p (operands[1], operands[6]))"
+  {
+operands[5] = copy_rtx (operands[1]);
+operands[6] = copy_rtx (operands[1]);
+  }
+)
+
 ;; =
 ;; == Complex arithmetic
 ;; =
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index cbacf59c451..244a9c1b75d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3143,6 +3143,9 @@
 	 UNSPEC_COND_SMAX
 	 UNSPEC_COND_SMIN])
 
+(define_int_iterator SVE_COND_SMAXMIN [UNSPEC_COND_SMAX
+   UNSPEC_COND_SMIN])
+
 (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
 	  UNSPEC_COND_FMLS
 	  UNSPEC_COND_FNMLA
@@ -4503,6 +4506,9 @@
 
 (define_int_iterator FAMINMAX_UNS [UNSPEC_FAMAX UNSPEC_FAMIN])
 
+(define_int_attr faminmax_cond_uns_op
+  [(UNSPEC_COND_SMAX "famax") (UNSPEC_COND_SMIN "famin")])
+
 (define_int_attr faminmax_uns_op
   [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c b/gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
new file mode 100644
index 000..3b65ccea065
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -ffast-math" } */
+
+#include "arm_sve.h"
+
+#pragma GCC target "+sve+faminmax"
+
+#define TEST_FAMAX(TYPE)		\
+  void fn_famax_##TYPE (TYPE * restrict a,\
+			TYPE * restrict b,\
+			TYPE * restrict c,\
+			int n) {	\
+for (int i = 0; i < n; i++) {	\
+  TYPE temp1 = __builtin_fabs (a[i]);\
+  TYPE temp2 = __builtin_fabs (b[i]);\
+  c[i] = __builtin_fmax (temp1, temp2);\
+}	\
+  }	

[PATCH] tree-optimization/117060 - fix oversight in vect_build_slp_tree_1

2024-10-10 Thread Richard Biener
We are failing to match call vs. non-call when dealing with matching
loads or stores.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/117060
* tree-vect-slp.cc (vect_build_slp_tree_1): When comparing
calls also fail if the first isn't a call.

* gfortran.dg/pr117060.f90: New testcase.
---
 gcc/testsuite/gfortran.dg/pr117060.f90 | 21 +
 gcc/tree-vect-slp.cc   |  5 +++--
 2 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr117060.f90

diff --git a/gcc/testsuite/gfortran.dg/pr117060.f90 
b/gcc/testsuite/gfortran.dg/pr117060.f90
new file mode 100644
index 000..50004e1aaf3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr117060.f90
@@ -0,0 +1,21 @@
+! { dg-do compile }
+! { dg-options "-O2" }
+
+subroutine foo (out)
+
+implicit none
+
+real:: out(*)
+integer :: i,k
+real:: a(100)
+real:: b(100)
+
+k = 0
+do i = 1, 10
+  k = k + 1
+  out(k) = a(i)
+  k = k + 1
+  out(k) = sqrt((a(3*i)-b(4))**2 + (a(3*i+1)-b(4+1))**2)
+end do
+
+end subroutine
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index a0dfa18486b..3b6df34b6ee 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1367,8 +1367,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  && first_stmt_code != CFN_MASK_LOAD
  && first_stmt_code != CFN_MASK_STORE)
{
- if (!compatible_calls_p (as_a <gcall *> (stmts[0]->stmt),
-  call_stmt))
+ if (!is_a <gcall *> (stmts[0]->stmt)
+ || !compatible_calls_p (as_a <gcall *> (stmts[0]->stmt),
+ call_stmt))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-- 
2.43.0


[PATCH 1/2] Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

2024-10-10 Thread Richard Biener
The following prepares us for SLP instances with a non-uniform number
of lanes.  We already have this with load permutation lowering, but
we managed to keep that within the constraints of the per SLP instance
computed VF based on its max_nunits (with a vector type fixed for
each node) and the instance group size which is the number of lanes
in the SLP instance root.  But in the case where arbitrary splitting
and merging SLP nodes at non-power-of-two lane boundaries is allowed
this simple calculation based on the outgoing group size falls apart.

The following, instead of computing a VF during SLP instance
discovery, computes it at vect_make_slp_decision time by walking
the SLP graph and looking at each SLP node in isolation.  We do
track max_nunits per node which could be a VF per node instead or
forgo with both completely (though for BB vectorization we need
to communicate a VF > 1 requirement upward, or compute that after
the fact).  In the end we'd like to delay vector type assignment
and only compute a minimum VF here, allowing vector types to
grow when the actual VF is bigger.

There's slight complication with permutes of externs / constants
as those get their vector type (and thus max_nunits) assigned late.
While we currently force them to have the same vector type as the
result, their number of lanes can differ.  So those get handled
explicitly there right now to raise the VF as needed - the alternative
is to fail vectorization, I have an addition to
vect_maybe_update_slp_op_vectype that would FAIL if the set
vector type isn't within the constraints of the VF.
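
As a rough scalar model of the new VF computation (my sketch, under the
assumption that each node contributes an unrolling factor of
lcm (nunits, lanes) / lanes; the real code walks the SLP graph and works
on poly_uint64, and the helper names below are mine):

/* Hypothetical sketch; not the patch's code.  */
static unsigned
gcd (unsigned a, unsigned b)
{
  return b ? gcd (b, a % b) : a;
}

static unsigned
lcm (unsigned a, unsigned b)
{
  return a / gcd (a, b) * b;
}

/* A node with NUNITS vector elements and LANES lanes needs the loop
   unrolled lcm (nunits, lanes) / lanes times; the loop VF is the least
   common multiple of all per-node requirements.  */
static void
update_vf_for_node (unsigned nunits, unsigned lanes, unsigned *vf)
{
  unsigned node_vf = lcm (nunits, lanes) / lanes;
  *vf = lcm (*vf, node_vf);
}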

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll watch the
CIs and push if no problems.

Richard.

* tree-vectorizer.h (SLP_INSTANCE_UNROLLING_FACTOR): Remove.
(slp_instance::unrolling_factor): Likewise.
* tree-vect-slp.cc (vect_build_slp_instance): Do not set
SLP_INSTANCE_UNROLLING_FACTOR.  Remove then dead code.
Compute and set max_nunits from the RHS nodes merged.
(vect_update_slp_vf_for_node): New function.
(vect_make_slp_decision): Use vect_update_slp_vf_for_node
to compute VF recursively.
(vect_build_slp_store_interleaving): Get max_nunits and
properly set that on the permute nodes built.
---
 gcc/tree-vect-slp.cc  | 70 +--
 gcc/tree-vectorizer.h |  4 ---
 2 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index c9301d166a0..796fc4ba577 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3582,13 +3582,15 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
 static slp_tree
 vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
-  vec<stmt_vec_info> &scalar_stmts)
+  vec<stmt_vec_info> &scalar_stmts,
+  poly_uint64 max_nunits)
 {
   unsigned int group_size = scalar_stmts.length ();
   slp_tree node = vect_create_new_slp_node (scalar_stmts,
SLP_TREE_CHILDREN
  (rhs_nodes[0]).length ());
   SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
+  node->max_nunits = max_nunits;
   for (unsigned l = 0;
l < SLP_TREE_CHILDREN (rhs_nodes[0]).length (); ++l)
 {
@@ -3598,6 +3600,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
   SLP_TREE_CHILDREN (node).quick_push (perm);
   SLP_TREE_LANE_PERMUTATION (perm).create (group_size);
   SLP_TREE_VECTYPE (perm) = SLP_TREE_VECTYPE (node);
+  perm->max_nunits = max_nunits;
   SLP_TREE_LANES (perm) = group_size;
   /* ???  We should set this NULL but that's not expected.  */
   SLP_TREE_REPRESENTATIVE (perm)
@@ -3653,6 +3656,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
  SLP_TREE_LANES (permab) = n;
  SLP_TREE_LANE_PERMUTATION (permab).create (n);
  SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ permab->max_nunits = max_nunits;
  /* ???  Should be NULL but that's not expected.  */
  SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
  SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3723,6 +3727,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
  SLP_TREE_LANES (permab) = n;
  SLP_TREE_LANE_PERMUTATION (permab).create (n);
  SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ permab->max_nunits = max_nunits;
  /* ???  Should be NULL but that's not expected.  */
  SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
  SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3846,7 +3851,6 @@ vect_build_slp_instance (vec_info *vinfo,
  /* Create a new SLP instance.  */
  slp_instance new_instance = XNEW (class _slp_instance);
  SLP_INSTANCE_TREE (new_instance) = node;
- SLP_INSTANCE_UNROLLING_FACTO

Re: [PATCH] gcc.target/i386/pr115407.c: Only run for lp64

2024-10-10 Thread H.J. Lu
On Thu, Oct 10, 2024 at 7:13 PM H.J. Lu  wrote:
>
> Since -mcmodel=large is valid only for lp64, run pr115407.c only for
> lp64.
>
> * gcc.target/i386/pr115407.c: Only run for lp64.
>
> --
> H.J.

This time is the correct patch.

-- 
H.J.
From 566b1920ce82e12a4355f4131116d4069536a61f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 10 Oct 2024 17:29:27 +0800
Subject: [PATCH] gcc.target/i386/pr115407.c: Only run for lp64

Since -mcmodel=large is valid only for lp64, run pr115407.c only for
lp64.

	* gcc.target/i386/pr115407.c: Only run for lp64.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/pr115407.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr115407.c b/gcc/testsuite/gcc.target/i386/pr115407.c
index b6cb7a6d9ea..426fb176b5b 100644
--- a/gcc/testsuite/gcc.target/i386/pr115407.c
+++ b/gcc/testsuite/gcc.target/i386/pr115407.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile { target { lp64 } } } */
 /* { dg-options "-O2 -mcmodel=large -mavx512bw" } */
 __attribute__((__vector_size__(64))) char v;
 
-- 
2.46.2



[PATCH 2/2] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

2024-10-10 Thread Richard Biener
The following fixes an oversight when handling permuted non-grouped
.MASK_LOAD SLP discovery.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  This requires
1/2.

PR tree-optimization/117050
* tree-vect-slp.cc (vect_build_slp_tree_2): Properly handle
non-grouped masked loads when handling permutations.

* gcc.dg/vect/pr117050.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr117050.c | 18 ++++++++++++++++++
 gcc/tree-vect-slp.cc | 15 ++++++++-------
 2 files changed, 26 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr117050.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr117050.c b/gcc/testsuite/gcc.dg/vect/pr117050.c
new file mode 100644
index 000..7b12cbc9ef4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr117050.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+typedef struct {
+  char *data;
+} song_sample_t;
+typedef struct {
+  int right_ramp;
+  int left_ramp;
+} song_voice_t;
+song_sample_t *csf_stop_sample_smp, *csf_stop_sample_v_3;
+song_voice_t *csf_stop_sample_v;
+void csf_stop_sample()
+{
+  for (int i; i; i++, csf_stop_sample_v++)
+if (csf_stop_sample_v_3 || csf_stop_sample_smp->data)
+  csf_stop_sample_v->left_ramp = csf_stop_sample_v->right_ramp = 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 796fc4ba577..dd8f1befa25 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1986,7 +1986,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  stmt_vec_info load_info;
  load_permutation.create (group_size);
  stmt_vec_info first_stmt_info
-   = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
+   = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ ? DR_GROUP_FIRST_ELEMENT (stmt_info) : stmt_info;
  bool any_permute = false;
  bool any_null = false;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
@@ -2045,17 +2046,17 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
{
  /* Discover the whole unpermuted load.  */
  vec<stmt_vec_info> stmts2;
- stmts2.create (DR_GROUP_SIZE (first_stmt_info));
- stmts2.quick_grow_cleared (DR_GROUP_SIZE (first_stmt_info));
+ unsigned dr_group_size = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ ? DR_GROUP_SIZE (first_stmt_info) : 1;
+ stmts2.create (dr_group_size);
+ stmts2.quick_grow_cleared (dr_group_size);
  unsigned i = 0;
  for (stmt_vec_info si = first_stmt_info;
   si; si = DR_GROUP_NEXT_ELEMENT (si))
stmts2[i++] = si;
- bool *matches2
-   = XALLOCAVEC (bool, DR_GROUP_SIZE (first_stmt_info));
+ bool *matches2 = XALLOCAVEC (bool, dr_group_size);
  slp_tree unperm_load
-   = vect_build_slp_tree (vinfo, stmts2,
-  DR_GROUP_SIZE (first_stmt_info),
+   = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
   &this_max_nunits, matches2, limit,
   &this_tree_size, bst_map);
  /* When we are able to do the full masked load emit that
-- 
2.43.0


[PATCH] Fix possible wrong-code with masked store-lanes

2024-10-10 Thread Richard Biener
When we're doing masked store-lanes, one mask element applies to all
stores of one struct element.  This requires uniform masks for all
of the SLP lanes, something we already compute into STMT_VINFO_SLP_VECT_ONLY
but fail to check when doing SLP store-lanes.  The following corrects
this.  The following also adjusts the store-lane heuristic to properly
check for masked or non-masked optab support.
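
As an illustration (my example, not from the patch): in a loop like the
one below, a single mask element computed from c[i] guards the stores to
both fields of an element, so an ST2-style masked store-lanes needs the
same mask in every SLP lane.

struct pair { int x, y; };

void
f (struct pair *p, int *c, int *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i])          /* one mask bit covers both field stores */
      {
        p[i].x = a[i];
        p[i].y = b[i];
      }
}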

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-vect-slp.cc (vect_slp_prefer_store_lanes_p): Allow
passing in of vectype, pass in whether the stores are masked
and query the correct optab.
(vect_build_slp_instance): Guard store-lanes query with
! STMT_VINFO_SLP_VECT_ONLY, guaranteeing an uniform mask.
---
 gcc/tree-vect-slp.cc | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index dd8f1befa25..cfc6e599110 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3540,17 +3540,22 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 }
 
 /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
-   splitting into two, with the first split group having size NEW_GROUP_SIZE.
+   vectorizing with VECTYPE that might be NULL.  MASKED_P indicates whether
+   the stores are masked.
Return true if we could use IFN_STORE_LANES instead and if that appears
to be the better approach.  */
 
 static bool
 vect_slp_prefer_store_lanes_p (vec_info *vinfo, stmt_vec_info stmt_info,
+  tree vectype, bool masked_p,
   unsigned int group_size,
   unsigned int new_group_size)
 {
-  tree scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
-  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (!vectype)
+{
+  tree scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
+  vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+}
   if (!vectype)
 return false;
   /* Allow the split if one of the two new groups would operate on full
@@ -3564,7 +3569,7 @@ vect_slp_prefer_store_lanes_p (vec_info *vinfo, stmt_vec_info stmt_info,
   if (multiple_p (group_size - new_group_size, TYPE_VECTOR_SUBPARTS (vectype))
   || multiple_p (new_group_size, TYPE_VECTOR_SUBPARTS (vectype)))
 return false;
-  return vect_store_lanes_supported (vectype, group_size, false) != IFN_LAST;
+  return vect_store_lanes_supported (vectype, group_size, masked_p) != IFN_LAST;
 }
 
 /* Analyze an SLP instance starting from a group of grouped stores.  Call
@@ -4013,6 +4018,10 @@ vect_build_slp_instance (vec_info *vinfo,
   else if (is_a <loop_vec_info> (vinfo)
   && (group_size != 1 && i < group_size))
{
+ gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
+ bool masked_p = call
+ && gimple_call_internal_p (call)
+ && internal_fn_mask_index (gimple_call_internal_fn (call)) != -1;
  /* There are targets that cannot do even/odd interleaving schemes
 so they absolutely need to use load/store-lanes.  For now
 force single-lane SLP for them - they would be happy with
@@ -4027,9 +4036,10 @@ vect_build_slp_instance (vec_info *vinfo,
  bool want_store_lanes
= (! STMT_VINFO_GATHER_SCATTER_P (stmt_info)
   && ! STMT_VINFO_STRIDED_P (stmt_info)
+  && ! STMT_VINFO_SLP_VECT_ONLY (stmt_info)
   && compare_step_with_zero (vinfo, stmt_info) > 0
-  && vect_slp_prefer_store_lanes_p (vinfo, stmt_info,
-group_size, 1));
+  && vect_slp_prefer_store_lanes_p (vinfo, stmt_info, NULL_TREE,
+masked_p, group_size, 1));
  if (want_store_lanes || force_single_lane)
i = 1;
 
@@ -4107,14 +4117,14 @@ vect_build_slp_instance (vec_info *vinfo,
 
  /* Now re-assess whether we want store lanes in case the
 discovery ended up producing all single-lane RHSs.  */
- if (rhs_common_nlanes == 1
+ if (! want_store_lanes
+ && rhs_common_nlanes == 1
  && ! STMT_VINFO_GATHER_SCATTER_P (stmt_info)
  && ! STMT_VINFO_STRIDED_P (stmt_info)
+ && ! STMT_VINFO_SLP_VECT_ONLY (stmt_info)
  && compare_step_with_zero (vinfo, stmt_info) > 0
  && (vect_store_lanes_supported (SLP_TREE_VECTYPE (rhs_nodes[0]),
- group_size,
- SLP_TREE_CHILDREN
-   (rhs_nodes[0]).length () != 1)
+ group_size, masked_p)
  != IFN_LAST))
want_store_lanes = true;
 
-- 
2.43.0


RE: [PATCH]middle-end: support SLP early break

2024-10-10 Thread Tamar Christina
> > e.g. if (a != 0) where a is loop invariant.  For instance test_memcmp_1_1
> > in /gcc.dg/memcmp-1.c is such loop.  Technically we should be able to
> > vectorize such loops,  but while we can represent externals in the SLP tree,
> > we can't start discovery at them, as no stmt_info for them.
> >
> > In principle all I need here is an empty SLP tree, since all codegen is 
> > driven
> > by the roots for such invariant compares.  However vect_build_slp_tree
> > doesn't accept empty stmts.
> 
> The externals would have SLP nodes of course but the requirement
> currently is that the SLP instance root is an internal def.
> 
> > I believe we are able to vectorize such loops today,  so perhaps instead of
> > failing we should support building an SLP instance with only roots?
> 
> It might be tempting but I don't think this is generally useful.
> 
> > In which case should I try to fit it into vect_build_slp_tree or just 
> > special
> > case it for the gcond discovery?
> 
> The issue is that you have two operands you technically would like to
> see code-genrated - the 'a' and the '0' vector invariants, but the
> SLP instance only has a single root.  You could (as I suggested)
> simply only build the SLP node for the (invariant) LHS of the gcond,
> not by using vect_build_slp_tree but instead by manually building
> the SLP tree for the invariant - see what vect_build_slp_tree_2 does
> here:
> 

Done,

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Will test more targets closer to commit.

Ok for master?
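
For reference, the kind of loop discussed above (my sketch, loosely
modeled on test_memcmp_1_1 from gcc.dg/memcmp-1.c): the early-break
condition is loop-invariant, so the gcond's operands are all externals
and its SLP node has to be built manually.

unsigned
f (unsigned *a, unsigned x, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (x != 0)   /* loop-invariant early-break condition */
        return 0;
      a[i] = x;
    }
  return 1;
}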

gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_2): Handle SLP trees with no
children.
* tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond.
(LOOP_VINFO_EARLY_BREAKS_LIVE_IVS): New.
(vectorizable_early_exit): Expose.
(class _loop_vec_info): Add early_break_live_stmts.
* tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance):
Support gcond instances.
(vect_analyze_slp): Analyze gcond roots and early break live statements.
(maybe_push_to_hybrid_worklist): Don't sink gconds.
(vect_slp_analyze_operations): Support gconds.
(vect_slp_check_for_roots): Update comments.
(vectorize_slp_instance_root_stmt): Support gconds.
(vect_schedule_slp): Pass vinfo to vectorize_slp_instance_root_stmt.
* tree-vect-stmts.cc (vect_stmt_relevant_p): Record early break live
statements.
(vectorizable_early_exit): Support SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_126.c: New test.
* gcc.dg/vect/vect-early-break_127.c: New test.
* gcc.dg/vect/vect-early-break_128.c: New test.

-- inline copy of patch --

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
new file mode 100644
index 000..4bfc9880f9fc869bf616123ff509d13be17ffacf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+ {
+   ret *= vect_a[i];
+   return vect_a[i];
+ }
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
new file mode 100644
index 000..67cb5d34a77192e5d7d72c35df8e83535ef184ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 != x)
+ break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
new file mode 100644
index 000..6d7fb920ec2de529a4aa1de2c4a04286989204fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-add-options vect

Re: [PATCH] [PR116831] match.pd: Check trunc_mod vector obtap before folding.

2024-10-10 Thread Richard Biener
On Wed, 9 Oct 2024, Jennifer Schmitz wrote:

> 
> > On 8 Oct 2024, at 10:31, Richard Biener  wrote:
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Fri, 4 Oct 2024, Jennifer Schmitz wrote:
> > 
> >> As in https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663185.html,
> >> this patch guards the simplification x / y * y == x -> x % y == 0 in
> >> match.pd for vector types by a check for:
> >> 1) Support of the mod optab for vectors OR
> >> 2) Application before vector lowering for non-VL vectors.
> >> 
> >> The patch was bootstrapped and tested with no regression on
> >> aarch64-linux-gnu and x86_64-linux-gnu.
> >> OK for mainline?
> > 
> > -  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE)
> > +  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE
> > +   || (VECTOR_INTEGER_TYPE_P (type)
> > +  && ((optimize_vectors_before_lowering_p ()
> > +   && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
> > +  || target_supports_op_p (type, TRUNC_MOD_EXPR,
> > +   optab_vector
> > 
> > this looks a bit odd, VECTOR_INTEGER_TYPE_P (type) checks the
> > result type of the comparison.  I think the whole condition is
> > better written as
> > 
> > (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE
> >  && (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (@0)))
> >  || !target_supports_op_p (TREE_TYPE (@0), TRUNC_DIV_EXPR,
> >optab_vector)
> >  || target_supports_op_p (TREE_TYPE (@0), TRUNC_MOD_EXPR,
> >optab_vector)))
> > 
> > when we have non-vector mode we're before lowering, likewise when
> > the target doesn't support the division.  Even before lowering
> > we shouldn't replace a supported division (and multiplication)
> > with an unsupported modulo.
> Dear Richard,
> thanks for the review. I updated the patch with your suggestion and 
> re-validated on aarch64 and x86_64.
> Best,
> Jennifer
> 
> This patch guards the simplification x / y * y == x -> x % y == 0 in
> match.pd by a check for:
> 1) Non-vector mode of x OR
> 2) Lack of support for vector division OR
> 3) Support of vector modulo
> 
> The patch was bootstrapped and tested with no regression on
> aarch64-linux-gnu and x86_64-linux-gnu.
> OK for mainline?

OK.

Thanks,
Richard.

> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
>   PR tree-optimization/116831
>   * match.pd: Guard simplification to trunc_mod with check for
>   mod optab support.
> 
> gcc/testsuite/
>   PR tree-optimization/116831
>   * gcc.dg/torture/pr116831.c: New test.
> ---
>  gcc/match.pd|  9 +++++++--
>  gcc/testsuite/gcc.dg/torture/pr116831.c | 10 ++++++++++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116831.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ba83f0f29e6..9b59b5c12f1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5380,8 +5380,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* x / y * y == x -> x % y == 0.  */
>  (simplify
>(eq:c (mult:c (trunc_div:s @0 @1) @1) @0)
> -  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE)
> -(eq (trunc_mod @0 @1) { build_zero_cst (TREE_TYPE (@0)); })))
> +  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE
> +   && (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (@0)))
> +|| !target_supports_op_p (TREE_TYPE (@0), TRUNC_DIV_EXPR,
> +  optab_vector)
> +|| target_supports_op_p (TREE_TYPE (@0), TRUNC_MOD_EXPR,
> + optab_vector)))
> +   (eq (trunc_mod @0 @1) { build_zero_cst (TREE_TYPE (@0)); })))
>  
>  /* ((X /[ex] A) +- B) * A  -->  X +- A * B.  */
>  (for op (plus minus)
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116831.c b/gcc/testsuite/gcc.dg/torture/pr116831.c
> new file mode 100644
> index 000..92b2a130e69
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116831.c
> @@ -0,0 +1,10 @@
> +/* { dg-additional-options "-mcpu=neoverse-v2" { target aarch64*-*-* } } */
> +
> +long a;
> +int b, c;
> +void d (int e[][5], short f[][5][5][5]) 
> +{
> +  for (short g; g; g += 4)
> +a = c ?: e[6][0] % b ? 0 : f[0][0][0][g];
> +}
> +
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] gcc.target/i386: Replace long with long long

2024-10-10 Thread H.J. Lu
Since long is 64-bit for x32, replace long with long long for x32.

* gcc.target/i386/bmi2-pr112526.c: Replace long with long long.
* gcc.target/i386/pr105854.c: Likewise.
* gcc.target/i386/pr112943.c: Likewise.
* gcc.target/i386/pr67325.c: Likewise.
* gcc.target/i386/pr97971.c: Likewise.

-- 
H.J.
From 566b1920ce82e12a4355f4131116d4069536a61f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 10 Oct 2024 17:29:27 +0800
Subject: [PATCH] gcc.target/i386/pr115407.c: Only run for lp64

Since -mcmodel=large is valid only for lp64, run pr115407.c only for
lp64.

	* gcc.target/i386/pr115407.c: Only run for lp64.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/pr115407.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr115407.c b/gcc/testsuite/gcc.target/i386/pr115407.c
index b6cb7a6d9ea..426fb176b5b 100644
--- a/gcc/testsuite/gcc.target/i386/pr115407.c
+++ b/gcc/testsuite/gcc.target/i386/pr115407.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile { target { lp64 } } } */
 /* { dg-options "-O2 -mcmodel=large -mavx512bw" } */
 __attribute__((__vector_size__(64))) char v;
 
-- 
2.46.2



[PATCH] gcc.target/i386/pr115407.c: Only run for lp64

2024-10-10 Thread H.J. Lu
Since -mcmodel=large is valid only for lp64, run pr115407.c only for
lp64.

* gcc.target/i386/pr115407.c: Only run for lp64.

-- 
H.J.
From 51b697d584743db0c954be293341506616c2b803 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 10 Oct 2024 19:00:32 +0800
Subject: [PATCH] g++.target/i386/pr105953.C: Skip for x32

Since -mabi=ms isn't supported for x32, skip g++.target/i386/pr105953.C
for x32.

	* g++.target/i386/pr105953.C: Skip for x32.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/g++.target/i386/pr105953.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/i386/pr105953.C b/gcc/testsuite/g++.target/i386/pr105953.C
index b423d2dfdae..4454c27a509 100644
--- a/gcc/testsuite/g++.target/i386/pr105953.C
+++ b/gcc/testsuite/g++.target/i386/pr105953.C
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! x32 } } } */
 /* { dg-options "-O2 -mavx512vl -mabi=ms" } */
 
 #include "pr100738-1.C"
-- 
2.46.2



[PATCH] g++.target/i386/pr105953.C: Skip for x32

2024-10-10 Thread H.J. Lu
Since -mabi=ms isn't supported for x32, skip g++.target/i386/pr105953.C
for x32.

* g++.target/i386/pr105953.C: Skip for x32.


-- 
H.J.
From 51b697d584743db0c954be293341506616c2b803 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 10 Oct 2024 19:00:32 +0800
Subject: [PATCH] g++.target/i386/pr105953.C: Skip for x32

Since -mabi=ms isn't supported for x32, skip g++.target/i386/pr105953.C
for x32.

	* g++.target/i386/pr105953.C: Skip for x32.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/g++.target/i386/pr105953.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/i386/pr105953.C b/gcc/testsuite/g++.target/i386/pr105953.C
index b423d2dfdae..4454c27a509 100644
--- a/gcc/testsuite/g++.target/i386/pr105953.C
+++ b/gcc/testsuite/g++.target/i386/pr105953.C
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! x32 } } } */
 /* { dg-options "-O2 -mavx512vl -mabi=ms" } */
 
 #include "pr100738-1.C"
-- 
2.46.2



[Patch] Fortran/OpenMP: Warn when mapping polymorphic variables

2024-10-10 Thread Tobias Burnus
GCC does not really handle mapping of polymorphic variables - and OpenMP 
6 will also make it implementation defined. (While explicitly permitting 
it with data-sharing clauses.)


This matches essentially what is in GCC, except that 'private' (and 
other privatizations) are not properly handled.


It also fixes the reported error location, which looked odd: due to
missing gobbling of whitespace, it pointed before the actual location.


Review comments? Remarks, Suggestions?

Tobias

PS: I think we should eventually move to location ranges, i.e. for a
variable or expression, point not only at the first character but at the
whole range. That's supported by the generic GCC diagnostic system. This
can be done stepwise, and I think the expression, the name, and the
symbol matching are obvious candidates.
Fortran/OpenMP: Warn when mapping polymorphic variables

OpenMP (TR13) states for Fortran:
* For map: "If a list item has polymorphic type, the behavior is unspecified."
* "If the firstprivate clause is on a target construct and a variable is of
  polymorphic type, the behavior is unspecified."
which this commit now warns for.

It also fixes a diagnostic issue related to composite constructs containing
'target' and the match locus in gfc_match_omp_variable_list.

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_locus_add_offset): New macro.
	* openmp.cc (gfc_match_omp_variable_list): Use it.
	(resolve_omp_clauses): Diagnose polymorphic mapping.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-14.f90: Fix off-by-one+ dg- column.
	* gfortran.dg/gomp/reduction5.f90: Likewise.
	* gfortran.dg/gomp/reduction6.f90: Likewise.
	* gfortran.dg/goacc/pr92793-1.f90: Likewise.
	* gfortran.dg/gomp/polymorphic-mapping.f90: New test.
	* gfortran.dg/gomp/polymorphic-mapping-2.f90: New test.

 gcc/fortran/gfortran.h |  3 ++
 gcc/fortran/openmp.cc  | 55 +-
 gcc/testsuite/gfortran.dg/goacc/pr92793-1.f90  |  4 +-
 gcc/testsuite/gfortran.dg/gomp/allocate-14.f90 |  4 +-
 .../gfortran.dg/gomp/polymorphic-mapping-2.f90 | 16 +++
 .../gfortran.dg/gomp/polymorphic-mapping.f90   | 49 +++
 gcc/testsuite/gfortran.dg/gomp/reduction5.f90  |  6 +--
 gcc/testsuite/gfortran.dg/gomp/reduction6.f90  |  4 +-
 8 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 917866a7ef0..2e495e80e0d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1083,6 +1083,9 @@ typedef struct gfc_linebuf
 
 #define gfc_linebuf_linenum(LBUF) (LOCATION_LINE ((LBUF)->location))
 
+#define gfc_locus_add_offset(loc, offset) \
+  do { STATIC_ASSERT (offset >= 0); loc.nextc += offset; } while (false)
+
 typedef struct
 {
   gfc_char_t *nextc;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index d9ccae8a11f..bd5dee56ca5 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -424,6 +424,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 
   for (;;)
 {
+  gfc_gobble_whitespace ();
   cur_loc = gfc_current_locus;
 
   m = gfc_match_name (n);
@@ -445,6 +446,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->where = cur_loc;
+	  gfc_locus_add_offset (tail->where, 1);
 	  goto next_item;
 	}
   if (m == MATCH_YES)
@@ -492,6 +494,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail->sym = sym;
 	  tail->expr = expr;
 	  tail->where = cur_loc;
+	  gfc_locus_add_offset (tail->where, 1);
 	  if (reject_common_vars && sym->attr.in_common)
 	{
 	  gcc_assert (allow_common);
@@ -535,6 +538,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->sym = sym;
 	  tail->where = cur_loc;
+	  gfc_locus_add_offset (tail->where, 1);
 	}
 
@@ -9087,10 +9091,30 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		  gfc_error ("List item %qs with allocatable components is not "
 			 "permitted in map clause at %L", n->sym->name,
 			 &n->where);
+		if (!openacc
+		&& (list == OMP_LIST_MAP
+			|| list == OMP_LIST_FROM
+			|| list == OMP_LIST_TO)
+		&& ((n->expr && n->expr->ts.type == BT_CLASS)
+			|| (!n->expr && n->sym->ts.type == BT_CLASS)))
+		  gfc_warning (OPT_Wopenmp,
+			   "Mapping polymorphic list item at %L is "
+			   "unspecified behavior", &n->where);
 		if (list == OMP_LIST_MAP && !openacc)
 		  switch (code->op)
 		{
 		case EXEC_OMP_TARGET:
+		case EXEC_OMP_TARGET_PARALLEL:
+		case EXEC_OMP_TARGET_PARALLEL_DO:
+		case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
+		case EXEC_OMP_TARGET_PARALLEL_LOOP:
+		case EXEC_OMP_TARGET_SIMD:
+		case EXEC_OMP_TARGET_TEAMS:
+		case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
+		case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
+		case EXEC_OMP_TARGE

[pushed] libiberty: Restore build with CP_DEMANGLE_DEBUG defined

2024-10-10 Thread Simon Martin
cp-demangle.c does not build when CP_DEMANGLE_DEBUG is defined since
r13-2887-gb04208895fed34. This trivial patch fixes the issue.

Tested on x86_64-apple-darwin19.6.0 with "make && make check" in
libiberty with CP_DEMANGLE_DEBUG defined.

I'm applying this as obvious.

libiberty/ChangeLog:

* cp-demangle.c (d_dump): Fix compilation when CP_DEMANGLE_DEBUG
is defined.

---
 libiberty/cp-demangle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index fc2cf64e6e0..5b1bd5dff22 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -655,9 +655,9 @@ d_dump (struct demangle_component *dc, int indent)
   return;
 case DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE:
   {
-   char suffix[2] = { dc->u.s_extended_builtin.type->suffix, 0 };
+   char suffix[2] = { dc->u.s_extended_builtin.suffix, 0 };
printf ("builtin type %s%d%s\n", dc->u.s_extended_builtin.type->name,
-   dc->u.s_extended_builtin.type->arg, suffix);
+   dc->u.s_extended_builtin.arg, suffix);
   }
   return;
 case DEMANGLE_COMPONENT_OPERATOR:
-- 
2.44.0



Re: [PATCH v6] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-10 Thread Jason Merrill

On 10/7/24 3:35 PM, Simon Martin wrote:

On 7 Oct 2024, at 18:58, Jason Merrill wrote:

On 10/7/24 11:27 AM, Simon Martin wrote:



/* Now give a warning for all base functions without overriders,
   as they are hidden.  */
for (tree base_fndecl : base_fndecls)
+ {
+   if (!base_fndecl || overriden_base_fndecls.contains (base_fndecl))
+ continue;
+   tree *hider = hidden_base_fndecls.get (base_fndecl);
+   if (hider)


How about looping over hidden_base_fndecls instead of base_fndecls?



Unfortunately it does not work because a given base method can be hidden
by one overload and overridden by another, in which case we don’t want
to warn (see for example AA::foo(int) in Woverloaded-virt7.C). 
to take both collections into account.


Yes, you'd still need to check overridden_base_fndecls.contains, but 
that doesn't seem any different when iterating over hidden_base_fndecls 
instead of base_fndecls.


Or you could first iterate over overridden_base_fndecls and remove its 
elements from hidden_base_fndecls.


Jason



Re: [PATCH] fold fold_truth_andor field merging into ifcombine was: [PATCH] assorted improvements for fold_truth_andor_1)

2024-10-10 Thread Richard Biener
On Thu, Sep 26, 2024 at 10:49 AM Alexandre Oliva  wrote:
>
>
> This patch introduces various improvements to the logic that merges
> field compares, moving it into ifcombine.
>
> Before the patch, we could merge:
>
>   (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)
>
> into something like:
>
>   (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)
>
> if both of A's fields live within the same alignment boundaries, and
> so do B's, at the same relative positions.  Constants may be used
> instead of the object B.
>
> The initial goal of this patch was to enable such combinations when a
> field crossed alignment boundaries, e.g. for packed types.  We can't
> generally access such fields with a single memory access, so when we
> come across such a compare, we will attempt to combine each access
> separately.
>
> Some merging opportunities were missed because of right-shifts,
> compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
> narrowing conversions, especially after earlier merges.  This patch
> introduces handlers for several cases involving these.
>
> The merging of multiple field accesses into wider bitfield-like
> accesses is undesirable to do too early in compilation, so we move it
> from folding to ifcombine, and extend ifcombine to merge noncontiguous
> compares, absent intervening side effects.  VUSEs used to prevent
> ifcombine; that seemed excessively conservative, since relevant side
> effects were already tested, including the possibility of trapping
> loads, so that's removed.
>
> Unlike earlier ifcombine, when merging noncontiguous compares the
> merged compare must replace the earliest compare, which may require
> moving up the DEFs that contributed to the latter compare.
>
> When it is the second of a noncontiguous pair of compares that first
> accesses a word, we may merge the first compare with part of the
> second compare that refers to the same word, keeping the compare of
> the remaining bits at the spot where the second compare used to be.
>
> Handling compares with non-constant fields was somewhat generalized
> from what fold used to do, now handling non-adjacent fields, even if a
> field of one object crosses an alignment boundary but the other
> doesn't.

Thanks for working on this.  There are #if 0 portions in the patch - did you
send the correct version?

>
> The -Wno-error for toplev.o on rs6000 is because of toplev.c's:
>
>   if ((flag_sanitize & SANITIZE_ADDRESS)
>   && !FRAME_GROWS_DOWNWARD)
>
> and rs6000.h's:
>
> #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0   \
>   || (flag_sanitize & SANITIZE_ADDRESS) != 0)
>
> The mutually exclusive conditions involving flag_sanitize are now
> noticed and reported by ifcombine's warning on mutually exclusive
> compares.  i386's needs -Wno-error for insn-attrtab.o for similar
> reasons.

I wonder if we can check whether the spelling locations differ and
suppress the diagnostic when macro expansions were involved?

Adding -Wno-error should be the last resort and I suspect such cases
happen in user code as well?

More comments inline.

>
> for  gcc/ChangeLog
>
> * fold-const.cc (make_bit_field): Export.
> (all_ones_mask_p): Drop.
> (unextend, decode_field_reference, fold_truth_andor_1): Move
> field compare merging logic...
> * gimple-fold.cc: ... here.
> (ssa_is_substitutable_p, is_cast_p, is_binop_p): New.
> (prepare_xor, follow_load): New.
> (compute_split_boundary_from_align): New.
> (make_bit_field_load, build_split_load): New.
> (reuse_split_load, mergeable_loads_p): New.
> (fold_truth_andor_maybe_separate): New.
> * tree-ssa-ifcombine.cc: Include bitmap.h.
> (constant_condition_p): New.
> (recognize_if_then_else_nc, recognize_if_succs): New.
> (bb_no_side_effects_p): Don't reject VUSEs.
> (update_profile_after_ifcombine): Adjust for noncontiguous
> merges.
> (ifcombine_mark_ssa_name): New.
> (struct ifcombine_mark_ssa_name_t): New.
> (ifcombine_mark_ssa_name_walk): New.
> (ifcombine_replace_cond): Extended for noncontiguous merges
> after factoring out of...
> (ifcombine_ifandif): ... this.  Drop result_inv arg.  Try
> fold_truth_andor_maybe_separate.
> (tree_ssa_ifcombine_bb_1): Add outer_succ_bb arg.  Call
> recognize_if_then_else_nc.  Adjust ifcombine_ifandif calls.
> (tree_ssa_ifcombine_bb): Return the earliest affected block.
> Call recognize_if_then_else_nc.  Try noncontiguous blocks.
> (pass_tree_ifcombine::execute): Retry affected blocks.
> * config/i386/t-i386 (insn-attrtab.o-warn): Disable errors.
> * config/rs6000/t-rs6000 (toplev.o-warn): Likewise.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/field-merge-1.c: New.
> * gcc.dg/field-merge-2.c: New.
> * gcc.dg/field-merge-3

Re: [PATCH] gcc.target/i386: Replace long with long long

2024-10-10 Thread H.J. Lu
On Thu, Oct 10, 2024 at 7:14 PM H.J. Lu  wrote:
>
> Since long is 64-bit for x32, replace long with long long for x32.
>
> * gcc.target/i386/bmi2-pr112526.c: Replace long with long long.
> * gcc.target/i386/pr105854.c: Likewise.
> * gcc.target/i386/pr112943.c: Likewise.
> * gcc.target/i386/pr67325.c: Likewise.
> * gcc.target/i386/pr97971.c: Likewise.
>
> --
> H.J.

This time is the correct patch.

-- 
H.J.
From b1602a057dd8a0d93d9edfb74d9fe114242e725f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 10 Oct 2024 17:22:36 +0800
Subject: [PATCH] gcc.target/i386: Replace long with long long

Since long is 64-bit for x32, replace long with long long for x32.

	* gcc.target/i386/bmi2-pr112526.c: Replace long with long long.
	* gcc.target/i386/pr105854.c: Likewise.
	* gcc.target/i386/pr112943.c: Likewise.
	* gcc.target/i386/pr67325.c: Likewise.
	* gcc.target/i386/pr97971.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/bmi2-pr112526.c | 7 ++++---
 gcc/testsuite/gcc.target/i386/pr105854.c  | 2 +-
 gcc/testsuite/gcc.target/i386/pr112943.c  | 4 ++--
 gcc/testsuite/gcc.target/i386/pr67325.c   | 2 +-
 gcc/testsuite/gcc.target/i386/pr97971.c   | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/bmi2-pr112526.c b/gcc/testsuite/gcc.target/i386/bmi2-pr112526.c
index 7a3c6f14982..4bcedd5ca06 100644
--- a/gcc/testsuite/gcc.target/i386/bmi2-pr112526.c
+++ b/gcc/testsuite/gcc.target/i386/bmi2-pr112526.c
@@ -5,9 +5,10 @@
 #include "bmi2-check.h"
 
 __attribute__((noipa)) void
-foo (unsigned long x, unsigned __int128 *y, unsigned long z, unsigned long *w)
+foo (unsigned long long x, unsigned __int128 *y, unsigned long long z,
+ unsigned long long *w)
 {
-  register unsigned long a __asm ("%r10") = x + z;
+  register unsigned long long a __asm ("%r10") = x + z;
   register unsigned __int128 b __asm ("%r8") = ((unsigned __int128) a) * 257342423UL;
   asm volatile ("" : "+r" (b));
   asm volatile ("" : "+d" (a));
@@ -19,7 +20,7 @@ static void
 bmi2_test ()
 {
   unsigned __int128 y;
-  unsigned long w;
+  unsigned long long w;
   foo (10268318293806702989UL, &y, 4702524958196331333UL, &w);
   if (y != ((((unsigned __int128) 0xc72d2c9UL) << 64) | 0x9586adfdc95b225eUL)
   || w != 14970843252003034322UL)
diff --git a/gcc/testsuite/gcc.target/i386/pr105854.c b/gcc/testsuite/gcc.target/i386/pr105854.c
index 36a8080b8a7..326485c2056 100644
--- a/gcc/testsuite/gcc.target/i386/pr105854.c
+++ b/gcc/testsuite/gcc.target/i386/pr105854.c
@@ -29,5 +29,5 @@ foo (void)
   d += 0.;
   U u0 = u + u + u1 + (U) d;
   V v0 = ((X)u0)[0] + v + v;
-  t = (T) (long) (__int128) v0 + t + t + t1;
+  t = (T) (long long) (__int128) v0 + t + t + t1;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr112943.c b/gcc/testsuite/gcc.target/i386/pr112943.c
index b1840a1f462..de2771a3cab 100644
--- a/gcc/testsuite/gcc.target/i386/pr112943.c
+++ b/gcc/testsuite/gcc.target/i386/pr112943.c
@@ -14,7 +14,7 @@ typedef _Decimal64 d64;
 char foo0_u8_0;
 v8u8 foo0_v8u8_0;
 __attribute__((__vector_size__(sizeof(char)))) char foo0_v8s8_0;
-__attribute__((__vector_size__(sizeof(long)))) unsigned long v64u64_0;
+__attribute__((__vector_size__(sizeof(long long)))) unsigned long long v64u64_0;
 _Float16 foo0_f16_0;
 v128f16 foo0_v128f16_0;
 double foo0_f64_0;
@@ -56,7 +56,7 @@ void foo0() {
 		})v16u8_r)
 .b +
   foo0_v8u8_0 + v8u8_1 + foo0_v8s8_0;
-long u64_r = u128_r + foo0_f64_0 + (unsigned long)foo0__0;
+long long u64_r = u128_r + foo0_f64_0 + (unsigned long long)foo0__0;
 short u16_r = u64_r + foo0_f16_0;
 char u8_r = u16_r + foo0_u8_0;
 *foo0_ret = v8u8_r + u8_r;
diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c b/gcc/testsuite/gcc.target/i386/pr67325.c
index c3c1e4c5b4d..7fe0fd7232b 100644
--- a/gcc/testsuite/gcc.target/i386/pr67325.c
+++ b/gcc/testsuite/gcc.target/i386/pr67325.c
@@ -2,6 +2,6 @@
 /* { dg-options "-O2" } */
 /* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
 
-int f(long*l){
+int f(long long*l){
   return *l>>32;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr97971.c b/gcc/testsuite/gcc.target/i386/pr97971.c
index d07a31097c6..031e4c94de8 100644
--- a/gcc/testsuite/gcc.target/i386/pr97971.c
+++ b/gcc/testsuite/gcc.target/i386/pr97971.c
@@ -5,7 +5,7 @@
 int
 foo (void)
 {
-  register _Complex long a asm ("rax");
+  register _Complex long long a asm ("rax");
   register int b asm ("rdx");
   asm ("# %0 %1" : "=&r" (a), "=r" (b));	/* { dg-error "inconsistent operand constraints in an 'asm'" } */
   return a;
-- 
2.46.2



[PATCH] phiopt: Remove candorest variable return instead

2024-10-10 Thread Andrew Pinski
After r15-3560-gb081e6c860eb9688d24365d39, the setting of candorest
followed by the break can simply become a return, since this code is
inside a lambda now.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (pass_phiopt::execute): Remove candorest
and return instead of setting candorest.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 43b65b362a3..f3ee3a80c0f 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -4322,7 +4322,6 @@ pass_phiopt::execute (function *)
}
 
   gimple_stmt_iterator gsi;
-  bool candorest = true;
 
   /* Check that we're looking for nested phis.  */
   basic_block merge = diamond_p ? EDGE_SUCC (bb2, 0)->dest : bb2;
@@ -4338,15 +4337,11 @@ pass_phiopt::execute (function *)
tree arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1) == 2)
  {
-   candorest = false;
cfgchanged = true;
-   break;
+   return;
  }
  }
 
-  if (!candorest)
-   return;
-
   gphi *phi = single_non_singleton_phi_for_edges (phis, e1, e2);
   if (!phi)
return;
-- 
2.34.1



[PATCH] middle-end: [PR middle-end/116926] Allow widening optabs for vec-mode -> scalar-mode

2024-10-10 Thread Victor Do Nascimento
The recent refactoring of the dot_prod optab to convert-type exposed a
limitation in how `find_widening_optab_handler_and_mode' is currently
implemented, owing to the fact that, while the function expects the

  GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode)

condition to hold, the c6x backend implements a dot product from V2HI
to SI, which triggers an ICE.

Consequently, this patch adds some logic to allow widening optabs
which accumulate vector elements to a single scalar.
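
For illustration (my example, not from the PR): the classic dot-product
kernel below has the shape that maps to such a vector-to-scalar widening
operation, e.g. V2HI inputs accumulating into an SI result on c6x.

int
dotprod (const short *a, const short *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];   /* widening multiply-accumulate into a scalar */
  return sum;
}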

Regression tested on x86_64 and aarch64 with no new regressions.
Fixes failing unit tests on c6x, as validated for the tic6x-unknown-elf
target.

Ok for master?

gcc/ChangeLog:

PR middle-end/116926
* optabs-query.cc (find_widening_optab_handler_and_mode): Add
handling of vector -> scalar optab handling.
---
 gcc/optabs-query.cc | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index c3134d6a2ce..8a9092ffec7 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -485,6 +485,19 @@ find_widening_optab_handler_and_mode (optab op, machine_mode to_mode,
   if (GET_MODE_CLASS (limit_mode) == MODE_PARTIAL_INT)
limit_mode = GET_MODE_WIDER_MODE (limit_mode).require ();
 }
+  else if (GET_MODE_CLASS (from_mode) != GET_MODE_CLASS (to_mode))
+{
+  gcc_checking_assert (VECTOR_MODE_P (from_mode)
+  && !VECTOR_MODE_P (to_mode)
+  && GET_MODE_INNER (from_mode) < to_mode);
+  enum insn_code handler = convert_optab_handler (op, to_mode, from_mode);
+  if (handler != CODE_FOR_nothing)
+   {
+ if (found_mode)
+   *found_mode = from_mode;
+ return handler;
+   }
+}
   else
 gcc_checking_assert (GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode)
 && from_mode < to_mode);
-- 
2.34.1



Re: [PATCH] aarch64: Alter pr116258.c test to correct for big endian.

2024-10-10 Thread Richard Sandiford
Richard Ball  writes:
> The test at pr116258.c fails on big endian targets,
> this is because the test checks that the index of a floating
> point multiply is 0, which is correct only for little endian.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/116258
>   * gcc.target/aarch64/pr116258.c:
>   Alter test to add big-endian support.

OK, thanks.

Richard

>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr116258.c b/gcc/testsuite/gcc.target/aarch64/pr116258.c
> index e727ad4b72a5b8fe86e295d6e695d46203cd082e..5b63de25b7bf6dfd5f7b71cefcb27cabb42ac99e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr116258.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr116258.c
> @@ -12,6 +12,7 @@
>return (x + h(t));
>  }
>  
> -/* { dg-final { scan-assembler-times "\\\[0\\\]" 1 } } */
> +/* { dg-final { scan-assembler-times "\\\[0\\\]" 1 { target { 
> aarch64_little_endian } } } } */
> +/* { dg-final { scan-assembler-times "\\\[3\\\]" 1 { target { 
> aarch64_big_endian } } } } */
>  /* { dg-final { scan-assembler-not "dup\t" } } */
>  /* { dg-final { scan-assembler-not "ins\t" } } */


[PATCH v4 2/2] arm: [MVE intrinsics] Improve vdupq_n implementation

2024-10-10 Thread Christophe Lyon
Hi,

v4 of patch 2/2 fixes a small mistake in 3 testcases, by relaxing the
expected result register from q0 to q[0-9]+ to account for codegen
differences depending on whether the test is compiled with
-mfloat-abi=softfp or -mfloat-abi=hard.

I repost patch 1/2 (already approved) so that Linaro CI can apply patch 2/2.

Thanks,

Christophe


This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  This enables the compiler to
generate better code sequences (for instance using vmov when
possible).

The patch renames the existing mve_vdup<mode> pattern into
@mve_vdupq_n<mode>, and removes the now useless
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<mode> ones.

As a side-effect, it needs to update the mve_unpredicated_insn
predicates in @mve_<mve_insn>q_m_n_<mode> and
@mve_<mve_insn>q_m_n_f<mode>.

Using vec_duplicates means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
mov r0, #imm
vdup.xx q0,r0

or
ldr r0, .L4
vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
vldr.64 d0, .L4
vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:

* The signed versions of vdupq_* tests lack a version with an
immediate argument.  This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
new tests this patch introduces.  Note there's no need for s8/u8 tests
because 8-bit immediates always comply with imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
movw r3, #15462
vcmp.f16 eq, q0, r3
instead of the previous:
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  
Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for COST_DOUBLE. Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup<mode>): Rename into ...
(@mve_vdupq_n<mode>): ... this.
(@mve_<mve_insn>q_n_f<mode>): Delete.
(@mve_<mve_insn>q_n_<mode>): Delete.
(@mve_<mve_insn>q_m_n_<mode>): Update mve_unpredicated_insn
attribute.
(@mve_<mve_insn>q_m_n_f<mode>): Likewise.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrins

[PATCH v4 1/2] arm: [MVE intrinsics] fix vdup iterator

2024-10-10 Thread Christophe Lyon
[Reposting these 2 patches as patchwork didn't pick them.]

This patch fixes a bug where the mode iterator for mve_vdup<mode>
should be MVE_VLD_ST instead of MVE_vecs: V2DI and V2DF (thus vdup.64)
are not supported by MVE.

2024-07-02  Jolen Li  
Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vdup<mode>): Fix mode iterator.
---
 gcc/config/arm/mve.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 4b4d6298ffb..afe5fba698c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -95,8 +95,8 @@ (define_insn "*mve_mov<mode>"
(set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")])
 
 (define_insn "mve_vdup<mode>"
-  [(set (match_operand:MVE_vecs 0 "s_register_operand" "=w")
-   (vec_duplicate:MVE_vecs
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
+   (vec_duplicate:MVE_VLD_ST
   (match_operand:<V_elem> 1 "s_register_operand" "r")))]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
   "vdup.\t%q0, %1"
-- 
2.34.1



Re: [PATCH v6 1/2] aarch64: Add SVE2 faminmax intrinsics

2024-10-10 Thread Richard Sandiford
 writes:
> +/*
> +** amax_0_f16_z_tied1:
> +**   ...
> +**   movprfx z0, z31
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_0_f16_z_tied1, svfloat16_t,
> + z0 = svamax_n_f16_z (p0, z0, 0),
> + z0 = svamax_z (p0, z0, 0))

We shouldn't match z31 here.  Probably best just to drop the movprfx
altogether, like for:

> +
> +/*
> +** amax_0_f16_z_untied:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_0_f16_z_untied, svfloat16_t,
> + z0 = svamax_n_f16_z (p0, z1, 0),
> + z0 = svamax_z (p0, z1, 0))

...this.

Same for the other tests.

OK for trunk with that change, thanks.

Richard

> +
> +/*
> +** amax_1_f16_z_tied1:
> +**   ...
> +**   movprfx z0\.h, p0/z, z0\.h
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_1_f16_z_tied1, svfloat16_t,
> + z0 = svamax_n_f16_z (p0, z0, 1),
> + z0 = svamax_z (p0, z0, 1))
> +
> +/*
> +** amax_1_f16_z_untied:
> +**   ...
> +**   movprfx z0\.h, p0/z, z0\.h
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_1_f16_z_untied, svfloat16_t,
> + z0 = svamax_n_f16_z (p0, z1, 1),
> + z0 = svamax_z (p0, z1, 1))
> +
> +/*
> +** amax_2_f16_z:
> +**   ...
> +**   movprfx z0\.h, p0/z, z0\.h
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_2_f16_z, svfloat16_t,
> + z0 = svamax_n_f16_z (p0, z0, 2),
> + z0 = svamax_z (p0, z0, 2))
> +
> +/*
> +** amax_f16_x_tied1:
> +**   famax   z0\.h, p0/m, z0\.h, z1\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_f16_x_tied1, svfloat16_t,
> + z0 = svamax_f16_x (p0, z0, z1),
> + z0 = svamax_x (p0, z0, z1))
> +
> +/*
> +** amax_f16_x_tied2:
> +**   famax   z0\.h, p0/m, z0\.h, z1\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_f16_x_tied2, svfloat16_t,
> + z0 = svamax_f16_x (p0, z1, z0),
> + z0 = svamax_x (p0, z1, z0))
> +
> +/*
> +** amax_f16_x_untied:
> +** (
> +**   movprfx z0, z1
> +**   famax   z0\.h, p0/m, z0\.h, z2\.h
> +** |
> +**   movprfx z0, z2
> +**   famax   z0\.h, p0/m, z0\.h, z1\.h
> +** )
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_f16_x_untied, svfloat16_t,
> + z0 = svamax_f16_x (p0, z1, z2),
> + z0 = svamax_x (p0, z1, z2))
> +
> +/*
> +** amax_h4_f16_x_tied1:
> +**   mov (z[0-9]+\.h), h4
> +**   famax   z0\.h, p0/m, z0\.h, \1
> +**   ret
> +*/
> +TEST_UNIFORM_ZD (amax_h4_f16_x_tied1, svfloat16_t, __fp16,
> +  z0 = svamax_n_f16_x (p0, z0, d4),
> +  z0 = svamax_x (p0, z0, d4))
> +
> +/*
> +** amax_h4_f16_x_untied:
> +**   mov z0\.h, h4
> +**   famax   z0\.h, p0/m, z0\.h, z1\.h
> +**   ret
> +*/
> +TEST_UNIFORM_ZD (amax_h4_f16_x_untied, svfloat16_t, __fp16,
> +  z0 = svamax_n_f16_x (p0, z1, d4),
> +  z0 = svamax_x (p0, z1, d4))
> +
> +/*
> +** amax_0_f16_x_tied1:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_0_f16_x_tied1, svfloat16_t,
> + z0 = svamax_n_f16_x (p0, z0, 0),
> + z0 = svamax_x (p0, z0, 0))
> +
> +/*
> +** amax_0_f16_x_untied:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_0_f16_x_untied, svfloat16_t,
> + z0 = svamax_n_f16_x (p0, z1, 0),
> + z0 = svamax_x (p0, z1, 0))
> +
> +/*
> +** amax_1_f16_x_tied1:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_1_f16_x_tied1, svfloat16_t,
> + z0 = svamax_n_f16_x (p0, z0, 1),
> + z0 = svamax_x (p0, z0, 1))
> +
> +/*
> +** amax_1_f16_x_untied:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_1_f16_x_untied, svfloat16_t,
> + z0 = svamax_n_f16_x (p0, z1, 1),
> + z0 = svamax_x (p0, z1, 1))
> +
> +/*
> +** amax_2_f16_x_tied1:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_2_f16_x_tied1, svfloat16_t,
> + z0 = svamax_n_f16_x (p0, z0, 2),
> + z0 = svamax_x (p0, z0, 2))
> +
> +/*
> +** amax_2_f16_x_untied:
> +**   ...
> +**   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
> +**   ret
> +*/
> +TEST_UNIFORM_Z (amax_2_f16_x_untied, svfloat16_t,
> + z0 = svamax_n_f16_x (p0, z1, 2),
> + z0 = svamax_x (p0, z1, 2))
> +
> +/*
> +** ptrue_amax_f16_x_tied1:
> +**   ...
> +**   ptrue   p[0-9]+\.b[^\n]*
> +**   ...
> +**   ret
> +*/
> +TEST_UNIFORM_Z (ptrue_amax_f16_x_tied1, svfloat16_t,
> + z0 = svamax_f16_x (svptrue_b16 (), z0, z1),
> + z0 = svamax_x (svptrue_b16 (), z0, z1))
> +
> +/*
> +** ptrue_amax_f16_x_tied2:
> +**   ...
> +**   ptrue   p[0-9]+\.b[^\n]*
> +**   ...
> +**   ret
> +*/
> +TEST_UNIFORM_Z (ptrue_amax_f16_x_tied2, svfloat16_t,
> + 

[PATCH] Always set SECTION_RELRO for .data.rel.ro{,.local} [PR116887]

2024-10-10 Thread Xi Ruoyao
At least two ports (hppa and loongarch) need to set SECTION_RELRO for
.data.rel.ro{,.local} in section_type_flags (PR52999 and PR116887), and
I cannot see a reason not to just set it in the generic code.

With this applied we can also remove the hppa-specific
pa_section_type_flags in a future patch.

gcc/ChangeLog:

PR target/116887
* varasm.cc (default_section_type_flags): Always set
SECTION_RELRO if name is .data.rel.ro{,.local}.

gcc/testsuite/ChangeLog:

PR target/116887
* gcc.dg/pr116887.c: New test.
---

Bootstrapped & regtested on x86_64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.dg/pr116887.c | 23 +++
 gcc/varasm.cc   | 10 --
 2 files changed, 27 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr116887.c

diff --git a/gcc/testsuite/gcc.dg/pr116887.c b/gcc/testsuite/gcc.dg/pr116887.c
new file mode 100644
index 000..b7255e09a18
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr116887.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-fpic" } */
+
+struct link_map
+{
+  struct link_map *l_next;
+};
+struct rtld_global
+{
+  struct link_map *_ns_loaded;
+  char buf[4096];
+  struct link_map _dl_rtld_map;
+};
+extern struct rtld_global _rtld_global;
+static int _dlfo_main __attribute__ ((section (".data.rel.ro"), used));
+void
+_dlfo_process_initial (int ns)
+{
+  for (struct link_map *l = _rtld_global._ns_loaded; l != ((void *)0);
+   l = l->l_next)
+if (l == &_rtld_global._dl_rtld_map)
+  asm ("");
+}
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 4426e7ce6c6..aa450092ce5 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -6863,6 +6863,9 @@ default_section_type_flags (tree decl, const char *name, int reloc)
 
   if (decl && TREE_CODE (decl) == FUNCTION_DECL)
 flags = SECTION_CODE;
+  else if (strcmp (name, ".data.rel.ro") == 0
+  || strcmp (name, ".data.rel.ro.local") == 0)
+flags = SECTION_WRITE | SECTION_RELRO;
   else if (decl)
 {
   enum section_category category
@@ -6876,12 +6879,7 @@ default_section_type_flags (tree decl, const char *name, int reloc)
flags = SECTION_WRITE;
 }
   else
-{
-  flags = SECTION_WRITE;
-  if (strcmp (name, ".data.rel.ro") == 0
- || strcmp (name, ".data.rel.ro.local") == 0)
-   flags |= SECTION_RELRO;
-}
+flags = SECTION_WRITE;
 
   if (decl && DECL_P (decl) && DECL_COMDAT_GROUP (decl))
 flags |= SECTION_LINKONCE;
-- 
2.47.0



Re: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions

2024-10-10 Thread Richard Sandiford
Jennifer Schmitz  writes:
> This patch implements the optabs reduc_and_scal_<mode>,
> reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
> V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise logical
> vector reduction operations.
> Previously, either only vector registers or only general purpose registers 
> (GPR)
> were used. Now, vector registers are used for the reduction from 128 to 64 
> bits;
> 64-bit GPR are used for the reduction from 64 to 32 bits; and 32-bit GPR are 
> used
> for the rest of the reduction steps.
>
> For example, the test case (V8HI)
> int16_t foo (int16_t *a)
> {
>   int16_t b = -1;
>   for (int i = 0; i < 8; ++i)
> b &= a[i];
>   return b;
> }
>
> was previously compiled to (-O2):
> foo:
>   ldr q0, [x0]
>   movi v30.4s, 0
>   ext v29.16b, v0.16b, v30.16b, #8
>   and v29.16b, v29.16b, v0.16b
>   ext v31.16b, v29.16b, v30.16b, #4
>   and v31.16b, v31.16b, v29.16b
>   ext v30.16b, v31.16b, v30.16b, #2
>   and v30.16b, v30.16b, v31.16b
>   umov w0, v30.h[0]
>   ret
>
> With patch, it is compiled to:
> foo:
>   ldr q31, [x0]
>   ext v30.16b, v31.16b, v31.16b, #8
>   and v31.8b, v30.8b, v31.8b
>   fmov x0, d31
>   and x0, x0, x0, lsr 32
>   and w0, w0, w0, lsr 16
>   ret
>
> For modes V4SI and V2DI, the pattern was not implemented, because the
> current codegen (using only base instructions) is already efficient.
>
> Note that the PR initially suggested to use SVE reduction ops. However,
> they have higher latency than the proposed sequence, which is why using
> neon and base instructions is preferable.
>
> Test cases were added for 8/16-bit integers for all implemented modes and all
> three operations to check the produced assembly.
>
> We also added [istarget aarch64*-*-*] to the selector vect_logical_reduc,
> because for aarch64 vector types, either the logical reduction optabs are
> implemented or the codegen for reduction operations is good as it is.
> This was motivated by failure of a scan-tree-dump directive in the test cases
> gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   PR target/113816
> 	* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
>   Implement for logical bitwise operations for VDQV_E.
>
> gcc/testsuite/
>   PR target/113816
>   * lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
>   * gcc.target/aarch64/simd/logical_reduc.c: New test.
>   * gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
> ---
>  gcc/config/aarch64/aarch64-simd.md|  55 +
>  .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++
>  .../gcc.target/aarch64/vect-reduc-or_1.c  |   2 +-
>  gcc/testsuite/lib/target-supports.exp |   4 +-
>  4 files changed, 267 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 23c03a96371..00286b8b020 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3608,6 +3608,61 @@
>}
>  )
>  
> +;; Emit a sequence for bitwise logical reductions over vectors for V8QI, 
> V16QI,
> +;; V4HI, and V8HI modes.  The reduction is achieved by iteratively operating
> +;; on the two halves of the input.
> +;; If the input has 128 bits, the first operation is performed in vector
> +;; registers.  From 64 bits down, the reduction steps are performed in 
> general
> +;; purpose registers.
> +;; For example, for V8HI and operation AND, the intended sequence is:
> +;; EXT  v1.16b, v0.16b, v0.16b, #8
> +;; AND  v0.8b, v1.8b, v0.8b
> +;; FMOV x0, d0
> +;; AND  x0, x0, x0, 32
> +;; AND  w0, w0, w0, 16
> +;;
> +;; For V8QI and operation AND, the sequence is:
> +;; AND  x0, x0, x0, lsr 32
> +;; AND  w0, w0, w0, lsr, 16
> +;; AND  w0, w0, w0, lsr, 8
> +
> +(define_expand "reduc_<optab>_scal_<mode>"
> + [(match_operand:<VEL> 0 "register_operand")
> +  (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
> +  "TARGET_SIMD"
> +  {
> +    rtx dst = operands[1];
> +    rtx tdi = gen_reg_rtx (DImode);
> +    rtx tsi = lowpart_subreg (SImode, tdi, DImode);
> +    rtx op1_lo;
> +    if (known_eq (GET_MODE_SIZE (<MODE>mode), 16))
> +      {
> +        rtx t0 = gen_reg_rtx (<MODE>mode);
> +        rtx t1 = gen_reg_rtx (DImode);
> +        rtx t2 = gen_reg_rtx (DImode);
> +        rtx idx = GEN_INT (8 / GET_MODE_UNIT_SIZE (<MODE>mode));
> +        emit_insn (gen_aarch64_ext<mode> (t0, dst, dst, idx));
> +        op1_lo = lowpart_subreg (V2DImode, dst, <MODE>mode);
> +        rtx t0_lo = lowpart_subreg (V2DImode, t0, <MODE>mode);
> +        emit_insn (gen_aarch64_get_lanev2di (t1, op1_lo, GEN_INT (0)));
> +        emit_insn (gen_aarch64_get_la

[PATCH] Add 'cobol' to Makefile.def

2024-10-10 Thread James K. Lowden
Hello, 

I just joined the list to begin contributing patches for the COBOL
front end we've been touting for the last 4 years.  It's my first
attempt.  Please tell me if you'd like to see something different.  

What follows mimics to some degree the output of "git format-patch".  I
don't think I can use that command literally, but if I can and that
would be better, I'm happy to follow instructions.  

My plan is to send patches for one file at a time, starting from
the top of the tree.  Very soon we'll get to the front end proper, in
gcc/cobol.  After we work our way through those, there is a runtime
library.  After that I have tests and documentation.  And then we'll be
done. Right?  ;-)  

This patch adds "cobol" as a language and subdirectory.  

--jkl


>From 216ec55cdb2ad95728612d4b9b5550324e9b506fpatch 4 Oct 2024 12:01:22 -0400
From: "James K. Lowden" 
Date: Thu Oct 10 14:28:48 EDT 2024
Subject: [PATCH]  Add 'cobol' to 1 file

---
a/Makefile.def | 7 +++++++
1 file changed, 7 insertions(+), 0 deletions(-)
diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..1192e852c7a 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
 target_modules = { module= libitm; lib_path=.libs; };
 target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
 target_modules = { module= libgrust; };
+target_modules = { module= libgcobol; };
 
 // These are (some of) the make targets to be done in each subdirectory.
 // Not all; these are the ones which don't have special options.
@@ -324,6 +325,7 @@ flags_to_pass = { flag= CXXFLAGS_FOR_TARGET ; };
 flags_to_pass = { flag= DLLTOOL_FOR_TARGET ; };
 flags_to_pass = { flag= DSYMUTIL_FOR_TARGET ; };
 flags_to_pass = { flag= FLAGS_FOR_TARGET ; };
+flags_to_pass = { flag= GCOBOL_FOR_TARGET ; };
 flags_to_pass = { flag= GFORTRAN_FOR_TARGET ; };
 flags_to_pass = { flag= GOC_FOR_TARGET ; };
 flags_to_pass = { flag= GOCFLAGS_FOR_TARGET ; };
@@ -655,6 +657,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; no_c=true; };
 // built newlib on some targets (e.g. Cygwin).  It still needs
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
+lang_env_dependencies = { module=libgcobol; cxx=true; };
 
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
@@ -690,6 +693,7 @@ dependencies = { module=install-target-libvtv; on=install-target-libgcc; };
 dependencies = { module=install-target-libitm; on=install-target-libgcc; };
 dependencies = { module=install-target-libobjc; on=install-target-libgcc; };
 dependencies = { module=install-target-libstdc++-v3; on=install-target-libgcc; };
+dependencies = { module=install-target-libgcobol; on=install-target-libstdc++-v3; };
 
 // Target modules in the 'src' repository.
 lang_env_dependencies = { module=libtermcap; };
@@ -727,6 +731,8 @@ languages = { language=d;   gcc-check-target=check-d;
lib-check-target=check-target-libphobos; };
 languages = { language=jit;gcc-check-target=check-jit; };
 languages = { language=rust;   gcc-check-target=check-rust; };
+languages = { language=cobol;  gcc-check-target=check-cobol;
+   lib-check-target=check-target-libgcobol; };
 
 // Toplevel bootstrap
 bootstrap_stage = { id=1 ; };





RE: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions

2024-10-10 Thread Tamar Christina
Hi Jennifer,

> -Original Message-
> From: Jennifer Schmitz 
> Sent: Thursday, October 10, 2024 9:27 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Kyrylo Tkachov ; Tamar
> Christina 
> Subject: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector
> reductions
> 
> This patch implements the optabs reduc_and_scal_<mode>,
> reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
> V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise logical
> vector reduction operations.
> Previously, either only vector registers or only general purpose registers 
> (GPR)
> were used. Now, vector registers are used for the reduction from 128 to 64 
> bits;
> 64-bit GPR are used for the reduction from 64 to 32 bits; and 32-bit GPR are 
> used
> for the rest of the reduction steps.
> 
> For example, the test case (V8HI)
> int16_t foo (int16_t *a)
> {
>   int16_t b = -1;
>   for (int i = 0; i < 8; ++i)
> b &= a[i];
>   return b;
> }
> 
> was previously compiled to (-O2):
> foo:
>   ldr q0, [x0]
>   movi v30.4s, 0
>   ext v29.16b, v0.16b, v30.16b, #8
>   and v29.16b, v29.16b, v0.16b
>   ext v31.16b, v29.16b, v30.16b, #4
>   and v31.16b, v31.16b, v29.16b
>   ext v30.16b, v31.16b, v30.16b, #2
>   and v30.16b, v30.16b, v31.16b
>   umov w0, v30.h[0]
>   ret
> 
> With patch, it is compiled to:
> foo:
>   ldr q31, [x0]
>   ext v30.16b, v31.16b, v31.16b, #8
>   and v31.8b, v30.8b, v31.8b
>   fmov x0, d31
>   and x0, x0, x0, lsr 32
>   and w0, w0, w0, lsr 16
>   ret
> 
> For modes V4SI and V2DI, the pattern was not implemented, because the
> current codegen (using only base instructions) is already efficient.
> 
> Note that the PR initially suggested to use SVE reduction ops. However,
> they have higher latency than the proposed sequence, which is why using
> neon and base instructions is preferable.
> 
> Test cases were added for 8/16-bit integers for all implemented modes and all
> three operations to check the produced assembly.
> 
> We also added [istarget aarch64*-*-*] to the selector vect_logical_reduc,
> because for aarch64 vector types, either the logical reduction optabs are
> implemented or the codegen for reduction operations is good as it is.
> This was motivated by failure of a scan-tree-dump directive in the test cases
> gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
>   PR target/113816
> 	* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
>   Implement for logical bitwise operations for VDQV_E.
> 
> gcc/testsuite/
>   PR target/113816
>   * lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
>   * gcc.target/aarch64/simd/logical_reduc.c: New test.
>   * gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
> ---
>  gcc/config/aarch64/aarch64-simd.md|  55 +
>  .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++
>  .../gcc.target/aarch64/vect-reduc-or_1.c  |   2 +-
>  gcc/testsuite/lib/target-supports.exp |   4 +-
>  4 files changed, 267 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-
> simd.md
> index 23c03a96371..00286b8b020 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3608,6 +3608,61 @@
>}
>  )
> 
> +;; Emit a sequence for bitwise logical reductions over vectors for V8QI, 
> V16QI,
> +;; V4HI, and V8HI modes.  The reduction is achieved by iteratively operating
> +;; on the two halves of the input.
> +;; If the input has 128 bits, the first operation is performed in vector
> +;; registers.  From 64 bits down, the reduction steps are performed in 
> general
> +;; purpose registers.
> +;; For example, for V8HI and operation AND, the intended sequence is:
> +;; EXT  v1.16b, v0.16b, v0.16b, #8
> +;; AND  v0.8b, v1.8b, v0.8b
> +;; FMOV x0, d0
> +;; AND  x0, x0, x0, 32
> +;; AND  w0, w0, w0, 16
> +;;
> +;; For V8QI and operation AND, the sequence is:
> +;; AND  x0, x0, x0, lsr 32
> +;; AND  w0, w0, w0, lsr, 16
> +;; AND  w0, w0, w0, lsr, 8
> +
> +(define_expand "reduc_<optab>_scal_<mode>"
> + [(match_operand:<VEL> 0 "register_operand")
> +  (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
> +  "TARGET_SIMD"
> +  {
> +    rtx dst = operands[1];
> +    rtx tdi = gen_reg_rtx (DImode);
> +    rtx tsi = lowpart_subreg (SImode, tdi, DImode);
> +    rtx op1_lo;
> +    if (known_eq (GET_MODE_SIZE (<MODE>mode), 16))
> +      {
> +        rtx t0 = gen_reg_rtx (<MODE>mode);
> +        rtx t1 = gen_reg_rtx (DImode);
> +        rtx t2 = gen_reg_rtx (DImode);
> +        rtx idx = GEN_INT (8 / GET_MODE_U

[PATCH][LRA][PR116550] Reuse scratch registers generated by LRA

2024-10-10 Thread Denis Chertykov

The detailed explanation from PR116550:

Test file: udivmoddi.c
problem insn: 484

Before LRA pass we have:
(insn 484 483 485 72 (parallel [
(set (reg/v:SI 143 [ __q1 ])
(plus:SI (reg/v:SI 143 [ __q1 ])
(const_int -2 [0xfffe])))
(clobber (scratch:QI))
]) "udivmoddi.c":163:405 discrim 5 186 {addsi3}
 (nil))

LRA substitutes all scratches with new pseudos, so we have:
(insn 484 483 485 72 (parallel [
(set (reg/v:SI 143 [ __q1 ])
(plus:SI (reg/v:SI 143 [ __q1 ])
(const_int -2 [0xfffe])))
(clobber (reg:QI 619))
]) "/mnt/d/avr-lra/udivmoddi.c":163:405 discrim 5 186 {addsi3}
 (expr_list:REG_UNUSED (reg:QI 619)
(nil)))

Pseudo 619 is a special scratch register generated by LRA which is marked in 
`scratch_bitmap' and can be tested by calling `ira_former_scratch_p (regno)'.

In dump file (udivmoddi.c.317r.reload) we have:
  Creating newreg=619
Removing SCRATCH to p619 in insn #484 (nop 3)
rescanning insn with uid = 484.

After that, LRA tries to spill (reg:QI 619).
It's a bug, because (reg:QI 619) is an output scratch register which is
already something like a spill register.

Fragment from udivmoddi.c.317r.reload:
  Choosing alt 2 in insn 484:  (0) r  (1) 0  (2) nYnn  (3) &d {addsi3}
  Creating newreg=728 from oldreg=619, assigning class LD_REGS to r728

IMHO: the bug is in lra-constraints.cc in function `get_reload_reg'
fragment of `get_reload_reg':
  if (type == OP_OUT)
{
  /* Output reload registers tend to start out with a conservative
 choice of register class.  Usually this is ALL_REGS, although
 a target might narrow it (for performance reasons) through
 targetm.preferred_reload_class.  It's therefore quite common
 for a reload instruction to require a more restrictive class
 than the class that was originally assigned to the reload register.

 In these situations, it's more efficient to refine the choice
 of register class rather than create a second reload register.
 This also helps to avoid cycling for registers that are only
 used by reload instructions.  */
  if (REG_P (original)
  && (int) REGNO (original) >= new_regno_start
  && INSN_UID (curr_insn) >= new_insn_uid_start
__^^
  && in_class_p (original, rclass, &new_class, true))
{
  unsigned int regno = REGNO (original);
  if (lra_dump_file != NULL)
{
  fprintf (lra_dump_file, " Reuse r%d for output ", regno);
  dump_value_slim (lra_dump_file, original, 1);
}


This condition incorrectly limits register reuse to ONLY newly generated
instructions, i.e. LRA can reuse registers only from insns it generated itself.

IMHO: that's wrong.
Scratch registers generated by LRA also have to be reused.
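
A sketch of the intended check (illustration only; the actual change is
the one-line diff below):

  /* Allow reuse both for reload pseudos created during this LRA run
     and for pseudos that replaced insn scratches earlier in LRA.  */
  bool reusable_p = (REG_P (original)
                     && (int) REGNO (original) >= new_regno_start
                     && (INSN_UID (curr_insn) >= new_insn_uid_start
                         || ira_former_scratch_p (REGNO (original))));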

The patch is very simple.
On x86_64, it bootstraps+regtests fine.
Ok for trunk?

Denis.


PR target/116550
gcc/
* lra-constraints.cc (get_reload_reg): Reuse scratch registers
generated by LRA.


diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index fdcc07764a2..1f63113f321 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -680,7 +680,8 @@ get_reload_reg (enum op_type type, machine_mode mode, rtx 
original,
 used by reload instructions.  */
   if (REG_P (original)
  && (int) REGNO (original) >= new_regno_start
- && INSN_UID (curr_insn) >= new_insn_uid_start
+ && (INSN_UID (curr_insn) >= new_insn_uid_start
+ || ira_former_scratch_p (REGNO (original)))
  && in_class_p (original, rclass, &new_class, true))
{
  unsigned int regno = REGNO (original);



RE: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions

2024-10-10 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, October 10, 2024 8:08 PM
> To: Jennifer Schmitz 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
> Kyrylo Tkachov ; Tamar Christina
> ; rguent...@suse.de
> Subject: Re: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector
> reductions
> 
> Jennifer Schmitz  writes:
> > This patch implements the optabs reduc_and_scal_<mode>,
> > reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
> > V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise logical
> > vector reduction operations.
> > Previously, either only vector registers or only general purpose registers 
> > (GPR)
> > were used. Now, vector registers are used for the reduction from 128 to 64 
> > bits;
> > 64-bit GPR are used for the reduction from 64 to 32 bits; and 32-bit GPR are
> used
> > for the rest of the reduction steps.
> >
> > For example, the test case (V8HI)
> > int16_t foo (int16_t *a)
> > {
> >   int16_t b = -1;
> >   for (int i = 0; i < 8; ++i)
> > b &= a[i];
> >   return b;
> > }
> >
> > was previously compiled to (-O2):
> > foo:
> > ldr q0, [x0]
> > 	movi v30.4s, 0
> > ext v29.16b, v0.16b, v30.16b, #8
> > and v29.16b, v29.16b, v0.16b
> > ext v31.16b, v29.16b, v30.16b, #4
> > and v31.16b, v31.16b, v29.16b
> > ext v30.16b, v31.16b, v30.16b, #2
> > and v30.16b, v30.16b, v31.16b
> > 	umov w0, v30.h[0]
> > ret
> >
> > With patch, it is compiled to:
> > foo:
> > ldr q31, [x0]
> > ext v30.16b, v31.16b, v31.16b, #8
> > and v31.8b, v30.8b, v31.8b
> > 	fmov x0, d31
> > and x0, x0, x0, lsr 32
> > and w0, w0, w0, lsr 16
> > ret
> >
> > For modes V4SI and V2DI, the pattern was not implemented, because the
> > current codegen (using only base instructions) is already efficient.
> >
> > Note that the PR initially suggested to use SVE reduction ops. However,
> > they have higher latency than the proposed sequence, which is why using
> > neon and base instructions is preferable.
> >
> > Test cases were added for 8/16-bit integers for all implemented modes and 
> > all
> > three operations to check the produced assembly.
> >
> > We also added [istarget aarch64*-*-*] to the selector vect_logical_reduc,
> > because for aarch64 vector types, either the logical reduction optabs are
> > implemented or the codegen for reduction operations is good as it is.
> > This was motivated by failure of a scan-tree-dump directive in the test 
> > cases
> > gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.
> >
> > The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> > regression.
> > OK for mainline?
> >
> > Signed-off-by: Jennifer Schmitz 
> >
> > gcc/
> > PR target/113816
> > 	* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
> > Implement for logical bitwise operations for VDQV_E.
> >
> > gcc/testsuite/
> > PR target/113816
> > * lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
> > * gcc.target/aarch64/simd/logical_reduc.c: New test.
> > * gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
> > ---
> >  gcc/config/aarch64/aarch64-simd.md|  55 +
> >  .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++
> >  .../gcc.target/aarch64/vect-reduc-or_1.c  |   2 +-
> >  gcc/testsuite/lib/target-supports.exp |   4 +-
> >  4 files changed, 267 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> > index 23c03a96371..00286b8b020 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3608,6 +3608,61 @@
> >}
> >  )
> >
> > +;; Emit a sequence for bitwise logical reductions over vectors for V8QI, 
> > V16QI,
> > +;; V4HI, and V8HI modes.  The reduction is achieved by iteratively 
> > operating
> > +;; on the two halves of the input.
> > +;; If the input has 128 bits, the first operation is performed in vector
> > +;; registers.  From 64 bits down, the reduction steps are performed in 
> > general
> > +;; purpose registers.
> > +;; For example, for V8HI and operation AND, the intended sequence is:
> > +;; EXT  v1.16b, v0.16b, v0.16b, #8
> > +;; AND  v0.8b, v1.8b, v0.8b
> > +;; FMOV x0, d0
> > +;; AND  x0, x0, x0, 32
> > +;; AND  w0, w0, w0, 16
> > +;;
> > +;; For V8QI and operation AND, the sequence is:
> > +;; AND  x0, x0, x0, lsr 32
> > +;; AND  w0, w0, w0, lsr, 16
> > +;; AND  w0, w0, w0, lsr, 8
> > +
> > +(define_expand "reduc_<optab>_scal_<mode>"
> > + [(match_operand:<VEL> 0 "register_operand")
> > +  (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
> > +  "TARGET_SIMD"
> > +  {
> > +    rtx dst = operands[1];
> > +    rtx tdi = gen_reg_rtx (DImode);
> > +    rtx tsi = lowpart_subreg (SImode, tdi, DImode);
> > 

Re: [PATCH v2] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-10-10 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
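>
> For example (illustration only), the kind of scalar popcount this
> targets is simply:
>
>   int
>   count_bits (unsigned long long x)
>   {
>     return __builtin_popcountll (x);
>   }
>
> where SVE lets us use a single CNT on a one-element vector instead of
> the Advanced SIMD cnt + addv sequence.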
>
> Changes since v1:
> * v2: Add a new VNx1BI mode and a new test case for V1DI.

Sorry for the delay in reviewing this, and for the run-around,
but: following the later discussion in the FLOGB thread about using
SVE for Advanced SIMD modes, the agreement was to use the full SVE
predicate mode, but with predicate restricted to the leading 64 bits
or 128 bits (for 64-bit and 128-bit Advanced SIMD modes respectively).
I think we should do that even when it isn't strictly necessary, partly
so that all Advanced SIMD code uses the same predicate, and partly to
avoid bugs that only show up on VL>128 targets.

I'm afraid that means going back to VNx2BI, as in your original patch.
But we should use:

ptrue   pN.b, vl8

rather than:

ptrue   pN.b, all

to set the predicate.  We could do this by adding:

rtx
aarch64_ptrue_reg (machine_mode mode, unsigned int vl)

where "vl" is 8 for 64-bit modes and 16 for 128-bit modes.  Like with
the current aarch64_ptrue_reg, the predicate would always be constructed
in VNx16BImode and then cast to the right mode.
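
A minimal (untested) sketch of that overload, assuming the usual
rtx_vector_builder encoding of constant predicates:

    rtx
    aarch64_ptrue_reg (machine_mode mode, unsigned int vl)
    {
      gcc_assert (aarch64_sve_pred_mode_p (mode));
      /* First VL lanes true, remaining lanes false; the trailing
         false pattern repeats for larger vector lengths.  */
      rtx_vector_builder builder (VNx16BImode, vl, 2);
      for (unsigned int i = 0; i < vl; i++)
        builder.quick_push (CONST1_RTX (BImode));
      for (unsigned int i = 0; i < vl; i++)
        builder.quick_push (CONST0_RTX (BImode));
      rtx reg = force_reg (VNx16BImode, builder.build ());
      return gen_lowpart (mode, reg);
    }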

Thanks,
Richard

>
>   PR target/113860
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-modes.def (VECTOR_BOOL_MODE): Add VNx1BI.
>   (ADJUST_NUNITS): Likewise.
>   (ADJUST_ALIGNMENT): Likewise.
> 	* config/aarch64/aarch64-simd.md (popcount<mode>2): Update pattern to
>   also support V1DI mode.
>   * config/aarch64/aarch64.cc (aarch64_sve_pred_mode_p): Add VNx1BImode.
> 	* config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE support.
>   * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator.
>   (SVE_VDQ_I): Add V1DI.
>   (bitsize): Likewise.
>   (VPRED): Likewise.
>   (VEC_POP_MODE): New mode attribute.
>   (vec_pop_mode): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/popcnt11.c: New test.
>   * gcc.target/aarch64/popcnt12.c: New test.
>
> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-modes.def|  3 ++
>  gcc/config/aarch64/aarch64-simd.md  | 13 -
>  gcc/config/aarch64/aarch64.cc   |  3 +-
>  gcc/config/aarch64/aarch64.md   |  9 
>  gcc/config/aarch64/iterators.md | 16 --
>  gcc/testsuite/gcc.target/aarch64/popcnt11.c | 58 +
>  gcc/testsuite/gcc.target/aarch64/popcnt12.c | 18 +++
>  7 files changed, 114 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt11.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt12.c
>
> diff --git a/gcc/config/aarch64/aarch64-modes.def 
> b/gcc/config/aarch64/aarch64-modes.def
> index 25a22c1195e..d822d4dfc13 100644
> --- a/gcc/config/aarch64/aarch64-modes.def
> +++ b/gcc/config/aarch64/aarch64-modes.def
> @@ -53,18 +53,21 @@ VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
>  VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
>  VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
>  VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> +VECTOR_BOOL_MODE (VNx1BI, 1, BI, 2);
>  
>  ADJUST_NUNITS (VNx32BI, aarch64_sve_vg * 16);
>  ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
>  ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
>  ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
>  ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
> +ADJUST_NUNITS (VNx1BI, exact_div (aarch64_sve_vg, 2));
>  
>  ADJUST_ALIGNMENT (VNx32BI, 2);
>  ADJUST_ALIGNMENT (VNx16BI, 2);
>  ADJUST_ALIGNMENT (VNx8BI, 2);
>  ADJUST_ALIGNMENT (VNx4BI, 2);
>  ADJUST_ALIGNMENT (VNx2BI, 2);
> +ADJUST_ALIGNMENT (VNx1BI, 2);
>  
>  /* Bfloat16 modes.  */
>  FLOAT_MODE (BF, 2, 0);
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 23c03a96371..386b1fa1f4b 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3515,8 +3515,9 @@ (define_insn "popcount<mode>2"
>  )
>  
>  (define_expand "popcount<mode>2"
> -  [(set (match_operand:VDQHSD 0 "register_operand")
> - (popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
> +  [(set (match_operand:VDQHSD_V1DI 0 "register_operand")
> + (popcount:VDQHSD_V1DI
> +   (match_operand:VDQHSD_V1DI 1 "register_operand")))]
>"TARGET_SIMD"
>{
>  if (TARGET_SVE)
> @@ -3528,6 +3529,14 @@ (define_expand "popcount<mode>2"
>   DONE;
>}
>  
> +    if (<MODE>mode == V1DImode)
> +  {
> + rtx out = gen_reg_rtx (DImode);
> + emit_insn (gen_popcountdi2 (out, gen_lowpart (DImode, operands[1])));
> +	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, out));
> + DONE;
> +  }
> +
>  /* Generate a byte popcount.  */
>      machine_mode mode = <bitsize> == 64 ? V8QImode : V16QImode;
>  rtx tmp = gen_reg_rtx (mode);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 92763d403c7..78f65f886b7 100644
> --- a/gcc/co

Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-10 Thread Eric Gallager
On Wed, Oct 9, 2024 at 4:54 AM Christophe Lyon
 wrote:
>
> On Wed, 9 Oct 2024 at 03:05, Eric Gallager  wrote:
> >
> > On Tue, Oct 8, 2024 at 6:25 AM Richard Sandiford
> >  wrote:
> > >
> > > Christophe Lyon  writes:
> > > > When --enable-werror is enabled when running the top-level configure,
> > > > it passes --enable-werror-always to subdirs.  Some of them, like
> > > > libgcc, ignore it.
> > > >
> > > > This patch adds support for it, enabled only for aarch64, to avoid
> > > > breaking bootstrap for other targets.
> > > >
> > > > The patch also adds -Wno-prio-ctor-dtor to avoid a warning when 
> > > > compiling lse_init.c
> > > >
> > > >   libgcc/
> > > >   * Makefile.in (WERROR): New.
> > > >   * config/aarch64/t-aarch64: Handle WERROR. Always use
> > > >   -Wno-prio-ctor-dtor.
> > > >   * configure.ac: Add support for --enable-werror-always.
> > > >   * configure: Regenerate.
> > > > ---
> > > >  libgcc/Makefile.in  |  1 +
> > > >  libgcc/config/aarch64/t-aarch64 |  1 +
> > > >  libgcc/configure| 31 +++
> > > >  libgcc/configure.ac |  5 +
> > > >  4 files changed, 38 insertions(+)
> > > >
> > > > [...]
> > > > diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> > > > index 4e8c036990f..6b3ea2aea5c 100644
> > > > --- a/libgcc/configure.ac
> > > > +++ b/libgcc/configure.ac
> > > > @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
> > > >  sinclude(../config/gthr.m4)
> > > >  sinclude(../config/sjlj.m4)
> > > >  sinclude(../config/cet.m4)
> > > > +sinclude(../config/warnings.m4)
> > > >
> > > >  AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
> > > >  AC_CONFIG_SRCDIR([static-object.mk])
> > > > @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
> > > >  # Determine what GCC version number to use in filesystem paths.
> > > >  GCC_BASE_VER
> > > >
> > > > +# Only enable with --enable-werror-always until existing warnings are
> > > > +# corrected.
> > > > +ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual])
> > >
> > > It looks like this is borrowed from libcpp and/or libdecnumber.
> > > Those are a bit different from libgcc in that they're host libraries
> > > that can be built with any supported compiler (including non-GCC ones).
> > > In constrast, libgcc can only be built with the corresponding version
> > > of GCC.  The usual restrictions on -Werror -- only use it during stages
> > > 2 and 3, or if the user explicitly passes --enable-werror -- don't apply
> > > in libgcc's case.  We should always be building with the "right" version
> > > of GCC (even for Canadian crosses) and so should always be able to use
> > > -Werror.
> > >
> > > So personally, I think we should just go with:
> > >
> > > diff --git a/libgcc/config/aarch64/t-aarch64 
> > > b/libgcc/config/aarch64/t-aarch64
> > > index b70e7b94edd..ae1588ce307 100644
> > > --- a/libgcc/config/aarch64/t-aarch64
> > > +++ b/libgcc/config/aarch64/t-aarch64
> > > @@ -30,3 +30,4 @@ LIB2ADDEH += \
> > > $(srcdir)/config/aarch64/__arm_za_disable.S
> > >
> > >  SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
> > > +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
> > >
> > > ...this, but with $(WERROR) replaced by -Werror.
> > >
> > > At least, it would be a good way of finding out if there's a case
> > > I've forgotten :)
> > >
> > > Let's see what others think though.
> >
> > I think it would be worthwhile to test this assumption first; I have a
> > vague memory of having seen warnings in libgcc previously that would
> > presumably get turned into errors if -Werror were applied
> > unconditionally...
> >
> Sorry, it's not clear to me what you mean by "test this assumption"?
> Do you mean I should push the patch with unconditional -Werror and
> monitor what happens for a while?
> Or investigate more / other targets?
> Or wait for others to commit?
>

I mean, I think we should try the original approach of having it be
enableable manually first, let some people test by enabling it
manually, and then if they all report back success, then we can go
ahead with the unconditional -Werror version of it.

> Thanks,
>
> Christophe
>
> > >
> > > Thanks,
> > > Richard


Re: [PATCH v2] Add -ftime-report-wall

2024-10-10 Thread Eric Gallager
On Mon, Oct 7, 2024 at 6:27 AM Richard Biener
 wrote:
>
> On Sat, Oct 5, 2024 at 10:17 AM Andi Kleen  wrote:
> >
> > From: Andi Kleen 
> >
> > Time vars normally use times(2) to get the user/sys/wall time, which is
> > always a system call.  I don't think the system time is very useful
> > because most overhead is in user time.  If we only use the wall (or
> > monotonic) time, modern OSes have an optimized path to get it directly
> > from a CPU instruction like RDTSC without a system call, which is much
> > faster.
> >
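> > Roughly, the fast path amounts to this (illustration only; the patch
> > plugs it into timevar.cc's get_time):
> >
> >   #include <time.h>
> >
> >   /* On Linux this is typically a vDSO call, i.e. no kernel entry,
> >      unlike times(2), which always traps into the kernel.  */
> >   static double
> >   get_wall_seconds (void)
> >   {
> >     struct timespec ts;
> >     clock_gettime (CLOCK_MONOTONIC, &ts);
> >     return ts.tv_sec + ts.tv_nsec * 1e-9;
> >   }
> >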
> > Add a -ftime-report-wall option. It actually uses the POSIX monotonic time,
> > so strictly it's not wall clock, but it's still a reasonable name.
> >
> > Comparing the overhead with tramp3d -O0:
> >
> >   ./gcc/cc1plus -quiet  ../tsrc/tramp3d-v4.i ran
> > 1.03 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report-wall 
> > ../tsrc/tramp3d-v4.i
> > 1.18 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report 
> > ../tsrc/tramp3d-v4.i
> >
> > -ftime-report costs 18% (excluding the output), while -ftime-report-wall
> > only costs 3%, so is nearly free. So it would be feasible for some build
> > system to always enable it and break down the build time into passes.
> >
> > With -O2 it is a bit less pronounced but still visible:
> >
> >   ./gcc/cc1plus -O2 -quiet  ../tsrc/tramp3d-v4.i ran
> > 1.00 ± 0.00 times faster than ./gcc/cc1plus -O2 -quiet 
> > -ftime-report-wall ../tsrc/tramp3d-v4.i
> > 1.08 ± 0.01 times faster than ./gcc/cc1plus -O2 -quiet -ftime-report 
> > ../tsrc/tramp3d-v4.i
> >
> > The drawback is that if there is context switching with other programs
> > the time will be overestimated, however for the common case that the
> > system is not oversubscribed it is more accurate because each
> > measurement has less overhead.
> >
> > Bootstrapped on x86_64-linux with full test suite run.
>
> Thanks for doing this - I'd like to open up for discussion whether we
> should simply
> switch the default and stop recording user/system time for
> -ftime-report.  One reason
> some infrastructure isn't using fine-grained timevars is because of overhead.
>
> So, shouldn't we go without the new option and simply change
> -ftime-report behavior?

Personally I'd prefer the original approach of adding a separate
-ftime-report-wall flag, instead of changing the output of the
existing -ftime-report flag...

>
> Related - with -ftime-trace coming up again recently I wonder if we
> should transition
> to -ftime-report={user,wall,details,trace,...} allowing
> -ftime-report=user,details.  I'll note
> that while adding an option, removing it later is always difficult.
>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * common.opt (ftime-report-wall): Add.
> > * common.opt.urls: Regenerate.
> > * doc/invoke.texi: (ftime-report-wall): Document
> > * gcc.cc (try_generate_repro): Check for -ftime-report-wall.
> > * timevar.cc (get_time): Use clock_gettime if enabled.
> > (timer::print): Print only wall time for time_report_wall.
> > (make_json_for_timevar_time_def): Dito.
> > * toplev.cc (toplev::start_timevars): Check for time_report_wall.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/ext/timevar3.C: New test.
> >
> > ---
> >
> > v2: Adjust JSON/Sarif output too.
> > ---
> >  gcc/common.opt  |  4 +++
> >  gcc/common.opt.urls |  3 +++
> >  gcc/doc/invoke.texi |  7 ++
> >  gcc/gcc.cc  |  3 ++-
> >  gcc/testsuite/g++.dg/ext/timevar3.C | 14 +++
> >  gcc/timevar.cc  | 38 +++--
> >  gcc/toplev.cc   |  3 ++-
> >  7 files changed, 62 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/ext/timevar3.C
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 12b25ff486de..a200a8a0bc45 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3014,6 +3014,10 @@ ftime-report
> >  Common Var(time_report)
> >  Report the time taken by each compiler pass.
> >
> > +ftime-report-wall
> > +Common Var(time_report_wall)
> > +Report the wall time taken by each compiler pass.
> > +
> >  ftime-report-details
> >  Common Var(time_report_details)
> >  Record times taken by sub-phases separately.
> > diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
> > index e31736cd9945..6e79a8f9390b 100644
> > --- a/gcc/common.opt.urls
> > +++ b/gcc/common.opt.urls
> > @@ -1378,6 +1378,9 @@ 
> > UrlSuffix(gcc/Optimize-Options.html#index-fthread-jumps)
> >  ftime-report
> >  UrlSuffix(gcc/Developer-Options.html#index-ftime-report)
> >
> > +ftime-report-wall
> > +UrlSuffix(gcc/Developer-Options.html#index-ftime-report-wall)
> > +
> >  ftime-report-details
> >  UrlSuffix(gcc/Developer-Options.html#index-ftime-report-details)
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index d38c1feb86f7..8c11d12e7521 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/

Re: [PATCH 0/2] Prime path coverage to gcc/gcov

2024-10-10 Thread Jørgen Kvalsvik

Ping.

On 10/3/24 12:46, Jørgen Kvalsvik wrote:

This is both a ping and a minor update. A few of the patches from the
previous set have been merged, but the big feature still needs review.

Since then it has been quiet, but there are two notable changes:

1. The --prime-paths-{lines,source} flags take an optional argument to
print covered or uncovered paths, or both. By default, uncovered
paths are printed like before.
2. Fixed a bad vector access when independent functions share compiler
generated statements. A reproducing case is in gcov-23.C which
relied on printing the uncovered path of multiple destructors of
static objects.

Jørgen Kvalsvik (2):
   gcov: branch, conds, calls in function summaries
   Add prime path coverage to gcc/gcov

  gcc/Makefile.in|6 +-
  gcc/builtins.cc|2 +-
  gcc/collect2.cc|5 +-
  gcc/common.opt |   16 +
  gcc/doc/gcov.texi  |  184 +++
  gcc/doc/invoke.texi|   36 +
  gcc/gcc.cc |4 +-
  gcc/gcov-counter.def   |3 +
  gcc/gcov-io.h  |3 +
  gcc/gcov.cc|  531 ++-
  gcc/ipa-inline.cc  |2 +-
  gcc/passes.cc  |4 +-
  gcc/path-coverage.cc   |  782 +
  gcc/prime-paths.cc | 2031 
  gcc/profile.cc |6 +-
  gcc/selftest-run-tests.cc  |1 +
  gcc/selftest.h |1 +
  gcc/testsuite/g++.dg/gcov/gcov-22.C|  170 ++
  gcc/testsuite/g++.dg/gcov/gcov-23-1.h  |9 +
  gcc/testsuite/g++.dg/gcov/gcov-23-2.h  |9 +
  gcc/testsuite/g++.dg/gcov/gcov-23.C|   30 +
  gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
  gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
  gcc/testsuite/gcc.misc-tests/gcov-33.c |   27 +
  gcc/testsuite/gcc.misc-tests/gcov-34.c |   29 +
  gcc/testsuite/lib/gcov.exp |  118 +-
  gcc/tree-profile.cc|   11 +-
  29 files changed, 5795 insertions(+), 22 deletions(-)
  create mode 100644 gcc/path-coverage.cc
  create mode 100644 gcc/prime-paths.cc
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-1.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-2.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23.C
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-33.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-34.c





Re: [PATCH] i386: Fix some patterns's mem attribute.

2024-10-10 Thread Uros Bizjak
On Thu, Oct 10, 2024 at 5:46 AM Hu, Lin1  wrote:
>
> Hi, all
>
> This is another patch to modify some patterns' type attr from ssemov to
> ssemov2.
>
> Some ssemov patterns' mem attr should be load when their second operand is
> a memory operand.
>
> Bootstrapped and regtested on x86-64-linux-pc, OK for trunk?
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * config/i386/sse.md
> (sse_movhlps): Change type attr from ssemov to ssemov2.
> (sse_loadhps): Ditto.
> 	(*vec_concat<mode>): Ditto.
> (vec_setv2df_0): Ditto.
> (sse_loadlps): Change attr from ssemov to ssemov2 except for 2, 3.
> (sse2_loadhps): Change attr from ssemov to ssemov2 except for 0, 1.
> (sse2_loadlpd): Change attr from ssemov to ssemov2 except for 0, 1,
> 2.
> 	(sse2_movsd_<mode>): Change attr from ssemov to ssemov2 except for 5.
> (vec_concatv2df): Change attr from ssemov to ssemov2 except for 0, 1,
> 2.
> 	(*vec_concat<mode>): Change attr from ssemov to ssemov2 for 3, 4.
> (vec_concatv2di): Change attr from ssemov to ssemov2 except for 0, 1,
> 2, 3, 4, 5.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index ccef3e063ec..a45b50ad732 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -10995,7 +10995,7 @@ (define_insn "sse_movhlps"
> vmovlps\t{%H2, %1, %0|%0, %1, %H2}
> %vmovhps\t{%2, %0|%q0, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> -   (set_attr "type" "ssemov")
> +   (set_attr "type" "ssemov2")
> (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
>
> @@ -11557,7 +11557,7 @@ (define_insn "sse_loadhps"
> vmovlhps\t{%2, %1, %0|%0, %1, %2}
> %vmovlps\t{%2, %H0|%H0, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> -   (set_attr "type" "ssemov")
> +   (set_attr "type" "ssemov2")
> (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V2SF,V2SF,V4SF,V4SF,V2SF")])
>
> @@ -11610,7 +11610,7 @@ (define_insn "sse_loadlps"
> vmovlps\t{%2, %1, %0|%0, %1, %q2}
> %vmovlps\t{%2, %0|%q0, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> -   (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
> +   (set_attr "type" "sseshuf,sseshuf,ssemov2,ssemov2,ssemov")
> (set (attr "length_immediate")
>   (if_then_else (eq_attr "alternative" "0,1")
>(const_string "1")
> @@ -11766,7 +11766,7 @@ (define_insn "*vec_concat<mode>"
> movhps\t{%2, %0|%0, %q2}
> vmovhps\t{%2, %1, %0|%0, %1, %q2}"
>[(set_attr "isa" "noavx,avx,noavx,avx")
> -   (set_attr "type" "ssemov")
> +   (set_attr "type" "ssemov2")
> (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex")
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
>
> @@ -12214,7 +12214,7 @@ (define_insn "vec_setv2df_0"
> movlpd\t{%2, %0|%0, %2}
> vmovlpd\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx")
> -   (set_attr "type" "ssemov")
> +   (set_attr "type" "ssemov2")
> (set_attr "mode" "DF")])
>
>  (define_expand "vec_set<mode>"
> @@ -14665,7 +14665,7 @@ (define_insn "sse2_loadhpd"
> #
> #"
>[(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
> -   (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov")
> +   (set_attr "type" "ssemov2,ssemov2,sselog,sselog,ssemov,fmov,imov")
> (set (attr "prefix_data16")
>   (if_then_else (eq_attr "alternative" "0")
>(const_string "1")
> @@ -14735,6 +14735,8 @@ (define_insn "sse2_loadlpd"
>   (const_string "fmov")
> (eq_attr "alternative" "10")
>   (const_string "imov")
> +   (eq_attr "alternative" "0,1,2")
> + (const_string "ssemov2")
>]
>(const_string "ssemov")))
> (set (attr "prefix_data16")
> @@ -14787,7 +14789,7 @@ (define_insn "sse2_movsd_<mode>"
>   (if_then_else
> (eq_attr "alternative" "5")
> (const_string "sselog")
> -   (const_string "ssemov")))
> +   (const_string "ssemov2")))
> (set (attr "prefix_data16")
>   (if_then_else
> (and (eq_attr "alternative" "2,4")
> @@ -14859,7 +14861,7 @@ (define_insn "vec_concatv2df"
>   (if_then_else
> (eq_attr "alternative" "0,1,2")
> (const_string "sselog")
> -   (const_string "ssemov")))
> +   (const_string "ssemov2")))
> (set (attr "prefix_data16")
> (if_then_else (eq_attr "alternative" "3")
>   (const_string "1")
> @@ -21545,7 +21547,7 @@ (define_insn "*vec_concat<mode>"
> movhps\t{%2, %0|%0, %q2}
> vmovhps\t{%2, %1, %0|%0, %1, %q2}"
>[(set_attr "isa" "sse2_noavx,avx,noavx,noavx,avx")
> -   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
> +   (set_attr "type" "sselog,sselog,ssemov,ssemov2,ssemov2")
> (set_attr "prefix" "orig,maybe_evex,o

Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector reductions

2024-10-10 Thread Jennifer Schmitz


> On 2 Oct 2024, at 14:34, Tamar Christina  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
>> -Original Message-
>> From: Kyrylo Tkachov 
>> Sent: Wednesday, October 2, 2024 1:09 PM
>> To: Richard Sandiford 
>> Cc: Tamar Christina ; Jennifer Schmitz
>> ; gcc-patches@gcc.gnu.org; Kyrylo Tkachov
>> 
>> Subject: Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector
>> reductions
>> 
>> 
>> 
>>> On 2 Oct 2024, at 13:43, Richard Sandiford 
>> wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> Tamar Christina  writes:
 Hi Jennifer,
 
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, October 1, 2024 12:20 PM
> To: Jennifer Schmitz 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov
>> 
> Subject: Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for 
> vector
> reductions
> 
> Jennifer Schmitz  writes:
>> This patch implements the optabs reduc_and_scal_<mode>,
>> reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for Advanced SIMD
>> integers for TARGET_SVE in order to use the SVE instructions ANDV, ORV, 
>> and
>> EORV for fixed-width bitwise reductions.
>> For example, the test case
>> 
>> int32_t foo (int32_t *a)
>> {
>> int32_t b = -1;
>> for (int i = 0; i < 4; ++i)
>>   b &= a[i];
>> return b;
>> }
>> 
>> was previously compiled to
>> (-O2 -ftree-vectorize --param aarch64-autovec-preference=asimd-only):
>> foo:
>>   ldp w2, w1, [x0]
>>   ldp w3, w0, [x0, 8]
>>   and w1, w1, w3
>>   and w0, w0, w2
>>   and w0, w1, w0
>>   ret
>> 
>> With patch, it is compiled to:
>> foo:
>>   ldr q31, [x0]
>>  ptrue   p7.b, all
>>  andv    s31, p7, z31.s
>>  fmov    w0, s31
>>  ret
>> 
>> Test cases were added to check the produced assembly for use of SVE
>> instructions.
> 
> I would imagine that in this particular case, the scalar version is
> better.  But I agree it's a useful feature for other cases.
> 
 
 Yeah, I'm concerned because ANDV and other reductions are extremely
 expensive.  But assuming the reductions are done outside of a loop then
 it should be ok, though.
 
 The issue is that the reduction latency grows with VL, so e.g. compare
 the latencies and throughput for Neoverse V1 and Neoverse V2.  So I
 think we want to gate this on VL128.
 
 As an aside, is the sequence correct?  With ORR reduction ptrue makes
 sense, but for VL > 128 ptrue doesn't work as the top bits would be
 zero.  So an ANDV on zero-value lanes would result in zero.
>>> 
>>> Argh!  Thanks for spotting that.  I'm kicking myself for missing it :(
>>> 
 You'd want to predicate the ANDV with the size of the vector being
 reduced.  The same is true for SMIN and SMAX.

 I do wonder whether we need to split the pattern into two, where w->w
 uses the SVE instructions but w->r uses Adv SIMD.
 
 In the case of w->r as the example above
 
   ext v1.16b, v0.16b, v0.16b, #8
   and v0.8b, v0.8b, v1.8b
   fmov x8, d0
   lsr x9, x8, #32
   and w0, w8, w9
 
 would beat the ADDV on pretty much every uarch.
 
 But I'll leave it up to the maintainers.
>>> 
>>> Also a good point.  And since these are integer reductions, an r
>>> result is more probable than a w result.  w would typically only
>>> be used if the result is stored directly to memory.
>>> 
>>> At which point, the question (which you might have been implying)
>>> is whether it's worth doing this at all, given the limited cases
>>> for which it's beneficial, and the complication that's needed to
>>> (a) detect those cases and (b) make them work.
>> 
>> These are good points in the thread. Maybe it makes sense to do this only for
>> V16QI reductions?
>> Maybe a variant of Tamar’s w->r sequence wins out even there.
> 
> I do agree that they're worth implementing, and also for 64-bit vectors 
> (there you
> Skip the first reduction and just fmov the value to gpr since you don't have 
> the
> Initial 128 -> 64 bit reduction step),
> 
> But I think at the moment they're possibly not modelled as reductions in our
> cost model.  Like Richard mentioned I don't think the low iteration cases
> should vectorize and instead just unroll.
> 
>> 
>> Originally I had hoped that we’d tackle the straight-line case from PR113816 
>> but it
>> seems that GCC didn’t even try to create a reduction op for the code there.
>> Maybe that’s something to look into separately.
> 
> Yeah, I think unrolled scalar is going to beat the ORV there as you can have 
> better
> throughput doing the reductions in pairs.
> 
>> 
>> Also, for the alternative test case that we tried to use for a motivation:
>> cha

[ABOUT-TO-PUSH PATCH] MAINTAINERS (s390 port): Add myself

2024-10-10 Thread Stefan Schulze Frielinghaus
ChangeLog:

* MAINTAINERS (s390 port): Add myself.
---
 I hope the overflow into the email column doesn't break any tooling.  I
 will leave the patch as is for some time before pushing.

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9257b33ff08..f76d12f7f3f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -120,6 +120,7 @@ rs6000 vector extns Aldy Hernandez  

 rx port Nick Clifton
 s390 port   Ulrich Weigand  
 s390 port   Andreas Krebbel 
+s390 port   Stefan Schulze Frielinghaus 
 sh port Alexandre Oliva 
 sh port Oleg Endo   
 sparc port  David S. Miller 
-- 
2.45.2



[PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions

2024-10-10 Thread Jennifer Schmitz
This patch implements the optabs reduc_and_scal_<mode>,
reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise logical
vector reduction operations.
Previously, either only vector registers or only general purpose registers (GPR)
were used. Now, vector registers are used for the reduction from 128 to 64 bits;
64-bit GPR are used for the reduction from 64 to 32 bits; and 32-bit GPR
are used for the rest of the reduction steps.

For example, the test case (V8HI)
int16_t foo (int16_t *a)
{
  int16_t b = -1;
  for (int i = 0; i < 8; ++i)
b &= a[i];
  return b;
}

was previously compiled to (-O2):
foo:
ldr q0, [x0]
	movi v30.4s, 0
ext v29.16b, v0.16b, v30.16b, #8
and v29.16b, v29.16b, v0.16b
ext v31.16b, v29.16b, v30.16b, #4
and v31.16b, v31.16b, v29.16b
ext v30.16b, v31.16b, v30.16b, #2
and v30.16b, v30.16b, v31.16b
	umov w0, v30.h[0]
ret

With patch, it is compiled to:
foo:
ldr q31, [x0]
ext v30.16b, v31.16b, v31.16b, #8
and v31.8b, v30.8b, v31.8b
	fmov x0, d31
and x0, x0, x0, lsr 32
and w0, w0, w0, lsr 16
ret

For modes V4SI and V2DI, the pattern was not implemented, because the
current codegen (using only base instructions) is already efficient.

Note that the PR initially suggested to use SVE reduction ops. However,
they have higher latency than the proposed sequence, which is why using
neon and base instructions is preferable.
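
For reference, a rough scalar-C equivalent of the GPR steps for the V4HI
case (illustration only, not part of the patch):

  #include <stdint.h>

  /* x holds four int16_t lanes in a 64-bit register.  */
  static inline int16_t
  and_reduce_4xi16 (uint64_t x)
  {
    x &= x >> 32;                 /* AND x0, x0, x0, lsr 32 */
    uint32_t w = (uint32_t) x;
    w &= w >> 16;                 /* AND w0, w0, w0, lsr 16 */
    return (int16_t) w;
  }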

Test cases were added for 8/16-bit integers for all implemented modes and all
three operations to check the produced assembly.

We also added [istarget aarch64*-*-*] to the selector vect_logical_reduc,
because for aarch64 vector types, either the logical reduction optabs are
implemented or the codegen for reduction operations is good as it is.
This was motivated by failure of a scan-tree-dump directive in the test cases
gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
PR target/113816
	* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
Implement for logical bitwise operations for VDQV_E.

gcc/testsuite/
PR target/113816
* lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
* gcc.target/aarch64/simd/logical_reduc.c: New test.
* gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
---
 gcc/config/aarch64/aarch64-simd.md|  55 +
 .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++
 .../gcc.target/aarch64/vect-reduc-or_1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |   4 +-
 4 files changed, 267 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 23c03a96371..00286b8b020 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3608,6 +3608,61 @@
   }
 )
 
+;; Emit a sequence for bitwise logical reductions over vectors for V8QI, V16QI,
+;; V4HI, and V8HI modes.  The reduction is achieved by iteratively operating
+;; on the two halves of the input.
+;; If the input has 128 bits, the first operation is performed in vector
+;; registers.  From 64 bits down, the reduction steps are performed in general
+;; purpose registers.
+;; For example, for V8HI and operation AND, the intended sequence is:
+;; EXT  v1.16b, v0.16b, v0.16b, #8
+;; AND  v0.8b, v1.8b, v0.8b
+;; FMOV x0, d0
+;; AND  x0, x0, x0, 32
+;; AND  w0, w0, w0, 16
+;;
+;; For V8QI and operation AND, the sequence is:
+;; AND  x0, x0, x0, lsr 32
+;; AND  w0, w0, w0, lsr, 16
+;; AND  w0, w0, w0, lsr, 8
+
+(define_expand "reduc_<optab>_scal_<mode>"
+ [(match_operand:<VEL> 0 "register_operand")
+  (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
+  "TARGET_SIMD"
+  {
+    rtx dst = operands[1];
+    rtx tdi = gen_reg_rtx (DImode);
+    rtx tsi = lowpart_subreg (SImode, tdi, DImode);
+    rtx op1_lo;
+    if (known_eq (GET_MODE_SIZE (<MODE>mode), 16))
+      {
+        rtx t0 = gen_reg_rtx (<MODE>mode);
+        rtx t1 = gen_reg_rtx (DImode);
+        rtx t2 = gen_reg_rtx (DImode);
+        rtx idx = GEN_INT (8 / GET_MODE_UNIT_SIZE (<MODE>mode));
+        emit_insn (gen_aarch64_ext<mode> (t0, dst, dst, idx));
+        op1_lo = lowpart_subreg (V2DImode, dst, <MODE>mode);
+        rtx t0_lo = lowpart_subreg (V2DImode, t0, <MODE>mode);
+        emit_insn (gen_aarch64_get_lanev2di (t1, op1_lo, GEN_INT (0)));
+        emit_insn (gen_aarch64_get_lanev2di (t2, t0_lo, GEN_INT (0)));
+        emit_insn (gen_<optab>di3 (t1, t1, t2));
+        emit_move_insn (tdi, t1);
+      }
+    else
+      {
+        op1_lo = lowpart_subreg (DImode, dst, <MODE>mode);
+        emit_move_insn (t

[Patch] (was: [Patch] Fortran/OpenMP: Fix __builtin_omp_is_initial_device)

2024-10-10 Thread Tobias Burnus

Sometimes waiting a bit leads to better code …

Tobias Burnus wrote:

...
[I guess we eventually want to add support for more builtins. For
instance, acc_on_device would be a candidate, but I could imagine some
additional builtins.]


I have now implemented acc_on_device and I think the new fix-up function 
is way nicer.


Thus, this patch does:

* (v1) Fix omp_is_initial_device → only replace it when it is used in
calls (and not when used as a function pointer/actual for a dummy
function) + fix an ICE due to integer(4) != logical(4) in the middle end.


* (new) For OpenACC, use a builtin for acc_on_device + actually do 
compile-time optimization when offloading is not configured.


* (new) libgomp.texi: Typo fixes accumulated, fix wording, and for 
acc_on_device, add a note that compile-time folding may be done (and how 
it can be disabled).


For OpenACC, I now mix compile time folding vs. runtime to ensure that 
it works.
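
For illustration, the C equivalent of what the builtin handling enables
(a sketch of mine, not part of the patch):

  #include <openacc.h>

  int
  on_host (void)
  {
    /* Without offloading configured, this call can now be folded to a
       constant at compile time; -fno-builtin-acc_on_device disables
       the builtin and keeps the library call.  */
    return acc_on_device (acc_device_host);
  }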


Tested on x86-64 without and with offloading configured, running with 
nvptx offloading.


Code review, comments, suggestions, remarks?

Tobias

PS: The testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c example
is not completely clear to me; however, the new optimization means that
without offloading enabled, the dump message is not shown. I tried to
understand it better with -fno-builtin-acc_on_device, but that then
caused link errors as the device function wasn't optimized away,
leaving me puzzled. In the end, I just changed the dg-* and did not
try to understand the issue.
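As a minimal sketch of the kind of code that triggered the ICE described
in the attached commit message (an editorial illustration, not a test
from the patch):

  program main
    use omp_lib
    implicit none
    ! The call was lowered to the int-returning builtin, so .eqv. ended
    ! up comparing integer(4) with logical(4) in the middle end.
    if (omp_is_initial_device() .eqv. .true.) print *, "on the host"
  end program main
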
Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP's __builtin_omp_is_initial_device

It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE
due to comparing 'int' with 'logical(4)'. When digging deeper, it also
turned out that when the procedure pointer is needed, the builtin cannot
be used, either.  (Follow up to r15-2799-gf1bfba3a9b3f31 )

Extend the code to also use the builtin acc_on_device with OpenACC,
which was previously only used in C/C++.  Additionally, fix folding
when offloading is not enabled.

Additionally fixes the BT_BOOL data type, which was 'char'/integer(1)
instead of bool, breaking the booleanness; use bool_type_node as in the
rest of GCC.

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_option_t): Add disable_acc_on_device.
	* options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device.
	* trans-decl.cc (gfc_get_extern_function_decl): Move
	__builtin_omp_is_initial_device handling to ...
	* trans-expr.cc (get_builtin_fn): ... this new function.
	(conv_function_val): Call it.
	(update_builtin_function): New.
	(gfc_conv_procedure_call): Call it.
	* types.def (BT_BOOL): Fix type by using bool_type_node.

gcc/ChangeLog:

	* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
	when offloading is not configured.

libgomp/ChangeLog:

	* libgomp.texi (TR13): Fix minor typos.
	(omp_is_initial_device): Improve wording.
	(acc_on_device): Note how to disable the builtin.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
	Add -fno-builtin-acc_on_device.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update
	dg- as !offloading_enabled now compile-time expands acc_on_device.
	* testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.

 gcc/fortran/gfortran.h |  3 +-
 gcc/fortran/options.cc |  5 +-
 gcc/fortran/trans-decl.cc  |  9 
 gcc/fortran/trans-expr.cc  | 58 +++---
 gcc/fortran/types.def  |  3 +-
 gcc/gimple-fold.cc |  2 +-
 libgomp/libgomp.texi   | 18 ---
 .../libgomp.fortran/target-is-initial-device-3.f90 | 50 +++
 .../libgomp.oacc-c-c++-common/routine-nohost-1.c   |  3 +-
 .../libgomp.oacc-fortran/acc_on_device-1-1.f90 |  5 --
 .../libgomp.oacc-fortran/acc_on_device-1-2.f   |  7 +--
 .../libgomp.oacc-fortran/acc_on_device-1-3.f   |  7 +--
 .../libgomp.oacc-fortran/acc_on_device-2.f90   | 40 +++
 13 files changed, 164 insertions(+), 46 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 917866a7ef0..680e7f7b75b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3200,7 +3200,8 @@ typedef struct
   int flag_init_logical;
   int flag_init_character;
   char flag_init_character_value;
-  int disable_omp_is_initial_device;
+  bool disable_omp_is_initial_device;
+  bool disable_acc_on_device;
 
   int fpe;
   int fpe_summary;
diff --git a/gcc/fortran/options.cc b/gcc/fortran/options.cc
index 6f2579ad9de..4920691dba6 100644
--- a/gcc/fortran/options.cc
+++ b/gcc/fortran/options.cc
@@ -864,11 +864,14 @@ gfc_handle_option (size_t scode

[Patch] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP's __builtin_omp_is_initial_device (was: [Patch] (was: [Patch] Fortran/OpenMP: Fix __builtin_omp_is_initial_device))

2024-10-10 Thread Tobias Burnus
I forgot to update the subject line. To make it easier to find (patch
archeology), here it is again with a proper subject line …


Tobias Burnus wrote:

Sometimes waiting a bit leads to better code …

Tobias Burnus wrote:

...
[I guess, we eventually want to add support for more builtins. For 
instance, acc_on_device would be a candidate, but I could imagine 
some additional builtins.]


I have now implemented acc_on_device, and I think the new fix-up
function is much nicer.


Thus, this patch does:

* (v1) Fix omp_is_initial_device → only replace it when used in calls
(and not when used as a function pointer/actual to a dummy function) +
fix an ICE due to integer(4) != logical(4) in the middle end.


* (new) For OpenACC, use a builtin for acc_on_device + actually do 
compile-time optimization when offloading is not configured.


* (new) libgomp.texi: Accumulated typo fixes, improved wording, and, for
acc_on_device, a note that compile-time folding may be done (and
how it can be disabled).


For OpenACC, I now mix compile-time folding and runtime checks to ensure
that it works.


Tested on x86-64 without and with offloading configured, running with 
nvptx offloading.


Code review, comments, suggestions, remarks?

Tobias

PS: The testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c example
is not completely clear to me; however, the new optimization means
that without offloading enabled, the dump message is not shown. I
tried to understand it better with -fno-builtin-acc_on_device, but
that then caused link errors as the device function wasn't optimized
away, leaving me puzzled. In the end, I just changed the dg-* and
did not try to understand the issue.

Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP's __builtin_omp_is_initial_device

It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE
due to comparing 'int' with 'logical(4)'. When digging deeper, it also
turned out that when the procedure pointer is needed, the builtin cannot
be used, either.  (Follow up to r15-2799-gf1bfba3a9b3f31 )

Extend the code to also use the builtin acc_on_device with OpenACC,
which was previously only used in C/C++.  Additionally, fix folding
when offloading is not enabled.

Additionally fixes the BT_BOOL data type, which was 'char'/integer(1)
instead of bool, breaking the booleanness; use bool_type_node as in the
rest of GCC.

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_option_t): Add disable_acc_on_device.
	* options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device.
	* trans-decl.cc (gfc_get_extern_function_decl): Move
	__builtin_omp_is_initial_device handling to ...
	* trans-expr.cc (get_builtin_fn): ... this new function.
	(conv_function_val): Call it.
	(update_builtin_function): New.
	(gfc_conv_procedure_call): Call it.
	* types.def (BT_BOOL): Fix type by using bool_type_node.

gcc/ChangeLog:

	* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
	when offloading is not configured.

libgomp/ChangeLog:

	* libgomp.texi (TR13): Fix minor typos.
	(omp_is_initial_device): Improve wording.
	(acc_on_device): Note how to disable the builtin.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
	Add -fno-builtin-acc_on_device.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update
	dg- as !offloading_enabled now compile-time expands acc_on_device.
	* testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.

 gcc/fortran/gfortran.h |  3 +-
 gcc/fortran/options.cc |  5 +-
 gcc/fortran/trans-decl.cc  |  9 
 gcc/fortran/trans-expr.cc  | 58 +++---
 gcc/fortran/types.def  |  3 +-
 gcc/gimple-fold.cc |  2 +-
 libgomp/libgomp.texi   | 18 ---
 .../libgomp.fortran/target-is-initial-device-3.f90 | 50 +++
 .../libgomp.oacc-c-c++-common/routine-nohost-1.c   |  3 +-
 .../libgomp.oacc-fortran/acc_on_device-1-1.f90 |  5 --
 .../libgomp.oacc-fortran/acc_on_device-1-2.f   |  7 +--
 .../libgomp.oacc-fortran/acc_on_device-1-3.f   |  7 +--
 .../libgomp.oacc-fortran/acc_on_device-2.f90   | 40 +++
 13 files changed, 164 insertions(+), 46 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 917866a7ef0..680e7f7b75b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3200,7 +3200,8 @@ typedef struct
   int flag_init_logical;
   int flag_init_character;
   char flag_init_character_value;
-  int disable_omp_is_initial_device;
+  bool disable_omp_is_initial_device;
+  bool disable_acc_on_device;
 
   int fpe;
   int fpe_summary;
diff --git a/gcc/fortran/options.cc b/gcc/fortran/options.cc
index 6f2579

Re: [PATCH] [PR116831] match.pd: Check trunc_mod vector optab before folding.

2024-10-10 Thread Jennifer Schmitz


> On 10 Oct 2024, at 09:03, Richard Biener  wrote:
> 
> 
> On Wed, 9 Oct 2024, Jennifer Schmitz wrote:
> 
>> 
>>> On 8 Oct 2024, at 10:31, Richard Biener  wrote:
>>> 
>>> 
>>> On Fri, 4 Oct 2024, Jennifer Schmitz wrote:
>>> 
 As in https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663185.html,
 this patch guards the simplification x / y * y == x -> x % y == 0 in
 match.pd for vector types by a check for:
 1) Support of the mod optab for vectors OR
 2) Application before vector lowering for non-VL vectors.
 
 The patch was bootstrapped and tested with no regression on
 aarch64-linux-gnu and x86_64-linux-gnu.
 OK for mainline?
>>> 
>>> -  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE)
>>> +  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE
>>> +   || (VECTOR_INTEGER_TYPE_P (type)
>>> +  && ((optimize_vectors_before_lowering_p ()
>>> +   && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
>>> +  || target_supports_op_p (type, TRUNC_MOD_EXPR,
>>> +   optab_vector
>>> 
>>> this looks a bit odd, VECTOR_INTEGER_TYPE_P (type) checks the
>>> result type of the comparison.  I think the whole condition is
>>> better written as
>>> 
>>> (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE
>>> && (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (@0)))
>>> || !target_supports_op_p (TREE_TYPE (@0), TRUNC_DIV_EXPR,
>>>   optab_vector)
>>> || target_supports_op_p (TREE_TYPE (@0), TRUNC_MOD_EXPR,
>>>   optab_vector)))
>>> 
>>> when we have non-vector mode we're before lowering, likewise when
>>> the target doesn't support the division.  Even before lowering
>>> we shouldn't replace a supported division (and multiplication)
>>> with an unsupported modulo.
>> Dear Richard,
>> thanks for the review. I updated the patch with your suggestion and 
>> re-validated on aarch64 and x86_64.
>> Best,
>> Jennifer
>> 
>> This patch guards the simplification x / y * y == x -> x % y == 0 in
>> match.pd by a check for:
>> 1) Non-vector mode of x OR
>> 2) Lack of support for vector division OR
>> 3) Support of vector modulo
>> 
>> The patch was bootstrapped and tested with no regression on
>> aarch64-linux-gnu and x86_64-linux-gnu.
>> OK for mainline?
> 
> OK.
Thanks, committed with a2e06b7f081a3d2e50e3afa8d3f1676a05099707.
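As a small scalar illustration of the guarded simplification (an
editorial sketch, not part of the patch):

  /* For integral x and y with y != 0, trunc_div/trunc_mod satisfy
     x == x / y * y + x % y, so the two forms below are equivalent.  */
  int before (int x, int y) { return x / y * y == x; }
  int after  (int x, int y) { return x % y == 0; }

For vector operands the rewrite is only a win when the target can expand
the vector modulo, which is exactly what the new condition checks.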
> 
> Thanks,
> Richard.
> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>  PR tree-optimization/116831
>>  * match.pd: Guard simplification to trunc_mod with check for
>>  mod optab support.
>> 
>> gcc/testsuite/
>>  PR tree-optimization/116831
>>  * gcc.dg/torture/pr116831.c: New test.
>> ---
>> gcc/match.pd|  9 +++--
>> gcc/testsuite/gcc.dg/torture/pr116831.c | 10 ++
>> 2 files changed, 17 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/torture/pr116831.c
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index ba83f0f29e6..9b59b5c12f1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -5380,8 +5380,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> /* x / y * y == x -> x % y == 0.  */
>> (simplify
>>   (eq:c (mult:c (trunc_div:s @0 @1) @1) @0)
>> -  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE)
>> -(eq (trunc_mod @0 @1) { build_zero_cst (TREE_TYPE (@0)); })))
>> +  (if (TREE_CODE (TREE_TYPE (@0)) != COMPLEX_TYPE
>> +   && (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (@0)))
>> +|| !target_supports_op_p (TREE_TYPE (@0), TRUNC_DIV_EXPR,
>> +  optab_vector)
>> +|| target_supports_op_p (TREE_TYPE (@0), TRUNC_MOD_EXPR,
>> + optab_vector)))
>> +   (eq (trunc_mod @0 @1) { build_zero_cst (TREE_TYPE (@0)); })))
>> 
>> /* ((X /[ex] A) +- B) * A  -->  X +- A * B.  */
>> (for op (plus minus)
>> diff --git a/gcc/testsuite/gcc.dg/torture/pr116831.c 
>> b/gcc/testsuite/gcc.dg/torture/pr116831.c
>> new file mode 100644
>> index 000..92b2a130e69
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/torture/pr116831.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-additional-options "-mcpu=neoverse-v2" { target aarch64*-*-* } } */
>> +
>> +long a;
>> +int b, c;
>> +void d (int e[][5], short f[][5][5][5])
>> +{
>> +  for (short g; g; g += 4)
>> +a = c ?: e[6][0] % b ? 0 : f[0][0][0][g];
>> +}
>> +
>> 
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)





[PATCH v1 4/4] RISC-V: Add testcases for form 8 of scalar signed SAT_TRUNC

2024-10-10 Thread pan2 . li
From: Pan Li 

Form 8:
  #define DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_8 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN > x || x >= (WT)NT_MAX\
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }

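As a concrete illustration (an editorial sketch, not part of the patch),
instantiating the macro for the int16_t to int8_t case yields roughly:

  #include <stdint.h>

  /* DEF_SAT_S_TRUNC_FMT_8 (int8_t, int16_t, INT8_MIN, INT8_MAX)
     expands to the equivalent of:  */
  int8_t
  sat_s_trunc_int16_t_to_int8_t_fmt_8 (int16_t x)
  {
    int8_t trunc = (int8_t) x;
    return (int16_t) INT8_MIN > x || x >= (int16_t) INT8_MAX
      ? x < 0 ? INT8_MIN : INT8_MAX
      : trunc;
  }

so e.g. 300 saturates to 127, -300 to -128, and 42 truncates to 42.
Forms 5 to 7 in the sibling patches differ only in whether the bound
comparisons are strict.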
The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
 .../riscv/sat_s_trunc-8-i16-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-8-i32-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-8-i32-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-8-i64-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-8-i64-to-i32.c  | 26 +
 .../riscv/sat_s_trunc-8-i64-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-run-8-i16-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-8-i32-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-8-i32-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-8-i64-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-8-i64-to-i32.c  | 16 +++
 .../riscv/sat_s_trunc-run-8-i64-to-i8.c   | 16 +++
 13 files changed, 271 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i16-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i32-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i64-to-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i64-to-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 189babd22f1..2cbd1f18c8d 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -549,6 +549,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
 #define DEF_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX)
 
+#define DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_8 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN > x || x >= (WT)NT_MAX\
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_8_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX)
+
 #define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
 #define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
 
@@ -570,4 +582,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
 #define RUN_SAT_S_TRUNC_FMT_7(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_7 (x)
 #define RUN_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_7(NT, WT, x)
 
+#define RUN_SAT_S_TRUNC_FMT_8(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_8 (x)
+#define RUN_SAT_S_TRUNC_FMT_8_WRAP(NT, WT, x) RUN_SAT_S_TRU

[PATCH v1 2/4] RISC-V: Add testcases for form 6 of scalar signed SAT_TRUNC

2024-10-10 Thread pan2 . li
From: Pan Li 

Form 6:
  #define DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN >= x || x > (WT)NT_MAX\
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }

The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
 .../riscv/sat_s_trunc-6-i16-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-6-i32-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-6-i32-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-6-i64-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-6-i64-to-i32.c  | 26 +
 .../riscv/sat_s_trunc-6-i64-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-run-6-i16-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-6-i32-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-6-i32-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-6-i64-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-6-i64-to-i32.c  | 16 +++
 .../riscv/sat_s_trunc-run-6-i64-to-i8.c   | 16 +++
 13 files changed, 271 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i16-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i32-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i64-to-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i64-to-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index e3c01724f07..7a5110248f4 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -525,6 +525,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
 #define DEF_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX)
 
+#define DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN >= x || x > (WT)NT_MAX\
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX)
+
 #define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
 #define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
 
@@ -540,4 +552,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
 #define RUN_SAT_S_TRUNC_FMT_5(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_5 (x)
 #define RUN_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_5(NT, WT, x)
 
+#define RUN_SAT_S_TRUNC_FMT_6(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_6 (x)
+#define RUN_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, x) RUN_SAT_S_TRU

[PATCH v1 3/4] RISC-V: Add testcases for form 7 of scalar signed SAT_TRUNC

2024-10-10 Thread pan2 . li
From: Pan Li 

Form 7:
  #define DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN >= x || x >= (WT)NT_MAX   \
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }

The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
 .../riscv/sat_s_trunc-7-i16-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-7-i32-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-7-i32-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-7-i64-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-7-i64-to-i32.c  | 26 +
 .../riscv/sat_s_trunc-7-i64-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-run-7-i16-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-7-i32-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-7-i32-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-7-i64-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-7-i64-to-i32.c  | 16 +++
 .../riscv/sat_s_trunc-run-7-i64-to-i8.c   | 16 +++
 13 files changed, 271 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i16-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i32-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i64-to-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i64-to-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 7a5110248f4..189babd22f1 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -537,6 +537,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
 #define DEF_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX)
 
+#define DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN >= x || x >= (WT)NT_MAX   \
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX)
+
 #define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
 #define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
 
@@ -555,4 +567,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
 #define RUN_SAT_S_TRUNC_FMT_6(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_6 (x)
 #define RUN_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_6(NT, WT, x)
 
+#define RUN_SAT_S_TRUNC_FMT_7(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_7 (x)
+#define RUN_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, x) RUN_SAT_S_TRU

[PATCH v1 1/4] RISC-V: Add testcases for form 5 of scalar signed SAT_TRUNC

2024-10-10 Thread pan2 . li
From: Pan Li 

Form 5:
  #define DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN > x || x > (WT)NT_MAX \
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }

The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
 .../riscv/sat_s_trunc-5-i16-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-5-i32-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-5-i32-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-5-i64-to-i16.c  | 28 +++
 .../riscv/sat_s_trunc-5-i64-to-i32.c  | 26 +
 .../riscv/sat_s_trunc-5-i64-to-i8.c   | 26 +
 .../riscv/sat_s_trunc-run-5-i16-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-5-i32-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-5-i32-to-i8.c   | 16 +++
 .../riscv/sat_s_trunc-run-5-i64-to-i16.c  | 16 +++
 .../riscv/sat_s_trunc-run-5-i64-to-i32.c  | 16 +++
 .../riscv/sat_s_trunc-run-5-i64-to-i8.c   | 16 +++
 13 files changed, 271 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i16-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i32-to-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i32-to-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i64-to-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i64-to-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i64-to-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 0b3d0ea7073..e3c01724f07 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -513,6 +513,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
 #define DEF_SAT_S_TRUNC_FMT_4_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX)
 
+#define DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN > x || x > (WT)NT_MAX \
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX)
+
 #define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
 #define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
 
@@ -525,4 +537,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
 #define RUN_SAT_S_TRUNC_FMT_4(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_4 (x)
 #define RUN_SAT_S_TRUNC_FMT_4_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_4(NT, WT, x)
 
+#define RUN_SAT_S_TRUNC_FMT_5(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_5 (x)
+#define RUN_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, x) RUN_SAT_S_TRU

Re: [PATCH v1 2/2] RISC-V: Add testcases for form 4 of scalar signed SAT_TRUNC

2024-10-10 Thread 钟居哲
lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-10 14:53
To: gcc-patches
CC: richard.guenther; Tamar.Christina; juzhe.zhong; kito.cheng; jeffreyalaw; 
rdapp.gcc; Pan Li
Subject: [PATCH v1 2/2] RISC-V: Add testcases for form 4 of scalar signed 
SAT_TRUNC
From: Pan Li 
 
Form 4:
  #define DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN <= x && x < (WT)NT_MAX\
  ? trunc   \
  : x < 0 ? NT_MIN : NT_MAX;\
  }
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-4-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i64-to-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
.../riscv/sat_s_trunc-4-i16-to-i8.c   | 26 +
.../riscv/sat_s_trunc-4-i32-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-4-i32-to-i8.c   | 26 +
.../riscv/sat_s_trunc-4-i64-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-4-i64-to-i32.c  | 26 +
.../riscv/sat_s_trunc-4-i64-to-i8.c   | 26 +
.../riscv/sat_s_trunc-run-4-i16-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-4-i32-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-4-i32-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-4-i64-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-4-i64-to-i32.c  | 16 +++
.../riscv/sat_s_trunc-run-4-i64-to-i8.c   | 16 +++
13 files changed, 271 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-4-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-4-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-4-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-4-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-4-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-4-i64-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-4-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-4-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-4-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-4-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-4-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-4-i64-to-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 607bc4fc82e..0b3d0ea7073 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -501,6 +501,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_3 (WT x) \
#define DEF_SAT_S_TRUNC_FMT_3_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX)
+#define DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN <= x && x < (WT)NT_MAX\
+? trunc   \
+: x < 0 ? NT_MIN : NT_MAX;\
+}
+#define DEF_SAT_S_TRUNC_FMT_4_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX)
+
#define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
#define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
@@ -510,4 +522,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_3 (WT x) \
#define RUN_SAT_S_TRUNC_FMT_3(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_3 (x)
#define RUN_SAT_S_TRUNC_FMT_3_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_3(NT, WT, x)
+#define RUN_SAT_S_TRUNC_FMT_4(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_4 (x)
+#define RUN_SAT_S_TRUNC_FMT_4_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_4(NT, WT, x)
+
#endif
diff --git a/gcc/testsuite/

Re: [PATCH v1 4/4] RISC-V: Add testcases for form 8 of scalar signed SAT_TRUNC

2024-10-10 Thread 钟居哲
lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-10 16:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 4/4] RISC-V: Add testcases for form 8 of scalar signed 
SAT_TRUNC
From: Pan Li 
 
Form 8:
  #define DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_8 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN > x || x >= (WT)NT_MAX\
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }
 
The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
.../riscv/sat_s_trunc-8-i16-to-i8.c   | 26 +
.../riscv/sat_s_trunc-8-i32-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-8-i32-to-i8.c   | 26 +
.../riscv/sat_s_trunc-8-i64-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-8-i64-to-i32.c  | 26 +
.../riscv/sat_s_trunc-8-i64-to-i8.c   | 26 +
.../riscv/sat_s_trunc-run-8-i16-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-8-i32-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-8-i32-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-8-i64-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-8-i64-to-i32.c  | 16 +++
.../riscv/sat_s_trunc-run-8-i64-to-i8.c   | 16 +++
13 files changed, 271 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-8-i64-to-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 189babd22f1..2cbd1f18c8d 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -549,6 +549,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
#define DEF_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX)
+#define DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_8 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN > x || x >= (WT)NT_MAX\
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_8_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX)
+
#define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
#define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
@@ -570,4 +582,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
#define RUN_SAT_S_TRUNC_FMT_7(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_7 (x)
#define RUN_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_7(NT, WT, x)
+#define RUN_SAT_S_TRUNC_FMT_8(NT, WT, x) sat_s_t

Re: [PATCH v1 3/4] RISC-V: Add testcases for form 7 of scalar signed SAT_TRUNC

2024-10-10 Thread 钟居哲
lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-10 16:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 3/4] RISC-V: Add testcases for form 7 of scalar signed 
SAT_TRUNC
From: Pan Li 
 
Form 7:
  #define DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN >= x || x >= (WT)NT_MAX   \
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }
 
The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
.../riscv/sat_s_trunc-7-i16-to-i8.c   | 26 +
.../riscv/sat_s_trunc-7-i32-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-7-i32-to-i8.c   | 26 +
.../riscv/sat_s_trunc-7-i64-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-7-i64-to-i32.c  | 26 +
.../riscv/sat_s_trunc-7-i64-to-i8.c   | 26 +
.../riscv/sat_s_trunc-run-7-i16-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-7-i32-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-7-i32-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-7-i64-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-7-i64-to-i32.c  | 16 +++
.../riscv/sat_s_trunc-run-7-i64-to-i8.c   | 16 +++
13 files changed, 271 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-7-i64-to-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 7a5110248f4..189babd22f1 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -537,6 +537,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
#define DEF_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX)
+#define DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN >= x || x >= (WT)NT_MAX   \
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_7_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX)
+
#define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
#define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
@@ -555,4 +567,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
#define RUN_SAT_S_TRUNC_FMT_6(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_6 (x)
#define RUN_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_6(NT, WT, x)
+#define RUN_SAT_S_TRUNC_FMT_7(NT, WT, x) sat_s_t

Re: [PATCH] aarch64: Fix folding of degenerate svwhilele case [PR117045]

2024-10-10 Thread Kyrylo Tkachov



> On 9 Oct 2024, at 17:35, Richard Sandiford  wrote:
> 
> 
> Tamar Christina  writes:
>> Hi Richard,
>> 
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Wednesday, October 9, 2024 12:58 PM
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: ktkac...@nvidia.com; Richard Earnshaw ;
>>> Tamar Christina 
>>> Subject: [PATCH] aarch64: Fix folding of degenerate svwhilele case 
>>> [PR117045]
>>> 
>>> The svwhilele folder mishandled the degenerate case in which
>>> the second argument is the maximum integer.  In that case,
>>> the result is all-true regardless of the first parameter:
>>> 
>>>  If the second scalar operand is equal to the maximum signed integer
>>>  value then a condition which includes an equality test can never fail
>>>  and the result will be an all-true predicate.
>>> 
>>> This is because the conceptual "increment the first operand
>>> by 1 after each element" is done modulo the range of the operand.
>>> The GCC code was instead treating it as infinite precision.
>>> whilele_5.c even had a test for the incorrect behaviour.
>>> 
>>> The easiest fix seemed to be to handle that case specially before
>>> doing constant folding.  This also copes with variable first operands.
>>> 
>>> Tested on aarch64-linux-gnu.  I'll push on Friday if there are no
>>> comments before then.  Since it's a wrong-code bug, I'd also like
>>> to backport to release branches.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> 
>>> gcc/
>>> PR target/116999
>>> PR target/117045
>>> * config/aarch64/aarch64-sve-builtins-base.cc
>>> (svwhilelx_impl::fold): Check for WHILELTs of the minimum value
>>> and WHILELEs of the maximum value.  Fold them to all-false and
>>> all-true respectively.
>>> 
>>> gcc/testsuite/
>>> PR target/116999
>>> PR target/117045
>>> * gcc.target/aarch64/sve/acle/general/whilele_5.c: Fix bogus
>>> expected result.
>>> * gcc.target/aarch64/sve/acle/general/whilele_11.c: New test.
>>> * gcc.target/aarch64/sve/acle/general/whilele_12.c: Likewise.
>>> ---
>>> .../aarch64/aarch64-sve-builtins-base.cc  | 11 +-
>>> .../aarch64/sve/acle/general/whilele_11.c | 31 +
>>> .../aarch64/sve/acle/general/whilele_12.c | 34 +++
>>> .../aarch64/sve/acle/general/whilele_5.c  |  2 +-
>>> 4 files changed, 76 insertions(+), 2 deletions(-)
>>> create mode 100644
>>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> create mode 100644
>>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_12.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> index 4b33585d981..3d0975e4294 100644
>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> @@ -2945,7 +2945,9 @@ public:
>>> : while_comparison (unspec_for_sint, unspec_for_uint), m_eq_p (eq_p)
>>>   {}
>>> 
>>> -  /* Try to fold a call by treating its arguments as constants of type T.  
>>> */
>>> +  /* Try to fold a call by treating its arguments as constants of type T.
>>> + We have already filtered out the degenerate cases of X .LT. MIN
>>> + and X .LE. MAX.  */
>>>   template
>>>   gimple *
>>>   fold_type (gimple_folder &f) const
>>> @@ -3001,6 +3003,13 @@ public:
>>> if (f.vectors_per_tuple () > 1)
>>>   return nullptr;
>>> 
>>> +/* Filter out cases where the condition is always true or always 
>>> false.  */
>>> +tree arg1 = gimple_call_arg (f.call, 1);
>>> +if (!m_eq_p && operand_equal_p (arg1, TYPE_MIN_VALUE (TREE_TYPE
>>> (arg1
>>> +  return f.fold_to_pfalse ();
>> 
>> Just a quick question for my own understanding, I assume the reason MIN
>> is handled here is because fold_type will decrement the value at some point?
>> 
>> Otherwise wouldn't MIN + 1 still fit inside the type's precision?
>> 
>> FWIW patch looks good to me, just wondering why the MIN case is needed :)
> 
> I admit it probably isn't needed to fix the bug.  I just thought it would
> look strange if we handled the arg1 extremity for m_eq_p without also
> handling it for !m_eq_p.


The patch LGTM.
Thanks,
Kyrill
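
As an editorial illustration of the two degenerate folds (using the SVE
ACLE; these functions are not tests from the patch):

  #include <arm_sve.h>
  #include <stdint.h>

  svbool_t f_le (int32_t x)
  {
    /* x <= INT32_MAX can never fail, so this now folds to an all-true
       predicate regardless of x.  */
    return svwhilele_b8_s32 (x, INT32_MAX);
  }

  svbool_t f_lt (int32_t x)
  {
    /* x < INT32_MIN can never hold, so this folds to all-false.  */
    return svwhilelt_b8_s32 (x, INT32_MIN);
  }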

> 
> Thanks,
> Richard
> 
>> 
>> Cheers,
>> Tamar
>> 
>>> +if (m_eq_p && operand_equal_p (arg1, TYPE_MAX_VALUE (TREE_TYPE
>>> (arg1
>>> +  return f.fold_to_ptrue ();
>>> +
>>> if (f.type_suffix (1).unsigned_p)
>>>   return fold_type (f);
>>> else
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> new file mode 100644
>>> index 000..2be9dc5c534
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> @@ -0,0 +1,31 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2" } */
>>> +
+#include <arm_sve.h>
+#include <stdint.h>
>>> +
>>> +svbool_t
>>> +f1 (volatile int32_t *ptr)
>>> +{
>>>

Re: [PATCH v1 1/4] RISC-V: Add testcases for form 5 of scalar signed SAT_TRUNC

2024-10-10 Thread 钟居哲
lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-10 16:33
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 1/4] RISC-V: Add testcases for form 5 of scalar signed 
SAT_TRUNC
From: Pan Li 
 
Form 5:
  #define DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN > x || x > (WT)NT_MAX \
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }
 
The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
.../riscv/sat_s_trunc-5-i16-to-i8.c   | 26 +
.../riscv/sat_s_trunc-5-i32-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-5-i32-to-i8.c   | 26 +
.../riscv/sat_s_trunc-5-i64-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-5-i64-to-i32.c  | 26 +
.../riscv/sat_s_trunc-5-i64-to-i8.c   | 26 +
.../riscv/sat_s_trunc-run-5-i16-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-5-i32-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-5-i32-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-5-i64-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-5-i64-to-i32.c  | 16 +++
.../riscv/sat_s_trunc-run-5-i64-to-i8.c   | 16 +++
13 files changed, 271 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-5-i64-to-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 0b3d0ea7073..e3c01724f07 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -513,6 +513,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
#define DEF_SAT_S_TRUNC_FMT_4_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX)
+#define DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN > x || x > (WT)NT_MAX \
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX)
+
#define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
#define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
@@ -525,4 +537,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
#define RUN_SAT_S_TRUNC_FMT_4(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_4 (x)
#define RUN_SAT_S_TRUNC_FMT_4_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_4(NT, WT, x)
+#define RUN_SAT_S_TRUNC_FMT_5(NT, WT, x) sat_s_t

Re: [PATCH v1 2/4] RISC-V: Add testcases for form 6 of scalar signed SAT_TRUNC

2024-10-10 Thread 钟居哲
lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-10-10 16:33
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 2/4] RISC-V: Add testcases for form 6 of scalar signed 
SAT_TRUNC
From: Pan Li 
 
Form 6:
  #define DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
  NT __attribute__((noinline))  \
  sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
  { \
NT trunc = (NT)x;   \
return (WT)NT_MIN >= x || x > (WT)NT_MAX\
  ? x < 0 ? NT_MIN : NT_MAX \
  : trunc;  \
  }
 
The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i8.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/sat_arith.h| 15 ++
.../riscv/sat_s_trunc-6-i16-to-i8.c   | 26 +
.../riscv/sat_s_trunc-6-i32-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-6-i32-to-i8.c   | 26 +
.../riscv/sat_s_trunc-6-i64-to-i16.c  | 28 +++
.../riscv/sat_s_trunc-6-i64-to-i32.c  | 26 +
.../riscv/sat_s_trunc-6-i64-to-i8.c   | 26 +
.../riscv/sat_s_trunc-run-6-i16-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-6-i32-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-6-i32-to-i8.c   | 16 +++
.../riscv/sat_s_trunc-run-6-i64-to-i16.c  | 16 +++
.../riscv/sat_s_trunc-run-6-i64-to-i32.c  | 16 +++
.../riscv/sat_s_trunc-run-6-i64-to-i8.c   | 16 +++
13 files changed, 271 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i16-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i32-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i32-to-i8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i64-to-i16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i64-to-i32.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_trunc-run-6-i64-to-i8.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index e3c01724f07..7a5110248f4 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -525,6 +525,18 @@ sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
#define DEF_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, NT_MIN, NT_MAX) \
   DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX)
+#define DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
+NT __attribute__((noinline))  \
+sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
+{ \
+  NT trunc = (NT)x;   \
+  return (WT)NT_MIN >= x || x > (WT)NT_MAX\
+? x < 0 ? NT_MIN : NT_MAX \
+: trunc;  \
+}
+#define DEF_SAT_S_TRUNC_FMT_6_WRAP(NT, WT, NT_MIN, NT_MAX) \
+  DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX)
+
#define RUN_SAT_S_TRUNC_FMT_1(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_1 (x)
#define RUN_SAT_S_TRUNC_FMT_1_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_1(NT, WT, x)
@@ -540,4 +552,7 @@ sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
#define RUN_SAT_S_TRUNC_FMT_5(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_5 (x)
#define RUN_SAT_S_TRUNC_FMT_5_WRAP(NT, WT, x) RUN_SAT_S_TRUNC_FMT_5(NT, WT, x)
#define RUN_SAT_S_TRUNC_FMT_6(NT, WT, x) sat_s_trunc_##WT##_to_##NT##_fmt_6 (x)

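Editorial note: for reference, hand-expanding DEF_SAT_S_TRUNC_FMT_6 for the
i16-to-i8 case yields the function below. This is a sketch only; the macro
arguments int8_t, int16_t, INT8_MIN and INT8_MAX are assumed from the
sibling FMT tests.

#include <stdint.h>

int8_t __attribute__ ((noinline))
sat_s_trunc_int16_t_to_int8_t_fmt_6 (int16_t x)
{
  int8_t trunc = (int8_t) x;
  return (int16_t) INT8_MIN >= x || x > (int16_t) INT8_MAX
    ? x < 0 ? INT8_MIN : INT8_MAX
    : trunc;
}
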
Re: [PATCH] RISC-V:Bugfix for C++ code compilation failure with rv32imafc_zve32f[pr116883]

2024-10-10 Thread 钟居哲
LGTM from my side. But I'd rather let kito chime in to see more comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2024-10-10 14:24
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
Subject: [PATCH] RISC-V:Bugfix for C++ code compilation failure with 
rv32imafc_zve32f[pr116883]
From: xuli 
 
Example as follows:
 
int main()
{
  unsigned long arraya[128], arrayb[128], arrayc[128];
  for (int i = 0; i < 128; i++)
   {
  arraya[i] = arrayb[i] + arrayc[i];
   }
  return 0;
}
 
Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation 
issue:
 
riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t 
__riscv_vle64(vbool16_t, const long long int*, unsigned int)'
   40 | #pragma riscv intrinsic "vector"
  | ^~~~
riscv_vector.h:40:25: note: old declaration 'vint64m1_t 
__riscv_vle64(vbool64_t, const long long int*, unsigned int)'
 
With zvl=32b, vbool16_t is registered in init_builtins() with
type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].
 
Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
is not registered in init_builtins(), meaning vbool64_t=null.
 
In order to implement __attribute__((target("arch=+v"))), we must register
all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered
by default with zvl=128b in reinit_builtins(), resulting in
type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].
 
We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t),
calculated using type_common.precision, resulting in 2. Since vbool16_t and
vbool64_t have the same element type (boolean_type), the compiler treats them
as the same type, leading to a re-declaration conflict.
 
After all types and intrinsics have been registered, processing
__attribute__((target("arch=+v"))) will update the option parameters and
call init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose
zvl=4096b for the null type in reinit_builtins().
 
command option zvl=32b
  type       nunits
  vbool64_t => null
  vbool32_t => [1,1]
  vbool16_t => [2,2]
  vbool8_t  => [4,4]
  vbool4_t  => [8,8]
  vbool2_t  => [16,16]
  vbool1_t  => [32,32]
 
reinit zvl=128b
  vbool64_t => [2,2]   conflict with zvl32b vbool16_t => [2,2]
reinit zvl=256b
  vbool64_t => [4,4]   conflict with zvl32b vbool8_t  => [4,4]
reinit zvl=512b
  vbool64_t => [8,8]   conflict with zvl32b vbool4_t  => [8,8]
reinit zvl=1024b
  vbool64_t => [16,16] conflict with zvl32b vbool2_t  => [16,16]
reinit zvl=2048b
  vbool64_t => [32,32] conflict with zvl32b vbool1_t  => [32,32]
reinit zvl=4096b
  vbool64_t => [64,64] zvl=4096b is ok
 
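Editorial aside: the underlying conflict can be reproduced in miniature with
GNU vector types (a sketch, not the RVV headers; the C++ front end reports
the analogous clash as the "ambiguating new declaration" seen above):

typedef int v2a __attribute__ ((vector_size (8)));
typedef int v2b __attribute__ ((vector_size (8)));

v2a f (v2a x);
/* v2b has the same element type and the same number of units as v2a,
   so it is the very same type: this is a conflicting redeclaration
   of f, not a second function.  */
float f (v2b x);
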
Signed-off-by: xuli 
 
PR target/116883
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute):
Choose zvl4096b to initialize the null type.
 
gcc/testsuite/ChangeLog:
 
* g++.target/riscv/rvv/base/pr116883.C: New test.
---
gcc/config/riscv/riscv-c.cc   |  7 ++-
.../g++.target/riscv/rvv/base/pr116883.C  | 15 +++
2 files changed, 21 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d..c59f408d3a8 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -59,7 +59,12 @@ riscv_pragma_intrinsic_flags_pollute (struct 
pragma_intrinsic_flags *flags)
   riscv_zvl_flags = riscv_zvl_flags
 | MASK_ZVL32B
 | MASK_ZVL64B
-| MASK_ZVL128B;
+| MASK_ZVL128B
+| MASK_ZVL256B
+| MASK_ZVL512B
+| MASK_ZVL1024B
+| MASK_ZVL2048B
+| MASK_ZVL4096B;
   riscv_vector_elen_flags = riscv_vector_elen_flags
 | MASK_VECTOR_ELEN_32
diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C 
b/gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C
new file mode 100644
index 000..15bbec40bdd
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C
@@ -0,0 +1,15 @@
+/* Test that we do not ICE when compiling.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imafc_zve32f -mabi=ilp32f" } */
+
+#include <riscv_vector.h>
+
+int main()
+{
+  unsigned long arraya[128], arrayb[128], arrayc[128];
+  for (int i = 0; i < 128; i++)
+   {
+  arraya[i] = arrayb[i] + arrayc[i];
+   }
+  return 0;
+}
-- 
2.17.1
 
 


Re: [PATCH] RISC-V:Bugfix for C++ code compilation failure with rv32imafc_zve32f[pr116883]

2024-10-10 Thread Kito Cheng
LGTM

On Thu, Oct 10, 2024 at 4:52 PM 钟居哲  wrote:
>
> LGTM from my side. But I'd rather let kito chime in to see more comments.
>
> Thanks.
> 
> juzhe.zh...@rivai.ai
>
>
> From: Li Xu
> Date: 2024-10-10 14:24
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
> Subject: [PATCH] RISC-V:Bugfix for C++ code compilation failure with 
> rv32imafc_zve32f[pr116883]
> From: xuli 
>
> Example as follows:
>
> int main()
> {
>   unsigned long arraya[128], arrayb[128], arrayc[128];
>   for (int i = 0; i < 128; i++)
>{
>   arraya[i] = arrayb[i] + arrayc[i];
>}
>   return 0;
> }
>
> Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a 
> compilation issue:
>
> riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t 
> __riscv_vle64(vbool16_t, const long long int*, unsigned int)'
>40 | #pragma riscv intrinsic "vector"
>   | ^~~~
> riscv_vector.h:40:25: note: old declaration 'vint64m1_t 
> __riscv_vle64(vbool64_t, const long long int*, unsigned int)'
>
> With zvl=32b, vbool16_t is registered in init_builtins() with
> type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].
>
> Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
> is not registered in init_builtins(), meaning vbool64_t=null.
>
> In order to implement __attribute__((target("arch=+v"))), we must register
> all vector types and all RVV intrinsics. Therefore, vbool64_t will be 
> registered
> by default with zvl=128b in reinit_builtins(), resulting in
> type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].
>
> We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == 
> TYPE_VECTOR_SUBPARTS(vbool64_t),
> calculated using type_common.precision, resulting in 2. Since vbool16_t and
> vbool64_t have the same element type (boolean_type), the compiler treats them
> as the same type, leading to a re-declaration conflict.
>
> After all types and intrinsics have been registered, processing
> __attribute__((target("arch=+v"))) will update the parameters option and
> init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose
> zvl=4096b for the null type reinit_builtins().
>
> command option zvl=32b
>   type nunits
>   vbool64_t => null
>   vbool32_t=> [1,1]
>   vbool16_t=> [2,2]
>   vbool8_t=>  [4,4]
>   vbool4_t=>  [8,8]
>   vbool2_t=>  [16,16]
>   vbool1_t=>  [32,32]
>
> reinit zvl=128b
>   vbool64_t => [2,2] conflict with zvl32b vbool16_t=> [2,2]
> reinit zvl=256b
>   vbool64_t => [4,4] conflict with zvl32b vbool8_t=>  [4,4]
> reinit zvl=512b
>   vbool64_t => [8,8] conflict with zvl32b vbool4_t=>  [8,8]
> reinit zvl=1024b
>   vbool64_t => [16,16] conflict with zvl32b vbool2_t=>  [16,16]
> reinit zvl=2048b
>   vbool64_t => [32,32] conflict with zvl32b vbool1_t=>  [32,32]
> reinit zvl=4096b
>   vbool64_t => [64,64] zvl=4096b is ok
>
> Signed-off-by: xuli 
>
> PR target/116883
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute):
> Choose zvl4096b to initialize the null type.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/pr116883.C: New test.
> ---
> gcc/config/riscv/riscv-c.cc   |  7 ++-
> .../g++.target/riscv/rvv/base/pr116883.C  | 15 +++
> 2 files changed, 21 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C
>
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index 71112d9c66d..c59f408d3a8 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -59,7 +59,12 @@ riscv_pragma_intrinsic_flags_pollute (struct 
> pragma_intrinsic_flags *flags)
>riscv_zvl_flags = riscv_zvl_flags
>  | MASK_ZVL32B
>  | MASK_ZVL64B
> -| MASK_ZVL128B;
> +| MASK_ZVL128B
> +| MASK_ZVL256B
> +| MASK_ZVL512B
> +| MASK_ZVL1024B
> +| MASK_ZVL2048B
> +| MASK_ZVL4096B;
>riscv_vector_elen_flags = riscv_vector_elen_flags
>  | MASK_VECTOR_ELEN_32
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C 
> b/gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C
> new file mode 100644
> index 000..15bbec40bdd
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr116883.C
> @@ -0,0 +1,15 @@
> +/* Test that we do not ICE when compiling.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32imafc_zve32f -mabi=ilp32f" } */
> +
> +#include <riscv_vector.h>
> +
> +int main()
> +{
> +  unsigned long arraya[128], arrayb[128], arrayc[128];
> +  for (int i = 0; i < 128; i++)
> +   {
> +  arraya[i] = arrayb[i] + arrayc[i];
> +   }
> +  return 0;
> +}
> --
> 2.17.1
>
>


Re: [PATCH] phiopt: Remove candorest variable return instead

2024-10-10 Thread Richard Biener



> Am 10.10.2024 um 17:23 schrieb Andrew Pinski :
> 
> After r15-3560-gb081e6c860eb9688d24365d39, the setting of candorest
> with the break can simply be changed to a return, since this code is
> inside a lambda now.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

> gcc/ChangeLog:
> 
>* tree-ssa-phiopt.cc (pass_phiopt::execute): Remove candorest
>and return instead of setting candorest.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/tree-ssa-phiopt.cc | 7 +--
> 1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 43b65b362a3..f3ee3a80c0f 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -4322,7 +4322,6 @@ pass_phiopt::execute (function *)
>}
> 
>   gimple_stmt_iterator gsi;
> -  bool candorest = true;
> 
>   /* Check that we're looking for nested phis.  */
>   basic_block merge = diamond_p ? EDGE_SUCC (bb2, 0)->dest : bb2;
> @@ -4338,15 +4337,11 @@ pass_phiopt::execute (function *)
>tree arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
>if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1) == 2)
>  {
> -candorest = false;
>cfgchanged = true;
> -break;
> +return;
>  }
>  }
> 
> -  if (!candorest)
> -return;
> -
>   gphi *phi = single_non_singleton_phi_for_edges (phis, e1, e2);
>   if (!phi)
>return;
> --
> 2.34.1
> 


Re: Fix PR116650: check all regs in regrename targets

2024-10-10 Thread Richard Biener



> Am 10.10.2024 um 16:56 schrieb Michael Matz :
> 
> (this came up for m68k vs. LRA, but is a generic problem)
> 
> Regrename wants to use new registers for certain def-use chains.
> For validity of replacements it needs to check that the selected
> candidates are unused up to then.  That's done in check_new_reg_p.
> But if it so happens that the new register needs more hardregs
> than the old register (which happens if the target allows inter-bank
> moves and the mode is something like a DFmode that needs to be placed
> into a SImode reg-pair), then check_new_reg_p only checks the
> first of those registers for free-ness.
> 
> This is caused by that function looking up the number of necessary
> hardregs only in terms of the old hardreg number.  It of course needs
> to do that in terms of the new candidate regnumber.  The symptom is that
> regrename sometimes clobbers the higher numbered registers of such a
> regrename target pair.  This patch fixes that problem.
> 
> (In the particular case of the bug report it was LRA that left over a
> inter-bank move instruction that triggers regrename, ultimately causing
> the mis-compile.  Reload didn't do that, but in general we of course
> can't rely on such moves not happening if the target allows them.)
> 
> This also shows a general confusion in that function and the target hook
> interface here:
> 
>  for (i = nregs - 1; i >= 0; --i)
>...
>|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))
> 
> it uses nregs in a way that requires it to be the same between old and
> new register.  The problem is that the target hook only gets register
> numbers, when it instead should get a mode and register numbers and
> would be called only for the first but not for subsequent registers.
> I've looked at a number of definitions of that target hook and I think
> that this is currently harmless in the sense that it would merely rule
> out some potential reg-renames that would in fact be okay to do.  So I'm
> not changing the target hook interface here and hence that problem
> remains unfixed.

Can you please open a bugreport tracking this?

The patch is OK.

Thanks,
Richard 

>PR rtl-optimization/116650
>* regrename.cc (check_new_reg_p): Calculate nregs in terms of
>the new candidate register.
> ---
> 
> Regstrapped on x86-64-linux, okay for trunk?
> 
> 
> Ciao,
> Michael.
> 
> ---
> gcc/regrename.cc | 25 +++--
> 1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/regrename.cc b/gcc/regrename.cc
> index 054e601740b..22668d7bf57 100644
> --- a/gcc/regrename.cc
> +++ b/gcc/regrename.cc
> @@ -324,10 +324,27 @@ static bool
> check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
> class du_head *this_head, HARD_REG_SET this_unavailable)
> {
> -  int nregs = this_head->nregs;
> +  int nregs = 1;
>   int i;
>   struct du_chain *tmp;
> 
> +  /* See whether new_reg accepts all modes that occur in
> + definition and uses and record the number of regs it would take.  */
> +  for (tmp = this_head->first; tmp; tmp = tmp->next_use)
> +{
> +  int n;
> +  /* Completely ignore DEBUG_INSNs, otherwise we can get
> + -fcompare-debug failures.  */
> +  if (DEBUG_INSN_P (tmp->insn))
> +continue;
> +
> +  if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc)))
> +return false;
> +  n = hard_regno_nregs (new_reg, GET_MODE (*tmp->loc));
> +  if (n > nregs)
> +nregs = n;
> +}
> +
>   for (i = nregs - 1; i >= 0; --i)
> if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
>|| fixed_regs[new_reg + i]
> @@ -348,14 +365,10 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
>  definition and uses.  */
>   for (tmp = this_head->first; tmp; tmp = tmp->next_use)
> {
> -  /* Completely ignore DEBUG_INSNs, otherwise we can get
> - -fcompare-debug failures.  */
>   if (DEBUG_INSN_P (tmp->insn))
>continue;
> 
> -  if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc))
> -  || call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc),
> -new_reg))
> +  if (call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc), 
> new_reg))
>return false;
> }
> 
> --
> 2.39.1


Re: Fix PR116650: check all regs in regrename targets

2024-10-10 Thread Michael Matz
Hello again,

On Thu, 10 Oct 2024, Michael Matz wrote:

> > Can you please open a bugreport tracking this?
> 
> PR116850.

Gah, too many tabs :)  PR117064 I meant.


Ciao,
Michael.


Re: Fix PR116650: check all regs in regrename targets

2024-10-10 Thread Michael Matz
Hi,

On Thu, 10 Oct 2024, Richard Biener wrote:

> > This also shows a general confusion in that function and the target hook
> > interface here:
> > 
> >  for (i = nregs - 1; i >= 0; --i)
> >...
> >|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))
> 
> Can you please open a bugreport tracking this?

PR116850.

> The patch is OK.

85bee4f7, thanks.


Ciao,
Michael.


Re: [PATCH] RISC-V:Bugfix for C++ code compilation failure with rv32imafc_zve32f[pr116883]

2024-10-10 Thread Jeff Law




On 10/10/24 12:24 AM, Li Xu wrote:

From: xuli

Example as follows:

int main()
{
   unsigned long arraya[128], arrayb[128], arrayc[128];
   for (int i = 0; i < 128; i++)
{
   arraya[i] = arrayb[i] + arrayc[i];
}
   return 0;
}

Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation 
issue:

riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t 
__riscv_vle64(vbool16_t, const long long int*, unsigned int)'
40 | #pragma riscv intrinsic "vector"
   | ^~~~
riscv_vector.h:40:25: note: old declaration 'vint64m1_t 
__riscv_vle64(vbool64_t, const long long int*, unsigned int)'

With zvl=32b, vbool16_t is registered in init_builtins() with
type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].

Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
is not registered in init_builtins(), meaning vbool64_t=null.

In order to implement __attribute__((target("arch=+v"))), we must register
all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered
by default with zvl=128b in reinit_builtins(), resulting in
type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].

We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t),
calculated using type_common.precision, resulting in 2. Since vbool16_t and
vbool64_t have the same element type (boolean_type), the compiler treats them
as the same type, leading to a re-declaration conflict.

After all types and intrinsics have been registered, processing
__attribute__((target("arch=+v"))) will update the parameters option and
init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose
zvl=4096b for the null type reinit_builtins().

command option zvl=32b
   type nunits
   vbool64_t => null
   vbool32_t=> [1,1]
   vbool16_t=> [2,2]
   vbool8_t=>  [4,4]
   vbool4_t=>  [8,8]
   vbool2_t=>  [16,16]
   vbool1_t=>  [32,32]

reinit zvl=128b
   vbool64_t => [2,2] conflict with zvl32b vbool16_t=> [2,2]
reinit zvl=256b
   vbool64_t => [4,4] conflict with zvl32b vbool8_t=>  [4,4]
reinit zvl=512b
   vbool64_t => [8,8] conflict with zvl32b vbool4_t=>  [8,8]
reinit zvl=1024b
   vbool64_t => [16,16] conflict with zvl32b vbool2_t=>  [16,16]
reinit zvl=2048b
   vbool64_t => [32,32] conflict with zvl32b vbool1_t=>  [32,32]
reinit zvl=4096b
   vbool64_t => [64,64] zvl=4096b is ok

Signed-off-by: xuli

PR target/116883

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute):
Choose zvl4096b to initialize the null type.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/pr116883.C: New test.
Wrapped the overly long line in the ChangeLog and pushed this to the 
trunk (pre-commit CI passed cleanly).


jeff



Fix PR116650: check all regs in regrename targets

2024-10-10 Thread Michael Matz
(this came up for m68k vs. LRA, but is a generic problem)

Regrename wants to use new registers for certain def-use chains.
For validity of replacements it needs to check that the selected
candidates are unused up to then.  That's done in check_new_reg_p.
But if it so happens that the new register needs more hardregs
than the old register (which happens if the target allows inter-bank
moves and the mode is something like a DFmode that needs to be placed
into a SImode reg-pair), then check_new_reg_p only checks the
first of those registers for free-ness.

This is caused by that function looking up the number of necessary
hardregs only in terms of the old hardreg number.  It of course needs
to do that in terms of the new candidate regnumber.  The symptom is that
regrename sometimes clobbers the higher numbered registers of such a
regrename target pair.  This patch fixes that problem.

(In the particular case of the bug report it was LRA that left over a
inter-bank move instruction that triggers regrename, ultimately causing
the mis-compile.  Reload didn't do that, but in general we of course
can't rely on such moves not happening if the target allows them.)

This also shows a general confusion in that function and the target hook
interface here:

  for (i = nregs - 1; i >= 0; --i)
...
|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))

it uses nregs in a way that requires it to be the same between old and
new register.  The problem is that the target hook only gets register
numbers, when it instead should get a mode and register numbers and
would be called only for the first but not for subsequent registers.
I've looked at a number of definitions of that target hook and I think
that this is currently harmless in the sense that it would merely rule
out some potential reg-renames that would in fact be okay to do.  So I'm
not changing the target hook interface here and hence that problem
remains unfixed.
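
Editorial aside: the shape of the fix can be modelled standalone. The sketch
below uses an invented two-bank target and invented helper names; none of
them are GCC's.

#include <stdbool.h>

#define NUM_HARD_REGS 64

static bool reg_unavailable[NUM_HARD_REGS];

/* Toy target: regs 0-31 are FPRs that hold a double in one reg;
   regs 32-63 are 32-bit GPRs, where a double needs a pair.  */
static int
model_nregs_for_double (int regno)
{
  return regno < 32 ? 1 : 2;
}

/* The fix: derive nregs from NEW_REG and check every hard reg in
   new_reg .. new_reg + nregs - 1.  Deriving it from the old (FPR)
   register would check a single reg for an FPR-to-GPR rename and
   leave the upper half of the GPR pair unchecked, which is exactly
   the clobber described above.  */
static bool
check_new_reg_model (int new_reg)
{
  int nregs = model_nregs_for_double (new_reg);
  for (int i = nregs - 1; i >= 0; --i)
    if (reg_unavailable[new_reg + i])
      return false;
  return true;
}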

PR rtl-optimization/116650
* regrename.cc (check_new_reg_p): Calculate nregs in terms of
the new candidate register.
---

Regstrapped on x86-64-linux, okay for trunk?


Ciao,
Michael.

---
 gcc/regrename.cc | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/regrename.cc b/gcc/regrename.cc
index 054e601740b..22668d7bf57 100644
--- a/gcc/regrename.cc
+++ b/gcc/regrename.cc
@@ -324,10 +324,27 @@ static bool
 check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 class du_head *this_head, HARD_REG_SET this_unavailable)
 {
-  int nregs = this_head->nregs;
+  int nregs = 1;
   int i;
   struct du_chain *tmp;
 
+  /* See whether new_reg accepts all modes that occur in
+ definition and uses and record the number of regs it would take.  */
+  for (tmp = this_head->first; tmp; tmp = tmp->next_use)
+{
+  int n;
+  /* Completely ignore DEBUG_INSNs, otherwise we can get
+-fcompare-debug failures.  */
+  if (DEBUG_INSN_P (tmp->insn))
+   continue;
+
+  if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc)))
+   return false;
+  n = hard_regno_nregs (new_reg, GET_MODE (*tmp->loc));
+  if (n > nregs)
+   nregs = n;
+}
+
   for (i = nregs - 1; i >= 0; --i)
 if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
|| fixed_regs[new_reg + i]
@@ -348,14 +365,10 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
  definition and uses.  */
   for (tmp = this_head->first; tmp; tmp = tmp->next_use)
 {
-  /* Completely ignore DEBUG_INSNs, otherwise we can get
--fcompare-debug failures.  */
   if (DEBUG_INSN_P (tmp->insn))
continue;
 
-  if (!targetm.hard_regno_mode_ok (new_reg, GET_MODE (*tmp->loc))
- || call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc),
-   new_reg))
+  if (call_clobbered_in_chain_p (this_head, GET_MODE (*tmp->loc), new_reg))
return false;
 }
 
-- 
2.39.1


[PATCH] aarch64: Alter pr116258.c test to correct for big endian.

2024-10-10 Thread Richard Ball
The test at pr116258.c fails on big-endian targets because it checks
that the index of a floating-point multiply is 0, which is correct
only for little endian.

gcc/testsuite/ChangeLog:

PR tree-optimization/116258
* gcc.target/aarch64/pr116258.c:
Alter test to add big-endian support.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr116258.c b/gcc/testsuite/gcc.target/aarch64/pr116258.c
index e727ad4b72a5b8fe86e295d6e695d46203cd082e..5b63de25b7bf6dfd5f7b71cefcb27cabb42ac99e 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr116258.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr116258.c
@@ -12,6 +12,7 @@
   return (x + h(t));
 }
 
-/* { dg-final { scan-assembler-times "\\\[0\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "\\\[0\\\]" 1 { target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times "\\\[3\\\]" 1 { target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-not "dup\t" } } */
 /* { dg-final { scan-assembler-not "ins\t" } } */


Re: [RFC/RFA] [PATCH v4 01/12] Implement internal functions for efficient CRC computation

2024-10-10 Thread Mariam Arutunian
On Wed, Oct 9, 2024 at 7:45 AM Jeff Law  wrote:

>
>
> On 10/8/24 4:52 AM, Mariam Arutunian wrote:
> >
> >
> > On Sun, Sep 29, 2024 at 9:08 PM Jeff Law  wrote:
> >
> >
> >
> > On 9/13/24 5:05 AM, Mariam Arutunian wrote:
> >  > Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide
> > faster
> >  > CRC generation.
> >  > One performs bit-forward and the other bit-reversed CRC
> computation.
> >  > If CRC optabs are supported, they are used for the CRC
> computation.
> >  > Otherwise, table-based CRC is generated.
> >  > The supported data and CRC sizes are 8, 16, 32, and 64 bits.
> >  > The polynomial is without the leading 1.
> >  > A table with 256 elements is used to store precomputed CRCs.
> >  > For the reflection of inputs and the output, a simple algorithm
> > involving
> >  > SHIFT, AND, and OR operations is used.
> >  >
> >  > gcc/
> >  >
> >  >  * doc/md.texi (crc@var{m}@var{n}4,
> >  >  crc_rev@var{m}@var{n}4): Document.
> >  >  * expr.cc (calculate_crc): New function.
> >  >  (assemble_crc_table): Likewise.
> >  >  (generate_crc_table): Likewise.
> >  >  (calculate_table_based_CRC): Likewise.
> >  >  (emit_crc): Likewise.
> >  >  (expand_crc_table_based): Likewise.
> >  >  (gen_common_operation_to_reflect): Likewise.
> >  >  (reflect_64_bit_value): Likewise.
> >  >  (reflect_32_bit_value): Likewise.
> >  >  (reflect_16_bit_value): Likewise.
> >  >  (reflect_8_bit_value): Likewise.
> >  >  (generate_reflecting_code_standard): Likewise.
> >  >  (expand_reversed_crc_table_based): Likewise.
> >  >  * expr.h (generate_reflecting_code_standard): New function
> > declaration.
> >  >  (expand_crc_table_based): Likewise.
> >  >  (expand_reversed_crc_table_based): Likewise.
> >  >  * internal-fn.cc: (crc_direct): Define.
> >  >  (direct_crc_optab_supported_p): Likewise.
> >  >  (expand_crc_optab_fn): New function
> >  >  * internal-fn.def (CRC, CRC_REV): New internal functions.
> >  >  * optabs.def (crc_optab, crc_rev_optab): New optabs.
> > Looks pretty good.  Just one question/comment:
> >
> >  >
> >  > +void
> >  > +emit_crc (machine_mode crc_mode, rtx* crc, rtx* op0)
> >  > +{
> >  > +  if (GET_MODE_BITSIZE (crc_mode).to_constant () == 32
> >  > +  && GET_MODE_BITSIZE (word_mode) == 64)
> >  > +{
> >  > +  rtx a_low = gen_lowpart (crc_mode, *crc);
> >  > +  *crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
> >  > +}
> >  > +  rtx tgt = *op0;
> >  > +  if (word_mode != crc_mode)
> >  > +tgt = simplify_gen_subreg (word_mode, *op0, crc_mode, 0);
> >  > +  emit_move_insn (tgt, *crc);
> >  > +}
> > It seems to me that first clause ought to apply any time word mode is
> > larger than crc mode rather than making it check 32/64 magic
> constants.
> >
> >
> > When I apply it whenever the word mode is larger than crc mode, on
> > RISC-V the CRC-16 and CRC-8 tests fail.
> We should work to understand that.  Magic constants are generally to be
> avoided.  There's nothing inherent in this code where those constant
> size values should be used.
>
> This is likely pointing to a problem either in how emit_crc is handling
> those other cases or in the RISC-V expansion code.
>
>
I think the problem mainly arises from how the *original* CRC function is
compiled.
If I use *uint32_t* crc;
and write
crc = (crc << 1) ^ 0x04C11DB7;
the compiler applies *sign extension*.

But if I use *uint16_t* crc;
crc = (crc << 1) ^ 0x0DB7;
it applies *zero extension*.

I suspect that for unsigned values zero extension is applied, but in this
case (with uint32_t), sign extension is applied instead.

Best regards,
Mariam


> Jeff
>
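
Editorial aside: the extension difference Mariam describes can be observed
standalone. A sketch; build for rv64gc with -O2 and compare the two
expansions. The RISC-V psABI keeps 32-bit values sign-extended in 64-bit
registers regardless of signedness, while narrower unsigned values are
zero-extended.

#include <stdint.h>

uint32_t
crc32_step (uint32_t crc)
{
  /* 32-bit result: held sign-extended in a 64-bit register.  */
  return (crc << 1) ^ 0x04C11DB7;
}

uint16_t
crc16_step (uint16_t crc)
{
  /* Sub-word result: truncated and zero-extended.  */
  return (crc << 1) ^ 0x0DB7;
}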


Re: [PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-10 Thread Richard Sandiford
Eric Gallager  writes:
> On Wed, Oct 9, 2024 at 4:54 AM Christophe Lyon
>  wrote:
>>
>> On Wed, 9 Oct 2024 at 03:05, Eric Gallager  wrote:
>> >
>> > On Tue, Oct 8, 2024 at 6:25 AM Richard Sandiford
>> >  wrote:
>> > >
>> > > Christophe Lyon  writes:
>> > > > When --enable-werror is enabled when running the top-level configure,
>> > > > it passes --enable-werror-always to subdirs.  Some of them, like
>> > > > libgcc, ignore it.
>> > > >
>> > > > This patch adds support for it, enabled only for aarch64, to avoid
>> > > > breaking bootstrap for other targets.
>> > > >
>> > > > The patch also adds -Wno-prio-ctor-dtor to avoid a warning when 
>> > > > compiling lse_init.c
>> > > >
>> > > >   libgcc/
>> > > >   * Makefile.in (WERROR): New.
>> > > >   * config/aarch64/t-aarch64: Handle WERROR. Always use
>> > > >   -Wno-prio-ctor-dtor.
>> > > >   * configure.ac: Add support for --enable-werror-always.
>> > > >   * configure: Regenerate.
>> > > > ---
>> > > >  libgcc/Makefile.in  |  1 +
>> > > >  libgcc/config/aarch64/t-aarch64 |  1 +
>> > > >  libgcc/configure| 31 +++
>> > > >  libgcc/configure.ac |  5 +
>> > > >  4 files changed, 38 insertions(+)
>> > > >
>> > > > [...]
>> > > > diff --git a/libgcc/configure.ac b/libgcc/configure.ac
>> > > > index 4e8c036990f..6b3ea2aea5c 100644
>> > > > --- a/libgcc/configure.ac
>> > > > +++ b/libgcc/configure.ac
>> > > > @@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
>> > > >  sinclude(../config/gthr.m4)
>> > > >  sinclude(../config/sjlj.m4)
>> > > >  sinclude(../config/cet.m4)
>> > > > +sinclude(../config/warnings.m4)
>> > > >
>> > > >  AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
>> > > >  AC_CONFIG_SRCDIR([static-object.mk])
>> > > > @@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
>> > > >  # Determine what GCC version number to use in filesystem paths.
>> > > >  GCC_BASE_VER
>> > > >
>> > > > +# Only enable with --enable-werror-always until existing warnings are
>> > > > +# corrected.
>> > > > +ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual])
>> > >
>> > > It looks like this is borrowed from libcpp and/or libdecnumber.
>> > > Those are a bit different from libgcc in that they're host libraries
>> > > that can be built with any supported compiler (including non-GCC ones).
>> > > In constrast, libgcc can only be built with the corresponding version
>> > > of GCC.  The usual restrictions on -Werror -- only use it during stages
>> > > 2 and 3, or if the user explicitly passes --enable-werror -- don't apply
>> > > in libgcc's case.  We should always be building with the "right" version
>> > > of GCC (even for Canadian crosses) and so should always be able to use
>> > > -Werror.
>> > >
>> > > So personally, I think we should just go with:
>> > >
>> > > diff --git a/libgcc/config/aarch64/t-aarch64 
>> > > b/libgcc/config/aarch64/t-aarch64
>> > > index b70e7b94edd..ae1588ce307 100644
>> > > --- a/libgcc/config/aarch64/t-aarch64
>> > > +++ b/libgcc/config/aarch64/t-aarch64
>> > > @@ -30,3 +30,4 @@ LIB2ADDEH += \
>> > > $(srcdir)/config/aarch64/__arm_za_disable.S
>> > >
>> > >  SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
>> > > +LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
>> > >
>> > > ...this, but with $(WERROR) replaced by -Werror.
>> > >
>> > > At least, it would be a good way of finding out if there's a case
>> > > I've forgotten :)
>> > >
>> > > Let's see what others think though.
>> >
>> > I think it would be worthwhile to test this assumption first; I have a
>> > vague memory of having seen warnings in libgcc previously that would
>> > presumably get turned into errors if -Werror were applied
>> > unconditionally...
>> >
>> Sorry, it's not clear to me what you mean by "test this assumption" ?
>> Do you mean I should push the patch with unconditional -Werror and
>> monitor what happens for a while?
>> Or investigate more / other targets?
>> Or wait for others to commit?
>>
>
> I mean, I think we should try the original approach of having it be
> enableable manually first, let some people test by enabling it
> manually, and then if they all report back success, then we can go
> ahead with the unconditional -Werror version of it.

I can see the attraction.  But that would mean adding code only to take
it out later.  Plus, the original patch would enable -Werror for stages
2 and 3 anyway, so everyone who bootstraps would still be testing the
-Werror path, whether they'd chosen to or not.

My point is that stage 1 isn't really a different case from stages
2 and 3 for libgcc, since in all three cases, it's the recently built
gcc that is being used to build libgcc.  libgcc also gets the majority
of its command-line flags directly from the gcc directory (via
libgcc.mvars), rather than from libgcc's own configure line.

How about going for an unconditional -Werror (for aarch64 only), but with
a pre-approval for anyone with commit access to revert it if it bre

Re: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions

2024-10-10 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Thursday, October 10, 2024 8:08 PM
>> To: Jennifer Schmitz 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
>> Kyrylo Tkachov ; Tamar Christina
>> ; rguent...@suse.de
>> Subject: Re: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector
>> reductions
>> 
>> Jennifer Schmitz  writes:
>> > This patch implements the optabs reduc_and_scal_<mode>,
>> > reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
>> > V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise 
>> > logical
>> > vector reduction operations.
>> > Previously, either only vector registers or only general purpose registers 
>> > (GPR)
>> > were used. Now, vector registers are used for the reduction from 128 to 64 
>> > bits;
>> > 64-bit GPR are used for the reduction from 64 to 32 bits; and 32-bit GPR 
>> > are
>> used
>> > for the rest of the reduction steps.
>> >
>> > For example, the test case (V8HI)
>> > int16_t foo (int16_t *a)
>> > {
>> >   int16_t b = -1;
>> >   for (int i = 0; i < 8; ++i)
>> > b &= a[i];
>> >   return b;
>> > }
>> >
>> > was previously compiled to (-O2):
>> > foo:
>> >ldr q0, [x0]
>> >movi v30.4s, 0
>> >ext v29.16b, v0.16b, v30.16b, #8
>> >and v29.16b, v29.16b, v0.16b
>> >ext v31.16b, v29.16b, v30.16b, #4
>> >and v31.16b, v31.16b, v29.16b
>> >ext v30.16b, v31.16b, v30.16b, #2
>> >and v30.16b, v30.16b, v31.16b
>> >umov w0, v30.h[0]
>> >ret
>> >
>> > With patch, it is compiled to:
>> > foo:
>> >ldr q31, [x0]
>> >ext v30.16b, v31.16b, v31.16b, #8
>> >and v31.8b, v30.8b, v31.8b
>> >fmov x0, d31
>> >and x0, x0, x0, lsr 32
>> >and w0, w0, w0, lsr 16
>> >ret
>> >
>> > For modes V4SI and V2DI, the pattern was not implemented, because the
>> > current codegen (using only base instructions) is already efficient.
>> >
>> > Note that the PR initially suggested to use SVE reduction ops. However,
>> > they have higher latency than the proposed sequence, which is why using
>> > neon and base instructions is preferable.
>> >
>> > Test cases were added for 8/16-bit integers for all implemented modes and 
>> > all
>> > three operations to check the produced assembly.
>> >
>> > We also added [istarget aarch64*-*-*] to the selector vect_logical_reduc,
>> > because for aarch64 vector types, either the logical reduction optabs are
>> > implemented or the codegen for reduction operations is good as it is.
>> > This was motivated by failure of a scan-tree-dump directive in the test 
>> > cases
>> > gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.
>> >
>> > The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>> > regression.
>> > OK for mainline?
>> >
>> > Signed-off-by: Jennifer Schmitz 
>> >
>> > gcc/
>> >PR target/113816
>> >* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
>> >Implement for logical bitwise operations for VDQV_E.
>> >
>> > gcc/testsuite/
>> >PR target/113816
>> >* lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
>> >* gcc.target/aarch64/simd/logical_reduc.c: New test.
>> >* gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
>> > ---
>> >  gcc/config/aarch64/aarch64-simd.md|  55 +
>> >  .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++
>> >  .../gcc.target/aarch64/vect-reduc-or_1.c  |   2 +-
>> >  gcc/testsuite/lib/target-supports.exp |   4 +-
>> >  4 files changed, 267 insertions(+), 2 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md
>> b/gcc/config/aarch64/aarch64-simd.md
>> > index 23c03a96371..00286b8b020 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -3608,6 +3608,61 @@
>> >}
>> >  )
>> >
>> > +;; Emit a sequence for bitwise logical reductions over vectors for V8QI, 
>> > V16QI,
>> > +;; V4HI, and V8HI modes.  The reduction is achieved by iteratively 
>> > operating
>> > +;; on the two halves of the input.
>> > +;; If the input has 128 bits, the first operation is performed in vector
>> > +;; registers.  From 64 bits down, the reduction steps are performed in 
>> > general
>> > +;; purpose registers.
>> > +;; For example, for V8HI and operation AND, the intended sequence is:
>> > +;; EXT  v1.16b, v0.16b, v0.16b, #8
>> > +;; AND  v0.8b, v1.8b, v0.8b
>> > +;; FMOV x0, d0
>> > +;; AND  x0, x0, x0, lsr 32
>> > +;; AND  w0, w0, w0, lsr 16
>> > +;;
>> > +;; For V8QI and operation AND, the sequence is:
>> > +;; AND  x0, x0, x0, lsr 32
>> > +;; AND  w0, w0, w0, lsr 16
>> > +;; AND  w0, w0, w0, lsr 8
>> > +
>> > +(define_expand "reduc_<optab>_scal_<mode>"
>> > + [(match_operand:<VEL> 0 "register_operand")
>> > +  (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
>> > +  "TARGET_SIMD"

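Editorial note: the scalar tail of the sequence quoted above can be modelled
in plain C. A sketch only; lo and hi stand for the two 64-bit halves of the
V8HI input, and the first AND models the vector EXT/AND step.

#include <stdint.h>

int16_t
and_reduce_v8hi_model (uint64_t lo, uint64_t hi)
{
  uint64_t x = lo & hi;  /* ext + and in vector registers (128 -> 64)  */
  x &= x >> 32;          /* and x0, x0, x0, lsr 32        (64 -> 32)   */
  x &= x >> 16;          /* and w0, w0, w0, lsr 16        (32 -> 16)   */
  return (int16_t) x;
}
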
Re: [PATCH 4/4] c++: enable modules by default in c++20

2024-10-10 Thread Jason Merrill

On 10/9/24 7:06 PM, Patrick Palka wrote:

On Wed, 9 Oct 2024, Jason Merrill wrote:


Tested x86_64-pc-linux-gnu, will apply to trunk with the rest of the patch
series.

-- 8< --

At this point there doesn't seem to be much reason not to have modules
support enabled by default in C++20, and it's good get more test coverage to
find corner case bugs like some I fixed recently.


Not sure how much we care about PCH anymore, but won't this effectively
disable PCH in C++20 and later due to

   /* C++ modules and PCH don't play together.  */
   if (flag_modules)
 return 2;

in c_common_valid_pch?


Yes; switching from PCH to C++20 header units should be fairly seamless.

But that's why I add -fno-modules for the PCH tests.

Jason