[gcc r12-10847] c++: Don't reject pointer to virtual method during constant evaluation [PR117615]

2024-12-05 Thread Simon Martin via Gcc-cvs
https://gcc.gnu.org/g:ae8d9d2b40aa7fd6a455beda38ff1b3c21728c31

commit r12-10847-gae8d9d2b40aa7fd6a455beda38ff1b3c21728c31
Author: Simon Martin 
Date:   Tue Dec 3 14:30:43 2024 +0100

c++: Don't reject pointer to virtual method during constant evaluation [PR117615]

We currently reject the following valid code:

=== cut here ===
struct Base {
virtual void doit (int v) const {}
};
struct Derived : Base {
void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
fn_t mFn;
constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
constexpr Helper h (&Derived::doit);
}
=== cut here ===

The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit has been
represented as an expression with pointer-to-method type wrapping an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.

This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.

PR c++/117615

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-virtual22.C: New test.

(cherry picked from commit 72a2380a306a1c3883cb7e4f99253522bc265af0)

Diff:
---
 gcc/cp/constexpr.cc  |  6 ++
 gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C | 22 ++
 2 files changed, 28 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 20abbee3600e..6c8d8ab17f29 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7353,6 +7353,12 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
return t;
  }
  }
+   else if (TYPE_PTR_P (type)
+   && TREE_CODE (TREE_TYPE (type)) == METHOD_TYPE)
+ /* INTEGER_CST with pointer-to-method type is only used
+for a virtual method in a pointer to member function.
+Don't reject those.  */
+ ;
else
  {
/* This detects for example:
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C
new file mode 100644
index ..89330bf86200
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C
@@ -0,0 +1,22 @@
+// PR c++/117615
+// { dg-do compile { target c++20 } }
+
+struct Base {
+virtual void doit (int v) const {}
+};
+
+struct Derived : Base {
+void doit (int v) const {}
+};
+
+using fn_t = void (Base::*)(int) const;
+
+struct Helper {
+fn_t mFn;
+constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
+};
+
+void foo () {
+constexpr Helper h (&Derived::doit);
+constexpr Helper h2 (&Base::doit);
+}


[gcc r15-5932] Allow limited extended asm at toplevel [PR41045]

2024-12-05 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:ca4d6285974817080d3488b293c4970a8231372b

commit r15-5932-gca4d6285974817080d3488b293c4970a8231372b
Author: Jakub Jelinek 
Date:   Thu Dec 5 09:25:06 2024 +0100

Allow limited extended asm at toplevel [PR41045]

In the Cauldron IPA/LTO BoF we discussed toplevel asms, and it was
suggested it would be nice to be able to tell the compiler something
about what a toplevel asm does.  Sure, I'm aware the kernel people said
they aren't willing to use something like that, but perhaps other
projects will.  And for the kernel perhaps we should add some new
option which does some dumb parsing of the toplevel asms and gathers
something from that parsing.

The following patch is just a small step towards that, namely, allow
some subset of extended inline asm outside of functions.
The patch is unfinished: LTO streaming (out/in) of the ASM_EXPRs isn't
implemented (it emits a sorry diagnostic), nor are any cgraph/varpool
changes to find out references etc.

The patch allows something like:

int a[2], b;
enum { E1, E2, E3, E4, E5 };
struct S { int a; char b; long long c; };
asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous"
 : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S)));

Even for non-LTO, that could be useful e.g. for getting enumerators from
C/C++ as integers into the toplevel asm, or sizeof/offsetof etc.

The restrictions I've implemented are:
1) asm qualifiers still aren't allowed, so asm goto or asm inline can't be
   specified at toplevel; asm volatile has the volatile ignored with a
   warning for C++ and is an error in C, as before
2) I see good use mainly for input operands; output operands perhaps to
   make it clear that the inline asm may write some memory.  I don't see
   a good use for clobbers, so the patch doesn't allow those (and of
   course no labels, because asm goto can't be specified)
3) the patch allows only constraints which don't allow registers, so
   typically "m" or "i" or other memory or immediate constraints; for
   memory, it requires that the operand is addressable and its address
   could be used in static var initializer (so that no code actually
   needs to be emitted for it), for others that they are constants usable
   in the static var initializers
4) the patch disallows + (there is no reload of the operands, so I don't
   see benefits of tying some operands together), % (who cares whether
   something is commutative in this case) and & (again, no code is emitted
   around the asm), as well as the 0-9 matching constraints

Right now there is no way to tell the compiler that the inline asm defines
some symbol; that is implemented in a later patch, as a ":" constraint.

Similarly, the c modifier doesn't work in all cases and the cc modifier
is implemented separately.

2024-12-05  Jakub Jelinek  

PR c/41045
gcc/
* output.h (insn_noperands): Declare.
* final.cc (insn_noperands): No longer static.
* varasm.cc (assemble_asm): Handle ASM_EXPR.
* lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at
for non-STRING_CST toplevel asm for now.
* doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document
that extended asm is now allowed outside of functions with certain
restrictions.
gcc/c/
* c-parser.cc (c_parser_asm_string_literal): Add forward 
declaration.
(c_parser_asm_definition): Parse also extended asm without
clobbers/labels.
* c-typeck.cc (build_asm_expr): Allow extended asm outside of
functions and check extra restrictions.
gcc/cp/
* cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument.
* parser.cc (cp_parser_asm_definition): Parse also extended asm
without clobbers/labels outside of functions.
* semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set,
check extra restrictions for extended asm outside of functions.
* pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller.
gcc/testsuite/
* c-c++-common/toplevel-asm-1.c: New test.
* c-c++-common/toplevel-asm-2.c: New test.
* c-c++-common/toplevel-asm-3.c: New test.

Diff:
---
 gcc/c/c-parser.cc   |  67 ++-
 gcc/c/c-typeck.cc   |  56 ++
 gcc/cp/cp-tree.h|   2 +-
 gcc/cp/parser.cc|  15 ++-
 gcc/cp/pt.cc|   2 +-
 gcc/cp/semantics.cc |  92 +++-
 gcc/doc/extend.texi |  32 +++---
 gcc/final.cc|   2 +-
 gcc/lto-streamer-out.cc |   7 ++
 gcc/output.h   

[gcc r15-5944] Match: Refactor the unsigned SAT_TRUNC match patterns [NFC]

2024-12-05 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:9163d16e4f56ced25839ff246c56e166ae62e962

commit r15-5944-g9163d16e4f56ced25839ff246c56e166ae62e962
Author: Pan Li 
Date:   Thu Dec 5 09:19:39 2024 +0800

Match: Refactor the unsigned SAT_TRUNC match patterns [NFC]

This patch would like to refactor all the unsigned SAT_TRUNC patterns,
aka:
* Extract the type check outside.
* Re-arrange the related match pattern forms together.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Refactor sorts of unsigned SAT_TRUNC match patterns.

Signed-off-by: Pan Li 

Diff:
---
 gcc/match.pd | 112 +++
 1 file changed, 52 insertions(+), 60 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index fd1d8bcc7763..650c3f4cc1df 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3262,6 +3262,58 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 }
 (if (wi::eq_p (sum, wi::uhwi (0, precision
 
+(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
+ (match (unsigned_integer_sat_trunc @0)
+  /* SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1)))  */
+  (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1))) (convert @0))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (with
+{
+ unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
+ unsigned otype_precision = TYPE_PRECISION (type);
+ wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
+ wide_int int_cst = wi::to_wide (@1, itype_precision);
+}
+(if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
+ (match (unsigned_integer_sat_trunc @0)
+  /* SAT_U_TRUNC = (NT)(MIN_EXPR (X, IMM))
+ If Op_0 def is MIN_EXPR and not single_use.  Aka below pattern:
+
+ _18 = MIN_EXPR ; // op_0 def
+ iftmp.0_11 = (unsigned int) _18; // op_0
+ stream.avail_out = iftmp.0_11;
+ left_37 = left_8 - _18;  // op_0 use
+
+ Transfer to .SAT_TRUNC will have MIN_EXPR still live.  Then the backend
+ (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC
+ besides the MIN_EXPR.  Thus,  keep the normal truncation as is should be
+ the better choose.  */
+  (convert (min@2 @0 INTEGER_CST@1))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2))
+   (with
+{
+ unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
+ unsigned otype_precision = TYPE_PRECISION (type);
+ wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
+ wide_int int_cst = wi::to_wide (@1, itype_precision);
+}
+(if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
+ (match (unsigned_integer_sat_trunc @0)
+  /* SAT_U_TRUNC = (NT)X | ((NT)(X <= (WT)-1) + (NT)-1)  */
+  (bit_ior:c (plus:c (convert (le @0 INTEGER_CST@1)) INTEGER_CST@2)
+(convert @0))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (with
+{
+ unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
+ unsigned otype_precision = TYPE_PRECISION (type);
+ wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
+ wide_int max = wi::mask (otype_precision, false, otype_precision);
+ wide_int int_cst_1 = wi::to_wide (@1);
+ wide_int int_cst_2 = wi::to_wide (@2);
+}
+(if (wi::eq_p (trunc_max, int_cst_1) && wi::eq_p (max, int_cst_2)))
+
 /* Signed saturation add, case 1:
T sum = (T)((UT)X + (UT)Y)
SAT_S_ADD = (X ^ sum) & !(X ^ Y) < 0 ? (-(T)(X < 0) ^ MAX) : sum;
@@ -3416,66 +3468,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@2)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type
 
-/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
-   SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
-(match (unsigned_integer_sat_trunc @0)
- (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
-   (convert @0))
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0)))
- (with
-  {
-   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
-   unsigned otype_precision = TYPE_PRECISION (type);
-   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
-   wide_int int_cst = wi::to_wide (@1, itype_precision);
-  }
-  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
-
-/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
-   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
-/* If Op_0 def is MIN_EXPR and not single_use.  Aka below pattern:
-
- _18 = MIN_EXPR ; // op_0 def
- iftmp.0_11 = (unsigned int) _18; // op_0
- stream.avail_out = iftmp.0_11;
- left_37 = left_8 - _18;  // op_0 use
-
-   Transfer to .SAT_TRUNC will have MIN_EXPR still live.  Then the backend
-   (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC
-   b

[gcc r15-5949] arm: remove support for iWMMX/iWMMX2 intrinsics

2024-12-05 Thread Richard Earnshaw via Gcc-cvs
https://gcc.gnu.org/g:a92b2be97f369ae4c6e1cdcbb7a45525994afaad

commit r15-5949-ga92b2be97f369ae4c6e1cdcbb7a45525994afaad
Author: Richard Earnshaw 
Date:   Thu Dec 5 15:14:09 2024 +

arm: remove support for iWMMX/iWMMX2 intrinsics

The mmintrin.h header was adjusted for GCC-14 to generate a
(suppressible) warning if it was used, saying that support would be
removed in GCC-15.

Make that come true by removing the contents of this header and
emitting an error.

At this point in time I've not removed the internal support for the
intrinsics, just the wrappers that enable access to them.  That can be
done at leisure from now on.

gcc/ChangeLog:

* config/arm/mmintrin.h: Raise an error if this header is used.
Remove other content.

Diff:
---
 gcc/config/arm/mmintrin.h | 1812 +
 1 file changed, 1 insertion(+), 1811 deletions(-)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index e9cc3ddd7ab7..65b6f943cf3d 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -24,1816 +24,6 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
-#ifndef __IWMMXT__
-#error mmintrin.h included without enabling WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2)
-#endif
-
-#ifndef __ENABLE_DEPRECATED_IWMMXT
-#warning support for WMMX/WMMX2 is deprecated and will be removed in GCC 15.  Define __ENABLE_DEPRECATED_IWMMXT to suppress this warning
-#endif
-
-#if defined __cplusplus
-extern "C" {
-/* Intrinsics use C name-mangling.  */
-#endif /* __cplusplus */
-
-/* The data type intended for user use.  */
-typedef unsigned long long __m64, __int64;
-
-/* Internal data types for implementing the intrinsics.  */
-typedef int __v2si __attribute__ ((vector_size (8)));
-typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef signed char __v8qi __attribute__ ((vector_size (8)));
-
-/* Provided for source compatibility with MMX.  */
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_empty (void)
-{
-}
-
-/* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64
-_mm_cvtsi64_m64 (__int64 __i)
-{
-  return __i;
-}
-
-static __inline __int64
-_mm_cvtm64_si64 (__m64 __i)
-{
-  return __i;
-}
-
-static __inline int
-_mm_cvtsi64_si32 (__int64 __i)
-{
-  return __i;
-}
-
-static __inline __int64
-_mm_cvtsi32_si64 (int __i)
-{
-  return (__i & 0xffffffff);
-}
-
-/* Pack the four 16-bit values from M1 into the lower four 8-bit values of
-   the result, and the four 16-bit values from M2 into the upper four 8-bit
-   values of the result, all with signed saturation.  */
-static __inline __m64
-_mm_packs_pi16 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wpackhss ((__v4hi)__m1, (__v4hi)__m2);
-}
-
-/* Pack the two 32-bit values from M1 in to the lower two 16-bit values of
-   the result, and the two 32-bit values from M2 into the upper two 16-bit
-   values of the result, all with signed saturation.  */
-static __inline __m64
-_mm_packs_pi32 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wpackwss ((__v2si)__m1, (__v2si)__m2);
-}
-
-/* Copy the 64-bit value from M1 into the lower 32-bits of the result, and
-   the 64-bit value from M2 into the upper 32-bits of the result, all with
-   signed saturation for values that do not fit exactly into 32-bits.  */
-static __inline __m64
-_mm_packs_pi64 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wpackdss ((long long)__m1, (long long)__m2);
-}
-
-/* Pack the four 16-bit values from M1 into the lower four 8-bit values of
-   the result, and the four 16-bit values from M2 into the upper four 8-bit
-   values of the result, all with unsigned saturation.  */
-static __inline __m64
-_mm_packs_pu16 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wpackhus ((__v4hi)__m1, (__v4hi)__m2);
-}
-
-/* Pack the two 32-bit values from M1 into the lower two 16-bit values of
-   the result, and the two 32-bit values from M2 into the upper two 16-bit
-   values of the result, all with unsigned saturation.  */
-static __inline __m64
-_mm_packs_pu32 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wpackwus ((__v2si)__m1, (__v2si)__m2);
-}
-
-/* Copy the 64-bit value from M1 into the lower 32-bits of the result, and
-   the 64-bit value from M2 into the upper 32-bits of the result, all with
-   unsigned saturation for values that do not fit exactly into 32-bits.  */
-static __inline __m64
-_mm_packs_pu64 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wpackdus ((long long)__m1, (long long)__m2);
-}
-
-/* Interleave the four 8-bit values from the high half of M1 with the four
-   8-bit values from the high half of M2.  */
-static __inline __m64
-_mm_unpackhi_pi8 (__m64 __m1, __m64 __m2)
-{
-  return (__m64) __builtin_arm_wunpckihb ((__v8qi)__m1, (__v8qi)__m2);
-}
-
-/* Interleave the two 16-bit values 

[gcc r15-5948] aarch64: Mark vluti* intrinsics as QUIET

2024-12-05 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:cd9499a78dd57c311a9cfd1e0ba132833eaea490

commit r15-5948-gcd9499a78dd57c311a9cfd1e0ba132833eaea490
Author: Richard Sandiford 
Date:   Thu Dec 5 15:33:11 2024 +

aarch64: Mark vluti* intrinsics as QUIET

This patch fixes the vluti* definitions to say that they don't
raise FP exceptions even for floating-point modes.

gcc/
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_TERNARY_VLUT8): Use FLAG_QUIET rather than FLAG_DEFAULT.
(ENTRY_TERNARY_VLUT16): Likewise.

Diff:
---
 .../aarch64/aarch64-simd-pragma-builtins.def   | 24 +++---
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def
index dfcfa8a0ac02..bc9a63b968af 100644
--- a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def
@@ -37,32 +37,32 @@
 #undef ENTRY_TERNARY_VLUT8
 #define ENTRY_TERNARY_VLUT8(T) \
   ENTRY_BINARY_LANE (vluti2_lane_##T##8, T##8q, T##8, u8,  \
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti2_laneq_##T##8, T##8q, T##8, u8q,\
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti2q_lane_##T##8, T##8q, T##8q, u8,\
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti2q_laneq_##T##8, T##8q, T##8q, u8q,  \
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti4q_lane_##T##8, T##8q, T##8q, u8,\
-UNSPEC_LUTI4, DEFAULT) \
+UNSPEC_LUTI4, QUIET)   \
   ENTRY_BINARY_LANE (vluti4q_laneq_##T##8, T##8q, T##8q, u8q,  \
-UNSPEC_LUTI4, DEFAULT)
+UNSPEC_LUTI4, QUIET)
 
 #undef ENTRY_TERNARY_VLUT16
#define ENTRY_TERNARY_VLUT16(T) \
   ENTRY_BINARY_LANE (vluti2_lane_##T##16, T##16q, T##16, u8,   \
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti2_laneq_##T##16, T##16q, T##16, u8q, \
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti2q_lane_##T##16, T##16q, T##16q, u8, \
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti2q_laneq_##T##16, T##16q, T##16q, u8q,   \
-UNSPEC_LUTI2, DEFAULT) \
+UNSPEC_LUTI2, QUIET)   \
   ENTRY_BINARY_LANE (vluti4q_lane_##T##16_x2, T##16q, T##16qx2, u8,\
-UNSPEC_LUTI4, DEFAULT) \
+UNSPEC_LUTI4, QUIET)   \
   ENTRY_BINARY_LANE (vluti4q_laneq_##T##16_x2, T##16q, T##16qx2, u8q,  \
-UNSPEC_LUTI4, DEFAULT)
+UNSPEC_LUTI4, QUIET)
 
 // faminmax
 #define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_FAMINMAX)


[gcc r15-5950] i386: Fix addcarry/subborrow issues [PR117860]

2024-12-05 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:b3cb0c3302a7c16e661a08c15c897c8f7bbb5d23

commit r15-5950-gb3cb0c3302a7c16e661a08c15c897c8f7bbb5d23
Author: Uros Bizjak 
Date:   Thu Dec 5 17:02:46 2024 +0100

i386: Fix addcarry/subborrow issues [PR117860]

Fix several things to enable combine to handle addcarry/subborrow patterns:

- Fix wrong canonical form of addcarry insn and friends.  For a
commutative operation (PLUS RTX), the binary operand (LTU) takes
precedence over the unary operand (ZERO_EXTEND).

- Swap operands of GTU comparison to canonicalize addcarry/subborrow
comparison. Again, the canonical form of the compare is PLUS RTX before
ZERO_EXTEND RTX. GTU comparison is not a carry flag comparison, so we have
to swap operands in x86_canonicalize_comparison to a non-canonical form
to use LTU comparison.

- Return correct compare mode (CCCmode) for addcarry/subborrow pattern
from ix86_cc_mode, so combine is able to emit required compare mode for
combined insn.

- Add *subborrow_1 pattern having const_scalar_int_operand predicate.
Here, canonicalization of the SUB (op1, const) RTX to PLUS (op1, -const)
requires negation of the constant operand when checking operands.

With the above changes, combine is able to create *addcarry_1/*subborrow_1
pattern with immediate operand for the testcase in the PR:

SomeAddFunc:
addq%rcx, %rsi  # 10[c=4 l=3]  adddi3_cc_overflow_1/0
movq%rdi, %rax  # 33[c=4 l=3]  *movdi_internal/3
adcq$5, %rdx# 19[c=4 l=4]  *addcarrydi_1/0
movq%rsi, (%rdi)# 23[c=4 l=3]  *movdi_internal/5
movq%rdx, 8(%rdi)   # 24[c=4 l=4]  *movdi_internal/5
setc%dl # 39[c=4 l=3]  *setcc_qi
movzbl  %dl, %edx   # 40[c=4 l=3]  zero_extendqidi2/0
movq%rdx, 16(%rdi)  # 26[c=4 l=4]  *movdi_internal/5
ret # 43[c=0 l=1]  simple_return_internal

SomeSubFunc:
subq%rcx, %rsi  # 10[c=4 l=3]  *subdi_3/0
movq%rdi, %rax  # 42[c=4 l=3]  *movdi_internal/3
sbbq$17, %rdx   # 19[c=4 l=4]  *subborrowdi_1/0
movq%rsi, (%rdi)# 33[c=4 l=3]  *movdi_internal/5
sbbq%rcx, %rcx  # 29[c=8 l=3]  *x86_movdicc_0_m1_neg
movq%rdx, 8(%rdi)   # 34[c=4 l=4]  *movdi_internal/5
movq%rcx, 16(%rdi)  # 35[c=4 l=4]  *movdi_internal/5
ret # 51[c=0 l=1]  simple_return_internal

PR target/117860

gcc/ChangeLog:

* config/i386/i386.cc (ix86_canonicalize_comparison): Swap
operands of GTU comparison to canonicalize addcarry/subborrow
comparison.
(ix86_cc_mode): Return CCCmode for the comparison of
addcarry/subborrow pattern.
* config/i386/i386.md (addcarry): Swap operands of
PLUS RTX to make it canonical.
(*addcarry_1): Ditto.
(addcarry peephole2s): Update RTXes for addcarry_1 change.
(*add3_doubleword_cc_overflow_1): Ditto.
(*subborrow_1): New insn pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117860.c: New test.

Diff:
---
 gcc/config/i386/i386.cc  | 23 -
 gcc/config/i386/i386.md  | 85 +---
 gcc/testsuite/gcc.target/i386/pr117860.c | 52 +++
 3 files changed, 140 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0beeb514cf95..23ff16b40812 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -578,11 +578,25 @@ ix86_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
{
  std::swap (*op0, *op1);
  *code = (int) scode;
+ return;
}
 }
+
+  /* Swap operands of GTU comparison to canonicalize
+ addcarry/subborrow comparison.  */
+  if (!op0_preserve_value
+  && *code == GTU
+  && GET_CODE (*op0) == PLUS
+  && ix86_carry_flag_operator (XEXP (*op0, 0), VOIDmode)
+  && GET_CODE (XEXP (*op0, 1)) == ZERO_EXTEND
+  && GET_CODE (*op1) == ZERO_EXTEND)
+{
+  std::swap (*op0, *op1);
+  *code = (int) swap_condition ((enum rtx_code) *code);
+  return;
+}
 }
 
-
 /* Hook to determine if one function can safely inline another.  */
 
 static bool
@@ -16479,6 +16493,13 @@ ix86_cc_mode (enum rtx_code code, rtx op0, rtx op1)
   && GET_CODE (op1) == GEU
   && GET_MODE (XEXP (op1, 0)) == CCCmode)
return CCCmode;
+  /* Similarly for the comparison of addcarry/subborrow pattern.  */
+  else if (code == LTU
+  && GET_CODE (op0) == ZERO_EXTEND
+  && GET_CODE (op1) == PLUS
+  && ix86_carry_flag_operator (XEXP (op1, 0)

[gcc r15-5943] middle-end/117801 - failed register coalescing due to GIMPLE schedule

2024-12-05 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:dc0dea98c96e02c6b24060170bc88da8d4931bc2

commit r15-5943-gdc0dea98c96e02c6b24060170bc88da8d4931bc2
Author: Richard Biener 
Date:   Wed Nov 27 13:36:19 2024 +0100

middle-end/117801 - failed register coalescing due to GIMPLE schedule

For a TSVC testcase we see failed register coalescing due to a
different schedule of GIMPLE .FMA and stores fed by it.  This
can be mitigated by making direct internal functions participate
in TER; given we're using more and more such functions to expose
target capabilities, it seems natural not to exempt them.

Unfortunately the internal function expansion API doesn't match
what we usually have (passing in a target and returning an RTX);
instead the LHS of the call is expanded and written to.  This
makes the TER expansion of a call SSA def a bit unwieldy.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

The ccmp changes have likely not seen any coverage; the debug stmt
changes might not be optimal, and we might end up losing on replaceable
calls.

PR middle-end/117801
* tree-outof-ssa.cc (ssa_is_replaceable_p): Make
direct internal function calls replaceable.
* expr.cc (get_def_for_expr): Handle replacements with calls.
(get_def_for_expr_class): Likewise.
(optimize_bitfield_assignment_op): Likewise.
(expand_expr_real_1): Likewise.  Properly expand direct
internal function defs.
* cfgexpand.cc (expand_call_stmt): Handle replacements with calls.
(avoid_deep_ter_for_debug): Likewise, always create a debug temp
for calls.
(expand_debug_expr): Likewise, give up for calls.
(expand_gimple_basic_block): Likewise.
* ccmp.cc (ccmp_candidate_p): Likewise.
(get_compare_parts): Likewise.

Diff:
---
 gcc/ccmp.cc   |  4 ++--
 gcc/cfgexpand.cc  | 14 +++---
 gcc/expr.cc   | 19 ++-
 gcc/tree-outof-ssa.cc | 15 ---
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 45629abadbe0..4f739dfda504 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -100,7 +100,7 @@ ccmp_candidate_p (gimple *g, bool outer = false)
   tree_code tcode;
   basic_block bb;
 
-  if (!g)
+  if (!g || !is_gimple_assign (g))
 return false;
 
   tcode = gimple_assign_rhs_code (g);
@@ -138,7 +138,7 @@ get_compare_parts (tree t, int *up, rtx_code *rcode,
 {
   tree_code code;
   gimple *g = get_gimple_for_ssa_name (t);
-  if (g)
+  if (g && is_gimple_assign (g))
 {
   *up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
   code = gimple_assign_rhs_code (g);
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 58d68ec1caa5..ea08810df045 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2848,6 +2848,7 @@ expand_call_stmt (gcall *stmt)
   if (builtin_p
  && TREE_CODE (arg) == SSA_NAME
  && (def = get_gimple_for_ssa_name (arg))
+ && is_gimple_assign (def)
  && gimple_assign_rhs_code (def) == ADDR_EXPR)
arg = gimple_assign_rhs1 (def);
   CALL_EXPR_ARG (exp, i) = arg;
@@ -4414,7 +4415,7 @@ avoid_deep_ter_for_debug (gimple *stmt, int depth)
   gimple *g = get_gimple_for_ssa_name (use);
   if (g == NULL)
continue;
-  if (depth > 6 && !stmt_ends_bb_p (g))
+  if ((depth > 6 || !is_gimple_assign (g)) && !stmt_ends_bb_p (g))
{
  if (deep_ter_debug_map == NULL)
deep_ter_debug_map = new hash_map<tree, tree>;
@@ -5388,7 +5389,13 @@ expand_debug_expr (tree exp)
  t = *slot;
  }
if (t == NULL_TREE)
- t = gimple_assign_rhs_to_tree (g);
+ {
+   if (is_gimple_assign (g))
+ t = gimple_assign_rhs_to_tree (g);
+   else
+ /* expand_debug_expr doesn't handle CALL_EXPR right now.  */
+ return NULL;
+ }
op0 = expand_debug_expr (t);
if (!op0)
  return NULL;
@@ -5964,7 +5971,8 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
  /* Look for SSA names that have their last use here (TERed
 names always have only one real use).  */
  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
-   if ((def = get_gimple_for_ssa_name (op)))
+   if ((def = get_gimple_for_ssa_name (op))
+   && is_gimple_assign (def))
  {
imm_use_iterator imm_iter;
use_operand_p use_p;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 70f2ecec9983..5578e3d9e993 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtx-vector-builder.h"
 #include "tree-pretty-print.h"
 #include "flags.h"
+#include "internal-fn.h"
 

[gcc r15-5942] libstdc++: Use ADL swap for containers' function objects [PR117921]

2024-12-05 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:0368c42507328774cadbea589509b95aaf3cb826

commit r15-5942-g0368c42507328774cadbea589509b95aaf3cb826
Author: Jonathan Wakely 
Date:   Thu Dec 5 12:46:26 2024 +

libstdc++: Use ADL swap for containers' function objects [PR117921]

The standard says that Compare, Pred and Hash objects should be swapped
as described in [swappable.requirements] which means calling swap
unqualified with std::swap visible to name lookup.

libstdc++-v3/ChangeLog:

PR libstdc++/117921
* include/bits/hashtable_policy.h (_Hash_code_base::_M_swap):
Use ADL swap for Hash members.
(_Hashtable_base::_M_swap): Use ADL swap for _Equal members.
* include/bits/stl_tree.h (_Rb_tree::swap): Use ADL swap for
_Compare members.
* testsuite/23_containers/set/modifiers/swap/adl.cc: New test.
* testsuite/23_containers/unordered_set/modifiers/swap-2.cc: New
test.

Diff:
---
 libstdc++-v3/include/bits/hashtable_policy.h   |  8 ++-
 libstdc++-v3/include/bits/stl_tree.h   |  4 +-
 .../23_containers/set/modifiers/swap/adl.cc| 54 +++
 .../unordered_set/modifiers/swap-2.cc  | 62 ++
 4 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index ad0dfd55c3f1..f2260f3926dc 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1177,7 +1177,10 @@ namespace __detail
 
   void
   _M_swap(_Hash_code_base& __x)
-  { std::swap(__ebo_hash::_M_get(), __x.__ebo_hash::_M_get()); }
+  {
+   using std::swap;
+   swap(__ebo_hash::_M_get(), __x.__ebo_hash::_M_get());
+  }
 
   const _Hash&
   _M_hash() const { return __ebo_hash::_M_cget(); }
@@ -1561,7 +1564,8 @@ namespace __detail
   _M_swap(_Hashtable_base& __x)
   {
__hash_code_base::_M_swap(__x);
-   std::swap(_EqualEBO::_M_get(), __x._EqualEBO::_M_get());
+   using std::swap;
+   swap(_EqualEBO::_M_get(), __x._EqualEBO::_M_get());
   }
 
   const _Equal&
diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index bc27e191e8b8..0f536517d6b7 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -2091,7 +2091,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  std::swap(this->_M_impl._M_node_count, __t._M_impl._M_node_count);
}
   // No need to swap header's color as it does not change.
-  std::swap(this->_M_impl._M_key_compare, __t._M_impl._M_key_compare);
+
+  using std::swap;
+  swap(this->_M_impl._M_key_compare, __t._M_impl._M_key_compare);
 
   _Alloc_traits::_S_on_swap(_M_get_Node_allocator(),
__t._M_get_Node_allocator());
diff --git a/libstdc++-v3/testsuite/23_containers/set/modifiers/swap/adl.cc b/libstdc++-v3/testsuite/23_containers/set/modifiers/swap/adl.cc
new file mode 100644
index ..2b7975a366fc
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/set/modifiers/swap/adl.cc
@@ -0,0 +1,54 @@
+// { dg-do run { target c++11 } }
+
+// Bug 117921 - containers do not use ADL swap for Compare, Pred or Hash types
+
+#include <set>
+#include <testsuite_hooks.h>
+
+namespace adl
+{
+  struct Less : std::less<int>
+  {
+static bool swapped;
+friend void swap(Less&, Less&) { swapped = true; }
+  };
+  bool Less::swapped = false;
+
+  struct Allocator_base
+  {
+static bool swapped;
+  };
+  bool Allocator_base::swapped = false;
+
+  using std::size_t;
+
+  template<typename T>
+struct Allocator : Allocator_base
+{
+  using value_type = T;
+
+  Allocator() { }
+  template<typename U> Allocator(const Allocator<U>&) { }
+
+  T* allocate(size_t n) { return std::allocator<T>().allocate(n); }
+  void deallocate(T* p, size_t n) { std::allocator<T>().deallocate(p, n); }
+
+  using propagate_on_container_swap = std::true_type;
+
+  friend void swap(Allocator&, Allocator&) { swapped = true; }
+};
+}
+
+void
+test_swap()
+{
+  std::set<int, adl::Less, adl::Allocator<int>> s1, s2;
+  s1.swap(s2);
+  VERIFY( adl::Less::swapped );
+  VERIFY( adl::Allocator_base::swapped );
+}
+
+int main()
+{
+  test_swap();
+}
diff --git 
a/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/swap-2.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/swap-2.cc
new file mode 100644
index ..a0fb1a6f662f
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/swap-2.cc
@@ -0,0 +1,62 @@
+// { dg-do run { target c++11 } }
+
+// Bug 117921 - containers do not use ADL swap for Compare, Pred or Hash types
+
+#include <unordered_set>
+#include <testsuite_hooks.h>
+
+namespace adl
+{
+  struct Hash : std::hash<int>
+  {
+static bool swapped;
+friend void swap(Hash&, Hash&) { swapped = true; }
+  };
+  bool Hash::swapped = false;
+
+  struct Eq : std::equal_to<int>
+  {
+  

[gcc r15-5946] aarch64: Rename FLAG_NONE to FLAG_DEFAULT

2024-12-05 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:1e181536ba5c39d987bf394d346f49982e6df83a

commit r15-5946-g1e181536ba5c39d987bf394d346f49982e6df83a
Author: Richard Sandiford 
Date:   Thu Dec 5 15:33:10 2024 +

aarch64: Rename FLAG_NONE to FLAG_DEFAULT

This patch renames FLAG_NONE to FLAG_DEFAULT.  "NONE" suggests
that the function has no side-effects, whereas it actually means
that floating-point operations are assumed to read FPCR and to
raise FP exceptions.

gcc/
* config/aarch64/aarch64-builtins.cc (FLAG_NONE): Rename to...
(FLAG_DEFAULT): ...this and update all references.
* config/aarch64/aarch64-simd-builtins.def: Update all references
here too.
* config/aarch64/aarch64-simd-pragma-builtins.def: Likewise.

Diff:
---
 gcc/config/aarch64/aarch64-builtins.cc |  32 +-
 gcc/config/aarch64/aarch64-simd-builtins.def   | 726 ++---
 .../aarch64/aarch64-simd-pragma-builtins.def   |  24 +-
 3 files changed, 391 insertions(+), 391 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 4f735e8e58b8..eb44580bd9cb 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -193,7 +193,7 @@ using namespace aarch64;
 #define SIMD_MAX_BUILTIN_ARGS 5
 
 /* Flags that describe what a function might do.  */
-const unsigned int FLAG_NONE = 0U;
+const unsigned int FLAG_DEFAULT = 0U;
 const unsigned int FLAG_READ_FPCR = 1U << 0;
 const unsigned int FLAG_RAISE_FP_EXCEPTIONS = 1U << 1;
 const unsigned int FLAG_READ_MEMORY = 1U << 2;
@@ -913,7 +913,7 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
2, \
{ SIMD_INTR_MODE(A, L), SIMD_INTR_MODE(B, L) }, \
{ SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(B) }, \
-   FLAG_NONE, \
+   FLAG_DEFAULT, \
SIMD_INTR_MODE(A, L) == SIMD_INTR_MODE(B, L) \
  && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
   },
@@ -925,7 +925,7 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
2, \
{ SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
{ SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
-   FLAG_NONE, \
+   FLAG_DEFAULT, \
false \
   },
 
@@ -936,7 +936,7 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
2, \
{ SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
{ SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
-   FLAG_NONE, \
+   FLAG_DEFAULT, \
false \
   },
 
@@ -1857,7 +1857,7 @@ aarch64_init_crc32_builtins ()
   aarch64_crc_builtin_datum* d = &aarch64_crc_builtin_data[i];
   tree argtype = aarch64_simd_builtin_type (d->mode, qualifier_unsigned);
   tree ftype = build_function_type_list (usi_type, usi_type, argtype, 
NULL_TREE);
-  tree attrs = aarch64_get_attributes (FLAG_NONE, d->mode);
+  tree attrs = aarch64_get_attributes (FLAG_DEFAULT, d->mode);
   tree fndecl
= aarch64_general_add_builtin (d->name, ftype, d->fcode, attrs);
 
@@ -2232,7 +2232,7 @@ static void
 aarch64_init_data_intrinsics (void)
 {
   /* These intrinsics are not fp nor they read/write memory. */
-  tree attrs = aarch64_get_attributes (FLAG_NONE, SImode);
+  tree attrs = aarch64_get_attributes (FLAG_DEFAULT, SImode);
   tree uint32_fntype = build_function_type_list (uint32_type_node,
 uint32_type_node, NULL_TREE);
   tree ulong_fntype = build_function_type_list (long_unsigned_type_node,
@@ -4048,7 +4048,7 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt,
   switch (fcode)
 {
   BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, ALL)
-  BUILTIN_VDQ_I (UNOPU, reduc_plus_scal_, 10, NONE)
+  BUILTIN_VDQ_I (UNOPU, reduc_plus_scal_, 10, DEFAULT)
new_stmt = gimple_build_call_internal (IFN_REDUC_PLUS,
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
@@ -4062,8 +4062,8 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt,
break;
 
  BUILTIN_VDC (BINOP, combine, 0, QUIET)
- BUILTIN_VD_I (BINOPU, combine, 0, NONE)
- BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
+ BUILTIN_VD_I (BINOPU, combine, 0, DEFAULT)
+ BUILTIN_VDC_P (BINOPP, combine, 0, DEFAULT)
{
  tree first_part, second_part;
  if (BYTES_BIG_ENDIAN)
@@ -4152,14 +4152,14 @@ aarch64_general_gimple_fold_builtin (unsigned int 
fcode, gcall *stmt,
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
-  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
+  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, DEFAULT)
if (TREE_CODE (args[1]) == INTEGER_CST
&& wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0])))
  new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
  

[gcc r15-5945] aarch64: Rename FLAG_AUTO_FP to FLAG_QUIET

2024-12-05 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:bd7363ed699cae78bd87d23922fdbf3dd51fa03b

commit r15-5945-gbd7363ed699cae78bd87d23922fdbf3dd51fa03b
Author: Richard Sandiford 
Date:   Thu Dec 5 15:33:09 2024 +

aarch64: Rename FLAG_AUTO_FP to FLAG_QUIET

I'd suggested the name "FLAG_AUTO_FP" to mean "automatically derive
FLAG_FP from the mode", i.e. automatically decide whether the function
might read the FPCR or might raise FP exceptions.  However, the flag
currently suppresses that behaviour instead.

This patch renames FLAG_AUTO_FP to FLAG_QUIET.  That's probably not a
great name, but it's also what the SVE code means by "quiet", and is
borrowed from "quiet NaNs".

gcc/
* config/aarch64/aarch64-builtins.cc (FLAG_AUTO_FP): Rename to...
(FLAG_QUIET): ...this and update all references.
* config/aarch64/aarch64-simd-builtins.def: Update all references
here too.

Diff:
---
 gcc/config/aarch64/aarch64-builtins.cc   | 10 
 gcc/config/aarch64/aarch64-simd-builtins.def | 36 ++--
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 22f8216a45b3..4f735e8e58b8 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -202,13 +202,13 @@ const unsigned int FLAG_WRITE_MEMORY = 1U << 4;
 
 /* Not all FP intrinsics raise FP exceptions or read FPCR register,
use this flag to suppress it.  */
-const unsigned int FLAG_AUTO_FP = 1U << 5;
+const unsigned int FLAG_QUIET = 1U << 5;
 
 const unsigned int FLAG_FP = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS;
 const unsigned int FLAG_ALL = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS
   | FLAG_READ_MEMORY | FLAG_PREFETCH_MEMORY | FLAG_WRITE_MEMORY;
-const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY | FLAG_AUTO_FP;
-const unsigned int FLAG_LOAD = FLAG_READ_MEMORY | FLAG_AUTO_FP;
+const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY | FLAG_QUIET;
+const unsigned int FLAG_LOAD = FLAG_READ_MEMORY | FLAG_QUIET;
 
 typedef struct
 {
@@ -1322,7 +1322,7 @@ aarch64_init_simd_builtin_scalar_types (void)
 static unsigned int
 aarch64_call_properties (unsigned int flags, machine_mode mode)
 {
-  if (!(flags & FLAG_AUTO_FP) && FLOAT_MODE_P (mode))
+  if (!(flags & FLAG_QUIET) && FLOAT_MODE_P (mode))
 flags |= FLAG_FP;
 
   /* -fno-trapping-math means that we can assume any FP exceptions
@@ -4061,7 +4061,7 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt,
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
 
- BUILTIN_VDC (BINOP, combine, 0, AUTO_FP)
+ BUILTIN_VDC (BINOP, combine, 0, QUIET)
  BUILTIN_VD_I (BINOPU, combine, 0, NONE)
  BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
{
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 0814f8ba14f5..3df2773380ed 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -50,7 +50,7 @@
   BUILTIN_V12DI (STORESTRUCT_LANE_U, vec_stl1_lane, 0, ALL)
   BUILTIN_V12DI (STORESTRUCT_LANE_P, vec_stl1_lane, 0, ALL)
 
-  BUILTIN_VDC (BINOP, combine, 0, AUTO_FP)
+  BUILTIN_VDC (BINOP, combine, 0, QUIET)
   BUILTIN_VD_I (BINOPU, combine, 0, NONE)
   BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
   BUILTIN_VB (BINOPP, pmul, 0, NONE)
@@ -657,12 +657,12 @@
 
   /* Implemented by
   aarch64_<PERM_INSN><mode>.  */
-  BUILTIN_VALL (BINOP, zip1, 0, AUTO_FP)
-  BUILTIN_VALL (BINOP, zip2, 0, AUTO_FP)
-  BUILTIN_VALL (BINOP, uzp1, 0, AUTO_FP)
-  BUILTIN_VALL (BINOP, uzp2, 0, AUTO_FP)
-  BUILTIN_VALL (BINOP, trn1, 0, AUTO_FP)
-  BUILTIN_VALL (BINOP, trn2, 0, AUTO_FP)
+  BUILTIN_VALL (BINOP, zip1, 0, QUIET)
+  BUILTIN_VALL (BINOP, zip2, 0, QUIET)
+  BUILTIN_VALL (BINOP, uzp1, 0, QUIET)
+  BUILTIN_VALL (BINOP, uzp2, 0, QUIET)
+  BUILTIN_VALL (BINOP, trn1, 0, QUIET)
+  BUILTIN_VALL (BINOP, trn2, 0, QUIET)
 
   BUILTIN_GPF_F16 (UNOP, frecpe, 0, FP)
   BUILTIN_GPF_F16 (UNOP, frecpx, 0, FP)
@@ -674,9 +674,9 @@
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
  only ever used for the int64x1_t intrinsic, there is no scalar version.  
*/
-  BUILTIN_VSDQ_I_DI (UNOP, abs, 0, AUTO_FP)
-  BUILTIN_VHSDF (UNOP, abs, 2, AUTO_FP)
-  VAR1 (UNOP, abs, 2, AUTO_FP, hf)
+  BUILTIN_VSDQ_I_DI (UNOP, abs, 0, QUIET)
+  BUILTIN_VHSDF (UNOP, abs, 2, QUIET)
+  VAR1 (UNOP, abs, 2, QUIET, hf)
 
   BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10, FP)
   VAR1 (BINOP, float_truncate_hi_, 0, FP, v4sf)
@@ -720,7 +720,7 @@
   BUILTIN_VDQQH (BSL_P, simd_bsl, 0, NONE)
   VAR2 (BSL_P, simd_bsl,0, NONE, di, v2di)
   BUILTIN_VSDQ_I_DI (BSL_U, simd_bsl, 0, NONE)
-  BUILTIN_VALLDIF (BSL_S, simd_bsl, 0, AUTO_FP)
+  BUILTIN_VALLDIF (BSL_S, simd_bsl, 0, QUIET)
 
   /* Implemented by aarch64_crypto_aes.  */
   VAR1 (BINOPU, crypto_aese, 0, NONE, v16qi)
@@ -940,12 +940,12 @@
   BUILT

[gcc r15-5947] aarch64: Reintroduce FLAG_AUTO_FP

2024-12-05 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:0a4490a1ad3f73d546f53d0940dbc9f217d12922

commit r15-5947-g0a4490a1ad3f73d546f53d0940dbc9f217d12922
Author: Richard Sandiford 
Date:   Thu Dec 5 15:33:10 2024 +

aarch64: Reintroduce FLAG_AUTO_FP

The flag now known as FLAG_QUIET is an odd-one-out in that it
removes side-effects rather than adding them.  This patch inverts
it and gives it the old name FLAG_AUTO_FP.  FLAG_QUIET now means
"no flags" instead.

gcc/
* config/aarch64/aarch64-builtins.cc (FLAG_QUIET): Redefine to 0,
replacing the old flag with...
(FLAG_AUTO_FP): ...this.
(FLAG_DEFAULT): Redefine to FLAG_AUTO_FP.
(aarch64_call_properties): Update accordingly.

Diff:
---
 gcc/config/aarch64/aarch64-builtins.cc | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index eb44580bd9cb..f528592a17d8 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -193,22 +193,23 @@ using namespace aarch64;
 #define SIMD_MAX_BUILTIN_ARGS 5
 
 /* Flags that describe what a function might do.  */
-const unsigned int FLAG_DEFAULT = 0U;
 const unsigned int FLAG_READ_FPCR = 1U << 0;
 const unsigned int FLAG_RAISE_FP_EXCEPTIONS = 1U << 1;
 const unsigned int FLAG_READ_MEMORY = 1U << 2;
 const unsigned int FLAG_PREFETCH_MEMORY = 1U << 3;
 const unsigned int FLAG_WRITE_MEMORY = 1U << 4;
 
-/* Not all FP intrinsics raise FP exceptions or read FPCR register,
-   use this flag to suppress it.  */
-const unsigned int FLAG_QUIET = 1U << 5;
+/* Indicates that READ_FPCR and RAISE_FP_EXCEPTIONS should be set for
+   floating-point modes but not for integer modes.  */
+const unsigned int FLAG_AUTO_FP = 1U << 5;
 
+const unsigned int FLAG_QUIET = 0;
+const unsigned int FLAG_DEFAULT = FLAG_AUTO_FP;
 const unsigned int FLAG_FP = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS;
 const unsigned int FLAG_ALL = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS
   | FLAG_READ_MEMORY | FLAG_PREFETCH_MEMORY | FLAG_WRITE_MEMORY;
-const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY | FLAG_QUIET;
-const unsigned int FLAG_LOAD = FLAG_READ_MEMORY | FLAG_QUIET;
+const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY;
+const unsigned int FLAG_LOAD = FLAG_READ_MEMORY;
 
 typedef struct
 {
@@ -1322,7 +1323,7 @@ aarch64_init_simd_builtin_scalar_types (void)
 static unsigned int
 aarch64_call_properties (unsigned int flags, machine_mode mode)
 {
-  if (!(flags & FLAG_QUIET) && FLOAT_MODE_P (mode))
+  if ((flags & FLAG_AUTO_FP) && FLOAT_MODE_P (mode))
 flags |= FLAG_FP;
 
   /* -fno-trapping-math means that we can assume any FP exceptions


[gcc r13-9230] c++: Don't reject pointer to virtual method during constant evaluation [PR117615]

2024-12-05 Thread Simon Martin via Gcc-cvs
https://gcc.gnu.org/g:322faea202947561ee8c03edf5ab0ccf649587e1

commit r13-9230-g322faea202947561ee8c03edf5ab0ccf649587e1
Author: Simon Martin 
Date:   Tue Dec 3 14:30:43 2024 +0100

c++: Don't reject pointer to virtual method during constant evaluation 
[PR117615]

We currently reject the following valid code:

=== cut here ===
struct Base {
virtual void doit (int v) const {}
};
struct Derived : Base {
void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
fn_t mFn;
constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
constexpr Helper h (&Derived::doit);
}
=== cut here ===

The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is
represented with an expression with type pointer to method and using an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.

This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.

PR c++/117615

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-virtual22.C: New test.

(cherry picked from commit 72a2380a306a1c3883cb7e4f99253522bc265af0)

Diff:
---
 gcc/cp/constexpr.cc  |  6 ++
 gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C | 22 ++
 2 files changed, 28 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fb8a1023b222..f885a806c0a2 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7778,6 +7778,12 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
return t;
  }
  }
+   else if (TYPE_PTR_P (type)
+   && TREE_CODE (TREE_TYPE (type)) == METHOD_TYPE)
+ /* INTEGER_CST with pointer-to-method type is only used
+for a virtual method in a pointer to member function.
+Don't reject those.  */
+ ;
else
  {
/* This detects for example:
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C
new file mode 100644
index ..89330bf86200
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C
@@ -0,0 +1,22 @@
+// PR c++/117615
+// { dg-do "compile" { target c++20 } }
+
+struct Base {
+virtual void doit (int v) const {}
+};
+
+struct Derived : Base {
+void doit (int v) const {}
+};
+
+using fn_t = void (Base::*)(int) const;
+
+struct Helper {
+fn_t mFn;
+constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
+};
+
+void foo () {
+constexpr Helper h (&Derived::doit);
+constexpr Helper h2 (&Base::doit);
+}


[gcc r15-5933] params.opt: Fix typo

2024-12-05 Thread Filip Kastl via Gcc-cvs
https://gcc.gnu.org/g:2a2f285ecd2cd681cadae305990ffb9e23e157cb

commit r15-5933-g2a2f285ecd2cd681cadae305990ffb9e23e157cb
Author: Filip Kastl 
Date:   Thu Dec 5 11:23:13 2024 +0100

params.opt: Fix typo

Add missing '=' after -param=cycle-accurate-model.

gcc/ChangeLog:

* params.opt: Add missing '=' after -param=cycle-accurate-model.

Signed-off-by: Filip Kastl 

Diff:
---
 gcc/params.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index f5cc71d0f493..5853bf02f9ee 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -66,7 +66,7 @@ Enable asan stack protection.
 Common Joined UInteger Var(param_asan_use_after_return) Init(1) 
IntegerRange(0, 1) Param Optimization
 Enable asan detection of use-after-return bugs.
 
--param=cycle-accurate-model
+-param=cycle-accurate-model=
 Common Joined UInteger Var(param_cycle_accurate_model) Init(1) IntegerRange(0, 
1) Param Optimization
 Whether the scheduling description is mostly a cycle-accurate model of the 
target processor and is likely to be spill aggressively to fill any pipeline 
bubbles.


[gcc r15-5937] AVR: target/107957 - Split multi-byte loads and stores.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:b78c0dcb1b6b523880ee193698defca3ebd0b3f7

commit r15-5937-gb78c0dcb1b6b523880ee193698defca3ebd0b3f7
Author: Georg-Johann Lay 
Date:   Sun Dec 1 17:12:34 2024 +0100

AVR: target/107957 - Split multi-byte loads and stores.

This patch splits multi-byte loads and stores into single-byte
ones provided:

-  New option -msplit-ldst is on (e.g. -O2 and higher), and
-  The memory is non-volatile, and
-  The address space is generic, and
-  The split addresses are natively supported by the hardware.

gcc/
PR target/107957
* config/avr/avr.opt (-msplit-ldst, avropt_split_ldst):
New option and associated var.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_2_PLUS]: Turn on -msplit_ldst.
* config/avr/avr-passes.cc (splittable_address_p)
(avr_byte_maybe_mem, avr_split_ldst): New functions.
* config/avr/avr-protos.h (avr_split_ldst): New proto.
* config/avr/avr.md (define_split) [avropt_split_ldst]: Run
avr_split_ldst().

Diff:
---
 gcc/common/config/avr/avr-common.cc |   1 +
 gcc/config/avr/avr-passes.cc| 106 
 gcc/config/avr/avr-protos.h |   1 +
 gcc/config/avr/avr.md   |  19 +--
 gcc/config/avr/avr.opt  |   4 ++
 5 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/avr/avr-common.cc 
b/gcc/common/config/avr/avr-common.cc
index 7473429fa360..9059e7d2b485 100644
--- a/gcc/common/config/avr/avr-common.cc
+++ b/gcc/common/config/avr/avr-common.cc
@@ -39,6 +39,7 @@ static const struct default_options 
avr_option_optimization_table[] =
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mfuse_move_, NULL, 3 },
 { OPT_LEVELS_2_PLUS, OPT_mfuse_move_, NULL, 23 },
 { OPT_LEVELS_2_PLUS, OPT_msplit_bit_shift, NULL, 1 },
+{ OPT_LEVELS_2_PLUS, OPT_msplit_ldst, NULL, 1 },
 // Stick to the "old" placement of the subreg lowering pass.
 { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types_early, NULL, 1 },
 /* Allow optimizer to introduce store data races. This used to be the
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index f89a534bcbd9..de8de1cd2e8a 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -5466,6 +5466,112 @@ avr_split_fake_addressing_move (rtx_insn * /*insn*/, 
rtx *xop)
 }
 
 
+/* Given memory reference mem(ADDR), return true when it can be split into
+   single-byte moves, and all resulting addresses are natively supported.
+   ADDR is in addr-space generic.  */
+
+static bool
+splittable_address_p (rtx addr, int n_bytes)
+{
+  if (CONSTANT_ADDRESS_P (addr)
+  || GET_CODE (addr) == PRE_DEC
+  || GET_CODE (addr) == POST_INC)
+return true;
+
+  if (! AVR_TINY)
+{
+  rtx base = select<rtx>()
+   : REG_P (addr) ? addr
+   : GET_CODE (addr) == PLUS ? XEXP (addr, 0)
+   : NULL_RTX;
+
+  int off = select<int>()
+   : REG_P (addr) ? 0
+   : GET_CODE (addr) == PLUS ? (int) INTVAL (XEXP (addr, 1))
+   : -1;
+
+  return (base && REG_P (base)
+ && (REGNO (base) == REG_Y || REGNO (base) == REG_Z)
+ && IN_RANGE (off, 0, 64 - n_bytes));
+}
+
+  return false;
+}
+
+
+/* Like avr_byte(), but also knows how to split POST_INC and PRE_DEC
+   memory references.  */
+
+static rtx
+avr_byte_maybe_mem (rtx x, int n)
+{
+  rtx addr, b;
+  if (MEM_P (x)
+  && (GET_CODE (addr = XEXP (x, 0)) == POST_INC
+ || GET_CODE (addr) == PRE_DEC))
+b = gen_rtx_MEM (QImode, copy_rtx (addr));
+  else
+b = avr_byte (x, n);
+
+  if (MEM_P (x))
+gcc_assert (MEM_P (b));
+
+  return b;
+}
+
+
+/* Split multi-byte load / stores into 1-byte such insns
+   provided non-volatile, addr-space = generic, no reg-overlap
+   and the resulting addressings are all natively supported.
+   Returns true when the  XOP[0] = XOP[1]  insn has been split and
+   false, otherwise.  */
+
+bool
+avr_split_ldst (rtx *xop)
+{
+  rtx dest = xop[0];
+  rtx src = xop[1];
+  machine_mode mode = GET_MODE (dest);
+  int n_bytes = GET_MODE_SIZE (mode);
+  rtx mem, reg_or_0;
+
+  if (MEM_P (dest) && reg_or_0_operand (src, mode))
+{
+  mem = dest;
+  reg_or_0 = src;
+}
+  else if (register_operand (dest, mode) && MEM_P (src))
+{
+  reg_or_0 = dest;
+  mem = src;
+}
+  else
+return false;
+
+  rtx addr = XEXP (mem, 0);
+
+  if (MEM_VOLATILE_P (mem)
+  || ! ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (mem))
+  || ! IN_RANGE (n_bytes, 2, 4)
+  || ! splittable_address_p (addr, n_bytes)
+  || reg_overlap_mentioned_p (reg_or_0, addr))
+return false;
+
+  const int step = GET_CODE (addr) == PRE_DEC ? -1 : 1;
+  const int istart = step > 0 ? 0 : n_bytes - 1;
+  const int iend = istart + step * n_bytes;
+
+  for (int i = istart; i != iend; i += step)
+{
+  rtx di = avr_byte_may

[gcc r14-11062] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:0eb7f0a860add7b1c79ae4248e1960120bc77d60

commit r14-11062-g0eb7f0a860add7b1c79ae4248e1960120bc77d60
Author: Georg-Johann Lay 
Date:   Wed Dec 4 20:56:50 2024 +0100

AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP.  This is needed because SP
my be stored in some frame location.

gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().

(cherry picked from commit f7b5527d1b48b33d8ab633c1e9dcb9883667492a)

Diff:
---
 gcc/config/avr/avr.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index b7273fa19f6e..823fc716f2c7 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -404,9 +404,14 @@
 
 emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
-emit_move_insn (hard_frame_pointer_rtx, r_fp);
+// PR64242: When r_sp is located in the frame, we must not
+// change FP prior to reading r_sp.  Hence copy r_fp to a
+// local register (and hope that reload won't spill it).
+rtx r_fp_reg = copy_to_reg (r_fp);
 emit_stack_restore (SAVE_NONLOCAL, r_sp);
 
+emit_move_insn (hard_frame_pointer_rtx, r_fp_reg);
+
 emit_use (hard_frame_pointer_rtx);
 emit_use (stack_pointer_rtx);


[gcc r15-5934] doc: Add store-forwarding-max-distance to invoke.texi

2024-12-05 Thread Filip Kastl via Gcc-cvs
https://gcc.gnu.org/g:9755f5973473aa547063d1a97d47a409d237eb5b

commit r15-5934-g9755f5973473aa547063d1a97d47a409d237eb5b
Author: Filip Kastl 
Date:   Thu Dec 5 11:27:26 2024 +0100

doc: Add store-forwarding-max-distance to invoke.texi

gcc/ChangeLog:

* doc/invoke.texi: Add store-forwarding-max-distance.

Signed-off-by: Filip Kastl 

Diff:
---
 gcc/doc/invoke.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2409a41d50a..4b1acf9b79c1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17122,6 +17122,11 @@ diagnostics.
 @item store-merging-max-size
 Maximum size of a single store merging region in bytes.
 
+@item store-forwarding-max-distance
+Maximum number of instruction distance that a small store forwarded to a larger
+load may stall. Value '0' disables the cost checks for the
+avoid-store-forwarding pass.
+
 @item hash-table-verification-limit
 The number of elements for which hash table verification is done
 for each searched element.


[gcc r15-5936] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:f7b5527d1b48b33d8ab633c1e9dcb9883667492a

commit r15-5936-gf7b5527d1b48b33d8ab633c1e9dcb9883667492a
Author: Georg-Johann Lay 
Date:   Wed Dec 4 20:56:50 2024 +0100

AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP.  This is needed because SP
may be stored in some frame location.

gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().

Diff:
---
 gcc/config/avr/avr.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index a68b38c272de..f45677a4533d 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -421,9 +421,14 @@
 
 emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
-emit_move_insn (hard_frame_pointer_rtx, r_fp);
+// PR64242: When r_sp is located in the frame, we must not
+// change FP prior to reading r_sp.  Hence copy r_fp to a
+// local register (and hope that reload won't spill it).
+rtx r_fp_reg = copy_to_reg (r_fp);
 emit_stack_restore (SAVE_NONLOCAL, r_sp);
 
+emit_move_insn (hard_frame_pointer_rtx, r_fp_reg);
+
 emit_use (hard_frame_pointer_rtx);
 emit_use (stack_pointer_rtx);


[gcc r14-11063] c++: Don't reject pointer to virtual method during constant evaluation [PR117615]

2024-12-05 Thread Simon Martin via Gcc-cvs
https://gcc.gnu.org/g:4a73efcbdc5fb9c3f6ab0cba718dd25b5062fc22

commit r14-11063-g4a73efcbdc5fb9c3f6ab0cba718dd25b5062fc22
Author: Simon Martin 
Date:   Tue Dec 3 14:30:43 2024 +0100

c++: Don't reject pointer to virtual method during constant evaluation 
[PR117615]

We currently reject the following valid code:

=== cut here ===
struct Base {
virtual void doit (int v) const {}
};
struct Derived : Base {
void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
fn_t mFn;
constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
constexpr Helper h (&Derived::doit);
}
=== cut here ===

The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is
represented with an expression with type pointer to method and using an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.

This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.

PR c++/117615

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-virtual22.C: New test.

(cherry picked from commit 72a2380a306a1c3883cb7e4f99253522bc265af0)

Diff:
---
 gcc/cp/constexpr.cc  |  6 ++
 gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C | 22 ++
 2 files changed, 28 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 853694d78a56..40cc755258ff 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8254,6 +8254,12 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
return t;
  }
  }
+   else if (TYPE_PTR_P (type)
+   && TREE_CODE (TREE_TYPE (type)) == METHOD_TYPE)
+ /* INTEGER_CST with pointer-to-method type is only used
+for a virtual method in a pointer to member function.
+Don't reject those.  */
+ ;
else
  {
/* This detects for example:
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C
new file mode 100644
index ..89330bf86200
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C
@@ -0,0 +1,22 @@
+// PR c++/117615
+// { dg-do "compile" { target c++20 } }
+
+struct Base {
+virtual void doit (int v) const {}
+};
+
+struct Derived : Base {
+void doit (int v) const {}
+};
+
+using fn_t = void (Base::*)(int) const;
+
+struct Helper {
+fn_t mFn;
+constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
+};
+
+void foo () {
+constexpr Helper h (&Derived::doit);
+constexpr Helper h2 (&Base::doit);
+}


[gcc r15-5939] c: Diagnose unexpected va_start arguments in C23 [PR107980]

2024-12-05 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:fca04028d7075a6eaae350774a3916f14d4004ae

commit r15-5939-gfca04028d7075a6eaae350774a3916f14d4004ae
Author: Jakub Jelinek 
Date:   Thu Dec 5 12:57:44 2024 +0100

c: Diagnose unexpected va_start arguments in C23 [PR107980]

va_start macro was changed in C23 from the C17 va_start (va_list ap, parmN)
where parmN is the identifier of the last parameter into
va_start (va_list ap, ...) where arguments after ap aren't evaluated.
Late in the C23 development
"If any additional arguments expand to include unbalanced parentheses, or
a preprocessing token that does not convert to a token, the behavior is
undefined."
has been added, plus there is
"NOTE The macro allows additional arguments to be passed for va_start for
compatibility with older versions of the library only."
and
"Additional arguments beyond the first given to the va_start macro may be
expanded and used in unspecified contexts where they are unevaluated. For
example, an implementation diagnoses potentially erroneous input for an
invocation of va_start such as:"
...
va_start(vl, 1, 3.0, "12", xd); // diagnostic encouraged
...
"Simultaneously, va_start usage consistent with older revisions of this
document should not produce a diagnostic:"
...
void neigh (int last_arg, ...) {
va_list vl;
va_start(vl, last_arg); // no diagnostic

The following patch implements the recommended diagnostics.
Until now in C23 mode va_start(v, ...) was defined to
__builtin_va_start(v, 0)
and the extra arguments were silently ignored.
The following patch adds a new builtin in a form of a keyword which
parses the first argument, is silent about the __builtin_c23_va_start (ap)
form, for __builtin_c23_va_start (ap, identifier) looks the identifier up
and is silent if it is the last named parameter (except that it diagnoses
if it has register keyword), otherwise diagnoses it isn't the last one
but something else, and if there is just __builtin_c23_va_start (ap, )
or if __builtin_c23_va_start (ap, is followed by tokens other than
identifier followed by ), it skips over the tokens (with handling of
balanced ()s) until ) and diagnoses the extra tokens.
In all cases in a form of warnings.

2024-12-05  Jakub Jelinek  

PR c/107980
gcc/
* ginclude/stdarg.h (va_start): For C23+ change parameters from
v, ... to just ... and define to __builtin_c23_va_start(__VA_ARGS__)
rather than __builtin_va_start(v, 0).
gcc/c-family/
* c-common.h (enum rid): Add RID_C23_VA_START.
* c-common.cc (c_common_reswords): Add __builtin_c23_va_start.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle 
RID_C23_VA_START.
gcc/testsuite/
* gcc.dg/c23-stdarg-4.c: Expect extra warning.
* gcc.dg/c23-stdarg-6.c: Likewise.
* gcc.dg/c23-stdarg-7.c: Likewise.
* gcc.dg/c23-stdarg-8.c: Likewise.
* gcc.dg/c23-stdarg-10.c: New test.
* gcc.dg/c23-stdarg-11.c: New test.
* gcc.dg/torture/c23-stdarg-split-1a.c: Expect extra warning.
* gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.

Diff:
---
 gcc/c-family/c-common.cc   |   1 +
 gcc/c-family/c-common.h|   2 +-
 gcc/c/c-parser.cc  |  95 +
 gcc/ginclude/stdarg.h  |   2 +-
 gcc/testsuite/gcc.dg/c23-stdarg-10.c   | 112 +
 gcc/testsuite/gcc.dg/c23-stdarg-11.c   |  11 ++
 gcc/testsuite/gcc.dg/c23-stdarg-4.c|   2 +-
 gcc/testsuite/gcc.dg/c23-stdarg-6.c|   2 +-
 gcc/testsuite/gcc.dg/c23-stdarg-7.c|   2 +
 gcc/testsuite/gcc.dg/c23-stdarg-8.c|   2 +
 gcc/testsuite/gcc.dg/torture/c23-stdarg-split-1a.c |   2 +
 gcc/testsuite/gcc.dg/torture/c23-stdarg-split-1b.c |   2 +-
 12 files changed, 230 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index d21f2f9909c4..048952311f2f 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -459,6 +459,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
+  { "__builtin_c23_va_start", RID_C23_VA_START,D_C23 },
   { "__builtin_va_arg",RID_VA_ARG, 0 },
   { "__complex",   RID_COMPLEX,0 },
   { "__complex__", RID_COMPLEX,0 },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 7834e0d19590..e2195aa54b8b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -105,7 +105,7 @@ enum rid
 
   /* C extensions */
   RID_ASM,   RID_TYP

[gcc r15-5940] doloop: Fix up doloop df use [PR116799]

2024-12-05 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:0eed81612ad6eac2bec60286348a103d4dc02a5a

commit r15-5940-g0eed81612ad6eac2bec60286348a103d4dc02a5a
Author: Jakub Jelinek 
Date:   Thu Dec 5 13:01:21 2024 +0100

doloop: Fix up doloop df use [PR116799]

The following testcases are miscompiled on s390x-linux, because the
doloop_optimize
  /* Ensure that the new sequence doesn't clobber a register that
 is live at the end of the block.  */
  {
bitmap modified = BITMAP_ALLOC (NULL);

for (rtx_insn *i = doloop_seq; i != NULL; i = NEXT_INSN (i))
  note_stores (i, record_reg_sets, modified);

basic_block loop_end = desc->out_edge->src;
bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified);
check doesn't work as intended.
The problem is that it uses df, but the df analysis was only done using
  iv_analysis_loop_init (loop);
->
  df_analyze_loop (loop);
which computes df inside on the bbs of the loop.
While loop_end bb is inside of the loop, df_get_live_out computed that
way includes registers set in the loop and used at the start of the next
iteration, but doesn't include registers set in the loop (or before the
loop) and used after the loop.

The following patch fixes that by doing a whole-function df_analyze
first, changing the loop iteration mode from 0 to LI_ONLY_INNERMOST (on
the many targets that use the can_use_doloop_if_innermost target hook
and so are known to only handle innermost loops) or LI_FROM_INNERMOST
(I think only bfin actually allows non-innermost loops), and checking
not just
df_get_live_out (loop_end) (that is needed for something used by the
next iteration), but also df_get_live_in (desc->out_edge->dest),
i.e. what will be used after the loop.  df of such a bb shouldn't
be affected by the df_analyze_loop and so should be from df_analyze
of the whole function.

2024-12-05  Jakub Jelinek  

PR rtl-optimization/113994
PR rtl-optimization/116799
* loop-doloop.cc: Include targhooks.h.
(doloop_optimize): Also punt on intersection of modified
with df_get_live_in (desc->out_edge->dest).
(doloop_optimize_loops): Call df_analyze.  Use
LI_ONLY_INNERMOST or LI_FROM_INNERMOST instead of 0 as
second loops_list argument.

* gcc.c-torture/execute/pr116799.c: New test.
* g++.dg/torture/pr113994.C: New test.

Diff:
---
 gcc/loop-doloop.cc | 20 -
 gcc/testsuite/g++.dg/torture/pr113994.C| 31 +++
 gcc/testsuite/gcc.c-torture/execute/pr116799.c | 41 ++
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc
index 2f0c56b0efd2..60d5f2c10c66 100644
--- a/gcc/loop-doloop.cc
+++ b/gcc/loop-doloop.cc
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "loop-unroll.h"
 #include "regs.h"
 #include "df.h"
+#include "targhooks.h"
 
 /* This module is used to modify loops with a determinable number of
iterations to use special low-overhead looping instructions.
@@ -800,6 +801,18 @@ doloop_optimize (class loop *loop)
 
 basic_block loop_end = desc->out_edge->src;
 bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified);
+/* iv_analysis_loop_init calls df_analyze_loop, which computes just
+   partial df for blocks of the loop only.  The above will catch if
+   any of the modified registers are used inside of the loop body, but
+   it will most likely not have accurate info on registers used
+   at the destination of the out_edge.  We call df_analyze on the
+   whole function at the start of the pass though and iterate only
+   on innermost loops or from innermost loops, so
+   live in on desc->out_edge->dest should be still unmodified from
+   the initial df_analyze.  */
+if (!fail)
+  fail = bitmap_intersect_p (df_get_live_in (desc->out_edge->dest),
+modified);
 BITMAP_FREE (modified);
 
 if (fail)
@@ -825,7 +838,12 @@ doloop_optimize_loops (void)
   df_live_set_all_dirty ();
 }
 
-  for (auto loop : loops_list (cfun, 0))
+  df_analyze ();
+
+  for (auto loop : loops_list (cfun,
+  targetm.can_use_doloop_p
+  == can_use_doloop_if_innermost
+  ? LI_ONLY_INNERMOST : LI_FROM_INNERMOST))
 doloop_optimize (loop);
 
   if (optimize == 1)
diff --git a/gcc/testsuite/g++.dg/torture/pr113994.C 
b/gcc/testsuite/g++.dg/torture/pr113994.C
new file mode 100644
index ..c9c186d45ee7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr113994.C
@@ -0,0 +1,31 @@
+// PR rtl-optimization/113994
+// { dg-do run }
+
+#include 
+
+void
+foo (const std::string &x, size_t &y, std::string &z)
+{
+  size_t w = 

[gcc r15-5956] rtl-optimization/117922 - add timevar for fold-mem-offsets

2024-12-05 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:8772f37e45e9401c9a361548e00c9691424e75e0

commit r15-5956-g8772f37e45e9401c9a361548e00c9691424e75e0
Author: Richard Biener 
Date:   Fri Dec 6 08:08:55 2024 +0100

rtl-optimization/117922 - add timevar for fold-mem-offsets

The new fold-mem-offsets RTL pass takes significant amount of time
and memory.  Add a timevar for it.

PR rtl-optimization/117922
* timevar.def (TV_FOLD_MEM_OFFSETS): New.
* fold-mem-offsets.cc (pass_data_fold_mem): Use TV_FOLD_MEM_OFFSETS.

Diff:
---
 gcc/fold-mem-offsets.cc | 2 +-
 gcc/timevar.def | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index 84b9623058bd..284aea9f06fa 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
@@ -100,7 +100,7 @@ const pass_data pass_data_fold_mem =
   RTL_PASS, /* type */
   "fold_mem_offsets", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_FOLD_MEM_OFFSETS, /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 574e62584ffc..4bd26e0b6b79 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
 DEFTIMEVAR (TV_TREE_LOOP_IFCVT   , "tree loop if-conversion")
 DEFTIMEVAR (TV_WARN_ACCESS   , "access analysis")
 DEFTIMEVAR (TV_GIMPLE_CRC_OPTIMIZATION, "crc optimization")
 DEFTIMEVAR (TV_EXT_DCE   , "ext dce")
+DEFTIMEVAR (TV_FOLD_MEM_OFFSETS  , "fold mem offsets")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL  , "early local passes")


[gcc r15-5957] SVE intrinsics: Fold calls with pfalse predicate.

2024-12-05 Thread Jennifer Schmitz via Gcc-cvs
https://gcc.gnu.org/g:5289540ed58e42ae66255e31f22afe4ca0a6e15e

commit r15-5957-g5289540ed58e42ae66255e31f22afe4ca0a6e15e
Author: Jennifer Schmitz 
Date:   Fri Nov 15 07:45:59 2024 -0800

SVE intrinsics: Fold calls with pfalse predicate.

If an SVE intrinsic has the pfalse predicate, we can fold the call to
a simplified assignment statement: for _m predication, the LHS can be
assigned the operand for inactive values, and for _z we can assign a
zero vector.  For _x, the returned values can be arbitrary and, as
suggested by Richard Sandiford, we fold to a zero vector.

For example,
svint32_t foo (svint32_t op1, svint32_t op2)
{
  return svadd_s32_m (svpfalse_b (), op1, op2);
}
can be folded to lhs = op1, such that foo is compiled to just a RET.

For implicit predication, a case distinction is necessary:
Intrinsics that read from memory can be folded to a zero vector.
Intrinsics that write to memory or prefetch can be folded to a no-op.
Other intrinsics need case-by-case implementation, which we added in
the corresponding svxxx_impl::fold.

We implemented this optimization during gimple folding by calling a new 
method
gimple_folder::fold_pfalse from gimple_folder::fold, which covers the 
generic
cases described above.

We tested the new behavior for each intrinsic with all supported 
predications
and data types and checked the produced assembly. There is a test file
for each shape subclass with scan-assembler-times tests that look for
the simplified instruction sequences, such as individual RET instructions
or zeroing moves. There is an additional directive counting the total 
number of
functions in the test, which must be the sum of counts of all other
directives. This is to check that all tested intrinsics were optimized.

Some few intrinsics were not covered by this patch:
- svlasta and svlastb already have an implementation to cover a pfalse
predicate. No changes were made to them.
- svld1/2/3/4 return aggregate types and were excluded from the case
that folds calls with implicit predication to lhs = {0, ...}.
- svst1/2/3/4 already have an implementation in svstx_impl that precedes
our optimization, such that it is not triggered.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/ChangeLog:

PR target/106329
* config/aarch64/aarch64-sve-builtins-base.cc
(svac_impl::fold): Add folding if pfalse predicate.
(svadda_impl::fold): Likewise.
(class svaddv_impl): Likewise.
(class svandv_impl): Likewise.
(svclast_impl::fold): Likewise.
(svcmp_impl::fold): Likewise.
(svcmp_wide_impl::fold): Likewise.
(svcmpuo_impl::fold): Likewise.
(svcntp_impl::fold): Likewise.
(class svcompact_impl): Likewise.
(class svcvtnt_impl): Likewise.
(class sveorv_impl): Likewise.
(class svminv_impl): Likewise.
(class svmaxnmv_impl): Likewise.
(class svmaxv_impl): Likewise.
(class svminnmv_impl): Likewise.
(class svorv_impl): Likewise.
(svpfirst_svpnext_impl::fold): Likewise.
(svptest_impl::fold): Likewise.
(class svsplice_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svcvtxnt_impl): Likewise.
(svmatch_svnmatch_impl::fold): Likewise.
* config/aarch64/aarch64-sve-builtins.cc
(is_pfalse): Return true if tree is pfalse.
(gimple_folder::fold_pfalse): Fold calls with pfalse predicate.
(gimple_folder::fold_call_to): Fold call to lhs = t for given tree 
t.
(gimple_folder::fold_to_stmt_vops): Helper function that folds the
call to given stmt and adjusts virtual operands.
(gimple_folder::fold): Call fold_pfalse.
* config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare 
is_pfalse.

gcc/testsuite/ChangeLog:

PR target/106329
* gcc.target/aarch64/pfalse-binary_0.h: New test.
* gcc.target/aarch64/pfalse-unary_0.h: New test.
* gcc.target/aarch64/sve/pfalse-binary.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binaryxn.c: New test.
* gcc.target/aarch64/sv

[gcc r15-5935] AVR: Rework patterns that add / subtract an (inverted) MSB.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:9ae9db54631f38d6a2080a2a26c5c5d98fa9

commit r15-5935-g9ae9db54631f38d6a2080a2a26c5c5d98fa9
Author: Georg-Johann Lay 
Date:   Tue Dec 3 21:49:32 2024 +0100

AVR: Rework patterns that add / subtract an (inverted) MSB.

gcc/
* config/avr/avr-protos.h (avr_out_add_msb): New proto.
* config/avr/avr.cc (avr_out_add_msb): New function.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_GE0,
ADJUST_LEN_ADD_LT0]: Handle cases.
* config/avr/avr.md (adjust_len) : New attr 
values.
(QISI2): New mode iterator.
(C_MSB): New mode_attr.
(*add3...msb_split, *add3.ge0, *add3.lt0)
(*sub3...msb_split, *sub3.ge0, *sub3.lt0): New
patterns replacing old ones, but with iterators and
using avr_out_add_msb() for asm out.

Diff:
---
 gcc/config/avr/avr-protos.h |   1 +
 gcc/config/avr/avr.cc   |  91 
 gcc/config/avr/avr.md   | 249 
 3 files changed, 227 insertions(+), 114 deletions(-)

diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 4aa8554000b8..5b42f04fb313 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -109,6 +109,7 @@ extern const char *avr_out_sbxx_branch (rtx_insn *insn, rtx 
operands[]);
 extern const char* avr_out_bitop (rtx, rtx*, int*);
 extern const char* avr_out_plus (rtx, rtx*, int* =NULL, bool =true);
 extern const char* avr_out_plus_ext (rtx_insn*, rtx*, int*);
+extern const char* avr_out_add_msb (rtx_insn*, rtx*, rtx_code, int*);
 extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL);
 extern const char* avr_out_addto_sp (rtx*, int*);
 extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 9bebd67cd9c4..3544571d3dfa 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -8274,6 +8274,94 @@ avr_out_plus_ext (rtx_insn *insn, rtx *yop, int *plen)
 }
 
 
+/* Output code for addition of a sign-bit
+
+  YOP[0] += YOP[1] <CMP> 0
+
+   or such a subtraction:
+
+  YOP[0] -= YOP[2] <CMP> 0
+
+   where CMP is in { GE, LT }.
+   If PLEN == NULL output the instructions.
+   If PLEN != NULL set *PLEN to the length of the sequence in words.  */
+
+const char *
+avr_out_add_msb (rtx_insn *insn, rtx *yop, rtx_code cmp, int *plen)
+{
+  const rtx_code add = GET_CODE (SET_SRC (single_set (insn)));
+  const machine_mode mode = GET_MODE (yop[0]);
+  const int n_bytes = GET_MODE_SIZE (mode);
+  rtx sigop = yop[add == PLUS ? 1 : 2];
+  rtx msb = avr_byte (sigop, GET_MODE_SIZE (GET_MODE (sigop)) - 1);
+  rtx op[3] = { yop[0], msb, nullptr };
+
+  if (plen)
+*plen = 0;
+
+  if (n_bytes == 1
+  || (n_bytes == 2 && avr_adiw_reg_p (op[0])))
+{
+  avr_asm_len (cmp == LT
+  ? "sbrc %1,7"
+  : "sbrs %1,7", op, plen, 1);
+  const char *s_add = add == PLUS
+   ? n_bytes == 1 ? "inc %0" : "adiw %0,1"
+   : n_bytes == 1 ? "dec %0" : "sbiw %0,1";
+  return avr_asm_len (s_add, op, plen, 1);
+}
+
+  bool labl_p = false;
+  const char *s_code0 = nullptr;
+
+  // Default code provided SREG.C = MSBit.
+  const char *s_code = add == PLUS
+? "adc %2,__zero_reg__"
+: "sbc %2,__zero_reg__";
+
+  if (cmp == LT)
+{
+  if (reg_unused_after (insn, sigop)
+ && ! reg_overlap_mentioned_p (msb, op[0]))
+   avr_asm_len ("lsl %1", op, plen, 1);
+  else
+   avr_asm_len ("mov __tmp_reg__,%1" CR_TAB
+"lsl __tmp_reg__", op, plen, 2);
+}
+  else if (test_hard_reg_class (LD_REGS, msb))
+{
+  avr_asm_len ("cpi %1,0x80", op, plen, 1);
+}
+  else if (test_hard_reg_class (LD_REGS, op[0]))
+{
+  labl_p = true;
+  avr_asm_len ("tst %1" CR_TAB
+  "brmi 0f", op, plen, 2);
+  s_code0 = add == PLUS ? "subi %2,-1" : "subi %2,1";
+  s_code  = add == PLUS ? "sbci %2,-1" : "sbci %2,0";
+}
+  else
+{
+  labl_p = true;
+  avr_asm_len ("tst %1"  CR_TAB
+  "brmi 0f" CR_TAB
+  "sec", op, plen, 3);
+}
+
+  for (int i = 0; i < n_bytes; ++i)
+{
+  op[2] = avr_byte (op[0], i);
+  avr_asm_len (i == 0 && s_code0
+  ? s_code0
+  : s_code, op, plen, 1);
+}
+
+  return labl_p
+? avr_asm_len ("0:", op, plen, 0)
+: "";
+}
+
+
 /* Output addition of register XOP[0] and compile time constant XOP[2].
INSN is a single_set insn or an insn pattern.
CODE == PLUS:  perform addition by using ADD instructions or
@@ -10669,6 +10757,9 @@ avr_adjust_insn_length (rtx_insn *insn, int len)
 case ADJUST_LEN_ADD_SET_ZN: avr_out_plus_set_ZN (op, &len); break;
 case ADJUST_LEN_ADD_SET_N:  avr_out_plus_set_N (op, &len); break;
 
+case ADJUST_LEN_ADD_GE0: avr_out_add_msb (insn, op, GE, &len); break;
+case ADJUST_LEN_ADD_LT0: avr_out_add_msb (insn, op, 

[gcc r15-5938] AVR: target/107957 - Propagate zero_reg to store sources.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:bf6f77edd625cfe2f2f164e90437df318b96527f

commit r15-5938-gbf6f77edd625cfe2f2f164e90437df318b96527f
Author: Georg-Johann Lay 
Date:   Thu Dec 5 11:24:30 2024 +0100

AVR: target/107957 - Propagate zero_reg to store sources.

When -msplit-ldst is on, it may be possible to propagate __zero_reg__
to the sources of the new stores.  For example, without this patch,

unsigned long lx;

void store_lsr17 (void)
{
   lx >>= 17;
}

compiles to:

store_lsr17:
   lds r26,lx+2   ;  movqi_insn
   lds r27,lx+3   ;  movqi_insn
   movw r24,r26   ;  *movhi
   lsr r25;  *lshrhi3_const
   ror r24
   ldi r26,0  ;  movqi_insn
   ldi r27,0  ;  movqi_insn
   sts lx,r24 ;  movqi_insn
   sts lx+1,r25   ;  movqi_insn
   sts lx+2,r26   ;  movqi_insn
   sts lx+3,r27   ;  movqi_insn
   ret

but with this patch it becomes:

store_lsr17:
   lds r26,lx+2   ;  movqi_insn
   lds r27,lx+3   ;  movqi_insn
   movw r24,r26   ;  *movhi
   lsr r25;  *lshrhi3_const
   ror r24
   sts lx,r24 ;  movqi_insn
   sts lx+1,r25   ;  movqi_insn
   sts lx+2,__zero_reg__  ;  movqi_insn
   sts lx+3,__zero_reg__  ;  movqi_insn
   ret

gcc/
PR target/107957
* config/avr/avr-passes-fuse-move.h (bbinfo_t) :
Add static property.
* config/avr/avr-passes.cc (bbinfo_t::try_mem0_p): Define it.
(optimize_data_t::try_mem0): New method.
(bbinfo_t::optimize_one_block) [bbinfo_t::try_mem0_p]: Run try_mem0.
(bbinfo_t::optimize_one_function): Set bbinfo_t::try_mem0_p.
* config/avr/avr.md (pushhi1_insn): Also allow zero as source.
(define_split) [avropt_split_ldst]: Only run avr_split_ldst()
when avr-fuse-move has been run at least once.
* doc/invoke.texi (AVR Options) <-msplit-ldst>: Document it.

Diff:
---
 gcc/config/avr/avr-passes-fuse-move.h |  1 +
 gcc/config/avr/avr-passes.cc  | 49 ++-
 gcc/config/avr/avr.md |  9 +--
 gcc/doc/invoke.texi   |  9 +--
 4 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/gcc/config/avr/avr-passes-fuse-move.h 
b/gcc/config/avr/avr-passes-fuse-move.h
index dbed1a636f3d..432f9ca4670f 100644
--- a/gcc/config/avr/avr-passes-fuse-move.h
+++ b/gcc/config/avr/avr-passes-fuse-move.h
@@ -1172,6 +1172,7 @@ struct bbinfo_t
 
   static find_plies_data_t *fpd;
   static bool try_fuse_p;
+  static bool try_mem0_p;
   static bool try_bin_arg1_p;
   static bool try_simplify_p;
   static bool try_split_ldi_p;
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index de8de1cd2e8a..fad64b1b3454 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -434,6 +434,11 @@ static machine_mode size_to_mode (int size)
   Split all insns where the operation can be performed on individual
   bytes, like andsi3.  In example (4) the andhi3 can be optimized
   to an andqi3.
+
+   bbinfo_t::try_mem0_p
+  Try to fuse a mem = reg insn to mem = __zero_reg__.
+  This should only occur when -msplit-ldst is on, but may
+  also occur with pushes since push1 splits them.
 */
 
 
@@ -514,6 +519,7 @@ bool bbinfo_t::try_split_any_p;
 bool bbinfo_t::try_simplify_p;
 bool bbinfo_t::use_arith_p;
 bool bbinfo_t::use_set_some_p;
+bool bbinfo_t::try_mem0_p;
 
 
 // Abstract Interpretation of expressions.
@@ -1087,6 +1093,7 @@ struct optimize_data_t
   {}
 
   bool try_fuse (bbinfo_t *);
+  bool try_mem0 (bbinfo_t *);
   bool try_bin_arg1 (bbinfo_t *);
   bool try_simplify (bbinfo_t *);
   bool try_split_ldi (bbinfo_t *);
@@ -2509,6 +2516,44 @@ bbinfo_t::run_find_plies (const insninfo_t &ii, const 
memento_t &memo) const
 }
 
 
+// Try to propagate __zero_reg__ to a mem = reg insn's source.
+// Returns true on success and sets .n_new_insns.
+bool
+optimize_data_t::try_mem0 (bbinfo_t *)
+{
+  rtx_insn *insn = curr.ii.m_insn;
+  rtx set, mem, reg;
+  machine_mode mode;
+
+  if (insn
+  && (set = single_set (insn))
+  && MEM_P (mem = SET_DEST (set))
+  && REG_P (reg = SET_SRC (set))
+  && GET_MODE_SIZE (mode = GET_MODE (mem)) <= 4
+  && END_REGNO (reg) <= REG_32
+  && ! (regmask (reg) & memento_t::fixed_regs_mask)
+  && curr.regs.have_value (REGNO (reg), GET_MODE_SIZE (mode), 0x0))
+{
+  avr_dump (";; Found insn %d: mem:%m = 0 = r%d\n", INSN_UID (insn),
+   mode, REGNO (reg));
+
+  // Some insns like PUSHes don't clobber REG_CC.
+  bool clobbers_cc = GET_CODE (PATTERN (insn)) == PARALLEL;
+
+  if (clobbers_cc)
+   emit_valid_move_clobbercc (mem, CONST0_RTX (mode));
+  else
+   emit_valid_in

[gcc r13-9231] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:45bc6c452ef182dd08c0f0836fef88ad5b67b3aa

commit r13-9231-g45bc6c452ef182dd08c0f0836fef88ad5b67b3aa
Author: Georg-Johann Lay 
Date:   Wed Dec 4 20:56:50 2024 +0100

AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP.  This is needed because SP
may be stored in some frame location.

gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().

(cherry picked from commit f7b5527d1b48b33d8ab633c1e9dcb9883667492a)

Diff:
---
 gcc/config/avr/avr.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 9bd6b9119ec4..5d134afbf2c3 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -384,9 +384,14 @@
 
 emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
-emit_move_insn (hard_frame_pointer_rtx, r_fp);
+// PR64242: When r_sp is located in the frame, we must not
+// change FP prior to reading r_sp.  Hence copy r_fp to a
+// local register (and hope that reload won't spill it).
+rtx r_fp_reg = copy_to_reg (r_fp);
 emit_stack_restore (SAVE_NONLOCAL, r_sp);
 
+emit_move_insn (hard_frame_pointer_rtx, r_fp_reg);
+
 emit_use (hard_frame_pointer_rtx);
 emit_use (stack_pointer_rtx);


[gcc r12-10848] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

2024-12-05 Thread Georg-Johann Lay via Gcc-cvs
https://gcc.gnu.org/g:499d3dc84e40849f607154bd76ed07d37d744cc1

commit r12-10848-g499d3dc84e40849f607154bd76ed07d37d744cc1
Author: Georg-Johann Lay 
Date:   Wed Dec 4 20:56:50 2024 +0100

AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP.  This is needed because SP
may be stored in some frame location.

gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().

(cherry picked from commit f7b5527d1b48b33d8ab633c1e9dcb9883667492a)

Diff:
---
 gcc/config/avr/avr.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index f76249340b8f..90ba2d0400e4 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -381,9 +381,14 @@
 
 emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
-emit_move_insn (hard_frame_pointer_rtx, r_fp);
+// PR64242: When r_sp is located in the frame, we must not
+// change FP prior to reading r_sp.  Hence copy r_fp to a
+// local register (and hope that reload won't spill it).
+rtx r_fp_reg = copy_to_reg (r_fp);
 emit_stack_restore (SAVE_NONLOCAL, r_sp);
 
+emit_move_insn (hard_frame_pointer_rtx, r_fp_reg);
+
 emit_use (hard_frame_pointer_rtx);
 emit_use (stack_pointer_rtx);


[gcc r15-5941] arm: Add CDE options for star-mc1 cpu

2024-12-05 Thread Richard Earnshaw via Gcc-cvs
https://gcc.gnu.org/g:237fdf51fbfcfa4829471c18fe67535ae9c3efdb

commit r15-5941-g237fdf51fbfcfa4829471c18fe67535ae9c3efdb
Author: Arvin Zhong 
Date:   Thu Dec 5 13:43:14 2024 +

arm: Add CDE options for star-mc1 cpu

This patch adds CDE option support for -mcpu=star-mc1.
The star-mc1 is an Armv8-M Mainline CPU supporting the CDE feature.

gcc/ChangeLog:

* config/arm/arm-cpus.in (star-mc1): Add CDE options.
* doc/invoke.texi (cdecp options): Document for star-mc1.

Signed-off-by: Qingxin Zhong 

Diff:
---
 gcc/config/arm/arm-cpus.in | 8 
 gcc/doc/invoke.texi| 6 --
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 451b15fe9f93..5c12ffb807ba 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1689,6 +1689,14 @@ begin cpu star-mc1
  architecture armv8-m.main+dsp+fp
  option nofp remove ALL_FP
  option nodsp remove armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
  isa quirk_no_asmcpu quirk_vlldm
  costs v7m
 end cpu star-mc1
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 78ead0e494e1..e85a1495b70f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23760,7 +23760,8 @@ on @samp{cortex-m52} and @samp{cortex-m85}.
 
 @item +nomve
 Disable the M-Profile Vector Extension (MVE) integer and single precision
-floating-point instructions on @samp{cortex-m52}, @samp{cortex-m55} and 
@samp{cortex-m85}.
+floating-point instructions on @samp{cortex-m52}, @samp{cortex-m55} and
+@samp{cortex-m85}.
 
 @item +nomve.fp
 Disable the M-Profile Vector Extension (MVE) single precision floating-point
@@ -23768,7 +23769,8 @@ instructions on @samp{cortex-m52}, @samp{cortex-m55} 
and @samp{cortex-m85}.
 
 @item +cdecp0, +cdecp1, ... , +cdecp7
 Enable the Custom Datapath Extension (CDE) on selected coprocessors according
-to the numbers given in the options in the range 0 to 7 on @samp{cortex-m52} 
and @samp{cortex-m55}.
+to the numbers given in the options in the range 0 to 7 on @samp{cortex-m52},
+@samp{cortex-m55} and @samp{star-mc1}.
 
 @item  +nofp
 Disables the floating-point instructions on @samp{arm9e},


[gcc r15-5955] c++: ICE with pack indexing empty pack [PR117898]

2024-12-05 Thread Marek Polacek via Gcc-cvs
https://gcc.gnu.org/g:afeef7f0d3537cd978931a5afcbd3d91c144bfeb

commit r15-5955-gafeef7f0d3537cd978931a5afcbd3d91c144bfeb
Author: Marek Polacek 
Date:   Wed Dec 4 16:58:59 2024 -0500

c++: ICE with pack indexing empty pack [PR117898]

Here we ICE with a partially-substituted pack indexing.  The pack
expanded to an empty pack, which we can't index.  It seems reasonable
to detect this case in tsubst_pack_index, even before we substitute
the index.  Other erroneous cases can wait until pack_index_element
where we have the index.

PR c++/117898

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_index): Detect indexing an empty pack.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing2.C: Adjust.
* g++.dg/cpp26/pack-indexing12.C: New test.

Diff:
---
 gcc/cp/pt.cc |  6 ++
 gcc/testsuite/g++.dg/cpp26/pack-indexing12.C | 16 
 gcc/testsuite/g++.dg/cpp26/pack-indexing2.C  | 26 --
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1f0f02603288..b094d141f3b0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -13984,6 +13984,12 @@ tsubst_pack_index (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   tree pack = PACK_INDEX_PACK (t);
   if (PACK_EXPANSION_P (pack))
 pack = tsubst_pack_expansion (pack, args, complain, in_decl);
+  if (TREE_CODE (pack) == TREE_VEC && TREE_VEC_LENGTH (pack) == 0)
+{
+  if (complain & tf_error)
+   error ("cannot index an empty pack");
+  return error_mark_node;
+}
   tree index = tsubst_expr (PACK_INDEX_INDEX (t), args, complain, in_decl);
   const bool parenthesized_p = (TREE_CODE (t) == PACK_INDEX_EXPR
&& PACK_INDEX_PARENTHESIZED_P (t));
diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing12.C 
b/gcc/testsuite/g++.dg/cpp26/pack-indexing12.C
new file mode 100644
index ..d958af3620d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing12.C
@@ -0,0 +1,16 @@
+// PR c++/117898
+// { dg-do compile { target c++26 } }
+
+void
+ICE (auto... args)
+{
+  [&]() {
+    using R = decltype(args...[idx]); // { dg-error "cannot index an empty pack" }
+  }.template operator()<0>();
+}
+
+void
+g ()
+{
+  ICE(); // empty pack
+}
diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C 
b/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C
index ec32527ed80f..fdc8320e2555 100644
--- a/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C
@@ -42,7 +42,7 @@ template
 int
 getT (auto... Ts)
 {
-  return Ts...[N]; // { dg-error "pack index is out of range" }
+  return Ts...[N]; // { dg-error "cannot index an empty pack" }
 }
 
 template
@@ -56,12 +56,26 @@ template
 void
 badtype ()
 {
-  Ts...[N] t; // { dg-error "pack index is out of range" }
+  Ts...[N] t; // { dg-error "cannot index an empty pack" }
 }
 
 template
 void
 badtype2 ()
+{
+  Ts...[N] t; // { dg-error "pack index is out of range" }
+}
+
+template
+void
+badtype3 ()
+{
+  Ts...[N] t; // { dg-error "cannot index an empty pack" }
+}
+
+template
+void
+badtype4 ()
 {
   Ts...[N] t; // { dg-error "pack index is negative" }
 }
@@ -97,12 +111,12 @@ int main()
 
   getT<0>(); // { dg-message "required from here" }
   getT<1>();  // { dg-message "required from here" }
-  getT2<-1>();  // { dg-message "required from here" }
+  getT2<-1>(1);  // { dg-message "required from here" }
 
   badtype<0>(); // { dg-message "required from here" }
-  badtype<1, int>(); // { dg-message "required from here" }
-  badtype2<-1>(); // { dg-message "required from here" }
-  badtype2<-1, int>(); // { dg-message "required from here" }
+  badtype2<1, int>(); // { dg-message "required from here" }
+  badtype3<-1>(); // { dg-message "required from here" }
+  badtype4<-1, int>(); // { dg-message "required from here" }
 
   badindex();


[gcc r15-5953] RISC-V: Fix incorrect optimization options passing to convert and unop

2024-12-05 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:b7baa22e47421d0a81202a333f43d88b5bbb39f5

commit r15-5953-gb7baa22e47421d0a81202a333f43d88b5bbb39f5
Author: Pan Li 
Date:   Wed Dec 4 10:08:11 2024 +0800

RISC-V: Fix incorrect optimization options passing to convert and unop

Like the strided load/store tests, the testcases for vector convert
and unop are designed to pick up different sorts of optimization
options, but these options are actually ignored according to the
execution log in gcc.log.

This patch makes this correct in much the same way as the fix for
strided load/store.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; it will be
committed directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.

Signed-off-by: Pan Li 

Diff:
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 65a57aa79138..aee297752f67 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -67,9 +67,9 @@ foreach op $AUTOVEC_TEST_OPTS {
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/cmp/*.\[cS\]]] \
 "" "$op"
   dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/conversions/*.\[cS\]]] \
-"" "$op"
+"$op" ""
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/unop/*.\[cS\]]] \
-"" "$op"
+"$op" ""
   dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
 "$op" ""
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
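For readers unfamiliar with DejaGnu, the fix hinges on the difference between the second and third arguments of dg-runtest: the second argument (flags) is always applied, while the third (default-extra-flags) is used only when a test does not carry its own dg-options directive. A minimal Python sketch of that behavior, with illustrative names rather than the real Tcl implementation (the --param value is just an example of the kind of option the harness iterates over):

```python
# Sketch of DejaGnu's dg-runtest argument semantics (illustrative only).
def run_test(test_own_options, flags, default_extra_flags):
    """Return the option string a single test would be compiled with.

    `flags` are always appended; `default_extra_flags` apply only when
    the test does not specify its own dg-options.
    """
    effective = test_own_options if test_own_options else default_extra_flags
    return (effective + " " + flags).strip()

# Before the fix: "$op" was passed as default-extra-flags, so any test
# carrying its own dg-options silently dropped the intended combination.
before = run_test("-O2", flags="",
                  default_extra_flags="-O3 --param=riscv-autovec-lmul=m8")

# After the fix: "$op" is passed as flags and is applied unconditionally.
after = run_test("-O2", flags="-O3 --param=riscv-autovec-lmul=m8",
                 default_extra_flags="")
```

Under these semantics, `before` collapses to just the test's own options, which is exactly the "options are ignored" symptom described in the commit message.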


[gcc r15-5954] RISC-V: Refactor the testcases for bswap16-0

2024-12-05 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:3ac3093756cd00f50e63e8dcde4d278606722105

commit r15-5954-g3ac3093756cd00f50e63e8dcde4d278606722105
Author: Pan Li 
Date:   Wed Dec 4 10:08:12 2024 +0800

RISC-V: Refactor the testcases for bswap16-0

This patch refactors the bswap16-0 testcase after the change to how
optimization options are passed to testcases, so that the asm dump
check also matches big LMUL settings such as m8.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; it will be
committed directly if there are no comments within the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: Update
the vector register RE to cover v10 - v31.

Signed-off-by: Pan Li 

Diff:
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
index 605b3565b6bd..4b55c001a31d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
@@ -10,7 +10,7 @@
 **   ...
 **   vsrl\.vi\s+v[0-9]+,\s*v[0-9],\s*8+
 **   vsll\.vi\s+v[0-9]+,\s*v[0-9],\s*8+
-**   vor\.vv\s+v[0-9]+,\s*v[0-9],\s*v[0-9]+
+**   vor\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
 **   ...
 */
 TEST_UNARY_CALL (uint16_t, __builtin_bswap16)
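The one-character change matters because `v[0-9]` matches only the single-digit registers v0 through v9, while big-LMUL code allocates vector registers up to v31. A quick Python check of the two patterns (the sample instruction line is made up for illustration):

```python
import re

# Second operand with a single digit only (the old pattern).
old_re = r"vor\.vv\s+v[0-9]+,\s*v[0-9],\s*v[0-9]+"
# Second operand allowing v0 .. v31 (the fixed pattern).
new_re = r"vor\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+"

# With a big LMUL such as m8, the allocator hands out registers like v12.
line = "vor.vv  v10,  v12,  v14"

old_match = re.search(old_re, line)  # fails: "v12" needs two digits
new_match = re.search(new_re, line)  # succeeds
```

The old pattern already worked for v0-v9, so the failure only surfaced once the option-passing fix started exercising the big-LMUL configurations.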


[gcc r15-5951] PR modula2/117904: cc1gm2 ICE when compiling a const built from VAL and SIZE

2024-12-05 Thread Gaius Mulley via Gcc-cvs
https://gcc.gnu.org/g:363382ac7c2b8f6a09415e905b349bb7eaeca38a

commit r15-5951-g363382ac7c2b8f6a09415e905b349bb7eaeca38a
Author: Gaius Mulley 
Date:   Thu Dec 5 20:31:34 2024 +

PR modula2/117904: cc1gm2 ICE when compiling a const built from VAL and SIZE

This patch fixes an ICE which occurs when a positive ZType constant
is used as the increment of a FOR loop.

gcc/m2/ChangeLog:

PR modula2/117904
* gm2-compiler/M2GenGCC.mod (PerformLastForIterator): Add call to
BuildConvert when increment is > 0.

gcc/testsuite/ChangeLog:

PR modula2/117904
* gm2/iso/pass/forloopbyconst.mod: New test.

Signed-off-by: Gaius Mulley 

Diff:
---
 gcc/m2/gm2-compiler/M2GenGCC.mod  | 16 +---
 gcc/testsuite/gm2/iso/pass/forloopbyconst.mod | 25 +
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/gcc/m2/gm2-compiler/M2GenGCC.mod b/gcc/m2/gm2-compiler/M2GenGCC.mod
index b6e34e019b04..c5f5a7825956 100644
--- a/gcc/m2/gm2-compiler/M2GenGCC.mod
+++ b/gcc/m2/gm2-compiler/M2GenGCC.mod
@@ -541,9 +541,19 @@ BEGIN
   THEN
 (* If incr > 0 then LastIterator := ((e2-e1) DIV incr) * incr + e1.  *)
  expr := BuildSub (location, e2tree, e1tree, FALSE) ;
- expr := BuildDivFloor (location, expr, incrtree, FALSE) ;
- expr := BuildMult (location, expr, incrtree, FALSE) ;
- expr := BuildAdd (location, expr, e1tree, FALSE)
+ incrtree := BuildConvert (location, GetTreeType (expr), incrtree, FALSE) ;
+ IF TreeOverflow (incrtree)
+ THEN
+MetaErrorT0 (lastpos,
+ 'the intermediate calculation for the last iterator value in the {%kFOR} loop has caused an overflow') ;
+NoChange := FALSE ;
+SubQuad (quad) ;
+success := FALSE
+ ELSE
+expr := BuildDivFloor (location, expr, incrtree, FALSE) ;
+expr := BuildMult (location, expr, incrtree, FALSE) ;
+expr := BuildAdd (location, expr, e1tree, FALSE)
+ END
   ELSE
  (* Else use LastIterator := e1 - ((e1-e2) DIV PositiveBy) * PositiveBy
 to avoid unsigned div signed arithmetic.  *)
diff --git a/gcc/testsuite/gm2/iso/pass/forloopbyconst.mod b/gcc/testsuite/gm2/iso/pass/forloopbyconst.mod
new file mode 100644
index ..c0a1a06e0191
--- /dev/null
+++ b/gcc/testsuite/gm2/iso/pass/forloopbyconst.mod
@@ -0,0 +1,25 @@
+MODULE forloopbyconst ;
+
+
+CONST
+   block = 4 ;
+
+
+(*
+   init -
+*)
+
+PROCEDURE init ;
+VAR
+   i, n: CARDINAL ;
+BEGIN
+   n := 10 ;
+   FOR i := 1 TO n BY block DO
+
+   END
+END init ;
+
+
+BEGIN
+   init
+END forloopbyconst.
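As a sanity check of the two last-iterator formulas quoted in the M2GenGCC.mod comments above, here is a small Python sketch. Python's // agrees with Modula-2 DIV for the non-negative operands involved here; the function names are illustrative, not from the compiler:

```python
def last_iterator_up(e1, e2, incr):
    # LastIterator := ((e2 - e1) DIV incr) * incr + e1, for incr > 0.
    return ((e2 - e1) // incr) * incr + e1

def last_iterator_down(e1, e2, positive_by):
    # LastIterator := e1 - ((e1 - e2) DIV PositiveBy) * PositiveBy,
    # the form used for a negative step to avoid mixing unsigned and
    # signed division.
    return e1 - ((e1 - e2) // positive_by) * positive_by

# The new testcase: FOR i := 1 TO 10 BY 4 visits 1, 5, 9.
up_last = last_iterator_up(1, 10, 4)      # expected 9

# Descending analogue: FOR i := 10 TO 1 BY -4 visits 10, 6, 2.
down_last = last_iterator_down(10, 1, 4)  # expected 2
```

The ICE itself came from feeding the ZType increment constant into this arithmetic without first converting it to the type of the subtraction result, which is what the added BuildConvert call now does.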