date:20221021

Hi!

As the testcase shows, when cbranchbf4/cstorebf4 patterns are defined,
we can get ICEs for conditional moves.
The problem is that the generic conditional move expansion just calls
prepare_cmp_insn, which only checks that a matching cbranch<mode>4 pattern
exists, returns that comparison unchanged and passes it down to the
conditional move optabs.
The following patch fixes it by punting if the comparison isn't an
ix86_fp_comparison_operator (to tell the generic code it should compare
separately) and by promoting BFmode comparison operands to SFmode, so that
the comparison is performed in SFmode.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-21  Jakub Jelinek  

PR target/107322
* config/i386/i386-expand.cc (ix86_prepare_fp_compare_args): For
BFmode comparisons promote arguments to SFmode and recurse.
(ix86_expand_int_movcc, ix86_expand_fp_movcc): Return false early
if comparison operands are BFmode and operands[1] is not
ix86_fp_comparison_operator.

* gcc.target/i386/pr107322.c: New test.

--- gcc/config/i386/i386-expand.cc.jj   2022-10-19 11:20:54.602879162 +0200
+++ gcc/config/i386/i386-expand.cc  2022-10-20 12:15:37.750758679 +0200
@@ -2626,6 +2626,35 @@ ix86_prepare_fp_compare_args (enum rtx_c
   machine_mode op_mode = GET_MODE (op0);
   bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
 
+  if (op_mode == BFmode)
+    {
+      rtx op = gen_lowpart (HImode, op0);
+      if (CONST_INT_P (op))
+        op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+                                             op0, BFmode);
+      else
+        {
+          rtx t1 = gen_reg_rtx (SImode);
+          emit_insn (gen_zero_extendhisi2 (t1, op));
+          emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+          op = gen_lowpart (SFmode, t1);
+        }
+      *pop0 = op;
+      op = gen_lowpart (HImode, op1);
+      if (CONST_INT_P (op))
+        op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+                                             op1, BFmode);
+      else
+        {
+          rtx t1 = gen_reg_rtx (SImode);
+          emit_insn (gen_zero_extendhisi2 (t1, op));
+          emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+          op = gen_lowpart (SFmode, t1);
+        }
+      *pop1 = op;
+      return ix86_prepare_fp_compare_args (code, pop0, pop1);
+    }
+
   /* All of the unordered compare instructions only work on registers.
  The same is true of the fcomi compare instructions.  The XFmode
  compare instructions require registers except when comparing
@@ -3164,6 +3193,10 @@ ix86_expand_int_movcc (rtx operands[])
  && !TARGET_64BIT))
 return false;
 
+  if (GET_MODE (op0) == BFmode
+      && !ix86_fp_comparison_operator (operands[1], VOIDmode))
+    return false;
+
   start_sequence ();
   compare_op = ix86_expand_compare (code, op0, op1);
   compare_seq = get_insns ();
@@ -4238,6 +4271,10 @@ ix86_expand_fp_movcc (rtx operands[])
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
 
+  if (GET_MODE (op0) == BFmode
+      && !ix86_fp_comparison_operator (operands[1], VOIDmode))
+    return false;
+
   if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 {
   machine_mode cmode;
--- gcc/testsuite/gcc.target/i386/pr107322.c.jj 2022-10-20 12:28:46.829983399 +0200
+++ gcc/testsuite/gcc.target/i386/pr107322.c 2022-10-20 12:29:44.287201650 +0200
@@ -0,0 +1,33 @@
+/* PR target/107322 */
+/* { dg-do compile } */
+/* { dg-options "-fexcess-precision=16 -O -msse2 -mfpmath=sse" } */
+
+int i, j;
+float k, l;
+__bf16 f;
+
+void
+foo (void)
+{
+  i *= 0 >= f;
+}
+
+void
+bar (void)
+{
+  i *= 0 <= f;
+}
+
+void
+baz (int x, int y)
+{
+  i = 0 >= f ? x : y;
+  j = 0 <= f ? x + 2 : y + 3;
+}
+
+void
+qux (float x, float y)
+{
+  k = 0 >= f ? x : y;
+  l = 0 <= f ? x + 2 : y + 3;
+}

Jakub

[PATCH] builtins: Add __builtin_nextafterf16b builtin

Hi!

On top of the pending
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603665.html
patch, the following patch adds another needed builtin.
The above patch adds among other things __builtin_nextafterf16
builtin which we need in order to constexpr evaluate
std::nextafter(_Float16) overload (patch for that to be posted momentarily).
While there is an inline implementation of the overload, it isn't friendly
to constant evaluation, and the builtin doesn't need a libm implementation
because it will be used only during constant expression evaluation.
We need the same thing also for std::nextafter(__gnu_cxx::__bfloat16_t)
though and this patch does that.
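
For illustration only, the kind of stepping such a builtin has to compute
can be sketched in plain C on raw bfloat16 bit patterns.  This is not the
GCC or libstdc++ implementation: the names bf16_bits_to_float and
bf16_nextafter_bits are hypothetical, the NaN payload returned is an
arbitrary choice, and the sketch assumes float is IEEE binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bfloat16 bit pattern to float by
   placing it in the upper 16 bits of a 32-bit word.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical sketch of nextafter on raw bfloat16 bit patterns.  */
static uint16_t
bf16_nextafter_bits (uint16_t x, uint16_t y)
{
  float fx = bf16_bits_to_float (x);
  float fy = bf16_bits_to_float (y);

  if (fx != fx || fy != fy)
    return 0x7fc0;			/* NaN in, quiet NaN out.  */
  if (fx == fy)
    return y;				/* Also covers +0 versus -0.  */
  if ((x & 0x7fff) == 0)
    /* x is +-0: step to the smallest subnormal with the sign of y.  */
    return (uint16_t) ((y & 0x8000) | 1);

  /* Sign-magnitude encoding: moving away from zero increments the bit
     pattern, moving towards zero decrements it.  */
  int away_from_zero = (fx < fy) == (fx > 0.0f);
  return (uint16_t) (away_from_zero ? x + 1 : x - 1);
}
```

With these definitions, bf16_nextafter_bits (0x3f80, 0x4000) steps from 1.0
towards 2.0 and yields 0x3f81, and stepping that result back towards 0.0
yields 0x3f80 again.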

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-21  Jakub Jelinek  

* builtin-types.def (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16): New.
* builtins.def (BUILT_IN_NEXTAFTERF16B): New builtin.
* fold-const-call.cc (fold_const_call_sss): Handle
CFN_BUILT_IN_NEXTAFTERF16B.

--- gcc/builtin-types.def.jj 2022-10-20 16:43:03.031928876 +0200
+++ gcc/builtin-types.def   2022-10-20 16:44:15.768934809 +0200
@@ -461,6 +461,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT64X_FLOA
 BT_FLOAT64X, BT_FLOAT64X, BT_FLOAT64X)
 DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT128X_FLOAT128X_FLOAT128X,
 BT_FLOAT128X, BT_FLOAT128X, BT_FLOAT128X)
+DEF_FUNCTION_TYPE_2 (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16,
+BT_BFLOAT16, BT_BFLOAT16, BT_BFLOAT16)
 DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT_FLOAT_FLOATPTR,
 BT_FLOAT, BT_FLOAT, BT_FLOAT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_DOUBLEPTR,
--- gcc/builtins.def.jj 2022-10-20 16:43:03.033928849 +0200
+++ gcc/builtins.def 2022-10-20 16:46:27.467135944 +0200
@@ -591,6 +591,7 @@ DEF_C99_BUILTIN(BUILT_IN_NEXTAFT
 DEF_C99_BUILTIN(BUILT_IN_NEXTAFTERL, "nextafterl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
 #define NEXTAFTER_TYPE(F) BT_FN_##F##_##F##_##F
 DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_NEXTAFTER, "nextafter", NEXTAFTER_TYPE, ATTR_MATHFN_ERRNO)
+DEF_GCC_BUILTIN(BUILT_IN_NEXTAFTERF16B, "nextafterf16b", BT_FN_BFLOAT16_BFLOAT16_BFLOAT16, ATTR_MATHFN_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_NEXTTOWARD, "nexttoward", BT_FN_DOUBLE_DOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_NEXTTOWARDF, "nexttowardf", BT_FN_FLOAT_FLOAT_LONGDOUBLE, ATTR_MATHFN_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_NEXTTOWARDL, "nexttowardl", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_ERRNO)
--- gcc/fold-const-call.cc.jj   2022-10-20 16:43:03.033928849 +0200
+++ gcc/fold-const-call.cc  2022-10-20 16:50:14.300038009 +0200
@@ -1438,6 +1438,7 @@ fold_const_call_sss (real_value *result,
 
 CASE_CFN_NEXTAFTER:
 CASE_CFN_NEXTAFTER_FN:
+case CFN_BUILT_IN_NEXTAFTERF16B:
 CASE_CFN_NEXTTOWARD:
   return fold_const_nextafter (result, arg0, arg1, format);
 


Jakub

[PATCH] libstdc++: Small extended float support tweaks

Hi!

The following patch isn't for immediate commit, as it has several
dependencies, in particular:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603665.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604080.html
https://gcc.gnu.org/pipermail/libstdc++/2022-October/054849.html
On top of those, this patch
1) enables the std::float128_t overloads for x86 with glibc 2.26+
2) makes std::nextafter(std::float16_t, std::float16_t) and
   std::nextafter(std::bfloat16_t, std::bfloat16_t) constexpr
3) adds (small) testsuite coverage for that

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
if/when the above dependencies are in?

2022-10-21  Jakub Jelinek  

* config/os/gnu-linux/os_defines.h (_GLIBCXX_HAVE_FLOAT128_MATH):
Uncomment.
* include/c_global/cmath (nextafter(_Float16, _Float16)): Make it
constexpr.  If std::__is_constant_evaluated() call __builtin_nextafterf16.
(nextafter(__gnu_cxx::__bfloat16_t, __gnu_cxx::__bfloat16_t)): Similarly
but call __builtin_nextafterf16b.
* testsuite/26_numerics/headers/cmath/nextafter_c++23.cc (test): Add
static assertions to test constexpr nextafter.

--- libstdc++-v3/config/os/gnu-linux/os_defines.h.jj 2022-10-18 11:35:55.514865483 +0200
+++ libstdc++-v3/config/os/gnu-linux/os_defines.h 2022-10-20 16:57:59.715681664 +0200
@@ -57,7 +57,7 @@
|| (defined(__powerpc__) && defined(_ARCH_PWR8) \
&& defined(__LITTLE_ENDIAN__) && (_CALL_ELF == 2) \
&& defined(__FLOAT128__)))
-//# define _GLIBCXX_HAVE_FLOAT128_MATH 1
+# define _GLIBCXX_HAVE_FLOAT128_MATH 1
 #endif
 
 #if __GLIBC_PREREQ(2, 27)
--- libstdc++-v3/include/c_global/cmath.jj 2022-10-19 11:23:51.484488161 +0200
+++ libstdc++-v3/include/c_global/cmath 2022-10-20 17:03:56.760805581 +0200
@@ -2755,9 +2755,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   nearbyint(_Float16 __x)
   { return _Float16(__builtin_nearbyintf(__x)); }
 
-  inline _Float16
+  constexpr _Float16
   nextafter(_Float16 __x, _Float16 __y)
   {
+if (std::__is_constant_evaluated())
+  return __builtin_nextafterf16(__x, __y);
 #ifdef __INT16_TYPE__
 using __float16_int_type = __INT16_TYPE__;
 #else
@@ -3471,9 +3473,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   nearbyint(__gnu_cxx::__bfloat16_t __x)
   { return __gnu_cxx::__bfloat16_t(__builtin_nearbyintf(__x)); }
 
-  inline __gnu_cxx::__bfloat16_t
+  constexpr __gnu_cxx::__bfloat16_t
   nextafter(__gnu_cxx::__bfloat16_t __x, __gnu_cxx::__bfloat16_t __y)
   {
+if (std::__is_constant_evaluated())
+  return __builtin_nextafterf16b(__x, __y);
 #ifdef __INT16_TYPE__
 using __bfloat16_int_type = __INT16_TYPE__;
 #else
--- libstdc++-v3/testsuite/26_numerics/headers/cmath/nextafter_c++23.cc.jj 2022-10-20 16:57:29.940088318 +0200
+++ libstdc++-v3/testsuite/26_numerics/headers/cmath/nextafter_c++23.cc 2022-10-20 17:19:40.141923257 +0200
@@ -100,6 +100,8 @@ test ()
   VERIFY( std::fpclassify(t36) == FP_NAN );
   T t37 = std::nextafter(T(-0.0), T());
   VERIFY( t37 == T() && !std::signbit(t37) );
+  static_assert(std::nextafter(T(1.0), T(2.0)) > T(1.0));
+  static_assert(std::nextafter(std::nextafter(T(1.0), T(5.0)), T(0.0)) == T(1.0));
 }
 
 int

Jakub

[PATCH] c++, v2: Don't shortcut TREE_CONSTANT vector type CONSTRUCTORs in cxx_eval_constant_expression [PR107295]

On Thu, Oct 20, 2022 at 10:51:14AM -0400, Jason Merrill wrote:
> That seems like a bug; for VECTOR_TYPE we should fold even if !changed.
> 
> > Also, the reason for the short-cutting is I think trying to avoid
> > allocating a new CONSTRUCTOR when nothing changes and we just create
> > GC garbage by it.
> 
> We might limit the shortcut to non-vector types by hoisting the vector check
> in reduced_constant_expression_p out of the CONSTRUCTOR_NO_CLEARING
> condition:
> 
> >   if (CONSTRUCTOR_NO_CLEARING (t))
> > {
> >   if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
> > /* An initialized vector would have a VECTOR_CST.  */
> > return false;
> 
> then we could remove the fold in the shortcut.

Ok, so like this?
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-21  Jakub Jelinek  

PR c++/107295
* constexpr.cc (reduced_constant_expression_p) <case CONSTRUCTOR>:
Return false for VECTOR_TYPE CONSTRUCTORs even without
CONSTRUCTOR_NO_CLEARING set on them.
(cxx_eval_bare_aggregate): If constant but !changed, fold before
returning VECTOR_TYPE_P CONSTRUCTOR.
(cxx_eval_constant_expression) <case CONSTRUCTOR>: Don't fold
TREE_CONSTANT CONSTRUCTOR, just return it.

* g++.dg/ext/vector42.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-10-19 11:20:28.960225787 +0200
+++ gcc/cp/constexpr.cc 2022-10-20 18:43:42.952440364 +0200
@@ -3104,12 +3104,12 @@ reduced_constant_expression_p (tree t)
 case CONSTRUCTOR:
   /* And we need to handle PTRMEM_CST wrapped in a CONSTRUCTOR.  */
   tree field;
+  if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
+   /* An initialized vector would have a VECTOR_CST.  */
+   return false;
   if (CONSTRUCTOR_NO_CLEARING (t))
{
- if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
-   /* An initialized vector would have a VECTOR_CST.  */
-   return false;
- else if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
+ if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
{
  /* There must be a valid constant initializer at every array
 index.  */
@@ -4956,8 +4956,14 @@ cxx_eval_bare_aggregate (const constexpr
  TREE_SIDE_EFFECTS (ctx->ctor) = side_effects_p;
}
 }
-  if (*non_constant_p || !changed)
+  if (*non_constant_p)
     return t;
+  if (!changed)
+    {
+      if (VECTOR_TYPE_P (type))
+        t = fold (t);
+      return t;
+    }
   t = ctx->ctor;
   if (!t)
 t = build_constructor (type, NULL);
@@ -7387,11 +7393,10 @@ cxx_eval_constant_expression (const cons
 case CONSTRUCTOR:
   if (TREE_CONSTANT (t) && reduced_constant_expression_p (t))
{
- /* Don't re-process a constant CONSTRUCTOR, but do fold it to
-VECTOR_CST if applicable.  */
+ /* Don't re-process a constant CONSTRUCTOR.  */
  verify_constructor_flags (t);
  if (TREE_CONSTANT (t))
-   return fold (t);
+   return t;
}
   r = cxx_eval_bare_aggregate (ctx, t, lval,
   non_constant_p, overflow_p);
--- gcc/testsuite/g++.dg/ext/vector42.C.jj 2022-10-20 17:57:42.767848544 +0200
+++ gcc/testsuite/g++.dg/ext/vector42.C 2022-10-20 17:57:42.767848544 +0200
@@ -0,0 +1,12 @@
+// PR c++/107295
+// { dg-do compile { target c++11 } }
+
+template <typename T> struct A {
+  typedef T __attribute__((vector_size (sizeof (int)))) V;
+};
+template <int, typename T> using B = typename A<T>::V;
+template <typename T> using V = B<4, T>;
+using F = V<float>;
+constexpr F a = F () + 0.0f;
+constexpr F b = F () + (float) 0.0;
+constexpr F c = F () + (float) 0.0L;


Jakub

[PATCH zero-call-used-regs] Add leafy mode for zero-call-used-regs

2022-10-21 Thread Alexandre Oliva via Gcc-patches

Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
nonleaf functions, respectively.
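
As a rough sketch of the selection described above, the following C
fragment mirrors the flag layout from the flag-types.h hunk and resolves a
leafy setting per function.  The helper name resolve_leafy is hypothetical,
and the value of ONLY_USED is an assumption, since the hunk does not show
it; the patch itself only sets only_used inside gen_call_used_regs_seq.

```c
#include <stdbool.h>

/* Flag values as in the zero_regs_flags hunk; ONLY_USED is assumed.  */
enum
{
  ONLY_USED = 1u << 1,		/* Assumption: not visible in the hunk.  */
  ONLY_GPR = 1u << 2,
  ONLY_ARG = 1u << 3,
  ENABLED = 1u << 4,
  LEAFY_MODE = 1u << 5,
  USED_GPR = ENABLED | ONLY_USED | ONLY_GPR,
  ALL_GPR = ENABLED | ONLY_GPR,
  LEAFY_GPR = ENABLED | LEAFY_MODE | ONLY_GPR,
  LEAFY = ENABLED | LEAFY_MODE
};

/* Hypothetical helper: turn a leafy setting into the equivalent 'used'
   setting for a leaf function and the 'all' setting otherwise.  Settings
   without LEAFY_MODE are returned unchanged.  */
static unsigned int
resolve_leafy (unsigned int flags, bool is_leaf)
{
  if (flags & LEAFY_MODE)
    {
      flags &= ~(unsigned int) LEAFY_MODE;
      if (is_leaf)
	flags |= ONLY_USED;
    }
  return flags;
}
```

Under these assumptions, resolve_leafy (LEAFY_GPR, true) gives USED_GPR and
resolve_leafy (LEAFY_GPR, false) gives ALL_GPR.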

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* doc/extend.texi (zero-call-used-regs): Document leafy and
variants thereof.
* flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
LEAFY and variants.
* function.cc (gen_call_used_regs_seq): Set only_used for leaf
functions in leafy mode.
* opts.cc (zero_call_used_regs_opts): Add leafy and variants.

for  gcc/testsuite/ChangeLog

* c-c++-common/zero-scratch-regs-leafy-1.c: New.
* c-c++-common/zero-scratch-regs-leafy-2.c: New.
* gcc.target/i386/zero-scratch-regs-leafy-1.c: New.
* gcc.target/i386/zero-scratch-regs-leafy-2.c: New.
---
 gcc/doc/extend.texi                                |   22 ++--
 gcc/flag-types.h                                   |    5 +
 gcc/function.cc                                    |    3 +++
 gcc/opts.cc                                        |    4 
 .../c-c++-common/zero-scratch-regs-leafy-1.c       |   15 ++
 .../c-c++-common/zero-scratch-regs-leafy-2.c       |   21 +++
 .../gcc.target/i386/zero-scratch-regs-leafy-1.c    |   12 +++
 .../gcc.target/i386/zero-scratch-regs-leafy-2.c    |   16 +++
 8 files changed, 96 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-1.c
 create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-2.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 04af0584d82cc..bf11956c467fb 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4391,10 +4391,28 @@ zeros all call-used registers that pass arguments.
 @item all-gpr-arg
 zeros all call-used general purpose registers that pass
 arguments.
+
+@item leafy
+Same as @samp{used} in a leaf function, and same as @samp{all} in a
+nonleaf function.
+
+@item leafy-gpr
+Same as @samp{used-gpr} in a leaf function, and same as @samp{all-gpr}
+in a nonleaf function.
+
+@item leafy-arg
+Same as @samp{used-arg} in a leaf function, and same as @samp{all-arg}
+in a nonleaf function.
+
+@item leafy-gpr-arg
+Same as @samp{used-gpr-arg} in a leaf function, and same as
+@samp{all-gpr-arg} in a nonleaf function.
+
 @end table
 
-Of this list, @samp{used-arg}, @samp{used-gpr-arg}, @samp{all-arg},
-and @samp{all-gpr-arg} are mainly used for ROP mitigation.
+Of this list, @samp{used-arg}, @samp{used-gpr-arg}, @samp{leafy-arg},
+@samp{leafy-gpr-arg}, @samp{all-arg}, and @samp{all-gpr-arg} are mainly
+used for ROP mitigation.
 
 The default for the attribute is controlled by @option{-fzero-call-used-regs}.
 @end table
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index d2e751060ffce..b90c85167dcd4 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -338,6 +338,7 @@ namespace zero_regs_flags {
   const unsigned int ONLY_GPR = 1UL << 2;
   const unsigned int ONLY_ARG = 1UL << 3;
   const unsigned int ENABLED = 1UL << 4;
+  const unsigned int LEAFY_MODE = 1UL << 5;
   const unsigned int USED_GPR_ARG = ENABLED | ONLY_USED | ONLY_GPR | ONLY_ARG;
   const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
   const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
@@ -346,6 +347,10 @@ namespace zero_regs_flags {
   const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
   const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
   const unsigned int ALL = ENABLED;
+  const unsigned int LEAFY_GPR_ARG = ENABLED | LEAFY_MODE | ONLY_GPR | ONLY_ARG;
+  const unsigned int LEAFY_GPR = ENABLED | LEAFY_MODE | ONLY_GPR;
+  const unsigned int LEAFY_ARG = ENABLED | LEAFY_MODE | ONLY_ARG;
+  const unsigned int LEAFY = ENABLED | LEAFY_MODE;
 }
 
 /* Settings of flag_incremental_link.  */
diff --git a/gcc/function.cc b/gcc/function.cc
index 6474a663b30b8..16582e698041a 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -5879,6 +5879,9 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int zero_regs_type)
   only_used = zero_regs_type & ONLY_USED;
   only_arg = zero_regs_type & ONLY_ARG;
 
+  if ((zero_regs_type & LEAFY_MODE) && leaf_function_p ())
+    only_used = true;
+
   /* For each of the hard registers, we should zero it if:
1. it is a call-used register;
and 2. it is not a fixed register;
diff --git a/gcc/opts.cc b/gcc/opts.cc
index ae079fcd20eea..39f6a1b278dc6 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -2099,6 +2099,10 @@ const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
   ZERO_CALL_USED_REGS_OPT (all-gpr, zero_regs_flags::ALL_GPR),
   ZERO_CALL_USED_REGS_OPT (all-arg, zero_regs_flags::ALL_ARG),
   ZERO_CALL_USED_REGS_OPT (all, zero_regs_flags::ALL),
+  ZERO_CALL_USED_REGS_OPT (leafy-gpr-arg, zero_regs_flags::LEAFY_GPR_

Re: [PATCH] RISC-V: Add type attribute for atomic instructions.

2022-10-21 Thread Kito Cheng via Gcc-patches

Committed, thanks :)

On Fri, Oct 21, 2022 at 1:02 PM Monk Chiang  wrote:
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md: Add atomic type attribute.
> * config/riscv/sync.md: Add atomic type for atomic instructions.
> ---
>  gcc/config/riscv/riscv.md |  2 +-
>  gcc/config/riscv/sync.md  | 15 ++-
>  2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index b3654915fde..9384ced0447 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -317,7 +317,7 @@
>"unknown,branch,jump,call,load,fpload,store,fpstore,
> mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
> fmadd,fdiv,fcmp,fcvt,fsqrt,multi,auipc,sfb_alu,nop,ghost,bitmanip,rotate,
> -   rdvlenb,rdvl,vsetvl,vlde,vste,vldm,vstm,vlds,vsts,
> +   atomic,rdvlenb,rdvl,vsetvl,vlde,vste,vldm,vstm,vlds,vsts,
> vldux,vldox,vstux,vstox,vldff,vldr,vstr,
> vialu,viwalu,vext,vicalu,vshift,vnshift,vicmp,
> vimul,vidiv,viwmul,vimuladd,viwmuladd,vimerge,vimov,
> diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
> index 7deb290d9dc..449f275e6a2 100644
> --- a/gcc/config/riscv/sync.md
> +++ b/gcc/config/riscv/sync.md
> @@ -62,7 +62,8 @@
>UNSPEC_ATOMIC_STORE))]
>"TARGET_ATOMIC"
>"%F2amoswap.%A2 zero,%z1,%0"
> -  [(set (attr "length") (const_int 8))])
> +  [(set_attr "type" "atomic")
> +   (set (attr "length") (const_int 8))])
>
>  (define_insn "atomic_"
>[(set (match_operand:GPR 0 "memory_operand" "+A")
> @@ -73,7 +74,8 @@
>  UNSPEC_SYNC_OLD_OP))]
>"TARGET_ATOMIC"
>"%F2amo.%A2 zero,%z1,%0"
> -  [(set (attr "length") (const_int 8))])
> +  [(set_attr "type" "atomic")
> +   (set (attr "length") (const_int 8))])
>
>  (define_insn "atomic_fetch_"
>[(set (match_operand:GPR 0 "register_operand" "=&r")
> @@ -86,7 +88,8 @@
>  UNSPEC_SYNC_OLD_OP))]
>"TARGET_ATOMIC"
>"%F3amo.%A3 %0,%z2,%1"
> -  [(set (attr "length") (const_int 8))])
> +  [(set_attr "type" "atomic")
> +   (set (attr "length") (const_int 8))])
>
>  (define_insn "atomic_exchange"
>[(set (match_operand:GPR 0 "register_operand" "=&r")
> @@ -98,7 +101,8 @@
> (match_operand:GPR 2 "register_operand" "0"))]
>"TARGET_ATOMIC"
>"%F3amoswap.%A3 %0,%z2,%1"
> -  [(set (attr "length") (const_int 8))])
> +  [(set_attr "type" "atomic")
> +   (set (attr "length") (const_int 8))])
>
>  (define_insn "atomic_cas_value_strong"
>[(set (match_operand:GPR 0 "register_operand" "=&r")
> @@ -112,7 +116,8 @@
> (clobber (match_scratch:GPR 6 "=&r"))]
>"TARGET_ATOMIC"
>"%F5 1: lr.%A5 %0,%1; bne %0,%z2,1f; sc.%A4 %6,%z3,%1; bnez 
> %6,1b; 1:"
> -  [(set (attr "length") (const_int 20))])
> +  [(set_attr "type" "atomic")
> +   (set (attr "length") (const_int 20))])
>
>  (define_expand "atomic_compare_and_swap"
>[(match_operand:SI 0 "register_operand" "")   ;; bool output
> --
> 2.37.2
>

Re: [PATCH] i386: Fix up BFmode comparisons in conditional moves [PR107322]

2022-10-21 Thread Uros Bizjak via Gcc-patches

On Fri, Oct 21, 2022 at 9:15 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the testcase shows, when cbranchbf4/cstorebf4 patterns are defined,
> we can get ICEs for conditional moves.
> The problem is that the generic conditional move expansion just calls
> prepare_cmp_insn which just checks that such a cbranch4 exists
> and returns directly such comparison and passes it down to the conditional
> move optabs.
> The following patch fixes it by punting if the comparisons aren't
> ix86_fp_comparison_operator (to tell the generic code it should separately
> compare) and to handle the promotion of BFmode comparison operands to
> SFmode such that comparison is performed in SFmode.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-10-21  Jakub Jelinek  
>
> PR target/107322
> * config/i386/i386-expand.cc (ix86_prepare_fp_compare_args): For
> BFmode comparisons promote arguments to SFmode and recurse.
> (ix86_expand_int_movcc, ix86_expand_fp_movcc): Return false early
> if comparison operands are BFmode and operands[1] is not
> ix86_fp_comparison_operator.
>
> * gcc.target/i386/pr107322.c: New test.

OK, but now we have two more copies of a function that effectively
extends BF to SF. Can you please split this utility function out and
use it here and in cbranchbf4/cstorebf4? I'm talking about this part:

+  op = gen_lowpart (HImode, op1);
+  if (CONST_INT_P (op))
+   op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+op1, BFmode);
+  else
+   {
+ rtx t1 = gen_reg_rtx (SImode);
+ emit_insn (gen_zero_extendhisi2 (t1, op));
+ emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+ op = gen_lowpart (SFmode, t1);
+   }

Taking this a bit further, it looks like a generic function to extend
BF to SF, when extendbfsf2 named function is not defined.
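
The zero-extend and shift-by-16 sequence quoted above corresponds to the
following scalar C, given here only as an illustration.  The function names
are hypothetical and the sketch assumes float is IEEE binary32; it is not a
proposal for the actual helper in the backend.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: extend a raw bfloat16 bit pattern to float.
   bfloat16 is the upper half of a binary32 value, so zero-extending to
   32 bits and shifting left by 16 gives the float with the same value.  */
static float
bf16_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical helper: compare two bfloat16 values by promoting both
   operands and doing the comparison in float.  */
static int
bf16_less_equal (uint16_t a, uint16_t b)
{
  return bf16_to_float (a) <= bf16_to_float (b);
}
```

Because the extension is exact, comparing the promoted values gives the same
answer as a native bfloat16 comparison would, including the unordered
result for NaN operands.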

The above could be a follow-up patch, the proposed patch is OK.

On a related note, I still think that without corresponding BFmode
expanders, generic middle-end code should extend BFmode to SFmode and
perform all comparisons in SFmode, in effect what cbranchbf4/cstorebf4
x86 expanders are doing now by themselves. This would allow
cbranchbf4/cstorebf4 to fail (or to not be present), and still result
in optimal code without intermediate extends and truncations.

Thanks,
Uros.

> --- gcc/config/i386/i386-expand.cc.jj   2022-10-19 11:20:54.602879162 +0200
> +++ gcc/config/i386/i386-expand.cc  2022-10-20 12:15:37.750758679 +0200
> @@ -2626,6 +2626,35 @@ ix86_prepare_fp_compare_args (enum rtx_c
>machine_mode op_mode = GET_MODE (op0);
>bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
>
> +  if (op_mode == BFmode)
> +{
> +  rtx op = gen_lowpart (HImode, op0);
> +  if (CONST_INT_P (op))
> +   op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +op0, BFmode);
> +  else
> +   {
> + rtx t1 = gen_reg_rtx (SImode);
> + emit_insn (gen_zero_extendhisi2 (t1, op));
> + emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> + op = gen_lowpart (SFmode, t1);
> +   }
> +  *pop0 = op;
> +  op = gen_lowpart (HImode, op1);
> +  if (CONST_INT_P (op))
> +   op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +op1, BFmode);
> +  else
> +   {
> + rtx t1 = gen_reg_rtx (SImode);
> + emit_insn (gen_zero_extendhisi2 (t1, op));
> + emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> + op = gen_lowpart (SFmode, t1);
> +   }
> +  *pop1 = op;
> +  return ix86_prepare_fp_compare_args (code, pop0, pop1);
> +}
> +
>/* All of the unordered compare instructions only work on registers.
>   The same is true of the fcomi compare instructions.  The XFmode
>   compare instructions require registers except when comparing
> @@ -3164,6 +3193,10 @@ ix86_expand_int_movcc (rtx operands[])
>   && !TARGET_64BIT))
>  return false;
>
> +  if (GET_MODE (op0) == BFmode
> +  && !ix86_fp_comparison_operator (operands[1], VOIDmode))
> +return false;
> +
>start_sequence ();
>compare_op = ix86_expand_compare (code, op0, op1);
>compare_seq = get_insns ();
> @@ -4238,6 +4271,10 @@ ix86_expand_fp_movcc (rtx operands[])
>rtx op0 = XEXP (operands[1], 0);
>rtx op1 = XEXP (operands[1], 1);
>
> +  if (GET_MODE (op0) == BFmode
> +  && !ix86_fp_comparison_operator (operands[1], VOIDmode))
> +return false;
> +
>if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>  {
>machine_mode cmode;
> --- gcc/testsuite/gcc.target/i386/pr107322.c.jj 2022-10-20 12:28:46.829983399 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr107322.c2022-10-20 12:29:44.287201650 
> +0200
> @@ -0,0 +1,33 @@
> +/* PR target/107322 */
> +/* { dg-do compile } */
> +/* { dg-options "-fexc

Re: Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195] (was: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.))

2022-10-21 Thread Thomas Schwinge

Hi!

On 2022-10-21T00:44:30+0200, Aldy Hernandez  wrote:
> On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge  
> wrote:
>> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached?
>
> I see 7 different tests in this patch.  Did the 6 that pass, fail
> before my patch for PR107195 and are now working?   Cause unless
> that's the case, they shouldn't be in a test named pr107195-3.c, but
> somewhere else.

That's correct; I should've mentioned that I had verified this.  With the
code changes of commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0" reverted, we get:

PASS: gcc.dg/tree-ssa/pr107195-3.c (test for excess errors)
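FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 

..., and in 'pr107195-3.c.196t.dom3' instead see two calls of each
'foo[...]' function.

That's with this...

> I see there's one XFAILed test in your patch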

... XFAILed test case removed, see the attached
"Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]";
OK now to push that version?


> and this certainly
> doesn't look like something that has anything to do with the patch I
> submitted.  Perhaps you could open a PR with an enhancement request
> for this one?
>
> That being said...
>
> /* { dg-additional-options -O1 } */
> extern int
> __attribute__((const))
> foo4b (int);
>
> int f4b (unsigned int r)
> {
>   if (foo4b (r))
> r *= 8U;
>
>   if ((r / 2U) & 2U)
> r += foo4b (r);
>
>   return r;
> }
> /* { dg-final { scan-tree-dump-times {gimple_call  xfail *-*-* } } } */
>
> At -O2, this is something PRE is doing,  so GCC already handles this.
> However, you are suggesting this isn't handled at -O1 and should be??

My thinking was that this optimization does work for 'r >> 1', but it
doesn't work for 'r / 2'.

> None of the VRPs run at -O1 so ranger-vrp won't even get a chance.
> However, DOM runs at -O1 and it uses ranger to do simple copy
> propagation and some jump threading...so technically we could do
> something...
>
> DOM should be able to thread from the r *= 8U to the return because
> the nonzero mask (known zeros) after the multiplication is 0xfff8,
> which it could use to solve the second conditional as false.  This
> would leave us with:
>
> if (foo4b (r))
>   {
> r *= 8U;
>return r;
>   }
> else
>   {
>  if ((r / 2U) & 2U)
>r += foo4b (r);
>   }
>
> ...which exposes the fact that the second call to foo4b() has the same
> "r" as the first one, so it could be folded.  I don't know whose job
> it is to notice that two const calls have the same arguments, but ISTM
> that if we thread the above correctly, someone should be able to clean
> this up.  No clue whether this happens at -O1.
>
> However... we're not threading this.  It looks like we're not keeping
> track of nonzero bits (known zeros) through the division.  The
> multiplication gives us 0xfff8 and we should be able to divide
> that by 2 and get 0x7ffc which solves the second conditional to 0.
>
> So...maybe DOM+ranger could set things up for another pass to clean this up?
>
> Either way, you could open an enhancement request, if anything to keep
> the nonzero mask up to date through the division.

I've thus filed 
"Optimization opportunity where integer '/' corresponds to '>>'" for
continuing that investigation.
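
The mask reasoning from the quoted discussion can be sketched in C as
follows.  The function names are hypothetical and this is not ranger code;
a 16-bit width is used only so that the numbers match the 0xfff8 and 0x7ffc
values quoted above.  A set bit in the mask means "this bit may be nonzero".

```c
#include <stdint.h>

/* Hypothetical helper: possibly-nonzero bits after multiplying an
   unsigned value by 2**shift (same as a left shift).  */
static uint16_t
mask_after_mul_pow2 (uint16_t mask, unsigned int shift)
{
  return (uint16_t) (mask << shift);
}

/* Hypothetical helper: possibly-nonzero bits after an unsigned division
   by 2**shift, which is the same as a logical right shift.  */
static uint16_t
mask_after_udiv_pow2 (uint16_t mask, unsigned int shift)
{
  return (uint16_t) (mask >> shift);
}

/* Nonzero if 'value & bits' is known to be zero for every value whose
   possibly-nonzero bits are described by MASK.  */
static int
and_is_known_zero (uint16_t mask, uint16_t bits)
{
  return (mask & bits) == 0;
}
```

Starting from a fully unknown value (mask 0xffff), the multiplication by 8
gives 0xfff8, the division by 2 gives 0x7ffc, and the test against 2 is then
known to be false, which is the fact the second conditional needs.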


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From e55e8569201c482507550eb56ff16aa3bbb48676 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 17 Oct 2022 09:10:03 +0200
Subject: [PATCH] Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]

... to display optimization performed as of recent
commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0".

	PR tree-optimization/107195
	gcc/testsuite/
	* gcc.dg/tree-ssa/pr107195-3.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c | 112 +
 1 file changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c
new file mode 100644
index 000..eba4218b3c9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c
@@ -0,0 +1,112 @@
+/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'.  */
+
+/* { dg-additional-options -O1 } */
+/* { dg-additional-options -fdump-tree-dom3-raw } */
+
+
+extern int
+__attribute__((const))
+foo1 (int);
+
+int f1 (int r)
+{
+  if (foo1 (r)) /* If this first 'if' holds...  */
+r *= 2; /* ..., 'r' now has a zero-value lower-most bit...  */
+
+  if (r & 1) /* ..., so this second 'if' can never hold...  */
+{ /* ..., so this is unreachable.  */
+  /* In contrast, if the first 'if' does not hold ('foo1 (r) == 0'), the
+	 second 'if' may hold, but we know ('foo1' being 'const') that
+	 'foo1

Re: Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195] (was: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.))

2022-10-21 Thread Aldy Hernandez via Gcc-patches

On Fri, Oct 21, 2022 at 10:38 AM Thomas Schwinge
 wrote:
>
> Hi!
>
> On 2022-10-21T00:44:30+0200, Aldy Hernandez  wrote:
> > On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge  
> > wrote:
> >> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached?
> >
> > I see 7 different tests in this patch.  Did the 6 that pass, fail
> > before my patch for PR107195 and are now working?   Cause unless
> > that's the case, they shouldn't be in a test named pr107195-3.c, but
> > somewhere else.
>
> That's correct; I should've mentioned that I had verified this.  With the
> code changes of commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" reverted, we get:
>
> PASS: gcc.dg/tree-ssa/pr107195-3.c (test for excess errors)
> FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
>  FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
>  FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
>  FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
>  FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
>  FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call 
> 
> ..., and in 'pr107195-3.c.196t.dom3' instead see two calls of each
> 'foo[...]' function.
>
> That's with this...
>
> > I see there's one XFAILed test in your patch
>
> ... XFAILed test case removed, see the attached
> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]";
> OK now to push that version?

OK, thanks.

Re: [PATCH] expand: Convert cst - x into cst xor x.

2022-10-21 Thread Robin Dapp via Gcc-patches

> Do we have evidence that targets properly cost XOR vs SUB RTXen?
> 
> It might actually be a reload optimization - when the constant is
> available in a register use 'sub', when it needs to be reloaded
> use 'xor'?
> 
> That said, I wonder if the fallout of changing some SUB to XOR
> is bigger than the benefit when we do it early (missed combines, etc.)?

Regarding fallout I did a bootstrap and regtest for various backends
now.  No change on Power9, s390x and aarch64.  On x86 there is one
additional FAIL in pr78103-3.c:

unsigned long long
bar (unsigned int x)
{
  return __CHAR_BIT__ * sizeof (unsigned int) - 1 - __builtin_clz (x);
}

is supposed to become

	bsrl	%edi, %eax
	ret

but now is

	bsrl	%edi, %eax
	xorl	$31, %eax
	xorq	$31, %rax
	ret

The x86 backend has various splitters catching and simplifying something
like

 (xor (minus (const_int 63) (clz (match_operand))) (const_int 63))

to

 (bsr ...).

From a quick glance, there are several combinations of 31, 63, xor, clz
which would need to be duplicated(?) to match against the changed
patterns.  Perhaps xor is always cheaper on x86 and a simple change from
(minus (const_int 63) (...)) to (xor (const_int 63) (...)) would be
sufficient but this would still need to be reviewed separately.

Needing to keep both patterns (as neither minus nor xor can be
considered "more canonical" than the other) seems like an annoyance.

Regards
 Robin
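
For readers following along, the identity behind the transformation can be checked exhaustively for small constants. This is a minimal sketch, assuming the constant has the form 2^n - 1 and 0 <= x <= cst; the helper name is made up for illustration and is not part of the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if cst - x == cst ^ x for every x in
// [0, cst].  This is only expected to hold when cst is of the form 2^n - 1
// (all low bits set), because then the subtraction never borrows.
// Intended for small constants; the loop visits every value up to cst.
bool sub_equals_xor_for_all(std::uint32_t cst)
{
  for (std::uint64_t x = 0; x <= cst; ++x)
    {
      const std::uint32_t x32 = static_cast<std::uint32_t>(x);
      if (cst - x32 != (cst ^ x32))
        return false;
    }
  return true;
}
```

The constants 31 and 63 from the clz patterns satisfy the identity, while a constant such as 30 does not (30 - 1 is 29, but 30 ^ 1 is 31).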

[PATCH] tree-optimization/107323 - loop distribution partition ordering issue

2022-10-21 Thread Richard Biener via Gcc-patches

The following reverts part of the PR94125 fix which causes us to
use a bogus partition ordering after applying versioning for
alias to the testcase in PR107323.  Instead PR94125 is fixed by
appropriately considering to be merged SCCs when skipping edges
we want to ignore because of the alias versioning.

Bootstrapped and tested on x86_64-unknown-linux-gnu,
on the 10 branch where reverting the part of PR94125 reproduces
the original issue and that's fixed by the adjustment,
on the 12 branch where the PR107323 bug can be reproduced, and
on trunk.

Pushed to trunk and gcc-12 so far.

PR tree-optimization/107323
* tree-loop-distribution.cc (pg_unmark_merged_alias_ddrs):
New function.
(loop_distribution::break_alias_scc_partitions): Revert
postorder save/restore from the PR94125 fix.  Instead
make sure to not ignore edges from SCCs we are going to
merge.

* gcc.dg/tree-ssa/pr107323.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr107323.c | 28 +
 gcc/tree-loop-distribution.cc| 50 +---
 2 files changed, 64 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107323.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107323.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr107323.c
new file mode 100644
index 000..1204b6e36d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107323.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-vectorize" } */
+
+int A[4];
+int B[4];
+
+static const char *__attribute__((noipa)) foo()
+{
+  return "1";
+}
+
+int main()
+{
+  const char *s = foo();
+
+  A[0] = 1000;
+  for(int i = 1; i < 4; ++i) {
+  B[i] = 0;
+  A[i] = 0;
+  if(s[0])
+   B[i] = 1;
+  A[i] = A[i - 1];
+  }
+
+  if (A[3] != 1000)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index e1948fb452a..ed3dd73e1a9 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -2201,8 +2201,6 @@ struct pg_edge_callback_data
   bitmap sccs_to_merge;
   /* Array constains component information for all vertices.  */
   int *vertices_component;
-  /* Array constains postorder information for all vertices.  */
-  int *vertices_post;
   /* Vector to record all data dependence relations which are needed
  to break strong connected components by runtime alias checks.  */
   vec *alias_ddrs;
@@ -2452,6 +2450,33 @@ pg_collect_alias_ddrs (struct graph *g, struct 
graph_edge *e, void *data)
 cbdata->alias_ddrs->safe_splice (edata->alias_ddrs);
 }
 
+/* Callback function for traversing edge E.  DATA is private
+   callback data.  */
+
+static void
+pg_unmark_merged_alias_ddrs (struct graph *, struct graph_edge *e, void *data)
+{
+  int i, j, component;
+  struct pg_edge_callback_data *cbdata;
+  struct pg_edata *edata = (struct pg_edata *) e->data;
+
+  if (edata == NULL || edata->alias_ddrs.length () == 0)
+return;
+
+  cbdata = (struct pg_edge_callback_data *) data;
+  i = e->src;
+  j = e->dest;
+  component = cbdata->vertices_component[i];
+  /* Make sure to not skip vertices inside SCCs we are going to merge.  */
+  if (component == cbdata->vertices_component[j]
+  && bitmap_bit_p (cbdata->sccs_to_merge, component))
+{
+  edata->alias_ddrs.release ();
+  delete edata;
+  e->data = NULL;
+}
+}
+
 /* This is the main function breaking strong conected components in
PARTITIONS giving reduced depdendence graph RDG.  Store data dependence
relations for runtime alias check in ALIAS_DDRS.  */
@@ -2511,7 +2536,6 @@ loop_distribution::break_alias_scc_partitions (struct 
graph *rdg,
   cbdata.sccs_to_merge = sccs_to_merge;
   cbdata.alias_ddrs = alias_ddrs;
   cbdata.vertices_component = XNEWVEC (int, pg->n_vertices);
-  cbdata.vertices_post = XNEWVEC (int, pg->n_vertices);
   /* Record the component information which will be corrupted by next
 graph scc finding call.  */
   for (i = 0; i < pg->n_vertices; ++i)
@@ -2520,17 +2544,18 @@ loop_distribution::break_alias_scc_partitions (struct 
graph *rdg,
   /* Collect data dependences for runtime alias checks to break SCCs.  */
   if (bitmap_count_bits (sccs_to_merge) != (unsigned) num_sccs)
{
- /* Record the postorder information which will be corrupted by next
-graph SCC finding call.  */
- for (i = 0; i < pg->n_vertices; ++i)
-   cbdata.vertices_post[i] = pg->vertices[i].post;
+ /* For SCCs we want to merge clear all alias_ddrs for edges
+inside the component.  */
+ for_each_edge (pg, pg_unmark_merged_alias_ddrs, &cbdata);
 
  /* Run SCC finding algorithm again, with alias dependence edges
 skipped.  This is to topologically sort partitions according to
 compilation time known dependence.  Note the topological order
 is stored in the fo

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-21 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Hi Richard,
>
>> Can you do the aarch64_mov_imm changes as a separate patch?  It's difficult
>> to review the two changes folded together like this.
>
> Sure, I'll send a separate patch. So here is version 2 again:

I still think we should move the functions to avoid the forward
declarations.  That part was fine (and OK to review).  It was folding
in the extra changes to the way that we generate move immediates that
made it difficult.

Could you send a patch that makes only the changes in v2, but moves
the functions around?  In fact, the positioning of the functions
in the v3 patch looked good, so the patch is OK with the contents
of v2 but the positioning of v3.

Thanks,
Richard

> [PATCH v2][AArch64] Improve immediate expansion [PR106583]
>
> Improve immediate expansion of immediates which can be created from a
> bitmask immediate and 2 MOVKs.  Simplify, refactor and improve
> efficiency of bitmask checks.  This reduces the number of 4-instruction
> immediates in SPECINT/FP by 10-15%.
>
> Passes regress, OK for commit?
>
> gcc/ChangeLog:
>
> PR target/106583
> * config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
> Add support for a bitmask immediate with 2 MOVKs.
> (aarch64_check_bitmask): New function after refactorization.
> (aarch64_replicate_bitmask_imm): Remove function, merge into...
> (aarch64_bitmask_imm): Simplify replication of small modes.
> Split function into 64-bit only version for efficiency.
>
> gcc/testsuite:
> PR target/106583
> * gcc.target/aarch64/pr106583.c: Add new test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 926e81f028c82aac9a5fecc18f921f84399c24ae..b2d9c7380975028131d0fe731a97b3909874b87b
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -306,6 +306,7 @@ static machine_mode aarch64_simd_container_mode 
> (scalar_mode, poly_int64);
>  static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
>  aarch64_addr_query_type);
>  static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
> +static bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT);
>
>  /* The processor for which instructions should be scheduled.  */
>  enum aarch64_processor aarch64_tune = cortexa53;
> @@ -5502,6 +5503,30 @@ aarch64_output_sve_vector_inc_dec (const char 
> *operands, rtx x)
>   factor, nelts_per_vq);
>  }
>
> +/* Return true if the immediate VAL can be a bitfield immediate
> +   by changing the given MASK bits in VAL to zeroes, ones or bits
> +   from the other half of VAL.  Return the new immediate in VAL2.  */
> +static inline bool
> +aarch64_check_bitmask (unsigned HOST_WIDE_INT val,
> +  unsigned HOST_WIDE_INT &val2,
> +  unsigned HOST_WIDE_INT mask)
> +{
> +  val2 = val & ~mask;
> +  if (val2 != val && aarch64_bitmask_imm (val2))
> +return true;
> +  val2 = val | mask;
> +  if (val2 != val && aarch64_bitmask_imm (val2))
> +return true;
> +  val = val & ~mask;
> +  val2 = val | (((val >> 32) | (val << 32)) & mask);
> +  if (val2 != val && aarch64_bitmask_imm (val2))
> +return true;
> +  val2 = val | (((val >> 16) | (val << 48)) & mask);
> +  if (val2 != val && aarch64_bitmask_imm (val2))
> +return true;
> +  return false;
> +}
> +
>  static int
>  aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
>  scalar_int_mode mode)
> @@ -5568,36 +5593,43 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, 
> bool generate,
>one_match = ((~val & mask) == 0) + ((~val & (mask << 16)) == 0) +
>  ((~val & (mask << 32)) == 0) + ((~val & (mask << 48)) == 0);
>
> -  if (zero_match != 2 && one_match != 2)
> +  if (zero_match < 2 && one_match < 2)
>  {
>/* Try emitting a bitmask immediate with a movk replacing 16 bits.
>   For a 64-bit bitmask try whether changing 16 bits to all ones or
>   zeroes creates a valid bitmask.  To check any repeated bitmask,
>   try using 16 bits from the other 32-bit half of val.  */
>
> -  for (i = 0; i < 64; i += 16, mask <<= 16)
> -   {
> - val2 = val & ~mask;
> - if (val2 != val && aarch64_bitmask_imm (val2, mode))
> -   break;
> - val2 = val | mask;
> - if (val2 != val && aarch64_bitmask_imm (val2, mode))
> -   break;
> - val2 = val2 & ~mask;
> - val2 = val2 | (((val2 >> 32) | (val2 << 32)) & mask);
> - if (val2 != val && aarch64_bitmask_imm (val2, mode))
> -   break;
> -   }
> -  if (i != 64)
> -   {
> - if (generate)
> +  for (i = 0; i < 64; i += 16)
> +   if (aarch64_check_bitmask (val, val2, mask << i))
> + {
> +   if (generate)
> + {
> +   emit_insn (gen_rtx_SET (dest,
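
As an aside, the check performed by aarch64_check_bitmask in the quoted patch can be sketched outside of GCC. The code below is an illustration under stated assumptions: toy_bitmask_imm is a simplified stand-in written for this note (it treats a value as a bitmask immediate if it is a replicated element of 2 to 64 bits containing one circular run of ones), not GCC's aarch64_bitmask_imm, and toy_check_bitmask mirrors the control flow of the quoted function:

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Portable population count (no compiler builtins).
static int popcount64(u64 v)
{
  int n = 0;
  for (; v; v &= v - 1)
    ++n;
  return n;
}

// Toy stand-in (an assumption, not GCC code): true if V is an element of
// 2, 4, ..., 64 bits replicated across 64 bits, where the element holds a
// single circular run of ones.  All-zeros and all-ones are rejected.
bool toy_bitmask_imm(u64 v)
{
  if (v == 0 || v == ~u64{0})
    return false;
  for (unsigned size = 2; size <= 64; size *= 2)
    {
      const u64 emask = size == 64 ? ~u64{0} : (u64{1} << size) - 1;
      const u64 elem = v & emask;
      bool replicated = true;
      for (unsigned i = size; i < 64; i += size)
        if (((v >> i) & emask) != elem)
          {
            replicated = false;
            break;
          }
      if (!replicated)
        continue;
      // Rotate the element right by one bit; a single circular run of
      // ones has exactly two 0/1 transitions.
      const u64 rot = ((elem >> 1) | ((elem & 1) << (size - 1))) & emask;
      if (popcount64(elem ^ rot) == 2)
        return true;
    }
  return false;
}

// Mirrors the quoted aarch64_check_bitmask: try replacing the MASK bits of
// VAL with zeroes, ones, or bits taken from elsewhere in VAL.
bool toy_check_bitmask(u64 val, u64 &val2, u64 mask)
{
  val2 = val & ~mask;
  if (val2 != val && toy_bitmask_imm(val2))
    return true;
  val2 = val | mask;
  if (val2 != val && toy_bitmask_imm(val2))
    return true;
  val = val & ~mask;
  val2 = val | (((val >> 32) | (val << 32)) & mask);
  if (val2 != val && toy_bitmask_imm(val2))
    return true;
  val2 = val | (((val >> 16) | (val << 48)) & mask);
  if (val2 != val && toy_bitmask_imm(val2))
    return true;
  return false;
}
```

For example, 0x00ff00ff00ff1234 is not itself a bitmask immediate, but replacing its low 16 bits with the corresponding bits of the other 32-bit half gives 0x00ff00ff00ff00ff, which is; the original value is then one MOVK away.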

Re: [PATCH v4] btf: Add support to BTF_KIND_ENUM64 type

2022-10-21 Thread Indu Bhagat via Gcc-patches


On 10/19/22 19:05, Guillermo E. Martinez wrote:

Hello,

The following is patch v4 to update BTF/CTF backend supporting
BTF_KIND_ENUM64 type. Changes from v3:

   + Remove `ctf_enum_binfo' structure.
   + Remove -m{little,big}-endian from dg-options in testcase.

Comments are welcome and appreciated!

Kind regards,
guillermo
--



Thanks Guillermo.

LGTM.


BTF supports 64-bit enumerators with the following encoding:

   struct btf_type:
 name_off: 0 or offset to a valid C identifier
 info.kind_flag: 0 for unsigned, 1 for signed
 info.kind: BTF_KIND_ENUM64
 info.vlen: number of enum values
 size: 1/2/4/8

The btf_type is followed by info.vlen number of:

 struct btf_enum64
 {
   uint32_t name_off;   /* Offset in string section of enumerator name.  */
   uint32_t val_lo32;   /* lower 32-bit value for a 64-bit value Enumerator */
   uint32_t val_hi32;   /* high 32-bit value for a 64-bit value Enumerator */
 };

So a new btf_enum64 structure was added to represent BTF_KIND_ENUM64,
along with a new field dtd_enum_unsigned in the ctf_dtdef structure to
record whether a CTF enum is signed or unsigned; that information is
later used to encode the BTF enum type.
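
To illustrate the encoding described above, here is a minimal sketch of how a 64-bit enumerator value splits into the two 32-bit halves. The struct layout is the one quoted in the commit message; the encode_enum64 and decode_enum64 helpers are hypothetical names used only for this example and do not appear in the patch:

```cpp
#include <cassert>
#include <cstdint>

// Layout as quoted in the commit message.
struct btf_enum64
{
  std::uint32_t name_off;  // Offset in string section of enumerator name.
  std::uint32_t val_lo32;  // Lower 32 bits of the 64-bit value.
  std::uint32_t val_hi32;  // Upper 32 bits of the 64-bit value.
};

// Hypothetical helper: split a 64-bit value into the two halves.
btf_enum64 encode_enum64(std::uint32_t name_off, std::int64_t value)
{
  const std::uint64_t bits = static_cast<std::uint64_t>(value);
  return { name_off,
           static_cast<std::uint32_t>(bits & 0xffffffffu),
           static_cast<std::uint32_t>(bits >> 32) };
}

// Hypothetical helper: reassemble the 64-bit value.  Whether the result
// should be read as signed or unsigned is what info.kind_flag records.
std::int64_t decode_enum64(const btf_enum64 &e)
{
  const std::uint64_t bits
    = (static_cast<std::uint64_t>(e.val_hi32) << 32) | e.val_lo32;
  return static_cast<std::int64_t>(bits);
}
```

A value of -1 therefore has both halves set to 0xffffffff, and only the kind_flag tells a consumer to read it as -1 rather than as the largest unsigned 64-bit value.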

gcc/ChangeLog:

* btfout.cc (btf_calc_num_vbytes): Compute enumeration size depending on
enumerator type btf_enum{,64}.
(btf_asm_type): Update btf_kflag according to enumeration type sign
using dtd_enum_unsigned field for both:  BTF_KIND_ENUM{,64}.
(btf_asm_enum_const): New argument to represent the size of
the BTF enum type, writing the enumerator constant value for
32 bits, if it's 64 bits then explicitly writes lower 32-bits
value and higher 32-bits value.
(output_asm_btf_enum_list): Add enumeration size argument.
* ctfc.cc (ctf_add_enum): New argument to represent CTF enum
basic information.
(ctf_add_generic): Use of ei_{name, size, unsigned} to build the
dtd structure containing enumeration information.
(ctf_add_enumerator): Update comment mention support for BTF
enumeration in 64-bits.
* dwarf2ctf.cc (gen_ctf_enumeration_type): Extract signedness
for enumeration type and use it in ctf_add_enum.
* ctfc.h (ctf_dmdef): Update dmd_value to HOST_WIDE_INT to allow
use of 32/64 bits enumerators.
(ctf_dtdef): New field to describe enum signedness.

include/
* btf.h (btf_enum64): Add new definition and new symbolic
constant to BTF_KIND_ENUM64 and BTF_KF_ENUM_{UN,}SIGNED.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-enum-1.c: Update testcase, with correct
info.kflags encoding.
* gcc.dg/debug/btf/btf-enum64-1.c: New testcase.
---
  gcc/btfout.cc | 30 ++---
  gcc/ctfc.cc   | 13 +++---
  gcc/ctfc.h|  5 ++-
  gcc/dwarf2ctf.cc  |  5 ++-
  gcc/testsuite/gcc.dg/debug/btf/btf-enum-1.c   |  2 +-
  gcc/testsuite/gcc.dg/debug/btf/btf-enum64-1.c | 44 +++
  include/btf.h | 19 ++--
  7 files changed, 100 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-enum64-1.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 997a33fa089..aef9fd70a28 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -223,7 +223,9 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
break;
  
  case BTF_KIND_ENUM:

-  vlen_bytes += vlen * sizeof (struct btf_enum);
+  vlen_bytes += (dtd->dtd_data.ctti_size == 0x8)
+   ? vlen * sizeof (struct btf_enum64)
+   : vlen * sizeof (struct btf_enum);
break;
  
  case BTF_KIND_FUNC_PROTO:

@@ -622,6 +624,15 @@ btf_asm_type (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
btf_size_type = 0;
  }
  
+  if (btf_kind == BTF_KIND_ENUM)

+{
+  btf_kflag = dtd->dtd_enum_unsigned
+   ? BTF_KF_ENUM_UNSIGNED
+   : BTF_KF_ENUM_SIGNED;
+  if (dtd->dtd_data.ctti_size == 0x8)
+   btf_kind = BTF_KIND_ENUM64;
+   }
+
dw2_asm_output_data (4, dtd->dtd_data.ctti_name, "btt_name");
dw2_asm_output_data (4, BTF_TYPE_INFO (btf_kind, btf_kflag, btf_vlen),
   "btt_info: kind=%u, kflag=%u, vlen=%u",
@@ -634,6 +645,7 @@ btf_asm_type (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
  case BTF_KIND_UNION:
  case BTF_KIND_ENUM:
  case BTF_KIND_DATASEC:
+case BTF_KIND_ENUM64:
dw2_asm_output_data (4, dtd->dtd_data.ctti_size, "btt_size: %uB",
   dtd->dtd_data.ctti_size);
return;
@@ -707,13 +719,19 @@ btf_asm_sou_member (ctf_container_ref ctfc, ctf_dmdef_t * 
dmd)
  }
  }
  
-/* Asm'out an enum constant following a BTF_KIND_ENUM.  */

+/* Asm'out an enum constant following a BTF_KIND_ENUM{,64}.  */
  
  static void

-

Re: [PATCH] Always use TYPE_MODE instead of DECL_MODE for vector field

2022-10-21 Thread Richard Biener via Gcc-patches

On Thu, Oct 20, 2022 at 6:58 PM H.J. Lu via Gcc-patches
 wrote:
>
> commit e034c5c895722e0092d2239cd8c2991db77d6d39
> Author: Jakub Jelinek 
> Date:   Sat Dec 2 08:54:47 2017 +0100
>
> PR target/78643
> PR target/80583
> * expr.c (get_inner_reference): If DECL_MODE of a non-bitfield
> is BLKmode for vector field with vector raw mode, use TYPE_MODE
> instead of DECL_MODE.
>
> fixed the case where DECL_MODE of a vector field is BLKmode and its
> TYPE_MODE is a vector mode because of target attribute.  Remove the
> BLKmode check for the case where DECL_MODE of a vector field is a vector
> mode and its TYPE_MODE is BLKmode because of target attribute.
>
> gcc/
>
> PR target/107304
> * expr.c (get_inner_reference): Always use TYPE_MODE for vector
> field with vector raw mode.
>
> gcc/testsuite/
>
> PR target/107304
> * gcc.target/i386/pr107304.c: New test.
> ---
>  gcc/expr.cc  |  3 +-
>  gcc/testsuite/gcc.target/i386/pr107304.c | 39 
>  2 files changed, 40 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107304.c
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index efe387e6173..9145193c2c1 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -7905,8 +7905,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
>   /* For vector fields re-check the target flags, as DECL_MODE
>  could have been set with different target flags than
>  the current function has.  */
> - if (mode == BLKmode
> - && VECTOR_TYPE_P (TREE_TYPE (field))
> + if (VECTOR_TYPE_P (TREE_TYPE (field))
>   && VECTOR_MODE_P (TYPE_MODE_RAW (TREE_TYPE (field))))

Isn't the check on TYPE_MODE_RAW also wrong then?  Btw, the mode could
also be an integer mode.

> mode = TYPE_MODE (TREE_TYPE (field));
> }
> diff --git a/gcc/testsuite/gcc.target/i386/pr107304.c 
> b/gcc/testsuite/gcc.target/i386/pr107304.c
> new file mode 100644
> index 000..24d68795e7f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr107304.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -march=tigerlake" } */
> +
> +#include <stdint.h>
> +
> +typedef union {
> +  uint8_t v __attribute__((aligned(256))) __attribute__ ((vector_size(64 * sizeof(uint8_t))));
> +  uint8_t i[64] __attribute__((aligned(256)));
> +} stress_vec_u8_64_t;
> +
> +typedef struct {
> + struct {
> +  stress_vec_u8_64_t s;
> +  stress_vec_u8_64_t o;
> +  stress_vec_u8_64_t mask1;
> +  stress_vec_u8_64_t mask2;
> + } u8_64;
> +} stress_vec_data_t;
> +
> +__attribute__((target_clones("arch=alderlake", "default")))
> +void
> +stress_vecshuf_u8_64(stress_vec_data_t *data)
> +{
> +  stress_vec_u8_64_t *__restrict s;
> +  stress_vec_u8_64_t *__restrict mask1;
> +  stress_vec_u8_64_t *__restrict mask2;
> +  register int i;
> +
> +  s = &data->u8_64.s;
> +  mask1 = &data->u8_64.mask1;
> +  mask2 = &data->u8_64.mask2;
> +
> +  for (i = 0; i < 256; i++) {  /* was i < 65536 */
> +  stress_vec_u8_64_t tmp;
> +
> +  tmp.v = __builtin_shuffle(s->v, mask1->v);
> +  s->v = __builtin_shuffle(tmp.v, mask2->v);
> +  }
> +}
> --
> 2.37.3
>

Restore 'libgomp.oacc-c-c++-common/nvptx-sese-1.c' SESE regions checking [PR107195, PR107344] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)

2022-10-21 Thread Thomas Schwinge

Hi!

On 2022-10-17T09:43:37+0200, I wrote:
> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches 
>  wrote:
>> When solving 0 = _15 & 1, we calculate _15 as:
>>
>>  [irange] int [-INF, -2][0, +INF] NONZERO 0xfffe
>>
>> The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
>> the above, yielding:
>>
>>  [0, 1] NONZERO 0x0
>>
>> This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>>
>> This is problematic because here we have a bool which is zero, but
>> returns false for irange::zero_p, since the latter does not look at
>> nonzero bits.  This causes logical_combine to assume the range is
>> not-zero, and all hell breaks loose.
>>
>> I think we should just normalize a nonzero mask of 0 to [0, 0] at
>> creation, thus avoiding all this.
>
> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
> offloading test case:
>
> UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
> (test for excess errors)
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
> execution test
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   
> scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* 
> [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>
> Same for C++.
>
> I'll later send a patch (for the test case!) to fix that up.

Pushed to master branch commit a9de836c2b22f878cff592b96e11c1b95d4d36ee
"Restore 'libgomp.oacc-c-c++-common/nvptx-sese-1.c' SESE regions checking 
[PR107195, PR107344]",
see attached.

That discussion I suppose is to be continued in
 "GCC/nvptx SESE region optimization".


Grüße
 Thomas
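
For context, the problem described in the quoted PR107195 commit message can be modelled with a toy range type. This is a sketch under simplifying assumptions: ToyRange holds a single sub-range plus a nonzero-bits mask and is not GCC's irange; the names are invented for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model (not GCC's irange): one [lo, hi] interval plus a mask of bits
// that may be nonzero.  A clear bit means that bit is known to be zero.
struct ToyRange
{
  std::int64_t lo;
  std::int64_t hi;
  std::uint64_t nonzero;

  // Like the zero_p described in the quoted message: looks at the bounds
  // only and ignores the nonzero mask.
  bool zero_p() const { return lo == 0 && hi == 0; }

  // The fix in spirit: a mask of 0 means the value can only be 0, so
  // collapse the bounds to [0, 0].
  void normalize()
  {
    if (nonzero == 0)
      lo = hi = 0;
  }
};

// Intersect bounds and masks without normalizing, to show the bad state.
ToyRange intersect_raw(const ToyRange &a, const ToyRange &b)
{
  return { std::max(a.lo, b.lo), std::min(a.hi, b.hi),
           a.nonzero & b.nonzero };
}
```

Intersecting a range whose mask is 0xfffffffe with [0, 1] NONZERO 0x1 leaves [0, 1] with mask 0: the value can only be zero, yet zero_p is false until the range is normalized.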


>>  PR tree-optimization/107195
>>
>> gcc/ChangeLog:
>>
>>  * value-range.cc (irange::set_range_from_nonzero_bits): Set range
>>  to [0,0] when nonzero mask is 0.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.dg/tree-ssa/pr107195-1.c: New test.
>>  * gcc.dg/tree-ssa/pr107195-2.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c | 15 +++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c | 16 
>>  gcc/value-range.cc |  5 +
>>  3 files changed, 36 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>> new file mode 100644
>> index 000..a0c20dbd4b1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>> @@ -0,0 +1,15 @@
>> +// { dg-do run }
>> +// { dg-options "-O1 -fno-tree-ccp" }
>> +
>> +int a, b;
>> +int main() {
>> +  int c = 0;
>> +  if (a)
>> +c = 1;
>> +  c = 1 & (a && c) && b;
>> +  if (a) {
>> +b = c;
>> +__builtin_abort ();
>> +  }
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>> new file mode 100644
>> index 000..d447c78bdd3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>> @@ -0,0 +1,16 @@
>> +// { dg-do run }
>> +// { dg-options "-O1" }
>> +
>> +int a, b;
>> +int main() {
>> +  int c = 0;
>> +  long d;
>> +  for (; b < 1; b++) {
>> +(c && d) & 3 || a;
>> +d = c;
>> +c = -1;
>> +if (d)
>> +  __builtin_abort();
>> +  }
>> +  return 0;
>> +}
>> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
>> index a14f9bc4394..e07d2aa9a5b 100644
>> --- a/gcc/value-range.cc
>> +++ b/gcc/value-range.cc
>> @@ -2903,6 +2903,11 @@ irange::set_range_from_nonzero_bits ()
>>  }
>>return true;
>>  }
>> +  else if (popcount == 0)
>> +{
>> +  set_zero (type ());
>> +  return true;
>> +}
>>return false;
>>  }
>>
>> --
>> 2.37.3
>
>
> From dc4644dcef05a1f21a9ebc194689f31412811387 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Mon, 17 Oct 2022 09:10:03 +0200
> Subject: [PATCH] Add 'c-c++-common/torture/pr107195-1.c' [PR107195]
>
> ... to display optimization performed as of recent
> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0".
>
>   PR tree-optimization/107195
>   gcc/testsuite/
>   * c-c++-common/torture/pr107195-1.c: New.
> ---
>  .../c-c++-common/torture/pr107195-1.c | 41 +++
>  1 file changed, 41 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/pr107195-1.c
>
> diff --git a/gcc/testsuite/c-c++-common/torture/pr107195-1.c

[committed] libstdc++: Fix std::move_only_function for incomplete parameter types

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The std::move_only_function::__param_t alias template attempts to
optimize argument passing for the invoker, by passing by rvalue
reference for types that are non-trivial or large. However, the
precondition for is_trivially_copyable makes it unsuitable for using
here, and can cause ODR violations. Just use is_scalar instead, and pass
all class types (even small, trivial ones) by value.
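
The effect of the change can be sketched with a standalone alias. This is an illustration only: param_t below is a simplified stand-in for the internal __param_t, written with the public std::conditional_t and std::is_scalar_v, and it assumes C++17:

```cpp
#include <cassert>
#include <type_traits>

struct Incomplete;  // Declared but never defined in this translation unit.

// Simplified stand-in for the internal alias: scalars are passed by value,
// everything else by rvalue reference.  is_scalar can be evaluated for an
// incomplete class type, whereas is_trivially_copyable requires a complete
// type.
template<typename T>
  using param_t = std::conditional_t<std::is_scalar_v<T>, T, T&&>;

static_assert(std::is_same_v<param_t<int>, int>);
static_assert(std::is_same_v<param_t<Incomplete*>, Incomplete*>);
static_assert(std::is_same_v<param_t<Incomplete>, Incomplete&&>);
// Reference collapsing: Incomplete& && is Incomplete&.
static_assert(std::is_same_v<param_t<Incomplete&>, Incomplete&>);
```

Because the alias never needs the size or triviality of the type, a signature mentioning Incomplete can be instantiated before the type is completed.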

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h (move_only_function::__param_t):
Use __is_scalar instead of is_trivially_copyable.
* testsuite/20_util/move_only_function/call.cc: Check parameters
involving incomplete types.
---
 libstdc++-v3/include/bits/mofunc_impl.h   |  5 +
 .../testsuite/20_util/move_only_function/call.cc  | 11 +++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/mofunc_impl.h 
b/libstdc++-v3/include/bits/mofunc_impl.h
index 405c4054642..47e1e506306 100644
--- a/libstdc++-v3/include/bits/mofunc_impl.h
+++ b/libstdc++-v3/include/bits/mofunc_impl.h
@@ -205,10 +205,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 private:
   template<typename _Tp>
-   using __param_t
- = __conditional_t<is_trivially_copyable_v<_Tp>
- && sizeof(_Tp) <= sizeof(long),
-   _Tp, _Tp&&>;
+   using __param_t = __conditional_t<is_scalar_v<_Tp>, _Tp, _Tp&&>;
 
   using _Invoker = _Res (*)(_Mofunc_base _GLIBCXX_MOF_CV*,
__param_t<_ArgTypes>...) noexcept(_Noex);
diff --git a/libstdc++-v3/testsuite/20_util/move_only_function/call.cc 
b/libstdc++-v3/testsuite/20_util/move_only_function/call.cc
index 68aa20568eb..3e159836412 100644
--- a/libstdc++-v3/testsuite/20_util/move_only_function/call.cc
+++ b/libstdc++-v3/testsuite/20_util/move_only_function/call.cc
@@ -191,10 +191,21 @@ test04()
   VERIFY( std::move(std::as_const(f5))() == 3 );
 }
 
+struct Incomplete;
+
+void
+test_params()
+{
+  std::move_only_function f1;
+  std::move_only_function f2;
+  std::move_only_function f3;
+}
+
 int main()
 {
   test01();
   test02();
   test03();
   test04();
+  test_params();
 }
-- 
2.37.3

Re: [PATCH] libstdc++: respect with-{headers, newlib} for default hosted value

Pushed to trunk, thanks.

On Wed, 12 Oct 2022 at 20:48, Arsen Arsenović via Libstdc++
 wrote:
>
> This saves us a build flag when building for freestanding targets.
>
> libstdc++-v3/ChangeLog:
>
> * acinclude.m4: Default hosted to off if building without
> headers and without newlib.
> ---
> Tested for x86_64-elf.
>
>  libstdc++-v3/acinclude.m4 | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index 719eab15c77..8f4e901c909 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -2982,7 +2982,10 @@ AC_DEFUN([GLIBCXX_ENABLE_HOSTED], [
> enable_hosted_libstdcxx=no
> ;;
> *)
> -   enable_hosted_libstdcxx=yes
> +   case "${with_newlib}-${with_headers}" in
> +   no-no) enable_hosted_libstdcxx=no ;;
> +   *) enable_hosted_libstdcxx=yes ;;
> +   esac
> ;;
>   esac])
>
> --
> 2.38.0
>

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Kumar, Venkataramanan via Gcc-patches

Hi all, 

> -Original Message-
> From: Joshi, Tejas Sanjay 
> Sent: Monday, October 17, 2022 8:09 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kumar, Venkataramanan ;
> honza.hubi...@gmail.com; Uros Bizjak 
> Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
> 
> [Public]
> 
> Hi,
> 
> > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> all four Zen schedulers.
> 
> I have renamed the file to znver.md in this revision, PFA.
> Thank you for the review, we will push it for trunk if we don't get any
> further comments.

I have pushed the patch on behalf of Tejas. 

Regards,
Venkat.

Re: Adding a new thread model to GCC

2022-10-21 Thread Charles-François Natali via Gcc-patches

How does this compare with Eric B's proposal at
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?

It would be good if we can accept one of them for GCC 13, but I don't
know Windows well enough to determine which is better.

On Sat, 1 Oct 2022 at 19:35, LIU Hao via Libstdc++
 wrote:
>
> Greetings.
>
> After some years I think it's time to bring up this topic again.
>
> This patch series is an attempt to add a new thread model based on the 
> mcfgthread library
> (https://github.com/lhmouse/mcfgthread), which provides efficient 
> implementations of mutexes,
> condition variables, once flags, etc. for native Windows.
>
>
> The first patch is necessary because somewhere in libgfortran, `pthread_t` is 
> referenced. If the
> thread model is not `posix`, it fails to compile.
>
> The second patch implements `std::thread::hardware_concurrency()` for 
> non-posix thread models. This
> would also work for the win32 thread model if `std::thread` would be 
> supported in the future.
>
> The third patch adds the `mcf` thread model for GCC and its libraries. A new 
> builtin macro
> `__USING_MCFGTHREAD__` is added to indicate whether this new thread model is 
> in effect. This grants
> `std::mutex` and `std::once_flag` trivial destructors; 
> `std::condition_variable` is a bit
> unfortunate because its destructor is non-trivial, but in reality no cleanup 
> is performed.
>
>
> I have been bootstrapping GCC with the MCF thread model for more than five 
> years. At the moment, C,
> C++ and Fortran are supported. Ada is untested because I don't know how to 
> bootstrap it. Objective-C
> is not supported, because threading APIs for libobjc have not been 
> implemented.
>
> Please review. If there are any changes that I have to make, let me know.
>
>
> --
> Best regards,
> LIU Hao

Re: [PING 3] [PATCH v2] libstdc++: basic_filebuf: don't flush more often than necessary.

On Thu, Oct 6, 2022, 20:03 Charles-Francois Natali 
wrote:

> `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of
> size 1024 and above, seemingly as an optimisation.
>
> This can have a significant performance impact if the overhead of a
> `write` syscall is non-negligible, e.g. on a slow disk, on network
> filesystems, or simply during IO contention because instead of flushing
> every `BUFSIZ` (by default), we can flush every 1024 char.
> The impact is even greater with custom larger buffers, e.g. for network
> filesystems, because the code could issue `write` for example 1000X more
> often than necessary with respect to the buffer size.
> It also introduces a significant discontinuity in performance when
> writing chunks of size 1024 and above.
>
> See this reproducer which writes down a fixed number of chunks to a file
> open with `O_SYNC` - to replicate high-latency `write` - for varying
> size of chunks:
>
> ```
> $ cat test_fstream_flush.cpp
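> #include <cassert>
> #include <ostream>
> #include <string>
> #include <vector>
> #include <ext/stdio_filebuf.h>
> #include <fcntl.h>
>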
> int
> main(int argc, char* argv[])
> {
>   assert(argc == 3);
>
>   const auto* path = argv[1];
>   const auto chunk_size = std::stoul(argv[2]);
>
>   const auto fd =
> open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666);
>   assert(fd >= 0);
>
>   auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out);
>   auto stream = std::ostream(&filebuf);
>
>   const auto chunk = std::vector<char>(chunk_size);
>
>   for (auto i = 0; i < 1'000; ++i) {
> stream.write(chunk.data(), chunk.size());
>   }
>
>   return 0;
> }
> ```
>
> ```
> $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17
> $ for i in $(seq 1021 1025); do echo -e "\n$i"; time
> /tmp/test_fstream_flush /tmp/foo $i; done
>
> 1021
>
> real	0m0.997s
> user	0m0.000s
> sys	0m0.038s
>
> 1022
>
> real	0m0.939s
> user	0m0.005s
> sys	0m0.032s
>
> 1023
>
> real	0m0.954s
> user	0m0.005s
> sys	0m0.034s
>
> 1024
>
> real	0m7.102s
> user	0m0.040s
> sys	0m0.192s
>
> 1025
>
> real	0m7.204s
> user	0m0.025s
> sys	0m0.209s
> ```
>
> See the huge drop in performance at the 1024-boundary.
>
> An `strace` confirms that from size 1024 we effectively defeat
> buffering:
> 1023-sized writes
> ```
> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush
> /tmp/foo 1023 2>&1 | head -n5
> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC,
> 0666) = 3
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> ```
>
> vs 1024-sized writes
> ```
> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush /tmp/foo 1024 2>&1 | head -n5
> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, 0666) = 3
> writev(3, [{iov_base=NULL, iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> writev(3, [{iov_base="", iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> writev(3, [{iov_base="", iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> writev(3, [{iov_base="", iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> ```
>
> Instead, it makes sense to only bypass the buffer if the amount of data
> to be written is larger than the buffer capacity.
>
> Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746
>
> Signed-off-by: Charles-Francois Natali 
> ---
>  libstdc++-v3/include/bits/fstream.tcc |  9 ++---
>  .../27_io/basic_filebuf/sputn/char/63746.cc   | 38 +++
>  2 files changed, 41 insertions(+), 6 deletions(-)
>  create mode 100644
> libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc
>
> diff --git a/libstdc++-v3/include/bits/fstream.tcc
> b/libstdc++-v3/include/bits/fstream.tcc
> index 7ccc887b8..2e9369628 100644
> --- a/libstdc++-v3/include/bits/fstream.tcc
> +++ b/libstdc++-v3/include/bits/fstream.tcc
> @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  {
>streamsize __ret = 0;
>// Optimization i

Re: Adding a new thread model to GCC


On 2022-10-21 09:58, Jonathan Wakely via Libstdc++ wrote:

How does this compare with Eric B's proposal at
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?

It would be good if we can accept one of them for GCC 13, but I don't
know Windows well enough to determine which is better.


I had the same question...
I would like to understand what is the difference?
Moreover I would like to understand what is the difference with the 
already added support for the winpthreads library?


@LIU Hao, could you explain please?



best!



On Sat, 1 Oct 2022 at 19:35, LIU Hao via Libstdc++
 wrote:


Greetings.

After some years I think it's time to bring up this topic again.

This patch series is an attempt to add a new thread model based on the
mcfgthread library (https://github.com/lhmouse/mcfgthread), which provides
efficient implementations of mutexes, condition variables, once flags, etc.
for native Windows.

The first patch is necessary because somewhere in libgfortran, `pthread_t` is
referenced. If the thread model is not `posix`, it fails to compile.

The second patch implements `std::thread::hardware_concurrency()` for
non-posix thread models. This would also work for the win32 thread model if
`std::thread` were supported in the future.

The third patch adds the `mcf` thread model for GCC and its libraries. A new
builtin macro `__USING_MCFGTHREAD__` is added to indicate whether this new
thread model is in effect. This grants `std::mutex` and `std::once_flag`
trivial destructors; `std::condition_variable` is a bit unfortunate because
its destructor is non-trivial, but in reality no cleanup is performed.

I have been bootstrapping GCC with the MCF thread model for more than five
years. At the moment, C, C++ and Fortran are supported. Ada is untested
because I don't know how to bootstrap it. Objective-C is not supported,
because threading APIs for libobjc have not been implemented.

Please review. If there are any changes that I have to make, let me know.



--
Best regards,
LIU Hao

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM





On 20/10/2022 18:37, Andrew Pinski via Gcc-patches wrote:

On Thu, Oct 20, 2022 at 10:28 AM Segher Boessenkool
 wrote:


On Thu, Oct 20, 2022 at 01:44:15AM +, Jiang, Haochen wrote:

Maybe the testcase change caused some misunderstanding and concern.

Actually, the patch did not disrupt the previous builtins, as
`__builtin_prefetch` uses varargs. I set the default value of the new
parameter to data prefetch, which means that if the fourth parameter is not
passed, just as prefetch was used previously, the behaviour is unchanged.


I still think it is a mistake to have one builtin do two very distinct
operations, only very superficially related.  Instruction fetch and data
demand loads are almost entirely unrelated, and so is the prefetch
machinery for them, on all machines I am familiar with.


On aarch64 (armv8), it is actually the same instruction: PRFM. It
might be the only one which is that way though.
It even allows specifying the level for the instruction prefetch too
(which is actually useful for say OcteonTX2 which has an interesting
cache hierarchy).



Just because the encodings are similar doesn't mean that the 
instructions are the same, although it's true that once you reach 
unification in the cache hierarchy the end behaviour /might/ be 
indistinguishable.


Really, Segher's point seems to be 'why overload the existing builtin 
for this'?  It's not like the new parameter is something that users 
would really need to pass in as a run-time choice; and that wouldn't 
work anyway because in the end we do need distinct instructions.
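
For reference, a minimal sketch of how the existing builtin is used today. GCC documents the optional `rw` and `locality` arguments of `__builtin_prefetch` as compile-time constants, so a selector passed as a further argument would likewise be fixed at compile time. The function name and the prefetch distance of 16 elements are illustrative assumptions; the proposed fourth argument is deliberately not shown, since it is not part of any released GCC.

```cpp
#include <cassert>
#include <cstddef>

// Sum an array while hinting that data a few iterations ahead will be read.
// The distance of 16 elements is an arbitrary, untuned choice.
long sum_with_prefetch(const int* data, std::size_t n)
{
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + 16 < n)
      // rw = 0 (read), locality = 3 (keep in all cache levels);
      // both must be integer constants known at compile time.
      __builtin_prefetch(&data[i + 16], 0, 3);
    total += data[i];
  }
  return total;
}
```

The prefetch is only a hint: the result of the function is the same with or without it.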


R.


Though I agree it is a mistake to have one builtin which handles both
data and instruction prefetch.

Thanks,
Andrew



Which makes
sense anyway, since instruction prefetch and data prefetch have
completely different performance characteristics and considerations.
Maybe if you start with the mistake of having unified L1 caches it
seems natural, but thankfully most machines do not do that.


Segher

Ping (c,c++): Handling of main() function for freestanding

2022-10-21 Thread Arsen Arsenović via Gcc-patches

Ping on this patch.

https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603574.html

For context, see the rest of this thread.  TL;DR is that `int main' 
should implicitly return 0 on freestanding, without the other burdens of 
main (hosted should remain unchanged, as well as non-int `main's).  This 
applies to both the C and C++ frontends.
-- 
Arsen Arsenović



Re: Adding a new thread model to GCC

On Fri, 21 Oct 2022 at 11:10, i.nixman--- via Libstdc++
 wrote:
>
> On 2022-10-21 09:58, Jonathan Wakely via Libstdc++ wrote:
> > How does this compare with Eric B's proposal at
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?
> >
> > It would be good if we can accept one of them for GCC 13, but I don't
> > know Windows well enough to determine which is better.
>
> I had the same question...
> I would like to understand what is the difference?
> Moreover I would like to understand what is the difference with the
> already added support for the winpthreads library?

Well that one's easy, you don't need to use winpthreads if there's a
native thread model, so you don't need to go through two abstraction
layers (gthreads and winpthreads), just one (gthreads).

The benefits of using the native thread model for the OS seem
obvious. The question is which patch we should use to do that.

Re: Adding a new thread model to GCC

On 2022-10-21 10:48, Jonathan Wakely wrote:

On Fri, 21 Oct 2022 at 11:10, i.nixman--- via Libstdc++
 wrote:

On 2022-10-21 09:58, Jonathan Wakely via Libstdc++ wrote:
> How does this compare with Eric B's proposal at
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?
>
> It would be good if we can accept one of them for GCC 13, but I don't
> know Windows well enough to determine which is better.

I had the same question...
I would like to understand what is the difference?
Moreover I would like to understand what is the difference with the
already added support for the winpthreads library?

Well that one's easy, you don't need to use winpthreads if there's a
native thread model, so you don't need to go through two abstraction
layers (gthreads and winpthreads), just one (gthreads).

sure!

nevertheless I would like to understand why we have two separate 
implementations (winthreads and mcfgthread)?

what is the difference?

best!

Re: Proxy ping [PATCH] Fortran: Add missing TKR initialization to class variables [PR100097, PR100098]

2022-10-21 Thread Mikael Morin


Le 18/10/2022 à 22:48, Harald Anlauf via Fortran a écrit :

I intended to add the updated patch but forgot, so here it is...

Am 18.10.22 um 22:41 schrieb Harald Anlauf via Fortran:

Dear all,

Jose posted a patch here that was never reviewed:

   https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html

I could not find any issues with his patch, it works as advertised
and fixes the reported problem.

As his testcases did not reliably fail without the patch but rather
randomly due to the uninitialized descriptor, I added a check of
the tree-dumps to verify that the TKR initializer is generated.

Does anybody else have any comments?

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Looks good but please check the initialization of rank instead of 
elem_len in the dump patterns (elem_len actually doesn't matter).

OK with that change.

Thanks.

Re: Adding a new thread model to GCC

2022-10-21 Thread LIU Hao via Gcc-patches

在 2022/10/21 18:09, i.nix...@autistici.org 写道:

On 2022-10-21 09:58, Jonathan Wakely via Libstdc++ wrote:

How does this compare with Eric B's proposal at
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?

It would be good if we can accept one of them for GCC 13, but I don't
know Windows well enough to determine which is better.

I had the same question...
I would like to understand what is the difference?
Moreover I would like to understand what is the difference with the already added support for the
winpthreads library?

@LIU Hao, could you explain please?

Thank you for your interest. I'm glad to make an introduction of it.

I have read this patch before. Let's take the mutex as an example:

There are a lot of ways to implement a mutex on Windows. Basically, a non-recursive mutex can be
implemented with an atomic counter + a binary semaphore / auto-reset event. This proposed patch
contains a `__gthr_win32_CRITICAL_SECTION` definition that I think is a duplicate of the internal
`CRITICAL_SECTION` structure, so should also work the same way as it.
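
A minimal sketch of the "atomic counter + auto-reset event" scheme mentioned above, for readers who have not seen it. It is not taken from either patch: `auto_reset_event`, `counter_mutex` and `hammer` are hypothetical names, and the event is emulated with standard C++ primitives rather than a Windows kernel object, so the sketch cannot show the allocation failure discussed next.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for a kernel auto-reset event (hypothetical name).
class auto_reset_event {
public:
  void wait() {
    std::unique_lock<std::mutex> lock(guard_);
    cond_.wait(lock, [this] { return signaled_; });
    signaled_ = false;  // auto-reset: one signal releases exactly one waiter
  }
  void signal() {
    {
      std::lock_guard<std::mutex> lock(guard_);
      signaled_ = true;
    }
    cond_.notify_one();
  }
private:
  std::mutex guard_;
  std::condition_variable cond_;
  bool signaled_ = false;
};

// Non-recursive mutex: the counter tracks owner plus waiters.
class counter_mutex {
public:
  void lock() {
    // Previous value 0 means the mutex was free: fast path, no event needed.
    if (contenders_.fetch_add(1) > 0)
      event_.wait();
  }
  void unlock() {
    // Previous value > 1 means another thread is waiting (or about to wait):
    // hand ownership over through the event.
    if (contenders_.fetch_sub(1) > 1)
      event_.signal();
  }
private:
  std::atomic<int> contenders_{0};
  auto_reset_event event_;
};

// Increment a plain int from several threads under the mutex.
int hammer(int threads, int iterations) {
  counter_mutex mutex;
  int shared = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        mutex.lock();
        ++shared;
        mutex.unlock();
      }
    });
  for (auto& th : pool)
    th.join();
  return shared;
}
```

The event is touched only under contention, which is why an implementation can defer creating it until the first contended lock.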

The problem about this approach is that, semaphores are valuable kernel objects, and the maximum
number of HANDLEs that a process can open concurrently has a limit (like FDs on Linux), while 'many
critical sections are used only occasionally (or never at all), meaning the auto-reset event often
isn’t even necessary' [1], the semaphores are actually allocated on demand. This means that locking
can fail. There is a story in article [1] which also explains the origination of keyed events; it's
worth reading.

And, since Vista we also have native win32 condition variables, also
implemented basing on keyed events.

The keyed events are undocumented and are only exposed via syscalls. However, as with other
documented syscalls, available from Windows Drivers Kit, there are several advantages:

* There is a global keyed event, which requires no initialization, but
can be utilized by all processes. Based on that, mcfgthread provides
mutexes, condition variables, once flags, etc. that are all one-pointer
size structs, consume absolutely no additional resource, allow
constexpr initialization, and require no cleanup, much like on Linux.

* The wait syscalls take a 64-bit integer, whose positive value denotes
the number of 10^-7 seconds since 1601-01-01 00:00:00 Z, and whose
negative value denotes a relative timeout. Hence it's much simpler
to implement `__gthread_mutex_timedlock()` and `__gthread_cond_timedwait()`
which take absolute timeouts. On the other hand, Win32 APIs generally
take a 32-bit relative timeout in milliseconds, which not only requires
translation from an absolute timepoint argument, but can also easily
overflow.

* Building mutexes on top of syscalls allows a better designed algorithm
[2], and sometimes it can even outperform native `SRWLOCK`s [3].

* mcfgthread also provides standard-conforming `__cxa_atexit()` and
`__cxa_thread_atexit()` functions, for working around some strange,
weird, and broken behaviors [4][5][6]. On Linux it's glibc that
provides them, so this as a whole requires a little modification in
mingw-w64. I am working on it however; hopefully we can land it soon.
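
To make the timeout point concrete, here is a minimal sketch of the two conversions. It is not code from mcfgthread or from GCC; the helper names are hypothetical. It assumes `std::chrono::system_clock` counts from the Unix epoch (guaranteed since C++20, true in practice before that) and uses the NT time base of 1601-01-01 00:00:00 UTC.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using sys_time = std::chrono::system_clock::time_point;

// 100 ns ticks, the unit used by NT timeouts.
using nt_ticks = std::chrono::duration<std::int64_t, std::ratio<1, 10000000>>;

// Seconds from 1601-01-01 to 1970-01-01: 134774 days * 86400 s.
constexpr std::int64_t kUnixEpochOffsetSeconds = 11644473600LL;

// Absolute deadline -> positive 64-bit NT timeout. No clock read is needed.
std::int64_t to_nt_absolute(sys_time deadline)
{
  const auto since_unix =
      std::chrono::duration_cast<nt_ticks>(deadline.time_since_epoch());
  return since_unix.count() + kUnixEpochOffsetSeconds * 10000000LL;
}

// Absolute deadline -> 32-bit relative milliseconds, the shape a Win32-style
// wait expects. This needs the current time and has to clamp, because
// 0xFFFFFFFF means INFINITE in Win32 and anything longer does not fit.
std::uint32_t to_relative_ms(sys_time deadline, sys_time now)
{
  if (deadline <= now)
    return 0;
  const std::int64_t ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now)
          .count();
  const std::int64_t max_ms = 0xFFFFFFFELL;
  return static_cast<std::uint32_t>(ms < max_ms ? ms : max_ms);
}
```

A deadline 50 days away already exceeds the 32-bit millisecond range, so the relative form must either clamp and re-wait in a loop or risk wrapping around; the 64-bit absolute form has neither problem.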

[1]
http://joeduffyblog.com/2006/11/28/windows-keyed-events-critical-sections-and-new-vista-synchronization-features/

[2] https://github.com/lhmouse/mcfgthread/blob/master/MUTEX.md
[3] https://github.com/lhmouse/mcfgthread#benchmarking

[4] https://sourceforge.net/p/mingw-w64/mailman/message/37268447/
[5] https://reviews.llvm.org/D102944
[6] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80816

--
Best regards,
LIU Hao


Re: Adding a new thread model to GCC

2022-10-21 Thread Eric Botcazou via Gcc-patches

> How does this compare with Eric B's proposal at
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?

My proposal was to reimplement (and extend) the native thread model (win32) 
instead of adding a new one, the advantage being that you don't need an extra 
threading layer between GCC and Windows.

-- 
Eric Botcazou

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Richard Biener via Gcc-patches

On Fri, Oct 21, 2022 at 12:00 PM Kumar, Venkataramanan via Gcc-patches
 wrote:
>
> Hi all,
>
> > -Original Message-
> > From: Joshi, Tejas Sanjay 
> > Sent: Monday, October 17, 2022 8:09 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kumar, Venkataramanan ;
> > honza.hubi...@gmail.com; Uros Bizjak 
> > Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> > Zen4 CPU
> >
> > [Public]
> >
> > Hi,
> >
> > > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> > all four Zen schedulers.
> >
> > I have renamed the file to znver.md in this revision, PFA.
> > Thank you for the review, we will push it for trunk if we don't get any
> > further comments.
>
> I have pushed the patch on behalf of Tejas.

This grew insn-automata.cc from 201502 lines to 639968 lines and the build
of the automata (genautomata) to several minutes in my dev tree.

You did something wrong.  Please fix!

Richard.

> Regards,
> Venkat.
>

Re: Adding a new thread model to GCC

On 2022-10-21 11:36, LIU Hao wrote:

在 2022/10/21 18:09, i.nix...@autistici.org 写道:

On 2022-10-21 09:58, Jonathan Wakely via Libstdc++ wrote:

How does this compare with Eric B's proposal at
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?

It would be good if we can accept one of them for GCC 13, but I don't
know Windows well enough to determine which is better.

I had the same question...
I would like to understand what is the difference?
Moreover I would like to understand what is the difference with the
already added support for the winpthreads library?

@LIU Hao, could you explain please?

Thank you for your interest. I'm glad to make an introduction of it.

I have read this patch before. Let's take the mutex as an example:

There are a lot of ways to implement a mutex on Windows. Basically, a
non-recursive mutex can be implemented with an atomic counter + a
binary semaphore / auto-reset event. This proposed patch contains a
`__gthr_win32_CRITICAL_SECTION` definition that I think is a duplicate
of the internal `CRITICAL_SECTION` structure, so should also work the
same way as it.

The problem about this approach is that, semaphores are valuable
kernel objects, and the maximum number of HANDLEs that a process can
open concurrently has a limit (like FDs on Linux), while 'many
critical sections are used only occasionally (or never at all),
meaning the auto-reset event often isn’t even necessary' [1], the
semaphores are actually allocated on demand. This means that locking
can fail. There is a story in article [1] which also explains the
origination of keyed events; it's worth reading.

And, since Vista we also have native win32 condition variables, also
implemented basing on keyed events.

The keyed events are undocumented and are only exposed via syscalls.
However, as with other documented syscalls, available from Windows
Drivers Kit, there are several advantages:

* There is a global keyed event, which requires no initialization, but
can be utilized by all processes. Based on that, mcfgthread provides
mutexes, condition variables, once flags, etc. that are all one-pointer
size structs, consume absolutely no additional resource, allow
constexpr initialization, and require no cleanup, much like on Linux.

* The wait syscalls take a 64-bit integer, whose positive value denotes
the number of 10^-7 seconds since 1601-01-01 00:00:00 Z, and whose
negative value denotes a relative timeout. Hence it's much simpler
to implement `__gthread_mutex_timedlock()` and `__gthread_cond_timedwait()`
which take absolute timeouts. On the other hand, Win32 APIs generally
take a 32-bit relative timeout in milliseconds, which not only requires
translation from an absolute timepoint argument, but can also easily
overflow.

* Building mutexes on top of syscalls allows a better designed algorithm
[2], and sometimes it can even outperform native `SRWLOCK`s [3].

thank you LIU Hao for the explanation!

I have a question:
1) wouldn't it be logical not to write yet another implementation of
pthreads-for-windows, but to make changes to the winpthreads library
because it's already supported by GCC? (maybe I don’t know about some
reasons why it wasn’t done ...)

It seems to me the ideal and logical option is to make your
implementation part of GCC, as suggested by Eric B.

the advantages are as follows:
1) we will get a high-quality native implementation.
2) there is no need to add another thread model for GCC.
3) with dynamic linking there is no need to ship another dll with the
program. (Windows users really don't like this =))

best!

[1]
http://joeduffyblog.com/2006/11/28/windows-keyed-events-critical-sections-and-new-vista-synchronization-features/

[2] https://github.com/lhmouse/mcfgthread/blob/master/MUTEX.md
[3] https://github.com/lhmouse/mcfgthread#benchmarking

[4] https://sourceforge.net/p/mingw-w64/mailman/message/37268447/
[5] https://reviews.llvm.org/D102944
[6] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80816

Re: Adding a new thread model to GCC


On 2022-10-21 11:44, Eric Botcazou via Libstdc++ wrote:

How does this compare with Eric B's proposal at
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?


My proposal was to reimplement (and extend) the native thread model 
(win32)
instead of adding a new one, the advantage being that you don't need an 
extra

threading layer between GCC and Windows.


I agree!



best!

Re: Adding a new thread model to GCC

2022-10-21 Thread Jacek Caban via Gcc-patches

The problem about this approach is that, semaphores are valuable kernel objects, and the maximum 
number of HANDLEs that a process can open concurrently has a limit (like FDs on Linux), while 'many 
critical sections are used only occasionally (or never at all), meaning the auto-reset event often 
isn’t even necessary' [1], the semaphores are actually allocated on demand. This means that locking 
can fail. There is a story in article [1] which also explains the origination of keyed events; it's 
worth reading.


This has not been true for the past 15 years: CRITICAL_SECTIONs have used
something like RtlWaitOnAddress (an equivalent of futexes) since Vista; see
the Wine implementation for details:
https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/ntdll/sync.c#L190

Jacek

Re: [PATCH 7/15] arm: Emit build attributes for PACBTI target feature





On 12/08/2022 16:30, Andrea Corallo via Gcc-patches wrote:

This patch emits assembler directives for PACBTI build attributes as
defined by the
ABI.



gcc/ChangeLog:

* config/arm/arm.c (arm_file_start): Emit EABI attributes for
Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-3: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-6.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.

Co-Authored-By: Tejas Belagod  



OK.

R.

Re: Adding a new thread model to GCC

2022-10-21 Thread LIU Hao via Gcc-patches

在 2022/10/21 19:54, i.nix...@autistici.org 写道:

I have a question:
1) wouldn't it be logical not to write yet another implementation of pthreads-for-windows, but to
make changes to the winpthreads library because it's already supported by GCC? (maybe I don’t know
about some reasons why it wasn’t done ...)

While it is possible to rebuild winpthreads from scratch, I don't think it's
worth it:

* There are many POSIX facilities that we don't support: rwlock,
cancellation, signals, etc.

* GCC can choose to implement `std::thread` etc. on C11 <threads.h>,
which libcxx already has, but I haven't tested it.
(mcfgthread also has a C11 header, but not one for libcxx.)

It seems to me the ideal and logical option is to make your implementation part of GCC, as suggested
by Eric B.

the advantages are as follows:
1) we will get a high-quality native implementation.
2) there is no need to add another thread model for GCC.
3) with dynamic linking there is no need to ship another dll with the program. (Windows users really
don't like this =))

Jacek Caban, who is also a mingw-w64 developer, expressed the same idea a few
days ago.

While integrating mcfgthread into gcc is practically possible, my concerns are:

* GCC never provides a threading library. It always depends on glibc,
musl, win32 APIs, winpthreads, etc.

* Tampering with the win32 thread model in a dramatic way is not
acceptable due to backwards compatibility. There are distributions
that have win32 as the default thread model, such as Debian.

* I personally need more control for future development, for example,
re-implement pthread or adding libcxx support, which doesn't fit in
GCC.

--
Best regards,
LIU Hao


[PATCH] lto: Always quote path to touch

2022-10-21 Thread Torbjörn SVENSSON via Gcc-patches

When generating the makefile, make sure that the paths are quoted so
that a native Windows path works within Cygwin.

Without this patch, this error is reported by the DejaGNU test suite:

make: [T:\ccMf0kI3.mk:3: T:\ccGEvdDp.ltrans0.ltrans.o] Error 1 (ignored)

The generated makefile fragment without the patch:

T:\ccGEvdDp.ltrans0.ltrans.o:
  @T:\build\bin\arm-none-eabi-g++.exe '-xlto' ... '-o' 'T:\ccGEvdDp.ltrans0.ltrans.o' 'T:\ccGEvdDp.ltrans0.o'
  @-touch -r T:\ccGEvdDp.ltrans0.o T:\ccGEvdDp.ltrans0.o.tem > /dev/null 2>&1 && mv T:\ccGEvdDp.ltrans0.o.tem T:\ccGEvdDp.ltrans0.o
.PHONY: all
all: \
  T:\ccGEvdDp.ltrans0.ltrans.o

With the patch, the touch line would be replaced with:

  @-touch -r "T:\ccGEvdDp.ltrans0.o" "T:\ccGEvdDp.ltrans0.o.tem" > /dev/null 2>&1 && mv "T:\ccGEvdDp.ltrans0.o.tem" "T:\ccGEvdDp.ltrans0.o"

gcc/ChangeLog:

* lto-wrapper.cc: Quote paths in makefile.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
 gcc/lto-wrapper.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 9a764702ffc..b12bcc1ad27 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -2010,8 +2010,8 @@ cont:
 truncate them as soon as we have processed it.  This
 reduces temporary disk-space usage.  */
  if (! save_temps)
-   fprintf (mstream, "\t@-touch -r %s %s.tem > /dev/null 2>&1 "
-"&& mv %s.tem %s\n",
+   fprintf (mstream, "\t@-touch -r \"%s\" \"%s.tem\" > /dev/null "
+"2>&1 && mv \"%s.tem\" \"%s\"\n",
 input_name, input_name, input_name, input_name); 
}
  else
-- 
2.25.1

Re: Adding a new thread model to GCC

2022-10-21 Thread LIU Hao via Gcc-patches


在 2022/10/21 20:13, Jacek Caban 写道:


This has not been true for the past 15 years: CRITICAL_SECTIONs have used
something like RtlWaitOnAddress (an equivalent of futexes) since Vista; see
the Wine implementation for details:
https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/ntdll/sync.c#L190




Ah Jacek, nice to see you here.

I haven't dug into this too much, though. From my limited knowledge (mostly from reading 
disassembly) now CRITICAL_SECTION uses `NtWaitForAlertByThreadId` (and no longer keyed events or 
semaphores). As with `WaitOnAddress()`, there seems to be some global data structure, protected by a 
spin lock. It's just another undocumented syscall. Keyed events are still functional.



--
Best regards,
LIU Hao



Re: [PATCH 9/15] arm: Set again stack pointer as CFA reg when popping if necessary





On 27/09/2022 16:24, Kyrylo Tkachov via Gcc-patches wrote:




-Original Message-
From: Andrea Corallo 
Sent: Tuesday, September 27, 2022 11:06 AM
To: Kyrylo Tkachov 
Cc: Andrea Corallo via Gcc-patches ; Richard
Earnshaw ; nd 
Subject: Re: [PATCH 9/15] arm: Set again stack pointer as CFA reg when
popping if necessary

Kyrylo Tkachov  writes:


Hi Andrea,


-Original Message-
From: Gcc-patches  On Behalf Of Andrea
Corallo via Gcc-patches
Sent: Friday, August 12, 2022 4:34 PM
To: Andrea Corallo via Gcc-patches 
Cc: Richard Earnshaw ; nd 
Subject: [PATCH 9/15] arm: Set again stack pointer as CFA reg when

popping

if necessary

Hi all,

this patch enables 'arm_emit_multi_reg_pop' to set again the stack
pointer as CFA reg when popping if this is necessary.



 From what I can tell from similar functions this is correct, but could you

elaborate on why this change is needed for my understanding please?

Thanks,
Kyrill


Hi Kyrill,

sure, if the frame pointer was set, then it is the current CFA register.
If we request to adjust the current CFA register offset indicating it
being SP (while it's actually FP) that is indeed not correct and the
incoherence will be detected by an assertion in the dwarf emission
machinery.


Thanks,  the patch is ok
Kyrill



Best Regards

   Andrea


Hmm, wait.  Why would a multi-reg pop be updating the stack pointer? 
Please can you show a code sequence where this is needed.


R.

Re: Adding a new thread model to GCC

2022-10-21 Thread Jacek Caban via Gcc-patches


On 2022-10-21 11:44, Eric Botcazou via Libstdc++ wrote:

>> How does this compare with Eric B's proposal at
>> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-06/msg01840.html ?
>
> My proposal was to reimplement (and extend) the native thread model (win32)
> instead of adding a new one, the advantage being that you don't need an extra
> threading layer between GCC and Windows.

I agree!


I agree as well and I expressed that on mingw-w64 ML when the patch was 
introduced [1]. My main concern with the new threading model is that instead of 
solving root of the problem, it introduces more fragmentation with no clear 
benefit.

On top of that, mcfgthread library is way more invasive than it needs to be. It 
requires maintaining per-thread struct and reimplements a number of things 
instead of leveraging OS capabilities. Author also plans to make invasive 
changes to mingw-w64-crt, which go against its current approach of being 
agnostic to threading model.

Jacek

[1] https://sourceforge.net/p/mingw-w64/mailman/message/37719727/

Re: Adding a new thread model to GCC


On 2022-10-21 12:19, LIU Hao wrote:

在 2022/10/21 19:54, i.nix...@autistici.org 写道:





Jacek Caban, who is also a mingw-w64 developer, expressed the same
idea a few days ago.

While integrating mcfgthread into gcc is practically possible, my 
concerns are:


  * GCC never provides a threading library. It always depends on glibc,
musl, win32 APIs, winpthreads, etc.


I think you didn't understand me.

I mean not to integrate your library into GCC as real separate library.
I mean to do changes on 
config/i386/gthr-win32.h+config/i386/gthr-win32.c+config/i386/gthr-win32-cond.c 
etc using your code to have an implementation of everything needed for 
C/C++ threads on Windows.




  * Tampering with the win32 thread model in a dramatic way is not
acceptable due to backwards compatibility. There are distributions
that have win32 as the default thread model, such as Debian.

  * I personally need more control for future development, for example,
re-implement pthread or adding libcxx support, which doesn't fit in
GCC.


got it...
anyway it seems logical to me the way I proposed :)


best!

Re: [PATCH] c++ modules: verify_type failure with typedef enum [PR106848]

2022-10-21 Thread Nathan Sidwell via Gcc-patches


On 10/19/22 09:55, Patrick Palka wrote:

On Wed, 19 Oct 2022, Richard Biener wrote:


On Tue, Oct 18, 2022 at 8:26 PM Patrick Palka  wrote:


On Fri, 14 Oct 2022, Richard Biener wrote:


On Thu, Oct 13, 2022 at 5:40 PM Patrick Palka via Gcc-patches
 wrote:


Here during stream in we end up having created a type variant for the enum
before we read the enum's definition, and thus the variant inherited stale
TYPE_VALUES and TYPE_MIN/MAX_VALUES, which leads to an ICE (with -g).  The
stale variant got created from set_underlying_type during earlier stream in
of the (redundant) typedef for the enum.

This patch works around this by setting TYPE_VALUES and TYPE_MIN/MAX_VALUES
for all variants when reading in an enum definition.  Does this look like
the right approach?  Or perhaps we need to arrange that we read the enum
definition before reading in the typedef decl?  Note that seems to be an
issue only when the typedef name and enum names are the same (thus the
typedef is redundant), otherwise we seem to read the enum definition first
as desired.

 PR c++/106848

gcc/cp/ChangeLog:

 * module.cc (trees_in::read_enum_def): Set the TYPE_VALUES,
 TYPE_MIN_VALUE and TYPE_MAX_VALUE of all type variants.

gcc/testsuite/ChangeLog:

 * g++.dg/modules/enum-9_a.H: New test.
 * g++.dg/modules/enum-9_b.C: New test.
---
  gcc/cp/module.cc| 9 ++---
  gcc/testsuite/g++.dg/modules/enum-9_a.H | 5 +
  gcc/testsuite/g++.dg/modules/enum-9_b.C | 6 ++
  3 files changed, 17 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/enum-9_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/enum-9_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7ffeefa7c1f..97fb80bcd44 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12303,9 +12303,12 @@ trees_in::read_enum_def (tree defn, tree 
maybe_template)

if (installing)
  {
-  TYPE_VALUES (type) = values;
-  TYPE_MIN_VALUE (type) = min;
-  TYPE_MAX_VALUE (type) = max;
+  for (tree t = type; t; t = TYPE_NEXT_VARIANT (t))
+   {
+ TYPE_VALUES (t) = values;
+ TYPE_MIN_VALUE (t) = min;
+ TYPE_MAX_VALUE (t) = max;
+   }


it's definitely somewhat ugly but at least type_hash_canon doesn't hash
these for ENUMERAL_TYPE (but it does compare them!  which in principle
means it could as well hash them ...)

I think that if you read both from the same module that you should arrange
to read what you refer to first?  But maybe that's not the actual issue here.


*nod* reading in the enum before reading in the typedef seems like
the most direct solution, though not sure how to accomplish that :/


For LTO streaming we DFS walk tree edges from all entries into the tree
graph we want to stream, collecting and streaming SCCs.  Not sure if
doing similar for module streaming would help this case though.
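
For readers unfamiliar with the LTO approach mentioned above, here is a minimal sketch of a DFS walk that collects strongly connected components (Tarjan's algorithm). It is a generic graph illustration under stated assumptions, not GCC's streamer: the `Graph` alias and the `tarjan_sccs` name are invented for this example.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical adjacency-list graph: edges[v] lists the nodes v refers to.
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm: a single DFS that emits each strongly connected
// component only after every component it refers to has been emitted.
static std::vector<std::vector<int>> tarjan_sccs (const Graph &edges)
{
  const int n = static_cast<int> (edges.size ());
  std::vector<int> index (n, -1), low (n, 0), stack;
  std::vector<bool> on_stack (n, false);
  std::vector<std::vector<int>> sccs;
  int next_index = 0;

  std::function<void (int)> visit = [&] (int v)
  {
    index[v] = low[v] = next_index++;
    stack.push_back (v);
    on_stack[v] = true;
    for (int w : edges[v])
      {
        if (index[w] < 0)
          {
            // Tree edge: recurse, then inherit the lowest reachable index.
            visit (w);
            low[v] = std::min (low[v], low[w]);
          }
        else if (on_stack[w])
          // Back edge into the component currently being built.
          low[v] = std::min (low[v], index[w]);
      }
    if (low[v] == index[v])
      {
        // v is the root of a component: pop all of its members.
        std::vector<int> scc;
        int w;
        do
          {
            w = stack.back ();
            stack.pop_back ();
            on_stack[w] = false;
            scc.push_back (w);
          }
        while (w != v);
        sccs.push_back (scc);
      }
  };

  for (int v = 0; v < n; v++)
    if (index[v] < 0)
      visit (v);
  return sccs;
}
```

Because a component is emitted only after the components it points to, a reader consuming the output in order always sees what is referred to before the thing referring to it. Whether that ordering would help module streaming is exactly the open question raised above.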


FWIW I managed to obtain a more interesting reduction for this ICE, one
that doesn't use a typedef bound to the same name as the enum:

$ cat 106848_a.H
template<typename _T1>
struct pair {
   using type = void(*)(const _T1&);
};
struct _ScannerBase {
   enum _TokenT { _S_token_anychar };
   pair<_TokenT> _M_token_tbl;
};

$ cat 106848_b.C
import "106848_a.H";

using type = _ScannerBase;

$ g++ -fmodules-ts -g 106848_a.H 106848_b.C
106848_b.C:3:14: error: type variant differs by TYPE_MAX_VALUE



Like in the less interesting testcase, the problem is ultimately that we
create a variant of the enum (as part of reading in pair<_TokenT>::type)
before reading the enum's definition, thus the variant inherits stale
TYPE_MIN/MAX_VALUE.

Perhaps pair<_TokenT>::type should indirectly depend on the definition
of _TokenT -- but IIUC we generally don't require a type to be defined
in order to refer to it, so enforcing such a dependency would be a
pessimization I think.

So ISTM this isn't a dependency issue (pair<_TokenT>::type already
implicitly depends on the ENUMERAL_TYPE, just not also the enum's
defining TYPE_DECL), and the true issue is that we're streaming
TYPE_MIN/MAX_VALUE only as part of an enum's definition, which the
linked patch fixes.
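
The variant-propagation loop from the original patch can be pictured with a toy model. This is only a sketch: `toy_type`, `install_enum_def` and `variants_consistent` are hypothetical names, and the fields merely echo the TYPE_VALUES / TYPE_MIN_VALUE / TYPE_MAX_VALUE / TYPE_NEXT_VARIANT accessors seen in the diff.

```cpp
// Toy stand-in for a type node; not GCC's tree representation.
struct toy_type
{
  int values = 0;     // 0 stands for "definition not read in yet"
  int min_value = 0;
  int max_value = 0;
  toy_type *next_variant = nullptr;
};

// Install the enum definition on the main variant and on every variant
// chained after it, so that none of them keeps stale values.
static void install_enum_def (toy_type *main_variant,
                              int values, int min, int max)
{
  for (toy_type *t = main_variant; t; t = t->next_variant)
    {
      t->values = values;
      t->min_value = min;
      t->max_value = max;
    }
}

// Loosely mirrors the consistency check that failed: every variant must
// agree with the main variant on its min and max values.
static bool variants_consistent (const toy_type *main_variant)
{
  for (const toy_type *t = main_variant->next_variant; t; t = t->next_variant)
    if (t->min_value != main_variant->min_value
        || t->max_value != main_variant->max_value)
      return false;
  return true;
}
```

If only the main variant is updated, a variant created earlier still holds the old values and the check fails. Walking the whole chain is what makes the order of reading irrelevant in this model.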


Thanks for the explanation, it's a situation I didn't anticipate and your fix is 
good.  Could you add a comment about why you need to propagate the values though?


nathan






A somewhat orthogonal issue (that incidentally fixes this testcase) is
that we stream TYPE_MIN/MAX_VALUE only for enums with a definition, but
the frontend sets these fields even for opaque enums.  If we make sure
to stream these fields for all ENUMERAL_TYPEs, then we won't have to
worry about these fields being stale for variants that may have been
created before reading in the enum definition (their TYPE_VALUES field
will still be stale I guess, but verify_type doesn't worry about that
it seems, so we avoid the ICE).

patch to that effect is at
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603831.html



Richard.



rest_of_type_c

Re: Adding a new thread model to GCC

2022-10-21 Thread Jacek Caban via Gcc-patches


On 10/21/22 14:29, LIU Hao wrote:

On 2022/10/21 20:13, Jacek Caban wrote:


This has not been true for the past 15 years: CRITICAL_SECTIONs have used 
something like RtlWaitOnAddress (an equivalent of futexes) since Vista. See 
the Wine implementation for details:

https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/ntdll/sync.c#L190




Ah Jacek, nice to see you here.

I haven't dug into this too much, though. From my limited knowledge 
(mostly from reading disassembly) now CRITICAL_SECTION uses 
`NtWaitForAlertByThreadId` (and no longer keyed events or semaphores). 
As with `WaitOnAddress()`, there seems to be some global data 
structure, protected by a spin lock. It's just another undocumented 
syscall. Keyed events are still functional.



NtWaitForAlertByThreadId() is an underlying syscall that's used by 
WaitOnAddress(). Anyway, you don't need to worry about that if you just 
use public CRITICAL_SECTION APIs.



Jacek

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Jan Hubicka via Gcc-patches

> On Fri, Oct 21, 2022 at 12:00 PM Kumar, Venkataramanan via Gcc-patches
>  wrote:
> >
> > Hi all,
> >
> > > -Original Message-
> > > From: Joshi, Tejas Sanjay 
> > > Sent: Monday, October 17, 2022 8:09 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kumar, Venkataramanan ;
> > > honza.hubi...@gmail.com; Uros Bizjak 
> > > Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> > > Zen4 CPU
> > >
> > > [Public]
> > >
> > > Hi,
> > >
> > > > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> > > all four Zen schedulers.
> > >
> > > I have renamed the file to znver.md in this revision, PFA.
> > > Thank you for the review, we will push it for trunk if we don't get any
> > > further comments.
> >
> > I have pushed the patch on behalf of Tejas.
> 
> This grew insn-automata.cc from 201502 lines to 639968 lines and the build
> of the automata (genautomata) to several minutes in my dev tree.
> 
> You did something wrong.  Please fix!

I think it may make sense to make the initial patch without the scheduler
model update, reusing the zen3 scheduling.  I can work on updating the model,
which needs some benchmarking and setting up the cost tables first.
The problem here is that adding extra variants to the execution core model
likely forces too many states.

In general a DFA is not the best model for such a symmetric and parallel
execution core (since there are way too many combinations the individual
pipes may get into).  I was thinking of adding an option to generate an
alternative model based on bitmasks, but never got around to implementing
that.

So with the current infrastructure we always need to simplify a bit, which
is also not a big deal since the scheduling is not well documented
anyway and our model is not precise at all (it misses the on-chip
scheduler).
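
To make the bitmask idea a bit more concrete, here is a speculative sketch. It is not an existing genautomata option; `cycle_state` and `try_issue` are hypothetical names, and the model assumes each instruction may issue on any one pipe out of a fixed set.

```cpp
#include <cstdint>

// Hypothetical model: each bit of `busy` is one execution pipe that is
// already occupied in the current cycle.
struct cycle_state
{
  std::uint32_t busy = 0;
};

// Try to issue an instruction that may run on any pipe in `allowed`.
// Returns the chosen pipe number, or -1 if every allowed pipe is taken.
static int try_issue (cycle_state &state, std::uint32_t allowed)
{
  std::uint32_t free_pipes = allowed & ~state.busy;
  if (free_pipes == 0)
    return -1;
  // Isolate the lowest set bit, i.e. the first free pipe.
  std::uint32_t pick = free_pipes & (~free_pipes + 1);
  state.busy |= pick;
  // Convert the one-hot mask into a pipe number.
  int pipe = 0;
  while ((pick >>= 1) != 0)
    pipe++;
  return pipe;
}
```

The state here is one word per cycle rather than one automaton state per combination of occupied pipes. The price is that the greedy choice may take a pipe that a later, more constrained instruction needed, which a DFA would not do.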

Honza
> 
> Richard.
> 
> > Regards,
> > Venkat.
> >

Re: [PATCH 10/15 V2] arm: Implement cortex-M return signing address codegen





On 14/09/2022 15:20, Andrea Corallo via Gcc-patches wrote:

Hi all,

this patch enables return address signing and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This is signing LR using SP and storing the
result in R12.  R12 will be pushed into the stack.

During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.

Here is an example of a PAC-instrumented function prologue and epilogue:

void foo (void);

int main()
{
   foo ();
   return 0;
}

Compiled with '-march=armv8.1-m.main -mbranch-protection=pac-ret
-mthumb' translates into:

main:
	pac	ip, lr, sp
	push	{r3, r7, ip, lr}
	add	r7, sp, #0
	bl	foo
	movs	r3, #0
	mov	r0, r3
	pop	{r3, r7, ip, lr}
	aut	ip, lr, sp
	bx	lr

The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.

Ex. the previous example compiled with '-march=armv8.1-m.main
-mbranch-protection=pac-ret+bti -mthumb' translates into:

main:
	pacbti	ip, lr, sp
	push	{r3, r7, ip, lr}
	add	r7, sp, #0
	bl	foo
	movs	r3, #0
	mov	r0, r3
	pop	{r3, r7, ip, lr}
	aut	ip, lr, sp
	bx	lr

Following earlier upstream suggestions, a test for varargs has been
added, and '-mtpcs-frame' is now treated as incompatible with the return
address signing feature introduced here.

[1] 


gcc/Changelog

2021-11-03  Andrea Corallo  

* config/arm/arm.c: (arm_compute_frame_layout)
(arm_expand_prologue, thumb2_expand_return, arm_expand_epilogue)
(arm_conditional_register_usage): Update for pac codegen.
(arm_current_function_pac_enabled_p): New function.
* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
Add new patterns.
* config/arm/unspecs.md (UNSPEC_PAC_IP_LR_SP)
(UNSPEC_PACBTI_IP_LR_SP, UNSPEC_AUT_IP_LR_SP): Add unspecs.

gcc/testsuite/Changelog

2021-11-03  Andrea Corallo  

* gcc.target/arm/pac.h : New file.
* gcc.target/arm/pac-1.c : New test case.
* gcc.target/arm/pac-2.c : Likewise.
* gcc.target/arm/pac-3.c : Likewise.
* gcc.target/arm/pac-4.c : Likewise.
* gcc.target/arm/pac-5.c : Likewise.
* gcc.target/arm/pac-6.c : Likewise.
* gcc.target/arm/pac-7.c : Likewise.
* gcc.target/arm/pac-8.c : Likewise.



+  if (arm_current_function_pac_enabled_p () && !(arm_arch7 && arm_arch_cmse))
+    error ("This architecture does not support branch protection instructions");


This test feels wrong.  What does having cmse give us?  I suspect you 
want a test that ensures we have at least v8-m.main so that the NOP 
instructions are correctly defined as NOPs (or, in this case, PACBTI 
instructions) rather than unpredictable; but if that's the case then I 
think you really want to write the test that way here (perhaps in a 
macro) and then move this test into that so that it becomes 
self-documenting - but don't we have a v8-m.main test anyway?



+ if (arm_current_function_pac_enabled_p ())
+   {
+  gcc_assert (!(saved_regs_mask & (1 << PC_REGNUM)));
+ arm_emit_multi_reg_pop (saved_regs_mask);
+ emit_insn (gen_aut_nop ());
+ emit_jump_insn (simple_return_rtx);
+   }

The assert is using indents that are just spaces, but the other lines 
use tabs.  Please use tabs everywhere rather than mixing like this.


+/* Return TRUE if return address signing mechanism is enabled.  */
+bool
+arm_current_function_pac_enabled_p (void)
+{
+  return aarch_ra_sign_scope == AARCH_FUNCTION_ALL
+|| (aarch_ra_sign_scope == AARCH_FUNCTION_NON_LEAF
+   && !crtl->is_leaf);
+}

This is a case where you should use parentheses around the expression so 
that the continuation lines are correctly indented.
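
A sketch of what the parenthesized form might look like follows. The stand-in declarations exist only so the fragment compiles outside GCC: `AARCH_FUNCTION_NONE` and `current_function_is_leaf` (replacing `crtl->is_leaf`) are assumptions made for this example, and the sketch indents with spaces for readability.

```cpp
// Stand-ins so the fragment compiles on its own; only the two enumerators
// used below appear in the quoted hunk, the rest is assumed.
enum aarch_function_type
{
  AARCH_FUNCTION_NONE,
  AARCH_FUNCTION_NON_LEAF,
  AARCH_FUNCTION_ALL
};
static aarch_function_type aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
static bool current_function_is_leaf = true;  /* stands in for crtl->is_leaf */

/* Return TRUE if return address signing mechanism is enabled.  */
bool
arm_current_function_pac_enabled_p (void)
{
  /* The outer parentheses let an editor line up the continuation lines
     under the start of the expression.  */
  return (aarch_ra_sign_scope == AARCH_FUNCTION_ALL
          || (aarch_ra_sign_scope == AARCH_FUNCTION_NON_LEAF
              && !current_function_is_leaf));
}
```

The logic is unchanged from the quoted hunk; only the layout differs.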


@@ -11518,7 +11518,7 @@ (define_expand "prologue"
  arm_expand_prologue ();
else
  thumb1_expand_prologue ();
-  DONE;
+   DONE;
   "
 )

Although this is a trivial cleanup, it has nothing to do with this 
patch.  Please remove.


+  "arm_arch7 && arm_arch_cmse"

See my comments earlier about this test; the same applies here.

+   (unspec:SI [(reg:SI SP_REGNUM) (reg:SI LR_REGNUM)]
+   UNSPEC_PAC_NOP))]
+
Again you have a mix of lines indented with tabs and lines indented with 
just spaces.  Similarly with pacbti_nop and aut_nop.


Do you have a test for the nested functions case (I can't see it, but 
perhaps I've missed it somewhere)?


R.

Re: [PATCH 13/15] arm: Add pacbti related multilib support for armv8.1-m.main.





On 12/08/2022 18:10, Srinath Parvathaneni via Gcc-patches wrote:

  Hi,

This patch supports the following -march/-mbranch-protection combinations by
linking them to the existing pacbti multilibs.

$ -march=armv8.1-m.main+pacbti+fp.dp+mve.fp -mbranch-protection=standard -mfloat-abi=hard -mthumb
$ -march=armv8.1-m.main+pacbti+fp.dp+mve -mbranch-protection=standard -mfloat-abi=hard -mthumb
$ -march=armv8.1-m.main+dsp+pacbti+fp.dp -mbranch-protection=standard -mfloat-abi=hard -mthumb

Regression tested on arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-08-12  Srinath Parvathaneni  

 * config/arm/t-rmprofile: Add pacbti multilib variants.

gcc/testsuite/ChangeLog:

2022-08-12  Srinath Parvathaneni  

 * gcc.target/arm/pac-10.c: New test.
 * gcc.target/arm/pac-11.c: Likewise.
 * gcc.target/arm/pac-12.c: Likewise.


Please resend with a correctly attached patch.  You've used octet-stream 
rather than a text format.


R.

Re: [PING][PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M





On 21/09/2022 09:07, Andrea Corallo via Gcc-patches wrote:

Hi all,

ping^2 for patches 9/15 7/15 11/15 12/15 and 10/15 V2 of this series.

   Andrea


Subject says xx/15, but I only see 1-12 from you.

R.

Re: [PATCH] c++ modules: verify_type failure with typedef enum [PR106848]

2022-10-21 Thread Patrick Palka via Gcc-patches

On Fri, 21 Oct 2022, Nathan Sidwell wrote:

> On 10/19/22 09:55, Patrick Palka wrote:
> > On Wed, 19 Oct 2022, Richard Biener wrote:
> > 
> > > On Tue, Oct 18, 2022 at 8:26 PM Patrick Palka  wrote:
> > > > 
> > > > On Fri, 14 Oct 2022, Richard Biener wrote:
> > > > 
> > > > > On Thu, Oct 13, 2022 at 5:40 PM Patrick Palka via Gcc-patches
> > > > >  wrote:
> > > > > > 
> > > > > > Here during stream in we end up having created a type variant for
> > > > > > the enum
> > > > > > before we read the enum's definition, and thus the variant inherited
> > > > > > stale
> > > > > > TYPE_VALUES and TYPE_MIN/MAX_VALUES, which leads to an ICE (with
> > > > > > -g).  The
> > > > > > stale variant got created from set_underlying_type during earlier
> > > > > > stream in
> > > > > > of the (redundant) typedef for the enum.
> > > > > > 
> > > > > > This patch works around this by setting TYPE_VALUES and
> > > > > > TYPE_MIN/MAX_VALUES
> > > > > > for all variants when reading in an enum definition.  Does this look
> > > > > > like
> > > > > > the right approach?  Or perhaps we need to arrange that we read the
> > > > > > enum
> > > > > > definition before reading in the typedef decl?  Note that seems to
> > > > > > be an
> > > > > > issue only when the typedef name and enum names are the same (thus
> > > > > > the
> > > > > > typedef is redundant), otherwise we seem to read the enum definition
> > > > > > first
> > > > > > as desired.
> > > > > > 
> > > > > >  PR c++/106848
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > >  * module.cc (trees_in::read_enum_def): Set the TYPE_VALUES,
> > > > > >  TYPE_MIN_VALUE and TYPE_MAX_VALUE of all type variants.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > >  * g++.dg/modules/enum-9_a.H: New test.
> > > > > >  * g++.dg/modules/enum-9_b.C: New test.
> > > > > > ---
> > > > > >   gcc/cp/module.cc| 9 ++---
> > > > > >   gcc/testsuite/g++.dg/modules/enum-9_a.H | 5 +
> > > > > >   gcc/testsuite/g++.dg/modules/enum-9_b.C | 6 ++
> > > > > >   3 files changed, 17 insertions(+), 3 deletions(-)
> > > > > >   create mode 100644 gcc/testsuite/g++.dg/modules/enum-9_a.H
> > > > > >   create mode 100644 gcc/testsuite/g++.dg/modules/enum-9_b.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > > > > index 7ffeefa7c1f..97fb80bcd44 100644
> > > > > > --- a/gcc/cp/module.cc
> > > > > > +++ b/gcc/cp/module.cc
> > > > > > @@ -12303,9 +12303,12 @@ trees_in::read_enum_def (tree defn, tree
> > > > > > maybe_template)
> > > > > > 
> > > > > > if (installing)
> > > > > >   {
> > > > > > -  TYPE_VALUES (type) = values;
> > > > > > -  TYPE_MIN_VALUE (type) = min;
> > > > > > -  TYPE_MAX_VALUE (type) = max;
> > > > > > +  for (tree t = type; t; t = TYPE_NEXT_VARIANT (t))
> > > > > > +   {
> > > > > > + TYPE_VALUES (t) = values;
> > > > > > + TYPE_MIN_VALUE (t) = min;
> > > > > > + TYPE_MAX_VALUE (t) = max;
> > > > > > +   }
> > > > > 
> > > > > it's definitely somewhat ugly but at least type_hash_canon doesn't
> > > > > hash
> > > > > these for ENUMERAL_TYPE (but it does compare them!  which in principle
> > > > > means it could as well hash them ...)
> > > > > 
> > > > > I think that if you read both from the same module that you should
> > > > > arrange
> > > > > to read what you refer to first?  But maybe that's not the actual
> > > > > issue here.
> > > > 
> > > > *nod* reading in the enum before reading in the typedef seems like
> > > > the most direct solution, though not sure how to accomplish that :/
> > > 
> > > For LTO streaming we DFS walk tree edges from all entries into the tree
> > > graph we want to stream, collecting and streaming SCCs.  Not sure if
> > > doing similar for module streaming would help this case though.
> > 
> > FWIW I managed to obtain a more interesting reduction for this ICE, one
> > that doesn't use a typedef bound to the same name as the enum:
> > 
> > $ cat 106848_a.H
> > template<typename _T1>
> > struct pair {
> >using type = void(*)(const _T1&);
> > };
> > struct _ScannerBase {
> >enum _TokenT { _S_token_anychar };
> >pair<_TokenT> _M_token_tbl;
> > };
> > 
> > $ cat 106848_b.C
> > import "106848_a.H";
> > 
> > using type = _ScannerBase;
> > 
> > $ g++ -fmodules-ts -g 106848_a.H 106848_b.C
> > 106848_b.C:3:14: error: type variant differs by TYPE_MAX_VALUE
> > 
> > 
> > 
> > Like in the less interesting testcase, the problem is ultimately that we
> > create a variant of the enum (as part of reading in pair<_TokenT>::type)
> > before reading the enum's definition, thus the variant inherits stale
> > TYPE_MIN/MAX_VALUE.
> > 
> > Perhaps pair<_TokenT>::type should indirectly depend on the definition
> > of _TokenT -- but IIUC we generally don't require a type to be defined
> > in order to refer to it, so enforcing such a dependency would be a
> > pessimization I think.
> > 
>

[PATCH] Rename nonzero_bits to known_zero_bits.

2022-10-21 Thread Aldy Hernandez via Gcc-patches

The name nonzero_bits is confusing.  We're not tracking nonzero bits.
We're tracking known-zero bits, or at worst we're tracking "maybe
nonzero bits".  But really, the only thing we're sure about in the
"nonzero" bits are the bits that are zero, which are known to be 0.

I know we've been carrying around this name forever, but the fact that
both of the maintainers of the code *HATE* it should be telling.
We'd also like to track known-one bits in the irange, so it's
best to keep the nomenclature consistent.
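
A small sketch of why "known zero" is the accurate reading. It assumes an unsigned range [0, max] and makes no claim about how irange actually stores its mask; the three function names are hypothetical.

```cpp
#include <cstdint>

// Bits that *may* be nonzero for some value in [0, max]: every bit at or
// below the highest set bit of max.
static std::uint32_t maybe_nonzero_mask (std::uint32_t max)
{
  std::uint32_t mask = max;
  mask |= mask >> 1;
  mask |= mask >> 2;
  mask |= mask >> 4;
  mask |= mask >> 8;
  mask |= mask >> 16;
  return mask;
}

// The complement is the part we actually know: these bits are zero.
static std::uint32_t known_zero_mask (std::uint32_t max)
{
  return ~maybe_nonzero_mask (max);
}

// A value is compatible with the mask if none of the known-zero bits
// is set in it.
static bool compatible_p (std::uint32_t value, std::uint32_t max)
{
  return (value & known_zero_mask (max)) == 0;
}
```

For max = 5 the "maybe nonzero" mask is 0x7, yet the value 6 passes `compatible_p` even though it lies outside the range. The set bits promise nothing; only the clear bits carry information, which is the point of the rename.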

Andrew, are you ok with this naming, or would you prefer something
else?

gcc/ChangeLog:

* asan.cc (handle_builtin_alloca): Rename *nonzero* to *known_zero*.
* fold-const.cc (expr_not_equal_to): Same.
(tree_nonzero_bits): Same.
* gimple-range-op.cc: Same.
* ipa-cp.cc (ipcp_bits_lattice::get_value_and_mask): Same.
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
(ipcp_update_bits): Same.
* match.pd: Same.
* range-op.cc (operator_lt::fold_range): Same.
(operator_cast::fold_range): Same.
(operator_bitwise_and::fold_range): Same.
(set_nonzero_range_from_mask): Same.
(set_known_zero_range_from_mask): Same.
(operator_bitwise_and::simple_op1_range_solver): Same.
(operator_bitwise_and::op1_range): Same.
(range_op_cast_tests): Same.
(range_op_bitwise_and_tests): Same.
* tree-data-ref.cc (split_constant_offset): Same.
* tree-ssa-ccp.cc (get_default_value): Same.
(ccp_finalize): Same.
(evaluate_stmt): Same.
* tree-ssa-dom.cc
(dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): Same.
* tree-ssa-reassoc.cc (optimize_range_tests_var_bound): Same.
* tree-ssanames.cc (set_nonzero_bits): Same.
(set_known_zero_bits): Same.
(get_nonzero_bits): Same.
(get_known_zero_bits): Same.
(ssa_name_has_boolean_range): Same.
* tree-ssanames.h (set_nonzero_bits): Same.
(get_nonzero_bits): Same.
(set_known_zero_bits): Same.
(get_known_zero_bits): Same.
* tree-vect-patterns.cc (vect_get_range_info): Same.
* tree-vrp.cc (maybe_set_nonzero_bits): Same.
(maybe_set_known_zero_bits): Same.
(vrp_asserts::remove_range_assertions): Same.
* tree-vrp.h (maybe_set_nonzero_bits): Same.
(maybe_set_known_zero_bits): Same.
* tree.cc (tree_ctz): Same.
* value-range-pretty-print.cc
(vrange_printer::print_irange_bitmasks): Same.
* value-range-storage.cc (irange_storage_slot::set_irange): Same.
(irange_storage_slot::get_irange): Same.
(irange_storage_slot::dump): Same.
* value-range-storage.h: Same.
* value-range.cc (irange::operator=): Same.
(irange::copy_to_legacy): Same.
(irange::irange_set): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::legacy_equal_p): Same.
(irange::operator==): Same.
(irange::contains_p): Same.
(irange::irange_single_pair_union): Same.
(irange::irange_union): Same.
(irange::irange_intersect): Same.
(irange::invert): Same.
(irange::get_nonzero_bits_from_range): Same.
(irange::get_known_zero_bits_from_range): Same.
(irange::set_range_from_nonzero_bits): Same.
(irange::set_range_from_known_zero_bits): Same.
(irange::set_nonzero_bits): Same.
(irange::set_known_zero_bits): Same.
(irange::get_nonzero_bits): Same.
(irange::get_known_zero_bits): Same.
(irange::intersect_nonzero_bits): Same.
(irange::intersect_known_zero_bits): Same.
(irange::union_nonzero_bits): Same.
(irange::union_known_zero_bits): Same.
(range_tests_nonzero_bits): Same.
* value-range.h (irange::varying_compatible_p): Same.
(gt_ggc_mx): Same.
(gt_pch_nx): Same.
(irange::set_undefined): Same.
(irange::set_varying): Same.
---
 gcc/asan.cc |   2 +-
 gcc/fold-const.cc   |   4 +-
 gcc/gimple-range-op.cc  |   2 +-
 gcc/ipa-cp.cc   |   2 +-
 gcc/ipa-prop.cc |   4 +-
 gcc/match.pd|  14 +--
 gcc/range-op.cc |  28 +++---
 gcc/tree-data-ref.cc|   2 +-
 gcc/tree-ssa-ccp.cc |   8 +-
 gcc/tree-ssa-dom.cc |   2 +-
 gcc/tree-ssa-reassoc.cc |   4 +-
 gcc/tree-ssanames.cc|  14 +--
 gcc/tree-ssanames.h |   4 +-
 gcc/tree-vect-patterns.cc   |   2 +-
 gcc/tree-vrp.cc |   6 +-
 gcc/tree-vrp.h  |   2 +-
 gcc/tree.cc |   2 +-
 gcc/value-range-pretty-print.cc |   2 +-
 gcc/value-range-storage.cc  |   6 +-
 gcc/value-range-storage.h

Re: [PATCH] c++, v2: Don't shortcut TREE_CONSTANT vector type CONSTRUCTORs in cxx_eval_constant_expression [PR107295]

2022-10-21 Thread Jason Merrill via Gcc-patches


On 10/21/22 03:30, Jakub Jelinek wrote:

On Thu, Oct 20, 2022 at 10:51:14AM -0400, Jason Merrill wrote:

That seems like a bug; for VECTOR_TYPE we should fold even if !changed.


Also, the reason for the short-cutting is I think trying to avoid
allocating a new CONSTRUCTOR when nothing changes and we just create
GC garbage by it.


We might limit the shortcut to non-vector types by hoisting the vector check
in reduced_constant_expression_p out of the CONSTRUCTOR_NO_CLEARING
condition:


   if (CONSTRUCTOR_NO_CLEARING (t))
 {
   if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
 /* An initialized vector would have a VECTOR_CST.  */
 return false;


then we could remove the fold in the shortcut.


Ok, so like this?
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-10-21  Jakub Jelinek  

PR c++/107295
* constexpr.cc (reduced_constant_expression_p) <case CONSTRUCTOR>:
Return false for VECTOR_TYPE CONSTRUCTORs even without
CONSTRUCTOR_NO_CLEARING set on them.
(cxx_eval_bare_aggregate): If constant but !changed, fold before
returning VECTOR_TYPE_P CONSTRUCTOR.
(cxx_eval_constant_expression) <case CONSTRUCTOR>: Don't fold
TREE_CONSTANT CONSTRUCTOR, just return it.

* g++.dg/ext/vector42.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-10-19 11:20:28.960225787 +0200
+++ gcc/cp/constexpr.cc 2022-10-20 18:43:42.952440364 +0200
@@ -3104,12 +3104,12 @@ reduced_constant_expression_p (tree t)
  case CONSTRUCTOR:
/* And we need to handle PTRMEM_CST wrapped in a CONSTRUCTOR.  */
tree field;
+  if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
+   /* An initialized vector would have a VECTOR_CST.  */
+   return false;
if (CONSTRUCTOR_NO_CLEARING (t))
{
- if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
-   /* An initialized vector would have a VECTOR_CST.  */
-   return false;
- else if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
+ if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
{
  /* There must be a valid constant initializer at every array
 index.  */
@@ -4956,8 +4956,14 @@ cxx_eval_bare_aggregate (const constexpr
  TREE_SIDE_EFFECTS (ctx->ctor) = side_effects_p;
}
  }
-  if (*non_constant_p || !changed)
+  if (*non_constant_p)
  return t;
+  if (!changed)
+{
+  if (VECTOR_TYPE_P (type))
+   t = fold (t);
+  return t;
+}
t = ctx->ctor;
if (!t)
  t = build_constructor (type, NULL);
@@ -7387,11 +7393,10 @@ cxx_eval_constant_expression (const cons
  case CONSTRUCTOR:
if (TREE_CONSTANT (t) && reduced_constant_expression_p (t))
{
- /* Don't re-process a constant CONSTRUCTOR, but do fold it to
-VECTOR_CST if applicable.  */
+ /* Don't re-process a constant CONSTRUCTOR.  */
  verify_constructor_flags (t);
  if (TREE_CONSTANT (t))
-   return fold (t);
+   return t;
}
r = cxx_eval_bare_aggregate (ctx, t, lval,
   non_constant_p, overflow_p);
--- gcc/testsuite/g++.dg/ext/vector42.C.jj  2022-10-20 17:57:42.767848544 +0200
+++ gcc/testsuite/g++.dg/ext/vector42.C 2022-10-20 17:57:42.767848544 +0200
@@ -0,0 +1,12 @@
+// PR c++/107295
+// { dg-do compile { target c++11 } }
+
+template <typename T> struct A {
+  typedef T __attribute__((vector_size (sizeof (int)))) V;
+};
+template <int, typename T> using B = typename A<T>::V;
+template <typename T> using V = B<4, T>;
+using F = V<float>;
+constexpr F a = F () + 0.0f;
+constexpr F b = F () + (float) 0.0;
+constexpr F c = F () + (float) 0.0L;


Jakub

[PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes.

2022-10-21 Thread Dimitrije Milosevic

Architectures like Mips are very limited when it comes to addressing modes.
Therefore, the expected behavior would be that complexity is lower for the
BASE + OFFSET addressing mode and higher for more complex addressing modes
(e.g. BASE + INDEX << SCALE), which are not supported.  Currently, the
complexity calculation algorithm bails out if the BASE + INDEX addressing
mode is not supported by the target architecture, resulting in 0-complexities
for all candidates, which leads to non-optimal candidate selection,
especially in scenarios where there are multiple nested loops.

Additionally, when bumping up the register pressure cost, the number of
invariants should also be considered, in addition to the number of
candidates.

Dimitrije Milosevic (2):
  ivopts: Revert computation of address cost complexity.
  ivopts: Consider number of invariants when calculating register pressure.

 gcc/tree-ssa-address.cc |   2 +-
 gcc/tree-ssa-address.h  |   2 +
 gcc/tree-ssa-loop-ivopts.cc | 220 +---
 3 files changed, 210 insertions(+), 14 deletions(-)
---
2.25.1

[PATCH 1/2] ivopts: Revert computation of address cost complexity.

2022-10-21 Thread Dimitrije Milosevic

From: Dimitrije Milošević 

This patch reverts the computation of address cost complexity
to the legacy one. After f9f69dd, complexity is calculated
using the valid_mem_ref_p target hook. Architectures like
Mips only allow BASE + OFFSET addressing modes, which in turn
prevents the calculation of complexity for other addressing
modes, resulting in non-optimal candidate selection.
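
As a rough illustration of a part-counting complexity measure, here is a sketch under stated assumptions. It is not taken from the patch: `addr_parts`, `target_addr_info`, `address_complexity` and the two multiplier predicates are hypothetical, and which terms are counted is an assumption about the general shape of such a measure.

```cpp
// Hypothetical description of an address of the form
//   symbol + var + offset + index * ratio
struct addr_parts
{
  bool symbol_present;
  bool var_present;
  long offset;
  long ratio;
};

// Hypothetical target description.
struct target_addr_info
{
  long min_offset;                // smallest offset encodable in BASE + OFFSET
  long max_offset;                // largest offset encodable in BASE + OFFSET
  bool (*multiplier_ok) (long);   // can INDEX * ratio be used in an address?
};

// A target with no scaled-index addressing at all.
static bool no_scaled_index (long)
{
  return false;
}

// A target that accepts scale factors 2, 4 and 8.
static bool scale_2_4_8 (long ratio)
{
  return ratio == 2 || ratio == 4 || ratio == 8;
}

// Count the address parts that are present and that the target can encode.
static unsigned address_complexity (const addr_parts &a,
                                    const target_addr_info &t)
{
  bool offset_p = (a.offset != 0
                   && a.offset >= t.min_offset
                   && a.offset <= t.max_offset);
  bool ratio_p = a.ratio != 1 && t.multiplier_ok (a.ratio);
  return (a.symbol_present ? 1u : 0u) + (a.var_present ? 1u : 0u)
         + (offset_p ? 1u : 0u) + (ratio_p ? 1u : 0u);
}
```

The point of the sketch is only that the measure yields distinct, nonzero values on a target with BASE + OFFSET addressing alone, so candidates can still be ranked against each other.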

gcc/ChangeLog:

* tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
to non-static.
* tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
* tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
(compute_min_and_max_offset): Likewise.
(get_address_cost): Revert
complexity calculation.

Signed-off-by: Dimitrije Milosevic 
---
 gcc/tree-ssa-address.cc |   2 +-
 gcc/tree-ssa-address.h  |   2 +
 gcc/tree-ssa-loop-ivopts.cc | 214 ++--
 3 files changed, 207 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
index ba7b7c93162..442f54f0165 100644
--- a/gcc/tree-ssa-address.cc
+++ b/gcc/tree-ssa-address.cc
@@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
validity for a memory reference accessing memory of mode MODE in address
space AS.  */
 
-static bool
+bool
 multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
 addr_space_t as)
 {
diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
index 95143a099b9..09f36ee2f19 100644
--- a/gcc/tree-ssa-address.h
+++ b/gcc/tree-ssa-address.h
@@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
 class aff_tree *, tree, tree, tree, bool);
 extern void copy_ref_info (tree, tree);
 tree maybe_fold_tmr (tree);
+bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
+addr_space_t as);
 
 extern unsigned int preferred_mem_scale_factor (tree base,
machine_mode mem_mode,
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index a6f926a68ef..d53ba05a4f6 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
   return infinite_cost;
 }
 
+static void
+compute_symbol_and_var_present (tree e1, tree e2,
+   bool *symbol_present, bool *var_present)
+{
+  poly_uint64_pod off1, off2;
+
+  e1 = strip_offset (e1, &off1);
+  e2 = strip_offset (e2, &off2);
+
+  STRIP_NOPS (e1);
+  STRIP_NOPS (e2);
+
+  if (TREE_CODE (e1) == ADDR_EXPR)
+{
+  poly_int64_pod diff;
+  if (ptr_difference_const (e1, e2, &diff))
+  {
+*symbol_present = false;
+*var_present = false;
+return;
+  }
+
+  if (integer_zerop (e2))
+  {
+tree core;
+poly_int64_pod bitsize;
+poly_int64_pod bitpos;
+widest_int mul;
+tree toffset;
+machine_mode mode;
+int unsignedp, reversep, volatilep;
+
+core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
+  &toffset, &mode, &unsignedp, &reversep, &volatilep);
+
+if (toffset != 0
+|| !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
+|| reversep
+|| !VAR_P (core))
+  {
+*symbol_present = false;
+*var_present = true;
+return;
+  }
+
+if (TREE_STATIC (core)
+|| DECL_EXTERNAL (core))
+  {
+*symbol_present = true;
+*var_present = false;
+return;
+  }
+
+*symbol_present = false;
+*var_present = true;
+return;
+  }
+
+  *symbol_present = false;
+  *var_present = true;
+}
+  *symbol_present = false;
+
+  if (operand_equal_p (e1, e2, 0))
+{
+  *var_present = false;
+  return;
+}
+
+  *var_present = true;
+}
+
+static void
+compute_min_and_max_offset (addr_space_t as,
+   machine_mode mem_mode, poly_int64_pod *min_offset,
+   poly_int64_pod *max_offset)
+{
+  machine_mode address_mode = targetm.addr_space.address_mode (as);
+  HOST_WIDE_INT i;
+  poly_int64_pod off, width;
+  rtx addr;
+  rtx reg1;
+
+  reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
+
+  width = GET_MODE_BITSIZE (address_mode) - 1;
+  if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
+ width = HOST_BITS_PER_WIDE_INT - 1;
+  gcc_assert (width.is_constant ());
+  addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
+
+  off = 0;
+  for (i = width.to_constant (); i >= 0; i--)
+{
+  off = -(HOST_WIDE_INT_1U << i);
+  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+  if (memory_address_addr_space_p (mem_mode, addr, as))
+break;
+}
+  if (i == -1)
+*min_offset = 0;
+  else
+*min_offset = off;
+  // *min_offset = (i == -1? 0 : off);
+
+  for (i = width.to_constant (); i >= 0; i--)
+{
+  off = (HOST_WIDE_INT_1U << i) - 1;
+  XEXP (addr, 1) = gen_int_mode (off, address_mo

[PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.

2022-10-21 Thread Dimitrije Milosevic

From: Dimitrije Milošević 

This patch slightly modifies the register pressure model function to consider
both the number of invariants and the number of candidates, rather than
just the number of candidates. This used to be the case before c18101f.
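
The effect of the change can be sketched with a simplified cost model. Only the last branch and the final return mirror the hunk in this patch; the earlier branches, the `pressure_params` struct and the `count_invs` switch are assumptions added so the example is self-contained.

```cpp
// Hypothetical bundle of target parameters.
struct pressure_params
{
  unsigned regs_used;       // registers already live in the loop
  unsigned available_regs;  // registers the allocator can hand out
  unsigned reserved_regs;   // safety margin kept free
  unsigned reg_cost;
  unsigned spill_cost;
};

// Simplified register pressure estimate.  COUNT_INVS selects between the
// behavior before the patch (false) and after it (true).
static unsigned estimate_reg_pressure (const pressure_params &p,
                                       unsigned n_invs, unsigned n_cands,
                                       bool count_invs)
{
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + p.regs_used;
  unsigned cost;

  if (regs_needed + p.reserved_regs < p.available_regs)
    // Plenty of registers: the cost is just the number of new ones.
    cost = n_new;
  else if (regs_needed <= p.available_regs)
    // Close to running out: every register counts.
    cost = p.reg_cost * regs_needed;
  else if (n_cands <= p.available_regs)
    // Out of registers, but the candidates alone still fit.
    cost = p.reg_cost * p.available_regs
           + p.spill_cost * (regs_needed - p.available_regs);
  else
    // Even the candidates do not fit: spilling them is penalized twice.
    cost = p.reg_cost * p.available_regs
           + p.spill_cost * (n_cands - p.available_regs) * 2
           + p.spill_cost * (regs_needed - n_cands);

  // Final bump: with the patch, invariants are counted as well.
  return cost + (count_invs ? n_invs : 0) + n_cands;
}
```

With no invariants both variants agree. The patched form only separates candidate sets that differ in how many invariants they need.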

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.

Signed-off-by: Dimitrije Milosevic 
---
 gcc/tree-ssa-loop-ivopts.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index d53ba05a4f6..9d0b669d671 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
   + target_spill_cost [speed] * (n_cands - available_regs) * 2
   + target_spill_cost [speed] * (regs_needed - n_cands);
 
-  /* Finally, add the number of candidates, so that we prefer eliminating
- induction variables if possible.  */
-  return cost + n_cands;
+  /* Finally, add the number of invariants and the number of candidates,
+ so that we prefer eliminating induction variables if possible.  */
+  return cost + n_invs + n_cands;
 }
 
 /* For each size of the induction variable set determine the penalty.  */
-- 
2.25.1

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Joshi, Tejas Sanjay via Gcc-patches

[AMD Official Use Only - General]

Hi,

> I think it may make sense to make the initial patch without scheduler model
> update with zen3 scheduling.  I can work on updating the model which needs
> some benchmarking and setting up the cost tables first.
> The problem here is that adding extra variants to execution core model likely
> forces too many states.

Okay, I will prepare another patch which reverts the znver4 instruction 
reservations and submit it.

Thanks and Regards,
Tejas

Re: [PATCH zero-call-used-regs] Add leafy mode for zero-call-used-regs

2022-10-21 Thread Qing Zhao via Gcc-patches

Hi, Alexandre,

Could you please explain a little bit on the motivation of this patch first?

thanks.

Qing

> On Oct 21, 2022, at 3:31 AM, Alexandre Oliva  wrote:
> 
> Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
> nonleaf functions, respectively.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
>   * doc/extend.texi (zero-call-used-regs): Document leafy and
>   variants thereof.
>   * flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
>   LEAFY and variants.
>   * function.cc (gen_call_used_regs_seq): Set only_used for leaf
>   functions in leafy mode.
>   * opts.cc (zero_call_used_regs_opts): Add leafy and variants.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * c-c++-common/zero-scratch-regs-leafy-1.c: New.
>   * c-c++-common/zero-scratch-regs-leafy-2.c: New.
>   * gcc.target/i386/zero-scratch-regs-leafy-1.c: New.
>   * gcc.target/i386/zero-scratch-regs-leafy-2.c: New.
> ---
> gcc/doc/extend.texi|   22 ++--
> gcc/flag-types.h   |5 +
> gcc/function.cc|3 +++
> gcc/opts.cc|4 
> .../c-c++-common/zero-scratch-regs-leafy-1.c   |   15 ++
> .../c-c++-common/zero-scratch-regs-leafy-2.c   |   21 +++
> .../gcc.target/i386/zero-scratch-regs-leafy-1.c|   12 +++
> .../gcc.target/i386/zero-scratch-regs-leafy-2.c|   16 +++
> 8 files changed, 96 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-2.c
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 04af0584d82cc..bf11956c467fb 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4391,10 +4391,28 @@ zeros all call-used registers that pass arguments.
> @item all-gpr-arg
> zeros all call-used general purpose registers that pass
> arguments.
> +
> +@item leafy
> +Same as @samp{used} in a leaf function, and same as @samp{all} in a
> +nonleaf function.
> +
> +@item leafy-gpr
> +Same as @samp{used-gpr} in a leaf function, and same as @samp{all-gpr}
> +in a nonleaf function.
> +
> +@item leafy-arg
> +Same as @samp{used-arg} in a leaf function, and same as @samp{all-arg}
> +in a nonleaf function.
> +
> +@item leafy-gpr-arg
> +Same as @samp{used-gpr-arg} in a leaf function, and same as
> +@samp{all-gpr-arg} in a nonleaf function.
> +
> @end table
> 
> -Of this list, @samp{used-arg}, @samp{used-gpr-arg}, @samp{all-arg},
> -and @samp{all-gpr-arg} are mainly used for ROP mitigation.
> +Of this list, @samp{used-arg}, @samp{used-gpr-arg}, @samp{leafy-arg},
> +@samp{leafy-gpr-arg}, @samp{all-arg}, and @samp{all-gpr-arg} are mainly
> +used for ROP mitigation.
> 
> The default for the attribute is controlled by @option{-fzero-call-used-regs}.
> @end table
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index d2e751060ffce..b90c85167dcd4 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -338,6 +338,7 @@ namespace zero_regs_flags {
>   const unsigned int ONLY_GPR = 1UL << 2;
>   const unsigned int ONLY_ARG = 1UL << 3;
>   const unsigned int ENABLED = 1UL << 4;
> +  const unsigned int LEAFY_MODE = 1UL << 5;
>   const unsigned int USED_GPR_ARG = ENABLED | ONLY_USED | ONLY_GPR | ONLY_ARG;
>   const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>   const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
> @@ -346,6 +347,10 @@ namespace zero_regs_flags {
>   const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
>   const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
>   const unsigned int ALL = ENABLED;
> +  const unsigned int LEAFY_GPR_ARG = ENABLED | LEAFY_MODE | ONLY_GPR | ONLY_ARG;
> +  const unsigned int LEAFY_GPR = ENABLED | LEAFY_MODE | ONLY_GPR;
> +  const unsigned int LEAFY_ARG = ENABLED | LEAFY_MODE | ONLY_ARG;
> +  const unsigned int LEAFY = ENABLED | LEAFY_MODE;
> }
> 
> /* Settings of flag_incremental_link.  */
> diff --git a/gcc/function.cc b/gcc/function.cc
> index 6474a663b30b8..16582e698041a 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -5879,6 +5879,9 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int zero_regs_type)
>   only_used = zero_regs_type & ONLY_USED;
>   only_arg = zero_regs_type & ONLY_ARG;
> 
> +  if ((zero_regs_type & LEAFY_MODE) && leaf_function_p ())
> +only_used = true;
> +
>   /* For each of the hard registers, we should zero it if:
>   1. it is a call-used register;
>   and 2. it is not a fixed register;
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index ae079fcd20eea..39f6a1b278dc6 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -2099,6
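
The selection logic in the quoted function.cc hunk can be sketched in isolation as below. This is only an illustration, not the patch itself: the flag values for ONLY_GPR, ONLY_ARG, ENABLED and LEAFY_MODE are taken from the quoted flag-types.h hunk, the value of ONLY_USED is an assumption (its definition is not shown in the hunk), and effective_only_used and its is_leaf parameter are hypothetical stand-ins for the code around leaf_function_p ().

```cpp
#include <cassert>

// Flag bits as in the quoted flag-types.h hunk.  ONLY_USED's value is an
// assumption: the hunk does not show where it is defined.
namespace zero_regs_flags {
  const unsigned int ONLY_USED = 1U << 1;   // assumed value
  const unsigned int ONLY_GPR = 1U << 2;
  const unsigned int ONLY_ARG = 1U << 3;
  const unsigned int ENABLED = 1U << 4;
  const unsigned int LEAFY_MODE = 1U << 5;
  const unsigned int USED = ENABLED | ONLY_USED;
  const unsigned int ALL = ENABLED;
  const unsigned int LEAFY = ENABLED | LEAFY_MODE;
  const unsigned int LEAFY_GPR = ENABLED | LEAFY_MODE | ONLY_GPR;
}

// Hypothetical helper mirroring the only_used computation in the
// function.cc hunk; is_leaf stands in for leaf_function_p ().
inline bool
effective_only_used (unsigned int zero_regs_type, bool is_leaf)
{
  using namespace zero_regs_flags;
  bool only_used = (zero_regs_type & ONLY_USED) != 0;
  if ((zero_regs_type & LEAFY_MODE) && is_leaf)
    only_used = true;
  return only_used;
}

// Small self-check: leafy behaves like "used" in a leaf function and like
// "all" in a nonleaf function.
inline void
self_check ()
{
  using namespace zero_regs_flags;
  assert (effective_only_used (LEAFY, true)
	  == effective_only_used (USED, true));
  assert (effective_only_used (LEAFY, false)
	  == effective_only_used (ALL, false));
}
```

The GPR and ARG restrictions are untouched by the leafy bit, so leafy-gpr differs from used-gpr and all-gpr only in how only_used is derived.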

Re: [PING][PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M

2022-10-21 Thread Andrea Corallo via Gcc-patches

Richard Earnshaw  writes:

> On 21/09/2022 09:07, Andrea Corallo via Gcc-patches wrote:
>> Hi all,
>> ping^2 for patches 9/15 7/15 11/15 12/15 and 10/15 V2 of this
>> series.
>>Andrea
>
> Subject says xx/15, but I only see 1-12 from you.
>
> R.

Yeah, at the time Srinath asked me to leave space for three more patches
to add to the series, but then he posted only 13/15, I guess squashing
the code into one patch.

  Andrea

[RFC] how to handle the combination of -fstrict-flex-arrays + -Warray-bounds

2022-10-21 Thread Qing Zhao via Gcc-patches

Hi,

(FAM below refers to Flexible Array Members):

I need input on how to handle the combination of -fstrict-flex-arrays and
-Warray-bounds.

Our initial goal was to update -Warray-bounds to issue warnings according
to the level N of -fstrict-flex-arrays=N.
However, after detailed study, I found that this goal is very hard to
achieve.

1. -fstrict-flex-arrays and its levels

The new option -fstrict-flex-arrays has 4 levels:

level   trailing arrays     notes
        treated as FAM

  0     [],[0],[1],[n]      the default without option
  1     [],[0],[1]
  2     [],[0]
  3     []                  the default when option specified without value

2. -Warray-bounds and its levels

The option -Warray-bounds currently has 2 levels:

level   trailing arrays     notes
        treated as FAM

  1     [],[0],[1]          the default when option specified without value
  2     []

i.e.,
when -Warray-bounds=1, it treats [],[0],[1] as FAM, the same as
-fstrict-flex-arrays=1;
when -Warray-bounds=2, it only treats [] as FAM, the same as
-fstrict-flex-arrays=3.
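
The two tables above can be restated as a small predicate. This is only a sketch of the tables as written in this mail, not of any code in GCC; the enum trailing_array and the functions treated_as_fam and warray_bounds_equivalent_level are hypothetical names.

```cpp
#include <cassert>

// Hypothetical classification of a trailing array by its declared bound.
enum class trailing_array
{
  flexible,	// []
  zero,		// [0]
  one,		// [1]
  n		// [n], n > 1
};

// True if -fstrict-flex-arrays=LEVEL treats KIND as a flexible array
// member, following the table in section 1.
inline bool
treated_as_fam (int level, trailing_array kind)
{
  switch (level)
    {
    case 0:
      return true;
    case 1:
      return kind != trailing_array::n;
    case 2:
      return kind == trailing_array::flexible || kind == trailing_array::zero;
    case 3:
      return kind == trailing_array::flexible;
    default:
      return false;	// levels outside 0..3 are not defined
    }
}

// The -fstrict-flex-arrays level that -Warray-bounds=N corresponds to,
// following section 2; -1 for values of N other than 1 and 2.
inline int
warray_bounds_equivalent_level (int n)
{
  if (n == 1)
    return 1;
  if (n == 2)
    return 3;
  return -1;
}

// Small self-check against the tables.
inline void
self_check ()
{
  assert (treated_as_fam (0, trailing_array::n));
  assert (!treated_as_fam (3, trailing_array::zero));
  assert (warray_bounds_equivalent_level (2) == 3);
}
```

Written this way, the gap is visible: no -Warray-bounds level corresponds to -fstrict-flex-arrays=0 or =2.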

3. How to handle the combination of  -fstrict-flex-arrays and -Warray-bounds?

Question 1:  when -fstrict-flex-arrays is not present, the default is
-fstrict-flex-arrays=0, which treats [],[0],[1],[n] as FAM, so should we
update the default behavior of -Warray-bounds to treat any trailing array
[n] as a FAM?

My immediate answer to Q1 is NO, we shouldn't; that would be a big
regression in -Warray-bounds, right?

Question 2:  when -fstrict-flex-arrays=N1 and -Warray-bounds=N2 are present
at the same time, which one has higher priority, N1 or N2?

-fstrict-flex-arrays=N1 controls how the compiler's code generation treats
trailing arrays as FAMs, so it seems reasonable to give higher priority to
N1.  However, should we then completely disable the -Warray-bounds level N2
in that situation?

I really don't know the best way to handle the conflict between N1 and N2.

Can we completely drop the 2 levels of -Warray-bounds, and always honor the
level of -fstrict-flex-arrays?

Any comments or suggestions will be helpful.

thanks.

Qing

[PATCH] builtins: Add various complex builtins for _Float{16,32,64,128,32x,64x,128x}

Hi!

On top of the pending
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603665.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604080.html
the following patch adds some complex builtins which have libm
implementation in glibc 2.26 and later on various arches.
It is needed for libstdc++ _Float128 support when long double is not
IEEE quad.

Tested on x86_64-linux, ok for trunk?

2022-10-21  Jakub Jelinek  

* builtin-types.def (BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT32,
BT_COMPLEX_FLOAT64, BT_COMPLEX_FLOAT128, BT_COMPLEX_FLOAT32X,
BT_COMPLEX_FLOAT64X, BT_COMPLEX_FLOAT128X,
BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
BT_FN_COMPLEX_FLOAT64_COMPLEX_FLOAT64,
BT_FN_COMPLEX_FLOAT128_COMPLEX_FLOAT128,
BT_FN_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X,
BT_FN_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X,
BT_FN_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X,
BT_FN_FLOAT16_COMPLEX_FLOAT16, BT_FN_FLOAT32_COMPLEX_FLOAT32,
BT_FN_FLOAT64_COMPLEX_FLOAT64, BT_FN_FLOAT128_COMPLEX_FLOAT128,
BT_FN_FLOAT32X_COMPLEX_FLOAT32X, BT_FN_FLOAT64X_COMPLEX_FLOAT64X,
BT_FN_FLOAT128X_COMPLEX_FLOAT128X,
BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
BT_FN_COMPLEX_FLOAT64_COMPLEX_FLOAT64_COMPLEX_FLOAT64,
BT_FN_COMPLEX_FLOAT128_COMPLEX_FLOAT128_COMPLEX_FLOAT128,
BT_FN_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X,
BT_FN_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X,
BT_FN_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X): New.
* builtins.def (CABS_TYPE, CACOSH_TYPE, CARG_TYPE, CASINH_TYPE,
CPOW_TYPE, CPROJ_TYPE): Define and undefine later.
(BUILT_IN_CABS, BUILT_IN_CACOSH, BUILT_IN_CACOS, BUILT_IN_CARG,
BUILT_IN_CASINH, BUILT_IN_CASIN, BUILT_IN_CATANH, BUILT_IN_CATAN,
BUILT_IN_CCOSH, BUILT_IN_CCOS, BUILT_IN_CEXP, BUILT_IN_CLOG,
BUILT_IN_CPOW, BUILT_IN_CPROJ, BUILT_IN_CSINH, BUILT_IN_CSIN,
BUILT_IN_CSQRT, BUILT_IN_CTANH, BUILT_IN_CTAN): Add
DEF_EXT_LIB_FLOATN_NX_BUILTINS.
* fold-const-call.cc (fold_const_call_sc, fold_const_call_cc,
fold_const_call_ccc): Add various CASE_CFN_*_FN: cases when
CASE_CFN_* is present.
* gimple-ssa-backprop.cc (backprop::process_builtin_call_use):
Likewise.
* builtins.cc (expand_builtin, fold_builtin_1): Likewise.
* fold-const.cc (negate_mathfn_p, tree_expr_finite_p,
tree_expr_maybe_signaling_nan_p, tree_expr_maybe_nan_p,
tree_expr_maybe_real_minus_zero_p, tree_call_nonnegative_warnv_p):
Likewise.

--- gcc/builtin-types.def.jj    2022-10-21 09:44:13.918939702 +0200
+++ gcc/builtin-types.def       2022-10-21 13:55:25.152070472 +0200
@@ -109,6 +109,34 @@ DEF_PRIMITIVE_TYPE (BT_FLOAT128X, (float
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT, complex_float_type_node)
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_DOUBLE, complex_double_type_node)
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex_long_double_type_node)
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT16, (float16_type_node
+? build_complex_type
+   (float16_type_node)
+: error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT32, (float32_type_node
+? build_complex_type
+   (float32_type_node)
+: error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT64, (float64_type_node
+? build_complex_type
+   (float64_type_node)
+: error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT128, (float128_type_node
+ ? build_complex_type
+   (float128_type_node)
+ : error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT32X, (float32x_type_node
+ ? build_complex_type
+   (float32x_type_node)
+ : error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT64X, (float64x_type_node
+ ? build_complex_type
+   (float64x_type_node)
+ : error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT128X, (float128x_type_node
+  ? build_complex_type
+   (float128x_type_node)
+  : error_mark_node))
 
 DEF_PRIMITIVE_TYPE (BT_PTR, ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_FIL

[PATCH] libstdc++-v3: support for extended floating point types

Hi!

The following patch adds <complex> support for extended floating point
types.
C++23 removes the float/double/long double specializations from the spec
and instead adds explicit(bool) specifier on the converting constructor.
The patch uses that for converting constructor of the base template as well
as the float/double/long double specializations' converting constructors
(e.g. so that it handles conversion construction also from complex of extended
floating point types).  Copy ctor was already defaulted as the spec now
requires.
The patch also adds partial specialization for the _Float{16,32,64,128}
and __gnu_cxx::__bfloat16_t types because the base template doesn't use
__complex__ but a pair of floating point values.
This patch is on top of
https://gcc.gnu.org/pipermail/libstdc++/2022-October/054849.html
(and if
https://gcc.gnu.org/pipermail/libstdc++/2022-October/054862.html
is also applied, then
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603665.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604080.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604134.html
are needed as well).
The g++.dg/cpp23/ testcase verifies explicit(bool) works correctly.

Tested on x86_64-linux, ok for trunk?

2022-10-21  Jakub Jelinek  

gcc/testsuite/
* g++.dg/cpp23/ext-floating12.C: New test.
libstdc++-v3/
* include/std/complex (complex::complex converting ctor): For C++23
use explicit specifier with constant expression and explicitly cast
both parts to _Tp.
(__complex_abs, __complex_arg, __complex_cos, __complex_cosh,
__complex_exp, __complex_log, __complex_sin, __complex_sinh,
__complex_sqrt, __complex_tan, __complex_tanh, __complex_pow): Add
__complex__ _Float{16,32,64,128} and __complex__ decltype(0.0bf16)
overloads.
(complex<float>::complex converting ctor,
complex<double>::complex converting ctor,
complex<long double>::complex converting ctor): For C++23 implement
as template with explicit specifier with constant expression
and explicit casts.
(__complex_type): New template.
(complex): New partial specialization for types with extended floating
point types.
(__complex_acos, __complex_asin, __complex_atan, __complex_acosh,
__complex_asinh, __complex_atanh): Add __complex__ _Float{16,32,64,128}
and __complex__ decltype(0.0bf16) overloads.
(__complex_proj): Likewise.  Add template for complex of extended
floating point types.
* include/bits/cpp_type_traits.h (__is_floating): Specialize for
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t.
* testsuite/26_numerics/complex/ext_c++23.cc: New test.

--- libstdc++-v3/include/std/complex.jj 2022-10-21 08:55:43.037675332 +0200
+++ libstdc++-v3/include/std/complex    2022-10-21 17:05:36.802243229 +0200
@@ -142,8 +142,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ///  Converting constructor.
   template<typename _Up>
+#if __cplusplus > 202002L
+   explicit(!requires(_Up __u) { _Tp{__u}; })
+   constexpr complex(const complex<_Up>& __z)
+   : _M_real(_Tp(__z.real())), _M_imag(_Tp(__z.imag())) { }
+#else
 _GLIBCXX_CONSTEXPR complex(const complex<_Up>& __z)
: _M_real(__z.real()), _M_imag(__z.imag()) { }
+#endif
 
 #if __cplusplus >= 201103L
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
@@ -1077,6 +1083,264 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : std::pow(complex<_Tp>(__x), __y);
 }
 
+#if _GLIBCXX_USE_C99_COMPLEX
+#if defined(__STDCPP_FLOAT16_T__) && defined(_GLIBCXX_FLOAT_IS_IEEE_BINARY32)
+  inline _Float16
+  __complex_abs(__complex__ _Float16 __z)
+  { return _Float16(__builtin_cabsf(__z)); }
+
+  inline _Float16
+  __complex_arg(__complex__ _Float16 __z)
+  { return _Float16(__builtin_cargf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_cos(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_ccosf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_cosh(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_ccoshf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_exp(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_cexpf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_log(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_clogf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_sin(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_csinf(__z)); }
+
+  inline __complex__ _Float16
+  __complex_sinh(__complex__ _Float16 __z)
+  { return static_cast<__complex__ _Float16>(__builtin_csinhf(__z)); }
+
+  inline __complex__ _Float16
+  __compl

[PATCH 1/2] Add gcc/make-unique.h

2022-10-21 Thread David Malcolm via Gcc-patches

This patch adds gcc/make-unique.h, containing a minimal C++11
implementation of make_unique (std::make_unique is C++14).

The followup patch uses this in dozens of places within the analyzer.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* make-unique.h: New file.

Signed-off-by: David Malcolm 
---
 gcc/make-unique.h | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 gcc/make-unique.h

diff --git a/gcc/make-unique.h b/gcc/make-unique.h
new file mode 100644
index 000..752a1d3dd30
--- /dev/null
+++ b/gcc/make-unique.h
@@ -0,0 +1,42 @@
+/* Minimal implementation of make_unique for C++11 compatibility.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_MAKE_UNIQUE
+#define GCC_MAKE_UNIQUE
+
+/* This header uses std::unique_ptr, but <memory> can't be directly
+   included due to issues with macros.  Hence <memory> must be included
+   from system.h by defining INCLUDE_MEMORY in any source file using
+   make-unique.h.  */
+
+#ifndef INCLUDE_MEMORY
+# error "You must define INCLUDE_MEMORY before including system.h to use make-unique.h"
+#endif
+
+/* Minimal implementation of make_unique for C++11 compatibility
+   (std::make_unique is C++14).  */
+
+template<typename T, typename... Args>
+inline typename std::enable_if<!std::is_array<T>::value, std::unique_ptr<T>>::type
+make_unique(Args&&... args)
+{
+  return std::unique_ptr<T> (new T (std::forward<Args> (args)...));
+}
+
+#endif /* ! GCC_MAKE_UNIQUE */
-- 
2.26.3
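
As a usage illustration for the helper this patch describes, here is a standalone sketch. It is not GCC code: it includes <memory> directly (which GCC sources cannot do, per the comment in the header), it places the helper in a hypothetical namespace sketch, and demo_diagnostic and consume are invented stand-ins for an analyzer diagnostic class and a function taking ownership of one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {

// C++11-compatible make_unique for non-array types (std::make_unique
// itself only arrives in C++14).
template<typename T, typename... Args>
inline typename std::enable_if<!std::is_array<T>::value,
			       std::unique_ptr<T>>::type
make_unique (Args&&... args)
{
  return std::unique_ptr<T> (new T (std::forward<Args> (args)...));
}

} // namespace sketch

// Hypothetical stand-in for a diagnostic object.
struct demo_diagnostic
{
  demo_diagnostic (std::string msg, int line)
  : m_msg (std::move (msg)), m_line (line) {}

  std::string m_msg;
  int m_line;
};

// Hypothetical sink that takes ownership of the diagnostic, so no explicit
// delete is needed on any path.  Returns the line, or -1 for a null pointer.
inline int
consume (std::unique_ptr<demo_diagnostic> d)
{
  return d ? d->m_line : -1;
}

// Small self-check: ownership moves into consume and the source is emptied.
inline void
self_check ()
{
  auto d = sketch::make_unique<demo_diagnostic> ("leak of 'p'", 42);
  assert (consume (std::move (d)) == 42);
  assert (!d);
}
```

Passing the pointer by value as std::unique_ptr makes the ownership transfer explicit at the call site through std::move.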

[PATCH 2/2] analyzer: use std::unique_ptr for pending_diagnostic/note

2022-10-21 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

I can self-approve this, but it requires the patch adding make-unique.h
as a prerequisite.

gcc/analyzer/ChangeLog:
* call-info.cc: Add define of INCLUDE_MEMORY.
* call-summary.cc: Likewise.
* checker-path.cc: Likewise.
* constraint-manager.cc: Likewise.
* diagnostic-manager.cc: Likewise.
(saved_diagnostic::saved_diagnostic): Use std::unique_ptr for
param d and field m_d.
(saved_diagnostic::~saved_diagnostic): Remove explicit delete of m_d.
(saved_diagnostic::add_note): Use std::unique_ptr for
param pn.
(saved_diagnostic::get_pending_diagnostic): Update for conversion
of m_sd.m_d to unique_ptr.
(diagnostic_manager::add_diagnostic): Use std::unique_ptr for
param d.  Remove explicit deletion.
(diagnostic_manager::add_note): Use std::unique_ptr for param pn.
(diagnostic_manager::emit_saved_diagnostic): Update for conversion
of m_sd.m_d to unique_ptr.
(null_assignment_sm_context::warn): Use std::unique_ptr for
param d.  Remove explicit deletion.
* diagnostic-manager.h (saved_diagnostic::saved_diagnostic): Use
std::unique_ptr for param d.
(saved_diagnostic::add_note): Likewise for param pn.
(saved_diagnostic::m_d): Likewise.
(diagnostic_manager::add_diagnostic): Use std::unique_ptr for
param d.
(diagnostic_manager::add_note): Use std::unique_ptr for param pn.
* engine.cc: Include "make-unique.h".
(impl_region_model_context::warn): Update to use std::unique_ptr
for param, removing explicit deletion.
(impl_region_model_context::add_note): Likewise.
(impl_sm_context::warn): Update to use std::unique_ptr
for param.
(impl_region_model_context::on_state_leak): Likewise for result of
on_leak.
(exploded_node::on_longjmp): Use make_unique when creating
pending_diagnostic.
(exploded_graph::process_node): Likewise.
* exploded-graph.h (impl_region_model_context::warn): Update to
use std::unique_ptr for param.
(impl_region_model_context::add_note): Likewise.
* feasible-graph.cc: Add define of INCLUDE_MEMORY.
* pending-diagnostic.cc: Likewise.
* pending-diagnostic.h: Include "analyzer/sm.h".
* program-point.cc: Add define of INCLUDE_MEMORY.
* program-state.cc: Likewise.
* region-model-asm.cc: Likewise.
* region-model-impl-calls.cc: Likewise.  Include "make-unique.h".
(region_model::impl_call_putenv): Use make_unique when creating
pending_diagnostic.
* region-model-manager.cc: Add define of INCLUDE_MEMORY.
* region-model-reachability.cc: Likewise.
* region-model.cc: Likewise.  Include "make-unique.h".
(region_model::get_gassign_result): Use make_unique when creating
pending_diagnostic.
(region_model::check_for_poison): Likewise.
(region_model::on_stmt_pre): Likewise.
(region_model::check_symbolic_bounds): Likewise.
(region_model::check_region_bounds): Likewise.
(annotating_ctxt: make_note): Use std::unique_ptr for result.
(region_model::deref_rvalue): Use make_unique when creating
pending_diagnostic.
(region_model::check_for_writable_region): Likewise.
(region_model::check_region_size): Likewise.
(region_model::check_dynamic_size_for_floats): Likewise.
(region_model::maybe_complain_about_infoleak): Likewise.
(noop_region_model_context::add_note): Use std::unique_ptr for
param.  Remove explicit deletion.
* region-model.h: Include "analyzer/pending-diagnostic.h".
(region_model_context::warn): Convert param to std::unique_ptr.
(region_model_context::add_note): Likewise.
(noop_region_model_context::warn): Likewise.
(noop_region_model_context::add_note): Likewise.
(region_model_context_decorator::warn): Likewise.
(region_model_context_decorator::add_note): Likewise.
(note_adding_context::warn): Likewise.
(note_adding_context::make_note): Likewise for return type.
(test_region_model_context::warn): Convert param to
std::unique_ptr.
* region.cc: Add define of INCLUDE_MEMORY.
* sm-fd.cc: Likewise.  Include "make-unique.h".
(fd_state_machine::check_for_fd_attrs): Use make_unique when
creating pending_diagnostics.
(fd_state_machine::on_open): Likewise.
(fd_state_machine::on_creat): Likewise.
(fd_state_machine::check_for_dup): Likewise.
(fd_state_machine::on_close): Likewise.
(fd_state_machine::check_for_open_fd): Likewise.
(fd_state_machine::on_leak): Likewise, converting return type to
std::unique_ptr.
* sm-file.cc: Add define of INCLUDE_MEMORY.  Include
"make

Re: [PATCH] Always use TYPE_MODE instead of DECL_MODE for vector field

2022-10-21 Thread H.J. Lu via Gcc-patches

On Fri, Oct 21, 2022 at 2:33 AM Richard Biener
 wrote:
>
> On Thu, Oct 20, 2022 at 6:58 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > commit e034c5c895722e0092d2239cd8c2991db77d6d39
> > Author: Jakub Jelinek 
> > Date:   Sat Dec 2 08:54:47 2017 +0100
> >
> > PR target/78643
> > PR target/80583
> > * expr.c (get_inner_reference): If DECL_MODE of a non-bitfield
> > is BLKmode for vector field with vector raw mode, use TYPE_MODE
> > instead of DECL_MODE.
> >
> > fixed the case where DECL_MODE of a vector field is BLKmode and its
> > TYPE_MODE is a vector mode because of target attribute.  Remove the
> > BLKmode check for the case where DECL_MODE of a vector field is a vector
> > mode and its TYPE_MODE is BLKmode because of target attribute.
> >
> > gcc/
> >
> > PR target/107304
> > * expr.c (get_inner_reference): Always use TYPE_MODE for vector
> > field with vector raw mode.
> >
> > gcc/testsuite/
> >
> > PR target/107304
> > * gcc.target/i386/pr107304.c: New test.
> > ---
> >  gcc/expr.cc  |  3 +-
> >  gcc/testsuite/gcc.target/i386/pr107304.c | 39 
> >  2 files changed, 40 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr107304.c
> >
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index efe387e6173..9145193c2c1 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -7905,8 +7905,7 @@ get_inner_reference (tree exp, poly_int64_pod 
> > *pbitsize,
> >   /* For vector fields re-check the target flags, as DECL_MODE
> >  could have been set with different target flags than
> >  the current function has.  */
> > - if (mode == BLKmode
> > - && VECTOR_TYPE_P (TREE_TYPE (field))
> > + if (VECTOR_TYPE_P (TREE_TYPE (field))
> > >   && VECTOR_MODE_P (TYPE_MODE_RAW (TREE_TYPE (field))))
>
> Isn't the check on TYPE_MODE_RAW also wrong then?  Btw, the mode could

TYPE_MODE_RAW is always set to a vector mode for a vector type:

   /* Find an appropriate mode for the vector type.  */
if (TYPE_MODE (type) == VOIDmode)
  SET_TYPE_MODE (type,
 mode_for_vector (SCALAR_TYPE_MODE (innertype),
  nunits).else_blk ());

But TYPE_MODE returns BLKmode if the vector mode is unsupported.

> also be an integer mode.

For a vector field, mode is either BLK mode or the vector mode.  Jakub,
can you comment on it?

>
> > mode = TYPE_MODE (TREE_TYPE (field));
> > }
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr107304.c b/gcc/testsuite/gcc.target/i386/pr107304.c
> > new file mode 100644
> > index 000..24d68795e7f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr107304.c
> > @@ -0,0 +1,39 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O0 -march=tigerlake" } */
> > +
> > > +#include <stdint.h>
> > +
> > +typedef union {
> > > +  uint8_t v __attribute__((aligned(256))) __attribute__ ((vector_size(64 * sizeof(uint8_t))));
> > +  uint8_t i[64] __attribute__((aligned(256)));
> > +} stress_vec_u8_64_t;
> > +
> > +typedef struct {
> > + struct {
> > +  stress_vec_u8_64_t s;
> > +  stress_vec_u8_64_t o;
> > +  stress_vec_u8_64_t mask1;
> > +  stress_vec_u8_64_t mask2;
> > + } u8_64;
> > +} stress_vec_data_t;
> > +
> > +__attribute__((target_clones("arch=alderlake", "default")))
> > +void
> > +stress_vecshuf_u8_64(stress_vec_data_t *data)
> > +{
> > +  stress_vec_u8_64_t *__restrict s;
> > +  stress_vec_u8_64_t *__restrict mask1;
> > +  stress_vec_u8_64_t *__restrict mask2;
> > +  register int i;
> > +
> > +  s = &data->u8_64.s;
> > +  mask1 = &data->u8_64.mask1;
> > +  mask2 = &data->u8_64.mask2;
> > +
> > +  for (i = 0; i < 256; i++) {  /* was i < 65536 */
> > +  stress_vec_u8_64_t tmp;
> > +
> > +  tmp.v = __builtin_shuffle(s->v, mask1->v);
> > +  s->v = __builtin_shuffle(tmp.v, mask2->v);
> > +  }
> > +}
> > --
> > 2.37.3
> >



-- 
H.J.

vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-21 Thread Andre Vieira (lists) via Gcc-patches


Hi,

The ada failure reported in the PR was being caused by 
vect_check_gather_scatter failing to deal with bit offsets that weren't 
multiples of BITS_PER_UNIT. This patch makes vect_check_gather_scatter 
reject memory accesses with such offsets.


Bootstrapped and regression tested on aarch64 and x86_64.

I wasn't sure whether I should add a new Ada test that shows the same
failure without the bitfield lowering.  I suspect this is such a rare
form of data structure that no other tests have highlighted the
failure.  Let me know if you would still like me to add it; the change is
quite simple, just change the Int24 -> Int32 type in the structure.  The
'thing' that causes the failure is the 4-bit member inside the packed
structure before the field we access, giving it a 4-bit offset.  I
attempted but failed to create a C test using __attribute__((packed)).


Kind Regards,
Andre

gcc/ChangeLog:

    PR tree-optimization/107346
    * tree-vect-data-refs.cc (vect_check_gather_scatter): Reject
    offsets that aren't multiples of BITS_PER_UNIT.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 4a23d6172aaa12ad7049dc626e5c4afbd5ca3f74..6c892791bd4c39f672add4e4c22a9d7835e292d6 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4016,6 +4016,11 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   if (reversep)
 return false;
 
+  /* PR 107346.  Packed structs can have fields at offsets that are not
+ multiples of BITS_PER_UNIT.  Do not use gather/scatters in such cases.  */
+  if (!multiple_p (pbitpos, BITS_PER_UNIT))
+return false;
+
   poly_int64 pbytepos = exact_div (pbitpos, BITS_PER_UNIT);
 
   if (TREE_CODE (base) == MEM_REF)
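
The guard added by the hunk can be illustrated with plain integers. This is a simplified sketch, not the vectorizer code: GCC's multiple_p and exact_div operate on poly_int values, BITS_PER_UNIT is assumed to be 8 here, and byte_position is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

const int64_t BITS_PER_UNIT = 8;	// assumed: 8-bit bytes

// Scalar stand-in for GCC's poly_int multiple_p.
inline bool
multiple_p (int64_t a, int64_t b)
{
  return a % b == 0;
}

// Hypothetical helper: convert a bit position into a byte position, and
// refuse (return false) when the position is not byte-aligned, since an
// exact division would not be valid in that case.
inline bool
byte_position (int64_t bitpos, int64_t *bytepos)
{
  if (!multiple_p (bitpos, BITS_PER_UNIT))
    return false;
  *bytepos = bitpos / BITS_PER_UNIT;
  return true;
}

// Small self-check: a field placed right after a 4-bit member of a packed
// record sits at bit position 4 and has no byte position.
inline void
self_check ()
{
  int64_t bytepos = -1;
  assert (!byte_position (4, &bytepos));
  assert (bytepos == -1);
}
```

With the check in place, the exact division that follows is only reached for positions that really are whole bytes.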

Re: [PATCH] Rename nonzero_bits to known_zero_bits.

Hi!

On Fri, Oct 21, 2022 at 03:14:26PM +0200, Aldy Hernandez via Gcc-patches wrote:
> The name nonzero_bits is confusing.  We're not tracking nonzero bits.
> We're tracking known-zero bits, or at the worst we're tracking "maybe
> nonzero bits".  But really, the only thing we're sure about in the
> "nonzero" bits are the bits that are zero, which are known to be 0.
> We're not tracking nonzero bits.

Indeed.

> I know we've been carrying around this name forever, but the fact that
> both of the maintainers of the code *HATE* it, should be telling.
> Also, we'd also like to track known-one bits in the irange, so it's
> best to keep the nomenclature consistent.

And that as well.

However:

>   * asan.cc (handle_builtin_alloca): Rename *nonzero* to *known_zero*.

Our "nonzero" means "not known to be zero", not "known to be zero", so
this renaming makes it worse than it was.  Rename it to
"not_known_zero", make that a thin wrapper around a new "known_zero",
and slowly get rid of not_known_zero?

> --- a/gcc/asan.cc
> +++ b/gcc/asan.cc
> @@ -816,7 +816,7 @@ handle_builtin_alloca (gcall *call, gimple_stmt_iterator *iter)
>tree redzone_size = build_int_cst (size_type_node, ASAN_RED_ZONE_SIZE);
>  
>/* Extract lower bits from old_size.  */
> -  wide_int size_nonzero_bits = get_nonzero_bits (old_size);
> +  wide_int size_nonzero_bits = get_known_zero_bits (old_size);

Such variables should also be renamed :-(


Segher
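
The relationship between the two names discussed above can be sketched with plain 64-bit masks. This is an illustration only, not GCC's wide_int API: known_zero_bits_for_alignment, maybe_nonzero_bits and consistent_p are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the bits guaranteed to be zero in any multiple of
// 2^LOG2_ALIGN.
inline uint64_t
known_zero_bits_for_alignment (unsigned log2_align)
{
  if (log2_align >= 64)
    return ~uint64_t (0);
  return (uint64_t (1) << log2_align) - 1;
}

// Thin wrapper in the sense suggested above: what has historically been
// called "nonzero bits" is the complement of the known-zero mask, i.e. the
// bits that are not known to be zero.
inline uint64_t
maybe_nonzero_bits (uint64_t known_zero)
{
  return ~known_zero;
}

// A concrete value agrees with the tracked information if it has no bit
// set among the known-zero bits.
inline bool
consistent_p (uint64_t value, uint64_t known_zero)
{
  return (value & known_zero) == 0;
}

// Small self-check for a 16-byte-aligned quantity.
inline void
self_check ()
{
  uint64_t kz = known_zero_bits_for_alignment (4);
  assert (kz == 0xf);
  assert (consistent_p (48, kz));
  assert (!consistent_p (50, kz));
}
```

The point of the naming objection shows up here: a set bit in maybe_nonzero_bits promises nothing about the value, while a set bit in the known-zero mask is a guarantee.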

Re: [PATCH] Rename nonzero_bits to known_zero_bits.

On Fri, Oct 21, 2022 at 11:45:33AM -0500, Segher Boessenkool wrote:
> On Fri, Oct 21, 2022 at 03:14:26PM +0200, Aldy Hernandez via Gcc-patches 
> wrote:
> > The name nonzero_bits is confusing.  We're not tracking nonzero bits.
> > We're tracking known-zero bits, or at the worst we're tracking "maybe
> > nonzero bits".  But really, the only thing we're sure about in the
> > "nonzero" bits are the bits that are zero, which are known to be 0.
> > We're not tracking nonzero bits.
> 
> Indeed.
> 
> > I know we've been carrying around this name forever, but the fact that
> > both of the maintainers of the code *HATE* it, should be telling.
> > Also, we'd also like to track known-one bits in the irange, so it's
> > best to keep the nomenclature consistent.
> 
> And that as well.
> 
> However:
> 
> > * asan.cc (handle_builtin_alloca): Rename *nonzero* to *known_zero*.
> 
> Our "nonzero" means "not known to be zero", not "known to be zero", so
> this renaming makes it worse than it was.  Rename it to

Agreed.

I think maybe_nonzero_bits would be fine.

Anyway, the reason it is called this way is that we have similar APIs
on the RTL side, nonzero_bits* in rtlanal.cc.
So if we rename, it should be renamed consistently.

> "not_known_zero", make that a thin wrapper around a new "known_zero",
> and slowly get rid of not_known_zero?

Jakub

Re: [PATCH] Rename nonzero_bits to known_zero_bits.

On Fri, Oct 21, 2022 at 06:51:19PM +0200, Jakub Jelinek wrote:
> Agreed.
> 
> I think maybe_nonzero_bits would be fine.

Or yet another option is to change what we track and instead of
having just one bitmask have 2 as tree-ssa-ccp.cc does,
one bitmask says which bits are known to be always the same
and the other which specifies the values of those bits.
"For X with a CONSTANT lattice value X & ~mask == value & ~mask.  The
zero bits in the mask cover constant values.  The ones mean no
information."

Jakub
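
The two-bitmask representation quoted above can be sketched as follows. This is an illustration using 64-bit integers rather than the lattice type in tree-ssa-ccp.cc; the struct bit_lattice and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair mirroring the quoted comment: a 0 bit in MASK means the
// corresponding bit of VALUE is known, a 1 bit means no information.
struct bit_lattice
{
  uint64_t value;
  uint64_t mask;
};

// Bits known to be zero: known (mask bit 0) and clear in VALUE.
inline uint64_t
known_zero (const bit_lattice &l)
{
  return ~l.mask & ~l.value;
}

// Bits known to be one: known (mask bit 0) and set in VALUE.
inline uint64_t
known_one (const bit_lattice &l)
{
  return ~l.mask & l.value;
}

// The invariant from the quoted comment: X & ~mask == value & ~mask.
inline bool
matches_p (const bit_lattice &l, uint64_t x)
{
  return (x & ~l.mask) == (l.value & ~l.mask);
}

// Small self-check: low 4 bits unknown, bit 4 known to be one, everything
// above known to be zero.
inline void
self_check ()
{
  bit_lattice l = { 0x10, 0x0f };
  assert (known_one (l) == 0x10);
  assert (known_zero (l) == ~uint64_t (0x1f));
  assert (matches_p (l, 0x17));
  assert (!matches_p (l, 0x27));
}
```

A single "maybe nonzero" mask corresponds to known_one being always empty; the second mask is what allows known-one bits to be recorded as well.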

Re: [PATCH] Rename nonzero_bits to known_zero_bits.

On Fri, Oct 21, 2022 at 06:51:17PM +0200, Jakub Jelinek wrote:
> On Fri, Oct 21, 2022 at 11:45:33AM -0500, Segher Boessenkool wrote:
> > On Fri, Oct 21, 2022 at 03:14:26PM +0200, Aldy Hernandez via Gcc-patches 
> > wrote:
> > >   * asan.cc (handle_builtin_alloca): Rename *nonzero* to *known_zero*.
> > 
> > Our "nonzero" means "not known to be zero", not "known to be zero", so
> > this renaming makes it worse than it was.  Rename it to
> 
> Agreed.
> 
> I think maybe_nonzero_bits would be fine.

Yes, but the shorter name known_zero is much better.  Converting to that
is a bit more work and cannot really be mechanical: code simplifications
are needed to make things better instead of adding another layer of
double negations, and variable names and comments should be changed as
well.
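
As a rough illustration of the double-negation point, here is a sketch
in C.  The helper names and the plain 64-bit masks are hypothetical and
are not existing GCC functions; the three helpers answer the same
question, "are all bits in MASK zero?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Current style: NONZERO holds the bits not known to be zero.  */
static bool
mask_clear_nonzero (uint64_t nonzero, uint64_t mask)
{
  return (nonzero & mask) == 0;
}

/* Mechanical conversion: substitute ~known_zero for nonzero.  It is
   correct, but it adds a layer of negation.  */
static bool
mask_clear_mechanical (uint64_t known_zero, uint64_t mask)
{
  return (~known_zero & mask) == 0;
}

/* Simplified conversion: state the condition directly in terms of
   the known-zero bits.  */
static bool
mask_clear_simplified (uint64_t known_zero, uint64_t mask)
{
  return (known_zero & mask) == mask;
}

/* Check that the three forms agree for one input.  */
static void
check_forms_agree (uint64_t known_zero, uint64_t mask)
{
  bool expected = mask_clear_nonzero (~known_zero, mask);
  assert (mask_clear_mechanical (known_zero, mask) == expected);
  assert (mask_clear_simplified (known_zero, mask) == expected);
}
```

The mechanical and simplified forms are equivalent; only the last one
reads without a negation.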

> Anyway, the reason it is called this way is that we have similar APIs
> on the RTL side, nonzero_bits* in rtlanal.cc.

I am well aware ;-)

> So if we rename, it should be renamed consistently.

Yes.


Segher

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Joshi, Tejas Sanjay via Gcc-patches

Hi all,
> Okay, I will prepare another patch which reverts the znver4 instruction 
> reservations and submit it.

Please find attached the patch which reverts the znver4 instruction
reservations.  I have also made znver4 use the znver3 scheduler for now.
If it's good for trunk, I will submit it.

Thanks and Regards,
Tejas


0001-Remove-znver4-instruction-reservations.patch

Re: [PATCH] Rename nonzero_bits to known_zero_bits.

On Fri, Oct 21, 2022 at 06:54:32PM +0200, Jakub Jelinek wrote:
> On Fri, Oct 21, 2022 at 06:51:19PM +0200, Jakub Jelinek wrote:
> > Agreed.
> > 
> > I think maybe_nonzero_bits would be fine.
> 
> Or yet another option is to change what we track and instead of
> having just one bitmask have 2 as tree-ssa-ccp.cc does,
> one bitmask says which bits are known to be always the same
> and the other which specifies the values of those bits.
> "For X with a CONSTANT lattice value X & ~mask == value & ~mask.  The
> zero bits in the mask cover constant values.  The ones mean no
> information."

I am still working on making the RTL nonzero_bits use DF (and indeed I
do a known_zero instead :-) ).  This makes the special version in
combine unnecessary: instead of working better than the generic version
it is strictly weaker then.  This change then makes it possible to use
nonzero_bits in instruction conditions (without causing ICEs as now --
passes after combine return a subset of the nonzero_bits the version in
combine does, which can make insns no longer match in later passes).

My fear is tracking twice as many bits might become expensive.  OTOH
ideally we can get rid of combine's reg_stat completely at some point
in the future (which has all the same problems as combine's version of
nonzero_bits: the values it returns depend on the order combine tried
possible combinations).

Storage requirements are the same for known_zero_bits and known_one_bits
vs. known_bits and known_bit_values, but the latter is a bit more costly
to compute and, more importantly, usually a lot less convenient to use.
(A third option is known_bits and known_zero_bits?)
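
A small sketch in C of the two encodings, to show where the difference
in convenience could come from.  The struct and function names are
invented for illustration, and the 64-bit width is an assumption; none
of this is existing GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding A: one mask per known value.  */
struct zo_bits
{
  uint64_t known_zero;
  uint64_t known_one;
};

/* Encoding B: which bits are known, plus their values.  The sketch
   keeps values clear wherever known is clear.  */
struct kv_bits
{
  uint64_t known;
  uint64_t values;
};

/* Convert A to B.  A bit cannot be known zero and known one at once.  */
static struct kv_bits
zo_to_kv (struct zo_bits a)
{
  assert ((a.known_zero & a.known_one) == 0);
  struct kv_bits r = { a.known_zero | a.known_one, a.known_one };
  return r;
}

/* Bitwise AND in encoding A: a zero on either side gives a zero,
   a one needs ones on both sides.  */
static struct zo_bits
zo_and (struct zo_bits a, struct zo_bits b)
{
  struct zo_bits r = { a.known_zero | b.known_zero,
                       a.known_one & b.known_one };
  return r;
}

/* Bitwise OR in encoding A is the mirror image.  */
static struct zo_bits
zo_or (struct zo_bits a, struct zo_bits b)
{
  struct zo_bits r = { a.known_zero & b.known_zero,
                       a.known_one | b.known_one };
  return r;
}

/* Bitwise AND in encoding B: the known zeros have to be recovered
   from known & ~values before they can be combined.  */
static struct kv_bits
kv_and (struct kv_bits a, struct kv_bits b)
{
  uint64_t ones = a.values & b.values;
  uint64_t zeros = (a.known & ~a.values) | (b.known & ~b.values);
  struct kv_bits r = { zeros | ones, ones };
  return r;
}
```

Both structs hold two words, matching the storage remark above; in
this sketch the extra work in encoding B is the recovery of the
known-zero set inside kv_and.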

Segher

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM