date:20220307

Re: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.

2022-03-07 Thread Hongtao Liu via Gcc-patches

On Fri, Mar 4, 2022 at 3:28 PM liuhongt via Gcc-patches
 wrote:
>
> For parameter passing through stack, vectorized load from parm_decl
> in callee may trigger serious STF issue. This is why GCC12 regresses
> 50% for cray at -O2 compared to GCC11.
>
> The patch add an extremely large number to stmt_cost to prevent
> vectorization for loads from parm_decl under very-cheap cost model,
> this can at least prevent O2 regression due to STF issue, but may lose
> some perf where there's no such issue(1 vector_load vs n scalar_load +
> CTOR).
>
> No impact for SPEC2017 for both plain O2 and native O2 on ICX.
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/101908
> * config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
> (ix86_vector_costs::add_stmt_cost): Add extra cost for
> vector_load/unsigned_load which may have stall forward issue.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr101908-1.c: New test.
> * gcc.target/i386/pr101908-2.c: New test.
> ---
>  gcc/config/i386/i386.cc| 31 ++
>  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +
>  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +
>  3 files changed, 55 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..3bbaaf65ea8 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, 
> struct noce_if_info *if_info)
>return default_noce_conversion_profitable_p (seq, if_info);
>  }
>
> +/* Return true if REF may have STF issue, otherwise false.  */
> +static bool
> +ix86_load_maybe_stfs_p (tree ref)
> +{
> +  tree addr = get_base_address (ref);
> +
> +  if (TREE_CODE (addr) != PARM_DECL
> +  || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr)))
> +  || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD)
> +return false;
> +  return true;
> +}
> +
>  /* x86-specific vector costs.  */
>  class ix86_vector_costs : public vector_costs
>  {
> @@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
> if (TREE_CODE (op) == SSA_NAME)
>   TREE_VISITED (op) = 0;
>  }
> +
> +  /* Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
> + Performance may lose when there's no STF issue(1 vector_load vs n
> + scalar_load + CTOR).
> + TODO: both extra cost(2000) and ix86_load_maybe_stfs_p need to be fine
> + tuned.  */
> +  if ((kind == vector_load || kind == unaligned_load)
> +  && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
> +  && stmt_info
> +  && stmt_info->slp_type == pure_slp
> +  && stmt_info->stmt
> +  && gimple_assign_load_p (stmt_info->stmt)
> +  && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt)))
> +{
> +  stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> +  stmt_cost += 2000;
> +}
> +
>if (stmt_cost == -1)
>  stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c 
> b/gcc/testsuite/gcc.target/i386/pr101908-1.c
> new file mode 100644
> index 000..f8e0f2e26bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-slp-details" } */
> +/* { dg-final { scan-tree-dump {(?n)add new stmt:.*MEM \ double\>} "slp2" } } */
> +
> +struct X { double x[2]; };
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df __attribute__((noipa))
> +foo (struct X* x, struct X* y)
> +{
> +  return (v2df) {x->x[1], x->x[0] } + (v2df) { y->x[1], y->x[0] };
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr101908-2.c 
> b/gcc/testsuite/gcc.target/i386/pr101908-2.c
> new file mode 100644
> index 000..7f2f00cebab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101908-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-slp-details" } */
> +/* { dg-final { scan-tree-dump-not {(?n)add new stmt:.*MEM \ double\>} "slp2" } } */
> +
> +struct X { double x[2]; };
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df __attribute__((noipa))
> +foo (struct X x, struct X y)
> +{
> +  return (v2df) {x.x[1], x.x[0] } + (v2df) { y.x[1], y.x[0] };
> +}
> --
> 2.18.1
>


-- 
BR,
Hongtao

Re: [PATCH] i386: Fix up cond_{and,ior,xor,mul}* [PR104779]

2022-03-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 07, 2022 at 10:15:48AM +0800, Hongtao Liu wrote:
> > Note, the predicates on cond_fma* and other FMA variants look all wrong to
> > me, usually the fma instructions require nonimmediate_operand operands,
> > but the cond_* patterns use vector_operand.  Besides what this patch
> cut from predicate.md-
> 1142; Return true when OP is operand acceptable for vector memory operand.
> 1143; Only AVX can have misaligned memory operand.
> 1144(define_predicate "vector_memory_operand"
> 1145  (and (match_operand 0 "memory_operand")
> 1146   (ior (match_test "TARGET_AVX")
> 1147(match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)"
> 1148
> 1149; Return true when OP is register_operand or vector_memory_operand.
> 1150(define_predicate "vector_operand"
> 1151  (ior (match_operand 0 "register_operand")
> 1152   (match_operand 0 "vector_memory_operand")))
> cut end
> 
> vector_operand is a subset of nonimmediate_operands, so it's more like
> a potential optimization issue rather than a correctness one?

You're right.
And fma apparently doesn't use something like ix86_fixup_binary_operands*,
allows all 3 inputs as MEMs and only deals with it at RA time.

Jakub

Re: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.

2022-03-07 Thread Richard Biener via Gcc-patches

On Fri, Mar 4, 2022 at 8:27 AM liuhongt  wrote:
>
> For parameter passing through stack, vectorized load from parm_decl
> in callee may trigger serious STF issue. This is why GCC12 regresses
> 50% for cray at -O2 compared to GCC11.
>
> The patch add an extremely large number to stmt_cost to prevent
> vectorization for loads from parm_decl under very-cheap cost model,
> this can at least prevent O2 regression due to STF issue, but may lose
> some perf where there's no such issue(1 vector_load vs n scalar_load +
> CTOR).

Note this is just heuristics in that by-value passed parameters are usually
stored to the stack close before the function call.  It does not catch the
similar case from

  foo (const X &bar)  { ... }

where a

  foo ({ 1., 2. })

will have the object passed by reference constructed right before the
call.  In the end a full solution will need to perform some IPA analysis
that computes the initialization distance from the call and uses should
factor in the use distance from function entry.

> No impact for SPEC2017 for both plain O2 and native O2 on ICX.
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/101908
> * config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
> (ix86_vector_costs::add_stmt_cost): Add extra cost for
> vector_load/unsigned_load which may have stall forward issue.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr101908-1.c: New test.
> * gcc.target/i386/pr101908-2.c: New test.
> ---
>  gcc/config/i386/i386.cc| 31 ++
>  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +
>  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +
>  3 files changed, 55 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..3bbaaf65ea8 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, 
> struct noce_if_info *if_info)
>return default_noce_conversion_profitable_p (seq, if_info);
>  }
>
> +/* Return true if REF may have STF issue, otherwise false.  */
> +static bool
> +ix86_load_maybe_stfs_p (tree ref)
> +{
> +  tree addr = get_base_address (ref);
> +
> +  if (TREE_CODE (addr) != PARM_DECL
> +  || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr)))
> +  || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD)
> +return false;
> +  return true;
> +}
> +
>  /* x86-specific vector costs.  */
>  class ix86_vector_costs : public vector_costs
>  {
> @@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
> if (TREE_CODE (op) == SSA_NAME)
>   TREE_VISITED (op) = 0;
>  }
> +
> +  /* Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
> + Performance may lose when there's no STF issue(1 vector_load vs n
> + scalar_load + CTOR).
> + TODO: both extra cost(2000) and ix86_load_maybe_stfs_p need to be fine
> + tuned.  */
> +  if ((kind == vector_load || kind == unaligned_load)
> +  && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP

This check doesn't make much sense, I'd rather remove it.

> +  && stmt_info
> +  && stmt_info->slp_type == pure_slp
> +  && stmt_info->stmt
> +  && gimple_assign_load_p (stmt_info->stmt)
> +  && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt)))

I'd pass down STMT_VINFO_DATA_REF instead and have ix86_load_maybe_stfs_p
and use

  tree addr = DR_BASE_ADDRESS (dr);
  if (TREE_CODE (addr) != ADDR_EXPR)
return false;
  addr = get_base_address (TREE_OPERAND (addr, 0));
  ...

since that gets you a more reliable way to look at the actual object referenced.

> +{
> +  stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> +  stmt_cost += 2000;
> +}
> +

Maybe handle this like the Bonell case and thus put it after the
stmt_cost == -1 handling, just bumping the cost (also noting the actual number
is arbitrary).  It would be nice to have a better estimate on the penalty than
"2000", maybe formulate it in terms of the target costs simple-sse op at least.

That said, it might be interesting to micro-benchmark

v2df __attribute__((noipa))
foo (struct X* x, struct X* y)
{
  double temx0, temx1, temy0, temy1;
  temx0 = x->x[0];
  temx0 += temx0;
 ...
  temx1 = x->x[1];
  temx1 += temx1;
...
  return (v2df) {temx1, temx0 } + (v2df) { temy1, temy0 };
}

(without -ffast-math) to see how many vector adds we'd need to compensate the
STLF penalty (just to have an idea whether the magic number is closer to 200,
2000 or 2).  Maybe also put that respective kernel into the i386 testsuite
with a specific -mtune (and make the thing a target tunable?).

>if (stmt_cost == -1)
>

[PATCH] PR tree-optimization/98335: Improvements to DSE's compute_trims.

2022-03-07 Thread Roger Sayle


This patch is the main middle-end piece of a fix for PR tree-opt/98335,
which is a code-quality regression affecting mainline.  The issue occurs
in DSE's (dead store elimination's) compute_trims function that determines
where a store to memory can be trimmed.  In the testcase given in the
PR, this function notices that the first byte of a DImode store is dead,
and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
at (unaligned) offset one.  Most architectures can store a power-of-two
bytes (up to a maximum) in single instruction, so writing 7 bytes requires
more instructions than writing 8 bytes.  This patch follows Jakub Jelinek's
suggestion in comment 5, that compute_trims needs improved heuristics.

In this patch, decision of whether and how to align trim_head is based
on the number of bytes being written, the alignment of the start of the
object and where within the object the first byte is written.  The first
tests check whether we're already writing to the start of the object,
and that we're writing three or more bytes.  If we're only writing one
or two bytes, there's no benefit from providing additional alignment.
Then we determine the alignment of the object, which is either 1, 2,
4, 8 or 16 byte aligned (capping at 16 guarantees that we never write
more than 7 bytes beyond the minimum required).  If the buffer is only
1 or 2 byte aligned there's no benefit from additional alignment.  For
the remaining cases, alignment of trim_head is based upon where within
each aligned block (word) the first byte is written.  For example,
storing the last byte (or last half-word) of a word can be performed
with a single insn.

On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:

movl$0, -24(%rsp)
movabsq $72057594037927935, %rdx
movl$0, -21(%rsp)
andq-24(%rsp), %rdx
movq%rdx, %rax
salq$8, %rax
movbc(%rip), %al
ret

to

xorl%eax, %eax
movbc(%rip), %al
ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  I've also added new testcases
for the original motivating PR tree-optimization/86010, to ensure that
those remain optimized (in future).  Ok for mainline?

2022-03-07  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/98335
* tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
to align trim_head, writing more bytes by using fewer instructions.

gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.dg/pr98335.C: New test case.
* gcc.dg/pr86010.c: New test case.
* gcc.dg/pr86010-2.c: New test case.

Thanks in advance,
Roger
--

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 2b22a61..080e406 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int *trim_head, 
int *trim_tail,
   int first_live = bitmap_first_set_bit (live);
   *trim_head = first_live - first_orig;
 
-  /* If more than a word remains, then make sure to keep the
- starting point at least word aligned.  */
-  if (last_live - first_live > UNITS_PER_WORD)
-*trim_head &= ~(UNITS_PER_WORD - 1);
+  /* If REF is aligned, try to maintain this alignment if it reduces
+ the number of (power-of-two sized aligned) writes to memory.
+ First check that we're writing >= 3 bytes at a non-zero offset.  */
+  if (first_live
+  && last_live - first_live >= 2)
+{
+  unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base));
+  if (DECL_P (ref->base) && DECL_ALIGN_UNIT (ref->base) > align)
+   align = DECL_ALIGN_UNIT (ref->base);
+  if (align > UNITS_PER_WORD)
+   align = UNITS_PER_WORD;
+  if (align > 16)
+   align = 16;
+  if (align > 2)
+   {
+ /* ALIGN is 4, 8 or 16.  */
+ unsigned int low = first_live & (align - 1);
+ if (low * 2 < align)
+   {
+ if (align == 16 && low >= 4 && last_live < 15)
+   *trim_head &= ~3;
+ else
+   *trim_head &= ~(align - 1);
+   }
+ else if (low + 3 == align)
+   *trim_head &= ~1;
+ else if (low > 8 && low < 12)
+   *trim_head &= ~3;
+   }
+}
 
   if ((*trim_head || *trim_tail)
   && dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/g++.dg/pr98335.C b/gcc/testsuite/g++.dg/pr98335.C
new file mode 100644
index 000..c54f4d9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr98335.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98"  { target { 
c++98_only } } }
+}
+
+/* { dg-final { scan-tree-dump-not " + 1B] = {}" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr86010-2.c b/gcc/te

[PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Martin Liška


Hi.

The patch simplifies translation strings.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
message.
---
 gcc/config/arm/arm-builtins.cc | 127 -
 1 file changed, 79 insertions(+), 48 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a7acc1d71e7..bdccba2bc88 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp,
   || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
  && (imm < 0 || imm > 32))
{
- if (fcode == ARM_BUILTIN_WRORHI)
-   error ("the range of count should be in 0 to 32; please check the 
intrinsic %<_mm_rori_pi16%> in code");
- else if (fcode == ARM_BUILTIN_WRORWI)
-   error ("the range of count should be in 0 to 32; please check the 
intrinsic %<_mm_rori_pi32%> in code");
- else if (fcode == ARM_BUILTIN_WRORH)
-   error ("the range of count should be in 0 to 32; please check the 
intrinsic %<_mm_ror_pi16%> in code");
- else
-   error ("the range of count should be in 0 to 32; please check the 
intrinsic %<_mm_ror_pi32%> in code");
+ const char *builtin;
+ switch (fcode)
+   {
+   case ARM_BUILTIN_WRORHI:
+ builtin = "_mm_rori_pi16";
+ break;
+   case ARM_BUILTIN_WRORWI:
+ builtin = "_mm_rori_pi32";
+ break;
+   case ARM_BUILTIN_WRORH:
+ builtin = "_mm_ror_pi16";
+ break;
+   default:
+ builtin = "_mm_ror_pi32";
+ break;
+   }
+ error ("the range of count should be in 0 to 32; please check the 
intrinsic %qs in code", builtin);
}
  else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
   && (imm < 0 || imm > 64))
{
- if (fcode == ARM_BUILTIN_WRORDI)
-   error ("the range of count should be in 0 to 64; please check the 
intrinsic %<_mm_rori_si64%> in code");
- else
-   error ("the range of count should be in 0 to 64; please check the 
intrinsic %<_mm_ror_si64%> in code");
+ const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : 
"_mm_ror_si64";
+ error ("the range of count should be in 0 to 64; please check the 
intrinsic %qs in code", builtin);
}
  else if (imm < 0)
{
- if (fcode == ARM_BUILTIN_WSRLHI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srli_pi16%> in code");
- else if (fcode == ARM_BUILTIN_WSRLWI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srli_pi32%> in code");
- else if (fcode == ARM_BUILTIN_WSRLDI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srli_si64%> in code");
- else if (fcode == ARM_BUILTIN_WSLLHI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_slli_pi16%> in code");
- else if (fcode == ARM_BUILTIN_WSLLWI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_slli_pi32%> in code");
- else if (fcode == ARM_BUILTIN_WSLLDI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_slli_si64%> in code");
- else if (fcode == ARM_BUILTIN_WSRAHI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srai_pi16%> in code");
- else if (fcode == ARM_BUILTIN_WSRAWI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srai_pi32%> in code");
- else if (fcode == ARM_BUILTIN_WSRADI)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srai_si64%> in code");
- else if (fcode == ARM_BUILTIN_WSRLH)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srl_pi16%> in code");
- else if (fcode == ARM_BUILTIN_WSRLW)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srl_pi32%> in code");
- else if (fcode == ARM_BUILTIN_WSRLD)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srl_si64%> in code");
- else if (fcode == ARM_BUILTIN_WSLLH)
-   error ("the count should be no less than 0; please check the intrinsic 
%<_mm_sll_pi16%> in code");
- else if

Re: [PATCH] s390: Fix up *cmp_and_trap_unsigned_int constraints [PR104775]

2022-03-07 Thread Andreas Krebbel via Gcc-patches

On 3/5/22 09:33, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase fails to assemble due to clgte %r6,0(%r1,%r10)
> insn not being accepted by assembler.
> My rough understanding is that in the RSY-b insn format the spot
> in other formats used for index registers is used instead for M3 what
> kind of comparison it is, so this patch follows what other similar
> instructions use for constraint (i.e. one without index register).
> 
> Bootstrapped on s390x-linux, regtest there still pending, ok for
> trunk if it passes it?
> 
> 2022-03-05  Jakub Jelinek  
> 
>   PR target/104775
>   * config/s390/s390.md (*cmp_and_trap_unsigned_int): Use
>   S constraint instead of T in the last alternative.
> 
>   * gcc.target/s390/pr104775.c: New test.

Ok. Thanks for the fix!

Bye,

Andreas

Re: [PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 07, 2022 at 11:09:26AM +0100, Martin Liška wrote:
> gcc/ChangeLog:
> 
>   * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
>   message.

Will defer review to ARM maintainers, just wanted to mention some of the
lines are way too long (before or after).

> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp,
>  || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
> && (imm < 0 || imm > 32))
>   {
> -   if (fcode == ARM_BUILTIN_WRORHI)
> - error ("the range of count should be in 0 to 32; please check 
> the intrinsic %<_mm_rori_pi16%> in code");
> -   else if (fcode == ARM_BUILTIN_WRORWI)
> - error ("the range of count should be in 0 to 32; please check 
> the intrinsic %<_mm_rori_pi32%> in code");
> -   else if (fcode == ARM_BUILTIN_WRORH)
> - error ("the range of count should be in 0 to 32; please check 
> the intrinsic %<_mm_ror_pi16%> in code");
> -   else
> - error ("the range of count should be in 0 to 32; please check 
> the intrinsic %<_mm_ror_pi32%> in code");
> +   const char *builtin;
> +   switch (fcode)
> + {
> + case ARM_BUILTIN_WRORHI:
> +   builtin = "_mm_rori_pi16";
> +   break;
> + case ARM_BUILTIN_WRORWI:
> +   builtin = "_mm_rori_pi32";
> +   break;
> + case ARM_BUILTIN_WRORH:
> +   builtin = "_mm_ror_pi16";
> +   break;
> + default:
> +   builtin = "_mm_ror_pi32";
> +   break;
> + }
> +   error ("the range of count should be in 0 to 32; please check the 
> intrinsic %qs in code", builtin);
>   }
> else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
>  && (imm < 0 || imm > 64))
>   {
> -   if (fcode == ARM_BUILTIN_WRORDI)
> - error ("the range of count should be in 0 to 64; please check 
> the intrinsic %<_mm_rori_si64%> in code");
> -   else
> - error ("the range of count should be in 0 to 64; please check 
> the intrinsic %<_mm_ror_si64%> in code");
> +   const char *builtin = fcode == ARM_BUILTIN_WRORDI ? 
> "_mm_rori_si64" : "_mm_ror_si64";
> +   error ("the range of count should be in 0 to 64; please check the 
> intrinsic %qs in code", builtin);
>   }
> else if (imm < 0)
>   {
> -   if (fcode == ARM_BUILTIN_WSRLHI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srli_pi16%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRLWI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srli_pi32%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRLDI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srli_si64%> in code");
> -   else if (fcode == ARM_BUILTIN_WSLLHI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_slli_pi16%> in code");
> -   else if (fcode == ARM_BUILTIN_WSLLWI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_slli_pi32%> in code");
> -   else if (fcode == ARM_BUILTIN_WSLLDI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_slli_si64%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRAHI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srai_pi16%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRAWI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srai_pi32%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRADI)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srai_si64%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRLH)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srl_pi16%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRLW)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srl_pi32%> in code");
> -   else if (fcode == ARM_BUILTIN_WSRLD)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_srl_si64%> in code");
> -   else if (fcode == ARM_BUILTIN_WSLLH)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_sll_pi16%> in code");
> -   else if (fcode == ARM_BUILTIN_WSLLW)
> - error ("the count should be no less than 0; please check the 
> intrinsic %<_mm_sll_pi32%> in code");
> -   else if (

Re: [PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Martin Liška


Hello.

There's V2 where I fixed fact that:

ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORH can have valid range on [1, 
16].

Ready to be installed?
Thanks,
MartinFrom 879262c3a7aefd3ab9552cec881248b240f53818 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 7 Mar 2022 10:56:43 +0100
Subject: [PATCH] translation: reuse string and use switch for codes

gcc/ChangeLog:

	* config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
	message.
---
 gcc/config/arm/arm-builtins.cc | 125 -
 1 file changed, 75 insertions(+), 50 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a7acc1d71e7..77d47d1d059 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -3922,65 +3922,90 @@ arm_expand_builtin (tree exp,
   if (GET_MODE (op1) == VOIDmode)
 	{
 	  imm = INTVAL (op1);
-	  if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI
-	   || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
+	  if ((fcode == ARM_BUILTIN_WRORWI || fcode == ARM_BUILTIN_WRORW)
 	  && (imm < 0 || imm > 32))
 	{
-	  if (fcode == ARM_BUILTIN_WRORHI)
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WRORWI)
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WRORH)
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code");
-	  else
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code");
+	  const char *builtin = (fcode == ARM_BUILTIN_WRORWI
+ ? "_mm_rori_pi32" : "_mm_ror_pi32");
+	  error ("the range of count should be in 0 to 32; please check the intrinsic %qs in code",
+		 builtin);
+	}
+	  else if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORH)
+		   && (imm < 0 || imm > 16))
+	{
+	  const char *builtin = (fcode == ARM_BUILTIN_WRORHI
+ ? "_mm_rori_pi16" : "_mm_ror_pi16");
+	  error ("the range of count should be in 0 to 16; please check the intrinsic %qs in code",
+		 builtin);
 	}
 	  else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
 		   && (imm < 0 || imm > 64))
 	{
-	  if (fcode == ARM_BUILTIN_WRORDI)
-		error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code");
-	  else
-		error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code");
+	  const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : "_mm_ror_si64";
+	  error ("the range of count should be in 0 to 64; please check the intrinsic %qs in code", builtin);
 	}
 	  else if (imm < 0)
 	{
-	  if (fcode == ARM_BUILTIN_WSRLHI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLWI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLDI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLHI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLWI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLDI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRAHI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRAWI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRADI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLH)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLW)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLD)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLH)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLW)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi32%> in code");
-	  else if (f

Re: [PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Martin Liška


On 3/7/22 11:12, Jakub Jelinek wrote:

Will defer review to ARM maintainers, just wanted to mention some of the
lines are way too long (before or after).


Sure, fixed in V3.

MartinFrom 34a195d905bb1719e13c2021ee475a0581b62a5b Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 7 Mar 2022 10:56:43 +0100
Subject: [PATCH] translation: reuse string and use switch for codes

gcc/ChangeLog:

	* config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
	message.
---
 gcc/config/arm/arm-builtins.cc | 127 -
 1 file changed, 77 insertions(+), 50 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a7acc1d71e7..6c0b1bda66a 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -3922,65 +3922,92 @@ arm_expand_builtin (tree exp,
   if (GET_MODE (op1) == VOIDmode)
 	{
 	  imm = INTVAL (op1);
-	  if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI
-	   || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
+	  if ((fcode == ARM_BUILTIN_WRORWI || fcode == ARM_BUILTIN_WRORW)
 	  && (imm < 0 || imm > 32))
 	{
-	  if (fcode == ARM_BUILTIN_WRORHI)
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WRORWI)
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WRORH)
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code");
-	  else
-		error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code");
+	  const char *builtin = (fcode == ARM_BUILTIN_WRORWI
+ ? "_mm_rori_pi32" : "_mm_ror_pi32");
+	  error ("the range of count should be in 0 to 32; "
+		 "please check the intrinsic %qs in code", builtin);
+	}
+	  else if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORH)
+		   && (imm < 0 || imm > 16))
+	{
+	  const char *builtin = (fcode == ARM_BUILTIN_WRORHI
+ ? "_mm_rori_pi16" : "_mm_ror_pi16");
+	  error ("the range of count should be in 0 to 16; "
+		 "please check the intrinsic %qs in code", builtin);
 	}
 	  else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
 		   && (imm < 0 || imm > 64))
 	{
-	  if (fcode == ARM_BUILTIN_WRORDI)
-		error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code");
-	  else
-		error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code");
+	  const char *builtin = (fcode == ARM_BUILTIN_WRORDI
+ ? "_mm_rori_si64" : "_mm_ror_si64");
+	  error ("the range of count should be in 0 to 64; "
+		 "please check the intrinsic %qs in code", builtin);
 	}
 	  else if (imm < 0)
 	{
-	  if (fcode == ARM_BUILTIN_WSRLHI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLWI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLDI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLHI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLWI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLDI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRAHI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRAWI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRADI)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLH)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLW)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code");
-	  else if (fcode == ARM_BUILTIN_WSRLD)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLH)
-		error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi16%> in code");
-	  else if (fcode == ARM_BUILTIN_WSLLW)
-		error ("the count should be no less than 0; please check the intrinsic %

Re: [PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Christophe Lyon via Gcc-patches


Hi!

On 3/7/22 11:09, Martin Liška wrote:

Hi.

The patch simplifies translation strings.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
This is an arm-only patch, it's not clear if you built a cross-compiler 
for arm, and that these regression tests passed?




Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

 * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
 message.


I think this is about PR target/104794, so this should be mentioned?

Thanks,

Christophe



---
  gcc/config/arm/arm-builtins.cc | 127 -
  1 file changed, 79 insertions(+), 48 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc 
b/gcc/config/arm/arm-builtins.cc

index a7acc1d71e7..bdccba2bc88 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp,
     || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
    && (imm < 0 || imm > 32))
  {
-  if (fcode == ARM_BUILTIN_WRORHI)
-    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_rori_pi16%> in code");

-  else if (fcode == ARM_BUILTIN_WRORWI)
-    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_rori_pi32%> in code");

-  else if (fcode == ARM_BUILTIN_WRORH)
-    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_ror_pi16%> in code");

-  else
-    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_ror_pi32%> in code");

+  const char *builtin;
+  switch (fcode)
+    {
+    case ARM_BUILTIN_WRORHI:
+  builtin = "_mm_rori_pi16";
+  break;
+    case ARM_BUILTIN_WRORWI:
+  builtin = "_mm_rori_pi32";
+  break;
+    case ARM_BUILTIN_WRORH:
+  builtin = "_mm_ror_pi16";
+  break;
+    default:
+  builtin = "_mm_ror_pi32";
+  break;
+    }
+  error ("the range of count should be in 0 to 32; please check 
the intrinsic %qs in code", builtin);

  }
    else if ((fcode == ARM_BUILTIN_WRORDI || fcode == 
ARM_BUILTIN_WRORD)

     && (imm < 0 || imm > 64))
  {
-  if (fcode == ARM_BUILTIN_WRORDI)
-    error ("the range of count should be in 0 to 64; please check 
the intrinsic %<_mm_rori_si64%> in code");

-  else
-    error ("the range of count should be in 0 to 64; please check 
the intrinsic %<_mm_ror_si64%> in code");
+  const char *builtin = fcode == ARM_BUILTIN_WRORDI ? 
"_mm_rori_si64" : "_mm_ror_si64";
+  error ("the range of count should be in 0 to 64; please check 
the intrinsic %qs in code", builtin);

  }
    else if (imm < 0)
  {
-  if (fcode == ARM_BUILTIN_WSRLHI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srli_pi16%> in code");

-  else if (fcode == ARM_BUILTIN_WSRLWI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srli_pi32%> in code");

-  else if (fcode == ARM_BUILTIN_WSRLDI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srli_si64%> in code");

-  else if (fcode == ARM_BUILTIN_WSLLHI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_slli_pi16%> in code");

-  else if (fcode == ARM_BUILTIN_WSLLWI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_slli_pi32%> in code");

-  else if (fcode == ARM_BUILTIN_WSLLDI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_slli_si64%> in code");

-  else if (fcode == ARM_BUILTIN_WSRAHI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srai_pi16%> in code");

-  else if (fcode == ARM_BUILTIN_WSRAWI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srai_pi32%> in code");

-  else if (fcode == ARM_BUILTIN_WSRADI)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srai_si64%> in code");

-  else if (fcode == ARM_BUILTIN_WSRLH)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srl_pi16%> in code");

-  else if (fcode == ARM_BUILTIN_WSRLW)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srl_pi32%> in code");

-  else if (fcode == ARM_BUILTIN_WSRLD)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_srl_si64%> in code");

-  else if (fcode == ARM_BUILTIN_WSLLH)
-    error ("the count should be no less than 0; please check the 
intrinsic %<_mm_sll_pi16%> in code");

-  else if (fcode == ARM_BUILTIN_WSLLW)
-    error ("t

Re: [PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Martin Liška


On 3/7/22 11:18, Christophe Lyon via Gcc-patches wrote:

Hi!

On 3/7/22 11:09, Martin Liška wrote:

Hi.

The patch simplifies translation strings.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

This is an arm-only patch, it's not clear if you built a cross-compiler for 
arm, and that these regression tests passed?


Sorry, I meant I can built the cross compiler. If I see correctly there are no 
tests that
would check the error messages.





Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

 * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
 message.


I think this is about PR target/104794, so this should be mentioned?


Yep.

There's V4.

Is it ready for master?

Martin



Thanks,

Christophe



---
  gcc/config/arm/arm-builtins.cc | 127 -
  1 file changed, 79 insertions(+), 48 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a7acc1d71e7..bdccba2bc88 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp,
 || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
    && (imm < 0 || imm > 32))
  {
-  if (fcode == ARM_BUILTIN_WRORHI)
-    error ("the range of count should be in 0 to 32; please check the intrinsic 
%<_mm_rori_pi16%> in code");
-  else if (fcode == ARM_BUILTIN_WRORWI)
-    error ("the range of count should be in 0 to 32; please check the intrinsic 
%<_mm_rori_pi32%> in code");
-  else if (fcode == ARM_BUILTIN_WRORH)
-    error ("the range of count should be in 0 to 32; please check the intrinsic 
%<_mm_ror_pi16%> in code");
-  else
-    error ("the range of count should be in 0 to 32; please check the intrinsic 
%<_mm_ror_pi32%> in code");
+  const char *builtin;
+  switch (fcode)
+    {
+    case ARM_BUILTIN_WRORHI:
+  builtin = "_mm_rori_pi16";
+  break;
+    case ARM_BUILTIN_WRORWI:
+  builtin = "_mm_rori_pi32";
+  break;
+    case ARM_BUILTIN_WRORH:
+  builtin = "_mm_ror_pi16";
+  break;
+    default:
+  builtin = "_mm_ror_pi32";
+  break;
+    }
+  error ("the range of count should be in 0 to 32; please check the 
intrinsic %qs in code", builtin);
  }
    else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
 && (imm < 0 || imm > 64))
  {
-  if (fcode == ARM_BUILTIN_WRORDI)
-    error ("the range of count should be in 0 to 64; please check the intrinsic 
%<_mm_rori_si64%> in code");
-  else
-    error ("the range of count should be in 0 to 64; please check the intrinsic 
%<_mm_ror_si64%> in code");
+  const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : 
"_mm_ror_si64";
+  error ("the range of count should be in 0 to 64; please check the 
intrinsic %qs in code", builtin);
  }
    else if (imm < 0)
  {
-  if (fcode == ARM_BUILTIN_WSRLHI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srli_pi16%> in code");
-  else if (fcode == ARM_BUILTIN_WSRLWI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srli_pi32%> in code");
-  else if (fcode == ARM_BUILTIN_WSRLDI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srli_si64%> in code");
-  else if (fcode == ARM_BUILTIN_WSLLHI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_slli_pi16%> in code");
-  else if (fcode == ARM_BUILTIN_WSLLWI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_slli_pi32%> in code");
-  else if (fcode == ARM_BUILTIN_WSLLDI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_slli_si64%> in code");
-  else if (fcode == ARM_BUILTIN_WSRAHI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srai_pi16%> in code");
-  else if (fcode == ARM_BUILTIN_WSRAWI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srai_pi32%> in code");
-  else if (fcode == ARM_BUILTIN_WSRADI)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srai_si64%> in code");
-  else if (fcode == ARM_BUILTIN_WSRLH)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srl_pi16%> in code");
-  else if (fcode == ARM_BUILTIN_WSRLW)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srl_pi32%> in code");
-  else if (fcode == ARM_BUILTIN_WSRLD)
-    error ("the count should be no less than 0; please check the intrinsic 
%<_mm_srl_si64%> in code");
-

[PATCH] arm: fix option quoting in error messages.

2022-03-07 Thread Martin Liška


This fixes option quoting in error messages.

Ready to be installed?
Thanks,
Martin

PR target/104794

gcc/ChangeLog:

* config/arm/arm.cc (arm_option_override_internal): Fix quoting
of options in error messages.
(arm_option_reconfigure_globals): Likewise.
---
 gcc/config/arm/arm.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index c1103d9fad6..4bf1ef0517d 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -3185,8 +3185,8 @@ arm_option_override_internal (struct gcc_options *opts,
   if (arm_stack_protector_guard == SSP_GLOBAL
   && opts->x_arm_stack_protector_guard_offset_str)
 {
-  error ("incompatible options %'-mstack-protector-guard=global%' and"
-"%'-mstack-protector-guard-offset=%qs%'",
+  error ("incompatible options %<-mstack-protector-guard=global%> and"
+"%<-mstack-protector-guard-offset=%s%>",
 arm_stack_protector_guard_offset_str);
 }
 
@@ -3880,7 +3880,7 @@ arm_option_reconfigure_globals (void)

 }
 
   if (!TARGET_HARD_TP && arm_stack_protector_guard == SSP_TLSREG)

-error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
+error("%<-mstack-protector-guard=tls%> needs a hardware TLS register");
 }
 
 /* Perform some validation between the desired architecture and the rest of the

--
2.35.1

RE: [PATCH][ARM] translation: reuse string and use switch for codes

2022-03-07 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Martin Liška
> Sent: Monday, March 7, 2022 10:22 AM
> To: Christophe Lyon ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH][ARM] translation: reuse string and use switch for codes
> 
> On 3/7/22 11:18, Christophe Lyon via Gcc-patches wrote:
> > Hi!
> >
> > On 3/7/22 11:09, Martin Liška wrote:
> >> Hi.
> >>
> >> The patch simplifies translation strings.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > This is an arm-only patch, it's not clear if you built a cross-compiler for 
> > arm,
> and that these regression tests passed?
> 
> Sorry, I meant I can built the cross compiler. If I see correctly there are no
> tests that
> would check the error messages.
> 
> >
> >>
> >> Ready to be installed?
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
> >>  message.
> >
> > I think this is about PR target/104794, so this should be mentioned?
> 
> Yep.
> 
> There's V4.
> 
> Is it ready for master?

Ok.
Thanks,
Kyrill

> 
> Martin
> 
> >
> > Thanks,
> >
> > Christophe
> >
> >
> >> ---
> >>   gcc/config/arm/arm-builtins.cc | 127 -
> >>   1 file changed, 79 insertions(+), 48 deletions(-)
> >>
> >> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-
> builtins.cc
> >> index a7acc1d71e7..bdccba2bc88 100644
> >> --- a/gcc/config/arm/arm-builtins.cc
> >> +++ b/gcc/config/arm/arm-builtins.cc
> >> @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp,
> >>  || fcode == ARM_BUILTIN_WRORH || fcode ==
> ARM_BUILTIN_WRORW)
> >>     && (imm < 0 || imm > 32))
> >>   {
> >> -  if (fcode == ARM_BUILTIN_WRORHI)
> >> -    error ("the range of count should be in 0 to 32; please check the
> intrinsic %<_mm_rori_pi16%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WRORWI)
> >> -    error ("the range of count should be in 0 to 32; please check the
> intrinsic %<_mm_rori_pi32%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WRORH)
> >> -    error ("the range of count should be in 0 to 32; please check the
> intrinsic %<_mm_ror_pi16%> in code");
> >> -  else
> >> -    error ("the range of count should be in 0 to 32; please check the
> intrinsic %<_mm_ror_pi32%> in code");
> >> +  const char *builtin;
> >> +  switch (fcode)
> >> +    {
> >> +    case ARM_BUILTIN_WRORHI:
> >> +  builtin = "_mm_rori_pi16";
> >> +  break;
> >> +    case ARM_BUILTIN_WRORWI:
> >> +  builtin = "_mm_rori_pi32";
> >> +  break;
> >> +    case ARM_BUILTIN_WRORH:
> >> +  builtin = "_mm_ror_pi16";
> >> +  break;
> >> +    default:
> >> +  builtin = "_mm_ror_pi32";
> >> +  break;
> >> +    }
> >> +  error ("the range of count should be in 0 to 32; please check 
> >> the
> intrinsic %qs in code", builtin);
> >>   }
> >>     else if ((fcode == ARM_BUILTIN_WRORDI || fcode ==
> ARM_BUILTIN_WRORD)
> >>  && (imm < 0 || imm > 64))
> >>   {
> >> -  if (fcode == ARM_BUILTIN_WRORDI)
> >> -    error ("the range of count should be in 0 to 64; please check the
> intrinsic %<_mm_rori_si64%> in code");
> >> -  else
> >> -    error ("the range of count should be in 0 to 64; please check the
> intrinsic %<_mm_ror_si64%> in code");
> >> +  const char *builtin = fcode == ARM_BUILTIN_WRORDI ?
> "_mm_rori_si64" : "_mm_ror_si64";
> >> +  error ("the range of count should be in 0 to 64; please check 
> >> the
> intrinsic %qs in code", builtin);
> >>   }
> >>     else if (imm < 0)
> >>   {
> >> -  if (fcode == ARM_BUILTIN_WSRLHI)
> >> -    error ("the count should be no less than 0; please check the 
> >> intrinsic
> %<_mm_srli_pi16%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WSRLWI)
> >> -    error ("the count should be no less than 0; please check the 
> >> intrinsic
> %<_mm_srli_pi32%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WSRLDI)
> >> -    error ("the count should be no less than 0; please check the 
> >> intrinsic
> %<_mm_srli_si64%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WSLLHI)
> >> -    error ("the count should be no less than 0; please check the 
> >> intrinsic
> %<_mm_slli_pi16%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WSLLWI)
> >> -    error ("the count should be no less than 0; please check the 
> >> intrinsic
> %<_mm_slli_pi32%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WSLLDI)
> >> -    error ("the count should be no less than 0; please check the 
> >> intrinsic
> %<_mm_slli_si64%> in code");
> >> -  else if (fcode == ARM_BUILTIN_WSRAHI)
> >> -    error ("the count should be no less than 0; please check the 
>

RE: [PATCH] arm: fix option quoting in error messages.

2022-03-07 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Martin Liška
> Sent: Monday, March 7, 2022 10:28 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Christophe Lyon 
> Subject: [PATCH] arm: fix option quoting in error messages.
> 
> This fixes option quoting in error messages.
> 
> Ready to be installed?

Ok.
Thanks,
Kyrill

> Thanks,
> Martin
> 
>   PR target/104794
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.cc (arm_option_override_internal): Fix quoting
>   of options in error messages.
>   (arm_option_reconfigure_globals): Likewise.
> ---
>   gcc/config/arm/arm.cc | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index c1103d9fad6..4bf1ef0517d 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -3185,8 +3185,8 @@ arm_option_override_internal (struct gcc_options
> *opts,
> if (arm_stack_protector_guard == SSP_GLOBAL
> && opts->x_arm_stack_protector_guard_offset_str)
>   {
> -  error ("incompatible options %'-mstack-protector-guard=global%' and"
> -  "%'-mstack-protector-guard-offset=%qs%'",
> +  error ("incompatible options %<-mstack-protector-guard=global%> and"
> +  "%<-mstack-protector-guard-offset=%s%>",
>arm_stack_protector_guard_offset_str);
>   }
> 
> @@ -3880,7 +3880,7 @@ arm_option_reconfigure_globals (void)
>   }
> 
> if (!TARGET_HARD_TP && arm_stack_protector_guard == SSP_TLSREG)
> -error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
> +error("%<-mstack-protector-guard=tls%> needs a hardware TLS register");
>   }
> 
>   /* Perform some validation between the desired architecture and the rest
> of the
> --
> 2.35.1

[PATCH][pushed] MSP430: fix error message.

2022-03-07 Thread Martin Liška


PR target/104797

gcc/ChangeLog:

* config/msp430/msp430.cc (msp430_expand_delay_cycles): Remove
parenthesis from built-in name.
---
 gcc/config/msp430/msp430.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/msp430/msp430.cc b/gcc/config/msp430/msp430.cc
index eb219fd5e71..7a378ceac56 100644
--- a/gcc/config/msp430/msp430.cc
+++ b/gcc/config/msp430/msp430.cc
@@ -2744,7 +2744,7 @@ msp430_expand_delay_cycles (rtx arg)
 
   if (GET_CODE (arg) != CONST_INT)

 {
-  error ("%<__delay_cycles()%> only takes constant arguments");
+  error ("%<__delay_cycles%> only takes constant arguments");
   return NULL_RTX;
 }
 
--

2.35.1

[x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl

2022-03-07 Thread Roger Sayle


This patch is the backend piece of my proposed fix to PR tree-opt/98335,
to allow C++ partial struct initialization to be as efficient/optimized
as full struct initialization.

With the middle-end patch just posted to gcc-patches, the test case
in the PR compiles on x86_64-pc-linux-gnu with -O2 to:

xorl%eax, %eax
movbc(%rip), %al
ret

with this additional peephole2 (actually pair of peephole2s):

movzbl  c(%rip), %eax
ret


This patch has been tested on x86_64-pc-linux-gnu, on top of the
middle-end piece, with make bootstrap and make -k check with no
new failures.  Posted in pieces to simplify review.  Ok for mainline?


2022-03-07  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/98335
* config/i386/i386.md (peephole2): Transform xorl followed by
a suitable movb or movw into the equivalent movz[bw]l.

gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.target/i386/pr98335.C: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d15170e..f1eb62a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4276,6 +4276,26 @@
   [(set_attr "isa" "*,avx512dq,avx512dq")
(set_attr "type" "imovx,mskmov,mskmov")
(set_attr "mode" "SI,QI,QI")])
+
+;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
+(define_peephole2
+  [(parallel [(set (match_operand:DI 0 "general_reg_operand")
+  (const_int 0))
+ (clobber (reg:CC FLAGS_REG))])
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+   (match_operand:SWI12 2 "nonimmediate_operand"))]
+  "TARGET_64BIT
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 2)))])
+
+;; Likewise, but preserving FLAGS_REG.
+(define_peephole2
+  [(set (match_operand:DI 0 "general_reg_operand") (const_int 0))
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+   (match_operand:SWI12 2 "nonimmediate_operand"))]
+  "TARGET_64BIT
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 2)))])
 
 ;; Sign extension instructions
 
diff --git a/gcc/testsuite/g++.target/i386/pr98335.C 
b/gcc/testsuite/g++.target/i386/pr98335.C
new file mode 100644
index 000..2581b83
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr98335.C
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98"  { target { 
c++98_only } } }
+}
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */
+

Re: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl;movb -> movzbl

2022-03-07 Thread Uros Bizjak via Gcc-patches

On Mon, Mar 7, 2022 at 11:55 AM Roger Sayle  wrote:
>
>
> This patch is the backend piece of my proposed fix to PR tree-opt/98335,
> to allow C++ partial struct initialization to be as efficient/optimized
> as full struct initialization.
>
> With the middle-end patch just posted to gcc-patches, the test case
> in the PR compiles on x86_64-pc-linux-gnu with -O2 to:
>
> xorl%eax, %eax
> movbc(%rip), %al
> ret
>
> with this additional peephole2 (actually pair of peephole2s):
>
> movzbl  c(%rip), %eax
> ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu, on top of the
> middle-end piece, with make bootstrap and make -k check with no
> new failures.  Posted in pieces to simplify review.  Ok for mainline?

Is there a reason that only inserts to DImode registers are
implemented? IMO, these peepholes should also handle inserts to
SImode.

Uros.

>
> 2022-03-07  Roger Sayle  
>
> gcc/ChangeLog
> PR tree-optimization/98335
> * config/i386/i386.md (peephole2): Transform xorl followed by
> a suitable movb or movw into the equivalent movz[bw]l.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/98335
> * g++.target/i386/pr98335.C: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

Re: [PATCH V2] [i386] Optimize v4si broadcast for noavx512vl.

2022-03-07 Thread Uros Bizjak via Gcc-patches

On Mon, Mar 7, 2022 at 6:11 AM liuhongt  wrote:
>
> >What happens if you set preferred_for_speed to false for alternative 1?
> It works, and I've removed the newly added splitter in this patch.
> Also i tried to do similar things to *vec_dup with mode iterator 
> AVX2_VEC_DUP_MODE, but it hit ICE during reload since x86 don't have direct 
> move for QImode from gpr to sse register. so in this patch i only handle 
> *vec_dupv4si.
>
> >> +(define_split
> >> +  [(set (match_operand:V4SI 0 "sse_reg_operand")
> >> +   (vec_duplicate:V4SI
> >> + (match_operand:SI 1 "general_reg_operand")))]
> >> +  "TARGET_SSE && reload_completed
> >> +   /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is
> >> +  available, because then we can broadcast from GPRs directly.  */
>
> >I think avx512vl_vec_dup_gprv4si should be merged with the above
> >pattern instead.
> Remove this splitter.
>
> This will enable below
>
> -   vbroadcastss.LC1(%rip), %xmm0
> +   movl$-45, %edx
> +   vmovd   %edx, %xmm0
> +   vpshufd $0, %xmm0, %xmm0
>
> According to microbenchmark, it's faster than broadcast from memory
> for TARGET_INTER_UNIT_MOVES_TO_VEC.
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (*vec_dupv4si): Disable memory operand
> for !TARGET_INTER_UNIT_MOVES_TO_VEC when prefer_for_speed.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr100865-8a.c: Adjust testcase.
> * gcc.target/i386/pr100865-8c.c: Ditto.
> * gcc.target/i386/pr100865-9c.c: Ditto.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md  | 7 ++-
>  gcc/testsuite/gcc.target/i386/pr100865-8a.c | 2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-8c.c | 2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-9c.c | 2 +-
>  4 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3066ea3734a..a091853065e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25134,7 +25134,12 @@ (define_insn "*vec_dupv4si"
> (set_attr "length_immediate" "1,0,1")
> (set_attr "prefix_extra" "0,1,*")
> (set_attr "prefix" "maybe_vex,maybe_evex,orig")
> -   (set_attr "mode" "TI,V4SF,V4SF")])
> +   (set_attr "mode" "TI,V4SF,V4SF")
> +   (set (attr "preferred_for_speed")
> + (cond [(eq_attr "alternative" "1")
> + (symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC")
> +  ]
> +  (symbol_ref "true")))])
>
>  (define_insn "*vec_dupv2di"
>[(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c 
> b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> index 911b14d4a25..544a14db6f7 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> @@ -20,5 +20,5 @@ foo (void)
>  array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
>  }
>
> -/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
> \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
> \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
> */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c 
> b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> index 00682edb8c9..efee0488614 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> @@ -3,5 +3,5 @@
>
>  #include "pr100865-8a.c"
>
> -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
> %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
> %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
> */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9c.c 
> b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> index 8ffcdc1629d..e6f25902c1d 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> @@ -3,5 +3,5 @@
>
>  #include "pr100865-9a.c"
>
> -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
> %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
> %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
> */
> --
> 2.18.1
>

[PATCH] tree-optimization/104782 - adjust PR101636 fix

2022-03-07 Thread Richard Biener via Gcc-patches

This reverts the reversion of r10-5979 amending the CTOR case
with a comment as to why the conversion is not necessary there.
It also adds a testcase (but not for the CTOR case).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-07  Richard Biener  

PR tree-optimization/104782
* tree-vect-slp.cc (vectorize_slp_instance_root_stmt):
Re-instantiate r10-5979 fix, add comment.

* gcc.dg/vect/pr104782.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr104782.c | 18 ++
 gcc/tree-vect-slp.cc |  7 +++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr104782.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr104782.c 
b/gcc/testsuite/gcc.dg/vect/pr104782.c
new file mode 100644
index 000..7b8ca6ca25b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr104782.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-march=armv8.2-a+sve -msve-vector-bits=128" { 
target aarch64-*-* } } */
+
+int a, b, c;
+static int d;
+short *q;
+void f() {
+  int *p = &d;
+  b = 9;
+  for (b = 9; b; b--) {
+a = 2;
+for (c = 2; c <= 9; c++) {
+  for (int i = 0; i < 3; i++)
+*p |= (*q)++;
+}
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f9bbc872a99..4ac2b70303c 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7382,6 +7382,10 @@ vectorize_slp_instance_root_stmt (slp_tree node, 
slp_instance instance)
  gimple *child_stmt = SLP_TREE_VEC_STMTS (node)[0];
  tree vect_lhs = gimple_get_lhs (child_stmt);
  tree root_lhs = gimple_get_lhs (instance->root_stmts[0]->stmt);
+ if (!useless_type_conversion_p (TREE_TYPE (root_lhs),
+ TREE_TYPE (vect_lhs)))
+   vect_lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (root_lhs),
+  vect_lhs);
  rstmt = gimple_build_assign (root_lhs, vect_lhs);
}
   else if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) > 1)
@@ -7392,6 +7396,9 @@ vectorize_slp_instance_root_stmt (slp_tree node, 
slp_instance instance)
  vec *v;
  vec_alloc (v, nelts);
 
+ /* A CTOR can handle V16HI composition from VNx8HI so we
+do not need to convert vector elements if the types
+do not match.  */
  FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt)
CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
gimple_get_lhs (child_stmt));
-- 
2.34.1

RE: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl

2022-03-07 Thread Roger Sayle



Hi Uros,

> Is there a reason that only inserts to DImode registers are implemented?
> IMO, these peepholes should also handle inserts to SImode.

I wasn't able to construct a test case that produced a byte or word insert
into an SImode register.  The front-ends and middle-end end up producing
different code sequences, and -m32 changes the ABI so that small structs
get passed in memory rather than in registers.

Here's the expanded testcase that I investigated:

struct DataCL { char a; int b; };
struct DataWL { short a; int b; };
struct DataIL { int a; int b; };

struct DataCI { char a; short b; };
struct DataWI { short a; short b; };

char c;
short w;
int i;

DataCL foo_cl(int idx) { return { c }; }
DataCL bar_cl(int idx) { return { c, 0 }; }
DataWL foo_wl(int idx) { return { w }; }
DataWL bar_wl(int idx) { return { w, 0 }; }
DataIL foo_il(int idx) { return { i }; }
DataIL bar_il(int idx) { return { i, 0 }; }

DataCI foo_ci(int idx) { return { c }; }
DataCI bar_ci(int idx) { return { c, 0 }; }
DataWI foo_wi(int idx) { return { w }; }
DataWI bar_wi(int idx) { return { w, 0 }; }


I agree that for completeness similar peepholes handling inserts into 
SImode would be a good thing, but these wouldn't be restricted by
TARGET_64BIT and would therefore require additional -m32 testing.
The DImode peepholes I can justify for stage4 as PR tree-opt/98335
is a regression, SImode peepholes would be more of a "leap of faith".
If you’d be willing to accept a patch without a testcase, let me know.

It's also a pity that subreg handling in combine doesn't allow merging
these inserts into zero registers to be combined to zero_extends in a
machine independent way.  My recent patch for PR 95126 (awaiting
review) should also allow front-ends and middle-end passes more
flexibility in optimizing small struct constructors. 

Thanks (as always) for reviewing patches so quickly.

Roger
--

Re: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl

2022-03-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 07, 2022 at 11:51:50AM -, Roger Sayle wrote:
> I agree that for completeness similar peepholes handling inserts into 
> SImode would be a good thing, but these wouldn't be restricted by
> TARGET_64BIT and would therefore require additional -m32 testing.
> The DImode peepholes I can justify for stage4 as PR tree-opt/98335
> is a regression, SImode peepholes would be more of a "leap of faith".
> If you’d be willing to accept a patch without a testcase, let me know.

For testcase, wouldn't something like:
struct Data { char a; short b; };

char c;

void val(void) {
  __asm__ __volatile__ ("" : : "r" ((struct Data) { c } ));
}
do it?

Jakub

[C++ PATCH] PR c++/96329: ICE-on-invalid-code error recovery.

2022-03-07 Thread Roger Sayle


This patch fixes PR c++/96329 which is an ICE-on-invalid-code regression
affecting mainline.

This patch has been tested on x86_64-pc-linux-gnu, enabling languages
c and c++, with make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-07  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96329
* parser.cc (cp_parser_linkage_specification): Treat the case where
linkage is error_mark_node as "invalid linkage-specification".

gcc/testsuite/ChangeLog
PR c++/96329
* g++.dg/pr96329.C: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 03d99ab..d2993c7 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -16148,8 +16148,9 @@ cp_parser_linkage_specification (cp_parser* parser, 
tree prefix_attr)
   /* Transform the literal into an identifier.  If the literal is a
  wide-character string, or contains embedded NULs, then we can't
  handle it as the user wants.  */
-  if (strlen (TREE_STRING_POINTER (linkage))
-  != (size_t) (TREE_STRING_LENGTH (linkage) - 1))
+  if (linkage == error_mark_node
+  || strlen (TREE_STRING_POINTER (linkage))
+!= (size_t) (TREE_STRING_LENGTH (linkage) - 1))
 {
   cp_parser_error (parser, "invalid linkage-specification");
   /* Assume C++ linkage.  */
diff --git a/gcc/testsuite/g++.dg/pr96329.C b/gcc/testsuite/g++.dg/pr96329.C
new file mode 100644
index 000..c9f44b7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96329.C
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-excess-errors "" } */
+
+extern "C" ""a

[C++ PATCH] PR c++/96437: ICE-on-invalid-code error recovery.

2022-03-07 Thread Roger Sayle


This patch fixes PR c++/96437 which is an ICE-on-invalid-code regression
affecting mainline.

This patch has been tested on x86_64-pc-linux-gnu, enabling languages
c and c++, with make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-07  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96437
* parser.cc (synthesize_implicit_template_parm): Check that
TREE_VALUE (new_parm) isn't error_mark_node before setting its
DECL_VIRTUAL_P.

gcc/testsuite/ChangeLog
PR c++/96437
* g++.dg/pr96437.C: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 03d99ab..992f839 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -48217,7 +48217,8 @@ synthesize_implicit_template_parm  (cp_parser *parser, 
tree constr)
  function template is equivalent to an explicit template.
 
  Note that DECL_ARTIFICIAL is used elsewhere for template parameters.  */
-  DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true;
+  if (TREE_VALUE (new_parm) != error_mark_node)
+DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true;
 
   // Chain the new parameter to the list of implicit parameters.
   if (parser->implicit_template_parms)
diff --git a/gcc/testsuite/g++.dg/pr96437.C b/gcc/testsuite/g++.dg/pr96437.C
new file mode 100644
index 000..00c4141
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96437.C
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-excess-errors "" } */
+
+template <()> void A(auto){}

[C++ PATCH] PR c++/96440: ICE-on-invalid-code error recovery.

2022-03-07 Thread Roger Sayle


This patch fixes PR c++/96440 which is an ICE-on-invalid-code regression
affecting mainline.

This patch has been tested on x86_64-pc-linux-gnu, enabling languages
c and c++, with make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-07  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96440
* decl.cc (start_decl): Defend against prefix_attributes being
error_mark_node.

gcc/testsuite/ChangeLog
PR c++/96440
* g++.dg/pr96440.C: New test case.

Roger
--

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7f80f9d..de41b4d 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -5483,13 +5483,15 @@ start_decl (const cp_declarator *declarator,
 
   *pushed_scope_p = NULL_TREE;
 
-  attributes = chainon (attributes, prefix_attributes);
+  if (prefix_attributes != error_mark_node)
+attributes = chainon (attributes, prefix_attributes);
 
   decl = grokdeclarator (declarator, declspecs, NORMAL, initialized,
 &attributes);
 
   if (decl == NULL_TREE || VOID_TYPE_P (decl)
-  || decl == error_mark_node)
+  || decl == error_mark_node
+  || prefix_attributes == error_mark_node)
 return error_mark_node;
 
   context = CP_DECL_CONTEXT (decl);
diff --git a/gcc/testsuite/g++.dg/pr96440.C b/gcc/testsuite/g++.dg/pr96440.C
new file mode 100644
index 000..212271f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96440.C
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-excess-errors "" } */
+
+auto alignas a [[]] [[const]] ();

Re: [PATCH] arm: fix option quoting in error messages.

2022-03-07 Thread Richard Earnshaw via Gcc-patches





On 07/03/2022 10:27, Martin Liška wrote:

This fixes option quoting in error messages.

Ready to be installed?
Thanks,
Martin

 PR target/104794

gcc/ChangeLog:

 * config/arm/arm.cc (arm_option_override_internal): Fix quoting
 of options in error messages.
 (arm_option_reconfigure_globals): Likewise.
---
  gcc/config/arm/arm.cc | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index c1103d9fad6..4bf1ef0517d 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -3185,8 +3185,8 @@ arm_option_override_internal (struct gcc_options 
*opts,

    if (arm_stack_protector_guard == SSP_GLOBAL
    && opts->x_arm_stack_protector_guard_offset_str)
  {
-  error ("incompatible options %'-mstack-protector-guard=global%' and"
- "%'-mstack-protector-guard-offset=%qs%'",
+  error ("incompatible options %<-mstack-protector-guard=global%> and"
+ "%<-mstack-protector-guard-offset=%s%>",


Wasn't the complaint also that there should be a space in the text 
between the 'and"' and the '"%<' on the subsequent line?


R.

   arm_stack_protector_guard_offset_str);
  }

@@ -3880,7 +3880,7 @@ arm_option_reconfigure_globals (void)
  }

    if (!TARGET_HARD_TP && arm_stack_protector_guard == SSP_TLSREG)
-    error("%'-mstack-protector-guard=tls%' needs a hardware TLS 
register");
+    error("%<-mstack-protector-guard=tls%> needs a hardware TLS 
register");

  }

  /* Perform some validation between the desired architecture and the 
rest of the

Re: [PATCH] arm: fix option quoting in error messages.

2022-03-07 Thread Martin Liška


On 3/7/22 14:14, Richard Earnshaw wrote:

Wasn't the complaint also that there should be a space in the text between the 'and"' 
and the '"%<' on the subsequent line?


Yeah, I overlooked that.

I'm going to push the following patch.

Cheers,
MartinFrom b1d8198e7df616ea80cb648a2c831e2c21f4319f Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 7 Mar 2022 14:16:21 +0100
Subject: [PATCH] arm: add missing space to error.

	PR target/104794

gcc/ChangeLog:

	* config/arm/arm.cc (arm_option_override_internal): Add missing
	space.
---
 gcc/config/arm/arm.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 4bf1ef0517d..14b6c804455 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -3185,7 +3185,7 @@ arm_option_override_internal (struct gcc_options *opts,
   if (arm_stack_protector_guard == SSP_GLOBAL
   && opts->x_arm_stack_protector_guard_offset_str)
 {
-  error ("incompatible options %<-mstack-protector-guard=global%> and"
+  error ("incompatible options %<-mstack-protector-guard=global%> and "
 	 "%<-mstack-protector-guard-offset=%s%>",
 	 arm_stack_protector_guard_offset_str);
 }
-- 
2.35.1

Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-07 Thread Richard Biener via Gcc-patches

On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu  wrote:
>
> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> >  wrote:
> > >
> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold memcpy.
> > > The default is
> > >
> > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> > >
> > > For x86, it is MOVE_MAX to restore the old behavior before
> >
> > I know we've discussed this to death in the PR, I just want to repeat here
> > that the GIMPLE folding expects to generate a single load and a single
> > store (that is what it does on the GIMPLE level) which is why MOVE_MAX
> > was chosen originally (it's documented to what a "single instruction" does).
> > In practice MOVE_MAX does not seem to cover vector register sizes
> > so Richard pulled MOVE_RATIO which is really intended to cover
> > the case of using multiple instructions for moving memory (but then I
> > don't remember whether for the ARM case the single load/store GIMPLE
> > will be expanded to multiple load/store instructions).
> >
> > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> > being very specific for memcpy folding (we also fold memmove btw).
> >
> > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> > than MOVE_MAX here and still honor the idea of single instructions.
> > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> > not MOVE_MAX * MOVE_RATIO.
> >
> > So if we need a new hook then that hook should at least get the
> > 'speed' argument of MOVE_RATIO and it should get a better name.
> >
> > I still think that it should be possible to improve the insn check to
> > avoid use of "disabled" modes, maybe that's also a point to add
> > a new hook like .move_with_mode_p or so?  To quote, we do
>
> Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.

Again I'd like to shine light on MOVE_MAX_PIECES which explicitely
mentions "a load or store used TO COPY MEMORY" (emphasis mine)
and whose x86 implementation would already be fine (doing larger moves
and also not doing too large moves).  But appearantly the arm folks
decided that that's not fit and instead (mis-?)used MOVE_MAX * MOVE_RATIO.

Yes, MOVE_MAX_PIECES is documented to apply to move_by_pieces.
Still GIMPLE memcpy/memmove inlining wants to mimic exactly that but
restrict itself to a single load and a single store.

> >
> >   scalar_int_mode mode;
> >   if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> >   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> >   && have_insn_for (SET, mode)
> >   /* If the destination pointer is not aligned we must be 
> > able
> >  to emit an unaligned store.  */
> >   && (dest_align >= GET_MODE_ALIGNMENT (mode)
> >   || !targetm.slow_unaligned_access (mode, dest_align)
> >   || (optab_handler (movmisalign_optab, mode)
> >   != CODE_FOR_nothing)))
> >
> > where I understand the ISA is enabled and if the user explicitely
> > uses it that's OK but -mprefer-avx128 should tell GCC to never
> > generate AVX256 code where the user was not explicitely using it
> > (still for example glibc might happily use AVX256 code to implement
> > the memcpy we are folding!)
> >
> > Note the BB vectorizer also might end up with using AVX256 because
> > in places it also relies on optab queries and the vector_mode_supported_p
> > check (but the memcpy folding uses the fake integer modes).  So
> > x86 might need to implement the related_mode hook to avoid "auto"-using
> > a larger vector mode which the default implementation would happily do.
> >
> > Richard.
>
> OK for master?

Looking for opinions from others as well.

Btw, there's a similar use in expand_DEFERRED_INIT:

  && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
0).exists (&var_mode)
  && have_insn_for (SET, var_mode))

So it occured to me that maybe targetm.move_with_mode_p should eventually
check have_insn_for (SET, var_mode) or we should abstract checking the two
things to a generic API somewhere (in optabs-query.h maybe, or expmed.h,
not sure where it would be more appropriate).

> +@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode 
> @var{mode})
> +This target hook returns true if move with mode @var{mode} can be
> +generated implicitly.  The default definition returns true.
> +@end deftypefn

I know what you mean but I'm not sure "can be generated implicitly" captures
that.  The compiler always generated stuff explicitely.  It's also
"should", not "can", no?

Thanks,
Richard.

> Thanks.
>
> H.J.
> ---
> Add TARGET_MOVE_WITH_MODE_P to return true if move with mode can be
> generated implicitly.  The default definition returns true.  The x86
> version returns true if the mode size <= MOVE_MAX, which is the max
> number o

Re: [PATCH] opts: fix -gtoggle + optimize attribute

2022-03-07 Thread Richard Biener via Gcc-patches

On Fri, Mar 4, 2022 at 2:12 PM Martin Liška  wrote:
>
> On 3/1/22 09:48, Richard Biener wrote:
> > I think moving flag_gtoggle handling before the flag_syntax_only handling
> > is a good thing.  But I don't quite understand the flag_var_tracking 
> > disabling
> > or how it worked before.
>
> Well, as you know, the debugging options are a can of worms. During GCC 12 
> development I moved
> most of the option logic to finish_options which is a place that is used both 
> for command line
> option processing and optimize/target pragma/attribute processing.
>
> That's why we see this problem. OPT_LEVELS_1_PLUS enables flag_var_tracking 
> but we have to drop
> debug debug_info_level == DINFO_LEVEL_NONE.
>
> > At least I think you want to check for
> > debug_info_level == NONE, no?  Why should DINFO_LEVEL_TERSE be
> > special?
>
> No, sending updated version of the patch.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

> Thanks,
> Martin

[PATCH v2] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-07 Thread H.J. Lu via Gcc-patches

Since eh_return doesn't work with stack realignment, disable SSE on
unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
incoming stack to avoid SSE usage which is caused by

commit 609e8c492d62d92465460eae3d43dfc4b2c68288
Author: H.J. Lu 
Date:   Sat Feb 26 14:17:23 2022 -0800

x86: Always return pseudo register in ix86_gen_scratch_sse_rtx

when pseudo vector registers are used to expand memset.

gcc/

PR target/104781
* config/i386/i386.cc (ix86_expand_epilogue): Assert there is
no stack realignment for eh_return.

gcc/testsuite/

PR target/104781
* gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4.
* gcc.target/i386/eh_return-2.c: Likewise.

libgcc/

PR target/104781
* config.host (tmake_file): Add i386/32/t-eh-return-no-sse for
32-bit x86 Cygwin, MinGW and Solaris.
* config/i386/32/t-eh-return-no-sse: New file.
---
 gcc/config/i386/i386.cc |  5 ++---
 gcc/testsuite/gcc.target/i386/eh_return-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/eh_return-2.c |  2 +-
 libgcc/config.host  | 13 +
 libgcc/config/i386/32/t-eh-return-no-sse|  5 +
 5 files changed, 22 insertions(+), 5 deletions(-)
 create mode 100644 libgcc/config/i386/32/t-eh-return-no-sse

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 95219902694..1c675304e32 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9444,9 +9444,8 @@ ix86_expand_epilogue (int style)
  rtx sa = EH_RETURN_STACKADJ_RTX;
  rtx_insn *insn;
 
- /* %ecx can't be used for both DRAP register and eh_return.  */
- if (crtl->drap_reg)
-   gcc_assert (REGNO (crtl->drap_reg) != CX_REG);
+ /* Stack realignment doesn't work with eh_return.  */
+ gcc_assert (!crtl->stack_realign_needed);
 
  /* regparm nested functions don't work with eh_return.  */
  gcc_assert (!ix86_static_chain_on_stack);
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c 
b/gcc/testsuite/gcc.target/i386/eh_return-1.c
index b21fd75fc93..43f94f01a97 100644
--- a/gcc/testsuite/gcc.target/i386/eh_return-1.c
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=haswell -mno-avx512f 
-mtune-ctrl=avx256_move_by_pieces" } */
+/* { dg-options "-O2 -mincoming-stack-boundary=4 -march=haswell -mno-avx512f 
-mtune-ctrl=avx256_move_by_pieces" } */
 
 struct _Unwind_Context
 {
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-2.c 
b/gcc/testsuite/gcc.target/i386/eh_return-2.c
index f23f4492dac..cb762f92cc2 100644
--- a/gcc/testsuite/gcc.target/i386/eh_return-2.c
+++ b/gcc/testsuite/gcc.target/i386/eh_return-2.c
@@ -1,6 +1,6 @@
 /* PR target/101772  */
 /* { dg-do compile } */
-/* { dg-additional-options "-O0 -march=x86-64 -mstackrealign" } */
+/* { dg-additional-options "-O0 -mincoming-stack-boundary=4 -march=x86-64 
-mstackrealign" } */
 
 struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context;
 
diff --git a/libgcc/config.host b/libgcc/config.host
index 094fd3ad254..1588ab6cf20 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1583,3 +1583,16 @@ case ${host} in
   tmake_file="$tmake_file t-gthr-noweak"
   ;;
 esac
+
+case ${host} in
+i[34567]86-*-* | x86_64-*-*)
+  if test "${host_address}" = 32; then
+case ${host} in
+*-*-cygwin* | *-*-mingw* | *-*-solaris2*)
+  # Disable SSE on unwind-c.c and unwind-dw2.c to avoid stack
+  # realignment with the 4-byte aligned incoming stack.
+  tmake_file="${tmake_file} i386/${host_address}/t-eh-return-no-sse"
+  ;;
+esac
+  fi
+esac
diff --git a/libgcc/config/i386/32/t-eh-return-no-sse 
b/libgcc/config/i386/32/t-eh-return-no-sse
new file mode 100644
index 000..5a8c3135911
--- /dev/null
+++ b/libgcc/config/i386/32/t-eh-return-no-sse
@@ -0,0 +1,5 @@
+# Since eh_return doesn't work with stack realignment, disable SSE on
+# unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
+# incoming stack.
+
+unwind-c.o unwind-c_s.o unwind-dw2.o unwind-dw2_s.o : HOST_LIBGCC2_CFLAGS += 
-mno-sse
-- 
2.35.1

Re: [PATCH] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-07 Thread H.J. Lu via Gcc-patches

On Sun, Mar 6, 2022 at 10:07 AM Eric Botcazou  wrote:
>
> >   PR target/104781
> >   * config.host (tmake_file): Add i386/32/t-eh-return-no-sse for
> >   32-bit x86 Cygwin and Solaris.
> >   * config/i386/32/t-eh-return-no-sse: New file.
>
> What about MinGW here?

It should be included.  Here is the v2 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591298.html

Thanks.

-- 
H.J.

[Patch] Fortran: Fix gfc_maybe_dereference_var [PR104430]

2022-03-07 Thread Tobias Burnus


The problem is that inside the main program,
  y = f(z)
where the the result of z is
  type(t) :: z(size(x%a))
and 'x' is a dummy argument.

'x' looses the attr.dummy in gfc_add_interface_mapping
and this leads to an additional indirect ref in
gfc_maybe_dereference_var - but after the first indirect
ref, TREE_TYPE is alread a RECORD_TYPE ...

The simple fix is to simply punt when the argument
is not a pointer or reference.

This patch additionally fixes a comment in
conv_parent_component_references.

Those parts are obvious. The only potentially risky
change is the comparison change from a name-wise comparison
to a pointer-based comparison. I think it is fine and the
testsuite did not find any issue, but you prefer, I can
also exclude it.

OK for mainline? How about GCC 10/11 backports?
(I think leaving out the last change, it should be very safe to do.)

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran: Fix gfc_maybe_dereference_var [PR104430]

	PR fortran/104430

gcc/fortran/ChangeLog:

	* trans-expr.cc (conv_parent_component_references): Fix comment;
	simplify comparison.
	(gfc_maybe_dereference_var): Avoid derefereing a nonpointer

gcc/testsuite/ChangeLog:

	* gfortran.dg/class_result_10.f90: New test.

 gcc/fortran/trans-expr.cc |  6 --
 gcc/testsuite/gfortran.dg/class_result_10.f90 | 25 +
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index eb6a78c3a62..e441952818b 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2805,9 +2805,9 @@ conv_parent_component_references (gfc_se * se, gfc_ref * ref)
   dt = ref->u.c.sym;
   c = ref->u.c.component;
 
-  /* Return if the component is in the parent type.  */
+  /* Return if the component is in this type, i.e. not in the parent type.  */
   for (cmp = dt->components; cmp; cmp = cmp->next)
-if (strcmp (c->name, cmp->name) == 0)
+if (c == cmp)
   return;
 
   /* Build a gfc_ref to recursively call gfc_conv_component_ref.  */
@@ -2867,6 +2867,8 @@ tree
 gfc_maybe_dereference_var (gfc_symbol *sym, tree var, bool descriptor_only_p,
 			   bool is_classarray)
 {
+  if (!POINTER_TYPE_P (TREE_TYPE (var)))
+return var;
   if (is_CFI_desc (sym, NULL))
 return build_fold_indirect_ref_loc (input_location, var);
 
diff --git a/gcc/testsuite/gfortran.dg/class_result_10.f90 b/gcc/testsuite/gfortran.dg/class_result_10.f90
new file mode 100644
index 000..acd9a73ebda
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_result_10.f90
@@ -0,0 +1,25 @@
+! { dg-do run}
+!
+! PR fortran/104430
+!
+! Contributed by  G. Steinmetz 
+
+module m
+   type t
+  integer :: a
+   end type
+contains
+   function f(x) result(z)
+  class(t) :: x(:)
+  type(t) :: z(size(x%a))
+  z%a = 42
+   end
+end
+program p
+   use m
+   class(t), allocatable :: y(:), z(:)
+   allocate (y(32))
+   z = f(y)
+   if (size(z) /= 32) stop 1
+   if (any (z%a /= 42)) stop 2
+end

[committed] Fix up duplicated duplicated words in comments

2022-03-07 Thread Jakub Jelinek via Gcc-patches

Hi!

Like in r10-7215-g700d4cb08c88aec37c13e21e63dd61fd698baabc 2 years ago,
I've run
grep -v 'long long\|optab optab\|template template\|double double' *.{[chS],cc} 
*/*.{[chS],cc} *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 '
and for the cases that looked clearly wrong changed them, mostly by removing
one of the duplicated words but in some cases with other changes.

Committed to trunk as obvious.

2022-03-07  Jakub Jelinek  

gcc/
* tree-ssa-propagate.cc: Fix up duplicated word issue in a comment.
* config/riscv/riscv.cc: Likewise.
* config/darwin.h: Likewise.
* config/i386/i386.cc: Likewise.
* config/aarch64/thunderx3t110.md: Likewise.
* config/aarch64/fractional-cost.h: Likewise.
* config/vax/vax.cc: Likewise.
* config/rs6000/pcrel-opt.md: Likewise.
* config/rs6000/predicates.md: Likewise.
* ctfc.h: Likewise.
* tree-ssa-uninit.cc: Likewise.
* value-relation.h: Likewise.
* gimple-range-gori.cc: Likewise.
* ipa-polymorphic-call.cc: Likewise.
* pointer-query.cc: Likewise.
* ipa-sra.cc: Likewise.
* internal-fn.cc: Likewise.
* varasm.cc: Likewise.
* gimple-ssa-warn-access.cc: Likewise.
gcc/analyzer/
* store.cc: Fix up duplicated word issue in a comment.
* analyzer.cc: Likewise.
* engine.cc: Likewise.
* sm-taint.cc: Likewise.
gcc/c-family/
* c-attribs.cc: Fix up duplicated word issue in a comment.
gcc/cp/
* cvt.cc: Fix up duplicated word issue in a comment.
* pt.cc: Likewise.
* module.cc: Likewise.
* coroutines.cc: Likewise.
gcc/fortran/
* trans-expr.cc: Fix up duplicated word issue in a comment.
* gfortran.h: Likewise.
* scanner.cc: Likewise.
gcc/jit/
* libgccjit.h: Fix up duplicated word issue in a comment.

--- gcc/tree-ssa-propagate.cc.jj2022-01-18 11:59:00.090974799 +0100
+++ gcc/tree-ssa-propagate.cc   2022-03-07 14:33:28.033829512 +0100
@@ -697,7 +697,7 @@ private:
 gimple_stmt_iterator new_gsi);
 };
 
-/* Call post_new_stmt for each each new statement that has been added
+/* Call post_new_stmt for each new statement that has been added
to the current BB.  OLD_GSI is the statement iterator before the BB
changes ocurred.  NEW_GSI is the iterator which may contain new
statements.  */
--- gcc/config/riscv/riscv.cc.jj2022-02-04 14:36:54.467612813 +0100
+++ gcc/config/riscv/riscv.cc   2022-03-07 14:50:54.717372413 +0100
@@ -4984,7 +4984,7 @@ riscv_option_override (void)
 target_flags |= MASK_FDIV;
 
   /* Handle -mtune, use -mcpu if -mtune is not given, and use default -mtune
- if -mtune and -mcpu both not not given.  */
+ if -mtune and -mcpu both not given.  */
   cpu = riscv_parse_tune (riscv_tune_string ? riscv_tune_string :
  (riscv_cpu_string ? riscv_cpu_string :
   RISCV_TUNE_STRING_DEFAULT));
--- gcc/config/darwin.h.jj  2022-01-18 11:58:59.078989257 +0100
+++ gcc/config/darwin.h 2022-03-07 14:36:18.924463533 +0100
@@ -340,7 +340,7 @@ extern GTY(()) int darwin_ms_struct;
 " %:version-compare(>= 10.6 mmacosx-version-min= -no_compact_unwind) "
 
 /* In Darwin linker specs we can put -lcrt0.o and ld will search the library
-   path for crt0.o or -lcrtx.a and it will search for for libcrtx.a.  As for
+   path for crt0.o or -lcrtx.a and it will search for libcrtx.a.  As for
other ports, we can also put xxx.{o,a}%s and get the appropriate complete
startfile absolute directory.  This latter point is important when we want
to override ld's rule of .dylib being found ahead of .a and the user wants
--- gcc/config/i386/i386.cc.jj  2022-03-04 09:35:58.674788325 +0100
+++ gcc/config/i386/i386.cc 2022-03-07 14:50:08.093016106 +0100
@@ -20334,7 +20334,7 @@ ix86_division_cost (const struct process
 
 /* Return cost of shift in MODE.
If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL.
-   AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE
+   AND_IN_OP1 specify in op1 is result of AND and SHIFT_AND_TRUNCATE
if op1 is a result of subreg.
 
SKIP_OP0/1 is set to true if cost of OP0/1 should be ignored.  */
--- gcc/config/aarch64/thunderx3t110.md.jj  2022-01-11 23:11:21.68935 
+0100
+++ gcc/config/aarch64/thunderx3t110.md 2022-03-07 14:47:39.710064661 +0100
@@ -138,7 +138,7 @@ (define_insn_reservation "thunderx3t110_
logic_shift_imm,logics_shift_imm"))
   "thunderx3t110_i01")
 
-; we are going for the the optimistic answer (13)
+; we are going for the optimistic answer (13)
 ; for now, the worst case is 23
 (define_insn_reservation "thunderx3t110_div" 13
   (and (eq_attr "tune" "thunderx3t110")
--- gcc/config/aarch64/fractional-cost.h.jj 2022-01-11 23:11:21.687300033 
+0100
+++ gcc/config/aarch64/fractional-cost.h2022-03-07 14:47:16.8

[Patch] Fortran: Fix gfc_conv_gfc_desc_to_cfi_desc with NULL [PR104126]

2022-03-07 Thread Tobias Burnus


Pre-remark: Related NULL, there some accepts-invalid issues, not addressed in 
this
patch. See https://gcc.gnu.org/PR104819

This patch fixes an ICE (12 regression) with NULL() that has no MOLD argument.

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran: Fix gfc_conv_gfc_desc_to_cfi_desc with NULL [PR104126]

	PR fortran/104126

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_gfc_desc_to_cfi_desc): Handle NULL
	without MOLD.

gcc/testsuite/ChangeLog:

	* gfortran.dg/null_actual_2.f90: New test.

 gcc/fortran/trans-expr.cc   | 13 +
 gcc/testsuite/gfortran.dg/null_actual_2.f90 | 16 
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index eb6a78c3a62..e8a78ccf4e1 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -5608,8 +5608,11 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym)
 itype = (e->ts.u.derived->intmod_sym_id == ISOCBINDING_FUNPTR
 	 ? CFI_type_cfunptr : CFI_type_cptr);
   else
-switch (e->ts.type)
-  {
+{
+  if (e->expr_type == EXPR_NULL && e->ts.type == BT_UNKNOWN)
+	e->ts = fsym->ts;
+  switch (e->ts.type)
+	{
 	case BT_INTEGER:
 	case BT_LOGICAL:
 	case BT_REAL:
@@ -5647,7 +5650,8 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym)
 	case BT_UNKNOWN:
 	  // FIXME: Really unreachable? Or reachable for type(*) ? If so, CFI_type_other?
 	  gcc_unreachable ();
-  }
+	}
+}
 
   tmp = gfc_get_cfi_desc_type (cfi);
   gfc_add_modify (&block, tmp,
@@ -5700,7 +5704,8 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym)
   gfc_init_block (&block2);
 
   /* Set elem_len, which may be only known at run time. */
-  if (e->ts.type == BT_CHARACTER)
+  if (e->ts.type == BT_CHARACTER
+  && (e->expr_type != EXPR_NULL || gfc_strlen != NULL_TREE))
 {
   gcc_assert (gfc_strlen);
   tmp = gfc_strlen;
diff --git a/gcc/testsuite/gfortran.dg/null_actual_2.f90 b/gcc/testsuite/gfortran.dg/null_actual_2.f90
new file mode 100644
index 000..de481f01295
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/null_actual_2.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+!
+! PR fortran/104126
+!
+! Contributed by G. Steinmetz 
+!
+program p
+   use iso_c_binding, only: c_char
+   character(len=:,kind=c_char), pointer :: d
+   call s(null(d))
+   call s(null())
+contains
+   subroutine s(x) bind(c)
+  character(len=:, kind=c_char), pointer, intent(in) :: x
+   end
+end

Re: [arm] MVE: Relax addressing modes for full loads and stores

2022-03-07 Thread Andre Vieira (lists) via Gcc-patches


On 17/01/2022 07:48, Christophe Lyon wrote:

Hi André,

On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches 
 wrote:


Hi Christophe,

This patch relaxes the addressing modes for the mve full load and
stores
(by full loads and stores I mean non-widening or narrowing loads and
stores resp). The code before was requiring a LO_REGNUM for these,
where
this is only a requirement if the load is widening or the store
narrowing.

So with this your patch should not be necessary.

Regression tested on arm-none-eabi-gcc.  Can you please confirm this
fixes the issue you were seeing too?


Yes, I confirm this fixes the problem I was fixing with my patch #15 
in my MVE/VCMP/VCOND series.

I'll drop it.

Thanks!

Christophe


gcc/ChangeLog:

 * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO.
 * config/arm/arm.c (mve_vector_mem_operand): Relax
constraint on
 base register for non widening loads or narrowing stores.


Kind Regards,
Andre Vieira



Ping, I noticed this also fixes PR 104790.

Kind regards,
Andre

Re: [PATCH v2] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 07, 2022 at 05:57:40AM -0800, H.J. Lu via Gcc-patches wrote:
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -9444,9 +9444,8 @@ ix86_expand_epilogue (int style)
> rtx sa = EH_RETURN_STACKADJ_RTX;
> rtx_insn *insn;
>  
> -   /* %ecx can't be used for both DRAP register and eh_return.  */
> -   if (crtl->drap_reg)
> - gcc_assert (REGNO (crtl->drap_reg) != CX_REG);
> +   /* Stack realignment doesn't work with eh_return.  */
> +   gcc_assert (!crtl->stack_realign_needed);

Shouldn't this be error or sorry?  Otherwise, won't
calls to __builtin_eh_return from random testcases where
stack realignment is needed or even with regparm nested functions
just ICE?
>  
> /* regparm nested functions don't work with eh_return.  */
> gcc_assert (!ix86_static_chain_on_stack);

Jakub

[committed] Add missing space in various string literals

2022-03-07 Thread Jakub Jelinek via Gcc-patches

Hi!

After more than 2 years I've run my
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00844.html
script again.  While it has lots of false positives, it discovered
two bugs.

Committed to trunk as obvious.

2022-03-07  Jakub Jelinek  

gcc/c/
* c-parser.cc (c_parser_omp_clause_map): Add missing space in string
literal.
gcc/cp/
* parser.cc (cp_parser_omp_clause_map): Add missing space in string
literal.

--- gcc/c/c-parser.cc.jj2022-02-17 10:28:51.780275535 +0100
+++ gcc/c/c-parser.cc   2022-03-07 15:11:03.749647618 +0100
@@ -16230,8 +16230,8 @@ c_parser_omp_clause_map (c_parser *parse
   else
{
  c_parser_error (parser, "%<#pragma omp target%> with "
- "modifier other than % or %"
- "on % clause");
+ "modifier other than % or "
+ "% on % clause");
  parens.skip_until_found_close (parser);
  return list;
}
--- gcc/cp/parser.cc.jj 2022-02-11 00:19:22.230066001 +0100
+++ gcc/cp/parser.cc2022-03-07 15:11:58.438889997 +0100
@@ -39432,8 +39432,8 @@ cp_parser_omp_clause_map (cp_parser *par
   else
{
  cp_parser_error (parser, "%<#pragma omp target%> with "
-  "modifier other than % or %"
-  "on % clause");
+  "modifier other than % or "
+  "% on % clause");
  cp_parser_skip_to_closing_parenthesis (parser,
 /*recovering=*/true,
 /*or_comma=*/false,


Jakub

Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-07 Thread Patrick Palka via Gcc-patches

On Fri, 4 Mar 2022, Jason Merrill wrote:

> On 3/4/22 14:24, Patrick Palka wrote:
> > Here we're failing to communicate to cp_finish_decl from tsubst_expr
> > that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
> > flag), which causes do_class_deduction to always consider explicit
> > deduction guides when performing CTAD for a templated variable initializer.
> > 
> > We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
> > calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
> > can determine if we're in a copy-init context by simply inspecting the
> > initializer, and thus render its flags parameter unnecessary, which is
> > what this patch implements.  (If we were to fix this in tsubst_expr
> > instead, I think we'd have to inspect the initializer in the same way
> > in order to detect a copy-init context?)
> 
> Hmm, does this affect conversions as well?
> 
> Looks like it does:
> 
> struct A
> {
>   explicit operator int();
> };
> 
> template  void f()
> {
>   T t = A();
> }
> 
> int main()
> {
>   f(); // wrongly accepted
> }
> 
> The reverse, initializing via an explicit constructor, is caught by code in
> build_aggr_init much like the code your patch adds to do_auto_deduction;
> perhaps we should move/copy that code to cp_finish_decl?

Ah, makes sense.  Moving that code from build_aggr_init to
cp_finish_decl broke things, but using it in both spots seems to work
well.  And I suppose we might as well use it in do_class_deduction too,
since doing so lets us remove the flags parameter.

-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
is unaffected and seems to work correctly thanks to code in build_aggr_init
that sets LOOKUP_ONLYCONVERTING when the initializer represents
copy-initialization.

This patch factors out the copy-initialization test from build_aggr_init
and reuses it in tsubst_expr for sake of cp_finish_decl.  And we might
as well use it in do_class_deduction too, since doing so lets us get rid
of its flags parameter (which is used only to control whether explicit
deduction guides are considered).

Bootstrapped and regtestd on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
(do_auto_deduction): Remove flags parameter.
* decl.cc (cp_finish_decl): Adjust call to do_auto_deduction.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
* pt.cc (convert_template_argument): Adjust call to
do_auto_deduction.
(tsubst_expr) : Pass LOOKUP_ONLYCONVERTING
to cp_finish_decl when is_copy_initialization.
(do_class_deduction): Remove flags parameter, and instead
call is_copy_initialization to determine if we're in a copy-init
context.
(do_auto_deduction): Adjust call to do_class_deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
 gcc/cp/cp-tree.h  |  4 +--
 gcc/cp/decl.cc|  2 +-
 gcc/cp/init.cc| 20 +---
 gcc/cp/pt.cc  | 23 --
 gcc/testsuite/g++.dg/cpp0x/explicit15.C   | 31 +++
 .../g++.dg/cpp1z/class-deduction108.C | 31 +++
 6 files changed, 93 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..be556d2c441 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7039,6 +7039,7 @@ extern void emit_mem_initializers (tree);
 extern tree build_aggr_init(tree, tree, int,
  tsubst_flags_t);
 extern int is_class_type   (tree, int);
+extern bool is_copy_initialization (tree);
 extern tree build_zero_init(tree, tree, bool);
 extern tree build_value_init   (tree, tsubst_flags_t);
 extern tree build_value_init_noctor(tree, tsubst_flags_t);
@@ -7279,8 +7280,7 @@ extern tree do_auto_deduction   (tree, 
tree, tree,
 = tf_warning_or_error,

[PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-07 Thread H.J. Lu via Gcc-patches

Since eh_return doesn't work with stack realignment, disable SSE on
unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
incoming stack to avoid SSE usage which is caused by

commit 609e8c492d62d92465460eae3d43dfc4b2c68288
Author: H.J. Lu 
Date:   Sat Feb 26 14:17:23 2022 -0800

x86: Always return pseudo register in ix86_gen_scratch_sse_rtx

when pseudo vector registers are used to expand memset.

gcc/

PR target/104781
* config/i386/i386.cc (ix86_expand_epilogue): Sorry if there is
stack realignment with eh_return or regparm nested function.

gcc/testsuite/

PR target/104781
* gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4.
* gcc.target/i386/eh_return-2.c: Likewise.

libgcc/

PR target/104781
* config.host (tmake_file): Add i386/32/t-eh-return-no-sse for
32-bit x86 Cygwin, MinGW and Solaris.
* config/i386/32/t-eh-return-no-sse: New file.
---
 gcc/config/i386/i386.cc | 11 +++
 gcc/testsuite/gcc.target/i386/eh_return-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/eh_return-2.c |  2 +-
 libgcc/config.host  | 13 +
 libgcc/config/i386/32/t-eh-return-no-sse|  5 +
 5 files changed, 27 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/i386/32/t-eh-return-no-sse

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index efa947f9795..4121f986221 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9444,12 +9444,15 @@ ix86_expand_epilogue (int style)
  rtx sa = EH_RETURN_STACKADJ_RTX;
  rtx_insn *insn;
 
- /* %ecx can't be used for both DRAP register and eh_return.  */
- if (crtl->drap_reg)
-   gcc_assert (REGNO (crtl->drap_reg) != CX_REG);
+ /* Stack realignment doesn't work with eh_return.  */
+ if (crtl->stack_realign_needed)
+   sorry ("Stack realignment not supported with "
+  "%<__builtin_eh_return%>");
 
  /* regparm nested functions don't work with eh_return.  */
- gcc_assert (!ix86_static_chain_on_stack);
+ if (ix86_static_chain_on_stack)
+   sorry ("regparm nested function not supported with "
+  "%<__builtin_eh_return%>");
 
  if (frame_pointer_needed)
{
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c 
b/gcc/testsuite/gcc.target/i386/eh_return-1.c
index b21fd75fc93..43f94f01a97 100644
--- a/gcc/testsuite/gcc.target/i386/eh_return-1.c
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=haswell -mno-avx512f 
-mtune-ctrl=avx256_move_by_pieces" } */
+/* { dg-options "-O2 -mincoming-stack-boundary=4 -march=haswell -mno-avx512f 
-mtune-ctrl=avx256_move_by_pieces" } */
 
 struct _Unwind_Context
 {
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-2.c 
b/gcc/testsuite/gcc.target/i386/eh_return-2.c
index f23f4492dac..cb762f92cc2 100644
--- a/gcc/testsuite/gcc.target/i386/eh_return-2.c
+++ b/gcc/testsuite/gcc.target/i386/eh_return-2.c
@@ -1,6 +1,6 @@
 /* PR target/101772  */
 /* { dg-do compile } */
-/* { dg-additional-options "-O0 -march=x86-64 -mstackrealign" } */
+/* { dg-additional-options "-O0 -mincoming-stack-boundary=4 -march=x86-64 
-mstackrealign" } */
 
 struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context;
 
diff --git a/libgcc/config.host b/libgcc/config.host
index 094fd3ad254..1588ab6cf20 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1583,3 +1583,16 @@ case ${host} in
   tmake_file="$tmake_file t-gthr-noweak"
   ;;
 esac
+
+case ${host} in
+i[34567]86-*-* | x86_64-*-*)
+  if test "${host_address}" = 32; then
+case ${host} in
+*-*-cygwin* | *-*-mingw* | *-*-solaris2*)
+  # Disable SSE on unwind-c.c and unwind-dw2.c to avoid stack
+  # realignment with the 4-byte aligned incoming stack.
+  tmake_file="${tmake_file} i386/${host_address}/t-eh-return-no-sse"
+  ;;
+esac
+  fi
+esac
diff --git a/libgcc/config/i386/32/t-eh-return-no-sse 
b/libgcc/config/i386/32/t-eh-return-no-sse
new file mode 100644
index 000..5a8c3135911
--- /dev/null
+++ b/libgcc/config/i386/32/t-eh-return-no-sse
@@ -0,0 +1,5 @@
+# Since eh_return doesn't work with stack realignment, disable SSE on
+# unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
+# incoming stack.
+
+unwind-c.o unwind-c_s.o unwind-dw2.o unwind-dw2_s.o : HOST_LIBGCC2_CFLAGS += 
-mno-sse
-- 
2.35.1

Re: [PATCH v2] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-07 Thread H.J. Lu via Gcc-patches

On Mon, Mar 7, 2022 at 6:23 AM Jakub Jelinek  wrote:
>
> On Mon, Mar 07, 2022 at 05:57:40AM -0800, H.J. Lu via Gcc-patches wrote:
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -9444,9 +9444,8 @@ ix86_expand_epilogue (int style)
> > rtx sa = EH_RETURN_STACKADJ_RTX;
> > rtx_insn *insn;
> >
> > -   /* %ecx can't be used for both DRAP register and eh_return.  */
> > -   if (crtl->drap_reg)
> > - gcc_assert (REGNO (crtl->drap_reg) != CX_REG);
> > +   /* Stack realignment doesn't work with eh_return.  */
> > +   gcc_assert (!crtl->stack_realign_needed);
>
> Shouldn't this be error or sorry?  Otherwise, won't
> calls to __builtin_eh_return from random testcases where
> stack realignment is needed or even with regparm nested functions
> just ICE?

Fixed in the v3 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591307.html

Thanks.

> >
> > /* regparm nested functions don't work with eh_return.  */
> > gcc_assert (!ix86_static_chain_on_stack);
>
> Jakub
>


-- 
H.J.

Re: [committed] libstdc++: Ensure __glibcxx_assert_fail has default visibility

2022-03-07 Thread Jonathan Wakely via Gcc-patches

On Sat, 5 Mar 2022 at 20:34, Jonathan Wakely via Libstdc++
 wrote:
>
> Tested powerpc64le-linux, pushed to trunk.
>
> -- >8 --
>
> This ensures there's no linker error if libstdc++ headers are included
> following a pragma that sets hidden visibility.
>
> Similarly for std::__terminate, which is always-inline so shouldn't
> matter, but it's not wrong to do this anyway.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/c++config (__glibcxx_assert_fail): Add visibility
> attribute.
> (__terminate): Likewise.

This broke mingw targets. Fixed like so.

Tested x86_64-linux, pushed to trunk.
commit 4cb935cb69f12088975fa7f6907c6ace0580e2dd
Author: Jonathan Wakely 
Date:   Mon Mar 7 15:07:05 2022

libstdc++: Use visibility pragmas instead of attributes [PR104807]

The _GLIBCXX_PSEUDO_VISIBILITY macro isn't defined until after including
os_defines.h, so we can't use _GLIBCXX_VISIBILITY early in c++config.
Replace the uses of that macro with #pragma visibility push(default)
instead.

libstdc++-v3/ChangeLog:

PR libstdc++/104807
* include/bits/c++config (__terminate, __glibcxx_assert_fail):
Replace _GLIBCXX_VISIBILITY on function with visibility pragma.
(__is_constant_evaluated): Add visibility pragma.

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 6c134f13509..2798b9786dc 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -302,15 +302,16 @@ namespace std
   typedef decltype(nullptr)nullptr_t;
 #endif
 
+#pragma GCC visibility push(default)
   // This allows the library to terminate without including all of 
   // and without making the declaration of std::terminate visible to users.
   extern "C++" __attribute__ ((__noreturn__, __always_inline__))
-  _GLIBCXX_VISIBILITY(default)
   inline void __terminate() _GLIBCXX_USE_NOEXCEPT
   {
 void terminate() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__noreturn__));
 terminate();
   }
+#pragma GCC visibility pop
 }
 
 #define _GLIBCXX_USE_DUAL_ABI
@@ -506,6 +507,7 @@ namespace std
 
 namespace std
 {
+#pragma GCC visibility push(default)
   // Internal version of std::is_constant_evaluated().
   // This can be used without checking if the compiler supports the feature.
   // The macro _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED can be used to check if
@@ -523,6 +525,7 @@ namespace std
 return false;
 #endif
   }
+#pragma GCC visibility pop
 }
 
 // Debug Mode implies checking assertions.
@@ -553,13 +556,15 @@ namespace std
 # ifdef _GLIBCXX_VERBOSE_ASSERT
 namespace std
 {
+#pragma GCC visibility push(default)
   // Avoid the use of assert, because we're trying to keep the 
   // include out of the mix.
-  extern "C++" _GLIBCXX_NORETURN _GLIBCXX_VISIBILITY(default)
+  extern "C++" _GLIBCXX_NORETURN
   void
   __glibcxx_assert_fail(const char* __file, int __line,
const char* __function, const char* __condition)
   _GLIBCXX_NOEXCEPT;
+#pragma GCC visibility pop
 }
 #define __glibcxx_assert_impl(_Condition)  \
   if (__builtin_expect(!bool(_Condition), false))  \

[committed] doc: Remove redundant sentence about modules being in C++20

2022-03-07 Thread Jonathan Wakely via Gcc-patches

Discussed with Jason and then committed as obvious.

-- >8 --

As C++20 has already been published, we don't need to link to the draft
(which is now the C++23 draft anyway). And there's no need to say it's
part of the C++20 spec, or that there might be defect reports. That's
true for everything in C++20, so calling it out here just for Modules
isn't needed.

gcc/ChangeLog:

* doc/invoke.texi (C++ Modules): Remove anachronism.
---
 gcc/doc/invoke.texi | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 248ed534aee..b01ffab566a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34151,9 +34151,7 @@ provides a modular compilation system, intending to 
provide both
 faster builds and better library isolation.  The ``Merging Modules''
 paper @uref{https://wg21.link/p1103}, provides the easiest to read set
 of changes to the standard, although it does not capture later
-changes.  That specification is now part of C++20,
-@uref{git@@github.com:cplusplus/draft.git}, it is considered complete
-(there may be defect reports to come).
+changes.
 
 @emph{G++'s modules support is not complete.}  Other than bugs, the
 known missing pieces are:
-- 
2.34.1

RE: [arm] MVE: Relax addressing modes for full loads and stores

2022-03-07 Thread Kyrylo Tkachov via Gcc-patches

Ok, please include PR 104790 in the ChangeLog.
Thanks,
Kyrill

From: Andre Vieira (lists) 
Sent: Monday, March 7, 2022 2:17 PM
To: Kyrylo Tkachov 
Cc: GCC Patches 
Subject: Re: [arm] MVE: Relax addressing modes for full loads and stores

On 17/01/2022 07:48, Christophe Lyon wrote:
Hi André,

On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches 
mailto:gcc-patches@gcc.gnu.org>> wrote:
Hi Christophe,

This patch relaxes the addressing modes for the mve full load and stores
(by full loads and stores I mean non-widening or narrowing loads and
stores resp). The code before was requiring a LO_REGNUM for these, where
this is only a requirement if the load is widening or the store narrowing.

So with this your patch should not be necessary.

Regression tested on arm-none-eabi-gcc.  Can you please confirm this
fixes the issue you were seeing too?

Yes, I confirm this fixes the problem I was fixing with my patch #15 in my 
MVE/VCMP/VCOND series.
I'll drop it.

Thanks!

Christophe

gcc/ChangeLog:

 * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO.
 * config/arm/arm.c (mve_vector_mem_operand): Relax constraint on
 base register for non widening loads or narrowing stores.

Kind Regards,
Andre Vieira

Ping, I noticed this also fixes PR 104790.

Kind regards,
Andre

[committed] Fortran: Fix typos

2022-03-07 Thread Tobias Burnus


I saw that Jakub fixed some typos/duplicated words, I thought
I could do likewise – and I found some more in gcc/fortran.

Committed as r12-7524.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit e3ca3e7993696affe95a3ea24c2b133c14a056e4
Author: Tobias Burnus 
Date:   Mon Mar 7 17:20:52 2022 +0100

Fortran: Fix typos

gcc/fortran/ChangeLog:

* array.cc (gfc_ref_dimen_size): Fix comment typo.
* dump-parse-tree.cc (gfc_dump_c_prototypes): Likewise.
* frontend-passes.cc (cfe_code): Likewise.
* gfortran.texi: Likewise.
* resolve.cc (generate_component_assignments): Likewise.
* simplify.cc (gfc_simplify_this_image): Likewise.
* trans-expr.cc (trans_scalar_class_assign,
gfc_maybe_dereference_var): Likewise.
* intrinsic.texi: Remove word duplication.
* invoke.texi: Likewise.

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index f1d92e00c98..eb9ed8580a0 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -2420,7 +2420,7 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t *result, mpz_t *end)
 	  gfc_free_expr(stride_expr);
 	}
 
-  /* Calculate the number of elements via gfc_dep_differce, but only if
+  /* Calculate the number of elements via gfc_dep_difference, but only if
 	 start and end are both supplied in the reference or the array spec.
 	 This is to guard against strange but valid code like
 
diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 322416e6556..3635460bffd 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -3543,7 +3543,7 @@ gfc_dump_c_prototypes (gfc_namespace *ns, FILE *file)
   gfc_traverse_ns (ns, write_interop_decl);
 }
 
-/* Loop over all global symbols, writing out their declrations.  */
+/* Loop over all global symbols, writing out their declarations.  */
 
 void
 gfc_dump_external_c_prototypes (FILE * file)
diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 22f1bb56a2b..4033f27df99 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -974,7 +974,7 @@ cfe_code (gfc_code **c, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
   changed_statement = NULL;
 
   /* Do not do anything inside a WHERE statement; scalar assignments, BLOCKs
- and allocation on assigment are prohibited inside WHERE, and finally
+ and allocation on assignment are prohibited inside WHERE, and finally
  masking an expression would lead to wrong-code when replacing
 
  WHERE (a>0)
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 2a55676791b..f8737f4d323 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -1093,7 +1093,7 @@ variable.
 
 The maximum number of bytes of user data in a subrecord is 2147483639
 (2 GiB - 9) for a four-byte record marker.  This limit can be lowered
-with the @option{-fmax-subrecord-length} option, altough this is
+with the @option{-fmax-subrecord-length} option, although this is
 rarely useful. If the length of a logical record exceeds this limit,
 the data is distributed among several subrecords.
 
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index f182cacb4b8..e3cd8279960 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -12897,7 +12897,7 @@ end program real_kinds
 @table @asis
 @item @emph{Description}:
 @code{SET_EXPONENT(X, I)} returns the real number whose fractional part
-is that that of @var{X} and whose exponent part is @var{I}.
+is that of @var{X} and whose exponent part is @var{I}.
 
 @item @emph{Standard}:
 Fortran 90 and later
@@ -12917,7 +12917,7 @@ Elemental function
 @item @emph{Return value}:
 The return value is of the same type and kind as @var{X}.
 The real number whose fractional part
-is that that of @var{X} and whose exponent part if @var{I} is returned;
+is that of @var{X} and whose exponent part if @var{I} is returned;
 it is @code{FRACTION(X) * RADIX(X)**I}.
 
 @item @emph{Example}:
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 6435dc4d4de..5c7501a28b1 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1858,7 +1858,7 @@ except when optimizing for size via @option{-Os}.  If the code
 contains a very large number of argument that have to be packed, code
 size and also compilation time may become excessive.  If that is the
 case, it may be better to disable this option.  Instances of packing
-can be found by using by using @option{-Warray-temporaries}.
+can be found by using @option{-Warray-temporaries}.
 
 @item -fexternal-blas
 @opindex @code{fexternal-blas}
@@ -2068,7 +2068,7 @@ does not generate p

Re: [PATCH v8 05/12] LoongArch Port: Machine description C files and .h files.

2022-03-07 Thread Richard Sandiford via Gcc-patches

Hi,

Some comments below, but otherwise it looks good to me.

A few of the comments are about removing hook or macro definitions
that are the same as the default.  Doing that helps people who want
to update a hook interface in future, since there are then fewer
places to adjust.

xucheng...@loongson.cn writes:
> […]
> +/* Describes how a symbol is used.
> +
> +   SYMBOL_CONTEXT_CALL
> +   The symbol is used as the target of a call instruction.
> +
> +   SYMBOL_CONTEXT_LEA
> +   The symbol is used in a load-address operation.
> +
> +   SYMBOL_CONTEXT_MEM
> +   The symbol is used as the address in a MEM.  */
> +enum loongarch_symbol_context {
> +  SYMBOL_CONTEXT_CALL,
> +  SYMBOL_CONTEXT_LEA,
> +  SYMBOL_CONTEXT_MEM
> +};

It looks like this is unused: loongarch_classify_symbol takes an
argument of this type, but ignores it.

> […]
> +/* Classifies a type of call.
> +
> +   LARCH_CALL_NORMAL
> + A normal call or call_value pattern.
> +
> +   LARCH_CALL_SIBCALL
> + A sibcall or sibcall_value pattern.
> +
> +   LARCH_CALL_EPILOGUE
> + A call inserted in the epilogue.  */
> +enum loongarch_call_type {
> +  LARCH_CALL_NORMAL,
> +  LARCH_CALL_SIBCALL,
> +  LARCH_CALL_EPILOGUE
> +};

This also looks unused.

> +
> +/* Controls the conditions under which certain instructions are split.
> +
> +   SPLIT_IF_NECESSARY
> + Only perform splits that are necessary for correctness
> + (because no unsplit version exists).
> +
> +   SPLIT_FOR_SPEED
> + Perform splits that are necessary for correctness or
> + beneficial for code speed.
> +
> +   SPLIT_FOR_SIZE
> + Perform splits that are necessary for correctness or
> + beneficial for code size.  */
> +enum loongarch_split_type {
> +  SPLIT_IF_NECESSARY,
> +  SPLIT_FOR_SPEED,
> +  SPLIT_FOR_SIZE
> +};

It looks like this is also unused: loongarch_split_move_p takes an
argument of this type, but ignores it.

I think it'd be better to remove these three enums for now and add them
back later if they become useful.

> […]
> +/* RTX costs of various operations on the different architectures.  */
> +struct loongarch_rtx_cost_data
> +{
> +  unsigned short fp_add;
> +  unsigned short fp_mult_sf;
> +  unsigned short fp_mult_df;
> +  unsigned short fp_div_sf;
> +  unsigned short fp_div_df;
> +  unsigned short int_mult_si;
> +  unsigned short int_mult_di;
> +  unsigned short int_div_si;
> +  unsigned short int_div_di;
> +  unsigned short branch_cost;
> +  unsigned short memory_latency;
> +};
> +
> +/* Default RTX cost initializer.  */
> +#define COSTS_N_INSNS(N) ((N) * 4)
> +#define DEFAULT_COSTS\
> +.fp_add  = COSTS_N_INSNS (1),\
> +.fp_mult_sf  = COSTS_N_INSNS (2),\
> +.fp_mult_df  = COSTS_N_INSNS (2),\
> +.fp_div_sf   = COSTS_N_INSNS (4),\
> +.fp_div_df   = COSTS_N_INSNS (4),\
> +.int_mult_si = COSTS_N_INSNS (1),\
> +.int_mult_di = COSTS_N_INSNS (1),\
> +.int_div_si  = COSTS_N_INSNS (1),\
> +.int_div_di  = COSTS_N_INSNS (1),\
> +.branch_cost = 2,\
> +.memory_latency  = 4

Unfortunately we need to stay within standard C++(11), so the initialisers
can't use “.name = value” notation.

> […]
> +/* Classifies an address.
> +
> +   ADDRESS_REG
> +   A natural register + offset address.  The register satisfies
> +   loongarch_valid_base_register_p and the offset is a 
> const_arith_operand.
> +
> +   ADDRESS_REG_REG
> +   A base register indexed by (optionally scaled) register.
> +
> +   ADDRESS_CONST_INT
> +   A signed 16-bit constant address.
> +
> +   ADDRESS_SYMBOLIC:
> +   A constant symbolic address.  */
> +enum loongarch_address_type
> +{
> +  ADDRESS_REG,
> +  ADDRESS_REG_REG,
> +  ADDRESS_CONST_INT,
> +  ADDRESS_SYMBOLIC
> +};
> +
> +
> +/* Information about an address described by loongarch_address_type.
> +
> +   ADDRESS_CONST_INT
> +   No fields are used.
> +
> +   ADDRESS_REG
> +   REG is the base register and OFFSET is the constant offset.
> +
> +   ADDRESS_SYMBOLIC
> +   SYMBOL_TYPE is the type of symbol that the address references.  */
> +struct loongarch_address_info
> +{
> +  enum loongarch_address_type type;
> +  rtx reg;
> +  rtx offset;
> +  enum loongarch_symbol_type symbol_type;
> +};

It'd be worth documenting what the fields mean for ADDRESS_REG_REG too.
It looks like OFFSET is the index register in that case.

> […]
> +/* The value of TARGET_ATTRIBUTE_TABLE.  */
> +static const struct attribute_spec loongarch_attribute_table[] = {
> +/* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> +   affects_type_identity, handler, exclude }  */
> +  { NULL, 0, 0, false, false, false, false, NULL, NULL }
> +};

I think it'd be better to leave this and TARGET_ATTRIBUTE_TABLE
undefined until a non-terminator entry is needed.

> […]
> +/* See whether TYPE is

[PATCH] c++: non-constant non-dependent decltype folding [PR104823]

2022-03-07 Thread Patrick Palka via Gcc-patches

instantiate_non_dependent_expr_sfinae instantiates only potentially
constant expressions, but when processing a non-dependent decltype
operand we want to instantiate it even if it's non-constant since
such non-dependent decltype is always resolved ahead of time.

Currently finish_decltype_type uses the former function, which causes us
to miss issuing a narrowing diagnostic for S{id(v)} in the below testcase
because we never instantiate this non-constant non-dependent decltype
operand.

So this patch makes finish_decltype_type use i_n_d_e_internal instead of
_sfinae.  And afterward, we need to keep processing_template_decl cleared
for sake of the later call to lvalue_kind, which handles templated and
non-templated COND_EXPR differently.  Otherwise we'd incorrectly reject
the declaration of g in cpp0x/cond2.C with:

  error: ‘g’ declared as function returning a function

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/104823

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Use i_n_d_e_internal
instead of _sfinae when instantiating a non-dependent decltype
operand, and keep processing_template_decl cleared afterward.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wnarrowing19.C: New test.
---
 gcc/cp/semantics.cc   | 11 ++-
 gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C |  8 
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 07cae993efe..66d90c2f7be 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11217,6 +11217,8 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
   /* decltype is an unevaluated context.  */
   cp_unevaluated u;
 
+  processing_template_decl_sentinel ptds (/*reset=*/false);
+
   /* Depending on the resolution of DR 1172, we may later need to distinguish
  instantiation-dependent but not type-dependent expressions so that, say,
  A::U doesn't require 'typename'.  */
@@ -11232,7 +11234,14 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
 }
   else if (processing_template_decl)
 {
-  expr = instantiate_non_dependent_expr_sfinae (expr, complain);
+  /* Instantiate the non-dependent operand to diagnose any ill-formed
+expressions.  And keep processing_template_decl cleared for the rest
+of the function (for sake of the call to lvalue_kind below, which
+handles templated and non-templated COND_EXPR differently).  */
+  processing_template_decl = 0;
+  /* Since we want to do this even for non-constant expressions, we
+use i_n_d_e_internal here instead of _sfinae.  */
+  expr = instantiate_non_dependent_expr_internal (expr, complain);
   if (expr == error_mark_node)
return error_mark_node;
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C 
b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C
new file mode 100644
index 000..bd9fd2eb6f9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C
@@ -0,0 +1,8 @@
+// PR c++/104823
+// { dg-do compile { target c++11 } }
+
+struct S { S(int); };
+
+double id(double v);
+
+template auto f(double v) -> decltype(S{id(v)}); // { dg-error 
"narrowing" }
-- 
2.35.1.354.g715d08a9e5

Re: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

2022-03-07 Thread Jason Merrill via Gcc-patches


On 3/7/22 14:41, Patrick Palka wrote:

instantiate_non_dependent_expr_sfinae instantiates only potentially
constant expressions


Hmm, that now strikes me as a problematic interface, as we don't know 
whether what we get back is template or non-template trees.


Maybe we want to change instantiate_non_dependent_expr to 
checking_assert that the argument is non-dependent (callers are already 
checking that), and drop the potentially-constant test?



but when processing a non-dependent decltype
operand we want to instantiate it even if it's non-constant since
such non-dependent decltype is always resolved ahead of time.

Currently finish_decltype_type uses the former function, which causes us
to miss issuing a narrowing diagnostic for S{id(v)} in the below testcase
because we never instantiate this non-constant non-dependent decltype
operand.

So this patch makes finish_decltype_type use i_n_d_e_internal instead of
_sfinae.  And afterward, we need to keep processing_template_decl cleared
for sake of the later call to lvalue_kind, which handles templated and
non-templated COND_EXPR differently.  Otherwise we'd incorrectly reject
the declaration of g in cpp0x/cond2.C with:

   error: ‘g’ declared as function returning a function

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/104823

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Use i_n_d_e_internal
instead of _sfinae when instantiating a non-dependent decltype
operand, and keep processing_template_decl cleared afterward.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wnarrowing19.C: New test.
---
  gcc/cp/semantics.cc   | 11 ++-
  gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C |  8 
  2 files changed, 18 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 07cae993efe..66d90c2f7be 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11217,6 +11217,8 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
/* decltype is an unevaluated context.  */
cp_unevaluated u;
  
+  processing_template_decl_sentinel ptds (/*reset=*/false);

+
/* Depending on the resolution of DR 1172, we may later need to distinguish
   instantiation-dependent but not type-dependent expressions so that, say,
   A::U doesn't require 'typename'.  */
@@ -11232,7 +11234,14 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
  }
else if (processing_template_decl)
  {
-  expr = instantiate_non_dependent_expr_sfinae (expr, complain);
+  /* Instantiate the non-dependent operand to diagnose any ill-formed
+expressions.  And keep processing_template_decl cleared for the rest
+of the function (for sake of the call to lvalue_kind below, which
+handles templated and non-templated COND_EXPR differently).  */
+  processing_template_decl = 0;
+  /* Since we want to do this even for non-constant expressions, we
+use i_n_d_e_internal here instead of _sfinae.  */
+  expr = instantiate_non_dependent_expr_internal (expr, complain);
if (expr == error_mark_node)
return error_mark_node;
  }
diff --git a/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C 
b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C
new file mode 100644
index 000..bd9fd2eb6f9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C
@@ -0,0 +1,8 @@
+// PR c++/104823
+// { dg-do compile { target c++11 } }
+
+struct S { S(int); };
+
+double id(double v);
+
+template auto f(double v) -> decltype(S{id(v)}); // { dg-error 
"narrowing" }

Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-07 Thread Jason Merrill via Gcc-patches


On 3/7/22 10:47, Patrick Palka wrote:

On Fri, 4 Mar 2022, Jason Merrill wrote:


On 3/4/22 14:24, Patrick Palka wrote:

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes do_class_deduction to always consider explicit
deduction guides when performing CTAD for a templated variable initializer.

We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
can determine if we're in a copy-init context by simply inspecting the
initializer, and thus render its flags parameter unnecessary, which is
what this patch implements.  (If we were to fix this in tsubst_expr
instead, I think we'd have to inspect the initializer in the same way
in order to detect a copy-init context?)


Hmm, does this affect conversions as well?

Looks like it does:

struct A
{
   explicit operator int();
};

template  void f()
{
   T t = A();
}

int main()
{
   f(); // wrongly accepted
}

The reverse, initializing via an explicit constructor, is caught by code in
build_aggr_init much like the code your patch adds to do_auto_deduction;
perhaps we should move/copy that code to cp_finish_decl?


Ah, makes sense.  Moving that code from build_aggr_init to
cp_finish_decl broke things, but using it in both spots seems to work
well.  And I suppose we might as well use it in do_class_deduction too,
since doing so lets us remove the flags parameter.


Before removing the flags parameter please try asserting that it now 
matches is_copy_initialization and see if anything breaks.



-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
is unaffected and seems to work correctly thanks to code in build_aggr_init
that sets LOOKUP_ONLYCONVERTING when the initializer represents
copy-initialization.

This patch factors out the copy-initialization test from build_aggr_init
and reuses it in tsubst_expr for sake of cp_finish_decl.  And we might
as well use it in do_class_deduction too, since doing so lets us get rid
of its flags parameter (which is used only to control whether explicit
deduction guides are considered).

Bootstrapped and regtestd on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
(do_auto_deduction): Remove flags parameter.
* decl.cc (cp_finish_decl): Adjust call to do_auto_deduction.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
* pt.cc (convert_template_argument): Adjust call to
do_auto_deduction.
(tsubst_expr) : Pass LOOKUP_ONLYCONVERTING
to cp_finish_decl when is_copy_initialization.
(do_class_deduction): Remove flags parameter, and instead
call is_copy_initialization to determine if we're in a copy-init
context.
(do_auto_deduction): Adjust call to do_class_deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
  gcc/cp/cp-tree.h  |  4 +--
  gcc/cp/decl.cc|  2 +-
  gcc/cp/init.cc| 20 +---
  gcc/cp/pt.cc  | 23 --
  gcc/testsuite/g++.dg/cpp0x/explicit15.C   | 31 +++
  .../g++.dg/cpp1z/class-deduction108.C | 31 +++
  6 files changed, 93 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..be556d2c441 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7039,6 +7039,7 @@ extern void emit_mem_initializers (tree);
  extern tree build_aggr_init   (tree, tree, int,
   tsubst_flags_t);
  extern int is_class_type  (tree, int);
+extern bool is_copy_initialization (tree);
  extern tree build_zero_init   (tree, tree, bool);
  extern tree build_value_init  (tree, tsubst_flags_t);
  extern tree build_value_init_noctor   (tree, tsubst_flags_t);
@@ -7279,8 +7280,7 @@ extern tree do_auto_deduction   (tree, 
tree, tree,

[committed] analyzer: fix leak suppression at end of 'main' [PR101983]

2022-03-07 Thread David Malcolm via Gcc-patches

PR analyzer/101983 reports what I thought were false positives
from -Wanalyzer-malloc-leak, but on closer inspection, the
analyzer is correctly reporting heap-allocated buffers that are
no longer reachable.

However, these "leaks" occur at the end of "main".  The analyzer already
has some logic to avoid reporting leaks at the end of main, where the
leak is detected at the end of the EXIT basic block.  However, in this case,
the leak is detected at the clobber in BB 2 here:
   :
  func (&res);
  res ={v} {CLOBBER(eol)};
  _4 = 0;

   :
:
  return _4;

where we have a chain BB 2 -> BB 3 -> EXIT BB.

This patch generalizes the "are we at the end of 'main'" detection to
handle such cases, silencing -Wanalyzer-malloc-leak on them.

There's a remaining issue where the analyzer unhelpfully describes one
of the leaking values as '', rather than 'res.a', but I'm
leaving that for a followup (covered by PR analyzer/99771).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7525-g0af37ad4422052be4b7f779737e14c80e57d0ad9.

gcc/analyzer/ChangeLog:
PR analyzer/101983
* engine.cc (returning_from_function_p): New.
(impl_region_model_context::on_state_leak): Use it when rejecting
leaks at the return from "main".

gcc/testsuite/ChangeLog:
PR analyzer/101983
* gcc.dg/analyzer/pr101983-main.c: New test.
* gcc.dg/analyzer/pr101983-not-main.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc| 48 ++-
 gcc/testsuite/gcc.dg/analyzer/pr101983-main.c | 38 +++
 .../gcc.dg/analyzer/pr101983-not-main.c   | 40 
 3 files changed, 124 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101983-main.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101983-not-main.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index c910d0519d8..bcea3ca4dfd 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -740,6 +740,51 @@ readability_comparator (const void *p1, const void *p2)
   return 0;
 }
 
+/* Return true is SNODE is the EXIT node of a function, or is one
+   of the final snodes within its function.
+
+   Specifically, handle the final supernodes before the EXIT node,
+   for the case of clobbers that happen immediately before exiting.
+   We need a run of snodes leading to the return_p snode, where all edges are
+   intraprocedural, and every snode has just one successor.
+
+   We use this when suppressing leak reports at the end of "main".  */
+
+static bool
+returning_from_function_p (const supernode *snode)
+{
+  if (!snode)
+return false;
+
+  unsigned count = 0;
+  const supernode *iter = snode;
+  while (true)
+{
+  if (iter->return_p ())
+   return true;
+  if (iter->m_succs.length () != 1)
+   return false;
+  const superedge *sedge = iter->m_succs[0];
+  if (sedge->get_kind () != SUPEREDGE_CFG_EDGE)
+   return false;
+  iter = sedge->m_dest;
+
+  /* Impose a limit to ensure we terminate for pathological cases.
+
+We only care about the final 3 nodes, due to cases like:
+  BB:
+(clobber causing leak)
+
+  BB:
+  :
+  return _val;
+
+  EXIT BB.*/
+  if (++count > 3)
+   return false;
+}
+}
+
 /* Find the best tree for SVAL and call SM's on_leak vfunc with it.
If on_leak returns a pending_diagnostic, queue it up to be reported,
so that we potentially complain about a leak of SVAL in the given STATE.  */
@@ -794,8 +839,7 @@ impl_region_model_context::on_state_leak (const 
state_machine &sm,
   gcc_assert (m_enode_for_diag);
 
   /* Don't complain about leaks when returning from "main".  */
-  if (m_enode_for_diag->get_supernode ()
-  && m_enode_for_diag->get_supernode ()->return_p ())
+  if (returning_from_function_p (m_enode_for_diag->get_supernode ()))
 {
   tree fndecl = m_enode_for_diag->get_function ()->decl;
   if (id_equal (DECL_NAME (fndecl), "main"))
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr101983-main.c 
b/gcc/testsuite/gcc.dg/analyzer/pr101983-main.c
new file mode 100644
index 000..a84353be35a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr101983-main.c
@@ -0,0 +1,38 @@
+/* { dg-additional-options "-Wno-analyzer-too-complex 
-fno-analyzer-call-summaries" } */
+
+#include 
+#include 
+#include 
+
+struct list {
+   struct list* next;
+   void *a;
+};
+
+void func(struct list **res)
+{
+   struct list *cur = NULL;
+   do {
+   struct list *n = malloc(sizeof(struct list));
+   void *a = malloc(1);
+   if (n == NULL || a == NULL) {
+   if (n != NULL) free(n);
+   if (a != NULL) free(a);
+   break;
+   }
+
+   if (cur == NULL) {
+   *res = cur = n;
+   } else {
+

Re: [Patch] Fortran: Fix gfc_maybe_dereference_var [PR104430]

2022-03-07 Thread Harald Anlauf via Gcc-patches


Hi Tobias,

Am 07.03.22 um 15:01 schrieb Tobias Burnus:

The problem is that inside the main program,
   y = f(z)
where the the result of z is
   type(t) :: z(size(x%a))
and 'x' is a dummy argument.

'x' looses the attr.dummy in gfc_add_interface_mapping
and this leads to an additional indirect ref in
gfc_maybe_dereference_var - but after the first indirect
ref, TREE_TYPE is alread a RECORD_TYPE ...

The simple fix is to simply punt when the argument
is not a pointer or reference.

This patch additionally fixes a comment in
conv_parent_component_references.


LGTM.  Regarding the commit message, there should be a period
at the end of

>(gfc_maybe_dereference_var): Avoid derefereing a nonpointer


I think there are other PRs which profit from this fix.
Can you please have a look at PR99585, and in particular
the link in comment#0?  ;-)


Those parts are obvious. The only potentially risky
change is the comparison change from a name-wise comparison
to a pointer-based comparison. I think it is fine and the
testsuite did not find any issue, but you prefer, I can
also exclude it.


I was thinking about this, too.  But your change will prevent
running into trouble in the (unlikely?) case c being a NULL pointer.


OK for mainline? How about GCC 10/11 backports?
(I think leaving out the last change, it should be very safe to do.)


OK, as this is both an ICE-on-valid and a regression.

Thanks for the patch!

Harald


Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
Registergericht München, HRB 106955

Re: [PATCH v8 04/12] LoongArch Port: Machine description files.

2022-03-07 Thread Xi Ruoyao via Gcc-patches

On Fri, 2022-03-04 at 15:18 +0800, xucheng...@loongson.cn wrote:

> * config/loongarch/loongarch.md: New file.

An ICE happens building OpenSSH-8.9p1.  Investigation shows it's caused
by the flag "-fzero-call-used-regs=".  It's because the compiler
attempts to clear FCCx registers but can't figure out how.

This flag also triggers ICE for other targets (for example, PR 104820
for MIPS), and the related tests (zero-scratch-regs-{8,9,10,11}.c) are
marked dg-skip for many targets.

But it's unfortunate that packages like OpenSSH have already start to
use this flag... I guess they just enabled it once they saw it was
working for i386 :(.  So it's better to solve the problem for a new
target.

A "quick fix" is adding an insn to clear FCCx.  This is enough to build
OpenSSH and make zero-scratch-regs-{8,9,10,11}.c PASS.

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a9a8bc4b038..76c5ded9fe4 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2020,6 +2020,12 @@
   DONE;
 })
 
+;; Clear one FCC register
+(define_insn "movfcc" [(set (match_operand:FCC 0 "register_operand" "=z")
+ (const_int 0))]
+  ""
+  "movgr2cf\t%0,$r0")
+
 ;; Conditional move instructions.
 
 (define_insn "*sel_using_"
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-07 Thread Xi Ruoyao via Gcc-patches

Bootstrapped and regtested on mips64el-linux-gnuabi64.

I'm not sure if it's "correct" to clobber other registers during the
zeroing of scratch registers.  But I can't really come up with a better
idea: on MIPS there is no simple way to clear one bit in FCSR (i. e.
FCC[x]).  We can't just use "c.f.s $fccx,$f0,$f0" because it will raise
an exception if $f0 contains a sNaN.

--

This fixes the ICEs using -fzero-call-used-regs=all for MIPS target.

OpenSSH-8.9p1 has started to enable this by default, giving us a reason
to fix -fzero-call-used-regs for more targets.

gcc/

PR target/104817
PR target/104820
* config/mips/mips.cc (mips_zero_call_used_regs): New function.
  (TARGET_ZERO_CALL_USED_REGS): Define.
* config/mips/mips.md (mips_zero_fcc): New insn.

gcc/testsuite

* c-c++-common/zero-scratch-regs-8.c: Enable for MIPS.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
* c-c++-common/zero-scratch-regs-10.c: Likewise.
* c-c++-common/zero-scratch-regs-11.c: Likewise.
---
 gcc/config/mips/mips.cc   | 47 +++
 gcc/config/mips/mips.md   |  6 +++
 .../c-c++-common/zero-scratch-regs-10.c   |  2 +-
 .../c-c++-common/zero-scratch-regs-11.c   |  2 +-
 .../c-c++-common/zero-scratch-regs-8.c|  2 +-
 .../c-c++-common/zero-scratch-regs-9.c|  2 +-
 6 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 4f9683e8bf4..f0d6c8f564c 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -22611,6 +22611,51 @@ mips_asm_file_end (void)
   if (NEED_INDICATE_EXEC_STACK)
 file_end_indicate_exec_stack ();
 }
+
+static HARD_REG_SET
+mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, HI_REGNUM))
+{
+  /* Clear HI and LO altogether.  MIPS target treats HILO as a
+double-word register.  */
+  machine_mode dword_mode = TARGET_64BIT ? TImode : DImode;
+  rtx hilo = gen_rtx_REG (dword_mode, MD_REG_FIRST);
+  rtx zero = CONST0_RTX (dword_mode);
+  emit_move_insn (hilo, zero);
+
+  SET_HARD_REG_BIT (zeroed_hardregs, HI_REGNUM);
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, LO_REGNUM))
+   SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM);
+  else
+   emit_clobber (gen_rtx_REG (word_mode, LO_REGNUM));
+}
+
+  bool zero_fcc = false;
+  for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++)
+if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i))
+  zero_fcc = true;
+
+  /* MIPS does not have a simple way to clear one bit in FCC.  We just
+ clear FCC with ctc1 and clobber all FCC bits.  */
+  if (zero_fcc)
+{
+  emit_insn (gen_mips_zero_fcc ());
+  for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++)
+   if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i))
+ SET_HARD_REG_BIT (zeroed_hardregs, i);
+   else
+ emit_clobber (gen_rtx_REG (CCmode, i));
+}
+
+  need_zeroed_hardregs &= ~zeroed_hardregs;
+  return zeroed_hardregs |
+default_zero_call_used_regs (need_zeroed_hardregs);
+}
+
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -22919,6 +22964,8 @@ mips_asm_file_end (void)
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END mips_asm_file_end
 
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS mips_zero_call_used_regs
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index e0f0a582732..edf58710cdd 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -96,6 +96,7 @@ (define_c_enum "unspec" [
   ;; Floating-point environment.
   UNSPEC_GET_FCSR
   UNSPEC_SET_FCSR
+  UNSPEC_ZERO_FCC
 
   ;; HI/LO moves.
   UNSPEC_MFHI
@@ -7670,6 +7671,11 @@ (define_insn "*mips_set_fcsr"
   "TARGET_HARD_FLOAT"
   "ctc1\t%0,$31")
 
+(define_insn "mips_zero_fcc"
+  [(unspec_volatile [(const_int 0)] UNSPEC_ZERO_FCC)]
+  "TARGET_HARD_FLOAT"
+  "ctc1\t$0,$25")
+
 ;; See tls_get_tp_mips16_ for why this form is used.
 (define_insn "mips_set_fcsr_mips16_"
   [(unspec_volatile:SI [(match_operand:P 0 "call_insn_operand" "dS")
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c 
b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
index 96e0b79b328..c23b2ceb391 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
aarch64*-*-* nvptx*-*-* s390*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
aarch64*-*-* nvptx*-*-* s390*-*-* mips*-*-* } } } */
 /* { dg-options "-O2" } */
 
 #include 
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c 
b/gcc/test

Re: [PATCH] rs6000: Improve .machine

2022-03-07 Thread Joseph Myers

This breaks the build of libgcc for powerpc-linux-gnu (32-bit, default 
CPU; configured --disable-multilib --enable-secureplt).

cc1: warning: The '-mfloat128' option may not be fully supported
/tmp/ccHCPzSh.s: Assembler messages:
/tmp/ccHCPzSh.s:163: Error: unrecognized opcode: `lfiwzx'
/scratch/jmyers/glibc-bot/src/gcc/libgcc/shared-object.mk:14: recipe for target 
'_kf_to_sd.o' failed
make[3]: *** [_kf_to_sd.o] Error 1

https://sourceware.org/pipermail/libc-testresults/2022q1/009421.html

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] rs6000: Fix up __SIZEOF_{FLOAT, IBM}128__ defines [PR99708]

2022-03-07 Thread Segher Boessenkool

Hi!

On Sat, Mar 05, 2022 at 09:21:51AM +0100, Jakub Jelinek wrote:
> As mentioned in the PR, right now on powerpc* __SIZEOF_{FLOAT,IBM}128__
> macros are predefined unconditionally, because {ieee,ibm}128_float_type_node
> is always non-NULL, doesn't reflect whether __ieee128 or __ibm128 are
> actually supported or not.
> 
> The following patch:
> 1) makes those {ieee,ibm}128_float_type_node trees NULL if we don't
>really support them instead of equal to long_double_type_node
> 2) adjusts the builtins code to use
>ibm128_float_type_node ? ibm128_float_type_node : long_double_type_node
>for the 2 builtins, so that we don't ICE during builtins creation
>if ibm128_float_type_node is NULL (using error_mark_node instead of
>long_double_type_node sounds more risky to me)

I feel the opposite way: (potentially) using the wrong thing is just a
ticking time bomb, never "safer".

> 3) in rs6000_type_string will not match NULL as __ibm128, and adds
>a __ieee128 case
> 4) removes the clearly unused ptr_{ieee,ibm}128_float_type_node trees;
>if something needs them in the future, they can be easily added back,
>but wasting GTY just in case...
> 5) actually syncs __SIZEOF_FLOAT128__ with the __float128 macro being
>defined in addition to __ieee128 being usable
> 
> Now, in the PR Segher said he doesn't like 5), but I think it is better
> to match the reality and get this PR fixed and if we want to rethink
> how __float128 is defined (whether it is a macro, or perhaps another
> builtin type like __ieee128 which could be easily done by
>lang_hooks.types.register_builtin_type (ieee128_float_type_node,
>   "__ieee128");
>lang_hooks.types.register_builtin_type (ieee128_float_type_node,
>   "__float128");
> perhaps under some conditions, rethink if the -mfloat128 option exists
> and what it does etc., it can be done incrementally and the rs6000-c.cc
> hunks in the patch can be easily reverted (appart from the formatting
> fix).

There needs to be a __SIZEOF_IEEE128__ as well, if you like reality :-)
Sorry I did not pick up on that earlier.

The basic types, that always exist if they are supported at all, are
__ibm128 and __ieee128.  Long double picks one of those, or some other
type; and __float128 is a source of confusion (it tries to make things
more similar to x86, but things are *not* similar anyway :-( )

Any code that cares about the actual mode used, or that supports more
than one type, should use the explicit types.  Code that just wants what
"long double" stands for can just use that, of course.

> Bootstrapped/regtested on powerpc{64le,64}-linux, on powerpc64-linux
> with -m32/-m64 testing, ok for trunk?

Tested with which -mcpu=?  You need at least power7 to get __ieee128
support, but it should be tested *without* that as well.  (The actual
default for powerpc64-linux is power4 (so not even 970), and for
powerpc-linux it is 603 or such.)

>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
>   __SIZEOF_FLOAT128__ here and only if __float128 macro is defined.

(if and only if?)

>   * config/rs6000/rs6000-gen-builtins.cc (TYPE_MAP_SIZE): Set to
>   85 instead of 86.

This should be auto-generated, or just not exist at all
("sizeof type_map / sizeof type_map[0]" in the one place it is used,
instead -- "one place" is after removing it from the definition of the
array of course, where it is unneeded :-) )

>  static
>  const char *rs6000_type_string (tree type_node)
>  {
> +  if (type_node == NULL_TREE)
> +return "unknown";

Please say this is NULL?  If you want to indicate it is problematic, you
can say "***NULL***" or such?

> +{ "if",  "ibm128_float_type_node "
> + "? ibm128_float_type_node "
> + ": long_double" },

I don't think that is correct.  If there is no __ibm128, there is no
IFmode either (they are two sides of the same coin).  Using DFmode
instead (which is the only thing that "long double" could be if not
IFmode) will just ICE later, if this is used at all.  So best case this
will confuse the reader.

Segher

Re: [PATCH] rs6000: Improve .machine

2022-03-07 Thread Segher Boessenkool

On Mon, Mar 07, 2022 at 08:53:42PM +, Joseph Myers wrote:
> This breaks the build of libgcc for powerpc-linux-gnu (32-bit, default 
> CPU; configured --disable-multilib --enable-secureplt).
> 
> cc1: warning: The '-mfloat128' option may not be fully supported
> /tmp/ccHCPzSh.s: Assembler messages:
> /tmp/ccHCPzSh.s:163: Error: unrecognized opcode: `lfiwzx'
> /scratch/jmyers/glibc-bot/src/gcc/libgcc/shared-object.mk:14: recipe for 
> target '_kf_to_sd.o' failed
> make[3]: *** [_kf_to_sd.o] Error 1
> 
> https://sourceware.org/pipermail/libc-testresults/2022q1/009421.html

Thanks for the report!  This is now PR104829.


Segher

Re: [PATCH v3] libgo: Don't use pt_regs member in mcontext_t

2022-03-07 Thread Ian Lance Taylor via Gcc-patches

On Sun, Mar 6, 2022 at 11:11 PM  wrote:
>
> +#ifdef __PPC64__
> +   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
> +#else
> +   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
> +#endif

Have you tested this in 32-bit mode?  It does not look correct based
on the glibc definitions.  Looking at glibc it seems that it ought to
be

reg.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];

Ian

Re: [pushed] c++: (*(fn))() [PR104618]

2022-03-07 Thread Jason Merrill via Gcc-patches


On 2/27/22 18:52, Jason Merrill wrote:

The patch for PR90451 deferred marking to the point of actual use; we missed
this one because of the parens.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/104618

gcc/cp/ChangeLog:

* typeck.cc (cp_build_addr_expr_1): Also
maybe_undo_parenthesized_ref.

gcc/testsuite/ChangeLog:

* g++.dg/overload/paren1.C: New test.
---
  gcc/cp/typeck.cc   | 6 +++---
  gcc/testsuite/g++.dg/overload/paren1.C | 7 +++
  2 files changed, 10 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/overload/paren1.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f796337f73c..bddc83759ad 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -6884,9 +6884,9 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue, 
tsubst_flags_t complain)
   so we can just form an ADDR_EXPR with the correct type.  */
if (processing_template_decl || TREE_CODE (arg) != COMPONENT_REF)
  {
-  tree stripped_arg = tree_strip_any_location_wrapper (arg);
-  if (TREE_CODE (stripped_arg) == FUNCTION_DECL
- && !mark_used (stripped_arg, complain) && !(complain & tf_error))
+  tree stripped_arg
+   = tree_strip_any_location_wrapper (maybe_undo_parenthesized_ref (arg));
+  if (!mark_single_function (stripped_arg, complain))


It occurred to me that it would be cleaner to do the stripping in 
mark_single_function.


From efea883ae8a912f9e1081abf51b1e581232e5f30 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 28 Feb 2022 12:05:34 -0400
Subject: [PATCH] c++: tweak to (*(fn))() patch [PR104618]
To: gcc-patches@gcc.gnu.org

Other callers of mark_single_function might also want to look through these
wrapapers.

	PR c++/104618

gcc/cp/ChangeLog:

	* decl2.cc (mark_single_function): Look through parens and location
	wrapper.
	* typeck.cc (cp_build_addr_expr_1): Not here.
---
 gcc/cp/decl2.cc  | 3 +++
 gcc/cp/typeck.cc | 4 +---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 22edc2ba7f9..e85bb87c955 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -5737,6 +5737,9 @@ decl_dependent_p (tree decl)
 bool
 mark_single_function (tree expr, tsubst_flags_t complain)
 {
+  expr = maybe_undo_parenthesized_ref (expr);
+  expr = tree_strip_any_location_wrapper (expr);
+
   if (is_overloaded_fn (expr) == 1
   && !mark_used (expr, complain)
   && (complain & tf_error))
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index bddc83759ad..ea6a485bbbf 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -6884,9 +6884,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue, tsubst_flags_t complain)
  so we can just form an ADDR_EXPR with the correct type.  */
   if (processing_template_decl || TREE_CODE (arg) != COMPONENT_REF)
 {
-  tree stripped_arg
-	= tree_strip_any_location_wrapper (maybe_undo_parenthesized_ref (arg));
-  if (!mark_single_function (stripped_arg, complain))
+  if (!mark_single_function (arg, complain))
 	return error_mark_node;
   val = build_address (arg);
   if (TREE_CODE (arg) == OFFSET_REF)
-- 
2.27.0

[PATCH] RISCV: Strengthen libatomic lrsc pairs

2022-03-07 Thread Patrick O'Neill

Currently, libatomic's _sync_fetch_and_#op# and 
__sync_val_compare_and_swap methods are not sufficiently strong for the
ATOMIC_SEQ_CST memory model.

This can be shown using the following Herd litmus test:

RISCV LRSC-LIB-CALL

{
0:x6=a; 0:x8=b; 0:x10=c;
1:x6=a; 1:x8=b; 1:x10=c;
}

 P0  | P1   ;
 ori x1,x0,1 | lw x9,0(x10) ;
 sw x1,0(x10)| fence rw,rw  ;
 lr.w.aq x7,0(x8)| lw x5,0(x6)  ;
 ori x7,x0,1 |  ;
 sc.w.rl x3,x7,0(x8) |  ;
 sw x1,0(x6) |  ;

exists (1:x9=1 /\ 1:x5=0 /\ b=1)

This patch enforces SEQ_CST by setting the .aqrl bits on the LR and SC
ops.

2022-03-07 Patrick O'Neill 

PR target/104831
* atomic.c: Change LR.aq/SC.rl pairs into sequentially
consistent LR.aqrl/SC.aqrl pair.

Signed-off-by: Patrick O'Neill 
---
RISCV LRSC-BUGFIX

{
0:x6=a; 0:x8=b; 0:x10=c;
1:x6=a; 1:x8=b; 1:x10=c;
}

 P0| P1   ;
 ori x1,x0,1   | lw x9,0(x10) ;
 sw x1,0(x10)  | fence rw,rw  ;
 lr.w.aqrl x7,0(x8)| lw x5,0(x6)  ;
 ori x7,x0,1   |  ;
 sc.w.aqrl x3,x7,0(x8) |  ;
 sw x1,0(x6)   |  ;

~exists (1:x9=1 /\ 1:x5=0 /\ b=1)
---
Below are other Herd litmus tests used to test the LR.aqrl/SC.aqrl fix.
---
RISCV LRSC-READ

(*
  LR/SC with .aq .rl bits does not allow read operations to be reordered
  within/above it.
*)

{
0:x6=a; 0:x8=b;
1:x6=a; 1:x8=b;
}

 P0 | P1  ;
 lr.w.aq.rl x7,0(x8)| ori x1,x0,1 ;
 ori x7,x0,1| sw x1,0(x6) ;
 sc.w.aq.rl x1,x7,0(x8) | fence rw,rw ;
 lw x5,0(x6)| lw x7,0(x8) ;

~exists (0:x5=0 /\ 1:x7=0 /\ b=1)
---
RISCV READ-LRSC

(*
  LR/SC with .aq .rl bits does not allow read operations to be reordered
  within/beneath it.
*)

{
0:x6=a; 0:x8=b;
1:x6=a; 1:x8=b;
}

 P0 | P1  ;
 lw x5,0(x6)| ori x1,x0,1 ;
 lr.w.aq.rl x7,0(x8)| sw x1,0(x8) ;
 ori x1,x0,1| fence rw,rw ;
 sc.w.aq.rl x1,x1,0(x8) | sw x1,0(x6) ;

~exists (0:x5=1 /\ 0:x7=0 /\ b=1)
---
RISCV LRSC-WRITE

(* 
  LR/SC with .aq .rl bits does not allow write operations to be reordered
  within/above it.
*)

{
0:x8=b; 0:x10=c;
1:x8=b; 1:x10=c;
}

 P0 | P1  ;
 ori x9,x0,1| lw x9,0(x10);
 lr.w.aq.rl x7,0(x8)| fence rw,rw ;
 ori x7,x0,1| lw x7,0(x8) ;
 sc.w.aq.rl x1,x7,0(x8) | ;
 sw x9,0(x10)   | ;

~exists (1:x9=1 /\ 1:x7=0 /\ b=1)
---
RISCV WRITE-LRSC

(*
  LR/SC with .aq .rl bits does not allow write operations to be reordered
  within/beneath it.
*)

{
0:x8=b; 0:x10=c;
1:x8=b; 1:x10=c;
}

 P0 | P1   ;
 ori x1,x0,1| ori x1,x0,1  ;
 sw x1,0(x10)   | sw x1,0(x8)  ;
 lr.w.aq.rl x7,0(x8)| fence rw,rw  ;
 sc.w.aq.rl x1,x1,0(x8) | lw x9,0(x10) ;

~exists (0:x7=0 /\ 1:x9=0 /\ b=1)
---
 libgcc/config/riscv/atomic.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 7007e7a20e4..0c85a6d00ea 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -39,13 +39,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 unsigned old, tmp1, tmp2;  \
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  #insn " %[tmp1], %[old], %[value]\n\t"\
  invert\
  "and %[tmp1], %[tmp1], %[mask]\n\t"   \
  "and %[tmp2], %[old], %[not_mask]\n\t"\
  "or %[tmp2], %[tmp2], %[tmp1]\n\t"\
- "sc.w.rl %[tmp1], %[tmp2], %[mem]\n\t"\
+ "sc.w.aqrl %[tmp1], %[tmp2], %[mem]\n\t"  \
  "bnez %[tmp1], 1b"\
  : [old] "=&r" (old),  \
[mem] "+A" (*(volatile unsigned*) aligned_addr),\
@@ -73,12 +73,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 unsigned old, tmp1;
\
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"
\
  "and %[tmp1], %[old], %[mask]\n\t"\
  "

Re: [C++ PATCH] PR c++/96437: ICE-on-invalid-code error recovery.

2022-03-07 Thread Jason Merrill via Gcc-patches


On 3/7/22 08:33, Roger Sayle wrote:


This patch fixes PR c++/96437 which is an ICE-on-invalid-code regression
affecting mainline.

This patch has been tested on x86_64-pc-linux-gnu, enabling languages
c and c++, with make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-07  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96437
* parser.cc (synthesize_implicit_template_parm): Check that
TREE_VALUE (new_parm) isn't error_mark_node before setting its
DECL_VIRTUAL_P.

gcc/testsuite/ChangeLog
PR c++/96437
* g++.dg/pr96437.C: New test case.



+  if (TREE_VALUE (new_parm) != error_mark_node)
+DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true;


Hmm, I wonder about returning early if there was an error, but this is 
fine as is.



+/* { dg-options "-O2" } */


This also seems unneeded for this test.

Also, please put tests in a subdirectory of g++.dg, not the top 
directory.  This one might go in cpp2a/ since it's a C++20 feature.


OK with those adjustments.

Jason

Re: [C++ PATCH] PR c++/96329: ICE-on-invalid-code error recovery.

2022-03-07 Thread Jason Merrill via Gcc-patches


On 3/7/22 08:04, Roger Sayle wrote:


This patch fixes PR c++/96329 which is an ICE-on-invalid-code regression
affecting mainline.

This patch has been tested on x86_64-pc-linux-gnu, enabling languages
c and c++, with make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-07  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96329
* parser.cc (cp_parser_linkage_specification): Treat the case where
linkage is error_mark_node as "invalid linkage-specification".

gcc/testsuite/ChangeLog
PR c++/96329
* g++.dg/pr96329.C: New test case.



+/* { dg-options "-O2" } */


Is this necessary for the failure?  That would be surprising for a 
parser issue.


And this should probably go in g++.dg/template.

OK with those adjustments.

Jason

Re: [C++ PATCH] PR c++/96440: ICE-on-invalid-code error recovery.

2022-03-07 Thread Jason Merrill via Gcc-patches


On 3/7/22 08:55, Roger Sayle wrote:


This patch fixes PR c++/96440 which is an ICE-on-invalid-code regression
affecting mainline.

This patch has been tested on x86_64-pc-linux-gnu, enabling languages
c and c++, with make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-07  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96440
* decl.cc (start_decl): Defend against prefix_attributes being
error_mark_node.

gcc/testsuite/ChangeLog
PR c++/96440
* g++.dg/pr96440.C: New test case.



+/* { dg-options "-O2" } */


Please remove this line and put the test in g++.dg/cpp0x/

OK with those changes.

Jason

[PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-07 Thread Marek Polacek via Gcc-patches

In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can
jump to a label, the insn should be a JUMP_INSN.

However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC
doesn't guarantee that the rtx has any actual labels it can branch to.
Here, the rtvec has 0 elements because of the __thread variable: we perform
ix86_rewrite_tls_address which calls copy_isns and that allocates the rtvec:

XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));

This causes an ICE in update_br_prob_note: BRANCH_EDGE (bb) crashes
because there's no branch edge.  I think we can fix this by checking
that there is at least one label the asm can jump to before wrapping
the ASM_OPERANDS in a JUMP_INSN.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?

PR rtl-optimization/104777

gcc/ChangeLog:

* rtl.cc (classify_insn): For ASM_OPERANDS, return JUMP_INSN only if
ASM_OPERANDS_LABEL_VEC has at least one element.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/tls/pr104777.c: New test.
---
 gcc/rtl.cc  |  7 +++--
 gcc/testsuite/gcc.dg/torture/tls/pr104777.c | 30 +
 2 files changed, 35 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/tls/pr104777.c

diff --git a/gcc/rtl.cc b/gcc/rtl.cc
index f17474bfee1..a29d5e1371e 100644
--- a/gcc/rtl.cc
+++ b/gcc/rtl.cc
@@ -765,7 +765,9 @@ classify_insn (rtx x)
 return CALL_INSN;
   if (ANY_RETURN_P (x))
 return JUMP_INSN;
-  if (GET_CODE (x) == ASM_OPERANDS && ASM_OPERANDS_LABEL_VEC (x))
+  if (GET_CODE (x) == ASM_OPERANDS
+  && ASM_OPERANDS_LABEL_VEC (x)
+  && GET_NUM_ELEM (ASM_OPERANDS_LABEL_VEC (x)) > 0)
 return JUMP_INSN;
   if (GET_CODE (x) == SET)
 {
@@ -794,7 +796,8 @@ classify_insn (rtx x)
   if (has_return_p)
return JUMP_INSN;
   if (GET_CODE (XVECEXP (x, 0, 0)) == ASM_OPERANDS
- && ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0)))
+ && ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0))
+ && GET_NUM_ELEM (ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0))) > 0)
return JUMP_INSN;
 }
 #ifdef GENERATOR_FILE
diff --git a/gcc/testsuite/gcc.dg/torture/tls/pr104777.c 
b/gcc/testsuite/gcc.dg/torture/tls/pr104777.c
new file mode 100644
index 000..abaf59731fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/tls/pr104777.c
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/104777 */
+/* { dg-do compile } */
+/* { dg-require-effective-target tls } */
+ 
+int savestate_r;
+int savestate_ssb;
+extern void abort();
+__thread int  loop;
+void f (void)
+{
+  int savestate_r0_5;
+  int savestate_r1_6;
+
+  __asm__("" : "=m" (savestate_ssb), "=r" (savestate_r));
+  savestate_r0_5 = savestate_r;
+  if (savestate_r0_5 == 0)
+  {
+__asm__ __volatile__("" :  : "m" (loop));
+abort ();
+  }
+
+  __asm__("" : "=m" (savestate_ssb), "=r" (savestate_r));
+  savestate_r1_6 = savestate_r;
+  if (savestate_r1_6 != 0)
+return;
+
+  __asm__ __volatile__("" :  : "m" (loop));
+  abort ();
+
+}

base-commit: b7dbe870bc0193c76345f893d94c5bf6c99a6afe
-- 
2.35.1

Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-07 Thread Segher Boessenkool

On Mon, Mar 07, 2022 at 07:03:17PM -0500, Marek Polacek via Gcc-patches wrote:
> In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can
> jump to a label, the insn should be a JUMP_INSN.
> 
> However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC
> doesn't guarantee that the rtx has any actual labels it can branch to.

But it should.

> Here, the rtvec has 0 elements because of the __thread variable: we perform
> ix86_rewrite_tls_address which calls copy_isns and that allocates the rtvec:
> 
> XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));

So fix *that* instead?  Everywhere else does not use length zero RTL
vectors.  copy_rtx makes sure to do the right thing here, for example.

We do not have notation to create zero-length vectors in RTL source
code either, btw.:
case 'V':
  /* 'V' is an optional vector: if a closeparen follows,
 just store NULL for this element.  */
(optional vectors are at the end of an RTX), and if you write [] you
will hit
  fatal_with_file_and_line ("vector must have at least one element");

Segher

Re: [PATCH] [i386] Don't fold builtin into gimple when isa mismatches.

2022-03-07 Thread Hongtao Liu via Gcc-patches

ping^1

On Fri, Feb 25, 2022 at 1:51 PM Hongtao Liu  wrote:
>
> On Fri, Feb 25, 2022 at 1:50 PM liuhongt  wrote:
> >
> > The patch fixes ICE in ix86_gimple_fold_builtin.
> >
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for main trunk?
>
> > gcc/ChangeLog:
> >
> > PR target/104666
> > * config/i386/i386-expand.cc
> > (ix86_check_builtin_isa_match): New func.
> > (ix86_expand_builtin): Move code to
> > ix86_check_builtin_isa_match and call it.
> > * config/i386/i386-protos.h
> > (ix86_check_builtin_isa_match): Declare.
> > * config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
> > builtin into gimple when isa mismatches.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr104666.c: New test.
> > ---
> >  gcc/config/i386/i386-expand.cc   | 97 ++--
> >  gcc/config/i386/i386-protos.h|  5 ++
> >  gcc/config/i386/i386.cc  |  4 +
> >  gcc/testsuite/gcc.target/i386/pr104666.c | 49 
> >  4 files changed, 115 insertions(+), 40 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104666.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index faa0191c6dd..1d132f0181d 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -12232,46 +12232,14 @@ ix86_expand_vec_set_builtin (tree exp)
> >return target;
> >  }
> >
> > -/* Expand an expression EXP that calls a built-in function,
> > -   with result going to TARGET if that's convenient
> > -   (and in mode MODE if that's convenient).
> > -   SUBTARGET may be used as the target for computing one of EXP's operands.
> > -   IGNORE is nonzero if the value is to be ignored.  */
> > -
> > -rtx
> > -ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
> > -machine_mode mode, int ignore)
> > +/* Return true if the necessary isa options for this builtin exist,
> > +   else false.
> > +   fcode = DECL_MD_FUNCTION_CODE (fndecl);  */
> > +bool
> > +ix86_check_builtin_isa_match (unsigned int fcode,
> > + HOST_WIDE_INT* pbisa,
> > + HOST_WIDE_INT* pbisa2)
> >  {
> > -  size_t i;
> > -  enum insn_code icode, icode2;
> > -  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> > -  tree arg0, arg1, arg2, arg3, arg4;
> > -  rtx op0, op1, op2, op3, op4, pat, pat2, insn;
> > -  machine_mode mode0, mode1, mode2, mode3, mode4;
> > -  unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl);
> > -
> > -  /* For CPU builtins that can be folded, fold first and expand the fold.  
> > */
> > -  switch (fcode)
> > -{
> > -case IX86_BUILTIN_CPU_INIT:
> > -  {
> > -   /* Make it call __cpu_indicator_init in libgcc. */
> > -   tree call_expr, fndecl, type;
> > -type = build_function_type_list (integer_type_node, NULL_TREE);
> > -   fndecl = build_fn_decl ("__cpu_indicator_init", type);
> > -   call_expr = build_call_expr (fndecl, 0);
> > -   return expand_expr (call_expr, target, mode, EXPAND_NORMAL);
> > -  }
> > -case IX86_BUILTIN_CPU_IS:
> > -case IX86_BUILTIN_CPU_SUPPORTS:
> > -  {
> > -   tree arg0 = CALL_EXPR_ARG (exp, 0);
> > -   tree fold_expr = fold_builtin_cpu (fndecl, &arg0);
> > -   gcc_assert (fold_expr != NULL_TREE);
> > -   return expand_expr (fold_expr, target, mode, EXPAND_NORMAL);
> > -  }
> > -}
> > -
> >HOST_WIDE_INT isa = ix86_isa_flags;
> >HOST_WIDE_INT isa2 = ix86_isa_flags2;
> >HOST_WIDE_INT bisa = ix86_builtins_isa[fcode].isa;
> > @@ -12321,7 +12289,56 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
> > subtarget,
> >bisa |= OPTION_MASK_ISA_SSE2;
> >  }
> >
> > -  if ((bisa & isa) != bisa || (bisa2 & isa2) != bisa2)
> > +  if (pbisa)
> > +*pbisa = bisa;
> > +  if (pbisa2)
> > +*pbisa2 = bisa2;
> > +
> > +  return (bisa & isa) == bisa && (bisa2 & isa2) == bisa2;
> > +}
> > +
> > +/* Expand an expression EXP that calls a built-in function,
> > +   with result going to TARGET if that's convenient
> > +   (and in mode MODE if that's convenient).
> > +   SUBTARGET may be used as the target for computing one of EXP's operands.
> > +   IGNORE is nonzero if the value is to be ignored.  */
> > +
> > +rtx
> > +ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
> > +machine_mode mode, int ignore)
> > +{
> > +  size_t i;
> > +  enum insn_code icode, icode2;
> > +  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> > +  tree arg0, arg1, arg2, arg3, arg4;
> > +  rtx op0, op1, op2, op3, op4, pat, pat2, insn;
> > +  machine_mode mode0, mode1, mode2, mode3, mode4;
> > +  unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl);
> > +  HOST_WIDE_INT bisa, bisa2;
> > +
> > +  /* For CPU builtins that can be folded, fold first and expand the fold.  
> > */
> > +  switch (fcode)
> > +{
> > +case IX86_BUI

[PATCH v2] RISCV: Strengthen libatomic lrsc pairs

2022-03-07 Thread Patrick O'Neill

Currently, libatomic's _sync_fetch_and_#op# and 
__sync_val_compare_and_swap methods are not sufficiently strong for the
ATOMIC_SEQ_CST memory model.

This can be shown using the following Herd litmus test:

RISCV LRSC-LIB-CALL

{
0:x6=a; 0:x8=b; 0:x10=c;
1:x6=a; 1:x8=b; 1:x10=c;
}

 P0  | P1   ;
 ori x1,x0,1 | lw x7,0(x8) ;
 sw x1,0(x10)| fence rw,rw  ;
 lr.w.aq x7,0(x8)| lw x5,0(x6)  ;
 ori x7,x0,1 |  ;
 sc.w.rl x3,x7,0(x8) |  ;
 fence rw,w  |  ;
 sw x1,0(x6) |  ;

exists (1:x7=1 /\ 1:x5=0 /\ b=1)

This patch enforces SEQ_CST by setting the .aqrl bits on the LR and .rl
bits on SC ops.

2022-03-07 Patrick O'Neill 

PR target/104831
* atomic.c: Change LR.aq/SC.rl pairs into sequentially
consistent LR.aqrl/SC.rl pair.

Signed-off-by: Patrick O'Neill 
---
Changelog v2:
 - Weakened LR/SC pairs to be in-line with ISA manual.
 - Updated litmus tests to reflect the relevant leading fences present
   in the RISCV implementation.
---
RISCV LRSC-BUGFIX

{
0:x6=a; 0:x8=b; 0:x10=c;
1:x6=a; 1:x8=b; 1:x10=c;
}

 P0  | P1   ;
 ori x1,x0,1 | lw x7,0(x8) ;
 sw x1,0(x10)| fence rw,rw  ;
 lr.w.aq x7,0(x8)| lw x5,0(x6)  ;
 ori x7,x0,1 |  ;
 sc.w.rl x3,x7,0(x8) |  ;
 fence rw,w  |  ;
 sw x1,0(x6) |  ;

exists (1:x7=1 /\ 1:x5=0 /\ b=1)
---
Below are other Herd litmus tests used to test the LR.aqrl/SC.aqrl fix.
---
RISCV LRSC-READ

(*
  LR/SC with .aq .rl bits does not allow read operations to be reordered
  within/above it.
*)

{
0:x6=a; 0:x8=b;
1:x6=a; 1:x8=b;
}

 P0  | P1  ;
 lr.w.aq.rl x7,0(x8) | ori x1,x0,1 ;
 ori x7,x0,1 | sw x1,0(x6) ;
 sc.w.rl x1,x7,0(x8) | fence rw,rw ;
 fence rw,rw | lw x7,0(x8) ;
 lw x5,0(x6) | ;

~exists (0:x5=0 /\ 1:x7=0 /\ b=1)
---
RISCV READ-LRSC

(*
  LR/SC with .aq .rl bits does not allow read operations to be reordered
  within/beneath it.
*)

{
0:x6=a; 0:x8=b;
1:x6=a; 1:x8=b;
}

 P0  | P1  ;
 lw x5,0(x6) | ori x1,x0,1 ;
 lr.w.aq.rl x7,0(x8) | sw x1,0(x8) ;
 ori x1,x0,1 | fence rw,rw ;
 sc.w.rl x1,x1,0(x8) | sw x1,0(x6) ;

~exists (0:x5=1 /\ 0:x7=0 /\ b=1)
---
RISCV LRSC-WRITE

(* 
  LR/SC with .aq .rl bits does not allow write operations to be reordered
  within/above it.
*)

{
0:x8=b; 0:x10=c;
1:x8=b; 1:x10=c;
}

 P0  | P1  ;
 ori x9,x0,1 | lw x9,0(x10);
 lr.w.aq.rl x7,0(x8) | fence rw,rw ;
 ori x7,x0,1 | lw x7,0(x8) ;
 sc.w.rl x1,x7,0(x8) | ;
 fence rw,w  | ;
 sw x9,0(x10)| ;

~exists (1:x9=1 /\ 1:x7=0 /\ b=1)
---
RISCV WRITE-LRSC

(*
  LR/SC with .aq .rl bits does not allow write operations to be reordered
  within/beneath it.
*)

{
0:x8=b; 0:x10=c;
1:x8=b; 1:x10=c;
}

 P0  | P1   ;
 ori x1,x0,1 | ori x1,x0,1  ;
 sw x1,0(x10)| sw x1,0(x8)  ;
 lr.w.aq.rl x7,0(x8) | fence rw,rw  ;
 sc.w.rl x1,x1,0(x8) | lw x9,0(x10) ;

~exists (0:x7=0 /\ 1:x9=0 /\ b=1)
---
 libgcc/config/riscv/atomic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 7007e7a20e4..fa0e428963f 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -39,7 +39,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1, tmp2;  \
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  #insn " %[tmp1], %[old], %[value]\n\t"\
  invert\
  "and %[tmp1], %[tmp1], %[mask]\n\t"   \
@@ -73,7 +73,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1;
\
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"
\
  "and %[tmp1], %[old], %[mask]\n\t"\
  "bne %[tmp1], %[o], 1f\n\t"   \
  "and %[tmp1], %[old], %[not_mask]\n\t"\
-- 
2.25.1

[PATCH] RISC-V: Handle combine extension in canonical ordering.

2022-03-07 Thread shihua

From: LiaoShihua 

The crypto extension have several shorthand extensions that don't consist of 
any extra instructions.
Take zk for example, while the extension would imply zkn, zkr, zkt. 
The 3 extensions should also combine back into zk to maintain the canonical 
order in isa strings.
This patch addresses the above.
And if the other extension has the same situation, you can add them in 
riscv_combine_info[]



gcc/ChangeLog:

* common/config/riscv/riscv-common.cc 
(riscv_subset_list::handle_combine_ext):Combine back into zk to maintain the 
canonical order in isa strings.
(riscv_subset_list::parse):Ditto.
* config/riscv/riscv-subset.h:Declare handle_combine_ext();

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-17.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc| 56 +++
 gcc/config/riscv/riscv-subset.h|  1 +
 gcc/testsuite/gcc.target/riscv/predef-17.c | 63 ++
 3 files changed, 120 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-17.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index a904893b9ed..1c06f83cc1c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -189,6 +189,16 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
 
+/* Combine extensions defined in this table  */
+static const struct riscv_ext_version riscv_combine_info[] =
+{
+  {"zk",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zkn",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zks",  ISA_SPEC_CLASS_NONE, 1, 0},
+  /* Terminate the list.  */
+  {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
+};
+
 static const riscv_cpu_info riscv_cpu_tables[] =
 {
 #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
@@ -813,6 +823,50 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t *ext)
 }
 }
 
+/* Check any combine extensions for EXT.  */
+void
+riscv_subset_list::handle_combine_ext (riscv_subset_list *subset_list)
+{
+  const riscv_ext_version *combine_info;
+  const riscv_implied_info_t *implied_info;
+  bool IsCombined = false;
+
+  for (combine_info = &riscv_combine_info[0]; combine_info->name; 
++combine_info)
+  {
+
+/* Skip if combine extensions are present */
+if (subset_list->lookup(combine_info->name))
+  continue;
+
+/* Find all extensions of the combine extension   */
+for (implied_info = &riscv_implied_info[0]; implied_info->ext; 
++implied_info)
+{
+  /* Skip if implied extension don't match combine extension */
+  if (strcmp(combine_info->name, implied_info->ext) != 0)
+continue; 
+
+  if (subset_list->lookup(implied_info->implied_ext))
+  {
+IsCombined = true;
+  }
+  else
+  {
+IsCombined = false;
+break;
+  }
+}
+
+/* Add combine extensions */
+if (IsCombined)
+{
+  if (subset_list->lookup(combine_info->name) == NULL)
+  {
+subset_list->add (combine_info->name, combine_info->major_version, 
combine_info->minor_version, false, true);
+  }
+}
+  }
+}
+
 /* Parsing function for multi-letter extensions.
 
Return Value:
@@ -992,6 +1046,8 @@ riscv_subset_list::parse (const char *arch, location_t loc)
   subset_list->handle_implied_ext (itr);
 }
 
+  subset_list->handle_combine_ext (subset_list);
+
   return subset_list;
 
 fail:
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index 4f3556a8d9b..da2e22d34f2 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -68,6 +68,7 @@ private:
 const char *);
 
   void handle_implied_ext (riscv_subset_t *);
+  void handle_combine_ext (riscv_subset_list *);
 
 public:
   ~riscv_subset_list ();
diff --git a/gcc/testsuite/gcc.target/riscv/predef-17.c 
b/gcc/testsuite/gcc.target/riscv/predef-17.c
new file mode 100644
index 000..68f5f95a66c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-17.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_zkr_zkt 
-mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_zk)
+#error "__riscv_zk"
+#endif
+
+#if !defined(__riscv_zkr)
+#error "__riscv_zkr"
+#endif
+
+#if !defined(__riscv_zkn)
+#error "__riscv_zkn"
+#endif
+
+#if !defined(__riscv_zks)
+#error "__riscv_zks"
+#endif
+
+#if !defined(__riscv_zbkb)
+#error "__riscv_zbkb"
+#endif
+
+#if !defined(__riscv_zbkc)
+#error "__riscv_zbkc"
+#endif
+
+#if !defined(__riscv_zbkx)
+#error "__riscv_zbkx"
+#endif
+
+#if !defined(__riscv_zknd)
+#error "__riscv_zknd"
+#endif
+
+#if !defined(__riscv_zkne)
+#error "__riscv_zkne"
+#endif

Re: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.

2022-03-07 Thread Hongtao Liu via Gcc-patches

On Mon, Mar 7, 2022 at 5:37 PM Richard Biener via Gcc-patches
 wrote:
>
> On Fri, Mar 4, 2022 at 8:27 AM liuhongt  wrote:
> >
> > For parameter passing through stack, vectorized load from parm_decl
> > in callee may trigger serious STF issue. This is why GCC12 regresses
> > 50% for cray at -O2 compared to GCC11.
> >
> > The patch add an extremely large number to stmt_cost to prevent
> > vectorization for loads from parm_decl under very-cheap cost model,
> > this can at least prevent O2 regression due to STF issue, but may lose
> > some perf where there's no such issue(1 vector_load vs n scalar_load +
> > CTOR).
>
> Note this is just heuristics in that by-value passed parameters are usually
> stored to the stack close before the function call.  It does not catch the
> similar case from
>
>   foo (const X &bar)  { ... }
>
> where a
>
>   foo ({ 1., 2. })
>
> will have the object passed by reference constructed right before the
> call.  In the end a full solution will need to perform some IPA analysis
> that computes the initialization distance from the call and uses should
> factor in the use distance from function entry.
Yes, this patch only deals with by-value passed objects which has
nothing to do with ipa, but only depends on psABI.
For the case of passing references, IPA is necessary to help analyze
some obvious STFS scenarios, but static analysis still has
limitations, for example, for the store across the cache line (or
page) boundary case, IPA can not give a judgment.
>
> > No impact for SPEC2017 for both plain O2 and native O2 on ICX.
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/101908
> > * config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
> > (ix86_vector_costs::add_stmt_cost): Add extra cost for
> > vector_load/unsigned_load which may have stall forward issue.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr101908-1.c: New test.
> > * gcc.target/i386/pr101908-2.c: New test.
> > ---
> >  gcc/config/i386/i386.cc| 31 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +
> >  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +
> >  3 files changed, 55 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index b2bf90576d5..3bbaaf65ea8 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, 
> > struct noce_if_info *if_info)
> >return default_noce_conversion_profitable_p (seq, if_info);
> >  }
> >
> > +/* Return true if REF may have STF issue, otherwise false.  */
> > +static bool
> > +ix86_load_maybe_stfs_p (tree ref)
> > +{
> > +  tree addr = get_base_address (ref);
> > +
> > +  if (TREE_CODE (addr) != PARM_DECL
> > +  || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr)))
> > +  || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD)
> > +return false;
> > +  return true;
> > +}
> > +
> >  /* x86-specific vector costs.  */
> >  class ix86_vector_costs : public vector_costs
> >  {
> > @@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > vect_cost_for_stmt kind,
> > if (TREE_CODE (op) == SSA_NAME)
> >   TREE_VISITED (op) = 0;
> >  }
> > +
> > +  /* Prevent vectorization for load from parm_decl at O2 to avoid STF 
> > issue.
> > + Performance may lose when there's no STF issue(1 vector_load vs n
> > + scalar_load + CTOR).
> > + TODO: both extra cost(2000) and ix86_load_maybe_stfs_p need to be fine
> > + tuned.  */
> > +  if ((kind == vector_load || kind == unaligned_load)
> > +  && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
>
> This check doesn't make much sense, I'd rather remove it.
Will change.
>
> > +  && stmt_info
> > +  && stmt_info->slp_type == pure_slp
> > +  && stmt_info->stmt
> > +  && gimple_assign_load_p (stmt_info->stmt)
> > +  && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt)))
>
> I'd pass down STMT_VINFO_DATA_REF instead and have ix86_load_maybe_stfs_p
> and use
>
>   tree addr = DR_BASE_ADDRESS (dr);
>   if (TREE_CODE (addr) != ADDR_EXPR)
> return false;
>   addr = get_base_address (TREE_OPERAND (addr, 0));
>   ...
>
> since that gets you a more reliable way to look at the actual object 
> referenced.
Yes.
>
> > +{
> > +  stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, 
> > misalign);
> > +  stmt_cost += 2000;
> > +}
> > +
>
> Maybe handle this like the Bonell case and thus put it after the
> stmt_cost == -1 handling, just bumping the cost (also noting the actual number
> is arbitrary).  It would be nice to have a better estimate on the penalty than
> "2000", maybe formulate

[PATCH] c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

2022-03-07 Thread Jakub Jelinek via Gcc-patches

Hi!

On the following testcase, we emit "did you mean '__dt '?" in the error
message.  "__dt " shows there because it is dtor_identifier, but we
shouldn't suggest those to the user, they are purely internal and can't
be really typed by the user because of the final space in it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-08  Jakub Jelinek  

PR c++/104806
* search.cc (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
identifiers with space at the end.

* g++.dg/spellcheck-pr104806.C: New test.

--- gcc/cp/search.cc.jj 2022-01-18 11:58:59.407984557 +0100
+++ gcc/cp/search.cc2022-03-07 10:44:33.455673155 +0100
@@ -1275,6 +1275,13 @@ lookup_field_fuzzy_info::fuzzy_lookup_fi
   if (is_lambda_ignored_entity (field))
continue;
 
+  /* Ignore special identifiers with space at the end like cdtor or
+conversion op identifiers.  */
+  if (TREE_CODE (DECL_NAME (field)) == IDENTIFIER_NODE)
+   if (unsigned int len = IDENTIFIER_LENGTH (DECL_NAME (field)))
+ if (IDENTIFIER_POINTER (DECL_NAME (field))[len - 1] == ' ')
+   continue;
+
   m_candidates.safe_push (DECL_NAME (field));
 }
 }
--- gcc/testsuite/g++.dg/spellcheck-pr104806.C.jj   2022-03-07 
10:34:07.224499657 +0100
+++ gcc/testsuite/g++.dg/spellcheck-pr104806.C  2022-03-07 10:43:41.900399808 
+0100
@@ -0,0 +1,5 @@
+// PR c++/104806
+
+struct S {};
+int main() { S s; s.__d; } // { dg-bogus "'struct S' has no member named 
'__d'; did you mean '__\[a-z]* '" }
+   // { dg-error "'struct S' has no member named 
'__d'" "" { target *-*-* } .-1 }

Jakub

67 matches

Mail list logo