RE: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-24 Thread Frager, Neal
>>> There is a microblaze cpu version 10.0 included in versal. If the 
>>> minor version is only a single digit, then the version comparison 
>>> will fail as version 10.0 will appear as 100 compared to version
>>> 6.00 or 8.30 which will calculate to values 600 and 830.
>>> The issue can be seen when using the '-mcpu=10.0' option.
>>> With this fix, versions with a single digit minor number such as
>>> 10.0 will be calculated as greater than versions with a smaller 
>>> major version number, but with two minor version digits.
>>> By applying this fix, several incorrect warning messages will no 
>>> longer be printed when building the versal plm application, such as 
>>> the warning message below:
>>> warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' 
>>> or greater
>>> Signed-off-by: Neal Frager 
>>> ---
>>>   gcc/config/microblaze/microblaze.cc | 164 +---
>>>   1 file changed, 76 insertions(+), 88 deletions(-)
>>
>> Please add a test case.
>>
>> --
>> Michael Eager
> 
> Hi Michael,
> 
> Would you mind helping me understand how to make a gcc test case for this 
> patch?
> 
> This patch does not change the resulting binaries of a microblaze gcc build.  
> The output will be the same with or without the patch, so I do not have 
> anything in the binary itself to verify.
> 
> All that happens is false warning messages will not be printed when building 
> with ‘-mcpu=10.0’.  Is there a way to test for warning messages?
> 
> In any case, please do not commit v1 of this patch.  I am going to work on 
> making a v2 based on Mark’s feedback.

> You can create a test case which passes the -mcpu=10.0 and other options to 
> GCC and verify that the message is not generated after the patch is applied.

> You can make all GCC warnings into errors with the "-Werror" option.
> This means that the compile will fail if the warning is issued.

> Take a look at gcc/testsuite/gcc.target/aarch64/bti-1.c for an example of 
> using { dg-options "" } to specify command line options.

> There is a test suite option (dg-warning) which checks that a particular 
> source line generates a warning message, but it isn't clear whether it is 
> possible to check that a warning is not issued.
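[Editor's note: DejaGnu does provide a directive for asserting that a diagnostic is *not* emitted: dg-bogus. A sketch of what such a microblaze test might look like follows; the file contents, option set, and function body are illustrative assumptions, not taken from the patch under review.]

```c
/* { dg-do compile { target microblaze-*-* } } */
/* { dg-options "-mcpu=10.0 -mxl-multiply-high" } */

/* Any function will do; the warning under test is option-level,
   not tied to a particular source line.  */
int mul_high (int a, int b)
{
  return (int) (((long long) a * b) >> 32);
}

/* Line number 0 matches diagnostics not attached to a source line,
   so this fails the test if the spurious -mcpu warning reappears.  */
/* { dg-bogus "can be used only with" "" { target *-*-* } 0 } */
```

Alternatively, adding -Werror to dg-options (as Michael suggests) turns any stray warning into a compile failure caught by the excess-errors check.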

Hi Michael,

Thanks to Mark Hatle's feedback, we have a much simpler solution to the problem.

The following change is actually all that is necessary.  Since we are just 
moving from
strcasecmp to strverscmp, does v2 of the patch need to have a test case to go 
with it?

-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)

I assume there are already test cases that verify that strverscmp works 
correctly?

Best regards,
Neal Frager
AMD


Re: [PATCH] match: Fix the `popcnt(a&b) + popcnt(a|b)` pattern for types [PR111913]

2023-10-24 Thread Richard Biener
On Tue, Oct 24, 2023 at 1:04 AM Andrew Pinski  wrote:
>
> So this pattern needs a little help on the gimple side of things to know what
> the type popcount should be. For most builtins, the type is the same as the 
> input
> but popcount and others are not. And when using it with another outer 
> expression,
> genmatch needs some slight help to know that the return type was type rather 
> than
> the argument type.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/111913
>
> gcc/ChangeLog:
>
> * match.pd (`popcount(X&Y) + popcount(X|Y)`): Add the resulting
> type for popcount.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/fold-popcount-1.c: New test.
> * gcc.dg/fold-popcount-8a.c: New test.
> ---
>  gcc/match.pd  |  2 +-
>  .../gcc.c-torture/compile/fold-popcount-1.c   | 13 
>  gcc/testsuite/gcc.dg/fold-popcount-8a.c   | 33 +++
>  3 files changed, 47 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/fold-popcount-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/fold-popcount-8a.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ce8d159d260..f725a685863 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8600,7 +8600,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* popcount(X&Y) + popcount(X|Y) is popcount(x) + popcount(Y).  */
>  (simplify
>(plus:c (POPCOUNT:s (bit_and:s @0 @1)) (POPCOUNT:s (bit_ior:cs @0 @1)))
> -  (plus (POPCOUNT @0) (POPCOUNT @1)))
> +  (plus (POPCOUNT:type @0) (POPCOUNT:type @1)))
>
>  /* popcount(X) + popcount(Y) - popcount(X&Y) is popcount(X|Y).  */
>  /* popcount(X) + popcount(Y) - popcount(X|Y) is popcount(X&Y).  */
> diff --git a/gcc/testsuite/gcc.c-torture/compile/fold-popcount-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/fold-popcount-1.c
> new file mode 100644
> index 000..d3d3a2976e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/fold-popcount-1.c
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/111913 */
> +
> +int f(unsigned int x, unsigned int y)
> +{
> +  return __builtin_popcount (x&y) + __builtin_popcount (y|x--);
> +}
> +
> +int f2(unsigned int x, unsigned int y)
> +{
> +  int t = __builtin_popcount (x&y);
> +  int t1 = __builtin_popcount (x|y);
> +  return t + t1;
> +}
> diff --git a/gcc/testsuite/gcc.dg/fold-popcount-8a.c 
> b/gcc/testsuite/gcc.dg/fold-popcount-8a.c
> new file mode 100644
> index 000..3001522f259
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/fold-popcount-8a.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int foo1(unsigned int x, unsigned int y)
> +{
> +  int t = __builtin_popcount (x&y);
> +  int t1 = __builtin_popcount (x|y);
> +  return t + t1;
> +}
> +
> +int foo2(unsigned int x, unsigned int y)
> +{
> +  int t1 = __builtin_popcount (x|y);
> +  int t = __builtin_popcount (x&y);
> +  return t + t1;
> +}
> +
> +int foo3(unsigned int y, unsigned int x)
> +{
> +  int t = __builtin_popcount (x&y);
> +  int t1 = __builtin_popcount (x|y);
> +  return t + t1;
> +}
> +
> +int foo4(unsigned int y, unsigned int x)
> +{
> +  int t1 = __builtin_popcount (x|y);
> +  int t = __builtin_popcount (x&y);
> +  return t + t1;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " & " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " \\| " "optimized" } } */
> --
> 2.39.3
>


Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.

2023-10-24 Thread Richard Biener
On Tue, Oct 24, 2023 at 7:44 AM Hongtao Liu  wrote:
>
> On Tue, Oct 24, 2023 at 1:23 PM Hongtao Liu  wrote:
> >
> > On Tue, Oct 24, 2023 at 10:53 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Oct 23, 2023 at 8:35 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt  wrote:
> > > > >
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > > Ready push to trunk.
> > > >
> > > > vcond and vcondeq shouldn't be necessary if there's
> > > > vcond_mask and vcmp support which is the "modern"
> > > > way of handling vcond.  Unless the ISA really can do
> > > > compare and select with a single instruction.
> > > For testcase
> > >
> > > typedef _Float16 __attribute__((__vector_size__ (4))) __v2hf;
> > > typedef _Float16 __attribute__((__vector_size__ (8))) __v4hf;
> > >
> > >
> > > __v4hf cf, df;
> > >
> > > __v4hf cfu (__v4hf c, __v4hf d) { return (c > d) ? cf : df; }
> > >
> > > The data_mode passes to ix86_get_mask_mode is v4hi, not v4hf since
> > >
> > >   /* Always construct signed integer vector type.  */
> > >   intt = c_common_type_for_size
> > > (GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (type0))), 0);
> > >   if (!intt)
> > > {
> > >   if (complain & tf_error)
> > > error_at (location, "could not find an integer type "
> > >   "of the same size as %qT", TREE_TYPE (type0));
> > >   return error_mark_node;
> > > }
> > >   result_type = build_opaque_vector_type (intt,
> > >   TYPE_VECTOR_SUBPARTS (type0));
> > >   return build_vec_cmp (resultcode, result_type, op0, op1);
> > >
> > > The backend can't distinguish whether it's a vector fp16 comparison or
> > > a vector hi comparison.
> > > the former requires -mavx512fp16, the latter requires -mavx512bw
> > Should we pass type0 instead of result_type here?
> @deftypefn {Target Hook} opt_machine_mode
> TARGET_VECTORIZE_GET_MASK_MODE (machine_mode @var{mode})
> Return the mode to use for a vector mask that holds one boolean
> result for each element of vector mode @var{mode}.  The returned mask mode
> can be a vector of integers (class @code{MODE_VECTOR_INT}), a vector of
> booleans (class @code{MODE_VECTOR_BOOL}) or a scalar integer (class
> @code{MODE_INT}).  Return an empty @code{opt_machine_mode} if no such
> mask mode exists.
>
> Looks like it's on purpose, v2hi is exactly what we needed here.
>
> Then we use either kmask or v4hi for both the v4hf and v4hi comparisons,
> but we can't use v4hi for the v4hi comparison while using kmask for the
> v4hf comparison.

I think it's indeed on purpose that the result of v1 < v2 is a signed
integer vector type.
But build_vec_cmp should not use the truth type for the result but instead the
truth type for the comparison, so

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 112d28fd656..01dea608980 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11986,7 +11986,7 @@ build_vec_cmp (tree_code code, tree type,
 {
   tree zero_vec = build_zero_cst (type);
   tree minus_one_vec = build_minus_one_cst (type);
-  tree cmp_type = truth_type_for (type);
+  tree cmp_type = truth_type_for (TREE_TYPE (arg0));
   tree cmp = build2 (code, cmp_type, arg0, arg1);
   return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
 }


> > > >
> > > > Richard.
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/103861
> > > > > * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
> > > > > V2HF/V2BF/V4HF/V4BFmode.
> > > > > * config/i386/mmx.md (vec_cmpv4hfqi): New expander.
> > > > > (vcondv4hf): Ditto.
> > > > > (vcondv4hi): Ditto.
> > > > > (vconduv4hi): Ditto.
> > > > > (vcond_mask_v4hi): Ditto.
> > > > > (vcond_mask_qi): Ditto.
> > > > > (vec_cmpv2hfqi): Ditto.
> > > > > (vcondv2hf): Ditto.
> > > > > (vcondv2hi): Ditto.
> > > > > (vconduv2hi): Ditto.
> > > > > (vcond_mask_v2hi): Ditto.
> > > > > * config/i386/sse.md (vcond): Merge this with ..
> > > > > (vcond): .. this into ..
> > > > > (vcond): .. this,
> > > > > and extend to V8BF/V16BF/V32BFmode.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * g++.target/i386/part-vect-vcondhf.C: New test.
> > > > > * gcc.target/i386/part-vect-vec_cmphf.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386-expand.cc|   4 +
> > > > >  gcc/config/i386/mmx.md| 237 
> > > > > +-
> > > > >  gcc/config/i386/sse.md|  25 +-
> > > > >  .../g++.target/i386/part-vect-vcondhf.C   |  34 +++
> > > > >  .../gcc.target/i386/part-vect-vec_cmphf.c |  26 ++
> > > > >  5 files changed, 304 insertions(+), 22 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
> > > > >  create mode 100644 
> > > > > gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
> > > > >
> > > > > diff --git a/

Re: [PATCHv2] Improve factor_out_conditional_operation for conversions and constants

2023-10-24 Thread Richard Biener
On Tue, Oct 24, 2023 at 8:45 AM Andrew Pinski  wrote:
>
> In the case of a NOP conversion (precisions of the 2 types are equal),
> factoring out the conversion can be done even if int_fits_type_p returns
> false and even when the conversion is defined by a statement inside the
> conditional. Since it is a NOP conversion there is no zero/sign extending
> happening which is why it is ok to be done here; we were trying to prevent
> an extra sign/zero extend from being moved away from definition which no-op
> conversions are not.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> PR tree-optimization/104376
> PR tree-optimization/101541
> * tree-ssa-phiopt.cc (factor_out_conditional_operation):
> Allow nop conversions even if it is defined by a statement
> inside the conditional.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/101541
> * gcc.dg/tree-ssa/phi-opt-39.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-39.c | 43 ++
>  gcc/tree-ssa-phiopt.cc | 16 ++--
>  2 files changed, 56 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-39.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-39.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-39.c
> new file mode 100644
> index 000..6b6006a96db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-39.c
> @@ -0,0 +1,43 @@
> +/* { dg-options "-O2 -fdump-tree-phiopt" } */
> +
> +unsigned f0(int A)
> +{
> +// A == 0? A : -A   same as -A
> +  if (A == 0)  return A;
> +  return -A;
> +}
> +
> +unsigned f1(int A)
> +{
> +// A != 0? A : -A   same as A
> +  if (A != 0)  return A;
> +  return -A;
> +}
> +unsigned f2(int A)
> +{
> +// A >= 0? A : -A   same as abs (A)
> +  if (A >= 0)  return A;
> +  return -A;
> +}
> +unsigned f3(int A)
> +{
> +// A > 0?  A : -A   same as abs (A)
> +  if (A > 0)  return A;
> +  return -A;
> +}
> +unsigned f4(int A)
> +{
> +// A <= 0? A : -A   same as -abs (A)
> +  if (A <= 0)  return A;
> +  return -A;
> +}
> +unsigned f5(int A)
> +{
> +// A < 0?  A : -A   same as -abs (A)
> +  if (A < 0)  return A;
> +  return -A;
> +}
> +
> +/* f4 and f5 are not allowed to be optimized in early phi-opt. */
> +/* { dg-final { scan-tree-dump-times "if" 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-not "if" "phiopt2" } } */
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 312a6f9082b..bb55a4fba33 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -310,7 +310,9 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
> return NULL;
>/* If arg1 is an INTEGER_CST, fold it to new type.  */
>if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
> - && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
> + && (int_fits_type_p (arg1, TREE_TYPE (new_arg0))
> + || (TYPE_PRECISION (TREE_TYPE (new_arg0))
> +  == TYPE_PRECISION (TREE_TYPE (arg1)
> {
>   if (gimple_assign_cast_p (arg0_def_stmt))
> {
> @@ -322,8 +324,12 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
>  if arg0_def_stmt is the only non-debug stmt in
>  its basic block, because then it is possible this
>  could enable further optimizations (minmax replacement
> -etc.).  See PR71016.  */
> - if (new_arg0 != gimple_cond_lhs (cond_stmt)
> +etc.).  See PR71016.
> +Note no-op conversions don't have this issue as
> +it will not generate any zero/sign extend in that case.  */
> + if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
> +   != TYPE_PRECISION (TREE_TYPE (arg1)))
> + && new_arg0 != gimple_cond_lhs (cond_stmt)
>   && new_arg0 != gimple_cond_rhs (cond_stmt)
>   && gimple_bb (arg0_def_stmt) == e0->src)
> {
> @@ -354,6 +360,10 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
> return NULL;
> }
>   new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
> +
> + /* Drop the overlow that fold_convert might add. */
> + if (TREE_OVERFLOW (new_arg1))
> +   new_arg1 = drop_tree_overflow (new_arg1);
> }
>   else
> return NULL;
> --
> 2.34.1
>


Re: [PATCH] libgcc: make heap-based trampolines conditional on libc presence

2023-10-24 Thread Richard Biener
On Mon, Oct 23, 2023 at 6:41 PM Sergei Trofimovich  wrote:
>
> On Mon, 23 Oct 2023 13:54:01 +0100
> Iain Sandoe  wrote:
>
> > hi Sergei,
> >
> > > On 23 Oct 2023, at 13:43, Sergei Trofimovich  wrote:
> > >
> > > From: Sergei Trofimovich 
> > >
> > > To build `libc` for a target one needs to build `gcc` without `libc`
> > > support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
> > > heap-based trampolines" added unconditional `libc` dependency and broke
> > > libc-less `gcc` builds.
> > >
> > > An example failure on `x86_64-unknown-linux-gnu`:
> > >
> > >$ mkdir -p /tmp/empty
> > >$ ../gcc/configure \
> > >--disable-multilib \
> > >--without-headers \
> > >--with-newlib \
> > >--enable-languages=c \
> > >--disable-bootstrap \
> > >--disable-gcov \
> > >--disable-threads \
> > >--disable-shared \
> > >--disable-libssp \
> > >--disable-libquadmath \
> > >--disable-libgomp \
> > >--disable-libatomic \
> > >--with-build-sysroot=/tmp/empty
> > >$ make
> > >...
> > >/tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ 
> > > -B/usr/local/x86_64-pc-linux-gnu/bin/ 
> > > -B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem 
> > > /usr/local/x86_64-pc-linux-gnu/include -isystem 
> > > /usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 
> > > -O2  -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings 
> > > -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes 
> > > -Wold-style-definition  -isystem ./include  -fpic -mlong-double-80 
> > > -DUSE_ELF_SYMVER -fcf-protection -mshstk -g -DIN_LIBGCC2 
> > > -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic 
> > > -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. 
> > > -I../.././gcc -I/home/slyfox/dev/git/gcc/libgcc 
> > > -I/home/slyfox/dev/git/gcc/libgcc/. 
> > > -I/home/slyfox/dev/git/gcc/libgcc/../gcc 
> > > -I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  
> > > -o heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF 
> > > heap-trampoline.dep  -c .../gcc/libgcc/config/i386/heap-trampoline.c 
> > > -fvisibility=hidden -DHIDE_EXPORTS
> > >../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: 
> > > unistd.h: No such file or directory
> > >3 | #include 
> > >  |  ^~
> > >compilation terminated.
> > >make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] 
> > > Error 1
> > >make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
> > >make[1]: *** [Makefile:13307: all-target-libgcc] Error 2
> > >
> > > The change inhibits any heap-based trampoline code.
> >
> > That looks reasonable to me (I was considering using __has_include(), but 
> > the inhibit_libc is neater).
> >
> > The fact that this first compiler is built without heap-trampoline support 
> > would become relevant, I guess, if libc wanted to use them; it would need 
> > another iteration.
> >
> > so, it looks fine, but I cannot actually approve it.
>
> Sounds good. Let's wait for others to chime in. Maybe Richard? :)

OK.

> AFAIU libcs (like `glibc`) try hard not to use link tests and uses
> mainly preprocessor and code generator to specifically accommodate this
> case. Maybe there is a way to pass the support flag to libc without the
> reliance on code presence in libgcc.
>
> Otherwise we could use __builtin_trap() as an implementation for exposed
> symbols.
>
> >
> > >
> > > libgcc/
> > >
> > > * libgcc/config/aarch64/heap-trampoline.c: Disable when libc is
> > >   not present.
> > > ---
> > > libgcc/config/aarch64/heap-trampoline.c | 5 +
> > > libgcc/config/i386/heap-trampoline.c| 5 +
> > > 2 files changed, 10 insertions(+)
> > >
> > > diff --git a/libgcc/config/aarch64/heap-trampoline.c 
> > > b/libgcc/config/aarch64/heap-trampoline.c
> > > index c8b83681ed7..f22233987ca 100644
> > > --- a/libgcc/config/aarch64/heap-trampoline.c
> > > +++ b/libgcc/config/aarch64/heap-trampoline.c
> > > @@ -1,5 +1,8 @@
> > > /* Copyright The GNU Toolchain Authors. */
> > >
> > > +/* libc is required to allocate trampolines.  */
> > > +#ifndef inhibit_libc
> > > +
> > > #include 
> > > #include 
> > > #include 
> > > @@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
> > >   tramp_ctrl_curr = prev;
> > > }
> > > }
> > > +
> > > +#endif /* !inhibit_libc */
> > > diff --git a/libgcc/config/i386/heap-trampoline.c 
> > > b/libgcc/config/i386/heap-trampoline.c
> > > index 96e13bf828e..4b9f4365868 100644
> > > --- a/libgcc/config/i386/heap-trampoline.c
> > > +++ b/libgcc/config/i386/heap-trampoline.c
> > > @@ -1,5 +1,8 @@
> > > /* Copyright The GNU Toolchain Authors. */
> > >
> > > +/* libc is required to allocate trampolines.  */
> > > +#ifndef inhibit_libc
> > > +
> > > #include 
> > > #include 
> > > #include 
> > > @@ -170,3 +173,5 @@ __builtin_nested_func_ptr_deleted (void)
> > >   tramp_ctrl_curr = prev;
> > > }
> > >

Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Richard Biener
On Mon, Oct 23, 2023 at 9:26 PM Marek Polacek  wrote:
>
> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> > On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
> > >
> > > On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > > > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> > > > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, 
> > > > > > powerpc64le-unknown-linux-gnu,
> > > > > > and aarch64-unknown-linux-gnu; ok for trunk?
> > > > > >
> > > > > > -- >8 --
> > > > > > In 
> > > > > > 
> > > > > > I proposed -fhardened, a new umbrella option that enables a 
> > > > > > reasonable set
> > > > > > of hardening flags.  The read of the room seems to be that the 
> > > > > > option
> > > > > > would be useful.  So here's a patch implementing that option.
> > > > > >
> > > > > > Currently, -fhardened enables:
> > > > > >
> > > > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > > > > >   -D_GLIBCXX_ASSERTIONS
> > > > > >   -ftrivial-auto-var-init=pattern
> >
> > I think =zero is much better here given the overhead is way
> > cheaper and pointers get a more reliable behavior.
>
> Ok, changed now.
>
> > > > > >   -fPIE  -pie  -Wl,-z,relro,-z,now
> > > > > >   -fstack-protector-strong
> > > > > >   -fstack-clash-protection
> > > > > >   -fcf-protection=full (x86 GNU/Linux only)
> > > > > >
> > > > > > -fhardened will not override options that were specified on the 
> > > > > > command line
> > > > > > (before or after -fhardened).  For example,
> > > > > >
> > > > > >  -D_FORTIFY_SOURCE=1 -fhardened
> > > > > >
> > > > > > means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> > > > > >
> > > > > >   -fhardened -fstack-protector
> > > > > >
> > > > > > will not enable -fstack-protector-strong.
> > > > > >
> > > > > > In DW_AT_producer it is reflected only as -fhardened; it doesn't 
> > > > > > expand
> > > > > > to anything.  I think we need a better way to show what it actually
> > > > > > enables.
> > > > >
> > > > > I do think we need to find a solution here to solve asserting 
> > > > > compliance.
> > > >
> > > > Fair enough.
> > > >
> > > > > Maybe we can have -Whardened that will diagnose any altering of
> > > > > -fhardened by other options on the command-line or by missed target
> > > > > implementations?  People might for example use -fstack-protector
> > > > > but don't really want to make protection lower than requested with 
> > > > > -fhardened.
> > > > >
> > > > > Any such conflict is much less appearant than when you use the
> > > > > flags -fhardened composes.
> > > >
> > > > How about: --help=hardened says which options -fhardened attempts to
> > > > enable, and -Whardened warns when it didn't enable an option?  E.g.,
> > > >
> > > >   -fstack-protector -fhardened -Whardened
> > > >
> > > > would say that it didn't enable -fstack-protector-strong because
> > > > -fstack-protector was specified on the command line?
> > > >
> > > > If !HAVE_LD_NOW_SUPPORT, --help=hardened probably doesn't even have to
> > > > list -z now, likewise for -z relro.
> > > >
> > > > Unclear if -Whardened should be enabled by default, but probably yes?
> > >
> > > Here's v2 which adds -Whardened (enabled by default).
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > I think it's OK but I'd like to see a second ACK here.
>
> Thanks!
>
> > Can you see how our
> > primary and secondary targets (+ host OS) behave here?
>
> That's very reasonable.  I tried to build gcc on Compile Farm 119 (AIX) but
> that fails with:
>
> ar  -X64 x ../ppc64/libgcc/libgcc_s.a shr.o
> ar: 0707-100 ../ppc64/libgcc/libgcc_s.a does not exist.
> make[2]: *** [/home/polacek/gcc/libgcc/config/rs6000/t-slibgcc-aix:98: all] 
> Error 1
> make[2]: Leaving directory 
> '/home/polacek/x/trunk/powerpc-ibm-aix7.3.1.0/libgcc'
>
> and I tried Darwin (104) and that fails with
>
> *** Configuration aarch64-apple-darwin21.6.0 not supported
>
> Is anyone else able to build gcc on those machines, or test the attached
> patch?
>
> > I think the
> > documentation should elaborate a bit on expectations for non-Linux/GNU
> > targets, specifically I think the default configuration for a target should
> > with -fhardened _not_ have any -Whardened diagnostics.  Maybe we can
> > have a testcase for this?
>
> Sorry, I'm not sure how to test that.  I suppose if -fhardened enables
> something not supported on those systems, and it's something for which
> we have a configure test, then we shouldn't warn.  This is already the
> case for -pie, -z relro, and -z now.

I was thinking of

/* { dg-do compile } */
/* { dg-additional-options "-fhardened -Whardened" } */

int main () {}

and excess errors should catch "misconfigurations"?

> Should the docs say something like the following for features without
> configu

[PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

2023-10-24 Thread Li Xu

Calling a vget/vset intrinsic without receiving a return value will cause
a crash, because in this case e.target is null.
This patch should be backported to releases/gcc-13.

PR/target 111935

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111935.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  4 +++
 .../gcc.target/riscv/rvv/base/pr111935.c  | 26 +++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ab12e130907..0b1409a52e0 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1740,6 +1740,8 @@ public:
 
   rtx expand (function_expander &e) const override
   {
+if (!e.target)
+  return NULL_RTX;
 rtx dest = expand_normal (CALL_EXPR_ARG (e.exp, 0));
 gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (dest)));
 rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
@@ -1777,6 +1779,8 @@ public:
 
   rtx expand (function_expander &e) const override
   {
+if (!e.target)
+  return NULL_RTX;
 rtx src = expand_normal (CALL_EXPR_ARG (e.exp, 0));
 gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (src)));
 rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
new file mode 100644
index 000..0b936d849a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+inline vuint32m4_t __attribute__((__always_inline__)) transpose_indexes() {
+  static const uint32_t idx_[16] = {0, 4, 8, 12,
+  1, 5, 9, 13,
+  2, 6, 10, 14,
+  3, 7, 11, 15};
+  return __riscv_vle32_v_u32m4(idx_, 16);
+}
+
+void pffft_real_preprocess_4x4(const float *in) {
+  vfloat32m1_t r0=__riscv_vle32_v_f32m1(in,4);
+  vfloat32m4_t tmp = __riscv_vundefined_f32m4();
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 0, r0);
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 1, r0);
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 2, r0);
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 3, r0);
+  tmp = __riscv_vrgather_vv_f32m4(tmp, transpose_indexes(), 16);
+  r0 = __riscv_vget_v_f32m4_f32m1(tmp, 0);
+}
+
+/* { dg-final { scan-assembler-times 
{vl[0-9]+re[0-9]+\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 10 } } */
+/* { dg-final { scan-assembler-times 
{vs[0-9]+r\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 8 } } */
-- 
2.17.1


xu...@eswincomputing.com


Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

2023-10-24 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Your patch doesn't introduce new regressions. However, before pushing
to the mainline you need to fix some issues:
1. Please fix the trailing spaces and blocks of 8 spaces which should
be replaced with tabs. You can use check_GNU_style.py script to spot
them.
2. Please use capital letters for code iterators (i.e., any_shift_rotate).

Once the above issues are fixed, please proceed with your commit.

Thank you for your contribution,
Claudiu

On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle  wrote:
>
>
> This patch completes the ARC back-end's transition to using pre-reload
> splitters for SImode shifts and rotates on targets without a barrel
> shifter.  The core part is that the shift_si3 define_insn is no longer
> needed, as shifts and rotates that don't require a loop are split
> before reload, and then because shift_si3_loop is the only caller
> of output_shift, both can be significantly cleaned up and simplified.
> The output_shift function (Claudiu's "the elephant in the room") is
> renamed output_shift_loop, which handles just the four instruction
> zero-overhead loop implementations.
>
> Aside from the clean-ups, the user visible changes are much improved
> implementations of SImode shifts and rotates on affected targets.
>
> For the function:
> unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_1: lsr_s r2,r0
> bmsk_s r0,r0,0
> ror r0,r0
> j_s.d   [blink]
> or_s  r0,r0,r2
>
> with this patch, we now generate:
>
> j_s.d   [blink]
> ror r0,r0
>
> For the function:
> unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_31:
> mov_s   r2,r0   ;4
> asl_s r0,r0
> add.f 0,r2,r2
> rlc r2,0
> j_s.d   [blink]
> or_s  r0,r0,r2
>
> with this patch we now generate an add.f followed by an adc:
>
> rotr_31:
> add.f   r0,r0,r0
> j_s.d   [blink]
> add.cs  r0,r0,1
>
>
> Shifts by constants requiring a loop have been improved for even counts
> by performing two operations in each iteration:
>
> int shl10(int x) { return x >> 10; }
>
> Previously looked like:
>
> shl10:  mov.f lp_count, 10
> lpnz  2f
> asr r0,r0
> nop
> 2:  # end single insn loop
> j_s [blink]
>
>
> And now becomes:
>
> shl10:
> mov lp_count,5
> lp  2f
> asr r0,r0
> asr r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
>
> So emulating ARC's SWAP on architectures that don't have it:
>
> unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }
>
> previously required 10 instructions and ~70 cycles:
>
> rotr_16:
> mov_s   r2,r0   ;4
> mov.f lp_count, 16
> lpnz  2f
> add r0,r0,r0
> nop
> 2:  # end single insn loop
> mov.f lp_count, 16
> lpnz  2f
> lsr r2,r2
> nop
> 2:  # end single insn loop
> j_s.d   [blink]
> or_s  r0,r0,r2
>
> now becomes just 4 instructions and ~18 cycles:
>
> rotr_16:
> mov lp_count,8
> lp  2f
> ror r0,r0
> ror r0,r0
> 2:  # end single insn loop
> j_s [blink]
>
>
> This patch has been tested with a cross-compiler to arc-linux hosted
> on x86_64-pc-linux-gnu and (partially) tested with the compile-only
> portions of the testsuite with no regressions.  Ok for mainline, if
> your own testing shows no issues?
>
>
> 2023-10-07  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc-protos.h (output_shift): Rename to...
> (output_shift_loop): Tweak API to take an explicit rtx_code.
> (arc_split_ashl): Prototype new function here.
> (arc_split_ashr): Likewise.
> (arc_split_lshr): Likewise.
> (arc_split_rotl): Likewise.
> (arc_split_rotr): Likewise.
> * config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
> (output_shift_loop): New function replacing output_shift to output
> a zero overheap loop for SImode shifts and rotates on ARC targets
> without barrel shifter (i.e. no hardware support for these insns).
> (arc_split_ashl): New helper function to split *ashlsi3_nobs.
> (arc_split_ashr): New helper function to split *ashrsi3_nobs.
> (arc_split_lshr): New helper function to split *lshrsi3_nobs.
> (arc_split_rotl): New helper function to split *rotlsi3_nobs.
> (arc_split_rotr): New helper function to split *rotrsi3_nobs.
> * config/arc/arc.md (any_shift_rotate): New define_code_iterator.
> (define_code_attr insn): New code attribute to map to pattern name.
> (si3): New expander unifying previous ashlsi3,
> ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
> (*si3_nobs): New defin

Re: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

2023-10-24 Thread juzhe.zh...@rivai.ai
Ok for trunk (You can commit it to the trunk now).

For GCC-13,  I'd like to wait for kito's comment.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-10-24 15:29
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong
Subject: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

Calling a vget/vset intrinsic without receiving a return value will cause
a crash, because in this case e.target is null.
This patch should be backported to releases/gcc-13.

PR target/111935

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111935.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  4 +++
 .../gcc.target/riscv/rvv/base/pr111935.c  | 26 +++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ab12e130907..0b1409a52e0 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1740,6 +1740,8 @@ public:
 
   rtx expand (function_expander &e) const override
   {
+if (!e.target)
+  return NULL_RTX;
 rtx dest = expand_normal (CALL_EXPR_ARG (e.exp, 0));
 gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (dest)));
 rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
@@ -1777,6 +1779,8 @@ public:
 
   rtx expand (function_expander &e) const override
   {
+if (!e.target)
+  return NULL_RTX;
 rtx src = expand_normal (CALL_EXPR_ARG (e.exp, 0));
 gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (src)));
 rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
new file mode 100644
index 000..0b936d849a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+inline vuint32m4_t __attribute__((__always_inline__)) transpose_indexes() {
+  static const uint32_t idx_[16] = {0, 4, 8, 12,
+  1, 5, 9, 13,
+  2, 6, 10, 14,
+  3, 7, 11, 15};
+  return __riscv_vle32_v_u32m4(idx_, 16);
+}
+
+void pffft_real_preprocess_4x4(const float *in) {
+  vfloat32m1_t r0=__riscv_vle32_v_f32m1(in,4);
+  vfloat32m4_t tmp = __riscv_vundefined_f32m4();
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 0, r0);
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 1, r0);
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 2, r0);
+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 3, r0);
+  tmp = __riscv_vrgather_vv_f32m4(tmp, transpose_indexes(), 16);
+  r0 = __riscv_vget_v_f32m4_f32m1(tmp, 0);
+}
+
+/* { dg-final { scan-assembler-times 
{vl[0-9]+re[0-9]+\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 10 } } */
+/* { dg-final { scan-assembler-times 
{vs[0-9]+r\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 8 } } */
-- 
2.17.1


xu...@eswincomputing.com
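
The bug class fixed above is generic: an expand hook that unconditionally uses the call's destination will crash whenever the intrinsic's result is discarded, because the destination is then null. A toy Python analogue of the guard (editorial sketch; `Expander` and `expand_vget` are invented names, not GCC API):

```python
class Expander:
    # Toy stand-in for GCC's function_expander: target is None when the
    # intrinsic's return value is discarded, as in the PR111935 testcase.
    def __init__(self, args, target=None):
        self.args = args
        self.target = target

def expand_vget(e):
    # The fix: bail out early instead of expanding into a null target.
    if e.target is None:
        return None
    # Stand-in for the real expansion into e.target.
    return ("vget", e.args[0], e.args[1])

assert expand_vget(Expander(["tmp", 0])) is None                      # discarded result
assert expand_vget(Expander(["tmp", 0], "r0")) == ("vget", "tmp", 0)  # normal use
```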


Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Ajit Agarwal
Hello Bernhard:

On 23/10/23 7:40 pm, Bernhard Reutner-Fischer wrote:
> On Mon, 23 Oct 2023 12:16:18 +0530
> Ajit Agarwal  wrote:
> 
>> Hello All:
>>
>> Addressed below review comments in the version 11 of the patch.
>> Please review and please let me know if its ok for trunk.
> 
> s/satisified/satisfied/
> 

I will fix this.

>>> As said, I don't see why the below was not cleaned up before the V1 
>>> submission.
>>> Iff it breaks when manually CSEing, I'm curious why?
> 
> The function below looks identical in v12 of the patch.
> Why didn't you use common subexpressions?

Using CSE here breaks aarch64 regression tests, hence I have reverted it
back to not use CSE.

>>>   
> +/* Return TRUE if reg source operand of zero_extend is argument registers
> +   and not return registers and source and destination operand are same
> +   and mode of source and destination operand are not same.  */
> +
> +static bool
> +abi_extension_candidate_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
> +  rtx orig_src = XEXP (SET_SRC (set), 0);
> +
> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO (orig_src)))
> +return false;
> +
> +  /* Mode of destination and source should be different.  */
> +  if (dst_mode == GET_MODE (orig_src))
> +return false;
> +
> +  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
> +  bool promote_p = abi_target_promote_function_mode (mode);
> +
> +  /* REGNO of source and destination should be same if not
> +  promoted.  */
> +  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
> +return false;
> +
> +  return true;
> +}
> +  
> 
> 
>>>
>>> As said, please also rephrase the above (and everything else if it 
>>> obviously looks akin the above).
> 
> thanks


Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet:

On 24/10/23 12:02 am, Vineet Gupta wrote:
> 
> 
> On 10/22/23 23:46, Ajit Agarwal wrote:
>> Hello All:
>>
>> Addressed below review comments in the version 11 of the patch.
>> Please review and please let me know if its ok for trunk.
>>
>> Thanks & Regards
>> Ajit
> 
> Again you are not paying attention to prior comments about fixing your 
> submission practice and like some of the prior reviewers I'm starting to get 
> tired, despite potentially good technical content.
> 

Sorry for the inconvenience caused. I will make sure all the comments from 
reviewers
are addressed.

> 1. The commentary above is NOT part of changelog. Either use a separate cover 
> letter or add patch version change history between two "---" lines just 
> before the start of code diff. And keep accumulating those as you post new 
> versions. See [1]. This is so reviewers know what changed over 10 months and 
> automatically gets dropped when patch is eventually applied/merged into tree.
>

Sure I will do that.
 
> 2. Acknowledge (even if it is yes) each and every comment of the reviewers 
> explicitly inline below. That ensures you don't miss addressing a change 
> since this forces one to think about each of them.
> 

Surely I will acknowledge each and every comment inline.

> I do have some technical comments which I'll follow up with later.

I look forward to it.

> Just a short summary that v10 indeed bootstraps risc-v but I don't see any 
> improvements at all - as in, whenever the abi interfaces code identifies an 
> extension (save a missing definition), it is not able to eliminate any 
> extensions despite the patch.
>

Thanks for the summary and the check. 

Thanks & Regards
Ajit
 
> -Vineet
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632180.html
> 
>>
>> On 22/10/23 12:56 am, rep.dot@gmail.com wrote:
>>> On 21 October 2023 01:56:16 CEST, Vineet Gupta  wrote:
 On 10/19/23 23:50, Ajit Agarwal wrote:
> Hello All:
>
> This version 9 of the patch uses abi interfaces to remove zero and sign 
> extension elimination.
> Bootstrapped and regtested on powerpc-linux-gnu.
>
> In this version (version 9) of the patch following review comments are 
> incorporated.
>
> a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
> b) Source and destination with different registers are considered.
> c) Further enhancements.
> d) Added sign extension elimination using abi interfaces.
 As has been the trend in the past, I don't think all the review comments have 
 been addressed.
>>> And apart from that, may I ask if this is just me, or does anybody else 
>>> think that it might be worthwhile to actually read a patch before 
>>> (re-)posting?
>>>
>>> Seeing e.g. the proposed abi_extension_candidate_p as written in a first 
>>> POC would deserve some manual CSE, if nothing more then for clarity and 
>>> conciseness?
>>>
>>> Just curious from a meta perspective..
>>>
>>> And:
>>>
> ree: Improve ree pass for rs6000 target using defined abi interfaces
>>> mentioning powerpc like this, and then changing generic code could be 
>>> interpreted as misleading, IMHO.
>>>
> For rs6000 target we see redundant zero and sign extension and done
> to improve ree pass to eliminate such redundant zero and sign extension
> using defined ABI interfaces.
>>> Mentioning powerpc in the body as one of the affected target(s) is of 
>>> course fine.
>>>
>>>
>    +/* Return TRUE if target mode is equal to source mode of zero_extend
> +   or sign_extend otherwise false.  */
>>> , false otherwise.
>>>
>>> But I'm not a native speaker
>>>
>>>
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   a return registers.  */
> +
> +static bool
> +abi_extension_candidate_return_reg_p (/*rtx_insn *insn, */int regno)
>>> Leftover debug comment.
>>>
> +{
> +  if (targetm.calls.function_value_regno_p (regno))
> +    return true;
> +
> +  return false;
> +}
> +
>>> As said, I don't see why the below was not cleaned up before the V1 
>>> submission.
>>> Iff it breaks when manually CSEing, I'm curious why?
>>>
> +/* Return TRUE if reg source operand of zero_extend is argument registers
> +   and not return registers and source and destination operand are same
> +   and mode of source and destination operand are not same.  */
> +
> +static bool
> +abi_extension_candidate_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
> +  rtx orig_src = XEXP (SET_SRC (set), 0);
> +
> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO (orig_src)))
>>> On top, debug leftover.
>>>
> +    return false;
> +
> +  /* Mode of destination and source should be different.  */
> +  if (dst_mode == GET_MODE (orig_sr

Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Iain Sandoe
Hi Marek,

> On 23 Oct 2023, at 20:25, Marek Polacek  wrote:
> 
> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
>> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>>> 
>>> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
 On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
>  wrote:
> 

> and I tried Darwin (104) and that fails with
> 
> *** Configuration aarch64-apple-darwin21.6.0 not supported
> 
> Is anyone else able to build gcc on those machines, or test the attached
> patch?

We’re still working on upstreaming the aarch64 Darwin port - the development
branch is here: https://github.com/iains/gcc-darwin-arm64 (but it will be
rebased soon because we just upstreamed some dependencies).

In the meantime, I will put your patch into my test queue - hopefully before
next week.

Iain



Re: Re: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

2023-10-24 Thread Li Xu
Committed to trunk. Thanks juzhe.


--

Li Xu

[PATCH] Ignore case of header line in dg-extract-results.py

2023-10-24 Thread Paul Iannetta
On Thu, Oct 19, 2023 at 10:48:17AM -0600, Jeff Law wrote:
> On 10/18/23 03:35, Thomas Schwinge wrote:
> > 
> > Is this (case variants) maybe something that has changed in DejaGnu at
> > some point in time?  (I have not checked.)
> No idea :-)
>
Yes, it changed around 2016.

> > I suggest that we adapt all remaining upper-case instances in GCC,
> > similar to your change.  And/or, as applicable, recognize both variants
> > (or ignore case distinctions generally)?
> Yea, we should try to get this commonized.  Probably wise to recognize both
> variants as well -- especially if there are instances of these strings which
> aren't under GCC's control.
> 
In retrospect, I also think that the regex should be case
insensitive; that will keep compatibility with older releases of
DejaGnu and incur fewer changes in GCC. (cf. attached patch)
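
The effect of the change is easy to demonstrate standalone; the sketch below mirrors the regex from dg-extract-results.py and checks that both header variants (pre- and post-2016 DejaGnu) are matched:

```python
import re

# The pattern from contrib/dg-extract-results.py, compiled case-insensitively.
test_run_re = re.compile(r'^Test run by (\S+) on (.*)$', re.IGNORECASE)

headers = [
    "Test Run By builder on Mon Oct 23 2023",  # DejaGnu before ~2016
    "Test run by builder on Mon Oct 23 2023",  # DejaGnu since ~2016
]

for header in headers:
    m = test_run_re.match(header)
    assert m is not None, header
    assert m.group(1) == "builder"
    assert m.group(2) == "Mon Oct 23 2023"
```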

> > Given Paul's (and colleagues'?) ongoing work on GCC (Kalray KVX back end,
> > complex numbers support), is it maybe now time to enable Git write access
> > for him (them?)?
> > 
> > , "write after approval".
> Sure.  I'd sponsor them.

Thanks. May I request an account on sourceware.org, mentioning you as
our sponsor?

Paul


>From ce418afa1d3098603e26e1fd2ee262a8ab72e5ab Mon Sep 17 00:00:00 2001
From: Paul Iannetta 
Date: Tue, 24 Oct 2023 09:48:42 +0200
Subject: [PATCH] dg-extract-results.py: Ignore case in header line

DejaGNU changed its header line from "Test Run By" to "Test run by"
around 2016.  This patch makes it so that both alternatives are
correctly detected.

contrib/ChangeLog:

2023-10-24  Paul Iannetta  

	* dg-extract-results.py: Make the test_run regex case
	  insensitive.
---
 contrib/dg-extract-results.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index 0bc65d30eaf..e92e8756ccc 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -113,7 +113,8 @@ class Prog:
 # Whether to create .sum rather than .log output.
 self.do_sum = True
 # Regexps used while parsing.
-self.test_run_re = re.compile (r'^Test run by (\S+) on (.*)$')
+self.test_run_re = re.compile (r'^Test run by (\S+) on (.*)$',
+   re.IGNORECASE)
 self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
 self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
  r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
-- 
2.35.1.500.gb896f729e2



RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread Jiang, Haochen
It seems that the mail got caught elsewhere and did not make it to the
gcc-patches mailing list. Resending it.

Thx,
Haochen

-Original Message-
From: Jiang, Haochen 
Sent: Tuesday, October 24, 2023 4:43 PM
To: HAO CHEN GUI ; Richard Sandiford 

Cc: gcc-patches 
Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces 
[PR111449]

Hi Haochen Gui,

It seems that the commit caused lots of test case failures on x86 platforms:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html

Please help verify whether we need some testcase changes or there is a bug here.

A simple reproducer under build folder is:

make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C 
--target_board='unix{-m64\ -march=cascadelake,-m32\ 
-march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -Original Message-
> From: HAO CHEN GUI 
> Sent: Monday, October 23, 2023 9:30 AM
> To: Richard Sandiford 
> Cc: gcc-patches 
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for 
> compare_by_pieces [PR111449]
> 
> Committed as r14-4835.
> 
> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
> 
> Thanks
> Gui Haochen
> 
> 在 2023/10/20 16:49, Richard Sandiford 写道:
> > HAO CHEN GUI  writes:
> >> Hi,
> >>   Vector mode instructions are efficient for compare on some targets.
> >> This patch enables vector mode for compare_by_pieces. Two helper 
> >> functions are added to check if vector mode is available for 
> >> certain by pieces operations and if optabs exist for the mode 
> >> and certain by pieces operations. One member is added in class 
> >> op_by_pieces_d to record the type of operations.
> >>
> >>   The test case is in the second patch which is rs6000 specific.
> >>
> >>   Compared to last version, the main change is to add a target hook 
> >> check - scalar_mode_supported_p when retrieving the available 
> >> scalar modes. The mode which is not supported for a target should be 
> >> skipped.
> >> (e.g. TImode on ppc). Also some function names and comments are 
> >> refined according to reviewer's advice.
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with 
> >> no regressions.
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> Expand: Enable vector mode for by pieces compares
> >>
> >> Vector mode compare instructions are efficient for equality compare 
> >> on rs6000. This patch refactors the codes of by pieces operation to 
> >> enable vector mode for compare.
> >>
> >> gcc/
> >>PR target/111449
> >>* expr.cc (can_use_qi_vectors): New function to return true if
> >>we know how to implement OP using vectors of bytes.
> >>(qi_vector_mode_supported_p): New function to check if optabs
> >>exists for the mode and certain by pieces operations.
> >>(widest_fixed_size_mode_for_size): Replace the second argument
> >>with the type of by pieces operations.  Call can_use_qi_vectors
> >>and qi_vector_mode_supported_p to do the check.  Call
> >>scalar_mode_supported_p to check if the scalar mode is supported.
> >>(by_pieces_ninsns): Pass the type of by pieces operation to
> >>widest_fixed_size_mode_for_size.
> >>(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
> >>record the type of by pieces operations.
> >>(op_by_pieces_d::op_by_pieces_d): Change last argument to the
> >>type of by pieces operations, initialize m_op with it.  Pass
> >>m_op to function widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::get_usable_mode): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
> >>can_use_qi_vectors and qi_vector_mode_supported_p to do the
> >>check.
> >>(op_by_pieces_d::run): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(move_by_pieces_d::move_by_pieces_d): Set m_op to
> MOVE_BY_PIECES.
> >>(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
> >>(can_store_by_pieces): Pass the type of by pieces operations to
> >>widest_fixed_size_mode_for_size.
> >>(clear_by_pieces): Initialize class store_by_pieces_d with
> >>CLEAR_BY_PIECES.
> >>(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
> >>COMPARE_BY_PIECES.
> >
> > OK, thanks.  And thanks for your patience.
> >
> > Richard
> >
> >> patch.diff
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc index 
> >> 2c9930ec674..ad5f9dd8ec2 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int
> max_pieces, unsigned int align)
> >>return align;
> >>  }
> >>
> >> -/* Return the widest 

OpenMP/Fortran: Group handling of 'if' clause without and with modifier (was: [committed] Partial OpenMP 4.5 fortran support)

2023-10-24 Thread Thomas Schwinge
Hi!

On 2016-11-10T12:41:59+0100, Jakub Jelinek  wrote:
> gcc/fortran/

>   * gfortran.h [...]

>   (struct gfc_omp_clauses): Add [...]
>   [...]
>   [...] and if_exprs fields.

Etc.

OK to push (after testing) the attached
"OpenMP/Fortran: Group handling of 'if' clause without and with modifier"?
That makes an upcoming change a bit lighter.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From a6e15fe6b08e2ced98435739506f9fc10db96a63 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 24 Oct 2023 10:43:40 +0200
Subject: [PATCH] OpenMP/Fortran: Group handling of 'if' clause without and
 with modifier

The 'if' clause with modifier was introduced in
commit b4c3a85be96585374bf95c981ba2f602667cf5b7 (Subversion r242037)
"Partial OpenMP 4.5 fortran support", but -- in some instances -- didn't place
it next to the existing handling of 'if' clause without modifier.  Unify that;
no change in behavior.

	gcc/fortran/
	* dump-parse-tree.cc (show_omp_clauses): Group handling of 'if'
	clause without and with modifier.
	* frontend-passes.cc (gfc_code_walker): Likewise.
	* gfortran.h (gfc_omp_clauses): Likewise.
	* openmp.cc (gfc_free_omp_clauses): Likewise.
---
 gcc/fortran/dump-parse-tree.cc | 42 +-
 gcc/fortran/frontend-passes.cc |  4 ++--
 gcc/fortran/gfortran.h |  2 +-
 gcc/fortran/openmp.cc  |  4 ++--
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 68122e3e6fd..cc4846e5d74 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1593,6 +1593,27 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   show_expr (omp_clauses->if_expr);
   fputc (')', dumpfile);
 }
+  for (i = 0; i < OMP_IF_LAST; i++)
+if (omp_clauses->if_exprs[i])
+  {
+	static const char *ifs[] = {
+	  "CANCEL",
+	  "PARALLEL",
+	  "SIMD",
+	  "TASK",
+	  "TASKLOOP",
+	  "TARGET",
+	  "TARGET DATA",
+	  "TARGET UPDATE",
+	  "TARGET ENTER DATA",
+	  "TARGET EXIT DATA"
+	};
+  fputs (" IF(", dumpfile);
+  fputs (ifs[i], dumpfile);
+  fputs (": ", dumpfile);
+  show_expr (omp_clauses->if_exprs[i]);
+  fputc (')', dumpfile);
+}
   if (omp_clauses->final_expr)
 {
   fputs (" FINAL(", dumpfile);
@@ -1999,27 +2020,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   show_expr (omp_clauses->detach);
   fputc (')', dumpfile);
 }
-  for (i = 0; i < OMP_IF_LAST; i++)
-if (omp_clauses->if_exprs[i])
-  {
-	static const char *ifs[] = {
-	  "CANCEL",
-	  "PARALLEL",
-	  "SIMD",
-	  "TASK",
-	  "TASKLOOP",
-	  "TARGET",
-	  "TARGET DATA",
-	  "TARGET UPDATE",
-	  "TARGET ENTER DATA",
-	  "TARGET EXIT DATA"
-	};
-  fputs (" IF(", dumpfile);
-  fputs (ifs[i], dumpfile);
-  fputs (": ", dumpfile);
-  show_expr (omp_clauses->if_exprs[i]);
-  fputc (')', dumpfile);
-}
   if (omp_clauses->destroy)
 fputs (" DESTROY", dumpfile);
   if (omp_clauses->depend_source)
diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 536884b13f0..0378e0dba06 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -5652,6 +5652,8 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 			OMP_LIST_MAP, OMP_LIST_TO, OMP_LIST_FROM };
 		  size_t idx;
 		  WALK_SUBEXPR (co->ext.omp_clauses->if_expr);
+		  for (idx = 0; idx < OMP_IF_LAST; idx++)
+		WALK_SUBEXPR (co->ext.omp_clauses->if_exprs[idx]);
 		  WALK_SUBEXPR (co->ext.omp_clauses->final_expr);
 		  WALK_SUBEXPR (co->ext.omp_clauses->num_threads);
 		  WALK_SUBEXPR (co->ext.omp_clauses->chunk_size);
@@ -5667,8 +5669,6 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		  WALK_SUBEXPR (co->ext.omp_clauses->num_tasks);
 		  WALK_SUBEXPR (co->ext.omp_clauses->priority);
 		  WALK_SUBEXPR (co->ext.omp_clauses->detach);
-		  for (idx = 0; idx < OMP_IF_LAST; idx++)
-		WALK_SUBEXPR (co->ext.omp_clauses->if_exprs[idx]);
 		  for (idx = 0; idx < ARRAY_SIZE (list_types); idx++)
 		for (n = co->ext.omp_clauses->lists[list_types[idx]];
 			 n; n = n->next)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 88f33b0957e..9c1a39a19de 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1545,6 +1545,7 @@ typedef struct gfc_omp_clauses
 {
   gfc_omp_namelist *lists[OMP_LIST_NUM];
   struct gfc_expr *if_expr;
+  struct gfc_expr *if_exprs[OMP_IF_LAST];
   struct gfc_expr *final_expr;
   struct gfc_expr *num_threads;
   struct gfc_expr *chunk_size;
@@ -1561,7 +1562,6 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *priority;
   struct gfc_expr *detach;
   struct gfc_expr *depobj;
-  struct

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread Jiang, Haochen
Hi Haochen Gui,

It seems that the commit caused lots of test case failures on x86 platforms:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html

Please help verify whether we need some testcase changes or there is a bug here.

A simple reproducer under build folder is:

make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C 
--target_board='unix{-m64\ -march=cascadelake,-m32\ 
-march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -Original Message-
> From: HAO CHEN GUI 
> Sent: Monday, October 23, 2023 9:30 AM
> To: Richard Sandiford 
> Cc: gcc-patches 
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> Committed as r14-4835.
> 
> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
> 
> Thanks
> Gui Haochen
> 
> 在 2023/10/20 16:49, Richard Sandiford 写道:
> > HAO CHEN GUI  writes:
> >> Hi,
> >>   Vector mode instructions are efficient for compare on some targets.
> >> This patch enables vector mode for compare_by_pieces. Two helper
> >> functions are added to check if vector mode is available for certain
> >> by pieces operations and if optabs exist for the mode and certain
> >> by pieces operations. One member is added in class op_by_pieces_d to
> >> record the type of operations.
> >>
> >>   The test case is in the second patch which is rs6000 specific.
> >>
> >>   Compared to last version, the main change is to add a target hook
> >> check - scalar_mode_supported_p when retrieving the available scalar
> >> modes. The mode which is not supported for a target should be skipped.
> >> (e.g. TImode on ppc). Also some function names and comments are refined
> >> according to reviewer's advice.
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> >> regressions.
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> Expand: Enable vector mode for by pieces compares
> >>
> >> Vector mode compare instructions are efficient for equality compare on
> >> rs6000. This patch refactors the codes of by pieces operation to enable
> >> vector mode for compare.
> >>
> >> gcc/
> >>PR target/111449
> >>* expr.cc (can_use_qi_vectors): New function to return true if
> >>we know how to implement OP using vectors of bytes.
> >>(qi_vector_mode_supported_p): New function to check if optabs
> >>exists for the mode and certain by pieces operations.
> >>(widest_fixed_size_mode_for_size): Replace the second argument
> >>with the type of by pieces operations.  Call can_use_qi_vectors
> >>and qi_vector_mode_supported_p to do the check.  Call
> >>scalar_mode_supported_p to check if the scalar mode is supported.
> >>(by_pieces_ninsns): Pass the type of by pieces operation to
> >>widest_fixed_size_mode_for_size.
> >>(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
> >>record the type of by pieces operations.
> >>(op_by_pieces_d::op_by_pieces_d): Change last argument to the
> >>type of by pieces operations, initialize m_op with it.  Pass
> >>m_op to function widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::get_usable_mode): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
> >>can_use_qi_vectors and qi_vector_mode_supported_p to do the
> >>check.
> >>(op_by_pieces_d::run): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(move_by_pieces_d::move_by_pieces_d): Set m_op to
> MOVE_BY_PIECES.
> >>(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
> >>(can_store_by_pieces): Pass the type of by pieces operations to
> >>widest_fixed_size_mode_for_size.
> >>(clear_by_pieces): Initialize class store_by_pieces_d with
> >>CLEAR_BY_PIECES.
> >>(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
> >>COMPARE_BY_PIECES.
> >
> > OK, thanks.  And thanks for your patience.
> >
> > Richard
> >
> >> patch.diff
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index 2c9930ec674..ad5f9dd8ec2 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int
> max_pieces, unsigned int align)
> >>return align;
> >>  }
> >>
> >> -/* Return the widest QI vector, if QI_MODE is true, or integer mode
> >> -   that is narrower than SIZE bytes.  */
> >> +/* Return true if we know how to implement OP using vectors of bytes.  */
> >> +static bool
> >> +can_use_qi_vectors (by_pieces_operation op)
> >> +{
> >> +  return (op == COMPARE_BY_PIECES
> >> +|| op == SET_BY_PIECES
> >> +|| op == CLEAR_BY_PIECES);
> >> +}
> >> +

Re: OpenMP/Fortran: Group handling of 'if' clause without and with modifier

2023-10-24 Thread Tobias Burnus

CC: fortran@ for completeness.

On 24.10.23 10:55, Thomas Schwinge wrote:

OK to push (after testing) the attached
"OpenMP/Fortran: Group handling of 'if' clause without and with modifier"?
That makes an upcoming change a bit lighter.


LGTM.

(The patch just moves some code up (in the same functions) such that
'if()' and 'if(:)' are next to each other.)

Tobias


 From a6e15fe6b08e2ced98435739506f9fc10db96a63 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Tue, 24 Oct 2023 10:43:40 +0200
Subject: [PATCH] OpenMP/Fortran: Group handling of 'if' clause without and
  with modifier

The 'if' clause with modifier was introduced in
commit b4c3a85be96585374bf95c981ba2f602667cf5b7 (Subversion r242037)
"Partial OpenMP 4.5 fortran support", but -- in some instances -- didn't place
it next to the existing handling of 'if' clause without modifier.  Unify that;
no change in behavior.

  gcc/fortran/
  * dump-parse-tree.cc (show_omp_clauses): Group handling of 'if'
  clause without and with modifier.
  * frontend-passes.cc (gfc_code_walker): Likewise.
  * gfortran.h (gfc_omp_clauses): Likewise.
  * openmp.cc (gfc_free_omp_clauses): Likewise.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[Patch] OpenMP/Fortran: Handle unlisted items in 'omp allocators' + exec. 'omp allocate'

2023-10-24 Thread Tobias Burnus

This patch assumes that EXEC_OMP_ALLOCATE/EXEC_OMP_ALLOCATORS is/will be later 
handled as currently
done in OG13, 
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-13/gcc/fortran/trans-openmp.cc

Depending on how we want to handle it in mainline, the patch could still make
sense - or parts should be modified; e.g. we might want to handle standard
Fortran allocation (malloc) and the OpenMP one (GOMP_alloc) directly in
trans-stmt.cc; if so, we might want to skip adding another allocate-stmt.
We probably still want to do the 'allocate' and diagnostic handling in
openmp.cc in all cases.

In any case, we surely need to handle on mainline:
* the dump-parse-tree.cc patch is surely needed and without removing
the empty entry (n->sym == NULL), it needs an additional fix in order not to 
crash.
* Rejecting coarrays in the empty-list case, which presumably makes most
  sense inside openmp.cc.

* * *

On mainline, an executable '!$omp allocate' / '!$omp allocators' stops
in trans-openmp.cc with a sorry, not yet implemented.
However, OG13 has some implementation for executable '!$omp allocate';
trying to merge mainline into OG13, I found the following issues:

* -fdump-parse-tree did not dump the clauses (a trivial oversight)
* The not-specified items needed better handling
  => now done during resolution in openmp.cc.

* * *

While -fdump-tree-original can be used to test it, the "sorry" makes
it hard to write a testsuite test. Some testcases exist like
gfortran.dg/gomp/allocate-5.f90, which contains code similar to the
last example, but it is only a parse + 'sorry'-shows-up testcase.
(Well, the two new 'error:' cases can be tested and are tested but
they are more boring.)

* * *

The spec states:

For
  !$omp allocators allocate(align(4):a,b)
  allocate(a,b,c,d)
only a and b are allocated with an OpenMP allocator (→
omp_get_default_allocator()) and an alignment of 4; 'c' and 'd' are
allocated in the normal Fortran way.

The deprecated form works as follows:
  !$omp allocate(a,b) align(4)
  !$omp allocate align(16)   ! note: no list argument after 'allocate'
  allocate(a,b,c,d)
where a and b will be allocated with an alignment of 4 and the rest,
here, c and d, with the settings of the directive without argument list,
i.e. c and d are allocated with an alignment of 16.

The question is what is supposed to happen for:
  !$omp allocate(a,b) align(4)
  allocate(a,b,c,d)
Should that use the default allocator for c and d, i.e. the same as
  !$omp allocate(a,b) align(4)
  !$omp allocate
  allocate(a,b,c,d)

Or should it use the normal Fortran allocator, following what 'allocators' does?

The spec does not really say (and that syntax is deprecated in 5.2, removed in
TR11/OpenMP 6). Thus, GCC now prints an error.  However, it would be trivial to
choose either of the other variants.

* * *

The attached patch now handles the not-specified items:
* In the last case, it adds them to the list; namelist->sym == NULL
  is the no-arguments case; that item is also removed, avoiding
  n->sym == NULL special cases later on.
* For the first two cases, a new Fortran ALLOCATE statement is created,
  containing the non-treated items.

Comments, suggestions, remarks?

Tobias
OpenMP: Handle unlisted items in 'omp allocators' + exec. 'omp allocate'

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_node): Show clauses for
	EXEC_OMP_ALLOCATE and EXEC_OMP_ALLOCATORS.
	* openmp.cc (resolve_omp_clauses): Process nonlisted items
	for EXEC_OMP_ALLOCATE and EXEC_OMP_ALLOCATORS.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-14.f90: Add new checks.
	* gfortran.dg/gomp/allocate-5.f90: Remove items from an allocate-stmt
	that are not explicitly/implicitly listed in 'omp allocate'.

 gcc/fortran/dump-parse-tree.cc |   2 +
 gcc/fortran/openmp.cc  | 112 -
 gcc/testsuite/gfortran.dg/gomp/allocate-14.f90 |  41 +
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90  |   4 +-
 4 files changed, 155 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 68122e3e6fd..1440524f971 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2241,6 +2241,8 @@ show_omp_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE:
 case EXEC_OACC_ENTER_DATA:
 case EXEC_OACC_EXIT_DATA:
+case EXEC_OMP_ALLOCATE:
+case EXEC_OMP_ALLOCATORS:
 case EXEC_OMP_ASSUME:
 case EXEC_OMP_CANCEL:
 case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..95e0aaafa58 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7924,10 +7924,14 @@ resolve_omp_clauses (

[PATCH V13] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard:

This version 13 of the patch uses abi interfaces to eliminate redundant zero
and sign extensions.
Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 13) of the patch, the following review comments are
incorporated.

a) Removal of hard-coded zero_extend and sign_extend in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
e) Addressed remaining review comments from Vineet.
f) Addressed review comments from Bernhard.
g) Fixed aarch64 regression failures.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For the rs6000 target we see zero and sign extensions with missing
definitions.  The ree pass is improved to eliminate such redundant
zero and sign extensions using defined ABI interfaces.

2023-10-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
using defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C: New test.
---
changes since v6:
  - Added missing abi interfaces.
  - Rearranging and restructuring the code.
  - Removal of hard coded zero extend and sign extend in abi interfaces.
  - Relaxed different registers with source and destination in abi interfaces.
  - Using CSE in abi interfaces.
  - Fix aarch64 regressions.
  - Add Sign extension removal in abi interfaces.
  - Modified comments as per coding convention.
  - Modified code as per coding convention.
  - Fix bug bootstrapping RISCV failures.
---
 gcc/ree.cc| 144 +-
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 151 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..72e3b625a18 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,117 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode, false otherwise.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+= targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  return tgt_mode == mode;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if all the following conditions are satisfied.
+
+  a) The reg source operand is an argument register and not a return register.
+  b) The modes of the source and destination operands differ.
+  c) If not promoted, the source and destination operands share a REGNO.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if the destination and source modes are the same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promotion does not occur and the source and
+ destination REGNOs differ.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+/* Return TRUE if the candidate insn has no defs and none of its uses
+   is in the RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx classes.  */
+
+static bool
+abi_handle_regs (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+   return false;
+
+  if (BLOCK_FOR_INSN (insn) != BLOCK_FOR_

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread HAO CHEN GUI
OK, I will take it.

Thanks
Gui Haochen

在 2023/10/24 16:49, Jiang, Haochen 写道:
> It seems that the mail got caught elsewhere and was not sent to the gcc-patches
> mailing list. Resending it.
> 
> Thx,
> Haochen
> 
> -Original Message-
> From: Jiang, Haochen 
> Sent: Tuesday, October 24, 2023 4:43 PM
> To: HAO CHEN GUI ; Richard Sandiford 
> 
> Cc: gcc-patches 
> Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces 
> [PR111449]
> 
> Hi Haochen Gui,
> 
> It seems that the commit caused lots of test case failures on x86 platforms:
> 
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
> https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html
> 
> Please help verify whether we need some testcase change or there is a bug here.
> 
> A simple reproducer under build folder is:
> 
> make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C 
> --target_board='unix{-m64\ -march=cascadelake,-m32\ 
> -march=cascadelake,-m32,-m64}'"
> 
> Thx,
> Haochen
> 
>> -Original Message-
>> From: HAO CHEN GUI 
>> Sent: Monday, October 23, 2023 9:30 AM
>> To: Richard Sandiford 
>> Cc: gcc-patches 
>> Subject: Re: [PATCH-1v4, expand] Enable vector mode for 
>> compare_by_pieces [PR111449]
>>
>> Committed as r14-4835.
>>
>> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
>>
>> Thanks
>> Gui Haochen
>>
>> 在 2023/10/20 16:49, Richard Sandiford 写道:
>>> HAO CHEN GUI  writes:
 Hi,
   Vector mode instructions are efficient for compare on some targets.
 This patch enables vector mode for compare_by_pieces. Two helper 
 functions are added to check if vector mode is available for 
 certain by pieces operations and if optabs exist for the mode 
 and certain by pieces operations. One member is added in class 
 op_by_pieces_d to record the type of operations.

   The test case is in the second patch which is rs6000 specific.

   Compared to last version, the main change is to add a target hook 
 check - scalar_mode_supported_p when retrieving the available 
 scalar modes. The mode which is not supported for a target should be 
 skipped.
 (e.g. TImode on ppc). Also some function names and comments are 
 refined according to reviewer's advice.

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with 
 no regressions.

 Thanks
 Gui Haochen

 ChangeLog
 Expand: Enable vector mode for by pieces compares

 Vector mode compare instructions are efficient for equality compare 
 on rs6000. This patch refactors the by pieces operation code to 
 enable vector mode for compare.

 gcc/
PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if
we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs
exists for the mode and certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations.  Call can_use_qi_vectors
and qi_vector_mode_supported_p to do the check.  Call
scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
record the type of by pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the
type of by pieces operations, initialize m_op with it.  Pass
m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
can_use_qi_vectors and qi_vector_mode_supported_p to do the
check.
(op_by_pieces_d::run): Pass m_op to function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to
>> MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.
>>>
>>> OK, thanks.  And thanks for your patience.
>>>
>>> Richard
>>>
 patch.diff
 diff --git a/gcc/expr.cc b/gcc/expr.cc index 
 2c9930ec674..ad5f9dd8ec2 100644
 --- a/gcc/expr.cc
 +++ b/gcc/expr.cc

Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Iain Sandoe
hi Marek,

> On 24 Oct 2023, at 08:44, Iain Sandoe  wrote:
> On 23 Oct 2023, at 20:25, Marek Polacek  wrote:
>> 
>> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
>>> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
 
 On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
>> On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
>>  wrote:
>> 
> 
>> and I tried Darwin (104) and that fails with
>> 
>> *** Configuration aarch64-apple-darwin21.6.0 not supported
>> 
>> Is anyone else able to build gcc on those machines, or test the attached
>> patch?
> 
> We’re still working on upstreaming the aarch64 Darwin port - the devt. branch
> is here; https://github.com/iains/gcc-darwin-arm64 (but it will be rebased 
> soon
> because we just upstreamed some dependencies).
> 
> In the meantime, I will put your patch into my test queue - hopefully before
> next week.

actually, I rebased already .. (but not pushed yet, pending testing).

aarch64-darwin21 bootstrapped fine with your patch (as did x86_64-darwin19)

===

$ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c -o 
hc -fhardened -Whardened
cc1: warning: ‘_FORTIFY_SOURCE’ is not enabled by ‘-fhardened’ because 
optimizations are turned off [-Whardened]

$ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c -o 
hc -fhardened -Whardened -O


I’m about to run the testsuite, but if there’s something else to be tested 
please let me know (NOTE: I have not read the patch, just applied it and built).

thanks,
Iain



Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Ajit Agarwal



On 24/10/23 1:10 pm, Ajit Agarwal wrote:
> Hello Vineet:
> 
> On 24/10/23 12:02 am, Vineet Gupta wrote:
>>
>>
>> On 10/22/23 23:46, Ajit Agarwal wrote:
>>> Hello All:
>>>
>>> Addressed below review comments in the version 11 of the patch.
>>> Please review and please let me know if its ok for trunk.
>>>
>>> Thanks & Regards
>>> Ajit
>>
>> Again you are not paying attention to prior comments about fixing your 
>> submission practice and like some of the prior reviewers I'm starting to get 
>> tired, despite potentially good technical content.
>>
> 
> Sorry for the inconvenience caused. I will make sure all the comments from 
> reviewers
> are addressed.
> 
>> 1. The commentary above is NOT part of changelog. Either use a separate 
>> cover letter or add patch version change history between two "---" lines 
>> just before the start of code diff. And keep accumulating those as you post 
>> new versions. See [1]. This is so reviewers know what changed over 10 months 
>> and automatically gets dropped when patch is eventually applied/merged into 
>> tree.
>>
> 
> Sure I will do that.

Made changes in version 13 of the patch with changes since v6.

Thanks & Regards
Ajit
>  
>> 2. Acknowledge (even if it is yes) each and every comment of the reviewers 
>> explicitly inline below. That ensures you don't miss addressing a change 
>> since this forces one to think about each of them.
>>
> 
> Surely I will acknowledge each and every comments inline.
> 
>> I do have some technical comments which I'll follow up with later.
> 
> I look forward to it.
> 
>> Just a short summary that v10 indeed bootstraps risc-v, but I don't see any 
>> improvements at all - as in, whenever the abi interfaces code identifies an 
>> extension (say, one missing a definition), it is not able to eliminate any 
>> extensions despite the patch.
>>
> 
> Thanks for the summary and the check. 
> 
> Thanks & Regards
> Ajit
>  
>> -Vineet
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632180.html
>>
>>>
>>> On 22/10/23 12:56 am, rep.dot@gmail.com wrote:
 On 21 October 2023 01:56:16 CEST, Vineet Gupta  
 wrote:
> On 10/19/23 23:50, Ajit Agarwal wrote:
>> Hello All:
>>
>> This version 9 of the patch uses abi interfaces to remove zero and sign 
>> extension elimination.
>> Bootstrapped and regtested on powerpc-linux-gnu.
>>
>> In this version (version 9) of the patch following review comments are 
>> incorporated.
>>
>> a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
>> b) Source and destination with different registers are considered.
>> c) Further enhancements.
>> d) Added sign extension elimination using abi interfaces.
> As has been trend in the past, I don't think all the review comments have 
> been addressed.
 And apart from that, may I ask if this is just me, or does anybody else 
 think that it might be worthwhile to actually read a patch before 
 (re-)posting?

 Seeing e.g. the proposed abi_extension_candidate_p as written in a first 
 POC would deserve some manual CSE, if nothing more then for clarity and 
 conciseness?

 Just curious from a meta perspective..

 And:

>> ree: Improve ree pass for rs6000 target using defined abi interfaces
 mentioning powerpc like this, and then changing generic code could be 
 interpreted as misleading, IMHO.

>> For rs6000 target we see redundant zero and sign extension and done
>> to improve ree pass to eliminate such redundant zero and sign extension
>> using defined ABI interfaces.
 Mentioning powerpc in the body as one of the affected target(s) is of 
 course fine.


>>    +/* Return TRUE if target mode is equal to source mode of zero_extend
>> +   or sign_extend otherwise false.  */
 , false otherwise.

 But I'm not a native speaker


>> +/* Return TRUE if the candidate insn is zero extend and regno is
>> +   a return registers.  */
>> +
>> +static bool
>> +abi_extension_candidate_return_reg_p (/*rtx_insn *insn, */int regno)
 Leftover debug comment.

>> +{
>> +  if (targetm.calls.function_value_regno_p (regno))
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
 As said, I don't see why the below was not cleaned up before the V1 
 submission.
 Iff it breaks when manually CSEing, I'm curious why?

>> +/* Return TRUE if reg source operand of zero_extend is argument 
>> registers
>> +   and not return registers and source and destination operand are same
>> +   and mode of source and destination operand are not same.  */
>> +
>> +static bool
>> +abi_extension_candidate_p (rtx_insn *insn)
>> +{
>> +  rtx set = single_set (insn);
>> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
>> +  rtx orig_src = XEXP (SET_SRC (set), 0);
>> +
>> +  if (!FUN

Re: PATCH v6 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-10-24 Thread Ajit Agarwal



On 19/09/23 1:57 am, Vineet Gupta wrote:
> Hi Ajit,
> 
> On 9/17/23 22:59, Ajit Agarwal wrote:
>> This new version of patch 6 use improve ree pass for rs6000 target using 
>> defined ABI interfaces.
>> Bootstrapped and regtested on power64-linux-gnu.
>>
>> Review comments incorporated.
>>
>> Thanks & Regards
>> Ajit
> 
> Nit: This seems to belong to "what changed in v6" between the two "---" lines 
> right before start of source diff.

Addressed in V13 of the patch.
> 
>> ree: Improve ree pass for rs6000 target using defined abi interfaces
>>
>> For rs6000 target we see redundant zero and sign extension and done to
>> improve ree pass to eliminate such redundant zero and sign extension
>> using defined ABI interfaces.
> 
> It seems you have redundant "redundant zero and sign extension" - pun 
> intended  ;-)
> 
> On a serious note, when debugging your code for a possible RISC-V benefit, it 
> seems what it is trying to do is address REE giving up due to "missing 
> definition(s)". Perhaps mentioning that in commitlog would give the reader 
> more context.

Addressed in V13 of the patch.
> 
>> +/* Return TRUE if target mode is equal to source mode of zero_extend
>> +   or sign_extend otherwise false.  */
>> +
>> +static bool
>> +abi_target_promote_function_mode (machine_mode mode)
>> +{
>> +  int unsignedp;
>> +  machine_mode tgt_mode =
>> +    targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
>> + NULL_TREE, 1);
>> +
>> +  if (tgt_mode == mode)
>> +    return true;
>> +  else
>> +    return false;
>> +}
>> +
>> +/* Return TRUE if the candidate insn is zero extend and regno is
>> +   an return  registers.  */
> 
> Additional Whitespace and grammer
> s/an return  registers/a return register
> 

Addressed in V12 of the patch.

> Please *run* contrib/check_gnu_style on your patch before sending out on 
> mailing lists, saves reviewers time and they can focus more on technical 
> content.
> 
>> +
>> +static bool
>> +abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
>> +{
>> +  rtx set = single_set (insn);
>> +
>> +  if (GET_CODE (SET_SRC (set)) != ZERO_EXTEND)
>> +    return false;
> 
> This still has ABI assumptions: RISC-V generates SIGN_EXTEND for function 
> args and return reg.
> This is not a deficiency of patch per-se, but something we would like to 
> address - even if as an addon-patch.
>

Already addressed in V13 of the patch.
 
>> +
>> +  if (FUNCTION_VALUE_REGNO_P (regno))
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return TRUE if reg source operand of zero_extend is argument registers
>> +   and not return registers and source and destination operand are same
>> +   and mode of source and destination operand are not same.  */
>> +
>> +static bool
>> +abi_extension_candidate_p (rtx_insn *insn)
>> +{
>> +  rtx set = single_set (insn);
>> +
>> +  if (GET_CODE (SET_SRC (set)) != ZERO_EXTEND)
>> +    return false;
> Ditto: ABI assumption.
> 

Already addressed in V12 of the patch.

>> +
>> +  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
> 
> why not simply @dst_mode
> 
>> +  rtx orig_src = XEXP (SET_SRC (set),0);
>> +
>> +  bool copy_needed
>> +    = (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
> 
> Maybe use @orig_src here, rather than duplicating XEXP (SET_SRC (set),0)
>

Already addressed.
 
>> +  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
> 
> The bailing out for copy_needed needs extra commentary, why ?
> 
>> +  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
>> +  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
>> +    return true;
>> +
>> +  return false;
> 
> Consider this bike-shed but I would arrange this code differently. The main 
> case here is the check for function args, and then the not-so-important reasons
> 
> +  rtx orig_src = XEXP (src, 0);
> +
> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +  || abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
> +    return false;
> +
> +  /* commentary as to why  */
> +  if (dst_mode == GET_MODE (orig_src))
> +    return false;
> 
> -   bool copy_needed
> -    = (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
> +  /* copy needed  . */
> +  if (REGNO (SET_DEST (set)) != REGNO (orig_src))
> +    return false;
> +
> + return true;
> 

Already addressed.


>> +/* Return TRUE if the candidate insn is zero extend and regno is
>> +   an argument registers.  */
>> +
>> +static bool
>> +abi_extension_candidate_argno_p (rtx_code code, int regno)
>> +{
>> +  if (code != ZERO_EXTEND)
>> +    return false;
> 
> ABI assumption still.
>

Already addressed.
 
>> +
>> +  if (FUNCTION_ARG_REGNO_P (regno))
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return TRUE if the candidate insn doesn't have defs and have
>> + * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
>> +
>> +static bool
>> +abi_handle_regs_without_defs_p (rtx_insn *insn)
>> +{
>> +  if (side_effects

Re: [PATCH] Fix PR ada/111813 (Inconsistent limit in Ada.Calendar.Formatting)

2023-10-24 Thread Arnaud Charlet
This change is OK, thank you.

> The description of the second Value function (returning Duration) (ARM 
> 9.6.1(87)) 
> doesn't place any limitation on the Elapsed_Time parameter's value, beyond 
> "Constraint_Error is raised if the string is not formatted as described for 
> Image, or 
> the function cannot interpret the given string as a Duration value".
> 
> It would seem reasonable that Value and Image should be consistent, in that 
> any 
> string produced by Image should be accepted by Value. Since Image must produce
> a two-digit representation of the Hours, there's an implication that its 
> Elapsed_Time parameter should be less than 100.0 hours (the ARM merely says
> that in that case the result is implementation-defined).
> 
> The current implementation of Value raises Constraint_Error if the 
> Elapsed_Time
> parameter is greater than or equal to 24 hours.
> 
> This patch removes the restriction, so that the Elapsed_Time parameter must 
> only
> be less than 100.0 hours.
> 
> gcc/ada/Changelog:
> 
>   2023-10-15 Simon Wright 
> 
>   PR ada/111813
> 
>   * gcc/ada/libgnat/a-calfor.adb (Value (2)): Allow values of parameter
>   Elapsed_Time greater than or equal to 24 hours, by doing the
>   hour calculations in Natural rather than Hour_Number (0 .. 23).
>   Calculate the result directly rather than by using Seconds_Of
>   (whose Hour parameter is of type Hour_Number).
> 
>   If an exception occurs of type Constraint_Error, re-raise it
>   rather than raising a new CE.
> 
> gcc/testsuite/Changelog:
> 
>   2023-10-15 Simon Wright 
> 
>   PR ada/111813
> 
>   * gcc/testsuite/gnat.dg/calendar_format_value.adb: New test.


[PATCH] aarch64: Avoid bogus atomics match

2023-10-24 Thread Richard Sandiford
The non-LSE pattern aarch64_atomic_exchange comes before the
LSE pattern aarch64_atomic_exchange_lse.  From a recog
perspective, the only difference between the patterns is that
the non-LSE one clobbers CC and needs a scratch.

However, combine and RTL-SSA can both add clobbers to make a
pattern match.  This means that if they try to rerecognise an
LSE pattern, they could end up turning it into a non-LSE pattern.
This patch adds a !TARGET_LSE test to avoid that.

This is needed to avoid a regression with later patches.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/atomics.md (aarch64_atomic_exchange): Require
!TARGET_LSE.
---
 gcc/config/aarch64/atomics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 2b6f04efa6c..055a87320ca 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -224,7 +224,7 @@ (define_insn_and_split "aarch64_atomic_exchange"
   UNSPECV_ATOMIC_EXCHG))
(clobber (reg:CC CC_REGNUM))
(clobber (match_scratch:SI 4 "=&r"))]
-  ""
+  "!TARGET_LSE"
   "#"
   "&& epilogue_completed"
   [(const_int 0)]
-- 
2.25.1



[pushed] aarch64: Define TARGET_INSN_COST

2023-10-24 Thread Richard Sandiford
This patch adds a bare-bones TARGET_INSN_COST.  See the comment
in the patch for the rationale.

This change is needed to avoid a regression with a later change.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): New function.
(TARGET_INSN_COST): Define.
---
 gcc/config/aarch64/aarch64.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a28b66acf6a..4cbfa42cb3c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15541,6 +15541,28 @@ aarch64_memory_move_cost (machine_mode mode, 
reg_class_t rclass_i, bool in)
  : aarch64_tune_params.memmov_cost.store_int);
 }
 
+/* Implement TARGET_INSN_COST.  We have the opportunity to do something
+   much more productive here, such as using insn attributes to cost things.
+   But we don't, not yet.
+
+   The main point of this current definition is to make calling insn_cost
+   on one instruction equivalent to calling seq_cost on a sequence that
+   contains only that instruction.  The default definition would instead
+   only look at SET_SRCs, ignoring SET_DESTs.
+
+   This ensures that, for example, storing a 128-bit zero vector is more
+   expensive than storing a 128-bit vector register.  A move of zero
+   into a 128-bit vector register followed by multiple stores of that
+   register is then cheaper than multiple stores of zero (which would
+   use STP of XZR).  This in turn allows STPs to be formed.  */
+static int
+aarch64_insn_cost (rtx_insn *insn, bool speed)
+{
+  if (rtx set = single_set (insn))
+return set_rtx_cost (set, speed);
+  return pattern_cost (PATTERN (insn), speed);
+}
+
 /* Implement TARGET_INIT_BUILTINS.  */
 static void
 aarch64_init_builtins ()
@@ -28399,6 +28421,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS aarch64_rtx_costs_wrapper
 
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST aarch64_insn_cost
+
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P aarch64_scalar_mode_supported_p
 
-- 
2.25.1



[pushed] i386: Fix unprotected REGNO in aeswidekl_operation

2023-10-24 Thread Richard Sandiford
I hit an ICE in aeswidekl_operation while testing the late-combine
pass on x86.  The predicate tested REGNO without first testing REG_P.

Tested on x86_64-linux-gnu & pushed as obvious.

Richard


gcc/
* config/i386/predicates.md (aeswidekl_operation): Protect
REGNO check with REG_P.
---
 gcc/config/i386/predicates.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index ef49efdbde5..e3d55f0c502 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -2260,6 +2260,7 @@ (define_predicate "aeswidekl_operation"
  || GET_CODE (SET_SRC (elt)) != UNSPEC_VOLATILE
  || GET_MODE (SET_SRC (elt)) != V2DImode
  || XVECLEN (SET_SRC (elt), 0) != 1
+ || !REG_P (XVECEXP (SET_SRC (elt), 0, 0))
  || REGNO (XVECEXP (SET_SRC (elt), 0, 0)) != GET_SSE_REGNO (i))
return false;
 }
-- 
2.25.1



[PATCH] i386: Avoid paradoxical subreg dests in vector zero_extend

2023-10-24 Thread Richard Sandiford
For the V2HI -> V2SI zero extension in:

  typedef unsigned short v2hi __attribute__((vector_size(4)));
  typedef unsigned int v2si __attribute__((vector_size(8)));
  v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }

ix86_expand_sse_extend would generate:

   (set (reg:V2HI 102)
(const_vector:V2HI [(const_int 0 [0])
(const_int 0 [0])]))
   (set (subreg:V8HI (reg:V2HI 101) 0)
(vec_select:V8HI
  (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
(subreg:V8HI (reg:V2HI 102) 0))
  (parallel [(const_int 0 [0])
 (const_int 8 [0x8])
 (const_int 1 [0x1])
 (const_int 9 [0x9])
 (const_int 2 [0x2])
 (const_int 10 [0xa])
 (const_int 3 [0x3])
 (const_int 11 [0xb])])))
  (set (reg:V2SI 100)
   (subreg:V2SI (reg:V2HI 101) 0))
(expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))

But using (subreg:V2SI (reg:V2HI 101) 0) as the destination of
the vec_select means that only the low 4 bytes of the destination
are stored.  Only the lower half of reg 100 is well-defined.

Things tend to happen to work if the register allocator ties reg 101
to reg 100.  But it caused problems with the upcoming late-combine pass
because we propagated the set of reg 100 into its uses.

Tested on x86_64-linux-gnu.  OK to install?

Richard


gcc/
* config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
destination to be wider than the sources.  Take the mode from the
first source.
(ix86_expand_sse_extend): Pass the destination directly to
ix86_split_mmx_punpck, rather than using a fresh register that
is half the size.
---
 gcc/config/i386/i386-expand.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1eae9d7c78c..2361ff77af3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1110,7 +1110,9 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
-/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  This is also used
+   for a full unpack of OPERANDS[1] and OPERANDS[2] into a wider
+   OPERANDS[0].  */
 
 void
 ix86_split_mmx_punpck (rtx operands[], bool high_p)
@@ -1118,7 +1120,7 @@ ix86_split_mmx_punpck (rtx operands[], bool high_p)
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  machine_mode mode = GET_MODE (op0);
+  machine_mode mode = GET_MODE (op1);
   rtx mask;
   /* The corresponding SSE mode.  */
   machine_mode sse_mode, double_sse_mode;
@@ -5660,7 +5662,7 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
   gcc_unreachable ();
 }
 
-  ops[0] = gen_reg_rtx (imode);
+  ops[0] = dest;
 
   ops[1] = force_reg (imode, src);
 
@@ -5671,7 +5673,6 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
  ops[1], pc_rtx, pc_rtx);
 
   ix86_split_mmx_punpck (ops, false);
-  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode));
 }
 
 /* Unpack SRC into the next wider integer vector type.  UNSIGNED_P is
-- 
2.25.1



[PATCH] i386: Fix undefined masks in vpopcnt tests

2023-10-24 Thread Richard Sandiford
The files changed in this patch had tests for masked and unmasked
popcnt.  However, the mask inputs to the masked forms were undefined,
and would be set to zero by init_regs.  Any combine-like pass that
ran after init_regs could then fold the masked forms into the
unmasked ones.  I saw this while testing the late-combine pass
on x86.

Tested on x86_64-linux-gnu.  OK to install?  (I didn't think this
counted as obvious because there are other ways of initialising
the mask.)

Richard


gcc/testsuite/
* gcc.target/i386/avx512bitalg-vpopcntb.c: Use an asm to define
the mask.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Likewise.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Likewise.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c| 1 +
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c| 1 +
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c | 1 +
 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c | 1 +
 6 files changed, 6 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
index 44b82c0519d..c52088161a0 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
@@ -11,6 +11,7 @@ extern __m512i z, z1;
 int foo ()
 {
   __mmask16 msk;
+  asm volatile ("" : "=k" (msk));
   __m512i c = _mm512_popcnt_epi8 (z);
   asm volatile ("" : "+v" (c));
   c = _mm512_mask_popcnt_epi8 (z1, msk, z);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
index 8c2dfaba9c6..7d11c6c4623 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
@@ -16,6 +16,7 @@ int foo ()
 {
   __mmask32 msk32;
   __mmask16 msk16;
+  asm volatile ("" : "=k" (msk16), "=k" (msk32));
   __m256i c256 = _mm256_popcnt_epi8 (y);
   asm volatile ("" : "+v" (c256));
   c256 = _mm256_mask_popcnt_epi8 (y_1, msk32, y);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
index 2ef8589f6c1..bc470415e9b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
@@ -11,6 +11,7 @@ extern __m512i z, z1;
 int foo ()
 {
   __mmask16 msk;
+  asm volatile ("" : "=k" (msk));
   __m512i c = _mm512_popcnt_epi16 (z);
   asm volatile ("" : "+v" (c));
   c = _mm512_mask_popcnt_epi16 (z1, msk, z);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
index c976461b12e..3a6af3ed8a1 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
@@ -16,6 +16,7 @@ int foo ()
 {
   __mmask16 msk16;
   __mmask8 msk8;
+  asm volatile ("" : "=k" (msk16), "=k" (msk8));
   __m256i c256 = _mm256_popcnt_epi16 (y);
   asm volatile ("" : "+v" (c256));
   c256 = _mm256_mask_popcnt_epi16 (y_1, msk16, y);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
index b4d82f97032..0a54ae83055 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
@@ -20,6 +20,7 @@ int foo ()
 {
   __mmask16 msk;
   __mmask8 msk8;
+  asm volatile ("" : "=k" (msk), "=k" (msk8));
   __m128i a = _mm_popcnt_epi32 (x);
   asm volatile ("" : "+v" (a));
   a = _mm_mask_popcnt_epi32 (x_1, msk8, x);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
index e87d6c999b6..c11e6e00998 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
@@ -19,6 +19,7 @@ extern __m512i z, z_1;
 int foo ()
 {
   __mmask8 msk; 
+  asm volatile ("" : "=k" (msk));
   __m128i a = _mm_popcnt_epi64 (x);
   asm volatile ("" : "+v" (a));
   a = _mm_mask_popcnt_epi64 (x_1, msk, x);
-- 
2.25.1



[PATCH] recog/reload: Remove old UNARY_P operand support

2023-10-24 Thread Richard Sandiford
reload and constrain_operands had some old code to look through unary
operators.  E.g. an operand could be (sign_extend (reg X)), and the
constraints would match the reg rather than the sign_extend.

This was previously used by the MIPS port.  But relying on it was a
recurring source of problems, so Eric and I removed it in the MIPS
rewrite from ~20 years back.  I don't know of any other port that used it.

Also, the constraints processing in LRA and IRA do not have direct
support for these embedded operators, so I think it was only ever a
reload-specific feature (and probably only a global/local+reload-specific
feature, rather than IRA+reload).

Keeping the checks caused problems for special memory constraints,
leading to:

  /* A unary operator may be accepted by the predicate, but it
 is irrelevant for matching constraints.  */
  /* For special_memory_operand, there could be a memory operand inside,
 and it would cause a mismatch for constraint_satisfied_p.  */
  if (UNARY_P (op) && op == extract_mem_from_operand (op))
op = XEXP (op, 0);

But inline asms are another source of problems.  Asms don't have
predicates, and so we can't use recog to decide whether a given change
to an asm gives a valid match.  We instead rely on constrain_operands as
something of a recog stand-in.  For an example like:

void
foo (int *ptr)
{
  asm volatile ("%0" :: "r" (-*ptr));
}

any attempt to propagate the negation into the asm would be allowed,
because it's the negated register that would be checked against the
"r" constraint.  This would later lead to:

error: invalid 'asm': invalid operand

The same thing happened in gcc.target/aarch64/vneg_s.c with the
upcoming late-combine pass.

Rather than add more workarounds, it seemed better just to delete
this code.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Remove UNARY_P handling.
* reload.cc (find_reloads): Likewise.
---
 gcc/recog.cc  | 15 ---
 gcc/reload.cc |  6 --
 2 files changed, 21 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 92f151248a6..e12b4c9500e 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -3080,13 +3080,6 @@ constrain_operands (int strict, alternative_mask alternatives)
 
  earlyclobber[opno] = 0;
 
- /* A unary operator may be accepted by the predicate, but it
-is irrelevant for matching constraints.  */
- /* For special_memory_operand, there could be a memory operand inside,
-and it would cause a mismatch for constraint_satisfied_p.  */
- if (UNARY_P (op) && op == extract_mem_from_operand (op))
-   op = XEXP (op, 0);
-
  if (GET_CODE (op) == SUBREG)
{
  if (REG_P (SUBREG_REG (op))
@@ -3152,14 +3145,6 @@ constrain_operands (int strict, alternative_mask alternatives)
{
  rtx op1 = recog_data.operand[match];
  rtx op2 = recog_data.operand[opno];
-
- /* A unary operator may be accepted by the predicate,
-but it is irrelevant for matching constraints.  */
- if (UNARY_P (op1))
-   op1 = XEXP (op1, 0);
- if (UNARY_P (op2))
-   op2 = XEXP (op2, 0);
-
  val = operands_match_p (op1, op2);
}
 
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 2e57ebb3cac..07256b6cf2f 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -3077,12 +3077,6 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
  enum constraint_num cn;
  enum reg_class cl;
 
- /* If the predicate accepts a unary operator, it means that
-we need to reload the operand, but do not do this for
-match_operator and friends.  */
- if (UNARY_P (operand) && *p != 0)
-   operand = XEXP (operand, 0);
-
  /* If the operand is a SUBREG, extract
 the REG or MEM (or maybe even a constant) within.
 (Constants can occur as a result of reg_equiv_constant.)  */
-- 
2.25.1



[PATCH] recog: Fix propagation into ASM_OPERANDS

2023-10-24 Thread Richard Sandiford
An inline asm with multiple output operands is represented as a
parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
insn_propagation didn't account for this, and instead propagated
into each ASM_OPERANDS individually.  This meant that it could
apply a substitution X->Y to Y itself, which (a) could create
circularity and (b) would be semantically wrong in any case,
since Y might use a different value of X.

This patch checks explicitly for parallels involving ASM_OPERANDS,
just like combine does.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (insn_propagation::apply_to_pattern_1): Handle shared
ASM_OPERANDS.
---
 gcc/recog.cc | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index e12b4c9500e..3bd2d73c259 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1339,13 +1339,26 @@ insn_propagation::apply_to_pattern_1 (rtx *loc)
  && apply_to_pattern_1 (&COND_EXEC_CODE (body)));
 
 case PARALLEL:
-  {
-   int last = XVECLEN (body, 0) - 1;
-   for (int i = 0; i < last; ++i)
- if (!apply_to_pattern_1 (&XVECEXP (body, 0, i)))
-   return false;
-   return apply_to_pattern_1 (&XVECEXP (body, 0, last));
-  }
+  for (int i = 0; i < XVECLEN (body, 0); ++i)
+   {
+ rtx *subloc = &XVECEXP (body, 0, i);
+ if (GET_CODE (*subloc) == SET)
+   {
+ if (!apply_to_lvalue_1 (SET_DEST (*subloc)))
+   return false;
+ /* ASM_OPERANDS are shared between SETs in the same PARALLEL.
+Only process them on the first iteration.  */
+ if ((i == 0 || GET_CODE (SET_SRC (*subloc)) != ASM_OPERANDS)
+ && !apply_to_rvalue_1 (&SET_SRC (*subloc)))
+   return false;
+   }
+ else
+   {
+ if (!apply_to_pattern_1 (subloc))
+   return false;
+   }
+   }
+  return true;
 
 case ASM_OPERANDS:
   for (int i = 0, len = ASM_OPERANDS_INPUT_LENGTH (body); i < len; ++i)
-- 
2.25.1



Re: [PATCH] i386: Avoid paradoxical subreg dests in vector zero_extend

2023-10-24 Thread Uros Bizjak
On Tue, Oct 24, 2023 at 12:08 PM Richard Sandiford
 wrote:
>
> For the V2HI -> V2SI zero extension in:
>
>   typedef unsigned short v2hi __attribute__((vector_size(4)));
>   typedef unsigned int v2si __attribute__((vector_size(8)));
>   v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }
>
> ix86_expand_sse_extend would generate:
>
>(set (reg:V2HI 102)
> (const_vector:V2HI [(const_int 0 [0])
> (const_int 0 [0])]))
>(set (subreg:V8HI (reg:V2HI 101) 0)
> (vec_select:V8HI
>   (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
> (subreg:V8HI (reg:V2HI 102) 0))
>   (parallel [(const_int 0 [0])
>  (const_int 8 [0x8])
>  (const_int 1 [0x1])
>  (const_int 9 [0x9])
>  (const_int 2 [0x2])
>  (const_int 10 [0xa])
>  (const_int 3 [0x3])
>  (const_int 11 [0xb])])))
>   (set (reg:V2SI 100)
>(subreg:V2SI (reg:V2HI 101) 0))
> (expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))
>
> But using (subreg:V2SI (reg:V2HI 101) 0) as the destination of
> the vec_select means that only the low 4 bytes of the destination
> are stored.  Only the lower half of reg 100 is well-defined.
>
> Things tend to happen to work if the register allocator ties reg 101
> to reg 100.  But it caused problems with the upcoming late-combine pass
> because we propagated the set of reg 100 into its uses.
>
> Tested on x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
> destination to be wider than the sources.  Take the mode from the
> first source.
> (ix86_expand_sse_extend): Pass the destination directly to
> ix86_split_mmx_punpck, rather than using a fresh register that
> is half the size.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.cc | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 1eae9d7c78c..2361ff77af3 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -1110,7 +1110,9 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
>ix86_move_vector_high_sse_to_mmx (op0);
>  }
>
> -/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
> +/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  This is also used
> +   for a full unpack of OPERANDS[1] and OPERANDS[2] into a wider
> +   OPERANDS[0].  */
>
>  void
>  ix86_split_mmx_punpck (rtx operands[], bool high_p)
> @@ -1118,7 +1120,7 @@ ix86_split_mmx_punpck (rtx operands[], bool high_p)
>rtx op0 = operands[0];
>rtx op1 = operands[1];
>rtx op2 = operands[2];
> -  machine_mode mode = GET_MODE (op0);
> +  machine_mode mode = GET_MODE (op1);
>rtx mask;
>/* The corresponding SSE mode.  */
>machine_mode sse_mode, double_sse_mode;
> @@ -5660,7 +5662,7 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
>gcc_unreachable ();
>  }
>
> -  ops[0] = gen_reg_rtx (imode);
> +  ops[0] = dest;
>
>ops[1] = force_reg (imode, src);
>
> @@ -5671,7 +5673,6 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
>   ops[1], pc_rtx, pc_rtx);
>
>ix86_split_mmx_punpck (ops, false);
> -  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode));
>  }
>
>  /* Unpack SRC into the next wider integer vector type.  UNSIGNED_P is
> --
> 2.25.1
>


[PING] [PATCH 1/3] [GCC] arm: vld1q_types_x2 ACLE intrinsics

2023-10-24 Thread Ezra Sitorus
Ping


From: ezra.sito...@arm.com 
Sent: Friday, October 6, 2023 10:49 AM
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw; Kyrylo Tkachov
Subject: [PATCH 1/3] [GCC] arm: vld1q_types_x2 ACLE intrinsics

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vld1q intrinsic for arm32.
This patch adds the _x2 variants of the vld1q intrinsic.  Tests use the xN
naming so that the later variants (_x3, _x4) can be added.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x2, vld1q_u16_x2, vld1q_u32_x2, vld1q_u64_x2): New.
(vld1q_s8_x2, vld1q_s16_x2, vld1q_s32_x2, vld1q_s64_x2): New.
(vld1q_f16_x2, vld1q_f32_x2): New.
(vld1q_p8_x2, vld1q_p16_x2, vld1q_p64_x2): New.
(vld1q_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vld1_x2): New entries.
* config/arm/neon.md (vld1_x2): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new test.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  67 +
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |  13 ++
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |  14 ++
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |  14 ++
 7 files changed, 247 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index cdfdb44259a..3eb41c6bdc8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10403,6 +10403,15 @@ vld1q_p64 (const poly64_t * __a)
   return (poly64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }

+__extension__ extern __inline poly64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x2 (const poly64_t * __a)
+{
+  union { poly64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10432,6 +10441,42 @@ vld1q_s64 (const int64_t * __a)
   return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }

+__extension__ extern __inline int8x16x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x2 (const int8_t * __a)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x2 (const int16_t * __a)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x2 (const int32_t * __a)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x2 (const int64_t * __a)
+{
+  union { int64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10448,6 +10493,26 @@ vld1q_f32 (const float32_t * __a)
   return (float32x4_t)__builtin_neon_vld1v4sf ((const __builtin_neon_sf *) __a);
 }

+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x2 (const float16_t * __a)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inli

[PATCH 0/4] rtl-ssa: Some small, obvious fixes

2023-10-24 Thread Richard Sandiford
This series contains some small fixes to RTL-SSA.  Tested on
aarch64-linux-gnu & x86_64-linux-gnu, pushed as obvious.

Richard Sandiford (4):
  rtl-ssa: Fix null deref in first_any_insn_use
  rtl-ssa: Fix handling of deleted insns
  rtl-ssa: Don't insert after insns that can throw
  rtl-ssa: Avoid creating duplicated phis

 gcc/rtl-ssa.h  | 1 +
 gcc/rtl-ssa/blocks.cc  | 5 +
 gcc/rtl-ssa/changes.cc | 5 -
 gcc/rtl-ssa/member-fns.inl | 2 +-
 gcc/rtl-ssa/movement.h | 3 ++-
 5 files changed, 13 insertions(+), 3 deletions(-)

-- 
2.25.1



[PATCH 1/4] rtl-ssa: Fix null deref in first_any_insn_use

2023-10-24 Thread Richard Sandiford
first_any_insn_use implicitly (but contrary to its documentation)
assumed that there was at least one use.

gcc/
* rtl-ssa/member-fns.inl (first_any_insn_use): Handle null
m_first_use.
---
 gcc/rtl-ssa/member-fns.inl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/member-fns.inl b/gcc/rtl-ssa/member-fns.inl
index c127fab8b98..3fdca14e0ef 100644
--- a/gcc/rtl-ssa/member-fns.inl
+++ b/gcc/rtl-ssa/member-fns.inl
@@ -215,7 +215,7 @@ set_info::last_nondebug_insn_use () const
 inline use_info *
 set_info::first_any_insn_use () const
 {
-  if (m_first_use->is_in_any_insn ())
+  if (m_first_use && m_first_use->is_in_any_insn ())
 return m_first_use;
   return nullptr;
 }
-- 
2.25.1



[PATCH 4/4] rtl-ssa: Avoid creating duplicated phis

2023-10-24 Thread Richard Sandiford
If make_uses_available was called twice for the same use,
we could end up trying to create duplicate definitions for
the same extended live range.

gcc/
* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Check
whether the requested phi already exists.
---
 gcc/rtl-ssa/blocks.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index d46cbf1e388..ecce7a68c59 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -525,6 +525,11 @@ function_info::create_phi (ebb_info *ebb, resource_info resource,
 phi_info *
 function_info::create_degenerate_phi (ebb_info *ebb, set_info *def)
 {
+  // Allow the function to be called twice in succession for the same def.
+  def_lookup dl = find_def (def->resource (), ebb->phi_insn ());
+  if (set_info *set = dl.matching_set ())
+    return as_a<phi_info *> (set);
+
   access_info *input = def;
   phi_info *phi = create_phi (ebb, def->resource (), &input, 1);
   if (def->is_reg ())
-- 
2.25.1



[PATCH 3/4] rtl-ssa: Don't insert after insns that can throw

2023-10-24 Thread Richard Sandiford
rtl_ssa::can_insert_after didn't handle insns that can throw.
Fixing that avoids a regression with a later patch.

gcc/
* rtl-ssa.h: Include cfgbuild.h.
* rtl-ssa/movement.h (can_insert_after): Replace is_jump with the
more comprehensive control_flow_insn_p.
---
 gcc/rtl-ssa.h  | 1 +
 gcc/rtl-ssa/movement.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa.h b/gcc/rtl-ssa.h
index 7355c6c4463..3a3c8b50ee2 100644
--- a/gcc/rtl-ssa.h
+++ b/gcc/rtl-ssa.h
@@ -49,6 +49,7 @@
 #include "obstack-utils.h"
 #include "mux-utils.h"
 #include "rtlanal.h"
+#include "cfgbuild.h"
 
 // Provides the global crtl->ssa.
 #include "memmodel.h"
diff --git a/gcc/rtl-ssa/movement.h b/gcc/rtl-ssa/movement.h
index d9945f49172..67370947dbd 100644
--- a/gcc/rtl-ssa/movement.h
+++ b/gcc/rtl-ssa/movement.h
@@ -61,7 +61,8 @@ move_earlier_than (insn_range_info range, insn_info *insn)
 inline bool
 can_insert_after (insn_info *insn)
 {
-  return insn->is_bb_head () || (insn->is_real () && !insn->is_jump ());
+  return (insn->is_bb_head ()
+	  || (insn->is_real () && !control_flow_insn_p (insn->rtl ())));
 }
 
 // Try to restrict move range MOVE_RANGE so that it is possible to
-- 
2.25.1



[PATCH 2/4] rtl-ssa: Fix handling of deleted insns

2023-10-24 Thread Richard Sandiford
RTL-SSA queues up some invasive changes for later.  But sometimes
the insns involved in those changes can be deleted by later
optimisations, making the queued change unnecessary.  This patch
checks for that case.

gcc/
* rtl-ssa/changes.cc (function_info::perform_pending_updates): Check
whether an insn has been replaced by a note.
---
 gcc/rtl-ssa/changes.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 73ab3ccfd24..de6222ae736 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -983,7 +983,10 @@ function_info::perform_pending_updates ()
   for (insn_info *insn : m_queued_insn_updates)
 {
   rtx_insn *rtl = insn->rtl ();
-  if (JUMP_P (rtl))
+  if (NOTE_P (rtl))
+   // The insn was later optimized away, typically to a NOTE_INSN_DELETED.
+   ;
+  else if (JUMP_P (rtl))
{
  if (INSN_CODE (rtl) == NOOP_MOVE_INSN_CODE)
{
-- 
2.25.1



Re: [PATCH 1/2] testsuite: Add and use thread_fence effective-target

2023-10-24 Thread Christophe Lyon
Ping?

On Mon, 2 Oct 2023 at 10:24, Christophe Lyon  wrote:

> ping?
>
> On Sun, 10 Sept 2023 at 21:31, Christophe Lyon 
> wrote:
>
>> Some targets like arm-eabi with newlib and default settings rely on
>> __sync_synchronize() to ensure synchronization.  Newlib does not
>> implement it by default, to make users aware they have to take special
>> care.
>>
>> This makes a few tests fail to link.
>>
>> This patch adds a new thread_fence effective target (similar to the
>> corresponding one in libstdc++ testsuite), and uses it in the tests
>> that need it, making them UNSUPPORTED instead of FAIL and UNRESOLVED.
>>
>> 2023-09-10  Christophe Lyon  
>>
>> gcc/
>> * doc/sourcebuild.texi (Other attributes): Document thread_fence
>> effective-target.
>>
>> gcc/testsuite/
>> * g++.dg/init/array54.C: Require thread_fence.
>> * gcc.dg/c2x-nullptr-1.c: Likewise.
>> * gcc.dg/pr103721-2.c: Likewise.
>> * lib/target-supports.exp (check_effective_target_thread_fence):
>> New.
>> ---
>>  gcc/doc/sourcebuild.texi  |  4 
>>  gcc/testsuite/g++.dg/init/array54.C   |  1 +
>>  gcc/testsuite/gcc.dg/c2x-nullptr-1.c  |  1 +
>>  gcc/testsuite/gcc.dg/pr103721-2.c |  1 +
>>  gcc/testsuite/lib/target-supports.exp | 12 
>>  5 files changed, 19 insertions(+)
>>
>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>> index 1a78b3c1abb..a5f61c29f3b 100644
>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -2860,6 +2860,10 @@ Compiler has been configured to support link-time
>> optimization (LTO).
>>  Compiler and linker support link-time optimization relocatable linking
>>  with @option{-r} and @option{-flto} options.
>>
>> +@item thread_fence
>> +Target implements @code{__atomic_thread_fence} without relying on
>> +non-implemented @code{__sync_synchronize()}.
>> +
>>  @item naked_functions
>>  Target supports the @code{naked} function attribute.
>>
>> diff --git a/gcc/testsuite/g++.dg/init/array54.C
>> b/gcc/testsuite/g++.dg/init/array54.C
>> index f6be350ba72..5241e451d6d 100644
>> --- a/gcc/testsuite/g++.dg/init/array54.C
>> +++ b/gcc/testsuite/g++.dg/init/array54.C
>> @@ -1,5 +1,6 @@
>>  // PR c++/90947
>>  // { dg-do run { target c++11 } }
>> +// { dg-require-effective-target thread_fence }
>>
>>  #include 
>>
>> diff --git a/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> b/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> index 4e440234d52..97a31c27409 100644
>> --- a/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> +++ b/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> @@ -1,5 +1,6 @@
>>  /* Test valid usage of C23 nullptr.  */
>>  /* { dg-do run } */
>> +// { dg-require-effective-target thread_fence }
>>  /* { dg-options "-std=c2x -pedantic-errors -Wall -Wextra
>> -Wno-unused-variable" } */
>>
>>  #include 
>> diff --git a/gcc/testsuite/gcc.dg/pr103721-2.c
>> b/gcc/testsuite/gcc.dg/pr103721-2.c
>> index aefa1f0f147..e059b1cfc2d 100644
>> --- a/gcc/testsuite/gcc.dg/pr103721-2.c
>> +++ b/gcc/testsuite/gcc.dg/pr103721-2.c
>> @@ -1,4 +1,5 @@
>>  // { dg-do run }
>> +// { dg-require-effective-target thread_fence }
>>  // { dg-options "-O2" }
>>
>>  extern void abort ();
>> diff --git a/gcc/testsuite/lib/target-supports.exp
>> b/gcc/testsuite/lib/target-supports.exp
>> index d353cc0aaf0..7ac9e7530cc 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -9107,6 +9107,18 @@ proc check_effective_target_sync_char_short { } {
>>  || [check_effective_target_mips_llsc] }}]
>>  }
>>
>> +# Return 1 if thread_fence does not rely on __sync_synchronize
>> +# library function
>> +
>> +proc check_effective_target_thread_fence {} {
>> +return [check_no_compiler_messages thread_fence executable {
>> +   int main () {
>> +   __atomic_thread_fence (__ATOMIC_SEQ_CST);
>> +   return 0;
>> +   }
>> +} ""]
>> +}
>> +
>>  # Return 1 if the target uses a ColdFire FPU.
>>
>>  proc check_effective_target_coldfire_fpu { } {
>> --
>> 2.34.1
>>
>>


Re: [PATCH] testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

2023-10-24 Thread Christophe Lyon
Ping?

Le lun. 2 oct. 2023, 10:23, Christophe Lyon  a
écrit :

> ping? maybe this counts as obvious?
>
>
> On Thu, 14 Sept 2023 at 11:13, Christophe Lyon 
> wrote:
>
>> ping?
>>
>> On Fri, 8 Sept 2023 at 10:43, Christophe Lyon 
>> wrote:
>>
>>> The test was declaring 'int *carry;' and wrote to '*carry' without
>>> initializing 'carry' first, leading to an attempt to write at address
>>> zero, and a crash.
>>>
>>> Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
>>> as the parameter.
>>>
>>> 2023-09-08  Christophe Lyon  
>>>
>>> gcc/testsuite/
>>> * gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.
>>> ---
>>>  .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 34 +--
>>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>>
>>> diff --git
>>> a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> index a8c6cce67c8..931c9d2f30b 100644
>>> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> @@ -7,7 +7,7 @@
>>>
>>>  volatile int32x4_t c1;
>>>  volatile uint32x4_t c2;
>>> -int *carry;
>>> +int carry;
>>>
>>>  int
>>>  main ()
>>> @@ -21,45 +21,45 @@ main ()
>>>uint32x4_t inactive2 = vcreateq_u32 (0, 0);
>>>
>>>mve_pred16_t p = 0x;
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vadcq (a1, b1, carry);
>>> +  c1 = vadcq (a1, b1, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vadcq (a2, b2, carry);
>>> +  c2 = vadcq (a2, b2, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vsbcq (a1, b1, carry);
>>> +  c1 = vsbcq (a1, b1, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vsbcq (a2, b2, carry);
>>> +  c2 = vsbcq (a2, b2, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vadcq_m (inactive1, a1, b1, carry, p);
>>> +  c1 = vadcq_m (inactive1, a1, b1, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vadcq_m (inactive2, a2, b2, carry, p);
>>> +  c2 = vadcq_m (inactive2, a2, b2, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vsbcq_m (inactive1, a1, b1, carry, p);
>>> +  c1 = vsbcq_m (inactive1, a1, b1, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vsbcq_m (inactive2, a2, b2, carry, p);
>>> +  c2 = vsbcq_m (inactive2, a2, b2, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>>
>>> --
>>> 2.34.1
>>>
>>>


[PATCH 1/6] rtl-ssa: Ensure global registers are live on exit

2023-10-24 Thread Richard Sandiford
RTL-SSA mostly relies on DF for block-level register liveness
information, including artificial uses and defs at the beginning
and end of blocks.  But one case was missing.  DF does not add
artificial uses of global registers to the beginning or end
of a block.  Instead it marks them as used within every block
when computing LR and LIVE problems.

For RTL-SSA, global registers behave like memory, which in
turn behaves like gimple vops.  We need to ensure that they
are live on exit so that final definitions do not appear
to be unused.

Also, the previous live-on-exit handling only considered the exit
block itself.  It needs to consider non-local gotos as well, since
they jump directly to some code in a parent function and so do
not have a path to the exit block.

gcc/
* rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force
global registers to be live on exit.  Handle any block with zero
successors like an exit block.
---
 gcc/rtl-ssa/blocks.cc | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index ecce7a68c59..49c0d15b3cf 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -866,11 +866,14 @@ function_info::add_artificial_accesses (build_info &bi, 
df_ref_flags flags)
 
   start_insn_accesses ();
 
+  HARD_REG_SET added_regs = {};
   FOR_EACH_ARTIFICIAL_USE (ref, cfg_bb->index)
 if ((DF_REF_FLAGS (ref) & DF_REF_AT_TOP) == flags)
   {
unsigned int regno = DF_REF_REGNO (ref);
machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref));
+   if (HARD_REGISTER_NUM_P (regno))
+ SET_HARD_REG_BIT (added_regs, regno);
 
// A definition must be available.
gcc_checking_assert (bitmap_bit_p (&lr_info->in, regno)
@@ -879,10 +882,20 @@ function_info::add_artificial_accesses (build_info &bi, 
df_ref_flags flags)
m_temp_uses.safe_push (create_reg_use (bi, insn, { mode, regno }));
   }
 
-  // Track the return value of memory by adding an artificial use of
-  // memory at the end of the exit block.
-  if (flags == 0 && cfg_bb->index == EXIT_BLOCK)
+  // Ensure that global registers and memory are live at the end of any
+  // block that has no successors, such as the exit block and non-local gotos.
+  // Global registers have to be singled out because they are not part of
+  // the DF artifical use list (they are instead treated as used within
+  // every block).
+  if (flags == 0 && EDGE_COUNT (cfg_bb->succs) == 0)
 {
+  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+   if (global_regs[i] && !TEST_HARD_REG_BIT (added_regs, i))
+ {
+   auto mode = reg_raw_mode[i];
+   m_temp_uses.safe_push (create_reg_use (bi, insn, { mode, i }));
+ }
+
   auto *use = allocate (insn, memory, bi.current_mem_value ());
   add_use (use);
   m_temp_uses.safe_push (use);
-- 
2.25.1



[PATCH 0/6] rtl-ssa: Various fixes needed for the late-combine pass

2023-10-24 Thread Richard Sandiford
Testing the late-combine pass showed a depressing number of
bugs in areas of RTL-SSA that hadn't been used much until now.
Most of them relate to doing things after RA.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard

Richard Sandiford (6):
  rtl-ssa: Ensure global registers are live on exit
  rtl-ssa: Create REG_UNUSED notes after all pending changes
  rtl-ssa: Fix ICE when deleting memory clobbers
  rtl-ssa: Handle artificial uses of deleted defs
  rtl-ssa: Calculate dominance frontiers for the exit block
  rtl-ssa: Handle call clobbers in more places

 gcc/rtl-ssa/access-utils.h | 27 ++---
 gcc/rtl-ssa/accesses.cc| 25 
 gcc/rtl-ssa/blocks.cc  | 60 ++
 gcc/rtl-ssa/changes.cc | 58 +++-
 gcc/rtl-ssa/functions.cc   |  2 +-
 gcc/rtl-ssa/functions.h| 15 ++
 gcc/rtl-ssa/insns.cc   |  2 ++
 gcc/rtl-ssa/internals.h|  4 +++
 gcc/rtl-ssa/member-fns.inl |  9 ++
 9 files changed, 158 insertions(+), 44 deletions(-)

-- 
2.25.1



[PATCH 3/6] rtl-ssa: Fix ICE when deleting memory clobbers

2023-10-24 Thread Richard Sandiford
Sometimes an optimisation can remove a clobber of scratch registers
or scratch memory.  We then need to update the DU chains to reflect
the removed clobber.

For registers this isn't a problem.  Clobbers of registers are just
momentary blips in the register's lifetime.  They act as a barrier for
moving uses later or defs earlier, but otherwise they have no effect on
the semantics of other instructions.  Removing a clobber is therefore a
cheap, local operation.

In contrast, clobbers of memory are modelled as full sets.
This is because (a) a clobber of memory does not invalidate
*all* memory and (b) it's a common idiom to use (clobber (mem ...))
in stack barriers.  But removing a set and redirecting all uses
to a different set is a linear operation.  Doing it for potentially
every optimisation could lead to quadratic behaviour.

This patch therefore refrains from removing sets of memory that appear
to be redundant.  There's an opportunity to clean this up in linear time
at the end of the pass, but as things stand, nothing would benefit from
that.

This is also a very rare event.  Usually we should try to optimise the
insn before the scratch memory has been allocated.

gcc/
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
If a change describes a set of memory, ensure that that set
is kept, regardless of the insn pattern.
---
 gcc/rtl-ssa/changes.cc | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index c73c23c86fb..5800f9dba97 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -429,8 +429,18 @@ function_info::finalize_new_accesses (insn_change &change, 
insn_info *pos)
   // Also keep any explicitly-recorded call clobbers, which are deliberately
   // excluded from the vec_rtx_properties.  Calls shouldn't move, so we can
   // keep the definitions in their current position.
+  //
+  // If the change describes a set of memory, but the pattern doesn't
+  // reference memory, keep the set anyway.  This can happen if the
+  // old pattern was a parallel that contained a memory clobber, and if
+  // the new pattern was recognized without that clobber.  Keeping the
+  // set avoids a linear-complexity update to the set's users.
+  //
+  // ??? We could queue an update so that these bogus clobbers are
+  // removed later.
   for (def_info *def : change.new_defs)
-if (def->m_has_been_superceded && def->is_call_clobber ())
+if (def->m_has_been_superceded
+   && (def->is_call_clobber () || def->is_mem ()))
   {
def->m_has_been_superceded = false;
def->set_insn (insn);
@@ -535,7 +545,7 @@ function_info::finalize_new_accesses (insn_change &change, 
insn_info *pos)
}
 }
 
-  // Install the new list of definitions in CHANGE.
+  // Install the new list of uses in CHANGE.
   sort_accesses (m_temp_uses);
   change.new_uses = use_array (temp_access_array (m_temp_uses));
   m_temp_uses.truncate (0);
-- 
2.25.1



[PATCH 6/6] rtl-ssa: Handle call clobbers in more places

2023-10-24 Thread Richard Sandiford
In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.

This patch takes these call clobbers into account in a couple
more routines.  I don't think this will have any effect on
existing users, since it's only necessary for hard registers.

gcc/
* rtl-ssa/access-utils.h (next_call_clobbers): New function.
(is_single_dominating_def, remains_available_on_exit): Replace with...
* rtl-ssa/functions.h (function_info::is_single_dominating_def)
(function_info::remains_available_on_exit): ...these new member
functions.
(function_info::m_clobbered_by_calls): New member variable.
* rtl-ssa/functions.cc (function_info::function_info): Explicitly
initialize m_clobbered_by_calls.
* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
m_clobbered_by_calls for each call-clobber note.
* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
New function.  Check for call clobbers.
* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
Likewise.
---
 gcc/rtl-ssa/access-utils.h | 27 +--
 gcc/rtl-ssa/accesses.cc| 25 +
 gcc/rtl-ssa/functions.cc   |  2 +-
 gcc/rtl-ssa/functions.h| 14 ++
 gcc/rtl-ssa/insns.cc   |  2 ++
 gcc/rtl-ssa/member-fns.inl |  9 +
 6 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index 84d386b7d8b..0d7a57f843c 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -127,24 +127,6 @@ set_with_nondebug_insn_uses (access_info *access)
   return nullptr;
 }
 
-// Return true if SET is the only set of SET->resource () and if it
-// dominates all uses (excluding uses of SET->resource () at points
-// where SET->resource () is always undefined).
-inline bool
-is_single_dominating_def (const set_info *set)
-{
-  return set->is_first_def () && set->is_last_def ();
-}
-
-// SET is known to be available on entry to BB.  Return true if it is
-// also available on exit from BB.  (The value might or might not be live.)
-inline bool
-remains_available_on_exit (const set_info *set, bb_info *bb)
-{
-  return (set->is_last_def ()
- || *set->next_def ()->insn () > *bb->end_insn ());
-}
-
 // ACCESS is known to be associated with an instruction rather than
 // a phi node.  Return which instruction that is.
 inline insn_info *
@@ -313,6 +295,15 @@ next_call_clobbers_ignoring (insn_call_clobbers_tree 
&tree, insn_info *insn,
   return tree->insn ();
 }
 
+// Search forwards from immediately after INSN for the first instruction
+// recorded in TREE.  Return null if no such instruction exists.
+inline insn_info *
+next_call_clobbers (insn_call_clobbers_tree &tree, insn_info *insn)
+{
+  auto ignore = [](const insn_info *) { return false; };
+  return next_call_clobbers_ignoring (tree, insn, ignore);
+}
+
 // If ACCESS is a set, return the first use of ACCESS by a nondebug insn I
 // for which IGNORE (I) is false.  Return null if ACCESS is not a set or if
 // no such use exists.
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 774ab9d99ee..c35c7efb73d 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1303,6 +1303,31 @@ function_info::insert_temp_clobber (obstack_watermark 
&watermark,
   return insert_access (watermark, clobber, old_defs);
 }
 
+// See the comment above the declaration.
+bool
+function_info::remains_available_on_exit (const set_info *set, bb_info *bb)
+{
+  if (HARD_REGISTER_NUM_P (set->regno ())
+  && TEST_HARD_REG_BIT (m_clobbered_by_calls, set->regno ()))
+{
+  insn_info *search_insn = (set->bb () == bb
+   ? set->insn ()
+   : bb->head_insn ());
+  for (ebb_call_clobbers_info *call_group : bb->ebb ()->call_clobbers ())
+   {
+ if (!call_group->clobbers (set->resource ()))
+   continue;
+
+ insn_info *insn = next_call_clobbers (*call_group, search_insn);
+ if (insn && insn->bb () == bb)
+   return false;
+   }
+}
+
+  return (set->is_last_def ()
+ || *set->next_def ()->insn () > *bb->end_insn ());
+}
+
 // A subroutine of make_uses_available.  Try to make USE's definition
 // available at the head of BB.  WILL_BE_DEBUG_USE is true if the
 // definition will be used only in debug instructions.
diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
index c35d25dbf8f..8a8108baae8 100644
--- a/gcc/rtl-ssa/functions.cc
+++ b/gcc/rtl-ssa/functions.cc
@@ -32,7 +32,7 @@
 using namespace rtl_ssa;
 
 function_info::function_info (function *fn)
-  : m_fn (fn)
+  : m_fn (fn), m_clobbered_by_calls ()
 {
   // Force the alignment to be obstack_alignment.  Everything else is normal.
   ob

[PATCH 4/6] rtl-ssa: Handle artificial uses of deleted defs

2023-10-24 Thread Richard Sandiford
If an optimisation removes the last real use of a definition,
there can still be artificial uses left.  This patch removes
those uses too.

These artificial uses exist because RTL-SSA is only an SSA-like
view of the existing RTL IL, rather than a native SSA representation.
It effectively treats RTL registers like gimple vops, but with the
addition of an RPO view of the register's lifetime(s).  Things are
structured to allow most operations to update this RPO view in
amortised sublinear time.

gcc/
* rtl-ssa/functions.h (function_info::process_uses_of_deleted_def):
New member function.
* rtl-ssa/functions.cc (function_info::process_uses_of_deleted_def):
Likewise.
(function_info::change_insns): Use it.
---
 gcc/rtl-ssa/changes.cc  | 35 +--
 gcc/rtl-ssa/functions.h |  1 +
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 5800f9dba97..3e14069421c 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -209,6 +209,35 @@ rtl_ssa::changes_are_worthwhile (array_slice changes,
   return true;
 }
 
+// SET has been deleted.  Clean up all remaining uses.  Such uses are
+// either dead phis or now-redundant live-out uses.
+void
+function_info::process_uses_of_deleted_def (set_info *set)
+{
+  if (!set->has_any_uses ())
+return;
+
+  auto *use = *set->all_uses ().begin ();
+  do
+{
+  auto *next_use = use->next_use ();
+  if (use->is_in_phi ())
+   {
+ // This call will not recurse.
+ process_uses_of_deleted_def (use->phi ());
+ delete_phi (use->phi ());
+   }
+  else
+   {
+ gcc_assert (use->is_live_out_use ());
+ remove_use (use);
+   }
+  use = next_use;
+}
+  while (use);
+  gcc_assert (!set->has_any_uses ());
+}
+
 // Update the REG_NOTES of INSN, whose pattern has just been changed.
 static void
 update_notes (rtx_insn *insn)
@@ -695,7 +724,8 @@ function_info::change_insns (array_slice 
changes)
 }
 
   // Remove all definitions that are no longer needed.  After the above,
-  // such definitions should no longer have any registered users.
+  // the only uses of such definitions should be dead phis and now-redundant
+  // live-out uses.
   //
   // In particular, this means that consumers must handle debug
   // instructions before removing a set.
@@ -704,7 +734,8 @@ function_info::change_insns (array_slice 
changes)
   if (def->m_has_been_superceded)
{
  auto *set = dyn_cast (def);
- gcc_assert (!set || !set->has_any_uses ());
+ if (set && set->has_any_uses ())
+   process_uses_of_deleted_def (set);
  remove_def (def);
}
 
diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
index 73690a0e63b..cd90b6aa9df 100644
--- a/gcc/rtl-ssa/functions.h
+++ b/gcc/rtl-ssa/functions.h
@@ -263,6 +263,7 @@ private:
   bb_info *create_bb_info (basic_block);
   void append_bb (bb_info *);
 
+  void process_uses_of_deleted_def (set_info *);
   insn_info *add_placeholder_after (insn_info *);
   void possibly_queue_changes (insn_change &);
   void finalize_new_accesses (insn_change &, insn_info *);
-- 
2.25.1



[PATCH 2/6] rtl-ssa: Create REG_UNUSED notes after all pending changes

2023-10-24 Thread Richard Sandiford
Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of
false positives by all passes.  function_info::change_insns
does this by removing all REG_UNUSED notes, and then using
add_reg_unused_notes to add notes back (or create new ones)
where appropriate.

The problem was that it called add_reg_unused_notes on the fly
while updating each instruction, which meant that the information
for later instructions in the change set wasn't up to date.
This patch does it in a separate loop instead.

gcc/
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove
call to add_reg_unused_notes and instead...
(function_info::change_insns): ...use a separate loop here.
---
 gcc/rtl-ssa/changes.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index de6222ae736..c73c23c86fb 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -586,8 +586,6 @@ function_info::apply_changes_to_insn (insn_change &change)
 
   insn->set_accesses (builder.finish ().begin (), num_defs, num_uses);
 }
-
-  add_reg_unused_notes (insn);
 }
 
 // Add a temporary placeholder instruction after AFTER.
@@ -733,9 +731,14 @@ function_info::change_insns (array_slice 
changes)
}
 }
 
-  // Finally apply the changes to the underlying insn_infos.
+  // Apply the changes to the underlying insn_infos.
   for (insn_change *change : changes)
 apply_changes_to_insn (*change);
+
+  // Now that the insns and accesses are up to date, add any REG_UNUSED notes.
+  for (insn_change *change : changes)
+if (!change->is_deletion ())
+  add_reg_unused_notes (change->insn ());
 }
 
 // See the comment above the declaration.
-- 
2.25.1



Re: [PATCH v25 25/33] libstdc++: Optimize std::is_function compilation performance

2023-10-24 Thread Jonathan Wakely
On Tue, 24 Oct 2023 at 03:16, Ken Matsui  wrote:

> This patch optimizes the compilation performance of std::is_function
> by dispatching to the new __is_function built-in trait.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_function): Use __is_function
> built-in trait.
> (is_function_v): Likewise. Optimize its implementation.
> (is_const_v): Move on top of is_function_v as is_function_v now
> depends on is_const_v.
>

I think I'd prefer to keep is_const_v where it is now, adjacent to
is_volatile_v, and move is_function_v after those.

i.e. like this (but with the additional changes to use the new built-in):

--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3198,8 +3198,8 @@ template 
   inline constexpr bool is_union_v = __is_union(_Tp);
 template 
   inline constexpr bool is_class_v = __is_class(_Tp);
-template 
-  inline constexpr bool is_function_v = is_function<_Tp>::value;
+// is_function_v is defined below, after is_const_v.
+
 template 
   inline constexpr bool is_reference_v = false;
 template 
@@ -3226,6 +3226,8 @@ template 
   inline constexpr bool is_volatile_v = false;
 template 
   inline constexpr bool is_volatile_v = true;
+template 
+  inline constexpr bool is_function_v = is_function<_Tp>::value;

 template 
   inline constexpr bool is_trivial_v = __is_trivial(_Tp);

The variable templates are currently defined in the order shown in the
standard, in the [meta.type.synop] synopsis, and in the [meta.unary.cat]
table. So let's move is_function_v later and add a comment saying why it's
not in the expected place.


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-24 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 19 Oct 2023, Robin Dapp wrote:
>
>> Ugh, I didn't push yet because with a rebased trunk I am
>> seeing different behavior for some riscv testcases.
>> 
>> A reduction is not recognized because there is yet another
>> "double use" occurrence in check_reduction_path.  I guess it's
>> reasonable to loosen the restriction for conditional operations
>> here as well.
>> 
>> The only change to v4 therefore is:
>> 
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index ebab1953b9c..64654a55e4c 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -4085,7 +4094,15 @@ pop:
>> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>> cnt++;
>> -  if (cnt != 1)
>> +
>> +  bool cond_fn_p = op.code.is_internal_fn ()
>> +   && (conditional_internal_fn_code (internal_fn (*code))
>> +   != ERROR_MARK);
>> +
>> +  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
>> +op1 twice (once as definition, once as else) in the same operation.
>> +Allow this.  */
>> +  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>> 
>> Bootstrapped and regtested again on x86, aarch64 and power10.
>> Testsuite on riscv unchanged.
>
> Hmm, why opi == 1 only?  I think
>
> # _1 = PHI <.., _4>
>  _3 = .COND_ADD (_1, _2, _1);
>  _4 = .COND_ADD (_3, _5, _3);
>
> would be fine as well.  I think we want to simply ignore the 'else' value
> of conditional internal functions.  I suppose we have unary, binary
> and ternary conditional functions - I miss an internal_fn_else_index,
> but I suppose it's always the last one?

Yeah, it was always the last one before the introduction of .COND_LEN.
I agree internal_fn_else_index would be useful now.

Thanks,
Richard

>
> I think a single use on .COND functions is also OK, even when on the
> 'else' value only?  But maybe that's not too important here.
>
> Maybe
>
>   gimple *op_use_stmt;
>   unsigned cnt = 0;
>   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
> if (.. op_use_stmt is conditional internal function ..)
>   {
> for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
>   if (gimple_call_arg (call, j) == op.ops[opi])
> cnt++;
>   }
> else if (!is_gimple_debug (op_use_stmt)
> && (*code != ERROR_MARK
> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> cnt++;
>
> ?
>
>> Regards
>>  Robin
>> 
>> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar 
>> reduction.
>> 
>> As described in PR111401 we currently emit a COND and a PLUS expression
>> for conditional reductions.  This makes it difficult to combine both
>> into a masked reduction statement later.
>> This patch improves that by directly emitting a COND_ADD/COND_OP during
>> ifcvt and adjusting some vectorizer code to handle it.
>> 
>> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
>> is true.
>> 
>> gcc/ChangeLog:
>> 
>>  PR middle-end/111401
>>  * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>>  if supported.
>>  (predicate_scalar_phi): Add whitespace.
>>  * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>>  (neutral_op_for_reduction): Return -0 for PLUS.
>>  (check_reduction_path): Don't count else operand in COND_OP.
>>  (vect_is_simple_reduction): Ditto.
>>  (vect_create_epilog_for_reduction): Fix whitespace.
>>  (vectorize_fold_left_reduction): Add COND_OP handling.
>>  (vectorizable_reduction): Don't count else operand in COND_OP.
>>  (vect_transform_reduction): Add COND_OP handling.
>>  * tree-vectorizer.h (neutral_op_for_reduction): Add default
>>  parameter.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>>  * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>>  * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>>  * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
>> ---
>>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
>>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
>>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>>  gcc/tree-if-conv.cc   |  49 +++--
>>  gcc/tree-vect-loop.cc | 168 ++
>>  gcc/tree-vectorizer.h |   2 +-
>>  7 files changed, 456 insertions(+), 51 deletions(-)
>>  create mode 100644 
>> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>> 
>> diff --git 
>> a/gc

Re: [PATCH] i386: Fix undefined masks in vpopcnt tests

2023-10-24 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 6:10 PM Richard Sandiford
 wrote:
>
> The files changed in this patch had tests for masked and unmasked
> popcnt.  However, the mask inputs to the masked forms were undefined,
> and would be set to zero by init_regs.  Any combine-like pass that
> ran after init_regs could then fold the masked forms into the
> unmasked ones.  I saw this while testing the late-combine pass
> on x86.
>
> Tested on x86_64-linux-gnu.  OK to install?  (I didn't think this
> counted as obvious because there are other ways of initialising
> the mask.)
Maybe just move the definition of the mask outside of the functions as
extern __mmask16 msk;
But of course your approach is also ok, so either way is ok with me.
>
> Richard
>
>
> gcc/testsuite/
> * gcc.target/i386/avx512bitalg-vpopcntb.c: Use an asm to define
> the mask.
> * gcc.target/i386/avx512bitalg-vpopcntbvl.c: Likewise.
> * gcc.target/i386/avx512bitalg-vpopcntw.c: Likewise.
> * gcc.target/i386/avx512bitalg-vpopcntwvl.c: Likewise.
> * gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Likewise.
> * gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c| 1 +
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c  | 1 +
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c| 1 +
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c  | 1 +
>  gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c | 1 +
>  gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c | 1 +
>  6 files changed, 6 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
> index 44b82c0519d..c52088161a0 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
> @@ -11,6 +11,7 @@ extern __m512i z, z1;
>  int foo ()
>  {
>__mmask16 msk;
> +  asm volatile ("" : "=k" (msk));
>__m512i c = _mm512_popcnt_epi8 (z);
>asm volatile ("" : "+v" (c));
>c = _mm512_mask_popcnt_epi8 (z1, msk, z);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
> index 8c2dfaba9c6..7d11c6c4623 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
> @@ -16,6 +16,7 @@ int foo ()
>  {
>__mmask32 msk32;
>__mmask16 msk16;
> +  asm volatile ("" : "=k" (msk16), "=k" (msk32));
>__m256i c256 = _mm256_popcnt_epi8 (y);
>asm volatile ("" : "+v" (c256));
>c256 = _mm256_mask_popcnt_epi8 (y_1, msk32, y);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
> index 2ef8589f6c1..bc470415e9b 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
> @@ -11,6 +11,7 @@ extern __m512i z, z1;
>  int foo ()
>  {
>__mmask16 msk;
> +  asm volatile ("" : "=k" (msk));
>__m512i c = _mm512_popcnt_epi16 (z);
>asm volatile ("" : "+v" (c));
>c = _mm512_mask_popcnt_epi16 (z1, msk, z);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
> index c976461b12e..3a6af3ed8a1 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
> @@ -16,6 +16,7 @@ int foo ()
>  {
>__mmask16 msk16;
>__mmask8 msk8;
> +  asm volatile ("" : "=k" (msk16), "=k" (msk8));
>__m256i c256 = _mm256_popcnt_epi16 (y);
>asm volatile ("" : "+v" (c256));
>c256 = _mm256_mask_popcnt_epi16 (y_1, msk16, y);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c 
> b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
> index b4d82f97032..0a54ae83055 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
> @@ -20,6 +20,7 @@ int foo ()
>  {
>__mmask16 msk;
>__mmask8 msk8;
> +  asm volatile ("" : "=k" (msk), "=k" (msk8));
>__m128i a = _mm_popcnt_epi32 (x);
>asm volatile ("" : "+v" (a));
>a = _mm_mask_popcnt_epi32 (x_1, msk8, x);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c 
> b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
> index e87d6c999b6..c11e6e00998 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
> @@ -19,6 +19,7 @@ extern __m512i z, z_1;
>  int foo ()
>  {
>__mmask8 msk;
> +  asm volatile ("" : "=k" (msk));
>__m128i a = _mm_popcnt_epi64 (x);
>asm volatile ("" : "+v" (a));
>a = _mm_mask_popcnt_epi64 (x_1, msk, x);
> --
> 2.25.1
>


-- 
BR,
Hongtao


[PATCH GCC13 backport] Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.

2023-10-24 Thread liuhongt
This is the backport patch for the releases/gcc-13 branch; the original patch
for main trunk is at [1].
The only difference between this backport patch and [1] is that GCC 13 doesn't
support auto_mpz, so this patch manually uses mpz_init/mpz_clear.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633661.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for backport to releases/gcc-13?

There's a loop in vect_peel_nonlinear_iv_init to compute init_expr *
pow (step_expr, skip_niters). When skip_niters is too big, compile time
explodes. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to
init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is a
power of 2; otherwise give up on vectorization when skip_niters >=
TYPE_PRECISION (TREE_TYPE (init_expr)).

Also give up on vectorization when niters_skip is negative, which will be the
case for fully masked loops.
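A minimal scalar sketch of the transformation described above (names hypothetical; this is a stand-in for the vectorizer's wide-int arithmetic, not the actual GCC code): when step_expr is a power of two, the O(skip_niters) multiply loop collapses to a single shift, and modulo-2^32 wraparound makes any shift of 32 or more yield zero.

```c
#include <assert.h>
#include <stdint.h>

/* Naive form: the compile-time hog when skip is huge.  */
static uint32_t
peel_init_naive (uint32_t init, uint32_t step, uint64_t skip)
{
  while (skip--)
    init *= step;                      /* O(skip) multiplies */
  return init;
}

/* Shift form: valid only when step is a power of two.  */
static uint32_t
peel_init_shift (uint32_t init, uint32_t step, uint64_t skip)
{
  unsigned log2_step = (unsigned) __builtin_ctz (step);  /* step == 2^k */
  uint64_t shift = (uint64_t) log2_step * skip;
  /* Arithmetic is modulo 2^32, so any shift >= 32 gives 0.  */
  return shift >= 32 ? 0 : init << (unsigned) shift;
}
```

Both forms agree modulo 2^32; the shift form simply refuses to iterate.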

gcc/ChangeLog:

PR tree-optimization/111820
PR tree-optimization/111833
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give
up vectorization for nonlinear iv vect_step_op_mul when
step_expr is not exact_log2 and niters is greater than
TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize
for negative niters_skip, which will be used by fully masked
loops.
(vect_can_advance_ivs_p): Pass whole phi_info to
vect_can_peel_nonlinear_iv_p.
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize
init_expr * pow (step_expr, skipn) to init_expr
<< (log2 (step_expr) * skipn) when step_expr is exact_log2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111820-1.c: New test.
* gcc.target/i386/pr111820-2.c: New test.
* gcc.target/i386/pr111820-3.c: New test.
* gcc.target/i386/pr103144-mul-1.c: Adjust testcase.
* gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
---
 .../gcc.target/i386/pr103144-mul-1.c  |  8 +++---
 .../gcc.target/i386/pr103144-mul-2.c  |  8 +++---
 gcc/testsuite/gcc.target/i386/pr111820-1.c| 16 +++
 gcc/testsuite/gcc.target/i386/pr111820-2.c| 16 +++
 gcc/testsuite/gcc.target/i386/pr111820-3.c| 16 +++
 gcc/tree-vect-loop-manip.cc   | 28 +--
 gcc/tree-vect-loop.cc | 21 +++---
 7 files changed, 98 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-3.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
index 640c34fd959..913d7737dcd 100644
--- a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
@@ -11,7 +11,7 @@ foo_mul (int* a, int b)
   for (int i = 0; i != N; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -23,7 +23,7 @@ foo_mul_const (int* a)
   for (int i = 0; i != N; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -34,7 +34,7 @@ foo_mul_peel (int* a, int b)
   for (int i = 0; i != 39; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -46,6 +46,6 @@ foo_mul_peel_const (int* a)
   for (int i = 0; i != 39; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
index 39fdea3a69d..b2ff186e335 100644
--- a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
@@ -16,12 +16,12 @@ avx2_test (void)
 
   __builtin_memset (epi32_exp, 0, N * sizeof (int));
   int b = 8;
-  v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, 
b * 729, b * 2187 };
+  v8si init = __extension__(v8si) { b, b * 4, b * 16, b * 64, b * 256, b * 
1024, b * 4096, b * 16384 };
 
   for (int i = 0; i != N / 8; i++)
 {
   memcpy (epi32_exp + i * 8, &init, 32);
-  init *= 6561;
+  init *= 65536;
 }
 
   foo_mul (epi32_dst, b);
@@ -32,11 +32,11 @@ avx2_test (void)
   if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0)
 __builtin_abort ();
 
-  init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 };
+  init = __extension__(v8si) { 1, 4, 16, 64, 256, 1024, 4096, 16384 };
   for (int i = 0; i != N / 8; i++)
 {
   memcpy (epi32_exp + i * 8, &init, 32);
-  init *= 6561;
+  init *= 65536;
 }
 
   foo_mul_const (epi32_dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr111820-1.c 
b/gcc/testsuite/gcc.target/i386/pr111820-1.c
new file mode 100644
index 000..50e960c39d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111820-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -fno-tree-vrp -Wno-aggressive-loop-optimizations 
-fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-du

Re: [x86 PATCH] Fine tune STV register conversion costs for -Os.

2023-10-24 Thread Uros Bizjak
On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle  wrote:
>
>
> The eagle-eyed may have spotted that my recent testcases for DImode shifts
> on x86_64 included -mno-stv in the dg-options.  This is because the
> Scalar-To-Vector (STV) pass currently transforms these shifts to use
> SSE vector operations, producing larger code even with -Os.  The issue
> is that the compute_convert_gain currently underestimates the size of
> instructions required for interunit moves, which is corrected with the
> patch below.
>
> For the simple test case:
>
> unsigned long long shl1(unsigned long long x) { return x << 1; }
>
> without this patch, GCC -m32 -Os -mavx2 currently generates:
>
> shl1:   push   %ebp  // 1 byte
> mov%esp,%ebp // 2 bytes
> vmovq  0x8(%ebp),%xmm0   // 5 bytes
> pop%ebp  // 1 byte
> vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
> vmovd  %xmm0,%eax// 4 bytes
> vpextrd $0x1,%xmm0,%edx  // 6 bytes
> ret  // 1 byte  = 24 bytes total
>
> with this patch, we now generate the shorter
>
> shl1:   push   %ebp // 1 byte
> mov%esp,%ebp// 2 bytes
> mov0x8(%ebp),%eax   // 3 bytes
> mov0xc(%ebp),%edx   // 3 bytes
> pop%ebp // 1 byte
> add%eax,%eax// 2 bytes
> adc%edx,%edx// 2 bytes
> ret // 1 byte  = 15 bytes total
>
> Benchmarking using CSiBE, shows that this patch saves 1361 bytes
> when compiling with -m32 -Os, and saves 172 bytes when compiling
> with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-23  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.cc (compute_convert_gain): Provide
> more accurate values (sizes) for inter-unit moves with -Os.

LGTM.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


[PATCH] gcov-io.h: fix comment regarding length of records

2023-10-24 Thread Jose E. Marchesi


The length of gcov records is stored as a signed 32-bit number of bytes.
Ok?

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index bfe4439d02d..e6f33e32652 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -101,7 +101,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
Records are not nested, but there is a record hierarchy.  Tag
numbers reflect this hierarchy.  Tags are unique across note and
data files.  Some record types have a varying amount of data.  The
-   LENGTH is the number of 4bytes that follow and is usually used to
+   LENGTH is the number of bytes that follow and is usually used to
determine how much data.  The tag value is split into 4 8-bit
fields, one for each of four possible levels.  The most significant
is allocated first.  Unused levels are zero.  Active levels are
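As a hedged illustration of the corrected comment (buffer layout assumed, not read from a real gcov file), advancing past a record adds the 8 header bytes plus LENGTH, where LENGTH counts bytes rather than 4-byte words:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Skip one record: 32-bit TAG, signed 32-bit LENGTH, then LENGTH bytes
   of payload (bytes, not 4-byte words, per the corrected comment).  */
static size_t
skip_record (const unsigned char *buf, size_t off)
{
  uint32_t tag;
  int32_t length;
  memcpy (&tag, buf + off, sizeof tag);
  memcpy (&length, buf + off + 4, sizeof length);
  (void) tag;                 /* the tag would select a record handler */
  return off + 8 + (size_t) length;
}
```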


[committed] arc: Remove mpy_dest_reg_operand predicate

2023-10-24 Thread Claudiu Zissulescu
The mpy_dest_reg_operand is just a wrapper for
register_operand. Remove it.

gcc/

* config/arc/arc.md (mulsi3_700): Update pattern.
(mulsi3_v2): Likewise.
* config/arc/predicates.md (mpy_dest_reg_operand): Remove it.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.md| 6 +++---
 gcc/config/arc/predicates.md | 7 ---
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 22af0bf47dd..325e4f56b9b 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2293,7 +2293,7 @@ (define_insn "mulu64"
 ; registers, since it cannot be the destination of a multi-cycle insn
 ; like MPY or MPYU.
 (define_insn "mulsi3_700"
- [(set (match_operand:SI 0 "mpy_dest_reg_operand""=r, r,r,  r,r")
+ [(set (match_operand:SI 0 "register_operand""=r, r,r,  r,r")
(mult:SI (match_operand:SI 1 "register_operand"  "%0, r,0,  0,r")
 (match_operand:SI 2 "nonmemory_operand" "rL,rL,I,Cal,Cal")))]
  "TARGET_ARC700_MPY"
@@ -2306,8 +2306,8 @@ (define_insn "mulsi3_700"
 ; ARCv2 has no penalties between mpy and mpyu. So, we use mpy because of its
 ; short variant. LP_COUNT constraints are still valid.
 (define_insn "mulsi3_v2"
- [(set (match_operand:SI 0 "mpy_dest_reg_operand""=q,q, r, r,r,  r,  
r")
-   (mult:SI (match_operand:SI 1 "register_operand"  "%0,q, 0, r,0,  0,  c")
+ [(set (match_operand:SI 0 "register_operand""=q,q, r, r,r,  r,  
r")
+   (mult:SI (match_operand:SI 1 "register_operand"  "%0,q, 0, r,0,  0,  r")
 (match_operand:SI 2 "nonmemory_operand"  
"q,0,rL,rL,I,Cal,Cal")))]
  "TARGET_MULTI"
  "@
diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md
index e37d8844979..e0aef86fd24 100644
--- a/gcc/config/arc/predicates.md
+++ b/gcc/config/arc/predicates.md
@@ -23,13 +23,6 @@ (define_predicate "dest_reg_operand"
   return register_operand (op, mode);
 })
 
-(define_predicate "mpy_dest_reg_operand"
-  (match_code "reg,subreg")
-{
-  return register_operand (op, mode);
-})
-
-
 ;; Returns 1 if OP is a symbol reference.
 (define_predicate "symbolic_operand"
   (match_code "symbol_ref, label_ref, const")
-- 
2.30.2



Re: Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-24 Thread juzhe.zh...@rivai.ai
Hi, Richard.

Assertion failed at this IR: 

  _427 = _425 & _426;
  _429 = present$0_16(D) != 0;
  _430 = _425 & _429;
  _409 = _430 | _445;
  _410 = _409 | _449;
  _411 = .LOOP_VECTORIZED (3, 6);
  if (_411 != 0)
goto ; [100.00%]
  else
goto ; [100.00%]

   [local count: 3280550]:

   [local count: 29823181]:

pretmp_56 = .MASK_LOAD (_293, 32B, _427);   ---> causes the assertion failure.

You can take a look at this ifcvt IR and search for 'pretmp_56 = .MASK_LOAD (_293,
32B, _427);'.
RVV is totally the same IR as ARM SVE:
https://godbolt.org/z/rPbzfExWP 

I was struggling with this issue for a few days but failed to figure out why.
I am sorry about that. Could you help me with that?

And I adjust the code as follows:
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..42f85839c6e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9843,10 +9843,17 @@ vectorizable_load (vec_info *vinfo,
   mask_index = internal_fn_mask_index (ifn);
   if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
+  slp_tree slp_op = NULL;
   if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask,
+ ifn == IFN_MASK_LEN_GATHER_LOAD ? &slp_op
+ : NULL,
+ &mask_dt, &mask_vectype))
return false;
+  if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+   gcc_unreachable ();
 }

As you can see, for all LOADs except MASK_LEN_GATHER_LOAD I pass 'NULL', the same
as before; only MASK_LEN_GATHER_LOAD passes '&slp_op'. It works fine for RVV, but
I don't think this is correct code; we may need to find another solution.

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-10-20 06:19
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
"juzhe.zh...@rivai.ai"  writes:
> Hi, this patch fix V4 issue:
>
> Previously as Richard S commented:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html 
>
> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
> node has a known vectype by this point.  I think a comment would be useful.
>
> Since I didn't encounter a mismatched case in the RISC-V and x86 regressions,
> I fixed it in the V4 patch as follows:
> +  if (mask_index >= 0 && slp_node)
> + {
> +   bool match_p
> + = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
> +   gcc_assert (match_p);
> + }
> Add assertion here.
>
> However, recently an ICE suddenly appear today in RISC-V regression:
>
> FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in 
> vectorizable_load, at tree-vect-stmts.cc:9885)
> FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
>
> This is because we are encountering a case where mask_vectype is a boolean type
> and it is an external def.
> Then vect_maybe_update_slp_op_vectype will return false.
>
> Then I fix this piece of code in V5 here:
>
> +  if (mask_index >= 0 && slp_node
> +   && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> +   /* We don't vectorize the boolean type external SLP mask.  */
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "incompatible vector types for invariants\n");
> +   return false;
> + }
>
> Bootstrap and Regression on x86 passed.
 
Why are external defs a problem though?  E.g. in pr44306, it looks like
we should be able to create an invariant mask that contains
 
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   ...repeating...
 
The type of the mask can be driven by the code that needs it.
 
Thanks,
Richard
 
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Juzhe-Zhong
> Date: 2023-10-18 20:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; Juzhe-Zhong
> Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> This patch fixes this following FAILs in RISC-V regression:
>  
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
>  
> The root ca

Re: Re: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

2023-10-24 Thread Kito Cheng
Ok for gcc 13 but just wait one more week to make sure everything is fine
as gcc convention :)

Li Xu wrote on Tue, Oct 24, 2023 at 15:49:

> Committed to trunk. Thanks juzhe.
>
> --
> Li Xu
>
> >Ok for trunk (You can commit it to the trunk now).
> >
> >For GCC-13,  I'd like to wait for kito's comment.
> >
> >Thanks.
> >
> >juzhe.zh...@rivai.ai
> >
> >From: Li Xu
> >Date: 2023-10-24 15:29
> >To: gcc-patches
> >CC: kito.cheng; palmer; juzhe.zhong
> >Subject: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]
> >
> >Calling vget/vset intrinsic without receiving a return value will cause
> >a crash. Because in this case e.target is null.
> >This patch should be backported to releases/gcc-13.
> >
> >PR/target 111935
> >
> >gcc/ChangeLog:
> >
> >* config/riscv/riscv-vector-builtins-bases.cc: fix bug.
> >
> >gcc/testsuite/ChangeLog:
> >
> >* gcc.target/riscv/rvv/base/pr111935.c: New test.
> >---
> > .../riscv/riscv-vector-builtins-bases.cc  |  4 +++
> > .../gcc.target/riscv/rvv/base/pr111935.c  | 26 +++
> > 2 files changed, 30 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
> >
> >diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> >index ab12e130907..0b1409a52e0 100644
> >--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> >+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> >@@ -1740,6 +1740,8 @@ public:
> >
> >   rtx expand (function_expander &e) const override
> >   {
> >+    if (!e.target)
> >+      return NULL_RTX;
> > rtx dest = expand_normal (CALL_EXPR_ARG (e.exp, 0));
> > gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (dest)));
> > rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
> >@@ -1777,6 +1779,8 @@ public:
> >
> >   rtx expand (function_expander &e) const override
> >   {
> >+    if (!e.target)
> >+      return NULL_RTX;
> > rtx src = expand_normal (CALL_EXPR_ARG (e.exp, 0));
> > gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (src)));
> > rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
> >diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
> >new file mode 100644
> >index 000..0b936d849a1
> >--- /dev/null
> >+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
> >@@ -0,0 +1,26 @@
> >+/* { dg-do compile } */
> >+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 -Wno-psabi" } */
> >+
> >+#include "riscv_vector.h"
> >+
> >+inline vuint32m4_t __attribute__((__always_inline__)) transpose_indexes() {
> >+  static const uint32_t idx_[16] = {0, 4, 8, 12,
> >+  1, 5, 9, 13,
> >+  2, 6, 10, 14,
> >+  3, 7, 11, 15};
> >+  return __riscv_vle32_v_u32m4(idx_, 16);
> >+}
> >+
> >+void pffft_real_preprocess_4x4(const float *in) {
> >+  vfloat32m1_t r0=__riscv_vle32_v_f32m1(in,4);
> >+  vfloat32m4_t tmp = __riscv_vundefined_f32m4();
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 0, r0);
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 1, r0);
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 2, r0);
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 3, r0);
> >+  tmp = __riscv_vrgather_vv_f32m4(tmp, transpose_indexes(), 16);
> >+  r0 = __riscv_vget_v_f32m4_f32m1(tmp, 0);
> >+}
> >+
> >+/* { dg-final { scan-assembler-times {vl[0-9]+re[0-9]+\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 10 } } */
> >+/* { dg-final { scan-assembler-times {vs[0-9]+r\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 8 } } */
> >--
> >2.17.1
> >
> >xu...@eswincomputing.com
>

Re: [PATCH] testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

2023-10-24 Thread Richard Earnshaw




On 08/09/2023 09:43, Christophe Lyon via Gcc-patches wrote:

The test declared 'int *carry;' and wrote to '*carry' without initializing
'carry' first, leading to an attempt to write to address zero, and a crash.

Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
as the parameter.
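Reduced to a few lines (function name hypothetical, standing in for vadcq's carry out-parameter), the bug class and its fix look like this:

```c
#include <assert.h>

static void
set_carry (int *flag)   /* writes through its out-parameter, like vadcq */
{
  *flag = 1;
}

/* Before: `int *carry;` followed by `*carry = 0;` dereferenced a null
   pointer.  After: declare the object itself and pass its address.  */
int carry;

static int
use_fixed_pattern (void)
{
  carry = 0;
  set_carry (&carry);
  return carry;
}
```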

2023-09-08  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.


OK.

R.


---
  .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 34 +--
  1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c 
b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
index a8c6cce67c8..931c9d2f30b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
@@ -7,7 +7,7 @@
  
  volatile int32x4_t c1;

  volatile uint32x4_t c2;
-int *carry;
+int carry;
  
  int

  main ()
@@ -21,45 +21,45 @@ main ()
uint32x4_t inactive2 = vcreateq_u32 (0, 0);
  
mve_pred16_t p = 0x;

-  (*carry) = 0x;
+  carry = 0x;
  
__builtin_arm_set_fpscr_nzcvqc (0);

-  c1 = vadcq (a1, b1, carry);
+  c1 = vadcq (a1, b1, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vadcq (a2, b2, carry);
+  c2 = vadcq (a2, b2, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vsbcq (a1, b1, carry);
+  c1 = vsbcq (a1, b1, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vsbcq (a2, b2, carry);
+  c2 = vsbcq (a2, b2, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vadcq_m (inactive1, a1, b1, carry, p);
+  c1 = vadcq_m (inactive1, a1, b1, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vadcq_m (inactive2, a2, b2, carry, p);
+  c2 = vadcq_m (inactive2, a2, b2, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vsbcq_m (inactive1, a1, b1, carry, p);
+  c1 = vsbcq_m (inactive1, a1, b1, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vsbcq_m (inactive2, a2, b2, carry, p);
+  c2 = vsbcq_m (inactive2, a2, b2, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
  


[RFC PATCH] Detecting lifetime-dse issues via Valgrind

2023-10-24 Thread exactlywb
From: Daniil Frolov 

PR 66487 is asking to provide sanitizer-like detection for C++ object lifetime
violations that are worked around with -fno-lifetime-dse in Firefox, LLVM,
OpenJade.

The discussion in the PR was centered around extending MSan, but MSan was not
ported to GCC (and requires rebuilding everything with instrumentation).

Instead, allow Valgrind to see lifetime boundaries by emitting client requests
along *this = { CLOBBER }.  The client request marks the "clobbered" memory as
undefined for Valgrind; clobbering assignments mark the beginning of ctor and
end of dtor execution for C++ objects.  Hence, attempts to read object storage
after the destructor, or "pre-initialize" its fields prior to the constructor
will be caught.

Valgrind client requests are offered as macros that emit inline asm.  For use
in code generation, we need to wrap it in a built-in.  We know that implementing
such a built-in in libgcc is undesirable, ideally contents of libgcc should not
depend on availability of external headers.  Suggestion for cleaner solutions
would be welcome.
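A hedged sketch of the libgcc wrapper described above, guarded so it builds even without Valgrind's headers installed (under Valgrind the macro emits the magic inline-asm client request; elsewhere, and in the fallback, it is a no-op):

```c
#include <assert.h>
#include <stddef.h>

#if defined (__has_include)
# if __has_include (<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
# endif
#endif
#ifndef VALGRIND_MAKE_MEM_UNDEFINED
/* Fallback when memcheck.h is unavailable: degrade to a no-op, which
   matches the client request's behavior outside Valgrind.  */
# define VALGRIND_MAKE_MEM_UNDEFINED(p, n) ((void) (p), (void) (n))
#endif

/* The built-in BUILT_IN_VALGRIND_MEM_UNDEF would expand to a call to
   this function, emitted alongside each *this = { CLOBBER }.  */
void
__valgrind_make_mem_undefined (void *ptr, size_t size)
{
  VALGRIND_MAKE_MEM_UNDEFINED (ptr, size);
}
```

Marking memory undefined does not alter its contents; it only changes Valgrind's validity tracking for those bytes.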

gcc/ChangeLog:

* Makefile.in: Add gimple-valgrind.o.
* builtins.def (BUILT_IN_VALGRIND_MEM_UNDEF): Add new built-in.
* common.opt: Add new option.
* passes.def: Add new pass.
* tree-pass.h (make_pass_emit_valgrind): New function.
* gimple-valgrind.cc: New file.

libgcc/ChangeLog:

* Makefile.in: Add valgrind.o.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add option --enable-valgrind-annotations into libgcc
config.
* libgcc2.h (__valgrind_make_mem_undefined): New function.
* valgrind.c: New file.
---
 gcc/Makefile.in|   1 +
 gcc/builtins.def   |   1 +
 gcc/common.opt |   4 ++
 gcc/gimple-valgrind.cc | 124 +
 gcc/passes.def |   1 +
 gcc/tree-pass.h|   1 +
 libgcc/Makefile.in |   2 +-
 libgcc/config.in   |   9 +++
 libgcc/configure   |  79 ++
 libgcc/configure.ac|  48 
 libgcc/libgcc2.h   |   1 +
 libgcc/valgrind.c  |  50 +
 12 files changed, 320 insertions(+), 1 deletion(-)
 create mode 100644 gcc/gimple-valgrind.cc
 create mode 100644 libgcc/valgrind.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..ded6bdf1673 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1487,6 +1487,7 @@ OBJS = \
gimple-ssa-warn-access.o \
gimple-ssa-warn-alloca.o \
gimple-ssa-warn-restrict.o \
+   gimple-valgrind.o \
gimple-streamer-in.o \
gimple-streamer-out.o \
gimple-walk.o \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 5953266acba..42d34189f1e 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1064,6 +1064,7 @@ DEF_GCC_BUILTIN(BUILT_IN_VA_END, "va_end", 
BT_FN_VOID_VALIST_REF, ATTR_N
 DEF_GCC_BUILTIN(BUILT_IN_VA_START, "va_start", 
BT_FN_VOID_VALIST_REF_VAR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_VA_ARG_PACK, "va_arg_pack", BT_FN_INT, 
ATTR_PURE_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_VA_ARG_PACK_LEN, "va_arg_pack_len", 
BT_FN_INT, ATTR_PURE_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_VALGRIND_MEM_UNDEF, 
"__valgrind_make_mem_undefined", BT_FN_VOID_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN__EXIT, "_exit", BT_FN_VOID_INT, 
ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN__EXIT2, "_Exit", BT_FN_VOID_INT, 
ATTR_NORETURN_NOTHROW_LEAF_LIST)
 
diff --git a/gcc/common.opt b/gcc/common.opt
index f137a1f81ac..c9040386956 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2515,6 +2515,10 @@ starts and when the destructor finishes.
 flifetime-dse=
 Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization 
IntegerRange(0, 2)
 
+fvalgrind-emit-annotations
+Common Var(flag_valgrind_annotations,1)
+Emit Valgrind annotations with respect to object's lifetime.
+
 flive-patching
 Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
 
diff --git a/gcc/gimple-valgrind.cc b/gcc/gimple-valgrind.cc
new file mode 100644
index 000..8075e6404d4
--- /dev/null
+++ b/gcc/gimple-valgrind.cc
@@ -0,0 +1,124 @@
+/* Emit Valgrind client requests.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not 

Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-24 Thread Michael Eager

On 10/24/23 00:01, Frager, Neal wrote:

There is a microblaze cpu version 10.0 included in versal. If the
minor version is only a single digit, then the version comparison
will fail as version 10.0 will appear as 100 compared to version
6.00 or 8.30 which will calculate to values 600 and 830.
The issue can be seen when using the '-mcpu=10.0' option.
With this fix, versions with a single digit minor number such as
10.0 will be calculated as greater than versions with a smaller
major version number, but with two minor version digits.
By applying this fix, several incorrect warning messages will no
longer be printed when building the versal plm application, such as
the warning message below:
warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a'
or greater
Signed-off-by: Neal Frager 
---
   gcc/config/microblaze/microblaze.cc | 164 +---
   1 file changed, 76 insertions(+), 88 deletions(-)


Please add a test case.

--
Michael Eager


Hi Michael,

Would you mind helping me understand how to make a gcc test case for this patch?

This patch does not change the resulting binaries of a microblaze gcc build.
The output will be the same with or without the patch, so I do not have
anything in the binary itself to verify.

All that happens is that false warning messages will no longer be printed when
building with ‘-mcpu=10.0’.  Is there a way to test for warning messages?

In any case, please do not commit v1 of this patch.  I am going to work on 
making a v2 based on Mark’s feedback.



You can create a test case which passes the -mcpu=10.0 and other options to GCC 
and verify that the message is not generated after the patch is applied.



You can make all GCC warnings into errors with the "-Werror" option.
This means that the compile will fail if the warning is issued.



Take a look at gcc/testsuite/gcc.target/aarch64/bti-1.c for an example of using { dg-options 
"" } to specify command line options.



There is a test suite option (dg-warning) which checks that a particular source
line generates a warning message, but it isn't clear whether it is possible to
check that a warning is not issued.
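For what it's worth, the testsuite does have a directive for asserting the absence of a diagnostic: dg-bogus fails the test if the quoted message *is* emitted, and a line number of 0 matches diagnostics not attached to any source line, such as option warnings. A hypothetical sketch of such a test (file name, option set, and function body are illustrative, untested on microblaze):

```c
/* { dg-do compile } */
/* { dg-options "-mcpu=10.0 -mxl-multiply-high" } */

/* A trivial high-multiply so -mxl-multiply-high is exercised.  */
int
mul_high (int a, int b)
{
  return (int) (((long long) a * b) >> 32);
}

/* Fail if the bogus warning is emitted anywhere (line 0 = no line).  */
/* { dg-bogus "can be used only with" "" { target *-*-* } 0 } */
```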


Hi Michael,

Thanks to Mark Hatle's feedback, we have a much simpler solution to the problem.

The following change is actually all that is necessary.  Since we are just 
moving from
strcasecmp to strverscmp, does v2 of the patch need to have a test case to go 
with it?

-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)

I assume there are already test cases that verify that strverscmp works 
correctly?
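The behavioral difference behind the one-line change can be sketched as follows (MICROBLAZE_VERSION_COMPARE is the macro from microblaze.cc; the helper function is hypothetical): strcasecmp compares the version strings byte by byte, so "10.0" sorts before "6.00" because '1' < '6', while GNU strverscmp treats runs of digits as numbers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>    /* strverscmp (GNU extension) */
#include <strings.h>   /* strcasecmp */

/* The fixed macro from gcc/config/microblaze/microblaze.cc.  */
#define MICROBLAZE_VERSION_COMPARE(VA, VB) strverscmp (VA, VB)

/* Hypothetical helper: is version VA >= version VB?  */
static int
mb_version_ge (const char *va, const char *vb)
{
  return MICROBLAZE_VERSION_COMPARE (va, vb) >= 0;
}
```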


Still need a test case to verify this fix.

--
Michael Eager


Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-10-24 Thread Richard Sandiford
Sorry for the slow review.  I had a look at the arm bits too, to get
some context for the target-independent bits.

Stamatis Markianos-Wright via Gcc-patches  writes:
> [...]
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 77e76336e94..74186930f0b 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -65,8 +65,8 @@ extern void arm_emit_speculation_barrier_function (void);
>  extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *);
>  extern bool arm_q_bit_access (void);
>  extern bool arm_ge_bits_access (void);
> -extern bool arm_target_insn_ok_for_lob (rtx);
> -
> +extern bool arm_target_bb_ok_for_lob (basic_block);
> +extern rtx arm_attempt_dlstp_transform (rtx);
>  #ifdef RTX_CODE
>  enum reg_class
>  arm_mode_base_reg_class (machine_mode);
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 6e933c80183..39d97ba5e4d 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -659,6 +659,12 @@ static const struct attribute_spec arm_attribute_table[]
> [...]
> +/* Wrapper function of arm_get_required_vpr_reg with TYPE == 1, so return
> +   something only if the VPR reg is an input operand to the insn.  */
> +
> +static rtx
> +ALWAYS_INLINE

Probably best to leave out the ALWAYS_INLINE.  That's generally only
appropriate for things that need to be inlined for correctness.

> +arm_get_required_vpr_reg_param (rtx_insn *insn)
> +{
> +  return arm_get_required_vpr_reg (insn, 1);
> +}
> [...]
> +/* Recursively scan through the DF chain backwards within the basic block and
> +   determine if any of the USEs of the original insn (or the USEs of the 
> insns
> +   where they were DEF-ed, etc., recursively) were affected by implicit VPT
> +   predication of an MVE_VPT_UNPREDICATED_INSN_P in a dlstp/letp loop.
> +   This function returns true if the insn is affected by implicit predication
> +   and false otherwise.
> +   Having such implicit predication on an unpredicated insn wouldn't in 
> itself
> +   block tail predication, because the output of that insn might then be used
> +   in a correctly predicated store insn, where the disabled lanes will be
> +   ignored.  To verify this we later call:
> +   `arm_mve_check_df_chain_fwd_for_implic_predic_impact`, which will check 
> the
> +   DF chains forward to see if any implicitly-predicated operand gets used in
> +   an improper way.  */
> +
> +static bool
> +arm_mve_check_df_chain_back_for_implic_predic
> +  (hash_map, bool>* safe_insn_map, rtx_insn *insn,
> +   rtx vctp_vpr_generated)
> +{
> +  bool* temp = NULL;
> +  if ((temp = safe_insn_map->get (INSN_UID (insn
> +return *temp;
> +
> +  basic_block body = BLOCK_FOR_INSN (insn);
> +  /* The circumstances under which an instruction is affected by "implicit
> + predication" are as follows:
> +  * It is an UNPREDICATED_INSN_P:
> + * That loads/stores from/to memory.
> + * Where any one of its operands is an MVE vector from outside the
> +   loop body bb.
> + Or:
> +  * Any of its operands, recursively backwards, are affected.  */
> +  if (MVE_VPT_UNPREDICATED_INSN_P (insn)
> +  && (arm_is_mve_load_store_insn (insn)
> +   || (arm_is_mve_across_vector_insn (insn)
> +   && !arm_mve_is_allowed_unpredic_across_vector_insn (insn))))
> +{
> +  safe_insn_map->put (INSN_UID (insn), true);
> +  return true;
> +}
> +
> +  df_ref insn_uses = NULL;
> +  FOR_EACH_INSN_USE (insn_uses, insn)
> +  {
> +/* If the operand is in the input reg set of the basic block,
> +   (i.e. it has come from outside the loop!), consider it unsafe if:
> +  * It's being used in an unpredicated insn.
> +  * It is a predicable MVE vector.  */
> +if (MVE_VPT_UNPREDICATED_INSN_P (insn)
> + && VALID_MVE_MODE (GET_MODE (DF_REF_REG (insn_uses)))
> + && REGNO_REG_SET_P (DF_LR_IN (body), DF_REF_REGNO (insn_uses)))
> +  {
> + safe_insn_map->put (INSN_UID (insn), true);
> + return true;
> +  }
> +/* Scan backwards from the current INSN through the instruction chain
> +   until the start of the basic block.  */
> +for (rtx_insn *prev_insn = PREV_INSN (insn);
> +  prev_insn && prev_insn != PREV_INSN (BB_HEAD (body));
> +  prev_insn = PREV_INSN (prev_insn))
> +  {
> + /* If a previous insn defines a register that INSN uses, then recurse
> +in order to check that insn's USEs.
> +If any of these insns return true as MVE_VPT_UNPREDICATED_INSN_Ps,
> +then the whole chain is affected by the change in behaviour from
> +being placed in dlstp/letp loop.  */
> + df_ref prev_insn_defs = NULL;
> + FOR_EACH_INSN_DEF (prev_insn_defs, prev_insn)
> + {
> +   if (DF_REF_REGNO (insn_uses) == DF_REF_REGNO (prev_insn_defs)
> +   && !arm_mve_vec_insn_is_predicated_with_this_predicate
> +(insn, vctp_vpr_generated)
> +   && arm_mve_check_df_chain_back_for_impl

Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

2023-10-24 Thread Kito Cheng
> +using namespace rtl_ssa;
> +using namespace riscv_vector;
> +
> +/* The AVL propagation instructions and corresponding preferred AVL.
> +   It will be updated during the analysis.  */
> +static hash_map<insn_info *, rtx> *avlprops;

Maybe put into member data of pass_avlprop?

> +
> +const pass_data pass_data_avlprop = {
> +  RTL_PASS, /* type */
> +  "avlprop", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_avlprop : public rtl_opt_pass
> +{
> +public:
> +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) 
> {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) final override
> +  {
> +return TARGET_VECTOR && optimize > 0;
> +  }
> +  virtual unsigned int execute (function *) final override;
> +}; // class pass_avlprop
> +
> +static void
> +avlprop_init (void)

Maybe put into member function of pass_avlprop?

> +{
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  df_analyze ();
> +  crtl->ssa = new function_info (cfun);

And take function * from the incoming parameter of execute

> +  avlprops = new hash_map<insn_info *, rtx>;
> +}
> +
> +static void
> +avlprop_done (void)
> +{
> +  free_dominance_info (CDI_DOMINATORS);
> +  if (crtl->ssa->perform_pending_updates ())
> +cleanup_cfg (0);
> +  delete crtl->ssa;
> +  crtl->ssa = nullptr;
> +  delete avlprops;
> +  avlprops = NULL;
> +}
> +
> +/* Helper function to get AVL operand.  */
> +static rtx
> +get_avl (insn_info *insn, bool avlprop_p)
> +{
> +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
> +  || get_attr_avl_type (insn->rtl ()) == VLS)
> +return NULL_RTX;
> +  if (avlprop_p)
> +{
> +  if (avlprops->get (insn))
> +   return (*avlprops->get (insn));
> +  else if (vlmax_avl_type_p (insn->rtl ()))
> +   return RVV_VLMAX;

I guess I didn't get why we need to handle vlmax_avl_type_p here?

> +}
> +  extract_insn_cached (insn->rtl ());
> +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
> +}
> +
> +/* This is a straightforward pattern, ALWAYS present in partial auto-vectorization:
> +
> + VL = SELECT_AVL (AVL, ...)
> + V0 = MASK_LEN_LOAD (..., VL)
> + V1 = MASK_LEN_LOAD (..., VL)
> + V2 = V0 + V1 --- Missed LEN information.
> + MASK_LEN_STORE (..., V2, VL)
> +
> +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
> +   because:
> +
> + - Few code changes in Loop Vectorizer.
> + - Reuse the current clean flow of partial vectorization.  That is, apply
> +   the predicate LEN or MASK to LOAD/STORE operations and other special
> +   arithmetic operations (e.g. DIV), then do the whole vector register
> +   operation if it DOESN'T affect the correctness.
> +   Such a flow is used by all other targets like x86, SVE, s390, etc.
> + - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
> +
> +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR 
> which
> +   generates the VLMAX instruction due to missed LEN information. The later
> +   VSETVL PASS will elide the redundant vsetvls.
> +*/
> +
> +static rtx
> +get_autovectorize_preferred_avl (insn_info *insn)
> +{
> +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
> +return NULL_RTX;

I would prefer adding a new attribute to make this simpler.

> +
> +  rtx use_avl = NULL_RTX;
> +  insn_info *avl_use_insn = nullptr;
> +  unsigned int ratio
> += calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
> +  for (def_info *def : insn->defs ())
> +{
> +  auto set = safe_dyn_cast<set_info *> (def);
> +  if (!set || !set->is_reg ())
> +   return NULL_RTX;
> +  for (use_info *use : set->all_uses ())
> +   {
> + if (!use->is_in_nondebug_insn ())
> +   return NULL_RTX;
> + insn_info *use_insn = use->insn ();
> + /* FIXME: Stop AVL propagation if any USE is not a real RVV
> +instruction.  This should be sufficient for vectorized code,
> +since such code is always located in extended basic blocks.
> +
> +TODO: We can extend PHI checking for intrinsic code if
> +necessary in the future.  */
> + if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
> +   return NULL_RTX;
> + if (!has_vl_op (use_insn->rtl ()))
> +   continue;
> +
> + rtx new_use_avl = get_avl (use_insn, true);
> + if (!new_use_avl)
> +   return NULL_RTX;
> + if (!use_avl)
> +   use_avl = new_use_avl;
> + if (!rtx_equal_p (use_avl, new_use_avl)
> + || calculate_ratio (get_sew (use_insn->rtl ()),
> + get_vlmul (use_insn->rtl ()))
> +  

[PATCH] testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

2023-10-24 Thread Stefan Schulze Frielinghaus
Currently _BitInt is only supported on x86_64, which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not 
supported on this target
  237 | _BitInt(32) b32_v;
  | ^~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.
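
For reference, whether a given compiler accepts _BitInt at all can be probed by hand along the following lines. This is a hedged sketch, not part of the patch: it assumes plain `gcc` names the compiler under test (adjust for a cross build), and merely mirrors what the `bitint` effective-target keyword gates in the new godump-2.c test.

```shell
# Hand-rolled probe (illustrative only): compile a one-line _BitInt
# declaration and record whether the compiler accepts it.
cat > bitint-probe.c <<'EOF'
_BitInt(32) x;
EOF

if gcc -c bitint-probe.c -o /dev/null 2>/dev/null; then
  bitint_status="supported"
else
  bitint_status="not supported"
fi
rm -f bitint-probe.c
echo "bitint ${bitint_status}"
```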

Tested on s390x and x86_64.  Ok for mainline?

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.
---
 gcc/testsuite/gcc.misc-tests/godump-1.c | 12 
 gcc/testsuite/gcc.misc-tests/godump-2.c | 18 ++
 2 files changed, 18 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/godump-2.c

diff --git a/gcc/testsuite/gcc.misc-tests/godump-1.c 
b/gcc/testsuite/gcc.misc-tests/godump-1.c
index f359a657827..b661d04719c 100644
--- a/gcc/testsuite/gcc.misc-tests/godump-1.c
+++ b/gcc/testsuite/gcc.misc-tests/godump-1.c
@@ -234,18 +234,6 @@ const char cc_v1;
 cc_t cc_v2;
 /* { dg-final { scan-file godump-1.out "(?n)^var _cc_v2 _cc_t$" } } */
 
-_BitInt(32) b32_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b32_v int32$" } } */
-
-_BitInt(64) b64_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b64_v int64$" } } */
-
-unsigned _BitInt(32) b32u_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b32u_v uint32$" } } */
-
-_BitInt(33) b33_v;
-/* { dg-final { scan-file godump-1.out "(?n)^// var _b33_v INVALID-bitint-33$" 
} } */
-
 /*** pointer and array types ***/
 typedef void *vp_t;
 /* { dg-final { scan-file godump-1.out "(?n)^type _vp_t \\*byte$" } } */
diff --git a/gcc/testsuite/gcc.misc-tests/godump-2.c 
b/gcc/testsuite/gcc.misc-tests/godump-2.c
new file mode 100644
index 000..ed093c964ac
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/godump-2.c
@@ -0,0 +1,18 @@
+/* { dg-options "-c -fdump-go-spec=godump-2.out" } */
+/* { dg-do compile { target bitint } } */
+/* { dg-skip-if "not supported for target" { ! "alpha*-*-* s390*-*-* i?86-*-* 
x86_64-*-*" } } */
+/* { dg-skip-if "not supported for target" { ! lp64 } } */
+
+_BitInt(32) b32_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b32_v int32$" } } */
+
+_BitInt(64) b64_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b64_v int64$" } } */
+
+unsigned _BitInt(32) b32u_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b32u_v uint32$" } } */
+
+_BitInt(33) b33_v;
+/* { dg-final { scan-file godump-2.out "(?n)^// var _b33_v INVALID-bitint-33$" 
} } */
+
+/* { dg-final { remove-build-file "godump-2.out" } } */
-- 
2.41.0



[PATCH] config, aarch64: Use a more compatible sed invocation.

2023-10-24 Thread Iain Sandoe
Although this came up initially when working on the Darwin Arm64
port, it also breaks cross-compilers on platforms with non-GNU sed.

Tested on x86_64-darwin X aarch64-linux-gnu, aarch64-darwin,
aarch64-linux-gnu and x86_64-linux-gnu.  OK for master?
thanks,
Iain

--- 8< ---

Currently, the sed commands used to parse --with-{cpu,tune,arch} are
using a GNU-specific extension to -e (recognising extended regexes).

This is failing on Darwin, which defaults to Posix behaviour for -e.
However '-E' is accepted to indicate an extended RE.  Strictly, this
is also not really sufficient, since we should only require a Posix
sed (but it seems supported for BSD-derivatives).
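
As a concrete illustration, here is a minimal sketch of what the two substitutions do under -E, using a hypothetical --with-arch value (the variable names mirror config.gcc, but the value is illustrative only):

```shell
# Hypothetical configure value of the form base+ext1+ext2.
val="armv8.2-a+sve+nosimd"

# Strip the extension suffix to get the base architecture name.
# Under -E (Posix ERE), '\+' is an escaped literal '+'.
base_val=$(echo "$val" | sed -E 's/\+.*//')

# Strip the leading base name, leaving only the '+ext' suffixes.
ext_val=$(echo "$val" | sed -E 's/[a-z0-9.-]+//')
```

The point of the change: under Posix BRE (`-e`), `\+` is not a repetition operator -- GNU sed treats it as one only as an extension -- so on a Posix-default sed the old expressions silently matched the wrong thing, while `-E` expresses the intent portably across GNU and BSD seds.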

gcc/ChangeLog:

* config.gcc: Use -E with sed to indicate that we are using
extended REs.

Signed-off-by: Iain Sandoe 
---
 gcc/config.gcc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..a7216907261 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4199,8 +4199,8 @@ case "${target}" in
fi
for which in cpu arch tune; do
eval "val=\$with_$which"
-   base_val=`echo $val | sed -e 's/\+.*//'`
-   ext_val=`echo $val | sed -e 's/[a-z0-9.-]\+//'`
+   base_val=`echo $val | sed -E 's/\+.*//'`
+   ext_val=`echo $val | sed -E 's/[a-z0-9.-]+//'`
 
if [ $which = arch ]; then
  def=aarch64-arches.def
@@ -4232,9 +4232,9 @@ case "${target}" in
 
  while [ x"$ext_val" != x ]
  do
-   ext_val=`echo $ext_val | sed -e 's/\+//'`
-   ext=`echo $ext_val | sed -e 's/\+.*//'`
-   base_ext=`echo $ext | sed -e 's/^no//'`
+   ext_val=`echo $ext_val | sed -E 's/\+//'`
+   ext=`echo $ext_val | sed -E 's/\+.*//'`
+   base_ext=`echo $ext | sed -E 's/^no//'`
opt_line=`echo -e "$options_parsed" | \
grep "^\"$base_ext\""`
 
@@ -4245,7 +4245,7 @@ case "${target}" in
  echo "Unknown extension used in 
--with-$which=$val" 1>&2
  exit 1
fi
-   ext_val=`echo $ext_val | sed -e 
's/[a-z0-9]\+//'`
+   ext_val=`echo $ext_val | sed -E 's/[a-z0-9]+//'`
  done
 
  true
-- 
2.39.2 (Apple Git-143)



Re: [PATCH] Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)

2023-10-24 Thread Tobias Burnus

Hi PA, hello all,

First, I hesitate to review/approve a patch I am involved in; thus, I would
appreciate it if someone could have a second look.

Regarding the patch itself:


On 20.10.23 16:02, Paul-Antoine Arras wrote:

Hi all,

The attached patch fixes a bug that causes valid OpenMP declare
variant directive
and functions to be rejected with the following error (see testcase):

[...]
Error: variant ‘foo_variant’ and base ‘foo’ at (1) have incompatible
types: Type mismatch in argument 'c_bv' (INTEGER(8)/TYPE(c_ptr))

The fix consists in special-casing this situation in gfc_compare_types().
OK for mainline?

...

Subject: [PATCH] Fortran: Fix incompatible types between INTEGER(8) and
  TYPE(c_ptr)

In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such situation from turning into a compile error.

2023-10-20  Paul-Antoine Arras
  Tobias Burnus

gcc/fortran/ChangeLog:

  * interface.cc (gfc_compare_types): Return true in this situation.


That's a bad description. It makes sense when reading the commit log but if you
only read gcc/fortran/ChangeLog, 'this situation' is a dangling reference.


  gcc/fortran/ChangeLog.omp|  5 ++
  gcc/testsuite/ChangeLog.omp  |  4 ++


On mainline, the ChangeLog, not ChangeLog.omp, is used. This changelog is 
automatically
filled by the data in the commit log. Thus, no need to include it in the patch.
(Besides: It keeps getting outdated by any other commit to that file.)

As you have a commit, running the following, possibly with the commit hash as
argument (unless it is the last commit) will show what the nightly script will 
use:

./contrib/gcc-changelog/git_check_commit.py -v -p

It is additionally a good check of whether you got the syntax right. (This is run
as pre-commit hook.)

* * *

Regarding the patch, I think it will work, but I wonder whether we can do
better - esp. regarding c_ptr vs. c_funptr.

I started by looking why the current code fails:


index e9843e9549c..8bd35fd6d22 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -705,12 +705,17 @@ gfc_compare_types (gfc_typespec *ts1, gfc_typespec *ts2)
-
-  if (((ts1->type == BT_INTEGER && ts2->type == BT_DERIVED)
-   || (ts1->type == BT_DERIVED && ts2->type == BT_INTEGER))
-  && ts1->u.derived && ts2->u.derived
-  && ts1->u.derived == ts2->u.derived)


This does not work because the pointers to the derived type are different:

(gdb) p *ts1
$10 = {type = BT_INTEGER, kind = 8, u = {derived = 0x30c66b0, cl = 0x30c66b0, 
pad = 51144368}, interface = 0x0, is_c_interop = 1, is_iso_c = 0, f90_type = 
BT_VOID, deferred = false,
  interop_kind = 0x0}

(gdb) p *ts2
$11 = {type = BT_DERIVED, kind = 0, u = {derived = 0x30c2930, cl = 0x30c2930, 
pad = 51128624}, interface = 0x0, is_c_interop = 0, is_iso_c = 0, f90_type = 
BT_UNKNOWN,
  deferred = false, interop_kind = 0x0}

The reason seems to be that they are freshly created
in different namespaces. Consequently, attr.use_assoc = 1
and the namespace is different, i.e.


(gdb) p ts1->u.derived->ns->proc_name->name
$18 = 0x76ff4138 "foo"

(gdb) p ts2->u.derived->ns->proc_name->name
$19 = 0x76ffc260 "foo_variant"

* * *

Having said this, I think we can combine the current
and the modified version, i.e.


+  if ((ts1->type == BT_INTEGER && ts2->type == BT_DERIVED
+   && ts1->f90_type == BT_VOID
+   && ts2->u.derived->ts.is_iso_c
+   && ts2->u.derived->ts.u.derived->ts.f90_type == BT_VOID)
+  || (ts2->type == BT_INTEGER && ts1->type == BT_DERIVED
+   && ts2->f90_type == BT_VOID
+   && ts1->u.derived->ts.is_iso_c
+   && ts1->u.derived->ts.u.derived->ts.f90_type == BT_VOID))


See attached patch for a combined version, which checks now
whether from_intmod == INTMOD_ISO_C_BINDING and then compares
the names (to distinguish c_ptr and c_funptr). Those are unaffected
by 'use' renames, hence, we should be fine.

While in this example, the name pointers are identical, I fear that
won't hold in some more complex indirect use via module-use. Thus,
strcmp is used.

(gdb) p ts1->u.derived->name
$13 = 0x76ff4100 "c_ptr"

(gdb) p ts2->u.derived->name
$14 = 0x76ff4100 "c_ptr"

* * *

Additionally, I think it would be good to have a testcase which checks for
  c_funptr vs. c_ptr
mismatch.

Just changing c_ptr to c_funptr in the testcase (+ commenting the c_f_pointer)
prints:
  Error: variant ‘foo_variant’ and base ‘foo’ at (1) have incompatible types: 
Type mismatch in argument 'c_bv' (INTEGER(8)/TYPE(c_funptr))

I think that would be a good invalid testcase besides the valid one.

But with a tweak to get better messages to give:
  Error: variant ‘foo_variant’ and base ‘foo’ at (1) have incompatible types: 
Type mismatch in argument 'c_bv' (TYPE(c_ptr)/TYPE(c_funptr))

cf. misc.cc in the atta

[PATCH] c++: error with bit-fields and scoped enums [PR111895]

2023-10-24 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we issue a bogus error: invalid operands of types 'unsigned char:2'
and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
type to bool.

In build_static_cast_1, perform_direct_initialization_if_possible returns
NULL_TREE, because the invented declaration T t(e) fails, which is
correct.  So we go down to ocp_convert, which has code to deal with this
case:
  /* We can't implicitly convert a scoped enum to bool, so convert
 to the underlying type first.  */
  if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
but the SCOPED_ENUM_P is false since intype is .
This could be fixed by using unlowered_expr_type.  But then
c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
unlowered_expr_type is a C++-only function.

Rather than adding a dummy unlowered_expr_type to C, I think we should
follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
to the bit-field and the resulting prvalue is used as the operand of the
static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
will get us an expression whose type is the enum.  (I thought we didn't
need decay_conversion because that does a whole lot more but using it
would make sense to me too.)

PR c++/111895

gcc/cp/ChangeLog:

* typeck.cc (build_static_cast_1): Call
convert_bitfield_to_declared_type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/scoped_enum12.C: New test.
---
 gcc/cp/typeck.cc   | 9 +
 gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C | 8 
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f3dc80c40cf..50427090e5d 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8405,6 +8405,15 @@ build_static_cast_1 (location_t loc, tree type, tree 
expr, bool c_cast_p,
return expr;
   if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
expr = TREE_OPERAND (expr, 0);
+  /* [expr.static.cast]: "If the value is not a bit-field, the result
+refers to the object or the specified base class subobject thereof;
+otherwise, the lvalue-to-rvalue conversion is applied to the
+bit-field and the resulting prvalue is used as the operand of the
+static_cast."  There are no prvalue bit-fields; the l-to-r conversion
+will give us an object of the underlying type of the bit-field.  We
+can let convert_bitfield_to_declared_type convert EXPR to the desired
+type.  */
+  expr = convert_bitfield_to_declared_type (expr);
   return ocp_convert (type, expr, CONV_C_CAST, LOOKUP_NORMAL, complain);
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C 
b/gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C
new file mode 100644
index 000..1d10431e6dc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C
@@ -0,0 +1,8 @@
+// PR c++/111895
+// { dg-do compile { target c++11 } }
+
+enum class o_field : unsigned char { no, yes, different_from_s };
+struct fields {
+  o_field o : 2;
+};
+bool func(fields f) { return static_cast<bool>(f.o); }

base-commit: 99a6c1065de2db04d0f56f4b2cc89acecf21b72e
-- 
2.41.0



Re: [PATCH] c++: cp_stabilize_reference and non-dep exprs [PR111919]

2023-10-24 Thread Jason Merrill

On 10/23/23 19:49, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

-- >8 --

After the removal of NON_DEPENDENT_EXPR, cp_stabilize_reference which
used to just exit early for NON_DEPENDENT_EXPR is now more prone to
passing a weird templated tree to middle-end routines, which leads to a
crash from contains_placeholder_p in the testcase below.  It seems the
best fix is to just disable cp_stabilize_reference when in a template
context like we already do for cp_save_expr; it seems SAVE_EXPR should
never appear in a templated tree (since e.g. tsubst doesn't handle it).


Hmm.  We don't want the result of cp_stabilize_reference (or 
cp_save_expr) to end up in the resulting trees in template context. 
Having a SAVE_EXPR in the result would actually be helpful for catching 
such a bug.


That said, the patch is OK.


PR c++/111919

gcc/cp/ChangeLog:

* tree.cc (cp_stabilize_reference): Do nothing when
processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent27.C: New test.
---
  gcc/cp/tree.cc  | 4 
  gcc/testsuite/g++.dg/template/non-dependent27.C | 8 
  2 files changed, 12 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent27.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index a3d61d3e7c9..417c92ba76f 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -408,6 +408,10 @@ bitfield_p (const_tree ref)
  tree
  cp_stabilize_reference (tree ref)
  {
+  if (processing_template_decl)
+/* As in cp_save_expr.  */
+return ref;
+
STRIP_ANY_LOCATION_WRAPPER (ref);
switch (TREE_CODE (ref))
  {
diff --git a/gcc/testsuite/g++.dg/template/non-dependent27.C 
b/gcc/testsuite/g++.dg/template/non-dependent27.C
new file mode 100644
index 000..cf7af6e6425
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent27.C
@@ -0,0 +1,8 @@
+// PR c++/111919
+
+int i[3];
+
+template<class T>
+void f() {
+  i[42 / (int) sizeof (T)] |= 0;
+}




[PATCH] c++: build_new_1 and non-dep array size [PR111929]

2023-10-24 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
like the right approach?

-- >8 --

This PR is another instance of NON_DEPENDENT_EXPR having acted as an
"analysis barrier" for middle-end routines, and now that it's gone we
may end up passing weird templated trees (that have a generic tree code)
to the middle-end which leads to an ICE.  In the testcase below the
non-dependent array size 'var + 42' is expressed as an ordinary
PLUS_EXPR, but whose operand types have different precisions -- long and
int respectively -- naturally because templated trees encode only the
syntactic form of an expression devoid of e.g. implicit conversions
(typically).  This type incoherency triggers a wide_int assert during
the call to size_binop in build_new_1 which requires the operand types
have the same precision.

This patch fixes this by replacing our incremental folding of 'size'
within build_new_1 with a single call to cp_fully_fold (which is a no-op
in template context) once 'size' is fully built.

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Use convert, build2, build3 instead of
fold_convert, size_binop and fold_build3 when building 'size'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28.C: New test.
---
 gcc/cp/init.cc  | 9 +
 gcc/testsuite/g++.dg/template/non-dependent28.C | 6 ++
 2 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent28.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index d48bb16c7c5..56c1b5e9f5e 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -3261,7 +3261,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, 
tree nelts,
   max_outer_nelts = wi::udiv_trunc (max_size, inner_size);
   max_outer_nelts_tree = wide_int_to_tree (sizetype, max_outer_nelts);
 
-  size = size_binop (MULT_EXPR, size, fold_convert (sizetype, nelts));
+  size = build2 (MULT_EXPR, sizetype, size, convert (sizetype, nelts));
 
   if (TREE_CODE (cst_outer_nelts) == INTEGER_CST)
{
@@ -3344,7 +3344,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, 
tree nelts,
   /* Use a class-specific operator new.  */
   /* If a cookie is required, add some extra space.  */
   if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
-   size = size_binop (PLUS_EXPR, size, cookie_size);
+   size = build2 (PLUS_EXPR, sizetype, size, cookie_size);
   else
{
  cookie_size = NULL_TREE;
@@ -3358,8 +3358,8 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, 
tree nelts,
   if (cxx_dialect >= cxx11 && flag_exceptions)
errval = throw_bad_array_new_length ();
   if (outer_nelts_check != NULL_TREE)
-   size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
-   size, errval);
+   size = build3 (COND_EXPR, sizetype, outer_nelts_check, size, errval);
+  size = cp_fully_fold (size);
   /* Create the argument list.  */
   vec_safe_insert (*placement, 0, size);
   /* Do name-lookup to find the appropriate operator.  */
@@ -3418,6 +3418,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, 
tree nelts,
   /* If size is zero e.g. due to type having zero size, try to
 preserve outer_nelts for constant expression evaluation
 purposes.  */
+  size = cp_fully_fold (size);
   if (integer_zerop (size) && outer_nelts)
size = build2 (MULT_EXPR, TREE_TYPE (size), size, outer_nelts);
 
diff --git a/gcc/testsuite/g++.dg/template/non-dependent28.C 
b/gcc/testsuite/g++.dg/template/non-dependent28.C
new file mode 100644
index 000..3e45154f61d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent28.C
@@ -0,0 +1,6 @@
+// PR c++/111929
+
+template<class T>
+void f(long var) {
+  new int[var + 42];
+}
-- 
2.42.0.424.gceadf0f3cf



[PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard:

This version 14 of the patch uses ABI interfaces to eliminate zero and sign 
extensions.
This fixes aarch64 regression failures with aggressive CSE.

Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 14) of the patch following review comments are 
incorporated.

a) Removal of hard-coded zero_extend and sign_extend in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
e) Addressed remaining review comments from Vineet.
f) Addressed review comments from Bernhard.
g) Fixed aarch64 regression failures.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For the rs6000 target we see zero and sign extends with missing
definitions.  Improved the pass to eliminate such zero and sign
extensions using defined ABI interfaces.

2023-10-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
using defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C
---
changes since v6:
  - Added missing abi interfaces.
  - Rearranging and restructuring the code.
  - Removal of hard coded zero extend and sign extend in abi interfaces.
  - Relaxed different registers with source and destination in abi interfaces.
  - Using CSE in abi interfaces.
  - Fix aarch64 regressions.
  - Add Sign extension removal in abi interfaces.
  - Modified comments as per coding convention.
  - Modified code as per coding convention.
  - Fix bug bootstrapping RISCV failures.
---
 gcc/ree.cc| 147 +-
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 154 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..f557b49b366 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
> +/* Return TRUE if the ABI-promoted mode for MODE equals MODE itself,
> +   i.e. no promotion takes place; FALSE otherwise.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+= targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the following conditions are satisfied.
+
> +  a) The reg source operand is an argument register and not a return register.
> +  b) The modes of the source and destination operands are different.
> +  c) If not promoted, the REGNOs of the source and destination operands
> + are the same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
> +  /* Return FALSE if the modes of destination and source are the same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
> +  /* Return FALSE if no promotion takes place and the REGNOs of source and
> + destination are different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
> +/* Return TRUE if the candidate insn doesn't have defs and has
> +   uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->n

Re: [PATCH 1/6] rtl-ssa: Ensure global registers are live on exit

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

RTL-SSA mostly relies on DF for block-level register liveness
information, including artificial uses and defs at the beginning
and end of blocks.  But one case was missing.  DF does not add
artificial uses of global registers to the beginning or end
of a block.  Instead it marks them as used within every block
when computing LR and LIVE problems.

For RTL-SSA, global registers behave like memory, which in
turn behaves like gimple vops.  We need to ensure that they
are live on exit so that final definitions do not appear
to be unused.

Also, the previous live-on-exit handling only considered the exit
block itself.  It needs to consider non-local gotos as well, since
they jump directly to some code in a parent function and so do
not have a path to the exit block.

gcc/
* rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force
global registers to be live on exit.  Handle any block with zero
successors like an exit block.

OK
jeff


Re: [PATCH 2/6] rtl-ssa: Create REG_UNUSED notes after all pending changes

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of
false positives by all passes.  function_info::change_insns
does this by removing all REG_UNUSED notes, and then using
add_reg_unused_notes to add notes back (or create new ones)
where appropriate.

The problem was that it called add_reg_unused_notes on the fly
while updating each instruction, which meant that the information
for later instructions in the change set wasn't up to date.
This patch does it in a separate loop instead.
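The before/after distinction can be illustrated with a toy model (illustrative names and data layout only, not the RTL-SSA API): rebuild the notes only after every instruction reflects its final form, so the liveness seen by each note computation is up to date.

```python
# Toy model of the two-phase REG_UNUSED update described above.
# An insn sets one register and uses a list of registers; a
# ("REG_UNUSED", reg) note means the set value is never used later.
# This ignores liveness beyond the sequence; it only illustrates
# why notes are recomputed in a separate loop.

def rebuild_unused_notes(insns):
    """First drop all notes, then recompute them against the final
    sequence, so later changes in the same set are taken into account."""
    for insn in insns:
        insn["notes"] = []            # phase 1: remove all REG_UNUSED notes
    for i, insn in enumerate(insns):  # phase 2: recompute from final IL
        used_later = {r for later in insns[i + 1:] for r in later["uses"]}
        if insn["set"] is not None and insn["set"] not in used_later:
            insn["notes"].append(("REG_UNUSED", insn["set"]))
```

For example, if a change set introduces a new use of `r1` in a later insn, recomputing notes on the fly for the earlier insn would wrongly keep a stale REG_UNUSED note for `r1`; the second pass sees the final uses.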

gcc/
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove
call to add_reg_unused_notes and instead...
(function_info::change_insns): ...use a separate loop here.

OK
jeff


Re: [PATCH 3/6] rtl-ssa: Fix ICE when deleting memory clobbers

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

Sometimes an optimisation can remove a clobber of scratch registers
or scratch memory.  We then need to update the DU chains to reflect
the removed clobber.

For registers this isn't a problem.  Clobbers of registers are just
momentary blips in the register's lifetime.  They act as a barrier for
moving uses later or defs earlier, but otherwise they have no effect on
the semantics of other instructions.  Removing a clobber is therefore a
cheap, local operation.

In contrast, clobbers of memory are modelled as full sets.
This is because (a) a clobber of memory does not invalidate
*all* memory and (b) it's a common idiom to use (clobber (mem ...))
in stack barriers.  But removing a set and redirecting all uses
to a different set is a linear operation.  Doing it for potentially
every optimisation could lead to quadratic behaviour.

This patch therefore refrains from removing sets of memory that appear
to be redundant.  There's an opportunity to clean this up in linear time
at the end of the pass, but as things stand, nothing would benefit from
that.

This is also a very rare event.  Usually we should try to optimise the
insn before the scratch memory has been allocated.

gcc/
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
If a change describes a set of memory, ensure that that set
is kept, regardless of the insn pattern.

OK
jeff


Re: [PATCH 4/6] rtl-ssa: Handle artificial uses of deleted defs

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

If an optimisation removes the last real use of a definition,
there can still be artificial uses left.  This patch removes
those uses too.

These artificial uses exist because RTL-SSA is only an SSA-like
view of the existing RTL IL, rather than a native SSA representation.
It effectively treats RTL registers like gimple vops, but with the
addition of an RPO view of the register's lifetime(s).  Things are
structured to allow most operations to update this RPO view in
amortised sublinear time.

gcc/
* rtl-ssa/functions.h (function_info::process_uses_of_deleted_def):
New member function.
* rtl-ssa/functions.cc (function_info::process_uses_of_deleted_def):
Likewise.
(function_info::change_insns): Use it.

OK
jeff


[PATCH v2] AArch64: Improve immediate generation

2023-10-24 Thread Wilco Dijkstra
v2: Use check-function-bodies in tests

Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.

Passes regress, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
* gcc.target/aarch64/moveor_imm.c: Add new test.
* gcc.target/aarch64/pr106583.c: Change tests.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
578a253d6e0e133e19592553fc873b3e73f9f218..ed5be2b64c9a767d74e9d78415da964c669001aa
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5748,6 +5748,26 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
generate,
}
  return 2;
}
+
+  /* Try 2 bitmask immediates which are xor'd together.  */
+  for (i = 0; i < 64; i += 16)
+    {
+      val2 = (val >> i) & mask;
+      val2 |= val2 << 16;
+      val2 |= val2 << 32;
+      if (aarch64_bitmask_imm (val2) && aarch64_bitmask_imm (val ^ val2))
+        break;
+    }
+
+  if (i != 64)
+    {
+      if (generate)
+        {
+          emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+          emit_insn (gen_xordi3 (dest, dest, GEN_INT (val ^ val2)));
+        }
+      return 2;
+    }
 }
 
   /* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c 
b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
index 
ebc44d6dbc7287d907603d77d7b54496de177c4b..a1fc90ad73411ae8ed848fa321586afcb8d710aa
 100644
--- a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -1,32 +1,64 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 /* Go from four moves to two.  */
 
+/*
+** foo:
+** mov w[0-9]+, 2576980377
+** movk x[0-9]+, 0x, lsl 32
+** ...
+*/
+
 int
 foo (long long x)
 {
-  return x <= 0x1998;
+  return x <= 0x9998;
 }
 
+/*
+** GT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 GT (unsigned int x)
 {
   return x > 0xfefe;
 }
 
+/*
+** LE:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 LE (unsigned int x)
 {
   return x <= 0xfefe;
 }
 
+/*
+** GE:
+** mov w[0-9]+, 4278190079
+** ...
+*/
+
 int
 GE (long long x)
 {
   return x >= 0xff00;
 }
 
+/*
+** LT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 LT (int x)
 {
@@ -35,6 +67,13 @@ LT (int x)
 
 /* Optimize the immediate in conditionals.  */
 
+/*
+** check:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 check (int x, int y)
 {
@@ -44,11 +83,15 @@ check (int x, int y)
   return x;
 }
 
+/*
+** tern:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 tern (int x)
 {
   return x >= 0xff00 ? 5 : -3;
 }
-
-/* baz produces one movk instruction.  */
-/* { dg-final { scan-assembler-times "movk" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/moveor_imm.c 
b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
new file mode 100644
index 
..1c0c3f3bf8c588f9661112a8b3f9a72c5ddff95c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+** mov x0, -6148914691236517206
+** eor x0, x0, -9223372036854775807
+** ret
+*/
+
+long f1 (void)
+{
+  return 0x2aab;
+}
+
+/*
+** f2:
+** mov x0, -1085102592571150096
+** eor x0, x0, -2305843009213693951
+** ret
+*/
+
+long f2 (void)
+{
+  return 0x10f0f0f0f0f0f0f1;
+}
+
+/*
+** f3:
+** mov x0, -3689348814741910324
+** eor x0, x0, -4611686018427387903
+** ret
+*/
+
+long f3 (void)
+{
+  return 0xccd;
+}
+
+/*
+** f4:
+** mov x0, -7378697629483820647
+** eor x0, x0, -9223372036854775807
+** ret
+*/
+
+long f4 (void)
+{
+  return 0x1998;
+}
+
+/*
+** f5:
+** mov x0, 3689348814741910323
+** eor x0, x0, 864691128656461824
+** ret
+*/
+
+long f5 (void)
+{
+  return 0x3f333f33;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c 
b/gcc/testsuite/gcc.target/aarch64/pr106583.c
index 
0f931580817d78dc1cc58f03b251bd21bec71f59..63df7395edf9491720e3601848e15aa773c51e6d
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr106583.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr106583.c
@@ -1,41 +1,94 @@
-/* { dg-do assemble } */
-/* { dg-options "-O2 --save-temps" } */
+/* { dg-do compile } */
+/* { dg-options "-O2" } *

Re: [PATCH 5/6] rtl-ssa: Calculate dominance frontiers for the exit block

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

The exit block can have multiple predecessors, for example if the
function calls __builtin_eh_return.  We might then need PHI nodes
for values that are live on exit.

RTL-SSA uses the normal dominance frontiers approach for calculating
where PHI nodes are needed.  However, dominance.cc only calculates
dominators for normal blocks, not the exit block.
calculate_dominance_frontiers likewise only calculates dominance
frontiers for normal blocks.

This patch fills in the “missing” frontiers manually.
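The manual step corresponds to the standard dominance-frontier rule (Cooper/Harvey/Kennedy) applied to the exit block: for each predecessor of the exit block, walk the immediate-dominator chain until reaching the exit block's immediate dominator, adding the exit block to each visited block's frontier. A sketch with hypothetical names, not the RTL-SSA API:

```python
def add_exit_frontiers(exit_preds, idom, exit_idom, frontiers):
    """Frontier step for a single node (the exit block).
    exit_preds: predecessors of EXIT; idom: immediate-dominator map;
    exit_idom: EXIT's immediate dominator; frontiers: block -> set."""
    for pred in exit_preds:
        runner = pred
        # Every block on the idom chain strictly below EXIT's idom
        # has EXIT in its dominance frontier.
        while runner is not None and runner != exit_idom:
            frontiers.setdefault(runner, set()).add("EXIT")
            runner = idom.get(runner)
```

With `__builtin_eh_return`-style CFGs this is what places PHI nodes for values that are live on exit along each path into the exit block.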

gcc/
* rtl-ssa/internals.h (build_info::exit_block_dominator): New
member variable.
* rtl-ssa/blocks.cc (build_info::build_info): Initialize it.
(bb_walker::bb_walker): Use it, moving the computation of the
dominator to...
(function_info::process_all_blocks): ...here.
(function_info::place_phis): Add dominance frontiers for the
exit block.

OK
jeff


Re: [PATCH 6/6] rtl-ssa: Handle call clobbers in more places

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.

This patch takes these call clobbers into account in a couple
more routines.  I don't think this will have any effect on
existing users, since it's only necessary for hard registers.

gcc/
* rtl-ssa/access-utils.h (next_call_clobbers): New function.
(is_single_dominating_def, remains_available_on_exit): Replace with...
* rtl-ssa/functions.h (function_info::is_single_dominating_def)
(function_info::remains_available_on_exit): ...these new member
functions.
(function_info::m_clobbered_by_calls): New member variable.
* rtl-ssa/functions.cc (function_info::function_info): Explicitly
initialize m_clobbered_by_calls.
* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
m_clobbered_by_calls for each call-clobber note.
* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
New function.  Check for call clobbers.
* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
Likewise.

OK
jeff

---


[PATCH 0/3] rtl-ssa: Various extensions for the late-combine pass

2023-10-24 Thread Richard Sandiford
This series adds some RTL-SSA enhancements that are needed
by the late-combine pass.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard

Richard Sandiford (3):
  rtl-ssa: Use frequency-weighted insn costs
  rtl-ssa: Extend make_uses_available
  rtl-ssa: Add new helper functions

 gcc/Makefile.in            |   1 +
 gcc/rtl-ssa/access-utils.h |  41 +++
 gcc/rtl-ssa/accesses.cc    | 100 -
 gcc/rtl-ssa/changes.cc     |  28 +--
 gcc/rtl-ssa/functions.h    |   4 ++
 gcc/rtl-ssa/movement.cc    |  40 +++
 gcc/rtl-ssa/movement.h     |   4 ++
 7 files changed, 212 insertions(+), 6 deletions(-)
 create mode 100644 gcc/rtl-ssa/movement.cc

-- 
2.25.1



[PATCH 1/3] rtl-ssa: Use frequency-weighted insn costs

2023-10-24 Thread Richard Sandiford
rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall.  But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency.  (We already do something similar for SLP layouts.)
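The decision rule can be sketched in Python (illustrative data layout, not the RTL-SSA API): compare frequency-weighted costs first, and fall back to the plain sums when the weighted costs tie, e.g. when every block is optimized for size.

```python
def changes_are_worthwhile(changes, strict=False):
    """Sketch of the cost comparison described above.  Each change is a
    dict with hypothetical keys: 'old' and 'new' insn costs, 'freq'
    (block count scaled by the entry block's count) and 'speed'
    (whether the block is optimized for speed)."""
    old_cost = sum(c["old"] for c in changes)
    new_cost = sum(c["new"] for c in changes)
    w_old = sum(c["freq"] * c["old"] for c in changes if c["speed"])
    w_new = sum(c["freq"] * c["new"] for c in changes if c["speed"])
    # Prefer the frequency-weighted comparison; tie-break on raw sums.
    if w_new != w_old:
        return w_new < w_old
    return new_cost < old_cost if strict else new_cost <= old_cost
```

This captures the motivating case: making a hot block cheaper is worthwhile even if it makes a rarely executed block more expensive by the same raw amount.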

gcc/
* rtl-ssa/changes.cc: Include sreal.h.
(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
scale the cost of each instruction by its execution frequency.
---
 gcc/rtl-ssa/changes.cc | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 3e14069421c..aab532b9f26 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -34,6 +34,7 @@
 #include "emit-rtl.h"
 #include "cfghooks.h"
 #include "cfgrtl.h"
+#include "sreal.h"
 
 using namespace rtl_ssa;
 
@@ -171,18 +172,33 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *> changes,
 {
   unsigned int old_cost = 0;
   unsigned int new_cost = 0;
+  sreal weighted_old_cost = 0;
+  sreal weighted_new_cost = 0;
+  auto entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
   for (insn_change *change : changes)
 {
   old_cost += change->old_cost ();
+  basic_block cfg_bb = change->bb ()->cfg_bb ();
+  bool for_speed = optimize_bb_for_speed_p (cfg_bb);
+  if (for_speed)
+   weighted_old_cost += (cfg_bb->count.to_sreal_scale (entry_count)
+ * change->old_cost ());
   if (!change->is_deletion ())
{
- basic_block cfg_bb = change->bb ()->cfg_bb ();
- change->new_cost = insn_cost (change->rtl (),
-   optimize_bb_for_speed_p (cfg_bb));
+ change->new_cost = insn_cost (change->rtl (), for_speed);
  new_cost += change->new_cost;
+ if (for_speed)
+   weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)
+ * change->new_cost);
}
 }
-  bool ok_p = (strict_p ? new_cost < old_cost : new_cost <= old_cost);
+  bool ok_p;
+  if (weighted_new_cost != weighted_old_cost)
+ok_p = weighted_new_cost < weighted_old_cost;
+  else if (strict_p)
+ok_p = new_cost < old_cost;
+  else
+ok_p = new_cost <= old_cost;
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "original cost");
@@ -192,6 +208,8 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *> changes,
  fprintf (dump_file, " %c %d", sep, change->old_cost ());
  sep = '+';
}
+  if (weighted_old_cost != 0)
+   fprintf (dump_file, " (weighted: %f)", weighted_old_cost.to_double ());
   fprintf (dump_file, ", replacement cost");
   sep = '=';
   for (const insn_change *change : changes)
@@ -200,6 +218,8 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *> changes,
fprintf (dump_file, " %c %d", sep, change->new_cost);
sep = '+';
  }
+  if (weighted_new_cost != 0)
+   fprintf (dump_file, " (weighted: %f)", weighted_new_cost.to_double ());
   fprintf (dump_file, "; %s\n",
   ok_p ? "keeping replacement" : "rejecting replacement");
 }
-- 
2.25.1



[PATCH 2/3] rtl-ssa: Extend make_uses_available

2023-10-24 Thread Richard Sandiford
The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible.  The switch to RTL-SSA was supposed to be a pure
infrastructure change.  So RTL-SSA has various FIXMEs for things
that were artificially limited to facilitate the old-fwprop vs.
new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly.  This patch does that, and
removes the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.

gcc/
* rtl-ssa/functions.h (function_info::remains_available_at_insn):
New member function.
* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
Likewise.
(function_info::make_use_available): Avoid false negatives for
queries within an EBB.
---
 gcc/rtl-ssa/accesses.cc | 37 +++--
 gcc/rtl-ssa/functions.h |  4 
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index c35c7efb73d..1b25ecc3e23 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1303,6 +1303,33 @@ function_info::insert_temp_clobber (obstack_watermark 
&watermark,
   return insert_access (watermark, clobber, old_defs);
 }
 
+// See the comment above the declaration.
+bool
+function_info::remains_available_at_insn (const set_info *set,
+ insn_info *insn)
+{
+  auto *ebb = set->ebb ();
+  gcc_checking_assert (ebb == insn->ebb ());
+
+  def_info *next_def = set->next_def ();
+  if (next_def && *next_def->insn () < *insn)
+    return false;
+
+  if (HARD_REGISTER_NUM_P (set->regno ())
+      && TEST_HARD_REG_BIT (m_clobbered_by_calls, set->regno ()))
+    for (ebb_call_clobbers_info *call_group : ebb->call_clobbers ())
+      {
+        if (!call_group->clobbers (set->resource ()))
+          continue;
+
+        insn_info *call_insn = next_call_clobbers (*call_group, insn);
+        if (call_insn && *call_insn < *insn)
+          return false;
+      }
+
+  return true;
+}
+
 // See the comment above the declaration.
 bool
 function_info::remains_available_on_exit (const set_info *set, bb_info *bb)
@@ -1354,14 +1381,20 @@ function_info::make_use_available (use_info *use, 
bb_info *bb,
   if (is_single_dominating_def (def))
 return use;
 
-  // FIXME: Deliberately limited for fwprop compatibility testing.
+  if (def->ebb () == bb->ebb ())
+{
+  if (remains_available_at_insn (def, bb->head_insn ()))
+   return use;
+  return nullptr;
+}
+
   basic_block cfg_bb = bb->cfg_bb ();
   bb_info *use_bb = use->bb ();
   if (single_pred_p (cfg_bb)
   && single_pred (cfg_bb) == use_bb->cfg_bb ()
   && remains_available_on_exit (def, use_bb))
 {
-  if (def->ebb () == bb->ebb () || will_be_debug_use)
+  if (will_be_debug_use)
return use;
 
   resource_info resource = use->resource ();
diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
index ab253e750cb..ecb40fdaf57 100644
--- a/gcc/rtl-ssa/functions.h
+++ b/gcc/rtl-ssa/functions.h
@@ -121,6 +121,10 @@ public:
   // scope until the change has been aborted or successfully completed.
   obstack_watermark new_change_attempt () { return &m_temp_obstack; }
 
+  // SET and INSN belong to the same EBB, with SET occurring before INSN.
+  // Return true if SET is still available at INSN.
+  bool remains_available_at_insn (const set_info *set, insn_info *insn);
+
   // SET either occurs in BB or is known to be available on entry to BB.
   // Return true if it is also available on exit from BB.  (The value
   // might or might not be live.)
-- 
2.25.1



[PATCH 3/3] rtl-ssa: Add new helper functions

2023-10-24 Thread Richard Sandiford
This patch adds some RTL-SSA helper functions.  They will be
used by the upcoming late-combine pass.

The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc.  I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)

gcc/
* Makefile.in (OBJS): Add rtl-ssa/movement.o.
* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
(single_set_info): New functions.
(remove_uses_of_def, accesses_reference_same_resource): Declare.
(insn_clobbers_resources): Likewise.
* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
(rtl_ssa::accesses_reference_same_resource): Likewise.
(rtl_ssa::insn_clobbers_resources): Likewise.
* rtl-ssa/movement.h (can_move_insn_p): Declare.
* rtl-ssa/movement.cc: New file.
---
 gcc/Makefile.in            |  1 +
 gcc/rtl-ssa/access-utils.h | 41 +++
 gcc/rtl-ssa/accesses.cc    | 63 ++
 gcc/rtl-ssa/movement.cc    | 40 
 gcc/rtl-ssa/movement.h     |  4 +++
 5 files changed, 149 insertions(+)
 create mode 100644 gcc/rtl-ssa/movement.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7b7a4ff789a..91d6bfbea4d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1635,6 +1635,7 @@ OBJS = \
rtl-ssa/changes.o \
rtl-ssa/functions.o \
rtl-ssa/insns.o \
+   rtl-ssa/movement.o \
rtl-tests.o \
rtl.o \
rtlhash.o \
diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index 0d7a57f843c..f078625babf 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -33,6 +33,20 @@ accesses_include_hard_registers (const access_array 
&accesses)
   return accesses.size () && HARD_REGISTER_NUM_P (accesses.front ()->regno ());
 }
 
+// Return true if ACCESSES includes a reference to a non-fixed hard register.
+inline bool
+accesses_include_nonfixed_hard_registers (access_array accesses)
+{
+  for (access_info *access : accesses)
+{
+  if (!HARD_REGISTER_NUM_P (access->regno ()))
+   break;
+  if (!fixed_regs[access->regno ()])
+   return true;
+}
+  return false;
+}
+
 // Return true if sorted array ACCESSES includes an access to memory.
 inline bool
 accesses_include_memory (const access_array &accesses)
@@ -246,6 +260,22 @@ last_def (def_mux mux)
   return mux.last_def ();
 }
 
+// If INSN's definitions contain a single set, return that set, otherwise
+// return null.
+inline set_info *
+single_set_info (insn_info *insn)
+{
+  set_info *set = nullptr;
+  for (auto def : insn->defs ())
+    if (auto this_set = dyn_cast<set_info *> (def))
+      {
+        if (set)
+          return nullptr;
+        set = this_set;
+      }
+  return set;
+}
+
 int lookup_use (splay_tree &, insn_info *);
 int lookup_def (def_splay_tree &, insn_info *);
 int lookup_clobber (clobber_tree &, insn_info *);
@@ -539,6 +569,10 @@ insert_access (obstack_watermark &watermark,
   return T (insert_access_base (watermark, access1, accesses2));
 }
 
+// Return a copy of USES that drops any use of DEF.
+use_array remove_uses_of_def (obstack_watermark &, use_array uses,
+ def_info *def);
+
 // The underlying non-template implementation of remove_note_accesses.
 access_array remove_note_accesses_base (obstack_watermark &, access_array);
 
@@ -554,4 +588,11 @@ remove_note_accesses (obstack_watermark &watermark, T 
accesses)
   return T (remove_note_accesses_base (watermark, accesses));
 }
 
+// Return true if ACCESSES1 and ACCESSES2 have at least one resource in common.
+bool accesses_reference_same_resource (access_array accesses1,
+  access_array accesses2);
+
+// Return true if INSN clobbers the value of any resources in ACCESSES.
+bool insn_clobbers_resources (insn_info *insn, access_array accesses);
+
 }
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 1b25ecc3e23..510545a8bad 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1569,6 +1569,19 @@ rtl_ssa::insert_access_base (obstack_watermark 
&watermark,
   return builder.finish ();
 }
 
+// See the comment above the declaration.
+use_array
+rtl_ssa::remove_uses_of_def (obstack_watermark &watermark, use_array uses,
+def_info *def)
+{
+  access_array_builder uses_builder (watermark);
+  uses_builder.reserve (uses.size ());
+  for (use_info *use : uses)
+if (use->def () != def)
+  uses_builder.quick_push (use);
+  return use_array (uses_builder.finish ());
+}
+
 // See the comment above the declaration.
 access_array
 rtl_ssa::remove_note_accesses_base (obstack_watermark &watermark,
@@ -1587,6 +1600,56 @@ rtl_ssa::remove_note_accesses_base (obstack_watermark 
&watermark,
   return accesses;
 }
 
+// See the comment above the declaration.
+bool
+rtl_ssa::accesses_reference_sa

Re: [PATCH] recog/reload: Remove old UNARY_P operand support

2023-10-24 Thread Jeff Law




On 10/24/23 04:14, Richard Sandiford wrote:

reload and constrain_operands had some old code to look through unary
operators.  E.g. an operand could be (sign_extend (reg X)), and the
constraints would match the reg rather than the sign_extend.
This was previously used by the MIPS port.  But relying on it was a
recurring source of problems, so Eric and I removed it in the MIPS
rewrite from ~20 years back.  I don't know of any other port that used it.
I can't remember if other ports used this or not.  The most likely 
scenario would be a port from the mid/late 90s that started as a 32bit 
port and was extended to a 64bit port and has similar sign extension 
properties as MIPS.



PPC, sparc and s390 come immediately to mind.  I just checked their 
predicates.md files and they don't see to have a predicate which would 
trigger this old code, even if they were reload targets.




Also, the constraints processing in LRA and IRA do not have direct
support for these embedded operators, so I think it was only ever a
reload-specific feature (and probably only a global/local+reload-specific
feature, rather than IRA+reload).
It was definitely specific to the old register allocator+reload 
implementation.  It pre-dates the introduction of IRA by many years.





Richard


gcc/
* recog.cc (constrain_operands): Remove

OK
jeff



Re: [PATCH] gcov-io.h: fix comment regarding length of records

2023-10-24 Thread Jeff Law




On 10/24/23 06:41, Jose E. Marchesi wrote:


The length of gcov records is stored as a signed 32-bit number of bytes.
Ok?

OK.
jeff


Re: [PATCH] testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

2023-10-24 Thread Jeff Law




On 10/24/23 09:26, Stefan Schulze Frielinghaus wrote:

Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not 
supported on this target
   237 | _BitInt(32) b32_v;
   | ^~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

Tested on s390x and x86_64.  Ok for mainline?

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.

OK
jeff


Re: [PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Vineet Gupta




On 10/24/23 10:03, Ajit Agarwal wrote:

Hello Vineet, Jeff and Bernhard:

This version 14 of the patch uses ABI interfaces to perform zero and sign 
extension elimination.
This fixes aarch64 regressions failures with aggressive CSE.


Once again, this information belong between the two "---" lines that you 
added for v6 and stopped updating.


And it seems the only code difference between v13 and v14 is

-  return tgt_mode == mode;
+  if (tgt_mode == mode)
+    return true;
+  else
+    return false;

How does that make any difference ?

-Vineet



Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 14) of the patch following review comments are 
incorporated.

a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
d) Addressed remaining review comments from Vineet.
e) Addressed review comments from Bernhard.
f) Fix aarch64 regressions failure.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For rs6000 target we see zero and sign extend with missing
definitions. Improved to eliminate such zero and sign extension
using defined ABI interfaces.

2023-10-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

 * ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
 using defined abi interfaces.
 (add_removable_extension): Use of defined abi interfaces for no
 reaching defs.
 (abi_extension_candidate_return_reg_p): New function.
 (abi_extension_candidate_p): New function.
 (abi_extension_candidate_argno_p): New function.
 (abi_handle_regs): New function.
 (abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

 * g++.target/powerpc/zext-elim-3.C
---
changes since v6:
   - Added missing abi interfaces.
   - Rearranging and restructuring the code.
   - Removal of hard coded zero extend and sign extend in abi interfaces.
   - Relaxed different registers with source and destination in abi interfaces.
   - Using CSE in abi interfaces.
   - Fix aarch64 regressions.
   - Add Sign extension removal in abi interfaces.
   - Modified comments as per coding convention.
   - Modified code as per coding convention.
   - Fix bug bootstrapping RISCV failures.
---
  gcc/ree.cc                           | 147 +-
  .../g++.target/powerpc/zext-elim-3.C |  13 ++
  2 files changed, 154 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..f557b49b366 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
break;
  
-  gcc_assert (def != NULL);

+  if (def == NULL)
+    return NULL;
  
ref_chain = DF_REF_CHAIN (def);
  
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)

return src;
  }
  
+/* Return TRUE if target mode is equal to source mode, false otherwise.  */

+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+= targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the following conditions are satisfied.
+
+  a) reg source operand is argument register and not return register.
+  b) mode of source and destination operand are different.
+  c) if not promoted REGNO of source and destination operand are same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if the modes of destination and source are the same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+

Re: [PATCH 1/3] rtl-ssa: Use frequency-weighted insn costs

2023-10-24 Thread Jeff Law




On 10/24/23 11:58, Richard Sandiford wrote:

rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall.  But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency.  (We already do something similar for SLP layouts.)

gcc/
* rtl-ssa/changes.cc: Include sreal.h.
(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
scale the cost of each instruction by its execution frequency.

Agreed that it seems better.  OK.

Jeff


Re: [PATCH 2/3] rtl-ssa: Extend make_uses_available

2023-10-24 Thread Jeff Law




On 10/24/23 11:58, Richard Sandiford wrote:

The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible.  The switch to RTL-SSA was supposed to be a pure
infrastructure change.  So RTL-SSA has various FIXMEs for things
that were artificially limited to facilitate the old-fwprop vs.
new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly.  This patch does that, and
removes the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.

gcc/
* rtl-ssa/functions.h (function_info::remains_available_at_insn):
New member function.
* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
Likewise.
(function_info::make_use_available): Avoid false negatives for
queries within an EBB.

OK
jeff


Re: [PATCH 3/3] rtl-ssa: Add new helper functions

2023-10-24 Thread Jeff Law




On 10/24/23 11:58, Richard Sandiford wrote:

This patch adds some RTL-SSA helper functions.  They will be
used by the upcoming late-combine pass.

The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc.  I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)

gcc/
* Makefile.in (OBJS): Add rtl-ssa/movement.o.
* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
(single_set_info): New functions.
(remove_uses_of_def, accesses_reference_same_resource): Declare.
(insn_clobbers_resources): Likewise.
* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
(rtl_ssa::accesses_reference_same_resource): Likewise.
(rtl_ssa::insn_clobbers_resources): Likewise.
* rtl-ssa/movement.h (can_move_insn_p): Declare.
* rtl-ssa/movement.cc: New file.
I assumed that you'll end up with more code in there, so I'm certainly 
OK with having just one function in the file right now.


OK for the trunk.

jeff


[PATCH] Add a late-combine pass [PR106594]

2023-10-24 Thread Richard Sandiford
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

I'd originally hoped to enable the pass by default at -O2 and above
on all targets.  But in the end, I don't think that's possible,
because it interacts badly with x86's STV and partial register
dependency passes.

For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) -1
 (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
(subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
(reg:SI 118)) -1
 (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
(sqrt:DF (mem/c:DF (symbol_ref:DI ("d2"))))) {*sqrtdf2_sse}
 (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0]) repeated x4
])) -1
 (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
(vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")))))
(subreg:V2DF (reg:V4SF 108) 0)
(const_int 1 [0x1]))) -1
 (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
(subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
 (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.

The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

Richard


gcc/
PR rtl-optimization/106594
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc: Enable it by default
at -O2 and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.

gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   5 +
 gcc/common/config/aarch64/aarch64-common.cc   | 

Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Marek Polacek
On Tue, Oct 24, 2023 at 10:34:22AM +0100, Iain Sandoe wrote:
> hi Marek,
> 
> > On 24 Oct 2023, at 08:44, Iain Sandoe  wrote:
> > On 23 Oct 2023, at 20:25, Marek Polacek  wrote:
> >> 
> >> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> >>> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>  
>  On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> >> On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> >>  wrote:
> >> 
> > 
> >> and I tried Darwin (104) and that fails with
> >> 
> >> *** Configuration aarch64-apple-darwin21.6.0 not supported
> >> 
> >> Is anyone else able to build gcc on those machines, or test the attached
> >> patch?
> > 
> > We’re still working on upstreaming the aarch64 Darwin port - the devt. 
> > branch
> > is here; https://github.com/iains/gcc-darwin-arm64 (but it will be rebased 
> > soon
> > because we just upstreamed some dependencies).
> > 
> > In the meantime, I will put your patch into my test queue - hopefully before
> > next week.
> 
> actually, I rebased already .. (but not pushed yet, pending testing).
> 
> aarch64-darwin21 bootstrapped fine with your patch (as did x86_64-darwin19)

Thank you so much Iain!
 
> ===
> 
> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
> -o hc -fhardened -Whardened
> cc1: warning: ‘_FORTIFY_SOURCE’ is not enabled by ‘-fhardened’ because 
> optimizations are turned off [-Whardened]
> 
> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
> -o hc -fhardened -Whardened -O
> 

That looks correct.
 
> I’m about to run the testsuite, but if there’s something else to be tested 
> please let me know (NOTE: I have not read the patch, just applied it and 
> built).

I am mostly curious about the fhardened* tests, if they all pass.

Thanks,
Marek



Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Marek Polacek
On Tue, Oct 24, 2023 at 09:22:25AM +0200, Richard Biener wrote:
> On Mon, Oct 23, 2023 at 9:26 PM Marek Polacek  wrote:
> >
> > On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> > > On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
> > > >
> > > > On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > > > > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> > > > > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, 
> > > > > > > powerpc64le-unknown-linux-gnu,
> > > > > > > and aarch64-unknown-linux-gnu; ok for trunk?
> > > > > > >
> > > > > > > -- >8 --
> > > > > > > In 
> > > > > > > 
> > > > > > > I proposed -fhardened, a new umbrella option that enables a 
> > > > > > > reasonable set
> > > > > > > of hardening flags.  The read of the room seems to be that the 
> > > > > > > option
> > > > > > > would be useful.  So here's a patch implementing that option.
> > > > > > >
> > > > > > > Currently, -fhardened enables:
> > > > > > >
> > > > > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > > > > > >   -D_GLIBCXX_ASSERTIONS
> > > > > > >   -ftrivial-auto-var-init=pattern
> > >
> > > I think =zero is much better here given the overhead is way
> > > cheaper and pointers get a more reliable behavior.
> >
> > Ok, changed now.
> >
> > > > > > >   -fPIE  -pie  -Wl,-z,relro,-z,now
> > > > > > >   -fstack-protector-strong
> > > > > > >   -fstack-clash-protection
> > > > > > >   -fcf-protection=full (x86 GNU/Linux only)
> > > > > > >
> > > > > > > -fhardened will not override options that were specified on the 
> > > > > > > command line
> > > > > > > (before or after -fhardened).  For example,
> > > > > > >
> > > > > > >  -D_FORTIFY_SOURCE=1 -fhardened
> > > > > > >
> > > > > > > means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> > > > > > >
> > > > > > >   -fhardened -fstack-protector
> > > > > > >
> > > > > > > will not enable -fstack-protector-strong.
> > > > > > >
> > > > > > > In DW_AT_producer it is reflected only as -fhardened; it doesn't 
> > > > > > > expand
> > > > > > > to anything.  I think we need a better way to show what it 
> > > > > > > actually
> > > > > > > enables.
> > > > > >
> > > > > > I do think we need to find a solution here to solve asserting 
> > > > > > compliance.
> > > > >
> > > > > Fair enough.
> > > > >
> > > > > > Maybe we can have -Whardened that will diagnose any altering of
> > > > > > -fhardened by other options on the command-line or by missed target
> > > > > > implementations?  People might for example use -fstack-protector
> > > > > > but don't really want to make protection lower than requested with 
> > > > > > -fhardened.
> > > > > >
> > > > > > Any such conflict is much less appearant than when you use the
> > > > > > flags -fhardened composes.
> > > > >
> > > > > How about: --help=hardened says which options -fhardened attempts to
> > > > > enable, and -Whardened warns when it didn't enable an option?  E.g.,
> > > > >
> > > > >   -fstack-protector -fhardened -Whardened
> > > > >
> > > > > would say that it didn't enable -fstack-protector-strong because
> > > > > -fstack-protector was specified on the command line?
> > > > >
> > > > > If !HAVE_LD_NOW_SUPPORT, --help=hardened probably doesn't even have to
> > > > > list -z now, likewise for -z relro.
> > > > >
> > > > > Unclear if -Whardened should be enabled by default, but probably yes?
> > > >
> > > > Here's v2 which adds -Whardened (enabled by default).
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > >
> > > I think it's OK but I'd like to see a second ACK here.
> >
> > Thanks!
> >
> > > Can you see how our
> > > primary and secondary targets (+ host OS) behave here?
> >
> > That's very reasonable.  I tried to build gcc on Compile Farm 119 (AIX) but
> > that fails with:
> >
> > ar  -X64 x ../ppc64/libgcc/libgcc_s.a shr.o
> > ar: 0707-100 ../ppc64/libgcc/libgcc_s.a does not exist.
> > make[2]: *** [/home/polacek/gcc/libgcc/config/rs6000/t-slibgcc-aix:98: all] 
> > Error 1
> > make[2]: Leaving directory 
> > '/home/polacek/x/trunk/powerpc-ibm-aix7.3.1.0/libgcc'
> >
> > and I tried Darwin (104) and that fails with
> >
> > *** Configuration aarch64-apple-darwin21.6.0 not supported
> >
> > Is anyone else able to build gcc on those machines, or test the attached
> > patch?
> >
> > > I think the
> > > documentation should elaborate a bit on expectations for non-Linux/GNU
> > > targets, specifically I think the default configuration for a target 
> > > should
> > > with -fhardened _not_ have any -Whardened diagnostics.  Maybe we can
> > > have a testcase for this?
> >
> > Sorry, I'm not sure how to test that.  I suppose if -fhardened enables
> > something not supported on those systems, and it's something for which
> > we have a configure test, then we s

Re: [PATCH] gcov-io.h: fix comment regarding length of records

2023-10-24 Thread Jose E. Marchesi


> On 10/24/23 06:41, Jose E. Marchesi wrote:
>> The length of gcov records is stored as a signed 32-bit number of
>> bytes.
>> Ok?
> OK.

Pushed.  Thanks.


Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Iain Sandoe



> On 24 Oct 2023, at 20:03, Marek Polacek  wrote:
> 
> On Tue, Oct 24, 2023 at 10:34:22AM +0100, Iain Sandoe wrote:
>> hi Marek,
>> 
>>> On 24 Oct 2023, at 08:44, Iain Sandoe  wrote:
>>> On 23 Oct 2023, at 20:25, Marek Polacek  wrote:
 
 On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>> 
>> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
>>> On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
 On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
  wrote:
 
>>> 
 and I tried Darwin (104) and that fails with
 
 *** Configuration aarch64-apple-darwin21.6.0 not supported
 
 Is anyone else able to build gcc on those machines, or test the attached
 patch?
>>> 
>>> We’re still working on upstreaming the aarch64 Darwin port - the devt. 
>>> branch
>>> is here; https://github.com/iains/gcc-darwin-arm64 (but it will be rebased 
>>> soon
>>> because we just upstreamed some dependencies).
>>> 
>>> In the meantime, I will put your patch into my test queue - hopefully before
>>> next week.
>> 
>> actually, I rebased already .. (but not pushed yet, pending testing).
>> 
>> aarch64-darwin21 bootstrapped fine with your patch (as did x86_64-darwin19)
> 
> Thank you so much Iain!
> 
>> ===
>> 
>> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
>> -o hc -fhardened -Whardened
>> cc1: warning: ‘_FORTIFY_SOURCE’ is not enabled by ‘-fhardened’ because 
>> optimizations are turned off [-Whardened]
>> 
>> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
>> -o hc -fhardened -Whardened -O
>> 
> 
> That looks correct.
> 
>> I’m about to run the testsuite, but if there’s something else to be tested 
>> please let me know (NOTE: I have not read the patch, just applied it and 
>> built).
> 
> I am mostly curious about the fhardened* tests, if they all pass.

No, some that require __PIE__=2 fail.

That is because Darwin has to handle PIE and PIC locally, because the way in 
which those options interact is different from Linux.  I need to amend Darwin’s 
handling to work together with -fhardened on platform versions for which that’s 
relevant (but I do not expect that to be too tricky).

For aarch64-darwin, PIE is mandatory, so we had not even been considering it [we 
basically ignore the flag, because all it does is create tool warnings] (I need 
to fix the output of the preprocessor define, though).

On x86_64, Darwin >= 20 warns about no-PIE, so we are also defaulting it on 
there.

Of course, none of this should affect these tests (it just means that 
fhardening will be a NOP for PIE on later Darwin).

I’ll look into these changes over the next few days, if I have a chance, in any 
case, they do not need to be relevant to your patch.

Iain

> 
> Thanks,
> Marek



[PATCH] Fortran/OpenMP: event handle in task detach cannot be a coarray [PR104131]

2023-10-24 Thread Harald Anlauf
Dear all,

the attached simple patch adds a forgotten check that an event handle
cannot be a coarray.  This case appears to have been overlooked in the
original fix for this PR.

I intend to commit as obvious within 24h unless there are comments.

Thanks,
Harald

From 2b5ed32cacfe84dc4df74b4dccf16ac830d9eb98 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 24 Oct 2023 21:18:02 +0200
Subject: [PATCH] Fortran/OpenMP: event handle in task detach cannot be a
 coarray [PR104131]

gcc/fortran/ChangeLog:

	PR fortran/104131
	* openmp.cc (resolve_omp_clauses): Add check that event handle is
	not a coarray.

gcc/testsuite/ChangeLog:

	PR fortran/104131
	* gfortran.dg/gomp/pr104131-2.f90: New test.
---
 gcc/fortran/openmp.cc |  3 +++
 gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90 | 12 
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..08081dacde4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8967,6 +8967,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   else if (omp_clauses->detach->symtree->n.sym->attr.dimension > 0)
 	gfc_error ("The event handle at %L must not be an array element",
 		   &omp_clauses->detach->where);
+  else if (omp_clauses->detach->symtree->n.sym->attr.codimension)
+	gfc_error ("The event handle at %L must not be a coarray",
+		   &omp_clauses->detach->where);
   else if (omp_clauses->detach->symtree->n.sym->ts.type == BT_DERIVED
 	   || omp_clauses->detach->symtree->n.sym->ts.type == BT_CLASS)
 	gfc_error ("The event handle at %L must not be part of "
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90 b/gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90
new file mode 100644
index 000..3978a6ac31a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fopenmp -fcoarray=single" }
+! PR fortran/104131 - event handle cannot be a coarray
+
+program p
+  use iso_c_binding, only: c_intptr_t
+  implicit none
+  integer, parameter :: omp_event_handle_kind = c_intptr_t
+  integer (kind=omp_event_handle_kind) :: x[*]
+!$omp task detach (x) ! { dg-error "The event handle at \\\(1\\\) must not be a coarray" }
+!$omp end task
+end
--
2.35.3



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-24 Thread Robin Dapp
Changed as suggested.  The difference to v5 is thus:

+ if (cond_fn_p)
+   {
+ gcall *call = dyn_cast <gcall *> (use_stmt);
+ unsigned else_pos
+   = internal_fn_else_index (internal_fn (op.code));
+
+ for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
+   {
+ if (j == else_pos)
+   continue;
+ if (gimple_call_arg (call, j) == op.ops[opi])
+   cnt++;
+   }
+   }
+ else if (!is_gimple_debug (op_use_stmt)

as well as internal_fn_else_index.

Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done,
aarch64 and x86 still running.

Regards
 Robin

From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 13 Sep 2023 22:19:35 +0200
Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* internal-fn.cc (internal_fn_else_index): New function.
* internal-fn.h (internal_fn_else_index): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
---
 gcc/internal-fn.cc|  58 ++
 gcc/internal-fn.h |   1 +
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +
 .../riscv/rvv/autovec/cond/pr111401.c | 139 +
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
 .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
 gcc/tree-if-conv.cc   |  49 +++--
 gcc/tree-vect-loop.cc | 193 ++
 gcc/tree-vectorizer.h |   2 +-
 9 files changed, 536 insertions(+), 55 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..018175261b9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn)
 }
 }
 
+int
+internal_fn_else_index (internal_fn fn)
+{
+  switch (fn)
+{
+case IFN_COND_NEG:
+case IFN_COND_NOT:
+case IFN_COND_LEN_NEG:
+case IFN_COND_LEN_NOT:
+  return 2;
+
+case IFN_COND_ADD:
+case IFN_COND_SUB:
+case IFN_COND_MUL:
+case IFN_COND_DIV:
+case IFN_COND_MOD:
+case IFN_COND_MIN:
+case IFN_COND_MAX:
+case IFN_COND_FMIN:
+case IFN_COND_FMAX:
+case IFN_COND_AND:
+case IFN_COND_IOR:
+case IFN_COND_XOR:
+case IFN_COND_SHL:
+case IFN_COND_SHR:
+case IFN_COND_LEN_ADD:
+case IFN_COND_LEN_SUB:
+case IFN_COND_LEN_MUL:
+case IFN_COND_LEN_DIV:
+case IFN_COND_LEN_MOD:
+case IFN_COND_LEN_MIN:
+case IFN_COND_LEN_MAX:
+case IFN_COND_LEN_FMIN:
+case IFN_COND_LEN_FMAX:
+case IFN_COND_LEN_AND:
+case IFN_COND_LEN_IOR:
+case IFN_COND_LEN_XOR:
+case IFN_COND_LEN_SHL:
+case IFN_COND_LEN_SHR:
+  return 3;
+
+case IFN_COND_FMA:
+case IFN_COND_FMS:
+case IFN_COND_FNMA:
+case IFN_COND_FNMS:
+case IFN_COND_LEN_FMA:
+case IFN_COND_LEN_FMS:
+case IFN_COND_LEN_FNMA:
+case IFN_COND_LEN_FNMS:
+  return 4;
+
+default:
+  return -1;
+}
+
+  return -1;
+}
+
 /* If FN takes a vector mask argument, return the index of that argument,
otherwise return -1.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..7d72f4db2d0 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -2

Re: [PATCH] Fortran/OpenMP: event handle in task detach cannot be a coarray [PR104131]

2023-10-24 Thread rep . dot . nop
On 24 October 2023 21:25:01 CEST, Harald Anlauf  wrote:
>Dear all,
>
>the attached simple patch adds a forgotten check that an event handle
>cannot be a coarray.  This case appears to have been overlooked in the
>original fix for this PR.
>
>I intend to commit as obvious within 24h unless there are comments.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..08081dacde4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8967,6 +8967,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
   else if (omp_clauses->detach->symtree->n.sym->attr.dimension > 0)
gfc_error ("The event handle at %L must not be an array element",
   &omp_clauses->detach->where);
+  else if (omp_clauses->detach->symtree->n.sym->attr.codimension)
+   gfc_error ("The event handle at %L must not be a coarray",

ISTM that we usually do not mention "element" when talking about undue 
(co)array access.

Maybe we want to streamline this specific error message?

LGTM otherwise.
Thanks for your dedication!


+  &omp_clauses->detach->where);
   else if (omp_clauses->detach->symtree->n.sym->ts.type == BT_DERIVED
   || omp_clauses->detach->symtree->n.sym->ts.type == BT_CLASS)
gfc_error ("The event handle at %L must not be part of "



Re: [PATCH] libstdc++ Add cstdarg to freestanding

2023-10-24 Thread Jonathan Wakely
On Sun, 22 Oct 2023 at 21:06, Arsen Arsenović  wrote:

>
> "Paul M. Bendixen"  writes:
>
> > Updated patch, added the requested files, hopefully wrote the commit
> better.
>
> LGTM.  Jonathan?
>

Yup, looks good. I've pushed it to trunk with a tweaked changelog entry.
I'll backport it to gcc-13 soon too.

Thanks, Paul!


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Qing Zhao
Hi, Sid,

Really appreciate your example and detailed explanation. Very helpful.
I think this is an excellent example to show (almost) all the 
issues we need to consider.

I slightly modified this example to make it compilable and runnable, as 
follows: 
(but I still cannot make the incorrect reordering or DSE happen; anyway, the 
potential for reordering is there…)

  1 #include 
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7 
  8 static size_t
  9 get_size_from (void *ptr)
 10 {
 11  return __builtin_dynamic_object_size (ptr, 1);
 12 }
 13 
 14 void
 15 foo (size_t sz)
 16 {
 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 18  obj->size = sz;
 19  obj->buf[0] = 2;
 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
 21  return;
 22 }
 23 
 24 int main ()
 25 {
 26  foo (20);
 27  return 0;
 28 }

With my GCC, it was compiled and worked:
[opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
20
Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
“foo”, therefore, the call to __bdos is in the same routine as the 
instantiation of the object, and the TYPE information and the attached 
counted_by attribute information in the TYPE of the object can be USED by the 
__bdos call to compute the final object size. 

[opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
-1
Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
therefore, the call to __bdos is Not in the same routine as the instantiation 
of the object, As a result, the TYPE info and the attached counted_by info of 
the object can NOT be USED by the __bdos call. 

Keep in mind of the above 2 situations, we will refer them in below:

1. First,  the problem we are trying to resolve is:

(Your description):

>  the reordering of __bdos w.r.t. initialization of the size parameter but to 
> also account for DSE of the assignment, we can abstract this problem to that 
> of DFA being unable to see implicit use of the size parameter in the __bdos 
> call.

basically is correct.  However, with the following exception:

The implicit use of the size parameter in the __bdos call is not always there; 
it ONLY exists WHEN the __bdos call can be evaluated to an expression of the 
size parameter in the “objsz” phase, i.e., “Situation 1” of the above 
example. 
 In the “Situation 2”, when the __bdos does not see the TYPE of the real 
object,  it does not see the counted_by information from the TYPE, therefore,  
it is not able to evaluate the size of the object through the counted_by 
information.  As a result, the implicit use of the size parameter in the __bdos 
call does NOT exist at all.  The optimizer can freely reorder the 
initialization of the size parameter with the __bdos call since there is no 
data flow dependency between these two. 

With this exception in mind, we can see that your proposed “option 2” (making 
the type of size “volatile”) is too conservative, it will  disable many 
optimizations  unnecessarily, even though it’s safe and simple to implement. 

As a compiler optimization person for many many years, I really don’t want to 
take this approach at this moment.  -:)

2. Some facts I’d like to mention:

A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
optimization stage. During RTL stage,  the __bdos call has already been 
replaced by an expression of the size parameter or a constant, the data 
dependency is explicitly in the IR already.  I believe that the data analysis 
in RTL stage should pick up the data dependency correctly, No special handling 
is needed in RTL.

B. If the __bdos call cannot see the real object, it has no way to get the 
“counted_by” field from the TYPE of the real object. So, if we try to add the 
implicit use of the “counted_by” field to the __bdos call, the object 
instantiation should be in the same routine as the __bdos call.  Both the FE 
and the gimplification phase are too early to do this work. 

3. Then, what’s the best approach to resolve this problem:

There were several suggestions so far:

A.  Add an additional argument, the size parameter,  to __bdos, 
  A.1, during FE;
  A.2, during gimplification phase;
B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
C.  Encode the implicit USE  in the type of buf, then update the optimization 
passes to use this implicit USE encoded in the type of buf.

As I explained above, 
** Approach A (both A.1 and A.2) does not work;
** Approach B will have big performance impact, I’d prefer not to take this 
approach at this moment.
** Approach C will be a lot of change in GCC, and also not very necessary since 
the ONLY implicit use of the size parameter is in the __bdos call when __bdos 
can see the real object.

So, all the above pro

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread rep . dot . nop
On 24 October 2023 09:36:22 CEST, Ajit Agarwal  wrote:
>Hello Bernhard:
>
>On 23/10/23 7:40 pm, Bernhard Reutner-Fischer wrote:
>> On Mon, 23 Oct 2023 12:16:18 +0530
>> Ajit Agarwal  wrote:
>> 
>>> Hello All:
>>>
>>> Addressed below review comments in the version 11 of the patch.
>>> Please review and please let me know if its ok for trunk.
>> 
>> s/satisified/satisfied/
>> 
>
>I will fix this.

thanks!

>
 As said, I don't see why the below was not cleaned up before the V1 
 submission.
 Iff it breaks when manually CSEing, I'm curious why?
>> 
>> The function below looks identical in v12 of the patch.
>> Why didn't you use common subexpressions?
>> ba
>
>Using CSE here breaks aarch64 regressions hence I have reverted it back 
>not to use CSE,

Just for my own education, can you please paste your patch perusing common 
subexpressions and an assembly diff of the failing versus working aarch64 
testcase, along with how you configured that failing (cross-?)compiler and the 
command line of a typical testcase that broke when manually CSEing the function 
below?

I might not have completely understood the subtle intricacies of RTL 
re-entrancy, it seems?

thanks

   
>> +/* Return TRUE if reg source operand of zero_extend is argument 
>> registers
>> +   and not return registers and source and destination operand are same
>> +   and mode of source and destination operand are not same.  */
>> +
>> +static bool
>> +abi_extension_candidate_p (rtx_insn *insn)
>> +{
>> +  rtx set = single_set (insn);
>> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
>> +  rtx orig_src = XEXP (SET_SRC (set), 0);
>> +
>> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
>> +  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO 
>> (orig_src)))  
>> +return false;
>> +
>> +  /* Mode of destination and source should be different.  */
>> +  if (dst_mode == GET_MODE (orig_src))
>> +return false;
>> +
>> +  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
>> +  bool promote_p = abi_target_promote_function_mode (mode);
>> +
>> +  /* REGNO of source and destination should be same if not
>> +  promoted.  */
>> +  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
>> +return false;
>> +
>> +  return true;
>> +}
>> +  
>> 
>> 

 As said, please also rephrase the above (and everything else if it 
 obviously looks akin the above).
>> 
>> thanks



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Martin Uecker
On Tuesday, 24.10.2023 at 20:30 + Qing Zhao wrote:
> Hi, Sid,
> 
> Really appreciate your example and detailed explanation. Very helpful.
> I think that this example is an excellent example to show (almost) all the 
> issues we need to consider.
> 
> I slightly modified this example to make it to be compilable and run-able, as 
> following: 
> (but I still cannot make the incorrect reordering or DSE happening, anyway, 
> the potential reordering possibility is there…)
> 
>   #include <stdio.h>
>   struct A
>   {
>     size_t size;
>     char buf[] __attribute__((counted_by(size)));
>   };
> 
>   static size_t
>   get_size_from (void *ptr)
>   {
>     return __builtin_dynamic_object_size (ptr, 1);
>   }
> 
>   void
>   foo (size_t sz)
>   {
>     struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>     obj->size = sz;
>     obj->buf[0] = 2;
>     __builtin_printf ("%d\n", get_size_from (obj->buf));
>     return;
>   }
> 
>   int main ()
>   {
>     foo (20);
>     return 0;
>   }
> 
> With my GCC, it was compiled and worked:
> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> 20
> Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
> “foo”, therefore, the call to __bdos is in the same routine as the 
> instantiation of the object, and the TYPE information and the attached 
> counted_by attribute information in the TYPE of the object can be USED by the 
> __bdos call to compute the final object size. 
> 
> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> -1
> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
> therefore, the call to __bdos is Not in the same routine as the instantiation 
> of the object, As a result, the TYPE info and the attached counted_by info of 
> the object can NOT be USED by the __bdos call. 
> 
> Keep in mind of the above 2 situations, we will refer them in below:
> 
> 1. First,  the problem we are trying to resolve is:
> 
> (Your description):
> 
> >  the reordering of __bdos w.r.t. initialization of the size parameter but 
> > to also account for DSE of the assignment, we can abstract this problem to 
> > that of DFA being unable to see implicit use of the size parameter in the 
> > __bdos call.
> 
> basically is correct.  However, with the following exception:
> 
> The implicit use of the size parameter in the __bdos call is not always 
> there, it ONLY exists WHEN the __bdos is able to be evaluated to an expression 
> of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the 
> above example. 
>  In the “Situation 2”, when the __bdos does not see the TYPE of the real 
> object,  it does not see the counted_by information from the TYPE, therefore, 
>  it is not able to evaluate the size of the object through the counted_by 
> information.  As a result, the implicit use of the size parameter in the 
> __bdos call does NOT exist at all.  The optimizer can freely reorder the 
> initialization of the size parameter with the __bdos call since there is no 
> data flow dependency between these two. 
> 
> With this exception in mind, we can see that your proposed “option 2” (making 
> the type of size “volatile”) is too conservative, it will  disable many 
> optimizations  unnecessarily, even though it’s safe and simple to implement. 
> 
> As a compiler optimization person for many many years, I really don’t want to 
> take this approach at this moment.  -:)
> 
> 2. Some facts I’d like to mention:
> 
> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
> optimization stage. During RTL stage,  the __bdos call has already been 
> replaced by an expression of the size parameter or a constant, the data 
> dependency is explicitly in the IR already.  I believe that the data analysis 
> in RTL stage should pick up the data dependency correctly, No special 
> handling is needed in RTL.
> 
> B. If the __bdos call cannot see the real object , it has no way to get the 
> “counted_by” field from the TYPE of the real object. So, if we try to add the 
> implicit use of the “counted_by” field to the __bdos call, the object 
> instantiation should be in the same routine as the __bdos call.  Both the FE 
> and the gimplification phase are too early to do this work. 
> 
> 3. Then, what’s the best approach to resolve this problem:
> 
> There were several suggestions so far:
> 
> A.  Add an additional argument, the size parameter,  to __bdos, 
>   A.1, during FE;
>   A.2, during gimplification phase;
> B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
> C.  Encode the implicit USE  in the type of buf, then update the optimization 
> passes to use this implicit USE encoded in the type of buf.
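For concreteness, approach A could be pictured like this (hypothetical GIMPLE-like pseudocode, for illustration only; not actual GCC IR):

```
/* Before: no visible dependency between the store and the call.

     obj->size = sz;
     _1 = &obj->buf;
     _2 = __builtin_dynamic_object_size (_1, 1);

   After A.1/A.2: the FE or gimplifier passes the size explicitly.

     obj->size = sz;
     _1 = &obj->buf;
     _3 = obj->size;                              // explicit use
     _2 = __builtin_dynamic_object_size (_1, 1, _3);

   DFA now sees the use of obj->size, so the store cannot be reordered
   past the call.  */
```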
> 
> As I explained in the above, 
> ** Approach A (both A.1 and A.2) does not work;
> ** Approa

Re: [PATCH] Fortran/OpenMP: event handle in task detach cannot be a coarray [PR104131]

2023-10-24 Thread Harald Anlauf

Dear all,

Tobias argued in the PR that the testcase should actually be valid.
Therefore withdrawing the patch.

Sorry for expecting this to be a low-hanging fruit...

Harald

On 10/24/23 22:23, rep.dot@gmail.com wrote:

On 24 October 2023 21:25:01 CEST, Harald Anlauf  wrote:

Dear all,

the attached simple patch adds a forgotten check that an event handle
cannot be a coarray.  This case appears to have been overlooked in the
original fix for this PR.

I intend to commit as obvious within 24h unless there are comments.


diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..08081dacde4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8967,6 +8967,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
else if (omp_clauses->detach->symtree->n.sym->attr.dimension > 0)
gfc_error ("The event handle at %L must not be an array element",
   &omp_clauses->detach->where);
+  else if (omp_clauses->detach->symtree->n.sym->attr.codimension)
+   gfc_error ("The event handle at %L must not be a coarray",

ISTM that we usually do not mention "element" when talking about undue 
(co)array access.

Maybe we want to streamline this specific error message?

LGTM otherwise.
Thanks for your dedication!


+  &omp_clauses->detach->where);
else if (omp_clauses->detach->symtree->n.sym->ts.type == BT_DERIVED
   || omp_clauses->detach->symtree->n.sym->ts.type == BT_CLASS)
gfc_error ("The event handle at %L must not be part of "






Re: [PATCH] c++: error with bit-fields and scoped enums [PR111895]

2023-10-24 Thread Jason Merrill

On 10/24/23 12:18, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we issue a bogus error: invalid operands of types 'unsigned char:2'
and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
type to bool.

In build_static_cast_1, perform_direct_initialization_if_possible returns
NULL_TREE, because the invented declaration T t(e) fails, which is
correct.  So we go down to ocp_convert, which has code to deal with this
case:
   /* We can't implicitly convert a scoped enum to bool, so convert
  to the underlying type first.  */
   if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
 e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
but the SCOPED_ENUM_P is false since intype is .
This could be fixed by using unlowered_expr_type.  But then
c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
unlowered_expr_type is a C++-only function.

Rather than adding a dummy unlowered_expr_type to C, I think we should
follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
to the bit-field and the resulting prvalue is used as the operand of the
static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
will get us an expression whose type is the enum.  (I thought we didn't
need decay_conversion because that does a whole lot more but using it
would make sense to me too.)


It's possible that we might want some of that more, particularly 
mark_rvalue_use; decay_conversion seems like the right answer.  OK with 
that change.


rvalue() would also make sense, though that seems to be missing a call 
to unlowered_expr_type at the moment.  In fact, after "otherwise, it's 
the lvalue-to-rvalue conversion" in decay_conversion should probably just be a 
call to rvalue, with missing bits added to the latter function.


Jason


