Re: [PATCH v3 1/7] Improve outgoing integer argument promotion

2024-11-20 Thread H.J. Lu
On Wed, Nov 20, 2024 at 10:05 PM Richard Biener wrote:

> On Sun, Nov 10, 2024 at 1:55 PM H.J. Lu  wrote:
> >
> > For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to return
> > true, all integer arguments smaller than int are passed as int:
> >
> > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > extern int baz (char c1);
> >
> > int
> > foo (char c1)
> > {
> >   return baz (c1);
> > }
> > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > .file   "x.c"
> > .text
> > .p2align 4
> > .globl  foo
> > .type   foo, @function
> > foo:
> > .LFB0:
> > .cfi_startproc
> > movsbl  4(%esp), %eax
> > movl    %eax, 4(%esp)
> > jmp baz
> > .cfi_endproc
> > .LFE0:
> > .size   foo, .-foo
> > .ident  "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > .section        .note.GNU-stack,"",@progbits
> > [hjl@gnu-tgl-3 pr14907]$
> >
> > But integer promotion:
> >
> > movsbl  4(%esp), %eax
> > movl    %eax, 4(%esp)
> >
> > isn't necessary if incoming arguments and outgoing arguments are the
> > same.  Drop targetm.promote_prototypes from C, C++ and Ada frontends
> > and apply targetm.promote_prototypes during RTL call expansion.
>
> I'm only commenting on the RTL expansion bit below (thanks for doing this
> btw)
>
> > gcc/
> >
> > PR middle-end/14907
> > * calls.cc: Include "ssa.h", "tree-ssa-live.h" and
> > "tree-outof-ssa.h".
> > (get_promoted_int_value_from_ssa_name): New function.
> > (get_promoted_int_value): Likewise.
> > (initialize_argument_information): Call get_promoted_int_value
> > to promote integer function argument.
> > * gimple.cc (gimple_builtin_call_types_compatible_p): Remove the
> > targetm.calls.promote_prototypes call.
> > * tree.cc (tree_builtin_call_types_compatible_p): Likewise.
> >
> > gcc/ada/
> >
> > PR middle-end/14907
> > * gcc-interface/utils.cc (create_param_decl): Remove the
> > targetm.calls.promote_prototypes call.
> >
> > gcc/c/
> >
> > PR middle-end/14907
> > * c-decl.cc (start_decl): Remove the
> > targetm.calls.promote_prototypes call.
> > (store_parm_decls_oldstyle): Likewise.
> > (finish_function): Likewise.
> > * c-typeck.cc (convert_argument): Likewise.
> > (c_safe_arg_type_equiv_p): Likewise.
> >
> > gcc/cp/
> >
> > PR middle-end/14907
> > * call.cc (type_passed_as): Remove the
> > targetm.calls.promote_prototypes call.
> > (convert_for_arg_passing): Likewise.
> > * typeck.cc (cxx_safe_arg_type_equiv_p): Likewise.
> >
> > gcc/testsuite/
> >
> > PR middle-end/14907
> > * gcc.target/i386/pr14907-1.c: New test.
> > * gcc.target/i386/pr14907-2.c: Likewise.
> > * gcc.target/i386/pr14907-3.c: Likewise.
> > * gcc.target/i386/pr14907-4.c: Likewise.
> > * gcc.target/i386/pr14907-5.c: Likewise.
> > * gcc.target/i386/pr14907-6.c: Likewise.
> > * gcc.target/i386/pr14907-7.c: Likewise.
> > * gcc.target/i386/pr14907-8.c: Likewise.
> > * gcc.target/i386/pr14907-9.c: Likewise.
> > * gcc.target/i386/pr14907-10.c: Likewise.
> > * gcc.target/i386/pr14907-11.c: Likewise.
> > * gcc.target/i386/pr14907-12.c: Likewise.
> > * gcc.target/i386/pr14907-13.c: Likewise.
> > * gcc.target/i386/pr14907-14.c: Likewise.
> > * gcc.target/i386/pr14907-15.c: Likewise.
> > * gcc.target/i386/pr14907-16.c: Likewise.
> > * gfortran.dg/pr14907-1.f90: Likewise.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/ada/gcc-interface/utils.cc | 24 ---
> >  gcc/c/c-decl.cc| 40 ---
> >  gcc/c/c-typeck.cc  | 19 ++---
> >  gcc/calls.cc   | 81 ++
> >  gcc/cp/call.cc | 10 ---
> >  gcc/cp/typeck.cc   | 13 ++--
> >  gcc/gimple.cc  | 10 +--
> >  gcc/testsuite/gcc.target/i386/pr14907-1.c  | 21 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-10.c | 23 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-11.c | 12 
> >  gcc/testsuite/gcc.target/i386/pr14907-12.c | 17 +
> >  gcc/testsuite/gcc.target/i386/pr14907-13.c | 12 
> >  gcc/testsuite/gcc.target/i386/pr14907-14.c | 17 +
> >  gcc/testsuite/gcc.target/i386/pr14907-15.c | 26 +++
> >  gcc/testsuite/gcc.target/i386/pr14907-16.c | 24 +++
> >  gcc/testsuite/gcc.target/i386/pr14907-2.c  | 21 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-3.c  | 21 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-4.c  | 21 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-5.c  | 21 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-6.c  | 21 ++
> >  gcc/testsuite/gcc.target/i386/pr14907-7.c  

Re: PING Re: [PATCH] testsuite: add print-stack.exp

2024-11-20 Thread Mike Stump
On Nov 18, 2024, at 1:25 PM, David Malcolm  wrote:
> 
> Ping for this testsuite patch; I've occasionally found it *very*
> helpful when debugging DejaGnu.

Ok. Do put a comment that this is for debugging so no one removes it.



Re: testsuite: m68k: Fix tests for C23

2024-11-20 Thread Jeff Law




On 11/19/24 2:17 AM, Andreas Schwab wrote:

* gcc.target/m68k/crash1.c (seq_printf): Add prototype.
* gcc.target/m68k/pr63347.c (oof): Add missing parameter.
OK.  And similar changes elsewhere are pre-approved.  Though I'd like to
think we're near the point of having caught all the fallout.


jeff



Re: [PATCH] sibcall: Adjust BLKmode argument size for alignment padding

2024-11-20 Thread H.J. Lu
On Wed, Nov 20, 2024 at 9:55 PM Richard Sandiford wrote:

> "H.J. Lu"  writes:
> > On Wed, Nov 20, 2024 at 2:12 AM Richard Sandiford
> >  wrote:
> >>
> >> "H.J. Lu"  writes:
> >> > Adjust BLKmode argument size for parameter alignment for sibcall
> check.
> >> >
> >> > gcc/
> >> >
> >> > PR middle-end/117098
> >> > * calls.cc (store_one_arg): Adjust BLKmode argument size for
> >> > alignment padding for sibcall check.
> >> >
> >> > gcc/testsuite/
> >> >
> >> > PR middle-end/117098
> >> > * gcc.dg/sibcall-12.c: New test.
> >> >
> >> > OK for master?
> >> >
> >> >
> >> > H.J.
> >> > From 8b0518906cb23a9b5e77b04d6132c49047daebd2 Mon Sep 17 00:00:00 2001
> >> > From: "H.J. Lu" 
> >> > Date: Sun, 13 Oct 2024 04:53:14 +0800
> >> > Subject: [PATCH] sibcall: Adjust BLKmode argument size for alignment
> padding
> >> >
> >> > Adjust BLKmode argument size for parameter alignment for sibcall
> check.
> >> >
> >> > gcc/
> >> >
> >> >   PR middle-end/117098
> >> >   * calls.cc (store_one_arg): Adjust BLKmode argument size for
> >> >   alignment padding for sibcall check.
> >> >
> >> > gcc/testsuite/
> >> >
> >> >   PR middle-end/117098
> >> >   * gcc.dg/sibcall-12.c: New test.
> >> >
> >> > Signed-off-by: H.J. Lu 
> >> > ---
> >> >  gcc/calls.cc  |  4 +++-
> >> >  gcc/testsuite/gcc.dg/sibcall-12.c | 13 +
> >> >  2 files changed, 16 insertions(+), 1 deletion(-)
> >> >  create mode 100644 gcc/testsuite/gcc.dg/sibcall-12.c
> >> >
> >> > diff --git a/gcc/calls.cc b/gcc/calls.cc
> >> > index c5c26f65280..163c7e509d9 100644
> >> > --- a/gcc/calls.cc
> >> > +++ b/gcc/calls.cc
> >> > @@ -5236,7 +5236,9 @@ store_one_arg (struct arg_data *arg, rtx
> argblock, int flags,
> >> > /* expand_call should ensure this.  */
> >> > gcc_assert (!arg->locate.offset.var
> >> > && arg->locate.size.var == 0);
> >> > -   poly_int64 size_val = rtx_to_poly_int64 (size_rtx);
> >> > +   /* Adjust for argument alignment padding.  */
> >> > +   poly_int64 size_val = ROUND_UP (UINTVAL (size_rtx),
> >> > +   parm_align /
> BITS_PER_UNIT);
> >>
> >> This doesn't look right to me.  For one thing, going from
> >> rtx_to_poly_int64 to UINTVAL drops support for non-constant parameters.
> >> But even ignoring that, I think padding size_val (the size of arg->value
> >> IIUC) will pessimise the later:
> >>
> >>   else if (maybe_in_range_p (arg->locate.offset.constant,
> >>  i, size_val))
> >> sibcall_failure = true;
> >>
> >> and so cause sibcall failures elsewhere.  I'm also not sure this
> >> accurately reproduces the padding that is added by locate_and_pad_parm
> >> for all cases (arguments that grow up vs down, padding below vs above
> >> the argument).
> >>
> >> AIUI, the point of the:
> >>
> >>   if (known_eq (arg->locate.offset.constant, i))
> >> {
> >>   /* Even though they appear to be at the same location,
> >>  if part of the outgoing argument is in registers,
> >>  they aren't really at the same location.  Check for
> >>  this by making sure that the incoming size is the
> >>  same as the outgoing size.  */
> >>   if (maybe_ne (arg->locate.size.constant, size_val))
> >> sibcall_failure_1 = true;
> >> }
> >
> > Does this
> >
> > diff --git a/gcc/calls.cc b/gcc/calls.cc
> > index 246abe34243..98429cc757f 100644
> > --- a/gcc/calls.cc
> > +++ b/gcc/calls.cc
> > @@ -5327,7 +5327,13 @@ store_one_arg (struct arg_data *arg, rtx
> > argblock, int flags,
> >they aren't really at the same location.  Check for
> >this by making sure that the incoming size is the
> >same as the outgoing size.  */
> > -   if (maybe_ne (arg->locate.size.constant, size_val))
> > +   poly_int64 aligned_size;
> > +   if (CONST_INT_P (size_rtx))
> > + aligned_size = ROUND_UP (UINTVAL (size_rtx),
> > +   parm_align / BITS_PER_UNIT);
> > +   else
> > + aligned_size = size_val;
> > +   if (maybe_ne (arg->locate.size.constant, aligned_size))
> >   sibcall_failure = true;
> >   }
> >  else if (maybe_in_range_p (arg->locate.offset.constant,
> >
> > look correct?
>
> Heh.  Playing the reviewer here, I was kind-of hoping you'd explain
> why it was correct to me :)
>
> But conceptually, the call is copying from arg->value to arg->locate.
> And this code is trying to detect whether the copy is a nop, whether
> it overlaps, or whether the source and destination are distinct.
>
> It feels odd to grow the size of arg->value (the source of the copy),
> given that the extra bytes shouldn't be copied.
>
> Like I mentioned in the previous reply, it feels to me like testing
> partial != 0 would 

[PATCH] testsuite: i386: Fix gcc.target/i386/pr117232-1.c etc. with Solaris as

2024-11-20 Thread Rainer Orth
Two tests FAIL on Solaris/x86 with the native assembler:

FAIL: gcc.target/i386/pr117232-1.c scan-assembler-times (?n)cmovn?c 7
FAIL: gcc.target/i386/pr117232-apx-1.c scan-assembler-times (?n)cmovn?c 7

The problem is that as expects a slightly different insn syntax, e.g.

cmovl.nc        %esi, %eax

instead of

cmovnc  %esi, %eax

This patch allows for both forms.

Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-11-15  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/pr117232-1.c (scan-assembler-times): Allow for
cmovl.nc etc.
* gcc.target/i386/pr117232-apx-1.c: Likewise.

# HG changeset patch
# Parent  1224ef84cc8de0bdc3083f6a47ed7979908a4afb
testsuite: i386: Fix gcc.target/i386/pr117232-1.c etc. with Solaris as

diff --git a/gcc/testsuite/gcc.target/i386/pr117232-1.c b/gcc/testsuite/gcc.target/i386/pr117232-1.c
--- a/gcc/testsuite/gcc.target/i386/pr117232-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr117232-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512bw -mavx512vl -mavx512dq -O2" } */
 /* { dg-final { scan-assembler-times {(?n)kortest[bwqd]} 7 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times {(?n)cmovn?c} 7 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {(?n)cmov([lq]\.)?n?c} 7 { target { ! ia32 } } } } */
 
 #include 
 int
diff --git a/gcc/testsuite/gcc.target/i386/pr117232-apx-1.c b/gcc/testsuite/gcc.target/i386/pr117232-apx-1.c
--- a/gcc/testsuite/gcc.target/i386/pr117232-apx-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr117232-apx-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-mavx512bw -mavx512vl -mavx512dq -mapxf -O2" } */
 /* { dg-final { scan-assembler-times {(?n)kortest[bwqd]} 7 } } */
-/* { dg-final { scan-assembler-times {(?n)cmovn?c} 7 } } */
+/* { dg-final { scan-assembler-times {(?n)cmov([lq]\.)?n?c} 7 } } */
 
 #include 
 


Re: [PATCH] doc/cpp: Document __has_include_next

2024-11-20 Thread Joseph Myers
On Fri, 18 Oct 2024, Arsen Arsenović wrote:

> -The @code{__has_include} operator by itself, without any @var{operand} or
> -parentheses, acts as a predefined macro so that support for it can be tested
> -in portable code.  Thus, the recommended use of the operator is as follows:
> +The @code{__has_include} and @code{__has_include_next} operators by
> +themselves, without any @var{operand} or parentheses, acts as a
> +predefined macro so that support for it can be tested in portable code.
> +Thus, the recommended use of the operators is as follows:

Some things need updating for the plural: "*act* as predefined *macros* so 
that support for *them* can be tested" (not "acts", "macro", "it", and 
remove "a" before "predefined").

OK with those fixes.

-- 
Joseph S. Myers
josmy...@redhat.com
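
For illustration, the recommended-use pattern being documented looks roughly
like the following sketch when written inside a header (the header name is a
placeholder, not taken from the patch):

/* Test for the operator itself first, so the code stays portable to
   preprocessors that do not provide __has_include_next.  */
#if defined __has_include_next
# if __has_include_next (<other-header.h>)
#  include_next <other-header.h>
# endif
#endif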

Re: [PATCH] v2: Add -f{, no-}assume-sane-operators-new-delete options [PR110137]

2024-11-20 Thread Jan Hubicka
> On Tue, Nov 19, 2024 at 11:23:31AM +0100, Jakub Jelinek wrote:
> > On Tue, Nov 19, 2024 at 10:25:16AM +0100, Richard Biener wrote:
> > > I think it's pretty clear and easy to describe to users what "m " and 
> > > what "mC" do.  But with "pure" this is an odd intermediate state.  For 
> > > both
> > > "m " and "mP" you suggest above the new/delete might modify their
> > > global state but as you can't rely on the new/delete pair to prevail
> > > you cannot rely on the modification to happen.  But how do you explain
> > > that
> > 
> > If we are willing to make the default not strictly conforming (i.e.
> > basically revert PR101480 by default and make the GCC 11.1/11.2 behavior
> > the default and allow -fno-sane-operators-new-delete to change to GCC
> > 11.3/14.* behavior), I can live with it.
> > But we need to make the documentation clear that the default is not strictly
> > conforming.
> 
> Here is a modified version of the patch to do that.
> 
> Or do we want to set the default based on -std= option (-std=gnu* implies
> -fassume-sane-operators-new-delete, -std=c++* implies
> -fno-assume-sane-operators-new-delete)?  Though, not sure what to do for
> LTO then.

My original plan was to add a "sane" attribute to the declarations and
prevent them from being merged.  Then every direct call to new/delete
would know whether it came from a sane or insane translation unit.

Alternatively one can also declare
 +C++ ObjC++ LTO Var(flag_assume_sane_operators_new_delete) Init(1)
 +Assume C++ replaceable global operators new, new[], delete, delete[] don't
read or write visible global state.
with the Optimization keyword.  Then sanity would be function-specific.

inline_call contains code that drops flag_strict_aliasing for a function
when it inlines a -fno-strict-aliasing function into a -fstrict-aliasing one.
At the same place we can make new/delete operator insanity similarly
contagious: if you inline a function that has insane new/delete calls, you
make the combined function insane as well.

Honza



Re: [PATCH] bitintlower: Handle EXACT_DIV_EXPR like TRUNC_DIV_EXPR in bitint lowering [PR117571]

2024-11-20 Thread Richard Biener
On Tue, 19 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> r15-4601 added match.pd simplification of some TRUNC_DIV_EXPR expressions
> into EXACT_DIV_EXPR, so bitintlower can now encounter even those.
> From the bitint lowering POV the fact that the division will be exact
> doesn't matter: we still need to call the __divmodbitint4 API at runtime,
> and knowing the division is exact wouldn't simplify anything there even if
> we duplicated that API, so the following patch lowers EXACT_DIV_EXPR exactly
> as TRUNC_DIV_EXPR.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> I think we don't need to backport this unless something introduces
> EXACT_DIV_EXPR on BITINT_TYPEd expressions on the 14 branch as well.

Agreed.

Richard.

> 2024-11-19  Jakub Jelinek  
> 
>   PR middle-end/117571
>   * gimple-lower-bitint.cc (bitint_large_huge::lower_muldiv_stmt,
>   bitint_large_huge::lower_stmt, stmt_needs_operand_addr,
>   build_bitint_stmt_ssa_conflicts, gimple_lower_bitint): Handle
>   EXACT_DIV_EXPR like TRUNC_DIV_EXPR.
> 
>   * gcc.dg/bitint-114.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-10-24 18:53:39.263072751 +0200
> +++ gcc/gimple-lower-bitint.cc2024-11-18 15:37:35.344738882 +0100
> @@ -3597,6 +3597,7 @@ bitint_large_huge::lower_muldiv_stmt (tr
>insert_before (g);
>break;
>  case TRUNC_DIV_EXPR:
> +case EXACT_DIV_EXPR:
>g = gimple_build_call_internal (IFN_DIVMODBITINT, 8,
> lhs, build_int_cst (sitype, prec),
> null_pointer_node,
> @@ -5560,6 +5561,7 @@ bitint_large_huge::lower_stmt (gimple *s
>   return;
> case MULT_EXPR:
> case TRUNC_DIV_EXPR:
> +   case EXACT_DIV_EXPR:
> case TRUNC_MOD_EXPR:
>   lower_muldiv_stmt (lhs, g);
>   goto handled;
> @@ -5694,6 +5696,7 @@ bitint_large_huge::lower_stmt (gimple *s
>   return;
>case MULT_EXPR:
>case TRUNC_DIV_EXPR:
> +  case EXACT_DIV_EXPR:
>case TRUNC_MOD_EXPR:
>   lower_muldiv_stmt (NULL_TREE, stmt);
>   return;
> @@ -5740,6 +5743,7 @@ stmt_needs_operand_addr (gimple *stmt)
>{
>case MULT_EXPR:
>case TRUNC_DIV_EXPR:
> +  case EXACT_DIV_EXPR:
>case TRUNC_MOD_EXPR:
>case FLOAT_EXPR:
>   return true;
> @@ -5931,6 +5935,7 @@ build_bitint_stmt_ssa_conflicts (gimple
>   {
>   case MULT_EXPR:
>   case TRUNC_DIV_EXPR:
> + case EXACT_DIV_EXPR:
>   case TRUNC_MOD_EXPR:
> muldiv_p = true;
>   default:
> @@ -6174,6 +6179,7 @@ gimple_lower_bitint (void)
>   break;
> case MULT_EXPR:
> case TRUNC_DIV_EXPR:
> +   case EXACT_DIV_EXPR:
> case TRUNC_MOD_EXPR:
>   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (s))
> {
> @@ -6455,6 +6461,7 @@ gimple_lower_bitint (void)
>   switch (gimple_assign_rhs_code (use_stmt))
> {
> case TRUNC_DIV_EXPR:
> +   case EXACT_DIV_EXPR:
> case TRUNC_MOD_EXPR:
> case FLOAT_EXPR:
>   /* For division, modulo and casts to floating
> @@ -6568,6 +6575,7 @@ gimple_lower_bitint (void)
> case RSHIFT_EXPR:
> case MULT_EXPR:
> case TRUNC_DIV_EXPR:
> +   case EXACT_DIV_EXPR:
> case TRUNC_MOD_EXPR:
> case FIX_TRUNC_EXPR:
> case REALPART_EXPR:
> --- gcc/testsuite/gcc.dg/bitint-114.c.jj  2024-11-18 15:35:20.374624506 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-114.c 2024-11-18 15:35:55.651131671 +0100
> @@ -0,0 +1,23 @@
> +/* PR middle-end/117571 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 255
> +_BitInt(255) b;
> +
> +_BitInt(255)
> +foo ()
> +{
> +  return (b << 10) / 2;
> +}
> +#endif
> +
> +#if __BITINT_MAXWIDTH__ >= 8192
> +_BitInt(8192) c;
> +
> +_BitInt(8192)
> +bar ()
> +{
> +  return (c << 1039) / 
> 0x2000wb;
> +}
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] expr, c, gimplify, v3: Don't clear whole unions [PR116416]

2024-11-20 Thread Joseph Myers
On Wed, 20 Nov 2024, Jakub Jelinek wrote:

> On Tue, Nov 19, 2024 at 11:08:03PM +, Joseph Myers wrote:
> > > --- gcc/testsuite/gcc.dg/gnu11-empty-init-1.c.jj  2024-10-15 
> > > 16:14:23.411063701 +0200
> > > +++ gcc/testsuite/gcc.dg/gnu11-empty-init-1.c 2024-10-15 
> > > 16:31:02.302984714 +0200
> > > @@ -0,0 +1,199 @@
> > > +/* Test GNU C11 support for empty initializers.  */
> > > +/* { dg-do run } */
> > > +/* { dg-options "-std=gnu23" } */
> > 
> > All these gnu11-*.c tests are using -std=gnu23, which doesn't make sense.  
> > If they're meant to test what GCC does in C11 mode, use -std=gnu11; if 
> > they're meant to use -std=gnu23, name them gnu23-*.c.  (In either case, 
> > the tests might, as now, also have -fzero-init-padding-bits= options when 
> > that's part of what they're meant to test.)
> 
> Oops, sorry, good catch.
> Yes, all tests were meant to use -std=gnu11.  Here is an updated patch after
> sed -i -e s/-std=gnu23/-std=gnu11/ gcc/testsuite/gcc.dg/gnu11-empty-init*.c
> The tests still pass.  Note, -std=gnu23 instead of -std=gnu11 perhaps just
> clears some more padding bits in some places, but the tests are actually
> just testing when (my reading of) the C23 standard or these new options
> imply the padding bits should be zero; the tests actually don't check that
> those bits aren't zero otherwise, as that is UB and the memset from some
> other call to -1 might not keep everything still non-zero; and testing
> e.g. gimple dumps that the zeroing doesn't occur isn't bullet proof either,
> as the gimplifier in various cases just for optimization purposes decides
> to zero anyway.
> 
> Smoke tested so far, will do full bootstrap/regtest momentarily.

The C front-end / testsuite changes in the revised patch are OK 
(c23-empty-init-4.c will need renaming since I've added another test with 
that name today).

-- 
Joseph S. Myers
josmy...@redhat.com
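
For readers skimming the thread, a minimal sketch of the construct these
tests exercise (the union and member names are made up; whether the bytes
outside the initialized member end up zeroed is exactly what the C23 wording
and the new -fzero-init-padding-bits= option decide):

/* Does the empty initializer zero all bytes of the union, or only the
   bytes of its first member?  */
union u
{
  char c;
  long long ll;
};

void
f (void)
{
  union u x = {};   /* GNU C11 / C23 empty initializer */
  (void) x;
}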



Re: [PATCH] c: Add u{,l,ll,imax}abs builtins [PR117024]

2024-11-20 Thread Joseph Myers
On Wed, 16 Oct 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds u{,l,ll,imax}abs builtins, which just fold
> to ABSU_EXPR, similarly to how {,l,ll,imax}abs builtins fold to
> ABS_EXPR.
> 
> Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest
> on x86_64-linux and i686-linux?

OK.

-- 
Joseph S. Myers
josmy...@redhat.com
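
For illustration, a hypothetical usage sketch; the spelling below assumes the
usual __builtin_ prefix for the new u{,l,ll,imax}abs builtins, so check the
patch itself for the authoritative names:

/* Folds to ABSU_EXPR: the result type is unsigned, so the absolute
   value is well defined even for INT_MIN.  */
unsigned int
uabs_example (int x)
{
  return __builtin_uabs (x);
}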



Re: [PATCH] libcpp: Fix ICE lexing invalid raw string in a deferred pragma [PR117118]

2024-11-20 Thread Joseph Myers
On Wed, 16 Oct 2024, Lewis Hyatt wrote:

> Hello-
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117118
> 
> This fixes an old regression from GCC 11. Is it OK for trunk and all
> backports please? Bootstrap + regtested all languages on x86-64 Linux.
> Thanks!
> 
> -Lewis
> 
> -- >8 --
> 
> The PR shows that we ICE after lexing an invalid unterminated raw string,
> because lex_raw_string() pops the main buffer unexpectedly. Resolve by
> handling this case the same way as for other directives.
> 
> libcpp/ChangeLog:
>   PR preprocessor/117118
>   * lex.cc (lex_raw_string): Treat an unterminated raw string the same
>   way for a deferred pragma as is done for other directives.
> 
> gcc/testsuite/ChangeLog:
>   PR preprocessor/117118
>   * c-c++-common/raw-string-directive-3.c: New test.
>   * c-c++-common/raw-string-directive-4.c: New test.

OK in the absence of C++ maintainer objections within 48 hours.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v3 6/7] scev-cast.c: Adjusted

2024-11-20 Thread H.J. Lu
On Wed, Nov 20, 2024 at 10:24 PM Richard Biener wrote:

> On Sun, Nov 10, 2024 at 1:56 PM H.J. Lu  wrote:
> >
> > Since the C frontend no longer promotes char arguments, adjust
> > scev-cast.c.
>
> I wonder whether the adjusted testcase would pass now already for
> !PROMOTE_PROTOTYPE
> targets and thus whether the { target i?86-*-* x86_64-*-* } is still
> necessary after the change?
>

I will check.


> > PR middle-end/14907
> > * gcc.dg/tree-ssa/scev-cast.c: Adjusted.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> b/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> > index c569523ffa7..1a3c150a884 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> > @@ -22,6 +22,6 @@ void tst(void)
> >  blau ((unsigned char) i);
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "& 255" 1 "optimized" } } */
> > -/* { dg-final { scan-tree-dump-times "= \\(signed char\\)" 1
> "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\(unsigned char\\)" 2
> "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\(signed char\\)" 3
> "optimized" } } */
> >
> > --
> > 2.47.0
> >
>


-- 
H.J.


Re: [PATCH] Fortran: fix checking of protected variables in submodules [PR83135]

2024-11-20 Thread Harald Anlauf

On 20.11.24 at 22:36, Jerry D wrote:

On 11/20/24 1:08 PM, Harald Anlauf wrote:

Dear all,

the attached, actually rather straightforward patch fixes the checking of
protected variables in submodules.  When a variable was use-associated
in an ancestor module, we failed to properly diagnose this.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald



Yes, looks good to go.


Pushed as r15-5533-g3c130e410ac45d.

Thanks for the review!

Harald


Jerry






Re: [PATCH v3 2/7] Add expand_promote_outgoing_argument

2024-11-20 Thread H.J. Lu
On Wed, Nov 20, 2024 at 10:18 PM Richard Biener wrote:

> On Sun, Nov 10, 2024 at 1:55 PM H.J. Lu  wrote:
> >
> > Since the C/C++/Ada frontends no longer promote integer arguments smaller
> > than int, add expand_promote_outgoing_argument to promote them when
> > expanding builtin functions.
>
> I wonder if we should instead handle this in the generic builtin expansion
> code?  Otherwise we'd need to fix all targets similarly?
>

This is for

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117547

I think it should be fixed differently in the x86 backend.


> Richard.
>
> > PR middle-end/14907
> > * expr.cc (expand_promote_outgoing_argument): New function.
> > * expr.h (expand_promote_outgoing_argument): New prototype.
> > * config/i386/i386-expand.cc (ix86_expand_binop_builtin): Call
> > expand_promote_outgoing_argument to expand the outgoing
> > argument.
> > (ix86_expand_multi_arg_builtin): Likewise.
> > (ix86_expand_unop_vec_merge_builtin): Likewise.
> > (ix86_expand_sse_compare): Likewise.
> > (ix86_expand_sse_comi): Likewise.
> > (ix86_expand_sse_round): Likewise.
> > (ix86_expand_sse_round_vec_pack_sfix): Likewise.
> > (ix86_expand_sse_ptest): Likewise.
> > (ix86_expand_sse_pcmpestr): Likewise.
> > (ix86_expand_sse_pcmpistr): Likewise.
> > (ix86_expand_args_builtin): Likewise.
> > (ix86_expand_sse_comi_round): Likewise.
> > (ix86_expand_round_builtin): Likewise.
> > (ix86_expand_special_args_builtin): Likewise.
> > (ix86_expand_vec_init_builtin): Likewise.
> > (ix86_expand_vec_ext_builtin): Likewise.
> > (ix86_expand_builtin): Likewise.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/config/i386/i386-expand.cc | 244 -
> >  gcc/expr.cc|  18 +++
> >  gcc/expr.h |   1 +
> >  3 files changed, 141 insertions(+), 122 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-expand.cc
> b/gcc/config/i386/i386-expand.cc
> > index 5c4a8e07d62..ce887d96f6a 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -10415,8 +10415,8 @@ ix86_expand_binop_builtin (enum insn_code icode,
> tree exp, rtx target)
> >rtx pat;
> >tree arg0 = CALL_EXPR_ARG (exp, 0);
> >tree arg1 = CALL_EXPR_ARG (exp, 1);
> > -  rtx op0 = expand_normal (arg0);
> > -  rtx op1 = expand_normal (arg1);
> > +  rtx op0 = expand_promote_outgoing_argument (arg0);
> > +  rtx op1 = expand_promote_outgoing_argument (arg1);
> >machine_mode tmode = insn_data[icode].operand[0].mode;
> >machine_mode mode0 = insn_data[icode].operand[1].mode;
> >machine_mode mode1 = insn_data[icode].operand[2].mode;
> > @@ -10564,7 +10564,7 @@ ix86_expand_multi_arg_builtin (enum insn_code
> icode, tree exp, rtx target,
> >for (i = 0; i < nargs; i++)
> >  {
> >tree arg = CALL_EXPR_ARG (exp, i);
> > -  rtx op = expand_normal (arg);
> > +  rtx op = expand_promote_outgoing_argument (arg);
> >int adjust = (comparison_p) ? 1 : 0;
> >machine_mode mode = insn_data[icode].operand[i+adjust+1].mode;
> >
> > @@ -10691,7 +10691,7 @@ ix86_expand_unop_vec_merge_builtin (enum
> insn_code icode, tree exp,
> >  {
> >rtx pat;
> >tree arg0 = CALL_EXPR_ARG (exp, 0);
> > -  rtx op1, op0 = expand_normal (arg0);
> > +  rtx op1, op0 = expand_promote_outgoing_argument (arg0);
> >machine_mode tmode = insn_data[icode].operand[0].mode;
> >machine_mode mode0 = insn_data[icode].operand[1].mode;
> >
> > @@ -10727,8 +10727,8 @@ ix86_expand_sse_compare (const struct
> builtin_description *d,
> >rtx pat;
> >tree arg0 = CALL_EXPR_ARG (exp, 0);
> >tree arg1 = CALL_EXPR_ARG (exp, 1);
> > -  rtx op0 = expand_normal (arg0);
> > -  rtx op1 = expand_normal (arg1);
> > +  rtx op0 = expand_promote_outgoing_argument (arg0);
> > +  rtx op1 = expand_promote_outgoing_argument (arg1);
> >rtx op2;
> >machine_mode tmode = insn_data[d->icode].operand[0].mode;
> >machine_mode mode0 = insn_data[d->icode].operand[1].mode;
> > @@ -10823,8 +10823,8 @@ ix86_expand_sse_comi (const struct
> builtin_description *d, tree exp,
> >rtx pat, set_dst;
> >tree arg0 = CALL_EXPR_ARG (exp, 0);
> >tree arg1 = CALL_EXPR_ARG (exp, 1);
> > -  rtx op0 = expand_normal (arg0);
> > -  rtx op1 = expand_normal (arg1);
> > +  rtx op0 = expand_promote_outgoing_argument (arg0);
> > +  rtx op1 = expand_promote_outgoing_argument (arg1);
> >enum insn_code icode = d->icode;
> >const struct insn_data_d *insn_p = &insn_data[icode];
> >machine_mode mode0 = insn_p->operand[0].mode;
> > @@ -10916,7 +10916,7 @@ ix86_expand_sse_round (const struct
> builtin_description *d, tree exp,
> >  {
> >rtx pat;
> >tree arg0 = CALL_EXPR_ARG (exp, 0);
> > -  rtx op1, op0 = expand_normal (arg0);
> > +  rtx op1, op0 = expand_promote_outgoing_argument (arg0);

Re: [PATCH] json parsing: avoid relying on floating point equality [PR117677]

2024-11-20 Thread H.J. Lu
On Thu, Nov 21, 2024 at 4:52 AM David Malcolm  wrote:

> Sorry about the breakage.
>
> I wasn't able to reproduce the failures myself, but the following
> patch seems plausible as a fix; does it fix the affected
> configurations?
>

It fixed bootstrap on Linux/i686 for me.

Thanks.


>
> gcc/ChangeLog:
> PR bootstrap/117677
> * json-parsing.cc (selftest::test_parse_number): Replace
> ASSERT_EQ of 'double' values with ASSERT_NEAR.  Eliminate
> ASSERT_PRINT_EQ for such values.
> * selftest.h (ASSERT_NEAR): New.
> (ASSERT_NEAR_AT): New.
>
> Signed-off-by: David Malcolm 
> ---
>  gcc/json-parsing.cc |  9 +++--
>  gcc/selftest.h  | 20 
>  2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/json-parsing.cc b/gcc/json-parsing.cc
> index 78188c4fef9c..457d78f97cfa 100644
> --- a/gcc/json-parsing.cc
> +++ b/gcc/json-parsing.cc
> @@ -2028,8 +2028,7 @@ test_parse_number ()
>  ASSERT_EQ (tc.get_error (), nullptr);
>  const json::value *jv = tc.get_value ();
>  ASSERT_EQ (JSON_FLOAT, jv->get_kind ());
> -ASSERT_EQ (3.141, ((const json::float_number *)jv)->get ());
> -ASSERT_PRINT_EQ (*jv, true, "3.141");
> +ASSERT_NEAR (3.141, ((const json::float_number *)jv)->get (), 0.001);
>  auto range = tc.get_range_for_value (jv);
>  ASSERT_TRUE (range);
>  ASSERT_RANGE_EQ (*range,
> @@ -2044,8 +2043,7 @@ test_parse_number ()
>ASSERT_EQ (tc.get_error (), nullptr);
>const json::value *jv = tc.get_value ();
>ASSERT_EQ (jv->get_kind (), JSON_FLOAT);
> -  ASSERT_EQ (as_a  (jv)->get (), 3.141);
> -  ASSERT_PRINT_EQ (*jv, true, "3.141");
> +  ASSERT_NEAR (as_a  (jv)->get (), 3.141,
> 0.1);
>auto range = tc.get_range_for_value (jv);
>ASSERT_TRUE (range);
>ASSERT_RANGE_EQ (*range,
> @@ -2070,8 +2068,7 @@ test_parse_number ()
>ASSERT_EQ (tc.get_error (), nullptr);
>const json::value *jv = tc.get_value ();
>ASSERT_EQ (jv->get_kind (), JSON_FLOAT);
> -  ASSERT_EQ (as_a  (jv)->get (), 4.2);
> -  ASSERT_PRINT_EQ (*jv, true, "4.2");
> +  ASSERT_NEAR (as_a  (jv)->get (), 4.2,
> 0.1);
>auto range = tc.get_range_for_value (jv);
>ASSERT_TRUE (range);
>ASSERT_RANGE_EQ (*range,
> diff --git a/gcc/selftest.h b/gcc/selftest.h
> index c6206e55428d..500095da79ca 100644
> --- a/gcc/selftest.h
> +++ b/gcc/selftest.h
> @@ -338,6 +338,26 @@ extern int num_passes;
>  ::selftest::fail ((LOC), desc_);  \
>SELFTEST_END_STMT
>
> +/* Evaluate VAL1 and VAL2 and compare them, calling
> +   ::selftest::pass if they are within ABS_ERROR of each other,
> +   ::selftest::fail if they are not.  */
> +
> +#define ASSERT_NEAR(VAL1, VAL2, ABS_ERROR) \
> +  ASSERT_NEAR_AT ((SELFTEST_LOCATION), (VAL1), (VAL2), (ABS_ERROR))
> +
> +/* Like ASSERT_NEAR, but treat LOC as the effective location of the
> +   selftest.  */
> +
> +#define ASSERT_NEAR_AT(LOC, VAL1, VAL2, ABS_ERROR)\
> +  SELFTEST_BEGIN_STMT \
> +  const char *desc_ = "ASSERT_NEAR (" #VAL1 ", " #VAL2 ", " #ABS_ERROR
> ")"; \
> +  double error = fabs ((VAL1) - (VAL2));   \
> +  if (error < (ABS_ERROR)) \
> +::selftest::pass ((LOC), desc_);   \
> +  else \
> +::selftest::fail ((LOC), desc_);   \
> +  SELFTEST_END_STMT
> +
>  /* Evaluate VAL1 and VAL2 and compare them with known_eq, calling
> ::selftest::pass if they are always equal,
> ::selftest::fail if they might be non-equal.  */
> --
> 2.26.3
>
>

-- 
H.J.


Re: [PATCH] libgccjit: Add support for machine-dependent builtins

2024-11-20 Thread Mark Wielaard
Hi Antoni,

On Wed, Nov 20, 2024 at 11:11:01AM -0500, Antoni Boucher wrote:
> From what I understand, pull requests on forge.sourceware.org can be
> removed at any time, so I could lose track of the status of my
> patches.

It is an experiment, and the experiment could fail for various
reasons. At that point we could decide to just throw everything
away. But we wouldn't do that randomly and I think people are willing
to let the experiment run for at least a year before deciding it does
or doesn't work. And we would of course give people the chance to
migrate the work they want to preserve somewhere else (forgejo has
good import/export to various other forges).

We could also decide the current setup is not good (and admittedly the
-mirror/-test thing is a little odd) and change those names and/or
resetup those repos.

But interestingly it seems that wouldn't impact your workflow, which I
hadn't even thought was possible. But I just tried it on our forgejo
setup and of course it works: you can open a pull request in your own
fork from one branch to another.

Seeing this already taught me something I didn't know was possible or
useful. But I can totally see now how these "self pull requests" help
someone keep track of their work.

> I really like forgejo and use it for some of my personal projects.
> If you still think there would be benefit in me sending patches to
> forge.sourceware.org, please tell me and I'll try.

If another developer/maintainer like David is happy to try what you
already have been doing through github I think it would be
useful. Even if it doesn't work out for you that would be very
valuable feedback.

I do have to note that there are people a little nervous about reviews
completely "bypassing" the mailing lists. But that would be even more of
a concern with using github for this.

Cheers,

Mark


Re: [PATCH v2 2/3] cfgexpand: Rewrite add_scope_conflicts_2 to use cache and look back further [PR111422]

2024-11-20 Thread Richard Biener
On Sat, Nov 16, 2024 at 5:25 AM Andrew Pinski  wrote:
>
> After fixing loop-im to do the correct overflow rewriting
> for pointer types too, we end up with code like:
> ```
> _9 = (unsigned long) &g;
> _84 = _9 + 18446744073709551615;
> _11 = _42 + _84;
> _44 = (signed char *) _11;
> ...
> *_44 = 10;
> g ={v} {CLOBBER(eos)};
> ...
> n[0] = &f;
> *_44 = 8;
> g ={v} {CLOBBER(eos)};
> ```
>
> This was not being recognized by the scope conflicts code, because it
> only handled one-level walk-backs rather than multiple ones.
> This fixes the issue by having a cache which records all references to
> addresses of stack variables.
>
> Unlike the previous patch, this only records and looks at addresses of stack 
> variables.
> The cache uses a bitmap and uses the index as the bit to look at.
>
> PR middle-end/117426
> PR middle-end/111422
> gcc/ChangeLog:
>
> * cfgexpand.cc (struct vars_ssa_cache): New class.
> (vars_ssa_cache::vars_ssa_cache): New constructor.
> (vars_ssa_cache::~vars_ssa_cache): New destructor.
> (vars_ssa_cache::create): New method.
> (vars_ssa_cache::exists): New method.
> (vars_ssa_cache::add_one): New method.
> (vars_ssa_cache::update): New method.
> (vars_ssa_cache::dump): New method.
> (add_scope_conflicts_2): Factor mostly out to
> vars_ssa_cache::operator(). New cache argument.
> Walk the bitmap cache for the stack variables addresses.
> (vars_ssa_cache::operator()): New method factored out from
> add_scope_conflicts_2. Rewrite to be a full walk of all operands
> and use a worklist.
> (add_scope_conflicts_1): Add cache new argument for the addr cache.
> Just call add_scope_conflicts_2 for the phi result instead of calling
> for the uses and don't call walk_stmt_load_store_addr_ops for phis.
> Update call to add_scope_conflicts_2 to add cache argument.
> (add_scope_conflicts): Add cache argument and update calls to
> add_scope_conflicts_1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr117426-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/cfgexpand.cc  | 292 +++---
>  gcc/testsuite/gcc.dg/torture/pr117426-1.c |  53 
>  2 files changed, 308 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr117426-1.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index b88e8827667..841d3c1254e 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -585,35 +585,243 @@ visit_conflict (gimple *, tree op, tree, void *data)
>return false;
>  }
>
> -/* Helper function for add_scope_conflicts_1.  For USE on
> -   a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to be
> -   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  */
> +/* A cache for ssa name to address of stack variables.
> +   When taking into account if a ssa name refers to an
> +   address of a stack variable, we need to walk the
> +   expressions backwards to find the addresses. This
> +   cache is there so we don't need to walk the expressions
> +   all the time.  */
> +struct vars_ssa_cache
> +{
> +private:
> +  /* Currently an entry is a bitmap of all of the known stack variables
> + addresses that are referenced by the ssa name.
> + When the bitmap is the nullptr, then there is no cache.
> + Currently only empty bitmaps are shared.
> + The reason for why empty cache is not just a null is so we know the
> + cache for an entry is filled in.  */
> +  struct entry
> +  {
> +bitmap bmap = nullptr;
> +  };
> +  entry *vars_ssa_caches;
> +public:
>
> -static inline void
> -add_scope_conflicts_2 (tree use, bitmap work,
> -  walk_stmt_load_store_addr_fn visit)
> +  vars_ssa_cache();
> +  ~vars_ssa_cache();
> +  const_bitmap operator() (tree name);
> +  void dump (FILE *file);
> +
> +private:
> +  /* Can't copy. */
> +  vars_ssa_cache(const vars_ssa_cache&) = delete;
> +  vars_ssa_cache(vars_ssa_cache&&) = delete;
> +
> +  /* The shared empty bitmap.  */
> +  bitmap empty;
> +
> +  /* Unshare the index, currently only need
> + to unshare if the entry was empty. */
> +  void unshare(int indx)
> +  {
> +if (vars_ssa_caches[indx].bmap == empty)
> +   vars_ssa_caches[indx].bmap = BITMAP_ALLOC (&stack_var_bitmap_obstack);
> +  }
> +  void create (tree);
> +  bool exists (tree use);
> +  void add_one (tree old_name, unsigned);
> +  bool update (tree old_name, tree use);
> +};
> +
> +/* Constructor of the cache, create the cache array. */
> +vars_ssa_cache::vars_ssa_cache ()
> +{
> +  vars_ssa_caches = new entry[num_ssa_names]{};
> +
> +  /* Create the shared empty bitmap too. */
> +  empty = BITMAP_ALLOC (&stack_var_bitmap_obstack);
> +}
> +
> +/* Delete the array. The bitmaps will be freed
> +   when stack_var_bitmap_obstack is freed.  */
> +vars_ssa_cache::~vars_ssa_

Re: [PATCH v3 7/7] ssa-fre-4.c: Skip for all targets

2024-11-20 Thread H.J. Lu
On Wed, Nov 20, 2024 at 10:27 PM Richard Biener wrote:

> On Sun, Nov 10, 2024 at 1:56 PM H.J. Lu  wrote:
> >
> > Since the C frontend no longer promotes char arguments, ssa-fre-4.c will
> > fail for all targets.  Skip it for all targets.
>
> Maybe instead do
>
> /* { dg-final { scan-tree-dump-not " = \\\(\[^)\]*\\\)" "fre1" } } */
>
> thus verify there are no casts in the IL for all targets?  Or simply
> remove the test,
>

I will check.

Thanks.


> skipping for all targets doesn't make much sense.
>
> Richard.
>
> > PR middle-end/14907
> > * gcc.dg/tree-ssa/ssa-fre-4.c: Skip for all targets.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
> > index 5a7588febaa..07d4d81996a 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
> > @@ -1,6 +1,6 @@
> > -/* If the target returns false for TARGET_PROMOTE_PROTOTYPES, then there
> > -   will be no casts for FRE to eliminate and the test will fail.  */
> > -/* { dg-do compile { target i?86-*-* x86_64-*-* hppa*-*-* m68k*-*-* } }
> */
> > +/* Since the C frontend no longer promotes char argument, there will be
> > +   no casts for FRE to eliminate and the test will fail.  */
> > +/* { dg-do compile { target !*-*-* } } */
> >  /* { dg-options "-O -fno-tree-ccp -fno-tree-forwprop
> -fdump-tree-fre1-details" } */
> >
> >  /* From PR21608.  */
> > --
> > 2.47.0
> >
>


-- 
H.J.


[PATCH v6] forwprop: Try to blend two isomorphic VEC_PERM sequences

2024-11-20 Thread Christoph Müllner
This extends forwprop by yet another VEC_PERM optimization:
It attempts to blend two isomorphic vector sequences by using the
redundancy in the lane utilization in these sequences.
This redundancy in lane utilization comes from the way specific scalar
statements end up vectorized: two VEC_PERMs on top, binary operations
on both of them, and a final VEC_PERM to create the result.
Here is an example of this sequence:

  v_in = {e0, e1, e2, e3}
  v_1 = VEC_PERM 
  // v_1 = {e0, e2, e0, e2}
  v_2 = VEC_PERM 
  // v_2 = {e1, e3, e1, e3}

  v_x = v_1 + v_2
  // v_x = {e0+e1, e2+e3, e0+e1, e2+e3}
  v_y = v_1 - v_2
  // v_y = {e0-e1, e2-e3, e0-e1, e2-e3}

  v_out = VEC_PERM 
  // v_out = {e0+e1, e2+e3, e0-e1, e2-e3}

To remove the redundancy, lanes 2 and 3 can be freed, which allows changing
the last statement into:
  v_out' = VEC_PERM 
  // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
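
For context, a hypothetical scalar kernel of the shape that ends up
vectorized into such a sequence (a small Hadamard-style butterfly; this is
not one of the patch's testcases):

/* Pairs of adds and subs over interleaved elements, matching
   v_out = {e0+e1, e2+e3, e0-e1, e2-e3} above.  */
void
butterfly (int *restrict out, const int *restrict in)
{
  out[0] = in[0] + in[1];
  out[1] = in[2] + in[3];
  out[2] = in[0] - in[1];
  out[3] = in[2] - in[3];
}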

The cost of eliminating the redundancy in the lane utilization is that
lowering the VEC_PERM expression could get more expensive because of
tighter packing of the lanes.  Therefore this optimization is not done
alone, but only in case we identify two such sequences that can be
blended.

Once all candidate sequences have been identified, we try to blend them,
so that we can use the freed lanes for the second sequence.
On success we convert 2x (2x BINOP + 1x VEC_PERM) to
2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP.

The implemented transformation reuses (rewrites) the statements
of the first sequence and the last VEC_PERM of the second sequence.
The remaining four statements of the second sequence are left untouched
and will be eliminated by DCE later.

This targets x264_pixel_satd_8x4, which calculates the sum of absolute
transformed differences (SATD) using Hadamard transformation.
We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
speedup on an AArch64 machine.

Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).

gcc/ChangeLog:

* tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
structure to store analysis results of a vec perm simplify sequence.
(get_vect_selector_index_map): Helper to get an index map from the
provided vector permute selector.
(recognise_vec_perm_simplify_seq): Helper to recognise a
vec perm simplify sequence.
(narrow_vec_perm_simplify_seq): Helper to pack the lanes more
tight.
(can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
sequences can be blended.
(calc_perm_vec_perm_simplify_seqs): Helper to calculate the new
permutation indices.
(blend_vec_perm_simplify_seqs): Helper to blend two vec perm
simplify sequences.
(process_vec_perm_simplify_seq_list): Helper to process a list
of vec perm simplify sequences.
(append_vec_perm_simplify_seq_list): Helper to add a vec perm
simplify sequence to the list.
(pass_forwprop::execute): Integrate new functionality.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/satd-hadamard.c: New test.
* gcc.dg/tree-ssa/vector-10.c: New test.
* gcc.dg/tree-ssa/vector-8.c: New test.
* gcc.dg/tree-ssa/vector-9.c: New test.
* gcc.target/aarch64/sve/satd-hadamard.c: New test.

Signed-off-by: Christoph Müllner 
---
Changes in v6:
* Use 'unsigned int' instead of of unsigned HWI for vector indices
* Remove hash maps and replace functionality with vec<>
* Inline get_tree_def () and eliminate redundant checks
* Ensure sequences remain in a BB
* Avoid temporary objects that need to converted later
* Simplify lane calculation when blending

Changes in v5:
* Improve coding style.

Changes in v4:
* Fix test condition for writing to the dump file
* Use gimple UIDs instead on expensive walks for comparing ordering.
* Ensure to not blend across assignments to SSA_NAMES.
* Restrict list to fix-sized vector with 8 entries.
* Remove calls of expensive vec methods by restructuring the code.
* Improved wording.

Changes in v3:
* Moved code to tree-ssa-forwprop.cc where similar VEC_PERM
  optimizations are implemented.
* Test operand order less strict in case of commutative operators.
* Made naming more consistent.
* Added a test for testing dependencies between two sequences.
* Removed the instruction reordering (not necessary without dependencies).
* Added tests based on __builtin_shuffle ().

Changes in v2:
* Moved code from tree-vect-slp.cc into a new pass (from where it could
  be moved elsewhere).
* Only deduplicate lanes if sequences will be merged later on.
* Split functionality stricter into analysis and transformation parts.

Manolis Tsamis was the patch's initial author before I took it over.

 gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c |  43 ++
 gcc/testsuite/gcc.dg/tree-ssa/vector-10.c | 122 
 gcc/testsuite/gcc.dg/tree-ssa/vector-8.c  |  34 +
 gcc/testsuite/gcc.dg/tree-ssa/vector-9.c  |  34 +
 .../gcc.target/aarc

Re: [PATCH v5] forwprop: Try to blend two isomorphic VEC_PERM sequences

2024-11-20 Thread Christoph Müllner
On Tue, Nov 19, 2024 at 2:35 PM Richard Biener wrote:
>
> On Sat, Nov 16, 2024 at 12:00 AM Christoph Müllner
>  wrote:
> >
> > This extends forwprop by yet another VEC_PERM optimization:
> > It attempts to blend two isomorphic vector sequences by using the
> > redundancy in the lane utilization in these sequences.
> > This redundancy in lane utilization comes from the way how specific
> > scalar statements end up vectorized: two VEC_PERMs on top, binary operations
> > on both of them, and a final VEC_PERM to create the result.
> > Here is an example of this sequence:
> >
> >   v_in = {e0, e1, e2, e3}
> >   v_1 = VEC_PERM 
> >   // v_1 = {e0, e2, e0, e2}
> >   v_2 = VEC_PERM 
> >   // v_2 = {e1, e3, e1, e3}
> >
> >   v_x = v_1 + v_2
> >   // v_x = {e0+e1, e2+e3, e0+e1, e2+e3}
> >   v_y = v_1 - v_2
> >   // v_y = {e0-e1, e2-e3, e0-e1, e2-e3}
> >
> >   v_out = VEC_PERM 
> >   // v_out = {e0+e1, e2+e3, e0-e1, e2-e3}
> >
> > To remove the redundancy, lanes 2 and 3 can be freed, which allows to
> > change the last statement into:
> >   v_out' = VEC_PERM 
> >   // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
> >
> > The cost of eliminating the redundancy in the lane utilization is that
> > lowering the VEC PERM expression could get more expensive because of
> > tighter packing of the lanes.  Therefore this optimization is not done
> > alone, but in only in case we identify two such sequences that can be
> > blended.
> >
> > Once all candidate sequences have been identified, we try to blend them,
> > so that we can use the freed lanes for the second sequence.
> > On success we convert 2x (2x BINOP + 1x VEC_PERM) to
> > 2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP.
> >
> > The implemented transformation reuses (rewrites) the statements
> > of the first sequence and the last VEC_PERM of the second sequence.
> > The remaining four statements of the second statment are left untouched
> > and will be eliminated by DCE later.
> >
> > This targets x264_pixel_satd_8x4, which calculates the sum of absolute
> > transformed differences (SATD) using Hadamard transformation.
> > We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
> > speedup on an AArch64 machine.
> >
> > Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
> > structure to store analysis results of a vec perm simplify sequence.
> > (get_tree_def): New helper to get the defining statement of an
> > SSA_NAME.
> > (get_vect_selector_index_map): Helper to get an index map from the
> > provided vector permute selector.
> > (recognise_vec_perm_simplify_seq): Helper to recognise a
> > vec perm simplify sequence.
> > (narrow_vec_perm_simplify_seq): Helper to pack the lanes more
> > tight.
> > (can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
> > sequences can be blended.
> > (blend_vec_perm_simplify_seqs): Helper to blend two vec perm
> > simplify sequences.
> > (process_vec_perm_simplify_seq_list): Helper to process a list
> > of vec perm simplify sequences.
> > (append_vec_perm_simplify_seq_list): Helper to add a vec perm
> > simplify sequence to the list.
> > (pass_forwprop::execute): Integrate new functionality.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/satd-hadamard.c: New test.
> > * gcc.dg/tree-ssa/vector-10.c: New test.
> > * gcc.dg/tree-ssa/vector-8.c: New test.
> > * gcc.dg/tree-ssa/vector-9.c: New test.
> > * gcc.target/aarch64/sve/satd-hadamard.c: New test.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> > Changes in v5:
> > * Improve coding style.
> >
> > Changes in v4:
> > * Fix test condition for writing to the dump file
> > * Use gimple UIDs instead on expensive walks for comparing ordering.
> > * Ensure to not blend across assignments to SSA_NAMES.
> > * Restrict list to fix-sized vector with 8 entries.
> > * Remove calls of expensive vec methods by restructuring the code.
> > * Improved wording.
> >
> > Changes in v3:
> > * Moved code to tree-ssa-forwprop.cc where similar VEC_PERM
> >   optimizations are implemented.
> > * Test operand order less strict in case of commutative operators.
> > * Made naming more consistent.
> > * Added a test for testing dependencies between two sequences.
> > * Removed the instruction reordering (no necessary without dependencies).
> > * Added tests based on __builtin_shuffle ().
> >
> > Changes in v2:
> > * Moved code from tree-vect-slp.cc into a new pass (from where it could
> >   be moved elsewhere).
> > * Only deduplicate lanes if sequences will be merged later on.
> > * Split functionality stricter into analysis and transformation parts.
> >
> > Manolis Tsamis was the patch's initial author before I took it over.
> >
> >  gcc/testsuite/gcc.dg/tree-ssa/sa

[PATCH 00/17] testsuite: arm: Leverage -mcpu=unset/-march=unset

2024-11-20 Thread Torbjörn SVENSSON


Hi,

This patch set tries to reduce the number of failed test cases for ARM-based
targets by leveraging the -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.  With the patch set applied, the test cases listed below
will be reported as "regressions", but it's really that the test cases need to
be adapted to the different context they might be tested in.
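
As an illustration, the idiom applied throughout the series looks roughly
like this in a test's dg directives (the -march value below is only an
example, not taken from a specific patch):

/* { dg-do compile } */
/* { dg-options "-mcpu=unset -march=armv8.2-a+fp16 -O2" } */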

I have checked for regressions with r15-5047-g7e1d9f58858 as a baseline for
Cortex-A7, Cortex-M0/3/4/7/33/55/85.

For some of the regressions listed below, there are either patches shared that
should fix them, or discussions have started on how to address them.

FAIL: gcc.dg/vect/complex/fast-math-complex-mls-float.c -flto -ffat-lto-objects 
 scan-tree-dump vect "Found COMPLEX_FMA"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-float.c scan-tree-dump vect 
"Found COMPLEX_FMA"
FAIL: gcc.target/arm/aes_xor_combine.c scan-assembler-not veor
FAIL: gcc.target/arm/armv8_2-fp16-scalar-1.c (test for excess errors)
FAIL: gcc.target/arm/armv8_2-fp16-scalar-2.c (test for excess errors)
FAIL: gcc.target/arm/bfloat16_scalar_1_1.c check-function-bodies bfloat_mov_mr
FAIL: gcc.target/arm/bfloat16_scalar_1_2.c check-function-bodies bfloat_mov_mr
FAIL: gcc.target/arm/bfloat16_scalar_2_1.c check-function-bodies bfloat_mov_mr
FAIL: gcc.target/arm/bfloat16_scalar_2_2.c check-function-bodies bfloat_mov_mr
FAIL: gcc.target/arm/bfloat16_scalar_3_1.c check-function-bodies bfloat_mov_mr
FAIL: gcc.target/arm/bfloat16_scalar_3_2.c check-function-bodies bfloat_mov_mr
FAIL: gcc.target/arm/bfloat16_simd_1_2.c check-function-bodies stacktest1
FAIL: gcc.target/arm/bfloat16_simd_2_2.c check-function-bodies stacktest1
FAIL: gcc.target/arm/bfloat16_simd_3_2.c check-function-bodies stacktest1
FAIL: gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/mve/dlstp-compile-asm-2.c check-function-bodies test7
FAIL: gcc.target/arm/mve/dlstp-invalid-asm.c scan-assembler-not \tdlstp
FAIL: gcc.target/arm/mve/dlstp-invalid-asm.c scan-assembler-not \tletp
FAIL: gcc.target/arm/pr110268-1.c (test for excess errors)
FAIL: gcc.target/arm/pr110268-2.c (test for excess errors)
FAIL: gcc.target/arm/pr112337.c (test for excess errors)
FAIL: gcc.target/arm/simd/bf16_vstn_1.c check-function-bodies test_vst3q_bf16
FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vabs\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vadd\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.f16\\.s32\\ts[0-9]+, s[0-9]+ 2
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.f16\\.s32\\ts[0-9]+, s[0-9]+, #1 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.f16\\.u32\\ts[0-9]+, s[0-9]+ 2
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.f16\\.u32\\ts[0-9]+, s[0-9]+, #1 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.s32\\.f16\\ts[0-9]+, s[0-9]+ 2
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.s32\\.f16\\ts[0-9]+, s[0-9]+, #1 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.u32\\.f16\\ts[0-9]+, s[0-9]+ 2
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvt\\.u32\\.f16\\ts[0-9]+, s[0-9]+, #1 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvta\\.s32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvta\\.u32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvtm\\.s32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvtm\\.u32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvtn\\.s32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvtn\\.u32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvtp\\.s32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vcvtp\\.u32\\.f16\\ts[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vdiv\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vfma\\.f16\\ts[0-9]+, s[0-9]+, s[0-9]+ 1
UNRESOLVED: gcc.target/arm/armv8_2-fp16-scalar-1.c scan-assembler-times 
vfms\\.f16\\

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Kees Cook
On Tue, Nov 19, 2024 at 05:41:13PM +0100, Martin Uecker wrote:
> Am Dienstag, dem 19.11.2024 um 10:47 -0500 schrieb Marek Polacek:
> > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> > > Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao:
> > > > Hi,
> > > > 
> > > > I am working on extending “counted_by” attribute to pointers inside a 
> > > > structure per our previous discussion. 
> > > > 
> > > > I need advice on the following question:
> > > > 
> > > > Should -fsantize=bounds support array reference that was referenced 
> > > > through a pointer that has counted_by attribute? 
> > 
> > I don't see why it couldn't, perhaps as part of -fsanitize=bounds-strict.
> > Someone has to implement it, though.
> 
> I think Qing was volunteering to do this.  My point was that
> this would not necessarily be undefined behavior, but instead
> could trap for possibly defined behavior.  I would not mind, but
> I point out that in the past people insisted that the sanitizers
> are only intended to screen for undefined behavior.

I think it's a mistake to confuse the sanitizers with only addressing
"undefined behavior". The UB sanitizers are just a subset of the
sanitizers in general, and I think UB is a not a good category for how
to group the behaviors.

For the Linux kernel, we want robustness. UB leads to ambiguity, so
we're quite interested in getting rid of UB, but the bounds sanitizer is
expected to implement bounds checking, regardless of UB-ness.

I would absolutely want -fsanitize=bounds to check the construct Qing
mentioned.

Another aspect I want to capture for Linux is _pointer_ bounds, so that
this would be caught:

#include <stdlib.h>

struct annotated {
  int b;
  int *c __attribute__ ((counted_by (b)));
} *p_array_annotated;

void __attribute__((__noinline__)) setup (int annotated_count)
{
  p_array_annotated
= (struct annotated *)malloc (sizeof (struct annotated));
  p_array_annotated->c = (int *) malloc (annotated_count *  sizeof (int));
  p_array_annotated->b = annotated_count;

  return;
}

int main(int argc, char *argv[])
{
  int i;
  int *c;

  setup (10);
  c = p_array_annotated->c;
  for (i = 0; i < 11; i++)
*c++ = 2; // goes boom at i == 10
  return 0;
}

This may be a separate sanitizer, and it may require a totally different
set of internal tracking, but being able to discover that we've run off
the end of an allocation is needed.

Of course, the biggest deal is that
__builtin_dynamic_object_size(p_array_annotated->c, 1) will return
10 * sizeof(*p_array_annotated->c)
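
(For illustration, a rough sketch of how that would be checked; hypothetical,
since it assumes the counted_by-for-pointers support under discussion, and it
reuses struct annotated, p_array_annotated and setup() from the example above.
check_bdos is just an illustrative name.)

#include <stdio.h>

/* Sketch only: with counted_by honored for the pointer member, mode 1 of
   __builtin_dynamic_object_size should derive the size from ->b.  */
void check_bdos (void)
{
  setup (10);
  unsigned long sz
    = (unsigned long) __builtin_dynamic_object_size (p_array_annotated->c, 1);
  printf ("bdos = %lu\n", sz); /* expected: 10 * sizeof (int), i.e. 40 */
}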

> 
> >  
> > > I think the question is what -fsanitize=bounds is meant to be.
> > > 
> > > I am a bit frustrated about the sanitizer.  On the
> > > one hand, it is not doing enough to get spatial memory
> > > safety even where this would be easily possible, on the
> > > other hand, is pedantic about things which are technically
> > > UB but not problematic and then one is prevented from
> > > using it
> > > 
> > > When used in default mode, where execution continues, it
> > also does not mix well with many warnings, creates more code,
> > and pulls in a library dependency (and the library also depends
> > > on upstream choices / progress which seems a limitation for
> > > extensions).
> > > 
> > > What IMHO would be ideal is a protection mode for spatial
> > > memory safety that simply adds traps (which then requires
> > > no library, has no issues with other warnings, and could
> > > evolve independently from clang) 
> > > 
> > > So shouldn't we just add a -fboundscheck (which would 
> > > be like -fsanitize=bounds -fsanitize-trap=bounds just with
> > > more checking) and make it really good? I think many people
> > > would be very happy about this.
> > 
> > That's a separate concern.  We already have the -fbounds-check option,
> > currently only used in Fortran (and D?), so perhaps we could make
> > that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
> 
> I think it could share large parts of the implementation, but the
> main reason for having a separate option would be to do something
> better than the sanitizer.  So it could not simply be a shorthand.

I don't want to reinvent the wheel here -- the sanitizers already have 3
modes of operation (trap, callback with details, callback without
details), and Linux uses the first 2 modes already, and has had plans to
use the third (smaller resulting image).

Most notably, Linux _must_ have a warn-only mode or the feature will
never get merged (this is a hard requirement from Linus). All serious
deployments of the feature will use either trap mode or use the
trap-on-warn setting, of course. But for the feature to even see the
light of day, Linus requires there be a warn-only mode.

So, given these requirements, continuing to use the sanitizer framework
seems much simpler to me. :)

-Kees

-- 
Kees Cook


Re: [PATCH 2/2] RISC-V: Use dynamic shadow offset

2024-11-20 Thread Jeff Law




On 11/14/24 9:14 PM, Kito Cheng wrote:

Switch to dynamic offset so that we can support Sv39, Sv48, and Sv57 at
the same time without building multiple libasan versions!

[1] 
https://github.com/llvm/llvm-project/commit/da0c8b275564f814a53a5c19497669ae2d99538d

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_asan_shadow_offset): Use dynamic
offset for RV64.
(riscv_asan_dynamic_shadow_offset_p): New.

OK once prereqs are committed.

jeff



Re: [PATCH v2 5/8] c: Fix constructor bounds checking for VLA and construct VLA vector constants

2024-11-20 Thread Marek Polacek
On Mon, Nov 18, 2024 at 02:12:21PM +0530, Tejas Belagod wrote:
> This patch adds support for checking bounds of SVE ACLE vector initialization
> constructors.  It also adds support to construct vector constant from init
> constructors.
> 
> gcc/ChangeLog:
> 
>   * c/c-typeck.cc (process_init_element): Add check to restrict

The C part looks fine but please drop the c/ prefix here, thanks.

Marek



Re: [PATCH v1 2/4] aarch64: Add stdcall and cdecl attributes

2024-11-20 Thread Richard Sandiford
Evgeny Karpov  writes:
> This patch adds stdcall and cdecl attributes, which might be used for
> DLL export/import in MinGW.

If that's the main use case, did you consider putting the attributes
in the #if TARGET_DLLIMPORT_DECL_ATTRIBUTES block?  Or is that not
appropriate?  (Genuine question, in case it doesn't sound like one.)

Why is there no handling of the attributes?  Are they just being
provided for compatibility with x86 code, and so are no-ops for aarch64?

It would be good to update the documentation in doc/extend.texi as well.

Thanks,
Richard

>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc: Update.
> ---
>  gcc/config/aarch64/aarch64.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index aa6b1c86ed1..f02f9c88b6e 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -869,6 +869,8 @@ static const attribute_spec aarch64_gnu_attributes[] =
>{ "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
>{ "SVE type",3, 3, false, true,  false, true,  NULL, NULL 
> },
>{ "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
> +  { "cdecl",   0, 0, false, false, false, false, NULL, NULL },
> +  { "stdcall", 0, 0, false, false, false, false, NULL, NULL 
> },
>  #if TARGET_DLLIMPORT_DECL_ATTRIBUTES
>{ "dllimport", 0, 0, false, false, false, false, handle_dll_attribute, 
> NULL },
>{ "dllexport", 0, 0, false, false, false, false, handle_dll_attribute, 
> NULL },


Re: [PATCH] doc: mention STAGE1_CFLAGS

2024-11-20 Thread Sam James
Sam James  writes:

> STAGE1_CFLAGS can be used to accelerate the just-built stage1 compiler
> which especially improves its performance on some of the large generated
> files during bootstrap. It defaults to nothing (i.e. -O0).
>
> The downside is that if the native compiler is buggy, there's a greater
> risk of a failed bootstrap. Those with a modern native compiler, ideally
> a recent version of GCC, should be able to use -O1 or -O2 without issue
> to get a faster build.
>
>   PR rtl-optimization/111619
>   * doc/install.texi (Building a native compiler): Discuss STAGE1_CFLAGS.
> ---
> This came out of a discussion between mjw and I a little while ago when
> working on the buildbots. OK?

Ping.

>
>  gcc/doc/install.texi | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 705440ffd330..4bd60555af9b 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -3017,7 +3017,11 @@ bootstrapped, you can use @code{CFLAGS_FOR_TARGET} to 
> modify their
>  compilation flags, as for non-bootstrapped target libraries.
>  Again, if the native compiler miscompiles the stage1 compiler, you may
>  need to work around this by avoiding non-working parts of the stage1
> -compiler.  Use @code{STAGE1_TFLAGS} to this end.
> +compiler.  Use @code{STAGE1_CFLAGS} and @code{STAGE1_TFLAGS} (for target
> +libraries) to this end.  The default value for @code{STAGE1_CFLAGS} is
> +@samp{STAGE1_CFLAGS='-O0'} to increase the chances of a successful bootstrap
> +with a buggy native compiler.  Changing this to @code{-O1} or @code{-O2}
> +can improve bootstrap times, with some greater risk of a failed bootstrap.
>  
>  If you used the flag @option{--enable-languages=@dots{}} to restrict
>  the compilers to be built, only those you've actually enabled will be
>
> base-commit: 00448f9b5a123b4b6b3e6f45d2fecf0a5dca66b3


Re: [PATCH] rs6000: Inefficient vector splat of small V2DI constants [PR107757]

2024-11-20 Thread Peter Bergner
On 11/20/24 4:53 AM, Surya Kumari Jangala wrote:
> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
> +/* { dg-require-effective-target powerpc_vsx } */

The -mvsx option is implied by -mcpu=power8, so it is not needed.
Please remove it.


> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */

Likewise.


The rest LGTM.

Peter



[committed] c: Diagnose compound literal for empty array [PR114266]

2024-11-20 Thread Joseph Myers
As reported in bug 114266, GCC fails to pedwarn for a compound
literal, whose type is an array of unknown size, initialized with an
empty initializer.  This case is disallowed by C23 (which doesn't have
zero-size objects); the case of a named object is diagnosed as
expected, but not that for compound literals.  (Before C23, the
pedwarn for empty initializers sufficed.)  Add a check for this
specific case with a pedwarn.
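
For reference, a minimal illustration of the two cases (not part of the
committed test):

void
g (void)
{
  int a[] = {};  /* named object: already diagnosed */
  (int []) {};   /* compound literal: now also diagnosed by this check */
}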

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

PR c/114266

gcc/c/
* c-decl.cc (build_compound_literal): Diagnose array of unknown
size with empty initializer for C23.

gcc/testsuite/
* gcc.dg/c23-empty-init-4.c: New test.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 96bfe9290fd9..c58ff4ab2488 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -6514,9 +6514,14 @@ build_compound_literal (location_t loc, tree type, tree 
init, bool non_const,
 {
   int failure = complete_array_type (&TREE_TYPE (decl),
 DECL_INITIAL (decl), true);
-  /* If complete_array_type returns 3, it means that the
- initial value of the compound literal is empty.  Allow it.  */
+  /* If complete_array_type returns 3, it means that the initial value of
+ the compound literal is empty.  Allow it with a pedwarn; in pre-C23
+ modes, the empty initializer itself has been diagnosed if pedantic so
+ does not need to be diagnosed again here.  */
   gcc_assert (failure == 0 || failure == 3);
+  if (failure == 3 && flag_isoc23)
+   pedwarn (loc, OPT_Wpedantic,
+"array of unknown size with empty initializer");
 
   type = TREE_TYPE (decl);
   TREE_TYPE (DECL_INITIAL (decl)) = type;
diff --git a/gcc/testsuite/gcc.dg/c23-empty-init-4.c 
b/gcc/testsuite/gcc.dg/c23-empty-init-4.c
new file mode 100644
index ..491343c053d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-empty-init-4.c
@@ -0,0 +1,10 @@
+/* Test C23 support for empty initializers: invalid for empty arrays in
+   compound literals (bug 114266).  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+
+void
+f ()
+{
+  (int []) { }; /* { dg-error "array of unknown size with empty initializer" } 
*/
+}

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] Fortran: fix checking of protected variables in submodules [PR83135]

2024-11-20 Thread Jerry D

On 11/20/24 1:08 PM, Harald Anlauf wrote:

Dear all,

the attached, actually rather straightforward patch fixes the checking of
protected variables in submodules.  When a variable was use-associated
in an ancestor module, we failed to properly diagnose this.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald



Yes, looks good to go.

Jerry


[pushed] libstdc++: remove JSON comment.

2024-11-20 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Standard JSON doesn't have comments, and it seems this file needs to be
conforming, not the common JSON-with-comments dialect.

libstdc++-v3/ChangeLog:

* src/c++23/libstdc++.modules.json.in: Remove C++ comment.
---
 libstdc++-v3/src/c++23/libstdc++.modules.json.in | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/src/c++23/libstdc++.modules.json.in 
b/libstdc++-v3/src/c++23/libstdc++.modules.json.in
index 063474805ad..57907d52d58 100644
--- a/libstdc++-v3/src/c++23/libstdc++.modules.json.in
+++ b/libstdc++-v3/src/c++23/libstdc++.modules.json.in
@@ -1,4 +1,3 @@
-// C++ module metadata, to install alongside libstdc++.so
 {
   "version": 1,
   "revision": 1,

base-commit: a4842917dcb8e6524ddf2574e5a0dc869fda1885
-- 
2.47.0



[PATCH] arm, mve: Fix arm_mve_dlstp_check_dec_counter's use of single_pred

2024-11-20 Thread Andre Vieira (lists)

Hi,

It looks like single_pred ICEs, rather than returning NULL, if the basic
block does not have a single predecessor; returning NULL is what this snippet
of code relied on.
This feels like borderline obvious to me as a fix, but I thought I'd get 
it checked by one more person.


Call 'single_pred_p' before 'single_pred' to verify it is safe to do so.

gcc/ChangeLog:

* config/arm/arm.cc (arm_mve_dlstp_check_dec_counter): Call
single_pred_p to verify it's safe to call single_pred.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: Add loop that triggered ICE.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
0f72f3a9031237192c6362760203fe489946b948..030af7c801f8afeb9577b4e7d7637c17d6f5f638
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35370,9 +35370,10 @@ arm_mve_dlstp_check_dec_counter (loop *loop, rtx_insn* 
vctp_insn,
 return NULL;
   else if (REG_P (condconst))
 {
-  basic_block pre_loop_bb = single_pred (loop_preheader_edge (loop)->src);
-  if (!pre_loop_bb)
+  basic_block preheader_b = loop_preheader_edge (loop)->src;
+  if (!single_pred_p (preheader_b))
return NULL;
+  basic_block pre_loop_bb = single_pred (preheader_b);
 
   rtx initial_compare = NULL_RTX;
   if (!(prev_nonnote_nondebug_insn_bb (BB_END (pre_loop_bb))
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c 
b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
index 
a1b26873d7908035c726e3724c91b186c697bc60..08811cef5687e94676db3d27be521602a60e7600
 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
@@ -25,3 +25,15 @@ void n() {
   }
 }
 
+int a;
+void g2() {
+  long b;
+  while (a) {
+char *c;
+for (long d = b; d > 0; d -= 4) {
+  mve_pred16_t e = vctp32q(d);
+  int32x4_t f;
+  vstrbq_p_s32(c, f, e);
+}
+  }
+}


Re: [PATCH htdocs] bugs: mention ASAN too

2024-11-20 Thread Gerald Pfeifer
On Mon, 11 Nov 2024, Sam James wrote:
> Request that reporters try `-fsanitize=address,undefined` rather than
> just `-fsanitize=undefined` when reporting bugs. We get invalid bug
> reports which ASAN would've caught sometimes, even if it's less often
> than where UBSAN would help.

I don't have a strong opinion on this and would prefer someone else to 
chime in. That said, if we don't hear from someone else by early next 
week, please go ahead and push.


Just one (naive) question: Are there instances where -fsanitize=undefined 
may be available/working where -fsanitize=address,undefined is not?

If so, perhaps provide both invocations as in
   -fsanitize=undefined or -fsanitize=address,un...
?

Your call; just a thought.

Gerald


>  htdocs/bugs/index.html | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
> index c7d2f310..d6556b26 100644
> --- a/htdocs/bugs/index.html
> +++ b/htdocs/bugs/index.html
> @@ -52,7 +52,7 @@ try a current release or development snapshot.
>  with gcc -Wall -Wextra and see whether this shows anything
>  wrong with your code.  Similarly, if compiling with
>  -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations
> -makes a difference, or if compiling with -fsanitize=undefined
> +makes a difference, or if compiling with 
> -fsanitize=address,undefined
>  produces any run-time errors, then your code is probably not correct.
>  


Re: [PATCH] aarch64: Fix aarch64 after moving to C23

2024-11-20 Thread Andrew Pinski
On Wed, Nov 20, 2024 at 10:09 AM Richard Sandiford
 wrote:
>
> Andrew Pinski  writes:
> > This fixes a few aarch64 specific testcases after the move to default to 
> > GNU C23.
> > For the SME testcases, I decided to add a new one for the GNU C23 case as 
> > `()` changing
> > to mean `(void)` instead of a non-prototype declaration and add 
> > `-std=gnu17` to the old one.
> > For pic-*.c `-Wno-old-style-definition` was added not to warn about old 
> > style definitions.
> > For pr113573.c, I added `-std=gnu17` since I was not sure if `(...)` with 
> > C23 would invoke
> > the same issue.
> >
> > tested for aarch64-linux-gnu.
> >
> >   PR testsuite/117680
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/pic-constantpool1.c: Add 
> > -Wno-old-style-definition.
> >   * gcc.target/aarch64/pic-symrefplus.c: Likewise.
> >   * gcc.target/aarch64/pr113573.c: Add `-std=gnu17`
> >   * gcc.target/aarch64/sme/streaming_mode_1.c: Likewise.
> >   * gcc.target/aarch64/sme/za_state_1.c: Likewise.
> >   * gcc.target/aarch64/sme/za_state_2.c: Likewise.
> >   * gcc.target/aarch64/sme/streaming_mode_5.c: New test.
> >   * gcc.target/aarch64/sme/za_state_7.c: New test.
> >   * gcc.target/aarch64/sme/za_state_8.c: New test.
>
> Thanks for dealing with this.  I was going to look at the SME ones
> after clearing email/review backlog.
>
> The changes relative to streaming_mode_1.c are:
>
> @@ -7,7 +10,7 @@
>  void sc_b () [[arm::streaming_compatible]]; // { dg-error "conflicting 
> types" }
>
>  void sc_c () [[arm::streaming_compatible]];
> -void sc_c () {} // Inherits attribute from declaration (confusingly).
> +void sc_c () {} // { dg-error "conflicting types" }
>
>  void sc_d ();
>  void sc_d () [[arm::streaming_compatible]] {} // { dg-error "conflicting 
> types" }
> @@ -33,7 +36,7 @@
>  void s_b () [[arm::streaming]]; // { dg-error "conflicting types" }
>
>  void s_c () [[arm::streaming]];
> -void s_c () {} // Inherits attribute from declaration (confusingly).
> +void s_c () {} // { dg-error "conflicting types" }
>
>  void s_d ();
>  void s_d () [[arm::streaming]] {} // { dg-error "conflicting types" }
>
> As the comment implies, the old behaviour for these two lines wasn't
> really what we wanted.  The attributes on () declarations were strictly
> enforced elsewhere, even given the old semantics of ().
>
> So I think we should just update the existing streaming_mode* and
> za_state* tests to cover the new default behaviour, rather than create
> new tests.  OK with that change.

Attached is what I pushed. It removes the new tests and just updates
the old tests. In the commit message I added some small commentary
about what was done.

Thanks,
Andrew


>
> Richard
>
> > Signed-off-by: Andrew Pinski 
> > ---
> >  .../gcc.target/aarch64/pic-constantpool1.c|   2 +-
> >  .../gcc.target/aarch64/pic-symrefplus.c   |   2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
> >  .../gcc.target/aarch64/sme/streaming_mode_1.c |   2 +-
> >  .../gcc.target/aarch64/sme/streaming_mode_5.c | 133 +++
> >  .../gcc.target/aarch64/sme/za_state_1.c   |   2 +-
> >  .../gcc.target/aarch64/sme/za_state_2.c   |   2 +-
> >  .../gcc.target/aarch64/sme/za_state_7.c   | 160 ++
> >  .../gcc.target/aarch64/sme/za_state_8.c   |  77 +
> >  9 files changed, 376 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_5.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_8.c
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c 
> > b/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c
> > index 755c0b67ea4..1a5da9aacfa 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-options "-O2 -mcmodel=small -fPIC" }  */
> > +/* { dg-options "-Wno-old-style-definition -O2 -mcmodel=small -fPIC" }  */
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target fpic } */
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c 
> > b/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c
> > index 0c5e7fe7fb4..ca019ce3b33 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-options "-O2 -mcmodel=small -fPIC -fno-builtin" }  */
> > +/* { dg-options "-Wno-old-style-definition -O2 -mcmodel=small -fPIC 
> > -fno-builtin" }  */
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target fpic } */
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr113573.c 
> > b/gcc/testsuite/gcc.target/aarch64/pr113573.c
> > index fc8607f7218..30175c4cb5c 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/pr113573.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr113573.c
>

Re: [PATCH htdocs] bugs: mention ASAN too

2024-11-20 Thread Sam James
Sam James  writes:

> Request that reporters try `-fsanitize=address,undefined` rather than
> just `-fsanitize=undefined` when reporting bugs. We get invalid bug
> reports which ASAN would've caught sometimes, even if it's less often
> than where UBSAN would help.
> ---
> OK?

Ping.

>
>  htdocs/bugs/index.html | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
> index c7d2f310..d6556b26 100644
> --- a/htdocs/bugs/index.html
> +++ b/htdocs/bugs/index.html
> @@ -52,7 +52,7 @@ try a current release or development snapshot.
>  with gcc -Wall -Wextra and see whether this shows anything
>  wrong with your code.  Similarly, if compiling with
>  -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations
> -makes a difference, or if compiling with -fsanitize=undefined
> +makes a difference, or if compiling with 
> -fsanitize=address,undefined
>  produces any run-time errors, then your code is probably not correct.
>  
>  
>
> base-commit: 96aaafdcdba21aad22fb1b745c75a01855dc5f0c


Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Qing Zhao


> On Nov 20, 2024, at 14:23, Martin Uecker  wrote:
> 
> Am Mittwoch, dem 20.11.2024 um 17:37 + schrieb Qing Zhao:
>> Hi, Martin,
>> 
>> Thanks a lot for pointing this out. 
>> 
>> This does look like a problem we need avoid for the pointer arrays.
>> 
>> Does  the same problem exist in the language extension too if the n is 
>> allowed to be changed after initialization?
>> 
>> If so, for the future language extension, is there any proposed solution to 
>> this problem? 
>> 
> 
> There is no specification yet and nothing formally proposed, so
> it is entirely unclear at this point.
> 
> My idea would be to give 'x->buf' the type '(char(*)[x->n])'
> where 'x->n' is loaded at the same time 'x->buf' is accessed
> and then the value is frozen (like in the simpler versions
> of 'counted_by' you had implemented first).  Of course, one
> would then have to set x->n before setting the buffer (or
> at the same time). This could be ensured by making the
> member 'n' const, so that it can only be changed by
> overwriting the whole struct. But I am still thinking
> about this.

Okay, so the key here is:  
x->n, x->p can only be changed by changing the whole structure at the same 
time. 

Otherwise, x->n might not be consistent with x->p. 

We might need to add such a limitation for the counted-by attribute of pointer 
fields in the documentation. 
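
(A hypothetical sketch of the usage pattern such a limitation would suggest,
assuming the proposed counted_by support for pointer fields; foo_set_buffer is
just an illustrative helper name:)

struct foo {
  int n;
  char *p __attribute__ ((counted_by (n)));
};

/* Update the buffer and its count together, so that p and n can never be
   observed out of sync.  */
void
foo_set_buffer (struct foo *x, char *buf, int len)
{
  struct foo tmp = { .n = len, .p = buf };
  *x = tmp;
}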

> 
> In any case, I think for "counted_by" this is not an option
> because it would be confusing if it works differently
> than for the FAM case.

But we need to clearly let the user know the limitations of using the 
counted-by attribute for pointer fields. 

Qing
> 
> Martin
> 
> 
>> Qing
>>> On Nov 20, 2024, at 10:52, Martin Uecker  wrote:
>>> 
>>> Am Mittwoch, dem 20.11.2024 um 15:27 + schrieb Qing Zhao:
 
> On Nov 19, 2024, at 10:47, Marek Polacek  wrote:
> 
> On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
>> Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao:
>>> Hi,
 
>>> ..
>>> 
>>> Hi Qing,
>>> 
 Per our discussion so far, if treating the following
 
 struct foo {
 int n;
 char *p __attribute__ ((counted_by (n)));
 };
 
 as an array with upper-bounds being “n” is reasonable.
>>> 
>>> There is one issue I want to point out, which I just realized during
>>> a discussion in WG14.  For "counted_by" we defined the semantics
>>> in a way that later store to 'n' will be taken into account. 
>>> We did this to support the use case where 'n' is set after
>>> the access to 'p'.
>>> 
>>> struct foo *x = ;
>>> 
>>> char *q = x->p;
>>> x->n = 100; // this should apply
>>> 
>>> 
>>> For FAMs this is fine, because it is a fixed part
>>> of the struct.  But for the new pointer case, I think this is
>>> problematic.  Consider this example:
>>> 
>>> struct foo *x = allocate_buffer(100);
>>> 
>>> where x->n is set to the right value in the allocation function.
>>> 
>>> Now let's continue with
>>> 
>>> char *q = x->p;
>>> 
>>> x = allocate_buffer(50);
>>> // x->n is set to 50.
>>> 
>>> Now q refers to the old buffer, but x->n to the size of the new
>>> buffer.  That does not seem right and scares me a little bit.
>>> 
>>> 
>>> 
>>> Martin
>>> 
>>> 
 
 Then, it’s reasonable to extend -fsanitize=bounds to instrument the 
 corresponding reference for the pointer with
 Counted-by attribute. 
 
 What do you think?
 
 Qing
 
> 
>> I think the question is what -fsanitize=bounds is meant to be.
>> 
>> I am a bit frustrated about the sanitizer.  On the
>> one hand, it is not doing enough to get spatial memory
>> safety even where this would be easily possible, on the
>> other hand, is pedantic about things which are technically
>> UB but not problematic and then one is prevented from
>> using it
>> 
>> When used in default mode, where execution continues, it
>> also does not mix well with many warnings, creates more code,
>> and pulls in a library dependency (and the library also depends
>> on upstream choices / progress which seems a limitation for
>> extensions).
>> 
>> What IMHO would be ideal is a protection mode for spatial
>> memory safety that simply adds traps (which then requires
>> no library, has no issues with other warnings, and could
>> evolve independently from clang) 
>> 
>> So shouldn't we just add a -fboundscheck (which would 
>> be like -fsanitize=bounds -fsanitize-trap=bounds just with
>> more checking) and make it really good? I think many people
>> would be very happy about this.
> 
> That's a separate concern.  We already have the -fbounds-check option,
> currently only used in Fortran (and D?), so perhaps we could make
> that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
> 
> Marek
> 
 
>>> 
>> 
> 



Re: [PATCH] testsuite: Require C99 for pow-to-ldexp.c

2024-11-20 Thread Jeff Law




On 11/19/24 2:03 AM, Soumya AR wrote:

pow-to-ldexp.c checks for calls to __builtin_ldexpf and __builtin_ldexpl, which
will only be performed when the compiler knows the target has a C99 libm
available.

Modified the test to add a C99 runtime requirement.

This fixes the failure on arm-eabi targets for this test case.

Committed as obvious: 90645dba41bac29cab4c5996ba320c97a0325eb2

Signed-off-by: Soumya AR 

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pow-to-ldexp.c: Require c99_runtime.

OK
jeff



[PATCH] json parsing: avoid relying on floating point equality [PR117677]

2024-11-20 Thread David Malcolm
Sorry about the breakage.

I wasn't able to reproduce the failures myself, but the following
patch seems plausible as a fix; does it fix the affected
configurations?
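
(For reference, the classic way exact double comparisons become
configuration-dependent is excess precision.  A rough sketch of that effect,
purely illustrative and not necessarily the actual failure mode on the
affected hosts:)

/* Build with e.g. -m32 -mfpmath=387 -O2 on x86 to (possibly) observe this.  */
int
main (void)
{
  volatile double scale = 1000.0;
  double computed = 3141.0 / scale;  /* may be kept in an 80-bit register */
  /* 3.141 is the correctly rounded 64-bit constant; the comparison can be
     false even though the values agree to within a tiny absolute error.  */
  return computed == 3.141 ? 0 : 1;
}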

gcc/ChangeLog:
PR bootstrap/117677
* json-parsing.cc (selftest::test_parse_number): Replace
ASSERT_EQ of 'double' values with ASSERT_NEAR.  Eliminate
ASSERT_PRINT_EQ for such values.
* selftest.h (ASSERT_NEAR): New.
(ASSERT_NEAR_AT): New.

Signed-off-by: David Malcolm 
---
 gcc/json-parsing.cc |  9 +++--
 gcc/selftest.h  | 20 
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/gcc/json-parsing.cc b/gcc/json-parsing.cc
index 78188c4fef9c..457d78f97cfa 100644
--- a/gcc/json-parsing.cc
+++ b/gcc/json-parsing.cc
@@ -2028,8 +2028,7 @@ test_parse_number ()
 ASSERT_EQ (tc.get_error (), nullptr);
 const json::value *jv = tc.get_value ();
 ASSERT_EQ (JSON_FLOAT, jv->get_kind ());
-ASSERT_EQ (3.141, ((const json::float_number *)jv)->get ());
-ASSERT_PRINT_EQ (*jv, true, "3.141");
+ASSERT_NEAR (3.141, ((const json::float_number *)jv)->get (), 0.001);
 auto range = tc.get_range_for_value (jv);
 ASSERT_TRUE (range);
 ASSERT_RANGE_EQ (*range,
@@ -2044,8 +2043,7 @@ test_parse_number ()
   ASSERT_EQ (tc.get_error (), nullptr);
   const json::value *jv = tc.get_value ();
   ASSERT_EQ (jv->get_kind (), JSON_FLOAT);
-  ASSERT_EQ (as_a <const json::float_number *> (jv)->get (), 3.141);
-  ASSERT_PRINT_EQ (*jv, true, "3.141");
+  ASSERT_NEAR (as_a <const json::float_number *> (jv)->get (), 3.141, 0.1);
   auto range = tc.get_range_for_value (jv);
   ASSERT_TRUE (range);
   ASSERT_RANGE_EQ (*range,
@@ -2070,8 +2068,7 @@ test_parse_number ()
   ASSERT_EQ (tc.get_error (), nullptr);
   const json::value *jv = tc.get_value ();
   ASSERT_EQ (jv->get_kind (), JSON_FLOAT);
-  ASSERT_EQ (as_a <const json::float_number *> (jv)->get (), 4.2);
-  ASSERT_PRINT_EQ (*jv, true, "4.2");
+  ASSERT_NEAR (as_a <const json::float_number *> (jv)->get (), 4.2, 0.1);
   auto range = tc.get_range_for_value (jv);
   ASSERT_TRUE (range);
   ASSERT_RANGE_EQ (*range,
diff --git a/gcc/selftest.h b/gcc/selftest.h
index c6206e55428d..500095da79ca 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -338,6 +338,26 @@ extern int num_passes;
 ::selftest::fail ((LOC), desc_);  \
   SELFTEST_END_STMT
 
+/* Evaluate VAL1 and VAL2 and compare them, calling
+   ::selftest::pass if they are within ABS_ERROR of each other,
+   ::selftest::fail if they are not.  */
+
+#define ASSERT_NEAR(VAL1, VAL2, ABS_ERROR) \
+  ASSERT_NEAR_AT ((SELFTEST_LOCATION), (VAL1), (VAL2), (ABS_ERROR))
+
+/* Like ASSERT_NEAR, but treat LOC as the effective location of the
+   selftest.  */
+
+#define ASSERT_NEAR_AT(LOC, VAL1, VAL2, ABS_ERROR)\
+  SELFTEST_BEGIN_STMT \
+  const char *desc_ = "ASSERT_NEAR (" #VAL1 ", " #VAL2 ", " #ABS_ERROR ")"; \
+  double error = fabs ((VAL1) - (VAL2));   \
+  if (error < (ABS_ERROR)) \
+::selftest::pass ((LOC), desc_);   \
+  else \
+::selftest::fail ((LOC), desc_);   \
+  SELFTEST_END_STMT
+
 /* Evaluate VAL1 and VAL2 and compare them with known_eq, calling
::selftest::pass if they are always equal,
::selftest::fail if they might be non-equal.  */
-- 
2.26.3



[PATCH] simplify-rtx: Limit number of elts in when encoding.

2024-11-20 Thread Robin Dapp
Hi,

this came up when testing even/odd permutes on riscv
(https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669181.html).
I didn't yet continue with the patch but it's clear it
exposes an issue with encoding vector masks.

When we encode a vector mask with a constant number of elements
and fewer than BITS_PER_UNIT elements (i.e. 4 for a mask {1, 0, 1, 0})
native_encode_rtx results in a value of 85 = 0b01010101.

This is because CONST_VECTOR_ELT assumes it's operating on an encoded
sequence and will happily return values beyond the actual number of
elements.  Subsequently, when optimizing the constant pool, the
hash values of {1, 0, 1, 0} and {1, 0, 1, 0, 1, 0, 1, 0} are identical.
Then one of them is linked to the other one, resulting in wrong code.

Therefore, this patch limits the number of elements to consider when
building the value to GET_MODE_NUNITS in case the latter is constant
and less than BITS_PER_UNIT.

Bootstrapped and regtested on x86, regtested on riscv.

Regards
 Robin

gcc/ChangeLog:

* simplify-rtx.cc (native_encode_rtx): Limit number of elements
that are used for encoded value.
---
 gcc/simplify-rtx.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 893c5f6e1ae..df4ad4db941 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7332,6 +7332,15 @@ native_encode_rtx (machine_mode mode, rtx x, 
vec &bytes,
 vectors can have several elements per byte.  */
   unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
   GET_MODE_NUNITS (mode));
+  poly_uint16 nunits = GET_MODE_NUNITS (mode);
+
+  /* For a const vector with a constant number of elements (and fewer
+than BITS_PER_UNIT elements) CONST_VECTOR_ELT will return values
+beyond the actual number of elements.*/
+  unsigned int max_elt_bits = (num_bytes == 1 && nunits.is_constant ())
+   ? MIN (BITS_PER_UNIT, nunits.to_constant ())
+   : BITS_PER_UNIT;
+
   unsigned int elt = first_byte * BITS_PER_UNIT / elt_bits;
   if (elt_bits < BITS_PER_UNIT)
{
@@ -7342,7 +7351,7 @@ native_encode_rtx (machine_mode mode, rtx x, 
vec &bytes,
  for (unsigned int i = 0; i < num_bytes; ++i)
{
  target_unit value = 0;
- for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
+ for (unsigned int j = 0; j < max_elt_bits; j += elt_bits)
{
  if (INTVAL (CONST_VECTOR_ELT (x, elt)))
value |= mask << j;
-- 
2.47.0



[PATCH] Fortran: fix checking of protected variables in submodules [PR83135]

2024-11-20 Thread Harald Anlauf
Dear all,

the attached, actually rather straightforward patch fixes the checking of
protected variables in submodules.  When a variable was use-associated
in an ancestor module, we failed to properly diagnose this.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 05bc3abfc24b38f0a6e74aa09f97e0bc05dc9511 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 20 Nov 2024 21:59:22 +0100
Subject: [PATCH] Fortran: fix checking of protected variables in submodules
 [PR83135]

When a symbol was use-associated in the ancestor of a submodule, a
PROTECTED attribute was ignored in the submodule or its descendants.
Find the real ancestor of symbols when used in a variable definition
context in a submodule.

	PR fortran/83135

gcc/fortran/ChangeLog:

	* expr.cc (sym_is_from_ancestor): New helper function.
	(gfc_check_vardef_context): Refine checking of PROTECTED attribute
	of symbols that are indirectly use-associated in a submodule.

gcc/testsuite/ChangeLog:

	* gfortran.dg/protected_10.f90: New test.
---
 gcc/fortran/expr.cc| 40 ++--
 gcc/testsuite/gfortran.dg/protected_10.f90 | 75 ++
 2 files changed, 110 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/protected_10.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 01fbc442546..fdbf9916640 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -6272,6 +6272,33 @@ gfc_build_intrinsic_call (gfc_namespace *ns, gfc_isym_id id, const char* name,
 }


+/* Check if a symbol referenced in a submodule is declared in the ancestor
+   module and not accessed by use-association, and that the submodule is a
+   descendant.  */
+
+static bool
+sym_is_from_ancestor (gfc_symbol *sym)
+{
+  const char dot[2] = ".";
+  /* Symbols take the form module.submodule_ or module.name_. */
+  char ancestor_module[2 * GFC_MAX_SYMBOL_LEN + 2];
+  char *ancestor;
+
+  if (sym == NULL
+  || sym->attr.use_assoc
+  || !sym->attr.used_in_submodule
+  || !sym->module
+  || !sym->ns->proc_name
+  || !sym->ns->proc_name->name)
+return false;
+
+  memset (ancestor_module, '\0', sizeof (ancestor_module));
+  strcpy (ancestor_module, sym->ns->proc_name->name);
+  ancestor = strtok (ancestor_module, dot);
+  return strcmp (ancestor, sym->module) == 0;
+}
+
+
 /* Check if an expression may appear in a variable definition context
(F2008, 16.6.7) or pointer association context (F2008, 16.6.8).
This is called from the various places when resolving
@@ -6450,21 +6477,24 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
 }

   /* PROTECTED and use-associated.  */
-  if (sym->attr.is_protected && sym->attr.use_assoc && check_intentin)
+  if (sym->attr.is_protected
+  && (sym->attr.use_assoc
+	  || (sym->attr.used_in_submodule && !sym_is_from_ancestor (sym)))
+  && check_intentin)
 {
   if (pointer && is_pointer)
 	{
 	  if (context)
-	gfc_error ("Variable %qs is PROTECTED and cannot appear in a"
-		   " pointer association context (%s) at %L",
+	gfc_error ("Variable %qs is PROTECTED and cannot appear in a "
+		   "pointer association context (%s) at %L",
 		   sym->name, context, &e->where);
 	  return false;
 	}
   if (!pointer && !is_pointer)
 	{
 	  if (context)
-	gfc_error ("Variable %qs is PROTECTED and cannot appear in a"
-		   " variable definition context (%s) at %L",
+	gfc_error ("Variable %qs is PROTECTED and cannot appear in a "
+		   "variable definition context (%s) at %L",
 		   sym->name, context, &e->where);
 	  return false;
 	}
diff --git a/gcc/testsuite/gfortran.dg/protected_10.f90 b/gcc/testsuite/gfortran.dg/protected_10.f90
new file mode 100644
index 000..1bb20983e94
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/protected_10.f90
@@ -0,0 +1,75 @@
+! { dg-do compile }
+! PR fortran/83135 - fix checking of protected variables in submodules
+
+module mod1
+  implicit none
+  private
+  integer, protected, public :: xx = 42
+  public :: set_xx
+  public :: echo1_xx, echo2_xx
+  interface
+ module subroutine echo1_xx()
+ end subroutine echo1_xx
+ module subroutine echo2_xx()
+ end subroutine echo2_xx
+  end interface
+contains
+  subroutine set_xx(arg)
+integer, intent(in) :: arg
+xx = arg! valid (it is host_associated)
+  end
+end module
+!
+submodule (mod1) s1mod1
+  implicit none
+contains
+  module subroutine echo1_xx()
+xx = 11 ! valid (it is from the ancestor)
+write(*,*) "xx=", xx
+  end subroutine echo1_xx
+end submodule
+!
+submodule (mod1:s1mod1) s2mod1
+  implicit none
+contains
+  module subroutine echo2_xx()
+xx = 12 ! valid (it is from the ancestor)
+write(*,*) "xx=", xx
+  end subroutine echo2_xx
+end submodule
+!
+module mod2
+  use mod1
+  implicit none
+  integer, protected, public :: yy = 43
+  interface
+ module subroutine echo_xx()
+ end subroutine echo_xx
+  end inte

Re: [PATCH] sibcall: Adjust BLKmode argument size for alignment padding

2024-11-20 Thread Richard Sandiford
"H.J. Lu"  writes:
> On Wed, Nov 20, 2024 at 2:12 AM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu"  writes:
>> > Adjust BLKmode argument size for parameter alignment for sibcall check.
>> >
>> > gcc/
>> >
>> > PR middle-end/117098
>> > * calls.cc (store_one_arg): Adjust BLKmode argument size for
>> > alignment padding for sibcall check.
>> >
>> > gcc/testsuite/
>> >
>> > PR middle-end/117098
>> > * gcc.dg/sibcall-12.c: New test.
>> >
>> > OK for master?
>> >
>> >
>> > H.J.
>> > From 8b0518906cb23a9b5e77b04d6132c49047daebd2 Mon Sep 17 00:00:00 2001
>> > From: "H.J. Lu" 
>> > Date: Sun, 13 Oct 2024 04:53:14 +0800
>> > Subject: [PATCH] sibcall: Adjust BLKmode argument size for alignment 
>> > padding
>> >
>> > Adjust BLKmode argument size for parameter alignment for sibcall check.
>> >
>> > gcc/
>> >
>> >   PR middle-end/117098
>> >   * calls.cc (store_one_arg): Adjust BLKmode argument size for
>> >   alignment padding for sibcall check.
>> >
>> > gcc/testsuite/
>> >
>> >   PR middle-end/117098
>> >   * gcc.dg/sibcall-12.c: New test.
>> >
>> > Signed-off-by: H.J. Lu 
>> > ---
>> >  gcc/calls.cc  |  4 +++-
>> >  gcc/testsuite/gcc.dg/sibcall-12.c | 13 +
>> >  2 files changed, 16 insertions(+), 1 deletion(-)
>> >  create mode 100644 gcc/testsuite/gcc.dg/sibcall-12.c
>> >
>> > diff --git a/gcc/calls.cc b/gcc/calls.cc
>> > index c5c26f65280..163c7e509d9 100644
>> > --- a/gcc/calls.cc
>> > +++ b/gcc/calls.cc
>> > @@ -5236,7 +5236,9 @@ store_one_arg (struct arg_data *arg, rtx argblock, 
>> > int flags,
>> > /* expand_call should ensure this.  */
>> > gcc_assert (!arg->locate.offset.var
>> > && arg->locate.size.var == 0);
>> > -   poly_int64 size_val = rtx_to_poly_int64 (size_rtx);
>> > +   /* Adjust for argument alignment padding.  */
>> > +   poly_int64 size_val = ROUND_UP (UINTVAL (size_rtx),
>> > +   parm_align / BITS_PER_UNIT);
>>
>> This doesn't look right to me.  For one thing, going from
>> rtx_to_poly_int64 to UINTVAL drops support for non-constant parameters.
>> But even ignoring that, I think padding size_val (the size of arg->value
>> IIUC) will pessimise the later:
>>
>>   else if (maybe_in_range_p (arg->locate.offset.constant,
>>  i, size_val))
>> sibcall_failure = true;
>>
>> and so cause sibcall failures elsewhere.  I'm also not sure this
>> accurately reproduces the padding that is added by locate_and_pad_parm
>> for all cases (arguments that grow up vs down, padding below vs above
>> the argument).
>>
>> AIUI, the point of the:
>>
>>   if (known_eq (arg->locate.offset.constant, i))
>> {
>>   /* Even though they appear to be at the same location,
>>  if part of the outgoing argument is in registers,
>>  they aren't really at the same location.  Check for
>>  this by making sure that the incoming size is the
>>  same as the outgoing size.  */
>>   if (maybe_ne (arg->locate.size.constant, size_val))
>> sibcall_failure_1 = true;
>> }
>
> Does this
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 246abe34243..98429cc757f 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -5327,7 +5327,13 @@ store_one_arg (struct arg_data *arg, rtx
> argblock, int flags,
>they aren't really at the same location.  Check for
>this by making sure that the incoming size is the
>same as the outgoing size.  */
> -   if (maybe_ne (arg->locate.size.constant, size_val))
> +   poly_int64 aligned_size;
> +   if (CONST_INT_P (size_rtx))
> + aligned_size = ROUND_UP (UINTVAL (size_rtx),
> +   parm_align / BITS_PER_UNIT);
> +   else
> + aligned_size = size_val;
> +   if (maybe_ne (arg->locate.size.constant, aligned_size))
>   sibcall_failure = true;
>   }
>  else if (maybe_in_range_p (arg->locate.offset.constant,
>
> look correct?

Heh.  Playing the reviewer here, I was kind-of hoping you'd explain
why it was correct to me :)

But conceptually, the call is copying from arg->value to arg->locate.
And this code is trying to detect whether the copy is a nop, whether
it overlaps, or whether the source and destination are distinct.

It feels odd to grow the size of arg->value (the source of the copy),
given that the extra bytes shouldn't be copied.

Like I mentioned in the previous reply, it feels to me like testing
partial != 0 would deal with the situation described in the comment,
and the testcase in PR32602.  Personally I think we should try that first.
Maybe it's wrong (quite probable), but if so, it'd be good to know and
understand why.

If that fails, perhaps a fallback would be to test:

maybe_lt (arg->locate.size.cons

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Qing Zhao


> On Nov 19, 2024, at 16:36, Martin Uecker  wrote:
> 
> Am Montag, dem 18.11.2024 um 21:31 + schrieb Qing Zhao:
>> 
>>> On Nov 18, 2024, at 13:10, Martin Uecker  wrote:
>> 
> ...
>> So, I guess that the more accurate question is, for the following:
>> 
>> struct annotated {
>>  int b;
>>  int *c __attribute__ ((counted_by (b)));
>> } *p_array_annotated;
>> 
>>  p_array_annotated->c[10] = 2;
>> 
>> 
>> Should we treat the reference “p_array_annotated->c[10]” as 
>> an array reference if the pointer field “c” in the “struct annotated” 
>> has the counted_by attribute? 
> 
> Assuming UBSan is they way to go, then yes and I agree
> that after casting to another type this should not be
> done anymore.

At this moment, I think that using the “counted_by” attribute attached to the 
pointer field with the current UBSan might be a reasonable and practical 
approach. 

If we get more requests for a new option -fboundscheck without a dependence on 
the shared C++ library, we might need to spend more time on that separate task. 

Currently, as far as I know, GCC provides the following options:

-fisolate-erroneous-paths-dereference
Detect paths that trigger erroneous or undefined behavior due to dereferencing 
a null pointer. Isolate those paths from the main control flow and turn the 
statement with erroneous or undefined behavior into a trap. This flag is 
enabled by default at -O2 and higher and depends on 
-fdelete-null-pointer-checks also being enabled.

-fisolate-erroneous-paths-attribute
Detect paths that trigger erroneous or undefined behavior due to a null value 
being used in a way forbidden by a returns_nonnull or nonnull attribute. 
Isolate those paths from the main control flow and turn the statement with 
erroneous or undefined behavior into a trap. This is not currently enabled, but 
may be enabled by -O2 in the future.
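
(For illustration, the kind of code the first of these options acts on; a
minimal sketch, not taken from the GCC documentation:)

int
f (int *p)
{
  if (p == 0)
    return *p;  /* this path provably dereferences a null pointer; the pass
                   isolates it and turns the dereference into a trap */
  return *p + 1;
}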

From my understanding, the new option -fboundscheck as you proposed is similar 
to the above two options. Is my understanding correct? 


> 
>> 
>>> 
>>> I am a bit frustrated about the sanitizer.  On the
>>> one hand, it is not doing enough to get spatial memory
>>> safety even where this would be easily possible, on the
>>> other hand, is pedantic about things which are technically
>>> UB but not problematic and then one is prevented from
>>> using it. 
>> 
>> Yes, In order to make sanitizer better, both the above issues need to be 
>> addressed. 
>>> 
>>> When used in default mode, where execution continues, it
>>> also does not mix well with many warning, creates more code,
>>> and pulls in a libary dependency (and the library also depends
>>> on upstream choices / progress which seems a limitation for
>>> extensions).
>> 
>> Right, all these are existing issues with the current sanitizer. 
>>> 
>>> What IMHO would be ideal is a protection mode for spatial
>>> memory safety that simply adds traps (which then requires
>>> no library, has no issues with other warnings, and could
>>> evolve independently from clang)
>>> 
>>> So shouldn't we just add a -fboundscheck (which would 
>>> be like -fsanitize=bounds -fsanitize-trap=bounds just with
>>> more checking) and make it really good? I think many people
>>> would be very happy about this.
>> 
>> Then why not just fix the known issues in the current
>> -fsanitize=bounds -fsanitize-trap=bounds to make it better?
>> What’s the major benefit to add another new option? 
> 
> The question is how to fix this?  
> 
> At the moment the sanitizer is tied to a shared C++ library
> maintained elsewhere (I believe) with a design that ties
> every specific case to a specific entry point in this library.
> 
> So the UBsan handlers become part of an ABI that needs to
> be maintained and upgraded.  Also you need to reimplement
> this when using it somewhere we you can't have a C++ library.
> (I assume kernels or embedded platforms have all their
> own implementations).  If we add something, everything
> needs to be upgraded.  For 'counted_by' and 'bounds' you
> may get a way with the existing message.  

Okay, I see, Yes, that’s really a problem. 
Thanks for your explanation. 

Qing
> 
> 
> Martin



Backported to gcc 14 (7 patches mostly relating to diagnostics, SARIF, analyzer)

2024-11-20 Thread David Malcolm
I've backported the following 7 patches from trunk to releases/gcc-14:

* testsuite: fix analyzer C++ failures on Solaris [PR111475]
  https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650629.html
  * Trunk: r15-131-g5219414f3cde3c
  * gcc 14: r14-10951-g156051d083d91f

* regenerate-opt-urls.py: fix transposed values for "vax" and "v850"
  https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652914.html
  * Trunk: r15-872-g7cc529fe514cc6
  * gcc 14: r14-10952-g54504e8c704f4c

* diagnostics: fixes to SARIF output [PR109360]
  https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655380.html
  * Trunk: r15-1540-g9f4fdc3acebcf6
  * gcc 14: r14-10953-g07485ccd31935b

* testsuite: check that generated .sarif files validate against the
SARIF schema [PR109360]
  https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655381.html
  * Trunk: r15-1541-ga84fe222029ff2
  * gcc 14: r14-10954-gbf01dcd117ceab

* testsuite: use check-jsonschema for validating .sarif files
[PR109360]
  https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655674.html
  * Trunk: r15-1633-g17967907102099
  * gcc 14: r14-10955-gd5d62a38493be4

* analyzer: handle  at -O0 [PR115724]
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656403.html
  * Trunk: r15-1845-ga6fdb1a2a29061
  * gcc 14: r14-10956-g0f26f4f76961cd

* SARIF output: fix schema URL [§3.13.3, PR116603]
  https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662679.html
  * Trunk: r15-3553-g38dc2c64710aa0
  * gcc 14: r14-10957-g41344d6077953b

having successfully bootstrapped & regrtested them on x86_64-pc-linux-
gnu.

Dave



Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Qing Zhao


> On Nov 19, 2024, at 10:47, Marek Polacek  wrote:
> 
> On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
>> Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao:
>>> Hi,
>>> 
>>> I am working on extending “counted_by” attribute to pointers inside a 
>>> structure per our previous discussion. 
>>> 
>>> I need advice on the following question:
>>> 
>>> Should -fsanitize=bounds support array reference that was referenced through 
>>> a pointer that has counted_by attribute?
> 
> I don't see why it couldn't,

Okay, based on our discussion so far, looks like we all agree that it’s 
reasonable to extend the sanitizer to support the detection of out-of-bounds 
array reference that was referenced through a pointer field with counted_by 
attribute. 

Yes, I will implement this. 

> perhaps as part of -fsanitize=bounds-strict.
> Someone has to implement it, though.

Per the current documentation:

-fsanitize=bounds 
This option enables instrumentation of array bounds. Various out of bounds 
accesses are detected. Flexible array members, flexible array member-like 
arrays, and initializers of variables with static storage are not instrumented, 
with the exception of flexible array member-like arrays for which 
-fstrict-flex-arrays or -fstrict-flex-arrays= options or strict_flex_array 
attributes say they shouldn’t be treated like flexible array member-like arrays.


-fsanitize=bounds-strict
This option enables strict instrumentation of array bounds. Most out of bounds 
accesses are detected, including flexible array member-like arrays. 
Initializers of variables with static storage are not instrumented.

It looks like the only difference between -fsanitize=bounds and 
-fsanitize=bounds-strict is that -fsanitize=bounds-strict also instruments the 
flexible array member-like arrays that -fsanitize=bounds skips. 
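
(For example, a small illustration, not taken from the documentation:)

struct s {
  int n;
  int a[1];  /* trailing one-element array, treated as FAM-like by default */
};

int
get (struct s *p, int i)
{
  return p->a[i];  /* -fsanitize=bounds does not instrument this access;
                      -fsanitize=bounds-strict does */
}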

When the flexible array member is attached with “counted-by” attribute, 
-fsanitize=bounds will instrument the corresponding reference. 

Per our discussion so far, if treating the following

struct foo {
 int n;
 char *p __attribute__ ((counted_by (n)));
};

as an array with upper-bounds being “n” is reasonable.

Then, it’s reasonable to extend -fsanitize=bounds to instrument the 
corresponding reference for the pointer with
Counted-by attribute. 

What do you think?

Qing

> 
>> I think the question is what -fsanitize=bounds is meant to be.
>> 
>> I am a bit frustrated about the sanitizer.  On the
>> one hand, it is not doing enough to get spatial memory
>> safety even where this would be easily possible, on the
>> other hand, is pedantic about things which are technically
>> UB but not problematic and then one is prevented from
>> using it
>> 
>> When used in default mode, where execution continues, it
>> also does not mix well with many warnings, creates more code,
>> and pulls in a library dependency (and the library also depends
>> on upstream choices / progress which seems a limitation for
>> extensions).
>> 
>> What IMHO would be ideal is a protection mode for spatial
>> memory safety that simply adds traps (which then requires
>> no library, has no issues with other warnings, and could
>> evolve independently from clang) 
>> 
>> So shouldn't we just add a -fboundscheck (which would 
>> be like -fsanitize=bounds -fsanitize-trap=bounds just with
>> more checking) and make it really good? I think many people
>> would be very happy about this.
> 
> That's a separate concern.  We already have the -fbounds-check option,
> currently only used in Fortran (and D?), so perhaps we could make
> that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
> 
> Marek
> 



Re: [PATCH 17/17] testsuite: arm: Use effective-target for pr96939 test

2024-11-20 Thread Richard Earnshaw (lists)
On 20/11/2024 15:04, Torbjorn SVENSSON wrote:
> 
> 
> On 2024-11-20 15:53, Richard Earnshaw (lists) wrote:
>> On 20/11/2024 13:00, Torbjorn SVENSSON wrote:
>>>
>>>
>>> On 2024-11-19 18:57, Richard Earnshaw (lists) wrote:
 On 19/11/2024 10:24, Torbjörn SVENSSON wrote:
> Update test case to use -mcpu=unset/-march=unset feature introduced in
> r15-3606-g7d6c6a0d15c.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/arm/lto/pr96939_0.c: Use effective-target
>  arm_arch_v8a.
>  * gcc.target/arm/lto/pr96939_1.c: Remove dg-options.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>    gcc/testsuite/gcc.target/arm/lto/pr96939_0.c | 4 ++--
>    gcc/testsuite/gcc.target/arm/lto/pr96939_1.c | 1 -
>    2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c 
> b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
> index 241ffd5da0a..3bb74bd1a1d 100644
> --- a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
> +++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
> @@ -1,7 +1,7 @@
>    /* PR target/96939 */
>    /* { dg-lto-do link } */
> -/* { dg-require-effective-target arm_arch_v8a_ok } */
> -/* { dg-lto-options { { -flto -O2 } } } */
> +/* { dg-require-effective-target arm_arch_v8a_link } */
> +/* { dg-lto-options { { -flto -O2 -mcpu=unset -march=armv8-a+simd+crc } 
> } } */
>      extern unsigned crc (unsigned, const void *);
>    typedef unsigned (*fnptr) (unsigned, const void *);
> diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c 
> b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
> index 4afdbdaf5ad..c641b5580ab 100644
> --- a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
> +++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
> @@ -1,5 +1,4 @@
>    /* PR target/96939 */
> -/* { dg-options "-march=armv8-a+simd+crc" } */
>      #include 
>    

 I'm not sure this is right.  The PR talks about handling streaming in of 
 objects built with different options, which are supposed to be recorded in 
 the streaming data.  But your change alters what will be recorded AFAICT.
>>>
>>> I was unsure what path I should take to address this test case.
>>> Maybe we should go with the following:
>>>
>>> gcc.target/arm/lto/pr96939_0.c:
>>> /* { dg-lto-do link } */
>>> /* { dg-require-effective-target arm_arch_v8a_link } */
>>> /* { dg-lto-options { { -flto -O2 } } } */
>>>
>>> gcc.target/arm/lto/pr96939_1.c:
>>> /* { dg-options "-mcpu=unset -march=armv8-a+simd+crc -mfpu=auto" } */
>>>
>>
>> Yes, that looks about right.
>>
>>>
>>> Should I also define an effective-target for arm_arch_v8a_crc that checks 
>>> using -march=armv8-a+crc+simd -mfpu=auto -mfloat-abi=softfp and add 
>>> dg-r-e-t for it in the pr96939_0.c file? Or is it safe to assume that this 
>>> architecture is available if v8a is available?
>>
>> LTO tests are slightly special as the require multiple source files to be 
>> compiled in the test.  I don't think it would really work to have different 
>> effective targets for the _1.c files compared to the _0.c files.
> 
> No, it sounds strange to have different architectures flags.
> 
>>
>>>
>>> Keep in mind that I cannot rely on dg-add-otions in an LTO test.
>>> Do we want to run this is -mfloat-abi=softfp or -mfloat-abi=hard mode?
>>
>> I would just copy any -mfloat-abi value that exists in v8a_link (which by 
>> the looks of things is none).  The two files must be compiled with the same 
>> ABI or the link will fail.
> 
> So, should I add -mfloat-abi=softfp to the arm_arch_v8a effective-target 
> definition to allow the require check to fail for a hard-float-only build?

For the moment I'd try to avoid that.  It risks changing the behaviour of any 
existing tests that use that effective target.  I'd stick with not overriding 
the float ABI at all, as that provides the greatest chance that the link test 
will work with the available multilibs.  Nothing in this test really requires a 
specific float ABI, it's all integer code.

R.

> 
>>
>> We don't normally add many comments (other than dg- directives) to tests, 
>> but in this case one might be worthwhile about the need for the options to 
>> remain compatible at the ABI level.
> 
> I agree.
> 
> Kind regards,
> Torbjörn
>>
>> R.
>>
>>>
>>> Kind regards
>>> Torbjörn

 R.
>>>
>>
> 



Re: [PATCH] rtl-reader: Disable reuse_rtx support for generator building

2024-11-20 Thread Richard Sandiford
Andrew Pinski  writes:
> reuse_rtx is not documented, nor is the format for using it documented
> anywhere.  So it should not be supported for the .md files.
>
> This also fixes the problem where an invalid index supplied for reuse_rtx
> caused an ICE; a real error message is now emitted instead.  Note that since
> this code still uses atoi, an invalid index can still be used in some cases,
> but that is recorded as part of PR 44574.
>
> Note I did a grep of the sources to make sure that this is only used when
> reading RTL in GCC itself rather than while reading in .md files.
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
>   * read-md.h (class rtx_reader): Don't include m_reuse_rtx_by_id
>   when GENERATOR_FILE is defined.
>   * read-rtl.cc (rtx_reader::read_rtx_code): Disable reuse_rtx
>   support when GENERATOR_FILE is defined.

OK, thanks.

Richard

>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/read-md.h   | 2 ++
>  gcc/read-rtl.cc | 9 +++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/read-md.h b/gcc/read-md.h
> index 9703551a8fd..e613c42b724 100644
> --- a/gcc/read-md.h
> +++ b/gcc/read-md.h
> @@ -364,8 +364,10 @@ class rtx_reader : public md_reader
>/* Analogous to rtx_writer's m_in_call_function_usage.  */
>bool m_in_call_function_usage;
>  
> +#ifndef GENERATOR_FILE
>/* Support for "reuse_rtx" directives.  */
>auto_vec m_reuse_rtx_by_id;
> +#endif
>  };
>  
>  /* Global singleton; constrast with md_reader_ptr above.  */
> diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc
> index bfce806f9d6..630f9c59c37 100644
> --- a/gcc/read-rtl.cc
> +++ b/gcc/read-rtl.cc
> @@ -1672,7 +1672,6 @@ rtx_reader::read_rtx_code (const char *code_name)
>struct md_name name;
>rtx return_rtx;
>int c;
> -  long reuse_id = -1;
>  
>/* Linked list structure for making RTXs: */
>struct rtx_list
> @@ -1681,6 +1680,8 @@ rtx_reader::read_rtx_code (const char *code_name)
>rtx value; /* Value of this node.  */
>  };
>  
> +#ifndef GENERATOR_FILE
> +  long reuse_id = -1;
>/* Handle reuse_rtx ids e.g. "(0|scratch:DI)".  */
>if (ISDIGIT (code_name[0]))
>  {
> @@ -1696,10 +1697,12 @@ rtx_reader::read_rtx_code (const char *code_name)
>read_name (&name);
>unsigned idx = atoi (name.string);
>/* Look it up by ID.  */
> -  gcc_assert (idx < m_reuse_rtx_by_id.length ());
> +  if (idx >= m_reuse_rtx_by_id.length ())
> + fatal_with_file_and_line ("invalid reuse index %u", idx);
>return_rtx = m_reuse_rtx_by_id[idx];
>return return_rtx;
>  }
> +#endif
>  
>/* Handle "const_double_zero".  */
>if (strcmp (code_name, "const_double_zero") == 0)
> @@ -1727,12 +1730,14 @@ rtx_reader::read_rtx_code (const char *code_name)
>memset (return_rtx, 0, RTX_CODE_SIZE (code));
>PUT_CODE (return_rtx, code);
>  
> +#ifndef GENERATOR_FILE
>if (reuse_id != -1)
>  {
>/* Store away for later reuse.  */
>m_reuse_rtx_by_id.safe_grow_cleared (reuse_id + 1, true);
>m_reuse_rtx_by_id[reuse_id] = return_rtx;
>  }
> +#endif
>  
>/* Check for flags. */
>read_flags (return_rtx);


Re: [Committed] RISC-V: testsuite: restrict big endian test to non vector

2024-11-20 Thread Edwin Lu

Pushed.

Edwin

On 11/19/2024 1:11 PM, Jeff Law wrote:



On 11/19/24 2:08 PM, Edwin Lu wrote:

RISC-V vector currently does not support big endian, so the postcommit
testing was getting the "sorry, not implemented" error on vector targets.
Restrict the testcase to non-vector targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117595.c: Restrict to non vector targets.

OK
jeff



Re: [PATCH v1 3/4] Rename SEH functions for reuse in AArch64

2024-11-20 Thread Richard Sandiford
Evgeny Karpov  writes:
> From 4274d1126a1aa60d16dca1cbf7dde1c5ee344bf7 Mon Sep 17 00:00:00 2001
> From: Evgeny Karpov 
> Date: Fri, 15 Nov 2024 13:36:41 +0100
> Subject: [PATCH v1 3/4] Rename SEH functions for reuse in AArch64
>
> This patch renames functions related to SEH functionality. These
> functions will be reused in the aarch64-w64-mingw32 target.
>
> gcc/ChangeLog:
>
>   * config/i386/cygming.h (TARGET_ASM_FUNCTION_END_PROLOGUE):
>   Rename.
>   (TARGET_ASM_EMIT_EXCEPT_PERSONALITY): Likewise.
>   (TARGET_ASM_INIT_SECTIONS): Likewise.
>   (SUBTARGET_ASM_UNWIND_INIT): Likewise.
>   (ASM_DECLARE_FUNCTION_SIZE): Likewise.
>   (ASM_DECLARE_COLD_FUNCTION_SIZE): Likewise.
>   * config/i386/i386-protos.h (i386_pe_end_function):
>   Remove declarations.
>   (i386_pe_end_cold_function): Likewise.
>   (i386_pe_seh_init): Likewise.
>   (i386_pe_seh_end_prologue): Likewise.
>   (i386_pe_seh_cold_init): Likewise.
>   (i386_pe_seh_emit_except_personality): Likewise.
>   (i386_pe_seh_init_sections): Likewise.
>   * config/mingw/winnt.cc (i386_pe_seh_init): Rename into ...
>   (mingw_pe_seh_init): ... this.
>   (i386_pe_seh_end_prologue): Rename into ...
>   (mingw_pe_seh_end_prologue): ... this.
>   (i386_pe_seh_cold_init): Rename into ...
>   (mingw_pe_seh_cold_init): ... this.
>   (i386_pe_seh_fini): Rename into ...
>   (mingw_pe_seh_fini): ... this.
>   (i386_pe_seh_emit_except_personality): Rename into ...
>   (mingw_pe_seh_emit_except_personality): ... this.
>   (i386_pe_seh_init_sections): Rename into ...
>   (mingw_pe_seh_init_sections): ... this.
>   (i386_pe_end_function): Rename into ...
>   (mingw_pe_end_function): ... this.
>   (i386_pe_end_cold_function): Rename into ...
>   (mingw_pe_end_cold_function): ... this.
>   * config/mingw/winnt.h (mingw_pe_end_cold_function):
>   Add declarations.
>   (mingw_pe_end_function): Likewise.
>   (mingw_pe_seh_init): Likewise.
>   (mingw_pe_seh_init_sections): Likewise.
>   (mingw_pe_seh_cold_init): Likewise.
>   (mingw_pe_seh_emit_except_personality): Likewise.
>   (mingw_pe_seh_end_prologue): Likewise.
>
> libgcc/ChangeLog:
>
>   * config.host: Update.
>   * config/i386/t-seh-eh: Move to...
>   * config/mingw/t-seh-eh: ...here.

OK, thanks.  (As usual, I can only comment on GCC internals, rather
than mingw-specific details.)

Richard

> ---
>  gcc/config/i386/cygming.h  | 14 +++---
>  gcc/config/i386/i386-protos.h  |  7 ---
>  gcc/config/mingw/winnt.cc  | 22 +++---
>  gcc/config/mingw/winnt.h   |  8 
>  libgcc/config.host |  4 ++--
>  libgcc/config/{i386 => mingw}/t-seh-eh |  0
>  6 files changed, 28 insertions(+), 27 deletions(-)
>  rename libgcc/config/{i386 => mingw}/t-seh-eh (100%)
>
> diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
> index 7a97d02b81b..a86f87a7535 100644
> --- a/gcc/config/i386/cygming.h
> +++ b/gcc/config/i386/cygming.h
> @@ -44,12 +44,12 @@ along with GCC; see the file COPYING3.  If not see
>  #undef  TARGET_ASM_UNWIND_EMIT_BEFORE_INSN
>  #define TARGET_ASM_UNWIND_EMIT_BEFORE_INSN  false
>  #undef  TARGET_ASM_FUNCTION_END_PROLOGUE
> -#define TARGET_ASM_FUNCTION_END_PROLOGUE  i386_pe_seh_end_prologue
> +#define TARGET_ASM_FUNCTION_END_PROLOGUE  mingw_pe_seh_end_prologue
>  #undef  TARGET_ASM_EMIT_EXCEPT_PERSONALITY
> -#define TARGET_ASM_EMIT_EXCEPT_PERSONALITY 
> i386_pe_seh_emit_except_personality
> +#define TARGET_ASM_EMIT_EXCEPT_PERSONALITY 
> mingw_pe_seh_emit_except_personality
>  #undef  TARGET_ASM_INIT_SECTIONS
> -#define TARGET_ASM_INIT_SECTIONS  i386_pe_seh_init_sections
> -#define SUBTARGET_ASM_UNWIND_INIT  i386_pe_seh_init
> +#define TARGET_ASM_INIT_SECTIONS  mingw_pe_seh_init_sections
> +#define SUBTARGET_ASM_UNWIND_INIT  mingw_pe_seh_init
>  
>  #undef DEFAULT_ABI
>  #define DEFAULT_ABI (TARGET_64BIT ? MS_ABI : SYSV_ABI)
> @@ -310,18 +310,18 @@ do {\
>do \
>  {\
>mingw_pe_declare_type (FILE, NAME, 0, 1);  \
> -  i386_pe_seh_cold_init (FILE, NAME);\
> +  mingw_pe_seh_cold_init (FILE, NAME);   \
>ASM_OUTPUT_LABEL (FILE, NAME); \
>  }\
>while (0)
>  
>  #undef ASM_DECLARE_FUNCTION_SIZE
>  #define ASM_DECLARE_FUNCTION_SIZE(FILE,NAME,DECL) \
> -  i386_pe_end_function (FILE, NAME, DECL)
> +  mingw_pe_end_function (FILE, NAME, DECL)
>  
>  #undef ASM_DECLARE_COLD_FUNCTION_SIZE
>  #define ASM_DECLARE_COLD_FUNCTION_SIZE(FILE,NAME,DECL) \
> -  i386_pe_end_cold_function (FILE, NAME, DECL)
> +  mingw_pe_end_cold_function (

[PATCH] tree-optimization/117355: object size for PHI nodes with negative offsets

2024-11-20 Thread Siddhesh Poyarekar
When the object size estimate is returned for a PHI node, it is the
maximum possible value, which is fine in isolation.  When combined with
negative offsets, however, it may sometimes end up as a zero size because
the resultant size was larger than the wholesize, leading
size_for_offset to conclude that there is a potential underflow.  Fix
this by adding a non-strict mode to size_for_offset, which
conservatively returns the size (or wholesize) in the case of a negative
offset.
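
As a rough walk-through of the new g++ testcase below (the specific numbers
are inferred from the test, not from a compiler trace): q1 walks through
"char line[256]", so its value is a PHI whose size estimate is the
conservative maximum, the whole 256-byte buffer, which also equals the
wholesize.  The access q1 - 2 then has a negative offset; the strict
size_for_offset saw a candidate size larger than the wholesize, treated it
as a potential underflow and returned 0, which is exactly what the new test
now rejects by aborting when __builtin_object_size (q1 - 2, 0) is 0.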

gcc/ChangeLog:

PR tree-optimization/117355
* tree-object-size.cc (size_for_offset): New argument STRICT,
return SZ if it is set to false.
(plus_stmt_object_size): Adjust call to SIZE_FOR_OFFSET.

gcc/testsuite/ChangeLog:

PR tree-optimization/117355
* g++.dg/ext/builtin-object-size2.C (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c (test10): Adjust expected size.

Signed-off-by: Siddhesh Poyarekar 
---
Testing:

- bootstrapped on x86_64
- tested on i686, no new regressions
- bootstrap with config-ubsan in progress

 .../g++.dg/ext/builtin-object-size2.C | 27 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  2 +-
 gcc/tree-object-size.cc   | 28 +++
 3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/builtin-object-size2.C 
b/gcc/testsuite/g++.dg/ext/builtin-object-size2.C
index 7a8f4e62733..45401b5a9c1 100644
--- a/gcc/testsuite/g++.dg/ext/builtin-object-size2.C
+++ b/gcc/testsuite/g++.dg/ext/builtin-object-size2.C
@@ -406,6 +406,32 @@ test8 (union F *f)
 FAIL ();
 }
 
+// PR117355
+#define STR "bbb"
+
+void
+__attribute__ ((noinline))
+test9 (void)
+{
+  char line[256];
+  const char *p = STR;
+  const char *q = p + sizeof (STR) - 1;
+
+  char *q1 = line;
+  for (const char *p1 = p; p1 < q;)
+{
+  *q1++ = *p1++;
+
+  if (p1 < q && (*q1++ = *p1++) != '\0')
+   {
+ if (__builtin_object_size (q1 - 2, 0) == 0)
+   __builtin_abort ();
+ if (__builtin_object_size (q1 - 2, 1) == 0)
+   __builtin_abort ();
+   }
+}
+}
+
 int
 main (void)
 {
@@ -430,5 +456,6 @@ main (void)
   union F f, *fp = &f;
   __asm ("" : "+r" (fp));
   test8 (fp);
+  test9 ();
   DONE ();
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-3.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
index ec2c62c9640..e0c967e003f 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-3.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
@@ -619,7 +619,7 @@ test10 (void)
  if (__builtin_object_size (p - 3, 2) != sizeof (buf) - i + 3)
FAIL ();
 #else
- if (__builtin_object_size (p - 3, 2) != 0)
+ if (__builtin_object_size (p - 3, 2) != 3)
FAIL ();
 #endif
  break;
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 09aad88498e..6413ebcca37 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -344,7 +344,8 @@ init_offset_limit (void)
be positive and hence, be within OFFSET_LIMIT for valid offsets.  */
 
 static tree
-size_for_offset (tree sz, tree offset, tree wholesize = NULL_TREE)
+size_for_offset (tree sz, tree offset, tree wholesize = NULL_TREE,
+bool strict = true)
 {
   gcc_checking_assert (types_compatible_p (TREE_TYPE (sz), sizetype));
 
@@ -377,9 +378,17 @@ size_for_offset (tree sz, tree offset, tree wholesize = 
NULL_TREE)
return sz;
 
   /* Negative or too large offset even after adjustment, cannot be within
-bounds of an object.  */
+bounds of an object.  The exception here is when the base object size
+has been overestimated (e.g. through PHI nodes or a COND_EXPR) and the
+adjusted offset remains negative.  If the caller wants to be
+permissive, return the base size.  */
   if (compare_tree_int (offset, offset_limit) > 0)
-   return size_zero_node;
+   {
+ if (strict)
+   return size_zero_node;
+ else
+   return sz;
+   }
 }
 
   return size_binop (MINUS_EXPR, size_binop (MAX_EXPR, sz, offset), offset);
@@ -1521,16 +1530,23 @@ plus_stmt_object_size (struct object_size_info *osi, 
tree var, gimple *stmt)
  addr_object_size (osi, op0, object_size_type, &bytes, &wholesize);
}
 
+  bool pos_offset = (size_valid_p (op1, 0)
+&& compare_tree_int (op1, offset_limit) <= 0);
+
   /* size_for_offset doesn't make sense for -1 size, but it does for size 0
 since the wholesize could be non-zero and a negative offset could give
 a non-zero size.  */
   if (size_unknown_p (bytes, 0))
;
+  /* In the static case, we want SIZE_FOR_OFFSET to go a bit easy on us if
+it sees a negative offset since BYTES could have been
+overestimated.  */
   else if ((object_size_type & OST_DYNAMIC)
   || bytes != whol

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Martin Uecker
Am Mittwoch, dem 20.11.2024 um 17:37 + schrieb Qing Zhao:
> Hi, Martin,
> 
> Thanks a lot for pointing this out. 
> 
> This does look like a problem we need to avoid for the pointer arrays.
> 
> Does  the same problem exist in the language extension too if the n is 
> allowed to be changed after initialization?
> 
> If so, for the future language extension, is there any proposed solution to 
> this problem? 
> 

There is no specification yet and nothing formally proposed, so
it is entirely unclear at this point.

My idea would be to give 'x->buf' the type '(char(*)[x->n])'
where 'x->n' is loaded at the same time 'x->buf' is accessed
and then the value is frozen (like in the simpler versions
of 'counted_by' you had implemented first).  Of course, one
would then have to set x->n before setting the buffer (or
at the same time). This could be ensured by making the
member 'n' const, so that it can only be changed by
overwriting the whole struct. But I am still thinking
about this.
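
A rough sketch of that idea (hypothetical semantics, nothing here is
implemented or specified):

struct foo {
  const int n;   /* const: can only change by overwriting the whole
                    struct, so it stays in sync with buf  */
  char *buf;     /* proposed: would behave as char (*)[x->n], with x->n
                    loaded and frozen at the point buf is accessed  */
};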

In any case, I think for "counted_by" this is not an option
because it would be confusing if it works differently
than for the FAM case. 

Martin


> Qing
> > On Nov 20, 2024, at 10:52, Martin Uecker  wrote:
> > 
> > Am Mittwoch, dem 20.11.2024 um 15:27 + schrieb Qing Zhao:
> > > 
> > > > On Nov 19, 2024, at 10:47, Marek Polacek  wrote:
> > > > 
> > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> > > > > Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao:
> > > > > > Hi,
> > > 
> > ..
> > 
> > Hi Qing,
> > 
> > > Per our discussion so far, if treating the following
> > > 
> > > struct foo {
> > > int n;
> > > char *p __attribute__ ((counted_by (n)));
> > > };
> > > 
> > > as an array with upper-bounds being “n” is reasonable.
> > 
> > There is one issue I want to point out, which I just realized during
> > a discussion in WG14.  For "counted_by" we defined the semantics
> in a way that a later store to 'n' will be taken into account. 
> > We did this to support the use case where 'n' is set after
> > the access to 'p'.
> > 
> > struct foo *x = ;
> > 
> > char *q = x->p;
> > x->n = 100; // this should apply
> > 
> > 
> > For FAMs this is fine, because it is a fixed part
> > of the struct.  But for the new pointer case, I think this is
> > problematic.  Consider this example:
> > 
> > struct foo *x = allocate_buffer(100);
> > 
> > where x->n is set to the right value in the allocation function.
> > 
> > Now let's continue with
> > 
> > char *q = x->p;
> > 
> > x = allocate_buffer(50);
> > // x->n is set to 50.
> > 
> > Now q refers to the old buffer, but x->n to the size of the new
> > buffer.  That does not seem right and scares me a little bit.
> > 
> > 
> > 
> > Martin
> > 
> > 
> > > 
> > > Then, it’s reasonable to extend -fsanitize=bounds to instrument the 
> > > corresponding reference for the pointer with
> > > Counted-by attribute. 
> > > 
> > > What do you think?
> > > 
> > > Qing
> > > 
> > > > 
> > > > > I think the question is what -fsanitize=bounds is meant to be.
> > > > > 
> > > > > I am a bit frustrated about the sanitizer.  On the
> > > > > one hand, it is not doing enough to get spatial memory
> > > > > safety even where this would be easily possible, on the
> > > > > other hand, it is pedantic about things which are technically
> > > > > UB but not problematic, and then one is prevented from
> > > > > using it
> > > > > 
> > > > > When used in default mode, where execution continues, it
> > > > > also does not mix well with many warnings, creates more code,
> > > > > and pulls in a libary dependency (and the library also depends
> > > > > on upstream choices / progress which seems a limitation for
> > > > > extensions).
> > > > > 
> > > > > What IMHO would be ideal is a protection mode for spatial
> > > > > memory safety that simply adds traps (which then requires
> > > > > no library, has no issues with other warnings, and could
> > > > > evolve independently from clang) 
> > > > > 
> > > > > So shouldn't we just add a -fboundscheck (which would 
> > > > > be like -fsanitize=bounds -fsanitize-trap=bounds just with
> > > > > more checking) and make it really good? I think many people
> > > > > would be very happy about this.
> > > > 
> > > > That's a separate concern.  We already have the -fbounds-check option,
> > > > currently only used in Fortran (and D?), so perhaps we could make
> > > > that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
> > > > 
> > > > Marek
> > > > 
> > > 
> > 
> 



[pushed][PR116587][LRA]: Fix last chance reload pseudo allocation

2024-11-20 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116587

The patch was successfully tested and bootstrapped on x86-64, ppc64le, 
aarch64.


commit 56fc6a6d9edc9f9170285ef31c7f312608fad88c
Author: Vladimir N. Makarov 
Date:   Wed Nov 20 14:25:41 2024 -0500

[PR116587][LRA]: Fix last chance reload pseudo allocation

On i686 PR116587 test compilation resulted in LRA failure to find
registers for a reload insn pseudo.  The insn requires 6 regs for 4
reload insn pseudos where two of them require 2 regs each.  But we
have only 5 free regs as sp is a fixed reg, bp is fixed because of
-fno-omit-frame-pointer, bx is assigned to pic_offset_table_pseudo
because of -fPIC.  LRA spills pic_offset_table_pseudo as the last
chance approach to allocate registers to the reload pseudo.  Although
it makes 2 free registers for the unallocated reload pseudo requiring
also 2 regs, the pseudo still can not be allocated as the 2 free regs
are disjoint.  The patch spills all pseudos conflicting with the
unallocated reload pseudo including already allocated reload insn
pseudos, then standard LRA code allocates spilled pseudos requiring
more one register first and avoid situation of the disjoint regs for
reload pseudos requiring more one reg.

gcc/ChangeLog:

PR target/116587
* lra-assigns.cc (find_all_spills_for): Consider all pseudos whose
classes intersect given pseudo class.

gcc/testsuite/ChangeLog:

PR target/116587
* gcc.target/i386/pr116587.c: New test.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index bcd7967ec7d..0a14bde5e74 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1362,14 +1362,7 @@ find_all_spills_for (int regno)
 	{
 	  if (live_pseudos_reg_renumber[r2->regno] >= 0
 		  && ! sparseset_bit_p (live_range_hard_reg_pseudos, r2->regno)
-		  && rclass_intersect_p[regno_allocno_class_array[r2->regno]]
-		  && ((int) r2->regno < lra_constraint_new_regno_start
-		  || bitmap_bit_p (&lra_inheritance_pseudos, r2->regno)
-		  || bitmap_bit_p (&lra_split_regs, r2->regno)
-		  || bitmap_bit_p (&lra_optional_reload_pseudos, r2->regno)
-		  /* There is no sense to consider another reload
-			 pseudo if it has the same class.  */
-		  || regno_allocno_class_array[r2->regno] != rclass))
+		  && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
 		sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
 	}
 	}
diff --git a/gcc/testsuite/gcc.target/i386/pr116587.c b/gcc/testsuite/gcc.target/i386/pr116587.c
new file mode 100644
index 000..092830002d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116587.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIC -mstackrealign -mavx512f -fharden-control-flow-redundancy -fno-omit-frame-pointer -mbmi -fkeep-gc-roots-live" } */
+
+typedef __UINT64_TYPE__ a;
+int b;
+struct c {
+  a d;
+};
+extern char e[];
+int f;
+void g();
+char *h(struct c *i, a d) {
+  while (b) {
+if ((i->d & d) == i->d) {
+  if (f)
+g();
+  g();
+  d &= ~i->d;
+}
+++i;
+  }
+  if (d)
+g();
+  if (f)
+return "";
+  return e;
+}


Re: [PATCH] libgccjit: Support signed char flag

2024-11-20 Thread David Malcolm
On Thu, 2024-02-22 at 15:29 -0500, Antoni Boucher wrote:
> Thanks for the review and idea.

Thanks for the updated patch; sorry about the delay in reviewing.

> 
> Here's the updated patch. I added a test, but I could not set
> -fsigned-char as this is not an option accepted by the jit frontend.
> However, the test still works in the sense that it fails without this
> patch and passes with it.
> I'm just wondering if it would pass on all targets or if I should add a
> target filtering directive to only execute on some targets.
> What do you think?

The test looks good to me, I don't think it needs a target filtering
directive since presumably any target on which it already passed
without the patch will still pass with the patch.

The patch is OK for trunk; thanks.
Dave



> 
> On Tue, 2024-01-09 at 11:01 -0500, David Malcolm wrote:
> > On Thu, 2023-12-21 at 08:42 -0500, Antoni Boucher wrote:
> > > Hi.
> > > This patch adds support for the -fsigned-char flag.
> > 
> > Thanks.  The patch looks correct to me.
> > 
> > > I'm not sure how to test it since I stumbled upon this bug when I
> > > found
> > > this other bug
> > > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863)
> > > which is now fixed.
> > > Any idea how I could test this patch?
> > 
> > We already document that GCC_JIT_TYPE_CHAR has "some signedness". 
> > The
> > bug being fixed here is that gcc_jit_context compilations were
> > always
> > treating "char" as unsigned, regardless of the value of -fsigned-
> > char
> > (either from the target's default, or as a context option), when it
> > makes more sense to follow the C frontend's behavior.
> > 
> > So perhaps jit-written code with a context that has -fsigned-char
> > as
> > an
> > option (via gcc_jit_context_add_command_line_option), and which
> > promotes a negative char to a signed int, and then returns the
> > result
> > as an int?  Presumably if we're erroneously forcing "char" to be
> > unsigned, the int will be in the range 0x80 to 0xff, rather than
> > being negative.
> > 
> > Dave
> > 
> 
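
For reference, a hypothetical sketch of the kind of test described in the
quoted suggestion above (this is not the committed testcase, just an
illustration of its shape):

#include <libgccjit.h>
#include <stdlib.h>

int
main (void)
{
  gcc_jit_context *ctxt = gcc_jit_context_acquire ();
  gcc_jit_context_add_command_line_option (ctxt, "-fsigned-char");

  gcc_jit_type *char_type = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_CHAR);
  gcc_jit_type *int_type = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);

  /* int f (char c) { return (int) c; }  */
  gcc_jit_param *c = gcc_jit_context_new_param (ctxt, NULL, char_type, "c");
  gcc_jit_function *f
    = gcc_jit_context_new_function (ctxt, NULL, GCC_JIT_FUNCTION_EXPORTED,
                                    int_type, "f", 1, &c, 0);
  gcc_jit_block *blk = gcc_jit_function_new_block (f, NULL);
  gcc_jit_block_end_with_return
    (blk, NULL,
     gcc_jit_context_new_cast (ctxt, NULL, gcc_jit_param_as_rvalue (c),
                               int_type));

  gcc_jit_result *result = gcc_jit_context_compile (ctxt);
  int (*fn) (char) = (int (*) (char)) gcc_jit_result_get_code (result, "f");
  /* With -fsigned-char honoured this is -5; it would be 251 if char were
     still forced to be unsigned.  */
  if (fn ((char) -5) != -5)
    abort ();

  gcc_jit_result_release (result);
  gcc_jit_context_release (ctxt);
  return 0;
}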



Re: [PATCH v2 04/14] tree-phinodes: Use 4 instead of 2 as the minimum number of phi args

2024-11-20 Thread Richard Biener
On Tue, Nov 19, 2024 at 5:46 PM Lewis Hyatt  wrote:
>
> On Tue, Nov 19, 2024 at 9:59 AM Richard Biener
>  wrote:
> >
> > On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt  wrote:
> > >
> > > Currently, when we allocate a gphi object, we round up the capacity for 
> > > the
> > > trailing arguments array such that it will make full use of the page size
> > > that ggc will allocate. While there is also an explicit minimum of 2
> > > arguments, in practice after rounding to the ggc page size there is always
> > > room for at least 4.
> > >
> > > It seems we have some code that has come to depend on there being this 
> > > much
> > > room before reallocation of a PHI is required. For example, the function
> > > loop_version () used during loop optimization will make sure there is room
> > > for an additional edge on each PHI that it processes. But there are call
> > > sites which cache a PHI pointer prior to calling loop_version () and 
> > > assume
> > > it remains valid afterward, thus implicitly assuming that the PHI will 
> > > have
> > > spare capacity. Examples include split_loop () and gen_parallel_loop ().
> > >
> > > This works fine now, but if the size of a gphi becomes larger, e.g. due to
> > > configuring location_t to be a 64-bit type, then on 32-bit platforms it 
> > > ends
> > > up being possible to get a gphi with only 2 arguments of capacity, causing
> > > the above call sites of loop_version () to fail. (They store a pointer to 
> > > a
> > > gphi object that no longer has the same meaning it did before it got
> > > reallocated.) The testcases gcc.dg/torture/pr113707-2.c and
> > > gcc.dg/graphite/pr81945.c exhibit that failure mode.
> > >
> > > It may be necessary to adjust those call sites to make this more robust, 
> > > but
> > > in the meantime, changing the minimum from 2 to 4 does no harm given the
> > > minimum is practically 4 anyway, and it resolves the issue for 32-bit
> > > platforms.
> >
> > We need to fix the users.  Note ideal_phi_node_len rounds up to a power of 
> > two
> > but extra_order_size_table also has MAX_ALIGNMENT * n with n from 1 to 16
> > buckets, so such extensive rounding up is not needed.
> >
> > The cache is also quite useless this way (I didn't fix this when last 
> > working
> > there).
> >
>
> Adjusting the call sites definitely sounds right, but I worry it's
> potentially a big change?

I already had to fix up quite a few places because gphis can now be
ggc_free()d when removed or re-allocated.  It seems you simply uncovered
more of those places.

> So one of the call sites that caused problems here was around line 620
> in tree-ssa-loop-split.cc:
>
>  /* Find a loop PHI node that defines guard_iv directly,
>or create one doing that.  */
> gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
> if (!phi)
>
> It remembers "phi" and reuses it later when it might have been
> invalidated. That one is easy to fix, find_or_create_guard_phi() can
> just be called again later.

One trick is to instead remember the PHI result SSA variable and
later check its SSA_NAME_DEF_STMT which will be the re-allocated
PHI.  I think this should work almost everywhere for the re-allocation issue.
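
For example (a sketch of the pattern, not a hunk from any actual patch):

  /* Instead of caching the gphi * across code that may re-allocate it,
     remember the PHI result and re-fetch the (possibly re-allocated) PHI
     from its definition statement afterwards.  */
  tree res = gimple_phi_result (phi);
  /* ... code such as loop_version () that may re-allocate the PHI ... */
  phi = as_a <gphi *> (SSA_NAME_DEF_STMT (res));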

> But the other one I ran into with testing was in tree-parloops.cc:
>
> /* Element of the hashtable, representing a
>reduction in the current loop.  */
> struct reduction_info
> {
>   gimple *reduc_stmt;   /* reduction statement.  */
>   gimple *reduc_phi;/* The phi node defining the reduction.  */
>   enum tree_code reduction_code;/* code for the reduction operation.  */
>   unsigned reduc_version;   /* SSA_NAME_VERSION of original reduc_phi
>result.  */
>   gphi *keep_res;   /* The PHI_RESULT of this phi is the
> resulting value
>of the reduction variable when
> existing the loop. */
>   tree initial_value;   /* The initial value of the reduction
> var before entering the loop.  */
>   tree field;   /*  the name of the field in the
> parloop data structure intended for reduction.  */
>   tree reduc_addr;  /* The address of the reduction variable for
>openacc reductions.  */
>   tree init;/* reduction initialization value.  */
>   gphi *new_phi;/* (helper field) Newly created phi
> node whose result
>will be passed to the atomic
> operation.  Represents
>the local result each thread
> computed for the reduction
>operation.  */
> };
>
> It keeps a hash map of these throughout the pass and the whole design
> is predicated on those pointers always remaining valid, so I think
> this would need extensive redesign? I did not look into all the
> details there though.
>
> Once I saw that, I tried the approach shown here of "just" changing
> the

Re: [PATCH v2 01/14] Support for 64-bit location_t: libcpp part 1

2024-11-20 Thread Richard Biener
On Sun, Nov 17, 2024 at 4:24 AM Lewis Hyatt  wrote:
>
> Prepare libcpp to support 64-bit location_t, without yet making
> any functional changes, by adding new typedefs that enable code to be
> written such that it works with any size location_t. Update the usage of
> line maps within libcpp accordingly.
>
> Subsequent patches will prepare the rest of the codebase similarly, and then
> afterwards, location_t will be changed to uint64_t.

This is OK if there's no comment from libcpp maintainers this week.

Thanks,
Richard.

> libcpp/ChangeLog:
>
> * include/line-map.h (line_map_uint_t): New typedef, the same type
> as location_t.
> (location_diff_t): New typedef.
> (line_map_suggested_range_bits): New constant.
> (struct maps_info_ordinary): Change member types from "unsigned int"
> to "line_map_uint_t".
> (struct maps_info_macro): Likewise.
> (struct location_adhoc_data_map): Likewise.
> (LINEMAPS_ALLOCATED): Change return type from "unsigned int" to
> "line_map_uint_t".
> (LINEMAPS_ORDINARY_ALLOCATED): Likewise.
> (LINEMAPS_MACRO_ALLOCATED): Likewise.
> (LINEMAPS_USED): Likewise.
> (LINEMAPS_ORDINARY_USED): Likewise.
> (LINEMAPS_MACRO_USED): Likewise.
> (linemap_lookup_macro_index): Likewise.
> (LINEMAPS_MAP_AT): Change argument type from "unsigned int" to
> "line_map_uint_t".
> (LINEMAPS_ORDINARY_MAP_AT): Likewise.
> (LINEMAPS_MACRO_MAP_AT): Likewise.
> (line_map_new_raw): Likewise.
> (linemap_module_restore): Likewise.
> (linemap_dump): Likewise.
> (line_table_dump): Likewise.
> (LINEMAPS_LAST_MAP): Add a linemap_assert() for safety.
> (SOURCE_COLUMN): Use a cast to ensure correctness if location_t
> becomes a 64-bit type.
> * line-map.cc (location_adhoc_data_hash): Don't truncate to 32-bit
> prematurely when hashing.
> (line_maps::get_or_create_combined_loc): Adapt types to support
> potentially 64-bit location_t. Use MAX_LOCATION_T rather than a
> hard-coded constant.
> (line_maps::get_range_from_loc): Adapt types and constants to
> support potentially 64-bit location_t.
> (line_maps::pure_location_p): Likewise.
> (line_maps::get_pure_location): Likewise.
> (line_map_new_raw): Likewise.
> (LAST_SOURCE_LINE_LOCATION): Likewise.
> (linemap_add): Likewise.
> (linemap_module_restore): Likewise.
> (linemap_line_start): Likewise.
> (linemap_position_for_column): Likewise.
> (linemap_position_for_line_and_column): Likewise.
> (linemap_position_for_loc_and_offset): Likewise.
> (linemap_ordinary_map_lookup): Likewise.
> (linemap_lookup_macro_index): Likewise.
> (linemap_dump): Likewise.
> (linemap_dump_location): Likewise.
> (linemap_get_file_highest_location): Likewise.
> (line_table_dump): Likewise.
> (linemap_compare_locations): Avoid signed int overflow in the result.
> * macro.cc (num_expanded_macros_counter): Change type of global
> variable from "unsigned int" to "line_map_uint_t".
> (num_macro_tokens_counter): Likewise.
> ---
>  libcpp/include/line-map.h |  86 ++--
>  libcpp/line-map.cc| 138 ++
>  libcpp/macro.cc   |   4 +-
>  3 files changed, 130 insertions(+), 98 deletions(-)
>
> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index 732ec5e6445..96fdf60644f 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h
> @@ -87,9 +87,9 @@ enum lc_reason
> gcc there is a single line_maps instance: "line_table", declared in
> gcc/input.h and defined in gcc/input.cc.
>
> -   The values of the keys are intended to be internal to libcpp,
> -   but for ease-of-understanding the implementation, they are currently
> -   assigned as follows:
> +   The values of the keys are intended to be internal to libcpp, but for
> +   ease-of-understanding the implementation, they are currently assigned as
> +   follows in the case of 32-bit location_t:
>
>Actual | Value | Meaning
>---+---+---
> @@ -292,6 +292,12 @@ enum lc_reason
> To further see how location_t works in practice, see the
> worked example in libcpp/location-example.txt.  */
>  typedef unsigned int location_t;
> +typedef int64_t location_diff_t;
> +
> +/* Sometimes we need a type that has the same size as location_t but that 
> does
> +   not represent a location.  This typedef provides more clarity in those
> +   cases.  */
> +typedef location_t line_map_uint_t;
>
>  /* Do not track column numbers higher than this one.  As a result, the
> range of column_bits is [12, 18] (or 0 if column numbers are
> @@ -311,6 

Re: [PATCH v2 1/3] cfgexpand: Factor out getting the stack decl index

2024-11-20 Thread Richard Biener
On Sat, Nov 16, 2024 at 5:27 AM Andrew Pinski  wrote:
>
> This is the first patch in improving this code.
> Since there are a few places which get the index and they
> check the same thing, let's factor that out into one function.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * cfgexpand.cc (INVALID_STACK_INDEX): New defined.
> (decl_stack_index): New function.
> (visit_op): Use decl_stack_index.
> (visit_conflict): Likewise.
> (add_scope_conflicts_1): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/cfgexpand.cc | 62 +---
>  1 file changed, 37 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index ed890f692e5..b88e8827667 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -338,6 +338,8 @@ static unsigned stack_vars_alloc;
>  static unsigned stack_vars_num;
>  static hash_map *decl_to_stack_part;
>
> +#define INVALID_STACK_INDEX ((unsigned)-1)
> +
>  /* Conflict bitmaps go on this obstack.  This allows us to destroy
> all of them in one big sweep.  */
>  static bitmap_obstack stack_var_bitmap_obstack;
> @@ -526,6 +528,27 @@ stack_var_conflict_p (unsigned x, unsigned y)
>return bitmap_bit_p (a->conflicts, y);
>  }
>
> +/* Returns the DECL's index into the stack_vars array.
> +   If the DECL does not exist return INVALID_STACK_INDEX.  */
> +static unsigned
> +decl_stack_index (tree decl)
> +{
> +  if (!decl)
> +return INVALID_STACK_INDEX;
> +  if (!DECL_P (decl))
> +return INVALID_STACK_INDEX;
> +  if (DECL_RTL_IF_SET (decl) != pc_rtx)
> +return INVALID_STACK_INDEX;
> +  unsigned *v = decl_to_stack_part->get (decl);
> +  if (!v)
> +return INVALID_STACK_INDEX;
> +
> +  unsigned indx = *v;
> +  gcc_checking_assert (indx != INVALID_STACK_INDEX);
> +  gcc_checking_assert (indx < stack_vars_num);
> +  return indx;
> +}
> +
>  /* Callback for walk_stmt_ops.  If OP is a decl touched by add_stack_var
> enter its partition number into bitmap DATA.  */
>
> @@ -534,14 +557,9 @@ visit_op (gimple *, tree op, tree, void *data)
>  {
>bitmap active = (bitmap)data;
>op = get_base_address (op);
> -  if (op
> -  && DECL_P (op)
> -  && DECL_RTL_IF_SET (op) == pc_rtx)
> -{
> -  unsigned *v = decl_to_stack_part->get (op);
> -  if (v)
> -   bitmap_set_bit (active, *v);
> -}
> +  unsigned idx = decl_stack_index (op);
> +  if (idx != INVALID_STACK_INDEX)
> +bitmap_set_bit (active, idx);
>return false;
>  }
>
> @@ -554,20 +572,15 @@ visit_conflict (gimple *, tree op, tree, void *data)
>  {
>bitmap active = (bitmap)data;
>op = get_base_address (op);
> -  if (op
> -  && DECL_P (op)
> -  && DECL_RTL_IF_SET (op) == pc_rtx)
> +  unsigned num = decl_stack_index (op);
> +  if (num != INVALID_STACK_INDEX
> +  && bitmap_set_bit (active, num))
>  {
> -  unsigned *v = decl_to_stack_part->get (op);
> -  if (v && bitmap_set_bit (active, *v))
> -   {
> - unsigned num = *v;
> - bitmap_iterator bi;
> - unsigned i;
> - gcc_assert (num < stack_vars_num);
> - EXECUTE_IF_SET_IN_BITMAP (active, 0, i, bi)
> -   add_stack_var_conflict (num, i);
> -   }
> +  bitmap_iterator bi;
> +  unsigned i;
> +  gcc_assert (num < stack_vars_num);
> +  EXECUTE_IF_SET_IN_BITMAP (active, 0, i, bi)
> +   add_stack_var_conflict (num, i);
>  }
>return false;
>  }
> @@ -639,15 +652,14 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, 
> bool for_conflict)
>if (gimple_clobber_p (stmt))
> {
>   tree lhs = gimple_assign_lhs (stmt);
> - unsigned *v;
>   /* Handle only plain var clobbers.
>  Nested functions lowering and C++ front-end inserts clobbers
>  which are not just plain variables.  */
>   if (!VAR_P (lhs))
> continue;
> - if (DECL_RTL_IF_SET (lhs) == pc_rtx
> - && (v = decl_to_stack_part->get (lhs)))
> -   bitmap_clear_bit (work, *v);
> + unsigned indx = decl_stack_index (lhs);
> + if (indx != INVALID_STACK_INDEX)
> +   bitmap_clear_bit (work, indx);
> }
>else if (!is_gimple_debug (stmt))
> {
> --
> 2.43.0
>


Re: [PATCH v2 06/14] Support for 64-bit location_t: Frontend parts

2024-11-20 Thread Richard Biener
On Sun, Nov 17, 2024 at 4:29 AM Lewis Hyatt  wrote:
>
> The C/C++ frontend code contains a couple instances where a callback
> receiving a "location_t" argument is prototyped to take "unsigned int"
> instead. This will make a difference once location_t can be configured to a
> different type, so adjust that now.
>
> Also remove a comment about -flarge-source-files, which will be removed
> shortly.

OK.

Thanks,
Richard.

> gcc/c-family/ChangeLog:
>
> * c-indentation.cc (should_warn_for_misleading_indentation): Remove
> comment about -flarge-source-files.
> * c-lex.cc (cb_ident): Change "unsigned int" argument to type
> "location_t".
> (cb_def_pragma): Likewise.
> (cb_define): Likewise.
> (cb_undef): Likewise.
> ---
>  gcc/c-family/c-indentation.cc |  5 -
>  gcc/c-family/c-lex.cc | 10 +-
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> index 3bd85e53c59..7a70d608eec 100644
> --- a/gcc/c-family/c-indentation.cc
> +++ b/gcc/c-family/c-indentation.cc
> @@ -322,11 +322,6 @@ should_warn_for_misleading_indentation (const 
> token_indent_info &guard_tinfo,
>   "%<-Wmisleading-indentation%> is disabled from this point"
>   " onwards, since column-tracking was disabled due to"
>   " the size of the code/headers");
> - if (!flag_large_source_files)
> -   inform (guard_loc,
> -   "adding %<-flarge-source-files%> will allow for more"
> -   " column-tracking support, at the expense of compilation"
> -   " time and memory");
> }
>return false;
>  }
> diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> index 32f19702c79..90ae4caa225 100644
> --- a/gcc/c-family/c-lex.cc
> +++ b/gcc/c-family/c-lex.cc
> @@ -54,10 +54,10 @@ static tree lex_charconst (const cpp_token *);
>  static void update_header_times (const char *);
>  static int dump_one_header (splay_tree_node, void *);
>  static void cb_line_change (cpp_reader *, const cpp_token *, int);
> -static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
> -static void cb_def_pragma (cpp_reader *, unsigned int);
> -static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
> -static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
> +static void cb_ident (cpp_reader *, location_t, const cpp_string *);
> +static void cb_def_pragma (cpp_reader *, location_t);
> +static void cb_define (cpp_reader *, location_t, cpp_hashnode *);
> +static void cb_undef (cpp_reader *, location_t, cpp_hashnode *);
>
>  void
>  init_c_lex (void)
> @@ -164,7 +164,7 @@ dump_time_statistics (void)
>
>  static void
>  cb_ident (cpp_reader * ARG_UNUSED (pfile),
> - unsigned int ARG_UNUSED (line),
> + location_t ARG_UNUSED (line),
>   const cpp_string * ARG_UNUSED (str))
>  {
>if (!flag_no_ident)


Re: [PATCH 02/15] libcpp: Fix potential unaligned access in cpp_buffer

2024-11-20 Thread Richard Biener
On Sun, Nov 3, 2024 at 11:23 PM Lewis Hyatt  wrote:
>
> libcpp makes use of the cpp_buffer pfile->a_buff to store things while it is
> handling macros. It uses it to store pointers (cpp_hashnode*, for macro
> arguments) and cpp_macro objects. This works fine because a cpp_hashnode*
> and a cpp_macro have the same alignment requirement on either 32-bit or
> 64-bit systems (namely, the same alignment as a pointer.)
>
> When 64-bit location_t is enabled on a 32-bit system, the alignment
> requirement may cease to be the same, because the alignment requirement of a
> cpp_macro object changes to that of a uint64_t, which may be larger than that of
> a pointer. It's not the case for x86 32-bit, but for example, on sparc, a
> pointer has 4-byte alignment while a uint64_t has 8. In that case,
> intermixing the two within the same cpp_buffer leads to a misaligned
> access. The code path that triggers this is the one in _cpp_commit_buff in
> which a hash table with its own allocator (i.e. ggc) is not being used, so
> it doesn't happen within the compiler itself, but it happens in the other
> libcpp clients, such as genmatch.
>
> Fix that up by ensuring _cpp_commit_buff commits a fully aligned chunk of the
> buffer, so it's ready for anything it may be used for next.
>
> For good measure, also modify CPP_ALIGN so that it guarantees to return an
> alignment at least the size of location_t. Currently it returns the max of
> a pointer and a double. I am not aware of any platform where a double may
> have smaller alignment than a uint64_t, but it does not hurt to add
> location_t here to be sure.
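
As a rough worked example (addresses assumed for a sparc-like 32-bit target,
not measured): suppose BUFF_FRONT (pfile->a_buff) ends up at an address A
with A % 8 == 4 after a run of 4-byte-aligned cpp_hashnode * macro arguments
has been committed.  A cpp_macro committed next would then start at that
4-mod-8 address even though, with a 64-bit location_t, it needs 8-byte
alignment.  Bumping BUFF_FRONT by BUFF_ROOM % DEFAULT_ALIGNMENT after each
commit keeps the next allocation maximally aligned.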

OK.

Thanks,
Richard.

> libcpp/ChangeLog:
>
> * lex.cc (_cpp_commit_buff): Make sure that the buffer is properly
> aligned for the next allocation.
> * internal.h (struct dummy): Make sure alignment is large enough for
> a location_t, just in case.
> ---
>  libcpp/internal.h |  1 +
>  libcpp/lex.cc | 10 --
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/libcpp/internal.h b/libcpp/internal.h
> index e65198e89da..358e77cd622 100644
> --- a/libcpp/internal.h
> +++ b/libcpp/internal.h
> @@ -85,6 +85,7 @@ struct dummy
>{
>  double d;
>  int *p;
> +location_t l;
>} u;
>  };
>
> diff --git a/libcpp/lex.cc b/libcpp/lex.cc
> index 849447eb4d7..858970b5d17 100644
> --- a/libcpp/lex.cc
> +++ b/libcpp/lex.cc
> @@ -4997,7 +4997,8 @@ _cpp_aligned_alloc (cpp_reader *pfile, size_t len)
>  void *
>  _cpp_commit_buff (cpp_reader *pfile, size_t size)
>  {
> -  void *ptr = BUFF_FRONT (pfile->a_buff);
> +  const auto buff = pfile->a_buff;
> +  void *ptr = BUFF_FRONT (buff);
>
>if (pfile->hash_table->alloc_subobject)
>  {
> @@ -5006,7 +5007,12 @@ _cpp_commit_buff (cpp_reader *pfile, size_t size)
>ptr = copy;
>  }
>else
> -BUFF_FRONT (pfile->a_buff) += size;
> +{
> +  BUFF_FRONT (buff) += size;
> +  /* Make sure the remaining space is maximally aligned for whatever this
> +buffer holds next.  */
> +  BUFF_FRONT (buff) += BUFF_ROOM (buff) % DEFAULT_ALIGNMENT;
> +}
>
>return ptr;
>  }


[PATCH] rs6000: Inefficient vector splat of small V2DI constants [PR107757]

2024-11-20 Thread Surya Kumari Jangala
rs6000: Inefficient vector splat of small V2DI constants [PR107757]

On P8, for vector splat of double word constants, specifically -1 and 1,
gcc generates inefficient code. For -1, gcc generates two instructions
(vspltisw and vupkhsw) whereas only one instruction (vspltisw) is
sufficient. For constant 1, gcc generates a load of the constant from
.rodata instead of the instructions vspltisw and vupkhsw.

The routine vspltisw_vupkhsw_constant_p() returns true if the constant
can be synthesized with instructions vspltisw and vupkhsw. However, for
constant 1, this routine returns false.

For constant -1, this routine returns true. Vector splat of -1 can be
done with only one instruction, i.e., vspltisw. We do not need two
instructions. Hence this routine should return false for -1.

With this patch, gcc generates only one instruction (vspltisw)
for -1. And for constant 1, this patch generates two instructions
(vspltisw and vupkhsw).

2024-11-20  Surya Kumari Jangala  

gcc/
PR target/107757
* config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p):
Return false for -1 and return true for 1.

gcc/testsuite/
PR target/107757
* gcc.target/powerpc/pr107757-1.c: New.
* gcc.target/powerpc/pr107757-2.c: New.
---

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0d7ee1e5bdf..4de527e12eb 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -6651,7 +6651,7 @@ vspltisw_vupkhsw_constant_p (rtx op, machine_mode mode, 
int *constant_ptr)
 return false;
 
   value = INTVAL (elt);
-  if (value == 0 || value == 1
+  if (value == 0 || value == -1
   || !EASY_VECTOR_15 (value))
 return false;
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
new file mode 100644
index 000..e0a75f82bd2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler {\mvspltisw\M} } } */
+/* { dg-final { scan-assembler {\mvupkhsw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+
+#include 
+
+vector long long
+foo ()
+{
+ return vec_splats (1LL);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
new file mode 100644
index 000..4ed8053f853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler {\mvspltisw\M} } } */
+/* { dg-final { scan-assembler-not {\mvupkhsw\M} } } */
+
+#include 
+
+vector long long
+foo ()
+{
+ return vec_splats (~0LL);
+}


Re: [PATCH v2 10/14] Support for 64-bit location_t: gimple parts

2024-11-20 Thread Richard Biener
On Tue, Nov 19, 2024 at 4:43 PM Lewis Hyatt  wrote:
>
> On Tue, Nov 19, 2024 at 10:06 AM Richard Biener
>  wrote:
> >
> > On Sun, Nov 17, 2024 at 4:30 AM Lewis Hyatt  wrote:
> > >
> > > The size of struct gimple increases by 8 bytes with the change in size of
> > > location_t from 32- to 64-bit; adjust the WORD markings in the comments
> > > accordingly. It seems that most of the WORD markings were off by one 
> > > already,
> > > probably not having been updated after a previous reduction in the size 
> > > of a
> > > gimple, so they have become retroactively correct again, and only a couple
> > > needed adjustment actually.
> > >
> > > Also move the 32-bit num_ops member of struct gimple to the end; since 
> > > there
> > > is now 4 bytes of padding after it, this may enable reuse of the tail
> > > padding for some derived structures.
> > >
> > > gcc/ChangeLog:
> > >
> > > * gimple.h (struct gimple): Update word marking comments to 
> > > reflect
> > > the new size of location_t. Move the 32-bit int field to the end.
> > > (struct gphi): Likewise.
> > > ---
> > >  gcc/gimple.h | 11 ++-
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > > index 4a6e0e97d1e..6929c792dc5 100644
> > > --- a/gcc/gimple.h
> > > +++ b/gcc/gimple.h
> > > @@ -268,9 +268,6 @@ struct GTY((desc ("gimple_statement_structure 
> > > (&%h)"), tag ("GSS_BASE"),
> > >   Locus information for debug info.  */
> > >location_t location;
> > >
> > > -  /* Number of operands in this tuple.  */
> > > -  unsigned num_ops;
> > > -
> >
> > Can you instead swap location and num_ops and insert
> >
> > unsigned int pad : 32;
> >
> > after the bits section?  Or is the intent to allow the tail padding to
> > be re-used?  I guess
> > all the [ WORD 1-n] : base class comments need adjustment?  Since
> > 'gimple' looks POD
> > to me, is tail padding even re-used?
> >
> > Thanks,
> > Richard.
>
> So I was thinking that the tail padding could be reused, but you are
> right, it won't be here since this is the base class. All the classes
> that inherit from gimple are non-POD because they use inheritance, but
> gimple itself is POD. I'll change it how you suggest.

Thanks.  I'm not always 100% sure about the C++ rules here.

Richard.


Re: [PATCH v2] Fix MV clones can not redirect to specific target on some targets

2024-11-20 Thread Andrew Carlotti
On Sun, Oct 27, 2024 at 04:00:43PM +, Yangyu Chen wrote:
> Following the implementation of commit b8ce8129a5 ("Redirect call
> within specific target attribute among MV clones (PR ipa/82625)"),
> we can now optimize calls by invoking a versioned function callee
> from a caller that shares the same target attribute. However, on
> targets that define TARGET_HAS_FMV_TARGET_ATTRIBUTE to zero, meaning
> they use the "target_versions" attribute instead of "target", this
> optimization is not feasible. Currently, the only target affected
> by this limitation is AArch64.

The existing optimization can pick the wrong version in some cases, and fixing
this properly requires better comparisons than just a simple string comparison.
I'd prefer to just disable this optimization for aarch64 and riscv for now (and
backport that fix to gcc-14), and add the necessary target hooks to be able to
implement it properly at a later date.  (The existing bug applies if you
specify both target and target_version/target_clones attributes on the same
function, which is an unlikely combination but one that we deliberately chose
to support in aarch64).


To give a specific example, suppose we have target features featv3, featv2 and
featv1, with featv3 implying featv2 implying featv1.  Suppose we have the
following function versions:

Caller: featv2, featv1, default
Callee: featv3, featv2, featv1, default

In the featv1 and default versions of the caller, we know that we would always
select the corresponding version of the callee function, so the redirection is
valid there.

However, in the featv2 version of the caller, we don't know whether we would
select the featv2 or the featv3 versions of the callee at runtime, so we cannot
eliminate the runtime indirection.

Implementing this correctly in full generality would require the addition of
target hooks that indicate whether one target version string is always implied
by another (e.g. in the above example, whenever featv3 support is detected we
know that we would also detect featv2 support).
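
To make the shape of that example concrete (featv1/featv2/featv3 are made-up
feature names, so this is purely illustrative):

__attribute__ ((target_version ("featv2"))) int caller (void);
__attribute__ ((target_version ("featv1"))) int caller (void);
__attribute__ ((target_version ("default"))) int caller (void);

__attribute__ ((target_version ("featv3"))) int callee (void);
__attribute__ ((target_version ("featv2"))) int callee (void);
__attribute__ ((target_version ("featv1"))) int callee (void);
__attribute__ ((target_version ("default"))) int callee (void);

/* In the "featv1" and "default" versions of caller a call to callee can be
   redirected to the matching callee version; in the "featv2" version it
   cannot, because at runtime either the "featv2" or the "featv3" callee
   might be selected.  */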

> This commit resolves the issue by not directly using "target" with
> lookup_attribute. Instead, it checks the TARGET_HAS_FMV_TARGET_ATTRIBUTE
> macro to decide between using the "target" or "target_version"
> attribute.
> 
> Fixes: 79891c4cb5 ("Add support for target_version attribute")
> 
> gcc/ChangeLog:
> 
>   * multiple_target.cc (redirect_to_specific_clone): Fix the redirection
>   does not work on target without TARGET_HAS_FMV_TARGET_ATTRIBUTE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/aarch64/mvc-redirect.C: New test.
> ---
>  gcc/multiple_target.cc|  8 +++---
>  .../g++.target/aarch64/mvc-redirect.C | 25 +++
>  2 files changed, 30 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/mvc-redirect.C
> 
> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> index d2c9671fc1b..a1c18f4a3a7 100644
> --- a/gcc/multiple_target.cc
> +++ b/gcc/multiple_target.cc
> @@ -446,8 +446,10 @@ redirect_to_specific_clone (cgraph_node *node)
>cgraph_function_version_info *fv = node->function_version ();
>if (fv == NULL)
>  return;
> +  const char *fmv_attr = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
> +   ? "target" : "target_version");
>  
> -  tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES 
> (node->decl));
> +  tree attr_target = lookup_attribute (fmv_attr, DECL_ATTRIBUTES 
> (node->decl));
>if (attr_target == NULL_TREE)
>  return;
>  
> @@ -458,7 +460,7 @@ redirect_to_specific_clone (cgraph_node *node)
>if (!fv2)
>   continue;
>  
> -  tree attr_target2 = lookup_attribute ("target",
> +  tree attr_target2 = lookup_attribute (fmv_attr,
>   DECL_ATTRIBUTES (e->callee->decl));
>  
>/* Function is not calling proper target clone.  */
> @@ -472,7 +474,7 @@ redirect_to_specific_clone (cgraph_node *node)
> for (; fv2 != NULL; fv2 = fv2->next)
>   {
> cgraph_node *callee = fv2->this_node;
> -   attr_target2 = lookup_attribute ("target",
> +   attr_target2 = lookup_attribute (fmv_attr,
>  DECL_ATTRIBUTES (callee->decl));
> if (attr_target2 != NULL_TREE
> && attribute_value_equal (attr_target, attr_target2))
> diff --git a/gcc/testsuite/g++.target/aarch64/mvc-redirect.C 
> b/gcc/testsuite/g++.target/aarch64/mvc-redirect.C
> new file mode 100644
> index 000..f29cc3745a3
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/mvc-redirect.C
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-O0" } */
> +
> +__attribute__((target_clones("default", "dotprod", "sve+sve2")))
> +int foo ()
> +{
> +  return 1;
> +}
> +
> +__attribute__((target_clones("default", "dotprod", "sve+sve2")))
> +int bar()
> +{
> +  return foo ();
> +}
> 

Re: [PATCH] arm, mve: Fix arm_mve_dlstp_check_dec_counter's use of single_pred

2024-11-20 Thread Christophe Lyon
On Tue, 19 Nov 2024 at 15:00, Andre Vieira (lists)
 wrote:
>
> Hi,
>
> Looks like single_pred ICEs if the basic block does not have a single
> predecessor rather than returning NULL, which is what this snippet of code
> relied on.
> This fix feels borderline obvious to me, but I thought I'd get
> it checked by one more person.
>
> Call 'single_pred_p' before 'single_pred' to verify it is safe to do so.
>
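
For reference, the pattern the fix boils down to is roughly this (a sketch,
not the actual hunk from the patch):

  /* single_pred () asserts that the block has exactly one predecessor, so
     guard it with single_pred_p () instead of expecting a NULL result.  */
  basic_block pred = NULL;
  if (single_pred_p (bb))
    pred = single_pred (bb);
  if (!pred)
    return false;  /* or whatever "give up" means in the caller */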

The patch is OK, thanks.

Christophe

> gcc/ChangeLog:
>
> * config/arm/arm.cc (arm_mve_dlstp_check_dec_counter): Call
> single_pred_p to verify it's safe to call single_pred.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/mve/dlstp-loop-form.c: Add loop that triggered ICE.


Re: [PATCH v2 3/3] cfgexpand: Handle integral vector types and constructors for scope conflicts [PR105769]

2024-11-20 Thread Richard Biener
On Sat, Nov 16, 2024 at 5:24 AM Andrew Pinski  wrote:
>
> This is an expansion of the last patch to also track pointers via vector
> types and the constructors that are used with vector types.
> In this case we had:
> ```
> _15 = (long unsigned int) &bias;
> _10 = (long unsigned int) &cov_jn;
> _12 = {_10, _15};
> ...
>
> MEM[(struct vec *)&cov_jn] ={v} {CLOBBER(bob)};
> bias ={v} {CLOBBER(bob)};
> MEM[(struct function *)&D.6156] ={v} {CLOBBER(bob)};
>
> ...
> MEM  [(void *)&D.6172 + 32B] = _12;
> MEM[(struct function *)&D.6157] ={v} {CLOBBER(bob)};
> ```
>
> Anyway, tracking the pointers via vector types, so that they are considered
> live at the point where the store of the vector happens, fixes the bug by
> making the variable live at the same time as the other variable.

OK.

Richard.

> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/105769
>
> gcc/ChangeLog:
>
> * cfgexpand.cc (vars_ssa_cache::operator()): For constructors
> walk over the elements.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr105769-1.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/cfgexpand.cc  | 20 +--
>  gcc/testsuite/g++.dg/torture/pr105769-1.C | 67 +++
>  2 files changed, 83 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr105769-1.C
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 841d3c1254e..50262b38c2d 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -729,7 +729,7 @@ vars_ssa_cache::operator() (tree name)
>gcc_assert (TREE_CODE (name) == SSA_NAME);
>
>if (!POINTER_TYPE_P (TREE_TYPE (name))
> -  && !INTEGRAL_TYPE_P (TREE_TYPE (name)))
> +  && !ANY_INTEGRAL_TYPE_P (TREE_TYPE (name)))
>  return empty;
>
>if (exists (name))
> @@ -759,7 +759,7 @@ vars_ssa_cache::operator() (tree name)
> continue;
>
>if (!POINTER_TYPE_P (TREE_TYPE (use))
> - && !INTEGRAL_TYPE_P (TREE_TYPE (use)))
> + && !ANY_INTEGRAL_TYPE_P (TREE_TYPE (use)))
> continue;
>
>/* Mark the old ssa name needs to be update from the use. */
> @@ -773,10 +773,22 @@ vars_ssa_cache::operator() (tree name)
>  so we don't go into an infinite loop for some phi nodes with loops.  
> */
>create (use);
>
> +  gimple *g = SSA_NAME_DEF_STMT (use);
> +
> +  /* CONSTRUCTOR here is always a vector initialization,
> +walk each element too. */
> +  if (gimple_assign_single_p (g)
> + && TREE_CODE (gimple_assign_rhs1 (g)) == CONSTRUCTOR)
> +   {
> + tree ctr = gimple_assign_rhs1 (g);
> + unsigned i;
> + tree elm;
> + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctr), i, elm)
> +   work_list.safe_push (std::make_pair (elm, use));
> +   }
>/* For assignments, walk each operand for possible addresses.
>  For PHI nodes, walk each argument. */
> -  gimple *g = SSA_NAME_DEF_STMT (use);
> -  if (gassign *a = dyn_cast  (g))
> +  else if (gassign *a = dyn_cast  (g))
> {
>   /* operand 0 is the lhs. */
>   for (unsigned i = 1; i < gimple_num_ops (g); i++)
> diff --git a/gcc/testsuite/g++.dg/torture/pr105769-1.C 
> b/gcc/testsuite/g++.dg/torture/pr105769-1.C
> new file mode 100644
> index 000..3fe973656b8
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr105769-1.C
> @@ -0,0 +1,67 @@
> +// { dg-do run }
> +
> +// PR tree-optimization/105769
> +
> +// The partitioning code would incorrectly have bias
> +// and a temporary in the same partitioning because
> +// it was thought bias was not alive when those were alive
> +// do to vectorization of a store of pointers (that included bias).
> +
> +#include 
> +
> +template
> +struct vec {
> +  T dat[n];
> +  vec() {}
> +  explicit vec(const T& x) { for(size_t i = 0; i < n; i++) dat[i] = x; }
> +  T& operator [](size_t i) { return dat[i]; }
> +  const T& operator [](size_t i) const { return dat[i]; }
> +};
> +
> +template
> +using mat = vec>;
> +template
> +using sq_mat = mat;
> +using map_t = std::function;
> +template
> +using est_t = std::function;
> +template using est2_t = std::function;
> +map_t id_map() { return [](size_t j) -> size_t { return j; }; }
> +
> +template
> +est2_t jacknife(const est_t> est, sq_mat& cov, vec T>& bias) {
> +  return [est, &cov, &bias](map_t map) -> void
> +  {
> +bias = est(map);
> +for(size_t i = 0; i < n; i++)
> +{
> +  bias[i].print();
> +}
> +  };
> +}
> +
> +template
> +void print_cov_ratio() {
> +  sq_mat<2, T> cov_jn;
> +  vec<2, T> bias;
> +  jacknife<2, T>([](map_t map) -> vec<2, T> { vec<2, T> retv; retv[0] = 1; 
> retv[1] = 1; return retv; }, cov_jn, bias)(id_map());
> +}
> +struct ab {
> +  long long unsigned a;
> +  short unsigned b;
> +  double operator()() { return a; }
> +  ab& operator=(double rhs) { a = rhs; return *this; }
> + void print();
> +};
> +
> +void
> +ab::print()

[PATCH]middle-end: Pass along SLP node when costing vector loads/stores

2024-11-20 Thread Tamar Christina
Hi All,

With the move to SLP-only support we now pass the VMAT through the SLP node;
however, the majority of the costing calls inside vectorizable_load and
vectorizable_store do not pass the SLP node along.  Because of this the
backend costing never sees the VMAT for these cases anymore.

Additionally, the helper around record_stmt_cost would, when both the SLP node
and the stmt_vinfo are passed, only pass the SLP node along.  However, the SLP
node doesn't contain all the info available in the stmt_vinfo and we'd have to
go through SLP_TREE_REPRESENTATIVE anyway.  As such I changed the function to
always pass both along.  Unlike the VMAT changes, I don't believe there is a
correctness issue here, but doing so minimizes the churn in the backend
costing until vectorizer costing as a whole is revisited in GCC 16.

These changes re-enable the cost model on AArch64 and also correctly find the
VMATs on loads and stores, fixing testcases such as sve_iters_low_2.c.

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP
node.
* tree-vect-stmts.cc (record_stmt_cost): Expose.
(vect_get_store_cost, vect_get_load_cost): Extend with SLP node.
(vectorizable_store, vectorizable_load): Pass SLP node to all costing.
* tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and
stmt_vinfo to costing.
(vect_get_load_cost, vect_get_store_cost): Extend with SLP node.

---
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 
3ea5fb883b1a5289195142171eb45fa422910a95..d87ca79b8e4c16d242e67431d1b527bdb8cb74e4
 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -1729,12 +1729,14 @@ vect_get_data_access_cost (vec_info *vinfo, dr_vec_info 
*dr_info,
 ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info));
 
   if (DR_IS_READ (dr_info->dr))
-vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-   misalignment, true, inside_cost,
-   outside_cost, prologue_cost_vec, body_cost_vec, false);
+vect_get_load_cost (vinfo, stmt_info, NULL, ncopies,
+   alignment_support_scheme, misalignment, true,
+   inside_cost, outside_cost, prologue_cost_vec,
+   body_cost_vec, false);
   else
-vect_get_store_cost (vinfo,stmt_info, ncopies, alignment_support_scheme,
-misalignment, inside_cost, body_cost_vec);
+vect_get_store_cost (vinfo,stmt_info, NULL, ncopies,
+alignment_support_scheme, misalignment, inside_cost,
+body_cost_vec);
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
7a92da00f7ddcfdf146fa1c2511f609e8bc40e9e..46543c15c00f00e5127d06446f58fce79951c3b0
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -93,7 +93,7 @@ stmt_in_inner_loop_p (vec_info *vinfo, class _stmt_vec_info 
*stmt_info)
target model or by saving it in a vector for later processing.
Return a preliminary estimate of the statement's cost.  */
 
-static unsigned
+unsigned
 record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
  enum vect_cost_for_stmt kind,
  stmt_vec_info stmt_info, slp_tree node,
@@ -1008,8 +1008,8 @@ cfun_returns (tree decl)
 
 /* Calculate cost of DR's memory access.  */
 void
-vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
-dr_alignment_support alignment_support_scheme,
+vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
+int ncopies, dr_alignment_support alignment_support_scheme,
 int misalignment,
 unsigned int *inside_cost,
 stmt_vector_for_cost *body_cost_vec)
@@ -1019,7 +1019,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, 
int ncopies,
 case dr_aligned:
   {
*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
- vector_store, stmt_info, 0,
+ vector_store, stmt_info, slp_node, 0,
  vect_body);
 
 if (dump_enabled_p ())
@@ -1032,7 +1032,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, 
int ncopies,
   {
 /* Here, we assign an additional cost for the unaligned store.  */
*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
- unaligned_store, stmt_info,
+ unaligned_store, stmt_info, slp_node,
  misalignment, vect

Re: [RFC PATCH 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-11-20 Thread Richard Sandiford
Sorry for the slow review.  Finally catching up on backlog.

Richard Biener  writes:
> On Mon, 28 Oct 2024, Alex Coplan wrote:
>
>> This allows us to vectorize more loops with early exits by forcing
>> peeling for alignment to make sure that we're guaranteed to be able to
>> safely read an entire vector iteration without crossing a page boundary.
>> 
>> To make this work for VLA architectures we have to allow compile-time
>> non-constant target alignments.  We also have to override the result of
>> the target's preferred_vector_alignment hook if it isn't a power-of-two
>> multiple of the TYPE_SIZE of the chosen vector type.
>> 
>> There is currently an implicit assumption that the TYPE_SIZE of the
>> vector type is itself a power of two.  For non-VLA types this
>> could be checked directly in the vectorizer.  For VLA types I
>> had discussed offline with Richard S about adding a target hook to allow
>> the vectorizer to query the backend to confirm that a given VLA type
>> is known to have a power-of-two size at runtime.
>
> GCC assumes all vectors have power-of-two size, so I don't think we
> need to check anything but we'd instead have to make sure the
> target constrains the hardware when this assumption doesn't hold
> in silicon.

We did at one point support non-power-of-2 for VLA only.  But things
might have crept in since that break it even for VLA.  It's no longer
something that matters for SVE because the architecture has been
tightened to remove the non-power-of-2 option.

My main comment on the patch is about:

+  /* Below we reject compile-time non-constant target alignments, but if
+ our misalignment is zero, then we are known to already be aligned
+ w.r.t. any such possible target alignment.  */
+  if (known_eq (misalignment, 0))
+return 0;

When is that true for VLA?  It seems surprising that we can guarantee
alignment to an unknown boundary :)  However, I agree that it's the
natural consequence of the formula.

Thanks,
Richard




Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Sam James
Martin Uecker  writes:

> On Monday, 18.11.2024 at 17:55, Qing Zhao wrote:
>> Hi,
>> 
>> I am working on extending “counted_by” attribute to pointers inside a 
>> structure per our previous discussion. 
>> 
>> I need advice on the following question:
>> 
>> Should -fsanitize=bounds support an array reference that is made through 
>> a pointer that has the counted_by attribute? 
>
> I think the question is what -fsanitize=bounds is meant to be.
>
> I am a bit frustrated about the sanitizer.  On the
> one hand, it is not doing enough to get spatial memory
> safety even where this would be easily possible; on the
> other hand, it is pedantic about things which are technically
> UB but not problematic, and then one is prevented from
> using it.

While I largely share your views on the coherence of sanitizers, I think
it's separate to whether we want -fsanitize=bounds to handle counted_by
in a particular way. It's worth us discussing properly in its own thread
on the gcc ML, IMO.

>
> When used in default mode, where execution continues, it
> also does not mix well with many warnings, creates more code,
> and pulls in a library dependency (and the library also depends
> on upstream choices / progress, which seems a limitation for
> extensions).
>
> What IMHO would be ideal is a protection mode for spatial
> memory safety that simply adds traps (which then requires
> no library, has no issues with other warnings, and could
> evolve independently from clang) 
>
> So shouldn't we just add a -fboundscheck (which would 
> be like -fsanitize=bounds -fsanitize-trap=bounds just with
> more checking) and make it really good? I think many people
> would be very happy about this.
>
> Martin
>
>
>> 
>> For the following small example:
>> 
>> #include <stdlib.h>
>> 
>> struct annotated {
>>   int b;
>>   int *c __attribute__ ((counted_by (b)));
>> } *p_array_annotated;
>> 
>> void __attribute__((__noinline__)) setup (int annotated_count)
>> {
>>   p_array_annotated
>> = (struct annotated *)malloc (sizeof (struct annotated));
>>   p_array_annotated->c = (int *) malloc (annotated_count *  sizeof (int));
>>   p_array_annotated->b = annotated_count;
>> 
>>   return;
>> }
>> 
>> int main(int argc, char *argv[])
>> {
>>   setup (10);
>>   p_array_annotated->c[11] = 2;
>>   return 0;
>> }
>> 
>> Should ubsan add instrumentation to the above reference 
>> p_array_annoated->c[11] inside routine “main”?
>> 
>> From my understanding, ubsan does not add bounds checking for any pointer 
>> reference now.  However, when the “counted_by” attribute is attached to a 
>> pointer field inside a structure, the “bound” information for this pointer 
>> is known.  Should we enhance ubsan to instrument such a reference? 
>> 
>> If Yes, then should we add the following limitation to the end user:
>> 
>>   When the counted_by attribute is attached to a pointer field, 
>> -fsanitize=bounds only works for such a reference when the pointer is NOT cast 
>> to a type other than the original target type?
>> 
>> Thanks for any comments and suggestions.
>> 
>> Qing


Re: [patch,avr] Adjust comment headers

2024-11-20 Thread Georg-Johann Lay

On 18.11.24 at 09:03, Georg-Johann Lay wrote:

On 16.11.24 at 13:19, Gerald Pfeifer wrote:

On Mon, 2 Sep 2024, Georg-Johann Lay wrote:

Atmel is no longer the AVR manufacturer.  This patch removes the
manufacturer from the file headers.


We also have

   AVR
   Manufacturer: Atmel
   href="https://www.microchip.com/en-us/products/microcontrollers-and-microprocessors/8-bit-mcus/avr-mcus";>AVR documentation

   

in https://gcc.gnu.org/readings.html .

How should that be changed? (Simply drop the Atmel line?)

Gerald


Hi, I'd just drop the "Atmel".  That's what I did in all the
other places.

Maybe instead of "AVR" something like "AVR 8-bit microcontrollers"
makes it clearer what AVR means in this context.

Johann


...and maybe what's also of interest are the instruction set manual (pdf)

https://ww1.microchip.com/downloads/en/DeviceDoc/AVR-InstructionSet-Manual-DS40002198.pdf

and the avr-gcc wiki that describes parts of the ABI:

https://gcc.gnu.org/wiki/avr-gcc

Johann




Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Marek Polacek
On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> On Monday, 18.11.2024 at 17:55, Qing Zhao wrote:
> > Hi,
> > 
> > I am working on extending “counted_by” attribute to pointers inside a 
> > structure per our previous discussion. 
> > 
> > I need advice on the following question:
> > 
> > Should -fsanitize=bounds support an array reference that is made through 
> > a pointer that has the counted_by attribute? 

I don't see why it couldn't, perhaps as part of -fsanitize=bounds-strict.
Someone has to implement it, though.
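
For Qing's example quoted earlier in the thread, the instrumentation would
conceptually amount to checking the index against the counted_by field before
the access.  As an illustrative C sketch only (not what the compiler would
literally emit):

  /* Hypothetical expansion of a bounds check for p_array_annotated->c[11]
     when the member c carries __attribute__ ((counted_by (b))).  */
  if (11 >= p_array_annotated->b)
    __builtin_trap ();  /* or a call into the ubsan runtime when not trapping */
  p_array_annotated->c[11] = 2;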
 
> I think the question is what -fsanitize=bounds is meant to be.
> 
> I am a bit frustrated about the sanitizer.  On the
> one hand, it is not doing enough to get spatial memory
> safety even where this would be easily possible; on the
> other hand, it is pedantic about things which are technically
> UB but not problematic, and then one is prevented from
> using it.
> 
> When used in default mode, where execution continues, it
> also does not mix well with many warnings, creates more code,
> and pulls in a library dependency (and the library also depends
> on upstream choices / progress, which seems a limitation for
> extensions).
> 
> What IMHO would be ideal is a protection mode for spatial
> memory safety that simply adds traps (which then requires
> no library, has no issues with other warnings, and could
> evolve independently from clang) 
> 
> So shouldn't we just add a -fboundscheck (which would 
> be like -fsanitize=bounds -fsanitize-trap=bounds just with
> more checking) and make it really good? I think many people
> would be very happy about this.

That's a separate concern.  We already have the -fbounds-check option,
currently only used in Fortran (and D?), so perhaps we could make
that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.

Marek



Re: [PATCH 08/17] testsuite: arm: Use effective-target for vect-early-break-cbranch test

2024-11-20 Thread Richard Earnshaw (lists)
On 19/11/2024 10:23, Torbjörn SVENSSON wrote:
> Update test cases to use -mcpu=unset/-march=unset feature introduced in
> r15-3606-g7d6c6a0d15c.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/vect-early-break-cbranch.c: Use
>   effective-target arm_arch_v8a_hard.
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c 
> b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> index 334e064a239..fb12bfb3197 100644
> --- a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> +++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> @@ -2,7 +2,9 @@
>  /* { dg-require-effective-target vect_early_break } */

I think this is technically redundant (it just checks for neon on armv8a), but 
it's probably a good idea to keep it just in case it grows an additional check 
at some point.

>  /* { dg-require-effective-target arm_neon_ok } */
>  /* { dg-require-effective-target arm32 } */

These two are, I think, redundant, so can be removed; the flags added below 
will ensure they are true.

> -/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard  
> -fno-schedule-insns -fno-reorder-blocks -fno-schedule-insns2" } */
> +/* { dg-require-effective-target arm_arch_v8a_hard_ok } */
> +/* { dg-options "-O3 -fno-schedule-insns -fno-reorder-blocks 
> -fno-schedule-insns2" } */
> +/* { dg-add-options arm_arch_v8a_hard } */
>  /* { dg-final { check-function-bodies "**" "" "" } } */
>  
>  #define N 640

OK with that change.

R.


Re: [PATCH 17/17] testsuite: arm: Use effective-target for pr96939 test

2024-11-20 Thread Torbjorn SVENSSON




On 2024-11-19 18:57, Richard Earnshaw (lists) wrote:

On 19/11/2024 10:24, Torbjörn SVENSSON wrote:

Update test case to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* gcc.target/arm/lto/pr96939_0.c: Use effective-target
arm_arch_v8a.
* gcc.target/arm/lto/pr96939_1.c: Remove dg-options.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.target/arm/lto/pr96939_0.c | 4 ++--
  gcc/testsuite/gcc.target/arm/lto/pr96939_1.c | 1 -
  2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c 
b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
index 241ffd5da0a..3bb74bd1a1d 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
@@ -1,7 +1,7 @@
  /* PR target/96939 */
  /* { dg-lto-do link } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-lto-options { { -flto -O2 } } } */
+/* { dg-require-effective-target arm_arch_v8a_link } */
+/* { dg-lto-options { { -flto -O2 -mcpu=unset -march=armv8-a+simd+crc } } } */
  
  extern unsigned crc (unsigned, const void *);

  typedef unsigned (*fnptr) (unsigned, const void *);
diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c 
b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
index 4afdbdaf5ad..c641b5580ab 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
@@ -1,5 +1,4 @@
  /* PR target/96939 */
-/* { dg-options "-march=armv8-a+simd+crc" } */
  
  #include <arm_acle.h>
  


I'm not sure this is right.  The PR talks about handling streaming in of 
objects built with different options, which are supposed to be recorded in the 
streaming data.  But your change alters what will be recorded AFAICT.


I was unsure what path I should take to address this test case.
Maybe we should go with the following:

gcc.target/arm/lto/pr96939_0.c:
/* { dg-lto-do link } */
/* { dg-require-effective-target arm_arch_v8a_link } */
/* { dg-lto-options { { -flto -O2 } } } */

gcc.target/arm/lto/pr96939_1.c:
/* { dg-options "-mcpu=unset -march=armv8-a+simd+crc -mfpu=auto" } */


Should I also define an effective-target for arm_arch_v8a_crc that 
checks using -march=armv8-a+crc+simd -mfpu=auto -mfloat-abi=softfp and 
add dg-r-e-t for it in the pr96939_0.c file? Or is it safe to assume 
that this architecture is available if v8a is available?


Keep in mind that I cannot rely on dg-add-options in an LTO test.
Do we want to run this in -mfloat-abi=softfp or -mfloat-abi=hard mode?

Kind regards
Torbjörn


R.




[r15-5489 Regression] FAIL: gcc.target/i386/pr116174.c check-function-bodies foo on Linux/x86_64

2024-11-20 Thread haochen.jiang
On Linux/x86_64,

6350e956d1a74963a62bedabef3d4a1a3f2d4852 is the first bad commit
commit 6350e956d1a74963a62bedabef3d4a1a3f2d4852
Author: MayShao-oc 
Date:   Thu Nov 7 10:57:02 2024 +0800

Add microarchtecture tunable for pass_align_tight_loops [PR117438]

caused

FAIL: gcc.target/i386/pr116174.c check-function-bodies foo

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-5489/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr116174.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [PATCH 11/17] testsuite: arm: Use effective-target for pr56184.C and pr59985.C

2024-11-20 Thread Torbjorn SVENSSON




On 11/19/24 18:08, Richard Earnshaw (lists) wrote:

On 19/11/2024 10:24, Torbjörn SVENSSON wrote:

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* g++.dg/other/pr56184.C: Use effective-target
arm_arch_v7a_neon and arm_arch_v7a_thumb.
* g++.dg/other/pr59985.C: Use effective-target
arm_arch_v7a_neon and arm_arch_v7a_arm.
* lib/target-supports.exp: Define effective-target
arm_arch_v7a_thumb.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/other/pr56184.C  | 8 ++--
  gcc/testsuite/g++.dg/other/pr59985.C  | 7 +--
  gcc/testsuite/lib/target-supports.exp | 1 +
  3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/g++.dg/other/pr56184.C 
b/gcc/testsuite/g++.dg/other/pr56184.C
index dc949283c98..f4a4300c385 100644
--- a/gcc/testsuite/g++.dg/other/pr56184.C
+++ b/gcc/testsuite/g++.dg/other/pr56184.C
@@ -1,6 +1,10 @@
  /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || arm_thumb2_ok } 
} } */
-/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors -Wno-return-type" } */
+/* { dg-require-effective-target arm_arch_v7a_neon_ok } */
+/* { dg-require-effective-target arm_arch_v7a_thumb_ok } */
+/* { dg-options "-fno-short-enums -O2 -fno-section-anchors -Wno-return-type" } 
*/
+/* { dg-add-options arm_arch_v7a_neon } */
+/* { dg-additional-options "-mthumb -mtune=cortex-a9" } */
+
  


I'd add a new entry for v7a_neon_thumb for this, then we only need one dg-r-e-t 
rule here.
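
Something following the pattern of the existing v7a_neon entry and the
proposed v7a_thumb one in target-supports.exp, e.g. (untested sketch, exact
flags to be confirmed):

    v7a_neon_thumb "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp -mthumb" "__ARM_ARCH_7A__ && __ARM_NEON__ && __thumb__"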


  typedef unsigned int size_t;
  __extension__ typedef int __intptr_t;
diff --git a/gcc/testsuite/g++.dg/other/pr59985.C 
b/gcc/testsuite/g++.dg/other/pr59985.C
index 7c9bfab35f1..a0f5e184b43 100644
--- a/gcc/testsuite/g++.dg/other/pr59985.C
+++ b/gcc/testsuite/g++.dg/other/pr59985.C
@@ -1,7 +1,10 @@
  /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { arm_thumb1 } } */
-/* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 
-mfpu=vfpv3-d16 -mfloat-abi=hard" } */
  /* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-require-effective-target arm_arch_v7a_arm_ok } */
+/* { dg-require-effective-target arm_arch_v7a_neon_ok } */
+/* { dg-options "-g -fcompare-debug -O2" } */
+/* { dg-add-options arm_arch_v7a_neon } */
+/* { dg-additional-options "-marm -mtune=cortex-a9 -mfloat-abi=hard 
-mfpu=vfpv3-d16" } */


I don't follow this change, the original test never looks at neon, nor needs it 
AFAICT.


I am trying to use the existing effective-targets to verify that -marm 
and -mfloat-abi=hard is supported for the armv7-a target.
Would you like me to define an arm_arch_v7a_arm_hard effective-target 
and override with -mfpu=vfpv3-d16 or do you want a dedicated 
effective-target that will contain also the -mfpu=vfpv3-d16 in the check?


Kind regards,
Torbjörn



  
  extern void *f1 (unsigned long, unsigned long);

  extern const struct line_map *f2 (void *, int, unsigned int, const char *, 
unsigned int);
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 30e453a578a..6241c00a752 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5778,6 +5778,7 @@ foreach { armfunc armflag armdefs } {
v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" "__ARM_ARCH_7A__ 
&& __ARM_NEON__"
+   v7a_thumb "-march=armv7-a+fp -mthumb" "__ARM_ARCH_7A__ && __thumb__"


I think you want -mfpu=auto here as well.
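
I.e. something like (illustrative only):

    v7a_thumb "-march=armv7-a+fp -mfpu=auto -mthumb" "__ARM_ARCH_7A__ && __thumb__"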


v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__


R.



[PATCH v1 2/3] RISC-V: Introduce riscv/rvv/autovec/sat folder to rvv.exp testsuite

2024-11-20 Thread pan2 . li
From: Pan Li 

After moving the vector SAT_ADD testcases into an isolated folder,
riscv/rvv/autovec/sat, we would like to add that folder as one of
the test items of the rvv.exp testsuite.

The below test suites are passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add the vector sat folder to
the rvv.exp testsuite.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index dbe1f11c0e8..71251737be2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -122,6 +122,8 @@ foreach op $AUTOVEC_TEST_OPTS {
 "" "$op"
   dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/strided/*.\[cS\]]] \
 "$op" ""
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/sat/*.\[cS\]]] \
+"$op" ""
 }
 
 # All done.
-- 
2.43.0



Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

2024-11-20 Thread Jennifer Schmitz


> On 13 Nov 2024, at 12:54, Richard Sandiford  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Jennifer Schmitz  writes:
>> As follow-up to
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>> this patch implements folding of svmul and svdiv by -1 to svneg for
>> unsigned SVE vector types. The key idea is to reuse the existing code that
>> does this fold for signed types and feed it as callback to a helper function
>> that adds the necessary type conversions.
> 
> I only meant that we should do this for multiplication (since the sign
> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
> be right for unsigned division, since unsigned division by the maximum
> value is instead equivalent to x == MAX ? MAX : 0.
> 
> Some comments on the multiplication bit below:
> 
>> 
>> For example, for the test case
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>  return svmul_n_u64_x (pg, x, -1);
>> }
>> 
>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>  svuint64_t D.12921;
>>  svint64_t D.12920;
>>  svuint64_t D.12919;
>> 
>>  D.12920 = VIEW_CONVERT_EXPR(x);
>>  D.12921 = svneg_s64_x (pg, D.12920);
>>  D.12919 = VIEW_CONVERT_EXPR(D.12921);
>>  goto ;
>>  :
>>  return D.12919;
>> }
>> 
>> In general, the new helper gimple_folder::convert_and_fold
>> - takes a target type and a function pointer,
>> - converts all non-boolean vector types to the target type,
>> - replaces the converted arguments in the function call,
>> - calls the callback function,
>> - adds the necessary view converts to the gimple sequence,
>> - and returns the new call.
>> 
>> Because all arguments are converted to the same target types, the helper
>> function is only suitable for folding calls whose arguments are all of
>> the same type. If necessary, this could be extended to convert the
>> arguments to different types differentially.
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/ChangeLog:
>> 
>>  * config/aarch64/aarch64-sve-builtins-base.cc
>>  (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>  function and pass to gimple_folder::convert_and_fold to enable
>>  the transform for unsigned types.
>>  (svdiv_impl::fold): Likewise.
>>  * config/aarch64/aarch64-sve-builtins.cc
>>  (gimple_folder::convert_and_fold): New function that converts
>>  operands to target type before calling callback function, adding the
>>  necessary conversion statements.
>>  * config/aarch64/aarch64-sve-builtins.h
>>  (gimple_folder::convert_and_fold): Declare function.
>>  (signed_type_suffix_index): Return type_suffix_index of signed
>>  vector type for given width.
>>  (function_instance::signed_type): Return signed vector type for
>>  given width.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>  outcome.
>>  * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>  * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>  expected outcome.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc  | 99 ---
>> gcc/config/aarch64/aarch64-sve-builtins.cc| 40 
>> gcc/config/aarch64/aarch64-sve-builtins.h | 30 ++
>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 -
>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>> 9 files changed, 180 insertions(+), 50 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 1c9f515a52c..6df14a8f4c4 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> [...]
>> @@ -2082,33 +2091,49 @@ public:
>>   return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>> /* If one of the operands is all integer -1, fold to svneg.  */
>> -tree pg = gimple_call_arg (f.call, 0);
>> -tree negated_op = NULL;
>> -if (integer_minus_onep (op2))
>> -  negated_op = op1;
>> -else if (integer_minus_onep (op1))
>> -  negated_op = op2;
>> -if (!f.type_suffix (0).unsigned_p && negated_op)
>> + if (integer_minus_onep (op1) || integer_minus_onep (op2))
> 
> Formatting nit, sorry, but: indentation looks off.
> 
>>   {
>> - function_instance instance ("svneg", functions::svneg,
>> -

Re: [PATCH 14/17] testsuite: arm: Use -march=unset for pr69175.C test

2024-11-20 Thread Richard Earnshaw (lists)
On 19/11/2024 10:24, Torbjörn SVENSSON wrote:
> Update test cases to use -mcpu=unset/-march=unset feature introduced in
> r15-3606-g7d6c6a0d15c.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/opt/pr69175.C: Added option "-mcpu=unset".
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/g++.dg/opt/pr69175.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/g++.dg/opt/pr69175.C 
> b/gcc/testsuite/g++.dg/opt/pr69175.C
> index e24f6816b5f..6d28951d5ae 100644
> --- a/gcc/testsuite/g++.dg/opt/pr69175.C
> +++ b/gcc/testsuite/g++.dg/opt/pr69175.C
> @@ -1,7 +1,7 @@
>  // PR target/69175
>  // { dg-do compile }
>  // { dg-options "-O2" }
> -// { dg-additional-options "-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 
> -mthumb" { target { arm_hard_vfp_ok && arm_thumb2_ok } } }
> +// { dg-additional-options "-mcpu=unset -march=armv7-a -mfloat-abi=hard 
> -mfpu=vfpv3-d16 -mthumb" { target { arm_hard_vfp_ok && arm_thumb2_ok } } }
>  
>  struct A { A *c, *d; } a;
>  struct B { A *e; A *f; void foo (); };

OK.

R.


Re: [PATCH v2 08/14] Support for 64-bit location_t: Analyzer parts

2024-11-20 Thread Richard Biener
On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt  wrote:
>
> The analyzer occasionally prints internal location_t values for debugging;
> adjust those parts so they will work if location_t is 64-bit. For
> simplicity, to avoid hassling with the printf format string, just convert to
> (unsigned long long) in either case.

OK.

> gcc/analyzer/ChangeLog:
>
> * checker-event.cc (checker_event::dump): Support printing either
> 32- or 64-bit location_t values.
> * checker-path.cc (checker_path::inject_any_inlined_call_events):
> Likewise.
> ---
>  gcc/analyzer/checker-event.cc | 4 ++--
>  gcc/analyzer/checker-path.cc  | 5 +++--
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/analyzer/checker-event.cc b/gcc/analyzer/checker-event.cc
> index 5a292377e93..bb26f71e4b4 100644
> --- a/gcc/analyzer/checker-event.cc
> +++ b/gcc/analyzer/checker-event.cc
> @@ -188,8 +188,8 @@ checker_event::dump (pretty_printer *pp) const
>if (m_effective_fndecl != m_original_fndecl)
> pp_printf (pp, " corrected from %qE", m_original_fndecl);
>  }
> -  pp_printf (pp, ", m_loc=%x)",
> -get_location ());
> +  pp_printf (pp, ", m_loc=%llx)",
> +(unsigned long long) get_location ());
>  }
>
>  /* Dump this event to stderr (for debugging/logging purposes).  */
> diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
> index d607679beec..9626e358cb3 100644
> --- a/gcc/analyzer/checker-path.cc
> +++ b/gcc/analyzer/checker-path.cc
> @@ -281,8 +281,9 @@ checker_path::inject_any_inlined_call_events (logger 
> *logger)
>   logger->log_partial ("  %qE", iter.get_block ());
>   if (!flag_dump_noaddr)
> logger->log_partial (" (%p)", iter.get_block ());
> - logger->log_partial (", fndecl: %qE, callsite: 0x%x",
> -  iter.get_fndecl (), iter.get_callsite ());
> + logger->log_partial (", fndecl: %qE, callsite: 0x%llx",
> +  iter.get_fndecl (),
> +  (unsigned long long) iter.get_callsite ());
>   if (iter.get_callsite ())
> dump_location (logger->get_printer (), iter.get_callsite ());
>   logger->end_log_line ();


Re: [PATCH v2 12/14] Support for 64-bit location_t: Backend parts

2024-11-20 Thread Richard Biener
On Sun, Nov 17, 2024 at 4:30 AM Lewis Hyatt  wrote:
>
> A few targets have been using "unsigned int" function arguments that need to
> receive a "location_t". Change to "location_t" to prepare for the
> possibility that location_t can be configured to be a different type.

I guess the point was that location_t isn't (wasn't) available (at some point).

OK if you checked that cc1 for the adjusted targets still builds.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * config/aarch64/aarch64-c.cc (aarch64_resolve_overloaded_builtin):
> Change "unsigned int" argument to "location_t".
> * config/avr/avr-c.cc (avr_resolve_overloaded_builtin): Likewise.
> * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): 
> Likewise.
> * target.def: Likewise.
> * doc/tm.texi: Regenerate.
> ---
>  gcc/config/aarch64/aarch64-c.cc | 3 +--
>  gcc/config/avr/avr-c.cc | 3 +--
>  gcc/config/riscv/riscv-c.cc | 3 +--
>  gcc/doc/tm.texi | 2 +-
>  gcc/target.def  | 2 +-
>  5 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index faedb25ddb3..79a680f2e24 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -369,11 +369,10 @@ aarch64_pragma_aarch64 (cpp_reader *)
>
>  /* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
>  static tree
> -aarch64_resolve_overloaded_builtin (unsigned int uncast_location,
> +aarch64_resolve_overloaded_builtin (location_t location,
> tree fndecl, void *uncast_arglist)
>  {
>vec empty = {};
> -  location_t location = (location_t) uncast_location;
>vec *arglist = (uncast_arglist
>? (vec *) uncast_arglist
>: &empty);
> diff --git a/gcc/config/avr/avr-c.cc b/gcc/config/avr/avr-c.cc
> index d3c40d73043..7cf8344c1c7 100644
> --- a/gcc/config/avr/avr-c.cc
> +++ b/gcc/config/avr/avr-c.cc
> @@ -48,11 +48,10 @@ enum avr_builtin_id
>  /* Implement `TARGET_RESOLVE_OVERLOADED_PLUGIN'.  */
>
>  static tree
> -avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
> +avr_resolve_overloaded_builtin (location_t loc, tree fndecl, void *vargs)
>  {
>tree type0, type1, fold = NULL_TREE;
>avr_builtin_id id = AVR_BUILTIN_COUNT;
> -  location_t loc = (location_t) iloc;
>vec &args = * (vec*) vargs;
>
>switch (DECL_MD_FUNCTION_CODE (fndecl))
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index c59f408d3a8..7f78e2cf019 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -312,11 +312,10 @@ riscv_check_builtin_call (location_t loc, 
> vec arg_loc, tree fndecl,
>
>  /* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
>  static tree
> -riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
> +riscv_resolve_overloaded_builtin (location_t loc, tree fndecl,
>   void *uncast_arglist)
>  {
>vec empty = {};
> -  location_t loc = (location_t) uncast_location;
>vec *arglist = (vec *) uncast_arglist;
>unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
>unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 109e40384b6..58a94822156 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12115,7 +12115,7 @@ ignored.  This function should return the result of 
> the call to the
>  built-in function.
>  @end deftypefn
>
> -@deftypefn {Target Hook} tree TARGET_RESOLVE_OVERLOADED_BUILTIN (unsigned 
> int @var{loc}, tree @var{fndecl}, void *@var{arglist})
> +@deftypefn {Target Hook} tree TARGET_RESOLVE_OVERLOADED_BUILTIN (location_t 
> @var{loc}, tree @var{fndecl}, void *@var{arglist})
>  Select a replacement for a machine specific built-in function that
>  was set up by @samp{TARGET_INIT_BUILTINS}.  This is done
>  @emph{before} regular type checking, and so allows the target to
> diff --git a/gcc/target.def b/gcc/target.def
> index 523ae7ec9aa..e285cef5743 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2497,7 +2497,7 @@ arguments passed to the built-in function.  The result 
> is a\n\
>  complete expression that implements the operation, usually\n\
>  another @code{CALL_EXPR}.\n\
>  @var{arglist} really has type @samp{VEC(tree,gc)*}",
> - tree, (unsigned int /*location_t*/ loc, tree fndecl, void *arglist), NULL)
> + tree, (location_t loc, tree fndecl, void *arglist), NULL)
>
>  DEFHOOK
>  (check_builtin_call,


Re: [PATCH v2 07/14] Support for 64-bit location_t: toplev parts

2024-11-20 Thread Richard Biener
On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt  wrote:
>
> With the move from 32-bit to 64-bit location_t, the recommended number of
> range bits will change from 5 to 7. line-map.h now exports the recommended
> setting, so use that instead of hard-coding 5.
>
> Also silently ignore -flarge-source-files, which will become unnecessary with
> 64-bit location_t and would harm performance.

This is OK once the dependences are approved.

Richard.

> gcc/ChangeLog:
>
> * common.opt: Mark -flarge-source-files as Ignored.
> * doc/invoke.texi: Remove -flarge-source-files.
> * toplev.cc (general_init): Use new constant
> line_map_suggested_range_bits instead of the hard-coded integer 5.
> (process_options): Remove support for -flarge-source-files.
> ---
>  gcc/common.opt  |  5 ++---
>  gcc/doc/invoke.texi | 17 +
>  gcc/toplev.cc   |  5 +
>  3 files changed, 4 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 33be6b8042a..be74dc177c8 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1804,9 +1804,8 @@ Common Undocumented Var(flag_keep_gc_roots_live) 
> Optimization
>  ; Always keep a pointer to a live memory block
>
>  flarge-source-files
> -Common Var(flag_large_source_files) Init(0)
> -Improve GCC's ability to track column numbers in large source files,
> -at the expense of slower compilation.
> +Common Ignore
> +Does nothing.  Preserved for backward compatibility.
>
>  flate-combine-instructions
>  Common Var(flag_late_combine_instructions) Optimization Init(0)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 929feaf23fb..ba7ff14979b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -691,7 +691,7 @@ Objective-C and Objective-C++ Dialects}.
>  -dD  -dI  -dM  -dN  -dU
>  -fdebug-cpp  -fdirectives-only  -fdollars-in-identifiers
>  -fexec-charset=@var{charset}  -fextended-identifiers
> --finput-charset=@var{charset}  -flarge-source-files
> +-finput-charset=@var{charset}
>  -fmacro-prefix-map=@var{old}=@var{new} -fmax-include-depth=@var{depth}
>  -fno-canonical-system-headers  -fpch-deps  -fpch-preprocess
>  -fpreprocessed  -ftabstop=@var{width}  -ftrack-macro-expansion
> @@ -18664,21 +18664,6 @@ This option may be useful in conjunction with the 
> @option{-B} or
>  perform additional processing of the program source between
>  normal preprocessing and compilation.
>
> -@opindex flarge-source-files
> -@item -flarge-source-files
> -Adjust GCC to expect large source files, at the expense of slower
> -compilation and higher memory usage.
> -
> -Specifically, GCC normally tracks both column numbers and line numbers
> -within source files and it normally prints both of these numbers in
> -diagnostics.  However, once it has processed a certain number of source
> -lines, it stops tracking column numbers and only tracks line numbers.
> -This means that diagnostics for later lines do not include column numbers.
> -It also means that options like @option{-Wmisleading-indentation} cease to 
> work
> -at that point, although the compiler prints a note if this happens.
> -Passing @option{-flarge-source-files} significantly increases the number
> -of source lines that GCC can process before it stops tracking columns.
> -
>  @end table
>
>  @node Assembler Options
> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> index 779049674b4..bd95521e3ff 100644
> --- a/gcc/toplev.cc
> +++ b/gcc/toplev.cc
> @@ -1137,7 +1137,7 @@ general_init (const char *argv0, bool init_signals, 
> unique_argv original_argv)
>linemap_init (line_table, BUILTINS_LOCATION);
>line_table->m_reallocator = realloc_for_line_map;
>line_table->m_round_alloc_size = ggc_round_alloc_size;
> -  line_table->default_range_bits = 5;
> +  line_table->default_range_bits = line_map_suggested_range_bits;
>init_ttree ();
>
>/* Initialize register usage now so switches may override.  */
> @@ -1765,9 +1765,6 @@ process_options ()
>  hash_table_sanitize_eq_limit
>= param_hash_table_verification_limit;
>
> -  if (flag_large_source_files)
> -line_table->default_range_bits = 0;
> -
>diagnose_options (&global_options, &global_options_set, UNKNOWN_LOCATION);
>
>/* Please don't change global_options after this point, those changes won't


Re: [PATCH] i386: Remove workaround for Solaris ld 64-bit TLS IE limitation

2024-11-20 Thread Uros Bizjak
On Wed, Nov 20, 2024 at 11:44 AM Rainer Orth
 wrote:
>
> As detailed in PR target/43309, the Solaris linker initially took the
> 64-bit x86 TLS IE code sequence literally, assuming that the spec only
> allowed %rax as target register.
>
> A workaround has been in place for more than a decade, but is no longer
> necessary.  The bug had already been fixed for Solaris 11.1, while trunk
> requires Solaris 11.4.
>
> Uros pointed this out and suggested the attached patch.
>
> Bootstrapped without regressions on i386-pc-solaris2.11.
>
> Ok for trunk?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-10-15  Uros Bizjak  
>
> gcc:
> * config/i386/i386.cc (legitimize_tls_address)
> : Remove 64-bit Solaris ld workaround.
> * config/i386/i386.md (UNSPEC_TLS_IE_SUN): Remove.
> (tls_initial_exec_64_sun): Remove.

OK.

Thanks,
Uros.


Re: [PATCH 11/17] testsuite: arm: Use effective-target for pr56184.C and pr59985.C

2024-11-20 Thread Richard Earnshaw (lists)
On 20/11/2024 07:58, Torbjorn SVENSSON wrote:
> 
> 
> On 11/19/24 18:08, Richard Earnshaw (lists) wrote:
>> On 19/11/2024 10:24, Torbjörn SVENSSON wrote:
>>> Update test cases to use -mcpu=unset/-march=unset feature introduced in
>>> r15-3606-g7d6c6a0d15c.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.dg/other/pr56184.C: Use effective-target
>>> arm_arch_v7a_neon and arm_arch_v7a_thumb.
>>> * g++.dg/other/pr59985.C: Use effective-target
>>> arm_arch_v7a_neon and arm_arch_v7a_arm.
>>> * lib/target-supports.exp: Define effective-target
>>> arm_arch_v7a_thumb.
>>>
>>> Signed-off-by: Torbjörn SVENSSON 
>>> ---
>>>   gcc/testsuite/g++.dg/other/pr56184.C  | 8 ++--
>>>   gcc/testsuite/g++.dg/other/pr59985.C  | 7 +--
>>>   gcc/testsuite/lib/target-supports.exp | 1 +
>>>   3 files changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/gcc/testsuite/g++.dg/other/pr56184.C 
>>> b/gcc/testsuite/g++.dg/other/pr56184.C
>>> index dc949283c98..f4a4300c385 100644
>>> --- a/gcc/testsuite/g++.dg/other/pr56184.C
>>> +++ b/gcc/testsuite/g++.dg/other/pr56184.C
>>> @@ -1,6 +1,10 @@
>>>   /* { dg-do compile { target arm*-*-* } } */
>>> -/* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || 
>>> arm_thumb2_ok } } } */
>>> -/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon 
>>> -mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors -Wno-return-type" 
>>> } */
>>> +/* { dg-require-effective-target arm_arch_v7a_neon_ok } */
>>> +/* { dg-require-effective-target arm_arch_v7a_thumb_ok } */
>>> +/* { dg-options "-fno-short-enums -O2 -fno-section-anchors 
>>> -Wno-return-type" } */
>>> +/* { dg-add-options arm_arch_v7a_neon } */
>>> +/* { dg-additional-options "-mthumb -mtune=cortex-a9" } */
>>> +
>>>   
>>
>> I'd add a new entry for v7a_neon_thumb for this, then we only need one 
>> dg-r-e-t rule here.
>>
>>>   typedef unsigned int size_t;
>>>   __extension__ typedef int __intptr_t;
>>> diff --git a/gcc/testsuite/g++.dg/other/pr59985.C 
>>> b/gcc/testsuite/g++.dg/other/pr59985.C
>>> index 7c9bfab35f1..a0f5e184b43 100644
>>> --- a/gcc/testsuite/g++.dg/other/pr59985.C
>>> +++ b/gcc/testsuite/g++.dg/other/pr59985.C
>>> @@ -1,7 +1,10 @@
>>>   /* { dg-do compile { target arm*-*-* } } */
>>> -/* { dg-skip-if "incompatible options" { arm_thumb1 } } */
>>> -/* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 
>>> -mfpu=vfpv3-d16 -mfloat-abi=hard" } */
>>>   /* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } 
>>> } */
>>> +/* { dg-require-effective-target arm_arch_v7a_arm_ok } */
>>> +/* { dg-require-effective-target arm_arch_v7a_neon_ok } */
>>> +/* { dg-options "-g -fcompare-debug -O2" } */
>>> +/* { dg-add-options arm_arch_v7a_neon } */
>>> +/* { dg-additional-options "-marm -mtune=cortex-a9 -mfloat-abi=hard 
>>> -mfpu=vfpv3-d16" } */
>>
>> I don't follow this change, the original test never looks at neon, nor needs 
>> it AFAICT.
> 
> I am trying to use the existing effective-targets to verify that -marm and 
> -mfloat-abi=hard is supported for the armv7-a target.
> Would you like me to define an arm_arch_v7a_arm_hard effective-target and 
> override with -mfpu=vfpv3-d16 or do you want a dedicated effective-target 
> that will contain also the -mfpu=vfpv3-d16 in the check?

My goal is to get rid of -mfpu (other than auto) everywhere in the testsuite.  
The only exception would be for some specific backwards compatibility tests, 
which we can then know are safe to remove if/when -mfpu is obsoleted entirely.

I'm not expecting that to happen overnight, but the first step is no new uses 
of the old command-line interface and fixing up existing uses as we need to 
make changes like this.

R.

> 
> Kind regards,
> Torbjörn
> 
>>
>>>     extern void *f1 (unsigned long, unsigned long);
>>>   extern const struct line_map *f2 (void *, int, unsigned int, const char 
>>> *, unsigned int);
>>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>>> b/gcc/testsuite/lib/target-supports.exp
>>> index 30e453a578a..6241c00a752 100644
>>> --- a/gcc/testsuite/lib/target-supports.exp
>>> +++ b/gcc/testsuite/lib/target-supports.exp
>>> @@ -5778,6 +5778,7 @@ foreach { armfunc armflag armdefs } {
>>>   v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
>>>   v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
>>>   v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" 
>>> "__ARM_ARCH_7A__ && __ARM_NEON__"
>>> +    v7a_thumb "-march=armv7-a+fp -mthumb" "__ARM_ARCH_7A__ && __thumb__"
>>
>> I think you want -mfpu=auto here as well.
>>
>>>   v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
>>>   v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
>>>   v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__
>>
>> R.
>>



[PATCH] i386: Remove workaround for Solaris ld 64-bit TLS IE limitation

2024-11-20 Thread Rainer Orth
As detailed in PR target/43309, the Solaris linker initially took the
64-bit x86 TLS IE code sequence literally, assuming that the spec only
allowed %rax as target register.

A workaround has been in place for more than a decade, but is no longer
necessary.  The bug had already been fixed for Solaris 11.1, while trunk
requires Solaris 11.4.

Uros pointed this out and suggested the attached patch.

Bootstrapped without regressions on i386-pc-solaris2.11.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-10-15  Uros Bizjak  

gcc:
* config/i386/i386.cc (legitimize_tls_address)
: Remove 64-bit Solaris ld workaround.
* config/i386/i386.md (UNSPEC_TLS_IE_SUN): Remove.
(tls_initial_exec_64_sun): Remove.

# HG changeset patch
# Parent  e1012fbd4b7a2b88a6c254cdd79c01d97f56f160
i386: Remove workaround for Solaris ld 64-bit TLS IE limitation

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -12320,17 +12320,6 @@ legitimize_tls_address (rtx x, enum tls_
 case TLS_MODEL_INITIAL_EXEC:
   if (TARGET_64BIT)
 	{
-	  if (TARGET_SUN_TLS && !TARGET_X32)
-	{
-	  /* The Sun linker took the AMD64 TLS spec literally
-		 and can only handle %rax as destination of the
-		 initial executable code sequence.  */
-
-	  dest = gen_reg_rtx (DImode);
-	  emit_insn (gen_tls_initial_exec_64_sun (dest, x));
-	  return dest;
-	}
-
 	  /* Generate DImode references to avoid %fs:(%reg32)
 	 problems and linker IE->LE relaxation bug.  */
 	  tp_mode = DImode;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -93,7 +93,6 @@
   UNSPEC_TLS_GD
   UNSPEC_TLS_LD_BASE
   UNSPEC_TLSDESC
-  UNSPEC_TLS_IE_SUN
 
   ;; Other random patterns
   UNSPEC_SCAS
@@ -22845,22 +22844,6 @@
   set_mem_addr_space (operands[2], as);
 })
 
-;; The Sun linker took the AMD64 TLS spec literally and can only handle
-;; %rax as destination of the initial executable code sequence.
-(define_insn "tls_initial_exec_64_sun"
-  [(set (match_operand:DI 0 "register_operand" "=a")
-	(unspec:DI
-	 [(match_operand 1 "tls_symbolic_operand")]
-	 UNSPEC_TLS_IE_SUN))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && TARGET_SUN_TLS"
-{
-  output_asm_insn
-("mov{q}\t{%%fs:0, %0|%0, QWORD PTR fs:0}", operands);
-  return "add{q}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
-}
-  [(set_attr "type" "multi")])
-
 ;; GNU2 TLS patterns can be split.
 
 (define_expand "tls_dynamic_gnu2_32"


Re: [PATCH 11/15] Support for 64-bit location_t: RTL parts

2024-11-20 Thread Richard Biener
On Sun, Nov 3, 2024 at 11:27 PM Lewis Hyatt  wrote:
>
> Some RTL objects need to store a location_t. Currently, they store it in the
> rt_int field of union rtunion, but in a world where location_t could be
> 64-bit, they need to store it in a larger variable. Unfortunately, rtunion
> does not currently have a 64-bit int type for that purpose, so add one. In
> order to avoid increasing any overhead when 64-bit locations are not in use,
> the new field is dedicated for location_t storage only and has type
> "location_t" so it will only be 64-bit if necessary. This necessitates
> adding a new RTX format code 'L' for locations. There are very many switch
> statements in the codebase that inspect the RTX format code. I took the
> approach of finding all of them that handle code 'i' or 'n' and making sure
> they handle 'L' too. I am sure that some of these call sites can never see
> an 'L' code, but I thought it would be safer and more future-proof to handle
> as many as possible, given it's just a line or two to add in most cases.

That sounds like a reasonable approach.

> While testing this with --enable-checking=rtl, I came across one place in
> final.cc that seems to be a (currently) harmless misuse of RTL:
>
> set_cur_block_to_this_block:
>   if (! this_block)
> {
>   if (INSN_LOCATION (insn) == UNKNOWN_LOCATION)
> continue;
>   else
> this_block = DECL_INITIAL (cfun->decl);
> }
>
> In this part of reemit_insn_block_notes(), the insn variable could actually
> be a NOTE and not an INSN. In that case, INSN_LOCATION() shouldn't be
> called on it. It works fine currently because the field is properly accessed
> by XINT() either way. (For an INSN, it is a location, but for a NOTE, it is
> the note type enum). Currently, if insn is a NOTE, the comparison must
> always be false because the note type is not equal to
> 0==UNKNOWN_LOCATION. Once locations and ints are differentiated, this line
> leads to a checking failure, which I resolved by checking for the NOTE_P
> case before calling INSN_LOCATION.

>if (! this_block)
> {
> - if (INSN_LOCATION (insn) == UNKNOWN_LOCATION)
> + if (!NOTE_P (insn) && INSN_LOCATION (insn) == UNKNOWN_LOCATION)
> continue;
>   else

I think you instead want

   if (NOTE_P (insn)
   || INSN_LOCATION (insn) == UNKNOWN_LOCATION)
 continue;

but the whole if (! this_block) block doesn't make sense to me ... I think
we only get here for NOTE_P via

  case NOTE_INSN_BEGIN_STMT:
  case NOTE_INSN_INLINE_ENTRY:
this_block = LOCATION_BLOCK (NOTE_MARKER_LOCATION (insn));
goto set_cur_block_to_this_block;

so possibly a !this_block case should be made explicit there, by checking
NOTE_MARKER_LOCATION for UNKNOWN_LOCATION.  CCing Alex who
might know.
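
One possible reading of that suggestion, as a rough untested sketch that keeps
the current fallback behaviour (names taken from the existing code, the rest
is hypothetical):

      case NOTE_INSN_BEGIN_STMT:
      case NOTE_INSN_INLINE_ENTRY:
	this_block = LOCATION_BLOCK (NOTE_MARKER_LOCATION (insn));
	/* Make the missing-block case explicit for marker notes instead of
	   relying on the INSN_LOCATION check at the label below.  */
	if (!this_block
	    && NOTE_MARKER_LOCATION (insn) == UNKNOWN_LOCATION)
	  this_block = DECL_INITIAL (cfun->decl);
	goto set_cur_block_to_this_block;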

The rest of the patch is OK absent comments from others.  I would suggest to
split out the above hunk for further review.

Richard.


> gcc/ChangeLog:
>
> * rtl.def (DEBUG_INSN): Use new format code 'L' for location_t fields.
> (INSN): Likewise.
> (JUMP_INSN): Likewise.
> (CALL_INSN): Likewise.
> (ASM_INPUT): Likewise.
> (ASM_OPERANDS): Likewise.
> * rtl.h (union rtunion): Add new location_t RT_LOC member for use by
> the 'L' format.
> (struct rtx_debug_insn): Adjust comment.
> (struct rtx_nonjump_insn): Adjust comment.
> (struct rtx_call_insn): Adjust comment.
> (XLOC): New accessor macro for rtunion::rt_loc.
> (X0LOC): Likewise.
> (XCLOC): Likewise.
> (INSN_LOCATION): Use XLOC instead of XUINT to retrieve a location_t.
> (NOTE_MARKER_LOCATION): Likewise for XCUINT -> XCLOC.
> (ASM_OPERANDS_SOURCE_LOCATION): Likewise.
> (ASM_INPUT_SOURCE_LOCATION):Likewise.
> (gen_rtx_ASM_INPUT): Adjust to use sL format instead of si.
> (gen_rtx_INSN): Adjust prototype to use location_r rather than int
> for the location.
> * cfgrtl.cc (force_nonfallthru_and_redirect): Change type of LOC
> local variable from int to location_t.
> * rtlhash.cc (add_rtx): Support 'L' format in the switch statement.
> * var-tracking.cc (loc_cmp): Likewise.
> * alias.cc (rtx_equal_for_memref_p): Likewise.
> * config/alpha/alpha.cc (summarize_insn): Likewise.
> * config/ia64/ia64.cc (rtx_needs_barrier): Likewise.
> * config/rs6000/rs6000.cc (rs6000_hash_constant): Likewise.
> * cse.cc (hash_rtx): Likewise.
> (exp_equiv_p): Likewise.
> * cselib.cc (rtx_equal_for_cselib_1): Likewise.
> (cselib_hash_rtx): Likewise.
> (cselib_expand_value_rtx_1): Likewise.
> * emit-rtl.cc (copy_insn_1): Likewise.
> (gen_rtx_INSN): Change the location argument from int to location_t,
> and call the corresponding gen_rtf_fmt_* function

[PATCH] tree-optimization/117698 - SLP vectorization and alignment

2024-11-20 Thread Richard Biener
When SLP vectorizing we fail to mark the general alignment check
as irrelevant when using VMAT_STRIDED_SLP (the implementation checks
alignment itself), and for VMAT_INVARIANT the override isn't effective.

This results in extra FAILs on sparc which the following fixes.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/117698
* tree-vect-stmts.cc (get_group_load_store_type): Properly
disregard alignment for VMAT_STRIDED_SLP and VMAT_INVARIANT.
---
 gcc/tree-vect-stmts.cc | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5f7e1e622a8..67b3e379439 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2121,9 +2121,6 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
{
  gcc_assert (vls_type == VLS_LOAD);
  *memory_access_type = VMAT_INVARIANT;
- /* Invariant accesses perform only component accesses, alignment
-is irrelevant for them.  */
- *alignment_support_scheme = dr_unaligned_supported;
}
  /* Try using LOAD/STORE_LANES.  */
  else if (slp_node->ldst_lanes
@@ -2379,7 +2376,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
 *memory_access_type = VMAT_GATHER_SCATTER;
 
   if (*memory_access_type == VMAT_GATHER_SCATTER
-  || *memory_access_type == VMAT_ELEMENTWISE)
+  || *memory_access_type == VMAT_ELEMENTWISE
+  || *memory_access_type == VMAT_STRIDED_SLP
+  || *memory_access_type == VMAT_INVARIANT)
 {
   *alignment_support_scheme = dr_unaligned_supported;
   *misalignment = DR_MISALIGNMENT_UNKNOWN;
-- 
2.43.0


Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

2024-11-20 Thread Richard Sandiford
Jennifer Schmitz  writes:
>> On 13 Nov 2024, at 12:54, Richard Sandiford  
>> wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> Jennifer Schmitz  writes:
>>> As follow-up to
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>> does this fold for signed types and feed it as callback to a helper function
>>> that adds the necessary type conversions.
>> 
>> I only meant that we should do this for multiplication (since the sign
>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>> be right for unsigned division, since unsigned division by the maximum
>> value is instead equivalent to x == MAX ? MAX : 0.
>> 
>> Some comments on the multiplication bit below:
>> 
>>> 
>>> For example, for the test case
>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>> {
>>>  return svmul_n_u64_x (pg, x, -1);
>>> }
>>> 
>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>> {
>>>  svuint64_t D.12921;
>>>  svint64_t D.12920;
>>>  svuint64_t D.12919;
>>> 
>>>  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>  D.12921 = svneg_s64_x (pg, D.12920);
>>>  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>  goto ;
>>>  :
>>>  return D.12919;
>>> }
>>> 
>>> In general, the new helper gimple_folder::convert_and_fold
>>> - takes a target type and a function pointer,
>>> - converts all non-boolean vector types to the target type,
>>> - replaces the converted arguments in the function call,
>>> - calls the callback function,
>>> - adds the necessary view converts to the gimple sequence,
>>> - and returns the new call.
>>> 
>>> Because all arguments are converted to the same target types, the helper
>>> function is only suitable for folding calls whose arguments are all of
>>> the same type. If necessary, this could be extended to convert the
>>> arguments to different types differentially.
>>> 
>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
>>> gcc/ChangeLog:
>>> 
>>>  * config/aarch64/aarch64-sve-builtins-base.cc
>>>  (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>  function and pass to gimple_folder::convert_and_fold to enable
>>>  the transform for unsigned types.
>>>  (svdiv_impl::fold): Likewise.
>>>  * config/aarch64/aarch64-sve-builtins.cc
>>>  (gimple_folder::convert_and_fold): New function that converts
>>>  operands to target type before calling callback function, adding the
>>>  necessary conversion statements.
>>>  * config/aarch64/aarch64-sve-builtins.h
>>>  (gimple_folder::convert_and_fold): Declare function.
>>>  (signed_type_suffix_index): Return type_suffix_index of signed
>>>  vector type for given width.
>>>  (function_instance::signed_type): Return signed vector type for
>>>  given width.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>  * gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>  outcome.
>>>  * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>  * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>>  expected outcome.
>>> ---
>>> .../aarch64/aarch64-sve-builtins-base.cc  | 99 ---
>>> gcc/config/aarch64/aarch64-sve-builtins.cc| 40 
>>> gcc/config/aarch64/aarch64-sve-builtins.h | 30 ++
>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 -
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>>> 9 files changed, 180 insertions(+), 50 deletions(-)
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> index 1c9f515a52c..6df14a8f4c4 100644
>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> [...]
>>> @@ -2082,33 +2091,49 @@ public:
>>>   return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>> 
>>> /* If one of the operands is all integer -1, fold to svneg.  */
>>> -tree pg = gimple_call_arg (f.call, 0);
>>> -tree negated_op = NULL;
>>> -if (integer_minus_onep (op2))
>>> -  negated_op = op1;
>>> -else if (integer_minus_onep (op1))
>>> -  negated_op = op2;
>>> -if (!f.type_suffix (0).unsigned_p && negated_op)
>>> + if (integer_minus_onep (op1) || integer_minus_onep (op2))
>> 

Re: [PATCH v2 1/4] aarch64: return scalar fp8 values in fp registers

2024-11-20 Thread Claudio Bantaloukas



On 19/11/2024 17:01, Andrew Pinski wrote:

On Fri, Nov 8, 2024 at 8:11 AM Claudio Bantaloukas
 wrote:


According to the aapcs64: If the argument is an 8-bit (...) precision
Floating-point or short vector type and the NSRN is less than 8, then the
argument is allocated to the least significant bits of register v[NSRN].

gcc/
 * config/aarch64/aarch64.cc
 (aarch64_vfp_is_call_or_return_candidate): use fp registers to
 return svmfloat8_t parameters.

gcc/testsuite/
 * gcc.target/aarch64/fp8_scalar_1.c:

This changed fp8_scalar_1.c's stacktest1 body, and the generated code
there is now just:
 sub sp, sp, #16
 str b0, [sp, 15]
 ldr b0, [sp, 15]
 add sp, sp, 16
 ret

Instead of having to require a move to the GPRs.
This code generation seems correct for what it is testing. Did
something else change the generated code after you made the change to
the testcase or was it still failing?


Hi Andrew,

there are quite a few changes around parameter passing in patch 2/4.

Files under gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ have changed.
For example, in lastb_mf8.c


    lastb   w0, p0, z0\.b

became

lastb   b0, p0, z0\.b

Under gcc/testsuite/gcc.target/aarch64/sve/pcs/, return_4.c and similar, 
there used to be a


umov    w0, v0.b\[0\]

which is no longer needed and

mov     z0\.b, w4 in callees has become mov     z0\.b, b4

Also in varargs_2_mf8.c the change altered register allocation.

It was an oversight of mine: I should have mentioned, along with "A
change has been added to fix return of scalar fp8 values", that multiple
tests have been updated to reflect the changed return.


Cheers,

Claudio



Thanks,
Andrew



---
  gcc/config/aarch64/aarch64.cc   | 3 ++-
  gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c | 4 ++--
  2 files changed, 4 insertions(+), 3 deletions(-)



[PATCH] Use decl size in Solaris ASM_DECLARE_OBJECT_NAME [PR102296]

2024-11-20 Thread Rainer Orth
Solaris has modified versions of ASM_DECLARE_OBJECT_NAME on both i386
and sparc.  When

commit ce597aedd79e646c4a5517505088d380239cbfa5
Author: Ilya Enkovich 
Date:   Thu Aug 7 08:04:55 2014 +

elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size.

was applied, those were missed.  At the same time, the testcase was
restricted to Linux though there's nothing Linux-specific in there, so
the error remained undetected.

This patch fixes the definitions to match elfos.h and enables the test
on Solaris, too.
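
As an illustration of why the decl size matters here (this is not the actual
testcase; the struct and initializer below are made up), an object with a
statically initialized flexible array member is larger than its type, so the
.size directive has to use the decl size rather than the type size:

struct S {
  int i;
  char tail[];                  /* flexible array member */
};
struct S s = { 1, "hello" };    /* GNU extension: static init of the FAM */
/* sizeof (struct S) == sizeof (int), but the object 's' occupies
   sizeof (int) + sizeof ("hello") bytes, which is what .size should report.  */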

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

Ok for trunk?


I noticed that both openbsd.h and mcore/mcore-elf.h have the same
problem.  Since I can test neither of those, I left them alone.

Besides, it should be possible to move the testcase out of
gcc.target/i386, simultaneously restricting it to ELF targets.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-11-19  Rainer Orth  

gcc/testsuite:
PR target/102296
* gcc.target/i386/struct-size.c: Enable on *-*-solaris*.

gcc:
PR target/102296
* config/i386/sol2.h (ASM_DECLARE_OBJECT_NAME): Use decl size
instead of type size.
* config/sparc/sol2.h (ASM_DECLARE_OBJECT_NAME): Likewise.

# HG changeset patch
# Parent  d3d7928c798dfe0651a069bb3012e3a8ead27e44
Use decl size in Solaris ASM_DECLARE_OBJECT_NAME [PR102296]

diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -179,7 +179,7 @@ along with GCC; see the file COPYING3.  
 	  && (DECL) && DECL_SIZE (DECL))			\
 	{			\
 	  size_directive_output = 1;\
-	  size = int_size_in_bytes (TREE_TYPE (DECL));		\
+	  size = tree_to_uhwi (DECL_SIZE_UNIT (DECL));		\
 	  ASM_OUTPUT_SIZE_DIRECTIVE (FILE, NAME, size);		\
 	}			\
 \
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -377,7 +377,7 @@ extern const char *host_detect_local_cpu
 	  && (DECL) && DECL_SIZE (DECL))			\
 	{			\
 	  size_directive_output = 1;\
-	  size = int_size_in_bytes (TREE_TYPE (DECL));		\
+	  size = tree_to_uhwi (DECL_SIZE_UNIT (DECL));		\
 	  ASM_OUTPUT_SIZE_DIRECTIVE (FILE, NAME, size);		\
 	}			\
 \
diff --git a/gcc/testsuite/gcc.target/i386/struct-size.c b/gcc/testsuite/gcc.target/i386/struct-size.c
--- a/gcc/testsuite/gcc.target/i386/struct-size.c
+++ b/gcc/testsuite/gcc.target/i386/struct-size.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile { target *-*-linux* *-*-solaris* } } */
 /* { dg-options "-Wno-pedantic" } */
 
 struct S {


[PATCH] Optimize 128-bit vector permutation with pand, pandn and por.

2024-11-20 Thread Cui, Lili
Hi, all

This patch aims to handle certain vector shuffle operations using pand, pandn 
and por more efficiently.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Regards,
Lili.


This patch introduces a new subroutine in ix86_expand_vec_perm_const_1.
On x86, it uses a constant blend mask for V8HImode and V16QImode when
SSE2 is supported, handling certain vector shuffle operations more
efficiently with pand, pandn and por.  This is intended to improve
assembly code generation for configurations that support SSE2 but lack
pshufb.
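
As a rough illustration of the idea (a sketch only, not the GCC expander;
it uses SSE2 intrinsics and a hard-coded even/odd mask):

#include <emmintrin.h>

/* Lane-wise blend of two vectors of shorts using only pand, pandn and por,
   equivalent to __builtin_shufflevector (a, b, 0, 9, 2, 11, 4, 13, 6, 15).  */
static __m128i
blend_even_from_a_odd_from_b (__m128i a, __m128i b)
{
  /* All-ones in the even elements selects a[i]; zeros select b[i].  */
  const __m128i mask = _mm_set_epi16 (0, -1, 0, -1, 0, -1, 0, -1);
  __m128i from_a = _mm_and_si128 (mask, a);     /* pand  */
  __m128i from_b = _mm_andnot_si128 (mask, b);  /* pandn */
  return _mm_or_si128 (from_a, from_b);         /* por   */
}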

gcc/ChangeLog:

PR target/116675
* config/i386/i386-expand.cc (expand_vec_perm_pand_pandn_por):
New subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_pand_pandn_por.

gcc/testsuite/ChangeLog:

PR target/116675
* gcc.target/i386/pr116675.c: New test.
---
 gcc/config/i386/i386-expand.cc   | 50 
 gcc/testsuite/gcc.target/i386/pr116675.c | 75 
 2 files changed, 125 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116675.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a6e6e738a52..f9fa0281298 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23103,6 +23103,53 @@ expand_vec_perm_vpshufb2_vpermq_even_odd (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement a
+   permutation (which is a blend) with and, andnot and or when pshufb is not available.
+
+   It handles cases such as:
+   __builtin_shufflevector (v1, v2, 0, 9, 2, 11, 4, 13, 6, 15);
+   __builtin_shufflevector (v1, v2, 8, 1, 2, 11, 4, 13, 6, 15);
+
+   Each element [i] of the result must be chosen from op0[i] or op1[i] to
+   satisfy the requirement.
+ */
+
+static bool
+expand_vec_perm_pand_pandn_por (struct expand_vec_perm_d *d)
+{
+  rtx rperm[16], vperm;
+  unsigned int i, nelt = d->nelt;
+
+  if (!TARGET_SSE2
+  || d->one_operand_p
+  || (d->vmode != V16QImode && d->vmode != V8HImode))
+return false;
+
+  if (d->perm[0] != 0)
+return false;
+
+  /* The dest[i] must select an element between op0[i] and op1[i].  */
+  for (i = 1; i < nelt; i++)
+if ((d->perm[i] % nelt) != i)
+  return false;
+
+  if (d->testing_p)
+ return true;
+
+  /* Generates a blend mask for the operators AND and ANDNOT.  */
+  machine_mode inner_mode = GET_MODE_INNER (d->vmode);
+  for (i = 0; i < nelt; i++)
+rperm[i] = (d->perm[i] <  nelt) ? CONSTM1_RTX (inner_mode)
+  : CONST0_RTX (inner_mode);
+
+  vperm = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (nelt, rperm));
+  vperm = force_reg (d->vmode, vperm);
+
+  ix86_expand_sse_movcc (d->target, vperm, d->op0, d->op1);
+
+  return true;
+}
+
 /* Implement permutation with pslldq + psrldq + por when pshufb is not
available.  */
 static bool
@@ -24162,6 +24209,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_psrlw_psllw_por (d))
 return true;
 
+  if (expand_vec_perm_pand_pandn_por (d))
+return true;
+
   /* Try sequences of four instructions.  */
 
   if (expand_vec_perm_even_odd_trunc (d))
diff --git a/gcc/testsuite/gcc.target/i386/pr116675.c b/gcc/testsuite/gcc.target/i386/pr116675.c
new file mode 100644
index 000..e463dd8415f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116675.c
@@ -0,0 +1,75 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -msse2 -mno-ssse3" } */
+/* { dg-final { scan-assembler-times "pand" 4 } } */
+/* { dg-final { scan-assembler-times "pandn" 4 } } */
+/* { dg-final { scan-assembler-times "por" 4 } } */
+
+#include 
+
+__attribute__((noinline, noclone, target("sse2")))
+static __v8hi foo1 (__v8hi a, __v8hi b)
+{
+  return __builtin_shufflevector (a, b, 0, 9, 2, 11, 4, 13, 6, 15);
+}
+
+__attribute__((noinline, noclone, target("sse2")))
+static __v8hi foo2 (__v8hi a, __v8hi b)
+{
+  return __builtin_shufflevector (a, b, 8, 9, 2, 3, 4, 13, 14, 15);
+}
+
+__attribute__((noinline, noclone, target("sse2")))
+static __v16qi foo3 (__v16qi a, __v16qi b)
+{
+  return __builtin_shufflevector (a, b, 0, 17, 2, 19, 4, 21, 6, 23,
+ 8, 25, 10, 27, 12, 29, 14, 31);
+}
+
+__attribute__((noinline, noclone, target("sse2")))
+static __v16qi foo4 (__v16qi a, __v16qi b)
+{
+  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 21, 6, 23,
+8, 25, 10, 27,12,29,14,31);
+}
+
+__attribute__((noinline, noclone)) void
+compare_v8hi (__v8hi a,  __v8hi b)
+{
+  for (int i = 0; i < 8; i++) 
+if (a[i] != b[i]) 
+  __builtin_abort ();
+}
+
+__attribute__((noinline, noclone)) void
+compare_v16qi (__v16qi a,  __v16qi b)
+{
+  for (int i = 0; i < 16; i++)
+if (a[i] != b[i])
+  __builtin_abort ();
+}
+
+int main (void)
+{
+  __v8hi s1, s2, s3, s4, s5, s6;
+  __v16qi s7, s8, s9, s10, s11, s12;
+  s1 = (__v8hi) {0, 1, 2, 3, 4, 5, 6, 7};
+  s2 = (__v8hi) {8, 9, 10, 11, 12, 13,

Re: [PATCH v2 05/14] gimple: Handle tail padding when computing gimple_ops_offset

2024-11-20 Thread Richard Biener
On Tue, Nov 19, 2024 at 4:37 PM Lewis Hyatt  wrote:
>
> On Tue, Nov 19, 2024 at 9:55 AM Richard Biener
>  wrote:
> >
> > On Sun, Nov 17, 2024 at 4:25 AM Lewis Hyatt  wrote:
> > >
> > > The array gimple_ops_offset_[], which is used to find the trailing op[]
> > > array for a given gimple struct, is computed assuming that op[] will be
> > > found at sizeof(tree) bytes away from the end of the struct. This is only
> > > correct if the alignment requirement of a pointer is the same as the
> > > alignment requirement of the struct, otherwise there will be padding bytes
> > > that invalidate the calculation. On 64-bit platforms, this generally works
> > > fine because a pointer has 8-byte alignment and none of the structs make 
> > > use
> > > of more than that. On 32-bit platforms, it also currently works fine 
> > > because
> > > there are no 64-bit integers in the gimple structs. There are 32-bit
> > > platforms (e.g. sparc) on which a pointer has 4-byte alignment and a
> > > uint64_t has 8-byte alignment. On such platforms, adding a uint64_t to the
> > > gimple structs (as will take place when location_t is changed to be 
> > > 64-bit)
> > > causes gimple_ops_offset_ to be 4 bytes too large.
> > >
> > > It would be nice to use offsetof() to compute the offset exactly, but
> > > offsetof() is not guaranteed to work for these types, because they use
> > > inheritance and so are not standard layout types. This patch attempts to
> > > detect the presence of tail padding by detecting when such padding is 
> > > reused
> > > by inheritance; the padding should generally be reused for the same reason
> > > that offsetof() is not available, namely that all the relevant types use
> > > inheritance. One could envision systems on which this fix does not go far
> > > enough (e.g., if the ABI forbids reuse of tail padding), but it makes 
> > > things
> > > better without affecting anything that currently works.
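
(A concrete illustration of the offset problem, using a made-up struct and
the hypothetical 32-bit ABI described above, where pointers have 4-byte
alignment but uint64_t has 8-byte alignment:)

#include <stdint.h>

struct gs_like
{
  uint64_t location;   /* forces 8-byte alignment of the whole struct */
  void *ops[1];        /* starts at offset 8, ends at offset 12       */
};

/* On such an ABI, sizeof (struct gs_like) is 16 because of 4 bytes of tail
   padding, so "sizeof (struct) - sizeof (a pointer)" yields 12 rather than
   the true offset 8 of the trailing array.  */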
> > >
> > > gcc/ChangeLog:
> > >
> > > * gimple.cc (get_tail_padding_adjustment): New function.
> > > (DEFGSSTRUCT): Adjust the computation of gimple_ops_offset_ to be
> > > correct in the presence of tail padding.
> > > ---
> > >  gcc/gimple.cc | 34 +-
> > >  1 file changed, 29 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/gimple.cc b/gcc/gimple.cc
> > > index f7b313be40e..f0a642f5b51 100644
> > > --- a/gcc/gimple.cc
> > > +++ b/gcc/gimple.cc
> > > @@ -52,12 +52,36 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "ipa-modref.h"
> > >  #include "dbgcnt.h"
> > >
> > > -/* All the tuples have their operand vector (if present) at the very 
> > > bottom
> > > -   of the structure.  Therefore, the offset required to find the
> > > -   operands vector the size of the structure minus the size of the 1
> > > -   element tree array at the end (see gimple_ops).  */
> > > +/* All the tuples have their operand vector (if present) at the very 
> > > bottom of
> > > +   the structure.  Therefore, the offset required to find the operands 
> > > vector is
> > > +   the size of the structure minus the size of the 1-element tree array 
> > > at the
> > > +   end (see gimple_ops).  An adjustment may be required if there is tail
> > > +   padding, as may happen on a host (e.g. sparc) where a pointer has 
> > > 4-byte
> > > +   alignment while a uint64_t has 8-byte alignment.
> > > +
> > > +   Unfortunately, we can't use offsetof to do this computation 100%
> > > +   straightforwardly, because these structs use inheritance and so are 
> > > not
> > > +   standard layout types.  However, the fact that they are not standard 
> > > layout
> > > +   types also means that tail padding will be reused in inheritance, 
> > > which makes
> > > +   it possible to check for the problematic case with the following logic
> > > +   instead.  If tail padding is detected, the offset should be decreased
> > > +   accordingly.  */
> > > +
> > > +template
> > > +static constexpr size_t
> > > +get_tail_padding_adjustment ()
> > > +{
> > > +  struct padding_check : G
> > > +  {
> > > +tree t;
> > > +  };
> > > +  return sizeof (padding_check) == sizeof (G) ? sizeof (tree) : 0;
> > > +}
> > > +
> > >  #define DEFGSSTRUCT(SYM, STRUCT, HAS_TREE_OP) \
> > > -   (HAS_TREE_OP ? sizeof (struct STRUCT) - sizeof (tree) : 0),
> > > +  (HAS_TREE_OP \
> > > +   ? sizeof (STRUCT) - sizeof (tree) - 
> > > get_tail_padding_adjustment () \
> > > +   : 0),
> >
> > I wonder if we cannot simply use offsetof (STRUCT, ops) and some
> > "trick" to avoid
> > parsing/sanitizing this when HAS_TREE_OP is false?  Maybe even a
> >
> > template 
> > constexpr size_t ops_offset () { return 0; }
> >
> > template 
> > constexpr size_t ops_offset ()
> > {
> >   return offsetof (T, ops);
> > /* or T x; return (char *)x.ops - (char *)x; */
> > }
> >
> > ?  That is I don't like the "indirect" adjustment via computing the padding.
> >
> > Richard.
>
> Thanks, yes, I tried to start this way as wel

Re: [PATCH 16/17] testsuite: arm: Use effective-target for its.c test [PR94531]

2024-11-20 Thread Torbjorn SVENSSON




On 2024-11-19 18:51, Richard Earnshaw (lists) wrote:

On 19/11/2024 10:24, Torbjörn SVENSSON wrote:

The test case gcc.target/arm/its.c was created together with restriction
of IT blocks for Cortex-M7. As the test case fails on all tunes that
do not match Cortex-M7, explicitly test it for Cortex-M7. To have some
additional faith that GCC does the correct thing, I also added another
variant of the test for Cortex-M3 that should allow longer IT blocks.

gcc/testsuite/ChangeLog:

PR testsuite/94531
* gcc.target/arm/its.c: Removed.
* gcc.target/arm/its-1.c: Copy of gcc.target/arm/its.c. Use
effective-target arm_cpu_cortex_m7.
* gcc.target/arm/its-2.c: Copy of gcc.target/arm/its.c. Use
effective-target arm_cpu_cortex_m3.

Signed-off-by: Torbjörn SVENSSON 
---
  .../gcc.target/arm/{its.c => its-1.c} |  7 +++---
  gcc/testsuite/gcc.target/arm/its-2.c  | 24 +++
  2 files changed, 28 insertions(+), 3 deletions(-)
  rename gcc/testsuite/gcc.target/arm/{its.c => its-1.c} (67%)
  create mode 100644 gcc/testsuite/gcc.target/arm/its-2.c

diff --git a/gcc/testsuite/gcc.target/arm/its.c 
b/gcc/testsuite/gcc.target/arm/its-1.c
similarity index 67%
rename from gcc/testsuite/gcc.target/arm/its.c
rename to gcc/testsuite/gcc.target/arm/its-1.c
index f81a0df51cd..78323b89892 100644
--- a/gcc/testsuite/gcc.target/arm/its.c
+++ b/gcc/testsuite/gcc.target/arm/its-1.c
@@ -1,7 +1,8 @@
  /* { dg-do compile } */
-/* { dg-require-effective-target arm_cortex_m } */
-/* { dg-require-effective-target arm_thumb2 } */
+/* { dg-require-effective-target arm_cpu_cortex_m7_ok } */
  /* { dg-options "-O2" }  */
+/* { dg-add-options arm_cpu_cortex_m7 } */
+
  int test (int a, int b)
  {
int r;
@@ -21,4 +22,4 @@ int test (int a, int b)
  }
  /* Ensure there is no IT block with more than 2 instructions, ie. we only 
allow
 IT, ITT and ITE.  */
-/* { dg-final { scan-assembler-not "\\sit\[te\]{2}" } } */
+/* { dg-final { scan-assembler-not "\tit\[te\]{2}" } } */


You don't mention the reason for this hunk in your description.  What's the 
issue you're trying to address here?


Since I was splitting this testcase into 2 parts, and I was asked to 
prefix assembler instructions with a tab in another PR, I just aligned 
it. Do you want me to keep the whitespace check rather than an explicit 
tab check?


Kind regards,
Torbjörn



R.


diff --git a/gcc/testsuite/gcc.target/arm/its-2.c 
b/gcc/testsuite/gcc.target/arm/its-2.c
new file mode 100644
index 000..9eb3bf5ce8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/its-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_cpu_cortex_m3_ok } */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_cpu_cortex_m3 } */
+
+int test (int a, int b)
+{
+  int r;
+  if (a > 10)
+{
+  r = a - b;
+  r += 10;
+}
+  else
+{
+  r = b - a;
+  r -= 7;
+}
+  if (r > 0)
+r -= 3;
+  return r;
+}
+/* Ensure there is an IT block with at least 2 instructions.  */
+/* { dg-final { scan-assembler "\tit\[te\]{2}" } } */






Re: [PATCH] tree-optimization/117574 - bogus niter lt-to-ne

2024-11-20 Thread Richard Biener
On Fri, 15 Nov 2024, Richard Biener wrote:

> When trying to change an IV from IV0 < IV1 to IV0' != IV1' we apply
> fancy adjustments to the may_be_zero condition we compute rather
> than using the obvious IV0->base >= IV1->base expression (to be
> able to use > instead of >=?).  This doesn't seem to go well.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Can anybody think of what the adjustment by mod is about?

I have now pushed this.

Richard.

> Thanks,
> Richard.
> 
>   PR tree-optimization/117574
>   * tree-ssa-loop-niter.cc (number_of_iterations_lt_to_ne):
>   Use the obvious may_be_zero condition.
> 
>   * gcc.dg/torture/pr117574-1.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr117574-1.c | 20 +++
>  gcc/tree-ssa-loop-niter.cc| 31 +--
>  2 files changed, 27 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr117574-1.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117574-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr117574-1.c
> new file mode 100644
> index 000..2e99cec13b6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117574-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +
> +void abort (void);
> +int a, c;
> +long b;
> +short d;
> +static long e(long f, long h, long i) {
> +  for (long g = f; g <= h; g += i)
> +b += g;
> +  return b;
> +}
> +int main() {
> +  c = 1;
> +  for (; c >= 0; c--)
> +;
> +  for (; e(d + 40, d + 76, c + 51) < 4;)
> +;
> +  if (a != 0)
> +abort ();
> +}
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 9518bf969cd..1be4b552206 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -1200,17 +1200,6 @@ number_of_iterations_lt_to_ne (tree type, affine_iv 
> *iv0, affine_iv *iv1,
> if (integer_zerop (assumption))
>   return false;
>   }
> -  if (mpz_cmp (mmod, bnds->below) < 0)
> - noloop = boolean_false_node;
> -  else if (POINTER_TYPE_P (type))
> - noloop = fold_build2 (GT_EXPR, boolean_type_node,
> -   iv0->base,
> -   fold_build_pointer_plus (iv1->base, tmod));
> -  else
> - noloop = fold_build2 (GT_EXPR, boolean_type_node,
> -   iv0->base,
> -   fold_build2 (PLUS_EXPR, type1,
> -iv1->base, tmod));
>  }
>else
>  {
> @@ -1226,21 +1215,15 @@ number_of_iterations_lt_to_ne (tree type, affine_iv 
> *iv0, affine_iv *iv1,
> if (integer_zerop (assumption))
>   return false;
>   }
> -  if (mpz_cmp (mmod, bnds->below) < 0)
> - noloop = boolean_false_node;
> -  else if (POINTER_TYPE_P (type))
> - noloop = fold_build2 (GT_EXPR, boolean_type_node,
> -   fold_build_pointer_plus (iv0->base,
> -fold_build1 (NEGATE_EXPR,
> - type1, 
> tmod)),
> -   iv1->base);
> -  else
> - noloop = fold_build2 (GT_EXPR, boolean_type_node,
> -   fold_build2 (MINUS_EXPR, type1,
> -iv0->base, tmod),
> -   iv1->base);
>  }
>  
> +  /* IV0 < IV1 does not loop if IV0->base >= IV1->base.  */
> +  if (mpz_cmp (mmod, bnds->below) < 0)
> +noloop = boolean_false_node;
> +  else
> +noloop = fold_build2 (GE_EXPR, boolean_type_node,
> +   iv0->base, iv1->base);
> +
>if (!integer_nonzerop (assumption))
>  niter->assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> niter->assumptions,
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr117232-1.c etc. with Solaris as

2024-11-20 Thread Richard Biener
On Wed, Nov 20, 2024 at 11:24 AM Rainer Orth
 wrote:
>
> Two tests FAIL on Solaris/x86 with the native assembler:
>
> FAIL: gcc.target/i386/pr117232-1.c scan-assembler-times (?n)cmovn?c 7
> FAIL: gcc.target/i386/pr117232-apx-1.c scan-assembler-times (?n)cmovn?c 7
>
> The problem is that as expects a slightly different insn syntax, e.g.
>
> cmovl.nc  %esi, %eax
>
> instead of
>
> cmovnc  %esi, %eax
>
> This patch allows for both forms.
>
> Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.
>
> Ok for trunk?

OK.

RIchard.

> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-11-15  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/pr117232-1.c (scan-assembler-times): Allow for
> cmovl.nc etc.
> * gcc.target/i386/pr117232-apx-1.c: Likewise.
>


Re: [PATCH 00/15] Fix data races with sub-longword accesses on Alpha

2024-11-20 Thread John Paul Adrian Glaubitz
Hi Maciej

On Mon, 2024-11-18 at 02:59 +, Maciej W. Rozycki wrote:
>  This has come out of a discussion[1] around the removal of non-BWX Alpha 
> support from the Linux kernel due to data races affecting RCU algorithms. 
> (...)
>  Comments, questions, voices of concern or appreciation, all very welcome.

Thanks a lot for your work, much appreciated!

I gave this a try on gcc.git master with LRA enabled for both non-BWX and
BWX targets. BWX targets build fine with almost all languages enabled (I
didn't try Ada, Go and Rust), but non-BWX fails with:

during RTL pass: reload
../../../libgomp/team.c: In function 'gomp_team_start':
../../../libgomp/team.c:940:1: internal compiler error: in 
get_unaligned_address, at config/alpha/alpha.cc:1577
  940 | }
  | ^
mv -f .deps/bar.Tpo .deps/bar.Plo
none needed
checking whether /home/glaubitz/gcc.git/build/./gcc/xgcc 
-B/home/glaubitz/gcc.git/build/./gcc/ -B/usr/local/alpha-unknown-linux-gnu/bin/ 
-B/usr/local/alpha-unknown-linux-gnu/lib/ -isystem
/usr/local/alpha-unknown-linux-gnu/include -isystem 
/usr/local/alpha-unknown-linux-gnu/sys-include   -fno-checking understands -c 
and -o together... 0x1231f08d7 internal_error(char const*, ...)
../../gcc/diagnostic-global-context.cc:518
0x12319596b fancy_abort(char const*, int, char const*)
../../gcc/diagnostic.cc:1696
0x121e93827 get_unaligned_address(rtx_def*)
../../gcc/config/alpha/alpha.cc:1577
0x121e9691b alpha_expand_mov_nobwx(machine_mode, rtx_def**)
../../gcc/config/alpha/alpha.cc:2348
0x122bad4f3 gen_movqi(rtx_def*, rtx_def*)
../../gcc/config/alpha/alpha.md:4291
0x120bf3a7b rtx_insn* insn_gen_fn::operator()(rtx_def*, 
rtx_def*) const
../../gcc/recog.h:442
0x120eb25a7 emit_move_insn_1(rtx_def*, rtx_def*)
../../gcc/expr.cc:4578
0x120eb345f emit_move_insn(rtx_def*, rtx_def*)
../../gcc/expr.cc:4748
0x12134230b lra_emit_move(rtx_def*, rtx_def*)
../../gcc/lra.cc:509
0x1213674ab curr_insn_transform
../../gcc/lra-constraints.cc:4751
0x12136aa53 lra_constraints(bool)
../../gcc/lra-constraints.cc:5497
0x12134960b lra(_IO_FILE*, int)
../../gcc/lra.cc:2445
0x1212c6a5f do_reload
../../gcc/ira.cc:5977
0x1212c72d3 execute
../../gcc/ira.cc:6165
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

Do you have any plans for handling unaligned accesses for LRA as well?

Thanks,
adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 03/15] tree-cfg: Fix call to next_discriminator_for_locus()

2024-11-20 Thread Richard Biener
On Sun, Nov 3, 2024 at 11:23 PM Lewis Hyatt  wrote:
>
> While testing 64-bit location_t support, I ran into an -fcompare-debug issue
> that was traced back here. Despite the name, next_discriminator_for_locus()
> is meant to take an integer line number argument, not a location_t. There is
> one call site which has been passing a location_t instead. For the most part
> that is harmless, although in case there are two CALL stmts on the same line
> with different location_t, it may fail to generate a unique discriminator
> where it should. Once location_t is configured to be 64-bit, however, it
> produces an -fcompare-debug failure which is what I noticed. Fix it by passing
> the line number rather than the location_t.
>
> I am not aware of a testcase that demonstrates any observable wrong
> behavior, but the file debug/pr53466.C is an example where the discriminator
> assignment is indeed different before and after this change.

OK. (also for affected branches)

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-cfg.cc (assign_discriminators): Fix incorrect value passed to
> next_discriminator_for_locus().
> ---
>  gcc/tree-cfg.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index 3eede0d61cd..c2100a51a7a 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -1251,7 +1251,7 @@ assign_discriminators (void)
> }
>   /* Allocate a new discriminator for CALL stmt.  */
>   if (gimple_code (stmt) == GIMPLE_CALL)
> -   curr_discr = next_discriminator_for_locus (curr_locus);
> +   curr_discr = next_discriminator_for_locus (curr_locus_e.line);
> }
>
>gimple *last = last_nondebug_stmt (bb);


Re: [PATCH 16/17] testsuite: arm: Use effective-target for its.c test [PR94531]

2024-11-20 Thread Richard Earnshaw (lists)
On 20/11/2024 10:11, Torbjorn SVENSSON wrote:
> 
> 
> On 2024-11-19 18:51, Richard Earnshaw (lists) wrote:
>> On 19/11/2024 10:24, Torbjörn SVENSSON wrote:
>>> The test case gcc.target/arm/its.c was created together with restriction
>>> of IT blocks for Cortex-M7. As the test case fails on all tunes that
>>> do not match Cortex-M7, explicitly test it for Cortex-M7. To have some
>>> additional faith that GCC does the correct thing, I also added another
>>> variant of the test for Cortex-M3 that should allow longer IT blocks.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR testsuite/94531
>>> * gcc.target/arm/its.c: Removed.
>>> * gcc.target/arm/its-1.c: Copy of gcc.target/arm/its.c. Use
>>> effective-target arm_cpu_cortex_m7.
>>> * gcc.target/arm/its-2.c: Copy of gcc.target/arm/its.c. Use
>>> effective-target arm_cpu_cortex_m3.
>>>
>>> Signed-off-by: Torbjörn SVENSSON 
>>> ---
>>>   .../gcc.target/arm/{its.c => its-1.c} |  7 +++---
>>>   gcc/testsuite/gcc.target/arm/its-2.c  | 24 +++
>>>   2 files changed, 28 insertions(+), 3 deletions(-)
>>>   rename gcc/testsuite/gcc.target/arm/{its.c => its-1.c} (67%)
>>>   create mode 100644 gcc/testsuite/gcc.target/arm/its-2.c
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/its.c 
>>> b/gcc/testsuite/gcc.target/arm/its-1.c
>>> similarity index 67%
>>> rename from gcc/testsuite/gcc.target/arm/its.c
>>> rename to gcc/testsuite/gcc.target/arm/its-1.c
>>> index f81a0df51cd..78323b89892 100644
>>> --- a/gcc/testsuite/gcc.target/arm/its.c
>>> +++ b/gcc/testsuite/gcc.target/arm/its-1.c
>>> @@ -1,7 +1,8 @@
>>>   /* { dg-do compile } */
>>> -/* { dg-require-effective-target arm_cortex_m } */
>>> -/* { dg-require-effective-target arm_thumb2 } */
>>> +/* { dg-require-effective-target arm_cpu_cortex_m7_ok } */
>>>   /* { dg-options "-O2" }  */
>>> +/* { dg-add-options arm_cpu_cortex_m7 } */
>>> +
>>>   int test (int a, int b)
>>>   {
>>>     int r;
>>> @@ -21,4 +22,4 @@ int test (int a, int b)
>>>   }
>>>   /* Ensure there is no IT block with more than 2 instructions, ie. we only 
>>> allow
>>>  IT, ITT and ITE.  */
>>> -/* { dg-final { scan-assembler-not "\\sit\[te\]{2}" } } */
>>> +/* { dg-final { scan-assembler-not "\tit\[te\]{2}" } } */
>>
>> You don't mention the reason for this hunk in your description.  What's the 
>> issue you're trying to address here?
> 
> Since I was splitting this testcase into 2 parts, and I was asked to prefix 
> assembler instructions with a tab in another PR, I just aligned it. Do you 
> want me to keep the whitespace check rather than an explicit tab check?
> 

I'd be inclined to leave it.  The key thing here is that there is some 
whitespace before the token we're trying to match (so that we don't risk 
matching against other text in the file).  It will probably be a tab, since 
that's what GCC currently emits, but that's not a syntactic requirement.

R.

> Kind regards,
> Torbjörn
> 
>>
>> R.
>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/its-2.c 
>>> b/gcc/testsuite/gcc.target/arm/its-2.c
>>> new file mode 100644
>>> index 000..9eb3bf5ce8c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/its-2.c
>>> @@ -0,0 +1,24 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target arm_cpu_cortex_m3_ok } */
>>> +/* { dg-options "-O2" }  */
>>> +/* { dg-add-options arm_cpu_cortex_m3 } */
>>> +
>>> +int test (int a, int b)
>>> +{
>>> +  int r;
>>> +  if (a > 10)
>>> +    {
>>> +  r = a - b;
>>> +  r += 10;
>>> +    }
>>> +  else
>>> +    {
>>> +  r = b - a;
>>> +  r -= 7;
>>> +    }
>>> +  if (r > 0)
>>> +    r -= 3;
>>> +  return r;
>>> +}
>>> +/* Ensure there is an IT block with at least 2 instructions.  */
>>> +/* { dg-final { scan-assembler "\tit\[te\]{2}" } } */
>>
> 



Re: [PATCH] expr, c, gimplify, v3: Don't clear whole unions [PR116416]

2024-11-20 Thread Jakub Jelinek
On Tue, Nov 19, 2024 at 11:08:03PM +, Joseph Myers wrote:
> > --- gcc/testsuite/gcc.dg/gnu11-empty-init-1.c.jj	2024-10-15 16:14:23.411063701 +0200
> > +++ gcc/testsuite/gcc.dg/gnu11-empty-init-1.c	2024-10-15 16:31:02.302984714 +0200
> > @@ -0,0 +1,199 @@
> > +/* Test GNU C11 support for empty initializers.  */
> > +/* { dg-do run } */
> > +/* { dg-options "-std=gnu23" } */
> 
> All these gnu11-*.c tests are using -std=gnu23, which doesn't make sense.  
> If they're meant to test what GCC does in C11 mode, use -std=gnu11; if 
> they're meant to use -std=gnu23, name them gnu23-*.c.  (In either case, 
> the tests might, as now, also have -fzero-init-padding-bits= options when 
> that's part of what they're meant to test.)

Oops, sorry, good catch.
Yes, all tests were meant to use -std=gnu11.  Here is an updated patch
produced with
sed -i -e s/-std=gnu23/-std=gnu11/ gcc/testsuite/gcc.dg/gnu11-empty-init*.c
The tests still pass.  Note, -std=gnu23 instead of -std=gnu11 just clears
perhaps some more padding bits in some places, but the tests are actually
just testing when (my reading of) the C23 standard or these new options
imply the padding bits should be zero; the tests actually don't check that
those bits aren't zero otherwise, as that is UB and the memset from some
other call to -1 might not keep everything still non-zero; and testing
e.g. gimple dumps that the zeroing doesn't occur isn't bullet proof either,
as the gimplifier in various cases just for optimization purposes decides
to zero anyway.
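
To make the intent concrete (my reading of the new option, with a made-up
union; this is a sketch, not one of the new tests):

union u
{
  char c;
  long long ll;
};

union u x = { };  /* C23 empty initializer: the first member and any padding
                     bits are zeroed; with -fzero-init-padding-bits=unions
                     (or =all) the remaining bytes of the union are zeroed
                     as well.  */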

Smoke tested so far, will do full bootstrap/regtest momentarily.

2024-11-20  Jakub Jelinek  

PR c++/116416
gcc/
* flag-types.h (enum zero_init_padding_bits_kind): New type.
* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
* common.opt (fzero-init-padding-bits=): New option.
* expr.cc (categorize_ctor_elements_1): Handle
CONSTRUCTOR_ZERO_PADDING_BITS or
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL.  Fix up
*p_complete = -1; setting for unions.
(complete_ctor_at_level_p): Handle unions differently for
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
handling, return also true for UNION_TYPE with no FIELD_DECLs
and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
* doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
gcc/c/
* c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS
for flag_isoc23 empty initializers.
* c-typeck.cc (constructor_zero_padding_bits): New variable.
(struct constructor_stack): Add zero_padding_bits member.
(really_start_incremental_init): Save and clear
constructor_zero_padding_bits.
(push_init_level): Save constructor_zero_padding_bits.  Or into it
CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit.
(pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if
constructor_zero_padding_bits and restore
constructor_zero_padding_bits.
gcc/testsuite/
* gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
diagnostics.
* gcc.dg/c23-empty-init-4.c: New test.
* gcc.dg/gnu11-empty-init-1.c: New test.
* gcc.dg/gnu11-empty-init-2.c: New test.
* gcc.dg/gnu11-empty-init-3.c: New test.
* gcc.dg/gnu11-empty-init-4.c: New test.

--- gcc/flag-types.h.jj 2024-10-07 11:40:04.518038504 +0200
+++ gcc/flag-types.h2024-10-15 13:50:34.800660119 +0200
@@ -291,6 +291,13 @@ enum auto_init_type {
   AUTO_INIT_ZERO = 2
 };
 
+/* Initialization of padding bits with zeros.  */
+enum zero_init_padding_bits_kind {
+  ZERO_INIT_PADDING_BITS_STANDARD = 0,
+  ZERO_INIT_PADDING_BITS_UNIONS = 1,
+  ZERO_INIT_PADDING_BITS_ALL = 2
+};
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */
--- gcc/tree.h.jj   2024-10-07 11:40:04.521038462 +0200
+++ gcc/tree.h  2024-10-15 13:50:34.801660105 +0200
@@ -1225,6 +1225,9 @@ extern void omp_clause_range_check_faile
   (vec_safe_length (CONSTRUCTOR_ELTS (NODE)))
 #define CONSTRUCTOR_NO_CLEARING(NODE) \
   (CONSTRUCTOR_CHECK (NODE)->base.public_flag)
+/* True if even padding bits should be zeroed during initialization.  */
+#define CONSTRUCTOR_ZERO_PADDING_BITS(NODE) \
+  (CONSTRUCTOR_CHECK (NODE)->base.default_def_flag)
 
 /* Iterate through the vector V of CONSTRUCTOR_ELT elements, yielding the
value of each element (stored within VAL). IX must be a scratch variable
--- gcc/common.opt.jj   2024-10-07 11:40:04.510038616 +0200
+++ gcc/common.opt  2024-10-15 13:50:35.227654223 +0200
@@ -3505,6 +3505,22 @@ fzero-call-used-regs=
 Common RejectNegative Joined
 Clear call-used registers upon function return.
 
+fzero-init-padding-bits=
+Common Joined RejectNegative Enum(zero_init_padding_bits_kind) Var(flag_zero_init_padding_bits)

[PATCH] rtl-reader: Disable reuse_rtx support for generator building

2024-11-20 Thread Andrew Pinski
Neither reuse_rtx nor the format needed to use it is documented anywhere.
So it should not be supported for the .md files.

This also fixes the handling of an invalid index supplied to reuse_rtx:
instead of ICEing, we now emit a real error message.  Note that since this
code still uses atoi, an invalid index can still slip through in some cases,
but that is recorded as part of PR 44574.

Note I grepped the sources to make sure this is only used when reading RTL
into GCC itself rather than while reading in .md files.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* read-md.h (class rtx_reader): Don't include m_reuse_rtx_by_id
when GENERATOR_FILE is defined.
* read-rtl.cc (rtx_reader::read_rtx_code): Disable reuse_rtx
support when GENERATOR_FILE is defined.

Signed-off-by: Andrew Pinski 
---
 gcc/read-md.h   | 2 ++
 gcc/read-rtl.cc | 9 +++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/read-md.h b/gcc/read-md.h
index 9703551a8fd..e613c42b724 100644
--- a/gcc/read-md.h
+++ b/gcc/read-md.h
@@ -364,8 +364,10 @@ class rtx_reader : public md_reader
   /* Analogous to rtx_writer's m_in_call_function_usage.  */
   bool m_in_call_function_usage;
 
+#ifndef GENERATOR_FILE
   /* Support for "reuse_rtx" directives.  */
   auto_vec m_reuse_rtx_by_id;
+#endif
 };
 
 /* Global singleton; constrast with md_reader_ptr above.  */
diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc
index bfce806f9d6..630f9c59c37 100644
--- a/gcc/read-rtl.cc
+++ b/gcc/read-rtl.cc
@@ -1672,7 +1672,6 @@ rtx_reader::read_rtx_code (const char *code_name)
   struct md_name name;
   rtx return_rtx;
   int c;
-  long reuse_id = -1;
 
   /* Linked list structure for making RTXs: */
   struct rtx_list
@@ -1681,6 +1680,8 @@ rtx_reader::read_rtx_code (const char *code_name)
   rtx value;   /* Value of this node.  */
 };
 
+#ifndef GENERATOR_FILE
+  long reuse_id = -1;
   /* Handle reuse_rtx ids e.g. "(0|scratch:DI)".  */
   if (ISDIGIT (code_name[0]))
 {
@@ -1696,10 +1697,12 @@ rtx_reader::read_rtx_code (const char *code_name)
   read_name (&name);
   unsigned idx = atoi (name.string);
   /* Look it up by ID.  */
-  gcc_assert (idx < m_reuse_rtx_by_id.length ());
+  if (idx >= m_reuse_rtx_by_id.length ())
+   fatal_with_file_and_line ("invalid reuse index %u", idx);
   return_rtx = m_reuse_rtx_by_id[idx];
   return return_rtx;
 }
+#endif
 
   /* Handle "const_double_zero".  */
   if (strcmp (code_name, "const_double_zero") == 0)
@@ -1727,12 +1730,14 @@ rtx_reader::read_rtx_code (const char *code_name)
   memset (return_rtx, 0, RTX_CODE_SIZE (code));
   PUT_CODE (return_rtx, code);
 
+#ifndef GENERATOR_FILE
   if (reuse_id != -1)
 {
   /* Store away for later reuse.  */
   m_reuse_rtx_by_id.safe_grow_cleared (reuse_id + 1, true);
   m_reuse_rtx_by_id[reuse_id] = return_rtx;
 }
+#endif
 
   /* Check for flags. */
   read_flags (return_rtx);
-- 
2.43.0



[PATCH v1 3/3] RISC-V: Refine the rtl dump expand check for vector SAT_ADD

2024-11-20 Thread pan2 . li
From: Pan Li 

This patch first removes the unnecessary options from the vector
SAT_ADD testcases.  Different optimization options such as -O2 and
-O3 are then passed to the test files for the RTL expand dump check.
If the dump match counts differ between optimization options, the
no-opts and/or any-opts target selectors are used in the dg-final
check.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
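
For reference, a dg-final along the lines described above might look like
this (illustrative only; the IFN name, dump name and counts are not taken
from the actual tests):

/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" { target { any-opts "-O3" } } } } */
/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 1 "expand" { target { no-opts "-O3" } } } } */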

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-1-s16.c: Remove
the unnecessary option and refine the rtl IFN dump check.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-1-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-1-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-1-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-2-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-2-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-2-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-2-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-3-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-3-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-3-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-3-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-4-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-4-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-4-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-4-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-1-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-1-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-1-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-1-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-2-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-2-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-2-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-2-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-3-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-3-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-3-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-3-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-4-s16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-4-s32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-4-s64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add-run-4-s8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-6-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-7-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_add-7-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u

Re: [PATCH v3 2/7] Add expand_promote_outgoing_argument

2024-11-20 Thread Richard Biener
On Sun, Nov 10, 2024 at 1:55 PM H.J. Lu  wrote:
>
> Since the C/C++/Ada frontends no longer promote integer arguments smaller
> than int, add expand_promote_outgoing_argument to promote them when expanding
> builtin functions.
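
(A rough sketch of what such a helper might do; it is not the actual patch,
and it ignores targetm.calls.promote_prototypes and any target-specific
promotion rules:)

static rtx
expand_promote_outgoing_argument_sketch (tree arg)
{
  rtx op = expand_normal (arg);
  tree type = TREE_TYPE (arg);
  /* Promote integral arguments narrower than int to int's mode, extending
     according to the argument's signedness.  */
  if (INTEGRAL_TYPE_P (type)
      && TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node))
    op = convert_to_mode (TYPE_MODE (integer_type_node), op,
			  TYPE_UNSIGNED (type));
  return op;
}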

I wonder if we should instead handle this in the generic builtin expansion
code?  Otherwise we'd need to fix all targets similarly.

Richard.

> PR middle-end/14907
> * expr.cc (expand_promote_outgoing_argument): New function.
> * expr.h (expand_promote_outgoing_argument): New prototype.
> * config/i386/i386-expand.cc (ix86_expand_binop_builtin): Call
> expand_promote_outgoing_argument to expand the outgoing
> argument.
> (ix86_expand_multi_arg_builtin): Likewise.
> (ix86_expand_unop_vec_merge_builtin): Likewise.
> (ix86_expand_sse_compare): Likewise.
> (ix86_expand_sse_comi): Likewise.
> (ix86_expand_sse_round): Likewise.
> (ix86_expand_sse_round_vec_pack_sfix): Likewise.
> (ix86_expand_sse_ptest): Likewise.
> (ix86_expand_sse_pcmpestr): Likewise.
> (ix86_expand_sse_pcmpistr): Likewise.
> (ix86_expand_args_builtin): Likewise.
> (ix86_expand_sse_comi_round): Likewise.
> (ix86_expand_round_builtin): Likewise.
> (ix86_expand_special_args_builtin): Likewise.
> (ix86_expand_vec_init_builtin): Likewise.
> (ix86_expand_vec_ext_builtin): Likewise.
> (ix86_expand_builtin): Likewise.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/config/i386/i386-expand.cc | 244 -
>  gcc/expr.cc|  18 +++
>  gcc/expr.h |   1 +
>  3 files changed, 141 insertions(+), 122 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 5c4a8e07d62..ce887d96f6a 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -10415,8 +10415,8 @@ ix86_expand_binop_builtin (enum insn_code icode, tree 
> exp, rtx target)
>rtx pat;
>tree arg0 = CALL_EXPR_ARG (exp, 0);
>tree arg1 = CALL_EXPR_ARG (exp, 1);
> -  rtx op0 = expand_normal (arg0);
> -  rtx op1 = expand_normal (arg1);
> +  rtx op0 = expand_promote_outgoing_argument (arg0);
> +  rtx op1 = expand_promote_outgoing_argument (arg1);
>machine_mode tmode = insn_data[icode].operand[0].mode;
>machine_mode mode0 = insn_data[icode].operand[1].mode;
>machine_mode mode1 = insn_data[icode].operand[2].mode;
> @@ -10564,7 +10564,7 @@ ix86_expand_multi_arg_builtin (enum insn_code icode, 
> tree exp, rtx target,
>for (i = 0; i < nargs; i++)
>  {
>tree arg = CALL_EXPR_ARG (exp, i);
> -  rtx op = expand_normal (arg);
> +  rtx op = expand_promote_outgoing_argument (arg);
>int adjust = (comparison_p) ? 1 : 0;
>machine_mode mode = insn_data[icode].operand[i+adjust+1].mode;
>
> @@ -10691,7 +10691,7 @@ ix86_expand_unop_vec_merge_builtin (enum insn_code 
> icode, tree exp,
>  {
>rtx pat;
>tree arg0 = CALL_EXPR_ARG (exp, 0);
> -  rtx op1, op0 = expand_normal (arg0);
> +  rtx op1, op0 = expand_promote_outgoing_argument (arg0);
>machine_mode tmode = insn_data[icode].operand[0].mode;
>machine_mode mode0 = insn_data[icode].operand[1].mode;
>
> @@ -10727,8 +10727,8 @@ ix86_expand_sse_compare (const struct 
> builtin_description *d,
>rtx pat;
>tree arg0 = CALL_EXPR_ARG (exp, 0);
>tree arg1 = CALL_EXPR_ARG (exp, 1);
> -  rtx op0 = expand_normal (arg0);
> -  rtx op1 = expand_normal (arg1);
> +  rtx op0 = expand_promote_outgoing_argument (arg0);
> +  rtx op1 = expand_promote_outgoing_argument (arg1);
>rtx op2;
>machine_mode tmode = insn_data[d->icode].operand[0].mode;
>machine_mode mode0 = insn_data[d->icode].operand[1].mode;
> @@ -10823,8 +10823,8 @@ ix86_expand_sse_comi (const struct 
> builtin_description *d, tree exp,
>rtx pat, set_dst;
>tree arg0 = CALL_EXPR_ARG (exp, 0);
>tree arg1 = CALL_EXPR_ARG (exp, 1);
> -  rtx op0 = expand_normal (arg0);
> -  rtx op1 = expand_normal (arg1);
> +  rtx op0 = expand_promote_outgoing_argument (arg0);
> +  rtx op1 = expand_promote_outgoing_argument (arg1);
>enum insn_code icode = d->icode;
>const struct insn_data_d *insn_p = &insn_data[icode];
>machine_mode mode0 = insn_p->operand[0].mode;
> @@ -10916,7 +10916,7 @@ ix86_expand_sse_round (const struct 
> builtin_description *d, tree exp,
>  {
>rtx pat;
>tree arg0 = CALL_EXPR_ARG (exp, 0);
> -  rtx op1, op0 = expand_normal (arg0);
> +  rtx op1, op0 = expand_promote_outgoing_argument (arg0);
>machine_mode tmode = insn_data[d->icode].operand[0].mode;
>machine_mode mode0 = insn_data[d->icode].operand[1].mode;
>
> @@ -10948,8 +10948,8 @@ ix86_expand_sse_round_vec_pack_sfix (const struct 
> builtin_description *d,
>rtx pat;
>tree arg0 = CALL_EXPR_ARG (exp, 0);
>tree arg1 = CALL_EXPR_ARG (exp, 1);
> -  rtx op0 = expand_normal (arg0);
> -  rtx op1 = expand_

Re: [PATCH v3 6/7] scev-cast.c: Adjusted

2024-11-20 Thread Richard Biener
On Sun, Nov 10, 2024 at 1:56 PM H.J. Lu  wrote:
>
> Since the C frontend no longer promotes char arguments, adjust scev-cast.c.

I wonder whether the adjusted testcase would pass now already for
!PROMOTE_PROTOTYPE
targets and thus whether the { target i?86-*-* x86_64-*-* } is still
necessary after the change?

> PR middle-end/14907
> * gcc.dg/tree-ssa/scev-cast.c: Adjusted.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> index c569523ffa7..1a3c150a884 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-cast.c
> @@ -22,6 +22,6 @@ void tst(void)
>  blau ((unsigned char) i);
>  }
>
> -/* { dg-final { scan-tree-dump-times "& 255" 1 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "= \\(signed char\\)" 1 "optimized" } } 
> */
> +/* { dg-final { scan-tree-dump-times "= \\(unsigned char\\)" 2 "optimized" } 
> } */
> +/* { dg-final { scan-tree-dump-times "= \\(signed char\\)" 3 "optimized" } } 
> */
>
> --
> 2.47.0
>


[to-be-committed][RISC-V][PR target/116590] Avoid emitting multiple instructions from fmacc patterns

2024-11-20 Thread Jeff Law
So much like my patch from last week, this removes alternatives that 
create multiple instructions that we really should have never needed.


In this case it fixes one of two bugs in pr116590.  In particular we 
don't want vmvNr instructions for thead-vector.  Those instructions were 
emitted as part of those two instruction sequences.



I've tested this in my tester and assuming the pre-commit tester is 
happy, I'll push it to the trunk.


Jeff

PR target/116590
gcc
* config/riscv/vector.md (pred_mul_mode_undef): Drop
unnecessary alternatives.
(pred_): Likewise.
(pred_): Likewise.
(pred__scalar): Likewise.
(pred__scalar): Likewise.
(pred_mul_neg__undef): Likewise.
(pred_): Likewise.
(pred_): Likewise.
(pred__scalar): Likewise.
(pred__scalar): Likewise.

gcc/testsuite
* gcc.target/riscv/pr116590.c: New test.

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 898cda847cb..02cbd2f56f1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -6393,62 +6393,58 @@ (define_expand "@pred_mul_"
 })
 
 (define_insn "*pred_mul__undef"
-  [(set (match_operand:V_VLSF 0 "register_operand"   "=vd,vd,?&vd, vr, 
vr,?&vr")
+  [(set (match_operand:V_VLSF 0 "register_operand"   "=vd,vd, vr, vr")
(if_then_else:V_VLSF
  (unspec:
-   [(match_operand: 1 "vector_mask_operand" " vm,vm,  vm,Wc1,Wc1, 
Wc1")
-(match_operand 6 "vector_length_operand"" rK,rK,  rK, rK, rK,  
rK")
-(match_operand 7 "const_int_operand""  i, i,   i,  i,  i,  
 i")
-(match_operand 8 "const_int_operand""  i, i,   i,  i,  i,  
 i")
-(match_operand 9 "const_int_operand""  i, i,   i,  i,  i,  
 i")
-(match_operand 10 "const_int_operand"   "  i, i,   i,  i,  i,  
 i")
+   [(match_operand: 1 "vector_mask_operand" " vm,vm,Wc1,Wc1")
+(match_operand 6 "vector_length_operand"" rK,rK, rK, rK")
+(match_operand 7 "const_int_operand""  i, i,  i,  i")
+(match_operand 8 "const_int_operand""  i, i,  i,  i")
+(match_operand 9 "const_int_operand""  i, i,  i,  i")
+(match_operand 10 "const_int_operand"   "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:V_VLSF
(mult:V_VLSF
- (match_operand:V_VLSF 3 "register_operand" "  0,vr,  vr,  0, 
vr,  vr")
- (match_operand:V_VLSF 4 "register_operand" " vr,vr,  vr, vr, 
vr,  vr"))
-   (match_operand:V_VLSF 5 "register_operand"   " vr, 0,  vr, vr,  
0,  vr"))
+ (match_operand:V_VLSF 3 "register_operand" "  0,vr,  0, vr")
+ (match_operand:V_VLSF 4 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:V_VLSF 5 "register_operand"   " vr, 0, vr,  0"))
  (match_operand:V_VLSF 2 "vector_undef_operand")))]
   "TARGET_VECTOR"
   "@
vf.vv\t%0,%4,%5%p1
vf.vv\t%0,%3,%4%p1
-   vmv%m3r.v\t%0,%3\;vf.vv\t%0,%4,%5%p1
vf.vv\t%0,%4,%5%p1
-   vf.vv\t%0,%3,%4%p1
-   vmv%m3r.v\t%0,%3\;vf.vv\t%0,%4,%5%p1"
+   vf.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vfmuladd")
(set_attr "mode" "")
(set (attr "frm_mode")
(symbol_ref "riscv_vector::get_frm_mode (operands[10])"))])
 
 (define_insn "*pred_"
-  [(set (match_operand:V_VLSF 0 "register_operand"   "=vd, ?&vd, vr, 
?&vr")
+  [(set (match_operand:V_VLSF 0 "register_operand"   "=vd, vr")
(if_then_else:V_VLSF
  (unspec:
-   [(match_operand: 1 "vector_mask_operand" " vm,   vm,Wc1,  Wc1")
-(match_operand 5 "vector_length_operand"" rK,   rK, rK,   rK")
-(match_operand 6 "const_int_operand""  i,i,  i,i")
-(match_operand 7 "const_int_operand""  i,i,  i,i")
-(match_operand 8 "const_int_operand""  i,i,  i,i")
-(match_operand 9 "const_int_operand""  i,i,  i,i")
+   [(match_operand: 1 "vector_mask_operand" " vm,Wc1")
+(match_operand 5 "vector_length_operand"" rK, rK")
+(match_operand 6 "const_int_operand""  i,  i")
+(match_operand 7 "const_int_operand""  i,  i")
+(match_operand 8 "const_int_operand""  i,  i")
+(match_operand 9 "const_int_operand""  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:V_VLSF
(mult:V_VLSF
- (match_operand:V_VLSF 2 "register_operand" "  0,   vr,  0,   
vr")
- (match_operand:V_VLSF 3 "register_operand" " vr,   vr, vr,   
vr"))
-   (match_operand:V_VLSF 4 "register_operand"

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-20 Thread Qing Zhao


> On Nov 19, 2024, at 12:30, Martin Uecker  wrote:
> 
> Am Dienstag, dem 19.11.2024 um 09:18 -0800 schrieb Kees Cook:
>> On Tue, Nov 19, 2024 at 05:41:13PM +0100, Martin Uecker wrote:
>>> Am Dienstag, dem 19.11.2024 um 10:47 -0500 schrieb Marek Polacek:
 On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao:
>> Hi,
>> 
>> I am working on extending “counted_by” attribute to pointers inside a 
>> structure per our previous discussion. 
>> 
>> I need advice on the following question:
>> 
>> Should -fsanitize=bounds support an array reference that is accessed
>> through a pointer that has the counted_by attribute?
 
 I don't see why it couldn't, perhaps as part of -fsanitize=bounds-strict.
 Someone has to implement it, though.
>>> 
>>> I think Qing was volunteering to do this.  My point was that
>>> this would not necessarily be undefined behavior, but instead
>>> could trap for possibly defined behavior.  I would not mind, but
>>> I point out that in the past people insisted that the sanitizers
>>> are only intended to screen for undefined behavior.
>> 
>> I think it's a mistake to confuse the sanitizers with only addressing
>> "undefined behavior". The UB sanitizers are just a subset of the
>> sanitizers in general, and I think UB is not a good category for how
>> to group the behaviors.
>> 
>> For the Linux kernel, we want robustness. UB leads to ambiguity, so
>> we're quite interested in getting rid of UB, but the bounds sanitizer is
>> expected to implement bounds checking, regardless of UB-ness.
>> 
>> I would absolutely want -fsanitize=bounds to check the construct Qing
>> mentioned.
>> 
>> Another aspect I want to capture for Linux is _pointer_ bounds, so that
>> this would be caught:
>> 
>> #include <stdlib.h>
>> 
>> struct annotated {
>>  int b;
>>  int *c __attribute__ ((counted_by (b)));
>> } *p_array_annotated;
>> 
>> void __attribute__((__noinline__)) setup (int annotated_count)
>> {
>>  p_array_annotated
>>= (struct annotated *)malloc (sizeof (struct annotated));
>>  p_array_annotated->c = (int *) malloc (annotated_count *  sizeof (int));
>>  p_array_annotated->b = annotated_count;
>> 
>>  return;
>> }
>> 
>> int main(int argc, char *argv[])
>> {
>>  int i;
>>  int *c;
>> 
>>  setup (10);
>>  c = p_array_annotated->c;
>>  for (i = 0; i < 11; i++)
>>*c++ = 2; // goes boom at i == 10
>>  return 0;
>> }
>> 
>> This may be a separate sanitizer, and it may require a totally different
>> set of internal tracking, but being able to discover that we've run off
>> the end of an allocation is needed.
>> 
>> Of course, the biggest deal is that
>> __builtin_dynamic_object_size(p_array_annotated->c, 1) will return
>> 10 * sizeof(*p_array_annotated->c)
> 
> I want this too.

Good news:
__builtin_dynamic_object_size for a pointer field with the “counted_by” attribute
already works in my private workspace, with a minimal adjustment to
the current implementation. :-)
This part is straightforward.
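
For concreteness, a minimal sketch of the kind of case this covers, modeled
on Kees's example above (it assumes the proposed counted_by-on-pointer
support; the values are illustrative):

#include <stdio.h>
#include <stdlib.h>

struct annotated {
  int b;
  int *c __attribute__ ((counted_by (b)));
} *p_array_annotated;

int
main (void)
{
  p_array_annotated
    = (struct annotated *) malloc (sizeof (struct annotated));
  p_array_annotated->c = (int *) malloc (10 * sizeof (int));
  p_array_annotated->b = 10;

  /* With counted_by honored for the pointer field, mode 1 should yield
     10 * sizeof (int) instead of (size_t) -1 (unknown).  */
  printf ("%zu\n", __builtin_dynamic_object_size (p_array_annotated->c, 1));
  return 0;
}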


> The plan preliminarily discussed in WG14 is to
> have a proper language extension for this, tentatively:
> 
> struct foo {
>  int n;
>  char (*p)[.n];
> };
> 
> (details to change, the syntax is what I would like to have)

This is nice!  Hopefully this could be included in the C standard in the near
future.

Then, similarly, treating the following:

struct foo {
  int n;
  char *p __attribute__ ((counted_by (n)));
};

as an array with the upper bound being “n” is reasonable.
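
A minimal sketch of the access such a check would then flag, assuming
-fsanitize=bounds is taught to honor counted_by on pointer members as
discussed (the values are illustrative):

#include <stdlib.h>

struct foo {
  int n;
  char *p __attribute__ ((counted_by (n)));
};

int
main (void)
{
  struct foo f;
  f.n = 4;
  f.p = (char *) malloc (4);
  f.p[4] = 'x';  /* One past the counted_by (n) bound; expected to be
                    diagnosed (or to trap with -fsanitize-trap=bounds).  */
  free (f.p);
  return 0;
}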

Thanks.

Qing

> 
> 
> 
>>> 
 
> I think the question is what -fsanitize=bounds is meant to be.
> 
> I am a bit frustrated about the sanitizer.  On the
> one hand, it is not doing enough to get spatial memory
> safety even where this would be easily possible, on the
> other hand, it is pedantic about things which are technically
> UB but not problematic, and then one is prevented from
> using it.
> 
> When used in default mode, where execution continues, it
> also does not mix well with many warnings, creates more code,
> and pulls in a library dependency (and the library also depends
> on upstream choices / progress which seems a limitation for
> extensions).
> 
> What IMHO would be ideal is a protection mode for spatial
> memory safety that simply adds traps (which then requires
> no library, has no issues with other warnings, and could
> evolve independently from clang) 
> 
> So shouldn't we just add a -fboundscheck (which would 
> be like -fsanitize=bounds -fsanitize-trap=bounds just with
> more checking) and make it really good? I think many people
> would be very happy about this.
 
 That's a separate concern.  We already have the -fbounds-check option,
 currently only used in Fortran (and D?), so perhaps we could make
 that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
>>> 
>>> I think 

Re: [PATCH 11/17] testsuite: arm: Use effective-target for pr56184.C and pr59985.C

2024-11-20 Thread Richard Earnshaw
On 20/11/2024 10:49, Richard Earnshaw (lists) wrote:
> On 20/11/2024 07:58, Torbjorn SVENSSON wrote:
>>
>>
>> On 11/19/24 18:08, Richard Earnshaw (lists) wrote:
>>> On 19/11/2024 10:24, Torbjörn SVENSSON wrote:
 Update test cases to use -mcpu=unset/-march=unset feature introduced in
 r15-3606-g7d6c6a0d15c.

 gcc/testsuite/ChangeLog:

 * g++.dg/other/pr56184.C: Use effective-target
 arm_arch_v7a_neon and arm_arch_v7a_thumb.
 * g++.dg/other/pr59985.C: Use effective-target
 arm_arch_v7a_neon and arm_arch_v7a_arm.
 * lib/target-supports.exp: Define effective-target
 arm_arch_v7a_thumb.

 Signed-off-by: Torbjörn SVENSSON 
 ---
   gcc/testsuite/g++.dg/other/pr56184.C  | 8 ++--
   gcc/testsuite/g++.dg/other/pr59985.C  | 7 +--
   gcc/testsuite/lib/target-supports.exp | 1 +
   3 files changed, 12 insertions(+), 4 deletions(-)

 diff --git a/gcc/testsuite/g++.dg/other/pr56184.C b/gcc/testsuite/g++.dg/other/pr56184.C
 index dc949283c98..f4a4300c385 100644
 --- a/gcc/testsuite/g++.dg/other/pr56184.C
 +++ b/gcc/testsuite/g++.dg/other/pr56184.C
 @@ -1,6 +1,10 @@
   /* { dg-do compile { target arm*-*-* } } */
 -/* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 -/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon -mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors -Wno-return-type" } */
 +/* { dg-require-effective-target arm_arch_v7a_neon_ok } */
 +/* { dg-require-effective-target arm_arch_v7a_thumb_ok } */
 +/* { dg-options "-fno-short-enums -O2 -fno-section-anchors -Wno-return-type" } */
 +/* { dg-add-options arm_arch_v7a_neon } */
 +/* { dg-additional-options "-mthumb -mtune=cortex-a9" } */
 +
   
>>>
>>> I'd add a new entry for v7a_neon_thumb for this, then we only need one 
>>> dg-r-e-t rule here.
>>>
   typedef unsigned int size_t;
   __extension__ typedef int __intptr_t;
 diff --git a/gcc/testsuite/g++.dg/other/pr59985.C b/gcc/testsuite/g++.dg/other/pr59985.C
 index 7c9bfab35f1..a0f5e184b43 100644
 --- a/gcc/testsuite/g++.dg/other/pr59985.C
 +++ b/gcc/testsuite/g++.dg/other/pr59985.C
 @@ -1,7 +1,10 @@
   /* { dg-do compile { target arm*-*-* } } */
 -/* { dg-skip-if "incompatible options" { arm_thumb1 } } */
 -/* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 
 -mfpu=vfpv3-d16 -mfloat-abi=hard" } */
   /* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" 
 } } */
 +/* { dg-require-effective-target arm_arch_v7a_arm_ok } */
 +/* { dg-require-effective-target arm_arch_v7a_neon_ok } */
 +/* { dg-options "-g -fcompare-debug -O2" } */
 +/* { dg-add-options arm_arch_v7a_neon } */
 +/* { dg-additional-options "-marm -mtune=cortex-a9 -mfloat-abi=hard 
 -mfpu=vfpv3-d16" } */
>>>
>>> I don't follow this change, the original test never looks at neon, nor 
>>> needs it AFAICT.
>>
>> I am trying to use the existing effective-targets to verify that -marm and
>> -mfloat-abi=hard are supported for the armv7-a target.
>> Would you like me to define an arm_arch_v7a_arm_hard effective-target and
>> override with -mfpu=vfpv3-d16, or do you want a dedicated effective-target
>> that also contains the -mfpu=vfpv3-d16 in the check?
> 
> My goal is to get rid of -mfpu (other than auto) everywhere in the testsuite. 
>  The only exception would be for some specific backwards compatibility tests, 
> which we can then know are safe to remove if/when -mfpu is obsoleted entirely.
> 
> I'm not expecting that to happen overnight, but the first step is no new uses 
> of the old command-line interface and fixing up existing uses as we need to 
> make changes like this.
> 

Sorry, that didn't really answer your main question.  I'd define a new arch 
armv7a_fp_hard, with the architecture set to "armv7a+fp" and the FPU set to 
auto.
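
For concreteness, with such an entry in place the pr59985.C header could
then be reduced to something like the following (the names
arm_arch_v7a_fp_hard_ok / arm_arch_v7a_fp_hard are assumptions, following
the existing arm_arch_* naming convention in target-supports.exp):

/* { dg-do compile { target arm*-*-* } } */
/* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
/* { dg-require-effective-target arm_arch_v7a_fp_hard_ok } */
/* { dg-options "-g -fcompare-debug -O2" } */
/* { dg-add-options arm_arch_v7a_fp_hard } */
/* { dg-additional-options "-marm -mtune=cortex-a9" } */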

R.

> R.
> 
>>
>> Kind regards,
>> Torbjörn
>>
>>>
     extern void *f1 (unsigned long, unsigned long);
   extern const struct line_map *f2 (void *, int, unsigned int, const char 
 *, unsigned int);
 diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
 index 30e453a578a..6241c00a752 100644
 --- a/gcc/testsuite/lib/target-supports.exp
 +++ b/gcc/testsuite/lib/target-supports.exp
 @@ -5778,6 +5778,7 @@ foreach { armfunc armflag armdefs } {
   v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
   v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
   v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" 
 "__ARM_ARCH_7A__ && __ARM_NEON__"
 +    v7a_thumb "-march=armv7-a+fp -mthumb" "__ARM_ARCH_7A__ && __thumb__"
>>>
>>> I think you want -mfpu=auto here as well.
>>>
   v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
   v7m "
