Re: [PATCH] x86-64: Don't use temp for argument in a TImode register

2024-09-08 Thread Uros Bizjak
On Fri, Sep 6, 2024 at 2:24 PM H.J. Lu  wrote:
>
> Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
> in a TImode register.  Otherwise, the TImode variable will be put in
> the GPR save area which guarantees only 8-byte alignment.
>
> gcc/
>
> PR target/116621
> * config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
> a PARALLEL BLKmode container of an EXPR_LIST expression in a
> TImode register.
>
> gcc/testsuite/
>
> PR target/116621
> * gcc.target/i386/pr116621.c: New test.

LGTM.

Thanks,
Uros.

>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/config/i386/i386.cc  | 22 ++--
>  gcc/testsuite/gcc.target/i386/pr116621.c | 43 
>  2 files changed, 63 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116621.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 707b75a6d5d..45320124b91 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -4908,13 +4908,31 @@ ix86_gimplify_va_arg (tree valist, tree type, 
> gimple_seq *pre_p,
>
>examine_argument (nat_mode, type, 0, &needed_intregs, &needed_sseregs);
>
> -  need_temp = (!REG_P (container)
> +  bool container_in_reg = false;
> +  if (REG_P (container))
> +   container_in_reg = true;
> +  else if (GET_CODE (container) == PARALLEL
> +  && GET_MODE (container) == BLKmode
> +  && XVECLEN (container, 0) == 1)
> +   {
> + /* Check if it is a PARALLEL BLKmode container of an EXPR_LIST
> +expression in a TImode register.  In this case, temp isn't
> +needed.  Otherwise, the TImode variable will be put in the
> +GPR save area which guarantees only 8-byte alignment.   */
> + rtx x = XVECEXP (container, 0, 0);
> + if (GET_CODE (x) == EXPR_LIST
> + && REG_P (XEXP (x, 0))
> + && XEXP (x, 1) == const0_rtx)
> +   container_in_reg = true;
> +   }
> +
> +  need_temp = (!container_in_reg
>&& ((needed_intregs && TYPE_ALIGN (type) > 64)
>|| TYPE_ALIGN (type) > 128));
>
>/* In case we are passing structure, verify that it is consecutive 
> block
>   on the register save area.  If not we need to do moves.  */
> -  if (!need_temp && !REG_P (container))
> +  if (!need_temp && !container_in_reg)
> {
>   /* Verify that all registers are strictly consecutive  */
>   if (SSE_REGNO_P (REGNO (XEXP (XVECEXP (container, 0, 0), 0
> diff --git a/gcc/testsuite/gcc.target/i386/pr116621.c 
> b/gcc/testsuite/gcc.target/i386/pr116621.c
> new file mode 100644
> index 000..704266458a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116621.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> > +#include <stdarg.h>
> > +#include <string.h>
> +
> +union S8302
> +{
> +  union
> +  {
> +double b;
> +int c;
> +  } a;
> +  long double d;
> +  unsigned short int f[5];
> +};
> +
> +union S8302 s8302;
> +extern void check8302va (int i, ...);
> +
> +int
> +main (void)
> +{
> +  memset (&s8302, '\0', sizeof (s8302));
> +  s8302.a.b = -221438.25;
> +  check8302va (1, s8302);
> +  return 0;
> +}
> +
> +__attribute__((noinline, noclone))
> +void
> +check8302va (int z, ...)
> +{
> +  union S8302 arg, *p;
> +  va_list ap;
> +
> +  __builtin_va_start (ap, z);
> +  p = &s8302;
> +  arg = __builtin_va_arg (ap, union S8302);
> +  if (p->a.b != arg.a.b)
> +__builtin_abort ();
> +  __builtin_va_end (ap);
> +}
> --
> 2.46.0
>


Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-08 Thread Bill Wendling
On Fri, Sep 6, 2024 at 10:50 PM Martin Uecker  wrote:
>
> Am Freitag, dem 06.09.2024 um 13:59 -0700 schrieb Bill Wendling:
> > On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker  wrote:
> > >
> > > Am Freitag, dem 06.09.2024 um 13:59 + schrieb Qing Zhao:
> > > >
> > > > > On Sep 5, 2024, at 18:22, Bill Wendling  wrote:
> > > > >
> > > > > Hi Qing,
> > > > >
> > > > > Sorry for my late reply.
> > > > >
> > > > > On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao  
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for the information.
> > > > > >
> > > > > > Yes, providing a unary operator similar as __counted_by(PTR) as 
> > > > > > suggested by multiple people previously is a cleaner approach.
> > > > > >
> > > > > > Then the programmer will use the following:
> > > > > >
> > > > > > __builtin_choose_expr(
> > > > > > __builtin_has_attribute (__p->FAM, "counted_by"),
> > > > > > __builtin_get_counted_by(__p->FAM) = COUNT, 0);
> > > > > >
> > > > > > From the programmer’s point of view, it’s cleaner too.
> > > > > >
> > > > > > However, there is one issue with “__builtin_choose_expr” currently 
> > > > > > in GCC; its documentation explicitly mentions this limitation:  
> > > > > > (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
> > > > > >
> > > > > > "Note: This construct is only available for C. Furthermore, the 
> > > > > > unused expression (exp1 or exp2 depending on the value of 
> > > > > > const_exp) may still generate syntax errors. This may change in 
> > > > > > future revisions.”
> > > > > >
> > > > > > So, due to this limitation, when there is no counted_by attribute, 
> > > > > > __builtin_get_counted_by() is still evaluated by the compiler, an 
> > > > > > error is issued, and the compilation stops, as shown by this small 
> > > > > > test case:
> > > > > >
> > > > > > [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
> > > > > >
> > > > > > struct flex {
> > > > > >  unsigned int b;
> > > > > >  int c[];
> > > > > > } *array_flex;
> > > > > >
> > > > > > #define MY_ALLOC(P, FAM, COUNT) ({ \
> > > > > >  typeof(P) __p; \
> > > > > >  unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
> > > > > >  __p = (typeof(P)) __builtin_malloc(__size); \
> > > > > >  __builtin_choose_expr( \
> > > > > >__builtin_has_attribute (__p->FAM, counted_by), \
> > > > > >__builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
> > > > > >  P = __p; \
> > > > > > })
> > > > > >
> > > > > > int main(int argc, char *argv[])
> > > > > > {
> > > > > >  MY_ALLOC(array_flex, c, 20);
> > > > > >  return 0;
> > > > > > }
> > > > > > [opc@qinzhao-ol8u3-x86 gcc]$ sh t
> > > > > > ttt.c: In function ‘main’:
> > > > > > ttt.c:13:5: error: the argument must have ‘counted_by’ attribute 
> > > > > > ‘__builtin_counted_by_ref’
> > > > > > ttt.c:19:3: note: in expansion of macro ‘MY_ALLOC’
> > > > > >
> > > > > > I checked the FE code on handling “__builtin_choose_expr”: yes, it 
> > > > > > does parse __builtin_counted_by_ref(__p->FAM) even when 
> > > > > > __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and it 
> > > > > > issues the error when parsing __builtin_counted_by_ref and stops 
> > > > > > the compilation.
> > > > > >
> > > > > > So, in order to support this approach, we first must fix the issue 
> > > > > > in the current __builtin_choose_expr in GCC. Otherwise, it’s 
> > > > > > impossible for the user to use this new builtin.
> > > > > >
> > > > > > Let me know your comments and suggestions.
> > > > > >
> > > > > Do you need to emit a diagnostic if the FAM doesn't have the
> > > > > counted_by attribute? It was originally supposed to "silently fail" if
> > > > > it didn't. We may need to do the same for Clang if so.
> > > >
> > > > Yes, “silently fail” should work around this problem if fixing the issue 
> > > > in the current __builtin_choose_expr is too complicated.
> > > >
> > > > I will study a little bit on how to fix the issue in 
> > > > __builtin_choose_expr first.
> > > >
> > > > Martin and Joseph, any comment or suggestion from you?
> > >
> > > My recommendation would be not to change __builtin_choose_expr.
> > >
> > > The design where __builtin_get_counted_by returns a null
> > > pointer constant (void*)0 seems good.  Most users will
> > > get an error, which I think is what we want, and for those
> > > who want it to work even if the attribute is not there, the
> > > following code seems perfectly acceptable to me:
> > >
> > > auto p = __builtin_get_counted_by(__p->FAM);
> > > *_Generic(p, void*: &(int){}, default: p) = 1;
> > >
> > >
> > > Kees also seemed happy with it. And if I understood it correctly,
> > > also Clang's bounds checking people can work with this.
> > >
> > The problem with this is explained in the Clang RFC [1]. Apple's team
> > rejects taking the address of the 'counter' field when using
> > -fbounds-safety. They suggested this as an alternative:
> >
>   __builtin_bounds_attr_arg(ptr->FAM) = COUNT;

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-08 Thread Bill Wendling
On Fri, Sep 6, 2024 at 5:12 PM Qing Zhao  wrote:
>
> Now, if
>
> 1. __builtin_get_counted_by should return an lvalue instead of a pointer 
> (required by Clang’s design)
> and
> 2. It’s better not to change the behavior of __builtin_choose_expr.
>
> Then the solution left is:
>
> __builtin_get_counted_by (p->FAM) returns an lvalue, p->COUNT, if p->FAM has 
> a counted_by attribute; if p->FAM does not have a counted_by attribute, it 
> silently does nothing. (Or it just issues a warning, if Linux is OK with 
> such a warning.)
>
> Is this acceptable?
>
We might have to finagle things a bit better, as the compiler will
probably be annoyed at whatever gets generated by the code when the
'counted_by' attribute isn't there. For example, do we implicitly
change it into '*(void *)0 = COUNT;' and expect that the
'__builtin_has_attribute' check was already done? Do we remove the
assignment expression entirely? If this is all happening inside
'__builtin_choose_expr' (or '_Generic'?) then it might not be too much
of a concern. But if it's outside of that builtin, trouble? I don't
rightly know the answers to those questions. I'm not a big fan of
turning an assignment expression into a no-op outside of
'__builtin_choose_expr'...

Let me think about it some more.

If all else fails and we need that warning, we could always add a
`-Wno-` to Linux. They have a long history of hating compiler
warnings, even in cases where they explicitly *ask* for the warnings,
so it's not likely to run into a NACK or anything...

> thanks.
>
> Qing
> > On Sep 6, 2024, at 16:59, Bill Wendling  wrote:
> >
> > [snip]

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-08 Thread Bill Wendling
On Fri, Sep 6, 2024 at 11:16 PM Martin Uecker  wrote:
>
> Am Samstag, dem 07.09.2024 um 00:12 + schrieb Qing Zhao:
> > Now, if
> >
> > 1. __builtin_get_counted_by should return an lvalue instead of a pointer 
> > (required by Clang’s design)
> > and
> > 2. It’s better not to change the behavior of __builtin_choose_expr.
> >
> > Then the solution left is:
> >
> > __builtin_get_counted_by (p->FAM) returns an lvalue, p->COUNT, if p->FAM 
> > has a counted_by attribute; if p->FAM does not have a counted_by 
> > attribute, it silently does nothing. (Or it just issues a warning, if 
> > Linux is OK with such a warning.)
>
> What does silently do nothing mean?
>
>  /* do nothing */ = counter;
>
> will still fail to compile.  So I guess you have something
> else in mind?
>
>
> The new _Generic selection also works if you return a
> lvalue of type void:
>
> struct foo x;
> _Generic(typeof(__counted_by(&x)), void: (int){ 0 },
> default: __counted_by(&x)) = 10;
>
> https://godbolt.org/z/41E3oj84o
>
> So why not do this then?
>
I'm for any solution that (a) works with the fewest changes needed to
other builtins, and (b) isn't super convoluted. The '_Generic' approach
looks like a potential compromise.

-bw


[PATCH] Testing: fail vectorization when not using SLP

2024-09-08 Thread Richard Biener
While I know the effect on x86-64, the following patch post is to trigger
CI for disabling non-SLP vectorization, to look for arm/aarch64 and
risc-v specific failure modes.

* tree-vect-stmts.cc (vect_analyze_stmt): Fail vectorization
for non-SLP stmts.
---
 gcc/tree-vect-stmts.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f6c5b7a7e87..cfb1066fc96 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13351,6 +13351,10 @@ vect_analyze_stmt (vec_info *vinfo,
   return opt_result::success ();
 }
 
+  if (!node)
+return opt_result::failure_at (stmt_info->stmt,
+  "needs non-SLP handling\n");
+
   ok = true;
   if (!bb_vinfo
   && (STMT_VINFO_RELEVANT_P (stmt_info)
-- 
2.43.0


Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-08 Thread Martin Uecker
Am Sonntag, dem 08.09.2024 um 02:26 -0700 schrieb Bill Wendling:
> On Sat, Sep 7, 2024 at 12:21 AM Jakub Jelinek  wrote:
> > On Sat, Sep 07, 2024 at 07:50:29AM +0200, Martin Uecker wrote:
> > > >   2. Apple's implementation supports expressions in the '__counted_by'
> > > > attribute, thus the 'count' may be an R-value that can't have its
> > > > address taken.
> > > > 
> > > > [1] 
> > > > https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/
> > > 
> > > Yes, this would be a limitation.
> > 
> > But then you really can't return an lvalue either, all you can return is an
> > rvalue in that case.  So just return a (void*)0 in that case.
> > 
> The builtin will return an r-value regardless. The idea is that if the
> counter can't be used as an l-value, we wouldn't add an implicit
> r-value -> l-value cast (an internal Clang AST node). In the cases
> where it can be used as an l-value, we add the implicit cast. I
> imagine GCC has some kind of analogue to that? The '(void *)0' case
> could be reserved only for when the builtin is used as an l-value but
> the r-value can't be cast as an l-value...

I do not know what an r-value -> l-value cast is or why Clang
needs it.  I guess it converts an r-value which came
from an l-value back to an l-value...

The issue with returning an r-value when the counter expression
is not an lvalue referring to a struct member is that the user
would also need yet another new mechanism to detect this case at
compile-time.

Martin


Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-08 Thread Martin Uecker
Am Sonntag, dem 08.09.2024 um 02:09 -0700 schrieb Bill Wendling:
> On Fri, Sep 6, 2024 at 10:50 PM Martin Uecker  wrote:
> > 
> > Am Freitag, dem 06.09.2024 um 13:59 -0700 schrieb Bill Wendling:
> > > On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker  wrote:
> > > > > > > 

...
> > > > 
> > > > My recommendation would be not to change __builtin_choose_expr.
> > > > 
> > > > The design where __builtin_get_counted_by  returns a null
> > > > pointer constant (void*)0 seems good.  Most users will
> > > > get an error which I think is what we want and for those
> > > > that want it to work even if the attribute is not there, the
> > > > following code seems perfectly acceptable to me:
> > > > 
> > > > auto p = __builtin_get_counted_by(__p->FAM)
> > > > *_Generic(p, void*: &(int){}, default: p) = 1;
> > > > 
> > > > 
> > > > Kees also seemed happy with it. And if I understood it correctly,
> > > > also Clang's bounds checking people can work with this.
> > > > 
> > > The problem with this is explained in the Clang RFC [1]. Apple's team
> > > rejects taking the address of the 'counter' field when using
> > > -fbounds-safety. They suggested this as an alternative:
> > > 
> > >   __builtin_bounds_attr_arg(ptr->FAM) = COUNT;
> > > 
> > > The __builtin_bounds_attr_arg(ptr->FAM) is replaced by an L-value to
> > > the 'ptr->count' field during SEMA, and life goes on as normal. There
> > > are a few reasons for this:
> > > 
> > >   1. They want to track the relationship between the FAM and the
> > > counter so that one isn't modified without the other being modified.
> > > Allowing the address to be taken makes that check vastly harder.
> > 
> > In the thread it is pointed out that returning a pointer works
> > too, and they would simply restrict passing the pointer elsewhere.
> > 
> It's in rapidsna's initial reply titled "Taking the address of a count
> variable is forbidden":
> 
>   
> https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/2
> 
> I didn't see her retreating from that idea.

The initial proposal already has this as "Alternative Design"
and then there is this response to my comment:

https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/27

which seems to imply that she is open to this idea.


> > I can't see "Apple's team rejects" and "vastly harder" at the
> > moment.
> > 
> > > 
> > >   2. Apple's implementation supports expressions in the '__counted_by'
> > > attribute, thus the 'count' may be an R-value that can't have its
> > > address taken.
> > > 
> > > [1] 
> > > https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/
> > 
> > Yes, this would be a limitation.
> > 
> And not one that I'm particularly excited about (allowing for (nearly)
> arbitrary expressions in the 'counted_by' attribute that is).

And I do not see how returning an r-value can work with the
intended use case for __builtin_get_counted_by() anyway, because
this would break assignments.

So I do not see any issue with returning a pointer, and (void*)0
when there is no attribute or the expression is an r-value.

Martin


[patch,reload,v2] PR116326 Introduce RELOAD_ELIMINABLE_REGS

2024-09-08 Thread Georg-Johann Lay

The reason for PR116326 is that LRA and reload require different
ELIMINABLE_REGS for a multi-register frame pointer.  As ELIMINABLE_REGS
is used to initialize static const objects, it is not possible to make
ELIMINABLE_REGS dependent on -mlra.

It was also concluded that it is not desirable to adjust reload so that
it behaves like LRA, see

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116326#c8

This patch adds a new macro RELOAD_ELIMINABLE_REGS that takes
precedence over ELIMINABLE_REGS in reload1.cc.

The patch was proposed by H.J. Lu and is only required for trunk.

Johann

--

reload1.cc: rtl-optimization/116326 - Use RELOAD_ELIMINABLE_REGS.

The new macro is required because reload and LRA are using different
representations for a multi-register frame pointer.  As ELIMINABLE_REGS
is used to initialize static const objects, it can't depend on -mlra.

PR rtl-optimization/116326
gcc/
* reload1.cc (reg_eliminate_1): Initialize from
RELOAD_ELIMINABLE_REGS if defined.
* config/avr/avr.h (RELOAD_ELIMINABLE_REGS): Copy from ELIMINABLE_REGS.
(ELIMINABLE_REGS): Don't mention sub-regnos of the frame pointer.
gcc/testsuite/
* gcc.target/avr/torture/lra-pr116324.c: New test.
* gcc.target/avr/torture/lra-pr116325.c: New test.

diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 1cf4180e534..3fa2ee76c43 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -308,12 +308,19 @@ enum reg_class {
 
 #define STATIC_CHAIN_REGNUM ((AVR_TINY) ? 18 :2)
 
-#define ELIMINABLE_REGS {	\
+#define RELOAD_ELIMINABLE_REGS {\
 { ARG_POINTER_REGNUM, STACK_POINTER_REGNUM },   \
 { ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM },   \
 { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }, \
 { FRAME_POINTER_REGNUM + 1, STACK_POINTER_REGNUM + 1 } }
 
+#define ELIMINABLE_REGS		\
+  {\
+{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM },		\
+{ ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM },		\
+{ FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }		\
+  }
+
 #define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)			\
   OFFSET = avr_initial_elimination_offset (FROM, TO)
 
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index 2e059b09970..b0ae64e10b2 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -283,7 +283,13 @@ static const struct elim_table_1
   const int to;
 } reg_eliminate_1[] =
 
+  // Reload and LRA don't agree on how a multi-register frame pointer
+  // is represented for elimination.  See avr.h for a use case.
+#ifdef RELOAD_ELIMINABLE_REGS
+  RELOAD_ELIMINABLE_REGS;
+#else
   ELIMINABLE_REGS;
+#endif
 
 #define NUM_ELIMINABLE_REGS ARRAY_SIZE (reg_eliminate_1)
 
diff --git a/gcc/testsuite/gcc.target/avr/torture/lra-pr116324.c b/gcc/testsuite/gcc.target/avr/torture/lra-pr116324.c
new file mode 100644
index 000..b54eb402d8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/lra-pr116324.c
@@ -0,0 +1,86 @@
+/* { dg-options { -std=gnu99 } } */
+
+void f7_clr (void *cc)
+{
+  __asm ("%~call __f7_clr_asm" :: "z" (cc) : "memory");
+}
+
+void* f7_copy (void *cc, const void *aa)
+{
+  extern void __f7_copy_asm (void);
+  __asm ("%~call __f7_copy_asm" :: "z" (cc), "x" (aa) : "memory");
+  return cc;
+}
+
+typedef _Bool bool;
+typedef unsigned int uint16_t;
+typedef unsigned char uint8_t;
+typedef int int16_t;
+
+typedef struct f7_t
+{
+  union
+  {
+struct
+{
+  uint8_t sign :1;
+  uint8_t reserved1 :1;
+  uint8_t is_nan :1;
+  uint8_t reserved2 :4;
+  uint8_t is_inf :1;
+};
+uint8_t flags;
+  };
+
+  uint8_t mant[7];
+  int16_t expo;
+} f7_t;
+
+
+static inline __attribute__((__always_inline__))
+void __f7_clr (f7_t *cc)
+{
+  extern void __f7_clr_asm (void);
+  __asm ("%~call %x[f]"
+  :
+  : [f] "i" (__f7_clr_asm), "z" (cc)
+  : "memory");
+}
+
+static inline __attribute__((__always_inline__))
+bool __f7_signbit (const f7_t *aa)
+{
+  return aa->flags & (1 << 0);
+}
+
+static inline __attribute__((__always_inline__))
+int16_t sub_ssat16 (int16_t a, int16_t b)
+{
+  _Sat _Fract sa = __builtin_avr_rbits (a);
+  _Sat _Fract sb = __builtin_avr_rbits (b);
+  return __builtin_avr_bitsr (sa - sb);
+}
+
+extern void _

RE: [PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-08 Thread Li, Pan2
I see, I can reproduce this when building with "-march=rv64gcv -mabi=lp64d 
-flto -O0 test.c -o test.elf".

#include <riscv_vector.h>

int
main ()
{
  size_t vl = 8;
  vint32m1_t vs1 = {};
  vint32m1_t vs2 = {};
  vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);

  return (int)&vd;
}

Pan

-Original Message-
From: Jin Ma  
Sent: Sunday, September 8, 2024 1:15 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jinma.cont...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto

> > #include <riscv_vector.h>
> > 
> > vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) 
> > {
> >   return __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> > }
> 
> To double confirm, you mean "riscv64-linux-gnu-gcc-14 -march=rv64gcv 
> -mabi=lp64d -flto -O0 tmp.c -c -S -o -" with above is able to reproduce this 
> ICE?
> 
> Pan

Not quite: please don't add "-S" or "-c"; let the compilation go to the 
linker and try to generate the binary.
The normal result of compilation should be an error that the main 
function cannot be found, but unfortunately an ICE appears.

By the way, the gcc-14 in my environment is built from releases/gcc-14; I 
didn't download any precompiled gcc.

Of course, it is also possible that my local environment is broken, and I will 
check it again.

BR
Jin


[PATCH] configure: fix typos

2024-09-08 Thread Andrew Kreimer
Fix typos in comments.

Signed-off-by: Andrew Kreimer 
---
 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index e9583f2ba0c..6a77d454fd5 100755
--- a/configure
+++ b/configure
@@ -9086,7 +9086,7 @@ if test -d ${srcdir}/gcc; then
 lang_requires_boot_languages=
 # set srcdir during sourcing lang_frag to the gcc dir.
 # Sadly overriding srcdir on the . line doesn't work in plain sh as it
-# polutes this shell
+# pollutes this shell
 saved_srcdir=${srcdir}
 srcdir=${srcdir}/gcc . ${lang_frag}
 srcdir=${saved_srcdir}
diff --git a/configure.ac b/configure.ac
index f61dbe64a94..2567757e74d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2136,7 +2136,7 @@ if test -d ${srcdir}/gcc; then
 lang_requires_boot_languages=
 # set srcdir during sourcing lang_frag to the gcc dir.
 # Sadly overriding srcdir on the . line doesn't work in plain sh as it
-# polutes this shell
+# pollutes this shell
 saved_srcdir=${srcdir}
 srcdir=${srcdir}/gcc . ${lang_frag}
 srcdir=${saved_srcdir}
-- 
2.46.0



Re: [PATCH v3 08/12] OpenMP: Reject other properties with kind(any)

2024-09-08 Thread Sandra Loosemore

On 8/16/24 06:58, Jakub Jelinek wrote:


If this can apply (perhaps with small fuzz) to vanilla trunk, guess it can
be committed right now, doesn't need to wait for the rest of the
metadirective patch set.


OK.  I've tested the slightly cleaned-up version of the patch which is 
attached; I'll push it in the next day or two if there is no further 
objection.


-Sandra

From 23a82bea26805f2a430354d56b444d5bb7eaed3f Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Fri, 6 Sep 2024 20:58:13 +
Subject: [PATCH] OpenMP: Reject other properties with kind(any)

TR13 (pre-6.0) of the OpenMP spec says:

"If trait-property any is specified in the kind trait-selector of the
device selector set or the target_device selector sets, no other
trait-property may be specified in the same selector set."

This restriction dates back to OpenMP 5.1 (where it had slightly
different wording).  GCC's implementation was based on the 5.0 spec, and
several testcases include selectors that are now considered invalid.
This patch adds a diagnostic and fixes the testcases.

gcc/ChangeLog
	* omp-general.cc (omp_check_context_selector): Reject other
	properties in the same selector set with kind(any).

gcc/testsuite/ChangeLog
	* c-c++-common/gomp/declare-variant-10.c: Fix broken tests.
	* c-c++-common/gomp/declare-variant-3.c: Likewise.
	* c-c++-common/gomp/declare-variant-9.c: Likewise.
	* c-c++-common/gomp/declare-variant-any.c: New.
	* gfortran.dg/gomp/declare-variant-10.f90: Fix broken tests.
	* gfortran.dg/gomp/declare-variant-3.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-9.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-any.f90: New.
---
 gcc/omp-general.cc| 35 +++
 .../c-c++-common/gomp/declare-variant-10.c|  4 +--
 .../c-c++-common/gomp/declare-variant-3.c | 10 ++
 .../c-c++-common/gomp/declare-variant-9.c |  4 +--
 .../c-c++-common/gomp/declare-variant-any.c   | 10 ++
 .../gfortran.dg/gomp/declare-variant-10.f90   |  4 +--
 .../gfortran.dg/gomp/declare-variant-3.f90| 12 ++-
 .../gfortran.dg/gomp/declare-variant-9.f90|  2 +-
 .../gfortran.dg/gomp/declare-variant-any.f90  | 28 +++
 9 files changed, 86 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-variant-any.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-variant-any.f90

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 0b61335dba4..aa53c00bec5 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1288,6 +1288,8 @@ omp_check_context_selector (location_t loc, tree ctx)
   for (tree tss = ctx; tss; tss = TREE_CHAIN (tss))
 {
   enum omp_tss_code tss_code = OMP_TSS_CODE (tss);
+  bool saw_any_prop = false;
+  bool saw_other_prop = false;
 
   /* We can parse this, but not handle it yet.  */
   if (tss_code == OMP_TRAIT_SET_TARGET_DEVICE)
@@ -1324,6 +1326,31 @@ omp_check_context_selector (location_t loc, tree ctx)
 	  else
 	ts_seen[ts_code] = true;
 
+
+	  /* If trait-property "any" is specified in the "kind"
+	 trait-selector of the "device" selector set or the
+	 "target_device" selector sets, no other trait-property
+	 may be specified in the same selector set.  */
+	  if (ts_code == OMP_TRAIT_DEVICE_KIND)
+	for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
+	  {
+		const char *prop = omp_context_name_list_prop (p);
+		if (!prop)
+		  continue;
+		else if (strcmp (prop, "any") == 0)
+		  saw_any_prop = true;
+		else
+		  saw_other_prop = true;
+	  }
+	  /* It seems slightly suspicious that the spec's language covers
+	 the device_num selector too, but
+	   target_device={device_num(whatever),kind(any)}
+	 is probably not terribly useful anyway.  */
+	  else if (ts_code == OMP_TRAIT_DEVICE_ARCH
+		   || ts_code == OMP_TRAIT_DEVICE_ISA
+		   || ts_code == OMP_TRAIT_DEVICE_NUM)
+	saw_other_prop = true;
+
 	  if (omp_ts_map[ts_code].valid_properties == NULL)
 	continue;
 
@@ -1376,6 +1403,14 @@ omp_check_context_selector (location_t loc, tree ctx)
 		  break;
 	  }
 	}
+
+  if (saw_any_prop && saw_other_prop)
+	{
+	  error_at (loc,
+		"no other trait-property may be specified "
+		"in the same selector set with %");
+	  return error_mark_node;
+	}
 }
   return ctx;
 }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
index 2b8a39425b1..e77693430d1 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
@@ -7,7 +7,7 @@ void f01 (void);
 #pragma omp declare variant (f01) match (device={isa(avx512f,avx512bw)})
 void f02 (void);
 void f03 (void);
-#pragma omp declare variant (f03) match (device={kind("any"),arch(x86_64),isa(avx512f,avx512bw)})
+#pragma omp declare variant (f03) match (device={arch(x86_64),isa(avx512f,avx512bw)})
 void f04 (void);
 void f05 (void);
 #pragma om
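
For readers unfamiliar with the restriction being enforced here, a minimal sketch (function and variant names are illustrative, not taken from the patch) of a valid selector versus the newly rejected form:

```c
int add_any (int a, int b) { return a + b; }

/* Valid: "any" is the only trait-property in the device selector set.
   Without -fopenmp the pragma is simply ignored.  */
#pragma omp declare variant (add_any) match (device={kind(any)})
int base_add (int a, int b) { return a + b; }

/* Rejected by the patch: "any" combined with another trait-property
   in the same selector set:

   #pragma omp declare variant (add_any) \
           match (device={kind(any),arch(x86_64)})
*/
```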

[PATCH] phiopt: Small refactoring/cleanup of non-ssa name case of factor_out_conditional_operation

2024-09-08 Thread Andrew Pinski
This small cleanup removes a redundant check for gimple_assign_cast_p and
reformats based on that. It also inverts the if statement that checks whether
the type is integral and whether the constant fits into the new type, so that
it returns NULL early, and reformats based on that.

Also moves the check for has_single_use earlier, since it is a cheaper
check than some of the others (like the check on the integer side).

This was noticed when adding a few new things to
factor_out_conditional_operation, but those are not ready to submit yet.

Note there is no functional difference with this change.
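
A C-level sketch of the transformation this function performs (a hypothetical illustration, not taken from the patch): a PHI whose two arguments are the same unary operation gets the operation factored out past the PHI.

```c
/* Before factoring: the widening conversion is performed on both
   arms feeding the PHI node.  */
long phi_of_casts (int c, int a, int b)
{
  long r;
  if (c)
    r = (long) a;
  else
    r = (long) b;
  return r;
}

/* After factoring: one PHI of the narrow values, then a single
   conversion.  The single-use checks moved earlier by this cleanup
   guard against duplicating work when the arguments have other uses.  */
long cast_of_phi (int c, int a, int b)
{
  int t = c ? a : b;
  return (long) t;
}
```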

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move the
has_single_use checks much earlier. Remove redundant check for
gimple_assign_cast_p. Invert the check that the integral constant
fits into the new type.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 122 -
 1 file changed, 60 insertions(+), 62 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 271a5d51f09..06ec5875722 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -265,6 +265,11 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   tree new_arg0 = arg0_op.ops[0];
   tree new_arg1;
 
+  /* If arg0 have > 1 use, then this transformation actually increases
+ the number of expressions evaluated at runtime.  */
+  if (!has_single_use (arg0))
+return NULL;
+
   if (TREE_CODE (arg1) == SSA_NAME)
 {
   /* Check if arg1 is an SSA_NAME.  */
@@ -278,6 +283,11 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   if (arg1_op.operands_occurs_in_abnormal_phi ())
return NULL;
 
+  /* If arg1 have > 1 use, then this transformation actually increases
+the number of expressions evaluated at runtime.  */
+  if (!has_single_use (arg1))
+   return NULL;
+
   /* Either arg1_def_stmt or arg0_def_stmt should be conditional.  */
   if (dominated_by_p (CDI_DOMINATORS, gimple_bb (phi), gimple_bb 
(arg0_def_stmt))
  && dominated_by_p (CDI_DOMINATORS,
@@ -295,80 +305,68 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   if (dominated_by_p (CDI_DOMINATORS, gimple_bb (phi), gimple_bb 
(arg0_def_stmt)))
return NULL;
 
-  /* If arg1 is an INTEGER_CST, fold it to new type.  */
-  if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
- && (int_fits_type_p (arg1, TREE_TYPE (new_arg0))
- || (TYPE_PRECISION (TREE_TYPE (new_arg0))
+  /* Only handle if arg1 is a INTEGER_CST and one that fits
+into the new type or if it is the same precision.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
+ || !(int_fits_type_p (arg1, TREE_TYPE (new_arg0))
+  || (TYPE_PRECISION (TREE_TYPE (new_arg0))
   == TYPE_PRECISION (TREE_TYPE (arg1)
+   return NULL;
+
+  /* For the INTEGER_CST case, we are just moving the
+conversion from one place to another, which can often
+hurt as the conversion moves further away from the
+statement that computes the value.  So, perform this
+only if new_arg0 is an operand of COND_STMT, or
+if arg0_def_stmt is the only non-debug stmt in
+its basic block, because then it is possible this
+could enable further optimizations (minmax replacement
+etc.).  See PR71016.
+Note no-op conversions don't have this issue as
+it will not generate any zero/sign extend in that case.  */
+  if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
+  != TYPE_PRECISION (TREE_TYPE (arg1)))
+ && new_arg0 != gimple_cond_lhs (cond_stmt)
+ && new_arg0 != gimple_cond_rhs (cond_stmt)
+ && gimple_bb (arg0_def_stmt) == e0->src)
{
- if (gimple_assign_cast_p (arg0_def_stmt))
+ gsi = gsi_for_stmt (arg0_def_stmt);
+ gsi_prev_nondebug (&gsi);
+ if (!gsi_end_p (gsi))
{
- /* For the INTEGER_CST case, we are just moving the
-conversion from one place to another, which can often
-hurt as the conversion moves further away from the
-statement that computes the value.  So, perform this
-only if new_arg0 is an operand of COND_STMT, or
-if arg0_def_stmt is the only non-debug stmt in
-its basic block, because then it is possible this
-could enable further optimizations (minmax replacement
-etc.).  See PR71016.
-Note no-op conversions don't have this issue as
-it will not generate any zero/sign extend in that case.  */
- if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
-   != TYPE_PRECISION (TREE_TYPE (arg1)))
- && new_arg0 != gimple_cond_lhs (cond_stm

Re: [PATCH 1/3] gcov: Cache source files

2024-09-08 Thread Jørgen Kvalsvik

On 8/23/24 09:39, Jan Hubicka wrote:

Hi,

 1:4:int notmain(const char *entity)
 -: == inlined from hello.h ==
 1:6:  if (s)
branch  0 taken 0 (fallthrough)
branch  1 taken 1
 #:7:printf ("hello, %s!\n", s);
 %:7-block 3
call0 never executed
 -:8:  else
 1:9:printf ("hello, world!\n");
 1:9-block 4
call0 returned 1
 1:   10:  return 0;
 1:   10-block 5
 -: == inlined from hello.h (end) ==
 -:5:{
 1:6:  return hello (entity);
 1:6-block 7
 -:7:}


This indeed looks like a reasonable goal.


Thanks, pushed.


gcc/ChangeLog:

* gcov.cc (release_structures): Release source_lines.
(slurp): New function.
(output_lines): Read sources with slurp.
---
  gcc/gcov.cc | 70 -
  1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index e76a314041c..19019f404ee 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -550,6 +550,11 @@ static vector names;
 a file being read multiple times.  */
  static vector processed_files;
  
+/* The contents of a source file.  The nth SOURCE_LINES entry is the

+   contents of the nth SOURCES, or empty if it has not or could not be
+   read.  */
+static vector<vector<const char *> *> source_lines;
+
  /* This holds data summary information.  */
  
  static unsigned object_runs;

@@ -762,6 +767,8 @@ static string make_gcov_file_name (const char *, const char 
*);
  static char *mangle_name (const char *);
  static void release_structures (void);
  extern int main (int, char **);
+static const vector<const char *> &
+slurp (const source_info &src, FILE *gcov_file, const char *line_start);
  
  function_info::function_info (): m_name (NULL), m_demangled_name (NULL),

ident (0), lineno_checksum (0), cfg_checksum (0), has_catch (0),
@@ -1804,6 +1811,15 @@ release_structures (void)
 it != functions.end (); it++)
  delete (*it);
  
+  for (vector<const char *> *lines : source_lines)

+{
+  if (lines)
+   for (const char *line : *lines)
+ free (const_cast  (line));
+  delete (lines);
+}
+  source_lines.resize (0);
+
for (fnfilter &filter : filters)
  regfree (&filter.regex);
  
@@ -3246,6 +3262,41 @@ read_line (FILE *file)

return pos ? string : NULL;
  }
  
+/* Get the vector with the contents SRC, possibly from a cache.  If

+   the reading fails, a message prefixed with LINE_START is written to
+   GCOV_FILE.  */
+static const vector<const char *> &
+slurp (const source_info &src, FILE *gcov_file,
+   const char *line_start)
+{
+  if (source_lines.size () <= src.index)
+source_lines.resize (src.index + 1);
+
+  /* Store vector pointers so that the returned references remain
+ stable and won't be broken by successive calls to slurp.  */
+  if (!source_lines[src.index])
+source_lines[src.index] = new vector<const char *> ();
+
+  if (!source_lines[src.index]->empty ())
+return *source_lines[src.index];
+
+  FILE *source_file = fopen (src.name, "r");
+  if (!source_file)
+fnotice (stderr, "Cannot open source file %s\n", src.name);
+  else if (src.file_time == 0)
+fprintf (gcov_file, "%sSource is newer than graph\n", line_start);
+
+  const char *retval;
+  vector<const char *> &lines = *source_lines[src.index];
+  if (source_file)
+while ((retval = read_line (source_file)))
+  lines.push_back (xstrdup (retval));
+
+  if (source_file)
+fclose (source_file);
+  return lines;

So this is basically going to read all sources needed for single
compilation unit (.cc file) into memory at once.  I guess that is OK.
I wonder if we don't want to do something like mmap the source and then
set up the source_lines array instead of reading every line into a separate
xstrduped chunk?
But that could be done incrementally, so the patch is OK.

Honza

+}
+
  /* Pad string S with spaces from left to have total width equal to 9.  */
  
  static void

@@ -3435,9 +3486,6 @@ output_lines (FILE *gcov_file, const source_info *src)
  #define  DEFAULT_LINE_START "-:0:"
  #define FN_SEPARATOR "--\n"
  
-  FILE *source_file;

-  const char *retval;
-
/* Print colorization legend.  */
if (flag_use_colors)
  fprintf (gcov_file, "%s",
@@ -3464,17 +3512,8 @@ output_lines (FILE *gcov_file, const source_info *src)
fprintf (gcov_file, DEFAULT_LINE_START "Runs:%u\n", object_runs);
  }
  
-  source_file = fopen (src->name, "r");

-  if (!source_file)
-fnotice (stderr, "Cannot open source file %s\n", src->name);
-  else if (src->file_time == 0)
-fprintf (gcov_file, DEFAULT_LINE_START "Source is newer than graph\n");
-
-  vector<const char *> source_lines;
-  if (source_file)
-while ((retval = read_line (source_file)) != NULL)
-  source_lines.push_back (xstrdup (retval));
-
+  const vector<const char *> &source_lines = slurp (*src, gcov_file,
+   DEFAULT_LINE_START);
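
Honza's mmap suggestion above could look roughly like this (a hedged sketch, not part of the patch): split a mapped buffer into line spans in place instead of xstrdup'ing each line.

```c
#include <stddef.h>
#include <string.h>

/* A line recorded as a (start, length) view into the mapped buffer;
   no per-line allocation is needed.  */
struct line_span { const char *start; size_t len; };

/* Split BUF (LEN bytes, e.g. the result of mmap'ing a source file)
   into at most MAX line spans; returns the number of lines found.  */
size_t split_lines (const char *buf, size_t len,
                    struct line_span *out, size_t max)
{
  size_t n = 0;
  const char *p = buf;
  const char *end = buf + len;
  while (p < end && n < max)
    {
      const char *nl = memchr (p, '\n', (size_t) (end - p));
      const char *stop = nl ? nl : end;
      out[n].start = p;
      out[n].len = (size_t) (stop - p);
      n++;
      p = nl ? nl + 1 : end;
    }
  return n;
}

/* Convenience wrapper counting the lines in BUF.  */
size_t count_lines (const char *buf, size_t len)
{
  struct line_span spans[64];
  return split_lines (buf, len, spans, 64);
}
```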

Re: [PATCH 1/4] testsuite: Use dg-compile, not gcc -c

2024-09-08 Thread Jørgen Kvalsvik

On 8/23/24 09:41, Jan Hubicka wrote:

Since this is a pure compile test it makes sense to inform dejagnu of
it.

gcc/testsuite/ChangeLog:

* gcc.misc-tests/gcov-23.c: Use dg-compile, not gcc -c


OK,
Honza


Thanks,
Pushed.


---
  gcc/testsuite/gcc.misc-tests/gcov-23.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.misc-tests/gcov-23.c 
b/gcc/testsuite/gcc.misc-tests/gcov-23.c
index 72849d80e3a..72ba0aa1389 100644
--- a/gcc/testsuite/gcc.misc-tests/gcov-23.c
+++ b/gcc/testsuite/gcc.misc-tests/gcov-23.c
@@ -1,4 +1,5 @@
-/* { dg-options "-fcondition-coverage -ftest-coverage -O2 -c" } */
+/* { dg-options "-fcondition-coverage -ftest-coverage -O2" } */
+/* { dg-do compile } */
  
  #include 

  #include 
--
2.39.2



[patch, fortran] Matmul and dot_product for unsigned

2024-09-08 Thread Thomas Koenig

Hello world,

like the subject says. The patch is gzipped because it is large;
it contains multiple MATMUL library implementations.

OK for trunk?

Implement MATMUL and DOT_PRODUCT for unsigned.

gcc/fortran/ChangeLog:

* arith.cc (gfc_arith_uminus): Fix warning.
(gfc_arith_minus): Correctly truncate unsigneds.
* check.cc (gfc_check_dot_product): Handle unsigned arguments.
(gfc_check_matmul): Likewise.
* expr.cc (gfc_get_unsigned_expr): New function.
* gfortran.h (gfc_get_unsigned_expr): Add prototype.
* gfortran.texi: Document MATMUL and DOT_PRODUCT for unsigned.
* simplify.cc (compute_dot_product): Handle unsigneds.

libgfortran/ChangeLog:

* Makefile.am: Add unsigned MATMUL versions.
* Makefile.in: Regenerate.
* gfortran.map: Add unsigned MATMUL versions.
* libgfortran.h: Add array types for unsigned.
* m4/iparm.m4: Add UNSIGNED if type is m.
* generated/matmul_m1.c: New file.
* generated/matmul_m16.c: New file.
* generated/matmul_m2.c: New file.
* generated/matmul_m4.c: New file.
* generated/matmul_m8.c: New file.
* generated/matmulavx128_m1.c: New file.
* generated/matmulavx128_m16.c: New file.
* generated/matmulavx128_m2.c: New file.
* generated/matmulavx128_m4.c: New file.
* generated/matmulavx128_m8.c: New file.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_25.f90: New test.
* gfortran.dg/unsigned_26.f90: New test.

0001-Implement-MATMUL-and-DOT_PRODUCT-for-unsgigned.patch.gz
Description: application/gzip


Re: [PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-08 Thread Christoph Müllner
On Sat, Sep 7, 2024 at 7:08 PM Jeff Law  wrote:
>
>
>
> On 9/6/24 5:32 AM, Jin Ma wrote:
> > In the process of DF to SI, we generally use "unsigned_fix" rather than
> > "truncate" for conversion. Although this has no effect in general,
> > unexpected ICE often occurs when precise semantic analysis is required.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
> >  the Zfa extension on rv32.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
> This doesn't look correct.
>
> fmv.x.w just moves the bits from one place to another, no conversion.
>
> So I don't see how the original pattern was correct.  Using truncate on
> an FP mode source isn't defined.  But I also don't think the updated
> pattern is correct either.
> jeff

Having a matching pattern for these Zfa moves seems pointless because
the optimization that utilizes the instructions in
riscv_split_doubleword_move() uses:
gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
In the XTheadFmv pattern, we use unspec, so the pattern won't match.
I think this should be done for Zfa as well.


Re: [PATCH 00/10] __builtin_dynamic_object_size

2024-09-08 Thread Andrew Pinski
On Tue, Nov 9, 2021 at 11:02 AM Siddhesh Poyarekar  wrote:
>
>
> This patchset implements the __builtin_dynamic_object_size builtin for
> gcc.  The primary motivation to have this builtin in gcc is to enable
> _FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
> in use cases where the potential performance tradeoff is acceptable.
>
> Semantics:
> --
>
> __builtin_dynamic_object_size has the same signature as
> __builtin_object_size; it accepts a pointer and type ranging from 0 to 3
> and it returns an object size estimate for the pointer based on an
> analysis of which objects the pointer could point to.  The actual
> properties of the object size estimate are different:
>
> - In the best case __builtin_dynamic_object_size evaluates to an
>   expression that represents a precise size of the object being pointed
>   to.
>
> - In case a precise object size expression cannot be evaluated,
>   __builtin_dynamic_object_size attempts to evaluate an estimate size
>   expression based on the object size type.
>
> - In what situations the builtin returns an estimate vs a precise
>   expression is an implementation detail and may change in future.
>   Users must always assume, as in the case of __builtin_object_size, that
>   the returned value is the maximum or minimum based on the object size
>   type they have provided.
>
> - In the worst case of failure, __builtin_dynamic_object_size returns a
>   constant (size_t)-1 or (size_t)0.
>
> Implementation:
> ---
>
> - The __builtin_dynamic_object_size support is implemented in
>   tree-object-size.  In most cases the first pass (early_objsz) is able
>   to evaluate object size expressions.  The second phase mainly ends up
>   simplifying the __builtin_dynamic_object_size to
>   __builtin_object_size.
>
> - The patchset begins with structural modification of the
>   tree-object-size pass, followed by enhancement to return size
>   expressions.  I have split the implementation into one feature per
>   patch (calls, function parameters, PHI, etc.) to hopefully ease
>   review.
>
> Performance:
> 
>
> Expressions generated by this pass in theory could be arbitrarily
> complex.  I have not made an attempt to limit nesting of objects since
> it seemed too early to do that.  In practice based on the few
> applications I built, most of the complexity of the expressions got
> folded away.  Even so, the performance overhead is likely to be
> non-zero.  If we find performance degradation to be significant, we
> could later add nesting limits to bail out if a size expression gets too
> complex.
>
> I have also not implemented simplification of __*_chk to their normal
> variants if we can determine at compile time that it is safe, which
> still depends on the object size to be constant.  I hope to do this as a
> minor performance improvement in stage 3.
>
> Build time performance doesn't seem to be affected much based on an
> unscientific check to time
> `make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by
> about a couple of seconds when the dynamic tests are added and remains
> more or less in the same ballpark otherwise.
>
> Testing:
> 
>
> I have added tests for dynamic object sizes as well as wrappers for all
> __builtin_object_size tests to provide wide coverage.  With that in
> place I have run full bootstrap builds on Fedora rawhide by backporting the
> patches to the gcc11 branch and x86_64 and i686 have no failures in any
> of the builtin-*object-size* tests and no new failures.
>
> I have also built bash, cmake, zfs-fuse and systemtap with
> _FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw
> no issues in any of those builds.  I did some rudimentary analysis of
> the generated binaries to see if there was any difference in coverage
> and found that there was.  In terms of pure numbers, there were far more
> _chk variants of calls than the regular ones due to _FORTIFY_SOURCE
> (from about 4% to 70% in bash), but that could well be due to the _chk
> variants not being folded into regular variants when safe.  However, on
> manual inspection of some of these sites, it was clear that coverage was
> increasing significantly where __builtin_object_size was previously
> bailing out.
>
> Specifically for bash, the coverage went from 30.81% to 86.87% (it was
> 84.5% with the v1 patch).  I actually hope to reduce this a bit with
> folding improvements for __builtin___memcpy_chk, etc.
>
> A bootstrap build is in progress on x86_64.
>
> Limitations/Future work:
> 
>
> - The most immediate work is to fold _chk variants of builtins into
>   regular ones when it can be proven at compile time that the object
>   size will alwasy be less than the length of the write.  I am working
>   on it right now.
>
> - I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
>   llvm-only.  I've started working on these patches too on the side.
>
> - Instead of ba

Re: [PATCH 00/10] __builtin_dynamic_object_size

2024-09-08 Thread Siddhesh Poyarekar

On 2024-09-08 18:01, Andrew Pinski wrote:

When I was implementing improvements into phiopt I ran into case where
objsz would fail now because we get:
tmp = PHI 
ptr = ptr + tmp

where before the pointer plus was inside each branch instead. So my
question is: is there any progress on implementing objsz with ranger,
or has that work been put off?
I filed https://gcc.gnu.org/PR116556 for this. Do you have any start


Sorry, I noticed it but couldn't get time to work on it.  __bos should
trivially handle this, I think; I'm surprised that it doesn't.


As far as using ranger in __bos is concerned, I believe there was strong
opposition to the idea, in favour of continuing to evaluate sizes from
first principles like we currently do in the tree-object-size pass.  I
think the concern was that ranger can be more aggressive in evaluation 
and assume no undefined behaviour whereas __bos tries to detect 
undefined behaviour whenever it can and is more conservative about 
computations.



of patches for this, so maybe it could be taken to finish? If not
then I am going to try to implement it since it blocks my work on
phiopt.


I'll be happy to review any changes you make to tree-object-size to fix 
this.


Thanks,
Sid
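
As a concrete illustration of the builtin's best-case semantics discussed in this thread (a hedged sketch, not taken from the patchset): for an object whose extent the compiler can see, type 0 yields the maximum number of bytes from the pointer to the end of the object.

```c
#include <stddef.h>

char storage[16];

/* Object-size estimate, type 0 (maximum), for a pointer 4 bytes into
   a 16-byte object.  Falls back to the statically known answer where
   the builtin is unavailable, so the sketch stays compilable.  */
size_t tail_size (void)
{
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
  return __builtin_dynamic_object_size (storage + 4, 0);
# endif
#endif
  return sizeof storage - 4;
}
```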


RE: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD

2024-09-08 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Friday, August 30, 2024 6:16 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 

Subject: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD

From: Pan Li 

This patch would like to support the vector signed ssadd pattern
for the RISC-V backend.  Aka

Form 1:
  #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX)   \
  void __attribute__((noinline)) \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *x, T *y, unsigned n) \
  {  \
for (unsigned i = 0; i < n; i++) \
  {  \
T sum = (UT)x[i] + (UT)y[i]; \
out[i] = (x[i] ^ y[i]) < 0   \
  ? sum  \
  : (sum ^ x[i]) >= 0\
? sum\
: x[i] < 0 ? MIN : MAX;  \
  }  \
  }

DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
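
The selection logic in form 1 can be checked against a plain scalar rendition (a sketch for int8_t; the function name is illustrative, not taken from the patch):

```c
#include <stdint.h>

/* Scalar version of form 1 for int8_t: add with wraparound in
   unsigned arithmetic, then clamp when the sign bits prove that the
   addition overflowed.  */
int8_t sat_s_add_int8 (int8_t x, int8_t y)
{
  int8_t sum = (int8_t) (uint8_t) ((uint8_t) x + (uint8_t) y);
  if ((int8_t) (x ^ y) < 0)    /* Signs differ: overflow impossible.  */
    return sum;
  if ((int8_t) (sum ^ x) >= 0) /* Sign preserved: no overflow.  */
    return sum;
  return x < 0 ? INT8_MIN : INT8_MAX;
}
```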

If the backend implemented the vector mode of ssadd, we will see IR diff
similar as below:

Before this patch:
 108   │   _114 = .SELECT_VL (ivtmp_112, POLY_INT_CST [2, 2]);
 109   │   ivtmp_77 = _114 * 8;
 110   │   vect__4.9_80 = .MASK_LEN_LOAD (vectp_x.7_78, 64B, { -1, ...  }, 
_114, 0);
 111   │   vect__5.10_81 = VIEW_CONVERT_EXPR(vect__4.9_80);
 112   │   vect__7.13_85 = .MASK_LEN_LOAD (vectp_y.11_83, 64B, { -1, ...  }, 
_114, 0);
 113   │   vect__8.14_86 = VIEW_CONVERT_EXPR(vect__7.13_85);
 114   │   vect__9.15_87 = vect__5.10_81 + vect__8.14_86;
 115   │   vect_sum_20.16_88 = VIEW_CONVERT_EXPR(vect__9.15_87);
 116   │   vect__10.17_89 = vect__4.9_80 ^ vect__7.13_85;
 117   │   vect__11.18_90 = vect__4.9_80 ^ vect_sum_20.16_88;
 118   │   mask__46.19_92 = vect__10.17_89 >= { 0, ... };
 119   │   _36 = vect__4.9_80 >> 63;
 120   │   mask__44.26_104 = vect__11.18_90 < { 0, ... };
 121   │   mask__43.27_105 = mask__46.19_92 & mask__44.26_104;
 122   │   _115 = .COND_XOR (mask__43.27_105, _36, { 9223372036854775807, ... 
}, vect_sum_20.16_88);
 123   │   .MASK_LEN_STORE (vectp_out.29_108, 64B, { -1, ... }, _114, 0, _115);
 124   │   vectp_x.7_79 = vectp_x.7_78 + ivtmp_77;
 125   │   vectp_y.11_84 = vectp_y.11_83 + ivtmp_77;
 126   │   vectp_out.29_109 = vectp_out.29_108 + ivtmp_77;
 127   │   ivtmp_113 = ivtmp_112 - _114;

After this patch:
  94   │   # vectp_x.7_82 = PHI 
  95   │   # vectp_y.10_86 = PHI 
  96   │   # vectp_out.14_91 = PHI 
  97   │   # ivtmp_95 = PHI 
  98   │   _97 = .SELECT_VL (ivtmp_95, POLY_INT_CST [2, 2]);
  99   │   ivtmp_81 = _97 * 8;
 100   │   vect__4.9_84 = .MASK_LEN_LOAD (vectp_x.7_82, 64B, { -1, ...  }, _97, 
0);
 101   │   vect__7.12_88 = .MASK_LEN_LOAD (vectp_y.10_86, 64B, { -1, ...  }, 
_97, 0);
 102   │   vect_patt_40.13_89 = .SAT_ADD (vect__4.9_84, vect__7.12_88);
 103   │   .MASK_LEN_STORE (vectp_out.14_91, 64B, { -1, ... }, _97, 0, 
vect_patt_40.13_89);
 104   │   vectp_x.7_83 = vectp_x.7_82 + ivtmp_81;
 105   │   vectp_y.10_87 = vectp_y.10_86 + ivtmp_81;
 106   │   vectp_out.14_92 = vectp_out.14_91 + ivtmp_81;
 107   │   ivtmp_96 = ivtmp_95 - _97;

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

gcc/ChangeLog:

* match.pd: Add case 2 for the signed .SAT_ADD consumed by
vect pattern.
* tree-vect-patterns.cc (gimple_signed_integer_sat_add): Add new
matching func decl for signed .SAT_ADD.
(vect_recog_sat_add_pattern): Add signed .SAT_ADD pattern match.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 17 +
 gcc/tree-vect-patterns.cc |  5 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index be211535a49..578c9dd5b77 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3207,6 +3207,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+/* Signed saturation add, case 2:
+   T sum = (T)((UT)X + (UT)Y)
+   SAT_S_ADD = (X ^ Y) < 0 && (X ^ sum) >= 0 ? (-(T)(X < 0) ^ MAX) : sum;
+
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (bit_and:c (lt (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
+ (nop_convert @1
+  integer_zerop)
+  (ge (bit_xor:c @0 @1) integer_zerop))
+   (bit_xor:c (nop_convert (negate (nop_convert (convert
+ (l

[PATCH] Don't force_reg operands[3] when it's not const0_rtx.

2024-09-08 Thread liuhongt
It fixes the regression introduced by:

a51f2fc0d80869ab079a93cc3858f24a1fd28237 is the first bad commit
commit a51f2fc0d80869ab079a93cc3858f24a1fd28237
Author: liuhongt 
Date:   Wed Sep 4 15:39:17 2024 +0800

Handle const0_operand for *avx2_pcmp3_1.

caused

FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu|vmovups 1

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr59539-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr59539-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready push to trunk.


gcc/ChangeLog:

* config/i386/sse.md (*avx2_pcmp3_1): Don't force_reg
operands[3] when it's not const0_rtx.
---
 gcc/config/i386/sse.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1946d3513be..1ae61182d0c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17929,7 +17929,8 @@ (define_insn_and_split "*avx2_pcmp3_1"
   if (INTVAL (operands[5]) == 1)
 std::swap (operands[3], operands[4]);
 
-  operands[3] = force_reg (mode, operands[3]);
+  if (operands[3] == CONST0_RTX (mode))
+operands[3] = force_reg (mode, operands[3]);
   if (operands[4] == CONST0_RTX (mode))
 operands[4] = force_reg (mode, operands[4]);
 
-- 
2.31.1



Re: [PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-08 Thread Jin Ma
> Having matching pattern for these Zfa moves seems pointless because
> the optimization that utilizes the instructions in
> riscv_split_doubleword_move() uses:
> gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
> In the XTheadFmv pattern, we use unspec, so the pattern won't match.
> I think this should be done for Zfa as well.

Yes, that's a very good suggestion. I will try.

Thanks.
Jin


Re: [PATCH] s390: Fix strict_low_part generation

2024-09-08 Thread Stefan Schulze Frielinghaus
Ping

On Fri, Aug 16, 2024 at 09:14:02AM +0200, Stefan Schulze Frielinghaus wrote:
> In s390_expand_insv(), if generating code for ICM et al. src is a MEM
> and gen_lowpart might force src into a register such that we end up with
> patterns which do not match anymore.  Use adjust_address() instead in
> order to preserve a MEM.
> 
> Furthermore, it is not straightforward to enforce a subreg.  For
> example, in case of a paradoxical subreg, gen_lowpart() may return a
> register.  In order to compensate this, s390_gen_lowpart_subreg() emits
> a reference to a pseudo which does not coincide with its definition
> which is wrong.  Additionally, if dest is a paradoxical subreg, then do
> not try to emit a strict_low_part since it could mean that dest was not
> initialized even though this might be fixed up later by init-regs.
> 
> Splitter for insn *get_tp_64, *zero_extendhisi2_31,
> *zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
> Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
> just returns the hard register for mode m which is fine to use as an
> argument for strict_low_part, i.e., we do not need to enforce subregs
> here since after reload subregs are supposed to be eliminated anyway.
> 
> This fixes gcc.dg/torture/pr111821.c.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
>   * config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
>   (s390_expand_insv): Use adjust_address() and emit a
>   strict_low_part only in case of a natural subreg.
>   * config/s390/s390.md: Use gen_lowpart() instead of
>   s390_gen_lowpart_subreg().
> ---
>  Bootstrapped and regtested on s390.  Ok for mainline,gcc12,gcc13,gcc14?
> 
>  gcc/config/s390/s390-protos.h |  1 -
>  gcc/config/s390/s390.cc   | 47 +++
>  gcc/config/s390/s390.md   | 13 +-
>  3 files changed, 20 insertions(+), 41 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index b4646ccb606..e7ac59d17da 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -50,7 +50,6 @@ extern void s390_set_has_landing_pad_p (bool);
>  extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int);
>  extern int s390_class_max_nregs (enum reg_class, machine_mode);
>  extern bool s390_return_addr_from_memory(void);
> -extern rtx s390_gen_lowpart_subreg (machine_mode, rtx);
>  extern bool s390_fma_allowed_p (machine_mode);
>  #if S390_USE_TARGET_ATTRIBUTE
>  extern tree s390_valid_target_attribute_tree (tree args,
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index 7aea776da2f..7cdcebfc08b 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -516,31 +516,6 @@ s390_return_addr_from_memory ()
>return cfun_gpr_save_slot(RETURN_REGNUM) == SAVE_SLOT_STACK;
>  }
>  
> -/* Generate a SUBREG for the MODE lowpart of EXPR.
> -
> -   In contrast to gen_lowpart it will always return a SUBREG
> -   expression.  This is useful to generate STRICT_LOW_PART
> -   expressions.  */
> -rtx
> -s390_gen_lowpart_subreg (machine_mode mode, rtx expr)
> -{
> -  rtx lowpart = gen_lowpart (mode, expr);
> -
> -  /* There might be no SUBREG in case it could be applied to the hard
> - REG rtx or it could be folded with a paradoxical subreg.  Bring
> - it back.  */
> -  if (!SUBREG_P (lowpart))
> -{
> -  machine_mode reg_mode = TARGET_ZARCH ? DImode : SImode;
> -  gcc_assert (REG_P (lowpart));
> -  lowpart = gen_lowpart_SUBREG (mode,
> - gen_rtx_REG (reg_mode,
> -  REGNO (lowpart)));
> -}
> -
> -  return lowpart;
> -}
> -
>  /* Return nonzero if it's OK to use fused multiply-add for MODE.  */
>  bool
>  s390_fma_allowed_p (machine_mode mode)
> @@ -7112,15 +7087,21 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
>/* Emit a strict_low_part pattern if possible.  */
>if (smode_bsize == bitsize && bitpos == mode_bsize - smode_bsize)
>   {
> -   rtx low_dest = s390_gen_lowpart_subreg (smode, dest);
> -   rtx low_src = gen_lowpart (smode, src);
> -
> -   switch (smode)
> +   rtx low_dest = gen_lowpart (smode, dest);
> +   if (SUBREG_P (low_dest) && !paradoxical_subreg_p (low_dest))
>   {
> - case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src)); 
> return true;
> - case E_HImode: emit_insn (gen_movstricthi (low_dest, low_src)); 
> return true;
> - case E_SImode: emit_insn (gen_movstrictsi (low_dest, low_src)); 
> return true;
> - default: break;
> +   poly_int64 offset = GET_MODE_SIZE (mode) - GET_MODE_SIZE (smode);
> +   rtx low_src = adjust_address (src, smode, offset);
> +   switch (smode)
> + {
> + case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src));
> +   

Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-08 Thread Stefan Schulze Frielinghaus
Ping

On Fri, Aug 16, 2024 at 09:41:55AM +0200, Stefan Schulze Frielinghaus wrote:
> Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> survive register allocation.  This in turn leads to wrong register
> renaming.  Keeping the current approach would mean we need two insns for
> *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> lines
> 
> (define_insn "*tf_to_fprx2_0"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
> (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "#")
> 
> (define_insn "*tf_to_fprx2_0"
>   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "vpdi\t%v0,%v1,%v0,1
>   [(set_attr "op_type" "VRR")])
> 
> and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
> has mode FPRX2 and afterwards DF once subregs have been eliminated.
> 
> Since we always copy a whole vector register into a floating-point
> register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> *tf_to_fprx2_1 into a single insn which means we don't have to use
> subregs at all.  The downside of this is that the assembler template
> contains two instructions, now.  The upside is that we don't have to
> come up with some artificial insn before RA which might be more
> readable/maintainable.  That is implemented by this patch.
> 
> In commit r11-4872-ge627cda5686592, the output operand specifier %V was
> introduced, which is now used only in tf_to_fprx2.  I didn't come up
> with its counterpart like %F for floating-point registers.  Instead I
> printed the register pair in the output function directly.  This spares
> us a new and "rare" format specifier for a single insn.  I don't have a
> strong opinion on which option to choose; however, we should either add %F
> in order to mimic the same behaviour as %V, or get rid of %V and inline
> the logic in the output function.  I lean towards the latter.
> Any preferences?
> ---
>  gcc/config/s390/s390.md|  2 +
>  gcc/config/s390/vector.md  | 66 +++---
>  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
>  3 files changed, 60 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 3d5759d6252..31240899934 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -241,6 +241,8 @@
> UNSPEC_VEC_VFMIN
> UNSPEC_VEC_VFMAX
>  
> +   UNSPEC_TF_TO_FPRX2
> +
> UNSPEC_NNPA_VCLFNHS_V8HI
> UNSPEC_NNPA_VCLFNLS_V8HI
> UNSPEC_NNPA_VCRNFS_V8HI
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index a75b7cb5825..561182e0c2c 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -907,36 +907,36 @@
>"vmrlg\t%0,%1,%2";
>[(set_attr "op_type" "VRR")])
>  
> -
> -(define_insn "*tf_to_fprx2_0"
> -  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
> - (subreg:DF (match_operand:TF 1 "general_operand"   "v") 0))]
> -  "TARGET_VXE"
> -  ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
> -  "vpdi\t%v0,%v1,%v0,1"
> -  [(set_attr "op_type" "VRR")])
> -
> -(define_insn "*tf_to_fprx2_1"
> -  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
> - (subreg:DF (match_operand:TF 1 "general_operand"   "v") 8))]
> +(define_insn "tf_to_fprx2"
> +  [(set (match_operand:FPRX2 0 "register_operand" "=f,f ,f")
> + (unspec:FPRX2 [(match_operand:TF 1 "general_operand"   "v,AR,AT")]
> +   UNSPEC_TF_TO_FPRX2))]
>"TARGET_VXE"
> -  ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
> -  "vpdi\t%V0,%v1,%V0,5"
> -  [(set_attr "op_type" "VRR")])
> -
> -(define_insn_and_split "tf_to_fprx2"
> - [(set (match_operand:FPRX2 0 "nonimmediate_operand" "=f,f")
> - (subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
> -  "TARGET_VXE"
> -  "#"
> -  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
> -  [(set (match_dup 2) (match_dup 3))
> -   (set (match_dup 4) (match_dup 5))]
>  {
> -  operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0);
> -  operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0);
> -  operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8);
> -  operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
> +  char buf[64];
> +  switch (which_alternative)
> +{
> +case 0:
> +  if (REGNO (operands[0]) == REGNO (operands[1]))
> + return "vpdi\t%V0,%v1,%V0,5";
> +  else
> + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> +case 1:
> +  {
> + const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> + snprintf (buf, si

[PING^6][PATCH v2] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-09-08 Thread jeevitha
Ping * 6

Please review!

On 21/03/24 6:21 pm, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> PTImode assists in generating even/odd register pairs on 128 bits.  When the
> user specifies PTImode as an attribute, it breaks because there is no
> internal type to handle this mode.  To address this, we have created a tree
> node with a dummy type to handle PTImode.  We are not documenting this dummy
> type since users are not allowed to use this type externally.
> 
> 2024-03-21  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
>   RS6000_BTI_INTPTI.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>   for PTImode type.
> 
> gcc/testsuite/
>   PR target/106895
>   * gcc.target/powerpc/pr106895.c: New testcase.
> 
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 6698274031b..f553c72779e 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -756,6 +756,15 @@ rs6000_init_builtins (void)
>else
>  ieee128_float_type_node = NULL_TREE;
>  
> +  /* PTImode to get even/odd register pairs.  */
> +  intPTI_type_internal_node = make_node(INTEGER_TYPE);
> +  TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode);
> +  layout_type (intPTI_type_internal_node);
> +  SET_TYPE_MODE (intPTI_type_internal_node, PTImode);
> +  t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST);
> +  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
> +   "__dummypti");
> +
>/* Vector pair and vector quad support.  */
>vector_pair_type_node = make_node (OPAQUE_TYPE);
>SET_TYPE_MODE (vector_pair_type_node, OOmode);
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 68bc45d65ba..b6078077b20 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2302,6 +2302,7 @@ enum rs6000_builtin_type_index
>RS6000_BTI_ptr_vector_quad,
>RS6000_BTI_ptr_long_long,
>RS6000_BTI_ptr_long_long_unsigned,
> +  RS6000_BTI_INTPTI,
>RS6000_BTI_MAX
>  };
>  
> @@ -2346,6 +2347,7 @@ enum rs6000_builtin_type_index
>  #define uintDI_type_internal_node (rs6000_builtin_types[RS6000_BTI_UINTDI])
>  #define intTI_type_internal_node  (rs6000_builtin_types[RS6000_BTI_INTTI])
>  #define uintTI_type_internal_node (rs6000_builtin_types[RS6000_BTI_UINTTI])
> +#define intPTI_type_internal_node (rs6000_builtin_types[RS6000_BTI_INTPTI])
>  #define float_type_internal_node  (rs6000_builtin_types[RS6000_BTI_float])
>  #define double_type_internal_node (rs6000_builtin_types[RS6000_BTI_double])
>  #define long_double_type_internal_node (rs6000_builtin_types[RS6000_BTI_long_double])
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106895.c b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> new file mode 100644
> index 000..56547b7fa9d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> @@ -0,0 +1,15 @@
> +/* PR target/106895 */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify the following generates even/odd register pairs.  */
> +
> +typedef __int128 pti __attribute__((mode(PTI)));
> +
> +void
> +set128 (pti val, pti *mem)
> +{
> +asm("stq %1,%0" : "=m"(*mem) : "r"(val));
> +}
> +
> +/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */
> 



RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-08 Thread Prathamesh Kulkarni
> -Original Message-
> From: Thomas Schwinge 
> Sent: Friday, September 6, 2024 2:31 PM
> To: Prathamesh Kulkarni ; Richard Biener
> 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub
> Jelinek 
> Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi!
Hi Thomas,
Thanks for the review!
> 
> On 2024-08-16T15:36:29+, Prathamesh Kulkarni
>  wrote:
> >> > Am 13.08.2024 um 17:48 schrieb Thomas Schwinge
> >> :
> >> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni
> >>  wrote:
> >> >>> From: Thomas Schwinge 
> >> >>> Sent: Friday, August 9, 2024 12:55 AM
> >> >
> >> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski 
> >> wrote:
> >>  On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
> >>   wrote:
> >> > compiled with -fopenmp -foffload=nvptx-none now fails with:
> >> > gcc: error: unrecognized command-line option '-m64'
> >> > nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status
> >> >>> compilation terminated.
> >> >>>
> >> >>> Heh.  Yeah...
> >> >>>
> >> > As mentioned in RFC email, this happens because
> >> > nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler
> >> >>> depending on whether offload_abi is OFFLOAD_ABI_LP64 or
> >> >>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
> >> >>> options.
> >> >
> >> >>> So, my idea is: instead of the current strategy that the host
> >> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc.,
> >> >>> which the 'mkoffload's then interpret and re-synthesize '-m64' etc.
> >> >>> -- how about we instead directly tell the 'mkoffload's the
> >> >>> relevant ABI options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead
> >> >>> synthesizes '-foffload-abi=-m64'
> >> >>> etc., which the 'mkoffload's can then readily use.  Could you
> >> >>> please give that a try, and/or does anyone see any issues with that
> >> >>> approach?
> >> >>>
> >> >>> And use something like '-foffload-abi=disable' to replace the current:
> >> >>>
> >> >>>/* PR libgomp/65099: Currently, we only support offloading in 64-bit
> >> >>>   configurations.  */
> >> >>>if (offload_abi == OFFLOAD_ABI_LP64)
> >> >>>  {
> >> >>>
> >> >>> (As discussed before, this should be done differently altogether,
> >> >>> but that's for another day.)
> >> >> Sorry, I don't quite follow. Currently we enable offloading if
> >> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from
> >> >> -foffload-abi=lp64. If we change -foffload-abi to instead specify
> >> >> host-specific ABI opts, I guess mkoffload will still need to
> >> >> somehow figure out which ABI is used, so it can disable offloading
> >> >> for 32-bit?  I suppose we could adjust TARGET_OFFLOAD_OPTIONS for
> >> >> each host to
> >> pass -foffload-abi=disable if TARGET_ILP32 is set and offload target
> >> is nvptx, but not sure if that'd be correct ?
> >> >
> >> > Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
> >> > implementations return either the correct host flags to be used
> by
> >> > the 'mkoffload's (the case that offloading is supported for the
> >> > current host flags/ABI configuration), or otherwise return '-
> foffload-abi=disable'.
> 
> Oh..., you're right of course: we do need to continue to tell the
> 'mkoffload's which kind of offload code to generate!  My bad...
> 
> >> >> I added another option -foffload-abi-host-opts to specify host abi
> >> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit which
> >> >> mkoffload can use to enable/disable offloading (as before).
> >> >
> >> > I'm not sure, however, if this additional option is really necessary?
> Well, my concern was if that'd change the behavior for TARGET_ILP32?
> IIUC, currently for -foffload-abi=ilp32, mkoffload will create an empty C
> file for ptx_cfile_name (instead of munged ptx assembly since
> offloading will be disabled), and pass that to host compiler with
> the -m32 option (in compile_native).
> >
> > If we change -foffload-abi to specify ABI host opts, and pass
> > -foffload-abi=disable for TARGET_ILP32 in TARGET_OFFLOAD_OPTIONS,
> > mkoffload will no longer be able to pass 32-bit ABI opts to host
> > compiler, which may result in linker error (arch mismatch?) if the
> > host object files are 32-bit ABI and xnvptx-none.o is 64-bit
> (assuming the host compiler is configured to generate 64-bit code-gen
> by default) ?
> >
> > So, I thought to add another option -foffload-abi-host-opts to pass
> > host-specific ABI opts, and keep -foffload-abi as-is to infer ABI
> > type for enabling/disabling offloading.
> 
> Quite right, yes.
> 
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Tuesday, August 13, 2024 10:06 PM
> 
> >> Since we do not support 32 -> 64 bit offload
> 
> We don't -- but it's generally possible.  As Tobias recently educated
> me, the OpenMP specification expl

Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635] (was: [PATCH] Match: Fix ordered and nonequal)

2024-09-08 Thread Thomas Schwinge
Hi!

On 2024-09-04T13:43:45+0800, "Hu, Lin1"  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
> +
> +int is_ordered_and_nonequal_sh_1 (float a, float b)
> +{
> +  return !__builtin_isunordered (a, b) && (a != b);
> +}
> +
> +[...]
> +
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\n\r]*<>" 9 "forwprop1" } } */

OK to push
"Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 
'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635]",
see attached?


Regards
 Thomas


From 3e85cb373fb86db5fad86a452a69e713c7050f16 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 9 Sep 2024 08:39:10 +0200
Subject: [PATCH] Match: Fix ordered and nonequal: Fix
 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT'
 [PR116635]

Fix up to make 'gcc.dg/opt-ordered-and-nonequal-1.c' of
commit 91421e21e8f0f05f440174b8de7a43a311700e08
"Match: Fix ordered and nonequal" work for default
'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations.

	PR testsuite/116635
	gcc/testsuite/
	* gcc.dg/opt-ordered-and-nonequal-1.c: Fix re
	'LOGICAL_OP_NON_SHORT_CIRCUIT'.
---
 gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
index 6d102c2bd0c..d61c3322214 100644
--- a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
+++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
+/* Make this work for default 'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations:
+   { dg-additional-options "--param logical-op-non-short-circuit=1" } */
 
 int is_ordered_and_nonequal_sh_1 (float a, float b)
 {
-- 
2.34.1



Re: [patch,reload] PR116326: Add #define IN_RELOAD1_CC

2024-09-08 Thread Richard Biener
On Sat, Sep 7, 2024 at 8:22 PM H.J. Lu  wrote:
>
>
> On Sat, Sep 7, 2024, 11:07 AM Georg-Johann Lay  wrote:
>>
>> Am 07.09.24 um 19:51 schrieb H.J. Lu:
>> > On Sat, Sep 7, 2024 at 10:49 AM Georg-Johann Lay  wrote:
>> >> Am 07.09.24 um 18:35 schrieb Jeff Law:
>> >>> On 9/6/24 3:40 AM, Georg-Johann Lay wrote:
>>  The reason for PR116326 is that LRA and reload require different
>>  ELIMINABLE_REGS for a multi-register frame pointer.  As ELIMINABLE_REGS
>>  is used to initialize static const objects, it is not possible to make
>>  ELIMINABLE_REGS to depend on options or patch it in some target hook.
>> 
>>  It was also concluded that it is not desirable to adjust reload so that
>>  it behaves like LRA, but a hack like #define IN_RELOAD1_CC at the top
>>  of reload1.cc would be fine, see
>> 
>>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116326#c8
>> 
>>  This is an according patch that defines IN_RELOAD1_CC and uses it in
>>  avr.h to define ELIMINABLE_REGS.
>> 
>>  This is only required for trunk.
>> 
>>  PR116326 occurred for some test case in avr-torture.exp, so I didn't
>>  duplicate the test case.
>> 
>>  As it appears, this patch also fixes:
>> 
>>  https://gcc.gnu.org/PR116324
>>  https://gcc.gnu.org/PR116325
>>  https://gcc.gnu.org/PR116550
>> 
>>  Johann
>> 
>>  --
>> 
>>  AVR: target/116326 - Adjust ELIMINABLE_REGS to reload resp. LRA.
>> 
>>    PR target/116326
>>  gcc/
>>    * reload1.cc (IN_RELOAD1_CC): Define prior to all includes.
>>    * config/avr/avr.h (ELIMINABLE_REGS): Depend on IN_RELOAD1_CC.
>> >>> I'm going to have to NAK this.  It's too hackish, even with reload going
>> >>> away.
>> >>>
>> >>> Jeff
>> >>
>> >> So what is your proposal?
>> >>
>> >> I'd agree with Richard that we don't want to change reload
>> >> implementations such that they agree on ELIMINABLE_REGS.
>> >>
>> >> A different way would be to change reload1.cc such that it includes
>> >> reload.h prior to tm.h, i.e. prior to backend.h.  This would make the
>> >> hack avr realm entirely, but it won't be trivially no-op.
>> >>
>> >> Johann
>> >
>> > Why not add RELOAD_ELIMINABLE_REGS and default it
>> > to ELIMINABLE_REGS?
>>
>> I can't find RELOAD_ELIMINABLE_REGS anywhere.
>> So are you proposing a new hook macro?
>
>
> A new macro.

Sounds good to me btw. - the other alternative is to avoid having
-m{,no-}lra and default to LRA now (not sure if that's feasible).

Richard.

>>
>> Johann
>>