Re: [PATCH 0/8] Tweak predicate macros in tree

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, Jul 19, 2023 at 1:34 AM Ken Matsui via Gcc-patches
 wrote:
>
> This patch series tweaks predicate macros in tree.h to make the code more
> readable. TYPE_REF_P is moved to tree.h and used for INDIRECT_TYPE_P and
> TYPE_REF_IS_LVALUE. TYPE_PTR_P is also moved to tree.h and used for
> INDIRECT_TYPE_P. POINTER_TYPE_P in tree.h is replaced with INDIRECT_TYPE_P
> since it is ambiguous. TYPE_REF_IS_LVALUE is defined in tree.h through
> TYPE_REF_P and TYPE_REF_IS_RVALUE. Code with the same behavior as those
> predicate macros is replaced with the macros for clarity.
>
> This work went as far as implementing the __is_lvalue_reference
> built-in trait and optimizing the is_lvalue_reference trait. However, those
> changes were dropped since I did not observe any performance improvements.
> For those who are interested in the benchmark results, they can be found
> below:
>
> 1. is_lvalue_reference
>
> https://github.com/ken-matsui/gcc-benches/blob/main/is_lvalue_reference.md#tue-jul-18-033708-pm-pdt-2023
>
> Time: +1.35432%
> Peak Memory Usage: -0.103283%
> Total Memory Usage: No difference
>
> 2. is_lvalue_reference_v
>
> https://github.com/ken-matsui/gcc-benches/blob/main/is_lvalue_reference_v.md#tue-jul-18-034236-pm-pdt-2023
>
> Time: No difference
> Peak Memory Usage: -0.426872%
> Total Memory Usage: -0.677638%
>
> Ken Matsui (8):
>   c++, tree: Move TYPE_REF_P to tree.h
>   gcc: Use TYPE_REF_P
>   c++, tree: Move TYPE_PTR_P to tree.h
>   c++, tree: Move INDIRECT_TYPE_P to tree.h
>   gcc: Use INDIRECT_TYPE_P instead of POINTER_TYPE_P

No, please not.  Definitely not.  The tree code of POINTER_TYPE_P is
POINTER_TYPE so the predicate name is exactly correct.
REFERENCE_TYPE_P would be the canonical predicate for REFERENCE_TYPE,
not TYPE_REF_P.

I don't think the C++ frontend should be the one to decide about middle-end
tree predicate macros.

>   tree: Remove POINTER_TYPE_P
>   tree: Define TYPE_REF_IS_LVALUE
>   c++, lto: Use TYPE_REF_IS_LVALUE
>
>  gcc/ada/gcc-interface/ada-tree.h   |   2 +-
>  gcc/ada/gcc-interface/decl.cc  |   6 +-
>  gcc/ada/gcc-interface/trans.cc |  16 +--
>  gcc/ada/gcc-interface/utils.cc |  12 +-
>  gcc/ada/gcc-interface/utils2.cc|  14 +-
>  gcc/alias.cc   |  12 +-
>  gcc/analyzer/analyzer.cc   |   4 +-
>  gcc/analyzer/call-details.h|   2 +-
>  gcc/analyzer/call-summary.cc   |   2 +-
>  gcc/analyzer/checker-event.cc  |   4 +-
>  gcc/analyzer/constraint-manager.cc |   2 +-
>  gcc/analyzer/engine.cc |   4 +-
>  gcc/analyzer/program-state.cc  |   2 +-
>  gcc/analyzer/region-model-manager.cc   |   6 +-
>  gcc/analyzer/region-model.cc   |   6 +-
>  gcc/analyzer/sm.cc |   4 +-
>  gcc/analyzer/svalue.cc |   2 +-
>  gcc/analyzer/varargs.cc|   2 +-
>  gcc/asan.cc|   4 +-
>  gcc/builtins.cc|  24 ++--
>  gcc/c-family/c-ada-spec.cc |   2 +-
>  gcc/c-family/c-attribs.cc  |  32 ++---
>  gcc/c-family/c-common.cc   |  41 +++---
>  gcc/c-family/c-omp.cc  |   8 +-
>  gcc/c-family/c-pretty-print.cc |   4 +-
>  gcc/c-family/c-ubsan.cc|  10 +-
>  gcc/c-family/c-warn.cc |  34 ++---
>  gcc/c/c-decl.cc|   8 +-
>  gcc/c/c-parser.cc  |   4 +-
>  gcc/c/c-typeck.cc  |  40 +++---
>  gcc/c/gimple-parser.cc |   8 +-
>  gcc/calls.cc   |   2 +-
>  gcc/cfgexpand.cc   |   6 +-
>  gcc/cgraph.cc  |   2 +-
>  gcc/cgraphunit.cc  |   2 +-
>  gcc/config/aarch64/aarch64-builtins.cc |   2 +-
>  gcc/config/aarch64/aarch64-sve-builtins.cc |   2 +-
>  gcc/config/aarch64/aarch64.cc  |   6 +-
>  gcc/config/arc/arc.cc  |   2 +-
>  gcc/config/arm/arm-builtins.cc |   6 +-
>  gcc/config/arm/arm-mve-builtins.cc |   2 +-
>  gcc/config/avr/avr.cc  |   6 +-
>  gcc/config/epiphany/epiphany.cc|   2 +-
>  gcc/config/gcn/gcn-tree.cc |   2 +-
>  gcc/config/gcn/gcn.cc  |   6 +-
>  gcc/config/i386/i386-builtins.cc   |   2 +-
>  gcc/config/i386/i386-options.cc|   2 +-
>  gcc/config/i386/i386.cc|  10 +-
>  gcc/config/m32c/m32c.cc|   2 +-
>  gcc/config/m68k/m68k.cc|   4 +-
>  gcc/config/mips/mips.cc|   2 +-
>  gcc/config/mn10300/mn10300.cc  |   2 +-
>  gcc/config/msp430/msp430.cc|   2 +-
>  gcc/config/nios2/nios2.cc  |   2 +-
>  gcc/config/pa/pa.cc 

Re: [PATCH] Fix PR110726: a | (a == b) can sometimes produce wrong code

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, Jul 19, 2023 at 3:53 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So I had missed/forgotten that EQ_EXPR could have a non-boolean
> type for GENERIC when I implemented r14-2556-g0407ae8a7732d9.
> This patch adds a check for a one-bit-precision integral type,
> which fixes the problem.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR tree-optimization/110726
>
> gcc/ChangeLog:
>
> * match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)):
> Add checks to make sure the type is a one-bit-precision
> integral type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/bitops-1.c: New test.
> ---
>  gcc/match.pd  | 12 +--
>  .../gcc.c-torture/execute/bitops-1.c  | 33 +++
>  2 files changed, 42 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitops-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 054e6585876..4dfe92623f7 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1229,7 +1229,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* (a | b) & (a == b)  -->  a & b (boolean version of the above).  */
>  (simplify
>   (bit_and:c (bit_ior @0 @1) (nop_convert? (eq:c @0 @1)))
> - (bit_and @0 @1))
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

that's really a constraint on 'type', not sure if it would be clearer
to test that.
What's the nop_convert you've seen in practice here?  With integer comparison
result shouldn't those be convert? instead?

> +  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
> +  (bit_and @0 @1)))
>
>  /* a | ~(a ^ b)  -->  a | ~b  */
>  (simplify
> @@ -1239,7 +1241,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* a | (a == b)  -->  a | (b^1) (boolean version of the above). */
>  (simplify
>   (bit_ior:c @0 (nop_convert? (eq:c @0 @1)))
> - (bit_ior @0 (bit_xor @1 { build_one_cst (type); })))
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
> +  (bit_ior @0 (bit_xor @1 { build_one_cst (type); }))))
>
>  /* (a | b) | (a &^ b)  -->  a | b  */
>  (for op (bit_and bit_xor)
> @@ -1255,7 +1259,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* (a & b) | (a == b)  -->  a == b  */
>  (simplify
>   (bit_ior:c (bit_and:c @0 @1) (nop_convert?@2 (eq @0 @1)))
> - @2)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
> +  @2))
>
>  /* ~(~a & b)  -->  a | ~b  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.c-torture/execute/bitops-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/bitops-1.c
> new file mode 100644
> index 000..cfaa6b9fd26
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/bitops-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/110726 */
> +
> +#define DECLS(n,VOL)   \
> +__attribute__((noinline,noclone))  \
> +int h##n(VOL int A, VOL int B){\
> +return (A | B) & (A == B); \
> +}  \
> +__attribute__((noinline,noclone))  \
> +int i##n(VOL int A, VOL int B){\
> +return A | (A == B);   \
> +}  \
> +__attribute__((noinline,noclone))  \
> +int k##n(VOL int A, VOL int B){\
> +return (A & B) | (A == B); \
> +}  \
> +
> +DECLS(0,)
> +DECLS(1,volatile)
> +
> +int values[] = { 0, 1, 2, 3, -1, -2, -3, 0x10080 };
> +int numvalues = sizeof(values)/sizeof(values[0]);
> +
> +int main(){
> +for(int A = 0; A < numvalues; A++)
> +  for(int B = 0; B < numvalues; B++)
> +   {
> + int a = values[A];
> + int b = values[B];
> + if (h0 (a, b) != h1 (a, b)) __builtin_abort();
> + if (i0 (a, b) != i1 (a, b)) __builtin_abort();
> + if (k0 (a, b) != k1 (a, b)) __builtin_abort();
> +   }
> +}
> --
> 2.31.1
>


Re: [PATCH] RISC-V: Fix testcase failed when default -mcmodel=medany

2023-07-19 Thread Lehua Ding
Committed V2 patch, thank you so much.

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, 19 Jul 2023, YunQiang Su wrote:

> Richard Biener via Gcc-patches wrote on Wed, Jul 19, 2023 at 14:27:
> >
> > On Wed, 19 Jul 2023, YunQiang Su wrote:
> >
> > > PR #104914
> > >
> > > When working with code such as
> > >   int val;
> > >   ((unsigned char*)&val)[3] = *buf;
> > >   if (val > 0) ...
> > > the RTX mode is obtained from the REG instead of the SUBREG, which
> > > means DINS is used instead of INS.  As a result, wrong code is
> > > generated on architectures that sign-extend by default, like MIPS64.
> > >
> > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > store_integral_bit_field if:
> > >   modes of op0 and str_rtx are INT;
> > >   length of op0 is greater than str_rtx.
> > >
> > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > mips64el-linux-gnuabi64 without regression.
> >
> > I still think you are "fixing" this in the wrong place.  The bugzilla
> > audit trail points to combine and later notes an eventual expansion
> > issue (but for another testcase/target).
> >
> > You have to explain in more detail on what is wrong with the initial
> > RTL on mips.
> >
> 
> In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is like
> 
> (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> (const_int 8 [0x8])
> (const_int 0 [0]))
> (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
>  (nil))
> 
> Note that all of the REGs are in DImode. On MIPS64, this expands to a `DINS`
> instruction.
> In fact, we expect an SImode operation here, since `val` is an `int` in
> the C code.
> 
> With my patch, the RTX will be like:
> 
> (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
> (const_int 8 [0x8])
> (const_int 0 [0]))
> (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
>  (nil))

But if this RTL is correct then the above with DImode is correct as
well and the issue is in the backend definition of the instruction
defining 'DINS'?

> So the operation will be SImode, aka `INS` instruction for MIPS64.
> 
> The problem has two root causes:
> 1. MIPS's `INS` instruction always sign-extends its result, while `DINS`
> does not.
> li $7, 0xff
> li $8, 0
> ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> The value of $8 will be 0xff ff ff ff ff 00 00 00.

But that's wrong.  (set (zero_extract:SI ...) ...) should not affect
bits outside of the indicated range.

@findex zero_extract
@item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
Like @code{sign_extract} but refers to an unsigned or zero-extended
bit-field.  The same sequence of bits are extracted, but they
are filled to an entire word with zeros instead of by sign-extension.

Unlike @code{sign_extract}, this type of expressions can be lvalues
in RTL; they may appear on the left side of an assignment, indicating
insertion of a value into the specified bit-field.
@end table
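
To make the two behaviours under discussion concrete, here is a minimal C sketch. It is not GCC or MIPS backend code: insert_field models a bit-field insertion that leaves every bit outside the field untouched, and insert_field_32_sign_extended is an assumed model of the INS behaviour reported in this thread (32-bit insert, result sign-extended to 64 bits). Both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low SIZE bits of SRC into DST at bit POS, leaving every
   other bit of DST untouched.  Requires SIZE > 0 and POS + SIZE <= 64.  */
static uint64_t
insert_field (uint64_t dst, uint64_t src, unsigned pos, unsigned size)
{
  uint64_t field = size >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << size) - 1;
  uint64_t mask = field << pos;
  return (dst & ~mask) | ((src << pos) & mask);
}

/* Assumed model of a 32-bit insert on a 64-bit register: the insertion
   happens in the low 32 bits and the 32-bit result is then sign-extended,
   so the upper 32 bits of DST are overwritten.  Requires POS + SIZE <= 32.  */
static uint64_t
insert_field_32_sign_extended (uint64_t dst, uint64_t src,
                               unsigned pos, unsigned size)
{
  uint64_t low = insert_field (dst & UINT64_C (0xffffffff), src, pos, size)
                 & UINT64_C (0xffffffff);
  return (low & UINT64_C (0x80000000))
         ? (low | UINT64_C (0xffffffff00000000))
         : low;
}

/* The register values used in the examples of this thread.  */
static void
check_insert_models (void)
{
  /* Plain insertion into a zero register.  */
  assert (insert_field (0, 0xff, 24, 8) == UINT64_C (0x00000000ff000000));
  /* Bits outside the field are preserved.  */
  assert (insert_field (UINT64_C (0x1234567800000000), 0xff, 24, 8)
          == UINT64_C (0x12345678ff000000));
  /* The sign-extending model changes bits 32..63.  */
  assert (insert_field_32_sign_extended (0, 0xff, 24, 8)
          == UINT64_C (0xffffffffff000000));
}
```

The difference between the last assertion and the first one is the point of contention: the second model modifies bits above the 8-bit field.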


> li $7, 0xff
> li $8, 0
> dins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> The value of $8 will be 0x 00 00 00 00 ff 00 00 00.

which isn't correct either.

If you look a few dumps further you'll see which instruction was
recognized, I suspect the machine description is simply wrong here?

> 2. Because most MIPS instructions that work on 32-bit values (those
> without `d` as their first character, with a few exceptions) sign-extend
> their results, the MIPS backend simply ignores `extendsidi2`, i.e. the RTX
> 
> (insn 14 13 15 2 (set (reg/v:DI 200 [ val ])
> (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) "xx.c":5:29 -1
>  (nil))
> 
> 
> 
> > Richard.
> >
> > > gcc/ChangeLog:
> > > PR: 104914.
> > > * expmed.cc(store_bit_field_1): Pass str_rtx and its mode
> > >   to store_integral_bit_field if the length of op0 is greater
> > >   than str_rtx.
> > >
> > > gcc/testsuite/ChangeLog:
> > > PR: 104914.
> > >   * gcc.target/mips/pr104914.c: New testcase.
> > > ---
> > >  gcc/expmed.cc| 20 +---
> > >  gcc/testsuite/gcc.target/mips/pr104914.c | 17 +
> > >  2 files changed, 34 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
> > >
> > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > > index fbd4ce2d42f..5531c19e891 100644
> > > --- a/gcc/expmed.cc
> > > +++ b/gcc/expmed.cc
> > > @@ -850,6 +850,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> > > poly_uint64 bitnum,
> > >   since that case is valid for any mode.  The following cases are only
> > >   valid for integral modes.  */
> > >opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
> > > +  opt_scalar_int_mode str_mode = int_mode_for_mode (GET_MODE (str_rtx));
> > >scalar_int_mode imode;
> > >if (!op0_mode.exists (&imode) || imode != GET_MODE (op0))
> > >  {
> > > @@ -881,9 +882,22 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> > > poly_uint64 bitnum,
> > >   op0 = gen_lowpart (op0_mode.r

Re: [PATCH] Fix PR110726: a | (a == b) can sometimes produce wrong code

2023-07-19 Thread Andrew Pinski via Gcc-patches
On Wed, Jul 19, 2023 at 12:16 AM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, Jul 19, 2023 at 3:53 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > So I had missed/forgotten that EQ_EXPR could have a non-boolean
> > type for GENERIC when I implemented r14-2556-g0407ae8a7732d9.
> > This patch adds a check for a one-bit-precision integral type,
> > which fixes the problem.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > PR tree-optimization/110726
> >
> > gcc/ChangeLog:
> >
> > * match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)):
> > Add checks to make sure the type is a one-bit-precision
> > integral type.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/execute/bitops-1.c: New test.
> > ---
> >  gcc/match.pd  | 12 +--
> >  .../gcc.c-torture/execute/bitops-1.c  | 33 +++
> >  2 files changed, 42 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitops-1.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 054e6585876..4dfe92623f7 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1229,7 +1229,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  /* (a | b) & (a == b)  -->  a & b (boolean version of the above).  */
> >  (simplify
> >   (bit_and:c (bit_ior @0 @1) (nop_convert? (eq:c @0 @1)))
> > - (bit_and @0 @1))
> > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>
> that's really a constraint on 'type', not sure if it would be clearer
> to test that.
> What's the nop_convert you've seen in practice here?  With integer comparison
> result shouldn't those be convert? instead?

The case where nop_convert would happen is:
```
struct f
{
  signed a:1;
  signed b:1;
  signed c:1;
};

void f(struct f t, struct f *t1)
{
  t.c = (t.a == t.b);
  t1->c =   (t.a | t.b) & t.c;
}
```
I know the above does not show up much but it definitely can show up.

In the testcase in the bug report, the EQ_EXPR didn't have a convert
around it, but the type was simply changed to the long type, and we
cannot treat `a == b` like `~(a^b)`, which only works for
1-bit integers (unsigned or signed).
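
As a sanity check of that claim, here is a small standalone C sketch (not part of the patch or the testsuite) that compares each original expression with its folded form. The helper names folds_agree and check_folds are made up for illustration, and one-bit values are modelled simply as 0 and 1.

```c
#include <assert.h>

/* Return 1 if all three folded forms give the same result as the
   original expressions for the operands A and B.  */
static int
folds_agree (int a, int b)
{
  int eq = (a == b);
  return ((a | b) & eq) == (a & b)        /* (a | b) & (a == b) -> a & b */
         && (a | eq) == (a | (b ^ 1))     /* a | (a == b) -> a | (b ^ 1) */
         && ((a & b) | eq) == eq;         /* (a & b) | (a == b) -> a == b */
}

static void
check_folds (void)
{
  /* With one-bit operands every combination agrees.  */
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      assert (folds_agree (a, b));

  /* With wider operands each fold has a counterexample.  */
  assert (!folds_agree (2, 2));   /* (2 | 2) & 1 is 0, but 2 & 2 is 2 */
  assert (!folds_agree (0, 2));   /* 0 | 0 is 0, but 0 | (2 ^ 1) is 3 */
  assert (!folds_agree (1, 3));   /* (1 & 3) | 0 is 1, but 1 == 3 is 0 */
}
```

The counterexamples use values that also appear in the values[] array of bitops-1.c.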

Thanks,
Andrew Pinski



Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-19 Thread Manolis Tsamis
Hi Vineet, Jeff,

On Wed, Jul 19, 2023 at 7:31 AM Jeff Law  wrote:
>
>
>
> On 7/18/23 17:42, Vineet Gupta wrote:
> > Hi Manolis,
> >
> > On 7/18/23 11:01, Jeff Law via Gcc-patches wrote:
> >> Vineet @ Rivos has indicated he stumbled across an ICE with the V3
> >> code.  Hopefully he'll get a testcase for that extracted shortly.
> >
> > Yeah, I was trying to build SPEC2017 with this patch and ran into ICE
> > for several of them with -Ofast build: The reduced test from 455.nab is
> > attached here.
> > The issue happens with v2 as well, so not something introduced by v3.
> >
> > There's ICE in cprop_hardreg which immediately follows f-m-o.
> >
> >
> > The protagonist is insn 93, which starts off in combine as a simple set of
> > a DF 0.
> >
> > | sff.i.288r.combine:(insn 93 337 94 8 (set (reg/v:DF 236 [ e ])
> > | sff.i.288r.combine- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11
> > 190 {*movdf_hardfloat_rv64}
> >
> > Subsequently reload transforms it into SP + offset
> >
> > | sff.i.303r.reload:(insn 93 337 94 9 (set (mem/c:DF (plus:DI (reg/f:DI
> > 2 sp)
> > | sff.i.303r.reload- (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
> > | sff.i.303r.reload- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190
> > {*movdf_hardfloat_rv64}
> > | sff.i.303r.reload- (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
> >
> > It gets processed by f-m-o and lands in cprop_hardreg, where it triggers
> > ICE.
> >
> > | (insn 93 337 523 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
> > | (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
> > | (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 -1
> > ^^^
> > |  (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
> > |(nil)))
> > | during RTL pass: cprop_hardreg
> >
> > Here's my analysis:
> >
> > f-m-o: do_check_validity() -> insn_invalid_p() tries to recog() a
> > modified version of insn 93 (actually there is no change, so perhaps
> > something we can optimize later). The corresponding md pattern
> > movdf_hardfloat_rv64 no longer matches since it expects REG_P for
> > operand0, while reload has converted it into SP + offset. f-m-o then
> > does the right thing by invalidating INSN_CODE=-1 for a subsequent
> > recog() to work correctly.
> > But it seems this -1 lingers into the next pass, and trips up
> > copyprop_hardreg_forward_1() -> extract_constrain_insn()
> > So I don't know what the right fix here should be.
> This is a bug in the RISC-V backend.  I actually fixed basically the
> same bug in another backend that was exposed by the f-m-o code.
>

I stumbled upon the same thing when doing an aarch64 bootstrap build yesterday.
Given that this causes issues, maybe doing
  int icode = INSN_CODE (insn);
  ...
  INSN_CODE (insn) = icode;
is a good option; it should also be more performant.
Even with that I'm still getting a segfault while doing a bootstrap
build that I'm investigating.
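
The save/restore idea can be sketched outside of GCC as follows. Everything here is a hypothetical stand-in: struct mock_insn, mock_probe_invalid and probe_preserving_code are not GCC types or functions; they only model "a validity probe that clobbers the cached code" and "a wrapper that puts the old code back".

```c
#include <assert.h>

/* Hypothetical stand-in for an insn; only the cached pattern code is
   modelled here.  */
struct mock_insn
{
  int code;
};

/* Hypothetical validity probe.  Like the check described above, it
   leaves the cached code at -1 when re-recognition fails.  Returns
   nonzero for "invalid".  */
static int
mock_probe_invalid (struct mock_insn *insn)
{
  insn->code = -1;
  return 1;
}

/* Run the probe, but restore the cached code afterwards so that later
   consumers see the same value they would have seen without the probe.  */
static int
probe_preserving_code (struct mock_insn *insn)
{
  int icode = insn->code;
  int invalid = mock_probe_invalid (insn);
  insn->code = icode;
  return invalid;
}

static void
check_probe (void)
{
  struct mock_insn insn = { 190 };
  assert (probe_preserving_code (&insn) == 1);
  assert (insn.code == 190);
}
```

Whether restoring the code is the right fix, as opposed to fixing the backend pattern, is exactly what is being debated in this thread; the sketch only shows the mechanics.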

>
> >
> > In a run with -fno-fold-mem-offsets, the same insn 93 is successfully
> > grok'ed by cprop_hardreg,
> >
> > | (insn 93 337 522 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
> > |(const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
> > |(const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190
> > {*movdf_hardfloat_rv64}
> > ^^^
> > | (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
> > |(nil)))
> >
> > P.S. I wonder if it is a good idea in general to call recog() post
> > reload since the insn could be changed sufficiently to no longer match
> > the md patterns. Of course I don't know the answer.
> If this ever causes a problem, it's a backend bug.  It's that simple.
>
> Conceptually it should always be safe to set INSN_CODE to -1 for any insn.
>
> Odds are for this specific case in the RV backend, we just need a
> constraint to store 0.0 into a memory location.  That can actually be
> implemented as a store from x0 since 0.0 has the bit pattern 0x0.  This
> is probably a good thing to expose anyway as an optimization and can
> move forward independently of the f-m-o patch.
>
>
>
> >
> > P.S.2 When debugging code, I noticed a minor annoyance in the patch with
> > the whole fold_mem_offsets_driver() switch-case indirection. It doesn't
> > seem to be serving any purpose, and we could simply call corresponding
> > do_* routines in execute () itself.
> We were in the process of squashing some of this out of the
> implementation.   I hadn't looked at the V3 patch to see how much
> progress had been made on this yet.
>

Thanks for pointing that out, Vineet!
When I refactored the code into the separate do_* functions it never
occurred to me that both the _driver function and the state enum are
now useless. I will remove all of this in the next iteration.

Manolis

> Thanks for digging into this!
>
> jeff


Re: [PATCH] Fix PR110726: a | (a == b) can sometimes produce wrong code

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, Jul 19, 2023 at 9:32 AM Andrew Pinski  wrote:
>
> On Wed, Jul 19, 2023 at 12:16 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Wed, Jul 19, 2023 at 3:53 AM Andrew Pinski via Gcc-patches
> >  wrote:
> > >
> > > So I had missed/forgotten that EQ_EXPR could have a non-boolean
> > > type for GENERIC when I implemented r14-2556-g0407ae8a7732d9.
> > > This patch adds a check for a one-bit-precision integral type,
> > > which fixes the problem.
> > >
> > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > PR tree-optimization/110726
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)):
> > > Add checks to make sure the type is a one-bit-precision
> > > integral type.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.c-torture/execute/bitops-1.c: New test.
> > > ---
> > >  gcc/match.pd  | 12 +--
> > >  .../gcc.c-torture/execute/bitops-1.c  | 33 +++
> > >  2 files changed, 42 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitops-1.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 054e6585876..4dfe92623f7 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1229,7 +1229,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  /* (a | b) & (a == b)  -->  a & b (boolean version of the above).  */
> > >  (simplify
> > >   (bit_and:c (bit_ior @0 @1) (nop_convert? (eq:c @0 @1)))
> > > - (bit_and @0 @1))
> > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >
> > that's really a constraint on 'type', not sure if it would be clearer
> > to test that.
> > What's the nop_convert you've seen in practice here?  With integer 
> > comparison
> > result shouldn't those be convert? instead?
>
> The case where nop_convert would happen is:
> ```
> struct f
> {
>   signed a:1;
>   signed b:1;
>   signed c:1;
> };
>
> void f(struct f t, struct f *t1)
> {
>   t.c = (t.a == t.b);
>   t1->c =   (t.a | t.b) & t.c;
> }
> ```
> I know the above does not show up much but it definitely can show up.
>
> In the testcase in the bug report, the EQ_EXPR didn't have a convert
> around it, but the type was simply changed to the long type, and we
> cannot treat `a == b` like `~(a^b)`, which only works for
> 1-bit integers (unsigned or signed).

I see, the patch is OK then.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>

Re: [PATCH V2] RISC-V: Throw compilation error for unknown sub-extension or supervisor extension

2023-07-19 Thread Lehua Ding
Committed to the trunk, thank you so much.

Re: [PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction

2023-07-19 Thread Richard Biener via Gcc-patches
On Sat, 15 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch adds the mask_len_fold_left_plus pattern to support in-order
> floating-point reduction for targets that support len loop control.
> 
> Consider the following case:
> double
> foo2 (double *__restrict a,
>  double init,
>  int *__restrict cond,
>  int n)
> {
> for (int i = 0; i < n; i++)
>   if (cond[i])
> init += a[i];
> return init;
> }
> 
> ARM SVE:
> 
> ...
> vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
> vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
> _36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
> ...
> 
> For RVV, we want to see:
> ...
> _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
> ...

OK.

Richard.
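
For reference, the strictly in-order semantics that the md.texi hunk in this patch documents for mask_len_fold_left_plus can be written as a scalar C sketch. This is only an illustration of the documented loop, not code from the patch; the function name, the unsigned char mask representation and the int types for len and bias are assumptions.

```c
#include <assert.h>

/* Scalar model of the documented operation: start from INIT and add the
   active elements of SRC one by one, in index order, for the first
   LEN + BIAS elements.  No reassociation takes place.  */
static double
mask_len_fold_left_plus_model (double init, const double *src,
                               const unsigned char *mask, int len, int bias)
{
  double acc = init;
  for (int i = 0; i < len + bias; i++)
    if (mask[i])
      acc += src[i];
  return acc;
}

static void
check_fold_model (void)
{
  const double src[4] = { 1.0, 2.0, 4.0, 8.0 };
  const unsigned char mask[4] = { 1, 0, 1, 1 };

  /* All four elements in range: 10 + 1 + 4 + 8.  */
  assert (mask_len_fold_left_plus_model (10.0, src, mask, 4, 0) == 23.0);
  /* Only three elements in range, either via len or via a negative bias.  */
  assert (mask_len_fold_left_plus_model (10.0, src, mask, 3, 0) == 15.0);
  assert (mask_len_fold_left_plus_model (10.0, src, mask, 4, -1) == 15.0);
}
```

The values are chosen to be exactly representable, so the comparisons do not depend on rounding.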

> gcc/ChangeLog:
> 
> * doc/md.texi: Add mask_len_fold_left_plus.
> * internal-fn.cc (mask_len_fold_left_direct): Ditto.
> (expand_mask_len_fold_left_optab_fn): Ditto.
> (direct_mask_len_fold_left_optab_supported_p): Ditto.
> * internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> 
> ---
>  gcc/doc/md.texi | 13 +
>  gcc/internal-fn.cc  |  5 +
>  gcc/internal-fn.def |  3 +++
>  gcc/optabs.def  |  1 +
>  4 files changed, 22 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cbcb992e5d7..6f44e66399d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5615,6 +5615,19 @@ no reassociation.
>  Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  (operand 3) that specifies which elements of the source vector should be 
> added.
>  
> +@cindex @code{mask_len_fold_left_plus_@var{m}} instruction pattern
> +@item @code{mask_len_fold_left_plus_@var{m}}
> +Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> +(operand 3), len operand (operand 4) and bias operand (operand 5) that
> +performs following operations strictly in-order (no reassociation):
> +
> +@smallexample
> +operand0 = operand1;
> +for (i = 0; i < LEN + BIAS; i++)
> +  if (operand3[i])
> +operand0 += operand2[i];
> +@end smallexample
> +
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e698f0bffc7..2bf4fc492fe 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -190,6 +190,7 @@ init_internal_fns ()
>  #define fold_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
> +#define mask_len_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
>  
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
> @@ -3890,6 +3891,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, 
> convert_optab optab,
>  #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>  
> +#define expand_mask_len_fold_left_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 4)
>  
> @@ -3997,6 +4001,7 @@ multi_vector_optab_supported_p (convert_optab optab, 
> tree_pair types,
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
> +#define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
>  #define direct_vec_set_optab_supported_p direct_optab_supported_p
>  #define direct_vec_extract_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index ea750a921ed..d3aec51b1f2 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -319,6 +319,9 @@ DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | 
> ECF_NOTHROW,
>  DEF_INTERNAL_OPTAB_FN (MASK_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>  mask_fold_left_plus, mask_fold_left)
>  
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
> +mask_len_fold_left_plus, mask_len_fold_left)
> +
>  /* Unary math functions.  */
>  DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
>  DEF_INTERNAL_FLT_FN (ACOSH, ECF_CONST, acosh, unary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 3dae228fba6..7023392979e 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -385,6 +385,7 @@ OPTAB_D (reduc_ior_scal_optab,  "reduc_ior_scal_$a")
>  OPTAB_D (reduc_xor_scal_optab,  "reduc_xor_scal_$a")
>  OPTAB_D (fold_left_plus_optab, "fold_left_plus_$a")
>  OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
> +OPTAB_D (mask_

Re: [RFC/RFT, V2 0/3] Add compiler support for Kernel Control Flow Integrity

2023-07-19 Thread Dan Li via Gcc-patches
Hi Kees,

Sincerely sorry, I just saw this email.
Embarrassingly, due to another job change, my plan was postponed again :(.

I may not be able to attend this year's GCC meeting. Is there any other
way to let this get some traction in GCC? I really hope someone can help
with this topic.

BTW, I'm still looking at this and plan to finish it by the end of this
year, but it's taking too long and there's a lot of uncertainty, so
please just consider this only as a backup option.

Thanks,
Dan.

On Thu, 22 Jun 2023 at 05:54, Kees Cook  wrote:
>
> On Sat, Mar 25, 2023 at 01:11:14AM -0700, Dan Li wrote:
> > This series of patches is mainly used to support the control flow
> > integrity protection of the linux kernel [1], which is similar to
> > -fsanitize=kcfi in clang 16.0 [2,3].
> >
> > Any suggestion please let me know :).
>
> Hi Dan,
>
> It's been a couple months, and I didn't see any other feedback on this
> proposal. I was curious what the status of this work is. Are you able to
> attend GNU Cauldron[1] this year? I'd love to see this get some traction
> in GCC.
>
> Thanks!
>
> -Kees
>
> [1] https://gcc.gnu.org/wiki/cauldron2023
>
> --
> Kees Cook


[PATCH] mklog: fix bugs of --append option

2023-07-19 Thread Lehua Ding
Hi,

This small patch fixes two bugs in mklog.py's --append option.
The first bug is that the regexp is not accurate enough to
determine the top of the diff area. The second bug is that if `---`
is not a true start, it needs to be added back to the patch file.

contrib/ChangeLog:

* mklog.py: Fix regexp and add missed `---`

---
 contrib/mklog.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 26230b9b4f2..bd81c5ba92c 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -385,12 +385,13 @@ if __name__ == '__main__':
 if maybe_diff_log == 1 and line == "---\n":
 maybe_diff_log = 2
 elif maybe_diff_log == 2 and \
- re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
+ re.match("\s[^\s]+\s+\|\s+\d+\s[\+\-]+\n", line):
 lines += [output, "---\n", line]
 maybe_diff_log = 3
 else:
 # the possible start is not the true start.
 if maybe_diff_log == 2:
+lines.append("---\n")
 maybe_diff_log = 1
 lines.append(line)
 with open(args.input, "w") as f:
-- 
2.36.1
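
The effect of the regexp change can be checked in isolation. The sketch
below copies both patterns from the patch (written as raw strings) and
tries them on two made-up diffstat lines. It assumes that git pads the
count column with extra spaces when the counts in one diffstat differ in
width; the helper name `is_diffstat` and the sample lines are hypothetical
and not part of mklog.py.

```python
import re

# Patterns copied from the patch; raw strings are used here so that Python
# does not have to guess what "\s" or "\d" means inside a normal string.
OLD_DIFFSTAT = r"\s[^\s]+\s+\|\s\d+\s[+\-]+\n"
NEW_DIFFSTAT = r"\s[^\s]+\s+\|\s+\d+\s[\+\-]+\n"

# Hypothetical diffstat lines: one space after "|" versus a padded count.
single_space = " contrib/mklog.py | 3 ++-\n"
padded = " contrib/mklog.py |  3 ++-\n"


def is_diffstat(pattern, line):
    """Return True if LINE looks like a diffstat entry according to PATTERN."""
    return re.match(pattern, line) is not None


if __name__ == "__main__":
    for line in (single_space, padded):
        print(repr(line),
              is_diffstat(OLD_DIFFSTAT, line),
              is_diffstat(NEW_DIFFSTAT, line))
```

Only the padded line separates the two patterns: the old one insists on
exactly one whitespace character between `|` and the count.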



Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread YunQiang Su via Gcc-patches
Richard Biener wrote on Wed, Jul 19, 2023 at 15:22:
>
> On Wed, 19 Jul 2023, YunQiang Su wrote:
>
> > Richard Biener via Gcc-patches wrote on Wed, Jul 19, 2023 at 14:27:
> > >
> > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > >
> > > > PR #104914
> > > >
> > > > When working with
> > > >   int val;
> > > >   ((unsigned char*)&val)[3] = *buf;
> > > >   if (val > 0) ...
> > > > the RTX mode is obtained from REG instead of SUBREG, which makes
> > > > `DINS` be used instead of `INS`.  Thus something wrong happens
> > > > on sign-extend-by-default architectures, like MIPS64.
> > > >
> > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > store_integral_bit_field if:
> > > >   modes of op0 and str_rtx are INT;
> > > >   length of op0 is greater than str_rtx.
> > > >
> > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > mips64el-linux-gnuabi64 without regression.
> > >
> > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > audit trail points to combine and later notes an eventual expansion
> > > issue (but for another testcase/target).
> > >
> > > You have to explain in more detail on what is wrong with the initial
> > > RTL on mips.
> > >
> >
> > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is like
> >
> > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > (const_int 8 [0x8])
> > (const_int 0 [0]))
> > (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> >  (nil))
> >
> > Note, all of the REGs are in DImode. On MIPS64, it will expand to `DINS`
> > instructions.
> > While in fact here, we expect an SImode operation, because `val` in the C
> > code is `int`.
> >
> > With my patch, the RTX will be like:
> >
> > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
> > (const_int 8 [0x8])
> > (const_int 0 [0]))
> > (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> >  (nil))
>
> But if this RTL is correct then the above with DImode is correct as
> well and the issue is in the backend definition of the instruction
> defining 'DINS'?
>

I don't think so.

(insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
 ^^
 (const_int 8 [0x8])
 (const_int 0 [0]))
 (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
  (nil))

This RTL has only info about DI. It doesn't have any info about the
real length of `val`. The backend has no other choice than `DINS`.

> > So the operation will be SImode, aka `INS` instruction for MIPS64.
> >
> > The problem is based on 2 facts/root causes:
> > 1. MIPS's `INS` instruction always sign-extends its result, while `DINS` won't
> > li $7, 0xff
> > li $8, 0
> > ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > The value of $8 will be 0xff ff ff ff ff 00 00 00.
>
> But that's wrong.  (set (zero_extract:SI ...) should not affect
> bits outside of the indicated range.
>

In fact, that is how a sign-extension arch works.
Right or wrong, the ISA was/is defined like this.

In fact, on the MIPS 32 ABI, the same C code will generate RTL like this,
and the 32-bit object can still work on a 64-bit CPU.
That's a smart (or brain-damaged) design.

> @findex zero_extract
> @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> Like @code{sign_extract} but refers to an unsigned or zero-extended
> bit-field.  The same sequence of bits are extracted, but they
> are filled to an entire word with zeros instead of by sign-extension.
>

That depends on the definition of `word` here.
For `(zero_extract:SI`, I think that the word is limited to the low 32 bits of
the hardware register.
Anyway, it won't break ISAs without sign-extension by default.

Due to the nature of a sign-extension ISA, if we don't sign-extend the
`int` variable, something will go wrong.

To make it clear: the term `sign extension` here means:
   the value of bit 31 will be copied to bits [32-63], and
   the value of bits [0-30] won't be copied.
Here are the examples:
li $7, 0xff
li $8, 0x00 00 ff 00
ins $8,$7,16,8
^^
The value of $8 will be: 0x 00 00 00 00 00 ff ff 00

li $7, 0xff
li $8, 0x00 00 ff 00
ins $8,$7,24,8
^^
The value of $8 will be: 0x ff ff ff ff ff 00 ff 00
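
The register values quoted in this message can be reproduced with a small
model. This is only a sketch of the behaviour as described in this thread
(insert into the low 32 bits, then copy bit 31 into bits 32-63 for `ins`;
plain 64-bit insert for `dins`). It is not taken from the ISA manual, and
the function names are made up for illustration.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend_32_to_64(value):
    """Copy bit 31 of VALUE into bits 32..63 and return a 64-bit pattern."""
    value &= MASK32
    if value & 0x8000_0000:
        value |= MASK64 ^ MASK32
    return value


def insert_field(dest, src, pos, size):
    """Replace SIZE bits of DEST starting at bit POS with the low bits of SRC."""
    field_mask = ((1 << size) - 1) << pos
    return (dest & ~field_mask) | ((src << pos) & field_mask)


def ins(dest, src, pos, size):
    """Model of 32-bit INS on a 64-bit register: insert, then sign-extend."""
    return sign_extend_32_to_64(insert_field(dest & MASK32, src, pos, size))


def dins(dest, src, pos, size):
    """Model of 64-bit DINS: insert only, the other bits are left untouched."""
    return insert_field(dest & MASK64, src, pos, size) & MASK64


if __name__ == "__main__":
    print(hex(ins(0x0000FF00, 0xFF, 16, 8)))
    print(hex(ins(0x0000FF00, 0xFF, 24, 8)))
    print(hex(dins(0, 0xFF, 24, 8)))
```

With position 24 the inserted byte sets bit 31, so the modelled `ins`
fills the upper half with ones while the modelled `dins` leaves it zero.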

> Unlike @code{sign_extract}, this type of expressions can be lvalues
> in RTL; they may appear on the left side of an assignment, indicating
> insertion of a value into the specified bit-field.
> @end table
>
>
> > li $7, 0xff
> > li $8, 0
> > dins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > The value of $8 will be 0x 00 00 00 00 ff 00 00 00.
>
> which isn't correct either.
>

It is not a matter of correct or incorrect: the ISA manual just states this,
and the hardware works like this.

> If you look a few dumps further you'll see which instruction was
> recognized, I suspect the machine description is simply w

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread YunQiang Su via Gcc-patches
YunQiang Su wrote on Wed, Jul 19, 2023 at 16:21:
>
> Richard Biener wrote on Wed, Jul 19, 2023 at 15:22:
> >
> > On Wed, 19 Jul 2023, YunQiang Su wrote:
> >
> > > Richard Biener via Gcc-patches wrote on Wed, Jul 19, 2023 at 14:27:
> > > >
> > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > >
> > > > > PR #104914
> > > > >
> > > > > When working with
> > > > >   int val;
> > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > >   if (val > 0) ...
> > > > > the RTX mode is obtained from REG instead of SUBREG, which makes
> > > > > `DINS` be used instead of `INS`.  Thus something wrong happens
> > > > > on sign-extend-by-default architectures, like MIPS64.
> > > > >
> > > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > > store_integral_bit_field if:
> > > > >   modes of op0 and str_rtx are INT;
> > > > >   length of op0 is greater than str_rtx.
> > > > >
> > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > mips64el-linux-gnuabi64 without regression.
> > > >
> > > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > > audit trail points to combine and later notes an eventual expansion
> > > > issue (but for another testcase/target).
> > > >
> > > > You have to explain in more detail on what is wrong with the initial
> > > > RTL on mips.
> > > >
> > >
> > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is like
> > >
> > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > > (const_int 8 [0x8])
> > > (const_int 0 [0]))
> > > (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > >  (nil))
> > >
> > > Note, all of the REGs are in DImode. On MIPS64, it will expand to `DINS`
> > > instructions.
> > > While in fact here, we expect an SImode operation, because `val` in the C
> > > code is `int`.
> > >
> > > With my patch, the RTX will be like:
> > >
> > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
> > > (const_int 8 [0x8])
> > > (const_int 0 [0]))
> > > (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > >  (nil))
> >
> > But if this RTL is correct then the above with DImode is correct as
> > well and the issue is in the backend definition of the instruction
> > defining 'DINS'?
> >
>
> I don't think so.
>
> (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
>  ^^
>  (const_int 8 [0x8])
>  (const_int 0 [0]))
>  (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
>   (nil))
>
> This RTL has only info about DI. It doesn't have any info about the
> real length of `val`. The backend has no other choice than `DINS`.
>
> > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > >
> > > The problem is based on 2 facts/root causes:
> > > 1. MIPS's `INS` instruction always sign-extends its result, while
> > > `DINS` won't
> > > li $7, 0xff
> > > li $8, 0
> > > ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> >
> > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > bits outside of the indicated range.
> >
>
> In fact, that is how a sign-extension arch works.
> Right or wrong, the ISA was/is defined like this.
>
> In fact, on the MIPS 32 ABI, the same C code will generate RTL like this,
> and the 32-bit object can still work on a 64-bit CPU.
> That's a smart (or brain-damaged) design.
>
> > @findex zero_extract
> > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > bit-field.  The same sequence of bits are extracted, but they
> > are filled to an entire word with zeros instead of by sign-extension.
> >
>
> That depends on the definition of `word` here.
> For `(zero_extract:SI`, I think that the word is limited to the low 32 bits of
> the hardware register.
> Anyway, it won't break ISAs without sign-extension by default.
>
> Due to the nature of a sign-extension ISA, if we don't sign-extend the
> `int` variable, something will go wrong.
>
> To make it clear: the term `sign extension` here means:
>    the value of bit 31 will be copied to bits [32-63], and
>    the value of bits [0-30] won't be copied.
> Here are the examples:
> li $7, 0xff
> li $8, 0x00 00 ff 00
> ins $8,$7,16,8
> ^^
> The value of $8 will be: 0x 00 00 00 00 00 ff ff 00
>
> li $7, 0xff
> li $8, 0x00 00 ff 00
> ins $8,$7,24,8
> ^^
> The value of $8 will be: 0x ff ff ff ff ff 00 ff 00
>
> > Unlike @code{sign_extract}, this type of expressions can be lvalues
> > in RTL; they may appear on the left side of an assignment, indicating
> > insertion of a value into the specified bit-field.
> > @end table
> >
> >
> > > li $7, 0xff
> > > li $8, 0
> > > dins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > > The value of $8 will be 0x 0

[committed] - Re: [patch] OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]

2023-07-19 Thread Tobias Burnus

Now committed as Rev. r14-2634-g85da0b40538fb0

Changes:

* I missed updating another 'sorry' (message wording change) - now fixed;
I also added it to the sorry-testcase file non-rectangular-loop-5.f90.

* I decided to retire the PR as several issues have been fixed and the
original title did not fit any more. The remaining issue is now tracked
in PR110735 (i.e. handling step != const, both the generic and possibly
a simpler special case).

* I added a link to the PR to libgomp.texi such that one can find out
what is only partially supported for Fortran.

Thanks,

Tobias

PS: Otherwise, the following still applies:

On 18.07.23 14:11, Tobias Burnus wrote:

Comments regarding the validity of the Fortran assumptions are welcome!

This patch now uses a 'simple' loop for OpenMP loops with
a constant loop-step size. Before, it only did so for step = ±1.
(Otherwise, a count variable is used, from which the original
loop index variable is calculated.)

For details, see the attached patch or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107424#c12
(comment 12 + 14 plus the email linked in comment 12).

Comments? Remarks? If there are none, I will relatively soonish
commit the attached patch to mainline, only.

commit 85da0b40538fb0d17d89de1e7905984668e3dfef
Author: Tobias Burnus 
Date:   Wed Jul 19 10:18:49 2023 +0200

OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]

Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2'
the code
  for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
i = count.0 * 2 + 1;

While such an inner loop can be collapsed, a non-rectangular loop could not.
With this commit and for all constant loop steps, a simple loop such
as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before only for the
constant steps of 1 and -1.)

The constant step makes it possible to know the direction (increasing/decreasing)
that is required for the loop condition.

The new code is only valid if one assumes no overflow of the loop variable.
However, the Fortran standard can be read as requiring that this be ensured by
the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
"The execution of any numeric operation whose result is not defined by
the arithmetic used by the processor is prohibited."

And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
following: The number of loop iterations is handled by an iteration count,
which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
in step (3), this count is not only decremented by one but also:
  "... The DO variable, if any, is incremented by the value of the
  incrementation parameter m3."
And for the example above, 'i' would be 'huge(i)+3' in the last
execution cycle, which exceeds the largest model number and should
render the example as invalid.
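
The two loop shapes and the overflow argument above can be sketched in a
few lines. The iteration count uses the Fortran rule
MAX(INT((m2 - m1 + m3) / m3), 0); everything else (function names, the
choice of a 32-bit HUGE) is an assumption made for illustration and is not
code from trans-openmp.cc. Python integers are unbounded, so the overflow
shows up as a value larger than HUGE instead of wrapping.

```python
HUGE = 2**31 - 1  # assumed: huge(i) for a default 32-bit integer


def trunc_div(a, b):
    """Integer division truncating toward zero, like Fortran INT(a / b)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def iteration_count(m1, m2, m3):
    """Iteration count of 'do i = m1, m2, m3' for a nonzero step m3."""
    return max(trunc_div(m2 - m1 + m3, m3), 0)


def count_based_loop(m1, m2, m3):
    """Old lowering: iterate a separate counter and derive i from it."""
    return [count * m3 + m1 for count in range(iteration_count(m1, m2, m3))]


def simple_loop(m1, m2, m3):
    """New lowering for constant steps: the sign of m3 picks the condition."""
    values = []
    i = m1
    while (i <= m2) if m3 > 0 else (i >= m2):
        values.append(i)
        i += m3
    return values


def final_do_variable(m1, m2, m3):
    """Value of i after the last increment, ignoring any overflow."""
    return m1 + iteration_count(m1, m2, m3) * m3


if __name__ == "__main__":
    print(count_based_loop(1, 10, 2), simple_loop(1, 10, 2))
    print(final_do_variable(HUGE - 5, HUGE, 4) - HUGE)
```

Both lowerings visit the same index values as long as the DO variable never
has to exceed HUGE, which is the assumption the commit message argues the
standard already places on the user.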

PR fortran/107424

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
constant loop steps.
(gfc_trans_omp_do): Likewise; use sign to determine
loop direction.

libgomp/ChangeLog:

* libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
commented tests.
* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
test file; tests are in non-rectangular-loop-1.f90.
* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
testcase to use a non-constant step to retain the 'sorry' test.
* testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/linear-2.f90: Update dump to remove
the additional count variable.
---
 gcc/fortran/trans-openmp.cc|  18 +-
 gcc/testsuite/gfortran.dg/gomp/linear-2.f90|   4 +-
 libgomp/libgomp.texi   |   4 +-
 .../libgomp.fortran/non-rectangular-loop-1.f90 | 537 ++---
 .../libgomp.fortran/non-rectangular-loop-1a.f90| 374 --
 .../libgomp.fortran/non-rectangular-loop-5.f90 |  22 +-
 .../libgomp.fortran/non-rectangular-loop-6.f90 | 196 
 7 files changed, 494 insertions(+), 661 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index c88ee3c7656..cf741cebf91 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5374,10 +5374,10 @@ gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
 
   if (!simple

Re: [PATCH v2] vect: Handle demoting FLOAT and promoting FIX_TRUNC.

2023-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 14, 2023 at 5:16 PM Robin Dapp  wrote:
>
> >>> Can you add testcases?  Also the current restriction is because
> >>> the variants you add are not always correct and I don't see any
> >>> checks that the intermediate type doesn't lose significant bits?
>
> I didn't manage to create one for aarch64 nor for x86 because AVX512
> has direct conversions e.g. for int64_t -> _Float16 and the new code
> will not be triggered.  Instead I added two separate RISC-V tests.
>
> The attached V2 always checks trapping_math when converting float
> to integer and, like the NARROW_DST case, checks if the operand fits
> the intermediate type when demoting from int to float.
>
> Would that be sufficient?
>
> riscv seems to be the only backend not (yet?) providing pack/unpack
> expanders for the vect conversions and rather relying on extend/trunc
> which seems a disadvantage now, particularly for the cases requiring
> !flag_trapping_math with NONE but not for NARROW_DST.  That might
> be reason enough to implement pack/unpack in the backend.
>
> Nevertheless the patch might improve the status quo a bit?
>
> Regards
>  Robin
>
>
> The recent changes that allowed multi-step conversions for
> "non-packing/unpacking", i.e. modifier == NONE targets included
> promoting to-float and demoting to-int variants.  This patch
> adds the missing demoting to-float and promoting to-int handling.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_conversion): Handle
> more demotion/promotion for modifier == NONE.
>
> gcc/testsuite/ChangeLog:
>
> * 
> gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c: New test.
> * gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c: 
> New test.
> ---
>  .../conversions/vec-narrow-int64-float16.c| 12 
>  .../conversions/vec-widen-float16-int64.c | 12 
>  gcc/tree-vect-stmts.cc| 58 +++
>  3 files changed, 71 insertions(+), 11 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
>
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
>  
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
> new file mode 100644
> index 000..ebee1cfa888
> --- /dev/null
> +++ 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=c99 -fno-vect-cost-model 
> -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable" } 
> */
> +
> +#include 
> +
> +void convert (_Float16 *restrict dst, int64_t *restrict a, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +dst[i] = (_Float16) (a[i] & 0x7fff);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
>  
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
> new file mode 100644
> index 000..eb0a17e99bc
> --- /dev/null
> +++ 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=c99 -fno-vect-cost-model 
> -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable 
> -fno-trapping-math" } */
> +
> +#include 
> +
> +void convert (int64_t *restrict dst, _Float16 *restrict a, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +dst[i] = (int64_t) a[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c08d0ef951f..c78a750301d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -5192,29 +5192,65 @@ vectorizable_conversion (vec_info *vinfo,
> break;
>}
>
> -  /* For conversions between float and smaller integer types try whether 
> we
> -can use intermediate signed integer types to support the
> +  /* For conversions between float and integer types try whether
> +we can use intermediate signed integer types to support the
>  conversion.  */
>if ((code == FLOAT_EXPR
> -  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
> +  && GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode))

this

>   || (code == FIX_TRUNC_EXPR
> - && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)
> - && !flag_trapping_math))
> + && (GET_MODE_SIZE (rhs_mode) != GET_MODE_SIZE (lhs_mode)

and this check are now common between the FLOAT_EXPR and FIX_TRUNC_EXPR
cases

> + && !flag_trapping_math)))
> {
> + bool demotion = GE

Re: [PATCH V2] Provide -fcf-protection=branch,return.

2023-07-19 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 12, 2023 at 3:27 PM Hongtao Liu  wrote:
>
> ping.
>
> On Mon, May 22, 2023 at 4:08 PM Hongtao Liu  wrote:
> >
> > ping.
> >
> > On Sat, May 13, 2023 at 5:20 PM liuhongt  wrote:
> > >
> > > > I think this could be simplified if you use either EnumSet or
> > > > EnumBitSet instead in common.opt for `-fcf-protection=`.
> > >
> > > > Use EnumSet instead of EnumBitSet since CF_FULL is not a power of 2.
> > > > Classifying the sets is a bit tricky: cf_branch and cf_return
> > > > should be in different sets, but they both "conflict" with cf_full and
> > > > cf_none, and the current EnumSet doesn't handle this well.
> > > >
> > > > So in the current implementation, only cf_full and cf_none are exclusive
> > > > to each other, but they can be combined with any of cf_branch, cf_return,
> > > > cf_check. It's not perfect, but still an improvement over the original
> > > > one.
> > >
I'm going to commit this patch if there's no objection, it's just a
refactor of option -fcf-protection=.
If there's any regression observed, I will fix(or revert the patch).
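
As a rough illustration of the set classification described in the quoted
text, the sketch below models only the conflict rule: values that share a
Set number are mutually exclusive, values from different sets may be
combined in a comma-separated argument. The Set numbers are copied from the
common.opt hunk in this message; the parser itself (`parse_cf_protection`)
is hypothetical and does not reproduce how GCC's option machinery stores
the resulting flag bits.

```python
# Set numbers copied from the Set(N) properties in the common.opt hunk.
CF_SETS = {"full": 1, "branch": 2, "return": 3, "check": 4, "none": 1}


def parse_cf_protection(argument):
    """Split a comma-separated argument, rejecting two values from one set."""
    chosen = {}  # set number -> value name already taken from that set
    for name in argument.split(","):
        if name not in CF_SETS:
            raise ValueError(f"unknown Control-Flow Protection Level '{name}'")
        set_number = CF_SETS[name]
        if set_number in chosen:
            raise ValueError(f"'{name}' conflicts with '{chosen[set_number]}'")
        chosen[set_number] = name
    return list(chosen.values())


if __name__ == "__main__":
    print(parse_cf_protection("branch,return"))
    print(parse_cf_protection("full,branch"))
```

Under this model `full,none` is rejected because both names sit in set 1,
while `full,branch` passes, which matches the "not perfect" caveat in the
patch description.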
> > > gcc/ChangeLog:
> > >
> > > * common.opt: (fcf-protection=): Add EnumSet attribute to
> > > support combination of params.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * c-c++-common/fcf-protection-10.c: New test.
> > > * c-c++-common/fcf-protection-11.c: New test.
> > > * c-c++-common/fcf-protection-12.c: New test.
> > > * c-c++-common/fcf-protection-8.c: New test.
> > > * c-c++-common/fcf-protection-9.c: New test.
> > > * gcc.target/i386/pr89701-1.c: New test.
> > > * gcc.target/i386/pr89701-2.c: New test.
> > > * gcc.target/i386/pr89701-3.c: New test.
> > > ---
> > >  gcc/common.opt | 12 ++--
> > >  gcc/testsuite/c-c++-common/fcf-protection-10.c |  2 ++
> > >  gcc/testsuite/c-c++-common/fcf-protection-11.c |  2 ++
> > >  gcc/testsuite/c-c++-common/fcf-protection-12.c |  2 ++
> > >  gcc/testsuite/c-c++-common/fcf-protection-8.c  |  2 ++
> > >  gcc/testsuite/c-c++-common/fcf-protection-9.c  |  2 ++
> > >  gcc/testsuite/gcc.target/i386/pr89701-1.c  |  4 
> > >  gcc/testsuite/gcc.target/i386/pr89701-2.c  |  4 
> > >  gcc/testsuite/gcc.target/i386/pr89701-3.c  |  4 
> > >  9 files changed, 28 insertions(+), 6 deletions(-)
> > >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-10.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-11.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-12.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-8.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-9.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-3.c
> > >
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index a28ca13385a..02f2472959a 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -1886,7 +1886,7 @@ fcf-protection
> > >  Common RejectNegative Alias(fcf-protection=,full)
> > >
> > >  fcf-protection=
> > > -Common Joined RejectNegative Enum(cf_protection_level) 
> > > Var(flag_cf_protection) Init(CF_NONE)
> > > +Common Joined RejectNegative Enum(cf_protection_level) EnumSet 
> > > Var(flag_cf_protection) Init(CF_NONE)
> > >  -fcf-protection=[full|branch|return|none|check]Instrument 
> > > functions with checks to verify jump/call/return control-flow transfer
> > >  instructions have valid targets.
> > >
> > > @@ -1894,19 +1894,19 @@ Enum
> > >  Name(cf_protection_level) Type(enum cf_protection_level) 
> > > UnknownError(unknown Control-Flow Protection Level %qs)
> > >
> > >  EnumValue
> > > -Enum(cf_protection_level) String(full) Value(CF_FULL)
> > > +Enum(cf_protection_level) String(full) Value(CF_FULL) Set(1)
> > >
> > >  EnumValue
> > > -Enum(cf_protection_level) String(branch) Value(CF_BRANCH)
> > > +Enum(cf_protection_level) String(branch) Value(CF_BRANCH) Set(2)
> > >
> > >  EnumValue
> > > -Enum(cf_protection_level) String(return) Value(CF_RETURN)
> > > +Enum(cf_protection_level) String(return) Value(CF_RETURN) Set(3)
> > >
> > >  EnumValue
> > > -Enum(cf_protection_level) String(check) Value(CF_CHECK)
> > > +Enum(cf_protection_level) String(check) Value(CF_CHECK) Set(4)
> > >
> > >  EnumValue
> > > -Enum(cf_protection_level) String(none) Value(CF_NONE)
> > > +Enum(cf_protection_level) String(none) Value(CF_NONE) Set(1)
> > >
> > >  finstrument-functions
> > >  Common Var(flag_instrument_function_entry_exit,1)
> > > diff --git a/gcc/testsuite/c-c++-common/fcf-protection-10.c 
> > > b/gcc/testsuite/c-c++-common/fcf-protection-10.c
> > > new file mode 100644
> > > index 000..b271d134e52
> > > --- /dev/null
> > > +++ b/gcc/testsuite/c-c++-common/fcf-protection-10.c
> > > @@ -0,0 +1,2 @@
> > > +/* { dg-do compile { target { "i?86-*-* x86_64-*-*" } } } */
> > > +/* { dg-

Re: [PATCH 1/2] Add flow_sensitive_info_storage and use it in gimple-fold.

2023-07-19 Thread Richard Biener via Gcc-patches
On Sat, Jul 15, 2023 at 5:21 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This adds flow_sensitive_info_storage and uses it in
> maybe_fold_comparisons_from_match_pd as mentioned in
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621817.html .
> Since using it in maybe_fold_comparisons_from_match_pd was easy
> and allowed me to test the storage earlier, I did it.
>
> This also hides better how the flow sensitive information is
> stored and only a single place needs to be updated if that
> ever changes (again).
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks for doing this!
Richard.
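
For readers unfamiliar with the pattern, the save-and-clear / restore idea
behind the new class can be sketched outside of GCC. This is a toy model
under loud assumptions: SSA names are plain dictionaries, only a single
"range_info" field is tracked, and the class and helper names are
hypothetical stand-ins rather than the GCC API.

```python
class FlowSensitiveStorage:
    """Toy stand-in for flow_sensitive_info_storage (names are hypothetical)."""

    def __init__(self):
        self.saved = None
        self.has_saved = False

    def save_and_clear(self, name):
        """Remember NAME's range info, then drop it from NAME."""
        assert not self.has_saved
        self.saved = name.get("range_info")
        self.has_saved = True
        name["range_info"] = None

    def restore(self, name):
        """Put the remembered range info back on NAME."""
        assert self.has_saved
        name["range_info"] = self.saved


def with_cleared_info(names, action):
    """Run ACTION while NAMES carry no range info, then unwind."""
    unwind_stack = []
    for name in names:
        storage = FlowSensitiveStorage()
        storage.save_and_clear(name)
        unwind_stack.append((name, storage))
    try:
        return action(names)
    finally:
        for name, storage in unwind_stack:
            storage.restore(name)


if __name__ == "__main__":
    ssa_names = [{"range_info": (0, 1)}, {"range_info": None}]
    print(with_cleared_info(ssa_names, lambda ns: [n["range_info"] for n in ns]))
    print(ssa_names)
```

Keeping the saved state inside one object means the caller's unwind stack
only holds (name, storage) pairs and never touches the fields directly,
which is the encapsulation benefit the patch description mentions.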

> gcc/ChangeLog:
>
> * gimple-fold.cc (fosa_unwind): Replace `vrange_storage *`
> with flow_sensitive_info_storage.
> (follow_outer_ssa_edges): Update how to save off the flow
> sensitive info.
> (maybe_fold_comparisons_from_match_pd): Update restoring
> of flow sensitive info.
> * tree-ssanames.cc (flow_sensitive_info_storage::save): New method.
> (flow_sensitive_info_storage::restore): New method.
> (flow_sensitive_info_storage::save_and_clear): New method.
> (flow_sensitive_info_storage::clear_storage): New method.
> * tree-ssanames.h (class flow_sensitive_info_storage): New class.
> ---
>  gcc/gimple-fold.cc   | 17 +--
>  gcc/tree-ssanames.cc | 72 
>  gcc/tree-ssanames.h  | 21 +
>  3 files changed, 100 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 4027ff71e10..de94efbcff7 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -6947,7 +6947,7 @@ and_comparisons_1 (tree type, enum tree_code code1, 
> tree op1a, tree op1b,
>  }
>
>  static basic_block fosa_bb;
> -static vec<std::pair<tree, vrange_storage *> > *fosa_unwind;
> +static vec<std::pair<tree, flow_sensitive_info_storage> > *fosa_unwind;
>  static tree
>  follow_outer_ssa_edges (tree val)
>  {
> @@ -6967,14 +6967,11 @@ follow_outer_ssa_edges (tree val)
>|| POINTER_TYPE_P (TREE_TYPE (val)))
>   && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (val)))
> return NULL_TREE;
> +  flow_sensitive_info_storage storage;
> +  storage.save_and_clear (val);
>/* If the definition does not dominate fosa_bb temporarily reset
>  flow-sensitive info.  */
> -  if (val->ssa_name.info.range_info)
> -   {
> - fosa_unwind->safe_push (std::make_pair
> -   (val, val->ssa_name.info.range_info));
> - val->ssa_name.info.range_info = NULL;
> -   }
> +  fosa_unwind->safe_push (std::make_pair (val, storage));
>return val;
>  }
>return val;
> @@ -7034,14 +7031,14 @@ maybe_fold_comparisons_from_match_pd (tree type, enum 
> tree_code code,
>   type, gimple_assign_lhs (stmt1),
>   gimple_assign_lhs (stmt2));
>fosa_bb = outer_cond_bb;
> -  auto_vec<std::pair<tree, vrange_storage *>, 8> unwind_stack;
> +  auto_vec<std::pair<tree, flow_sensitive_info_storage>, 8> unwind_stack;
>fosa_unwind = &unwind_stack;
>if (op.resimplify (NULL, (!outer_cond_bb
> ? follow_all_ssa_edges : follow_outer_ssa_edges)))
>  {
>fosa_unwind = NULL;
>for (auto p : unwind_stack)
> -   p.first->ssa_name.info.range_info = p.second;
> +   p.second.restore (p.first);
>if (gimple_simplified_result_is_gimple_val (&op))
> {
>   tree res = op.ops[0];
> @@ -7065,7 +7062,7 @@ maybe_fold_comparisons_from_match_pd (tree type, enum 
> tree_code code,
>  }
>fosa_unwind = NULL;
>for (auto p : unwind_stack)
> -p.first->ssa_name.info.range_info = p.second;
> +p.second.restore (p.first);
>
>return NULL_TREE;
>  }
> diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
> index 5fdb6a37e9f..f81332451fc 100644
> --- a/gcc/tree-ssanames.cc
> +++ b/gcc/tree-ssanames.cc
> @@ -916,3 +916,75 @@ make_pass_release_ssa_names (gcc::context *ctxt)
>  {
>return new pass_release_ssa_names (ctxt);
>  }
> +
> +/* Save and restore of flow sensitive information. */
> +
> +/* Save off the flow sensitive info from NAME. */
> +
> +void
> +flow_sensitive_info_storage::save (tree name)
> +{
> +  gcc_assert (state == 0);
> +  if (!POINTER_TYPE_P (TREE_TYPE (name)))
> +{
> +  range_info = SSA_NAME_RANGE_INFO (name);
> +  state = 1;
> +  return;
> +}
> +  state = -1;
> +  auto ptr_info = SSA_NAME_PTR_INFO (name);
> +  if (ptr_info)
> +{
> +  align = ptr_info->align;
> +  misalign = ptr_info->misalign;
> +  null = SSA_NAME_PTR_INFO (name)->pt.null;
> +}
> +  else
> +{
> +  align = 0;
> +  misalign = 0;
> +  null = true;
> +}
> +}
> +
> +/* Restore the flow sensitive info from NAME. */
> +
> +void
> +flow_sensitive_info_storage::restore (tree name)
> +{
> +  gcc_assert (state != 0);
> +  if (!POINTER_TYPE_P (TREE_TYPE (name)))
> +{
> +  gcc_assert (state == 1);
> +  SSA_NAME_RANGE_INFO (name) = range_info;
> +  return;
> +}
>

Re: [PATCH 2/2] [PATCH] Fix tree-opt/110252: wrong code due to phiopt using flow sensitive info during match

2023-07-19 Thread Richard Biener via Gcc-patches
On Sat, Jul 15, 2023 at 5:21 AM Andrew Pinski via Gcc-patches
 wrote:
>
> Match will query ranger via tree_nonzero_bits/get_nonzero_bits for the 2nd and 3rd
> operands of the COND_EXPR, and phiopt tries to create the COND_EXPR even if
> we are moving one statement. That one statement could have some flow sensitive
> information on it based on the condition that is for the COND_EXPR, but that
> might create wrong code if the statement was moved out.
>
> This is similar to the previous version of the patch except now we use
> flow_sensitive_info_storage instead of manually doing the save/restore
> and also handle all defs on a gimple statement rather than just the lhs
> of the gimple statement. Oh, and a few more testcases that were failing
> before were added.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.
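
A minimal sketch of the save/clear/restore idea from the quoted patch description, written as a toy model in plain C. Nothing below is GCC's API: `toy_name`, `toy_saved` and the helper functions are hypothetical stand-ins for an SSA name, for flow_sensitive_info_storage and for the match query. It only shows that a query made while the information is hidden has to answer conservatively, and that the original information is back in place afterwards.

```c
/* Toy stand-in for the flow-sensitive data attached to an SSA name.
   The field is illustrative only, not GCC's real layout.  */
struct toy_name
{
  unsigned nonzero_bits;	/* Bits that may be nonzero.  */
};

/* Toy stand-in for the saved state.  */
struct toy_saved
{
  struct toy_name *name;
  unsigned nonzero_bits;
};

/* Conservative value: every bit may be set.  */
#define TOY_UNKNOWN (~0u)

/* Remember the info on NAME and replace it with the conservative value.  */
struct toy_saved
toy_save_and_clear (struct toy_name *name)
{
  struct toy_saved s = { name, name->nonzero_bits };
  name->nonzero_bits = TOY_UNKNOWN;
  return s;
}

/* Put the remembered info back.  */
void
toy_restore (const struct toy_saved *s)
{
  s->name->nonzero_bits = s->nonzero_bits;
}

/* A query that consumes the info: is the value known to be 0 or 1?  */
int
toy_known_zero_or_one (const struct toy_name *name)
{
  return (name->nonzero_bits & ~1u) == 0;
}

/* Run the query while the info is hidden, then restore it.  */
int
toy_query_while_hidden (struct toy_name *name)
{
  struct toy_saved s = toy_save_and_clear (name);
  int result = toy_known_zero_or_one (name);
  toy_restore (&s);
  return result;
}
```

In the patch itself the hiding and restoring is tied to the lifetime of an auto_flow_sensitive object (constructor and destructor), per the ChangeLog below; the explicit calls here are only the C equivalent of that pattern.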

> PR tree-optimization/110252
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (class auto_flow_sensitive): New class.
> (auto_flow_sensitive::auto_flow_sensitive): New constructor.
> (auto_flow_sensitive::~auto_flow_sensitive): New deconstructor.
> (match_simplify_replacement): Temporarily
> remove the flow sensitive info on the two statements that might
> be moved.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-25b.c: Updated as
> __builtin_parity loses the nonzerobits info.
> * gcc.c-torture/execute/pr110252-1.c: New test.
> * gcc.c-torture/execute/pr110252-2.c: New test.
> * gcc.c-torture/execute/pr110252-3.c: New test.
> * gcc.c-torture/execute/pr110252-4.c: New test.
> ---
>  .../gcc.c-torture/execute/pr110252-1.c| 15 ++
>  .../gcc.c-torture/execute/pr110252-2.c| 10 
>  .../gcc.c-torture/execute/pr110252-3.c| 13 +
>  .../gcc.c-torture/execute/pr110252-4.c|  8 +++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c   |  6 +--
>  gcc/tree-ssa-phiopt.cc| 51 +--
>  6 files changed, 96 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-2.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-3.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-4.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-1.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-1.c
> new file mode 100644
> index 000..4ae93ca0647
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-1.c
> @@ -0,0 +1,15 @@
> +/* This is reduced from sel-sched.cc which was noticed was being miscompiled too. */
> +int g(int min_need_stall) __attribute__((__noipa__));
> +int g(int min_need_stall)
> +{
> +  return  min_need_stall < 0 ? 1 : ((min_need_stall) < (1) ? (min_need_stall) : (1));
> +}
> +int main(void)
> +{
> +  for(int i = -100; i <= 100; i++)
> +    {
> +      int t = g(i);
> +      if (t != (i!=0))
> +        __builtin_abort();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-2.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-2.c
> new file mode 100644
> index 000..7f1a7dbf134
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-2.c
> @@ -0,0 +1,10 @@
> +signed char f() __attribute__((__noipa__));
> +signed char f() { return 0; }
> +int main()
> +{
> +  int g = f() - 1;
> +  int e = g < 0 ? 1 : ((g >> (8-2))!=0);
> +  asm("":"+r"(e));
> +  if (e != 1)
> +    __builtin_abort();
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-3.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-3.c
> new file mode 100644
> index 000..c24bf1ab1e4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-3.c
> @@ -0,0 +1,13 @@
> +
> +unsigned int a = 1387579096U;
> +void sinkandcheck(unsigned b) __attribute__((noipa));
> +void sinkandcheck(unsigned b)
> +{
> +    if (a != b)
> +        __builtin_abort();
> +}
> +int main() {
> +    a = 1 < (~a) ? 1 : (~a);
> +    sinkandcheck(1);
> +    return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-4.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-4.c
> new file mode 100644
> index 000..f97edd3f069
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-4.c
> @@ -0,0 +1,8 @@
> +
> +int a, b = 2, c = 2;
> +int main() {
> +  b = ~(1 % (a ^ (b - (1 && c) || c & b)));
> +  if (b < -1)
> +    __builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
> index 7298da0c96e..0fd9b004a03 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
> @@ -65,8 +65,6 @@ int test_popcountll(unsigned long long x, unsigned long long y)
>    return x ? __builtin_popcountll(y) : 0;
>  }
>
> -/* 3 types of functions (not including parity), each wit

Re: [RFC/RFT, V2 0/3] Add compiler support for Kernel Control Flow Integrity

2023-07-19 Thread Dan Li via Gcc-patches
Hi All,

Embarrassingly, due to personal reasons, I may not be able to complete
the series of patches on the forward side of GCC CFI for the time being.

Please forgive me for not realizing that I should have sent this help
email a long time ago :(

This topic has been delayed for a long time, and I would be very grateful
if someone can help complete this series of patches.

BTW, please let me know if there are more groups I can cc for help.

Thanks!
Dan.

On Sat, 25 Mar 2023 at 16:11, Dan Li  wrote:
>
> This series of patches is mainly used to support the control flow
> integrity protection of the linux kernel [1], which is similar to
> -fsanitize=kcfi in clang 16.0 [2,3].
>
> Any suggestion please let me know :).
>
> Thanks, Dan.
>
> [1] 
> https://lore.kernel.org/all/20220908215504.3686827-1-samitolva...@google.com/
> [2] https://clang.llvm.org/docs/ControlFlowIntegrity.html
> [3] https://reviews.llvm.org/D119296
>
> Signed-off-by: Dan Li 
>
> ---
> Dan Li (3):
>   [PR102768] flag-types.h (enum sanitize_code): Extend sanitize_code to
> 64 bits to support more features
>   [PR102768] Support CFI: Add basic support for Kernel Control Flow
> Integrity
>   [PR102768] aarch64: Add support for Kernel Control Flow Integrity
>
>  gcc/asan.h|   4 +-
>  gcc/c-family/c-attribs.cc |  10 +-
>  gcc/c-family/c-common.h   |   2 +-
>  gcc/c/c-parser.cc |   4 +-
>  gcc/cfgexpand.cc  |  26 ++
>  gcc/cgraphunit.cc |  34 +++
>  gcc/combine.cc|   1 +
>  gcc/common.opt|   4 +-
>  gcc/config/aarch64/aarch64.cc | 166 ++
>  gcc/cp/typeck.cc  |   2 +-
>  gcc/doc/invoke.texi   |  36 
>  gcc/doc/tm.texi   |  27 ++
>  gcc/doc/tm.texi.in|   8 ++
>  gcc/dwarf2asm.cc  |   2 +-
>  gcc/emit-rtl.cc   |   1 +
>  gcc/emit-rtl.h|   4 +
>  gcc/final.cc  |  24 -
>  gcc/flag-types.h  |  67 +++---
>  gcc/gimple.cc |  11 +++
>  gcc/gimple.h  |   5 +-
>  gcc/opt-suggestions.cc|   2 +-
>  gcc/opts.cc   |  26 +++---
>  gcc/opts.h|   8 +-
>  gcc/output.h  |   3 +
>  gcc/reg-notes.def |   1 +
>  gcc/target.def|  38 
>  gcc/toplev.cc |   4 +
>  gcc/tree-cfg.cc   |   2 +-
>  gcc/tree.cc   | 144 +
>  gcc/tree.h|   1 +
>  gcc/varasm.cc |  26 ++
>  31 files changed, 627 insertions(+), 66 deletions(-)
>
> --
> 2.17.1
>


Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-19 Thread Martin Uecker via Gcc-patches
On Monday, 17.07.2023 at 16:40 -0700, Kees Cook wrote:
> On Mon, Jul 17, 2023 at 09:17:48PM +, Qing Zhao wrote:
> > 
> > > On Jul 13, 2023, at 4:31 PM, Kees Cook 
> > > wrote:
> > > 
> > > In the bug, the problem is that "p" isn't known to be allocated,
> > > if I'm
> > > reading that correctly?
> > 
> > I think that the major point in PR109557
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):
> > 
> > for the following pointer p.3_1, 
> > 
> > p.3_1 = p;
> > _2 = __builtin_object_size (p.3_1, 0);
> > 
> > Question: why the size of p.3_1 cannot use the TYPE_SIZE of the
> > pointee of p when the TYPE_SIZE can be determined at compile time?
> > 
> > Answer:  From just knowing the type of the pointee of p, the
> > compiler cannot determine the size of the object.  
> 
> Why is that? "p" points to "struct P", which has a fixed size. There
> must be an assumption somewhere that a pointer is allocated,
> otherwise
> __bos would almost never work?

It often does not work, because it relies on the optimizer
propagating the information instead of the type system.

This is why it would be better to have proper *bounds* checks,
and not just object size checks. It is not quite clear to me
how BOS and bounds checking are supposed to work together,
but FAMs should be bounds checked.
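
A small sketch of the point about __builtin_object_size depending on what the optimizer can see rather than on the pointee type. `struct P` here is a made-up layout, not the one from PR109557, and the function names are hypothetical. The empty asm hides the origin of the pointer from the compiler, the same trick as the `asm("":"+r"(e))` in pr110252-2.c.

```c
#include <stddef.h>

/* Made-up struct with a fixed size; not the one from the PR.  */
struct P
{
  int a;
  int b[4];
};

struct P toy_global_p;

/* The object is named directly, so the compiler can see its declaration.  */
size_t
toy_size_of_known_object (void)
{
  return __builtin_object_size (&toy_global_p, 0);
}

/* The pointer's origin is hidden, so only the pointee type is known.  */
size_t
toy_size_through_opaque_pointer (struct P *p)
{
  __asm__ ("" : "+r" (p));
  return __builtin_object_size (p, 0);
}
```

For the second function the builtin reports "unknown", which for type 0 is (size_t) -1, even though the pointee type has a fixed size. The first function is expected to fold to the size of the object, since the declaration is visible at the query.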

...

> 
> > 
> > > This may be
> > > desirable in a few situations. One example would be a large
> > > allocation
> > > that is slowly filled up by the program.
> > 
> > > So, for such a situation, whenever the allocation is filled up, the
> > > field that holds the “counted_by” attribute should be increased at
> > > the same time.
> > > Then the “counted_by” value always stays in sync with the real allocation.
> > > I.e. the counted_by member is
> > > slowly increased during runtime (but not beyond the true
> > > allocation size).
> > 
> > > Then there should be source code to increase the “counted_by” field
> > > whenever the allocated space is increased, too.
> > > 
> > > Of course allocation size is only available in limited
> > > situations, so
> > > the loss of that info is fine: we have counted_by for everything
> > > else.
> > 
> > > The point is: the allocation size should stay in sync with the value of
> > > “counted_by”. LLVM’s RFC also has a similar requirement:
> > https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18
> 
> Right, I'm saying it would be nice if __alloc_size was checked as
> well,
> in the sense that if it is available, it knows without question what
> the
> size of the allocation is. If __alloc_size and __counted_by conflict,
> the smaller of the two should be the truth.
> 
> But, as I said, if there is some need to explicitly ignore
> __alloc_size
> when __counted_by is present, I can live with it; we just need to
> document it.
> 
> If the RFC and you agree that the __counted_by variable can only ever
> be
> (re)assigned after the flex array has been (re)allocated, then I
> guess
> we'll see how it goes. :) I think most places in the kernel using
> __counted_by will be fine, but I suspect we may have cases where we
> need
> to update it like in the loop I described above. If that's true, we
> can
> revisit the requirement then. :)

It should be the other way round: You should first set
'count' and then reassign the pointer, because you can then
often check the pointer assignment (reading 'count').  The
other way round this works only sometimes, i.e. if both
assignments are close together and the optimizer can see this.
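
One possible reading of this ordering, as a toy sketch in C. It models the counted storage as a separate pointer member instead of a real flexible array member, and `toy_buf`, `toy_assign` and `toy_grow` are hypothetical names; no attribute or compiler check is involved. The point is only that when the count is written first, the pointer assignment has something to be checked against.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf
{
  size_t count;		/* Plays the role of the counted_by member.  */
  int *data;		/* Plays the role of the counted storage.  */
};

/* Checked assignment: accept STORAGE only if it has room for the count
   that is already recorded.  Returns 0 on success, -1 if rejected.  */
int
toy_assign (struct toy_buf *b, int *storage, size_t storage_elems)
{
  if (storage_elems < b->count)
    return -1;
  b->data = storage;
  return 0;
}

/* Grow or shrink B to NEW_COUNT elements: allocate, set the count first,
   then assign the pointer through the check.  Returns 0 on success and
   -1 if the allocation fails (B is left unchanged in that case).  */
int
toy_grow (struct toy_buf *b, size_t new_count)
{
  int *storage = calloc (new_count, sizeof *storage);
  if (storage == NULL)
    return -1;

  size_t keep = b->count < new_count ? b->count : new_count;
  if (keep > 0)
    memcpy (storage, b->data, keep * sizeof *storage);

  int *old = b->data;
  b->count = new_count;			/* 1. Count first.  */
  (void) toy_assign (b, storage, new_count);	/* 2. Then the pointer.  */
  free (old);
  return 0;
}
```

With the opposite order the check in `toy_assign` would compare against the stale count, which is the situation where a compiler would have to see both assignments together to get it right.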



Martin






Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2023-07-19 Thread Kewen.Lin via Gcc-patches
Hi Fangrui,

on 2023/7/19 14:33, Fangrui Song wrote:
> On Thu, Nov 24, 2022 at 7:26 PM Kewen.Lin via Gcc-patches
>  wrote:
>>
>> Hi Richard,
>>
>> on 2022/11/23 00:08, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
 Hi Richard,

 Many thanks for your review comments!

>>> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
 Hi,

 As discussed in PR98125, -fpatchable-function-entry with
 SECTION_LINK_ORDER support doesn't work well on powerpc64
 ELFv1 because the filled "Symbol" in

   .section name,"flags"o,@type,Symbol

 sits in .opd section instead of in the function_section
 like .text or named .text*.

 Since we already generates one label LPFE* which sits in
 function_section of current_function_decl, this patch is
 to reuse it as the symbol for the linked_to section.  It
 avoids the above ABI specific issue when using the symbol
 concluded from current_function_decl.

 Besides, with this support some previous workarounds for
 powerpc64 ELFv1 can be reverted.

 btw, rs6000_print_patchable_function_entry can be dropped
 but there is another rs6000 patch which needs this rs6000
 specific hook rs6000_print_patchable_function_entry, not
 sure which one gets landed first, so just leave it here.

 Bootstrapped and regtested on below:

   1) powerpc64-linux-gnu P8 with default binutils 2.27
  and latest binutils 2.39.
   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
   4) x86_64-redhat-linux with default binutils 2.30
  and latest binutils 2.39.
   5) aarch64-linux-gnu  with default binutils 2.30
  and latest binutils 2.39.


 [snip...]

 diff --git a/gcc/varasm.cc b/gcc/varasm.cc
 index 4db8506b106..d4de6e164ee 100644
 --- a/gcc/varasm.cc
 +++ b/gcc/varasm.cc
 @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char 
 *name, unsigned int flags,
  fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
if (flags & SECTION_LINK_ORDER)
  {
 -  tree id = DECL_ASSEMBLER_NAME (decl);
 -  ultimate_transparent_alias_target (&id);
 -  const char *name = IDENTIFIER_POINTER (id);
 -  name = targetm.strip_name_encoding (name);
 -  fprintf (asm_out_file, ",%s", name);
 +  /* For now, only section "__patchable_function_entries"
 + adopts flag SECTION_LINK_ORDER, internal label LPFE*
 + was emitted in default_print_patchable_function_entry,
 + just place it here for linked_to section.  */
 +  gcc_assert (!strcmp (name, "__patchable_function_entries"));
>
> I like the idea of removing the rs6000 workaround in favour of making the
> target-independent code more robust.  But this seems a bit hackish.  What
> would we do if SECTION_LINK_ORDER was used for something else in future?
>

 Good question!  I think it depends on how we can get the symbol for the
 linked_to section, if adopting the name of the decl will suffer the
 similar issue which this patch wants to fix, we have to reuse the label
 LPFE* or some kind of new artificial label in the related section; or
 we can just go with the name of the given decl, or something related to
 that decl.  Since we can't predict any future uses, I just placed an
 assertion here to ensure that we would revisit and adjust this part at
 that time.  Does it sound reasonable to you?
>>>
>>> Yeah, I guess that's good enough.  If the old scheme ends up being
>>> correct for some future use, we can make the new behaviour conditional
>>> on __patchable_function_entries.
>>
>> Yes, we can check if the given section name is
>> "__patchable_function_entries".
>>
>>>
>>> So yeah, the patch LGTM to me, thanks.
>>
>> Thanks again!  I rebased and re-tested it on x86/aarch64/powerpc64{,le},
>> just committed in r13-4294-gf120196382ac5a.
>>
>> BR,
>> Kewen
> 
> Hi, Kewen, do you think your patch fixed
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110729
> (__patchable_function_entries has wrong sh_link) ?

I just checked and confirmed that it did fix the wrong
sh_link. Previously it always used the decl saved in
named.decl (always f for the test case in PR110729);
with this patch, it switches to the label in the
corresponding .text* (function section).
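
For illustration, a hypothetical input of the kind such a check could start from. This is not the test case from PR110729: the function names are made up, and it assumes a target that supports the patchable_function_entry attribute and a build with -ffunction-sections so that the two functions land in different .text.* sections.

```c
/* Two functions, each requesting one patchable NOP at its entry.
   When built with -ffunction-sections they end up in separate text
   sections, so each __patchable_function_entries record should be
   linked to its own function's section.  */
__attribute__ ((patchable_function_entry (1, 0), noinline))
int
toy_f (int x)
{
  return x + 1;
}

__attribute__ ((patchable_function_entry (1, 0), noinline))
int
toy_g (int x)
{
  return x * 2;
}
```

Inspecting the generated assembly for the `.section __patchable_function_entries` directives would then show which symbol each one names.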

> If yes, it may be useful to include some assembly tests... Right now
> 
> rg '\.section.*__patchable' gcc/testsuite/
> 
> returns nothing.

It's a good idea to add some testing coverage, I'm going
to make a test ca

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread YunQiang Su via Gcc-patches
I am not sure this patch is the best fix, but I am fairly sure the
initial RTL is not correct.
The initial RTL on ARM64 looks like this:

(insn 8 7 9 2 (set (zero_extract:SI (reg/v:SI 98 [ val ])
  ^^
(const_int 8 [0x8])
(const_int 0 [0]))
(reg:SI 102)) "xx.c":3:29 -1
 (nil))


On Wed, Jul 19, 2023 at 16:25, YunQiang Su wrote:
>
> On Wed, Jul 19, 2023 at 16:21, YunQiang Su wrote:
> >
> > On Wed, Jul 19, 2023 at 15:22, Richard Biener wrote:
> > >
> > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > >
> > > > On Wed, Jul 19, 2023 at 14:27, Richard Biener via Gcc-patches wrote:
> > > > >
> > > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > > >
> > > > > > PR #104914
> > > > > >
> > > > > > When work with
> > > > > >   int val;
> > > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > > >   if (val > 0) ...
> > > > > > The RTX mode is obtained from REG instead of SUBREG, which make
> > > > > > D is used instead of .  Thus something wrong happens
> > > > > > on sign-extend default architectures, like MIPS64.
> > > > > >
> > > > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > > > store_integral_bit_field if:
> > > > > >   modes of op0 and str_rtx are INT;
> > > > > >   length of op0 is greater than str_rtx.
> > > > > >
> > > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > > mips64el-linux-gnuabi64 without regression.
> > > > >
> > > > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > > > audit trail points to combine and later notes an eventual expansion
> > > > > issue (but for another testcase/target).
> > > > >
> > > > > You have to explain in more detail on what is wrong with the initial
> > > > > RTL on mips.
> > > > >
> > > >
> > > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is 
> > > > like
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > > > (const_int 8 [0x8])
> > > > (const_int 0 [0]))
> > > > (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > > >  (nil))
> > > >
> > > > Note, all of the REGs are in DImode. On MIPS64, it will expand to `DINS`
> > > > instructions.
> > > > While in fact here, we expect an SImode operation, due to `val` in C
> > > > code is `int`.
> > > >
> > > > With my patch, the RTX will be like:
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 
> > > > 0)
> > > > (const_int 8 [0x8])
> > > > (const_int 0 [0]))
> > > > (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > > >  (nil))
> > >
> > > But if this RTL is correct then the above with DImode is correct as
> > > well and the issue is in the backend definition of the instruction
> > > defining 'DINS'?
> > >
> >
> > I don't think so.
> >
> > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> >  ^^
> >  (const_int 8 [0x8])
> >  (const_int 0 [0]))
> >  (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> >   (nil))
> >
> > This RTL has only info about DI. It doesn't has any info about the
> > real length of
> > `val`. For backend, it has no other choice instead of `DINS`.
> >
> > > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > > >
> > > > The problem is based on 2 fact/root cause:
> > > > 1. MIPS's `INS` instruction will be always to sign-extension, while 
> > > > `DINS` won't
> > > > li $7, 0xff
> > > > li $8, 0
> > > > ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> > >
> > > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > > bits outside of the indicated range.
> > >
> >
> > In fact, it is how sign-extension arch work.
> > No matter wrong or right, the ISA was/is defined like this.
> >
> > In fact, on the MIPS 32 ABI, the same C code will generate RTL like this,
> > and the 32-bit object can still work on a 64-bit CPU.
> > That's a smart (or brain-damaged) design.
> >
> > > @findex zero_extract
> > > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > > bit-field.  The same sequence of bits are extracted, but they
> > > are filled to an entire word with zeros instead of by sign-extension.
> > >
> >
> > That's depending on the definition of `word` here.
> > For `(zero_extract:SI`, I think that the word is limit to the low 32bit of
> > hardware register.
> > Anyway, it won't break ISA without sign-extension by default.
> >
> > Due to the nature of sign-extension ISA, if we don't sign-extension the
> > `int` variable, it will make something wrong.
> >
> > To make it clear: the term `sign extension` here means:
> >    the value of bit 31 will be copied to bits [32-63], and
> >    the value of bits [0-30] won't be copied.
> > Here is the examples:
> > li $7,

Re: [PATCH] core: Support heap-based trampolines

2023-07-19 Thread Martin Uecker via Gcc-patches



> 
> > On 17 Jul 2023, 
> 

> >> You mention setjmp/longjmp - on darwin and other platforms
> requiring
> >> non-stack based trampolines
> >> does the system runtime provide means to deal with this issue like
> an
> >> alternate allocation method
> >> or a way to register cleanup?
> > 
> > There is an alternate mechanism relying on system libraries that is
> possible on darwin specifically (I don’t know for other targets) but
> it will only work for signed binaries, and would require us to
> codesign everything produced by gcc. During development, it was
> deemed too big an ask and the current strategy was chosen (Iain can
> surely add more background on that if needed).
> 
> I do not think that this solves the setjmp/longjmp issue - since
> there’s still a notional allocation that takes place (it’s just that
> the mechanism for determining permissions is different).
> 
> It is also a big barrier for the general user - and prevents normal
> folks from distributing GCC - since codesigning requires an external
> certificate (i.e. I would really rather avoid it).
> 
> >> Was there ever an attempt to provide a "generic" trampoline driven
> by
> >> a more complex descriptor?
> 
> We did look at the “unused address bits” mechanism that Ada has used
> - but that is not really available to a non-private ABI (unless the
> system vendor agrees to change ABI to leave a bit spare) for the base
> arch either the bits are not there (e.g. X86) or reserved (e.g.
> AArch64).
> 
> Andrew Burgess did the original work he might have comments on
> alternatives we tried
> 

For reference, I proposed a patch for this in 2018. It was not
accepted because minimum alignment for functions would increase
for some archs:

https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html



> >> (well, it could be a bytecode interpreter and the trampoline being
> >> bytecode on the stack?!)
> > 
> > My own opinion is that executable stack should go away on all
> targets at some point, so a truly generic solution to the problem
> would be great.
> 
> indeed it would.
> 

I think we need a solution rather sooner than later on all archs.

Martin

> > Having something that works reliably across all targets, like you
> suggest, is a much bigger project than this patch, and I am not aware
> of any previous attempt at it.
> 
> The bytecode interpreter idea is neat;  (a) I wonder about
> performance and (b) it is, as FX says, a bigger project - certainly
> bigger than the voluntary Darwin time available :(
> 
> Iain
> 
> > 
> > 
> >> Otherwise I suggest to split the patch into libgcc, generic and
> target parts.
> > 
> > 
> 




Re: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-19 Thread Richard Biener via Gcc-patches
On Tue, Jul 11, 2023 at 5:00 AM Di Zhao OS via Gcc-patches
 wrote:
>
> Attached is an updated version of the patch.
>
> Based on Philipp's review, some changes:
>
> 1. Defined new enum fma_state to describe the state of FMA candidates
>for a list of operands. (Since the tests seems simple after the
>change, I didn't add predicates on it.)
> 2. Changed return type of convert_mult_to_fma_1 and convert_mult_to_fma
>to tree, to remove the in/out parameter.
> 3. Added description of return value values of rank_ops_for_fma.

I'll note that rank_ops_for_fma works on the single-use addition chain only so
there might be FMA ops "elsewhere" in the dependence chain that are not
in 'ops'.

You are using defer_p = false for the fma_deferring_state so I wonder why
you need this machinery at all?  Wouldn't the restriction only apply when
we'd actually deferred a FMA generation?  I'm CCing Martin who did this work.

But what I'm curious about is that if any of the new conditions trigger then
you leave the ops chain alone - but it _could_ already be in "bad" shape,
is there anything we could do to improve ordering?

Also

   if (ops_mult.length () >= 2 && ops_mult.length () != ops_length)
 {
+  if (maybe_le (tree_to_poly_int64 (TYPE_SIZE (type)),
+   param_avoid_fma_max_bits))
+   /* Avoid re-arrange to produce less FMA chains that can be slow.  */
+   return FMA_STATE_MULTIPLE;
+
   /* Put no-mult ops and mult ops alternately at the end of the
 queue, which is conducive to generating more FMA and reducing the
 loss of FMA when breaking the chain.  */
@@ -6829,9 +6909,9 @@ rank_ops_for_fma (vec *ops)
  if (opindex > 0)
opindex--;
}
-  return true;
+  return FMA_STATE_MULTIPLE;

so we end up parallel rewriting in any case and just avoid
"optimizing" the ops list here.
From the PR it looks like without the rewriting we are lucky that the FMA
generation heuristic to avoid cross backedge FMA dependences doesn't trigger
without associating (but parallel rewrite is still good)?

For the NESTED case we avoid parallel rewriting completely, independent of
param_avoid_fma_max_bits - it seems from the PR we want more FMAs here
and the parallel rewriting makes it worse?  But I don't see exactly how.
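
To make the trade-off concrete, a rough sketch of the two shapes in plain C. The function names are made up and this is not the testcase from the PR. Whether FMAs are actually formed depends on the target and on contraction flags, and reassoc only reorders floating-point sums when re-association is allowed (e.g. -fassociative-math).

```c
/* Serial form: every multiply feeds the running sum, so each step is a
   candidate for one FMA, but the FMAs form one long dependence chain.  */
double
toy_dot4_chain (const double *a, const double *b, double acc)
{
  acc += a[0] * b[0];
  acc += a[1] * b[1];
  acc += a[2] * b[2];
  acc += a[3] * b[3];
  return acc;
}

/* Parallel form: two independent partial sums shorten the dependence
   chain, but in each "mult + mult" only one multiply can fuse with the
   addition, so fewer FMAs are available.  */
double
toy_dot4_parallel (const double *a, const double *b, double acc)
{
  double s0 = a[0] * b[0] + a[1] * b[1];
  double s1 = a[2] * b[2] + a[3] * b[3];
  return (s0 + s1) + acc;
}
```

Both functions compute the same sum for inputs where every intermediate value is exactly representable; they differ only in how many multiplies sit directly under an addition and in how long the dependence chain is.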

I think these are all rather fragile heuristics, also given that reassoc
works on single-use chains only.  The more we intertwine FMA creation in
widen_mult and associating for FMA in reassoc, the more I think that reassoc
itself should form the FMAs.
The passes are unfortunately quite a bit separated.

Can you produce testcases for the two problematical cases in the PR?

That said, the added heuristics are not looking universally good to me without
better motivation.

> ---
> gcc/ChangeLog:
>

Missing

   PR tree-optimization/110279

so bugzilla picks up the commit.

> * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new parameter
> check_only_p. Changed return type to tree.
> (struct fma_transformation_info): Moved to header.
> (class fma_deferring_state): Moved to header.
> (convert_mult_to_fma): Added new parameter check_only_p. Changed
> return type to tree.
> * tree-ssa-math-opts.h (struct fma_transformation_info): Moved from 
> .cc.
> (class fma_deferring_state): Moved from .cc.
> (convert_mult_to_fma): Add function decl.
> * tree-ssa-reassoc.cc (enum fma_state): Defined new enum to describe
> the state of FMA candidates for a list of operands.
> (rewrite_expr_tree_parallel): Changed boolean parameter to enum type.
> (rank_ops_for_fma): Return enum fma_state.
> (reassociate_bb): Avoid rewriting to parallel if nested FMAs are 
> found.
>
> Thanks,
> Di Zhao
>
>


Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, 19 Jul 2023, YunQiang Su wrote:

> On Wed, Jul 19, 2023 at 15:22, Richard Biener wrote:
> >
> > On Wed, 19 Jul 2023, YunQiang Su wrote:
> >
> > > On Wed, Jul 19, 2023 at 14:27, Richard Biener via Gcc-patches wrote:
> > > >
> > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > >
> > > > > PR #104914
> > > > >
> > > > > When work with
> > > > >   int val;
> > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > >   if (val > 0) ...
> > > > > The RTX mode is obtained from REG instead of SUBREG, which make
> > > > > D is used instead of .  Thus something wrong happens
> > > > > on sign-extend default architectures, like MIPS64.
> > > > >
> > > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > > store_integral_bit_field if:
> > > > >   modes of op0 and str_rtx are INT;
> > > > >   length of op0 is greater than str_rtx.
> > > > >
> > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > mips64el-linux-gnuabi64 without regression.
> > > >
> > > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > > audit trail points to combine and later notes an eventual expansion
> > > > issue (but for another testcase/target).
> > > >
> > > > You have to explain in more detail on what is wrong with the initial
> > > > RTL on mips.
> > > >
> > >
> > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is like
> > >
> > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > > (const_int 8 [0x8])
> > > (const_int 0 [0]))
> > > (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > >  (nil))
> > >
> > > Note, all of the REGs are in DImode. On MIPS64, it will expand to `DINS`
> > > instructions.
> > > While in fact here, we expect an SImode operation, due to `val` in C
> > > code is `int`.
> > >
> > > With my patch, the RTX will be like:
> > >
> > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
> > > (const_int 8 [0x8])
> > > (const_int 0 [0]))
> > > (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > >  (nil))
> >
> > But if this RTL is correct then the above with DImode is correct as
> > well and the issue is in the backend definition of the instruction
> > defining 'DINS'?
> >
> 
> I don't think so.
> 
> (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
>  ^^
>  (const_int 8 [0x8])
>  (const_int 0 [0]))
>  (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
>   (nil))
> 
> This RTL has only info about DI. It doesn't has any info about the
> real length of
> `val`. For backend, it has no other choice instead of `DINS`.
> 
> > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > >
> > > The problem is based on 2 fact/root cause:
> > > 1. MIPS's `INS` instruction will be always to sign-extension, while 
> > > `DINS` won't
> > > li $7, 0xff
> > > li $8, 0
> > > ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> >
> > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > bits outside of the indicated range.
> >
> 
> In fact, it is how sign-extension arch work.
> No matter wrong or right, the ISA was/is defined like this.
> 
> In fact, on the MIPS 32 ABI, the same C code will generate RTL like this,
> and the 32-bit object can still work on a 64-bit CPU.
> That's a smart (or brain-damaged) design.
> 
> > @findex zero_extract
> > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > bit-field.  The same sequence of bits are extracted, but they
> > are filled to an entire word with zeros instead of by sign-extension.
> >
> 
> That's depending on the definition of `word` here.
> For `(zero_extract:SI`, I think that the word is limit to the low 32bit of
> hardware register.
> Anyway, it won't break ISA without sign-extension by default.
> 
> Due to the nature of sign-extension ISA, if we don't sign-extension the
> `int` variable, it will make something wrong.
> 
> To make it clear: the term `sign extension` here means:
>    the value of bit 31 will be copied to bits [32-63], and
>    the value of bits [0-30] won't be copied.
> Here is the examples:
> li $7, 0xff
> li $8, 0x00 00 ff 00
> ins $8,$7,16,8
> ^^
> The value of $8 will be: 0x 00 00 00 00 00 ff ff 00
> 
> li $7, 0xff
> li $8, 0x00 00 ff 00
> ins $8,$7,24,8
> ^^
> The value of $8 will be: 0x ff ff ff ff ff 00 ff 00

But that's INS.
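
A toy model in C of the two register values quoted above, to make the difference concrete. It assumes the behaviour described in this thread: on MIPS64, INS works on the low 32 bits and the result is sign-extended from bit 31, while DINS inserts into the 64-bit value without any extension. `toy_ins` and `toy_dins` are hypothetical helpers and ignore the encoding limits of the real instructions.

```c
#include <stdint.h>

/* Model of INS on a 64-bit register: insert the low SIZE bits of RS into
   RT at bit POS, working on the low 32 bits only, then sign-extend the
   32-bit result.  Assumes SIZE > 0 and POS + SIZE <= 32.  */
uint64_t
toy_ins (uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
  uint32_t field = size == 32 ? UINT32_MAX : (UINT32_C (1) << size) - 1;
  uint32_t mask = field << pos;
  uint32_t low = ((uint32_t) rt & ~mask) | (((uint32_t) rs << pos) & mask);

  /* Copy bit 31 of the 32-bit result into bits 32..63.  */
  if (low & UINT32_C (0x80000000))
    return UINT64_C (0xffffffff00000000) | low;
  return low;
}

/* Model of DINS: the same insertion on the full 64-bit value, with no
   extension.  Assumes SIZE > 0 and POS + SIZE <= 64.  */
uint64_t
toy_dins (uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
  uint64_t field = size == 64 ? UINT64_MAX : (UINT64_C (1) << size) - 1;
  uint64_t mask = field << pos;
  return (rt & ~mask) | ((rs << pos) & mask);
}
```

With rt = 0x0000ff00 and rs = 0xff, inserting 8 bits at position 16 leaves the upper half clear in both models, while inserting at position 24 sets bit 31, so only `toy_ins` fills bits 32..63 with ones.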

> > Unlike @code{sign_extract}, this type of expressions can be lvalues
> > in RTL; they may appear on the left side of an assignment, indicating
> > insertion of a value into the specified bit-field.
> > @end table

Note ^^^ applies for (zero_extract ..) as a SET destination.  The
issue is probably that MIPS is WORD_REGISTER_OPER

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, 19 Jul 2023, Richard Biener wrote:

> On Wed, 19 Jul 2023, YunQiang Su wrote:
> 
> > On Wed, Jul 19, 2023 at 15:22, Richard Biener wrote:
> > >
> > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > >
> > > > On Wed, Jul 19, 2023 at 14:27, Richard Biener via Gcc-patches wrote:
> > > > >
> > > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > > >
> > > > > > PR #104914
> > > > > >
> > > > > > When work with
> > > > > >   int val;
> > > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > > >   if (val > 0) ...
> > > > > > The RTX mode is obtained from REG instead of SUBREG, which make
> > > > > > D is used instead of .  Thus something wrong happens
> > > > > > on sign-extend default architectures, like MIPS64.
> > > > > >
> > > > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > > > store_integral_bit_field if:
> > > > > >   modes of op0 and str_rtx are INT;
> > > > > >   length of op0 is greater than str_rtx.
> > > > > >
> > > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > > mips64el-linux-gnuabi64 without regression.
> > > > >
> > > > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > > > audit trail points to combine and later notes an eventual expansion
> > > > > issue (but for another testcase/target).
> > > > >
> > > > > You have to explain in more detail on what is wrong with the initial
> > > > > RTL on mips.
> > > > >
> > > >
> > > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is 
> > > > like
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > > > (const_int 8 [0x8])
> > > > (const_int 0 [0]))
> > > > (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > > >  (nil))
> > > >
> > > > Note, all of the REGs are in DImode. On MIPS64, it will expand to `DINS`
> > > > instructions.
> > > > In fact, we expect an SImode operation here, since `val` in the C
> > > > code is `int`.
> > > >
> > > > With my patch, the RTX will be like:
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 
> > > > 0)
> > > > (const_int 8 [0x8])
> > > > (const_int 0 [0]))
> > > > (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > > >  (nil))
> > >
> > > But if this RTL is correct then the above with DImode is correct as
> > > well and the issue is in the backend definition of the instruction
> > > defining 'DINS'?
> > >
> > 
> > I don't think so.
> > 
> > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> >  ^^
> >  (const_int 8 [0x8])
> >  (const_int 0 [0]))
> >  (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> >   (nil))
> > 
> > This RTL only has info about DI. It doesn't have any info about the
> > real length of `val`. The backend has no choice other than `DINS`.
> > 
> > > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > > >
> > > > The problem is based on 2 facts/root causes:
> > > > 1. MIPS's `INS` instruction will always sign-extend, while
> > > > `DINS` won't
> > > > li $7, 0xff
> > > > li $8, 0
> > > > ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> > >
> > > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > > bits outside of the indicated range.
> > >
> > 
> > In fact, that is how sign-extension archs work.
> > Right or wrong, the ISA was/is defined like this.
> > 
> > In fact, on a MIPS 32-bit ABI, the same C code will generate RTL like this,
> > and the 32-bit object still works on a 64-bit CPU.
> > That's a smart (or brain-damaged) design.
> > 
> > > @findex zero_extract
> > > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > > bit-field.  The same sequence of bits are extracted, but they
> > > are filled to an entire word with zeros instead of by sign-extension.
> > >
> > 
> > That depends on the definition of `word` here.
> > For `(zero_extract:SI`, I think that the word is limited to the low 32 bits
> > of the hardware register.
> > Anyway, it won't break ISAs without sign-extension by default.
> > 
> > Due to the nature of a sign-extension ISA, if we don't sign-extend the
> > `int` variable, something will go wrong.
> > 
> > To make it clear: `sign extension` here means:
> >    the value of bit 31 will be copied to bits [32-63], and
> >    the value of bits [0-30] won't be copied.
> > Here are the examples:
> > li $7, 0xff
> > li $8, 0x00 00 ff 00
> > ins $8,$7,16,8
> > ^^
> > The value of $8 will be: 0x 00 00 00 00 00 ff ff 00
> > 
> > li $7, 0xff
> > li $8, 0x00 00 ff 00
> > ins $8,$7,24,8
> > ^^
> > The value of $8 will be: 0x ff ff ff ff ff 00 ff 00
> 
> But that's INS.
> 
> > > Unli

Re: [PATCH] core: Support heap-based trampolines

2023-07-19 Thread Iain Sandoe
Hi Martin,

> On 19 Jul 2023, at 10:04, Martin Uecker  wrote:

>>> On 17 Jul 2023, 
>> 
> 
>>>> You mention setjmp/longjmp - on darwin and other platforms requiring
>>>> non-stack based trampolines
>>>> does the system runtime provide means to deal with this issue like an
>>>> alternate allocation method
>>>> or a way to register cleanup?
>>> 
>>> There is an alternate mechanism relying on system libraries that is
>>> possible on darwin specifically (I don’t know for other targets) but
>>> it will only work for signed binaries, and would require us to
>>> codesign everything produced by gcc. During development, it was
>>> deemed too big an ask and the current strategy was chosen (Iain can
>>> surely add more background on that if needed).
>> 
>> I do not think that this solves the setjmp/longjmp issue - since
>> there’s still a notional allocation that takes place (it’s just that
>> the mechanism for determining permissions is different).
>> 
>> It is also a big barrier for the general user - and prevents normal
>> folks from distributing GCC - since codesigning requires an external
>> certificate (i.e. I would really rather avoid it).
>> 
>>>> Was there ever an attempt to provide a "generic" trampoline driven by
>>>> a more complex descriptor?
>> 
>> We did look at the “unused address bits” mechanism that Ada has used
>> - but that is not really available to a non-private ABI (unless the
>> system vendor agrees to change ABI to leave a bit spare) for the base
>> arch either the bits are not there (e.g. X86) or reserved (e.g.
>> AArch64).
>> 
>> Andrew Burgess did the original work; he might have comments on
>> alternatives we tried.
>> 
> 
> For reference, I proposed a patch for this in 2018. It was not
> accepted because minimum alignment for functions would increase
> for some archs:
> 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html

Right - that was the one we originally looked at, and it has the issue that it
breaks ABI - and thus would need vendor buy-in to alter, as you say.

>>>> (well, it could be a bytecode interpreter and the trampoline being
>>>> bytecode on the stack?!)
>>> 
>>> My own opinion is that executable stack should go away on all
>>> targets at some point, so a truly generic solution to the problem
>>> would be great.
>> 
>> indeed it would.

> I think we need a solution rather sooner than later on all archs.

AFAICS the heap-based trampolines can work for any arch**, this issue is about
system security policy, rather than arch, specifically?

It seems to me that for any system security policy that permits JIT, (but not
executable stack) the heap-based trampolines are viable.

This seems to be a useful step forward; and we can add some other mechanism
to the flag’s supported list if someone develops one?

Iain

** modulo the target maintainers implementing the builtins.





Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread YunQiang Su via Gcc-patches
Richard Biener wrote on Wed, Jul 19, 2023 at 17:23:
>
> On Wed, 19 Jul 2023, YunQiang Su wrote:
>
> > Richard Biener wrote on Wed, Jul 19, 2023 at 15:22:
> > >
> > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > >
> > > > Richard Biener via Gcc-patches wrote on Wed, Jul 19, 2023 at 14:27:
> > > > >
> > > > > On Wed, 19 Jul 2023, YunQiang Su wrote:
> > > > >
> > > > > > PR #104914
> > > > > >
> > > > > > When work with
> > > > > >   int val;
> > > > > >   ((unsigned char*)&val)[3] = *buf;
> > > > > >   if (val > 0) ...
> > > > > > The RTX mode is obtained from REG instead of SUBREG, which makes
> > > > > > DINS be used instead of INS.  Thus something wrong happens
> > > > > > on sign-extend default architectures, like MIPS64.
> > > > > >
> > > > > > Let's use str_rtx and mode of str_rtx as the parameters for
> > > > > > store_integral_bit_field if:
> > > > > >   modes of op0 and str_rtx are INT;
> > > > > >   length of op0 is greater than str_rtx.
> > > > > >
> > > > > > This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
> > > > > > mips64el-linux-gnuabi64 without regression.
> > > > >
> > > > > I still think you are "fixing" this in the wrong place.  The bugzilla
> > > > > audit trail points to combine and later notes an eventual expansion
> > > > > issue (but for another testcase/target).
> > > > >
> > > > > You have to explain in more detail on what is wrong with the initial
> > > > > RTL on mips.
> > > > >
> > > >
> > > > In the first RTL file, aka xx.c.256r.expand, the zero_extract RTX is 
> > > > like
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> > > > (const_int 8 [0x8])
> > > > (const_int 0 [0]))
> > > > (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> > > >  (nil))
> > > >
> > > > Note, all of the REGs are in DImode. On MIPS64, it will expand to `DINS`
> > > > instructions.
> > > > In fact, we expect an SImode operation here, since `val` in the C
> > > > code is `int`.
> > > >
> > > > With my patch, the RTX will be like:
> > > >
> > > > (insn 10 9 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 
> > > > 0)
> > > > (const_int 8 [0x8])
> > > > (const_int 0 [0]))
> > > > (subreg:SI (reg:QI 202) 0)) "xx.c":4:29 -1
> > > >  (nil))
> > >
> > > But if this RTL is correct then the above with DImode is correct as
> > > well and the issue is in the backend definition of the instruction
> > > defining 'DINS'?
> > >
> >
> > I don't think so.
> >
> > (insn 10 9 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> >  ^^
> >  (const_int 8 [0x8])
> >  (const_int 0 [0]))
> >  (subreg:DI (reg:QI 202) 0)) "../xx.c":4:29 -1
> >   (nil))
> >
> > This RTL only has info about DI. It doesn't have any info about the
> > real length of `val`. The backend has no choice other than `DINS`.
> >
> > > > So the operation will be SImode, aka `INS` instruction for MIPS64.
> > > >
> > > > The problem is based on 2 facts/root causes:
> > > > 1. MIPS's `INS` instruction will always sign-extend, while
> > > > `DINS` won't
> > > > li $7, 0xff
> > > > li $8, 0
> > > > ins $8,$7,24,8  # set the 24-32 bits of $8 to 0xff.
> > > > The value of $8 will be 0xff ff ff ff ff 00 00 00.
> > >
> > > But that's wrong.  (set (zero_extract:SI ...) should not affect
> > > bits outside of the indicated range.
> > >
> >
> > In fact, that is how sign-extension archs work.
> > Right or wrong, the ISA was/is defined like this.
> >
> > In fact, on a MIPS 32-bit ABI, the same C code will generate RTL like this,
> > and the 32-bit object still works on a 64-bit CPU.
> > That's a smart (or brain-damaged) design.
> >
> > > @findex zero_extract
> > > @item (zero_extract:@var{m} @var{loc} @var{size} @var{pos})
> > > Like @code{sign_extract} but refers to an unsigned or zero-extended
> > > bit-field.  The same sequence of bits are extracted, but they
> > > are filled to an entire word with zeros instead of by sign-extension.
> > >
> >
> > That depends on the definition of `word` here.
> > For `(zero_extract:SI`, I think that the word is limited to the low 32 bits
> > of the hardware register.
> > Anyway, it won't break ISAs without sign-extension by default.
> >
> > Due to the nature of a sign-extension ISA, if we don't sign-extend the
> > `int` variable, something will go wrong.
> >
> > To make it clear: `sign extension` here means:
> >    the value of bit 31 will be copied to bits [32-63], and
> >    the value of bits [0-30] won't be copied.
> > Here are the examples:
> > li $7, 0xff
> > li $8, 0x00 00 ff 00
> > ins $8,$7,16,8
> > ^^
> > The value of $8 will be: 0x 00 00 00 00 00 ff ff 00
> >
> > li $7, 0xff
> > li $8, 0x00 00 ff 00
> > ins $8,$7,24,8
> > ^^
> > The value of $8 will be: 0x ff ff ff ff ff 00 ff 00
>
> But that's INS.
>
> > > Unlike @code{sign_ext

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread Eric Botcazou via Gcc-patches
> I don't see that.  That's definitely not what GCC expects here,
> the left-most word of the doubleword should be unchanged.
> 
> Your testcase should be a dg-do-run and probably more like
> 
> NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
> {
>   int val;
>   ((unsigned char*)&val)[0] = *buf++;
>   ((unsigned char*)&val)[1] = *buf++;
>   ((unsigned char*)&val)[2] = *buf++;
>   ((unsigned char*)&val)[3] = *buf++;
>   return val;
> }
> int main()
> {
>   int val = 0x01020304;
>   val = test (&val);
>   if (val != 0x01020304)
> abort ();
> }
> 
> not sure if I got endianess correct.  Now, the question is what
> WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
> the MIPS ABI says for returning SImode.

WORD_REGISTER_OPERATIONS must *not* be taken into account for bit-fields, see e.g.
word_register_operation_p:

/* Return true if X is an operation that always operates on the full
   registers for WORD_REGISTER_OPERATIONS architectures.  */

inline bool
word_register_operation_p (const_rtx x)
{
  switch (GET_CODE (x))
    {
    case CONST_INT:
    case ROTATE:
    case ROTATERT:
    case SIGN_EXTRACT:
    case ZERO_EXTRACT:
      return false;

    default:
      return true;
    }
}

-- 
Eric Botcazou




[committed 1/3] libstdc++: Check autoconf macros for strtof and strtold [PR110653]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

As well as the _GLIBCXX_USE_C99_STDLIB check, we also have a separate
check in linkage.m4 for just strtof and strtold. We can use that to
declare std::strtof and std::strtold in <cstdlib> for additional
targets. That allows us to enable std::stold on hpux11.11 which is
missing strtoll, strtoull and strtof, so doesn't define
_GLIBCXX_USE_C99_STDLIB. Although it doesn't help hpux11.11, we can
define std::stof for more targets this way too.

As with the previous commit for PR110653, this only affects the narrow
character overloads. std::stof and std::stold for wstring still require
C99  support.

libstdc++-v3/ChangeLog:

PR libstdc++/110653
* include/bits/basic_string.h [_GLIBCXX_HAVE_STRTOF] (stof):
Define.
[_GLIBCXX_HAVE_STRTOLD] (stold): Define.
* include/c_global/cstdlib [_GLIBCXX_HAVE_STRTOF] (strtof):
Declare in namespace std.
[_GLIBCXX_HAVE_STRTOLD] (strtold): Likewise.
---
 libstdc++-v3/include/bits/basic_string.h |  6 --
 libstdc++-v3/include/c_global/cstdlib| 14 ++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 01e25dad20e..32f5d4421f7 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -4148,12 +4148,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   stod(const string& __str, size_t* __idx = 0)
   { return __gnu_cxx::__stoa(&std::strtod, "stod", __str.c_str(), __idx); }
 
-#if _GLIBCXX_USE_C99_STDLIB
+#if _GLIBCXX_USE_C99_STDLIB || _GLIBCXX_HAVE_STRTOF
   // NB: strtof vs strtod.
   inline float
   stof(const string& __str, size_t* __idx = 0)
   { return __gnu_cxx::__stoa(&std::strtof, "stof", __str.c_str(), __idx); }
+#endif
 
+#if _GLIBCXX_USE_C99_STDLIB || _GLIBCXX_HAVE_STRTOLD
   inline long double
   stold(const string& __str, size_t* __idx = 0)
   { return __gnu_cxx::__stoa(&std::strtold, "stold", __str.c_str(), __idx); }
@@ -4161,7 +4163,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   inline long double
   stold(const string& __str, size_t* __idx = 0)
   { return std::stod(__str, __idx); }
-#endif // _GLIBCXX_USE_C99_STDLIB
+#endif
 
   // DR 1261. Insufficent overloads for to_string / to_wstring
 
diff --git a/libstdc++-v3/include/c_global/cstdlib b/libstdc++-v3/include/c_global/cstdlib
index aeb961ad69d..60317aa9a4a 100644
--- a/libstdc++-v3/include/c_global/cstdlib
+++ b/libstdc++-v3/include/c_global/cstdlib
@@ -256,6 +256,20 @@ namespace std
   using ::__gnu_cxx::strtold;
 } // namespace std
 
+#else  // ! _GLIBCXX_USE_C99_STDLIB
+
+// We also check for strtof and strtold separately from _GLIBCXX_USE_C99_STDLIB
+
+#if _GLIBCXX_HAVE_STRTOF
+#undef strtof
+namespace std { using ::strtof; }
+#endif
+
+#if _GLIBCXX_HAVE_STRTOLD
+#undef strtold
+namespace std { using ::strtold; }
+#endif
+
 #endif // _GLIBCXX_USE_C99_STDLIB
 
 } // extern "C++"
-- 
2.41.0



[committed 3/3] libstdc++: Enable tests for std::stoi etc. unconditionally [PR110653]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Since the narrow string versions of std::stoi, std::stol, std::stoul,
std::stof and std::stod are now always defined, we don't need to check
dg-require-string-conversions in the relevant tests.

libstdc++-v3/ChangeLog:

PR libstdc++/110653
* testsuite/21_strings/basic_string/numeric_conversions/char/stod.cc:
Remove dg-require-string-conversions.
* testsuite/21_strings/basic_string/numeric_conversions/char/stof.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/char/stoi.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/char/stol.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/char/stoul.cc:
Likewise.
---
 .../21_strings/basic_string/numeric_conversions/char/stod.cc | 1 -
 .../21_strings/basic_string/numeric_conversions/char/stof.cc | 1 -
 .../21_strings/basic_string/numeric_conversions/char/stoi.cc | 1 -
 .../21_strings/basic_string/numeric_conversions/char/stol.cc | 1 -
 .../21_strings/basic_string/numeric_conversions/char/stoul.cc| 1 -
 5 files changed, 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stod.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stod.cc
index 062a0203c7c..c899a69ec17 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stod.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stod.cc
@@ -1,5 +1,4 @@
 // { dg-do run { target c++11 } }
-// { dg-require-string-conversions "" }
 // { dg-xfail-run-if "broken long double IO" { newlib_broken_long_double_io  } }
 
 // 2008-06-15  Paolo Carlini  
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stof.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stof.cc
index 584af6d072e..a06cd7e48c7 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stof.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stof.cc
@@ -1,5 +1,4 @@
 // { dg-do run { target c++11 } }
-// { dg-require-string-conversions "" }
 
 // 2008-06-15  Paolo Carlini  
 
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoi.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoi.cc
index 1ef89728caf..f7d88065d94 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoi.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoi.cc
@@ -1,5 +1,4 @@
 // { dg-do run { target c++11 } }
-// { dg-require-string-conversions "" }
 
 // 2008-06-15  Paolo Carlini  
 
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stol.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stol.cc
index 4617e38c2a9..a3e4e9cdd76 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stol.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stol.cc
@@ -1,5 +1,4 @@
 // { dg-do run { target c++11 } }
-// { dg-require-string-conversions "" }
 
 // 2008-06-15  Paolo Carlini  
 
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoul.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoul.cc
index 1562d5d9fbe..705feaa5e84 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoul.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/stoul.cc
@@ -1,5 +1,4 @@
 // { dg-do run { target c++11 } }
-// { dg-require-string-conversions "" }
 
 // 2008-06-15  Paolo Carlini  
 
-- 
2.41.0



[committed 2/3] libstdc++: Define std::stof fallback in terms of std::stod [PR110653]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

For targets without std::strtof we can define std::stof by calling
std::stod and then checking if the result is out of range of float.
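
As an illustration of that range check outside the library, here is a stand-alone sketch using only public standard-library calls. The name `stof_via_stod` is hypothetical. Unlike the hunk below, the sketch also lets an exact zero through, on the assumption that "0" should convert rather than be reported as out of range (zero is smaller than FLT_MIN but representable as float).

```cpp
#include <cassert>
#include <cerrno>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical fallback: parse as double, then reject values that a float
// cannot hold as a normal number.
inline float stof_via_stod(const std::string& str, std::size_t* idx = nullptr)
{
  const double d = std::stod(str, idx);
  // Infinities and NaNs pass straight through; zero is assumed to be valid.
  if (std::isfinite(d) && d != 0.0)
    {
      const double abs_d = std::fabs(d);
      if (abs_d < FLT_MIN || abs_d > FLT_MAX)
        {
          errno = ERANGE;
          throw std::out_of_range("stof");
        }
    }
  return static_cast<float>(d);
}

// Usage: in-range input converts, out-of-range input throws.
inline void stof_via_stod_demo()
{
  assert(stof_via_stod("1.5") == 1.5f);
  bool threw = false;
  try { stof_via_stod("1e300"); }
  catch (const std::out_of_range&) { threw = true; }
  assert(threw);
}
```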

libstdc++-v3/ChangeLog:

PR libstdc++/110653
* include/bits/basic_string.h [!_GLIBCXX_HAVE_STRTOF] (stof):
Define in terms of std::stod.
---
 libstdc++-v3/include/bits/basic_string.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 32f5d4421f7..e4cb9846025 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -4153,6 +4153,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   inline float
   stof(const string& __str, size_t* __idx = 0)
   { return __gnu_cxx::__stoa(&std::strtof, "stof", __str.c_str(), __idx); }
+#else
+  inline float
+  stof(const string& __str, size_t* __idx = 0)
+  {
+double __d = std::stod(__str, __idx);
+if (__builtin_isfinite(__d))
+  {
+   double __abs_d = __builtin_fabs(__d);
+   if (__abs_d < __FLT_MIN__ || __abs_d > __FLT_MAX__)
+ {
+   errno = ERANGE;
+   std::__throw_out_of_range("stof");
+ }
+  }
+return __d;
+  }
 #endif
 
 #if _GLIBCXX_USE_C99_STDLIB || _GLIBCXX_HAVE_STRTOLD
-- 
2.41.0



[PATCH 0/5] Recognize Zicond extension

2023-07-19 Thread Xiao Zeng
Hi all RISC-V folks:

This series of patches completes support for the riscv architecture's
Zicond standard extension instruction set.

Currently, Zicond is in a frozen state.

See the Zicond specification for details:
https://github.com/riscv/riscv-zicond/releases/download/v1.0-rc2/riscv-zicond-v1.0-rc2.pdf

Prior to this, other community members have also done related work, as shown 
in: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611767.html
https://sourceware.org/pipermail/binutils/2023-January/125773.html

Xiao Zeng (5):
  [RISC-V] Recognize Zicond extension
  [RISC-V] Generate Zicond instruction for basic semantics
  [RISC-V] Generate Zicond instruction for select pattern with condition
eq or neq to 0
  [RISC-V] Generate Zicond instruction for select pattern with condition
eq or neq to non-zero
  [RISC-V] Generate Zicond instruction for conditional execution

 gcc/common/config/riscv/riscv-common.cc   |   3 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv.cc | 141 +
 gcc/config/riscv/riscv.md |   3 +-
 gcc/config/riscv/zicond.md|  84 +++
 gcc/ifcvt.cc  | 251 
 gcc/testsuite/gcc.target/riscv/attribute-20.c |   6 +
 gcc/testsuite/gcc.target/riscv/attribute-21.c |   6 +
 ...ionalArithmetic_compare_0_return_imm_reg.c | 553 +
 ...ionalArithmetic_compare_0_return_reg_reg.c | 585 ++
 ...nalArithmetic_compare_imm_return_imm_reg.c | 297 +
 ...nalArithmetic_compare_imm_return_reg_reg.c | 297 +
 ...nalArithmetic_compare_reg_return_imm_reg.c | 297 +
 ...nalArithmetic_compare_reg_return_reg_reg.c | 329 ++
 .../riscv/zicond-primitiveSemantics.c |  49 ++
 .../zicond-primitiveSemantics_compare_imm.c   |  57 ++
 ...mitiveSemantics_compare_imm_return_0_imm.c |  73 +++
 ...tiveSemantics_compare_imm_return_imm_imm.c |  73 +++
 ...tiveSemantics_compare_imm_return_imm_reg.c |  65 ++
 ...tiveSemantics_compare_imm_return_reg_reg.c |  65 ++
 .../zicond-primitiveSemantics_compare_reg.c   |  65 ++
 ...mitiveSemantics_compare_reg_return_0_imm.c |  73 +++
 ...tiveSemantics_compare_reg_return_imm_imm.c |  73 +++
 ...tiveSemantics_compare_reg_return_imm_reg.c |  65 ++
 ...tiveSemantics_compare_reg_return_reg_reg.c |  77 +++
 .../zicond-primitiveSemantics_return_0_imm.c  |  65 ++
 ...zicond-primitiveSemantics_return_imm_imm.c |  73 +++
 ...zicond-primitiveSemantics_return_imm_reg.c |  65 ++
 ...zicond-primitiveSemantics_return_reg_reg.c |  65 ++
 29 files changed, 3857 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/zicond.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-21.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_0_return_imm_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_0_return_reg_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_imm_return_imm_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_imm_return_reg_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_reg_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_0_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_0_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c

-- 
2.17.1



[PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-07-19 Thread Xiao Zeng
This patch completes the recognition of the basic semantics
defined in the spec, namely:

Conditional zero, if condition is equal to zero
  rd = (rs2 == 0) ? 0 : rs1
Conditional zero, if condition is non zero
  rd = (rs2 != 0) ? 0 : rs1
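
The two primitives above can be written out as plain C++ for reference. This is a minimal sketch of the semantics only; the function names are invented for illustration and `zicond_select` is just one way to combine the two results (an OR of the two halves, exactly one of which is zero).

```cpp
#include <cassert>
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
constexpr std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
constexpr std::int64_t czero_nez(std::int64_t rs1, std::int64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

// Illustrative branchless select: cond ? a : b.
// When cond != 0 the second term is zero; when cond == 0 the first one is.
constexpr std::int64_t zicond_select(std::int64_t cond, std::int64_t a,
                                     std::int64_t b)
{
  return czero_eqz(a, cond) | czero_nez(b, cond);
}

inline void zicond_demo()
{
  assert(czero_eqz(7, 0) == 0 && czero_eqz(7, 3) == 7);
  assert(czero_nez(7, 0) == 7 && czero_nez(7, 3) == 0);
  assert(zicond_select(1, 10, 20) == 10 && zicond_select(0, 10, 20) == 20);
}
```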

gcc/ChangeLog:

* config/riscv/riscv.md: Include zicond.md
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics.c: New test.
---
 gcc/config/riscv/riscv.md |  1 +
 gcc/config/riscv/zicond.md| 84 +++
 .../riscv/zicond-primitiveSemantics.c | 49 +++
 3 files changed, 134 insertions(+)
 create mode 100644 gcc/config/riscv/zicond.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d63b584a4c1..6b8c2e8e268 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3317,3 +3317,4 @@
 (include "sifive-7.md")
 (include "thead.md")
 (include "vector.md")
+(include "zicond.md")
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
new file mode 100644
index 000..1cf28589c87
--- /dev/null
+++ b/gcc/config/riscv/zicond.md
@@ -0,0 +1,84 @@
+;; Machine description for the RISC-V Zicond extension
+;; Copyright (C) 2022-23 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_code_iterator eq_or_ne [eq ne])
+(define_code_attr eqz [(eq "nez") (ne "eqz")])
+(define_code_attr nez [(eq "eqz") (ne "nez")])
+
+;; Zicond
+(define_insn "*czero.."
+  [(set (match_operand:GPR 0 "register_operand"  "=r")
+(if_then_else:GPR (eq_or_ne (match_operand:ANYI 1 "register_operand" "r")
+(const_int 0))
+  (match_operand:GPR 2 "register_operand""r")
+  (const_int 0)))]
+  "TARGET_ZICOND"
+  "czero.\t%0,%2,%1"
+)
+
+(define_insn "*czero.."
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(if_then_else:GPR (eq_or_ne (match_operand:ANYI 1 "register_operand" "r")
+(const_int 0))
+  (const_int 0)
+  (match_operand:GPR 2 "register_operand"   "r")))]
+  "TARGET_ZICOND"
+  "czero.\t%0,%2,%1"
+)
+
+;; Special optimization under eq/ne in primitive semantics
+(define_insn "*czero.eqz..opt1"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "1")
+  (match_operand:GPR 3 "register_operand" "r")))]
+  "TARGET_ZICOND && operands[1] == operands[2]"
+  "czero.eqz\t%0,%3,%1"
+)
+
+(define_insn "*czero.eqz..opt2"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "r")
+  (match_operand:GPR 3 "register_operand" "1")))]
+  "TARGET_ZICOND && operands[1] == operands[3]"
+  "czero.nez\t%0,%2,%1"
+)
+
+(define_insn "*czero.nez..opt3"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (ne (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "r")
+  (match_operand:GPR 3 "register_operand" "1")))]
+  "TARGET_ZICOND && operands[1] == operands[3]"
+  "czero.eqz\t%0,%2,%1"
+)
+
+(define_insn "*czero.nez..opt4"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (ne (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "1")
+  (match_operand:GPR 3 "register_operand" "r")))]
+  "TARGET_ZICOND && operands[1] == operands[2]"
+  "czero.nez\t%0,%3,%1"
+)
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics.c 
b/gcc/testsuite/gcc.target/riscv/zicond-p

[PATCH 4/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to non-zero

2023-07-19 Thread Xiao Zeng
This patch recognizes Zicond for select patterns whose condition is an eq or
ne comparison against a non-zero value (using equality as an example), namely:

1 rd = (rs2 == non-imm) ? 0 : rs1
2 rd = (rs2 == reg) ? 0 : rs1
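
To illustrate why such a comparison can still end up on a conditional-zero instruction: rs2 == x holds exactly when (rs2 ^ x) == 0, so the comparison against a non-zero value can be folded into a comparison against zero. The sketch below shows only that arithmetic identity. It is not a description of what riscv_emit_int_compare actually emits, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
constexpr std::uint64_t czero_eqz(std::uint64_t rs1, std::uint64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

// rd = (rs2 == x) ? 0 : rs1, expressed through a zero test on rs2 ^ x.
// x may be an immediate or the value of another register.
constexpr std::uint64_t zero_if_equal(std::uint64_t rs1, std::uint64_t rs2,
                                      std::uint64_t x)
{
  return czero_eqz(rs1, rs2 ^ x);
}

inline void zero_if_equal_demo()
{
  assert(zero_if_equal(5, 9, 9) == 0);  // equal: result is zeroed
  assert(zero_if_equal(5, 9, 8) == 5);  // not equal: rs1 passes through
}
```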

At the same time, more test cases for the non-basic Zicond semantics have been added.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize Zicond.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c: New test.
---
 gcc/config/riscv/riscv.cc | 16 
 .../zicond-primitiveSemantics_compare_imm.c   | 57 ++
 ...mitiveSemantics_compare_imm_return_0_imm.c | 73 ++
 ...tiveSemantics_compare_imm_return_imm_imm.c | 73 ++
 ...tiveSemantics_compare_imm_return_imm_reg.c | 65 
 ...tiveSemantics_compare_imm_return_reg_reg.c | 65 
 .../zicond-primitiveSemantics_compare_reg.c   | 65 
 ...mitiveSemantics_compare_reg_return_0_imm.c | 73 ++
 ...tiveSemantics_compare_reg_return_imm_imm.c | 73 ++
 ...tiveSemantics_compare_reg_return_imm_reg.c | 65 
 ...tiveSemantics_compare_reg_return_reg_reg.c | 77 +++
 11 files changed, 702 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_0_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_0_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7e6b24bd232..9450457e613 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3625,6 +3625,22 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   riscv_emit_binary(IOR, dest, reg1, reg2);
   return true;
 }
+  /* For complex semantics of comparison value.
+ reg + 0 or 0 + reg  */
+  else if ((GET_CODE (cons) == REG &&
+   GET_CODE (alt) == CONST_INT &&
+alt == const0_rtx)
+   || (GET_CODE (alt) == REG &&
+   GET_CODE (cons) == CONST_INT &&
+   cons == const0_rtx))
+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+cons, alt)));
+  return true;
+}
 }
 
   return false;
diff --git 
a/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c 
b/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c
new file mode 100644
index 000..6de50039c31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0"} } */
+
+long primitiveSemantics_compare_imm_00(long a, long b) {
+  return a == 2 ? 0 : b;
+}
+
+long primitiveSemantics_compare_imm_01(long a, long b) {
+  return a != 2 ? 0 : b;
+}
+
+long primitiveSemanti

[PATCH 5/5] [RISC-V] Generate Zicond instruction for conditional execution

2023-07-19 Thread Xiao Zeng
This patch completes the recognition of conditional execution
(using equality as an example), namely:

1 rd = (rc == 0) ? (rs1 arith_op rs2) : rs1

Here, arith_op represents the arithmetic operation symbol, which has 8
possibilities: + - | ^ << >>(Shift Right Arithmetic) >>(Shift Right Logical) &

At the same time, more non-basic Zicond conditional execution test cases have
also been added, namely:

2 rd = (rc == 0) ? (rs1 arith_op non-imm) : rs1
3 rd = (rc == non-imm) ? (rs1 arith_op rs2) : rs1
4 rd = (rc == non-imm) ? (rs1 arith_op non-imm) : rs1
5 rd = (rc == reg) ? (rs1 arith_op rs2) : rs1
6 rd = (rc == reg) ? (rs1 arith_op non-imm) : rs1
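
To make the conditional-execution forms above concrete, here is a small C model. It is a sketch under the assumption that the czero.eqz/czero.nez semantics are as in the Zicond specification, and it is not necessarily the sequence the patch emits; all function names are hypothetical. For operations whose identity element is 0 (such as +), the second operand can be zeroed when the condition fails; AND needs a different arrangement because its identity is all-ones.

```c
#include <stdint.h>

/* Model of "czero.eqz rd, rs1, rs2": rd = (rs2 == 0) ? 0 : rs1.  */
uint64_t
zc_czero_eqz (uint64_t rs1, uint64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* Model of "czero.nez rd, rs1, rs2": rd = (rs2 != 0) ? 0 : rs1.  */
uint64_t
zc_czero_nez (uint64_t rs1, uint64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* rd = (rc == 0) ? (rs1 + rs2) : rs1.  */
uint64_t
cond_add (uint64_t rc, uint64_t rs1, uint64_t rs2)
{
  uint64_t tmp = zc_czero_nez (rs2, rc);	/* rs2 if rc == 0, else 0 */
  return rs1 + tmp;
}

/* rd = (rc == 0) ? (rs1 & rs2) : rs1.  */
uint64_t
cond_and (uint64_t rc, uint64_t rs1, uint64_t rs2)
{
  uint64_t both = rs1 & rs2;
  uint64_t keep = zc_czero_eqz (rs1, rc);	/* rs1 if rc != 0, else 0 */
  return both | keep;
}
```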

gcc/ChangeLog:

* ifcvt.cc (noce_emit_condzero_arith): Helper function for
noce_try_condzero_arith.
(noce_try_condzero_arith): Recognize Zicond patterns.
(noce_process_if_block): Add noce_try_condzero_arith function.

gcc/testsuite/ChangeLog:

* 
gcc.target/riscv/zicond-conditionalArithmetic_compare_0_return_imm_reg.c: New 
test.
* 
gcc.target/riscv/zicond-conditionalArithmetic_compare_0_return_reg_reg.c: New 
test.
* 
gcc.target/riscv/zicond-conditionalArithmetic_compare_imm_return_imm_reg.c: New 
test.
* 
gcc.target/riscv/zicond-conditionalArithmetic_compare_imm_return_reg_reg.c: New 
test.
* 
gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c: New 
test.
* 
gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_reg_reg.c: New 
test.
---
 gcc/ifcvt.cc  | 251 
 ...ionalArithmetic_compare_0_return_imm_reg.c | 553 +
 ...ionalArithmetic_compare_0_return_reg_reg.c | 585 ++
 ...nalArithmetic_compare_imm_return_imm_reg.c | 297 +
 ...nalArithmetic_compare_imm_return_reg_reg.c | 297 +
 ...nalArithmetic_compare_reg_return_imm_reg.c | 297 +
 ...nalArithmetic_compare_reg_return_reg_reg.c | 329 ++
 7 files changed, 2609 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_0_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_0_return_reg_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_imm_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_imm_return_reg_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_reg_reg.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 0b180b4568f..0261d2f1673 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -781,12 +781,15 @@ static int noce_try_store_flag_constants (struct 
noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
rtx, rtx, rtx, rtx = NULL, rtx = NULL);
+static rtx noce_emit_condzero_arith (struct noce_if_info *, rtx, enum 
rtx_code, rtx,
+ rtx, rtx, rtx);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
+static int noce_try_condzero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1830,6 +1833,60 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
 return NULL_RTX;
 }
 
+/* Helper function for noce_emit_condzero_arith.  */
+
+static rtx
+noce_emit_condzero_arith (struct noce_if_info *if_info, rtx x, enum rtx_code 
code,
+  rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+{
+  rtx cond = NULL;
+
+  /* Standard form of conditional comparison.  */
+  if (GET_CODE(cmp_a) == REG && cmp_b == const0_rtx)
+cond = gen_rtx_fmt_ee (code, GET_MODE (if_info->cond), cmp_a, cmp_b);
+
+  /* Register and non-zero immediate comparison.  */
+  else if (GET_CODE(cmp_a) == REG && GET_CODE(cmp_b) == CONST_INT &&
+   cmp_b != const0_rtx)
+{
+  rtx temp1 = gen_reg_rtx (GET_MODE(cmp_a));
+  rtx temp2 = GEN_INT(-1 * INTVAL (cmp_b));
+  rtx src = gen_rtx_fmt_ee (PLUS, GET_MODE (cmp_a), cmp_a, temp2);
+  emit_insn (gen_rtx_SET (temp1, src));
+  cond = gen_rtx_fmt_ee (code, GET_MODE (if_info->cond), temp1, 
const0_rtx);
+}
+
+  /* Register and Register comparison.  */
+  else if (GET_CODE(cmp_a) == REG && GET_CODE(cmp_b) == REG)
+{
+  rtx temp1 = gen_reg_rtx (GET_MODE(cmp_a));
+  rtx src = gen_rtx_fmt_ee (MINUS, GET_MODE (c

[PATCH 1/5] [RISC-V] Recognize Zicond extension

2023-07-19 Thread Xiao Zeng
This patch adds minimal support for the Zicond extension, including
the extension name, mask, and target definition.
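
The mask/target pair follows the usual bit-flag pattern. A standalone model might look like the following; `zi_subext`, the `MODEL_` macros, and `enable_zicond` are hypothetical stand-ins for illustration and are not GCC's own option variable or macros.

```c
/* Stand-in for the option word holding the Zi* sub-extension bits.  */
unsigned int zi_subext;

/* One bit per extension; the predicate tests that bit.  */
#define MODEL_MASK_ZICOND   (1u << 2)
#define MODEL_TARGET_ZICOND ((zi_subext & MODEL_MASK_ZICOND) != 0)

/* What parsing "_zicond" in -march would conceptually do.  */
void
enable_zicond (void)
{
  zi_subext |= MODEL_MASK_ZICOND;
}
```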

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* config/riscv/riscv-opts.h (MASK_ZICOND): New mask.
(TARGET_ZICOND): New target.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-20.c: New test.
* gcc.target/riscv/attribute-21.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   | 3 +++
 gcc/config/riscv/riscv-opts.h | 3 +++
 gcc/testsuite/gcc.target/riscv/attribute-20.c | 6 ++
 gcc/testsuite/gcc.target/riscv/attribute-21.c | 6 ++
 4 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-21.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 6091d8f281b..8460d83b0f1 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -183,6 +183,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
 
+  {"zicond", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1243,6 +1245,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
+  {"zicond",   &gcc_options::x_riscv_zi_subext, MASK_ZICOND},
 
   {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cfcf608ea62..cecaee7d200 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -236,6 +236,9 @@ enum riscv_entity
 #define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0)
 #define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0)
 
+#define MASK_ZICOND   (1 << 2)
+#define TARGET_ZICOND ((riscv_zi_subext & MASK_ZICOND) != 0)
+
 #define MASK_ZFHMIN   (1 << 0)
 #define MASK_ZFH  (1 << 1)
 #define MASK_ZVFHMIN  (1 << 2)
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-20.c 
b/gcc/testsuite/gcc.target/riscv/attribute-20.c
new file mode 100644
index 000..b69c36cf4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-20.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mriscv-attribute -march=rv32i_zicond -mabi=ilp32" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, \"rv32i2p1_zicond1p0\"" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-21.c 
b/gcc/testsuite/gcc.target/riscv/attribute-21.c
new file mode 100644
index 000..160312a0d48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-21.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mriscv-attribute -march=rv64i_zicond -mabi=lp64" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p1_zicond1p0\"" } } */
-- 
2.17.1



[PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-07-19 Thread Xiao Zeng
This patch completes the recognition of Zicond when the select pattern
with condition eq or neq to 0 (using equality as an example), namely:

1 rd = (rs2 == 0) ? non-imm : 0
2 rd = (rs2 == 0) ? non-imm : non-imm
3 rd = (rs2 == 0) ? reg : non-imm
4 rd = (rs2 == 0) ? reg : reg
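
As an illustration of the immediate/immediate case, one possible lowering selects between 0 and the difference of the two values and then adds the first value back. The C model below is a sketch only, assuming czero.eqz behaves as in the Zicond specification; `sel_czero_eqz` and `select_imm_imm` are hypothetical names, and unsigned arithmetic is used so the wrap-around is well defined.

```c
#include <stdint.h>

/* Model of "czero.eqz rd, rs1, rs2": rd = (rs2 == 0) ? 0 : rs1.  */
uint64_t
sel_czero_eqz (uint64_t rs1, uint64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* rd = (rs2 == 0) ? imm1 : imm2.  */
uint64_t
select_imm_imm (uint64_t rs2, uint64_t imm1, uint64_t imm2)
{
  uint64_t diff = imm2 - imm1;			/* loaded into a temporary */
  uint64_t t = sel_czero_eqz (diff, rs2);	/* 0 if rs2 == 0, else diff */
  return t + imm1;				/* imm1 or imm2 */
}
```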

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): IF_THEN_ELSE costs in Zicond.
(riscv_expand_conditional_move): Recognize Zicond.
* config/riscv/riscv.md: Zicond patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New test.
---
 gcc/config/riscv/riscv.cc | 125 ++
 gcc/config/riscv/riscv.md |   2 +-
 .../zicond-primitiveSemantics_return_0_imm.c  |  65 +
 ...zicond-primitiveSemantics_return_imm_imm.c |  73 ++
 ...zicond-primitiveSemantics_return_imm_reg.c |  65 +
 ...zicond-primitiveSemantics_return_reg_reg.c |  65 +
 6 files changed, 394 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..7e6b24bd232 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2448,6 +2448,17 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
+  else if (TARGET_ZICOND && outer_code == SET &&
+   ((GET_CODE (XEXP (x, 1)) == REG && XEXP (x, 2) == const0_rtx) ||
+   (GET_CODE (XEXP (x, 2)) == REG && XEXP (x, 1) == const0_rtx) ||
+   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
+XEXP (x, 1) == XEXP (XEXP (x, 0), 0)) ||
+   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
+XEXP (x, 2) == XEXP (XEXP (x, 0), 0
+{
+  *total = 0;
+  return true;
+}
   else if (LABEL_REF_P (XEXP (x, 1)) && XEXP (x, 2) == pc_rtx)
{
  if (equality_operator (XEXP (x, 0), mode)
@@ -3501,6 +3512,120 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
  cond, cons, alt)));
   return true;
 }
+  else if (TARGET_ZICOND
+   && (code == EQ || code == NE)
+   && GET_MODE_CLASS (mode) == MODE_INT)
+{
+  need_eq_ne_p = true;
+  /* 0 + imm  */
+  if (GET_CODE (cons) == CONST_INT && cons == const0_rtx
+  && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, alt);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+cons, alt)));
+  return true;
+}
+  /* imm + imm  */
+  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
+   && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, alt);
+  rtx temp1 = gen_reg_rtx (mode);
+  rtx temp2 = GEN_INT(-1 * INTVAL (cons));
+  riscv_emit_binary(PLUS, temp1, alt, temp2);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+const0_rtx, alt)));
+  riscv_emit_binary(PLUS, dest, dest, cons);
+  return true;
+}
+  /* imm + reg  */
+  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
+   && GET_CODE (alt) == REG)
+{
+  /* Optimize for register value of 0.  */
+  if (op0 == alt && op1 == const0_rtx)
+{
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  cons = force_reg (mode, cons);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+cons, alt)));
+  return true;
+}
+  riscv_emit_int_c

Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread YunQiang Su via Gcc-patches
Eric Botcazou wrote on Wed, Jul 19, 2023 at 17:45:
>
> > I don't see that.  That's definitely not what GCC expects here,
> > the left-most word of the doubleword should be unchanged.
> >
> > Your testcase should be a dg-do-run and probably more like
> >
> > NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
> > {
> >   int val;
> >   ((unsigned char*)&val)[0] = *buf++;
> >   ((unsigned char*)&val)[1] = *buf++;
> >   ((unsigned char*)&val)[2] = *buf++;
> >   ((unsigned char*)&val)[3] = *buf++;
> >   return val;
> > }
> > int main()
> > {
> >   int val = 0x01020304;
> >   val = test (&val);
> >   if (val != 0x01020304)
> > abort ();
> > }
> >
> > not sure if I got endianness correct.  Now, the question is what
> > WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
> > the MIPS ABI says for returning SImode.
>

The MIPS N64 ABI uses two GPRs for integer return values.
If the return value is SImode, the first register, v0, is used, and it
must be sign-extended, i.e. bits [63:31] are all the same.

Yes, it is the same for signed and unsigned int32.

https://irix7.com/techpubs/007-2816-004.pdf
Page 6:
32-bit integer (int) parameters are always sign-extended when passed
in registers,
whether of signed or unsigned type. [This issue does not arise in the
o32-bit ABI.]
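
The sign-extension rule quoted above can be modelled in plain C. This is only a sketch of the register image; the function names `n64_reg_image` and `n64_is_canonical` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit register image of a 32-bit value under the rule above:
   bit 31 is copied into bits 63..32, for signed and unsigned alike.  */
uint64_t
n64_reg_image (uint32_t v)
{
  uint64_t r = v;
  if (v & UINT32_C (0x80000000))
    r |= UINT64_C (0xffffffff00000000);
  return r;
}

/* True if bits 63..31 of R are all equal.  */
bool
n64_is_canonical (uint64_t r)
{
  uint64_t upper = r >> 31;	/* the 33 bits 63..31 */
  return upper == 0 || upper == UINT64_C (0x1ffffffff);
}
```

A zero-extended image of a value with bit 31 set, such as 0x00000000deadbeef, fails the check, which is the kind of register content a later full-width compare would mishandle.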


> WORD_REGISTER_OPERATIONS must *not* be taken into account for bit-fields, see e.g.
> word_register_operation_p:
>
> /* Return true if X is an operation that always operates on the full
>registers for WORD_REGISTER_OPERATIONS architectures.  */
>
> inline bool
> word_register_operation_p (const_rtx x)
> {
>   switch (GET_CODE (x))
> {
> case CONST_INT:
> case ROTATE:
> case ROTATERT:
> case SIGN_EXTRACT:
> case ZERO_EXTRACT:
>   return false;
>
> default:
>   return true;
> }
> }
>
> --
> Eric Botcazou
>
>


-- 
YunQiang Su


[committed] libstdc++: Check for multiple modifiers in chrono format string [PR110708]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

This should be backported to gcc-13 too, but it can wait until 13.3
(it's just an accepts-invalid and unlikely to affect anybody in
practice).

-- >8 --

The logic for handling modified chrono specs like %Ey was just
restarting the loop after each modifier, and not checking whether we'd
already seen a modifier.
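
The shape of the check can be sketched in C as a simplified model. This is not the libstdc++ code: `at_most_one_modifier` is a hypothetical helper that only looks at strftime-style `%` conversions and does not validate whether a modifier is allowed for a given conversion.

```c
#include <stdbool.h>

/* Return true if every '%' conversion in SPEC carries at most one
   'E' or 'O' modifier and is followed by a conversion character.  */
bool
at_most_one_modifier (const char *spec)
{
  for (const char *s = spec; *s; ++s)
    {
      if (*s != '%')
	continue;
      ++s;
      char mod = 0;
      while (*s == 'E' || *s == 'O')
	{
	  if (mod)
	    return false;	/* second modifier: reject */
	  mod = *s++;
	}
      if (*s == '\0')
	return false;		/* dangling '%' or modifier */
    }
  return true;
}
```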

libstdc++-v3/ChangeLog:

PR libstdc++/110708
* include/bits/chrono_io.h (__formatter_chrono::_M_parse): Only
allow a single modifier.
* testsuite/std/time/format.cc: Check multiple modifiers.
---
 libstdc++-v3/include/bits/chrono_io.h |  5 +
 libstdc++-v3/testsuite/std/time/format.cc | 10 ++
 2 files changed, 15 insertions(+)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 87caa30b83a..5f06a6d76b4 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -426,6 +426,11 @@ namespace __format
  break;
case 'O':
case 'E':
+ if (__mod) [[unlikely]]
+   {
+ __allowed_mods = _Mod_none;
+ break;
+   }
  __mod = __c;
  continue;
default:
diff --git a/libstdc++-v3/testsuite/std/time/format.cc 
b/libstdc++-v3/testsuite/std/time/format.cc
index b05e5da1af8..0dc45d58dce 100644
--- a/libstdc++-v3/testsuite/std/time/format.cc
+++ b/libstdc++-v3/testsuite/std/time/format.cc
@@ -68,6 +68,16 @@ test_bad_format_strings()
   // modifier not valid for conversion specifier
   VERIFY( not is_format_string_for("{:%Ea}", t) );
   VERIFY( not is_format_string_for("{:%Oa}", t) );
+
+  // more than one modifier (PR libstdc++/110708)
+  VERIFY( not is_format_string_for("{:%EEc}", t) );
+  VERIFY( not is_format_string_for("{:%EEEc}", t) );
+  VERIFY( not is_format_string_for("{:%OOd}", t) );
+  VERIFY( not is_format_string_for("{:%OOOd}", t) );
+  VERIFY( not is_format_string_for("{:%EEy}", t) );
+  VERIFY( not is_format_string_for("{:%OOy}", t) );
+  VERIFY( not is_format_string_for("{:%OEy}", t) );
+  VERIFY( not is_format_string_for("{:%EOy}", t) );
 }
 
 template
-- 
2.41.0



Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, 19 Jul 2023, YunQiang Su wrote:

> Eric Botcazou wrote on Wed, Jul 19, 2023 at 17:45:
> >
> > > I don't see that.  That's definitely not what GCC expects here,
> > > the left-most word of the doubleword should be unchanged.
> > >
> > > Your testcase should be a dg-do-run and probably more like
> > >
> > > NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
> > > {
> > >   int val;
> > >   ((unsigned char*)&val)[0] = *buf++;
> > >   ((unsigned char*)&val)[1] = *buf++;
> > >   ((unsigned char*)&val)[2] = *buf++;
> > >   ((unsigned char*)&val)[3] = *buf++;
> > >   return val;
> > > }
> > > int main()
> > > {
> > >   int val = 0x01020304;
> > >   val = test (&val);
> > >   if (val != 0x01020304)
> > > abort ();
> > > }
> > >
> > > not sure if I got endianness correct.  Now, the question is what
> > > WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
> > > the MIPS ABI says for returning SImode.
> >
> 
> MIPS N64 ABI uses 2 GPR for integer return values.
> If the return value is SImode, the first v0 register is used, and it
> must be sign-extended,
> aka the bits[64-31] are all same.
> 
> Yes, it is same for signed and unsigned int32.
> 
> https://irix7.com/techpubs/007-2816-004.pdf
> Page 6:
> 32-bit integer (int) parameters are always sign-extended when passed
> in registers,
> whether of signed or unsigned type. [This issue does not arise in the
> o32-bit ABI.]

Note I think Andrew's comment #7 in the PR is spot-on then; the issue
isn't the bitfield inserts but the compare where combine elides
the sign_extend in favor of a subreg.  That's likely some wrongdoing
in simplify-rtx in the context of WORD_REGISTER_OPERATIONS.

Richard.


Re: [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c

2023-07-19 Thread Maciej W. Rozycki
On Wed, 12 Jul 2023, Richard Biener wrote:

> >  Applied, thanks.  OK to backport to the active branches?
> 
> Yes.

 Now backported, thanks.

  Maciej


Re: [PATCH] core: Support heap-based trampolines

2023-07-19 Thread Martin Uecker via Gcc-patches
On Wednesday, 19.07.2023 at 10:29 +0100, Iain Sandoe wrote:
> Hi Martin,
> 
> > On 19 Jul 2023, at 10:04, Martin Uecker 
> > wrote:
> 
> > > > On 17 Jul 2023, 
> > > 
> > 
> > > > > You mention setjmp/longjmp - on darwin and other platforms
> > > requiring
> > > > > non-stack based trampolines
> > > > > does the system runtime provide means to deal with this issue
> > > > > like
> > > an
> > > > > alternate allocation method
> > > > > or a way to register cleanup?
> > > > 
> > > > There is an alternate mechanism relying on system libraries
> > > > that is
> > > possible on darwin specifically (I don’t know for other targets)
> > > but
> > > it will only work for signed binaries, and would require us to
> > > codesign everything produced by gcc. During development, it was
> > > deemed too big an ask and the current strategy was chosen (Iain
> > > can
> > > surely add more background on that if needed).
> > > 
> > > I do not think that this solves the setjmp/longjmp issue -
> > > since
> > > there’s still a notional allocation that takes place (it’s just
> > > that
> > > the mechanism for determining permissions is different).
> > > 
> > > It is also a big barrier for the general user - and prevents
> > > normal
> > > folks from distributing GCC - since codesigning requires an
> > > external
> > > certificate (i.e. I would really rather avoid it).
> > > 
> > > > > Was there ever an attempt to provide a "generic" trampoline
> > > > > driven
> > > by
> > > > > a more complex descriptor?
> > > 
> > > We did look at the “unused address bits” mechanism that Ada has
> > > used
> > > - but that is not really available to a non-private ABI (unless
> > > the
> > > system vendor agrees to change ABI to leave a bit spare) for the
> > > base
> > > arch either the bits are not there (e.g. X86) or reserved (e.g.
> > > AArch64).
> > > 
> > > Andrew Burgess did the original work he might have comments on
> > > alternatives we tried
> > > 
> > 
> > For reference, I proposed a patch for this in 2018. It was not
> > accepted because minimum alignment for functions would increase
> > for some archs:
> > 
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html
> 
> Right - that was the one we originally looked at and has the issue
> that it 
> breaks ABI - and thus would need vendor buy-in to alter as you say.
> 
> > > > > (well, it could be a bytecode interpreter and the trampoline
> > > > > being
> > > > > bytecode on the stack?!)
> > > > 
> > > > My own opinion is that executable stack should go away on all
> > > targets at some point, so a truly generic solution to the problem
> > > would be great.
> > > 
> > > indeed it would.
> 
> > I think we need a solution rather sooner than later on all archs.
> 
> AFAICS the  heap-based trampolines can work for any arch**, this
> issue is about
> system security policy, rather than arch, specifically?
> 
> It seems to me that for any system security policy that permits JIT,
> (but not
> executable stack) the heap-based trampolines are viable.

I agree. 

BTW, one option we discussed before was to map a page with
pre-allocated trampolines, which look up the address of
a callee and the static chain in a table based on its own
address. Then no code generation is involved.

The difficult part is avoiding leaks with longjmp / setjmp.
One idea was to have a shadow stack consisting of the
pre-allocated trampolines, but this probably causes other
issues...

I wonder how difficult it is to have longjmp / setjmp walk 
the stack in C?   This would also be useful for C++
interoperability and to free  heap-allocated VLAs.


As a user of nested functions, from my side it would also
be OK to simply add a wide function pointer type that contains
address + static chain.  This would require changing code, 
but would also work with Clang's blocks and solve other 
language interoperability problems, while avoiding all 
existing ABI issues.
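
To illustrate what such a wide function pointer could look like at the source level, here is a sketch in standard C. It is not a proposed language extension or ABI; `struct wide_fn`, `call_wide`, `struct add_env`, and `add_base` are hypothetical names, and the static chain is modelled as an explicit `void *`.

```c
/* Hypothetical "wide" function pointer: code address plus static chain.  */
struct wide_fn
{
  int (*code) (void *chain, int arg);
  void *chain;
};

/* Calling through a wide pointer passes the chain explicitly, so no
   trampoline (and no executable stack or heap page) is needed.  */
int
call_wide (struct wide_fn f, int arg)
{
  return f.code (f.chain, arg);
}

/* Environment that a nested function would otherwise reach through
   its static chain.  */
struct add_env
{
  int base;
};

int
add_base (void *chain, int arg)
{
  const struct add_env *env = chain;
  return env->base + arg;
}
```

A caller would build `struct wide_fn f = { add_base, &env };` and pass `f` by value wherever a plain function pointer would otherwise require a trampoline.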

> 
> This seems to be a useful step forward; and we can add some other
> mechanism to the flag’s supported list if someone develops one?

I think it is a useful step forward.

Martin


> 
> Iain
> 
> ** modulo the target maintainers implementing the builtins.
> 
> 
> 



Re: [committed] - Re: [patch] OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]

2023-07-19 Thread Thomas Schwinge
Hi Tobias!

On 2023-07-19T10:26:12+0200, Tobias Burnus  wrote:
> Now committed as Rev. r14-2634-g85da0b40538fb0

On devel/omp/gcc-13 branch, the corresponding
commit b003e6511754dce475f7f5b0c5cd887a177e41b3
"OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 
[PR107424]"
introduces a regression:

PASS: libgomp.fortran/loop-transforms/unroll-2.f90   -O0  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/loop-transforms/unroll-2.f90   -O0  
execution test

Etc.

spawn [open ...]
   4
   8
  10
  11

Program aborted. Backtrace:
#0  0x400f9c in test
at [...]/libgomp.fortran/loop-transforms/unroll-2.f90:85
#1  0x400fd3 in main
at [...]/libgomp.fortran/loop-transforms/unroll-2.f90:59


Regards
 Thomas


> Changes:
>
> * I missed updating another 'sorry' (msg wording change) - now fixed;
> I also added it to the sorry-testcase file non-rectangular-loop-5.f90.
>
> * I decided to retire the PR as several issues have been fixed and the
> original title did not fit any more. The remaining issue is now tracked
> in PR110735 (i.e. handling step != const, both the generic and possibly
> a simpler special case).
>
> * I added a link to the PR to libgomp.texi such that one can find out
> what is only partially supported for Fortran.
>
> Thanks,
>
> Tobias
>
> PS: Otherwise, the following still applies:
>
> On 18.07.23 14:11, Tobias Burnus wrote:
>> Comments regarding the validity of the Fortran assumptions are welcome!
>>
>> This patch now uses a 'simple' loop for OpenMP loops with
>> a constant loop-step size. Before, it only did so for step = ±1.
>> (Otherwise, a count variable is used from which the original
>> loop index variable is calculated.)
>>
>> For details, see the attached patch or
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107424#c12
>> (comment 12 + 14 plus the email linked in comment 12).
>>
>> Comments? Remarks? If there are none, I will relatively soonish
>> commit the attached patch to mainline, only.
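
The two loop shapes described in the quoted text can be compared with a small C model. This is a sketch only: it assumes a positive constant step, `end >= start`, and no overflow of the loop variable, and the function names are hypothetical.

```c
/* Count-based form: the index is derived from a separate counter.  */
int
sum_count_form (int start, int end, int step)
{
  int sum = 0;
  int n = (end - start + step) / step;	/* trip count */
  for (int count = 0; count < n; count++)
    {
      int i = count * step + start;
      sum += i;
    }
  return sum;
}

/* Simple form: the index itself is the loop variable.  */
int
sum_simple_form (int start, int end, int step)
{
  int sum = 0;
  for (int i = start; i <= end; i += step)
    sum += i;
  return sum;
}
```

For 'do i = 1,10,2' both forms visit 1, 3, 5, 7, 9; the simple form is the one whose bounds can be related to an outer loop variable in a non-rectangular nest.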

> commit 85da0b40538fb0d17d89de1e7905984668e3dfef
> Author: Tobias Burnus 
> Date:   Wed Jul 19 10:18:49 2023 +0200
>
> OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or 
> -1 [PR107424]
>
> Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2'
> the code
>   for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
> i = count.0 * 2 + 1;
>
> While such an inner loop can be collapsed, a non-rectangular one could not.
> With this commit and for all constant loop steps, a simple loop such
> as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before only for the
> constant steps of 1 and -1.)
>
> The constant step makes it possible to know the direction (increasing/decreasing)
> that is required for the loop condition.
>
> The new code is only valid if one assumes no overflow of the loop 
> variable.
> However, the Fortran standard can be read that this must be ensured by
> the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
> "The execution of any numeric operation whose result is not defined by
> the arithmetic used by the processor is prohibited."
>
> And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
> following: The number of loop iterations handled by an iteration count,
> which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
> in step (3), this count is not only decremented by one but also:
>   "... The DO variable, if any, is incremented by the value of the
>   incrementation parameter m3."
> And for the example above, 'i' would be 'huge(i)+3' in the last
> execution cycle, which exceeds the largest model number and should
> render the example as invalid.
>
> PR fortran/107424
>
> gcc/fortran/ChangeLog:
>
> * trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
> constant loop steps.
> (gfc_trans_omp_do): Likewise; use sign to determine
> loop direction.
>
> libgomp/ChangeLog:
>
> * libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
> * testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
> commented tests.
> * testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
> test file; tests are in non-rectangular-loop-1.f90.
> * testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
> testcase to use a non-constant step to retain the 'sorry' test.
> * testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.
>
> gcc/testsuite/ChangeLog:
>
> * gfortran.dg/gomp/linear-2.f90: Update dump to remove
> the additional count variable.
> ---
>  gcc/fortran/trans-openmp.cc|  18 +-
>  gcc/testsuite/gfortran.dg/gomp/linear-2.f90|   4 +-
>  libgomp/libgomp.t

Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-19 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 19, 2023 at 01:58:21PM +0800, Hongtao Liu wrote:
> > LGTM, if you need someone to rubber-stamp the patch. I'm not really
> > versed in this part of the compiler, so please wait a day if someone
> > has anything to say about the patch.
> Thanks, pushed to trunk.

I see some regressions most likely with this change on i686-linux,
in particular:
+FAIL: gcc.dg/pr107547.c (test for excess errors)
+FAIL: gcc.dg/torture/floatn-convert.c   -O0  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O0  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O1  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O1  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O2  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O2  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O2 -flto  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O2 -flto  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O2 -flto -flto-partition=none  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O2 -flto -flto-partition=none  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -O3 -g  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -O3 -g  compilation failed to produce executable
+FAIL: gcc.dg/torture/floatn-convert.c   -Os  (test for excess errors)
+UNRESOLVED: gcc.dg/torture/floatn-convert.c   -Os  compilation failed to produce executable
+FAIL: gcc.target/i386/float16-7.c  (test for errors, line 7)

Perhaps we need to tweak
gcc/testsuite/lib/target-supports.exp (add_options_for_float16)
so that it adds -msse2 for i?86-*-* x86_64-*-* (that would likely
fix up floatn-convert) and for the others perhaps
/* { dg-add-options float16 } */
?

Jakub



Re: [PATCH] Implement Bit-field lowering

2023-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 14, 2023 at 7:49 AM naveenh--- via Gcc-patches
 wrote:
>
> From: Naveen H S 
>
> This patch adds lowering bit-field and opposite endian accesses pass.
> The patch addresses many issues in:-
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19466

A little bit more description would be nice ...

> 2023-07-14  Andrew Pinski   
> Co-authored-by: Naveen H S 
>
> gcc/ChangeLog:

And some bug references for bugzilla.  In fact you don't add any testcases
that are presumably fixed?

> * Makefile.in (OBJS): Add gimple-lower-accesses.o.
> * gimple-lower-accesses.cc: New file.
> * passes.def (pass_lower_accesses): Add.
> * tree-pass.h (make_pass_lower_accesses): Define.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/store_merging_14.c: Modify the pattern found as per new code.
> * gcc.dg/store_merging_16.c: Likewise.
> * gcc.dg/store_merging_20.c: Likewise.
> * gcc.dg/store_merging_21.c: Likewise.
> * gcc.dg/store_merging_24.c: Likewise.
> * gcc.dg/store_merging_25.c: Likewise.
> * gcc.dg/store_merging_6.c: Likewise.
> * gcc.dg/tree-ssa/20030729-1.c: Likewise.
> * gcc.dg/tree-ssa/20030814-6.c: Likewise.
> * gcc.dg/tree-ssa/loop-interchange-14.c: Likewise.
> * gcc.dg/tree-ssa/20030714-1.c: Remove.
> ---
>  gcc/Makefile.in   |   1 +
>  gcc/gimple-lower-accesses.cc  | 463 ++
>  gcc/passes.def|   4 +
>  gcc/testsuite/gcc.dg/store_merging_10.c   |   2 +-
>  gcc/testsuite/gcc.dg/store_merging_14.c   |   2 +-
>  gcc/testsuite/gcc.dg/store_merging_16.c   |   4 +-
>  gcc/testsuite/gcc.dg/store_merging_20.c   |   2 +-
>  gcc/testsuite/gcc.dg/store_merging_21.c   |   2 +-
>  gcc/testsuite/gcc.dg/store_merging_24.c   |   4 +-
>  gcc/testsuite/gcc.dg/store_merging_25.c   |   4 +-
>  gcc/testsuite/gcc.dg/store_merging_6.c|   2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/20030714-1.c|  45 --
>  gcc/testsuite/gcc.dg/tree-ssa/20030729-1.c|   5 +-
>  gcc/testsuite/gcc.dg/tree-ssa/20030814-6.c|   5 +-
>  .../gcc.dg/tree-ssa/loop-interchange-14.c |   2 +-
>  gcc/tree-pass.h   |   1 +
>  16 files changed, 485 insertions(+), 63 deletions(-)
>  create mode 100644 gcc/gimple-lower-accesses.cc
>  delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/20030714-1.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index c478ec85201..50bd28f4d04 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1338,6 +1338,7 @@ OBJS = \
> $(GIMPLE_MATCH_PD_SEQ_O) \
> gimple-match-exports.o \
> $(GENERIC_MATCH_PD_SEQ_O) \
> +   gimple-lower-accesses.o \
> insn-attrtab.o \
> insn-automata.o \
> insn-dfatab.o \
> diff --git a/gcc/gimple-lower-accesses.cc b/gcc/gimple-lower-accesses.cc
> new file mode 100644
> index 000..9d87acfba56
> --- /dev/null
> +++ b/gcc/gimple-lower-accesses.cc
> @@ -0,0 +1,463 @@
> +/* GIMPLE lowering bit-field and opposite endian accesses pass.
> +
> +   Copyright (C) 2017-2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "cfghooks.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "fold-const.h"
> +#include "stor-layout.h"
> +#include "tree-eh.h"
> +#include "gimplify.h"
> +#include "gimple-iterator.h"
> +#include "gimplify-me.h"
> +#include "tree-cfg.h"
> +#include "tree-dfa.h"
> +#include "tree-ssa.h"
> +#include "tree-ssa-propagate.h"
> +#include "tree-hasher.h"
> +#include "cfgloop.h"
> +#include "cfganal.h"
> +#include "alias.h"
> +#include "expr.h"
> +#include "tree-pretty-print.h"
> +
> +namespace {
> +
> +class lower_accesses
> +{
> +  function *fn;
> +public:
> +  lower_accesses (function *f) : fn(f) {}
> +  unsigned int execute (void);
> +};

please merge this with the gimple_opt_pass class

> +
> +
> +/* Handle reference to a bitfield EXPR.
> +   If BITPOS_P is non-null assume that reference is LHS and set *BITPOS_P
> +   to bit position of the field.
> +   If REF_P is non-null set it to the memory reference for the encompassing
> +   allocation unit.

there is no REF_P argument.

BITSIZE_P and PREVESEP are undocumented.

> +   Note *BI

[committed] testsuite: Add 64-bit vector variant for bb-slp-pr95839.c

2023-07-19 Thread Maciej W. Rozycki
Add dual-single float vector test complementing bb-slp-pr95839.c.

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr95839-v8.c: New test.
---
Committed with Richard Biener's approval: 
.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |   13 +
 1 file changed, 13 insertions(+)

gcc-test-bb-slp-pr95839-v8.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef float __attribute__((vector_size(8))) v2f32;
+
+v2f32 f(v2f32 a, v2f32 b)
+{
+  /* Check that we vectorize this CTOR without any loads.  */
+  return (v2f32){a[0] + b[0], a[1] + b[1]};
+}
+
+/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */


[PATCH] wide-int: Fix up wi::divmod_internal [PR110731]

2023-07-19 Thread Jakub Jelinek via Gcc-patches
Hi!

As the following testcase shows, wi::divmod_internal doesn't handle
correctly signed division with precision > 64 when the dividend (and likely
divisor as well) is the type's minimum and the precision isn't divisible
by 64.

A few lines above what the patch hunk changes is:
  /* Make the divisor and dividend positive and remember what we
 did.  */
  if (sgn == SIGNED)
{
  if (wi::neg_p (dividend))
{
  neg_dividend = -dividend;
  dividend = neg_dividend;
  dividend_neg = true;
}
  if (wi::neg_p (divisor))
{
  neg_divisor = -divisor;
  divisor = neg_divisor;
  divisor_neg = true;
}
}
i.e. we negate negative dividend or divisor and remember those.
But, after we do that, when unpacking those values into b_dividend and
b_divisor we need to always treat the wide_ints as UNSIGNED,
because divmod_internal_2 performs an unsigned division only.
Now, if precision <= 64, we don't reach here at all, earlier code
handles it.  If dividend or divisor aren't the most negative values,
the negation clears their most significant bit, so it doesn't really
matter if we unpack SIGNED or UNSIGNED.  And if precision is multiple
of HOST_BITS_PER_WIDE_INT, there is no difference in behavior, while
-0x80000000000000000000000000000000 negates to
-0x80000000000000000000000000000000 the unpacking of it as SIGNED
or UNSIGNED works the same.
In the testcase, we have signed precision 119 and the dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
both before and after negation.
Divisor is
val = { 2 }, len = 1, precision = 119
But we really want to divide 0x400000000000000000000000000000 by 2
unsigned and then negate at the end.
If it is unsigned precision 119 division
0x400000000000000000000000000000 by 2
dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
but as we unpack it UNSIGNED, it is unpacked into
0, 0, 0, 0x00400000

The following patch fixes it by always using UNSIGNED unpacking
because we've already negated negative values at that point if
sgn == SIGNED and so most negative constants should be treated as
positive.
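
To make the difference concrete, here is a minimal C sketch of what the two unpacking modes do to the top 64-bit block of the dividend from the testcase. It is not the wi_unpack code; the helper names (zext_block, sext_block, top_half) are made up for illustration and only mimic zero- versus sign-extension from the 55 (119 % 64) significant bits of that block.

```c
#include <stdint.h>

/* Hypothetical helpers, not GCC functions: extend the low PREC bits
   (0 < PREC < 64) of X the way an UNSIGNED or a SIGNED unpack would.  */
static uint64_t
zext_block (uint64_t x, unsigned prec)
{
  return x & ((UINT64_C (1) << prec) - 1);
}

static uint64_t
sext_block (uint64_t x, unsigned prec)
{
  uint64_t sign = UINT64_C (1) << (prec - 1);
  /* Flip the sign bit, then subtract it again: this propagates the
     sign bit into all higher bits.  */
  return (zext_block (x, prec) ^ sign) - sign;
}

/* Top 32-bit half-block of a 64-bit block.  */
static uint32_t
top_half (uint64_t x)
{
  return (uint32_t) (x >> 32);
}
```

For the top block 0xffc0000000000000 with 55 significant bits, top_half (zext_block (...)) gives 0x00400000, the magnitude we want to divide, while top_half (sext_block (...)) gives 0xffc00000, which an unsigned division would misread as a much larger value.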

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 13.2
and later for older branches?

2023-07-19  Jakub Jelinek  

PR tree-optimization/110731
* wide-int.cc (wi::divmod_internal): Always unpack dividend and
divisor as UNSIGNED regardless of sgn.

* gcc.dg/pr110731.c: New test.

--- gcc/wide-int.cc.jj  2023-06-12 15:47:22.461502821 +0200
+++ gcc/wide-int.cc 2023-07-19 09:52:40.241661869 +0200
@@ -1911,9 +1911,9 @@ wi::divmod_internal (HOST_WIDE_INT *quot
 }
 
   wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
-dividend_blocks_needed, dividend_prec, sgn);
+dividend_blocks_needed, dividend_prec, UNSIGNED);
   wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (),
-divisor_blocks_needed, divisor_prec, sgn);
+divisor_blocks_needed, divisor_prec, UNSIGNED);
 
   m = dividend_blocks_needed;
   b_dividend[m] = 0;
--- gcc/testsuite/gcc.dg/pr110731.c.jj  2023-07-19 10:03:03.707986705 +0200
+++ gcc/testsuite/gcc.dg/pr110731.c 2023-07-19 10:04:34.857716862 +0200
@@ -0,0 +1,17 @@
+/* PR tree-optimization/110731 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128
+foo (void)
+{
+  struct S { __int128 f : 119; } s = { ((__int128) -18014398509481984) << 64 };
+  return s.f / 2;
+}
+
+int
+main ()
+{
+  if (foo () != (((__int128) -9007199254740992) << 64))
+__builtin_abort ();
+}

Jakub



Re: [PATCH] wide-int: Fix up wi::divmod_internal [PR110731]

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, 19 Jul 2023, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcase shows, wi::divmod_internal doesn't handle
> correctly signed division with precision > 64 when the dividend (and likely
> divisor as well) is the type's minimum and the precision isn't divisible
> by 64.
> 
> A few lines above what the patch hunk changes is:
>   /* Make the divisor and dividend positive and remember what we
>  did.  */
>   if (sgn == SIGNED)
> {
>   if (wi::neg_p (dividend))
> {
>   neg_dividend = -dividend;
>   dividend = neg_dividend;
>   dividend_neg = true;
> }
>   if (wi::neg_p (divisor))
> {
>   neg_divisor = -divisor;
>   divisor = neg_divisor;
>   divisor_neg = true;
> }
> }
> i.e. we negate negative dividend or divisor and remember those.
> But, after we do that, when unpacking those values into b_dividend and
> b_divisor we need to always treat the wide_ints as UNSIGNED,
> because divmod_internal_2 performs an unsigned division only.
> Now, if precision <= 64, we don't reach here at all, earlier code
> handles it.  If dividend or divisor aren't the most negative values,
> the negation clears their most significant bit, so it doesn't really
> matter if we unpack SIGNED or UNSIGNED.  And if precision is multiple
> of HOST_BITS_PER_WIDE_INT, there is no difference in behavior, while
> -0x80000000000000000000000000000000 negates to
> -0x80000000000000000000000000000000 the unpacking of it as SIGNED
> or UNSIGNED works the same.
> In the testcase, we have signed precision 119 and the dividend is
> val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
> both before and after negation.
> Divisor is
> val = { 2 }, len = 1, precision = 119
> But we really want to divide 0x400000000000000000000000000000 by 2
> unsigned and then negate at the end.
> If it is unsigned precision 119 division
> 0x400000000000000000000000000000 by 2
> dividend is
> val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
> but as we unpack it UNSIGNED, it is unpacked into
> 0, 0, 0, 0x00400000
> 
> The following patch fixes it by always using UNSIGNED unpacking
> because we've already negated negative values at that point if
> sgn == SIGNED and so most negative constants should be treated as
> positive.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 13.2
> and later for older branches?

OK.

Thanks,
Richard.

> 2023-07-19  Jakub Jelinek  
> 
>   PR tree-optimization/110731
>   * wide-int.cc (wi::divmod_internal): Always unpack dividend and
>   divisor as UNSIGNED regardless of sgn.
> 
>   * gcc.dg/pr110731.c: New test.
> 
> --- gcc/wide-int.cc.jj2023-06-12 15:47:22.461502821 +0200
> +++ gcc/wide-int.cc   2023-07-19 09:52:40.241661869 +0200
> @@ -1911,9 +1911,9 @@ wi::divmod_internal (HOST_WIDE_INT *quot
>  }
>  
>wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
> -  dividend_blocks_needed, dividend_prec, sgn);
> +  dividend_blocks_needed, dividend_prec, UNSIGNED);
>wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (),
> -  divisor_blocks_needed, divisor_prec, sgn);
> +  divisor_blocks_needed, divisor_prec, UNSIGNED);
>  
>m = dividend_blocks_needed;
>b_dividend[m] = 0;
> --- gcc/testsuite/gcc.dg/pr110731.c.jj  2023-07-19 10:03:03.707986705 +0200
> +++ gcc/testsuite/gcc.dg/pr110731.c   2023-07-19 10:04:34.857716862 +0200
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/110731 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O2" } */
> +
> +__int128
> +foo (void)
> +{
> +  struct S { __int128 f : 119; } s = { ((__int128) -18014398509481984) << 64 };
> +  return s.f / 2;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo () != (((__int128) -9007199254740992) << 64))
> +__builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[committed] libstdc++: Implement correct locale-specific chrono formatting [PR110719]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

I'll backport it to gcc-13 some time after the 13.2 release.

-- >8 --

This fixes some TODOs in the C++20 <chrono> format support, where the
locale-specific output was incorrect or unimplemented. The approach
taken here is to either use the formatting locale's std::time_put facet
to do the formatting, or to remove subsecond precision from time points
so that locale-specific formats don't print fractional seconds. This
ensures that we are consistent with what the std::time_put facet would
print (which never includes fractional seconds) even if we actually
reimplement the formatting by hand instead of using the facet.

This also fixes a misplaced statement that allowed modifiers for %Z
which should have been on %z instead. There was also some ill-formed
code in an untested branch for formatting time zone names to wide
characters. A new test for zoned_time I/O has been added to exercise
that code properly.

libstdc++-v3/ChangeLog:

PR libstdc++/110719
* include/bits/chrono_io.h (__formatter_chrono::_M_parse): Fix
allowed modifiers for %z and %Z. Fix -Wparentheses and
-Wnarrowing warnings.
(__formatter_chrono::_M_format): Call new functions for %d, %e,
%H, %I, %m and %M.
(__formatter_chrono::_M_c): Use _S_floor_seconds to remove
subsecond precision.
(__formatter_chrono::_M_C_y_Y): Use _M_locale_fmt to handle
modifiers.
(__formatter_chrono::_M_e): Replace with _M_d_e and use
_M_locale_fmt.
(__formatter_chrono::_M_I): Replace with _M_H_I and use
_M_locale_fmt.
(__formatter_chrono::_M_m): New function.
(__formatter_chrono::_M_M): New function.
(__formatter_chrono::_M_r): Use _M_locale_fmt.
(__formatter_chrono::_M_S): Likewise.
(__formatter_chrono::_M_u_w): Likewise.
(__formatter_chrono::_M_U_V_W): Likewise.
(__formatter_chrono::_M_X): Use _S_floor_seconds.
(__formatter_chrono::_M_Z): Fix untested branch for wchar_t.
(__formatter_chrono::_S_altnum): Remove function.
(__formatter_chrono::_S_dd_zero_fill): Remove function.
(__formatter_chrono::_S_floor_seconds): New function.
(__formatter_chrono::_M_locale_fmt): New function.
* testsuite/std/time/clock/system/io.cc: Adjust expected output
for locale-specific formats and check modified formats.
* testsuite/std/time/clock/utc/io.cc: Likewise.
* testsuite/std/time/zoned_time/io.cc: New test.
---
 libstdc++-v3/include/bits/chrono_io.h | 295 +++---
 .../testsuite/std/time/clock/system/io.cc |  20 +-
 .../testsuite/std/time/clock/utc/io.cc|  12 +-
 .../testsuite/std/time/zoned_time/io.cc   |  64 
 4 files changed, 272 insertions(+), 119 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/zoned_time/io.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index 5f06a6d76b4..43eeab42869 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -414,11 +414,10 @@ namespace __format
  break;
case 'z':
  __needed = _TimeZone;
- __allowed_mods = _Mod_E;
+ __allowed_mods = _Mod_E_O;
  break;
case 'Z':
  __needed = _TimeZone;
- __allowed_mods = _Mod_E_O;
  break;
case 'n':
case 't':
@@ -439,7 +438,7 @@ namespace __format
}
 
  if ((__mod == 'E' && !(__allowed_mods & _Mod_E))
-   || __mod == 'O' && !(__allowed_mods & _Mod_O))
+   || (__mod == 'O' && !(__allowed_mods & _Mod_O)))
__throw_format_error("chrono format error: invalid "
 " modifier in chrono-specs");
  __mod = _CharT();
@@ -471,7 +470,8 @@ namespace __format
 "chrono-specs");
 
  _M_spec = __spec;
- _M_spec._M_chrono_specs = {__chrono_specs, __first - __chrono_specs};
+ _M_spec._M_chrono_specs
+= __string_view(__chrono_specs, __first - __chrono_specs);
 
  return __first;
}
@@ -551,18 +551,12 @@ namespace __format
  __out = _M_C_y_Y(__t, std::move(__out), __fc, __c, __mod);
  break;
case 'd':
- // %d  The day of month as a decimal number.
- // %Od Locale's alternative representation.
- __out = _S_dd_zero_fill((unsigned)_S_day(__t),
- std::move(__out),
- __fc, __mod == 'O');
+   case 'e':
+ __out = _M_d_e(__t, std::move(__out), __fc, __c, __mod == 'O');
  break;
case 'D':
  __ou

[committed] libstdc++: Avoid warning in std::format

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

With -Wmaybe-uninitialized -Wsystem-headers there's a warning about
creating a string_view from an uninitialized array. Initializing the
first element of the array avoids the warning.

libstdc++-v3/ChangeLog:

* include/std/format (__write_padded): Initialize first element
of array to avoid a -Wmaybe-uninitialized warning.
---
 libstdc++-v3/include/std/format | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 9d5981e4882..9710bff3c03 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -610,6 +610,7 @@ namespace __format
 {
   const size_t __buflen = 0x20;
   _CharT __padding_chars[__buflen];
+  __padding_chars[0] = _CharT();
   basic_string_view<_CharT> __padding{__padding_chars, __buflen};
 
   auto __pad = [&__padding] (size_t __n, _Out& __o) {
-- 
2.41.0



Re: vectorizer: Avoid an OOB access from vectorization

2023-07-19 Thread Richard Biener via Gcc-patches
On Tue, 18 Jul 2023, Matthew Malcomson wrote:

> Tamar pointed out it would be good to have a `scan-tree-dump` in the testcase
> just to make sure that when something is currently vectorizing it stays
> vectorizing (and hence that the new code is still likely running).
> 
> Attached patch has that change, also inlined for ease of reply.

The patch is OK with a suitable changelog.

Thanks,
Richard.

> --
> > Our checks for whether the vectorization of a given loop would make an
> > out of bounds access miss the case when the vector we load is so large
> > as to span multiple iterations worth of data (while only being there to
> > implement a single iteration).
> > 
> > This patch adds a check for such an access.
> > 
> > Example where this was going wrong (smaller version of testcase added):
> > 
> > ```
> >   extern unsigned short multi_array[5][16][16];
> >   extern void initialise_s(int *);
> >   extern int get_sval();
> > 
> >   void foo() {
> > int s0 = get_sval();
> > int s[31];
> > int i,j;
> > initialise_s(&s[0]);
> > s0 = get_sval();
> > for (j=0; j < 16; j++)
> >   for (i=0; i < 16; i++)
> > multi_array[1][j][i]=s[j*2];
> >   }
> > ```
> > 
> > With the above loop we would load the `s[j*2]` integer into a 4 element
> > vector, which reads 3 more elements than the scalar loop would.
> > `get_group_load_store_type` identifies that the loop requires a scalar
> > epilogue due to gaps.  However we do not identify that the above code
> > requires *two* scalar loops to be peeled due to the fact that each
> > iteration loads an amount of data from the *next* iteration (while not
> > using it).
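
For readers following along, the arithmetic behind "two scalar loops" can be sketched in plain C. This is only an illustration of the bounds reasoning quoted above, not the check the patch adds to the vectorizer; the function name and its parameters are hypothetical, and it assumes a positive stride so that later iterations always read further into the array.

```c
/* Count the trailing iterations whose vector load would read past the
   end of an N_ELEMS-element array.  Iteration j loads elements
   [j * STRIDE, j * STRIDE + VEC_ELEMS - 1].  */
static unsigned
iterations_to_peel (unsigned n_elems, unsigned stride,
                    unsigned vec_elems, unsigned n_iters)
{
  unsigned peel = 0;
  /* Walk backwards from the last iteration; stop at the first one
     whose load stays in bounds.  */
  for (unsigned j = n_iters; j-- > 0; )
    {
      unsigned last = j * stride + vec_elems - 1;
      if (last < n_elems)
        break;
      peel++;
    }
  return peel;
}
```

With the numbers from the example (31 elements, stride 2, 4-element vectors, 16 iterations), iteration 15 would touch s[30..33] and iteration 14 would touch s[28..31], so the function returns 2. With scalar loads (vec_elems of 1) it returns 0.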
> > 
> > Bootstrapped and regtested on aarch64-none-linux-gnu.
> > N.b. out of interest we came across this working with Morello.
> > 
> > 
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
> new file mode 100644
> index ..1aab4c5a14d1e8346d89587bd9544a1516535a45
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
> @@ -0,0 +1,61 @@
> +/* For some targets we end up vectorizing the below loop such that the `sp`
> +   single integer is loaded into a 4 integer vector.
> +   While the writes are all safe, without 2 scalar loops being peeled into the
> +   epilogue we would read past the end of the 31 integer array.  This happens
> +   because we load a 4 integer chunk to only use the first integer and
> +   increment by 2 integers at a time, hence the last load needs s[30-33] and
> +   the penultimate load needs s[28-31].
> +   This testcase ensures that we do not crash due to that behaviour.  */
> +/* { dg-require-effective-target mmap } */
> +#include <sys/mman.h>
> +#include <stdio.h>
> +
> +#define MMAP_SIZE 0x2
> +#define ADDRESS 0x112200
> +
> +#define MB_BLOCK_SIZE 16
> +#define VERT_PRED_16 0
> +#define HOR_PRED_16 1
> +#define DC_PRED_16 2
> +int *sptr;
> +extern void intrapred_luma_16x16();
> +unsigned short mprr_2[5][16][16];
> +void initialise_s(int *s) { }
> +int main() {
> +void *s_mapping;
> +void *end_s;
> +s_mapping = mmap ((void *)ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
> +   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +if (s_mapping == MAP_FAILED)
> +  {
> + perror ("mmap");
> + return 1;
> +  }
> +end_s = (s_mapping + MMAP_SIZE);
> +sptr = (int*)(end_s - sizeof(int[31]));
> +intrapred_luma_16x16(sptr);
> +return 0;
> +}
> +
> +void intrapred_luma_16x16(int * restrict sp) {
> +for (int j=0; j < MB_BLOCK_SIZE; j++)
> +  {
> + mprr_2[VERT_PRED_16][j][0]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][1]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][2]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][3]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][4]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][5]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][6]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][7]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][8]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][9]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][10]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][11]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][12]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][13]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][14]=sp[j*2];
> + mprr_2[VERT_PRED_16][j][15]=sp[j*2];
> +  }
> +}
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" {target vect_int } } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c08d0ef951fc63adcfffc601917134ddf51ece45..1c8c6784cc7b5f2d327339ff55a5a5ea08835aab 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2217,7 +2217,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
>but the access in the loop doesn't cover the full vector
>we can end up with no gap recorded bu

[committed] tree-switch-conversion: Fix a comment typo

2023-07-19 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed a comment typo, this patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-07-19  Jakub Jelinek  

* tree-switch-conversion.h (class bit_test_cluster): Fix comment typo.

--- gcc/tree-switch-conversion.h.jj 2023-02-17 12:45:08.218636117 +0100
+++ gcc/tree-switch-conversion.h2023-07-18 10:10:21.398933379 +0200
@@ -303,7 +303,7 @@ public:
 /* A GIMPLE switch statement can be expanded to a short sequence of bit-wise
 comparisons.  "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)"
 where CST and MINVAL are integer constants.  This is better than a series
-of compare-and-banch insns in some cases,  e.g. we can implement:
+of compare-and-branch insns in some cases,  e.g. we can implement:
 
if ((x==4) || (x==6) || (x==9) || (x==11))
 

Jakub
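
As a side note for readers unfamiliar with the transformation that comment describes, the following C sketch contrasts the compare chain with the bit-test form for the values 4, 6, 9 and 11. It is a hand-written illustration, not what GCC emits; the function names are hypothetical, MINVAL is chosen as 4, and the explicit range guard is an addition here so that the shift count stays valid.

```c
#include <stdbool.h>
#include <stdint.h>

#define MINVAL 4
/* Bits 0, 2, 5 and 7 stand for the values 4, 6, 9 and 11.  */
#define CST ((1u << (4 - MINVAL)) | (1u << (6 - MINVAL)) \
             | (1u << (9 - MINVAL)) | (1u << (11 - MINVAL)))

/* The series of compare-and-branch tests.  */
static bool
matches_compare (int x)
{
  return x == 4 || x == 6 || x == 9 || x == 11;
}

/* The bit-test form: one range check, one shift, one AND.  */
static bool
matches_bit_test (int x)
{
  /* Unsigned subtraction folds "x < MINVAL" into the range check.  */
  unsigned int idx = (unsigned int) x - MINVAL;
  return idx < 32 && ((1u << idx) & CST) != 0;
}
```

Both functions agree for every int input; the second needs a single conditional on the mask instead of four equality tests.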



Re: [GCC 13 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-07-19 Thread Richard Biener via Gcc-patches
On Sun, Jun 11, 2023 at 12:55 AM Roger Sayle  wrote:
>
>
> This is a backport of the fixes for PR target/109973 and PR target/110083.
>
> This backport to the releases/gcc-13 branch has been tested on
> x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and
> without --target_board=unix{-m32} with no new failures.  Ok for gcc-13,
> or should we just close PR 109973 in Bugzilla?

As alternative solution for the GCC 13 branch I have tested reverting
r13-2006-ga56c1641e9d25e successfully.  Can we choose between the
options please?  Sorry I'm only bringing this up now but 13.2 RC is due
tomorrow.

Thank you,
Richard.

>
>
> 2023-06-10  Roger Sayle  
> Uros Bizjak  
>
> gcc/ChangeLog
> PR target/109973
> PR target/110083
> * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> CODE_for_sse4_1_ptestzv2di.
> (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> when expanding UNSPEC_PTEST to compare against zero.
> * config/i386/i386-features.cc (scalar_chain::convert_compare):
> Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> Update or delete REG_EQUAL notes, converting CONST_INT and
> CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.
> (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> result.
> (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> * config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
> * config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
> check for suitable matching modes for the UNSPEC_PTEST pattern.
> * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> (*_ptest): Add asterisk to hide define_insn.  Remove
> ":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
> (_ptestz): New define_expand to specify CCZ.
> (_ptestc): New define_expand to specify CCC.
> (_ptest): A define_expand using CC to preserve the
> current behavior.
> (*ptest_and): Specify CCZ to only perform this optimization
> when only the Z flag is required.
>
> gcc/testsuite/ChangeLog
> PR target/109973
> PR target/110083
> * gcc.target/i386/pr109973-1.c: New test case.
> * gcc.target/i386/pr109973-2.c: Likewise.
> * gcc.target/i386/pr110083.c: Likewise.
>
>
> Thanks,
> Roger
> --
>


Re: [PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-19 Thread Jeff Law via Gcc-patches




On 7/19/23 04:25, Richard Biener wrote:

On Wed, 19 Jul 2023, YunQiang Su wrote:


Eric Botcazou wrote on Wed, Jul 19, 2023 at 17:45:



I don't see that.  That's definitely not what GCC expects here,
the left-most word of the doubleword should be unchanged.

Your testcase should be a dg-do-run and probably more like

NOMIPS16 int __attribute__((noipa)) test (const unsigned char *buf)
{
   int val;
   ((unsigned char*)&val)[0] = *buf++;
   ((unsigned char*)&val)[1] = *buf++;
   ((unsigned char*)&val)[2] = *buf++;
   ((unsigned char*)&val)[3] = *buf++;
   return val;
}
int main()
{
   int val = 0x01020304;
   val = test ((const unsigned char *) &val);
   if (val != 0x01020304)
     __builtin_abort ();
}

not sure if I got endianness correct.  Now, the question is what
WORD_REGISTER_OPERATIONS implies for a bitfield insert and what
the MIPS ABI says for returning SImode.




The MIPS N64 ABI uses 2 GPRs for integer return values.
If the return value is SImode, the first register, v0, is used, and it
must be sign-extended,
i.e. bits [63:31] are all the same.

Yes, it is the same for signed and unsigned int32.

https://irix7.com/techpubs/007-2816-004.pdf
Page 6:
32-bit integer (int) parameters are always sign-extended when passed
in registers,
whether of signed or unsigned type. [This issue does not arise in the
o32-bit ABI.]
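
The rule quoted above can be written down as a one-line C helper. This is only a model of the register image with plain integers; the function name is hypothetical, and the uint32_t to int32_t conversion is implementation-defined in ISO C (GCC documents it as reduction modulo 2^32, which is assumed here).

```c
#include <stdint.h>

/* Hypothetical helper: the 64-bit register image of a 32-bit integer
   under the rule quoted above.  Signedness of the source type does not
   matter; the value is always sign-extended from bit 31.  */
static uint64_t
n64_reg_image (uint32_t v)
{
  return (uint64_t) (int64_t) (int32_t) v;
}
```

So an unsigned int holding 0x80000000 travels in the register as 0xffffffff80000000, the same image as the signed value INT_MIN.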


Note I think Andrew's comment #7 in the PR is spot-on then, the issue
isn't the bitfield inserts but the compare where combine elides
the sign_extend in favor of a subreg.  That's likely some wrongdoing
in simplify-rtx in the context of WORD_REGISTER_OPERATIONS.

And I think it raises a real question about the use of GPR (which maps 
to SImode and DImode for 64bit MIPS targets) on the conditional 
branching patterns in mips.md.


So while this code works:


(insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4))) 
"/app/example.cpp":7:29 -1
 (nil))
(jump_insn 23 20 24 2 (set (pc)
(if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
(const_int 0 [0]))
(label_ref 32)
(pc))) "/app/example.cpp":8:5 -1
 (int_list:REG_BR_PROB 440234148 (nil))
 -> 32)



Normally the narrowing SUBREG in insn 23 would indicate we don't care 
about the bits outside SImode.  But on W_R_O targets we very much care 
because the hardware is going to ultimately do the comparison in 64 bits.
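
A small C model of that mismatch, using plain integers rather than RTL: the function names are made up for illustration, and the narrowing and unsigned-to-signed conversions rely on GCC's documented modulo behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the RTL says: compare only the low 32 bits (the SImode subreg).  */
static bool
le_zero_simode (uint64_t reg)
{
  return (int32_t) (uint32_t) reg <= 0;
}

/* What the hardware does: compare the whole 64-bit register.  */
static bool
le_zero_dimode (uint64_t reg)
{
  return (int64_t) reg <= 0;
}
```

For a properly sign-extended register such as 0xffffffff80000000 both agree. If the extension is dropped and the register holds 0x0000000080000000, the SImode view says "less than or equal to zero" while the 64-bit comparison says the opposite, which is why eliding the sign_extend is wrong here.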


As Andrew/Richi have indicated this very much points to combine as 
incorrectly eliminating the explicit sign extension.  Most likely because 
something saw the SUBREG and concluded those upper bits set by insn 20 
were "don't care" bits.


But it may ultimately be better for the MIPS port to not expose a 
SImode comparison.  Thus reducing the reliance on W_R_O and its 
under-specified semantics and ultimately having the RTL map more closely 
to what the hardware actually does/supports.


That's the model we're working towards on the RISC-V port as well.  I 
wouldn't be surprised if we eventually get to the point where we 
eliminate WORD_REGISTER_OPERATIONS entirely.


And yes, bitfield operations are one of the nasty sticking points.  The 
thinking for them is that we want to support bit manipulations where the 
bit position is variable.  To do that we will emit an explicit sign 
extension after such operations.  Then rely on improved REE to identify 
and remove those redundant extensions.


Jeff


[OG13][committed] gfortran.dg/gomp/affinity-clause-1.f90: Fix scan-tree-dump

2023-07-19 Thread Tobias Burnus

OG13 (devel/omp/gcc-13) has a patch which sometimes uses
the tree's array representation. Unfortunately, that patch
has not made it to mainline so far.

The attached, committed patch adapts the expected dump for
the OG13 variant so that this testcase no longer fails. With this commit,
gfortran/gomp/ has no fails. (Unlike gfortran/oacc/, which
has plenty.)

Committed as 49ad5a86615

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Registration court: 
München, HRB 106955
commit 49ad5a86615089d236ae8ee9a2b7d17db1f0c1d7
Author: Tobias Burnus 
Date:   Wed Jul 19 13:33:29 2023 +0200

gfortran.dg/gomp/affinity-clause-1.f90: Fix scan-tree-dump

gcc/testsuite/
* gfortran.dg/gomp/affinity-clause-1.f90: Fix scan-tree-dump.
---
 gcc/testsuite/ChangeLog.omp  | 4 
 gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 4d986631574..435ad855dd9 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,7 @@
+2023-07-19  Tobias Burnus  
+
+	* gfortran.dg/gomp/affinity-clause-1.f90: Fix scan-tree-dump.
+
 2023-07-19  Tobias Burnus  
 
 	Backported from master:
diff --git a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
index 93e9afa72d6..568ae441fcf 100644
--- a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
@@ -22,9 +22,9 @@ end
 
 ! { dg-final { scan-tree-dump-times "D\\.\[0-9\]+ = .integer.kind=4.. __builtin_cosf ..real.kind=4.. a \\+ 1.0e\\+0\\);" 2 "original" } }
 
-! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):b\\\[\\(integer\\(kind=8\\)\\) i \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\(\\(integer\\(kind=4\\)\\\[1:5\\\]\\\[1:5\\\]\\) d\\)\\\[\\(integer\\(kind=8\\)\\) jj\\\]{lb: 1 sz: 20}\\\[\\(integer\\(kind=8\\)\\) i\\\]{lb: 1 sz: 4}\\)"  1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):b\\\[.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\(\\(integer\\(kind=4\\)\\\[1:5\\\]\\\[1:5\\\]\\) d\\)\\\[\[^\\\]\]*jj\\\]{lb: 1 sz: 20}\\\[\[^\\\]\]*i\\\]{lb: 1 sz: 4}\\)"  1 "original" } }
 
-! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):b\\\[.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):d\\\[\\(.*i \\+ -1\\) \\* 6\\\]\\)"  1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):b\\\[.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\(\\(integer\\(kind=4\\)\\\[1:5\\\]\\\[1:5\\\]\\) d\\)\\\[\[^\\\]\]*i\\\]\\{lb: 1 sz: 20\\}\\\[\[^\\\]\]*i\\\]\\{lb: 1 sz: 4\\}\\)"  1 "original" } }
 ! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=1:5:1\\):a\\)\[^ \]" 1 "original" } }
 
 ! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=1:5:1\\):a\\) affinity\\(iterator\\(integer\\(kind=4\\) i=1:5:1\\):\\*x\\)"  1 "original" } }
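
To illustrate what the relaxed pattern in the diff above buys, here is a small sketch using std::regex instead of the Tcl regexps DejaGnu actually runs, so escaping and dialect details differ. The helper name matches_index is hypothetical. The point is only that a class like [^\]]* accepts an index with or without a leading cast, while never crossing a closing bracket:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `dump_fragment` contains an index expression that ends in
// `var`, with or without a leading cast, e.g. "[jj]" or "[(integer(kind=8)) jj]".
// `var` is assumed to contain no regex metacharacters.
inline bool matches_index(const std::string& dump_fragment, const std::string& var) {
    assert(!var.empty());
    // "\[" literal bracket, "[^\]]*" anything except "]", then the variable and "]".
    const std::regex pattern("\\[[^\\]]*" + var + "\\]");
    return std::regex_search(dump_fragment, pattern);
}
```

With this, both "d[(integer(kind=8)) jj]{lb: 1 sz: 20}" and "d[jj]{lb: 1 sz: 20}" match for "jj", which is why one expected-dump pattern can cover differently printed index expressions.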


Re: [GCC 13 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-07-19 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 19, 2023 at 2:21 PM Richard Biener
 wrote:
>
> On Sun, Jun 11, 2023 at 12:55 AM Roger Sayle  
> wrote:
> >
> >
> > This is a backport of the fixes for PR target/109973 and PR target/110083.
> >
> > This backport to the releases/gcc-13 branch has been tested on
> > x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and
> > without --target_board=unix{-m32} with no new failures.  Ok for gcc-13,
> > or should we just close PR 109973 in Bugzilla?
>
> As alternative solution for the GCC 13 branch I have tested reverting
> r13-2006-ga56c1641e9d25e successfully.  Can we choose between the
> options please?  Sorry I'm only bringing this up now but 13.2 RC is due
> tomorrow.
>
> Thank you,
> Richard.
>
> >
> >
> > 2023-06-10  Roger Sayle  
> > Uros Bizjak  
> >
> > gcc/ChangeLog
> > PR target/109973
> > PR target/110083
> > * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> > CODE_for_sse4_1_ptestzv2di.
> > (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> > (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> > (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> > * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> > when expanding UNSPEC_PTEST to compare against zero.
> > * config/i386/i386-features.cc (scalar_chain::convert_compare):
> > Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> > Update or delete REG_EQUAL notes, converting CONST_INT and
> > CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.
> > (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> > result.
> > (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> > * config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
> > * config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
> > check for suitable matching modes for the UNSPEC_PTEST pattern.
> > * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> > to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> > (*_ptest): Add asterisk to hide define_insn.  Remove
> > ":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
> > (_ptestz): New define_expand to specify CCZ.
> > (_ptestc): New define_expand to specify CCC.
> > (_ptest): A define_expand using CC to preserve the
> > current behavior.
> > (*ptest_and): Specify CCZ to only perform this optimization
> > when only the Z flag is required.
> >
> > gcc/testsuite/ChangeLog
> > PR target/109973
> > PR target/110083
> > * gcc.target/i386/pr109973-1.c: New test case.
> > * gcc.target/i386/pr109973-2.c: Likewise.
> > * gcc.target/i386/pr110083.c: Likewise.

Yes, I would rather have the offending patch reverted on gcc-13.

Uros.


[OG13][committed] libgomp.fortran/map-subarray-5.f90: Fix for shared-mem device/host

2023-07-19 Thread Tobias Burnus

Fix for the testcase from
  "OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic"

which has been semi-submitted* to mainline as:
  "[PATCH 09/14] OpenMP/OpenACC: Unordered/non-constant component offset runtime 
diagnostic"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/61.html

Thus, the mainline version will need the same patch.

(* The patch is part of the OG13 rebased series of a series scheduled for
mainline inclusion; for details, cf. 00/14 email at
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622213.html )

Tobias
commit a9d17fbd1e918019b77a6d9616704db85b5c3e8c
Author: Tobias Burnus 
Date:   Wed Jul 19 14:52:00 2023 +0200

libgomp.fortran/map-subarray-5.f90: Fix for shared-mem device/host

libgomp/

* testsuite/libgomp.fortran/map-subarray-5.f90: Only expect
libgomp dg-output for offload_device_nonshared_as.
---
 libgomp/ChangeLog.omp| 5 +
 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 7cc64943446..cbc12c0fab0 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-07-19  Tobias Burnus  
+
+	* testsuite/libgomp.fortran/map-subarray-5.f90: Only expect
+	libgomp dg-output for offload_device_nonshared_as.
+
 2023-07-19  Tobias Burnus  
 
 	Backported from master:
diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
index e7cdf11e610..59ad01ab76b 100644
--- a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
@@ -49,6 +49,6 @@ end do
 
 end
 
-! { dg-output "(\n|\r|\r\n)" }
-! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
+! { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } }
+! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } }
 ! { dg-shouldfail "" { offload_device_nonshared_as } }


Re: [PATCH] middle-end/61747 - conditional move expansion and constants

2023-07-19 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 18, 2023 at 01:25:45PM +0200, Richard Biener wrote:
> 
>   PR middle-end/61747
>   * internal-fn.cc (expand_vec_cond_optab_fn): When the
>   value operands are equal to the original comparison operands
>   preserve that equality by re-using the comparison expansion.
>   * optabs.cc (emit_conditional_move): When the value operands
>   are equal to the comparison operands and would be forced to
>   a register by prepare_cmp_insn do so earlier, preserving the
>   equality.
> 
>   * g++.target/i386/pr61747.C: New testcase.
> ---
>  gcc/internal-fn.cc  | 17 --
>  gcc/optabs.cc   | 32 ++-
>  gcc/testsuite/g++.target/i386/pr61747.C | 42 +
>  3 files changed, 88 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr61747.C
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e698f0bffc7..c83c3921792 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3019,8 +3019,21 @@ expand_vec_cond_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>icode = convert_optab_handler (optab, mode, cmp_op_mode);
>rtx comparison
>  = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 4);
> -  rtx rtx_op1 = expand_normal (op1);
> -  rtx rtx_op2 = expand_normal (op2);
> +  /* vector_compare_rtx legitimizes operands, preserve equality when
> + expanding op1/op2.  */
> +  rtx rtx_op1, rtx_op2;
> +  if (operand_equal_p (op1, op0a))
> +rtx_op1 = XEXP (comparison, 0);
> +  else if (operand_equal_p (op1, op0b))
> +rtx_op1 = XEXP (comparison, 1);
> +  else
> +rtx_op1 = expand_normal (op1);
> +  if (operand_equal_p (op2, op0a))
> +rtx_op2 = XEXP (comparison, 0);
> +  else if (operand_equal_p (op2, op0b))
> +rtx_op2 = XEXP (comparison, 1);
> +  else
> +rtx_op2 = expand_normal (op2);
>  
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>create_output_operand (&ops[0], target, mode);

The above LGTM, it relies on vector_compare_rtx not swapping the arguments
or performing some other comparison canonicalization, but at least right now
that function doesn't seem to do that.

> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5119,13 +5119,43 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
> last = get_last_insn ();
> do_pending_stack_adjust ();
> machine_mode cmpmode = comp.mode;
> +   rtx orig_op0 = XEXP (comparison, 0);
> +   rtx orig_op1 = XEXP (comparison, 1);
> +   rtx op2p = op2;
> +   rtx op3p = op3;
> +   /* If we are optimizing, force expensive constants into a register
> +  but preserve an eventual equality with op2/op3.  */
> +   if (CONSTANT_P (orig_op0) && optimize
> +   && (rtx_cost (orig_op0, mode, COMPARE, 0,
> + optimize_insn_for_speed_p ())
> +   > COSTS_N_INSNS (1))
> +   && can_create_pseudo_p ())
> + {
> +   XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> +   if (rtx_equal_p (orig_op0, op2))
> + op2p = XEXP (comparison, 0);
> +   if (rtx_equal_p (orig_op0, op3))
> + op3p = XEXP (comparison, 0);
> + }
> +   if (CONSTANT_P (orig_op1) && optimize
> +   && (rtx_cost (orig_op1, mode, COMPARE, 0,
> + optimize_insn_for_speed_p ())
> +   > COSTS_N_INSNS (1))
> +   && can_create_pseudo_p ())
> + {
> +   XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
> +   if (rtx_equal_p (orig_op1, op2))
> + op2p = XEXP (comparison, 1);
> +   if (rtx_equal_p (orig_op1, op3))
> + op3p = XEXP (comparison, 1);
> + }

I'm worried here, because prepare_cmp_insn before doing almost identical
forcing to reg does
  if (CONST_SCALAR_INT_P (y))
    canonicalize_comparison (mode, &comparison, &y);
which the above change will make not happen anymore (for the more expensive
constants).
If we have a match between at least one of the comparison operands and
op2/op3, I think having equivalency there is perhaps more important than
the canonicalization, but it would be nice not to break it even if there
is no match.  So, perhaps force_reg only if there is a match?
force_reg (cmpmode, force_reg (cmpmode, x)) is equivalent to
force_reg (cmpmode, x), so perhaps:
  {
    if (rtx_equal_p (orig_op0, op2))
      op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
    if (rtx_equal_p (orig_op0, op3))
      op3p = XEXP (comparison, 0)
        = force_reg (cmpmode, XEXP (comparison, 0));
  }
and similarly for the other body?
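
The suggestion above can be sketched with a toy model. Everything here (Operand, CmovOperands, preserve_equality, and this force_reg) is a hypothetical stand-in, not GCC's rtx API, and the constant-cost check from the patch is left out. It only shows the two properties the argument relies on: forcing a register is a no-op, and the comparison operand is only touched when it matches op2 or op3:

```cpp
#include <cassert>

// Toy stand-in for an RTL operand; NOT GCC's rtx.
struct Operand {
    bool is_reg;  // true for a pseudo register, false for a constant
    long value;   // the constant, or the value the register holds
    int regno;    // pseudo number, -1 for constants
};

int next_pseudo = 0;  // counts how many pseudos the toy model creates

// Toy force_reg: a register is returned unchanged, a constant gets a new pseudo.
inline Operand force_reg(const Operand& x) {
    if (x.is_reg)
        return x;
    return Operand{true, x.value, next_pseudo++};
}

inline bool same_operand(const Operand& a, const Operand& b) {
    return a.is_reg == b.is_reg && a.value == b.value && a.regno == b.regno;
}

struct CmovOperands {
    Operand cmp0;  // first comparison operand
    Operand op2;   // value selected when the comparison holds
    Operand op3;   // value selected otherwise
};

// Force cmp0 into a register only when it matches op2 or op3, and make the
// matching value operand share that register. Without a match cmp0 is left
// alone, so later canonicalization of the constant can still happen.
inline CmovOperands preserve_equality(CmovOperands in) {
    const Operand orig = in.cmp0;
    if (same_operand(orig, in.op2))
        in.op2 = in.cmp0 = force_reg(orig);
    if (same_operand(orig, in.op3))
        in.op3 = in.cmp0 = force_reg(in.cmp0);  // no second pseudo if already forced
    return in;
}
```

In the toy model a comparison constant that equals op2 ends up sharing one pseudo with it, while a constant that matches neither value operand stays a constant.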

Jakub



Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-19 Thread Drew Ross via Gcc-patches
Trying to lower converts to operands through, for example,
(for op (bit_ior bit_and bit_xor)
 (for rop (bit_xor bit_ior bit_and)
  (simplify
   (op:c (nop_convert (rop @0 @1)) @3)
   (op (rop (convert:type @0) (convert:type @1)) @3))))

(simplify
 (convert (bit_not @0))
 (bit_not (convert:type @0)))

Runs into infinite oscillations with
/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
...
  (bitop (convert@2 @0) (convert?@3 @1))
...
   (convert (bitop @0 (convert @1)

when integer constants are involved, e.g.:
unsigned int main (int x, unsigned int y)
{
  unsigned int a = x | 4213678;
  unsigned int b = a ^ y;
  return b;
}

I think using Jakub's bitwise equal macro to get it down to 16 cases might
be our best option.
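
For reference, a small sketch that checks the identity behind the simplification, (~X | Y) ^ X == ~(X & Y), exhaustively on 8-bit operands, and models the pattern counts mentioned in the quoted review below. The function names are hypothetical, and pattern_variants is only a simplified model of the "each optional convert or :c doubles the pattern count" arithmetic, not of what genmatch really emits:

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks (~x | y) ^ x == ~(x & y) for all 8-bit x and y.
inline bool identity_holds_for_all_u8() {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            const std::uint8_t lhs = static_cast<std::uint8_t>((~x | y) ^ x);
            const std::uint8_t rhs = static_cast<std::uint8_t>(~(x & y));
            if (lhs != rhs)
                return false;
        }
    }
    return true;
}

// Simplified model: every optional conversion (nop_convertN?) and every
// commutative marker (:c) doubles the number of concrete patterns.
constexpr unsigned pattern_variants(unsigned optional_converts, unsigned commutative_ops) {
    return 1u << (optional_converts + commutative_ops);
}
```

Under that model, four optional converts plus two :c give 64 variants, and dropping two of the converts (for example by comparing operands with a bitwise-equality predicate instead) gives 16.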

Drew

On Tue, Jul 11, 2023 at 9:58 AM Richard Biener 
wrote:

> On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek  wrote:
> >
> > On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches
> wrote:
> > > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> > >  wrote:
> > > >
> > > > Adds a simplification for (~X | Y) ^ X to be folded into ~(X &
> Y).
> > > > Tested successfully on x86_64 and x86 targets.
> > > >
> > > > PR middle-end/109986
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd ((~X | Y) ^ X -> ~(X & Y)): New
> simplification.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.c-torture/execute/pr109986.c: New test.
> > > > * gcc.dg/tree-ssa/pr109986.c: New test.
> > > > ---
> > > >  gcc/match.pd  |  11 ++
> > > >  .../gcc.c-torture/execute/pr109986.c  |  41 
> > > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c  | 177
> ++
> > > >  3 files changed, 229 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index a17d6838c14..d9d7d932881 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > > >(convert (bit_and @1 (bit_not @0)
> > > >
> > > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > > +(simplify
> > > > + (bit_xor:c (nop_convert1?
> > > > + (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > > > +@1)) (nop_convert4? @0))
> > >
> > > you want to reduce the number of nop_convert? - for example
> > > I wonder if we can canonicalize
> > >
> > >  (T)~X and ~(T)X
> > >
> > > for nop-conversions.  The same might apply to binary bitwise operations
> > > where we should push those to a direction where they are likely
> eliminated.
> > > Usually we'd push them outwards.
> > >
> > > The issue with the above pattern is that nop_convertN? expands to 2^N
> > > separate patterns.  Together with the two :c you get 64 out of this.
> > >
> > > I do not see that all of the combinations can happen when X has to
> > > match unless we fail to contract some of them like if we have
> > > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > > with the last step being somewhat difficult unless we do
> > > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > > propagation problem and less of a direct pattern matching one.
> >
> > The nop_convert1? in the pattern might seem to be unnecessary
> > for cases like:
> > int i, j, k, l;
> > unsigned u, v, w, x;
> >
> > void
> > foo (void)
> > {
> >   int t0 = i;
> >   int t1 = (~t0) | j;
> >   x = t1 ^ (unsigned) t0;
> >   unsigned t2 = u;
> >   unsigned t3 = (~t2) | v;
> >   i = ((int) t3) ^ (int) t2;
> > }
> > we actually optimize it with or without the nop_convert1? in place,
> > because we have the
> > /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> >when profitable.
> > ...
> >   (bitop (convert@2 @0) (convert?@3 @1))
> > ...
> >(convert (bitop @0 (convert @1)
> > simplification.
> > Except that on
> > void
> > bar (void)
> > {
> >   unsigned t0 = u;
> >   int t1 = (~(int) t0) | j;
> >   x = t1 ^ t0;
> >   int t2 = i;
> >   unsigned t3 = (~(unsigned) t2) | v;
> >   i = ((int) t3) ^ t2;
> > }
> > the optimization doesn't trigger without the nop_convert1? and does
> > with it.
> >
> > Perhaps we could get rid of nop_convert3? and nop_convert4?
> > by introducing a macro/inline function predicate like:
> > bitwise_equal_p (expr1, expr2) and instead of using
> > (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> > use @0 and @2 and then add
> > if (bitwise_equal_p (@0, @2))
> > to the condition.
> > For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> > static inline bool
> > bitwise_equal_p (tree expr1, tree expr2)
> > {
> >   STRIP_NOPS (expr1);
> >   STRIP_NOPS (expr2)

Re: [PATCH] middle-end/61747 - conditional move expansion and constants

2023-07-19 Thread Richard Biener via Gcc-patches
On Wed, 19 Jul 2023, Jakub Jelinek wrote:

> On Tue, Jul 18, 2023 at 01:25:45PM +0200, Richard Biener wrote:
> > 
> > PR middle-end/61747
> > * internal-fn.cc (expand_vec_cond_optab_fn): When the
> > value operands are equal to the original comparison operands
> > preserve that equality by re-using the comparison expansion.
> > * optabs.cc (emit_conditional_move): When the value operands
> > are equal to the comparison operands and would be forced to
> > a register by prepare_cmp_insn do so earlier, preserving the
> > equality.
> > 
> > * g++.target/i386/pr61747.C: New testcase.
> > ---
> >  gcc/internal-fn.cc  | 17 --
> >  gcc/optabs.cc   | 32 ++-
> >  gcc/testsuite/g++.target/i386/pr61747.C | 42 +
> >  3 files changed, 88 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr61747.C
> > 
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index e698f0bffc7..c83c3921792 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -3019,8 +3019,21 @@ expand_vec_cond_optab_fn (internal_fn, gcall *stmt, 
> > convert_optab optab)
> >icode = convert_optab_handler (optab, mode, cmp_op_mode);
> >rtx comparison
> >  = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 
> > 4);
> > -  rtx rtx_op1 = expand_normal (op1);
> > -  rtx rtx_op2 = expand_normal (op2);
> > +  /* vector_compare_rtx legitimizes operands, preserve equality when
> > + expanding op1/op2.  */
> > +  rtx rtx_op1, rtx_op2;
> > +  if (operand_equal_p (op1, op0a))
> > +rtx_op1 = XEXP (comparison, 0);
> > +  else if (operand_equal_p (op1, op0b))
> > +rtx_op1 = XEXP (comparison, 1);
> > +  else
> > +rtx_op1 = expand_normal (op1);
> > +  if (operand_equal_p (op2, op0a))
> > +rtx_op2 = XEXP (comparison, 0);
> > +  else if (operand_equal_p (op2, op0b))
> > +rtx_op2 = XEXP (comparison, 1);
> > +  else
> > +rtx_op2 = expand_normal (op2);
> >  
> >rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >create_output_operand (&ops[0], target, mode);
> 
> The above LGTM, it relies on vector_compare_rtx not swapping the arguments
> or performing some other comparison canonicalization, but at least right now
> that function doesn't seem to do that.
> 
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -5119,13 +5119,43 @@ emit_conditional_move (rtx target, struct 
> > rtx_comparison comp,
> >   last = get_last_insn ();
> >   do_pending_stack_adjust ();
> >   machine_mode cmpmode = comp.mode;
> > + rtx orig_op0 = XEXP (comparison, 0);
> > + rtx orig_op1 = XEXP (comparison, 1);
> > + rtx op2p = op2;
> > + rtx op3p = op3;
> > + /* If we are optimizing, force expensive constants into a register
> > +but preserve an eventual equality with op2/op3.  */
> > + if (CONSTANT_P (orig_op0) && optimize
> > + && (rtx_cost (orig_op0, mode, COMPARE, 0,
> > +   optimize_insn_for_speed_p ())
> > + > COSTS_N_INSNS (1))
> > + && can_create_pseudo_p ())
> > +   {
> > + XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> > + if (rtx_equal_p (orig_op0, op2))
> > +   op2p = XEXP (comparison, 0);
> > + if (rtx_equal_p (orig_op0, op3))
> > +   op3p = XEXP (comparison, 0);
> > +   }
> > + if (CONSTANT_P (orig_op1) && optimize
> > + && (rtx_cost (orig_op1, mode, COMPARE, 0,
> > +   optimize_insn_for_speed_p ())
> > + > COSTS_N_INSNS (1))
> > + && can_create_pseudo_p ())
> > +   {
> > + XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
> > + if (rtx_equal_p (orig_op1, op2))
> > +   op2p = XEXP (comparison, 1);
> > + if (rtx_equal_p (orig_op1, op3))
> > +   op3p = XEXP (comparison, 1);
> > +   }
> 
> I'm worried here, because prepare_cmp_insn before doing almost identical
> forcing to reg does
>   if (CONST_SCALAR_INT_P (y))
> canonicalize_comparison (mode, &comparison, &y);
> which the above change will make not happen anymore (for the more expensive
> constants).

Hmm, yeah - that could happen.

> If we have a match between at least one of the comparison operands and
> op2/op3, I think having equivalency there is perhaps more important than
> the canonicalization, but it would be nice not to break it even if there
> is no match.  So, perhaps force_reg only if there is a match?
> force_reg (cmpmode, force_reg (cmpmode, x)) is equivalent to
> force_reg (cmpmode, x), so perhaps:
>   {
> if (rtx_equal_p (orig_op0, op2))
>   op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> if (rtx_equal_p (orig_op0, op3))
>   op3p = XEXP (comparison, 0)
> = force_reg (cmpmode, XEXP (comparison, 0));
>   }
> and sim

RE: [PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction

2023-07-19 Thread Li, Pan2 via Gcc-patches
Committed, as it passed both the bootstrap and regression tests. Thanks, Richard.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Wednesday, July 19, 2023 4:17 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH] VECT: Add mask_len_fold_left_plus for in-order 
floating-point reduction

On Sat, 15 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch adds mask_len_fold_left_plus pattern to support in-order 
> floating-point
> reduction for target support len loop control.
> 
> Consider this following case:
> double
> foo2 (double *__restrict a,
>  double init,
>  int *__restrict cond,
>  int n)
> {
> for (int i = 0; i < n; i++)
>   if (cond[i])
> init += a[i];
> return init;
> }
> 
> ARM SVE:
> 
> ...
> vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
> vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... 
> });
> _36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
> ...
> 
> For RVV, we want to see:
> ...
> _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, 
> loop_len, bias);
> ...

OK.

Richard.
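
For readers skimming the patch below, the semantics it documents for mask_len_fold_left_plus can be modelled in plain C++ roughly as follows. This is a reference sketch only; the function and parameter names are hypothetical and this is not how the optab is implemented. The additions happen strictly in order, only for elements whose mask bit is set, and only for the first LEN + BIAS elements:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of the documented semantics:
//   operand0 = operand1;
//   for (i = 0; i < LEN + BIAS; i++)
//     if (operand3[i]) operand0 += operand2[i];
inline double mask_len_fold_left_plus(double init,
                                      const std::vector<double>& values,
                                      const std::vector<bool>& mask,
                                      int len, int bias) {
    assert(len + bias >= 0);
    const std::size_t active = static_cast<std::size_t>(len + bias);
    assert(active <= values.size() && active <= mask.size());

    double acc = init;
    // Strictly in-order accumulation: no reassociation of the additions.
    for (std::size_t i = 0; i < active; ++i)
        if (mask[i])
            acc += values[i];
    return acc;
}
```

For example, with init = 1.0, values {1.0, 2.0, 3.0, 4.0}, mask {1, 0, 1, 1}, len = 3 and bias = 0, only elements 0 and 2 are added, so the result is 5.0.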

> gcc/ChangeLog:
> 
> * doc/md.texi: Add mask_len_fold_left_plus.
> * internal-fn.cc (mask_len_fold_left_direct): Ditto.
> (expand_mask_len_fold_left_optab_fn): Ditto.
> (direct_mask_len_fold_left_optab_supported_p): Ditto.
> * internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> 
> ---
>  gcc/doc/md.texi | 13 +
>  gcc/internal-fn.cc  |  5 +
>  gcc/internal-fn.def |  3 +++
>  gcc/optabs.def  |  1 +
>  4 files changed, 22 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cbcb992e5d7..6f44e66399d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5615,6 +5615,19 @@ no reassociation.
>  Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  (operand 3) that specifies which elements of the source vector should be 
> added.
>  
> +@cindex @code{mask_len_fold_left_plus_@var{m}} instruction pattern
> +@item @code{mask_len_fold_left_plus_@var{m}}
> +Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> +(operand 3), len operand (operand 4) and bias operand (operand 5) that
> +performs following operations strictly in-order (no reassociation):
> +
> +@smallexample
> +operand0 = operand1;
> +for (i = 0; i < LEN + BIAS; i++)
> +  if (operand3[i])
> +operand0 += operand2[i];
> +@end smallexample
> +
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e698f0bffc7..2bf4fc492fe 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -190,6 +190,7 @@ init_internal_fns ()
>  #define fold_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
> +#define mask_len_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
>  
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
> @@ -3890,6 +3891,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, 
> convert_optab optab,
>  #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>  
> +#define expand_mask_len_fold_left_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 4)
>  
> @@ -3997,6 +4001,7 @@ multi_vector_optab_supported_p (convert_optab optab, 
> tree_pair types,
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
> +#define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
>  #define direct_vec_set_optab_supported_p direct_optab_supported_p
>  #define direct_vec_extract_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index ea750a921ed..d3aec51b1f2 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -319,6 +319,9 @@ DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | 
> ECF_NOTHROW,
>  DEF_INTERNAL_OPTAB_FN (MASK_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>  mask_fold_left_plus, mask_fold_left)
>  
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
> +mask_len_fold_left_plus, mask_len_fold_left)
> +
>  /* Unary math functions.  */
>  DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
>  DEF_INTERNAL_FLT_FN (ACOSH, ECF_CONST, acosh, unary)
> diff --git a/gcc/op

Re: [PATCH] middle-end/61747 - conditional move expansion and constants

2023-07-19 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 19, 2023 at 01:36:23PM +, Richard Biener wrote:
> > If we have a match between at least one of the comparison operands and
> > op2/op3, I think having equivalency there is perhaps more important than
> > the canonicalization, but it would be nice not to break it even if there
> > is no match.  So, perhaps force_reg only if there is a match?
> > force_reg (cmpmode, force_reg (cmpmode, x)) is equivalent to
> > force_reg (cmpmode, x), so perhaps:
> > {
> >   if (rtx_equal_p (orig_op0, op2))
> > op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
> >   if (rtx_equal_p (orig_op0, op3))
> > op3p = XEXP (comparison, 0)
> >   = force_reg (cmpmode, XEXP (comparison, 0));
> > }
> > and similarly for the other body?
> 
> I don't think we'll have op3 == op2 == orig_op0 because if
> op2 == op3 the 
> 
>   /* If the two source operands are identical, that's just a move.  */
> 
>   if (rtx_equal_p (op2, op3))
> {
>   if (!target)
> target = gen_reg_rtx (mode);
> 
>   emit_move_insn (target, op3);
>   return target;
> 
> code should have triggered.  So we should know we invoke force_reg
> only once for each comparison operand check?
> 
> So I'm going to test the following on top of the patch.

Please use else if instead of the second if then.
Ok with that change.

> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5131,11 +5131,10 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
>   > COSTS_N_INSNS (1))
>   && can_create_pseudo_p ())
> {
> - XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
>   if (rtx_equal_p (orig_op0, op2))
> -   op2p = XEXP (comparison, 0);
> +   op2p = XEXP (comparison, 0) = force_reg (cmpmode, 
> orig_op0);
>   if (rtx_equal_p (orig_op0, op3))
> -   op3p = XEXP (comparison, 0);
> +   op3p = XEXP (comparison, 0) = force_reg (cmpmode, 
> orig_op0);
> }
>   if (CONSTANT_P (orig_op1) && optimize
>   && (rtx_cost (orig_op1, mode, COMPARE, 0,
> @@ -5143,11 +5142,10 @@ emit_conditional_move (rtx target, struct 
> rtx_comparison comp,
>   > COSTS_N_INSNS (1))
>   && can_create_pseudo_p ())
> {
> - XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
>   if (rtx_equal_p (orig_op1, op2))
> -   op2p = XEXP (comparison, 1);
> +   op2p = XEXP (comparison, 1) = force_reg (cmpmode, 
> orig_op1);
>   if (rtx_equal_p (orig_op1, op3))
> -   op3p = XEXP (comparison, 1);
> +   op3p = XEXP (comparison, 1) = force_reg (cmpmode, 
> orig_op1);
> }
>   prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> GET_CODE (comparison), NULL_RTX, unsignedp,

Jakub



Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

2023-07-19 Thread Patrick Palka via Gcc-patches
On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:

> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?

Looks reasonable to me.

Though I wonder if we could also fix this by not checking potentiality
at all in this case?  The problematic call to is_rvalue_constant_expression
happens from cp_parser_constant_expression with 'allow_non_constant' != 0
and with 'non_constant_p' being a dummy out argument that comes from
cp_parser_functional_cast, so the result of is_rvalue_constant_expression
is effectively unused in this case, and we should be able to safely elide
it when 'allow_non_constant && non_constant_p == nullptr'.

Relatedly, ISTM the member cp_parser::non_integral_constant_expression_p
is also effectively unused and could be removed?

> 
> -- >8 --
> 
> is_really_empty_class is liable to crash when it gets an incomplete
> or dependent type.  Since r11-557, we pass the yet-uninstantiated
> class type S<0> of the PARM_DECL s to is_really_empty_class -- because
> of the potential_rvalue_constant_expression -> is_rvalue_constant_expression
> change in cp_parser_constant_expression.  Here we're not parsing
> a template so we did not check COMPLETE_TYPE_P as we should.
> 
>   PR c++/110106
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (potential_constant_expression_1): Check COMPLETE_TYPE_P
>   even when !processing_template_decl.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/noexcept80.C: New test.
> ---
>  gcc/cp/constexpr.cc |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/noexcept80.C | 12 
>  2 files changed, 13 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept80.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index 6e8f1c2b61e..1f59c5472fb 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -9116,7 +9116,7 @@ potential_constant_expression_1 (tree t, bool 
> want_rval, bool strict, bool now,
>if (now && want_rval)
>   {
> tree type = TREE_TYPE (t);
> -   if ((processing_template_decl && !COMPLETE_TYPE_P (type))
> +   if (!COMPLETE_TYPE_P (type)
> || dependent_type_p (type)
> || is_really_empty_class (type, /*ignore_vptr*/false))
>   /* An empty class has no data to read.  */
> diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept80.C 
> b/gcc/testsuite/g++.dg/cpp0x/noexcept80.C
> new file mode 100644
> index 000..3e90af747e2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept80.C
> @@ -0,0 +1,12 @@
> +// PR c++/110106
> +// { dg-do compile { target c++11 } }
> +
> +template<int> struct S
> +{
> +};
> +
> +struct G {
> +  G(S<0>);
> +};
> +
> +void y(S<0> s) noexcept(noexcept(G{s}));
> 
> base-commit: fca089e8a47314a40ad93527ba9f9d0d374b3afb
> -- 
> 2.41.0
> 
> 



Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-19 Thread Jeff Law via Gcc-patches




On 7/19/23 02:08, Manolis Tsamis wrote:
de.




> I stumbled upon the same thing when doing an aarch64 bootstrap build yesterday.
> Given that this causes issues, maybe doing
>    int icode = INSN_CODE (insn);
>    ...
>    INSN_CODE (insn) = icode;
> is a good option and should also be more performant.
I nearly suggested you do something like this, but ultimately it's a 
workaround for target bugs.  So part of me wants to keep it as is, but I 
can also understand the desire to make a change like you're suggesting.




> Even with that I'm still getting a segfault while doing a bootstrap
> build that I'm investigating.
Sounds good.  I still need to drop the V3 into my tester and validate 
the m68k (and everything else) just works.  I'm slightly concerned about 
SH, but it's still failing even after taking the V2 out of the tester, 
so the SH issues are clearly unrelated to f-m-o.


I'll take Vineet's testcase and verify that we can just use an integer 
store to handle the 0.0 case.  As I mentioned, that's the right thing to 
do anyway from both a correctness and performance standpoint.  I'll also 
review the movsf pattern for the same problem/optimization.


jeff


Re: [PATCH] core: Support heap-based trampolines

2023-07-19 Thread Iain Sandoe
Hi Martin,

> On 19 Jul 2023, at 11:43, Martin Uecker via Gcc-patches 
>  wrote:
> 
> Am Mittwoch, dem 19.07.2023 um 10:29 +0100 schrieb Iain Sandoe:

>>> On 19 Jul 2023, at 10:04, Martin Uecker 
>>> wrote:
>> 
>>>>>> On 17 Jul 2023,
>>>>>>
>>>>>>> You mention setjmp/longjmp - on darwin and other platforms requiring
>>>>>>> non-stack based trampolines does the system runtime provide means to
>>>>>>> deal with this issue like an alternate allocation method or a way to
>>>>>>> register cleanup?
>>>>>>
>>>>>> There is an alternate mechanism relying on system libraries that is
>>>>>> possible on darwin specifically (I don’t know for other targets) but
>>>>>> it will only work for signed binaries, and would require us to
>>>>>> codesign everything produced by gcc. During development, it was
>>>>>> deemed too big an ask and the current strategy was chosen (Iain can
>>>>>> surely add more background on that if needed).
>>>>>
>>>>> I do not think that this solves the setjump/longjump issue - since
>>>>> there’s still a notional allocation that takes place (it’s just that
>>>>> the mechanism for determining permissions is different).
>>>>>
>>>>> It is also a big barrier for the general user - and prevents normal
>>>>> folks from distributing GCC - since codesigning requires an external
>>>>> certificate (i.e. I would really rather avoid it).
>>>>>
>>>>>>> Was there ever an attempt to provide a "generic" trampoline driven by
>>>>>>> a more complex descriptor?
>>>>>
>>>>>> My own opinion is that executable stack should go away on all
>>>>>> targets at some point, so a truly generic solution to the problem
>>>>>> would be great.
>>>>>
>>>>> indeed it would.
>> 
>>> I think we need a solution rather sooner than later on all archs.
>> 
>> AFAICS the  heap-based trampolines can work for any arch**, this
>> issue is about
>> system security policy, rather than arch, specifically?
>> 
>> It seems to me that for any system security policy that permits JIT,
>> (but not
>> executable stack) the heap-based trampolines are viable.
> 
> I agree. 
> 
> BTW; One option we discussed before, was to map a page with 
> pre-allocated trampolines, which look up the address of
> a callee and the static chain in a table based on its own
> address. Then no code generation is involved.

That reads similar to the scheme Apple have implemented for libobjc and libffi.
In order to be extensible (i.e to allow the table to grow at runtime), it means
having some loadable executable object; if that is implemented in a way shared
between users (delivered as part of the implementation) then, for Darwin at
least, it must be codesigned - which is somewhere I really want to avoid going
with GCC.  
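
For readers following along, the table-driven idea quoted above can be sketched in portable C++. This is only an illustration under stated assumptions: all names (`Slot`, `slot_table`, `bind_trampoline`, `release_trampoline`) are hypothetical, the table has a fixed size, and the slot index is baked in as a template parameter where a real implementation would derive it from the trampoline's own address. It is not how the patch or any Apple library does it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical names throughout; none of this is taken from the patch.
using Callee = int (*)(void *chain, int arg);
using PlainFn = int (*)(int);

struct Slot {
  Callee callee;  // function to forward to (nullptr means the slot is free)
  void *chain;    // static chain, i.e. the enclosing frame's data
};

constexpr std::size_t kSlots = 4;
static std::array<Slot, kSlots> slot_table;  // zero-initialized: all free

// One pre-built entry point per slot. The template parameter stands in for
// the address-based lookup described in the thread; no code is generated at
// run time.
template <std::size_t I>
int trampoline(int arg) {
  return slot_table[I].callee(slot_table[I].chain, arg);
}

template <std::size_t... I>
constexpr std::array<PlainFn, kSlots> make_entries(std::index_sequence<I...>) {
  return {{&trampoline<I>...}};
}

static const std::array<PlainFn, kSlots> entries =
    make_entries(std::make_index_sequence<kSlots>{});

// Claim a free slot and return an ordinary function pointer, or nullptr when
// the fixed-size table is full (the extensibility problem mentioned above).
PlainFn bind_trampoline(Callee callee, void *chain) {
  assert(callee != nullptr);
  for (std::size_t i = 0; i < kSlots; ++i)
    if (slot_table[i].callee == nullptr) {
      slot_table[i] = Slot{callee, chain};
      return entries[i];
    }
  return nullptr;
}

// Release a slot. A longjmp past the owner would skip this call, which is
// the leak discussed in the thread.
void release_trampoline(PlainFn fn) {
  for (std::size_t i = 0; i < kSlots; ++i)
    if (entries[i] == fn)
      slot_table[i] = Slot{nullptr, nullptr};
}

// Example callee: adds a value stored in the "enclosing frame".
int add_from_frame(void *chain, int arg) {
  return *static_cast<int *>(chain) + arg;
}
```

Binding `add_from_frame` to the address of a local `int` yields a plain `int (*)(int)` that can be handed to code that knows nothing about the static chain; the fixed `kSlots` limit is exactly what makes a runtime-extensible table attractive and, on Darwin, problematic.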

> The difficult part is avoiding leaks with longjmp / setjmp.
> One idea was to have a shadow stack consisting of the
> pre-allocated trampolines, but this probably causes other
> issues...

With a per-thread table, I *think* for most targets, we discussed in the team
maintaining a ’tide mark’ of the stack as part of the saved data in the
trampoline (not used as part of the execution, but only as part of the
allocation management)… but ..

> I wonder how difficult it is to have longjmp / setjmp walk 
> the stack in C?   This would also be useful for C++
> interoperability and to free  heap-allocated VLAs.

… this would be a better solution (as we can see trampolines are a small
leak c.f. the general uses)?

> As a user of nested functions, from my side it would also 
> ok to simply add a wide function pointer type that contains
> address + static chain.  This would require changing code, 
> but would also work with Clang's blocks and solve other 
> language interoperability problems, while avoiding all 
> existing ABI issues.

How does that work when passing a callback to libc (e.g. qsort?)

(Implementing Clang’s blocks is also on my TODO, but a different discussion ;))

>> This seems to be a useful step forward; and we can add some other
>> mechanism to the flag’s supported list if someone develops one?
> 
> I think it is a useful step forward.

Assembled maintainers, do you think this is OK for trunk given the various
discussions above?

thanks
Iain



Re: [PATCH] RISC-V: Refactor RVV machine modes

2023-07-19 Thread Kito Cheng via Gcc-patches
Thanks, that's really awesome!

One comment about the mode iterators: the naming still seems to be
prefixed with VNX, which is inconsistent with the new mode naming scheme.

> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index cd5b19457f8..03e19259505 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -121,45 +121,30 @@
>DONE;
>  })
>
> -(define_expand "len_mask_gather_load"
> -  [(match_operand:VNX16_QHSD 0 "register_operand")
> +(define_expand "len_mask_gather_load"

Why is DI gone? I saw that many other patterns have a similar issue.

> +  [(match_operand:VNX16_QHS 0 "register_operand")
> (match_operand 1 "pmode_reg_or_0_operand")
> -   (match_operand:VNX16_QHSDI 2 "register_operand")
> -   (match_operand 3 "")
> -   (match_operand 4 "")
> +   (match_operand:VNX16_QHSI 2 "register_operand")
> +   (match_operand 3 "")
> +   (match_operand 4 "")
> (match_operand 5 "autovec_length_operand")
> (match_operand 6 "const_0_operand")
> -   (match_operand: 7 "vector_mask_operand")]
> +   (match_operand: 7 "vector_mask_operand")]
>"TARGET_VECTOR"
>  {
>riscv_vector::expand_gather_scatter (operands, true);
>DONE;
>  })
>
> -(define_expand "len_mask_gather_load"
> -  [(match_operand:VNX32_QHS 0 "register_operand")
> +(define_expand "len_mask_gather_load"

Like this, SI is gone?

> @@ -2172,7 +2145,7 @@ preferred_simd_mode (scalar_mode mode)
>   riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE when 
> we
>   enable -march=rv64gc_zve32* and -march=rv32gc_zve64*. in the
>   'can_duplicate_and_interleave_p' of tree-vect-slp.cc. Since we have
> - VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
> + RVVM1SImode in -march=*zve32* and RVVM1DImode in -march=*zve64*, they 
> are

This comment might need to be reviewed.

>   enabled in targetm. vector_mode_supported_p and SLP vectorizer will try 
> to
>   use them. Currently, we can support auto-vectorization in
>   -march=rv32_zve32x_zvl128b. Wheras, -march=rv32_zve32x_zvl32b or
> @@ -2371,7 +2344,7 @@ autovectorize_vector_modes (vector_modes *modes, bool)
>poly_uint64 full_size
> = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
>
> -  /* Start with a VNxYYQImode where YY is the number of units that
> +  /* Start with a RVVM1QImode where YY is the number of units that
>  fit a whole vector.
>  Then try YY = nunits / 2, nunits / 4 and nunits / 8 which
>  is guided by the extensions we have available (vf2, vf4 and vf8).

We don't have YY now :P

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index f1f5a73389e..c47bcd2b412 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -972,8 +972,8 @@ riscv_valid_lo_sum_p (enum riscv_symbol_type sym_type, 
> machine_mode mode,
>  }
>
>  /* Return true if mode is the RVV enabled mode.
> -   For example: 'VNx1DI' mode is disabled if MIN_VLEN == 32.
> -   'VNx1SI' mode is enabled if MIN_VLEN == 32.  */
> +   For example: 'RVVM1DI' mode is disabled if MIN_VLEN == 32.
> +   'RVVM1SI' mode is enabled if MIN_VLEN == 32.  */

This comment needs to be updated :)

>
>  bool
>  riscv_v_ext_vector_mode_p (machine_mode mode)
> @@ -1023,11 +1023,36 @@ riscv_v_ext_mode_p (machine_mode mode)
>  poly_int64
>  riscv_v_adjust_nunits (machine_mode mode, int scale)
>  {
> +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
>if (riscv_v_ext_mode_p (mode))
> -return riscv_vector_chunks * scale;
> +{
> +  if (TARGET_MIN_VLEN == 32)
> +   scale = scale / 2;
> +  return riscv_vector_chunks * scale;
> +}
>return scale;
>  }

> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 19683152259..643e7ea7330 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1040,6 +1040,7 @@ extern unsigned riscv_stack_boundary;
>  extern unsigned riscv_bytes_per_vector_chunk;
>  extern poly_uint16 riscv_vector_chunks;
>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_nunits (machine_mode, bool, int, int);
>  extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>  extern poly_int64 riscv_v_adjust_bytesize (enum machine_mode, int);
>  /* The number of bits and bytes in a RVV vector.  */

> +  (cond [(eq_attr "mode" "RVVM8QI,RVVM1BI") (symbol_ref 
> "riscv_vector::get_vlmul(E_RVVM8QImode)")

This could just use a constant value rather than asking
riscv_vector::get_vlmul now :)

> +(eq_attr "mode" "RVVM8QI,RVVM1BI") (symbol_ref 
> "riscv_vector::get_ratio(E_RVVM8QImode)")

They are constant now too.


Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-19 Thread Maciej W. Rozycki
On Wed, 12 Jul 2023, Richard Biener wrote:

> > > That said, we should handle this better so can you file an
> > > enhancement bugreport for this?
> >
> >  Filed as PR -optimization/110630.
> 
> Thanks!

 Thanks for making this improvement.  I've checked MIPS results and code 
produced now is as follows:

daddiu  $sp,$sp,-64
sd      $5,24($sp)
sd      $7,40($sp)
ldc1    $f0,24($sp)
ldc1    $f1,40($sp)
sd      $4,16($sp)
sd      $6,32($sp)
ldc1    $f2,32($sp)
add.ps  $f1,$f0,$f1
ldc1    $f0,16($sp)
add.ps  $f0,$f0,$f2
sdc1    $f1,56($sp)
ld      $3,56($sp)
sdc1    $f0,48($sp)
ld      $2,48($sp)
jr      $31
daddiu  $sp,$sp,64

which does do vector stuff now, although it's still considerably worse 
than my handwritten example:

> > dmtc1   $4,$f0
> > dmtc1   $5,$f1
> > dmtc1   $6,$f2
> > dmtc1   $7,$f3
> > add.ps  $f0,$f0,$f1
> > add.ps  $f2,$f2,$f3
> > dmfc1   $2,$f0
> > jr  $31
> > dmfc1   $3,$f2

Or I'd say it's pretty terrible, but given the current situation with the 
MIPS backend I'm going to leave it to the new maintainer to sort out.

> >  Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the
> > testsuite?
> 
> Sure, more coverage is always  nice.

 Thanks, committed (with the `vect64' requirement removed, as we can take 
it for granted with `vect_float').

  Maciej


Re: [PATCH] Add __builtin_iseqsig()

2023-07-19 Thread FX Coudert via Gcc-patches
6 weeks later, I’d like to ask a global maintainer to review this.
The idea was okay’ed previously by Joseph Myers, but he asked for testing of 
both the quiet and signalling NaN cases, which is now done.

FX
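
As background on what "signaling equality" means here, the following is a rough emulation, not the builtin from the patch. The helper names `iseqsig_sketch` and `raises_invalid` are hypothetical. The sketch assumes the compiler implements the ordered relational operators as IEEE 754 signaling comparisons; whether FE_INVALID is actually observable also depends on the compiler honoring floating-point environment access, so treat the flag check as indicative only.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

// Hypothetical helper, not the builtin: equality built from two ordered
// comparisons. Under IEEE 754, <= and >= are expected to raise "invalid"
// for any NaN operand, whereas == stays quiet for quiet NaNs.
bool iseqsig_sketch(double x, double y) {
  volatile double a = x, b = y;  // discourage constant folding
  return (a <= b) && (a >= b);
}

// Reports whether FE_INVALID was set by the comparison above. The result
// for NaN operands is platform- and compiler-dependent.
bool raises_invalid(double x, double y) {
  std::feclearexcept(FE_INVALID);
  (void)iseqsig_sketch(x, y);
  return std::fetestexcept(FE_INVALID) != 0;
}
```

The boolean result matches ordinary equality for all operands; the difference from `==` lies only in the exception flag, which is why the review asked for FE_INVALID to be tested with both quiet and signaling NaNs.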


> Le 6 juin 2023 à 20:15, FX Coudert  a écrit :
> 
> Hi,
> 
> (It took me a while to get back to this.)
> 
> This is a new and improved version of the patch at 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602932.html
> It addresses the comment from Joseph that FE_INVALID should really be tested 
> in the case of both quiet and signaling NaNs, which is now done 
> systematically.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu
> OK to commit?
> 
> FX
> 


0001-Add-__builtin_iseqsig.patch
Description: Binary data




[PATCH 1/2][frontend] Add novector C++ pragma

2023-07-19 Thread Tamar Christina via Gcc-patches
Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C++
as gfortran does for FORTRAN and what ICC/ICX does for C++.
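
A minimal usage sketch, assuming the pragma is placed directly before the loop in the same way as the existing `#pragma GCC ivdep` and `#pragma GCC unroll`. The function `sum_scalar` is a made-up example, not one of the tests in the patch; compilers that do not know the pragma are expected to ignore it, possibly with an unknown-pragma warning.

```cpp
#include <cassert>
#include <cstddef>

// Sums n ints. The pragma asks the compiler not to vectorize this one loop;
// the result is the same with or without vectorization.
long sum_scalar(const int *data, std::size_t n) {
  assert(data != nullptr || n == 0);
  long total = 0;
#pragma GCC novector
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

This is the kind of per-loop control the testsuite needs: a checking loop can be kept scalar while the loop under test is still vectorized.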

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about 800 tests.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect.exp (support vect- prefix).
* g++.dg/vect/vect-novector-pragma.cc: New test.

--- inline copy of patch -- 
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 0e66ca70e00caa1dc4beada1024ace32954e2aaf..c13c8ea98a523c4ef1c55a11e02d5da9db7e367e 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -305,8 +305,8 @@ DEFTREECODE (IF_STMT, "if_stmt", tcc_statement, 4)
 
 /* Used to represent a range-based `for' statement. The operands are
RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE,
-   RANGE_FOR_UNROLL, and RANGE_FOR_INIT_STMT, respectively.  Only used in
-   templates.  */
+   RANGE_FOR_UNROLL, RANGE_FOR_NOVECTOR and RANGE_FOR_INIT_STMT,
+   respectively.  Only used in templates.  */
 DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 6)
 
 /* Used to represent an expression statement.  Use `EXPR_STMT_EXPR' to
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8398223311194837441107cb335d497ff5f5ec1c..bece7bff1f01a23cfc94386fd3295a0be8c462fe 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
 #define RANGE_FOR_UNROLL(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
 #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
 #define RANGE_FOR_IVDEP(NODE)  TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_5 (RANGE_FOR_STMT_CHECK (NODE))
 
 /* STMT_EXPR accessor.  */
 #define STMT_EXPR_STMT(NODE)   TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body  (tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
- unsigned short);
+ unsigned short, bool);
 extern void cp_convert_omp_range_for (tree &, vec *, tree &,
  tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause   (tree);
 extern void finish_else_clause (tree);
 extern void finish_if_stmt (tree);
 extern tree begin_while_stmt   (void);
-extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short,
+bool);
 extern void finish_while_stmt  (tree);
 extern tree begin_do_stmt  (void);
 extern void finish_do_body (tree);
-extern void finish_do_stmt (tree, tree, bool, unsigned short);
+extern void finish_do_stmt (tree, tree, bool, unsigned short,
+bool);
 extern tree finish_return_stmt (tree);
 extern tree begin_for_scope(tree *);
 extern tree begin_for_stmt (tree, tree);
 extern void finish_init_stmt   (tree);
-extern void finish_for_cond(tree, tree, bool, unsigned short);
+extern void finish_for_cond(tree, tree, bool, unsigned short,
+bool);
 extern void finish_for_expr(tree, tree);
 extern void finish_for_stmt(tree);
 extern tree begin_range_for_stmt 

[PATCH 2/2][frontend]: Add novector C pragma

2023-07-19 Thread Tamar Christina via Gcc-patches
Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C
as gfortran does for FORTRAN and what ICC/ICX does for C.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about 800 tests.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
  false, false);
 
+  if (!flag_preprocess_only)
+cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+ false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..4c64d898cddac437958ce20c5603b88a05a99093 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+ bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+   bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
  c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false, 0, if_p);
+ c_parser_while_statement (parser, false, 0, false, if_p);
  break;
case RID_DO:
- c_parser_do_statement (parser, false, 0);
+ c_parser_do_statement (parser, false, 0, false);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false, 0, if_p);
+ c_parser_for_statement (parser, false, 0, false, if_p);
  break;
case RID_GOTO:
  c_parser_consume_token (parser);
@@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 
 static void
 c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
- bool *if_p)
+ bool novector, bool *if_p)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
   build_int_cst (integer_type_node,
  annot_expr_unroll_kind),
   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+  build_int_cst (integer_type_node,
+ annot_expr_no_vector_kind),
+  integer_zero_node);
   save_in_stateme

Re: [PATCH] core: Support heap-based trampolines

2023-07-19 Thread Martin Uecker via Gcc-patches
Am Mittwoch, dem 19.07.2023 um 15:23 +0100 schrieb Iain Sandoe:
> Hi Martin,
> 
> > On 19 Jul 2023, at 11:43, Martin Uecker via Gcc-patches 
> >  wrote:
> > 
> > Am Mittwoch, dem 19.07.2023 um 10:29 +0100 schrieb Iain Sandoe:
> 
> > > > On 19 Jul 2023, at 10:04, Martin Uecker 
> > > > wrote:
> > > 
> > > > > > On 17 Jul 2023, 
> > > > > 
> > > > 
> > > > > > > You mention setjmp/longjmp - on darwin and other platforms
> > > > > requiring
> > > > > > > non-stack based trampolines
> > > > > > > does the system runtime provide means to deal with this issue
> > > > > > > like
> > > > > an
> > > > > > > alternate allocation method
> > > > > > > or a way to register cleanup?
> > > > > > 
> > > > > > There is an alternate mechanism relying on system libraries
> > > > > > that is
> > > > > possible on darwin specifically (I don’t know for other targets)
> > > > > but
> > > > > it will only work for signed binaries, and would require us to
> > > > > codesign everything produced by gcc. During development, it was
> > > > > deemed too big an ask and the current strategy was chosen (Iain
> > > > > can
> > > > > surely add more background on that if needed).
> > > > > 
> > > > > I do not think that this solves the setjump/longjump issue -
> > > > > since
> > > > > there’s still a notional allocation that takes place (it’s just
> > > > > that
> > > > > the mechanism for determining permissions is different).
> > > > > 
> > > > > It is also a big barrier for the general user - and prevents
> > > > > normal
> > > > > folks from distributing GCC - since codesigning requires an
> > > > > external
> > > > > certificate (i.e. I would really rather avoid it).
> > > > > 
> > > > > > > Was there ever an attempt to provide a "generic" trampoline
> > > > > > > driven
> > > > > by
> > > > > > > a more complex descriptor?
> 
> > > > > > My own opinion is that executable stack should go away on all
> > > > > targets at some point, so a truly generic solution to the problem
> > > > > would be great.
> > > > > 
> > > > > indeed it would.
> > > 
> > > > I think we need a solution rather sooner than later on all archs.
> > > 
> > > AFAICS the  heap-based trampolines can work for any arch**, this
> > > issue is about
> > > system security policy, rather than arch, specifically?
> > > 
> > > It seems to me that for any system security policy that permits JIT,
> > > (but not
> > > executable stack) the heap-based trampolines are viable.
> > 
> > I agree. 
> > 
> > BTW; One option we discussed before, was to map a page with 
> > pre-allocated trampolines, which look up the address of
> > a callee and the static chain in a table based on its own
> > address. Then no code generation is involved.
> 
> That reads similar to the scheme Apple have implemented for libobjc and 
> libffi.
> In order to be extensible (i.e to allow the table to grow at runtime), it 
> means
> having some loadable executable object; if that is implemented in a way shared
> between users (delivered as part of the implementation) then, for Darwin at
> least, it must be codesigned - which is somewhere I really want to avoid going
> with GCC.  
> 
> > The difficult part is avoiding leaks with longjmp / setjmp.
> > One idea was to have a shadow stack consisting of the
> > pre-allocated trampolines, but this probably causes other
> > issues...
> 
> With a per-thread table, I *think* for most targets, we discussed in the team
> maintaining a ’tide mark’ of the stack as part of the saved data in the
> trampoline (not used as part of the execution, but only as part of the
> allocation management)… but ..
> 
> > I wonder how difficult it is to have longjmp / setjmp walk 
> > the stack in C?   This would also be useful for C++
> > interoperability and to free  heap-allocated VLAs.
> 
> … this would be a better solution (as we can see trampolines are a small
> leak c.f. the general uses)?
> 
> > As a user of nested functions, from my side it would also 
> > ok to simply add a wide function pointer type that contains
> > address + static chain.  This would require changing code, 
> > but would also work with Clang's blocks and solve other 
> > language interoperability problems, while avoiding all 
> > existing ABI issues.
> 
> How does that work when passing a callback to libc (e.g. qsort?)

This would not work because it would be an ABI change, but because
it solves a general problem and would plug a major language
interoperability issue between C and all languages that have
callable objects or nested functions, I think one could make a
good case that such an extension goes into the C standard together
with a set of enhanced interfaces.

I have an initial proposal here:

http://www2.open-std.org/JTC1/SC22/WG14/www/docs/n2787.pdf
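
To make the "wide pointer plus enhanced interface" idea concrete, here is a small sketch. It is not taken from the linked proposal: `WideCompare`, `sort_wide`, `SortState` and `compare_with_state` are hypothetical names, and the sort is a plain insertion sort standing in for an enhanced qsort-like interface that accepts the wide pointer directly.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical "wide" function pointer: code address plus static chain.
struct WideCompare {
  int (*fn)(void *chain, const int *a, const int *b);
  void *chain;
  int operator()(const int *a, const int *b) const { return fn(chain, a, b); }
};

// Hypothetical enhanced interface: an insertion sort that takes the wide
// pointer, so no trampoline is needed to carry the context.
void sort_wide(int *data, std::size_t n, WideCompare cmp) {
  assert(cmp.fn != nullptr);
  for (std::size_t i = 1; i < n; ++i)
    for (std::size_t j = i; j > 0 && cmp(&data[j - 1], &data[j]) > 0; --j)
      std::swap(data[j - 1], data[j]);
}

// Stand-in for a nested function: the chain points at the enclosing state.
struct SortState {
  bool descending;
  std::size_t calls;
};

int compare_with_state(void *chain, const int *a, const int *b) {
  SortState *state = static_cast<SortState *>(chain);
  ++state->calls;
  int order = (*a > *b) - (*a < *b);
  return state->descending ? -order : order;
}
```

Because the context travels next to the code address, nothing has to be allocated or made executable, but every interface that takes a callback needs a wide-pointer variant, which is the ABI cost mentioned above.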

One could combine this with other solutions where a user
created trampoline with explicit allocation and deallocation
calls such a wide pointer, i.e. one has a library function
that takes a wide pointer and returns regular function pointer
that points to an

[committed] libstdc++: Fix locale-specific duration formatting [PR110719]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The r14-2640-gf4bce119f617dc commit only removed fractional seconds for
time points, but it needs to be done for durations and hh_mm_ss types
too.
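
The floor-to-seconds step can be sketched outside the library as follows. This is a simplified, hypothetical free function (`floor_to_seconds`) covering durations only; the real `_S_floor_seconds` is an internal member that also handles time points and hh_mm_ss.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical free-function version of the idea for plain durations.
template <typename Rep, typename Period>
constexpr auto floor_to_seconds(std::chrono::duration<Rep, Period> d) {
  if constexpr (Period::den != 1)
    // Sub-second precision: round toward negative infinity.
    return std::chrono::floor<std::chrono::seconds>(d);
  else
    // Whole seconds or coarser: nothing fractional to drop.
    return d;
}
```

With this, 123456ms becomes 123s before the locale-specific conversion specifiers see it, which is why the new tests expect "00:02:03" with no fractional part.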

libstdc++-v3/ChangeLog:

PR libstdc++/110719
* include/bits/chrono_io.h (__formatter_chrono::_S_floor_seconds):
Handle duration and hh_mm_ss.
* testsuite/20_util/duration/io.cc: Check locale-specific
formats.
* testsuite/std/time/hh_mm_ss/io.cc: Likewise.
---
 libstdc++-v3/include/bits/chrono_io.h | 22 -
 libstdc++-v3/testsuite/20_util/duration/io.cc |  4 
 .../testsuite/std/time/hh_mm_ss/io.cc | 24 +--
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index 43eeab42869..0c5f9f5058b 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -1144,11 +1144,11 @@ namespace __format
 
  __out = __format::__write(std::move(__out),
_S_two_digits(__hms.seconds().count()));
- using rep = typename decltype(__hms)::precision::rep;
  if constexpr (__hms.fractional_width != 0)
{
  locale __loc = _M_locale(__ctx);
  auto __ss = __hms.subseconds();
+ using rep = typename decltype(__ss)::rep;
  if constexpr (is_floating_point_v)
{
  __out = std::format_to(__loc, std::move(__out),
@@ -1546,11 +1546,21 @@ namespace __format
_S_floor_seconds(const _Tp& __t)
{
  using chrono::__detail::__local_time_fmt;
- if constexpr (chrono::__is_time_point_v<_Tp>)
-   if constexpr (_Tp::period::den != 1)
- return chrono::floor(__t);
-   else
- return __t;
+ if constexpr (chrono::__is_time_point_v<_Tp>
+ || chrono::__is_duration_v<_Tp>)
+   {
+ if constexpr (_Tp::period::den != 1)
+   return chrono::floor(__t);
+ else
+   return __t;
+   }
+ else if constexpr (__is_specialization_of<_Tp, chrono::hh_mm_ss>)
+   {
+ if constexpr (_Tp::fractional_width != 0)
+   return chrono::floor(__t.to_duration());
+ else
+   return __t;
+   }
  else if constexpr (__is_specialization_of<_Tp, __local_time_fmt>)
return _S_floor_seconds(__t._M_time);
  else
diff --git a/libstdc++-v3/testsuite/20_util/duration/io.cc b/libstdc++-v3/testsuite/20_util/duration/io.cc
index 27586b54392..ea94b062d96 100644
--- a/libstdc++-v3/testsuite/20_util/duration/io.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/io.cc
@@ -71,6 +71,10 @@ test_format()
   s = std::format("{:%t} {:%t%M}", -2h, -123s);
   VERIFY( s == "\t \t-02" );
 
+  // Locale-specific formats:
+  s = std::format(std::locale::classic(), "{:%r %OH:%OM:%OS}", 123456ms);
+  VERIFY( s == "12:02:03 AM 00:02:03" );
+
   std::string_view specs = "aAbBcCdDeFgGhHIjmMpqQrRSTuUVwWxXyYzZ";
   std::string_view my_specs = "HIjMpqQrRSTX";
   for (char c : specs)
diff --git a/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc b/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc
index 3b50f40c1f6..072234328c7 100644
--- a/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc
+++ b/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc
@@ -6,7 +6,7 @@
 #include 
 
 void
-test01()
+test_ostream()
 {
   using std::ostringstream;
   using std::chrono::hh_mm_ss;
@@ -40,7 +40,27 @@ test01()
   VERIFY( out.str() == "18:15:45" );
 }
 
+void
+test_format()
+{
+  using namespace std::chrono;
+
+  auto s = std::format("{}", hh_mm_ss{1h + 23min + 45s});
+  VERIFY( s == "01:23:45" );
+
+  auto ws = std::format(L"{}", hh_mm_ss{1h + 23min + 45s});
+  VERIFY( ws == L"01:23:45" );
+
+  // Locale-specific formats:
+  auto loc = std::locale::classic();
+  s = std::format(loc, "{:%r %OH:%OM:%OS}", hh_mm_ss{123456ms});
+  VERIFY( s == "12:02:03 AM 00:02:03" );
+  ws = std::format(loc, L"{:%r %OH:%OM:%OS}", hh_mm_ss{123456ms});
+  VERIFY( ws == L"12:02:03 AM 00:02:03" );
+}
+
 int main()
 {
-  test01();
+  test_ostream();
+  test_format();
 }
-- 
2.41.0



[committed] libstdc++: Fix formatting of negative chrono::hh_mm_ss

2023-07-19 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

When formatting with an empty chrono spec ("{}") two minus signs were
being added to hh_mm_ss values. This is because the __is_neg flag was
checked to add one explicitly, and then the ostream operator added
another one.

We should only check the __is_neg flag for durations, because those are
the only types which are modified to be non-negative before calling
_M_format. We don't change hh_mm_ss values to be non-negative, because that
would require performing arithmetic on the hh_mm_ss members to sum them,
and then again to construct a new hh_mm_ss object with the positive
value.  Instead, we can just be careful about using the __is_neg flag
correctly.

To fix the bug, _M_format_to_ostream no longer checks the __is_neg flag
for non-durations, and _M_format doesn't set it for hh_mm_ss until after
the call to _M_format_to_ostream. We can also avoid setting it for types
that it doesn't apply to, by making the __print_sign lambda only inspect
it for duration and hh_mm_ss types.
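
The "exactly one place writes the sign" rule can be illustrated with a standalone helper. `format_hms` is a hypothetical function written for this note, not libstdc++ code; it takes whole seconds and uses `snprintf` rather than the library's formatting machinery.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: format a duration as [-]HH:MM:SS. The value is made
// non-negative first and the sign is remembered in a flag, so the minus
// sign must be written by exactly one piece of code.
std::string format_hms(std::chrono::seconds d) {
  assert(d != std::chrono::seconds::min());  // negation would overflow
  bool is_neg = d < std::chrono::seconds::zero();
  if (is_neg)
    d = -d;

  long long total = static_cast<long long>(d.count());
  char buf[32];
  std::snprintf(buf, sizeof buf, "%02lld:%02lld:%02lld",
                total / 3600, (total / 60) % 60, total % 60);

  std::string out;
  if (is_neg)
    out += '-';  // the only place the sign is emitted
  out += buf;
  return out;
}
```

If a second layer also consulted `is_neg` after the value had been printed with its own sign, the output would be "--00:42:00", which is the bug described above.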

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format):
Do not set __is_neg for hh_mm_ss before calling
_M_format_to_ostream. Change __print_sign lambda to only check
__is_neg for durations and hh_mm_ss types.
(__formatter_chrono::_M_format_to_ostream): Only check __is_neg
for duration types.
* testsuite/std/time/hh_mm_ss/io.cc: Check negative values.
---
 libstdc++-v3/include/bits/chrono_io.h | 27 ++-
 .../testsuite/std/time/hh_mm_ss/io.cc |  4 +++
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index 0c5f9f5058b..c95301361d8 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -486,11 +486,6 @@ namespace __format
_M_format(const _Tp& __t, _FormatContext& __fc,
  bool __is_neg = false) const
{
- if constexpr (__is_specialization_of<_Tp, chrono::hh_mm_ss>)
-   __is_neg = __t.is_negative();
- else if constexpr (!chrono::__is_duration_v<_Tp>)
-   __is_neg = false;
-
  auto __first = _M_spec._M_chrono_specs.begin();
  const auto __last = _M_spec._M_chrono_specs.end();
  if (__first == __last)
@@ -513,12 +508,19 @@ namespace __format
  else
__out = __sink.out();
 
+ // formatter passes the correct value of __is_neg
+ // for durations but for hh_mm_ss we decide it here.
+ if constexpr (__is_specialization_of<_Tp, chrono::hh_mm_ss>)
+   __is_neg = __t.is_negative();
+
  auto __print_sign = [&__is_neg, &__out] {
-   if (__is_neg)
- {
-   *__out++ = _S_plus_minus[1];
-   __is_neg = false;
- }
+   if constexpr (chrono::__is_duration_v<_Tp>
+   || __is_specialization_of<_Tp, chrono::hh_mm_ss>)
+ if (__is_neg)
+   {
+ *__out++ = _S_plus_minus[1];
+ __is_neg = false;
+   }
return std::move(__out);
  };
 
@@ -708,8 +710,9 @@ namespace __format
__os << __t._M_date << ' ' << __t._M_time;
  else
{
- if (__is_neg) [[unlikely]]
-   __os << _S_plus_minus[1];
+ if constexpr (chrono::__is_duration_v<_Tp>)
+   if (__is_neg) [[unlikely]]
+ __os << _S_plus_minus[1];
  __os << __t;
}
 
diff --git a/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc 
b/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc
index 072234328c7..ddb1ad77d1e 100644
--- a/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc
+++ b/libstdc++-v3/testsuite/std/time/hh_mm_ss/io.cc
@@ -47,9 +47,13 @@ test_format()
 
   auto s = std::format("{}", hh_mm_ss{1h + 23min + 45s});
   VERIFY( s == "01:23:45" );
+  s = std::format("{}", hh_mm_ss{-42min});
+  VERIFY( s == "-00:42:00" );
 
   auto ws = std::format(L"{}", hh_mm_ss{1h + 23min + 45s});
   VERIFY( ws == L"01:23:45" );
+  ws = std::format(L"{}", hh_mm_ss{-42min});
+  VERIFY( ws == L"-00:42:00" );
 
   // Locale-specific formats:
   auto loc = std::locale::classic();
-- 
2.41.0



Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-19 Thread Qing Zhao via Gcc-patches
More thoughts on the following example Kees provided: 

> On Jul 17, 2023, at 7:40 PM, Kees Cook  wrote:
>> 
>> The counted_by attribute is used to annotate a Flexible array member on how 
>> many elements it will have.
>> However, if this information can not accurately reflect the real number of 
>> elements for the array allocated, 
>> What’s the purpose of such information? 
> 
> For example, imagine code that allocates space for 100 elements since
> the common case is that the number of elements will grow over time.
> Elements are added as it goes. For example:
> 
> struct grows {
>   int alloc_count;
>   int valid_count;
>   struct element item[] __counted_by(valid_count);
> } *p;
> 
> void something(void)
> {
>   p = malloc(sizeof(*p) + sizeof(*p->item) * 100);
>   p->alloc_count = 100;
>   p->valid_count = 0;
> 
>   /* this loop doesn't check that we don't go over 100. */
>   while (items_to_copy) {
>   struct element *item_ptr = get_next_item();
>   /* __counted_by stays in sync: */
>   p->valid_count++;
>   p->item[p->valid_count - 1] = *item_ptr;
>   }
> }
> 
> We would want to catch cases where p->item[] is accessed with an index
> that is >= p->valid_count, even though the allocation is (currently)
> larger.
> 
> However, if we ever reached valid_count >= alloc_count, we need to trap
> too, since we can still "see" the true allocation size.
> 
> Now, the __alloc_size hint is visible in very few places, so if there is
> a strong reason to do so, I can live with saying that __counted_by takes
> full precedence over __alloc_size. It seems it should be possible to
> compare when both are present, but I can live with __counted_by being
> the universal truth. :)

In the above use case (I am not sure how common such a use case is), the major
questions are:

For one object with a flexible array member:

1. Shall we allow the allocated size of the object and the number of
elements of the contained FAM to be mismatched?

If the answer to 1 is YES (to support such use cases), then

2. If there is a mismatch between these two, should the number of elements
impact the allocated size of the object (__builtin_object_size())?

From the doc of object size checking: 
(https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html)

=
Built-in Function: size_t __builtin_object_size (const void * ptr, int type)
is a built-in construct that returns a constant number of bytes from ptr to the 
end of the object ptr pointer points to (if known at compile time). To 
determine the sizes of dynamically allocated objects the function relies on the 
allocation functions called to obtain the storage to be declared with the 
alloc_size attribute (see Common Function Attributes). __builtin_object_size 
never evaluates its arguments for side effects. If there are any side effects 
in them, it returns (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3. 
If there are multiple objects ptr can point to and all of them are known at 
compile time, the returned number is the maximum of remaining byte counts in 
those objects if type & 2 is 0 and minimum if nonzero. If it is not possible to 
determine which objects ptr points to at compile time, __builtin_object_size 
should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3.

=

Based on the current documentation for __bos, I think that the answer should be
NO, i.e., we should not use the counted_by info to change the REAL allocated
size for the object.


3. Then, as pointed out also by Martin, only the bounds checks (including
-Warray-bounds or -fsanitize=bounds) should be impacted by the counted_by
information, since these checks are based on the TYPE system, and “counted_by”
info should be treated as a complement to the TYPE system.
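
The distinction argued for in points 2 and 3 can be sketched as two
separate queries on the same object. This is only an illustration of the
position above, not a description of what GCC implements: grows_alloc,
items, allocated_bytes and checked_item are hypothetical helpers, the
element type is simplified to int, and the trailing storage is reached
through a helper rather than a real flexible array member so that the
sketch stays plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model of the "struct grows" example from the thread.
struct grows
{
  int alloc_count;  // number of elements that were allocated
  int valid_count;  // number of valid elements (the "counted_by" value)
};

// Trailing element storage that follows the header in the allocation.
int *items(grows *p)
{
  return reinterpret_cast<int *>(p + 1);
}

grows *grows_alloc(int n)
{
  assert(n >= 0);
  void *raw = std::malloc(sizeof(grows) + sizeof(int) * n);
  if (!raw)
    return nullptr;
  grows *p = static_cast<grows *>(raw);
  p->alloc_count = n;
  p->valid_count = 0;
  return p;
}

// Allocation-based view (in the spirit of __builtin_object_size): bytes
// of element storage actually allocated, independent of valid_count.
std::size_t allocated_bytes(const grows *p)
{
  return sizeof(int) * static_cast<std::size_t>(p->alloc_count);
}

// Type-based view (in the spirit of a counted_by bounds check): only
// indices below valid_count are accepted, even though more storage exists.
int *checked_item(grows *p, int idx)
{
  if (idx < 0 || idx >= p->valid_count || p->valid_count > p->alloc_count)
    return nullptr;  // a sanitizer would trap here instead
  return items(p) + idx;
}
```

With 100 elements allocated and valid_count set to 1, checked_item rejects
index 1 while allocated_bytes still reports storage for 100 elements, so
the two answers are allowed to differ.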

Let me know your opinions.

Qing

[GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-19 Thread Andrew Carlotti via Gcc-patches
Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
OK to backport to GCC 13?


Many intrinsics currently depend on both an architecture version and a
feature, despite the corresponding instructions being available within
GCC at lower architecture versions.

LLVM has already removed these explicit architecture version
dependences; this patch does the same for GCC. Note that +fp16 does not
imply +simd, so we need to add an explicit +simd for the Neon fp16
intrinsics.

Binutils did not previously support all of these architecture+feature
combinations, but this problem is already reachable from GCC.  For
example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
GCC 10.  This is fixed in Binutils 2.41.

This patch retains explicit architecture version dependencies for
features that do not currently have a separate feature flag.
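
As an illustration of the intended effect, here is a sketch of user code
that relies only on the feature flag. It assumes a compiler that contains
this change and a command line such as -march=armv8-a+dotprod; the function
name dot8 and the portable fallback are made up for the example and are not
part of the patch.

```cpp
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

// Sum of a[i] * b[i] over eight unsigned bytes.
std::uint32_t dot8(const std::uint8_t *a, const std::uint8_t *b)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
  // Each 32-bit lane accumulates four byte products; then add both lanes.
  uint32x2_t acc = vdot_u32(vdup_n_u32(0), vld1_u8(a), vld1_u8(b));
  return vaddv_u32(acc);
#else
  // Portable fallback when the dot-product feature is not enabled.
  std::uint32_t sum = 0;
  for (int i = 0; i < 8; ++i)
    sum += static_cast<std::uint32_t>(a[i]) * b[i];
  return sum;
#endif
}
```

The preprocessor guard keeps the sketch buildable on other targets, where
only the scalar loop is compiled.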

gcc/ChangeLog:

 * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
 dependency.
 * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
 dependencies from target pragmas.
 * config/aarch64/arm_fp16.h (target): Likewise.
 * config/aarch64/arm_neon.h (target): Likewise.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/feature-bf16-backport.c: New test.
 * gcc.target/aarch64/feature-dotprod-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
 * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
 * gcc.target/aarch64/feature-i8mm-backport.c: New test.
 * gcc.target/aarch64/feature-memtag-backport.c: New test.
 * gcc.target/aarch64/feature-sha3-backport.c: New test.
 * gcc.target/aarch64/feature-sm4-backport.c: New test.

---

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
a01f1ee99d85917941ffba55bc3b4dcac87b41f6..2b0fc97bb71e9d560ae26035c7d7142682e46c38
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
 #define TARGET_RNG (AARCH64_ISA_RNG)
 
 /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
-#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
+#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
 
 /* I8MM instructions are enabled through +i8mm.  */
 #define TARGET_I8MM (AARCH64_ISA_I8MM)
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 
3b6b63e6805432b5f1686745f987c52d2967c7c1..7599a32301dadf80760d3cb40a8685d2e6a476fb
 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -292,7 +292,7 @@ __rndrrs (uint64_t *__res)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.5-a+memtag")
+#pragma GCC target ("+nothing+memtag")
 
 #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
   __builtin_aarch64_memtag_irg(__ptr, __u64_mask)
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 
350f8cc33d99e16137e9d70fa7958b10924dc67f..c10f9dcf7e097ded1740955addcd73348649dc56
 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -30,7 +30,7 @@
 #include 
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+fp16")
 
 typedef __fp16 float16_t;
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
0ace1eeddb97443433c091d2363403fcf2907654..349f3167699447eb397af482eaeadf8a07617025
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -25590,7 +25590,7 @@ __INTERLEAVE_LIST (zip)
 #include "arm_fp16.h"
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+simd+fp16")
 
 /* ARMv8.2-A FP16 one operand vector intrinsics.  */
 
@@ -26753,7 +26753,7 @@ vminnmvq_f16 (float16x8_t __a)
 /* AdvSIMD Dot Product intrinsics.  */
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+dotprod")
+#pragma GCC target ("+nothing+dotprod")
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26844,7 +26844,7 @@ vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, 
int8x16_t __b, const int __index)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sm4")
+#pragma GCC target ("+nothing+sm4")
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26911,7 +26911,7 @@ vsm4ekeyq_u32 (uint32x4_t __a, uint32x4_t __b)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sha3")
+#pragma GCC target ("+nothing+sha3")
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -27547,7 +27547,7 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t 
__a, float32x4_t __b,
 #pragma GCC pop_options
 
 #pragma GCC push_opti

[PATCH V2] rs6000: Don't allow AltiVec address in movoo & movxo pattern [PR110411]

2023-07-19 Thread jeevitha via Gcc-patches
Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects. The solution is to modify the constraints used in the movoo and
movxo pattern to disallow these types of addresses, which assists LRA in
resolving this issue. Furthermore, the mode size 16 check has been
removed from vsx_quad_dform_memory_operand so that it also accepts OOmode;
quad_address_p already handles modes smaller than 16 bytes.
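
For readers unfamiliar with the term, the address masking mentioned above
can be sketched in a few lines. The helper name altivec_effective_address
is hypothetical and only models the arithmetic; it is the same operation
the new testcases apply with "& ~0xFUL".

```cpp
#include <cstdint>

// AltiVec-style effective address: the low four bits are ignored, so the
// address is rounded down to a 16-byte boundary.
constexpr std::uintptr_t altivec_effective_address(std::uintptr_t addr)
{
  return addr & ~static_cast<std::uintptr_t>(0xF);
}

static_assert(altivec_effective_address(0x1237) == 0x1230,
              "low four bits are cleared");
static_assert(altivec_effective_address(0x1230) == 0x1230,
              "an aligned address is unchanged");
```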

2023-07-19  Jeevitha Palanisamy  

gcc/
PR target/110411
* config/rs6000/mma.md (define_insn_and_split movoo): Disallow
AltiVec address in movoo and movxo pattern.
(define_insn_and_split movxo): Likewise.
* config/rs6000/predicates.md (vsx_quad_dform_memory_operand): Remove
redundant mode size check.

gcc/testsuite/
PR target/110411
* gcc.target/powerpc/pr110411-1.c: New testcase.
* gcc.target/powerpc/pr110411-2.c: New testcase.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index d36dc13872b..575751d477e 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -293,8 +293,8 @@
 })
 
 (define_insn_and_split "*movoo"
-  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,m,wa")
-   (match_operand:OO 1 "input_operand" "m,wa,wa"))]
+  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
+   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
   "TARGET_MMA
&& (gpc_reg_operand (operands[0], OOmode)
|| gpc_reg_operand (operands[1], OOmode))"
@@ -340,8 +340,8 @@
 })
 
 (define_insn_and_split "*movxo"
-  [(set (match_operand:XO 0 "nonimmediate_operand" "=d,m,d")
-   (match_operand:XO 1 "input_operand" "m,d,d"))]
+  [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d")
+   (match_operand:XO 1 "input_operand" "ZwO,d,d"))]
   "TARGET_MMA
&& (gpc_reg_operand (operands[0], XOmode)
|| gpc_reg_operand (operands[1], XOmode))"
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 3552d908e9d..925f69cd3fc 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -924,7 +924,7 @@
 (define_predicate "vsx_quad_dform_memory_operand"
   (match_code "mem")
 {
-  if (!TARGET_P9_VECTOR || GET_MODE_SIZE (mode) != 16)
+  if (!TARGET_P9_VECTOR)
 return false;
 
   return quad_address_p (XEXP (op, 0), mode, false);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110411-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr110411-1.c
new file mode 100644
index 000..f42e9388d65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110411-1.c
@@ -0,0 +1,22 @@
+/* PR target/110411 */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mblock-ops-vector-pair" } */
+
+/* Verify we do not ICE on the following.  */
+
+#include 
+
+struct s {
+  long a;
+  long b;
+  long c;
+  long d: 1;
+};
+unsigned long ptr;
+
+void
+bug (struct s *dst)
+{
+  struct s *src = (struct s *)(ptr & ~0xFUL);
+  memcpy (dst, src, sizeof(struct s));
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110411-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr110411-2.c
new file mode 100644
index 000..c2046fb9855
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110411-2.c
@@ -0,0 +1,12 @@
+/* PR target/110411 */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* Verify we do not ICE on the following.  */
+
+void
+bug (__vector_quad *dst)
+{
+  dst = (__vector_quad *)((unsigned long)dst & ~0xFUL);
+  __builtin_mma_xxsetaccz (dst);
+}





Re: [PATCH 0/8] Tweak predicate macros in tree

2023-07-19 Thread Ken Matsui via Gcc-patches
On Wed, Jul 19, 2023 at 12:08 AM Richard Biener
 wrote:
>
> On Wed, Jul 19, 2023 at 1:34 AM Ken Matsui via Gcc-patches
>  wrote:
> >
> > This patch series tweaks predicate macros in tree.h to make the code more
> > readable. TYPE_REF_P is moved to tree.h and used for INDIRECT_TYPE_P and
> > TYPE_REF_IS_LVALUE. TYPE_PTR_P is also moved to tree.h and used for
> > INDIRECT_TYPE_P. POINTER_TYPE_P in tree.h is replaced with INDIRECT_TYPE_P
> > since it is ambiguous. TYPE_REF_IS_LVALUE is defined in tree.h through
> > TYPE_REF_P and TYPE_REF_IS_RVALUE. The same behavior codes with those
> > predicate macros are replaced for clarity.
> >
> > These works were all the way up to implementing __is_lvalue_reference
> > built-in trait and optimizing the is_lvalue_reference trait. However, those
> > changes were dropped since I did not observe any performance improvements.
> > For those who are interested in the benchmark results, they can be found
> > below:
> >
> > 1. is_lvalue_reference
> >
> > https://github.com/ken-matsui/gcc-benches/blob/main/is_lvalue_reference.md#tue-jul-18-033708-pm-pdt-2023
> >
> > Time: +1.35432%
> > Peak Memory Usage: -0.103283%
> > Total Memory Usage: No difference
> >
> > 2. is_lvalue_reference_v
> >
> > https://github.com/ken-matsui/gcc-benches/blob/main/is_lvalue_reference_v.md#tue-jul-18-034236-pm-pdt-2023
> >
> > Time: No difference
> > Peak Memory Usage: -0.426872%
> > Total Memory Usage: -0.677638%
> >
> > Ken Matsui (8):
> >   c++, tree: Move TYPE_REF_P to tree.h
> >   gcc: Use TYPE_REF_P
> >   c++, tree: Move TYPE_PTR_P to tree.h
> >   c++, tree: Move INDIRECT_TYPE_P to tree.h
> >   gcc: Use INDIRECT_TYPE_P instead of POINTER_TYPE_P
>
> No, please not.  Definitely not.  The tree code of POINTER_TYPE_P is
> POINTER_TYPE so the predicate name is exactly correct.
> REFERENCE_TYPE_P would be the canonical predicate for REFERENCE_TYPE,
> not TYPE_REF_P.
>
> I don't think the C++ frontend should be the one to decide about middle-end
> tree predicate macros.
>

Hi Richard,

Thank you for your review! This is because of the comment, which
states that POINTER_TYPE_P should be renamed to INDIRECT_TYPE_P.

  /* Nonzero if TYPE represents a pointer or reference type.
 (It should be renamed to INDIRECT_TYPE_P.)  Keep these checks in
 ascending code order.  */

  #define POINTER_TYPE_P(TYPE) \
(TREE_CODE (TYPE) == POINTER_TYPE || TYPE_REF_P (TYPE))

Also, INDIRECT_TYPE_P is equivalent to POINTER_TYPE_P.

  #define INDIRECT_TYPE_P(NODE)  \
(TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))

  #define TYPE_PTR_P(NODE)   \
(TREE_CODE (NODE) == POINTER_TYPE)

IMHO, POINTER_TYPE_P and TYPE_PTR_P are confusing since POINTER_TYPE_P
involves TYPE_REF_P but TYPE_PTR_P does not. If this renaming is still
incorrect, I think we should at least remove the comment in
POINTER_TYPE_P. Could you please confirm whether or not this patch is
correct? Thank you!

Sincerely,
Ken
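
To make the naming concern concrete, here is a small standalone model of
the predicates quoted above. The enum, the tree_node struct and the tree
typedef are mocks invented for this sketch and are not the real GCC
definitions; only the macro bodies follow the text of this message.

```cpp
#include <cassert>

// Mock of the tree machinery, for illustration only.
enum tree_code { INTEGER_TYPE, POINTER_TYPE, REFERENCE_TYPE };
struct tree_node { tree_code code; };
typedef const tree_node *tree;

#define TREE_CODE(NODE) ((NODE)->code)
#define TYPE_PTR_P(NODE) (TREE_CODE (NODE) == POINTER_TYPE)
#define TYPE_REF_P(NODE) (TREE_CODE (NODE) == REFERENCE_TYPE)
#define POINTER_TYPE_P(TYPE) \
  (TREE_CODE (TYPE) == POINTER_TYPE || TYPE_REF_P (TYPE))
#define INDIRECT_TYPE_P(NODE) \
  (TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))

// A reference type satisfies POINTER_TYPE_P but not TYPE_PTR_P, which is
// the asymmetry discussed above.
void check_predicates()
{
  const tree_node ref = { REFERENCE_TYPE };
  tree t = &ref;
  assert(POINTER_TYPE_P (t));
  assert(!TYPE_PTR_P (t));
  assert(INDIRECT_TYPE_P (t));
}
```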

> >   tree: Remove POINTER_TYPE_P
> >   tree: Define TYPE_REF_IS_LVALUE
> >   c++, lto: Use TYPE_REF_IS_LVALUE
> >
> >  gcc/ada/gcc-interface/ada-tree.h   |   2 +-
> >  gcc/ada/gcc-interface/decl.cc  |   6 +-
> >  gcc/ada/gcc-interface/trans.cc |  16 +--
> >  gcc/ada/gcc-interface/utils.cc |  12 +-
> >  gcc/ada/gcc-interface/utils2.cc|  14 +-
> >  gcc/alias.cc   |  12 +-
> >  gcc/analyzer/analyzer.cc   |   4 +-
> >  gcc/analyzer/call-details.h|   2 +-
> >  gcc/analyzer/call-summary.cc   |   2 +-
> >  gcc/analyzer/checker-event.cc  |   4 +-
> >  gcc/analyzer/constraint-manager.cc |   2 +-
> >  gcc/analyzer/engine.cc |   4 +-
> >  gcc/analyzer/program-state.cc  |   2 +-
> >  gcc/analyzer/region-model-manager.cc   |   6 +-
> >  gcc/analyzer/region-model.cc   |   6 +-
> >  gcc/analyzer/sm.cc |   4 +-
> >  gcc/analyzer/svalue.cc |   2 +-
> >  gcc/analyzer/varargs.cc|   2 +-
> >  gcc/asan.cc|   4 +-
> >  gcc/builtins.cc|  24 ++--
> >  gcc/c-family/c-ada-spec.cc |   2 +-
> >  gcc/c-family/c-attribs.cc  |  32 ++---
> >  gcc/c-family/c-common.cc   |  41 +++---
> >  gcc/c-family/c-omp.cc  |   8 +-
> >  gcc/c-family/c-pretty-print.cc |   4 +-
> >  gcc/c-family/c-ubsan.cc|  10 +-
> >  gcc/c-family/c-warn.cc |  34 ++---
> >  gcc/c/c-decl.cc|   8 +-
> >  gcc/c/c-parser.cc  |   4 +-
> >  gcc/c/c-typeck.cc  |  40 +++---
> >  gcc/c/gimple-parser.cc |   8 +-
> >  gcc/calls.cc   |   2 +-
> >  gcc/cfgexpand.cc   |   6 +-
> >  gcc/cgraph.cc

[pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-19 Thread Vladimir Makarov via Gcc-patches

The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


There is still an avr porting problem with reloading of a subreg of the frame
pointer.  I'll address it later this week.


commit 2971ff7b1d564ac04b537d907c70e6093af70832
Author: Vladimir N. Makarov 
Date:   Wed Jul 19 09:35:37 2023 -0400

[LRA]: Check and update frame to stack pointer elimination after stack slot 
allocation

Avr is an interesting target which does not use the stack pointer to
address stack slots.  The elimination of the frame pointer to the stack
pointer is impossible if there are stack slots.  While LRA works, stack
slots can be allocated and used, and then the elimination cannot be done
anymore.  The situation can be complicated even more if some pseudos
were allocated to the frame pointer.

gcc/ChangeLog:

* lra-int.h (lra_update_fp2sp_elimination): New prototype.
(lra_asm_insn_error): New prototype.
* lra-spills.cc (remove_pseudos): Add check for pseudo slot memory
existence.
(lra_spill): Call lra_update_fp2sp_elimination.
* lra-eliminations.cc: Remove trailing spaces.
(elimination_fp2sp_occured_p): New static flag.
(lra_eliminate_regs_1): Set the flag up.
(update_reg_eliminate): Modify the assert for stack to frame
pointer elimination.
(lra_update_fp2sp_elimination): New function.
(lra_eliminate): Clear flag elimination_fp2sp_occured_p.

gcc/testsuite/ChangeLog:

* gcc.target/avr/lra-elim.c: New test.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 68225339cb6..cf0aa94b69a 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -286,7 +286,7 @@ move_plus_up (rtx x)
 {
   rtx subreg_reg;
   machine_mode x_mode, subreg_reg_mode;
-  
+
   if (GET_CODE (x) != SUBREG || !subreg_lowpart_p (x))
 return x;
   subreg_reg = SUBREG_REG (x);
@@ -309,6 +309,9 @@ move_plus_up (rtx x)
   return x;
 }
 
+/* Flag that we already did frame pointer to stack pointer elimination.  */
+static bool elimination_fp2sp_occured_p = false;
+
 /* Scan X and replace any eliminable registers (such as fp) with a
replacement (such as sp) if SUBST_P, plus an offset.  The offset is
a change in the offset between the eliminable register and its
@@ -366,6 +369,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == 
FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
@@ -396,9 +402,12 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == 
FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (! update_p && ! full_p)
return gen_rtx_PLUS (Pmode, to, XEXP (x, 1));
- 
+
  if (maybe_ne (update_sp_offset, 0))
offset = ep->to_rtx == stack_pointer_rtx ? update_sp_offset : 0;
  else
@@ -456,6 +465,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == 
FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
@@ -500,7 +512,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
 case LE:  case LT:   case LEU:case LTU:
   {
rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
-subst_p, update_p, 
+subst_p, update_p,
 update_sp_offset, full_p);
rtx new1 = XEXP (x, 1)
   ? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
@@ -749,7 +761,7 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
  && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset
{
  poly_int64 size = GET_MODE_SIZE (mem_mode);
- 
+
 #ifdef PUSH_ROUNDING
  /* If more bytes than MEM_MODE are pushed, account for
 them.  */
@@ -822,7 +834,7 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
{
  /* See if this is setting the replacement hard register for
 an elimination.
-
+
 If DEST is the hard frame pointer, we do nothing because
 we as

[PATCH] c++: fix ICE with designated initializer [PR110114]

2023-07-19 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

r13-1227 added an assert checking that the index in a CONSTRUCTOR
is a FIELD_DECL.  That's a reasonable assumption but in this case
we never called reshape_init due to the type being incomplete, and
so the index remained an identifier node: get_class_binding never
got around to looking up the FIELD_DECL.

We can avoid the crash by returning early in build_aggr_conv; we'd
return NULL anyway due to:

  if (i < CONSTRUCTOR_NELTS (ctor))
return NULL;

PR c++/110114

gcc/cp/ChangeLog:

* call.cc (build_aggr_conv): Return early if the type isn't
complete.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig28.C: New test.
* g++.dg/cpp2a/desig29.C: New test.
---
 gcc/cp/call.cc   |  5 +
 gcc/testsuite/g++.dg/cpp2a/desig28.C | 17 +
 gcc/testsuite/g++.dg/cpp2a/desig29.C | 10 ++
 3 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig28.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig29.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index b55230d98aa..0af20a81717 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -986,6 +986,11 @@ build_aggr_conv (tree type, tree ctor, int flags, 
tsubst_flags_t complain)
   tree empty_ctor = NULL_TREE;
   hash_set pset;
 
+  /* We've called complete_type on TYPE before calling this function, but
+ perhaps it wasn't successful.  */
+  if (!COMPLETE_TYPE_P (type))
+return nullptr;
+
   /* We already called reshape_init in implicit_conversion, but it might not
  have done anything in the case of parenthesized aggr init.  */
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig28.C 
b/gcc/testsuite/g++.dg/cpp2a/desig28.C
new file mode 100644
index 000..b63265fea51
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig28.C
@@ -0,0 +1,17 @@
+// PR c++/110114
+// { dg-do compile { target c++20 } }
+
+struct A {
+int a,b;
+};
+
+struct B;
+
+void foo(const A &) {}
+void foo(const B &) {}
+
+int
+main ()
+{
+  foo({.a=0});
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig29.C 
b/gcc/testsuite/g++.dg/cpp2a/desig29.C
new file mode 100644
index 000..bd1a82b041d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig29.C
@@ -0,0 +1,10 @@
+// PR c++/110114
+// { dg-do compile { target c++20 } }
+
+struct B;
+
+void foo(const B &) {}
+
+int main() {
+foo({.a=0}); // { dg-error "invalid" }
+}

base-commit: 2971ff7b1d564ac04b537d907c70e6093af70832
-- 
2.41.0



[PATCH] c++: passing partially inst tmpl as ttp [PR110566]

2023-07-19 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

-- >8 --

Since the arguments 'pargs' passed to the coerce_template_parms from
coerce_template_template_parms are always a full set, we need to make sure
we always pass the parameters of the most general template because if the
template is partially instantiated then the levels won't match up.  In the
testcase below during said call to coerce_template_parms the parameters
are {X, Y} both level 1, but the arguments are {{int}, {N, M}}, which
leads to a crash during auto deduction of X and Y.

PR c++/110566

gcc/cp/ChangeLog:

* pt.cc (coerce_template_template_parms): Simplify by using
DECL_INNERMOST_TEMPLATE_PARMS and removing redundant asserts.
Always pass the parameters of the most general template to
coerce_template_parms.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp38.C: New test.
---
 gcc/cp/pt.cc  | 12 +---
 gcc/testsuite/g++.dg/template/ttp38.C | 12 
 2 files changed, 17 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp38.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d882e9dd117..8723868823e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -8073,12 +8073,10 @@ coerce_template_template_parms (tree parm_tmpl,
   tree parm, arg;
   int variadic_p = 0;
 
-  tree parm_parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS (parm_tmpl));
-  tree arg_parms_full = DECL_TEMPLATE_PARMS (arg_tmpl);
-  tree arg_parms = INNERMOST_TEMPLATE_PARMS (arg_parms_full);
-
-  gcc_assert (TREE_CODE (parm_parms) == TREE_VEC);
-  gcc_assert (TREE_CODE (arg_parms) == TREE_VEC);
+  tree parm_parms = DECL_INNERMOST_TEMPLATE_PARMS (parm_tmpl);
+  tree arg_parms = DECL_INNERMOST_TEMPLATE_PARMS (arg_tmpl);
+  tree gen_arg_tmpl = most_general_template (arg_tmpl);
+  tree gen_arg_parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_arg_tmpl);
 
   nparms = TREE_VEC_LENGTH (parm_parms);
   nargs = TREE_VEC_LENGTH (arg_parms);
@@ -8134,7 +8132,7 @@ coerce_template_template_parms (tree parm_tmpl,
scope_args = TI_ARGS (tinfo);
   pargs = add_to_template_args (scope_args, pargs);
 
-  pargs = coerce_template_parms (arg_parms, pargs, NULL_TREE, tf_none);
+  pargs = coerce_template_parms (gen_arg_parms, pargs, NULL_TREE, tf_none);
   if (pargs != error_mark_node)
{
  tree targs = make_tree_vec (nargs);
diff --git a/gcc/testsuite/g++.dg/template/ttp38.C 
b/gcc/testsuite/g++.dg/template/ttp38.C
new file mode 100644
index 000..7d25d291e81
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp38.C
@@ -0,0 +1,12 @@
+// PR c++/110566
+// { dg-do compile { target c++20 } }
+
+template class>
+struct A;
+
+template
+struct B {
+  template struct C;
+};
+
+using type = A::C>;
-- 
2.41.0.376.gcba07a324d



Re: [PATCH] c++: passing partially inst tmpl as ttp [PR110566]

2023-07-19 Thread Patrick Palka via Gcc-patches
On Wed, Jul 19, 2023 at 2:05 PM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/13?
>
> -- >8 --
>
> Since the arguments 'pargs' passed to the coerce_template_parms from
> coerce_template_template_parms are always a full set, we need to make sure
> we always pass the parameters of the most general template because if the
> template is partially instantiated then the levels won't match up.  In the
> testcase below during said call to coerce_template_parms the parameters
> are {X, Y} both level 1, but the arguments are {{int}, {N, M}}, which
> leads to a crash during auto deduction of X and Y.
>
> PR c++/110566
>
> gcc/cp/ChangeLog:
>
> * pt.cc (coerce_template_template_parms): Simplify by using
> DECL_INNERMOST_TEMPLATE_PARMS and removing redundant asserts.
> Always pass the parameters of the most general template to
> coerce_template_parms.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/template/ttp38.C: New test.
> ---
>  gcc/cp/pt.cc  | 12 +---
>  gcc/testsuite/g++.dg/template/ttp38.C | 12 
>  2 files changed, 17 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/ttp38.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index d882e9dd117..8723868823e 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -8073,12 +8073,10 @@ coerce_template_template_parms (tree parm_tmpl,
>tree parm, arg;
>int variadic_p = 0;
>
> -  tree parm_parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS 
> (parm_tmpl));
> -  tree arg_parms_full = DECL_TEMPLATE_PARMS (arg_tmpl);
> -  tree arg_parms = INNERMOST_TEMPLATE_PARMS (arg_parms_full);
> -
> -  gcc_assert (TREE_CODE (parm_parms) == TREE_VEC);
> -  gcc_assert (TREE_CODE (arg_parms) == TREE_VEC);
> +  tree parm_parms = DECL_INNERMOST_TEMPLATE_PARMS (parm_tmpl);
> +  tree arg_parms = DECL_INNERMOST_TEMPLATE_PARMS (arg_tmpl);
> +  tree gen_arg_tmpl = most_general_template (arg_tmpl);
> +  tree gen_arg_parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_arg_tmpl);
>
>nparms = TREE_VEC_LENGTH (parm_parms);
>nargs = TREE_VEC_LENGTH (arg_parms);
> @@ -8134,7 +8132,7 @@ coerce_template_template_parms (tree parm_tmpl,
> scope_args = TI_ARGS (tinfo);
>pargs = add_to_template_args (scope_args, pargs);
>
> -  pargs = coerce_template_parms (arg_parms, pargs, NULL_TREE, tf_none);
> +  pargs = coerce_template_parms (gen_arg_parms, pargs, NULL_TREE, 
> tf_none);
>if (pargs != error_mark_node)
> {
>   tree targs = make_tree_vec (nargs);
> diff --git a/gcc/testsuite/g++.dg/template/ttp38.C 
> b/gcc/testsuite/g++.dg/template/ttp38.C
> new file mode 100644
> index 000..7d25d291e81
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/ttp38.C
> @@ -0,0 +1,12 @@
> +// PR c++/110566
> +// { dg-do compile { target c++20 } }
> +
> +template class>
> +struct A;
> +
> +template
> +struct B {
> +  template struct C;

Oops, I botched a git commit --amend.  The parameter list here should
be 'auto X, auto Y'.

> +};
> +
> +using type = A::C>;
> --
> 2.41.0.376.gcba07a324d
>



Re: [PATCH] c++: fix ICE with designated initializer [PR110114]

2023-07-19 Thread Patrick Palka via Gcc-patches
On Wed, 19 Jul 2023, Marek Polacek wrote:

> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

LGTM.  It might be preferable to check COMPLETE_TYPE_P in the caller
instead, so that we avoid inspecting CLASSTYPE_NON_AGGREGATE on an
incomplete class type, and so that the caller doesn't "commit" to
building an aggregate conversion.

> 
> -- >8 --
> 
> r13-1227 added an assert checking that the index in a CONSTRUCTOR
> is a FIELD_DECL.  That's a reasonable assumption but in this case
> we never called reshape_init due to the type being incomplete, and
> so the index remained an identifier node: get_class_binding never
> got around to looking up the FIELD_DECL.
> 
> We can avoid the crash by returning early in build_aggr_conv; we'd
> return NULL anyway due to:
> 
>   if (i < CONSTRUCTOR_NELTS (ctor))
> return NULL;
> 
>   PR c++/110114
> 
> gcc/cp/ChangeLog:
> 
>   * call.cc (build_aggr_conv): Return early if the type isn't
>   complete.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/desig28.C: New test.
>   * g++.dg/cpp2a/desig29.C: New test.
> ---
>  gcc/cp/call.cc   |  5 +
>  gcc/testsuite/g++.dg/cpp2a/desig28.C | 17 +
>  gcc/testsuite/g++.dg/cpp2a/desig29.C | 10 ++
>  3 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig28.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig29.C
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index b55230d98aa..0af20a81717 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -986,6 +986,11 @@ build_aggr_conv (tree type, tree ctor, int flags, 
> tsubst_flags_t complain)
>tree empty_ctor = NULL_TREE;
>hash_set pset;
>  
> +  /* We've called complete_type on TYPE before calling this function, but
> + perhaps it wasn't successful.  */
> +  if (!COMPLETE_TYPE_P (type))
> +return nullptr;
> +
>/* We already called reshape_init in implicit_conversion, but it might not
>   have done anything in the case of parenthesized aggr init.  */
>  
> diff --git a/gcc/testsuite/g++.dg/cpp2a/desig28.C 
> b/gcc/testsuite/g++.dg/cpp2a/desig28.C
> new file mode 100644
> index 000..b63265fea51
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/desig28.C
> @@ -0,0 +1,17 @@
> +// PR c++/110114
> +// { dg-do compile { target c++20 } }
> +
> +struct A {
> +int a,b;
> +};
> +
> +struct B;
> +
> +void foo(const A &) {}
> +void foo(const B &) {}
> +
> +int
> +main ()
> +{
> +  foo({.a=0});
> +}
> diff --git a/gcc/testsuite/g++.dg/cpp2a/desig29.C 
> b/gcc/testsuite/g++.dg/cpp2a/desig29.C
> new file mode 100644
> index 000..bd1a82b041d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/desig29.C
> @@ -0,0 +1,10 @@
> +// PR c++/110114
> +// { dg-do compile { target c++20 } }
> +
> +struct B;
> +
> +void foo(const B &) {}
> +
> +int main() {
> +foo({.a=0}); // { dg-error "invalid" }
> +}
> 
> base-commit: 2971ff7b1d564ac04b537d907c70e6093af70832
> -- 
> 2.41.0
> 
> 



Re: [GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-19 Thread Ramana Radhakrishnan
On Wed, Jul 19, 2023 at 5:44 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> OK to backport to GCC 13?
>
>
> Many intrinsics currently depend on both an architecture version and a
> feature, despite the corresponding instructions being available within
> GCC at lower architecture versions.
>
> LLVM has already removed these explicit architecture version
> dependences; this patch does the same for GCC. Note that +fp16 does not
> imply +simd, so we need to add an explicit +simd for the Neon fp16
> intrinsics.
>
> Binutils did not previously support all of these architecture+feature
> combinations, but this problem is already reachable from GCC.  For
> example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> GCC 10.  This is fixed in Binutils 2.41.

Are there any implementations that actually implement v8-a + dotprod?
As far as I'm aware, v8.2-A was the base architecture where this was
allowed.  Has this changed recently?


regards
Ramana


Re: [PATCH] c++: fix ICE with designated initializer [PR110114]

2023-07-19 Thread Marek Polacek via Gcc-patches
On Wed, Jul 19, 2023 at 02:32:15PM -0400, Patrick Palka wrote:
> On Wed, 19 Jul 2023, Marek Polacek wrote:
> 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> LGTM.  It might be preferable to check COMPLETE_TYPE_P in the caller
> instead, so that we avoid inspecting CLASSTYPE_NON_AGGREGATE on an
> incomplete class type, and so that the caller doesn't "commit" to
> building an aggregate conversion.

Perhaps.  I wanted to avoid the call to build_user_type_conversion_1.
I could add an early return to implicit_conversion_1 but I'd have to
move some code around not to check COMPLETE_TYPE_P before complete_type.
 
> > 
> > -- >8 --
> > 
> > r13-1227 added an assert checking that the index in a CONSTRUCTOR
> > is a FIELD_DECL.  That's a reasonable assumption but in this case
> > we never called reshape_init due to the type being incomplete, and
> > so the index remained an identifier node: get_class_binding never
> > got around to looking up the FIELD_DECL.
> > 
> > We can avoid the crash by returning early in build_aggr_conv; we'd
> > return NULL anyway due to:
> > 
> >   if (i < CONSTRUCTOR_NELTS (ctor))
> > return NULL;
> > 
> > PR c++/110114
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (build_aggr_conv): Return early if the type isn't
> > complete.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/desig28.C: New test.
> > * g++.dg/cpp2a/desig29.C: New test.
> > ---
> >  gcc/cp/call.cc   |  5 +
> >  gcc/testsuite/g++.dg/cpp2a/desig28.C | 17 +
> >  gcc/testsuite/g++.dg/cpp2a/desig29.C | 10 ++
> >  3 files changed, 32 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig28.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig29.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index b55230d98aa..0af20a81717 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -986,6 +986,11 @@ build_aggr_conv (tree type, tree ctor, int flags, 
> > tsubst_flags_t complain)
> >tree empty_ctor = NULL_TREE;
> >hash_set pset;
> >  
> > +  /* We've called complete_type on TYPE before calling this function, but
> > + perhaps it wasn't successful.  */
> > +  if (!COMPLETE_TYPE_P (type))
> > +return nullptr;
> > +
> >/* We already called reshape_init in implicit_conversion, but it might 
> > not
> >   have done anything in the case of parenthesized aggr init.  */
> >  
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/desig28.C 
> > b/gcc/testsuite/g++.dg/cpp2a/desig28.C
> > new file mode 100644
> > index 000..b63265fea51
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/desig28.C
> > @@ -0,0 +1,17 @@
> > +// PR c++/110114
> > +// { dg-do compile { target c++20 } }
> > +
> > +struct A {
> > +int a,b;
> > +};
> > +
> > +struct B;
> > +
> > +void foo(const A &) {}
> > +void foo(const B &) {}
> > +
> > +int
> > +main ()
> > +{
> > +  foo({.a=0});
> > +}
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/desig29.C 
> > b/gcc/testsuite/g++.dg/cpp2a/desig29.C
> > new file mode 100644
> > index 000..bd1a82b041d
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/desig29.C
> > @@ -0,0 +1,10 @@
> > +// PR c++/110114
> > +// { dg-do compile { target c++20 } }
> > +
> > +struct B;
> > +
> > +void foo(const B &) {}
> > +
> > +int main() {
> > +foo({.a=0}); // { dg-error "invalid" }
> > +}
> > 
> > base-commit: 2971ff7b1d564ac04b537d907c70e6093af70832
> > -- 
> > 2.41.0
> > 
> > 
> 

Marek



Re: [PATCH] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-19 Thread Patrick Palka via Gcc-patches
On Tue, 18 Jul 2023, Ken Matsui via Libstdc++ wrote:

> This patch defines _GLIBCXX_HAS_BUILTIN_TRAIT, which will be used as a
> flag to toggle built-in traits in the type_traits header. Through this
> macro function and _GLIBCXX_NO_BUILTIN_TRAITS macro, we can switch the
> use of built-in traits without needing to modify the source code.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.

The ChangeLog entry should also mention the change to _GLIBCXX_HAS_BUILTIN,
e.g.

(_GLIBCXX_HAS_BUILTIN): Keep defined.

> 
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/bits/c++config | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/libstdc++-v3/include/bits/c++config 
> b/libstdc++-v3/include/bits/c++config
> index dd47f274d5f..de13f61db71 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -854,7 +854,11 @@ namespace __gnu_cxx
>  # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
>  #endif
>  
> -#undef _GLIBCXX_HAS_BUILTIN
> +// Returns true if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
> +// has a corresponding built-in type trait. _GLIBCXX_NO_BUILTIN_TRAITS is
> +// defined to disable the use of built-in traits.
> +#define _GLIBCXX_HAS_BUILTIN_TRAIT(BT)  \
> +  (!defined(_GLIBCXX_NO_BUILTIN_TRAITS) && _GLIBCXX_HAS_BUILTIN(BT))

Since we don't expect _GLIBCXX_NO_BUILTIN_TRAITS to get
defined/undefined in the middle of preprocessing, perhaps we should
factor out the _GLIBCXX_NO_BUILTIN_TRAITS test from the macro function
and instead conditionally define the macro function to 0 according to
_GLIBCXX_NO_BUILTIN_TRAITS?

>  
>  // Mark code that should be ignored by the compiler, but seen by Doxygen.
>  #define _GLIBCXX_DOXYGEN_ONLY(X)
> -- 
> 2.41.0
> 
> 



Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-19 Thread Qing Zhao via Gcc-patches

>> 
>> The point is: allocation size should synced with the value of “counted_by”. 
>> LLVM’s RFC also have the similar requirement:
>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18
> 
> Right, I'm saying it would be nice if __alloc_size was checked as well,
> in the sense that if it is available, it knows without question what the
> size of the allocation is. If __alloc_size and __counted_by conflict,
> the smaller of the two should be the truth.

I don’t think that  “if __alloc_size and __counted_by conflict, the smaller of 
the two should be the truth” will work correctly.

When __alloc_size is larger than the value of __counted_by, it’s okay. 
But when the value of __counted_by is larger than the __alloc_size, the array 
bound check or object size sanitizer might not work correctly.


Please see the following example:

#include <stdlib.h>

struct grows {
    int alloc_count;
    int valid_count;
    int item[] __counted_by(valid_count);
} *p;

void __attribute__((__noinline__)) something (int n)
{
    p = malloc(sizeof(*p) + sizeof(*p->item) * 100);
    p->alloc_count = 100;
    p->valid_count = 102;
    /* Both __alloc_size and the value of __counted_by are available in
       this routine; the smaller one is 100.  */
    p->item[n] = 10;
}

void __attribute__((__noinline__)) something_2 (int n)
{
    /* Only the value of __counted_by is available in this routine,
       which is 102.  */
    p->item[n] = 10;
}

int main (void)
{
    something (101);
    something_2 (101);
}


For the above example, the out-of-bound array access in routine “something”
can be caught by the compiler.
However, the out-of-bound array access in the routine “something_2” will NOT
be caught by the compiler.

In the routine “something_2”, the compiler doesn’t know the alloc_size; the
only available info is the counted_by value through the attribute.  But this
value is bigger than the REAL size of the array.  Therefore the compiler
cannot detect the out-of-bound array access in the routine something_2.


Based on the above observation, I think we should add the following
requirement:

The value of “counted_by” should be equal to or SMALLER than the real
alloc_size for the flexible array member.

This is the same requirement as in the LLVM RFC:
https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18

“the compiler inserts additional checks to ensure the new buf has at least as
many elements as the new count indicates.”
LLVM has additional requirements beyond this one; we might need to consider
those requirements too.
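
To make the difference between the two routines concrete, here is a small
C++ model of which bound a checker could use at each access.  It is only a
sketch: BoundsInfo, effective_bound and access_flagged are hypothetical names
made up for illustration, not GCC or LLVM interfaces, and the "smaller of the
two" policy is the one under discussion, not something that is implemented.

```cpp
#include <cstddef>

// Hypothetical model of what a bounds checker can see at one access site.
struct BoundsInfo {
  bool alloc_known;         // is the allocation size visible in this routine?
  std::size_t alloc_elems;  // element count implied by alloc_size (if known)
  std::size_t counted_by;   // current value of the counted_by field
};

// "Smaller of the two" policy: use alloc_size only when it is visible and
// smaller than counted_by; otherwise fall back to counted_by.
constexpr std::size_t effective_bound (const BoundsInfo &b)
{
  return (b.alloc_known && b.alloc_elems < b.counted_by)
         ? b.alloc_elems : b.counted_by;
}

// True when an access to item[idx] would be reported as out of bounds.
constexpr bool access_flagged (const BoundsInfo &b, std::size_t idx)
{
  return idx >= effective_bound (b);
}
```

With the numbers from the example, BoundsInfo{true, 100, 102} models
"something" and BoundsInfo{false, 0, 102} models "something_2"; only the
first one flags index 101, which is why counted_by must never exceed the real
allocation.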

Qing

> But, as I said, if there is some need to explicitly ignore __alloc_size
> when __counted_by is present, I can live with it; we just need to
> document it.
> 
> If the RFC and you agree that the __counted_by variable can only ever be
> (re)assigned after the flex array has been (re)allocated, then I guess
> we'll see how it goes. :) I think most places in the kernel using
> __counted_by will be fine, but I suspect we may have cases where we need
> to update it like in the loop I described above. If that's true, we can
> revisit the requirement then. :)
> 
> -Kees
> 
> -- 
> Kees Cook



[PATCH] Use strtol instead of std::stoi in gensupport.cc

2023-07-19 Thread John David Anglin
Tested on trunk with hppa64-hp-hpux11.11.

Okay?

Dave
---

Use strtol instead of std::stoi [PR110646]

Implementation of std::stoi was overlooked on hppa-hpux, so use
strtol instead.

2023-07-19  John David Anglin  

gcc/ChangeLog:

PR bootstrap/110646
* gensupport.cc (class conlist): Use strtol instead of std::stoi.

diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 959d1d9c83c..87bcf5ee441 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -640,7 +640,7 @@ public:
 
 name.assign (ns, len);
 if (numeric)
-  idx = std::stoi (name);
+  idx = strtol (name.c_str (), (char **)NULL, 10);
   }
 
   /* Adds a character to the end of the string.  */


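
For readers who want to see the substitution in isolation, the following is
a minimal sketch of the same conversion outside gensupport.cc.  The helper
name parse_index is hypothetical; only std::strtol and std::string are real
library facilities here.  One behavioural difference worth keeping in mind is
that std::stoi throws when no digits can be parsed, while strtol simply
yields 0.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the leading decimal digits of NAME to an
// int using strtol rather than std::stoi.
inline int parse_index (const std::string &name)
{
  // A null end pointer means we do not ask where parsing stopped.
  // strtol does not throw; it returns 0 when no digits are found.
  return static_cast<int> (std::strtol (name.c_str (), nullptr, 10));
}
```

The patch only takes this path when the constraint name is known to be
numeric, so the silent-zero case should not arise there.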


[PATCH] c++: -Wmissing-field-initializers and empty class [PR110064]

2023-07-19 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

Let's suppress -Wmissing-field-initializers for empty classes.

Here I don't think I need the usual COMPLETE_TYPE_P/dependent_type_p
checks.

PR c++/110064

gcc/cp/ChangeLog:

* typeck2.cc (process_init_constructor_record): Don't emit
-Wmissing-field-initializers for empty classes.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wmissing-field-initializers-3.C: New test.
---
 gcc/cp/typeck2.cc |  3 +-
 .../warn/Wmissing-field-initializers-3.C  | 48 +++
 2 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 1c204c8612b..582a73bb053 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -1874,7 +1874,8 @@ process_init_constructor_record (tree type, tree init, 
int nested, int flags,
 to zero.  */
  if ((complain & tf_warning)
  && !cp_unevaluated_operand
- && !EMPTY_CONSTRUCTOR_P (init))
+ && !EMPTY_CONSTRUCTOR_P (init)
+ && !is_really_empty_class (fldtype, /*ignore_vptr*/false))
warning (OPT_Wmissing_field_initializers,
 "missing initializer for member %qD", field);
 
diff --git a/gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C 
b/gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C
new file mode 100644
index 000..a8d75b92bd1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C
@@ -0,0 +1,48 @@
+// PR c++/110064
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wmissing-field-initializers" }
+
+struct B { };
+struct D : B {
+int x;
+int y;
+};
+
+struct E {
+  int x;
+  int y;
+  B z;
+};
+
+template struct X { };
+
+template
+struct F {
+  int i;
+  int j;
+  X x;
+};
+
+int
+main ()
+{
+  D d = {.x=1, .y=2}; // { dg-bogus "missing" }
+  (void)d;
+  E e = {.x=1, .y=2}; // { dg-bogus "missing" }
+  (void)e;
+  F f = {.i=1, .j=2 }; // { dg-bogus "missing" }
+  (void)f;
+}
+
+template
+void fn ()
+{
+  F f = {.i=1, .j=2 }; // { dg-bogus "missing" }
+  (void)f;
+}
+
+void
+g ()
+{
+  fn ();
+}

base-commit: 2971ff7b1d564ac04b537d907c70e6093af70832
-- 
2.41.0
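
As a rough illustration of the condition added to
process_init_constructor_record, the sketch below uses std::is_empty as a
stand-in for the front end's is_really_empty_class.  That substitution is an
assumption made for the example (the two are not identical), and
warn_if_missing is a hypothetical name, not something in the patch.

```cpp
#include <type_traits>

struct B { };
template<typename T> struct X { };
struct P { virtual ~P () { } };   // has a vptr, so it is not empty

// Hypothetical predicate: would a missing initializer for a field of type
// FieldType still deserve a warning?  Empty class types carry no state, so
// leaving them out of the initializer list is harmless.
template<typename FieldType>
constexpr bool warn_if_missing ()
{
  return !std::is_empty<FieldType>::value;
}
```

Under this model the members and bases of type B and X<int> in the new test
are skipped, while an int field or a polymorphic class such as P would still
be reported.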



Re: [PATCH] testsuite: fix allocator-opt1.C FAIL with old ABI

2023-07-19 Thread Marek Polacek via Gcc-patches
Ping.

On Mon, Jul 10, 2023 at 04:33:26PM -0400, Marek Polacek via Gcc-patches wrote:
> Running
> $ make check-g++ 
> RUNTESTFLAGS='--target_board=unix\{-D_GLIBCXX_USE_CXX11_ABI=0,\} 
> dg.exp=allocator-opt1.C'
> yields:
> 
> FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++98  scan-tree-dump-times 
> gimple "struct allocator D" 1
> FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++14  scan-tree-dump-times 
> gimple "struct allocator D" 1
> FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++17  scan-tree-dump-times 
> gimple "struct allocator D" 1
> FAIL: g++.dg/tree-ssa/allocator-opt1.C  -std=c++20  scan-tree-dump-times 
> gimple "struct allocator D" 1
> 
>   === g++ Summary for unix/-D_GLIBCXX_USE_CXX11_ABI=0 ===
> 
>   === g++ Summary for unix ===
> 
> because in the old ABI we get two "struct allocator D".  This patch
> follows r14-658 although I'm not quite sure I follow the logic there.
> 
> Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/tree-ssa/allocator-opt1.C: Force _GLIBCXX_USE_CXX11_ABI to 1.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C 
> b/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C
> index e8394c7ad70..9f13eedb604 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/allocator-opt1.C
> @@ -5,8 +5,18 @@
>  // Currently the dump doesn't print the allocator template arg in this 
> context.
>  // { dg-final { scan-tree-dump-times "struct allocator D" 1 "gimple" } }
>  
> +// In the pre-C++11 ABI we get two allocator variables.
> +#undef _GLIBCXX_USE_CXX11_ABI
> +#define _GLIBCXX_USE_CXX11_ABI 1
> +
> +// When the library is not dual-ABI and defaults to old just compile
> +// an empty TU
> +#if _GLIBCXX_USE_CXX11_ABI
> +
>  #include 
>  void f (const char *p)
>  {
>std::string lst[] = { p, p, p, p };
>  }
> +
> +#endif
> 
> base-commit: 2d7c95e31431a297060c94697af84f498abf97a2
> -- 
> 2.41.0
> 

Marek



Re: [PATCH] c++: fix ICE with designated initializer [PR110114]

2023-07-19 Thread Jason Merrill via Gcc-patches

On 7/19/23 14:38, Marek Polacek wrote:
> On Wed, Jul 19, 2023 at 02:32:15PM -0400, Patrick Palka wrote:
> > On Wed, 19 Jul 2023, Marek Polacek wrote:
> >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > LGTM.  It might be preferable to check COMPLETE_TYPE_P in the caller
> > instead, so that we avoid inspecting CLASSTYPE_NON_AGGREGATE on an
> > incomplete class type, and so that the caller doesn't "commit" to
> > building an aggregate conversion.
>
> Perhaps.  I wanted to avoid the call to build_user_type_conversion_1.
> I could add an early return to implicit_conversion_1 but I'd have to
> move some code around not to check COMPLETE_TYPE_P before complete_type.

Maybe return NULL for the incomplete case here, rather than just
skipping reshape_init?

  /* Call reshape_init early to remove redundant braces.  */
  if (expr && BRACE_ENCLOSED_INITIALIZER_P (expr)
      && CLASS_TYPE_P (to)
      && COMPLETE_TYPE_P (complete_type (to))
      && !CLASSTYPE_NON_AGGREGATE (to))
    {
      expr = reshape_init (to, expr, complain);
      if (expr == error_mark_node)
        return NULL;
      from = TREE_TYPE (expr);
    }

If that doesn't work, the patch is fine as-is.


-- >8 --

r13-1227 added an assert checking that the index in a CONSTRUCTOR
is a FIELD_DECL.  That's a reasonable assumption but in this case
we never called reshape_init due to the type being incomplete, and
so the index remained an identifier node: get_class_binding never
got around to looking up the FIELD_DECL.

We can avoid the crash by returning early in build_aggr_conv; we'd
return NULL anyway due to:

   if (i < CONSTRUCTOR_NELTS (ctor))
 return NULL;

PR c++/110114

gcc/cp/ChangeLog:

* call.cc (build_aggr_conv): Return early if the type isn't
complete.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig28.C: New test.
* g++.dg/cpp2a/desig29.C: New test.
---
  gcc/cp/call.cc   |  5 +
  gcc/testsuite/g++.dg/cpp2a/desig28.C | 17 +
  gcc/testsuite/g++.dg/cpp2a/desig29.C | 10 ++
  3 files changed, 32 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig28.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig29.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index b55230d98aa..0af20a81717 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -986,6 +986,11 @@ build_aggr_conv (tree type, tree ctor, int flags, 
tsubst_flags_t complain)
tree empty_ctor = NULL_TREE;
hash_set pset;
  
+  /* We've called complete_type on TYPE before calling this function, but

+ perhaps it wasn't successful.  */
+  if (!COMPLETE_TYPE_P (type))
+return nullptr;
+
/* We already called reshape_init in implicit_conversion, but it might not
   have done anything in the case of parenthesized aggr init.  */
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig28.C b/gcc/testsuite/g++.dg/cpp2a/desig28.C

new file mode 100644
index 000..b63265fea51
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig28.C
@@ -0,0 +1,17 @@
+// PR c++/110114
+// { dg-do compile { target c++20 } }
+
+struct A {
+int a,b;
+};
+
+struct B;
+
+void foo(const A &) {}
+void foo(const B &) {}
+
+int
+main ()
+{
+  foo({.a=0});
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig29.C 
b/gcc/testsuite/g++.dg/cpp2a/desig29.C
new file mode 100644
index 000..bd1a82b041d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig29.C
@@ -0,0 +1,10 @@
+// PR c++/110114
+// { dg-do compile { target c++20 } }
+
+struct B;
+
+void foo(const B &) {}
+
+int main() {
+foo({.a=0}); // { dg-error "invalid" }
+}

base-commit: 2971ff7b1d564ac04b537d907c70e6093af70832
--
2.41.0






Marek





Re: [PATCH] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-19 Thread Ken Matsui via Gcc-patches
On Wed, Jul 19, 2023 at 11:48 AM Patrick Palka  wrote:
>
> On Tue, 18 Jul 2023, Ken Matsui via Libstdc++ wrote:
>
> > This patch defines _GLIBCXX_HAS_BUILTIN_TRAIT, which will be used as a
> > flag to toggle built-in traits in the type_traits header. Through this
> > macro function and _GLIBCXX_NO_BUILTIN_TRAITS macro, we can switch the
> > use of built-in traits without needing to modify the source code.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
>
> The ChangeLog entry should also mention the change to _GLIBCXX_HAS_BUILTIN,
> e.g.
>
> (_GLIBCXX_HAS_BUILTIN): Keep defined.
>
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  libstdc++-v3/include/bits/c++config | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/libstdc++-v3/include/bits/c++config 
> > b/libstdc++-v3/include/bits/c++config
> > index dd47f274d5f..de13f61db71 100644
> > --- a/libstdc++-v3/include/bits/c++config
> > +++ b/libstdc++-v3/include/bits/c++config
> > @@ -854,7 +854,11 @@ namespace __gnu_cxx
> >  # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
> >  #endif
> >
> > -#undef _GLIBCXX_HAS_BUILTIN
> > +// Returns true if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the 
> > compiler
> > +// has a corresponding built-in type trait. _GLIBCXX_NO_BUILTIN_TRAITS is
> > +// defined to disable the use of built-in traits.
> > +#define _GLIBCXX_HAS_BUILTIN_TRAIT(BT)  \
> > +  (!defined(_GLIBCXX_NO_BUILTIN_TRAITS) && _GLIBCXX_HAS_BUILTIN(BT))
>
> Since we don't expect _GLIBCXX_NO_BUILTIN_TRAITS to get
> defined/undefined in the middle of preprocessing, perhaps we should
> factor out the _GLIBCXX_NO_BUILTIN_TRAITS test from the macro function
> and instead conditionally define the macro function to 0 according
> _GLIBCXX_NO_BUILTIN_TRAITS?
>
Hi, thank you for your review! I totally agree with your ideas and
will update this patch.

> >
> >  // Mark code that should be ignored by the compiler, but seen by Doxygen.
> >  #define _GLIBCXX_DOXYGEN_ONLY(X)
> > --
> > 2.41.0
> >
> >
>


[PATCH v2] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-19 Thread Ken Matsui via Gcc-patches
This patch defines _GLIBCXX_HAS_BUILTIN_TRAIT macro, which will be used
as a flag to toggle the use of built-in traits in the type_traits header
through _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
source code.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
(_GLIBCXX_HAS_BUILTIN): Keep defined.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/bits/c++config | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index dd47f274d5f..984985d6fff 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -854,7 +854,15 @@ namespace __gnu_cxx
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif
 
-#undef _GLIBCXX_HAS_BUILTIN
+// Returns 1 if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
+// has a corresponding built-in type trait, 0 otherwise.
+// _GLIBCXX_NO_BUILTIN_TRAITS can be defined to disable the use of built-in
+// traits.
+#ifndef _GLIBCXX_NO_BUILTIN_TRAITS
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) _GLIBCXX_HAS_BUILTIN(BT)
+#else
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) 0
+#endif
 
 // Mark code that should be ignored by the compiler, but seen by Doxygen.
 #define _GLIBCXX_DOXYGEN_ONLY(X)
-- 
2.41.0
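
The v2 shape, which decides once at definition time instead of testing
`defined` inside the macro body, can be tried outside libstdc++ with a small
sketch.  The DEMO_* macros and demo_is_same below are hypothetical stand-ins
for illustration; __has_builtin and __is_same are compiler features that are
assumed to be available on recent GCC and Clang, with a plain library
fallback otherwise.  A side benefit of this shape, as far as I can tell, is
that no `defined` token is produced by macro expansion inside an #if, which
the standard leaves undefined.

```cpp
// DEMO_* names are hypothetical; they mirror the structure of the patch.
#ifdef __has_builtin
# define DEMO_HAS_BUILTIN(B) __has_builtin(B)
#else
# define DEMO_HAS_BUILTIN(B) 0
#endif

// Decide once, at definition time, whether built-in traits may be used.
#ifndef DEMO_NO_BUILTIN_TRAITS
# define DEMO_HAS_BUILTIN_TRAIT(BT) DEMO_HAS_BUILTIN(BT)
#else
# define DEMO_HAS_BUILTIN_TRAIT(BT) 0
#endif

#if DEMO_HAS_BUILTIN_TRAIT(__is_same)
// Built-in path.
template<typename T, typename U>
struct demo_is_same
{ static constexpr bool value = __is_same(T, U); };
#else
// Library fallback: primary template plus a partial specialization.
template<typename T, typename U>
struct demo_is_same
{ static constexpr bool value = false; };

template<typename T>
struct demo_is_same<T, T>
{ static constexpr bool value = true; };
#endif
```

Compiling the same file with -DDEMO_NO_BUILTIN_TRAITS should select the
fallback without touching the source, which is the switch the patch is
after.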



Re: [PATCH] c++: -Wmissing-field-initializers and empty class [PR110064]

2023-07-19 Thread Jason Merrill via Gcc-patches

On 7/19/23 15:20, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

OK.  We might also improve the diagnostic for base classes, perhaps by
teaching dump_simple_decl about DECL_FIELD_IS_BASE?



-- >8 --

Let's suppress -Wmissing-field-initializers for empty classes.

Here I don't think I need the usual COMPLETE_TYPE_P/dependent_type_p
checks.

PR c++/110064

gcc/cp/ChangeLog:

* typeck2.cc (process_init_constructor_record): Don't emit
-Wmissing-field-initializers for empty classes.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wmissing-field-initializers-3.C: New test.
---
  gcc/cp/typeck2.cc |  3 +-
  .../warn/Wmissing-field-initializers-3.C  | 48 +++
  2 files changed, 50 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 1c204c8612b..582a73bb053 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -1874,7 +1874,8 @@ process_init_constructor_record (tree type, tree init, 
int nested, int flags,
 to zero.  */
  if ((complain & tf_warning)
  && !cp_unevaluated_operand
- && !EMPTY_CONSTRUCTOR_P (init))
+ && !EMPTY_CONSTRUCTOR_P (init)
+ && !is_really_empty_class (fldtype, /*ignore_vptr*/false))
warning (OPT_Wmissing_field_initializers,
 "missing initializer for member %qD", field);
  
diff --git a/gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C b/gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C

new file mode 100644
index 000..a8d75b92bd1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wmissing-field-initializers-3.C
@@ -0,0 +1,48 @@
+// PR c++/110064
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wmissing-field-initializers" }
+
+struct B { };
+struct D : B {
+int x;
+int y;
+};
+
+struct E {
+  int x;
+  int y;
+  B z;
+};
+
+template struct X { };
+
+template
+struct F {
+  int i;
+  int j;
+  X x;
+};
+
+int
+main ()
+{
+  D d = {.x=1, .y=2}; // { dg-bogus "missing" }
+  (void)d;
+  E e = {.x=1, .y=2}; // { dg-bogus "missing" }
+  (void)e;
+  F f = {.i=1, .j=2 }; // { dg-bogus "missing" }
+  (void)f;
+}
+
+template
+void fn ()
+{
+  F f = {.i=1, .j=2 }; // { dg-bogus "missing" }
+  (void)f;
+}
+
+void
+g ()
+{
+  fn ();
+}

base-commit: 2971ff7b1d564ac04b537d907c70e6093af70832




Re: [PATCH] c++: -Wmissing-field-initializers and empty class [PR110064]

2023-07-19 Thread Marek Polacek via Gcc-patches
On Wed, Jul 19, 2023 at 03:36:49PM -0400, Jason Merrill wrote:
> On 7/19/23 15:20, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> OK.  We might also improve the diagnostic for base classes, perhaps by
> teaching dump_simple_decl about DECL_FIELD_IS_BASE?

As in, instead of "D::" emit "D::B"?  Good idea.  I suppose
I could do that; Barry's testcase without this patch looks like a good
test case.  Thanks,

Marek



[x86_64 PATCH] More TImode parameter passing improvements.

2023-07-19 Thread Roger Sayle

This patch is the next piece of a solution to the x86_64 ABI issues in
PR 88873.  This splits the *concat3_3 define_insn_and_split
into two patterns, a TARGET_64BIT *concatditi3_3 and a !TARGET_64BIT
*concatsidi3_3.  This allows us to add an additional alternative to
the 64-bit version, enabling the register allocator to perform this
operation using SSE registers, which is implemented/split after reload
using vec_concatv2di.
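
For readers less familiar with the RTL, the operation these patterns
describe (a zero-extended high half shifted left by 64, combined with a
zero-extended low half) corresponds roughly to the source-level expression
sketched below.  The function name concat_di and the u128 typedef are
hypothetical, unsigned __int128 is a GNU extension assumed to be available
on the 64-bit target, and whether a given compiler actually matches this
code to *concatditi3_3 is not something the sketch can promise.

```cpp
#include <cstdint>

// GNU extension; __extension__ keeps -pedantic quiet about it.
__extension__ typedef unsigned __int128 u128;

// Build a 128-bit value from two 64-bit halves: HI lands in bits 64..127,
// LO in bits 0..63.  Both halves are zero-extended before combining.
inline u128 concat_di (std::uint64_t hi, std::uint64_t lo)
{
  return ((u128) hi << 64) | (u128) lo;
}
```

Since the shifted high half and the zero-extended low half share no set
bits, using + instead of | gives the same result, which fits the pattern
accepting either form through any_or_plus.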

To demonstrate the improvement, the test case from PR88873:

typedef struct { double x, y; } s_t;

s_t foo (s_t a, s_t b, s_t c)
{
  return (s_t){ __builtin_fma (a.x, b.x, c.x), __builtin_fma (a.y, b.y, c.y) };
}

when compiled with -O2 -march=cascadelake, currently generates:

foo:    vmovq   %xmm2, -56(%rsp)
        movq    -56(%rsp), %rax
        vmovq   %xmm3, -48(%rsp)
        vmovq   %xmm4, -40(%rsp)
        movq    -48(%rsp), %rcx
        vmovq   %xmm5, -32(%rsp)
        vmovq   %rax, %xmm6
        movq    -40(%rsp), %rax
        movq    -32(%rsp), %rsi
        vpinsrq $1, %rcx, %xmm6, %xmm6
        vmovq   %xmm0, -24(%rsp)
        vmovq   %rax, %xmm7
        vmovq   %xmm1, -16(%rsp)
        vmovapd %xmm6, %xmm2
        vpinsrq $1, %rsi, %xmm7, %xmm7
        vfmadd132pd     -24(%rsp), %xmm7, %xmm2
        vmovapd %xmm2, -56(%rsp)
        vmovsd  -48(%rsp), %xmm1
        vmovsd  -56(%rsp), %xmm0
        ret

with this change, we avoid many of the reloads via memory,

foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa %xmm7, -24(%rsp)
        vmovdqa %xmm6, %xmm1
        movq    -16(%rsp), %rax
        vpinsrq $1, %rax, %xmm7, %xmm4
        vmovapd %xmm4, %xmm6
        vfmadd132pd     %xmm1, %xmm2, %xmm6
        vmovapd %xmm6, -24(%rsp)
        vmovsd  -16(%rsp), %xmm1
        vmovsd  -24(%rsp), %xmm0
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-07-19  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Don't call
force_reg, to use SUBREG rather than create a new pseudo when
inserting DFmode fields into TImode with insvti_{high,low}part.
(*concat3_3): Split into two define_insn_and_split...
(*concatditi3_3): 64-bit implementation.  Provide alternative
that allows register allocation to use SSE registers that is
split into vec_concatv2di after reload.
(*concatsidi3_3): 32-bit implementation.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr88873.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index f9b0dc6..9c3febe 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -558,7 +558,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  op0 = SUBREG_REG (op0);
  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
  if (mode == DFmode)
-   op1 = force_reg (DImode, gen_lowpart (DImode, op1));
+   op1 = gen_lowpart (DImode, op1);
  op1 = gen_rtx_ZERO_EXTEND (TImode, op1);
  op1 = gen_rtx_IOR (TImode, tmp, op1);
}
@@ -570,7 +570,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  op0 = SUBREG_REG (op0);
  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
  if (mode == DFmode)
-   op1 = force_reg (DImode, gen_lowpart (DImode, op1));
+   op1 = gen_lowpart (DImode, op1);
  op1 = gen_rtx_ZERO_EXTEND (TImode, op1);
  op1 = gen_rtx_ASHIFT (TImode, op1, GEN_INT (64));
  op1 = gen_rtx_IOR (TImode, tmp, op1);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 47ea050..8c54aa5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12408,21 +12408,47 @@
   DONE;
 })
 
-(define_insn_and_split "*concat3_3"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r")
-   (any_or_plus:
- (ashift:
-   (zero_extend:
- (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m"))
+(define_insn_and_split "*concatditi3_3"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r,x")
+   (any_or_plus:TI
+ (ashift:TI
+   (zero_extend:TI
+ (match_operand:DI 1 "nonimmediate_operand" "r,m,r,m,x"))
(match_operand:QI 2 "const_int_operand"))
- (zero_extend:
-   (match_operand:DWIH 3 "nonimmediate_operand" "r,r,m,m"]
-  "INTVAL (operands[2]) ==  * BITS_PER_UNIT"
+ (zero_extend:TI
+   (match_operand:DI 3 "nonimmediate_operand" "r,r,m,m,0"]
+  "TARGET_64BIT
+   && INTVAL (operands[2]) == 64"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  if (SSE_REG_P (operands[0]))
+{
+  rtx tmp = gen_rtx_REG (V2DImode, REGNO (operands[0]));
+  emit_insn (gen_vec_concatv2di (tmp,

[PATCH] libstdc++: Fix preprocessor conditions for std::from_chars [PR109921]

2023-07-19 Thread Jonathan Wakely via Gcc-patches
I'm testing this patch for gcc-13 as a better fix for PR109921, without
the breakage that r14-1431-g7037e7b6e4ac41 caused on trunk.

If testing goes well I'll push this before the 13.2 release candidate.

-- >8 --

We use the from_chars_strtod function with __strtof128 to read a
_Float128 value, but from_chars_strtod is not defined unless uselocale
is available. This can lead to compilation failures for some targets,
because we try to define the _Float128 overload in terms of a
non-existing from_chars_strtod function.

Only try to use __strtof128 if uselocale is available and therefore we
can use the from_chars_strtod function template.
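
The shape of the fix can be sketched in isolation as below.  This is a simplified stand-in, not the libstdc++ source: every SKETCH_* macro and the parse_* functions are hypothetical names chosen for illustration, and std::strtold merely stands in for a wider conversion routine.  The point is that the dependent feature macro is only defined when the helper template it relies on is also going to be defined.

```cpp
#include <cstdlib>

// Hypothetical configuration macros, for this sketch only.
#define SKETCH_HAVE_USELOCALE 1
#define SKETCH_HAVE_WIDE_STRTO 1

#if SKETCH_HAVE_USELOCALE
# define SKETCH_USE_STRTOD 1
#endif

// Dependent feature: enabled only if the helper it needs is enabled too.
#if SKETCH_HAVE_WIDE_STRTO && defined(SKETCH_USE_STRTOD)
# define SKETCH_USE_WIDE 1
#endif

#ifdef SKETCH_USE_STRTOD
// Generic helper, only available when SKETCH_USE_STRTOD is defined.
template<typename T, typename Conv>
bool parse_with(const char* s, T& out, Conv conv) {
  char* end = nullptr;
  T v = conv(s, &end);
  if (end == s)
    return false;  // nothing was consumed
  out = v;
  return true;
}
#endif

bool parse_double(const char* s, double& out) {
#ifdef SKETCH_USE_STRTOD
  return parse_with(s, out,
                    [](const char* p, char** e) { return std::strtod(p, e); });
#else
  (void)s; (void)out;
  return false;    // no fallback in this sketch
#endif
}

#ifdef SKETCH_USE_WIDE
// Compiled only when parse_with is guaranteed to exist.
bool parse_wide(const char* s, long double& out) {
  return parse_with(s, out,
                    [](const char* p, char** e) { return std::strtold(p, e); });
}
#endif
```

If SKETCH_HAVE_USELOCALE were 0 in this sketch, parse_wide would simply not be declared, rather than failing to compile against a missing helper.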

This is a simpler change than r14-1431-g7037e7b6e4ac41 on trunk, because
that caused unwanted ABI regressions (PR libstdc++/110077).

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc (USE_STRTOF128_FOR_FROM_CHARS):
Only define when USE_STRTOD_FOR_FROM_CHARS is also defined.
(USE_STRTOD_FOR_FROM_CHARS): Do not undefine when long double is
binary64.
(from_chars(const char*, const char*, double&, chars_format)):
Check __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ here.
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc b/libstdc++-v3/src/c++17/floating_from_chars.cc
index 78b9d92cdc0..b3061bdca01 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -64,7 +64,7 @@
 // strtold for __ieee128
 extern "C" __ieee128 __strtoieee128(const char*, char**);
 #elif __FLT128_MANT_DIG__ == 113 && __LDBL_MANT_DIG__ != 113 \
-  && defined(__GLIBC_PREREQ)
+  && defined(__GLIBC_PREREQ) && defined(USE_STRTOD_FOR_FROM_CHARS)
 #define USE_STRTOF128_FOR_FROM_CHARS 1
 extern "C" _Float128 __strtof128(const char*, char**)
   __asm ("strtof128")
@@ -77,10 +77,6 @@ extern "C" _Float128 __strtof128(const char*, char**)
 #if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
 && __SIZE_WIDTH__ >= 32
 # define USE_LIB_FAST_FLOAT 1
-# if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
-// No need to use strtold.
-#  undef USE_STRTOD_FOR_FROM_CHARS
-# endif
 #endif
 
 #if USE_LIB_FAST_FLOAT
@@ -1211,7 +1207,7 @@ from_chars_result
 from_chars(const char* first, const char* last, long double& value,
   chars_format fmt) noexcept
 {
-#if ! USE_STRTOD_FOR_FROM_CHARS
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ || !defined USE_STRTOD_FOR_FROM_CHARS
   // Either long double is the same as double, or we can't use strtold.
   // In the latter case, this might give an incorrect result (e.g. values
   // out of range of double give an error, even if they fit in long double).
@@ -1280,6 +1276,7 @@ _ZSt10from_charsPKcS0_RDF128_St12chars_format(const char* first,
  chars_format fmt) noexcept
 __attribute__((alias ("_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format")));
 #elif defined(USE_STRTOF128_FOR_FROM_CHARS)
+// Overload for _Float128 is not defined inline in <charconv>, define it here.
 from_chars_result
 from_chars(const char* first, const char* last, _Float128& value,
   chars_format fmt) noexcept
-- 
2.41.0



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-19 Thread Nathan Sidwell via Gcc-patches

On 7/18/23 20:01, Ben Boeckel wrote:

On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e., check the 'outermost'
arg).


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return


GCC is neither of these descriptions.  A CMI does not contain the transitive
closure of its imports.  It contains an import table.  That table lists the
transitive closure of its imports (it needs that closure to do remapping), and
that table contains the CMI pathnames of the direct imports.  Those pathnames
are absolute if the mapper provided an absolute path, or relative to the CMI
repo otherwise.


The rationale here is that if you're building a CMI, Foo, which imports a bunch 
of modules, those imported CMIs will have the same (relative) location in this 
compilation and in compilations importing Foo (why would you move them?) Note 
this is NOT inhibiting relocatable builds, because of the CMI repo.
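
To make the description above concrete, here is a small sketch of how such an import table could be modelled.  It is an assumption-laden illustration, not GCC's actual data structure: ImportEntry, resolve_cmi and direct_import_files are hypothetical names, and the only behaviour taken from the text is that pathnames are stored for direct imports only and are either absolute or relative to the CMI repo.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One row of the (hypothetical) import table.  Every module in the
// transitive closure gets a row, but only direct imports carry a path.
struct ImportEntry {
  std::string module_name;
  std::optional<fs::path> cmi_path;
};

// A recorded pathname is used as-is when absolute, otherwise it is
// interpreted relative to the CMI repo directory.
fs::path resolve_cmi(const fs::path& repo, const fs::path& recorded) {
  return recorded.is_absolute() ? recorded : repo / recorded;
}

// Collect the on-disk locations of the direct imports only.
std::vector<fs::path> direct_import_files(const fs::path& repo,
                                          const std::vector<ImportEntry>& table) {
  std::vector<fs::path> out;
  for (const auto& entry : table)
    if (entry.cmi_path)
      out.push_back(resolve_cmi(repo, *entry.cmi_path));
  return out;
}
```

Because relative pathnames are resolved against the repo at read time, moving the repo as a whole does not invalidate the table, which matches the relocatable-build remark above.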




Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.


Please don't use pejorative terms like 'fat' and 'thin'.




But wouldn't the transitive modules be dependencies of the direct
imports, so (re)building the direct imports would first require building
the transitive modules anyway?  Expressing the transitive closure of
dependencies for each importer seems redundant when it can be easily
derived from the direct dependencies of each module.
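
As an illustration of that last point, the transitive closure can be recovered from per-module direct dependencies with an ordinary graph walk.  The sketch below assumes a build tool already holds the direct imports in a map; the Graph alias and transitive_imports are hypothetical names, not part of GCC or of any build system.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// module name -> modules it imports directly
using Graph = std::map<std::string, std::vector<std::string>>;

// Return every module reachable from root through direct imports
// (root itself is not included unless it is part of a cycle).
std::set<std::string> transitive_imports(const Graph& direct,
                                         const std::string& root) {
  std::set<std::string> seen;
  std::vector<std::string> stack{root};
  while (!stack.empty()) {
    std::string cur = std::move(stack.back());
    stack.pop_back();
    auto it = direct.find(cur);
    if (it == direct.end())
      continue;                       // unknown module: no recorded imports
    for (const auto& dep : it->second)
      if (seen.insert(dep).second)    // first visit only
        stack.push_back(dep);
  }
  return seen;
}
```

With the a/b/use example from earlier in the thread, the direct edges use -> b and b -> a are enough to conclude that use depends on both a and b.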


I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.

--Ben


--
Nathan Sidwell



[PING 2] [PATCH] Less warnings for parameters declared as arrays [PR98541, PR98536]

2023-07-19 Thread Martin Uecker via Gcc-patches



Ok for gcc-14 now?


Am Dienstag, dem 04.04.2023 um 19:31 -0600 schrieb Jeff Law:
> 
> 
> On 4/3/23 13:34, Martin Uecker via Gcc-patches wrote:
> > 
> > 
> > With the relatively new warnings (11..) affecting VLA bounds,
> > I now get a lot of false positives with -Wall. In general, I find
> > the new warnings very useful, but they seem a bit too
> > aggressive and some minor tweaks are needed, otherwise they are
> > too noisy.  This patch suggests two changes:
> > 
> > 1. For VLA bounds non-null is implied only when 'static' is
> > used (similar to clang) and not already when a bound > 0 is
> > specified:
> > 
> > int foo(int n, char buf[static n]);
> > 
> > foo(10, 0); // warning with 'static' but not without.
> > 
> > 
> > (It also seems problematic to require a size of 0 to indicate
> > that the pointer may be null, because 0 is not allowed in
> > ISO C as a size. It is also inconsistent to how arrays with
> > static bound behave.)
> > 
> > There seems to be agreement about this change in PR98541.
> > 
> > 
> > 2. GCC always warns when the number of unspecified
> > bounds is different between two declarations:
> > 
> > int foo(int n, char buf[*]);
> > int foo(int n, char buf[n]);
> > 
> > or
> > 
> > int foo(int n, char buf[n]);
> > int foo(int n, char buf[*]);
> > 
> > But the first version is useful if the size expression
> > can not be specified in a header (e.g. because it uses
> > a macro or variable not available there) and there is
> > currently no easy way to avoid this.  The warning for
> > both cases was by design, but I suggest limiting the
> > warning to the second case.
> > 
> > Note that the logic currently applied by GCC is too
> > simplistic anyway, as GCC does not warn for
> > 
> > int foo(int x, int y, double m[*][y]);
> > int foo(int x, int y, double m[x][*]);
> > 
> > because the number of specified / unspecified bounds
> > is the same.  So I suggest to go with the attached
> > patch now and add more precise warnings later
> > if there is more experience with these warnings
> > in general and if this then still seems desirable.
> > 
> > 
> > Martin
> > 
> > 
> >  Less warnings for parameters declared as arrays [PR98541,
> > PR98536]
> >  
> >  To avoid false positives, tune the warnings for parameters
> > declared
> >  as arrays with size expressions.  Only warn about null
> > arguments with
> >  'static'.  Also do not warn when more bounds are specified in
> > the new
> >  declaration than before.
> >  
> >  PR c/98541
> >  PR c/98536
> >  
> >  c-family/
> >  * c-warn.cc (warn_parm_array_mismatch): Do not warn if
> > more
> >  bounds are specified.
> >  
> >  gcc/
> >  * gimple-ssa-warn-access.cc
> >    (pass_waccess::maybe_check_access_sizes): For VLA
> > bounds
> >  in parameters, only warn about null pointers with
> > 'static'.
> >  
> >  gcc/testsuite:
> >  * gcc.dg/Wnonnull-4: Adapt test.
> >  * gcc.dg/Wstringop-overflow-40.c: Adapt test.
> >  * gcc.dg/Wvla-parameter-4.c: Adapt test.
> >  * gcc.dg/attr-access-2.c: Adapt test.
> Neither appears to be a regression.  Seems like it should defer to
> gcc-14.
> jeff




[PATCH] c++: Improve printing of base classes [PR110745]

2023-07-19 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

This patch changes

  warning: missing initializer for member 'D::<anonymous>' [-Wmissing-field-initializers]

to

  warning: missing initializer for member 'D::B' [-Wmissing-field-initializers]

PR c++/110745

gcc/cp/ChangeLog:

* error.cc (dump_simple_decl): Print base class name.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/base.C: New test.
---
 gcc/cp/error.cc|  2 ++
 gcc/testsuite/g++.dg/diagnostic/base.C | 16 
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/base.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 31319aa9e87..8a5219a68a1 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -1177,6 +1177,8 @@ dump_simple_decl (cxx_pretty_printer *pp, tree t, tree type, int flags)
 }
   else if (DECL_DECOMPOSITION_P (t))
 pp_string (pp, M_("<structured bindings>"));
+  else if (TREE_CODE (t) == FIELD_DECL && DECL_FIELD_IS_BASE (t))
+dump_type (pp, TREE_TYPE (t), flags);
   else
 pp_string (pp, M_("<anonymous>"));
 
diff --git a/gcc/testsuite/g++.dg/diagnostic/base.C b/gcc/testsuite/g++.dg/diagnostic/base.C
new file mode 100644
index 000..1540414072e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/base.C
@@ -0,0 +1,16 @@
+// PR c++/110745
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wmissing-field-initializers" }
+
+struct B { int i; };
+struct D : B {
+int x;
+int y;
+};
+
+int
+main ()
+{
+  D d = {.x=1, .y=2}; // { dg-warning "missing initializer for member .D::B." }
+  (void)d;
+}

base-commit: b1ae46bdd19fc2aaea41bc894168bdaf4799be80
-- 
2.41.0


