[PATCH] Fix gcc.dg/vect/bb-slp-pr65935.c FAIL with AVX after recent change

2021-09-28 Thread Richard Biener via Gcc-patches
This avoids vectorization with vectors wider than V2DF, which disturbs
the ability to consistently check the vectorization result now that we
also vectorize the V2DF tail of a V4DF vectorization variant.

Tested on x86_64-unknown-linux-gnu, pushed.

2021-09-28  Richard Biener  

* gcc.dg/vect/bb-slp-pr65935.c: Prefer 128bit vectorization
on x86.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 5d80f560f56..ee121364910 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-O3" } */
 /* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-mprefer-vector-width=128" { target x86_64-*-* i?86-*-* } } */
 
 #include "tree-vect.h"
 
-- 
2.31.1


Re: [r12-3899 Regression] FAIL: gcc.dg/strlenopt-13.c scan-tree-dump-times strlen1 "memcpy \\(" 7 on Linux/x86_64

2021-09-28 Thread Richard Biener via Gcc-patches
On Mon, 27 Sep 2021, sunil.k.pandey wrote:

> On Linux/x86_64,
> 
> d06dc8a2c73735e9496f434787ba4c93ceee5eea is the first bad commit
> commit d06dc8a2c73735e9496f434787ba4c93ceee5eea
> Author: Richard Biener 
> Date:   Mon Sep 27 13:36:12 2021 +0200
> 
> middle-end/102450 - avoid type_for_size for non-existing modes
> 
> caused
> 
> FAIL: gcc.dg/out-of-bounds-1.c  (test for warnings, line 12)
> FAIL: gcc.dg/pr78408-1.c scan-tree-dump-times fab1 "after previous" 17
> FAIL: gcc.dg/strlenopt-13.c scan-tree-dump-times strlen1 "memcpy \\(" 7

After the change the new memcpy inlining limit using MOVE_MAX * MOVE_RATIO
comes into play and ends up using an OImode move which previously was
disregarded as there's no __int256 standard type in the frontend
(but now we build such type anyway after verifying the mode exists and
it has move support).

For example gcc.dg/out-of-bounds-1.c which looks like

void ProjectOverlay(const float localTextureAxis[2], char *lump)
{
   const void *d = &localTextureAxis;
   int size = sizeof(float)*8 ;
   __builtin_memcpy( &lump[ 0 ], d, size );  /* { dg-warning "reading" } */
}

gets turned into

movq        %rdi, -8(%rsp)
vmovdqu64   -8(%rsp), %ymm31
vmovdqu64   %ymm31, (%rsi)

which I guess is good but then the diagnostic is no longer emitted
because -Wstringop-overread only applies to the builtin.  Usually
we avoid the folding in such a case but

  /* Detect out-of-bounds accesses without issuing warnings.
     Avoid folding out-of-bounds copies but to avoid false
     positives for unreachable code defer warning until after
     DCE has worked its magic.
     -Wrestrict is still diagnosed.  */
  if (int warning = check_bounds_or_overlap (as_a (stmt),
                                             dest, src, len, len,
                                             false, false))
    if (warning != OPT_Wrestrict)
      return false;

does not seem to trigger here.  Changing the testcase to

void ProjectOverlay(const float localTextureAxis[2], char *lump)
{
   const void *d = &localTextureAxis;
   int size = sizeof(float)*4 ;
   __builtin_memcpy( &lump[ 0 ], d, size );  /* { dg-warning "reading" } */
}

also fails to warn.
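Why both variants are expected to warn can be checked with a little arithmetic: because localTextureAxis is a parameter, its array type is adjusted to const float *, so d points at the pointer object itself, which is only sizeof (const float *) bytes. The following minimal C sketch (the helper names are hypothetical, not GCC code) compares that object size with the two requested copy sizes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of the object that `d` points to in the
   testcase.  A parameter declared `const float localTextureAxis[2]` is
   adjusted to `const float *`, so `&localTextureAxis` addresses the
   pointer variable, not a two-element float array.  */
static size_t source_object_size (void)
{
  return sizeof (const float *);
}

/* Hypothetical helper: bytes requested by the memcpy, i.e.
   sizeof (float) * n as in the two testcase variants (n = 8, n = 4).  */
static size_t copy_size (size_t n_floats)
{
  assert (n_floats > 0);
  return sizeof (float) * n_floats;
}

/* Nonzero when the copy reads past the end of the source object,
   which is the situation -Wstringop-overread is meant to report.  */
static int reads_out_of_bounds (size_t n_floats)
{
  return copy_size (n_floats) > source_object_size ();
}
```

On LP64 targets such as x86_64 the source object is 8 bytes, so both the 32-byte and the 16-byte copies read past it; on ILP32 it is 4 bytes and the conclusion is the same.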

Richard.


Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-28 Thread Hongyu Wang via Gcc-patches
> I'd put this new pattern in mmx.md to keep 64bit/32bit modes in
> mmx.md, similar to e.g. FMA patterns among others.

Yes, I put it after single-float patterns. Attached the patch I'm
going to check-in.
Thanks for your review.

Uros Bizjak wrote on Tue, Sep 28, 2021 at 2:27 PM:
>
> On Tue, Sep 28, 2021 at 6:48 AM Hongyu Wang  wrote:
> >
> > > ia32 ABI declares that __m64 values pass via MMX registers. Due to
> > > this, we are not able to fully disable MMX register usage, as is the
> > > case with x86_64. So, V4HFmode values will pass to functions via MMX
> > > registers on ia32 targets.
> > >
> > > So, there should be no additional define_insn, the addition to the
> > > existing MMXMODE mode iterator should be enough. V4HFmodes should be
> > > handled in the same way as e.g. V8QImode.
> > >
> > > This is not the case with 4-byte values, which should be passed using
> > > integer ABI.
> >
> > Thanks for the explanation, updated patch by removing the extra define_insn,
> > and drop V4HFmode from VALID_AVX512FP16_REG_MODE. Now v4hf would behave
> > same as v8qi.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> >
> > OK for master with the updated one?
>
> I'd put this new pattern in mmx.md to keep 64bit/32bit modes in
> mmx.md, similar to e.g. FMA patterns among others.
>
> OK with the eventual above change.
>
> Thanks,
> Uros.
>
> >
> > Uros Bizjak via Gcc-patches wrote on Mon, Sep 27, 2021 at 7:35 PM:
> > >
> > > On Mon, Sep 27, 2021 at 12:42 PM Hongyu Wang  
> > > wrote:
> > > >
> > > > Hi Uros,
> > > >
> > > > This patch intends to support V4HF/V2HF vector types and basic
> > > > operations.
> > > >
> > > > For 32-bit targets, a V4HF vector is passed the same way as the
> > > > __m64 type; V2HF is passed on the stack and returned in a GPR
> > > > since it is not specified by the ABI.
> > > >
> > > > We found that for 64-bit vectors on ia32, when MMX is disabled,
> > > > there seems to be no mov_internal, so we added a define_insn for
> > > > V4HF mode. We would very much appreciate it if you could explain
> > > > why 64-bit vectors are handled this way and give some advice.
> > >
> > > ia32 ABI declares that __m64 values pass via MMX registers. Due to
> > > this, we are not able to fully disable MMX register usage, as is the
> > > case with x86_64. So, V4HFmode values will pass to functions via MMX
> > > registers on ia32 targets.
> > >
> > > So, there should be no additional define_insn, the addition to the
> > > existing MMXMODE mode iterator should be enough. V4HFmodes should be
> > > handled in the same way as e.g. V8QImode.
> > >
> > > This is not the case with 4-byte values, which should be passed using
> > > integer ABI.
> > >
> > > Uros.
> > >
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> > > >
> > > > OK for master?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/102230
> > > > * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add
> > > > V4HF and V2HF mode check.
> > > > (VALID_SSE2_REG_VHF_MODE): Likewise.
> > > > (VALID_MMX_REG_MODE): Likewise.
> > > > (SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with
> > > > vector mode condition.
> > > > * config/i386/i386.c (classify_argument): Parse V4HF/V2HF
> > > > via sse regs.
> > > > (function_arg_32): Add V4HFmode.
> > > > (function_arg_advance_32): Likewise.
> > > > * config/i386/i386.md (mode): Add V4HF/V2HF.
> > > > (MODE_SIZE): Likewise.
> > > > * config/i386/mmx.md (MMXMODE): Add V4HF mode.
> > > > (V_32): Add V2HF mode.
> > > > (*mov_internal): Adjust sse alternatives to support
> > > > V4HF mode vector move.
> > > > (*mov_internal): Adjust sse alternatives
> > > > to support V2HF mode move.
> > > > * config/i386/sse.md (VHF_32_64): New mode iterator.
> > > > (3): New define_insn for add/sub/mul/div.
> > > > (*movv4hf_internal_sse): New define_insn for -mno-mmx and -msse.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/102230
> > > > * gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail.
> > > > * gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto.
> > > > * gcc.target/i386/avx512fp16-truncvnhf.c: Ditto.
> > > > * gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test.
> > > > * gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto.
> > > > * gcc.target/i386/pr102230.c: Ditto.
> > > > ---
> > > >  gcc/config/i386/i386.c|  4 +
> > > >  gcc/config/i386/i386.h| 12 ++-
> > > >  gcc/config/i386/i386.md   |  5 +-
> > > >  gcc/config/i386/mmx.md| 27 ---
> > > >  gcc/config/i386/sse.md| 49 
> > > >  .../i386/avx512fp16-64-32-vecop-1.c   | 30 
> > > >  .../i386/avx512fp16-64-32-vecop-2.c   | 75 +++
> > > >  .../gcc.target/i38

Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Hongtao Liu via Gcc-patches
On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 27 Sep 2021, sunil.k.pandey wrote:
>
> > On Linux/x86_64,
> >
> > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > Author: Richard Biener 
> > Date:   Wed Nov 18 09:36:57 2020 +0100
> >
> > Allow different vector types for stmt groups
> >
> > caused
> >
> > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 1
> > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic block" 1
>
> This shows that it is maybe a bad idea to support V2SImode vectorization
> with -m32 when we refuse to implement even plus.
>
> OTOH it's just the mode that's available, autovectorize_vector_modes
> doesn't include the corresponding mode but we still pick it up via
> the related vector mode for group-size == 2.
>
> > FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "optimized: basic block" 10
> > FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: basic block" 10
>
> We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
> adjust the testcase to prefer SSE.
>
> > FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4
>
> With -march=cascadelake we get
>
> vpermpd $68, c, %ymm0
> vpermpd $238, c, %ymm0
>
> instead of
>
> vmovapd c, %ymm1
> vinsertf128 $1, %xmm1, %ymm1, %ymm0
> vperm2f128  $49, %ymm1, %ymm1, %ymm0
>
> what's a way to disallow additional -march= from taking effect?  It's
I usually add -mno-{avx,avx512f} and -mtune=generic or sometimes
-mprefer-vector-width=* to the testcases.
Or use (?:vinsertf128|vpermpd) in the scan pattern to accept alternative instructions.
> really impossible to cater for all possible ISA variants in these kind
> of testcases.
An additional -march=cascadelake option can sometimes find real regressions.
>
> Richard.



-- 
BR,
Hongtao


Re: [PATCH] Control all jump threading passes with -fjump-threads.

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, Sep 28, 2021 at 8:29 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 9/28/2021 12:17 AM, Aldy Hernandez wrote:
> > On Tue, Sep 28, 2021 at 3:46 AM Jeff Law  wrote:
> >>
> >>
> >> On 9/27/2021 9:00 AM, Aldy Hernandez wrote:
> >>> Last year I mentioned that -fthread-jumps was being ignored by the
> >>> majority of our jump threading passes, and Jeff said he'd be in favor
> >>> of fixing this.
> >>>
> >>> This patch remedies the situation, but it does change existing behavior.
> >>> Currently -fthread-jumps is only enabled for -O2, -O3, and -Os.  This
> >>> means that even if we restricted all jump threading passes with
> >>> -fthread-jumps, DOM jump threading would still seep through since it
> >>> runs at -O1.
> >>>
> >>> I propose this patch, but it does mean that DOM jump threading would
> >>> have to be explicitly enabled with -O1 -fthread-jumps.  An
> >>> alternative would be to also offer a specific -fno-dom-threading, but
> >>> that seems icky.
> >>>
> >>> OK pending tests?
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
> >>>flag_thread_jumps.
> >>>(pass_early_thread_jumps::gate): Same.
> >>>* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
> >>>Return if !flag_thread_jumps.
> >>>* tree-ssa-threadupdate.c
> >>>(jt_path_registry::register_jump_thread): Assert that
> >>>flag_thread_jumps is true.
> >> OK.  Clearly this is going to be even better once we disentangle
> >> threading from DOM.
> > Annoyingly, I had to tweak a few more tests, particularly some
> > -Wuninitialized -O1 ones which seem to depend on DOM jump threading to
> > give proper diagnostics.  It seems that every change to jump threading
> > needs tweaks to the Wuninitialized code :-(.
> Well, a lot of jump threading is there to help eliminate false positives
> from Wuninitialized by eliminating paths through the CFG that we can
> prove never execute at runtime.  So that's not a huge surprise.

I would have suggested enabling -fthread-jumps at -O1 instead
and eventually just adding && flag_expensive_optimizations to the
use in cfgcleanup.c to restrict that to -O2+.
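The suggestion can be modelled as a pair of gates. This is a hypothetical sketch, not GCC code: struct opt_flags and the helper functions are invented for illustration, -Os is not modelled, and the only facts taken over are the level at which each flag turns on (-fthread-jumps at -O1 under the suggestion, -fexpensive-optimizations at -O2 and up as documented).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two relevant option flags.  */
struct opt_flags
{
  bool thread_jumps;            /* -fthread-jumps */
  bool expensive_optimizations; /* -fexpensive-optimizations */
};

/* Flag defaults per optimization level under the suggestion:
   -fthread-jumps from -O1, -fexpensive-optimizations from -O2.  */
static struct opt_flags flags_for_level (int olevel)
{
  assert (olevel >= 0 && olevel <= 3);
  struct opt_flags f;
  f.thread_jumps = olevel >= 1;
  f.expensive_optimizations = olevel >= 2;
  return f;
}

/* Tree-level threaders (DOM, backward threader) gate on the flag alone.  */
static bool tree_threading_enabled (struct opt_flags f)
{
  return f.thread_jumps;
}

/* The cfgcleanup use additionally requires expensive optimizations,
   which keeps it at -O2 and above.  */
static bool cfgcleanup_threading_enabled (struct opt_flags f)
{
  return f.thread_jumps && f.expensive_optimizations;
}
```

With this split, -O1 keeps DOM threading (and the -Wuninitialized behavior that depends on it) without pulling the cfgcleanup threading down to -O1.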

Richard.

> jeff


Re: [PATCH] [GIMPLE] Simplify (_Float16) ceil ((double) x) to .CEIL (x) when available.

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, Sep 28, 2021 at 4:01 AM Hongtao Liu  wrote:
>
> On Mon, Sep 27, 2021 at 8:53 PM Richard Biener
>  wrote:
> >
> > On Fri, Sep 24, 2021 at 1:26 PM liuhongt  wrote:
> > >
> > > Hi:
> > >   Related discussion in [1] and PR.
> > >
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >   Ok for trunk?
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/102464
> > > * config/i386/i386.c (ix86_optab_supported_p):
> > > Return true for HFmode.
> > > * match.pd: Simplify (_Float16) ceil ((double) x) to
> > > __builtin_ceilf16 (a) when a is _Float16 type and
> > > direct_internal_fn_supported_p.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr102464.c: New test.
> > > ---
> > >  gcc/config/i386/i386.c   | 20 +++-
> > >  gcc/match.pd | 28 +
> > >  gcc/testsuite/gcc.target/i386/pr102464.c | 39 
> > >  3 files changed, 79 insertions(+), 8 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464.c
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index ba89e111d28..3767fe9806d 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -23582,20 +23582,24 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
> > >return opt_type == OPTIMIZE_FOR_SPEED;
> > >
> > >  case rint_optab:
> > > -  if (SSE_FLOAT_MODE_P (mode1)
> > > - && TARGET_SSE_MATH
> > > - && !flag_trapping_math
> > > - && !TARGET_SSE4_1)
> > > +  if (mode1 == HFmode)
> > > +   return true;
> > > +  else if (SSE_FLOAT_MODE_P (mode1)
> > > +  && TARGET_SSE_MATH
> > > +  && !flag_trapping_math
> > > +  && !TARGET_SSE4_1)
> > > return opt_type == OPTIMIZE_FOR_SPEED;
> > >return true;
> > >
> > >  case floor_optab:
> > >  case ceil_optab:
> > >  case btrunc_optab:
> > > -  if (SSE_FLOAT_MODE_P (mode1)
> > > - && TARGET_SSE_MATH
> > > - && !flag_trapping_math
> > > - && TARGET_SSE4_1)
> > > +  if (mode1 == HFmode)
> > > +   return true;
> > > +  else if (SSE_FLOAT_MODE_P (mode1)
> > > +  && TARGET_SSE_MATH
> > > +  && !flag_trapping_math
> > > +  && TARGET_SSE4_1)
> > > return true;
> > >return opt_type == OPTIMIZE_FOR_SPEED;
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index a9791ceb74a..9ccec8b6ce3 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -6191,6 +6191,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > (froms (convert float_value_p@0))
> > > (convert (tos @0)
> > >
> > > +#if GIMPLE
> > > +(match float16_value_p
> > > + @0
> > > + (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node)))
> > > +(for froms (BUILT_IN_TRUNCL BUILT_IN_TRUNC BUILT_IN_TRUNCF
> > > +   BUILT_IN_FLOORL BUILT_IN_FLOOR BUILT_IN_FLOORF
> > > +   BUILT_IN_CEILL BUILT_IN_CEIL BUILT_IN_CEILF
> > > +   BUILT_IN_ROUNDEVENL BUILT_IN_ROUNDEVEN BUILT_IN_ROUNDEVENF
> > > +   BUILT_IN_ROUNDL BUILT_IN_ROUND BUILT_IN_ROUNDF
> > > +   BUILT_IN_NEARBYINTL BUILT_IN_NEARBYINT BUILT_IN_NEARBYINTF
> > > +   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF)
> >
> > We do have patterns that convert (truncl (convert floatval)) to
> > (float)truncf (val).  Yours does
> > (_Float16)trunc ((double) float16) -> truncF16 (float16); doesn't it
> > make sense to have trunc ((double) float16) -> (double)truncF16
> > (float16) as well?
> >
> > Why do you conditionalize on GIMPLE here?
> To avoid
> error: ‘direct_internal_fn_supported_p’ was not declared in this scope

You probably have to amend generic-match-head.c by adding the relevant include.
But maybe we don't want to feed too many internal function calls into the
early GENERIC ... so OK to leave as-is then.

> >
> > That said, I wonder whether we can somehow address pattern explosion here,
> > eliding the outer (convert ...) from the match would help a bit already.
> >
> > The related patterns use optimize && canonicalize_math_p as well btw., not
> > sure whether either is appropriate here since there are no _Float16 math
> > functions available.
> Yes, that's why I didn't follow the existing pattern.  I think we can
> add optimize back to the condition, but not canonicalize_math_p ()
> since there's no math function for _Float16.
> Also, without the outer (convert ...), transforming ceil ((double) a)
> to (double) __builtin_ceilf16 (a) looks like a canonicalization rather
> than an optimization.

Yes, that's likely why we have canonicalize_math_p () on the others.

So I think the original patch is OK.

Thanks,
Richard.

> >
> > > + tos (IFN_TRUNC IFN_TRUNC IFN_TRUNC
> > > + IFN_FLOOR IFN_FLOOR I

[committed] gfortran.dg/include_15.f90: Add dg-prune-output [PR102500]

2021-09-28 Thread Tobias Burnus

It turned out that one warning in the output depended on details of how
the testsuite is run. Thus, simply ignore that warning, which only
appears in some runs, by using dg-prune-output.

Those subtle differences in test runs which make a test fail
or not fail, depending on how GCC's testsuite is run, make
life less boring ...

Committed as r12-3912-gce450af5087b95001b003184b8ecc2c9bbf65378.

Tobias

commit ce450af5087b95001b003184b8ecc2c9bbf65378
Author: Tobias Burnus 
Date:   Tue Sep 28 09:49:12 2021 +0200

gfortran.dg/include_15.f90: Add dg-prune-output [PR102500]

gcc/testsuite/
PR fortran/102500
* gfortran.dg/include_15.f90: Add 'dg-prune-output' to prune
-Wmissing-include-dirs output printed or not depending on
how the testsuite is run.

diff --git a/gcc/testsuite/gfortran.dg/include_15.f90 b/gcc/testsuite/gfortran.dg/include_15.f90
index 068dcef5826..18d91f6cd32 100644
--- a/gcc/testsuite/gfortran.dg/include_15.f90
+++ b/gcc/testsuite/gfortran.dg/include_15.f90
@@ -4,3 +4,6 @@ end
 ! { dg-warning " /fdaf/: No such file or directory" "" { target *-*-* } 0 }
 ! { dg-warning " bar: No such file or directory" "" { target *-*-* } 0 }
 ! { dg-warning " foo/bar: No such file or directory" "" { target *-*-* } 0 }
+
+! Depending how the testsuite is run, it may or may not print the following warning:
+! { dg-prune-output "Warning: finclude: No such file or directory" }


Re: [PATCH] [i386] Support reduc_{plus, smax, smin, umax, min}_scal_v4hi.

2021-09-28 Thread Uros Bizjak via Gcc-patches
On Tue, Sep 28, 2021 at 8:42 AM liuhongt  wrote:
>
> Hi:
>   Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>   Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/102494
> * config/i386/i386-expand.c (emit_reduc_half): Handle V4HImode.
> * config/i386/mmx.md (reduc_plus_scal_v4hi): New.
> (reduc__scal_v4hi): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/mmx-reduce-op-1.c: New test.
> * gcc.target/i386/mmx-reduce-op-2.c: New test.

OK.

Thanks,
Uros.
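The V4HImode case of emit_reduc_half shifts the whole 64-bit vector right by i/2 bits through a V1DImode view, so each reduction step halves the active width. The sketch below is a plain C model of that scheme, assuming the caller iterates i from the vector width (64) down to twice the element width, halving each time; it is not the expander itself, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A uint64_t models a V4HI vector: lane k occupies bits 16k..16k+15
   (lane 0 in the low bits, as on little-endian x86).  */
typedef uint16_t (*lane_op) (uint16_t, uint16_t);

static uint16_t add16 (uint16_t a, uint16_t b) { return (uint16_t) (a + b); }
static uint16_t smax16 (uint16_t a, uint16_t b)
{ return (int16_t) a > (int16_t) b ? a : b; }
static uint16_t umin16 (uint16_t a, uint16_t b) { return a < b ? a : b; }

/* Apply OP lane by lane, standing in for addv4hi3, smaxv4hi3, etc.  */
static uint64_t lanewise (lane_op op, uint64_t x, uint64_t y)
{
  uint64_t r = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint16_t a = (uint16_t) (x >> (16 * lane));
      uint16_t b = (uint16_t) (y >> (16 * lane));
      r |= (uint64_t) op (a, b) << (16 * lane);
    }
  return r;
}

static uint64_t pack_v4hi (const uint16_t v[4])
{
  uint64_t r = 0;
  for (int lane = 0; lane < 4; lane++)
    r |= (uint64_t) v[lane] << (16 * lane);
  return r;
}

/* Halving reduction: shift the 64-bit value right by i/2 bits (like the
   V1DImode logical shift in the patch) and combine it with the current
   value.  Only lane 0 is meaningful at the end; the upper lanes hold
   don't-care values because the shift fills them with zeros.  */
static uint16_t reduce_v4hi (lane_op op, const uint16_t v[4])
{
  uint64_t vec = pack_v4hi (v);
  for (int i = 64; i > 16; i >>= 1)
    {
      uint64_t half = vec >> (i / 2);
      vec = lanewise (op, half, vec);
    }
  return (uint16_t) vec;  /* extract lane 0, like vec_extractv4hihi */
}
```

For {1, 2, 3, 4} the add reduction yields 10 after two steps: the first combines lanes 0/2 and 1/3, the second combines the two partial results.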

> ---
>  gcc/config/i386/i386-expand.c |  5 ++
>  gcc/config/i386/mmx.md| 36 
>  .../gcc.target/i386/mmx-reduce-op-1.c | 58 +++
>  .../gcc.target/i386/mmx-reduce-op-2.c | 25 
>  4 files changed, 124 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/mmx-reduce-op-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/mmx-reduce-op-2.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 94ac303585e..260e7fd32a8 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -16043,6 +16043,11 @@ emit_reduc_half (rtx dest, rtx src, int i)
>  case E_V2DFmode:
>tem = gen_vec_interleave_highv2df (dest, src, src);
>break;
> +case E_V4HImode:
> +  d = gen_reg_rtx (V1DImode);
> +  tem = gen_mmx_lshrv1di3 (d, gen_lowpart (V1DImode, src),
> +  GEN_INT (i / 2));
> +  break;
>  case E_V16QImode:
>  case E_V8HImode:
>  case E_V4SImode:
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index b0093778fc6..126a2dd4b7e 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -3931,6 +3931,42 @@ (define_expand "reduc_plus_scal_v8qi"
>DONE;
>  })
>
> +(define_expand "reduc_plus_scal_v4hi"
> + [(plus:V4HI
> +(match_operand:HI 0 "register_operand")
> +(match_operand:V4HI 1 "register_operand"))]
> + "TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V4HImode);
> +  ix86_expand_reduc (gen_addv4hi3, tmp, operands[1]);
> +  emit_insn (gen_vec_extractv4hihi (operands[0], tmp, const0_rtx));
> +  DONE;
> +})
> +
> +(define_expand "reduc__scal_v4hi"
> +  [(smaxmin:V4HI
> + (match_operand:HI 0 "register_operand")
> + (match_operand:V4HI 1 "register_operand"))]
> +  "TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V4HImode);
> +  ix86_expand_reduc (gen_v4hi3, tmp, operands[1]);
> +  emit_insn (gen_vec_extractv4hihi (operands[0], tmp, const0_rtx));
> +  DONE;
> +})
> +
> +(define_expand "reduc__scal_v4hi"
> +  [(umaxmin:V4HI
> + (match_operand:HI 0 "register_operand")
> + (match_operand:V4HI 1 "register_operand"))]
> +  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
> +{
> +  rtx tmp = gen_reg_rtx (V4HImode);
> +  ix86_expand_reduc (gen_v4hi3, tmp, operands[1]);
> +  emit_insn (gen_vec_extractv4hihi (operands[0], tmp, const0_rtx));
> +  DONE;
> +})
> +
>  (define_expand "usadv8qi"
>[(match_operand:V2SI 0 "register_operand")
> (match_operand:V8QI 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/i386/mmx-reduce-op-1.c b/gcc/testsuite/gcc.target/i386/mmx-reduce-op-1.c
> new file mode 100644
> index 000..ac20ed0d41f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/mmx-reduce-op-1.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "\.REDUC_PLUS" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\.REDUC_MIN" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\.REDUC_MAX" 2 "optimized" } } */
> +
> +#define MAX(a, b) ((a) > (b) ? (a) : (b))
> +#define MIN(a, b) ((a) > (b) ? (b) : (a))
> +
> +short
> +__attribute__((noipa, optimize("Ofast"),target("sse2")))
> +reduce_add (short* __restrict pa)
> +{
> +  short sum = 0;
> +  for (int i = 0; i != 4; i++)
> +sum += pa[i];
> +  return sum;
> +}
> +
> +short
> +__attribute__((noipa, optimize("Ofast"),target("sse2")))
> +reduce_smax (short* __restrict pa)
> +{
> +  short sum = pa[0];
> +  for (int i = 0; i != 4; i++)
> +sum = MAX(sum, pa[i]);
> +  return sum;
> +}
> +
> +short
> +__attribute__((noipa, optimize("Ofast"),target("sse2")))
> +reduce_smin (short* __restrict pa)
> +{
> +  short sum = pa[0];
> +  for (int i = 0; i != 4; i++)
> +sum = MIN(sum, pa[i]);
> +  return sum;
> +}
> +
> +unsigned short
> +__attribute__((noipa, optimize("Ofast"),target("sse4.1")))
> +reduce_umax (unsigned short* __restrict pa)
> +{
> +  unsigned short sum = pa[0];
> +  for (int i = 0; i != 4; i++)
> +sum = MAX(sum, pa[i]);
> +  return sum;
> +}
> +
> +unsigned short
> +__attribute__((noipa, optimize("Ofast"),target("sse4.1")))
> +reduce_umin (unsigned short* __restrict pa)
> +{
> +  unsigned short sum = pa[0];
> +  for (int i = 0; i != 4; i++)
> +sum = MIN(sum, pa[i]);
> +  return sum;
> +}
> diff --git a/gcc/testsuite/gcc

Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, 28 Sep 2021, Hongtao Liu wrote:

> On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, 27 Sep 2021, sunil.k.pandey wrote:
> >
> > > On Linux/x86_64,
> > >
> > > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > > Author: Richard Biener 
> > > Date:   Wed Nov 18 09:36:57 2020 +0100
> > >
> > > Allow different vector types for stmt groups
> > >
> > > caused
> > >
> > > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 1
> > > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic block" 1
> >
> > This shows that it is maybe a bad idea to support V2SImode vectorization
> > with -m32 when we refuse to implement even plus.
> >
> > OTOH it's just the mode that's available, autovectorize_vector_modes
> > doesn't include the corresponding mode but we still pick it up via
> > the related vector mode for group-size == 2.

It looks like we could define the vectorize.related_mode hook to
reject V2SImode when !TARGET_MMX_WITH_SSE - the default implementation
just checks for vector_mode_supported_p.

> > > FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "optimized: basic block" 10
> > > FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: basic block" 10
> >
> > We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
> > adjust the testcase to prefer SSE.
> >
> > > FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4
> >
> > With -march=cascadelake we get
> >
> > vpermpd $68, c, %ymm0
> > vpermpd $238, c, %ymm0
> >
> > instead of
> >
> > vmovapd c, %ymm1
> > vinsertf128 $1, %xmm1, %ymm1, %ymm0
> > vperm2f128  $49, %ymm1, %ymm1, %ymm0
> >
> > what's a way to disallow additional -march= from taking effect?  It's
> I usually add -mno-{avx,avx512f} and -mtune=generic or sometimes
> -mprefer-vector-width=* to the testcases.

OK, I will try this route then.

Thanks,
Richard.


[PATCH] Fix gcc.target/i386/vect-pr97352.c for -m32 -march=cascadelake

2021-09-28 Thread Richard Biener via Gcc-patches
The easiest fix is to disable AVX2 and AVX512F explicitly.

Tested on x86_64-unknown-linux-gnu, pushed.

2021-09-28  Richard Biener  

* gcc.target/i386/vect-pr97352.c: Pass -mno-avx2 -mno-avx512f.
---
 gcc/testsuite/gcc.target/i386/vect-pr97352.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect-pr97352.c b/gcc/testsuite/gcc.target/i386/vect-pr97352.c
index d0e120600db..f6cbf368160 100644
--- a/gcc/testsuite/gcc.target/i386/vect-pr97352.c
+++ b/gcc/testsuite/gcc.target/i386/vect-pr97352.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx" } */
+/* { dg-options "-O3 -mavx -mno-avx2 -mno-avx512f" } */
 
 double x[2], a[4], b[4], c[5];
 
-- 
2.31.1


[PATCH] rs6000: Remove builtin mask check from builtin_decl [PR102347]

2021-09-28 Thread Kewen.Lin via Gcc-patches
Hi,

As discussed in PR102347, builtin_decl is currently invoked very early,
when the function_decl for builtin functions is being created.  At that
time rs6000_builtin_mask can be wrong for builtins used in functions
with #pragma or attribute target, though it will be updated properly
later when LTO processes all nodes.

This patch aligns with the practice adopted by the i386 port, and also
with r10-7462, by relaxing builtin mask checking in some places.
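The check being removed is a plain subset test on feature bits. The sketch below restates it with hypothetical feature bits (not the port's actual mask values) to show why it misfires for the new testcase: the global mask comes from -mcpu=power9 and lacks MMA, while the function calling __builtin_mma_disassemble_acc is compiled under #pragma GCC target "cpu=power10", which enables MMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration only.  */
enum
{
  FEAT_P9 = 1u << 0,
  FEAT_P10 = 1u << 1,
  FEAT_MMA = 1u << 2
};

/* Same shape as the removed test in rs6000_builtin_decl:
   accept the builtin only if every bit it requires is enabled.  */
static bool builtin_mask_ok (uint32_t fnmask, uint32_t enabled_mask)
{
  return (fnmask & enabled_mask) == fnmask;
}
```

Evaluated against the global power9 mask the MMA builtin is rejected, even though the mask of the function that actually uses it would accept it; dropping the check at decl-creation time avoids the spurious error.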

Bootstrapped and regress-tested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

PR target/102347
* config/rs6000/rs6000-call.c (rs6000_builtin_decl): Remove builtin
mask check.

gcc/testsuite/ChangeLog:

PR target/102347
* gcc.target/powerpc/pr102347.c: New test.

---
 gcc/config/rs6000/rs6000-call.c | 14 --
 gcc/testsuite/gcc.target/powerpc/pr102347.c | 15 +++
 2 files changed, 19 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102347.c

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index fd7f24da818..15e0e09c07d 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13775,23 +13775,17 @@ rs6000_init_builtins (void)
 }
 }

-/* Returns the rs6000 builtin decl for CODE.  */
+/* Returns the rs6000 builtin decl for CODE.  Note that we don't check
+   the builtin mask here since there could be some #pragma/attribute
+   target functions and the rs6000_builtin_mask could be wrong when
+   this checking happens, though it will be updated properly later.  */

 tree
 rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
 {
-  HOST_WIDE_INT fnmask;
-
   if (code >= RS6000_BUILTIN_COUNT)
 return error_mark_node;

-  fnmask = rs6000_builtin_info[code].mask;
-  if ((fnmask & rs6000_builtin_mask) != fnmask)
-{
-  rs6000_invalid_builtin ((enum rs6000_builtins)code);
-  return error_mark_node;
-}
-
   return rs6000_builtin_decls[code];
 }

diff --git a/gcc/testsuite/gcc.target/powerpc/pr102347.c b/gcc/testsuite/gcc.target/powerpc/pr102347.c
new file mode 100644
index 000..05c439a8dac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102347.c
@@ -0,0 +1,15 @@
+/* { dg-do link } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -mdejagnu-cpu=power9" } */
+
+/* Verify there are no error messages in LTO mode.  */
+
+#pragma GCC target "cpu=power10"
+int main ()
+{
+  float *b;
+  __vector_quad c;
+  __builtin_mma_disassemble_acc (b, &c);
+  return 0;
+}
--
2.27.0



[PATCH v2] rs6000: Modify the way for extra penalized cost

2021-09-28 Thread Kewen.Lin via Gcc-patches
Hi,

This patch follows the discussions here[1][2], where Segher
pointed out that the existing way of guarding the extra penalized
cost for strided/elementwise loads with a magic bound does
not scale.

Using nunits * stmt_cost can produce a greatly exaggerated
penalized cost; for example, for V16QI on P8 it is
16 * 20 = 320, which is why a bound was needed.  To make it
better and more readable, the penalized cost is simplified
to:

unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
unsigned extra_cost = nunits * adjusted_cost;

For V2DI/V2DF, it uses a penalized cost of 2 for each scalar
load, while for the other modes it uses 1.  This is mainly
derived from performance evaluations.  One possibly related
point: the more units a constructed vector has, the more
instructions are used, which gives more chances to schedule
them well (even running them in parallel when enough execution
units are available at that time), so it seems reasonable not
to penalize them more.
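The two heuristics can be compared side by side. The following standalone C restatement of the removed and added hunks (function names are hypothetical; costs are in the vectorizer's abstract cost units) shows the per-statement extra constructor cost each one produces:

```c
#include <assert.h>

/* Removed code: nunits * stmt_cost, clamped to a fixed bound.  */
static unsigned current_extra_cost (unsigned nunits, unsigned stmt_cost)
{
  const unsigned MAX_PENALIZED_COST_FOR_CTOR = 12;
  unsigned extra_cost = nunits * stmt_cost;
  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
  return extra_cost;
}

/* New code: 2 per scalar load for two-lane vectors, 1 otherwise.  */
static unsigned proposed_extra_cost (unsigned nunits)
{
  assert (nunits > 1);  /* no strided/elementwise loads for 1 nunit */
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}
```

For V16QI on Power8 (nunits 16, stmt_cost 20) the removed code computes 320 and clamps it to 12, while the new heuristic gives 16; V2DI/V2DF constructions get 2 * 2 = 4, and the cost no longer depends on stmt_cost at all, which is why that parameter can be dropped.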

The SPEC2017 evaluations on Power8/Power9/Power10 at option
sets O2-vect and Ofast-unroll show this change is neutral.

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579529.html

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
the way to compute extra penalized cost.  Remove useless parameter.
(rs6000_add_stmt_cost): Adjust the call to function
rs6000_update_target_cost_per_stmt.


---
 gcc/config/rs6000/rs6000.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index dd42b0964f1..8200e1152c2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5422,7 +5422,6 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
enum vect_cost_for_stmt kind,
struct _stmt_vec_info *stmt_info,
enum vect_cost_model_location where,
-   int stmt_cost,
unsigned int orig_count)
 {

@@ -5462,17 +5461,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
{
  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
  unsigned int nunits = vect_nunits_for_cost (vectype);
- unsigned int extra_cost = nunits * stmt_cost;
- /* As function rs6000_builtin_vectorization_cost shows, we have
-priced much on V16QI/V8HI vector construction as their units,
-if we penalize them with nunits * stmt_cost, it can result in
-an unreliable body cost, eg: for V16QI on Power8, stmt_cost
-is 20 and nunits is 16, the extra cost is 320 which looks
-much exaggerated.  So let's use one maximum bound for the
-extra penalized cost for vector construction here.  */
- const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
- if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
-   extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
+ /* Don't expect strided/elementwise loads for just 1 nunit.  */
+ gcc_assert (nunits > 1);
+ /* i386 port adopts nunits * stmt_cost as the penalized cost
+for this kind of penalization, we used to follow it but
+found it could result in an unreliable body cost especially
+for V16QI/V8HI modes.  To make it better, we choose this
+new heuristic: for each scalar load, we use 2 as penalized
+cost for the case with 2 nunits and use 1 for the other
+cases.  It's without much supporting theory, mainly
+concluded from the broad performance evaluations on Power8,
+Power9 and Power10.  One possibly related point is that:
+vector construction for more units would use more insns,
+it has more chances to schedule them better (even run in
+parallelly when enough available units at that time), so
+it seems reasonable not to penalize that much for them.  */
+ unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
+ unsigned int extra_cost = nunits * adjusted_cost;
  data->extra_ctor_cost += extra_cost;
}
 }
@@ -5510,7 +5515,7 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
   cost_data->cost[where] += retval;

   rs6000_update_target_cost_per_stmt (cost_data, kind, stmt_info, where,
- stmt_cost, orig_count);
+ orig_count);
 }

   return retval;
--
2.27.0



Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, 28 Sep 2021, Richard Biener wrote:

> On Tue, 28 Sep 2021, Hongtao Liu wrote:
> 
> > On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Mon, 27 Sep 2021, sunil.k.pandey wrote:
> > >
> > > > On Linux/x86_64,
> > > >
> > > > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > > > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > > > Author: Richard Biener 
> > > > Date:   Wed Nov 18 09:36:57 2020 +0100
> > > >
> > > > Allow different vector types for stmt groups
> > > >
> > > > caused
> > > >
> > > > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  
> > > > scan-tree-dump-times slp2 "optimized: basic block" 1
> > > > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: 
> > > > basic block" 1
> > >
> > > This shows that it is maybe a bad idea to support V2SImode vectorization
> > > with -m32 when we refuse to implement even plus.
> > >
> > > OTOH it's just the mode that's available, autovectorize_vector_modes
> > > doesn't include the corresponding mode but we still pick it up via
> > > the related vector mode for group-size == 2.
> 
> It looks like we could define the vectorize.related_mode hook to
> reject V2SImode when !TARGET_MMX_WITH_SSE - the default implementation
> just checks for vector_mode_supported_p.

Meh, that doesn't work.  We then fall through

  else if (SCALAR_INT_MODE_P (prevailing_mode)
   || !related_vector_mode (prevailing_mode,
                            inner_mode, nunits).exists (&simd_mode))
{
  /* Fall back to using mode_for_vector, mostly in the hope of being
 able to use an integer mode.  */
  if (known_eq (nunits, 0U)
  && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
return NULL_TREE;

  if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
return NULL_TREE;

and return V2SImode anyway from mode_for_vector ...

So, should we only allow integer modes here, as the comment suggests?
With that change, i.e.

  if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode)
  || GET_MODE_CLASS (simd_mode) != MODE_INT)
return NULL_TREE;

we "properly" _not_ use V2SImode for vectorization on x86 when
!TARGET_MMX_WITH_SSE.  Note that this will also not use V2SImode
for vectorizing copies (which are properly supported).  So I'm
not sure rejecting V2SImode outright is "proper" ...

Richard.


Re: [PATCH] rs6000: Modify the way for extra penalized cost

2021-09-28 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2021/9/23 6:36 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 21, 2021 at 11:24:08AM +0800, Kewen.Lin wrote:
>> on 2021/9/18 6:01 AM, Segher Boessenkool wrote:
>>> On Thu, Sep 16, 2021 at 09:14:15AM +0800, Kewen.Lin wrote:
 The way with nunits * stmt_cost can get one much exaggerated
 penalized cost, such as: for V16QI on P8, it's 16 * 20 = 320,
 that's why we need one bound.  To make it scale, this patch
 doesn't use nunits * stmt_cost any more, but it still keeps
 nunits since there are actually nunits scalar loads there.  So
 it uses one cost adjusted from stmt_cost, since the current
 stmt_cost sort of considers nunits, we can stablize the cost
 for big nunits and retain the cost for small nunits.  After
 some tries, this patch gets the adjusted cost as:

 stmt_cost / (log2(nunits) * log2(nunits))
>>>
>>> So for  V16QI it gives *16/(4*4) so *1
>>> V8HI  it gives *8/(3*3)  so *8/9
>>> V4SI  it gives *4/(2*2)  so *1
>>> V2DI  it gives *2/(1*1)  so *2
>>> and for V1TI  it gives *1/(0*0) which is UB (no, does not crash for us,
>>> just gives wildly wrong answers; the div returns 0 on recent systems).
>>
>> I don't expect we will have V1TI for strided/elementwise load,
>> if it's one unit vector, it's the whole vector itself.
>> Besides, the below assertion should exclude it already.
> 
> Yes.  But ignoring the UB for unexpectedly large vector components, the
> 1 / 1.111 / 1 / 2  scoring does not make much sense.  The formulas
> "look" smooth and even sort of reasonable, but as soon as you look at
> what it *means*, and realise the domain of the function is discrete
> (only four or five possible inputs), and then see how the function
> behaves on that...  Hrm :-)
> 
>>> This of course is assuming nunits will always be a power of 2, but I'm
>>> sure that we have many other places in the compiler assuming that
>>> already, so that is fine.  And if one day this stops being true we will
>>> get a nice ICE, pretty much the best we could hope for.
>>
>> Yeah, exact_log2 returns -1 for non power of 2 input, for example:
> 
> Exactly.
> 
 +unsigned int adjusted_cost = stmt_cost / nunits_sq;
>>>
>>> But this can divide by 0.  Or are we somehow guaranteed that nunits
>>> will never be 1?  Yes the log2 check above, sure, but that ICEs if this
>>> is violated; is there anything that actually guarantees it is true?
>>
>> As I mentioned above, I don't expect we can have nunits 1 strided/ew load,
>> and the ICE should check this and ensure dividing by zero never happens.  :)
> 
> Can you assert that *directly* then please?
> 

Fix in v2.
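The per-mode behaviour Segher tabulates can be reproduced with a small C sketch. `exact_log2_sketch` is a hypothetical helper modeled on the `exact_log2` behaviour described in the thread (returning -1 for inputs that are not a power of two), and `v1_multiplier` computes the factor nunits / (log2 (nunits))^2 that the v1 formula effectively applies to stmt_cost, ignoring the integer truncation of the real code. The precondition on nunits is asserted directly, so the nunits == 1 division by zero cannot be reached.

```c
#include <assert.h>

/* Hypothetical helper modeled on exact_log2: log2 (X) if X is a power
   of two, -1 otherwise.  */
static int
exact_log2_sketch (unsigned int x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x >>= 1)
    n++;
  return n;
}

/* Factor the v1 formula effectively applies to stmt_cost:
   nunits / (log2 (nunits) * log2 (nunits)), computed in floating point
   to show the shape rather than the truncated integer result.  */
static double
v1_multiplier (unsigned int nunits)
{
  /* Assert the precondition directly: nunits == 1 would make the
     divisor log2 (1) * log2 (1) == 0.  */
  assert (nunits > 1);
  int l = exact_log2_sketch (nunits);
  assert (l > 0);  /* non-power-of-2 nunits are not expected either */
  return (double) nunits / (double) (l * l);
}
```

It yields 1 for V16QI and V4SI, 8/9 for V8HI and 2 for V2DI, which is the uneven scoring that the v2 heuristic replaces with an explicit 2-or-1 choice.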

>>> A magic crazy formula like this is no good.  If you want to make the
>>> cost of everything but V2D* be the same, and that of V2D* be twice that,
>>> that is a weird heuristic, but we can live with that perhaps.  But that
>>> beats completely unexplained (and unexplainable) magic!
>>>
>>> Sorry.
>>
>> That's all right, thanks for the comments!  let's improve it.  :)
> 
> I like that spirit :-)
> 
>> How about just assigning 2 for V2DI and 1 for the others for the
>> penalized_cost_per_load with some detailed commentary, it should have
>> the same effect with this "magic crazy formula", but I guess it can
>> be more clear.
> 
> That is fine yes!  (Well, V2DF the same I guess?  Or you'll need very
> detailed commentary :-) )
> 
> It is fine to say "this is just a heuristic without much supporting
> theory" in places.  That is what most of our --param= are as well, for
> example.  If counting two-element vectors as twice as expensive as all
> other vectors helps performance, then so be it: if there is no better
> way to cost things (or we do not know one), then what else are we to do?
> 
> 

Thanks a lot for the suggestion, I just posted v2:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580358.html

BR,
Kewen


Re: [PATCHv2] top-level configure: setup target_configdirs based on repository

2021-09-28 Thread Andrew Burgess
* Richard Biener  [2021-09-27 10:23:50 +0200]:

> On Fri, Sep 24, 2021 at 12:34 PM Andrew Burgess
>  wrote:
> >
> > * Thomas Schwinge  [2021-09-23 11:29:05 +0200]:
> >
> > > Hi!
> > >
> > > I only had a curious look here; hope that's still useful.
> > >
> > > On 2021-09-22T16:30:42+0100, Andrew Burgess  
> > > wrote:
> > > > The top-level configure script is shared between the gcc repository
> > > > and the binutils-gdb repository.
> > > >
> > > > The target_configdirs variable in the configure.ac script defines
> > > > sub-directories that contain components that should be built for the
> > > > target using the target tools.
> > > >
> > > > Some components, e.g. zlib, are built as both host and target
> > > > libraries.
> > > >
> > > > This causes problems for binutils-gdb.  If we run 'make all' in the
> > > > binutils-gdb repository we end up trying to build a target version of
> > > > the zlib library, which requires the target compiler be available.
> > > > Often the target compiler isn't immediately available, and so the
> > > > build fails.
> > >
> > > I did wonder: shouldn't normally these target libraries be masked out via
> > > 'noconfigdirs' (see 'Handle --disable- generically' section),
> > > via 'enable_[...]' being set to 'no'?  But I think I now see the problem
> > > here: the 'enable_[...]' variables guard both the host and target library
> > > build!  (... if I'm quickly understanding that correctly...)
> > >
> > > ... and you do need the host zlib, thus '$enable_zlib != no'.
> > >
> > > > The problem with zlib impacted a previous attempt to synchronise the
> > > > top-level configure scripts from gcc to binutils-gdb, see this thread:
> > > >
> > > >   https://sourceware.org/pipermail/binutils/2019-May/107094.html
> > > >
> > > > And I'm in the process of importing libbacktrace into binutils-gdb,
> > > > which is also a host and target library, and triggers the same issues.
> > > >
> > > > I believe that for binutils-gdb, at least at the moment, there are no
> > > > target libraries that we need to build.
> > > >
> > > > My proposal then is to make the value of target_libraries change based
> > > > on which repository we are building in.  Specifically, if the source
> > > > tree has a gcc/ directory then we should set the target_libraries
> > > > variable, otherwise this variable is left empty.
> > > >
> > > > I think that if someone tries to create a single unified tree (gcc +
> > > > binutils-gdb in a single source tree) and then build, this change will
> > > > not have a negative impact, the tree still has gcc/ so we'd expect the
> > > > target compiler to be built, which means building the target_libraries
> > > > should work just fine.
> > > >
> > > > However, if the source tree lacks gcc/ then we assume the target
> > > > compiler isn't built/available, and so target_libraries shouldn't be
> > > > built.
> > > >
> > > > There is already precedent within configure.ac for checking the
> > > > existence of gcc/ in the source tree, see the handling of
> > > > --enable-werror around line 3658.
> > >
> > > (I understand that one to just guard the 'cat $srcdir/gcc/DEV-PHASE',
> > > though.)
> > >
> > > > I've tested a build of gcc on x86-64, and the same set of target
> > > > libraries still seem to get built.  On binutils-gdb this change
> > > > resolves the issues with 'make all'.
> > > >
> > > > Any thoughts?
> > >
> > > > --- a/configure.ac
> > > > +++ b/configure.ac
> > > > @@ -180,9 +180,17 @@ target_tools="target-rda"
> > > >  ## We assign ${configdirs} this way to remove all embedded newlines.  
> > > > This
> > > >  ## is important because configure will choke if they ever get through.
> > > >  ## ${configdirs} is directories we build using the host tools.
> > > > -## ${target_configdirs} is directories we build using the target tools.
> > > > +##
> > > > +## ${target_configdirs} is directories we build using the target
> > > > +## tools, these are only needed when working in the gcc tree.  This
> > > > +## file is also reused in the binutils-gdb tree, where building any
> > > > +## target stuff doesn't make sense.
> > > >  configdirs=`echo ${host_libs} ${host_tools}`
> > > > -target_configdirs=`echo ${target_libraries} ${target_tools}`
> > > > +if test -d ${srcdir}/gcc; then
> > > > +  target_configdirs=`echo ${target_libraries} ${target_tools}`
> > > > +else
> > > > +  target_configdirs=""
> > > > +fi
> > > >  build_configdirs=`echo ${build_libs} ${build_tools}`
> > >
> > > What I see is that after this, there are still occasions where inside
> > > 'case "${target}"', 'target_configdirs' gets amended, so those won't be
> > > caught by your approach?
> >
> > Good point, I'd failed to spot these.
> >
> > >
> > > Instead of erasing 'target_configdirs' as you've posted, and
> > > understanding that we can't just instead add all the "offending" ones to
> > > 'noconfigdirs' for '! test -d "$srcdir"/gcc/' (because that would also
> > > disable them for host usage),
> >
> > Great idea, th

PING^2 [PATCH] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-09-28 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html

One related patch [1] is ready to commit, whose test cases rely on
this patch if no changes are applied to them.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html

BR,
Kewen

on 2021/9/15 4:42 PM, Kewen.Lin via Gcc-patches wrote:
> Hi!
> 
> Gentle ping this patch:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html
> 
> BR,
> Kewen
> 
> on 2021/9/1 2:55 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi!
>>
>> This patch fixes the inconsistent behaviors between non-LTO mode
>> and LTO mode.  As Martin pointed out, currently the function
>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>> NULL, but that is wrong; we should use the command line options
>> from target_option_default_node as default.  It also replaces
>> rs6000_isa_flags with the one from target_option_default_node
>> when caller_tree is NULL, as rs6000_isa_flags could have
>> changed since initialization.
>>
>> It also extends the scope of the check to the case where the callee
>> has explicitly set options; for test case pr102059-2.c, inlining could
>> happen unexpectedly before, and this is fixed accordingly.
>>
>> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
>> can be neglected for inlining; this patch also excludes them when
>> the callee is attributed by always_inline.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu Power9.
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR ipa/102059
>>  * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>>  target_option_default_node and consider always_inline_safe flags.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR ipa/102059
>>  * gcc.target/powerpc/pr102059-1.c: New test.
>>  * gcc.target/powerpc/pr102059-2.c: New test.
>>  * gcc.target/powerpc/pr102059-3.c: New test.
>>  * gcc.target/powerpc/pr102059-4.c: New test.
>>
> 


[COMMITTED] Return VARYING in range_on_path_entry if nothing found.

2021-09-28 Thread Aldy Hernandez via Gcc-patches
The problem here is that the solver's code solving unknown SSAs on entry
to a path was returning UNDEFINED if there were no incoming edges to the
start of the path that were not the function entry block.  This caused a
cascade of pain downstream.
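To make the failure mode concrete, here is a toy C model of the logic the patch changes, assuming a simple single-interval type in place of GCC's `irange`. All names are hypothetical. Ranges from predecessor edges are unioned, edges from the function entry block are skipped, and if nothing was unioned the result is widened to VARYING instead of being left as the empty UNDEFINED range.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy integer range: UNDEFINED is the empty set, VARYING the full set.  */
struct range_sketch
{
  bool undefined;
  long lo, hi;
};

static struct range_sketch
range_varying_sketch (void)
{
  struct range_sketch r = { false, LONG_MIN, LONG_MAX };
  return r;
}

/* Union T into *R, approximated by the enclosing interval;
   an UNDEFINED operand contributes nothing.  */
static void
range_union_sketch (struct range_sketch *r, struct range_sketch t)
{
  if (t.undefined)
    return;
  if (r->undefined)
    {
      *r = t;
      return;
    }
  if (t.lo < r->lo)
    r->lo = t.lo;
  if (t.hi > r->hi)
    r->hi = t.hi;
}

/* EDGE_RANGES[i] is the range of the name on the i-th incoming edge and
   FROM_ENTRY[i] says whether that edge comes from the function entry
   block; such edges are skipped.  */
static struct range_sketch
range_on_path_entry_sketch (const struct range_sketch *edge_ranges,
                            const bool *from_entry, int n)
{
  assert (n >= 0);
  struct range_sketch r = { true, 0, 0 };  /* start as UNDEFINED */
  bool changed = false;
  for (int i = 0; i < n; i++)
    if (!from_entry[i])
      {
        range_union_sketch (&r, edge_ranges[i]);
        changed = true;
      }
  /* Make sure we don't return UNDEFINED by mistake.  */
  if (!changed)
    r = range_varying_sketch ();
  return r;
}
```

With only an entry-block predecessor the sketch returns [LONG_MIN, LONG_MAX]; with predecessors carrying [1, 3] and [5, 9] it returns [1, 9].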

Tested on x86-64 Linux.

PR tree-optimization/102511

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::range_on_path_entry):
Return VARYING when nothing found.

gcc/testsuite/ChangeLog:

* gcc.dg/pr102511.c: New test.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Adjust.
---
 gcc/gimple-range-path.cc  | 11 +-
 gcc/testsuite/gcc.dg/pr102511.c   | 21 +++
 .../gcc.dg/tree-ssa/ssa-dom-thread-14.c   |  2 +-
 3 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102511.c

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 71e04e4deba..9da67d2a35b 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -136,14 +136,23 @@ path_range_query::range_on_path_entry (irange &r, tree name)
 {
   int_range_max tmp;
   basic_block entry = entry_bb ();
+  bool changed = false;
+
   r.set_undefined ();
   for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
 {
   edge e = EDGE_PRED (entry, i);
   if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
  && m_ranger.range_on_edge (tmp, e, name))
-   r.union_ (tmp);
+   {
+ r.union_ (tmp);
+ changed = true;
+   }
 }
+
+  // Make sure we don't return UNDEFINED by mistake.
+  if (!changed)
+r.set_varying (TREE_TYPE (name));
 }
 
 // Return the range of NAME at the end of the path being analyzed.
diff --git a/gcc/testsuite/gcc.dg/pr102511.c b/gcc/testsuite/gcc.dg/pr102511.c
new file mode 100644
index 000..8a9af347305
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102511.c
@@ -0,0 +1,21 @@
+// { dg-do run }
+// { dg-options "-O3" }
+
+char arr_15 [8];
+__attribute__((noipa))
+void test(signed char a, unsigned short b, unsigned long long c,
+  unsigned short f) {
+  for (int d = b - 8; d < b; d += 2)
+for (short e = 0; e < (unsigned short)((f ? 122 : 0) ^ (a ? c : 0)) - 64055;
+ e += 3)
+  arr_15[d] = 42;
+}
+int main() {
+test(37, 8, 12325048486467861044ULL, 45936);
+for (int i = 0; i < 8; ++i)
+  {
+if (arr_15[i] != ((i&1) ? 0 : 42))
+  __builtin_abort();
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c
index 3bc4b3795cb..a25fe8bd89e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c
@@ -37,5 +37,5 @@ expand_shift_1 (int code, int unsignedp, int rotate,
we will enter the TRUE arm of the conditional and we can thread
the test to compute the first first argument of the expand_binop
call if we look backwards through the boolean logicals.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 1 "dom2"} } */
+/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom2"} } */
 
-- 
2.31.1



[PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
Hi!

The testcases in the patch are either miscompiled or ICE with checking,
because the defaulted operator== is synthesized too early (but only if
constexpr), while the corresponding class type is still incomplete.
The problem is that at that point the bitfield FIELD_DECLs still have
their underlying type as TREE_TYPE rather than an integral type with their
precision, and when layout_class_type is called for the class soon after
that, it changes those types, but the types of the COMPONENT_REFs stay the
way they were when operator== was synthesized, and the middle-end is then
upset by the type mismatch.
Since computing the exact type that will be given isn't a one-liner but
quite long code, especially for oversized bitfields, I think it is best to
just not synthesize the comparison operators so early (the
defaulted_late_check change) and to call defaulted_late_check for them once
again as soon as the class is complete.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-28  Jakub Jelinek  

PR c++/102490
* method.c (defaulted_late_check): Don't synthetize constexpr
defaulted comparisons if context is still incomplete type.
* class.c (finish_struct_1): Call defaulted_late_check again for defaulted
comparisons.

* g++.dg/cpp2a/spaceship-eq11.C: New test.
* g++.dg/cpp2a/spaceship-eq12.C: New test.

--- gcc/cp/method.c.jj  2021-09-15 08:55:37.563497558 +0200
+++ gcc/cp/method.c 2021-09-27 13:48:12.139271830 +0200
@@ -3160,8 +3160,11 @@ defaulted_late_check (tree fn)
   if (kind == sfk_comparison)
 {
   /* If the function was declared constexpr, check that the definition
-qualifies.  Otherwise we can define the function lazily.  */
-  if (DECL_DECLARED_CONSTEXPR_P (fn) && !DECL_INITIAL (fn))
+qualifies.  Otherwise we can define the function lazily.
+Don't do this if the class type is still incomplete.  */
+  if (DECL_DECLARED_CONSTEXPR_P (fn)
+ && !DECL_INITIAL (fn)
+ && COMPLETE_TYPE_P (ctx))
{
  /* Prevent GC.  */
  function_depth++;
--- gcc/cp/class.c.jj   2021-09-03 09:46:28.801428380 +0200
+++ gcc/cp/class.c  2021-09-27 14:07:03.465562255 +0200
@@ -7467,7 +7467,14 @@ finish_struct_1 (tree t)
  for any static member objects of the type we're working on.  */
   for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
 if (DECL_DECLARES_FUNCTION_P (x))
-  DECL_IN_AGGR_P (x) = false;
+  {
+   /* Synthetize constexpr defaulted comparisons.  */
+   if (!DECL_ARTIFICIAL (x)
+   && DECL_DEFAULTED_IN_CLASS_P (x)
+   && special_function_p (x) == sfk_comparison)
+ defaulted_late_check (x);
+   DECL_IN_AGGR_P (x) = false;
+  }
 else if (VAR_P (x) && TREE_STATIC (x)
 && TREE_TYPE (x) != error_mark_node
 && same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
--- gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C.jj  2021-09-27 14:20:04.723713371 +0200
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C 2021-09-27 14:20:20.387495858 +0200
@@ -0,0 +1,43 @@
+// PR c++/102490
+// { dg-do run { target c++20 } }
+
+struct A
+{
+  unsigned char a : 1;
+  unsigned char b : 1;
+  constexpr bool operator== (const A &) const = default;
+};
+
+struct B
+{
+  unsigned char a : 8;
+  int : 0;
+  unsigned char b : 7;
+  constexpr bool operator== (const B &) const = default;
+};
+
+struct C
+{
+  unsigned char a : 3;
+  unsigned char b : 1;
+  constexpr bool operator== (const C &) const = default;
+};
+
+void
+foo (C &x, int y)
+{
+  x.b = y;
+}
+
+int
+main ()
+{
+  A a{}, b{};
+  B c{}, d{};
+  C e{}, f{};
+  a.b = 1;
+  d.b = 1;
+  foo (e, 0);
+  foo (f, 1);
+  return a == b || c == d || e == f;
+}
--- gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C.jj  2021-09-27 14:20:12.050611625 +0200
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C 2021-09-27 14:20:39.633228602 +0200
@@ -0,0 +1,5 @@
+// PR c++/102490
+// { dg-do run { target c++20 } }
+// { dg-options "-O2" }
+
+#include "spaceship-eq11.C"

Jakub



Re: [RFC 1/7] Avoid references to register names in instruction output patterns.

2021-09-28 Thread YunQiang Su
On Sun, Sep 26, 2021 at 9:26 PM Dragan Mladjenovic via Gcc-patches wrote:
>
> This allows us to choose the different names if needed in the future.
>

I tried to apply this patch to current gcc and got this error:
/build/mips-mti-elf/srcs-gcc/gcc/testsuite/gcc.c-torture/compile/20010226-1.c: In function 'foo':
/build/mips-mti-elf/srcs-gcc/gcc/testsuite/gcc.c-torture/compile/20010226-1.c:24:1: internal compiler error: output_operand: invalid %-code
0xa435c4 output_operand_lossage(char const*, ...)
../../srcs-gcc/gcc/final.c:3235
0xa43ec6 output_asm_insn(char const*, rtx_def**)
../../srcs-gcc/gcc/final.c:3604
0xa482c7 output_asm_insn(char const*, rtx_def**)
../../srcs-gcc/gcc/final.c:3466
0xa482c7 final_scan_insn_1
../../srcs-gcc/gcc/final.c:2894
0xa485bb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
../../srcs-gcc/gcc/final.c:2940
0xa486a6 final_1
../../srcs-gcc/gcc/final.c:1997
0xa49262 rest_of_handle_final
../../srcs-gcc/gcc/final.c:4285
0xa49262 execute
../../srcs-gcc/gcc/final.c:4363
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1
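For context on the error message: assuming GCC's usual scheme, in which output_asm_insn consults the target's print_operand_punct_valid_p hook before printing a `%` punctuation code and reports "invalid %-code" when the hook rejects it, the toy model below (not GCC or MIPS-port code; all names hypothetical) shows how a character handled in the printing switch can still be rejected if a separate validity table is not updated. It illustrates the mechanism only and is not a diagnosis of this particular failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not GCC code.  VALID_PUNCT plays the role of the target's
   "is this punctuation valid" table; print_punct_sketch plays the role
   of the printing switch.  '&' is handled by the switch but deliberately
   missing from the table.  */
static const char valid_punct[] = ".$";

static const char *
print_punct_sketch (char c)
{
  switch (c)
    {
    case '.': return "$0";   /* hard-wired zero register */
    case '$': return "$29";  /* stack pointer */
    case '&': return "$31";  /* return address register */
    default:  return NULL;
    }
}

/* Expand "%<punct>" codes in TMPL into OUT (of size N).  Returns false,
   standing in for an "invalid %-code" diagnostic, when a code is not in
   VALID_PUNCT, or when OUT would overflow.  */
static bool
expand_template_sketch (const char *tmpl, char *out, size_t n)
{
  assert (out != NULL && n > 0);
  out[0] = '\0';
  for (const char *p = tmpl; *p != '\0'; p++)
    {
      size_t len = strlen (out);
      if (*p != '%')
        {
          if (len + 1 >= n)
            return false;
          out[len] = *p;
          out[len + 1] = '\0';
          continue;
        }
      char c = *++p;
      /* Consult the validity table before the printing switch.  */
      if (c == '\0' || strchr (valid_punct, c) == NULL)
        return false;
      const char *name = print_punct_sketch (c);
      if (name == NULL || len + strlen (name) >= n)
        return false;
      strcat (out, name);
    }
  return true;
}
```

Here "teq\t%.,%." expands to "teq\t$0,$0", while "jr\t%&" is rejected even though print_punct_sketch knows how to print '&'.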

> gcc/ChangeLog:
>
> * config/mips/mips.c (mips_print_operand_punctuation):
> Handle '&' punctuation.
> (mips_output_probe_stack_range): Use '%.' instead of $0.
> * config/mips/mips.h (GLOBAL_POINTER_REGNUM): Move to ...
> * config/mips/mips.md (GLOBAL_POINTER_REGNUM): ... here.
> (trap, *conditional_trap_reg, *msac, *muls,
> *muls_di, msubsidi4): Use '%.' instead of $0.
> (clear_hazard_): Use '%&' instead of $31.
> ---
>  gcc/config/mips/mips.c  |  9 +++--
>  gcc/config/mips/mips.h  |  4 
>  gcc/config/mips/mips.md | 17 +
>  3 files changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index ce60c5500b7..ab63575eb26 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -8816,6 +8816,7 @@ mips_pop_asm_switch (struct mips_asm_switch *asm_switch)
> '^' Print the name of the pic call-through register (t9 or $25).
> '+' Print the name of the gp register (usually gp or $28).
> '$' Print the name of the stack pointer register (sp or $29).
> +   '&' Print the name of the return register (ra or $31).
> ':'  Print "c" to use the compact version if the delay slot is a nop.
> '!'  Print "s" to use the short version if the delay slot contains a
> 16-bit instruction.
> @@ -8902,6 +8903,10 @@ mips_print_operand_punctuation (FILE *file, int ch)
>fputs (reg_names[STACK_POINTER_REGNUM], file);
>break;
>
> +case '&':
> +  fputs (reg_names[RETURN_ADDR_REGNUM], file);
> +  break;
> +
>  case ':':
>/* When final_sequence is 0, the delay slot will be a nop.  We can
>  use the compact version where available.  The %: formatter will
> @@ -12133,9 +12138,9 @@ mips_output_probe_stack_range (rtx reg1, rtx reg2)
>strcpy (tmp, "%(%output_asm_insn (strcat (tmp, &loop_lab[1]), xops);
>if (TARGET_64BIT)
> -output_asm_insn ("sd\t$0,0(%0)%)", xops);
> +output_asm_insn ("sd\t%.,0(%0)%)", xops);
>else
> -output_asm_insn ("sw\t$0,0(%0)%)", xops);
> +output_asm_insn ("sw\t%.,0(%0)%)", xops);
>

I guess the problem is due to this.

>return "";
>  }
> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index f4e30ba3fdb..a44ccada0bc 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -2064,10 +2064,6 @@ FP_ASM_SPEC "\
> function address than to call an address kept in a register.  */
>  #define NO_FUNCTION_CSE 1
>
> -/* The ABI-defined global pointer.  Sometimes we use a different
> -   register in leaf functions: see PIC_OFFSET_TABLE_REGNUM.  */
> -#define GLOBAL_POINTER_REGNUM (GP_REG_FIRST + 28)
> -
>  /* We normally use $28 as the global pointer.  However, when generating
> n32/64 PIC, it is better for leaf functions to use a call-clobbered
> register instead.  They can then avoid saving and restoring $28
> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index dee71dc1fb0..1c8b3b98b20 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -167,6 +167,7 @@
> (GET_FCSR_REGNUM2)
> (SET_FCSR_REGNUM4)
> (PIC_FUNCTION_ADDR_REGNUM   25)
> +   (GLOBAL_POINTER_REGNUM  28)
> (RETURN_ADDR_REGNUM 31)
> (CPRESTORE_SLOT_REGNUM  76)
> (GOT_VERSION_REGNUM 79)
> @@ -1205,7 +1206,7 @@
>""
>  {
>if (ISA_HAS_COND_TRAP)
> -return "teq\t$0,$0";
> +return "teq\t%.,%.";
>else if (TARGET_MIPS16)
>  return "break 0";
>else
> @@ -1230,7 +1231,7 @@
>   

[PATCH] i386: Don't emit fldpi etc. if -frounding-math [PR102498]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
Hi!

i387 has instructions to load some transcendental numbers onto the top of
the stack.  The problem is that the exact value of the last bit one gets
for those depends on the current rounding mode, because the CPU knows the
numbers with slightly higher precision.  The compiler assumes rounding to
nearest when comparing them against constants in the IL, but at runtime
the rounding mode can be different, so depending on the rounding mode and
the constant, the loaded value could be 1 ulp higher or lower than expected.
We only support changing the rounding mode at runtime if the non-default
-frounding-math option is used, so the following patch just disables
using those constants if that flag is on.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-28  Jakub Jelinek  

PR target/102498
* config/i386/i386.c (standard_80387_constant_p): Don't recognize
special 80387 instruction XFmode constants if flag_rounding_math.

* gcc.target/i386/pr102498.c: New test.

--- gcc/config/i386/i386.c.jj   2021-09-18 09:44:31.720743823 +0200
+++ gcc/config/i386/i386.c  2021-09-27 16:55:37.928072249 +0200
@@ -5035,7 +5035,8 @@ standard_80387_constant_p (rtx x)
   /* For XFmode constants, try to find a special 80387 instruction when
  optimizing for size or on those CPUs that benefit from them.  */
   if (mode == XFmode
-  && (optimize_function_for_size_p (cfun) || TARGET_EXT_80387_CONSTANTS))
+  && (optimize_function_for_size_p (cfun) || TARGET_EXT_80387_CONSTANTS)
+  && !flag_rounding_math)
 {
   int i;
 
--- gcc/testsuite/gcc.target/i386/pr102498.c.jj 2021-09-27 17:09:30.387509264 +0200
+++ gcc/testsuite/gcc.target/i386/pr102498.c 2021-09-27 17:09:22.548618148 +0200
@@ -0,0 +1,59 @@
+/* PR target/102498 */
+/* { dg-do run { target fenv } } */
+/* { dg-options "-frounding-math" } */
+
+#include 
+#include 
+
+__attribute__((noipa)) long double
+fldlg2 (void)
+{
+  return 0.3010299956639811952256464283594894482L;
+}
+
+__attribute__((noipa)) long double
+fldln2 (void)
+{
+  return 0.6931471805599453094286904741849753009L;
+}
+
+__attribute__((noipa)) long double
+fldl2e (void)
+{
+  return 1.4426950408889634073876517827983434472L;
+}
+
+__attribute__((noipa)) long double
+fldl2t (void)
+{
+  return 3.3219280948873623478083405569094566090L;
+}
+
+__attribute__((noipa)) long double
+fldpi (void)
+{
+  return 3.1415926535897932385128089594061862044L;
+}
+
+int
+main ()
+{
+  long double a = fldlg2 ();
+  long double b = fldln2 ();
+  long double c = fldl2e ();
+  long double d = fldl2t ();
+  long double e = fldpi ();
+  static int f[] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
+  int i;
+  for (i = 0; i < 4; i++)
+{
+  fesetround (f[i]);
+  if (a != fldlg2 ()
+ || b != fldln2 ()
+ || c != fldl2e ()
+ || d != fldl2t ()
+ || e != fldpi ())
+   abort ();
+}
+  return 0;
+}

Jakub



Re: [PATCH] Control all jump threading passes with -fjump-threads.

2021-09-28 Thread Aldy Hernandez via Gcc-patches



On 9/28/21 9:41 AM, Richard Biener wrote:

On Tue, Sep 28, 2021 at 8:29 AM Jeff Law via Gcc-patches
 wrote:




On 9/28/2021 12:17 AM, Aldy Hernandez wrote:

On Tue, Sep 28, 2021 at 3:46 AM Jeff Law  wrote:



On 9/27/2021 9:00 AM, Aldy Hernandez wrote:

Last year I mentioned that -fthread-jumps was being ignored by the
majority of our jump threading passes, and Jeff said he'd be in favor
of fixing this.

This patch remedies the situation, but it does change existing behavior.
Currently -fthread-jumps is only enabled for -O2, -O3, and -Os.  This
means that even if we restricted all jump threading passes with
-fthread-jumps, DOM jump threading would still seep through since it
runs at -O1.

I propose this patch, but it does mean that DOM jump threading would
have to be explicitly enabled with -O1 -fthread-jumps.  An
alternative would be to also offer a specific -fno-dom-threading, but
that seems icky.

OK pending tests?

gcc/ChangeLog:

* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
flag_thread_jumps.
(pass_early_thread_jumps::gate): Same.
* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
Return if !flag_thread_jumps.
* tree-ssa-threadupdate.c
(jt_path_registry::register_jump_thread): Assert that
flag_thread_jumps is true.

OK.  Clearly this is going to be even better once we disentangle
threading from DOM.

Annoyingly, I had to tweak a few more tests, particularly some
-Wuninitialized -O1 ones which seem to depend on DOM jump threading to
give proper diagnostics.  It seems that every change to jump threading
needs tweaks to the Wuninitialized code :-(.

Well, a lot of jump threading is there to help eliminate false positives
from -Wuninitialized by eliminating paths through the CFG that we can
prove never execute at runtime.  So that's not a huge surprise.


I would have suggested to enable -fthread-jumps at -O1 instead
and eventually just add && flag_expensive_optimizations to the
use in cfgcleanup.c to restrict that to -O2+


Hmmm, that's a much better idea.  I was afraid of messing existing 
behavior, but I suppose adding even more false positives for -O1 
-Wuninitialized is worse.


BTW, I plugged one more tweak to the registry in 
remove_jump_threads_including.  No need to go add things to the removed 
edges hash table, if we're not going to thread.


OK pending tests?
Aldy
commit e0a5b35c8becda7bef37bc6eca1686ab2e762088
Author: Aldy Hernandez 
Date:   Tue Sep 28 11:33:11 2021 +0200

Enable jump threading at -O1.

My previous patch gating all jump threading by -fthread-jumps had the
side effect of turning off DOM jump threading at -O1.  This causes
numerous -Wuninitialized false positives.  This patch turns on jump
threading at -O1 to minimize the disruption.

gcc/ChangeLog:

* cfgcleanup.c (pass_jump::execute): Check
flag_expensive_optimizations.
(pass_jump_after_combine::gate): Same.
* doc/invoke.texi (-fthread-jumps): Enable for -O1.
* opts.c (default_options_table): Enable -fthread-jumps at -O1.
* tree-ssa-threadupdate.c
(fwd_jt_path_registry::remove_jump_threads_including): Bail unless
flag_thread_jumps.

gcc/testsuite/ChangeLog:

* gcc.dg/auto-init-uninit-1.c: Adjust.
* gcc.dg/auto-init-uninit-15.c: Same.
* gcc.dg/guality/example.c: Same.
* gcc.dg/loop-8.c: Same.
* gcc.dg/strlenopt-40.c: Same.
* gcc.dg/tree-ssa/pr18133-2.c: Same.
* gcc.dg/tree-ssa/pr18134.c: Same.
* gcc.dg/uninit-1.c: Same.
* gcc.dg/uninit-pr44547.c: Same.
* gcc.dg/uninit-pr59970.c: Same.

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 7b1e1ba6e80..82fc505ff50 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3239,7 +3239,8 @@ pass_jump::execute (function *)
   if (dump_file)
     dump_flow_info (dump_file, dump_flags);
   cleanup_cfg ((optimize ? CLEANUP_EXPENSIVE : 0)
-	   | (flag_thread_jumps ? CLEANUP_THREADING : 0));
+	   | (flag_thread_jumps && flag_expensive_optimizations
+		  ? CLEANUP_THREADING : 0));
   return 0;
 }
 
@@ -3274,7 +3275,10 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_thread_jumps; }
+  virtual bool gate (function *)
+  {
+    return flag_thread_jumps && flag_expensive_optimizations;
+  }
   virtual unsigned int execute (function *);
 
 }; // class pass_jump_after_combine
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ba98eab68a5..6d9a107acd0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10781,7 +10781,7 @@ so, the first branch is redirected to either the destination of the
 second branch or a point immediately following it, depending on whether
 the condition is known to be true or false.
 
-Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
+

[committed] openmp: Don't call omp_finish_clause on implicitly added private clauses on simd [PR102492]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
Hi!

The gimplifier adds implicit private clauses on SIMD constructs for local
variables in the SIMD body if they are addressable to make sure they use
the magic arrays with "omp simd array" attribute (such that each SIMD lane
has its own copy), but we actually don't need to default privatize etc. those,
the construction for them is done in the SIMD body and so is destruction.
omp_finish_clause for C++ now requires default constructor (and dtor) for private,
so that OpenMP 5.1 default(private) works, but that will never be needed on
SIMD.  So, this patch just doesn't call omp_finish_clause for private on simd.
The C and Fortran langhooks don't do anything for private.
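The new condition in gimplify_adjust_omp_clauses_1 can be sketched as a standalone predicate. The enum names below are hypothetical stand-ins for OMP_CLAUSE_PRIVATE and ORT_SIMD, and the sketch models only the added check, not the surrounding gimplifier code that creates the implicit clauses.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's clause codes and region types.  */
enum clause_code { CLAUSE_PRIVATE, CLAUSE_FIRSTPRIVATE, CLAUSE_SHARED };
enum region_type { REGION_PARALLEL, REGION_SIMD };

/* Mirrors the added test: the omp_finish_clause langhook is skipped
   only for private clauses on simd; every other combination still
   calls it.  */
bool
call_finish_clause_p (enum clause_code code, enum region_type region)
{
  return code != CLAUSE_PRIVATE || region != REGION_SIMD;
}
```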

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-09-28  Jakub Jelinek  

PR middle-end/102492
* gimplify.c (gimplify_adjust_omp_clauses_1): Don't call the
omp_finish_clause langhook on implicitly added OMP_CLAUSE_PRIVATE
clauses on SIMD constructs.

* g++.dg/gomp/simd-3.C: New test.

--- gcc/gimplify.c.jj   2021-09-18 09:47:08.363574453 +0200
+++ gcc/gimplify.c  2021-09-27 15:44:23.855457483 +0200
@@ -10914,8 +10914,12 @@ gimplify_adjust_omp_clauses_1 (splay_tre
   *list_p = clause;
   struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
   gimplify_omp_ctxp = ctx->outer_context;
-  lang_hooks.decls.omp_finish_clause (clause, pre_p,
-                                      (ctx->region_type & ORT_ACC) != 0);
+  /* Don't call omp_finish_clause on implicitly added OMP_CLAUSE_PRIVATE
+     in simd.  Those are only added for the local vars inside of simd body
+     and they don't need to be e.g. default constructible.  */
+  if (code != OMP_CLAUSE_PRIVATE || ctx->region_type != ORT_SIMD)
+    lang_hooks.decls.omp_finish_clause (clause, pre_p,
+                                        (ctx->region_type & ORT_ACC) != 0);
   if (gimplify_omp_ctxp)
     for (; clause != chain; clause = OMP_CLAUSE_CHAIN (clause))
       if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP
--- gcc/testsuite/g++.dg/gomp/simd-3.C.jj   2021-09-27 15:46:22.726805435 +0200
+++ gcc/testsuite/g++.dg/gomp/simd-3.C  2021-09-27 15:46:15.124911085 +0200
@@ -0,0 +1,16 @@
+// PR middle-end/102492
+// { dg-do compile }
+
+struct S { S (int); };
+void bar (S &);
+
+void
+foo ()
+{
+  #pragma omp simd
+  for (int i = 0; i < 64; i++)
+    {
+      S s = 26;
+      bar (s);
+    }
+}

Jakub



[PATCH] Improve jump threading dump output.

2021-09-28 Thread Aldy Hernandez via Gcc-patches
In analyzing PR102511, it has become abundantly clear that we need
better debugging aids for the jump threader solver.  Currently
debugging these issues is a nightmare if you're not intimately
familiar with the code.  This patch attempts to improve this.

First, I'm enabling path solver dumps with TDF_THREADING.  None of the
available TDF_* flags are a good match, and using TDF_DETAILS would blow
up the dump file, since both threaders continually call the solver to
try out candidates.  This will allow dumping path solver details without
having to resort to hacking the source.
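The flag values from the dumpfile.h hunk and the new DEBUG_SOLVER condition can be checked in isolation with a small sketch. The enum name dump_flag_sketch and the function debug_solver_p are hypothetical. Note that in the macro, & binds tighter than &&, so it reads as dump_file && (dump_flags & TDF_THREADING); the sketch spells those parentheses out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Flag values copied from the dumpfile.h hunk in this patch.  */
enum dump_flag_sketch
{
  TDF_ERROR = (1 << 26),
  TDF_THREADING = (1 << 27),
  TDF_ALL_VALUES = (1 << 29) - 1
};

/* Same test as the new DEBUG_SOLVER macro: solver details are dumped
   only when a dump file is open and the "threading" dump option was
   requested.  */
bool
debug_solver_p (FILE *dump_file, unsigned int dump_flags)
{
  return dump_file != NULL && (dump_flags & TDF_THREADING) != 0;
}
```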

I am also dumping the current value of the registered_jump_thread
debug counter used by the registry.  That way, once a problematic
thread has been narrowed down, it can be examined with
-fdump-*-threading by looking at the solver details surrounding the
matching counter value (which dbgcnt also writes to the dump file).

You still need knowledge of the solver to debug these issues, but at
least now it's not entirely opaque.
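A simplified sketch of the debug counter query added to dbgcnt.c, assuming a single upper limit per counter (the real dbgcnt also supports ranges). Apart from the names dbg_cnt, dbg_cnt_counter and registered_jump_thread, everything here is hypothetical: dbg_cnt both bumps the counter and decides whether the guarded action may proceed, while dbg_cnt_counter only reports the current value, which is what the new [%u] prefix in dump_jump_thread_path prints.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of counters.  */
enum debug_counter
{
  registered_jump_thread,
  debug_counter_number_of_counters
};

static unsigned count[debug_counter_number_of_counters];
/* Hypothetical limit: allow the first two registrations only.  */
static unsigned limit_high[debug_counter_number_of_counters] = { 2 };

/* Bump the counter and report whether the action may proceed.  */
bool
dbg_cnt (enum debug_counter index)
{
  count[index]++;
  return count[index] <= limit_high[index];
}

/* Report the current value without changing it.  */
unsigned
dbg_cnt_counter (enum debug_counter index)
{
  return count[index];
}
```

With the limit of 2 used here, the first two registrations are allowed and the third is refused; printing dbg_cnt_counter next to each registration makes it easy to see which value a given thread corresponds to.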

OK?

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt_counter): New.
* dbgcnt.h (dbg_cnt_counter): New.
* dumpfile.c (dump_options): Add entry for TDF_THREADING.
* dumpfile.h (enum dump_flag): Add TDF_THREADING.
* gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
* tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
debug counter.
---
 gcc/dbgcnt.c|  8 
 gcc/dbgcnt.h|  1 +
 gcc/dumpfile.c  |  1 +
 gcc/dumpfile.h  |  3 +++
 gcc/gimple-range-path.cc|  2 +-
 gcc/tree-ssa-threadupdate.c | 13 +
 6 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 934bbe033ee..6a7eb34cd3e 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -98,6 +98,14 @@ dbg_cnt (enum debug_counter index)
 return false;
 }
 
+/* Return the counter for INDEX.  */
+
+unsigned
+dbg_cnt_counter (enum debug_counter index)
+{
+  return count[index];
+}
+
 /* Compare limit_tuple intervals by first item in descending order.  */
 
 static int
diff --git a/gcc/dbgcnt.h b/gcc/dbgcnt.h
index 17f2091f5a7..3c35dcc3e0a 100644
--- a/gcc/dbgcnt.h
+++ b/gcc/dbgcnt.h
@@ -33,6 +33,7 @@ enum debug_counter {
 
 extern bool dbg_cnt_is_enabled (enum debug_counter index);
 extern bool dbg_cnt (enum debug_counter index);
+extern unsigned dbg_cnt_counter (enum debug_counter index);
 extern void dbg_cnt_process_opt (const char *arg);
 extern void dbg_cnt_list_all_counters (void);
 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 8169daf7f59..e6ead5debe5 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -145,6 +145,7 @@ static const kv_pair dump_options[] =
   {"missed", MSG_MISSED_OPTIMIZATION},
   {"note", MSG_NOTE},
   {"optall", MSG_ALL_KINDS},
+  {"threading", TDF_THREADING},
   {"all", dump_flags_t (TDF_ALL_VALUES
& ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_GRAPH
| TDF_STMTADDR | TDF_RHS_ONLY | TDF_NOUID
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 892bfc9ae90..6c7758dd2fb 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -197,6 +197,9 @@ enum dump_flag
   /* For error.  */
   TDF_ERROR = (1 << 26),
 
+  /* Dumping for range path solver.  */
+  TDF_THREADING = (1 << 27),
+
   /* All values.  */
   TDF_ALL_VALUES = (1 << 29) - 1
 };
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 9da67d2a35b..a29d5318ca9 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -34,7 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 
 // Internal construct to help facilitate debugging of solver.
-#define DEBUG_SOLVER (0 && dump_file)
+#define DEBUG_SOLVER (dump_file && dump_flags & TDF_THREADING)
 
 path_range_query::path_range_query (gimple_ranger &ranger, bool resolve)
   : m_ranger (ranger)
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index cf96c903668..905dea2e6ca 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -218,10 +218,15 @@ dump_jump_thread_path (FILE *dump_file,
   const vec &path,
   bool registering)
 {
-  fprintf (dump_file,
-           "  %s jump thread: (%d, %d) incoming edge; ",
-           (registering ? "Registering" : "Cancelling"),
-           path[0]->e->src->index, path[0]->e->dest->index);
+  if (registering)
+    fprintf (dump_file,
+             "  [%u] Registering jump thread: (%d, %d) incoming edge; ",
+             dbg_cnt_counter (registered_jump_thread),
+             path[0]->e->src->index, path[0]->e->dest->index);
+  else
+    fprintf (dump_file,
+             "  Cancelling jump thread: (%d, %d) incoming edge; ",
+             path[0]->e->src->index, path[0]->e->dest->index);
 
   for (unsigned int i = 1; i < path.length (); i++)
 {
-- 
2.31.1



Re: [PATCH] i386: Don't emit fldpi etc. if -frounding-math [PR102498]

2021-09-28 Thread Uros Bizjak via Gcc-patches
On Tue, Sep 28, 2021 at 11:33 AM Jakub Jelinek  wrote:
>
> Hi!
>
> i387 has instructions to store some transcendental numbers into the top of
> stack.  The problem is that what exact bit in the last place one gets for
> those depends on the current rounding mode, the CPU knows the number with
> slightly higher precision.  The compiler assumes rounding to nearest when
> comparing them against constants in the IL, but at runtime the rounding
> can be different and so some of these depending on rounding mode and the
> constant could be 1 ulp higher or smaller than expected.
> We only support changing the rounding mode at runtime if the non-default
> -frounding-math option is used, so the following patch just disables
> using those constants if that flag is on.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-09-28  Jakub Jelinek  
>
> PR target/102498
> * config/i386/i386.c (standard_80387_constant_p): Don't recognize
> special 80387 instruction XFmode constants if flag_rounding_math.
>
> * gcc.target/i386/pr102498.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.c.jj   2021-09-18 09:44:31.720743823 +0200
> +++ gcc/config/i386/i386.c  2021-09-27 16:55:37.928072249 +0200
> @@ -5035,7 +5035,8 @@ standard_80387_constant_p (rtx x)
>/* For XFmode constants, try to find a special 80387 instruction when
>   optimizing for size or on those CPUs that benefit from them.  */
>if (mode == XFmode
> -  && (optimize_function_for_size_p (cfun) || TARGET_EXT_80387_CONSTANTS))
> +  && (optimize_function_for_size_p (cfun) || TARGET_EXT_80387_CONSTANTS)
> +  && !flag_rounding_math)
>  {
>int i;
>
> --- gcc/testsuite/gcc.target/i386/pr102498.c.jj 2021-09-27 17:09:30.387509264 +0200
> +++ gcc/testsuite/gcc.target/i386/pr102498.c    2021-09-27 17:09:22.548618148 +0200
> @@ -0,0 +1,59 @@
> +/* PR target/102498 */
> +/* { dg-do run { target fenv } } */
> +/* { dg-options "-frounding-math" } */
> +
> +#include 
> +#include 
> +
> +__attribute__((noipa)) long double
> +fldlg2 (void)
> +{
> +  return 0.3010299956639811952256464283594894482L;
> +}
> +
> +__attribute__((noipa)) long double
> +fldln2 (void)
> +{
> +  return 0.6931471805599453094286904741849753009L;
> +}
> +
> +__attribute__((noipa)) long double
> +fldl2e (void)
> +{
> +  return 1.4426950408889634073876517827983434472L;
> +}
> +
> +__attribute__((noipa)) long double
> +fldl2t (void)
> +{
> +  return 3.3219280948873623478083405569094566090L;
> +}
> +
> +__attribute__((noipa)) long double
> +fldpi (void)
> +{
> +  return 3.1415926535897932385128089594061862044L;
> +}
> +
> +int
> +main ()
> +{
> +  long double a = fldlg2 ();
> +  long double b = fldln2 ();
> +  long double c = fldl2e ();
> +  long double d = fldl2t ();
> +  long double e = fldpi ();
> +  static int f[] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
> +  int i;
> +  for (i = 0; i < 4; i++)
> +    {
> +      fesetround (f[i]);
> +      if (a != fldlg2 ()
> +          || b != fldln2 ()
> +          || c != fldl2e ()
> +          || d != fldl2t ()
> +          || e != fldpi ())
> +        abort ();
> +    }
> +  return 0;
> +}
>
> Jakub
>
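A sketch of the decision standard_80387_constant_p now makes for the extended constants, assuming a plain table lookup. It leaves out the optimize-for-size / TARGET_EXT_80387_CONSTANTS condition, and the table, struct and function names are hypothetical; the values are the long double literals from pr102498.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table pairing each constant with its i387 load insn.  */
struct x87_const
{
  long double value;
  const char *insn;
};

static const struct x87_const x87_consts[] = {
  { 0.3010299956639811952256464283594894482L, "fldlg2" },
  { 0.6931471805599453094286904741849753009L, "fldln2" },
  { 1.4426950408889634073876517827983434472L, "fldl2e" },
  { 3.3219280948873623478083405569094566090L, "fldl2t" },
  { 3.1415926535897932385128089594061862044L, "fldpi" },
};

/* Return the special load instruction for V, or NULL if V has to be
   loaded from memory.  With rounding_math the last bit these insns
   produce depends on the runtime rounding mode, so never use them.  */
const char *
x87_special_load (long double v, bool rounding_math)
{
  if (rounding_math)
    return NULL;
  for (size_t i = 0; i < sizeof x87_consts / sizeof x87_consts[0]; i++)
    if (x87_consts[i].value == v)
      return x87_consts[i].insn;
  return NULL;
}
```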


[PATCH] tree-optimization/99793 - testcase for the PR

2021-09-28 Thread Richard Biener via Gcc-patches
This adds a testcase for the PR which was fixed with the fix for
PR100112.

Tested on x86_64-unknown-linux-gnu, pushed.

2021-09-28  Richard Biener  

PR tree-optimization/99793
* gcc.dg/tree-ssa/pr99793.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr99793.c | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99793.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99793.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99793.c
new file mode 100644
index 000..912744928e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99793.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fstrict-aliasing -fdump-tree-optimized" } */
+
+extern void foo(void);
+static int a, *b = &a, c, *d = &c;
+int main()
+{
+  int **e = &d;
+  if (!((unsigned)((*e = d) == 0) - (*b = 1)))
+    foo();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
-- 
2.31.1


Re: [PATCH] Control all jump threading passes with -fjump-threads.

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, Sep 28, 2021 at 11:42 AM Aldy Hernandez  wrote:
>
>
>
> On 9/28/21 9:41 AM, Richard Biener wrote:
> > On Tue, Sep 28, 2021 at 8:29 AM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 9/28/2021 12:17 AM, Aldy Hernandez wrote:
> >>> On Tue, Sep 28, 2021 at 3:46 AM Jeff Law  wrote:
> 
> 
>  On 9/27/2021 9:00 AM, Aldy Hernandez wrote:
> > Last year I mentioned that -fthread-jumps was being ignored by the
> > majority of our jump threading passes, and Jeff said he'd be in favor
> > of fixing this.
> >
> > This patch remedies the situation, but it does change existing behavior.
> > Currently -fthread-jumps is only enabled for -O2, -O3, and -Os.  This
> > means that even if we restricted all jump threading passes with
> > -fthread-jumps, DOM jump threading would still seep through since it
> > runs at -O1.
> >
> > I propose this patch, but it does mean that DOM jump threading would
> > have to be explicitly enabled with -O1 -fthread-jumps.  An
> > alternative would be to also offer a specific -fno-dom-threading, but
> > that seems icky.
> >
> > OK pending tests?
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
> > flag_thread_jumps.
> > (pass_early_thread_jumps::gate): Same.
> > * tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
> > Return if !flag_thread_jumps.
> > * tree-ssa-threadupdate.c
> > (jt_path_registry::register_jump_thread): Assert that
> > flag_thread_jumps is true.
>  OK.  Clearly this is going to be even better once we disentangle
>  threading from DOM.
> >>> Annoyingly, I had to tweak a few more tests, particularly some
> >>> -Wuninitialized -O1 ones which seem to depend on DOM jump threading to
> >>> give proper diagnostics.  It seems that every change to jump threading
> >>> needs tweaks to the Wuninitialized code :-(.
> >> Well, a lot of jump threading is there to help eliminate false positives
> >> from Wuninitialized by eliminating paths through the CFG that we can
> >> prove never execute at runtime.  So that's not a huge surprise.
> >
> > I would have suggested to enable -fthread-jumps at -O1 instead
> > and eventually just add && flag_expensive_optimizations to the
> > use in cfgcleanup.c to restrict that to -O2+
>
> Hmmm, that's a much better idea.  I was afraid of messing up existing
> behavior, but I suppose adding even more false positives for -O1
> -Wuninitialized is worse.
>
> BTW, I plugged one more tweak to the registry in
> remove_jump_threads_including.  No need to go add things to the removed
> edges hash table, if we're not going to thread.
>
> OK pending tests?

OK.

Richard.

> Aldy


RE: [PATCH 01/13] arm: Add new tests for comparison vectorization with Neon and MVE

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

Sorry for the delay.

> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:15
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 01/13] arm: Add new tests for comparison vectorization
> with Neon and MVE
> 
> This patch mainly adds Neon tests similar to existing MVE ones,
> to make sure we do not break Neon when fixing MVE.
> 
> mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
> with 2.0f and 3.0f constants to help scan-assembler-times.
> 
> 2021-09-01  Christophe Lyon 
> 
>   gcc/testsuite/
>   * gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
>   * gcc.target/arm/simd/neon-compare-1.c: New.
>   * gcc.target/arm/simd/neon-compare-2.c: New.
>   * gcc.target/arm/simd/neon-compare-3.c: New.
>   * gcc.target/arm/simd/neon-compare-scalar-1.c: New.
>   * gcc.target/arm/simd/neon-vcmp-f16.c: New.
>   * gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
>   * gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
>   * gcc.target/arm/simd/neon-vcmp-f32.c: New.
>   * gcc.target/arm/simd/neon-vcmp.c: New.

Thanks,
Kyrill

> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
> new file mode 100644
> index 000..917a95bf141
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
> @@ -0,0 +1,32 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> +
> +#include 
> +
> +#define NB 4
> +
> +#define FUNC(OP, NAME)                                                  \
> +  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
> +    int i;                                                              \
> +    for (i=0; i<NB; i++) {                                              \
> +      dest[i] = (a[i] OP b[i]) ? 2.0f : 3.0f;                           \
> +    }                                                                   \
> +  }
> +
> +FUNC(==, vcmpeq)
> +FUNC(!=, vcmpne)
> +FUNC(<, vcmplt)
> +FUNC(<=, vcmple)
> +FUNC(>, vcmpgt)
> +FUNC(>=, vcmpge)
> +
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* Constant 2.0f.  */
> +/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* Constant 3.0f.  */
> diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
> new file mode 100644
> index 000..2e0222a71f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-add-options arm_neon } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include "mve-compare-1.c"
> +
> +/* 64-bit vectors.  */
> +/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
> +   (register/zero) = 12.  */
> +/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
> +
> +/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
> +/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
> +   otherwise.  */
> +/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
> +
> +/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
> +/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-

RE: [PATCH 02/13] arm: Add tests for PR target/100757

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:15
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 02/13] arm: Add tests for PR target/100757
> 
> These tests currently trigger an ICE which is fixed later in the patch
> series.
> 
> The pr100757*.c testcases are derived from
> gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
> various types and return values different from 0 and 1 to avoid
> commonalization with boolean masks.  In addition, since we should not
> need these masks, the tests make sure they are not present.

Ok, but I'd rather it was committed together with the patch that fixes the ICE.
I don't mind if it's a separate commit or rolled into that patch.

Thanks,
Kyrill
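The .word operands scanned for in the pr100757*.c tests below (and in mve-vcmp-f32-2.c from patch 01/13) are IEEE-754 single-precision bit patterns printed as unsigned decimals. A small helper, hypothetical and not part of the testsuite, makes the mapping easy to check; for example, it shows that 1082130432 encodes 4.0f and 1084227584 encodes 5.0f.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's single-precision encoding as the unsigned
   integer an assembler .word directive would print.  memcpy avoids
   the aliasing problems of a pointer cast.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```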

> 
> 2021-09-01  Christophe Lyon  
> 
>   gcc/testsuite/
>   PR target/100757
>   * gcc.target/arm/simd/pr100757-2.c: New.
>   * gcc.target/arm/simd/pr100757-3.c: New.
>   * gcc.target/arm/simd/pr100757-4.c: New.
>   * gcc.target/arm/simd/pr100757.c: New.
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
> new file mode 100644
> index 000..c2262b4d81e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> +/* Derived from gcc.c-torture/compile/20160205-1.c.  */
> +
> +float a[32];
> +int fn1(int d) {
> +  int c = 4;
> +  for (int b = 0; b < 32; b++)
> +    if (a[b] != 2.0f)
> +      c = 5;
> +  return c;
> +}
> +
> +/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
> +/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value for c.  */
> +/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value for c.  */
> +/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
> +/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
> new file mode 100644
> index 000..e604555c04c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> +/* Copied from gcc.c-torture/compile/20160205-1.c.  */
> +
> +float a[32];
> +float fn1(int d) {
> +  float c = 4.0f;
> +  for (int b = 0; b < 32; b++)
> +    if (a[b] != 2.0f)
> +      c = 5.0f;
> +  return c;
> +}
> +
> +/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
> +/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* Possible value for c (5.0).  */
> +/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* Initial value for c (4.0).  */
> +/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
> +/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
> new file mode 100644
> index 000..c12040c517f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +/* Derived from gcc.c-torture/compile/20160205-1.c.  */
> +
> +unsigned int a[32];
> +int fn1(int d) {
> +  int c = 2;
> +  for (int b = 0; b < 32; b++)
> +    if (a[b])
> +      c = 3;
> +  return c;
> +}
> +
> +/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
> +/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
> +/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
> +/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
> new file mode 100644
> index 000..41d6e4e2d7a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +/* Derived from gcc.c-torture/compile/20160205-1.c.  */
> +
> +int a[32];
> +int fn1(int d) {
> +  int c = 2;
> +  for (int b = 0; b < 32; b++)
> +    if (a[b])
> +      c = 3;

RE: [PATCH 03/13] arm: Add test for PR target/101325

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:15
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 03/13] arm: Add test for PR target/101325
> 
> This test is derived from the one provided in the PR: it is a
> compile-only test because I do not have access to anything that could
> execute it.  We can switch it to 'dg-do run' later, however it would
> be better to write a new executable test to ensure coverage in case
> the tester cannot execute such code (and it will need a new
> arm_v8_1m_mve_hw or similar effective-target).

The test is okay for now.
I think we'll want to have an arm_v8_1m_mve_hw target sooner or later.
Maybe Alex or Andrea can help to write one we can use?

Thanks,
Kyrill

> 
> 2021-09-01  Christophe Lyon  
> 
>   gcc/testsuite/
>   PR target/101325
>   * gcc.target/arm/simd/pr101325.c: New.
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
> new file mode 100644
> index 000..a466683a0b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +unsigned foo(int8x16_t v, int8x16_t w)
> +{
> +  return vcmpeqq (v, w);
> +}
> +/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
> +/* { dg-final { scan-assembler {\tvmrs\t r[0-9]+, P0} } } */
> +/* { dg-final { scan-assembler {\tuxth} } } */
> --
> 2.25.1



Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-28 Thread Richard Biener via Gcc-patches
On Sun, 26 Sep 2021, liuhongt wrote:

> Hi:
> > Please don't add the -fno- option to the warning tests.  As I said,
> > I would prefer to either suppress the vectorization for the failing
> > cases by tweaking the test code or xfail them.  That way future
> > regressions won't be masked by the option.  Once we've moved
> > the warning to a more suitable pass we'll add a new test to verify
> > it works as intended or remove the xfails.
> 
> Remove -fno-tree-vectorize from the warning tests, and add xfails to them.
> The warning information is mainly affected by vectorization of 4 or 2 char
> stores. Some targets support both, some targets only support one of them,
> and some targets support neither, which means the warning information
> differs from target to target.
> I only added xfail { x86_64-*-* i?86-*-* }; other backends may need to
> re-adjust these xfails.
> 
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
>   * doc/invoke.texi (Options That Control Optimization): Update
>   documents.
>   * opts.c (default_options_table): Enable auto-vectorization at
>   O2 with very-cheap cost model.
>   (finish_options): Use cheap cost model for
>   explicit -ftree{,-loop}-vectorize.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
>   * g++.dg/tree-ssa/pr81408.C: Ditto.
>   * g++.dg/warn/Wuninitialized-13.C: Ditto.
>   * gcc.dg/Warray-bounds-51.c: Ditto.
>   * gcc.dg/Warray-parameter-3.c: Ditto.
>   * gcc.dg/Wstringop-overflow-14.c: Ditto.
>   * gcc.dg/Wstringop-overflow-21.c: Ditto.
>   * gcc.dg/Wstringop-overflow-68.c: Ditto.
>   * gcc.dg/Wstringop-overflow-76.c: Ditto.
>   * gcc.dg/gomp/pr46032-2.c: Ditto.
>   * gcc.dg/gomp/pr46032-3.c: Ditto.
>   * gcc.dg/gomp/simd-2.c: Ditto.
>   * gcc.dg/gomp/simd-3.c: Ditto.
>   * gcc.dg/graphite/fuse-1.c: Ditto.
>   * gcc.dg/pr67089-6.c: Ditto.
>   * gcc.dg/pr82929-2.c: Ditto.
>   * gcc.dg/pr82929.c: Ditto.
>   * gcc.dg/store_merging_1.c: Ditto.
>   * gcc.dg/store_merging_11.c: Ditto.
>   * gcc.dg/store_merging_15.c: Ditto.
>   * gcc.dg/store_merging_16.c: Ditto.
>   * gcc.dg/store_merging_19.c: Ditto.
>   * gcc.dg/store_merging_24.c: Ditto.
>   * gcc.dg/store_merging_25.c: Ditto.
>   * gcc.dg/store_merging_28.c: Ditto.
>   * gcc.dg/store_merging_30.c: Ditto.
>   * gcc.dg/store_merging_5.c: Ditto.
>   * gcc.dg/store_merging_7.c: Ditto.
>   * gcc.dg/store_merging_8.c: Ditto.
>   * gcc.dg/strlenopt-85.c: Ditto.
>   * gcc.dg/tree-ssa/dump-6.c: Ditto.
>   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
>   * gcc.dg/tree-ssa/pr47059.c: Ditto.
>   * gcc.dg/tree-ssa/pr86017.c: Ditto.
>   * gcc.dg/tree-ssa/pr91482.c: Ditto.
>   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
>   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
>   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
>   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
>   * gcc.dg/uninit-40.c: Ditto.
>   * gcc.dg/unroll-7.c: Ditto.
>   * gcc.misc-tests/help.exp: Ditto.
>   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
>   * gcc.target/i386/pr34012.c: Ditto.
>   * gcc.target/i386/pr49781-1.c: Ditto.
>   * gcc.target/i386/pr95798-1.c: Ditto.
>   * gcc.target/i386/pr95798-2.c: Ditto.
>   * gfortran.dg/pr77498.f: Ditto.
> ---
>  gcc/common.opt|  2 +-
>  gcc/doc/invoke.texi   |  8 
>  gcc/opts.c| 17 +---
>  .../c-c++-common/Wstringop-overflow-2.c   | 20 +--
>  gcc/testsuite/g++.dg/tree-ssa/pr81408.C   |  2 +-
>  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C |  2 +-
>  gcc/testsuite/gcc.dg/Warray-bounds-51.c   |  2 +-
>  gcc/testsuite/gcc.dg/Warray-parameter-3.c |  4 ++--
>  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c  |  4 ++--
>  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c  |  8 
>  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c  | 10 +-
>  gcc/testsuite/gcc.dg/Wstringop-overflow-76.c  | 20 +--
>  gcc/testsuite/gcc.dg/gomp/pr46032-2.c |  2 +-
>  gcc/testsuite/gcc.dg/gomp/pr46032-3.c |  2 +-
>  gcc/testsuite/gcc.dg/gomp/simd-2.c|  2 +-
>  gcc/testsuite/gcc.dg/gomp/simd-3.c|  2 +-
>  gcc/testsuite/gcc.dg/graphite/fuse-1.c|  2 +-
>  gcc/testsuite/gcc.dg/pr67089-6.c  |  2 +-
>  gcc/testsuite/gcc.dg/pr82929-2.c  |  2 +-
>  gcc/testsuite/gcc.dg/pr82929.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_1.c|  2 +-
>  gcc/testsuite

RE: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> LYON via Gcc-patches
> Sent: 08 September 2021 08:49
> To: Richard Earnshaw ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass
> 
> 
> On 07/09/2021 15:35, Richard Earnshaw wrote:
> >
> >
> > On 07/09/2021 13:05, Christophe LYON wrote:
> >>
> >> On 07/09/2021 11:42, Richard Earnshaw wrote:
> >>>
> >>>
> >>> On 07/09/2021 10:15, Christophe Lyon via Gcc-patches wrote:
>  At some point during the development of this patch series, it appeared
>  that in some cases the register allocator wants “VPR or general”
>  rather than “VPR or general or FP” (which is the same thing as
>  ALL_REGS).  The series does not seem to require this anymore, but it
>  seems to be a good thing to do anyway, to give the register allocator
>  more freedom.
> 
>  2021-09-01  Christophe Lyon 
> 
>  gcc/
>  * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
>  (REG_CLASS_NAMES): Likewise.
>  (REG_CLASS_CONTENTS): Likewise. Add VPR_REG to ALL_REGS.
> 
>  diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>  index 015299c1534..fab39d05916 100644
>  --- a/gcc/config/arm/arm.h
>  +++ b/gcc/config/arm/arm.h
>  @@ -1286,6 +1286,7 @@ enum reg_class
>      SFP_REG,
>      AFP_REG,
>      VPR_REG,
>  +  GENERAL_AND_VPR_REGS,
>      ALL_REGS,
>      LIM_REG_CLASSES
>    };
>  @@ -1315,6 +1316,7 @@ enum reg_class
>      "SFP_REG",    \
>      "AFP_REG",    \
>      "VPR_REG",    \
>  +  "GENERAL_AND_VPR_REGS", \
>      "ALL_REGS"    \
>    }
>    @@ -1343,7 +1345,8 @@ enum reg_class
>      { 0x, 0x, 0x, 0x0040 }, /* SFP_REG
>  */    \
>      { 0x, 0x, 0x, 0x0080 }, /* AFP_REG
>  */    \
>      { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.
>  */    \
>  -  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.
>  */    \
>  +  { 0x5FFF, 0x, 0x, 0x0400 }, /*
>  GENERAL_AND_VPR_REGS.  */ \
>  +  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.
>  */    \
>    }
> >>>
> >>> You've changed the definition of ALL_REGS here (to include VPR_REG),
> >>> but not really explained why.  Is that the source of the underlying
> >>> issue with the 'appeared' you mention?
> >>
> >>
> >> I first added VPR_REG to ALL_REGS, but Richard Sandiford suggested I
> >> create a new GENERAL_AND_VPR_REGS that would be more restrictive. I
> >> did not remove VPR_REG from ALL_REGS because I thought it was an
> >> omission: shouldn't ALL_REGS contain all registers?
> >
> > Surely that should be a separate patch then.
> 
> OK, I can remove that line from this patch and make a separate one-liner
> for ALL_REGS.

Did you end up sending that patch out? (Sorry, I may have missed it in my
archive.)
This patch to add GENERAL_AND_VPR_REGS is okay with the ALL_REGS change
separated out.

Thanks,
Kyrill

> 
> Thanks,
> 
> Christophe
> 
> 
> >
> > R.
> >
> >>
> >>
> >>>
> >>> R.
> >>>
> >>>
>      #define FP_SYSREGS \
> 
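
Christophe's question ("shouldn't ALL_REGS contain all registers?") comes down to a bitwise subset test on the REG_CLASS_CONTENTS rows. A minimal sketch, assuming only the last word of each row matters here: the archive has truncated the other hex words, so only the visible word-3 values (0x0400, 0x000F, 0x040F) are used, and the helper name is hypothetical, not a GCC function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Word 3 of the REG_CLASS_CONTENTS rows quoted in the patch.  The other
   words are truncated in the archive and are deliberately not modelled.  */
#define VPR_REG_W3               0x0400u
#define ALL_REGS_OLD_W3          0x000Fu
#define ALL_REGS_NEW_W3          0x040Fu
#define GENERAL_AND_VPR_REGS_W3  0x0400u

/* Hypothetical helper: true if every register bit set in SUB is also set
   in SUPER, comparing N_WORDS words of a register-class mask.  */
static bool
reg_class_words_subset_p (const unsigned *sub, const unsigned *super,
                          size_t n_words)
{
  for (size_t i = 0; i < n_words; i++)
    if ((sub[i] & ~super[i]) != 0)
      return false;
  return true;
}
```

With the old word 3 (0x000F) the VPR bit is not part of ALL_REGS, while the new value 0x040F covers both VPR_REG and the VPR bit of GENERAL_AND_VPR_REGS; that one-line ALL_REGS change is the part Kyrill asked to be split out.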


Re: [PATCH v3 1/3] reassoc: Do not bias loop-carried PHIs early

2021-09-28 Thread Richard Biener via Gcc-patches
On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:

> Biasing loop-carried PHIs during the 1st reassociation pass interferes
> with reduction chains and does not bring measurable benefits, so do it
> only during the 2nd reassociation pass.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * passes.def (pass_reassoc): Rename parameter to early_p.
> * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
> New variable.
> (phi_rank): Don't bias loop-carried phi ranks
> before vectorization pass.
> (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
> (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
> initializer.
> (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
> value.
> (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
> execute_reassoc.
> (pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
> ---
>  gcc/passes.def |  4 ++--
>  gcc/tree-ssa-reassoc.c | 16 ++--
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/passes.def b/gcc/passes.def
> index d7a1f8c97a6..c5f915d04c6 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -242,7 +242,7 @@ along with GCC; see the file COPYING3.  If not see
>/* Identify paths that should never be executed in a conforming
>program and isolate those paths.  */
>NEXT_PASS (pass_isolate_erroneous_paths);
> -  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
> +  NEXT_PASS (pass_reassoc, true /* early_p */);
>NEXT_PASS (pass_dce);
>NEXT_PASS (pass_forwprop);
>NEXT_PASS (pass_phiopt, false /* early_p */);
> @@ -325,7 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_lower_vector_ssa);
>NEXT_PASS (pass_lower_switch);
>NEXT_PASS (pass_cse_reciprocals);
> -  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
> +  NEXT_PASS (pass_reassoc, false /* early_p */);
>NEXT_PASS (pass_strength_reduction);
>NEXT_PASS (pass_split_paths);
>NEXT_PASS (pass_tracer);
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index 8498cfc7aa8..420c14e8cf5 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
> point 3a in the pass header comment.  */
>  static bool reassoc_insert_powi_p;
>  
> +/* Enable biasing ranks of loop accumulators.  We don't want this before
> +   vectorization, since it interferes with reduction chains.  */
> +static bool reassoc_bias_loop_carried_phi_ranks_p;
> +
>  /* Statistics */
>  static struct
>  {
> @@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
>use_operand_p use;
>gimple *use_stmt;
>  
> +  if (!reassoc_bias_loop_carried_phi_ranks_p)
> +return bb_rank[bb->index];
> +
>/* We only care about real loops (those with a latch).  */
>if (!father->latch)
>  return bb_rank[bb->index];
> @@ -6940,9 +6947,10 @@ fini_reassoc (void)
> optimization of a gimple conditional.  Otherwise returns zero.  */
>  
>  static unsigned int
> -execute_reassoc (bool insert_powi_p)
> +execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
>  {
>reassoc_insert_powi_p = insert_powi_p;
> +  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
>  
>init_reassoc ();
>  
> @@ -6983,15 +6991,19 @@ public:
>  {
>gcc_assert (n == 0);
>insert_powi_p = param;
> +  bias_loop_carried_phi_ranks_p = !param;
>  }
>virtual bool gate (function *) { return flag_tree_reassoc != 0; }
>virtual unsigned int execute (function *)
> -{ return execute_reassoc (insert_powi_p); }
> +  {
> +return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
> +  }
>  
>   private:
>/* Enable insertion of __builtin_powi calls during execute_reassoc.  See
>   point 3a in the pass header comment.  */
>bool insert_powi_p;
> +  bool bias_loop_carried_phi_ranks_p;
>  }; // class pass_reassoc
>  
>  } // anon namespace
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
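
The effect of the single pass parameter on both flags can be modelled in a few lines. A sketch under stated assumptions: struct reassoc_flags, reassoc_flags_for_pass and model_phi_rank are hypothetical stand-ins, not GCC code. Only PHI_LOOP_BIAS and the set_param mapping (insert_powi_p = param, bias = !param) come from the patch, and the biased rank is assumed to be the latch block rank plus PHI_LOOP_BIAS, as the phi_rank comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHI_LOOP_BIAS (1 << 15)   /* value quoted from tree-ssa-reassoc.c */

/* Hypothetical stand-in for the two pass_reassoc members.  */
struct reassoc_flags
{
  bool insert_powi_p;
  bool bias_loop_carried_phi_ranks_p;
};

/* Mirrors pass_reassoc::set_param in the patch: the early pass inserts
   __builtin_powi calls, only the late pass biases loop-carried PHIs.  */
static struct reassoc_flags
reassoc_flags_for_pass (bool early_p)
{
  struct reassoc_flags f = { early_p, !early_p };
  return f;
}

/* Simplified model of phi_rank: a loop-carried PHI gets the extra bias
   only when the current pass allows it; otherwise it keeps its own
   block rank.  */
static int64_t
model_phi_rank (struct reassoc_flags f, bool loop_carried_p,
                int64_t phi_bb_rank, int64_t latch_bb_rank)
{
  if (!f.bias_loop_carried_phi_ranks_p || !loop_carried_p)
    return phi_bb_rank;
  return latch_bb_rank + PHI_LOOP_BIAS;
}
```

Keeping the unbiased rank in the early pass leaves reduction chains intact for the vectorizer; the bias is applied only in the post-vectorization reassociation pass.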


RE: [PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:17
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 05/13] arm: Add support for VPR_REG in
> arm_class_likely_spilled_p
> 
> VPR_REG is the only register in its class, so it should be handled by
> TARGET_CLASS_LIKELY_SPILLED_P.  No test fails without this patch, but
> it seems it should be implemented.

The documentation for the hook does recommend returning true when there is
only one register in the class, so this seems sensible to me.  It's supposed
to affect optimisation rather than correctness, so I'm in favour of it.
Ok.
Thanks,
Kyrill

> 
> 2021-09-01  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 11dafc70067..1222cb0d0fe 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -29307,6 +29307,9 @@ arm_class_likely_spilled_p (reg_class_t rclass)
>|| rclass  == CC_REG)
>  return true;
> 
> +  if (TARGET_HAVE_MVE && (rclass == VPR_REG))
> +return true;
> +
>return false;
>  }
> 
> --
> 2.25.1
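
Kyrill's reasoning can be made concrete: the hook's documented default treats a class as likely spilled when it contains exactly one register, which a population count over the class mask decides. A minimal sketch; the function names are hypothetical, and 0x0400 is the single VPR bit from word 3 of the VPR_REG row in patch 04/13.

```c
#include <assert.h>
#include <stdbool.h>

/* Count the registers in a one-word class mask by repeatedly clearing the
   lowest set bit.  */
static int
reg_class_word_size (unsigned contents)
{
  int n = 0;
  while (contents != 0)
    {
      contents &= contents - 1;
      n++;
    }
  return n;
}

/* Hypothetical model of the documented default for
   TARGET_CLASS_LIKELY_SPILLED_P: a class that contains exactly one
   register is likely to be spilled.  */
static bool
class_likely_spilled_by_default_p (unsigned contents)
{
  return reg_class_word_size (contents) == 1;
}
```

Because arm_class_likely_spilled_p overrides the default, single-register classes such as VPR_REG have to be listed explicitly, which is what the patch does under TARGET_HAVE_MVE.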



RE: [PATCH 06/13] arm: Fix mve_vmvnq_n_ argument mode

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Christophe
> Lyon via Gcc-patches
> Sent: 07 September 2021 10:17
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 06/13] arm: Fix mve_vmvnq_n_ argument
> mode
> 
> The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
>  iterator instead of HI in mve_vmvnq_n_.

Ok. This can go in independently from the rest if testing is ok.
Thanks,
Kyrill

> 
> 2021-09-03  Christophe Lyon  
> 
>   gcc/
>   * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
>   for operand 1.
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index e393518ea88..14d17060290 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
>  (define_insn "mve_vmvnq_n_"
>[
> (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> - (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> + (unspec:MVE_5 [(match_operand: 1
> "immediate_operand" "i")]
>VMVNQ_N))
>]
>"TARGET_HAVE_MVE"
> --
> 2.25.1



Re: [PATCH v3 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-28 Thread Richard Biener via Gcc-patches
On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:

> PR tree-optimization/49749 introduced code that shortens dependency
> chains containing loop accumulators by placing them last on operand
> lists of associative operations.
> 
> The 456.hmmer benchmark on s390 could benefit from this.  However, the
> code that needs it modifies the loop accumulator before using it, and
> since only so-called loop-carried phis are treated as loop accumulators,
> the code in its present form doesn't really help.  According to Bill
> Schmidt, the original author, such a conservative approach was chosen
> to avoid unnecessarily swapping operands, which might cause
> unpredictable effects.  However, giving special treatment to forms of
> loop accumulators is acceptable.
> 
> A loop-carried phi is defined as a single-use phi that is used in the
> same innermost loop it is defined in, and at least one of whose
> arguments is defined in that same innermost loop.  Given this, it seems
> natural to treat single uses of such phis as phis themselves.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   * tree-ssa-reassoc.c (biased_names): New global.
>   (propagate_bias_p): New function.
>   (loop_carried_phi): Remove.
>   (propagate_rank): Propagate bias along single uses.
>   (get_rank): Update biased_names when needed.
> ---
>  gcc/tree-ssa-reassoc.c | 109 -
>  1 file changed, 74 insertions(+), 35 deletions(-)
> 
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index 420c14e8cf5..db9fb4e1cac 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -211,6 +211,10 @@ static int64_t *bb_rank;
>  /* Operand->rank hashtable.  */
>  static hash_map *operand_rank;
>  
> +/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
> +   biased.  */
> +static auto_bitmap biased_names;
> +
>  /* Vector of SSA_NAMEs on which after reassociate_bb is done with
> all basic blocks the CFG should be adjusted - basic blocks
> split right after that SSA_NAME's definition statement and before
> @@ -256,6 +260,53 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
> the rank difference between two blocks.  */
>  #define PHI_LOOP_BIAS (1 << 15)
>  
> +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
> +   operands to the STMT's left-hand side.  The goal is to preserve bias in 
> code
> +   like this:
> +
> + x_1 = phi(x_0, x_2)
> + a = x_1 | 1
> + b = a ^ 2
> + .MEM = b
> + c = b + d
> + x_2 = c + e
> +
> +   That is, we need to preserve bias along single-use chains originating from
> +   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
> +   uses, because only they participate in rank propagation.  */
> +static bool
> +propagate_bias_p (gimple *stmt)
> +{
> +  use_operand_p use;
> +  imm_use_iterator use_iter;
> +  gimple *single_use_stmt = NULL;
> +
> +  if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_reference)
> +return false;
> +
> +  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
> +{
> +  gimple *current_use_stmt = USE_STMT (use);
> +
> +  if (is_gimple_assign (current_use_stmt)
> +   && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
> + {
> +   if (single_use_stmt != NULL && single_use_stmt != current_use_stmt)
> + return false;
> +   single_use_stmt = current_use_stmt;
> + }
> +}
> +
> +  if (single_use_stmt == NULL)
> +return false;
> +
> +  if (gimple_bb (stmt)->loop_father
> +  != gimple_bb (single_use_stmt)->loop_father)
> +return false;
> +
> +  return true;
> +}
> +
>  /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
> an innermost loop, and the phi has only a single use which is inside
> the loop, then the rank is the block rank of the loop latch plus an
> @@ -313,49 +364,27 @@ phi_rank (gimple *stmt)
>return bb_rank[bb->index];
>  }
>  
> -/* If EXP is an SSA_NAME defined by a PHI statement that represents a
> -   loop-carried dependence of an innermost loop, return TRUE; else
> -   return FALSE.  */
> -static bool
> -loop_carried_phi (tree exp)
> -{
> -  gimple *phi_stmt;
> -  int64_t block_rank;
> -
> -  if (TREE_CODE (exp) != SSA_NAME
> -  || SSA_NAME_IS_DEFAULT_DEF (exp))
> -return false;
> -
> -  phi_stmt = SSA_NAME_DEF_STMT (exp);
> -
> -  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
> -return false;
> -
> -  /* Non-loop-carried phis have block rank.  Loop-carried phis have
> - an additional bias added in.  If this phi doesn't have block rank,
> - it's biased and should not be propagated.  */
> -  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
> -
> -  if (phi_rank (phi_stmt) != block_rank)
> -return true;
> -
> -  return false;
> -}
> -
>  /* Return the maximum of RANK and the rank that should be propagated
> from expression OP.  For mos

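The single-use test in propagate_bias_p can be mirrored on a toy statement model. A sketch under stated assumptions: struct toy_stmt and toy_propagate_bias_p are hypothetical and ignore SSA details; they only reproduce the three checks in the quoted patch (reject memory-reference right-hand sides, count only assignments to SSA names as uses, require one distinct such use in the same loop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified statement model (not GCC's gimple).  */
struct toy_stmt
{
  int loop_id;            /* innermost loop containing the statement */
  bool is_assign_to_ssa;  /* GIMPLE_ASSIGN whose lhs is an SSA_NAME */
  bool is_load;           /* rhs is a memory reference (tcc_reference) */
};

/* USERS lists the statements using STMT's result, possibly repeating the
   same statement when it uses the value more than once.  */
static bool
toy_propagate_bias_p (const struct toy_stmt *stmt,
                      const struct toy_stmt **users, size_t n_users)
{
  const struct toy_stmt *single_use = NULL;

  if (stmt->is_load)
    return false;

  for (size_t i = 0; i < n_users; i++)
    {
      const struct toy_stmt *use = users[i];
      /* Stores such as ".MEM = b" do not take part in rank propagation.  */
      if (!use->is_assign_to_ssa)
        continue;
      if (single_use != NULL && single_use != use)
        return false;
      single_use = use;
    }

  return single_use != NULL && single_use->loop_id == stmt->loop_id;
}
```

In the example from the patch comment, b feeds both ".MEM = b" and "c = b + d"; the store is ignored, so the bias still flows from b to c.
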
Re: [PATCH v3 3/3] reassoc: Test rank biasing

2021-09-28 Thread Richard Biener via Gcc-patches
On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:

> Add both positive and negative tests.

The tests will likely be quite fragile with respect to what is
actually vectorized on which target.  If you move the tests
to gcc.dg/vect/ you could at least do

/* { dg-require-effective-target vect_int } */

do you need to look for the exact GIMPLE IL or is it enough to
verify we are vectorizing the reduction?

Thanks,
Richard.


> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/reassoc-46.c: New test.
>   * gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
>   * gcc.dg/tree-ssa/reassoc-47.c: New test.
>   * gcc.dg/tree-ssa/reassoc-48.c: New test.
>   * gcc.dg/tree-ssa/reassoc-49.c: New test.
>   * gcc.dg/tree-ssa/reassoc-50.c: New test.
>   * gcc.dg/tree-ssa/reassoc-51.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |  7 +
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |  9 ++
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |  9 ++
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++
>  gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 
>  7 files changed, 90 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
> new file mode 100644
> index 000..97563dd929f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
> +
> +#include "reassoc-46.h"
> +
> +/* Check that the loop accumulator is added last.  */
> +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = 
> (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ 
> (?:vect_)?_[\d._]+)} 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h 
> b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
> new file mode 100644
> index 000..e60b490ea0d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
> @@ -0,0 +1,33 @@
> +#define M 1024
> +unsigned int arr1[M];
> +unsigned int arr2[M];
> +volatile unsigned int sink;
> +
> +unsigned int
> +test (void)
> +{
> +  unsigned int sum = 0;
> +  for (int i = 0; i < M; i++)
> +{
> +#ifdef MODIFY
> +  /* Modify the loop accumulator using a chain of operations - this 
> should
> + not affect its rank biasing.  */
> +  sum |= 1;
> +  sum ^= 2;
> +#endif
> +#ifdef STORE
> +  /* Save the loop accumulator into a global variable - this should not
> + affect its rank biasing.  */
> +  sink = sum;
> +#endif
> +#ifdef USE
> +  /* Add a tricky use of the loop accumulator - this should prevent its
> + rank biasing.  */
> +  i = (i + sum) % M;
> +#endif
> +  /* Use addends with different ranks.  */
> +  sum += arr1[i];
> +  sum += arr2[((i ^ 1) + 1) % M];
> +}
> +  return sum;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
> new file mode 100644
> index 000..1b0f0fdabe1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
> +
> +#define MODIFY
> +#include "reassoc-46.h"
> +
> +/* Check that if the loop accumulator is modified using a chain of operations
> +   other than addition, its new value is still added last.  */
> +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = 
> (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ 
> (?:vect_)?_[\d._]+)} 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
> new file mode 100644
> index 000..13836ebe8e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
> +
> +#define STORE
> +#include "reassoc-46.h"
> +
> +/* Check that if the loop accumulator is saved into a global variable, it's
> +   still added last.  */
> +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = 
> (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ 
> (?:vect_)?_[\d._]+)} 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/

RE: [PATCH] Make flag_trapping_math a non-binary Boolean.

2021-09-28 Thread Roger Sayle


Hi Joseph,
Firstly, very many thanks for taking the time to respond, and especially for
mentioning the discussion in PR 54192 (and Marc Glisse's -ffenv-access
patches, but they are a little less relevant).  Indeed, the starting point
for this patch is Richard Biener's proposal in comment #9 of that PR.  That
you've partially misunderstood the goal of this patch is encouraging (if it
were simple to understand/fix, there wouldn't be so many open PRs).
Hopefully, I'm bringing some fresh thinking on how to solve/tackle these
long-standing issues.

Next, I'd like to state that your "five restrictions" ontology is an
excellent starting point, but I'd like to argue that your proposed list of 5
is the wrong shape (insufficiently refined).  Instead, I'd like to
counter-propose that an improvement/refinement of the Myers model is
actually "3 primitive restrictions * N trapping conditions * 2 flow control
sensitivity".

For reference, here's your original list:
> [1] Disallowing code transformations that cause some code to raise more
> exception flags than it would have before.
> [2] Disallowing code transformations that cause some code to raise fewer
> exception flags than it would have before.
> [3] Ensuring the code generated allows for possible non-local control flow
> from exception traps raised by floating-point operations (this is the part
> where -fnon-call-exceptions might be relevant).
> [4] Disallowing code transformations that might affect whether an exact
> underflow exception occurs in some code (not observable through exception
> flags, is observable through trap handlers).
> [5] Ensuring floating-point operations that might raise exception flags are
> not removed, or moved past code (asms or function calls) that might read
> or modify the exception flag state

Firstly, your item [3] concerns the relationship between traps and flow
control, such as C++ exception handling, which, as you correctly point out,
is the role of "-fnon-call-exceptions".  Richard B has recently confirmed
that this only applies to targets/languages supporting C++-style
exceptions, i.e. it is controlled by -fexceptions.  On targets such as
nvptx-none that don't support non-local control flow, stack unwinding or
setjmp/longjmp, i.e. don't support exceptions, this is completely
orthogonal to the others.

Next, your item [4] highlights what I consider the underlying problem that
has been overlooked until now: different kinds of traps are
observationally/behaviourally different.  Above you describe "underflow",
but likewise there are traps for inexact results ("2.0 / 3.0"), traps for
division by 0.0 (which invokes undefined behaviour in C++, but sometimes
not in C), and distinctions between quiet and signaling NaNs.  Your
primitive restrictions [1], [2] and [5] may apply differently to these
different kinds of exceptions.

More relevant than Marc Glisse's -fenv-access is actually my
-fpreserve-traps patch from July:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574885.html
which tackles restriction [5] (and perhaps [2]).

Working towards the Myers restriction model, I believe we'd be a significant
step closer with three (command line) flags (or families of flags):

-ftrapping-math     related to Myers restrictions [1], [2], [5]
-fpreserve-traps    related to Myers restriction [5]
-fcounted-traps     related to Myers restriction [2]

The insight that untangles the Gordian knot is that these three options are
not simple true/false binary flags, but actually (bit) sets of exception
types (hopefully all actually using the same TRAPPING_MATH enumeration).

Consider the following four lines of C++:
constexpr auto t1 = 2.0 / 3.0;
constexpr auto t2 = std::numeric_limits::quiet_NaN() == 0.0;
constexpr auto t3 = std::numeric_limits::quiet_NaN() < 0.0;
constexpr auto t4 = 1.0 / 0.0;
which under IEEE generate four different types of exception but, as you've
expertly confirmed, (sometimes) have different behaviours under the C++
standard.
Treating all trapping conditions identically is clearly insufficient.
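
A minimal sketch of why a set is more expressive than a Boolean here. The enum trap_kind bits and may_constant_fold_p are hypothetical names, not the patch's TRAPPING_MATH enumeration, whose final form is still open. Under IEEE 754, 2.0 / 3.0 raises inexact, a quiet comparison such as == on a quiet NaN raises nothing, < on a quiet NaN raises invalid, and 1.0 / 0.0 raises divide-by-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exception-kind bits, one per IEEE exception.  */
enum trap_kind
{
  TRAP_INEXACT   = 1 << 0,
  TRAP_DIVBYZERO = 1 << 1,
  TRAP_INVALID   = 1 << 2,
  TRAP_OVERFLOW  = 1 << 3,
  TRAP_UNDERFLOW = 1 << 4
};

/* With a Boolean flag, any operation that may trap blocks constant
   folding.  With a set, folding is blocked only when the operation may
   raise an exception kind that the user asked to be honoured.  */
static bool
may_constant_fold_p (unsigned honoured_traps, unsigned raised_traps)
{
  return (honoured_traps & raised_traps) == 0;
}
```

Honouring only divide-by-zero and invalid, for example, still lets 2.0 / 3.0 be folded while keeping 1.0 / 0.0 and the NaN < 0.0 comparison at run time; a single Boolean cannot express that split.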

Hopefully, the argument/proposal above is sufficient to convince the list
that we need some form of enumeration (following Richard Biener's
proposal).  Perhaps the devil is in the details, as to what the final form
of this enumeration should look like [even though at this stage there are
no functional changes yet].

Two very useful references I've been following are:
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_handle.html
https://docs.oracle.com/cd/E88353_01/html/E37846/fex-getexcepthandler-3m.html

Ultimately, the fields and naming of this enumeration are a middle-end
detail, and reflect constant folding transformations that the middle-end
may or may not perform on either trees or RTL.  In theory, they could be
named after line numbers in match.pd, fold-const.c and simplify-rtx.c.  For
example, what IEEE calls "FPE_INTOVF" is more commonly known as TRAPV inside
GCC.  Likewise, IEEE concepts such as FE_INVALID are really just

Re: [PATCH v3 3/3] reassoc: Test rank biasing

2021-09-28 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-09-28 at 13:28 +0200, Richard Biener wrote:
> On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:
> 
> > Add both positive and negative tests.
> 
> The tests will likely be quite fragile with respect to what is
> actually vectorized on which target.  If you move the tests
> to gcc.dg/vect/ you could at least do
> 
> /* { dg-require-effective-target vect_int } */
> 
> do you need to look for the exact GIMPLE IL or is it enough to
> verify we are vectorizing the reduction?

Actually, I don't think vectorization is that important here: I only
check how many times sum_x = sum_y + _z appears, and use (?:vect_)? for
the prefix, which may or may not be there.

An alternative I considered was to use -fno-tree-vectorize to get
smaller regexes, but I thought it would be nice to know that
vectorization does not mess up reassociation results.

Best regards,
Ilya
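
What the scan-tree-dump-times pattern in these tests counts can be checked outside DejaGnu. A sketch assuming a POSIX <regex.h>: the pattern is an approximate ERE translation of the Tcl one (\d becomes [0-9], (?:...) becomes a plain group), the function name is hypothetical, and counting matching lines only approximates DejaGnu's count of all matches in the dump.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Approximate POSIX ERE form of the reassoc-4x scan pattern.  */
static const char sum_update_re[] =
  "(vect_)?sum_[0-9._]+ = "
  "((vect_)?_[0-9._]+ \\+ (vect_)?sum_[0-9._]+"
  "|(vect_)?sum_[0-9._]+ \\+ (vect_)?_[0-9._]+)";

/* Count the dump lines containing an accumulator update of the form
   sum_x = sum_y + _z (either operand order, optional vect_ prefixes).
   Returns -1 if the pattern fails to compile.  */
static int
count_sum_updates (const char **lines, size_t n_lines)
{
  regex_t re;
  int count = 0;

  if (regcomp (&re, sum_update_re, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  for (size_t i = 0; i < n_lines; i++)
    if (regexec (&re, lines[i], 0, NULL, 0) == 0)
      count++;
  regfree (&re);
  return count;
}
```

Fed with hypothetical dump lines, both scalar updates (sum_12 = _5 + sum_10, sum_14 = sum_12 + _7) and vectorized ones (vect_sum_12.8_30 = vect__5.7_28 + vect_sum_10.6_26) count, while sum_16 = _5 + _6 does not.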



Re: [PATCH] Loop unswitching: support gswitch statements.

2021-09-28 Thread Richard Biener via Gcc-patches
On Wed, Sep 15, 2021 at 10:46 AM Martin Liška  wrote:
>
> Hello.
>
> The patch extends the loop unswitching pass so that gswitch
> statements are supported. The pass now uses ranger, which marks
> switch edges that are known to be unreachable in a versioned loop.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * tree-cfg.c (gimple_lv_add_condition_to_bb): Support non-gimple
> expressions that need to be gimplified.
> * tree-ssa-loop-unswitch.c (tree_unswitch_loop): Add new
> cond_edge parameter.
> (tree_may_unswitch_on): Support gswitch statements.
> (clean_up_switches): New function.
> (tree_ssa_unswitch_loops): Call clean_up_switches.
> (simplify_using_entry_checks): Removed and replaced with ranger.
> (tree_unswitch_single_loop): Change assumptions.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/loop-unswitch-6.c: New test.
> * gcc.dg/loop-unswitch-7.c: New test.
> * gcc.dg/loop-unswitch-8.c: New test.
> * gcc.dg/loop-unswitch-9.c: New test.
>
> Co-Authored-By: Richard Biener 
> ---
>   gcc/testsuite/gcc.dg/loop-unswitch-6.c |  56 +
>   gcc/testsuite/gcc.dg/loop-unswitch-7.c |  45 
>   gcc/testsuite/gcc.dg/loop-unswitch-8.c |  28 +++
>   gcc/testsuite/gcc.dg/loop-unswitch-9.c |  34 +++
>   gcc/tree-cfg.c |   7 +-
>   gcc/tree-ssa-loop-unswitch.c   | 284 ++---
>   6 files changed, 374 insertions(+), 80 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-7.c
>   create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-8.c
>   create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-9.c
>
> diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-6.c 
> b/gcc/testsuite/gcc.dg/loop-unswitch-6.c
> new file mode 100644
> index 000..8a022e0f200
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-unswitch-6.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details 
> --param=max-unswitch-insns=1000 --param=max-unswitch-level=10" } */
> +
> +int
> +__attribute__((noipa))
> +foo(double *a, double *b, double *c, double *d, double *r, int size, int 
> order)
> +{
> +  for (int i = 0; i < size; i++)
> +  {
> +double tmp, tmp2;
> +
> +switch(order)
> +{
> +  case 0:
> +tmp = -8 * a[i];
> +tmp2 = 2 * b[i];
> +break;
> +  case 1:
> +tmp = 3 * a[i] -  2 * b[i];
> +tmp2 = 5 * b[i] - 2 * c[i];
> +break;
> +  case 2:
> +tmp = 9 * a[i] +  2 * b[i] + c[i];
> +tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
> +break;
> +  case 3:
> +tmp = 3 * a[i] +  2 * b[i] - c[i];
> +tmp2 = b[i] - 2 * c[i] + 8 * d[i];
> +break;
> +  default:
> +__builtin_unreachable ();
> +}
> +
> +double x = 3 * tmp + d[i] + tmp;
> +double y = 3.4f * tmp + d[i] + tmp2;
> +r[i] = x + y;
> +  }
> +
> +  return 0;
> +}
> +
> +#define N 16 * 1024
> +double aa[N], bb[N], cc[N], dd[N], rr[N];
> +
> +int main()
> +{
> +  for (int i = 0; i < 100 * 1000; i++)
> +foo (aa, bb, cc, dd, rr, N, i % 4);
> +}
> +
> +
> +/* Test that we actually unswitched something.  */
> +/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* 
> == 0" "unswitch" } } */
> +/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* 
> == 1" "unswitch" } } */
> +/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* 
> == 2" "unswitch" } } */
> +/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* 
> == 3" "unswitch" } } */
> diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-7.c 
> b/gcc/testsuite/gcc.dg/loop-unswitch-7.c
> new file mode 100644
> index 000..00f2fcff64b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-unswitch-7.c
> @@ -0,0 +1,45 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details 
> --param=max-unswitch-insns=1000 --param=max-unswitch-level=10" } */
> +
> +int
> +foo(double *a, double *b, double *c, double *d, double *r, int size, int 
> order)
> +{
> +  for (int i = 0; i < size; i++)
> +  {
> +double tmp, tmp2;
> +
> +switch(order)
> +{
> +  case 5 ... 6:
> +  case 9:
> +tmp = -8 * a[i];
> +tmp2 = 2 * b[i];
> +break;
> +  case 11:
> +tmp = 3 * a[i] -  2 * b[i];
> +tmp2 = 5 * b[i] - 2 * c[i];
> +break;
> +  case 22:
> +tmp = 9 * a[i] +  2 * b[i] + c[i];
> +tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
> +break;
> +  case 33:
> +tmp = 3 * a[i] +  2 * b[i] - c[i];
> +tmp2 = b[i] - 2 * c[i] + 8 * d[i];
> +break;
> +  default:
> +__builtin_unreachable ();
> +}
> +
> +double x = 3 * 

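Unswitching a loop on a switch amounts to testing the loop-invariant controlling value once and giving each case its own copy of the loop. A hand-written, source-level illustration: the functions are hypothetical and much simpler than loop-unswitch-6.c, and the pass itself works on GIMPLE and relies on ranger to prune the unreachable switch edges in each copy.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the switch on the loop-invariant ORDER is evaluated on
   every iteration.  */
static void
scale_switch_in_loop (const double *a, double *r, size_t n, int order)
{
  for (size_t i = 0; i < n; i++)
    {
      double t;
      switch (order)
        {
        case 0: t = -8 * a[i]; break;
        case 1: t = 3 * a[i]; break;
        default: t = a[i]; break;
        }
      r[i] = t + 1.0;
    }
}

/* Hand-unswitched form: ORDER is tested once, and each case gets its own
   copy of the loop with the switch removed.  */
static void
scale_unswitched (const double *a, double *r, size_t n, int order)
{
  switch (order)
    {
    case 0:
      for (size_t i = 0; i < n; i++)
        r[i] = -8 * a[i] + 1.0;
      break;
    case 1:
      for (size_t i = 0; i < n; i++)
        r[i] = 3 * a[i] + 1.0;
      break;
    default:
      for (size_t i = 0; i < n; i++)
        r[i] = a[i] + 1.0;
      break;
    }
}
```

Both functions perform the same floating-point operations per element, so they produce identical results; the unswitched form simply trades code size for removing the per-iteration dispatch.
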
Re: [RFC] Don't move cold code out of loop by checking bb count

2021-09-28 Thread Richard Biener via Gcc-patches
On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
>
> Updated the patch to v3.  I'm not sure whether you prefer the paste style
> and continuing to link to the previous thread, as Segher dislikes this...
>
>
> [PATCH v3] Don't move cold code out of loop by checking bb count
>
>
> Changes:
> 1. Handle max_loop in determine_max_movement instead of
> outermost_invariant_loop.
> 2. Remove unnecessary changes.
> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
> 4. "gsi_next (&bsi);" in move_computations_worker is kept, since v1 ran
> into an infinite loop without it: the iterator was never actually
> updated.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
>
> There was an earlier patch that tried to avoid moving cold blocks out of loops:
>
> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>
> Richard suggested that we "never hoist anything from a bb with lower execution
> frequency to a bb with higher one in LIM invariantness_dom_walker
> before_dom_children".
>
> In gimple LIM analysis, add find_coldest_out_loop to move invariants to
> the expected target loop: if the profile count of the loop bb is colder
> than the target loop's preheader, the invariant won't be hoisted out of
> the loop.  Likewise for store motion: if all locations of the REF in the
> loop are cold, don't do store motion of it.
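The hoisting rule described above can be pictured with a small, self-contained model. This is a hypothetical sketch under simplifying assumptions, not the patch's find_coldest_out_loop: a loop nest is reduced to an array of preheader counts, ordered from the outermost legal hoisting target (index 0) to the loop currently containing the statement, and `struct toy_loop` and `toy_hoist_target` are invented names. It picks the coldest preheader and refuses to hoist when the statement's own block is colder still.

```c
/* Hypothetical toy model, not GCC code: loops[0] is the outermost loop
   the statement may legally be hoisted to, loops[n - 1] is the loop
   that currently contains it.  Only the preheader count is modelled.  */
struct toy_loop
{
  long preheader_count;
};

/* Return the index of the coldest preheader among loops[0..n-1]
   (ties resolved in favour of the outer loop), or -1 if the block
   holding the statement is colder than even that preheader, in which
   case hoisting would make the statement execute more often.  */
static int
toy_hoist_target (const struct toy_loop *loops, int n, long bb_count)
{
  int coldest = 0;

  for (int i = 1; i < n; i++)
    if (loops[i].preheader_count < loops[coldest].preheader_count)
      coldest = i;

  if (bb_count < loops[coldest].preheader_count)
    return -1;  /* Cold block: leave the statement where it is.  */
  return coldest;
}
```

For a nest whose preheaders run 10, 1000 and 100000 times, a statement in a block executed only 5 times (say, a rarely taken branch) stays put, while one executed 50000 times is hoisted to the outermost preheader.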
>
> SPEC2017 performance evaluation shows a 1% performance improvement for
> intrate GEOMEAN and no obvious regression for others.  Notably,
> 500.perlbench_r +7.52% (perf shows function S_regtry of perlbench is
> largely improved), 548.exchange2_r +1.98% and 526.blender_r +1.00%
> on P8LE.
>
> gcc/ChangeLog:
>
> * loop-invariant.c (find_invariants_bb): Check profile count
> before motion.
> (find_invariants_body): Add argument.
> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
> (determine_max_movement): Use find_coldest_out_loop.
> (move_computations_worker): Adjust and fix iteration update.
> (execute_sm_exit): Check pointer validity.
> (class ref_in_loop_hot_body): New functor.
> (ref_in_loop_hot_body::operator): New.
> (can_sm_ref_p): Use for_all_locs_in_loop.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> ---
>  gcc/loop-invariant.c   | 10 ++--
>  gcc/tree-ssa-loop-im.c | 61 --
>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
>  7 files changed, 165 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index fca0c2b24be..5c3be7bf0eb 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
> call.  */
>
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> +   bool always_executed)
>  {
>rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  if (preheader->count > bb->count)
> +return;
>
>FOR_BB_INSNS (bb, insn)
>  {
> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>unsigned i;
>
>for (i = 0; i < loop->num_nodes; i++)
> -find_invariants_bb (body[i],
> -   bitmap_bit_p (always_reached, i),
> +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> bitmap_bit_p (always_executed, i));
>  }
>
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index 4b187c2cdaf..655fab03442 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -417,6 +417,28 @@ movement_possibility (gimple *stmt)
>return ret;
>  }
>
> +/* Find coldest loop between outmost_loop and loop by comparing profile count.  */
> +
> +static class loop *
> +find_coldest_out_loop (class loop *outmost_loop, class loop *loop,
> +  basic_block curr_bb)
> +{
> +  class loop *cold_loop, *min_loop;
> +  cold_loop = min_loop = outmost_loop;
> +  profile_count min_count = loop_preheader_edge (min_loop)->src->count;
> +
> +  if (curr

Re: [PATCH] Make flag_trapping_math a non-binary Boolean.

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, Sep 28, 2021 at 1:34 PM Roger Sayle  wrote:
>
>
> Hi Joseph,
> Firstly, very many thanks for taking the time to respond, and especially for
> mentioning the discussion in PR 54192 (and Marc Glisse's -ffenv-access
> patches, but they are a little less relevant).  Indeed, the starting point
> for this patch is Richard Biener's proposal in comment #9 for that PR.  That
> you've partially misunderstood the goal of this patch is encouraging (if it
> were simple to understand/fix, there wouldn't be so many open PRs).
> Hopefully, I'm bringing some fresh thinking on how to solve/tackle these
> long-standing issues.
>
> Next, I'd like to state that your "five restrictions" ontology is an
> excellent starting point, but I'd like to argue that your proposed list of 5
> is the wrong shape (insufficiently refined).  Instead, I'd like to
> counter-propose that an improvement/refinement of the Myers model is
> actually "3 primitive restrictions * N trapping conditions * 2 flow control
> sensitivity".
>
> For reference, here's your original list:
> > [1] Disallowing code transformations that cause some code to raise more
> > exception flags than it would have before.
> > [2] Disallowing code transformations that cause some code to raise fewer
> > exception flags than it would have before.
> > [3] Ensuring the code generated allows for possible non-local control flow
> > from exception traps raised by floating-point operations (this is the part
> > where -fnon-call-exceptions might be relevant).
> > [4] Disallowing code transformations that might affect whether an exact
> > underflow exception occurs in some code (not observable through exception
> > flags, is observable through trap handlers).
> > [5] Ensuring floating-point operations that might raise exception flags are
> > not removed, or moved past code (asms or function calls) that might read
> > or modify the exception flag state
>
> Firstly, your item [3] concerns the relationship between traps and flow
> control, such as C++ exception handling, which is as you correctly point out
> the role of "-fnon-call-exceptions", which Richard B has recently confirmed
> only applies to targets/languages supporting C++ style exceptions, i.e. this
> is controlled by -fexceptions.  On targets such as nvptx-none, that don't
> support non-local control flow, stack unwinding nor setjmp/longjmp, i.e.
> don't support exceptions, this is completely orthogonal to the others.
>
> Next, your item [4] highlights what I consider the underlying problem that
> until now has been overlooked: different kinds of traps are
> observationally/behaviourally different.  Above you describe "underflow",
> but likewise there are traps for inexact results ("2.0 / 3.0"), traps for
> division by 0.0, which invokes undefined behaviour in C++ (but sometimes
> not in C), and distinctions between quiet and signaling NaNs.  Your
> primitive restrictions [1], [2] and [5] may apply differently to these
> different kinds of exceptions.
>
> More relevant than Marc Glisse's -fenv-access is actually my
> -fpreserve-traps patch from July:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574885.html
> which tackles restriction [5] (and perhaps [2]).
>
> Working towards the Myers restriction model, I believe we'd be a significant
> step closer with three (command line) flags (or families of flags):
>
> -ftrapping-math   related to Myers restrictions [1], [2], [5]
> -fpreserve-traps  related to Myers restriction [5]
> -fcounted-traps   related to Myers restriction [2]

Just to throw in a comment without intending to interrupt the fruitful
argument...

I'd like to keep changes refined to the frontends / middle-ends until we
sort out the bigger picture and have an approach that is usable in the
actual implementation and also extensible, that is, it doesn't fall apart
when considering the related problems Joseph mentioned (-frounding-math,
FENV access).

And only _then_ think of how to expose this best to the user with new
user-visible options and tunables.  Because those tend to stick around
forever and so mistakes there are much more costly (and it's not that
we don't have too many entangled knobs in the area of math semantics...)

> The insight that untangles the Gordian knot is that these three options are
> not simple true/false binary flags, but actually (bit) sets of exception
> types (hopefully all actually using the same TRAPPING_MATH enumeration).
>
> Consider the following four lines of C++:
> constexpr double t1 = 2.0 / 3.0;
> constexpr bool t2 = std::numeric_limits<double>::quiet_NaN() == 0.0;
> constexpr bool t3 = std::numeric_limits<double>::quiet_NaN() < 0.0;
> constexpr double t4 = 1.0 / 0.0;
> which by IEEE generate four different types of exception, but as you've
> expertly confirmed have (sometimes) different behaviours under the C++
> standard.  Treating all trapping conditions identically is clearly
> insufficient.
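The "(bit) sets of exception types" idea can be sketched in C. The enumeration below is hypothetical (it is not the TRAPPING_MATH enumeration mentioned above, whose contents are not shown here), and `toy_may_delete_op` is an invented helper that models only restriction [2]: an operation may be removed only if it cannot raise any exception kind the user asked to preserve.

```c
/* Hypothetical trap-kind bits; illustrative names only.  */
enum toy_trap_kind
{
  TOY_TRAP_INVALID   = 1u << 0,
  TOY_TRAP_DIVBYZERO = 1u << 1,
  TOY_TRAP_OVERFLOW  = 1u << 2,
  TOY_TRAP_UNDERFLOW = 1u << 3,
  TOY_TRAP_INEXACT   = 1u << 4
};

/* With a set-valued option, deleting an operation is allowed only if
   none of the exception kinds it may raise is in the preserved set.
   A plain boolean could only express "preserve all" or "preserve none".  */
static int
toy_may_delete_op (unsigned preserved_traps, unsigned op_may_raise)
{
  return (preserved_traps & op_may_raise) == 0;
}
```

For instance, with the preserved set `TOY_TRAP_INVALID | TOY_TRAP_DIVBYZERO`, an otherwise dead `2.0 / 3.0` (inexact only) could be deleted, while a dead `1.0 / 0.0` could not.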
>
> Hopefully, the argument/proposal above is sufficien

Re: [committed] libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note (was: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070])

2021-09-28 Thread Thomas Schwinge
Hi!

On 2021-09-27T14:38:56+0200, Tobias Burnus  wrote:
> On 27.09.21 14:07, Tobias Burnus wrote:
>> now committed r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0
>
> I accidentally changed dg-note to dg-message when updating the expected
> output, as the dump has changed. (Copying seemingly the sorry line
> instead of the dg-note lines as template.)

Strange.  ;-P

> Changed back to dg-note & committed as
> r12-3898-gda1f6391b7c255e4e2eea983832120eff4f7d3df.

As shown by offloading testing, a bit more is necessary here; I've
pushed to master branch commit a43ae03a053faad871e6f48099d21e64b8e316cf
'Further test case adjustment re "Fortran: Fix assumed-size to
assumed-rank passing"', see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From a43ae03a053faad871e6f48099d21e64b8e316cf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 Sep 2021 08:05:28 +0200
Subject: [PATCH] Further test case adjustment re "Fortran: Fix assumed-size to
 assumed-rank passing"

Fix-up for recent commit 00f6de9c69119594f7dad3bd525937c94c8200d0
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]",
and commit da1f6391b7c255e4e2eea983832120eff4f7d3df
"libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note".

Due to use of '#if !ACC_MEM_SHARED' conditionals in
'libgomp.oacc-fortran/if-1.f90', 'target { !  openacc_host_selected }'
needs some special care (ignoring the pre-existing mismatch of
'ACC_MEM_SHARED' vs. 'openacc_host_selected').

As seen with GCN offloading, we need to revert to another bit of the
original code in 'libgomp.oacc-fortran/privatized-ref-2.f90'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
---
 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 | 6 ++
 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
index 3089d6a0c43..9eadfcf9738 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
@@ -394,6 +394,7 @@ program main
 
   !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target { ! openacc_host_selected } } .-1 }
+  ! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-2 }
 
 #if !ACC_MEM_SHARED
   if (acc_is_present (a) .eqv. .TRUE.) STOP 21
@@ -408,6 +409,7 @@ program main
   !$acc data copyin (a(1:N)) if (1 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
   ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
+  ! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-3 }
 
 #if !ACC_MEM_SHARED
 if (acc_is_present (a) .eqv. .FALSE.) STOP 23
@@ -416,6 +418,7 @@ program main
 !$acc data copyout (b(1:N)) if (0 == 1)
 ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
 ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
+! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-3 }
 #if !ACC_MEM_SHARED
   if (acc_is_present (b) .eqv. .TRUE.) STOP 24
 #endif
@@ -864,6 +867,7 @@ program main
 
   !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target { ! openacc_host_selected } } .-1 }
+  ! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-2 }
 
 #if !ACC_MEM_SHARED
   if (acc_is_present (a) .eqv. .TRUE.) STOP 56
@@ -878,6 +882,7 @@ program main
   !$acc data copyin (a(1:N)) if (1 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
   ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candida

Re: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070]

2021-09-28 Thread Thomas Schwinge
Hi!

On 2021-09-27T14:07:53+0200, Tobias Burnus  wrote:
> now committed r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0


> Conclusion: Reviews are very helpful :-)

Ha!  :-) (... and I wasn't even involved here!)  ;-P


As testing showed here:

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
> @@ -0,0 +1,68 @@
> +/* Called by assumed_rank_22.f90.  */

> +  if (num == 0)
> +assert (x->dim[2].extent == -1);
> +  else if (num == 20)
> +assert (x->dim[2].extent == 1);
> +  else if (num == 40)
> +{
> +  /* FIXME: - dg-output = 'c_assumed ... OK' checked in .f90 file. */
> +  /* assert (x->dim[2].extent == 0); */
> +  if (x->dim[2].extent == 0)
> + __builtin_printf ("c_assumed - 40 - OK\n");
> +  else
> + __builtin_printf ("ERROR: c_assumed num=%d: "
> +   "x->dim[2].extent = %d != 0\n",
> +   num, x->dim[2].extent);
> +}
> +  else if (num == 60)
> +assert (x->dim[2].extent == 2);
> +  else if (num == 80)
> +assert (x->dim[2].extent == 2);
> +  else if (num == 100)
> +{
> +  /* FIXME: - dg-output = 'c_assumed ... OK' checked in .f90 file. */
> +  /* assert (x->dim[2].extent == 0); */
> +  if (x->dim[2].extent == 0)
> + __builtin_printf ("c_assumed - 100 - OK\n");
> +  else
> + __builtin_printf ("ERROR: c_assumed num=%d: "
> +   "x->dim[2].extent = %d != 0\n",
> +   num, x->dim[2].extent);
> +}
> +  else
> +assert (0);

... the 'ERROR:' prefixes printed do confuse DejaGnu...  As obvious,
pushed to master branch commit 95540a6d1d7b29cdd3ed06fbcb07465804504cfd
"'gfortran.dg/assumed_rank_22_aux.c' messages printed vs. DejaGnu", see
attached.


Grüße
 Thomas


From 95540a6d1d7b29cdd3ed06fbcb07465804504cfd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 Sep 2021 09:02:56 +0200
Subject: [PATCH] 'gfortran.dg/assumed_rank_22_aux.c' messages printed vs.
 DejaGnu

Print lower-case 'error: [...]' instead of upper-case 'ERROR: [...]', to not
confuse the DejaGnu log processing harness into thinking these are DejaGnu
harness ERRORs:

Running /scratch/tschwing/build2-trusty-cs/gcc/build/submit-big/source-gcc/gcc/testsuite/gfortran.dg/dg.exp ...
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
[...]

Fix-up for recent commit 00f6de9c69119594f7dad3bd525937c94c8200d0
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]".

	gcc/testsuite/
	* gfortran.dg/assumed_rank_22_aux.c: Adjust messages printed.
---
 gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c b/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
index 2fbf83d649a..e5fe02135e9 100644
--- a/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
+++ b/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
@@ -29,7 +29,7 @@ c_assumed (CFI_cdesc_t *x, int num)
   if (x->dim[2].extent == 0)
 	__builtin_printf ("c_assumed - 40 - OK\n");
   else
-	__builtin_printf ("ERROR: c_assumed num=%d: "
+	__builtin_printf ("error: c_assumed num=%d: "
 		  "x->dim[2].extent = %d != 0\n",
 		  num, x->dim[2].extent);
 }
@@ -44,7 +44,7 @@ c_assumed (CFI_cdesc_t *x, int num)
   if (x->dim[2].extent == 0)
 	__builtin_printf ("c_assumed - 100 - OK\n");
   else
-	__builtin_printf ("ERROR: c_assumed num=%d: "
+	__builtin_printf ("error: c_assumed num=%d: "
 		  "x->dim[2].extent = %d != 0\n",
 		  num, x->dim[2].extent);
 }
-- 
2.33.0



Re: [PATCH v3 3/3] reassoc: Test rank biasing

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, 28 Sep 2021, Ilya Leoshkevich wrote:

> On Tue, 2021-09-28 at 13:28 +0200, Richard Biener wrote:
> > On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:
> > 
> > > Add both positive and negative tests.
> > 
> > The tests will likely be quite fragile with respect to what is
> > actually vectorized on which target.  If you move the tests
> > to gcc.dg/vect/ you could at least do
> > 
> > /* { dg-require-effective-target vect_int } */
> > 
> > do you need to look for the exact GIMPLE IL or is it enough to
> > verify we are vectorizing the reduction?
> 
> Actually I don't think vectorization is that important here, and I
> only check how many times sum_x = sum_y + _z appears.  So I use
> (?:vect_)?, which may or may not be there.
> 
> An alternative I considered was to use -fno-tree-vectorize to get
> smaller regexes, but I thought it would be nice to know that
> vectorization does not mess up reassociation results.

Ah, OK.  So let's go ahead with the patch unchanged, but be prepared
to deal with eventual fallout here on weird targets.

Thanks,
Richard.


[Patch] libgomp: Only check for 2*sizeof(void*) int type with Fortran [PR96661]

2021-09-28 Thread Tobias Burnus

Found this one lurking around in one of my trees.

It does not solve John's actual issue, namely that hppa64-hp-hpux11.11 does
not have an __int128 alias integer(kind=16) type. The latter is required for
OpenMP's omp_depend_kind, which, as an implementation choice, has to be large
enough to store two pointers (2*sizeof(void*)).

However, when thinking about the check again: It does not make sense to
break the build if only C/C++ is enabled and Fortran disabled.

While that probably has no real-world impact, I still think it makes
sense to fix it.

OK for mainline and GCC 11?

Tobias

libgomp: Only check for 2*sizeof(void*) int type with Fortran [PR96661]

The depend type is a struct with two pointer members for C/C++ - but for
Fortran OpenMP requires an integer type with kind = omp_depend_kind. Thus,
libgomp's configure checks that an integer type/kind with size 2*sizeof(void*)
is available. However, this integer type/kind is not needed when building without
Fortran support. Thus, only check this when Fortran is enabled.
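The decision this patch makes can be summarised as a small predicate. This is a hypothetical C model of the configure.ac logic, not code from libgomp; `toy_depend_kind_ok` and its parameters are invented for illustration. With GCC, `have_int128` could, for example, be derived from the predefined `__SIZEOF_INT128__` macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: omp_depend_kind must name a Fortran integer kind
   as large as two pointers, but only when Fortran is being built.  */
static bool
toy_depend_kind_ok (bool building_fortran, size_t pointer_size,
                    bool have_int128)
{
  size_t depend_kind = 2 * pointer_size;

  if (!building_fortran)
    return true;               /* C/C++ use a two-pointer struct.  */
  if (depend_kind == 16)
    return have_int128;        /* Needs integer(kind=16).  */
  return depend_kind == 8;     /* Otherwise integer(kind=8).  */
}
```

Under this model, a 64-bit target without a 128-bit integer type only fails when Fortran is enabled, which is the behaviour change the patch describes.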

libgomp/
	PR libgomp/96661
	* configure.ac: Only check for int-type = 2*size_t support when
	building with Fortran support.
	* configure: Regenerate.

diff --git a/libgomp/configure b/libgomp/configure
index 6161da579c0..4bc9b381c5c 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -17007,13 +17007,15 @@ fi
 if test $OMP_NEST_LOCK_25_SIZE -gt 8 || test $OMP_NEST_LOCK_25_ALIGN -gt $OMP_NEST_LOCK_25_SIZE; then
   OMP_NEST_LOCK_25_KIND=8
 fi
-if test $OMP_DEPEND_KIND -eq 16; then
-  if test $OMP_INT128_SIZE -ne 16; then
-as_fn_error $? "unsupported system, cannot find Fortran int kind=16, needed for omp_depend_kind" "$LINENO" 5
-  fi
-else
-  if test $OMP_DEPEND_KIND -ne 8; then
-as_fn_error $? "unsupported system, cannot find Fortran integer kind for omp_depend_kind" "$LINENO" 5
+if test "$ac_cv_fc_compiler_gnu" = yes; then
+  if test $OMP_DEPEND_KIND -eq 16; then
+if test $OMP_INT128_SIZE -ne 16; then
+  as_fn_error $? "unsupported system, cannot find Fortran int kind=16, needed for omp_depend_kind" "$LINENO" 5
+fi
+  else
+if test $OMP_DEPEND_KIND -ne 8; then
+  as_fn_error $? "unsupported system, cannot find Fortran integer kind for omp_depend_kind" "$LINENO" 5
+fi
   fi
 fi
 
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index 7df80a32765..bfb613b91f0 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -438,13 +438,15 @@ fi
 if test $OMP_NEST_LOCK_25_SIZE -gt 8 || test $OMP_NEST_LOCK_25_ALIGN -gt $OMP_NEST_LOCK_25_SIZE; then
   OMP_NEST_LOCK_25_KIND=8
 fi
-if test $OMP_DEPEND_KIND -eq 16; then
-  if test $OMP_INT128_SIZE -ne 16; then
-AC_MSG_ERROR([unsupported system, cannot find Fortran int kind=16, needed for omp_depend_kind])
-  fi
-else
-  if test $OMP_DEPEND_KIND -ne 8; then
-AC_MSG_ERROR([unsupported system, cannot find Fortran integer kind for omp_depend_kind])
+if test "$ac_cv_fc_compiler_gnu" = yes; then
+  if test $OMP_DEPEND_KIND -eq 16; then
+if test $OMP_INT128_SIZE -ne 16; then
+  AC_MSG_ERROR([unsupported system, cannot find Fortran int kind=16, needed for omp_depend_kind])
+fi
+  else
+if test $OMP_DEPEND_KIND -ne 8; then
+  AC_MSG_ERROR([unsupported system, cannot find Fortran integer kind for omp_depend_kind])
+fi
   fi
 fi
 


Re: [Patch] libgomp: Only check for 2*sizeof(void*) int type with Fortran [PR96661]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 28, 2021 at 03:00:56PM +0200, Tobias Burnus wrote:
> The depend type is a struct with two pointer members for C/C++ - but for
> Fortran OpenMP requires an integer type with kind = omp_depend_kind. Thus,
> libgomp's configure checks that an integer type/kind with size 2*sizeof(void*)
> is available. However, this integer type/kind is not needed when building
> without Fortran support. Thus, only check this when Fortran is enabled.
> 
> libgomp/
>   PR libgomp/96661
>   * configure.ac: Only check for int-type = 2*size_t support when
>   building with Fortran support.
>   * configure: Regenerate.

Ok, thanks.

Jakub



[PATCH 8/7] ifcvt: Second try in order to avoid unnecessary temporaries

2021-09-28 Thread Robin Dapp via Gcc-patches

Hi,

this patch implements the latest of my attempts to avoid some of the 
unnecessary temporaries noce_convert_multiple currently emits.  I named 
it 8/7 because it actually applies on top of the last series that is not 
yet approved while being a rather minor change.


The idea is to go over the list of convertible sets a second time if, 
during the first try, we noticed that we potentially overwrite the 
condition but no later set makes use of it anymore (because it can rely 
on the CC directly instead).  In that case we omit creating a temporary.
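A toy model of the two-pass idea may help. It is a hypothetical sketch, not the ifcvt.c code: each SET is reduced to two invented flags (`overwrites_cond`, `uses_cond`), and `toy_count_temporaries` only counts the temporaries each pass would create, assuming the choice of which SETs re-use the original comparison does not change between the passes.

```c
#include <stdbool.h>

/* Hypothetical toy model: one entry per SET in the then-block.  */
struct toy_set
{
  bool overwrites_cond;  /* Target overlaps a register used in the condition.  */
  bool uses_cond;        /* Chosen sequence re-uses the original comparison.  */
};

/* First pass: a temporary for every SET that overwrites the condition,
   remembering the last SET that still needs the comparison.  If there is
   such a SET, a second pass creates temporaries only for the SETs that
   come before it, since nothing after it reads the condition.  */
static int
toy_count_temporaries (const struct toy_set *sets, int n)
{
  int last_needs_comparison = -1;
  int temps = 0;

  for (int i = 0; i < n; i++)
    {
      if (sets[i].overwrites_cond)
        temps++;
      if (sets[i].uses_cond)
        last_needs_comparison = i;
    }

  if (last_needs_comparison < 0)
    return temps;  /* No second try.  */

  temps = 0;
  for (int i = 0; i < n; i++)
    if (sets[i].overwrites_cond && i < last_needs_comparison)
      temps++;
  return temps;
}
```

For three SETs where the first overwrites the condition, the second overwrites it and re-uses the comparison, and the third does neither, the first pass would create two temporaries and the second pass only one.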


The whole series was bootstrapped and regtested on s390, x86 and ppc64.

Regards
 Robin

--

gcc/ChangeLog:

* ifcvt.c (noce_convert_multiple_sets): Perform a second try
with fewer temporaries.
commit dd5a0f8d7d39447025d36ed5305709d38fe3f16b
Author: Robin Dapp 
Date:   Fri Sep 17 20:22:10 2021 +0200

ifcvt: Run second pass if it is possible to omit a temporary.

If one of the to-be-converted SETs requires the original comparison
(i.e. in order to generate a min/max insn) but no other insn after it
does, we can omit creating temporaries, thus facilitating costing.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4f3af5e1b00..2243157e32c 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3261,6 +3261,11 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
 
+  int last_needs_comparison = -1;
+  bool second_try = false;
+
+restart:
+
   FOR_BB_INSNS (then_bb, insn)
 {
   /* Skip over non-insns.  */
@@ -3302,8 +3307,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	 Therefore we introduce a temporary every time we are about to
 	 overwrite a variable used in the check.  Costing of a sequence with
 	 these is going to be inaccurate so only use temporaries when
-	 needed.  */
-  if (reg_overlap_mentioned_p (target, cond))
+	 needed.
+
+	 If performing a second try, we know how many insns require a
+	 temporary.  For the last of these, we can omit creating one.  */
+  if (reg_overlap_mentioned_p (target, cond)
+	  && (!second_try || count < last_needs_comparison))
 	temp = gen_reg_rtx (GET_MODE (target));
   else
 	temp = target;
@@ -3386,6 +3395,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	{
 	  seq = seq1;
 	  temp_dest = temp_dest1;
+	  if (!second_try)
+	last_needs_comparison = count;
 	}
   else if (seq2 != NULL_RTX)
 	{
@@ -3409,6 +3420,24 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   unmodified_insns.safe_push (insn);
 }
 
+/* If there are insns that overwrite part of the initial
+   comparison, we can still omit creating temporaries for
+   the last of them.
+   As the second try will always create a less expensive,
+   valid sequence, we do not need to compare and can discard
+   the first one.  */
+if (!second_try && last_needs_comparison >= 0)
+  {
+	end_sequence ();
+	start_sequence ();
+	count = 0;
+	targets.truncate (0);
+	temporaries.truncate (0);
+	unmodified_insns.truncate (0);
+	second_try = true;
+	goto restart;
+  }
+
   /* We must have seen some sort of insn to insert, otherwise we were
  given an empty BB to convert, and we can't handle that.  */
   gcc_assert (!unmodified_insns.is_empty ());


Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-28 Thread Koning, Paul via Gcc-patches



> On Sep 28, 2021, at 2:14 AM, Richard Biener via Gcc-patches 
>  wrote:
> 
> On Tue, Sep 21, 2021 at 4:26 PM Richard Biener via Gcc-patches
>  wrote:
>> 
>> This makes defaults.h choose DWARF2_DEBUG if PREFERRED_DEBUGGING_TYPE
>> is not specified by the target and errors out if DWARF is not
>> supported.
>> 
>> ...
>> 
>> This completes the series of deprecating STABS for GCC 12.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> OK for trunk?
> 
> Ping.

pdp11 is fine.

paul



Re: [PATCH 02/13] arm: Add tests for PR target/100757

2021-09-28 Thread Christophe LYON via Gcc-patches



On 28/09/2021 13:12, Kyrylo Tkachov wrote:



-Original Message-
From: Gcc-patches  On Behalf Of Christophe
Lyon via Gcc-patches
Sent: 07 September 2021 10:15
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 02/13] arm: Add tests for PR target/100757

These tests currently trigger an ICE which is fixed later in the patch
series.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.
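The `.word` values scanned for in the floating-point variants of these tests are IEEE 754 single-precision bit patterns printed as decimal integers. A minimal helper to check them, assuming `float` is IEEE binary32 (as on Arm and x86); `float_word` is an illustrative name, not part of the testsuite:

```c
#include <stdint.h>
#include <string.h>

/* Return the IEEE 754 single-precision bit pattern of F, i.e. the
   integer an assembler .word directive would emit for it.  */
static uint32_t
float_word (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);  /* Well-defined type pun.  */
  return bits;
}
```

For example, 2.0f is 1073741824 (0x40000000), 4.0f is 1082130432 (0x40800000) and 5.0f is 1084227584 (0x40A00000).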

Ok, but I'd rather it was committed together with the patch that fixes the ICE.
I don't mind if it's a separate commit or rolled into that patch.



Sure, I'll wait for the main patch approval. I split it this way to 
hopefully make the reviews easier, to exercise the testcase without the 
fix proposal.


Thanks,

Christophe




Thanks,
Kyrill


2021-09-01  Christophe Lyon  

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
new file mode 100644
index 000..c2262b4d81e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+int fn1(int d) {
+  int c = 4;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value for c.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
new file mode 100644
index 000..e604555c04c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Copied from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+float fn1(int d) {
+  float c = 4.0f;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5.0f;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* Initial value for c (4.0).  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
new file mode 100644
index 000..c12040c517f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+unsigned int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
new file mode 100644
index 000..41d6e4e2d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+    if (a[b])
+      c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\
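
The `.word` constants scanned for in pr100757-3.c are the IEEE 754 single-precision bit patterns of the float values, printed as decimal integers. A minimal sketch for checking which constant corresponds to which float, assuming `float` is a 32-bit IEEE 754 type as on Arm; `float_bits` is a hypothetical helper, not part of the testsuite:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The sketch relies on float being a 32-bit type (IEEE 754 binary32).  */
static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Return the raw bit pattern of F as the unsigned integer an assembler
   would print in a ".word" directive.  */
static uint32_t float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u); /* Copy the bytes; avoids strict-aliasing issues.  */
  return u;
}
```

`float_bits (2.0f)` is 1073741824 (0x40000000), `float_bits (4.0f)` is 1082130432 (0x40800000) and `float_bits (5.0f)` is 1084227584 (0x40a00000), so in fn1 of pr100757-3.c the initial value 4.0f maps to 1082130432 and the possible value 5.0f to 1084227584.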

Re: [PATCH 03/13] arm: Add test for PR target/101325

2021-09-28 Thread Christophe LYON via Gcc-patches



On 28/09/2021 13:14, Kyrylo Tkachov wrote:



-Original Message-
From: Gcc-patches  On Behalf Of Christophe
Lyon via Gcc-patches
Sent: 07 September 2021 10:15
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 03/13] arm: Add test for PR target/101325

This test is derived from the one provided in the PR: it is a
compile-only test because I do not have access to anything that could
execute it.  We can switch it to 'dg-do run' later; however, it would
be better to write a new executable test to ensure coverage in case
the tester cannot execute such code (and it will need a new
arm_v8_1m_mve_hw or similar effective-target).
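
An executable variant would need a reference result to compare against. A minimal scalar model of `vcmpeqq` on `int8x16_t` operands, assuming the MVE predicate layout in which the 16-bit VPR.P0 value has one bit per byte of the 128-bit vector (so each 8-bit lane contributes exactly one bit); `ref_vcmpeqq_s8` is a hypothetical helper, not an ACLE intrinsic. Since the result only occupies the low 16 bits, this is presumably also why the compile-only test expects a `uxth` when it is returned as `unsigned`:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a lane-wise equality compare on 16 x int8 lanes.
   Bit I of the result is set when lane I of V equals lane I of W,
   i.e. a 16-bit predicate with one bit per byte lane.  */
static uint16_t ref_vcmpeqq_s8 (const int8_t v[16], const int8_t w[16])
{
  uint16_t mask = 0;
  for (int i = 0; i < 16; i++)
    if (v[i] == w[i])
      mask |= (uint16_t) (1u << i);
  return mask;
}
```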

The test is okay for now.
I think we'll want to have an arm_v8_1m_mve_hw target sooner or later.
Maybe Alex or Andrea can help to write one we can use?



Since I posted the patch series, QEMU has gained support for MVE; I plan
to write a similar executable testcase.


There's already an executable testcase in the PR.

Thanks

Christophe




Thanks,
Kyrill


2021-09-01  Christophe Lyon  

gcc/testsuite/
PR target/101325
* gcc.target/arm/simd/pr101325.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
new file mode 100644
index 000..a466683a0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
+/* { dg-final { scan-assembler {\tvmrs\t r[0-9]+, P0} } } */
+/* { dg-final { scan-assembler {\tuxth} } } */
--
2.25.1


Re: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass

2021-09-28 Thread Christophe LYON via Gcc-patches



On 28/09/2021 13:18, Kyrylo Tkachov wrote:

Hi Christophe,


-Original Message-
From: Gcc-patches  On Behalf Of Christophe
LYON via Gcc-patches
Sent: 08 September 2021 08:49
To: Richard Earnshaw ; gcc-patc...@gcc.gnu.org
Subject: Re: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass


On 07/09/2021 15:35, Richard Earnshaw wrote:


On 07/09/2021 13:05, Christophe LYON wrote:

On 07/09/2021 11:42, Richard Earnshaw wrote:


On 07/09/2021 10:15, Christophe Lyon via Gcc-patches wrote:

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

2021-09-01  Christophe Lyon 

 gcc/
 * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
 (REG_CLASS_NAMES): Likewise.
 (REG_CLASS_CONTENTS): Likewise. Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c1534..fab39d05916 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1286,6 +1286,7 @@ enum reg_class
     SFP_REG,
     AFP_REG,
     VPR_REG,
+  GENERAL_AND_VPR_REGS,
     ALL_REGS,
     LIM_REG_CLASSES
   };
@@ -1315,6 +1316,7 @@ enum reg_class
     "SFP_REG",    \
     "AFP_REG",    \
     "VPR_REG",    \
+  "GENERAL_AND_VPR_REGS", \
     "ALL_REGS"    \
   }
@@ -1343,7 +1345,8 @@ enum reg_class
     { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */    \
     { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */    \
     { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */    \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */    \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */    \
   }

You've changed the definition of ALL_REGS here (to include VPR_REG),
but not really explained why.  Is that the source of the underlying
issue with the 'appeared' you mention?


I first added VPR_REG to ALL_REGS, but Richard Sandiford suggested I
create a new GENERAL_AND_VPR_REGS that would be more restrictive. I
did not remove VPR_REG from ALL_REGS because I thought it was an
omission: shouldn't ALL_REGS contain all registers?
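
Each `REG_CLASS_CONTENTS` row is a bitmap over hard register numbers, so "does ALL_REGS contain all registers" reduces to a subset test on those words. A minimal sketch, assuming the four-word layout of the arm.h rows; the last-word values 0x0400, 0x000F and 0x040F come from the hunk above, while the other words (elided in the quoted patch) are made-up placeholders:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_WORDS 4 /* Assumed: four 32-bit words per class, as in arm.h.  */

typedef struct { uint32_t w[N_WORDS]; } reg_set;

/* True if every register in SUB is also in SUPER.  */
static bool reg_set_subset_p (const reg_set *sub, const reg_set *super)
{
  for (int i = 0; i < N_WORDS; i++)
    if (sub->w[i] & ~super->w[i])
      return false;
  return true;
}

/* Union of two classes, e.g. GENERAL_REGS | VPR_REG.  */
static reg_set reg_set_union (const reg_set *a, const reg_set *b)
{
  reg_set r;
  for (int i = 0; i < N_WORDS; i++)
    r.w[i] = a->w[i] | b->w[i];
  return r;
}
```

With VPR_REG's last word 0x0400 and the old ALL_REGS last word 0x000F, the subset test fails (bit 10 is missing); with 0x040F it succeeds.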

Surely that should be a separate patch then.

OK, I can remove that line from this patch and make a separate one-liner
for ALL_REGS.

Did you end up sending that patch out? (Sorry, I may have missed it in my 
archive).
This patch to add GENERAL_AND_VPR_REGS is okay with the ALL_REGS change 
separated out.


No I didn't send it yet: I suspect there will be iterations on the next 
patches in the series, this small change alone wasn't worth sending a v2 :-)


Thanks,

Christophe




Thanks,
Kyrill


Thanks,

Christophe



R.




R.



     #define FP_SYSREGS \



Re: [PATCH] Improve jump threading dump output.

2021-09-28 Thread Jeff Law via Gcc-patches




On 9/28/2021 3:45 AM, Aldy Hernandez wrote:

In analyzing PR102511, it has become abundantly clear that we need
better debugging aids for the jump threader solver.  Currently
debugging these issues is a nightmare if you're not intimately
familiar with the code.  This patch attempts to improve this.

First, I'm enabling path solver dumps with TDF_THREADING.  None of the
available TDF_* flags are a good match, and using TDF_DETAILS would blow
up the dump file, since both threaders continually call the solver to
try out candidates.  This will allow dumping path solver details without
having to resort to hacking the source.

In the solver, I am also dumping the current registered_jump_thread dbg
counter used by the registry.  That way, once a problematic thread has been
narrowed down, it can be examined with -fdump-*-threading by looking at the
solver details surrounding the appropriate counter (which the dbgcnt
also dumps to the dump file).

You still need knowledge of the solver to debug these issues, but at
least now it's not entirely opaque.
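
The mechanism described above pairs a dedicated dump-flag bit with a debug counter whose current value is printed next to each registered thread. A minimal sketch of that pattern; the flag values, `solver_dump_p`, and the counter struct are hypothetical stand-ins, not GCC's actual `dump_flag`, `DEBUG_SOLVER`, or dbgcnt definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical dump flags; each option owns one bit.  */
enum dump_flag_sketch
{
  SK_TDF_DETAILS   = 1u << 0,
  SK_TDF_THREADING = 1u << 1
};

/* Solver dumps key on the dedicated bit only, so enabling
   SK_TDF_DETAILS alone does not flood the dump with solver output.  */
static bool solver_dump_p (uint32_t flags)
{
  return (flags & SK_TDF_THREADING) != 0;
}

/* Hypothetical debug counter: each query increments the count and
   allows the action while the count stays within LIMIT.  */
struct dbg_counter_sketch
{
  unsigned count;
  unsigned limit;
};

static bool dbg_counter_allow (struct dbg_counter_sketch *c)
{
  c->count++;
  return c->count <= c->limit;
}

/* Print the current counter value with each registered thread, so a
   problematic thread can be located in the dump by its number.  */
static void dump_thread (FILE *out, const struct dbg_counter_sketch *c)
{
  fprintf (out, "registering jump thread (counter = %u)\n", c->count);
}
```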

OK?

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt_counter): New.
* dbgcnt.h (dbg_cnt_counter): New.
* dumpfile.c (dump_options): Add entry for TDF_THREADING.
* dumpfile.h (enum dump_flag): Add TDF_THREADING.
* gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
* tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
debug counter.

OK.

Note we've got massive failures in the tester starting sometime
yesterday, and I suspect they are all due to the threader work.  So I'm going to slow
down on reviews of that code as we stabilize stuff.


jeff



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Patrick Palka via Gcc-patches
On Tue, 28 Sep 2021, Jakub Jelinek via Gcc-patches wrote:

> Hi!
> 
> The testcases in the patch are either miscompiled or ICE with checking,
> because the defaulted operator== is synthesized too early (but only if
> constexpr), while the corresponding class type is still incomplete.
> The problem is that at that point the bitfield FIELD_DECLs still have
> their underlying type as TREE_TYPE rather than an integral type with their
> precision, and when layout_class_type is called for the class soon after
> that, it changes those types, but the types of the COMPONENT_REFs stay the
> way they were when synthesize_method ran for operator==, and the
> middle-end is then upset by the type mismatch.
> As determining the exact type isn't a one-liner but quite long code,
> especially for over-sized bitfields, I think it is best to just not
> synthesize the comparison operators so early (the defaulted_late_check
> change) and to call defaulted_late_check for them once again as soon as the
> class is complete.
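
For reference, the defaulted operator== in the tests is expected to compare the named bit-fields member by member (unnamed bit-fields such as `int : 0` are not members and take no part). A hand-written C equivalent for `struct C` from spaceship-eq11.C, assuming `unsigned int` bit-fields in place of `unsigned char` ones since C only guarantees the former; `c_equal` and `set_b` are illustrative names:

```c
#include <assert.h>
#include <stdbool.h>

/* C counterpart of struct C in spaceship-eq11.C.  */
struct C
{
  unsigned int a : 3;
  unsigned int b : 1;
};

/* What the defaulted operator== should compute: every named member,
   compared at its declared bit-field width.  */
static bool c_equal (const struct C *x, const struct C *y)
{
  return x->a == y->a && x->b == y->b;
}

/* Mirror of foo () in the test: store Y into the 1-bit field.  */
static void set_b (struct C *x, int y)
{
  x->b = y;
}
```

With `e` and `f` zero-initialized, `set_b (&e, 0)` and `set_b (&f, 1)`, `c_equal (&e, &f)` must be false, which is what `main` in the test relies on to return 0.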

Nice, this might also fix PR98712.

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2021-09-28  Jakub Jelinek  
> 
>   PR c++/102490
>   * method.c (defaulted_late_check): Don't synthetize constexpr
>   defaulted comparisons if context is still incomplete type.
>   * class.c (finish_struct_1): Call defaulted_late_check again for defaulted
>   comparisons.
> 
>   * g++.dg/cpp2a/spaceship-eq11.C: New test.
>   * g++.dg/cpp2a/spaceship-eq12.C: New test.
> 
> --- gcc/cp/method.c.jj  2021-09-15 08:55:37.563497558 +0200
> +++ gcc/cp/method.c   2021-09-27 13:48:12.139271830 +0200
> @@ -3160,8 +3160,11 @@ defaulted_late_check (tree fn)
>if (kind == sfk_comparison)
>  {
>/* If the function was declared constexpr, check that the definition
> -  qualifies.  Otherwise we can define the function lazily.  */
> -  if (DECL_DECLARED_CONSTEXPR_P (fn) && !DECL_INITIAL (fn))
> +  qualifies.  Otherwise we can define the function lazily.
> +  Don't do this if the class type is still incomplete.  */
> +  if (DECL_DECLARED_CONSTEXPR_P (fn)
> +   && !DECL_INITIAL (fn)
> +   && COMPLETE_TYPE_P (ctx))
>   {

According to the function comment for defaulted_late_check, won't
COMPLETE_TYPE_P (ctx) always be false here?

> /* Prevent GC.  */
> function_depth++;
> --- gcc/cp/class.c.jj 2021-09-03 09:46:28.801428380 +0200
> +++ gcc/cp/class.c  2021-09-27 14:07:03.465562255 +0200
> @@ -7467,7 +7467,14 @@ finish_struct_1 (tree t)
>   for any static member objects of the type we're working on.  */
>for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
>  if (DECL_DECLARES_FUNCTION_P (x))
> -  DECL_IN_AGGR_P (x) = false;
> +  {
> + /* Synthetize constexpr defaulted comparisons.  */
> + if (!DECL_ARTIFICIAL (x)
> + && DECL_DEFAULTED_IN_CLASS_P (x)
> + && special_function_p (x) == sfk_comparison)
> +   defaulted_late_check (x);
> + DECL_IN_AGGR_P (x) = false;
> +  }
>  else if (VAR_P (x) && TREE_STATIC (x)
>&& TREE_TYPE (x) != error_mark_node
>&& same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
> --- gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C.jj  2021-09-27 14:20:04.723713371 +0200
> +++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C   2021-09-27 14:20:20.387495858 +0200
> @@ -0,0 +1,43 @@
> +// PR c++/102490
> +// { dg-do run { target c++20 } }
> +
> +struct A
> +{
> +  unsigned char a : 1;
> +  unsigned char b : 1;
> +  constexpr bool operator== (const A &) const = default;
> +};
> +
> +struct B
> +{
> +  unsigned char a : 8;
> +  int : 0;
> +  unsigned char b : 7;
> +  constexpr bool operator== (const B &) const = default;
> +};
> +
> +struct C
> +{
> +  unsigned char a : 3;
> +  unsigned char b : 1;
> +  constexpr bool operator== (const C &) const = default;
> +};
> +
> +void
> +foo (C &x, int y)
> +{
> +  x.b = y;
> +}
> +
> +int
> +main ()
> +{
> +  A a{}, b{};
> +  B c{}, d{};
> +  C e{}, f{};
> +  a.b = 1;
> +  d.b = 1;
> +  foo (e, 0);
> +  foo (f, 1);
> +  return a == b || c == d || e == f;
> +}
> --- gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C.jj  2021-09-27 14:20:12.050611625 +0200
> +++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C   2021-09-27 14:20:39.633228602 +0200
> @@ -0,0 +1,5 @@
> +// PR c++/102490
> +// { dg-do run { target c++20 } }
> +// { dg-options "-O2" }
> +
> +#include "spaceship-eq11.C"
> 
>   Jakub
> 
> 



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Patrick Palka via Gcc-patches
On Tue, 28 Sep 2021, Patrick Palka wrote:

> On Tue, 28 Sep 2021, Jakub Jelinek via Gcc-patches wrote:
> 
> > Hi!
> > 
> > The testcases in the patch are either miscompiled or ICE with checking,
> > because the defaulted operator== is synthesized too early (but only if
> > constexpr), while the corresponding class type is still incomplete.
> > The problem is that at that point the bitfield FIELD_DECLs still have
> > their underlying type as TREE_TYPE rather than an integral type with their
> > precision, and when layout_class_type is called for the class soon after
> > that, it changes those types, but the types of the COMPONENT_REFs stay the
> > way they were when synthesize_method ran for operator==, and the
> > middle-end is then upset by the type mismatch.
> > As determining the exact type isn't a one-liner but quite long code,
> > especially for over-sized bitfields, I think it is best to just not
> > synthesize the comparison operators so early (the defaulted_late_check
> > change) and to call defaulted_late_check for them once again as soon as the
> > class is complete.
> 
> Nice, this might also fix PR98712.
> 
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > 2021-09-28  Jakub Jelinek  
> > 
> > PR c++/102490
> > * method.c (defaulted_late_check): Don't synthetize constexpr
> > defaulted comparisons if context is still incomplete type.
> > * class.c (finish_struct_1): Call defaulted_late_check again for defaulted
> > comparisons.
> > 
> > * g++.dg/cpp2a/spaceship-eq11.C: New test.
> > * g++.dg/cpp2a/spaceship-eq12.C: New test.
> > 
> > --- gcc/cp/method.c.jj  2021-09-15 08:55:37.563497558 +0200
> > +++ gcc/cp/method.c 2021-09-27 13:48:12.139271830 +0200
> > @@ -3160,8 +3160,11 @@ defaulted_late_check (tree fn)
> >if (kind == sfk_comparison)
> >  {
> >/* If the function was declared constexpr, check that the definition
> > -qualifies.  Otherwise we can define the function lazily.  */
> > -  if (DECL_DECLARED_CONSTEXPR_P (fn) && !DECL_INITIAL (fn))
> > +qualifies.  Otherwise we can define the function lazily.
> > +Don't do this if the class type is still incomplete.  */
> > +  if (DECL_DECLARED_CONSTEXPR_P (fn)
> > + && !DECL_INITIAL (fn)
> > + && COMPLETE_TYPE_P (ctx))
> > {
> 
> According to the function comment for defaulted_late_check, won't
> COMPLETE_TYPE_P (ctx) always be false here?

If so, I wonder if we could get away with moving this entire fragment
from defaulted_late_check to finish_struct_1 instead of calling
defaulted_late_check from finish_struct_1.

> 
> >   /* Prevent GC.  */
> >   function_depth++;
> > --- gcc/cp/class.c.jj   2021-09-03 09:46:28.801428380 +0200
> > +++ gcc/cp/class.c  2021-09-27 14:07:03.465562255 +0200
> > @@ -7467,7 +7467,14 @@ finish_struct_1 (tree t)
> >   for any static member objects of the type we're working on.  */
> >for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
> >  if (DECL_DECLARES_FUNCTION_P (x))
> > -  DECL_IN_AGGR_P (x) = false;
> > +  {
> > +   /* Synthetize constexpr defaulted comparisons.  */
> > +   if (!DECL_ARTIFICIAL (x)
> > +   && DECL_DEFAULTED_IN_CLASS_P (x)
> > +   && special_function_p (x) == sfk_comparison)
> > + defaulted_late_check (x);
> > +   DECL_IN_AGGR_P (x) = false;
> > +  }
> >  else if (VAR_P (x) && TREE_STATIC (x)
> >  && TREE_TYPE (x) != error_mark_node
> >  && same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
> > --- gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C.jj  2021-09-27 14:20:04.723713371 +0200
> > +++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C 2021-09-27 14:20:20.387495858 +0200
> > @@ -0,0 +1,43 @@
> > +// PR c++/102490
> > +// { dg-do run { target c++20 } }
> > +
> > +struct A
> > +{
> > +  unsigned char a : 1;
> > +  unsigned char b : 1;
> > +  constexpr bool operator== (const A &) const = default;
> > +};
> > +
> > +struct B
> > +{
> > +  unsigned char a : 8;
> > +  int : 0;
> > +  unsigned char b : 7;
> > +  constexpr bool operator== (const B &) const = default;
> > +};
> > +
> > +struct C
> > +{
> > +  unsigned char a : 3;
> > +  unsigned char b : 1;
> > +  constexpr bool operator== (const C &) const = default;
> > +};
> > +
> > +void
> > +foo (C &x, int y)
> > +{
> > +  x.b = y;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  A a{}, b{};
> > +  B c{}, d{};
> > +  C e{}, f{};
> > +  a.b = 1;
> > +  d.b = 1;
> > +  foo (e, 0);
> > +  foo (f, 1);
> > +  return a == b || c == d || e == f;
> > +}
> > --- gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C.jj  2021-09-27 14:20:12.050611625 +0200
> > +++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C 2021-09-27 14:20:39.633228602 +0200
> > @@ -0,0 +1,5 @@
> > +// PR c++/102490
> > +// { dg-do run { target c++20 } }
> > +// { dg-options "-O2" }
> > +
> > +#include "spaceship-eq11.C"
> > 
> > Jakub
> > 
> > 
> 



Re: [PATCH] Improve jump threading dump output.

2021-09-28 Thread Aldy Hernandez via Gcc-patches




On 9/28/21 3:47 PM, Jeff Law wrote:



On 9/28/2021 3:45 AM, Aldy Hernandez wrote:

In analyzing PR102511, it has become abundantly clear that we need
better debugging aids for the jump threader solver.  Currently
debugging these issues is a nightmare if you're not intimately
familiar with the code.  This patch attempts to improve this.

First, I'm enabling path solver dumps with TDF_THREADING.  None of the
available TDF_* flags are a good match, and using TDF_DETAILS would blow
up the dump file, since both threaders continually call the solver to
try out candidates.  This will allow dumping path solver details without
having to resort to hacking the source.

In the solver, I am also dumping the current registered_jump_thread dbg
counter used by the registry.  That way, once a problematic thread has been
narrowed down, it can be examined with -fdump-*-threading by looking at the
solver details surrounding the appropriate counter (which the dbgcnt
also dumps to the dump file).

You still need knowledge of the solver to debug these issues, but at
least now it's not entirely opaque.

OK?

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt_counter): New.
* dbgcnt.h (dbg_cnt_counter): New.
* dumpfile.c (dump_options): Add entry for TDF_THREADING.
* dumpfile.h (enum dump_flag): Add TDF_THREADING.
* gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
* tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
debug counter.

OK.

Note we've got massive failures in the tester starting sometime
yesterday, and I suspect they are all due to the threader work.  So I'm going to slow
down on reviews of that code as we stabilize stuff.


Fair enough.  Let's knock those out then.

I just fixed a P1 that was causing undefined behavior.  Other than that, 
I don't have any known regressions apart from the loop crossing 
restrictions which you and I haven't agreed upon yet.  (Well...there
are some archs that need testsuite tweaking, but they're not bugs per se.)


Send anything my way.

Aldy



Re: [PATCH] Improve jump threading dump output.

2021-09-28 Thread Jeff Law via Gcc-patches




On 9/28/2021 7:53 AM, Aldy Hernandez wrote:



On 9/28/21 3:47 PM, Jeff Law wrote:



On 9/28/2021 3:45 AM, Aldy Hernandez wrote:

In analyzing PR102511, it has become abundantly clear that we need
better debugging aids for the jump threader solver.  Currently
debugging these issues is a nightmare if you're not intimately
familiar with the code.  This patch attempts to improve this.

First, I'm enabling path solver dumps with TDF_THREADING. None of the
available TDF_* flags are a good match, and using TDF_DETAILS would
blow up the dump file, since both threaders continually call the solver to
try out candidates.  This will allow dumping path solver details
without having to resort to hacking the source.

In the solver, I am also dumping the current registered_jump_thread dbg
counter used by the registry.  That way, once a problematic thread has been
narrowed down, it can be examined with -fdump-*-threading by looking at the
solver details surrounding the appropriate counter (which the dbgcnt
also dumps to the dump file).

You still need knowledge of the solver to debug these issues, but at
least now it's not entirely opaque.

OK?

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt_counter): New.
* dbgcnt.h (dbg_cnt_counter): New.
* dumpfile.c (dump_options): Add entry for TDF_THREADING.
* dumpfile.h (enum dump_flag): Add TDF_THREADING.
* gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
* tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
debug counter.

OK.

Note we've got massive failures in the tester starting sometime
yesterday, and I suspect they are all due to the threader work.  So I'm going to
slow down on reviews of that code as we stabilize stuff.


Fair enough.  Let's knock those out then.
Yup.  I suspect it's just one or two issues that are showing up on a 
variety of targets.  And as I've said before, that's why we've got a 
tester :-)




I just fixed a P1 that was causing undefined behavior.  Other than 
that, I don't have any known regressions apart from the loop crossing 
restrictions which you and I haven't agreed upon yet. (Well...there
are some archs that need testsuite tweaking, but they're not bugs per 
se.)
These could end up being testsuite issues.  I've only debugged as far as 
"there's a sea of red failures" on the dashboard.




Send anything my way.
Got a docker instance of the first one spinning right now for debugging 
purposes.  I'll look at it after I finish playing chauffeur for my daughter.


jeff


Re: *PING* [PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402,PR102033,PR102034,PR102039,PR102044]

2021-09-28 Thread Jason Merrill via Gcc-patches

On 9/25/21 15:15, nick huang wrote:

Why doesn't the PR92010 fix address these testcases as well?


3. PR92010 creates the new function "rebuild_function_or_method_type". Using
gdb, I traced the PR101402 code as follows:

template struct A {
  typedef T arr[3];
};
template void f(const typename A::arr) { }// #1
template void f(const A::arr);   // #2

I added code to print the function declaration before and after the call to
"maybe_rebuild_function_decl_type" inside "tsubst_function_decl", printing
its parameter "r", which is the function declaration.
Here is the result:
a) Before the call, the function declaration is "void f(int*)"; after the call, it is adjusted to the correct
"void f(const int*)". However, after the line "SET_DECL_IMPLICIT_INSTANTIATION (r);", it falls back to the original
dependent type, "void f(typename A::arr) [with T = int; typename A::arr = int [3]]", until the end. This
completely defeats the purpose of the template substitution.


That's just an artifact of (bug in) how we print it as template+args 
once it's marked as an instantiation; the actual type of the function 
returned from tsubst_function_decl is still void (const int*).
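
The expected adjustment can be illustrated in plain C, which applies the analogous rule: a `const` applied to an array typedef qualifies the element type, and a parameter of array type is then adjusted to a pointer, so `const arr` becomes `const int *`. A minimal sketch using C11 `_Generic`, with `arr` a local stand-in for `A<int>::arr` and `param_is_const_int_ptr` a hypothetical helper:

```c
#include <assert.h>

typedef int arr[3]; /* Stand-in for A<int>::arr, i.e. int[3].  */

/* The parameter is declared as "const arr" (array of 3 const int),
   which is adjusted to "const int *"; the const must not be dropped.  */
static int param_is_const_int_ptr (const arr p)
{
  return _Generic (p, const int *: 1, int *: 0, default: -1);
}
```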


The problem seems to come when we get back to determine_specialization, 
where we have


  // Then, try to form the new function type.  
=>insttype = tsubst (TREE_TYPE (fn), targs, tf_fndecl_type, NULL_TREE);


which does the wrong substitution again, and not the correct one from 
maybe_rebuild_function_decl_type.


Both this substitution check and the constraint check just before it 
seem redundant with the checks we already did in fn_type_unification, so 
the right fix may be to just remove the broken ones here in 
determine_specialization.


Jason



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 28, 2021 at 09:49:11AM -0400, Patrick Palka via Gcc-patches wrote:
> > --- gcc/cp/method.c.jj  2021-09-15 08:55:37.563497558 +0200
> > +++ gcc/cp/method.c 2021-09-27 13:48:12.139271830 +0200
> > @@ -3160,8 +3160,11 @@ defaulted_late_check (tree fn)
> >if (kind == sfk_comparison)
> >  {
> >/* If the function was declared constexpr, check that the definition
> > -qualifies.  Otherwise we can define the function lazily.  */
> > -  if (DECL_DECLARED_CONSTEXPR_P (fn) && !DECL_INITIAL (fn))
> > +qualifies.  Otherwise we can define the function lazily.
> > +Don't do this if the class type is still incomplete.  */
> > +  if (DECL_DECLARED_CONSTEXPR_P (fn)
> > + && !DECL_INITIAL (fn)
> > + && COMPLETE_TYPE_P (ctx))
> > {
> 
> According to the function comment for defaulted_late_check, won't
> COMPLETE_TYPE_P (ctx) always be false here?

It is true in the call from the following hunk.
The function comment at least to me doesn't imply it is always called on
incomplete types, and defaultable_fn_check also calls it.
> 
> >   /* Prevent GC.  */
> >   function_depth++;
> > --- gcc/cp/class.c.jj   2021-09-03 09:46:28.801428380 +0200
> > +++ gcc/cp/class.c  2021-09-27 14:07:03.465562255 +0200
> > @@ -7467,7 +7467,14 @@ finish_struct_1 (tree t)
> >   for any static member objects of the type we're working on.  */
> >for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
> >  if (DECL_DECLARES_FUNCTION_P (x))
> > -  DECL_IN_AGGR_P (x) = false;
> > +  {
> > +   /* Synthetize constexpr defaulted comparisons.  */
> > +   if (!DECL_ARTIFICIAL (x)
> > +   && DECL_DEFAULTED_IN_CLASS_P (x)
> > +   && special_function_p (x) == sfk_comparison)
> > + defaulted_late_check (x);
> > +   DECL_IN_AGGR_P (x) = false;
> > +  }
> >  else if (VAR_P (x) && TREE_STATIC (x)
> >  && TREE_TYPE (x) != error_mark_node
> >  && same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))

Jakub



Fwd: [PATCH][testsuite][aarch64]: Fix gcc.target/aarch64/auto-init-* tests.

2021-09-28 Thread Qing Zhao via Gcc-patches
Ping…

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH][testsuite][aarch64]: Fix gcc.target/aarch64/auto-init-* tests.
Date: September 21, 2021 at 2:20:58 PM CDT
To: gcc-patches Nick Alcock via <gcc-patches@gcc.gnu.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Hi,

This is the patch to fix gcc.target/aarch64/auto-init-* tests.

I have tested the change on aarch64-linux with

make check-gcc RUNTESTFLAGS='--target_board=unix\{-mabi=lp64,-mabi=ilp32,-mabi=lp64/-fstack-clash-protection/-fstack-protector-all,-mabi=ilp32/-fstack-clash-protection/-fstack-protector-all,-mabi=lp64/-march=armv8-a,-mabi=ilp32/-march=armv8.2-a,-mabi=lp64/-march=armv8.4-a,-mabi=ilp32/-march=armv8.6-a,-mabi=lp64/-march=armv8-r\} aarch64.exp=auto-init*'

Everything works fine.

Okay for commit?

Thanks.

Qing

==



From c46888eed5621df842178a85adf7e221c7e00b48 Mon Sep 17 00:00:00 2001
From: qing zhao <qing.z...@oracle.com>
Date: Tue, 21 Sep 2021 12:05:32 -0700
Subject: [PATCH] testsuite: Fix gcc.target/aarch64/auto-init-* tests.

Add -fno-stack-protector to two test cases, and use different pattern
matches for lp64 and ilp32 in the other two.
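
The repeated 0xfe byte suggests auto-init-2.c exercises pattern initialization, where each variable is filled with 0xfe bytes, so the constant that shows up in the RTL dump depends on the variable's size. Under ilp32, pointers (and long) are 4 bytes, which plausibly explains why fewer 64-bit patterns are expected there. A minimal sketch of the size-to-constant relation; `fe_pattern` is a hypothetical helper, not part of the testsuite:

```c
#include <assert.h>
#include <stdint.h>

/* Return an integer made of N repeated 0xfe bytes (1 <= N <= 8), the
   value a pattern-initialized N-byte scalar would hold.  */
static uint64_t fe_pattern (unsigned n)
{
  assert (n >= 1 && n <= 8);
  uint64_t v = 0;
  for (unsigned i = 0; i < n; i++)
    v = (v << 8) | 0xfe;
  return v;
}
```

For example, `fe_pattern (sizeof (void *))` yields 0xfefefefefefefefe on lp64 but 0xfefefefe on ilp32.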

gcc/testsuite/ChangeLog:

2021-09-21  qing zhao  <qing.z...@oracle.com>

* gcc.target/aarch64/auto-init-1.c: Add -fno-stack-protector.
* gcc.target/aarch64/auto-init-7.c: Likewise.
* gcc.target/aarch64/auto-init-2.c: Different pattern match for
lp64 and ilp32.
* gcc.target/aarch64/auto-init-padding-5.c: Likewise.
---
gcc/testsuite/gcc.target/aarch64/auto-init-1.c | 2 +-
gcc/testsuite/gcc.target/aarch64/auto-init-2.c | 3 ++-
gcc/testsuite/gcc.target/aarch64/auto-init-7.c | 2 +-
gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c | 3 ++-
4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
index 0fa4708..a38d91b 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
@@ -1,6 +1,6 @@
/* Verify zero initialization for integer and pointer type automatic variables.  */
/* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand" } */
+/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand -fno-stack-protector" } */

#ifndef __cplusplus
# define bool _Bool
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-2.c b/gcc/testsuite/gcc.target/aarch64/auto-init-2.c
index 2c54e6d..136dbf6 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-2.c
@@ -32,4 +32,5 @@ void foo()
/* { dg-final { scan-rtl-dump-times "0xfe\\\]" 1 "expand" } } */
/* { dg-final { scan-rtl-dump-times "0xfefe" 1 "expand" } } */
/* { dg-final { scan-rtl-dump-times "0xfefefefe" 2 "expand" } } */
-/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 2 "expand" { target lp64 } } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 1 "expand" { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-7.c b/gcc/testsuite/gcc.target/aarch64/auto-init-7.c
index ac27fbe..fde6e56 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-7.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-7.c
@@ -1,6 +1,6 @@
/* Verify zero initialization for array, union, and structure type automatic variables.  */
/* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand" } */
+/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand -fno-stack-protector" } */

struct S
{
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c
index 3c45a6c..7991367 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c
@@ -17,6 +17,7 @@ int foo ()
  return var.four;
}

-/* { dg-final { scan-assembler-times "stp\txzr, xzr," 2 } } */
+/* { dg-final { scan-assembler-times "stp\txzr, xzr," 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "stp\txzr, xzr," 1 { target ilp32 } } } */


--
1.9.1




[PATCH] aarch64: Add command-line support for Armv8.7-a

2021-09-28 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch adds support for -march=armv8.7-a in GCC.
It adds the +ls64 extension that's included in this architecture revision.
Currently this is just the command-line option and +ls64 allows the relevant instructions
to be used in inline assembly. The ACLE defines some intrinsics for them but those can be
added separately later (together with the appropriate __ARM_FEATURE_* predefine).

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

2021-09-27  Kyrylo Tkachov  

* config/aarch64/aarch64.h (AARCH64_FL_LS64): Define.
(AARCH64_FL_V8_7): Likewise.
(AARCH64_FL_FOR_ARCH8_7): Likewise.
* config/aarch64/aarch64-arches.def (armv8.7-a): Define.
* config/aarch64/aarch64-option-extensions.def (ls64): Define.
* doc/invoke.texi: Document the above.


v87.patch
Description: v87.patch


Re: [r12-3899 Regression] FAIL: gcc.dg/strlenopt-13.c scan-tree-dump-times strlen1 "memcpy \\(" 7 on Linux/x86_64

2021-09-28 Thread Martin Sebor via Gcc-patches

On 9/28/21 1:20 AM, Richard Biener wrote:

On Mon, 27 Sep 2021, sunil.k.pandey wrote:


On Linux/x86_64,

d06dc8a2c73735e9496f434787ba4c93ceee5eea is the first bad commit
commit d06dc8a2c73735e9496f434787ba4c93ceee5eea
Author: Richard Biener 
Date:   Mon Sep 27 13:36:12 2021 +0200

 middle-end/102450 - avoid type_for_size for non-existing modes

caused

FAIL: gcc.dg/out-of-bounds-1.c  (test for warnings, line 12)
FAIL: gcc.dg/pr78408-1.c scan-tree-dump-times fab1 "after previous" 17
FAIL: gcc.dg/strlenopt-13.c scan-tree-dump-times strlen1 "memcpy \\(" 7


After the change the new memcpy inlining limit using MOVE_MAX * MOVE_RATIO
comes into play and ends up using an OImode move which previously was
disregarded as there's no __int256 standard type in the frontend
(but now we build such type anyway after verifying the mode exists and
it has move support).
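
The inlining decision described here can be modelled as a size bound: a constant-length copy is expanded inline when it does not exceed MOVE_MAX * MOVE_RATIO bytes, using the widest supported move. A minimal sketch with hypothetical parameter values; `inline_memcpy_p` and `widest_move` are illustrative helpers, not the actual x86 target macros or expander code, whose values depend on the selected ISA and tuning:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: inline a copy of LEN bytes if it does not exceed
   MOVE_MAX * MOVE_RATIO bytes.  */
static bool inline_memcpy_p (unsigned len, unsigned move_max,
                             unsigned move_ratio)
{
  return len <= move_max * move_ratio;
}

/* Widest power-of-two move, capped at MOVE_MAX, that fits in LEN bytes.
   With MOVE_MAX = 32, a 32-byte copy becomes a single 256-bit
   (OImode-sized) move.  */
static unsigned widest_move (unsigned len, unsigned move_max)
{
  unsigned w = 1;
  while (w * 2 <= len && w * 2 <= move_max)
    w *= 2;
  return w;
}
```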

For example gcc.dg/out-of-bounds-1.c which looks like

void ProjectOverlay(const float localTextureAxis[2], char *lump)
{
const void *d = &localTextureAxis;
int size = sizeof(float)*8 ;
__builtin_memcpy( &lump[ 0 ], d, size );  /* { dg-warning "reading" } */
}

gets turned into

 movq%rdi, -8(%rsp)
 vmovdqu64   -8(%rsp), %ymm31
 vmovdqu64   %ymm31, (%rsi)

which I guess is good but then the diagnostic is no longer emitted
because -Wstringop-overread only applies to the builtin.  Usually
we avoid the folding in such a case but

  /* Detect out-of-bounds accesses without issuing warnings.
     Avoid folding out-of-bounds copies but to avoid false
     positives for unreachable code defer warning until after
     DCE has worked its magic.
     -Wrestrict is still diagnosed.  */
  if (int warning = check_bounds_or_overlap (as_a (stmt),
                                             dest, src, len,
                                             len, false, false))
    if (warning != OPT_Wrestrict)
      return false;


The check_bounds_or_overlap() call only implements -Wrestrict and
a small subset of -Warray-bounds (the subset issued for forming
out-of-bounds pointers by built-ins).  It's a limitation/bug in
the gimple-ssa-warn-restrict.c code that it doesn't detect
the problem (it's confused by taking the address of a pointer).
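To make "taking the address of a pointer" concrete: the array parameter in the test is adjusted to a pointer, so `d` points at a pointer object (8 bytes on LP64 targets) while the call copies 32 bytes from it. The C++ sketch below only mirrors the test's parameter list; the function name `project_overlay_shape` and the two constants are made up for illustration, and the checks run at compile time.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in with the same parameters as the test's ProjectOverlay.
// An array parameter is adjusted to a pointer, so inside the body
// `localTextureAxis` is a `const float *`, and `&localTextureAxis` is the
// address of that pointer object, not of any float data.
inline void project_overlay_shape (const float localTextureAxis[2], char *lump)
{
  static_assert (std::is_same_v<decltype (localTextureAxis), const float *>,
                 "array parameter is adjusted to a pointer");
  static_assert (std::is_same_v<decltype (&localTextureAxis), const float **>,
                 "its address is a pointer to a pointer");
  (void) lump;
}

// Bytes the test copies versus bytes that actually live at the source address.
constexpr std::size_t bytes_copied = sizeof (float) * 8;         // 32 with a 4-byte float
constexpr std::size_t bytes_at_source = sizeof (const float *);  // 8 on LP64 targets
```

The mismatch between the two constants is the overread the warning is expected to report.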

To let the test pass I suggest either bumping up the size or making
it an odd number (or anything else that's not a power of 2).
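Both sizes tried in this thread, `sizeof(float)*8` (32 bytes) and `sizeof(float)*4` (16 bytes), are powers of two on targets with a 4-byte float. Assuming, as the suggestion implies, that the single-wide-move folding is only attempted for power-of-two lengths, candidate sizes can be screened with a check like this (the helper name is made up):

```cpp
#include <cstddef>

// True when n is a non-zero power of two: such a value has exactly one bit
// set, so clearing its lowest set bit (n & (n - 1)) leaves zero.
constexpr bool is_power_of_two (std::size_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

static_assert (is_power_of_two (32), "sizeof(float)*8 with a 4-byte float");
static_assert (is_power_of_two (16), "sizeof(float)*4 with a 4-byte float");
static_assert (!is_power_of_two (28), "sizeof(float)*7 is not a power of two");
```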



does not seem to trigger here.  Changing the testcase to

void ProjectOverlay(const float localTextureAxis[2], char *lump)
{
   const void *d = &localTextureAxis;
   int size = sizeof(float)*4 ;
   __builtin_memcpy( &lump[ 0 ], d, size );  /* { dg-warning "reading" } */
}

also fails to warn.


The very late -Wstringop-{overflow,overread} warnings that run just
before expansion have historically only worked for built-in calls.
Now that they are in a GIMPLE pass of their own as opposed to
working with trees in builtins.c, it will be easy to handle plain
stores as well.  It's on my list of things to do.

Martin



Richard.





Re: [PATCH] Improve jump threading dump output.

2021-09-28 Thread Jeff Law via Gcc-patches




On 9/28/2021 7:53 AM, Aldy Hernandez wrote:



On 9/28/21 3:47 PM, Jeff Law wrote:



On 9/28/2021 3:45 AM, Aldy Hernandez wrote:

In analyzing PR102511, it has become abundantly clear that we need
better debugging aids for the jump threader solver.  Currently
debugging these issues is a nightmare if you're not intimately
familiar with the code.  This patch attempts to improve this.

First, I'm enabling path solver dumps with TDF_THREADING. None of the
available TDF_* flags are a good match, and using TDF_DETAILS would
blow up the dump file, since both threaders continually call the solver
to try out candidates.  This will allow dumping path solver details
without having to resort to hacking the source.

I am also dumping the current registered_jump_thread dbg counter used
by the registry, in the solver.  That way narrowing down a problematic
thread can then be examined by -fdump-*-threading and looking at the
solver details surrounding the appropriate counter (which the dbgcnt
also dumps to the dump file).

You still need knowledge of the solver to debug these issues, but at
least now it's not entirely opaque.

OK?

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt_counter): New.
* dbgcnt.h (dbg_cnt_counter): New.
* dumpfile.c (dump_options): Add entry for TDF_THREADING.
* dumpfile.h (enum dump_flag): Add TDF_THREADING.
* gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
* tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
debug counter.
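The idea behind the patch is an opt-in dump flag for the solver plus a way to read the current value of the debug counter, so each solver dump can be tied to a specific registered thread. The C++ sketch below models only that idea; every name in it (`sketch_tdf`, `sketch_debug_counter`, `register_thread`) is hypothetical, and none of it is GCC's actual dumpfile or dbgcnt code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical dump flags in bitmask style.  A dedicated "threading" bit
// lets solver output be enabled without turning on the much noisier
// "details" bit for the whole pass.
enum sketch_tdf : unsigned
{
  SKETCH_TDF_DETAILS   = 1u << 0,
  SKETCH_TDF_THREADING = 1u << 1
};

// Hypothetical debug counter: next () bumps the count and says whether the
// action is still allowed; current () exposes the value so it can be dumped.
struct sketch_debug_counter
{
  unsigned count = 0;
  unsigned limit = ~0u;

  bool next () { return ++count <= limit; }
  unsigned current () const { return count; }
};

// Register one candidate thread.  Solver details are printed only when the
// threading bit is set, and each line is tagged with the counter value.
bool register_thread (sketch_debug_counter &cnt, unsigned flags, std::FILE *dump)
{
  bool allowed = cnt.next ();
  if (dump && (flags & SKETCH_TDF_THREADING))
    std::fprintf (dump, "[counter %u] %s\n", cnt.current (),
                  allowed ? "thread registered" : "thread skipped by counter");
  return allowed;
}
```

Printing the counter next to each solver dump means a limit taken from the dump file can be fed back in to stop just after a suspect thread, which is the narrowing-down workflow the message describes.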

OK.

Note we've got massive failures in the tester starting sometime 
yesterday and I suspect all the threader work.    So I'm going to 
slow down on reviews of that code as we stabilize stuff.


Fair enough.  Let's knock those out then.

So several are failing gcc.dg/loop-unswitch-3.c.

This test appears to be verifying that we unswitch a test in one of the 
loops, which is no longer happening after the change to replace the VRP 
threader with the hybrid forward threader.


So both the old VRP threader and the new style identify and realize a 
single jump thread.


In the old VRP threader, realization of the jump thread ends up creating 
nested loops.  In the new implementation we end up creating a single 
loop with two back edges to the header.


i.e., the (partial) graphs look like this

OLD

      1 <---+
      |     |
+---> 2     |
|    / \    |
|   3   4   |
+---+   +---+

NEW


+---> 2 <---+
|    / \    |
|   3   4   |
+---+   +---+
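One way to state the difference between the two shapes is to count back edges per loop header. The sketch below is plain C++ and unrelated to GCC's cfgloop code. It runs a depth-first search and treats an edge whose target is still on the DFS stack as a back edge, which matches the usual notion of a loop back edge for reducible CFGs like these. Node numbers follow the diagrams above, and edges leaving the loops are omitted.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using edge_list = std::vector<std::pair<int, int>>;

// Count back edges per target node.  During a DFS from ENTRY, an edge u->v
// whose target is still on the DFS stack (grey) closes a cycle; in a
// reducible CFG that makes v a loop header and u one of its latches.
std::vector<int>
back_edges_per_header (int num_nodes, const edge_list &edges, int entry)
{
  std::vector<std::vector<int>> succs (num_nodes);
  for (const auto &[from, to] : edges)
    succs[from].push_back (to);

  enum colour { white, grey, black };
  std::vector<colour> state (num_nodes, white);
  std::vector<int> back_to (num_nodes, 0);

  auto dfs = [&] (auto &&self, int u) -> void {
    state[u] = grey;
    for (int v : succs[u])
      {
        if (state[v] == grey)
          ++back_to[v];            // retreating edge: v heads a loop
        else if (state[v] == white)
          self (self, v);
      }
    state[u] = black;
  };
  dfs (dfs, entry);
  return back_to;
}
```

For OLD (edges 1->2, 2->3, 2->4, 3->2, 4->1, entry 1) this finds one back edge into 1 and one into 2, i.e. two nested loops. For NEW (edges 2->3, 2->4, 3->2, 4->2, entry 2) it finds two back edges into 2, i.e. a single loop with two latches.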


I wonder if we're not doing proper loop fixups or something similar 
after that change.  IIRC we have/had bits in the copier and CFG update 
code to mark the loops that need re-analysis and fixing up.


Anyway, you should be able to trigger and analyze with a cross compiler.

I've got to switch to my day job, but I'll pass along more as I get a 
chance to look at them.


jeff





Re: [PATCH] Improve jump threading dump output.

2021-09-28 Thread Richard Biener via Gcc-patches
On September 28, 2021 5:45:52 PM GMT+02:00, Jeff Law via Gcc-patches 
 wrote:
>
>
>On 9/28/2021 7:53 AM, Aldy Hernandez wrote:
>>
>>
>> On 9/28/21 3:47 PM, Jeff Law wrote:
>>>
>>>
>>> On 9/28/2021 3:45 AM, Aldy Hernandez wrote:
 In analyzing PR102511, it has become abundantly clear that we need
 better debugging aids for the jump threader solver.  Currently
 debugging these issues is a nightmare if you're not intimately
 familiar with the code.  This patch attempts to improve this.

 First, I'm enabling path solver dumps with TDF_THREADING. None of the
 available TDF_* flags are a good match, and using TDF_DETAILS would
 blow up the dump file, since both threaders continually call the solver
 to try out candidates.  This will allow dumping path solver details
 without having to resort to hacking the source.

 I am also dumping the current registered_jump_thread dbg counter used
 by the registry, in the solver.  That way narrowing down a problematic
 thread can then be examined by -fdump-*-threading and looking at the
 solver details surrounding the appropriate counter (which the dbgcnt
 also dumps to the dump file).

 You still need knowledge of the solver to debug these issues, but at
 least now it's not entirely opaque.

 OK?

 gcc/ChangeLog:

 * dbgcnt.c (dbg_cnt_counter): New.
 * dbgcnt.h (dbg_cnt_counter): New.
 * dumpfile.c (dump_options): Add entry for TDF_THREADING.
 * dumpfile.h (enum dump_flag): Add TDF_THREADING.
 * gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
 * tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
 debug counter.
>>> OK.
>>>
>>> Note we've got massive failures in the tester starting sometime 
>>> yesterday and I suspect all the threader work.    So I'm going to 
>>> slow down on reviews of that code as we stabilize stuff.
>>
>> Fair enough.  Let's knock those out then.
>So several are failing gcc.dg/loop-unswitch-3.c.
>
>This test appears to be verifying that we unswitch a test in one of the 
>loops, which is no longer happening after the change to replace the VRP 
>threader with the hybrid forward threader.
>
>So both the old VRP threader and the new style identify and realize a 
>single jump thread.
>
>In the old VRP threader, realization of the jump thread ends up creating 
>nested loops.  In the new implementation we end up creating a single 
>loop with two back edges to the header.
>
>i.e., the (partial) graphs look like this
>
>OLD
>
>      1 <---+
>      |     |
>+---> 2     |
>|    / \    |
>|   3   4   |
>+---+   +---+
>
>NEW
>
>
>+---> 2 <---+
>|    / \    |
>|   3   4   |
>+---+   +---+
>
>
>I wonder if we're not doing proper loop fixups or something similar 
>after that change.  IIRC we have/had bits in the copier and CFG update 
>code to mark the loops that need re-analysis and fixing up.
>
>Anyway, you should be able to trigger and analyze with a cross compiler.
>
>I've got to switch to my day job, but I'll pass along more as I get a 
>chance to look at them.

If you're stuck I'm also happy to help. Note that relying on loop fixup is 
almost never good because we easily lose track of loop association of info like 
OMP simd loops and all loop pragmas. 

Richard. 

>jeff
>
>
>



[PATCH] [PR102501] Adjust jump threading testcases for ppc64* and others.

2021-09-28 Thread Aldy Hernandez via Gcc-patches
I really don't know what to do here.  This is a bit of whack-a-mole.
The IL is sufficiently different for various architectures that any
tweak can cause the number of jump threads to vary.

For the pr77445-2.c testcase, we have fewer threading candidates because 2
of them now cross loop boundaries.  Interestingly, this test matches
"Jumps threaded", not threads registered, so the block copier can
drop threads at copying time, adding further confusion.

For example, we can register N threads, but the old copier can cancel
N-M threads while updating the CFG for a variety of different reasons
(removed edges, threading through loop exits, etc).  As a result, the
"Registering jump threads" count need not match the total number of
threads this test checks for with "Jumps threaded".

The pr66752-3.c test OTOH, is just a matter of thread4 eliminating the
"if".  I had erroneously thought it would always be eliminated by
thread3, but we really don't care where it gets cleaned up.  All we know
is that DCE can't depend on the early threaders doing this work, because
it may cross loop boundaries.  I've chosen thread4 arbitrarily, but we
could just as easily pick the ".optimized" dump.

Sorry, I'm really at my wits end here.  I don't see any clean path
forward, except rewrite these tests as gimple IL.  They're close to useless
as they sit.

OK?

gcc/testsuite/ChangeLog:

PR testsuite/102501
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index 922a331b217..ba7025ae33b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread4" } */
 
 extern int status, pt;
 extern int count;
@@ -43,4 +43,4 @@ foo (int N, int c, int b, int *a)
run after loop optimizations , can successfully eliminate the
references to FLAG.  Verify that ther are no references by the late
threading passes.  */
-/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread4"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index 01a0f1f197d..18f7aab2be7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,7 +123,7 @@ enum STATES FMS( u8 **in , u32 *transitions) {
aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities.  Skip the later tests on aarch64.  */
-/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: \[7-9\]" "thread1" } } */
 /* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
-- 
2.31.1
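The relaxed scan for pr77445-2.c accepts any thread count from 7 to 9. Assuming the testsuite's Tcl quoting turns `\[7-9\]` into the bracket expression `[7-9]` before it reaches the regex engine, the match behaves like this C++ `std::regex_search` sketch (the function name is made up):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Unanchored search, like a dump scan: true if any part of the line matches.
bool matches_relaxed_scan (const std::string &dump_line)
{
  static const std::regex pattern ("Jumps threaded: [7-9]");
  return std::regex_search (dump_line, pattern);
}
```

Because the search is unanchored, a count such as 90 would also match on its leading digit.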



Re: [PATCH] Improve jump threading dump output.

2021-09-28 Thread Aldy Hernandez via Gcc-patches




On 9/28/21 6:05 PM, Richard Biener wrote:

On September 28, 2021 5:45:52 PM GMT+02:00, Jeff Law via Gcc-patches 
 wrote:



On 9/28/2021 7:53 AM, Aldy Hernandez wrote:



On 9/28/21 3:47 PM, Jeff Law wrote:



On 9/28/2021 3:45 AM, Aldy Hernandez wrote:

In analyzing PR102511, it has become abundantly clear that we need
better debugging aids for the jump threader solver.  Currently
debugging these issues is a nightmare if you're not intimately
familiar with the code.  This patch attempts to improve this.

First, I'm enabling path solver dumps with TDF_THREADING. None of the
available TDF_* flags are a good match, and using TDF_DETAILS would
blow up the dump file, since both threaders continually call the solver
to try out candidates.  This will allow dumping path solver details
without having to resort to hacking the source.

I am also dumping the current registered_jump_thread dbg counter used
by the registry, in the solver.  That way narrowing down a problematic
thread can then be examined by -fdump-*-threading and looking at the
solver details surrounding the appropriate counter (which the dbgcnt
also dumps to the dump file).

You still need knowledge of the solver to debug these issues, but at
least now it's not entirely opaque.

OK?

gcc/ChangeLog:

 * dbgcnt.c (dbg_cnt_counter): New.
 * dbgcnt.h (dbg_cnt_counter): New.
 * dumpfile.c (dump_options): Add entry for TDF_THREADING.
 * dumpfile.h (enum dump_flag): Add TDF_THREADING.
 * gimple-range-path.cc (DEBUG_SOLVER): Use TDF_THREADING.
 * tree-ssa-threadupdate.c (dump_jump_thread_path): Dump out
 debug counter.

OK.

Note we've got massive failures in the tester starting sometime
yesterday and I suspect all the threader work.    So I'm going to
slow down on reviews of that code as we stabilize stuff.


Fair enough.  Let's knock those out then.

So several are failing gcc.dg/loop-unswitch-3.c.

This test appears to be verifying that we unswitch a test in one of the
loops, which is no longer happening after the change to replace the VRP
threader with the hybrid forward threader.

So both the old VRP threader and the new style identify and realize a
single jump thread.

In the old VRP threader, realization of the jump thread ends up creating
nested loops.  In the new implementation we end up creating a single
loop with two back edges to the header.

i.e., the (partial) graphs look like this

OLD

      1 <---+
      |     |
+---> 2     |
|    / \    |
|   3   4   |
+---+   +---+

NEW


+---> 2 <---+
|    / \    |
|   3   4   |
+---+   +---+


I wonder if we're not doing proper loop fixups or something similar
after that change.  IIRC we have/had bits in the copier and CFG update
code to mark the loops that need re-analysis and fixing up.

Anyway, you should be able to trigger and analyze with a cross compiler.

I've got to switch to my day job, but I'll pass along more as I get a
chance to look at them.


If you're stuck I'm also happy to help. Note that relying on loop fixup is 
almost never good because we easily lose track of loop association of info like 
OMP simd loops and all loop pragmas.


I could absolutely use the help here.  Care to take a look?

Aldy



[Patch] Fortran: Fix same_type_as

2021-09-28 Thread Tobias Burnus

Found when looking at Sandra's c535b-1.f90 and playing around.
While fixing same_type_as, I spotted another issue by reading the code:
derived types were not catered for. (I have not tested whether it
actually failed.)

I have now added a bunch of testcases.

OK for mainline?

Tobias

Fortran: Fix same_type_as

A test for CLASS(*) + assumed rank was missing; adding a test to
unlimited_polymorphic_1.f03 showed an ICE as backend_decl wasn't
set. While gfc_get_symbol_decl would fix it, the code also assumed
that the class(*) was a variable and could not be a subobject of
a derived type.
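At run time, the patched code for two CLASS(*) arguments appears to amount to "both vptrs are non-null and the hashes they point to are equal", with the null tests guarding the hash loads. A rough C++ analogue follows. The struct layout is a simplification, not gfortran's real class container, and combining the conditions with a short-circuit `&&` is an assumption about the code after the hunk shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins (not gfortran's actual layout): a vtable carrying a
// type hash, and a CLASS(*) container whose vptr is null when it has no
// dynamic type.
struct vtab { std::uint32_t hash; };
struct unlimited_poly { void *data; const vtab *vptr; };

// Check each vptr for null first, then compare the hashes it points to.
// The && short-circuits, so a null vptr is never dereferenced.
bool same_type_as (const unlimited_poly &a, const unlimited_poly &b)
{
  bool conda = a.vptr != nullptr;
  bool condb = b.vptr != nullptr;
  return conda && condb && a.vptr->hash == b.vptr->hash;
}
```

Because the operands are whole expressions here rather than named variables, the same logic applies whether the CLASS(*) entity is a variable or a component of a derived type, which is the case the old code missed.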

gcc/fortran/ChangeLog:

	* trans-intrinsic.c (gfc_conv_same_type_as): Fix handling
	of UNLIMITED_POLY.
	* trans.h (gfc_vtpr_hash_get): Renamed prototype to ...
	(gfc_vptr_hash_get): ... this to match function name.

gcc/testsuite/ChangeLog:

	* gfortran.dg/c-interop/c535b-1.f90: Remove wrong comment.
	* gfortran.dg/unlimited_polymorphic_1.f03: Extend.
	* gfortran.dg/unlimited_polymorphic_32.f90: New test.

 gcc/fortran/trans-intrinsic.c                      |  42 ++--
 gcc/fortran/trans.h                                |   2 +-
 gcc/testsuite/gfortran.dg/c-interop/c535b-1.f90    |   2 -
 .../gfortran.dg/unlimited_polymorphic_1.f03        |  17 +-
 .../gfortran.dg/unlimited_polymorphic_32.f90       | 254 +
 5 files changed, 296 insertions(+), 21 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 900a1a29817..2a2829c9f04 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -9126,21 +9126,14 @@ gfc_conv_same_type_as (gfc_se *se, gfc_expr *expr)
   a = expr->value.function.actual->expr;
   b = expr->value.function.actual->next->expr;
 
-  if (UNLIMITED_POLY (a))
+  bool unlimited_poly_a = UNLIMITED_POLY (a);
+  bool unlimited_poly_b = UNLIMITED_POLY (b);
+  if (unlimited_poly_a)
 {
-  tmp = gfc_class_vptr_get (a->symtree->n.sym->backend_decl);
-  conda = fold_build2_loc (input_location, NE_EXPR, logical_type_node,
-			   tmp, build_int_cst (TREE_TYPE (tmp), 0));
-}
-
-  if (UNLIMITED_POLY (b))
-{
-  tmp = gfc_class_vptr_get (b->symtree->n.sym->backend_decl);
-  condb = fold_build2_loc (input_location, NE_EXPR, logical_type_node,
-			   tmp, build_int_cst (TREE_TYPE (tmp), 0));
+  se1.want_pointer = 1;
+  gfc_add_vptr_component (a);
 }
-
-  if (a->ts.type == BT_CLASS)
+  else if (a->ts.type == BT_CLASS)
 {
   gfc_add_vptr_component (a);
   gfc_add_hash_component (a);
@@ -9149,7 +9142,12 @@ gfc_conv_same_type_as (gfc_se *se, gfc_expr *expr)
 a = gfc_get_int_expr (gfc_default_integer_kind, NULL,
 			  a->ts.u.derived->hash_value);
 
-  if (b->ts.type == BT_CLASS)
+  if (unlimited_poly_b)
+{
+  se2.want_pointer = 1;
+  gfc_add_vptr_component (b);
+}
+  else if (b->ts.type == BT_CLASS)
 {
   gfc_add_vptr_component (b);
   gfc_add_hash_component (b);
@@ -9161,6 +9159,22 @@ gfc_conv_same_type_as (gfc_se *se, gfc_expr *expr)
   gfc_conv_expr (&se1, a);
   gfc_conv_expr (&se2, b);
 
+  if (unlimited_poly_a)
+{
+  conda = fold_build2_loc (input_location, NE_EXPR, logical_type_node,
+			   se1.expr,
+			   build_int_cst (TREE_TYPE (se1.expr), 0));
+  se1.expr = gfc_vptr_hash_get (se1.expr);
+}
+
+  if (unlimited_poly_b)
+{
+  condb = fold_build2_loc (input_location, NE_EXPR, logical_type_node,
+			   se2.expr,
+			   build_int_cst (TREE_TYPE (se2.expr), 0));
+  se2.expr = gfc_vptr_hash_get (se2.expr);
+}
+
   tmp = fold_build2_loc (input_location, EQ_EXPR,
 			 logical_type_node, se1.expr,
 			 fold_convert (TREE_TYPE (se1.expr), se2.expr));
diff --git a/gcc/fortran/trans.h b/gcc/fortran/trans.h
index 53f0f86b265..fa3e8651b44 100644
--- a/gcc/fortran/trans.h
+++ b/gcc/fortran/trans.h
@@ -438,7 +438,7 @@ tree gfc_class_vtab_def_init_get (tree);
 tree gfc_class_vtab_copy_get (tree);
 tree gfc_class_vtab_final_get (tree);
 /* Get an accessor to the vtab's * field, when a vptr handle is present.  */
-tree gfc_vtpr_hash_get (tree);
+tree gfc_vptr_hash_get (tree);
 tree gfc_vptr_size_get (tree);
 tree gfc_vptr_extends_get (tree);
 tree gfc_vptr_def_init_get (tree);
diff --git a/gcc/testsuite/gfortran.dg/c-interop/c535b-1.f90 b/gcc/testsuite/gfortran.dg/c-interop/c535b-1.f90
index 3de77b00106..748e027f897 100644
--- a/gcc/testsuite/gfortran.dg/c-interop/c535b-1.f90
+++ b/gcc/testsuite/gfortran.dg/c-interop/c535b-1.f90
@@ -297,8 +297,6 @@ end function
 ! coshape, lcobound, ucobound: requires CODIMENSION attribute, which is
 !   not permitted on an assumed-rank variable.
 !
-! extends_type_of, same_type_as: require a class argument.
-
 
 ! F2018 additionally permits the first

Re: [PATCH] [PR102501] Adjust jump threading testcases for ppc64* and others.

2021-09-28 Thread Jeff Law via Gcc-patches




On 9/28/2021 10:09 AM, Aldy Hernandez wrote:

I really don't know what to do here.  This is a bit of whack-a-mole.
The IL is sufficiently different for various architectures that any
tweak can cause the number of jump threads to vary.

For the pr77445-2.c testcase, we have fewer threading candidates because 2
of them now cross loop boundaries.  Interestingly, this test matches
"Jumps threaded", not threads registered, so the block copier can
drop threads at copying time, adding further confusion.

For example, we can register N threads, but the old copier can cancel
N-M threads while updating the CFG for a variety of different reasons
(removed edges, threading through loop exits, etc).  As a result, the
"Registering jump threads" count need not match the total number of
threads this test checks for with "Jumps threaded".

The pr66752-3.c test OTOH, is just a matter of thread4 eliminating the
"if".  I had erroneously thought it would always be eliminated by
thread3, but we really don't care where it gets cleaned up.  All we know
is that DCE can't depend on the early threaders doing this work, because
it may cross loop boundaries.  I've chosen thread4 arbitrarily, but we
could just as easily pick the ".optimized" dump.

Sorry, I'm really at my wits end here.  I don't see any clean path
forward, except rewrite these tests as gimple IL.  They're close to useless
as they sit.

OK?

gcc/testsuite/ChangeLog:

PR testsuite/102501
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.

Note these were two of the consistent failures on other targets as well.
Jeff



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Patrick Palka via Gcc-patches
On Tue, 28 Sep 2021, Jakub Jelinek wrote:

> On Tue, Sep 28, 2021 at 09:49:11AM -0400, Patrick Palka via Gcc-patches wrote:
> > > --- gcc/cp/method.c.jj2021-09-15 08:55:37.563497558 +0200
> > > +++ gcc/cp/method.c   2021-09-27 13:48:12.139271830 +0200
> > > @@ -3160,8 +3160,11 @@ defaulted_late_check (tree fn)
> > >if (kind == sfk_comparison)
> > >  {
> > >/* If the function was declared constexpr, check that the 
> > > definition
> > > -  qualifies.  Otherwise we can define the function lazily.  */
> > > -  if (DECL_DECLARED_CONSTEXPR_P (fn) && !DECL_INITIAL (fn))
> > > +  qualifies.  Otherwise we can define the function lazily.
> > > +  Don't do this if the class type is still incomplete.  */
> > > +  if (DECL_DECLARED_CONSTEXPR_P (fn)
> > > +   && !DECL_INITIAL (fn)
> > > +   && COMPLETE_TYPE_P (ctx))
> > >   {
> > 
> > According to the function comment for defaulted_late_check, won't
> > COMPLETE_TYPE_P (ctx) always be false here?
> 
> It is true in the call from the following hunk.
> The function comment at least to me doesn't imply it is always called on
> incomplete types, and defaultable_fn_check also calls it.

Ah yeah, sorry for the noise, I misunderstood the function comment.

On a related note I think 'ctx' can also be a NAMESPACE_DECL here in
the case of a defaulted non-member operator<=> (as in the below), for
which I'd expect the added COMPLETE_TYPE_P check to crash, but it looks
like in this case DECL_INITIAL is error_mark_node instead of NULL_TREE
so a crash is averted.  If anyone else was wondering...

  struct A {
    friend constexpr bool operator==(const A&, const A&);
  };

  constexpr bool operator==(const A&, const A&) = default;

> > 
> > > /* Prevent GC.  */
> > > function_depth++;
> > > --- gcc/cp/class.c.jj 2021-09-03 09:46:28.801428380 +0200
> > > +++ gcc/cp/class.c2021-09-27 14:07:03.465562255 +0200
> > > @@ -7467,7 +7467,14 @@ finish_struct_1 (tree t)
> > >   for any static member objects of the type we're working on.  */
> > >for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
> > >  if (DECL_DECLARES_FUNCTION_P (x))
> > > -  DECL_IN_AGGR_P (x) = false;
> > > +  {
> > > + /* Synthetize constexpr defaulted comparisons.  */
> > > + if (!DECL_ARTIFICIAL (x)
> > > + && DECL_DEFAULTED_IN_CLASS_P (x)
> > > + && special_function_p (x) == sfk_comparison)
> > > +   defaulted_late_check (x);
> > > + DECL_IN_AGGR_P (x) = false;
> > > +  }
> > >  else if (VAR_P (x) && TREE_STATIC (x)
> > >&& TREE_TYPE (x) != error_mark_node
> > >&& same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
> 
>   Jakub
> 
> 



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 28, 2021 at 12:44:58PM -0400, Patrick Palka wrote:
> Ah yeah, sorry for the noise, I misunderstood the function comment.
> 
> On a related note I think 'ctx' can also be a NAMESPACE_DECL here in
> the case of a defaulted non-member operator<=> (as in the below), for
> which I'd expect the added COMPLETE_TYPE_P check to crash, but it looks
> like in this case DECL_INITIAL is error_mark_node instead of NULL_TREE
> so a crash is averted.  If anyone else was wondering...
> 
>   struct A {
>     friend constexpr bool operator==(const A&, const A&);
>   };
> 
>   constexpr bool operator==(const A&, const A&) = default;

That means maybe ctx isn't the right way to get at the type and we
should look it up from the first argument's type?
I guess I'll look at where the build_comparison_op takes it from...

Jakub



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 28, 2021 at 06:49:38PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Sep 28, 2021 at 12:44:58PM -0400, Patrick Palka wrote:
> > Ah yeah, sorry for the noise, I misunderstood the function comment.
> > 
> > On a related note I think 'ctx' can also be a NAMESPACE_DECL here in
> > the case of a defaulted non-member operator<=> (as in the below), for
> > which I'd expect the added COMPLETE_TYPE_P check to crash, but it looks
> > like in this case DECL_INITIAL is error_mark_node instead of NULL_TREE
> > so a crash is averted.  If anyone else was wondering...
> > 
> >   struct A {
> >     friend constexpr bool operator==(const A&, const A&);
> >   };
> > 
> >   constexpr bool operator==(const A&, const A&) = default;
> 
> That means maybe ctx isn't the right way to get at the type and we
> should look it up from the first argument's type?
> I guess I'll look at where the build_comparison_op takes it from...

  tree lhs = DECL_ARGUMENTS (fndecl);
  if (is_this_parameter (lhs))
    lhs = cp_build_fold_indirect_ref (lhs);
  else
    lhs = convert_from_reference (lhs);
  tree ctype = TYPE_MAIN_VARIANT (TREE_TYPE (lhs));
apparently.
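In type-trait terms, the non-member path of that snippet takes the first parameter's type, strips the reference, and drops cv-qualifiers to reach the class being compared. Below is a C++17 analogue with a made-up helper (`first_param`, `compared_class_t`). It only illustrates the idea and is not the front end's tree code.

```cpp
#include <type_traits>

// Hypothetical helper: the type of a two-parameter function's first parameter.
template <typename F> struct first_param;
template <typename R, typename P1, typename P2>
struct first_param<R (P1, P2)> { using type = P1; };

// Strip the reference (cf. convert_from_reference), then the cv-qualifiers
// (cf. TYPE_MAIN_VARIANT), to get the class the comparison is about.
template <typename F>
using compared_class_t
  = std::remove_cv_t<std::remove_reference_t<typename first_param<F>::type>>;

struct A
{
  friend constexpr bool operator== (const A &, const A &);
};

// C++17 stand-in for the thread's "= default" out-of-class definition.
constexpr bool operator== (const A &, const A &) { return true; }
```

Unlike `ctx`, which is the enclosing namespace for the out-of-class definition, this route always lands on `A`.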

Jakub



[committed] libstdc++: Fix mismatched noexcept-specifiers in filesystem::path [PR102499]

2021-09-28 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102499
* include/bits/fs_path.h (path::begin, path::end): Add noexcept
to declarations, to match definitions.

Tested x86_64-linux. Committed to trunk.
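The rule behind this fix is that once any declaration of a function is noexcept, every declaration, including the out-of-class definition, has to be noexcept as well. Here is a minimal sketch with a made-up class (not `std::filesystem::path`):

```cpp
#include <cassert>

// Hypothetical class: the in-class declarations and the out-of-class
// definitions carry the same noexcept-specifier, as the language requires.
struct tiny_path
{
  const char *text = "a/b";

  const char *begin () const noexcept;
  const char *end () const noexcept;
};

inline const char *tiny_path::begin () const noexcept { return text; }
inline const char *tiny_path::end () const noexcept { return text + 3; }
```

The header previously had `noexcept` only on the definitions; the commit adds it to the declarations so the two agree.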

commit f2b7f56a15d9cbbd2f0db22e0e39c4dd161bab69
Author: Jonathan Wakely 
Date:   Mon Sep 27 22:07:12 2021

libstdc++: Fix mismatched noexcept-specifiers in filesystem::path [PR102499]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102499
* include/bits/fs_path.h (path::begin, path::end): Add noexcept
to declarations, to match definitions.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 92f7cbbe357..1918c243d74 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -489,8 +489,8 @@ namespace __detail
 class iterator;
 using const_iterator = iterator;
 
-iterator begin() const;
-iterator end() const;
+iterator begin() const noexcept;
+iterator end() const noexcept;
 
 /// Write a path to a stream
 template


[committed] libstdc++: Improve std::forward static assert message

2021-09-28 Thread Jonathan Wakely via Gcc-patches
The previous message told you something was wrong, but not why it
happened or why it's bad. This changes it to explain that the function
is being misused.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/move.h (forward(remove_reference_t&&)):
Improve text of static_assert.
* testsuite/20_util/forward/c_neg.cc: Adjust dg-error.
* testsuite/20_util/forward/f_neg.cc: Likewise.

Tested x86_64-linux. Committed to trunk.
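The assertion fires when `std::forward` receives an rvalue together with an lvalue-reference template argument, e.g. `std::forward<int&>(42)`. That call would turn a temporary into an lvalue. Ordinary use forwards an argument with its original value category, as sketched below with made-up function names:

```cpp
#include <cassert>
#include <utility>

// Overloads that report the value category they receive.
constexpr bool got_lvalue (int &) { return true; }
constexpr bool got_lvalue (int &&) { return false; }

// Forwarding wrapper: T deduces to int& for lvalue arguments and to int for
// rvalues, and std::forward<T> restores that category for the inner call.
template <typename T>
constexpr bool relay (T &&arg)
{
  return got_lvalue (std::forward<T> (arg));
}

// Misuse targeted by the new message; left commented out because it is
// rejected by the static_assert ("convert an rvalue to an lvalue"):
//   int &r = std::forward<int &> (42);
```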

commit a11052d98db2f2a61841f0c5ee84de4ca1b3e296
Author: Jonathan Wakely 
Date:   Tue Sep 28 12:35:29 2021

libstdc++: Improve std::forward static assert message

The previous message told you something was wrong, but not why it
happened or why it's bad. This changes it to explain that the function
is being misused.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/move.h (forward(remove_reference_t&&)):
Improve text of static_assert.
* testsuite/20_util/forward/c_neg.cc: Adjust dg-error.
* testsuite/20_util/forward/f_neg.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/move.h b/libstdc++-v3/include/bits/move.h
index 3abbb37ceeb..2dd7ed9e4f9 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -88,8 +88,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr _Tp&&
 forward(typename std::remove_reference<_Tp>::type&& __t) noexcept
 {
-  static_assert(!std::is_lvalue_reference<_Tp>::value, "template argument"
-   " substituting _Tp must not be an lvalue reference type");
+  static_assert(!std::is_lvalue_reference<_Tp>::value,
+ "std::forward must not be used to convert an rvalue to an lvalue");
   return static_cast<_Tp&&>(__t);
 }
 
diff --git a/libstdc++-v3/testsuite/20_util/forward/c_neg.cc b/libstdc++-v3/testsuite/20_util/forward/c_neg.cc
index dc7ec51bde6..3875792866e 100644
--- a/libstdc++-v3/testsuite/20_util/forward/c_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/forward/c_neg.cc
@@ -17,7 +17,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "must not be an lvalue reference" "" { target *-*-* } 0 }
+// { dg-error "convert an rvalue to an lvalue" "" { target *-*-* } 0 }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/forward/f_neg.cc b/libstdc++-v3/testsuite/20_util/forward/f_neg.cc
index 4ccd7264c65..51ccaf29c1a 100644
--- a/libstdc++-v3/testsuite/20_util/forward/f_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/forward/f_neg.cc
@@ -17,7 +17,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "must not be an lvalue reference" "" { target *-*-* } 0 }
+// { dg-error "convert an rvalue to an lvalue" "" { target *-*-* } 0 }
 
 #include 
 


Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Patrick Palka via Gcc-patches
On Tue, 28 Sep 2021, Jakub Jelinek wrote:

> On Tue, Sep 28, 2021 at 06:49:38PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Sep 28, 2021 at 12:44:58PM -0400, Patrick Palka wrote:
> > > Ah yeah, sorry for the noise, I misunderstood the function comment.
> > > 
> > > On a related note I think 'ctx' can also be a NAMESPACE_DECL here in
> > > the case of a defaulted non-member operator<=> (as in the below), for
> > > which I'd expect the added COMPLETE_TYPE_P check to crash, but it looks
> > > like in this case DECL_INITIAL is error_mark_node instead of NULL_TREE
> > > so a crash is averted.  If anyone else was wondering...
> > > 
> > >   struct A {
> > > friend constexpr bool operator==(const A&, const A&);
> > >   };
> > > 
> > >   constexpr bool operator==(const A&, const A&) = default;
> > 
> > That means maybe ctx isn't the right way to get at the type and we
> > should look it up from the first argument's type?
> > I guess I'll look at where the build_comparison_op takes it from...

I suspect this synthesize_method call from defaulted_late_check is
really only needed when operator<=> has been defaulted inside the class
definition, because out-of-class defaulted definitions generally already
get eagerly synthesized IIUC.  So it might be fine to keep using ctx if
we also check DECL_DEFAULTED_IN_CLASS_P in defaulted_late_check.  But
Jason knows for sure..

> 
>   tree lhs = DECL_ARGUMENTS (fndecl);
>   if (is_this_parameter (lhs))
> lhs = cp_build_fold_indirect_ref (lhs);
>   else
> lhs = convert_from_reference (lhs);
>   tree ctype = TYPE_MAIN_VARIANT (TREE_TYPE (lhs));
> apparently.
> 
>   Jakub
> 
> 



Re: [PATCH] coroutines: Only set parm copy guard vars if we have exceptions [PR 102454].

2021-09-28 Thread Jason Merrill via Gcc-patches

On 9/27/21 15:38, Iain Sandoe wrote:

For coroutines, we make copies of the original function arguments into
the coroutine frame.  Normally, these are destroyed on the proper exit
from the coroutine when the frame is destroyed.

However, if an exception is thrown before the first suspend point is
reached, the cleanup has to happen in the ramp function.  These cleanups
are guarded such that they are only applied to any param copies actually
made.

The ICE is caused by an attempt to set the guard variable when there are
no exceptions enabled (the guard var is not created in this case).

Fixed by checking for flag_exceptions in this case too.

While touching these code paths, also clean up the synthetic names used
when a function parm is unnamed.
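The guarded cleanup described above can be modelled in ordinary C++. This is a hypothetical sketch, not the code the compiler generates. Each non-trivially destructible parameter copy gets a "live" flag that is set only once the copy exists. An exception thrown before the first suspend point therefore destroys exactly the copies that were made. Without exceptions there is no unwinding path that needs the flags, and per the patch the guard variable is not created in that case.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical parameter type that counts live objects; copying an object
// whose throw_on_copy flag is set throws, simulating a failing copy.
struct tracked {
  static int live;
  bool throw_on_copy;

  explicit tracked(bool t = false) : throw_on_copy(t) { ++live; }
  tracked(const tracked& o) : throw_on_copy(o.throw_on_copy)
  {
    if (o.throw_on_copy)
      throw std::runtime_error("parameter copy failed");
    ++live;
  }
  tracked& operator=(const tracked&) = delete;
  ~tracked() { --live; }
};
int tracked::live = 0;

// Models a ramp copying two parameters into frame storage.  Each copy has
// a guard flag (cf. the "_Coro_..._live" variables) recording whether it
// has been constructed and therefore needs destroying.
void ramp(const tracked& a, const tracked& b)
{
  alignas(tracked) unsigned char a_buf[sizeof(tracked)];
  alignas(tracked) unsigned char b_buf[sizeof(tracked)];
  tracked* a_copy = nullptr;
  tracked* b_copy = nullptr;
  bool a_live = false;
  bool b_live = false;

  try
    {
      a_copy = ::new (static_cast<void*>(a_buf)) tracked(a);
      a_live = true;   // this copy is now live
      b_copy = ::new (static_cast<void*>(b_buf)) tracked(b);
      b_live = true;
      // ... the first suspend point would be reached here ...
    }
  catch (...)
    {
      // Destroy only the copies that were actually made, in reverse order.
      if (b_live)
        b_copy->~tracked();
      if (a_live)
        a_copy->~tracked();
      throw;
    }

  // A real coroutine frame keeps the copies until the frame is destroyed;
  // this sketch releases them immediately to stay self-contained.
  b_copy->~tracked();
  a_copy->~tracked();
}
```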

tested on x86_64-darwin,
OK for master?


OK.


Signed-off-by: Iain Sandoe 

PR c++/102454

gcc/cp/ChangeLog:

* coroutines.cc (analyze_fn_parms): Clean up synthetic names for
unnamed function params.
(morph_fn_to_coro): Do not try to set a guard variable for param
DTORs in the ramp, unless we have exceptions active.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr102454.C: New test.
---
  gcc/cp/coroutines.cc   | 26 ---
  gcc/testsuite/g++.dg/coroutines/pr102454.C | 38 ++
  2 files changed, 52 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr102454.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index fbd5c49533f..c761e769c12 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3829,13 +3829,12 @@ analyze_fn_parms (tree orig)
  
if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (parm.frame_type))

{
- char *buf = xasprintf ("_Coro_%s_live", IDENTIFIER_POINTER (name));
- parm.guard_var = build_lang_decl (VAR_DECL, get_identifier (buf),
-   boolean_type_node);
- free (buf);
- DECL_ARTIFICIAL (parm.guard_var) = true;
- DECL_CONTEXT (parm.guard_var) = orig;
- DECL_INITIAL (parm.guard_var) = boolean_false_node;
+ char *buf = xasprintf ("%s%s_live", DECL_NAME (arg) ? "_Coro_" : "",
+IDENTIFIER_POINTER (name));
+ parm.guard_var
+   = coro_build_artificial_var (UNKNOWN_LOCATION, get_identifier (buf),
+boolean_type_node, orig,
+boolean_false_node);
  parm.trivial_dtor = false;
}
else
@@ -4843,11 +4842,14 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
 NULL, parm.frame_type,
 LOOKUP_NORMAL,
 tf_warning_or_error);
- /* This var is now live.  */
- r = build_modify_expr (fn_start, parm.guard_var,
-boolean_type_node, INIT_EXPR, fn_start,
-boolean_true_node, boolean_type_node);
- finish_expr_stmt (r);
+ if (flag_exceptions)
+   {
+ /* This var is now live.  */
+ r = build_modify_expr (fn_start, parm.guard_var,
+boolean_type_node, INIT_EXPR, fn_start,
+boolean_true_node, boolean_type_node);
+ finish_expr_stmt (r);
+   }
}
}
  }
diff --git a/gcc/testsuite/g++.dg/coroutines/pr102454.C b/gcc/testsuite/g++.dg/coroutines/pr102454.C
new file mode 100644
index 000..41aeda7b973
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr102454.C
@@ -0,0 +1,38 @@
+//  { dg-additional-options "-fno-exceptions" }
+
+#include 
+#include 
+
+template 
+struct looper {
+  struct promise_type {
+auto get_return_object () { return handle_type::from_promise (*this); }
+auto initial_suspend () { return suspend_always_prt {}; }
+auto final_suspend () noexcept { return suspend_always_prt {}; }
+void return_value (T);
+void unhandled_exception ();
+  };
+
+  using handle_type = std::coroutine_handle;
+
+  looper (handle_type);
+
+  struct suspend_always_prt {
+bool await_ready () noexcept;
+void await_suspend (handle_type) noexcept;
+void await_resume () noexcept;
+  };
+};
+
+template 
+looper
+with_ctorable_state (T)
+{
+  co_return T ();
+}
+
+auto
+foo ()
+{
+  return with_ctorable_state;
+}





[PATCH] bpf: correct extra_headers

2021-09-28 Thread David Faust via Gcc-patches
The BPF CO-RE support (commit 8bdabb37549f12ce727800a1c8aa182c0b1dd42a)
mistakenly overwrote the bpf-*-* extra_headers setting in config.gcc, so
bpf-helpers.h was no longer installed. The redefinition with coreout.h is
unneeded, so delete it.

gcc/ChangeLog:

* config.gcc (bpf-*-*): Do not overwrite extra_headers.
---
 gcc/config.gcc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 498c51e619d..aa5bd5d1459 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1531,7 +1531,6 @@ bpf-*-*)
 use_collect2=no
 extra_headers="bpf-helpers.h"
 use_gcc_stdint=provide
-extra_headers="coreout.h"
 extra_objs="coreout.o"
 target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.c"
 ;;
-- 
2.30.2



Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 28, 2021 at 01:25:13PM -0400, Patrick Palka via Gcc-patches wrote:
> On Tue, 28 Sep 2021, Jakub Jelinek wrote:
> 
> > On Tue, Sep 28, 2021 at 06:49:38PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > On Tue, Sep 28, 2021 at 12:44:58PM -0400, Patrick Palka wrote:
> > > > Ah yeah, sorry for the noise, I misunderstood the function comment.
> > > > 
> > > > On a related note I think 'ctx' can also be a NAMESPACE_DECL here in
> > > > the case of a defaulted non-member operator<=> (as in the below), for
> > > > which I'd expect the added COMPLETE_TYPE_P check to crash, but it looks
> > > > like in this case DECL_INITIAL is error_mark_node instead of NULL_TREE
> > > > so a crash is averted.  If anyone else was wondering...
> > > > 
> > > >   struct A {
> > > > friend constexpr bool operator==(const A&, const A&);
> > > >   };
> > > > 
> > > >   constexpr bool operator==(const A&, const A&) = default;
> > > 
> > > That means maybe ctx isn't the right way to get at the type and we
> > > should look it up from the first argument's type?
> > > I guess I'll look at where the build_comparison_op takes it from...
> 
> I suspect this synthesize_method call from defaulted_late_check is
> really only needed when operator<=> has been defaulted inside the class
> definition, because out-of-class defaulted definitions generally already
> get eagerly synthesized IIUC.  So it might be fine to keep using ctx if
> we also check DECL_DEFAULTED_IN_CLASS_P in defaulted_late_check.  But
> Jason knows for sure..

Indeed, cp_finish_decl has:
  /* An out-of-class default definition is defined at
     the point where it is explicitly defaulted.  */
  if (DECL_DELETED_FN (decl))
    maybe_explain_implicit_delete (decl);
  else if (DECL_INITIAL (decl) == error_mark_node)
    synthesize_method (decl);

Jakub



[PATCH] debug/102507: ICE in btf_finalize when compiling with -gbtf

2021-09-28 Thread Indu Bhagat via Gcc-patches
Fix the freeing of the btf_var_ids hash_map in btf_finalize ().

Testing notes:

- Bootstrapped GCC with -gbtf as an experiment.
- Usual bootstrap and regression testing on x86_64.
- BPF backend testing - make all-gcc, reg tested bpf.exp, btf.exp and ctf.exp.
  (tested using David Faust's config.gcc patch posted earlier
   https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580422.html)

gcc/ChangeLog:

PR debug/102507
* btfout.c (GTY): Add GTY (()), albeit for cosmetic purposes only.
(btf_finalize): Empty the hash_map btf_var_ids.
---
 gcc/btfout.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/btfout.c b/gcc/btfout.c
index cdc6c63..a787815 100644
--- a/gcc/btfout.c
+++ b/gcc/btfout.c
@@ -70,7 +70,7 @@ static char btf_info_section_label[MAX_BTF_LABEL_BYTES];
converted to BTF_KIND_VAR type records. Strictly accounts for the index
from the start of the variable type entries, does not include the number
of types emitted prior to the variable records.  */
-static hash_map  *btf_var_ids;
+static GTY (()) hash_map  *btf_var_ids;
 
 /* Mapping of type IDs from original CTF ID to BTF ID. Types do not map
1-to-1 from CTF to BTF. To avoid polluting the CTF container when updating
@@ -1119,12 +1119,12 @@ btf_finalize (void)
 
   funcs = NULL;
 
+  btf_var_ids->empty ();
+  btf_var_ids = NULL;
+
   free (btf_id_map);
   btf_id_map = NULL;
 
-  ggc_free (btf_var_ids);
-  btf_var_ids = NULL;
-
   ctf_container_ref tu_ctfc = ctf_get_tu_ctfc ();
   ctfc_delete_container (tu_ctfc);
   tu_ctfc = NULL;
-- 
1.8.3.1



[PATCH] ctf: Do not warn for CTF not supported for GNU GIMPLE

2021-09-28 Thread Indu Bhagat via Gcc-patches
CTF is supported for C only.  Currently, a warning is emitted if the -gctf
command line option is specified for a non-C frontend.  This warning is also
used by the GCC testsuite framework - it skips adding -gctf to the list of
debug flags for automated testing, if CTF is not supported for the frontend.

The following warning, however, is not useful when compiling with LTO:

"lto1: note: CTF debug info requested, but not supported for ‘GNU GIMPLE’
frontend"

This patch disables the generation of the above warning for GNU GIMPLE.
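Assuming -gctf was requested, the resulting condition can be sketched as a pure function of the front-end name. The helper names below are hypothetical. is_c_frontend only approximates GCC's lang_GNU_C (). It assumes C front-end names look like "GNU C", optionally followed by a standard year such as "GNU C17", so "GNU C++" is excluded.

```cpp
#include <string_view>

// Hypothetical approximation of lang_GNU_C (): "GNU C" optionally followed
// by a year such as "17", but not "GNU C++".
constexpr bool is_c_frontend(std::string_view name)
{
  if (name.substr(0, 5) != "GNU C")
    return false;
  return name.size() == 5 || (name[5] >= '0' && name[5] <= '9');
}

// Hypothetical helper: should the "CTF debug info requested, but not
// supported" note be emitted?  Never for C and, with this patch, never for
// the "GNU GIMPLE" front end used by LTO either.
constexpr bool ctf_note_wanted(std::string_view name)
{
  return !is_c_frontend(name) && name.substr(0, 10) != "GNU GIMPLE";
}
```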

Bootstrapped and regression tested on x86_64.

gcc/ChangeLog:

* toplev.c (process_options): Do not warn for GNU GIMPLE.
---
 gcc/toplev.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index e1688aa..511a343 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1416,14 +1416,16 @@ process_options (void)
debug_info_level = DINFO_LEVEL_NONE;
 }
 
-  /* CTF is supported for only C at this time.
- Compiling with -flto results in frontend language of GNU GIMPLE.  */
+  /* CTF is supported for only C at this time.  */
   if (!lang_GNU_C ()
   && ctf_debug_info_level > CTFINFO_LEVEL_NONE)
 {
-  inform (UNKNOWN_LOCATION,
- "CTF debug info requested, but not supported for %qs frontend",
- language_string);
+  /* Compiling with -flto results in frontend language of GNU GIMPLE.  It
+is not useful to warn in that case.  */
+  if (!startswith (lang_hooks.name, "GNU GIMPLE"))
+   inform (UNKNOWN_LOCATION,
+   "CTF debug info requested, but not supported for %qs frontend",
+   language_string);
   ctf_debug_info_level = CTFINFO_LEVEL_NONE;
 }
 
-- 
1.8.3.1



[pushed] libgcc, X86, Darwin: Export cpu_model and indicator.

2021-09-28 Thread Iain Sandoe via Gcc-patches
Hi,

These two symbols have been emitted since 4.8, but were not added
to the Darwin exports, so we have been using the ones from libgcc.a.

Added to libgcc_s now.

tested on i686 and x86_64-darwin, pushed to master,
thanks
Iain

Signed-off-by: Iain Sandoe 

libgcc/ChangeLog:

* config/i386/libgcc-darwin.ver: Add symbols for
__cpu_model, __cpu_indicator_init.
---
 libgcc/config/i386/libgcc-darwin.ver | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/i386/libgcc-darwin.ver b/libgcc/config/i386/libgcc-darwin.ver
index 5224cdc982e..c97dae73855 100644
--- a/libgcc/config/i386/libgcc-darwin.ver
+++ b/libgcc/config/i386/libgcc-darwin.ver
@@ -1,4 +1,7 @@
-
+GCC_4.8.0 {
+  __cpu_model
+  __cpu_indicator_init
+}
 
 %inherit GCC_12.0.0 GCC_7.0.0
 GCC_12.0.0 {
-- 
2.24.3 (Apple Git-128)



Re: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070]

2021-09-28 Thread Harald Anlauf via Gcc-patches
Hi Tobias,

let me first reach for my brown bag...

> Otherwise, the quote from F2018 of my previous email applies:
>
> F2018:16.9.109 LBOUND has for "case(i)", i.e. with a 'dim'
> argument the following. The case without 'dim' just iterates
> through case (i) for each dim. Thus:
>
> "If DIM is present,
>   ARRAY is a whole array,
>   and either ARRAY is an assumed-size array of rank DIM
>   or dimension DIM of ARRAY has nonzero extent,
>   the result has a value equal to the lower bound for subscript DIM of ARRAY.
> Otherwise, if DIM is present, the result value is 1."

It was probably too late, and I could no longer distinguish
"assumed-size" from "assumed-rank", and likely some more...

> Here, we assume dim=2 is present [either directly or via case(ii)],
> ARRAY is a whole array but it neither is of assumed size nor has nonzero
> extent.
> Hence, the "otherwise" applies and the result is 1 - as gfortran has
> and ifort has in the caller.

... which led to my complete confusion and loss of focus.

Of course you are right.  Sorry for that.  Will now put that bag on...

Harald
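The F2018 16.9.109 rule quoted in this thread (case (i), with DIM present) reduces to a small decision function. This C++ sketch is only an illustration. The dim_info fields are hypothetical inputs that a compiler would derive from the array descriptor. The case discussed above is a whole array that is neither assumed-size nor of nonzero extent in dimension 2, and it falls through to the result 1.

```cpp
// Hypothetical description of dimension DIM of ARRAY.
struct dim_info
{
  bool whole_array;          // ARRAY is a whole array
  bool assumed_size_at_dim;  // ARRAY is assumed-size and of rank DIM
  long lower;                // lower bound for subscript DIM
  long extent;               // extent of dimension DIM
};

// LBOUND (ARRAY, DIM) following F2018 16.9.109, case (i).
constexpr long lbound_dim(const dim_info& d)
{
  if (d.whole_array && (d.assumed_size_at_dim || d.extent != 0))
    return d.lower;
  return 1;   // "Otherwise, if DIM is present, the result value is 1."
}
```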



[PATCH] c++: ttp matching with constrained auto parm [PR99909]

2021-09-28 Thread Patrick Palka via Gcc-patches
Here, when unifying TT with S, processing_template_decl is unset, and
this foils the dependence checks in do_auto_deduction that avoid
checking constraints on an auto whose initializer is dependent.

This patch fixes this issue by making sure processing_template_decl is
set during the call to unify from coerce_template_template_parms; this
seems sensible because we're unifying one set of template parameters
with another, so we're dealing with templated trees throughout.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/99909

gcc/cp/ChangeLog:

* pt.c (coerce_template_template_parms): Keep
processing_template_decl set during the call to unify as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp3.C: New test.
---
 gcc/cp/pt.c|  4 ++--
 gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C | 11 +++
 2 files changed, 13 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 41fa7ed5e43..1dcdffe322a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7994,12 +7994,12 @@ coerce_template_template_parms (tree parm_parms,
   /* So coerce P's args to apply to A's parms, and then deduce between A's
 args and the converted args.  If that succeeds, A is at least as
 specialized as P, so they match.*/
+  processing_template_decl_sentinel ptds (/*reset*/false);
+  ++processing_template_decl;
   tree pargs = template_parms_level_to_args (parm_parms);
   pargs = add_outermost_template_args (outer_args, pargs);
-  ++processing_template_decl;
   pargs = coerce_template_parms (arg_parms, pargs, NULL_TREE, tf_none,
 /*require_all*/true, /*use_default*/true);
-  --processing_template_decl;
   if (pargs != error_mark_node)
{
  tree targs = make_tree_vec (nargs);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C
new file mode 100644
index 000..898524e0dfa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C
@@ -0,0 +1,11 @@
+// PR c++/99909
+// { dg-do compile { target c++20 } }
+
+template constexpr bool always_true = true;
+template concept C = always_true;
+
+template struct S { };
+
+template class TT> void f() { }
+
+template void f();
-- 
2.33.0.591.gddb1055343



[pushed] Darwin, PPC : Fix R13 for PPC64.

2021-09-28 Thread Iain Sandoe via Gcc-patches
Hi,

We have a somewhat unusual situation in that for PPC64, R13 is
both reserved for future use by the ABI document and callee-saved.
In fact, it is already used internally by the pthreads
implementation to contain pthread_self.

So add R13 to the fixed regs, but also keep it in the callee-saved
set.

tested on powerpc-darwin9, pushed to master,
thanks
Iain

gcc/ChangeLog:

* config/rs6000/darwin.h (FIXED_R13): Add for PPC64.
(FIRST_SAVED_GP_REGNO): Save from R13 even when it is one
of the fixed regs.
---
 gcc/config/rs6000/darwin.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 6abf8e84f54..120b01f9a2b 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -203,7 +203,7 @@
 
 /* Make both r2 and r13 available for allocation.  */
 #define FIXED_R2 0
-#define FIXED_R13 0
+#define FIXED_R13 TARGET_64BIT
 
 /* Base register for access to local variables of the function.  */
 
@@ -213,6 +213,9 @@
 #undef  RS6000_PIC_OFFSET_TABLE_REGNUM
 #define RS6000_PIC_OFFSET_TABLE_REGNUM 31
 
+#undef FIRST_SAVED_GP_REGNO
+#define FIRST_SAVED_GP_REGNO 13
+
 /* Darwin's stack must remain 16-byte aligned for both 32 and 64 bit
ABIs.  */
 
-- 
2.24.3 (Apple Git-128)



Re: [PATCH] rs6000: Remove builtin mask check from builtin_decl [PR102347]

2021-09-28 Thread Bill Schmidt via Gcc-patches
Hi Kewen,

Although I agree that what we do now is tragically bad (and will be fixed in 
the builtin rewrite), this seems a little too cavalier to remove all checking 
during initialization without adding any checking somewhere else. :-)  We still 
need to check for invalid usage when the builtin is expanded, and I don't think 
the old code does this at all.

Unless you are planning to do a backport, I think the proper way forward here 
is to just wait for the new builtin support to land.  In the new code, we 
initialize all built-ins up front, and check properly at expansion time whether 
the builtin is enabled in the environment that obtains during expand.

My two cents,
Bill

On 9/28/21 3:13 AM, Kewen.Lin wrote:
> Hi,
>
> As discussed in PR102347, builtin_decl is currently invoked very
> early, when the function_decl for builtin functions is made up.  At
> that time rs6000_builtin_mask could be wrong for builtins used in
> #pragma/attribute target functions, though it will be updated
> properly later when LTO processes all nodes.
>
> This patch aligns with the practice the i386 port adopts, and also
> with r10-7462, by relaxing builtin mask checking in some places.
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
>   PR target/102347
>   * config/rs6000/rs6000-call.c (rs6000_builtin_decl): Remove builtin
>   mask check.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/102347
>   * gcc.target/powerpc/pr102347.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000-call.c | 14 --
>  gcc/testsuite/gcc.target/powerpc/pr102347.c | 15 +++
>  2 files changed, 19 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102347.c
>
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index fd7f24da818..15e0e09c07d 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -13775,23 +13775,17 @@ rs6000_init_builtins (void)
>  }
>  }
>
> -/* Returns the rs6000 builtin decl for CODE.  */
> +/* Returns the rs6000 builtin decl for CODE.  Note that we don't check
> +   the builtin mask here since there could be some #pragma/attribute
> +   target functions and the rs6000_builtin_mask could be wrong when
> +   this checking happens, though it will be updated properly later.  */
>
>  tree
>  rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>  {
> -  HOST_WIDE_INT fnmask;
> -
>if (code >= RS6000_BUILTIN_COUNT)
>  return error_mark_node;
>
> -  fnmask = rs6000_builtin_info[code].mask;
> -  if ((fnmask & rs6000_builtin_mask) != fnmask)
> -{
> -  rs6000_invalid_builtin ((enum rs6000_builtins)code);
> -  return error_mark_node;
> -}
> -
>return rs6000_builtin_decls[code];
>  }
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102347.c b/gcc/testsuite/gcc.target/powerpc/pr102347.c
> new file mode 100644
> index 000..05c439a8dac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102347.c
> @@ -0,0 +1,15 @@
> +/* { dg-do link } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-require-effective-target lto } */
> +/* { dg-options "-flto -mdejagnu-cpu=power9" } */
> +
> +/* Verify there are no error messages in LTO mode.  */
> +
> +#pragma GCC target "cpu=power10"
> +int main ()
> +{
> +  float *b;
> +  __vector_quad c;
> +  __builtin_mma_disassemble_acc (b, &c);
> +  return 0;
> +}
> --
> 2.27.0
>
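The check being removed is a subset test on feature bits: a builtin is usable only when every bit it requires is set in the active mask. Bill's objection concerns which mask that test sees. Checking at expand time uses the mask in effect for the function being compiled, including any #pragma GCC target. Checking when the decl is first built may see a mask that is not yet correct. Below is a minimal sketch, with hypothetical bit names standing in for the real rs6000 masks.

```cpp
#include <cstdint>

// Hypothetical feature bits; the real rs6000 mask names and values differ.
constexpr std::uint64_t FEAT_ALTIVEC = std::uint64_t{1} << 0;
constexpr std::uint64_t FEAT_VSX     = std::uint64_t{1} << 1;
constexpr std::uint64_t FEAT_POWER10 = std::uint64_t{1} << 2;
constexpr std::uint64_t FEAT_MMA     = std::uint64_t{1} << 3;

// The negation of "(fnmask & rs6000_builtin_mask) != fnmask": true when
// every required bit is enabled.
constexpr bool builtin_enabled(std::uint64_t required, std::uint64_t enabled)
{
  return (required & enabled) == required;
}

// Hypothetical masks: the command-line default, and the mask of a function
// compiled under a target pragma that adds POWER10 and MMA.
constexpr std::uint64_t default_mask = FEAT_ALTIVEC | FEAT_VSX;
constexpr std::uint64_t pragma_mask = default_mask | FEAT_POWER10 | FEAT_MMA;
```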



[committed] libstdc++: Specialize std::pointer_traits<__normal_iterator>

2021-09-28 Thread Jonathan Wakely via Gcc-patches
This allows std::__to_address to be used with __normal_iterator in
C++11/14/17 modes. Without the partial specialization the deduced
pointer_traits::element_type is incorrect, and so the return type of
__to_address is wrong.

A similar partial specialization is probably needed for
__gnu_debug::_Safe_iterator.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (pointer_traits): Define partial
specialization for __normal_iterator.
* testsuite/24_iterators/normal_iterator/to_address.cc: New test.

Tested x86_64-linux. Committed to trunk.
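The problem with the primary template can be reproduced with any class template whose first template argument is a pointer type. In this sketch, plain_iter and wrap_iter are hypothetical stand-ins for __gnu_cxx::__normal_iterator. The partial specialization follows the shape of the one in the patch, delegating to the wrapped iterator's traits. It omits rebind for brevity and is not the libstdc++ code itself.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct tag { };

// Hypothetical wrapper without a specialization: the primary pointer_traits
// takes element_type from the first template argument, i.e. int*.
template<typename Iterator, typename Container>
struct plain_iter { Iterator it; };

// Hypothetical wrapper that does get a pointer_traits specialization.
template<typename Iterator, typename Container>
struct wrap_iter
{
  Iterator it;
  Iterator base() const noexcept { return it; }
};

namespace std
{
  // Delegate to the traits of the wrapped iterator.
  template<typename Iterator, typename Container>
    struct pointer_traits<wrap_iter<Iterator, Container>>
    {
    private:
      using base_traits = pointer_traits<Iterator>;

    public:
      using element_type = typename base_traits::element_type;
      using pointer = wrap_iter<Iterator, Container>;
      using difference_type = typename base_traits::difference_type;

      static pointer
      pointer_to(element_type& e) noexcept
      { return pointer{base_traits::pointer_to(e)}; }

      static element_type*
      to_address(pointer p) noexcept
      { return p.base(); }
    };
}
```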

commit 82626be2d633a9802a8b08727ef51c627e37fee5
Author: Jonathan Wakely 
Date:   Tue Sep 28 15:26:46 2021

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index c5b02408c1c..004d767224d 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1285,6 +1285,34 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __it.base(); }
 
 #if __cplusplus >= 201103L
+
+  // Need to specialize pointer_traits because the primary template will
+  // deduce element_type of __normal_iterator as T* rather than T.
+  template
+struct pointer_traits<__gnu_cxx::__normal_iterator<_Iterator, _Container>>
+{
+private:
+  using _Base = pointer_traits<_Iterator>;
+
+public:
+  using element_type = typename _Base::element_type;
+  using pointer = __gnu_cxx::__normal_iterator<_Iterator, _Container>;
+  using difference_type = typename _Base::difference_type;
+
+  template
+   using rebind = __gnu_cxx::__normal_iterator<_Tp, _Container>;
+
+  static pointer
+  pointer_to(element_type& __e) noexcept
+  { return pointer(_Base::pointer_to(__e)); }
+
+#if __cplusplus >= 202002L
+  static element_type*
+  to_address(pointer __p) noexcept
+  { return __p.base(); }
+#endif
+};
+
   /**
* @addtogroup iterators
* @{
diff --git a/libstdc++-v3/testsuite/24_iterators/normal_iterator/to_address.cc b/libstdc++-v3/testsuite/24_iterators/normal_iterator/to_address.cc
new file mode 100644
index 000..510d627435f
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/normal_iterator/to_address.cc
@@ -0,0 +1,6 @@
+// { dg-do compile { target { c++11 } } }
+#include 
+#include 
+
+char* p = std::__to_address(std::string("1").begin());
+const char* q = std::__to_address(std::string("2").cbegin());


[committed] libstdc++: Fix _OutputIteratorConcept checks in algorithms

2021-09-28 Thread Jonathan Wakely via Gcc-patches
The _OutputIteratorConcept should be checked using the correct value
category. The std::move_backward and std::copy_backward algorithms
should use _OutputIteratorConcept instead of _ConvertibleConcept.

In order to use the correct value category, the concept should use a
function that returns _ValueT instead of using an lvalue data member.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/boost_concept_check.h (_OutputIteratorConcept):
Use a function to preserve value category of the type.
* include/bits/stl_algobase.h (copy, move, fill_n): Use a
reference as the second argument for _OutputIteratorConcept.
(copy_backward, move_backward): Use _OutputIteratorConcept
instead of _ConvertibleConcept.

Tested x86_64-linux. Committed to trunk.
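The value-category point can be checked with a move-only element type. The traits below are a hypothetical C++11-compatible approximation of the *__i++ = ... requirement, not the libstdc++ concept-check classes. writable_through uses std::declval<V>(), whose value category follows V, much like the new __val() function. writable_through_lvalue always supplies an lvalue, like the old _ValueT __t data member. With a std::unique_ptr element the lvalue form fails even though a move would succeed. This is consistent with the move_backward hunk passing value_type&& where copy_backward passes reference.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// C++11-compatible equivalent of std::void_t.
template<typename...> struct make_void { using type = void; };
template<typename... Ts> using void_type = typename make_void<Ts...>::type;

// Value category follows V: an rvalue for V or V&&, an lvalue for V&.
template<typename It, typename V, typename = void>
struct writable_through : std::false_type { };

template<typename It, typename V>
struct writable_through<It, V,
    void_type<decltype(*std::declval<It&>()++ = std::declval<V>())>>
  : std::true_type { };

// Always an lvalue, as with a data member of type V.
template<typename It, typename V, typename = void>
struct writable_through_lvalue : std::false_type { };

template<typename It, typename V>
struct writable_through_lvalue<It, V,
    void_type<decltype(*std::declval<It&>()++ = std::declval<V&>())>>
  : std::true_type { };

using move_only = std::unique_ptr<int>;
```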

commit 45a8cd256934be3770f7e000db7b13f10eabee9a
Author: Jonathan Wakely 
Date:   Fri Sep 24 15:35:20 2021

diff --git a/libstdc++-v3/include/bits/boost_concept_check.h b/libstdc++-v3/include/bits/boost_concept_check.h
index 5c87e32f36b..ba36c24abec 100644
--- a/libstdc++-v3/include/bits/boost_concept_check.h
+++ b/libstdc++-v3/include/bits/boost_concept_check.h
@@ -464,10 +464,10 @@ struct _Aux_require_same<_Tp,_Tp> { typedef _Tp _Type; };
   __function_requires< _AssignableConcept<_Tp> >();
   ++__i;// require preincrement operator
   __i++;// require postincrement operator
-  *__i++ = __t; // require postincrement and assignment
+  *__i++ = __val(); // require postincrement and assignment
 }
 _Tp __i;
-_ValueT __t;
+_ValueT __val() const;
   };
 
   template 
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index d0c49628d7f..e1443b8a92a 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -613,7 +613,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_II>)
   __glibcxx_function_requires(_OutputIteratorConcept<_OI,
-   typename iterator_traits<_II>::value_type>)
+   typename iterator_traits<_II>::reference>)
   __glibcxx_requires_can_increment_range(__first, __last, __result);
 
   return std::__copy_move_a<__is_move_iterator<_II>::__value>
@@ -646,7 +646,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_II>)
   __glibcxx_function_requires(_OutputIteratorConcept<_OI,
-   typename iterator_traits<_II>::value_type>)
+   typename iterator_traits<_II>::value_type&&>)
   __glibcxx_requires_can_increment_range(__first, __last, __result);
 
   return std::__copy_move_a(std::__miter_base(__first),
@@ -850,9 +850,8 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   // concept requirements
   __glibcxx_function_requires(_BidirectionalIteratorConcept<_BI1>)
   __glibcxx_function_requires(_Mutable_BidirectionalIteratorConcept<_BI2>)
-  __glibcxx_function_requires(_ConvertibleConcept<
-   typename iterator_traits<_BI1>::value_type,
-   typename iterator_traits<_BI2>::value_type>)
+  __glibcxx_function_requires(_OutputIteratorConcept<_BI2,
+   typename iterator_traits<_BI1>::reference>)
   __glibcxx_requires_can_decrement_range(__first, __last, __result);
 
   return std::__copy_move_backward_a<__is_move_iterator<_BI1>::__value>
@@ -886,9 +885,8 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   // concept requirements
   __glibcxx_function_requires(_BidirectionalIteratorConcept<_BI1>)
   __glibcxx_function_requires(_Mutable_BidirectionalIteratorConcept<_BI2>)
-  __glibcxx_function_requires(_ConvertibleConcept<
-   typename iterator_traits<_BI1>::value_type,
-   typename iterator_traits<_BI2>::value_type>)
+  __glibcxx_function_requires(_OutputIteratorConcept<_BI2,
+   typename iterator_traits<_BI1>::value_type&&>)
   

[committed] libstdc++: Fix tests that use invalid types in ordered containers

2021-09-28 Thread Jonathan Wakely via Gcc-patches
Types used in ordered containers need to be comparable, or the container
needs to use a custom comparison function. These tests fail when
_GLIBCXX_CONCEPT_CHECKS is defined, because the element types aren't
comparable.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/20_util/is_nothrow_swappable/value.h: Use custom
comparison function for priority_queue of type with no
relational operators.
* testsuite/20_util/is_swappable/value.h: Likewise.
* testsuite/24_iterators/output/concept.cc: Add operator< to
type used in set.

Tested x86_64-linux. Committed to trunk.
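For an element type without relational operators, the fix is to supply the comparison explicitly. Below is a minimal sketch; Job and ByPriority are hypothetical names, not the testsuite's types. Because ByPriority is passed as priority_queue's Compare argument, std::less<Job>, and therefore Job < Job, is never required.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical element type with no operator<.
struct Job
{
  int priority;
};

// Strict weak ordering on Job, used instead of the default std::less<Job>.
struct ByPriority
{
  bool operator()(const Job& a, const Job& b) const
  { return a.priority < b.priority; }
};

using JobQueue = std::priority_queue<Job, std::vector<Job>, ByPriority>;
```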

commit 4000d722e6091e923721b54911bb784eeec3
Author: Jonathan Wakely 
Date:   Fri Sep 24 13:21:34 2021

diff --git a/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/value.h b/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/value.h
index 62b3db8dc1f..d6f166bee46 100644
--- a/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/value.h
+++ b/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/value.h
@@ -285,7 +285,9 @@ void test01()
   static_assert(test_property>(true), "");
   static_assert(test_property>(true), "");
+   std::priority_queue,
+   comps::CompareNoThrowCopyable>>(true), "");
   static_assert(test_property>(true), "");
   static_assert(test_property
+  bool operator()(const T&, const T&) const
+  { return false; }
+  };
 }
 void test01()
 {
@@ -152,7 +159,9 @@ void test01()
   static_assert(test_property[1][2][3]>(true), "");
   static_assert(test_property>(true), "");
+   std::priority_queue,
+   funny::DummyCmp>>(true), "");
   static_assert(test_property>(true), "");
   static_assert(test_property::iterator, int > );
 static_assert( output_iterator< array::iterator, B > );


[committed] libstdc++: Improve types used as iterators in testsuite

2021-09-28 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/copy/34595.cc: Add missing operation
for type used as an iterator.
* testsuite/25_algorithms/unique_copy/check_type.cc: Likewise.

Tested x86_64-linux. Committed to trunk.
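The operation added to `Counting_output_iterator` is postfix `operator++`, which an output iterator needs so that `*it++ = value` is valid. Below is a sketch of a complete counting output iterator. It assumes a hypothetical `CountingOutputIterator` that shares its counter through a pointer, so that the copy returned by `it++` still updates the same count. `count_copied` is also hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical counting output iterator, loosely modelled on the
// Counting_output_iterator in copy/34595.cc. It counts assignments
// instead of storing the assigned values.
class CountingOutputIterator
{
  std::size_t* count_;  // shared, so copies made by it++ update the same count
public:
  typedef std::output_iterator_tag iterator_category;
  typedef void value_type;
  typedef void difference_type;
  typedef void pointer;
  typedef void reference;

  explicit CountingOutputIterator(std::size_t& c) : count_(&c) {}

  CountingOutputIterator& operator++() { return *this; }
  // Postfix increment: needed for the output iterator expression *it++ = v.
  CountingOutputIterator operator++(int) { return *this; }
  CountingOutputIterator& operator*() { return *this; }

  // Assigning any value through *it just bumps the counter.
  template<typename T>
  CountingOutputIterator& operator=(const T&) { ++*count_; return *this; }
};

inline std::size_t count_copied(const int* first, const int* last)
{
  std::size_t n = 0;
  std::copy(first, last, CountingOutputIterator(n));
  return n;
}
```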

commit 5f1db7627f6eea2050c3d71f17bca5ecf586a813
Author: Jonathan Wakely 
Date:   Fri Sep 24 13:23:34 2021

libstdc++: Improve types used as iterators in testsuite

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/copy/34595.cc: Add missing operation
for type used as an iterator.
* testsuite/25_algorithms/unique_copy/check_type.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/34595.cc b/libstdc++-v3/testsuite/25_algorithms/copy/34595.cc
index c534eeb17f5..513425a5a2c 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy/34595.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/34595.cc
@@ -27,11 +27,12 @@ class Counting_output_iterator
 public:
   Counting_output_iterator() : c(0) {}
   Counting_output_iterator& operator++() { return *this; }
+  Counting_output_iterator operator++(int) { return *this; }
   Counting_output_iterator& operator*() { return *this; }
-  
+
   template 
   void operator=(const T&) { ++c; }
-  
+
   std::size_t current_counter() const { return c; }
 };
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/unique_copy/check_type.cc b/libstdc++-v3/testsuite/25_algorithms/unique_copy/check_type.cc
index af86548609f..27b35794e8a 100644
--- a/libstdc++-v3/testsuite/25_algorithms/unique_copy/check_type.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/unique_copy/check_type.cc
@@ -25,27 +25,35 @@
 using __gnu_test::input_iterator_wrapper;
 using __gnu_test::output_iterator_wrapper;
 
-struct S1 { };
+template
+struct iter_facade
+{
+  T& operator++();
+  T operator++(int);
+  T& operator*() const;
+};
 
-struct S2
+struct S1 : iter_facade { };
+
+struct S2 : iter_facade
 {
   S2(const S1&) {}
 };
 
-bool 
+bool
 operator==(const S1&, const S1&) {return true;}
 
-struct X1 { };
+struct X1 : iter_facade  { };
 
-struct X2
+struct X2 : iter_facade
 {
   X2(const X1&) {}
 };
 
-bool 
+bool
 predicate(const X1&, const X1&) {return true;}
 
-output_iterator_wrapper 
+output_iterator_wrapper
 test1(input_iterator_wrapper& s1, output_iterator_wrapper& s2)
 { return std::unique_copy(s1, s1, s2); }
 


[committed] libstdc++: Fix concept checks for iterators

2021-09-28 Thread Jonathan Wakely via Gcc-patches
This adds some additional checks to the C++98-style concept checks for
iterators, and removes some bogus checks for mutable iterators. Instead
of requiring that the result of dereferencing a mutable iterator is
assignable (which is a property of the value type, not required for the
iterator), check that the reference type is a non-const reference to the
value type.
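The property that the new `_Mutable_ForwardIteratorReferenceConcept` enforces can be sketched with C++11 type traits. The trait name `is_mutable_forward_reference` is hypothetical, but the comparison mirrors the patch's `_SameTypeConcept<_Ref, _Val&>`:

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical trait: true when iterator_traits<It>::reference is exactly
// value_type&, i.e. a non-const reference to the value type.
template<typename It>
struct is_mutable_forward_reference
: std::is_same<typename std::iterator_traits<It>::reference,
               typename std::iterator_traits<It>::value_type&>
{ };

static_assert(is_mutable_forward_reference<std::vector<int>::iterator>::value,
              "vector<int>::iterator yields int&");
static_assert(is_mutable_forward_reference<std::list<int>::iterator>::value,
              "list<int>::iterator yields int&");
// A constant iterator yields const int&, so it fails the mutable check.
static_assert(!is_mutable_forward_reference<std::list<int>::const_iterator>::value,
              "const_iterator is not mutable");
// vector<bool>::iterator yields a proxy object rather than bool&, which is
// why the patch special-cases std::_Bit_iterator.
static_assert(!is_mutable_forward_reference<std::vector<bool>::iterator>::value,
              "vector<bool> uses a proxy reference");
```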

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/boost_concept_check.h (_ForwardIteratorConcept)
(_BidirectionalIteratorConcept, _RandomAccessIteratorConcept):
Check result types of iterator operations.
(_Mutable_ForwardIteratorConcept): Check that iterator's
reference type is a reference to its value type.
(_Mutable_BidirectionalIteratorConcept): Do not require the
value type to be assignable.
(_Mutable_RandomAccessIteratorConcept): Likewise.
* testsuite/24_iterators/operations/prev_neg.cc: Adjust dg-error
line number.

Tested x86_64-linux. Committed to trunk.

commit afffc96a5259ba4e3f3cca154dc5ea32a496875e
Author: Jonathan Wakely 
Date:   Fri Sep 24 13:56:33 2021

libstdc++: Fix concept checks for iterators

This adds some additional checks to the C++98-style concept checks for
iterators, and removes some bogus checks for mutable iterators. Instead
of requiring that the result of dereferencing a mutable iterator is
assignable (which is a property of the value type, not required for the
iterator), check that the reference type is a non-const reference to the
value type.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/boost_concept_check.h (_ForwardIteratorConcept)
(_BidirectionalIteratorConcept, _RandomAccessIteratorConcept):
Check result types of iterator operations.
(_Mutable_ForwardIteratorConcept): Check that iterator's
reference type is a reference to its value type.
(_Mutable_BidirectionalIteratorConcept): Do not require the
value type to be assignable.
(_Mutable_RandomAccessIteratorConcept): Likewise.
* testsuite/24_iterators/operations/prev_neg.cc: Adjust dg-error
line number.

diff --git a/libstdc++-v3/include/bits/boost_concept_check.h b/libstdc++-v3/include/bits/boost_concept_check.h
index ba36c24abec..71c99c13e93 100644
--- a/libstdc++-v3/include/bits/boost_concept_check.h
+++ b/libstdc++-v3/include/bits/boost_concept_check.h
@@ -44,6 +44,14 @@
 #include 
 #include // for traits and tags
 
+namespace std  _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+  struct _Bit_iterator;
+  struct _Bit_const_iterator;
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -470,6 +478,52 @@ struct _Aux_require_same<_Tp,_Tp> { typedef _Tp _Type; };
 _ValueT __val() const;
   };
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wunused-variable"
+
+  template 
+  struct _ForwardIteratorReferenceConcept
+  {
+void __constraints() {
+#if __cplusplus >= 201103L
+  typedef typename std::iterator_traits<_Tp>::reference _Ref;
+  static_assert(std::is_reference<_Ref>::value,
+ "reference type of a forward iterator must be a real reference");
+#endif
+}
+  };
+
+  template 
+  struct _Mutable_ForwardIteratorReferenceConcept
+  {
+void __constraints() {
+  typedef typename std::iterator_traits<_Tp>::reference _Ref;
+  typedef typename std::iterator_traits<_Tp>::value_type _Val;
+  __function_requires< _SameTypeConcept<_Ref, _Val&> >();
+}
+  };
+
+  // vector::iterator is not a real forward reference, but pretend it is.
+  template <>
+  struct _ForwardIteratorReferenceConcept
+  {
+void __constraints() { }
+  };
+
+  // vector::iterator is not a real forward reference, but pretend it is.
+  template <>
+  struct _Mutable_ForwardIteratorReferenceConcept
+  {
+void __constraints() { }
+  };
+
+  // And vector::const iterator too.
+  template <>
+  struct _ForwardIteratorReferenceConcept
+  {
+void __constraints() { }
+  };
+
   template 
   struct _ForwardIteratorConcept
   {
@@ -479,8 +533,12 @@ struct _Aux_require_same<_Tp,_Tp> { typedef _Tp _Type; };
   __function_requires< _ConvertibleConcept<
 typename std::iterator_traits<_Tp>::iterator_category,
 std::forward_iterator_tag> >();
+  __function_requires< _ForwardIteratorReferenceConcept<_Tp> >();
+  _Tp& __j = ++__i;
+  const _Tp& __k = __i++;
   typedef typename std::iterator_traits<_Tp>::reference _Ref;
-  _Ref __r _IsUnused = *__i;
+  _Ref __r = *__k;
+  _Ref __r2 = *__i++;
 }
 _Tp __i;
   };
@@ -490,7 +548,9 @@ struct _Aux_require_same<_Tp,_Tp> { typedef _Tp _Type; };
   {
 void __constraints() {
   __function_requires< _ForwardIteratorConcept<_Tp> >();
-  *__i++ = *

[committed] libstdc++: Skip tests that fail with _GLIBCXX_CONCEPT_CHECKS

2021-09-28 Thread Jonathan Wakely via Gcc-patches
The extension that allows implicitly rebinding a container's allocator
is not allowed when _GLIBCXX_CONCEPT_CHECKS is defined, so skip the
tests for that extension.
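With the extension, a container whose allocator has a different `value_type` (for example `std::vector<int, std::allocator<long>>`) is accepted, and the allocator is rebound internally. Portable code can perform the rebinding explicitly through `std::allocator_traits`. The alias `rebound_vector` below is a hypothetical helper for illustration, not part of libstdc++:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical alias: rebind Alloc to T explicitly (C++11
// allocator_traits), so the container's allocator value_type always
// matches its element type and no implicit rebinding is needed.
template<typename T, typename Alloc>
using rebound_vector =
  std::vector<T, typename std::allocator_traits<Alloc>::template rebind_alloc<T>>;

static_assert(std::is_same<rebound_vector<int, std::allocator<long>>,
                           std::vector<int>>::value,
              "rebinding std::allocator<long> to int gives std::allocator<int>");
```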

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/requirements/explicit_instantiation/3.cc:
Do not test implicit allocator rebinding when _GLIBCXX_CONCEPT_CHECKS
is defined.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/list/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/list/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/map/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/map/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/multimap/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/multimap/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/multiset/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/multiset/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/set/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/set/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/unordered_map/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/unordered_set/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/vector/ext_pointer/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/vector/requirements/explicit_instantiation/3.cc:
Likewise.

Tested x86_64-linux. Committed to trunk.

commit b701f46ea6d651aff8dbd267c29213253045e2b6
Author: Jonathan Wakely 
Date:   Fri Sep 24 14:23:36 2021

libstdc++: Skip tests that fail with _GLIBCXX_CONCEPT_CHECKS

The extension that allows implicitly rebinding a container's allocator
is not allowed when _GLIBCXX_CONCEPT_CHECKS is defined, so skip the
tests for that extension.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/requirements/explicit_instantiation/3.cc:
Do not test implicit allocator rebinding when _GLIBCXX_CONCEPT_CHECKS
is defined.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/list/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/list/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/map/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/map/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/multimap/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/multimap/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/multiset/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/multiset/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/set/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/set/requirements/explicit_instantiation/5.cc:
Likewise.
* testsuite/23_containers/unordered_map/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/unordered_set/requirements/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/vector/ext_pointer/explicit_instantiation/3.cc:
Likewise.
* testsuite/23_containers/vector/requirements/explicit_instantiation/3.cc:
Likewise.

diff --git a/libstdc++-v3/testsuite/23_containers/deque/requirements/explicit_instantiation/3.cc b/libstdc++-v3/testsuite/23_containers/deque/requirements/explicit_instantiation/3.cc
index 0cbedf4693b..2a23eaa3f17 100644
--- a/libstdc++-v3/testsuite/2

[committed] libstdc++: Skip container adaptor tests that fail concept checks

2021-09-28 Thread Jonathan Wakely via Gcc-patches
As an extension, our container adaptors SFINAE away the default
constructor if the adapted sequence container is not default
constructible. When _GLIBCXX_CONCEPT_CHECKS is defined we enforce that
the sequence is default constructible, so the tests for the extension
fail. This disables the relevant parts of the tests.
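The SFINAE technique behind the extension can be sketched with a hypothetical adaptor `simple_stack`, a simplified stand-in rather than the libstdc++ source. The default constructor is a template whose defaulted template argument makes the `enable_if` condition dependent. When the sequence is not default constructible, the constructor simply drops out of overload resolution instead of causing a hard error. `NoDefaultSeq` is likewise hypothetical:

```cpp
#include <cassert>
#include <deque>
#include <type_traits>

// Hypothetical container adaptor (not the libstdc++ implementation).
template<typename Seq = std::deque<int>>
class simple_stack
{
  Seq c;
public:
  // Only viable when Seq is default constructible; otherwise substitution
  // fails and the constructor is silently removed from overload resolution.
  template<typename S = Seq,
           typename = typename std::enable_if<
             std::is_default_constructible<S>::value>::type>
  simple_stack() : c() { }

  explicit simple_stack(const Seq& s) : c(s) { }

  const Seq& container() const { return c; }
};

// Hypothetical sequence with no default constructor, similar in spirit to
// the NonDefaultConstructible type used in the tests.
struct NoDefaultSeq : std::deque<int>
{
  explicit NoDefaultSeq(int) { }
};

static_assert(std::is_default_constructible<simple_stack<>>::value, "");
static_assert(!std::is_default_constructible<simple_stack<NoDefaultSeq>>::value, "");
static_assert(std::is_constructible<simple_stack<NoDefaultSeq>,
                                    const NoDefaultSeq&>::value, "");
```

With concept checks enabled, the sequence is required to be default constructible up front, so an instantiation like `simple_stack<NoDefaultSeq>` would be rejected instead of merely losing its default constructor.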

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1.cc:
Do not check non-default constructible sequences when
_GLIBCXX_CONCEPT_CHECKS is defined.
* testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1_c++98.cc:
Likewise.
* testsuite/23_containers/queue/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/23_containers/queue/requirements/explicit_instantiation/1_c++98.cc:
Likewise.
* testsuite/23_containers/stack/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/23_containers/stack/requirements/explicit_instantiation/1_c++98.cc:
Likewise.

Tested x86_64-linux. Committed to trunk.

commit 07fbdd7bda1166ab2722dbeb4fd3c6b8558b324b
Author: Jonathan Wakely 
Date:   Fri Sep 24 14:32:34 2021

libstdc++: Skip container adaptor tests that fail concept checks

As an extension, our container adaptors SFINAE away the default
constructor if the adapted sequence container is not default
constructible. When _GLIBCXX_CONCEPT_CHECKS is defined we enforce that
the sequence is default constructible, so the tests for the extension
fail. This disables the relevant parts of the tests.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1.cc:
Do not check non-default constructible sequences when
_GLIBCXX_CONCEPT_CHECKS is defined.
* testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1_c++98.cc:
Likewise.
* testsuite/23_containers/queue/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/23_containers/queue/requirements/explicit_instantiation/1_c++98.cc:
Likewise.
* testsuite/23_containers/stack/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/23_containers/stack/requirements/explicit_instantiation/1_c++98.cc:
Likewise.

diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1.cc
index d1e18f879df..a425001612d 100644
--- a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1.cc
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1.cc
@@ -24,12 +24,15 @@
 
 template class std::priority_queue;
 
-struct NonDefaultConstructible : std::vector {
-  NonDefaultConstructible(int) { }
-};
 struct Cmp : std::less {
   Cmp(int) { }
 };
+template class std::priority_queue, Cmp>;
+
+#ifndef _GLIBCXX_CONCEPT_CHECKS
+struct NonDefaultConstructible : std::vector {
+  NonDefaultConstructible(int) { }
+};
 template class std::priority_queue;
 template class std::priority_queue;
-template class std::priority_queue, Cmp>;
+#endif
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1_c++98.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1_c++98.cc
index def9259dc6b..28549f5246e 100644
--- a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1_c++98.cc
+++ b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/explicit_instantiation/1_c++98.cc
@@ -24,12 +24,15 @@
 
 template class std::priority_queue;
 
-struct NonDefaultConstructible : std::vector {
-  NonDefaultConstructible(int) { }
-};
 struct Cmp : std::less {
   Cmp(int) { }
 };
+template class std::priority_queue, Cmp>;
+
+#ifndef _GLIBCXX_CONCEPT_CHECKS
+struct NonDefaultConstructible : std::vector {
+  NonDefaultConstructible(int) { }
+};
 template class std::priority_queue;
 template class std::priority_queue;
-template class std::priority_queue, Cmp>;
+#endif
diff --git a/libstdc++-v3/testsuite/23_containers/queue/requirements/explicit_instantiation/1.cc b/libstdc++-v3/testsuite/23_containers/queue/requirements/explicit_instantiation/1.cc
index b737a15a30b..3b9090cb945 100644
--- a/libstdc++-v3/testsuite/23_containers/queue/requirements/explicit_instantiation/1.cc
+++ b/libstdc++-v3/testsuite/23_containers/queue/requirements/explicit_instantiation/1.cc
@@ -24,7 +24,9 @@
 
 template class std::queue;
 
+#ifndef _GLIBCXX_CONCEPT_CHECKS
 struct NonDefaultConstructible : std::deque {
   NonDefaultConstructible(int) { }
 };
 template 

[pushed] Darwin, D : Add .d suffix to the list for invoking dsymutil.

2021-09-28 Thread Iain Sandoe via Gcc-patches
Hi,

Recognise .d for D source files on the command line.  This will
trigger an invocation of dsymutil when a D source is present.

tested along with D patches on i686, powerpc and x86_64 darwin,
pushed to master, thanks,
Iain

gcc/ChangeLog:

* config/darwin.h (DSYMUTIL_SPEC): Recognize D sources.
---
 gcc/config/darwin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 50524a51511..0fa1c572bc9 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -251,7 +251,7 @@ extern GTY(()) int darwin_ms_struct;
 %{v} \
 %{g*:%{!gctf:%{!gbtf:%{!gstabs*:%{%:debug-level-gt(0): -idsym}\
 %{.c|.cc|.C|.cpp|.cp|.c++|.cxx|.CPP|.m|.mm|.s|.f|.f90|\
-  .f95|.f03|.f77|.for|.F|.F90|.F95|.F03: \
+  .f95|.f03|.f77|.for|.F|.F90|.F95|.F03|.d: \
 %{g*:%{!gctf:%{!gbtf:%{!gstabs*:%{%:debug-level-gt(0): -dsym}"
 
 #define LINK_COMMAND_SPEC LINK_COMMAND_SPEC_A DSYMUTIL_SPEC
-- 
2.24.3 (Apple Git-128)



[committed] libstdc++: Define macro before it is first checked

2021-09-28 Thread Jonathan Wakely via Gcc-patches
On Thu, 2 Sept 2021 at 22:25, Jonathan Wakely wrote:
>
> On Thu, 2 Sept 2021 at 19:00, Jonathan Wakely wrote:
> >
> > * include/bits/atomic_wait.h (_GLIBCXX_HAVE_PLATFORM_WAIT):
> > Define before first attempt to check it.
> >
> > Tested x86_64-linux and powerpc64-linux, not committed yet.
>
> Actually ignore that ... I tested the wrong patch. This one introduces
> a new FAIL, which I have a fix for, but it will have to wait for next
> week.
>
>
> > I think we need this, otherwise __platform_wait_uses_type is false
> > for all T.

This is the fixed patch.

Tested x86_64-linux, pushed to trunk.
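The bug is one of ordering. `#ifdef` is evaluated where it appears, so testing a macro before its `#define` silently takes the `#else` branch. Here is a self-contained sketch with a hypothetical macro name, where `SKETCH_HAVE_FAST_WAIT` stands in for `_GLIBCXX_HAVE_PLATFORM_WAIT`:

```cpp
// Before the fix: the macro is tested first...
#ifdef SKETCH_HAVE_FAST_WAIT
constexpr bool uses_fast_wait_before = true;
#else
constexpr bool uses_fast_wait_before = false;  // this branch is taken
#endif

// ...and only defined further down the header.
#define SKETCH_HAVE_FAST_WAIT 1

// After the fix, the definition precedes every test of the macro.
#ifdef SKETCH_HAVE_FAST_WAIT
constexpr bool uses_fast_wait_after = true;    // this branch is taken
#else
constexpr bool uses_fast_wait_after = false;
#endif
```

This is why `__platform_wait_uses_type` evaluated to false for every type before the macro definition was moved up.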
commit aeaea265cea3a2b2e772af7825351a4ceef29aac
Author: Jonathan Wakely 
Date:   Tue Aug 31 15:51:09 2021

libstdc++: Define macro before it is first checked

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (_GLIBCXX_HAVE_PLATFORM_WAIT):
Define before first attempt to check it.

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 07bb744d822..35c92644146 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -56,9 +56,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
+#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
 using __platform_wait_t = int;
 static constexpr size_t __platform_wait_alignment = 4;
 #else
+// define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
+// and __platform_notify() if there is a more efficient primitive supported
+// by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
+// a mutex/condvar based wait.
 using __platform_wait_t = uint64_t;
 static constexpr size_t __platform_wait_alignment
   = __alignof__(__platform_wait_t);
@@ -70,7 +75,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
   = is_scalar_v<_Tp>
&& ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
-   && (alignof(_Tp*) >= __platform_wait_alignment));
+   && (alignof(_Tp*) >= __detail::__platform_wait_alignment));
 #else
   = false;
 #endif
@@ -78,7 +83,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
 enum class __futex_wait_flags : int
 {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX_PRIVATE
@@ -118,11 +122,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 static_cast(__futex_wait_flags::__wake_private),
 __all ? INT_MAX : 1);
   }
-#else
-// define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
-// and __platform_notify() if there is a more efficient primitive supported
-// by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
-// a mutex/condvar based wait
 #endif
 
 inline void
@@ -331,7 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
if constexpr (__platform_wait_uses_type<_Up>)
  {
-   __val == __old;
+   __builtin_memcpy(&__val, &__old, sizeof(__val));
  }
else
  {


Re: [PATCH] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-09-28 Thread Jason Merrill via Gcc-patches

On 9/28/21 09:53, Patrick Palka wrote:

On Tue, 28 Sep 2021, Patrick Palka wrote:


On Tue, 28 Sep 2021, Jakub Jelinek via Gcc-patches wrote:


Hi!

The testcases in the patch are either miscompiled or ICE with checking,
because the defaulted operator== is synthesized too early (but only if
constexpr), while the corresponding class type is still incomplete.
The problem is that at that point the bitfield FIELD_DECLs still have
their underlying type as TREE_TYPE rather than an integral type with their
precision. When layout_class_type is called for the class soon after
that, it changes those types, but the types of the COMPONENT_REFs created
during synthesize_method of operator== stay as they were, and the
middle-end is then upset by the type mismatch.
Since computing the exact type isn't a one-liner but fairly long code,
especially for oversized bitfields, I think it is best to just not
synthesize the comparison operators so early (the defaulted_late_check
change) and to call defaulted_late_check for them once again as soon as
the class is complete.


Nice, this might also fix PR98712.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-28  Jakub Jelinek  

PR c++/102490
* method.c (defaulted_late_check): Don't synthetize constexpr
defaulted comparisons if context is still incomplete type.
(finish_struct_1): Call defaulted_late_check again for defaulted
comparisons.

* g++.dg/cpp2a/spaceship-eq11.C: New test.
* g++.dg/cpp2a/spaceship-eq12.C: New test.

--- gcc/cp/method.c.jj  2021-09-15 08:55:37.563497558 +0200
+++ gcc/cp/method.c 2021-09-27 13:48:12.139271830 +0200
@@ -3160,8 +3160,11 @@ defaulted_late_check (tree fn)
if (kind == sfk_comparison)
  {
/* If the function was declared constexpr, check that the definition
-qualifies.  Otherwise we can define the function lazily.  */
-  if (DECL_DECLARED_CONSTEXPR_P (fn) && !DECL_INITIAL (fn))
+qualifies.  Otherwise we can define the function lazily.
+Don't do this if the class type is still incomplete.  */
+  if (DECL_DECLARED_CONSTEXPR_P (fn)
+ && !DECL_INITIAL (fn)
+ && COMPLETE_TYPE_P (ctx))
{


According to the function comment for defaulted_late_check, won't
COMPLETE_TYPE_P (ctx) always be false here?


Not for a function defaulted outside the class.


If so, I wonder if we could get away with moving this entire fragment
from defaulted_late_check to finish_struct_1 instead of calling
defaulted_late_check from finish_struct_1.


The comment in check_bases_and_members says that we call it there so 
that it's before we clone [cd]tors.  Probably better to leave the call 
there for other functions, just skip it for comparisons.



  /* Prevent GC.  */
  function_depth++;
--- gcc/cp/class.c.jj   2021-09-03 09:46:28.801428380 +0200
+++ gcc/cp/class.c  2021-09-27 14:07:03.465562255 +0200
@@ -7467,7 +7467,14 @@ finish_struct_1 (tree t)
   for any static member objects of the type we're working on.  */
for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
  if (DECL_DECLARES_FUNCTION_P (x))
-  DECL_IN_AGGR_P (x) = false;
+  {
+   /* Synthetize constexpr defaulted comparisons.  */
+   if (!DECL_ARTIFICIAL (x)
+   && DECL_DEFAULTED_IN_CLASS_P (x)
+   && special_function_p (x) == sfk_comparison)
+ defaulted_late_check (x);
+   DECL_IN_AGGR_P (x) = false;
+  }
  else if (VAR_P (x) && TREE_STATIC (x)
 && TREE_TYPE (x) != error_mark_node
 && same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
--- gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C.jj  2021-09-27 14:20:04.723713371 +0200
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq11.C 2021-09-27 14:20:20.387495858 +0200
@@ -0,0 +1,43 @@
+// PR c++/102490
+// { dg-do run { target c++20 } }
+
+struct A
+{
+  unsigned char a : 1;
+  unsigned char b : 1;
+  constexpr bool operator== (const A &) const = default;
+};
+
+struct B
+{
+  unsigned char a : 8;
+  int : 0;
+  unsigned char b : 7;
+  constexpr bool operator== (const B &) const = default;
+};
+
+struct C
+{
+  unsigned char a : 3;
+  unsigned char b : 1;
+  constexpr bool operator== (const C &) const = default;
+};
+
+void
+foo (C &x, int y)
+{
+  x.b = y;
+}
+
+int
+main ()
+{
+  A a{}, b{};
+  B c{}, d{};
+  C e{}, f{};
+  a.b = 1;
+  d.b = 1;
+  foo (e, 0);
+  foo (f, 1);
+  return a == b || c == d || e == f;
+}
--- gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C.jj  2021-09-27 14:20:12.050611625 +0200
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq12.C 2021-09-27 14:20:39.633228602 +0200
@@ -0,0 +1,5 @@
+// PR c++/102490
+// { dg-do run { target c++20 } }
+// { dg-options "-O2" }
+
+#include "spaceship-eq11.C"

Jakub


[PATCH] libstdc++: Fix return values for atomic wait on futex

2021-09-28 Thread Jonathan Wakely via Gcc-patches
This fixes a logic error in the futex-based timed wait.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__platform_wait_until_impl):
Return false for ETIMEDOUT and true otherwise.

Tested x86_64-linux.

I'm not seeing any tests fail as a result of this, but it does seem to
be incorrect. Please check my working.
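The corrected mapping can be restated as a small standalone function. `wait_ended_before_timeout` is a hypothetical helper, not the libstdc++ code. It assumes `rc` is the (zero or nonzero) result of the futex syscall and `err` the `errno` value read afterwards, and it follows the patched logic:

- `ETIMEDOUT` means the wait timed out.
- Success, `EINTR` and `EAGAIN` count as the wait ending before the timeout.
- Any other error is thrown.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

// Hypothetical restatement of the fixed logic in __platform_wait_until_impl.
// Returns true if the wait ended before the timeout, false if it timed out.
inline bool wait_ended_before_timeout(long rc, int err)
{
  if (rc != 0)
    {
      if (err == ETIMEDOUT)
        return false;                 // the deadline passed
      if (err != EINTR && err != EAGAIN)
        throw std::system_error(err, std::generic_category());
    }
  return true;                        // woken, interrupted, or value changed
}
```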


commit 94dc544bbf42e95a363b916ed0d665afcf88
Author: Jonathan Wakely 
Date:   Tue Aug 31 10:20:41 2021

libstdc++: Fix return values for atomic wait on futex

This fixes a logic error in the futex-based timed wait.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__platform_wait_until_impl):
Return false for ETIMEDOUT and true otherwise.

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 3db08f82707..d423a7af7c3 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -101,12 +101,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
if (__e)
  {
-   if ((errno != ETIMEDOUT) && (errno != EINTR)
-   && (errno != EAGAIN))
+   if (errno == ETIMEDOUT)
+ return false;
+   if (errno != EINTR && errno != EAGAIN)
  __throw_system_error(errno);
-   return true;
  }
-   return false;
+   return true;
   }
 
 // returns true if wait ended before timeout


[committed] libstdc++: Add noexcept to functions in

2021-09-28 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/regex.h (basic_regex, swap): Add noexcept to
non-throwing functions.
* include/bits/regex_automaton.h (_State_base, _State)
(_NFA_base): Likewise.
* include/bits/regex_compiler.h (_Compiler): Likewise.
* include/bits/regex_error.h (regex_error::code()): Likewise.
* include/bits/regex_scanner.h (_Scanner): Likewise.

Tested x86_64-linux. Committed to trunk.
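Marking a function `noexcept` is appropriate when it cannot throw, for example an accessor returning a stored value or a `swap` that exchanges non-throwing members. The sketch below uses a hypothetical `flags_holder` class, not `basic_regex`, and shows that callers can check the specifier at compile time with the `noexcept` operator:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical class following the same pattern as the patch.
class flags_holder
{
  unsigned flags_;
public:
  flags_holder() noexcept : flags_(0) { }
  explicit flags_holder(unsigned f) noexcept : flags_(f) { }

  // Returning a stored value cannot throw.
  unsigned flags() const noexcept { return flags_; }

  // Swapping an unsigned cannot throw either.
  void swap(flags_holder& rhs) noexcept { std::swap(flags_, rhs.flags_); }
};

// Non-member swap forwarding to the member, also noexcept.
inline void swap(flags_holder& a, flags_holder& b) noexcept { a.swap(b); }

static_assert(noexcept(std::declval<flags_holder&>().swap(
                std::declval<flags_holder&>())), "member swap is noexcept");
static_assert(noexcept(std::declval<const flags_holder&>().flags()),
              "flags() is noexcept");
static_assert(std::is_nothrow_default_constructible<flags_holder>::value, "");
```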

commit df0dd04b78cfc0f723387b703978600caac93cbb
Author: Jonathan Wakely 
Date:   Mon Sep 27 20:42:17 2021

libstdc++: Add noexcept to functions in 

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/regex.h (basic_regex, swap): Add noexcept to
non-throwing functions.
* include/bits/regex_automaton.h (_State_base, _State)
(_NFA_base): Likewise.
* include/bits/regex_compiler.h (_Compiler): Likewise.
* include/bits/regex_error.h (regex_error::code()): Likewise.
* include/bits/regex_scanner.h (_Scanner): Likewise.

diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index b8a0ad251d8..d4a7729de2c 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -421,7 +421,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* Constructs a basic regular expression that does not match any
* character sequence.
*/
-  basic_regex()
+  basic_regex() noexcept
   : _M_flags(ECMAScript), _M_loc(), _M_automaton(nullptr)
   { }
 
@@ -697,7 +697,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* expression.
*/
   unsigned int
-  mark_count() const
+  mark_count() const noexcept
   {
if (_M_automaton)
  return _M_automaton->_M_sub_count() - 1;
@@ -709,7 +709,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* or in the last call to assign().
*/
   flag_type
-  flags() const
+  flags() const noexcept
   { return _M_flags; }
 
   // [7.8.5] locale
@@ -731,7 +731,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*object.
*/
   locale_type
-  getloc() const
+  getloc() const noexcept
   { return _M_loc; }
 
   // [7.8.6] swap
@@ -741,7 +741,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* @param __rhs Another regular expression object.
*/
   void
-  swap(basic_regex& __rhs)
+  swap(basic_regex& __rhs) noexcept
   {
std::swap(_M_flags, __rhs._M_flags);
std::swap(_M_loc, __rhs._M_loc);
@@ -848,7 +848,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   template
 inline void
 swap(basic_regex<_Ch_type, _Rx_traits>& __lhs,
-basic_regex<_Ch_type, _Rx_traits>& __rhs)
+basic_regex<_Ch_type, _Rx_traits>& __rhs) noexcept
 { __lhs.swap(__rhs); }
 
 
diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index 872a17fe8cb..02d81f3e417 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -95,13 +95,13 @@ namespace __detail
 };
 
   protected:
-explicit _State_base(_Opcode __opcode)
+explicit _State_base(_Opcode __opcode) noexcept
 : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
 { }
 
   public:
 bool
-_M_has_alt()
+_M_has_alt() const noexcept
 {
   return _M_opcode == _S_opcode_alternative
|| _M_opcode == _S_opcode_repeat
@@ -130,7 +130,7 @@ namespace __detail
"std::function");
 
   explicit
-  _State(_Opcode __opcode) : _State_base(__opcode)
+  _State(_Opcode __opcode) noexcept : _State_base(__opcode)
   {
if (_M_opcode() == _S_opcode_match)
  new (this->_M_matcher_storage._M_addr()) _MatcherT();
@@ -143,7 +143,7 @@ namespace __detail
_MatcherT(__rhs._M_get_matcher());
   }
 
-  _State(_State&& __rhs) : _State_base(__rhs)
+  _State(_State&& __rhs) noexcept : _State_base(__rhs)
   {
if (__rhs._M_opcode() == _S_opcode_match)
  new (this->_M_matcher_storage._M_addr())
@@ -162,7 +162,7 @@ namespace __detail
   // Since correct ctor and dtor rely on _M_opcode, it's better not to
   // change it over time.
   _Opcode
-  _M_opcode() const
+  _M_opcode() const noexcept
   { return _State_base::_M_opcode; }
 
   bool
@@ -170,11 +170,11 @@ namespace __detail
   { return _M_get_matcher()(__char); }
 
   _MatcherT&
-  _M_get_matcher()
+  _M_get_matcher() noexcept
   { return *static_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
   const _MatcherT&
-  _M_get_matcher() const
+  _M_get_matcher() const noexcept
   {
return *static_cast(
this->_M_matcher_storage._M_addr());
@@ -187,7 +187,7 @@ namespace __detail
 typedef regex_constants::syntax_option_type _FlagT;
 
 explicit
-  

[committed] libstdc++: Tweaks to to avoid warnings

2021-09-28 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/regex_compiler.tcc: Add line break in empty while
statement.
* include/bits/regex_executor.tcc: Avoid unused parameter
warning.

Tested x86_64-linux. Committed to trunk.
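Both tweaks are common idioms for quieting compiler warnings without changing behavior. The first places an intentionally empty loop body on its own line as a lone `;`. The second leaves an unused parameter unnamed, so GCC's `-Wunused-parameter` has nothing to report. A minimal sketch follows. The functions `eat_star`, `skip_stars`, and `handle_accept` are hypothetical illustrations, not libstdc++ code.

```cpp
#include <cassert>

// Hypothetical consumer: eats one '*' and reports whether it did.
bool eat_star(const char*& p) {
  if (*p != '*')
    return false;
  ++p;
  return true;
}

// Empty loop body written as a null statement on its own line,
// the layout the commit switches to.
const char* skip_stars(const char* p) {
  while (eat_star(p))
    ;
  return p;
}

// An unnamed parameter cannot be reported as unused, so dropping the
// name (as done for __i in _M_handle_accept) silences the warning.
int handle_accept(int match_mode, int /* state id, unused here */) {
  return match_mode;
}
```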

commit b5f276b8c76d892f7fed229153cfbadc13f4696e
Author: Jonathan Wakely 
Date:   Mon Sep 27 20:44:24 2021

diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc
index 440669debe0..9f04c1be686 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -140,7 +140,8 @@ namespace __detail
return true;
   if (this->_M_atom())
{
- while (this->_M_quantifier());
+ while (this->_M_quantifier())
+   ;
  return true;
}
   return false;
@@ -440,7 +441,8 @@ namespace __detail
  __last_char.second = '-';
}
}
-  while (_M_expression_term(__last_char, __matcher));
+  while (_M_expression_term(__last_char, __matcher))
+   ;
   if (__last_char.first)
__matcher._M_add_char(__last_char.second);
   __matcher._M_ready();
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 3cefeda48a3..2577265c33a 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -423,7 +423,7 @@ namespace __detail
   template
 void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-_M_handle_accept(_Match_mode __match_mode, _StateIdT __i)
+_M_handle_accept(_Match_mode __match_mode, _StateIdT)
 {
   if (__dfs_mode)
{


[committed] libstdc++: Remove obfuscating typedefs in

2021-09-28 Thread Jonathan Wakely via Gcc-patches
There is no benefit to using _SizeT instead of size_t, and _IterT tells
you less about the type than const _CharT*. This removes some unhelpful
typedefs.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/regex_automaton.h (_NFA_base::_SizeT): Remove.
* include/bits/regex_compiler.h (_Compiler::_IterT): Remove.
* include/bits/regex_compiler.tcc: Likewise.
* include/bits/regex_scanner.h (_Scanner::_IterT): Remove.
* include/bits/regex_scanner.tcc: Likewise.

Tested x86_64-linux. Committed to trunk.

commit c44c5f3d9f46705a262911c2098c1568d7e8ac2d
Author: Jonathan Wakely 
Date:   Tue Sep 28 13:39:36 2021

diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index 02d81f3e417..f108675f35e 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -183,7 +183,6 @@ namespace __detail
 
   struct _NFA_base
   {
-typedef size_t  _SizeT;
 typedef regex_constants::syntax_option_type _FlagT;
 
 explicit
@@ -206,14 +205,14 @@ namespace __detail
 _M_start() const noexcept
 { return _M_start_state; }
 
-_SizeT
+size_t
 _M_sub_count() const noexcept
 { return _M_subexpr_count; }
 
 _GLIBCXX_STD_C::vector _M_paren_stack;
_FlagT _M_flags;
_StateIdT _M_start_state;
-_SizeT _M_subexpr_count;
+size_t _M_subexpr_count;
 bool  _M_has_backref;
   };
 
diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h
index 423ab823194..646766ebdf9 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -58,11 +58,10 @@ namespace __detail
 {
 public:
  typedef typename _TraitsT::char_type _CharT;
-  typedef const _CharT*   _IterT;
   typedef _NFA<_TraitsT> _RegexT;
   typedef regex_constants::syntax_option_type _FlagT;
 
-  _Compiler(_IterT __b, _IterT __e,
+  _Compiler(const _CharT* __b, const _CharT* __e,
const typename _TraitsT::locale_type& __traits, _FlagT __flags);
 
   shared_ptr
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc
index 9f04c1be686..1bd30972cbb 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -63,7 +63,7 @@ namespace __detail
 {
   template
 _Compiler<_TraitsT>::
-_Compiler(_IterT __b, _IterT __e,
+_Compiler(const _CharT* __b, const _CharT* __e,
  const typename _TraitsT::locale_type& __loc, _FlagT __flags)
 : _M_flags((__flags
& (regex_constants::ECMAScript
diff --git a/libstdc++-v3/include/bits/regex_scanner.h b/libstdc++-v3/include/bits/regex_scanner.h
index 05d8172a0ad..4e7d5efb34b 100644
--- a/libstdc++-v3/include/bits/regex_scanner.h
+++ b/libstdc++-v3/include/bits/regex_scanner.h
@@ -211,12 +211,11 @@ namespace __detail
 : public _ScannerBase
 {
 public:
-  typedef const _CharT*   _IterT;
   typedef std::basic_string<_CharT>   _StringT;
   typedef regex_constants::syntax_option_type _FlagT;
  typedef const std::ctype<_CharT> _CtypeT;
 
-  _Scanner(_IterT __begin, _IterT __end,
+  _Scanner(const _CharT* __begin, const _CharT* __end,
   _FlagT __flags, std::locale __loc);
 
   void
@@ -257,8 +256,8 @@ namespace __detail
   void
   _M_eat_class(char);
 
-  _IterT _M_current;
-  _IterT _M_end;
+  const _CharT* _M_current;
+  const _CharT* _M_end;
   _CtypeT&  _M_ctype;
   _StringT  _M_value;
   void (_Scanner::* _M_eat_escape)();
diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc b/libstdc++-v3/include/bits/regex_scanner.tcc
index a9d6a613648..b2b709ce3cb 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -54,8 +54,7 @@ namespace __det

Re: [PATCH] c++: ttp matching with constrained auto parm [PR99909]

2021-09-28 Thread Jason Merrill via Gcc-patches

On 9/28/21 15:15, Patrick Palka wrote:

Here, when unifying TT with S, processing_template_decl is unset, and
this foils the dependence checks in do_auto_deduction for avoiding
checking constraints on an auto when the initializer is dependent.

This patch fixes this issue by making sure processing_template_decl is
set during the call to unify from coerce_template_template_parms; this
seems sensible because we're unifying one set of template parameters
with another, so we're dealing with templated trees throughout.
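The diff replaces a manual `++processing_template_decl` / `--processing_template_decl` pair that bracketed only `coerce_template_parms` with a sentinel object. The counter therefore stays raised through the later `unify` call and is restored on scope exit. Below is a minimal sketch of that RAII pattern. The names `processing_depth`, `depth_sentinel`, and `coerce_then_unify` are hypothetical stand-ins for the front-end code. The `reset` flag mirrors the `/*reset*/false` argument seen in the diff, but the real GCC class may differ in its details.

```cpp
#include <cassert>

// Hypothetical global depth counter standing in for processing_template_decl.
int processing_depth = 0;

// RAII guard: remembers the counter on entry and restores it on every exit
// path. With reset == true it also zeroes the counter for the guarded scope.
struct depth_sentinel {
  int saved;
  explicit depth_sentinel(bool reset = true) : saved(processing_depth) {
    if (reset)
      processing_depth = 0;
  }
  ~depth_sentinel() { processing_depth = saved; }
  depth_sentinel(const depth_sentinel&) = delete;
  depth_sentinel& operator=(const depth_sentinel&) = delete;
};

// Depth seen by the hypothetical "unify" step, for inspection.
int observed_in_unify = -1;

bool coerce_then_unify(bool coerce_ok) {
  depth_sentinel guard(/*reset=*/false);
  ++processing_depth;          // raised for coercion *and* unification
  if (!coerce_ok)
    return false;              // guard restores the counter here too
  observed_in_unify = processing_depth;
  return true;
}
```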



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/99909

gcc/cp/ChangeLog:

* pt.c (coerce_template_template_parms): Keep
processing_template_decl set during the call to unify as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp3.C: New test.
---
  gcc/cp/pt.c|  4 ++--
  gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C | 11 +++
  2 files changed, 13 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 41fa7ed5e43..1dcdffe322a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7994,12 +7994,12 @@ coerce_template_template_parms (tree parm_parms,
/* So coerce P's args to apply to A's parms, and then deduce between A's
 args and the converted args.  If that succeeds, A is at least as
 specialized as P, so they match.*/
+  processing_template_decl_sentinel ptds (/*reset*/false);
+  ++processing_template_decl;
tree pargs = template_parms_level_to_args (parm_parms);
pargs = add_outermost_template_args (outer_args, pargs);
-  ++processing_template_decl;
pargs = coerce_template_parms (arg_parms, pargs, NULL_TREE, tf_none,
 /*require_all*/true, /*use_default*/true);
-  --processing_template_decl;
if (pargs != error_mark_node)
{
  tree targs = make_tree_vec (nargs);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C
new file mode 100644
index 000..898524e0dfa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp3.C
@@ -0,0 +1,11 @@
+// PR c++/99909
+// { dg-do compile { target c++20 } }
+
+template constexpr bool always_true = true;
+template concept C = always_true;
+
+template struct S { };
+
+template class TT> void f() { }
+
+template void f();





Re: [PATCH v2] libgcc: Add a backchain fallback to _Unwind_Backtrace() on PowerPC

2021-09-28 Thread Segher Boessenkool
Hi!

On Thu, Aug 26, 2021 at 11:53:24AM -0300, Raphael Moreira Zinsly wrote:
> Without dwarf2 unwind tables available, _Unwind_Backtrace() is not
> able to return the full backtrace.
> This patch adds a fallback function on powerpc to get the backtrace
> by walking the backchain; this code originally comes from glibc.

Okay, the backchain as fallback if other (better!) methods cannot work.
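On PowerPC, each stack frame begins with a back-chain pointer to the caller's frame. A backtrace can therefore be recovered by following those links and reading each frame's saved return address. The sketch below simulates that walk over a hand-built chain instead of the real stack. The `Frame` layout and the `walk_backchain` name are hypothetical simplifications. The patch's ChangeLog adds `struct layout` and `struct trace_arg` for this purpose, but their exact contents are not shown here, and the real ABIs store the saved LR at ABI-specific offsets. The ChangeLog also suggests that the real code uses `struct rt_sigframe` to handle signal frames; this sketch omits that.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified frame record: a backchain link to the caller's
// frame followed by the saved return address.
struct Frame {
  const Frame* next;     // caller's frame, or nullptr at the outermost frame
  const void* ret_addr;  // saved return address for this frame
};

// Follows the chain from `top`, storing up to `max` return addresses into
// `out`, and returns how many were stored. Stops at a null link.
std::size_t walk_backchain(const Frame* top, const void** out, std::size_t max) {
  std::size_t n = 0;
  for (const Frame* f = top; f != nullptr && n < max; f = f->next)
    out[n++] = f->ret_addr;
  return n;
}
```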

>   * config/rs6000/linux-unwind.h (struct rt_sigframe): Move it to
>   outside of get_regs() in order to use it in another function,
>   this is done twice: for __powerpc64__ and for !__powerpc64__.
>   (struct trace_arg): New struct.
>   (struct layout): New struct.
>   (ppc_backchain_fallback): New function.
>   * unwind.inc (_Unwind_Backtrace): Look for _URC_NORMAL_STOP
>   code state and call MD_BACKCHAIN_FALLBACK.

ChangeLog lines wrap at 80 columns, not 70 or so.  The emails from commits
(to Bugzilla) are a bit malformed (it seems to miscount the columns taken
by leading tabs), but the actual commits are just fine.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unwind-backchain.c
> @@ -0,0 +1,22 @@
> +/* { dg-do run { target { powerpc*-*-linux* } } } */

Please don't specify such targets in gcc.target/powerpc/ tests.  Everything
in gcc.target/powerpc is for powerpc*-*-* already, so if you really want to
limit to powerpc*-*-linux* just write *-*-linux*.  But there are better
ways to get what you want, like testing for the actual feature you want
(which is whether backtrace() works?).  But such an improvement can be done
later (and needs more testing etc).

But please write some simple comment saying why you need -linux* in the
test.

> +void
> +test_backtrace()
> +{
> +  int addresses;
> +  void *buffer[10];
> +
> +  addresses = backtrace(buffer, 10);
> +  if(addresses != 4)
> +__builtin_abort();
> +}

Does that work?!  Has this been tested on all powerpc*-linux configs?
Importantly also BE and 32-bit.

Okay for trunk with the testcase fix, if all testing works out.  Thanks!


Segher

