Re: [PATCH] loop: Fix profile updates after unrolling [PR102385]

2021-10-08 Thread Richard Biener via Gcc-patches
On Tue, Oct 5, 2021 at 3:39 PM Richard Sandiford via Gcc-patches
 wrote:
>
> In g:62acc72a957b5614 I'd stopped the unroller from using
> an epilogue loop in cases where the iteration count was
> known to be a multiple of the unroll factor.  The epilogue
> and non-epilogue cases still shared this (preexisting) code
> to update the edge frequencies:
>
>   basic_block exit_bb = single_pred (loop->latch);
>   new_exit = find_edge (exit_bb, rest);
>   new_exit->probability = profile_probability::always ()
>                              .apply_scale (1, new_est_niter + 1);
>   [etc]
>
> But of course (in hindsight) that only makes sense for the
> epilogue case, where we've already moved the main loop's exit edge
> to be a sibling of the latch edge.  For the non-epilogue case,
> the exit edge stays (and needs to stay) in its original position.
>
> I don't really understand what the code is trying to do for
> the epilogue case.  It has:
>
>   /* Ensure that the frequencies in the loop match the new estimated
>      number of iterations, and change the probability of the new
>      exit edge.  */
>
>   profile_count freq_h = loop->header->count;
>   profile_count freq_e = (loop_preheader_edge (loop))->count ();
>   if (freq_h.nonzero_p ())
>     {
>       ...
>       scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>     }
>
> Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
> header block, this has the effect of:
>
>   new header count = freq_h * (freq_e / freq_h)
>
> i.e. we say that the header executes exactly as often as the
> preheader edge, which would only make sense if the loop never
> iterates.  Also, after setting the probability of the nonexit edge
> (correctly) to new_est_niter / (new_est_niter + 1), the code does:
>
> scale_bbs_frequencies (&loop->latch, 1, prob);
>
> for this new probability.  I think that only makes sense if the
> nonexit edge was previously unconditional (100%).  But the code
> carefully preserved the probability of the original exit edge
> when creating the new one.
>
> All I'm trying to do here though is fix the mess I created
> and get the probabilities right for the non-epilogue case.
> Things are simpler there since we don't have to worry about
> loop versioning.  Hopefully the comments explain the approach.
>
> The function's current interface implies that it can cope with
> multiple exit edges and that the function only needs the iteration
> count relative to one of those edges in order to work correctly.
> In practice that's not the case: it assumes there is exactly one
> exit edge and all current callers also ensure that the exit test
> dominates the latch.  I think the function is easier to follow
> if we remove the implied generality.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> PR tree-optimization/102385
> * predict.h (change_edge_frequency): Declare.
> * predict.c (change_edge_frequency): New function.
> * tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
> edge argument.
> (tree_unroll_loop): Likewise.
> * gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
> * tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
> Likewise.
> * tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
> (tree_transform_and_unroll_loop): Likewise.  Use single_dom_exit
> to retrieve the exit edges.  Make all the old profile update code
> conditional on !single_loop_p -- the case it was written for --
> and use a different approach for the single-loop case.
>
> gcc/testsuite/
> * gcc.dg/pr102385.c: New test.
> ---
>  gcc/gimple-loop-jam.c   |   3 +-
>  gcc/predict.c   |  37 +++
>  gcc/predict.h   |   1 +
>  gcc/testsuite/gcc.dg/pr102385.c |  14 
>  gcc/tree-predcom.c  |   3 +-
>  gcc/tree-ssa-loop-manip.c   | 111 
>  gcc/tree-ssa-loop-manip.h   |   5 +-
>  gcc/tree-ssa-loop-prefetch.c|   3 +-
>  8 files changed, 140 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102385.c
>
> diff --git a/gcc/predict.h b/gcc/predict.h
> index 8860cafa31c..4df51bd615c 100644
> --- a/gcc/predict.h
> +++ b/gcc/predict.h
> @@ -100,6 +100,7 @@ extern void rebuild_frequencies (void);
>  extern void report_predictor_hitrates (void);
>  extern void force_edge_cold (edge, bool);
>  extern void propagate_unlikely_bbs_forward (void);
> +extern void change_edge_frequency (edge, profile_probability);
>
>  extern void add_reg_br_prob_note (rtx_insn *, profile_probability);
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index d9c7249831e..68b11135680 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -4481,6 +4481,43 @@ force_edge_cold (edge e, bool impossible)
>  }
> 
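The header-count argument in the patch description can be checked with plain arithmetic. For a single-exit loop entered freq_e times whose exit edge is taken with probability 1/(n+1) on each header visit, the header runs freq_e * (n+1) times; multiplying every count by freq_e / freq_h then leaves the header with exactly freq_e executions, i.e. no latch executions at all. The sketch below is a model only: it uses `double` instead of GCC's `profile_count`/`profile_probability` classes, assumes the exit test dominates the latch, and all function names are hypothetical.

```c
/* Average header executions when a loop is entered ENTRY times and its
   exit edge is taken with probability 1 / (NITER + 1) on each visit to
   the header (exit test assumed to dominate the latch).  */
static double
header_count_from_niter (double entry, double niter)
{
  return entry * (niter + 1.0);
}

/* The same count derived from the probability P of staying in the loop:
   entry / (1 - p).  With p = niter / (niter + 1) this matches the above.  */
static double
header_count_from_continue_prob (double entry, double p)
{
  return entry / (1.0 - p);
}

/* Effect on the header of scaling every block by
   freq_e.probability_in (freq_h), i.e. by freq_e / freq_h.  */
static double
scaled_header_count (double freq_h, double freq_e)
{
  return freq_h * (freq_e / freq_h);
}

/* Latch executions implied by a header count: every header visit that
   does not come from the preheader edge comes from the latch.  */
static double
implied_latch_count (double header, double entry)
{
  return header - entry;
}
```

With freq_e = 100 and n = 3, the correct header count is 400 (300 latch executions), while the scaled count is 100, which implies zero latch executions.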

Re: [PATCH] Improve integer bit test on atomic builtin return

2021-10-08 Thread Richard Biener via Gcc-patches
On Tue, 5 Oct 2021, H.J. Lu wrote:

> On Tue, Oct 5, 2021 at 3:07 AM Richard Biener  wrote:
> >
> > On Mon, 4 Oct 2021, H.J. Lu wrote:
> >
> > > commit adedd5c173388ae505470df152b9cb3947339566
> > > Author: Jakub Jelinek 
> > > Date:   Tue May 3 13:37:25 2016 +0200
> > >
> > > re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')
> > >
> > > optimized bit test on atomic builtin return with lock bts/btr/btc.  But
> > > it works only for unsigned integers since atomic builtins operate on the
> > > 'uintptr_t' type.  It fails on bool:
> > >
> > >   _1 = atomic builtin;
> > >   _4 = (_Bool) _1;
> > >
> > > and signed integers:
> > >
> > >   _1 = atomic builtin;
> > >   _2 = (int) _1;
> > >   _5 = _2 & (1 << N);
> > >
> > > Improve bit test on atomic builtin return by converting:
> > >
> > >   _1 = atomic builtin;
> > >   _4 = (_Bool) _1;
> > >
> > > to
> > >
> > >   _1 = atomic builtin;
> > >   _5 = _1 & (1 << 0);
> > >   _4 = (_Bool) _5;
> > >
> > > and converting:
> > >
> > >   _1 = atomic builtin;
> > >   _2 = (int) _1;
> > >   _5 = _2 & (1 << N);
> > >
> > > to
> > >   _1 = atomic builtin;
> > >   _6 = _1 & (1 << N);
> > >   _5 = (int) _6;
> >
> > Why not do this last bit with match.pd patterns (and independent on
> > whether _1 is defined by an atomic builtin)?  For the first suggested
> 
> The full picture is
> 
>   _1 = __atomic_fetch_or_* (ptr_6, mask, _3);
>   _2 = (int) _1;
>   _5 = _2 & mask;
> 
> to
> 
>   _1 = __atomic_fetch_or_* (ptr_6, mask, _3);
>   _6 = _1 & mask;
>   _5 = (int) _6;
> 
> It is useful only if the 2 masks are the same.
> 
> > transform that's likely going to be undone by folding, no?
> >
> 
> The bool case is
> 
>   _1 = __atomic_fetch_or_* (ptr_6, 1, _3);
>   _4 = (_Bool) _1;
> 
> to
> 
>   _1 = __atomic_fetch_or_* (ptr_6, 1, _3);
>   _5 = _1 & 1;
>   _4 = (_Bool) _5;
> 
> Without __atomic_fetch_or_*, the conversion isn't needed.
> After the conversion, optimize_atomic_bit_test_and will
> immediately optimize the code sequence to
> 
>   _6 = .ATOMIC_BIT_TEST_AND_SET (&v, 0, 0, 0);
>   _4 = (_Bool) _6;
> 
> and there is nothing to fold after it.

Hmm, I see - so how about instead teaching the code that
produces the .ATOMIC_BIT_TEST_AND_SET the alternate forms instead
of doing the intermediate step separately?

Sorry for the delay btw, I've been busy all week ...

Thanks,
Richard.
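For reference, here is a C sketch of source that plausibly reaches the GIMPLE shapes discussed in this thread, using GCC's `__atomic_fetch_or` builtin on an `int` object. Per the thread, the underlying builtin returns an unsigned value that is converted to `int` before the mask is applied. The function names are hypothetical, and the exact GIMPLE (in particular whether the bit-0 case folds to `(_Bool) _1`) depends on the compiler version and on folding. Whether `lock bts` is emitted is what the patch is about and is not checked here.

```c
#include <stdbool.h>

/* Atomically set bit N of *V and report whether it was already set.
   Masking the int result corresponds to the
   "_2 = (int) _1; _5 = _2 & mask" shape quoted above.
   N must be in [0, 30] so that 1 << N is well defined for int.  */
static bool
atomic_test_and_set_bit (int *v, int n)
{
  int mask = 1 << n;
  return (__atomic_fetch_or (v, mask, __ATOMIC_RELAXED) & mask) != 0;
}

/* Same operation for bit 0, with the masked result converted straight
   to bool.  After folding, this kind of source may appear as the
   "_4 = (_Bool) _1" shape (an assumption; it depends on folding).  */
static bool
atomic_test_and_set_bit0 (int *v)
{
  return __atomic_fetch_or (v, 1, __ATOMIC_RELAXED) & 1;
}
```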


[PATCH] [GCC-12] Mention O2 vectorization enabling.

2021-10-08 Thread liuhongt via Gcc-patches
---
 htdocs/gcc-12/changes.html | 9 +
 1 file changed, 9 insertions(+)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 22839f2d..6e898db7 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -68,6 +68,15 @@ a work-in-progress.
 
 General Improvements
 
+
+  Vectorization is now enabled at -O2: -O2 is now
+  equivalent to the previous -O2 -ftree-vectorize
+  -fvect-cost-model=very-cheap.  Note that the default vectorizer cost
+  model has changed, which may have subtle effects, for example with
+  -O2 -fopenmp and #pragma omp parallel for simd.
+  
+
+
 
 New Languages and Language specific improvements
 
-- 
2.18.1
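As an illustration, GCC's documentation describes the very-cheap cost model as allowing vectorization only when the vector code would entirely replace the scalar code, for example when the iteration count is known to be a multiple of the vectorization factor. The loop below is a hypothetical example of that kind: a constant trip count of 1024 and `restrict` pointers, so no runtime alias check or scalar epilogue should be needed. Whether it is actually vectorized at -O2 depends on the target and compiler version.

```c
#include <stddef.h>

/* Trip count: a compile-time constant that is a multiple of any likely
   vectorization factor for int.  */
#define N 1024

/* Element-wise sum of two arrays that cannot overlap the destination
   (restrict), so no runtime alias check is needed, and no scalar
   epilogue is needed for any vector length that divides N.  */
void
add_arrays (int *restrict dst, const int *restrict a, const int *restrict b)
{
  for (size_t i = 0; i < N; i++)
    dst[i] = a[i] + b[i];
}
```

Compiling with `-O2 -fopt-info-vec` reports which loops were vectorized on a given target.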



[Committed] Tweak new test cases for -march=cascadelake strangeness.

2021-10-08 Thread Roger Sayle

As reported by Sunil's tester, -march=cascadelake triggers some SUBREG
non-determinacy in the generated assembler for my new tests.  Fixed
by updating the regular expressions to match either the zero or sign
extended forms.  I'm testing a backend patch that may help with the
underlying cause of these differences.
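The two accepted immediates in each updated regex are the same 8-bit pattern (0x80 or 0xff) extended to 32 bits, either by sign extension (-128, -1) or by zero extension (128, 255). Since the tested functions return `char`, only the low byte of the `movl` immediate matters, so either form is correct code. A small C sketch of the two extensions (function names hypothetical):

```c
#include <stdint.h>

/* Extend an 8-bit pattern to int by replicating its top bit, as movsbl
   does.  Written arithmetically to avoid implementation-defined casts.  */
static int
sign_extend_byte (uint8_t byte)
{
  return byte >= 0x80 ? (int) byte - 256 : (int) byte;
}

/* Extend an 8-bit pattern to int by filling with zeros, as movzbl does.  */
static int
zero_extend_byte (uint8_t byte)
{
  return (int) byte;
}

/* Both extensions agree on the low 8 bits, which is all a char result
   uses.  */
static int
same_low_byte (int x, int y)
{
  return (x & 0xff) == (y & 0xff);
}
```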

Tested on x86_64-pc-linux-gnu (with and without -march=cascadelake).


2021-10-08  Roger Sayle  

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-mmx-paddsb-2.c: Test for -128 or 128.
* gcc.target/i386/sse2-mmx-paddusb-2.c: Test for -1 or 255.
* gcc.target/i386/sse2-mmx-psubsb-2.c: Test for -128 or 128.

Roger
--

diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb-2.c b/gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb-2.c
index c677884..ad4726b 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb-2.c
@@ -29,5 +29,5 @@ char baz()
 
 /* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$3," 1 } } */
 /* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$127," 1 } } */
-/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-128," 1 } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-?128," 1 } } */
 /* { dg-final { scan-assembler-not "paddsb\[ \\t\]+%xmm\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb-2.c b/gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb-2.c
index b20891c..1d3bc8b 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb-2.c
@@ -20,6 +20,6 @@ char bar()
 }
 
 /* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$3," 1 } } */
-/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1," 1 } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$(?:255|-1)," 1 } } */
 /* { dg-final { scan-assembler-not "paddusb\[ \\t\]+%xmm\[0-9\]+" } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-psubsb-2.c b/gcc/testsuite/gcc.target/i386/sse2-mmx-psubsb-2.c
index 4fc2920..68b57f2 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-mmx-psubsb-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-psubsb-2.c
@@ -28,6 +28,6 @@ char baz()
 }
 
 /* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$3," 1 } } */
-/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-128," 1 } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-?128," 1 } } */
 /* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$127," 1 } } */
 /* { dg-final { scan-assembler-not "paddsb\[ \\t\]+%xmm\[0-9\]+" } } */


[committed] openmp: Fix up declare target handling for vars with DECL_LOCAL_DECL_ALIAS [PR102640]

2021-10-08 Thread Jakub Jelinek via Gcc-patches
Hi!

The introduction of DECL_LOCAL_DECL_ALIAS and push_local_extern_decl_alias
in r11-3699-g4e62aca0e0520e4ed2532f2d8153581190621c1a broke the following
testcase.  The following patch fixes it by handling not just the variable
the to or link clause is put on, but also its DECL_LOCAL_DECL_ALIAS, if any,
in the same way.  If the alias hasn't been created yet, it will copy the
attributes when it is created and therefore get them for free; as it is an
extern, nothing more than the attributes is needed for it.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-08  Jakub Jelinek  

PR c++/102640
gcc/cp/
* parser.c (handle_omp_declare_target_clause): New function.
(cp_parser_omp_declare_target): Use it.
gcc/testsuite/
* c-c++-common/gomp/pr102640.c: New test.

--- gcc/cp/parser.c.jj  2021-10-07 12:52:34.986912260 +0200
+++ gcc/cp/parser.c 2021-10-07 16:36:12.748015996 +0200
@@ -45505,6 +45505,71 @@ cp_parser_late_parsing_omp_declare_simd
   return attrs;
 }
 
+/* Helper for cp_parser_omp_declare_target, handle one to or link clause
+   on #pragma omp declare target.  Return false if errors were reported.  */
+
+static bool
+handle_omp_declare_target_clause (tree c, tree t, int device_type)
+{
+  tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
+  tree at2 = lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t));
+  tree id;
+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
+    {
+      id = get_identifier ("omp declare target link");
+      std::swap (at1, at2);
+    }
+  else
+    id = get_identifier ("omp declare target");
+  if (at2)
+    {
+      error_at (OMP_CLAUSE_LOCATION (c),
+                "%qD specified both in declare target %<link%> and %<to%>"
+                " clauses", t);
+      return false;
+    }
+  if (!at1)
+    {
+      DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+      if (TREE_CODE (t) != FUNCTION_DECL && !is_global_var (t))
+        return true;
+
+      symtab_node *node = symtab_node::get (t);
+      if (node != NULL)
+        {
+          node->offloadable = 1;
+          if (ENABLE_OFFLOADING)
+            {
+              g->have_offload = true;
+              if (is_a <varpool_node *> (node))
+                vec_safe_push (offload_vars, t);
+            }
+        }
+    }
+  if (TREE_CODE (t) != FUNCTION_DECL)
+    return true;
+  if ((device_type & OMP_CLAUSE_DEVICE_TYPE_HOST) != 0)
+    {
+      tree at3 = lookup_attribute ("omp declare target host",
+                                   DECL_ATTRIBUTES (t));
+      if (at3 == NULL_TREE)
+        {
+          id = get_identifier ("omp declare target host");
+          DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+        }
+    }
+  if ((device_type & OMP_CLAUSE_DEVICE_TYPE_NOHOST) != 0)
+    {
+      tree at3 = lookup_attribute ("omp declare target nohost",
+                                   DECL_ATTRIBUTES (t));
+      if (at3 == NULL_TREE)
+        {
+          id = get_identifier ("omp declare target nohost");
+          DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+        }
+    }
+  return true;
+}
 
 /* OpenMP 4.0:
# pragma omp declare target new-line
@@ -45557,67 +45622,16 @@ cp_parser_omp_declare_target (cp_parser
     {
       if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
         continue;
-      tree t = OMP_CLAUSE_DECL (c), id;
-      tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
-      tree at2 = lookup_attribute ("omp declare target link",
-                                   DECL_ATTRIBUTES (t));
+      tree t = OMP_CLAUSE_DECL (c);
       only_device_type = false;
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
-        {
-          id = get_identifier ("omp declare target link");
-          std::swap (at1, at2);
-        }
-      else
-        id = get_identifier ("omp declare target");
-      if (at2)
-        {
-          error_at (OMP_CLAUSE_LOCATION (c),
-                    "%qD specified both in declare target %<link%> and %<to%>"
-                    " clauses", t);
-          continue;
-        }
-      if (!at1)
-        {
-          DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
-          if (TREE_CODE (t) != FUNCTION_DECL && !is_global_var (t))
-            continue;
-
-          symtab_node *node = symtab_node::get (t);
-          if (node != NULL)
-            {
-              node->offloadable = 1;
-              if (ENABLE_OFFLOADING)
-                {
-                  g->have_offload = true;
-                  if (is_a <varpool_node *> (node))
-                    vec_safe_push (offload_vars, t);
-                }
-            }
-        }
-      if (TREE_CODE (t) != FUNCTION_DECL)
+      if (!handle_omp_declare_target_clause (c, t, device_type))
         continue;
-  if ((device_type & OMP_CLAUSE_DEVICE_TYPE_HOST) != 0)
-   {
- tree at3 = lookup_attribute ("omp declare target host",
-  DECL_ATTRIBUTES (t));
- if (

Rewrite PTA constraint generation for function calls

2021-10-08 Thread Jan Hubicka
Hi,
this patch commonizes the three paths that produce constraints for function
calls and makes the code more flexible, so we can implement new features more
easily.  The main idea is to stop special-casing pure and const functions,
since we can now describe all of pure/const via their EAF flags
(implicit_const_eaf_flags and implicit_pure_eaf_flags) plus information on
whether the function loads from or stores to global memory.  All this
information is readily available in the modref summary.

While rewriting the function, I dropped some of the optimizations in the way
we generate constraints (they aimed to reduce the number of constraints
produced, not to improve code quality).  We may want to add some of them
back, but I think the constraint solver should be fast enough to get rid of
the redundant constraints quickly, so keeping them looks like a bit of
premature optimization.

We now always produce one additional PTA variable (callescape) for things
that escape into a function call and thus can be stored to parameters or to
global memory (if the function modifies memory).  This is no longer the same
as global escape when the function does not read global memory.  It is also
not the same as call use, since we now understand that interposable functions
may use a parameter in a way that is not relevant for PTA (so we cannot
optimize out stores initializing the memory, but we can be sure that the
stored pointers do not escape).

Compared to the previous code, we now correctly handle EAF_NOT_RETURNED in
all cases (previously we did so only when all parameters had the flag) and
also handle NOCLOBBER in more cases (since we now distinguish between global
escape and call escape).  Because I commonized the code handling arguments
and static chains, we could now easily extend modref to also track flags for
the static chain and the return slot, which I plan to do next.

Beyond that, I put some effort into producing constraints that yield
solutions similar to before (differing solutions would make debugging
harder).  For example, if global memory is written, one can simply move
callescape to escape rather than making everything escape with its own
constraints, but that affects the ipa-pta testcases.

Building cc1plus I get

Alias oracle query stats:
  refs_may_alias_p: 79390176 disambiguations, 101029935 queries
  ref_maybe_used_by_call_p: 608504 disambiguations, 80453740 queries
  call_may_clobber_ref_p: 355944 disambiguations, 359972 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 39288 queries
  nonoverlapping_refs_since_match_p: 31654 disambiguations, 65783 must overlaps, 98330 queries
  aliasing_component_refs_p: 66051 disambiguations, 12846380 queries
  TBAA oracle: 30131336 disambiguations 100700373 queries
   14327733 are in alias set 0
   12073400 queries asked about the same object
   136 queries asked about the same alias set
   0 access volatile
   42233873 are dependent in the DAG
   1933895 are aritificially in conflict with void *

Modref stats:
  modref use: 25403 disambiguations, 742338 queries
  modref clobber: 2267306 disambiguations, 21343054 queries
  4608109 tbaa queries (0.215907 per modref query)
  703185 base compares (0.032947 per modref query)

PTA query stats:
  pt_solution_includes: 13018495 disambiguations, 36242235 queries
  pt_solutions_intersect: 1510454 disambiguations, 15485389 queries

This is very similar to the stats without the patch.  The PTA query stats are
actually slightly lower (by under 1%), whereas the modref clobber stats are
17% higher.  I am not sure why that happens.
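The raw counts are easier to compare as rates. A small C sketch (struct and function names hypothetical) that turns the cc1plus figures above into disambiguation rates: refs_may_alias_p disambiguates about 78.6% of its queries, pt_solution_includes about 35.9%, and the modref clobber oracle about 10.6%. The printed "0.215907 per modref query" figure matches 4608109 divided by the 21343054 modref clobber queries.

```c
#include <stdio.h>

/* One line of the statistics dump: a query kind with its counts,
   copied from the cc1plus numbers quoted above.  */
struct query_stat
{
  const char *name;
  unsigned long long disambiguations;
  unsigned long long queries;
};

static const struct query_stat cc1plus_stats[] = {
  { "refs_may_alias_p", 79390176ULL, 101029935ULL },
  { "ref_maybe_used_by_call_p", 608504ULL, 80453740ULL },
  { "call_may_clobber_ref_p", 355944ULL, 359972ULL },
  { "modref use", 25403ULL, 742338ULL },
  { "modref clobber", 2267306ULL, 21343054ULL },
  { "pt_solution_includes", 13018495ULL, 36242235ULL },
  { "pt_solutions_intersect", 1510454ULL, 15485389ULL },
};

/* Fraction of queries that were disambiguated; 0 if there were none.  */
static double
disambiguation_rate (const struct query_stat *s)
{
  return s->queries ? (double) s->disambiguations / (double) s->queries : 0.0;
}

/* Print every query kind with its disambiguation rate in percent.  */
static void
print_rates (FILE *out)
{
  for (size_t i = 0; i < sizeof cc1plus_stats / sizeof cc1plus_stats[0]; i++)
    fprintf (out, "%-26s %6.2f%%\n", cc1plus_stats[i].name,
             100.0 * disambiguation_rate (&cc1plus_stats[i]));
}
```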

Bootstrapped/regtested x86_64-linux.  OK?

Honza


* ipa-modref-tree.h (modref_tree::global_access_p): New member
function.
* ipa-modref.c (implicit_const_eaf_flags, implicit_pure_eaf_flags,
ignore_stores_eaf_flags): Move to ipa-modref.h.
(remove_useless_eaf_flags): Remove early exit on NOCLOBBER.
(modref_summary::global_memory_read_p): New member function.
(modref_summary::global_memory_written_p): New member function.
* ipa-modref.h (modref_summary::global_memory_read_p,
modref_summary::global_memory_written_p): Declare.
(implicit_const_eaf_flags, implicit_pure_eaf_flags,
ignore_stores_eaf_flags): Move here.
* tree-ssa-structalias.c: Include ipa-modref-tree.h, ipa-modref.h
and attr-fnspec.h.
(handle_rhs_call): Rewrite.
(handle_call_arg): New function.
(determine_global_memory_access): New function.
(handle_const_call): Remove.
(handle_pure_call): Remove.
(find_func_aliases_for_call): Update use of handle_rhs_call.
(compute_points_to_sets): Handle global memory accesses
selectively.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/ssa-pta-fn-1.c: Fix template; add noipa.
* gcc.dg/tree-ssa/pta-callused.c: Fix template.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 8e9b89b3e2c..52f225b1aae 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -1012,6 +1017,31 @@ struct GTY((user)) modref_

[PATCH] Refine movhfcc.

2021-10-08 Thread liuhongt via Gcc-patches
For AVX512-FP16, the only HFmode comparison is vcmpsh, whose destination
is a mask register, so movhfcc is expanded as:

vcmpsh op2, op1, %k1
vmovsh op1, op2{%k1}
mov op2, dest

gcc/ChangeLog:

PR target/102639
* config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Handle
HFmode.
(ix86_use_mask_cmp_p): Ditto.
(ix86_expand_sse_movcc): Ditto.
* config/i386/i386.md (setcc_hf_mask): New define_insn.
(movhf_mask): Ditto.
(UNSPEC_MOVCC_MASK): New unspec.
* config/i386/sse.md (UNSPEC_PCMP): Move to i386.md.

gcc/testsuite/ChangeLog:
* g++.target/i386/pr102639.C: New test.
---
 gcc/config/i386/i386-expand.c| 19 ++---
 gcc/config/i386/i386.md  | 34 +++-
 gcc/config/i386/sse.md   |  1 -
 gcc/testsuite/g++.target/i386/pr102639.C | 19 +
 4 files changed, 67 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr102639.C

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 4780b993917..3c4a07d4d7d 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3613,6 +3613,10 @@ ix86_valid_mask_cmp_mode (machine_mode mode)
   if (TARGET_XOP && !TARGET_AVX512F)
     return false;
 
+  /* HFmode only supports vcmpsh whose dest is mask register.  */
+  if (TARGET_AVX512FP16 && mode == HFmode)
+    return true;
+
   /* AVX512F is needed for mask operation.  */
   if (!(TARGET_AVX512F && VECTOR_MODE_P (mode)))
     return false;
@@ -3634,7 +3638,9 @@ ix86_use_mask_cmp_p (machine_mode mode, machine_mode cmp_mode,
 {
   int vector_size = GET_MODE_SIZE (mode);
 
-  if (vector_size < 16)
+  if (cmp_mode == HFmode)
+    return true;
+  else if (vector_size < 16)
     return false;
   else if (vector_size == 64)
     return true;
@@ -3750,7 +3756,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
       && GET_MODE_CLASS (cmpmode) == MODE_INT)
     {
       gcc_assert (ix86_valid_mask_cmp_mode (mode));
-      /* Using vector move with mask register.  */
+      /* Using scalar/vector move with mask register.  */
       cmp = force_reg (cmpmode, cmp);
       /* Optimize for mask zero.  */
       op_true = (op_true != CONST0_RTX (mode)
@@ -3769,8 +3775,13 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
           std::swap (op_true, op_false);
         }
 
-      rtx vec_merge = gen_rtx_VEC_MERGE (mode, op_true, op_false, cmp);
-      emit_insn (gen_rtx_SET (dest, vec_merge));
+      if (mode == HFmode)
+        emit_insn (gen_movhf_mask (dest, op_true, op_false, cmp));
+      else
+        {
+          rtx vec_merge = gen_rtx_VEC_MERGE (mode, op_true, op_false, cmp);
+          emit_insn (gen_rtx_SET (dest, vec_merge));
+        }
       return;
     }
   else if (vector_all_ones_operand (op_true, mode)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 04cb3bf6a33..c7ae4ac5fbc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -117,6 +117,7 @@ (define_c_enum "unspec" [
   ;; For SSE/MMX support:
   UNSPEC_FIX_NOTRUNC
   UNSPEC_MASKMOV
+  UNSPEC_MOVCC_MASK
   UNSPEC_MOVMSK
   UNSPEC_BLENDV
   UNSPEC_PSHUFB
@@ -125,8 +126,9 @@ (define_c_enum "unspec" [
   UNSPEC_RSQRT
   UNSPEC_PSADBW
 
-  ;; For AVX512F support
+  ;; For AVX/AVX512F support
   UNSPEC_SCALEF
+  UNSPEC_PCMP
 
   ;; Generic math support
   UNSPEC_IEEE_MIN  ; not commutative
@@ -13608,6 +13610,20 @@ (define_insn "setcc__sse"
(set_attr "length_immediate" "1")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
+
+(define_insn "setcc_hf_mask"
+  [(set (match_operand:QI 0 "register_operand" "=k")
+        (unspec:QI
+          [(match_operand:HF 1 "register_operand" "v")
+           (match_operand:HF 2 "nonimmediate_operand" "vm")
+           (match_operand:SI 3 "const_0_to_31_operand" "n")]
+          UNSPEC_PCMP))]
+  "TARGET_AVX512FP16"
+  "vcmpsh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 
 ;; Basic conditional jump instructions.
 
@@ -19841,6 +19857,22 @@ (define_peephole2
   operands[9] = replace_rtx (operands[6], operands[0], operands[1], true);
 })
 
+(define_insn "movhf_mask"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,v")
+        (unspec:HF
+          [(match_operand:HF 1 "nonimmediate_operand" "m,v,v")
+           (match_operand:HF 2 "nonimm_or_0_operand" "0C,0C,0C")
+           (match_operand:QI 3 "register_operand" "Yk,Yk,Yk")]
+          UNSPEC_MOVCC_MASK))]
+  "TARGET_AVX512FP16"
+  "@
+   vmovsh\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}
+   vmovsh\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}
+   vmovsh\t{%d1, %0%{%3%}%N2|%0%{%3%}%N2, %d1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_expand "movhfcc"
   [(set (match_operand:HF 0 "register_operand")
(if_then_else:HF
diff --git a/gcc/config/i386/sse.md b/gcc/config

Re: Rewrite PTA constraint generation for function calls

2021-10-08 Thread Richard Biener via Gcc-patches
On Fri, 8 Oct 2021, Jan Hubicka wrote:

> Hi,
> this patch commonizes the three paths that produce constraints for function
> calls and makes the code more flexible, so we can implement new features more
> easily.  The main idea is to stop special-casing pure and const functions,
> since we can now describe all of pure/const via their EAF flags
> (implicit_const_eaf_flags and implicit_pure_eaf_flags) plus information on
> whether the function loads from or stores to global memory.  All this
> information is readily available in the modref summary.
> 
> While rewriting the function, I dropped some of the optimizations in the way
> we generate constraints (they aimed to reduce the number of constraints
> produced, not to improve code quality).  We may want to add some of them
> back, but I think the constraint solver should be fast enough to get rid of
> the redundant constraints quickly, so keeping them looks like a bit of
> premature optimization.
> 
> We now always produce one additional PTA variable (callescape) for things
> that escape into a function call and thus can be stored to parameters or to
> global memory (if the function modifies memory).  This is no longer the same
> as global escape when the function does not read global memory.  It is also
> not the same as call use, since we now understand that interposable functions
> may use a parameter in a way that is not relevant for PTA (so we cannot
> optimize out stores initializing the memory, but we can be sure that the
> stored pointers do not escape).
> 
> Compared to the previous code, we now correctly handle EAF_NOT_RETURNED in
> all cases (previously we did so only when all parameters had the flag) and
> also handle NOCLOBBER in more cases (since we now distinguish between global
> escape and call escape).  Because I commonized the code handling arguments
> and static chains, we could now easily extend modref to also track flags for
> the static chain and the return slot, which I plan to do next.
> 
> Beyond that, I put some effort into producing constraints that yield
> solutions similar to before (differing solutions would make debugging
> harder).  For example, if global memory is written, one can simply move
> callescape to escape rather than making everything escape with its own
> constraints, but that affects the ipa-pta testcases.
> 
> Building cc1plus I get
> 
> Alias oracle query stats:
>   refs_may_alias_p: 79390176 disambiguations, 101029935 queries
>   ref_maybe_used_by_call_p: 608504 disambiguations, 80453740 queries
>   call_may_clobber_ref_p: 355944 disambiguations, 359972 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 39288 queries
>   nonoverlapping_refs_since_match_p: 31654 disambiguations, 65783 must overlaps, 98330 queries
>   aliasing_component_refs_p: 66051 disambiguations, 12846380 queries
>   TBAA oracle: 30131336 disambiguations 100700373 queries
>    14327733 are in alias set 0
>    12073400 queries asked about the same object
>    136 queries asked about the same alias set
>    0 access volatile
>    42233873 are dependent in the DAG
>    1933895 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 25403 disambiguations, 742338 queries
>   modref clobber: 2267306 disambiguations, 21343054 queries
>   4608109 tbaa queries (0.215907 per modref query)
>   703185 base compares (0.032947 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 13018495 disambiguations, 36242235 queries
>   pt_solutions_intersect: 1510454 disambiguations, 15485389 queries
> 
> This is very similar to the stats without the patch.  The PTA query stats are
> actually slightly lower (by under 1%), whereas the modref clobber stats are
> 17% higher.  I am not sure why that happens.
> 
> Bootstrapped/regtested x86_64-linux.  OK?

OK if you add the missing function level comments.

Thanks,
Richard.

> Honza
> 
> 
>   * ipa-modref-tree.h (modref_tree::global_access_p): New member
>   function.
>   * ipa-modref.c (implicit_const_eaf_flags, implicit_pure_eaf_flags,
>   ignore_stores_eaf_flags): Move to ipa-modref.h.
>   (remove_useless_eaf_flags): Remove early exit on NOCLOBBER.
>   (modref_summary::global_memory_read_p): New member function.
>   (modref_summary::global_memory_written_p): New member function.
>   * ipa-modref.h (modref_summary::global_memory_read_p,
>   modref_summary::global_memory_written_p): Declare.
>   (implicit_const_eaf_flags, implicit_pure_eaf_flags,
>   ignore_stores_eaf_flags): Move here.
>   * tree-ssa-structalias.c: Include ipa-modref-tree.h, ipa-modref.h
>   and attr-fnspec.h.
>   (handle_rhs_call): Rewrite.
>   (handle_call_arg): New function.
>   (determine_global_memory_access): New function.
>   (handle_const_call): Remove.
>   (handle_pure_call): Remove.
>   (find_func_aliases_for_call): Update use of handle_rhs_call.
>   (compute_points_to_sets): Handle global memory accesses
>   selectively.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/torture/ssa-pta-fn-1.c: Fix temp

[PATCH] options: use cl_optimization_hash.

2021-10-08 Thread Martin Liška

Hello.

Right now, cl_option_hasher::hash uses a legacy hashing function for
OPTIMIZATION_NODEs, and I also need this change for a future fix of
PR102585.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* tree.c (cl_option_hasher::hash): Use cl_optimization_hash
and remove legacy hashing code.
---
 gcc/tree.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index 561b9cd56bd..7bfd64160f4 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -11473,30 +11473,13 @@ hashval_t
 cl_option_hasher::hash (tree x)
 {
   const_tree const t = x;
-  const char *p;
-  size_t i;
-  size_t len = 0;
-  hashval_t hash = 0;
 
   if (TREE_CODE (t) == OPTIMIZATION_NODE)
-    {
-      p = (const char *)TREE_OPTIMIZATION (t);
-      len = sizeof (struct cl_optimization);
-    }
-
+    return cl_optimization_hash (TREE_OPTIMIZATION (t));
   else if (TREE_CODE (t) == TARGET_OPTION_NODE)
     return cl_target_option_hash (TREE_TARGET_OPTION (t));
-
   else
     gcc_unreachable ();
-
-  /* assume most opt flags are just 0/1, some are 2-3, and a few might be
-     something else.  */
-  for (i = 0; i < len; i++)
-    if (p[i])
-      hash = (hash << 4) ^ ((i << 2) | p[i]);
-
-  return hash;
 }
 
 /* Return nonzero if the value represented by *X (an OPTIMIZATION or

--
2.33.0
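The removed loop can be lifted into a standalone function to see what it actually depended on. Each nonzero byte shifts the accumulated value left by 4 bits, so in a 32-bit hashval_t anything mixed in more than eight nonzero bytes before the end is shifted out entirely, and two structs that differ only in an early byte can hash identically. This is one plausible weakness of such a scheme, not a reason stated in the patch. The sketch assumes hashval_t is a 32-bit unsigned int and reads the bytes as unsigned char (the original used plain char); legacy_bytewise_hash is a hypothetical name.

```c
#include <stddef.h>

/* Assumed to match GCC's 32-bit hashval_t.  */
typedef unsigned int hashval_t;

/* Byte-wise hash in the style of the removed cl_option_hasher code:
   every nonzero byte shifts the accumulator left by 4 and mixes in the
   byte together with its index.  */
static hashval_t
legacy_bytewise_hash (const unsigned char *p, size_t len)
{
  hashval_t hash = 0;
  for (size_t i = 0; i < len; i++)
    if (p[i])
      hash = (hash << 4) ^ (hashval_t) ((i << 2) | p[i]);
  return hash;
}
```

Two 16-byte buffers that differ only in byte 0 but share twelve later nonzero bytes produce the same hash, while the same difference in a short buffer does not.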



Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-10-08 Thread Aldy Hernandez via Gcc-patches
On Thu, Sep 23, 2021 at 8:32 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, 23 Sep 2021, Hongtao Liu wrote:
>
> > On Thu, Sep 23, 2021 at 9:48 AM Hongtao Liu  wrote:
> > >
> > > On Wed, Sep 22, 2021 at 10:21 PM Martin Sebor  wrote:
> > > >
> > > > On 9/21/21 7:38 PM, Hongtao Liu wrote:
> > > > > On Mon, Sep 20, 2021 at 4:13 AM Martin Sebor  wrote:
> > > > ...
> > > > > diff --git a/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c b/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
> > > > > index 1d79930cd58..9351f7e7a1a 100644
> > > > > --- a/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
> > > > > +++ b/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
> > > > > @@ -1,7 +1,7 @@
> > > > >  /* PR middle-end/91458 - inconsistent warning for writing past the end
> > > > >     of an array member
> > > > >     { dg-do compile }
> > > > > -   { dg-options "-O2 -Wall -Wno-array-bounds -fno-ipa-icf" } */
> > > > > +   { dg-options "-O2 -Wall -Wno-array-bounds -fno-ipa-icf -fno-tree-vectorize" } */
> > > > 
> > > >  The testcase is large - what part requires this change?  Given the
> > > >  testcase was added for inconsistent warnings do they now become
> > > >  inconsistent again as we enable vectorization at -O2?
> > > > 
> > > >  That said, the testcase adjustments need some explaining - I suppose
> > > >  you didn't just slap -fno-tree-vectorize to all of those changing
> > > >  behavior?
> > > > 
> > > > >>> void ga1_ (void)
> > > > >>> {
> > > > >>> a1_.a[0] = 0;
> > > > >>> a1_.a[1] = 1; // { dg-warning "\\\[-Wstringop-overflow" }
> > > > >>> a1_.a[2] = 2; // { dg-warning "\\\[-Wstringop-overflow" }
> > > > >>>
> > > > >>> struct A1 a;
> > > > >>> a.a[0] = 0;
> > > > >>> a.a[1] = 1;   // { dg-warning "\\\[-Wstringop-overflow" }
> > > > >>> a.a[2] = 2;   // { dg-warning "\\\[-Wstringop-overflow" }
> > > > >>> sink (&a);
> > > > >>> }
> > > > >>>
> > > > >>> It's supposed to be 2 warnings, for a.a[1] = 1 and a.a[2] = 2, since
> > > > >>> there are 2 accesses, but after enabling vectorization there's only
> > > > >>> one access, so one warning is missing, which causes the failure.
> > > >
> > > > With the stores vectorized, is the warning on the correct line or
> > > > does it point to the first store, the one that's in bounds, as
> > > > it does with -O3?  The latter would be a regression at -O2.
> > > For the case above, it points to the second store, which is out of
> > > bounds; the warning for the third store is missing.
> > > >
> > > > >>
> > > > >> I would find it preferable to change the test code over disabling
> > > > >> optimizations that are on by default.  My concern is that the test
> > > > >> would no longer exercise the default behavior.  (The same goes for
> > > > >> the -fno-ipa-icf option.)
> > > > > Hmm, it's a middle-end test; some backends may not do
> > > > > vectorization (it depends on TARGET_VECTOR_MODE_SUPPORTED_P and
> > > > > the related cost model).
> > > >
> > > > Yes, there are quite a few warning tests like that.  Their main
> > > > purpose is to verify that in common GCC invocations (i.e., without
> > > > any special options) warnings are a) issued when expected and b)
> > > > not issued when not expected.  Otherwise, middle end warnings are
> > > > known to have both false positives and false negatives in some
> > > > invocations, depending on what optimizations are in effect.
> > > > Indiscriminately disabling common optimizations for these large
> > > > tests and invoking them under artificial conditions would
> > > > compromise this goal and hide the problems.
> > > >
> > > > If enabling vectorization at -O2 causes regressions in the quality
> > > > of diagnostics (as the test failure above seems to indicate), we
> > > > should investigate these and open bugs for them so
> > > > they can be fixed.  We can then tweak the specific failing test
> > > > cases to avoid the failures until they are fixed.
> > > There are indeed cases of false positives and false negatives,
> > > i.e.
> > > // Verify warning for access to a definition with an initializer that
> > > // initializes the one-element array member.
> > > struct A1 a1i_1 = { 0, { 1 } };
> > >
> > > void ga1i_1 (void)
> > > {
> > >   a1i_1.a[0] = 0;
> > >   a1i_1.a[1] = 1;   // { dg-warning "\\\[-Wstringop-overflow" }
> > >   a1i_1.a[2] = 2;   // { dg-warning "\\\[-Wstringop-overflow" }
> > >
> > >   struct A1 a = { 0, { 1 } }; --- false positive here.
> > >   a.a[0] = 1;
> > >   a.a[1] = 2;   // { dg-warning "\\\[-Wstringop-overflow" } false negative here.
> > >   a.a[2] = 3;   // { dg-warning "\\\[-Wstringop-overflow" } false negative here.

[PATCH] contrib: git gcc-descr defaulting to print hash

2021-10-08 Thread Martin Liška

Hello.

I'm sending a patch originally written by Martin Jambor.
The patch changes the behavior in the following way:

$ git gcc-descr HEAD~

r12-4245-gdb3d7270b42fe2

$ git gcc-descr --short HEAD~

r12-4245

$ git gcc-undescr r12-4245-gdb3d7270b42fe2

db3d7270b42fe27fb05664c4fdf524ab7ad13a75

while right now, one gets:
$ git gcc-descr

r12-4090

$ git gcc-undescr r12-4245-gdb3d7270b42fe2
Invalid id r12-4245-gdb3d7270b42fe2

Thoughts?
Martin
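To make the identifier formats above concrete, here is a small C++17 sketch that splits an id into its basepoint, commit count and optional abbreviated hash. The struct and function names are hypothetical, and the regular expression is a stricter stand-in for the sed expressions in the patch (the patched gcc-undescr simply drops everything after the second '-' before matching).

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical: the pieces of an id such as "r12-4245" or
// "r12-4245-gdb3d7270b42fe2".
struct gcc_descr_id
{
  int basepoint;     // N in the basepoints/gcc-N tag
  int count;         // number of commits since that basepoint
  std::string hash;  // abbreviated hash after "-g"; empty in the short form
};

// Accept both the short and the hash-suffixed forms; only the leading
// rN-M part is needed to locate the commit again.
std::optional<gcc_descr_id>
parse_gcc_descr (const std::string &id)
{
  static const std::regex re ("r([0-9]+)-([0-9]+)(-g([0-9a-f]+))?");
  std::smatch m;
  if (!std::regex_match (id, m, re))
    return std::nullopt;  // gcc-undescr prints "Invalid id ..." in this case
  // m[4] is an unmatched sub-match (empty string) for the short form.
  return gcc_descr_id{ std::stoi (m[1].str ()), std::stoi (m[2].str ()),
                       m[4].str () };
}
```

For example, parse_gcc_descr ("r12-4245-gdb3d7270b42fe2") yields basepoint 12, count 4245 and hash "db3d7270b42fe2"; the hash plays no part in finding the commit, which is the point Jakub raises below.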

contrib/ChangeLog:

* gcc-git-customization.sh: Remove --full option and add --short
one. By default, gcc-descr prints 14 characters of hash.
gcc-undescr supports r$number-$number-$hash format.
---
 contrib/gcc-git-customization.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh
index aca61b781ff..b63c4c80935 100755
--- a/contrib/gcc-git-customization.sh
+++ b/contrib/gcc-git-customization.sh
@@ -22,8 +22,8 @@ git config alias.svn-rev '!f() { rev=$1; shift; git log --all --grep="^From-SVN:
 
 # Add git commands to convert git commit to monotonically increasing revision number
 # and vice versa
-git config alias.gcc-descr \!"f() { if test \${1:-no} = --full; then c=\${2:-master}; r=\$(git describe --all --abbrev=40 --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-,r,p'); expr match \${r:-no} '^r[0-9]\\+\$' >/dev/null && r=\${r}-0-g\$(git rev-parse \${2:-master}); else c=\${1:-master}; r=\$(git describe --all --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)-\\([0-9]\\+\\)-g[0-9a-f]*\$,r\\2-\\3,p;s,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)\$,r\\2-0,p'); fi; if test -n \$r; then o=\$(git config --get gcc-config.upstream); rr=\$(echo \$r | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\\(-g[0-9a-f]\\+\\)\\?\$,\\1,p'); if git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$rr >/dev/null; then m=releases/gcc-\$rr; else m=master; fi; git merge-base --is-ancestor \$c \${o:-origin}/\$m && \echo \${r}; fi; }; f"
-git config alias.gcc-undescr \!"f() { o=\$(git config --get gcc-config.upstream); r=\$(echo \$1 | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\$,\\1,p'); n=\$(echo \$1 | sed -n 's,^r[0-9]\\+-\\([0-9]\\+\\)\$,\\1,p'); test -z \$r && echo Invalid id \$1 && exit 1; h=\$(git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$r); test -z \$h && h=\$(git rev-parse --verify --quiet \${o:-origin}/master); p=\$(git describe --all --match 'basepoints/gcc-'\$r \$h | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+-\\([0-9]\\+\\)-g[0-9a-f]*\$,\\2,p;s,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+\$,0,p'); git rev-parse --verify \$h~\$(expr \$p - \$n); }; f"
+git config alias.gcc-descr \!"f() { if test \${1:-no} = --short; then c=\${2:-master}; r=\$(git describe --all --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)-\\([0-9]\\+\\)-g[0-9a-f]*\$,r\\2-\\3,p;s,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)\$,r\\2-0,p'); else c=\${1:-master}; r=\$(git describe --all --abbrev=14 --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-,r,p'); expr match \${r:-no} '^r[0-9]\\+\$' >/dev/null && r=\${r}-0-g\$(git rev-parse \${2:-master}); fi; if test -n \$r; then o=\$(git config --get gcc-config.upstream); rr=\$(echo \$r | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\\(-g[0-9a-f]\\+\\)\\?\$,\\1,p'); if git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$rr >/dev/null; then m=releases/gcc-\$rr; else m=master; fi; git merge-base --is-ancestor \$c \${o:-origin}/\$m && \\echo \${r}; fi; }; f"
+git config alias.gcc-undescr \!"f() { o=\$(git config --get gcc-config.upstream); r=\$(echo \$1 | sed 's/\\([^-]*-[^-]*\\)-.*/\\1/' | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\$,\\1,p'); n=\$(echo \$1 | sed 's/\\([^-]*-[^-]*\\)-.*/\\1/' | sed -n 's,^r[0-9]\\+-\\([0-9]\\+\\)\$,\\1,p'); test -z \$r && echo Invalid id \$1 && exit 1; h=\$(git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$r); test -z \$h && h=\$(git rev-parse --verify --quiet \${o:-origin}/master); p=\$(git describe --all --match 'basepoints/gcc-'\$r \$h | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+-\\([0-9]\\+\\)-g[0-9a-f]*\$,\\2,p;s,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+\$,0,p'); git rev-parse --verify \$h~\$(expr \$p - \$n); }; f"
 
 git config alias.gcc-verify '!f() { "`git rev-parse --show-toplevel`/contrib/gcc-changelog/git_check_commit.py" $@; } ; f'
 git config alias.gcc-backport '!f() { "`git rev-parse --show-toplevel`/contrib/git-backport.py" $@; } ; f'
--
2.33.0



[PATCH] Come up with OPTION_SET_P macro.

2021-10-08 Thread Martin Liška

Hello.

It's a refactoring patch introducing a new macro.
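A minimal sketch of what the refactoring looks like at a use site, assuming the macro simply pastes the option name onto the x_ prefix and reads the matching field of global_options_set (the structure that records which options the user set explicitly). The two-field gcc_options struct and example_option_override are hypothetical stand-ins for GCC's generated options structure and a target override hook.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for GCC's generated gcc_options:
// one field per option, named x_<option>.
struct gcc_options
{
  int x_flag_pic;
  int x_optimize;
};

// Option values, and which options were set explicitly (nonzero = set).
static gcc_options global_options = { 0, 2 };
static gcc_options global_options_set = { 0, 0 };

// Assumed shape of the macro: token-paste the option name onto "x_".
#define OPTION_SET_P(OPTION) global_options_set.x_ ## OPTION

// Typical override pattern: apply a target default only when the user
// did not choose a value on the command line.
static void
example_option_override ()
{
  /* Previously spelled: if (!global_options_set.x_flag_pic)  */
  if (!OPTION_SET_P (flag_pic))
    global_options.x_flag_pic = 1;
}
```

The macro adds no behavior of its own; it hides the x_ field naming and the global_options_set spelling behind one name.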

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I verified all cross compilers do build.

Ready to be installed?
Thanks,
Martin


gcc/ada/ChangeLog:

* gcc-interface/misc.c (gnat_post_options): Use new macro
OPTION_SET_P.
(gnat_init_gcc_eh): Likewise.
(gnat_init_gcc_fp): Likewise.

gcc/c-family/ChangeLog:

* c-opts.c (c_common_post_options): Use new macro
OPTION_SET_P.

gcc/ChangeLog:

* config/alpha/alpha.c (alpha_option_override): Use new macro
OPTION_SET_P.
* config/arc/arc.c (arc_override_options): Likewise.
* config/arm/arm.c (arm_option_override): Likewise.
* config/bfin/bfin.c (bfin_load_pic_reg): Likewise.
* config/c6x/c6x.c (c6x_option_override): Likewise.
* config/csky/csky.c: Likewise.
* config/darwin.c (darwin_override_options): Likewise.
* config/frv/frv.c (frv_option_override): Likewise.
* config/i386/djgpp.h: Likewise.
* config/i386/i386.c (ix86_stack_protect_guard): Likewise.
(ix86_max_noce_ifcvt_seq_cost): Likewise.
* config/ia64/ia64.c (ia64_option_override): Likewise.
(ia64_override_options_after_change): Likewise.
* config/m32c/m32c.c (m32c_option_override): Likewise.
* config/m32r/m32r.c (m32r_init): Likewise.
* config/m68k/m68k.c (m68k_option_override): Likewise.
* config/microblaze/microblaze.c (microblaze_option_override): Likewise.
* config/mips/mips.c (mips_option_override): Likewise.
* config/nios2/nios2.c (nios2_option_override): Likewise.
* config/nvptx/nvptx.c (nvptx_option_override): Likewise.
* config/pa/pa.c (pa_option_override): Likewise.
* config/riscv/riscv.c (riscv_option_override): Likewise.
* config/rs6000/aix71.h: Likewise.
* config/rs6000/aix72.h: Likewise.
* config/rs6000/aix73.h: Likewise.
* config/rs6000/rs6000.c (darwin_rs6000_override_options): Likewise.
(rs6000_override_options_after_change): Likewise.
(rs6000_linux64_override_options): Likewise.
(glibc_supports_ieee_128bit): Likewise.
(rs6000_option_override_internal): Likewise.
(rs6000_file_start): Likewise.
(rs6000_darwin_file_start): Likewise.
* config/rs6000/rtems.h: Likewise.
* config/rs6000/sysv4.h: Likewise.
* config/rs6000/vxworks.h (SUB3TARGET_OVERRIDE_OPTIONS): Likewise.
* config/s390/s390.c (s390_option_override): Likewise.
* config/sh/linux.h: Likewise.
* config/sh/netbsd-elf.h (while): Likewise.
* config/sh/sh.c (sh_option_override): Likewise.
* config/sol2.c (solaris_override_options): Likewise.
* config/sparc/sparc.c (sparc_option_override): Likewise.
* config/tilegx/tilegx.c (tilegx_option_override): Likewise.
* config/visium/visium.c (visium_option_override): Likewise.
* config/vxworks.c (vxworks_override_options): Likewise.
* lto-opts.c (lto_write_options): Likewise.
* omp-expand.c (expand_omp_simd): Likewise.
* omp-general.c (omp_max_vf): Likewise.
* omp-offload.c (oacc_xform_loop): Likewise.
* opts.h (OPTION_SET_P): Likewise.
* targhooks.c (default_max_noce_ifcvt_seq_cost): Likewise.
* toplev.c (process_options): Likewise.
* tree-predcom.c: Likewise.
* tree-sra.c (analyze_all_variable_accesses): Likewise.

gcc/cp/ChangeLog:

* constexpr.c (maybe_warn_about_constant_value): Use new macro
OPTION_SET_P.
* decl.c (redeclaration_error_message): Likewise.
(cxx_init_decl_processing): Likewise.

gcc/d/ChangeLog:

* d-lang.cc (d_post_options): Use new macro
OPTION_SET_P.

gcc/fortran/ChangeLog:

* options.c (gfc_post_options): Use new macro
OPTION_SET_P.

gcc/objc/ChangeLog:

* objc-next-runtime-abi-01.c: Use new macro
OPTION_SET_P.
* objc-next-runtime-abi-02.c (objc_next_runtime_abi_02_init): Likewise.
---
 gcc/ada/gcc-interface/misc.c| 10 ++---
 gcc/c-family/c-opts.c   |  6 +--
 gcc/config/alpha/alpha.c|  2 +-
 gcc/config/arc/arc.c|  4 +-
 gcc/config/arm/arm.c|  9 ++--
 gcc/config/bfin/bfin.c  |  5 ++-
 gcc/config/c6x/c6x.c|  3 +-
 gcc/config/csky/csky.c  |  7 +--
 gcc/config/darwin.c | 31 ++---
 gcc/config/frv/frv.c|  5 ++-
 gcc/config/i386/djgpp.h |  2 +-
 gcc/config/i386/i386.c  |  6 +--
 gcc/config/ia64/ia64.c  |  6 +--
 gcc/config/m32c/m32c.c  |  3 +-
 gcc/config/m32r/m32r.c  |  3 +-
 gcc/config/m68k/m68k.c  |  6 +--
 gcc/config/microblaze/microblaze.c  |  3 +-
 gcc/config/mips/mips.c  |  9 ++--
 gcc/config/nios2/nios2.c 

Re: [PATCH] contrib: git gcc-descr defaulting to print hash

2021-10-08 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 08, 2021 at 01:01:33PM +0200, Martin Liška wrote:
> I'm sending a patch originally written by Martin Jambor.
> The patch changes the behavior in the following way:
> 
> $ git gcc-descr HEAD~
> 
> r12-4245-gdb3d7270b42fe2
> 
> $ git gcc-descr --short HEAD~
> 
> r12-4245

I think changing the default is ok, but dropping --full is not,
it should stay and behave the way it did before (i.e. print
r12-4245-gdb3d7270b42fe27fb05664c4fdf524ab7ad13a75
same thing as the new default except for full hash instead of
first 14 chars from it).

> $ git gcc-undescr r12-4245-gdb3d7270b42fe2
> 
> db3d7270b42fe27fb05664c4fdf524ab7ad13a75

I don't understand this.  Why do you want to make this work?
That is clearly a noop, you can use r12-4245-gdb3d7270b42fe2
directly in git commands, and if you for whatever strange reason
don't want the r12-4245-g prefix before it, just copy'n'paste
what is after it.

Jakub



Re: [PATCH] [GCC-12] Mention O2 vectorization enabling.

2021-10-08 Thread Richard Biener via Gcc-patches
On Fri, Oct 8, 2021 at 9:29 AM liuhongt via Gcc-patches
 wrote:
>
> ---
>  htdocs/gcc-12/changes.html | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
> index 22839f2d..6e898db7 100644
> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html
> @@ -68,6 +68,15 @@ a work-in-progress.
>  
>  General Improvements
>
> +
> +  The vectorization is enabled in -O2, now -O2 is

Vectorization is enabled at -O2 which is now equivalent to the
original ...

> +  equivalent to the original -O2 -ftree-vectorize
> +  -fvect-cost-model=very-cheap. Note that default vect cost model has

Note that the default vectorizer cost model

> +  been changed which may have a subtle effect, for example for the case with
> +  -O2 -fopenmp #pragma omp parallel for simd.

... which used to behave as if -fvect-cost-model=cheap were specified.

OK with those changes.

Richard.

> +  
> +
> +
>  
>  New Languages and Language specific improvements
>
> --
> 2.18.1
>


Re: [PATCH] Come up with OPTION_SET_P macro.

2021-10-08 Thread Richard Biener via Gcc-patches
On Fri, Oct 8, 2021 at 1:03 PM Martin Liška  wrote:
>
> Hello.
>
> It's a refactoring patch introducing a new macro.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> I verified all cross compilers do build.
>
> Ready to be installed?

OK.

Thanks,
Richard.

> Thanks,
> Martin
>
>
> gcc/ada/ChangeLog:
>
> * gcc-interface/misc.c (gnat_post_options): Use new macro
> OPTION_SET_P.
> (gnat_init_gcc_eh): Likewise.
> (gnat_init_gcc_fp): Likewise.
>
> gcc/c-family/ChangeLog:
>
> * c-opts.c (c_common_post_options): Use new macro
> OPTION_SET_P.
>
> gcc/ChangeLog:
>
> * config/alpha/alpha.c (alpha_option_override): Use new macro
> OPTION_SET_P.
> * config/arc/arc.c (arc_override_options): Likewise.
> * config/arm/arm.c (arm_option_override): Likewise.
> * config/bfin/bfin.c (bfin_load_pic_reg): Likewise.
> * config/c6x/c6x.c (c6x_option_override): Likewise.
> * config/csky/csky.c: Likewise.
> * config/darwin.c (darwin_override_options): Likewise.
> * config/frv/frv.c (frv_option_override): Likewise.
> * config/i386/djgpp.h: Likewise.
> * config/i386/i386.c (ix86_stack_protect_guard): Likewise.
> (ix86_max_noce_ifcvt_seq_cost): Likewise.
> * config/ia64/ia64.c (ia64_option_override): Likewise.
> (ia64_override_options_after_change): Likewise.
> * config/m32c/m32c.c (m32c_option_override): Likewise.
> * config/m32r/m32r.c (m32r_init): Likewise.
> * config/m68k/m68k.c (m68k_option_override): Likewise.
> * config/microblaze/microblaze.c (microblaze_option_override): Likewise.
> * config/mips/mips.c (mips_option_override): Likewise.
> * config/nios2/nios2.c (nios2_option_override): Likewise.
> * config/nvptx/nvptx.c (nvptx_option_override): Likewise.
> * config/pa/pa.c (pa_option_override): Likewise.
> * config/riscv/riscv.c (riscv_option_override): Likewise.
> * config/rs6000/aix71.h: Likewise.
> * config/rs6000/aix72.h: Likewise.
> * config/rs6000/aix73.h: Likewise.
> * config/rs6000/rs6000.c (darwin_rs6000_override_options): Likewise.
> (rs6000_override_options_after_change): Likewise.
> (rs6000_linux64_override_options): Likewise.
> (glibc_supports_ieee_128bit): Likewise.
> (rs6000_option_override_internal): Likewise.
> (rs6000_file_start): Likewise.
> (rs6000_darwin_file_start): Likewise.
> * config/rs6000/rtems.h: Likewise.
> * config/rs6000/sysv4.h: Likewise.
> * config/rs6000/vxworks.h (SUB3TARGET_OVERRIDE_OPTIONS): Likewise.
> * config/s390/s390.c (s390_option_override): Likewise.
> * config/sh/linux.h: Likewise.
> * config/sh/netbsd-elf.h (while): Likewise.
> * config/sh/sh.c (sh_option_override): Likewise.
> * config/sol2.c (solaris_override_options): Likewise.
> * config/sparc/sparc.c (sparc_option_override): Likewise.
> * config/tilegx/tilegx.c (tilegx_option_override): Likewise.
> * config/visium/visium.c (visium_option_override): Likewise.
> * config/vxworks.c (vxworks_override_options): Likewise.
> * lto-opts.c (lto_write_options): Likewise.
> * omp-expand.c (expand_omp_simd): Likewise.
> * omp-general.c (omp_max_vf): Likewise.
> * omp-offload.c (oacc_xform_loop): Likewise.
> * opts.h (OPTION_SET_P): Likewise.
> * targhooks.c (default_max_noce_ifcvt_seq_cost): Likewise.
> * toplev.c (process_options): Likewise.
> * tree-predcom.c: Likewise.
> * tree-sra.c (analyze_all_variable_accesses): Likewise.
>
> gcc/cp/ChangeLog:
>
> * constexpr.c (maybe_warn_about_constant_value): Use new macro
> OPTION_SET_P.
> * decl.c (redeclaration_error_message): Likewise.
> (cxx_init_decl_processing): Likewise.
>
> gcc/d/ChangeLog:
>
> * d-lang.cc (d_post_options): Use new macro
> OPTION_SET_P.
>
> gcc/fortran/ChangeLog:
>
> * options.c (gfc_post_options): Use new macro
> OPTION_SET_P.
>
> gcc/objc/ChangeLog:
>
> * objc-next-runtime-abi-01.c: Use new macro
> OPTION_SET_P.
> * objc-next-runtime-abi-02.c (objc_next_runtime_abi_02_init): Likewise.
> ---
>   gcc/ada/gcc-interface/misc.c| 10 ++---
>   gcc/c-family/c-opts.c   |  6 +--
>   gcc/config/alpha/alpha.c|  2 +-
>   gcc/config/arc/arc.c|  4 +-
>   gcc/config/arm/arm.c|  9 ++--
>   gcc/config/bfin/bfin.c  |  5 ++-
>   gcc/config/c6x/c6x.c|  3 +-
>   gcc/config/csky/csky.c  |  7 +--
>   gcc/config/darwin.c | 31 ++---
>   gcc/config/frv/frv.c|  5 ++-
>   gcc/config/i386/djgpp.h |  2 +-
>   gcc/config/i386/i386.c  |  6 +--

[committed] libstdc++: Implement ostream insertion for chrono::duration

2021-10-08 Thread Jonathan Wakely via Gcc-patches
This is a missing piece of the C++20  header.

It would be good to move the code into the compiled library, so that we
don't need  in . It could also use spanstream in C++20,
to avoid memory allocations. That can be changed at a later date.

libstdc++-v3/ChangeLog:

* include/std/chrono (__detail::__units_suffix_misc): New
helper function.
(__detail::__units_suffix): Likewise.
(chrono::operator<<(basic_ostream&, const duration&)): Define.
* testsuite/20_util/duration/io.cc: New test.

Tested powerpc64le-linux. Committed to trunk.
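For readers unfamiliar with the C++20 output format for durations (the count followed by a units suffix), here is a rough C++17 sketch of the same approach: pick a fixed suffix at compile time for common periods, fall back to "[num]s" or "[num/den]s" otherwise, format into a temporary stream that copies the target stream's flags, locale and precision, then insert the finished string. units_suffix and print_duration are hypothetical names, and only a handful of the suffixes from the patch are listed.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <ratio>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical suffix helper (subset of the periods handled by the patch).
template<typename Period>
std::string
units_suffix ()
{
  using P = typename Period::type;  // ratio in lowest terms
  if constexpr (std::is_same_v<P, std::nano>)
    return "ns";
  else if constexpr (std::is_same_v<P, std::milli>)
    return "ms";
  else if constexpr (std::is_same_v<P, std::ratio<1>>)
    return "s";
  else if constexpr (std::is_same_v<P, std::ratio<60>>)
    return "min";
  else if constexpr (P::den == 1)
    return "[" + std::to_string (P::num) + "]s";
  else
    return "[" + std::to_string (P::num) + "/" + std::to_string (P::den) + "]s";
}

// Format count and suffix together, then write them with a single insertion
// so that a field width set on os applies to the whole "42ms" string.
template<typename Rep, typename Period>
std::ostream &
print_duration (std::ostream &os, const std::chrono::duration<Rep, Period> &d)
{
  std::ostringstream s;
  s.flags (os.flags ());
  s.imbue (os.getloc ());
  s.precision (os.precision ());
  s << d.count () << units_suffix<Period> ();
  return os << s.str ();
}
```

With these helpers, printing std::chrono::milliseconds (42) gives "42ms", and a duration with period std::ratio<3, 7> and count 5 gives "5[3/7]s".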

commit fcc13d6fc31441b5672b68a5e3b247687724218f
Author: Jonathan Wakely 
Date:   Thu Oct 7 19:58:07 2021

libstdc++: Implement ostream insertion for chrono::duration

This is a missing piece of the C++20  header.

It would be good to move the code into the compiled library, so that we
don't need  in . It could also use spanstream in C++20,
to avoid memory allocations. That can be changed at a later date.

libstdc++-v3/ChangeLog:

* include/std/chrono (__detail::__units_suffix_misc): New
helper function.
(__detail::__units_suffix): Likewise.
(chrono::operator<<(basic_ostream&, const duration&)): Define.
* testsuite/20_util/duration/io.cc: New test.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index c8060d7a67e..0662e26348f 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -37,6 +37,10 @@
 #else
 
 #include 
+#if __cplusplus > 201703L
+# include  // ostringstream
+# include 
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -2077,6 +2081,101 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 /// @}
   } // inline namespace chrono_literals
   } // inline namespace literals
+
+  namespace chrono
+  {
+/// @addtogroup chrono
+/// @{
+
+/// @cond undocumented
+namespace __detail
+{
+  template
+   const char*
+   __units_suffix_misc(char* __buf, size_t __n) noexcept
+   {
+ namespace __tc = std::__detail;
+ char* __p = __buf;
+ __p[0] = '[';
+ unsigned __nlen = __tc::__to_chars_len((uintmax_t)_Period::num);
+ __tc::__to_chars_10_impl(__p + 1, __nlen, (uintmax_t)_Period::num);
+ __p += 1 + __nlen;
+ if constexpr (_Period::den != 1)
+   {
+ __p[0] = '/';
+ unsigned __dlen = __tc::__to_chars_len((uintmax_t)_Period::den);
+ __tc::__to_chars_10_impl(__p + 1, __dlen, (uintmax_t)_Period::den);
+ __p += 1 + __dlen;
+   }
+ __p[0] = ']';
+ __p[1] = 's';
+ __p[2] = '\0';
+ return __buf;
+   }
+
+  template
+   auto
+   __units_suffix(char* __buf, size_t __n) noexcept
+   {
+#define _GLIBCXX_UNITS_SUFFIX(period, suffix) \
+   if constexpr (is_same_v<_Period, period>)   \
+ { \
+   if constexpr (is_same_v<_CharT, wchar_t>)   \
+ return L##suffix; \
+   else\
+ return suffix;\
+ } \
+   else
+
+ _GLIBCXX_UNITS_SUFFIX(atto, "as")
+ _GLIBCXX_UNITS_SUFFIX(femto, "fs")
+ _GLIBCXX_UNITS_SUFFIX(pico, "ps")
+ _GLIBCXX_UNITS_SUFFIX(nano, "ns")
+ _GLIBCXX_UNITS_SUFFIX(micro, "\u00b5s")
+ _GLIBCXX_UNITS_SUFFIX(milli, "ms")
+ _GLIBCXX_UNITS_SUFFIX(centi, "cs")
+ _GLIBCXX_UNITS_SUFFIX(deci, "ds")
+ _GLIBCXX_UNITS_SUFFIX(ratio<1>, "s")
+ _GLIBCXX_UNITS_SUFFIX(deca, "das")
+ _GLIBCXX_UNITS_SUFFIX(hecto, "hs")
+ _GLIBCXX_UNITS_SUFFIX(kilo, "ks")
+ _GLIBCXX_UNITS_SUFFIX(mega, "Ms")
+ _GLIBCXX_UNITS_SUFFIX(giga, "Gs")
+ _GLIBCXX_UNITS_SUFFIX(tera, "Ts")
+ _GLIBCXX_UNITS_SUFFIX(tera, "Ts")
+ _GLIBCXX_UNITS_SUFFIX(peta, "Ps")
+ _GLIBCXX_UNITS_SUFFIX(exa, "Es")
+ _GLIBCXX_UNITS_SUFFIX(ratio<60>, "min")
+ _GLIBCXX_UNITS_SUFFIX(ratio<3600>, "h")
+ _GLIBCXX_UNITS_SUFFIX(ratio<86400>, "d")
+#undef _GLIBCXX_UNITS_SUFFIX
+ return __detail::__units_suffix_misc<_Period>(__buf, __n);
+   }
+} // namespace __detail
+/// @endcond
+
+template
+  inline basic_ostream<_CharT, _Traits>&
+  operator<<(std::basic_ostream<_CharT, _Traits>& __os,
+   const duration<_Rep, _Period>& __d)
+  {
+   using period = typename _Period::type;
+   char __buf[sizeof("[/]s") + 2 * numeric_limits::digits10];
+   std::basic_ostringstream<_CharT, _Traits> __s;
+   __s.flags(__os.flags());
+   __s.imbue(__os.getloc());
+   __s.precision(__os.precision());
+   __s << __d.count();
+   __s << __detail::__units_suffix(__buf, sizeof(__buf));
+   __os << st

[committed] libstdc++: Restore debug checks in uniform container erasure functions

2021-10-08 Thread Jonathan Wakely via Gcc-patches
This partially reverts commit 561078480ffb5adb68577276c6b23e4ee7b39272.

If we avoid all debug mode checks when erasing elements then we fail to
invalidate safe iterators to the removed elements. This reverts the
recent changes in r12-4083 and r12-4233, restoring the debug checking.

libstdc++-v3/ChangeLog:

* include/experimental/deque (erase, erase_if): Revert changes
to avoid debug mode overhead.
* include/experimental/map (erase, erase_if): Likewise.
* include/experimental/set (erase, erase_if): Likewise.
* include/experimental/unordered_map (erase, erase_if):
Likewise.
* include/experimental/unordered_set (erase, erase_if):
Likewise.
* include/experimental/vector (erase, erase_if): Likewise.
* include/std/deque (erase, erase_if): Likewise.
* include/std/map (erase, erase_if): Likewise.
* include/std/set (erase, erase_if): Likewise.
* include/std/unordered_map (erase, erase_if): Likewise.
* include/std/unordered_set (erase, erase_if): Likewise.
* include/std/vector (erase, erase_if): Likewise.

Tested powerpc64le-linux. Committed to trunk.
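The two erasure strategies used in these headers can be sketched as follows (C++17, hypothetical names, not the libstdc++ code). Both helpers call erase() on the container object they are given rather than on a _GLIBCXX_STD_C base reference. That is the property this revert restores, so that a debug-mode container gets the chance to invalidate its safe iterators to the removed elements. The sketches return the number of removed elements, as C++20 std::erase_if does; the Library Fundamentals versions above return void.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Sequence containers: the erase-remove idiom, as in the deque hunk.
template<typename Seq, typename Pred>
typename Seq::size_type
sketch_erase_if_seq (Seq &c, Pred pred)
{
  const auto old_size = c.size ();
  c.erase (std::remove_if (c.begin (), c.end (), pred), c.end ());
  return old_size - c.size ();
}

// Node-based containers: erase matching nodes one at a time, continuing
// from the iterator returned by erase(); end() stays valid throughout.
template<typename Assoc, typename Pred>
typename Assoc::size_type
sketch_erase_if_nodes (Assoc &c, Pred pred)
{
  const auto old_size = c.size ();
  for (auto it = c.begin (), last = c.end (); it != last;)
    {
      if (pred (*it))
        it = c.erase (it);
      else
        ++it;
    }
  return old_size - c.size ();
}
```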

commit 82e3a826871effc7093852a9181f641c693ae94f
Author: Jonathan Wakely 
Date:   Thu Oct 7 20:33:45 2021

libstdc++: Restore debug checks in uniform container erasure functions

This partially reverts commit 561078480ffb5adb68577276c6b23e4ee7b39272.

If we avoid all debug mode checks when erasing elements then we fail to
invalidate safe iterators to the removed elements. This reverts the
recent changes in r12-4083 and r12-4233, restoring the debug checking.

libstdc++-v3/ChangeLog:

* include/experimental/deque (erase, erase_if): Revert changes
to avoid debug mode overhead.
* include/experimental/map (erase, erase_if): Likewise.
* include/experimental/set (erase, erase_if): Likewise.
* include/experimental/unordered_map (erase, erase_if):
Likewise.
* include/experimental/unordered_set (erase, erase_if):
Likewise.
* include/experimental/vector (erase, erase_if): Likewise.
* include/std/deque (erase, erase_if): Likewise.
* include/std/map (erase, erase_if): Likewise.
* include/std/set (erase, erase_if): Likewise.
* include/std/unordered_map (erase, erase_if): Likewise.
* include/std/unordered_set (erase, erase_if): Likewise.
* include/std/vector (erase, erase_if): Likewise.

diff --git a/libstdc++-v3/include/experimental/deque b/libstdc++-v3/include/experimental/deque
index 710833ebcad..a76fb659bbf 100644
--- a/libstdc++-v3/include/experimental/deque
+++ b/libstdc++-v3/include/experimental/deque
@@ -50,16 +50,16 @@ inline namespace fundamentals_v2
 inline void
 erase_if(deque<_Tp, _Alloc>& __cont, _Predicate __pred)
 {
-  _GLIBCXX_STD_C::deque<_Tp, _Alloc>& __c = __cont;
-  __c.erase(std::remove_if(__c.begin(), __c.end(), __pred), __c.end());
+  __cont.erase(std::remove_if(__cont.begin(), __cont.end(), __pred),
+  __cont.end());
 }
 
   template
 inline void
 erase(deque<_Tp, _Alloc>& __cont, const _Up& __value)
 {
-  _GLIBCXX_STD_C::deque<_Tp, _Alloc>& __c = __cont;
-  __c.erase(std::remove(__c.begin(), __c.end(), __value), __c.end());
+  __cont.erase(std::remove(__cont.begin(), __cont.end(), __value),
+  __cont.end());
 }
 
   namespace pmr {
diff --git a/libstdc++-v3/include/experimental/map b/libstdc++-v3/include/experimental/map
index ef69fadf944..0c0f4f5 100644
--- a/libstdc++-v3/include/experimental/map
+++ b/libstdc++-v3/include/experimental/map
@@ -50,19 +50,13 @@ inline namespace fundamentals_v2
   typename _Predicate>
 inline void
 erase_if(map<_Key, _Tp, _Compare, _Alloc>& __cont, _Predicate __pred)
-{
-  _GLIBCXX_STD_C::map<_Key, _Tp, _Compare, _Alloc>& __c = __cont;
-  std::__detail::__erase_nodes_if(__c, __pred);
-}
+{ std::__detail::__erase_nodes_if(__cont, __pred); }
 
   template
 inline void
 erase_if(multimap<_Key, _Tp, _Compare, _Alloc>& __cont, _Predicate __pred)
-{
-  _GLIBCXX_STD_C::multimap<_Key, _Tp, _Compare, _Alloc>& __c = __cont;
-  std::__detail::__erase_nodes_if(__c, __pred);
-}
+{ std::__detail::__erase_nodes_if(__cont, __pred); }
 
   namespace pmr {
 template>
diff --git a/libstdc++-v3/include/experimental/set b/libstdc++-v3/include/experimental/set
index 7a5986aec0e..c3f5433e995 100644
--- a/libstdc++-v3/include/experimental/set
+++ b/libstdc++-v3/include/experimental/set
@@ -50,19 +50,13 @@ inline namespace fundamentals_v2
   typename _Predicate>
 inline void
 erase_if(set<_Key, _Compare, _Alloc>& __cont, _Predicate __pred)
-{
-  _GLIBCXX_STD_C::set<_Key, _Compare, _Alloc>& __c = __cont;
-  std::__detail::__eras

[PATCH] libstdc++: Add wrapper for internal uses of std::terminate

2021-10-08 Thread Jonathan Wakely via Gcc-patches
This adds an inline wrapper for std::terminate that doesn't add the
declaration of std::terminate to namespace std. This allows the
library to terminate without including all of .

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h: Remove unused header.
* include/bits/c++config (std::__terminate): Define.
* include/bits/semaphore_base.h: Remove  and use
__terminate instead of terminate.
* include/bits/std_thread.h: Likewise.
* libsupc++/eh_terminate.cc (std::terminate): Use qualified-id
to call __cxxabiv1::__terminate.


This avoids including a few thousand lines of  just for one
function declaration. Any objections or better ideas?
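The wrapper relies on a block-scope function declaration: inside a function defined in namespace std, the declaration of terminate() refers to std::terminate, yet the name is visible only inside that block, so users do not see it without including the header. Since std::terminate cannot be exercised without ending the program, here is a hypothetical analogue of the same trick with an ordinary function (demo, wrapper and real_impl are illustrative names).

```cpp
#include <cassert>

namespace demo
{
  // Analogue of std::__terminate: the block-scope declaration names
  // demo::real_impl (the innermost enclosing namespace), so this wrapper
  // needs no header declaring real_impl, and the name is not introduced
  // at namespace scope by the wrapper.
  inline int
  wrapper () noexcept
  {
    int real_impl () noexcept;
    return real_impl ();
  }

  // In the library the real function lives elsewhere (cf. std::terminate
  // in libsupc++); it is defined here only so the sketch links.
  int
  real_impl () noexcept
  {
    return 42;
  }
}
```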


commit 79cd06f3072fd8a68b8636f7d11bb42e62eaa6fa
Author: Jonathan Wakely 
Date:   Fri Oct 8 11:35:53 2021

libstdc++: Add wrapper for internal uses of std::terminate

This adds an inline wrapper for std::terminate that doesn't add the
declaration of std::terminate to namespace std. This allows the
library to terminate without including all of .

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h: Remove unused header.
* include/bits/c++config (std::__terminate): Define.
* include/bits/semaphore_base.h: Remove  and use
__terminate instead of terminate.
* include/bits/std_thread.h: Likewise.
* libsupc++/eh_terminate.cc (std::terminate): Use qualified-id
to call __cxxabiv1::__terminate.

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 64c1ba62a3e..efbe3da8b6c 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -40,7 +40,6 @@
 #include 
 
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-#include  // std::terminate
 #include 
 #endif
 
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 69343a25533..b76ffeb2562 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -293,6 +293,15 @@ namespace std
 #if __cplusplus >= 201103L
   typedef decltype(nullptr)nullptr_t;
 #endif
+
+  // This allows the library to terminate without including all of 
+  // and without making the declaration of std::terminate visible to users.
+  __attribute__ ((__noreturn__, __always_inline__))
+  inline void __terminate() _GLIBCXX_USE_NOEXCEPT
+  {
+void terminate() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__noreturn__));
+terminate();
+  }
 }
 
 #define _GLIBCXX_USE_DUAL_ABI
diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index 2c8d7576894..afd636704e8 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -40,7 +40,6 @@
 #endif // __cpp_lib_atomic_wait
 
 #ifdef _GLIBCXX_HAVE_POSIX_SEMAPHORE
-# include   // std::terminate
 # include  // errno, EINTR, EAGAIN etc.
 # include// SEM_VALUE_MAX
 # include // sem_t, sem_init, sem_wait, sem_post etc.
@@ -80,7 +79,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (__err && (errno == EINTR))
continue;
  else if (__err)
-   std::terminate();
+   std::__terminate();
  else
break;
}
@@ -97,7 +96,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  else if (__err && (errno == EAGAIN))
return false;
  else if (__err)
-   std::terminate();
+   std::__terminate();
  else
break;
}
@@ -111,7 +110,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
   auto __err = sem_post(&_M_semaphore);
   if (__err)
-std::terminate();
+std::__terminate();
}
 }
 
@@ -138,7 +137,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  else if (errno == ETIMEDOUT || errno == EINVAL)
return false;
  else
-   std::terminate();
+   std::__terminate();
}
  else
break;
diff --git a/libstdc++-v3/include/bits/std_thread.h b/libstdc++-v3/include/bits/std_thread.h
index 2a500bf1777..801033b00ad 100644
--- a/libstdc++-v3/include/bits/std_thread.h
+++ b/libstdc++-v3/include/bits/std_thread.h
@@ -35,7 +35,6 @@
 #if __cplusplus >= 201103L
 #include 
 
-#include// std::terminate
 #include   // std::basic_ostream
 #include// std::tuple
 #include  // std::hash
@@ -149,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 ~thread()
 {
   if (joinable())
-   std::terminate();
+   std::__terminate();
 }
 
 thread(const thread&) = delete;
@@ -162,7 +161,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 thread& operator=(thread&& __t) noexcept
 {
   if (joinable())
-   std::terminate();
+   std::__terminate();
   swap(__t);
   return *this;
 }
diff --git a/libstdc++-v3/libsupc++/e

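The std::__terminate helper in the c++config hunk above relies on a block-scope declaration. std::terminate is declared inside the body of the always-inline function, so headers can call it without making a namespace-scope declaration visible to users. Below is a minimal C analogue of that shape. hidden_target and call_hidden_target are hypothetical names. Unlike the real helper, the sketch returns a value instead of being noreturn, purely so that it can be tested.

```c
#include <assert.h>

/* Hypothetical wrapper with the same shape as std::__terminate: the target
   function is declared only at block scope, so code that sees this wrapper
   does not also get a file-scope declaration of the target.  */
static inline int call_hidden_target (int x)
{
  int hidden_target (int);  /* block-scope declaration, visible only here */
  return hidden_target (x);
}

/* Hypothetical definition; in libstdc++ the real target (std::terminate)
   lives in the compiled library, not in the header.  */
int hidden_target (int x)
{
  assert (x >= 0);
  return 2 * x;
}
```

Code placed before the file-scope definition has no declaration of hidden_target in scope. It can reach the function only through the wrapper, which is the property the libstdc++ change relies on.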
[PATCH] regcprop: Determine subreg offset depending on endianness [PR101260]

2021-10-08 Thread Stefan Schulze Frielinghaus via Gcc-patches
gcc/ChangeLog:

* regcprop.c (maybe_mode_change): Determine offset relative to
high or low part depending on endianness.

Bootstrapped and regtested on IBM Z. Ok for mainline and gcc-{11,10,9}?
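The patch chooses between the lowpart and highpart offset helpers depending on endianness. The difference between the two can be seen in the simplified model below. It assumes that words and bytes share one endianness; the real GCC functions also handle WORDS_BIG_ENDIAN != BYTES_BIG_ENDIAN. The names lowpart_offset and highpart_offset are hypothetical stand-ins for subreg_size_lowpart_offset and subreg_size_highpart_offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model: byte offset of an OUTER_SIZE-byte piece
   inside an INNER_SIZE-byte value, assuming uniform endianness.  */
static unsigned lowpart_offset (unsigned outer_size, unsigned inner_size,
                                bool big_endian)
{
  assert (outer_size <= inner_size);
  /* The least significant bytes sit at the end on big-endian targets.  */
  return big_endian ? inner_size - outer_size : 0;
}

static unsigned highpart_offset (unsigned outer_size, unsigned inner_size,
                                 bool big_endian)
{
  assert (outer_size <= inner_size);
  /* The most significant bytes sit at the start on big-endian targets.  */
  return big_endian ? 0 : inner_size - outer_size;
}
```

For the same sizes, each helper returns 0 on one endianness and inner_size - outer_size on the other. That is why picking the wrong one moves the subreg to the opposite end of the register.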

---
 gcc/regcprop.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index d2a01130fe1..0e1ac12458a 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -414,9 +414,14 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
copy_nregs, &bytes_per_reg))
return NULL_RTX;
   poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs);
-  poly_uint64 offset
-   = subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset,
- GET_MODE_SIZE (orig_mode));
+  poly_uint64 offset =
+#if WORDS_BIG_ENDIAN
+   subreg_size_highpart_offset
+#else
+   subreg_size_lowpart_offset
+#endif
+   (GET_MODE_SIZE (new_mode) + copy_offset,
+GET_MODE_SIZE (orig_mode));
   regno += subreg_regno_offset (regno, orig_mode, offset, new_mode);
   if (targetm.hard_regno_mode_ok (regno, new_mode))
return gen_raw_REG (new_mode, regno);
-- 
2.31.1



Re: [PATCH 3/4] ipa-cp: Fix updating of profile counts and self-gen value evaluation

2021-10-08 Thread Jan Hubicka
> For non-local nodes which can have unknown callers, the algorithm just
> takes half of the counts - we may decide that taking just a third or
> some other portion is more reasonable, but I do not think we can
> attempt anything more clever.

Can't you just sum the calling edges and subtract it from callee's
count?
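The reviewer's alternative to taking a fixed half of the counts is to sum the counts on the known calling edges and subtract that sum from the callee's count. It can be sketched with plain integers standing in for GCC's profile_count. unknown_callers_count is a hypothetical name, and the sketch assumes that every known edge carries a usable IPA count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the count attributable to unknown callers is the
   callee's count minus the sum of its known incoming edge counts,
   clamped at zero because real profiles can be inconsistent.  */
static uint64_t
unknown_callers_count (uint64_t callee_count,
                       const uint64_t *edge_counts, size_t n_edges)
{
  uint64_t known = 0;
  assert (edge_counts != NULL || n_edges == 0);
  for (size_t i = 0; i < n_edges; i++)
    known += edge_counts[i];
  return known >= callee_count ? 0 : callee_count - known;
}
```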
> 2021-08-23  Martin Jambor  
> 
>   * ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
>   n_nonrec_calls and itself, document all fields.
>   (init_caller_stats): Initialize the above new fields.
>   (gather_caller_stats): Gather self-recursive counts and calls number.
>   (get_info_about_necessary_edges): Gather counts of self-recursive and
>   other edges bringing in the requested value separately.
>   (dump_profile_updates): Rework to dump info about a single node only.
>   (lenient_count_portion_handling): New function.
>   (struct gather_other_count_struct): New type.
>   (gather_count_of_non_rec_edges): New function.
>   (struct desc_incoming_count_struct): New type.
>   (analyze_clone_icoming_counts): New function.
>   (adjust_clone_incoming_counts): Likewise.
>   (update_counts_for_self_gen_clones): Likewise.
>   (update_profiling_info): Rewritten.
>   (update_specialized_profile): Adjust call to dump_profile_updates.
>   (create_specialized_node): Do not update profiling info.
>   (decide_about_value): New parameter self_gen_clones, either push new
>   clones into it or update their profile counts.  For self-recursively
>   generated values, use a portion of the node count instead of count
>   from self-recursive edges to estimate goodness.
>   (decide_whether_version_node): Gather clones for self-generated values
>   in a new vector, update their profiles at once at the end.
> ---
>  gcc/ipa-cp.c | 543 +++
>  1 file changed, 457 insertions(+), 86 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index b987d975793..53cca7aa804 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -701,20 +701,36 @@ ipcp_versionable_function_p (struct cgraph_node *node)
>  
>  struct caller_statistics
>  {
> +  /* If requested (see below), self-recursive call counts are summed into this
> + field.  */
> +  profile_count rec_count_sum;
> +  /* The sum of all ipa counts of all the other (non-recursive) calls.  */
>profile_count count_sum;
> +  /* Sum of all frequencies for all calls.  */
>sreal freq_sum;
> +  /* Number of calls and hot calls respectively.  */
>int n_calls, n_hot_calls;
> +  /* If itself is set up, also count the number of non-self-recursive
> + calls.  */
> +  int n_nonrec_calls;
> +  /* If non-NULL, this is the node itself and calls from it should have their
> + counts included in rec_count_sum and not count_sum.  */
> +  cgraph_node *itself;
>  };
>  
> +/* With partial train run we do not want to assume that original's count is
> +   zero whenever we redirect all executed edges to clone.  Simply drop profile
> +   to local one in this case.  In any case, return the new value.  ORIG_NODE
> +   is the original node and its count has not been updated yet.  */
> +
> +profile_count
> +lenient_count_portion_handling (profile_count remainder, cgraph_node *orig_node)
> +{
> +  if (remainder.ipa_p () && !remainder.ipa ().nonzero_p ()
> +  && orig_node->count.ipa_p () && orig_node->count.ipa ().nonzero_p ()
> +  && opt_for_fn (orig_node->decl, flag_profile_partial_training))
> +remainder = remainder.guessed_local ();

I do not think you need the partial training flag here.  You should be able
to see that the IPA profile is missing by simply testing the ipa_p predicate
on the relevant counts.
> +
> +/* If caller edge counts of a clone created for a self-recursive arithmetic jump
> +   function must be adjusted, do so. NODE is the node or its thunk.  */

I would add a comment on why it needs to be adjusted and how.
> +
> +static void
> +adjust_clone_incoming_counts (cgraph_node *node,
> +   desc_incoming_count_struct *desc)
> +{
> +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> +if (cs->caller->thunk)
> +  {
> + adjust_clone_incoming_counts (cs->caller, desc);
> + profile_count sum = profile_count::zero ();
> + for (cgraph_edge *e = cs->caller->callers; e; e = e->next_caller)
> +   if (e->count.initialized_p ())
> + sum += e->count.ipa ();
> + cs->count = cs->count.combine_with_ipa_count (sum);
> +  }
> +else if (!desc->processed_edges->contains (cs)
> +  && cs->caller->clone_of == desc->orig)
> +  {
> + cs->count += desc->count;
> + if (dump_file)
> +   {
> + fprintf (dump_file, "   Adjusted count of an incoming edge of "
> +  "a clone %s -> %s to ", cs->caller->dump_name (),
> +  cs->callee->dump_name ());
> + cs->count.dump (dump_file);
> + fprintf 

[PATCH][pushed] opts: include missing header files.

2021-10-08 Thread Martin Liška

gcc/objc/ChangeLog:

* objc-next-runtime-abi-01.c: Add missing include.
* objc-next-runtime-abi-02.c: Likewise.
---
 gcc/objc/objc-next-runtime-abi-01.c | 1 +
 gcc/objc/objc-next-runtime-abi-02.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/objc/objc-next-runtime-abi-01.c b/gcc/objc/objc-next-runtime-abi-01.c
index 17c86189923..12f8bdc0b9c 100644
--- a/gcc/objc/objc-next-runtime-abi-01.c
+++ b/gcc/objc/objc-next-runtime-abi-01.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "objc-act.h"
+#include "opts.h"
 
 /* When building Objective-C++, we are not linking against the C
front-end and so need to replicate the C tree-construction
diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index 677b75f0334..7ca0fd7cf00 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "target.h"
 #include "tree-iterator.h"
+#include "opts.h"
 
 #include "objc-runtime-hooks.h"
 #include "objc-runtime-shared-support.h"
--
2.33.0



[r12-4240 Regression] FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2 on Linux/x86_64

2021-10-08 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2b8453c401b699ed93c085d0413ab4b5030bcdb8 is the first bad commit
commit 2b8453c401b699ed93c085d0413ab4b5030bcdb8
Author: liuhongt 
Date:   Mon Sep 6 13:48:49 2021 +0800

Enable auto-vectorization at O2 with very-cheap cost model.

caused

FAIL: libgomp.c++/scan-10.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-11.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-11.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-12.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-12.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-13.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-13.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-14.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-14.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-15.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-15.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-16.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-16.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-17.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-18.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-19.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-20.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-21.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c/scan-22.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2

with GCC configured with

../../gcc/configure \
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4240/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-11.C --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-11.C --target_board=

[Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm-cpus.in: Define new feature pacbti.
* config/arm/arm.h (TARGET_HAVE_PACBTI): New.
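TARGET_HAVE_PACBTI in the arm.h hunk requires two things: the architecture must be Armv8.1-M Mainline, and the optional pacbti ISA bit must be set (for example by -march=armv8.1-m.main+pacbti). The sketch below reproduces that two-part test. The enum values, the isa_bitmap type and the helper names are hypothetical; GCC generates its real isa_bit_* enumerators from arm-cpus.in and uses its own bitmap routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bit numbers standing in for GCC's isa_bit_*.  */
enum isa_feature { ISA_MVE, ISA_MVE_FP, ISA_PACBTI, ISA_NUM_BITS };

typedef unsigned isa_bitmap;  /* enough bits for this toy enum */

static bool isa_bit_p (isa_bitmap isa, enum isa_feature bit)
{
  assert (bit < ISA_NUM_BITS);
  return (isa >> bit) & 1u;
}

/* Same shape as TARGET_HAVE_PACBTI: architecture check AND feature bit.  */
static bool have_pacbti (bool arch8_1m_main, isa_bitmap isa)
{
  return arch8_1m_main && isa_bit_p (isa, ISA_PACBTI);
}
```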
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d0d0d0f1c7e4176fc4aa30d82394fe938b083a59..8a0e9c79682766ee2bec3fd7ba6ed67dff69dbad 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -223,6 +223,10 @@ define feature cdecp5
 define feature cdecp6
 define feature cdecp7
 
+# M-profile control flow integrity extensions (PAC/AUT/BTI).
+# Optional from Armv8.1-M Mainline.
+define feature pacbti
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -741,6 +745,7 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add MVE
  option mve.fp add MVE_FP
+ option pacbti add pacbti
  option cdecp0 add cdecp0
  option cdecp1 add cdecp1
  option cdecp2 add cdecp2
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c15346f1bea59d70fdcb1d19545473b23b..8e6ef41f6b065217d1af3f4f1cb85b2d8fbd0dc0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -335,6 +335,12 @@ emission of floating point pcs attributes.  */
isa_bit_mve_float) \
   && !TARGET_GENERAL_REGS_ONLY)
 
+/* Non-zero if this target supports Armv8.1-M Mainline pointer-signing
+   extension.  */
+#define TARGET_HAVE_PACBTI (arm_arch8_1m_main \
+   && bitmap_bit_p (arm_active_target.isa, \
+isa_bit_pacbti))
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All


[Patch 2/7, Arm, GCC] Add option -mbranch-protection.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

Add -mbranch-protection option and its associated parsing routines.
This option enables the code-generation of pointer signing and
authentication instructions in function prologues and epilogues.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* common/config/arm/arm-common.c
(arm_print_hint_for_pacbti_option): New.
(arm_progress_next_token): New.
(arm_parse_pac_ret_clause): New routine for parsing the
pac-ret clause for -mbranch-protection.
(arm_parse_pacbti_option): New routine to parse all the options
to -mbranch-protection.
* config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
* config/arm/arm.c (arm_configure_build_target): Handle option
to -mbranch-protection.
* config/arm/arm.opt (mbranch-protection): New.
(arm_enable_pacbti): New.
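The parsing routines below build a bit mask: 0x1 = bti, 0x2 = pac-ret with the A key, 0x4 = B key (which clears the A-key bit), 0x8 = leaf. "standard" maps to 0x3 and "none" to 0. The sketch below is a hypothetical, simplified parser for the same grammar ("pac-ret[+leaf][+b-key][+bti] | bti[+pac-ret[+leaf][+b-key]]") that produces the same encoding. It is meant to clarify the bit layout. It is not a drop-in replacement for arm_parse_pacbti_option, and it emits no diagnostics.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Bit encoding as used by the patch's parser.  */
#define PB_BTI   0x1u  /* bti */
#define PB_PAC_A 0x2u  /* pac-ret, A key */
#define PB_PAC_B 0x4u  /* b-key (replaces the A-key bit) */
#define PB_LEAF  0x8u  /* leaf */

/* Hypothetical simplified parser: returns true and stores the mask in *OUT
   on success.  "leaf" and "b-key" are accepted only inside a pac-ret
   clause, and each token may appear at most once.  */
static bool parse_branch_protection (const char *arg, unsigned *out)
{
  assert (arg != NULL && out != NULL);
  if (strcmp (arg, "none") == 0) { *out = 0; return true; }
  if (strcmp (arg, "standard") == 0) { *out = PB_BTI | PB_PAC_A; return true; }

  unsigned mask = 0;
  bool in_pac = false;  /* inside a pac-ret clause? */
  const char *tok = arg;
  for (;;)
    {
      const char *end = strchr (tok, '+');
      size_t len = end ? (size_t) (end - tok) : strlen (tok);

      if (len == 3 && strncmp (tok, "bti", 3) == 0 && !(mask & PB_BTI))
        {
          mask |= PB_BTI;
          in_pac = false;  /* bti closes any pac-ret clause */
        }
      else if (len == 7 && strncmp (tok, "pac-ret", 7) == 0
               && !(mask & (PB_PAC_A | PB_PAC_B)))
        {
          mask |= PB_PAC_A;
          in_pac = true;
        }
      else if (len == 4 && strncmp (tok, "leaf", 4) == 0 && in_pac
               && !(mask & PB_LEAF))
        mask |= PB_LEAF;
      else if (len == 5 && strncmp (tok, "b-key", 5) == 0 && in_pac
               && !(mask & PB_PAC_B))
        mask = (mask & ~PB_PAC_A) | PB_PAC_B;
      else
        return false;  /* unknown, repeated, misplaced or empty token */

      if (!end)
        break;
      tok = end + 1;
    }
  *out = mask;
  return true;
}
```

The masks this produces for the options used in the pacbti-m-predef tests are consistent with the macro values those tests expect once the mask is split into its BTI and PAC parts.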
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index de898a74165db4d7250aa0097dfab682beb0f99c..188feebb15b52f389d5d0b3ec322be3017efd5a0 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -475,6 +475,156 @@ arm_parse_arch_option_name (const arch_option *list, const char *optname,
   return NULL;
 }
 
+static void
+arm_print_hint_for_pacbti_option ()
+{
+  const char *s = "pac-ret[+leaf][+b-key][+bti]"
+ " | bti[+pac-ret[+leaf][+b-key]]";
+  inform (input_location, "valid arguments are: %s", s);
+}
+
+/* Progress *E to end of next token delimited by DELIMITER.
+   Cache old *E in *OE.  */
+static void
+arm_progress_next_token (const char **oe, const char **e,
+size_t *l, const char delimiter)
+{
+  *oe = *e + 1;
+  *e = strchr (*oe, delimiter);
+  *l = *e ? *e - *oe : strlen (*oe);
+}
+
+/* Parse options to -mbranch-protection.  */
+static const char*
+arm_parse_pac_ret_clause (const char *pacret, const char *optname,
+ unsigned int *pacbti)
+{
+  const char *old_end = NULL;
+  const char *end = strchr (pacret, '+');
+  size_t len = end ? end - pacret : strlen (pacret);
+  if (len == 7 && strncmp (pacret, "pac-ret", len) == 0)
+{
+  *pacbti |= 2;
+  if (end != NULL)
+   {
+ /* pac-ret+...  */
+ arm_progress_next_token (&old_end, &end, &len, '+');
+ if (len == 4 && strncmp (old_end, "leaf", len) == 0)
+   {
+ *pacbti |= 8;
+ if (end != NULL)
+   {
+ /* pac-ret+leaf+...  */
+ arm_progress_next_token (&old_end, &end, &len, '+');
+ if (len == 5 && strncmp (old_end, "b-key", len) == 0)
+   {
+ /* Clear bit for A-key.  */
+ *pacbti &= 0xfffd;
+ *pacbti |= 4;
+ /* A non-NULL end indicates it is pointing to a '+'.
+Advance it to point to the next option in the string.  */
+ if (end != NULL)
+   end++;
+   }
+ else
+   /* This could be 'bti', leave it to caller to parse.  */
+   end = old_end;
+   }
+   }
+ else if (len == 5 && strncmp (old_end, "b-key", len) == 0)
+   {
+ /* Clear bit for A-key.  */
+ *pacbti &= 0xfffd;
+ *pacbti |= 4;
+ if (end != NULL)
+   {
+ /* pac-ret+b-key+...  */
+ arm_progress_next_token (&old_end, &end, &len, '+');
+ if (len == 4 && strncmp (old_end, "leaf", len) == 0)
+   {
+ *pacbti |= 8;
+ /* A non-NULL end indicates it is pointing to a '+'.
+Advance it to point to the next option in the string.  */
+ if (end != NULL)
+   end++;
+   }
+ else
+   /* This could be 'bti', leave it to caller to parse.  */
+   end = old_end;
+   }
+   }
+ else
+   {
+ /* This could be a 'bti' option, so leave it to the caller to
+parse.  Fall through to the return.  */
+ end = old_end;
+   }
+   }
+}
+  else
+{
+  error_at (input_location, "unrecognized %s argument: %s", optname, pacret);
+  arm_print_hint_for_pacbti_option ();
+  return NULL;
+}
+
+  return end;
+}
+
+unsigned int
+arm_parse_pacbti_option (const char *pacbti, const char *optname, bool complain)
+{
+  unsigned int enable_pacbti = 0;
+  const char *end = strchr (pacbti, '+');
+  size_t len = end ? end - pacbti : strlen (pacbti);
+
+  if (strcmp (pacbti, "none") == 0)
+return 0;
+
+  if (strcmp (pacbti, "standard") == 0)
+return 0x3;
+
+  if (len == 3 && strncmp (pacbti, "bti", len) == 0)
+{
+  /* bti+...  */
+  

[Patch 3/7, Arm, GCC] Add testsuite library support for PACBTI target.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

Add target-checking entities for PACBTI to the testsuite
framework.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_pacbti_hw): New.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9ebca7ac007363d2a35158bb80092118f629b97b..323541c2da527e3da5dce4d85cadcb2068d9bb5c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5064,6 +5064,22 @@ proc check_effective_target_arm_cmse_clear_ok {} {
 } "-mcmse"];
 }
 
+# Return 1 if the target supports executing PACBTI instructions, 0
+# otherwise.
+
+proc check_effective_target_arm_pacbti_hw {} {
+return [check_runtime arm_pacbti_hw_available {
+   __attribute__ ((naked)) int
+   main (void)
+   {
+ asm ("pac r12, lr, sp");
+ asm ("mov r0, #0");
+ asm ("autg r12, lr, sp");
+ asm ("bx lr");
+   }
+} ""]
+}
+
 # Return 1 if this compilation turns on string_ops_prefer_neon on.
 
 proc check_effective_target_arm_tune_string_ops_prefer_neon { } {


[Patch 4/7, Arm, GCC] Implement target feature macros for PACBTI.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch implements target feature macros when PACBTI is
enabled through the -march option or -mbranch-protection.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-4.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-5.c: New test.
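arm_cpu_builtins derives both ACLE macros from the -mbranch-protection mask. __ARM_FEATURE_BTI_DEFAULT is mask & 0x1, and __ARM_FEATURE_PAC_DEFAULT is mask >> 1. After the shift, bit 0 means the A key, bit 1 the B key and bit 2 leaf functions. The hypothetical helpers below reproduce that arithmetic. The test masks are those that the -mbranch-protection parser in patch 2 builds for the options used in pacbti-m-predef-2.c, -4.c and -5.c.

```c
#include <assert.h>

/* Hypothetical helpers mirroring the two builtin_define_with_int_value
   calls in arm_cpu_builtins.  */
static unsigned bti_default (unsigned mask)
{
  return mask & 0x1u;  /* __ARM_FEATURE_BTI_DEFAULT */
}

static unsigned pac_default (unsigned mask)
{
  assert (mask <= 0xfu);  /* only four bits are defined by the parser */
  return mask >> 1;      /* __ARM_FEATURE_PAC_DEFAULT */
}
```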
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..00dc1c2f13f2023c2ba8d7b03038a4cdde068ef6 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -193,6 +193,17 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  cpp_undef (pfile, "__ARM_FEATURE_BTI_DEFAULT");
+  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
+  if (TARGET_HAVE_PACBTI)
+{
+  builtin_define_with_int_value ("__ARM_FEATURE_BTI_DEFAULT",
+arm_enable_pacbti & 0x1);
+  builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT",
+arm_enable_pacbti >> 1);
+}
+
+
   cpp_undef (pfile, "__ARM_FEATURE_MVE");
   if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
 {
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
new file mode 100644
index ..7e8cdb2c5fc74dd22085fcac1f692229300a333a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
@@ -0,0 +1,16 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=bti+pac-ret+b-key+leaf" } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 6)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-4.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-4.c
new file mode 100644
index ..41fdcf91a8ab789d055407ae3f8c151984660ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-4.c
@@ -0,0 +1,16 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+b-key" } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 0)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 2)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-5.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-5.c
new file mode 100644
index ..9527c9620a3a5c973b47a5f364ae290d975358c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-5.c
@@ -0,0 +1,16 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=bti+pac-ret+leaf" } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 5)
+__builtin_abort ();
+
+  return 0;
+}


[Patch 5/7, Arm, GCC] Add pointer authentication for stack-unwinding runtime.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch adds pointer authentication to stack unwinding when an
exception is taken.  All the changes are to the runtime code in libgcc's
unwinder for the Arm target.  They are guarded by
defined (__ARM_FEATURE_PAC_DEFAULT) and are active only if the +pacbti
feature is enabled for the architecture.  This means that enabling the
target feature via -march or -mcpu is sufficient; -mbranch-protection
need not be enabled.  This ensures that the unwinder authenticates only
when the PACBTI instructions are available in the non-NOP space, since
it uses AUTG.  Merely generating PAC/AUT instructions with
-mbranch-protection does not enable authentication in the unwinder.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
new pseudo register class _UVRSC_PAC.
* libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
with AUTG if found.
* libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
(phase1_vrs): Introduce new field to store pseudo-reg state.
(phase2_vrs): Likewise.
(_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
(_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
(_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.
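In __gnu_unwind_execute, the new 0xb4 test has to come before the existing obsolete-FPA check, because (op & 0xfc) == 0xb4 also matches 0xb4 (the FPA check covers 0xb4-0xb7). The hypothetical classifier below shows that ordering. The pac_enabled parameter models the #if defined(__ARM_FEATURE_PAC_DEFAULT) guard as a runtime flag. The enum and function names are illustrative and do not appear in libgcc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of one EHABI opcode byte in 0xb0-0xbf.  */
enum op_kind { OP_POP_PAC, OP_OBSOLETE_FPA, OP_OTHER };

static enum op_kind classify_op (unsigned char op, bool pac_enabled)
{
  assert (op >= 0xb0 && op <= 0xbf);
  if (pac_enabled && op == 0xb4)
    return OP_POP_PAC;       /* pop RA_AUTH_CODE into the PAC pseudo-register */
  if ((op & 0xfc) == 0xb4)
    return OP_OBSOLETE_FPA;  /* 0xb4-0xb7: the unwinder returns _URC_FAILURE */
  return OP_OTHER;
}
```

With the checks in the opposite order, 0xb4 would always be rejected as an FPA opcode, and the popped PAC would never be authenticated with AUTG at CODE_FINISH.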
diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h
index 79f107d8abb2dd1e2d4903531db47147da63fee8..903c0d22e4a7bf41d806842e030a4ad532fb835f 100644
--- a/gcc/ginclude/unwind-arm-common.h
+++ b/gcc/ginclude/unwind-arm-common.h
@@ -127,7 +127,10 @@ extern "C" {
   _UVRSC_VFP = 1,   /* vfp */
   _UVRSC_FPA = 2,   /* fpa */
   _UVRSC_WMMXD = 3, /* Intel WMMX data register */
-  _UVRSC_WMMXC = 4  /* Intel WMMX control register */
+  _UVRSC_WMMXC = 4, /* Intel WMMX control register */
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+  _UVRSC_PAC = 5/* Armv8.1-M Mainline PAC/AUTH pseudo-register */
+#endif
 }
   _Unwind_VRS_RegClass;
 
diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
index 7525e35b4918d38b4ab3ae73a69b722e31b4b322..ff45f3c6e08a8df64011c0e3a5f5dd1677b3ed11 100644
--- a/libgcc/config/arm/pr-support.c
+++ b/libgcc/config/arm/pr-support.c
@@ -106,6 +106,9 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
 {
   _uw op;
   int set_pc;
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+  int set_pac = 0;
+#endif
   _uw reg;
 
   set_pc = 0;
@@ -114,6 +117,22 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
   op = next_unwind_byte (uws);
   if (op == CODE_FINISH)
{
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+ /* When we reach end, we have to authenticate R12 we just popped earlier.  */
+ if (set_pac)
+   {
+ _uw sp;
+ _uw lr;
+ _uw pac;
+ _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, &sp);
+ _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, &lr);
+ _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
+  _UVRSD_UINT32, &pac);
+ __asm__ __volatile__
+   ("autg %0, %1, %2" : : "r"(pac), "r"(lr), "r"(sp) :);
+   }
+#endif
+
  /* If we haven't already set pc then copy it from lr.  */
  if (!set_pc)
{
@@ -227,6 +246,19 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
return _URC_FAILURE;
  continue;
}
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+ /* Pop PAC off the stack into VRS pseudo.pac.  */
+ if (op == 0xb4)
+   {
+ if (_Unwind_VRS_Pop (context, _UVRSC_PAC, 0, _UVRSD_UINT32)
+ != _UVRSR_OK)
+   return _URC_FAILURE;
+ set_pac = 1;
+ continue;
+   }
+
+#endif
+
  if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
return _URC_FAILURE;
 
diff --git a/libgcc/config/arm/unwind-arm.c b/libgcc/config/arm/unwind-arm.c
index d0394019c3649f2f6d6a2882389e55b56c21b8ef..6e6eb808d70dd1f6d68ec3c5bf0cd3978cc1166b 100644
--- a/libgcc/config/arm/unwind-arm.c
+++ b/libgcc/config/arm/unwind-arm.c
@@ -64,6 +64,14 @@ struct wmmxc_regs
   _uw wc[4];
 };
 
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+/*  Holds value of pseudo registers eg. PAC.  */
+struct pseudo_regs
+{
+  _uw pac;
+};
+#endif
+
 /* The ABI specifies that the unwind routines may only use core registers,
except when actually manipulating coprocessor state.  This allows
us to write one implementation that works on all platforms by
@@ -78,6 +86,11 @@ typedef struct
   /* The first fields must be the same as a phase2_vrs.  */
   _uw demand_save_flags;
   struct core_regs 

[Patch 6/7, Arm, GCC] Emit build attributes for PACBTI target feature.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch emits assembler directives for PACBTI build attributes
as defined by the ABI. 
(https://github.com/ARM-software/abi-aa/releases/download/2021Q1/addenda32.pdf)

Tested on arm-none-eabi.

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_file_start): Emit EABI attributes for
Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-3.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-6.c: New test.
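arm_file_start selects the attribute values as follows. Tags 50 and 52 get 2 when TARGET_HAVE_PACBTI holds and 1 when only -mbranch-protection is given. As we read the ABI addenda, 1 means NOP-space use only and 2 means the full instruction set. Tag 74 gets mask & 0x1, and tag 76 gets (mask >> 1) != 0. The model below is hypothetical (the struct and function names are illustrative). The test values come from the scan-assembler lines in pacbti-m-predef-1.c, -3.c and -6.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of what arm_file_start would emit.  */
struct pacbti_attrs
{
  bool emit;          /* whether the four attributes are emitted at all */
  int pac_extension;  /* Tag_PAC_extension, 50 */
  int bti_extension;  /* Tag_BTI_extension, 52 */
  int bti_use;        /* Tag_BTI_use, 74 */
  int pacret_use;     /* Tag_PACRET_use, 76 */
};

static struct pacbti_attrs
pacbti_build_attrs (bool have_pacbti, unsigned mask)
{
  struct pacbti_attrs a = { false, 0, 0, 0, 0 };
  assert (mask <= 0xfu);
  if (!have_pacbti && mask == 0)
    return a;  /* neither +pacbti nor -mbranch-protection: nothing emitted */
  a.emit = true;
  a.pac_extension = a.bti_extension = have_pacbti ? 2 : 1;
  a.bti_use = (int) (mask & 0x1u);
  a.pacret_use = (mask >> 1) != 0;
  return a;
}
```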
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1f939a6b79a90430abf120e0aa075dfc1fab29a8..557aae371e2707cb8db569ce033242a139b64e86 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28305,6 +28305,27 @@ arm_file_start (void)
arm_emit_eabi_attribute ("Tag_ABI_FP_16bit_format", 38,
 (int) arm_fp16_format);
 
+  if (TARGET_HAVE_PACBTI)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 2);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 2);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74, arm_enable_pacbti & 0x1);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76,
+  (arm_enable_pacbti >> 1 != 0));
+   }
+  else
+   {
+ if (arm_enable_pacbti != 0)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 1);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 1);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74,
+  arm_enable_pacbti & 0x1);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76,
+  (arm_enable_pacbti >> 1 != 0));
+   }
+   }
+
   if (arm_lang_output_object_attributes_hook)
arm_lang_output_object_attributes_hook();
 }
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
new file mode 100644
index ..de9102be3f293605d0891c45cd247be9cf8bd00b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
@@ -0,0 +1,22 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+bti --save-temps" } */
+
+/* { dg-final { scan-assembler "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 1)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
new file mode 100644
index ..6ecdf2f7411e5d44a5304681032d0841d965b49c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
@@ -0,0 +1,21 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+b-key+leaf --save-temps" } */
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 0" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 0)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 6)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c
new file mode 100644
index ..2340bf0f937b7ea68a02500b66f151f0ce3f39b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c
@@ -0,0 +1,21 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=bti --save-temps" } */
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 0" } } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 0)
+__builtin_abort ();
+
+  return 0;
+}
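The three tests above expect `__ARM_FEATURE_PAC_DEFAULT` to be 1, 6 and 0 for `pac-ret+bti`, `pac-ret+b-key+leaf` and `bti` respectively. Here is a minimal C++ sketch of how those values decode, assuming the ACLE bit layout (bit 0: A key, bit 1: B key, bit 2: leaf functions also signed). The `PacDefault` struct and the `decode_pac_default` helper are hypothetical and are not part of ACLE or GCC.

```cpp
// Hypothetical decoded view of __ARM_FEATURE_PAC_DEFAULT, assuming the ACLE
// layout: bit 0 = sign with the A key, bit 1 = sign with the B key,
// bit 2 = leaf functions are signed too.
struct PacDefault {
  bool a_key;
  bool b_key;
  bool leaf;
};

constexpr PacDefault decode_pac_default(int value) {
  return PacDefault{(value & 1) != 0, (value & 2) != 0, (value & 4) != 0};
}

#ifdef __ARM_FEATURE_PAC_DEFAULT
// Only available when compiling with return-address signing enabled.
constexpr PacDefault kThisBuild = decode_pac_default(__ARM_FEATURE_PAC_DEFAULT);
#endif
```

With that layout, 6 reads as "B key, leaf functions included" and 0 as "no return-address signing", which fits the BTI-only options of the third test.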


[Patch 7/7, Arm, GCC] Introduce multilibs for PACBTI target feature.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch adds a multilib for the PACBTI target feature.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/t-rmprofile: Add multilib rules for +pacbti.
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index a6036bf0a5191a3cac3bfbe2329783204d5c3ef4..241bf1939e30ae7890ae332556d33759f538ced5 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,8 @@
 
 # Arch and FPU variants to build libraries with
 
-MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve
+MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve/march=armv8.1-m.main+pacbti
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve v8.1-m.main+pacbti
 
 # Base M-profile (no fp)
 MULTILIB_REQUIRED  += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -36,6 +36,7 @@ MULTILIB_REQUIRED += mthumb/march=armv7-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.base/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.main/mfloat-abi=soft
+MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+pacbti/mfloat-abi=soft
 
 # ARMv7e-M with FP (single and double precision variants)
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m+fp/mfloat-abi=hard
@@ -93,3 +94,4 @@ MULTILIB_MATCHES  += march?armv8-m.main=mlibarch?armv8-m.main
 MULTILIB_MATCHES   += march?armv8-m.main+fp=mlibarch?armv8-m.main+fp
 MULTILIB_MATCHES   += march?armv8-m.main+fp.dp=mlibarch?armv8-m.main+fp.dp
 MULTILIB_MATCHES   += march?armv8.1-m.main+mve=mlibarch?armv8.1-m.main+mve
+MULTILIB_MATCHES   += march?armv8.1-m.main+pacbti=mlibarch?armv8.1-m.main+pacbti
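The patch grows `MULTI_ARCH_OPTS_RM` and `MULTI_ARCH_DIRS_RM` in lockstep. Assuming, as their names suggest, that they feed `MULTILIB_OPTIONS` and `MULTILIB_DIRNAMES` elsewhere in `t-rmprofile`, GCC expects one directory name per option, so the two lists must stay the same length. The small C++ sketch below performs that consistency check on the post-patch values. `count_fields` is a hypothetical helper, not part of the GCC build machinery.

```cpp
#include <cstddef>

// Counts the non-empty fields of `s` separated by `sep`.
constexpr std::size_t count_fields(const char *s, char sep) {
  std::size_t n = 0;
  bool in_field = false;
  for (; *s != '\0'; ++s) {
    if (*s == sep) {
      in_field = false;
    } else if (!in_field) {
      in_field = true;
      ++n;
    }
  }
  return n;
}

// The two lists as they read after the patch.
constexpr char kArchOpts[] =
    "march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/"
    "march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/"
    "march=armv8-m.main+fp/march=armv8-m.main+fp.dp/"
    "march=armv8.1-m.main+mve/march=armv8.1-m.main+pacbti";
constexpr char kArchDirs[] =
    "v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main "
    "v8-m.main+fp v8-m.main+dp v8.1-m.main+mve v8.1-m.main+pacbti";

// One directory name per option.
static_assert(count_fields(kArchOpts, '/') == count_fields(kArchDirs, ' '),
              "MULTI_ARCH_OPTS_RM and MULTI_ARCH_DIRS_RM must line up");
```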


Re: [PATCH] libsanitizer: Add AM_CCASFLAGS to Makefile.am

2021-10-08 Thread H.J. Lu via Gcc-patches
On Wed, Oct 6, 2021 at 11:25 AM H.J. Lu  wrote:
>
> commit 9069eb28d45baaa8baf5e3790b03b0e2cc5b49b3
> Author: Igor Tsimbalist 
> Date:   Fri Nov 17 22:34:50 2017 +0100
>
> Enable building libsanitizer with Intel CET
>
> libsanitizer/
> * acinclude.m4: Add enable.m4 and cet.m4.
> * Makefile.in: Regenerate.
> * asan/Makefile.am: Update AM_CXXFLAGS.
> * asan/Makefile.in: Regenerate.
> * configure: Likewise.
> * configure.ac: Set CET_FLAGS. Update EXTRA_CFLAGS,
> EXTRA_CXXFLAGS, EXTRA_ASFLAGS.
> * interception/Makefile.am: Update AM_CXXFLAGS.
> * interception/Makefile.in: Regenerate.
> * libbacktrace/Makefile.am: Update AM_CFLAGS, AM_CXXFLAGS.
> * libbacktrace/Makefile.in: Regenerate.
> * lsan/Makefile.am: Update AM_CXXFLAGS.
> * lsan/Makefile.in: Regenerate.
> * sanitizer_common/Makefile.am: Update AM_CXXFLAGS,
> AM_CCASFLAGS.
> * sanitizer_common/sanitizer_linux_x86_64.S: Include cet.h.
> Add _CET_ENDBR macro.
> * sanitizer_common/Makefile.in: Regenerate.
> * tsan/Makefile.am: Update AM_CXXFLAGS.
> * tsan/Makefile.in: Regenerate.
> * tsan/tsan_rtl_amd64.S: Include cet.h. Add _CET_ENDBR macro.
> * ubsan/Makefile.am: Update AM_CXXFLAGS.
> * ubsan/Makefile.in: Regenerate.
>
> failed to add EXTRA_ASFLAGS to AM_CCASFLAGS in all Makefile.am files.  As
> a result, CET isn't enabled in all assembly code.
>
> Add AM_CCASFLAGS to Makefile.am to compile assembly code with $CET_FLAGS.
>
> PR sanitizer/102632
> * asan/Makefile.am (AM_CCASFLAGS): New.  Set to $(EXTRA_ASFLAGS).
> * hwasan/Makefile.am (AM_CCASFLAGS): Likewise.
> * interception/Makefile.am (AM_CCASFLAGS): Likewise.
> * lsan/Makefile.am (AM_CCASFLAGS): Likewise.
> * tsan/Makefile.am (AM_CCASFLAGS): Likewise.
> * ubsan/Makefile.am (AM_CCASFLAGS): Likewise.
> * asan/Makefile.in: Regenerate.
> * hwasan/Makefile.in: Likewise.
> * interception/Makefile.in: Likewise.
> * lsan/Makefile.in: Likewise.
> * tsan/Makefile.in: Likewise.
> * ubsan/Makefile.in: Likewise.
> ---
>  libsanitizer/asan/Makefile.am | 1 +
>  libsanitizer/asan/Makefile.in | 1 +
>  libsanitizer/hwasan/Makefile.am   | 1 +
>  libsanitizer/hwasan/Makefile.in   | 1 +
>  libsanitizer/interception/Makefile.am | 1 +
>  libsanitizer/interception/Makefile.in | 1 +
>  libsanitizer/lsan/Makefile.am | 1 +
>  libsanitizer/lsan/Makefile.in | 1 +
>  libsanitizer/tsan/Makefile.am | 1 +
>  libsanitizer/tsan/Makefile.in | 1 +
>  libsanitizer/ubsan/Makefile.am| 1 +
>  libsanitizer/ubsan/Makefile.in| 1 +
>  12 files changed, 12 insertions(+)
>
> diff --git a/libsanitizer/asan/Makefile.am b/libsanitizer/asan/Makefile.am
> index 74658ca7b9c..4f802f723d6 100644
> --- a/libsanitizer/asan/Makefile.am
> +++ b/libsanitizer/asan/Makefile.am
> @@ -11,6 +11,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long
>  AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
>  AM_CXXFLAGS += -std=gnu++14
>  AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
> +AM_CCASFLAGS = $(EXTRA_ASFLAGS)
>  ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
>
>  toolexeclib_LTLIBRARIES = libasan.la
> diff --git a/libsanitizer/asan/Makefile.in b/libsanitizer/asan/Makefile.in
> index 53efe526f9c..528ab61312c 100644
> --- a/libsanitizer/asan/Makefile.in
> +++ b/libsanitizer/asan/Makefile.in
> @@ -421,6 +421,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic \
> -fomit-frame-pointer -funwind-tables -fvisibility=hidden \
> -Wno-variadic-macros -fno-ipa-icf \
> $(LIBSTDCXX_RAW_CXX_CXXFLAGS) -std=gnu++14 $(EXTRA_CXXFLAGS)
> +AM_CCASFLAGS = $(EXTRA_ASFLAGS)
>  ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
>  toolexeclib_LTLIBRARIES = libasan.la
>  nodist_toolexeclib_HEADERS = libasan_preinit.o
> diff --git a/libsanitizer/hwasan/Makefile.am b/libsanitizer/hwasan/Makefile.am
> index cfc1bfe8f01..e12c0a0ce71 100644
> --- a/libsanitizer/hwasan/Makefile.am
> +++ b/libsanitizer/hwasan/Makefile.am
> @@ -8,6 +8,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long
>  AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
>  AM_CXXFLAGS += -std=gnu++14
>  AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
> +AM_CCASFLAGS = $(EXTRA_ASFLAGS)
>  ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
>
>  toolexeclib_LTLIBRARIES = libhwasan.la
> diff --git a/libsanitizer/hwasan/Makefile.in b/libsanitizer/hwasan/Makefile.in
> index f63670b50d1..1729349e682 100644
> --- a/libsanitizer/hwasan/Makefile.in
> +++ b/libsanitizer/hwasan/Makefile.in
> @@ -409,6 +409,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedant

[committed] libstdc++: Reduce header dependencies of in C++20 [PR 92546]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
The  header doesn't need the stream and
streambuf iterators, so don't include the whole of .

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/bits/ranges_algobase.h: Replace  with a
subset of the headers it includes.

Tested powerpc64le-linux. Committed to trunk.

commit a1fc4075fcdf028f2e1dc00ce515a947127e2667
Author: Jonathan Wakely 
Date:   Thu Apr 8 10:01:08 2021

libstdc++: Reduce header dependencies of  in C++20 [PR 92546]

The  header doesn't need the stream and
streambuf iterators, so don't include the whole of .

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/bits/ranges_algobase.h: Replace  with a
subset of the headers it includes.

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
index cfbac839749..c8c4d032983 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -33,7 +33,9 @@
 #if __cplusplus > 201703L
 
 #include 
-#include 
+#include 
+#include 
+#include 
 #include  // ranges::begin, ranges::range etc.
 #include   // __invoke
 #include  // __is_byte


Re: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander

2021-10-08 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Thu, Oct 07, 2021 at 11:16:24AM +0200, Andreas Krebbel wrote:
> On 9/20/21 11:24, Stefan Schulze Frielinghaus wrote:
> > This patch implements the rawmemchr expander as introduced in
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html
> > 
> > Bootstrapped and regtested in conjunction with the patch from above on
> > IBM Z.  Ok for mainline?
> > 
> 
> > From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
> > From: Stefan Schulze Frielinghaus 
> > Date: Mon, 8 Feb 2021 10:35:39 +0100
> > Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander
> >
> > gcc/ChangeLog:
> >
> > * config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
> > (s390_rawmemchrhi): Add prototype.
> > (s390_rawmemchrsi): Add prototype.
> > * config/s390/s390.c (s390_rawmemchr): New function.
> > (s390_rawmemchrqi): New function.
> > (s390_rawmemchrhi): New function.
> > (s390_rawmemchrsi): New function.
> > * config/s390/s390.md (rawmemchr): New expander.
> > (rawmemchr): New expander.
> > * config/s390/vector.md (vec_vfees): Basically a copy of
> > the pattern vfees from vx-builtins.md.
> > * config/s390/vx-builtins.md (*vfees): Remove.
> 
> Thanks! Would it make sense to also extend the strlen and movstr expanders
> we have to support the additional character modes?

For strlen-like loops over non-character arrays, the current
implementation in the loop distribution pass uses rawmemchr and
computes the pointer difference to obtain the length.  Thus we get
strlen for free and don't need to reimplement it.
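The C++ sketch below illustrates the point above by placing a strlen-like loop over 16-bit elements next to the equivalent rawmemchr-plus-pointer-difference form. `rawmemchr16` is a hypothetical stand-in written in plain C++. glibc's `rawmemchr` only searches for bytes, and this is not the code the loop distribution pass emits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical rawmemchr over 16-bit elements: returns a pointer to the first
// element equal to `c`.  Like rawmemchr it has no length bound, so `c` must
// occur in the array.
constexpr const std::uint16_t *rawmemchr16(const std::uint16_t *p,
                                           std::uint16_t c) {
  while (*p != c)
    ++p;
  return p;
}

// The strlen-like loop as a user might write it.
constexpr std::size_t len16_loop(const std::uint16_t *s) {
  std::size_t n = 0;
  while (s[n] != 0)
    ++n;
  return n;
}

// The same length computed as described above: find the terminator, then
// take the pointer difference.
constexpr std::size_t len16_rawmemchr(const std::uint16_t *s) {
  return static_cast<std::size_t>(rawmemchr16(s, 0) - s);
}
```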

> 
> A few style comments below.
> 
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/s390/rawmemchr-1.c: New test.
> > ---
> >  gcc/config/s390/s390-protos.h   |  4 +
> >  gcc/config/s390/s390.c  | 89 ++
> >  gcc/config/s390/s390.md | 20 +
> >  gcc/config/s390/vector.md   | 26 ++
> >  gcc/config/s390/vx-builtins.md  | 26 --
> >  gcc/testsuite/gcc.target/s390/rawmemchr-1.c | 99 +
> >  6 files changed, 238 insertions(+), 26 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/rawmemchr-1.c
> >
> > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> > index 4b03c6e99f5..0d9619e8254 100644
> > --- a/gcc/config/s390/s390-protos.h
> > +++ b/gcc/config/s390/s390-protos.h
> > @@ -66,6 +66,10 @@ s390_asm_declare_function_size (FILE *asm_out_file,
> > const char *fnname ATTRIBUTE_UNUSED, tree decl);
> >  #endif
> >
> > +extern void s390_rawmemchrqi(rtx dst, rtx src, rtx pat);
> > +extern void s390_rawmemchrhi(rtx dst, rtx src, rtx pat);
> > +extern void s390_rawmemchrsi(rtx dst, rtx src, rtx pat);
> > +
> >  #ifdef RTX_CODE
> >  extern int s390_extra_constraint_str (rtx, int, const char *);
> >  extern int s390_const_ok_for_constraint_p (HOST_WIDE_INT, int, const char *);
> > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> > index 54dd6332c3a..1435ce156e2 100644
> > --- a/gcc/config/s390/s390.c
> > +++ b/gcc/config/s390/s390.c
> > @@ -16559,6 +16559,95 @@ s390_excess_precision (enum excess_precision_type type)
> >  }
> >  #endif
> >
> > +template  > + machine_mode elt_mode,
> > + rtx (*gen_vec_vfees) (rtx, rtx, rtx, rtx)>
> > +static void
> > +s390_rawmemchr(rtx dst, rtx src, rtx pat) {
> 
> I think it would be a bit easier to turn the vec_vfees expander into a
> 'parameterized name' and add the mode as a parameter.  I'll attach a patch
> to illustrate what this might look like.

Right, I didn't know about parameterized names; they look cleaner to
me.  Thanks for the hint!

> 
> > +  rtx lens = gen_reg_rtx (V16QImode);
> > +  rtx pattern = gen_reg_rtx (vec_mode);
> > +  rtx loop_start = gen_label_rtx ();
> > +  rtx loop_end = gen_label_rtx ();
> > +  rtx addr = gen_reg_rtx (Pmode);
> > +  rtx offset = gen_reg_rtx (Pmode);
> > +  rtx tmp = gen_reg_rtx (Pmode);
> > +  rtx loadlen = gen_reg_rtx (SImode);
> > +  rtx matchlen = gen_reg_rtx (SImode);
> > +  rtx mem;
> > +
> > +  pat = GEN_INT (trunc_int_for_mode (INTVAL (pat), elt_mode));
> > +  emit_insn (gen_rtx_SET (pattern, gen_rtx_VEC_DUPLICATE (vec_mode, pat)));
> > +
> > +  emit_move_insn (addr, XEXP (src, 0));
> > +
> > +  // alignment
> > +  emit_insn (gen_vlbb (lens, gen_rtx_MEM (BLKmode, addr), GEN_INT (6)));
> > +  emit_insn (gen_lcbb (loadlen, addr, GEN_INT (6)));
> > +  lens = convert_to_mode (vec_mode, lens, 1);
> > +  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (0)));
> > +  lens = convert_to_mode (V4SImode, lens, 1);
> > +  emit_insn (gen_vec_extractv4sisi (matchlen, lens, GEN_INT (1)));
> > +  lens = convert_to_mode (vec_mode, lens, 1);
> 
> That back and forth NOP conversion stuff is ugly but I couldn't find a
> more elegant way to write this without generating worse code.  Of
> course we want to benefit here from the fact that th

[COMMITTED] Implement irange::debug()

2021-10-08 Thread Aldy Hernandez via Gcc-patches
Tested on x86-64 Linux.

gcc/ChangeLog:

* value-range.cc (irange::debug): New.
* value-range.h (irange::debug): New.
---
 gcc/value-range.cc | 7 +++
 gcc/value-range.h  | 1 +
 2 files changed, 8 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 147c4b04c1d..caef2498959 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1999,6 +1999,13 @@ irange::dump (FILE *file) const
 }
 }
 
+void
+irange::debug () const
+{
+  dump (stderr);
+  fprintf (stderr, "\n");
+}
+
 void
 dump_value_range (FILE *file, const irange *vr)
 {
diff --git a/gcc/value-range.h b/gcc/value-range.h
index ff6c0a6176d..32200ff890f 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -84,6 +84,7 @@ public:
   // Misc methods.
   bool fits_p (const irange &r) { return m_max_ranges >= r.num_pairs (); }
   void dump (FILE * = stderr) const;
+  void debug () const;
 
   // Deprecated legacy public methods.
   enum value_range_kind kind () const; // DEPRECATED
-- 
2.31.1



[COMMITTED] Grow non_null_ref bitmap when num_ssa_names increases.

2021-10-08 Thread Aldy Hernandez via Gcc-patches
The strlen pass changes the IL as it works with the ranger.  This
causes the non_null_ref code to sometimes get asked questions about new
SSA names.

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-cache.cc (non_null_ref::non_null_deref_p): Grow
bitmap if needed.
---
 gcc/gimple-range-cache.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 7d994798e52..9cbc63d8a40 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -61,6 +61,9 @@ non_null_ref::non_null_deref_p (tree name, basic_block bb, bool search_dom)
 return false;
 
   unsigned v = SSA_NAME_VERSION (name);
+  if (v >= m_nn.length ())
+m_nn.safe_grow_cleared (num_ssa_names + 1);
+
   if (!m_nn[v])
 process_name (name);
 
-- 
2.31.1
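Here is a minimal C++ sketch of the lazy-growth pattern the patch applies. In it, `std::vector::resize` (which zero-fills the new `int` slots here) stands in for GCC's `vec::safe_grow_cleared`. The `PerNameCache` class and its `compute` placeholder are hypothetical; they only model how the cache handles an SSA version past its original size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-SSA-name cache, sized when the pass starts.  An entry of
// 0 means "not computed yet", mirroring the `if (!m_nn[v])` test in the patch.
class PerNameCache {
public:
  explicit PerNameCache(std::size_t num_names) : entries_(num_names, 0) {}

  // `num_names_now` plays the role of num_ssa_names, which may have grown
  // since construction because the pass rewrote the IL.
  bool lookup(std::size_t version, std::size_t num_names_now) {
    if (version >= entries_.size())
      entries_.resize(num_names_now + 1, 0);  // new slots start at 0
    assert(version < entries_.size());
    if (entries_[version] == 0)
      entries_[version] = compute(version);
    return entries_[version] == 2;
  }

  std::size_t size() const { return entries_.size(); }

private:
  // Placeholder for process_name: 2 encodes "true", 1 encodes "false".
  static int compute(std::size_t version) { return version % 2 == 0 ? 2 : 1; }

  std::vector<int> entries_;
};
```

Growing to `num_names_now + 1` rather than `version + 1` follows the patch. Presumably this is so that later queries about other new names do not each trigger another reallocation.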



[committed] libstdc++: Detect miscompilation of src/c++11/limits.cc

2021-10-08 Thread Jonathan Wakely via Gcc-patches
Add a #error directive to ensure that the definitions are not compiled
as C++17, which would prevent them being emitted.
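The minimal C++ sketch below shows why the language mode matters. It uses a hypothetical `limits_like` class in place of the real `numeric_limits` specializations. Before C++17, the in-class `static constexpr` member is only a declaration and the out-of-line line is its definition, so the object file provides the symbol. From C++17 on, the member is implicitly an inline variable, so an out-of-line redeclaration is redundant. The variable is then typically emitted only where it is odr-used, which is presumably why `limits.cc` must not be built in that mode.

```cpp
// Hypothetical stand-in for a numeric_limits specialization.
struct limits_like {
  static constexpr int digits = 31;  // in-class declaration with initializer
};

#if !__cpp_inline_variables
// C++11/14: this line is the one definition, so this TU provides the symbol.
constexpr int limits_like::digits;
#endif
// C++17: `digits` is already an inline variable defined in the class; an
// out-of-line redeclaration would add nothing.

// Odr-uses the member by taking its address.
constexpr const int *digits_address() { return &limits_like::digits; }
```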

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* src/c++11/limits.cc: Fail if __cpp_inline_variables is
defined.

Tested powerpc64le-linux. Committed to trunk.

commit e6f6972b5f4711c110fa753c926df49415f230da
Author: Jonathan Wakely 
Date:   Fri Oct 8 14:45:23 2021

libstdc++: Detect miscompilation of src/c++11/limits.cc

Add a #error directive to ensure that the definitions are not compiled
as C++17, which would prevent them being emitted.

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* src/c++11/limits.cc: Fail if __cpp_inline_variables is
defined.

diff --git a/libstdc++-v3/src/c++11/limits.cc b/libstdc++-v3/src/c++11/limits.cc
index 21d4427f10b..585cb8c92db 100644
--- a/libstdc++-v3/src/c++11/limits.cc
+++ b/libstdc++-v3/src/c++11/limits.cc
@@ -29,6 +29,10 @@
 // 18.2.1
 //
 
+#if __cpp_inline_variables
+# error This file must be compiled as C++11 or C++14
+#endif
+
 #include 
 
 namespace std _GLIBCXX_VISIBILITY(default)


Re: [PATCH] Improve integer bit test on atomic builtin return

2021-10-08 Thread H.J. Lu via Gcc-patches
On Fri, Oct 8, 2021 at 12:16 AM Richard Biener  wrote:
>
> On Tue, 5 Oct 2021, H.J. Lu wrote:
>
> > On Tue, Oct 5, 2021 at 3:07 AM Richard Biener  wrote:
> > >
> > > On Mon, 4 Oct 2021, H.J. Lu wrote:
> > >
> > > > commit adedd5c173388ae505470df152b9cb3947339566
> > > > Author: Jakub Jelinek 
> > > > Date:   Tue May 3 13:37:25 2016 +0200
> > > >
> > > > re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')
> > > >
> > > > optimized bit test on atomic builtin return with lock bts/btr/btc.  But
> > > > it works only for unsigned integers since atomic builtins operate on the
> > > > 'uintptr_t' type.  It fails on bool:
> > > >
> > > >   _1 = atomic builtin;
> > > >   _4 = (_Bool) _1;
> > > >
> > > > and signed integers:
> > > >
> > > >   _1 = atomic builtin;
> > > >   _2 = (int) _1;
> > > >   _5 = _2 & (1 << N);
> > > >
> > > > Improve bit test on atomic builtin return by converting:
> > > >
> > > >   _1 = atomic builtin;
> > > >   _4 = (_Bool) _1;
> > > >
> > > > to
> > > >
> > > >   _1 = atomic builtin;
> > > >   _5 = _1 & (1 << 0);
> > > >   _4 = (_Bool) _5;
> > > >
> > > > and converting:
> > > >
> > > >   _1 = atomic builtin;
> > > >   _2 = (int) _1;
> > > >   _5 = _2 & (1 << N);
> > > >
> > > > to
> > > >   _1 = atomic builtin;
> > > >   _6 = _1 & (1 << N);
> > > >   _5 = (int) _6;
> > >
> > > Why not do this last bit with match.pd patterns (and independent on
> > > whether _1 is defined by an atomic builtin)?  For the first suggested
> >
> > The full picture is
> >
> >   _1 = __atomic_fetch_or_* (ptr_6, mask, _3);
> >   _2 = (int) _1;
> >   _5 = _2 & mask;
> >
> > to
> >
> >   _1 = __atomic_fetch_or_* (ptr_6, mask, _3);
> >   _6 = _1 & mask;
> >   _5 = (int) _6;
> >
> > It is useful only if 2 masks are the same.
> >
> > > transform that's likely going to be undone by folding, no?
> > >
> >
> > The bool case is
> >
> >   _1 = __atomic_fetch_or_* (ptr_6, 1, _3);
> >   _4 = (_Bool) _1;
> >
> > to
> >
> >   _1 = __atomic_fetch_or_* (ptr_6, 1, _3);
> >   _5 = _1 & 1;
> >   _4 = (_Bool) _5;
> >
> > Without __atomic_fetch_or_*, the conversion isn't needed.
> > After the conversion, optimize_atomic_bit_test_and will
> > immediately optimize the code sequence to
> >
> >   _6 = .ATOMIC_BIT_TEST_AND_SET (&v, 0, 0, 0);
> >   _4 = (_Bool) _6;
> >
> > and there is nothing to fold after it.
>
> Hmm, I see - so how about instead teaching the code that
> produces the .ATOMIC_BIT_TEST_AND_SET the alternate forms instead
> of doing the intermediate step separately?
>
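For reference, here is a C++ sketch of the source-level pattern behind this thread and of the two GIMPLE orderings quoted above, written with GCC's `__atomic_fetch_or` builtin. The helper names are hypothetical. The sketch only checks that masking before or after the `(int)` conversion yields the same value; it says nothing about which form the optimizer matches. Converting an out-of-range `unsigned` to `int` wraps modulo 2^N in GCC (and is required to from C++20).

```cpp
#include <cassert>
#include <climits>

// Atomically sets bit `n` of *word and reports whether it was already set:
// the unsigned shape that, per the thread, GCC already turns into
// `lock bts` on x86.
bool test_and_set_bit(unsigned *word, int n) {
  assert(n >= 0 && n < static_cast<int>(sizeof(unsigned) * CHAR_BIT));
  unsigned mask = 1u << n;
  return (__atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

// `_2 = (int) _1; _5 = _2 & mask;`: convert the fetched value, then mask.
int mask_after_cast(unsigned fetched, int mask) {
  return static_cast<int>(fetched) & mask;
}

// `_6 = _1 & mask; _5 = (int) _6;`: mask the unsigned value, then convert.
int mask_before_cast(unsigned fetched, int mask) {
  return static_cast<int>(fetched & static_cast<unsigned>(mask));
}
```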

The old algorithm is

1.  Check gimple forms.  Return if the form isn't supported.
2.  Do transformation.

My current approach treats the gimple forms accepted by the
old algorithm as canonical forms and changes the algorithm
to

1.  If gimple forms aren't canonical, then
   a. If gimple forms can't be transformed to canonical forms,
   return;
   b. Transform to canonical form.
   endif
2.  Check gimple forms.  Return if the form isn't supported.
3.  Do transformation.

The #2 check is redundant when gimple forms have been
transformed to canonical forms.

I can change my patch to

1.  If gimple forms aren't canonical, then
   a. If gimple forms can't be transformed to canonical forms,
   return;
   b. Transform to canonical form.
else
  Check gimple forms. Return if the form isn't supported.
endif
2.  Do transformation.

The advantage of canonical forms is that we don't have to
transform all different forms.

Does it sound OK?

Thanks.

-- 
H.J.


Re: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander

2021-10-08 Thread Andreas Krebbel via Gcc-patches
On 10/8/21 16:23, Stefan Schulze Frielinghaus wrote:
> On Thu, Oct 07, 2021 at 11:16:24AM +0200, Andreas Krebbel wrote:
>> On 9/20/21 11:24, Stefan Schulze Frielinghaus wrote:
>>> This patch implements the rawmemchr expander as introduced in
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html
>>>
>>> Bootstrapped and regtested in conjunction with the patch from above on
>>> IBM Z.  Ok for mainline?
>>>
>>
>>> From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
>>> From: Stefan Schulze Frielinghaus 
>>> Date: Mon, 8 Feb 2021 10:35:39 +0100
>>> Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
>>> (s390_rawmemchrhi): Add prototype.
>>> (s390_rawmemchrsi): Add prototype.
>>> * config/s390/s390.c (s390_rawmemchr): New function.
>>> (s390_rawmemchrqi): New function.
>>> (s390_rawmemchrhi): New function.
>>> (s390_rawmemchrsi): New function.
>>> * config/s390/s390.md (rawmemchr): New expander.
>>> (rawmemchr): New expander.
>>> * config/s390/vector.md (vec_vfees): Basically a copy of
>>> the pattern vfees from vx-builtins.md.
>>> * config/s390/vx-builtins.md (*vfees): Remove.
>>
>> Thanks! Would it make sense to also extend the strlen and movstr expanders
>> we have to support the additional character modes?
> 
> For strlen-like loops over non-character arrays, the current
> implementation in the loop distribution pass uses rawmemchr and
> computes the pointer difference to obtain the length.  Thus we get
> strlen for free and don't need to reimplement it.

Good to know. Thanks!

...
> Please find a new version attached.  I did another bootstrap+regtest on
> IBM Z.  Ok for mainline?
> 
> Thanks for your detailed review!

Ok for mainline. Thanks!

Andreas


[PATCH] Convert strlen pass from evrp to ranger.

2021-10-08 Thread Aldy Hernandez via Gcc-patches
The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

No additional cleanups have been done.  For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf pass still passes around
pairs of integers instead of using a proper range.  Fixing this
could further improve these passes.

As a further enhancement, if the relevant maintainers deem it useful,
the domwalk could be removed from strlen, unless the pass needs it
for something else.

With ranger we are now able to remove the range calculation from
before_dom_children entirely.  Querying the ranger on demand catches
all the strlen and sprintf testcases except builtin-sprintf-warn-22.c,
which fails due to a limitation of the sprintf code.  I have XFAILed the
test and documented what the problem is.

It looks like the same problem in the sprintf test triggers a false
positive in gimple-ssa-warn-access.cc, so I have added
-Wno-format-overflow until it can be fixed.

I can expand on the false positive if necessary, but the gist is that
this:

_17 = strlen (_132);
_18 = strlen (_136);
_19 = _18 + _17;
if (_19 > 75)
  goto ; [0.00%]
else
  goto ; [100.00%]

...dominates the sprintf in BB61.  This means that ranger can figure
out that _17 and _18 are each in [0, 75].  On the other hand, evrp
returned a range of [0, 9223372036854775805], which the sprintf code
presumably ignored, avoiding the false positive here:

  char sizstr[80];
  ...
  ...
  char *s1 = print_generic_expr_to_str (sizrng[1]);
  gcc_checking_assert (strlen (s0) + strlen (s1)
   < sizeof sizstr - 4);
  sprintf (sizstr, "[%s, %s]", s0, s1);

The warning triggers with:

gimple-ssa-warn-access.cc: In member function ‘void {anonymous}::pass_waccess::maybe_check_access_sizes(rdwr_map*, tree, tree, gimple*)’:
gimple-ssa-warn-access.cc:2916:32: warning: ‘%s’ directive writing up to 75 bytes into a region of size between 2 and 77 [-Wformat-overflow=]
 2916 |   sprintf (sizstr, "[%s, %s]", s0, s1);
  |^~
gimple-ssa-warn-access.cc:2916:23: note: ‘sprintf’ output between 5 and 155 bytes into a destination of size 80
 2916 |   sprintf (sizstr, "[%s, %s]", s0, s1);
  |   ^~~~
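The bounds in the note can be reproduced by hand. The format `"[%s, %s]"` contributes 4 literal characters, plus the terminating NUL. The small C++ sketch below works through that arithmetic, assuming the warning adds the two per-argument maxima of [0, 75]. The function and constant names are hypothetical.

```cpp
#include <cstddef>

// Bytes written by sprintf (buf, "[%s, %s]", s0, s1), including the NUL:
// '[', ", " and ']' are 4 literal characters.
constexpr std::size_t output_size(std::size_t len0, std::size_t len1) {
  return 4 + len0 + len1 + 1;
}

// Each strlen is known only to lie in [0, 75] on its own.
constexpr std::size_t kMinOutput = output_size(0, 0);
constexpr std::size_t kMaxIndependent = output_size(75, 75);

// The assert bounds the sum: strlen (s0) + strlen (s1) < 80 - 4, i.e. <= 75.
// Any split with len0 + len1 == 75 gives the same worst case.
constexpr std::size_t kMaxJoint = output_size(75, 0);
```

Treating the two lengths independently yields the 5 to 155 bytes from the note. The joint constraint caps the output at exactly `sizeof sizstr`, which is why the warning is a false positive.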

On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.

Bootstrapped and regtested on x86-64 Linux.  I also ran it through our
callgrind harness and there was no change in overall
compilation time.

OK?

gcc/ChangeLog:

* Makefile.in: Disable -Wformat-overflow for
gimple-ssa-warn-access.o.
* tree-ssa-strlen.c (compare_nonzero_chars): Pass statement
context to ranger.
(get_addr_stridx): Same.
(get_stridx): Same.
(get_range_strlen_dynamic): Same.
(handle_builtin_strlen): Same.
(handle_builtin_strchr): Same.
(handle_builtin_strcpy): Same.
(maybe_diag_stxncpy_trunc): Same.
(handle_builtin_stxncpy_strncat): Same.
(handle_builtin_memcpy): Same.
(handle_builtin_strcat): Same.
(handle_alloc_call): Same.
(handle_builtin_memset): Same.
(handle_builtin_string_cmp): Same.
(handle_pointer_plus): Same.
(count_nonzero_bytes_addr): Same.
(count_nonzero_bytes): Same.
(handle_store): Same.
(fold_strstr_to_strncmp): Same.
(handle_integral_assign): Same.
(check_and_optimize_stmt): Same.
(class strlen_dom_walker): Replace evrp with ranger.
(strlen_dom_walker::before_dom_children): Remove evrp.
(strlen_dom_walker::after_dom_children): Remove evrp.

gcc/cp/ChangeLog:

* ptree.c (cxx_print_xnode): Add more space to pfx array.

gcc/fortran/ChangeLog:

* misc.c (gfc_dummy_typename): Make sure ts->kind is
non-negative.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: XFAIL.
---
 gcc/Makefile.in   |   1 +
 gcc/cp/ptree.c|   2 +-
 gcc/fortran/misc.c|   2 +-
 .../gcc.dg/tree-ssa/builtin-sprintf-warn-22.c |  13 +-
 gcc/tree-ssa-strlen.c | 145 ++
 5 files changed, 92 insertions(+), 71 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f36ffa4740b..dfd2a40e80a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -222,6 +222,7 @@ libgcov-merge-tool.o-warn = -Wno-error
 gimple-match.o-warn = -Wno-unused
 generic-match.o-warn = -Wno-unused
 dfp.o-warn = -Wno-strict-aliasing
+gimple-ssa-warn-access.o-warn = -Wno-format-overflow
 
 # All warnings have to be shut off in stage1 if the compiler used then
 # isn't gcc; configure determines that.  WARN_CFLAGS will b

Re: [SVE] [gimple-isel] PR93183 - SVE does not use neg as conditional

2021-10-08 Thread Richard Sandiford via Gcc-patches
Thanks for looking at this.

Prathamesh Kulkarni  writes:
> Hi,
> As mentioned in PR, for the following test-case:
>
> typedef unsigned char uint8_t;
>
> static inline uint8_t
> x264_clip_uint8(uint8_t x)
> {
>   uint8_t t = -x;
>   uint8_t t1 = x & ~63;
>   return (t1 != 0) ? t : x;
> }
>
> void
> mc_weight(uint8_t *restrict dst, uint8_t *restrict src, int n)
> {
>   for (int x = 0; x < n*16; x++)
> dst[x] = x264_clip_uint8(src[x]);
> }
>
> -O3 -mcpu=generic+sve generates following code for the inner loop:
>
> .L3:
> ld1bz0.b, p0/z, [x1, x2]
> movprfx z2, z0
> and z2.b, z2.b, #0xc0
> movprfx z1, z0
> neg z1.b, p1/m, z0.b
> cmpeq   p2.b, p1/z, z2.b, #0
> sel z0.b, p2, z0.b, z1.b
> st1bz0.b, p0, [x0, x2]
> add x2, x2, x4
> whilelo p0.b, w2, w3
> b.any   .L3
>
> The sel is redundant since we could conditionally negate z0 based on
> the predicate comparing z2 with 0.
>
> As suggested in the PR, the attached patch introduces a new
> conditional internal function .COND_NEG, and in gimple-isel replaces
> the following sequence:
>op2 = -op1
>op0 = A cmp B
>lhs = op0 ? op1 : op2
>
> with:
>op0 = A inverted_cmp B
>lhs = .COND_NEG (op0, op1, op1).
>
> lhs = .COND_NEG (op0, op1, op1)
> implies
> lhs = neg (op1) if cond is true OR fall back to op1 if cond is false.
>
> With patch, it generates the following code-gen:
> .L3:
> ld1bz0.b, p0/z, [x1, x2]
> movprfx z1, z0
> and z1.b, z1.b, #0xc0
> cmpne   p1.b, p2/z, z1.b, #0
> neg z0.b, p1/m, z0.b
> st1bz0.b, p0, [x0, x2]
> add x2, x2, x4
> whilelo p0.b, w2, w3
> b.any   .L3
>
> While it seems to work for this test-case, I am not entirely sure if
> the patch is correct. Is it heading in the right direction?
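Here is a scalar C++ sketch of the proposed rewrite, assuming `.COND_NEG (cond, a, else)` yields `-a` where `cond` holds and `else` otherwise, as described above. `clip_original` models `x264_clip_uint8`; `cond_neg` and `clip_cond_neg` are hypothetical helpers. The exhaustive check over all 256 inputs only shows that inverting the comparison and dropping the select preserves the result.

```cpp
#include <cstdint>

// Scalar model of the quoted x264_clip_uint8.
constexpr std::uint8_t clip_original(std::uint8_t x) {
  std::uint8_t t = static_cast<std::uint8_t>(-x);
  std::uint8_t t1 = static_cast<std::uint8_t>(x & ~63);
  return (t1 != 0) ? t : x;
}

// Scalar model of .COND_NEG (cond, a, else_val).
constexpr std::uint8_t cond_neg(bool cond, std::uint8_t a,
                                std::uint8_t else_val) {
  return cond ? static_cast<std::uint8_t>(-a) : else_val;
}

// Rewritten form: op0 = A cmp B selected x, so the inverted comparison
// selects the negation and the separate select disappears.
constexpr std::uint8_t clip_cond_neg(std::uint8_t x) {
  bool op0 = (x & ~63) == 0;    // A cmp B: true where the select picks x
  return cond_neg(!op0, x, x);  // lhs = .COND_NEG (inverted op0, x, x)
}

// Checks that both forms agree on every uint8_t input.
constexpr bool forms_agree() {
  for (int x = 0; x < 256; ++x)
    if (clip_original(static_cast<std::uint8_t>(x)) !=
        clip_cond_neg(static_cast<std::uint8_t>(x)))
      return false;
  return true;
}
```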

For binary ops we use match.pd rather than isel:

(for uncond_op (UNCOND_BINARY)
 cond_op (COND_BINARY)
 (simplify
  (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
&& is_truth_type_for (op_type, TREE_TYPE (@0)))
(view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))
 (simplify
  (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
&& is_truth_type_for (op_type, TREE_TYPE (@0)))
(view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))

I think it'd be good to do the same here, using new (UN)COND_UNARY
iterators.  (The iterators will only have one value to start with,
but other unary ops could get the same treatment in future.)

Richard


>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 38e90933c3e..5b0dd3c1993 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -39,6 +39,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "optabs.h"
>  #include "gimple-fold.h"
>  #include "internal-fn.h"
> +#include "fold-const.h"
> +#include "tree-pretty-print.h"
>  
>  /* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls to
> internal function based on vector type of selected expansion.
> @@ -203,6 +205,35 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
> return new_stmt;
>   }
>  
> +   /* Replace:
> +  op2 = -op1
> +  op0 = A cmp B
> +  lhs = op0 ? op1 : op2
> +
> +  with:
> +  op0 = A inverted_cmp B
> +  lhs = .COND_NEG (op0, op1, op1).  */
> +
> +   gassign *op1_def = nullptr;
> +   if (TREE_CODE (op1) == SSA_NAME)
> + op1_def = static_cast (SSA_NAME_DEF_STMT (op1));
> +
> +   gassign *op2_def = nullptr;
> +   if (TREE_CODE (op2) == SSA_NAME)
> + op2_def = static_cast (SSA_NAME_DEF_STMT (op2));
> +
> +   if (can_compute_op0 && op1_def && op2_def
> +   && gimple_assign_rhs_code (op2_def) == NEGATE_EXPR
> +   && operand_equal_p (gimple_assign_rhs1 (op2_def), op1, 0))
> + {
> +   auto inverted_code
> + = invert_tree_comparison (gimple_assign_rhs_code (def_stmt), true);
> +   gimple_assign_set_rhs_code (def_stmt, inverted_code);
> +   auto gsi2 = gsi_for_stmt (op2_def);
> +   gsi_remove (&gsi2, true);
> +   return gimple_build_call_internal (IFN_COND_NEG, 3, op0, op1, op1);
> + }
> +
> if (can_compute_op0
> && used_vec_cond_exprs >= 2
> && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 78db25bbac4..b57c7a4ed3e 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -3877,7 +3877,8 @@ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
> 

Re: [PATCH]AArch64[RFC] Force complicated constant to memory when beneficial

2021-10-08 Thread Richard Sandiford via Gcc-patches
Catching up on backlog, sorry for the very late response:

Tamar Christina  writes:
> Hi All,
>
> Consider the following case
>
> #include 
>
> uint64_t
> test4 (uint8x16_t input)
> {
> uint8x16_t bool_input = vshrq_n_u8(input, 7);
> poly64x2_t mask = vdupq_n_p64(0x0102040810204080UL);
> poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)bool_input, 0),
>   vgetq_lane_p64(mask, 0));
> poly64_t prodH = vmull_high_p64((poly64x2_t)bool_input, mask);
> uint8x8_t res = vtrn2_u8((uint8x8_t)prodL, (uint8x8_t)prodH);
> return vget_lane_u16((uint16x4_t)res, 3);
> }
>
> which generates (after my CSE patches):
>
> test4:
>   ushr    v0.16b, v0.16b, 7
>   mov     x0, 16512
>   movk    x0, 0x1020, lsl 16
>   movk    x0, 0x408, lsl 32
>   movk    x0, 0x102, lsl 48
>   fmov    d1, x0
>   pmull   v2.1q, v0.1d, v1.1d
>   dup     v1.2d, v1.d[0]
>   pmull2  v0.1q, v0.2d, v1.2d
>   trn2    v2.8b, v2.8b, v0.8b
>   umov    w0, v2.h[3]
>   re
>
> which is suboptimal since the constant is never needed on the genreg side and
> should have been materialized on the SIMD side, since the constant is so big
> that otherwise it requires 5 instructions to create: 4 mov/movk and one fmov.
>
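As a rough illustration of where the four mov/movk instructions come from, the C sketch below counts the non-zero 16-bit chunks of a 64-bit constant, one MOV or MOVK per chunk. This is a simplified model, not GCC's cost logic: the helper name `mov_movk_count` is hypothetical, and the real expander (`aarch64_expand_mov_immediate`) also considers other strategies, such as MOVN-based sequences and bitmask immediates, that this ignores. For 0x0102040810204080 all four chunks are non-zero, and the lowest is 0x4080 = 16512, which matches the `mov x0, 16512` above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified count of MOV/MOVK instructions needed to build VALUE in a
   general register: one per non-zero 16-bit chunk, and at least one.  */
static int
mov_movk_count (uint64_t value)
{
  int n = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((value >> shift) & 0xffff)
      n++;
  return n ? n : 1;
}
```

Adding the trailing `fmov d1, x0` gives the five instructions counted above.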
> The problem is that the choice of which side to materialize the constant on
> can only be made during reload.  We may need an extra register (to hold the
> addressing), so it can't be done after reload.
>
> I have tried to support this with a pattern during reload, but the problem is
> I can't seem to find a way to tell reload it should spill a constant under
> condition x.  Instead I tried with a split which reload selects when the
> condition holds.

If this is still an issue, one thing to try would be to put a "$" before
the "r" in the GPR alternative.  If that doesn't work then yeah,
I think we're out of luck describing this directly.  If "$" does work,
it'd be interesting to see whether "^" does too.

Thanks,
Richard

>
> This has a couple of issues:
>
> 1. The pattern can be expanded late (could be fixed with !reload_completed).
> 2. Because it's split so late, we don't seem to be able to share the anchors
>    for the ADRP.
> 3. Because it's split so late, reload doesn't know about the spill, and so
>    the ADD lo12 isn't pushed into the addressing mode of the LDR.
>
> I don't know how to properly fix these since I think the only way is for
> reload to do the spill properly itself, but in this case not having the
> pattern makes it avoid the mem pattern and pick r <- n instead, followed by
> r -> w.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*movdi_aarch64): Add Dx -> W.
>   * config/aarch64/constraints.md (Dx): New.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> eb8ccd4b97bbd4f0c3ff5791e48cfcfb42ec6c2e..a18886cb65c86daa16baa1691b1718f2d3a1be6c
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1298,8 +1298,8 @@ (define_insn_and_split "*movsi_aarch64"
>  )
>  
>  (define_insn_and_split "*movdi_aarch64"
> -  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m,  r,  r, w,r,w, w")
> - (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,w  ,r  ,r,w, m,m,  r,  r, w,r,w,w")
> + (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,Dx,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
>"(register_operand (operands[0], DImode)
>  || aarch64_reg_or_zero (operands[1], DImode))"
>"@
> @@ -1309,6 +1309,7 @@ (define_insn_and_split "*movdi_aarch64"
> mov\\t%x0, %1
> mov\\t%w0, %1
> #
> +   #
> * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
> ldr\\t%x0, %1
> ldr\\t%d0, %1
> @@ -1321,17 +1322,27 @@ (define_insn_and_split "*movdi_aarch64"
> fmov\\t%d0, %d1
> * return aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);"
> "(CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), DImode))
> -&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
> +&& REG_P (operands[0])
> +&& (GP_REGNUM_P (REGNO (operands[0]))
> + || (can_create_pseudo_p ()
> + && !aarch64_can_const_movi_rtx_p (operands[1], DImode)))"
> [(const_int 0)]
> "{
> -   aarch64_expand_mov_immediate (operands[0], operands[1]);
> +   if (GP_REGNUM_P (REGNO (operands[0])))
> +  aarch64_expand_mov_immediate (operands[0], operands[1]);
> +   else
> +  {
> +rtx mem = force_const_mem (DImode, operands[1]);
> +gcc_assert (mem);
> +emit_move_insn (operands[0], mem);
> +  }
> DONE;
>  }"
>;; The "mov_imm" type for CNTD is just a placeholder.
> -  [(set_attr "type" "mov_reg,mov_reg,mov_r

[PATCH] openmp: Add support for OpenMP 5.1 structured-block-sequences

2021-10-08 Thread Jakub Jelinek via Gcc-patches
Hi!

Related to this is the addition of structured-block-sequence in OpenMP 5.1,
which doesn't change anything for Fortran, but for C/C++ allows multiple
statements, rather than just one (possibly compound) statement, around the
separating directives (section and scan).

The following patch implements that, will commit to trunk if it passes
bootstrap/regtest.
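A minimal C illustration of what OpenMP 5.1 now permits, assuming a GCC with this patch and `-fopenmp` (without `-fopenmp` the pragmas are ignored and the body simply runs sequentially). The function name `fill` is hypothetical. The first section is a sequence of two statements with no enclosing braces; before this change GCC required a single, possibly compound, statement there.

```c
#include <assert.h>

/* Fill OUT from two sections.  The first section is a two-statement
   structured-block-sequence, which OpenMP 5.1 allows without braces.  */
static void
fill (int out[3])
{
  #pragma omp parallel sections
  {
    out[0] = 1;
    out[1] = out[0] + 1;
    #pragma omp section
    out[2] = 3;
  }
}
```

Both statements of the first section run in the same thread, so `out[1]` reliably sees `out[0]`; the second section only writes `out[2]`, so there is no data race.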

I've also made some updates to the OpenMP 5.1 support list in libgomp.texi.

2021-10-08  Jakub Jelinek  

gcc/c/
* c-parser.c (c_parser_omp_structured_block_sequence): New function.
(c_parser_omp_scan_loop_body): Use it.
(c_parser_omp_sections_scope): Likewise.
gcc/cp/
* parser.c (cp_parser_omp_structured_block): Remove disallow_omp_attrs
argument.
(cp_parser_omp_structured_block_sequence): New function.
(cp_parser_omp_scan_loop_body): Use it.
(cp_parser_omp_sections_scope): Likewise.
gcc/testsuite/
* c-c++-common/gomp/sections1.c (foo): Don't expect errors on
multiple statements in between section directive(s).  Add testcases
for invalid no statements in between section directive(s).
* gcc.dg/gomp/sections-2.c (foo): Don't expect errors on
multiple statements in between section directive(s).
* g++.dg/gomp/sections-2.C (foo): Likewise.
* g++.dg/gomp/attrs-6.C (foo): Add testcases for multiple
statements in between section directive(s).
(bar): Add testcases for multiple statements in between scan
directive.
* g++.dg/gomp/attrs-7.C (bar): Adjust expected error recovery.
libgomp/
* libgomp.texi (OpenMP 5.1): Mention implemented support for
structured block sequences in C/C++.  Mention support for
unconstrained/reproducible modifiers on order clause.
Mention partial (C/C++ only) support of extensions to the atomic
construct.  Mention partial (C/C++ on clause only) support of
align/allocator modifiers on allocate clause.
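The termination rule of the new `c_parser_omp_structured_block_sequence` can be summarised with a toy model over a token array. Everything here (`enum tok`, `parse_block_sequence`) is hypothetical and only mirrors the control flow: one statement is always parsed, and more are parsed until a closing brace, end of input, or the separating directive. `TOK_NONE` plays the role of `PRAGMA_NONE`, which the scan code passes after `#pragma omp scan` so that no directive ends the sequence. Every other token counts as one statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this toy model only.  */
enum tok
{
  TOK_NONE,
  TOK_STMT,
  TOK_CLOSE_BRACE,
  TOK_EOF,
  TOK_PRAGMA_SECTION,
  TOK_PRAGMA_SCAN
};

/* Consume one structured-block-sequence starting at TOKS[*POS] and return
   the number of statements in it.  On return *POS indexes the token that
   stopped the sequence.  */
static size_t
parse_block_sequence (const enum tok *toks, size_t *pos, enum tok sep)
{
  size_t nstmts = 1;
  (*pos)++;			/* The mandatory first statement.  */
  for (;;)
    {
      enum tok t = toks[*pos];
      if (t == TOK_CLOSE_BRACE || t == TOK_EOF)
	break;
      if (sep != TOK_NONE && t == sep)
	break;
      (*pos)++;			/* Another statement in the sequence.  */
      nstmts++;
    }
  return nstmts;
}
```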

--- gcc/c/c-parser.c.jj 2021-10-07 12:52:34.923913144 +0200
+++ gcc/c/c-parser.c	2021-10-08 13:56:43.989987499 +0200
@@ -18976,6 +18976,31 @@ c_parser_omp_flush (c_parser *parser)
   c_finish_omp_flush (loc, mo);
 }
 
+/* Parse an OpenMP structured block sequence.  KIND is the corresponding
+   separating directive.  */
+
+static tree
+c_parser_omp_structured_block_sequence (c_parser *parser,
+					enum pragma_kind kind)
+{
+  tree stmt = push_stmt_list ();
+  c_parser_statement (parser, NULL);
+  do
+    {
+      if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
+	break;
+      if (c_parser_next_token_is (parser, CPP_EOF))
+	break;
+
+      if (kind != PRAGMA_NONE
+	  && c_parser_peek_token (parser)->pragma_kind == kind)
+	break;
+      c_parser_statement (parser, NULL);
+    }
+  while (1);
+  return pop_stmt_list (stmt);
+}
+
 /* OpenMP 5.0:
 
scan-loop-body:
@@ -18997,7 +19022,7 @@ c_parser_omp_scan_loop_body (c_parser *p
   return;
 }
 
-  substmt = c_parser_omp_structured_block (parser, NULL);
+  substmt = c_parser_omp_structured_block_sequence (parser, PRAGMA_OMP_SCAN);
   substmt = build2 (OMP_SCAN, void_type_node, substmt, NULL_TREE);
   SET_EXPR_LOCATION (substmt, loc);
   add_stmt (substmt);
@@ -19032,7 +19057,7 @@ c_parser_omp_scan_loop_body (c_parser *p
 error ("expected %<#pragma omp scan%>");
 
   clauses = c_finish_omp_clauses (clauses, C_ORT_OMP);
-  substmt = c_parser_omp_structured_block (parser, NULL);
+  substmt = c_parser_omp_structured_block_sequence (parser, PRAGMA_NONE);
   substmt = build2 (OMP_SCAN, void_type_node, substmt, clauses);
   SET_EXPR_LOCATION (substmt, loc);
   add_stmt (substmt);
@@ -19860,6 +19885,8 @@ c_parser_omp_ordered (c_parser *parser,
  section-directive[opt] structured-block
  section-sequence section-directive structured-block
 
+   OpenMP 5.1 allows structured-block-sequence instead of structured-block.
+
 SECTIONS_LOC is the location of the #pragma omp sections.  */
 
 static tree
@@ -19881,7 +19908,8 @@ c_parser_omp_sections_scope (location_t
 
   if (c_parser_peek_token (parser)->pragma_kind != PRAGMA_OMP_SECTION)
 {
-  substmt = c_parser_omp_structured_block (parser, NULL);
+  substmt = c_parser_omp_structured_block_sequence (parser,
+   PRAGMA_OMP_SECTION);
   substmt = build1 (OMP_SECTION, void_type_node, substmt);
   SET_EXPR_LOCATION (substmt, loc);
   add_stmt (substmt);
@@ -19907,7 +19935,8 @@ c_parser_omp_sections_scope (location_t
  error_suppress = true;
}
 
-  substmt = c_parser_omp_structured_block (parser, NULL);
+  substmt = c_parser_omp_structured_block_sequence (parser,
+   PRAGMA_OMP_SECTION);
   substmt = build1 (OMP_SECTION, void_type_node, substmt);
   SET_EXPR_LOCATION (substmt, loc);
   add_stmt (substmt);
--- gcc/cp/parser.c.jj 

Re: [PATCH]AArch64 Make use of FADDP in simple reductions.

2021-10-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> This is a respin of an older patch which never got upstream reviewed by a
> maintainer.  It's been updated to fit the current GCC codegen.
>
> This patch adds a pattern to support the (F)ADDP (scalar) instruction.
>
> Before the patch, the C code
>
> typedef float v4sf __attribute__((vector_size (16)));
>
> float
> foo1 (v4sf x)
> {
>   return x[0] + x[1];
> }
>
> generated:
>
> foo1:
>   dup s1, v0.s[1]
>   fadd    s0, s1, s0
>   ret
>
> After patch:
> foo1:
>   faddp   s0, v0.2s
>   ret
>
> The double case is now handled by SLP but the remaining cases still need help
> from combine.  I have kept the integer and floating-point patterns separate
> because the integer one only supports V2DI, and sharing it with the float one
> would have required defining a few new iterators for just a single use.
>
> I provide support for the case where both elements are subregs as a separate
> pattern, as there's no way to tell reload that the two registers must be
> equal using just constraints.
>
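For reference, the integer analogue that the V2DI patterns target can be written in C with the GNU vector extension. The function name `addp_lanes` is hypothetical. We would expect, though it has not been verified here, that with the patch an AArch64 build turns this into a single `addp d0, v0.2d`; on other targets the code still compiles and just adds the two lanes.

```c
#include <assert.h>

/* GNU C vector extension: two 64-bit integer lanes (V2DI).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Sum of lane 0 and lane 1: the shape *aarch64_addp_scalarv2di is meant
   to match (lane 1 selected from the vector plus the lowpart lane 0).  */
static long long
addp_lanes (v2di x)
{
  return x[0] + x[1];
}
```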
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (*aarch64_faddp_scalar,
>   *aarch64_addp_scalarv2di, *aarch64_faddp_scalar2,
>   *aarch64_addp_scalar2v2di): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/simd/scalar_faddp.c: New test.
>   * gcc.target/aarch64/simd/scalar_faddp2.c: New test.
>   * gcc.target/aarch64/simd/scalar_addp.c: New test.
>
> Co-authored-by: Tamar Christina 
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 6814dae079c9ff40aaa2bb625432bf9eb8906b73..b49f8b79b11cbb1888c503d9a9384424f44bde05
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3414,6 +3414,70 @@ (define_insn "aarch64_faddp"
>[(set_attr "type" "neon_fp_reduc_add_")]
>  )
>  
> +;; For the case where both operands are a subreg we need to use a
> +;; match_dup since reload cannot enforce that the registers are
> +;; the same with a constraint in this case.
> +(define_insn "*aarch64_faddp_scalar2"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (plus:
> +   (vec_select:
> + (match_operator: 1 "subreg_lowpart_operator"
> +   [(match_operand:VHSDF 2 "register_operand" "w")])
> + (parallel [(match_operand 3 "const_int_operand" "n")]))
> +   (match_dup: 2)))]
> +  "TARGET_SIMD
> +   && ENDIAN_LANE_N (, INTVAL (operands[3])) == 1"
> +  "faddp\t%0, %2.2"
> +  [(set_attr "type" "neon_fp_reduc_add_")]
> +)

The difficulty with using match_dup here is that the first
vec_select operand ought to fold to a REG after reload,
rather than stay as a subreg.  From that POV we're forcing
the generation of non-canonical rtl.

Also…

> +(define_insn "*aarch64_faddp_scalar"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (plus:
> +   (vec_select:
> + (match_operand:VHSDF 1 "register_operand" "w")
> + (parallel [(match_operand 2 "const_int_operand" "n")]))
> +   (match_operand: 3 "register_operand" "1")))]
> +  "TARGET_SIMD
> +   && ENDIAN_LANE_N (, INTVAL (operands[2])) == 1
> +   && SUBREG_P (operands[3]) && !SUBREG_P (operands[1])
> +   && subreg_lowpart_p (operands[3])"
> +  "faddp\t%0, %1.2"
> +  [(set_attr "type" "neon_fp_reduc_add_")]
> +)

…matching constraints don't work reliably between two inputs:
the RA doesn't know how to combine two different inputs into
one input in order to make them match.

Have you tried doing this as a define_peephole2 instead?
That fits this kind of situation better (from an rtl representation
point of view), but peephole2s are admittedly less powerful than combine.

If peephole2s don't work then I think we'll have to provide
a pattern that accepts two distinct inputs and then split the
instruction if the inputs aren't in the same register.  That sounds
a bit ugly though, so it'd be good news if the peephole thing works out.

Thanks,
Richard

> +
> +;; For the case where both operands are a subreg we need to use a
> +;; match_dup since reload cannot enforce that the registers are
> +;; the same with a constraint in this case.
> +(define_insn "*aarch64_addp_scalar2v2di"
> +  [(set (match_operand:DI 0 "register_operand" "=w")
> + (plus:DI
> +   (vec_select:DI
> + (match_operator:DI 1 "subreg_lowpart_operator"
> +   [(match_operand:V2DI 2 "register_operand" "w")])
> + (parallel [(match_operand 3 "const_int_operand" "n")]))
> +   (match_dup:DI 2)))]
> +  "TARGET_SIMD
> +   && ENDIAN_LANE_N (2, INTVAL (operands[3])) == 1"
> +  "addp\t%d0, %2.2d"
> +  [(set_attr "type" "neon_reduc_add_long")]
> +)
> +
> +(define_insn "*aarch64_addp_scalarv2di"
> +  [(set (match_operand:DI 0 "register_operand" "=w")
> + (plus:DI
> +   (vec_select:DI
> + (match_operand:V2DI 1 "register_opera

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-08 Thread Martin Sebor via Gcc-patches

On 10/8/21 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.


Thanks for doing this.  I know I said I'd work on it but I'm still
bogged down in my stage 1 work that's not going so great :(  I just
have a few minor comments/questions on the strlen change (inline)
but I am a bit concerned about the test failure.



No additional cleanups have been done.  For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf pass still passes around
pairs of integers instead of using a proper range.  Fixing this
could further improve these passes.

As a further enhancement, if the relevant maintainers deem useful,
the domwalk could be removed from strlen.  That is, unless the pass
needs it for something else.

With ranger we are now able to remove the range calculation from
before_dom_children entirely.  Just working with the ranger on-demand
catches all the strlen and sprintf testcases with the exception of
builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
code.  I have XFAILed the test and documented what the problem is.


builtin-sprintf-warn-22.c is a regression test for a false positive
in Glibc.  If it fails we'll have to deal with the Glibc failure
again, which I would rather avoid.  Have you checked to see if
Glibc is affected by the change?



It looks like the same problem in the sprintf test triggers a false
positive in gimple-ssa-warn-access.cc so I have added
-Wno-format-overflow until it can be fixed.

I can expand on the false positive if necessary, but the gist is that
this:

 _17 = strlen (_132);
 _18 = strlen (_136);
 _19 = _18 + _17;
 if (_19 > 75)
   goto ; [0.00%]
 else
   goto ; [100.00%]

...dominates the sprintf in BB61.  This means that ranger can figure
out that _17 and _18 are each in [0, 75].  On the other hand, evrp
returned a range of [0, 9223372036854775805] which presumably the
sprintf code was ignoring as a false positive here:


This is a feature designed to avoid false positives when the sprintf
pass doesn't know anything about the strings (i.e., their lengths
are unconstrained by either the sizes of the arrays they're stored
in or any expressions like asserts involving their lengths).

It sounds like the strlen/ranger improvement partially propagates
constraints from subsequent expressions into the strlen results
but it doesn't go far enough for them to actually fully satisfy
the constraint, which is what in turn triggers the warning.

I.e., in the test:

void g (char *s1, char *s2)
{
  char b[1025];
  size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
  if (n + d + 1 >= 1025)
    return;

  sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
}

the range of n and d is [0, INF] and so the sprintf call doesn't
trigger a warning.  With your change, because their range is
[0, 1023] each (and there's no way to express that their sum
is less than 1025), the warning triggers because it considers
the worst case scenario (the upper bounds of both).
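The effect Martin describes can be checked with simple arithmetic. The sketch below is a simplified model, not the sprintf pass's algorithm, and `output_bytes` is a hypothetical helper: it counts the bytes written, including the terminating NUL, for a format with a given number of literal characters and two `%s` directives.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written, including the terminating NUL, by a sprintf format that
   has LITERAL fixed characters and two "%s" directives whose arguments
   have lengths LEN1 and LEN2.  Plugging in independent upper bounds for
   LEN1 and LEN2 gives the worst case a checker would assume.  */
static size_t
output_bytes (size_t literal, size_t len1, size_t len2)
{
  return literal + len1 + len2 + 1;
}
```

For `"%s.%s"`, `output_bytes (1, 1023, 1023)` is 2048, which exceeds the 1025-byte buffer, even though every pair that actually satisfies `n + d + 1 < 1025` needs at most 1025 bytes. The relationship between the two lengths is exactly what independent per-variable ranges cannot express.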



  char sizstr[80];
  ...
  ...
  char *s1 = print_generic_expr_to_str (sizrng[1]);
  gcc_checking_assert (strlen (s0) + strlen (s1)
   < sizeof sizstr - 4);
  sprintf (sizstr, "[%s, %s]", s0, s1);

The warning triggers with:

gimple-ssa-warn-access.cc: In member function ‘void {anonymous}::pass_waccess::maybe_check_access_sizes(rdwr_map*, tree, tree, gimple*)’:
gimple-ssa-warn-access.cc:2916:32: warning: ‘%s’ directive writing up to 75 bytes into a region of size between 2 and 77 [-Wformat-overflow=]
  2916 |   sprintf (sizstr, "[%s, %s]", s0, s1);
   |^~
gimple-ssa-warn-access.cc:2916:23: note: ‘sprintf’ output between 5 and 155 bytes into a destination of size 80
  2916 |   sprintf (sizstr, "[%s, %s]", s0, s1);
   |   ^~~~
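The figures in this warning follow from treating each string as at most 75 bytes, independently of the other. The format `"[%s, %s]"` has four literal characters (`[`, `,`, space, `]`) plus the terminating NUL. Our reading, which is an interpretation rather than something taken from the pass's source, is that the "region of size between 2 and 77" is the space left for the second `%s` once `[`, `s0` and `", "` have been written, with \(\ell_0\) = `strlen (s0)`:

```latex
\begin{aligned}
\text{output}_{\min} &= 4 + 0 + 0 + 1 = 5 \\
\text{output}_{\max} &= 4 + 75 + 75 + 1 = 155 \\
\text{room for the second } \texttt{\%s} &= 80 - (1 + \ell_0 + 2) \in [2, 77]
  \quad \text{for } \ell_0 \in [0, 75]
\end{aligned}
```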



Yes, that does look like the same problem.  It's a side-effect
of the checking_assert.  What's troubling is that it's one that
has exactly the opposite effect of what's intended: it causes
warnings when it's intended to avoid them, which was the main
goal of the strlen/sprintf integration.

Suppressing the warning in these cases, while technically simple,
would be a design change.  We might just have to live with this.
The asserts still work to constrain individual lenghts, they just
won't work for more complex expressions involving relationships
between two or more strings.


On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.


That's good to hear! :)



Bootstrapped and regtested on x86-64 Linux.  I also ran it through our
callgrind harness and there was no change in overall
compilation time.

OK?


...

@@ -269,7 +270,7 @@ com

Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-08 Thread Iain Buclaw via Gcc-patches
Excerpts from Luís Ferreira's message of October 7, 2021 8:29 pm:
> On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
>> 
>> I can help with the autotools part if you can say how precisely you'd
>> like to use them to add address sanitization. And as for the OSS
>> fuzz part, I think someone tried setting up auto-fuzzing for it once,
>> but the main bottleneck was getting the bug reports that it generated
>> properly triaged, so if you could make sure the bug-submitting
>> portion
>> of the process is properly streamlined, that'd probably go a long way
>> towards helping it be useful.
> 
> Bugs are normally reported by email or mailing list. Is there any
> writable mailing list to publish bugs or is it strictly needed to open
> an entry on bugzilla?
> 

Please open an issue on bugzilla, fixes towards it can then be
referenced in the commit message/patch posted here.

Iain.


[committed] LRA: [PR102627] Use at least natural mode during splitting hard reg live range

2021-10-08 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102627

The patch was successfully bootstrapped and tested on x86-64.


commit fab2d977e69539aad9bef81caff17de48e53aedf (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Oct 8 10:16:09 2021 -0400

[PR102627] Use at least natural mode during splitting hard reg live range

In the PR test case SImode was used to split the live range of cx on x86-64
because it was the biggest mode for this hard reg in the function.  But
all 64 bits of cx contain structure members.  To fix this, we always need to
use at least the natural mode of the hard reg when splitting.

gcc/ChangeLog:

PR rtl-optimization/102627
* lra-constraints.c (split_reg): Use at least natural mode of hard reg.

gcc/testsuite/ChangeLog:

PR rtl-optimization/102627
* gcc.target/i386/pr102627.c: New test.
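The failure mode can be modelled in plain C. Assuming the x86-64 SysV calling convention, the 16-byte `struct g` argument of `w` in the test below is passed in two integer registers, and its second eightbyte (members `j` and `k`) lands in %rcx, with `k` in the upper 32 bits on this little-endian target. Saving and restoring only the SImode (low 32-bit) part of that register loses `k`, the value the test checks through `m != 20`. The helper names are hypothetical; this is a model of the data flow, not LRA code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the struct in the PR testcase.  */
struct g
{
  signed h;
  signed i;
  unsigned j;
  unsigned k;
};

/* Bytes 8..15 of the struct (members j and k) read as one 64-bit value,
   i.e. what a 64-bit register holds for the second eightbyte.  */
static uint64_t
second_eightbyte (const struct g *x)
{
  uint64_t r;
  memcpy (&r, (const char *) x + offsetof (struct g, j), sizeof r);
  return r;
}

/* Model of saving and restoring only the low 32 bits (SImode): on x86-64
   a 32-bit register write zero-extends, so bits 32..63 are lost.  */
static uint64_t
save_restore_low32 (uint64_t reg)
{
  uint32_t saved = (uint32_t) reg;
  return saved;
}
```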

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 4d734548c38..8f75125fc2e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5799,11 +5799,12 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
 	 part of a multi-word register.  In that case, just use the reg_rtx
 	 mode.  Do the same also if the biggest mode was larger than a register
 	 or we can not compare the modes.  Otherwise, limit the size to that of
-	 the biggest access in the function.  */
+	 the biggest access in the function or to the natural mode at least.  */
   if (mode == VOIDmode
 	  || !ordered_p (GET_MODE_PRECISION (mode),
 			 GET_MODE_PRECISION (reg_rtx_mode))
-	  || paradoxical_subreg_p (mode, reg_rtx_mode))
+	  || paradoxical_subreg_p (mode, reg_rtx_mode)
+	  || maybe_gt (GET_MODE_PRECISION (reg_rtx_mode), GET_MODE_PRECISION (mode)))
 	{
 	  original_reg = regno_reg_rtx[hard_regno];
 	  mode = reg_rtx_mode;
diff --git a/gcc/testsuite/gcc.target/i386/pr102627.c b/gcc/testsuite/gcc.target/i386/pr102627.c
new file mode 100644
index 000..8ab9acaf002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102627.c
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/102627 */
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+int a, f, l, m, q, c, d, g;
+long b, e;
+struct g {
+  signed h;
+  signed i;
+  unsigned j;
+  unsigned k;
+};
+unsigned n;
+char o;
+int *p = &m;
+long r(int s) { return s && b ?: b; }
+long __attribute__((noipa)) v() {
+  l = 0 || r(n & o);
+  return q;
+}
+void w(int, unsigned, struct g x) {
+  c ?: a;
+  for (; d < 2; d++)
+    *p = x.k;
+}
+struct g __attribute__((noipa)) y() {
+  struct g h = {3, 908, 1, 20};
+  for (; g; g++)
+    ;
+  return h;
+}
+int main() {
+  long t;
+  struct g u = y();
+  t = e << f;
+  w(0, t, u);
+  v(0, 4, 4, 4);
+  if (m != 20)
+    __builtin_abort ();
+  return 0;
+}


[PATCH v2, Fortran] Add diagnostic for F2018:C839 (TS29113:C535c)

2021-10-08 Thread Sandra Loosemore

On 10/7/21 9:25 AM, Tobias Burnus wrote:

Hi Sandra,

On 06.10.21 23:37, Sandra Loosemore wrote:
This patch is for PR fortran/54753, to add a diagnostic for violations 
of this constraint in the 2018 standard:


  C839 If an assumed-size or nonallocatable nonpointer assumed-rank
  array is an actual argument that corresponds to a dummy argument that
  is an INTENT (OUT) assumed-rank array, it shall not be polymorphic,
  finalizable, of a type with an allocatable ultimate component, or of a
  type for which default initialization is specified.

(It now uses an interface instead of an actual subroutine definition, 
since Tobias recently committed a patch to fix interfaces in order to 
unblock my work on this one.)  That bug is independent of enforcing 
this constraint so I'm planning to open a new issue for it with its 
own test case, if there isn't already one in Bugzilla.

I concur that that should be in a separate PR.


It's PR102641 now.


diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
...
+  gfc_array_spec *fas, *aas;
+  bool pointer_arg, allocatable_arg;;

Remove either ";" or ";".
@@ -3329,13 +3331,48 @@ gfc_compare_actual_formal (gfc_actual_arglist 
**ap, gfc_formal_arglist *formal,

+  if (a->expr->expr_type != EXPR_VARIABLE)
+    {
+  aas = NULL;
+  pointer_arg = false;
+  allocatable_arg = false;


This code is not generic but rather specific.
But it is fine as used in the code.

The question is how to prevent "?" or wrong code for future
code readers and writers.
 
Solution: I think the simplest would be to add a comment.


OK, done.


+  if (fas
+  && (fas->type == AS_ASSUMED_SHAPE
+  || fas->type == AS_DEFERRED
+  || (fas->type == AS_ASSUMED_RANK && f->sym->attr.pointer))
+  && aas
+  && aas->type == AS_ASSUMED_SIZE
    && (a->expr->ref == NULL
    || (a->expr->ref->type == REF_ARRAY
    && a->expr->ref->u.ar.type == AR_FULL)))

That's old code – but can you adapt it to handle BT_CLASS? I think
only 'f->sym->attr.pointer' causes the issue as it does not check for
CLASS_DATA()->attr.class_pointer – and the rest is fine, also because
of now using 'aas->type' which already encapsulates the classness.


Done.


Testcase:
--
type t
end type t
interface
   subroutine fc2 (x)
     import :: t
     class(t), pointer, intent(in) :: x(..)
   end
end interface
contains
   subroutine sub1(y)
     type(t), target :: y(*)
     call fc2 (y)  ! silently accepted
   end
end
--


OK, I incorporated that into the existing test case for that issue.


+  subroutine test_assumed_size_polymorphic (a1, a2)
+    class(t1) :: a1(*), a2(*)
+    call poly (a1, a2)  ! { dg-error "(A|a)ssumed.rank" }
+    call upoly (a1, a2)  ! { dg-error "(A|a)ssumed.rank" }
+  end subroutine

Can you also add a call involving something like:
a1(5), a2(4:7), a1(:10) or a2(:-5) ? (Here, '(:-5)' is a
rank-1, size-zero array.)

Such calls are valid because they pass the array size along with the array.
 From the patch it looks as if they should just work, but it is
still good to test this.


+  subroutine test_assumed_size_unlimited_polymorphic (a1, a2)
+    class(*) :: a1(*), a2(*)
+    call upoly (a1, a2)  ! { dg-error "(A|a)ssumed.rank" }
+  end subroutine

Likewise.


This is done too.


Otherwise, it looks good to me.


OK to commit v2 of the patch (attached)?

-Sandra
commit 1beb8cc863225a5f2ba4a52fc3ff1d3320edbfef
Author: Sandra Loosemore 
Date:   Mon Sep 27 07:05:32 2021 -0700

Fortran: Add diagnostic for F2018:C839 (TS29113:C535c)

2021-10-08 Sandra Loosemore  

PR fortran/54753

gcc/fortran/
* interface.c (gfc_compare_actual_formal): Add diagnostic
for F2018:C839.  Refactor shared code and fix bugs with class
array info lookup, and extend similar diagnostic from PR94110
to also cover class types.

gcc/testsuite/
* gfortran.dg/c-interop/c535c-1.f90: Rewrite and expand.
* gfortran.dg/c-interop/c535c-2.f90: Remove xfails.
* gfortran.dg/c-interop/c535c-3.f90: Likewise.
* gfortran.dg/c-interop/c535c-4.f90: Likewise.
* gfortran.dg/PR94110.f90: Extend to cover class types.

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index a2fea0e97b8..2a71da75c72 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3061,6 +3061,8 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
   unsigned long actual_size, formal_size;
   bool full_array = false;
   gfc_array_ref *actual_arr_ref;
+  gfc_array_spec *fas, *aas;
+  bool pointer_dummy, pointer_arg, allocatable_arg;
 
   actual = *ap;
 
@@ -3329,13 +3331,60 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  return false;
 	}
 
-  if (f->sym->as
-	  && (f->sym->as->type == AS_ASSUMED_SHAPE
-	  || f->sym->as->type == AS_DEFERR

Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-08 Thread Luís Ferreira
On Fri, 2021-10-08 at 18:52 +0200, Iain Buclaw wrote:
> Excerpts from Luís Ferreira's message of October 7, 2021 8:29 pm:
> > On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
> > > 
> > > I can help with the autotools part if you can say how precisely
> > > you'd
> > > like to use them to add address sanitization. And as for the OSS
> > > fuzz part, I think someone tried setting up auto-fuzzing for it
> > > once,
> > > but the main bottleneck was getting the bug reports that it
> > > generated
> > > properly triaged, so if you could make sure the bug-submitting
> > > portion
> > > of the process is properly streamlined, that'd probably go a long
> > > way
> > > towards helping it be useful.
> > 
> > Bugs are normally reported by email or mailing list. Is there any
> > writable mailing list to publish bugs or is it strictly needed to
> > open
> > an entry on bugzilla?
> > 
> 
> Please open an issue on bugzilla, fixes towards it can then be
> referenced in the commit message/patch posted here.
> 
> Iain.

You mean for this current issue? The discussion was about future bug
reports reported by the OSS fuzzer workers. I can also open an issue on
the bugzilla for this issue, please clarify it and let me know :)

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





[r12-4240 Regression] FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2 on Linux/x86_64

2021-10-08 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2b8453c401b699ed93c085d0413ab4b5030bcdb8 is the first bad commit
commit 2b8453c401b699ed93c085d0413ab4b5030bcdb8
Author: liuhongt 
Date:   Mon Sep 6 13:48:49 2021 +0800

Enable auto-vectorization at O2 with very-cheap cost model.

caused

FAIL: gcc.dg/optimize-bswapsi-5.c scan-tree-dump-times optimized "= __builtin_bswap32 \\(" 2
FAIL: gcc.dg/optimize-bswapsi-6.c scan-tree-dump store-merging "32 bit bswap implementation found at"
FAIL: gcc.dg/torture/pr69760.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/Warray-bounds-51.c  target { i?86-*-* x86_64-*-* }  (test for warnings, line 41)
FAIL: gcc.dg/Wstringop-overflow-14.c  target { i?86-*-* x86_64-*-* }  (test for warnings, line 38)
FAIL: g++.dg/tree-ssa/pr94403.C   scan-tree-dump-times store-merging "__builtin_bswap32" 1
FAIL: g++.dg/tree-ssa/pr94403.C   scan-tree-dump-times store-merging "__builtin_bswap64" 1

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4240/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-5.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-6.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-6.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr69760.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-51.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-51.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-14.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-14.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/tree-ssa/pr94403.C --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-08 Thread Segher Boessenkool
On Thu, Oct 07, 2021 at 08:04:23PM -0500, Paul A. Clarke wrote:
> On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> > > +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > 
> > The __volatile__ does likely not do what you want.  As far as I can see
> > you do not want one here anyway?
> > 
> > "volatile" does not order asm wrt fp insns, which you likely *do* want.
> 
> Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
> has no effect at all (6.47.1):
> 
> | The optional volatile qualifier has no effect. All basic asm blocks are
> | implicitly volatile.
> 
> So, it could be removed without concern.

This is not a basic asm (it contains a ":"; that is not just an easy way
to see it, it is the *definition* of basic vs. extended asm).

The manual explains:

"""
Note that the compiler can move even 'volatile asm' instructions
relative to other code, including across jump instructions.  For
example, on many targets there is a system register that controls the
rounding mode of floating-point operations.  Setting it with a 'volatile
asm' statement, as in the following PowerPC example, does not work
reliably.

 asm volatile("mtfsf 255, %0" : : "f" (fpenv));
 sum = x + y;

The compiler may move the addition back before the 'volatile asm'
statement.  To make it work as expected, add an artificial dependency to
the 'asm' by referencing a variable in the subsequent code, for example:

 asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
 sum = x + y;
"""

> > You do not need any of that __ either.
> 
> I'm surprised that I don't. A .h file needs to be concerned about the
> namespace it inherits, no?

These are local variables in a function though.  You get such
complexities in macros, but never in functions, where everything is
scoped.  Local variables are a great thing.  And macros are a bad thing!


Segher


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-08 Thread Andrew MacLeod via Gcc-patches

On 10/8/21 12:51 PM, Martin Sebor via Gcc-patches wrote:
> I.e., in the test:
>
> void g (char *s1, char *s2)
> {
>   char b[1025];
>   size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
>   if (n + d + 1 >= 1025)
>     return;
>
>   sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
>
> the range of n and d is [0, INF] and so the sprintf call doesn't
> trigger a warning.  With your change, because their range is
> [0, 1023] each (and there's no way to express that their sum
> is less than 1025), the warning triggers because it considers
> the worst case scenario (the upper bounds of both).

So the warning operates on the assumption that no info is OK, but 
improved information causes them to break because it can't figure out 
what to do with it?


Does this ever work when there is more than 1 string in the sprintf?  It 
seems that it's the inherent lack of being able to associate an 
expression with a predicate that is the problem here.  If this is a 
single string, then an accurate range should be able to come up with an 
accurate answer.  But as soon as there is a second string, this is bound 
to fail unless the strings are known to be 1/2 their size, and likewise 
if there were 3 strings, 1/3 their size...
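The arithmetic behind this exchange can be checked with a small C sketch. The helper names are hypothetical and this is not how the sprintf pass is implemented; it only compares the per-variable worst case (each length in [0, 1023], as described above) with the joint constraint enforced by the early return in the test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes written by sprintf(b, "%s.%s", s1, s2), including the NUL,
   when strlen(s1) == n and strlen(s2) == d. */
static size_t dot_join_size(size_t n, size_t d)
{
    return n + 1 + d + 1;
}

/* Per-variable view: each length is only known to be <= max_len,
   so the worst case pairs both upper bounds. */
static size_t worst_case_independent(size_t max_len)
{
    return dot_join_size(max_len, max_len);
}

/* Joint view: every (n, d) that survives the guard
   "if (n + d + 1 >= bufsize) return;" must fit in bufsize bytes. */
static bool guard_is_sufficient(size_t bufsize)
{
    for (size_t n = 0; n < bufsize; n++)
        for (size_t d = 0; n + d + 1 < bufsize; d++)
            if (dot_join_size(n, d) > bufsize)
                return false;
    return true;
}
```

With bufsize 1025 the guard is sufficient, yet worst_case_independent(1023) is 2048 bytes.  Only with each length capped at 511 does the independent estimate (1024 bytes) fit, which is the "1/2 their size" point.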


Should we even be attempting to warn for multiple strings if we aren't 
going to be able to calculate them accurately? It seems like a recipe 
for a lot of false positives.   And then once we figure out how to 
combine the range info with the appropriate predicates, turn it back on?


Andrew



Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-10-08 Thread Fāng-ruì Sòng via Gcc-patches
On Fri, Sep 24, 2021 at 11:29 AM H.J. Lu  wrote:
>
> On Fri, Sep 24, 2021 at 11:14 AM Fāng-ruì Sòng  wrote:
> >
> > On Fri, Sep 24, 2021 at 10:41 AM H.J. Lu  wrote:
> > >
> > > On Fri, Sep 24, 2021 at 10:29 AM Fāng-ruì Sòng  wrote:
> > > >
> > > >  On Tue, Sep 21, 2021 at 7:08 PM Fāng-ruì Sòng  
> > > > wrote:
> > > > >
> > > > > On Tue, Sep 21, 2021 at 6:57 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, Sep 21, 2021 at 9:16 AM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, Sep 20, 2021 at 8:20 PM Fāng-ruì Sòng via Gcc-patches
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > PING^5 
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > > > > > >
> > > > > > > > On Sat, Sep 4, 2021 at 12:11 PM Fāng-ruì Sòng 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > PING^4 
> > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > > > > > > >
> > > > > > > > > One major design goal of PIE was to avoid copy relocations.
> > > > > > > > > The original patch for GCC 5 caused problems for many years.
> > > > > > > > >
> > > > > > > > > On Wed, Aug 18, 2021 at 11:54 PM Fāng-ruì Sòng 
> > > > > > > > >  wrote:
> > > > > > > > >>
> > > > > > > > >> PING^3 
> > > > > > > > >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > > > > > > >>
> > > > > > > > >> On Fri, Jun 4, 2021 at 3:04 PM Fāng-ruì Sòng 
> > > > > > > > >>  wrote:
> > > > > > > > >> >
> > > > > > > > >> > PING^2 
> > > > > > > > >> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > > > > > > >> >
> > > > > > > > >> > On Mon, May 24, 2021 at 9:43 AM Fāng-ruì Sòng 
> > > > > > > > >> >  wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Ping 
> > > > > > > > >> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > > > > > > >> > >
> > > > > > > > >> > > On Tue, May 11, 2021 at 8:29 PM Fangrui Song 
> > > > > > > > >> > >  wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > This was introduced in 2014-12 to use local binding 
> > > > > > > > >> > > > for external symbols
> > > > > > > > >> > > > for -fPIE. Now that we have H.J. Lu's GOTPCRELX for 
> > > > > > > > >> > > > years which mostly
> > > > > > > > >> > > > nullify the benefit of HAVE_LD_PIE_COPYRELOC, 
> > > > > > > > >> > > > HAVE_LD_PIE_COPYRELOC
> > > > > > > > >> > > > should retire now.
> > > > > > > > >> > > >
> > > > > > > > >> > > > One design goal of -fPIE was to avoid copy relocations.
> > > > > > > > >> > > > HAVE_LD_PIE_COPYRELOC has deviated from the goal.  
> > > > > > > > >> > > > With this change, the
> > > > > > > > >> > > > -fPIE behavior of x86-64 will be closer to x86-32 and 
> > > > > > > > >> > > > other targets.
> > > > > > > > >> > > >
> > > > > > > > >> > > > ---
> > > > > > > > >> > > >
> > > > > > > > >> > > > See 
> > > > > > > > >> > > > https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html
> > > > > > > > >> > > >  for a list
> > > > > > > > >> > > > of fixed and unfixed (e.g. gold incompatibility with 
> > > > > > > > >> > > > protected
> > > > > > > > >> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=19823) 
> > > > > > > > >> > > > issues.
> > > > > > > > >> > > >
> > > > > > > > >> > > > If you prefer a longer write-up, see
> > > > > > > > >> > > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> > > > > > > > >> > > > ---
> > > > > > > > >> > > >  gcc/config.in |  6 ---
> > > > > > > > >> > > >  gcc/config/i386/i386.c| 11 
> > > > > > > > >> > > > +---
> > > > > > > > >> > > >  gcc/configure | 52 
> > > > > > > > >> > > > ---
> > > > > > > > >> > > >  gcc/configure.ac  | 48 
> > > > > > > > >> > > > -
> > > > > > > > >> > > >  gcc/doc/sourcebuild.texi  |  3 --
> > > > > > > > >> > > >  .../gcc.target/i386/pie-copyrelocs-1.c| 14 
> > > > > > > > >> > > > -
> > > > > > > > >> > > >  .../gcc.target/i386/pie-copyrelocs-2.c| 14 
> > > > > > > > >> > > > -
> > > > > > > > >> > > >  .../gcc.target/i386/pie-copyrelocs-3.c| 14 
> > > > > > > > >> > > > -
> > > > > > > > >> > > >  .../gcc.target/i386/pie-copyrelocs-4.c| 17 
> > > > > > > > >> > > > --
> > > > > > > > >> > > >  gcc/testsuite/lib/target-supports.exp | 47 
> > > > > > > > >> > > > -
> > > > > > > > >> > > >  10 files changed, 2 insertions(+), 224 deletions(-)
> > > > > > > > >> > > >  delete mode 100644 
> > > > > > > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
> > > > > > > > >> > > >  delete mode 100644 
> > > > > > > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
> > > > > > > > >> > > >  delete mode 100644 
> > > > > > > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
> > > > > > > > >> > > >  delete mode 100644 
> > > > > >

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-08 Thread Paul A. Clarke via Gcc-patches
On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> On Thu, Oct 07, 2021 at 08:04:23PM -0500, Paul A. Clarke wrote:
> > On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> > > > +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > 
> > > The __volatile__ does likely not do what you want.  As far as I can see
> > > you do not want one here anyway?
> > > 
> > > "volatile" does not order asm wrt fp insns, which you likely *do* want.
> > 
> > Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
> > has no effect at all (6.47.1):
> > 
> > | The optional volatile qualifier has no effect. All basic asm blocks are
> > | implicitly volatile.
> > 
> > So, it could be removed without concern.
> 
> This is not a basic asm (it contains a ":"; that is not just an easy way
> to see it, it is the *definition* of basic vs. extended asm).

Ah, basic vs extended. I learned something today... thanks for your
patience!

> The manual explains:
> 
> """
> Note that the compiler can move even 'volatile asm' instructions
> relative to other code, including across jump instructions.  For
> example, on many targets there is a system register that controls the
> rounding mode of floating-point operations.  Setting it with a 'volatile
> asm' statement, as in the following PowerPC example, does not work
> reliably.
> 
>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>  sum = x + y;
> 
> The compiler may move the addition back before the 'volatile asm'
> statement.  To make it work as expected, add an artificial dependency to
> the 'asm' by referencing a variable in the subsequent code, for example:
> 
>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>  sum = x + y;
> """

I see. Thanks for the reference. If I understand correctly, volatile
prevents some optimizations based on the defined inputs/outputs, but
the asm could still be subject to reordering.

In this particular case, I don't think it's an issue with respect to
reordering.  The code in question is:
+  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+  __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;

The output (__fpscr_save) is a source for the following assignment,
so the order should be respected, no?

With respect to volatile, I worry about removing it, because I do
indeed need that instruction to execute in order to clear the FPSCR
exception enable bits. That side-effect is not otherwise known to the
compiler.

> > > You do not need any of that __ either.
> > 
> > I'm surprised that I don't. A .h file needs to be concerned about the
> > namespace it inherits, no?
> 
> These are local variables in a function though.  You get such
> complexities in macros, but never in functions, where everything is
> scoped.  Local variables are a great thing.  And macros are a bad thing!

They are local variables in a function *in an include file*, though.
If a user's preprocessor macro just happens to match a local variable name
there could be problems, right?

a.h:
inline void foo () {
  int A = 0;
}

a.c:
#define A a+b
#include 

$ gcc -c -I. a.c
In file included from a.c:1:
a.c: In function ‘foo’:
a.h:1:12: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘+’ 
token
 #define A a+b
^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
 ^
a.h:1:13: error: ‘b’ undeclared (first use in this function)
 #define A a+b
 ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
 ^
a.h:1:13: note: each undeclared identifier is reported only once for each 
function it appears in
 #define A a+b
 ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
 ^
PC


Re: [PATCH v2, Fortran] Add diagnostic for F2018:C839 (TS29113:C535c)

2021-10-08 Thread Tobias Burnus

Hi Sandra

On 08.10.21 18:58, Sandra Loosemore wrote:

I concur that that should be in a separate PR.

It's PR102641 now.

Thanks.

OK to commit v2 of the patch (attached)?


OK – thanks for the patch!

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-08 Thread Iain Buclaw via Gcc-patches
Excerpts from Luís Ferreira's message of October 8, 2021 7:08 pm:
> On Fri, 2021-10-08 at 18:52 +0200, Iain Buclaw wrote:
>> Excerpts from Luís Ferreira's message of October 7, 2021 8:29 pm:
>> > On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
>> > > 
>> > > I can help with the autotools part if you can say how precisely
>> > > you'd
>> > > like to use them to add address sanitization. And as for the OSS
>> > > fuzz part, I think someone tried setting up auto-fuzzing for it
>> > > once,
>> > > but the main bottleneck was getting the bug reports that it
>> > > generated
>> > > properly triaged, so if you could make sure the bug-submitting
>> > > portion
>> > > of the process is properly streamlined, that'd probably go a long
>> > > way
>> > > towards helping it be useful.
>> > 
>> > Bugs are normally reported by email or mailing list. Is there any
>> > writable mailing list to publish bugs or is it strictly needed to
>> > open
>> > an entry on bugzilla?
>> > 
>> 
>> Please open an issue on bugzilla, fixes towards it can then be
>> referenced in the commit message/patch posted here.
>> 
>> Iain.
> 
> You mean for this current issue? The discussion was about future bug
> reports reported by the OSS fuzzer workers. I can also open an issue on
> the bugzilla for this issue, please clarify it and let me know :)
> 

1. Open one for this issue.

2. Bugs found by the fuzzer should be reported to bugzilla:
https://gcc.gnu.org/bugs/

Iain.


[PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402, PR102033, PR102034, PR102039, PR102

2021-10-08 Thread Nick Huang via Gcc-patches
First of all, I am sorry for my late response, as I missed your email.
I need to update my Gmail filter setup after switching from Hotmail.

>I think WILDCARD_TYPE_P is what you want, except...
I will try this one.

>Your patch rejects that testcase even without the specialization on
>int[], which no other compiler I'm aware of does;

Honestly, I now realize this is a wake-up call for me.  For months I
have been trying to revert the PR92010 approach, because at that time I
favored the early syntax-checking approach, which is considered
efficient.  Your testcase shows me the practical reality that almost all
compilers tolerate this seemingly erroneous user definition, because:
a) It follows the principle that a template without instantiation
might not give an error.
b) "const T" and "T" may turn out to be the same function parameter
type after instantiation.
So I guess the practical approach for most compilers is to rebuild the
function signature from the declaration only at instantiation time.
My approach gets stuck when GCC searches for the declaration matching a
definition, because "T" and "const T" are two different canonical
types.  So I now guess that is why the declarator deliberately drops
cv-qualifiers: to tolerate your testcase.
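The non-template core of point b) can be seen even in C, which shares the rule with C++: top-level cv-qualifiers on a parameter are not part of the function's type, so a declaration with int and a definition with const int name the same function. The function name twice is illustrative only; the cases in this thread involve dependent types (typename/decltype), which this sketch does not model.

```c
/* Declaration: plain int parameter. */
int twice(int x);

/* Definition: const int parameter.  The const only affects x inside
   the body; the function type is still int(int), so this definition
   matches the declaration above. */
int twice(const int x)
{
    return 2 * x;
}
```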

>You seem to have missed my September 28 mail that argued for fixing the
>bug in determine_specialization that was preventing the 92010 fix from
>handling these cases.

I did try that approach, but I got stuck on a side issue, PR102624,
which relates to lambdas in unevaluated contexts.  The point is that I
thought the PR92010 approach has to satisfy this assertion in
pt.c:tsubst_default_argument:
gcc_assert (same_type_ignoring_top_level_qualifiers_p (type, parmtype));

But I think that, since lambdas in unevaluated contexts were introduced,
this assertion may no longer be correct.  I could be wrong on this.
For example:

template 
void spam(decltype([]{})* ptr=nullptr)
{ }
void foo(){
  spam();
}

When the lambda type is rebuilt from the declaration, it is always a
new, distinct type.  That is why I thought this approach of rebuilding
the function type is imperfect: in other words, it is no good to
rebuild the function type just to satisfy an assertion that may not be
correct.

Anyway, I now think PR92010 is practically a good approach and I will
start testing your patch.

Best regards,


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-08 Thread Martin Sebor via Gcc-patches

On 10/8/21 11:56 AM, Andrew MacLeod wrote:

> On 10/8/21 12:51 PM, Martin Sebor via Gcc-patches wrote:
>> I.e., in the test:
>>
>> void g (char *s1, char *s2)
>> {
>>   char b[1025];
>>   size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
>>   if (n + d + 1 >= 1025)
>>     return;
>>
>>   sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
>>
>> the range of n and d is [0, INF] and so the sprintf call doesn't
>> trigger a warning.  With your change, because their range is
>> [0, 1023] each (and there's no way to express that their sum
>> is less than 1025), the warning triggers because it considers
>> the worst case scenario (the upper bounds of both).
>
> So the warning operates on the assumption that no info is OK, but 
> improved information causes them to break because it can't figure out 
> what to do with it?


The idea is that input that appears unconstrained might have been
constrained somewhere else that we can't see, but constrained input
suggests it may not be constrained enough.  In the above, if s1 and
s2 point to arrays the same size as b, there's a decent chance that
the strings stored in them could be as long as fits (otherwise why
use such big arrays?), which would overflow the destination.



> Does this ever work when there is more than 1 string in the sprintf?  It 
> seems that it's the inherent lack of being able to associate an 
> expression with a predicate that is the problem here.  If this is a 
> single string, then an accurate range should be able to come up with an 
> accurate answer.  But as soon as there is a second string, this is bound 
> to fail unless the strings are known to be 1/2 their size, and likewise 
> if there were 3 strings, 1/3 their size...


Right.  The logic is of course not bulletproof, which is why we
integrated the sprintf pass with strlen: to get at the actual
string lengths when they're available instead of relying solely
on the worst case array size approximation.  (The array size
heuristic still applies when we don't have any strlen info.)
Even with the strlen info we don't get full accuracy because
the string lengths may be just lower bounds (e.g., after
memcpy(a, "123", 3), strlen(a) is no less than 3 but may be
as long as sizeof a - 1, and the warning uses the upper bound).
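A minimal sketch of why the length after such a memcpy is only a lower bound: the result depends on bytes the copy did not touch. The 8-byte buffer and the helper name len_after_copy are illustrative assumptions.

```c
#include <string.h>

/* Fill bytes 0..6 of an 8-byte buffer with `fill`, terminate it, then
   copy "123" over the first three bytes (as in memcpy(a, "123", 3)).
   The resulting strlen is 3 if byte 3 is a NUL, and can be as large
   as sizeof a - 1 == 7 if none of the remaining bytes is. */
static size_t len_after_copy(char fill)
{
    char a[8];
    memset(a, fill, sizeof a - 1);
    a[sizeof a - 1] = '\0';
    memcpy(a, "123", 3);
    return strlen(a);
}
```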

This, by the way, isn't just about strings.  It's the same for
numbers:

  sprintf (a, "%i %i", i, j);

will warn if i and j are in some constrained range whose upper
bound would result in overflowing a.
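The same worst-case reasoning for "%i %i" can be sketched by measuring the decimal length of the range endpoints with snprintf(NULL, 0, ...). This assumes both arguments lie in one range [lo, hi] and is only an illustration, not the sprintf pass's actual code; the longest decimal form in a range is always at one of its endpoints.

```c
#include <stdio.h>

/* Number of characters "%i" produces for v (excluding the NUL). */
static int int_len(int v)
{
    return snprintf(NULL, 0, "%i", v);
}

/* Worst-case bytes for sprintf(a, "%i %i", i, j) when i and j are only
   known to lie in [lo, hi]: widest endpoint, a space, the widest
   endpoint again, and the terminating NUL. */
static int worst_case_ii(int lo, int hi)
{
    int wlo = int_len(lo), whi = int_len(hi);
    int w = wlo > whi ? wlo : whi;
    return w + 1 + w + 1;
}
```

For example, worst_case_ii(0, 999) is 8 ("999 999" plus the NUL), so a destination smaller than 8 bytes would be warned about.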



> Should we even be attempting to warn for multiple strings if we aren't 
> going to be able to calculate them accurately? It seems like a recipe 
> for a lot of false positives.   And then once we figure out how to 
> combine the range info with the appropriate predicates, turn it back on?


It's been this way since the warning was introduced in GCC 7
and the false positives haven't been too bad (we have just
12 in Bugzilla).  Even with perfect ranges, a zero false positive
rate isn't achievable with the current design (or any design),
just as we can never come close to zero false negatives.

Every now and then it seems that a three-level warning might
have been better than two, with level 1 using an even more
conservative approach.  But the most conservative approach is
next to useless: it would have to assume strings of length
zero (or one), all integers between 0 and 9, and floats with
few fractional digits.  That rarely happens.  It's all based
on judgment calls.

Martin


[PATCH] PR fortran/65454 - accept both old and new-style relational operators

2021-10-08 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

F2018:10.1.5.5.1(2) requires the same interpretation of old and new-style
relational operators.  We internally distinguish between old and new style,
but try to map appropriately when used.

This mapping was missing when reading a module via
  USE module, ONLY: OPERATOR(op)
where op used a style different from the INTERFACE OPERATOR statement in
the declaring module.  The attached patch remedies this.

Note: we neither change the module format nor actually remap an operator.
We simply improve the check whether the requested operator symbol exists in
the old-style or new-style version.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

Fortran: accept both old and new-style relational operators in USE, ONLY

F2018:10.1.5.5.1(2) requires the same interpretation of old and new-style
relational operators.  As gfortran internally distinguishes between
these versions, we must match equivalent notations in
	USE module, ONLY: OPERATOR(op)
statements when reading modules.

gcc/fortran/ChangeLog:

	PR fortran/65454
	* module.c (read_module): Handle old and new-style relational
	operators when used in USE module, ONLY: OPERATOR(op).

gcc/testsuite/ChangeLog:

	PR fortran/65454
	* gfortran.dg/interface_operator_3.f90: New test.

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 1804066bc8c..7b98ba539d6 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -5592,6 +5592,9 @@ read_module (void)

   for (i = GFC_INTRINSIC_BEGIN; i != GFC_INTRINSIC_END; i++)
 {
+  gfc_use_rename *u = NULL, *v = NULL;
+  int j = i;
+
   if (i == INTRINSIC_USER)
 	continue;

@@ -5599,18 +5602,73 @@ read_module (void)
 	{
 	  u = find_use_operator ((gfc_intrinsic_op) i);

-	  if (u == NULL)
+	  /* F2018:10.1.5.5.1 requires same interpretation of old and new-style
+	 relational operators.  Special handling for USE, ONLY.  */
+	  switch (i)
+	{
+	case INTRINSIC_EQ:
+	  j = INTRINSIC_EQ_OS;
+	  break;
+	case INTRINSIC_EQ_OS:
+	  j = INTRINSIC_EQ;
+	  break;
+	case INTRINSIC_NE:
+	  j = INTRINSIC_NE_OS;
+	  break;
+	case INTRINSIC_NE_OS:
+	  j = INTRINSIC_NE;
+	  break;
+	case INTRINSIC_GT:
+	  j = INTRINSIC_GT_OS;
+	  break;
+	case INTRINSIC_GT_OS:
+	  j = INTRINSIC_GT;
+	  break;
+	case INTRINSIC_GE:
+	  j = INTRINSIC_GE_OS;
+	  break;
+	case INTRINSIC_GE_OS:
+	  j = INTRINSIC_GE;
+	  break;
+	case INTRINSIC_LT:
+	  j = INTRINSIC_LT_OS;
+	  break;
+	case INTRINSIC_LT_OS:
+	  j = INTRINSIC_LT;
+	  break;
+	case INTRINSIC_LE:
+	  j = INTRINSIC_LE_OS;
+	  break;
+	case INTRINSIC_LE_OS:
+	  j = INTRINSIC_LE;
+	  break;
+	default:
+	  break;
+	}
+
+	  if (j != i)
+	v = find_use_operator ((gfc_intrinsic_op) j);
+
+	  if (u == NULL && v == NULL)
 	{
 	  skip_list ();
 	  continue;
 	}

-	  u->found = 1;
+	  if (u)
+	u->found = 1;
+	  if (v)
+	v->found = 1;
 	}

   mio_interface (&gfc_current_ns->op[i]);
-  if (u && !gfc_current_ns->op[i])
-	u->found = 0;
+  if (!gfc_current_ns->op[i] && !gfc_current_ns->op[j])
+	{
+	  if (u)
+	u->found = 0;
+	  if (v)
+	v->found = 0;
+	}
 }

   mio_rparen ();
diff --git a/gcc/testsuite/gfortran.dg/interface_operator_3.f90 b/gcc/testsuite/gfortran.dg/interface_operator_3.f90
new file mode 100644
index 000..6a580b2f1cf
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/interface_operator_3.f90
@@ -0,0 +1,141 @@
+! { dg-do compile }
+! PR fortran/65454 - accept both old and new-style relational operators
+
+module m
+  implicit none
+  private :: t1
+  type t1
+ integer :: i
+  end type t1
+  interface operator (==)
+ module procedure :: my_cmp
+  end interface
+  interface operator (/=)
+ module procedure :: my_cmp
+  end interface
+  interface operator (<=)
+ module procedure :: my_cmp
+  end interface
+  interface operator (<)
+ module procedure :: my_cmp
+  end interface
+  interface operator (>=)
+ module procedure :: my_cmp
+  end interface
+  interface operator (>)
+ module procedure :: my_cmp
+  end interface
+contains
+  elemental function my_cmp (a, b) result (c)
+type(t1), intent(in) :: a, b
+logical  :: c
+c = a%i == b%i
+  end function my_cmp
+end module m
+
+module m_os
+  implicit none
+  private :: t2
+  type t2
+ integer :: i
+  end type t2
+  interface operator (.eq.)
+ module procedure :: my_cmp
+  end interface
+  interface operator (.ne.)
+ module procedure :: my_cmp
+  end interface
+  interface operator (.le.)
+ module procedure :: my_cmp
+  end interface
+  interface operator (.lt.)
+ module procedure :: my_cmp
+  end interface
+  interface operator (.ge.)
+ module procedure :: my_cmp
+  end interface
+  interface operator (.gt.)
+ module procedure :: my_cmp
+  end interface
+contains
+  elemental function my_cmp (a, b) resu

Re: [PATCH] PR fortran/65454 - accept both old and new-style relational operators

2021-10-08 Thread Jerry D via Gcc-patches




On 10/8/21 2:33 PM, Harald Anlauf via Fortran wrote:
> Dear Fortranners,
>
> F2018:10.1.5.5.1(2) requires the same interpretation of old and new-style
> relational operators.  We internally distinguish between old and new style,
> but try to map appropriately when used.
>
> This mapping was missing when reading a module via
>    USE module, ONLY: OPERATOR(op)
> where op used a style different from the INTERFACE OPERATOR statement in
> the declaring module.  The attached patch remedies this.
>
> Note: we neither change the module format nor actually remap an operator.
> We simply improve the check whether the requested operator symbol exists in
> the old-style or new-style version.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald


Looks all good Harald, OK and thanks for the support!

Jerry


Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-08 Thread Segher Boessenkool
On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > This is not a basic asm (it contains a ":"; that is not just an easy way
> > to see it, it is the *definition* of basic vs. extended asm).
> 
> Ah, basic vs extended. I learned something today... thanks for your
> patience!

To expand a little: any asm with operands is extended asm.  An asm
without operands can be either:  asm("eieio");  is basic, while
asm("eieio" : );  is extended.  This matters because the semantics are
a bit different.

> I see. Thanks for the reference. If I understand correctly, volatile
> prevents some optimizations based on the defined inputs/outputs, but
> the asm could still be subject to reordering.

"asm volatile" means there is a side effect in the asm.  This means that
it has to be executed on the real machine the same as on the abstract
machine, with the side effects in the same order.

It can still be reordered, modulo those restrictions.  It can be merged
with an identical asm as well.  And the compiler can split this into two
identical asms on two paths.

In this case you might want a side effect (the instruction writes to
the FPSCR after all).  But you need this to be tied to the FP code that
you want the flags to be changed for, and to the restore of the flags,
and finally you need to prevent other FP code from being scheduled in
between.

You need more for that than just volatile, and the solution may well
make volatile not wanted: tying the insns together somehow will
naturally make the flags restored to a sane situation again, so the
whole group can be removed if you want, etc.

> In this particular case, I don't think it's an issue with respect to
> reordering.  The code in question is:
> +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +  __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> 
> The output (__fpscr_save) is a source for the following assignment,
> so the order should be respected, no?

Other FP code can be interleaved, and then do the wrong thing.

> With respect to volatile, I worry about removing it, because I do
> indeed need that instruction to execute in order to clear the FPSCR
> exception enable bits. That side-effect is not otherwise known to the
> compiler.

Yes.  But as said above, volatile isn't enough to get this to behave
correctly.

The easiest way out is to write this all in one piece of (inline) asm.

> > > > You do not need any of that __ either.
> > > 
> > > I'm surprised that I don't. A .h file needs to be concerned about the
> > > namespace it inherits, no?
> > 
> > These are local variables in a function though.  You get such
> > complexities in macros, but never in functions, where everything is
> > scoped.  Local variables are a great thing.  And macros are a bad thing!
> 
> They are local variables in a function *in an include file*, though.
> If a user's preprocessor macro just happens to match a local variable name
> there could be problems, right?

Of course.  This is why traditionally macro names are ALL_CAPS :-)  So
in practice it doesn't matter, and in practice many users use __ names
themselves as well.

But you are right.  I just don't see that it will help in practice :-(


Segher


Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-10-08 Thread Martin Sebor via Gcc-patches

On 10/8/21 4:49 AM, Aldy Hernandez via Gcc-patches wrote:

On Thu, Sep 23, 2021 at 8:32 AM Richard Biener via Gcc-patches
 wrote:


On Thu, 23 Sep 2021, Hongtao Liu wrote:


On Thu, Sep 23, 2021 at 9:48 AM Hongtao Liu  wrote:


On Wed, Sep 22, 2021 at 10:21 PM Martin Sebor  wrote:


On 9/21/21 7:38 PM, Hongtao Liu wrote:

On Mon, Sep 20, 2021 at 4:13 AM Martin Sebor  wrote:

...

diff --git a/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c 
b/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
index 1d79930cd58..9351f7e7a1a 100644
--- a/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
+++ b/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
@@ -1,7 +1,7 @@
/* PR middle-end/91458 - inconsistent warning for writing past the end
   of an array member
   { dg-do compile }
-   { dg-options "-O2 -Wall -Wno-array-bounds -fno-ipa-icf" } */
+   { dg-options "-O2 -Wall -Wno-array-bounds -fno-ipa-icf -fno-tree-vectorize" 
} */


The testcase is large - what part requires this change?  Given the
testcase was added for inconsistent warnings do they now become
inconsistent again as we enable vectorization at -O2?

That said, the testcase adjustments need some explaining - I suppose
you didn't just slap -fno-tree-vectorize onto all of the tests whose
behavior changed?


void ga1_ (void)
{
 a1_.a[0] = 0;
 a1_.a[1] = 1; // { dg-warning "\\\[-Wstringop-overflow" }
 a1_.a[2] = 2; // { dg-warning "\\\[-Wstringop-overflow" }

 struct A1 a;
 a.a[0] = 0;
 a.a[1] = 1;   // { dg-warning "\\\[-Wstringop-overflow" }
 a.a[2] = 2;   // { dg-warning "\\\[-Wstringop-overflow" }
 sink (&a);
}

There are supposed to be 2 warnings, for a.a[1] = 1 and a.a[2] = 2,
since there are 2 accesses, but after enabling vectorization there's
only one access, so one warning is missing, which causes the failure.
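To make the mechanism concrete, here is a rough C++ sketch (not from the thread; `Buf`, `store_scalar` and `store_combined` are hypothetical names) of what the vectorizer effectively does to two adjacent element stores. It only models the shape of the transformation, not GCC's actual GIMPLE output.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in type. The real test uses struct A1 with a
// one-element array member, which is what makes a[1] and a[2] out of bounds.
struct Buf { int a[4]; };

// Scalar form: two separate stores on two source lines. A warning pass
// that inspects each store can attach one diagnostic to each line.
void store_scalar(Buf& b)
{
  b.a[1] = 1;
  b.a[2] = 2;
}

// Roughly what SLP vectorization produces for the pair above: a single
// wider store (modelled here with memcpy) covering both elements. Only one
// access is left, with one location, so only one warning can be tied to it.
void store_combined(Buf& b)
{
  const int v[2] = {1, 2};
  std::memcpy(&b.a[1], v, sizeof v);
}
```

Both functions leave memory in the same state; the difference is only in how many distinct accesses (and source locations) a later diagnostic pass gets to see.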


With the stores vectorized, is the warning on the correct line or
does it point to the first store, the one that's in bounds, as
it does with -O3?  The latter would be a regression at -O2.

For the case above, it points to the second store, which is out of
bounds; the warning for the third store is missing.




I would find it preferable to change the test code over disabling
optimizations that are on by default.  My concern is that the test
would no longer exercise the default behavior.  (The same goes for
the -fno-ipa-icf option.)

Hmm, it's a middle-end test; some backends may not vectorize it at
all (that depends on TARGET_VECTOR_MODE_SUPPORTED_P and the relevant
cost model).


Yes, there are quite a few warning tests like that.  Their main
purpose is to verify that in common GCC invocations (i.e., without
any special options) warnings are a) issued when expected and b)
not issued when not expected.  Otherwise, middle end warnings are
known to have both false positives and false negatives in some
invocations, depending on what optimizations are in effect.
Indiscriminately disabling common optimizations for these large
tests and invoking them under artificial conditions would
compromise this goal and hide the problems.

If enabling vectorization at -O2 causes regressions in the quality
of diagnostics (as the test failure above indicates is happening), we
should investigate these and open bugs for them so
they can be fixed.  We can then tweak the specific failing test
cases to avoid the failures until they are fixed.

There are indeed cases of false positives and false negatives,
e.g.:
// Verify warning for access to a definition with an initializer that
// initializes the one-element array member.
struct A1 a1i_1 = { 0, { 1 } };

void ga1i_1 (void)
{
   a1i_1.a[0] = 0;
   a1i_1.a[1] = 1;   // { dg-warning "\\\[-Wstringop-overflow" }
   a1i_1.a[2] = 2;   // { dg-warning "\\\[-Wstringop-overflow" }

   struct A1 a = { 0, { 1 } };   // false positive here
   a.a[0] = 1;
   a.a[1] = 2;   // { dg-warning "\\\[-Wstringop-overflow" } false negative here
   a.a[2] = 3;   // { dg-warning "\\\[-Wstringop-overflow" } false negative here
   sink (&a);
}

Similarly for:
* gcc.dg/Warray-bounds-51.c.
* gcc.dg/Warray-parameter-3.c
* gcc.dg/Wstringop-overflow-14.c
* gcc.dg/Wstringop-overflow-21.c

So there are 3 situations:
1. All accesses are out of bounds, and after vectorization some
warnings are missing.
2. Some accesses are in bounds and some are out of bounds, and after
vectorization the warning moves from the out-of-bounds line to the
in-bounds line.
3. All accesses are out of bounds, and after vectorization all the
warnings are missing and a warning appears on a false-positive line
instead.


I remember some of the warning code explicitly excuses itself from
even trying to deal with vectorized loads/stores, that might need to
be revisited.  It would also be useful to verify whether the line
info on the vectorized loads/stores is sensible (if you dump with
-lineno you get stmts with line numbers).

It is of course impossible to preserve

[committed 1/2] libstdc++: Avoid instantiation of _Hash_node before it's needed

2021-10-08 Thread Jonathan Wakely via Gcc-patches
This is a step towards restoring support for incomplete types in
unordered containers (PR 53339).

We do not need to instantiate the node type to get its value_type
member, because we know that the value type is the first template
parameter. We can deduce that template argument using a custom trait and
a partial specialization for _Hash_node. If we wanted to support custom
hash node types we could still use typename _Tp::value_type in the
primary template of that trait, but that seems unnecessary.
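A minimal sketch of that technique, using simplified, hypothetical names (`HashNode`, `get_value_type`) rather than the real libstdc++ types: the partial specialization deduces the value type from the node's template arguments, and matching it does not instantiate the node class, so the value type may still be incomplete.

```cpp
#include <type_traits>

// Hypothetical, simplified stand-in for _Hash_node. Instantiating it with
// an incomplete Val would be an error because of the data member.
template<typename Val, bool CacheHash>
struct HashNode
{
  Val value;
};

// Primary template is declared but never defined: only known node types
// are supported.
template<typename Node>
struct get_value_type;

// Deduce Val directly from the template arguments. Matching this partial
// specialization does not instantiate HashNode<Val, CacheHash>.
template<typename Val, bool CacheHash>
struct get_value_type<HashNode<Val, CacheHash>>
{ using type = Val; };

// By contrast, `typename Node::value_type` would force Node (and so Val)
// to be complete.

struct Incomplete;  // deliberately never defined

static_assert(std::is_same<get_value_type<HashNode<Incomplete, false>>::type,
                           Incomplete>::value,
              "value type deduced without instantiating the node");
```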

The other change needed is to defer a static assert at class scope, so
that it is done when the types are complete. We must have a complete
type in the destructor, so we can do it there instead.
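The deferred check can be sketched with a hypothetical `Box<T>` member inside a recursive `Tree` type (both names invented for illustration; the real patch checks that `_M_bucket_index` is noexcept, whereas this sketch just checks completeness with `sizeof`). The class is instantiated while `Tree` is still incomplete, but the destructor body is only instantiated later, when the type must be complete anyway.

```cpp
#include <cassert>

// Hypothetical owning pointer used as a member of a recursive type.
template<typename T>
struct Box
{
  T* ptr = nullptr;

  Box() = default;
  Box(const Box&) = delete;             // keep ownership simple
  Box& operator=(const Box&) = delete;

  // A check at class scope, e.g.
  //   static_assert(sizeof(T) > 0, "T must be complete");
  // would fail here, because Box<Tree> is instantiated inside Tree's
  // definition, while Tree is still incomplete.

  ~Box()
  {
    // The destructor body is only instantiated when it is used, by which
    // point T must be complete (we are about to delete it), so the check
    // can safely live here.
    static_assert(sizeof(T) > 0, "T must be complete");
    delete ptr;
  }
};

// Tree is incomplete while its members are declared, so Box<Tree> is
// instantiated with an incomplete T.
struct Tree
{
  Box<Tree> child;
  int value = 0;
};
```

This mirrors the unordered-container case, where a type such as `struct Node { std::unordered_map<int, Node> children; };` needs the container's class template to be usable before `Node` is complete.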

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Move static assertion to destructor.
* include/bits/hashtable_policy.h: Deduce value type from node
type without instantiating it.

Tested powerpc64le-linux. Committed to trunk.

This is the patch I referred to in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576028.html


commit 64acc43de1e33616e43b239887a260eb4a51fcc7
Author: Jonathan Wakely 
Date:   Fri Oct 8 13:35:54 2021

libstdc++: Avoid instantiation of _Hash_node before it's needed

This is a step towards restoring support for incomplete types in
unordered containers (PR 53339).

We do not need to instantiate the node type to get its value_type
member, because we know that the value type is the first template
parameter. We can deduce that template argument using a custom trait and
a partial specialization for _Hash_node. If we wanted to support custom
hash node types we could still use typename _Tp::value_type in the
primary template of that trait, but that seems unnecessary.

The other change needed is to defer a static assert at class scope, so
that it is done when the types are complete. We must have a complete
type in the destructor, so we can do it there instead.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Move static assertion to destructor.
* include/bits/hashtable_policy.h: Deduce value type from node
type without instantiating it.

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 79a3096b62b..ff8af2201cd 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -329,14 +329,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   struct __hash_code_base_access : __hash_code_base
   { using __hash_code_base::_M_bucket_index; };
 
-  // Getting a bucket index from a node shall not throw because it is used
-  // in methods (erase, swap...) that shall not throw.
-  static_assert(noexcept(declval()
-   ._M_bucket_index(declval(),
-(std::size_t)0)),
-   "Cache the hash code or qualify your functors involved"
-   " in hash code and bucket index computation with noexcept");
-
   // To get bucket index we need _RangeHash not to throw.
   static_assert(is_nothrow_default_constructible<_RangeHash>::value,
"Functor used to map hash code to bucket index"
@@ -1556,6 +1548,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 ~_Hashtable() noexcept
 {
+  // Getting a bucket index from a node shall not throw because it is used
+  // in methods (erase, swap...) that shall not throw. Need a complete
+  // type to check this, so do it in the destructor not at class scope.
+  static_assert(noexcept(declval()
+   ._M_bucket_index(declval(),
+(std::size_t)0)),
+   "Cache the hash code or qualify your functors involved"
+   " in hash code and bucket index computation with noexcept");
+
   clear();
   _M_deallocate_buckets();
 }
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 2f8502588f5..75488da13f7 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1840,6 +1840,13 @@ namespace __detail
 {
 private:
   using __ebo_node_alloc = _Hashtable_ebo_helper<0, _NodeAlloc>;
+
+  template
+   struct __get_value_type;
+  template
+   struct __get_value_type<_Hash_node<_Val, _Cache_hash_code>>
+   { using type = _Val; };
+
 public:
   using __node_type = typename _NodeAlloc::value_type;
   using __node_alloc_type = _NodeAlloc;
@@ -1847,7 +1854,7 @@ namespace __detail
   using __node_alloc_traits = __gnu_cxx::__alloc_traits<__node_alloc_type>;
 
   using __value_alloc_traits = typename __node_alloc_traits::template
-   rebind_traits;
+   rebind_traits::type>;
 
   using __node_ptr = __node_type*;
   using __node_

[committed 2/2] libstdc++: Access std::pair members without tuple-like helpers

2021-10-08 Thread Jonathan Wakely via Gcc-patches

On 09/10/21 00:59 +0100, Jonathan Wakely wrote:

This is a step towards restoring support for incomplete types in
unordered containers (PR 53339).

We do not need to instantiate the node type to get its value_type
member, because we know that the value type is the first template
parameter. We can deduce that template argument using a custom trait and
a partial specialization for _Hash_node. If we wanted to support custom
hash node types we could still use typename _Tp::value_type in the
primary template of that trait, but that seems unnecessary.

The other change needed is to defer a static assert at class scope, so
that it is done when the types are complete. We must have a complete
type in the destructor, so we can do it there instead.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Move static assertion to destructor.
* include/bits/hashtable_policy.h: Deduce value type from node
type without instantiating it.

Tested powerpc64le-linux. Committed to trunk.

This is the patch I referred to in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576028.html


And this is the one attached to:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575964.html

This restores support for incomplete types in std::unordered_map.

Tested powerpc64le-linux. Committed to trunk.

libstdc++: Access std::pair members without tuple-like helpers

This avoids the tuple-like API for std::pair in the unordered
containers, removing some overly generic code.

The _Select1st projection can figure out the member types of a std::pair
without using decltype(std::get<0>(...)).

We don't need _Select2nd because it's only needed in
_NodeBuilder::_S_build, and that can just access the .second member of
the pair directly. The return type of that function doesn't need to be
deduced by decltype, we can just expose the __node_type typedef of the
node generator.
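A simplified, renamed sketch of such a projection (`Select1st` and `first_type` are illustrative names; this is not claimed to be the exact committed code): a small trait computes the type of `first`, and reference collapsing on `first_type<Tp>::type&&` preserves the argument's value category without involving `std::get` or `tuple_element`.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Sketch of a _Select1st-style projection that reads pair::first directly.
struct Select1st
{
  template<typename Pair>
    struct first_type;                  // only std::pair is supported

  template<typename T, typename U>
    struct first_type<std::pair<T, U>>
    { using type = T; };

  template<typename T, typename U>
    struct first_type<const std::pair<T, U>>
    { using type = const T; };

  // Lvalue arguments deduce Tp as Pair&, so yield an lvalue reference.
  template<typename Pair>
    struct first_type<Pair&>
    { using type = typename first_type<Pair>::type&; };

  // For lvalues, T& && collapses to T&; for rvalues the result is T&&,
  // matching the value category of std::forward<Tp>(x).first.
  template<typename Tp>
    typename first_type<Tp>::type&&
    operator()(Tp&& x) const noexcept
    { return std::forward<Tp>(x).first; }
};
```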


commit d87105d697ced10e1f7af3f1f80ef6c9890c8585
Author: Jonathan Wakely 
Date:   Fri Oct 8 13:41:19 2021

libstdc++: Access std::pair members without tuple-like helpers

This avoids the tuple-like API for std::pair in the unordered
containers, removing some overly generic code.

The _Select1st projection can figure out the member types of a std::pair
without using decltype(std::get<0>(...)).

We don't need _Select2nd because it's only needed in
_NodeBuilder::_S_build, and that can just access the .second member of
the pair directly. The return type of that function doesn't need to be
deduced by decltype, we can just expose the __node_type typedef of the
node generator.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Select1st): Replace use of
std::get.
(_Select2nd): Remove.
(_NodeBuilder::_S_build): Use _NodeGenerator::__node_type
typedef instead of deducing it. Remove unnecessary piecewise
construction.
(_ReuseOrAllocNode): Make __node_type public.
(_Map_base): Adjust partial specialization to be able to extract
the mapped_type without using tuple_element.
(_Map_base::at): Define inline.
* testsuite/23_containers/unordered_map/requirements/53339.cc:
Remove XFAIL.
* testsuite/23_containers/unordered_multimap/requirements/53339.cc:
Likewise.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 75488da13f7..994c7b61046 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -87,20 +87,25 @@ namespace __detail
 
   struct _Select1st
   {
-template
-  auto
-  operator()(_Tp&& __x) const noexcept
-  -> decltype(std::get<0>(std::forward<_Tp>(__x)))
-  { return std::get<0>(std::forward<_Tp>(__x)); }
-  };
+template
+  struct __1st_type;
+
+template
+  struct __1st_type>
+  { using type = _Tp; };
+
+template
+  struct __1st_type>
+  { using type = const _Tp; };
+
+template
+  struct __1st_type<_Pair&>
+  { using type = typename __1st_type<_Pair>::type&; };
 
-  struct _Select2nd
-  {
 template
-  auto
+  typename __1st_type<_Tp>::type&&
   operator()(_Tp&& __x) const noexcept
-  -> decltype(std::get<1>(std::forward<_Tp>(__x)))
-  { return std::get<1>(std::forward<_Tp>(__x)); }
+  { return std::forward<_Tp>(__x).first; }
   };
 
   template
@@ -112,14 +117,10 @@ namespace __detail
   template
 	static auto
 	_S_build(_Kt&& __k, _Arg&& __arg, const _NodeGenerator& __node_gen)
-	-> decltype(__node_gen(std::piecewise_construct,
-			   std::forward_as_tuple(std::forward<_Kt>(__k)),
-			   std::forward_as_tuple(_Select2nd{}(
-		std::forward<_Arg>(__arg)
+	-> typename _NodeGenerator::__node_type*
 	{
-	  return __node_gen(std::piecewise_construct,

[committed] libstdc++: Replace uses of _GLIBCXX_USE_INT128 in testsuite

2021-10-08 Thread Jonathan Wakely via Gcc-patches
Since r12-435 the _GLIBCXX_USE_INT128 macro is never defined, so all
uses of it in the testsuite are wrong. The tests should be checking
__SIZEOF_INT128__ instead.

Also add some tests for an INT_3 type, which were missing.

Tested powerpc64le-linux. Committed to trunk.
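As a sketch of the feature test the tests now rely on, assuming GCC or Clang (which predefine `__SIZEOF_INT128__` when `__int128` is available), here is a hypothetical helper `mul_hi` that uses `unsigned __int128` only behind that guard and falls back to portable arithmetic otherwise.

```cpp
#include <cassert>

#ifdef __SIZEOF_INT128__
// __extension__ suppresses -pedantic warnings about the non-ISO type.
__extension__ typedef unsigned __int128 u128;
#endif

// Hypothetical helper: the high 64 bits of the 128-bit product a * b.
// Guarded by __SIZEOF_INT128__ (predefined by the compiler), not by a
// library-internal macro that may never be defined.
unsigned long long mul_hi(unsigned long long a, unsigned long long b)
{
#ifdef __SIZEOF_INT128__
  return static_cast<unsigned long long>((static_cast<u128>(a) * b) >> 64);
#else
  // Portable fallback: schoolbook multiplication on 32-bit halves
  // (assumes unsigned long long is 64 bits wide).
  const unsigned long long mask = 0xFFFFFFFFull;
  const unsigned long long a_lo = a & mask, a_hi = a >> 32;
  const unsigned long long b_lo = b & mask, b_hi = b >> 32;
  const unsigned long long p0 = a_lo * b_lo;
  const unsigned long long p1 = a_lo * b_hi;
  const unsigned long long p2 = a_hi * b_lo;
  const unsigned long long p3 = a_hi * b_hi;
  // Sum of the middle 32-bit column; its carry goes into the high word.
  const unsigned long long mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
  return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
#endif
}
```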

commit 29a9de9b40277af98515eabebb75be1f154e9505
Author: Jonathan Wakely 
Date:   Fri Oct 8 20:41:24 2021

libstdc++: Replace uses of _GLIBCXX_USE_INT128 in testsuite

Since r12-435 the _GLIBCXX_USE_INT128 macro is never defined, so all
uses of it in the testsuite are wrong. The tests should be checking
__SIZEOF_INT128__ instead.

Also add some tests for an INT_3 type, which were missing.

libstdc++-v3/ChangeLog:

* testsuite/18_support/numeric_limits/40856.cc: Replace use of
_GLIBCXX_USE_INT128.
* testsuite/18_support/numeric_limits/dr559.cc: Likewise.
* testsuite/18_support/numeric_limits/lowest.cc: Likewise.
* testsuite/18_support/numeric_limits/max_digits10.cc: Likewise.
* testsuite/20_util/is_floating_point/value.cc: Likewise.
* testsuite/20_util/is_integral/value.cc: Likewise.
* testsuite/20_util/is_signed/value.cc: Likewise.
* testsuite/20_util/is_unsigned/value.cc: Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-1.cc:
Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-2.cc:
Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-1.cc:
Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc:
Likewise.
* testsuite/20_util/type_identity/requirements/typedefs.cc:
Likewise.
* testsuite/26_numerics/bit/bit.count/countl_one.cc: Likewise.
* testsuite/26_numerics/bit/bit.count/countl_zero.cc: Likewise.
* testsuite/26_numerics/bit/bit.count/countr_one.cc: Likewise.
* testsuite/26_numerics/bit/bit.count/countr_zero.cc: Likewise.
* testsuite/26_numerics/bit/bit.count/popcount.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/bit_ceil.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/bit_floor.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/bit_width.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/has_single_bit.cc:
Likewise.
* testsuite/26_numerics/bit/bit.rotate/rotl.cc: Likewise.

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/bit/bit.rotate/rotr.cc:
* testsuite/util/testsuite_common_types.h:

diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/40856.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/40856.cc
index 08564fbf174..ee1cf9c0cf8 100644
--- a/libstdc++-v3/testsuite/18_support/numeric_limits/40856.cc
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/40856.cc
@@ -19,8 +19,8 @@
 
 #include 
 
-// libstdc++/40856 
-#if defined _GLIBCXX_USE_INT128 && ! defined __STRICT_ANSI__
+// libstdc++/40856
+#if defined __SIZEOF_INT128__
 static_assert(std::numeric_limits<__int128>::is_specialized == true, "");
 static_assert(std::numeric_limits::is_specialized == true,
  "");
diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc
index a90cc46b186..96a63676739 100644
--- a/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/dr559.cc
@@ -98,7 +98,7 @@ int main()
   do_test();
   do_test();
   // GNU Extensions.
-#ifdef _GLIBCXX_USE_INT128
+#ifdef __SIZEOF_INT128__
   do_test<__int128>();
   do_test();
 #endif
diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/lowest.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/lowest.cc
index 49c1c4d6953..b44dcf42826 100644
--- a/libstdc++-v3/testsuite/18_support/numeric_limits/lowest.cc
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/lowest.cc
@@ -74,7 +74,7 @@ void test01()
   do_test();
 
   // GNU Extensions.
-#ifdef _GLIBCXX_USE_INT128
+#ifdef __SIZEOF_INT128__
   do_test<__int128>();
   do_test();
 #endif
diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/max_digits10.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/max_digits10.cc
index a136439a761..bc7317c76a3 100644
--- a/libstdc++-v3/testsuite/18_support/numeric_limits/max_digits10.cc
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/max_digits10.cc
@@ -49,7 +49,7 @@ test01()
   VERIFY( std::numeric_limits::max_digits10 == 0 );
 
   // GNU Extensions.
-#ifdef _GLIBCXX_USE_INT128
+#ifdef __SIZEOF_INT128__
   VERIFY( std::numeric_limits<__int128>::max_digits10 == 0 );
   VERIFY( std::numeric_limits::max_digits10 == 0 );
 #endif
diff --git a/libstdc++-v3/testsuite/20_util/is_floating_point/value.cc b/libstdc++-v3/testsuite/20_util/is_floatin

[PATCH 4/8] libstdc++: Enable vstring for wchar_t unconditionally [PR98725]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
None of these vstring specializations depend on libc support for
wchar_t, so they can be enabled unconditionally now that
char_traits<wchar_t> is always available.
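For illustration only, assuming libstdc++ and its non-standard `<ext/vstring.h>` extension header: a hypothetical `hash_wide` helper that uses the `__gnu_cxx::__wvstring` typedef and the `std::hash` specialization this patch makes unconditional.

```cpp
#include <cassert>
#include <ext/vstring.h>   // libstdc++ extension, not standard C++
#include <functional>

// Hypothetical helper: hash a wide versa_string via the std::hash
// specialization declared in <ext/vstring.h>.
std::size_t hash_wide(const __gnu_cxx::__wvstring& s)
{
  return std::hash<__gnu_cxx::__wvstring>()(s);
}
```

Equal strings must hash equally; the exact hash values are an implementation detail and are not relied on here.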

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* include/ext/rc_string_base.h [!_GLIBCXX_USE_WCHAR_T]
(__rc_string_base): Define member function.
* include/ext/vstring.h [!_GLIBCXX_USE_WCHAR_T]
(hash<__gnu_cxx::__wvstring>): Define specialization.
* include/ext/vstring_fwd.h [!_GLIBCXX_USE_WCHAR_T] (__wvstring)
(__wsso_string, __wrc_string): Declare typedefs.
---
 libstdc++-v3/include/ext/rc_string_base.h | 2 --
 libstdc++-v3/include/ext/vstring.h        | 2 --
 libstdc++-v3/include/ext/vstring_fwd.h    | 2 --
 3 files changed, 6 deletions(-)

diff --git a/libstdc++-v3/include/ext/rc_string_base.h b/libstdc++-v3/include/ext/rc_string_base.h
index 819f52dc914..88cc656448a 100644
--- a/libstdc++-v3/include/ext/rc_string_base.h
+++ b/libstdc++-v3/include/ext/rc_string_base.h
@@ -719,7 +719,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   template<>
 inline bool
 __rc_string_base,
@@ -730,7 +729,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return true;
   return false;
 }
-#endif
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
diff --git a/libstdc++-v3/include/ext/vstring.h b/libstdc++-v3/include/ext/vstring.h
index db02af18cb1..cb5872a7030 100644
--- a/libstdc++-v3/include/ext/vstring.h
+++ b/libstdc++-v3/include/ext/vstring.h
@@ -2921,7 +2921,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return std::_Hash_impl::hash(__s.data(), __s.length()); }
 };
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   /// std::hash specialization for __wvstring.
   template<>
 struct hash<__gnu_cxx::__wvstring>
@@ -2932,7 +2931,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return std::_Hash_impl::hash(__s.data(),
  __s.length() * sizeof(wchar_t)); }
 };
-#endif
 
   /// std::hash specialization for __u16vstring.
   template<>
diff --git a/libstdc++-v3/include/ext/vstring_fwd.h b/libstdc++-v3/include/ext/vstring_fwd.h
index 645c328104f..1aa53fdc24a 100644
--- a/libstdc++-v3/include/ext/vstring_fwd.h
+++ b/libstdc++-v3/include/ext/vstring_fwd.h
@@ -58,13 +58,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __versa_string,
 std::allocator, __rc_string_base>__rc_string;
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   typedef __versa_string   __wvstring;
   typedef __wvstring__wsso_string;
   typedef
   __versa_string,
 std::allocator, __rc_string_base> __wrc_string;
-#endif  
 
 #if __cplusplus >= 201103L
   typedef __versa_string  __u16vstring;
-- 
2.31.1



[PATCH 5/8] libstdc++: Enable type traits for wchar_t unconditionally [PR98725]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
None of these traits depend on libc support for wchar_t, so they should
be defined unconditionally. The wchar_t type is always defined in C++.
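Because `wchar_t` is a built-in type, these traits can be answered purely by the compiler and library headers. A small sketch of the properties involved (the aliases `wchar_u` and `wchar_s` are illustrative): `wchar_t` is integral, and since it is neither a standard signed nor unsigned integer type, `make_signed`/`make_unsigned` map it to an integer type of the same size.

```cpp
#include <cassert>
#include <type_traits>

// wchar_t is a distinct built-in integral type, independent of libc.
static_assert(std::is_integral<wchar_t>::value, "wchar_t is integral");

// make_unsigned/make_signed map wchar_t to the integer type of the
// smallest rank with the same size.
using wchar_u = std::make_unsigned<wchar_t>::type;
using wchar_s = std::make_signed<wchar_t>::type;

static_assert(std::is_unsigned<wchar_u>::value, "");
static_assert(std::is_signed<wchar_s>::value, "");
static_assert(sizeof(wchar_u) == sizeof(wchar_t), "");
static_assert(sizeof(wchar_s) == sizeof(wchar_t), "");
```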

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* include/c_global/cstddef [!_GLIBCXX_USE_WCHAR_T]
(__byte_operand): Define specialization.
* include/std/type_traits (__make_signed)
(__make_unsigned): Remove redundant check for
__WCHAR_TYPE__ being defined.
* include/tr1/type_traits [!_GLIBCXX_USE_WCHAR_T]
(__is_integral_helper): Likewise.
---
 libstdc++-v3/include/c_global/cstddef | 2 --
 libstdc++-v3/include/std/type_traits  | 8 +---
 libstdc++-v3/include/tr1/type_traits  | 2 --
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/c_global/cstddef b/libstdc++-v3/include/c_global/cstddef
index 13ef7f03c12..a96319e31ef 100644
--- a/libstdc++-v3/include/c_global/cstddef
+++ b/libstdc++-v3/include/c_global/cstddef
@@ -73,9 +73,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<> struct __byte_operand { using __type = byte; };
   template<> struct __byte_operand { using __type = byte; };
   template<> struct __byte_operand { using __type = byte; };
-#ifdef _GLIBCXX_USE_WCHAR_T
   template<> struct __byte_operand { using __type = byte; };
-#endif
 #ifdef _GLIBCXX_USE_CHAR8_T
   template<> struct __byte_operand { using __type = byte; };
 #endif
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 35ff5806c5d..d3693b1069e 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -309,12 +309,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // We want is_integral to be true (and make_signed/unsigned to work)
   // even when libc doesn't provide working  and related functions,
-  // so check __WCHAR_TYPE__ instead of _GLIBCXX_USE_WCHAR_T.
-#ifdef __WCHAR_TYPE__
+  // so don't check _GLIBCXX_USE_WCHAR_T here.
   template<>
 struct __is_integral_helper
 : public true_type { };
-#endif
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   template<>
@@ -1828,14 +1826,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // neither signed integer types nor unsigned integer types, so must be
   // transformed to the unsigned integer type with the smallest rank.
   // Use the partial specialization for enumeration types to do that.
-#ifdef __WCHAR_TYPE__
   template<>
 struct __make_unsigned
 {
   using __type
= typename __make_unsigned_selector::__type;
 };
-#endif
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   template<>
@@ -1960,14 +1956,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // signed integer types nor unsigned integer types, so must be
   // transformed to the signed integer type with the smallest rank.
   // Use the partial specialization for enumeration types to do that.
-#if defined(__WCHAR_TYPE__)
   template<>
 struct __make_signed
 {
   using __type
= typename __make_signed_selector::__type;
 };
-#endif
 
 #if defined(_GLIBCXX_USE_CHAR8_T)
   template<>
diff --git a/libstdc++-v3/include/tr1/type_traits b/libstdc++-v3/include/tr1/type_traits
index e62369c9f4c..16d7e338bfe 100644
--- a/libstdc++-v3/include/tr1/type_traits
+++ b/libstdc++-v3/include/tr1/type_traits
@@ -109,9 +109,7 @@ namespace tr1
   _DEFINE_SPEC(0, __is_integral_helper, char, true)
   _DEFINE_SPEC(0, __is_integral_helper, signed char, true)
   _DEFINE_SPEC(0, __is_integral_helper, unsigned char, true)
-#ifdef _GLIBCXX_USE_WCHAR_T
   _DEFINE_SPEC(0, __is_integral_helper, wchar_t, true)
-#endif
   _DEFINE_SPEC(0, __is_integral_helper, short, true)
   _DEFINE_SPEC(0, __is_integral_helper, unsigned short, true)
   _DEFINE_SPEC(0, __is_integral_helper, int, true)
-- 
2.31.1



[PATCH 3/8] libstdc++: Always define typedefs and hash functions for wide strings [PR 98725]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
The wstring and wstring_view typedefs should be enabled even if
<wchar.h> isn't supported, because char_traits<wchar_t> works
unconditionally. Similarly, the std::hash specializations for wide
strings do not depend on <wchar.h> support.
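A short usage sketch (helper names `wide_length` and `same_hash` are hypothetical) of the facilities this patch makes available regardless of libc wide-character support; it assumes C++17 for `std::wstring_view`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// Length via char_traits<wchar_t>; with this patch it is available even in
// a --disable-wchar_t build, where it uses the generic implementation.
std::size_t wide_length(const wchar_t* s)
{
  return std::char_traits<wchar_t>::length(s);
}

// C++17 requires hash<wstring> and hash<wstring_view> to agree on equal
// character sequences.
bool same_hash(const std::wstring& s)
{
  return std::hash<std::wstring>{}(s) == std::hash<std::wstring_view>{}(s);
}
```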

Although the primary template works OK for std::char_traits<wchar_t> in
the absence of <wchar.h> support, this patch still defines it as an
explicit specialization for compatibility with declarations that expect
it to be specialized. The explicit specialization just uses the same
__gnu_cxx::char_traits base class as the primary template.
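The pattern can be sketched with hypothetical names in a `demo` namespace (not the libstdc++ code): once an explicit specialization has been declared, the primary template can no longer be used for that type, so a definition is needed even if it is identical to the primary template.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {  // hypothetical names, mirroring the structure only

// Generic traits usable for any character type, playing the role of
// __gnu_cxx::char_traits<_CharT>.
template<typename C>
struct generic_traits
{
  static std::size_t length(const C* s)
  {
    std::size_t n = 0;
    while (s[n] != C())
      ++n;
    return n;
  }
};

// Primary template: inherit everything from the generic traits.
template<typename C>
struct traits : generic_traits<C> { };

// A forward declaration of the explicit specialization, like the one in
// <bits/stringfwd.h>. After this, traits<wchar_t> must be defined as an
// explicit specialization; the primary template will not be used.
template<> struct traits<wchar_t>;

// The definition just reuses the same generic base as the primary.
template<>
struct traits<wchar_t> : generic_traits<wchar_t> { };

} // namespace demo
```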

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* include/bits/char_traits.h (char_traits): Define
explicit specialization unconditionally.
* include/bits/basic_string.h (hash): Define
unconditionally. Do not check _GLIBCXX_USE_WCHAR_T.
* include/bits/stringfwd.h (wstring): Likewise.
* include/debug/string (wstring): Likewise.
* include/experimental/string_view (experimental::wstring_view)
(hash): Likewise.
* include/std/string (pmr::wstring, hash):
Likewise.
* include/std/string_view (wstring_view, hash):
Likewise.
---
 libstdc++-v3/include/bits/basic_string.h  | 4 
 libstdc++-v3/include/bits/char_traits.h   | 6 +-
 libstdc++-v3/include/bits/stringfwd.h | 4 
 libstdc++-v3/include/debug/string | 2 --
 libstdc++-v3/include/experimental/string_view | 6 --
 libstdc++-v3/include/std/string   | 4 
 libstdc++-v3/include/std/string_view  | 6 --
 7 files changed, 5 insertions(+), 27 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 68c388408f0..59c84b1b6ad 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3954,7 +3954,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __is_fast_hash> : std::false_type
 { };
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   /// std::hash specialization for wstring.
   template<>
 struct hash
@@ -3969,7 +3968,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<>
 struct __is_fast_hash> : std::false_type
 { };
-#endif
 #endif /* _GLIBCXX_COMPATIBILITY_CXX0X */
 
 #ifdef _GLIBCXX_USE_CHAR8_T
@@ -4034,12 +4032,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 operator""s(const char* __str, size_t __len)
 { return basic_string{__str, __len}; }
 
-#ifdef _GLIBCXX_USE_WCHAR_T
 _GLIBCXX_DEFAULT_ABI_TAG
 inline basic_string
 operator""s(const wchar_t* __str, size_t __len)
 { return basic_string{__str, __len}; }
-#endif
 
 #ifdef _GLIBCXX_USE_CHAR8_T
 _GLIBCXX_DEFAULT_ABI_TAG
diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h
index 3da6e28a513..f6f8851c22d 100644
--- a/libstdc++-v3/include/bits/char_traits.h
+++ b/libstdc++-v3/include/bits/char_traits.h
@@ -256,7 +256,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  for advice on how to make use of this class for @a unusual character
*  types. Also, check out include/ext/pod_char_traits.h.
   */
-  template
+  template
 struct char_traits : public __gnu_cxx::char_traits<_CharT>
 { };
 
@@ -507,6 +507,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   not_eof(const int_type& __c) _GLIBCXX_NOEXCEPT
   { return eq_int_type(__c, eof()) ? 0 : __c; }
   };
+#else // _GLIBCXX_USE_WCHAR_T
+  template<>
+struct char_traits : public __gnu_cxx::char_traits
+{ };
 #endif //_GLIBCXX_USE_WCHAR_T
 
 #ifdef _GLIBCXX_USE_CHAR8_T
diff --git a/libstdc++-v3/include/bits/stringfwd.h b/libstdc++-v3/include/bits/stringfwd.h
index 7cb92ebcbfe..bcfd350e505 100644
--- a/libstdc++-v3/include/bits/stringfwd.h
+++ b/libstdc++-v3/include/bits/stringfwd.h
@@ -54,9 +54,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<> struct char_traits;
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   template<> struct char_traits;
-#endif
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   template<> struct char_traits;
@@ -78,10 +76,8 @@ _GLIBCXX_END_NAMESPACE_CXX11
   /// A string of @c char
   typedef basic_stringstring;   
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   /// A string of @c wchar_t
   typedef basic_string wstring;   
-#endif
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   /// A string of @c char8_t
diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string
index 8744a55be64..a8389528001 100644
--- a/libstdc++-v3/include/debug/string
+++ b/libstdc++-v3/include/debug/string
@@ -1298,9 +1298,7 @@ namespace __gnu_debug
 
   typedef basic_stringstring;
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   typedef basic_string wstring;
-#endif
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   /// A string of @c char8_t
diff --git a/libstdc++-v3/include/experimental/string_view b/libstdc++-v3/include/experimental/string_view
index d9bc5cd166d..b8e4db8ef30 100644
--- a/libstdc++-v3/include/experimental/string_view
+++ b/libstdc++-v3/include/experimental/string_view
@@ -564,9 +564,7 @@ inline namespace fundamentals_v1
   // basic_string_view typede

[PATCH 2/8] libstdc++: Move test that depends on wchar_t I/O to wchar_t sub-directory

2021-10-08 Thread Jonathan Wakely via Gcc-patches
This fixes a FAIL when --disable-wchar_t is used.

libstdc++-v3/ChangeLog:

* testsuite/27_io/basic_filebuf/close/81256.cc: Moved to...
* testsuite/27_io/basic_filebuf/close/wchar_t/81256.cc: ...here.
---
 .../testsuite/27_io/basic_filebuf/close/{ => wchar_t}/81256.cc| 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename libstdc++-v3/testsuite/27_io/basic_filebuf/close/{ => wchar_t}/81256.cc (100%)

diff --git a/libstdc++-v3/testsuite/27_io/basic_filebuf/close/81256.cc b/libstdc++-v3/testsuite/27_io/basic_filebuf/close/wchar_t/81256.cc
similarity index 100%
rename from libstdc++-v3/testsuite/27_io/basic_filebuf/close/81256.cc
rename to libstdc++-v3/testsuite/27_io/basic_filebuf/close/wchar_t/81256.cc
-- 
2.31.1



[committed 1/8] libstdc++: Add missing _GLIBCXX_USE_WCHAR_T checks in testsuite

2021-10-08 Thread Jonathan Wakely via Gcc-patches
These tests fail for a --disable-wchar_t build.

Tested powerpc64le-linux and x86_64-linux, including a --disable-wchar_t
build. Pushed to trunk.
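The guarding pattern used throughout this patch can be sketched as follows (`test_wide_stream` is a hypothetical test function). `_GLIBCXX_USE_WCHAR_T` comes from libstdc++'s configuration header, which every libstdc++ header includes, so the wide-stream part is compiled only when the library was configured with wchar_t support.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical test body: wide-character I/O is exercised only when
// libstdc++ was built with wchar_t support.
bool test_wide_stream()
{
#ifdef _GLIBCXX_USE_WCHAR_T
  std::wostringstream ss;
  ss << L"Fool for a lifetime";
  return ss.str() == L"Fool for a lifetime";
#else
  return true;  // nothing to test in a --disable-wchar_t build
#endif
}
```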

libstdc++-v3/ChangeLog:

* testsuite/22_locale/conversions/buffer/1.cc: Check
_GLIBCXX_USE_WCHAR_T.
* testsuite/22_locale/conversions/buffer/3.cc: Likewise. Add
test using char16_t.
* testsuite/22_locale/conversions/string/1.cc: Check
_GLIBCXX_USE_WCHAR_T.
* testsuite/27_io/filesystem/path/generic/generic_string.cc:
Likewise.
* testsuite/27_io/filesystem/path/modifiers/make_preferred.cc:
Likewise.
* testsuite/27_io/filesystem/path/native/alloc.cc: Likewise.
* testsuite/27_io/filesystem/path/native/string-char8_t.cc:
Likewise.
* testsuite/27_io/filesystem/path/native/string.cc: Likewise.
* testsuite/28_regex/algorithms/regex_match/extended/wstring_locale.cc:
Likewise.
* testsuite/experimental/filesystem/path/generic/generic_string.cc:
Likewise.
* testsuite/experimental/filesystem/path/native/alloc.cc:
Likewise.
* testsuite/experimental/filesystem/path/native/string-char8_t.cc:
Likewise.
* testsuite/experimental/filesystem/path/native/string.cc:
Likewise.
---
 .../22_locale/conversions/buffer/1.cc  | 10 ++
 .../22_locale/conversions/buffer/3.cc  | 18 +-
 .../22_locale/conversions/string/1.cc  |  2 ++
 .../filesystem/path/generic/generic_string.cc  |  4 
 .../path/modifiers/make_preferred.cc   |  4 
 .../27_io/filesystem/path/native/alloc.cc  |  4 
 .../filesystem/path/native/string-char8_t.cc   |  2 ++
 .../27_io/filesystem/path/native/string.cc |  2 ++
 .../regex_match/extended/wstring_locale.cc |  2 ++
 .../filesystem/path/generic/generic_string.cc  |  2 ++
 .../filesystem/path/native/alloc.cc|  4 
 .../filesystem/path/native/string-char8_t.cc   |  2 ++
 .../filesystem/path/native/string.cc   |  2 ++
 13 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/22_locale/conversions/buffer/1.cc b/libstdc++-v3/testsuite/22_locale/conversions/buffer/1.cc
index 2d5c09449ca..9db7fce7241 100644
--- a/libstdc++-v3/testsuite/22_locale/conversions/buffer/1.cc
+++ b/libstdc++-v3/testsuite/22_locale/conversions/buffer/1.cc
@@ -31,12 +31,11 @@ template
 using buf_conv = std::wbuffer_convert, Elem>;
 
 using std::string;
-using std::stringstream;
 using std::wstring;
-using std::wstringstream;
 
 void test01()
 {
+#ifdef _GLIBCXX_USE_WCHAR_T
   buf_conv buf;
   std::stringbuf sbuf;
   VERIFY( buf.rdbuf() == nullptr );
@@ -46,6 +45,7 @@ void test01()
 
   __gnu_test::implicitly_default_constructible test;
   test.operator()>(); // P0935R0
+#endif
 }
 
 void test02()
@@ -53,7 +53,7 @@ void test02()
   std::stringbuf sbuf;
   buf_conv buf(&sbuf);  // noconv
 
-  stringstream ss;
+  std::stringstream ss;
   ss.std::ios::rdbuf(&buf);
   string input = "King for a day...";
   ss << input << std::flush;
@@ -63,15 +63,17 @@ void test02()
 
 void test03()
 {
+#ifdef _GLIBCXX_USE_WCHAR_T
   std::stringbuf sbuf;
   buf_conv buf(&sbuf);
 
-  wstringstream ss;
+  std::wstringstream ss;
   ss.std::wios::rdbuf(&buf);
   wstring input = L"Fool for a lifetime";
   ss << input << std::flush;
   string output = sbuf.str();
   VERIFY( output == "Fool for a lifetime" );
+#endif
 }
 
 int main()
diff --git a/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc b/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc
index 94aa43bbfdb..3e1d90ffe92 100644
--- a/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc
+++ b/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc
@@ -38,21 +38,37 @@ private:
   char c = 'a';
 };
 
-struct codecvt : std::codecvt { };
 
 void
 test01()
 {
+#ifdef _GLIBCXX_USE_WCHAR_T
+  struct codecvt : std::codecvt { };
   // https://gcc.gnu.org/ml/libstdc++/2017-11/msg00022.html
   streambuf sb;
   std::wbuffer_convert conv(&sb);
   VERIFY( sb.in_avail() == 0 );
   wchar_t c = conv.sgetc();
   VERIFY( c == L'a' );
+#endif
+}
+
+
+void
+test02()
+{
+  struct codecvt : std::codecvt { };
+  // https://gcc.gnu.org/ml/libstdc++/2017-11/msg00022.html
+  streambuf sb;
+  std::wbuffer_convert conv(&sb);
+  VERIFY( sb.in_avail() == 0 );
+  char16_t c = conv.sgetc();
+  VERIFY( c == u'a' );
 }
 
 int
 main()
 {
   test01();
+  test02();
 }
diff --git a/libstdc++-v3/testsuite/22_locale/conversions/string/1.cc b/libstdc++-v3/testsuite/22_locale/conversions/string/1.cc
index 0016910441e..b5132dadce4 100644
--- a/libstdc++-v3/testsuite/22_locale/conversions/string/1.cc
+++ b/libstdc++-v3/testsuite/22_locale/conversions/string/1.cc
@@ -51,6 +51,7 @@ void test01()
 
 void test02()
 {
+#ifdef _GLIBCXX_USE_WCHAR_T
   typedef str_conv wsc;
   wsc c;
   string input = "Fool for a lifetime";
@@ -71,6 +72,7 @@ void test02()
 
   __gnu_test::implicitly_default_c

[PATCH 6/8] libstdc++: Define std::wstring_convert unconditionally [PR 98725]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
The wchar_t type is defined unconditionally for C++, so there is no
reason for std::wstring_convert and std::wbuffer_convert to be disabled
when  is not usable. It should be possible to use those class
templates with char16_t and char32_t even if wchar_t conversions don't
work.
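A minimal sketch (not part of the patch) of the char16_t use described above: std::wstring_convert instantiated with char16_t instead of wchar_t, paired with std::codecvt_utf8_utf16 from <codecvt>. Both class templates have been deprecated since C++17 but are still provided; the helper names utf8_to_utf16 and utf16_to_utf8 are illustrative only.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Converter between UTF-8 bytes and UTF-16 code units.  The element type
// is char16_t, so nothing here depends on wchar_t conversions working.
using utf16_conv =
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>;

// Hypothetical helper: UTF-8 std::string -> std::u16string.
std::u16string utf8_to_utf16(const std::string& s)
{
  utf16_conv conv;
  return conv.from_bytes(s);
}

// Hypothetical helper: std::u16string -> UTF-8 std::string.
std::string utf16_to_utf8(const std::u16string& s)
{
  utf16_conv conv;
  return conv.to_bytes(s);
}
```

Creating a converter per call keeps the sketch simple; a real user would probably keep one around, since wstring_convert also tracks conversion state and error counts.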

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* include/bits/locale_conv.h (wstring_convert, wbuffer_convert):
Define unconditionally. Do not check _GLIBCXX_USE_WCHAR_T.
---
 libstdc++-v3/include/bits/locale_conv.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h
index 6af8a5bdc8f..41d17238fbd 100644
--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -253,8 +253,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
   }
 
-#ifdef _GLIBCXX_USE_WCHAR_T
-
 _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   /// String conversions
@@ -626,8 +624,6 @@ _GLIBCXX_END_NAMESPACE_CXX11
   bool _M_always_noconv;
 };
 
-#endif  // _GLIBCXX_USE_WCHAR_T
-
   /// @} group locales
 
 _GLIBCXX_END_NAMESPACE_VERSION
-- 
2.31.1



[PATCH 7/8] libstdc++: Define deleted wchar_t overloads unconditionally [PR 98725]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
We don't need to have  support in order to delete overloads
for inserting wide characters into narrow streams.
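A minimal sketch of what these deleted overloads buy, assuming a C++17 compiler for std::void_t. The is_streamable trait is hypothetical; it detects whether `stream << value` is well-formed. The deleted overload is an exact match for wchar_t and const wchar_t*, so it wins overload resolution over the int and const void* members, and the insertion becomes ill-formed instead of printing a number or an address. The negative checks are only enabled for libstdc++ in C++20 mode, since the deleted overloads are a C++20 addition.

```cpp
#include <sstream>
#include <type_traits>
#include <utility>

// True if `declval<Stream&>() << declval<T>()` is well-formed.  Selecting
// a deleted overload makes the expression ill-formed, which SFINAE detects.
template<typename Stream, typename T, typename = void>
struct is_streamable : std::false_type { };

template<typename Stream, typename T>
struct is_streamable<Stream, T,
    std::void_t<decltype(std::declval<Stream&>() << std::declval<T>())>>
  : std::true_type { };

// Narrow streams accept char; wide streams accept wchar_t.
static_assert(is_streamable<std::ostringstream, char>::value, "");
static_assert(is_streamable<std::wostringstream, wchar_t>::value, "");

#if __cplusplus >= 202002L && defined(__GLIBCXX__)
// Wide characters and wide strings are rejected by narrow streams.
static_assert(!is_streamable<std::ostringstream, wchar_t>::value, "");
static_assert(!is_streamable<std::ostringstream, const wchar_t*>::value, "");
#endif
```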

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* include/std/ostream (operator<<(basic_ostream&, wchar_t))
(operator<<(basic_ostream&, const wchar_t*)): Always
define as deleted. Do not check _GLIBCXX_USE_WCHAR_T.
---
 libstdc++-v3/include/std/ostream | 4 
 1 file changed, 4 deletions(-)

diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 7d39c5706d5..4d7b9b4ef0b 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -533,11 +533,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // The following deleted overloads prevent formatting character values as
   // numeric values.
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   template
 basic_ostream&
 operator<<(basic_ostream&, wchar_t) = delete;
-#endif // _GLIBCXX_USE_WCHAR_T
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   template
@@ -629,11 +627,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// The following deleted overloads prevent formatting strings as
// pointer values.
 
-#ifdef _GLIBCXX_USE_WCHAR_T
   template
 basic_ostream&
 operator<<(basic_ostream&, const wchar_t*) = delete;
-#endif // _GLIBCXX_USE_WCHAR_T
 
 #ifdef _GLIBCXX_USE_CHAR8_T
   template
-- 
2.31.1



[PATCH 8/8] libstdc++: Remove unnecessary uses of _GLIBCXX_USE_WCHAR_T in testsuite [PR98725]

2021-10-08 Thread Jonathan Wakely via Gcc-patches
Now that std::wstring and other specializations for wchar_t are defined
unconditionally, many checks for _GLIBCXX_USE_WCHAR_T in the testsuite
are unnecessary and can be removed. Tests for iostreams, locales, regex
and filesystem::path still need to be guarded by _GLIBCXX_USE_WCHAR_T
because those components depend on libc support in  and other
headers.
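A small sketch of the two testsuite idioms the ChangeLog mentions, assuming only the core wchar_t specializations are being exercised: std::char_traits<wchar_t>::length in place of wcslen, and a numeric_limits check keyed off whether WCHAR_MIN and WCHAR_MAX are defined rather than off _GLIBCXX_USE_WCHAR_T. The function names are hypothetical; the actual test changes are not shown here.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // WCHAR_MIN, WCHAR_MAX (when the target provides them)
#include <limits>
#include <string>

// Length of a wide string via std::char_traits instead of wcslen, so the
// test does not call a <cwchar> function directly.
std::size_t wide_length(const wchar_t* s)
{
  return std::char_traits<wchar_t>::length(s);
}

// Compare against the C macros only when they are actually defined,
// instead of keying the check off _GLIBCXX_USE_WCHAR_T.
bool wchar_limits_consistent()
{
#if defined(WCHAR_MIN) && defined(WCHAR_MAX)
  return std::numeric_limits<wchar_t>::min() == WCHAR_MIN
      && std::numeric_limits<wchar_t>::max() == WCHAR_MAX;
#else
  return true;  // nothing to compare against on this target
#endif
}
```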

libstdc++-v3/ChangeLog:

PR libstdc++/98725
* testsuite/18_support/numeric_limits/lowest.cc: Remove use of
_GLIBCXX_USE_WCHAR_T.
* testsuite/18_support/numeric_limits/min_max.cc: Replace use of
_GLIBCXX_USE_WCHAR_T with checks for WCHAR_MIN and WCHAR_MAX.
* testsuite/20_util/from_chars/1_neg.cc: Remove use of
_GLIBCXX_USE_WCHAR_T.
* testsuite/20_util/function_objects/searchers.cc: Likewise. Use
char_traits::length instead of wcslen.
* testsuite/20_util/hash/requirements/explicit_instantiation.cc:
Likewise.
* testsuite/20_util/is_arithmetic/value.cc: Likewise.
* testsuite/20_util/is_compound/value.cc: Likewise.
* testsuite/20_util/is_floating_point/value.cc: Likewise.
* testsuite/20_util/is_fundamental/value.cc: Likewise.
* testsuite/20_util/is_integral/value.cc: Likewise.
* testsuite/20_util/is_signed/value.cc: Likewise.
* testsuite/20_util/is_unsigned/value.cc: Likewise.
* testsuite/20_util/is_void/value.cc: Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-1.cc:
Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-2.cc:
Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-3.cc:
Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-4.cc:
Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-1.cc:
Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc:
Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-3.cc:
Likewise.
* testsuite/20_util/to_chars/3.cc: Likewise.
* testsuite/20_util/type_identity/requirements/typedefs.cc:
Likewise.
* testsuite/21_strings/basic_string/hash/debug.cc: Likewise.
* testsuite/21_strings/basic_string/hash/hash.cc: Likewise.
* testsuite/21_strings/basic_string/literals/types-char8_t.cc:
Likewise.
* testsuite/21_strings/basic_string/literals/types.cc: Likewise.
* testsuite/21_strings/basic_string/literals/values-char8_t.cc:
Likewise.
* testsuite/21_strings/basic_string/literals/values.cc:
Likewise.
* testsuite/21_strings/basic_string/modifiers/64422.cc:
Likewise.
* testsuite/21_strings/basic_string/range_access/wchar_t/1.cc:
Likewise.
* testsuite/21_strings/basic_string/requirements/citerators.cc:
Likewise.
* testsuite/21_strings/basic_string/requirements/typedefs.cc:
Likewise.
* testsuite/21_strings/basic_string/types/pmr_typedefs.cc:
Likewise.
* testsuite/21_strings/basic_string_view/literals/types-char8_t.cc:
Likewise.
* testsuite/21_strings/basic_string_view/literals/types.cc:
Likewise.
* testsuite/21_strings/basic_string_view/literals/values-char8_t.cc:
Likewise.
* testsuite/21_strings/basic_string_view/literals/values.cc:
Likewise.
* testsuite/21_strings/basic_string_view/requirements/typedefs.cc:
Likewise.
* testsuite/21_strings/basic_string_view/typedefs.cc: Likewise.
* testsuite/21_strings/char_traits/requirements/constexpr_functions.cc:
Likewise.
* testsuite/21_strings/char_traits/requirements/constexpr_functions_c++17.cc:
Likewise.
* testsuite/21_strings/char_traits/requirements/constexpr_functions_c++20.cc:
Likewise.
* testsuite/22_locale/ctype/is/string/89728_neg.cc: Likewise.
* testsuite/25_algorithms/fill/4.cc: Likewise.
* testsuite/25_algorithms/fill_n/1.cc: Likewise.
* testsuite/experimental/functional/searchers.cc: Likewise. Use
char_traits::length instead of wcslen.
* testsuite/experimental/polymorphic_allocator/pmr_typedefs_string.cc:
Likewise.
* testsuite/experimental/string_view/literals/types-char8_t.cc:
Likewise.
* testsuite/experimental/string_view/literals/types.cc:
Likewise.
* testsuite/experimental/string_view/literals/values-char8_t.cc:
Likewise.
* testsuite/experimental/string_view/literals/values.cc:
Likewise.
* testsuite/experimental/string_view/range_access/wchar_t/1.cc:
Likewise.
* testsuite/experimental/string_view/requirements/typedefs.cc:
Likewise.
* testsuite/experimental/string_view/typedefs.cc: Likewise.
* testsuite/ext/vstring/range_access.cc: Likewise.
* testsuite/

Re: [PATCH] Refine movhfcc.

2021-10-08 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 8, 2021 at 5:31 PM liuhongt  wrote:
>
> For AVX512-FP16, HFmode only supports vcmpsh, whose destination is a
> mask register, so for movhfcc the sequence is:
>
> vcmpsh op2, op1, %k1
> vmovsh op1, op2{%k1}
> mov op2, dest
>
> gcc/ChangeLog:
>
> PR target/102639
> * config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Handle
> HFmode.
> (ix86_use_mask_cmp_p): Ditto.
> (ix86_expand_sse_movcc): Ditto.
> * config/i386/i386.md (setcc_hf_mask): New define_insn.
> (movhf_mask): Ditto.
> (UNSPEC_MOVCC_MASK): New unspec.
> * config/i386/sse.md (UNSPEC_PCMP): Move to i386.md.
>
> gcc/testsuite/ChangeLog:
> * g++.target/i386/pr102639.C: New test.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Also no new failures for AVX512FP16 runtime tests under sde{-m32,}.
Committed to trunk.
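For context, a hypothetical source-level example of a scalar half-precision select, the kind of code that may be expanded through movhfcc. It uses C++23 std::float16_t when the compiler advertises it (GCC maps it to _Float16) and otherwise falls back to float so it still builds. Compiling it with something like -O2 -mavx512fp16 and reading the assembly may show the vcmpsh/vmovsh sequence above. That is an assumption: the exact output depends on the GCC version and on whether the select is first turned into something else, such as a min/max.

```cpp
// Pick a 16-bit float type where one is available; otherwise fall back to
// float so the sketch still compiles (and then no HFmode code is involved).
#if defined(__STDCPP_FLOAT16_T__) && __has_include(<stdfloat>)
#include <stdfloat>
using half = std::float16_t;
#else
using half = float;
#endif

// Scalar conditional select on half values: compare x and y, then pick
// t or f based on the result.
half select_hf(half x, half y, half t, half f)
{
  return x < y ? t : f;
}
```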
> ---
>  gcc/config/i386/i386-expand.c| 19 ++---
>  gcc/config/i386/i386.md  | 34 +++-
>  gcc/config/i386/sse.md   |  1 -
>  gcc/testsuite/g++.target/i386/pr102639.C | 19 +
>  4 files changed, 67 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr102639.C
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 4780b993917..3c4a07d4d7d 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -3613,6 +3613,10 @@ ix86_valid_mask_cmp_mode (machine_mode mode)
>if (TARGET_XOP && !TARGET_AVX512F)
>  return false;
>
> +  /* HFmode only supports vcmpsh whose dest is mask register.  */
> +  if (TARGET_AVX512FP16 && mode == HFmode)
> +return true;
> +
>/* AVX512F is needed for mask operation.  */
>if (!(TARGET_AVX512F && VECTOR_MODE_P (mode)))
>  return false;
> @@ -3634,7 +3638,9 @@ ix86_use_mask_cmp_p (machine_mode mode, machine_mode cmp_mode,
>  {
>int vector_size = GET_MODE_SIZE (mode);
>
> -  if (vector_size < 16)
> +  if (cmp_mode == HFmode)
> +return true;
> +  else if (vector_size < 16)
>  return false;
>else if (vector_size == 64)
>  return true;
> @@ -3750,7 +3756,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>&& GET_MODE_CLASS (cmpmode) == MODE_INT)
>  {
>gcc_assert (ix86_valid_mask_cmp_mode (mode));
> -  /* Using vector move with mask register.  */
> +  /* Using scalar/vector move with mask register.  */
>cmp = force_reg (cmpmode, cmp);
>/* Optimize for mask zero.  */
>op_true = (op_true != CONST0_RTX (mode)
> @@ -3769,8 +3775,13 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>   std::swap (op_true, op_false);
> }
>
> -  rtx vec_merge = gen_rtx_VEC_MERGE (mode, op_true, op_false, cmp);
> -  emit_insn (gen_rtx_SET (dest, vec_merge));
> +  if (mode == HFmode)
> +   emit_insn (gen_movhf_mask (dest, op_true, op_false, cmp));
> +  else
> +   {
> + rtx vec_merge = gen_rtx_VEC_MERGE (mode, op_true, op_false, cmp);
> + emit_insn (gen_rtx_SET (dest, vec_merge));
> +   }
>return;
>  }
>else if (vector_all_ones_operand (op_true, mode)
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 04cb3bf6a33..c7ae4ac5fbc 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -117,6 +117,7 @@ (define_c_enum "unspec" [
>;; For SSE/MMX support:
>UNSPEC_FIX_NOTRUNC
>UNSPEC_MASKMOV
> +  UNSPEC_MOVCC_MASK
>UNSPEC_MOVMSK
>UNSPEC_BLENDV
>UNSPEC_PSHUFB
> @@ -125,8 +126,9 @@ (define_c_enum "unspec" [
>UNSPEC_RSQRT
>UNSPEC_PSADBW
>
> -  ;; For AVX512F support
> +  ;; For AVX/AVX512F support
>UNSPEC_SCALEF
> +  UNSPEC_PCMP
>
>;; Generic math support
>UNSPEC_IEEE_MIN  ; not commutative
> @@ -13608,6 +13610,20 @@ (define_insn "setcc__sse"
> (set_attr "length_immediate" "1")
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "")])
> +
> +(define_insn "setcc_hf_mask"
> +  [(set (match_operand:QI 0 "register_operand" "=k")
> +   (unspec:QI
> + [(match_operand:HF 1 "register_operand" "v")
> +  (match_operand:HF 2 "nonimmediate_operand" "vm")
> +  (match_operand:SI 3 "const_0_to_31_operand" "n")]
> + UNSPEC_PCMP))]
> +  "TARGET_AVX512FP16"
> +  "vcmpsh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> +  [(set_attr "type" "ssecmp")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>
>  ;; Basic conditional jump instructions.
>
> @@ -19841,6 +19857,22 @@ (define_peephole2
>operands[9] = replace_rtx (operands[6], operands[0], operands[1], true);
>  })
>
> +(define_insn "movhf_mask"
> +  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,v")
> +   (unspec:HF
> + [(match_operand:HF 1 "nonimmediate_operand" "m,v,v")
> +  (match_operand:HF 2 "nonimm_or_0_operand" "0C,0C,0C")
> +  (match_operand:QI 3 "register_operand" "Yk

Re: [r12-4240 Regression] FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2 on Linux/x86_64

2021-10-08 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 8, 2021 at 8:02 PM sunil.k.pandey via Gcc-patches
 wrote:
>
> On Linux/x86_64,
>
> 2b8453c401b699ed93c085d0413ab4b5030bcdb8 is the first bad commit
> commit 2b8453c401b699ed93c085d0413ab4b5030bcdb8
> Author: liuhongt 
> Date:   Mon Sep 6 13:48:49 2021 +0800
>
> Enable auto-vectorization at O2 with very-cheap cost model.
>
> caused
>
> FAIL: libgomp.c++/scan-10.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-11.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-11.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-12.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-12.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-13.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-13.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-14.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-14.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-15.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-15.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-16.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-16.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-17.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-18.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-19.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-20.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-21.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c/scan-22.c scan-tree-dump-times vect "vectorized [2-6] loops" 2
> FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2
>
> with GCC configured with
>
> ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4240/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c++/scan-10.C --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/scan-11.c --target_board='unix{-m64\ -march=cascadelake}'

Re: [r12-4240 Regression] FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2 on Linux/x86_64

2021-10-08 Thread Hongtao Liu via Gcc-patches
On Sat, Oct 9, 2021 at 1:27 AM sunil.k.pandey via Gcc-patches
 wrote:
>
> On Linux/x86_64,
>
> 2b8453c401b699ed93c085d0413ab4b5030bcdb8 is the first bad commit
> commit 2b8453c401b699ed93c085d0413ab4b5030bcdb8
> Author: liuhongt 
> Date:   Mon Sep 6 13:48:49 2021 +0800
>
> Enable auto-vectorization at O2 with very-cheap cost model.
>
> caused
>
> FAIL: gcc.dg/optimize-bswapsi-5.c scan-tree-dump-times optimized "= __builtin_bswap32 \\(" 2
> FAIL: gcc.dg/optimize-bswapsi-6.c scan-tree-dump store-merging "32 bit bswap implementation found at"
> FAIL: gcc.dg/torture/pr69760.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
Those testcases should be adjusted by adding either -fno-tree-vectorize
or -mtune=generic when the target is x86, except for the runtime test
pr69760.c, which seems to be a real bug.
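A minimal sketch of the suggested adjustment, written as the header of a hypothetical DejaGnu test (not one of the failing files). dg-additional-options with a target selector restricts the extra flag to x86; the byte-swap body only stands in for the kind of code such tests scan for. -mtune=generic could be added the same way instead; it is not written as a second directive here because DejaGnu would then apply both.

```cpp
// { dg-do compile }
// { dg-options "-O2" }
// Hypothetical adjustment: keep the -O2 very-cheap vectorizer from
// rewriting the code this test scans for, on x86 targets only.
// { dg-additional-options "-fno-tree-vectorize" { target { i?86-*-* x86_64-*-* } } }

#include <cstdint>

// Byte swap written with shifts and masks, an idiom GCC's bswap
// detection typically recognizes.
std::uint32_t swap32(std::uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
       | ((x & 0x0000ff00u) << 8)
       | ((x & 0x00ff0000u) >> 8)
       | ((x & 0xff000000u) >> 24);
}
```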
> FAIL: gcc.dg/Warray-bounds-51.c  target { i?86-*-* x86_64-*-* }  (test for warnings, line 41)
> FAIL: gcc.dg/Wstringop-overflow-14.c  target { i?86-*-* x86_64-*-* }  (test for warnings, line 38)
> FAIL: g++.dg/tree-ssa/pr94403.C   scan-tree-dump-times store-merging "__builtin_bswap32" 1
> FAIL: g++.dg/tree-ssa/pr94403.C   scan-tree-dump-times store-merging "__builtin_bswap64" 1
>
> with GCC configured with
>
> ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4240/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-5.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-6.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-6.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr69760.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-51.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-51.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-14.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-14.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/tree-ssa/pr94403.C --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64\ -march=cascadelake}'"
>
> (Please do not reply to this email, for question about this report, contact 
> me at skpgkp2 at gmail dot com)



-- 
BR,
Hongtao


Re: [r12-4240 Regression] FAIL: libgomp.c++/scan-9.C scan-tree-dump-times vect "vectorized [2-6] loops" 2 on Linux/x86_64

2021-10-08 Thread Hongtao Liu via Gcc-patches
On Sat, Oct 9, 2021 at 10:53 AM Hongtao Liu  wrote:
>
> On Sat, Oct 9, 2021 at 1:27 AM sunil.k.pandey via Gcc-patches
>  wrote:
> >
> > On Linux/x86_64,
> >
> > 2b8453c401b699ed93c085d0413ab4b5030bcdb8 is the first bad commit
> > commit 2b8453c401b699ed93c085d0413ab4b5030bcdb8
> > Author: liuhongt 
> > Date:   Mon Sep 6 13:48:49 2021 +0800
> >
> > Enable auto-vectorization at O2 with very-cheap cost model.
> >
> > caused
> >
> > FAIL: gcc.dg/optimize-bswapsi-5.c scan-tree-dump-times optimized "= __builtin_bswap32 \\(" 2
> > FAIL: gcc.dg/optimize-bswapsi-6.c scan-tree-dump store-merging "32 bit bswap implementation found at"
> > FAIL: gcc.dg/torture/pr69760.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> Those testcases should be adjusted by adding either -fno-tree-vectorize
> or -mtune=generic when the target is x86, except for the runtime test
> pr69760.c, which seems to be a real bug.
Oh, it's not a runtime failure; it's an extra warning message after vectorization.

/export/users2/liuhongt/gcc/intel-innersource/O2_vectorization/gcc/testsuite/gcc.dg/torture/pr69760.c: In function 'test_func':
/export/users2/liuhongt/gcc/intel-innersource/O2_vectorization/gcc/testsuite/gcc.dg/torture/pr69760.c:16:10: warning: iteration 54 invokes undefined behavior [-Waggressive-loop-optimizations]
/export/users2/liuhongt/gcc/intel-innersource/O2_vectorization/gcc/testsuite/gcc.dg/torture/pr69760.c:12:17: note: within this loop

> > FAIL: gcc.dg/Warray-bounds-51.c  target { i?86-*-* x86_64-*-* }  (test for warnings, line 41)
> > FAIL: gcc.dg/Wstringop-overflow-14.c  target { i?86-*-* x86_64-*-* }  (test for warnings, line 38)
> > FAIL: g++.dg/tree-ssa/pr94403.C   scan-tree-dump-times store-merging "__builtin_bswap32" 1
> > FAIL: g++.dg/tree-ssa/pr94403.C   scan-tree-dump-times store-merging "__builtin_bswap64" 1
> >
> > with GCC configured with
> >
> > ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4240/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-5.c --target_board='unix{-m64\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-6.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/optimize-bswapsi-6.c --target_board='unix{-m64\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr69760.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-51.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-51.c --target_board='unix{-m64\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-14.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Wstringop-overflow-14.c --target_board='unix{-m64\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/tree-ssa/pr94403.C --target_board='unix{-m64\ -march=cascadelake}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-7.c --target_board='unix{-m64\ -march=cascadelake}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/lastprivate-conditional-8.c --target_board='unix{-m64}'"
> > $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/

Re: [PATCH] var-tracking: Fix a wrong-debug issue caused by my r10-7665 var-tracking change [PR102441]

2021-10-08 Thread Alexandre Oliva via Gcc-patches
Hello, Jakub,

On Oct  4, 2021, Jakub Jelinek  wrote:

> Finally, patch2, the shortest patch, uses MO_VAL_SET whenever destination
> is not sp and otherwise drops the micro-operation on the floor.

That sounds quite reasonable to me, and it is indeed my favorite of the
3 proposed patches, because the mapping of locations to values is kept
most accurate.

Thanks!

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] hardened conditionals

2021-10-08 Thread Alexandre Oliva via Gcc-patches


This patch introduces optional passes to harden conditionals used in
branches, and in computing boolean expressions, by adding redundant
tests of the reversed conditions, and trapping in case of unexpected
results.  Though in abstract machines the redundant tests should never
fail, CPUs may be led to misbehave under certain kinds of attacks,
such as power deprivation, and these tests reduce the likelihood of
going too far down an unexpected execution path.
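A hypothetical source-level analogue of the two transformations, in plain C++ rather than the GIMPLE the new passes actually operate on. hardened_less mimics -fharden-compares (a stored compare result is re-checked against the reversed compare), and hardened_max mimics -fharden-conditional-branches (each path re-checks the reversed condition). The volatile copies keep the optimizer from folding the redundant tests away at this level, and std::abort stands in for the trap the patch emits.

```cpp
#include <cstdlib>  // std::abort

// Analogue of -fharden-compares: a compare whose result is stored is
// verified against the reversed compare.
bool hardened_less(int a, int b)
{
  // volatile copies stop the redundant test from being folded away.
  volatile int va = a, vb = b;
  bool r = va < vb;
  bool rev = va >= vb;
  if (r == rev)  // both true or both false cannot happen on a sane CPU
    std::abort();
  return r;
}

// Analogue of -fharden-conditional-branches: each path re-checks the
// reversed condition before continuing.
int hardened_max(int a, int b)
{
  volatile int va = a, vb = b;
  if (va < vb)
    {
      if (va >= vb)  // taken branch must not see the reversed test hold
        std::abort();
      return b;
    }
  if (va < vb)       // fall-through path must not see the original hold
    std::abort();
  return a;
}
```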

This patch was regstrapped on x86_64-linux-gnu.  It was also
bootstrapped along with an extra common.opt that enables both passes
unconditionally.  Ok to install?


for  gcc/ChangeLog

* common.opt (fharden-compares): New.
(fharden-conditional-branches): New.
* doc/invoke.texi: Document new options.
* gimple-harden-conditionals.cc: New.
* passes.def: Add new passes.
* tree-pass.h (make_pass_harden_compares): Declare.
(make_pass_harden_conditional_branches): Declare.

for  gcc/ada/ChangeLog

* doc/gnat_rm/security_hardening_features.rst
(Hardened Conditionals): New.

for  gcc/testsuite/ChangeLog

* c-c++-common/torture/harden-comp.c: New.
* c-c++-common/torture/harden-cond.c: New.
---
 gcc/Makefile.in|1 
 .../doc/gnat_rm/security_hardening_features.rst|   40 ++
 gcc/common.opt |8 
 gcc/doc/invoke.texi|   19 +
 gcc/gimple-harden-conditionals.cc  |  379 
 gcc/passes.def |2 
 gcc/testsuite/c-c++-common/torture/harden-comp.c   |   14 +
 gcc/testsuite/c-c++-common/torture/harden-cond.c   |   18 +
 gcc/tree-pass.h|3 
 9 files changed, 484 insertions(+)
 create mode 100644 gcc/gimple-harden-conditionals.cc
 create mode 100644 gcc/testsuite/c-c++-common/torture/harden-comp.c
 create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cond.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 64252697573a7..7209ed117d09d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1389,6 +1389,7 @@ OBJS = \
gimple-if-to-switch.o \
gimple-iterator.o \
gimple-fold.o \
+   gimple-harden-conditionals.o \
gimple-laddress.o \
gimple-loop-interchange.o \
gimple-loop-jam.o \
diff --git a/gcc/ada/doc/gnat_rm/security_hardening_features.rst b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
index 1c46e3a4c7b88..52240d7e3dd54 100644
--- a/gcc/ada/doc/gnat_rm/security_hardening_features.rst
+++ b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
@@ -87,3 +87,43 @@ types and subtypes, may be silently ignored.  Specifically, it is not
 currently recommended to rely on any effects this pragma might be
 expected to have when calling subprograms through access-to-subprogram
 variables.
+
+
+.. Hardened Conditionals:
+
+Hardened Conditionals
+=====================
+
+GNAT can harden conditionals to protect against control flow attacks.
+
+This is accomplished by two complementary transformations, each
+activated by a separate command-line option.
+
+The option *-fharden-compares* enables hardening of compares that
+compute results stored in variables, adding verification that the
+reversed compare yields the opposite result.
+
+The option *-fharden-conditional-branches* enables hardening of
+compares that guard conditional branches, adding verification of the
+reversed compare to both execution paths.
+
+These transformations are introduced late in the compilation pipeline,
+long after boolean expressions are decomposed into separate compares,
+each one turned into either a conditional branch or a compare whose
+result is stored in a boolean variable or temporary.  Compiler
+optimizations, if enabled, may also turn conditional branches into
+stored compares, and vice-versa.  Conditionals may also be optimized
+out entirely, if their value can be determined at compile time, and
+occasionally multiple compares can be combined into one.
+
+It is thus difficult to predict which of these two options will affect
+a specific compare operation expressed in source code.  Using both
+options ensures that every compare that is not optimized out will be
+hardened.
+
+The addition of reversed compares can be observed by enabling the dump
+files of the corresponding passes, through command line options
+*-fdump-tree-hardcmp* and *-fdump-tree-hardcbr*, respectively.
+
+They are separate options, however, because of the significantly
+different performance impact of the hardening transformations.
diff --git a/gcc/common.opt b/gcc/common.opt
index e867055fc000d..89f2e6da6e56e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1719,6 +1719,14 @@ fguess-branch-probability
 Common Var(flag_guess_branch_prob) Optimization
 Enable guessing of branch probabilities.
 
+fharden-compares
+Common Var(flag_harden_compares) Optimiza

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-08 Thread Xionghu Luo via Gcc-patches
Hi,

On 2021/9/28 20:09, Richard Biener wrote:
> On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
>>
>> Updated the patch to v3.  Not sure whether you prefer the pasted style
>> and continuing to link to the previous thread, as Segher dislikes this...
>>
>>
>> [PATCH v3] Don't move cold code out of loop by checking bb count
>>
>>
>> Changes:
>> 1. Handle max_loop in determine_max_movement instead of
>> outermost_invariant_loop.
>> 2. Remove unnecessary changes.
>> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
>> 4. "gsi_next (&bsi);" in move_computations_worker is kept, since v1 ran
>> into an infinite loop without it: the iterator was actually never being
>> updated.
>>
>> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
>> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
>>
>> There was an earlier patch trying to avoid moving cold blocks out of loops:
>>
>> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>>
>> Richard suggested to "never hoist anything from a bb with lower execution
>> frequency to a bb with higher one in LIM invariantness_dom_walker
>> before_dom_children".
>>
>> In gimple LIM analysis, add find_coldest_out_loop to move invariants to
>> the expected target loop; if the profile count of the loop bb is colder
>> than the target loop preheader, the invariant won't be hoisted out of
>> the loop.  Likewise for store motion: if all locations of the REF in the
>> loop are cold, don't do store motion of it.
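The heuristic can be modelled in a few lines of C. This is a simplified, hypothetical sketch of the rule quoted above ("never hoist anything from a bb with lower execution frequency to a bb with higher one"), not the code of find_coldest_out_loop or ref_in_loop_hot_body: profile counts are plain `uint64_t` values, the function names are made up, and "cold" for store motion is assumed to mean "executed less often than the loop preheader".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model.  PREHEADER_COUNT[0] is the count of the preheader
   of the outermost candidate loop, PREHEADER_COUNT[N - 1] that of the
   innermost loop containing the invariant; BB_COUNT is the count of the
   block holding it.  Return the depth of the loop whose preheader should
   receive the invariant (the coldest preheader, preferring outer loops
   on ties), or -1 if even that preheader runs more often than the block,
   in which case the invariant is left in place.  */
int
coldest_hoist_target (const uint64_t *preheader_count, size_t n,
                      uint64_t bb_count)
{
  if (n == 0)
    return -1;
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (preheader_count[i] < preheader_count[best])
      best = i;
  if (bb_count < preheader_count[best])
    return -1;   /* hoisting would make the code run more often */
  return (int) best;
}

/* Store motion: only worthwhile if at least one location of the
   reference inside the loop is not colder than the preheader.  */
bool
worth_store_motion (const uint64_t *loc_count, size_t n,
                    uint64_t preheader_count)
{
  for (size_t i = 0; i < n; i++)
    if (loc_count[i] >= preheader_count)
      return true;
  return false;
}
```

With preheader counts {100, 10, 50}, an invariant in a block executed 60 times goes to depth 1, while one in a block executed 5 times stays put.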
>>
>> SPEC2017 performance evaluation shows a 1% improvement in the intrate
>> GEOMEAN and no obvious regressions elsewhere.  Notably, on P8LE:
>> 500.perlbench_r +7.52% (perf shows that perlbench's S_regtry function
>> improves substantially), 548.exchange2_r +1.98% and 526.blender_r
>> +1.00%.
>>
>> gcc/ChangeLog:
>>
>> * loop-invariant.c (find_invariants_bb): Check profile count
>> before motion.
>> (find_invariants_body): Add argument.
>> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
>> (determine_max_movement): Use find_coldest_out_loop.
>> (move_computations_worker): Adjust and fix iteration update.
>> (execute_sm_exit): Check pointer validness.
>> (class ref_in_loop_hot_body): New functor.
>> (ref_in_loop_hot_body::operator): New.
>> (can_sm_ref_p): Use for_all_locs_in_loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/recip-3.c: Adjust.
>> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
>> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
>> * gcc.dg/tree-ssa/ssa-lim-21.c: New test.
>> ---
>>  gcc/loop-invariant.c   | 10 ++--
>>  gcc/tree-ssa-loop-im.c | 61 --
>>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
>>  7 files changed, 165 insertions(+), 8 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index fca0c2b24be..5c3be7bf0eb 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>> call.  */
>>
>>  static void
>> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>> +   bool always_executed)
>>  {
>>rtx_insn *insn;
>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>> +
>> +  if (preheader->count > bb->count)
>> +return;
>>
>>FOR_BB_INSNS (bb, insn)
>>  {
>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>>unsigned i;
>>
>>for (i = 0; i < loop->num_nodes; i++)
>> -find_invariants_bb (body[i],
>> -   bitmap_bit_p (always_reached, i),
>> +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>> bitmap_bit_p (always_executed, i));
>>  }
>>
>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>> index 4b187c2cdaf..655fab03442 100644
>> --- a/gcc/tree-ssa-loop-im.c
>> +++ b/gcc/tree-ssa-loop-im.c
>> @@ -417,6 +417,28 @@ movement_possibility (gimple *stmt)
>>return ret;
>>  }
>>
>> +/* Find coldest loop between outmost_loop and loop by comparing profile count.  */
>> +
>> +static class loop *
>> +find_coldest_out_loop (class loop *outmost_loop, class loop *loop,
>> +  basic_block curr_bb)
>> +{

[PATCH] Adjust more testcases for O2 vectorization enabling.

2021-10-08 Thread liuhongt via Gcc-patches
Pushed to trunk.

libgomp/ChangeLog:

* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.
* testsuite/libgomp.c++/scan-11.C: Ditto.
* testsuite/libgomp.c++/scan-12.C: Ditto.
* testsuite/libgomp.c++/scan-13.C: Ditto.
* testsuite/libgomp.c++/scan-14.C: Ditto.
* testsuite/libgomp.c++/scan-15.C: Ditto.
* testsuite/libgomp.c++/scan-16.C: Ditto.
* testsuite/libgomp.c++/scan-9.C: Ditto.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c: Ditto.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c: Ditto.
* testsuite/libgomp.c/scan-11.c: Ditto.
* testsuite/libgomp.c/scan-12.c: Ditto.
* testsuite/libgomp.c/scan-13.c: Ditto.
* testsuite/libgomp.c/scan-14.c: Ditto.
* testsuite/libgomp.c/scan-15.c: Ditto.
* testsuite/libgomp.c/scan-16.c: Ditto.
* testsuite/libgomp.c/scan-17.c: Ditto.
* testsuite/libgomp.c/scan-18.c: Ditto.
* testsuite/libgomp.c/scan-19.c: Ditto.
* testsuite/libgomp.c/scan-20.c: Ditto.
* testsuite/libgomp.c/scan-21.c: Ditto.
* testsuite/libgomp.c/scan-22.c: Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr94403.C: Add -fno-tree-vectorize.
* gcc.dg/optimize-bswapsi-5.c: Ditto.
* gcc.dg/optimize-bswapsi-6.c: Ditto.
* gcc.dg/Warray-bounds-51.c: Add -mtune=generic.
* gcc.dg/Wstringop-overflow-14.c:
---
 gcc/testsuite/g++.dg/tree-ssa/pr94403.C| 2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-51.c| 3 ++-
 gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   | 3 ++-
 gcc/testsuite/gcc.dg/optimize-bswapsi-5.c  | 2 +-
 gcc/testsuite/gcc.dg/optimize-bswapsi-6.c  | 2 +-
 libgomp/testsuite/libgomp.c++/scan-10.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-11.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-12.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-13.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-14.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-15.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-16.C| 2 +-
 libgomp/testsuite/libgomp.c++/scan-9.C | 2 +-
 .../testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c | 2 +-
 .../testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c | 2 +-
 libgomp/testsuite/libgomp.c/scan-11.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-12.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-13.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-14.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-15.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-16.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-17.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-18.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-19.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-20.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-21.c  | 2 +-
 libgomp/testsuite/libgomp.c/scan-22.c  | 2 +-
 27 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr94403.C b/gcc/testsuite/g++.dg/tree-ssa/pr94403.C
index d47e7fcc5a3..5f8f868e469 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr94403.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr94403.C
@@ -3,7 +3,7 @@
 // are either big or little endian (not pdp endian).
 // { dg-do compile { target { lp64 && { i?86-*-* x86_64-*-* powerpc*-*-* aarch64*-*-* } } } }
 // { dg-require-effective-target store_merge }
-// { dg-options "-O2 -fdump-tree-store-merging -std=c++17" }
+// { dg-options "-O2 -fno-tree-vectorize -fdump-tree-store-merging -std=c++17" }
 
 namespace std {
   template 
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-51.c b/gcc/testsuite/gcc.dg/Warray-bounds-51.c
index b0b8bdb7938..cadb7a3f1b2 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-51.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-51.c
@@ -1,7 +1,8 @@
 /* PR middle-end/92333 - missing variable name referencing VLA in warnings
PR middle-end/82608 - missing -Warray-bounds on an out-of-bounds VLA index
{ dg-do compile }
-   { dg-options "-O2 -Wall" } */
+   { dg-options "-O2 -Wall" }
+   { dg-additional-options "-mtune=generic" { target { i?86-*-* x86_64-*-* } } } */
 
 void sink (void*);
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-14.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-14.c
index b648f5b41b1..83b069c3de8 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-14.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-14.c
@@ -2,7 +2,8 @@
types than char are diagnosed.
{ dg-do compile }
{ dg-require-e

Re: [PATCH] hardened conditionals

2021-10-08 Thread Richard Biener via Gcc-patches
On October 9, 2021 5:30:04 AM GMT+02:00, Alexandre Oliva via Gcc-patches wrote:
>
>This patch introduces optional passes to harden conditionals used in
>branches, and in computing boolean expressions, by adding redundant
>tests of the reversed conditions, and trapping in case of unexpected
>results.  Though in abstract machines the redundant tests should never
>fail, CPUs may be led to misbehave under certain kinds of attacks,
>such as power deprivation, and these tests reduce the likelihood of
>going too far down an unexpected execution path.
>
>This patch was regstrapped on x86_64-linux-gnu.  It was also
>bootstrapped along with an extra common.opt that enables both passes
>unconditionally.  Ok to install?

Why two passes (and two IL traversals)?

How do you prevent RTL optimizers (jump threading) from removing the redundant tests? I'd have expected such hardening to occur very late in the RTL pipeline.

Richard. 

>
>for  gcc/ChangeLog
>
>   * common.opt (fharden-compares): New.
>   (fharden-conditional-branches): New.
>   * doc/invoke.texi: Document new options.
>   * gimple-harden-conditionals.cc: New.
>   * passes.def: Add new passes.
>   * tree-pass.h (make_pass_harden_compares): Declare.
>   (make_pass_harden_conditional_branches): Declare.
>
>for  gcc/ada/ChangeLog
>
>   * doc/gnat_rm/security_hardening_features.rst
>   (Hardened Conditionals): New.
>
>for  gcc/testsuite/ChangeLog
>
>   * c-c++-common/torture/harden-comp.c: New.
>   * c-c++-common/torture/harden-cond.c: New.
>---
> gcc/Makefile.in|1 
> .../doc/gnat_rm/security_hardening_features.rst|   40 ++
> gcc/common.opt |8 
> gcc/doc/invoke.texi|   19 +
> gcc/gimple-harden-conditionals.cc  |  379 
> gcc/passes.def |2 
> gcc/testsuite/c-c++-common/torture/harden-comp.c   |   14 +
> gcc/testsuite/c-c++-common/torture/harden-cond.c   |   18 +
> gcc/tree-pass.h|3 
> 9 files changed, 484 insertions(+)
> create mode 100644 gcc/gimple-harden-conditionals.cc
> create mode 100644 gcc/testsuite/c-c++-common/torture/harden-comp.c
> create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cond.c
>
>diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>index 64252697573a7..7209ed117d09d 100644
>--- a/gcc/Makefile.in
>+++ b/gcc/Makefile.in
>@@ -1389,6 +1389,7 @@ OBJS = \
>   gimple-if-to-switch.o \
>   gimple-iterator.o \
>   gimple-fold.o \
>+  gimple-harden-conditionals.o \
>   gimple-laddress.o \
>   gimple-loop-interchange.o \
>   gimple-loop-jam.o \
>diff --git a/gcc/ada/doc/gnat_rm/security_hardening_features.rst b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
>index 1c46e3a4c7b88..52240d7e3dd54 100644
>--- a/gcc/ada/doc/gnat_rm/security_hardening_features.rst
>+++ b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
>@@ -87,3 +87,43 @@ types and subtypes, may be silently ignored.  Specifically, it is not
> currently recommended to rely on any effects this pragma might be
> expected to have when calling subprograms through access-to-subprogram
> variables.
>+
>+
>+.. Hardened Conditionals:
>+
>+Hardened Conditionals
>+=====================
>+
>+GNAT can harden conditionals to protect against control flow attacks.
>+
>+This is accomplished by two complementary transformations, each
>+activated by a separate command-line option.
>+
>+The option *-fharden-compares* enables hardening of compares that
>+compute results stored in variables, adding verification that the
>+reversed compare yields the opposite result.
>+
>+The option *-fharden-conditional-branches* enables hardening of
>+compares that guard conditional branches, adding verification of the
>+reversed compare to both execution paths.
>+
>+These transformations are introduced late in the compilation pipeline,
>+long after boolean expressions are decomposed into separate compares,
>+each one turned into either a conditional branch or a compare whose
>+result is stored in a boolean variable or temporary.  Compiler
>+optimizations, if enabled, may also turn conditional branches into
>+stored compares, and vice-versa.  Conditionals may also be optimized
>+out entirely, if their value can be determined at compile time, and
>+occasionally multiple compares can be combined into one.
>+
>+It is thus difficult to predict which of these two options will affect
>+a specific compare operation expressed in source code.  Using both
>+options ensures that every compare that is not optimized out will be
>+hardened.
>+
>+The addition of reversed compares can be observed by enabling the dump
>+files of the corresponding passes, through command line options
>+*-fdump-tree-hardcmp* and *-fdump-tree-hardcbr*, respectively.
>+
>+They are separate options, however, because of the significantly
>+differen