[PATCH] Improve sequence logic in cxx_init_decl_processing

2022-01-06 Thread Olivier Hainque via Gcc-patches
Hello,

commit aa2c978400f3b3ca6e9f2d18598a379589e77ba0, introduced per

  https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545552.html

makes references to __cxa_pure_virtual weak and this is causing
issues on some VxWorks configurations, where weak symbols are only
supported for one of the two major operating modes, and not on all
versions.

While trying to circumvent that, I noticed that the current
code in cxx_init_decl_processing does something like:

  if (flag_weak)
/* If no definition is available, resolve references to NULL.  */
declare_weak (abort_fndecl);
  ...
  if (! supports_one_only ())
flag_weak = 0;


The code possibly resetting flag_weak should presumlably execute
before the test checking the flag, or we'd need a comment explaining
why this surprising order is on purpose.

The attached patch just moves the reset above the test.

It bootstraps/regtests fine on x86_64-linux and allows better control
on vxWorks. I'm not yet clear on some of the ramifications there (tigthening
the definitions of SUPPORTS_ONE_ONLY and TARGET_SUPPORTS_WEAK yields lots of
dg test failures) but that's another story.

Ok to commit?

Thanks in advance!

2021-12-30  Olivier Hainque  

gcc/
* cp/decl.c (cxx_init_decl_processing): Move code possibly
altering flag_weak before code testing it.

Olivier



0001-Improve-sequence-logic-in-cxx_init_decl_processing.patch
Description: Binary data





[PATCH] expr: Workaround profiledbootstrap uninit false positive [PR103899]

2022-01-06 Thread Jakub Jelinek via Gcc-patches
Hi!

The threader changes resulted in a false positive warning during
profiledbootstrap:
In file included from ../../gcc/expr.c:26:
../../gcc/tree.h: In function ‘rtx_def* expand_expr_real_1(tree, rtx, 
machine_mode, expand_modifier, rtx_def**, bool)’:
../../gcc/tree.h:244:56: error: ‘context’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
  |^~~~
../../gcc/expr.c:10343:8: note: ‘context’ was declared here
10343 |   tree context;
  |^~~
While it will be nice to improve the uninit pass to handle it if possible
(I do not want to close the PR until that is done), doing profiledbootstrap
is a common thing to do, so a workaround is handy, especially as in this
case when the workaround seems to be the right thing to do, as it moves
a variable declaration to the only place where it is set and used and avoids
the weird and for uninit asking
  tree context;
...
  if (exp)
context = ...;
  gcc_assert (!exp
  || use (context)
  || use_some_more (context));

Bootstrapped/regtested on x86_64-linux and i686-linux with normal bootstrap
and on aarch64-linux and powerpc64le-linux with profiledbootstrap, ok for
trunk?

2022-01-06  Jakub Jelinek  

PR tree-optimization/103899
* expr.c (expand_expr_real_1): Add a workaround for bogus uninit
warning by moving context variable to the only spot where it is used
and moving gcc_assert into if body.

--- gcc/expr.c.jj   2022-01-03 10:40:41.203164211 +0100
+++ gcc/expr.c  2022-01-05 14:47:47.684660031 +0100
@@ -10340,7 +10340,6 @@ expand_expr_real_1 (tree exp, rtx target
   enum tree_code code = TREE_CODE (exp);
   rtx subtarget, original_target;
   int ignore;
-  tree context;
   bool reduce_bit_field;
   location_t loc = EXPR_LOCATION (exp);
   struct separate_ops ops;
@@ -10579,14 +10578,16 @@ expand_expr_real_1 (tree exp, rtx target
   /* Variables inherited from containing functions should have
 been lowered by this point.  */
   if (exp)
-   context = decl_function_context (exp);
-  gcc_assert (!exp
- || SCOPE_FILE_SCOPE_P (context)
- || context == current_function_decl
- || TREE_STATIC (exp)
- || DECL_EXTERNAL (exp)
- /* ??? C++ creates functions that are not TREE_STATIC.  */
- || TREE_CODE (exp) == FUNCTION_DECL);
+   {
+ tree context = decl_function_context (exp);
+ gcc_assert (SCOPE_FILE_SCOPE_P (context)
+ || context == current_function_decl
+ || TREE_STATIC (exp)
+ || DECL_EXTERNAL (exp)
+ /* ??? C++ creates functions that are not
+TREE_STATIC.  */
+ || TREE_CODE (exp) == FUNCTION_DECL);
+   }
 
   /* This is the case of an array whose size is to be determined
 from its initializer, while the initializer is still being parsed.

Jakub



[PATCH] c++: Ensure some more that immediate functions aren't gimplified [PR103912]

2022-01-06 Thread Jakub Jelinek via Gcc-patches
Hi!

Immediate functions should never be emitted into assembly, the FE doesn't
genericize them and does various things to ensure they aren't gimplified.
But the following testcase ICEs anyway due to that, because the consteval
function returns a lambda, and operator() of the lambda has
decl_function_context of the consteval function.  cgraphunit.c then
does:
  /* Preserve a functions function context node.  It will
 later be needed to output debug info.  */
  if (tree fn = decl_function_context (decl))
{
  cgraph_node *origin_node = cgraph_node::get_create (fn);
  enqueue_node (origin_node);
}
which enqueues the immediate function and then tries to gimplify it,
which results in ICE because it hasn't been genericized.

When I try similar testcase with constexpr instead of consteval and
static constinit auto instead of auto in main, what happens is that
the functions are gimplified, later ipa.c discovers they aren't reachable
and sets body_removed to true for them (and clears other flags) and we end
up with a debug info which has the foo and bar functions without
DW_AT_low_pc and other code specific attributes, just stuff from its BLOCK
structure and in there the lambda with DW_AT_low_pc etc.

The following patch attempts to emulate that behavior early, so that cgraph
doesn't try to gimplify those and pretends they were already gimplified
and found unused and optimized away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-06  Jakub Jelinek  

PR c++/103912
* semantics.c (expand_or_defer_fn): For immediate functions, set
node->body_removed to true and clear analyzed, definition and
force_output.
* decl2.c (c_parse_final_cleanups): Ignore immediate functions for
expand_or_defer_fn.

* g++.dg/cpp2a/consteval26.C: New test.

--- gcc/cp/semantics.c.jj   2022-01-03 10:40:48.0 +0100
+++ gcc/cp/semantics.c  2022-01-05 12:52:11.484379138 +0100
@@ -4785,6 +4785,18 @@ expand_or_defer_fn (tree fn)
   emit_associated_thunks (fn);
 
   function_depth--;
+
+  if (DECL_IMMEDIATE_FUNCTION_P (fn))
+   {
+ cgraph_node *node = cgraph_node::get (fn);
+ if (node)
+   {
+ node->body_removed = true;
+ node->analyzed = false;
+ node->definition = false;
+ node->force_output = false;
+   }
+   }
 }
 }
 
--- gcc/cp/decl2.c.jj   2022-01-03 10:40:48.083068010 +0100
+++ gcc/cp/decl2.c  2022-01-05 12:53:34.930204119 +0100
@@ -5272,6 +5272,7 @@ c_parse_final_cleanups (void)
  if (!DECL_EXTERNAL (decl)
  && decl_needed_p (decl)
  && !TREE_ASM_WRITTEN (decl)
+ && !DECL_IMMEDIATE_FUNCTION_P (decl)
  && !node->definition)
{
  /* We will output the function; no longer consider it in this
--- gcc/testsuite/g++.dg/cpp2a/consteval26.C.jj 2022-01-05 12:42:07.918878074 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/consteval26.C2022-01-05 12:40:18.853416637 
+0100
@@ -0,0 +1,39 @@
+// PR c++/103912
+// { dg-do run { target c++20 } }
+// { dg-additional-options "-O2 -g -fkeep-inline-functions" }
+
+extern "C" void abort ();
+
+struct A { A () {} };
+
+consteval auto
+foo ()
+{
+  if (1)
+;
+  return [] (A x) { return 1; };
+}
+
+consteval auto
+bar (int a)
+{
+  int b = a + 4;
+  if (1)
+;
+  return [=] (A x) { return a + b; };
+}
+
+int
+main ()
+{
+  A x;
+  auto h = foo ();
+  if (h (x) != 1)
+abort ();
+  auto i = bar (5);
+  if (i (x) != 14)
+abort ();
+  auto j = bar (42);
+  if (j (x) != 88)
+abort ();
+}

Jakub



[PATCH] [i386] Optimize V16HF vector insert to element 0 for AVX2.

2022-01-06 Thread liuhongt via Gcc-patches
Also remove mode attribute blendsuf, use ssemodesuf instead.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

PR target/103753
* config/i386/i386-expand.c (ix86_expand_vector_set): Not use
gen_avx2_pblendph_pblendd when elt == 0.
* config/i386/sse.md (avx2_pblendph): Rename to ..
(avx2_pblend_pblendd).. this, and extend to V16HI.
(*avx2_pblendw): Rename to ..
(*avx2_pblend): .. this, and extend to V16HF.
(avx2_pblendw): Rename to ..
(*avx2_pblend): .. this, and extend to V16HF.
(blendsuf): Removed.
(sse4_1_pblend): Renamed to ..
(sse4_1_pblend): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103753.c: New test.
---
 gcc/config/i386/i386-expand.c|  5 ++-
 gcc/config/i386/sse.md   | 48 +++-
 gcc/testsuite/gcc.target/i386/pr103753.c | 17 +
 3 files changed, 42 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103753.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e93ef1cafa6..0d219d6bb69 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -16245,10 +16245,11 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx 
val, int elt)
   goto half;
 
 case E_V16HFmode:
-  if (TARGET_AVX2)
+  /* For ELT == 0, vec_setv8hf_0 can save 1 vpbroadcastw.  */
+  if (TARGET_AVX2 && elt != 0)
{
  mmode = SImode;
- gen_blendm = gen_avx2_pblendph;
+ gen_blendm = gen_avx2_pblendph_pblendd;
  blendm_const = true;
  break;
}
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 033b60d9aa2..c986c73bef8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -21291,10 +21291,7 @@ (define_insn_and_split 
"*_pblendvb_lt_subreg_not"
  (lt:VI1_AVX2 (match_dup 3) (match_dup 4))] UNSPEC_BLENDV))]
   "operands[3] = gen_lowpart (mode, operands[3]);")
 
-(define_mode_attr blendsuf
-  [(V8HI "w") (V8HF "ph")])
-
-(define_insn "sse4_1_pblend"
+(define_insn "sse4_1_pblend"
   [(set (match_operand:V8_128 0 "register_operand" "=Yr,*x,x")
(vec_merge:V8_128
  (match_operand:V8_128 2 "vector_operand" "YrBm,*xBm,xm")
@@ -21313,11 +21310,11 @@ (define_insn "sse4_1_pblend"
(set_attr "mode" "TI")])
 
 ;; The builtin uses an 8-bit immediate.  Expand that.
-(define_expand "avx2_pblendw"
-  [(set (match_operand:V16HI 0 "register_operand")
-   (vec_merge:V16HI
- (match_operand:V16HI 2 "nonimmediate_operand")
- (match_operand:V16HI 1 "register_operand")
+(define_expand "avx2_pblend"
+  [(set (match_operand:V16_256 0 "register_operand")
+   (vec_merge:V16_256
+ (match_operand:V16_256 2 "nonimmediate_operand")
+ (match_operand:V16_256 1 "register_operand")
  (match_operand:SI 3 "const_0_to_255_operand")))]
   "TARGET_AVX2"
 {
@@ -21325,11 +21322,11 @@ (define_expand "avx2_pblendw"
   operands[3] = GEN_INT (val << 8 | val);
 })
 
-(define_expand "avx2_pblendph"
-  [(set (match_operand:V16HF 0 "register_operand")
-   (vec_merge:V16HF
- (match_operand:V16HF 2 "register_operand")
- (match_operand:V16HF 1 "register_operand")
+(define_expand "avx2_pblend_pblendd"
+  [(set (match_operand:V16_256 0 "register_operand")
+   (vec_merge:V16_256
+ (match_operand:V16_256 2 "register_operand")
+ (match_operand:V16_256 1 "register_operand")
  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_AVX2
   && !((INTVAL (operands[3]) & 0xff) && (INTVAL (operands[3]) & 0xff00))"
@@ -21339,7 +21336,7 @@ (define_expand "avx2_pblendph"
 emit_move_insn (operands[0], operands[1]);
   else
{
- rtx tmp = gen_reg_rtx (V16HImode);
+ rtx tmp = gen_reg_rtx (mode);
  rtx blendw_idx, blendd_idx;
 
  if (mask & 0xff)
@@ -21352,13 +21349,12 @@ (define_expand "avx2_pblendph"
 blendw_idx = GEN_INT (mask >> 8 & 0xff);
 blendd_idx = GEN_INT (240);
}
- operands[1] = lowpart_subreg (V16HImode, operands[1], V16HFmode);
- operands[2] = lowpart_subreg (V16HImode, operands[2], V16HFmode);
- emit_insn (gen_avx2_pblendw (tmp, operands[1], operands[2], blendw_idx));
+ emit_insn (gen_avx2_pblend (tmp, operands[1],
+   operands[2], blendw_idx));
 
- operands[0] = lowpart_subreg (V8SImode, operands[0], V16HFmode);
- tmp = lowpart_subreg (V8SImode, tmp, V16HImode);
- operands[1] = lowpart_subreg (V8SImode, operands[1], V16HImode);
+ operands[0] = lowpart_subreg (V8SImode, operands[0], mode);
+ tmp = lowpart_subreg (V8SImode, tmp, mode);
+ operands[1] = lowpart_subreg (V8SImode, operands[1], mode);
  emit_insn (gen_avx2_pblenddv8si (operands[0], operands[1],
  tmp, blendd_idx));
   }
@@ -21366,11 +21362,11 @@ (

[PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074]

2022-01-06 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase used to be incorrectly accepted.  The match.pd
optimization that uses address_compare punts on folding comparison
of start of one object and end of another one only when those addresses
are cast to integral types, when the comparison is done on pointer types
it assumes undefined behavior and decides to fold the comparison such
that the addresses don't compare equal even when they at runtime they
could be equal.
But C++ says it is undefined behavior and so during constant evaluation
we should reject those, so this patch adds !folding_initializer &&
check to that spot.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, address_compare has some special cases, e.g. it assumes that
static vars are never adjacent to automatic vars, which is the case
for the usual layout where automatic vars are on the stack and after
.rodata/.data sections there is heap:
  /* Assume that automatic variables can't be adjacent to global
 variables.  */
  else if (is_global_var (base0) != is_global_var (base1))
;
Is it ok that during constant evaluation we don't treat those as undefined
behavior, or shall that be with !folding_initializer && too?

Another special case is:
  if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
   || (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
   || (TREE_CODE (base0) == STRING_CST
   && TREE_CODE (base1) == STRING_CST
   && ioff0 >= 0 && ioff1 >= 0
   && ioff0 < TREE_STRING_LENGTH (base0)
   && ioff1 < TREE_STRING_LENGTH (base1)
  /* This is a too conservative test that the STRING_CSTs
 will not end up being string-merged.  */
   && strncmp (TREE_STRING_POINTER (base0) + ioff0,
   TREE_STRING_POINTER (base1) + ioff1,
   MIN (TREE_STRING_LENGTH (base0) - ioff0,
TREE_STRING_LENGTH (base1) - ioff1)) != 0))
;
  else if (!DECL_P (base0) || !DECL_P (base1))
return 2;
Here we similarly assume that vars aren't adjacent to string literals
or vice versa.  Do we need to stick !folding_initializer && to those
DECL_P vs. STRING_CST cases?  Though, because of the return 2; for
non-DECL_P that would mean rejecting comparisons like &var == &"foobar"[3]
etc. which ought to be fine, no?  So perhaps we need to watch for
decls. vs. STRING_CSTs like for DECLs whether the address is at the start
or at the end of the string literal or somewhere in between (at least
for folding_initializer)?
And yet another chapter but probably unsolvable is comparison of
string literal addresses.  I think pedantically in C++
&"foo"[0] == &"foo"[0] is undefined behavior, different occurences of
the same string literals might still not be merged in some implementations.
But constexpr const char *s = "foo"; &s[0] == &s[0] should be well defined,
and we aren't tracking anywhere whether the string literal was the same one
or different (and I think other compilers don't track that either).

2022-01-06  Jakub Jelinek  

PR c++/89074
* fold-const.c (address_compare): Punt on comparison of address of
one object with address of end of another object if
folding_initializer.

* g++.dg/cpp1y/constexpr-89074-1.C: New test.

--- gcc/fold-const.c.jj 2022-01-05 20:30:08.731806756 +0100
+++ gcc/fold-const.c2022-01-05 20:34:52.277822349 +0100
@@ -16627,7 +16627,7 @@ address_compare (tree_code code, tree ty
   /* If this is a pointer comparison, ignore for now even
  valid equalities where one pointer is the offset zero
  of one object and the other to one past end of another one.  */
-  else if (!INTEGRAL_TYPE_P (type))
+  else if (!folding_initializer && !INTEGRAL_TYPE_P (type))
 ;
   /* Assume that automatic variables can't be adjacent to global
  variables.  */
--- gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C.jj   2022-01-05 
20:43:03.696917484 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C  2022-01-05 
20:42:12.676634044 +0100
@@ -0,0 +1,28 @@
+// PR c++/89074
+// { dg-do compile { target c++14 } }
+
+constexpr bool
+foo ()
+{
+  int a[] = { 1, 2 };
+  int b[] = { 3, 4 };
+
+  if (&a[0] == &b[0])
+return false;
+
+  if (&a[1] == &b[0])
+return false;
+
+  if (&a[1] == &b[1])
+return false;
+
+  if (&a[2] == &b[1])
+return false;
+
+  if (&a[2] == &b[0])  // { dg-error "is not a constant expression" }
+return false;
+
+  return true;
+}
+
+constexpr bool a = foo ();

Jakub



Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-06 Thread Tom de Vries via Gcc-patches

On 1/5/22 15:36, Andrew Stubbs wrote:

On 05/01/2022 13:04, Tom de Vries wrote:

On 1/5/22 12:08, Tom de Vries wrote:

The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: 
sorry, unimplemented: '    ' clause on 'requires' directive not 
supported yet


UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce 
executable

...

So, I suppose I need "[PATCH] OpenMP front-end: allow requires 
dynamic_allocators" as well, I'll try again with that applied.


After applying that, I get:
...
WARNING: program timed out.
FAIL: libgomp.c/allocators-2.c execution test
WARNING: program timed out.
FAIL: libgomp.c/allocators-3.c execution test
...


It works for me.

Those tests are doing some large number of allocations repeatedly and in 
parallel to stress the atomics. They're also slightly longer running 
than the other tests.
   - allocators-2 calls omp_alloc 8080 times, over 16 kernel launches, 
some of which will fall back to PTX malloc.


I've minimized the test-case by enabling a single call in main at a 
time.  All but the last 4 take about two seconds, the last 4 hang (and 
time out at 5min).


So, this already times out for me:
...
int
main ()
{
  test (1000, omp_low_lat_mem_alloc);
  return 0;
}
...

I tried playing around with the n, and roughly there's no hang below 
100, and a hang above 200, and inbetween there may or may not be a hang.


Again the same dynamic: if there's no hang, it just takes a few seconds.

   - allocators-3 calls omp_alloc and omp_free 8 million times each, 
over 8 kernel launches, and takes about a minute to run on my device 
(whether that falls back depends entirely on how the free calls 
interleave).


Either there is a flaw in the concurrency causing some kind of deadlock, 
or else your timeout is set too short for your device. I hope it's the 
latter. We may need to tweak this.


At first glance, the above behaviour doesn't look like a too short timeout.

[ FTR, I'm using a GT 1030 with production branch driver version 470.86 
(which is one version behind the latest 470.94) ]


Thanks,
- Tom


Re: [PATCH] match.pd: Simplify 1 / X for integer X [PR95424]

2022-01-06 Thread Zhao Wei Liew via Gcc-patches
On Wed, 5 Jan 2022 at 17:55, Richard Biener 
wrote:

> On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek  wrote:
> >
> > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches
> wrote:
> > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew 
> wrote:
> > > >
> > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified
> > > > > as  (unsigned)X + 1 <= 2, it might be good to simplify it this way
> > > > > here as well?
> > > >
> > > > Yup, GCC does simplify it that way in the end, so I didn't really
> bother to simplify it here. That said, I'm open to simplifying it here as
> well, but I'm not sure how to do the unsigned cast.
> > >
> > > You'd do sth like
> > >   (with
> > > {  tree utype = unsigned_type_for (type); }
> > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) {
> > > build_int_cst (utype, 2); }) ...)
> > >
> > > extra tricky will be 1 bit integer types, I guess it might be easiest
> > > to exclude them
> > > and special case them separately - X / Y is always X for them I think,
> >
> > Note, we already have:
> >  /* X / bool_range_Y is X.  */
> >  (simplify
> >   (div @0 SSA_NAME@1)
> >   (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1))
> >@0))
> > for those.
>
> Ah, it might not handle the signed : 1 case though since -1 is not in the
> bool range.  We could generalize the above though.
>
> Richard.
>
> >
> > Jakub
> >
>

Perhaps it is possible to exclude 1-bit cases and rely on other existing
simplifications to deal with them?
In particular, GCC seems to already optimally simplify the signed and
unsigned 1-bit cases [1].

[1]:
struct S {
  unsigned int x: 1;
  int y: 1;
};
unsigned int fu(S s) {
  return 1 / s.x;
}
int fi(S s) {
  return 1 / s.y;
}

fu(S):
mov eax, 1
ret
fi(S):
mov eax, -1
ret


Re: [committed] libstdc++: Reduce template instantiations in

2022-01-06 Thread Stephan Bergmann via Gcc-patches

On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote:

Tested powerpc64le-linux, pushed to trunk.


This moves the last two template parameters of __regex_algo_impl to be
runtime function parameters instead, so that we don't need four
different instantiations for the possible ways to call it. Most of the
function (and what it instantiates) is the same in all cases, so making
them compile-time choices doesn't really have much benefit.

Use  'if constexpr' for conditions that check template parameters, so
that when we do depend on a compile-time condition we only instantiate
what we need to.

libstdc++-v3/ChangeLog:

* include/bits/regex.h (__regex_algo_impl): Change __policy and
__match_mode template parameters to be function parameters.
(regex_match, regex_search): Pass policy and match mode as
function arguments.
* include/bits/regex.tcc (__regex_algo_impl): Change template
parameters to function parameters.
* include/bits/regex_compiler.h (_RegexTranslatorBase): Use
'if constexpr' for conditions using template parameters.
(_RegexTranslator): Likewise.
* include/bits/regex_executor.tcc (_Executor::_M_handle_accept):
Likewise.
* testsuite/util/testsuite_regex.h (regex_match_debug)
(regex_search_debug): Move template arguments to function
arguments.
---
  libstdc++-v3/include/bits/regex.h | 33 +--
  libstdc++-v3/include/bits/regex.tcc   |  8 ++---
  libstdc++-v3/include/bits/regex_compiler.h|  9 ++---
  libstdc++-v3/include/bits/regex_executor.tcc  |  2 +-
  libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++---
  5 files changed, 37 insertions(+), 39 deletions(-)


Clang now fails #include  with


In file included from 
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66:
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9:
 error: unknown type name '_RegexExecutorPolicy'; did you mean 
'__detail::_RegexExecutorPolicy'?
_RegexExecutorPolicy, bool);
^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
 note: '__detail::_RegexExecutorPolicy' declared here
  enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
 ^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9:
 error: unknown type name '_RegexExecutorPolicy'; did you mean 
'__detail::_RegexExecutorPolicy'?
_RegexExecutorPolicy, bool);
^
gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
 note: '__detail::_RegexExecutorPolicy' declared here
  enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
 ^


and


diff --git a/libstdc++-v3/include/bits/regex.h 
b/libstdc++-v3/include/bits/regex.h
index 7480b0a5f97..46c168010bf 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
__detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
const basic_regex<_Cp, _Rp>&,
regex_constants::match_flag_type,
-   _RegexExecutorPolicy, bool);
+   __detail::_RegexExecutorPolicy, bool);
 
   template

friend class __detail::_Executor;
@@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
__detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
const basic_regex<_Cp, _Rp>&,
regex_constants::match_flag_type,
-   _RegexExecutorPolicy, bool);
+   __detail::_RegexExecutorPolicy, bool);
 
   // Reset contents to __size unmatched sub_match objects

   // (plus additional objects for prefix, suffix and unmatched sub).


would fix that.



Re: [PATCH] match.pd: Simplify 1 / X for integer X [PR95424]

2022-01-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 06, 2022 at 05:52:03PM +0800, Zhao Wei Liew wrote:
> On Wed, 5 Jan 2022 at 17:55, Richard Biener 
> wrote:
> 
> > On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek  wrote:
> > >
> > > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches
> > wrote:
> > > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew 
> > wrote:
> > > > >
> > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified
> > > > > > as  (unsigned)X + 1 <= 2, it might be good to simplify it this way
> > > > > > here as well?
> > > > >
> > > > > Yup, GCC does simplify it that way in the end, so I didn't really
> > bother to simplify it here. That said, I'm open to simplifying it here as
> > well, but I'm not sure how to do the unsigned cast.
> > > >
> > > > You'd do sth like
> > > >   (with
> > > > {  tree utype = unsigned_type_for (type); }
> > > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) {
> > > > build_int_cst (utype, 2); }) ...)
> > > >
> > > > extra tricky will be 1 bit integer types, I guess it might be easiest
> > > > to exclude them
> > > > and special case them separately - X / Y is always X for them I think,
> > >
> > > Note, we already have:
> > >  /* X / bool_range_Y is X.  */
> > >  (simplify
> > >   (div @0 SSA_NAME@1)
> > >   (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1))
> > >@0))
> > > for those.
> >
> > Ah, it might not handle the signed : 1 case though since -1 is not in the
> > bool range.  We could generalize the above though.
> >
> > Richard.
> >
> > >
> > > Jakub
> > >
> >
> 
> Perhaps it is possible to exclude 1-bit cases and rely on other existing
> simplifications to deal with them?
> In particular, GCC seems to already optimally simplify the signed and
> unsigned 1-bit cases [1].

Sure, I thought that was what Richi was suggesting, so ad some
 && TYPE_PRECISION (...) > 1
somewhere (after making sure it is INTEGRAL_TYPE_P), or element_precision
if it is ANY_INTEGRAL_TYPE_P and could be vector type.

The discussion was about whether we optimize even those : 1 cases
sufficiently but that can be done in different simplifications...

Jakub



Re: [committed] libstdc++: Reduce template instantiations in

2022-01-06 Thread Jonathan Wakely via Gcc-patches
On Thu, 6 Jan 2022 at 10:00, Stephan Bergmann  wrote:

> On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote:
> > Tested powerpc64le-linux, pushed to trunk.
> >
> >
> > This moves the last two template parameters of __regex_algo_impl to be
> > runtime function parameters instead, so that we don't need four
> > different instantiations for the possible ways to call it. Most of the
> > function (and what it instantiates) is the same in all cases, so making
> > them compile-time choices doesn't really have much benefit.
> >
> > Use  'if constexpr' for conditions that check template parameters, so
> > that when we do depend on a compile-time condition we only instantiate
> > what we need to.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/bits/regex.h (__regex_algo_impl): Change __policy and
> >   __match_mode template parameters to be function parameters.
> >   (regex_match, regex_search): Pass policy and match mode as
> >   function arguments.
> >   * include/bits/regex.tcc (__regex_algo_impl): Change template
> >   parameters to function parameters.
> >   * include/bits/regex_compiler.h (_RegexTranslatorBase): Use
> >   'if constexpr' for conditions using template parameters.
> >   (_RegexTranslator): Likewise.
> >   * include/bits/regex_executor.tcc (_Executor::_M_handle_accept):
> >   Likewise.
> >   * testsuite/util/testsuite_regex.h (regex_match_debug)
> >   (regex_search_debug): Move template arguments to function
> >   arguments.
> > ---
> >   libstdc++-v3/include/bits/regex.h | 33 +--
> >   libstdc++-v3/include/bits/regex.tcc   |  8 ++---
> >   libstdc++-v3/include/bits/regex_compiler.h|  9 ++---
> >   libstdc++-v3/include/bits/regex_executor.tcc  |  2 +-
> >   libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++---
> >   5 files changed, 37 insertions(+), 39 deletions(-)
>
> Clang now fails #include  with
>
> > In file included from
> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66:
> >
> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9:
> error: unknown type name '_RegexExecutorPolicy'; did you mean
> '__detail::_RegexExecutorPolicy'?
> > _RegexExecutorPolicy, bool);
> > ^
> >
> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
> note: '__detail::_RegexExecutorPolicy' declared here
> >   enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
> >  ^
> >
> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9:
> error: unknown type name '_RegexExecutorPolicy'; did you mean
> '__detail::_RegexExecutorPolicy'?
> > _RegexExecutorPolicy, bool);
> > ^
> >
> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
> note: '__detail::_RegexExecutorPolicy' declared here
> >   enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
> >  ^
>
> and
>
> > diff --git a/libstdc++-v3/include/bits/regex.h
> b/libstdc++-v3/include/bits/regex.h
> > index 7480b0a5f97..46c168010bf 100644
> > --- a/libstdc++-v3/include/bits/regex.h
> > +++ b/libstdc++-v3/include/bits/regex.h
> > @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
> > const basic_regex<_Cp, _Rp>&,
> > regex_constants::match_flag_type,
> > -   _RegexExecutorPolicy, bool);
> > +   __detail::_RegexExecutorPolicy,
> bool);
> >
> >template
> > friend class __detail::_Executor;
> > @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
> > const basic_regex<_Cp, _Rp>&,
> > regex_constants::match_flag_type,
> > -   _RegexExecutorPolicy, bool);
> > +   __detail::_RegexExecutorPolicy,
> bool);
> >
> >// Reset contents to __size unmatched sub_match objects
> >// (plus additional objects for prefix, suffix and unmatched sub).
>
> would fix that.
>


I'll make the change, but this looks like a clang bug:
https://godbolt.org/z/bozxYErrc


[PATCH v2] match.pd: Simplify 1 / X for integer X [PR95424]

2022-01-06 Thread Zhao Wei Liew via Gcc-patches
This patch implements an optimization for the following C++ code:

int f(int x) {
return 1 / x;
}

int f(unsigned int x) {
return 1 / x;
}

Before this patch, x86-64 gcc -std=c++20 -O3 produces the following
assembly:

f(int):
xor edx, edx
mov eax, 1
idiv edi
ret
f(unsigned int):
xor edx, edx
mov eax, 1
div edi
ret

In comparison, clang++ -std=c++20 -O3 produces the following assembly:

f(int):
lea ecx, [rdi + 1]
xor eax, eax
cmp ecx, 3
cmovb eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret

Clang's output is more efficient as it avoids expensive div operations.

With this patch, GCC now produces the following assembly:

f(int):
lea eax, [rdi + 1]
cmp eax, 2
mov eax, 0
cmovbe eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret

which is virtually identical to Clang's assembly output. Any slight
differences
in the output for f(int) is possibly related to a different missed
optimization.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
Changes from v1:
1. Refactor common if conditions.
2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.

gcc/ChangeLog:

* match.pd: Simplify 1 / X where X is an integer.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/divide-6.c: New test.
* gcc.dg/tree-ssa/divide-7.c: New test.
---
 gcc/match.pd | 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/divide-6.c |  9 +
 gcc/testsuite/gcc.dg/tree-ssa/divide-7.c |  9 +
 3 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-7.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 84c9b918041..52a0f77f455 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -432,6 +432,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && TYPE_UNSIGNED (type))
   (trunc_div @0 @1)))

+ /* 1 / X -> X == 1 for unsigned integer X.
+1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
+But not for 1 / 0 so that we can get proper warnings and errors,
+and not for 1-bit integers as they are edge cases better handled
elsewhere. */
+(simplify
+  (trunc_div integer_onep@0 @1)
+  (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION
(type) > 1)
+(switch
+  (if (TYPE_UNSIGNED (type))
+(eq @1 { build_one_cst (type); }))
+  (if (!TYPE_UNSIGNED (type))
+(with { tree utype = unsigned_type_for (type); }
+  (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) {
build_int_cst (utype, 2); })
+@1 { build_zero_cst (type); }))
+
 /* Combine two successive divisions.  Note that combining ceil_div
and floor_div is trickier and combining round_div even more so.  */
 (for div (trunc_div exact_div)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
new file mode 100644
index 000..a9fc4c04058
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned int f(unsigned int x) {
+  return 1 / x;
+}
+
+/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
+/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
new file mode 100644
index 000..285279af7c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+int f(int x) {
+  return 1 / x;
+}
+
+/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
+/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */
-- 
2.17.1


Re: [committed] libstdc++: Reduce template instantiations in

2022-01-06 Thread Jonathan Wakely via Gcc-patches
On Thu, 6 Jan 2022 at 10:33, Jonathan Wakely  wrote:

>
>
> On Thu, 6 Jan 2022 at 10:00, Stephan Bergmann  wrote:
>
>> On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote:
>> > Tested powerpc64le-linux, pushed to trunk.
>> >
>> >
>> > This moves the last two template parameters of __regex_algo_impl to be
>> > runtime function parameters instead, so that we don't need four
>> > different instantiations for the possible ways to call it. Most of the
>> > function (and what it instantiates) is the same in all cases, so making
>> > them compile-time choices doesn't really have much benefit.
>> >
>> > Use  'if constexpr' for conditions that check template parameters, so
>> > that when we do depend on a compile-time condition we only instantiate
>> > what we need to.
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> >   * include/bits/regex.h (__regex_algo_impl): Change __policy and
>> >   __match_mode template parameters to be function parameters.
>> >   (regex_match, regex_search): Pass policy and match mode as
>> >   function arguments.
>> >   * include/bits/regex.tcc (__regex_algo_impl): Change template
>> >   parameters to function parameters.
>> >   * include/bits/regex_compiler.h (_RegexTranslatorBase): Use
>> >   'if constexpr' for conditions using template parameters.
>> >   (_RegexTranslator): Likewise.
>> >   * include/bits/regex_executor.tcc (_Executor::_M_handle_accept):
>> >   Likewise.
>> >   * testsuite/util/testsuite_regex.h (regex_match_debug)
>> >   (regex_search_debug): Move template arguments to function
>> >   arguments.
>> > ---
>> >   libstdc++-v3/include/bits/regex.h | 33 +--
>> >   libstdc++-v3/include/bits/regex.tcc   |  8 ++---
>> >   libstdc++-v3/include/bits/regex_compiler.h|  9 ++---
>> >   libstdc++-v3/include/bits/regex_executor.tcc  |  2 +-
>> >   libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++---
>> >   5 files changed, 37 insertions(+), 39 deletions(-)
>>
>> Clang now fails #include  with
>>
>> > In file included from
>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66:
>> >
>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9:
>> error: unknown type name '_RegexExecutorPolicy'; did you mean
>> '__detail::_RegexExecutorPolicy'?
>> > _RegexExecutorPolicy, bool);
>> > ^
>> >
>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
>> note: '__detail::_RegexExecutorPolicy' declared here
>> >   enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
>> >  ^
>> >
>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9:
>> error: unknown type name '_RegexExecutorPolicy'; did you mean
>> '__detail::_RegexExecutorPolicy'?
>> > _RegexExecutorPolicy, bool);
>> > ^
>> >
>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
>> note: '__detail::_RegexExecutorPolicy' declared here
>> >   enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
>> >  ^
>>
>> and
>>
>> > diff --git a/libstdc++-v3/include/bits/regex.h
>> b/libstdc++-v3/include/bits/regex.h
>> > index 7480b0a5f97..46c168010bf 100644
>> > --- a/libstdc++-v3/include/bits/regex.h
>> > +++ b/libstdc++-v3/include/bits/regex.h
>> > @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
>> > const basic_regex<_Cp, _Rp>&,
>> > regex_constants::match_flag_type,
>> > -   _RegexExecutorPolicy, bool);
>> > +   __detail::_RegexExecutorPolicy,
>> bool);
>> >
>> >template
>> > friend class __detail::_Executor;
>> > @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
>> > const basic_regex<_Cp, _Rp>&,
>> > regex_constants::match_flag_type,
>> > -   _RegexExecutorPolicy, bool);
>> > +   __detail::_RegexExecutorPolicy,
>> bool);
>> >
>> >// Reset contents to __size unmatched sub_match objects
>> >// (plus additional objects for prefix, suffix and unmatched
>> sub).
>>
>> would fix that.
>>
>
>
> I'll make the change, but this looks like a clang bug:
> https://godbolt.org/z/bozxYErrc
>

Maybe this one: https://github.com/llvm/llvm-project/issues/38230


Re: [PATCH] expr: Workaround profiledbootstrap uninit false positive [PR103899]

2022-01-06 Thread Richard Biener via Gcc-patches
On January 6, 2022 9:38:40 AM GMT+01:00, Jakub Jelinek via Gcc-patches 
 wrote:
>Hi!
>
>The threader changes resulted in a false positive warning during
>profiledbootstrap:
>In file included from ../../gcc/expr.c:26:
>../../gcc/tree.h: In function ‘rtx_def* expand_expr_real_1(tree, rtx, 
>machine_mode, expand_modifier, rtx_def**, bool)’:
>../../gcc/tree.h:244:56: error: ‘context’ may be used uninitialized in this 
>function [-Werror=maybe-uninitialized]
>  244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
>  |^~~~
>../../gcc/expr.c:10343:8: note: ‘context’ was declared here
>10343 |   tree context;
>  |^~~
>While it will be nice to improve the uninit pass to handle it if possible
>(I do not want to close the PR until that is done), doing profiledbootstrap
>is a common thing to do, so a workaround is handy, especially as in this
>case when the workaround seems to be the right thing to do, as it moves
>a variable declaration to the only place where it is set and used and avoids
>the weird and for uninit asking
>  tree context;
>...
>  if (exp)
>context = ...;
>  gcc_assert (!exp
>  || use (context)
>  || use_some_more (context));
>
>Bootstrapped/regtested on x86_64-linux and i686-linux with normal bootstrap
>and on aarch64-linux and powerpc64le-linux with profiledbootstrap, ok for
>trunk?

Ok. 

Richard. 

>2022-01-06  Jakub Jelinek  
>
>   PR tree-optimization/103899
>   * expr.c (expand_expr_real_1): Add a workaround for bogus uninit
>   warning by moving context variable to the only spot where it is used
>   and moving gcc_assert into if body.
>
>--- gcc/expr.c.jj  2022-01-03 10:40:41.203164211 +0100
>+++ gcc/expr.c 2022-01-05 14:47:47.684660031 +0100
>@@ -10340,7 +10340,6 @@ expand_expr_real_1 (tree exp, rtx target
>   enum tree_code code = TREE_CODE (exp);
>   rtx subtarget, original_target;
>   int ignore;
>-  tree context;
>   bool reduce_bit_field;
>   location_t loc = EXPR_LOCATION (exp);
>   struct separate_ops ops;
>@@ -10579,14 +10578,16 @@ expand_expr_real_1 (tree exp, rtx target
>   /* Variables inherited from containing functions should have
>been lowered by this point.  */
>   if (exp)
>-  context = decl_function_context (exp);
>-  gcc_assert (!exp
>-|| SCOPE_FILE_SCOPE_P (context)
>-|| context == current_function_decl
>-|| TREE_STATIC (exp)
>-|| DECL_EXTERNAL (exp)
>-/* ??? C++ creates functions that are not TREE_STATIC.  */
>-|| TREE_CODE (exp) == FUNCTION_DECL);
>+  {
>+tree context = decl_function_context (exp);
>+gcc_assert (SCOPE_FILE_SCOPE_P (context)
>+|| context == current_function_decl
>+|| TREE_STATIC (exp)
>+|| DECL_EXTERNAL (exp)
>+/* ??? C++ creates functions that are not
>+   TREE_STATIC.  */
>+|| TREE_CODE (exp) == FUNCTION_DECL);
>+  }
> 
>   /* This is the case of an array whose size is to be determined
>from its initializer, while the initializer is still being parsed.
>
>   Jakub
>



Re: [PATCH] [RTL/fwprop] Allow propagations from inner loop to outer loop.

2022-01-06 Thread Richard Biener via Gcc-patches
On January 6, 2022 7:51:48 AM GMT+01:00, liuhongt  wrote:
>> that's flow_loop_nested_p (loop *outer, loop *inner) which
>> is implemented in O(1).  Note behavior for outer == inner
>> might be different (didn't check your implementation too hard)
>>
>Thanks, it seems flow_loop_nested_p assume outer and inner not to be
>NULL. So I add some conditions to check NULL which is considered as an outer
> loop of any other loop.

Huh, loop_father should never be NULL. Maybe when fwprop is run after RTL loop 
opts you instead want to add a check for current_loops or alternelatively 
initialize loops in fwprop. 

>
>
>gcc/ChangeLog:
>
>   PR rtl/103750
>   * fwprop.c (forward_propagate_into): Allow propagations from
>   inner loop to outer loop.
>
>gcc/testsuite/ChangeLog:
>
>   * g++.target/i386/pr103750-fwprop-1.C: New test.
>---
> gcc/fwprop.c  |  7 +++--
> .../g++.target/i386/pr103750-fwprop-1.C   | 26 +++
> 2 files changed, 31 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C
>
>diff --git a/gcc/fwprop.c b/gcc/fwprop.c
>index 2eab4fd4614..af2e9d1c189 100644
>--- a/gcc/fwprop.c
>+++ b/gcc/fwprop.c
>@@ -866,10 +866,13 @@ forward_propagate_into (use_info *use, bool 
>reg_prop_only = false)
>   rtx src = SET_SRC (def_set);
> 
>   /* Allow propagations into a loop only for reg-to-reg copies, since
>- replacing one register by another shouldn't increase the cost.  */
>+ replacing one register by another shouldn't increase the cost.
>+ Propagations from inner loop to outer loop should be also ok.  */
>   struct loop *def_loop = def_insn->bb ()->cfg_bb ()->loop_father;
>   struct loop *use_loop = use->bb ()->cfg_bb ()->loop_father;
>-  if ((reg_prop_only || def_loop != use_loop)
>+  if ((reg_prop_only
>+   || (use_loop && def_loop != use_loop
>+ &&(!def_loop || !flow_loop_nested_p (use_loop, def_loop
>   && (!reg_single_def_p (dest) || !reg_single_def_p (src)))
> return false;
> 
>diff --git a/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C 
>b/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C
>new file mode 100644
>index 000..26987d307aa
>--- /dev/null
>+++ b/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C
>@@ -0,0 +1,26 @@
>+/* PR target/103750.  */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -std=c++1y -march=cannonlake -fdump-rtl-fwprop1" } */
>+/* { dg-final { scan-rtl-dump-not "subreg:HI\[ 
>\\\(\]*reg:SI\[^\n]*\n\[^\n]*UNSPEC_TZCNT" "fwprop1" } } */
>+
>+#include
>+const char16_t *qustrchr(char16_t *n, char16_t *e, char16_t c) noexcept
>+{
>+  __m256i mch256 = _mm256_set1_epi16(c);
>+  for ( ; n < e; n += 32) {
>+__m256i data1 = _mm256_loadu_si256(reinterpret_cast(n));
>+__m256i data2 = _mm256_loadu_si256(reinterpret_cast(n) + 
>1);
>+__mmask16 mask1 = _mm256_cmpeq_epu16_mask(data1, mch256);
>+__mmask16 mask2 = _mm256_cmpeq_epu16_mask(data2, mch256);
>+if (_kortestz_mask16_u8(mask1, mask2))
>+  continue;
>+
>+unsigned idx = _tzcnt_u32(mask1);
>+if (mask1 == 0) {
>+  idx = __tzcnt_u16(mask2);
>+  n += 16;
>+}
>+return n + idx;
>+  }
>+  return e;
>+}



Re: [PATCH] LTO plugin: modernize a bit.

2022-01-06 Thread Martin Liška

On 1/5/22 18:43, Jeff Law wrote:

But the timing isn't great.  We're in stage3 and presumably moving into stage4 
shortly. Shouldn't this wait until stage1 re-opens?


Agree, I'm going to prepare the changes for GCC 13, same as LTO plugin support 
for ld.mold.

Cheers,
Martin


[PATCH] [i386] Support commutative alternative for AVX512 vpcmpeq{b, w, d, q}

2022-01-06 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/sse.md
(*_eq3_1): Extend to
UNSPEC_PCMP_UNSIGNED.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103774.c: New test.
* gcc.target/i386/avx512bw-vpcmpequb-1.c: Adjust scan assembler
from vpcmpub to (?:vpcmpub|vpcmpeqb).
* gcc.target/i386/avx512bw-vpcmpequw-1.c: Ditto.
* gcc.target/i386/avx512bw-vpcmpub-1.c: Ditto.
* gcc.target/i386/avx512bw-vpcmpuw-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpequd-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpequq-1.c: Ditto.
* gcc.target/i386/avx512f-vpcmpud-1.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpequd-1.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpequq-1.c: Ditto.
* gcc.target/i386/avx512vl-vpcmpuq-1.c: Ditto.
---
 gcc/config/i386/sse.md| 64 +--
 .../gcc.target/i386/avx512bw-vpcmpequb-1.c| 12 ++--
 .../gcc.target/i386/avx512bw-vpcmpequw-1.c| 12 ++--
 .../gcc.target/i386/avx512bw-vpcmpub-1.c  |  2 +-
 .../gcc.target/i386/avx512bw-vpcmpuw-1.c  |  2 +-
 .../gcc.target/i386/avx512f-vpcmpequd-1.c |  4 +-
 .../gcc.target/i386/avx512f-vpcmpequq-1.c |  4 +-
 .../gcc.target/i386/avx512f-vpcmpud-1.c   |  2 +-
 .../gcc.target/i386/avx512vl-vpcmpequd-1.c|  8 +--
 .../gcc.target/i386/avx512vl-vpcmpequq-1.c|  8 +--
 .../gcc.target/i386/avx512vl-vpcmpuq-1.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pr103774.c  | 25 
 12 files changed, 85 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103774.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fa1d56ae3e3..35360004818 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3895,6 +3895,22 @@ (define_insn_and_split "*_cmp3"
   UNSPEC_PCMP_ITER))]
   "operands[4] = GEN_INT (INTVAL (operands[3]) ^ 4);")
 
+(define_insn "*_eq3_1"
+  [(set (match_operand: 0 "register_operand" "=k,k")
+   (unspec:
+ [(match_operand:VI12_AVX512VL 1 "nonimm_or_0_operand" "%v,v")
+  (match_operand:VI12_AVX512VL 2 "nonimm_or_0_operand" "vm,C")
+  (const_int 0)]
+ UNSPEC_PCMP_ITER))]
+  "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   vpcmpeq\t{%2, %1, 
%0|%0, %1, %2}
+   vptestnm\t{%1, %1, 
%0|%0, %1, %1}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
 (define_insn "_ucmp3"
   [(set (match_operand: 0 "register_operand" "=k")
(unspec:
@@ -3977,6 +3993,22 @@ (define_insn_and_split 
"*_ucmp3_zero_extend")])
 
+(define_insn "*_eq3_1"
+  [(set (match_operand: 0 "register_operand" "=k,k")
+   (unspec:
+ [(match_operand:VI48_AVX512VL 1 "nonimm_or_0_operand" "%v,v")
+  (match_operand:VI48_AVX512VL 2 "nonimm_or_0_operand" "vm,C")
+  (const_int 0)]
+ UNSPEC_PCMP_ITER))]
+  "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   vpcmpeq\t{%2, %1, 
%0|%0, %1, %2}
+   vptestnm\t{%1, %1, 
%0|%0, %1, %1}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
 (define_insn "_ucmp3"
   [(set (match_operand: 0 "register_operand" "=k")
(unspec:
@@ -16248,38 +16280,6 @@ (define_expand 
"_eq3"
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
 
-(define_insn "*_eq3_1"
-  [(set (match_operand: 0 "register_operand" "=k,k")
-   (unspec:
- [(match_operand:VI12_AVX512VL 1 "nonimm_or_0_operand" "%v,v")
-  (match_operand:VI12_AVX512VL 2 "nonimm_or_0_operand" "vm,C")
-  (const_int 0)]
- UNSPEC_PCMP))]
-  "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "@
-   vpcmpeq\t{%2, %1, 
%0|%0, %1, %2}
-   vptestnm\t{%1, %1, 
%0|%0, %1, %1}"
-  [(set_attr "type" "ssecmp")
-   (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
-(define_insn "*_eq3_1"
-  [(set (match_operand: 0 "register_operand" "=k,k")
-   (unspec:
- [(match_operand:VI48_AVX512VL 1 "nonimm_or_0_operand" "%v,v")
-  (match_operand:VI48_AVX512VL 2 "nonimm_or_0_operand" "vm,C")
-  (const_int 0)]
- UNSPEC_PCMP))]
-  "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "@
-   vpcmpeq\t{%2, %1, 
%0|%0, %1, %2}
-   vptestnm\t{%1, %1, 
%0|%0, %1, %1}"
-  [(set_attr "type" "ssecmp")
-   (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
 (define_insn "*sse4_1_eqv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
(eq:V2DI
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vpcmpequb-1.c 
b/gcc/testsuite/gcc.target/i386/avx512bw-vpcmpequb-1.c
index 31a0afc2f17..9a85753c9fb 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bw-vpcmpequb-1.c
+++

Re: [PATCH] Libquadmath: add nansq() function

2022-01-06 Thread FX via Gcc-patches
Hi Joseph,

> All targets with _Float128 should have __builtin_nansf128, since we have
> DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, 
> ATTR_CONST_NOTHROW_NONNULL)
> in builtins.def.

Hum, I see, I didn’t know that version existed. To be honest, I find the “other 
built-ins” doc page to be quite confusing to read: 
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

Thanks!

FX



Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]

2022-01-06 Thread David Malcolm via Gcc-patches
On Sun, 2021-12-19 at 22:30 +0100, Marc Nieper-Wißkirchen wrote:
> This patch fixes a memory leak in the pass manager. In the existing
> code,
> the m_name_to_pass_map is allocated in
> pass_manager::register_pass_name, but
> never deallocated.  This is fixed by adding a deletion in
> pass_manager::~pass_manager.  Moreover the string keys in
> m_name_to_pass_map are
> all dynamically allocated.  To free them, this patch adds a new hash
> trait for
> string hashes that are to be freed when the corresponding hash entry
> is removed.
> 
> This fix is particularly relevant for using GCC as a library through
> libgccjit.
> The memory leak also occurs when libgccjit is instructed to use an
> external
> driver.
> 
> Before the patch, compiling the hello world example of libgccjit with
> the external driver under Valgrind shows a loss of 12,611 (48 direct)
> bytes.  After the patch, no memory leaks are reported anymore.
> (Memory leaks occurring when using the internal driver are mostly in
> the driver code in gcc/gcc.c and have to be fixed separately.)
> 
> The patch has been tested by fully bootstrapping the compiler with
> the
> frontends C, C++, Fortran, LTO, ObjC, JIT and running the test suite
> under a x86_64-pc-linux-gnu host.

Thanks for the patch.

It looks correct to me, given that pass_manager::register_pass_name
does an xstrdup and puts the result in the map.

That said:
- I'm not officially a reviewer for this part of gcc (though I probably
touched this code last)
- is it cleaner to instead change m_name_to_pass_map's key type from
const char * to char *, to convey that the map "owns" the name?  That
way we probably wouldn't need struct typed_const_free_remove, and (I
hope) works better with the type system.

Dave

> 
> gcc/ChangeLog:
> 
> PR jit/63854
> * hash-traits.h (struct typed_const_free_remove): New.
> (struct free_string_hash): New.
> * pass_manager.h: Use free_string_hash.
> * passes.c (pass_manager::register_pass_name): Use
> free_string_hash.
>     (pass_manager::~pass_manager): Delete allocated
> m_name_to_pass_map.
> ---
>  gcc/hash-traits.h  | 17 +
>  gcc/pass_manager.h |  3 +--
>  gcc/passes.c   |  5 +++--
>  3 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/hash-traits.h b/gcc/hash-traits.h
> index 6f0373ec27f..1c08d6874ab 100644
> --- a/gcc/hash-traits.h
> +++ b/gcc/hash-traits.h
> @@ -28,6 +28,11 @@ struct typed_free_remove
>    static inline void remove (Type *p);
>  };
>  
> +template 
> +struct typed_const_free_remove
> +{
> +  static inline void remove (const Type *p);
> +};
>  
>  /* Remove with free.  */
>  
> @@ -38,6 +43,13 @@ typed_free_remove ::remove (Type *p)
>    free (p);
>  }
>  
> +template 
> +inline void
> +typed_const_free_remove ::remove (const Type *p)
> +{
> +  free (const_cast  (p));
> +}
> +
>  /* Helpful type for removing with delete.  */
>  
>  template 
> @@ -305,6 +317,11 @@ struct ggc_ptr_hash : pointer_hash ,
> ggc_remove  {};
>  template 
>  struct ggc_cache_ptr_hash : pointer_hash , ggc_cache_remove 
> {};
>  
> +/* Traits for string elements that should be freed when an element
> is
> +   deleted.  */
> +
> +struct free_string_hash : string_hash, typed_const_free_remove
>  {};
> +
>  /* Traits for string elements that should not be freed when an
> element
>     is deleted.  */
>  
> diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
> index aaf72cf6803..f5615e1fda8 100644
> --- a/gcc/pass_manager.h
> +++ b/gcc/pass_manager.h
> @@ -106,7 +106,7 @@ private:
>  
>  private:
>    context *m_ctxt;
> -  hash_map *m_name_to_pass_map;
> +  hash_map *m_name_to_pass_map;
>  
>    /* References to all of the individual passes.
>   These fields are generated via macro expansion.
> @@ -146,4 +146,3 @@ private:
>  } // namespace gcc
>  
>  #endif /* ! GCC_PASS_MANAGER_H */
> -
> diff --git a/gcc/passes.c b/gcc/passes.c
> index 4bea6ae5b6a..0c70ece5321 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -903,7 +903,7 @@ void
>  pass_manager::register_pass_name (opt_pass *pass, const char *name)
>  {
>    if (!m_name_to_pass_map)
> -    m_name_to_pass_map = new hash_map *> (256);
> +    m_name_to_pass_map = new hash_map
> (256);
>  
>    if (m_name_to_pass_map->get (name))
>  return; /* Ignore plugin passes.  */
> @@ -1674,6 +1674,7 @@ pass_manager::~pass_manager ()
>    GCC_PASS_LISTS
>  #undef DEF_PASS_LIST
>  
> +  delete m_name_to_pass_map;
>  }
>  
>  /* If we are in IPA mode (i.e., current_function_decl is NULL), call
> @@ -1943,7 +1944,7 @@ pass_manager::dump_profile_report () const
>    " |in count |out
> prob "
>    "|in count  |out prob  "
>    "|size   |time  |\n");
> -  
> +
>    for (int i = 1; i < passes_by_id_size; i++)
>  if (profile_record[i].run)
>    {




Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]

2022-01-06 Thread David Malcolm via Gcc-patches
On Thu, 2022-01-06 at 08:53 -0500, David Malcolm wrote:
> On Sun, 2021-12-19 at 22:30 +0100, Marc Nieper-Wißkirchen wrote:
> > This patch fixes a memory leak in the pass manager. In the existing
> > code,
> > the m_name_to_pass_map is allocated in
> > pass_manager::register_pass_name, but
> > never deallocated.  This is fixed by adding a deletion in
> > pass_manager::~pass_manager.  Moreover the string keys in
> > m_name_to_pass_map are
> > all dynamically allocated.  To free them, this patch adds a new hash
> > trait for
> > string hashes that are to be freed when the corresponding hash entry
> > is removed.
> > 
> > This fix is particularly relevant for using GCC as a library through
> > libgccjit.
> > The memory leak also occurs when libgccjit is instructed to use an
> > external
> > driver.
> > 
> > Before the patch, compiling the hello world example of libgccjit with
> > the external driver under Valgrind shows a loss of 12,611 (48 direct)
> > bytes.  After the patch, no memory leaks are reported anymore.
> > (Memory leaks occurring when using the internal driver are mostly in
> > the driver code in gcc/gcc.c and have to be fixed separately.)
> > 
> > The patch has been tested by fully bootstrapping the compiler with
> > the
> > frontends C, C++, Fortran, LTO, ObjC, JIT and running the test suite
> > under a x86_64-pc-linux-gnu host.
> 
> Thanks for the patch.
> 
> It looks correct to me, given that pass_manager::register_pass_name
> does an xstrdup and puts the result in the map.
> 
> That said:
> - I'm not officially a reviewer for this part of gcc (though I probably
> touched this code last)
> - is it cleaner to instead change m_name_to_pass_map's key type from
> const char * to char *, to convey that the map "owns" the name?  That
> way we probably wouldn't need struct typed_const_free_remove, and (I
> hope) works better with the type system.


[...snip...]

> 
> > diff --git a/gcc/passes.c b/gcc/passes.c
> > index 4bea6ae5b6a..0c70ece5321 100644
> > --- a/gcc/passes.c
> > +++ b/gcc/passes.c

[...snip...]

> > @@ -1943,7 +1944,7 @@ pass_manager::dump_profile_report () const
> >    " |in count |out
> > prob "
> >    "|in count  |out prob  "
> >    "|size   |time  |\n");
> > -  
> > +
> >    for (int i = 1; i < passes_by_id_size; i++)
> >  if (profile_record[i].run)
> >    {
> 

...and there's a stray whitespace change here (in
pass_manager::dump_profile_report), which probably shouldn't be in the
patch.

Dave



PING (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))

2022-01-06 Thread David Malcolm via Gcc-patches
On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> This patch adds a new __attribute__ ((tainted)) to the C/C++
> frontends.

Ping for GCC C/C++ mantainers for review of the C/C++ FE parts of this
patch (attribute registration, documentation, the name of the
attribute, etc).

(I believe it's independent of the rest of the patch kit, in that it
could go into trunk without needing the prior patches)

Thanks
Dave


> 
> It can be used on function decls: the analyzer will treat as tainted
> all parameters to the function and all buffers pointed to by
> parameters
> to the function.  Adding this in one place to the Linux kernel's
> __SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
> having tainted inputs.  This gives additional testing beyond e.g.
> __user
> pointers added by earlier patches - an example of the use of this can
> be
> seen in CVE-2011-2210, where given:
> 
>  SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *,
> buffer,
>  unsigned long, nbytes, int __user *, start, void
> __user *, arg)
> 
> the analyzer will treat the nbytes param as under attacker control,
> and
> can complain accordingly:
> 
> taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
> taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled
> value
>   ‘nbytes’ as size without upper-bounds checking [CWE-129] [-
> Wanalyzer-tainted-size]
>    69 | if (copy_to_user(buffer, hwrpb, nbytes) != 0)
>   | ^~~
> 
> Additionally, the patch allows the attribute to be used on field
> decls:
> specifically function pointers.  Any function used as an initializer
> for such a field gets treated as tainted.  An example can be seen in
> CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
> callback of configfs_attribute:
> 
>   struct configfs_attribute {
>  /* [...snip...] */
>  ssize_t (*store)(struct config_item *, const char *, size_t)
>    __attribute__((tainted));
>  /* [...snip...] */
>   };
> 
> allows the analyzer to see:
> 
>  CONFIGFS_ATTR(gadget_dev_desc_, UDC);
> 
> and treat gadget_dev_desc_UDC_store as tainted, so that it complains:
> 
> taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
> taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled
> value
>   ‘len + 18446744073709551615’ as offset without upper-bounds
> checking [CWE-823] [-Wanalyzer-tainted-offset]
>    33 | if (name[len - 1] == '\n')
>   | ^
> 
> Similarly, the attribute could be used on the ioctl callback field,
> USB device callbacks, network-handling callbacks etc.  This
> potentially
> gives a lot of test coverage with relatively little code annotation,
> and
> without necessarily needing link-time analysis (which -fanalyzer can
> only do at present on trivial examples).
> 
> I believe this is the first time we've had an attribute on a field.
> If that's an issue, I could prepare a version of the patch that
> merely allowed it on functions themselves.
> 
> As before this currently still needs -fanalyzer-checker=taint (in
> addition to -fanalyzer).
> 
> gcc/analyzer/ChangeLog:
> * engine.cc: Include "stringpool.h", "attribs.h", and
> "tree-dfa.h".
> (mark_params_as_tainted): New.
> (class tainted_function_custom_event): New.
> (class tainted_function_info): New.
> (exploded_graph::add_function_entry): Handle functions with
> "tainted" attribute.
> (class tainted_field_custom_event): New.
> (class tainted_callback_custom_event): New.
> (class tainted_call_info): New.
> (add_tainted_callback): New.
> (add_any_callbacks): New.
> (exploded_graph::build_initial_worklist): Find callbacks that
> are
> reachable from global initializers, calling add_any_callbacks
> on
> them.
> 
> gcc/c-family/ChangeLog:
> * c-attribs.c (c_common_attribute_table): Add "tainted".
> (handle_tainted_attribute): New.
> 
> gcc/ChangeLog:
> * doc/extend.texi (Function Attributes): Note that "tainted"
> can
> be used on field decls.
> (Common Function Attributes): Add entry on "tainted"
> attribute.
> 
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/attr-tainted-1.c: New test.
> * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
> * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
> * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test.
> * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test.
> * gcc.dg/analyzer/taint-CVE-2020-13143.h: New test.
> * gcc.dg/analyzer/taint-alloc-3.c: New test.
> * gcc.dg/analyzer/taint-alloc-4.c: New test.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/analyzer/engine.cc    | 317
> +-
>  gcc/c-family/c-attribs.c  |  36 ++
>  gcc/doc/extend.texi  

[PATCH 0/6] ira: Fix performance regression in exchange2 [PR98782]

2022-01-06 Thread Richard Sandiford via Gcc-patches
This series of patches recovers the exchange2 performance lost in the
GCC 11 timeframe (at least on aarch64 and Power9 -- thanks Pat for
testing the latter).

There are 6 patches, split into two groups of 3.  The first 3 are just
preparatory patches, although patch 2 does contain a minor bug fix.
The other 3 patches are the ones that together fix the regression.

I realise this is a bit invasive for stage 3.  However, the series is
fixing a large regression in an important benchmark and AFAIK there are
no known acceptable mitigations that we could apply instead.  I think
the series is also working with concepts that already exist in IRA:
it's really about tweaking the cost model used to control them.

We also still have at least 3 months (more realistically 4 months) of
testing before GCC 12 is released.  So perhaps one option would be to
apply any approved version of the series now, but with the understanding
that if there's significant fallout (more than a handful of small tweaks
or fixes), we would simply revert the patches rather than trying to
rework them in-situ.  The series is confined to IRA so reverting it
should be simple.  Would that be OK?

Each patch bootstrapped & regression-tested individually on
aarch64-linux-gnu.  Also tested as a series on aarch64_be-elf,
arm-linux-gnueabihf, powerpc64le-linux-gnu, and x86_64-linux-gnu.

Richard


[PATCH 1/6] ira: Add a ira_loop_border_costs class

2022-01-06 Thread Richard Sandiford via Gcc-patches
The final index into (ira_)memory_move_cost is 1 for loads and
0 for stores.  Thus the combination:

  entry_freq * memory_cost[1] + exit_freq * memory_cost[0]

is the cost of loading a register on entry to a loop and
storing it back on exit from the loop.  This is the cost to
use if the register is successfully allocated within the
loop but is spilled in the parent loop.  Similarly:

  entry_freq * memory_cost[0] + exit_freq * memory_cost[1]

is the cost of storing a register on entry to the loop and
restoring it on exit from the loop.  This is the cost to
use if the register is spilled within the loop but is
successfully allocated in the parent loop.

The patch adds a helper class for calculating these values and
mechanically replaces the existing instances.  There is no attempt to
editorialise the choice between using “spill inside” and “spill outside”
costs.  (I think one of them is the wrong way round, but a later patch
deals with that.)

No functional change intended.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_loop_border_costs): New class.
* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
New constructor.
(calculate_allocno_spill_cost): Use ira_loop_border_costs.
(color_pass): Likewise.
(move_spill_restore): Likewise.
---
 gcc/ira-color.c | 76 +++--
 gcc/ira-int.h   | 56 
 2 files changed, 86 insertions(+), 46 deletions(-)

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index e0b4e490043..66c11710b97 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2567,13 +2567,23 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   return REG_FREQ_FROM_EDGE_FREQ (freq);
 }
 
+/* Construct an object that describes the boundary between A and its
+   parent allocno.  */
+ira_loop_border_costs::ira_loop_border_costs (ira_allocno_t a)
+  : m_mode (ALLOCNO_MODE (a)),
+m_class (ALLOCNO_CLASS (a)),
+m_entry_freq (ira_loop_edge_freq (ALLOCNO_LOOP_TREE_NODE (a),
+ ALLOCNO_REGNO (a), false)),
+m_exit_freq (ira_loop_edge_freq (ALLOCNO_LOOP_TREE_NODE (a),
+ALLOCNO_REGNO (a), true))
+{
+}
+
 /* Calculate and return the cost of putting allocno A into memory.  */
 static int
 calculate_allocno_spill_cost (ira_allocno_t a)
 {
   int regno, cost;
-  machine_mode mode;
-  enum reg_class rclass;
   ira_allocno_t parent_allocno;
   ira_loop_tree_node_t parent_node, loop_node;
 
@@ -2586,24 +2596,12 @@ calculate_allocno_spill_cost (ira_allocno_t a)
 return cost;
   if ((parent_allocno = parent_node->regno_allocno_map[regno]) == NULL)
 return cost;
-  mode = ALLOCNO_MODE (a);
-  rclass = ALLOCNO_CLASS (a);
+  ira_loop_border_costs border_costs (a);
   if (ALLOCNO_HARD_REGNO (parent_allocno) < 0)
-cost -= (ira_memory_move_cost[mode][rclass][0]
-* ira_loop_edge_freq (loop_node, regno, true)
-+ ira_memory_move_cost[mode][rclass][1]
-* ira_loop_edge_freq (loop_node, regno, false));
+cost -= border_costs.spill_outside_loop_cost ();
   else
-{
-  ira_init_register_move_cost_if_necessary (mode);
-  cost += ((ira_memory_move_cost[mode][rclass][1]
-   * ira_loop_edge_freq (loop_node, regno, true)
-   + ira_memory_move_cost[mode][rclass][0]
-   * ira_loop_edge_freq (loop_node, regno, false))
-  - (ira_register_move_cost[mode][rclass][rclass]
- * (ira_loop_edge_freq (loop_node, regno, false)
-+ ira_loop_edge_freq (loop_node, regno, true;
-}
+cost += (border_costs.spill_inside_loop_cost ()
+- border_costs.move_between_loops_cost ());
   return cost;
 }
 
@@ -3342,7 +3340,7 @@ static void
 color_pass (ira_loop_tree_node_t loop_tree_node)
 {
   int regno, hard_regno, index = -1, n;
-  int cost, exit_freq, enter_freq;
+  int cost;
   unsigned int j;
   bitmap_iterator bi;
   machine_mode mode;
@@ -3466,8 +3464,6 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
}
  continue;
}
- exit_freq = ira_loop_edge_freq (subloop_node, regno, true);
- enter_freq = ira_loop_edge_freq (subloop_node, regno, false);
  ira_assert (regno < ira_reg_equiv_len);
  if (ira_equiv_no_lvalue_p (regno))
{
@@ -3483,16 +3479,16 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
}
  else if (hard_regno < 0)
{
+ ira_loop_border_costs border_costs (subloop_allocno);
  ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
-   -= ((ira_memory_move_cost[mode][rclass][1] * enter_freq)
-   + (ira_memory_move_cost[mode][rclass][0] * exit_freq));
+   -= border_costs.spill_outside_loop_cost ();
}
  else
{
+ ira_loop_border_costs border_c

[PATCH 2/6] ira: Add comments and fix move_spill_restore calculation

2022-01-06 Thread Richard Sandiford via Gcc-patches
This patch adds comments to describe each use of ira_loop_border_costs.
I think this highlights that move_spill_restore was using the wrong cost
in one case, which came from tranposing [0] and [1] in the original
(pre-ira_loop_border_costs) ira_memory_move_cost expressions.  The
difference would only be noticeable on targets that distinguish between
load and store costs.

gcc/
PR rtl-optimization/98782
* ira-color.c (color_pass): Add comments to describe the spill costs.
(move_spill_restore): Likewise.  Fix reversed calculation.
---
 gcc/ira-color.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 66c11710b97..e7433312675 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3479,6 +3479,13 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
}
  else if (hard_regno < 0)
{
+ /* If we allocate a register to SUBLOOP_ALLOCNO, we'll need
+to load the register on entry to the subloop and store
+the register back on exit from the subloop.  This incurs
+a fixed cost for all registers.  Since UPDATED_MEMORY_COST
+is (and should only be) used relative to the register costs
+for the same allocno, we can subtract this shared register
+cost from the memory cost.  */
  ira_loop_border_costs border_costs (subloop_allocno);
  ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
-= border_costs.spill_outside_loop_cost ();
@@ -3503,6 +3510,9 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
  > ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno)[index])
ALLOCNO_UPDATED_CLASS_COST (subloop_allocno)
  = ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno)[index];
+ /* If we spill SUBLOOP_ALLOCNO, we'll need to store HARD_REGNO
+on entry to the subloop and restore HARD_REGNO on exit from
+the subloop.  */
  ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
+= border_costs.spill_inside_loop_cost ();
}
@@ -3601,9 +3611,17 @@ move_spill_restore (void)
  : ALLOCNO_HARD_REG_COSTS (subloop_allocno)[index]));
  ira_loop_border_costs border_costs (subloop_allocno);
  if ((hard_regno2 = ALLOCNO_HARD_REGNO (subloop_allocno)) < 0)
-   cost -= border_costs.spill_outside_loop_cost ();
+   /* The register was spilled in the subloop.  If we spill
+  it in the outer loop too then we'll no longer need to
+  save the register on entry to the subloop and restore
+  the register on exit from the subloop.  */
+   cost -= border_costs.spill_inside_loop_cost ();
  else
{
+ /* The register was also allocated in the subloop.  If we
+spill it in the outer loop then we'll need to load the
+register on entry to the subloop and store the register
+back on exit from the subloop.  */
  cost += border_costs.spill_outside_loop_cost ();
  if (hard_regno2 != hard_regno)
cost -= border_costs.move_between_loops_cost ();
@@ -3615,9 +3633,17 @@ move_spill_restore (void)
  ira_assert (rclass == ALLOCNO_CLASS (parent_allocno));
  ira_loop_border_costs border_costs (a);
  if ((hard_regno2 = ALLOCNO_HARD_REGNO (parent_allocno)) < 0)
+   /* The register was spilled in the parent loop.  If we spill
+  it in this loop too then we'll no longer need to load the
+  register on entry to this loop and save the register back
+  on exit from this loop.  */
cost -= border_costs.spill_outside_loop_cost ();
  else
{
+ /* The register was also allocated in the parent loop.
+If we spill it in this loop then we'll need to save
+the register on entry to this loop and restore the
+register on exit from this loop.  */
  cost += border_costs.spill_inside_loop_cost ();
  if (hard_regno2 != hard_regno)
cost -= border_costs.move_between_loops_cost ();
-- 
2.25.1



[PATCH 3/6] ira: Add ira_subloop_allocnos_can_differ_p

2022-01-06 Thread Richard Sandiford via Gcc-patches
color_pass has two instances of the same code for propagating non-cap
assignments from parent loops to subloops.  This patch adds a helper
function for testing when such propagations are required for correctness
and uses it to remove the duplicated code.

A later patch will use this in ira-build.c too, which is why the
function is exported to ira-int.h.

No functional change intended.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_subloop_allocnos_can_differ_p): New function,
extracted from...
* ira-color.c (color_pass): ...here.
---
 gcc/ira-color.c | 21 +
 gcc/ira-int.h   | 28 
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index e7433312675..ae0f08af4b3 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3446,26 +3446,7 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
  if ((flag_ira_region == IRA_REGION_MIXED
   && (loop_tree_node->reg_pressure[pclass]
   <= ira_class_hard_regs_num[pclass]))
- || (pic_offset_table_rtx != NULL
- && regno == (int) REGNO (pic_offset_table_rtx))
- /* Avoid overlapped multi-registers. Moves between them
-might result in wrong code generation.  */
- || (hard_regno >= 0
- && ira_reg_class_max_nregs[pclass][mode] > 1))
-   {
- if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
-   {
- ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno;
- ALLOCNO_ASSIGNED_P (subloop_allocno) = true;
- if (hard_regno >= 0)
-   update_costs_from_copies (subloop_allocno, true, true);
- /* We don't need updated costs anymore.  */
- ira_free_allocno_updated_costs (subloop_allocno);
-   }
- continue;
-   }
- ira_assert (regno < ira_reg_equiv_len);
- if (ira_equiv_no_lvalue_p (regno))
+ || !ira_subloop_allocnos_can_differ_p (a, hard_regno >= 0))
{
  if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
{
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index b32c80d4c9e..c5b1a131abd 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -1595,4 +1595,32 @@ ira_loop_border_costs::move_between_loops_cost () const
   return move_cost * (m_entry_freq + m_exit_freq);
 }
 
+/* Return true if subloops that contain allocnos for A's register can
+   use a different assignment from A.  ALLOCATED_P is true for the case
+   in which allocation succeeded for A.  */
+inline bool
+ira_subloop_allocnos_can_differ_p (ira_allocno_t a, bool allocated_p = true)
+{
+  auto regno = ALLOCNO_REGNO (a);
+
+  if (pic_offset_table_rtx != NULL
+  && regno == (int) REGNO (pic_offset_table_rtx))
+return false;
+
+  ira_assert (regno < ira_reg_equiv_len);
+  if (ira_equiv_no_lvalue_p (regno))
+return false;
+
+  /* Avoid overlapping multi-registers.  Moves between them might result
+ in wrong code generation.  */
+  if (allocated_p)
+{
+  auto pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)];
+  if (ira_reg_class_max_nregs[pclass][ALLOCNO_MODE (a)] > 1)
+   return false;
+}
+
+  return true;
+}
+
 #endif /* GCC_IRA_INT_H */
-- 
2.25.1



[PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-06 Thread Richard Sandiford via Gcc-patches
Suppose that:

- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP

Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).

However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory.  Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops.  Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:

- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L

This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts).  Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.

To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A).  But this scale factor
doesn't matter to AP.  All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions).  Propagating a factor of 100 would distort the
register costs for AP.

move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it.  Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead.  Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.
---
 gcc/ira-build.c   | 55 +--
 gcc/ira-color.c   | 53 --
 gcc/ira-int.h | 37 +
 .../gcc.target/aarch64/reg-alloc-2.c  | 47 
 4 files changed, 169 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-2.c

diff --git a/gcc/ira-build.c b/gcc/ira-build.c
index 0547edfde55..875b4d8ed7c 100644
--- a/gcc/ira-build.c
+++ b/gcc/ira-build.c
@@ -497,6 +497,7 @@ ira_create_allocno (int regno, bool cap_p,
   bitmap_set_bit (loop_tree_node->all_allocnos, ALLOCNO_NUM (a));
   ALLOCNO_NREFS (a) = 0;
   ALLOCNO_FREQ (a) = 0;
+  ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P (a) = false;
   ALLOCNO_HARD_REGNO (a) = -1;
   ALLOCNO_CALL_FREQ (a) = 0;
   ALLOCNO_CALLS_CROSSED_NUM (a) = 0;
@@ -1991,6 +1992,35 @@ propagate_modified_regnos (ira_loop_tree_node_t 
loop_tree_node)
   loop_tree_node->modified_regnos);
 }
 
+/* Propagate ALLOCNO_HARD_REG_COSTS from A to PARENT_A.  Use SPILL_COST
+   as the cost of spilling a register throughout A (which we have to do
+   for PARENT_A allocations that conflict with A).  */
+static void
+ira_propagate_hard_reg_costs (ira_allocno_t parent_a, ira_allocno_t a,
+ int spill_cost)
+{
+  HARD_REG_SET conflicts = ira_total_conflict_hard_regs (a);
+  conflicts &= ~ira_total_conflict_hard_regs (parent_a);
+
+  auto costs = ALLOCNO_HARD_REG_COSTS (a);
+  if (!hard_reg_set_empty_p (conflicts))
+ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P (a) = true;
+  else if (!costs)
+return;
+
+  auto aclass = ALLOCNO_CLASS (a);
+  ira_allocate_and_set_costs (&ALLOCNO_HARD_REG_COSTS (parent_a),
+ aclass, ALLOCNO_CLASS_COST (parent_a));
+  auto parent_costs = ALLOCNO_HARD_REG_COSTS (parent_a);
+  for (int i = 0; i < ira_class_hard_regs_num[aclass]; ++i)
+if (TEST_HARD_REG_BIT (conflicts, ira_class_hard_regs[aclass][i]))
+  parent_costs[i] += spill_cost;
+else if (costs)
+  /* The cost to A of

[PATCH 5/6] ira: Consider modelling caller-save allocations as loop spills

2022-01-06 Thread Richard Sandiford via Gcc-patches
If an allocno A in an inner loop L spans a call, a parent allocno AP
can choose to handle a call-clobbered/caller-saved hard register R
in one of two ways:

(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L

(2) can be cheaper than (1) in some cases, particularly if L does
not reference A.

Before the patch we always did (1).  The patch adds support for
picking (2) instead, when it seems cheaper.  It builds on the
earlier support for not propagating conflicts to parent allocnos.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_caller_save_cost): New function.
(ira_caller_save_loop_spill_p): Likewise.
* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
cheaper to spill a call-clobbered register throughout a loop rather
than spill it around each individual call.  If so, treat all
call-clobbered registers as conflicts and...
(propagate_allocno_info): ...do not propagate call information
from the child to the parent.
* ira-color.c (move_spill_restore): Update accordingly.
* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-3.c: New test.
---
 gcc/ira-build.c   | 23 ---
 gcc/ira-color.c   | 13 ++--
 gcc/ira-costs.c   |  7 +-
 gcc/ira-int.h | 39 +++
 .../gcc.target/aarch64/reg-alloc-3.c  | 65 +++
 5 files changed, 129 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-3.c

diff --git a/gcc/ira-build.c b/gcc/ira-build.c
index 875b4d8ed7c..ab3e87164e1 100644
--- a/gcc/ira-build.c
+++ b/gcc/ira-build.c
@@ -2000,6 +2000,8 @@ ira_propagate_hard_reg_costs (ira_allocno_t parent_a, 
ira_allocno_t a,
  int spill_cost)
 {
   HARD_REG_SET conflicts = ira_total_conflict_hard_regs (a);
+  if (ira_caller_save_loop_spill_p (parent_a, a, spill_cost))
+conflicts |= ira_need_caller_save_regs (a);
   conflicts &= ~ira_total_conflict_hard_regs (parent_a);
 
   auto costs = ALLOCNO_HARD_REG_COSTS (a);
@@ -2069,15 +2071,18 @@ propagate_allocno_info (void)
  if (!ira_subloop_allocnos_can_differ_p (parent_a))
merge_hard_reg_conflicts (a, parent_a, true);
 
- ALLOCNO_CALL_FREQ (parent_a) += ALLOCNO_CALL_FREQ (a);
- ALLOCNO_CALLS_CROSSED_NUM (parent_a)
-   += ALLOCNO_CALLS_CROSSED_NUM (a);
- ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a)
-   += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
- ALLOCNO_CROSSED_CALLS_ABIS (parent_a)
-   |= ALLOCNO_CROSSED_CALLS_ABIS (a);
- ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a)
-   |= ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a);
+ if (!ira_caller_save_loop_spill_p (parent_a, a, spill_cost))
+   {
+ ALLOCNO_CALL_FREQ (parent_a) += ALLOCNO_CALL_FREQ (a);
+ ALLOCNO_CALLS_CROSSED_NUM (parent_a)
+   += ALLOCNO_CALLS_CROSSED_NUM (a);
+ ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a)
+   += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a);
+ ALLOCNO_CROSSED_CALLS_ABIS (parent_a)
+   |= ALLOCNO_CROSSED_CALLS_ABIS (a);
+ ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a)
+   |= ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a);
+   }
  ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (parent_a)
+= ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a);
  aclass = ALLOCNO_CLASS (a);
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 4344ee6689e..1487afc5ef1 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3597,11 +3597,16 @@ move_spill_restore (void)
 propagate_allocno_info will have propagated
 the cost of spilling HARD_REGNO in SUBLOOP_NODE.
 (ira_subloop_allocnos_can_differ_p must be true
-in that case.)  Otherwise, SPILL_COST acted as
-a cap on the propagated register cost, in cases
-where the allocations can differ.  */
+in that case.)  If HARD_REGNO is a caller-saved
+register, we might have modelled it in the same way.
+
+Otherwise, SPILL_COST acted as a cap on the propagated
+register cost, in cases where the allocations can differ.  */
  auto conflicts = ira_total_conflict_hard_regs (subloop_allocno);
- if (TEST_HARD_REG_BIT (conflicts, hard_regno))
+ if (TEST_HARD_REG_BIT (conflicts, hard_regno)
+ || (ira_need_caller_save_p (subloop_allocno, hard_regno)
+ && ira_caller_save_loop_spill_p (a, subloop_allocno,
+  spill_cost)))
reg_cost = spill_cost;
  else if (ira_subloop_allocno

[PATCH 6/6] ira: Handle "soft" conflicts between cap and non-cap allocnos

2022-01-06 Thread Richard Sandiford via Gcc-patches
This patch looks for allocno conflicts of the following form:

- One allocno (X) is a cap allocno for some non-cap allocno X2.
- X2 belongs to some loop L2.
- The other allocno (Y) is a non-cap allocno.
- Y is an ancestor of some allocno Y2 in L2.
- Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
- Y can use a different allocation from Y2.

In this case, Y's register is live across L2 but is not used within it,
whereas X's register is used only within L2.  The conflict is therefore
only "soft", in that it can easily be avoided by spilling Y2 inside L2
without affecting any insn references.

In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the
callers then taking Y2's ALLOCNO_MEMORY_COST into account.  There would
then be no "cliff edge" between a Y2 that has no references and a Y2 that
has (say) a single cold reference.

However, doing that isn't necessary for the PR and seems to give
variable results in practice.  (fotonik3d_r improves slightly but
namd_r regresses slightly.)  It therefore seemed better to start
with the higher-value zero-reference case and see how things go.

On top of the previous patches in the series, this fixes the exchange2
regression seen in GCC 11.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_soft_conflict): Declare.
* ira-costs.c (max_soft_conflict_loop_depth): New constant.
(ira_soft_conflict): New function.
(spill_soft_conflicts): Likewise.
(assign_hard_reg): Use them to handle the case described by
the comment above ira_soft_conflict.
(improve_allocation): Likewise.
* ira.c (check_allocation): Allow allocnos with "soft" conflicts
to share the same register.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-4.c: New test.
---
 gcc/ira-color.c   | 284 --
 gcc/ira-int.h |   1 +
 gcc/ira.c |   2 +
 .../gcc.target/aarch64/reg-alloc-4.c  |  69 +
 4 files changed, 326 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-4.c

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 1487afc5ef1..36f3f4d70f3 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -36,6 +36,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "reload.h"
 #include "cfgloop.h"
 
+/* To prevent soft conflict detection becoming quadratic in the
+   loop depth.  Only for very pathological cases, so it hardly
+   seems worth a --param.  */
+const int max_soft_conflict_loop_depth = 64;
+
 typedef struct allocno_hard_regs *allocno_hard_regs_t;
 
 /* The structure contains information about hard registers can be
@@ -1707,6 +1712,167 @@ calculate_saved_nregs (int hard_regno, machine_mode 
mode)
   return nregs;
 }
 
+/* Allocnos A1 and A2 are known to conflict.  Check whether, in some loop L
+   that is either the current loop or a nested subloop, the conflict is of
+   the following form:
+
+   - One allocno (X) is a cap allocno for some non-cap allocno X2.
+
+   - X2 belongs to some loop L2.
+
+   - The other allocno (Y) is a non-cap allocno.
+
+   - Y is an ancestor of some allocno Y2 in L2.  (Note that such a Y2
+ must exist, given that X and Y conflict.)
+
+   - Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
+
+   - Y can use a different allocation from Y2.
+
+   In this case, Y's register is live across L2 but is not used within it,
+   whereas X's register is used only within L2.  The conflict is therefore
+   only "soft", in that it can easily be avoided by spilling Y2 inside L2
+   without affecting any insn references.
+
+   If the conflict does have this form, return the Y2 that would need to be
+   spilled in order to allow X and Y (and thus A1 and A2) to use the same
+   register.  Return null otherwise.  Returning null is conservatively correct;
+   any nonnnull return value is an optimization.  */
+ira_allocno_t
+ira_soft_conflict (ira_allocno_t a1, ira_allocno_t a2)
+{
+  /* Search for the loop L and its associated allocnos X and Y.  */
+  int search_depth = 0;
+  while (ALLOCNO_CAP_MEMBER (a1) && ALLOCNO_CAP_MEMBER (a2))
+{
+  a1 = ALLOCNO_CAP_MEMBER (a1);
+  a2 = ALLOCNO_CAP_MEMBER (a2);
+  if (search_depth++ > max_soft_conflict_loop_depth)
+   return nullptr;
+}
+  /* This must be true if A1 and A2 conflict.  */
+  ira_assert (ALLOCNO_LOOP_TREE_NODE (a1) == ALLOCNO_LOOP_TREE_NODE (a2));
+
+  /* Make A1 the cap allocno (X in the comment above) and A2 the
+ non-cap allocno (Y in the comment above).  */
+  if (ALLOCNO_CAP_MEMBER (a2))
+std::swap (a1, a2);
+  if (!ALLOCNO_CAP_MEMBER (a1))
+return nullptr;
+
+  /* Search for the real allocno that A1 caps (X2 in the comment above).  */
+  do
+{
+  a1 = ALLOCNO_CAP_MEMBER (a1);
+  if (search_depth++ > max_soft_conflict_loop_depth)
+   return nullptr;
+}
+  while (ALLOCNO_CAP_MEMBER (a1))

[committed] libstdc++: Do not use std::isdigit in [PR103911]

2022-01-06 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


This avoids a potential race condition if std::setlocale is used
concurrently with std::from_chars.

libstdc++-v3/ChangeLog:

PR libstdc++/103911
* include/std/charconv (__from_chars_alpha_to_num): Return
char instead of unsigned char. Change invalid return value to
127 instead of using numeric trait.
(__from_chars_alnum): Fix comment. Do not use std::isdigit.
Change type of variable to char.
---
 libstdc++-v3/include/std/charconv | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/charconv 
b/libstdc++-v3/include/std/charconv
index 4337a9e52f9..a3f8c7718b2 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -39,7 +39,6 @@
 
 #include 
 #include  // for __bit_width
-#include   // for isdigit
 #include  // for __to_chars_len, __to_chars_10_impl
 #include  // for std::errc
 #include 
@@ -466,7 +465,7 @@ namespace __detail
   return true;
 }
 
-  constexpr unsigned char
+  constexpr char
   __from_chars_alpha_to_num(char __c)
   {
 switch (__c)
@@ -550,10 +549,10 @@ namespace __detail
 case 'Z':
   return 35;
 }
-return __gnu_cxx::__int_traits::__max;
+return 127;
   }
 
-  /// std::from_chars implementation for integers in bases 11 to 26.
+  /// std::from_chars implementation for integers in bases 11 to 36.
   template
 bool
 __from_chars_alnum(const char*& __first, const char* __last, _Tp& __val,
@@ -562,8 +561,8 @@ namespace __detail
   bool __valid = true;
   while (__first != __last)
{
- unsigned char __c = *__first;
- if (std::isdigit(__c))
+ char __c = *__first;
+ if ('0' <= __c && __c <= '9') // isdigit
__c -= '0';
  else
{
-- 
2.31.1



[committed] libstdc++: Increase timeout for pthread7-rope.cc test

2022-01-06 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


This test spawns thousands of threads and so times out if the tests are
run with a low timeout value and the machine is busy.

libstdc++-v3/ChangeLog:

* testsuite/ext/rope/pthread7-rope.cc: Add dg-timeout-factor.
---
 libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc 
b/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc
index 9a870d1fc2e..b0b92adcfa9 100644
--- a/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc
+++ b/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc
@@ -20,6 +20,7 @@
 // { dg-do run }
 // { dg-options "-pthread"  }
 // { dg-require-effective-target pthread }
+// { dg-timeout-factor 2 }
 
 #include 
 #include 
@@ -33,7 +34,7 @@ typedef __gnu_cxx::rope > 
rope_type;
 rope_type foo2;
 rope_type foo4;
 
-void* thread_main(void *) 
+void* thread_main(void *)
 {
   // To see a problem with gcc 3.3 and before, set a break point here.
   // Single step through c_str implementation, call sched_yield after
-- 
2.31.1



Re: [committed] libstdc++: Reduce template instantiations in

2022-01-06 Thread Jonathan Wakely via Gcc-patches
On Thu, 6 Jan 2022 at 10:43, Jonathan Wakely  wrote:

>
>
> On Thu, 6 Jan 2022 at 10:33, Jonathan Wakely  wrote:
>
>>
>>
>> On Thu, 6 Jan 2022 at 10:00, Stephan Bergmann 
>> wrote:
>>
>>> On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote:
>>> > Tested powerpc64le-linux, pushed to trunk.
>>> >
>>> >
>>> > This moves the last two template parameters of __regex_algo_impl to be
>>> > runtime function parameters instead, so that we don't need four
>>> > different instantiations for the possible ways to call it. Most of the
>>> > function (and what it instantiates) is the same in all cases, so making
>>> > them compile-time choices doesn't really have much benefit.
>>> >
>>> > Use  'if constexpr' for conditions that check template parameters, so
>>> > that when we do depend on a compile-time condition we only instantiate
>>> > what we need to.
>>> >
>>> > libstdc++-v3/ChangeLog:
>>> >
>>> >   * include/bits/regex.h (__regex_algo_impl): Change __policy and
>>> >   __match_mode template parameters to be function parameters.
>>> >   (regex_match, regex_search): Pass policy and match mode as
>>> >   function arguments.
>>> >   * include/bits/regex.tcc (__regex_algo_impl): Change template
>>> >   parameters to function parameters.
>>> >   * include/bits/regex_compiler.h (_RegexTranslatorBase): Use
>>> >   'if constexpr' for conditions using template parameters.
>>> >   (_RegexTranslator): Likewise.
>>> >   * include/bits/regex_executor.tcc (_Executor::_M_handle_accept):
>>> >   Likewise.
>>> >   * testsuite/util/testsuite_regex.h (regex_match_debug)
>>> >   (regex_search_debug): Move template arguments to function
>>> >   arguments.
>>> > ---
>>> >   libstdc++-v3/include/bits/regex.h | 33
>>> +--
>>> >   libstdc++-v3/include/bits/regex.tcc   |  8 ++---
>>> >   libstdc++-v3/include/bits/regex_compiler.h|  9 ++---
>>> >   libstdc++-v3/include/bits/regex_executor.tcc  |  2 +-
>>> >   libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++---
>>> >   5 files changed, 37 insertions(+), 39 deletions(-)
>>>
>>> Clang now fails #include  with
>>>
>>> > In file included from
>>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66:
>>> >
>>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9:
>>> error: unknown type name '_RegexExecutorPolicy'; did you mean
>>> '__detail::_RegexExecutorPolicy'?
>>> > _RegexExecutorPolicy, bool);
>>> > ^
>>> >
>>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
>>> note: '__detail::_RegexExecutorPolicy' declared here
>>> >   enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
>>> >  ^
>>> >
>>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9:
>>> error: unknown type name '_RegexExecutorPolicy'; did you mean
>>> '__detail::_RegexExecutorPolicy'?
>>> > _RegexExecutorPolicy, bool);
>>> > ^
>>> >
>>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14:
>>> note: '__detail::_RegexExecutorPolicy' declared here
>>> >   enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
>>> >  ^
>>>
>>> and
>>>
>>> > diff --git a/libstdc++-v3/include/bits/regex.h
>>> b/libstdc++-v3/include/bits/regex.h
>>> > index 7480b0a5f97..46c168010bf 100644
>>> > --- a/libstdc++-v3/include/bits/regex.h
>>> > +++ b/libstdc++-v3/include/bits/regex.h
>>> > @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>>> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
>>> > const basic_regex<_Cp, _Rp>&,
>>> > regex_constants::match_flag_type,
>>> > -   _RegexExecutorPolicy, bool);
>>> > +   __detail::_RegexExecutorPolicy,
>>> bool);
>>> >
>>> >template
>>> > friend class __detail::_Executor;
>>> > @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>>> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&,
>>> > const basic_regex<_Cp, _Rp>&,
>>> > regex_constants::match_flag_type,
>>> > -   _RegexExecutorPolicy, bool);
>>> > +   __detail::_RegexExecutorPolicy,
>>> bool);
>>> >
>>> >// Reset contents to __size unmatched sub_match objects
>>> >// (plus additional objects for prefix, suffix and unmatched
>>> sub).
>>>
>>> would fix that.
>>>
>>
>>
>> I'll make the change, but this looks like a clang bug:
>> https://godbolt.org/z/bozxYErrc
>>
>
> Maybe this one: https:/

[committed] libstdc++: Add self-merge check to std::forward_list::merge [PR103853]

2022-01-06 Thread Jonathan Wakely via Gcc-patches
From: "Pavel I. Kryukov" 

Tested powerpc64le-linux, pushed to trunk.


This implements the proposed resolution of LWG 3088, so that x.merge(x)
is a no-op, consistent with std::list::merge.

Signed-off-by: Pavel I. Kryukov 

Co-authored-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/103853
* include/bits/forward_list.tcc (forward_list::merge): Check for
self-merge.
* testsuite/23_containers/forward_list/operations/merge.cc: New test.
---
 libstdc++-v3/include/bits/forward_list.tcc|  5 ++
 .../forward_list/operations/merge.cc  | 48 +++
 2 files changed, 53 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc

diff --git a/libstdc++-v3/include/bits/forward_list.tcc 
b/libstdc++-v3/include/bits/forward_list.tcc
index 7d90e82de39..b0a65457404 100644
--- a/libstdc++-v3/include/bits/forward_list.tcc
+++ b/libstdc++-v3/include/bits/forward_list.tcc
@@ -367,6 +367,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   forward_list<_Tp, _Alloc>::
   merge(forward_list&& __list, _Comp __comp)
   {
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3088. forward_list::merge behavior unclear when passed *this
+   if (std::__addressof(__list) == this)
+ return;
+
_Node_base* __node = &this->_M_impl._M_head;
while (__node->_M_next && __list._M_impl._M_head._M_next)
  {
diff --git 
a/libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc 
b/libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc
new file mode 100644
index 000..0f6f520c33b
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc
@@ -0,0 +1,48 @@
+// { dg-do run { target c++11 } }
+// C++11 23.3.4.6 Operations [forwardlist.ops]
+
+#include 
+#include 
+
+void
+test_stable()
+{
+  std::forward_list a{1.5, 2.0, 3.5, 4.1};
+  std::forward_list b{1.0, 2.5, 3.0, 4.3, 4.2, 5.0};
+
+  a.merge(b, std::less{});
+
+  // result is sorted with respect to std::less, so 1.0 and 1.5 are
+  // equivalent, and stability guarantee means the element from a comes first.
+  const std::forward_list r { 1.5, 1.0,
+ 2.0, 2.5,
+ 3.5, 3.0,
+ 4.1, 4.3, 4.2,
+ 5.0};
+
+  VERIFY(a == r);
+}
+
+void
+test_lwg3088()
+{
+  // LWG 3088: forward_list::merge behavior unclear when passed *this
+  // PR libstdc++/103853
+  std::forward_list c1{ 1, 2, 3 };
+  const std::forward_list c2 = c1;
+  c1.merge(c1);
+  VERIFY( c1 == c2 );
+  c1.merge(c1, std::less{});
+  VERIFY( c1 == c2 );
+  c1.merge(std::move(c1));
+  VERIFY( c1 == c2 );
+  c1.merge(std::move(c1), std::less{});
+  VERIFY( c1 == c2 );
+}
+
+int
+main()
+{
+  test_stable();
+  test_lwg3088();
+}
-- 
2.31.1



Re: [PATCH 1/6] ira: Add a ira_loop_border_costs class

2022-01-06 Thread Jan Hubicka via Gcc-patches
> The final index into (ira_)memory_move_cost is 1 for loads and
> 0 for stores.  Thus the combination:
> 
>   entry_freq * memory_cost[1] + exit_freq * memory_cost[0]
> 
> is the cost of loading a register on entry to a loop and
> storing it back on exit from the loop.  This is the cost to
> use if the register is successfully allocated within the
> loop but is spilled in the parent loop.  Similarly:
> 
>   entry_freq * memory_cost[0] + exit_freq * memory_cost[1]
> 
> is the cost of storing a register on entry to the loop and
> restoring it on exit from the loop.  This is the cost to
> use if the register is spilled within the loop but is
> successfully allocated in the parent loop.
> 
> The patch adds a helper class for calculating these values and
> mechanically replaces the existing instances.  There is no attempt to
> editorialise the choice between using “spill inside” and “spill outside”
> costs.  (I think one of them is the wrong way round, but a later patch
> deals with that.)
> 
> No functional change intended.
> 
> gcc/
>   PR rtl-optimization/98782
>   * ira-int.h (ira_loop_border_costs): New class.
>   * ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
>   New constructor.
>   (calculate_allocno_spill_cost): Use ira_loop_border_costs.
>   (color_pass): Likewise.
>   (move_spill_restore): Likewise.

Thanks for working on this.  For profile bits, the patch looks good to
me.  In general I am trying to move away from the integer frequencies.
It would be more precise to calculate the m_entry_freq and m_exit_freq
as profile_count m_entry_count, m_exit_count
and do conversion only later.

count->to_frequency method basically scales it to the range 0...BB_FREQ_MAX.

Honza


Re: [PATCH 1b/6] Add __attribute__((untrusted))

2022-01-06 Thread David Malcolm via Gcc-patches
On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > This patch adds a new:
> > 
> >    __attribute__((untrusted))
> > 
> > for use by the C front-end, intended for use by the Linux kernel for
> > use with "__user", but which could be used by other operating system
> > kernels, and potentialy by other projects.
> 
> It looks like untrusted is a type attribute (rather than one
> that applies to variables and/or function return values or
> writeable by-reference arguments).  I find that quite surprising.

FWIW I initially tried implementing it on pointer types, but doing it
on the underlying type was much cleaner.

>   I'm used to thinking of trusted vs tainted as dynamic properties
> of data so I'm having trouble deciding what to think about
> the attribute applying to types.  Can you explain why it's
> useful on types?

A type system *is* a way of detecting problems involving dynamic
properties of data.  Ultimately all we have at runtime is a collection
of bits; the toolchain has the concept of types as a way to allow us to
reason about properies of those bits without requiring a full cross-TU
analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned
integer), and to document these properties clearly to human readers of
the code.

I see this as working like a qualifier (rather like "const" and
"volatile"), in that an
  untrusted char *
when dereferenced gives you an
  untrusted char

The intent is to have a way of treating the values as "actively
hostile", so that code analyzers can assume the worst possible values
for such types (or more glibly, that we're dealing with data from Satan
rather than from Murphy).

Such types are also relevant to infoleaks: writing sensitive
information to an untrusted value can be detected relatively easily
with this approach, by checking the type of the value - the types
express the trust boundary

Doing this with qualifiers allows us to use the C type system to detect
these kinds of issues without having to add a full cross-TU
interprocedural analysis, and documents it to human readers of the
code.   Compare with const-correctness; we can have an analogous
"trust-correctness".

> 
> I'd expect the taint property of a type to be quickly lost as
> an object of the type is passed through existing APIs (e.g.,
> a char array manipulated by string functions like strchr).

FWIW you can't directly pass an attacker-controlled buffer to strchr:
strchr requires there to be a 0-terminator to the array; if the array's
content is untrusted then the attacker might not have 0-terminated it.

As implemented, the patch doesn't complain about this, though maybe it
should.

The main point here is to support the existing __user annotation used
by the Linux kernel, in particular, copy_from_user and copy_to_user.

> 
> (I usually look at tests to help me understand the design of
> a change but I couldn't find an answer to my question in those
> in the patch.)

The patch kit was rather unclear on this, due to the use of two
different approaches (custom address spaces vs this untrusted
attribute).  Sorry about this.

Patches 4a and 4b in the kit add test-uaccess.h (to
gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
that use "test-uaccess.h" in patch 3:
 [PATCH 3/6] analyzer: implement infoleak detection
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
and in patch 5:
 [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
   https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584374.html

(sorry about messing up the order of the patches).

Patch 4a here:
 [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom 
address spaces
   https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584371.html
implements "__user" as a custom address space, 

whereas patch 4b here:

 [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of 
__attribute__((untrusted))
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584373.html

implements "__user" to be __attribute__((untrusted)).

Perhaps I should drop the custom address space versions of the patches
and post a version of the kit that simply uses the attribute?

Dave


> 
> Thanks
> Martin
> 
> PS I found one paper online that discusses type-based taint
> analysis in Java but not much more.  I only quickly skimmed
> the paper and although it conceptually makes sense I'm still
> having difficulties seeing how it would be useful in C.
> 
> > 
> > Known issues:
> > - at least one TODO in handle_untrusted_attribute
> > - should it be permitted to dereference an untrusted pointer?  The
> > patch
> >    currently allows this
> > 
> > gcc/c-family/ChangeLog:
> > * c-attribs.c (c_common_attribute_table): Add "untrusted".
> > (build_untrusted_type): New.
> > (handle_untrusted_attribute): New.
> > * c-pretty-print.c (pp_c_cv_qualifiers): Handle
> > TYPE_QUAL_UNTRU

Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Andrew MacLeod via Gcc-patches

On 1/5/22 07:34, Richard Biener wrote:

On Thu, Dec 9, 2021 at 2:02 PM Martin Liška  wrote:

On 11/30/21 12:17, Richard Biener wrote:


+ unswitch_predicate *predicate
+   = new unswitch_predicate (expr, idx, edge_index);
+ ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+idx,
*get_global_range_query ());
+ /* Huge switches are not supported by Ranger.  */
+ if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?  But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?


huge switches not supported?  I don't know what you mean, either that or 
I forget :-)  If there are more edges than multi-ranges support, then 
things will start getting merged because they cant be represented.. and 
the default case may then also contain some values that also have 
cases..  but all inconsistencies will move towards varying.. not undefined.


Andrew



[committed] c++: Add testcase for recently fixed PR [PR69681]

2022-01-06 Thread Patrick Palka via Gcc-patches
Fixed ever since r12-6188.

PR c++/69681

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-compare2.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C
new file mode 100644
index 000..b1bc4720805
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C
@@ -0,0 +1,10 @@
+// PR c++/69681
+// { dg-do compile { target c++11 } }
+
+void f();
+void g();
+static_assert(f != g, "");
+
+#if __cpp_constexpr >= 201603L
+static_assert([]{} != []{}, "");
+#endif
-- 
2.34.1.493.ge83ba647f7



Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Martin Liška

On 1/6/22 16:11, Andrew MacLeod wrote:

On 1/5/22 07:34, Richard Biener wrote:

On Thu, Dec 9, 2021 at 2:02 PM Martin Liška  wrote:

On 11/30/21 12:17, Richard Biener wrote:


+ unswitch_predicate *predicate
+   = new unswitch_predicate (expr, idx, edge_index);
+ ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+    idx,
*get_global_range_query ());
+ /* Huge switches are not supported by Ranger.  */
+ if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?  But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?


huge switches not supported?  I don't know what you mean, either that or I 
forget :-)  If there are more edges than multi-ranges support, then things will 
start getting merged because they cant be represented.. and the default case 
may then also contain some values that also have cases..  but all 
inconsistencies will move towards varying.. not undefined.

Andrew



Hello.

If you consider the attached test-case, then I get for:

(gdb) p debug_bb(e->src)
 [local count: 955630225]:
# i_489 = PHI 
# tmp_490 = PHI 
_1329 = (long unsigned int) i_489;
_1330 = _1329 * 8;
switch (order_385(D))  [1.08%], case 0:  [1.08%], case 1:  [1.08%], case 2:  [1.08%], case 3:  [1.08%], case 4:  [1.08%], case 5:  [1.08%], case 6:  [1.08%], case 7:  [1.08%], case 8:  [1.08%], case 9:  [1.08%], case 10:  [1.08%], case 11:  [1.08%], case 12:  [1.08%], case 13:  [1.08%], case 14:  [1.08%], case 15:  [1.08%], case 16:  [1.08%], case 17:  [1.08%], case 18:  [1.08%], case 19:  [1.08%], case 
20:  [1.08%], case 21:  [1.08%], case 22:  [1.08%], case 23:  [1.08%], case 24:  [1.08%], case 25:  [1.08%], case 26:  [1.08%], case 27:  [1.08%], case 28:  [1.08%], case 29:  [1.08%], case 30:  [1.08%], case 31:  [1.08%], case 32:  [1.08%], case 33:  [1.08%], case 34:  [1.08%], case 35:  [1.08%], case 36:  [1.08%], case 37:  [1.08%], case 38:  [1.08%], case 39:  [1.08%], case 40:  [1.08%], case 41: 
 [1.08%], case 42:  [1.08%], case 43:  [1.08%], case 44:  [1.08%], case 45:  [1.08%], case 46:  [1.08%], case 47:  [1.08%], case 48:  [1.08%], case 49:  [1.08%], case 50:  [1.08%], case 51:  [1.08%], case 52:  [1.08%], case 53:  [1.08%], case 54:  [1.08%], case 55:  [1.08%], case 56:  [1.08%], case 57:  [1.08%], case 58:  [1.08%], case 59:  [1.08%], case 60:  [1.08%], case 61:  [1.08%], case 62:  
[1.08%], case 63:  [1.08%], case 64:  [1.08%], case 65:  [1.08%], case 66:  [1.08%], case 67:  [1.08%], case 68:  [1.08%], case 69:  [1.08%], case 70:  [1.08%], case 71:  [1.08%], case 72:  [1.08%], case 73:  [1.08%], case 74:  [1.08%], case 75:  [1.08%], case 76:  [1.08%], case 77:  [1.08%], case 78:  [1.08%], case 79:  [1.08%], case 80:  [1.08%], case 81:  [1.08%], case 82:  [1.08%], case 83:  [1.08%], case 
84:  [1.08%], case 85:  [1.08%], case 86:  [1.08%], case 87:  [1.08%], case 88:  [1.08%], case 89:  [1.08%], case 90:  [1.08%], case 91:  [1.08%]>

$7 = void
(gdb) p debug_bb(e->dest)
 [local count: 10275591]:
:
_3 = a_386(D) + _1330;
_4 = *_3;
tmp_478 = _4 * 0.0;
goto ; [100.00%]

│  480ranger->gori ().outgoing_edge_range_p 
(predicate->true_range, e,
│  481   idx, 
*get_global_range_query ());

Note the return value is false. But I think the comment is not precise:

│ 1233  // Calculate a range on edge E and return it in R.  Try to evaluate 
a
│ 1234  // range for NAME on this edge.  Return FALSE if this is either not 
a
│ 1235  // control edge or NAME is not defined by this edge.

and

(gdb) p predicate->true_range.debug()
UNDEFINED

So is the behavior correct Andrew?
Thanks,
Martin

int
__attribute__((noipa))
foo(double *a, double *b, double *c, double *d, double *r, int size, int order)
{
  for (int i = 0; i < size; i++)
  {
double tmp, tmp2;

switch(order)
{
  case 0: tmp = 0 * a[i]; break;
  case 1: tmp = 1 * a[i]; break;
  case 2: tmp = 2 * a[i]; break;
  case 3: tmp = 3 * a[i]; break;
  case 4: tmp = 4 * a[i]; break;
  case 5: tmp = 5 * a[i]; break;
  case 6: tmp = 6 * a[i]; break;
  case 7: tmp = 7 * a[i]; break;
  case 8: tmp = 8 * a[i]; break;
  case 9: tmp = 9 * a[i]; break;
  case 10: tmp = 10 * a[i]; break;
  case 11: tmp = 11 * a[i]; break;
  case 12: tmp = 12 * a[i]; break;
  case 13: tmp = 13 * a[i]; break;
  case 14: tmp = 14 * a[i]; break;
  case 15: tmp = 15 * a[i]; break;
  case 16: tmp = 16 * a[i]; break;
  case 17: tmp = 17 * a[i]; break;
  case 18: tmp = 18 * a[i]; break;
  case 19: tmp = 19 * a[i]; break;
  case 20: tmp = 20 * a[i]; break;
  case 21: tmp = 21 * a[i]; break;
  case 22:

Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Andrew MacLeod via Gcc-patches

On 1/6/22 11:02, Martin Liška wrote:

On 1/6/22 16:11, Andrew MacLeod wrote:

On 1/5/22 07:34, Richard Biener wrote:

On Thu, Dec 9, 2021 at 2:02 PM Martin Liška  wrote:

On 11/30/21 12:17, Richard Biener wrote:


+ unswitch_predicate *predicate
+   = new unswitch_predicate (expr, idx, edge_index);
+ ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+    idx,
*get_global_range_query ());
+ /* Huge switches are not supported by Ranger.  */
+ if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?  But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?


huge switches not supported?  I don't know what you mean, either that 
or I forget :-)  If there are more edges than multi-ranges support, 
then things will start getting merged because they cant be 
represented.. and the default case may then also contain some values 
that also have cases..  but all inconsistencies will move towards 
varying.. not undefined.


Andrew



Hello.

If you consider the attached test-case, then I get for:

(gdb) p debug_bb(e->src)
 [local count: 955630225]:
# i_489 = PHI 
# tmp_490 = PHI 
_1329 = (long unsigned int) i_489;
_1330 = _1329 * 8;
switch (order_385(D))  [1.08%], case 0:  [1.08%], 
case 1:  [1.08%], case 2:  [1.08%], case 3:  [1.08%], case 
4:  [1.08%], case 5:  [1.08%], case 6:  [1.08%], case 7: 
 [1.08%], case 8:  [1.08%], case 9:  [1.08%], case 10: 
 [1.08%], case 11:  [1.08%], case 12:  [1.08%], case 
13:  [1.08%], case 14:  [1.08%], case 15:  [1.08%], 
case 16:  [1.08%], case 17:  [1.08%], case 18:  
[1.08%], case 19:  [1.08%], case 20:  [1.08%], case 21: 
 [1.08%], case 22:  [1.08%], case 23:  [1.08%], case 
24:  [1.08%], case 25:  [1.08%], case 26:  [1.08%], 
case 27:  [1.08%], case 28:  [1.08%], case 29:  
[1.08%], case 30:  [1.08%], case 31:  [1.08%], case 32: 
 [1.08%], case 33:  [1.08%], case 34:  [1.08%], case 
35:  [1.08%], case 36:  [1.08%], case 37:  [1.08%], 
case 38:  [1.08%], case 39:  [1.08%], case 40:  
[1.08%], case 41:  [1.08%], case 42:  [1.08%], case 43: 
 [1.08%], case 44:  [1.08%], case 45:  [1.08%], case 
46:  [1.08%], case 47:  [1.08%], case 48:  [1.08%], 
case 49:  [1.08%], case 50:  [1.08%], case 51:  
[1.08%], case 52:  [1.08%], case 53:  [1.08%], case 54: 
 [1.08%], case 55:  [1.08%], case 56:  [1.08%], case 
57:  [1.08%], case 58:  [1.08%], case 59:  [1.08%], 
case 60:  [1.08%], case 61:  [1.08%], case 62:  
[1.08%], case 63:  [1.08%], case 64:  [1.08%], case 65: 
 [1.08%], case 66:  [1.08%], case 67:  [1.08%], case 
68:  [1.08%], case 69:  [1.08%], case 70:  [1.08%], 
case 71:  [1.08%], case 72:  [1.08%], case 73:  
[1.08%], case 74:  [1.08%], case 75:  [1.08%], case 76: 
 [1.08%], case 77:  [1.08%], case 78:  [1.08%], case 
79:  [1.08%], case 80:  [1.08%], case 81:  [1.08%], 
case 82:  [1.08%], case 83:  [1.08%], case 84:  
[1.08%], case 85:  [1.08%], case 86:  [1.08%], case 87: 
 [1.08%], case 88:  [1.08%], case 89:  [1.08%], case 
90:  [1.08%], case 91:  [1.08%]>


$7 = void
(gdb) p debug_bb(e->dest)
 [local count: 10275591]:
:
_3 = a_386(D) + _1330;
_4 = *_3;
tmp_478 = _4 * 0.0;
goto ; [100.00%]

│  480    ranger->gori ().outgoing_edge_range_p 
(predicate->true_range, e,
│ 481   idx, 
*get_global_range_query ());


Note the return value is false. But I think the comment is not precise:



So if you get a FALSE back, you cannot use any results because GORI is 
claiming that it cant figure something out... or there is nothing to 
figure out.   Most of rangers routines are implemented so that if they 
return FALSE, the result is meaningless.



what is IDX you are passing?  order_385?

As a side note, theres a typo in the testcase:  .. Im not sure how that 
affects things, but :


   defaut:
    __builtin_unreachable ();


default is misspelled...  maybe it thinks that some kind of runtime 
value?   I am surprised it even compiles.  maybe that is mucking up what 
GORI thiunks it can calculate?






│ 1233  // Calculate a range on edge E and return it in R. Try to 
evaluate a
│ 1234  // range for NAME on this edge.  Return FALSE if this is 
either not a

│ 1235  // control edge or NAME is not defined by this edge.

and

(gdb) p predicate->true_range.debug()
UNDEFINED

So is the behavior correct Andrew?
Thanks,
Martin



Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Martin Liška

On 1/5/22 13:34, Richard Biener wrote:

On Thu, Dec 9, 2021 at 2:02 PM Martin Liška  wrote:


On 11/30/21 12:17, Richard Biener wrote:

I'd like to see the gswitch support - that's what was posted before stage3
close, this patch on its own doesn't seem worth pushing for.  That said,
I have some comments below (and the already raised ones about how
things might need to change with gswitch support).  Is it so difficult to
develop gswitch support as a separate change ontop of this?


Hello.

Took me some time, but I have a working version of the patch that makes both
refactoring of the costing model and adds support for gswitch. For quite some
time, I maintained 2 patches, but as commonly happens, I was forced doing quite
some rework. If we really want a separation for bisection purpose, I suggest 
simple
disabling of gswitch support?


It was really meant to ease review.


Sure, but keeping a separation if the final shape is changing is difficult :P


I'm now looking at the combined
patch, comments will
follow.

+static void
+clean_up_after_unswitching (const auto_edge_flag &ignored_edge_flag)
+{
+  basic_block bb;
+
...
+  /* Clean up the ignored_edge_flag from edges.  */
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  edge e;

you can probably clean up outgoing edge flags in the loop above?


Done.



(I'm randomly jumping)

+ /* Build compound expression for all cases leading
+to this edge.  */
+ tree expr = NULL_TREE;
+ for (unsigned i = 1; i < gimple_switch_num_labels (stmt); ++i)
+   {
+ tree lab = gimple_switch_label (stmt, i);
+ basic_block dest = label_to_block (cfun, CASE_LABEL (lab));
+ edge e2 = find_edge (gimple_bb (stmt), dest);
+ if (e == e2)

just as a style note I prefer if (e != e2) continue; so the following code
needs less indentation


Likewise.



+ tree cmp1 = build2 (GE_EXPR, boolean_type_node, idx,
+ CASE_LOW (lab));

is there a reason you are not using fold_build2?


Changed.


Do we want to somehow account
for the built expression size or maybe have a separate limit for the number of
cases we want to combine this way?


Huh, I would ignore it for now, but yes, can be accounted as well.



+ unswitch_predicate *predicate
+   = new unswitch_predicate (expr, idx, edge_index);
+ ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+idx,
*get_global_range_query ());
+ /* Huge switches are not supported by Ranger.  */
+ if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?


... it's been discussed in a different sub-thread.


But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?


Yes, but I think the situation (huge switches) will be rarely an interesting
optimization opportunity. Plus we would have to find a hot case in such switch.
That's currently ignored by the pass.



+   {
+ delete predicate;
+ return;

In this context, when we do

+  if (operand_equal_p (predicate->lhs, last_predicate->lhs, 0))
+   {
+ irange &other = (true_edge ? predicate->merged_true_range
+  : predicate->merged_false_range);
+ last_predicate->merged_true_range.intersect (other);
+ last_predicate->merged_false_range.intersect (other);
+ return;

ranger may be conservative when intersecting (and hopefully not end up
with undefined_p - heh).  I also am confused about intersecting both
merged_true_range and merged_false_range with 'other'?


other is a predicate for a SSA_NAME _x_ that we already switched on. And thus
we can improve both predicates for true/false edge that are also related to
the same SSA_NAME _x_.



I would
have expected to merge true edge info with true edge info and thus
only "swap" things somehow?  OTOH "path" suggests we're dealing
with more than one edge and associated ranges?


Yep, path means the we already did a series of unswitching like:

- if (a > 10) TRUE
- if (b == 10) FALSE
- if (c > 0) TRUE


Maybe expanding
the comment on the predicate_vector would be useful.  AFAIR we


Sure, will do that.


there store the sequence of unswitchings done with pairs of
the predicate unswitched on and a bool indicating whether we're
dealing with the copy that had the predicate true or the one that
had it false.


Exactly.



+  unswitch_predicate *predicate = NULL;
+  basic_block bb = NULL;
+  if (num > param_max_unswitch_level)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+"Not unswitc

Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Andrew MacLeod via Gcc-patches

On 1/6/22 11:02, Martin Liška wrote:

On 1/6/22 16:11, Andrew MacLeod wrote:

On 1/5/22 07:34, Richard Biener wrote:

On Thu, Dec 9, 2021 at 2:02 PM Martin Liška  wrote:

On 11/30/21 12:17, Richard Biener wrote:


+ unswitch_predicate *predicate
+   = new unswitch_predicate (expr, idx, edge_index);
+ ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+    idx,
*get_global_range_query ());
+ /* Huge switches are not supported by Ranger.  */
+ if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?  But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?


huge switches not supported?  I don't know what you mean, either that 
or I forget :-)  If there are more edges than multi-ranges support, 
then things will start getting merged because they cant be 
represented.. and the default case may then also contain some values 
that also have cases..  but all inconsistencies will move towards 
varying.. not undefined.


Andrew

oh oh oh.   EVRp was spending too much time in very large switches, so 
we added a param to say "don't look at switches too big"   You are 
probably tripping over that


-param=evrp-switch-limit=
Common Joined UInteger Var(param_evrp_switch_limit) Init(50) 
Optimization Param
Maximum number of outgoing edges in a switch before EVRP will not 
process it.


and in GORI we check:

  // Do not process switches if they are too large.
  if (EDGE_COUNT (bb->succs) > (unsigned)param_evrp_switch_limit)
    return;

so maybe you want to temporarily override param_evrp_switch_limit to 
some big number for your purposes...  just set it back when you are done :-)



Andrew




Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Martin Liška

On 1/6/22 17:20, Andrew MacLeod wrote:

So if you get a FALSE back, you cannot use any results because GORI is claiming 
that it cant figure something out... or there is nothing to figure out.   Most 
of rangers routines are implemented so that if they return FALSE, the result is 
meaningless.


All right, then it's my bad, sorry for the noise.




what is IDX you are passing?  order_385?


Yep.

(gdb) p idx
$1 = 




As a side note, theres a typo in the testcase:  .. Im not sure how that affects 
things, but :


Oh, yeah, that's typo :)



defaut:
     __builtin_unreachable ();


default is misspelled...  maybe it thinks that some kind of runtime value?   I 
am surprised it even compiles.  maybe that is mucking up what GORI thiunks it 
can calculate?


But it does not affect anything. The bailout happens due to:

│  199// Only process switches if it within the size limit.
│  200if (EDGE_COUNT (e->src->succs) > (unsigned)m_max_edges)
│  >   201  return NULL;

Cheers,
Martin


[PATCH] nvptx: Add support for PTX's cnot instruction.

2022-01-06 Thread Roger Sayle

Happy New Year for 2022.  This is a simple patch, now that the
nvptx backend has transitioned to STORE_FLAG_VALUE=1, that adds
support for NVidia's cnot instruction, that implements C/C++
style logical negation.

Previously, the simple function:

int foo(int x) { return !x; }

on nvptx-none with -O2 would generate:

mov.u32 %r24, %ar0;
setp.eq.u32 %r28, %r24, 0;
selp.u32%value, 1, 0, %r28;

with this patch, GCC now generates:

mov.u32 %r24, %ar0;
cnot.b32%value, %r24;


This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
Ok for mainline?


2022-01-06  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (*cnot2): New define_insn.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/cnot-1.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index a4c14a3..ce74672 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -592,6 +592,13 @@
   ""
   "%.\\tnot.b%T0\\t%0, %1;")
 
+(define_insn "*cnot2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+   (eq:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+ (const_int 0)))]
+  ""
+  "%.\\tcnot.b%T0\\t%0, %1;")
+
 (define_insn "bitrev2"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/cnot-1.c 
b/gcc/testsuite/gcc.target/nvptx/cnot-1.c
new file mode 100644
index 000..d0bdccd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/cnot-1.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int test1(int x)
+{
+  return !x;
+}
+
+int test2(int x)
+{
+  return x ? 0 : 1;
+}
+
+int test3(int x)
+{
+  return (x == 0) ? 1 : 0;
+}
+
+unsigned int test4(unsigned int x)
+{
+  return !x;
+}
+
+unsigned int test5(unsigned int x)
+{
+  return x ? 0 : 1;
+}
+
+unsigned int test6(unsigned int x)
+{
+  return (x == 0) ? 1 : 0;
+}
+
+short test7(short x)
+{
+  return !x;
+}
+
+short test8(short x)
+{
+  return x ? 0 : 1;
+}
+
+short test9(short x)
+{
+  return (x == 0) ? 1 : 0;
+}
+
+unsigned short test10(unsigned short x)
+{
+  return !x;
+}
+
+unsigned short test11(unsigned short x)
+{
+  return x ? 0 : 1;
+}
+
+unsigned short test12(unsigned short x)
+{
+  return (x == 0) ? 1 : 0;
+}
+
+long test13(long x)
+{
+  return !x;
+}
+
+long test14(long x)
+{
+  return x ? 0 : 1;
+}
+
+long test15(long x)
+{
+  return (x == 0) ? 1: 0;
+}
+
+unsigned long test16(unsigned long x)
+{
+  return !x;
+}
+
+unsigned long test17(unsigned long x)
+{
+  return x ? 0 : 1;
+}
+
+unsigned long test18(unsigned long x)
+{
+  return (x == 0) ? 1 : 0;
+}
+
+/* { dg-final { scan-assembler-times "cnot.b" 18 } } */


Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-06 Thread Andrew MacLeod via Gcc-patches

On 1/6/22 11:35, Martin Liška wrote:

On 1/6/22 17:20, Andrew MacLeod wrote:
So if you get a FALSE back, you cannot use any results because GORI 
is claiming that it cant figure something out... or there is nothing 
to figure out.   Most of rangers routines are implemented so that if 
they return FALSE, the result is meaningless.


All right, then it's my bad, sorry for the noise.




what is IDX you are passing?  order_385?


Yep.

(gdb) p idx
$1 = 




As a side note, theres a typo in the testcase:  .. Im not sure how 
that affects things, but :


Oh, yeah, that's typo :)



    defaut:
     __builtin_unreachable ();


default is misspelled...  maybe it thinks that some kind of runtime 
value?   I am surprised it even compiles.  maybe that is mucking up 
what GORI thiunks it can calculate?


But it does not affect anything. The bailout happens due to:

│  199    // Only process switches if it within the size limit.
│  200    if (EDGE_COUNT (e->src->succs) > (unsigned)m_max_edges)
│  >   201  return NULL;

Cheers,
Martin


yeah its created by:

gori_compute::gori_compute (int not_executable_flag)
  : outgoing (param_evrp_switch_limit), tracer 
("GORI ")



so as long as you change it to whatever you want before you create a 
ranger,  you should get whatever limit you want. anything above 255 may 
produce imprecise default values, if there are big holes in the switch 
because multi ranges only currently support up to 255 ranges.. so if 
there are more than 255 distinct subranges, it wont be exact.


Andrew




Re: [PATCH 1/2] RISC-V: Allow extension name contain digit

2022-01-06 Thread Kito Cheng via Gcc-patches
Committed

On Fri, Dec 3, 2021 at 11:51 PM Kito Cheng  wrote:
>
> RISC-V spec only allow alphabetical name for extension before, however
> vector extension add several extension named with digits, so we try to
> extend the naming rule.
>
> Ref:
> https://github.com/riscv/riscv-isa-manual/pull/718
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.c
> (riscv_subset_list::parse_multiletter_ext): Allow ext. name has
> digit.
> ---
>  gcc/common/config/riscv/riscv-common.c | 42 +++---
>  1 file changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.c 
> b/gcc/common/config/riscv/riscv-common.c
> index b8dd0aeac3e..31b1c833965 100644
> --- a/gcc/common/config/riscv/riscv-common.c
> +++ b/gcc/common/config/riscv/riscv-common.c
> @@ -760,24 +760,58 @@ riscv_subset_list::parse_multiletter_ext (const char *p,
>bool explicit_version_p = false;
>char *ext;
>char backup;
> +  size_t len;
> +  size_t end_of_version_pos, i;
> +  bool found_any_number = false;
> +  bool found_minor_version = false;
>
> -  while (*++q != '\0' && *q != '_' && !ISDIGIT (*q))
> +  /* Parse until end of this extension including version number.  */
> +  while (*++q != '\0' && *q != '_')
> ;
>
>backup = *q;
>*q = '\0';
> -  ext = xstrdup (subset);
> +  len = q - subset;
>*q = backup;
>
> +  end_of_version_pos = len;
> +  /* Find the begin of version string.  */
> +  for (i = len -1; i > 0; --i)
> +   {
> + if (ISDIGIT (subset[i]))
> +   {
> + found_any_number = true;
> + continue;
> +   }
> + /* Might be version seperator, but need to check one more char,
> +we only allow p, so we could stop parsing if found
> +any more `p`.  */
> + if (subset[i] == 'p' &&
> + !found_minor_version &&
> + found_any_number && ISDIGIT (subset[i-1]))
> +   {
> + found_minor_version = true;
> + continue;
> +   }
> +
> + end_of_version_pos = i + 1;
> + break;
> +   }
> +
> +  backup = subset[end_of_version_pos];
> +  subset[end_of_version_pos] = '\0';
> +  ext = xstrdup (subset);
> +  subset[end_of_version_pos] = backup;
> +
>end_of_version
> -   = parsing_subset_version (ext, q, &major_version, &minor_version,
> +   = parsing_subset_version (ext, subset + end_of_version_pos, 
> &major_version, &minor_version,
>   /* std_ext_p= */ false, 
> &explicit_version_p);
>free (ext);
>
>if (end_of_version == NULL)
> return NULL;
>
> -  *q = '\0';
> +  subset[end_of_version_pos] = '\0';
>
>if (strlen (subset) == 1)
> {
> --
> 2.34.0
>


Re: [PATCH 2/2] RISC-V: Minimal support of vector extensions

2022-01-06 Thread Kito Cheng via Gcc-patches
Committed with minor changelog fix

On Fri, Dec 3, 2021 at 11:52 PM Kito Cheng  wrote:
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.c (riscv_implied_info): Add
> vector extensions.
> (riscv_ext_version_table): Add version info for vector extensions.
> (riscv_ext_flag_table): Add option mask for vector extensions.
> * config/riscv/riscv-opts.h (MASK_VECTOR_EEW_32): New.
> (MASK_VECTOR_EEW_64): New.
> (MASK_VECTOR_EEW_FP_32): New.
> (MASK_VECTOR_EEW_FP_64): New.
> (MASK_ZVL32B: New.
> (MASK_ZVL64B: New.
> (MASK_ZVL128B: New.
> (MASK_ZVL256B: New.
> (MASK_ZVL512B: New.
> (MASK_ZVL1024B): New.
> (MASK_ZVL2048B): New.
> (MASK_ZVL4096B): New.
> (MASK_ZVL8192B): New.
> (MASK_ZVL16384B): New.
> (MASK_ZVL32768B): New.
> (MASK_ZVL65536B): New.
> (TARGET_ZVL32B): New.
> (TARGET_ZVL64B): New.
> (TARGET_ZVL128B): New.
> (TARGET_ZVL256B): New.
> (TARGET_ZVL512B): New.
> (TARGET_ZVL1024B): New.
> (TARGET_ZVL2048B): New.
> (TARGET_ZVL4096B): New.
> (TARGET_ZVL8192B): New.
> (TARGET_ZVL16384B): New.
> (TARGET_ZVL32768B): New.
> (TARGET_ZVL65536B): New.
> * config/riscv/riscv.opt (Mask(VECTOR)): New.
> (riscv_vector_eew_flags): New.
> (riscv_zvl_flags): New.
>
> gcc/testsuite/ChangeLog:
>
> * testsuite/gcc.target/riscv/predef-14.c: New.
> * testsuite/gcc.target/riscv/predef-15.c: Ditto.
> * testsuite/gcc.target/riscv/predef-16.c: Ditto.
> ---
>  gcc/common/config/riscv/riscv-common.c | 86 
>  gcc/config/riscv/riscv-opts.h  | 31 
>  gcc/config/riscv/riscv.opt |  8 ++
>  gcc/testsuite/gcc.target/riscv/predef-14.c | 83 
>  gcc/testsuite/gcc.target/riscv/predef-15.c | 91 ++
>  gcc/testsuite/gcc.target/riscv/predef-16.c | 91 ++
>  6 files changed, 390 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-14.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-15.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-16.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.c 
> b/gcc/common/config/riscv/riscv-common.c
> index 31b1c833965..5cf15024aef 100644
> --- a/gcc/common/config/riscv/riscv-common.c
> +++ b/gcc/common/config/riscv/riscv-common.c
> @@ -50,6 +50,38 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"d", "f"},
>{"f", "zicsr"},
>{"d", "zicsr"},
> +
> +  {"v", "zvl128b"},
> +  {"v", "zve64d"},
> +
> +  {"zve32f", "f"},
> +  {"zve64f", "f"},
> +  {"zve64d", "d"},
> +
> +  {"zve32x", "zvl32b"},
> +  {"zve32f", "zve32x"},
> +  {"zve32f", "zvl32b"},
> +
> +  {"zve64x", "zve32x"},
> +  {"zve64x", "zvl64b"},
> +  {"zve64f", "zve32f"},
> +  {"zve64f", "zve64x"},
> +  {"zve64f", "zvl64b"},
> +  {"zve64d", "zve64f"},
> +  {"zve64d", "zvl64b"},
> +
> +  {"zvl64b", "zvl32b"},
> +  {"zvl128b", "zvl64b"},
> +  {"zvl256b", "zvl128b"},
> +  {"zvl512b", "zvl256b"},
> +  {"zvl1024b", "zvl512b"},
> +  {"zvl2048b", "zvl1024b"},
> +  {"zvl4096b", "zvl2048b"},
> +  {"zvl8192b", "zvl4096b"},
> +  {"zvl16384b", "zvl8192b"},
> +  {"zvl32768b", "zvl16384b"},
> +  {"zvl65536b", "zvl32768b"},
> +
>{NULL, NULL}
>  };
>
> @@ -101,11 +133,34 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
>
> +
> +  {"v",   ISA_SPEC_CLASS_NONE, 1, 0},
> +
>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zbs", ISA_SPEC_CLASS_NONE, 1, 0},
>
> +  {"zve32x", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zve32f", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zve32d", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zve64x", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zve64f", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zve64d", ISA_SPEC_CLASS_NONE, 1, 0},
> +
> +  {"zvl32b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl64b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl128b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl256b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl512b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl1024b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl2048b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl4096b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl8192b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl16384b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
> +
>/* Terminate the list.  */
>{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
>  };
> @@ -940,6 +995,7 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>{"f", &gcc_options::x_target_flags, MASK_HARD_FLOAT},
>{"d", &gcc_options::x_target_flags, MASK_DOUBLE_FLOAT},
>{"c", &gcc_options::x_target_flags, MA

[Ada] Proof of runtime units for binary modular exponentiation

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This proves the generic unit System.Exponu instantiated for Unsigned,
Long_Long_Unsigned and Long_Long_Long_Unsigned. The proof is simpler
than the one for signed integers, as there are no possible overflows
here.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-explllu.ads: Mark in SPARK.
* libgnat/s-expllu.ads: Mark in SPARK.
* libgnat/s-exponu.adb: Add loop invariants and needed
assertions.
* libgnat/s-exponu.ads: Add functional contract.
* libgnat/s-expuns.ads: Mark in SPARK.diff --git a/gcc/ada/libgnat/s-explllu.ads b/gcc/ada/libgnat/s-explllu.ads
--- a/gcc/ada/libgnat/s-explllu.ads
+++ b/gcc/ada/libgnat/s-explllu.ads
@@ -34,10 +34,23 @@
 --  The result is always full width, the caller must do a masking operation if
 --  the modulus is less than 2 ** Long_Long_Long_Unsigned'Size.
 
+--  Preconditions in this unit are meant for analysis only, not for run-time
+--  checking, so that the expected exceptions are raised. This is enforced
+--  by setting the corresponding assertion policy to Ignore. Postconditions
+--  and contract cases should not be executed at runtime as well, in order
+--  not to slow down the execution of these functions.
+
+pragma Assertion_Policy (Pre=> Ignore,
+ Post   => Ignore,
+ Contract_Cases => Ignore,
+ Ghost  => Ignore);
+
 with System.Exponu;
 with System.Unsigned_Types;
 
-package System.Exp_LLLU is
+package System.Exp_LLLU
+  with SPARK_Mode
+is
 
subtype Long_Long_Long_Unsigned is Unsigned_Types.Long_Long_Long_Unsigned;
 


diff --git a/gcc/ada/libgnat/s-expllu.ads b/gcc/ada/libgnat/s-expllu.ads
--- a/gcc/ada/libgnat/s-expllu.ads
+++ b/gcc/ada/libgnat/s-expllu.ads
@@ -34,10 +34,23 @@
 --  The result is always full width, the caller must do a masking operation if
 --  the modulus is less than 2 ** Long_Long_Unsigned'Size.
 
+--  Preconditions in this unit are meant for analysis only, not for run-time
+--  checking, so that the expected exceptions are raised. This is enforced
+--  by setting the corresponding assertion policy to Ignore. Postconditions
+--  and contract cases should not be executed at runtime as well, in order
+--  not to slow down the execution of these functions.
+
+pragma Assertion_Policy (Pre=> Ignore,
+ Post   => Ignore,
+ Contract_Cases => Ignore,
+ Ghost  => Ignore);
+
 with System.Exponu;
 with System.Unsigned_Types;
 
-package System.Exp_LLU is
+package System.Exp_LLU
+  with SPARK_Mode
+is
 
subtype Long_Long_Unsigned is Unsigned_Types.Long_Long_Unsigned;
 


diff --git a/gcc/ada/libgnat/s-exponu.adb b/gcc/ada/libgnat/s-exponu.adb
--- a/gcc/ada/libgnat/s-exponu.adb
+++ b/gcc/ada/libgnat/s-exponu.adb
@@ -29,7 +29,19 @@
 --  --
 --
 
-function System.Exponu (Left : Int; Right : Natural) return Int is
+function System.Exponu (Left : Int; Right : Natural) return Int
+  with SPARK_Mode
+is
+   --  Preconditions, postconditions, ghost code, loop invariants and
+   --  assertions in this unit are meant for analysis only, not for run-time
+   --  checking, as it would be too costly otherwise. This is enforced by
+   --  setting the assertion policy to Ignore.
+
+   pragma Assertion_Policy (Pre=> Ignore,
+Post   => Ignore,
+Ghost  => Ignore,
+Loop_Invariant => Ignore,
+Assert => Ignore);
 
--  Note that negative exponents get a constraint error because the
--  subtype of the Right argument (the exponent) is Natural.
@@ -49,7 +61,16 @@ begin
 
if Exp /= 0 then
   loop
+ pragma Loop_Invariant (Exp > 0);
+ pragma Loop_Invariant (Result * Factor ** Exp = Left ** Right);
+ pragma Loop_Variant (Decreases => Exp);
+
  if Exp rem 2 /= 0 then
+pragma Assert
+  (Result * (Factor * Factor ** (Exp - 1)) = Left ** Right);
+pragma Assert
+  ((Result * Factor) * Factor ** (Exp - 1) = Left ** Right);
+
 Result := Result * Factor;
  end if;
 


diff --git a/gcc/ada/libgnat/s-exponu.ads b/gcc/ada/libgnat/s-exponu.ads
--- a/gcc/ada/libgnat/s-exponu.ads
+++ b/gcc/ada/libgnat/s-exponu.ads
@@ -31,8 +31,22 @@
 
 --  Modular integer exponentiation
 
+--  Preconditions in this unit are meant for analysis only, not for run-time
+--  checking, so that the expected exceptions are raised. This is enforced
+--  by setting the corresponding assertion policy to Ignore. Postconditions
+--  and contract cases should not be executed at runtime as well, in order
+--  not to slow down the execution of these functi

[Ada] Proof of runtime unit for non-binary modular exponentiation

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This proof combines the difficulties of proving signed exponentiation,
as we need to compare the result to the mathematical one using big
integers, with pervasive use of modulo operation. This requires
lemmas which should later be isolated in a shared library, for possible
reuse in other runtime units.

The postcondition previously stated on Exp_Modular was wrong, as it
performed exponentiation in the Unsigned type, which was silently
performing already a different modulo operation. This is fixed here by
using big integers instead.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-expmod.adb: Mark in SPARK. Add ghost code for proof.
* libgnat/s-expmod.ads: Mark in SPARK. Add ghost specifications.diff --git a/gcc/ada/libgnat/s-expmod.adb b/gcc/ada/libgnat/s-expmod.adb
--- a/gcc/ada/libgnat/s-expmod.adb
+++ b/gcc/ada/libgnat/s-expmod.adb
@@ -29,9 +29,168 @@
 --  --
 --
 
-package body System.Exp_Mod is
+--  Preconditions, postconditions, ghost code, loop invariants and assertions
+--  in this unit are meant for analysis only, not for run-time checking, as it
+--  would be too costly otherwise. This is enforced by setting the assertion
+--  policy to Ignore.
+
+pragma Assertion_Policy (Pre=> Ignore,
+ Post   => Ignore,
+ Ghost  => Ignore,
+ Loop_Invariant => Ignore,
+ Assert => Ignore);
+
+with Ada.Numerics.Big_Numbers.Big_Integers_Ghost;
+use Ada.Numerics.Big_Numbers.Big_Integers_Ghost;
+
+package body System.Exp_Mod
+  with SPARK_Mode
+is
use System.Unsigned_Types;
 
+   --  Local lemmas
+
+   procedure Lemma_Add_Mod (X, Y : Big_Natural; B : Big_Positive)
+   with
+ Ghost,
+ Post => (X + Y) mod B = ((X mod B) + (Y mod B)) mod B;
+
+   procedure Lemma_Exp_Expand (A : Big_Integer; Exp : Natural)
+   with
+ Ghost,
+ Post =>
+   (if Exp rem 2 = 0 then
+  A ** Exp = A ** (Exp / 2) * A ** (Exp / 2)
+else
+  A ** Exp = A ** (Exp / 2) * A ** (Exp / 2) * A);
+
+   procedure Lemma_Exp_Mod (A : Big_Natural; Exp : Natural; B : Big_Positive)
+   with
+ Ghost,
+ Subprogram_Variant => (Decreases => Exp),
+ Post => ((A mod B) ** Exp) mod B = (A ** Exp) mod B;
+
+   procedure Lemma_Mod_Ident (A : Big_Natural; B : Big_Positive)
+   with
+ Ghost,
+ Pre => A < B,
+ Post => A mod B = A;
+
+   procedure Lemma_Mod_Mod (A : Big_Integer; B : Big_Positive)
+   with
+ Ghost,
+ Post => A mod B mod B = A mod B;
+
+   procedure Lemma_Mult_Div (X : Big_Natural; Y : Big_Positive)
+   with
+ Ghost,
+ Post => X * Y / Y = X;
+
+   procedure Lemma_Mult_Mod (X, Y : Big_Natural; B : Big_Positive)
+   with
+ Ghost,
+ --  The following subprogram variant can be added as soon as supported
+ --  Subprogram_Variant => (Decreases => Y),
+ Post => (X * Y) mod B = ((X mod B) * (Y mod B)) mod B;
+
+   -
+   -- Local lemma null bodies --
+   -
+
+   procedure Lemma_Mod_Ident (A : Big_Natural; B : Big_Positive) is null;
+   procedure Lemma_Mod_Mod (A : Big_Integer; B : Big_Positive) is null;
+   procedure Lemma_Mult_Div (X : Big_Natural; Y : Big_Positive) is null;
+
+   ---
+   -- Lemma_Add_Mod --
+   ---
+
+   procedure Lemma_Add_Mod (X, Y : Big_Natural; B : Big_Positive) is
+  Left  : constant Big_Natural := (X + Y) mod B;
+  Right : constant Big_Natural := ((X mod B) + (Y mod B)) mod B;
+  XQuot : constant Big_Natural := X / B;
+  YQuot : constant Big_Natural := Y / B;
+  AQuot : constant Big_Natural := (X mod B + Y mod B) / B;
+   begin
+  if Y /= 0 and B > 1 then
+ pragma Assert (X = XQuot * B + X mod B);
+ pragma Assert (Y = YQuot * B + Y mod B);
+ pragma Assert
+   (Left = ((XQuot + YQuot) * B + X mod B + Y mod B) mod B);
+ pragma Assert (X mod B + Y mod B = AQuot * B + Right);
+ pragma Assert (Left = ((XQuot + YQuot + AQuot) * B + Right) mod B);
+ pragma Assert (Left = Right);
+  end if;
+   end Lemma_Add_Mod;
+
+   --
+   -- Lemma_Exp_Expand --
+   --
+
+   procedure Lemma_Exp_Expand (A : Big_Integer; Exp : Natural) is
+   begin
+  if Exp rem 2 = 0 then
+ pragma Assert (Exp = Exp / 2 + Exp / 2);
+  else
+ pragma Assert (Exp = Exp / 2 + Exp / 2 + 1);
+ pragma Assert (A ** Exp = A ** (Exp / 2) * A ** (Exp / 2 + 1));
+ pragma Assert (A ** (Exp / 2 + 1) = A ** (Exp / 2) * A);
+ pragma Assert (A ** Exp = A ** (Exp / 2) * A ** (Exp / 2) * A);
+  end if;
+   end Lemma_Exp_Expand;
+
+   ---
+   -- Lemma_Exp_Mod --
+   ---
+
+   procedure Lemma_Exp_

[Ada] Simplify GNAT AST printing with simple GNAT hash table

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
For pretty-printing of GNAT AST we had a custom hash table which stored
visited nodes. Now this custom hash table is replaced with an instance
of GNAT.Dynamic_Tables.Dynamic_Hash_Tables. Expansion and compression
factors for this table are the same as for all other instances of
Dynamic_Hash_Tables in the frontend.

Code cleanup; behaviour is unaffected; no noticeable difference in
performance either (when comparing the running time with AST dump for a
reasonably large file, i.e. "gcc -c sem_util.adb -gnatdt").

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* treepr.ads (Treepr, Print_Tree_List, Print_Tree_Elist): Fix
style in comments.
* treepr.adb (Serial_Numbers): Hash table instance.
(Hash): Hashing routine.
(Print_Field): Fix style.
(Print_Init): Adapt to simple hash table.
(Print_Term): Likewise.
(Serial_Numbers): Likewise.
(Set_Serial_Number): Likewise.diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb
--- a/gcc/ada/treepr.adb
+++ b/gcc/ada/treepr.adb
@@ -23,32 +23,32 @@
 --  --
 --
 
-with Aspects;use Aspects;
-with Atree;  use Atree;
-with Csets;  use Csets;
-with Debug;  use Debug;
-with Einfo;  use Einfo;
-with Einfo.Entities; use Einfo.Entities;
-with Einfo.Utils;use Einfo.Utils;
-with Elists; use Elists;
-with Lib;use Lib;
-with Namet;  use Namet;
-with Nlists; use Nlists;
-with Output; use Output;
-with Seinfo; use Seinfo;
-with Sinfo;  use Sinfo;
-with Sinfo.Nodes;use Sinfo.Nodes;
-with Sinfo.Utils;use Sinfo.Utils;
-with Snames; use Snames;
-with Sinput; use Sinput;
-with Stand;  use Stand;
-with Stringt;use Stringt;
-with SCIL_LL;use SCIL_LL;
-with Uintp;  use Uintp;
-with Urealp; use Urealp;
-with Uname;  use Uname;
+with Aspects;  use Aspects;
+with Atree;use Atree;
+with Csets;use Csets;
+with Debug;use Debug;
+with Einfo;use Einfo;
+with Einfo.Entities;   use Einfo.Entities;
+with Einfo.Utils;  use Einfo.Utils;
+with Elists;   use Elists;
+with GNAT.Dynamic_HTables; use GNAT.Dynamic_HTables;
+with Lib;  use Lib;
+with Namet;use Namet;
+with Nlists;   use Nlists;
+with Output;   use Output;
+with Seinfo;   use Seinfo;
+with Sinfo;use Sinfo;
+with Sinfo.Nodes;  use Sinfo.Nodes;
+with Sinfo.Utils;  use Sinfo.Utils;
+with Snames;   use Snames;
+with Sinput;   use Sinput;
+with Stand;use Stand;
+with Stringt;  use Stringt;
+with SCIL_LL;  use SCIL_LL;
+with Uintp;use Uintp;
+with Urealp;   use Urealp;
+with Uname;use Uname;
 with Unchecked_Conversion;
-with Unchecked_Deallocation;
 
 package body Treepr is
 
@@ -80,24 +80,30 @@ package body Treepr is
--  Set True to print low-level information useful for debugging Atree and
--  the like.
 
-   type Hash_Record is record
-  Serial : Nat;
-  --  Serial number for hash table entry. A value of zero means that
-  --  the entry is currently unused.
-
-  Id : Int;
-  --  If serial number field is non-zero, contains corresponding Id value
-   end record;
-
-   type Hash_Table_Type is array (Nat range <>) of Hash_Record;
-   type Access_Hash_Table_Type is access Hash_Table_Type;
-   Hash_Table : Access_Hash_Table_Type;
+   function Hash (Key : Int) return GNAT.Bucket_Range_Type;
+   --  Simple Hash function for Node_Ids, List_Ids and Elist_Ids
+
+   procedure Destroy (Value : in out Nat) is null;
+   --  Dummy routine for destroing hashed values
+
+   package Serial_Numbers is new Dynamic_Hash_Tables
+ (Key_Type  => Int,
+  Value_Type=> Nat,
+  No_Value  => 0,
+  Expansion_Threshold   => 1.5,
+  Expansion_Factor  => 2,
+  Compression_Threshold => 0.3,
+  Compression_Factor=> 2,
+  "="   => "=",
+  Destroy_Value => Destroy,
+  Hash  => Hash);
+   --  Hash tables with dynamic resizing based on load factor. They provide
+   --  reasonable performance both when the printed AST is small (e.g. when
+   --  printing from debugger) and large (e.g. when printing with -gnatdt).
+
+   Hash_Table : Serial_Numbers.Dynamic_Hash_Table;
--  The hash table itself, see Serial_Number function for details of use
 
-   Hash_Table_Len : Nat;
-   --  Range of Hash_Table is from 0 .. Hash_Table_Len - 1 so that dividing
-   --  by Hash_Table_Len gives a remainder that is in Hash_Table'Range.
-
Next_Serial_Number

[Ada] Simplify repeated calls in printing of GNAT AST

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Code cleanup; behaviour is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* treepr.adb (Visit_Node): Simplify repeated call to
Next_Entity.diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb
--- a/gcc/ada/treepr.adb
+++ b/gcc/ada/treepr.adb
@@ -2305,8 +2305,8 @@ package body Treepr is
 begin
Nod := N;
while Present (Nod) loop
-  Visit_Descendant (Union_Id (Next_Entity (Nod)));
   Next_Entity (Nod);
+  Visit_Descendant (Union_Id (Nod));
end loop;
 end;
  end if;




[Ada] Crash in class-wide pre/postconditions

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
The compiler may crash processing a class-wide pre/postcondition
that has dispatching calls using the Object.Operation notation.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* contracts.adb (Restore_Original_Selected_Component): New
subprogram that traverses a preanalyzed expression searching for
dispatching calls to functions whose original node was a
selected component, and replacing them with their original node.
This functionality is required because the preanalyis of
dispatching calls using the Object.Operation notation transforms
such calls, and we need the original condition to properly
inherit and extend the condition expression on tagged type
derivations.  This functionality was previously provided by the
routine Install_Original_Selected_Component (as part of
inheriting conditions); now it is performed as part of the
preanalysis of the condition, thus avoiding repeatedly
installing and restoring such nodes.
(Install_Original_Selected_Component): Removed.
(Restore_Dispatching_Calls): Removed.diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb
--- a/gcc/ada/contracts.adb
+++ b/gcc/ada/contracts.adb
@@ -4254,6 +4254,11 @@ package body Contracts is
  procedure Remove_Formals (Id : Entity_Id);
  --  Remove formals from homonym chains and make them not visible
 
+ procedure Restore_Original_Selected_Component;
+ --  Traverse Expr searching for dispatching calls to functions whose
+ --  original node was a selected component, and replace them with
+ --  their original node.
+
  
  -- Clear_Unset_References --
  
@@ -4313,6 +4318,46 @@ package body Contracts is
 end loop;
  end Remove_Formals;
 
+ -
+ -- Restore_Original_Selected_Component --
+ -
+
+ procedure Restore_Original_Selected_Component is
+
+function Restore_Node (N : Node_Id) return Traverse_Result;
+--  Process a single node
+
+--
+-- Restore_Node --
+--
+
+function Restore_Node (N : Node_Id) return Traverse_Result is
+begin
+   if Nkind (N) = N_Function_Call
+ and then Nkind (Original_Node (N)) = N_Selected_Component
+ and then Is_Dispatching_Operation (Entity (Name (N)))
+   then
+  Rewrite (N, Original_Node (N));
+  Set_Original_Node (N, N);
+
+  --  Restore decoration of its child nodes; required to ensure
+  --  proper copies of this subtree (if required) by subsequent
+  --  calls to New_Copy_Tree (since otherwise these child nodes
+  --  are not duplicated).
+
+  Set_Parent (Prefix (N), N);
+  Set_Parent (Selector_Name (N), N);
+   end if;
+
+   return OK;
+end Restore_Node;
+
+procedure Restore_Nodes is new Traverse_Proc (Restore_Node);
+
+ begin
+Restore_Nodes (Expr);
+ end Restore_Original_Selected_Component;
+
   --  Start of processing for Preanalyze_Condition
 
   begin
@@ -4329,6 +4374,16 @@ package body Contracts is
  Remove_Formals (Subp);
  Pop_Scope;
 
+ --  If this preanalyzed condition has occurrences of dispatching calls
+ --  using the Object.Operation notation, during preanalysis such calls
+ --  are rewritten as dispatching function calls; if at later stages
+ --  this condition is inherited we must have restored the original
+ --  selected-component node to ensure that the preanalysis of the
+ --  inherited condition rewrites these dispatching calls in the
+ --  correct context to avoid reporting spurious errors.
+
+ Restore_Original_Selected_Component;
+
  --  Traverse Expr and clear the Controlling_Argument of calls to
  --  nonabstract functions. Required since the preanalyzed condition
  --  is not yet installed on its definite context and will be cloned
@@ -4373,103 +4428,9 @@ package body Contracts is
(Par_Subp : Entity_Id;
 Subp : Entity_Id) return Node_Id
  is
-Installed_Calls : constant Elist_Id := New_Elmt_List;
-
-procedure Install_Original_Selected_Component (Expr : Node_Id);
---  Traverse the given expression searching for dispatching calls
---  to functions whose original nodes was a selected component,
---  and replacing them temporarily by a copy of their original
---  node. Modified calls are stored in the list Installed

[Ada] Justify false positive message from CodePeer analysis of GNAT

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Analysis of loop variant is known to lead to false alarms with CodePeer.
Add pragma Annotate in such a case which can be justified.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-exponu.adb (Exponu): Add annotation.diff --git a/gcc/ada/libgnat/s-exponu.adb b/gcc/ada/libgnat/s-exponu.adb
--- a/gcc/ada/libgnat/s-exponu.adb
+++ b/gcc/ada/libgnat/s-exponu.adb
@@ -64,6 +64,9 @@ begin
  pragma Loop_Invariant (Exp > 0);
  pragma Loop_Invariant (Result * Factor ** Exp = Left ** Right);
  pragma Loop_Variant (Decreases => Exp);
+ pragma Annotate
+   (CodePeer, False_Positive,
+"validity check", "confusion on generated code");
 
  if Exp rem 2 /= 0 then
 pragma Assert




[Ada] Removal of technical debt

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This patch removes various technical debt in the form of "???" comments
throughout the GNAT sources.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch6.adb (Add_Simple_Call_By_Copy_Code): Add comments
regarding special handling of components which depend on
discriminants.
* exp_dist.adb (Build_From_Any_Function): Add Real_Rep actual
for calls to Has_Stream_Attribute_Definition.
(Build_To_Any_Function): Likewise.
(Build_TypeCode_Function): Likewise.
* freeze.adb (Freeze_Entity): Add missing comment for Test_E.
* libgnat/s-utf_32.adb: Remove disabled warning comments and
temporarily inserted pragma warnings.  Remove very old (2006 and
2012) comments about bootstrapping older versions.
* par.adb (P_Identifier): Add new parameter Force_Msg.
* par-ch2.adb (P_Identifier): Restructure and clean up function.
* par-ch3.adb (P_Defining_Identifier): Remove code duplication
for parsing identifiers.
* sem_attr.adb (Stream_Attribute_Available): Add missing
comments and add Real_Rep actual for calls to
Has_Stream_Attribute_Definition.
* sem_cat.adb (Has_Read_Write_Attribute): Add Real_Rep actual
for calls to Has_Stream_Attribute_Definition.
(Has_Stream_Attribute_Definition): Remove local Real_Rep and fix
recursive calls. Add default value for Real_Rep.
* sem_cat.ads (Has_Stream_Attribute_Definition): Add new out
parameter "Real_Rep".
* sem_type.adb (Add_Entry): Add condition to avoid passing
non-function calls to Function_Interp_Has_Abstract_Op.
(Function_Interp_Has_Abstract_Op): Add missing comments and
remove check for Is_Overloadable.
* sem_util.adb (Derivation_Too_Early_To_Inherit): Remove
duplicated code.diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -1899,7 +1899,7 @@ package body Exp_Ch6 is
 
  Reset_Packed_Prefix;
 
- Temp := Make_Temporary (Loc, 'T', Actual);
+ Temp   := Make_Temporary (Loc, 'T', Actual);
  Incod  := Relocate_Node (Actual);
  Outcod := New_Copy_Tree (Incod);
 
@@ -1921,7 +1921,10 @@ package body Exp_Ch6 is
 
  elsif Inside_Init_Proc then
 
---  Could use a comment here to match comment below ???
+--  Skip using the actual as the expression in Decl if we are in
+--  an init proc and it is not a component which depends on a
+--  discriminant, because, in this case, we need to use the actual
+--  type of the component instead.
 
 if Nkind (Actual) /= N_Selected_Component
   or else
@@ -1930,8 +1933,9 @@ package body Exp_Ch6 is
 then
Incod := Empty;
 
---  Otherwise, keep the component in order to generate the proper
---  actual subtype, that depends on enclosing discriminants.
+--  Otherwise, keep the component so we can generate the proper
+--  actual subtype - since the subtype depends on enclosing
+--  discriminants.
 
 else
null;


diff --git a/gcc/ada/exp_dist.adb b/gcc/ada/exp_dist.adb
--- a/gcc/ada/exp_dist.adb
+++ b/gcc/ada/exp_dist.adb
@@ -8600,6 +8600,8 @@ package body Exp_Dist is
 
 Use_Opaque_Representation : Boolean;
 
+Real_Rep : Node_Id;
+
  begin
 --  For a derived type, we can't go past the base type (to the
 --  parent type) here, because that would cause the attribute's
@@ -8634,10 +8636,10 @@ package body Exp_Dist is
 Use_Opaque_Representation := False;
 
 if Has_Stream_Attribute_Definition
- (Typ, TSS_Stream_Output, At_Any_Place => True)
+ (Typ, TSS_Stream_Output, Real_Rep, At_Any_Place => True)
   or else
Has_Stream_Attribute_Definition
- (Typ, TSS_Stream_Write, At_Any_Place => True)
+ (Typ, TSS_Stream_Write, Real_Rep, At_Any_Place => True)
 then
--  If user-defined stream attributes are specified for this
--  type, use them and transmit data as an opaque sequence of
@@ -9438,6 +9440,8 @@ package body Exp_Dist is
 --  When True, use stream attributes and represent type as an
 --  opaque sequence of bytes.
 
+Real_Rep : Node_Id;
+
  begin
 --  For a derived type, we can't go past the base type (to the
 --  parent type) here, because that would cause the attribute's
@@ -9492,10 +9496,10 @@ package body Exp_Dist is
 Use_Opaque_Representation := False;
 
 if Has_Stream_Attribute_Definition
- (Typ, TSS_Stream_Output, At_Any_Place => True)
+ (Typ, TSS_Stream_Output, Real_Rep, At_Any_Place

[Ada] Suppress spurious CodePeer check on generic actual subprogram

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Procedure Destroy is intentionally doing nothing and needs an IN OUT
parameter, because to match the profile of a generic formal subprogram.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* treepr.adb (Destroy): Prevent spurious check from CodePeer.diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb
--- a/gcc/ada/treepr.adb
+++ b/gcc/ada/treepr.adb
@@ -84,6 +84,8 @@ package body Treepr is
--  Simple Hash function for Node_Ids, List_Ids and Elist_Ids
 
procedure Destroy (Value : in out Nat) is null;
+   pragma Annotate (CodePeer, False_Positive, "unassigned parameter",
+"in out parameter is required to instantiate generic");
--  Dummy routine for destroing hashed values
 
package Serial_Numbers is new Dynamic_Hash_Tables




[Ada] Rename Any_Access into Universal_Access

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
The front-end defines an Any_Access entity which is only used as the type
of the literal null.  Now, since AI95-0230, the RM 4.2(8/2) clause reads:
"An integer literal is of type universal_integer. A real literal is of type
universal_real. The literal null is of type universal_access." and e.g.
Find_Non_Universal_Interpretations deals with Any_Access as if it was an
universal type, so it is more consistent to rename it into Universal_Access.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* stand.ads (Any_Access): Delete.
(Universal_Access): New entity.
* einfo.ads: Remove obsolete reference to Any_Access.
* gen_il-gen-gen_entities.adb: Likewise.
* cstand.adb (Create_Standard): Do not create Any_Access and create
Universal_Access as a full type instead.
* errout.adb (Set_Msg_Insertion_Type_Reference): Do not deal with
Any_Access and deal with Universal_Access instead.
* sem_ch3.adb (Analyze_Object_Declaration): Replace Any_Access with
Universal_Access.
* sem_ch4.adb (Analyze_Null): Likewise.
(Find_Non_Universal_Interpretations): Likewise.
(Find_Equality_Types.Try_One_Interp): Likewise and avoid shadowing
by renaming a local variable of the same name.
* sem_res.adb (Make_Call_Into_Operato): Likewise.
(Resolve_Equality_Op): Likewise.
* sem_type.adb (Covers): Likewise.
(Specific_Type): Likewise.diff --git a/gcc/ada/cstand.adb b/gcc/ada/cstand.adb
--- a/gcc/ada/cstand.adb
+++ b/gcc/ada/cstand.adb
@@ -1191,15 +1191,6 @@ package body CStand is
   pragma Assert (not Known_Esize (Any_Id));
   pragma Assert (not Known_Alignment (Any_Id));
 
-  Any_Access := New_Standard_Entity ("an access type");
-  Mutate_Ekind  (Any_Access, E_Access_Type);
-  Set_Scope (Any_Access, Standard_Standard);
-  Set_Etype (Any_Access, Any_Access);
-  Init_Size (Any_Access, System_Address_Size);
-  Set_Elem_Alignment(Any_Access);
-  Set_Directly_Designated_Type
-(Any_Access, Any_Type);
-
   Any_Character := New_Standard_Entity ("a character type");
   Mutate_Ekind  (Any_Character, E_Enumeration_Type);
   Set_Scope (Any_Character, Standard_Standard);
@@ -1416,6 +1407,16 @@ package body CStand is
   Set_Size_Known_At_Compile_Time
(Universal_Fixed);
 
+  Universal_Access := New_Standard_Entity ("universal_access");
+  Decl := New_Node (N_Full_Type_Declaration, Stloc);
+  Set_Defining_Identifier (Decl, Universal_Access);
+  Mutate_Ekind (Universal_Access, E_Access_Type);
+  Set_Etype(Universal_Access, Universal_Access);
+  Set_Scope(Universal_Access, Standard_Standard);
+  Init_Size(Universal_Access, System_Address_Size);
+  Set_Elem_Alignment   (Universal_Access);
+  Set_Directly_Designated_Type (Universal_Access, Any_Type);
+
   --  Create type declaration for Duration, using a 64-bit size. The
   --  delta and size values depend on the mode set in system.ads.
 


diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -4864,10 +4864,6 @@ package Einfo is
 --  associated with an access attribute. After resolution a specific access
 --  type will be established as determined by the context.
 
---  Finally, the type Any_Access is used to label -null- during type
---  resolution. Any_Access is also replaced by the context type after
---  resolution.
-

-- Description of Defined Attributes for Entity_Kinds --



diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3622,8 +3622,7 @@ package body Errout is
  Set_Msg_Str ("exception name");
  return;
 
-  elsif Error_Msg_Node_1 = Any_Access
-or else Error_Msg_Node_1 = Any_Array
+  elsif Error_Msg_Node_1 = Any_Array
 or else Error_Msg_Node_1 = Any_Boolean
 or else Error_Msg_Node_1 = Any_Character
 or else Error_Msg_Node_1 = Any_Composite
@@ -3640,17 +3639,21 @@ package body Errout is
  Set_Msg_Name_Buffer;
  return;
 
-  elsif Error_Msg_Node_1 = Universal_Real then
- Set_Msg_Str ("type universal real");
- return;
-
   elsif Error_Msg_Node_1 = Universal_Integer then
  Set_Msg_Str ("type universal integer");
  return;
 
+  elsif Error_Msg_Node_1 = Universal_Real then
+ Set_Msg_Str ("type universal real");
+ return;
+
   elsif Error_Msg_Node_1 = Universal_Fixed then
  Set_Msg_Str ("type universal fixed");
  return;
+
+  elsif Error_Msg_Node_1 = Universal_Access then
+

[Ada] Remove duplicates of empty strings

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
In package Stringt we already have a Null_String_Id, which represents a
null string with length zero. There is no need to duplicate it in other
packages.

Cleanup originating from enabling expansion of dispatching wrappers for
GNATprove; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par-ch4.adb (P_Simple_Expression): Reuse Null_String_Id.
* prep.adb (Parse_Def_File): Likewise; remove Empty_String.diff --git a/gcc/ada/par-ch4.adb b/gcc/ada/par-ch4.adb
--- a/gcc/ada/par-ch4.adb
+++ b/gcc/ada/par-ch4.adb
@@ -2471,9 +2471,6 @@ package body Ch4 is
   and then Num_Concats >= Num_Concats_Threshold
 then
declare
-  Empty_String_Val : String_Id;
-  --  String_Id for ""
-
   Strlit_Concat_Val : String_Id;
   --  Contains the folded value (which will be correct if the
   --  "&" operators are the predefined ones).
@@ -2509,11 +2506,9 @@ package body Ch4 is
   --  Create new folded node, and rewrite result with a concat-
   --  enation of an empty string literal and the folded node.
 
-  Start_String;
-  Empty_String_Val := End_String;
   New_Node :=
 Make_Op_Concat (Loc,
-  Make_String_Literal (Loc, Empty_String_Val),
+  Make_String_Literal (Loc, Null_String_Id),
   Make_String_Literal (Loc, Strlit_Concat_Val,
 Is_Folded_In_Parser => True));
   Rewrite (Node1, New_Node);


diff --git a/gcc/ada/prep.adb b/gcc/ada/prep.adb
--- a/gcc/ada/prep.adb
+++ b/gcc/ada/prep.adb
@@ -114,9 +114,6 @@ package body Prep is
--  Used to avoid repetition of the part of the initialisation that needs
--  to be done only once.
 
-   Empty_String : String_Id;
-   --  "", as a string_id
-
String_False : String_Id;
--  "false", as a string_id
 
@@ -809,9 +806,6 @@ package body Prep is
  Store_String_Chars ("True");
  True_Value.Value := End_String;
 
- Start_String;
- Empty_String := End_String;
-
  Start_String;
  Store_String_Chars ("False");
  String_False := End_String;
@@ -1072,7 +1066,7 @@ package body Prep is
 Original=> Original_Name,
 On_The_Command_Line => False,
 Is_A_String => False,
-Value   => Empty_String);
+Value   => Null_String_Id);
 
 else
Value_Start := Token_Ptr;




[Ada] New restriction No_Tagged_Type_Registration

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This patch implements the No_Tagged_Type_Registration restriction,
analogous to No_Exception_Registration, but for tagged types.

Fix several violations of the RTE_Available protocol, as documented in
rtsfind.ads:

   --  If we call this and it returns True, we should generate a reference to
   --  E (usually a call). In other words, for a subprogram E, the compiler
   --  should not call RTE_Available (E) until it has decided it wants to
   --  generate a call to E.

This implies that RTE_Available must be called last in certain if statements.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-rident.ads (No_Tagged_Type_Registration): New
restriction identifier.
* restrict.ads (Implementation_Restriction): Add restriction.
* exp_ch7.adb (Process_Declarations): Suppress
tagged-type-related finalization actions if the restriction is
active.  Call RTE_Available last.
* exp_disp.adb (Make_DT): Likewise.
* exp_util.adb (Requires_Cleanup_Actions): Return False for a
tagged type declaration if No_Tagged_Type_Registration is
active.
* sem_attr.adb (Check_Stream_Attribute): Check restriction
No_Tagged_Type_Registration.
* libgnat/a-except.ads (Null_Occurrence): Minor: Initialize, to
avoid stopping at a warning in gdb.
* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst:
Document new restriction.
* gnat_rm.texi: Regenerate.

patch.diff.gz
Description: application/gzip


[Ada] Spurious error when using current instance of type

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This patch fixes an issue in the compiler whereby it fails to recognize
the presence of a current instance of an incomplete type when the
instance is used within a default expression for a record component.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Build_Assignment): Replace current instance of
type with Init_Proc formal.
* sem_attr.adb (OK_Self_Reference): Handle recognition of
Current_Instance to detect certain expansion.
* sem_ch4.adb (Analyze_One_Call): Set actual's type when the
actual in question is a current instance and its corresponding
formal is an incomplete type.
* sem_util.adb (Is_Current_Instance): Add check for incomplete
views and add comment.diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -2022,6 +2022,25 @@ package body Exp_Ch3 is
  Elmt2 => Defining_Identifier (First
(Parameter_Specifications
   (Parent (Proc_Id);
+
+   --  If the type has an incomplete view, a current instance
+   --  may have an incomplete type. In that case, it must also be
+   --  replaced by the formal of the Init_Proc.
+
+   if Nkind (Parent (Rec_Type)) = N_Full_Type_Declaration
+ and then Present (Incomplete_View (Parent (Rec_Type)))
+   then
+  Append_Elmt (
+N  => Defining_Identifier
+(Incomplete_View (Parent (Rec_Type))),
+To => Map);
+  Append_Elmt (
+N  => Defining_Identifier
+(First
+  (Parameter_Specifications
+(Parent (Proc_Id,
+To => Map);
+   end if;
 end if;
 
 Exp := New_Copy_Tree (Exp, New_Scope => Proc_Id, Map => Map);


diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -989,7 +989,15 @@ package body Sem_Attr is
   Set_Etype  (P, Typ);
end if;
 
-   if Typ = Scop then
+   --  A current instance typically appears immediately within
+   --  the type declaration, but may be nested within an internally
+   --  generated temporary scope - as for an aggregate of a
+   --  discriminated component.
+
+   if Typ = Scop
+ or else (In_Open_Scopes (Typ)
+   and then not Comes_From_Source (Scop))
+   then
   declare
  Q : Node_Id := Parent (N);
 


diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -3727,6 +3727,24 @@ package body Sem_Ch4 is
   Next_Actual (Actual);
   Next_Formal (Formal);
 
+   --  A current instance used as an actual of a function,
+   --  whose body has not been seen, may include a formal
+   --  whose type is an incomplete view of an enclosing
+   --  type declaration containing the current call (e.g.
+   --  in the Expression for a component declaration).
+
+   --  In this case, update the signature of the subprogram
+   --  so the formal has the type of the full view.
+
+   elsif Inside_Init_Proc
+ and then Nkind (Actual) = N_Identifier
+ and then Ekind (Etype (Formal)) = E_Incomplete_Type
+ and then Etype (Actual) = Full_View (Etype (Formal))
+   then
+  Set_Etype (Formal, Etype (Actual));
+  Next_Actual (Actual);
+  Next_Formal (Formal);
+
--  Handle failed type check
 
else


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -16680,7 +16680,18 @@ package body Sem_Util is
   | N_Private_Type_Declaration
   | N_Subtype_Declaration
   and then Comes_From_Source (P)
-  and then Defining_Entity (P) = Typ
+
+  --  If the type has a previous incomplete declaration, the
+  --  reference in the type definition may have the incomplete
+  --  view. So, here we detect if this incomplete view is a current
+  --  instance by checking if its full view is the entity of the
+  --  full declaration begin analyzed.
+
+  and then
+(Defining_Entity (P) = Typ
+  or else
+   (Ekind (Typ) = E_Incomplete_Type
+ and then Full_View (Typ) = Defining_Entity (P)))

[Ada] Avoid building malformed component constraints

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Given a discriminated type T1 with discriminant D1 having a component C1
of another discriminated type T2 with discriminant D2 and a propagated
discriminant constraint (that is, "C1 : T2 (D2 => D1);" and, for
example, a parameter of type T1, the compiler will sometimes build an
anonymous subtype to describe the constraints of the C1 component of
that parameter.  In some cases, these constraints were malformed; this
could result in either internal errors during compilation or the
generation of incorrect constraint checks. This error is corrected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Build_Actual_Subtype_Of_Component): Define a new
local function, Build_Discriminant_Reference, and call it in
each of the three cases where Make_Selected_Component was
previously being called to construct a discriminant reference (2
in Build_Actual_Array_Constraint and 1 in
Build_Actual_Record_Constraint). Instead of unconditionally
using the passed-in object name as the prefix for the new
selected component node, this new function checks to see if
perhaps a prefix of that name should be used instead.diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -1970,6 +1970,12 @@ package body Sem_Util is
   --  Similar to previous one, for discriminated components constrained
   --  by the discriminant of the enclosing object.
 
+  function Build_Discriminant_Reference
+(Discrim_Name : Node_Id; Obj : Node_Id := P) return Node_Id;
+  --  Build a reference to the discriminant denoted by Discrim_Name.
+  --  The prefix of the result is usually Obj, but it could be
+  --  a prefix of Obj in some corner cases.
+
   function Copy_And_Maybe_Dereference (N : Node_Id) return Node_Id;
   --  Copy the subtree rooted at N and insert an explicit dereference if it
   --  is of an access type.
@@ -1993,11 +1999,7 @@ package body Sem_Util is
 Old_Hi := Type_High_Bound (Etype (Indx));
 
 if Denotes_Discriminant (Old_Lo) then
-   Lo :=
- Make_Selected_Component (Loc,
-   Prefix => Copy_And_Maybe_Dereference (P),
-   Selector_Name => New_Occurrence_Of (Entity (Old_Lo), Loc));
-
+   Lo := Build_Discriminant_Reference (Old_Lo);
 else
Lo := New_Copy_Tree (Old_Lo);
 
@@ -2011,11 +2013,7 @@ package body Sem_Util is
 end if;
 
 if Denotes_Discriminant (Old_Hi) then
-   Hi :=
- Make_Selected_Component (Loc,
-   Prefix => Copy_And_Maybe_Dereference (P),
-   Selector_Name => New_Occurrence_Of (Entity (Old_Hi), Loc));
-
+   Hi := Build_Discriminant_Reference (Old_Hi);
 else
Hi := New_Copy_Tree (Old_Hi);
Set_Analyzed (Hi, False);
@@ -2041,10 +2039,7 @@ package body Sem_Util is
  D := First_Elmt (Discriminant_Constraint (Desig_Typ));
  while Present (D) loop
 if Denotes_Discriminant (Node (D)) then
-   D_Val := Make_Selected_Component (Loc,
- Prefix => Copy_And_Maybe_Dereference (P),
-Selector_Name => New_Occurrence_Of (Entity (Node (D)), Loc));
-
+   D_Val := Build_Discriminant_Reference (Node (D));
 else
D_Val := New_Copy_Tree (Node (D));
 end if;
@@ -2056,6 +2051,89 @@ package body Sem_Util is
  return Constraints;
   end Build_Actual_Record_Constraint;
 
+  --
+  -- Build_Discriminant_Reference --
+  --
+
+  function Build_Discriminant_Reference
+(Discrim_Name : Node_Id; Obj : Node_Id := P) return Node_Id
+  is
+ Discrim : constant Entity_Id := Entity (Discrim_Name);
+
+ function Obj_Is_Good_Prefix return Boolean;
+ --  Returns True if Obj.Discrim makes sense; that is, if
+ --  Obj has Discrim as one of its discriminants (or is an
+ --  access value that designates such an object).
+
+ 
+ -- Obj_Is_Good_Prefix --
+ 
+
+ function Obj_Is_Good_Prefix return Boolean is
+Obj_Type   : Entity_Id :=
+  Implementation_Base_Type (Etype (Obj));
+
+Discriminated_Type : constant Entity_Id :=
+  Implementation_Base_Type
+(Scope (Original_Record_Component (Discrim)));
+ begin
+--  The order of the following two tests matters in the
+--  access-to-class-wide case.
+
+if Is_Access_Type (Obj_Type) then
+   Obj_Type := Implementation_Base_Type
+ (Designated_Type (Obj_Type));
+end if;
+
+if I

[Ada] Avoid building malformed component constraints

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
The previous fix introduced a not-yet-understood regression in compiling
CodePeer. For now, we attempt a quick workaround for the problem.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Build_Discriminant_Reference): In the unexpected
case where we previously would fail an assertion, we instead
revert to the old behavior.diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -2107,11 +2107,7 @@ package body Sem_Util is
   --  Start of processing for Build_Discriminant_Reference
 
   begin
- if Obj_Is_Good_Prefix then
-return Make_Selected_Component (Loc,
- Prefix => Copy_And_Maybe_Dereference (Obj),
- Selector_Name => New_Occurrence_Of (Discrim, Loc));
- else
+ if not Obj_Is_Good_Prefix then
 --  If the given discriminant is not a component of the given
 --  object, then try the enclosing object.
 
@@ -2128,10 +2124,15 @@ package body Sem_Util is
 (Discrim_Name => Discrim_Name,
  Obj  => Name (Parent (Entity (Obj;
 else
-   pragma Assert (False);
-   raise Program_Error;
+   --  We are in some unexpected case here, so revert to the
+   --  old behavior (by falling through to it).
+   null;
 end if;
  end if;
+
+ return Make_Selected_Component (Loc,
+  Prefix => Copy_And_Maybe_Dereference (Obj),
+  Selector_Name => New_Occurrence_Of (Discrim, Loc));
   end Build_Discriminant_Reference;
 
   




[Ada] Fix spurious error on instantiation with Text_IO name

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This gets rid of a spurious error given by the compiler on the instantiation
of a generic package, when the instance happens to be an homonym of one of
the subpackages of Text_IO (Fixed_IO, Float_IO, etc) and when it is placed
in a context where Text_IO itself is also visible.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch8.adb (Analyze_Package_Renaming): Do not check for Text_IO
special units when the name of the renaming is a generic instance,
which is the case for package instantiations in the GNAT model.diff --git a/gcc/ada/sem_ch8.adb b/gcc/ada/sem_ch8.adb
--- a/gcc/ada/sem_ch8.adb
+++ b/gcc/ada/sem_ch8.adb
@@ -1594,9 +1594,18 @@ package body Sem_Ch8 is
  return;
   end if;
 
-  --  Check for Text_IO special unit (we may be renaming a Text_IO child)
+  --  Check for Text_IO special units (we may be renaming a Text_IO child),
+  --  but make sure not to catch renamings generated for package instances
+  --  that have nothing to do with them but are nevertheless homonyms.
 
-  Check_Text_IO_Special_Unit (Name (N));
+  if Is_Entity_Name (Name (N))
+and then Present (Entity (Name (N)))
+and then Is_Generic_Instance (Entity (Name (N)))
+  then
+ null;
+  else
+ Check_Text_IO_Special_Unit (Name (N));
+  end if;
 
   if Current_Scope /= Standard_Standard then
  Set_Is_Pure (New_P, Is_Pure (Current_Scope));




[Ada] Fix style in calls to Compile_Time_Constraint_Error

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Cleanup related to handling of -gnatwE (warnings-as-errors) in instances
of generic units.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* checks.adb (Null_Exclusion_Static_Checks,
Selected_Range_Checks): Fix style.diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -4455,8 +4455,8 @@ package body Checks is
 
 Discard_Node
   (Compile_Time_Constraint_Error
- (N  => N,
-  Msg=>
+ (N   => N,
+  Msg =>
 "(Ada 2005) null-excluding component % of object % must "
 & "be initialized??",
   Ent => Defining_Identifier (Comp)));
@@ -4467,8 +4467,8 @@ package body Checks is
  elsif Array_Comp then
 Discard_Node
   (Compile_Time_Constraint_Error
- (N  => N,
-  Msg=>
+ (N   => N,
+  Msg =>
 "(Ada 2005) null-excluding array components must "
 & "be initialized??",
   Ent => Defining_Identifier (N)));
@@ -10896,8 +10896,8 @@ package body Checks is
   Add_Check
 (Compile_Time_Constraint_Error
((if Present (Warn_Node)
-then Warn_Node else Low_Bound (Expr)),
-"static value does not equal lower bound of}??",
+ then Warn_Node else Low_Bound (Expr)),
+ "static value does not equal lower bound of}??",
 T_Typ));
   Check_Added := True;
end if;




[Ada] Refactor repeated implicit conversion from Char_Code to Uint

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
When resolving a string literal we examine each character against low
and high bounds of the expected type. We stored each character as an Int
and implicitly converted it to Uint twice: for "<" and ">" operators.
Now we store convert it to Uint explicitly and only once.

Cleanup related to handling of compile-time constraints errors.
Semantics is unaffacted.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Resolve_String_Literal): Avoid unnecessary
conversions inside "<" and ">" bodies.diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -11722,14 +11722,14 @@ package body Sem_Res is
Comp_Typ_Hi : constant Node_Id :=
Type_High_Bound (Component_Type (Typ));
 
-   Char_Val : Int;
+   Char_Val : Uint;
 
 begin
if Compile_Time_Known_Value (Comp_Typ_Lo)
  and then Compile_Time_Known_Value (Comp_Typ_Hi)
then
   for J in 1 .. Strlen loop
- Char_Val := Int (Get_String_Char (Str, J));
+ Char_Val := UI_From_CC (Get_String_Char (Str, J));
 
  if Char_Val < Expr_Value (Comp_Typ_Lo)
or else Char_Val > Expr_Value (Comp_Typ_Hi)




[Ada] Simplify type conversions in source pointer arithmetic

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Code cleanup; semantics is unaffacted.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Resolve_String_Literal): Simplify pointer
arithmetic.diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -11737,7 +11737,7 @@ package body Sem_Res is
 Apply_Compile_Time_Constraint_Error
   (N, "character out of range??",
CE_Range_Check_Failed,
-   Loc => Source_Ptr (Int (Loc) + J));
+   Loc => Loc + Source_Ptr (J));
  end if;
   end loop;
 




[Ada] Fix style in comments about warning messages

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Cleanup related to handling of messages for compile-time known
constraint errors.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Error_Msg_Internal): Reorder words.
* erroutc.ads (Is_Warning_Msg): Add closing paren.
* sem_util.adb (Compile_Time_Constraint_Error): Fix casing.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -1049,7 +1049,7 @@ package body Errout is
  elsif No (Cunit (Main_Unit)) then
 null;
 
- --  If the flag location is not in the main extended source unit, then
+ --  If the flag location is not in the extended main source unit, then
  --  we want to eliminate the warning, unless it is in the extended
  --  main code unit and we want warnings on the instance.
 


diff --git a/gcc/ada/erroutc.ads b/gcc/ada/erroutc.ads
--- a/gcc/ada/erroutc.ads
+++ b/gcc/ada/erroutc.ads
@@ -69,7 +69,7 @@ package Erroutc is
 
Is_Warning_Msg : Boolean := False;
--  Set True to indicate if current message is warning message (contains ?
-   --  or contains < and Error_Msg_Warn is True.
+   --  or contains < and Error_Msg_Warn is True).
 
Is_Info_Msg : Boolean := False;
--  Set True to indicate that the current message starts with the characters


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -6604,7 +6604,7 @@ package body Sem_Util is
   Error_Msg_Warn := Warn or SPARK_Mode /= On;
 
   --  A static constraint error in an instance body is not a fatal error.
-  --  we choose to inhibit the message altogether, because there is no
+  --  We choose to inhibit the message altogether, because there is no
   --  obvious node (for now) on which to post it. On the other hand the
   --  offending node must be replaced with a constraint_error in any case.
 




[Ada] Remove unreferenced Warn_On_Instance

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Cleanup related to handling of -gnatwE (warnings-as-errors) in instances
of generic units.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* err_vars.ads (Warn_On_Instance): Remove; it was a relic from
the previous handling of warning in instances that was removed
decades ago.diff --git a/gcc/ada/err_vars.ads b/gcc/ada/err_vars.ads
--- a/gcc/ada/err_vars.ads
+++ b/gcc/ada/err_vars.ads
@@ -61,14 +61,6 @@ package Err_Vars is
--  not get reset by any Error_Msg call, so the caller is responsible
--  for resetting it.
 
-   Warn_On_Instance : Boolean := False;
-   --  Normally if a warning is generated in a generic template from the
-   --  analysis of the template, then the warning really belongs in the
-   --  template, and the default value of False for this Boolean achieves
-   --  that effect. If Warn_On_Instance is set True, then the warnings are
-   --  generated on the instantiation (referring to the template) rather
-   --  than on the template itself.
-
Raise_Exception_On_Error : Nat := 0;
--  If this value is non-zero, then any attempt to generate an error
--  message raises the exception Error_Msg_Exception, and the error




[Ada] Fix regression in freezing code for instantiations

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
When going to the outer level for the placement of a freeze node in the
case where the current package has no body, the previous change would
overlook instantiations whose body has not materialized yet.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch12.adb (Insert_Freeze_Node_For_Instance): When going to
the outer level, do not jump over following instantiations in
the list.diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -9703,6 +9703,7 @@ package body Sem_Ch12 is
   Decl : Node_Id;
   Decls: List_Id;
   Inst : Entity_Id;
+  Origin   : Entity_Id;
   Par_Inst : Node_Id;
   Par_N: Node_Id;
 
@@ -9793,9 +9794,10 @@ package body Sem_Ch12 is
  end;
   end if;
 
-  Decl  := N;
-  Decls := List_Containing (N);
-  Par_N := Parent (Decls);
+  Decl   := N;
+  Decls  := List_Containing (N);
+  Par_N  := Parent (Decls);
+  Origin := Empty;
 
   --  Determine the proper freeze point of an instantiation
 
@@ -9824,10 +9826,13 @@ package body Sem_Ch12 is
 not In_Same_Source_Unit (Generic_Parent (Par_Inst), Inst)
 then
while Present (Decl) loop
-  if (Nkind (Decl) in N_Unit_Body
+  if ((Nkind (Decl) in N_Unit_Body
 or else
-  Nkind (Decl) in N_Body_Stub)
-and then Comes_From_Source (Decl)
+   Nkind (Decl) in N_Body_Stub)
+  and then Comes_From_Source (Decl))
+or else (Present (Origin)
+  and then Nkind (Decl) in N_Generic_Instantiation
+  and then Instance_Spec (Decl) /= Origin)
   then
  Set_Sloc (F_Node, Sloc (Decl));
  Insert_Before (Decl, F_Node);
@@ -9851,16 +9856,19 @@ package body Sem_Ch12 is
return;
 
 --  When the instantiation occurs in a package spec and there is
---  no source body which follows, not even of the package itself
---  then insert into the declaration list of the outer level.
+--  no source body which follows, not even of the package itself,
+--  then insert into the declaration list of the outer level, but
+--  do not jump over following instantiations in this list because
+--  they may have a body that has not materialized yet, see above.
 
 elsif Nkind (Par_N) = N_Package_Specification
   and then No (Corresponding_Body (Parent (Par_N)))
   and then Is_List_Member (Parent (Par_N))
 then
-   Decl  := Parent (Par_N);
-   Decls := List_Containing (Decl);
-   Par_N := Parent (Decls);
+   Decl   := Parent (Par_N);
+   Decls  := List_Containing (Decl);
+   Par_N  := Parent (Decls);
+   Origin := Decl;
 
 --  In a package declaration, or if no source body which follows
 --  and at library level, then insert at end of list.




[Ada] Remove unnecessary guards for non-empty lists

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
All node lists can be safely iterated with First/Present/Next. There is
no need for explicit guard against empty lists. Code cleanup.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Remove_Warning_Messages): Remove unnecessary guard.
* exp_util.adb (Kill_Dead_Code): Likewise.
* par_sco.adb (Traverse_Declarations_Or_Statements): Likewise.
* sem_ch3.adb (Build_Derived_Record_Type): Likewise.
* sem_ch4.adb (Traverse_Interfaces): Likewise.
* sem_eval.adb (Traverse_Interfaces): Likewise.
* sem_util.adb (Collect_Interfaces): Likewise.
(Has_Non_Null_Statements, Side_Effect_Free_Statements):
Likewise; turn into WHILE loops, for consistency.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3378,13 +3378,11 @@ package body Errout is
procedure Remove_Warning_Messages (L : List_Id) is
   Stat : Node_Id;
begin
-  if Is_Non_Empty_List (L) then
- Stat := First (L);
- while Present (Stat) loop
-Remove_Warning_Messages (Stat);
-Next (Stat);
- end loop;
-  end if;
+  Stat := First (L);
+  while Present (Stat) loop
+ Remove_Warning_Messages (Stat);
+ Next (Stat);
+  end loop;
end Remove_Warning_Messages;
 



diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -9495,14 +9495,12 @@ package body Exp_Util is
begin
   W := Warn;
 
-  if Is_Non_Empty_List (L) then
- N := First (L);
- while Present (N) loop
-Kill_Dead_Code (N, W);
-W := False;
-Next (N);
- end loop;
-  end if;
+  N := First (L);
+  while Present (N) loop
+ Kill_Dead_Code (N, W);
+ W := False;
+ Next (N);
+  end loop;
end Kill_Dead_Code;
 
-


diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb
--- a/gcc/ada/par_sco.adb
+++ b/gcc/ada/par_sco.adb
@@ -2417,21 +2417,18 @@ package body Par_SCO is
 
   --  Loop through statements or declarations
 
-  if Is_Non_Empty_List (L) then
- N := First (L);
- while Present (N) loop
+  N := First (L);
+  while Present (N) loop
 
---  Note: For separate bodies, we see the tree after Par.Labl has
---  introduced implicit labels, so we need to ignore those nodes.
+ --  Note: For separate bodies, we see the tree after Par.Labl has
+ --  introduced implicit labels, so we need to ignore those nodes.
 
-if Nkind (N) /= N_Implicit_Label_Declaration then
-   Traverse_One (N);
-end if;
-
-Next (N);
- end loop;
+ if Nkind (N) /= N_Implicit_Label_Declaration then
+Traverse_One (N);
+ end if;
 
-  end if;
+ Next (N);
+  end loop;
 
   --  End sequence of statements and flush deferred decisions
 


diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -9351,13 +9351,11 @@ package body Sem_Ch3 is
  declare
 Iface : Node_Id;
  begin
-if Is_Non_Empty_List (Interface_List (Type_Def)) then
-   Iface := First (Interface_List (Type_Def));
-   while Present (Iface) loop
-  Freeze_Before (N, Etype (Iface));
-  Next (Iface);
-   end loop;
-end if;
+Iface := First (Interface_List (Type_Def));
+while Present (Iface) loop
+   Freeze_Before (N, Etype (Iface));
+   Next (Iface);
+end loop;
  end;
   end if;
 


diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -9597,31 +9597,29 @@ package body Sem_Ch4 is
  begin
 Error := False;
 
-if Is_Non_Empty_List (Intface_List) then
-   Intface := First (Intface_List);
-   while Present (Intface) loop
+Intface := First (Intface_List);
+while Present (Intface) loop
 
-  --  Look for acceptable class-wide homonyms associated with
-  --  the interface.
+   --  Look for acceptable class-wide homonyms associated with the
+   --  interface.
 
-  Traverse_Homonyms (Etype (Intface), Error);
+   Traverse_Homonyms (Etype (Intface), Error);
 
-  if Error then
- return;
-  end if;
+   if Error then
+  return;
+   end if;
 
-  --  Continue the search by looking at each of the interface's
-  --  associated interface ancestors.
+   --  Continue the search by looking at each of the 

[Ada] Move messages on division by zero to the right operand

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
All compile-time messages about division by zero are now located at the
right operand. Previously some of them were located at the division
operator, which was inconsistent.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_eval.adb (Eval_Arithmetic_Op): Add Loc parameter to all
calls to Apply_Compile_Time_Constraint_Error related to division
by zero.diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb
--- a/gcc/ada/sem_eval.adb
+++ b/gcc/ada/sem_eval.adb
@@ -2117,6 +2117,7 @@ package body Sem_Eval is
 
  Apply_Compile_Time_Constraint_Error
(N, "division by zero", CE_Divide_By_Zero,
+Loc  => Sloc (Right),
 Warn => not Stat or SPARK_Mode = On);
  return;
 
@@ -2139,6 +2140,7 @@ package body Sem_Eval is
 
  Apply_Compile_Time_Constraint_Error
(N, "mod with zero divisor", CE_Divide_By_Zero,
+Loc  => Sloc (Right),
 Warn => not Stat or SPARK_Mode = On);
  return;
 
@@ -2159,6 +2161,7 @@ package body Sem_Eval is
 
  Apply_Compile_Time_Constraint_Error
(N, "rem with zero divisor", CE_Divide_By_Zero,
+Loc  => Sloc (Right),
 Warn => not Stat or SPARK_Mode = On);
  return;
 
@@ -2218,7 +2221,8 @@ package body Sem_Eval is
 else pragma Assert (Nkind (N) = N_Op_Divide);
if UR_Is_Zero (Right_Real) then
   Apply_Compile_Time_Constraint_Error
-(N, "division by zero", CE_Divide_By_Zero);
+(N, "division by zero", CE_Divide_By_Zero,
+ Loc => Sloc (Right));
   return;
end if;
 




[Ada] Remove a locally handled exception

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Code cleanup related to handling of warnings-as-errors. Semantics is
unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch4.adb (Expand_Concatenate): There is no reason for using
declaring, raising and catching an exception; a simple return
statement is enough.diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -2695,9 +2695,6 @@ package body Exp_Ch4 is
   --  lengths of operands. The choice of this type is a little subtle and
   --  is discussed in a separate section at the start of the body code.
 
-  Concatenation_Error : exception;
-  --  Raised if concatenation is sure to raise a CE
-
   Result_May_Be_Null : Boolean := True;
   --  Reset to False if at least one operand is encountered which is known
   --  at compile time to be non-null. Used for handling the special case
@@ -3460,7 +3457,16 @@ package body Exp_Ch4 is
   --  Catch the static out of range case now
 
   if Raises_Constraint_Error (High_Bound) then
- raise Concatenation_Error;
+ --  Kill warning generated for the declaration of the static out of
+ --  range high bound, and instead generate a Constraint_Error with
+ --  an appropriate specific message.
+
+ Kill_Dead_Code (Declaration_Node (Entity (High_Bound)));
+ Apply_Compile_Time_Constraint_Error
+   (N  => Cnode,
+Msg=> "concatenation result upper bound out of range??",
+Reason => CE_Range_Check_Failed);
+ return;
   end if;
 
   --  Now we will generate the assignments to do the actual concatenation
@@ -3629,19 +3635,6 @@ package body Exp_Ch4 is
   pragma Assert (Present (Result));
   Rewrite (Cnode, Result);
   Analyze_And_Resolve (Cnode, Atyp);
-
-   exception
-  when Concatenation_Error =>
-
- --  Kill warning generated for the declaration of the static out of
- --  range high bound, and instead generate a Constraint_Error with
- --  an appropriate specific message.
-
- Kill_Dead_Code (Declaration_Node (Entity (High_Bound)));
- Apply_Compile_Time_Constraint_Error
-   (N  => Cnode,
-Msg=> "concatenation result upper bound out of range??",
-Reason => CE_Range_Check_Failed);
end Expand_Concatenate;
 
---




[Ada] Simplify traversal for removing warnings from dead code

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Cleanup related to handling of warnings-as-errors. Semantics is
unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Remove_Warning_Messages): Use traversal procedure
instead of traversal function, since we discard status of each
step anyway.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3254,7 +3254,7 @@ package body Errout is
   function Check_For_Warning (N : Node_Id) return Traverse_Result;
   --  This function checks one node for a possible warning message
 
-  function Check_All_Warnings is new Traverse_Func (Check_For_Warning);
+  procedure Check_All_Warnings is new Traverse_Proc (Check_For_Warning);
   --  This defines the traversal operation
 
   ---
@@ -3341,37 +3341,23 @@ package body Errout is
 --  the current tree. Given that we are in unreachable code, this
 --  modification to the tree is harmless.
 
-declare
-   Status : Traverse_Final_Result;
-
-begin
-   if Is_List_Member (N) then
-  Set_Condition (N, Original_Node (N));
-  Status := Check_All_Warnings (Condition (N));
-   else
-  Rewrite (N, Original_Node (N));
-  Status := Check_All_Warnings (N);
-   end if;
-
-   return Status;
-end;
-
- else
-return OK;
+if Is_List_Member (N) then
+   Set_Condition (N, Original_Node (N));
+   Check_All_Warnings (Condition (N));
+else
+   Rewrite (N, Original_Node (N));
+   Check_All_Warnings (N);
+end if;
  end if;
+
+ return OK;
   end Check_For_Warning;
 
--  Start of processing for Remove_Warning_Messages
 
begin
   if Warnings_Detected /= 0 then
- declare
-Discard : Traverse_Final_Result;
-pragma Warnings (Off, Discard);
-
- begin
-Discard := Check_All_Warnings (N);
- end;
+ Check_All_Warnings (N);
   end if;
end Remove_Warning_Messages;
 




[Ada] Proof of System.Generic_Array_Operations at silver level

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Proof of the generic unit to array operations (vector/matrix), only at
silver level, for runtime errors that come from the generic part of the
unit. This does not prove e.g. absence of overflows in an instantiation
related to arithmetic operations passed as formal generic subprogram
parameters.

Justify 2 checks with pragma Annotate as False_Positive (related to
absence of aliasing) and 3 checks as Intentional (possible overflow and
explicit exception).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-ngcoar.adb: Add pragma to ignore assertions in
instance.
* libgnat/a-ngrear.adb: Likewise.
* libgnat/s-gearop.adb: Prove implementation is free of runtime
errors.
* libgnat/s-gearop.ads: Add contracts to protect against runtime
errors in the generic part.

patch.diff.gz
Description: application/gzip


[Ada] Remove unnecessary declare block

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
Code cleanup related to handling of warnings-as-errors. Semantics is
unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Adjust_Name_Case): Remove unnecessary declare
block.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3391,60 +3391,57 @@ package body Errout is
  (Buf : in out Bounded_String;
   Loc : Source_Ptr)
is
+  Src_Ind : constant Source_File_Index := Get_Source_File_Index (Loc);
+  Sbuffer : Source_Buffer_Ptr;
+  Ref_Ptr : Integer;
+  Src_Ptr : Source_Ptr;
+
begin
   --  We have an all lower case name from Namet, and now we want to set
   --  the appropriate case. If possible we copy the actual casing from
   --  the source. If not we use standard identifier casing.
 
-  declare
- Src_Ind : constant Source_File_Index := Get_Source_File_Index (Loc);
- Sbuffer : Source_Buffer_Ptr;
- Ref_Ptr : Integer;
- Src_Ptr : Source_Ptr;
+  Ref_Ptr := 1;
+  Src_Ptr := Loc;
 
-  begin
- Ref_Ptr := 1;
- Src_Ptr := Loc;
+  --  For standard locations, always use mixed case
 
- --  For standard locations, always use mixed case
+  if Loc <= No_Location then
+ Set_Casing (Buf, Mixed_Case);
 
- if Loc <= No_Location then
-Set_Casing (Buf, Mixed_Case);
+  else
+ --  Determine if the reference we are dealing with corresponds to text
+ --  at the point of the error reference. This will often be the case
+ --  for simple identifier references, and is the case where we can
+ --  copy the casing from the source.
+
+ Sbuffer := Source_Text (Src_Ind);
+
+ while Ref_Ptr <= Buf.Length loop
+exit when
+  Fold_Lower (Sbuffer (Src_Ptr)) /=
+Fold_Lower (Buf.Chars (Ref_Ptr));
+Ref_Ptr := Ref_Ptr + 1;
+Src_Ptr := Src_Ptr + 1;
+ end loop;
 
- else
---  Determine if the reference we are dealing with corresponds to
---  text at the point of the error reference. This will often be
---  the case for simple identifier references, and is the case
---  where we can copy the casing from the source.
+ --  If we get through the loop without a mismatch, then output the
+ --  name the way it is cased in the source program.
 
-Sbuffer := Source_Text (Src_Ind);
+ if Ref_Ptr > Buf.Length then
+Src_Ptr := Loc;
 
-while Ref_Ptr <= Buf.Length loop
-   exit when
- Fold_Lower (Sbuffer (Src_Ptr)) /=
-   Fold_Lower (Buf.Chars (Ref_Ptr));
-   Ref_Ptr := Ref_Ptr + 1;
+for J in 1 .. Buf.Length loop
+   Buf.Chars (J) := Sbuffer (Src_Ptr);
Src_Ptr := Src_Ptr + 1;
 end loop;
 
---  If we get through the loop without a mismatch, then output the
---  name the way it is cased in the source program
-
-if Ref_Ptr > Buf.Length then
-   Src_Ptr := Loc;
+ --  Otherwise set the casing using the default identifier casing
 
-   for J in 1 .. Buf.Length loop
-  Buf.Chars (J) := Sbuffer (Src_Ptr);
-  Src_Ptr := Src_Ptr + 1;
-   end loop;
-
---  Otherwise set the casing using the default identifier casing
-
-else
-   Set_Casing (Buf, Identifier_Casing (Src_Ind));
-end if;
+ else
+Set_Casing (Buf, Identifier_Casing (Src_Ind));
  end if;
-  end;
+  end if;
end Adjust_Name_Case;
 
---




[Ada] Warn on subtype declaration of null range

2022-01-06 Thread Pierre-Marie de Rodat via Gcc-patches
This patch adds a warning on a subtype declaration with a
compile-time-known range constraint that is a null range.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Resolve_Range): Warn on null range, unless we are
inside a generic unit or an instance thereof.
* sem_ch3.adb (Analyze_Subtype_Indication): Minor: avoid double
negative.diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -6048,13 +6048,13 @@ package body Sem_Ch3 is
begin
   Analyze (T);
 
-  if R /= Error then
+  if R = Error then
+ Set_Error_Posted (R);
+ Set_Error_Posted (T);
+  else
  Analyze (R);
  Set_Etype (N, Etype (R));
  Resolve (R, Entity (T));
-  else
- Set_Error_Posted (R);
- Set_Error_Posted (T);
   end if;
end Analyze_Subtype_Indication;
 


diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -10754,6 +10754,30 @@ package body Sem_Res is
 Fold_Uint (H, Expr_Value (H), Static => True);
  end if;
   end if;
+
+  --  If we have a compile-time-known null range, we warn, because that is
+  --  likely to be a mistake. (Dynamic null ranges make sense, but often
+  --  compile-time-known ones do not.) Warn only if this is in a subtype
+  --  declaration. We do this here, rather than while analyzing a subtype
+  --  declaration, in case we decide to expand the cases. We do not want to
+  --  warn in all cases, because some are idiomatic, such as an empty
+  --  aggregate (1 .. 0 => <>).
+
+  --  We don't warn in generics or their instances, because there might be
+  --  some instances where the range is null, and some where it is not,
+  --  which would lead to false alarms.
+
+  if not (Inside_A_Generic or In_Instance)
+and then Comes_From_Source (N)
+and then Compile_Time_Compare
+  (Low_Bound (N), High_Bound (N), Assume_Valid => True) = GT
+and then Nkind (Parent (N)) = N_Range_Constraint
+and then Nkind (Parent (Parent (N))) = N_Subtype_Indication
+and then Nkind (Parent (Parent (Parent (N = N_Subtype_Declaration
+and then Is_OK_Static_Range (N)
+  then
+ Error_Msg_N ("null range??", N);
+  end if;
end Resolve_Range;
 
--




Re: [PATCH] [i386] Optimize V16HF vector insert to element 0 for AVX2.

2022-01-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 10:22 AM liuhongt via Gcc-patches
 wrote:
>
> Also remove mode attribute blendsuf, use ssemodesuf instead.
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
>
> gcc/ChangeLog:
>
> PR target/103753
> * config/i386/i386-expand.c (ix86_expand_vector_set): Not use
> gen_avx2_pblendph_pblendd when elt == 0.
> * config/i386/sse.md (avx2_pblendph): Rename to ..
> (avx2_pblend_pblendd).. this, and extend to V16HI.
> (*avx2_pblendw): Rename to ..
> (*avx2_pblend): .. this, and extend to V16HF.
> (avx2_pblendw): Rename to ..
> (*avx2_pblend): .. this, and extend to V16HF.
> (blendsuf): Removed.
> (sse4_1_pblend): Renamed to ..
> (sse4_1_pblend): .. this.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103753.c: New test.

LGTM, with a small nit below.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.c|  5 ++-
>  gcc/config/i386/sse.md   | 48 +++-
>  gcc/testsuite/gcc.target/i386/pr103753.c | 17 +
>  3 files changed, 42 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103753.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index e93ef1cafa6..0d219d6bb69 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -16245,10 +16245,11 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, 
> rtx val, int elt)
>goto half;
>
>  case E_V16HFmode:
> -  if (TARGET_AVX2)
> +  /* For ELT == 0, vec_setv8hf_0 can save 1 vpbroadcastw.  */
> +  if (TARGET_AVX2 && elt != 0)
> {
>   mmode = SImode;
> - gen_blendm = gen_avx2_pblendph;
> + gen_blendm = gen_avx2_pblendph_pblendd;
>   blendm_const = true;
>   break;
> }
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 033b60d9aa2..c986c73bef8 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -21291,10 +21291,7 @@ (define_insn_and_split 
> "*_pblendvb_lt_subreg_not"
>   (lt:VI1_AVX2 (match_dup 3) (match_dup 4))] UNSPEC_BLENDV))]
>"operands[3] = gen_lowpart (mode, operands[3]);")
>
> -(define_mode_attr blendsuf
> -  [(V8HI "w") (V8HF "ph")])
> -
> -(define_insn "sse4_1_pblend"
> +(define_insn "sse4_1_pblend"
>[(set (match_operand:V8_128 0 "register_operand" "=Yr,*x,x")
> (vec_merge:V8_128
>   (match_operand:V8_128 2 "vector_operand" "YrBm,*xBm,xm")
> @@ -21313,11 +21310,11 @@ (define_insn "sse4_1_pblend"
> (set_attr "mode" "TI")])
>
>  ;; The builtin uses an 8-bit immediate.  Expand that.
> -(define_expand "avx2_pblendw"
> -  [(set (match_operand:V16HI 0 "register_operand")
> -   (vec_merge:V16HI
> - (match_operand:V16HI 2 "nonimmediate_operand")
> - (match_operand:V16HI 1 "register_operand")
> +(define_expand "avx2_pblend"
> +  [(set (match_operand:V16_256 0 "register_operand")
> +   (vec_merge:V16_256
> + (match_operand:V16_256 2 "nonimmediate_operand")
> + (match_operand:V16_256 1 "register_operand")
>   (match_operand:SI 3 "const_0_to_255_operand")))]
>"TARGET_AVX2"
>  {
> @@ -21325,11 +21322,11 @@ (define_expand "avx2_pblendw"
>operands[3] = GEN_INT (val << 8 | val);
>  })
>
> -(define_expand "avx2_pblendph"
> -  [(set (match_operand:V16HF 0 "register_operand")
> -   (vec_merge:V16HF
> - (match_operand:V16HF 2 "register_operand")
> - (match_operand:V16HF 1 "register_operand")
> +(define_expand "avx2_pblend_pblendd"

Just name the above avx2_pblend_1.

> +  [(set (match_operand:V16_256 0 "register_operand")
> +   (vec_merge:V16_256
> + (match_operand:V16_256 2 "register_operand")
> + (match_operand:V16_256 1 "register_operand")
>   (match_operand:SI 3 "const_int_operand")))]
>"TARGET_AVX2
>&& !((INTVAL (operands[3]) & 0xff) && (INTVAL (operands[3]) & 0xff00))"
> @@ -21339,7 +21336,7 @@ (define_expand "avx2_pblendph"
>  emit_move_insn (operands[0], operands[1]);
>else
> {
> - rtx tmp = gen_reg_rtx (V16HImode);
> + rtx tmp = gen_reg_rtx (mode);
>   rtx blendw_idx, blendd_idx;
>
>   if (mask & 0xff)
> @@ -21352,13 +21349,12 @@ (define_expand "avx2_pblendph"
>  blendw_idx = GEN_INT (mask >> 8 & 0xff);
>  blendd_idx = GEN_INT (240);
> }
> - operands[1] = lowpart_subreg (V16HImode, operands[1], V16HFmode);
> - operands[2] = lowpart_subreg (V16HImode, operands[2], V16HFmode);
> - emit_insn (gen_avx2_pblendw (tmp, operands[1], operands[2], 
> blendw_idx));
> + emit_insn (gen_avx2_pblend (tmp, operands[1],
> +   operands[2], blendw_idx));
>
> - operands[0] = lowpart_subreg (V8SImode, operands[0], V16HFmode);
> - tmp = lowpart_subreg (V8SImode, tmp, V16HImode);
> - operands[1] = lowpart_

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-06 Thread Tom de Vries via Gcc-patches

On 1/6/22 10:29, Tom de Vries wrote:

At first glance, the above behaviour doesn't look like a too short timeout.


Using patch below, this passes for me, I'm currently doing a full build 
and test to confirm.


Looks like it has to do with:
...
For sm_6x and earlier architectures, atom operations on .shared state 
space do not guarantee atomicity with respect to normal store 
instructions to the same address. It is the programmer's responsibility 
to guarantee correctness of programs that use shared memory
atomic instructions, e.g., by inserting barriers between normal stores 
and atomic operations to a common address, or by using atom.exch to 
store to locations accessed by other atomic operations.

...

My current understanding is that this is a backend problem, and needs to 
be fixed by defining atomic_store patterns which take care of this 
peculiarity.


Thanks,
- Tom

diff --git a/libgomp/config/nvptx/allocator.c 
b/libgomp/config/nvptx/allocator.c

index 6bc2ea48043..4524122b3e7 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -122,7 +122,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t 
memspace, size_t size)


}

   /* Update the free chain root and release the lock.  */
-  __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, 
MEMMODEL_RELEASE);

+  __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+  root.raw, MEMMODEL_RELEASE);
   return result;
 }
   else-
@@ -221,7 +222,8 @@ nvptx_memspace_free (omp_memspace_handle_t memspace, 
void *addr, s

ize_t size)
root.raw = newfreechunk.raw;

   /* Update the free chain root and release the lock.  */
-  __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, 
MEMMODEL_RELEASE);

+  __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+  root.raw, MEMMODEL_RELEASE);
 }
   else
 free (addr);
@@ -331,7 +333,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t 
memspace, void *addr

,
   /* Else realloc in-place has failed and result remains NULL.  */

   /* Update the free chain root and release the lock.  */
-  __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, 
MEMMODEL_RELEASE);

+  __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+  root.raw, MEMMODEL_RELEASE);

   if (result == NULL)
{


Re: [PATCH v4 04/12] LoongArch Port: Machine Decsription files.

2022-01-06 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-12-24 at 17:28 +0800, chenglulu wrote:
> +(define_insn "*zero_extendsidi2_internal"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> +   (subreg:DI (match_operand:SI 1 "nonimmediate_operand" "r,ZC,W") 0))]
> +  "TARGET_64BIT"
> +  "@
> +   bstrpick.d\t%0,%1,31,0
> +   ldptr.w\t%0,%1\n\tlu32i.d\t%0,0
> +   ld.wu\t%0,%1"
> +  [(set_attr "move_type" "arith,load,load")
> +   (set_attr "mode" "DI")
> +   (set_attr "insn_count" "1,2,1")])

This pattern is generating wrong code, causing

FAIL: gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute

This failure is a real bug, the reduced testcase is attached.  In the
assembly:

# ...
bstrins.d   $r5,$r13,31,0
addi.d  $r12,$r22,-228
bstrpick.d  $r12,$r12,31,0
bstrins.d   $r5,$r12,63,32
addi.w  $r4,$r0,14  # 0xe
bl  testvaci

This obviously does not make any sense: it calculates the *address* of
g[0]'s imaginary part, truncates it to 32-bit, then pass it as the
*value* of the imaginary part to testvaci().

The problem is: before the reload pass, the compiler consider g[0] a
(virtual) register.  It only becomes MEM after the reload pass.  Adding
reload_completed as the condition of this insn seems able to fix the
issue.  However I'm not sure if other insns need the change too.

> +;; See the comment before the *and3 pattern why this is generated by
> +;; combine.

A minor issue: the comment before 'and3' does not exist.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
#include 
#include 

extern void testvaci (int n, ...);

int main() {
int i;
volatile _Complex int g[14];
for (i = 0; i < 14; i++)
g[i] = i + 3 + 3i;
testvaci (14, g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7], g[8], g[9], g[10], g[11], g[12], g[13]); 
return 0;
}

void testvaci (int n, ...) {
int i;
va_list ap;
va_start(ap, n);
for (i = 0; i < n; i++) {
_Complex int t = __builtin_va_arg(ap, _Complex int);
assert(t == i + 3 + 3i);
}
va_end(ap);
}


[PATCH] x86_64: Improve (interunit) moves from TImode to V1TImode.

2022-01-06 Thread Roger Sayle
 

This patch improves the code generated when moving a 128-bit value

in TImode, represented by two 64-bit registers, to V1TImode, which

is a single SSE register.

 

Currently, the simple move:

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));

uv1ti foo(__int128 x) { return (uv1ti)x; }

 

is always transferred via memory, as:

foo:movq%rdi, -24(%rsp)

movq%rsi, -16(%rsp)

movdqa  -24(%rsp), %xmm0

ret

 

with this patch, we now generate (with -msse2):

foo:movq%rdi, %xmm1

movq%rsi, %xmm2

punpcklqdq  %xmm2, %xmm1

movdqa  %xmm1, %xmm0

ret

 

and with -mavx2:

foo:vmovq   %rdi, %xmm1

vpinsrq $1, %rsi, %xmm1, %xmm0

ret

 

Even more dramatic is the improvement of zero extended transfers.

 

uv1ti bar(unsigned char c) { return (uv1ti)(__int128)c; }

 

Previously generated:

bar:movq$0, -16(%rsp)

movzbl  %dil, %eax

movq%rax, -24(%rsp)

vmovdqa -24(%rsp), %xmm0

ret

 

Now generates:

bar:movzbl  %dil, %edi

movq%rdi, %xmm0

ret

 

 

My first attempt at this functionality attempted to use a

simple define_split:

 

+;; Move TImode to V1TImode via V2DImode instead of memory.

+(define_split

+  [(set (match_operand:V1TI 0 "register_operand")

+(subreg:V1TI (match_operand:TI 1 "register_operand") 0))]

+  "TARGET_64BIT && TARGET_SSE2 && can_create_pseudo_p ()"

+  [(set (match_dup 2) (vec_concat:V2DI (match_dup 3) (match_dup 4)))

+   (set (match_dup 0) (subreg:V1TI (match_dup 2) 0))]

+{

+  operands[2] = gen_reg_rtx (V2DImode);

+  operands[3] = gen_lowpart (DImode, operands[1]);

+  operands[4] = gen_highpart (DImode, operands[1]);

+})

+

 

Unfortunately, this triggers very late during the compilation

preventing some of the simplification's we'd like (in combine).

For example the foo case above becomes:

 

foo:movq%rsi, -16(%rsp)

movq%rdi, %xmm0

movhps  -16(%rsp), %xmm0

 

transferring half directly, and the other half via memory.

And for the bar case above, GCC fails to appreciate that

movq/vmovq clears the high bits, resulting in:

 

bar:movzbl  %dil, %eax

xorl%edx, %edx

vmovq   %rax, %xmm1

vpinsrq $1, %rdx, %xmm1, %xmm0

ret

 

 

Hence the solution (i.e. this patch) is to add a special case

to ix86_expand_vector_move for TImode to V1TImode transfers.

 

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap

and make -k check with no new failures.  Ok for mainline?

 

 

2022-01-06  Roger Sayle  

 

gcc/ChangeLog

* config/i386/i386-expand.c (ix86_expand_vector_move): Add

special case for TImode to V1TImode moves, going via V2DImode.

 

gcc/testsuite/ChangeLog

* gcc.target/i386/sse2-v1ti-mov-1.c: New test case.

* gcc.target/i386/sse2-v1ti-zext.c: New test case.

 

 

Many thanks in advance,

Roger

--

 

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 7013c20..47c4870 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -615,6 +615,23 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   return;
 }
 
+  /* Special case TImode to V1TImode conversions, via V2DI.  */
+  if (mode == V1TImode
+  && SUBREG_P (op1)
+  && GET_MODE (SUBREG_REG (op1)) == TImode
+  && TARGET_64BIT && TARGET_SSE
+  && can_create_pseudo_p ())
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx lo = gen_reg_rtx (DImode);
+  rtx hi = gen_reg_rtx (DImode);
+  emit_move_insn (lo, gen_lowpart (DImode, SUBREG_REG (op1)));
+  emit_move_insn (hi, gen_highpart (DImode, SUBREG_REG (op1)));
+  emit_insn (gen_vec_concatv2di (tmp, lo, hi));
+  emit_move_insn (op0, gen_lowpart (V1TImode, tmp));
+  return;
+}
+
   /* If operand0 is a hard register, make operand1 a pseudo.  */
   if (can_create_pseudo_p ()
   && !ix86_hardreg_mov_ok (op0, op1))
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-1.c 
b/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-1.c
new file mode 100644
index 000..a1ef7b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+
+uv1ti foo(__int128 x)
+{
+  return (uv1ti)x;
+}
+
+/* { dg-final { scan-assembler-not "%\[er\]sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-zext.c 
b/gcc/testsuite/gcc.target/i386/sse2-v1ti-zext.c
new file mode 100644
index 000..4dfc655
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-zext.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+
+uv1ti extqi(unsigned char c) { return (uv1ti)(__int128)c; }
+uv1ti ext

[PATCH] libgcc: Fix a broken call/return address prediction

2022-01-06 Thread Zhiyao Ma via Gcc-patches
This patch fixed a broken call/return address prediction in
segmented stack implementation on x86_64 by leveraging the
red-zone under the stack pointer.

2022-01-06  Zhiyao Ma  

libgcc/ChangeLog:

* config/i386/morestack.S: Modified instructions.
---
libgcc/config/i386/morestack.S | 11 ---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/i386/morestack.S b/libgcc/config/i386/morestack.S
index f1cf32dec9f..61c91ce4a35 100644
--- a/libgcc/config/i386/morestack.S
+++ b/libgcc/config/i386/morestack.S
@@ -213,14 +213,19 @@ __morestack_non_split:
cmpl  $0x185d8d4c,(%rax)
je2f

-   # This breaks call/return prediction, as described above.
-   incq8(%rsp) # Increment 
the return address.
+  movq %rax,-8(%rsp)# Save the incremented 
return
+  
# address to the red-zone below
+  
# the stack pointer. It is
+  
# guaranteed not to be corrupted.

   popq  %rax  # Restore 
register.

   .cfi_adjust_cfa_offset -8# Adjust for popped 
register.

-   ret  # 
Return to caller.
+  callq   *-16(%rsp)  # Call into the 
caller's
+  
# function body.
+
+  ret  # 
Return to caller's prologue.

2:
popq  %rax  # Restore 
register.
--
2.25.1



Re: [PATCH] c++: CTAD within alias template [PR91911]

2022-01-06 Thread Patrick Palka via Gcc-patches
On Mon, 3 Jan 2022, Patrick Palka wrote:

> On Wed, 22 Dec 2021, Jason Merrill wrote:
> 
> > On 12/21/21 14:08, Patrick Palka wrote:
> > > On Tue, Dec 21, 2021 at 2:03 PM Patrick Palka  wrote:
> > > > 
> > > > On Wed, Jun 30, 2021 at 4:23 PM Jason Merrill  wrote:
> > > > > 
> > > > > On 6/30/21 4:18 PM, Patrick Palka wrote:
> > > > > > On Wed, Jun 30, 2021 at 3:51 PM Jason Merrill 
> > > > > > wrote:
> > > > > > > 
> > > > > > > On 6/30/21 11:58 AM, Patrick Palka wrote:
> > > > > > > > On Wed, 30 Jun 2021, Patrick Palka wrote:
> > > > > > > > 
> > > > > > > > > On Fri, 25 Jun 2021, Jason Merrill wrote:
> > > > > > > > > 
> > > > > > > > > > On 6/25/21 1:11 PM, Patrick Palka wrote:
> > > > > > > > > > > On Fri, 25 Jun 2021, Jason Merrill wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > On 6/24/21 4:45 PM, Patrick Palka wrote:
> > > > > > > > > > > > > In the first testcase below, during parsing of the 
> > > > > > > > > > > > > alias
> > > > > > > > > > > > > template
> > > > > > > > > > > > > ConstSpanType, transparency of alias template
> > > > > > > > > > > > > specializations means we
> > > > > > > > > > > > > replace SpanType with SpanType's substituted
> > > > > > > > > > > > > definition.  But this
> > > > > > > > > > > > > substitution lowers the level of the CTAD placeholder
> > > > > > > > > > > > > for span(T()) from
> > > > > > > > > > > > > 2 to 1, and so the later instantiantion of
> > > > > > > > > > > > > ConstSpanType
> > > > > > > > > > > > > erroneously substitutes this CTAD placeholder with the
> > > > > > > > > > > > > template argument
> > > > > > > > > > > > > at level 1 index 0, i.e. with int, before we get a
> > > > > > > > > > > > > chance to perform the
> > > > > > > > > > > > > CTAD.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > In light of this, it seems we should avoid level
> > > > > > > > > > > > > lowering when
> > > > > > > > > > > > > substituting through through the type-id of a 
> > > > > > > > > > > > > dependent
> > > > > > > > > > > > > alias template
> > > > > > > > > > > > > specialization.  To that end this patch makes
> > > > > > > > > > > > > lookup_template_class_1
> > > > > > > > > > > > > pass tf_partial to tsubst in this situation.
> > > > > > > > > > > > 
> > > > > > > > > > > > This makes sense, but what happens if SpanType is a 
> > > > > > > > > > > > member
> > > > > > > > > > > > template, so
> > > > > > > > > > > > that
> > > > > > > > > > > > the levels of it and ConstSpanType don't match?  Or the
> > > > > > > > > > > > other way around?
> > > > > > > > > > > 
> > > > > > > > > > > If SpanType is a member template of say the class
> > > > > > > > > > > template A (and
> > > > > > > > > > > thus its level is greater than ConstSpanType):
> > > > > > > > > > > 
> > > > > > > > > > >   template
> > > > > > > > > > >   struct A {
> > > > > > > > > > > template
> > > > > > > > > > > using SpanType = decltype(span(T()));
> > > > > > > > > > >   };
> > > > > > > > > > > 
> > > > > > > > > > >   template
> > > > > > > > > > >   using ConstSpanType = span > > > > > > > > > > A::SpanType::value_type>;
> > > > > > > > > > > 
> > > > > > > > > > >   using type = ConstSpanType;
> > > > > > > > > > > 
> > > > > > > > > > > then this case luckily works even without the patch 
> > > > > > > > > > > because
> > > > > > > > > > > instantiate_class_template now reuses the specialization
> > > > > > > > > > > A::SpanType
> > > > > > > > > > > that was formed earlier during instantiation of A,
> > > > > > > > > > > where we
> > > > > > > > > > > substitute only a single level of template arguments, so 
> > > > > > > > > > > the
> > > > > > > > > > > level of
> > > > > > > > > > > the CTAD placeholder inside the defining-type-id of this
> > > > > > > > > > > specialization
> > > > > > > > > > > dropped from 3 to 2, so still more than the level of
> > > > > > > > > > > ConstSpanType.
> > > > > > > > > > > 
> > > > > > > > > > > This luck is short-lived though, because if we replace
> > > > > > > > > > > A::SpanType with say A::SpanType 
> > > > > > > > > > > then
> > > > > > > > > > > the testcase
> > > > > > > > > > > breaks again (without the patch) because we no longer can
> > > > > > > > > > > reuse that
> > > > > > > > > > > specialization, so we instead form it on the spot by
> > > > > > > > > > > substituting two
> > > > > > > > > > > levels of template arguments (U=int,T=T) into the
> > > > > > > > > > > defining-type-id,
> > > > > > > > > > > causing the level of the placeholder to drop to 1.  I 
> > > > > > > > > > > think
> > > > > > > > > > > the patch
> > > > > > > > > > > causes its level to remain 3 (though I guess it should
> > > > > > > > > > > really be 2).
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > For the other way around, if ConstSpanType is a member
> > > > > > > > > > > template of
> > > > > > > > > > > say the class template B (and thus its level is greater
> > > > > > > > >

[PATCH] x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp

2022-01-06 Thread H.J. Lu via Gcc-patches
Indirect branch also includes indirect call instructions.  Rename
-harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its
intended behavior.

PR target/102952
* config/i386/i386-opts.h (harden_sls): Replace
harden_sls_indirect_branch with harden_sls_indirect_jmp.
* config/i386/i386.c (ix86_output_jmp_thunk_or_indirect):
Likewise.
(ix86_output_indirect_jmp): Likewise.
(ix86_output_call_insn): Likewise.
* config/i386/i386.opt: Replace indirect-branch with
indirect-jmp.  Replace harden_sls_indirect_branch with
harden_sls_indirect_jmp.
* doc/invoke.texi (-harden-sls=): Replace indirect-branch with
indirect-jmp.
---
 gcc/config/i386/i386-opts.h | 4 ++--
 gcc/config/i386/i386.c  | 6 +++---
 gcc/config/i386/i386.opt| 2 +-
 gcc/doc/invoke.texi | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index b31f9506c35..8f71e89fa9a 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -124,8 +124,8 @@ enum instrument_return {
 enum harden_sls {
   harden_sls_none = 0,
   harden_sls_return = 1 << 0,
-  harden_sls_indirect_branch = 1 << 1,
-  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
+  harden_sls_indirect_jmp = 1 << 1,
+  harden_sls_all = harden_sls_return | harden_sls_indirect_jmp
 };
 
 #endif
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1a964fe00f4..835ec2cafcb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16037,7 +16037,7 @@ ix86_output_jmp_thunk_or_indirect (const char 
*thunk_name, const int regno)
   fprintf (asm_out_file, "\tjmp\t");
   assemble_name (asm_out_file, thunk_name);
   putc ('\n', asm_out_file);
-  if ((ix86_harden_sls & harden_sls_indirect_branch))
+  if ((ix86_harden_sls & harden_sls_indirect_jmp))
fputs ("\tint3\n", asm_out_file);
 }
   else
@@ -16263,7 +16263,7 @@ ix86_output_indirect_jmp (rtx call_op)
 }
   else
 output_asm_insn ("%!jmp\t%A0", &call_op);
-  return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
+  return (ix86_harden_sls & harden_sls_indirect_jmp) ? "int3" : "";
 }
 
 /* Output return instrumentation for current function if needed.  */
@@ -16430,7 +16430,7 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
{
  output_asm_insn (xasm, &call_op);
  if (!direct_p
- && (ix86_harden_sls & harden_sls_indirect_branch))
+ && (ix86_harden_sls & harden_sls_indirect_jmp))
return "int3";
}
   return "";
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index e69955bd1a8..eb829d13d40 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1144,7 +1144,7 @@ EnumValue
 Enum(harden_sls) String(return) Value(harden_sls_return)
 
 EnumValue
-Enum(harden_sls) String(indirect-branch) Value(harden_sls_indirect_branch)
+Enum(harden_sls) String(indirect-jmp) Value(harden_sls_indirect_jmp)
 
 EnumValue
 Enum(harden_sls) String(all) Value(harden_sls_all)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 04820656ffa..6b84228f142 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -32507,8 +32507,8 @@ Force indirect call and jump via register.
 @opindex mharden-sls
 Generate code to mitigate against straight line speculation (SLS) with
 @var{choice}.  The default is @samp{none} which disables all SLS
-hardening.  @samp{return} enables SLS hardening for function return.
-@samp{indirect-branch} enables SLS hardening for indirect branch.
+hardening.  @samp{return} enables SLS hardening for function returns.
+@samp{indirect-jmp} enables SLS hardening for indirect jumps.
 @samp{all} enables all SLS hardening.
 
 @item -mindirect-branch-cs-prefix
-- 
2.33.1



[PATCH] x86: Generate INT3 for __builtin_eh_return

2022-01-06 Thread H.J. Lu via Gcc-patches
Generate INT3 after indirect jmp in exception return for -fcf-protection
with -mharden-sls=indirect-jmp.

gcc/

PR target/103925
* config/i386/i386.c (ix86_output_indirect_function_return):
Generate INT3 after indirect jmp for -mharden-sls=indirect-jmp.

gcc/testsuite/

PR target/103925
* gcc.target/i386/harden-sls-6.c: New test.
---
 gcc/config/i386/i386.c   |  9 ++---
 gcc/testsuite/gcc.target/i386/harden-sls-6.c | 18 ++
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-6.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 835ec2cafcb..35fb5adf2ef 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16367,11 +16367,14 @@ ix86_output_indirect_function_return (rtx ret_op)
}
   else
output_indirect_thunk (regno);
-
-  return "";
 }
   else
-return "%!jmp\t%A0";
+{
+  output_asm_insn ("%!jmp\t%A0", &ret_op);
+  if (ix86_harden_sls & harden_sls_indirect_jmp)
+   fputs ("\tint3\n", asm_out_file);
+}
+  return "";
 }
 
 /* Output the assembly for a call instruction.  */
diff --git a/gcc/testsuite/gcc.target/i386/harden-sls-6.c 
b/gcc/testsuite/gcc.target/i386/harden-sls-6.c
new file mode 100644
index 000..9068eb64008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/harden-sls-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -fcf-protection -mharden-sls=indirect-jmp" } */
+
+struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context;
+
+void offset (int);
+
+struct _Unwind_Context {
+  void *reg[7];
+} _Unwind_Resume_or_Rethrow() {
+  struct _Unwind_Context cur_contextcur_context =
+  _Unwind_Resume_or_Rethrow_this_context;
+  offset(0);
+  __builtin_eh_return ((long) offset, 0);
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]+\\*%rcx" } } */
+/* { dg-final { scan-assembler-times "int3" 1 } } */
-- 
2.33.1



Re: [PATCH 1b/6] Add __attribute__((untrusted))

2022-01-06 Thread Martin Sebor via Gcc-patches

On 1/6/22 8:10 AM, David Malcolm wrote:

On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:

On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:

This patch adds a new:

    __attribute__((untrusted))

for use by the C front-end, intended for use by the Linux kernel for
use with "__user", but which could be used by other operating system
kernels, and potentialy by other projects.


It looks like untrusted is a type attribute (rather than one
that applies to variables and/or function return values or
writeable by-reference arguments).  I find that quite surprising.


FWIW I initially tried implementing it on pointer types, but doing it
on the underlying type was much cleaner.


   I'm used to thinking of trusted vs tainted as dynamic properties
of data so I'm having trouble deciding what to think about
the attribute applying to types.  Can you explain why it's
useful on types?


A type system *is* a way of detecting problems involving dynamic
properties of data.  Ultimately all we have at runtime is a collection
of bits; the toolchain has the concept of types as a way to allow us to
reason about properies of those bits without requiring a full cross-TU
analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned
integer), and to document these properties clearly to human readers of
the code.


I understand that relying on the type system is a way to do it.
It just doesn't seem like a very good way in a loosely typed
language like C (or C++).



I see this as working like a qualifier (rather like "const" and
"volatile"), in that an
   untrusted char *
when dereferenced gives you an
   untrusted char


Dereferencing a const char* yields a const char lvalue that
implicitly converts to an unqualified value of the referenced
object.  The qualifier is lost in the conversion, so modeling
taint/trust this way will also lose the property in the same
contexts.  It sounds to me like the concept you're modeling
might be more akin to a type specifier (maybe like _Atomic,
although that still converts to the underlying type).



The intent is to have a way of treating the values as "actively
hostile", so that code analyzers can assume the worst possible values
for such types (or more glibly, that we're dealing with data from Satan
rather than from Murphy).

Such types are also relevant to infoleaks: writing sensitive
information to an untrusted value can be detected relatively easily
with this approach, by checking the type of the value - the types
express the trust boundary

Doing this with qualifiers allows us to use the C type system to detect
these kinds of issues without having to add a full cross-TU
interprocedural analysis, and documents it to human readers of the
code.   Compare with const-correctness; we can have an analogous
"trust-correctness".


The problem with const-correctness in C is that it's so easily
lost (like with strchr, or in the lvalue-rvalue conversion).
This is also why I'm skeptical of the type-based approach here.





I'd expect the taint property of a type to be quickly lost as
an object of the type is passed through existing APIs (e.g.,
a char array manipulated by string functions like strchr).


FWIW you can't directly pass an attacker-controlled buffer to strchr:
strchr requires there to be a 0-terminator to the array; if the array's
content is untrusted then the attacker might not have 0-terminated it.


strchr is just an example of the many functions that in my mind
make the type-based approach less than ideal.  If the untrusted
string was known to be nul-teminated, strchr still couldn't be
used without losing the property.  Ditto for memchr.  It seems
that all sanitization would either have to be written from
scratch, without relying on existing utility functions, or by
providing wrappers that called the common utility functions
after removing the qualifier from the tainted data even before
the santization was complete.  That would obviously be error-
prone, but it's something that would be made much more robust
by tracking the taint independently of the data type.

Martin



As implemented, the patch doesn't complain about this, though maybe it
should.

The main point here is to support the existing __user annotation used
by the Linux kernel, in particular, copy_from_user and copy_to_user.



(I usually look at tests to help me understand the design of
a change but I couldn't find an answer to my question in those
in the patch.)


The patch kit was rather unclear on this, due to the use of two
different approaches (custom address spaces vs this untrusted
attribute).  Sorry about this.

Patches 4a and 4b in the kit add test-uaccess.h (to
gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
that use "test-uaccess.h" in patch 3:
  [PATCH 3/6] analyzer: implement infoleak detection
 https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
and in patch 5:
  [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
https://gcc.gn

[PATCH] i386: Improve HImode interunit moves

2022-01-06 Thread Uros Bizjak via Gcc-patches
Currently, the compiler moves HImode values between GPR and XMM registers with:

%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}
%vpextrw\t{$0, %1, %k0|%k0, %1, 0}

but it could use slightly faster and shorter:

%vmovd\t{%k1, %0|%0, %k1}
%vmovd\t{%1, %k0|%k0, %1}

2022-01-06  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386.c (ix86_output_ssemov) :
Add %q modifier for operands in general registers.
: Add %q modifier for operands in general registers.
* config/i386/i386.md (*movhi_internal): Change type attribute of
xmm-gpr interunit alternatives 9,10 to ssemov and mode attribute
to SImode for non-avx512fp16 targets.
(*movhf_internal): Ditto for xmm-gpr interunit alternatives 6,8.
* config/i386/mmx.md (*movv2qi_internal):
Ditto for xmm-gpr interunit alternatives 8,9.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102811-2.c (dg-final):
Update scan-assembler-times directives.
* gcc.target/i386/sse2-float16-2.c (dg-final):
Update scan-assembler directives.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1a964fe00f4..aeb7db5a5e3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5535,15 +5535,30 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
 
 case MODE_DI:
   /* Handle broken assemblers that require movd instead of movq. */
-  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
- && (GENERAL_REG_P (operands[0])
- || GENERAL_REG_P (operands[1])))
-   return "%vmovd\t{%1, %0|%0, %1}";
+  if (GENERAL_REG_P (operands[0]))
+   {
+ if (HAVE_AS_IX86_INTERUNIT_MOVQ)
+   return "%vmovq\t{%1, %q0|%q0, %1}";
+ else
+   return "%vmovd\t{%1, %q0|%q0, %1}";
+   }
+  else if (GENERAL_REG_P (operands[1]))
+   {
+ if (HAVE_AS_IX86_INTERUNIT_MOVQ)
+   return "%vmovq\t{%q1, %0|%0, %q1}";
+ else
+   return "%vmovd\t{%q1, %0|%0, %q1}";
+   }
   else
return "%vmovq\t{%1, %0|%0, %1}";
 
 case MODE_SI:
-  return "%vmovd\t{%1, %0|%0, %1}";
+  if (GENERAL_REG_P (operands[0]))
+   return "%vmovd\t{%1, %k0|%k0, %1}";
+  else if (GENERAL_REG_P (operands[1]))
+   return "%vmovd\t{%k1, %0|%0, %k1}";
+  else
+   return "%vmovd\t{%1, %0|%0, %1}";
 
 case MODE_HI:
   if (GENERAL_REG_P (operands[0]))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9b424a3935b..376df1d51d1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2580,13 +2580,9 @@
return standard_sse_constant_opcode (insn, operands);
 
   if (SSE_REG_P (operands[0]))
-   return MEM_P (operands[1])
- ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
- : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
+   return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}";
   else
-   return MEM_P (operands[0])
- ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
- : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}";
+   return "%vpextrw\t{$0, %1, %0|%0, %1, 0}";
 
 case TYPE_MSKLOG:
   if (operands[1] == const0_rtx)
@@ -2614,13 +2610,13 @@
  (const_string "mskmov")
(eq_attr "alternative" "8")
  (const_string "msklog")
-   (eq_attr "alternative" "9,10,13,14")
+   (eq_attr "alternative" "13,14")
  (if_then_else (match_test "TARGET_AVX512FP16")
(const_string "ssemov")
(const_string "sselog1"))
(eq_attr "alternative" "11")
  (const_string "sselog1")
-   (eq_attr "alternative" "12")
+   (eq_attr "alternative" "9,10,12")
  (const_string "ssemov")
(match_test "optimize_function_for_size_p (cfun)")
  (const_string "imov")
@@ -2644,7 +2640,11 @@
  ]
  (const_string "orig")))
(set (attr "mode")
- (cond [(eq_attr "alternative" "9,10,13,14")
+ (cond [(eq_attr "alternative" "9,10")
+ (if_then_else (match_test "TARGET_AVX512FP16")
+   (const_string "HI")
+   (const_string "SI"))
+   (eq_attr "alternative" "13,14")
  (if_then_else (match_test "TARGET_AVX512FP16")
(const_string "HI")
(const_string "TI"))
@@ -3876,13 +3876,9 @@
return standard_sse_constant_opcode (insn, operands);
 
   if (SSE_REG_P (operands[0]))
-   return MEM_P (operands[1])
-  ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
-  : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
+   return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}";
   else
-   return MEM_P (operands[0])
-  ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
-  : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}";
+   return "%vpextrw\t{$0, %1, %0|%0, %1, 0}";
 
 default:
   if (get_attr_mode (insn) == MODE_SI)
@@ -3901,9 +3897,9 @@
(set (attr "type")

Re: [PATCH] x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp

2022-01-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 7:57 PM H.J. Lu  wrote:
>
> Indirect branch also includes indirect call instructions.  Rename
> -harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its
> intended behavior.
>
> PR target/102952
> * config/i386/i386-opts.h (harden_sls): Replace
> harden_sls_indirect_branch with harden_sls_indirect_jmp.
> * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect):
> Likewise.
> (ix86_output_indirect_jmp): Likewise.
> (ix86_output_call_insn): Likewise.
> * config/i386/i386.opt: Replace indirect-branch with
> indirect-jmp.  Replace harden_sls_indirect_branch with
> harden_sls_indirect_jmp.
> * doc/invoke.texi (-harden-sls=): Replace indirect-branch with
> indirect-jmp.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-opts.h | 4 ++--
>  gcc/config/i386/i386.c  | 6 +++---
>  gcc/config/i386/i386.opt| 2 +-
>  gcc/doc/invoke.texi | 4 ++--
>  4 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index b31f9506c35..8f71e89fa9a 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -124,8 +124,8 @@ enum instrument_return {
>  enum harden_sls {
>harden_sls_none = 0,
>harden_sls_return = 1 << 0,
> -  harden_sls_indirect_branch = 1 << 1,
> -  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> +  harden_sls_indirect_jmp = 1 << 1,
> +  harden_sls_all = harden_sls_return | harden_sls_indirect_jmp
>  };
>
>  #endif
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1a964fe00f4..835ec2cafcb 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -16037,7 +16037,7 @@ ix86_output_jmp_thunk_or_indirect (const char 
> *thunk_name, const int regno)
>fprintf (asm_out_file, "\tjmp\t");
>assemble_name (asm_out_file, thunk_name);
>putc ('\n', asm_out_file);
> -  if ((ix86_harden_sls & harden_sls_indirect_branch))
> +  if ((ix86_harden_sls & harden_sls_indirect_jmp))
> fputs ("\tint3\n", asm_out_file);
>  }
>else
> @@ -16263,7 +16263,7 @@ ix86_output_indirect_jmp (rtx call_op)
>  }
>else
>  output_asm_insn ("%!jmp\t%A0", &call_op);
> -  return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
> +  return (ix86_harden_sls & harden_sls_indirect_jmp) ? "int3" : "";
>  }
>
>  /* Output return instrumentation for current function if needed.  */
> @@ -16430,7 +16430,7 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
> {
>   output_asm_insn (xasm, &call_op);
>   if (!direct_p
> - && (ix86_harden_sls & harden_sls_indirect_branch))
> + && (ix86_harden_sls & harden_sls_indirect_jmp))
> return "int3";
> }
>return "";
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index e69955bd1a8..eb829d13d40 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1144,7 +1144,7 @@ EnumValue
>  Enum(harden_sls) String(return) Value(harden_sls_return)
>
>  EnumValue
> -Enum(harden_sls) String(indirect-branch) Value(harden_sls_indirect_branch)
> +Enum(harden_sls) String(indirect-jmp) Value(harden_sls_indirect_jmp)
>
>  EnumValue
>  Enum(harden_sls) String(all) Value(harden_sls_all)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 04820656ffa..6b84228f142 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -32507,8 +32507,8 @@ Force indirect call and jump via register.
>  @opindex mharden-sls
>  Generate code to mitigate against straight line speculation (SLS) with
>  @var{choice}.  The default is @samp{none} which disables all SLS
> -hardening.  @samp{return} enables SLS hardening for function return.
> -@samp{indirect-branch} enables SLS hardening for indirect branch.
> +hardening.  @samp{return} enables SLS hardening for function returns.
> +@samp{indirect-jmp} enables SLS hardening for indirect jumps.
>  @samp{all} enables all SLS hardening.
>
>  @item -mindirect-branch-cs-prefix
> --
> 2.33.1
>


Re: [PATCH] x86: Generate INT3 for __builtin_eh_return

2022-01-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jan 6, 2022 at 7:58 PM H.J. Lu  wrote:
>
> Generate INT3 after indirect jmp in exception return for -fcf-protection
> with -mharden-sls=indirect-jmp.
>
> gcc/
>
> PR target/103925
> * config/i386/i386.c (ix86_output_indirect_function_return):
> Generate INT3 after indirect jmp for -mharden-sls=indirect-jmp.
>
> gcc/testsuite/
>
> PR target/103925
> * gcc.target/i386/harden-sls-6.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c   |  9 ++---
>  gcc/testsuite/gcc.target/i386/harden-sls-6.c | 18 ++
>  2 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-6.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 835ec2cafcb..35fb5adf2ef 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -16367,11 +16367,14 @@ ix86_output_indirect_function_return (rtx ret_op)
> }
>else
> output_indirect_thunk (regno);
> -
> -  return "";
>  }
>else
> -return "%!jmp\t%A0";
> +{
> +  output_asm_insn ("%!jmp\t%A0", &ret_op);
> +  if (ix86_harden_sls & harden_sls_indirect_jmp)
> +   fputs ("\tint3\n", asm_out_file);
> +}
> +  return "";
>  }
>
>  /* Output the assembly for a call instruction.  */
> diff --git a/gcc/testsuite/gcc.target/i386/harden-sls-6.c 
> b/gcc/testsuite/gcc.target/i386/harden-sls-6.c
> new file mode 100644
> index 000..9068eb64008
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/harden-sls-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -fcf-protection -mharden-sls=indirect-jmp" } */
> +
> +struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context;
> +
> +void offset (int);
> +
> +struct _Unwind_Context {
> +  void *reg[7];
> +} _Unwind_Resume_or_Rethrow() {
> +  struct _Unwind_Context cur_contextcur_context =
> +  _Unwind_Resume_or_Rethrow_this_context;
> +  offset(0);
> +  __builtin_eh_return ((long) offset, 0);
> +}
> +
> +/* { dg-final { scan-assembler "jmp\[ \t\]+\\*%rcx" } } */
> +/* { dg-final { scan-assembler-times "int3" 1 } } */
> --
> 2.33.1
>


Go patch committed: Permit unnamed types when ignoring struct tags

2022-01-06 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend permits converting unnamed types when
ignoring struct tags.  I think the code was written this way because
before conversion could ignore struct tags this case could only arise
with named types.  Now that conversions permit struct tags to change,
this can occur with unnamed types as well.  The test case is
https://golang.org/cl/375796.  This fixes
https://golang.org/issue/50439.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
89ce08665e128d2f8e9ae148044134d9e94a9509
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index a18f3a37349..9cc6a1c63c6 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-9732b077b9235d0f35d0fb0abfe406b94d49
+799e9807c36fc661b14dfff136369556f09a5ebf
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index 57c02a97ef0..1c67ea099eb 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -791,8 +791,7 @@ Type::are_convertible(const Type* lhs, const Type* rhs, 
std::string* reason)
 
   // The types are convertible if they have identical underlying
   // types, ignoring struct field tags.
-  if ((lhs->named_type() != NULL || rhs->named_type() != NULL)
-  && Type::are_identical(lhs->base(), rhs->base(), 0, reason))
+  if (Type::are_identical(lhs->base(), rhs->base(), 0, reason))
 return true;
 
   // The types are convertible if they are both unnamed pointer types


Re: [PATCH] Fortran: Fix ICE in argument_rank_mismatch [PR103287]

2022-01-06 Thread Harald Anlauf via Gcc-patches

Am 05.01.22 um 22:34 schrieb Sandra Loosemore:

One of my previous TS29113-related patches from last fall introduced an
ICE due to a bogus assertion.  This is the fix Steve suggested in the
issue, bundled with the test cases, regression-tested, etc.  OK to check
in?


OK.


-Sandra


Thanks,
Harald


Re: [PATCH] PR fortran/103777 - ICE in gfc_simplify_maskl, at fortran/simplify.c:4918

2022-01-06 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 21.12.21 um 13:38 schrieb Mikael Morin:

Le 20/12/2021 à 23:05, Harald Anlauf via Fortran a écrit :

Dear all,

we need to check the arguments of the elemental MASKL and MASKR
intrinsics also before simplifying.

Testcase by Gerhard.  The fix is almost obvious, but I'm happy to
get feedback in case there is something I overlooked.  (There is
already a check on scalar arguments to MASKL/MASKR, which however
misses the case of array arguments.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Your patch looks reasonable and safe.
However, I find it surprising that it’s actually needed, as gfc_check
mask is already the check function associated to maskl and maskr in the
definition of the symbols.  The simplification function should be called
only when the associated check function has returned successfully, so it
shouldn’t be necessary to call it again at simplification time.
Looking at the backtrace, it is the do_simplify call at the beginning of
  gfc_intrinsic_func_interface that seems dubious to me, as it comes
before all the check further down in the function and it looks redundant
with the other simplification code after the checks.

So I’m inclined to test whether by any chance removing that call works,
and if it doesn’t, let’s go with this patch.


Did you find the time to try your version?


Mikael



Thanks,
Harald


Re: [power-ieee128] RFH: LTO broken

2022-01-06 Thread Thomas Koenig via Gcc-patches



On 06.01.22 06:00, Michael Meissner via Fortran wrote:

I pushed the patch to the branch.


Test results are looking quite good right now.

What is still missing is the conversion for unformatted I/O, both
ways.  I'll start doing some stuff on it. Just one question:
What are functions that I can use to convert from IBM long double
to IEEE and long double and vice versa?  It was in an e-mail somewhere,
but I cannot find it at the moment.

Best regards

Thomas


Re: [power-ieee128] RFH: LTO broken

2022-01-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 06, 2022 at 09:01:54PM +0100, Thomas Koenig wrote:
> On 06.01.22 06:00, Michael Meissner via Fortran wrote:
> > I pushed the patch to the branch.
> 
> Test results are looking quite good right now.
> 
> What is still missing is the conversion for unformatted I/O, both
> ways.  I'll start doing some stuff on it. Just one question:
> What are functions that I can use to convert from IBM long double
> to IEEE and long double and vice versa?  It was in an e-mail somewhere,
> but I cannot find it at the moment.

Under the hood __extendkftf2 and __trunctfkf2, as can be seen on:
__ibm128 foo (__float128 x) { return x; }
__float128 bar (__ibm128 x) { return x; }

But, I really don't think libgfortran should call those by hand, just
use C casts, (GFC_REAL_17) var_with_GFC_REAL_16_type or
(GFC_REAL_16) var_with_GFC_REAL_17_type.

Jakub



[PATCH] Fortran: Fix handling of optional argument to SIZE intrinsic [PR103898]

2022-01-06 Thread Sandra Loosemore
This patch fixes an ICE introduced with the recent-ish rewrite to inline 
the SIZE intrinsic, using a helper function to do the bulk of the work 
in producing the expansion.  It turns out to be a simple think-o type 
mistake in the wrapper around the helper rather than anything deeply 
wrong with the logic.


OK to check in?

-Sandra
commit 0e5b74440572f988dd96a6e9c33c11b59323d6cf
Author: Sandra Loosemore 
Date:   Thu Jan 6 11:23:18 2022 -0800

Fortran: Fix handling of optional argument to SIZE intrinsic [PR103898]

This patch fixes a think-o in the code that triggered an ICE
in the test case.

2021-01-06  Sandra Loosemore  

	gcc/fortran/
	* trans-intrinsic.c (gfc_conv_intrinsic_size): Make size_var
	actually be a variable and fix surrounding code.

	gcc/testsuite/
	* gfortran.dg/pr103898.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 41252c9..aae34b0 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -8006,10 +8006,14 @@ gfc_conv_intrinsic_size (gfc_se * se, gfc_expr * expr)
 	  cond = gfc_evaluate_now (cond, &se->pre);
 	  /* 'block2' contains the arg2 absent case, 'block' the arg2 present
 	  case; size_var can be used in both blocks. */
-	  tree size_var = gfc_tree_array_size (&block2, arg1, e, NULL_TREE);
+	  tree size_var = gfc_create_var (TREE_TYPE (size), "size");
 	  tmp = fold_build2_loc (input_location, MODIFY_EXPR,
  TREE_TYPE (size_var), size_var, size);
 	  gfc_add_expr_to_block (&block, tmp);
+	  size = gfc_tree_array_size (&block2, arg1, e, NULL_TREE);
+	  tmp = fold_build2_loc (input_location, MODIFY_EXPR,
+ TREE_TYPE (size_var), size_var, size);
+	  gfc_add_expr_to_block (&block2, tmp);
 	  tmp = build3_v (COND_EXPR, cond, gfc_finish_block (&block),
 			  gfc_finish_block (&block2));
 	  gfc_add_expr_to_block (&se->pre, tmp);
diff --git a/gcc/testsuite/gfortran.dg/pr103898.f90 b/gcc/testsuite/gfortran.dg/pr103898.f90
new file mode 100644
index 000..6b4bb30
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr103898.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+
+! This test used to ICE during gimplification (PR103898).
+
+Module g
+contains
+  function mysize(array, dim)
+integer :: mysize
+integer, dimension(:), intent(in)   :: array
+integer, optional, intent(in)   :: dim
+if (present(dim)) then
+   mysize = size(array, dim=dim)
+endif
+  end function mysize
+end module


Re: [PATCH] Fortran: Fix handling of optional argument to SIZE intrinsic [PR103898]

2022-01-06 Thread Harald Anlauf via Gcc-patches

Hi Sandra,

Am 06.01.22 um 21:11 schrieb Sandra Loosemore:

This patch fixes an ICE introduced with the recent-ish rewrite to inline
the SIZE intrinsic, using a helper function to do the bulk of the work
in producing the expansion.  It turns out to be a simple think-o type
mistake in the wrapper around the helper rather than anything deeply
wrong with the logic.

OK to check in?


LGTM.


-Sandra


Thanks for addressing this swiftly!


Re: [PATCH] PR fortran/103777 - ICE in gfc_simplify_maskl, at fortran/simplify.c:4918

2022-01-06 Thread Mikael Morin

Le 06/01/2022 à 20:50, Harald Anlauf a écrit :


Did you find the time to try your version?


Not yet. But I have not (yet) forgotten about this.



[wwwdocs] Update C++ DR table

2022-01-06 Thread Marek Polacek via Gcc-patches
A new batch of DRs.

Pushed.

commit ecbd709b2ecb1e3c35fa4ede09bcb7ceb443def6
Author: Marek Polacek 
Date:   Thu Jan 6 17:06:24 2022 -0500

Update C++ DR table

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index e8002b27..cc2afc03 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -12113,7 +12113,7 @@
 
 
   https://wg21.link/cwg1726";>1726
-  drafting
+  ready
   Declarator operators and conversion function
   -
   https://gcc.gnu.org/PR79318";>PR79318
@@ -17406,7 +17406,7 @@
 
 
   https://wg21.link/cwg2482";>2482
-  review
+  accepted
   bit_cast and indeterminate values
   ?
   
@@ -17481,16 +17481,16 @@
   -
   
 
-
+
   https://wg21.link/cwg2493";>2493
-  open
+  dup
   auto as a conversion-type-id
   -
-  
+  Dup of issue 1670
 
 
   https://wg21.link/cwg2494";>2494
-  drafting
+  ready
   Multiple definitions of non-odr-used entities
   -
   
@@ -17567,16 +17567,100 @@
 
 
   https://wg21.link/cwg2505";>2505
-  open
+  drafting
   Nested unnamed namespace of inline unnamed namespace
   -
   
 
+
+  https://wg21.link/cwg2506";>2506
+  ready
+  Structured bindings and array cv-qualifiers
+  -
+  
+
+
+  https://wg21.link/cwg2507";>2507
+  review
+  Default arguments for operator[]
+  ?
+  
+
+
+  https://wg21.link/cwg2508";>2508
+  review
+  Restrictions on uses of template parameter names
+  ?
+  
+
+
+  https://wg21.link/cwg2509";>2509
+  ready
+  decl-specifier-seq in lambda-specifiers
+  -
+  
+
+
+  https://wg21.link/cwg2510";>2510
+  open
+  noexcept-specifier of friend function vs class 
completeness
+  -
+  
+
+
+  https://wg21.link/cwg2511";>2511
+  ready
+  cv-qualified bit-fields
+  -
+  
+
+
+  https://wg21.link/cwg2512";>2512
+  NAD
+  typeid and incomplete class types
+  N/A
+  
+
+
+  https://wg21.link/cwg2513";>2513
+  open
+  Ambiguity with requires-clause and 
operator-function-id
+  -
+  
+
+
+  https://wg21.link/cwg2514";>2514
+  open
+  Modifying const subobjects
+  -
+  
+
+
+  https://wg21.link/cwg2515";>2515
+  open
+  Result of a function call
+  -
+  
+
+
+  https://wg21.link/cwg2516";>2516
+  open
+  Locus of enum-specifier or 
opaque-enum-declaration
+  -
+  
+
+
+  https://wg21.link/cwg2517";>2517
+  open
+  Useless restriction on use of parameter in 
constraint-expression
+  -
+  
+
   
 
   This page is currently maintained by mailto:pola...@redhat.com";>pola...@redhat.com.
   Last update:
-Tue Nov 23 04:16:08 PM EST 2021
+Thu Jan  6 04:42:28 PM EST 2022
   
 
 



[PATCH take #3] Recognize MULT_HIGHPART_EXPR in tree-ssa-math-opts pass.

2022-01-06 Thread Roger Sayle

This is the third iteration of a patch to perceive MULT_HIGHPART_EXPR
in the middle-end.  As they say "the third time's a charm".  The first
version implemented this in match.pd, which was considered too early.
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551316.html
The second version attempted to do this during RTL expansion, and was
considered to be too late in the middle-end.
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576922.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576923.html

This latest version incorporates Richard Biener's feedback/suggestion
to perceive MULT_HIGHPART_EXPR in one of the "instruction selection
passes", specifically tree-ssa-math-opts, where the recognition of
highpart multiplications takes place in the same pass as widening
multiplications.

With each rewrite, the patch is also getting more aggressive in the
set of widening multiplications that it recognizes as highpart multiplies.
Currently any widening multiplication followed by a right shift (either
signed or unsigned) by a bit count sufficient to eliminate the lowpart
is recognized.  The result of this shift doesn't need to be truncated.
As previously, this patch confirms the target provides a suitable
optab before introducing the MULT_HIGHPART_EXPR.  This is the reason
the testcase is restricted to x86_64, as this pass doesn't do anything
on some platforms, but x86_64 should be sufficient to confirm that the
pass is working/continues to work.

This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
and make -k check (both with and without --target_board='unix{-m32}')
with no new regressions.  Ok for mainline?


2022-01-06  Roger Sayle  

gcc/ChangeLog
* tree-ssa-math-opts.c (struct widen_mul_stats): Add a
highpart_mults_inserted field.
(convert_mult_to_highpart): New function to convert right shift
of a widening multiply into a MULT_HIGHPART_EXPR.
(math_opts_dom_walker::after_dom_children) [RSHIFT_EXPR]:
Call new convert_mult_to_highpart function.
(pass_optimize_widening_mul::execute): Add a statistics counter
for tracking "highpart multiplications inserted" events.

gcc/testsuite/ChangeLog
* gcc.target/i386/mult-highpart.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 6131824..2014139 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -207,6 +207,9 @@ static struct
 
   /* Number of divmod calls inserted.  */
   int divmod_calls_inserted;
+
+  /* Number of highpart multiplication ops inserted.  */
+  int highpart_mults_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -4548,9 +4551,129 @@ convert_to_divmod (gassign *stmt)
   return true; 
 }
 
+/* Process a single gimple statement STMT, which has a RSHIFT_EXPR as
+   its rhs, and try to convert it into a MULT_HIGHPART_EXPR.  The return
+   value is true iff we converted the statement.  */
+
+static bool
+convert_mult_to_highpart (gimple *stmt, gimple_stmt_iterator *gsi)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  tree stype = TREE_TYPE (lhs);
+  tree sarg0 = gimple_assign_rhs1 (stmt);
+  tree sarg1 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (stype) != INTEGER_TYPE
+  || TREE_CODE (sarg1) != INTEGER_CST
+  || TREE_CODE (sarg0) != SSA_NAME
+  || !tree_fits_uhwi_p (sarg1)
+  || !has_single_use (sarg0))
+return false;
+
+  gimple *def = SSA_NAME_DEF_STMT (sarg0);
+  if (!is_gimple_assign (def))
+return false;
+
+  enum tree_code mcode = gimple_assign_rhs_code (def);
+  if (mcode == NOP_EXPR)
+{
+  tree tmp = gimple_assign_rhs1 (def);
+  if (TREE_CODE (tmp) != SSA_NAME || !has_single_use (tmp))
+   return false;
+  def = SSA_NAME_DEF_STMT (tmp);
+  if (!is_gimple_assign (def))
+   return false;
+  mcode = gimple_assign_rhs_code (def);
+}
+
+  if (mcode != WIDEN_MULT_EXPR
+  || gimple_bb (def) != gimple_bb (stmt))
+return false;
+  tree mtype = TREE_TYPE (gimple_assign_lhs (def));
+  if (TREE_CODE (mtype) != INTEGER_TYPE
+  || TYPE_PRECISION (mtype) != TYPE_PRECISION (stype))
+return false;
+
+  tree mop1 = gimple_assign_rhs1 (def);
+  tree mop2 = gimple_assign_rhs2 (def);
+  tree optype = TREE_TYPE (mop1);
+  bool unsignedp = TYPE_UNSIGNED (optype);
+  unsigned int prec = TYPE_PRECISION (optype);
+
+  if (optype != TREE_TYPE (mop2)
+  || unsignedp != TYPE_UNSIGNED (mtype)
+  || TYPE_PRECISION (mtype) != 2 * prec)
+return false;
+
+  unsigned HOST_WIDE_INT bits = tree_to_uhwi (sarg1);
+  if (bits < prec || bits >= 2 * prec)
+return false;
+
+  machine_mode mode = TYPE_MODE (optype);
+  optab tab = unsignedp ? umul_highpart_optab : smul_highpart_optab;
+  if (optab_handler (tab, mode) == CODE_FOR_nothing)
+return false;
+
+  location_t loc = gimple_location (stmt);
+  tree highpart1 = build_and_insert_binop (gsi, 

[PATCH] rs6000: More factoring of overload processing

2022-01-06 Thread Bill Schmidt via Gcc-patches
Hi!

This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill


2022-01-06  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.c | 304 +++
 1 file changed, 127 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 24a081ced37..189a70d89bf 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree 
fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec 
*arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 case E_DFmode:
   {
/* For doubles use the xvmuldp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 /* Other types are errors.  */
 default:
@@ -1002,37 +990,25 @@ resolve_vec_mul (resolution *res, vec 
*argli

[committed] analyzer: make use of may_be_aliased in alias detection [PR103546]

2022-01-06 Thread David Malcolm via Gcc-patches
Whilst debugging PR analyzer/103546 (false +ve in flex-generated lexers)
I noticed that the analyzer was considering that writes through symbolic
pointers could be treated as clobbering static globals such as:

   static YY_BUFFER_STATE * yy_buffer_stack = NULL;

even for such variables that never have their address taken.

This patch fixes this issue at least, so that the analyzer can preserve
knowledge of such globals on code paths with writes through symbolic
pointers.

It does not fix the false +ve in the lexer code.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-6323-gd564a83d14252d7db01381f71900b7a68357803b.

gcc/analyzer/ChangeLog:
PR analyzer/103546
* store.cc (store::eval_alias_1): Refactor handling of decl
regions, adding a test for may_be_aliased, rejecting those for
which it returns false.

gcc/testsuite/ChangeLog:
PR analyzer/103546
* gcc.dg/analyzer/aliasing-3.c: New test.
---
 gcc/analyzer/store.cc  | 18 --
 gcc/testsuite/gcc.dg/analyzer/aliasing-3.c | 75 ++
 2 files changed, 86 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/aliasing-3.c

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 8729aa8dab6..3f91b6107a9 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -2456,13 +2456,17 @@ store::eval_alias_1 (const region *base_reg_a,
   = base_reg_a->dyn_cast_symbolic_region ())
 {
   const svalue *sval_a = sym_reg_a->get_pointer ();
-  if (sval_a->get_kind () == SK_INITIAL)
-   if (tree decl_b = base_reg_b->maybe_get_decl ())
- if (!is_global_var (decl_b))
-   {
- /* The initial value of a pointer can't point to a local.  */
- return tristate::TS_FALSE;
-   }
+  if (tree decl_b = base_reg_b->maybe_get_decl ())
+   {
+ if (!may_be_aliased (decl_b))
+   return tristate::TS_FALSE;
+ if (sval_a->get_kind () == SK_INITIAL)
+   if (!is_global_var (decl_b))
+ {
+   /* The initial value of a pointer can't point to a local.  */
+   return tristate::TS_FALSE;
+ }
+   }
   if (sval_a->get_kind () == SK_INITIAL
  && base_reg_b->get_kind () == RK_HEAP_ALLOCATED)
{
diff --git a/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c 
b/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c
new file mode 100644
index 000..003077ad5c1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c
@@ -0,0 +1,75 @@
+#include "analyzer-decls.h"
+
+#define NULL ((void *)0)
+
+struct s1
+{
+  int f1;
+};
+
+static struct s1 *p1_glob = NULL;
+
+void test_1 (struct s1 **pp1, struct s1 *p1_parm)
+{
+  struct s1 *init_p1_glob = p1_glob;
+
+  __analyzer_eval (p1_glob == init_p1_glob); /* { dg-warning "TRUE" } */
+
+  if (!p1_glob)
+return;
+
+  __analyzer_eval (p1_glob == init_p1_glob); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p1_glob != NULL); /* { dg-warning "TRUE" } */
+
+  *pp1 = p1_parm;
+
+  /* The write through *pp1 can't have changed p1_glob, because
+ we never take a pointer to p1_glob (and it's static to this TU).  */
+  __analyzer_eval (p1_glob == init_p1_glob); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p1_glob != NULL); /* { dg-warning "TRUE" } */
+}
+
+struct s2
+{
+  int f1;
+};
+
+static struct s2 *p2_glob = NULL;
+
+void test_2 (struct s2 **pp2, struct s2 *p2_parm)
+{
+  /* Ensure that p2_glob is modified.  */
+  p2_glob = __builtin_malloc (sizeof (struct s2));
+  if (!p2_glob)
+return;
+
+  __analyzer_eval (p2_glob != NULL); /* { dg-warning "TRUE" } */
+
+  *pp2 = p2_parm;
+
+  /* The write through *pp2 can't have changed p2_glob, because
+ we never take a pointer to p2_glob (and it's static to this TU).  */
+  __analyzer_eval (p2_glob != NULL); /* { dg-warning "TRUE" } */
+}
+
+struct s3
+{
+  int f1;
+};
+
+struct s3 *p3_glob = NULL;
+
+void test_3 (struct s3 **pp3, struct s3 *p3_parm)
+{
+  p3_glob = __builtin_malloc (sizeof (struct s3));
+  if (!p3_glob)
+return;
+
+  __analyzer_eval (p3_glob != NULL); /* { dg-warning "TRUE" } */
+
+  *pp3 = p3_parm;
+
+  /* The write through *pp3 could have changed p3_glob, because
+ another TU could take a pointer to p3_glob.  */
+  __analyzer_eval (p3_glob != NULL); /* { dg-warning "UNKNOWN" } */
+}
-- 
2.26.3



[PATCH 1/1] [PATCH] Fix canadian compile for mingw-w64 copies the wrong dlls for mingw-w64 multilibs [PR100427]

2022-01-06 Thread cqwrteur via Gcc-patches
When building GCC hosted on windows with Canadian/native compilation 
(host==target), the build scripts in GCC would override DLLs with each other. 
For example, for MinGW-w64, 32-bit DLLs would override 64 bits because build 
scripts copy them both to /bin.

This patch fixes the issue by avoiding copying DLLs with multilibs. However, it 
would still copy when we do not build multilibs, usually the native build for 
GCC on windows.
---
 gcc/configure  | 26 ++
 libatomic/configure| 13 +
 libbacktrace/configure | 13 +
 libcc1/configure   | 26 ++
 libffi/configure   | 26 ++
 libgfortran/configure  | 26 ++
 libgo/config/libtool.m4| 13 +
 libgo/configure| 13 +
 libgomp/configure  | 26 ++
 libitm/configure   | 26 ++
 libobjc/configure  | 13 +
 liboffloadmic/configure| 26 ++
 liboffloadmic/plugin/configure | 26 ++
 libphobos/configure| 13 +
 libquadmath/configure  | 13 +
 libsanitizer/configure | 26 ++
 libssp/configure   | 13 +
 libstdc++-v3/configure | 26 ++
 libtool.m4 | 13 +
 libvtv/configure   | 26 ++
 lto-plugin/configure   | 13 +
 zlib/configure | 13 +
 22 files changed, 429 insertions(+)

diff --git a/gcc/configure b/gcc/configure
index 992a9d70092..d66ccc2a7f4 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -18642,6 +18642,18 @@ cygwin* | mingw* | pw32* | cegcc*)
   yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
 library_names_spec='$libname.dll.a'
 # DLL is installed to $(libdir)/../bin by postinstall_cmds
+# If user builds GCC with mulitlibs enabled, it should just install on 
$(libdir)
+# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
+if test ${multilib} = yes; then
+postinstall_cmds='base_file=`basename \${file}`~
+  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
\$dlname'\''`~
+  dldir=$destdir/`dirname \$dlpath`~
+  $install_prog $dir/$dlname $destdir/$dlname~
+  chmod a+x $destdir/$dlname~
+  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+  fi'
+else
 postinstall_cmds='base_file=`basename \${file}`~
   dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
\$dlname'\''`~
   dldir=$destdir/`dirname \$dlpath`~
@@ -18651,6 +18663,7 @@ cygwin* | mingw* | pw32* | cegcc*)
   if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
 eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
   fi'
+fi
 postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
   dlpath=$dir/\$dldll~
$RM \$dlpath'
@@ -22299,6 +22312,18 @@ cygwin* | mingw* | pw32* | cegcc*)
   yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
 library_names_spec='$libname.dll.a'
 # DLL is installed to $(libdir)/../bin by postinstall_cmds
+# If user builds GCC with mulitlibs enabled, it should just install on 
$(libdir)
+# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
+if test ${multilib} = yes; then
+postinstall_cmds='base_file=`basename \${file}`~
+  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
\$dlname'\''`~
+  dldir=$destdir/`dirname \$dlpath`~
+  $install_prog $dir/$dlname $destdir/$dlname~
+  chmod a+x $destdir/$dlname~
+  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+  fi'
+else
 postinstall_cmds='base_file=`basename \${file}`~
   dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
\$dlname'\''`~
   dldir=$destdir/`dirname \$dlpath`~
@@ -22308,6 +22333,7 @@ cygwin* | mingw* | pw32* | cegcc*)
   if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
 eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
   fi'
+fi
 postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
   dlpath=$dir/\$dldll~
$RM \$dlpath'
diff --git a/libatomic/configure b/libatomic/configure
index 5867e69ac14..4fd8cb34cd4 100755
--- a/libatomic/configure
+++ b/libatomic/configure
@@ -10461,6 +10461,18 @@ cygwin* | mingw* | pw32* | cegcc*)
   yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
 library_names_spec='$libname.dll.a'
 # DLL is installed to $(libdir)/../bin by postinstall_cmds
+# If user builds GCC with mulitlibs enabled,

[pushed 01/11] c++: don't preevaluate new-initializer

2022-01-06 Thread Jason Merrill via Gcc-patches
Here begins a series of EH cleanup bugfixes.

The preevaluation code was causing trouble with my fix for PR94041, and now
I see that it's actually wrong since P0145 was adopted for C++17, mandating
order of evaluation for many expressions that were previously unspecified.
I don't see a need to preserve the preevaluation code for older standard
modes.

gcc/cp/ChangeLog:

* init.c (build_new_1): Remove preevaluation code.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.martin/new1.C: Don't expect preeval.
---
 gcc/cp/init.c| 38 +---
 gcc/testsuite/g++.dg/tree-ssa/stabilize1.C   | 13 ---
 gcc/testsuite/g++.old-deja/g++.martin/new1.C | 18 +-
 3 files changed, 17 insertions(+), 52 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/tree-ssa/stabilize1.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 9d616f3f5e9..2cab4b4cdce 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3047,7 +3047,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  address of the first array element.  This node is a VAR_DECL, and
  is therefore reusable.  */
   tree data_addr;
-  tree init_preeval_expr = NULL_TREE;
   tree orig_type = type;
 
   if (nelts)
@@ -3561,7 +3560,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  placement delete.  */
   if (is_initialized)
 {
-  bool stable;
   bool explicit_value_init_p = false;
 
   if (*init != NULL && (*init)->is_empty ())
@@ -3587,7 +3585,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
   init, elt_type,
   LOOKUP_NORMAL,
   complain);
- stable = stabilize_init (init_expr, &init_preeval_expr);
}
   else if (array_p)
{
@@ -3633,11 +3630,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  explicit_value_init_p,
  /*from_array=*/0,
   complain);
-
- /* An array initialization is stable because the initialization
-of each element is a full-expression, so the temporaries don't
-leak out.  */
- stable = true;
}
   else
{
@@ -3694,8 +3686,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  = replace_placeholders (TREE_OPERAND (init_expr, 1),
  TREE_OPERAND (init_expr, 0),
  &had_placeholder);
- stable = (!had_placeholder
-   && stabilize_init (init_expr, &init_preeval_expr));
}
 
   if (init_expr == error_mark_node)
@@ -3726,20 +3716,7 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  alloc_fn,
  complain));
 
- if (!cleanup)
-   /* We're done.  */;
- else if (stable)
-   /* This is much simpler if we were able to preevaluate all of
-  the arguments to the constructor call.  */
-   {
- /* CLEANUP is compiler-generated, so no diagnostics.  */
- suppress_warning (cleanup);
- init_expr = build2 (TRY_CATCH_EXPR, void_type_node,
- init_expr, cleanup);
- /* Likewise, this try-catch is compiler-generated.  */
- suppress_warning (init_expr);
-   }
- else
+ if (cleanup && !processing_template_decl)
/* Ack!  First we allocate the memory.  Then we set our sentry
   variable to true, and expand a cleanup that deletes the
   memory if sentry is true.  Then we run the constructor, and
@@ -3747,9 +3724,13 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
 
   We need to do this because we allocate the space first, so
   if there are any temporaries with cleanups in the
-  constructor args and we weren't able to preevaluate them, we
-  need this EH region to extend until end of full-expression
-  to preserve nesting.  */
+  constructor args, we need this EH region to extend until
+  end of full-expression to preserve nesting.
+
+  We used to try to evaluate the args first to avoid this, but
+  since C++17 [expr.new] says that "The invocation of the
+  allocation function is sequenced before the evaluations of
+  expressions in the new-initializer."  */
{
  tree end, sentry, begin;
 
@@ -3810,9 +3791,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
   rval = build2 (COMPOUND_EXPR, TREE_TYPE (rval), alloc_expr, rval);
 }
 
-  if (init_preeval_expr)
-rval = build2 (COMPOUND_EXPR, TREE_TYPE (rval), init_preeval_expr, rval);
-
   /* A new-expression is never an lvalue.  */
   gcc_assert (!obvalue_p (rval));

[pushed 02/11] c++: loop over array elts w/o explicit init [PR92385]

2022-01-06 Thread Jason Merrill via Gcc-patches
The PR complains that initializing a large array with {} takes a long time
to compile; this was because digest_init would turn {} into a long
CONSTRUCTOR with an initializer for each element, instead of more sensibly
generating a loop.  The standard doesn't specify this implementation, but it
does allow for it by specifying that a temporary created "when a default
constructor is called to initialize an element of an array with no
corresponding initializer" is destroyed "before the construction of the next
array element, if any." rather than living until the end of the complete
object initialization as usual.

This change is also needed before the PR94041 fix extends the lifetime of
temporaries from elements with explicit initializers.

To implement this, I change digest_init so that in cases where
initialization of trailing array elements isn't constant, we return a
VEC_INIT_EXPR instead of a bare CONSTRUCTOR; when it is encountered later,
we call build_vec_init to generate the actual initialization code.

PR c++/92385

gcc/cp/ChangeLog:

* typeck2.c (PICFLAG_VEC_INIT): New.
(process_init_constructor_array): Set it.
(process_init_constructor): Handle it.
(split_nonconstant_init_1): Handle VEC_INIT_EXPR.
* init.c (build_vec_init): Likewise.
* cp-gimplify.c (cp_gimplify_expr): Factor out...
* tree.c (expand_vec_init_expr): ...this function.
(build_vec_init_elt): Handle BRACE_ENCLOSED_INITIALIZER_P.
(build_vec_init_expr): Likewise.
* constexpr.c (cxx_eval_vec_init): Likewise.
(reduced_constant_expression_p): Check arrays before C++20.
* cp-tree.h (expand_vec_init_expr): Declare.

gcc/testsuite/ChangeLog:

* g++.dg/init/array61.C: New test.
---
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/constexpr.c  | 35 ---
 gcc/cp/cp-gimplify.c| 13 ++---
 gcc/cp/init.c   | 11 +++-
 gcc/cp/tree.c   | 44 -
 gcc/cp/typeck2.c| 23 +--
 gcc/testsuite/g++.dg/init/array61.C | 16 +++
 7 files changed, 112 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/init/array61.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 10ca8098a85..0a3697f2f98 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7779,6 +7779,7 @@ extern bool array_of_runtime_bound_p  (tree);
 extern bool vla_type_p (tree);
 extern tree build_array_copy   (tree);
 extern tree build_vec_init_expr(tree, tree, 
tsubst_flags_t);
+extern tree expand_vec_init_expr   (tree, tree, tsubst_flags_t);
 extern void diagnose_non_constexpr_vec_init(tree);
 extern tree hash_tree_cons (tree, tree, tree);
 extern tree hash_tree_chain(tree, tree);
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 72be45c9e87..4a4b347c31d 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3020,8 +3020,7 @@ reduced_constant_expression_p (tree t)
  if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
/* An initialized vector would have a VECTOR_CST.  */
return false;
- else if (cxx_dialect >= cxx20
-  && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
+ else if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
{
  /* There must be a valid constant initializer at every array
 index.  */
@@ -4955,8 +4954,36 @@ cxx_eval_vec_init (const constexpr_ctx *ctx, tree t,
 {
   tree atype = TREE_TYPE (t);
   tree init = VEC_INIT_EXPR_INIT (t);
-  tree r = cxx_eval_vec_init_1 (ctx, atype, init,
-   VEC_INIT_EXPR_VALUE_INIT (t),
+  bool value_init = VEC_INIT_EXPR_VALUE_INIT (t);
+  if (!init || !BRACE_ENCLOSED_INITIALIZER_P (init))
+;
+  else if (CONSTRUCTOR_NELTS (init) == 0)
+{
+  /* Handle {} as value-init.  */
+  init = NULL_TREE;
+  value_init = true;
+}
+  else
+{
+  /* This is a more complicated case, like needing to loop over trailing
+elements; call build_vec_init and evaluate the result.  */
+  tsubst_flags_t complain = ctx->quiet ? tf_none : tf_warning_or_error;
+  constexpr_ctx new_ctx = *ctx;
+  if (!ctx->object)
+   {
+ /* We want to have an initialization target for an VEC_INIT_EXPR.
+If we don't already have one in CTX, use the VEC_INIT_EXPR_SLOT.  
*/
+ new_ctx.object = VEC_INIT_EXPR_SLOT (t);
+ tree ctor = new_ctx.ctor = build_constructor (atype, NULL);
+ CONSTRUCTOR_NO_CLEARING (ctor) = true;
+ ctx->global->values.put (new_ctx.object, ctor);
+ ctx = &new_ctx;
+   }
+  init = expand_vec_init_expr (ctx->object, t, complain);
+  return cxx_eval_constant_expression (ctx, init, lval, non_constant_p,
+  

[pushed 03/11] c++: temporary lifetime with aggregate init [PR94041]

2022-01-06 Thread Jason Merrill via Gcc-patches
In C++98 the lifetime of temporaries in aggregate initialization was
unclear, but C++11 DR201 clarified that only temporaries created for
default-initialization of an array element with no corresponding
initializer-clause are destroyed immediately; all others persist until the
end of the full-expression.

But we never implemented that, and continued treating individual element
initializations as full-expressions, such as in my patch for PR50866 in
r180442.  This blocked my attempted fix for PR66139, which extended the use
of split_nonconstant_init, and thus the bug, to aggregate initialization of
temporaries within an expression.

The longer temporary lifetime creates further EH region overlap problems
like the ones that wrap_temporary_cleanups addresses: in aggr7.C, we start
out with a normal nesting of

A1
 c.b1
   A2
c.b2
 ...
   ~A2
~A1

where on the way in, throwing from one of the inits will clean up from the
previous inits.  But once c.b2 is initialized, throwing from ~A2 must not
clean up c.b1; instead it needs to clean up c.  So as in build_new_1, we
deal with this by guarding the B cleanups with flags that are cleared once c
is fully constructed; throwing from one of the ~A still hits that region,
but does not call ~B.  And then wrap_temporary_cleanups deals with calling
~C, but we need to tell it not to wrap the subobject cleanups.

The change from expressing the subobject cleanups with CLEANUP_STMT to
TARGET_EXPR was also necessary because we need them to collate with the ~A
in gimplify_cleanup_point_expr; the CLEANUP_STMT representation only worked
with treating subobject initializations as full-expressions.

PR c++/94041

gcc/cp/ChangeLog:

* decl.c (check_initializer): Remove obsolete comment.
(wrap_cleanups_r): Don't wrap CLEANUP_EH_ONLY.
(initialize_local_var): Change assert to test.
* typeck2.c (maybe_push_temp_cleanup): New.
(split_nonconstant_init_1): Use it.
(split_nonconstant_init): Clear cleanup flags.

gcc/testsuite/ChangeLog:

* g++.dg/init/aggr7-eh.C: New test.
* g++.dg/cpp0x/initlist122.C: Also test aggregate variable.
---
 gcc/cp/decl.c| 12 ++---
 gcc/cp/typeck2.c | 55 +
 gcc/testsuite/g++.dg/cpp0x/initlist122.C | 12 -
 gcc/testsuite/g++.dg/init/aggr7-eh.C | 62 
 4 files changed, 121 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/init/aggr7-eh.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 0b71c00f5ab..9f759ceb1c8 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7257,11 +7257,6 @@ check_initializer (tree decl, tree init, int flags, 
vec **cleanups)
 
   if (init && TREE_CODE (init) != TREE_VEC)
{
- /* In aggregate initialization of a variable, each element
-initialization is a full-expression because there is no
-enclosing expression.  */
- gcc_assert (stmts_are_full_exprs_p ());
-
  init_code = store_init_value (decl, init, cleanups, flags);
 
  if (DECL_INITIAL (decl)
@@ -7428,7 +7423,8 @@ wrap_cleanups_r (tree *stmt_p, int *walk_subtrees, void 
*data)
   tree guard = (tree)data;
   tree tcleanup = TARGET_EXPR_CLEANUP (*stmt_p);
 
-  if (tcleanup && !expr_noexcept_p (tcleanup, tf_none))
+  if (tcleanup && !CLEANUP_EH_ONLY (*stmt_p)
+ && !expr_noexcept_p (tcleanup, tf_none))
{
  tcleanup = build2 (TRY_CATCH_EXPR, void_type_node, tcleanup, guard);
  /* Tell honor_protect_cleanup_actions to handle this as a separate
@@ -7500,11 +7496,11 @@ initialize_local_var (tree decl, tree init)
 {
   tree rinit = (TREE_CODE (init) == INIT_EXPR
? TREE_OPERAND (init, 1) : NULL_TREE);
-  if (rinit && !TREE_SIDE_EFFECTS (rinit))
+  if (rinit && !TREE_SIDE_EFFECTS (rinit)
+ && TREE_OPERAND (init, 0) == decl)
{
  /* Stick simple initializers in DECL_INITIAL so that
 -Wno-init-self works (c++/34772).  */
- gcc_assert (TREE_OPERAND (init, 0) == decl);
  DECL_INITIAL (decl) = rinit;
 
  if (warn_init_self && TYPE_REF_P (type))
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 3d4d35e13c6..54b1d0da129 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -459,13 +459,34 @@ cxx_incomplete_type_error (location_t loc, const_tree 
value, const_tree type)
 }
 
 
+/* We've just initialized subobject SUB; also insert a TARGET_EXPR with an
+   EH-only cleanup for SUB.  Because of EH region nesting issues, we need to
+   make the cleanup conditional on a flag that we will clear once the object is
+   fully initialized, so push a new flag onto FLAGS.  */
+
+static void
+maybe_push_temp_cleanup (tree sub, vec **flags)
+{
+  if (tree cleanup
+  = cxx_maybe_build_cleanup (sub, tf_warning_or_error))
+{
+  tree tx = get_target_expr (boolean_true_node);
+  tree flag = TARGE

  1   2   >