Re: [PATCH] libgfortran : Use the libtool macro to determine libm availability.

2021-08-22 Thread Iain Sandoe



> On 21 Aug 2021, at 23:18, Eric Gallager  wrote:
> 
> On Fri, Aug 20, 2021 at 3:53 AM Tobias Burnus  wrote:
>> 
>> On 20.08.21 09:34, Richard Biener via Fortran wrote:
>> 
>>> On Thu, Aug 19, 2021 at 10:10 PM Iain Sandoe  wrote:
 libm is not needed on Darwin, and should not be added unconditionally
 even if that is (mostly) harmless since it is a symlink to libc.
 
 tested on x86_64, i686-darwin, x86_64-linux,
 OK for master?
>>> OK.
>>> Richard.
>> 
>> Looks also good to me – but for completeness and as background, I also
>> want to remark the following.
>> 
>> (At least as I understand it, I did not dig deeper.)
>> 
>>> --- a/libgfortran/configure.ac
>>> +++ b/libgfortran/configure.ac
>>> ...
>>> +AC_CHECK_LIBM
>> 
>> This CHECK_LIBM has a hard-coded list of targets and for some (like
>> Darwin) it simply does not set $LIBM.
> 
> I thought the proper macro to use was LT_LIB_M ?

you could well be right, libtool.m4 contains:

# Old name:
AU_ALIAS([AC_CHECK_LIBM], [LT_LIB_M])

I guess the patch can be changed and another round of testing done …
… I will add this to the TODO, and withdraw the current patch.

Iain



Re: [PATCH] ipa: add debug counter for IPA MODREF PTA

2021-08-22 Thread Jan Hubicka
> Hi.
> 
> We already have an IPA modref debug counter, but it's only used in
> tree-ssa-alias, which is only a part of what IPA modref does. I used the
> dbg counter while isolating PR101949.
> 
> Ready for master?
OK,
thanks!

Honza
> 
> gcc/ChangeLog:
> 
>   * dbgcnt.def (DEBUG_COUNTER): New counter.
>   * gimple.c (gimple_call_arg_flags): Use it in IPA PTA.
> ---
>  gcc/dbgcnt.def | 1 +
>  gcc/gimple.c   | 5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 2345899ba68..c2bcc4eef5e 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -175,6 +175,7 @@ DEBUG_COUNTER (ipa_cp_bits)
>  DEBUG_COUNTER (ipa_cp_values)
>  DEBUG_COUNTER (ipa_cp_vr)
>  DEBUG_COUNTER (ipa_mod_ref)
> +DEBUG_COUNTER (ipa_mod_ref_pta)
>  DEBUG_COUNTER (ipa_sra_params)
>  DEBUG_COUNTER (ipa_sra_retvalues)
>  DEBUG_COUNTER (ira_move)
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 4e2653cab2f..bed7ff9e71c 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -48,7 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "attr-fnspec.h"
>  #include "ipa-modref-tree.h"
>  #include "ipa-modref.h"
> -
> +#include "dbgcnt.h"
>  /* All the tuples have their operand vector (if present) at the very bottom
> of the structure.  Therefore, the offset required to find the
> @@ -1601,7 +1601,8 @@ gimple_call_arg_flags (const gcall *stmt, unsigned arg)
> if ((modref_flags & EAF_DIRECT) && !(flags & EAF_DIRECT))
>   modref_flags &= ~EAF_DIRECT;
>   }
> -   flags |= modref_flags;
> +   if (dbg_cnt (ipa_mod_ref_pta))
> + flags |= modref_flags;
>   }
>  }
>return flags;
> -- 
> 2.32.0
> 


Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-22 Thread Jan Hubicka
> Hello.
> 
> As shown in the PR, returning (EAF_NOCLOBBER | EAF_NOESCAPE) for an argument
> that is a function pointer is problematic. Doing such a function call is a
> clobber.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
>   PR 101949
> 
> gcc/ChangeLog:
> 
>   * ipa-modref.c (analyze_ssa_name_flags): Do not propagate EAF
> flags arguments for indirect functions.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/lto/pr101949_0.c: New test.
>   * gcc.dg/lto/pr101949_1.c: New test.
> 
> Co-Authored-By: Richard Biener 
> ---
>  gcc/ipa-modref.c  |  3 +++
>  gcc/testsuite/gcc.dg/lto/pr101949_0.c | 20 
>  gcc/testsuite/gcc.dg/lto/pr101949_1.c |  4 
>  3 files changed, 27 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr101949_0.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr101949_1.c
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index fafd804d4ba..380ba6926b9 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -1715,6 +1715,9 @@ analyze_ssa_name_flags (tree name, vec 
> &lattice, int depth,
> else if (callee && !ipa && recursive_call_p (current_function_decl,
> callee))
>   lattice[index].merge (0);
> +   /* Ignore indirect calls (PR101949).  */
> +   else if (callee == NULL_TREE)
> + lattice[index].merge (0);

Thanks for looking into this bug - it is interesting that ipa-pta
requires !EAF_NOCLOBBER when a function is called...

I have some work done on teaching ipa-modref (and other propagation
passes) to use ipa-devirt info when the full set of callees is known.
This goes the opposite way.

You can drop flags only when callee == NAME, and you can just drop
EAF_NOCLOBBER.  For example, the testcase

struct a {
  void (*foo)();
  void *bar;
}

void wrap (struct a *a)
{
  a->foo ();
}

will prevent us from figuring out that bar cannot be modified when you
pass a non-escaping instance of struct a to wrap.

Honza


[PATCH] x86: Allow CONST_VECTOR for vector load in combine

2021-08-22 Thread H.J. Lu via Gcc-patches
In the vector move pattern, replace nonimmediate_or_sse_const_operand with
nonimmediate_or_sse_const_vector_operand to allow vector loads from a
non-uniform CONST_VECTOR.  Non-uniform CONST_VECTOR is enabled only in
the combine pass since other RTL optimizers work better with the constant
pool.
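
As a rough illustration of the distinction (the functions below are
hypothetical, not the PR testcase): a uniform vector constant is a
broadcast candidate, while a non-uniform one normally comes from the
constant pool and is what the combine pass can now see directly:

typedef int v4si __attribute__ ((vector_size (16)));

v4si uniform (void)
{
  return (v4si) { 5, 5, 5, 5 };   /* vec_duplicate: broadcast candidate.  */
}

v4si non_uniform (void)
{
  return (v4si) { 1, 2, 3, 4 };   /* non-uniform CONST_VECTOR.  */
}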

gcc/

PR target/43147
* config/i386/i386-expand.c (ix86_constant_broadcast): New
function.  Extracted from ix86_broadcast_from_constant.
(ix86_broadcast_from_constant): Call ix86_constant_broadcast.
(non_uniform_const_vector_p): New function.
* config/i386/i386-protos.h (non_uniform_const_vector_p): New
prototype.
* config/i386/predicates.md
(nonimmediate_or_sse_const_vector_operand): New predicate.
* config/i386/sse.md (mov_internal): Replace
nonimmediate_or_sse_const_operand with
nonimmediate_or_sse_const_vector_operand.

gcc/testsuite/

PR target/43147
* gcc.target/i386/pr43147.c: New test.
---
 gcc/config/i386/i386-expand.c   | 74 ++---
 gcc/config/i386/i386-protos.h   |  1 +
 gcc/config/i386/predicates.md   |  7 +++
 gcc/config/i386/sse.md  |  2 +-
 gcc/testsuite/gcc.target/i386/pr43147.c | 15 +
 5 files changed, 67 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr43147.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 9bf13dbfa92..1d8f3110310 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -453,31 +453,12 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   emit_insn (gen_rtx_SET (op0, op1));
 }
 
-/* OP is a memref of CONST_VECTOR, return scalar constant mem
-   if CONST_VECTOR is a vec_duplicate, else return NULL.  */
+/* CONSTANT is a CONST_VECTOR, return scalar constant if CONST_VECTOR is
+   a vec_duplicate, else return nullptr.  */
+
 static rtx
-ix86_broadcast_from_constant (machine_mode mode, rtx op)
+ix86_constant_broadcast (rtx constant, machine_mode mode)
 {
-  int nunits = GET_MODE_NUNITS (mode);
-  if (nunits < 2)
-return nullptr;
-
-  /* Don't use integer vector broadcast if we can't move from GPR to SSE
- register directly.  */
-  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
-  && INTEGRAL_MODE_P (mode))
-return nullptr;
-
-  /* Convert CONST_VECTOR to a non-standard SSE constant integer
- broadcast only if vector broadcast is available.  */
-  if (!(TARGET_AVX2
-   || (TARGET_AVX
-   && (GET_MODE_INNER (mode) == SImode
-   || GET_MODE_INNER (mode) == DImode))
-   || FLOAT_MODE_P (mode))
-  || standard_sse_constant_p (op, mode))
-return nullptr;
-
   /* Don't broadcast from a 64-bit integer constant in 32-bit mode.
  We can still put 64-bit integer constant in memory when
  avx512 embed broadcast is available.  */
@@ -486,13 +467,6 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
  || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
 return nullptr;
 
-  if (GET_MODE_INNER (mode) == TImode)
-return nullptr;
-
-  rtx constant = get_pool_constant (XEXP (op, 0));
-  if (GET_CODE (constant) != CONST_VECTOR)
-return nullptr;
-
   /* There could be some rtx like
  (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
  but with "*.LC1" refer to V2DI constant vector.  */
@@ -506,7 +480,7 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
 
   rtx first = XVECEXP (constant, 0, 0);
 
-  for (int i = 1; i < nunits; ++i)
+  for (int i = 1; i < GET_MODE_NUNITS (mode); ++i)
 {
   rtx tmp = XVECEXP (constant, 0, i);
   /* Vector duplicate value.  */
@@ -517,6 +491,44 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
   return first;
 }
 
+/* OP is a memref of CONST_VECTOR, return scalar constant mem
+   if CONST_VECTOR is a vec_duplicate, else return NULL.  */
+static rtx
+ix86_broadcast_from_constant (machine_mode mode, rtx op)
+{
+  int nunits = GET_MODE_NUNITS (mode);
+  if (nunits < 2)
+return nullptr;
+
+  /* Don't use integer vector broadcast if we can't move from GPR to SSE
+ register directly.  */
+  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
+  && INTEGRAL_MODE_P (mode))
+return nullptr;
+
+  if (GET_MODE_INNER (mode) == TImode)
+return nullptr;
+
+  rtx constant = get_pool_constant (XEXP (op, 0));
+  if (GET_CODE (constant) != CONST_VECTOR)
+return nullptr;
+
+  return ix86_constant_broadcast (constant, mode);
+}
+
+/* Return true if OP is a non-uniform CONST_VECTOR.  */
+
+bool
+non_uniform_const_vector_p (rtx op, machine_mode mode)
+{
+  /* Allow non-uniform CONST_VECTOR only in the combine pass since other
+ RTL optimizers work better with constant pool.  */
+  return (current_pass
+ && current_pass->tv_id == TV_COMBINE
+ && GET_CODE (op) == CONST_VECTOR
+ && ix86_constant_broadcast (op, mode) == nullptr);
+}
+
 void
 ix86_expand_vector_move (machine_mode mode, rtx opera

Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-08-22 Thread Jan Hubicka
> Good hint. I added a hash based on the object file name (I don't want to
> handle proper string escaping) and -frandom-seed.
> 
> What do you think about the patch?
Sorry for taking so long - I remember writing a reply earlier, but it
seems I never sent it.
> Thanks,
> Martin

> From 372d2944571906932fd1419bfc51a949d67b857e Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Fri, 21 May 2021 10:25:49 +0200
> Subject: [PATCH] LTO: add lto_priv suffix for LTO_LINKER_OUTPUT_NOLTOREL.
> 
> gcc/lto/ChangeLog:
> 
>   * lto-partition.c (privatize_symbol_name_1): Add random suffix
>   based on hash of the object file and -frandom-seed.
> ---
>  gcc/lto/lto-partition.c | 21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..fef48c869a2 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-fnsummary.h"
>  #include "lto-partition.h"
>  #include "sreal.h"
> +#include "toplev.h"
>  
>  vec ltrans_partitions;
>  
> @@ -941,9 +942,23 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>  
>name = maybe_rewrite_identifier (name);
>unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
> -  symtab->change_decl_assembler_name (decl,
> -   clone_function_name (
> -   name, "lto_priv", clone_number));
> +
> +  char *suffix = NULL;
> +  if (flag_lto_linker_output == LTO_LINKER_OUTPUT_NOLTOREL)
> +{
> +  hashval_t fnhash = 0;
> +  if (node->lto_file_data != NULL)
> + fnhash = htab_hash_string (node->lto_file_data->file_name);
> +  suffix = XNEWVEC (char, 128);
> +  char sep = symbol_table::symbol_suffix_separator ();
> +  sprintf (suffix, "lto_priv%c%u%c%" PRIu64, sep, fnhash, sep,
> +(unsigned HOST_WIDE_INT)get_random_seed (false));

We have get_file_function_name, which does similar work but also works
without random seeds.  Perhaps we can reuse it here: call
get_file_function_name once and use the result as a prefix, or compute a
hash from it.

The logic to get a unique symbol name is not completely easy, and it would
be better not to duplicate it.  The patch is OK with that change
(and indeed it is a bugfix - even if it is relatively little used, partial
linking of LTO objects into non-LTO should be supported and working).
Honza
> +}
> +
> +  tree clone
> += clone_function_name (name, suffix ? suffix : "lto_priv", clone_number);
> +  symtab->change_decl_assembler_name (decl, clone);
> +  free (suffix);
>clone_number++;
>  
>if (node->lto_file_data)
> -- 
> 2.31.1
> 



Re: fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)

2021-08-22 Thread Jan Hubicka
> 
> for  gcc/ChangeLog
> 
>   * ipa-modref.c (analyze_function): Skip debug stmts.
>   * tree-inline.c (estimate_num_insn): Consider builtins even
>   without a cgraph_node.

OK, thanks for looking into this issue!
(for mainline, and for release branches a bit later)
> ---
>  gcc/ipa-modref.c  |3 ++-
>  gcc/tree-inline.c |4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index fafd804d4bae4..f0cddbb077aaa 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -2108,7 +2108,8 @@ analyze_function (function *f, bool ipa)
>FOR_EACH_BB_FN (bb, f)
>  {
>gimple_stmt_iterator si;
> -  for (si = gsi_after_labels (bb); !gsi_end_p (si); gsi_next (&si))
> +  for (si = gsi_start_nondebug_after_labels_bb (bb);
> +!gsi_end_p (si); gsi_next_nondebug (&si))

It seems that analyze_stmt indeed does not skip debug stmts.  It is very
odd we got this far without hitting a build difference.

Honza
>   {
> if (!analyze_stmt (summary, summary_lto,
>gsi_stmt (si), ipa, &recursive_calls)
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index d0e9f52d5f138..636130fe0019e 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -4436,8 +4436,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
>   /* Do not special case builtins where we see the body.
>  This just confuse inliner.  */
>   struct cgraph_node *node;
> - if (!(node = cgraph_node::get (decl))
> - || node->definition)
> + if ((node = cgraph_node::get (decl))
> + && node->definition)
> ;
>   /* For buitins that are likely expanded to nothing or
>  inlined do not account operand costs.  */
> 
> 
> -- 
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


[PATCH] Improved handling of shifts/rotates in bit CCP.

2021-08-22 Thread Roger Sayle

This patch is the next in the series to improve bit bounds in tree-ssa's
bit CCP pass, this time: bounds for shifts and rotates by unknown amounts.
This allows us to optimize expressions such as ((x&15)<<(y&24))&64.
In this case, the expression (y&24) contains only two unknown bits,
and can therefore have only four possible values: 0, 8, 16 and 24.
From this (x&15)<<(y&24) has the nonzero bits 0x0f0f0f0f, and from
that ((x&15)<<(y&24))&64 must always be zero.
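
As a quick standalone sanity check (hypothetical, not part of the patch),
the bound can be brute-forced by taking the union of the possible nonzero
bits over the four feasible shift amounts:

#include <stdio.h>

int main (void)
{
  unsigned int mask = 0;
  /* (y & 24) can only be 0, 8, 16 or 24; (x & 15) contributes at most 0xf.  */
  for (unsigned int shift = 0; shift <= 24; shift += 8)
    mask |= 0xfu << shift;
  printf ("nonzero bits: 0x%08x\n", mask);   /* prints 0x0f0f0f0f */
  printf ("& 64: %u\n", mask & 64);          /* prints 0 */
  return 0;
}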

One clever use of computer science in this patch is the use of XOR
to efficiently enumerate bit patterns in Gray code order.  As the
order in which we generate values is not significant, it's faster
and more convenient to enumerate values by flipping one bit at a
time, rather than in numerical order [which would require carry
bits and additional logic].
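
As a rough sketch of the idea (hypothetical, not the patch's code), the
values representable by a value/mask pair can be walked by flipping one
unknown bit at a time, using a shortened version of the flip table the
patch introduces:

#include <stdio.h>

static const unsigned char gray_code_bit_flips[15] = {
  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0
};

int main (void)
{
  unsigned int val = 0, mask = 0x18;    /* e.g. y & 24: bits 3 and 4 unknown.  */
  unsigned int bits[4], nbits = 0;
  for (unsigned int m = mask; m && nbits < 4; nbits++)
    {
      bits[nbits] = m & -m;             /* extract the lowest set bit */
      m ^= bits[nbits];
    }
  unsigned int count = (1u << nbits) - 1;
  printf ("%u\n", val);                 /* first value: unknown bits all clear */
  for (unsigned int i = 0; i < count; i++)
    {
      val ^= bits[gray_code_bit_flips[i]];   /* flip exactly one bit */
      printf ("%u\n", val);             /* prints 8, 24, 16 */
    }
  return 0;
}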

There's a pre-existing ??? comment in tree-ssa-ccp.c that we should
eventually be able to optimize (x<<(y|8))&255, but this patch takes the
conservatively paranoid approach of only optimizing cases where the
shift/rotate is guaranteed to be less than the target precision, and
therefore avoids changing any cases that potentially might invoke
undefined behavior.  This patch does optimize (x<<((y&31)|8))&255.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  OK for mainline?


2021-08-22  Roger Sayle

gcc/ChangeLog
* tree-ssa-ccp.c (get_individual_bits): Helper function to
extract the individual bits from a widest_int constant (mask).
(gray_code_bit_flips): New read-only table for efficiently
enumerating permutations/combinations of bits.
(bit_value_binop) [LROTATE_EXPR, RROTATE_EXPR]: Handle rotates
by unknown counts that are guaranteed less than the target
precision and four or fewer unknown bits by enumeration.
[LSHIFT_EXPR, RSHIFT_EXPR]: Likewise, also handle shifts by
enumeration under the same conditions.  Handle remaining
shifts as a mask based upon the minimum possible shift value.

gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/ssa-ccp-41.c: New test case.


Roger
--

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 1a63ae5..927a0aa 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1448,6 +1448,34 @@ bit_value_mult_const (signop sgn, int width,
   *mask = wi::ext (sum_mask, width, sgn);
 }
 
+/* Fill up to MAX values in the BITS array with values representing
+   each of the non-zero bits in the value X.  Returns the number of
+   bits in X (capped at the maximum value MAX).  For example, an X
+   value 11, places 1, 2 and 8 in BITS and returns the value 3.  */
+
+unsigned int
+get_individual_bits (widest_int *bits, widest_int x, unsigned int max)
+{
+  unsigned int count = 0;
+  while (count < max && x != 0)
+{
+  int bitpos = wi::ctz (x);
+  bits[count] = wi::lshift (1, bitpos);
+  x ^= bits[count];
+  count++;
+}
+  return count;
+}
+
+/* Array of 2^N - 1 values representing the bits flipped between
+   consecutive Gray codes.  This is used to efficiently enumerate
+   all permutations on N bits using XOR.  */
+static const unsigned char gray_code_bit_flips[63] = {
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0
+};
 
 /* Apply the operation CODE in type TYPE to the value, mask pairs
R1VAL, R1MASK and R2VAL, R2MASK representing a values of type R1TYPE
@@ -1525,6 +1553,48 @@ bit_value_binop (enum tree_code code, signop sgn, int 
width,
}
}
}
+  else if (wi::ltu_p (r2val | r2mask, width)
+  && wi::popcount (r2mask) <= 4)
+   {
+ widest_int bits[4];
+ widest_int res_val, res_mask;
+ widest_int tmp_val, tmp_mask;
+ widest_int shift = wi::bit_and_not (r2val, r2mask);
+ unsigned int bit_count = get_individual_bits (bits, r2mask, 4);
+ unsigned int count = (1 << bit_count) - 1;
+
+ /* Initialize result to rotate by smallest value of shift.  */
+ if (code == RROTATE_EXPR)
+   {
+ res_mask = wi::rrotate (r1mask, shift, width);
+ res_val = wi::rrotate (r1val, shift, width);
+   }
+ else
+   {
+ res_mask = wi::lrotate (r1mask, shift, width);
+ res_val = wi::lrotate (r1val, shift, width);
+   }
+
+ /* Iterate through the remaining values of shift.  */
+ for (unsigned int i=0; i (width, false);
+ tmp <<= wi::ctz (r1val | r1mask);
+ tmp <<= wi::bit_and_not (r2val, r2mask);
+ *mask = wi::ext (tmp, width, sgn);
+ *val = 0;
+   }
+ else if (!wi::neg_p (r1val | r1mask, sgn))
+   {
+ /* Logical right shift, or zero sign bit.  */
+ widest_int arg = r1val | r1mask;
+ in

[PATCH] Improved handling of division/modulus in bit CCP.

2021-08-22 Thread Roger Sayle

This patch implements support for TRUNC_MOD_EXPR and TRUNC_DIV_EXPR
in tree-ssa's bit CCP pass.  This is mostly for completeness, as the
VRP pass already provides better bounds for these operations, but
seeing mask values of all_ones in my debugging/instrumentation logs
seemed overly pessimistic.  With this patch, the expression X%10
has a nonzero-bits mask of 0x0f (for unsigned X), likewise (X&1)/3 has
a known value of zero, and (X&3)/3 has a nonzero-bits mask of 0x1.
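
As a standalone sketch of the bound used for TRUNC_MOD_EXPR (a simplified
model on plain unsigned ints rather than widest_int, so the helper below is
hypothetical and not the patch's code), the nonzero-bits mask follows from
the fact that X % r2 is always less than the maximum possible value of r2:

#include <stdio.h>
#include <limits.h>

/* Nonzero-bits mask for X % r2 when r2's maximum possible value is R2MAX.  */
static unsigned int
mod_nonzero_bits (unsigned int r2max)
{
  unsigned int bits = (unsigned int) (sizeof (unsigned int) * CHAR_BIT)
		      - (unsigned int) __builtin_clz (r2max);
  if ((r2max & (r2max - 1)) == 0)	/* power of two: one fewer bit needed */
    bits--;
  return bits ? (1u << bits) - 1 : 0;
}

int main (void)
{
  printf ("X %% 10 nonzero bits: 0x%x\n", mod_nonzero_bits (10));  /* 0xf */
  printf ("X %% 8  nonzero bits: 0x%x\n", mod_nonzero_bits (8));   /* 0x7 */
  return 0;
}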

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-08-22  Roger Sayle  

gcc/ChangeLog
* tree-ssa-ccp.c (bit_value_binop) [TRUNC_MOD_EXPR, TRUNC_DIV_EXPR]:
Provide bounds for unsigned (and signed with non-negative operands)
division and modulus.

Roger
--

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 1a63ae5..1a94aeb 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1736,6 +1736,68 @@ bit_value_binop (enum tree_code code, signop sgn, int 
width,
break;
   }
 
+case TRUNC_MOD_EXPR:
+  {
+   widest_int r1max = r1val | r1mask;
+   widest_int r2max = r2val | r2mask;
+   if (sgn == UNSIGNED
+   || (!wi::neg_p (r1max) && !wi::neg_p (r2max)))
+ {
+   /* Confirm R2 has some bits set, to avoid division by zero.  */
+   widest_int r2min = wi::bit_and_not (r2val, r2mask);
+   if (r2min != 0)
+ {
+   /* R1 % R2 is R1 if R1 is always less than R2.  */
+   if (wi::ltu_p (r1max, r2min))
+ {
+   *mask = r1mask;
+   *val = r1val;
+ }
+   else
+ {
+   /* R1 % R2 is always less than the maximum of R2.  */
+   unsigned int lzcount = wi::clz (r2max);
+   unsigned int bits = wi::get_precision (r2max) - lzcount;
+   if (r2max == wi::lshift (1, bits))
+ bits--;
+   *mask = wi::mask  (bits, false);
+   *val = 0;
+ }
+  }
+   }
+   }
+  break;
+
+case TRUNC_DIV_EXPR:
+  {
+   widest_int r1max = r1val | r1mask;
+   widest_int r2max = r2val | r2mask;
+   if (sgn == UNSIGNED
+   || (!wi::neg_p (r1max) && !wi::neg_p (r2max)))
+ {
+   /* Confirm R2 has some bits set, to avoid division by zero.  */
+   widest_int r2min = wi::bit_and_not (r2val, r2mask);
+   if (r2min != 0)
+ {
+   /* R1 / R2 is zero if R1 is always less than R2.  */
+   if (wi::ltu_p (r1max, r2min))
+ {
+   *mask = 0;
+   *val = 0;
+ }
+   else
+ {
+   widest_int upper = wi::udiv_trunc (r1max, r2min);
+   unsigned int lzcount = wi::clz (upper);
+   unsigned int bits = wi::get_precision (upper) - lzcount;
+   *mask = wi::mask  (bits, false);
+   *val = 0;
+ }
+  }
+   }
+   }
+  break;
+
 default:;
 }
 }


PING [PATCH] x86: Update memcpy/memset inline strategies for -mtune=generic

2021-08-22 Thread H.J. Lu via Gcc-patches
On Tue, Mar 23, 2021 at 09:19:38AM +0100, Richard Biener wrote:
> On Tue, Mar 23, 2021 at 3:41 AM Hongyu Wang  wrote:
> >
> > > Hongyue, please collect code size differences on SPEC CPU 2017 and
> > > eembc.
> >
> > Here is code size difference for this patch
> 
> Thanks, nothing too bad although slightly larger impacts than envisioned.
> 

PING.

OK for master branch?

Thanks.

H.J.
 ---
Simplify memcpy and memset inline strategies to avoid branches for
-mtune=generic:

1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
   load and store for up to 16 * 16 (256) bytes when the data size is
   fixed and known.
2. Inline only if data size is known to be <= 256.
   a. Use "rep movsb/stosb" with a simple code sequence if the data size
      is a constant.
   b. Use a loop if the data size is not a constant.
3. Use the memcpy/memset library function if the data size is unknown or
   > 256.
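
As a rough illustration (hypothetical functions, not from the patch or its
testcases), these are the three shapes of call the strategy above
distinguishes under -mtune=generic -O2:

#include <string.h>

void copy_small_const (char *dst, const char *src)
{
  memcpy (dst, src, 128);   /* constant size <= 256: inlined per 1/2a.  */
}

void copy_large_const (char *dst, const char *src)
{
  memcpy (dst, src, 4096);  /* size > 256: call the library function.  */
}

void copy_unknown (char *dst, const char *src, size_t n)
{
  memcpy (dst, src, n);     /* size unknown at compile time: library call.  */
}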

With -mtune=generic -O2,

1. On Ice Lake processor,

Performance impacts on SPEC CPU 2017:

500.perlbench_r  0.51%
502.gcc_r0.55%
505.mcf_r0.38%
520.omnetpp_r   -0.74%
523.xalancbmk_r -0.35%
525.x264_r   2.99%
531.deepsjeng_r -0.17%
541.leela_r -0.98%
548.exchange2_r  0.89%
557.xz_r 0.70%
Geomean  0.37%

503.bwaves_r 0.04%
507.cactuBSSN_r -0.01%
508.namd_r  -0.45%
510.parest_r-0.09%
511.povray_r-1.37%
519.lbm_r0.00%
521.wrf_r   -2.56%
526.blender_r   -0.01%
527.cam4_r  -0.05%
538.imagick_r0.36%
544.nab_r0.08%
549.fotonik3d_r -0.06%
554.roms_r   0.05%
Geomean -0.34%

Significant impacts on eembc benchmarks:

eembc/nnet_test  14.85%
eembc/mp2decoddata2  13.57%

2. On Cascadelake processor,

Performance impacts on SPEC CPU 2017:

500.perlbench_r -0.02%
502.gcc_r0.10%
505.mcf_r   -1.14%
520.omnetpp_r   -0.22%
523.xalancbmk_r  0.21%
525.x264_r   0.94%
531.deepsjeng_r -0.37%
541.leela_r -0.46%
548.exchange2_r -0.40%
557.xz_r 0.60%
Geomean -0.08%

503.bwaves_r-0.50%
507.cactuBSSN_r  0.05%
508.namd_r  -0.02%
510.parest_r 0.09%
511.povray_r-1.35%
519.lbm_r0.00%
521.wrf_r   -0.03%
526.blender_r   -0.83%
527.cam4_r   1.23%
538.imagick_r0.97%
544.nab_r   -0.02%
549.fotonik3d_r -0.12%
554.roms_r   0.55%
Geomean  0.00%

Significant impacts on eembc benchmarks:

eembc/nnet_test  9.90%
eembc/mp2decoddata2  16.42%
eembc/textv2data3   -4.86%
eembc/qos12.90%

3. On Znver3 processor,

Performance impacts on SPEC CPU 2017:

500.perlbench_r -0.96%
502.gcc_r   -1.06%
505.mcf_r   -0.01%
520.omnetpp_r   -1.45%
523.xalancbmk_r  2.89%
525.x264_r   4.98%
531.deepsjeng_r  0.18%
541.leela_r -1.54%
548.exchange2_r -1.25%
557.xz_r-0.01%
Geomean  0.16%

503.bwaves_r 0.04%
507.cactuBSSN_r  0.85%
508.namd_r  -0.13%
510.parest_r 0.39%
511.povray_r 0.00%
519.lbm_r0.00%
521.wrf_r0.28%
526.blender_r   -0.10%
527.cam4_r  -0.58%
538.imagick_r0.69%
544.nab_r   -0.04%
549.fotonik3d_r -0.04%
554.roms_r   0.40%
Geomean  0.15%

Significant impacts on eembc benchmarks:

eembc/aifftr01   13.95%
eembc/idctrn01   8.41%
eembc/nnet_test  30.25%
eembc/mp2decoddata2  5.05%
eembc/textv2data36.43%
eembc/qos   -5.79%

Code size differences are:

SPEC CPU 2017

                 difference   w/ patch   w/o patch
500.perlbench_r      0.051%    1622637     1621805
502.gcc_r            0.039%    6930877     6928141
505.mcf_r            0.098%      16413       16397
520.omnetpp_r        0.083%    1327757     1326653
523.xalancbmk_r      0.001%    3575709     3575677
525.x264_r          -0.067%     769095      769607
531.deepsjeng_r      0.071%      67629       67581
541.leela_r         -3.062%     127629      131661
548.exchange2_r     -0.338%      66141       66365
557.xz_r             0.946%     128061      126861
503.bwaves_r         0.534%      33117       32941
507.cactuBSSN_r      0.004%    2993645     2993517
508.namd_r           0.006%     851677      851629
510.parest_r         0.488%    6741277     6708557
511.povray_r        -0.021%     849290      849466
521.wrf_r            0.022%   29682154    29675530
526.blender_r        0.054%    7544057     7540009
527.cam4_r           0.043%    6102234     6099594
538.imagick_r       -0.015%    1625770     1626010
544.nab_r            0.155%     155453      155213
549.fotonik3d_r      0.000%     351757      351757
554.roms_r           0.041%     735837      735533

eembc

aifftr01             0.762%      14813       14701
aiifft01             0.556%      14477       14397
idctrn01             0.101%      15853       15837
cjpeg-rose7-preset   0.114%      56125       56061
nnet_test           -0.848%      35549       35853
aes                  0.125%      38493       38445
cjpegv2data          0.108%      59213       59149
djpegv2data 

Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-22 Thread Jan Hubicka
> Thanks for looking into this bug - it is interesting that ipa-pta
> requires !EAF_NOCLOBBER when function is called...
> 
> I have some work done on teaching ipa-modref (and other propagation
> passes) to use ipa-devirt info when the full set of callees is known.
> This goes oposite way.
> 
> You can drop flags only when callee == NAME and you can just frop
> EAF_NOCLOBBER.  For example in testcase
> 
> struct a {
>   void (*foo)();
>   void *bar;
> }
> 
> void wrap (struct a *a)
> {
>   a->foo ();
> }
> 
> will prevent us from figuring out that bar can not be modified when you
> pass non-ecaping instance of struct a to wrap.
> 

I am testing this updated patch which implements that.  I am not very
happy about this (it punishes the -fno-ipa-pta path for not a very good
reason), but we need a bugfix for the release branch.

It is now very easy to add new EAF flags on the modref side,
so we can track EAF_NOT_CALLED.
The tree-ssa-structalias side is always a bit annoying w.r.t. new EAF flags
because it has three copies of the code building constraints for calls
(for normal, pure and const).

Modref is already tracking whether a function can read/modify global memory.  I
plan to add flags for NRC and the link chain, and then we can represent the
effect of ECF_CONST and PURE by simply adding flags.  I would thus
like to merge that code.  We do various optimizations to reduce the amount
of constraints produced, but hopefully this is not very important (or
can be implemented by special casing in the unified code).

Honza

gcc/ChangeLog:

2021-08-22  Jan Hubicka  
Martin Liska  

* ipa-modref.c (analyze_ssa_name_flags): Indirect call implies
~EAF_NOCLOBBER.

gcc/testsuite/ChangeLog:

2021-08-22  Jan Hubicka  
Martin Liska  

* gcc.dg/lto/pr101949_0.c: New test.
* gcc.dg/lto/pr101949_1.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index fafd804d4ba..549153865b8 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1700,6 +1700,15 @@ analyze_ssa_name_flags (tree name, vec 
&lattice, int depth,
   else if (gcall *call = dyn_cast  (use_stmt))
{
  tree callee = gimple_call_fndecl (call);
+
+ /* IPA PTA internally it treats calling a function as "writing" to
+the argument space of all functions the function pointer points to
+(PR101949).  We can not drop EAF_NOCLOBBER only when ipa-pta
+is on since that would allow propagation of this from -fno-ipa-pta
+to -fipa-pta functions.  */
+ if (gimple_call_fn (use_stmt) == name)
+   lattice[index].merge (~EAF_NOCLOBBER);
+
  /* Return slot optimization would require bit of propagation;
 give up for now.  */
  if (gimple_call_return_slot_opt_p (call)
diff --git a/gcc/testsuite/gcc.dg/lto/pr101949_0.c 
b/gcc/testsuite/gcc.dg/lto/pr101949_0.c
new file mode 100644
index 000..142dffe8780
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr101949_0.c
@@ -0,0 +1,20 @@
+/* { dg-lto-do run } */
+/* { dg-lto-options { "-O2 -fipa-pta -flto -flto-partition=1to1" } } */
+
+extern int bar (int (*)(int *), int *);
+
+static int x;
+
+static int __attribute__ ((noinline)) foo (int *p)
+{
+  *p = 1;
+  x = 0;
+  return *p;
+}
+
+int main ()
+{
+  if (bar (foo, &x) != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr101949_1.c 
b/gcc/testsuite/gcc.dg/lto/pr101949_1.c
new file mode 100644
index 000..871d15c9bfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr101949_1.c
@@ -0,0 +1,4 @@
+int __attribute__((noinline,noclone)) bar (int (*fn)(int *), int *p)
+{
+  return fn (p);
+}


Re: [PATCH] IPA: MODREF should skip EAF_* flags for indirect calls

2021-08-22 Thread H.J. Lu via Gcc-patches
On Sun, Aug 22, 2021 at 10:32 AM Jan Hubicka  wrote:
>
> > Thanks for looking into this bug - it is interesting that ipa-pta
> > requires !EAF_NOCLOBBER when function is called...
> >
> > I have some work done on teaching ipa-modref (and other propagation
> > passes) to use ipa-devirt info when the full set of callees is known.
> > This goes oposite way.
> >
> > You can drop flags only when callee == NAME and you can just frop
> > EAF_NOCLOBBER.  For example in testcase
> >
> > struct a {
> >   void (*foo)();
> >   void *bar;
> > }
> >
> > void wrap (struct a *a)
> > {
> >   a->foo ();
> > }
> >
> > will prevent us from figuring out that bar can not be modified when you
> > pass non-ecaping instance of struct a to wrap.
> >
>
> I am testing this updated patch which implements that.  I am not very
> happy about this (it punishes -fno-ipa-pta path for not very good
> reason), but we need bugfix for release branch.
>
> It is very easy now to add now EAF flags at modref side
> so we can track EAF_NOT_CALLED.
> tree-ssa-structalias side is always bit anoying wrt new EAF flags
> because it has three copies of the code building constraints for call
> (for normal, pure and const).
>
> Modref is already tracking if function can read/modify global memory.  I
> plan to add flags for NRC and link chain and then we can represent
> effect of ECF_CONST and PURE by simply adding flags.  I would thus would
> like to merge that code.  We do various optimizations to reduce amount
> of constriants produced, but hopefully this is not very important (or
> can be implemented by special casing in unified code).
>
> Honza
>
> gcc/ChangeLog:
>
> 2021-08-22  Jan Hubicka  
> Martin Liska  
>
> * ipa-modref.c (analyze_ssa_name_flags): Indirect call implies
> ~EAF_NOCLOBBER.
>
> gcc/testsuite/ChangeLog:
>
> 2021-08-22  Jan Hubicka  
> Martin Liska  
>
> * gcc.dg/lto/pr101949_0.c: New test.
> * gcc.dg/lto/pr101949_1.c: New test.
>
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index fafd804d4ba..549153865b8 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -1700,6 +1700,15 @@ analyze_ssa_name_flags (tree name, vec 
> &lattice, int depth,
>else if (gcall *call = dyn_cast  (use_stmt))
> {
>   tree callee = gimple_call_fndecl (call);
> +
> + /* IPA PTA internally it treats calling a function as "writing" to
> +the argument space of all functions the function pointer points 
> to
> +(PR101949).  We can not drop EAF_NOCLOBBER only when ipa-pta
> +is on since that would allow propagation of this from 
> -fno-ipa-pta
> +to -fipa-pta functions.  */
> + if (gimple_call_fn (use_stmt) == name)
> +   lattice[index].merge (~EAF_NOCLOBBER);
> +
>   /* Return slot optimization would require bit of propagation;
>  give up for now.  */
>   if (gimple_call_return_slot_opt_p (call)
> diff --git a/gcc/testsuite/gcc.dg/lto/pr101949_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr101949_0.c
> new file mode 100644
> index 000..142dffe8780
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lto/pr101949_0.c
> @@ -0,0 +1,20 @@
> +/* { dg-lto-do run } */
> +/* { dg-lto-options { "-O2 -fipa-pta -flto -flto-partition=1to1" } } */
> +
> +extern int bar (int (*)(int *), int *);
> +
> +static int x;
> +
> +static int __attribute__ ((noinline)) foo (int *p)
> +{
> +  *p = 1;
> +  x = 0;
> +  return *p;
> +}
> +
> +int main ()
> +{
> +  if (bar (foo, &x) != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/lto/pr101949_1.c 
> b/gcc/testsuite/gcc.dg/lto/pr101949_1.c
> new file mode 100644
> index 000..871d15c9bfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lto/pr101949_1.c
> @@ -0,0 +1,4 @@
> +int __attribute__((noinline,noclone)) bar (int (*fn)(int *), int *p)
> +{
> +  return fn (p);
> +}

On Linux/x86-64 with -m32, I got

FAIL: gcc.dg/lto/pr101949 c_lto_pr101949_0.o-c_lto_pr101949_1.o
execute -O2 -fipa-pta -flto -flto-partition=1to1


-- 
H.J.


Re: [C PATCH] qualifiers of pointers to arrays in C2X [PR 98397]

2021-08-22 Thread Uecker, Martin
Am Donnerstag, den 12.08.2021, 16:58 + schrieb Joseph Myers:
> On Mon, 24 May 2021, Uecker, Martin wrote:
> 
> > -  else if (VOID_TYPE_P (TREE_TYPE (type1))
> > -  && !TYPE_ATOMIC (TREE_TYPE (type1)))
> > -   {
> > - if ((TREE_CODE (TREE_TYPE (type2)) == ARRAY_TYPE)
> > - && (TYPE_QUALS (strip_array_types (TREE_TYPE (type2)))
> > - & ~TYPE_QUALS (TREE_TYPE (type1
> > -   warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers,
> > -   "pointer to array loses qualifier "
> > -   "in conditional expression");
> > -
> > - if (TREE_CODE (TREE_TYPE (type2)) == FUNCTION_TYPE)
> > +  else if ((VOID_TYPE_P (TREE_TYPE (type1))
> > +   && !TYPE_ATOMIC (TREE_TYPE (type1)))
> > +  || (VOID_TYPE_P (TREE_TYPE (type2))
> > +  && !TYPE_ATOMIC (TREE_TYPE (type2
> 
> Here you're unifying the two cases where one argument is (not a null 
> pointer constant and) a pointer to qualified or unqualified void (and the 
> other argument is not a pointer to qualified or unqualified void).  The 
> !TYPE_ATOMIC checks are because of the general rule that _Atomic is a type 
> qualifier only syntactically, so _Atomic void doesn't count as qualified 
> void for this purpose.
> 
> > +   {
> > + tree t1 = TREE_TYPE (type1);
> > + tree t2 = TREE_TYPE (type2);
> > + if (!VOID_TYPE_P (t1))
> > +  {
> > +/* roles are swapped */
> > +t1 = t2;
> > +t2 = TREE_TYPE (type1);
> > +  }
> 
> But here you don't have a TYPE_ATOMIC check before swapping.  So if t1 is 
> _Atomic void and t2 is void, the types don't get swapped.
> 
> > + /* for array, use qualifiers of element type */
> > + if (flag_isoc2x)
> > +   t2 = t2_stripped;
> > + result_type = build_pointer_type (qualify_type (t1, t2));
> 
> And then it looks to me like this will end up with _Atomic void * as the 
> result type, when a conditional expression between _Atomic void * and 
> void * should actually have type void *.
> 
> If that's indeed the case, I think the swapping needs to occur whenever t1 
> is not *non-atomic* void, so that the condition for swapping matches the 
> condition checked in the outer if.  (And of course there should be a 
> testcase for that.)
> 
> I didn't see any other issues in this version of the patch.

Committed with this change and the additional test.

Martin

> 


[PATCH] Fold sign of LSHIFT_EXPR to eliminate no-op conversions.

2021-08-22 Thread Roger Sayle

This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:

unsigned int foo(unsigned int i) {
  return (int)i << 8;
}

is currently optimized to:

unsigned int foo (unsigned int i)
{
  int i.0_1;
  int _2;
  unsigned int _4;

   [local count: 1073741824]:
  i.0_1 = (int) i_3(D);
  _2 = i.0_1 << 8;
  _4 = (unsigned int) _2;
  return _4;
}

with this patch, this now becomes:

unsigned int foo (unsigned int i)
{
  unsigned int _2;

   [local count: 1073741824]:
  _2 = i_1(D) << 8;
  return _2;
}

which generates exactly the same assembly language.  Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations.  For example,

unsigned int bar(unsigned char i) {
return (i ^ (i<<16)) | (i<<8);
}

currently gets (tangled in conversions and) optimized to:

unsigned int bar (unsigned char i)
{
  unsigned int _1;
  unsigned int _2;
  int _3;
  int _4;
  unsigned int _6;
  unsigned int _8;

   [local count: 1073741824]:
  _1 = (unsigned int) i_5(D);
  _2 = _1 * 65537;
  _3 = (int) i_5(D);
  _4 = _3 << 8;
  _8 = (unsigned int) _4;
  _6 = _2 | _8;
  return _6;
}

but with this patch, bar now optimizes down to:

unsigned int bar(unsigned char i)
{
  unsigned int _1;
  unsigned int _4;

   [local count: 1073741824]:
  _1 = (unsigned int) i_3(D);
  _4 = _1 * 65793;
  return _4;

}

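As a quick sanity check of the underlying equivalence (a standalone
brute-force test, not part of the patch), signed and unsigned left shifts
produce the same bit pattern whenever the shift does not overflow:

#include <assert.h>

int main (void)
{
  for (unsigned int i = 0; i < 1u << 16; i++)
    for (unsigned int s = 0; s < 8; s++)
      assert ((unsigned int) ((int) i << s) == (i << s));
  return 0;
}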

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-08-23  Roger Sayle  

gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.

gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.


Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 0fcfd0e..978a1b0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3385,6 +3385,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (integer_zerop (@2) || integer_all_onesp (@2))
  (cmp @0 @2)
 
+/* Both signed and unsigned lshift produce the same result, so use
+   the form that minimizes the number of conversions.  */
+(simplify
+ (convert (lshift:s@0 (convert:s@1 @2) INTEGER_CST@3))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+  && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type))
+  (lshift (convert @2) @3)))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

unsigned int foo(unsigned int i)
{
  int t1 = i;
  int t2 = t1 << 8;
  return t2;
}

int bar(int i)
{
  unsigned int t1 = i;
  unsigned int t2 = t1 << 8;
  return t2;
}

/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
/* { dg-final { scan-tree-dump-not "\\(unsigned int\\)" "optimized" } } */

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

unsigned int foo(unsigned char c)
{
  int t1 = c;
  int t2 = t1 << 8;
  return t2;
}

int bar(unsigned char c)
{
  unsigned int t1 = c;
  unsigned int t2 = t1 << 8;
  return t2;
}

/* { dg-final { scan-tree-dump-times "\\(int\\)" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 1 "optimized" } } */



Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-08-22 Thread Victor Tong via Gcc-patches
Thanks for the feedback. I updated the pattern and it passes all tests 
(existing and the new ones I wrote). I added some brackets since there were 
some warnings about missing brackets on the || and &&. Here's the updated 
pattern:

  (simplify
    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
    (if (INTEGRAL_TYPE_P (type)
         && !TYPE_OVERFLOW_SANITIZED (type)
         && INTEGRAL_TYPE_P (TREE_TYPE (@2))
         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@2))
         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
         && (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
             || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2))
                 && (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2))
                     || (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2))
                         && !TYPE_UNSIGNED (TREE_TYPE (@0)))))))
     (convert @1)))


>>> TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))

Did you mean > instead of <=? With the condition you proposed, that would
trigger the optimization in cases where values may get truncated, which I think
should be avoided for this optimization.
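
For reference, the kind of case that goes wrong without the precision and
signedness guards is the one Marc describes further down in the quoted
thread; a small standalone program (hypothetical, not part of the patch)
shows the folded and unfolded values diverging:

#include <stdio.h>
#include <stdint.h>

int main (void)
{
  unsigned int x = -1u;   /* plays the role of @0 */
  int y = 0;              /* plays the role of @1 */
  /* type = int64_t, @2 has type int, @0 is unsigned.  */
  int64_t unfolded = (int64_t) x - (int64_t) ((int) x - y);
  int64_t folded = (int64_t) y;   /* what the simplification would produce */
  printf ("%lld vs %lld\n", (long long) unfolded, (long long) folded);
  /* Prints 4294967296 vs 0 (with GCC's modulo conversion of -1u to int).  */
  return 0;
}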

>>> Maybe the new transform could be about scalars, and we could restrict the 
>>> old one to vectors, to simplify the code,

I tried limiting the existing pattern to vector types by changing it to the 
following:

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (if (VECTOR_TYPE_P(type))
   (view_convert @1)))
   
I found that the new pattern doesn't cover some cases. Specifically, it doesn't 
cover a case in pr92734-2.c:

unsigned
f10 (unsigned x, int y)
{
  unsigned a = (int) x - y;
  return x - a;
}

I think the pattern isn't triggering because of the !TYPE_UNSIGNED (TREE_TYPE 
(@0)) check. I'm slightly concerned that changing the new pattern to cover the 
existing cases would add complexity to the new pattern, making it difficult to 
understand.

I also think the new pattern could be simplified by removing the convert on @0. 
I don't think it's needed for the regression pattern that I was seeing, but I 
had added it to be more thorough so the pattern covers more cases. 

From: Richard Biener 
Sent: Monday, August 9, 2021 2:58 AM
To: Marc Glisse 
Cc: Victor Tong ; gcc-patches@gcc.gnu.org 

Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division 
followed by multiply [PR95176] 
 
On Sat, Aug 7, 2021 at 12:49 AM Marc Glisse  wrote:
>
> On Fri, 6 Aug 2021, Victor Tong wrote:
>
> > Thanks for the feedback. Here's the updated pattern:
> >
> >  /* X - (X - Y) --> Y */
> >  (simplify
> >    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
> >    (if (ANY_INTEGRAL_TYPE_P (type)
> >    && TYPE_OVERFLOW_UNDEFINED(type)
> >    && !TYPE_OVERFLOW_SANITIZED(type)
> >    && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
> >    && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
> >    && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
> >    && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
> >    && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
> >    && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
> >    && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
> >    && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
> >    (convert @1)))
> >
> > I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that 
> > leverage undefined overflows check for it. I think this new pattern 
> > shouldn't be applied if overflow sanitizer checks are enabled.
> >
> >>> why is this testing TREE_TYPE (@0)?
> >
> > I'm checking the type of @0 because I'm concerned that there could be a 
> > case where @0's type isn't an integer type with undefined overflow. I tried 
> > creating a test case and couldn't seem to create one where this is violated 
> > but I kept the checks to avoid causing a regression. If I'm being 
> > overcautious and you feel that the type checks on @0 aren't needed, I can 
> > remove them. I think the precision check on TREE_TYPE(@0) is needed to 
> > avoid truncation cases though.
>
> It doesn't matter if @0 has undefined overflow, but it can matter that it
> be signed (yes, the 2 are correlated...) if it has the same precision as
> @2. Otherwise (int64_t)(-1u)-(int64_t)((int)(-1u)-0) is definitely not 0
> and it has type:int64_t, @2:int, @0:unsigned.
>
> Ignoring the sanitizer, the confusing double matching of constants, and
> restricting to scalars, I think the tightest condition (without vrp) that
> works is
>
> TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)) ||
>   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) &&
>    (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) ||
> TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) &&
> !TYPE_UNSIGNED (TREE_TYPE (@0))
>    )
>
> (where implicitly undefined => signed) but of course it is ok to handle
> only a subset. It is too late for me to think abou

Re: [PATCH] Fold sign of LSHIFT_EXPR to eliminate no-op conversions.

2021-08-22 Thread Jeff Law via Gcc-patches




On 8/22/2021 6:25 PM, Roger Sayle wrote:

This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:

unsigned int foo(unsigned int i) {
   return (int)i << 8;
}

is currently optimized to:

unsigned int foo (unsigned int i)
{
   int i.0_1;
   int _2;
   unsigned int _4;

[local count: 1073741824]:
   i.0_1 = (int) i_3(D);
   _2 = i.0_1 << 8;
   _4 = (unsigned int) _2;
   return _4;
}

with this patch, this now becomes:

unsigned int foo (unsigned int i)
{
   unsigned int _2;

[local count: 1073741824]:
   _2 = i_1(D) << 8;
   return _2;
}

which generates exactly the same assembly language.  Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations.  For example,

unsigned int bar(unsigned char i) {
 return (i ^ (i<<16)) | (i<<8);
}

currently gets (tangled in conversions and) optimized to:

unsigned int bar (unsigned char i)
{
   unsigned int _1;
   unsigned int _2;
   int _3;
   int _4;
   unsigned int _6;
   unsigned int _8;

[local count: 1073741824]:
   _1 = (unsigned int) i_5(D);
   _2 = _1 * 65537;
   _3 = (int) i_5(D);
   _4 = _3 << 8;
   _8 = (unsigned int) _4;
   _6 = _2 | _8;
   return _6;
}

but with this patch, bar now optimizes down to:

unsigned int bar(unsigned char i)
{
   unsigned int _1;
   unsigned int _4;

[local count: 1073741824]:
   _1 = (unsigned int) i_3(D);
   _4 = _1 * 65793;
   return _4;

}


This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-08-23  Roger Sayle  

gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.

gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.
Presumably we're relying on the fact that the type of the convert:@1 has 
to be the same type as @0, thus there's no need to check anything 
related to @1.


OK
jeff



Re: [Patch] gcc.c-torture/execute: Fix tmpnam issue on Windows

2021-08-22 Thread Jeff Law via Gcc-patches




On 8/21/2021 9:10 PM, Jonathan Yong via Gcc-patches wrote:

Attached patch OK?

2021-08-22  Jonathan Yong  <10wa...@gmail.com>

gcc/testsuite/ChangLog:

* gcc.c-torture/execute/gcc_tmpnam.h: Fix tmpnam case on Windows
where it can return a filename with "\" to indicate current
directory.
* gcc.c-torture/execute/fprintf-2.c: Use wrapper.
* gcc.c-torture/execute/printf-2.c: Use wrapper.
* gcc.c-torture/execute/user-printf.c: Use wrapper.

OK
jeff



Re: [PATCH] Simplify (truncate:QI (subreg:SI (reg:QI x))) to (reg:QI x)

2021-08-22 Thread Jeff Law via Gcc-patches




On 8/19/2021 5:18 PM, Roger Sayle wrote:

Whilst working on a backend patch, I noticed that the middle-end's
RTL optimizers weren't simplifying a truncation of a paradoxical
subreg extension, though it does transform closely related (more
complex) expressions.  The main (first) part of this patch
implements this simplification, reusing much of the logic already
in place.

I briefly considered suggesting that it's difficult to provide a new
testcase for this change, but then realized the reviewer's response
would be that this type of transformation should be self-tested
in simplify-rtx, so this patch adds a bunch of tests that integer
extensions and truncations are simplified as expected.  No good
deed goes unpunished and I was equally surprised to see that we
don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
i.e. useless no-op extensions to the same mode.  So I've added
some logic to simplify (or more accurately prevent us generating
dubious RTL for) those.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and "make -k check" with no new failures.
Indeed.  I'd bet there are other weaknesses in here.  I've got some 
patches here which add overflow handling on the H8 port (attempting to 
cut the runtime of the builtin-arith-overflow-* tests).  Those end up using 
subregs and extensions fairly heavily.  While looking at how the code 
moves through the RTL pipeline, it became pretty clear that we're 
generally not doing a good job of optimizing those cases.


Thankfully I've found some sequences that allow the port to do limited 
store-flag instructions and that eliminated the need to chase this stuff 
down, at least for now.




Ok for mainline?


2021-08-20  Roger Sayle  

gcc/ChangeLog
* simplify-rtx.c (simplify_truncation): Generalize simplification
of (truncate:A (subreg:B X)).
(simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
already has the desired machine mode.
(test_scalar_int_ops): Add tests that useless extensions and
truncations are optimized away.
(test_scalar_int_ext_ops): New self-test function to confirm
that truncations of extensions are correctly simplified.
(test_scalar_int_ext_ops2): New self-test function to check
truncations of truncations, extensions of extensions, and
truncations of extensions.
(test_scalar_ops): Call the above two functions with a
representative sampling of integer machine modes.
I briefly thought you were missing a subreg_lowpart check, but that's 
checked in the outermost IF.  The comments are somewhat misleading as 
the subreg offset in a lowpart will vary based on endianness, but that's 
not a big deal IMHO,


OK
jeff



Re: [PATCH] mips: msa: truncate immediate shift amount [PR101922]

2021-08-22 Thread Jeff Law via Gcc-patches




On 8/20/2021 11:07 AM, Xi Ruoyao via Gcc-patches wrote:

When -mloongson-mmi is enabled, SHIFT_COUNT_TRUNCATED is turned off.
This causes the untruncated immediate shift amount to be output into the asm,
and the GNU assembler refuses to assemble it.

Truncate the immediate shift amount when outputting the asm instruction to
make GAS happy again.
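
As a hypothetical standalone illustration of the truncation (not the actual
mips_msa_output_shift_immediate implementation), with power-of-two element
widths the amount is simply masked down to the element size:

#include <stdio.h>

static long
truncate_shift_amount (long amount, int element_bits)
{
  return amount & (element_bits - 1);
}

int main (void)
{
  /* With 8-bit elements, an immediate shift amount of 9 is emitted as 1.  */
  printf ("%ld\n", truncate_shift_amount (9, 8));
  return 0;
}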

gcc/

PR target/101922
* config/mips/mips-protos.h (mips_msa_output_shift_immediate):
  Declare.
* config/mips/mips.c (mips_msa_output_shift_immediate): New
  function.
* config/mips/mips-msa.md (vashl<mode>3, vashr<mode>3,
  vlshr<mode>3): Call it.

gcc/testsuite/

PR target/101922
* gcc.target/mips/pr101922.c: New test.

OK.

Q. Looking out further, is it going to continue to make sense to have 
loongson continue to be based on the mips port, or is it going to make 
more sense to have a distinct loongson port?


Jeff



Re: [PATCH] Improved handling of division/modulus in bit CCP.

2021-08-22 Thread Jeff Law via Gcc-patches




On 8/22/2021 8:50 AM, Roger Sayle wrote:

This patch implements support for TRUNC_MOD_EXPR and TRUNC_DIV_EXPR
in tree-ssa's bit CCP pass.  This is mostly for completeness, as the
VRP pass already provides better bounds for these operations, but
seeing mask values of all_ones in my debugging/instrumentation logs
seemed overly pessimistic.  With this patch, the expression X%10
has a nonzero bits of 0x0f (for unsigned X), likewise (X&1)/3 has
a known value of zero, and (X&3)/3 has a nonzero bits mask of 0x1.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-08-22  Roger Sayle  

gcc/ChangeLog
* tree-ssa-ccp.c (bit_value_binop) [TRUNC_MOD_EXPR, TRUNC_DIV_EXPR]:
Provide bounds for unsigned (and signed with non-negative operands)
division and modulus.

OK
jeff



Re: [Patch] gcc.c-torture/execute: Fix tmpnam issue on Windows

2021-08-22 Thread Jonathan Yong via Gcc-patches

On 8/23/21 1:07 AM, Jeff Law wrote:



On 8/21/2021 9:10 PM, Jonathan Yong via Gcc-patches wrote:

Attached patch OK?

2021-08-22  Jonathan Yong  <10wa...@gmail.com>

gcc/testsuite/ChangLog:

* gcc.c-torture/execute/gcc_tmpnam.h: Fix tmpnam case on Windows
where it can return a filename with "\" to indicate current
directory.
* gcc.c-torture/execute/fprintf-2.c: Use wrapper.
* gcc.c-torture/execute/printf-2.c: Use wrapper.
* gcc.c-torture/execute/user-printf.c: Use wrapper.

OK
jeff



Pushed to master branch, thanks.




[PATCH] Disable slp in loop vectorizer when cost model is very-cheap.

2021-08-22 Thread liuhongt via Gcc-patches
Performance impact for the commit with option:
-march=x86-64 -O2 -ftree-vectorize -fvect-cost-model=very-cheap

SPEC2017 fprate
503.bwaves_rBuildSame
507.cactuBSSN_r -0.04
508.namd_r   0.14
510.parest_r-0.54
511.povray_r 0.10
519.lbm_r   BuildSame
521.wrf_r0.64
526.blender_r   -0.32
527.cam4_r   0.17
538.imagick_r0.09
544.nab_r   BuildSame
549.fotonik3d_r BuildSame
554.roms_r  BuildSame
997.specrand_fr -0.09
Geometric mean:  0.02

SPEC2017 intrate
500.perlbench_r  0.26
502.gcc_r0.21
505.mcf_r   -0.09
520.omnetpp_r   BuildSame
523.xalancbmk_r BuildSame
525.x264_r  -0.41
531.deepsjeng_r BuildSame
541.leela_r  0.13
548.exchange2_r BuildSame
557.xz_rBuildSame
999.specrand_ir BuildSame
Geometric mean:  0.02

EEMBC: no regressions, only improvements or identical builds; the improved
benchmarks are listed below.

mp2decoddata1   7.59
mp2decoddata2   31.80
mp2decoddata3   12.15
mp2decoddata4   11.16
mp2decoddata5   11.19
mp2decoddata1   7.06
mp2decoddata2   24.12
mp2decoddata3   10.83
mp2decoddata4   10.04
mp2decoddata5   10.07

Survived regression test.
Ok for trunk?

gcc/ChangeLog:

PR tree-optimization/100089
* tree-vectorizer.c (try_vectorize_loop_1): Disable slp in
loop vectorizer when cost model is very-cheap.
---
 gcc/tree-vectorizer.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index b9709a613d5..8a5b8735546 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1033,7 +1033,10 @@ try_vectorize_loop_1 (hash_table 
*&simduid_to_vf_htab,
 only non-if-converted parts took part in BB vectorization.  */
   if (flag_tree_slp_vectorize != 0
  && loop_vectorized_call
- && ! loop->inner)
+ && ! loop->inner
+ /* This would purely be a workaround and should be removed
+once the PR100089 is fixed.  */
+ && flag_vect_cost_model != VECT_COST_MODEL_VERY_CHEAP)
{
  basic_block bb = loop->header;
  bool require_loop_vectorize = false;
-- 
2.18.1



Re: [PATCH] Propagate get_nonzero_bits information in division [PR77980]

2021-08-22 Thread Jeff Law via Gcc-patches




On 7/26/2021 6:45 PM, Victor Tong via Gcc-patches wrote:

This change enables the "t1 != 0" check to be optimized away in this code:

int x1 = 0;
unsigned int x2 = 1;

int main ()
{
 int t1 = x1*(1/(x2+x2));
 if (t1 != 0) __builtin_abort();
 return 0;
}

The change utilizes the VRP framework to propagate the get_nonzero_bits information from the 
"x2+x2" expression to the "1/(x2+x2)" division expression. Specifically, the framework 
knows that the least significant bit of the "x2+x2" expression must be zero.
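
A quick standalone sanity check of that reasoning (not part of the patch):
x2 + x2 always has its least-significant bit clear, so whenever it is
nonzero it is at least 2 and the integer division 1 / (x2 + x2) is 0,
which is what makes t1 fold to zero above:

#include <assert.h>

int main (void)
{
  for (unsigned int x2 = 1; x2 < 100000; x2++)
    {
      unsigned int d = x2 + x2;
      assert ((d & 1) == 0);
      assert (1 / d == 0);
    }
  return 0;
}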

The get_nonzero_bits information of the left hand side and right hand side of 
expressions needed to be passed down to operator_div::wi_fold() in the VRP 
framework. The majority of this change involves adding two additional 
parameters to propagate this information. There are future opportunities to use 
the non zero bit information to perform better optimizations in other types of 
expressions.

The changes were tested against x86_64-pc-linux-gnu and all tests in "make -k 
check" passed.

The original approach was to implement a match.pd pattern to support this but 
the pattern wasn't being triggered. More context is available in: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77980
So you're going to want to sync with Aldy & Andrew as they're the 
experts on the Ranger design & implementation.  This touches the Ranger API
and raises design questions about how best to tie in the nonzero_bits
capabilities.


You might also want to reach out to Roger Sayle.  He's been poking 
around in a closely related area, though more focused on the bitwise 
conditional constant propagation rather than Ranger/VRP.  In fact, I 
just acked a patch of his that looks closely related.


https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577888.html

Jeff


Re: [RFC] middle-end: Extend CSE to understand vector extracts.

2021-08-22 Thread Jeff Law via Gcc-patches




On 1/4/2021 6:18 AM, Tamar Christina wrote:

Hi All,

I am trying to get CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
   uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
   uint64_t res = a | arr[0];
   uint64x2_t val = vld1q_u64 (arr);
   *rt = vaddq_u64 (val, b);
   return res;
}

The actual behavior is inconsequential; however, notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
 adrpx2, .LC0
 sub sp, sp, #16
 ldr q1, [x2, #:lo12:.LC0]
 mov x2, 16502
 movkx2, 0x1023, lsl 16
 movkx2, 0x4308, lsl 32
 add v1.2d, v1.2d, v0.2d
 movkx2, 0x942, lsl 48
 orr x0, x0, x2
 str q1, [x1]
 add sp, sp, 16
 ret
.LC0:
 .xword  667169396713799798
 .xword  667169396713799798

Essentially we materialize the same constant twice.  The reason is that
the front-end lowers the constant extracted from arr[0] quite early on.
If you look at the result of fre you'll find

:
   arr[0] = 667169396713799798;
   arr[1] = 667169396713799798;
   res_7 = a_6(D) | 667169396713799798;
   _16 = __builtin_aarch64_ld1v2di (&arr);
   _17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16);
   _11 = b_10(D) + _17;
   *rt_12(D) = _11;
   arr ={v} {CLOBBER};
   return res_7;

Which makes sense for further optimization.  However, come expand time, if the
constant isn't representable in the target arch, it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
 (const_vector:V2DI [
 (const_int 667169396713799798 [0x942430810234076]) repeated x2
 ])) "cse.c":7:12 -1
  (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
 (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
  (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
 (ior:DI (reg/v:DI 96 [ a ])
 (reg:DI 103))) "cse.c":8:12 -1
  (nil))
So I think the key here is to be able to hash the elements of the 
const_vector to the same value as the const_int.  If you can hash them 
the same, they'll be seen as common subexpressions regardless of the 
order in which the insns appear.
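
A toy illustration of that invariant (a simplified stand-in hashing rule, not
GCC's hash_rtx): if an element of a vector constant is hashed with exactly the
same rule as the equivalent scalar constant, both land in the same bucket, so
the lookup can pair them up whichever insn is processed first.

#include <cassert>
#include <stdint.h>

/* Stand-in hashing rule for a scalar integer constant.  */
static unsigned
hash_scalar_const (uint64_t c)
{
  return static_cast<unsigned> (c ^ (c >> 32));
}

/* Hash element I of a vector constant by reusing the scalar rule, so the
   element and the equivalent scalar constant collide.  */
static unsigned
hash_vector_elt (const uint64_t *vec, unsigned i)
{
  return hash_scalar_const (vec[i]);
}

int
main ()
{
  const uint64_t vec[2] = { 0x0942430810234076ULL, 0x0942430810234076ULL };
  assert (hash_vector_elt (vec, 0) == hash_scalar_const (0x0942430810234076ULL));
  return 0;
}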





My current patch for CSE is:

diff --git a/gcc/cse.c b/gcc/cse.c
index 36bcfc354d8..3cee53bed85 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "rtl-iter.h"
  #include "regs.h"
  #include "function-abi.h"
+#include "expr.h"

  /* The basic idea of common subexpression elimination is to go
 through the code, keeping a record of expressions that would
@@ -4306,6 +4307,20 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
  someplace else, so it isn't worth cse'ing.  */
else if (GET_CODE (SET_SRC (x)) == CALL)
 ;
+  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
+   {
+ /* First register the vector itself.  */
+ sets[n_sets++].rtl = x;
+ rtx src = SET_SRC (x);
+ machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
+  /* Go over the constants of the CONST_VECTOR in forward order, to
+put them in the same order in the SETS array.  */
+ for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
+   {
+ rtx y = gen_rtx_SUBREG (elem_mode, SET_DEST (x), i);
+ sets[n_sets++].rtl = PATTERN (gen_move_insn (y, CONST_VECTOR_ELT (src, i)));
+   }
+   }
else
 sets[n_sets++].rtl = x;
  }
@@ -4545,7 +4560,14 @@ cse_insn (rtx_insn *insn)
struct set *sets = (struct set *) 0;

if (GET_CODE (x) == SET)
-sets = XALLOCA (struct set);
+{
+  /* For CONST_VECTOR we want to be able to CSE the vector itself along with
+     elements inside the vector if the target says it's cheap.  */
+  if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
+   sets = XALLOCAVEC (struct set, const_vector_encoded_nelts (SET_SRC (x)) + 1);
+  else
+   sets = XALLOCA (struct set);
+}
else if (GET_CODE (x) == PARALLEL)
  sets = XALLOCAVEC (struct set, XVECLEN (x, 0));

--

This extends the set records that CSE operates on so that they contain not only
the CONST_VECTOR but also the individual elements of the vector.
Seems conceptually reasonable.  You probably want something similar to 
allow you to replace those elements in the vector as well.




For each element I generate new RTL which models it as a constant being set
into a subreg of the original vector at the index of the element in the vector.

This is so that the SRC is the constant we want to CSE and the DEST contains the
SUBREG to extract from the vector.

It works as expected, the testc

Re: [PATCH] Use _GLIBCXX_ASSERTIONS as _GLIBCXX_DEBUG light

2021-08-22 Thread François Dumont via Gcc-patches

Any feedback ?

Thanks

On 08/08/21 9:34 pm, François Dumont wrote:

After further testing, here is a fixed version which implies fewer changes.

Moreover, I already committed the fixes unrelated to this patch.

    libstdc++: [_GLIBCXX_ASSERTIONS] Activate basic debug checks

    libstdc++-v3/ChangeLog:

    * include/bits/stl_algobase.h (equal): Use runtime-only 
_GLIBCXX_DEBUG check.
    * include/bits/stl_iterator.h [_GLIBCXX_ASSERTIONS]: 
Include .
    * include/debug/debug.h [_GLIBCXX_ASSERTIONS]: Define 
debug macros non-empty. Most of

    the time do a simple valid_range check.
    * include/debug/helper_functions.h: Cleanup comment about 
removed _Iter_base.
    (__gnu_debug::__valid_range): Add __skip_if_constexpr 
parameter and skip check when true

    and in a constexpr context.
    * include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY): Define 
as __glibcxx_assert when only

    _GLIBCXX_ASSERTIONS is defined.
    (__glibcxx_check_valid_range): Add _SkipIfConstexpr 
parameter.

    (__glibcxx_check_can_increment_range): Likewise.
    * include/debug/safe_iterator.h (__valid_range): Adapt.
    * include/debug/safe_local_iterator.h (__valid_range): Adapt.
    * testsuite/24_iterators/istream_iterator/1.cc (test01): 
Skip iterator increment when

    _GLIBCXX_ASSERTIONS is defined.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New test.
    * testsuite/25_algorithms/heap/1.cc: Skip operation 
complexity checks when _GLIBCXX_ASSERTIONS

    is defined.

Ok to commit ?

François


On 06/08/21 4:52 pm, François Dumont wrote:

On 07/06/21 6:25 am, François Dumont wrote:

On 03/06/21 2:31 pm, Jonathan Wakely wrote:



+  }
+
  /* Checks that [first, last) is a valid range, and then returns
   * __first. This routine is useful when we can't use a separate
   * assertion statement because, e.g., we are in a constructor.
@@ -260,8 +279,9 @@ namespace __gnu_debug
    inline bool
    __check_sorted(const _InputIterator& __first, const 
_InputIterator& __last)

    {
-  return __check_sorted_aux(__first, __last,
-    std::__iterator_category(__first));
+  return __skip_debug_runtime_check()
+    || __check_sorted_aux(__first, __last,
+  std::__iterator_category(__first));


Currently this function is never called at all ifndef _GLIBCXX_DEBUG.
With this change, it's going to be present for _GLIBCXX_ASSERTIONS,
and if it isn't inlined it's going to explode the code size.

Some linux distros are already building the entire distro with
_GLIBCXX_ASSERTIONS so I think we need to be quite careful about this
kind of large change affecting every algo.

So maybe we shouldn't enable these checks via _GLIBCXX_ASSERTIONS, but
a new macro.


_GLIBCXX_DEBUG is already rarely used, so a new macro would just be yet another mode.

So let's forget about all this, thanks.

I now wonder whether your feedback was limited to the use of 
__check_sorted and perhaps some other checks.


So here is another proposal which activates a small subset of the 
_GLIBCXX_DEBUG checks in _GLIBCXX_ASSERTIONS but with far less code.


First, the _Error_formatter is not used; the injected checks simply use 
__glibcxx_assert.


Second, I reduced the number of activated checks, mostly to the 
__valid_range check.


I also enhanced the __valid_range check for constexpr contexts, because 
sometimes the normal implementation is good enough to let the compiler 
diagnose a potential issue there. This is for example the case for 
the std::equal implementation, whereas the std::copy implementation is 
too defensive.
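
A minimal sketch of that constexpr handling, assuming
__builtin_is_constant_evaluated is available (the helper name below is made
up; this is not the actual libstdc++ code): the cheap check runs only at run
time, and during constant evaluation the compiler's own constexpr diagnostics
are left to report the invalid access, as happens naturally for std::equal.

template<typename _Tp>
  constexpr bool
  __sketch_valid_range(const _Tp* __first, const _Tp* __last)
  {
    if (__builtin_is_constant_evaluated())
      return true;              // constant evaluation: compiler diagnoses misuse
    return __first <= __last;   // run time: basic ordering sanity check
  }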


    libstdc++: [_GLIBCXX_ASSERTIONS] Activate basic debug checks

    libstdc++-v3/ChangeLog:

    * include/bits/stl_algobase.h (equal): Use runtime-only 
_GLIBCXX_DEBUG check.
    * include/bits/stl_iterator.h [_GLIBCXX_ASSERTIONS]: 
Include .
    * include/debug/debug.h [_GLIBCXX_ASSERTIONS]: Define 
debug macros non-empty. Most of

    the time do a simple valid_range check.
    * include/debug/helper_functions.h: Cleanup comment about 
removed _Iter_base.
    (__valid_range): Add __skip_if_constexpr parameter and 
skip check when in a constexpr

    context.
    * include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY): Define 
as __glibcxx_assert when only

    _GLIBCXX_ASSERTIONS is defined.
    (__glibcxx_check_valid_range): Add _SkipIfConstexpr 
parameter.

    (__glibcxx_check_can_increment_range): Likewise.
    * testsuite/24_iterators/istream_iterator/1.cc (test01): 
Skip iterator increment when

    _GLIBCXX_ASSERTIONS is defined.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New test.
    * testsuite/25_algorithms/heap/1.cc: Skip operation 
complexity checks when _GLIBCXX_ASSERTIONS

    is defined.
    * 
testsuite/25_algorithms/lower_bound/debug/con

Re: [Patch][doc][PR101843]clarification on building gcc and binutils together

2021-08-22 Thread Jeff Law via Gcc-patches




On 8/19/2021 4:27 PM, Qing Zhao via Gcc-patches wrote:

Hi,

This patch is on behalf of John Henning, who opened PR 101843:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101843

He proposed the following doc change; please take a look and let me know 
whether it is okay for commit.
I think we need to get away from suggesting single tree builds. Instead, 
what we should recommend is to identify the right version of binutils, 
build and install it into the same prefix where GCC will be installed, and 
then build and install GCC.


jeff



Re: [PATCH] Fix ICE when mixing VLAs and statement expressions [PR91038]

2021-08-22 Thread Uecker, Martin
Am Montag, den 16.08.2021, 06:49 +0200 schrieb Martin Uecker:
> Am Montag, den 16.08.2021, 00:30 -0400 schrieb Jason Merrill:
> > On 8/1/21 1:36 PM, Uecker, Martin wrote:
> > > Here is an attempt to fix some old and annoying bugs related
> > > to VLAs and statement expressions. In particular, this seems
> > > to fix the issues with variably-modified types which are
> > > returned from statement expressions (which works on clang),
> > > but there are still bugs remaining related to structs
> > > with VLA members (which seems to be a FE bug).
> > > 
> > > Of course, I might be doing something stupid...
> > > 
> > > The patch survives bootstrapping and regresstion testing
> > > on x86_64.
> > 
> > Including Ada?
> 
> It broke PLACEHOLDER_EXPRs as you pointed out below.
> 
> Please take a look at the new version:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577402.html


Richard,

any comments on this version of the patch?

Thank you!

Martin



> Martin
> 
> 
> 
> > > Fix ICE when mixing VLAs and statement expressions [PR91038]
> > > 
> > > When returning VM-types from statement expressions, this can
> > > lead to an ICE when declarations from the statement expression
> > > are referred to later. Some of these issues can be addressed by
> > > gimplifying the base expression earlier in gimplify_compound_lval.
> > > This fixes PR91038 and some of the test cases from PR29970
> > > (structs with VLA members need further work).
> > > 
> > >  
> > >  2021-08-01  Martin Uecker  
> > >  
> > >  gcc/
> > >   PR c/91038
> > >   PR c/29970
> > >   * gimplify.c (gimplify_var_or_parm_decl): Update comment.
> > >   (gimplify_compound_lval): Gimplify base expression first.
> > >  
> > >  gcc/testsuite/
> > >   PR c/91038
> > >   PR c/29970
> > >   * gcc.dg/vla-stexp-01.c: New test.
> > >   * gcc.dg/vla-stexp-02.c: New test.
> > >   * gcc.dg/vla-stexp-03.c: New test.
> > >   * gcc.dg/vla-stexp-04.c: New test.
> > > 
> > > 
> > > diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> > > index 21ff32ee4aa..885d5f73585 100644
> > > --- a/gcc/gimplify.c
> > > +++ b/gcc/gimplify.c
> > > @@ -2839,7 +2839,10 @@ gimplify_var_or_parm_decl (tree *expr_p)
> > >declaration, for which we've already issued an error.  It would
> > >be really nice if the front end wouldn't leak these at all.
> > >Currently the only known culprit is C++ destructors, as seen
> > > - in g++.old-deja/g++.jason/binding.C.  */
> > > + in g++.old-deja/g++.jason/binding.C.
> > > + Another culprit is size expressions for variably modified
> > > + types which are lost in the FE or not gimplified correctly.
> > > +  */
> > > if (VAR_P (decl)
> > > && !DECL_SEEN_IN_BIND_EXPR_P (decl)
> > > && !TREE_STATIC (decl) && !DECL_EXTERNAL (decl)
> > > @@ -2984,9 +2987,23 @@ gimplify_compound_lval (tree *expr_p, gimple_seq 
> > > *pre_p, gimple_seq
> > > *post_p,
> > >expression until we deal with any variable bounds, sizes, or
> > >positions in order to deal with PLACEHOLDER_EXPRs.
> > >   
> > > - So we do this in three steps.  First we deal with the annotations
> > > - for any variables in the components, then we gimplify the base,
> > > - then we gimplify any indices, from left to right.  */
> > > + So we do this in three steps.  First we gimplify the base,
> > > + then we deal with the annotations for any variables in the
> > > + components, then we gimplify any indices, from left to right.
> > > +
> > > + The base expression may contain a statement expression that
> > > + has declarations used in size expressions, so has to be
> > > + gimplified first. */
> > 
> > The previous paragraph says,
> > 
> > >   But we can't gimplify the
> > > inner   
> > >  expression until we deal with any variable bounds, sizes,
> > > or  
> > >  positions in order to deal with PLACEHOLDER_EXPRs. 
> > 
> > so I would expect your change to break examples that the current code 
> > was designed to handle.  The change to delay gimplifying the inner 
> > expression was in r0-59131 (SVN r83474), by Richard Kenner.  But there 
> > aren't any testcases in that commit.  Richard, any insight?  Can you 
> > review this patch?
> 
> 
> > > +  /* Step 1 is to gimplify the base expression.  Make sure lvalue is set
> > > + so as to match the min_lval predicate.  Failure to do so may result
> > > + in the creation of large aggregate temporaries.  */
> > > +  tret = gimplify_expr (p, pre_p, post_p, is_gimple_min_lval,
> > > + fallback | fb_lvalue);
> > > +
> > > +  ret = MIN (ret, tret);
> > > +
> > > +
> > > for (i = expr_stack.length () - 1; i >= 0; i--)
> > >   {
> > > tree t = expr_stack[i];
> > > @@ -3076,12 +3093,6 @@ gimplify_compound_lval (tree *expr_p, gimple_seq 
> > > *pre_p, gimple_seq
> > >

Re: [PATCH] mips: msa: truncate immediate shift amount [PR101922]

2021-08-22 Thread Xi Ruoyao via Gcc-patches
On Sun, 2021-08-22 at 19:21 -0600, Jeff Law wrote:
> 
> 
> On 8/20/2021 11:07 AM, Xi Ruoyao via Gcc-patches wrote:
> > When -mloongson-mmi is enabled, SHIFT_COUNT_TRUNCATED is turned off.
> > This causes an untruncated immediate shift amount to be output into the
> > asm, and the GNU assembler refuses to assemble it.
> > 
> > Truncate the immediate shift amount when outputting the asm instruction
> > to make GAS happy again.
> > 
> > gcc/
> > 
> > PR target/101922
> > * config/mips/mips-protos.h
> > (mips_msa_output_shift_immediate):
> >   Declare.
> > * config/mips/mips.c (mips_msa_output_shift_immediate): New
> >   function.
> > * config/mips/mips-msa.md (vashl3, vashr3,
> >   vlshr3): Call it.
> > 
> > gcc/testsuite/
> > 
> > PR target/101922
> > * gcc.target/mips/pr101922.c: New test.
> OK.

Committed @ f93f0868919.
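
For reference, the truncation the quoted commit message describes boils down
to masking the immediate by the element width minus one before printing it; a
stand-alone sketch (hypothetical helper, not the committed
mips_msa_output_shift_immediate):

#include <cassert>

static long
truncate_shift_amount (long imm, int element_bits)
{
  return imm & (element_bits - 1);   /* e.g. 70 & 63 == 6 for 64-bit elements */
}

int
main ()
{
  assert (truncate_shift_amount (70, 64) == 6);
  assert (truncate_shift_amount (3, 8) == 3);
  return 0;
}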

> Q. Looking out further, is it going to continue to make sense to have
> a distinct loongson port?

The latest Loongson processors (branded Loongson 3A5000 for desktop,
3B5000 for workstation and server, and 3C5000L for server) have moved
away from MIPS to a new RISC architecture named "LoongArch".  Its design
learnt some traits from MIPSr6 and RISC-V I think, but it's not a simple
MIPS variant and will need a new port for GCC.  A manual of LoongArch
basic instructions is at
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html.
LoongArch also has 128- and 256-bit vector instructions, but the manual
is not published yet.

A team from Loongson is working on the port, the (experimental) source
code is available at
https://github.com/loongson/gcc/commits/loongarch-12.  It's not ready
for upstream reviewing yet.

For "legacy" Loongson processors using MIPS, I suggest to keep the
support as a MIPS extension.  I'll try to keep it in an "usable" state
(i. e. fix, or at least workaround ICE and bad assemble code like this).
If one day we can't maintain it anymore we'd have to sadly deprecate and
remove Loongson MMI support.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University