Re: [3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-13 Thread Richard Biener
On Thu, 12 Nov 2020, Joel Hutton wrote:

> Hi all,
> 
> This patch adds support in the aarch64 backend for the vec_widen_shift 
> vect-pattern and makes a minor mid-end fix to support it.
> 
> All 3 patches together bootstrapped and regression tested on aarch64.
> 
> Ok for stage 1?

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f12fd158b13656ee24022ec7e445c53444be6554..1f40b59c0560eec675af1d9a0e3e818d47589de6 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4934,8 +4934,13 @@ vectorizable_conversion (vec_info *vinfo,
 &vec_oprnds1);
   if (code == WIDEN_LSHIFT_EXPR)
{
- vec_oprnds1.create (ncopies * ninputs);
- for (i = 0; i < ncopies * ninputs; ++i)
+ int oprnds_size = ncopies * ninputs;
+ /* In the case of SLP ncopies = 1, so the size of vec_oprnds1 here
+  * should be obtained by the size of vec_oprnds0.  */

You should be able to always use vec_oprnds0.length ()

This hunk is OK with that change.

+ if (slp_node)
+   oprnds_size = vec_oprnds0.length ();
+ vec_oprnds1.create (oprnds_size);
+ for (i = 0; i < oprnds_size; ++i)
vec_oprnds1.quick_push (op1);
}
   /* Arguments are ready.  Create the new vector stmts.  */

> 
> gcc/ChangeLog:
> 
> 2020-11-12  Joel Hutton
> 
>         * config/aarch64/aarch64-simd.md: vec_widen_lshift_hi/lo patterns
>         * tree-vect-stmts.c
>         (vectorizable_conversion): Fix for widen_lshift case
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-12  Joel Hutton
> 
>         * gcc.target/aarch64/vect-widen-lshift.c: New test.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: Detect EAF flags in ipa-modref

2020-11-13 Thread Richard Biener
On Tue, 10 Nov 2020, Jan Hubicka wrote:

> Hi,
> here is the updated patch.
> 
> Honza
> 
> Bootstrapped/regtested x86_64-linux, OK (after the fnspec fixes)?

OK.

Thanks,
Richard.

> 
> 2020-11-10  Jan Hubicka  
> 
>   * gimple.c: Include ipa-modref-tree.h and ipa-modref.h.
>   (gimple_call_arg_flags): Use modref to determine flags.
>   * ipa-modref.c: Include gimple-ssa.h, tree-phinodes.h,
>   tree-ssa-operands.h, stringpool.h and tree-ssanames.h.
>   (analyze_ssa_name_flags): Declare.
>   (modref_summary::useful_p): Summary is also useful if arg flags are
>   known.
>   (dump_eaf_flags): New function.
>   (modref_summary::dump): Use it.
>   (get_modref_function_summary): Be ready for current_function_decl
>   being NULL.
>   (memory_access_to): New function.
>   (deref_flags): New function.
>   (call_lhs_flags): New function.
>   (analyze_parms): New function.
>   (analyze_function): Use it.
>   * ipa-modref.h (struct modref_summary): Add arg_flags.
>   * doc/invoke.texi (ipa-modref-max-depth): Document.
>   * params.opt (ipa-modref-max-depth): New param.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-10  Jan Hubicka  
> 
>   * gcc.dg/torture/pta-ptrarith-1.c: Escape parameters.
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d2a188d7c75..0bd76d2841e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12953,6 +12953,10 @@ memory locations using the mod/ref information.  This parameter ought to be
>  bigger than @option{--param ipa-modref-max-bases} and @option{--param
>  ipa-modref-max-refs}.
>  
> +@item ipa-modref-max-depth
> +Specifies the maximum depth of DFS walk used by modref escape analysis.
> +Setting to 0 disables the analysis completely.
> +
>  @item profile-func-internal-id
>  A parameter to control whether to use function internal id in profile
>  database lookup. If the value is 0, the compiler uses an id that
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 1afed88e1f1..da90716aa23 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "asan.h"
>  #include "langhooks.h"
>  #include "attr-fnspec.h"
> +#include "ipa-modref-tree.h"
> +#include "ipa-modref.h"
>  
>  
>  /* All the tuples have their operand vector (if present) at the very bottom
> @@ -1532,24 +1534,45 @@ int
>  gimple_call_arg_flags (const gcall *stmt, unsigned arg)
>  {
>attr_fnspec fnspec = gimple_call_fnspec (stmt);
> -
> -  if (!fnspec.known_p ())
> -return 0;
> -
>int flags = 0;
>  
> -  if (!fnspec.arg_specified_p (arg))
> -;
> -  else if (!fnspec.arg_used_p (arg))
> -flags = EAF_UNUSED;
> -  else
> +  if (fnspec.known_p ())
>  {
> -  if (fnspec.arg_direct_p (arg))
> - flags |= EAF_DIRECT;
> -  if (fnspec.arg_noescape_p (arg))
> - flags |= EAF_NOESCAPE;
> -  if (fnspec.arg_readonly_p (arg))
> - flags |= EAF_NOCLOBBER;
> +  if (!fnspec.arg_specified_p (arg))
> + ;
> +  else if (!fnspec.arg_used_p (arg))
> + flags = EAF_UNUSED;
> +  else
> + {
> +   if (fnspec.arg_direct_p (arg))
> + flags |= EAF_DIRECT;
> +   if (fnspec.arg_noescape_p (arg))
> + flags |= EAF_NOESCAPE;
> +   if (fnspec.arg_readonly_p (arg))
> + flags |= EAF_NOCLOBBER;
> + }
> +}
> +  tree callee = gimple_call_fndecl (stmt);
> +  if (callee)
> +{
> +  cgraph_node *node = cgraph_node::get (callee);
> +  modref_summary *summary = node ? get_modref_function_summary (node)
> + : NULL;
> +
> +  if (summary && summary->arg_flags.length () > arg)
> + {
> +   int modref_flags = summary->arg_flags[arg];
> +
> +   /* We have possibly optimized out load.  Be conservative here.  */
> +   if (!node->binds_to_current_def_p ())
> + {
> +   if ((modref_flags & EAF_UNUSED) && !(flags & EAF_UNUSED))
> + modref_flags &= ~EAF_UNUSED;
> +   if ((modref_flags & EAF_DIRECT) && !(flags & EAF_DIRECT))
> + modref_flags &= ~EAF_DIRECT;
> + }
> +   flags |= modref_flags;
> + }
>  }
>return flags;
>  }
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 3f46bebed3c..30e76580fb0 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -61,6 +61,15 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-fnsummary.h"
>  #include "attr-fnspec.h"
>  #include "symtab-clones.h"
> +#include "gimple-ssa.h"
> +#include "tree-phinodes.h"
> +#include "tree-ssa-operands.h"
> +#include "ssa-iterators.h"
> +#include "stringpool.h"
> +#include "tree-ssanames.h"
> +
> +static int analyze_ssa_name_flags (tree name,
> +vec &known_flags, int depth);
>  
>  /* We record fnspec specifiers for call edges since they depends on actual
> gimple statements.  */
> @@ -186,6 +195,8 @@ modref_summary::useful_p (int ecf_flags)

[00/23] Make fwprop use an on-the-side RTL SSA representation

2020-11-13 Thread Richard Sandiford via Gcc-patches
Just after GCC 10 stage 1 closed (oops), I posted a patch to add a new
combine pass.  One of its main aims was to allow instructions to move
around where necessary in order to make a combination possible.
It also tried to parallelise instructions that use the same resource.

That pass contained its own code for maintaining limited def-use chains.
When I posted the patch, Segher asked why we wanted yet another piece
of pass-specific code to do that.  Although I had specific reasons
(which I explained at the time) I've gradually come round to agreeing
that that was a flaw.

This series of patches is the result of a Covid-time project to add
a more general, pass-agnostic framework.  There are two parts:
adding the framework itself, and using it to make fwprop.c faster.

The framework part
------------------

The framework provides an optional, on-the-side SSA view of existing
RTL instructions.  Each instruction gets a list of definitions and a
list of uses, with each use having a single definition.  Phi nodes
handle cases in which there are multiple possible definitions of a
register on entry to a basic block.  There are also routines for
updating instructions while keeping the SSA representation intact.

The aim is only to provide a different view of existing RTL instructions.
Unlike gimple, and unlike (IIRC) the old RTL SSA project from way back,
the new framework isn't a “native” SSA representation.  This means that
all inputs to a phi node for a register R are also definitions of
register R; no move operation is “hidden” in the phi node.

Like gimple, the framework treats memory as a single unified resource.
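
As a rough illustration of the shape described above, here is a minimal sketch of an on-the-side def/use view.  All names (ssa_def, ssa_use, ssa_insn, phi_inputs_match) are hypothetical and are not taken from the patch series; the real classes are described in the doc patch.  The sketch only models two properties from the text: each use has a single definition, and every input to a phi for register R is itself a definition of R.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of an on-the-side SSA view.
// None of these names come from the patch series.
struct ssa_insn;

struct ssa_def
{
  unsigned regno;                     // register being defined
  ssa_insn *owner;                    // defining instruction, null for a phi
  std::vector<ssa_def *> phi_inputs;  // nonempty only for phi nodes
};

struct ssa_use
{
  unsigned regno;         // register being read
  ssa_def *reaching_def;  // exactly one definition per use
};

struct ssa_insn
{
  int uid;
  std::vector<ssa_def *> defs;
  std::vector<ssa_use> uses;
};

// Check the property stated in the text: all inputs to a phi for
// register R are themselves definitions of register R.
bool
phi_inputs_match (const ssa_def &phi)
{
  for (const ssa_def *input : phi.phi_inputs)
    if (input->regno != phi.regno)
      return false;
  return true;
}

// Two instructions define register 1 on different paths; a phi merges
// them and a third instruction uses the phi result.
bool
build_example ()
{
  ssa_insn i1{1, {}, {}}, i2{2, {}, {}}, i3{3, {}, {}};
  ssa_def d1{1, &i1, {}};
  ssa_def d2{1, &i2, {}};
  i1.defs.push_back (&d1);
  i2.defs.push_back (&d2);
  ssa_def phi{1, nullptr, {&d1, &d2}};
  i3.uses.push_back (ssa_use{1, &phi});
  return phi_inputs_match (phi) && i3.uses[0].reaching_def == &phi;
}
```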

A more in-depth summary is contained in the doc patch, but some
other random notes:

* At the moment, the SSA information is local to one pass, but it might
  be good to maintain it between passes in future.

* The SSA code groups blocks into extended basic blocks, with the
  EBBs rather than individual blocks having phi nodes.  

* The framework also provides live range information for registers
  within an extended basic block and allows instructions to move within
  their EBB.  It might be useful to allow further movement in future;
  I just don't have a use case for it yet.

* One advantage of the new infrastructure is that it gives
  recog_for_combine-like behaviour: if recog wants to add clobbers
  of things like the flags register, the SSA code will make sure
  that the flags register is free.

* All current queries and updates have amortised sublinear complexity.
  Some updates are done lazily in order to avoid an upfront linear cost.

* I've tried to optimise the code for both memory footprint and
  compile time.  The first part involves quite a bit of overloading
  of pointers and various other kinds of reuse, so most of the new data
  structures use private member variables and public accessor functions.
  I know that style isn't universally popular, but I think it's
  justified here.  Things could easily go wrong if passes tried
  to operate directly on the underlying data structures.

* Debug instructions get SSA information too, on a best-effort basis.
  Providing complete information would be significantly more expensive.

* I wasn't sure for new C++ code whether to stick to the old C /* … */
  comments, or whether to switch to //.  In the end I went for //,
  on the basis that:

  - The ranger code already does this.

  - // is certainly more idiomatic in C++.

  - // is in the lisp tradition of per-line comments and it matches the
    ;; used in .md files.  I feel sure that GCC would have been written
    using // from the outset if that had been possible.

  The patches only do this for new files.  The aim is to ensure that
  each file is at least self-consistent.

Using RTL SSA to make fwprop faster
-----------------------------------

In order to show the thing in action, I tried to port fwprop.c
to use RTL SSA while preserving the pass's current heuristics as
much as possible.

To get an extreme measurement of speed, I made each fwprop pass
run 5000 times, calling:

  df_finish_pass (false);

after each iteration.  Usually only the first iteration would actually
do any optimisation; the other iterations would simply test the cost of
the instruction processing.  In the case of the “old” pass, this included:

  - df_analyze (including solving the notes and md problems)
  - the dominator walk to build a list of single definitions

In the case of the “new” pass, this included:

  - df_analyze (with no additional problems)
  - building the SSA representation

On an --enable-checking=release compiler, the post-patch version was 23%
faster than the pre-patch version when compiling simplify-rtx.ii at -O.

When compiling simplify-rtx.ii at -O normally (without the hack above),
the compile-time improvement is ~0.5% (which was outside the noise).
The assembly output was unchanged.

Testing
-------

Tested so far on aarch64-linux-gnu, arm-linux-gnueabihf and
x86_64-linux-gnu.  I'll test on p

[01/23] vec: Silence clang warning

2020-11-13 Thread Richard Sandiford via Gcc-patches
I noticed during compatibility testing that clang warns that this
operator won't be implicitly const in C++14 onwards.

gcc/
* vec.h (vnull::operator vec): Make const.
---
 gcc/vec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/vec.h b/gcc/vec.h
index 14d77e87342..f02beddc975 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -540,7 +540,7 @@ vec_copy_construct (T *dst, const T *src, unsigned n)
 struct vnull
 {
   template <typename T, typename A, typename L>
-  CONSTEXPR operator vec<T, A, L> () { return vec<T, A, L>(); }
+  CONSTEXPR operator vec<T, A, L> () const { return vec<T, A, L>(); }
 };
 extern vnull vNULL;
 
-- 
2.17.1



[03/23] reginfo: Add a global_reg_set

2020-11-13 Thread Richard Sandiford via Gcc-patches
A later patch wants to use the set of global registers as a HARD_REG_SET
rather than a bool/char array.  Most other arrays already have a
HARD_REG_SET counterpart, but this one didn't.
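
For readers unfamiliar with the pattern, here is a minimal sketch of keeping a set-valued counterpart in step with a per-register flag array.  It uses std::bitset as a stand-in for HARD_REG_SET, and the names NUM_HARD_REGS, global_regs_model, global_reg_set_model, globalize_reg_model and any_global_in are all hypothetical; it is not the GCC code.

```cpp
#include <bitset>
#include <cassert>

// Stand-in for FIRST_PSEUDO_REGISTER; the value is arbitrary.
constexpr unsigned NUM_HARD_REGS = 64;

// Per-register flag array, like global_regs.
char global_regs_model[NUM_HARD_REGS];

// Set-valued counterpart, standing in for a HARD_REG_SET.
std::bitset<NUM_HARD_REGS> global_reg_set_model;

// Mark a register as global, updating both representations together.
void
globalize_reg_model (unsigned regno)
{
  global_regs_model[regno] = 1;
  global_reg_set_model.set (regno);
}

// With the set form, "does REGS contain any global register?" becomes
// a single set intersection instead of a loop over the array.
bool
any_global_in (const std::bitset<NUM_HARD_REGS> &regs)
{
  return (regs & global_reg_set_model).any ();
}
```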

gcc/
* hard-reg-set.h (global_reg_set): Declare.
* reginfo.c (global_reg_set): New variable.
(init_reg_sets_1, globalize_reg): Update it when globalizing
registers.
---
 gcc/hard-reg-set.h | 2 ++
 gcc/reginfo.c  | 5 +
 2 files changed, 7 insertions(+)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index 1ec1b4e4aa0..787da3a4f02 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -359,6 +359,8 @@ hard_reg_set_iter_next (hard_reg_set_iterator *iter, unsigned *regno)
 
 extern char global_regs[FIRST_PSEUDO_REGISTER];
 
+extern HARD_REG_SET global_reg_set;
+
 class simplifiable_subreg;
 class subreg_shape;
 
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index e34b74af9f1..cc7d17460eb 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -91,6 +91,9 @@ static const char initial_call_used_regs[] = CALL_USED_REGISTERS;
and are also considered fixed.  */
 char global_regs[FIRST_PSEUDO_REGISTER];
 
+/* The set of global registers.  */
+HARD_REG_SET global_reg_set;
+
 /* Declaration for the global register. */
 tree global_regs_decl[FIRST_PSEUDO_REGISTER];
 
@@ -390,6 +393,7 @@ init_reg_sets_1 (void)
{
  fixed_regs[i] = call_used_regs[i] = 1;
  SET_HARD_REG_BIT (fixed_reg_set, i);
+ SET_HARD_REG_BIT (global_reg_set, i);
}
 }
 
@@ -724,6 +728,7 @@ globalize_reg (tree decl, int i)
 
   global_regs[i] = 1;
   global_regs_decl[i] = decl;
+  SET_HARD_REG_BIT (global_reg_set, i);
 
   /* If we're globalizing the frame pointer, we need to set the
  appropriate regs_invalidated_by_call bit, even if it's already
-- 
2.17.1



[02/23] rtlanal: Remove noop_move_p REG_EQUAL condition

2020-11-13 Thread Richard Sandiford via Gcc-patches
noop_move_p currently keeps any instruction that has a REG_EQUAL
note, on the basis that the equality might be useful in future.
But this creates a perverse incentive not to add potentially-useful
REG_EQUAL notes, in case they prevent an instruction from later being
removed as dead.

The condition originates from flow.c:life_analysis_1 and predates
the changes tracked by the current repository (1992).  It probably
made sense when most optimisations were done on RTL rather than FE
trees, but it seems counterproductive now.

gcc/
* rtlanal.c (noop_move_p): Don't check for REG_EQUAL notes.
---
 gcc/rtlanal.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 01130a10783..6f521503c39 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1668,10 +1668,6 @@ noop_move_p (const rtx_insn *insn)
   if (INSN_CODE (insn) == NOOP_MOVE_INSN_CODE)
 return 1;
 
-  /* Insns carrying these notes are useful later on.  */
-  if (find_reg_note (insn, REG_EQUAL, NULL_RTX))
-return 0;
-
   /* Check the code to be executed for COND_EXEC.  */
   if (GET_CODE (pat) == COND_EXEC)
 pat = COND_EXEC_CODE (pat);
-- 
2.17.1



[04/23] Move iterator_range to a new iterator-utils.h file

2020-11-13 Thread Richard Sandiford via Gcc-patches
A later patch will add more iterator-related utilities.  Rather than
putting them all directly in coretypes.h, it seemed better to add a
new header file, here called "iterator-utils.h".  This preliminary
patch moves the existing iterator_range class there too.
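
A short usage sketch may help here.  The class below mirrors the iterator_range in the new header (with the template parameter written out); the helper sum_range is hypothetical and only exists to show the range-based for loop and the boolean "nonempty" test.

```cpp
#include <cassert>

// A half-open [begin, end) range of iterators, mirroring the class
// in the new header.
template<typename T>
struct iterator_range
{
public:
  using const_iterator = T;

  iterator_range () = default;
  iterator_range (const T &begin, const T &end)
    : m_begin (begin), m_end (end) {}

  T begin () const { return m_begin; }
  T end () const { return m_end; }

  explicit operator bool () const { return m_begin != m_end; }

private:
  T m_begin;
  T m_end;
};

// Hypothetical helper: sum the integers in a pointer range using a
// range-based for loop.
int
sum_range (const iterator_range<const int *> &range)
{
  int total = 0;
  for (int value : range)
    total += value;
  return total;
}
```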

I used the same copyright date range as coretypes.h “just to be sure”.

gcc/
* coretypes.h (iterator_range): Move to...
* iterator-utils.h: ...this new file.
---
 gcc/coretypes.h  | 18 +-
 gcc/iterator-utils.h | 44 
 2 files changed, 45 insertions(+), 17 deletions(-)
 create mode 100644 gcc/iterator-utils.h

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index da178b6a9f6..043df12f588 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -367,23 +367,6 @@ struct kv_pair
   const ValueType value;   /* the value of the name */
 };
 
-/* Iterator pair used for a collection iteration with range-based loops.  */
-
-template<typename T>
-struct iterator_range
-{
-public:
-  iterator_range (const T &begin, const T &end)
-: m_begin (begin), m_end (end) {}
-
-  T begin () const { return m_begin; }
-  T end () const { return m_end; }
-
-private:
-  T m_begin;
-  T m_end;
-};
-
 #else
 
 struct _dont_use_rtx_here_;
@@ -491,6 +474,7 @@ typedef unsigned char uchar;
 #include "align.h"
 /* Most host source files will require the following headers.  */
 #if !defined (GENERATOR_FILE)
+#include "iterator-utils.h"
 #include "real.h"
 #include "fixed-value.h"
 #include "hash-table.h"
diff --git a/gcc/iterator-utils.h b/gcc/iterator-utils.h
new file mode 100644
index 000..0c95862c7ca
--- /dev/null
+++ b/gcc/iterator-utils.h
@@ -0,0 +1,44 @@
+// Iterator-related utilities.
+// Copyright (C) 2002-2020 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef GCC_ITERATOR_UTILS_H
+#define GCC_ITERATOR_UTILS_H 1
+
+// A half-open [begin, end) range of iterators.
+template<typename T>
+struct iterator_range
+{
+public:
+  using const_iterator = T;
+
+  iterator_range () = default;
+  iterator_range (const T &begin, const T &end)
+: m_begin (begin), m_end (end) {}
+
+  T begin () const { return m_begin; }
+  T end () const { return m_end; }
+
+  explicit operator bool () const { return m_begin != m_end; }
+
+private:
+  T m_begin;
+  T m_end;
+};
+
+#endif
-- 
2.17.1



[05/23] Add more iterator utilities

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds some more iterator helper classes.  They really fall
into two groups, but there didn't seem much value in separating them:

- A later patch has a class hierarchy of the form:

 Base
  +- Derived1
  +- Derived2

  A class wants to store an array A1 of Derived1 pointers and an
  array A2 of Derived2 pointers.  However, for compactness reasons,
  it was convenient to have a single array of Base pointers,
  with A1 and A2 being slices of this array.  This reduces the
  overhead from two pointers and two ints (3 LP64 words) to one
  pointer and two ints (2 LP64 words).

  But consumers of the class shouldn't be aware of this: they should
  see A1 as containing Derived1 pointers rather than Base pointers
  and A2 as containing Derived2 pointers rather than Base pointers.
  This patch adds derived_iterator and const_derived_container
  classes to support this use case.

- A later patch also adds various linked lists.  This patch adds
  wrapper_iterator and list_iterator classes to make it easier
  to create iterators for these linked lists.  For example:

// Iterators for lists of definitions.
using def_iterator = list_iterator;
using reverse_def_iterator
  = list_iterator;

  This in turn makes it possible to use range-based for loops
  on the lists.
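
To make the first use case concrete, here is a minimal sketch of one array of Base pointers exposed as two typed slices.  It is not the derived_iterator or const_derived_container from the patch: derived_ptr_iterator, derived_slice and holder are hypothetical simplifications that only work over plain pointer arrays.

```cpp
#include <cassert>
#include <vector>

struct Base { int id = 0; };
struct Derived1 : Base { int a = 0; };
struct Derived2 : Base { int b = 0; };

// Hypothetical cut-down iterator: walks an array of Base pointers but
// yields T pointers.  The caller asserts that every element is a T.
template<typename T>
class derived_ptr_iterator
{
public:
  explicit derived_ptr_iterator (Base *const *pos) : m_pos (pos) {}

  derived_ptr_iterator &operator++ () { ++m_pos; return *this; }
  T *operator* () const { return static_cast<T *> (*m_pos); }
  bool operator!= (const derived_ptr_iterator &other) const
  { return m_pos != other.m_pos; }

private:
  Base *const *m_pos;
};

// A typed view of a slice of the shared array.
template<typename T>
struct derived_slice
{
  Base *const *first;
  Base *const *last;

  derived_ptr_iterator<T> begin () const
  { return derived_ptr_iterator<T> (first); }
  derived_ptr_iterator<T> end () const
  { return derived_ptr_iterator<T> (last); }
};

// Stores A1 (Derived1 pointers) followed by A2 (Derived2 pointers) in a
// single array, with one count marking the boundary.
class holder
{
public:
  void add_first (Derived1 *ptr)
  {
    Base *base = ptr;
    m_all.insert (m_all.begin () + m_num_first, base);
    ++m_num_first;
  }
  void add_second (Derived2 *ptr) { m_all.push_back (ptr); }

  derived_slice<Derived1> firsts () const
  { return {m_all.data (), m_all.data () + m_num_first}; }
  derived_slice<Derived2> seconds () const
  { return {m_all.data () + m_num_first, m_all.data () + m_all.size ()}; }

private:
  std::vector<Base *> m_all;
  unsigned m_num_first = 0;
};

// Consumers see Derived1 and Derived2 pointers, never Base pointers.
int
sum_firsts (const holder &h)
{
  int total = 0;
  for (Derived1 *d : h.firsts ())
    total += d->a;
  return total;
}

int
sum_seconds (const holder &h)
{
  int total = 0;
  for (Derived2 *d : h.seconds ())
    total += d->b;
  return total;
}
```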

The patch just adds the things that the later patches need; it doesn't
try to make the classes as functionally complete as possible.  I think
we should add extra functionality when needed rather than ahead of time.

gcc/
* iterator-utils.h (derived_iterator): New class.
(const_derived_container, wrapper_iterator): Likewise.
(list_iterator): Likewise.
---
 gcc/iterator-utils.h | 159 +++
 1 file changed, 159 insertions(+)

diff --git a/gcc/iterator-utils.h b/gcc/iterator-utils.h
index 0c95862c7ca..22cc1a545ef 100644
--- a/gcc/iterator-utils.h
+++ b/gcc/iterator-utils.h
@@ -41,4 +41,163 @@ private:
   T m_end;
 };
 
+// Provide an iterator like BaseIT, except that it yields values of type T,
+// which is derived from the type that BaseIT normally yields.
+//
+// The class doesn't inherit from BaseIT for two reasons:
+// - using inheritance would stop the class working with plain pointers
+// - not using inheritance increases type-safety for writable iterators
+//
+// Constructing this class from a BaseIT involves an assertion that all
+// contents really do have type T.  The constructor is therefore explicit.
+template<typename T, typename BaseIT>
+class derived_iterator
+{
+public:
+  using value_type = T;
+
+  derived_iterator () = default;
+
+  template<typename... Ts>
+  explicit derived_iterator (Ts... args)
+    : m_base (std::forward<Ts> (args)...) {}
+
+  derived_iterator &operator++ () { ++m_base; return *this; }
+  derived_iterator operator++ (int);
+
+  T operator* () const { return static_cast<T> (*m_base); }
+  T *operator-> () const { return static_cast<T *> (m_base.operator-> ()); }
+
+  bool operator== (const derived_iterator &other) const;
+  bool operator!= (const derived_iterator &other) const;
+
+protected:
+  BaseIT m_base;
+};
+
+template<typename T, typename BaseIT>
+inline derived_iterator<T, BaseIT>
+derived_iterator<T, BaseIT>::operator++ (int)
+{
+  derived_iterator ret = *this;
+  ++m_base;
+  return ret;
+}
+
+template<typename T, typename BaseIT>
+inline bool
+derived_iterator<T, BaseIT>::operator== (const derived_iterator &other) const
+{
+  return m_base == other.m_base;
+}
+
+template<typename T, typename BaseIT>
+inline bool
+derived_iterator<T, BaseIT>::operator!= (const derived_iterator &other) const
+{
+  return m_base != other.m_base;
+}
+
+// Provide a constant view of a BaseCT in which every value is known to
+// have type T, which is derived from the type that BaseCT normally presents.
+//
+// Constructing this class from a BaseCT involves an assertion that all
+// contents really do have type T.  The constructor is therefore explicit.
+template<typename T, typename BaseCT>
+class const_derived_container : public BaseCT
+{
+  using base_const_iterator = typename BaseCT::const_iterator;
+
+public:
+  using value_type = T;
+  using const_iterator = derived_iterator<T, base_const_iterator>;
+
+  const_derived_container () = default;
+
+  template<typename... Ts>
+  explicit const_derived_container (Ts... args)
+    : BaseCT (std::forward<Ts> (args)...) {}
+
+  const_iterator begin () const { return const_iterator (BaseCT::begin ()); }
+  const_iterator end () const { return const_iterator (BaseCT::end ()); }
+
+  T front () const { return static_cast<T> (BaseCT::front ()); }
+  T back () const { return static_cast<T> (BaseCT::back ()); }
+  T operator[] (unsigned int i) const;
+};
+
+template<typename T, typename BaseCT>
+inline T
+const_derived_container<T, BaseCT>::operator[] (unsigned int i) const
+{
+  return static_cast<T> (BaseCT::operator[] (i));
+}
+
+// A base class for iterators whose contents consist of a StoredT and that
+// when dereferenced yield those StoredT contents as a T.  Derived classes
+// should implement at least operator++ or operator--.
+template
+class wrapper_iterator
+{
+public:
+  using value_type = T;
+
+  wrapper_iterator () = default;
+
+  template
+  wrapper_iterator (Ts... args) : m_contents (s

[07/23] Add a class that multiplexes two pointer types

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds a pointer_mux class that provides similar
functionality to:

union { T1 *a; T2 *b; };
...
bool is_b_rather_than_a;

except that the is_b_rather_than_a tag is stored in the low bit
of the pointer.  See the comments in the patch for a comparison
between the two approaches and why this one can be more efficient.
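
The low-bit trick can be sketched as follows.  This is a simplified illustration rather than the pointer_mux from the patch: the class name tagged_ptr is hypothetical, it stores the bits in a std::uintptr_t instead of a char pointer, and it omits the converting constructors and cast helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplification: holds either a T1 pointer ("A") or a
// nonnull T2 pointer ("B"), using the low bit of the address as the tag.
template<typename T1, typename T2>
class tagged_ptr
{
public:
  // Both types need alignment greater than 1 so that bit 0 of a valid
  // address is always clear.
  static_assert (alignof (T1) > 1 && alignof (T2) > 1,
                 "low bit of the pointer must be free");

  // Store an A pointer: the address is kept unchanged (tag bit 0).
  static tagged_ptr first (T1 *ptr)
  { return tagged_ptr (reinterpret_cast<std::uintptr_t> (ptr)); }

  // Store a B pointer: set the low bit to mark it as B.
  static tagged_ptr second (T2 *ptr)
  { return tagged_ptr (reinterpret_cast<std::uintptr_t> (ptr) | 1); }

  bool is_first () const { return !(m_bits & 1); }
  bool is_second () const { return m_bits & 1; }

  // Only valid when the corresponding is_* test is true.
  T1 *known_first () const { return reinterpret_cast<T1 *> (m_bits); }
  T2 *known_second () const { return reinterpret_cast<T2 *> (m_bits - 1); }

  // A null result can only mean "this is an A", because B is never null.
  T2 *second_or_null () const
  { return is_second () ? known_second () : nullptr; }

private:
  explicit tagged_ptr (std::uintptr_t bits) : m_bits (bits) {}

  std::uintptr_t m_bits;
};
```

The object stays the size of one pointer-sized integer, which is the point of the exercise: no separate tag field is needed.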

I've tried to microoptimise the class a fair bit, since a later
patch uses it extensively in order to keep the sizes of data
structures down.

gcc/
* mux-utils.h: New file.
---
 gcc/mux-utils.h | 248 
 1 file changed, 248 insertions(+)
 create mode 100644 gcc/mux-utils.h

diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
new file mode 100644
index 000..17ced49cd22
--- /dev/null
+++ b/gcc/mux-utils.h
@@ -0,0 +1,248 @@
+// Multiplexer utilities
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef GCC_MUX_UTILS_H
+#define GCC_MUX_UTILS_H 1
+
+// A class that stores a choice "A or B", where A has type T1 * and B has
+// type T2 *.  Both T1 and T2 must have an alignment greater than 1, since
+// the low bit is used to identify B over A.  T1 and T2 can be the same.
+//
+// A can be a null pointer but B cannot.
+//
+// Barring the requirement that B must be nonnull, using the class is
+// equivalent to using:
+//
+// union { T1 *A; T2 *B; };
+//
+// and having a separate tag bit to indicate which alternative is active.
+// However, using this class can have two advantages over a union:
+//
+// - It avoids the need to find somewhere to store the tag bit.
+//
+// - The compiler is aware that B cannot be null, which can make checks
+//   of the form:
+//
+//   if (auto *B = mux.dyn_cast ())
+//
+//   more efficient.  With a union-based representation, the dyn_cast
+//   check could fail either because MUX is an A or because MUX is a
+//   null B, both of which require a run-time test.  With a pointer_mux,
+//   only a check for MUX being A is needed.
+template<typename T1, typename T2 = T1>
+class pointer_mux
+{
+public:
+  // Return an A pointer with the given value.
+  static pointer_mux first (T1 *);
+
+  // Return a B pointer with the given (nonnull) value.
+  static pointer_mux second (T2 *);
+
+  pointer_mux () = default;
+
+  // Create a null A pointer.
+  pointer_mux (std::nullptr_t) : m_ptr (nullptr) {}
+
+  // Create an A or B pointer with the given value.  This is only valid
+  // if T1 and T2 are distinct and if T can be resolved to exactly one
+  // of them.
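+  template<typename T,
+	   typename Enable = typename
+	     std::enable_if<std::is_convertible<T *, T1 *>::value
+			    != std::is_convertible<T *, T2 *>::value>::type>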
+  pointer_mux (T *ptr);
+
+  // Return true unless the pointer is a null A pointer.
+  explicit operator bool () const { return m_ptr; }
+
+  // Assign A and B pointers respectively.
+  void set_first (T1 *ptr) { *this = first (ptr); }
+  void set_second (T2 *ptr) { *this = second (ptr); }
+
+  // Return true if the pointer is an A pointer.
+  bool is_first () const { return !(uintptr_t (m_ptr) & 1); }
+
+  // Return true if the pointer is a B pointer.
+  bool is_second () const { return uintptr_t (m_ptr) & 1; }
+
+  // Return the contents of the pointer, given that it is known to be
+  // an A pointer.
+  T1 *known_first () const { return reinterpret_cast<T1 *> (m_ptr); }
+
+  // Return the contents of the pointer, given that it is known to be
+  // a B pointer.
+  T2 *known_second () const { return reinterpret_cast<T2 *> (m_ptr - 1); }
+
+  // If the pointer is an A pointer, return its contents, otherwise
+  // return null.  Thus a null return can mean that the pointer is
+  // either a null A pointer or a B pointer.
+  //
+  // If all A pointers are nonnull, it is more efficient to use:
+  //
+  //     if (ptr.is_first ())
+  //       ...use ptr.known_first ()...
+  //
+  // over:
+  //
+  //     if (T1 *a = ptr.first_or_null ())
+  //       ...use a...
+  T1 *first_or_null () const;
+
+  // If the pointer is a B pointer, return its contents, otherwise
+  // return null.  Using:
+  //
+  //     if (T2 *b = ptr.second_or_null ())
+  //       ...use b...
+  //
+  // should be at least as efficient as:
+  //
+  //     if (ptr.is_second ())
+  //       ...use ptr.known_second ()...
+  T2 *second_or_null () const;
+
+  // Return true if the pointer is a T.
+  //
+  // This is only valid if T1 and T2 are distinct and if T can be
+  // re

[06/23] Add an RAII class for managing obstacks

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds an RAII class for managing the lifetimes of objects
on an obstack.  See the comments in the patch for more details and
example usage.
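
For readers who want the idea without the obstack machinery, here is a minimal sketch of the same watermark pattern over a toy bump allocator.  The arena and arena_watermark classes and the try_build function are hypothetical stand-ins, alignment is ignored, and the real class in the patch works on a GNU obstack instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an obstack: a bump allocator whose top can
// be rolled back to an earlier mark.  Alignment is ignored for brevity.
class arena
{
public:
  explicit arena (std::size_t capacity) : m_storage (capacity), m_top (0) {}

  void *alloc (std::size_t size)
  {
    assert (m_top + size <= m_storage.size ());
    void *ptr = m_storage.data () + m_top;
    m_top += size;
    return ptr;
  }

  std::size_t mark () const { return m_top; }
  void release_to (std::size_t mark) { m_top = mark; }
  std::size_t used () const { return m_top; }

private:
  std::vector<char> m_storage;
  std::size_t m_top;
};

// RAII watermark: on destruction, frees everything allocated since
// construction or since the last call to keep ().
class arena_watermark
{
public:
  explicit arena_watermark (arena &a) : m_arena (a), m_start (a.mark ()) {}
  arena_watermark (const arena_watermark &) = delete;
  arena_watermark &operator= (const arena_watermark &) = delete;
  ~arena_watermark () { m_arena.release_to (m_start); }

  void *alloc (std::size_t size) { return m_arena.alloc (size); }

  // Retain everything allocated so far.
  void keep () { m_start = m_arena.mark (); }

private:
  arena &m_arena;
  std::size_t m_start;
};

// Allocate 16 bytes and either commit or abandon them.
bool
try_build (arena &a, bool succeed)
{
  arena_watermark watermark (a);
  watermark.alloc (16);
  if (!succeed)
    // The destructor frees the 16 bytes.
    return false;

  // Retain the 16 bytes past the end of this scope.
  watermark.keep ();
  return true;
}
```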

gcc/
* obstack-utils.h: New file.
---
 gcc/obstack-utils.h | 86 +
 1 file changed, 86 insertions(+)
 create mode 100644 gcc/obstack-utils.h

diff --git a/gcc/obstack-utils.h b/gcc/obstack-utils.h
new file mode 100644
index 000..ee389f89923
--- /dev/null
+++ b/gcc/obstack-utils.h
@@ -0,0 +1,86 @@
+// Obstack-related utilities.
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef GCC_OBSTACK_UTILS_H
+#define GCC_OBSTACK_UTILS_H
+
+// This RAII class automatically frees memory allocated on an obstack,
+// unless told not to via keep ().  It automatically converts to an
+// obstack, so it can (optionally) be used in place of the obstack
+// to make the scoping clearer.  For example:
+//
+// obstack_watermark watermark (ob);
+// auto *ptr1 = XOBNEW (watermark, struct1);
+// if (...)
+//   // Frees ptr1.
+//   return false;
+//
+// auto *ptr2 = XOBNEW (watermark, struct2);
+// if (...)
+//   // Frees ptr1 and ptr2.
+//   return false;
+//
+// // Retains ptr1 and ptr2.
+// watermark.keep ();
+//
+// auto *ptr3 = XOBNEW (watermark, struct3);
+// if (...)
+//   // Frees ptr3.
+//   return false;
+//
+// // Retains ptr3 (in addition to ptr1 and ptr2 above).
+// watermark.keep ();
+// return true;
+//
+// The move constructor makes it possible to transfer ownership to a caller:
+//
+// obstack_watermark
+// foo ()
+// {
+//   obstack_watermark watermark (ob);
+//   ...
+//   return watermark;
+// }
+//
+// void
+// bar ()
+// {
+//   // Inherit ownership of everything that foo allocated.
+//   obstack_watermark watermark = foo ();
+//   ...
+// }
+class obstack_watermark
+{
+public:
+  obstack_watermark (obstack *ob) : m_obstack (ob) { keep (); }
+  constexpr obstack_watermark (obstack_watermark &&) = default;
+  ~obstack_watermark () { obstack_free (m_obstack, m_start); }
+
+  operator obstack *() const { return m_obstack; }
+  void keep () { m_start = XOBNEWVAR (m_obstack, char, 0); }
+
+private:
+  DISABLE_COPY_AND_ASSIGN (obstack_watermark);
+
+protected:
+  obstack *m_obstack;
+  char *m_start;
+};
+
+#endif
-- 
2.17.1



[08/23] Add an alternative splay tree implementation

2020-11-13 Thread Richard Sandiford via Gcc-patches
We already have two splay tree implementations: the old C one in
libiberty and a templated reimplementation of it in typed-splay-tree.h.
However, they have some drawbacks:

- They hard-code the assumption that nodes should have both a key and
  a value, which isn't always true.

- They use the two-phase method of lookup, and so nodes need to store
  a temporary back pointer.  We can avoid that overhead by using the
  top-down method (as e.g. the bitmap tree code already does).

- The tree node has to own the key and the value.  For some use cases
  it's more convenient to embed the tree links in the value instead.

Also, a later patch wants to use splay trees to represent an
adaptive total order: the splay tree itself records whether node N1
is less than node N2, and (in the worst case) comparing nodes is
a splay operation.

This patch therefore adds an alternative implementation.  The main
features are:

- Nodes can optionally point back to their parents.

- An Accessors class abstracts accessing child nodes and (where
  applicable) parent nodes, so that the information can be embedded
  in larger data structures.

- There is no fixed comparison function at the class level.  Instead,
  individual functions that do comparisons take a comparison function
  argument.

- There are two styles of comparison function, optimised for different
  use cases.  (See the comments in the patch for details.)

- It's possible to do some operations directly on a given node,
  without knowing whether it's the root.  This includes the comparison
  use case described above.

This of course has its own set of drawbacks.  It's really providing
splay utility functions rather than a true ADT, and so is more low-level
than the existing routines.  It's mostly geared for cases in which the
client code wants to participate in the splay operations to some extent.
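
For illustration only (this is not the code in splay-tree-utils.tcc): a minimal stand-alone sketch of the top-down method with the links embedded in the value and the comparison passed per call.  The names toy_node, splay_top_down and insert_node are made up, and the negative/zero/positive callback convention is an assumption rather than the interface the patch defines.

```cpp
#include <cassert>

// Hypothetical value type with the tree links embedded in it, so that
// no separate key/value node needs to be allocated.
struct toy_node
{
  int priority = 0;
  toy_node *left = nullptr;
  toy_node *right = nullptr;
};

// Top-down splay.  COMPARE (N) returns a negative value if the wanted
// position is before N, a positive value if it is after N, and zero if
// N is the wanted node.  Returns the new root: the wanted node if it is
// present, otherwise one of its in-order neighbours.  No parent or
// temporary back pointers are needed.
template<typename Node, typename Compare>
Node *
splay_top_down (Node *root, Compare compare)
{
  if (!root)
    return nullptr;

  // header.right collects nodes before the wanted position and
  // header.left collects nodes after it.
  Node header{};
  Node *left_max = &header;
  Node *right_min = &header;
  for (;;)
    {
      int c = compare (root);
      if (c < 0)
        {
          if (!root->left)
            break;
          if (compare (root->left) < 0)
            {
              // Rotate right.
              Node *child = root->left;
              root->left = child->right;
              child->right = root;
              root = child;
              if (!root->left)
                break;
            }
          // Link ROOT into the tree of later nodes.
          right_min->left = root;
          right_min = root;
          root = root->left;
        }
      else if (c > 0)
        {
          if (!root->right)
            break;
          if (compare (root->right) > 0)
            {
              // Rotate left.
              Node *child = root->right;
              root->right = child->left;
              child->left = root;
              root = child;
              if (!root->right)
                break;
            }
          // Link ROOT into the tree of earlier nodes.
          left_max->right = root;
          left_max = root;
          root = root->right;
        }
      else
        break;
    }
  // Reassemble the three pieces around the new root.
  left_max->right = root->left;
  right_min->left = root->right;
  root->left = header.right;
  root->right = header.left;
  return root;
}

// Insert N into the tree rooted at ROOT and return the new root (N).
inline toy_node *
insert_node (toy_node *root, toy_node *n)
{
  auto compare = [n] (toy_node *other)
    { return n->priority - other->priority; };
  root = splay_top_down (root, compare);
  if (!root)
    return n;
  if (compare (root) < 0)
    {
      n->left = root->left;
      n->right = root;
      root->left = nullptr;
    }
  else
    {
      n->right = root->right;
      n->left = root;
      root->right = nullptr;
    }
  return n;
}
```

Because the comparison is a per-call argument, the same tree can be searched with different callbacks, which is the property the list above relies on.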

gcc/
* Makefile.in (OBJS): Add splay-tree-utils.o.
* system.h: Include <array> when INCLUDE_ARRAY is defined.
* selftest.h (splay_tree_cc_tests): Declare.
* selftest-run-tests.c (selftest::run_tests): Run splay_tree_cc_tests.
* splay-tree-utils.h: New file.
* splay-tree-utils.tcc: Likewise.
* splay-tree-utils.cc: Likewise.
---
 gcc/Makefile.in  |   1 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 gcc/splay-tree-utils.cc  | 264 +++
 gcc/splay-tree-utils.h   | 491 
 gcc/splay-tree-utils.tcc | 960 +++
 gcc/system.h |   3 +
 7 files changed, 1721 insertions(+)
 create mode 100644 gcc/splay-tree-utils.cc
 create mode 100644 gcc/splay-tree-utils.h
 create mode 100644 gcc/splay-tree-utils.tcc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 978a08f7b04..900bf11b0ba 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1540,6 +1540,7 @@ OBJS = \
sparseset.o \
spellcheck.o \
spellcheck-tree.o \
+   splay-tree-utils.o \
sreal.o \
stack-ptr-mod.o \
statistics.o \
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 7a89b2df5bd..c0c18ad17ca 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -79,6 +79,7 @@ selftest::run_tests ()
   optinfo_emit_json_cc_tests ();
   opt_problem_cc_tests ();
   ordered_hash_map_tests_cc_tests ();
+  splay_tree_cc_tests ();
 
   /* Mid-level data structures.  */
   input_c_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 963e074b4d2..b6e4345b19f 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -256,6 +256,7 @@ extern void selftest_c_tests ();
 extern void simplify_rtx_c_tests ();
 extern void spellcheck_c_tests ();
 extern void spellcheck_tree_c_tests ();
+extern void splay_tree_cc_tests ();
 extern void sreal_c_tests ();
 extern void store_merging_c_tests ();
 extern void tree_c_tests ();
diff --git a/gcc/splay-tree-utils.cc b/gcc/splay-tree-utils.cc
new file mode 100644
index 000..4b2007b8414
--- /dev/null
+++ b/gcc/splay-tree-utils.cc
@@ -0,0 +1,264 @@
+// Splay tree utilities -*- C++ -*-
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_ARRAY
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "pret

[09/23] Add a cut-down version of std::span (array_slice)

2020-11-13 Thread Richard Sandiford via Gcc-patches
A later patch wants to be able to pass around subarray views of an
existing array.  The standard class to do that is std::span, but it's
a C++20 thing.  This patch just adds a cut-down version of it.

The intention is just to provide what's currently needed.
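
As a rough illustration of the intended use (a stand-alone sketch, not the class added to vec.h): a non-owning view that can also report failure through an invalid value distinct from the empty sequence.  toy_slice and prefix_before_zero are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>

// Cut-down stand-in for the array_slice idea: a non-owning
// (pointer, length) view plus an "invalid" value that differs from
// every valid slice, including the empty one.
template<typename T>
class toy_slice
{
public:
  toy_slice () : m_base (nullptr), m_size (0) {}
  toy_slice (T *base, unsigned int size) : m_base (base), m_size (size) {}

  template<size_t N>
  toy_slice (T (&array)[N]) : m_base (array), m_size (N) {}

  T *begin () const { return m_base; }
  T *end () const { return m_base + m_size; }
  size_t size () const { return m_size; }
  bool empty () const { return m_size == 0; }

  // A null base with a nonzero size cannot describe a real sequence,
  // so it is free to act as the failure marker.
  static toy_slice invalid () { return { nullptr, ~0U }; }
  bool is_valid () const { return m_base || m_size == 0; }

private:
  T *m_base;
  unsigned int m_size;
};

// Hypothetical client: return the elements before the first zero,
// or the invalid slice if VALUES contains no zero at all.
inline toy_slice<int>
prefix_before_zero (toy_slice<int> values)
{
  unsigned int i = 0;
  for (int value : values)
    {
      if (value == 0)
        return toy_slice<int> (values.begin (), i);
      ++i;
    }
  return toy_slice<int>::invalid ();
}
```

A caller tests is_valid () first and only then looks at empty (), which is why the class has no operator bool.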

gcc/
* vec.h (array_slice): New class.
---
 gcc/vec.h | 120 ++
 1 file changed, 120 insertions(+)

diff --git a/gcc/vec.h b/gcc/vec.h
index f02beddc975..7768de9f518 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -2128,6 +2128,126 @@ release_vec_vec (vec<vec<T> > &vec)
   vec.release ();
 }
 
+// Provide a subset of the std::span functionality.  (We can't use std::span
+// itself because it's a C++20 feature.)
+//
+// In addition, provide an invalid value that is distinct from all valid
+// sequences (including the empty sequence).  This can be used to return
+// failure without having to use std::optional.
+//
+// There is no operator bool because it would be ambiguous whether it is
+// testing for a valid value or an empty sequence.
+template<typename T>
+class array_slice
+{
+  template<typename OtherT> friend class array_slice;
+
+public:
+  using value_type = T;
+  using iterator = T *;
+  using const_iterator = const T *;
+
+  array_slice () : m_base (nullptr), m_size (0) {}
+
+  template<typename OtherT>
+  array_slice (array_slice<OtherT> other)
+    : m_base (other.m_base), m_size (other.m_size) {}
+
+  array_slice (iterator base, unsigned int size)
+    : m_base (base), m_size (size) {}
+
+  template<size_t N>
+  array_slice (T (&array)[N]) : m_base (array), m_size (N) {}
+
+  template<typename OtherT>
+  array_slice (const vec<OtherT> &v)
+    : m_base (v.address ()), m_size (v.length ()) {}
+
+  iterator begin () { return m_base; }
+  iterator end () { return m_base + m_size; }
+
+  const_iterator begin () const { return m_base; }
+  const_iterator end () const { return m_base + m_size; }
+
+  value_type &front ();
+  value_type &back ();
+  value_type &operator[] (unsigned int i);
+
+  const value_type &front () const;
+  const value_type &back () const;
+  const value_type &operator[] (unsigned int i) const;
+
+  size_t size () const { return m_size; }
+  size_t size_bytes () const { return m_size * sizeof (T); }
+  bool empty () const { return m_size == 0; }
+
+  // An invalid array_slice that represents a failed operation.  This is
+  // distinct from an empty slice, which is a valid result in some contexts.
+  static array_slice invalid () { return { nullptr, ~0U }; }
+
+  // True if the array is valid, false if it is an array like INVALID.
+  bool is_valid () const { return m_base || m_size == 0; }
+
+private:
+  iterator m_base;
+  unsigned int m_size;
+};
+
+template<typename T>
+inline typename array_slice<T>::value_type &
+array_slice<T>::front ()
+{
+  gcc_checking_assert (m_size);
+  return m_base[0];
+}
+
+template<typename T>
+inline const typename array_slice<T>::value_type &
+array_slice<T>::front () const
+{
+  gcc_checking_assert (m_size);
+  return m_base[0];
+}
+
+template<typename T>
+inline typename array_slice<T>::value_type &
+array_slice<T>::back ()
+{
+  gcc_checking_assert (m_size);
+  return m_base[m_size - 1];
+}
+
+template<typename T>
+inline const typename array_slice<T>::value_type &
+array_slice<T>::back () const
+{
+  gcc_checking_assert (m_size);
+  return m_base[m_size - 1];
+}
+
+template<typename T>
+inline typename array_slice<T>::value_type &
+array_slice<T>::operator[] (unsigned int i)
+{
+  gcc_checking_assert (i < m_size);
+  return m_base[i];
+}
+
+template<typename T>
+inline const typename array_slice<T>::value_type &
+array_slice<T>::operator[] (unsigned int i) const
+{
+  gcc_checking_assert (i < m_size);
+  return m_base[i];
+}
+
+template<typename T>
+array_slice<T>
+make_array_slice (T *base, unsigned int size)
+{
+  return array_slice<T> (base, size);
+}
+
 #if (GCC_VERSION >= 3000)
 # pragma GCC poison m_vec m_vecpfx m_vecdata
 #endif
-- 
2.17.1



[11/23] Split update_cfg_for_uncondjump out of combine

2020-11-13 Thread Richard Sandiford via Gcc-patches
Later patches want to reuse combine's update_cfg_for_uncondjump,
so this patch makes it a public cfgrtl.c function.

gcc/
* cfgrtl.h (update_cfg_for_uncondjump): Declare.
* combine.c (update_cfg_for_uncondjump): Move to...
* cfgrtl.c: ...here.
---
 gcc/cfgrtl.c  | 47 +++
 gcc/cfgrtl.h  |  1 +
 gcc/combine.c | 36 
 3 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 45d84d39b22..332e93607e6 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -3417,6 +3417,53 @@ fixup_abnormal_edges (void)
   return inserted;
 }
 
+/* Delete the unconditional jump INSN and adjust the CFG correspondingly.
+   Note that the INSN should be deleted *after* removing dead edges, so
+   that the kept edge is the fallthrough edge for a (set (pc) (pc))
+   but not for a (set (pc) (label_ref FOO)).  */
+
+void
+update_cfg_for_uncondjump (rtx_insn *insn)
+{
+  basic_block bb = BLOCK_FOR_INSN (insn);
+  gcc_assert (BB_END (bb) == insn);
+
+  purge_dead_edges (bb);
+
+  if (current_ir_type () != IR_RTL_CFGLAYOUT)
+{
+  if (!find_fallthru_edge (bb->succs))
+   {
+ auto barrier = next_nonnote_nondebug_insn (insn);
+ if (!barrier || !BARRIER_P (barrier))
+   emit_barrier_after (insn);
+   }
+  return;
+}
+
+  delete_insn (insn);
+  if (EDGE_COUNT (bb->succs) == 1)
+{
+  rtx_insn *insn;
+
+  single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
+
+  /* Remove barriers from the footer if there are any.  */
+  for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
+   if (BARRIER_P (insn))
+ {
+   if (PREV_INSN (insn))
+ SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
+   else
+ BB_FOOTER (bb) = NEXT_INSN (insn);
+   if (NEXT_INSN (insn))
+ SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
+ }
+   else if (LABEL_P (insn))
+ break;
+}
+}
+
 /* Cut the insns from FIRST to LAST out of the insns stream.  */
 
 rtx_insn *
diff --git a/gcc/cfgrtl.h b/gcc/cfgrtl.h
index ae62d6cf05c..1c177d3a7e3 100644
--- a/gcc/cfgrtl.h
+++ b/gcc/cfgrtl.h
@@ -47,6 +47,7 @@ extern void fixup_partitions (void);
 extern bool purge_dead_edges (basic_block);
 extern bool purge_all_dead_edges (void);
 extern bool fixup_abnormal_edges (void);
+extern void update_cfg_for_uncondjump (rtx_insn *);
 extern rtx_insn *unlink_insn_chain (rtx_insn *, rtx_insn *);
 extern void relink_block_chain (bool);
 extern rtx_insn *duplicate_insn_chain (rtx_insn *, rtx_insn *,
diff --git a/gcc/combine.c b/gcc/combine.c
index ed1ad45de83..5864474e720 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2531,42 +2531,6 @@ reg_subword_p (rtx x, rtx reg)
 && GET_MODE_CLASS (GET_MODE (x)) == MODE_INT;
 }
 
-/* Delete the unconditional jump INSN and adjust the CFG correspondingly.
-   Note that the INSN should be deleted *after* removing dead edges, so
-   that the kept edge is the fallthrough edge for a (set (pc) (pc))
-   but not for a (set (pc) (label_ref FOO)).  */
-
-static void
-update_cfg_for_uncondjump (rtx_insn *insn)
-{
-  basic_block bb = BLOCK_FOR_INSN (insn);
-  gcc_assert (BB_END (bb) == insn);
-
-  purge_dead_edges (bb);
-
-  delete_insn (insn);
-  if (EDGE_COUNT (bb->succs) == 1)
-{
-  rtx_insn *insn;
-
-  single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
-
-  /* Remove barriers from the footer if there are any.  */
-  for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
-   if (BARRIER_P (insn))
- {
-   if (PREV_INSN (insn))
- SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
-   else
- BB_FOOTER (bb) = NEXT_INSN (insn);
-   if (NEXT_INSN (insn))
- SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
- }
-   else if (LABEL_P (insn))
- break;
-}
-}
-
 /* Return whether PAT is a PARALLEL of exactly N register SETs followed
by an arbitrary number of CLOBBERs.  */
 static bool
-- 
2.17.1



[10/23] Tweak the way that is_a is implemented

2020-11-13 Thread Richard Sandiford via Gcc-patches
At the moment, class hierarchies that use is_a are expected
to define specialisations like:

  template <>
  template <>
  inline bool
  is_a_helper <cgraph_node *>::test (symtab_node *p)
  {
    return p->type == SYMTAB_FUNCTION;
  }

But this doesn't scale well to larger hierarchies, because it only
defines ::test for an argument that is exactly “symtab_node *”
(and not for example “const symtab_node *” or something that
comes between cgraph_node and symtab_node in the hierarchy).

For example:

  struct A { int x; };
  struct B : A {};
  struct C : B {};

  template <>
  template <>
  inline bool
  is_a_helper <C *>::test (A *a)
  {
    return a->x == 1;
  }

  bool f(B *b) { return is_a<C *> (b); }

gives:

  warning: inline function ‘static bool is_a_helper<T>::test(U*) [with U = B; T = C*]’ used but never defined

and:

  bool f(const A *a) { return is_a<const C *> (a); }

gives:

  warning: inline function ‘static bool is_a_helper<T>::test(U*) [with U = const A; T = const C*]’ used but never defined

This patch instead allows is_a to be implemented by specialising
is_a_helper as a whole, for example:

  template<>
  struct is_a_helper<C *> : static_is_a_helper<C *>
  {
    static inline bool test (const A *a) { return a->x == 1; }
  };

It also adds a general specialisation of is_a_helper for const
pointers.  Together, this makes both of the above examples work.
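
To show how the pieces fit together, here is a stand-alone miniature of the scheme.  It is a sketch under assumptions: the helper classes follow the description above, but the is_a and dyn_cast wrappers are simplified stand-ins for the ones in is-a.h, and A, B and C are the toy classes from the example.

```cpp
#include <cassert>

// Base class for helpers where the cast is a plain reinterpret_cast.
template <typename T>
struct reinterpret_is_a_helper
{
  template <typename U>
  static T cast (U *p) { return reinterpret_cast <T> (p); }
};

// Base class for helpers where the cast is a (safer) static_cast.
template <typename T>
struct static_is_a_helper
{
  template <typename U>
  static T cast (U *p) { return static_cast <T> (p); }
};

// Generic helper; 'test' is deliberately left undefined.
template <typename T>
struct is_a_helper : reinterpret_is_a_helper<T>
{
  template <typename U>
  static bool test (U *p);
};

// Implement is_a_helper<const T *> in terms of is_a_helper<T *>.
template <typename T>
struct is_a_helper<const T *>
{
  template <typename U>
  static const T *cast (const U *p)
  {
    return is_a_helper<T *>::cast (const_cast <U *> (p));
  }
  template <typename U>
  static bool test (const U *p)
  {
    return is_a_helper<T *>::test (p);
  }
};

// Simplified stand-ins for the public interface.
template <typename T, typename U>
bool is_a (U *p) { return is_a_helper<T>::test (p); }

template <typename T, typename U>
T dyn_cast (U *p)
{
  if (is_a<T> (p))
    return is_a_helper<T>::cast (p);
  return static_cast <T> (nullptr);
}

struct A { int x; };
struct B : A {};
struct C : B {};

// One whole-class specialisation covers A *, B *, and (through the
// const specialisation above) their const variants.
template<>
struct is_a_helper<C *> : static_is_a_helper<C *>
{
  static bool test (const A *a) { return a->x == 1; }
};
```

Since test takes a const A *, any pointer that converts to const A * is accepted, so intermediate classes such as B need no specialisation of their own.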

gcc/
* is-a.h (reinterpret_is_a_helper): New class.
(static_is_a_helper): Likewise.
(is_a_helper): Inherit from reinterpret_is_a_helper.
(is_a_helper<const T *>): New specialization.
---
 gcc/is-a.h | 81 ++
 1 file changed, 63 insertions(+), 18 deletions(-)

diff --git a/gcc/is-a.h b/gcc/is-a.h
index e84c3e4880c..26f53a5ba4a 100644
--- a/gcc/is-a.h
+++ b/gcc/is-a.h
@@ -116,9 +116,30 @@ the connection between the types has not been made.  See 
below.
 
 EXTENDING THE GENERIC TYPE FACILITY
 
-Each connection between types must be made by defining a specialization of the
-template member function 'test' of the template class 'is_a_helper'.  For
-example,
+Method 1
+
+
+If DERIVED is derived from BASE, and if BASE contains enough information
+to determine whether an object is actually an instance of DERIVED,
+then you can make the above routines work for DERIVED by defining
+a specialization of is_a_helper such as:
+
+  template<>
+  struct is_a_helper<DERIVED *> : static_is_a_helper<DERIVED *>
+  {
+    static inline bool test (const BASE *p) { return ...; }
+  };
+
+This test function should return true if P is an instance of DERIVED.
+This on its own is enough; the comments below for method 2 do not apply.
+
+Method 2
+
+
+Alternatively, if two types are connected in ways other than C++
+inheritance, each connection between them must be made by defining a
+specialization of the template member function 'test' of the template
+class 'is_a_helper'.  For example,
 
   template <>
   template <>
@@ -145,15 +166,52 @@ when needed may result in a crash.  For example,
 #ifndef GCC_IS_A_H
 #define GCC_IS_A_H
 
+/* A base class that specializations of is_a_helper can use if casting
+   U * to T is simply a reinterpret_cast.  */
+
+template <typename T>
+struct reinterpret_is_a_helper
+{
+  template <typename U>
+  static inline T cast (U *p) { return reinterpret_cast <T> (p); }
+};
+
+/* A base class that specializations of is_a_helper can use if casting
+   U * to T is simply a static_cast.  This is more type-safe than
+   reinterpret_is_a_helper.  */
+
+template <typename T>
+struct static_is_a_helper
+{
+  template <typename U>
+  static inline T cast (U *p) { return static_cast <T> (p); }
+};
+
 /* A generic type conversion internal helper class.  */
 
 template <typename T>
-struct is_a_helper
+struct is_a_helper : reinterpret_is_a_helper<T>
 {
   template <typename U>
   static inline bool test (U *p);
+};
+
+/* Reuse the definition of is_a_helper<T *> to implement
+   is_a_helper<const T *>.  */
+
+template <typename T>
+struct is_a_helper<const T *>
+{
   template <typename U>
-  static inline T cast (U *p);
+  static inline const T *cast (const U *p)
+  {
+    return is_a_helper<T *>::cast (const_cast <U *> (p));
+  }
+  template <typename U>
+  static inline bool test (const U *p)
+  {
+    return is_a_helper<T *>::test (p);
+  }
 };
 
 /* Note that we deliberately do not define the 'test' member template.  Not
@@ -161,19 +219,6 @@ struct is_a_helper
not been defined, rather than a run-time error.  See the discussion above
for when to define this member.  */
 
-/* This is the generic implementation for casting from one type to another.
-   Do not use this routine directly; it is an internal function.  See the
-   discussion above for when to define this member.  */
-
-template <typename T>
-template <typename U>
-inline T
-is_a_helper <T>::cast (U *p)
-{
-  return reinterpret_cast <T> (p);
-}
-
-
 /* The public interface.  */
 
 /* A generic test for a type relationship.  See the discussion above for when
-- 
2.17.1



[12/23] Export print-rtl.c:print_insn_with_notes

2020-11-13 Thread Richard Sandiford via Gcc-patches
Later patches want to use print_insn_with_notes (printing to
a pretty_printer).  This patch exports it from print-rtl.c.

The non-notes version is already public.

gcc/
* print-rtl.h (print_insn_with_notes): Declare.
* print-rtl.c (print_insn_with_notes): Make non-static.
---
 gcc/print-rtl.c | 5 +
 gcc/print-rtl.h | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index d514b1c5373..c1d3c179b75 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -1260,9 +1260,6 @@ print_rtx_insn_vec (FILE *file, const vec<rtx_insn *> &vec)
It is also possible to obtain a string for a single pattern as a string
pointer, via str_pattern_slim, but this usage is discouraged.  */
 
-/* For insns we print patterns, and for some patterns we print insns...  */
-static void print_insn_with_notes (pretty_printer *, const rtx_insn *);
-
 /* This recognizes rtx'en classified as expressions.  These are always
represent some action on values or results of other expression, that
may be stored in objects representing values.  */
@@ -2011,7 +2008,7 @@ print_insn (pretty_printer *pp, const rtx_insn *x, int 
verbose)
 /* Pretty-print a slim dump of X (an insn) to PP, including any register
note attached to the instruction.  */
 
-static void
+void
 print_insn_with_notes (pretty_printer *pp, const rtx_insn *x)
 {
   pp_string (pp, print_rtx_head);
diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h
index 09e5a519be9..cf801e81332 100644
--- a/gcc/print-rtl.h
+++ b/gcc/print-rtl.h
@@ -84,6 +84,7 @@ extern void dump_rtl_slim (FILE *, const rtx_insn *, const 
rtx_insn *,
 extern void print_value (pretty_printer *, const_rtx, int);
 extern void print_pattern (pretty_printer *, const_rtx, int);
 extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose);
+extern void print_insn_with_notes (pretty_printer *, const rtx_insn *);
 
 extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block);
 extern const char *str_pattern_slim (const_rtx);
-- 
2.17.1



[13/23] recog: Split out a register_asm_p function

2020-11-13 Thread Richard Sandiford via Gcc-patches
verify_changes has a test for whether a particular hard register
is a user-defined register asm.  A later patch needs to test the
same thing, so this patch splits it out into a helper.

gcc/
* rtl.h (register_asm_p): Declare.
* recog.c (verify_changes): Split out the test for whether
a hard register is a register asm to...
(register_asm_p): ...this new function.
---
 gcc/recog.c   |  5 +
 gcc/rtl.h |  1 +
 gcc/rtlanal.c | 12 
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/recog.c b/gcc/recog.c
index 6d8b7d560ee..2d934169a81 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -408,10 +408,7 @@ verify_changes (int num)
   changes[i].old
   && REG_P (changes[i].old)
   && asm_noperands (PATTERN (object)) > 0
-  && REG_EXPR (changes[i].old) != NULL_TREE
-  && HAS_DECL_ASSEMBLER_NAME_P (REG_EXPR (changes[i].old))
-  && DECL_ASSEMBLER_NAME_SET_P (REG_EXPR (changes[i].old))
-  && DECL_REGISTER (REG_EXPR (changes[i].old)))
+  && register_asm_p (changes[i].old))
{
  /* Don't allow changes of hard register operands to inline
 assemblies if they have been defined as register asm ("x").  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index fcec9dc6387..5a1670f295c 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3525,6 +3525,7 @@ extern rtx tablejump_casesi_pattern (const rtx_insn 
*insn);
 extern int computed_jump_p (const rtx_insn *);
 extern bool tls_referenced_p (const_rtx);
 extern bool contains_mem_rtx_p (rtx x);
+extern bool register_asm_p (const_rtx);
 
 /* Overload for refers_to_regno_p for checking a single register.  */
 inline bool
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 6f521503c39..30d5b0c6b76 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -6617,3 +6617,15 @@ add_auto_inc_notes (rtx_insn *insn, rtx x)
  add_auto_inc_notes (insn, XVECEXP (x, i, j));
 }
 }
+
+/* Return true if X is register asm.  */
+
+bool
+register_asm_p (const_rtx x)
+{
+  return (REG_P (x)
+ && REG_EXPR (x) != NULL_TREE
+ && HAS_DECL_ASSEMBLER_NAME_P (REG_EXPR (x))
+ && DECL_ASSEMBLER_NAME_SET_P (REG_EXPR (x))
+ && DECL_REGISTER (REG_EXPR (x)));
+}
-- 
2.17.1



Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-13 Thread Richard Biener
On Wed, 11 Nov 2020, Jiufu Guo wrote:

> 
> Thanks a lot for the suggestions in the previous mails.
> The patch was updated accordingly.
> 
> This updated patch propagates loop-closed PHIs out after
> loop_optimizer_finalize under a newly introduced flag.  In some cases,
> cleaning up loop-closed PHIs saves effort in the optimization passes
> that run after loopdone.
> 
> This patch passes bootstrap and regtest on ppc64le.  Is this ok for trunk?

Comments below

> gcc/ChangeLog
> 2020-10-11  Jiufu Guo   
> 
>   * common.opt (flag_clean_up_loop_closed_phi): New flag.
>   * loop-init.c (loop_optimizer_finalize): Check
>   flag_clean_up_loop_closed_phi and call clean_up_loop_closed_phi.
>   * tree-cfgcleanup.h (clean_up_loop_closed_phi): New declare.
>   * tree-ssa-propagate.c (clean_up_loop_closed_phi): New function.
> 
> gcc/testsuite/ChangeLog
> 2020-10-11  Jiufu Guo   
> 
>   * gcc.dg/tree-ssa/loopclosedphi.c: New test.
> 
> ---
>  gcc/common.opt|  4 ++
>  gcc/loop-init.c   |  8 +++
>  gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c | 21 +++
>  gcc/tree-cfgcleanup.h |  1 +
>  gcc/tree-ssa-propagate.c  | 61 +++
>  5 files changed, 95 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 7e789d1c47f..f0d7b74d7ad 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1141,6 +1141,10 @@ fchecking=
>  Common Joined RejectNegative UInteger Var(flag_checking)
>  Perform internal consistency checkings.
>  
> +fclean-up-loop-closed-phi
> +Common Report Var(flag_clean_up_loop_closed_phi) Optimization Init(0)
> +Clean up loop-closed PHIs after loop optimization done.
> +
>  fcode-hoisting
>  Common Report Var(flag_code_hoisting) Optimization
>  Enable code hoisting.
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index 401e5282907..05804759ac9 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop-niter.h"
>  #include "loop-unroll.h"
>  #include "tree-scalar-evolution.h"
> +#include "tree-cfgcleanup.h"
>  
>  
>  /* Apply FLAGS to the loop state.  */
> @@ -145,6 +146,13 @@ loop_optimizer_finalize (struct function *fn)
>  
>free_numbers_of_iterations_estimates (fn);
>  
> +  if (flag_clean_up_loop_closed_phi

Sorry if there was a miscommunication, but I didn't mean to add a
new user-visible flag.  I meant a flag argument to loop_optimizer_finalize
(as said, you can default it to false so that only the call in
fini_loops needs to change).
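
Roughly what I have in mind, as a stand-alone sketch with stand-in types (toy_function and the parameter name clean_loop_closed_phi are placeholders, not existing GCC code):

```cpp
#include <cassert>

// Placeholder for the per-function state; the real code would use
// struct function and the loops_state_* helpers instead.
struct toy_function
{
  bool loop_closed_ssa = false;
  int cleanups = 0;
};

// Placeholder for the PHI propagation added by the patch.
static void
clean_up_loop_closed_phi (toy_function *fn)
{
  ++fn->cleanups;
}

// The behaviour is selected by an argument rather than by a
// user-visible option.  Defaulting the argument to false means that
// existing callers do not need to change.
void
loop_optimizer_finalize (toy_function *fn, bool clean_loop_closed_phi = false)
{
  if (clean_loop_closed_phi && fn->loop_closed_ssa)
    {
      clean_up_loop_closed_phi (fn);
      fn->loop_closed_ssa = false;
    }
}
```

Only the caller that wants the cleanup then passes true.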

> +  && loops_state_satisfies_p (fn, LOOP_CLOSED_SSA))
> +{
> +  clean_up_loop_closed_phi (fn);
> +  loops_state_clear (fn, LOOP_CLOSED_SSA);
> +}
> +
>/* If we should preserve loop structure, do not free it but clear
>   flags that advanced properties are there as we are not preserving
>   that in full.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
> new file mode 100644
> index 000..ab22a991935
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fno-tree-ch -w -fdump-tree-loopdone-details 
> -fclean-up-loop-closed-phi" } */
> +
> +void
> +t6 (int qz, int wh)
> +{
> +  int jl = wh;
> +
> +  while (1.0 * qz / wh < 1)
> +{
> +  qz = wh * (wh + 2);
> +
> +  while (wh < 1)
> +jl = 0;
> +}
> +
> +  while (qz < 1)
> +qz = jl * wh;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Replacing" 2 "loopdone"} } */
> diff --git a/gcc/tree-cfgcleanup.h b/gcc/tree-cfgcleanup.h
> index 6ff6726bfe4..9e368d63709 100644
> --- a/gcc/tree-cfgcleanup.h
> +++ b/gcc/tree-cfgcleanup.h
> @@ -26,5 +26,6 @@ extern bool cleanup_tree_cfg (unsigned = 0);
>  extern bool fixup_noreturn_call (gimple *stmt);
>  extern bool delete_unreachable_blocks_update_callgraph (cgraph_node 
> *dst_node,
>   bool update_clones);
> +extern unsigned clean_up_loop_closed_phi (function *);
>  
>  #endif /* GCC_TREE_CFGCLEANUP_H */
> diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
> index 87dbf55fab9..a3bfe36c733 100644
> --- a/gcc/tree-ssa-propagate.c
> +++ b/gcc/tree-ssa-propagate.c
> @@ -1549,3 +1549,64 @@ propagate_tree_value_into_stmt (gimple_stmt_iterator 
> *gsi, tree val)
>else
>  gcc_unreachable ();
>  }
> +
> +/* Check exits of each loop in FUN, walk over loop closed PHIs in
> +   each exit basic block and propagate degenerate PHIs.  */
> +
> +unsigned
> +clean_up_loop_closed_phi (function *fun)
> +{
> +  unsigned i;
> +  edge e;
> +  gphi *phi;
> +  tree rhs;
> +  tree lhs;
> +  gphi_iterator gsi;
> +  struct loop *loop;
> +  bool cfg_altered = false;
> +
> +  /* Check dominator info before get loop-cl

[14/23] simplify-rtx: Put simplify routines into a class

2020-11-13 Thread Richard Sandiford via Gcc-patches
One of the recurring warts of RTL is that multiplication by a power
of 2 is represented as a MULT inside a MEM but as an ASHIFT outside
a MEM.  It would obviously be better if we didn't have this kind of
context sensitivity, but it would be difficult to remove.

Currently the simplify-rtx.c routines are hard-coded for the
ASHIFT form.  This means that some callers have to convert the
ASHIFTs “back” into MULTs after calling the simplify-rtx.c
routines; see fwprop.c:canonicalize_address for an example.

I think we can relieve some of the pain by wrapping the simplify-rtx.c
routines in a simple class that tracks whether the expression occurs
in a MEM or not, so that no post-processing is needed.

An obvious concern is whether passing the “this” pointer around
will slow things down or bloat the code.  I can't measure any
increase in compile time after applying the patch.  Sizewise,
simplify-rtx.o text increases by 2.3% in default-checking builds
and 4.1% in release-checking builds.

I realise the MULT/ASHIFT thing isn't the most palatable
reason for doing this, but I think it might be useful for
other things in future, such as using local nonzero_bits
hooks/virtual functions instead of the global hooks.

The obvious alternative would be to add a static variable
and hope that it is always updated correctly.

Later patches make use of this.
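
As a toy illustration of the idea (not the real interface; toy_simplify_context, toy_scale and scale_by_pow2 are invented for this sketch), the context object carries the MEM depth and the simplification routine consults it when choosing the canonical form:

```cpp
#include <cassert>

// Hypothetical miniature; real RTL expressions and the real
// simplify_context are much richer than this.
enum class toy_code { MULT, ASHIFT };

struct toy_scale
{
  toy_code code;
  unsigned int operand;  // Multiplier for MULT, shift count for ASHIFT.
};

class toy_simplify_context
{
public:
  // Nonzero while the value being simplified is inside a MEM.
  unsigned int mem_depth = 0;

  // Return the canonical form of "X * 2**SHIFT" in this context:
  // a MULT inside a MEM and an ASHIFT outside one.
  toy_scale scale_by_pow2 (unsigned int shift) const
  {
    if (mem_depth)
      return { toy_code::MULT, 1U << shift };
    return { toy_code::ASHIFT, shift };
  }
};
```

A caller that is simplifying an address would set mem_depth on its own context object, so no global state has to be updated and restored.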

gcc/
* rtl.h (simplify_context): New class.
(simplify_unary_operation, simplify_binary_operation): Use it.
(simplify_ternary_operation, simplify_relational_operation): Likewise.
(simplify_subreg, simplify_gen_unary, simplify_gen_binary): Likewise.
(simplify_gen_ternary, simplify_gen_relational): Likewise.
(simplify_gen_subreg, lowpart_subreg): Likewise.
* simplify-rtx.c (simplify_gen_binary): Turn into a member function
of simplify_context.
(simplify_gen_unary, simplify_gen_ternary, simplify_gen_relational)
(simplify_truncation, simplify_unary_operation): Likewise.
(simplify_unary_operation_1, simplify_byte_swapping_operation)
(simplify_associative_operation, simplify_logical_relational_operation)
(simplify_binary_operation, simplify_binary_operation_series)
(simplify_distributive_operation, simplify_plus_minus): Likewise.
(simplify_relational_operation, simplify_relational_operation_1)
(simplify_cond_clz_ctz, simplify_merge_mask): Likewise.
(simplify_ternary_operation, simplify_subreg, simplify_gen_subreg)
(lowpart_subreg): Likewise.
(simplify_binary_operation_1): Likewise.  Test mem_depth when
deciding whether the ASHIFT or MULT form is canonical.
(simplify_merge_mask): Use simplify_context.
---
 gcc/rtl.h  | 149 ++--
 gcc/simplify-rtx.c | 152 ++---
 2 files changed, 220 insertions(+), 81 deletions(-)

diff --git a/gcc/rtl.h b/gcc/rtl.h
index 5a1670f295c..e9df95b02c4 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3375,30 +3375,143 @@ extern rtx_insn *try_split (rtx, rtx_insn *, int);
 extern rtx_insn *split_insns (rtx, rtx_insn *);
 
 /* In simplify-rtx.c  */
+
+/* A class that records the context in which a simplification
+   is being made.  */
+class simplify_context
+{
+public:
+  rtx simplify_unary_operation (rtx_code, machine_mode, rtx, machine_mode);
+  rtx simplify_binary_operation (rtx_code, machine_mode, rtx, rtx);
+  rtx simplify_ternary_operation (rtx_code, machine_mode, machine_mode,
+ rtx, rtx, rtx);
+  rtx simplify_relational_operation (rtx_code, machine_mode, machine_mode,
+rtx, rtx);
+  rtx simplify_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+
+  rtx lowpart_subreg (machine_mode, rtx, machine_mode);
+
+  rtx simplify_merge_mask (rtx, rtx, int);
+
+  rtx simplify_gen_unary (rtx_code, machine_mode, rtx, machine_mode);
+  rtx simplify_gen_binary (rtx_code, machine_mode, rtx, rtx);
+  rtx simplify_gen_ternary (rtx_code, machine_mode, machine_mode,
+   rtx, rtx, rtx);
+  rtx simplify_gen_relational (rtx_code, machine_mode, machine_mode, rtx, rtx);
+  rtx simplify_gen_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+
+  /* Tracks the level of MEM nesting for the value being simplified:
+ 0 means the value is not in a MEM, >0 means it is.  This is needed
+ because the canonical representation of multiplication is different
+ inside a MEM than outside.  */
+  unsigned int mem_depth = 0;
+
+private:
+  rtx simplify_truncation (machine_mode, rtx, machine_mode);
+  rtx simplify_byte_swapping_operation (rtx_code, machine_mode, rtx, rtx);
+  rtx simplify_associative_operation (rtx_code, machine_mode, rtx, rtx);
+  rtx simplify_distributive_operation (rtx_code, machine_mode, rtx, rtx);
+  rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx, rtx);
+  rtx simplify_binary_operation_serie

[15/23] recog: Add a validate_change_xveclen function

2020-11-13 Thread Richard Sandiford via Gcc-patches
A later patch wants to be able to use the validate_change machinery
to reduce the XVECLEN of a PARALLEL.  This should be more efficient
than allocating a separate PARALLEL at a possibly distant memory
location, especially since the new PARALLEL would be garbage rtl if
the new pattern turns out not to match.  Combine already pulls this
trick with SUBST_INT.

This patch adds a general helper for doing that.
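
The underlying trick can be sketched in isolation as follows.  This is an illustrative model only: toy_parallel and toy_change_group are made-up names, and the single-element PARALLEL handling in the patch is not modelled.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a PARALLEL: a length field in front of its elements.
struct toy_parallel
{
  int len;
  int elems[4];
};

// Records tentative length changes so that they can be undone.
class toy_change_group
{
public:
  // Shrink V to NEW_LEN in place, remembering the old length.  The
  // trailing elements stay in memory, so nothing is allocated and an
  // undo is just a matter of restoring the length.
  void change_len (toy_parallel *v, int new_len)
  {
    assert (new_len >= 0 && new_len <= v->len);
    m_changes.push_back ({ v, v->len });
    v->len = new_len;
  }

  // Undo all pending changes, most recent first.
  void cancel ()
  {
    for (auto it = m_changes.rbegin (); it != m_changes.rend (); ++it)
      it->vec->len = it->old_len;
    m_changes.clear ();
  }

  // Accept all pending changes.
  void confirm () { m_changes.clear (); }

private:
  struct change
  {
    toy_parallel *vec;
    int old_len;
  };
  std::vector<change> m_changes;
};
```

If the shortened pattern fails to match, cancelling restores the original length and no garbage object is left behind.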

gcc/
* recog.h (validate_change_xveclen): Declare.
* recog.c (change_t::old_len): New field.
(validate_change_1): Add a new_len parameter.  Conditionally
replace the XVECLEN of an rtx, avoiding single-element PARALLELs.
(validate_change_xveclen): New function.
(cancel_changes): Undo changes made by validate_change_xveclen.
---
 gcc/recog.c | 41 +++--
 gcc/recog.h |  1 +
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/gcc/recog.c b/gcc/recog.c
index 2d934169a81..65125b8f0d1 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -183,6 +183,7 @@ struct change_t
 {
   rtx object;
   int old_code;
+  int old_len;
   bool unshare;
   rtx *loc;
   rtx old;
@@ -194,8 +195,10 @@ static int changes_allocated;
 static int num_changes = 0;
 
 /* Validate a proposed change to OBJECT.  LOC is the location in the rtl
-   at which NEW_RTX will be placed.  If OBJECT is zero, no validation is done,
-   the change is simply made.
+   at which NEW_RTX will be placed.  If NEW_LEN is >= 0, XVECLEN (NEW_RTX, 0)
+   will also be changed to NEW_LEN, which is no greater than the current
+   XVECLEN.  If OBJECT is zero, no validation is done, the change is
+   simply made.
 
Two types of objects are supported:  If OBJECT is a MEM, memory_address_p
will be called with the address and mode as parameters.  If OBJECT is
@@ -212,14 +215,25 @@ static int num_changes = 0;
Otherwise, perform the change and return 1.  */
 
 static bool
-validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group, bool unshare)
+validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group,
+  bool unshare, int new_len = -1)
 {
   rtx old = *loc;
 
-  if (old == new_rtx || rtx_equal_p (old, new_rtx))
+  /* Single-element parallels aren't valid and won't match anything.
+ Replace them with the single element.  */
+  if (new_len == 1 && GET_CODE (new_rtx) == PARALLEL)
+{
+  new_rtx = XVECEXP (new_rtx, 0, 0);
+  new_len = -1;
+}
+
+  if ((old == new_rtx || rtx_equal_p (old, new_rtx))
+  && (new_len < 0 || XVECLEN (new_rtx, 0) == new_len))
 return 1;
 
-  gcc_assert (in_group != 0 || num_changes == 0);
+  gcc_assert ((in_group != 0 || num_changes == 0)
+ && (new_len < 0 || new_rtx == *loc));
 
   *loc = new_rtx;
 
@@ -239,8 +253,12 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group, bool unshar
   changes[num_changes].object = object;
   changes[num_changes].loc = loc;
   changes[num_changes].old = old;
+  changes[num_changes].old_len = (new_len >= 0 ? XVECLEN (new_rtx, 0) : -1);
   changes[num_changes].unshare = unshare;
 
+  if (new_len >= 0)
+XVECLEN (new_rtx, 0) = new_len;
+
   if (object && !MEM_P (object))
 {
   /* Set INSN_CODE to force rerecognition of insn.  Save old code in
@@ -278,6 +296,14 @@ validate_unshare_change (rtx object, rtx *loc, rtx new_rtx, bool in_group)
   return validate_change_1 (object, loc, new_rtx, in_group, true);
 }
 
+/* Change XVECLEN (*LOC, 0) to NEW_LEN.  OBJECT, IN_GROUP and the return
+   value are as for validate_change_1.  */
+
+bool
+validate_change_xveclen (rtx object, rtx *loc, int new_len, bool in_group)
+{
+  return validate_change_1 (object, loc, *loc, in_group, false, new_len);
+}
 
 /* Keep X canonicalized if some changes have made it non-canonical; only
modifies the operands of X, not (for example) its code.  Simplifications
@@ -541,7 +567,10 @@ cancel_changes (int num)
  they were made.  */
   for (i = num_changes - 1; i >= num; i--)
 {
-  *changes[i].loc = changes[i].old;
+  if (changes[i].old_len >= 0)
+   XVECLEN (*changes[i].loc, 0) = changes[i].old_len;
+  else
+   *changes[i].loc = changes[i].old;
   if (changes[i].object && !MEM_P (changes[i].object))
INSN_CODE (changes[i].object) = changes[i].old_code;
 }
diff --git a/gcc/recog.h b/gcc/recog.h
index d87456c257f..e152e2bb591 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -88,6 +88,7 @@ extern int check_asm_operands (rtx);
 extern int asm_operand_ok (rtx, const char *, const char **);
 extern bool validate_change (rtx, rtx *, rtx, bool);
 extern bool validate_unshare_change (rtx, rtx *, rtx, bool);
+extern bool validate_change_xveclen (rtx, rtx *, int, bool);
 extern bool canonicalize_change_group (rtx_insn *insn, rtx x);
 extern int insn_invalid_p (rtx_insn *, bool);
 extern int verify_changes (int);
-- 
2.17.1



[16/23] recog: Add a way of temporarily undoing changes

2020-11-13 Thread Richard Sandiford via Gcc-patches
In some cases, it can be convenient to roll back the changes that
have been made by validate_change to see how things looked before,
then reroll the changes.  For example, this makes it possible
to defer calculating the cost of an instruction until we know that
the result is actually needed.  It can also make dumps easier to read.

This patch adds a couple of helper functions for doing that.
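
A minimal standalone sketch of the swap-based approach, assuming a simplified change log of plain ints rather than rtxes.  The names swap_record, record_change, undo_temporarily and redo_all are hypothetical and only mirror the shape of the helpers in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical change record.  OLD holds whichever value is not
// currently stored at *LOC.
struct swap_record
{
  int *loc;
  int old;
};

static std::vector<swap_record> swap_log;

// Make a tentative change and remember the previous value.
void
record_change (int *loc, int new_value)
{
  swap_log.push_back ({loc, *loc});
  *loc = new_value;
}

// Flip one change between "applied" and "not applied".
static void
swap_one (swap_record &c)
{
  std::swap (*c.loc, c.old);
}

// Temporarily undo changes NUM and up, newest first.
void
undo_temporarily (std::size_t num)
{
  for (std::size_t i = swap_log.size (); i > num; --i)
    swap_one (swap_log[i - 1]);
}

// Reapply changes NUM and up, oldest first.
void
redo_all (std::size_t num)
{
  for (std::size_t i = num; i < swap_log.size (); ++i)
    swap_one (swap_log[i]);
}
```

The two loops run in opposite directions so that a location changed twice in the same group ends up with the right value after both the undo and the redo.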

gcc/
* recog.h (temporarily_undo_changes, redo_changes): Declare.
* recog.c (swap_change, temporarily_undo_changes): New functions.
(redo_changes): Likewise.
---
 gcc/recog.c | 40 
 gcc/recog.h |  2 ++
 2 files changed, 42 insertions(+)

diff --git a/gcc/recog.c b/gcc/recog.c
index 65125b8f0d1..309a578a151 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -577,6 +577,46 @@ cancel_changes (int num)
   num_changes = num;
 }
 
+/* Swap the status of change NUM from being applied to not being applied,
+   or vice versa.  */
+
+static void
+swap_change (int num)
+{
+  if (changes[num].old_len >= 0)
+std::swap (XVECLEN (*changes[num].loc, 0), changes[num].old_len);
+  else
+std::swap (*changes[num].loc, changes[num].old);
+  if (changes[num].object && !MEM_P (changes[num].object))
+std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
+}
+
+/* Temporarily undo all the changes numbered NUM and up, with a view
+   to reapplying them later.  The next call to the changes machinery
+   must be:
+
+  redo_changes (NUM)
+
+   otherwise things will end up in an invalid state.  */
+
+void
+temporarily_undo_changes (int num)
+{
+  for (int i = num_changes - 1; i >= num; i--)
+swap_change (i);
+}
+
+/* Redo the changes that were temporarily undone by:
+
+  temporarily_undo_changes (NUM).  */
+
+void
+redo_changes (int num)
+{
+  for (int i = num; i < num_changes; ++i)
+swap_change (i);
+}
+
 /* Reduce conditional compilation elsewhere.  */
 /* A subroutine of validate_replace_rtx_1 that tries to simplify the resulting
rtx.  */
diff --git a/gcc/recog.h b/gcc/recog.h
index e152e2bb591..facf36e7c08 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -96,6 +96,8 @@ extern void confirm_change_group (void);
 extern int apply_change_group (void);
 extern int num_validated_changes (void);
 extern void cancel_changes (int);
+extern void temporarily_undo_changes (int);
+extern void redo_changes (int);
 extern int constrain_operands (int, alternative_mask);
 extern int constrain_operands_cached (rtx_insn *, int);
 extern int memory_address_addr_space_p (machine_mode, rtx, addr_space_t);
-- 
2.17.1



[17/23] recog: Add a class for propagating into insns

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds yet another way of propagating into an instruction and
simplifying the result.  (The net effect of the series is to keep the
total number of propagation approaches the same though, since a later
patch removes the fwprop.c routines.)

One of the drawbacks of the validate_replace_* routines is that
they only do simple simplifications, mostly canonicalisations:

  /* Do changes needed to keep rtx consistent.  Don't do any other
 simplifications, as it is not our job.  */
  if (simplify)
simplify_while_replacing (loc, to, object, op0_mode);

But substituting can often lead to real simplification opportunities.
simplify-rtx.c:simplify_replace_rtx does fully simplify the result,
but it only operates on specific rvalues rather than full instruction
patterns.  It is also nondestructive, which means that it returns a
new rtx whenever a substitution or simplification was possible.
This can create quite a bit of garbage rtl in the context of a
speculative recog, where changing the contents of a pointer is
often enough.

The new routines are therefore supposed to provide simplify_replace_rtx-
style substitution in recog.  They go to some effort to prevent garbage
rtl from being created.

At the moment, the new routines fail if the pattern would still refer
to the old "from" value in some way.  That might be unnecessary in
some contexts; if so, it could be put behind a configuration parameter.

gcc/
* recog.h (insn_propagation): New class.
* recog.c (insn_propagation::apply_to_mem_1): New function.
(insn_propagation::apply_to_rvalue_1): Likewise.
(insn_propagation::apply_to_lvalue_1): Likewise.
(insn_propagation::apply_to_pattern_1): Likewise.
(insn_propagation::apply_to_pattern): Likewise.
(insn_propagation::apply_to_rvalue): Likewise.
---
 gcc/recog.c | 372 
 gcc/recog.h | 100 ++
 2 files changed, 472 insertions(+)

diff --git a/gcc/recog.c b/gcc/recog.c
index 309a578a151..cb0952d8c6c 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -989,6 +989,378 @@ validate_simplify_insn (rtx_insn *insn)
   return ((num_changes_pending () > 0) && (apply_change_group () > 0));
 }
 
+/* Try to process the address of memory expression MEM.  Return true on
+   success; leave the caller to clean up on failure.  */
+
+bool
+insn_propagation::apply_to_mem_1 (rtx mem)
+{
+  auto old_num_changes = num_validated_changes ();
+  mem_depth += 1;
+  bool res = apply_to_rvalue_1 (&XEXP (mem, 0));
+  mem_depth -= 1;
+  if (!res)
+return false;
+
+  if (old_num_changes != num_validated_changes ()
+  && should_check_mems
+  && !check_mem (old_num_changes, mem))
+return false;
+
+  return true;
+}
+
+/* Try to process the rvalue expression at *LOC.  Return true on success;
+   leave the caller to clean up on failure.  */
+
+bool
+insn_propagation::apply_to_rvalue_1 (rtx *loc)
+{
+  rtx x = *loc;
+  enum rtx_code code = GET_CODE (x);
+  machine_mode mode = GET_MODE (x);
+
+  auto old_num_changes = num_validated_changes ();
+  if (from && GET_CODE (x) == GET_CODE (from) && rtx_equal_p (x, from))
+{
+  if (should_unshare)
+   validate_unshare_change (insn, loc, to, 1);
+  else
+   validate_change (insn, loc, to, 1);
+  if (mem_depth && !REG_P (to) && !CONSTANT_P (to))
+   {
+ /* We're substituting into an address, but TO will have the
+form expected outside an address.  Canonicalize it if
+necessary.  */
+ insn_propagation subprop (insn);
+ subprop.mem_depth += 1;
+ if (!subprop.apply_to_rvalue (loc))
+   gcc_unreachable ();
+ if (should_unshare
+ && num_validated_changes () != old_num_changes + 1)
+   {
+ /* TO is owned by someone else, so create a copy and
+return TO to its original form.  */
+ rtx to = copy_rtx (*loc);
+ cancel_changes (old_num_changes);
+ validate_change (insn, loc, to, 1);
+   }
+   }
+  num_replacements += 1;
+  should_unshare = true;
+  result_flags |= UNSIMPLIFIED;
+  return true;
+}
+
+  /* Recursively apply the substitution and see if we can simplify
+ the result.  This specifically shouldn't use simplify_gen_* for
+ speculative simplifications, since we want to avoid generating new
+ expressions where possible.  */
+  auto old_result_flags = result_flags;
+  rtx newx = NULL_RTX;
+  bool recurse_p = false;
+  switch (GET_RTX_CLASS (code))
+{
+case RTX_UNARY:
+  {
+   machine_mode op0_mode = GET_MODE (XEXP (x, 0));
+   if (!apply_to_rvalue_1 (&XEXP (x, 0)))
+ return false;
+   if (from && old_num_changes == num_validated_changes ())
+ return true;
+
+   newx = simplify_unary_operation (code, mode, XEXP (x, 0), op0_mode);
+   break;
+  }
+
+case RTX_BIN_ARITH:
+case RTX_COMM_ARITH:
+  {

[18/23] recog: Add an RAII class for undoing insn changes

2020-11-13 Thread Richard Sandiford via Gcc-patches
When using validate_change to make a group of changes, you have
to remember to cancel them if something goes wrong.  This patch
adds an RAII class to make that easier.  See the comments in the
patch for details and examples.
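
For readers without the tree to hand, here is a self-contained sketch of the same RAII idea using a toy change log of ints.  Everything in it (pending_change, tentative_set, cancel_from, confirm_all, toy_watermark, try_update) is a hypothetical model, not the recog.h interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical change log: each entry remembers the value to restore.
struct pending_change
{
  int *loc;
  int old;
};

static std::vector<pending_change> pending;

// Make a tentative change that can still be rolled back.
void
tentative_set (int *loc, int value)
{
  pending.push_back ({loc, *loc});
  *loc = value;
}

// Roll back every pending change numbered NUM and up, newest first.
void
cancel_from (std::size_t num)
{
  while (pending.size () > num)
    {
      *pending.back ().loc = pending.back ().old;
      pending.pop_back ();
    }
}

// Accept all pending changes.
void
confirm_all ()
{
  pending.clear ();
}

// On scope exit, roll back anything added since construction (or since
// the last keep ()) that has not been confirmed.
class toy_watermark
{
public:
  toy_watermark () : m_old_num (pending.size ()) {}
  ~toy_watermark () { cancel_from (m_old_num); }
  void keep () { m_old_num = pending.size (); }

private:
  std::size_t m_old_num;
};

// Apply two changes as a group; every early return is cleaned up by
// the watermark's destructor.
bool
try_update (int *a, int *b, bool fail_late)
{
  toy_watermark watermark;
  tentative_set (a, 10);
  tentative_set (b, 20);
  if (fail_late)
    return false;
  confirm_all ();
  return true;
}
```

The point of the pattern is that the failure paths need no explicit cleanup call, so adding another early return cannot leave half-applied changes behind.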

gcc/
* recog.h (insn_change_watermark): New class.
---
 gcc/recog.h | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/gcc/recog.h b/gcc/recog.h
index d6af2aa66d9..b8de43b95bb 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -503,6 +503,57 @@ alternative_mask get_preferred_alternatives (rtx_insn *, basic_block);
 bool check_bool_attrs (rtx_insn *);
 
 void recog_init ();
+
+/* This RAII class can help to undo tentative insn changes on failure.
+   When an object of the class goes out of scope, it undoes all group
+   changes that have been made via the validate_change machinery and
+   not yet confirmed via confirm_change_group.
+
+   For example:
+
+  insn_change_watermark watermark;
+  validate_change (..., true); // A
+  ...
+  if (test)
+   // Undoes change A.
+   return false;
+  ...
+  validate_change (..., true); // B
+  ...
+  if (test)
+   // Undoes changes A and B.
+   return false;
+  ...
+  confirm_change_group ();
+
+   Code that wants to avoid this behavior can use keep ():
+
+  insn_change_watermark watermark;
+  validate_change (..., true); // A
+  ...
+  if (test)
+   // Undoes change A.
+   return false;
+  ...
+  watermark.keep ();
+  validate_change (..., true); // B
+  ...
+  if (test)
+   // Undoes change B, but not A.
+   return false;
+  ...
+  confirm_change_group ();  */
+class insn_change_watermark
+{
+public:
+  insn_change_watermark () : m_old_num_changes (num_validated_changes ()) {}
+  ~insn_change_watermark () { cancel_changes (m_old_num_changes); }
+  void keep () { m_old_num_changes = num_validated_changes (); }
+
+private:
+  int m_old_num_changes;
+};
+
 #endif
 
 #endif /* GCC_RECOG_H */
-- 
2.17.1



[19/23] rtlanal: Add some new helper classes

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds some classes for gathering the list of registers
and memory that are read and written by an instruction, along
with various properties about the accesses.  In some ways it's
similar to the information that DF collects for registers,
but extended to memory.  The main reason for using it instead
of DF is that it can analyse tentative changes to instructions
before they've been committed.

The classes also collect general information about the instruction,
since it's cheap to do and helps to avoid multiple walks of the same
RTL pattern.

I've tried to optimise the code quite a bit, since with later patches
it becomes relatively performance-sensitive.  See the discussion in
the comments for the trade-offs involved.

I put the declarations in a new rtlanal.h header file since it
seemed a bit excessive to put so much new inline stuff in rtl.h.

gcc/
* rtlanal.h: New file.
(MEM_REGNO): New constant.
(rtx_obj_flags): New namespace.
(rtx_obj_reference, rtx_properties): New classes.
(growing_rtx_properties, vec_rtx_properties_base): Likewise.
(vec_rtx_properties): New alias.
* rtlanal.c: Include it.
(rtx_properties::try_to_add_reg): New function.
(rtx_properties::try_to_add_dest): Likewise.
(rtx_properties::try_to_add_src): Likewise.
(rtx_properties::try_to_add_pattern): Likewise.
(rtx_properties::try_to_add_insn): Likewise.
(vec_rtx_properties_base::grow): Likewise.
---
 gcc/rtlanal.c | 282 ++
 gcc/rtlanal.h | 334 ++
 2 files changed, 616 insertions(+)
 create mode 100644 gcc/rtlanal.h

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 30d5b0c6b76..404813b7668 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "target.h"
 #include "rtl.h"
+#include "rtlanal.h"
 #include "tree.h"
 #include "predict.h"
 #include "df.h"
@@ -2049,6 +2050,287 @@ note_uses (rtx *pbody, void (*fun) (rtx *, void *), void *data)
   return;
 }
 }
+
+/* Try to add a description of REG X to this object, stopping once
+   the REF_END limit has been reached.  FLAGS is a bitmask of
+   rtx_obj_reference flags that describe the context.  */
+
+void
+rtx_properties::try_to_add_reg (const_rtx x, unsigned int flags)
+{
+  if (REG_NREGS (x) != 1)
+flags |= rtx_obj_flags::IS_MULTIREG;
+  machine_mode mode = GET_MODE (x);
+  unsigned int start_regno = REGNO (x);
+  unsigned int end_regno = END_REGNO (x);
+  for (unsigned int regno = start_regno; regno < end_regno; ++regno)
+if (ref_iter != ref_end)
+  *ref_iter++ = rtx_obj_reference (regno, flags, mode,
+  regno - start_regno);
+}
+
+/* Add a description of destination X to this object.  FLAGS is a bitmask
+   of rtx_obj_reference flags that describe the context.
+
+   This routine accepts all rtxes that can legitimately appear in a
+   SET_DEST.  */
+
+void
+rtx_properties::try_to_add_dest (const_rtx x, unsigned int flags)
+{
+  /* If we have a PARALLEL, SET_DEST is a list of EXPR_LIST expressions,
+ each of whose first operand is a register.  */
+  if (__builtin_expect (GET_CODE (x) == PARALLEL, 0))
+{
+  for (int i = XVECLEN (x, 0) - 1; i >= 0; --i)
+   if (rtx dest = XEXP (XVECEXP (x, 0, i), 0))
+ try_to_add_dest (dest, flags);
+  return;
+}
+
+  unsigned int base_flags = flags & rtx_obj_flags::STICKY_FLAGS;
+  flags |= rtx_obj_flags::IS_WRITE;
+  for (;;)
+if (GET_CODE (x) == ZERO_EXTRACT)
+  {
+   try_to_add_src (XEXP (x, 1), base_flags);
+   try_to_add_src (XEXP (x, 2), base_flags);
+   flags |= rtx_obj_flags::IS_READ;
+   x = XEXP (x, 0);
+  }
+else if (GET_CODE (x) == STRICT_LOW_PART)
+  {
+   flags |= rtx_obj_flags::IS_READ;
+   x = XEXP (x, 0);
+  }
+else if (GET_CODE (x) == SUBREG)
+  {
+   flags |= rtx_obj_flags::IN_SUBREG;
+   if (read_modify_subreg_p (x))
+ flags |= rtx_obj_flags::IS_READ;
+   x = SUBREG_REG (x);
+  }
+else
+  break;
+
+  if (MEM_P (x))
+{
+  if (ref_iter != ref_end)
+   *ref_iter++ = rtx_obj_reference (MEM_REGNO, flags, GET_MODE (x));
+
+  unsigned int addr_flags = base_flags | rtx_obj_flags::IN_MEM_STORE;
+  if (flags & rtx_obj_flags::IS_READ)
+   addr_flags |= rtx_obj_flags::IN_MEM_LOAD;
+  try_to_add_src (XEXP (x, 0), addr_flags);
+  return;
+}
+
+  if (__builtin_expect (REG_P (x), 1))
+{
+  /* We want to keep sp alive everywhere -  by making all
+writes to sp also use sp. */
+  if (REGNO (x) == STACK_POINTER_REGNUM)
+   flags |= rtx_obj_flags::IS_READ;
+  try_to_add_reg (x, flags);
+  return;
+}
+}
+
+/* Try to add a description of source X to this object, stopping once
+   the REF_END limit has been reached.  FLAGS is a bit

[21/23] doc: Add documentation for rtl-ssa

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds some documentation to rtl.texi about the SSA form.
It only really describes the high-level structure -- I think for
API-level stuff it's better to rely on function comments instead.

gcc/
* doc/rtl.texi (RTL SSA): New node.
---
 gcc/doc/rtl.texi | 787 +++
 1 file changed, 787 insertions(+)

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 22af5731bb6..66236204c5e 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -39,6 +39,7 @@ form uses nested parentheses to indicate the pointers in the internal form.
 * Debug Information:: Expressions representing debugging information.
 * Insns:: Expression types for entire insns.
 * Calls:: RTL representation of function call insns.
+* RTL SSA::   An on-the-side SSA form for RTL
 * Sharing::   Some expressions are unique; others *must* be copied.
 * Reading RTL::   Reading textual RTL from a file.
 @end menu
@@ -4420,6 +4421,792 @@ function.  Similarly, if registers other than those in
 containing a single @code{clobber} follow immediately after the call to
 indicate which registers.
 
+@node RTL SSA
+@section On-the-Side SSA Form for RTL
+@cindex SSA, RTL form
+@cindex RTL SSA
+
+The patterns of an individual RTL instruction describe which registers
+are inputs to that instruction and which registers are outputs from
+that instruction.  However, it is often useful to know where the
+definition of a register input comes from and where the result of
+a register output is used.  One way of obtaining this information
+is to use the RTL SSA form, which provides a Static Single Assignment
+representation of the RTL instructions.
+
+The RTL SSA code is located in the @file{rtl-ssa} subdirectory of the GCC
+source tree.  This section only gives a brief overview of it; please
+see the comments in the source code for more details.
+
+@menu
+* Using RTL SSA:: What a pass needs to do to use the RTL SSA form
+* RTL SSA Instructions::  How instructions are represented and organized
+* RTL SSA Basic Blocks::  How instructions are grouped into blocks
+* RTL SSA Resources:: How registers and memory are represented
+* RTL SSA Accesses::  How register and memory accesses are represented
+* RTL SSA Phi Nodes:: How multiple sources are combined into one
+* RTL SSA Access Lists::  How accesses are chained together
+* Changing RTL Instructions:: How to use the RTL SSA framework to change insns
+@end menu
+
+@node Using RTL SSA
+@subsection Using RTL SSA in a pass
+
+A pass that wants to use the RTL SSA form should start with the following:
+
+@smallexample
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "df.h"
+#include "rtl-ssa.h"
+@end smallexample
+
+All the RTL SSA code is contained in the @code{rtl_ssa} namespace,
+so most passes will then want to do:
+
+@smallexample
+using namespace rtl_ssa;
+@end smallexample
+
+However, this is purely a matter of taste, and the examples in the rest of
+this section do not require it.
+
+The RTL SSA representation is an optional on-the-side feature that applies
+on top of the normal RTL instructions.  It is currently local to individual
+RTL passes and is not maintained across passes.
+
+However, in order to allow the RTL SSA information to be preserved across
+passes in future, @samp{crtl->ssa} points to the current function's
+SSA form (if any).  Passes that want to use the RTL SSA form should
+first do:
+
+@smallexample
+crtl->ssa = new rtl_ssa::function_info (@var{fn});
+@end smallexample
+
+where @var{fn} is the function that the pass is processing.
+(Passes that are @code{using namespace rtl_ssa} do not need
+the @samp{rtl_ssa::}.)
+
+Once the pass has finished with the SSA form, it should do the following:
+
+@smallexample
+free_dominance_info (CDI_DOMINATORS);
+if (crtl->ssa->perform_pending_updates ())
+  cleanup_cfg (0);
+
+delete crtl->ssa;
+crtl->ssa = nullptr;
+@end smallexample
+
+The @code{free_dominance_info} call is necessary because
+dominance information is not currently maintained between RTL passes.
+The next two lines commit any changes to the RTL instructions that
+were queued for later; see the comment above the declaration of
+@code{perform_pending_updates} for details.  The final two lines
+discard the RTL SSA form and free the associated memory.
+
+@node RTL SSA Instructions
+@subsection RTL SSA Instructions
+
+@cindex RPO
+@cindex reverse postorder
+@cindex instructions, RTL SSA
+@findex rtl_ssa::insn_info
+RTL SSA instructions are represented by an @code{rtl_ssa::insn_info}.
+These instructions are chained together in a single list that follows
+a reverse postorder (RPO) traversal of the function.  This means that
+if any path through the function can execute an instruction @var{I1}
+and then later execute an instruction @var{I2} for

[20/23] rtlanal: Add simple_regno_set

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds a routine for finding a “simple” SET for a register
definition.  See the comment in the patch for details.

gcc/
* rtl.h (simple_regno_set): Declare.
* rtlanal.c (simple_regno_set): New function.
---
 gcc/rtl.h |  1 +
 gcc/rtlanal.c | 33 +
 2 files changed, 34 insertions(+)

diff --git a/gcc/rtl.h b/gcc/rtl.h
index e9df95b02c4..3915fae61e7 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3539,6 +3539,7 @@ extern void set_insn_deleted (rtx_insn *);
 /* Functions in rtlanal.c */
 
 extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx simple_regno_set (rtx, unsigned int);
 extern bool contains_symbol_ref_p (const_rtx);
 extern bool contains_symbolic_reference_p (const_rtx);
 extern bool contains_constant_pool_address_p (const_rtx);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 404813b7668..80e72d6049d 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1455,6 +1455,39 @@ set_of (const_rtx pat, const_rtx insn)
   return data.found;
 }
 
+/* Check whether instruction pattern PAT contains a SET with the following
+   properties:
+
+   - the SET is executed unconditionally;
+   - the destination of the SET is write-only rather than read-write; and
+   - either:
+ - the destination of the SET is a REG that contains REGNO; or
+ - the destination of the SET is a SUBREG of such a REG.
+
+   If PAT does have a SET like that, return the set, otherwise return null.
+
+   This is intended to be an alternative to single_set for passes that
+   can handle patterns with multiple_sets.  */
+rtx
+simple_regno_set (rtx pat, unsigned int regno)
+{
+  if (GET_CODE (pat) == PARALLEL)
+{
+  int last = XVECLEN (pat, 0) - 1;
+  for (int i = 0; i < last; ++i)
+   if (rtx set = simple_regno_set (XVECEXP (pat, 0, i), regno))
+ return set;
+
+  pat = XVECEXP (pat, 0, last);
+}
+
+  if (GET_CODE (pat) == SET
+  && covers_regno_no_parallel_p (SET_DEST (pat), regno))
+return pat;
+
+  return nullptr;
+}
+
 /* Add all hard register in X to *PSET.  */
 void
 find_all_hard_regs (const_rtx x, HARD_REG_SET *pset)
-- 
2.17.1



[PATCH 23/23] fwprop: Rewrite to use RTL SSA

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch rewrites fwprop.c to use the RTL SSA framework.  It tries
as far as possible to mimic the old behaviour, even in cases where
that doesn't fit naturally with the new framework.  I've added ???
comments to mark those places, but I think “fixing” them should
be done separately to make bisection easier.

In particular:

* The old implementation iterated over uses, and after a successful
  substitution, the new insn's uses were added to the end of the list.
  The pass still processed those uses, but because it processed them at
  the end, it didn't fully optimise one instruction before propagating
  it into the next.

  The new version follows the same approach for comparison purposes,
  but I'd like to drop that as a follow-on patch.

* The old implementation operated on single use sites (DF_REF_LOCs).
  This doesn't work well for instructions with match_dups, where it's
  necessary to update both an operand and its dups at the same time.
  For example, attempting to substitute into a divmod instruction would
  fail because only the div or the mod side would be updated.

  The new version again follows this to some extent for comparison
  purposes (although not exactly).  Again I'd like to drop it as a
  follow-on patch.

  One difference is that if a register occurs in multiple MEM addresses
  in a set, the new version will try to update them all at once.  This is
  what causes the SVE ACLE st4* output to improve.

Also, the old version didn't naturally guarantee termination (PR79405),
whereas the new one does.

gcc/
* fwprop.c: Rewrite to use the RTL SSA framework.

gcc/testsuite/
* gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
expect insn updates to be deferred.
* gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
to be folded into the address.
* gcc.target/aarch64/sve/acle/asm/st4_u8.c: Likewise.
---
 gcc/fwprop.c  | 1698 ++---
 .../test-return-const.c.before-fwprop.c   |2 +-
 .../gcc.target/aarch64/sve/acle/asm/st4_s8.c  |8 +-
 .../gcc.target/aarch64/sve/acle/asm/st4_u8.c  |8 +-
 4 files changed, 561 insertions(+), 1155 deletions(-)

*** /tmp/9upGS6_fwprop.c2020-11-13 08:23:52.837409271 +
--- gcc/fwprop.c2020-11-13 08:05:06.490403698 +
***
*** 18,49 
  along with GCC; see the file COPYING3.  If not see
  .  */
  
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
- #include "target.h"
  #include "rtl.h"
- #include "predict.h"
  #include "df.h"
! #include "memmodel.h"
! #include "tm_p.h"
! #include "insn-config.h"
! #include "emit-rtl.h"
! #include "recog.h"
  
  #include "sparseset.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "cfgloop.h"
  #include "tree-pass.h"
- #include "domwalk.h"
  #include "rtl-iter.h"
! 
  
  /* This pass does simple forward propagation and simplification when an
 operand of an insn can only come from a single def.  This pass uses
!df.c, so it is global.  However, we only do limited analysis of
 available expressions.
  
 1) The pass tries to propagate the source of the def into the use,
--- 18,47 
  along with GCC; see the file COPYING3.  If not see
  .  */
  
+ #define ADD_NOTES 0
+ 
+ #define INCLUDE_ALGORITHM
+ #define INCLUDE_FUNCTIONAL
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
  #include "rtl.h"
  #include "df.h"
! #include "rtl-ssa.h"
  
  #include "sparseset.h"
+ #include "predict.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "cfgloop.h"
  #include "tree-pass.h"
  #include "rtl-iter.h"
! #include "target.h"
  
  /* This pass does simple forward propagation and simplification when an
 operand of an insn can only come from a single def.  This pass uses
!RTL SSA, so it is global.  However, we only do limited analysis of
 available expressions.
  
 1) The pass tries to propagate the source of the def into the use,
***
*** 60,68 
(set (subreg:SI (reg:DI 120) 0) (const_int 0))
(set (subreg:SI (reg:DI 120) 4) (const_int -1))
(set (subreg:SI (reg:DI 122) 0)
!  (ior:SI (subreg:SI (reg:DI 119) 0) (subreg:SI (reg:DI 120) 0)))
(set (subreg:SI (reg:DI 122) 4)
!  (ior:SI (subreg:SI (reg:DI 119) 4) (subreg:SI (reg:DI 120) 4)))
  
 can be simplified to the much simpler
  
--- 58,66 
(set (subreg:SI (reg:DI 120) 0) (const_int 0))
(set (subreg:SI (reg:DI 120) 4) (const_int -1))
(set (subreg:SI (reg:DI 122) 0)
!(ior:SI (subreg:SI (reg:DI 119) 0) (subreg:SI (reg:DI 120) 0)))
(set (subreg:SI (reg:DI 122) 4)
!(ior:SI (subreg:SI (reg:DI 119) 4) (subreg:SI (reg:DI 120) 4)))
  
 can be simplified to the much simpler
  
***
*** 89,95 
   (set (reg:QI 120) (subre

[r10-9014 Regression] FAIL: std/ranges/iota/96042.cc (test for excess errors) on Linux/x86_64

2020-11-13 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

8eb9a45e87bdb81cb44948c651edee846c622a0f is the first bad commit
commit 8eb9a45e87bdb81cb44948c651edee846c622a0f
Author: Jonathan Wakely 
Date:   Wed Aug 19 16:27:25 2020 +0100

libstdc++: Make __int128 meet integer-class requirements [PR 96042]

caused

FAIL: std/ranges/iota/96042.cc (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-10/releases/gcc-10/r10-9014/usr
 --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=std/ranges/iota/96042.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=std/ranges/iota/96042.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Re: [PATCH v3 1/2] generate EH info for volatile asm statements (PR93981)

2020-11-13 Thread Richard Biener via Gcc-patches
On Thu, Mar 12, 2020 at 1:41 AM J.W. Jagersma via Gcc-patches
 wrote:
>
> The following patch extends the generation of exception handling
> information, so that it is possible to catch exceptions thrown from
> volatile asm statements, when -fnon-call-exceptions is enabled.  Parts
> of the gcc code already suggested this should be possible, but it was
> never fully implemented.
>
> Two new test cases are added.  The target-dependent test should pass on
> platforms where throwing from a signal handler is allowed.  The only
> platform I am aware of where that is the case is *-linux-gnu, so it is
> set to XFAIL on all others.
>
> gcc/
> 2020-03-11  Jan W. Jagersma  
>
> PR inline-asm/93981
> * tree-cfg.c (make_edges_bb): Make EH edges for GIMPLE_ASM.
> * tree-eh.c (lower_eh_constructs_2): Add case for GIMPLE_ASM.
> Assign register output operands to temporaries.
> * doc/extend.texi: Document that volatile asms can now throw.
>
> gcc/testsuite/
> 2020-03-11  Jan W. Jagersma  
>
> PR inline-asm/93981
> * g++.target/i386/pr93981.C: New test.
> * g++.dg/eh/pr93981.C: New test.
> ---
>  gcc/doc/extend.texi |  5 +++
>  gcc/testsuite/g++.dg/eh/pr93981.C   | 18 
>  gcc/testsuite/g++.target/i386/pr93981.C | 55 +
>  gcc/tree-cfg.c  |  2 +
>  gcc/tree-eh.c   | 32 ++
>  5 files changed, 112 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/eh/pr93981.C
>  create mode 100644 gcc/testsuite/g++.target/i386/pr93981.C
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e0e7f540c21..b51e34c617a 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -9577,6 +9577,11 @@ errors during compilation if your @code{asm} code defines symbols or labels.
>  Using @samp{%=}
>  (@pxref{AssemblerTemplate}) may help resolve this problem.
>
> +When non-call exceptions (@option{-fnon-call-exceptions}) are enabled, a
> +@code{volatile asm} statement is also allowed to throw exceptions.  If it does,
> +then its register output operands are assumed to be clobbered and will not be
> +used.  Memory operands, however, are always considered valid.
> +
>  @anchor{AssemblerTemplate}
>  @subsubsection Assembler Template
>  @cindex @code{asm} assembler template
> diff --git a/gcc/testsuite/g++.dg/eh/pr93981.C 
> b/gcc/testsuite/g++.dg/eh/pr93981.C
> new file mode 100644
> index 000..a9adb5c069e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/eh/pr93981.C
> @@ -0,0 +1,18 @@
> +// PR inline-asm/93981
> +// { dg-do compile }
> +// { dg-options "-fnon-call-exceptions" }
> +
> +void
> +f ()
> +{
> +  try
> +{
> +  asm ("#try");
> +}
> +  catch (...)
> +{
> +  asm ("#catch");
> +}
> +}
> +
> +// { dg-final { scan-assembler "#catch" } }
> diff --git a/gcc/testsuite/g++.target/i386/pr93981.C 
> b/gcc/testsuite/g++.target/i386/pr93981.C
> new file mode 100644
> index 000..7a3117901f9
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr93981.C
> @@ -0,0 +1,55 @@
> +// PR inline-asm/93981
> +// { dg-do run }
> +// { dg-options "-fnon-call-exceptions -O3" }
> +// { dg-xfail-if "" { ! *-linux-gnu } }
> +// { dg-xfail-run-if "" { ! *-linux-gnu } }
> +
> +#include <signal.h>
> +
> +struct illegal_opcode { };
> +
> +extern "C" void
> +sigill (int)
> +{
> +  throw illegal_opcode ( );
> +}
> +
> +int
> +test_mem ()
> +{
> +  int i = 2;
> +  try
> +{
> +  asm volatile ("mov%z0 $1, %0; ud2" : "=m" (i));
> +}
> +  catch (const illegal_opcode&)
> +{
> +  if (i == 1) return 0;
> +}
> +  return i;
> +}
> +
> +int
> +test_reg ()
> +{
> +  int i = 8;
> +  try
> +{
> +  asm volatile ("mov%z0 $4, %0; ud2" : "=r" (i));
> +}
> +  catch (const illegal_opcode&)
> +{
> +  if (i == 8) return 0;
> +}
> +  return i;
> +}
> +
> +int
> +main ()
> +{
> +  struct sigaction sa = { };
> +  sa.sa_handler = sigill;
> +  sa.sa_flags = SA_NODEFER;
> +  sigaction (SIGILL, &sa, 0);
> +  return test_mem () | test_reg ();
> +}
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index f7b817d94e6..c21a7978493 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -913,6 +913,8 @@ make_edges_bb (basic_block bb, struct omp_region 
> **pcur_region, int *pomp_index)
>break;
>
>  case GIMPLE_ASM:
> +  if (stmt_can_throw_internal (cfun, last))
> +   make_eh_edges (last);
>make_gimple_asm_edges (bb);
>fallthru = true;
>break;
> diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
> index 2a409dcaffe..58b16aa763a 100644
> --- a/gcc/tree-eh.c
> +++ b/gcc/tree-eh.c
> @@ -2077,6 +2077,8 @@ lower_eh_constructs_2 (struct leh_state *state, 
> gimple_stmt_iterator *gsi)
> DECL_GIMPLE_REG_P (tmp) = 1;
>   gsi_insert_after (gsi, s, GSI_SAME_STMT);
> }
> +
> +record_throwing_stmt:
>/* Look for things that can throw exceptions, and record them.  */
>if

Re: [PATCH v3 1/2] generate EH info for volatile asm statements (PR93981)

2020-11-13 Thread Richard Biener via Gcc-patches
On Thu, Nov 12, 2020 at 4:53 PM Jeff Law via Gcc-patches
 wrote:
>
>
> On 3/11/20 6:38 PM, J.W. Jagersma via Gcc-patches wrote:
> > The following patch extends the generation of exception handling
> > information, so that it is possible to catch exceptions thrown from
> > volatile asm statements, when -fnon-call-exceptions is enabled.  Parts
> > of the gcc code already suggested this should be possible, but it was
> > never fully implemented.
> >
> > Two new test cases are added.  The target-dependent test should pass on
> > platforms where throwing from a signal handler is allowed.  The only
> > platform I am aware of where that is the case is *-linux-gnu, so it is
> > set to XFAIL on all others.
> >
> > gcc/
> > 2020-03-11  Jan W. Jagersma  
> >
> >   PR inline-asm/93981
> >   * tree-cfg.c (make_edges_bb): Make EH edges for GIMPLE_ASM.
> >   * tree-eh.c (lower_eh_constructs_2): Add case for GIMPLE_ASM.
> >   Assign register output operands to temporaries.
> >   * doc/extend.texi: Document that volatile asms can now throw.
> >
> > gcc/testsuite/
> > 2020-03-11  Jan W. Jagersma  
> >
> >   PR inline-asm/93981
> >   * g++.target/i386/pr93981.C: New test.
> >   * g++.dg/eh/pr93981.C: New test.
>
> Is this the final version of the patch?  Do we have agreement on the
> semantics for output operands, particularly memory operands?  The last
> few messages in the March thread lead me to believe that's still not
> settled.

I think it's up to the asm itself to put the correct contents.  For the
cases where GCC needs to emit copies from outputs (that is,
if it ever reloads them) the only sensible thing is that those are
not emitted on the EH edge but only on the fallthru one.

On GIMPLE this cannot be represented but it means that
SSA uses of asm defs may not appear on the EH edge
(I do have some checking patch for this somewhere which
I think catches one or two existing problems).  On RTL if
the outputs are registers we cannot do any such checking
of course (no SSA form) and whether the old or the "new"
value lives is an implementation detail of the asm itself.

Richard.

>
> Jeff
>
>


Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Richard Biener via Gcc-patches
On Thu, Nov 12, 2020 at 8:55 PM Vladimir Makarov via Gcc-patches
 wrote:
>
>The following patch implements asm goto with outputs.  Kernel
> developers have asked for this feature several times.  Asm goto with
> outputs was implemented in LLVM recently.  The feature was presented
> at the 2020 Linux Plumbers Conference
> (https://linuxplumbersconf.org/event/7/contributions/801/attachments/659/1212/asm_goto_w__Outputs.pdf)
> and at the 2020 LLVM conference
> (https://www.youtube.com/watch?v=vcPD490s-hE).
>
>The patch permits outputs in asm gotos only when LRA is used.
> Implementing this in the old reload pass is problematic.  To be
> honest, it was hard to implement in LRA too, until global live info
> update was added to LRA a few years ago.
>
>Unlike the LLVM implementation of asm goto outputs, you can use
> outputs on any path from the asm goto (not only on the fallthrough
> path as in LLVM).
>
>The patch removes critical edges on which potentially asm output
> reloads could occur (it means you can have several asm gotos using the
> same labels and the same outputs).  It is done in IRA as it is
> difficult to create new BBs in LRA.  Most of the work (placement
> of output reloads in BB destinations of asm goto basic block) is done in
> LRA.  When it happens, LRA updates global live info to reflect that
> new pseudos live on the BB borders and the old ones do not live there
> anymore.
>
>I also tried splitting the live ranges of pseudos involved in
> asm goto outputs to guarantee they get hard registers in IRA.  But
> this approach did not work as it is difficult to keep this assignment
> through all of LRA.  It would probably also result in worse code, as
> move insn coalescing is not guaranteed.
>
>Asm goto with outputs will not work for targets which were not
> converted to LRA (probably some outdated targets as the old reload
> pass is not supported anymore).  An error will be generated when the
> old reload pass meets asm goto with an output.  A precaution is taken
> not to crash the compiler after this error.
>
>The patch is pretty small as all the necessary infrastructure was
> already implemented in practically the whole compiler pipeline.  It
> did not require adding new RTL insns, in contrast to what Google
> engineers did to LLVM MIR.
>
>The patch could also be useful for implementing jump insns with
> output reloads in the future (e.g. branch and count insns).
>
>I think asm gotos with outputs should be considered an experimental
> feature as there is no real usage of it yet.  Early adoption of
> this feature could help with debugging and hardening the
> implementation.
>
>The patch was successfully bootstrapped and tested on x86-64, ppc64,
> and aarch64.
>
> Are non-RA changes ok in the patch?

Minor nit for the RA parts:

+  if (i < recog_data.n_operands)
+   {
+ error_for_asm (insn,
+"old reload pass does not support asm goto "
+"with outputs in %<asm%>");
+ ira_nullify_asm_goto (insn);

I'd say "the target does not support ...", the user shouldn't be concerned
about a thing called "reload".


diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
index 1493b323956..9be8e295627 100644
--- a/gcc/tree-into-ssa.c
+++ b/gcc/tree-into-ssa.c
@@ -1412,6 +1412,11 @@ rewrite_stmt (gimple_stmt_iterator *si)
SET_DEF (def_p, name);
register_new_def (DEF_FROM_PTR (def_p), var);

+   /* Do not insert debug stmt after asm goto: */
+   if (gimple_code (stmt) == GIMPLE_ASM
+   && gimple_asm_nlabels (as_a <gasm *> (stmt)) > 0)
+ continue;
+

why?  Ah, the next line explains.  I guess it's better done as

   /* Do not insert debug stmts if the stmt ends the BB.  */
   if (stmt_ends_bb_p (stmt))
 continue;

I wonder why the code never ran into issues for calls that throw
internal ...

You have plenty of compile testcases but not a single execute one.
So - does it actually work? ;)
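
An execute test along these lines might look like the following minimal
sketch.  It assumes a compiler that already accepts outputs on asm goto
(the feature this patch adds).  The function name try_increment and the
empty asm template are illustrative only and are not taken from the patch.
Because the template is empty, the jump to the label is never taken; a
real execute test would need a target-specific branch in the template.

```c
/* Hypothetical execute-style check for asm goto with outputs.
   Assumption: the compiler supports output operands on asm goto.
   The asm template is empty, so control always falls through; the
   point is only that VALUE is an output operand of an asm goto and
   remains usable on both the fallthrough path and the label path.  */
static int
try_increment (int input, int *result)
{
  int value = input;

  /* __asm__ is used instead of asm so this also builds in strict
     ISO C modes.  */
  __asm__ goto ("" : "+r" (value) : : : failed);

  *result = value + 1;   /* Fallthrough path uses the output.  */
  return 1;

failed:
  *result = value;       /* The label path may use the output too.  */
  return 0;
}
```

A dg-do run test would call such a function from main and abort if the
returned value or *result is not what the asm was expected to produce.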

Otherwise OK.

> 2020-11-12  Vladimir Makarov 
>
>  * c/c-parser.c (c_parser_asm_statement): Parse outputs for asm
>  goto too.
>  * c/c-typeck.c (build_asm_expr): Remove an assert checking output
>  absence for asm goto.

I'm sure this will be rejected by the commit hook.  You need something like

gcc/c/
* c-parser.c (...

gcc/
>  * cfgexpand.c (expand_asm_stmt): Output asm goto with outputs too.
>  Place insns after asm goto on edges.
>  * cp/parser.c (cp_parser_asm_definition): Parse outputs for asm
>  goto too.
>  * doc/extend.texi: Reflect the changes in asm goto documentation.
>  * gcc/gimple.c (gimple_build_asm_1): Remove an assert checking
> output
>  absence for asm goto.
>  * gimple.h (gimple_asm_label_op, gimple_asm_set_label_op): Take
>  possible asm goto outputs into account.
>  * ira.c (ira): Remove critical edges for potential asm goto output
>  reloads.
>

Re: Fix gimple_expr_code?

2020-11-13 Thread Richard Biener via Gcc-patches
On Thu, Nov 12, 2020 at 10:12 PM Andrew MacLeod  wrote:

> On 11/12/20 3:53 PM, Richard Biener wrote:
> > On November 12, 2020 9:43:52 PM GMT+01:00, Andrew MacLeod via
> Gcc-patches  wrote:
> >> So I spent some time tracking down a ranger issue, and in the end, it
> >> boiled down to the range-op handler not being picked up properly.
> >>
> >> The handler is picked up by:
> >>
> >>if ((gimple_code (s) == GIMPLE_ASSIGN) || (gimple_code (s) ==
> >> GIMPLE_COND))
> >>  return range_op_handler (gimple_expr_code (s), gimple_expr_type
> >> (s));
> > IMHO this should use more specific functions. Gimple_expr_code should go
> away similar to gimple_expr_type.
>
> gimple_expr_type is quite pervasive.. and each consumer is going to have
> to roll their own version of it.  Why do we want to get rid of it?
>
> If we are trying to save a few bytes by storing the information in
> different places, then we're going to need some sort of accessing
> function like that
> >
> >> where it is indexing the table with the gimple_expr_code..
> >> the stmt being processed was for a pointer assignment,
> >>_5 = _33
> >> and it was coming back with a gimple_expr_code of  VAR_DECL instead of
> >> an SSA_NAME... which confused me greatly.
> >>
> >>
> >> gimple_expr_code (const gimple *stmt)
> >> {
> >>enum gimple_code code = gimple_code (stmt);
> >>if (code == GIMPLE_ASSIGN || code == GIMPLE_COND)
> >>  return (enum tree_code) stmt->subcode;
> >>
> >> A little more digging shows this:
> >>
> >> static inline enum tree_code
> >> gimple_assign_rhs_code (const gassign *gs)
> >> {
> >>enum tree_code code = (enum tree_code) gs->subcode;
> >>/* While we initially set subcode to the TREE_CODE of the rhs for
> >>   GIMPLE_SINGLE_RHS assigns we do not update that subcode to stay
> >>   in sync when we rewrite stmts into SSA form or do SSA
> >> propagations.  */
> >>if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
> >>  code = TREE_CODE (gs->op[1]);
> >>
> >>return code;
> >> }
> >>
> >> Fascinating comment.
> > ... 😬
> >
> >> But it means that gimple_expr_code() isn't returning the correct result
> >>
> >> for GIMPLE_SINGLE_RHS
> > It depends. A SSA name isn't an expression code either. As said, the
> generic gimple_expr_code should be used with extreme care.
>
> what is an expression code?  It seems like its just a  tree_code
> representing what is on the RHS?Im not sure I understand why one
> needs to be careful with it.  It only applies to COND, ASSIGN and CALL.
> and its current right for everything except GIMPLE_SINGLE_RHS?
>
> If we dont fix gimple_expr_code, then Im basically going to be
> reimplementing it myself... which seems kind of pointless.
>

Well sure we can fix it.  Your patch looks OK but can be optimized like

  if (gassign *ass = dyn_cast <gassign *> (stmt))
    return gimple_assign_rhs_code (ass);

note it looks odd that we use this for gimple_assign but
directly access subcode for GIMPLE_COND instead
of returning gimple_cond_code () (again, operate on
gcond to avoid an extra check).
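
The stale-subcode problem discussed above can be illustrated with a toy
model.  This is only a sketch: the types toy_assign and toy_operand and
the function names are hypothetical and are not GCC's data structures.
It shows why reading the cached subcode directly can disagree with
re-reading the code from the operand for a single-operand RHS.

```c
/* Toy model of the issue, NOT GCC's real data structures.
   All names below are hypothetical.  */
enum toy_code { TOY_VAR_DECL, TOY_SSA_NAME, TOY_PLUS_EXPR };
enum toy_rhs_class { TOY_SINGLE_RHS, TOY_BINARY_RHS };

struct toy_operand
{
  enum toy_code code;
};

struct toy_assign
{
  enum toy_code subcode;      /* Cached when the statement is built.  */
  struct toy_operand *rhs1;   /* May be rewritten later (e.g. into SSA).  */
};

static enum toy_rhs_class
toy_rhs_class_of (enum toy_code code)
{
  return code == TOY_PLUS_EXPR ? TOY_BINARY_RHS : TOY_SINGLE_RHS;
}

/* Reads the cached subcode directly; the result can be stale.  */
static enum toy_code
toy_expr_code_stale (const struct toy_assign *s)
{
  return s->subcode;
}

/* Modeled on the gimple_assign_rhs_code logic quoted above: for a
   single-operand RHS, take the code from the operand itself.  */
static enum toy_code
toy_assign_rhs_code (const struct toy_assign *s)
{
  enum toy_code code = s->subcode;
  if (toy_rhs_class_of (code) == TOY_SINGLE_RHS)
    code = s->rhs1->code;
  return code;
}
```

If a statement is built with a TOY_VAR_DECL operand and the operand is
later rewritten to TOY_SSA_NAME without touching subcode, the first
accessor still reports TOY_VAR_DECL while the second reports
TOY_SSA_NAME.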

Thanks,
Richard.


> Andrew
>
>
>
>


Re: [gcc r9-8794] aarch64: Clear canary value after stack_protect_test [PR96191]

2020-11-13 Thread Richard Sandiford via Gcc-patches
Sebastian Pop  writes:
> Hi,
>
> On Fri, Aug 7, 2020 at 6:18 AM Richard Sandiford  wrote:
>>
>> https://gcc.gnu.org/g:5380912a17ea09a8996720fb62b1a70c16c8f9f2
>>
>> commit r9-8794-g5380912a17ea09a8996720fb62b1a70c16c8f9f2
>> Author: Richard Sandiford 
>> Date:   Fri Aug 7 12:17:37 2020 +0100
>
> could you please also apply this change to the gcc-8 branch?

OK, I'll backport it next week.

Thanks,
Richard

>
> Thanks,
> Sebastian
>
>>
>> aarch64: Clear canary value after stack_protect_test [PR96191]
>>
>> The stack_protect_test patterns were leaving the canary value in the
>> temporary register, meaning that it was often still in registers on
>> return from the function.  An attacker might therefore have been
>> able to use it to defeat stack-smash protection for a later function.
>>
>> gcc/
>> PR target/96191
>> * config/aarch64/aarch64.md (stack_protect_test_): Set the
>> CC register directly, instead of a GPR.  Replace the original GPR
>> destination with an extra scratch register.  Zero out operand 3
>> after use.
>> (stack_protect_test): Update accordingly.
>>
>> gcc/testsuite/
>> PR target/96191
>> * gcc.target/aarch64/stack-protector-1.c: New test.
>> * gcc.target/aarch64/stack-protector-2.c: Likewise.
>>
>> (cherry picked from commit fe1a26429038d7cd17abc53f96a6f3e2639b605f)
>>
>> Diff:
>> ---
>>  gcc/config/aarch64/aarch64.md  | 34 -
>>  .../gcc.target/aarch64/stack-protector-1.c | 89 
>> ++
>>  .../gcc.target/aarch64/stack-protector-2.c |  6 ++
>>  3 files changed, 110 insertions(+), 19 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index ed8cf8ecea1..9598bac387f 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -6985,10 +6985,8 @@
>> (match_operand 2)]
>>""
>>  {
>> -  rtx result;
>>machine_mode mode = GET_MODE (operands[0]);
>>
>> -  result = gen_reg_rtx(mode);
>>if (aarch64_stack_protector_guard != SSP_GLOBAL)
>>{
>>  /* Generate access through the system register. The
>> @@ -7013,29 +7011,27 @@
>>  operands[1] = gen_rtx_MEM (mode, tmp_reg);
>>}
>>emit_insn ((mode == DImode
>> - ? gen_stack_protect_test_di
>> - : gen_stack_protect_test_si) (result,
>> -   operands[0],
>> -   operands[1]));
>> -
>> -  if (mode == DImode)
>> -emit_jump_insn (gen_cbranchdi4 (gen_rtx_EQ (VOIDmode, result, 
>> const0_rtx),
>> -   result, const0_rtx, operands[2]));
>> -  else
>> -emit_jump_insn (gen_cbranchsi4 (gen_rtx_EQ (VOIDmode, result, 
>> const0_rtx),
>> -   result, const0_rtx, operands[2]));
>> +? gen_stack_protect_test_di
>> +: gen_stack_protect_test_si) (operands[0], operands[1]));
>> +
>> +  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
>> +  emit_jump_insn (gen_condjump (gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
>> +   cc_reg, operands[2]));
>>DONE;
>>  })
>>
>> +;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
>> +;; canary value does not live beyond the end of this sequence.
>>  (define_insn "stack_protect_test_"
>> -  [(set (match_operand:PTR 0 "register_operand" "=r")
>> -   (unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")
>> -(match_operand:PTR 2 "memory_operand" "m")]
>> -UNSPEC_SP_TEST))
>> +  [(set (reg:CC CC_REGNUM)
>> +   (unspec:CC [(match_operand:PTR 0 "memory_operand" "m")
>> +   (match_operand:PTR 1 "memory_operand" "m")]
>> +  UNSPEC_SP_TEST))
>> +   (clobber (match_scratch:PTR 2 "=&r"))
>> (clobber (match_scratch:PTR 3 "=&r"))]
>>""
>> -  "ldr\t%3, %1\;ldr\t%0, %2\;eor\t%0, %3, %0"
>> -  [(set_attr "length" "12")
>> +  "ldr\t%2, %0\;ldr\t%3, %1\;subs\t%2, %2, %3\;mov\t%3, 0"
>> +  [(set_attr "length" "16")
>> (set_attr "type" "multiple")])
>>
>>  ;; Write Floating-point Control Register.
>> diff --git a/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c 
>> b/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
>> new file mode 100644
>> index 000..73e83bc413f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
>> @@ -0,0 +1,89 @@
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target fstack_protector } */
>> +/* { dg-options "-fstack-protector-all -O2" } */
>> +
>> +extern volatile long *stack_chk_guard_ptr;
>> +
>> +volatile long *
>> +get_ptr (void)
>> +{
>> +  return stack_chk_guard_ptr;
>> +}
>> +
>> +void __attribute__ ((noipa))
>> +f (void)
>> +{
>> +  volatile int x;
>> +  x = 1;
>> +  x += 1;
>> +}
>> +
>> +#define CHECK(REG) "\tcmp\tx0, " #REG "\n\tbeq\t1f\

Re: [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization

2020-11-13 Thread Xionghu Luo via Gcc-patches
Hi,

On 2020/10/27 05:10, Segher Boessenkool wrote:
> On Wed, Oct 21, 2020 at 03:25:29AM -0500, Xionghu Luo wrote:
>> Don't split code from add3 for SDI to allow a later pass to split.
> 
> This is very problematic.
> 
>> This allows later logic to hoist out constant load in add instructions.
> 
> Later logic should be able to do that any way (I do not say that works
> perfectly, mind; it no doubt could be improved).
> 
>> In loop, lis+ori could be hoisted out to improve performance compared with
>> previous addis+addi (About 15% on typical case), weak point is
>> one more register is used and one more instruction is generated.  i.e.:
> 
> Yes, better performance on one testcase, and worse code always :-(
> 
>> addis 3,3,0x6765
>> addi 3,3,0x4321
>>
>> =>
>>
>> lis 9,0x6765
>> ori 9,9,0x4321
>> add 3,3,9
> 
> This is the typical kind of clumsy code you get if you generate RTL that
> matches actual machine instructions too late ("split too late").
> 
> So, please make it possible to hoist 2-insn-immediate sequences out of
> loops, *without* changing them to fake 1-insn things.
> 

As we discussed offline, addis+addi cannot really be hoisted out of loops
because the sequence is not loop invariant, so I updated the patch as below,
thanks:


[PATCH v2] rs6000: Split constant operator add in split1 instead of expander


Currently, an ADD with a positive 32-bit constant is split into addis+addi
in the expander, which seems too early for the constant load to be
optimized out of loops, compared with other targets.  This patch uses a
temp register to load the constant and does a two-register addition in the
expander, the same as for a negative 32-bit constant add.
This allows the loop invariant pass to hoist the constant load out before
the add instructions; the split1 pass then splits the load into lis+ori
after combine.  Performance could improve by about 15% in a typical case
compared with the previous addis+addi in the loop.

(1) 0x67654321
addis 3,3,0x6765
addi 3,3,0x4321
=>
lis 9,0x6765
ori 9,9,0x4321
add 3,3,9

(2) 0x8fff
addis 9,9,0x1
addi 3,9,-28673
=>
li 10,0
ori 10,10,0x8fff
add 3,3,10
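
The immediates in the addis+addi sequences above follow from the fact
that addi sign-extends its 16-bit immediate, so the high part has to
compensate when bit 15 of the constant is set.  A minimal sketch of that
arithmetic is shown below; the type and function names are hypothetical
and this is not the code in rs6000.md, which additionally truncates the
result to the operand mode.

```c
#include <stdint.h>

/* Hypothetical helper type: the two immediates of an addis+addi pair.  */
typedef struct
{
  int32_t high;   /* Immediate for addis (added as high << 16).  */
  int32_t low;    /* Immediate for addi (sign-extended 16 bits).  */
} addis_addi_split;

/* Split a positive 32-bit constant so that
   value == high * 65536 + low, with low in [-32768, 32767].  */
static addis_addi_split
split_add_constant (int32_t value)
{
  addis_addi_split s;

  /* Sign-extend the low 16 bits, as addi does with its immediate.  */
  s.low = ((value & 0xffff) ^ 0x8000) - 0x8000;

  /* The remainder is a multiple of 65536; compute it in 64 bits so
     the subtraction cannot overflow.  */
  s.high = (int32_t) (((int64_t) value - s.low) / 65536);
  return s;
}
```

For 0x67654321 this gives high 0x6765 and low 0x4321, and for 0x8fff it
gives high 0x1 and low -28673, matching cases (1) and (2).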

Bootstrapped and regression tested on P8LE.

gcc/ChangeLog:

2020-10-21  Xiong Hu Luo  

* config/rs6000/rs6000.md (add3 for SDI): Don't split
before reload, move constant to temp register for add.
(define_split): Split const from split1.

gcc/testsuite/ChangeLog:

2020-10-21  Xiong Hu Luo  

* gcc.target/powerpc/add-const.c: New test.
---
 gcc/config/rs6000/rs6000.md  | 38 
 gcc/testsuite/gcc.target/powerpc/add-const.c | 18 ++
 2 files changed, 41 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/add-const.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 5e5ad9f7c3d..b52e9555962 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1750,18 +1750,26 @@ (define_expand "add3"
 
   if (CONST_INT_P (operands[2]) && !add_operand (operands[2], mode))
 {
-  rtx tmp = ((!can_create_pseudo_p ()
- || rtx_equal_p (operands[0], operands[1]))
-? operands[0] : gen_reg_rtx (mode));
-
-  /* Adding a constant to r0 is not a valid insn, so use a different
-strategy in that case.  */
-  if (reg_or_subregno (operands[1]) == 0 || reg_or_subregno (tmp) == 0)
+  bool reg0 = reg_or_subregno (operands[0]) == 0;
+  if (can_create_pseudo_p () || reg0)
{
- if (operands[0] == operands[1])
-   FAIL;
- rs6000_emit_move (operands[0], operands[2], mode);
- emit_insn (gen_add3 (operands[0], operands[1], operands[0]));
+
+ rtx tmp = (!can_create_pseudo_p ()
+ || rtx_equal_p (operands[0], operands[1]))
+   ? operands[0] : gen_reg_rtx (mode);
+
+ /* Adding a constant to r0 is not a valid insn, so use a different
+strategy in that case.  See stack-limit.c, need generate
+"24: %0:DI=0x20fa0; 25: %0:DI=%14:DI+%0:DI" in pro_and_epilogue
+when can_create_pseudo_p is false.  */
+ if (reg0 == 0 || reg_or_subregno (tmp) == 0)
+   {
+ if (operands[0] == operands[1])
+   FAIL;
+   }
+
+ rs6000_emit_move (tmp, operands[2], mode);
+ emit_insn (gen_add3 (operands[0], operands[1], tmp));
  DONE;
}
 
@@ -1775,8 +1783,8 @@ (define_expand "add3"
   /* The ordering here is important for the prolog expander.
 When space is allocated from the stack, adding 'low' first may
 produce a temporary deallocation (which would be bad).  */
-  emit_insn (gen_add3 (tmp, operands[1], GEN_INT (rest)));
-  emit_insn (gen_add3 (operands[0], tmp, GEN_INT (low)));
+  emit_insn (gen_add3 (operands[0], operands[1], GEN_INT (rest)));
+  emit_insn (gen_add3 (operands[0], operands[0], GEN_INT (low)));
   DONE;
 }
 })
@@ -9118,7 +9126,7 @@ (define_split
 ;; When non-easy constants can go in the TOC, this should use
 ;; easy_f

RE: Enable MOVDIRI, MOVDIR64B, CLDEMOTE and WAITPKG for march=tremont

2020-11-13 Thread Cui, Lili via Gcc-patches
Hi Uros,

This patch corrects my previous patch.
PREFETCHW should be in both march=broadwell and march=silvermont,
but I moved PREFETCHW from march=broadwell to march=silvermont in the
previous patch, sorry for that.

Bootstrap is ok, and no regressions for i386/x86-64 testsuite.

OK for master?


[PATCH] Put PREFETCHW back to march=broadwell

PREFETCHW should be in both march=broadwell and march=silvermont.
The previous patch moved PREFETCHW from march=broadwell to
march=silvermont.

gcc/ChangeLog:

* config/i386/i386.h: Add PREFETCHW to march=broadwell.
* doc/invoke.texi: Put PREFETCHW back to relation arch.
---
 gcc/config/i386/i386.h |  3 ++-
 gcc/doc/invoke.texi| 50 +++---
 2 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3be7551d6c3..b8ae16e2865 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2518,7 +2518,8 @@ const wide_int_bitmask PTA_IVYBRIDGE = PTA_SANDYBRIDGE | 
PTA_FSGSBASE
   | PTA_RDRND | PTA_F16C;
 const wide_int_bitmask PTA_HASWELL = PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI
   | PTA_BMI2 | PTA_LZCNT | PTA_FMA | PTA_MOVBE | PTA_HLE;
-const wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | PTA_RDSEED;
+const wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | PTA_RDSEED
+  | PTA_PRFCHW;
 const wide_int_bitmask PTA_SKYLAKE = PTA_BROADWELL | PTA_AES | PTA_CLFLUSHOPT
   | PTA_XSAVEC | PTA_XSAVES | PTA_SGX;
 const wide_int_bitmask PTA_SKYLAKE_AVX512 = PTA_SKYLAKE | PTA_AVX512F
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 69bf1fa89dd..3c292593030 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -29560,13 +29560,13 @@ BMI, BMI2 and F16C instruction set support.
 @item broadwell
 Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, 
BMI2,
-F16C, RDSEED and ADCX instruction set support.
+F16C, RDSEED, ADCX and PREFETCHW instruction set support.
 
 @item skylake
 Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC and XSAVES instruction set
-support.
+BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and XSAVES
+instruction set support.
 
 @item bonnell
 Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
@@ -29595,32 +29595,33 @@ MOVDIR64B, CLDEMOTE and WAITPKG instruction set 
support.
 @item knl
 Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHWT1, AVX512F, AVX512PF, AVX512ER and
-AVX512CD instruction set support.
+BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, PREFETCHWT1, AVX512F, AVX512PF,
+AVX512ER and AVX512CD instruction set support.
 
 @item knm
 Intel Knights Mill CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHWT1, AVX512F, AVX512PF, AVX512ER, 
AVX512CD,
-AVX5124VNNIW, AVX5124FMAPS and AVX512VPOPCNTDQ instruction set support.
+BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, PREFETCHWT1, AVX512F, AVX512PF,
+AVX512ER, AVX512CD, AVX5124VNNIW, AVX5124FMAPS and AVX512VPOPCNTDQ instruction
+set support.
 
 @item skylake-avx512
 Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, 
FMA,
-BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F,
+BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F,
 CLWB, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set support.
 
 @item cannonlake
 Intel Cannonlake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
 SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE,
-RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC,
+RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC,
 XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ, AVX512CD, AVX512VBMI,
 AVX512IFMA, SHA and UMIP instruction set support.
 
 @item icelake-client
 Intel Icelake Client CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
 SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE,
-RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC,
+RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC,
 XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ, AVX512CD, AVX512VBMI,
 AVX512IFMA, SHA, CLWB, UMIP, RDPID, GFNI, AVX512VBMI2, AVX512VPOPCNTDQ,
 AVX512BITALG, AVX512VNNI, VPCLMULQDQ, VAES instruction set support.
@@ -29628,7 +29629,7 @@ AVX512BITALG, AVX512VNNI, VPCLMULQDQ, VAES instruction 
set support.
 @item icelake-server
 Intel Icelake Se

Re: [PATCH] Put absolute address jump table in data.rel.ro.local if targets support relocations

2020-11-13 Thread Richard Sandiford via Gcc-patches
Hi,

Sorry for the slow reply.  Just one minor nit:

HAO CHEN GUI  writes:
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index ea0b59cf44a..40502049b61 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -727,12 +727,26 @@ switch_to_other_text_partition (void)
>switch_to_section (current_function_section ());
>  }
>  
> -/* Return the read-only data section associated with function DECL.  */
> +/* Return the read-only or relocated read-only data section
> +   associated with function DECL.  */
>  
>  section *
> -default_function_rodata_section (tree decl)
> +default_function_rodata_section (tree decl, bool relocatable)
>  {
> -  if (decl != NULL_TREE && DECL_SECTION_NAME (decl))
> +  const char* sname;
> +  unsigned int flags;
> +
> +  flags = 0;
> +
> +  if (relocatable)
> +{
> +  sname = ".data.rel.ro.local";
> +  flags = (SECTION_WRITE | SECTION_RELRO);
> +}
> +  else
> +sname = ".rodata";
> +
> +  if (decl && DECL_SECTION_NAME (decl))
>  {
>const char *name = DECL_SECTION_NAME (decl);
>  
> @@ -745,38 +759,57 @@ default_function_rodata_section (tree decl)
> dot = strchr (name + 1, '.');
> if (!dot)
>   dot = name;
> -   len = strlen (dot) + 8;
> +   len = strlen (dot) + strlen (sname) + 1;
> rname = (char *) alloca (len);
>  
> -   strcpy (rname, ".rodata");
> +   strcpy (rname, sname);
> strcat (rname, dot);
> -   return get_section (rname, SECTION_LINKONCE, decl);
> +   return get_section (rname, (SECTION_LINKONCE | flags), decl);
>   }
> -  /* For .gnu.linkonce.t.foo we want to use .gnu.linkonce.r.foo.  */
> +  /* For .gnu.linkonce.t.foo we want to use .gnu.linkonce.r.foo or
> +  .gnu.linkonce.d.rel.ro.local.foo if the jump table is relocatable.  */
>else if (DECL_COMDAT_GROUP (decl)
>  && strncmp (name, ".gnu.linkonce.t.", 16) == 0)
>   {
> -   size_t len = strlen (name) + 1;
> -   char *rname = (char *) alloca (len);
> +   size_t len;
> +   char *rname;
>  
> -   memcpy (rname, name, len);
> -   rname[14] = 'r';
> -   return get_section (rname, SECTION_LINKONCE, decl);
> +   if (relocatable)
> + {
> +   len = strlen (name) + strlen (".rel.ro.local") + 1;
> +   rname = (char *) alloca (len);
> +
> +   strcpy (rname, ".gnu.linkonce.d");
> +   strcat (rname, ".rel.ro.local");
> +   strcat (rname, name + 15);

I realise you probably wrote it like this to make the correlation
between the length calculation and the string operations more
obvious, but IMO it would be less surprising to have:

  strcpy (rname, ".gnu.linkonce.d.rel.ro.local");

OK with that change, thanks.
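
For illustration, the name construction for the relocatable linkonce case
can be sketched as a standalone function.  This is only a sketch under
assumptions: the helper name relro_linkonce_name is hypothetical, and it
uses malloc where the patch uses alloca inside
default_function_rodata_section.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone version of the section-name rewrite:
   ".gnu.linkonce.t.foo" -> ".gnu.linkonce.d.rel.ro.local.foo".
   Returns a malloc'ed string, or NULL if NAME does not start with
   ".gnu.linkonce.t." or allocation fails.  */
static char *
relro_linkonce_name (const char *name)
{
  static const char text_prefix[] = ".gnu.linkonce.t.";
  static const char relro_prefix[] = ".gnu.linkonce.d.rel.ro.local";

  if (strncmp (name, text_prefix, sizeof text_prefix - 1) != 0)
    return NULL;

  /* Skip the 15 characters of ".gnu.linkonce.t" and keep the
     trailing ".foo" part.  */
  const char *suffix = name + 15;

  size_t len = strlen (relro_prefix) + strlen (suffix) + 1;
  char *rname = malloc (len);
  if (rname == NULL)
    return NULL;

  strcpy (rname, relro_prefix);
  strcat (rname, suffix);
  return rname;
}
```

The length computed here equals strlen (name) + strlen (".rel.ro.local")
+ 1 from the patch, since the new prefix is the old 15-character prefix
with "t" replaced by "d" plus ".rel.ro.local".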

Richard


Re: Enable MOVDIRI, MOVDIR64B, CLDEMOTE and WAITPKG for march=tremont

2020-11-13 Thread Uros Bizjak via Gcc-patches
On Fri, Nov 13, 2020 at 10:18 AM Cui, Lili  wrote:
>
> Hi Uros,
>
> This patch corrects my previous patch.
> PREFETCHW should be in both march=broadwell and march=silvermont,
> but I moved PREFETCHW from march=broadwell to march=silvermont in the
> previous patch, sorry for that.
>
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>
> OK for master?
>
>
> [PATCH] Put PREFETCHW back to march=broadwell
>
> PREFETCHW should be in both march=broadwell and march=silvermont.
> The previous patch moved PREFETCHW from march=broadwell to
> march=silvermont.
>
> gcc/ChangeLog:
>
> * config/i386/i386.h: Add PREFETCHW to march=broadwell.
> * doc/invoke.texi: Put PREFETCHW back to relation arch.

OK.

These kinds of changes can be considered under the obvious rule.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.h |  3 ++-
>  gcc/doc/invoke.texi| 50 +++---
>  2 files changed, 29 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 3be7551d6c3..b8ae16e2865 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2518,7 +2518,8 @@ const wide_int_bitmask PTA_IVYBRIDGE = PTA_SANDYBRIDGE 
> | PTA_FSGSBASE
>| PTA_RDRND | PTA_F16C;
>  const wide_int_bitmask PTA_HASWELL = PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI
>| PTA_BMI2 | PTA_LZCNT | PTA_FMA | PTA_MOVBE | PTA_HLE;
> -const wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | PTA_RDSEED;
> +const wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | PTA_RDSEED
> +  | PTA_PRFCHW;
>  const wide_int_bitmask PTA_SKYLAKE = PTA_BROADWELL | PTA_AES | PTA_CLFLUSHOPT
>| PTA_XSAVEC | PTA_XSAVES | PTA_SGX;
>  const wide_int_bitmask PTA_SKYLAKE_AVX512 = PTA_SKYLAKE | PTA_AVX512F
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 69bf1fa89dd..3c292593030 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -29560,13 +29560,13 @@ BMI, BMI2 and F16C instruction set support.
>  @item broadwell
>  Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, 
> SSSE3,
>  SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, 
> BMI2,
> -F16C, RDSEED and ADCX instruction set support.
> +F16C, RDSEED ADCX and PREFETCHW instruction set support.
>
>  @item skylake
>  Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
>  SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
> -BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC and XSAVES instruction set
> -support.
> +BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and XSAVES
> +instruction set support.
>
>  @item bonnell
>  Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and 
> SSSE3
> @@ -29595,32 +29595,33 @@ MOVDIR64B, CLDEMOTE and WAITPKG instruction set 
> support.
>  @item knl
>  Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, 
> SSE3,
>  SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
> -BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHWT1, AVX512F, AVX512PF, AVX512ER and
> -AVX512CD instruction set support.
> +BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, PREFETCHWT1, AVX512F, AVX512PF,
> +AVX512ER and AVX512CD instruction set support.
>
>  @item knm
>  Intel Knights Mill CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
>  SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
> -BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHWT1, AVX512F, AVX512PF, AVX512ER, 
> AVX512CD,
> -AVX5124VNNIW, AVX5124FMAPS and AVX512VPOPCNTDQ instruction set support.
> +BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, PREFETCHWT1, AVX512F, AVX512PF,
> +AVX512ER, AVX512CD, AVX5124VNNIW, AVX5124FMAPS and AVX512VPOPCNTDQ 
> instruction
> +set support.
>
>  @item skylake-avx512
>  Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
>  SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, 
> FMA,
> -BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F,
> +BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, 
> AVX512F,
>  CLWB, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set support.
>
>  @item cannonlake
>  Intel Cannonlake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
>  SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE,
> -RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC,
> +RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC,
>  XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ, AVX512CD, AVX512VBMI,
>  AVX512IFMA, SHA and UMIP instruction set support.
>
>  @item icelake-client
>  Intel Icelake Client CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
>  SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE,
> -RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, CLFLUSHOPT, XSAVEC,
> +RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC,
>  X

Re: [1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-13 Thread Richard Sandiford via Gcc-patches
Joel Hutton via Gcc-patches  writes:
> Hi all,
>
> This patch adds backend patterns for vec_widen_add, vec_widen_sub on aarch64.
>
> All 3 patches together bootstrapped and regression tested on aarch64.
>
> Ok for stage 1?
>
> gcc/ChangeLog:
>
> 2020-11-12  Joel Hutton  
>
>         * config/aarch64/aarch64-simd.md: New patterns 
> vec_widen_saddl_lo/hi_
>
> From 3e47bc562b83417a048e780bcde52fb2c9617df3 Mon Sep 17 00:00:00 2001
> From: Joel Hutton 
> Date: Mon, 9 Nov 2020 15:35:57 +
> Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64
>
> Add widening add and subtract patterns to the aarch64
> backend.
> ---
>  gcc/config/aarch64/aarch64-simd.md | 94 ++
>  1 file changed, 94 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 2cf6fe9154a2ee1b21ad9e8e2a6109805022be7f..b4f56a2295926f027bd53e7456eec729af0cd6df
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3382,6 +3382,100 @@
>[(set_attr "type" "neon__long")]
>  )
>  
> +(define_expand "vec_widen_saddl_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> +  emit_insn (gen_aarch64_saddl_lo_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_ssubl_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> +  emit_insn (gen_aarch64_ssubl_lo_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +(define_expand "vec_widen_saddl_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> +  emit_insn (gen_aarch64_saddl_hi_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_ssubl_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> +  emit_insn (gen_aarch64_ssubl_hi_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +(define_expand "vec_widen_uaddl_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> +  emit_insn (gen_aarch64_uaddl_lo_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_usubl_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> +  emit_insn (gen_aarch64_usubl_lo_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_uaddl_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> +  emit_insn (gen_aarch64_uaddl_hi_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_usubl_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")
> +   (match_operand:VQW 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> +  emit_insn (gen_aarch64_usubl_hi_internal (operands[0], operands[1],
> +   operands[2], p));
> +  DONE;
> +})

There are ways in which we could reduce the amount of cut-&-paste here,
but I guess everything is a trade-off between clarity and compactness.
One extreme is to write them all out explicitly, another extreme would
be to have one define_expand and various iterators and attributes.
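
As a point of reference for the lo/hi split in the patterns above, here is
a scalar model of what a widening add on one half of a vector computes.
It is an illustration only: the name widen_add_half, the lane numbering
(lane 0 first, as on little-endian) and the use of std::array are
assumptions of this sketch, not anything taken from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a widening add on one half of an N-lane vector.
// hi == false adds lanes 0 .. N/2-1, hi == true adds lanes N/2 .. N-1.
// Each narrow lane is extended to the wide type before the addition, so
// the result cannot overflow the narrow type.  Signedness follows the
// Narrow/Wide types chosen by the caller (hypothetical helper).
template <typename Narrow, typename Wide, std::size_t N>
std::array<Wide, N / 2>
widen_add_half (const std::array<Narrow, N> &a,
                const std::array<Narrow, N> &b, bool hi)
{
  static_assert (N % 2 == 0, "need an even number of lanes");
  static_assert (sizeof (Wide) == 2 * sizeof (Narrow),
                 "wide lanes are twice the narrow width");

  std::array<Wide, N / 2> result{};
  const std::size_t base = hi ? N / 2 : 0;
  for (std::size_t i = 0; i < N / 2; ++i)
    result[i] = static_cast<Wide> (a[base + i])
                + static_cast<Wide> (b[base + i]);
  return result;
}
```

Instantiating it with int16_t/int32_t models the signed variants and with
uint16_t/uint32_t the unsigned ones; the only difference between them is
the kind of extension applied to each lane.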

I think the vec_widen_mult_*_ patterns strike a good balance:
they use ANY_EXTEND to hide the sign difference while still having
separate hi and lo patterns:

(define_expand "vec_widen_mult_lo_"
  [(match_operand: 0 "

Re: [3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-13 Thread Richard Sandiford via Gcc-patches
Joel Hutton via Gcc-patches  writes:
> Hi all,
>
> This patch adds support in the aarch64 backend for the vec_widen_shift 
> vect-pattern and makes a minor mid-end fix to support it.
>
> All 3 patches together bootstrapped and regression tested on aarch64.
>
> Ok for stage 1?
>
> gcc/ChangeLog:
>
> 2020-11-12  Joel Hutton  
>
>         * config/aarch64/aarch64-simd.md: vec_widen_lshift_hi/lo 
> patterns
>         * tree-vect-stmts.c 
>         (vectorizable_conversion): Fix for widen_lshift case
>
> gcc/testsuite/ChangeLog:
>
> 2020-11-12  Joel Hutton  
>
>         * gcc.target/aarch64/vect-widen-lshift.c: New test.
>
> From 97af35b2d2a505dcefd8474cbd4bc3441b83ab02 Mon Sep 17 00:00:00 2001
> From: Joel Hutton 
> Date: Thu, 12 Nov 2020 11:48:25 +
> Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern
>
> Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
> mid-end.
> ---
>  gcc/config/aarch64/aarch64-simd.md| 66 +++
>  .../gcc.target/aarch64/vect-widen-lshift.c| 60 +
>  gcc/tree-vect-stmts.c |  9 ++-
>  3 files changed, 133 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> b4f56a2295926f027bd53e7456eec729af0cd6df..2bb39c530a1a861cb9bd3df0c2943f62bd6153d7
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4711,8 +4711,74 @@
>[(set_attr "type" "neon_sat_shift_reg")]
>  )
>  
> +(define_expand "vec_widen_shiftl_lo_"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (unspec: [(match_operand:VQW 1 "register_operand" "w")
> +  (match_operand:SI 2
> +"aarch64_simd_shift_imm_bitsize_" "i")]
> +  VSHLL))]
> +  "TARGET_SIMD"
> +  {
> +rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> +emit_insn (gen_aarch64_shll_internal (operands[0], 
> operands[1],
> +  p, operands[2]));
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_widen_shiftl_hi_"
> +   [(set (match_operand: 0 "register_operand")
> + (unspec: [(match_operand:VQW 1 "register_operand" "w")
> +  (match_operand:SI 2
> +"immediate_operand" "i")]
> +   VSHLL))]
> +   "TARGET_SIMD"
> +   {
> +rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> +emit_insn (gen_aarch64_shll2_internal (operands[0], 
> operands[1],
> +   p, operands[2]));
> +DONE;
> +   }
> +)
> +
>  ;; vshll_n
>  
> +(define_insn "aarch64_shll_internal"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (unspec: [(vec_select:
> + (match_operand:VQW 1 "register_operand" "w")
> + (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
> +  (match_operand:SI 3
> +"aarch64_simd_shift_imm_bitsize_" "i")]
> +  VSHLL))]
> +  "TARGET_SIMD"
> +  {
> +if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
> +  return "shll\\t%0., %1., %3";
> +else
> +  return "shll\\t%0., %1., %3";
> +  }
> +  [(set_attr "type" "neon_shift_imm_long")]
> +)
> +
> +(define_insn "aarch64_shll2_internal"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (unspec: [(vec_select:
> + (match_operand:VQW 1 "register_operand" "w")
> + (match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
> +  (match_operand:SI 3
> +"aarch64_simd_shift_imm_bitsize_" "i")]
> +  VSHLL))]
> +  "TARGET_SIMD"
> +  {
> +if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
> +  return "shll2\\t%0., %1., %3";
> +else
> +  return "shll2\\t%0., %1., %3";
> +  }
> +  [(set_attr "type" "neon_shift_imm_long")]
> +)
> +
>  (define_insn "aarch64_shll_n"
>[(set (match_operand: 0 "register_operand" "=w")
>   (unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
> new file mode 100644
> index 
> ..23ed93d1dcbc3ca559efa6708b4ed5855fb6a050
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
> @@ -0,0 +1,60 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -save-temps" } */
> +#include 
> +#include 
> +

SVE targets will need a:

#pragma GCC target "+nosve"

here, since we'll generate different code for SVE.

> +#define ARR_SIZE 1024
> +
> +/* Should produce an shll,shll2 pair*/
> +void sshll_opt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +{
> +foo[i]   = a[i] 

RE: [PATCH] aarch64: Make use of RTL predicates

2020-11-13 Thread Kyrylo Tkachov via Gcc-patches
Hi Andrea,

> -Original Message-
> From: Andrea Corallo 
> Sent: 10 November 2020 13:26
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; nd 
> Subject: [PATCH] aarch64: Make use of RTL predicates
> 
> Hi all,
> 
> I'd like to propose this patch to make use of RTL predicates in the
> AArch64 back-end where possible.
> 
> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> 
> Okay for trunk?

Ok. I consider these changes obvious (you can commit such changes as 
pre-approved in the future).
If these can be done mechanically I'd appreciate a similar cleanup in the arm 
backend.

Thanks,
Kyrill

> 
> Thanks
> 
>   Andrea



[PATCH] remove almost all users of gimple_expr_code

2020-11-13 Thread Richard Biener
This replaces the old-school gimple_expr_code with more selective
functions throughout the compiler, in all cases making the code
shorter or more clear.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-11-13  Richard Biener  

* cfgexpand.c (gimple_assign_rhs_to_tree): Use
gimple_assign_rhs_class.
(expand_gimple_stmt_1): Likewise.
* gimplify-me.c (gimple_regimplify_operands): Use
gimple_assign_single_p.
* ipa-icf-gimple.c (func_checker::compare_gimple_assign):
Remove redundant compare.
(func_checker::compare_gimple_cond): Use gimple_cond_code.
* tree-ssa-tail-merge.c (gimple_equal_p): Likewise.
* predict.c (predict_loops): Use gimple_assign_rhs_code.
---
 gcc/cfgexpand.c   |  7 +++
 gcc/gimplify-me.c | 12 
 gcc/ipa-icf-gimple.c  | 10 ++
 gcc/predict.c |  2 +-
 gcc/tree-ssa-tail-merge.c |  4 ++--
 5 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index b2d86859b39..1b7bdbc15be 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -103,7 +103,7 @@ tree
 gimple_assign_rhs_to_tree (gimple *stmt)
 {
   tree t;
-  switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
+  switch (gimple_assign_rhs_class (stmt))
 {
 case GIMPLE_TERNARY_RHS:
   t = build3 (gimple_assign_rhs_code (stmt),
@@ -3741,11 +3741,10 @@ expand_gimple_stmt_1 (gimple *stmt)
   of binary assigns must be a gimple reg.  */
 
if (TREE_CODE (lhs) != SSA_NAME
-   || get_gimple_rhs_class (gimple_expr_code (stmt))
-  == GIMPLE_SINGLE_RHS)
+   || gimple_assign_rhs_class (assign_stmt) == GIMPLE_SINGLE_RHS)
  {
tree rhs = gimple_assign_rhs1 (assign_stmt);
-   gcc_assert (get_gimple_rhs_class (gimple_expr_code (stmt))
+   gcc_assert (gimple_assign_rhs_class (assign_stmt)
== GIMPLE_SINGLE_RHS);
if (gimple_has_location (stmt) && CAN_HAVE_LOCATION_P (rhs)
/* Do not put locations on possibly shared trees.  */
diff --git a/gcc/gimplify-me.c b/gcc/gimplify-me.c
index 47148fbd14f..ee84c8bb194 100644
--- a/gcc/gimplify-me.c
+++ b/gcc/gimplify-me.c
@@ -230,10 +230,8 @@ gimple_regimplify_operands (gimple *stmt, 
gimple_stmt_iterator *gsi_p)
  if (i == 1 && (is_gimple_call (stmt) || is_gimple_assign (stmt)))
gimplify_expr (&op, &pre, NULL, is_gimple_lvalue, fb_lvalue);
  else if (i == 2
-  && is_gimple_assign (stmt)
-  && num_ops == 2
-  && get_gimple_rhs_class (gimple_expr_code (stmt))
- == GIMPLE_SINGLE_RHS)
+  && gimple_assign_single_p (stmt)
+  && num_ops == 2)
gimplify_expr (&op, &pre, NULL,
   rhs_predicate_for (gimple_assign_lhs (stmt)),
   fb_rvalue);
@@ -255,10 +253,8 @@ gimple_regimplify_operands (gimple *stmt, 
gimple_stmt_iterator *gsi_p)
{
  bool need_temp = false;
 
- if (is_gimple_assign (stmt)
- && num_ops == 2
- && get_gimple_rhs_class (gimple_expr_code (stmt))
-== GIMPLE_SINGLE_RHS)
+ if (gimple_assign_single_p (stmt)
+ && num_ops == 2)
gimplify_expr (gimple_assign_rhs1_ptr (stmt), &pre, NULL,
   rhs_predicate_for (gimple_assign_lhs (stmt)),
   fb_rvalue);
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index d5423a7e9b2..b755d7ec847 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -610,12 +610,6 @@ func_checker::compare_gimple_assign (gimple *s1, gimple 
*s2)
   tree_code code1, code2;
   unsigned i;
 
-  code1 = gimple_expr_code (s1);
-  code2 = gimple_expr_code (s2);
-
-  if (code1 != code2)
-return false;
-
   code1 = gimple_assign_rhs_code (s1);
   code2 = gimple_assign_rhs_code (s2);
 
@@ -652,8 +646,8 @@ func_checker::compare_gimple_cond (gimple *s1, gimple *s2)
   tree t1, t2;
   tree_code code1, code2;
 
-  code1 = gimple_expr_code (s1);
-  code2 = gimple_expr_code (s2);
+  code1 = gimple_cond_code (s1);
+  code2 = gimple_cond_code (s2);
 
   if (code1 != code2)
 return false;
diff --git a/gcc/predict.c b/gcc/predict.c
index 361c4019eec..3acbb86b75f 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2204,7 +2204,7 @@ predict_loops (void)
 {
   gimple *call_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
   if (gimple_code (call_stmt) == GIMPLE_ASSIGN
-  && gimple_expr_code (call_stmt) == NOP_EXPR
+  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (call_stmt))
   && TREE_CODE (gimple_assign_rhs1 (call_stmt)) == SSA_NAME)
 call_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (call_stmt));
   if (gimple_call_internal_p (call_stmt, IFN_B

RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-11-13 Thread Sudakshina Das via Gcc-patches
Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 11 November 2020 17:52
> To: Sudakshina Das 
> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das  writes:
> > Apologies for the delay. I have attached another version of the patch.
> > I have disabled the test cases for ILP32. This is only because the
> > function body check fails: there is an additional unsigned extension
> > instruction for the src pointer in every test (uxtw x0, w0). The actual
> > inlining is not different.
> 
> Yeah, agree that's the best way of handling the ILP32 difference.
> 
> > […]
> > +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant.
> Without
> > +   -mstrict-align, make decisions in "setmem".  Otherwise follow a sensible
> > +   default:  when optimizing for size adjust the ratio to account for
> > +the
> 
> nit: should just be one space after “:”
> 
> > […]
> > @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
> >return true;
> >  }
> >
> > +/* Like aarch64_copy_one_block_and_progress_pointers, except for
> memset where
> > +   *src is a register we have created with the duplicated value to be
> > +set.  */
> 
> “*src” -> SRC
> since there's no dereference now
> 
> > […]
> > +  /* In case we are optimizing for size or if the core does not
> > + want to use STP Q regs, lower the max_set_size.  */
> > +  max_set_size = (!speed_p
> > + || (aarch64_tune_params.extra_tuning_flags
> > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> > + ? max_set_size/2 : max_set_size;
> 
> Formatting nit: should be a space either side of “/”.
> 
> > +  while (n > 0)
> > +{
> > +  /* Find the largest mode in which to do the copy in without
> > +over writing.  */
> 
> s/in without/without/
> 
> > +  opt_scalar_int_mode mode_iter;
> > +  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> > +   if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> > + cur_mode = mode_iter.require ();
> > +
> > +  gcc_assert (cur_mode != BLKmode);
> > +
> > +  mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> > +  aarch64_set_one_block_and_progress_pointer (src, &dst,
> > + cur_mode);
> > +
> > +  n -= mode_bits;
> > +
> > +  /* Do certain trailing copies as overlapping if it's going to be
> > +cheaper.  i.e. less instructions to do so.  For instance doing a 15
> > +byte copy it's more efficient to do two overlapping 8 byte copies
> than
> > +8 + 4 + 2 + 1.  */
> > +  if (n > 0 && n < copy_limit / 2)
> > +   {
> > + next_mode = smallest_mode_for_size (n, MODE_INT);
> > + int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> 
> Sorry for the runaround, but looking at this again, I'm a bit worried that we
> only indirectly test that n_bits is within the length of the original set.  I 
> guess
> it is because if n < copy_limit / 2 then n < mode_bits, and so n_bits will 
> never
> exceed mode_bits.  I think it might be worth adding an assert to make that
> “clearer” (maybe only to me, probably obvious to everyone else):
> 
> gcc_assert (n_bits <= mode_bits);
> 
> OK with those changes, thanks.

Thank you! Committed as 54bbde5 with those changes.

Sudi
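
For readers following the n_bits <= mode_bits argument above, the
overlapping-tail strategy can be modelled outside the compiler roughly as
follows. This is a simplified sketch under stated assumptions, not the
code from the patch: it works in bytes rather than bits, it stands in for
the integer machine modes with power-of-two sizes, and the names
plan_stores, largest_pow2_at_most and smallest_pow2_at_least are made up
for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Largest power-of-two size (in bytes) that is <= limit; limit >= 1.
static std::size_t largest_pow2_at_most (std::size_t limit)
{
  std::size_t size = 1;
  while (size * 2 <= limit)
    size *= 2;
  return size;
}

// Smallest power-of-two size (in bytes) that is >= n; n >= 1.
static std::size_t smallest_pow2_at_least (std::size_t n)
{
  std::size_t size = 1;
  while (size < n)
    size *= 2;
  return size;
}

// Return the (offset, size) pairs of the stores used to set LEN bytes
// when no single store may exceed COPY_LIMIT bytes.
std::vector<std::pair<std::size_t, std::size_t>>
plan_stores (std::size_t len, std::size_t copy_limit)
{
  assert (copy_limit >= 1);
  std::vector<std::pair<std::size_t, std::size_t>> stores;
  std::size_t offset = 0;
  std::size_t n = len;
  while (n > 0)
    {
      // Widest store that neither exceeds the limit nor overruns the end.
      std::size_t cur = largest_pow2_at_most (std::min (n, copy_limit));
      stores.emplace_back (offset, cur);
      offset += cur;
      n -= cur;

      // Short tail: use one wider store that overlaps bytes already set.
      if (n > 0 && n < copy_limit / 2)
        {
          std::size_t next = smallest_pow2_at_least (n);
          // The property the review asks to assert: the tail store is
          // never wider than the store just emitted, so moving back
          // cannot reach before the start of the buffer.
          assert (next <= cur);
          offset -= next - n;
          n = next;
        }
    }
  return stores;
}
```

With len = 15 and copy_limit = 16 this yields two 8-byte stores at
offsets 0 and 7, which is the 15-byte case described in the quoted
comment.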

> 
> Richard
> 
> > + dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> > + n = n_bits;
> > +   }
> > +}
> > +
> > +  return true;
> > +}
> > +
> > +
> >  /* Split a DImode store of a CONST_INT SRC to MEM DST as two
> > SImode stores.  Handle the case when the constant has identical
> > bottom and top halves.  This is beneficial when the two stores can
> > be


Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 12/11/20 23:49 +, Jonathan Wakely wrote:

To poll a std::future to see if it's ready you have to call one of the
timed waiting functions. The most obvious way is wait_for(0s) but this
was previously very inefficient because it would turn the relative
timeout to an absolute one by calling system_clock::now(). When the
relative timeout is zero (or less) we're obviously going to get a time
that has already passed, but the overhead of obtaining the current time
can be dozens of microseconds. The alternative is to call wait_until
with an absolute timeout that is in the past. If you know the clock's
epoch is in the past you can use a default constructed time_point.
Alternatively, using some_clock::time_point::min() gives the earliest
time point supported by the clock, which should be safe to assume is in
the past. However, using a futex wait with an absolute timeout before
the UNIX epoch fails and sets errno=EINVAL. The new code using futex
waits with absolute timeouts was not checking for this case, which could
result in hangs (or killing the process if the library is built with
assertions enabled).

This patch checks for times before the epoch before attempting to wait
on a futex with an absolute timeout, which fixes the hangs or crashes.
It also makes it very fast to poll using an absolute timeout before the
epoch (because we skip the futex syscall).

It also makes future::wait_for avoid waiting at all when the relative
timeout is zero or less, to avoid the unnecessary overhead of getting
the current time. This makes polling with wait_for(0s) take only a few
cycles instead of dozens of microseconds.
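
For illustration, the polling idiom this change targets looks roughly
like the sketch below. The helper name is_ready is hypothetical; only
std::future::wait_for and std::future_status are standard library names.

```cpp
#include <chrono>
#include <future>

// Poll a future without blocking: a zero relative timeout asks only
// whether the shared state is ready right now.  Note that a future
// created with std::launch::deferred reports future_status::deferred
// here, so this helper returns false for it until get() or wait() runs.
template <typename T>
bool is_ready (const std::future<T> &f)
{
  return f.wait_for (std::chrono::seconds (0))
         == std::future_status::ready;
}
```

A caller would typically invoke is_ready in a loop between other units of
work, which is why the per-call overhead matters.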

libstdc++-v3/ChangeLog:

* include/std/future (future::wait_for): Do not wait for
durations less than or equal to zero.
* src/c++11/futex.cc (_M_futex_wait_until)
(_M_futex_wait_until_steady): Do not wait for timeouts before
the epoch.
* testsuite/30_threads/future/members/poll.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

I think the shortcut in future::wait_for is worth backporting. The
changes in src/c++11/futex.cc are not needed because the code using
absolute timeouts with futex waits is not present on any release
branch.


I've committed this fix for the new test.

Tested x86_64-linux, sparc-solaris-2.11 and powerpc-aix.


commit 8c4e33d2032ab150748ea2fe1df2b1c00652a338
Author: Jonathan Wakely 
Date:   Fri Nov 13 10:04:33 2020

libstdc++: Add -pthread options to std::future polling test

For linux targets this test doesn't need -lpthread because it only uses
atomics, but for all other targets std::call_once still needs pthreads.
Add the necessary test directives to make that work.

The timings in this test might be too fragile or too target-specific, so
it might need to be adjusted in future, or restricted to only run on
specific targets. For now I've increased the allowed ratio between
wait_for calls before and after the future is made ready, because it was
failing with -O3 -march=native sometimes.

libstdc++-v3/ChangeLog:

* testsuite/30_threads/future/members/poll.cc: Require gthreads
and add -pthread for targets that require it. Relax required
ratio of wait_for calls before/after the future is ready.

diff --git a/libstdc++-v3/testsuite/30_threads/future/members/poll.cc b/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
index 54580579d3a1..fff9bea899c9 100644
--- a/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
+++ b/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
@@ -17,6 +17,8 @@
 
 // { dg-options "-O3" }
 // { dg-do run { target c++11 } }
+// { dg-additional-options "-pthread" { target pthread } }
+// { dg-require-gthreads "" }
 
 #include 
 #include 
@@ -49,20 +51,6 @@ int main()
   auto stop = chrono::high_resolution_clock::now();
   double wait_for_0 = print("wait_for(0s)", stop - start);
 
-  start = chrono::high_resolution_clock::now();
-  for(int i = 0; i < iterations; i++)
-f.wait_until(chrono::system_clock::time_point());
-  stop = chrono::high_resolution_clock::now();
-  double wait_until_sys_epoch __attribute__((unused))
-= print("wait_until(system_clock epoch)", stop - start);
-
-  start = chrono::high_resolution_clock::now();
-  for(int i = 0; i < iterations; i++)
-f.wait_until(chrono::steady_clock::time_point());
-  stop = chrono::high_resolution_clock::now();
-  double wait_until_steady_epoch __attribute__((unused))
-= print("wait_until(steady_clock epoch", stop - start);
-
   start = chrono::high_resolution_clock::now();
   for(int i = 0; i < iterations; i++)
 f.wait_until(chrono::system_clock::time_point::min());
@@ -77,6 +65,20 @@ int main()
   double wait_until_steady_min __attribute__((unused))
 = print("wait_until(steady_clock minimum)", stop - start);
 
+  start = chrono::high_resolution_clock::now();
+  for(int i = 0; i < iterations; i++)
+f.wait_until(chrono::sys

Re: [PATCH] aarch64: Make use of RTL predicates

2020-11-13 Thread Richard Sandiford via Gcc-patches
Andrea Corallo via Gcc-patches  writes:
> Hi all,
>
> I'd like to propose this patch to make use of RTL predicates in the
> AArch64 back-end where possible.

Nice cleanup :-)

> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
>
> Okay for trunk?

OK, thanks.

Richard


[PATCH] libstdc++: Fix error shown during Solaris build

2020-11-13 Thread Jonathan Wakely via Gcc-patches
Currently this is shown when building libstdc++ on Solaris:

-lrt: open: No such file or directory

The error comes from the make_sunver.pl script which tries to open each
of its arguments. The arguments are passed by this make rule:

perl ${glibcxx_srcdir}/scripts/make_exports.pl \
  libstdc++-symbols.ver \
  $(libstdc___la_OBJECTS:%.lo=.libs/%.o) \
 `echo $(libstdc___la_LIBADD) | \
sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
 > $@ || (rm -f $@ ; exit 1)

The $(libstdc___la_LIBADD) variable includes $(GLIBCXX_LIBS) which
contains -lrt on Solaris.

This patch adds another sed script to filter -l arguments from the echo
command. In order to reliably match ' -l[^ ]* ' the echo arguments are
quoted and a space added before and after them. This might be overkill
just to remove -lrt from the start of the string, but should be robust
in case other -l arguments are added to $(GLIBCXX_LIBS), or in case the
$(libstdc___la_LIBADD) libraries are reordered.
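
The effect of the two sed expressions on the padded string can be modelled
as in the sketch below, written in C++ to match the other examples in this
digest. It is an approximation under stated assumptions: the function name
filter_libadd is hypothetical, and std::regex in its default ECMAScript
mode stands in for sed's basic regular expressions.

```cpp
#include <regex>
#include <string>

// Model of:  echo ' $(LIBADD) ' | sed -e 's,/\([^/.]*\)\.la,/.libs/\1.a,g'
//                                     -e 's/ -l[^ ]* / /'
std::string filter_libadd (const std::string &libadd)
{
  // Pad with a space on each side so that a -l option at the start or
  // end of the list is delimited exactly like one in the middle.
  std::string s = " " + libadd + " ";

  // First expression (global): dir/name.la -> dir/.libs/name.a
  static const std::regex la_re ("/([^/.]*)\\.la");
  s = std::regex_replace (s, la_re, "/.libs/$1.a");

  // Second expression (no g flag, so first match only): drop " -lfoo ".
  static const std::regex l_re (" -l[^ ]* ");
  s = std::regex_replace (s, l_re, " ",
                          std::regex_constants::format_first_only);
  return s;
}
```

In this model, as with a sed substitution that has no g flag, only the
first -l argument is removed; whether that matters depends on how many
such options $(GLIBCXX_LIBS) ends up containing.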

libstdc++-v3/ChangeLog:

* src/Makefile.am (libstdc++-symbols.ver-sun): Remove -lrt from
arguments passed to make_sunver.pl script.
* src/Makefile.in: Regenerate.

Tested sparc-solaris2.11. Rainer, does this look OK?

Iain, the libstdc++-symbols.explist target for Darwin is very similar,
but I don't know if it's a problem there. Does GLIBCXX_LIBS contain
anything in $target/libstdc++-v3/src/Makefile on Darwin?

Should we make the same change just in case?


commit 5b9b2158b08650b15049564a4e87b7b5cac49759
Author: Jonathan Wakely 
Date:   Fri Nov 13 10:39:23 2020

libstdc++: Fix error shown during Solaris build

Currently this is shown when building libstdc++ on Solaris:

-lrt: open: No such file or directory

The error comes from the make_sunver.pl script which tries to open each
of its arguments. The arguments are passed by this make rule:

perl ${glibcxx_srcdir}/scripts/make_exports.pl \
  libstdc++-symbols.ver \
  $(libstdc___la_OBJECTS:%.lo=.libs/%.o) \
 `echo $(libstdc___la_LIBADD) | \
sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
 > $@ || (rm -f $@ ; exit 1)

The $(libstdc___la_LIBADD) variable includes $(GLIBCXX_LIBS) which
contains -lrt on Solaris.

This patch adds another sed script to filter -l arguments from the echo
command. In order to reliably match ' -l[^ ]* ' the echo arguments are
quoted and a space added before and after them. This might be overkill
just to remove -lrt from the start of the string, but should be robust
in case other -l arguments are added to $(GLIBCXX_LIBS), or in case the
$(libstdc___la_LIBADD) libraries are reordered.

libstdc++-v3/ChangeLog:

* src/Makefile.am (libstdc++-symbols.ver-sun): Remove -lrt from
arguments passed to make_sunver.pl script.
* src/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index 1eda70edb379..21b6db7fb1c3 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -269,8 +269,8 @@ libstdc++-symbols.ver-sun : libstdc++-symbols.ver \
perl $(toplevel_srcdir)/contrib/make_sunver.pl \
  libstdc++-symbols.ver \
  $(libstdc___la_OBJECTS:%.lo=.libs/%.o) \
-`echo $(libstdc___la_LIBADD) | \
-   sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
+`echo ' $(libstdc___la_LIBADD) ' | \
+   sed -e 's,/\([^/.]*\)\.la,/.libs/\1.a,g' -e 's/ -l[^ ]* / /'` \
 > $@ || (rm -f $@ ; exit 1)
 endif
 if ENABLE_SYMVERS_DARWIN


Re: [PATCH] libstdc++: Fix error shown during Solaris build

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 11:07 +, Jonathan Wakely wrote:

Currently this is shown when building libstdc++ on Solaris:

-lrt: open: No such file or directory

The error comes from the make_sunver.pl script which tries to open each
of its arguments. The arguments are passed by this make rule:

perl ${glibcxx_srcdir}/scripts/make_exports.pl \
  libstdc++-symbols.ver \
  $(libstdc___la_OBJECTS:%.lo=.libs/%.o) \
 `echo $(libstdc___la_LIBADD) | \
sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
 > $@ || (rm -f $@ ; exit 1)

The $(libstdc___la_LIBADD) variable includes $(GLIBCXX_LIBS) which
contains -lrt on Solaris.

This patch adds another sed script to filter -l arguments from the echo
command. In order to reliably match ' -l[^ ]* ' the echo arguments are
quoted and a space added before and after them. This might be overkill
just to remove -lrt from the start of the string, but should be robust
in case other -l arguments are added to $(GLIBCXX_LIBS), or in case the
$(libstdc___la_LIBADD) libraries are reordered.

libstdc++-v3/ChangeLog:

* src/Makefile.am (libstdc++-symbols.ver-sun): Remove -lrt from
arguments passed to make_sunver.pl script.
* src/Makefile.in: Regenerate.

Tested sparc-solaris2.11. Rainer, does this look OK?

Iain, the libstdc++-symbols.explist target for Darwin is very similar,
but I don't know if it's a problem there. Does GLIBCXX_LIBS contain
anything in $target/libstdc++-v3/src/Makefile on Darwin?

Should we make the same change just in case?


On examining acinclude.m4 it looks like GLIBCXX_LIBS could in theory
be non-empty for any target, including Darwin:

  elif test x"$enable_libstdcxx_time" != x"no"; then

if test x"$enable_libstdcxx_time" = x"rt"; then
  AC_SEARCH_LIBS(clock_gettime, [rt posix4])
  AC_SEARCH_LIBS(nanosleep, [rt posix4])
else
  AC_SEARCH_LIBS(clock_gettime, [posix4])
  AC_SEARCH_LIBS(nanosleep, [posix4])
fi

case "$ac_cv_search_clock_gettime" in
  -l*) GLIBCXX_LIBS=$ac_cv_search_clock_gettime
  ;;
esac
case "$ac_cv_search_nanosleep" in
  -l*) GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_nanosleep"
  ;;
esac

AC_SEARCH_LIBS(sched_yield, [rt posix4])

case "$ac_cv_search_sched_yield" in
  -lposix4*)
  GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_sched_yield"
  ac_has_sched_yield=yes
  ;;
  -lrt*)
  if test x"$enable_libstdcxx_time" = x"rt"; then
GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_sched_yield"
ac_has_sched_yield=yes
  fi
  ;;
  *)
  ac_has_sched_yield=yes
  ;;
esac

But in practice the snippet above is only used if you explicitly
configure with --enable-libstdcxx-time={yes,rt} and will only add
anything to GLIBCXX_LIBS if clock_gettime or nanosleep lives in one of
librt or libposix4. I think libposix4 is Solaris-specific, and I don't
think Darwin has librt. So in practice I don't think there's a
problem on Darwin today.





Re: [PATCH] libstdc++: Fix error shown during Solaris build

2020-11-13 Thread Iain Sandoe via Gcc-patches

Jonathan Wakely  wrote:


On 13/11/20 11:07 +, Jonathan Wakely wrote:

Currently this is shown when building libstdc++ on Solaris:

-lrt: open: No such file or directory

The error comes from the make_sunver.pl script which tries to open each
of its arguments. The arguments are passed by this make rule:

perl ${glibcxx_srcdir}/scripts/make_exports.pl \
  libstdc++-symbols.ver \
  $(libstdc___la_OBJECTS:%.lo=.libs/%.o) \
 `echo $(libstdc___la_LIBADD) | \
sed 's,/\([^/.]*\)\.la,/.libs/\1.a,g'` \
 > $@ || (rm -f $@ ; exit 1)

The $(libstdc___la_LIBADD) variable includes $(GLIBCXX_LIBS) which
contains -lrt on Solaris.

This patch adds another sed script to filter -l arguments from the echo
command. In order to reliably match ' -l[^ ]* ' the echo arguments are
quoted and a space added before and after them. This might be overkill
just to remove -lrt from the start of the string, but should be robust
in case other -l arguments are added to $(GLIBCXX_LIBS), or in case the
$(libstdc___la_LIBADD) libraries are reordered.

libstdc++-v3/ChangeLog:

* src/Makefile.am (libstdc++-symbols.ver-sun): Remove -lrt from
arguments passed to make_sunver.pl script.
* src/Makefile.in: Regenerate.

Tested sparc-solaris2.11. Rainer, does this look OK?

Iain, the libstdc++-symbols.explist target for Darwin is very similar,
but I don't know if it's a problem there. Does GLIBCXX_LIBS contain
anything in $target/libstdc++-v3/src/Makefile on Darwin?

Should we make the same change just in case?


On examining acinclude.m4 it looks like GLIBCXX_LIBS could in theory
be non-empty for any target, including Darwin:

 elif test x"$enable_libstdcxx_time" != x"no"; then

   if test x"$enable_libstdcxx_time" = x"rt"; then
 AC_SEARCH_LIBS(clock_gettime, [rt posix4])
 AC_SEARCH_LIBS(nanosleep, [rt posix4])
   else
 AC_SEARCH_LIBS(clock_gettime, [posix4])
 AC_SEARCH_LIBS(nanosleep, [posix4])
   fi

   case "$ac_cv_search_clock_gettime" in
 -l*) GLIBCXX_LIBS=$ac_cv_search_clock_gettime
 ;;
   esac
   case "$ac_cv_search_nanosleep" in
 -l*) GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_nanosleep"
 ;;
   esac

   AC_SEARCH_LIBS(sched_yield, [rt posix4])

   case "$ac_cv_search_sched_yield" in
 -lposix4*)
 GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_sched_yield"
 ac_has_sched_yield=yes
 ;;
 -lrt*)
 if test x"$enable_libstdcxx_time" = x"rt"; then
GLIBCXX_LIBS="$GLIBCXX_LIBS $ac_cv_search_sched_yield"
   ac_has_sched_yield=yes
 fi
 ;;
 *)
 ac_has_sched_yield=yes
 ;;
   esac

But in practice the snippet above is only used if you explicitly
configure with --enable-libstdcxx-time={yes,rt} and will only add
anything to GLIBCXX_LIBS if clock_gettime or nanosleep lives in one of
librt or libposix4. I think libposix4 is Solaris-specific, and I don't
think Darwin has librt. So in practice I don't think there's a
problem on Darwin today.


Agreed, neither of those libs is currently in use on Darwin.

There have, in some cases, been library entries that are simply a symlink to
the one providing the equivalent functionality, to minimize cross-platform
build hassles - but I see no entry for librt.dylib or posix4.

thanks
Iain




[PATCH] tree-optimization/97812 - fix range query in VRP assert discovery

2020-11-13 Thread Richard Biener
This makes sure to properly extend the input range before seeing
whether it fits the target.
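
As a rough illustration of why the extension step matters, here is a small C++ sketch. It does not use GCC's wide_int API; the names extend and fits, and the restriction to precisions of at most 32 bits, are assumptions made for the example. The same 16-bit pattern gives a different answer to "does it fit in a signed char" depending on whether it is first widened as signed or as unsigned.

```cpp
#include <cassert>
#include <cstdint>

enum class Sign { Signed, Unsigned };

// Interpret the low `prec` bits of `raw` with the given signedness and widen
// the result to 64 bits.  Sketch only: supports 1 <= prec <= 32.
std::int64_t extend(std::uint64_t raw, unsigned prec, Sign sign) {
  assert(prec >= 1 && prec <= 32);
  const std::uint64_t mask = (std::uint64_t{1} << prec) - 1;
  std::uint64_t low = raw & mask;
  if (sign == Sign::Signed && ((low >> (prec - 1)) & 1))
    low |= ~mask;  // replicate the sign bit into the upper bits
  // Two's complement conversion, as on all targets GCC supports.
  return static_cast<std::int64_t>(low);
}

// Does `value` fit in a type of `prec` bits with the given signedness?
bool fits(std::int64_t value, unsigned prec, Sign sign) {
  assert(prec >= 1 && prec <= 32);
  if (sign == Sign::Unsigned)
    return value >= 0 && value <= (std::int64_t{1} << prec) - 1;
  const std::int64_t half = std::int64_t{1} << (prec - 1);
  return value >= -half && value <= half - 1;
}
```

With these helpers, the pattern 0xFFFF extended as an unsigned 16-bit value is 65535 and does not fit an 8-bit signed type, while extended as a signed 16-bit value it is -1 and does fit. Skipping the extension step amounts to picking one of those interpretations by accident.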

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-13  Richard Biener  

PR tree-optimization/97812
* tree-vrp.c (register_edge_assert_for_2): Extend the range
according to its sign before seeing whether it fits.

* gcc.dg/torture/pr97812.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr97812.c | 15 +++
 gcc/tree-vrp.c | 10 --
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr97812.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr97812.c 
b/gcc/testsuite/gcc.dg/torture/pr97812.c
new file mode 100644
index 000..4d468adf8fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr97812.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fdisable-tree-evrp" } */
+
+unsigned char c;
+
+int main() {
+volatile short b = 4066;
+  unsigned short bp = b;
+  unsigned d = bp & 2305;
+  signed char e = d;
+  c = e ? : e;
+  if (!d)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 54ce017e8b2..d661866630e 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1740,8 +1740,14 @@ register_edge_assert_for_2 (tree name, edge e,
  && ((TYPE_PRECISION (TREE_TYPE (name))
   > TYPE_PRECISION (TREE_TYPE (rhs1)))
  || (get_range_info (rhs1, &rmin, &rmax) == VR_RANGE
- && wi::fits_to_tree_p (rmin, TREE_TYPE (name))
- && wi::fits_to_tree_p (rmax, TREE_TYPE (name)
+ && wi::fits_to_tree_p
+  (widest_int::from (rmin,
+ TYPE_SIGN (TREE_TYPE (rhs1))),
+   TREE_TYPE (name))
+ && wi::fits_to_tree_p
+  (widest_int::from (rmax,
+ TYPE_SIGN (TREE_TYPE (rhs1))),
+   TREE_TYPE (name)
add_assert_info (asserts, rhs1, rhs1,
 comp_code, fold_convert (TREE_TYPE (rhs1), val));
}
-- 
2.26.2


Re: [PATCH] Use SHF_GNU_RETAIN to preserve symbol definitions

2020-11-13 Thread Jozef Lawrynowicz
On Thu, Nov 12, 2020 at 02:41:52PM -0800, H.J. Lu wrote:
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 435c7b348a5..c48ef9692ee 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -289,6 +289,10 @@ get_section (const char *name, unsigned int flags, tree 
> decl,
>slot = section_htab->find_slot_with_hash (name, htab_hash_string (name),
>   INSERT);
>flags |= SECTION_NAMED;
> +#if HAVE_GAS_SHF_GNU_RETAIN
> +  if (decl != nullptr && DECL_PRESERVE_P (decl))

Minor nit, but I think this should be "decl != NULL_TREE".

We should also test that "used" with the "section" attribute applies the
"R" flag. Please apply the attached patch if this gets approved. These
new tests pass with arm-none-eabi and x86_64-pc-linux-gnu.

Thanks,
Jozef
commit cf8e26deb43d13268ab6ee231995aecbf41ba3a3
Author: Jozef Lawrynowicz 
Date:   Fri Nov 13 11:07:14 2020 +

Test "used" attribute in conjunction with "section" attribute

diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c 
b/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c
index b7763af11e4..5f6cbca6e33 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c
@@ -4,6 +4,7 @@
 /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".rodata.*,\"aR\"" } } */
+/* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 
 void __attribute__((used)) used_fn (void) { }
 void unused_fn (void) { }
@@ -30,3 +31,5 @@ int __attribute__((used)) used_data2 = 1;
 const int __attribute__((used)) used_rodata2 = 2;
 int __attribute__((used)) used_comm2;
 static int __attribute__((used)) used_lcomm2;
+
+int __attribute__((used,section(".data.used_foo_sec"))) used_foo = 2;
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c 
b/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c
index e3b3cf184f8..be5f3917ac8 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c
@@ -10,6 +10,7 @@
 /* { dg-final { scan-assembler ".rodata.used_rodata2,\"aR\"" } } */
 /* { dg-final { scan-assembler ".bss.used_lcomm,\"awR\"" { target arm-*-* } } 
} */
 /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } } 
} */
+/* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 /* { dg-options "-ffunction-sections -fdata-sections" } */
 
 #include "attr-used-retain-1.c"


[committed] [OG10] Backport OpenMP 5.0 features from master

2020-11-13 Thread Kwok Cheung Yeung

Hello

I have backported a couple of patches related to OpenMP 5.0 features from master 
to the devel/omp/gcc-10 branch. These are:


8949b985dbaf07d433bd57d2883e1e5414f20e75: openmp: Add support for the 
omp_get_supported_active_levels runtime library routine


445567b22a3c535be0b1861b393e9a0b050f2b1e: libgomp: Amend documentation for 
omp_get_max_active_levels and omp_get_supported_active_levels


1bfc07d150790fae93184a79a7cce897655cb37b: openmp: Implement support for 
OMP_TARGET_OFFLOAD environment variable


35f258f4bbba7fa044f90b4f14d1bc942db58089: libgomp: Fix up bootstrap in 
libgomp/target.c due to false positive warning


121a8812c45b3155ccbd268b000ad00a778e81e8: libgomp: Hopefully avoid false 
positive warnings in env.c on solaris


74c9882b80bda50b37c9555498de7123c6bdb9e4: openmp: Change omp_get_initial_device 
() to match OpenMP 5.1 requirements


17c5b7e1dc47bab6e6cedbf4b2d88cef3283533e: openmp: Add test for 
OMP_TARGET_OFFLOAD=mandatory for cases where it must not fail


10508db867934264bbc2578f1f454c19fa558fd3: openmp: Mark deprecated symbols in 
OpenMP 5.0


I have tested that these cause no regressions in the libgomp testsuite with both 
AMD GCN and Nvidia offloading.


Kwok


Re: [2/3][vect] Add widening add, subtract vect patterns

2020-11-13 Thread Richard Sandiford via Gcc-patches
[ There was a discussion on irc about how easy it would be to support
  internal functions and tree codes at the same time, so the agreement
  was to go for tree codes for now with a promise to convert the
  widening-related code to use internal functions for GCC 12. ]

Like Richard said, the new patterns need to be documented in md.texi
and the new tree codes need to be documented in generic.texi.

While we're using tree codes, I think we need to make the naming
consistent with other tree codes: WIDEN_PLUS_EXPR instead of
WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
Same idea for the VEC_* codes.

(In contrast, the internal functions do try to follow the optab names,
since there's usually a 1:1 correspondence.)
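
For readers without the patch to hand, the scalar shape that a widening add describes looks roughly like the C++ sketch below: two narrow inputs are added and the result is kept in a type twice as wide. The function name is invented, and whether the vectorizer actually matches such a loop to the new pattern depends on the target and options, which this sketch makes no claim about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example loop: 8-bit inputs, 16-bit sums, so no sum can wrap.
void widening_add(const std::uint8_t *a, const std::uint8_t *b,
                  std::uint16_t *out, std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<std::uint16_t>(a[i] + b[i]);
}
```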

Joel Hutton via Gcc-patches  writes:
> Hi all,
>
> This patch adds widening add and widening subtract patterns to 
> tree-vect-patterns.
>
> All 3 patches together bootstrapped and regression tested on aarch64.
>
> gcc/ChangeLog:
>
> 2020-11-12  Joel Hutton  
>
>         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases

Not that I personally care about this stuff (would love to see changelogs
go away :-)) but some nits:

Each description is supposed to start with a capital letter and end with
a full stop (even if it's not a complete sentence).  Same for the rest
of the log.

>         * optabs-tree.c (optab_for_tree_code): optabs for widening 
> adds,subtracts

The line limit for changelogs is 80 characters.  The entry should say
what changed, so “Handle …” or “Add case for …” or something.

>         * optabs.def (OPTAB_D): define vectorized widen add, subtracts
>         * tree-cfg.c (verify_gimple_assign_binary): Add case for widening 
> adds, subtracts
>         * tree-inline.c (estimate_operator_cost): Add case for widening adds, 
> subtracts
>         * tree-vect-generic.c (expand_vector_operations_1): Add case for 
> widening adds, subtracts
>         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog 
> ptatern

typo: pattern

>         (vect_recog_widen_sub_pattern): New recog pattern
>         (vect_recog_average_pattern): Update widened add code
>         (vect_recog_average_pattern): Update widened add code
>         * tree-vect-stmts.c (vectorizable_conversion): Add case for widened 
> add, subtract
>         (supportable_widening_operation): Add case for widened add, subtract
>         * tree.def (WIDEN_ADD_EXPR): New tree code
>         (WIDEN_SUB_EXPR): New tree code
>         (VEC_WIDEN_ADD_HI_EXPR): New tree code
>         (VEC_WIDEN_ADD_LO_EXPR): New tree code
>         (VEC_WIDEN_SUB_HI_EXPR): New tree code
>         (VEC_WIDEN_SUB_LO_EXPR): New tree code
>
> gcc/testsuite/ChangeLog:
>
> 2020-11-12  Joel Hutton  
>
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
>
>
> Ok for trunk?
>
> From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
> From: Joel Hutton 
> Date: Mon, 9 Nov 2020 15:44:18 +
> Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns
>
> Add widening add, subtract patterns to tree-vect-patterns.
> Add aarch64 tests for patterns.
>
> fix sad

Would be good to expand on this for the final commit message.

> […]
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 
> 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21
>  100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree 
> type,
>return (TYPE_UNSIGNED (type)
> ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
>  
> +case VEC_WIDEN_ADD_LO_EXPR:
> +  return (TYPE_UNSIGNED (type)
> +   ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
> +
> +case VEC_WIDEN_ADD_HI_EXPR:
> +  return (TYPE_UNSIGNED (type)
> +   ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
> +
> +case VEC_WIDEN_SUB_LO_EXPR:
> +  return (TYPE_UNSIGNED (type)
> +   ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
> +
> +case VEC_WIDEN_SUB_HI_EXPR:
> +  return (TYPE_UNSIGNED (type)
> +   ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> +
> +

Nits: excess blank line at the end and excess space before the “:”s.

>  case VEC_UNPACK_HI_EXPR:
>return (TYPE_UNSIGNED (type)
> ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, 
> "vec_widen_smult_even_$a")
>  OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
>  OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
>  OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
> +OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssu

[pushed] doc : Fix build error from r11-4972.

2020-11-13 Thread Iain Sandoe

(another re-send, no sign of the message on patches archive)

Hi

As reported on irc, some tex tools don’t like @r{} commands being
split.

For the record, I tried a number of things to wrap the line:

1/ putting the @r{} on the line after the @item
  that didn't work - the (Objective-C and Objective-C++ only) also appeared on
  the following line.

2/ splitting on the space following the ‘and’, but that resulted in:
  (Objective-C andObjective-C++ only) regardless of whether I appended a space
  to the end of the first line or the start of the second.

3/ splitting as per the original patch
  worked OK for my installation (tex live 2019).

.. if there’s a general solution to this, I’d be interested.

The suggested fix was to allow the long line, at least in the short term;
pushed to master.

Iain



Some tex tools don't allow the @r{} command to be split across
lines.  Fixed by making the change occupy a long line.

gcc/ChangeLog:

* doc/extend.texi: Don't try to line-wrap an @r command.
---
gcc/doc/extend.texi | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4e5197fc038..4ddbf80a229 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7429,8 +7429,7 @@ data in this way can reduce program startup times.  This attribute is
specific to ELF targets and relies on the linker to place such data in
the right location
-@item objc_nullability (@var{nullability kind}) @r{(Objective-C and Objective
--C++ only)}
+@item objc_nullability (@var{nullability kind}) @r{(Objective-C and Objective-C++ only)}
@cindex @code{objc_nullability} variable attribute
This attribute applies to pointer variables only.  It allows marking the
pointer with one of four possible values describing the conditions under
--
2.24.1


Re: [PATCH] aarch64: Make use of RTL predicates

2020-11-13 Thread Andrea Corallo via Gcc-patches
Kyrylo Tkachov via Gcc-patches  writes:

> Hi Andrea,
>
>> -Original Message-
>> From: Andrea Corallo 
>> Sent: 10 November 2020 13:26
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov ; Richard Earnshaw
>> ; nd 
>> Subject: [PATCH] aarch64: Make use of RTL predicates
>> 
>> Hi all,
>> 
>> I'd like to propose this patch to make use of RTL predicates into the
>> AArch64 back-end where possible.
>> 
>> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
>> 
>> Okay for trunk?
>
> Ok. I consider these changes obvious (you can commit such changes as 
> pre-approved in the future).
> If these can be done mechanically I'd appreciate a similar cleanup in the arm 
> backend.

Hi Kyrill,

installed as 3793ecc10fd.

Yes this is done mechanically with [1].

Happy to run it on the arm backend then.

Thanks

  Andrea

[1] 


[PATCH][pushed] clang: fix -Wmisleading-indentation warning.

2020-11-13 Thread Martin Liška

gcc/c-family/c-attribs.c:4698:5: warning: misleading indentation; statement is 
not part of the previous 'if' [-Wmisleading-indentation]

gcc/c-family/ChangeLog:

* c-attribs.c (build_attr_access_from_parms): Format properly.
---
 gcc/c-family/c-attribs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index f1680820ecd..1c7283e1270 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -4695,7 +4695,7 @@ build_attr_access_from_parms (tree parms, bool skip_voidptr)
 
   /* Attribute access takes a two or three arguments.  Wrap VBLIST in
  another list in case it has more nodes than would otherwise fit.  */
-vblist = build_tree_list (NULL_TREE, vblist);
+  vblist = build_tree_list (NULL_TREE, vblist);
 
   /* Build a single attribute access with the string describing all
  array arguments and an optional list of any non-parameter VLA
--
2.29.2



[RS6000] Use LIB2_SIDITI_CONV_FUNCS in place of ppc64-fp.c

2020-11-13 Thread Alan Modra via Gcc-patches
This patch retires ppc64-fp.c in favour of using
"LIB2_SIDITI_CONV_FUNCS = yes", which is a lot better solution than
having a copy of selected libgcc2.c functions.

So for powerpc64-linux we see these changes in libgcc files (plus
corresponding _s.o variants).
+_fixdfti.o
+_fixsfti.o
+_fixtfti.o
+_fixunsdfti.o
+_fixunssfti.o
+_fixunstfti.o
+_floattidf.o
+_floattisf.o
+_floattitf.o
+_floatuntidf.o
+_floatuntisf.o
+_floatuntitf.o
-ppc64-fp.o

with these empty objects also appearing (plus _s.o variants).
+_fixunsxfti.o
+_fixxfti.o
+_floattixf.o
+_floatuntixf.o

In reality we aren't getting new TI mode conversions as it might seem,
because the old *di*.o files corresponding to the above files
contained TI mode conversions, whereas now they contain DI mode
conversions.  Those match the functions provided in ppc64-fp.o, and
the set of dynamic libgcc_s.so.1 symbols is identical, apart from 
values, to before this patch.

For ppc32 we get a whole lot more empty objects replacing the empty
ppc64-fp.o.  Again the set of global symbols in libgcc.a and dynamic
symbols in libgcc_s.so.1 are unchanged.

Bootstrapped and regression tested powerpc64-linux, powerpc64le-linux,
powerpc-linux and powerpc-ibm-aix7.2.4.0.  OK?

* config/rs6000/t-ppc64-fp (LIB2ADD): Delete.
(LIB2_SIDITI_CONV_FUNCS): Define.
* config/rs6000/ppc64-fp.c: Delete file.

diff --git a/libgcc/config/rs6000/t-ppc64-fp b/libgcc/config/rs6000/t-ppc64-fp
index 26d1730bcdb..999679fc3cb 100644
--- a/libgcc/config/rs6000/t-ppc64-fp
+++ b/libgcc/config/rs6000/t-ppc64-fp
@@ -1,2 +1 @@
-# Can be used unconditionally, wrapped in __powerpc64__ || __64BIT__ __ppc64__.
-LIB2ADD += $(srcdir)/config/rs6000/ppc64-fp.c
+LIB2_SIDITI_CONV_FUNCS = yes

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH v2, OpenMP 5, C++] Implement implicit mapping of this[:1] (PR92120)

2020-11-13 Thread Chung-Lin Tang

Hi Jakub,
there was a first version of this patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554087.html

The attached patch is a v2 that adds an implementation of this part of
the this[:1] functionality described in the OpenMP 5.0 spec:

 "if the [member] variable [accessed in a target region] is of a type pointer
  or reference to pointer, it is also treated as if it has appeared in a map
  clause as a zero-length array section."

Basically, referencing a pointer member 'ptr' automatically maps it with the
equivalent of 'map(this->ptr[:0])'
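
As a sketch of the situation being described (not one of the testcases in the patch), consider the C++ below. The names Accumulator and sum_on_device are invented for illustration. The pointed-to data is mapped explicitly beforehand. Inside the member function, ptr and n are reached through 'this', so under the 5.0 rules quoted above this[:1] is mapped implicitly and ptr is treated like a zero-length array section. Built without OpenMP support the pragmas are ignored and the code simply runs on the host.

```cpp
#include <cassert>

struct Accumulator {
  int *ptr;  // pointer member: the case the quoted spec text is about
  int n;

  int sum() {
    int total = 0;
    // ptr and n are accessed via 'this'; no explicit map clause names them.
#pragma omp target map(tofrom: total)
    for (int i = 0; i < n; ++i)
      total += ptr[i];
    return total;
  }
};

// Hypothetical driver: map the array first so the zero-length section for
// the pointer member has device storage to refer to.
int sum_on_device(int *data, int n) {
  assert(data != nullptr && n >= 0);
  Accumulator acc{data, n};
#pragma omp target enter data map(to: data[:n])
  int result = acc.sum();
#pragma omp target exit data map(delete: data[:n])
  return result;
}
```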

To achieve this, two new map kinds GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION,
and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION were added, which are
basically split from GOMP_MAP_ATTACH and GOMP_MAP_POINTER, except now allowing
the pointer target to be NULL.

This patch has been tested for gcc, g++, gfortran (C and Fortran are not really
affected, but since omp-low.c was slightly touched, tested along for completeness)
and libgomp on x86_64-linux with nvptx offloading, all without regressions.

Is this okay for trunk?

Thanks,
Chung-Lin

2020-11-13  Chung-Lin Tang  

PR middle-end/92120

gcc/cp/
* cp-tree.h (finish_omp_target): New declaration.
(set_omp_target_this_expr): Likewise.
* lambda.c (lambda_expr_this_capture): Add call to
set_omp_target_this_expr.
* parser.c (cp_parser_omp_target): Factor out code, change to call
finish_omp_target, add re-initing call to set_omp_target_this_expr.
* semantics.c (omp_target_this_expr): New static variable.
(omp_target_ptr_members_accessed): New static hash_map for tracking
accessed non-static pointer-type members.
(finish_non_static_data_member): Add call to set_omp_target_this_expr.
Add recording of non-static pointer-type members access.
(finish_this_expr): Add call to set_omp_target_this_expr.
(set_omp_target_this_expr): New function to set omp_target_this_expr.
(finish_omp_target): New function with code merged from
cp_parser_omp_target, plus code to implement this[:1] and __closure map
clauses for OpenMP.

gcc/
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.

include/
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.

libgomp/
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.

gcc/testsuite/
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.

libgomp/
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New testcase.
* testsuite/libgomp.c++/target-this-4.C: New testcase.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 63724c0e84f..e45540e08f1 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7277,6 +7277,8 @@ extern void record_null_lambda_scope  (tree);
 extern void finish_lambda_scope(void);
 extern tree start_lambda_function  (tree fn, tree lambda_expr);
 extern void finish_lambda_function (tree body);
+extern tree finish_omp_target  (location_t, tree, tree, bool);
+extern void set_omp_target_this_expr   (tree);
 
 /* in tree.c */
 extern int cp_tree_operand_length  (const_tree);
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 1a1647f465e..eb09971f288 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -841,6 +841,9 @@ lambda_expr_this_capture (tree lambda, int add_capture_p)
 type cast (_expr.cast_ 5.4) to the type of 'this'. [ T

Re: [PATCH] Support the new ("v0") mangling scheme in rust-demangle.

2020-11-13 Thread Eduard-Mihai Burtescu
Hi everyone,

Apologies again for the delay on my end, the past few weeks have been hectic 
and exhausting.

The changes look good and pass the testing I was doing for my version of the 
patch.
Feel free to commit Nikhil's latest patch for us.

Thanks,
- Eddy B.

On Fri, Nov 13, 2020, at 08:42, Nikhil Benesch wrote:
> On 11/6/20 12:09 PM, Jeff Law wrote:
> > So I think the best path forward is to let you and Eduard-Mihai make the
> > technical decisions about what bits are ready for the trunk.  When y'all
> > think something is ready, let's go ahead and get it installed and
> > iterate on things that aren't quite ready yet.
> > 
> > 
> > For bits y'all think are ready, ISTM that Eduard-Mihai should commit the
> > changes.
> 
> I've attached an updated version of the patch that contains some
> additional unit tests that eddyb noticed I lost. From my perspective,
> this is now ready for commit.
> 
> Neither eddyb nor I have write access, so someone else will need to
> commit. (But please wait for eddyb to sign off too.)
> 
> > It's better to get it in sooner, but there is some degree of freedom
> > depending on the impact of the changes.  Changes in the rust demangler
> > aren't likely to trigger codegen or ABI breakages in the compiler itself
> > -- so with that in mind I think we should give this code a higher degree
> > of freedom to land after the stage1 close deadline.
> 
> Got it. Thanks. That's very helpful context.
> 
> Nikhil
> 
> Attachments:
> * rust-demangle.patch


Re: [22/32] miscelaneous c++ bits

2020-11-13 Thread Nathan Sidwell

On 11/3/20 4:16 PM, Nathan Sidwell wrote:

This is probably the messiest diff.


Let's break this diff apart a bit more, for digestibility.

Here's the MODULE_VECTOR piece.   This is a sparse array used for name 
lookup.  A namespace symbol table entry may contain one of these, which 
holds the bindings for each loaded module.  If a module doesn't bind 
that name, there'll be no slot for it.  The current TU always uses slot 
0.  The Global Module uses slot 1, and in a named module, partitions 
share slot 2.  Slots 1 & 2 are used for duplicate declaration matching.

[These slots are managed entirely in name-lookup].

For data layout purposes, the vector is partitioned into clusters.  Each 
cluster holds two slots.  A slot is the pointer to the bound value 
(often an OVERLOAD) or a lazy cookie, a slot index, and a slot span. 
The span is usually '1', except for namespaces which are module-spanning 
entities.  It is '0' in the fixed slots, for simplicity.  Both the index 
and spans are 16-bit 'unsigned short', and the slot is a 32 or 64 bit 
pointer.  Thus on a 64-bit host we can pack 2 indices, 2 spans and 2 
pointers without padding.  (the slots are not adjacent to their index/span).
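
The packing claim can be checked with a small standalone C++ sketch. The types below are stand-ins invented for the example (the real slot type is not a bare void *), and the field widths assume the usual 16-bit unsigned short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlotsPerCluster = 2;

// Stand-in for the per-slot index and span, 16 bits each.
struct IndexSpan {
  std::uint16_t base;
  std::uint16_t span;
};

// Stand-in cluster: all indices first, then all slots, so the slots are not
// adjacent to their own index/span.
struct Cluster {
  IndexSpan indices[kSlotsPerCluster];
  void *slots[kSlotsPerCluster];
};

static_assert(sizeof(IndexSpan) == 4, "two 16-bit fields, no padding");
static_assert(sizeof(Cluster)
                  == kSlotsPerCluster * (sizeof(IndexSpan) + sizeof(void *)),
              "cluster packs without padding");

// Bounds-checked accessor for the index/span pair of slot i.
const IndexSpan &index_of(const Cluster &c, std::size_t i) {
  assert(i < kSlotsPerCluster);
  return c.indices[i];
}
```

On a 64-bit host that comes to 8 bytes of index/span data followed by 16 bytes of pointers, 24 bytes in all.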


The slot itself can contain either a pointer or a load cookie.  These 
are distinguished using bit 0 -- 1 for cookie, zero for pointer.  Thus 
pointers have their natural representation (they are 4 or 8 byte 
aligned).  The lazy cookie happens to be partitioned into two pieces.  A 
pair of bits concerning pending template instantiations and member 
definitions, with the remaining bits (29 or 61) being a section number 
in that module's CMI.  When we do name lookup we see if there's a lazy 
cookie in a slot of interest, and if so load that section before 
proceeding.  (the same cookie may be present in several slots [in 
different bindings], loading will populate all those slots.)  There's a 
preceding check to determine whether the contents of that slot are 
visible to us (the containing module has been exported to us).  The 
result is that importing a module is a cheap operation with an amortized 
load cost depending on what you look at.  (There's a flag to turn lazy 
loading off.)
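
A minimal C++ sketch of the tagging scheme follows. It is not the patch's mc_slot type: the class name, the Binding stand-in and the exact positions of the two flag bits are assumptions. It uses bit 0 to tell a pointer from a lazy cookie, two bits for the pending flags, and the remaining bits for the section number.

```cpp
#include <cassert>
#include <cstdint>

struct Binding { int value; };  // hypothetical stand-in for the bound value

class Slot {
  std::uintptr_t bits_;
  explicit Slot(std::uintptr_t bits) : bits_(bits) {}

public:
  // Pointers keep their natural representation; alignment keeps bit 0 clear.
  static Slot from_pointer(Binding *p) {
    const auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & 1) == 0);
    return Slot(raw);
  }

  // Cookie layout assumed here: [section | 2 flag bits | tag bit = 1].
  static Slot from_lazy(std::uintptr_t section, unsigned flags) {
    assert(flags < 4);
    assert(((section << 3) >> 3) == section);  // section must fit
    return Slot((section << 3) | (static_cast<std::uintptr_t>(flags) << 1) | 1);
  }

  bool is_lazy() const { return (bits_ & 1) != 0; }

  Binding *pointer() const {
    assert(!is_lazy());
    return reinterpret_cast<Binding *>(bits_);
  }

  std::uintptr_t section() const {
    assert(is_lazy());
    return bits_ >> 3;
  }

  unsigned flags() const {
    assert(is_lazy());
    return static_cast<unsigned>((bits_ >> 1) & 3);
  }
};
```

A lookup would test is_lazy() on the slot of interest and, if set, load the named section before reading the slot again as a pointer.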


As you can guess, there's a limit of 16383 imported modules (there is an 
independent limit of 2^31 imported entities)


Indices are always in increasing order, and by construction we only need 
to append to the array.


nathan


--
Nathan Sidwell
diff --git c/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 63724c0e84f..4752ddef898 100644
--- c/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -929,6 +959,84 @@ struct named_decl_hash : ggc_remove  {
   static void mark_deleted (value_type) { gcc_unreachable (); }
 };
 
+/* Bindings for modules are held in a sparse array.  There is always a
+   current TU slot, others are allocated as needed.  By construction
+   of the importing mechanism we only ever need to append to the
+   array.  Rather than have straight index/slot tuples, we bunch them
+   up for greater packing.
+
+   The cluster representation packs well on a 64-bit system.  */
+
+#define MODULE_VECTOR_SLOTS_PER_CLUSTER 2
+struct mc_index {
+  unsigned short base;
+  unsigned short span;
+};
+
+struct GTY(()) module_cluster
+{
+  mc_index GTY((skip)) indices[MODULE_VECTOR_SLOTS_PER_CLUSTER];
+  mc_slot slots[MODULE_VECTOR_SLOTS_PER_CLUSTER];
+};
+
+/* These two fields overlay lang flags.  So don't use those.  */
+#define MODULE_VECTOR_ALLOC_CLUSTERS(NODE) \
+  (MODULE_VECTOR_CHECK (NODE)->base.u.dependence_info.clique)
+#define MODULE_VECTOR_NUM_CLUSTERS(NODE) \
+  (MODULE_VECTOR_CHECK (NODE)->base.u.dependence_info.base)
+#define MODULE_VECTOR_CLUSTER_BASE(NODE) \
+  (((tree_module_vec *)MODULE_VECTOR_CHECK (NODE))->vec)
+#define MODULE_VECTOR_CLUSTER_LAST(NODE) \
+  (&MODULE_VECTOR_CLUSTER (NODE, MODULE_VECTOR_NUM_CLUSTERS (NODE) - 1))
+#define MODULE_VECTOR_CLUSTER(NODE,IX) \
+  (((tree_module_vec *)MODULE_VECTOR_CHECK (NODE))->vec[IX])
+
+struct GTY(()) tree_module_vec {
+  struct tree_base base;
+  tree name;
+  module_cluster GTY((length ("%h.base.u.dependence_info.base"))) vec[1];
+};
+
+/* The name of a module vector.  */
+#define MODULE_VECTOR_NAME(NODE) \
+  (((tree_module_vec *)MODULE_VECTOR_CHECK (NODE))->name)
+
+/* tree_module_vec does uses  base.u.dependence_info.base field for
+   length.  It does not have lang_flag etc available!  */
+
+/* These two flags note if a module-vector contains deduplicated
+   bindings (i.e. multiple declarations in different imports).  */
+/* This binding contains duplicate references to a global module
+   entity.  */
+#define MODULE_VECTOR_GLOBAL_DUPS_P(NODE) \
+  (MODULE_VECTOR_CHECK (NODE)->base.static_flag)
+/* This binding contains duplicate references to a partioned module
+   entity.  */
+#define MODULE_VECTOR_PARTITION_DUPS_P(NODE) \
+  (MODULE_VECTOR_CHECK (NODE)->base.volatile_flag)
+
+/* These two flags indicate the provenence of the bindings on this
+   particular vector slot.  We can of course 

V2 [PATCH] Use SHF_GNU_RETAIN to preserve symbol definitions

2020-11-13 Thread H.J. Lu via Gcc-patches
On Fri, Nov 13, 2020 at 3:36 AM Jozef Lawrynowicz
 wrote:
>
> On Thu, Nov 12, 2020 at 02:41:52PM -0800, H.J. Lu wrote:
> > diff --git a/gcc/varasm.c b/gcc/varasm.c
> > index 435c7b348a5..c48ef9692ee 100644
> > --- a/gcc/varasm.c
> > +++ b/gcc/varasm.c
> > @@ -289,6 +289,10 @@ get_section (const char *name, unsigned int flags, 
> > tree decl,
> >slot = section_htab->find_slot_with_hash (name, htab_hash_string (name),
> >   INSERT);
> >flags |= SECTION_NAMED;
> > +#if HAVE_GAS_SHF_GNU_RETAIN
> > +  if (decl != nullptr && DECL_PRESERVE_P (decl))
>
> Minor nit, but I think this should be "decl != NULL_TREE".

We are using C++ now.  Should we start using nullptr instead of
NULL_TREE or NULL_RTX?
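
For context, a minimal sketch of why the two spellings test the same thing. It assumes NULL_TREE is a null pointer cast to the tree type, as I believe gcc/tree.h spells it; tree_node and the two has_decl_* helpers are hypothetical stand-ins, not GCC code.

```cpp
#include <cstddef>

// Hypothetical stand-in for GCC's tree type; not the real definition.
struct tree_node { int code; };
typedef tree_node *tree;

// Assumption: mirrors the spelling used in gcc/tree.h.
#define NULL_TREE (tree) NULL

// Both helpers compare against the same null pointer value, so they
// always agree; the choice between them is purely a style question.
inline bool has_decl_nullptr (tree decl) { return decl != nullptr; }
inline bool has_decl_null_tree (tree decl) { return decl != NULL_TREE; }
```

Either form compiles to the same check, so the nit is about house style rather than behaviour.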

> We should also test that "used" with the "section" attribute applies the
> "R" flag. Please apply the attached patch if this gets approved. These
> new tests pass with arm-none-eabi and x86_64-pc-linux-gnu.
>

Done.  Here is the updated patch.

-- 
H.J.
From 07c4c78c43d3b94e56d6ace97b660c69998011e4 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 3 Feb 2020 11:55:43 -0800
Subject: [PATCH] Use SHF_GNU_RETAIN to preserve symbol definitions

In assembly code, the section flag 'R' sets the SHF_GNU_RETAIN flag to
indicate that the section must be preserved by the linker.

Add SECTION_RETAIN to indicate a section should be retained by the linker
and set SECTION_RETAIN on section for the preserved symbol if assembler
supports SHF_GNU_RETAIN.  All retained symbols are placed in separate
sections with

	.section .data.rel.local.preserved_symbol,"awR"
preserved_symbol:
...
	.section .data.rel.local,"aw"
not_preserved_symbol:
...

to avoid

	.section .data.rel.local,"awR"
preserved_symbol:
...
not_preserved_symbol:
...

which places not_preserved_symbol definition in the SHF_GNU_RETAIN
section.
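
A source-level sketch of the situation described above, assuming the 'used' attribute is what marks a symbol as preserved.  The variable names follow the assembly listing; sum_symbols is a hypothetical helper, and the sections actually chosen depend on the target and on assembler support.

```cpp
// Marked 'used': the compiler must keep the definition.  With this patch,
// and an assembler that supports SHF_GNU_RETAIN, it is expected to land in
// its own section carrying the 'R' flag.
__attribute__ ((used)) static int preserved_symbol = 1;

// Not marked: expected to stay in the ordinary data section, without 'R'.
int not_preserved_symbol = 2;

// Hypothetical helper so the example has observable behaviour.
int sum_symbols () { return preserved_symbol + not_preserved_symbol; }
```

Keeping the two definitions in separate sections is what stops the linker from retaining not_preserved_symbol by accident.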

gcc/

2020-11-XX  H.J. Lu  

	* configure.ac (HAVE_GAS_SHF_GNU_RETAIN): New.  Define 1 if
	the assembler supports marking sections with SHF_GNU_RETAIN flag.
	* output.h (SECTION_RETAIN): New.  Defined as 0x400.
	(SECTION_MACH_DEP): Changed from 0x400 to 0x800.
	(default_unique_section): Add a bool argument.
	* varasm.c (get_section): Set SECTION_RETAIN for the preserved
	symbol with HAVE_GAS_SHF_GNU_RETAIN.
	(resolve_unique_section): Use named section for the preserved
	symbol if the assembler supports SHF_GNU_RETAIN.
	(get_variable_section): Handle the preserved common symbol with
	HAVE_GAS_SHF_GNU_RETAIN.
	(default_elf_asm_named_section): Require the full declaration and
	use the 'R' flag for SECTION_RETAIN.
	* config.in: Regenerated.
	* configure: Likewise.

gcc/testsuite/

2020-11-XX  H.J. Lu  
	Jozef Lawrynowicz  

	* c-c++-common/attr-used.c: Check the 'R' flag.
	* c-c++-common/attr-used-2.c: Likewise.
	* c-c++-common/attr-used-3.c: New test.
	* c-c++-common/attr-used-4.c: Likewise.
	* gcc.c-torture/compile/attr-used-retain-1.c: Likewise.
	* gcc.c-torture/compile/attr-used-retain-2.c: Likewise.
	* lib/target-supports.exp
	(check_effective_target_R_flag_in_section): New proc.
---
 gcc/config.in |  7 +++
 gcc/configure | 51 +++
 gcc/configure.ac  | 20 
 gcc/output.h  |  6 ++-
 gcc/testsuite/c-c++-common/attr-used-2.c  |  1 +
 gcc/testsuite/c-c++-common/attr-used-3.c  |  7 +++
 gcc/testsuite/c-c++-common/attr-used-4.c  |  7 +++
 gcc/testsuite/c-c++-common/attr-used.c|  1 +
 .../compile/attr-used-retain-1.c  | 35 +
 .../compile/attr-used-retain-2.c  | 16 ++
 gcc/testsuite/lib/target-supports.exp | 40 +++
 gcc/varasm.c  | 17 +--
 12 files changed, 204 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-3.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-4.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c

diff --git a/gcc/config.in b/gcc/config.in
index b7c3107bfe3..23ae2f9bc1b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1352,6 +1352,13 @@
 #endif
 
 
+/* Define 0/1 if your assembler supports marking sections with SHF_GNU_RETAIN
+   flag. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GAS_SHF_GNU_RETAIN
+#endif
+
+
 /* Define 0/1 if your assembler supports marking sections with SHF_MERGE flag.
*/
 #ifndef USED_FOR_TARGET
diff --git a/gcc/configure b/gcc/configure
index dbda4415a17..a925a6e5efb 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24272,6 +24272,57 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+# Test if the assembler supports the section flag 'R' for specifying
+# section with SHF_GNU_RETAIN.
+case "${target}" in
+  # Solaris may use GNU assembler with Solairs ld.  Even if GNU
+  # assembler supports the section flag 'R', it doesn't mean that
+  # Solai

Re: [22.2/32] module flags

2020-11-13 Thread Nathan Sidwell
Here are the pieces of patch 22 that add new flag bits to tree nodes and 
lang_decl structs, along with a new global indicating what fragment of a 
module we may be processing.


Be aware that header units, although part of the Global Module, are 
treated as if they are named modules, but with some interesting 'may have 
duplicate' rules.  In particular, all their entities are exported and 
marked as having a purview.


There was a free LANG_DECL flag, which I use for DECL_MODULE_EXPORT_P: 
such a decl is being exported from somewhere.


I needed to mark typeinfo types, so DECL_TINFO_P is extended to TYPE_DECLs.

OVL_EXPORT_P is added to indicate that a particular member is an export. 
This duplicates DECL_MODULE_EXPORT_P, but it is convenient to have 
it in the overload.


DECL_MODULE_PURVIEW_P -- the decl is in the purview of a module

DECL_MODULE_IMPORT_P -- we got this decl from an import

DECL_MODULE_ENTITY_P -- this decl is in the imported entity array & 
hash.  It may be true even if DECL_MODULE_IMPORT_P is false, because the 
current TU might be defining it.


DECL_MODULE_PENDING_SPECIALIZATIONs, this template decl has 
specializations that we have not loaded (they must be loaded before we 
can instantiate the template).  Such specializations can be in arbitrary 
modules, not necessarily the one defining the template.


DECL_MODULE_PENDING_MEMBERS, likewise, we can define members in other 
modules (partitions or header units), or instantiate implicit members 
anywhere.  These need to be loaded before we can look inside this class.


DECL_ATTACHED_DECLS_P, this namespace-scope decl has a set of attached 
decls for ODR purposes.  The case we handle comes from ranges:


template<typename T> constexpr T var = [] () { return something; }

That lambda is attached to 'var', it's not a different lambda in each TU.
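
A compilable variant of that pattern, as a sketch only.  It assumes C++17 (constexpr lambdas) and, unlike the one-liner above, declares the variable with 'auto' so that it holds the closure object itself; call_var_int and call_var_long are hypothetical helpers.

```cpp
// Requires C++17.  The lambda belongs to the variable template 'var':
// every TU that uses var<int> must agree on this one closure type.
template<typename T>
constexpr auto var = [] () { return T (42); };

// Hypothetical callers, one per specialization.
int call_var_int () { return var<int> (); }
long call_var_long () { return var<long> (); }
```

If each TU instead got its own lambda, var<int> would have a different type in each TU, which is the ODR problem the attachment avoids.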

Nearly all those new flags are added to lang_decl_base.  Originally I 
had the module index there, which is why I drastically shrank 
'selector'.  I keep the shrinkage because I don't really think it's a 
bottleneck.


class module_state is defined inside module.cc, but we need to expose 
its incomplete tag.  'modules_p' is true if we're supporting modules. 
IIRC I had one bug during development where a modules-disabled 
compilation crashed.  So I'm reasonably certain that, when disabled, the 
compiler is still as stable as ever.


module_kind is a set of bits indicating what kind of module we're 
processing.  Non-module code will have it zero.  In a module purview 
MK_MODULE will be set.  In the GMF (global module fragment) of a named 
module MK_GLOBAL will be set (and MK_MODULE clear).  In a header unit, 
both are set.


MK_EXPORTING is set if we're inside an 'export', either a {...} region 
or a single decl.  MK_INTERFACE is true if we're in the interface of a 
named module (as opposed to implementation), and MK_PARTITION is true if 
we're in a partition of a named module (interface or implementation).


There are a bunch of inline predicate functions to decode the various 
combinations that are useful.
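
To illustrate the bit combinations described above, here is a small sketch.  Only the MK_* names come from this mail; the bit values, the module_kind type, and the predicate names are assumptions made for the example and may not match the patch.

```cpp
// Bit positions are assumed for illustration.
enum module_kind_bits : unsigned
{
  MK_MODULE    = 1u << 0,  // set in a module purview
  MK_GLOBAL    = 1u << 1,  // set in the global module fragment
  MK_INTERFACE = 1u << 2,  // interface of a named module
  MK_PARTITION = 1u << 3,  // partition of a named module
  MK_EXPORTING = 1u << 4   // inside an 'export'
};

// Zero for non-module code.
static unsigned module_kind = 0;

// Hypothetical predicates decoding the MK_MODULE/MK_GLOBAL pair.
inline bool in_named_purview_p ()
{ return (module_kind & (MK_MODULE | MK_GLOBAL)) == MK_MODULE; }

inline bool in_global_fragment_p ()
{ return (module_kind & (MK_MODULE | MK_GLOBAL)) == MK_GLOBAL; }

inline bool in_header_unit_p ()
{ return (module_kind & (MK_MODULE | MK_GLOBAL)) == (MK_MODULE | MK_GLOBAL); }

inline bool exporting_p ()
{ return (module_kind & MK_EXPORTING) != 0; }
```

Testing the two bits together is what lets a header unit (both set) be told apart from a plain purview or a plain global module fragment.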


nathan

--
Nathan Sidwell
diff --git c/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 63724c0e84f..4752ddef898 100644
--- c/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -479,13 +488,14 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   CALL_EXPR_ORDERED_ARGS (in CALL_EXPR, AGGR_INIT_EXPR)
   DECLTYPE_FOR_REF_CAPTURE (in DECLTYPE_TYPE)
   CONSTRUCTOR_C99_COMPOUND_LITERAL (in CONSTRUCTOR)
+  DECL_MODULE_EXPORT_P (in _DECL)
   OVL_NESTED_P (in OVERLOAD)
   LAMBDA_EXPR_INSTANTIATED (in LAMBDA_EXPR)
   Reserved for DECL_MODULE_EXPORT (in DECL_)
4: IDENTIFIER_MARKED (IDENTIFIER_NODEs)
   TREE_HAS_CONSTRUCTOR (in INDIRECT_REF, SAVE_EXPR, CONSTRUCTOR,
 	  CALL_EXPR, or FIELD_DECL).
-  DECL_TINFO_P (in VAR_DECL)
+  DECL_TINFO_P (in VAR_DECL, TYPE_DECL)
   FUNCTION_REF_QUALIFIED (in FUNCTION_TYPE, METHOD_TYPE)
   OVL_LOOKUP_P (in OVERLOAD)
   LOOKUP_FOUND_P (in RECORD_TYPE, UNION_TYPE, ENUMERAL_TYPE, NAMESPACE_DECL)
@@ -493,6 +503,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   FUNCTION_RVALUE_QUALIFIED (in FUNCTION_TYPE, METHOD_TYPE)
   CALL_EXPR_REVERSE_ARGS (in CALL_EXPR, AGGR_INIT_EXPR)
   CONSTRUCTOR_PLACEHOLDER_BOUNDARY (in CONSTRUCTOR)
+  OVL_EXPORT_P (in OVL_USING_P OVERLOAD)
6: TYPE_MARKED_P (in _TYPE)
   DECL_NONTRIVIALLY_INITIALIZED_P (in VAR_DECL)
   RANGE_FOR_IVDEP (in RANGE_FOR_STMT)
@@ -768,6 +780,8 @@ typedef struct ptrmem_cst * ptrmem_cst_t;
 #define OVL_NESTED_P(NODE)	TREE_LANG_FLAG_3 (OVERLOAD_CHECK (NODE))
 /* If set, this overload was constructed during lookup.  */
 #define OVL_LOOKUP_P(NODE)	TREE_LANG_FLAG_4 (OVERLOAD_CHECK (NODE))
+/* If set, this OVL_USING_P overload is exported.  */
+#define OVL_EXPORT_P(NODE)	TREE_LANG_FLAG_5 (OVERLOAD_CHECK (NODE))
 
 /* The first decl of an overload.  */
 #define OVL_FIRST(NODE)	ovl_first (NODE)
@@ -835,6 +854,12 @@ class ovl_iterator {
 return (TREE_CODE (ovl) == USING_DECL
 	|| (TREE_C

[committed] d: Explicitly determine which built-in copysign function to call.

2020-11-13 Thread Iain Buclaw via Gcc-patches
Hi,

For some targets, mathfn_built_in returns NULL as copysign is not
implicitly available, causing an ICE.  Now copysign is explicitly
requested when expanding the intrinsic.
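
The idea of choosing the variant explicitly from the type can be sketched outside the compiler as well.  This is an analogue only, not the front-end code: copysign_for is a hypothetical name, and the __builtin_copysign* functions are GCC/Clang extensions.

```cpp
// One overload per floating-point type, each naming its builtin
// explicitly rather than asking a lookup helper to find one.
inline float copysign_for (float to, float from)
{ return __builtin_copysignf (to, from); }

inline double copysign_for (double to, double from)
{ return __builtin_copysign (to, from); }

inline long double copysign_for (long double to, long double from)
{ return __builtin_copysignl (to, from); }
```

As in the patch, an unsupported type has no matching variant and is rejected instead of silently yielding a null function.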

Bootstrapped and regression tested on x86_64-linux-gnu and
x86_64-freebsd.  Committed to mainline.

As this fixes an ICE, will also prep it for the releases/gcc-10 branch.

Regards
Iain.

---
gcc/d/ChangeLog:

* intrinsics.cc (expand_intrinsic_copysign): Explicitly determine
which built-in copysign function to call.
---
 gcc/d/intrinsics.cc | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc
index a629472c6c5..a7de91019a4 100644
--- a/gcc/d/intrinsics.cc
+++ b/gcc/d/intrinsics.cc
@@ -466,11 +466,14 @@ expand_intrinsic_copysign (tree callexp)
 from = fold_convert (type, from);
 
   /* Which variant of __builtin_copysign* should we call?  */
-  tree builtin = mathfn_built_in (type, BUILT_IN_COPYSIGN);
-  gcc_assert (builtin != NULL_TREE);
+  built_in_function code = (type == float_type_node) ? BUILT_IN_COPYSIGNF
+: (type == double_type_node) ? BUILT_IN_COPYSIGN
+: (type == long_double_type_node) ? BUILT_IN_COPYSIGNL
+: END_BUILTINS;
 
-  return call_builtin_fn (callexp, DECL_FUNCTION_CODE (builtin), 2,
- to, from);
+  gcc_assert (code != END_BUILTINS);
+
+  return call_builtin_fn (callexp, code, 2, to, from);
 }
 
 /* Expand a front-end intrinsic call to pow().  This takes two arguments, the
-- 
2.27.0



[committed] libphobos: Update libtool version to 2:0:0

2020-11-13 Thread Iain Buclaw via Gcc-patches
Hi,

This patch bumps the libphobos soname to 2:0:0 so that the library does
not conflict with gcc-10.
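
As a reminder of how the triple maps to a soname, here is a small sketch.  It assumes the usual libtool current:revision:age scheme on Linux-style ELF targets, where the major number in the soname is current minus age; libtool_version and soname_major are names made up for the example.

```cpp
// Hypothetical holder for a libtool -version-info triple.
struct libtool_version
{
  unsigned current;
  unsigned revision;
  unsigned age;
};

// Assumption: Linux-style versioning, where the soname major number
// is current - age.
constexpr unsigned soname_major (libtool_version v)
{
  return v.current - v.age;
}
```

Under that assumption 1:0:0 and 2:0:0 give different major numbers, so both libraries can be installed side by side.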

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain.
---
libphobos/ChangeLog:

* configure: Regenerate.
* configure.ac (libtool_VERSION): Update to 2:0:0.
---
 libphobos/configure| 2 +-
 libphobos/configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libphobos/configure b/libphobos/configure
index 4c1116d6f80..de6d367911a 100755
--- a/libphobos/configure
+++ b/libphobos/configure
@@ -15501,7 +15501,7 @@ SPEC_PHOBOS_DEPS="$LIBS"
 
 
 # Libdruntime / phobos soname version
-libtool_VERSION=1:0:0
+libtool_VERSION=2:0:0
 
 
 # Set default flags (after DRUNTIME_WERROR!)
diff --git a/libphobos/configure.ac b/libphobos/configure.ac
index bf21128bd50..60aee3ffe8b 100644
--- a/libphobos/configure.ac
+++ b/libphobos/configure.ac
@@ -256,7 +256,7 @@ SPEC_PHOBOS_DEPS="$LIBS"
 AC_SUBST(SPEC_PHOBOS_DEPS)
 
 # Libdruntime / phobos soname version
-libtool_VERSION=1:0:0
+libtool_VERSION=2:0:0
 AC_SUBST(libtool_VERSION)
 
 # Set default flags (after DRUNTIME_WERROR!)
-- 
2.27.0



[committed] d: Fix ICE in finish_thunk (PR97644)

2020-11-13 Thread Iain Buclaw via Gcc-patches
Hi,

Because this is what the upstream reference compiler did, thunks for the D
front-end were associated with the class definition, so code generation
was forced even if the target function was extern.  This has now been
changed so that thunks are only generated if there is a function
definition, fixing the ICE that occurred in PR 97644, which was caused
by calling expand_thunk() early.
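
As general background on what a this-adjusting thunk is for, here is a C++ analogue (not D, and not code from this patch).  The class and function names are hypothetical; the claim that a thunk is used assumes an Itanium-style C++ ABI.

```cpp
struct Left
{
  virtual ~Left () {}
  virtual int left_id () { return 1; }
  int l = 10;
};

struct Right
{
  virtual ~Right () {}
  virtual int right_id () { return 2; }
  int r = 20;
};

struct Derived : Left, Right
{
  int d = 30;
  // When called through a Right*, 'this' points at the Right subobject
  // and has to be moved back to the start of Derived before 'd' is read.
  // That pointer adjustment is the job of the thunk.
  int right_id () override { return d; }
};

// The caller only knows about Right; the adjustment happens behind the
// virtual call.
int call_through_right (Right *p) { return p->right_id (); }
```

The thunk is tied to the overriding function, which is why emitting it only alongside a function definition is a natural fit.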

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain.

---
gcc/d/ChangeLog:

PR d/97644
* dmd/MERGE: Merge upstream dmd 95044d8e4.
* d-target.cc (TargetCPP::thunkMangle): New function.
* decl.cc (finish_thunk): Don't force expand thunks for external
functions.
(make_thunk): Emit thunks only if the function has a definition.
Generate correct mangling for thunks to C++ classes.

gcc/testsuite/ChangeLog:

* gdc.dg/pr92216.d: Update scan-assembler.
---
 gcc/d/d-target.cc  |  9 ++
 gcc/d/decl.cc  | 56 +++---
 gcc/d/dmd/MERGE|  2 +-
 gcc/d/dmd/cppmangle.c  | 20 +++-
 gcc/d/dmd/mangle.h |  1 +
 gcc/d/dmd/target.h |  2 ++
 gcc/testsuite/gdc.dg/pr92216.d |  4 +--
 7 files changed, 58 insertions(+), 36 deletions(-)

diff --git a/gcc/d/d-target.cc b/gcc/d/d-target.cc
index 692fce6a655..cd136524eb9 100644
--- a/gcc/d/d-target.cc
+++ b/gcc/d/d-target.cc
@@ -329,6 +329,15 @@ TargetCPP::typeInfoMangle (ClassDeclaration *cd)
   return cppTypeInfoMangleItanium (cd);
 }
 
+/* Get mangle name of a this-adjusting thunk to the function declaration FD
+   at call offset OFFSET for C++ linkage.  */
+
+const char *
+TargetCPP::thunkMangle (FuncDeclaration *fd, int offset)
+{
+  return cppThunkMangleItanium (fd, offset);
+}
+
 /* For a vendor-specific type, return a string containing the C++ mangling.
In all other cases, return NULL.  */
 
diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index d668715af59..218f35838fd 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -1693,26 +1693,6 @@ finish_thunk (tree thunk, tree function)
 
   if (DECL_ONE_ONLY (function))
 thunk_node->add_to_same_comdat_group (funcn);
-
-  /* Target assemble_mi_thunk doesn't work across section boundaries
- on many targets, instead force thunk to be expanded in gimple.  */
-  if (DECL_EXTERNAL (function))
-{
-  /* cgraph::expand_thunk writes over current_function_decl, so if this
-could ever be in use by the codegen pass, we want to know about it.  */
-  gcc_assert (current_function_decl == NULL_TREE);
-
-  if (!stdarg_p (TREE_TYPE (thunk)))
-   {
- thunk_node->create_edge (funcn, NULL, thunk_node->count);
- expand_thunk (thunk_node, false, true);
-   }
-
-  /* Tell the back-end to not bother inlining the function, this is
-assumed not to work as it could be referencing symbols outside
-of the current compilation unit.  */
-  DECL_UNINLINABLE (function) = 1;
-}
 }
 
 /* Return a thunk to DECL.  Thunks adjust the incoming `this' pointer by 
OFFSET.
@@ -1789,12 +1769,11 @@ make_thunk (FuncDeclaration *decl, int offset)
 
   DECL_CONTEXT (thunk) = d_decl_context (decl);
 
-  /* Thunks inherit the public access of the function they are targetting.
- When the function is outside the current compilation unit however, then 
the
- thunk must be kept private to not conflict.  */
-  TREE_PUBLIC (thunk) = TREE_PUBLIC (function) && !DECL_EXTERNAL (function);
-
-  DECL_EXTERNAL (thunk) = 0;
+  /* Thunks inherit the public access of the function they are targeting.
+ Thunks are connected to the definitions of the functions, so thunks are
+ not produced for external functions.  */
+  TREE_PUBLIC (thunk) = TREE_PUBLIC (function);
+  DECL_EXTERNAL (thunk) = DECL_EXTERNAL (function);
 
   /* Thunks are always addressable.  */
   TREE_ADDRESSABLE (thunk) = 1;
@@ -1806,18 +1785,31 @@ make_thunk (FuncDeclaration *decl, int offset)
   DECL_COMDAT (thunk) = DECL_COMDAT (function);
   DECL_WEAK (thunk) = DECL_WEAK (function);
 
-  tree target_name = DECL_ASSEMBLER_NAME (function);
-  unsigned identlen = IDENTIFIER_LENGTH (target_name) + 14;
-  const char *ident = XNEWVEC (const char, identlen);
-  snprintf (CONST_CAST (char *, ident), identlen,
-   "_DT%u%s", offset, IDENTIFIER_POINTER (target_name));
+  /* When the thunk is for an extern C++ function, let C++ do the thunk
+ generation and just reference the symbol as extern, instead of
+ forcing a D local thunk to be emitted.  */
+  const char *ident;
+
+  if (decl->linkage == LINKcpp)
+ident = target.cpp.thunkMangle (decl, offset);
+  else
+{
+  tree target_name = DECL_ASSEMBLER_NAME (function);
+  unsigned identlen = IDENTIFIER_LENGTH (target_name) + 14;
+  ident = XNEWVEC (const char, identlen);
+
+  snprintf (CONST_CAST (char *, ident), identlen,
+   "_DTi%u%s", offset, IDENTIFIER_POINTE

[PING][PATCH] d: Add dragonflybsd support for D compiler and runtime

2020-11-13 Thread Iain Buclaw via Gcc-patches
Ping.

CTFE math fixes have been committed to mainline in r11-4980.

Excerpts from Iain Buclaw's message of October 29, 2020 3:22 pm:
> Hi,
> 
> This patch adds the necessary version conditions and configure rules in
> place to allow building the D compiler on DragonFlyBSD.
> 
> Running the testsuite, all core tests pass, with a couple failures
> relating to CTFE math support which are not blocking the library from
> being usable, and will be fixed in a follow-up.
> 
> OK for mainline?
> 
> Regards
> Iain
> 
> ---
> gcc/ChangeLog:
> 
>   * config.gcc (*-*-dragonfly*): Add dragonfly-d.o and t-dragonfly.
>   * config/dragonfly-d.c: New file.
>   * config/t-dragonfly: New file.
> 
> libphobos/ChangeLog:
> 
>   * configure.tgt: Add *-*-dragonfly* as a supported target.
>   * configure: Regenerate.
>   * m4/druntime/os.m4 (DRUNTIME_OS_SOURCES): Add dragonfly* as a posix
>   target.
> ---
>  gcc/config.gcc  |  3 +++
>  gcc/config/dragonfly-d.c| 37 +
>  gcc/config/t-dragonfly  | 21 +
>  libphobos/configure |  2 +-
>  libphobos/configure.tgt |  3 +++
>  libphobos/m4/druntime/os.m4 |  2 +-
>  6 files changed, 66 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/config/dragonfly-d.c
>  create mode 100644 gcc/config/t-dragonfly
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d14a1a3e812..8fff8da1dd0 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -731,6 +731,9 @@ case ${target} in
>extra_options="$extra_options rpath.opt dragonfly.opt"
>default_use_cxa_atexit=yes
>use_gcc_stdint=wrap
> +  d_target_objs="${d_target_objs} dragonfly-d.o"
> +  tmake_file="${tmake_file} t-dragonfly"
> +  target_has_targetdm=yes
>;;
>  *-*-freebsd*)
># This is the generic ELF configuration of FreeBSD.  Later
> diff --git a/gcc/config/dragonfly-d.c b/gcc/config/dragonfly-d.c
> new file mode 100644
> index 000..70ec820b75d
> --- /dev/null
> +++ b/gcc/config/dragonfly-d.c
> @@ -0,0 +1,37 @@
> +/* DragonFly support needed only by D front-end.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm_d.h"
> +#include "d/d-target.h"
> +#include "d/d-target-def.h"
> +
> +/* Implement TARGET_D_OS_VERSIONS for DragonFly targets.  */
> +
> +static void
> +dragonfly_d_os_builtins (void)
> +{
> +  d_add_builtin_version ("DragonFlyBSD");
> +  d_add_builtin_version ("Posix");
> +}
> +
> +#undef TARGET_D_OS_VERSIONS
> +#define TARGET_D_OS_VERSIONS dragonfly_d_os_builtins
> +
> +struct gcc_targetdm targetdm = TARGETDM_INITIALIZER;
> diff --git a/gcc/config/t-dragonfly b/gcc/config/t-dragonfly
> new file mode 100644
> index 000..764ced9cd91
> --- /dev/null
> +++ b/gcc/config/t-dragonfly
> @@ -0,0 +1,21 @@
> +# Copyright (C) 2020 Free Software Foundation, Inc.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .
> +
> +dragonfly-d.o: $(srcdir)/config/dragonfly-d.c
> + $(COMPILE) $<
> + $(POSTCOMPILE)
> diff --git a/libphobos/configure b/libphobos/configure
> index 4c1116d6f80..455f338a9e8 100755
> --- a/libphobos/configure
> +++ b/libphobos/configure
> @@ -14283,7 +14283,7 @@ fi
>  
>druntime_target_posix="no"
>case "$druntime_cv_target_os" in
> -aix*|*bsd*|cygwin*|darwin*|gnu*|linux*|skyos*|*solaris*|sysv*)
> +aix*|*bsd*|cygwin*|darwin*|dragonfly*|gnu*|linux*|skyos*|*solaris*|sysv*)
>druntime_target_posix="yes"
>;;
>esac
> diff --git a/libphobos/configure.tgt b/libphobos/configure.tgt
> index 94e42bf5509..1ea9e0c804c 100644
> --- a/libphobos/configure.tgt
> +++ b/libphobos/configure.tgt
> @@ -

[PATCH] improve VN PHI hashing

2020-11-13 Thread Richard Biener
This reduces the number of collisions for PHIs in the VN hashtable
by always hashing the number of predecessors and separately hashing
the block number when we never merge PHIs from different blocks.
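
The hashing rule can be sketched in isolation as follows.  block_info, mix, and phi_block_hash are hypothetical stand-ins (the real code uses inchash::hash and goes on to hash the PHI type and arguments, which this sketch omits).

```cpp
#include <cstdint>

// Hypothetical model of the facts the hash looks at.
struct block_info
{
  unsigned index;        // basic block number
  unsigned num_preds;    // number of incoming edges
  bool is_loop_header;   // block is the header of its loop
};

// Simple stand-in mixer, not the one GCC uses.
inline std::uint32_t mix (std::uint32_t h, std::uint32_t v)
{
  return (h ^ v) * 16777619u;
}

inline std::uint32_t phi_block_hash (const block_info &b)
{
  std::uint32_t h = 2166136261u;
  // Always hash the number of predecessors.
  h = mix (h, b.num_preds);
  // PHIs may only match across blocks with one predecessor, or with two
  // predecessors when the block is not a loop header.  In every other
  // case the block number is hashed as well.
  bool may_merge_across_blocks
    = b.num_preds == 1 || (b.num_preds == 2 && !b.is_loop_header);
  if (!may_merge_across_blocks)
    h = mix (h, b.index);
  return h;
}
```

Two-predecessor PHIs in different non-header blocks still hash alike, so they can be found equal; loop-header PHIs no longer pile into the same bucket.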

This improves collisions seen for the PR69609 testcase dramatically.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-13  Richard Biener  

* tree-ssa-sccvn.c (vn_phi_compute_hash): Always hash the
number of predecessors.  Hash the block number also for
loop header PHIs.
(expressions_equal_p): Short-cut SSA name compares, remove
test for NULL operands.
(vn_phi_eq): Cache number of predecessors, change inlined
test from expressions_equal_p.
---
 gcc/tree-ssa-sccvn.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 8c93e515b6c..4d78054b1e0 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4126,13 +4126,27 @@ vn_nary_op_insert_stmt (gimple *stmt, tree result)
 static inline hashval_t
 vn_phi_compute_hash (vn_phi_t vp1)
 {
-  inchash::hash hstate (EDGE_COUNT (vp1->block->preds) > 2
-   ? vp1->block->index : EDGE_COUNT (vp1->block->preds));
+  inchash::hash hstate;
   tree phi1op;
   tree type;
   edge e;
   edge_iterator ei;
 
+  hstate.add_int (EDGE_COUNT (vp1->block->preds));
+  switch (EDGE_COUNT (vp1->block->preds))
+{
+case 1:
+  break;
+case 2:
+  if (vp1->block->loop_father->header == vp1->block)
+   ;
+  else
+   break;
+  /* Fallthru.  */
+default:
+  hstate.add_int (vp1->block->index);
+}
+
   /* If all PHI arguments are constants we need to distinguish
  the PHI node via its type.  */
   type = vp1->type;
@@ -4277,11 +4291,12 @@ vn_phi_eq (const_vn_phi_t const vp1, const_vn_phi_t 
const vp2)
 
   /* Any phi in the same block will have it's arguments in the
  same edge order, because of how we store phi nodes.  */
-  for (unsigned i = 0; i < EDGE_COUNT (vp1->block->preds); ++i)
+  unsigned nargs = EDGE_COUNT (vp1->block->preds);
+  for (unsigned i = 0; i < nargs; ++i)
 {
   tree phi1op = vp1->phiargs[i];
   tree phi2op = vp2->phiargs[i];
-  if (phi1op == VN_TOP || phi2op == VN_TOP)
+  if (phi1op == phi2op)
continue;
   if (!expressions_equal_p (phi1op, phi2op))
return false;
@@ -5612,8 +5627,8 @@ expressions_equal_p (tree e1, tree e2)
   if (e1 == VN_TOP || e2 == VN_TOP)
 return true;
 
-  /* If only one of them is null, they cannot be equal.  */
-  if (!e1 || !e2)
+  /* SSA_NAME compare pointer equal.  */
+  if (TREE_CODE (e1) == SSA_NAME || TREE_CODE (e2) == SSA_NAME)
 return false;
 
   /* Now perform the actual comparison.  */
-- 
2.26.2


Re: [22.2/32] module flags

2020-11-13 Thread Richard Biener via Gcc-patches
On Fri, Nov 13, 2020 at 3:04 PM Nathan Sidwell  wrote:
>
> Here are the pieces of patch 22 that add new flag bits to tree nodes and
> lang_decl structs, along with a new global indicating what fragment of a
> module we may be processing.
>
> be aware that header-units, although part of the Global Module, are
> treated as-if they are named modules but with some interesting 'may have
> duplicate' rules.  In particular all their entities are exported, and
> marked as having a purview
>
> There was a free LANG_DECL flag, which I use for DECL_MODULE_EXPORT_P,
> such a decl is being exported from somewhere.
>
> I needed to mark typeinfo types, so DECL_TINFO_P is extended to TYPE_DECLs.
>
> OVL_EXPORT_P is added to indicate that a particular member is an export.
>   This is duplicating DECL_MODULE_EXPORT_P, but it is convenient to have
> it in the overload.
>
> DECL_MODULE_PURVIEW_P -- the decl is in the purview of a module
>
> DECL_MODULE_IMPORT_P -- we got this decl from an import
>
> DECL_MODULE_ENTITY_P -- this decl is in the imported entity array &
> hash.  It may be true even if DECL_MODULE_IMPORT_P is false, because the
> current TU might be defining it.
>
> DECL_MODULE_PENDING_SPECIALIZATIONs, this template decl has
> specializations that we have not loaded (they must be loaded before we
> can instantiate the template).  Such specializations can be in arbitrary
> modules, not necessarily the one defining the template.
>
> DECL_MODULE_PENDING_MEMBERS, likewise, we can define members in other
> modules (partitions or header units), or instantiate implicit members
> anywhere.  These need to be loaded before we can look inside this class.
>
> DECL_ATTACHED_DECLS_P, this namespace-scope decl has a set of attached
> decls for ODR purposes.  The case we handle comes from ranges:
>
> template<typename T> constexpr T var = [] () { return something; }
>
> That lambda is attached to 'var', it's not a different lambda in each TU.
>
> Nearly all those new flags are added to lang_decl_base.  Originally I
> had the module index there, which is why I drastically shrank
> 'selector'.  I keep the shrinkage because I don't really think it's a
> bottleneck.
>
> class module_state is defined inside module.cc, but we need to expose
> its incomplete tag.  'modules_p' is true if we're supporting modules.
> IIRC I had one bug during development where a modules-disabled
> compilation crashed.  So I'm reasonably certain that, when disabled, the
> compiler is still as stable as ever.
>
> module_kind is a set of bits indicating what kind of module we're
> processing.  non-module code will have it zero.  In module purview
> MK_MODULE will be set.  In the GMF of a named module MK_GLOBAL will be
> set (and MK_MODULE clear).  In a header unit, both are set.
>
> MK_EXPORTING is set if we're inside an 'export' either a {...} region,
> or a single decl.  MK_INTERFACE is true if we're in the interface of a
> named module (as opposed to implementation), and MK_PARTITION is true if
> we're in a partition of a named module (interface or implementation).
>
> There are a bunch of inline predicate functions to decode the various
> combinations that are useful.

 struct GTY(()) lang_decl_base {
-  /* Larger than necessary for faster access.  */
-  ENUM_BITFIELD(lang_decl_selector) selector : 16;
+  ENUM_BITFIELD(lang_decl_selector) selector : 3;
...
+  unsigned attached_decls_p : 1;
+
+  /* 10 spare bits.  */

So for 'faster access' you could still make selector 8 bits, reducing
spare bits to 5.
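
A sketch of the two layouts being compared.  It assumes the bit-fields total 32 bits, so 19 bits sit between the selector and the spare bits; the struct and field names are hypothetical, and bit-field layout is implementation-defined.

```cpp
// Selector as in the patch: 3 bits, leaving 10 spare.
struct base_narrow
{
  unsigned selector : 3;
  unsigned flags : 19;   // assumed width of the remaining flag bits
  unsigned spare : 10;
};

// Selector widened to a full byte, leaving 5 spare.  A byte-sized field
// at a byte boundary can typically be read with a plain byte load.
struct base_bytewide
{
  unsigned selector : 8;
  unsigned flags : 19;
  unsigned spare : 5;
};

// Hypothetical accessors, one per layout.
inline unsigned read_selector (const base_narrow &b) { return b.selector; }
inline unsigned read_selector (const base_bytewide &b) { return b.selector; }
```

On common ABIs both structs occupy the same storage, so the wider selector costs spare bits rather than space.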

Can you add comments (like on some other bits var / fn / type)
what kind of decls the new bits are used on?  Maybe some
bits can be overloaded if spare bits are needed.

Thanks,
Richard.

> nathan
>
> --
> Nathan Sidwell


Re: [PING][PATCH] d: Add dragonflybsd support for D compiler and runtime

2020-11-13 Thread Richard Biener via Gcc-patches
On Fri, Nov 13, 2020 at 3:18 PM Iain Buclaw via Gcc-patches
 wrote:
>
> Ping.
>
> CTFE math fixes have been committed to mainline in r11-4980.

OK.

> Excerpts from Iain Buclaw's message of October 29, 2020 3:22 pm:
> > Hi,
> >
> > This patch adds the necessary version conditions and configure rules in
> > place to allow building the D compiler on DragonFlyBSD.
> >
> > Running the testsuite, all core tests pass, with a couple failures
> > relating to CTFE math support which are not blocking the library from
> > being usable, and will be fixed in a follow-up.
> >
> > OK for mainline?
> >
> > Regards
> > Iain
> >
> > ---
> > gcc/ChangeLog:
> >
> >   * config.gcc (*-*-dragonfly*): Add dragonfly-d.o and t-dragonfly.
> >   * config/dragonfly-d.c: New file.
> >   * config/t-dragonfly: New file.
> >
> > libphobos/ChangeLog:
> >
> >   * configure.tgt: Add *-*-dragonfly* as a supported target.
> >   * configure: Regenerate.
> >   * m4/druntime/os.m4 (DRUNTIME_OS_SOURCES): Add dragonfly* as a posix
> >   target.
> > ---
> >  gcc/config.gcc  |  3 +++
> >  gcc/config/dragonfly-d.c| 37 +
> >  gcc/config/t-dragonfly  | 21 +
> >  libphobos/configure |  2 +-
> >  libphobos/configure.tgt |  3 +++
> >  libphobos/m4/druntime/os.m4 |  2 +-
> >  6 files changed, 66 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/config/dragonfly-d.c
> >  create mode 100644 gcc/config/t-dragonfly
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index d14a1a3e812..8fff8da1dd0 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -731,6 +731,9 @@ case ${target} in
> >extra_options="$extra_options rpath.opt dragonfly.opt"
> >default_use_cxa_atexit=yes
> >use_gcc_stdint=wrap
> > +  d_target_objs="${d_target_objs} dragonfly-d.o"
> > +  tmake_file="${tmake_file} t-dragonfly"
> > +  target_has_targetdm=yes
> >;;
> >  *-*-freebsd*)
> ># This is the generic ELF configuration of FreeBSD.  Later
> > diff --git a/gcc/config/dragonfly-d.c b/gcc/config/dragonfly-d.c
> > new file mode 100644
> > index 000..70ec820b75d
> > --- /dev/null
> > +++ b/gcc/config/dragonfly-d.c
> > @@ -0,0 +1,37 @@
> > +/* DragonFly support needed only by D front-end.
> > +   Copyright (C) 2020 Free Software Foundation, Inc.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "tm_d.h"
> > +#include "d/d-target.h"
> > +#include "d/d-target-def.h"
> > +
> > +/* Implement TARGET_D_OS_VERSIONS for DragonFly targets.  */
> > +
> > +static void
> > +dragonfly_d_os_builtins (void)
> > +{
> > +  d_add_builtin_version ("DragonFlyBSD");
> > +  d_add_builtin_version ("Posix");
> > +}
> > +
> > +#undef TARGET_D_OS_VERSIONS
> > +#define TARGET_D_OS_VERSIONS dragonfly_d_os_builtins
> > +
> > +struct gcc_targetdm targetdm = TARGETDM_INITIALIZER;
> > diff --git a/gcc/config/t-dragonfly b/gcc/config/t-dragonfly
> > new file mode 100644
> > index 000..764ced9cd91
> > --- /dev/null
> > +++ b/gcc/config/t-dragonfly
> > @@ -0,0 +1,21 @@
> > +# Copyright (C) 2020 Free Software Foundation, Inc.
> > +#
> > +# This file is part of GCC.
> > +#
> > +# GCC is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3, or (at your option)
> > +# any later version.
> > +#
> > +# GCC is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with GCC; see the file COPYING3.  If not see
> > +# .
> > +
> > +dragonfly-d.o: $(srcdir)/config/dragonfly-d.c
> > + $(COMPILE) $<
> > + $(POSTCOMPILE)
> > diff --git a/libphobos/configure b/libphobos/configure
> > index 4c1116d6f80..455f338a9e8 100755
> > --- a/libphobos/configure
> > +++ b/libphobos/configure
> > @@ -14283,7 +14283,7 @@ fi
> >
> >druntime_target_posix="no"
> >case "$druntime_cv_target_os" in
> > -aix*|*bsd*|cygwin*|darwin*|gnu*|linux*|skyos*

Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches



On 2020-11-13 4:00 a.m., Richard Biener wrote:

On Thu, Nov 12, 2020 at 8:55 PM Vladimir Makarov via Gcc-patches
 wrote:

The following patch implements asm goto with outputs.  Kernel
developers have several times expressed a wish to have this feature.
Asm goto with outputs was implemented in LLVM recently.  This new
feature was presented at the 2020 Linux Plumbers Conference
(https://linuxplumbersconf.org/event/7/contributions/801/attachments/659/1212/asm_goto_w__Outputs.pdf)
and the 2020 LLVM conference
(https://www.youtube.com/watch?v=vcPD490s-hE).

The patch permits outputs in asm gotos only when LRA is used.
It is problematic to implement this in the old reload pass.  To be
honest, it was hard to implement in LRA too until global live info
update was added to LRA a few years ago.

Unlike the LLVM asm goto output implementation, you can use
outputs on any path from the asm goto (not only on the fallthrough
path as in LLVM).
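
For readers who have not seen the feature, here is a minimal sketch of what
an asm goto with an output operand looks like.  The function name
copy_via_asm_goto, the macro HAVE_ASM_GOTO_OUTPUTS and the empty assembler
template are assumptions chosen to keep the example target-neutral; none of
it is taken from the patch or its testsuite.  The asm goto form is only
compiled when the compiler identifies itself as GCC 11 or later (assumed
here to be the first release carrying this work); plain C is used otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: asm goto with outputs is available from GCC 11 on.
   Other compilers take the plain C path below.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 11
# define HAVE_ASM_GOTO_OUTPUTS 1
#else
# define HAVE_ASM_GOTO_OUTPUTS 0
#endif

/* Hypothetical helper: copies IN to *OUT through an asm goto that has
   an output operand.  Returns 0 on the fallthrough path and -1 if the
   asm jumped to the `fail' label.  */
int
copy_via_asm_goto (int in, int *out)
{
  int v;

  assert (out != NULL);
#if HAVE_ASM_GOTO_OUTPUTS
  /* Empty template: no instruction is emitted and the jump is never
     taken, but `v' is still an output that the compiler has to keep
     valid on both the fallthrough path and the path to `fail'.  */
  __asm__ goto ("" : "=r" (v) : "0" (in) : : fail);
#else
  v = in;
#endif
  *out = v;
  return 0;

#if HAVE_ASM_GOTO_OUTPUTS
fail:
  /* The output is also usable here, not only on the fallthrough.  */
  *out = v;
  return -1;
#endif
}
```

Because the template is empty the jump is never taken, so a call such as
copy_via_asm_goto (42, &out) should return 0 with out set to 42.  A real
use would put a conditional branch to %l[fail] in the template.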

The patch removes critical edges on which asm output reloads could
potentially occur (this means you can have several asm gotos using the
same labels and the same outputs).  It is done in IRA as it is
difficult to create new BBs in LRA.  Most of the work (placement
of output reloads in the destination BBs of the asm goto basic block) is
done in LRA.  When it happens, LRA updates global live info to reflect
that new pseudos are live on the BB borders and the old ones are no
longer live there.

I also tried splitting the live ranges of pseudos involved in
asm goto outputs to guarantee they get hard registers in IRA. But
this approach did not work as it is difficult to keep this assignment
throughout LRA. It would also probably result in worse code as move
insn coalescing is not guaranteed.

Asm goto with outputs will not work for targets which have not been
converted to LRA (probably some outdated targets, as the old reload
pass is not supported anymore).  An error will be generated when the
old reload pass meets an asm goto with an output.  A precaution is taken
not to crash the compiler after this error.

The patch is pretty small as all the necessary infrastructure was
already implemented in practically the whole compiler pipeline.  It did
not require adding new RTL insns, in contrast to what Google engineers
did to LLVM MIR.

The patch could also be useful for implementing jump insns with
output reloads in the future (e.g. branch and count insns).

I think asm gotos with outputs should be considered an experimental
feature as there is no real usage of it yet.  Early adoption of
this feature could help with debugging and hardening the
implementation.

The patch was successfully bootstrapped and tested on x86-64, ppc64,
and aarch64.

Are non-RA changes ok in the patch?

Minor nit for the RA parts:

+  if (i < recog_data.n_operands)
+   {
+ error_for_asm (insn,
+"old reload pass does not support asm goto "
+"with outputs in %");
+ ira_nullify_asm_goto (insn);

I'd say "the target does not support ...", the user shouldn't be concerned
about a thing called "reload".


Yes, that makes sense.  A regular user hardly knows our internal kitchen.

diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
index 1493b323956..9be8e295627 100644
--- a/gcc/tree-into-ssa.c
+++ b/gcc/tree-into-ssa.c
@@ -1412,6 +1412,11 @@ rewrite_stmt (gimple_stmt_iterator *si)
 SET_DEF (def_p, name);
 register_new_def (DEF_FROM_PTR (def_p), var);

+   /* Do not insert debug stmt after asm goto: */
+   if (gimple_code (stmt) == GIMPLE_ASM
+   && gimple_asm_nlabels (as_a <gasm *> (stmt)) > 0)
+ continue;
+

why?  Ah, the next line explains.  I guess it's better done as

/* Do not insert debug stmts if the stmt ends the BB.  */
if (stmt_ends_bb_p (stmt))
  continue;


Richard, thank you for your review.  I am not very familiar with the
middle-end, so your comments are really useful.



I wonder why the code never ran into issues for calls that throw
internal ...



I have no idea.  But I really ran into this problem when I tested asm 
goto with outputs.



You have plenty of compile testcases but not a single execute one.
So - does it actually work? ;)

Yes, it works.  Two tests actually produce output reloads, at least on
x86-64.  As for execution tests, it is difficult for me to write
something meaningful, especially with generated output reloads. I'll try
to add execution tests too.

Otherwise OK.



Richard, thank you again for your quick review. I'll update the patch 
according to your proposals, test it again and commit it.




2020-11-12  Vladimir Makarov 

  * c/c-parser.c (c_parser_asm_statement): Parse outputs for asm
  goto too.
  * c/c-typeck.c (build_asm_expr): Remove an assert checking output
  absence for asm goto.

I'm sure this will be rejected by the commit hook.  You need sth like



OK thanks.



gcc/c/
 * c-parser.c (...

gcc/



  

[PATCH] Cleanup range of address calculations.

2020-11-13 Thread Andrew MacLeod via Gcc-patches
There were some slight differences between when ranger would get 
non-zero for &expr and what EVRP was getting.


I've extracted the required bits from vrp_stmt_computes_nonzero and now
the two versions should be aligned. I've also renamed the function from the
previously cryptic range_of_non_trivial_assignment () to simply
range_of_address ().


I also discovered we were not tracking integral constants, like 0, (AKA 
NULL) for pointers, so I added the constant handler in the pointer table.


I modified a test from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 which shows some
cases that track non-zero and zero, and verified we remove bits when we
can. I disabled all the previous passes so that EVRP actually sees the
code and can make decisions.
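
To make the non-zero tracking concrete, here is a small sketch of the kind
of source this affects.  It is loosely in the spirit of the PR 78655 test
but is not copied from it; struct pair and the function names are
hypothetical.  With -fdelete-null-pointer-checks the address &p->b is the
pointer p plus a positive constant offset, so a range-based pass may treat
it as non-NULL and fold the comparison, while &p->a sits at offset 0 and
simply inherits whatever range p has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for illustration.  */
struct pair
{
  int a;
  int b;
};

/* &p->b is `p' plus a positive offset: a compiler that assumes valid
   pointers cannot wrap to NULL may fold this to `return 1'.  */
int
second_is_nonnull (struct pair *p)
{
  int *q = &p->b;
  return q != NULL;
}

/* &p->a is at offset 0, so its range is just the range of `p'; the
   comparison can only be folded when `p' is known to be non-NULL.  */
int
first_is_nonnull (struct pair *p)
{
  int *q = &p->a;
  return q != NULL;
}

/* Both functions agree for a pointer to a real object, whether or
   not the comparisons were folded at compile time.  */
void
check_pair (void)
{
  struct pair s = { 1, 2 };
  assert (second_is_nonnull (&s) == 1);
  assert (first_is_nonnull (&s) == 1);
}
```

Whether the comparisons are actually removed depends on the optimization
level and flags in use; the observable result for valid pointers is the same
either way.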


bootstrapped on x86_64-pc-linux-gnu, no regressions. Pushed.

Andrew
commit 47923622c663ffad8b14aa93706183290d4f6791
Author: Andrew MacLeod 
Date:   Thu Nov 12 11:53:52 2020 -0500

Cleanup range of address calculations.

Align EVRP and ranger for how ranges of ADDR_EXPR are calculated.

gcc/
* gimple-range.cc: (gimple_ranger::range_of_range_op): Check for
ADDR_EXPR and call range_of_address.
(gimple_ranger::range_of_address): Rename from
range_of_non_trivial_assignment and match vrp_stmt_computes_nonzero.
* gimple-range.h: (range_of_address): Renamed.
* range-op.cc: (pointer_table): Add INTEGER_CST handler.
gcc/testsuite/
* gcc.dg/tree-ssa/pr78655.c: New.

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 92a6335bec5..4f5d5024fa9 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -431,8 +431,9 @@ gimple_ranger::range_of_range_op (irange &r, gimple *s)
   m_cache.register_dependency (lhs, op2);
 }
 
-  if (range_of_non_trivial_assignment (r, s))
-return true;
+  if (gimple_code (s) == GIMPLE_ASSIGN
+  && gimple_assign_rhs_code (s) == ADDR_EXPR)
+return range_of_address (r, s);
 
   if (range_of_expr (range1, op1, s))
 {
@@ -446,48 +447,84 @@ gimple_ranger::range_of_range_op (irange &r, gimple *s)
   return true;
 }
 
-// Calculate the range of a non-trivial assignment.  That is, is one
-// inolving arithmetic on an SSA name (for example, an ADDR_EXPR).
+// Calculate the range of an assignment containing an ADDR_EXPR.
 // Return the range in R.
-//
-// If a range cannot be calculated, return false.
+// If a range cannot be calculated, set it to VARYING and return true.
 
 bool
-gimple_ranger::range_of_non_trivial_assignment (irange &r, gimple *stmt)
+gimple_ranger::range_of_address (irange &r, gimple *stmt)
 {
-  if (gimple_code (stmt) != GIMPLE_ASSIGN)
-return false;
+  gcc_checking_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
+  gcc_checking_assert (gimple_assign_rhs_code (stmt) == ADDR_EXPR);
 
-  tree base = gimple_range_base_of_assignment (stmt);
-  if (base)
+  bool strict_overflow_p;
+  tree expr = gimple_assign_rhs1 (stmt);
+  poly_int64 bitsize, bitpos;
+  tree offset;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep;
+  tree base = get_inner_reference (TREE_OPERAND (expr, 0), &bitsize,
+   &bitpos, &offset, &mode, &unsignedp,
+   &reversep, &volatilep);
+
+
+  if (base != NULL_TREE
+  && TREE_CODE (base) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
 {
-  if (TREE_CODE (base) == MEM_REF)
+  tree ssa = TREE_OPERAND (base, 0);
+  gcc_checking_assert (irange::supports_type_p (TREE_TYPE (ssa)));
+  range_of_expr (r, ssa, stmt);
+  range_cast (r, TREE_TYPE (gimple_assign_rhs1 (stmt)));
+
+  poly_offset_int off = 0;
+  bool off_cst = false;
+  if (offset == NULL_TREE || TREE_CODE (offset) == INTEGER_CST)
 	{
-	  if (TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
-	{
-	  int_range_max range1;
-	  tree ssa = TREE_OPERAND (base, 0);
-	  if (range_of_expr (range1, ssa, stmt))
-		{
-		  tree type = TREE_TYPE (ssa);
-		  range_operator *op = range_op_handler (POINTER_PLUS_EXPR,
-			 type);
-		  int_range<2> offset (TREE_OPERAND (base, 1),
-   TREE_OPERAND (base, 1));
-		  op->fold_range (r, type, range1, offset);
-		  return true;
-		}
-	}
-	  return false;
+	  off = mem_ref_offset (base);
+	  if (offset)
+	off += poly_offset_int::from (wi::to_poly_wide (offset),
+	  SIGNED);
+	  off <<= LOG2_BITS_PER_UNIT;
+	  off += bitpos;
+	  off_cst = true;
 	}
-  if (gimple_assign_rhs_code (stmt) == ADDR_EXPR)
+  /* If &X->a is equal to X, the range of X is the result.  */
+  if (off_cst && known_eq (off, 0))
+	  return true;
+  else if (flag_delete_null_pointer_checks
+	   && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (expr)))
+	{
+	 /* For -fdelete-null-pointer-checks -fno-wrapv-pointer we don't
+	 allow going from non-NULL pointer to NULL.  */
+	   if(!range_includes_zero_p (&r))
+	return true;
+	}
+  /* If MEM_REF has a "positive" offset, consider it non-NULL

[committed] arm: Make use of RTL predicates

2020-11-13 Thread Andrea Corallo via Gcc-patches
Hi all,

this is to fix missing uses of RTL predicates in the arm backend.

Regtested and bootstrapped on arm-linux-gnueabihf.

Committed into master as 156edf21fab as pre-approved [1].
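
For context, the change is purely mechanical: open-coded GET_CODE
comparisons are replaced by the equivalent predicate macros.  The sketch
below models that with toy stand-ins; the enum, struct and helper functions
are made up for illustration and are not GCC's real definitions (those live
in gcc/rtl.h and are far richer).

```c
#include <assert.h>

/* Toy stand-ins for GCC's rtx machinery, for illustration only.  */
enum rtx_code { REG, SUBREG, SYMBOL_REF, LABEL_REF, CONST_INT };

struct rtx_def
{
  enum rtx_code code;
};
typedef struct rtx_def *rtx;

#define GET_CODE(X)      ((X)->code)

/* Predicate macros in the style the patch switches to.  */
#define REG_P(X)         (GET_CODE (X) == REG)
#define SUBREG_P(X)      (GET_CODE (X) == SUBREG)
#define SYMBOL_REF_P(X)  (GET_CODE (X) == SYMBOL_REF)
#define LABEL_REF_P(X)   (GET_CODE (X) == LABEL_REF)

/* Before: open-coded comparisons.  */
int
is_symbol_or_label_old (rtx x)
{
  return GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF;
}

/* After: the same test spelled with predicates.  */
int
is_symbol_or_label_new (rtx x)
{
  return SYMBOL_REF_P (x) || LABEL_REF_P (x);
}

/* The two spellings agree for every code in the toy enum.  */
void
check_predicates (void)
{
  for (int c = REG; c <= CONST_INT; c++)
    {
      struct rtx_def r = { (enum rtx_code) c };
      assert (is_symbol_or_label_old (&r) == is_symbol_or_label_new (&r));
    }
}
```

The predicates add no behaviour; the gain is shorter, more uniform
conditions.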

   Andrea

[1] 

From 156edf21fab7dd5891c72db7ec58b38ef7d52bfa Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Fri, 13 Nov 2020 11:42:04 +
Subject: [PATCH] arm: Make use of RTL predicates

2020-11-13  Andrea Corallo  

* config/arm/aarch-common.c (aarch_accumulator_forwarding): Use
RTL predicates where possible.
* config/arm/arm.c (legitimate_pic_operand_p)
(legitimize_pic_address, arm_is_segment_info_known)
(can_avoid_literal_pool_for_label_p)
(thumb1_legitimate_address_p, arm_legitimize_address)
(arm_tls_referenced_p, thumb_legitimate_constant_p)
(REG_OR_SUBREG_REG, thumb1_rtx_costs, thumb1_size_rtx_costs)
(arm_adjust_cost, arm_coproc_mem_operand_wb)
(neon_vector_mem_operand, neon_struct_mem_operand)
(symbol_mentioned_p, label_mentioned_p, )
(load_multiple_sequence, store_multiple_sequence)
(arm_select_cc_mode, arm_reload_in_hi, arm_reload_out_hi)
(mem_ok_for_ldrd_strd, arm_emit_call_insn, output_move_neon)
(arm_attr_length_move_neon, arm_assemble_integer)
(arm_emit_coreregs_64bit_shift, arm_valid_symbolic_address_p)
(extract_base_offset_in_addr, fusion_load_store): Likewise.
---
 gcc/config/arm/aarch-common.c |  2 +-
 gcc/config/arm/arm.c  | 90 +--
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 6bc6ccf9411..e7b13f00fb4 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -485,7 +485,7 @@ aarch_accumulator_forwarding (rtx_insn *producer, rtx_insn 
*consumer)
return 0;
 }
 
-  if (GET_CODE (accumulator) == SUBREG)
+  if (SUBREG_P (accumulator))
 accumulator = SUBREG_REG (accumulator);
 
   if (!REG_P (accumulator))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5612d1e7e18..04190b1880a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7775,7 +7775,7 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
 int
 legitimate_pic_operand_p (rtx x)
 {
-  if (GET_CODE (x) == SYMBOL_REF
+  if (SYMBOL_REF_P (x)
   || (GET_CODE (x) == CONST
  && GET_CODE (XEXP (x, 0)) == PLUS
  && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF))
@@ -7904,8 +7904,8 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx 
reg, rtx pic_reg,
 {
   gcc_assert (compute_now == (pic_reg != NULL_RTX));
 
-  if (GET_CODE (orig) == SYMBOL_REF
-  || GET_CODE (orig) == LABEL_REF)
+  if (SYMBOL_REF_P (orig)
+  || LABEL_REF_P (orig))
 {
   if (reg == 0)
{
@@ -7922,8 +7922,8 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx 
reg, rtx pic_reg,
   /* References to weak symbols cannot be resolved locally: they
 may be overridden by a non-weak definition at link time.  */
   rtx_insn *insn;
-  if ((GET_CODE (orig) == LABEL_REF
-  || (GET_CODE (orig) == SYMBOL_REF
+  if ((LABEL_REF_P (orig)
+  || (SYMBOL_REF_P (orig)
   && SYMBOL_REF_LOCAL_P (orig)
   && (SYMBOL_REF_DECL (orig)
   ? !DECL_WEAK (SYMBOL_REF_DECL (orig)) : 1)
@@ -8177,7 +8177,7 @@ arm_is_segment_info_known (rtx orig, bool *is_readonly)
 {
   *is_readonly = false;
 
-  if (GET_CODE (orig) == LABEL_REF)
+  if (LABEL_REF_P (orig))
 {
   *is_readonly = true;
   return true;
@@ -8437,7 +8437,7 @@ can_avoid_literal_pool_for_label_p (rtx x)
  (set (reg r0) (mem (reg r0))).
  No extra register is required, and (mem (reg r0)) won't cause the use
  of literal pools.  */
-  if (arm_disable_literal_pool && GET_CODE (x) == SYMBOL_REF
+  if (arm_disable_literal_pool && SYMBOL_REF_P (x)
   && CONSTANT_POOL_ADDRESS_P (x))
 return 1;
   return 0;
@@ -8816,7 +8816,7 @@ thumb1_legitimate_address_p (machine_mode mode, rtx x, 
int strict_p)
 
   /* This is PC relative data before arm_reorg runs.  */
   else if (GET_MODE_SIZE (mode) >= 4 && CONSTANT_P (x)
-  && GET_CODE (x) == SYMBOL_REF
+  && SYMBOL_REF_P (x)
   && CONSTANT_POOL_ADDRESS_P (x) && !flag_pic
   && !arm_disable_literal_pool)
 return 1;
@@ -8824,7 +8824,7 @@ thumb1_legitimate_address_p (machine_mode mode, rtx x, 
int strict_p)
   /* This is PC relative data after arm_reorg runs.  */
   else if ((GET_MODE_SIZE (mode) >= 4 || mode == HFmode)
   && reload_completed
-  && (GET_CODE (x) == LABEL_REF
+  && (LABEL_REF_P (x)
   || (GET_CODE (x) == CONST
   && GET_CODE (XEXP (x, 0)) == PLUS
   && GET_CODE (XEXP (XEXP (x, 0), 0)) == LABEL_REF
@@ -8884,7 +8884,7 @@ thumb1_legitimate_address_p (machine

Re: [PATCH v2] c: Silently ignore pragma region [PR85487]

2020-11-13 Thread Austin Morton via Gcc-patches
On the contrary, as a user of GCC I would much prefer a consistent
behavior for #pragma region based purely on GCC version.

IE, so you can tell people:
"just update to GCC X.Y and those warnings will go away"
rather than:
"update to GCC X.Y and pass some new flags - but make sure
 not to pass them to old GCC versions, since that will generate
 a new warning"

I do agree it may be generally useful to have a configurable way to
specify pragmas to ignore at runtime, but that is not what I was trying
to accomplish here.

Both clang and MSVC handle this pragma without any runtime
configuration, and I think GCC should as well.
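
To illustrate the use case, a source file using the pragma might look like
the sketch below.  The function names are made up for the example.  The
pragmas only mark a foldable block for an editor and have no effect on the
generated code; a compiler that does not know them may emit an unknown
pragma warning when such warnings are enabled, which is the annoyance this
patch is meant to remove.

```c
#include <assert.h>

#pragma region Arithmetic helpers
/* Hypothetical helpers grouped into one foldable region.  */
int
add (int a, int b)
{
  return a + b;
}

int
twice (int a)
{
  return add (a, a);
}
#pragma endregion Arithmetic helpers

/* Small self-check showing the pragmas do not change behaviour.  */
void
check_helpers (void)
{
  assert (twice (add (1, 2)) == 6);
}
```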

Austin

On Thu, Nov 12, 2020 at 11:25 PM Jeff Law  wrote:
>
>
> On 9/2/20 6:59 PM, Austin Morton via Gcc-patches wrote:
> > #pragma region is a feature introduced by Microsoft in order to allow
> > manual grouping and folding of code within Visual Studio.  It is
> > entirely ignored by the compiler.  Clang has supported this feature
> > since 2012 when in MSVC compatibility mode, and enabled it across the
> > board in 2018.
> >
> > As it stands, you cannot use #pragma region within GCC without
> > disabling unknown pragma warnings, which is not advisable.
> >
> > I propose GCC adopt "#pragma region" and "#pragma endregion" in order
> > to alleviate these issues.  Because the pragma has no purpose at
> > compile time, the implementation is trivial.
> >
> >
> > Microsoft Documentation on the feature:
> > https://docs.microsoft.com/en-us/cpp/preprocessor/region-endregion
> >
> > LLVM change which enabled pragma region across the board:
> > https://reviews.llvm.org/D42248
> > ---
> >  gcc/ChangeLog|  5 +
> >  gcc/c-family/ChangeLog   |  5 +
> >  gcc/c-family/c-pragma.c  | 10 ++
> >  gcc/doc/cpp.texi |  6 ++
> >  gcc/testsuite/ChangeLog  |  5 +
> >  gcc/testsuite/gcc.dg/pragma-region.c | 21 +
> >  6 files changed, 52 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pragma-region.c
>
> I'm not sure that this is really the way we want to handle this stuff.
> I understand the problem you're trying to solve, but embedding a list of
> pragmas to ignore into the compiler itself just seems like the wrong
> approach -- it bakes that set of pragmas to ignore into the compiler.
>
>
> ISTM that we'd be better off either having a command line option to list
> the set of pragmas to ignore, or they should be pulled from a file
> specified on the command line.   That would seem to be a lot more
> friendly to downstream users since each project could set the list of
> pragmas to ignore on their own and have that set updated dynamically
> over time without having to patch and update GCC.
>
>
> Any chance you would be willing to work on that?
>
> Jeff
>


Re: [PATCH v2] c: Silently ignore pragma region [PR85487]

2020-11-13 Thread David Malcolm via Gcc-patches
On Fri, 2020-11-13 at 09:57 -0500, Austin Morton via Gcc-patches wrote:
> On the contrary, as a user of GCC I would much prefer a consistent
> behavior for #pragma region based purely on GCC version.
> 
> IE, so you can tell people:
> "just update to GCC X.Y and those warnings will go away"
> rather than:
> "update to GCC X.Y and pass some new flags - but make sure
>  not to pass them to old GCC versions, since that will generate
>  a new warning"
> 
> I do agree it may be generally useful to have a configurable way to
> specify pragmas to ignore at runtime, but that is not what I was
> trying
> to accomplish here.
> 
> Both clang and MSVC handle this pragma without any runtime
> configuration, and I think GCC should as well.

FWIW I like the patch (but I don't think I can approve it).

How much does this pragma get used "in the wild"?

Thinking aloud, I wonder if it would be useful to capture regions in
the diagnostic subsystem, and emit "In region ..." messages,
rather like we emit "In function ..." when first emitting a
diagnostic within a region?  (not sure if good idea, just
brainstorming)

Dave

> Austin
> 
> On Thu, Nov 12, 2020 at 11:25 PM Jeff Law  wrote:
> > 
> > On 9/2/20 6:59 PM, Austin Morton via Gcc-patches wrote:
> > > #pragma region is a feature introduced by Microsoft in order to
> > > allow
> > > manual grouping and folding of code within Visual Studio.  It is
> > > entirely ignored by the compiler.  Clang has supported this
> > > feature
> > > since 2012 when in MSVC compatibility mode, and enabled it across
> > > the
> > > board in 2018.
> > > 
> > > As it stands, you cannot use #pragma region within GCC without
> > > disabling unknown pragma warnings, which is not advisable.
> > > 
> > > I propose GCC adopt "#pragma region" and "#pragma endregion" in
> > > order
> > > to alleviate these issues.  Because the pragma has no purpose at
> > > compile time, the implementation is trivial.
> > > 
> > > 
> > > Microsoft Documentation on the feature:
> > > https://docs.microsoft.com/en-us/cpp/preprocessor/region-endregion
> > > 
> > > LLVM change which enabled pragma region across the board:
> > > https://reviews.llvm.org/D42248
> > > ---
> > >  gcc/ChangeLog|  5 +
> > >  gcc/c-family/ChangeLog   |  5 +
> > >  gcc/c-family/c-pragma.c  | 10 ++
> > >  gcc/doc/cpp.texi |  6 ++
> > >  gcc/testsuite/ChangeLog  |  5 +
> > >  gcc/testsuite/gcc.dg/pragma-region.c | 21 +
> > >  6 files changed, 52 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pragma-region.c
> > 
> > I'm not sure that this is really the way we want to handle this
> > stuff.
> > I understand the problem you're trying to solve, but embedding a
> > list of
> > pragmas to ignore into the compiler itself just seems like the
> > wrong
> > approach -- it bakes that set of pragmas to ignore into the
> > compiler.
> > 
> > 
> > ISTM that we'd be better off either having a command line option to
> > list
> > the set of pragmas to ignore, or they should be pulled from a file
> > specified on the command line.   That would seem to be a lot more
> > friendly to downstream users since each project could set the list
> > of
> > pragmas to ignore on their own and have that set updated
> > dynamically
> > over time without having to patch and update GCC.
> > 
> > 
> > Any chance you would be willing to work on that?
> > 
> > Jeff
> > 



Re: [PATCH v2] c: Silently ignore pragma region [PR85487]

2020-11-13 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 13, 2020 at 09:57:31AM -0500, Austin Morton via Gcc-patches wrote:
> On the contrary, as a user of GCC I would much prefer a consistent
> behavior for #pragma region based purely on GCC version.
> 
> IE, so you can tell people:
> "just update to GCC X.Y and those warnings will go away"
> rather than:
> "update to GCC X.Y and pass some new flags - but make sure
>  not to pass them to old GCC versions, since that will generate
>  a new warning"
> 
> I do agree it may be generally useful to have a configurable way to
> specify pragmas to ignore at runtime, but that is not what I was trying
> to accomplish here.
> 
> Both clang and MSVC handle this pragma without any runtime
> configuration, and I think GCC should as well.

But in that case the pragma shouldn't be ignored, but instead checked
against the various requirements.  E.g. check that region is followed by a
single optional name (are there any requirements on what the name can be:
can it be just a single token, can it be a C/C++ keyword, etc.), I guess
ignore all the tokens after endregion if they are all a comment, and verify
nesting (all region/endregion directives are properly paired up).
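
As a rough sketch of the nesting check only (not of how it would be done
inside GCC's pragma handling), the pairing rule can be expressed with a
depth counter.  Everything here is hypothetical: check_region_nesting and
its helpers work on plain text lines, and the return convention is an
assumption of this example (0 if balanced, -1 on a stray endregion,
otherwise the number of regions left open).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If LINE has the form "#pragma <word> ...", return a pointer to
   <word>; otherwise return NULL.  */
static const char *
pragma_word (const char *line)
{
  while (isspace ((unsigned char) *line))
    line++;
  if (*line != '#')
    return NULL;
  line++;
  while (isspace ((unsigned char) *line))
    line++;
  if (strncmp (line, "pragma", 6) != 0
      || !isspace ((unsigned char) line[6]))
    return NULL;
  line += 6;
  while (isspace ((unsigned char) *line))
    line++;
  return line;
}

/* Return nonzero if WORD starts with keyword KW followed by
   whitespace or the end of the string.  */
static int
is_keyword (const char *word, const char *kw)
{
  size_t len = strlen (kw);
  return strncmp (word, kw, len) == 0
	 && (word[len] == '\0' || isspace ((unsigned char) word[len]));
}

/* Check region/endregion pairing over N lines.  Returns 0 if all
   regions are closed, -1 if an endregion has no open region, and
   otherwise the number of regions still open at the end.  */
int
check_region_nesting (const char *lines[], size_t n)
{
  int depth = 0;

  assert (lines != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      const char *word = pragma_word (lines[i]);
      if (word == NULL)
	continue;
      if (is_keyword (word, "region"))
	depth++;
      else if (is_keyword (word, "endregion") && --depth < 0)
	return -1;
    }
  return depth;
}
```

Anything after the region or endregion keyword is skipped, which matches
the idea of treating the rest of the line as a name or comment; validating
the name itself is left open, as in the questions above.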

Jakub



Re: [PATCH] c++: Don't form a templated TARGET_EXPR in finish_compound_literal

2020-11-13 Thread Patrick Palka via Gcc-patches
On Thu, 12 Nov 2020, Jason Merrill wrote:

> On 11/12/20 1:27 PM, Patrick Palka wrote:
> > The atom_cache in normalize_atom relies on the assumption that two
> > equivalent (templated) trees (in the sense of cp_tree_equal) must use
> > the same template parameters (according to find_template_parameters).
> > 
> > This assumption unfortunately doesn't always hold for TARGET_EXPRs,
> > because cp_tree_equal ignores an artificial target of a TARGET_EXPR, but
> > find_template_parameters walks this target (and its DECL_CONTEXT).
> > 
> > Hence two TARGET_EXPRs built by force_target_expr with the same
> > initializer but under different settings of current_function_decl may
> > compare equal according to cp_tree_equal, but find_template_parameters
> > returns a different set of template parameters for them.  This breaks
> > the below testcase because during normalization we build two such
> > TARGET_EXPRs (one under current_function_decl=f and another under =g),
> > and then use the same ATOMIC_CONSTR for the two corresponding atoms,
> > leading to a crash during satisfaction of g's associated constraints.
> > 
> > This patch works around this assumption violation by removing the source
> > of these templated TARGET_EXPRs.  The relevant call to get_target_expr was
> > added in r9-6043, but it seems it's no longer necessary (according to
> > https://gcc.gnu.org/pipermail/gcc-patches/2019-February/517323.html, the
> > call was added in order to avoid regressing on initlist109.C at the time).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> 
> OK.  I wonder what else asserting !processing_template_decl in
> build_target_expr would find...

FWIW, testing exposed seven distinct paths that trigger such an assert,
five of which go through build_cplus_new:


0x6b3983 build_target_expr
/gcc/gcc/cp/tree.c:496
0xa94ec8 build_cplus_new(tree_node*, tree_node*, int)
/gcc/gcc/cp/tree.c:728
0x91712b force_rvalue(tree_node*, int)
/gcc/gcc/cp/cvt.c:569
0x8b8310 build_conditional_expr_1
/gcc/gcc/cp/call.c:5592
0x8ba08c build_conditional_expr(op_location_t const&, tree_node*, tree_node*, 
tree_node*, int)
/gcc/gcc/cp/call.c:5777
0xaa70fb build_x_conditional_expr(unsigned int, tree_node*, tree_node*, 
tree_node*, int)
/gcc/gcc/cp/typeck.c:7133
0x9cc9fa cp_parser_assignment_expression
/gcc/gcc/cp/parser.c:9964


0x6b3983 build_target_expr
/gcc/gcc/cp/tree.c:496
0xa94ec8 build_cplus_new(tree_node*, tree_node*, int)
/gcc/gcc/cp/tree.c:728
0x97d185 expand_default_init
/gcc/gcc/cp/init.c:1924
0x97d185 expand_aggr_init_1
/gcc/gcc/cp/init.c:2101
0x97f026 build_aggr_init(tree_node*, tree_node*, int, int)
/gcc/gcc/cp/init.c:1835
0x92c88d build_aggr_init_full_exprs
/gcc/gcc/cp/decl.c:6696
0x92c88d check_initializer
/gcc/gcc/cp/decl.c:6857
0x950982 cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int)
/gcc/gcc/cp/decl.c:7699
0x960c7e grokfield(cp_declarator const*, cp_decl_specifier_seq*, tree_node*, 
bool, tree_node*, tree_node*)
/gcc/gcc/cp/decl2.c:1000
0xa02ceb cp_parser_member_declaration
/gcc/gcc/cp/parser.c:25755


0x6b3983 build_target_expr
/gcc/gcc/cp/tree.c:496
0xa94ee8 build_cplus_new(tree_node*, tree_node*, int)
/gcc/gcc/cp/tree.c:728
0x8a0ed5 build_cxx_call(tree_node*, int, tree_node**, int, tree_node*)
/gcc/gcc/cp/call.c:9747
0xabed0b cp_build_function_call_vec(tree_node*, vec**, int, tree_node*)
/gcc/gcc/cp/typeck.c:4025
0xa75a30 finish_call_expr(tree_node*, vec**, bool, 
bool, int)
/gcc/gcc/cp/semantics.c:2728
0x9e9383 cp_parser_postfix_expression
/gcc/gcc/cp/parser.c:7549


0x6b3983 build_target_expr
/gcc/gcc/cp/tree.c:496
0xa94ee8 build_cplus_new(tree_node*, tree_node*, int)
/gcc/gcc/cp/tree.c:728
0x8a0ed5 build_cxx_call(tree_node*, int, tree_node**, int, tree_node*)
/gcc/gcc/cp/call.c:9747
0xabed0b cp_build_function_call_vec(tree_node*, vec**, int, tree_node*)
/gcc/gcc/cp/typeck.c:4025
0x95eb40 build_offset_ref_call_from_tree(tree_node*, vec**, int)
/gcc/gcc/cp/decl2.c:5292
0x9e9b1f cp_parser_postfix_expression
/gcc/gcc/cp/parser.c:7534


0x6b3983 build_target_expr
/gcc/gcc/cp/tree.c:496
0xa94ec8 build_cplus_new(tree_node*, tree_node*, int)
/gcc/gcc/cp/tree.c:728
0x8b67c2 perform_direct_initialization_if_possible(tree_node*, tree_node*, 
bool, int)
/gcc/gcc/cp/call.c:12038
0xaba6a9 build_static_cast_1
/gcc/gcc/cp/typeck.c:7607
0xabb500 build_static_cast(unsigned int, tree_node*, tree_node*, int)
/gcc/gcc/cp/typeck.c:7813
0x9e9f9e cp_parser_postfix_expression
/gcc/gcc/cp/parser.c:7049


0x6b3983 build_target_expr
/gcc/gcc/cp/tree.c:496
0x97fb53 build_new_1
/gcc/gcc/cp/init.c:3281
0x982382 build_new(unsigned int, vec**, 
tree_node*, tree_node*, vec**, int, int)
/gcc/gcc/cp/in

Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Uros Bizjak via Gcc-patches
Hello!

> The following patch implements asm goto with outputs.  Kernel
> developers have several times expressed a wish to have this feature.
> Asm goto with outputs was implemented in LLVM recently.  This new
> feature was presented at the 2020 Linux Plumbers Conference
> (https://linuxplumbersconf.org/event/7/contributions/801/attachments/659/1212/asm_goto_w__Outputs.pdf)
> and the 2020 LLVM conference
> (https://www.youtube.com/watch?v=vcPD490s-hE).

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
new file mode 100644
index 000..8685ca2a1cb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
@@ -0,0 +1,14 @@
+/* Check that LRA really puts output reloads for p4 in two successors blocks */
+/* { dg-do compile { target x86_64-*-* } } */

Please use:

/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */

to correctly select 64bit x86 targets.

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
new file mode 100644
index 000..57359192f62
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
@@ -0,0 +1,56 @@
+/* Test to generate output reload in asm goto on x86_64.  */
+/* { dg-do compile } */
+/* { dg-skip-if "no O0" { x86_64-*-* } { "-O0" } { "" } } */

Same here:

+/* { dg-skip-if "no O0" { { i?86-*-* x86_64-*-* } && { ! ia32 } } {
"-O0" } { "" } } */

Uros.


Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 13, 2020 at 04:51:09PM +0100, Uros Bizjak via Gcc-patches wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
> @@ -0,0 +1,14 @@
> +/* Check that LRA really puts output reloads for p4 in two successors blocks 
> */
> +/* { dg-do compile { target x86_64-*-* } } */
> 
> Please use:
> 
> /* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */
> 
> to correctly select 64bit x86 targets.

Or lp64 instead of { ! ia32 }, depending on if it should work on -mx32 or not.
As they are compile time tests, one can test them easily...

Jakub



Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-11-13 Thread Jose E. Marchesi via Gcc-patches


> PING^7 on the following patch proposed 8 months ago for gcc11:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542198.html
> 
>
> The deadline for gcc11 stage 1 is approaching.  The pinged patch is
> one that was sent for review 8 months ago in order to
> make it into gcc11.
>
> And this is an important feature that our company has been waiting for
> for a long time.
>
> Could you please take a look at this patch and let us know whether
> it’s ready for commit into gcc11?

ping


Re: [22.2/32] module flags

2020-11-13 Thread Nathan Sidwell

On 11/13/20 9:27 AM, Richard Biener wrote:

On Fri, Nov 13, 2020 at 3:04 PM Nathan Sidwell  wrote:





  struct GTY(()) lang_decl_base {
-  /* Larger than necessary for faster access.  */
-  ENUM_BITFIELD(lang_decl_selector) selector : 16;
+  ENUM_BITFIELD(lang_decl_selector) selector : 3;
...
+  unsigned attached_decls_p : 1;
+
+  /* 10 spare bits.  */

so for "faster access" you could still make selector 8 bits, reducing
spare bits to 5.


could do -- we always know what kind of lang_decl to expect from the 
originating tree's code.  It's only for the garbage collector that we 
need the selector. (+ the checkers)
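
To make the trade-off concrete, here is a small sketch of the two layouts
being discussed.  The struct names, the enum and the spare-bit counts are
illustrative assumptions, not the real cp-tree.h definitions: a 3-bit
selector is enough for a handful of kinds and leaves more room for flags,
while an 8-bit selector occupies a whole byte, which is the "faster access"
argument.

```c
#include <assert.h>

/* Illustrative selector values; the real enum lives in cp-tree.h.  */
enum selector_kind { SEL_MIN, SEL_FN, SEL_NS, SEL_PARM, SEL_DECOMP };

/* Narrow layout: a 3-bit selector leaves more bits for flags.  */
struct base_narrow
{
  unsigned selector : 3;
  unsigned attached_decls_p : 1;
  unsigned spare : 12;
};

/* Byte-wide layout: the selector fills a whole byte, at the cost of
   five of the spare bits.  */
struct base_byte
{
  unsigned selector : 8;
  unsigned attached_decls_p : 1;
  unsigned spare : 7;
};

/* Store K in the 3-bit field and read it back.  */
unsigned
narrow_roundtrip (enum selector_kind k)
{
  struct base_narrow b = { 0 };
  assert ((unsigned) k < 8);	/* Must fit in 3 bits.  */
  b.selector = k;
  return b.selector;
}

/* Store K in the 8-bit field and read it back.  */
unsigned
byte_roundtrip (enum selector_kind k)
{
  struct base_byte b = { 0 };
  b.selector = k;
  return b.selector;
}
```

Both layouts round-trip every selector value in this toy enum; the choice
is only about how many bits are left over and how cheaply the field can be
read.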

Can you add comments (like on some other bits var / fn / type)
what kind of decls the new bits are used on?  Maybe some
bits can be overloaded if spare bits are needed.


sure.  For the record it's VAR_DECL, TYPE_DECL, FUNCTION_DECL,
CONCEPT_DECL, TEMPLATE_DECL, NAMESPACE_DECL (that's too many for a
TREE_CHECK, we only go to 5).




Thanks,
Richard.


nathan

--
Nathan Sidwell





[PATCH][pushed] testsuite: move expected error location

2020-11-13 Thread Martin Liška

Hello.

One obvious fix of expected error location.

Martin

gcc/testsuite/ChangeLog:

PR testsuite/97788
* g++.dg/ubsan/pr61272.C: Move expected error location.
---
 gcc/testsuite/g++.dg/ubsan/pr61272.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ubsan/pr61272.C 
b/gcc/testsuite/g++.dg/ubsan/pr61272.C
index 11dd1ecb733..cb4751e9931 100644
--- a/gcc/testsuite/g++.dg/ubsan/pr61272.C
+++ b/gcc/testsuite/g++.dg/ubsan/pr61272.C
@@ -12,10 +12,10 @@ namespace std
   };
   namespace __gnu_cxx
   {
-template < typename _Alloc > struct __alloc_traits:std::allocator_traits < _Alloc > 
// { dg-error "within this context" }
+template < typename _Alloc > struct __alloc_traits:std::allocator_traits < 
_Alloc >
 {
   typedef std::allocator_traits < _Alloc > _Base_type;
-  using _Base_type::construct;
+  using _Base_type::construct; // { dg-error "within this context" }
 };
 template < typename _Tp, typename _Alloc > struct _Vector_base { typedef typename 
__gnu_cxx::__alloc_traits < _Alloc >::template rebind < _Tp >::other _Tp_alloc_type; }; // { 
dg-error "no class template" }
 template < typename _Tp, typename _Alloc = std::allocator < _Tp > >class vector 
: protected _Vector_base < _Tp, _Alloc > { };
--
2.29.2



Re: [1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed the comment.

> There are ways in which we could reduce the amount of cut-&-paste here,
> but I guess everything is a trade-off between clarity and compactness.
> One extreme is to write them all out explicitly, another extreme would
> be to have one define_expand and various iterators and attributes.
>
> I think the vec_widen_mult_*_ patterns strike a good balance:
> they use ANY_EXTEND to hide the sign difference while still having
> separate hi and lo patterns:

Done

gcc/ChangeLog:

2020-11-13  Joel Hutton  

* config/aarch64/aarch64-simd.md: New patterns
  vec_widen_saddl_lo/hi_.
From c52fd11f5d471200c1292fad3bc04056e7721f06 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:35:57 +
Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend. These allow taking vectors of N elements of size S
and performing an add/subtract on the high or low half,
widening the resulting elements and storing N/2 elements of size 2*S.
These correspond to the addl,addl2,subl,subl2 instructions.
---
 gcc/config/aarch64/aarch64-simd.md | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2cf6fe9154a..30299610635 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3382,6 +3382,53 @@
   [(set_attr "type" "neon__long")]
 )
 
+(define_expand "vec_widen_addl_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_addl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_addl_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_addl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_subl_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+  emit_insn (gen_aarch64_subl_lo_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_subl_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_subl_hi_internal (operands[0], operands[1],
+		  operands[2], p));
+  DONE;
+})
 
 (define_expand "aarch64_saddl2"
   [(match_operand: 0 "register_operand")
-- 
2.17.1
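
For readers unfamiliar with the operation, here is a scalar model of what the commit message above describes: an add over the low or high half of two N-element vectors, with each result widened to twice the element size. It is only a sketch. The function name is made up, the element types are fixed to int16_t/int32_t for brevity, and "low half" is assumed to mean the lower-numbered elements (little-endian lane numbering).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Add the low (hi == false) or high (hi == true) half of two N-element
// int16_t vectors, widening every sum to int32_t so it cannot wrap.
template <std::size_t N>
std::array<std::int32_t, N / 2>
widen_add_half(const std::array<std::int16_t, N>& a,
               const std::array<std::int16_t, N>& b, bool hi) {
  static_assert(N % 2 == 0, "need an even element count");
  std::array<std::int32_t, N / 2> out{};
  const std::size_t base = hi ? N / 2 : 0;
  for (std::size_t i = 0; i < N / 2; ++i)
    out[i] = std::int32_t{a[base + i]} + std::int32_t{b[base + i]};
  return out;
}
```

A subtract variant would differ only in the operator; the point of widening first is that 32767 + 1 yields 32768 instead of wrapping.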



[PATCH] Add 3 new EVRP testcases.

2020-11-13 Thread Andrew MacLeod via Gcc-patches
This patch adds 3 new evrp testcases which test some enhanced ranger 
functionality in EVRP.


I pulled them from the old rvrp testsuite that was created when the 
ranger project was first started, and they are things EVRP didn't use to 
get.



Andrew




commit 0d1189b4e618517b62f938a94c722123cc0ef5f5
Author: Andrew MacLeod 
Date:   Fri Nov 13 11:40:41 2020 -0500

Add 3 new EVRP testcases.

test new evrp functionality.

gcc/testsuite/
* gcc.dg/tree-ssa/evrp20.c
* gcc.dg/tree-ssa/evrp21.c
* gcc.dg/tree-ssa/evrp22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp20.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp20.c
new file mode 100644
index 000..7d4d55f7638
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp20.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+void call (void);
+
+void foo (int base)
+{
+  unsigned i;
+
+  // Ranger should be able to remove the (i > 123) comparison.
+  for (i = base; i < 10; i++)
+if (i > 123)
+  {
+call ();
+	return;
+  }
+}
+
+/* { dg-final { scan-tree-dump-not "call" "evrp"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp21.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp21.c
new file mode 100644
index 000..dae788cc2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp21.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+extern void vrp_keep (void);
+extern void vrp_kill (void);
+
+void
+f2 (int s, int b)
+{
+  if (s > 4)
+s = 4;
+  if (s < -16)
+s = -16;
+  /* s in [-16, 4].   */
+  b = (b & 1) + 1;
+  /* b in range [1, 2].  */
+  b = s << b;
+  /* b in range [-64, 16].  */
+  if (b == -2)
+vrp_keep ();
+  if (b <= -65)
+vrp_kill ();
+  if (b >= 17)
+vrp_kill ();
+}
+
+/* { dg-final { scan-tree-dump-times "vrp_keep \\(" 1 "evrp"} } */
+/* { dg-final { scan-tree-dump-times "vrp_kill \\(" 0 "evrp"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp22.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp22.c
new file mode 100644
index 000..3dd47e55d04
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp22.c
@@ -0,0 +1,43 @@
+/* See backwards thru casts if the range fits the LHS type. */
+/* { dg-do compile } */
+/* { dg-options "-O2  -fdump-tree-evrp" } */
+
+extern void kill(int i);
+extern void keep(int i);
+
+void
+foo (int i)
+{
+  if (i >= 10)
+{
+  if (i <= 100)
+	{
+	  /* i has a range of [10, 100]  */
+	  char c = (char) i;
+	  if (c < 30)
+	{
+	  /* If we wind back thru the cast with the range of c being [10,29]
+	   * from the branch, and recognize that the range of i fits within
+	   * a cast to c, then there is no missing information in a cast
+	   * back to int. We can use the range calculated for 'c' with 'i'
+	   * as well and Ranger should be able to kill the call.  */
+	  if (i > 29)
+		kill (i);
+	}
+	}
+  /* i has a range of [10, MAX]  */
+  char d  = (char) i;
+  if (d < 30)
+	{
+	  /* Here, a cast to a char and back is NOT equivalent, so we cannot use
+	   * the value of d to remove the call.  */
+	  if (i > 29)
+	keep (i);
+	}
+
+}
+}
+
+/* { dg-final { scan-tree-dump-times "kill \\(" 0 "evrp"} } */
+/* { dg-final { scan-tree-dump-times "keep \\(" 1 "evrp"} } */
+
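
The range comments in evrp21.c and evrp22.c above can be checked by brute force. The sketch below is not part of the patch and does not use any ranger API; the helper names are invented. It enumerates the shift in f2 and tests whether an int range survives a round trip through char.

```cpp
#include <algorithm>
#include <climits>
#include <utility>

// Value range of s << b for s in [-16, 4] and b in [1, 2], as in evrp21.c.
// Multiplying by a power of two gives the same values without shifting a
// negative number.
std::pair<int, int> shift_range() {
  int lo = INT_MAX, hi = INT_MIN;
  for (int s = -16; s <= 4; ++s)
    for (int b = 1; b <= 2; ++b) {
      const int v = s * (1 << b);
      lo = std::min(lo, v);
      hi = std::max(hi, v);
    }
  return {lo, hi};
}

// True when every int in [lo, hi] is unchanged by a cast to char and back,
// which is the condition evrp22.c relies on. Stops at the first mismatch.
bool round_trips_through_char(int lo, int hi) {
  for (long long i = lo; i <= hi; ++i) {
    const int v = static_cast<int>(i);
    if (static_cast<int>(static_cast<char>(v)) != v)
      return false;
  }
  return true;
}
```

With these, [10, 100] round-trips (so the inner kill call is dead), while a range reaching beyond the char limits does not (so the keep call must stay).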


Re: [PATCH] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches



On 2020-11-13 10:51 a.m., Uros Bizjak wrote:

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
new file mode 100644
index 000..8685ca2a1cb
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-4.c
@@ -0,0 +1,14 @@
+/* Check that LRA really puts output reloads for p4 in two successors blocks */
+/* { dg-do compile { target x86_64-*-* } } */

Please use:

/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */

to correctly select 64bit x86 targets.

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
new file mode 100644
index 000..57359192f62
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c
@@ -0,0 +1,56 @@
+/* Test to generate output reload in asm goto on x86_64.  */
+/* { dg-do compile } */
+/* { dg-skip-if "no O0" { x86_64-*-* } { "-O0" } { "" } } */

Same here:

+/* { dg-skip-if "no O0" { { i?86-*-* x86_64-*-* } && { ! ia32 } } {
"-O0" } { "" } } */


OK. Thank you, Uros.  I've changed these tests.




[PATCH] gcov: Add -fprofile-info-section support

2020-11-13 Thread Sebastian Huber
Register the profile information in the specified section instead of using a
constructor/destructor.  A pointer to the profile information generated by
-fprofile-arcs or -ftest-coverage is placed in the specified section for each
translation unit.  This option disables the profile information registration
through a constructor and it disables the profile information processing
through a destructor.

I am not sure how I can test this option.  One approach would be to assemble a
test file, then scan it and check that a .gcov_info section is present and no
__gcov_init() and __gcov_exit() calls are present.  Is there an example for
this in the test suite?

gcc/

* gcc/common.opt (fprofile-info-section): New.
* gcc/coverage.c (build_gcov_info_var_registration): New.
(coverage_obj_init): Evaluate profile_info_section and use
build_gcov_info_var_registration().
* gcc/doc/invoke.texi (fprofile-info-section): Document.
* gcc/opts.c (common_handle_option): Process fprofile-info-section
option.
---
 gcc/common.opt  |  8 
 gcc/coverage.c  | 28 ++--
 gcc/doc/invoke.texi | 29 +
 gcc/opts.c  |  4 
 4 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 7d0e0d9c88a..114fe15e3c6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2268,6 +2268,14 @@ fprofile-generate=
 Common Joined RejectNegative
 Enable common options for generating profile info for profile feedback 
directed optimizations, and set -fprofile-dir=.
 
+fprofile-info-section
+Common RejectNegative
+Register the profile information in the .gcov_info section instead of using a 
constructor/destructor.
+
+fprofile-info-section=
+Common Joined RejectNegative Var(profile_info_section)
+Register the profile information in the specified section instead of using a 
constructor/destructor.
+
 fprofile-partial-training
 Common Report Var(flag_profile_partial_training) Optimization
 Do not assume that functions never executed during the train run are cold.
diff --git a/gcc/coverage.c b/gcc/coverage.c
index 7711412c3be..d299e48d591 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1097,6 +1097,25 @@ build_gcov_exit_decl (void)
   cgraph_build_static_cdtor ('D', dtor, priority);
 }
 
+/* Generate the pointer to the gcov_info_var in a dedicated section.  */
+
+static void
+build_gcov_info_var_registration (tree gcov_info_type)
+{
+  tree var = build_decl (BUILTINS_LOCATION,
+VAR_DECL, NULL_TREE,
+build_pointer_type (gcov_info_type));
+  TREE_STATIC (var) = 1;
+  TREE_READONLY (var) = 1;
+  char name_buf[32];
+  ASM_GENERATE_INTERNAL_LABEL (name_buf, "LPBX", 2);
+  DECL_NAME (var) = get_identifier (name_buf);
+  get_section (profile_info_section, SECTION_UNNAMED, NULL);
+  set_decl_section_name (var, profile_info_section);
+  DECL_INITIAL (var) = build_fold_addr_expr (gcov_info_var);
+  varpool_node::finalize_decl (var);
+}
+
 /* Create the gcov_info types and object.  Generate the constructor
function to call __gcov_init.  Does not generate the initializer
for the object.  Returns TRUE if coverage data is being emitted.  */
@@ -1151,8 +1170,13 @@ coverage_obj_init (void)
   ASM_GENERATE_INTERNAL_LABEL (name_buf, "LPBX", 0);
   DECL_NAME (gcov_info_var) = get_identifier (name_buf);
 
-  build_init_ctor (gcov_info_type);
-  build_gcov_exit_decl ();
+  if (profile_info_section)
+build_gcov_info_var_registration (gcov_info_type);
+  else
+{
+  build_init_ctor (gcov_info_type);
+  build_gcov_exit_decl ();
+}
 
   return true;
 }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d01beb248e1..e78c3c23ad2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -564,6 +564,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-p  -pg  -fprofile-arcs  --coverage  -ftest-coverage @gol
 -fprofile-abs-path @gol
 -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} 
@gol
+-fprofile-info-section  -fprofile-info-section=@var{name} @gol
 -fprofile-note=@var{path} -fprofile-prefix-path=@var{path} @gol
 -fprofile-update=@var{method} -fprofile-filter-files=@var{regex} @gol
 -fprofile-exclude-files=@var{regex} @gol
@@ -14183,6 +14184,34 @@ the profile feedback data files. See 
@option{-fprofile-dir}.
 To optimize the program based on the collected profile information, use
 @option{-fprofile-use}.  @xref{Optimize Options}, for more information.
 
+@item -fprofile-info-section
+@itemx -fprofile-info-section=@var{name}
+@opindex fprofile-info-section
+
+Register the profile information in the specified section instead of using a
+constructor/destructor.  The section name is @var{name} if it is specified,
+otherwise the section name defaults to @code{.gcov_info}.  A pointer to the
+profile information generated by @option{-fprofile-arcs} or
+@option{-ftest-coverage} is placed in the specified s

Re: [2/3][vect] Add widening add, subtract vect patterns

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed all the comments.

> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.

Done.

> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.

Fixed.

> > gcc/ChangeLog:
> >
> > 2020-11-12  Joel Hutton  
> >
> > * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> 
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
> 
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence).  Same for the rest

Fixed.

> > * optabs-tree.c (optab_for_tree_code): optabs for widening 
> > adds,subtracts
> 
> The line limit for changelogs is 80 characters.  The entry should say
> what changed, so “Handle …” or “Add case for …” or something.

Fixed.

> > * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog 
> > ptatern
> 
> typo: pattern

Fixed.

> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
> 
> Would be good to expand on this for the final commit message.

'fix sad' was accidentally included when I squashed two commits. I've made all 
the commit messages more descriptive.

> > +
> > +case VEC_WIDEN_SUB_HI_EXPR:
> > +  return (TYPE_UNSIGNED (type)
> > +   ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > +
> > +
> 
> Nits: excess blank line at the end and excess space before the “:”s.

Fixed.

> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> 
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.

Fixed.

> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines.  Same for the sub test.

Fixed.

> I am missing documentation in md.texi for the new patterns.  In
> particular I wonder why you need signed and unsigned variants
> for the add/subtract patterns.

Fixed. Signed and unsigned variants because they correspond to signed and
unsigned instructions, (uaddl/uaddl2, saddl/saddl2).
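
A one-lane sketch of why the two variants cannot share a pattern: the same input bits widen to different values depending on whether they are sign-extended or zero-extended first. The function names are illustrative only and the lane width is fixed to 8 bits for brevity.

```cpp
#include <cstdint>

// Sign-extend both 8-bit inputs, then add (the signed flavour).
std::int16_t widen_add_signed(std::uint8_t a, std::uint8_t b) {
  const int sa = static_cast<std::int8_t>(a);
  const int sb = static_cast<std::int8_t>(b);
  return static_cast<std::int16_t>(sa + sb);
}

// Zero-extend both 8-bit inputs, then add (the unsigned flavour).
std::uint16_t widen_add_unsigned(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint16_t>(unsigned{a} + unsigned{b});
}
```

For the bit patterns 0xFF and 0x01 the signed version computes -1 + 1 and the unsigned version computes 255 + 1, so the widened results differ.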

> The new functions should have comments before them.  Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.

Fixed.

> > +case VEC_WIDEN_SUB_HI_EXPR:
> > +case VEC_WIDEN_SUB_LO_EXPR:
> > +case VEC_WIDEN_ADD_HI_EXPR:
> > +case VEC_WIDEN_ADD_LO_EXPR:
> > +  return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.

Fixed.

> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> >   of the above pattern.  */
> >
> >tree plus_oprnd0, plus_oprnd1;
> > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > -&plus_oprnd0, &plus_oprnd1))
> > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > +&plus_oprnd0, &plus_oprnd1)
> > + || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > +&plus_oprnd0, &plus_oprnd1)))
> >  return NULL;
> >
> > tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
>   /* Any non-truncating sequence of conversions is OK here, since
>  with a successful match, the result of the ABS(U) is known to fit
>  within the nonnegative range of the result type.  (It cannot be the
>  negative of the minimum signed value due to the range of the widening
>  MINUS_EXPR.)  */
>   vect_unpromoted_value unprom_abs;
>   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>   &unprom_abs);
>
> specific to the PLUS_EXPR case.  If we look through promotions on
> the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> and one on its inputs.

Fixed.


gcc/ChangeLog:

2020-11-13  Joel Hutton  

* expr.c (expand_expr_real_2): Add widen_add,widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
  adds, subtracts.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify

Re: [3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed all the comments.

> > +#include 
> > +
> 
> SVE targets will need a:
> 
> #pragma GCC target "+nosve"
> 
> here, since we'll generate different code for SVE.

Fixed.

> > +/* { dg-final { scan-assembler-times "shll\t" 1} } */
> > +/* { dg-final { scan-assembler-times "shll2\t" 1} } */
> 
> Very minor nit, sorry, but I think:
> 
> /* { dg-final { scan-assembler-times {\tshll\t} 1 } } */
> 
> would be better.  Using "…\t" works, but IIRC it shows up as a tab
> character in the testsuite result summary too.

Fixed. Minor nits welcome. :)


> OK for the aarch64 bits with the testsuite changes above.
ok?
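
For reference, a scalar model of the widening shift this patch targets: every narrow element is widened to twice its size and shifted left by the narrow element width. This is a sketch with a made-up function name, fixed to int16_t inputs; it processes all lanes in one array, whereas the lo/hi patterns each handle one half.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Widen each int16_t lane to int32_t and shift it left by 16 bits. The shift
// is done on unsigned values so negative inputs do not hit signed-shift
// corner cases.
template <std::size_t N>
std::array<std::int32_t, N>
widen_shift_left_16(const std::array<std::int16_t, N>& in) {
  std::array<std::int32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::uint32_t bits =
        static_cast<std::uint32_t>(static_cast<std::uint16_t>(in[i])) << 16;
    out[i] = static_cast<std::int32_t>(bits);
  }
  return out;
}
```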

gcc/ChangeLog:

2020-11-13  Joel Hutton  

* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo
  patterns.
* tree-vect-stmts.c
(vectorizable_conversion): Fix for widen_lshift case.

gcc/testsuite/ChangeLog:

2020-11-13  Joel Hutton  

* gcc.target/aarch64/vect-widen-lshift.c: New test.
From e8d3ed6fa739850eb649b97c250f1f2c650c34c1 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 12 Nov 2020 11:48:25 +
Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
mid-end. This pattern takes one vector with N elements of size S, shifts
each element left by the element width and stores the results as N
elements of size 2*S (in 2 result vectors). The aarch64 backend
implements this with the shll,shll2 instruction pair.
---
 gcc/config/aarch64/aarch64-simd.md| 66 +++
 .../gcc.target/aarch64/vect-widen-lshift.c| 62 +
 gcc/tree-vect-stmts.c |  5 +-
 3 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 30299610635..4ba799a27c9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4664,8 +4664,74 @@
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
+(define_expand "vec_widen_shiftl_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
+		 p, operands[2]));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_shiftl_hi_"
+   [(set (match_operand: 0 "register_operand")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "immediate_operand" "i")]
+			  VSHLL))]
+   "TARGET_SIMD"
+   {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
+		  p, operands[2]));
+DONE;
+   }
+)
+
 ;; vshll_n
 
+(define_insn "aarch64_shll_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %3";
+else
+  return "shll\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_shll2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %3";
+else
+  return "shll2\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
 (define_insn "aarch64_shll_n"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
new file mode 100644
index 000..48a3719d4ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include 
+#include 
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an shll,shll2 pair*/
+void sshll_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+__attribute__((optimize (0)))
+void sshll_nonopt (int32_t *foo, int16_t *a, i

Re: [Patch 0/X] HWASAN v4

2020-11-13 Thread Matthew Malcomson via Gcc-patches
Hi there,

Thanks for the heads-up.
As it turns out the most recent `libhwasan` crashes when displaying an address 
on the stack in Linux.

I'm currently working on getting it fixed here 
https://reviews.llvm.org/D91344#2393371 .
If this hwasan patch series gets approved and if that patch goes in would it be 
feasible to bump the libsanitizer merge to whatever version that would be?

If not (maybe because stage1 would be finished?) then could/would we end up 
using the LOCAL_PATCHES approach?

Thanks,
Matthew

From: Martin Liška 
Sent: 13 November 2020 16:33
To: Matthew Malcomson ; gcc-patches@gcc.gnu.org 

Cc: ja...@redhat.com ; Richard Earnshaw 
; k...@google.com ; do...@redhat.com 
; jos...@codesourcery.com 
Subject: Re: [Patch 0/X] HWASAN v4

On 10/16/20 11:03 AM, Martin Liška wrote:
> Hello.
>
> I've just merged libsanitizer and there's the corresponding part that includes
> libhwasan.
>
> Martin

Hey.

I've just made the last merge from upstream; there's a corresponding hwasan part.

Martin


Re: [Patch 0/X] HWASAN v4

2020-11-13 Thread Martin Liška

On 11/13/20 5:57 PM, Matthew Malcomson wrote:

Hi there,

Thanks for the heads-up.
As it turns out the most recent `libhwasan` crashes when displaying an address 
on the stack in Linux.


Hello.

What bad luck.



I'm currently working on getting it fixed here 
https://reviews.llvm.org/D91344#2393371 .
If this hwasan patch series gets approved and if that patch goes in would it be 
feasible to bump the libsanitizer merge to whatever version that would be?

If not (maybe because stage1 would be finished?) then could/would we end up 
using the LOCAL_PATCHES approach?


From now on, I would prefer doing cherry-picks. Hopefully we'll end up with just a 
couple of patches.

Thanks,
Martin



Thanks,
Matthew
--
*From:* Martin Liška 
*Sent:* 13 November 2020 16:33
*To:* Matthew Malcomson ; gcc-patches@gcc.gnu.org 

*Cc:* ja...@redhat.com ; Richard Earnshaw ; 
k...@google.com ; do...@redhat.com ; jos...@codesourcery.com 

*Subject:* Re: [Patch 0/X] HWASAN v4
On 10/16/20 11:03 AM, Martin Liška wrote:

Hello.

I've just merged libsanitizer and there's the corresponding part that includes
libhwasan.

Martin


Hey.

I've just made the last merge from upstream; there's a corresponding hwasan part.

Martin




Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 11:02 +, Jonathan Wakely wrote:

On 12/11/20 23:49 +, Jonathan Wakely wrote:

To poll a std::future to see if it's ready you have to call one of the
timed waiting functions. The most obvious way is wait_for(0s) but this
was previously very inefficient because it would turn the relative
timeout to an absolute one by calling system_clock::now(). When the
relative timeout is zero (or less) we're obviously going to get a time
that has already passed, but the overhead of obtaining the current time
can be dozens of microseconds. The alternative is to call wait_until
with an absolute timeout that is in the past. If you know the clock's
epoch is in the past you can use a default constructed time_point.
Alternatively, using some_clock::time_point::min() gives the earliest
time point supported by the clock, which should be safe to assume is in
the past. However, using a futex wait with an absolute timeout before
the UNIX epoch fails and sets errno=EINVAL. The new code using futex
waits with absolute timeouts was not checking for this case, which could
result in hangs (or killing the process if the library is built with
assertions enabled).

This patch checks for times before the epoch before attempting to wait
on a futex with an absolute timeout, which fixes the hangs or crashes.
It also makes it very fast to poll using an absolute timeout before the
epoch (because we skip the futex syscall).

It also makes future::wait_for avoid waiting at all when the relative
timeout is zero or less, to avoid the unnecessary overhead of getting
the current time. This makes polling with wait_for(0s) take only a few
cycles instead of dozens of microseconds.

libstdc++-v3/ChangeLog:

* include/std/future (future::wait_for): Do not wait for
durations less than or equal to zero.
* src/c++11/futex.cc (_M_futex_wait_until)
(_M_futex_wait_until_steady): Do not wait for timeouts before
the epoch.
* testsuite/30_threads/future/members/poll.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

I think the shortcut in future::wait_for is worth backporting. The
changes in src/c++11/futex.cc are not needed because the code using
absolute timeouts with futex waits is not present on any release
branch.
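
As a usage illustration of the polling idiom described above, a minimal helper might look like the following. The name is_ready is not a standard function, just a hypothetical wrapper; it only relies on std::future::wait_for with a zero duration.

```cpp
#include <chrono>
#include <future>

// Poll a future without blocking: a zero relative timeout only asks whether
// the shared state is ready right now. The future must be valid().
template <typename T>
bool is_ready(const std::future<T>& f) {
  return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

A caller would typically create a std::promise, hand its future to the poller, and see is_ready flip to true once set_value has been called. On some toolchains, using promises and futures needs -pthread at link time.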


I've committed this fix for the new test.


Backporting the change to gcc-10 revealed an overflow bug in the
existing code, resulting in blocking for years when given an absolute
timeout in the distant past. There's still a similar bug in the new
code (using futexes with absolute timeouts against clocks) where a
large chrono::seconds value can overflow and produce an incorrect
tv_sec value. Apart from the overflow itself being UB, the result in
that case is just a spurious wakeup (the call says it timed out when
it didn't reach the specified time). That should still be fixed, but
I'll do it separately.

Tested x86_64-linux. Committed to trunk.
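
To make the overflow concrete, here is a small model of the arithmetic, assuming a 32-bit time_t. It is not the code from the patch; both function names are hypothetical. The first function reproduces the wrap-around (done in unsigned arithmetic so the demonstration itself has no undefined behaviour), the second checks for "already timed out" before narrowing and clamps the result.

```cpp
#include <cstdint>
#include <limits>

// What a 32-bit subtraction effectively yields when it wraps around.
std::int32_t naive_wrapped_seconds(std::int64_t abs_s, std::int64_t now_s) {
  const std::uint32_t diff =
      static_cast<std::uint32_t>(abs_s) - static_cast<std::uint32_t>(now_s);
  return static_cast<std::int32_t>(diff);
}

// Seconds from now_s until abs_s, or -1 if abs_s is already in the past.
// The comparison happens on the wide values, before any narrowing, and the
// result is clamped to what a 32-bit time_t can hold.
std::int32_t relative_seconds(std::int64_t abs_s, std::int64_t now_s) {
  if (abs_s < now_s)
    return -1;
  // abs_s >= now_s, so the true difference fits in 64 unsigned bits.
  const std::uint64_t rel =
      static_cast<std::uint64_t>(abs_s) - static_cast<std::uint64_t>(now_s);
  const std::uint64_t max32 = std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(rel > max32 ? max32 : rel);
}
```

With a timeout of -2000000000 seconds and a current time of 1600000000, the wrapped result is a positive wait of roughly 22 years, while the checked version reports that the deadline has passed.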


commit e7e0eeeb6e6707be2a6c6da49d4b6be3199e2af8
Author: Jonathan Wakely 
Date:   Fri Nov 13 15:19:04 2020

libstdc++: Avoid 32-bit time_t overflows in futex calls

The existing code doesn't check whether the chrono::seconds value is out
of range of time_t. When using a timeout before the epoch (with a
negative value) subtracting the current time (as time_t) and then
assigning it to a time_t can overflow to a large positive value. This
means that we end up waiting several years even though the specified
timeout was in the distant past.

We do have a check for negative timeouts, but that happens after the
conversion to time_t so happens after the overflow.

The conversion to a relative timeout is done in two places, so this
factors it into a new function and adds the overflow checks there.

libstdc++-v3/ChangeLog:

* src/c++11/futex.cc (relative_timespec): New function to
create relative time from two absolute times.
(__atomic_futex_unsigned_base::_M_futex_wait_until)
(__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
Use relative_timespec.

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 57f7dfe87e9e..c2b2d32e8c43 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
@@ -46,20 +47,55 @@ const unsigned futex_clock_realtime_flag = 256;
 const unsigned futex_bitset_match_any = ~0;
 const unsigned futex_wake_op = 1;
 
-namespace
-{
-  std::atomic futex_clock_realtime_unavailable;
-  std::atomic futex_clock_monotonic_unavailable;
-}
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+namespace
+{
+  std::atomic futex_clock_realtime_unavailable;
+  std::atomic futex_clock_monotonic_unavailable;
+
+  // Return the relative duration from (now_s + now_ns) to (abs_s + abs_ns)
+  // as a timesp

[PATCH] ipa-cp: One more safe_add (PR 97816)

2020-11-13 Thread Martin Jambor
Hi,

The new behavior of safe_add triggered an ICE because one place still
used a simple addition where safe_add was needed.  I'll fix it with the
following obvious patch so that periodic benchmarkers can continue
working, because a proper fix (see below) will need a review.

The testcase showed me, however, that we can propagate time and cost
from one lattice to another more than once even when that was not the
intent.  I'll address that as a follow-up after I verify it does not
affect the IPA-CP heuristics too much or change the corresponding
params accordingly.

Bootstrapped and tested on x86_64-linux.

Thanks,

Martin


gcc/ChangeLog:

2020-11-13  Martin Jambor  

PR ipa/97816
* ipa-cp.c (value_topo_info::propagate_effects): Use
safe_add instead of a simple addition.
---
 gcc/ipa-cp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index f29f2164f4e..c3ee71e16e1 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3873,7 +3873,8 @@ value_topo_info::propagate_effects ()
   for (val = base; val; val = val->scc_next)
{
  time = time + val->local_time_benefit + val->prop_time_benefit;
- size = safe_add (size, val->local_size_cost + val->prop_size_cost);
+ size = safe_add (size, safe_add (val->local_size_cost,
+  val->prop_size_cost));
}
 
   for (val = base; val; val = val->scc_next)
-- 
2.28.0
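
To illustrate why the inner addition matters, here is a minimal sketch of
an overflow-checked addition.  Note that saturating_add is a hypothetical
stand-in written for this example, not GCC's safe_add; clamping to
INT_MAX/INT_MIN is an assumption and the real helper in ipa-cp.c may behave
differently.  The point is only that guarding the outer sum does not help if
an argument was itself computed with an unchecked "+".

```cpp
#include <climits>

// Hypothetical stand-in for an overflow-aware addition (NOT GCC's safe_add).
// Returns a + b, clamped to [INT_MIN, INT_MAX] instead of overflowing.
inline int
saturating_add (int a, int b)
{
  if (a > 0 && b > INT_MAX - a)
    return INT_MAX;   // would overflow upwards
  if (a < 0 && b < INT_MIN - a)
    return INT_MIN;   // would overflow downwards
  return a + b;
}

// Mirrors the shape of the fixed statement: every addition is checked,
// including the one combining the local and propagated costs.
inline int
accumulate_size (int size, int local_size_cost, int prop_size_cost)
{
  return saturating_add (size,
                         saturating_add (local_size_cost, prop_size_cost));
}
```

With the pre-patch shape, local_size_cost + prop_size_cost would be evaluated
with plain integer arithmetic before the checked helper ever sees it.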



Re: [PATCH][c++] Do not warn about unused macros while processing #pragma GCC optimize

2020-11-13 Thread Jeff Law via Gcc-patches


On 8/5/19 7:53 PM, Piotr H. Dabrowski wrote:
> Fixes c++/91318.
>
> libcpp/ChangeLog:
>
> 2019-08-06  Piotr Henryk Dabrowski  
>
>   PR c++/91318
>   * include/cpplib.h: Added cpp_define_unused(), 
> cpp_define_formatted_unused()
>   * directives.c: Likewise.
>
> gcc/c-family/ChangeLog:
>
> 2019-08-06  Piotr Henryk Dabrowski  
>
>   PR c++/91318
>   * c-cppbuiltin.c: c_cpp_builtins_optimize_pragma(): use 
> cpp_define_unused()

Thanks.  I've bootstrapped and regression tested this patch and pushed
it to the trunk.  Sorry about the long delay.

jeff



[PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch predefines __STDCPP_THREADS__ macro to 1 if c++11 or
later and thread model (e.g. printed by gcc -v) is not single.
There are two targets not handled by this patch, those that define
THREAD_MODEL_SPEC.  In one case - QNX - it looks just like a mistake
to me, instead of setting thread_model=posix in config.gcc it uses
THREAD_MODEL_SPEC macro to set it unconditionally to posix.
The other is hpux10, which uses -threads option to decide if threads
are enabled or not, but that option isn't really passed to the compiler.
I think that is something that really should be solved in config/pa/
instead, e.g. in the config/xxx/xxx-c.c targets usually set their own
predefined macros and it could handle this, and either pass the option
also to the compiler, or say predefine __STDCPP_THREADS__ if _DCE_THREADS
macro is defined already (or -D_DCE_THREADS found on the command line),
or whatever else.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 

2020-11-13  Jakub Jelinek  

* c-cppbuiltin.c: Include configargs.h.
(c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
"single".

--- gcc/c-family/c-cppbuiltin.c.jj  2020-10-22 10:37:25.890800306 +0200
+++ gcc/c-family/c-cppbuiltin.c 2020-11-13 12:34:25.290963074 +0100
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
 #include "debug.h" /* For dwarf2out_do_cfi_asm.  */
 #include "common/common-target.h"
 #include "cppbuiltin.h"
+#include "configargs.h"
 
 #ifndef TARGET_OS_CPP_BUILTINS
 # define TARGET_OS_CPP_BUILTINS()
@@ -1033,6 +1034,12 @@ c_cpp_builtins (cpp_reader *pfile)
cpp_define (pfile, "__cpp_threadsafe_static_init=200806L");
   if (flag_char8_t)
 cpp_define (pfile, "__cpp_char8_t=201811L");
+#ifndef THREAD_MODEL_SPEC
+  /* Targets that define THREAD_MODEL_SPEC need to define
+__STDCPP_THREADS__ in their config/XXX/XXX-c.c themselves.  */
+  if (cxx_dialect >= cxx11 && strcmp (thread_model, "single") != 0)
+   cpp_define (pfile, "__STDCPP_THREADS__=1");
+#endif
 }
   /* Note that we define this for C as well, so that we know if
  __attribute__((cleanup)) will interface with EH.  */

Jakub
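
For illustration, user code can test the macro with ordinary preprocessor
conditionals.  The helper below is a hypothetical example (its name is an
assumption, it is not part of the patch); it only reports whether the
implementation advertises the possibility of more than one thread of
execution.

```cpp
// Hypothetical helper: true if the implementation predefines
// __STDCPP_THREADS__ to 1, false if the macro is absent.
inline bool
threads_advertised ()
{
#ifdef __STDCPP_THREADS__
  return __STDCPP_THREADS__ == 1;
#else
  return false;
#endif
}
```

With a compiler built as described in the patch, one would expect this to
return true for -std=c++11 and later unless the thread model is "single".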



Re: [PATCH] ipa-cp: One more safe_add (PR 97816)

2020-11-13 Thread Jan Hubicka
> Hi,
> 
> The new behavior of safe_add triggered an ICE because of one use where
> it had not been used instead of a simple addition.  I'll fix it with the
> following obvious patch so that periodic benchmarkers can continue
> working because a proper fix (see below) will need a review.
> 
> The testcase showed me, however, that we can propagate time and cost
> from one lattice to another more than once even when that was not the
> intent.  I'll address that as a follow-up after I verify it does not
> affect the IPA-CP heuristics too much or change the corresponding
> params accordingly.
> 
> Bootstrapped and tested on x86_64-linux.
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2020-11-13  Martin Jambor  
> 
>   PR ipa/97816
>   * ipa-cp.c (value_topo_info::propagate_effects): Use
>   safe_add instead of a simple addition.
Seems OK. I wonder if we can't just stop propagation earlier? Does it
really make sense to propagate such large values? Don't we have some
cap on how much size we permit?
thanks!
Honza
> ---
>  gcc/ipa-cp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index f29f2164f4e..c3ee71e16e1 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3873,7 +3873,8 @@ value_topo_info::propagate_effects ()
>for (val = base; val; val = val->scc_next)
>   {
> time = time + val->local_time_benefit + val->prop_time_benefit;
> -   size = safe_add (size, val->local_size_cost + val->prop_size_cost);
> +   size = safe_add (size, safe_add (val->local_size_cost,
> +val->prop_size_cost));
>   }
>  
>for (val = base; val; val = val->scc_next)
> -- 
> 2.28.0
> 


Re: [PATCH PR93334][GCC11]Refine data dependence of two refs storing the same constant with the same bytes

2020-11-13 Thread Jeff Law via Gcc-patches


On 1/29/20 6:52 AM, bin.cheng wrote:
> Hi,
>
> As discussed in the PR, this simple patch refines data dependence of two write
> references storing the same constant with the same bytes.  It simply detects
> the case with some restrictions and treats it as no dependence.  For now the
> added interface in tree-data-ref.c is only used in loop distribution, which 
> might
> be generalized in the future.
>
> Bootstrap and test on x86_64.  Any comments?
>
> Thanks,
> bin
>
> 2020-01-29  Bin Cheng  
>
> * tree-data-ref.c (refine_affine_dependence): New function.
> * tree-data-ref.h (refine_affine_dependence): New declaration.
> * tree.h (const_with_all_bytes_same): External declaration.
> * tree.c (const_with_all_bytes_same): Moved from...
> * tree-loop-distribution.c (const_with_all_bytes_same): ...here.  Call
> refine_affine_dependence
>
> gcc/testsuite
> 2020-01-29  Bin Cheng  
>
> * gcc/testsuite/gcc.dg/tree-ssa/pr93334.c: New test.

This looks reasonable to me.  I'm not sure how often it happens in
practice though.  OK after a fresh bootstrap and regression test.


Jeff




Re: [PATCH v2] c: Silently ignore pragma region [PR85487]

2020-11-13 Thread Austin Morton via Gcc-patches
> But in that case the pragma shouldn't be ignored, but instead checked
> against the various requirements.  E.g. that region is followed by a single
> optional name (are there any requirements on what name can be, can it be
> just a single token, can it be C/C++ keyword, etc.), guess ignore all the
> tokens after endregion if it is all a comment, and verify nesting
> (all region/endregion are properly paired up).

I cannot speak to the internals of the MSVC compiler, but I can say
definitively that clang does not do any of this.  The clang handling of
#pragma region is a no-op, as can be seen here:
https://github.com/llvm/llvm-project/blob/48b510c4bc0fe090e635ee0440e46fc176527d7e/clang/lib/Lex/Pragma.cpp#L1847

Additionally, the MSVC implementation appears to do no validation (or
at least doesn't emit warnings that indicate that it does).
https://godbolt.org/z/3zTbnn

#pragma region is purely designed for editors to assist in code-folding
(primarily Visual Studio and Visual Studio Code, although I am sure
others support it).

The goal of this patch is to make GCC compatible with both the clang and
MSVC handling of this pragma, not to introduce novel functionality.
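
For reference, this is the kind of code the pragma typically appears in.
The sketch below is a made-up example (the function name and region label
are assumptions); the pragmas have no effect on the generated code and only
mark a foldable block for editors.

```cpp
#include <vector>

#pragma region Statistics helpers

// Hypothetical helper: arithmetic mean of XS, or 0.0 for an empty vector.
inline double
mean_of (const std::vector<double> &xs)
{
  if (xs.empty ())
    return 0.0;
  double sum = 0.0;
  for (double x : xs)
    sum += x;
  return sum / static_cast<double> (xs.size ());
}

#pragma endregion
```

A compiler that does not recognise the pragma may warn about an unknown
pragma when such warnings are enabled, which is the noise the patch is
meant to avoid.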


[PATCH] testsuite: guality/redeclaration1.C test workaround

2020-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

Apparently older GDB versions didn't handle this test right and so while
it has been properly printing 42 on line 14 (e.g. on x86_64), it issued
a weird error on line 17 (and because it didn't print any value, guality
testsuite wasn't marking it as FAIL).
That has been apparently fixed in GDB 10, where it now (on x86_64) prints
properly.
Unfortunately that revealed that the test can suffer from instruction
scheduling, where e.g. on i686 (but also on various other arches) the very
first insn of the function (or whatever b 14 is on) happens to be a load of
the S::i variable from memory and that insn has the inner lexical scope, so
GDB 10 prints 24 there instead of 42.  The following insn is then
the first store to l, where the automatic i is in scope and prints as 42,
and then comes the second store to l, where the inner lexical scope is
current and i prints as 24 again.
The test wasn't meant to be about insn scheduling but about whether we emit
the DIEs properly, so this hack attempts to prevent the undesirable scheduling.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-11-13  Jakub Jelinek  

* g++.dg/guality/redeclaration1.C (p): New variable.
(S::f): Increment what p points to before storing S::i into l.  Adjust
gdb-test line numbers.
(main): Initialize p to address of an automatic variable.

--- gcc/testsuite/g++.dg/guality/redeclaration1.C.jj	2020-01-12 11:54:37.182401807 +0100
+++ gcc/testsuite/g++.dg/guality/redeclaration1.C	2020-11-13 13:40:40.684122957 +0100
@@ -3,6 +3,7 @@
 // { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
 
 volatile int l;
+int *volatile p;
 
 namespace S
 {
@@ -11,10 +12,11 @@ namespace S
   f()
   {
 int i = 42;
-l = i; // { dg-final { gdb-test 14 "i" "42" } }
+l = i; // { dg-final { gdb-test 15 "i" "42" } }
 {
   extern int i;
-  l = i;   // { dg-final { gdb-test 17 "i" "24" } }
+  p[0]++;
+  l = i;   // { dg-final { gdb-test 19 "i" "24" } }
 }
   }
 }
@@ -22,6 +24,8 @@ namespace S
 int
 main (void)
 {
+  int x = 0;
+  p = &x;
   S::f ();
   return 0;
 }

Jakub
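
As a side note for readers unfamiliar with the construct being tested, the
following sketch shows the name lookup behaviour the debugger is expected to
reproduce.  The observe function is a hypothetical helper written for this
illustration and is not part of the testsuite; the values 42 and 24 match
the ones the gdb-test directives check.

```cpp
namespace S
{
  int i = 24;  // namespace-scope variable, like S::i in the test

  // Hypothetical helper: records what the name `i` denotes in the
  // outer block and in the inner block.
  inline void
  observe (int *outer_value, int *inner_value)
  {
    int i = 42;          // automatic variable, hides S::i
    *outer_value = i;    // reads the automatic i
    {
      extern int i;      // block-scope redeclaration, refers to S::i
      *inner_value = i;  // reads S::i
    }
  }
}
```

The debug info has to describe both scopes correctly for a debugger to print
the right `i` at each statement, which is what the guality test verifies.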



Improve handling of memory operands in ipa-icf 3/4

2020-11-13 Thread Jan Hubicka
Hi,
this patch is based on Martin's patch
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02633.html
but builds on the new code that tracks and compares memory accesses,
so it can be implemented correctly.

As shown here
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558773.html
the most common reason for a function body being streamed in but merging
then failing is a mismatch in the base alias set.

This patch collects the base and ref ao_alias_ptr types, streams them to WPA,
and at WPA time a hash is produced. Now we can use alias_sets since these
are assumed to be the same as ltrans time alias sets. This is currently not
always true - but that is a pre-existing issue.  I will try to produce a
testcase and make a followup patch for this (that will stream out ODR types
with TYPE_CANONICAL that is !ODR as !ODR type). However for this patch this is
not a problem since the real alias sets are finer but definitely not coarser.

We may make it possible to use the canonical type hash and save some streaming,
but I think it would be better to wait for next stage1 since it is not
completely trivial WRT ODR types: either we hash ODR type names and then hash
values would be too coarse for cases where we get a conflict between C and C++
types, or we do not stream them and will again get into trouble with hash
values being too weak. I tried that - we get a lot of types that are
structurally the same but distinguished by ODR names (from template
instantiations).

As followup I will add code for merging with mismatched base alias sets.  This
makes the aforementioned problem about ODR names less pronounced but it is
still present on pointer loads/stores which requires REF alias set mismatches.

Time and memory usage when building cc1plus go from:

Time variable                 usr           sys          wall           GGC
 ipa lto gimple in  :   2.67 (  2%)   2.08 ( 21%)   4.63 (  3%)   231M ( 12%)
 ipa lto decl in    :   6.45 (  5%)   0.53 (  5%)   7.19 (  5%)   461M ( 25%)
 ipa icf            :  13.65 ( 10%)   3.73 ( 37%)  17.41 ( 12%)   164M (  9%)
 TOTAL              : 137.91          9.97        148.48         1868M

to:

Time variable                 usr           sys          wall           GGC
 ipa lto gimple in  :   1.59 (  1%)   0.95 ( 19%)   2.82 (  2%)   137M (  9%)
 ipa lto decl in    :   6.58 (  5%)   0.51 ( 10%)   7.30 (  6%)   459M ( 29%)
 ipa icf            :   4.32 (  4%)   0.24 (  5%)   4.69 (  4%)    15M (  1%)
 TOTAL              : 122.76          5.12        128.49         1604M

Timing is not 100% reliable since the machine was not quiet, but we get a 16%
memory allocation improvement and almost 50% improvement in gimple streaming
in.  6200 functions are identified in both builds.

The stats listed by Martin change from:

Init called for 14675 items (23.61%).
Totally needed symbols: 8008, fraction of loaded symbols: 54.57%

to:

Init called for 8753 items (14.08%).
Totally needed symbols: 8008, fraction of loaded symbols: 91.49%

Memory use will again get worse with base alias set merging, but at least it
will pay back by actual code size reductions.

The reasons for mismatches now look as follows:

  1   false returned: 'case high values are different' in compare_gimple_switch at ../../gcc/ipa-icf-gimple.c:789
  1   false returned: 'Declaration mismatch' in equals at ../../gcc/ipa-icf.c:1799
  1   false returned: 'DECL_CXX_DESTRUCTOR mismatch' in equals_wpa at ../../gcc/ipa-icf.c:565
  1   false returned: '' in compare_gimple_call at ../../gcc/ipa-icf-gimple.c:607
  2   fprintf (_1, "  false returned: \'\' in %s at %s:%u\n", func_4(D), filename_5(D), line_6(D));
  2   fprintf (_1, "  false returned: \'%s\' in %s at %s:%u\n", message_6(D), func_7(D), filename_8(D), line_9(D));
  4   false returned: 'final flag mismatch' in compare_referenced_symbol_properties at ../../gcc/ipa-icf.c:401
  5   false returned: 'different decl attributes' in equals_wpa at ../../gcc/ipa-icf.c:662
  7   false returned: 'compare_ao_refs failed (access path difference)' in compare_operand at ../../gcc/ipa-icf-gimple.c:346
 10   false returned: 'case low values are different' in compare_gimple_switch at ../../gcc/ipa-icf-gimple.c:783
 10   false returned: 'different references' in compare_symbol_references at ../../gcc/ipa-icf.c:465
 11   false returned: 'size mismatch' in equals_wpa at ../../gcc/ipa-icf.c:1648
 18   false returned: 'variables types are different' in equals at ../../gcc/ipa-icf.c:1694
 20   false returned: 'METHOD_TYPE and FUNCTION_TYPE mismatch' in equals_wpa at ../../gcc/ipa-icf.c:673
 27   false returned: 'GIMPLE LHS type mismatch' in compare_gimple_assign at ../../gcc/ipa-icf-gimple.c:695
 35   false returned: 'INTEGER_CST precision mismatch' in equals at ../../gcc/ipa-icf.c:1803
 40   false returned: 'GIMPLE call operands are dif

Re: [PATCH v2] c: Silently ignore pragma region [PR85487]

2020-11-13 Thread Austin Morton via Gcc-patches
> How much does this pragma get used "in the wild"?

A quick search on github for "#pragma region" comes back with 170k C++ results
and searching for "#pragma" comes back with 38M C++ results

Possibly not the best metric, but we can "conclude" that roughly 0.45% of
open source C++ code files on github that use any pragma make use of
#pragma region.

https://github.com/search?l=C%2B%2B&q=%22%23pragma+region%22&type=Code
https://github.com/search?l=C%2B%2B&q=%22%23pragma%22&type=Code


[PATCH] dwarf2: Emit DW_TAG_unspecified_parameters even in late DWARF [PR97599]

2020-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

Aldy's PR71855 fix avoided emitting multiple redundant
DW_TAG_unspecified_parameters sub-DIEs of a single DIE by restricting
it to early dwarf only.  That unfortunately means if we need to emit
another DIE for the function (whether it is for LTO, or e.g. because of
IPA cloning), we don't emit DW_TAG_unspecified_parameters, it remains
solely in the DW_AT_abstract_origin's referenced DIE.
But DWARF consumers don't really use DW_TAG_unspecified_parameters
from there, like we duplicate DW_TAG_formal_parameter sub-DIEs even in the
clones because either they have some more specific location, or e.g.
a function clone could have fewer or different argument types etc.,
they need to assume that originally stdarg function isn't later stdarg etc.
Unfortunately, while for DW_TAG_formal_parameter sub-DIEs we can use the
hash tables to look up the PARM_DECLs to see if we already have the DIEs, for
DW_TAG_unspecified_parameters we don't have an easy way to look it up.

The following patch handles it by trying to figure out if we are creating a
fresh new DIE (in that case we add DW_TAG_unspecified_parameters if it is
stdarg), or if gen_subprogram_die is called again on a pre-existing DIE
to fill in some further details (then it will not touch it).

Except for lto, subr_die != old_die would be good enough, but unfortunately
for LTO the new DIE that will refer to early dwarf created DIE is created
on the fly during lookup_decl_die.  So the patch tracks if the DIE has
no children before any children are added to it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-11-13  Jakub Jelinek  

PR debug/97599
* dwarf2out.c (gen_subprogram_die): Call
gen_unspecified_parameters_die even if not early dwarf, but only
if subr_die is a newly created DIE.

--- gcc/dwarf2out.c.jj  2020-10-27 18:38:00.001979404 +0100
+++ gcc/dwarf2out.c 2020-10-28 10:52:29.618796758 +0100
@@ -22756,6 +22756,7 @@ gen_subprogram_die (tree decl, dw_die_re
   tree origin = decl_ultimate_origin (decl);
   dw_die_ref subr_die;
   dw_die_ref old_die = lookup_decl_die (decl);
+  bool old_die_had_no_children = false;
 
   /* This function gets called multiple times for different stages of
  the debug process.  For example, for func() in this code:
@@ -22846,6 +22847,9 @@ gen_subprogram_die (tree decl, dw_die_re
   if (old_die && declaration)
 return;
 
+  if (in_lto_p && old_die && old_die->die_child == NULL)
+old_die_had_no_children = true;
+
   /* Now that the C++ front end lazily declares artificial member fns, we
  might need to retrofit the declaration into its class.  */
   if (!declaration && !origin && !old_die
@@ -23365,6 +23369,10 @@ gen_subprogram_die (tree decl, dw_die_re
  else if (DECL_INITIAL (decl) == NULL_TREE)
gen_unspecified_parameters_die (decl, subr_die);
}
+  else if ((subr_die != old_die || old_die_had_no_children)
+  && prototype_p (TREE_TYPE (decl))
+  && stdarg_p (TREE_TYPE (decl)))
+   gen_unspecified_parameters_die (decl, subr_die);
 }
 
   if (subr_die != old_die)

Jakub
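
For context, the new condition checks prototype_p and stdarg_p, i.e. it
applies to prototyped functions that end in an ellipsis.  A minimal example
of such a function is sketched below; sum_ints is a hypothetical name chosen
for this illustration, and whether a given compiler actually emits an
additional (e.g. LTO or IPA clone) DIE for it is an assumption that depends
on options and callers.

```cpp
#include <cstdarg>

// Hypothetical stdarg function: sums COUNT int arguments passed after COUNT.
// Its debug info would be expected to carry a DW_TAG_formal_parameter for
// `count` and a DW_TAG_unspecified_parameters for the ellipsis.
inline int
sum_ints (int count, ...)
{
  va_list ap;
  va_start (ap, count);
  int total = 0;
  for (int k = 0; k < count; ++k)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```

Inspecting the output of a DWARF dumper on an object built with and without
LTO would be one way to see whether the unspecified-parameters DIE is present
in each subprogram DIE.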



[committed] openmp: Support allocate for C/C++ array section reductions

2020-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

This adds allocate clause support for array section reductions.
Furthermore, it fixes one bug that would cause inscan reductions with
allocate to be rejected by C, and for now just ignores allocate for
inscan/task reductions, that will need slightly more work.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-11-13  Jakub Jelinek  

gcc/
* omp-low.c (scan_sharing_clauses): For now remove for reduction
clauses with inscan or task modifiers decl from allocate_map.
(lower_private_allocate): Handle TYPE_P (new_var).
(lower_rec_input_clauses): Handle allocate clause for C/C++ array
reductions.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Don't clear
OMP_CLAUSE_REDUCTION_INSCAN unless reduction_seen == -2.
libgomp/
* testsuite/libgomp.c-c++-common/allocate-1.c (foo): Add tests
for array reductions.
(main): Adjust foo callers.

--- gcc/omp-low.c.jj2020-11-12 21:37:53.909422916 +0100
+++ gcc/omp-low.c   2020-11-13 15:55:09.479302108 +0100
@@ -1197,6 +1197,14 @@ scan_sharing_clauses (tree clauses, omp_
  if (is_oacc_parallel_or_serial (ctx) || is_oacc_kernels (ctx))
ctx->local_reduction_clauses
  = tree_cons (NULL, c, ctx->local_reduction_clauses);
+ if ((OMP_CLAUSE_REDUCTION_INSCAN (c)
+  || OMP_CLAUSE_REDUCTION_TASK (c)) && ctx->allocate_map)
+   {
+ tree decl = OMP_CLAUSE_DECL (c);
+ /* For now.  */
+ if (ctx->allocate_map->get (decl))
+   ctx->allocate_map->remove (decl);
+   }
  /* FALLTHRU */
 
case OMP_CLAUSE_IN_REDUCTION:
@@ -4392,13 +4400,17 @@ lower_private_allocate (tree var, tree n
   if (allocator)
 return false;
   gcc_assert (allocate_ptr == NULL_TREE);
-  if (ctx->allocate_map && DECL_P (new_var))
+  if (ctx->allocate_map
+  && (DECL_P (new_var) || (TYPE_P (new_var) && size)))
 if (tree *allocatorp = ctx->allocate_map->get (var))
   allocator = *allocatorp;
   if (allocator == NULL_TREE)
 return false;
   if (!is_ref && omp_is_reference (var))
-return false;
+{
+  allocator = NULL_TREE;
+  return false;
+}
 
   if (TREE_CODE (allocator) != INTEGER_CST)
 allocator = build_outer_var_ref (allocator, ctx);
@@ -4410,19 +4422,24 @@ lower_private_allocate (tree var, tree n
   allocator = var;
 }
 
-  tree ptr_type, align, sz;
-  if (is_ref)
+  tree ptr_type, align, sz = size;
+  if (TYPE_P (new_var))
+{
+  ptr_type = build_pointer_type (new_var);
+  align = build_int_cst (size_type_node, TYPE_ALIGN_UNIT (new_var));
+}
+  else if (is_ref)
 {
   ptr_type = build_pointer_type (TREE_TYPE (TREE_TYPE (new_var)));
   align = build_int_cst (size_type_node,
 TYPE_ALIGN_UNIT (TREE_TYPE (ptr_type)));
-  sz = size;
 }
   else
 {
   ptr_type = build_pointer_type (TREE_TYPE (new_var));
   align = build_int_cst (size_type_node, DECL_ALIGN_UNIT (new_var));
-  sz = fold_convert (size_type_node, DECL_SIZE_UNIT (new_var));
+  if (sz == NULL_TREE)
+   sz = fold_convert (size_type_node, DECL_SIZE_UNIT (new_var));
 }
   if (TREE_CODE (sz) != INTEGER_CST)
 {
@@ -4855,7 +4872,23 @@ lower_rec_input_clauses (tree clauses, g
  tree type = TREE_TYPE (d);
  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
  tree v = TYPE_MAX_VALUE (TYPE_DOMAIN (type));
+ tree sz = v;
  const char *name = get_name (orig_var);
+ if (pass != 3 && !TREE_CONSTANT (v))
+   {
+ tree t = maybe_lookup_decl (v, ctx);
+ if (t)
+   v = t;
+ else
+   v = maybe_lookup_decl_in_outer_ctx (v, ctx);
+ gimplify_expr (&v, ilist, NULL, is_gimple_val, fb_rvalue);
+ t = fold_build2_loc (clause_loc, PLUS_EXPR,
+  TREE_TYPE (v), v,
+  build_int_cst (TREE_TYPE (v), 1));
+ sz = fold_build2_loc (clause_loc, MULT_EXPR,
+   TREE_TYPE (v), t,
+   TYPE_SIZE_UNIT (TREE_TYPE (type)));
+   }
  if (pass == 3)
{
  tree xv = create_tmp_var (ptr_type_node);
@@ -4913,6 +4946,13 @@ lower_rec_input_clauses (tree clauses, g
  gimplify_assign (cond, x, ilist);
  x = xv;
}
+ else if (lower_private_allocate (var, type, allocator,
+  allocate_ptr, ilist, ctx,
+  true,
+  TREE_CONSTANT (v)
+  ? TYPE_SIZE_UNIT (type)
+  : sz))
+

[COMMITTED] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches
The original patch has been modified according to the reviewers' comments
and the following patch has been committed.



commit e3b3b59683c1e7d31a9d313dd97394abebf644be
Author: Vladimir N. Makarov 
Date:   Fri Nov 13 12:45:59 2020 -0500

[PATCH] Implementation of asm goto outputs

gcc/
* cfgexpand.c (expand_asm_stmt): Output asm goto with outputs too.
Place insns after asm goto on edges.
* doc/extend.texi: Reflect the changes in asm goto documentation.
* gimple.c (gimple_build_asm_1): Remove an assert checking output
absence for asm goto.
* gimple.h (gimple_asm_label_op, gimple_asm_set_label_op): Take
possible asm goto outputs into account.
* ira.c (ira): Remove critical edges for potential asm goto output
reloads.
(ira_nullify_asm_goto): New function.
* ira.h (ira_nullify_asm_goto): New prototype.
* lra-assigns.c (lra_split_hard_reg_for): Use ira_nullify_asm_goto.
Check that splitting is done inside a basic block.
* lra-constraints.c (curr_insn_transform): Permit output reloads
for any jump insn.
* lra-spills.c (lra_final_code_change): Remove USEs added in ira
for asm gotos.
* lra.c (lra_process_new_insns): Place output reload insns after
jumps in the beginning of destination BBs.
* reload.c (find_reloads): Report error for asm gotos with
outputs.  Modify them to keep CFG consistency to avoid crashes.
* tree-into-ssa.c (rewrite_stmt): Don't put debug stmt after asm
goto.

gcc/c/
* c-parser.c (c_parser_asm_statement): Parse outputs for asm
goto too.
* c-typeck.c (build_asm_expr): Remove an assert checking output
absence for asm goto.

gcc/cp
* parser.c (cp_parser_asm_definition): Parse outputs for asm
goto too.

gcc/testsuite/
* c-c++-common/asmgoto-2.c: Permit output in asm goto.
* gcc.c-torture/compile/asmgoto-2.c: New.
* gcc.c-torture/compile/asmgoto-3.c: New.
* gcc.c-torture/compile/asmgoto-4.c: New.
* gcc.c-torture/compile/asmgoto-5.c: New.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index f4c4cf7bf8f..7540a15d65d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7144,10 +7144,7 @@ c_parser_asm_statement (c_parser *parser)
 	switch (section)
 	  {
 	  case 0:
-	/* For asm goto, we don't allow output operands, but reserve
-	   the slot for a future extension that does allow them.  */
-	if (!is_goto)
-	  outputs = c_parser_asm_operands (parser);
+	outputs = c_parser_asm_operands (parser);
 	break;
 	  case 1:
 	inputs = c_parser_asm_operands (parser);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 26a5f7128d2..413109c916c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -10666,10 +10666,6 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
   TREE_VALUE (tail) = input;
 }
 
-  /* ASMs with labels cannot have outputs.  This should have been
- enforced by the parser.  */
-  gcc_assert (outputs == NULL || labels == NULL);
-
   args = build_stmt (loc, ASM_EXPR, string, outputs, inputs, clobbers, labels);
 
   /* asm statements without outputs, including simple ones, are treated
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1b7bdbc15be..1df6f4bc55a 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3371,20 +3371,21 @@ expand_asm_stmt (gasm *stmt)
 			   ARGVEC CONSTRAINTS OPNAMES))
  If there is more than one, put them inside a PARALLEL.  */
 
-  if (nlabels > 0 && nclobbers == 0)
-{
-  gcc_assert (noutputs == 0);
-  emit_jump_insn (body);
-}
-  else if (noutputs == 0 && nclobbers == 0)
+  if (noutputs == 0 && nclobbers == 0)
 {
   /* No output operands: put in a raw ASM_OPERANDS rtx.  */
-  emit_insn (body);
+  if (nlabels > 0)
+	emit_jump_insn (body);
+  else
+	emit_insn (body);
 }
   else if (noutputs == 1 && nclobbers == 0)
 {
   ASM_OPERANDS_OUTPUT_CONSTRAINT (body) = constraints[0];
-  emit_insn (gen_rtx_SET (output_rvec[0], body));
+  if (nlabels > 0)
+	emit_jump_insn (gen_rtx_SET (output_rvec[0], body));
+  else 
+	emit_insn (gen_rtx_SET (output_rvec[0], body));
 }
   else
 {
@@ -3461,7 +3462,27 @@ expand_asm_stmt (gasm *stmt)
   if (after_md_seq)
 emit_insn (after_md_seq);
   if (after_rtl_seq)
-emit_insn (after_rtl_seq);
+{
+  if (nlabels == 0)
+	emit_insn (after_rtl_seq);
+  else
+	{
+	  edge e;
+	  edge_iterator ei;
+	  
+	  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+	{
+	  start_sequence ();
+	  for (rtx_insn *curr = after_rtl_seq;
+		   curr != NULL_RTX;
+		   curr = NEXT_INSN (curr))
+		emit_insn (copy_insn (PATTERN (curr)));
+	  rtx_insn *cop

Re: [PATCH] testsuite: guality/redeclaration1.C test workaround

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/13/20 10:37 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> Apparently older GDB versions didn't handle this test right and so while
> it has been properly printing 42 on line 14 (e.g. on x86_64), it issued
> a weird error on line 17 (and because it didn't print any value, guality
> testsuite wasn't marking it as FAIL).
> That has been apparently fixed in GDB 10, where it now (on x86_64) prints
> properly.
> Unfortunately that revealed that the test can suffer from instruction
> scheduling, where e.g. on i686 (but various other arches) the very first
> insn of the function (or whatever b 14 is on) happens to be load of the
> S::i variable from memory and that insn has the inner lexical scope, so
> GDB 10 prints there 24 instead of 42.  The following insn is then
> the first store to l and there the automatic i is in scope and prints as 42
> and then the second store to l where the inner lexical scope is current
> and prints 24 again.
> The test wasn't meant about insn scheduling but about whether we emit the
> DIEs properly, so this hack attempts to prevent the undesirable scheduling.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-11-13  Jakub Jelinek  
>
>   * g++.dg/guality/redeclaration1.C (p): New variable.
>   (S::f): Increment what p points to before storing S::i into l.  Adjust
>   gdb-test line numbers.
>   (main): Initialize p to address of an automatic variable.

OK

jeff



