Re: [PATCH] RISC-V: Add RVV registers register spilling

2022-11-11 Thread Kito Cheng via Gcc-patches
Committed, thanks!

On Sun, Nov 6, 2022 at 1:57 AM  wrote:
>
> From: Ju-Zhe Zhong 
>
> This patch supports RVV scalable register spilling.
> The prologue && epilogue handling picks up the prototype from Monk Chiang 
> .
> Co-authored-by: Monk Chiang 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (emit_pred_move): Adjust for scalable 
> register spilling.
> (legitimize_move): Ditto.
> * config/riscv/riscv.cc (riscv_v_adjust_scalable_frame): New function.
> (riscv_first_stack_step): Adjust for scalable register spilling.
> (riscv_expand_prologue): Ditto.
> (riscv_expand_epilogue): Ditto.
> (riscv_dwarf_poly_indeterminate_value): New function.
> (TARGET_DWARF_POLY_INDETERMINATE_VALUE): New target hook support for 
> register spilling.
> * config/riscv/riscv.h (RISCV_DWARF_VLENB): New macro.
> (RISCV_PROLOGUE_TEMP2_REGNUM): Ditto.
> (RISCV_PROLOGUE_TEMP2): Ditto.
> * config/riscv/vector-iterators.md: New iterators.
> * config/riscv/vector.md (*mov): Fix it for register spilling.
> (*mov_whole): New pattern.
> (*mov_fract): New pattern.
> (@pred_mov): Fix it for register spilling.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/mov-9.c:
> * gcc.target/riscv/rvv/base/macro.h: New test.
> * gcc.target/riscv/rvv/base/spill-1.c: New test.
> * gcc.target/riscv/rvv/base/spill-10.c: New test.
> * gcc.target/riscv/rvv/base/spill-11.c: New test.
> * gcc.target/riscv/rvv/base/spill-12.c: New test.
> * gcc.target/riscv/rvv/base/spill-2.c: New test.
> * gcc.target/riscv/rvv/base/spill-3.c: New test.
> * gcc.target/riscv/rvv/base/spill-4.c: New test.
> * gcc.target/riscv/rvv/base/spill-5.c: New test.
> * gcc.target/riscv/rvv/base/spill-6.c: New test.
> * gcc.target/riscv/rvv/base/spill-7.c: New test.
> * gcc.target/riscv/rvv/base/spill-8.c: New test.
> * gcc.target/riscv/rvv/base/spill-9.c: New test.
>
> ---
>  gcc/config/riscv/riscv-v.cc   |  47 +--
>  gcc/config/riscv/riscv.cc | 147 ++-
>  gcc/config/riscv/riscv.h  |   3 +
>  gcc/config/riscv/vector-iterators.md  |  23 ++
>  gcc/config/riscv/vector.md| 136 +--
>  .../gcc.target/riscv/rvv/base/macro.h |   6 +
>  .../gcc.target/riscv/rvv/base/mov-9.c |   8 +-
>  .../gcc.target/riscv/rvv/base/spill-1.c   | 385 ++
>  .../gcc.target/riscv/rvv/base/spill-10.c  |  41 ++
>  .../gcc.target/riscv/rvv/base/spill-11.c  |  60 +++
>  .../gcc.target/riscv/rvv/base/spill-12.c  |  47 +++
>  .../gcc.target/riscv/rvv/base/spill-2.c   | 320 +++
>  .../gcc.target/riscv/rvv/base/spill-3.c   | 254 
>  .../gcc.target/riscv/rvv/base/spill-4.c   | 196 +
>  .../gcc.target/riscv/rvv/base/spill-5.c   | 130 ++
>  .../gcc.target/riscv/rvv/base/spill-6.c   | 101 +
>  .../gcc.target/riscv/rvv/base/spill-7.c   | 114 ++
>  .../gcc.target/riscv/rvv/base/spill-8.c   |  51 +++
>  .../gcc.target/riscv/rvv/base/spill-9.c   |  42 ++
>  19 files changed, 2021 insertions(+), 90 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/macro.h
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-12.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-9.c
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 6615a5c7ffe..e0459e3f610 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -106,28 +106,25 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT 
> minval,
>
>  /* Emit an RVV unmask && vl mov from SRC to DEST.  */
>  static void
> -emit_pred_move (rtx dest, rtx src, rtx vl, machine_mode mask_mode)
> +emit_pred_move (rtx dest, rtx src, machine_mode mask_mode)
>  {
>insn_expander<7> e;
> -
>machine_mode mode = GET_MODE (dest);
> -  if (register_operand (src, mode) && register_operand (dest, mode))
> -{
> -  emit_move_insn (dest, src);
> -  return;
> -}
> +  rtx vl = gen_reg_rtx (Pmode);
> +  unsigned int sew = GET_MO

[PATCH] Using sub-scalars mode to move struct block

2022-11-11 Thread Jiufu Guo via Gcc-patches
Hi,

When assigning a struct parameter to another variable, or loading a
memory block into a struct variable (especially for a return value),
a "block move" is currently used when expanding the assignment.  This
"block move" may use a type/mode different from the mode used when
accessing the variable; e.g. on ppc64le, V2DI is used to move a
16-byte block.

This "block move" then prevents optimization passes from moving code
across the assignment.  PR65421 reflects this issue.

Consider the example code from PR65421:

typedef struct { double a[4]; } A;
A foo (const A *a) { return *a; }

On ppc64le, the below instructions are used for the "block move":
  7: r122:V2DI=[r121:DI]
  8: r124:V2DI=[r121:DI+r123:DI]
  9: [r112:DI]=r122:V2DI
  10: [r112:DI+0x10]=r124:V2DI

For this issue, a few comments/suggestions were made in the RFC:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
I drafted a patch that updates the behavior of block_move for struct
types.  This patch is simple to work with; a few ideas from the
comments are not included in it.  I would like to submit this patch
first.

The idea is to use scalar sub-modes for the "block move", where the
sub-modes align with the access patterns of the struct members and
with how the struct is used as a parameter or return value.  The
major benefit of this change is creating more opportunities for other
optimization passes (cse/dse/xprop).
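As a rough illustration of the intent in plain C (outside the compiler; the type is from PR65421, but the function names here are made up for the example), moving the bytes as scalar doubles instead of one opaque block keeps the individual members visible to later scalar optimizations:

```c
#include <string.h>

typedef struct { double a[4]; } A;

/* Opaque block copy: the whole 32-byte block moves as one unit,
   e.g. via two V2DI loads and two V2DI stores on ppc64le.  */
void
copy_block (A *dst, const A *src)
{
  memcpy (dst, src, sizeof (A));
}

/* Sub-mode copy: the same bytes move as four DFmode (double)
   scalars, matching how the members are accessed and how the
   struct is passed or returned in floating point registers.  */
void
copy_submode (A *dst, const A *src)
{
  for (int i = 0; i < 4; i++)
    dst->a[i] = src->a[i];
}
```

Both routines copy the same 32 bytes; the difference is only in the modes the moves use, which is exactly what the proposed hook controls.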

The suitable mode is target-specific and related to the ABI, so this
patch introduces a target hook.  In this patch, the hook is
implemented for rs6000.

For now, the hook simply uses heuristic modes for all struct block
moves; it does not check whether the "block move" is for parameters,
a return value, or some other use.

The rs6000 implementation of this hook can use DF/DI/TD/... modes
for the struct block move.  The sub-modes are the same as the modes
used when the struct type is passed as a parameter or return value.

Bootstrapped and regtested on ppc64/ppc64le. 
Is this ok for trunk?


BR,
Jeff(Jiufu)


gcc/ChangeLog:

* config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
(submode_for_struct_block_move): New function.  Called from
rs600_block_move_for_struct.
(rs600_block_move_for_struct): New function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
* expr.cc (store_expr): Call block_move_for_struct.
* target.def (block_move_for_struct): New hook.
* targhooks.cc (default_block_move_for_struct): New function.
* targhooks.h (default_block_move_for_struct): New Prototype.

---
 gcc/config/rs6000/rs6000.cc | 44 +
 gcc/doc/tm.texi |  6 +
 gcc/doc/tm.texi.in  |  2 ++
 gcc/expr.cc | 14 +---
 gcc/target.def  | 10 +
 gcc/targhooks.cc|  7 ++
 gcc/targhooks.h |  1 +
 7 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a85d7630b41..e14cecba0ef 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1758,6 +1758,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_NEED_IPA_FN_TARGET_INFO
 #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
 
+#undef TARGET_BLOCK_MOVE_FOR_STRUCT
+#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
+
 #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
 #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
 
@@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
   return gen_rtx_REG (mode, regno);
 }
 
+/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
+   would be used to move the struct.  */
+static machine_mode
+submode_for_struct_block_move (tree type)
+{
+  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+
+  /* The sub mode may not be the field's type of the struct.
+ It would be fine to use the mode as if the type is used as a function
+ parameter or return value.  For example: DF for "{double a[4];}", and
+ DI for "{double a[3]; long l;}".
+ Here, using the mode as if it is function return type.  */
+  rtx val = rs6000_function_value (type, NULL, 0);
+  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 
0))
+ : word_mode;
+}
+
+/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
+static void
+rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
+{
+  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
+  int mode_size = GET_MODE_SIZE (mode);
+  int size = UINTVAL (expr_size (exp));
+  if (size < mode_size || (size % mode_size) != 0 || size > 64)
+{
+  default_block_move_for_struct (x, y, exp, method);
+  return;
+}
+
+  int len = size / mode_size;
+  for (int i = 0; i < le

Re: old install to a different folder

2022-11-11 Thread Tobias Burnus

Hi Gerald,

On 10.11.22 20:24, Gerald Pfeifer wrote:

On Thu, 10 Nov 2022, Martin Liška wrote:

We noticed we'll need the old /install to be available for redirect.

Gerald, can you please put it somewhere under /install-prev, or
something similar?

I'm afraid I am confused now. Based on your original request I had removed
the original /install directory.


I think we just need to handle a few more cases. Namely:

* Links directly to https://gcc.gnu.org/install/
  this works and shows the new page.

* Sublinks - those currently fail as the name has changed:
  https://gcc.gnu.org/install/configure.html (which is now 
https://gcc.gnu.org/install/configuration.html )
  https://gcc.gnu.org/install/build.html (now: 
https://gcc.gnu.org/install/building.html )
  https://gcc.gnu.org/install/specific.html#avr → 
https://gcc.gnu.org/install/host-target-specific-installation-notes-for-gcc.html#avr

My impression is that it is sufficient to handle those renamings and we do not 
need the old pages.
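One lightweight way to handle just the renamings (a sketch; I am assuming the web server is Apache with mod_alias here, and the real gcc.gnu.org setup may differ) would be a few permanent redirects:

```
# Hypothetical mod_alias directives for the renamed install pages.
Redirect permanent /install/configure.html /install/configuration.html
Redirect permanent /install/build.html /install/building.html
Redirect permanent /install/specific.html /install/host-target-specific-installation-notes-for-gcc.html
```

Fragments such as #avr are not sent to the server, but browsers reattach them after following a redirect, so those deep anchor links should keep working without any extra handling.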

However, others might have different ideas. Note that this was discussed in the thread 
"Links to web pages are broken."

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: old install to a different folder

2022-11-11 Thread Martin Liška
On 11/11/22 09:40, Tobias Burnus wrote:
> However, others might have different ideas. Note that this was discussed in 
> the thread "Links to web pages are broken."

Yes, please discuss this further in the aforementioned thread.

I do support Richi's idea of using a new URL for the new Sphinx
documentation while keeping the older Texinfo documentation under
/onlinedocs and /install.

Martin


[PATCH] range-op, v2: Implement floating point multiplication fold_range [PR107569]

2022-11-11 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 10, 2022 at 08:20:06PM +0100, Jakub Jelinek via Gcc-patches wrote:
> This made me think about it some more and I'll need to play around with it
> some more, perhaps the right thing is similarly to what I've attached for
> division to handle special cases upfront and call frange_arithmetic only
> for the safe cases.
> E.g. one case which the posted foperator_mult handles pessimistically is
> [0.0, 10.0] * [INF, INF].  This should be just [INF, INF] +-NAN IMHO,
> because the 0.0 * INF case will result in NAN, while
> nextafter (0.0, 1.0) * INF
> will be already INF and everything larger as well.
> I could in frange_mult be very conservative and for the 0 * INF cases
> set result_lb and result_ub to [0.0, INF] range (corresponding signs
> depending on the xor of sign of ops), but that would be quite pessimistic as
> well.  If one has:
> [0.0, 0.0] * [10.0, INF], the result should be just [0.0, 0.0] +-NAN,
> because again 0.0 * INF is NAN, but 0.0 * nextafter (INF, 0.0) is already 0.0.
> 
> Note, the is_square case doesn't suffer from all of this mess, the result
> is never NAN (unless operand is NAN).

Ok, here is the patch rewritten in the foperator_div style, with the
special cases handled first and the ordinary handling reserved for the
unproblematic cases.
I guess if/once we have a plugin testing infrastructure, we could compare
the two versions of the patch; I think this one is more precise.
And, admittedly, there are many spots similar to the foperator_div case
(but also with significant differences), so perhaps foperator_{mult,div}
could inherit from some class derived from range_operator_float, and that
class could define various smaller helper (static?) methods, like those
discussed in the PR - contains_zero_p, singleton_nan_p, zero_p - plus a
helper where
+   bool must_have_signbit_zero = false;
+   bool must_have_signbit_nonzero = false;
+   if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+ {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ must_have_signbit_zero = true;
+   else
+ must_have_signbit_nonzero = true;
+ }
is returned as a -1/0/1 int, and helpers that set the result (based on
that value) to
[+INF, +INF], [-INF, -INF] or [-INF, +INF],
or
[+0, +0], [-0, -0] or [-0, +0],
or
[+0, +INF], [-INF, -0] or [-INF, +INF].
With those, the
+for (int i = 1; i < 4; ++i)
+  {
+   if (real_less (&cp[i], &cp[0])
+   || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
+ std::swap (cp[i], cp[0]);
+   if (real_less (&cp[4], &cp[i + 4])
+   || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
+ std::swap (cp[i + 4], cp[4]);
+  }
block could also be smaller and more readable.

Thoughts?

This has only been compile-tested so far.

2022-11-11  Jakub Jelinek  

PR tree-optimization/107569
PR tree-optimization/107591
* range-op.h (range_operator_float::rv_fold): Add relation_kind
argument.
* range-op-float.cc (range_operator_float::fold_range): Name
last argument trio and pass trio.op1_op2 () as last argument to
rv_fold.
(range_operator_float::rv_fold): Add relation_kind argument.
(foperator_plus::rv_fold, foperator_minus::rv_fold): Likewise.
(foperator_mult): New class.
(floating_op_table::floating_op_table): Use foperator_mult for
MULT_EXPR.

--- gcc/range-op.h.jj   2022-11-11 08:15:20.952520590 +0100
+++ gcc/range-op.h  2022-11-11 08:48:27.649349048 +0100
@@ -123,7 +123,8 @@ public:
const REAL_VALUE_TYPE &lh_lb,
const REAL_VALUE_TYPE &lh_ub,
const REAL_VALUE_TYPE &rh_lb,
-   const REAL_VALUE_TYPE &rh_ub) const;
+   const REAL_VALUE_TYPE &rh_ub,
+   relation_kind) const;
   // Unary operations have the range of the LHS as op2.
   virtual bool fold_range (irange &r, tree type,
   const frange &lh,
--- gcc/range-op-float.cc.jj2022-11-11 08:15:20.933520849 +0100
+++ gcc/range-op-float.cc   2022-11-11 09:39:14.950523368 +0100
@@ -51,7 +51,7 @@ along with GCC; see the file COPYING3.
 bool
 range_operator_float::fold_range (frange &r, tree type,
  const frange &op1, const frange &op2,
- relation_trio) const
+ relation_trio trio) const
 {
   if (empty_range_varying (r, type, op1, op2))
 return true;
@@ -65,7 +65,7 @@ range_operator_float::fold_range (frange
   bool maybe_nan;
   rv_fold (lb, ub, maybe_nan, type,
   op1.lower_bound (), op1.upper_bound (),
-  op2.lower_bound (), op2.upper_bound ());
+  op2.lower_bound (), op2.upper_bound (), trio.op1_op2 ());
 
   // Handle possible NANs by saturating to the appropriate INF if only
   // one end is a NAN.  If

[PATCH] x86: Enable 256 move by pieces for ALDERLAKE and AVX2.

2022-11-11 Thread Cui,Lili via Gcc-patches
From: Lili Cui 

Hi Hontao,

This patch enables 256-bit move by pieces for ALDERLAKE and AVX2.
Bootstrap is OK, and there are no regressions in the i386/x86-64 testsuite.

OK for master?


gcc/ChangeLog:

* config/i386/x86-tune.def
(X86_TUNE_AVX256_MOVE_BY_PIECES): Add alderlake and avx2.
(X86_TUNE_AVX256_STORE_BY_PIECES): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pieces-memset-50.c: New test.
---
 gcc/config/i386/x86-tune.def |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-50.c | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-50.c

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 58e29e7806a..cd66f335113 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -536,12 +536,12 @@ DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", 
m_CORE_AVX512)
 /* X86_TUNE_AVX256_MOVE_BY_PIECES: Optimize move_by_pieces with 256-bit
AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_MOVE_BY_PIECES, "avx256_move_by_pieces",
- m_CORE_AVX512)
+ m_ALDERLAKE | m_CORE_AVX2)
 
 /* X86_TUNE_AVX256_STORE_BY_PIECES: Optimize store_by_pieces with 256-bit
AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_STORE_BY_PIECES, "avx256_store_by_pieces",
- m_CORE_AVX512)
+ m_ALDERLAKE | m_CORE_AVX2)
 
 /* X86_TUNE_AVX512_MOVE_BY_PIECES: Optimize move_by_pieces with 512-bit
AVX instructions.  */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-50.c 
b/gcc/testsuite/gcc.target/i386/pieces-memset-50.c
new file mode 100644
index 000..c09e7c3649c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-50.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=alderlake" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 64);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
-- 
2.17.1

Thanks,
Lili.


[PATCH] range-op: Implement floating point division fold_range [PR107569]

2022-11-11 Thread Jakub Jelinek via Gcc-patches
Hi!

Here is the floating point division fold_range implementation.
As I wrote in the last mail, we could factor some of the common parts out
into static methods with descriptive names and share them between
foperator_div and foperator_mult.

Bootstrapped/regtested on top of the earlier version of the multiplication
fold_range on x86_64-linux and i686-linux, regressions are
+FAIL: gcc.dg/pr95115.c execution test
+FAIL: libphobos.phobos/std/math/hardware.d execution test
+FAIL: libphobos.phobos_shared/std/math/hardware.d execution test
For the first test, we have:
  # RANGE [frange] double [] +-NAN
  _3 =  Inf /  Inf;
  if (_3 ord _3)
goto ; [INV]
  else
goto ; [INV]

   :
  abort ();

   :
Before evrp, the range is correct: Inf / Inf is a known NAN of unknown
sign.  evrp correctly folds _3 ord _3 into false, and the
  _3 =  Inf /  Inf;
statement remains in the IL, but then dse1 comes along and removes it as a
dead statement.  So I think this is yet another example of the PR107608
problems, where DCE(?) removes dead statements that raise floating point
exceptions.  And -fno-delete-dead-exceptions doesn't help.
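As a side note, the IEEE semantics that the special cases in rv_fold below encode can be sanity-checked outside the compiler in plain C (a sketch; it assumes default rounding and non-trapping division):

```c
#include <math.h>

/* Check the IEEE corner cases for division discussed above.
   Returns 1 if all of them hold.  */
int
check_div_corner_cases (void)
{
  /* +-0.0 / +-0.0 and +-INF / +-INF are known NANs.  */
  if (!isnan (0.0 / 0.0) || !isnan (INFINITY / INFINITY))
    return 0;
  /* A zero dividend with a nonzero divisor gives +-0, and x / +-INF
     is +-0 for any finite x.  */
  if (0.0 / 5.0 != 0.0 || 1.0 / INFINITY != 0.0)
    return 0;
  /* The sign of that zero is the XOR of the operand signs.  */
  if (!signbit (-1.0 / INFINITY) || signbit (1.0 / INFINITY))
    return 0;
  /* A zero divisor with a nonzero dividend gives +-INF, again with
     the XOR of the operand signs.  */
  if (!isinf (1.0 / 0.0) || !signbit (-1.0 / 0.0))
    return 0;
  return 1;
}
```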

2022-11-11  Jakub Jelinek  

PR tree-optimization/107569
* range-op-float.cc (foperator_div): New class.
(floating_op_table::floating_op_table): Use foperator_div
for RDIV_EXPR.

--- gcc/range-op-float.cc.jj2022-11-10 12:31:57.987917289 +0100
+++ gcc/range-op-float.cc   2022-11-10 17:04:35.743056880 +0100
@@ -2027,6 +2027,183 @@ class foperator_mult : public range_oper
   }
 } fop_mult;
 
+class foperator_div : public range_operator_float
+{
+  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
+   tree type,
+   const REAL_VALUE_TYPE &lh_lb,
+   const REAL_VALUE_TYPE &lh_ub,
+   const REAL_VALUE_TYPE &rh_lb,
+   const REAL_VALUE_TYPE &rh_ub,
+   relation_kind) const final override
+  {
+// +-0.0 / +-0.0 or +-INF / +-INF is a known NAN.
+if ((real_iszero (&lh_lb)
+&& real_iszero (&lh_ub)
+&& real_iszero (&rh_lb)
+&& real_iszero (&rh_ub))
+   || (real_isinf (&lh_lb)
+   && real_isinf (&lh_ub, real_isneg (&lh_lb))
+   && real_isinf (&rh_lb)
+   && real_isinf (&rh_ub, real_isneg (&rh_lb
+  {
+   real_nan (&lb, "", 0, TYPE_MODE (type));
+   ub = lb;
+   maybe_nan = true;
+   return;
+  }
+
+bool both_maybe_zero = false;
+bool both_maybe_inf = false;
+bool must_have_signbit_zero = false;
+bool must_have_signbit_nonzero = false;
+
+// If +-0.0 is in both ranges, it is a maybe NAN.
+if (real_compare (LE_EXPR, &lh_lb, &dconst0)
+   && real_compare (GE_EXPR, &lh_ub, &dconst0)
+   && real_compare (LE_EXPR, &rh_lb, &dconst0)
+   && real_compare (GE_EXPR, &rh_ub, &dconst0))
+  {
+   both_maybe_zero = true;
+   maybe_nan = true;
+  }
+// If +-INF is in both ranges, it is a maybe NAN.
+else if ((real_isinf (&lh_lb) || real_isinf (&lh_ub))
+&& (real_isinf (&rh_lb) || real_isinf (&rh_ub)))
+  {
+   both_maybe_inf = true;
+   maybe_nan = true;
+  }
+else
+  maybe_nan = false;
+
+if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+  {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ must_have_signbit_zero = true;
+   else
+ must_have_signbit_nonzero = true;
+  }
+
+// If dividend must be zero, the range is just +-0
+// (including if the divisor is +-INF).
+// If divisor must be +-INF, the range is just +-0
+// (including if the dividend is zero).
+if ((real_iszero (&lh_lb) && real_iszero (&lh_ub))
+   || real_isinf (&rh_lb, false)
+   || real_isinf (&rh_ub, true))
+  {
+   ub = lb = dconst0;
+   // If all the boundary signs are the same, [+0.0, +0.0].
+   if (must_have_signbit_zero)
+ ;
+   // If divisor and dividend must have different signs,
+   // [-0.0, -0.0].
+   else if (must_have_signbit_nonzero)
+ ub = lb = real_value_negate (&dconst0);
+   // Otherwise -> [-0.0, +0.0].
+   else
+ lb = real_value_negate (&dconst0);
+   return;
+  }
+
+// If divisor must be zero, the range is just +-INF
+// (including if the dividend is +-INF).
+// If dividend must be +-INF, the range is just +-INF
+// (including if the dividend is zero).
+if ((real_iszero (&rh_lb) && real_iszero (&rh_ub))
+   || real_isinf (&lh_lb, false)
+   || real_isinf (&lh_ub, true))
+  {
+   // If all the boundary signs are the same, [+INF, +INF].
+   if (must_have_signbit_zero)
+ ub = lb = dconstinf;
+   // If divisor and dividend must have different signs,
+   // [-INF, -INF].
+   else if (must_have_signbit_nonzero)
+ ub = lb = dconstninf;
+   // Otherwise -> [-INF, +INF] (-INF or +INF).
+   else
+ {

Re: old install to a different folder

2022-11-11 Thread Tobias Burnus

On 11.11.22 09:50, Martin Liška wrote:

I do support the Richi's idea about using a new URL for the new Sphinx 
documentation
while keeping the older Texinfo documentation under /onlinedocs and /install


If we do so and those then become static files: can we put some
disclaimer at the top of all HTML files under /install/ and under
/onlinedocs// saying that those are legacy files and that the new
documentation can be found under  (not a deep link but a link directly
to the install pages or to the new overview page about the Sphinx docs)?

I think we really need such a hint – otherwise it is more confusing than
helpful! Additionally, we should add a "news" entry to the main page
pointing out the change and linking to the new Sphinx docs.

Tobias



Re: [PATCH] Using sub-scalars mode to move struct block

2022-11-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Nov 2022, Jiufu Guo wrote:

> Hi,
> 
> When assigning a struct parameter to another variable, or loading a
> memory block to a struct var (especially for return value),
> Now, "block move" would be used during expand the assignment. And
> the "block move" may use a type/mode different from the mode which
> is accessing the var. e.g. on ppc64le, V2DI would be used to move
> the block of 16bytes.
> 
> And then, this "block move" would prevent optimization passes from
> leaping/crossing over the assignment. PR65421 reflects this issue.
> 
> As the example code in PR65421.
> 
> typedef struct { double a[4]; } A;
> A foo (const A *a) { return *a; }
> 
> On ppc64le, the below instructions are used for the "block move":
>   7: r122:V2DI=[r121:DI]
>   8: r124:V2DI=[r121:DI+r123:DI]
>   9: [r112:DI]=r122:V2DI
>   10: [r112:DI+0x10]=r124:V2DI
> 
> For this issue, a few comments/suggestions are mentioned via RFC:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
> I drafted a patch which is updating the behavior of block_move for
> struct type. This patch is simple to work with, a few ideas in the
> comments are not put into this patch. I would submit this
> patch first.
> 
> The idea is trying to use sub-modes(scalar) for the "block move".
> And the sub-modes would align with the access patterns of the
> struct members and usages on parameter/return value.
> The major benefits of this change would be raising more 
> opportunities for other optimization passes(cse/dse/xprop).
> 
> The suitable mode would be target specified and relates to ABI,
> this patch introduces a target hook. And in this patch, the hook
> is implemented on rs6000.
> 
> In this patch, the hook would be just using heuristic modes for all
> struct block moving. And the hook would not check if the "block move"
> is about parameters or return value or other uses.
> 
> For the rs6000 implementation of this hook, it is able to use
> DF/DI/TD/.. modes for the struct block movement. The sub-modes
> would be the same as the mode when the struct type is on parameter or
> return value.
> 
> Bootstrapped and regtested on ppc64/ppc64le. 
> Is this ok for trunk?
> 
> 
> BR,
> Jeff(Jiufu)
> 
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
>   (submode_for_struct_block_move): New function.  Called from
>   rs600_block_move_for_struct.
>   (rs600_block_move_for_struct): New function.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
>   * expr.cc (store_expr): Call block_move_for_struct.
>   * target.def (block_move_for_struct): New hook.
>   * targhooks.cc (default_block_move_for_struct): New function.
>   * targhooks.h (default_block_move_for_struct): New Prototype.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 44 +
>  gcc/doc/tm.texi |  6 +
>  gcc/doc/tm.texi.in  |  2 ++
>  gcc/expr.cc | 14 +---
>  gcc/target.def  | 10 +
>  gcc/targhooks.cc|  7 ++
>  gcc/targhooks.h |  1 +
>  7 files changed, 81 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index a85d7630b41..e14cecba0ef 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1758,6 +1758,9 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  #undef TARGET_NEED_IPA_FN_TARGET_INFO
>  #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
>  
> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
> +
>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>  
> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
>return gen_rtx_REG (mode, regno);
>  }
>  
> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
> +   would be used to move the struct.  */
> +static machine_mode
> +submode_for_struct_block_move (tree type)
> +{
> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
> +
> +  /* The sub mode may not be the field's type of the struct.
> + It would be fine to use the mode as if the type is used as a function
> + parameter or return value.  For example: DF for "{double a[4];}", and
> + DI for "{double a[3]; long l;}".
> + Here, using the mode as if it is function return type.  */
> +  rtx val = rs6000_function_value (type, NULL, 0);
> +  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 
> 0))
> +   : word_mode;
> +}
> +
> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
> +static void
> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
> +{
> +  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
> +  int mode_size = GET_MODE_S

Re: [PATCH] range-op: Implement floating point multiplication fold_range [PR107569]

2022-11-11 Thread Aldy Hernandez via Gcc-patches




On 11/10/22 20:20, Jakub Jelinek wrote:

On Thu, Nov 10, 2022 at 03:50:47PM +0100, Aldy Hernandez wrote:

@@ -1908,6 +1910,123 @@ class foperator_minus : public range_ope
 }
   } fop_minus;
+/* Wrapper around frange_arithmetics, that computes the result
+   if inexact rounded to both directions.  Also, if one of the
+   operands is +-0.0 and another +-INF, return +-0.0 rather than
+   NAN.  */


s/frange_arithmetics/frange_arithmetic/

Also, would you mind writing a little blurb about why it's necessary not to
compute INF*0.0 as NAN?  I assume it's because you're using it for the cross
product and you'll set maybe_nan separately, but it's nice to spell it out.


This made me think about it some more and I'll need to play around with it
some more, perhaps the right thing is similarly to what I've attached for
division to handle special cases upfront and call frange_arithmetic only
for the safe cases.
E.g. one case which the posted foperator_mult handles pessimistically is
[0.0, 10.0] * [INF, INF].  This should be just [INF, INF] +-NAN IMHO,
because the 0.0 * INF case will result in NAN, while
nextafter (0.0, 1.0) * INF
will be already INF and everything larger as well.
I could in frange_mult be very conservative and for the 0 * INF cases
set result_lb and result_ub to [0.0, INF] range (corresponding signs
depending on the xor of sign of ops), but that would be quite pessimistic as
well.  If one has:
[0.0, 0.0] * [10.0, INF], the result should be just [0.0, 0.0] +-NAN,
because again 0.0 * INF is NAN, but 0.0 * nextafter (INF, 0.0) is already 0.0.

Note, the is_square case doesn't suffer from all of this mess, the result
is never NAN (unless operand is NAN).


It'd be nice to have some testcases.  For example, from what I can see, the
original integer multiplication code came with some tests in
gcc.dg/tree-ssa/vrp13.c (commit 9983270bec0a18).  It'd be nice to have some
sanity checks, especially because so many things can go wrong with floats.

I'll leave it to you to decide what tests to include.


I've tried the following, but it suffers from various issues:
1) we don't handle __builtin_signbit (whatever) == 0 (or != 0) as a guarantee
that in the guarded code whatever has signbit 0 or 1


We have a range-op entry for __builtin_signbit in cfn_signbit.  Is this 
a shortcoming of this code, or something else?



2) __builtin_isinf (x) > 0 is lowered to x > DBL_MAX, but unfortunately we
don't infer an [INF, INF] range from that, only a [DBL_MAX, INF] range
3) as I wrote above, I think we don't handle [0, 2] * [INF, INF] right, but
due to 2) we can't see it


Doesn't this boil down to a representation issue?  I wonder if we should 
bite the bullet and tweak build_gt() and build_lt() to represent open 
ranges.  In theory it should be one more/less ULP, while adjusting for 
HONOR_INFINITIES.


If the signbit issue were resolved and we could represent > and < 
properly, would that allow us to write proper testcases without having
to write a plug-in (which I assume is a lot harder)?


Aldy



So maybe for now a selftest would be better than a testcase, or
alternatively a plugin test which acts like a selftest.

/* { dg-do compile { target { ! { vax-*-* powerpc-*-*spe pdp11-*-* } } } } */
/* { dg-options "-O2 -fno-trapping-math -fno-signaling-nans -fsigned-zeros 
-fno-tree-fre -fno-tree-dominator-opts -fno-thread-jumps -fdump-tree-optimized" } */
/* { dg-add-options ieee } */

void
foo (double x, double y)
{
   const double inf = __builtin_inf ();
   const double minf = -inf;
   if (__builtin_isnan (x) || __builtin_isnan (y))
 return;
#define TEST(n, xl, xu, yl, yu, rl, ru, nan) \
   if ((__builtin_isinf (xl) > 0 \
? x > 0.0 && __builtin_isinf (x) \
: __builtin_isinf (xu) < 0   \
? x < 0.0 && __builtin_isinf (x) \
: x >= xl && x <= xu  \
 && (xl != 0.0  \
 || __builtin_signbit (xl)  \
 || !__builtin_signbit (x)) \
 && (xu != 0.0  \
 || !__builtin_signbit (xu) \
 || __builtin_signbit (x))) \
   && (__builtin_isinf (yl) > 0  \
  ? y > 0.0 && __builtin_isinf (y)   \
  : __builtin_isinf (yu) < 0 \
  ? y < 0.0 && __builtin_isinf (y)   \
  : y >= yl && y <= yu\
&& (yl != 0.0   \
|| __builtin_signbit (yl)   \
|| !__builtin_signbit (y))  \
&& (yu != 0.0   \
|| !__builtin_signbit (yu)  \
|| __builtin_signbit (y)))) \
 {  \
   dou

[PATCH] range-op: Cleanup floating point multiplication and division fold_range [PR107569]

2022-11-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 11, 2022 at 09:52:53AM +0100, Jakub Jelinek via Gcc-patches wrote:
> Ok, here is the patch rewritten in the foperator_div style, with special
> cases handled first and then the ordinary cases without problematic cases.
> I guess if/once we have a plugin testing infrastructure, we could compare
> the two versions of the patch, I think this one is more precise.
> And, admittedly there are many similar spots with the foperator_div case
> (but also with significant differences), so perhaps if foperator_{mult,div}
> inherit from some derived class from range_operator_float and that class
> would define various smaller helper static? methods, like this
> discussed in the PR - contains_zero_p, singleton_nan_p, zero_p,
> that
> +   bool must_have_signbit_zero = false;
> +   bool must_have_signbit_nonzero = false;
> +   if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
> +   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
> + {
> +   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
> + must_have_signbit_zero = true;
> +   else
> + must_have_signbit_nonzero = true;
> + }
> returned as -1/0/1 int, and those set result (based on the above value) to
> [+INF, +INF], [-INF, -INF] or [-INF, +INF]
> or
> [+0, +0], [-0, -0] or [-0, +0]
> or
> [+0, +INF], [-INF, -0] or [-INF, +INF]
> and the
> +for (int i = 1; i < 4; ++i)
> +  {
> +   if (real_less (&cp[i], &cp[0])
> +   || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
> + std::swap (cp[i], cp[0]);
> +   if (real_less (&cp[4], &cp[i + 4])
> +   || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
> + std::swap (cp[i + 4], cp[4]);
> +  }
> block, it could be smaller and more readable.

Here is an incremental patch on top of this and division patch,
which does that.

2022-11-11  Jakub Jelinek  

PR tree-optimization/107569
* range-op-float.cc (foperator_mult_div_base): New class.
(foperator_mult, foperator_div): Derive from that and use
protected static methods from it to simplify the code.

--- gcc/range-op-float.cc.jj2022-11-11 10:13:30.879410560 +0100
+++ gcc/range-op-float.cc   2022-11-11 10:55:57.602617289 +0100
@@ -1911,7 +1911,125 @@ class foperator_minus : public range_ope
 } fop_minus;
 
 
-class foperator_mult : public range_operator_float
+class foperator_mult_div_base : public range_operator_float
+{
+protected:
+  // True if [lb, ub] is [+-0, +-0].
+  static bool zero_p (const REAL_VALUE_TYPE &lb,
+ const REAL_VALUE_TYPE &ub)
+  {
+return real_iszero (&lb) && real_iszero (&ub);
+  }
+
+  // True if +0 or -0 is in [lb, ub] range.
+  static bool contains_zero_p (const REAL_VALUE_TYPE &lb,
+  const REAL_VALUE_TYPE &ub)
+  {
+return (real_compare (LE_EXPR, &lb, &dconst0)
+   && real_compare (GE_EXPR, &ub, &dconst0));
+  }
+
+  // True if [lb, ub] is [-INF, -INF] or [+INF, +INF].
+  static bool singleton_inf_p (const REAL_VALUE_TYPE &lb,
+  const REAL_VALUE_TYPE &ub)
+  {
+return real_isinf (&lb) && real_isinf (&ub, real_isneg (&lb));
+  }
+
+  // Return -1 if binary op result must have sign bit set,
+  // 1 if binary op result must have sign bit clear,
+  // 0 otherwise.
+  // Sign bit of binary op result is exclusive or of the
+  // operand's sign bits.
+  static int signbit_known_p (const REAL_VALUE_TYPE &lh_lb,
+ const REAL_VALUE_TYPE &lh_ub,
+ const REAL_VALUE_TYPE &rh_lb,
+ const REAL_VALUE_TYPE &rh_ub)
+  {
+if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+  {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ return 1;
+   else
+ return -1;
+  }
+return 0;
+  }
+
+  // Set [lb, ub] to [-0, -0], [-0, +0] or [+0, +0] depending on
+  // signbit_known.
+  static void zero_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+ int signbit_known)
+  {
+ub = lb = dconst0;
+if (signbit_known <= 0)
+  lb = real_value_negate (&dconst0);
+if (signbit_known < 0)
+  ub = lb;
+  }
+
+  // Set [lb, ub] to [-INF, -INF], [-INF, +INF] or [+INF, +INF] depending on
+  // signbit_known.
+  static void inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+int signbit_known)
+  {
+if (signbit_known > 0)
+  ub = lb = dconstinf;
+else if (signbit_known < 0)
+  ub = lb = dconstninf;
+else
+  {
+   lb = dconstninf;
+   ub = dconstinf;
+  }
+  }
+
+  // Set [lb, ub] to [-INF, -0], [-INF, +INF] or [+0, +INF] depending on
+  // signbit_known.
+  static void zero_to_inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+int signbit_known)
+  {
+if (signbit_known > 0)

Re: [PATCH] range-op: Cleanup floating point multiplication and division fold_range [PR107569]

2022-11-11 Thread Aldy Hernandez via Gcc-patches




On 11/11/22 11:01, Jakub Jelinek wrote:

On Fri, Nov 11, 2022 at 09:52:53AM +0100, Jakub Jelinek via Gcc-patches wrote:

Ok, here is the patch rewritten in the foperator_div style, with special
cases handled first and then the ordinary cases without problematic cases.
I guess if/once we have a plugin testing infrastructure, we could compare
the two versions of the patch, I think this one is more precise.
And, admittedly there are many similar spots with the foperator_div case
(but also with significant differences), so perhaps if foperator_{mult,div}
inherit from some derived class from range_operator_float and that class
would define various smaller helper static? methods, like this
discussed in the PR - contains_zero_p, singleton_nan_p, zero_p,
that
+   bool must_have_signbit_zero = false;
+   bool must_have_signbit_nonzero = false;
+   if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+ {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ must_have_signbit_zero = true;
+   else
+ must_have_signbit_nonzero = true;
+ }
returned as -1/0/1 int, and those set result (based on the above value) to
[+INF, +INF], [-INF, -INF] or [-INF, +INF]
or
[+0, +0], [-0, -0] or [-0, +0]
or
[+0, +INF], [-INF, -0] or [-INF, +INF]
and the
+for (int i = 1; i < 4; ++i)
+  {
+   if (real_less (&cp[i], &cp[0])
+   || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
+ std::swap (cp[i], cp[0]);
+   if (real_less (&cp[4], &cp[i + 4])
+   || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
+ std::swap (cp[i + 4], cp[4]);
+  }
block, it could be smaller and more readable.


Here is an incremental patch on top of this and division patch,
which does that.

2022-11-11  Jakub Jelinek  

PR tree-optimization/107569
* range-op-float.cc (foperator_mult_div_base): New class.
(foperator_mult, foperator_div): Derive from that and use
protected static methods from it to simplify the code.

--- gcc/range-op-float.cc.jj2022-11-11 10:13:30.879410560 +0100
+++ gcc/range-op-float.cc   2022-11-11 10:55:57.602617289 +0100
@@ -1911,7 +1911,125 @@ class foperator_minus : public range_ope
  } fop_minus;
  
  
-class foperator_mult : public range_operator_float

+class foperator_mult_div_base : public range_operator_float
+{
+protected:
+  // True if [lb, ub] is [+-0, +-0].
+  static bool zero_p (const REAL_VALUE_TYPE &lb,
+ const REAL_VALUE_TYPE &ub)
+  {
+return real_iszero (&lb) && real_iszero (&ub);
+  }
+
+  // True if +0 or -0 is in [lb, ub] range.
+  static bool contains_zero_p (const REAL_VALUE_TYPE &lb,
+  const REAL_VALUE_TYPE &ub)
+  {
+return (real_compare (LE_EXPR, &lb, &dconst0)
+   && real_compare (GE_EXPR, &ub, &dconst0));
+  }
+
+  // True if [lb, ub] is [-INF, -INF] or [+INF, +INF].
+  static bool singleton_inf_p (const REAL_VALUE_TYPE &lb,
+  const REAL_VALUE_TYPE &ub)
+  {
+return real_isinf (&lb) && real_isinf (&ub, real_isneg (&lb));
+  }
+
+  // Return -1 if binary op result must have sign bit set,
+  // 1 if binary op result must have sign bit clear,
+  // 0 otherwise.
+  // Sign bit of binary op result is exclusive or of the
+  // operand's sign bits.
+  static int signbit_known_p (const REAL_VALUE_TYPE &lh_lb,
+ const REAL_VALUE_TYPE &lh_ub,
+ const REAL_VALUE_TYPE &rh_lb,
+ const REAL_VALUE_TYPE &rh_ub)
+  {
+if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+  {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ return 1;
+   else
+ return -1;
+  }
+return 0;
+  }
+
+  // Set [lb, ub] to [-0, -0], [-0, +0] or [+0, +0] depending on
+  // signbit_known.
+  static void zero_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+ int signbit_known)
+  {
+ub = lb = dconst0;
+if (signbit_known <= 0)
+  lb = real_value_negate (&dconst0);
+if (signbit_known < 0)
+  ub = lb;
+  }
+
+  // Set [lb, ub] to [-INF, -INF], [-INF, +INF] or [+INF, +INF] depending on
+  // signbit_known.
+  static void inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+int signbit_known)
+  {
+if (signbit_known > 0)
+  ub = lb = dconstinf;
+else if (signbit_known < 0)
+  ub = lb = dconstninf;
+else
+  {
+   lb = dconstninf;
+   ub = dconstinf;
+  }
+  }
+
+  // Set [lb, ub] to [-INF, -0], [-INF, +INF] or [+0, +INF] depending on
+  // signbit_known.
+  static void zero_to_inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+int signbit_known)
+  {
+if (signbit_known > 0)
+  {
+   lb = 

Re: old install to a different folder

2022-11-11 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 10:12 AM Tobias Burnus  wrote:
>
> On 11.11.22 09:50, Martin Liška wrote:
> > I do support the Richi's idea about using a new URL for the new Sphinx 
> > documentation
> > while keeping the older Texinfo documentation under /onlinedocs and /install
>
> If we do so and those become then static files: Can we put some
> disclaimer at the top of all HTML files under /install/ and under
> /onlinedocs// that those are legacy files and the new
> documentation can be found under  (not a deep link but directly to
> the install pages or the new overview page about the Sphinx docs).
>
> I think we really need such a hint – otherwise it is more confusing than
> helpful! Additionally, we should add a "news" entry to the mainpage
> pointing out that it changed and linking to the new Sphinx doc.

Note I think we can "remove" the install/ and onlinedocs/ _landing_ pages
(index.html) but we should keep the actual content pages so old links keep
working.  We can also replace the landing pages with a pointer to the new
documentation (or plain re-direct to that!).

Richard.

> Tobias
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


[PATCH] aarch64: Add support for +cssc

2022-11-11 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch adds codegen for FEAT_CSSC from the 2022 Architecture extensions.
It fits various existing optabs in GCC quite well.
There are instructions for scalar signed/unsigned min/max, abs, ctz, popcount.
We have expanders for these already, so they are wired up to emit single-insn
patterns for the new TARGET_CSSC.

These instructions are enabled by the +cssc command-line extension.
Bootstrapped and tested on aarch64-none-linux-gnu.

I'll push it once the Binutils patch from Andre for this gets committed.

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cssc): Define.
* config/aarch64/aarch64.h (AARCH64_ISA_CSSC): Define.
(TARGET_CSSC): Likewise.
* config/aarch64/aarch64.md (aarch64_abs2_insn): New define_insn.
(abs2): Adjust for the above.
(aarch64_umax3_insn): New define_insn.
(umax3): Adjust for the above.
(aarch64_popcount2_insn): New define_insn.
(popcount2): Adjust for the above.
(3): New define_insn.
* config/aarch64/constraints.md (Usm): Define.
(Uum): Likewise.
* 
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
Document +cssc.
* config/aarch64/iterators.md (MAXMIN_NOUMAX): New code iterator.
* config/aarch64/predicates.md (aarch64_sminmax_immediate): Define.
(aarch64_sminmax_operand): Likewise.
(aarch64_uminmax_immediate): Likewise.
(aarch64_uminmax_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cssc_1.c: New test.
* gcc.target/aarch64/cssc_2.c: New test.
* gcc.target/aarch64/cssc_3.c: New test.
* gcc.target/aarch64/cssc_4.c: New test.
* gcc.target/aarch64/cssc_5.c: New test.


cssc.patch
Description: cssc.patch


Re: old install to a different folder

2022-11-11 Thread Martin Liška
On 11/11/22 11:18, Richard Biener wrote:
> On Fri, Nov 11, 2022 at 10:12 AM Tobias Burnus  
> wrote:
>>
>> On 11.11.22 09:50, Martin Liška wrote:
>>> I do support the Richi's idea about using a new URL for the new Sphinx 
>>> documentation
>>> while keeping the older Texinfo documentation under /onlinedocs and /install
>>
>> If we do so and those become then static files: Can we put some
>> disclaimer at the top of all HTML files under /install/ and under
>> /onlinedocs// that those are legacy files and the new
>> documentation can be found under  (not a deep link but directly to
>> the install pages or the new overview page about the Sphinx docs).
>>
>> I think we really need such a hint – otherwise it is more confusing than
>> helpful! Additionally, we should add a "news" entry to the mainpage
>> pointing out that it changed and linking to the new Sphinx doc.
> 
> Note I think we can "remove" the install/ and onlinedocs/ _landing_ pages
> (index.html) but we should keep the actual content pages so old links keep
> working.  We can also replace the landing pages with a pointer to the new
> documentation (or plain re-direct to that!).

Even better. So let me summarize it:

gcc.gnu.org/docs - will contain newly generated Sphinx documentation
gcc.gnu.org/docs/gcc - sub-folder example
gcc.gnu.org/docs/install - sub-folder example
gcc.gnu.org/docs/gcc-13.1.0/install - sub-folder example once we have the
GCC 13.1 release

gcc.gnu.org/install/index.html - 301 to gcc.gnu.org/docs/install
gcc.gnu.org/install/$something - point to old install manual
gcc.gnu.org/onlinedocs/index.html - 301 to gcc.gnu.org/docs/
gcc.gnu.org/onlinedocs/$something - point to old GCC manual
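
The 301 entries above could be expressed with something like the following Apache rules (a hypothetical sketch only — the actual gcc.gnu.org server configuration is not shown in this thread and may differ):

```apache
# Hypothetical rules matching the summary above.
Redirect permanent /install/index.html    https://gcc.gnu.org/docs/install
Redirect permanent /onlinedocs/index.html https://gcc.gnu.org/docs/
# Everything else under /install/ and /onlinedocs/ keeps serving the
# static legacy pages, so existing deep links continue to work.
```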

@Gerald: Is it something you can set up? What do you think about it?

Martin

> 
> Richard.
> 
>> Tobias
>>



Re: old install to a different folder

2022-11-11 Thread Tobias Burnus

Hi Richard,

On 11.11.22 11:18, Richard Biener wrote:


Note I think we can "remove" the install/ and onlinedocs/ _landing_ pages
(index.html) but we should keep the actual content pages so old links keep
working.  We can also replace the landing pages with a pointer to the new
documentation (or plain re-direct to that!).


For install, I think we should consider redirecting. Before the move to Sphinx, 
we had only:

binaries.html
build.html
configure.html
download.html
finalinstall.html
gfdl.html
index.html
prerequisites.html
specific.html
test.html

Re-directing them to the new pages will work. There is a one-to-one 
correspondence for all but
build/test which are now in 7* and 5 files, respectively. Still linking to the 
outermost
should be ok as I do not think that there will be many links using '#...'.

(*The subdivision is also a bit pointless for Ada and D as it consists only of 
the texts
"GNAT prerequisites." and "GDC prerequisites.", respectively (in the old doc).
In the Sphinx docs, it is even shortened to: "GNAT." and "GDC.".)

The only exception where links to page anchors are likely used is for
"Host/target specific installation notes for GCC".
For them, some like '#avr' still work while others don't (like 'nvptx-*-none'
as '#nvptx-x-none' changed to '#nvptx-none'). But the page is short enough and
it is clear from the context what the user wants - there is also a table of
content on the right to click on. (IMHO that's sufficient.)

* * *

For /onlinedocs/, I concur that we want to have the old doc there as there are 
many
deep links. Still, we should consider adding a disclaimer box to all former 
mainline
documentation stating that this data is no longer updated + point to the new 
overview
page + we could redirect access which goes directly to '//' and not 
a (sub)html
page to the new site, as you proposed.

Tobias



Re: [PATCH] range-op: Implement floating point multiplication fold_range [PR107569]

2022-11-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 11, 2022 at 11:01:38AM +0100, Aldy Hernandez wrote:
> > I've tried following, but it suffers from various issues:
> > 1) we don't handle __builtin_signbit (whatever) == 0 (or != 0) as guarantee
> > that in the guarded code whatever has signbit 0 or 1
> 
> We have a range-op entry for __builtin_signbit in cfn_signbit.  Is this a
> shortcoming of this code, or something else?

Dunno, I admit I haven't investigated it much.  I just saw it when putting
a breakpoint on the mult fold_range.

> > 2) __builtin_isinf (x) > 0 is lowered to x > DBL_MAX, but unfortunately we 
> > don't
> > infer from that [INF,INF] range, but [DBL_MAX, INF] range
> > 3) what I wrote above, I think we don't handle [0, 2] * [INF, INF] right but
> > due to 2) we can't see it
> 
> Doesn't this boil down to a representation issue?  I wonder if we should
> bite the bullet and tweak build_gt() and build_lt() to represent open
> ranges.  In theory it should be one more/less ULP, while adjusting for
> HONOR_INFINITIES.

At least with the exception of MODE_COMPOSITE_P, I think we don't need
to introduce open ranges (and who cares about MODE_COMPOSITE_P if it is
conservatively correct).
For other floats, I think
x > cst
is always equivalent to
x >= nextafter (cst, inf)
and
x < cst
is always equivalent to
x <= nextafter (cst, -inf)
except for the signed zero cases, which need a tiny bit more thought.
So, if we have
if (x > DBL_MAX)
then in code guarded by that we can just use [INF, INF] as range.

> If the signbit issue were resolved and we could represent > and < properly,
> would that allow us to write proper testcases without having to writing a
> plug-in (which I assume is a lot harder)?

I don't know, we'd need to see.
First work out on all the issues that result on the testcase the operand
ranges aren't exactly what we want (whether on the testcase side or on the
range-ops side or wherever) and once that looks ok, see if the ranges
on the rN/sN vars are correct and if so, watch what hasn't been folded away
and why.
I think the plugin would be 100-200 lines of code and then we could just
write multiple testcases against the plugin.

Jakub



Re: [PATCH] range-op: Cleanup floating point multiplication and division fold_range [PR107569]

2022-11-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 11, 2022 at 11:12:01AM +0100, Aldy Hernandez wrote:
> > --- gcc/range-op-float.cc.jj2022-11-11 10:13:30.879410560 +0100
> > +++ gcc/range-op-float.cc   2022-11-11 10:55:57.602617289 +0100
> > @@ -1911,7 +1911,125 @@ class foperator_minus : public range_ope
> >   } fop_minus;
> > -class foperator_mult : public range_operator_float
> > +class foperator_mult_div_base : public range_operator_float
> > +{
> > +protected:
> > +  // True if [lb, ub] is [+-0, +-0].
> > +  static bool zero_p (const REAL_VALUE_TYPE &lb,
> > + const REAL_VALUE_TYPE &ub)
> > +  {
> > +return real_iszero (&lb) && real_iszero (&ub);
> > +  }
> > +
> > +  // True if +0 or -0 is in [lb, ub] range.
> > +  static bool contains_zero_p (const REAL_VALUE_TYPE &lb,
> > +  const REAL_VALUE_TYPE &ub)
> > +  {
> > +return (real_compare (LE_EXPR, &lb, &dconst0)
> > +   && real_compare (GE_EXPR, &ub, &dconst0));
> > +  }
> > +
> > +  // True if [lb, ub] is [-INF, -INF] or [+INF, +INF].
> > +  static bool singleton_inf_p (const REAL_VALUE_TYPE &lb,
> > +  const REAL_VALUE_TYPE &ub)
> > +  {
> > +return real_isinf (&lb) && real_isinf (&ub, real_isneg (&lb));
> > +  }
> > +
> > +  // Return -1 if binary op result must have sign bit set,
> > +  // 1 if binary op result must have sign bit clear,
> > +  // 0 otherwise.
> > +  // Sign bit of binary op result is exclusive or of the
> > +  // operand's sign bits.
> > +  static int signbit_known_p (const REAL_VALUE_TYPE &lh_lb,
> > + const REAL_VALUE_TYPE &lh_ub,
> > + const REAL_VALUE_TYPE &rh_lb,
> > + const REAL_VALUE_TYPE &rh_ub)
> > +  {
> > +if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
> > +   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
> > +  {
> > +   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
> > + return 1;
> > +   else
> > + return -1;
> > +  }
> > +return 0;
> > +  }
> > +
> > +  // Set [lb, ub] to [-0, -0], [-0, +0] or [+0, +0] depending on
> > +  // signbit_known.
> > +  static void zero_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
> > + int signbit_known)
> > +  {
> > +ub = lb = dconst0;
> > +if (signbit_known <= 0)
> > +  lb = real_value_negate (&dconst0);
> > +if (signbit_known < 0)
> > +  ub = lb;
> > +  }
> > +
> > +  // Set [lb, ub] to [-INF, -INF], [-INF, +INF] or [+INF, +INF] depending 
> > on
> > +  // signbit_known.
> > +  static void inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
> > +int signbit_known)
> > +  {
> > +if (signbit_known > 0)
> > +  ub = lb = dconstinf;
> > +else if (signbit_known < 0)
> > +  ub = lb = dconstninf;
> > +else
> > +  {
> > +   lb = dconstninf;
> > +   ub = dconstinf;
> > +  }
> > +  }
> > +
> > +  // Set [lb, ub] to [-INF, -0], [-INF, +INF] or [+0, +INF] depending on
> > +  // signbit_known.
> > +  static void zero_to_inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
> > +int signbit_known)
> > +  {
> > +if (signbit_known > 0)
> > +  {
> > +   lb = dconst0;
> > +   ub = dconstinf;
> > +  }
> > +else if (signbit_known < 0)
> > +  {
> > +   lb = dconstninf;
> > +   ub = real_value_negate (&dconst0);
> > +  }
> > +else
> > +  {
> > +   lb = dconstninf;
> > +   ub = dconstinf;
> > +  }
> > +  }
> 
> The above functions look like they could be useful outside of the mult/div
> implementation.  Perhaps put them in file scope, instead limiting it to
> foperator_mult_div_base?

Well, I didn't want to export them to everything and most of the file
works on franges, not on REAL_VALUE_TYPE pairs.  But sure, if there
are other uses, it can be moved elsewhere.

> > +  static void zero_to_inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE
> &ub,
> > +int signbit_known)
> > +  {
> > +if (signbit_known > 0)
> 
> The rest of frange uses bool for a sign.  Also, real_iszero, real_isinf,
> real_inf, etc all use bool sign.  Can you use a bool, or is there a reason
> for the int?

I need a tristate.  signbit is known and clear (this happens when
all the 4 bounds have the same sign), signbit is known and set
(this happens when one operand has signbit clear and the other signbit
set, or vice versa), or the state of the resulting signbit is unknown
(at least one operand has some values in the range with clear and others
with set signbit).

Jakub



Re: [PATCH] range-op: Implement floating point multiplication fold_range [PR107569]

2022-11-11 Thread Aldy Hernandez via Gcc-patches




On 11/11/22 11:47, Jakub Jelinek wrote:

On Fri, Nov 11, 2022 at 11:01:38AM +0100, Aldy Hernandez wrote:

I've tried following, but it suffers from various issues:
1) we don't handle __builtin_signbit (whatever) == 0 (or != 0) as guarantee
 that in the guarded code whatever has signbit 0 or 1


We have a range-op entry for __builtin_signbit in cfn_signbit.  Is this a
shortcoming of this code, or something else?


Dunno, I admit I haven't investigated it much.  I just saw it when putting
a breakpoint on the mult fold_range.


Could you send me a small testcase.  I can look into that.




2) __builtin_isinf (x) > 0 is lowered to x > DBL_MAX, but unfortunately we don't
 infer from that [INF,INF] range, but [DBL_MAX, INF] range
3) what I wrote above, I think we don't handle [0, 2] * [INF, INF] right but
 due to 2) we can't see it


Doesn't this boil down to a representation issue?  I wonder if we should
bite the bullet and tweak build_gt() and build_lt() to represent open
ranges.  In theory it should be one more/less ULP, while adjusting for
HONOR_INFINITIES.


At least with the exception of MODE_COMPOSITE_P, I think we don't need
to introduce open ranges (and who cares about MODE_COMPOSITE_P if it is
conservatively correct).
For other floats, I think
x > cst
is always equivalent to
x >= nextafter (cst, inf)
and
x < cst
is always equivalent to
x <= nextafter (cst, -inf)
except for the signed zero cases, which need a tiny bit more thought.
So, if we have
if (x > DBL_MAX)
then in code guarded by that we can just use [INF, INF] as range.


Yeah, yeah.  That's exactly what I meant... using nextafter.  I'll look 
into that, as there seems there's more than one issue related to our 
lack of precision in representing < and >.





If the signbit issue were resolved and we could represent > and < properly,
would that allow us to write proper testcases without having to write a
plug-in (which I assume is a lot harder)?


I don't know, we'd need to see.
First work out on all the issues that result on the testcase the operand
ranges aren't exactly what we want (whether on the testcase side or on the
range-ops side or wherever) and once that looks ok, see if the ranges
on the rN/sN vars are correct and if so, watch what hasn't been folded away
and why.
I think the plugin would be 100-200 lines of code and then we could just
write multiple testcases against the plugin.


If you think the plug-in will get better test coverage, by all means.  I 
was just trying to save you/us some work.


Andrew, do you have any thoughts on the plug-in?

Aldy



[PATCH] libstdc++: Set active union member in constexpr std::string [PR103295]

2022-11-11 Thread Nathaniel Shead via Gcc-patches
Hi,

Below is a patch to fix std::string in constexpr contexts on Clang. This
was originally fixed in the commits attached to PR103295, but a later
commit 98a0d72a seems to have mistakenly undone this.

Tested on x86_64-linux. Verified using clang-14 and clang-15 that the
fix works. I haven't added anything to the test suite, since this issue
is only detected by clang.

This is my first time contributing, so please let me know if I've done
anything wrong or missed something. Thanks!

Nathaniel

-- >8 --

Clang still complains about using std::string in constexpr contexts due
to the changes made in commit 98a0d72a. This patch ensures that we set
the active member of the union as according to [class.union.general] p6.

libstdc++-v3/ChangeLog:

PR libstdc++/103295
* include/bits/basic_string.h (_M_use_local_data): Set active
  member to _M_local_buf.

Signed-off-by: Nathaniel Shead 
---
 libstdc++-v3/include/bits/basic_string.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 9c2b57f5a1d..2790fd49b05 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -352,8 +352,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   {
 #if __cpp_lib_is_constant_evaluated
if (std::is_constant_evaluated())
- for (_CharT& __c : _M_local_buf)
-   __c = _CharT();
+ for (size_type i = 0; i <= _S_local_capacity; ++i)
+   _M_local_buf[i] = _CharT();
 #endif
return _M_local_data();
   }
-- 
2.34.1



[PATCH] range-op: Implement op[12]_range operators for {PLUS,MINUS,MULT,RDIV}_EXPR

2022-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 09, 2022 at 04:43:56PM +0100, Aldy Hernandez wrote:
> On Wed, Nov 9, 2022 at 3:58 PM Jakub Jelinek  wrote:
> >
> > On Wed, Nov 09, 2022 at 10:02:46AM +0100, Aldy Hernandez wrote:
> > > We can implement the op[12]_range entries for plus and minus in terms
> > > of each other.  These are adapted from the integer versions.
> >
> > I think for NANs the op[12]_range shouldn't act this way.
> > For the forward binary operations, we have the (maybe/known) NAN handling
> > of one or both NAN operands resulting in VARYING sign (maybe/known) NAN
> > result, that is the somehow the case for the reverse binary operations too,
> > if result is (maybe/known) NAN and the other op is not NAN, op is
> > VARYING sign (maybe/known) NAN, if other op is (maybe/known) NAN,
> > then op is VARYING sign maybe NAN (always maybe, never known).
> > But then for + we have the -INF + INF or vice versa into NAN, and that
> > is something that shouldn't be considered.  If result isn't NAN, then
> > neither operand can be NAN, regardless of whether result can be
> > +/- INF and the other op -/+ INF.
> 
> Heh.  I just ran into this while debugging the problem reported by Xi.
> 
> We are solving NAN = op1 - VARYING, and trying to do it with op1 = NAN
> + VARYING, which returns op1 = NAN (incorrectly).
> 
> I suppose in the above case op1 should ideally be
> [-INF,-INF][+INF,+INF]+-NAN, but since we can't represent that then
> [-INF,+INF] +-NAN, which is actually VARYING.  Do you agree?
> 
> I'm reverting this patch as attached, while I sort this out.

Here is my (so far only on the testcase tested) patch which reinstalls
your change, add the fixups I've talked about and also hooks up
reverse operators for MULT_EXPR/RDIV_EXPR.

2022-11-11  Aldy Hernandez  
Jakub Jelinek  

* range-op-float.cc (float_binary_op_range_finish): New function.
(foperator_plus::op1_range): New.
(foperator_plus::op2_range): New.
(foperator_minus::op1_range): New.
(foperator_minus::op2_range): New.
(foperator_mult::op1_range): New.
(foperator_mult::op2_range): New.
(foperator_div::op1_range): New.
(foperator_div::op2_range): New.

* gcc.c-torture/execute/ieee/inf-4.c: New test.

--- gcc/range-op-float.cc.jj2022-11-11 10:55:57.602617289 +0100
+++ gcc/range-op-float.cc   2022-11-11 12:32:19.378633983 +0100
@@ -1861,8 +1861,64 @@ foperator_unordered_equal::op1_range (fr
   return true;
 }
 
+// Final tweaks for float binary op op1_range/op2_range.
+
+static bool
+float_binary_op_range_finish (bool ret, frange &r, tree type,
+ const frange &lhs)
+{
+  if (!ret)
+return ret;
+
+  // If we get a known NAN from reverse op, it means either that
+  // the other operand was known NAN (in that case we know nothing),
+  // or the reverse operation introduced a known NAN.
+  // Say for lhs = op1 * op2 if lhs is [-0, +0] and op2 is too,
+  // 0 / 0 is known NAN.  Just punt in that case.
+  // Or if lhs is a known NAN, we also don't know anything.
+  if (r.known_isnan () || lhs.known_isnan ())
+{
+  r.set_varying (type);
+  return false;
+}
+
+  // If lhs isn't NAN, then neither operand could be NAN,
+  // even if the reverse operation does introduce a maybe_nan.
+  if (!lhs.maybe_isnan ())
+r.clear_nan ();
+  // If lhs is a maybe or known NAN, the operand could be
+  // NAN.
+  else
+r.update_nan ();
+  return true;
+}
+
 class foperator_plus : public range_operator_float
 {
+  using range_operator_float::op1_range;
+  using range_operator_float::op2_range;
+public:
+  virtual bool op1_range (frange &r, tree type,
+ const frange &lhs,
+ const frange &op2,
+ relation_trio = TRIO_VARYING) const final override
+  {
+if (lhs.undefined_p ())
+  return false;
+range_op_handler minus (MINUS_EXPR, type);
+if (!minus)
+  return false;
+return float_binary_op_range_finish (minus.fold_range (r, type, lhs, op2),
+r, type, lhs);
+  }
+  virtual bool op2_range (frange &r, tree type,
+ const frange &lhs,
+ const frange &op1,
+ relation_trio = TRIO_VARYING) const final override
+  {
+return op1_range (r, type, lhs, op1);
+  }
+private:
   void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
tree type,
const REAL_VALUE_TYPE &lh_lb,
@@ -1888,6 +1944,31 @@ class foperator_plus : public range_oper
 
 class foperator_minus : public range_operator_float
 {
+  using range_operator_float::op1_range;
+  using range_operator_float::op2_range;
+public:
+  virtual bool op1_range (frange &r, tree type,
+ const frange &lhs,
+ const frange &op2,
+ relation_trio = TRIO_VARYING) const final override
+  {
+if (lhs.u

[PATCH][GCC] aarch64: Add support for Cortex-A715 CPU.

2022-11-11 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This patch adds support for Cortex-A715 CPU.

Bootstrapped on aarch64-none-linux-gnu and found no regressions.

Ok for GCC master?

Regards,
Srinath.

gcc/ChangeLog:

2022-11-09  Srinath Parvathaneni  

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-A715 CPU.
* config/aarch64/aarch64-tune.md: Regenerate.
* 
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
Document Cortex-A715 CPU.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
e9a4b622be018d92a790db10f4d5cf926bba512c..380bd8d90fdc7bddea2c8465522a30f938c2ffc5
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -167,6 +167,8 @@ AARCH64_CORE("cortex-a510",  cortexa510, cortexa55, V9A,  
(SVE2_BITPERM, MEMTAG,
 
 AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, V9A,  (SVE2_BITPERM, 
MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd47, -1)
 
+AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,  (SVE2_BITPERM, 
MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4d, -1)
+
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
 
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
84e9bbf44f6222b3e5bcf4cbf8fab7ebf17015e1..f5b1482ba357d14f36e13ca3c4358865d4238e9a
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,neoversen2,demeter,neoversev2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,neoversen2,demeter,neoversev2"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
index 
c2b23a6ee97ef2b7c74119f22c1d3e3d85385f4d..2e1bd6dbfb1fcff53dd562ec5e8923d0a21cf715
 100644
--- 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
+++ 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
@@ -258,7 +258,8 @@ These options are defined for AArch64 implementations:
   :samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,
   :samp:`cortex-a75.cortex-a55`, :samp:`cortex-a76.cortex-a55`,
   :samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x2`,
-  :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`ampere1`, :samp:`native`.
+  :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`cortex-a715`, 
:samp:`ampere1`,
+  :samp:`native`.
 
   The values :samp:`cortex-a57.cortex-a53`, :samp:`cortex-a72.cortex-a53`,
   :samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,




[PATCH][GCC] aarch64: Add support for Cortex-X1C CPU.

2022-11-11 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This patch adds support for Cortex-X1C CPU.

Bootstrapped on aarch64-none-linux-gnu and found no regressions.

Ok for GCC master?

Regards,
Srinath.

gcc/ChangeLog:

2022-11-09  Srinath Parvathaneni  

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-X1C CPU.
* config/aarch64/aarch64-tune.md: Regenerate.
* 
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
Document Cortex-X1C CPU.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
380bd8d90fdc7bddea2c8465522a30f938c2ffc5..d2671778928678f1ab8f7e6c79c9721f3abe1f5c
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -110,6 +110,7 @@ AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  
(F16, RCPC, DOTPROD,
 AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa73, 0x41, 0xd06, -1)
 AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
 AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
+AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
 AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
f5b1482ba357d14f36e13ca3c4358865d4238e9a..22ec1be5a4c71b930221d2c4f1e62df57df0cadf
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,neoversen2,demeter,neoversev2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,neoversen2,demeter,neoversev2"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
index 
2e1bd6dbfb1fcff53dd562ec5e8923d0a21cf715..d97515d9e54feaa85a2ead4e9b73f0eb966cb39f
 100644
--- 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
+++ 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
@@ -257,7 +257,7 @@ These options are defined for AArch64 implementations:
   :samp:`cortex-a57.cortex-a53`, :samp:`cortex-a72.cortex-a53`,
   :samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,
   :samp:`cortex-a75.cortex-a55`, :samp:`cortex-a76.cortex-a55`,
-  :samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x2`,
+  :samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x1c`, :samp:`cortex-x2`,
   :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`cortex-a715`, 
:samp:`ampere1`,
   :samp:`native`.
 




[PATCH (pushed)] sphinx: stop using parallel mode

2022-11-11 Thread Martin Liška
Noticed that the documentation build can get stuck on a machine with
many cores (160), and I identified a real Sphinx problem:
https://github.com/sphinx-doc/sphinx/issues/10969

Note that parallelism only helps for some manuals, and it is not critical
for us.

ChangeLog:

* doc/Makefile: Disable -j auto.
---
 doc/Makefile | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/Makefile b/doc/Makefile
index 9e305a8e7da..e08a43ecf2d 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -2,7 +2,11 @@
 #
 
 # You can set these variables from the command line.
-SPHINXOPTS   ?= -j auto -q
+
+# Disable parallel reading as it can be very slow on a machine with CPUs:
+# https://github.com/sphinx-doc/sphinx/issues/10969
+
+SPHINXOPTS   ?= -q
 SPHINXBUILD  ?= sphinx-build
 PAPER?=
 SOURCEDIR = .
-- 
2.38.1



Re: [PATCH v2] match.pd: rewrite select to branchless expression

2022-11-11 Thread Michael Collison

Hi Prathamesh,

It is my understanding that INTEGRAL_TYPE_P applies to the other integer 
types you mentioned (char, short, long). In fact, the test function that 
motivated this match has a mixture of char and short and does not 
restrict matching.


On 11/11/22 02:44, Prathamesh Kulkarni wrote:

On Fri, 11 Nov 2022 at 07:58, Michael Collison  wrote:

This patch transforms ((x & 0x1) == 0) ? y : z <op> y into
(-(typeof(y))(x & 0x1) & z) <op> y, where <op> is a '^' or a '|'. It also
transforms (cond (and (x, 0x1) != 0), (z <op> y), y) into (-(and (x,
0x1)) & z) <op> y.

Matching these patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-10  Michael Collison  

  * match.pd ((x & 0x1) == 0) ? y : z <op> y
  -> (-(typeof(y))(x & 0x1) & z) <op> y.

2022-11-10  Michael Collison 

  * gcc.dg/tree-ssa/branchless-cond.c: New test.

---

Changes in v2:

- Rewrite comment to use C syntax

- Guard against 1-bit types

- Simplify pattern by using zero_one_valued_p

   gcc/match.pd  | 24 +
   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
   2 files changed, 50 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..258531e9046 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
 (max @2 @1))

+/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq zero_one_valued_p@0
+integer_zerop)
+@1
+(op:c @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > 1
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
+   (op (bit_and (negate (convert:type @0)) @2) @1
+
+/* ((x & 0x1) == 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne zero_one_valued_p@0
+integer_zerop)
+   (op:c @2 @1)
+@1)
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > 1
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0
+   (op (bit_and (negate (convert:type @0)) @2) @1
+
   /* Simplifications of shift and rotates.  */

   (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}

Sorry to nitpick -- Since the pattern gates on INTEGRAL_TYPE_P, would
it be a good idea
to have these tests for other integral types too besides int like
{char, short, long} ?

Thanks,
Prathamesh

+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1



[PATCH] fix small const data for riscv

2022-11-11 Thread Oria Chen via Gcc-patches
gcc/testsuite ChangeLog:

2022-11-11  Oria Chen  

* gcc.dg/pr25521.c: Add compile option
"-msmall-data-limit=0" to avoid using the .srodata section.
---
 gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..628ddf1a761 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -2,7 +2,8 @@
sections.
 
{ dg-require-effective-target elf }
-   { dg-do compile } */
+   { dg-do compile }
+   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
 
 const volatile int foo = 30;
 
-- 
2.37.2



Re:[PATCH 1/1] RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

2022-11-11 Thread shihua
LGTM, and I think it would be better to have a test example.





> From: zengxiao 
>
> This patch makes R_RISCV_SUB6 conform to the RISC-V ABI standard:
> for R_RISCV_SUB6, only the lower 6 bits of the relocation are valid.
> The proposed specification can be found in section 8.5 (Relocations) of
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/v1.0-rc4/riscv-abi.pdf
>
> bfd/ChangeLog:
>
> * elfxx-riscv.c (riscv_elf_add_sub_reloc): Take the lower
> 6 bits as the significant bits.
>
> reviewed-by: gao...@eswincomputing.com
>  jinyanji...@eswincomputing.com
>
> Signed-off-by: zengxiao 
> ---
>  bfd/elfxx-riscv.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/bfd/elfxx-riscv.c b/bfd/elfxx-riscv.c
> index f0c91cc97f7..0fbfedd17fe 100644
> --- a/bfd/elfxx-riscv.c
> +++ b/bfd/elfxx-riscv.c
> @@ -994,6 +994,13 @@ riscv_elf_add_sub_reloc (bfd *abfd,
>relocation = old_value + relocation;
>break;
>  case R_RISCV_SUB6:
> +  {
> +bfd_vma six_bit_valid_value = old_value & howto->dst_mask;
> +six_bit_valid_value -= relocation;
> +relocation = (six_bit_valid_value & howto->dst_mask) |
> +  (old_value & ~howto->dst_mask);
> +  }
> +  break;
>  case R_RISCV_SUB8:
>  case R_RISCV_SUB16:
>  case R_RISCV_SUB32:
> -- 
> 2.34.1










RE: [PATCH][GCC] aarch64: Add support for Cortex-X1C CPU.

2022-11-11 Thread Kyrylo Tkachov via Gcc-patches
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: Friday, November 11, 2022 12:11 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [PATCH][GCC] aarch64: Add support for Cortex-X1C CPU.
> 
> Hi,
> 
> This patch adds support for Cortex-X1C CPU.
> 
> Bootstrapped on aarch64-none-linux-gnu and found no regressions.
> 
> Ok for GCC master?
> 

Ok.
Thanks,
Kyrill

> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2022-11-09  Srinath Parvathaneni  
> 
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-X1C
> CPU.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/gcc/gcc-command-options/machine-dependent-options/aarch64-
> options.rst:
> Document Cortex-X1C CPU.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index
> 380bd8d90fdc7bddea2c8465522a30f938c2ffc5..d2671778928678f1ab8f7e6c7
> 9c9721f3abe1f5c 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -110,6 +110,7 @@ AARCH64_CORE("cortex-a78c",  cortexa78c,
> cortexa57, V8_2A,  (F16, RCPC, DOTPROD,
>  AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), cortexa73, 0x41, 0xd06, -1)
>  AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16,
> RCPC, DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
>  AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
> +AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
>  AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD,
> PROFILE), neoversen1, 0x41, 0xd0c, -1)
>  AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
>  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
> diff --git a/gcc/config/aarch64/aarch64-tune.md
> b/gcc/config/aarch64/aarch64-tune.md
> index
> f5b1482ba357d14f36e13ca3c4358865d4238e9a..22ec1be5a4c71b930221d2c4
> f1e62df57df0cadf 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -
>   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> derx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunder
> xt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,t
> hunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa
> 76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,co
> rtexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,oc
> teontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thun
> derx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cort
> exa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55
> ,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,n
> eoversen2,demeter,neoversev2"
> +
>   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> derx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunder
> xt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,t
> hunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa
> 76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,co
> rtexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeo
> ntx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,ts
> v110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cort
> exa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa7
> 5cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715
> ,cortexx2,neoversen2,demeter,neoversev2"
>   (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-
> dependent-options/aarch64-options.rst
> index
> 2e1bd6dbfb1fcff53dd562ec5e8923d0a21cf715..d97515d9e54feaa85a2ead4e
> 9b73f0eb966cb39f 100644
> --- a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> +++ b/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> @@ -257,7 +257,7 @@ These options are defined for AArch64
> implementations:
>:samp:`cortex-a57.cortex-a53`, :samp:`cortex-a72.cortex-a53`,
>:samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,
>:samp:`cortex-a75.cortex-a55`, :samp:`cortex-a76.cortex-a55`,
> -  :samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x2`,
> +  :samp:`cortex-

[PATCH (pushed)] jit: doc: Use shared Indices and tables

2022-11-11 Thread Martin Liška
Apart from that, stop using trailing .rst suffixes in toctree entries.

ChangeLog:

* doc/indices-and-tables.rst: Rename Indexes to Indices.

gcc/jit/ChangeLog:

* doc/cp/index.rst: Remove trailing .rst in toctree.
* doc/cp/intro/index.rst: Likewise.
* doc/cp/topics/index.rst: Likewise.
* doc/index.rst: Likewise.
* doc/intro/index.rst: Likewise.
* doc/topics/index.rst: Likewise.
* doc/indices-and-tables.rst: New file.
---
 doc/indices-and-tables.rst |  2 +-
 gcc/jit/doc/cp/index.rst   |  4 ++--
 gcc/jit/doc/cp/intro/index.rst |  8 
 gcc/jit/doc/cp/topics/index.rst| 16 
 gcc/jit/doc/index.rst  | 15 +--
 gcc/jit/doc/indices-and-tables.rst |  1 +
 gcc/jit/doc/intro/index.rst| 10 +-
 gcc/jit/doc/topics/index.rst   | 22 +++---
 8 files changed, 37 insertions(+), 41 deletions(-)
 create mode 100644 gcc/jit/doc/indices-and-tables.rst

diff --git a/doc/indices-and-tables.rst b/doc/indices-and-tables.rst
index bf62509bd14..56b33139280 100644
--- a/doc/indices-and-tables.rst
+++ b/doc/indices-and-tables.rst
@@ -1,6 +1,6 @@
 .. only:: html
 
-  Indexes and tables
+  Indices and tables
   ==
 
   :ref:`genindex`
diff --git a/gcc/jit/doc/cp/index.rst b/gcc/jit/doc/cp/index.rst
index 46efb8a516f..00263b6fe72 100644
--- a/gcc/jit/doc/cp/index.rst
+++ b/gcc/jit/doc/cp/index.rst
@@ -33,5 +33,5 @@ Contents:
 .. toctree::
:maxdepth: 2
 
-   intro/index.rst
-   topics/index.rst
+   intro/index
+   topics/index
diff --git a/gcc/jit/doc/cp/intro/index.rst b/gcc/jit/doc/cp/intro/index.rst
index e6812101c6f..3a6a26c943e 100644
--- a/gcc/jit/doc/cp/intro/index.rst
+++ b/gcc/jit/doc/cp/intro/index.rst
@@ -21,7 +21,7 @@ Tutorial
 .. toctree::
:maxdepth: 2
 
-   tutorial01.rst
-   tutorial02.rst
-   tutorial03.rst
-   tutorial04.rst
+   tutorial01
+   tutorial02
+   tutorial03
+   tutorial04
diff --git a/gcc/jit/doc/cp/topics/index.rst b/gcc/jit/doc/cp/topics/index.rst
index cdf7e55a6c8..e659ece3fb1 100644
--- a/gcc/jit/doc/cp/topics/index.rst
+++ b/gcc/jit/doc/cp/topics/index.rst
@@ -21,11 +21,11 @@ Topic Reference
 .. toctree::
:maxdepth: 2
 
-   contexts.rst
-   objects.rst
-   types.rst
-   expressions.rst
-   functions.rst
-   locations.rst
-   compilation.rst
-   asm.rst
+   contexts
+   objects
+   types
+   expressions
+   functions
+   locations
+   compilation
+   asm
diff --git a/gcc/jit/doc/index.rst b/gcc/jit/doc/index.rst
index 0f575966303..a354d1c1501 100644
--- a/gcc/jit/doc/index.rst
+++ b/gcc/jit/doc/index.rst
@@ -33,14 +33,9 @@ Contents:
 .. toctree::
:maxdepth: 2
 
-   intro/index.rst
-   topics/index.rst
-   cp/index.rst
-   internals/index.rst
+   intro/index
+   topics/index
+   cp/index
+   internals/index
 
-
-Indices and tables
-==
-
-* :ref:`genindex`
-* :ref:`search`
+   indices-and-tables
diff --git a/gcc/jit/doc/indices-and-tables.rst 
b/gcc/jit/doc/indices-and-tables.rst
new file mode 100644
index 000..5cc3191ee47
--- /dev/null
+++ b/gcc/jit/doc/indices-and-tables.rst
@@ -0,0 +1 @@
+.. include:: ../../../doc/indices-and-tables.rst
diff --git a/gcc/jit/doc/intro/index.rst b/gcc/jit/doc/intro/index.rst
index 552a6ed4417..20a813e7a96 100644
--- a/gcc/jit/doc/intro/index.rst
+++ b/gcc/jit/doc/intro/index.rst
@@ -21,8 +21,8 @@ Tutorial
 .. toctree::
:maxdepth: 2
 
-   tutorial01.rst
-   tutorial02.rst
-   tutorial03.rst
-   tutorial04.rst
-   tutorial05.rst
+   tutorial01
+   tutorial02
+   tutorial03
+   tutorial04
+   tutorial05
diff --git a/gcc/jit/doc/topics/index.rst b/gcc/jit/doc/topics/index.rst
index 8e843c207fc..39462d9e828 100644
--- a/gcc/jit/doc/topics/index.rst
+++ b/gcc/jit/doc/topics/index.rst
@@ -21,14 +21,14 @@ Topic Reference
 .. toctree::
:maxdepth: 2
 
-   contexts.rst
-   objects.rst
-   types.rst
-   expressions.rst
-   functions.rst
-   function-pointers.rst
-   locations.rst
-   compilation.rst
-   compatibility.rst
-   performance.rst
-   asm.rst
+   contexts
+   objects
+   types
+   expressions
+   functions
+   function-pointers
+   locations
+   compilation
+   compatibility
+   performance
+   asm
-- 
2.38.1



[PATCH] tree-optimization/107618 - enhance copy propagation of constants

2022-11-11 Thread Richard Biener via Gcc-patches
The following enhances copy propagation of constants to also see
through simple operations like conversions but also operations with
otherwise constant operands.  That's required to fulfill the promise

  /* Copy propagation also copy-propagates constants, this is necessary
 to forward object-size and builtin folding results properly.  */
  NEXT_PASS (pass_copy_prop);

and avoid false diagnostics as shown in the testcase.  We're
using gimple_fold_stmt_to_constant_1 without following SSA edges
and accordingly adjust which stmts we simulate during SSA propagation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107618
* tree-ssa-copy.cc (stmt_may_generate_copy): Simulate all
assignments with a single SSA use.
(copy_prop_visit_assignment): Use gimple_fold_stmt_to_constant_1
to perform simple constant folding.
(copy_prop::visit_stmt): Visit all assignments.

* gcc.dg/pr107618.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr107618.c | 10 +++
 gcc/tree-ssa-copy.cc| 49 +
 2 files changed, 35 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr107618.c

diff --git a/gcc/testsuite/gcc.dg/pr107618.c b/gcc/testsuite/gcc.dg/pr107618.c
new file mode 100644
index 000..9e73cc1f1a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107618.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Og" } */
+
+void a(void) __attribute__((__warning__("")));
+int main(void)
+{
+  unsigned long b = __builtin_object_size(0, 0);
+  if (__builtin_expect(b < 1, 0))
+a(); /* { dg-bogus "warning" } */
+}
diff --git a/gcc/tree-ssa-copy.cc b/gcc/tree-ssa-copy.cc
index 782ceb500cc..811161c223e 100644
--- a/gcc/tree-ssa-copy.cc
+++ b/gcc/tree-ssa-copy.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-loop-niter.h"
+#include "gimple-fold.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -99,12 +100,16 @@ stmt_may_generate_copy (gimple *stmt)
   if (gimple_vuse (stmt))
 return false;
 
+  /* If the assignment is from a constant it generates a useful copy.  */
+  if (gimple_assign_single_p (stmt)
+  && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
+return true;
+
   /* Otherwise, the only statements that generate useful copies are
- assignments whose RHS is just an SSA name that doesn't flow
- through abnormal edges.  */
-  return ((gimple_assign_rhs_code (stmt) == SSA_NAME
-  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_assign_rhs1 (stmt)))
- || is_gimple_min_invariant (gimple_assign_rhs1 (stmt)));
+ assignments whose single SSA use doesn't flow through abnormal
+ edges.  */
+  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
+  return (rhs && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs));
 }
 
 
@@ -197,26 +202,24 @@ dump_copy_of (FILE *file, tree var)
 static enum ssa_prop_result
 copy_prop_visit_assignment (gimple *stmt, tree *result_p)
 {
-  tree lhs, rhs;
-
-  lhs = gimple_assign_lhs (stmt);
-  rhs = valueize_val (gimple_assign_rhs1 (stmt));
-
-  if (TREE_CODE (lhs) == SSA_NAME)
+  tree lhs = gimple_assign_lhs (stmt);
+  tree rhs = gimple_fold_stmt_to_constant_1 (stmt, valueize_val);
+  if (rhs
+  && (TREE_CODE (rhs) == SSA_NAME
+ || is_gimple_min_invariant (rhs)))
 {
-  /* Straight copy between two SSA names.  First, make sure that
+  /* Straight copy between two SSA names or a constant.  Make sure that
 we can propagate the RHS into uses of LHS.  */
   if (!may_propagate_copy (lhs, rhs))
-   return SSA_PROP_VARYING;
-
-  *result_p = lhs;
-  if (set_copy_of_val (*result_p, rhs))
-   return SSA_PROP_INTERESTING;
-  else
-   return SSA_PROP_NOT_INTERESTING;
+   rhs = lhs;
 }
+  else
+rhs = lhs;
 
-  return SSA_PROP_VARYING;
+  *result_p = lhs;
+  if (set_copy_of_val (*result_p, rhs))
+return SSA_PROP_INTERESTING;
+  return rhs != lhs ? SSA_PROP_NOT_INTERESTING : SSA_PROP_VARYING;
 }
 
 
@@ -282,10 +285,8 @@ copy_prop::visit_stmt (gimple *stmt, edge *taken_edge_p, 
tree *result_p)
   fprintf (dump_file, "\n");
 }
 
-  if (gimple_assign_single_p (stmt)
-  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-  && (TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME
- || is_gimple_min_invariant (gimple_assign_rhs1 (stmt
+  if (is_gimple_assign (stmt)
+  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
 {
   /* If the statement is a copy assignment, evaluate its RHS to
 see if the lattice value of its output has changed.  */
-- 
2.35.3


[PATCH 0/8] middle-end: Popcount and clz/ctz idiom recognition improvements

2022-11-11 Thread Andrew Carlotti via Gcc-patches
This is a series of patches to improve recognition of popcount and
clz/ctz idioms, along with some related fixes.

- Patches 1 and 8 are independent fixes or improvements.
- Patch 4 is a dependency of patch 5, as it improves the robustness of a
  test that would otherwise begin failing.
- Patches 2, 3, 5 and 7 form the main dependent sequence.
- Patch 6 is a documentation update, covering attributes in patch 5 and
  existing code.
- Patch 7 may require other work before it can be merged, as it seems to
  expose a latent issue in the vectoriser.

Each patch has been bootstrapped and regression tested on
aarch64-none-linux-gnu.


[PATCH 0/8] middle-end: Ensure at_stmt is defined before an early exit

2022-11-11 Thread Andrew Carlotti via Gcc-patches
This prevents a null dereference error when outputting debug information
following an early exit from number_of_iterations_exit_assumptions.

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (number_of_iterations_exit_assumptions):
Move at_stmt assignment.


--


diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 
4ffcef4f4ff2fe182fbe711553c8e4575560ab07..cdbb924216243ebcabe6c695698a4aee71882c49
 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -2537,6 +2537,9 @@ number_of_iterations_exit_assumptions (class loop *loop, 
edge exit,
   if (!stmt)
 return false;
 
+  if (at_stmt)
+*at_stmt = stmt;
+
   /* We want the condition for staying inside loop.  */
   code = gimple_cond_code (stmt);
   if (exit->flags & EDGE_TRUE_VALUE)
@@ -2642,9 +2645,6 @@ number_of_iterations_exit_assumptions (class loop *loop, 
edge exit,
   if (TREE_CODE (niter->niter) == INTEGER_CST)
 niter->max = wi::to_widest (niter->niter);
 
-  if (at_stmt)
-*at_stmt = stmt;
-
   return (!integer_zerop (niter->assumptions));
 }


[PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-11 Thread Martin Liška
Similarly to the other manuals, we should include the Indices and Tables page
in the HTML builder.

What do the Ada folks think about it?

Thanks,
Martin

gcc/ada/ChangeLog:

* doc/gnat-style/index.rst: Add Indices and Tables.
* doc/gnat_rm/index.rst: Likewise.
* doc/gnat_ugn/index.rst: Likewise.
* doc/gnat-style/indices-and-tables.rst: New file.
* doc/gnat_rm/indices-and-tables.rst: New file.
* doc/gnat_ugn/indices-and-tables.rst: New file.
---
 gcc/ada/doc/gnat-style/index.rst  | 1 +
 gcc/ada/doc/gnat-style/indices-and-tables.rst | 1 +
 gcc/ada/doc/gnat_rm/index.rst | 1 +
 gcc/ada/doc/gnat_rm/indices-and-tables.rst| 1 +
 gcc/ada/doc/gnat_ugn/index.rst| 2 +-
 gcc/ada/doc/gnat_ugn/indices-and-tables.rst   | 1 +
 6 files changed, 6 insertions(+), 1 deletion(-)
 create mode 100644 gcc/ada/doc/gnat-style/indices-and-tables.rst
 create mode 100644 gcc/ada/doc/gnat_rm/indices-and-tables.rst
 create mode 100644 gcc/ada/doc/gnat_ugn/indices-and-tables.rst

diff --git a/gcc/ada/doc/gnat-style/index.rst b/gcc/ada/doc/gnat-style/index.rst
index b9428749f1f..998be2b25c8 100644
--- a/gcc/ada/doc/gnat-style/index.rst
+++ b/gcc/ada/doc/gnat-style/index.rst
@@ -689,3 +689,4 @@ Program Structure and Compilation Issues
 
 .. toctree::
gnu_free_documentation_license
+   indices-and-tables
\ No newline at end of file
diff --git a/gcc/ada/doc/gnat-style/indices-and-tables.rst 
b/gcc/ada/doc/gnat-style/indices-and-tables.rst
new file mode 100644
index 000..8d84ef9d4ec
--- /dev/null
+++ b/gcc/ada/doc/gnat-style/indices-and-tables.rst
@@ -0,0 +1 @@
+.. include:: ../../../../doc/indices-and-tables.rst
diff --git a/gcc/ada/doc/gnat_rm/index.rst b/gcc/ada/doc/gnat_rm/index.rst
index 372c2100e51..5930498c5f3 100644
--- a/gcc/ada/doc/gnat_rm/index.rst
+++ b/gcc/ada/doc/gnat_rm/index.rst
@@ -68,3 +68,4 @@ GNAT Reference Manual
:maxdepth: 3
 
gnu_free_documentation_license
+   indices-and-tables
diff --git a/gcc/ada/doc/gnat_rm/indices-and-tables.rst 
b/gcc/ada/doc/gnat_rm/indices-and-tables.rst
new file mode 100644
index 000..8d84ef9d4ec
--- /dev/null
+++ b/gcc/ada/doc/gnat_rm/indices-and-tables.rst
@@ -0,0 +1 @@
+.. include:: ../../../../doc/indices-and-tables.rst
diff --git a/gcc/ada/doc/gnat_ugn/index.rst b/gcc/ada/doc/gnat_ugn/index.rst
index d3d1dac3569..11c5973bc36 100644
--- a/gcc/ada/doc/gnat_ugn/index.rst
+++ b/gcc/ada/doc/gnat_ugn/index.rst
@@ -59,5 +59,5 @@ GNAT User's Guide for Native Platforms
C. Elaboration Order Handling in GNAT 
D. Inline Assembler 
E. GNU Free Documentation License 
-
+   F. Indices and Tables 
 
diff --git a/gcc/ada/doc/gnat_ugn/indices-and-tables.rst 
b/gcc/ada/doc/gnat_ugn/indices-and-tables.rst
new file mode 100644
index 000..8d84ef9d4ec
--- /dev/null
+++ b/gcc/ada/doc/gnat_ugn/indices-and-tables.rst
@@ -0,0 +1 @@
+.. include:: ../../../../doc/indices-and-tables.rst
-- 
2.38.1



Re: [PATCH] 1/19 modula2 front end: changes outside gcc/m2, libgm2 and gcc/testsuite.

2022-11-11 Thread Richard Biener via Gcc-patches
On Mon, Oct 10, 2022 at 5:36 PM Gaius Mulley via Gcc-patches
 wrote:
>
>
>
> This patch set contains the non-machine-generated changes found in /,
> for example the language DIE and documentation changes.  It also
> contains the changes to the top-level build Makefile infrastructure
> and the install.texi and sourcebuild.texi documentation.

I couldn't spot any issue besides the docs now being written in
Sphinx, so this part
is OK (with the docs ported)

Thanks,
Richard.

>
> --8<--8<--8<--8<--8<--8<
> diff -ruw gcc-git-master/configure.ac gcc-git-devel-modula2/configure.ac
> --- gcc-git-master/configure.ac 2022-10-07 20:21:09.001978462 +0100
> +++ gcc-git-devel-modula2/configure.ac  2022-10-07 20:21:18.522095368 +0100
> @@ -140,7 +140,7 @@
>  # binutils, gas and ld appear in that order because it makes sense to run
>  # "make check" in that particular order.
>  # If --enable-gold is used, "gold" may replace "ld".
> -host_tools="texinfo flex bison binutils gas ld fixincludes gcc cgen sid sim 
> gdb gdbserver gprof etc expect dejagnu m4 utils guile fastjar gnattools 
> libcc1 gotools c++tools"
> +host_tools="texinfo flex bison binutils gas ld fixincludes gcc cgen sid sim 
> gdb gdbserver gprof etc expect dejagnu m4 utils guile fastjar gnattools 
> libcc1 gm2tools gotools c++tools"
>
>  # these libraries are built for the target environment, and are built after
>  # the host libraries and the host tools (which may be a cross compiler)
> @@ -162,6 +162,7 @@
> target-libffi \
> target-libobjc \
> target-libada \
> +   target-libgm2 \
> target-libgo \
> target-libphobos \
> target-zlib"
> @@ -459,6 +460,14 @@
>noconfigdirs="$noconfigdirs gnattools"
>  fi
>
> +AC_ARG_ENABLE(libgm2,
> +[AS_HELP_STRING([--enable-libgm2], [build libgm2 directory])],
> +ENABLE_LIBGM2=$enableval,
> +ENABLE_LIBGM2=no)
> +if test "${ENABLE_LIBGM2}" != "yes" ; then
> +  noconfigdirs="$noconfigdirs gm2tools"
> +fi
> +
>  AC_ARG_ENABLE(libssp,
>  [AS_HELP_STRING([--enable-libssp], [build libssp directory])],
>  ENABLE_LIBSSP=$enableval,
> @@ -3616,6 +3625,7 @@
>  NCN_STRICT_CHECK_TARGET_TOOLS(GFORTRAN_FOR_TARGET, gfortran)
>  NCN_STRICT_CHECK_TARGET_TOOLS(GOC_FOR_TARGET, gccgo)
>  NCN_STRICT_CHECK_TARGET_TOOLS(GDC_FOR_TARGET, gdc)
> +NCN_STRICT_CHECK_TARGET_TOOLS(GM2_FOR_TARGET, gm2)
>
>  ACX_CHECK_INSTALLED_TARGET_TOOL(AR_FOR_TARGET, ar)
>  ACX_CHECK_INSTALLED_TARGET_TOOL(AS_FOR_TARGET, as)
> @@ -3654,6 +3664,8 @@
> [gcc/gccgo -B$$r/$(HOST_SUBDIR)/gcc/], go)
>  GCC_TARGET_TOOL(gdc, GDC_FOR_TARGET, GDC,
> [gcc/gdc -B$$r/$(HOST_SUBDIR)/gcc/], d)
> +GCC_TARGET_TOOL(gm2, GM2_FOR_TARGET, GM2,
> +   [gcc/gm2 -B$$r/$(HOST_SUBDIR)/gcc/], m2)
>  GCC_TARGET_TOOL(ld, LD_FOR_TARGET, LD, [ld/ld-new])
>  GCC_TARGET_TOOL(lipo, LIPO_FOR_TARGET, LIPO)
>  GCC_TARGET_TOOL(nm, NM_FOR_TARGET, NM, [binutils/nm-new])
> @@ -3780,6 +3792,9 @@
>  # Specify what files to not compare during bootstrap.
>
>  compare_exclusions="gcc/cc*-checksum\$(objext) | gcc/ada/*tools/*"
> +compare_exclusions="$compare_exclusions | 
> gcc/m2/gm2-compiler-boot/M2Version*"
> +compare_exclusions="$compare_exclusions | gcc/m2/gm2-compiler-boot/SYSTEM*"
> +compare_exclusions="$compare_exclusions | gcc/m2/gm2version*"
>  case "$target" in
>hppa*64*-*-hpux*) ;;
>powerpc*-ibm-aix*) compare_exclusions="$compare_exclusions | 
> *libgomp*\$(objext)" ;;
> diff -ruw gcc-git-master/gcc/doc/sourcebuild.texi 
> gcc-git-devel-modula2/gcc/doc/sourcebuild.texi
> --- gcc-git-master/gcc/doc/sourcebuild.texi 2022-10-07 20:21:09.761987791 
> +0100
> +++ gcc-git-devel-modula2/gcc/doc/sourcebuild.texi  2022-10-07 
> 20:21:18.606096399 +0100
> @@ -97,6 +97,9 @@
>  @item libgfortran
>  The Fortran runtime library.
>
> +@item libgm2
> +The Modula-2 runtime library.
> +
>  @item libgo
>  The Go runtime library.  The bulk of this library is mirrored from the
>  @uref{https://github.com/@/golang/go, master Go repository}.
> @@ -187,13 +190,12 @@
>  @item @var{language}
>  Subdirectories for various languages.  Directories containing a file
>  @file{config-lang.in} are language subdirectories.  The contents of
> -the subdirectories @file{c} (for C), @file{cp} (for C++),
> -@file{objc} (for Objective-C), @file{objcp} (for Objective-C++),
> -and @file{lto} (for LTO) are documented in this
> -manual (@pxref{Passes, , Passes and Files of the Compiler});
> -those for other languages are not.  @xref{Front End, ,
> -Anatomy of a Language Front End}, for details of the files in these
> -directories.
> +the subdirectories @file{c} (for C), @file{cp} (for C++), @file{m2}
> +(for Modula-2), @file{objc} (for Objective-C), @file{objcp} (for
> +Objective-C++), and @file{lto} (for LTO) are documented in this manual
> +(@pxref{Passes, , Passes and Files of the Compiler}); those for other
> +languages are not.  @xref{F

[PATCH 2/8] middle-end: Remove prototype for number_of_iterations_popcount

2022-11-11 Thread Andrew Carlotti via Gcc-patches
gcc/ChangeLog:

* tree-ssa-loop-niter.cc (ssa_defined_by_minus_one_stmt_p): Move.
(number_of_iterations_popcount): Move, and remove separate prototype.


--


diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 
cdbb924216243ebcabe6c695698a4aee71882c49..c23643fd9dd8b27ff11549e1f28f585534e84cd3
 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -63,11 +63,6 @@ struct bounds
   mpz_t below, up;
 };
 
-static bool number_of_iterations_popcount (loop_p loop, edge exit,
-  enum tree_code code,
-  class tree_niter_desc *niter);
-
-
 /* Splits expression EXPR to a variable part VAR and constant OFFSET.  */
 
 static void
@@ -2031,6 +2026,200 @@ number_of_iterations_cond (class loop *loop,
   return ret;
 }
 
+/* Utility function to check if OP is defined by a stmt
+   that is a val - 1.  */
+
+static bool
+ssa_defined_by_minus_one_stmt_p (tree op, tree val)
+{
+  gimple *stmt;
+  return (TREE_CODE (op) == SSA_NAME
+ && (stmt = SSA_NAME_DEF_STMT (op))
+ && is_gimple_assign (stmt)
+ && (gimple_assign_rhs_code (stmt) == PLUS_EXPR)
+ && val == gimple_assign_rhs1 (stmt)
+ && integer_minus_onep (gimple_assign_rhs2 (stmt)));
+}
+
+/* See if LOOP is a popcount implementation, determine NITER for the loop
+
+   We match:
+   
+   goto 
+
+   
+   _1 = b_11 + -1
+   b_6 = _1 & b_11
+
+   
+   b_11 = PHI 
+
+   exit block
+   if (b_11 != 0)
+   goto 
+   else
+   goto 
+
+   OR we match copy-header version:
+   if (b_5 != 0)
+   goto 
+   else
+   goto 
+
+   
+   b_11 = PHI 
+   _1 = b_11 + -1
+   b_6 = _1 & b_11
+
+   exit block
+   if (b_6 != 0)
+   goto 
+   else
+   goto 
+
+   If popcount pattern, update NITER accordingly.
+   i.e., set NITER to  __builtin_popcount (b)
+   return true if we did, false otherwise.
+
+ */
+
+static bool
+number_of_iterations_popcount (loop_p loop, edge exit,
+  enum tree_code code,
+  class tree_niter_desc *niter)
+{
+  bool adjust = true;
+  tree iter;
+  HOST_WIDE_INT max;
+  adjust = true;
+  tree fn = NULL_TREE;
+
+  /* Check loop terminating branch is like
+ if (b != 0).  */
+  gimple *stmt = last_stmt (exit->src);
+  if (!stmt
+  || gimple_code (stmt) != GIMPLE_COND
+  || code != NE_EXPR
+  || !integer_zerop (gimple_cond_rhs (stmt))
+  || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME)
+return false;
+
+  gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+
+  /* Depending on whether copy-header was performed, feeding PHI stmts
+ might be in the loop header or loop latch, handle this.  */
+  if (gimple_code (and_stmt) == GIMPLE_PHI
+  && gimple_bb (and_stmt) == loop->header
+  && gimple_phi_num_args (and_stmt) == 2
+  && (TREE_CODE (gimple_phi_arg_def (and_stmt,
+loop_latch_edge (loop)->dest_idx))
+ == SSA_NAME))
+{
+  /* SSA used in exit condition is defined by PHI stmt
+   b_11 = PHI 
+   from the PHI stmt, get the and_stmt
+   b_6 = _1 & b_11.  */
+  tree t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+  and_stmt = SSA_NAME_DEF_STMT (t);
+  adjust = false;
+}
+
+  /* Make sure it is indeed an and stmt (b_6 = _1 & b_11).  */
+  if (!is_gimple_assign (and_stmt)
+  || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+return false;
+
+  tree b_11 = gimple_assign_rhs1 (and_stmt);
+  tree _1 = gimple_assign_rhs2 (and_stmt);
+
+  /* Check that _1 is defined by _b11 + -1 (_1 = b_11 + -1).
+ Also make sure that b_11 is the same in and_stmt and _1 defining stmt.
+ Also canonicalize if _1 and _b11 are reversed.  */
+  if (ssa_defined_by_minus_one_stmt_p (b_11, _1))
+std::swap (b_11, _1);
+  else if (ssa_defined_by_minus_one_stmt_p (_1, b_11))
+;
+  else
+return false;
+  /* Check the recurrence:
+   ... = PHI .  */
+  gimple *phi = SSA_NAME_DEF_STMT (b_11);
+  if (gimple_code (phi) != GIMPLE_PHI
+  || (gimple_bb (phi) != loop_latch_edge (loop)->dest)
+  || (gimple_assign_lhs (and_stmt)
+ != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx)))
+return false;
+
+  /* We found a match. Get the corresponding popcount builtin.  */
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  if (TYPE_PRECISION (TREE_TYPE (src)) <= TYPE_PRECISION (integer_type_node))
+fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  else if (TYPE_PRECISION (TREE_TYPE (src))
+  == TYPE_PRECISION (long_integer_type_node))
+fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
+  else if (TYPE_PRECISION (TREE_TYPE (src))
+  == TYPE_PRECISION (long_long_integer_type_node)
+  || (TYPE_PRECISION (TREE_TYPE (src))
+  == 2 * TYPE_PRECISION (long_long_integer_type_node)))
+fn = builtin_decl_implicit

[COMMITTED] [range-ops] Update known bitmasks using CCP for all operators.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
Use bit-CCP to calculate bitmasks for all integer operators, instead
of the half-assed job we were doing with just a handful of operators.

This sets us up nicely for tracking known-one bitmasks in the next
release, as all we'll have to do is just store them in the irange.

All in all, this series of patches incurs a 1.9% penalty to VRP, with
no measurable difference in overall compile time.  The reason is
three-fold:

(a) There's double dispatch going on.  First, the dispatch for the
range-ops virtuals, and now the switch in bit_value_binop.

(b) The maybe nonzero mask is stored as a tree and there is an endless
back and forth with wide-ints.  This will be a non-issue next release,
when we convert irange to wide-ints.

(c) New functionality has a cost.  We were handling 2 cases (plus
casts).  Now we handle 20.

I can play around with moving the bit_value_binop cases into inlined
methods in the different range-op entries, and see if that improves
anything, but I doubt (a) buys us that much.  Certainly something that
can be done in stage3 if it's measurable in any significant way.

P.S. It would be nice in the future to teach the op[12]_range methods about
the masks.

gcc/ChangeLog:

* range-op.cc (range_operator::fold_range): Call
update_known_bitmask.
(operator_bitwise_and::fold_range): Avoid setting nonzero bits
when range is undefined.
---
 gcc/range-op.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 00a736e983d..9eec46441a3 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -245,6 +245,7 @@ range_operator::fold_range (irange &r, tree type,
   wi_fold_in_parts (r, type, lh.lower_bound (), lh.upper_bound (),
rh.lower_bound (), rh.upper_bound ());
   op1_op2_relation_effect (r, type, lh, rh, rel);
+  update_known_bitmask (r, m_code, lh, rh);
   return true;
 }
 
@@ -262,10 +263,12 @@ range_operator::fold_range (irange &r, tree type,
if (r.varying_p ())
  {
op1_op2_relation_effect (r, type, lh, rh, rel);
+   update_known_bitmask (r, m_code, lh, rh);
return true;
  }
   }
   op1_op2_relation_effect (r, type, lh, rh, rel);
+  update_known_bitmask (r, m_code, lh, rh);
   return true;
 }
 
@@ -2873,7 +2876,7 @@ operator_bitwise_and::fold_range (irange &r, tree type,
 {
   if (range_operator::fold_range (r, type, lh, rh))
 {
-  if (!lh.undefined_p () && !rh.undefined_p ())
+  if (!r.undefined_p () && !lh.undefined_p () && !rh.undefined_p ())
r.set_nonzero_bits (wi::bit_and (lh.get_nonzero_bits (),
 rh.get_nonzero_bits ()));
   return true;
-- 
2.38.1



[COMMITTED] [range-ops] Add tree code to range_operator.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
This patch adds a tree code to range_operator in order to know which
tree code to pass into bit-CCP.

Up to now range-ops has been free of tree details, with the exception
of the div entries which use a tree code to differentiate between
them.  This is still the goal going forward, but this is a stop-gap
until we can merge the CCP and range-op bit handling in the next
release.

No change in performance.

gcc/ChangeLog:

* range-op.cc: (range_op_table::set): Set m_code.
(integral_table::integral_table): Handle shared entries.
(pointer_table::pointer_table): Same.
* range-op.h (class range_operator): Add m_code.
---
 gcc/range-op.cc | 37 +++--
 gcc/range-op.h  |  5 +
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 8ff5d5b4c78..1fbebd85620 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2523,7 +2523,7 @@ private:
const irange &outer) const;
   void fold_pair (irange &r, unsigned index, const irange &inner,
   const irange &outer) const;
-} op_convert;
+};
 
 // Add a partial equivalence between the LHS and op1 for casts.
 
@@ -3877,7 +3877,7 @@ public:
   const irange &op1,
   const irange &op2,
   relation_kind rel) const;
-} op_identity;
+};
 
 // Determine if there is a relationship between LHS and OP1.
 
@@ -3922,7 +3922,7 @@ public:
   const irange &op1,
   const irange &op2,
   relation_trio rel = TRIO_VARYING) const;
-} op_unknown;
+};
 
 bool
 operator_unknown::fold_range (irange &r, tree type,
@@ -4245,7 +4245,7 @@ public:
   virtual void wi_fold (irange & r, tree type,
const wide_int &lh_lb, const wide_int &lh_ub,
const wide_int &rh_lb, const wide_int &rh_ub) const;
-} op_ptr_min_max;
+};
 
 void
 pointer_min_max_operator::wi_fold (irange &r, tree type,
@@ -4372,8 +4372,17 @@ range_op_table::set (enum tree_code code, range_operator 
&op)
 {
   gcc_checking_assert (m_range_tree[code] == NULL);
   m_range_tree[code] = &op;
+  gcc_checking_assert (op.m_code == ERROR_MARK || op.m_code == code);
+  op.m_code = code;
 }
 
+// Shared operators that require separate instantiations because they
+// do not share a common tree code.
+static operator_cast op_nop, op_convert;
+static operator_identity op_ssa, op_paren, op_obj_type;
+static operator_unknown op_realpart, op_imagpart;
+static pointer_min_max_operator op_ptr_min, op_ptr_max;
+
 // Instantiate a range op table for integral operations.
 
 class integral_table : public range_op_table
@@ -4402,7 +4411,7 @@ integral_table::integral_table ()
   set (EXACT_DIV_EXPR, op_exact_div);
   set (LSHIFT_EXPR, op_lshift);
   set (RSHIFT_EXPR, op_rshift);
-  set (NOP_EXPR, op_convert);
+  set (NOP_EXPR, op_nop);
   set (CONVERT_EXPR, op_convert);
   set (TRUTH_AND_EXPR, op_logical_and);
   set (BIT_AND_EXPR, op_bitwise_and);
@@ -4413,11 +4422,11 @@ integral_table::integral_table ()
   set (TRUTH_NOT_EXPR, op_logical_not);
   set (BIT_NOT_EXPR, op_bitwise_not);
   set (INTEGER_CST, op_integer_cst);
-  set (SSA_NAME, op_identity);
-  set (PAREN_EXPR, op_identity);
-  set (OBJ_TYPE_REF, op_identity);
-  set (IMAGPART_EXPR, op_unknown);
-  set (REALPART_EXPR, op_unknown);
+  set (SSA_NAME, op_ssa);
+  set (PAREN_EXPR, op_paren);
+  set (OBJ_TYPE_REF, op_obj_type);
+  set (IMAGPART_EXPR, op_imagpart);
+  set (REALPART_EXPR, op_realpart);
   set (POINTER_DIFF_EXPR, op_pointer_diff);
   set (ABS_EXPR, op_abs);
   set (ABSU_EXPR, op_absu);
@@ -4437,8 +4446,8 @@ pointer_table::pointer_table ()
 {
   set (BIT_AND_EXPR, op_pointer_and);
   set (BIT_IOR_EXPR, op_pointer_or);
-  set (MIN_EXPR, op_ptr_min_max);
-  set (MAX_EXPR, op_ptr_min_max);
+  set (MIN_EXPR, op_ptr_min);
+  set (MAX_EXPR, op_ptr_max);
   set (POINTER_PLUS_EXPR, op_pointer_plus);
 
   set (EQ_EXPR, op_equal);
@@ -4447,10 +4456,10 @@ pointer_table::pointer_table ()
   set (LE_EXPR, op_le);
   set (GT_EXPR, op_gt);
   set (GE_EXPR, op_ge);
-  set (SSA_NAME, op_identity);
+  set (SSA_NAME, op_ssa);
   set (INTEGER_CST, op_integer_cst);
   set (ADDR_EXPR, op_addr);
-  set (NOP_EXPR, op_convert);
+  set (NOP_EXPR, op_nop);
   set (CONVERT_EXPR, op_convert);
 
   set (BIT_NOT_EXPR, op_bitwise_not);
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 442a6e1d299..c999b456f62 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -48,7 +48,9 @@ along with GCC; see the file COPYING3.  If not see
 
 class range_operator
 {
+  friend class range_op_table;
 public:
+  range_operator () : m_code (ERROR_MARK) { }
   // Perform an operation between 2 ranges and return it.
   virtual bool fold_range (irange &r, tree type,
   const irange &lh,
@@ -106,6 +108,9 @@ protected:
  

[COMMITTED] [range-ops] Use existing tree code for *DIV_EXPR entries.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
There is no need for a special tree code in the *DIV_EXPR entries, as
the parent class has one.

gcc/ChangeLog:

* range-op.cc (class operator_div): Remove tree code.
(operator_div::wi_op_overflows): Handle EXACT_DIV_EXPR as
TRUNC_DIV_EXPR.
---
 gcc/range-op.cc | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 1fbebd85620..00a736e983d 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1971,7 +1971,6 @@ operator_mult::wi_fold (irange &r, tree type,
 class operator_div : public cross_product_operator
 {
 public:
-  operator_div (enum tree_code c)  { code = c; }
   virtual void wi_fold (irange &r, tree type,
const wide_int &lh_lb,
const wide_int &lh_ub,
@@ -1983,8 +1982,6 @@ public:
   virtual bool fold_range (irange &r, tree type,
   const irange &lh, const irange &rh,
   relation_trio trio) const final override;
-private:
-  enum tree_code code;
 };
 
 bool
@@ -1995,7 +1992,7 @@ operator_div::fold_range (irange &r, tree type,
   if (!cross_product_operator::fold_range (r, type, lh, rh, trio))
 return false;
 
-  update_known_bitmask (r, code, lh, rh);
+  update_known_bitmask (r, m_code, lh, rh);
   return true;
 }
 
@@ -2009,13 +2006,9 @@ operator_div::wi_op_overflows (wide_int &res, tree type,
   wi::overflow_type overflow = wi::OVF_NONE;
   signop sign = TYPE_SIGN (type);
 
-  switch (code)
+  switch (m_code)
 {
 case EXACT_DIV_EXPR:
-  // EXACT_DIV_EXPR is implemented as TRUNC_DIV_EXPR in
-  // operator_exact_divide.  No need to handle it here.
-  gcc_unreachable ();
-  break;
 case TRUNC_DIV_EXPR:
   res = wi::div_trunc (w0, w1, sign, &overflow);
   break;
@@ -2091,17 +2084,11 @@ operator_div::wi_fold (irange &r, tree type,
   gcc_checking_assert (!r.undefined_p ());
 }
 
-operator_div op_trunc_div (TRUNC_DIV_EXPR);
-operator_div op_floor_div (FLOOR_DIV_EXPR);
-operator_div op_round_div (ROUND_DIV_EXPR);
-operator_div op_ceil_div (CEIL_DIV_EXPR);
-
 
 class operator_exact_divide : public operator_div
 {
   using range_operator::op1_range;
 public:
-  operator_exact_divide () : operator_div (TRUNC_DIV_EXPR) { }
   virtual bool op1_range (irange &r, tree type,
  const irange &lhs,
  const irange &op2,
@@ -4382,6 +4369,10 @@ static operator_cast op_nop, op_convert;
 static operator_identity op_ssa, op_paren, op_obj_type;
 static operator_unknown op_realpart, op_imagpart;
 static pointer_min_max_operator op_ptr_min, op_ptr_max;
+static operator_div op_trunc_div;
+static operator_div op_floor_div;
+static operator_div op_round_div;
+static operator_div op_ceil_div;
 
 // Instantiate a range op table for integral operations.
 
-- 
2.38.1



[COMMITTED] [range-ops] Avoid unnecessary intersection in update_known_bitmask.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
All the work for keeping the maybe nonzero masks up to date is being
done by the bit-CCP code now.  Any bitmask inherent in the range that
range-ops may have calculated has no extra information, so the
intersection is unnecessary.

gcc/ChangeLog:

* range-op.cc (update_known_bitmask): Avoid unnecessary intersection.
---
 gcc/range-op.cc | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 9eec46441a3..0b01cf48fdf 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -89,10 +89,7 @@ update_known_bitmask (irange &r, tree_code code,
   bit_value_binop (code, sign, prec, &value, &mask,
   lh_sign, lh_prec, lh_value, lh_mask,
   rh_sign, rh_prec, rh_value, rh_mask);
-
-  int_range<2> tmp (type);
-  tmp.set_nonzero_bits (value | mask);
-  r.intersect (tmp);
+  r.set_nonzero_bits (value | mask);
 }
 
 // Return the upper limit for a type.
-- 
2.38.1



[COMMITTED] [range-ops] Remove specialized fold_range methods for various operators.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
Remove some specialized fold_range methods that were merely setting
maybe nonzero masks, as these are now subsumed by the generic version.

gcc/ChangeLog:

* range-op.cc (operator_mult::fold_range): Remove.
(operator_div::fold_range): Remove.
(operator_bitwise_and): Remove.
---
 gcc/range-op.cc | 52 -
 1 file changed, 52 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 0b01cf48fdf..6fa3b151596 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1790,13 +1790,9 @@ cross_product_operator::wi_cross_product (irange &r, 
tree type,
 
 class operator_mult : public cross_product_operator
 {
-  using range_operator::fold_range;
   using range_operator::op1_range;
   using range_operator::op2_range;
 public:
-  virtual bool fold_range (irange &r, tree type,
-  const irange &lh, const irange &rh,
-  relation_trio = TRIO_VARYING) const final override;
   virtual void wi_fold (irange &r, tree type,
const wide_int &lh_lb,
const wide_int &lh_ub,
@@ -1815,18 +1811,6 @@ public:
  relation_trio) const final override;
 } op_mult;
 
-bool
-operator_mult::fold_range (irange &r, tree type,
-  const irange &lh, const irange &rh,
-  relation_trio trio) const
-{
-  if (!cross_product_operator::fold_range (r, type, lh, rh, trio))
-return false;
-
-  update_known_bitmask (r, MULT_EXPR, lh, rh);
-  return true;
-}
-
 bool
 operator_mult::op1_range (irange &r, tree type,
  const irange &lhs, const irange &op2,
@@ -1979,23 +1963,8 @@ public:
   virtual bool wi_op_overflows (wide_int &res, tree type,
const wide_int &, const wide_int &)
 const final override;
-  virtual bool fold_range (irange &r, tree type,
-  const irange &lh, const irange &rh,
-  relation_trio trio) const final override;
 };
 
-bool
-operator_div::fold_range (irange &r, tree type,
- const irange &lh, const irange &rh,
- relation_trio trio) const
-{
-  if (!cross_product_operator::fold_range (r, type, lh, rh, trio))
-return false;
-
-  update_known_bitmask (r, m_code, lh, rh);
-  return true;
-}
-
 bool
 operator_div::wi_op_overflows (wide_int &res, tree type,
   const wide_int &w0, const wide_int &w1) const
@@ -2834,14 +2803,9 @@ operator_logical_and::op2_range (irange &r, tree type,
 
 class operator_bitwise_and : public range_operator
 {
-  using range_operator::fold_range;
   using range_operator::op1_range;
   using range_operator::op2_range;
 public:
-  virtual bool fold_range (irange &r, tree type,
-  const irange &lh,
-  const irange &rh,
-  relation_trio rel = TRIO_VARYING) const;
   virtual bool op1_range (irange &r, tree type,
  const irange &lhs,
  const irange &op2,
@@ -2865,22 +2829,6 @@ private:
const irange &op2) const;
 } op_bitwise_and;
 
-bool
-operator_bitwise_and::fold_range (irange &r, tree type,
- const irange &lh,
- const irange &rh,
- relation_trio) const
-{
-  if (range_operator::fold_range (r, type, lh, rh))
-{
-  if (!r.undefined_p () && !lh.undefined_p () && !rh.undefined_p ())
-   r.set_nonzero_bits (wi::bit_and (lh.get_nonzero_bits (),
-rh.get_nonzero_bits ()));
-  return true;
-}
-  return false;
-}
-
 
 // Optimize BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR of signed types
 // by considering the number of leading redundant sign bit copies.
-- 
2.38.1



Re: [PATCH] 2/19 modula2 front end: Make-lang.in

2022-11-11 Thread Richard Biener via Gcc-patches
On Mon, Oct 10, 2022 at 5:34 PM Gaius Mulley via Gcc-patches
 wrote:
>
>
>
> The makefile fragment for modula2 which builds the gm2 driver and cc1gm2.
>
>
> --8<--8<--8<--8<--8<--8<
> diff -ruw /dev/null gcc-git-devel-modula2/gcc/m2/Make-lang.in
> --- /dev/null   2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/gcc/m2/Make-lang.in   2022-10-07 20:21:18.634096743 
> +0100
> @@ -0,0 +1,1556 @@
> +# Top level -*- makefile -*- fragment for GNU M2.
> +
> +# Copyright (C) 2000-2022 Free Software Foundation, Inc.
> +
> +#This file is part of GCC.
> +
> +#GCC is free software; you can redistribute it and/or modify
> +#it under the terms of the GNU General Public License as published by
> +#the Free Software Foundation; either version 3, or (at your option)
> +#any later version.
> +
> +#GCC is distributed in the hope that it will be useful,
> +#but WITHOUT ANY WARRANTY; without even the implied warranty of
> +#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +#GNU General Public License for more details.
> +
> +#You should have received a copy of the GNU General Public License
> +#along with GCC; see the file COPYING3.  If not see
> +#.
> +
> +GM2_MAKE_DEBUG=
> +
> +# Actual names to use when installing a native compiler.
> +GM2_INSTALL_NAME = $(shell echo gm2|sed '$(program_transform_name)')
> +GM2_TARGET_INSTALL_NAME = $(target_noncanonical)-$(shell echo gm2|sed 
> '$(program_transform_name)')
> +
> +# Actual names to use when installing a cross-compiler.
> +GM2_CROSS_NAME = `echo gm2|sed '$(program_transform_cross_name)'`
> +
> +M2_MAINTAINER = no
> +
> +CPP_GM2=-fpermissive -DIN_GCC -g

Do we really need -fpermissive here?

> +GM2_1 = ./gm2 -B./stage1/m2 -g -fm2-g
> +
> +GM2_FOR_TARGET = $(STAGE_CC_WRAPPER) ./gm2 -B./ -B$(build_tooldir)/bin/ 
> -L$(objdir)/../ld $(TFLAGS)
> +
> +TEXISRC = $(objdir)/m2/images/gnu.eps \
> +  $(srcdir)/doc/gm2.texi \
> +  m2/gm2-libs.texi \
> +  m2/gm2-ebnf.texi \
> +  m2/SYSTEM-pim.texi \
> +  m2/SYSTEM-iso.texi \
> +  m2/Builtins.texi
> +

That will need Sphinx treatment

> +# Define the names for selecting modula-2 in LANGUAGES.
> +m2 modula-2 modula2: gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext) \
> + $(GCC_PASSES) $(GCC_PARTS)
> +m2.serial = cc1gm2$(exeext)
> +
> +# Tell GNU make to ignore these if they exist.
> +.PHONY: m2 modula-2 modula2
> +
> +GM2_PROG_DEP=gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext)
> +
> +include m2/config-make
> +LIBSTDCXX=../$(TARGET_SUBDIR)/libstdc++-v3/src/.libs/libstdc++.a
> +
> +PGE=m2/pge$(exeext)
> +
> +SRC_PREFIX=G
> +
> +m2/gm2spec.o: $(srcdir)/m2/gm2spec.cc $(SYSTEM_H) $(GCC_H) $(CONFIG_H) \
> +   m2/gm2config.h $(TARGET_H) $(PLUGIN_HEADERS) \
> +   $(generated_files) $(C_TREE_H) insn-attr-common.h
> +   (SHLIB_LINK='$(SHLIB_LINK)' \
> +   SHLIB_MULTILIB='$(SHLIB_MULTILIB)'; \
> +   $(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> + $(DRIVER_DEFINES) \
> +   -DLIBSUBDIR=\"$(libsubdir)\" \
> +-DPREFIX=\"$(prefix)\" \
> +-c $(srcdir)/m2/gm2spec.cc $(OUTPUT_OPTION))
> +
> +# Create the compiler driver for M2.
> +CFLAGS-m2/m2/gm2spec.o += $(DRIVER_DEFINES)
> +
> +GM2_OBJS = $(GCC_OBJS) prefix.o intl.o m2/gm2spec.o
> +
> +# Create the compiler driver for gm2.
> +gm2$(exeext): $(GM2_OBJS) $(EXTRA_GCC_OBJS) libcommon-target.a $(LIBDEPS) \
> +m2/gm2config.h
> +   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
> + $(GM2_OBJS) $(EXTRA_GCC_OBJS) libcommon-target.a \
> + $(EXTRA_GCC_LIBS) $(LIBS)
> +
> +# Create a version of the gm2 driver which calls the cross-compiler.
> +gm2-cross$(exeext): gm2$(exeext)
> +   -rm -f gm2-cross$(exeext)
> +   cp gm2$(exeext) gm2-cross$(exeext)
> +
> +po-generated:
> +
> +# Build hooks:
> +
> +m2.all.cross: gm2-cross$(exeext) plugin/m2rte$(exeext).so
> +
> +m2.start.encap: gm2$(exeext) plugin/m2rte$(exeext).so
> +m2.rest.encap:
> +
> +m2.info: doc/m2.info
> +
> +m2.man: doc/m2.1
> +
> +m2.install-man: $(DESTDIR)$(man1dir)/$(GM2_INSTALL_NAME)$(man1ext)
> +
> +$(DESTDIR)$(man1dir)/$(GM2_INSTALL_NAME)$(man1ext): doc/m2.1 installdirs
> +   -rm -f $@
> +   -$(INSTALL_DATA) $< $@
> +   -chmod a-x $@
> +
> +m2.dvi: $(TEXISRC)
> +   $(TEXI2DVI) -I $(objdir)/m2 -I $(srcdir)/doc/include 
> $(srcdir)/doc/gm2.texi -o $@
> +
> +m2.ps: m2.dvi
> +   dvips -o $@ $<
> +
> +m2.pdf: m2.ps
> +   gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=$@ $<
> +
> +.INTERMEDIATE: gm2.pod
> +
> +m2.pod: doc/gm2.texi $(TEXISRC)
> +   -$(TEXI2POD) -I $(objdir)/m2 -D m2 < $< > $@

Likewise for the doc parts above and below

> +doc/m2.info: $(TEXISRC)
> +   if test "x$(BUILD_INFO)" = xinfo; then \
> + rm -f doc/m2.info*; \
> +  $(MAKEINFO) --no-headers -I$(objdir)/m2 -I$(srcdir)/doc/include \
> +

[PATCH 3/8] middle-end: Refactor number_of_iterations_popcount

2022-11-11 Thread Andrew Carlotti via Gcc-patches
This includes various changes to improve clarity, and to enable the code
to be more similar to the clz and ctz idiom recognition added in
subsequent patches.

We create a new number_of_iterations_bitcount function, which will be used
to call the other bit-counting recognition functions added in subsequent
patches, and add a generic comment describing the loop structures
that are common to each idiom. Some of the variables in
number_of_iterations_popcount are given more descriptive names, and the
popcount expression builder is extracted into a separate function.

As part of the refactoring, we also fix a bug where the max loop count
for modes shorter than an integer would be incorrectly computed as if
the input mode were actually an integer.

We also ensure that niter->max takes into account the final value for
niter->niter (after any folding and simplifying), since if the latter is a
constant, then record_estimate mandates that the two values are equivalent.

gcc/ChangeLog:

* tree-ssa-loop-niter.cc
(number_of_iterations_exit_assumptions): Modify to call...
(number_of_iterations_bitcount): ...this new function.
(number_of_iterations_popcount): Now called by the above.
Refactor, and extract popcount expression builder to...
(build_popcount_expr): ...this new function.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/popcount-max.c: New test.


--


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount-max.c 
b/gcc/testsuite/gcc.dg/tree-ssa/popcount-max.c
new file mode 100644
index 
..ca7204cbc3cea636183408e24d7dd36d702ffdb2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount-max.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */
+
+#define PREC (__CHAR_BIT__)
+
+int count1 (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b &= b - 1;
+   c++;
+}
+if (c <= PREC)
+  return 0;
+else
+  return 34567;
+}
+
+int count2 (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b &= b - 1;
+   c++;
+}
+if (c <= PREC - 1)
+  return 0;
+else
+  return 76543;
+}
+
+/* { dg-final { scan-tree-dump-times "34567" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "76543" 1 "optimized" } } */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 
0af34e46580bb9a6f9b40e09c9f29b8454a4aaf6..fece876099c1687569d6351e7d2416ea6acae5b5
 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -2026,6 +2026,48 @@ number_of_iterations_cond (class loop *loop,
   return ret;
 }
 
+/* Return an expression that computes the popcount of src.  */
+
+static tree
+build_popcount_expr (tree src)
+{
+  tree fn;
+  int prec = TYPE_PRECISION (TREE_TYPE (src));
+  int i_prec = TYPE_PRECISION (integer_type_node);
+  int li_prec = TYPE_PRECISION (long_integer_type_node);
+  int lli_prec = TYPE_PRECISION (long_long_integer_type_node);
+  if (prec <= i_prec)
+fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  else if (prec == li_prec)
+fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
+  else if (prec == lli_prec || prec == 2 * lli_prec)
+fn = builtin_decl_implicit (BUILT_IN_POPCOUNTLL);
+  else
+return NULL_TREE;
+
+  tree utype = unsigned_type_for (TREE_TYPE (src));
+  src = fold_convert (utype, src);
+  if (prec < i_prec)
+src = fold_convert (unsigned_type_node, src);
+  tree call;
+  if (prec == 2 * lli_prec)
+{
+  tree src1 = fold_convert (long_long_unsigned_type_node,
+   fold_build2 (RSHIFT_EXPR, TREE_TYPE (src),
+unshare_expr (src),
+build_int_cst (integer_type_node,
+   lli_prec)));
+  tree src2 = fold_convert (long_long_unsigned_type_node, src);
+  tree call1 = build_call_expr (fn, 1, src1);
+  tree call2 = build_call_expr (fn, 1, src2);
+  call = fold_build2 (PLUS_EXPR, integer_type_node, call1, call2);
+}
+  else
+call = build_call_expr (fn, 1, src);
+
+  return call;
+}
+
 /* Utility function to check if OP is defined by a stmt
that is a val - 1.  */
 
@@ -2041,45 +2083,18 @@ ssa_defined_by_minus_one_stmt_p (tree op, tree val)
  && integer_minus_onep (gimple_assign_rhs2 (stmt)));
 }
 
-/* See if LOOP is a popcout implementation, determine NITER for the loop
+/* See comment below for number_of_iterations_bitcount.
+   For popcount, we have:
 
-   We match:
-   
-   goto 
+   modify:
+   _1 = iv_1 + -1
+   iv_2 = iv_1 & _1
 
-   
-   _1 = b_11 + -1
-   b_6 = _1 & b_11
-
-   
-   b_11 = PHI 
+   test:
+   if (iv != 0)
 
-   exit block
-   if (b_11 != 0)
-   goto 
-   else
-   goto 
-
-   OR we match copy-header version:
-   if (b_5 != 0)
-   goto 
-   else
-   goto 
-
-   
-   b_11 = PHI 
-   _1 = b_11 + -1
-   b_6 = _1 & 

Re: [PATCH] 0/19 modula-2 front end patches overview

2022-11-11 Thread Richard Biener via Gcc-patches
On Mon, Oct 10, 2022 at 5:32 PM Gaius Mulley via Gcc-patches
 wrote:
>
>
> Here are the latest modula-2 front end patches for review.
> The status of the patches and their contents are also contained at:
>
>https://splendidisolation.ddns.net/public/modula2/patchsummary.html
>
> where they are also broken down into topic groups.
>
> In summary the high level changes from the last posting are:
>
>* the driver code has been completely rewritten and it is now based
>  on the fortran driver and the c++ driver.  The gm2 driver adds
>  paths/libraries depending upon dialect chosen.
>* the linking mechanism has been completely redesigned
>  (As per
>  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595725.html).
>  Objects can be linked via g++.  New linking options
>  are available to allow linking with/without a scaffold.
>* gcc/m2/Make-lang.in (rewritten).
>* gm2tools/ removed and any required functionality with the
>  new linking mechanism has been moved into cc1gm2.
>
> The gm2 testsuite has been extended to test project linking
> options.

Thanks for these improvements!

The frontend-specific parts are a lot to digest and I think it isn't too
important to wait for the unlikely event that all of that gets a review.
I'm trusting you here as a maintainer and also based on the use of the
frontend out in the wild.
I've CCed the other two RMs for their opinion on this.

I hope to get to the driver parts that I reviewed the last time; I'd
appreciate a look at the runtime library setup by somebody else.

I think it's important to get this (and the rust frontend) into the tree before
Christmas holidays so it gets exposed to the more weird treatment of some
of our users (build wise).  This way we can develop either a negative or
positive list of host/targets where to disable the new frontends.

Thanks,
Richard.

>
> Testing
> ===
>
> 1. bootstrap on gcc-13 master --enable-languages=c,c++,fortran,d,lto
>
> 2. bootstrap on gcc-13 devel/modula-2 --enable-languages=c,c++,fortran,d,lto
>no extra failures seen between contrib/compare_diffs 1 2
>
> 3. bootstrap on gcc-13 devel/modula-2 
> --enable-languages=c,c++,fortran,d,lto,m2
>no extra non-m2 failures seen between contrib/compare_diffs 2 3
>
> Steps 1, 2, 3 were performed on amd64 and aarch64 systems.
>
> The devel/modula-2 branch has been bootstrapped on:
>
>amd64 (debian bullseye/suse leap, suse tumbleweed),
>aarch64 (debian bullseye),
>armv7l (raspian),
>ppc64 (GNU/Linux),
>ppc64le (GNU/Linux),
>i586 (debian bullseye),
>sparc64 solaris
>sparc32 solaris
>
> and built on
>
>NetBSD 9.2 sparc64
>OpenBSD amd64
>
> Sources
> ===
>
> The patch set files follow in subsequent emails for review and copies
> can be found in the tarball below.  For ease of testing the full front
> end is also available via:
>
>   git clone git://gcc.gnu.org/git/gcc.git gcc-git-devel-modula2
>   cd gcc-git-devel-modula2
>   git checkout devel/modula-2
>
> The complete patch set is also available from:
>
>   https://splendidisolation.ddns.net/public/modula2/gm2patchset.tar.gz
>
> which can be applied to the gcc-13 master branch via:
>
>   git clone git://gcc.gnu.org/git/gcc.git gcc-git
>   wget --no-check-certificate \
>   https://splendidisolation.ddns.net/public/modula2/gm2patchset.tar.gz
>   tar zxf gm2patchset.tar.gz
>   bash gm2patchset/apply-patch.bash gcc-git
>   bash gm2patchset/pre-configure.bash gcc-git # regenerates configure and 
> friends
>
> when the script has completed the master branch should be identical
> to git branch devel/modula-2 above modulo recent git master commits.
>
> Review Patch Set
> 
>
> Here are all the source infrastructure files and all the c++/c sources
> (minus the bootstrap tools as these are autogenerated from the
> modula-2 sources).  I've not included the modula-2 sources (patch sets
> 18 and 19) in these emails as an attempt to reduce the email volume.
> They are available in
> https://splendidisolation.ddns.net/public/modula2/gm2patchset.tar.gz
> and of course the git repro.
>
> I'm happy to convert the documentation into Sphinx and at a convenient
> point would like to post the analyser patches for modula2.
>
> Thank you for reviewing the patches and thank you to all the testers
>
> regards,
> Gaius


Re: [PATCH] tree-optimization/105142 - improve maybe_fold_comparisons_from_match_pd fix

2022-11-11 Thread Richard Biener via Gcc-patches
On Tue, Jul 26, 2022 at 1:18 PM Richard Biener via Gcc-patches
 wrote:
>
> The following improves on the fix for PR105142 which restricted the
> expression lookup used for maybe_fold_comparisons_from_match_pd to
> avoid picking up flow-sensitive info for use in places where guarding
> conditions do not hold.  Instead of not allowing to expand SSA
> definitions there the following temporarily clears flow-sensitive
> info on the SSA names and restores it when finished matching.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Better late than never - I have now pushed this after re-bootstrapping
and testing this on x86_64-unknown-linux-gnu.

Richard.

> PR tree-optimization/105142
> * gimple-fold.cc (fosa_unwind): New global.
> (follow_outer_ssa_edges): When the SSA definition to follow
> does not dominate fosa_bb, temporarily clear flow-sensitive
> info.
> (maybe_fold_comparisons_from_match_pd): Set up unwind stack
> for follow_outer_ssa_edges and unwind flow-sensitive info
> clearing after matching.
> ---
>  gcc/gimple-fold.cc | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index a1704784bc9..876ef45434e 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -6886,6 +6886,7 @@ and_comparisons_1 (tree type, enum tree_code code1, 
> tree op1a, tree op1b,
>  }
>
>  static basic_block fosa_bb;
> +static vec > *fosa_unwind;
>  static tree
>  follow_outer_ssa_edges (tree val)
>  {
> @@ -6899,7 +6900,15 @@ follow_outer_ssa_edges (tree val)
>   && (def_bb == fosa_bb
>   || dominated_by_p (CDI_DOMINATORS, fosa_bb, def_bb
> return val;
> -  return NULL_TREE;
> +  /* If the definition does not dominate fosa_bb temporarily reset
> +flow-sensitive info.  */
> +  if (val->ssa_name.info.range_info)
> +   {
> + fosa_unwind->safe_push (std::make_pair
> +   (val, val->ssa_name.info.range_info));
> + val->ssa_name.info.range_info = NULL;
> +   }
> +  return val;
>  }
>return val;
>  }
> @@ -6958,9 +6967,14 @@ maybe_fold_comparisons_from_match_pd (tree type, enum 
> tree_code code,
>   type, gimple_assign_lhs (stmt1),
>   gimple_assign_lhs (stmt2));
>fosa_bb = outer_cond_bb;
> +  auto_vec, 8> unwind_stack;
> +  fosa_unwind = &unwind_stack;
>if (op.resimplify (NULL, (!outer_cond_bb
> ? follow_all_ssa_edges : follow_outer_ssa_edges)))
>  {
> +  fosa_unwind = NULL;
> +  for (auto p : unwind_stack)
> +   p.first->ssa_name.info.range_info = p.second;
>if (gimple_simplified_result_is_gimple_val (&op))
> {
>   tree res = op.ops[0];
> @@ -6982,6 +6996,9 @@ maybe_fold_comparisons_from_match_pd (tree type, enum 
> tree_code code,
>   return build2 ((enum tree_code)op.code, op.type, op0, op1);
> }
>  }
> +  fosa_unwind = NULL;
> +  for (auto p : unwind_stack)
> +p.first->ssa_name.info.range_info = p.second;
>
>return NULL_TREE;
>  }
> --
> 2.35.3


[PATCH] libatomic: Add support for LSE and LSE2

2022-11-11 Thread Wilco Dijkstra via Gcc-patches
Add support for AArch64 LSE and LSE2 to libatomic.  Disable outline atomics,
and use LSE ifuncs for 1-8 byte atomics and LSE2 ifuncs for 16-byte atomics.
On Neoverse V1, 16-byte atomics are ~4x faster due to avoiding locks.

Note this is safe since we swap all 16-byte atomics using the same ifunc,
so they either use locks or LSE2 atomics, but never a mix. This also improves
ABI compatibility with LLVM: its inlined 16-byte atomics are compatible with
the new libatomic if LSE2 is supported.

Passes regress, OK for commit?

libatomic/
Makefile.in: Regenerated with automake 1.15.1.
Makefile.am: Add atomic_16.S for AArch64.
configure.tgt: Disable outline atomics in AArch64 build.
config/linux/aarch64/atomic_16.S: New file - implementation of
ifuncs for 128-bit atomics.
config/linux/aarch64/host-config.h: Enable ifuncs, use LSE 
(HWCAP_ATOMICS)
for 1-8-byte atomics and LSE2 (HWCAP_USCAT) for 16-byte atomics.

---
diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index 
d88515e4a03bd812334ae0b7bf4c0bba119455dc..41e5da28512150780a2018386e22b4e70afcfa3f
 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -127,6 +127,8 @@ if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
+libatomic_la_SOURCES += atomic_16.S
+
 endif
 if ARCH_ARM_LINUX
 IFUNC_OPTIONS   = -march=armv7-a+fp -DHAVE_KERNEL64
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index 
80d25653dc75cca995c8b0b2107a55f1234a6d52..89e29fc60a7fb74341b2f0f805e461847073082c
 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -90,13 +90,14 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = $(foreach 
s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
-@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = $(foreach \
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = atomic_16.S
+@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ s,$(SIZES),$(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _$(s)_1_.lo,$(SIZEOBJS))) \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ $(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _8_2_.lo,$(SIZEOBJS))
-@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(addsuffix 
_8_1_.lo,$(SIZEOBJS))
-@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
_16_1_.lo,$(SIZEOBJS)) \
+@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
_8_1_.lo,$(SIZEOBJS))
+@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(addsuffix 
_16_1_.lo,$(SIZEOBJS)) \
 @ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@   $(addsuffix 
_16_2_.lo,$(SIZEOBJS))
 
 subdir = .
@@ -154,8 +155,11 @@ am__uninstall_files_from_dir = { \
   }
 am__installdirs = "$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_LTLIBRARIES)
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__objects_1 =  \
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@ atomic_16.lo
 am_libatomic_la_OBJECTS = gload.lo gstore.lo gcas.lo gexch.lo \
-   glfree.lo lock.lo init.lo fenv.lo fence.lo flag.lo
+   glfree.lo lock.lo init.lo fenv.lo fence.lo flag.lo \
+   $(am__objects_1)
 libatomic_la_OBJECTS = $(am_libatomic_la_OBJECTS)
 AM_V_lt = $(am__v_lt_@AM_V@)
 am__v_lt_ = $(am__v_lt_@AM_DEFAULT_V@)
@@ -165,9 +169,9 @@ libatomic_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC 
$(AM_LIBTOOLFLAGS) \
$(LIBTOOLFLAGS) --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
$(libatomic_la_LDFLAGS) $(LDFLAGS) -o $@
 libatomic_convenience_la_DEPENDENCIES = $(libatomic_la_LIBADD)
-am__objects_1 = gload.lo gstore.lo gcas.lo gexch.lo glfree.lo lock.lo \
-   init.lo fenv.lo fence.lo flag.lo
-am_libatomic_convenience_la_OBJECTS = $(am__objects_1)
+am__objects_2 = gload.lo gstore.lo gcas.lo gexch.lo glfree.lo lock.lo \
+   init.lo fenv.lo fence.lo flag.lo $(am__objects_1)
+am_libatomic_convenience_la_OBJECTS = $(am__objects_2)
 libatomic_convenience_la_OBJECTS =  \
$(am_libatomic_convenience_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
@@ -185,6 +189,16 @@ am__v_at_1 =
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
 am__depfiles_maybe = depfiles
 am__mv = mv -f
+CPPASCOMPILE = $(CCAS) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
+   $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CCASFLAGS) $(CCASFLAGS)
+LTCPPASCOMPILE = $(LIBTOOL) $(AM_V_lt) $(AM_LIBTOOLFLAGS) \
+   $(LIBTOOLFLAGS) --mode=compile $(CCAS) $(DEFS) \
+   $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \
+   $(AM_CCASFLAGS) $(CCASFLAGS)
+AM_V_CPPAS = $(am__v_CPPAS_@AM_V@)
+am__v_CPPAS_ = $(am__v_CPPAS_@AM_DEFAULT_V@)
+am__v_CPPAS_0 = @echo "  CPPAS   " $@;
+am__v_CPPAS_1 = 
 COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
$(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
 LTCOMPILE = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \
@

Re: [PATCH] range-op: Implement op[12]_range operators for {PLUS,MINUS,MULT,RDIV}_EXPR

2022-11-11 Thread Aldy Hernandez via Gcc-patches




On 11/11/22 12:50, Jakub Jelinek wrote:

On Wed, Nov 09, 2022 at 04:43:56PM +0100, Aldy Hernandez wrote:

On Wed, Nov 9, 2022 at 3:58 PM Jakub Jelinek  wrote:


On Wed, Nov 09, 2022 at 10:02:46AM +0100, Aldy Hernandez wrote:

We can implement the op[12]_range entries for plus and minus in terms
of each other.  These are adapted from the integer versions.


I think for NANs the op[12]_range shouldn't act this way.
For the forward binary operations, we have the (maybe/known) NAN handling
of one or both NAN operands resulting in VARYING sign (maybe/known) NAN
result, that is the somehow the case for the reverse binary operations too,
if result is (maybe/known) NAN and the other op is not NAN, op is
VARYING sign (maybe/known) NAN, if other op is (maybe/known) NAN,
then op is VARYING sign maybe NAN (always maybe, never known).
But then for + we have the -INF + INF or vice versa into NAN, and that
is something that shouldn't be considered.  If result isn't NAN, then
neither operand can be NAN, regardless of whether result can be
+/- INF and the other op -/+ INF.


Heh.  I just ran into this while debugging the problem reported by Xi.

We are solving NAN = op1 - VARYING, and trying to do it with op1 = NAN
+ VARYING, which returns op1 = NAN (incorrectly).

I suppose in the above case op1 should ideally be
[-INF,-INF][+INF,+INF]+-NAN, but since we can't represent that then
[-INF,+INF] +-NAN, which is actually VARYING.  Do you agree?

I'm reverting this patch as attached, while I sort this out.


Here is my (so far only on the testcase tested) patch which reinstalls
your change, add the fixups I've talked about and also hooks up
reverse operators for MULT_EXPR/RDIV_EXPR.


OMG, you're a rockstar (or salsa or bachata star if that's your thing)!
:-P


Thank you so much.  I was just looking at that now.



2022-11-11  Aldy Hernandez  
Jakub Jelinek  

* range-op-float.cc (float_binary_op_range_finish): New function.
(foperator_plus::op1_range): New.
 (foperator_plus::op2_range): New.
 (foperator_minus::op1_range): New.
 (foperator_minus::op2_range): New.
(foperator_mult::op1_range): New.
 (foperator_mult::op2_range): New.
 (foperator_div::op1_range): New.
 (foperator_div::op2_range): New.

* gcc.c-torture/execute/ieee/inf-4.c: New test.

--- gcc/range-op-float.cc.jj2022-11-11 10:55:57.602617289 +0100
+++ gcc/range-op-float.cc   2022-11-11 12:32:19.378633983 +0100
@@ -1861,8 +1861,64 @@ foperator_unordered_equal::op1_range (fr
return true;
  }
  
+// Final tweaks for float binary op op1_range/op2_range.

+
+static bool
+float_binary_op_range_finish (bool ret, frange &r, tree type,
+ const frange &lhs)
+{


Can you document the return value, even if it's just "the same as 
op1/2_range" ;-).



+  if (!ret)
+return ret;
+
+  // If we get a known NAN from reverse op, it means either that
+  // the other operand was known NAN (in that case we know nothing),
+  // or the reverse operation introduced a known NAN.
+  // Say for lhs = op1 * op2 if lhs is [-0, +0] and op2 is too,
+  // 0 / 0 is known NAN.  Just punt in that case.
+  // Or if lhs is a known NAN, we also don't know anything.
+  if (r.known_isnan () || lhs.known_isnan ())
+{
+  r.set_varying (type);
+  return false;
+}


A return of false means the operation is not handled, similar to what 
the default operators defined at the top of range-op*.cc do.  The caller 
(gori?) is free to disregard the range altogether.  In practice this 
means VARYING, so you're getting the same behavior.  But you should 
probably return TRUE, which is what we do in other operators. 
Technically you could also not set "r" and just return false.


Otherwise LGTM.

I'll look at your other patches next.
Aldy


+
+  // If lhs isn't NAN, then neither operand could be NAN,
+  // even if the reverse operation does introduce a maybe_nan.
+  if (!lhs.maybe_isnan ())
+r.clear_nan ();
+  // If lhs is a maybe or known NAN, the operand could be
+  // NAN.
+  else
+r.update_nan ();
+  return true;
+}
+
  class foperator_plus : public range_operator_float
  {
+  using range_operator_float::op1_range;
+  using range_operator_float::op2_range;
+public:
+  virtual bool op1_range (frange &r, tree type,
+ const frange &lhs,
+ const frange &op2,
+ relation_trio = TRIO_VARYING) const final override
+  {
+if (lhs.undefined_p ())
+  return false;
+range_op_handler minus (MINUS_EXPR, type);
+if (!minus)
+  return false;
+return float_binary_op_range_finish (minus.fold_range (r, type, lhs, op2),
+r, type, lhs);
+  }
+  virtual bool op2_range (frange &r, tree type,
+ const frange &lhs,
+ const frange &op1,
+ relation_trio = TRIO_VARYING) const 

RE: [PATCH][GCC] aarch64: Add support for Cortex-A715 CPU.

2022-11-11 Thread Kyrylo Tkachov via Gcc-patches
Hi Srinath,

> -Original Message-
> From: Srinath Parvathaneni 
> Sent: Friday, November 11, 2022 11:58 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [PATCH][GCC] aarch64: Add support for Cortex-A715 CPU.
> 
> Hi,
> 
> This patch adds support for Cortex-A715 CPU.
> 
> Bootstrapped on aarch64-none-linux-gnu and found no regressions.
> 
> Ok for GCC master?
> 

Ok. Please make sure aarch64-tune.md is appropriately regenerated in 
combination with the other -mcpu options you're adding.
Thanks,
Kyrill

> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2022-11-09  Srinath Parvathaneni  
> 
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-A715
> CPU.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/gcc/gcc-command-options/machine-dependent-options/aarch64-
> options.rst:
> Document Cortex-A715 CPU.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index
> e9a4b622be018d92a790db10f4d5cf926bba512c..380bd8d90fdc7bddea2c846
> 5522a30f938c2ffc5 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -167,6 +167,8 @@ AARCH64_CORE("cortex-a510",  cortexa510,
> cortexa55, V9A,  (SVE2_BITPERM, MEMTAG,
> 
>  AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, V9A,
> (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd47, -1)
> 
> +AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,
> (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4d, -1)
> +
>  AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM,
> MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
> 
>  AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16,
> SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
> diff --git a/gcc/config/aarch64/aarch64-tune.md
> b/gcc/config/aarch64/aarch64-tune.md
> index
> 84e9bbf44f6222b3e5bcf4cbf8fab7ebf17015e1..f5b1482ba357d14f36e13ca3c
> 4358865d4238e9a 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -
>   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> derx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunder
> xt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,t
> hunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa
> 76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,co
> rtexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,oc
> teontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thun
> derx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cort
> exa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55
> ,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,neoversen2,
> demeter,neoversev2"
> +
>   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> derx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunder
> xt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,t
> hunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa
> 76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,co
> rtexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,oc
> teontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thun
> derx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cort
> exa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55
> ,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,n
> eoversen2,demeter,neoversev2"
>   (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-
> dependent-options/aarch64-options.rst
> index
> c2b23a6ee97ef2b7c74119f22c1d3e3d85385f4d..2e1bd6dbfb1fcff53dd562ec5
> e8923d0a21cf715 100644
> --- a/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> +++ b/gcc/doc/gcc/gcc-command-options/machine-dependent-
> options/aarch64-options.rst
> @@ -258,7 +258,8 @@ These options are defined for AArch64
> implementations:
>:samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,
>:samp:`cortex-a75.cortex-a55`, :samp:`cortex-a76.cortex-a55`,
>:samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x2`,
> -  :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`ampere1`,
> :samp:`native`.
> +  :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`cortex-a715`,
> :samp:`ampere1`,
> +  :samp:`native`.
> 
>The values :samp:`cortex-a57.cortex-a53`, :samp:`cortex-a72.cortex-a53`,
>:samp:`cortex-a73.cortex-a35`, :samp:

Re: [PATCH] range-op: Cleanup floating point multiplication and division fold_range [PR107569]

2022-11-11 Thread Aldy Hernandez via Gcc-patches




On 11/11/22 11:01, Jakub Jelinek wrote:

On Fri, Nov 11, 2022 at 09:52:53AM +0100, Jakub Jelinek via Gcc-patches wrote:

Ok, here is the patch rewritten in the foperator_div style, with special
cases handled first and then the ordinary cases without problematic cases.
I guess if/once we have a plugin testing infrastructure, we could compare
the two versions of the patch, I think this one is more precise.
And, admittedly there are many similar spots with the foperator_div case
(but also with significant differences), so perhaps if foperator_{mult,div}
inherit from some derived class from range_operator_float and that class
would define various smaller helper static? methods, like those
discussed in the PR - contains_zero_p, singleton_nan_p, zero_p,
that
+   bool must_have_signbit_zero = false;
+   bool must_have_signbit_nonzero = false;
+   if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+ {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ must_have_signbit_zero = true;
+   else
+ must_have_signbit_nonzero = true;
+ }
returned as -1/0/1 int, and those set result (based on the above value) to
[+INF, +INF], [-INF, -INF] or [-INF, +INF]
or
[+0, +0], [-0, -0] or [-0, +0]
or
[+0, +INF], [-INF, -0] or [-INF, +INF]
and the
+for (int i = 1; i < 4; ++i)
+  {
+   if (real_less (&cp[i], &cp[0])
+   || (real_iszero (&cp[0]) && real_isnegzero (&cp[i])))
+ std::swap (cp[i], cp[0]);
+   if (real_less (&cp[4], &cp[i + 4])
+   || (real_isnegzero (&cp[4]) && real_iszero (&cp[i + 4])))
+ std::swap (cp[i + 4], cp[4]);
+  }
block, it could be smaller and more readable.


Here is an incremental patch on top of this and division patch,
which does that.

2022-11-11  Jakub Jelinek  

PR tree-optimization/107569
* range-op-float.cc (foperator_mult_div_base): New class.
(foperator_mult, foperator_div): Derive from that and use
protected static methods from it to simplify the code.

--- gcc/range-op-float.cc.jj2022-11-11 10:13:30.879410560 +0100
+++ gcc/range-op-float.cc   2022-11-11 10:55:57.602617289 +0100
@@ -1911,7 +1911,125 @@ class foperator_minus : public range_ope
  } fop_minus;
  
  
-class foperator_mult : public range_operator_float

+class foperator_mult_div_base : public range_operator_float
+{
+protected:
+  // True if [lb, ub] is [+-0, +-0].
+  static bool zero_p (const REAL_VALUE_TYPE &lb,
+ const REAL_VALUE_TYPE &ub)
+  {
+return real_iszero (&lb) && real_iszero (&ub);
+  }
+
+  // True if +0 or -0 is in [lb, ub] range.
+  static bool contains_zero_p (const REAL_VALUE_TYPE &lb,
+  const REAL_VALUE_TYPE &ub)
+  {
+return (real_compare (LE_EXPR, &lb, &dconst0)
+   && real_compare (GE_EXPR, &ub, &dconst0));
+  }
+
+  // True if [lb, ub] is [-INF, -INF] or [+INF, +INF].
+  static bool singleton_inf_p (const REAL_VALUE_TYPE &lb,
+  const REAL_VALUE_TYPE &ub)
+  {
+return real_isinf (&lb) && real_isinf (&ub, real_isneg (&lb));
+  }
+
+  // Return -1 if binary op result must have sign bit set,
+  // 1 if binary op result must have sign bit clear,
+  // 0 otherwise.
+  // Sign bit of binary op result is exclusive or of the
+  // operand's sign bits.
+  static int signbit_known_p (const REAL_VALUE_TYPE &lh_lb,
+ const REAL_VALUE_TYPE &lh_ub,
+ const REAL_VALUE_TYPE &rh_lb,
+ const REAL_VALUE_TYPE &rh_ub)
+  {
+if (real_isneg (&lh_lb) == real_isneg (&lh_ub)
+   && real_isneg (&rh_lb) == real_isneg (&rh_ub))
+  {
+   if (real_isneg (&lh_lb) == real_isneg (&rh_ub))
+ return 1;
+   else
+ return -1;
+  }
+return 0;
+  }
+
+  // Set [lb, ub] to [-0, -0], [-0, +0] or [+0, +0] depending on
+  // signbit_known.
+  static void zero_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+ int signbit_known)
+  {
+ub = lb = dconst0;
+if (signbit_known <= 0)
+  lb = real_value_negate (&dconst0);
+if (signbit_known < 0)
+  ub = lb;
+  }
+
+  // Set [lb, ub] to [-INF, -INF], [-INF, +INF] or [+INF, +INF] depending on
+  // signbit_known.
+  static void inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+int signbit_known)
+  {
+if (signbit_known > 0)
+  ub = lb = dconstinf;
+else if (signbit_known < 0)
+  ub = lb = dconstninf;
+else
+  {
+   lb = dconstninf;
+   ub = dconstinf;
+  }
+  }
+
+  // Set [lb, ub] to [-INF, -0], [-INF, +INF] or [+0, +INF] depending on
+  // signbit_known.
+  static void zero_to_inf_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub,
+int signbit_known)
+  {
+if (signbit_known > 0)
+  {
+   lb = 

RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-11 Thread Tamar Christina via Gcc-patches
Hi,

> 
> ...can we use expand_vec_perm_const here?  It will try the constant
> expansion first, which is the preferred order.  It also has a few variations 
> up
> its sleeve.
> 

We can, however this function seems to be incorrectly assuming it can always
convert the input mode to a QI vector mode.  When I started using it we got a
number of miscompilations in the AArch64 codegen.  This had the knock-on
effect of uncovering bugs in both the AArch64 backend and i386.  I'll send
patches out for those separately.

For now here's the new patch using that hook and updating the permute expansion code:

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* expmed.cc (extract_bit_field_1): Add support for vector element
extracts.
* optabs.cc (expand_vec_perm_const): Add checks before converting
permute to QImode fallback.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ext_1.c: New.

--- inline copy of patch ---

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index bab020c07222afa38305ef8d7333f271b1965b78..7d38045ae525c8a4665a0c1384fc515e4de88c67 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1718,6 +1718,21 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
  return target;
}
}
+  else if (!known_eq (bitnum, 0U)
+  && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos))
+   {
+ /* The encoding has a single stepped pattern.  */
+ poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
+ vec_perm_builder sel (nunits, 1, 3);
+ sel.quick_push (pos);
+ sel.quick_push (pos + 1);
+ sel.quick_push (pos + 2);
+
+ rtx res
+   = expand_vec_perm_const (new_mode, op0, op0, sel, new_mode, NULL);
+ if (res)
+   return simplify_gen_subreg (tmode, res, new_mode, 0);
+   }
 }
 
   /* See if we can get a better vector mode before extracting.  */
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
   v0_qi = gen_lowpart (qimode, v0);
   v1_qi = gen_lowpart (qimode, v1);
   if (targetm.vectorize.vec_perm_const != NULL
+ && targetm.can_change_mode_class (mode, qimode, ALL_REGS)
  && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, v0_qi,
   v1_qi, qimode_indices))
return gen_lowpart (mode, target_qi);
@@ -6311,7 +6312,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
 }
 
   if (qimode != VOIDmode
-  && selector_fits_mode_p (qimode, qimode_indices))
+  && selector_fits_mode_p (qimode, qimode_indices)
+  && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
 {
   icode = direct_optab_handler (vec_perm_optab, qimode);
   if (icode != CODE_FOR_nothing)
diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
new file mode 100644
index ..18a10a14f1161584267a8472e571b3bc2ddf887a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+
+typedef unsigned int v4si __attribute__((vector_size (16)));
+typedef unsigned int v2si __attribute__((vector_size (8)));
+
+/*
+** extract: { xfail *-*-* }
+** ext v0.16b, v0.16b, v0.16b, #4
+** ret
+*/
+v2si extract (v4si x)
+{
+v2si res = {x[1], x[2]};
+return res;
+}
+
+/*
+** extract1: { xfail *-*-* }
+** ext v0.16b, v0.16b, v0.16b, #4
+** ret
+*/
+v2si extract1 (v4si x)
+{
+v2si res;
+memcpy (&res, ((int*)&x)+1, sizeof(res));
+return res;
+}
+
+typedef struct cast {
+  int a;
+  v2si b __attribute__((packed));
+} cast_t;
+
+typedef union Data {
+   v4si x;
+   cast_t y;
+} data;  
+
+/*
+** extract2:
+** ext v0.16b, v0.16b, v0.16b, #4
+** ret
+*/
+v2si extract2 (v4si x)
+{
+data d;
+d.x = x;
+return d.y.b;
+}
+




[PATCH] tree-optimization/107554 - fix ICE in strlen optimization

2022-11-11 Thread Richard Biener via Gcc-patches
The following fixes a wrongly typed variable causing an ICE.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107554
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes):
Use unsigned HOST_WIDE_INT type for the strlen.

* gcc.dg/pr107554.c: New testcase.

Co-Authored-By: Nikita Voronov 
---
 gcc/testsuite/gcc.dg/pr107554.c | 12 
 gcc/tree-ssa-strlen.cc  |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr107554.c

diff --git a/gcc/testsuite/gcc.dg/pr107554.c b/gcc/testsuite/gcc.dg/pr107554.c
new file mode 100644
index 000..8bbe6b07ae9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107554.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -foptimize-strlen" } */
+
+#define ELEMS 0x4000
+
+int a[ELEMS];
+int b[ELEMS];
+
+int main()
+{
+  __builtin_memcpy(a, b, ELEMS*sizeof(int));
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index b87c7c7ce1f..abec225566d 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -4735,7 +4735,7 @@ strlen_pass::count_nonzero_bytes (tree exp, gimple *stmt,
 
   /* Compute the number of leading nonzero bytes in the representation
  and update the minimum and maximum.  */
-  unsigned n = prep ? strnlen (prep, nbytes) : nbytes;
+  unsigned HOST_WIDE_INT n = prep ? strnlen (prep, nbytes) : nbytes;
 
   if (n < lenrange[0])
 lenrange[0] = n;
-- 
2.35.3


RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-11 Thread Tamar Christina via Gcc-patches
Hi,


> This name might cause confusion with the SVE iterators, where FULL means
> "every bit of the register is used".  How about something like VMOVE
> instead?
> 
> With this change, I guess VALL_F16 represents "The set of all modes for
> which the vld1 intrinsics are provided" and VMOVE or whatever is "All
> Advanced SIMD modes suitable for moving, loading, and storing".
> That is, VMOVE extends VALL_F16 with modes that are not manifested via
> intrinsics.
> 

Done.

> Where is the 2h used, and is it valid syntax in that context?
> 
> Same for later instances of 2h.

They are, but they weren't meant to be in this patch.  They belong in a
separate FP16 series that I won't get to finish for GCC 13 due to not being
able to finish writing all the tests.  I have moved them to that patch
series though.

While the addp patch series has been killed, this patch is still good
standalone and improves codegen as shown in the updated testcase.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
(mov, movmisalign, aarch64_dup_lane,
aarch64_store_lane0, aarch64_simd_vec_set,
@aarch64_simd_vec_copy_lane, vec_set,
reduc__scal_, reduc__scal_,
aarch64_reduc__internal, aarch64_get_lane,
vec_init, vec_extract): Support V2HF.
(aarch64_simd_dupv2hf): New.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
Add E_V2HFmode.
* config/aarch64/iterators.md (VHSDF_P): New.
(V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
Vel, q, vp): Add V2HF.
* config/arm/types.md (neon_fp_reduc_add_h): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/slp_1.c: Update testcase.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661e6c2d578fca4b7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,10 +19,10 @@
 ;; .
 
 (define_expand "mov"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
-   (match_operand:VALL_F16 1 "general_operand"))]
+  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
+   (match_operand:VMOVE 1 "general_operand"))]
   "TARGET_SIMD"
-  "
+{
   /* Force the operand into a register if it is not an
  immediate whose use can be replaced with xzr.
  If the mode is 16 bytes wide, then we will be doing
@@ -46,12 +46,11 @@ (define_expand "mov"
   aarch64_expand_vector_init (operands[0], operands[1]);
   DONE;
 }
-  "
-)
+})
 
 (define_expand "movmisalign"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
-(match_operand:VALL_F16 1 "general_operand"))]
+  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
+(match_operand:VMOVE 1 "general_operand"))]
   "TARGET_SIMD && !STRICT_ALIGNMENT"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
@@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup"
   [(set_attr "type" "neon_dup, neon_from_gp")]
 )
 
+(define_insn "aarch64_simd_dupv2hf"
+  [(set (match_operand:V2HF 0 "register_operand" "=w")
+   (vec_duplicate:V2HF
+ (match_operand:HF 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "@
+   sli\\t%d0, %d1, 16"
+  [(set_attr "type" "neon_shift_imm")]
+)
+
 (define_insn "aarch64_simd_dup"
   [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
(vec_duplicate:VDQF_F16
@@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup"
 )
 
 (define_insn "aarch64_dup_lane"
-  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
-   (vec_duplicate:VALL_F16
+  [(set (match_operand:VMOVE 0 "register_operand" "=w")
+   (vec_duplicate:VMOVE
  (vec_select:
-   (match_operand:VALL_F16 1 "register_operand" "w")
+   (match_operand:VMOVE 1 "register_operand" "w")
(parallel [(match_operand:SI 2 "immediate_operand" "i")])
   )))]
   "TARGET_SIMD"
@@ -142,6 +151,29 @@ (define_insn "*aarch64_simd_mov"
 mov_reg, neon_move")]
 )
 
+(define_insn "*aarch64_simd_movv2hf"
+  [(set (match_operand:V2HF 0 "nonimmediate_operand"
+   "=w, m,  m,  w, ?r, ?w, ?r, w, w")
+   (match_operand:V2HF 1 "general_operand"
+   "m,  Dz, w,  w,  w,  r,  r, Dz, Dn"))]
+  "TARGET_SIMD_F16INST
+   && (register_operand (operands[0], V2HFmode)
+   || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
+   "@
+ldr\\t%s0, %1
+str\\twzr, %0
+str\\t%s1, %0
+mov\\t%0.2s[0], %1.2s[0]
+umov\\t%w0, %1.s[0]
+fmov\\t%s0, %1
+mov\\t%0, %1
+movi\\t%d0, 0
+* return aarch64_output_simd_mov_immediate (operands[1], 32);"
+  [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
+neon_logic, neon_to_gp, f_mcr,\
+   

RE: [PATCH]AArch64 Extend umov and sbfx patterns.

2022-11-11 Thread Tamar Christina via Gcc-patches
Hi,

> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -4259,7 +4259,7 @@ (define_insn
> "*aarch64_get_lane_zero_extend"
> >  ;; Extracting lane zero is split into a simple move when it is
> > between SIMD  ;; registers or a store.
> >  (define_insn_and_split "aarch64_get_lane"
> > -  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand"
> > "=?r, w, Utv")
> > +  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand"
> > + "=r, w, Utv")
> > (vec_select:
> >   (match_operand:VALL_F16_FULL 1 "register_operand" "w, w, w")
> >   (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
> 
> Which testcase does this help with?  It didn't look like the new tests do any
> vector stuff.
> 

Right, sorry about that, splitting up my patches resulted in this sneaking in 
from a different series.
Moved now.

> > -(define_insn "*_ashl"
> > +(define_insn "*_ashl"
> >[(set (match_operand:GPI 0 "register_operand" "=r")
> > (ANY_EXTEND:GPI
> > -(ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
> > +(ashift:ALLX (match_operand:ALLX 1 "register_operand" "r")
> >(match_operand 2 "const_int_operand" "n"]
> > -  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
> > +  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
> 
> It'd be better to avoid even defining si<-si or si<-di "extensions"
> (even though nothing should try to match them), so how about adding:
> 
>>  &&
> 
> or similar to the beginning of the condition?  The conditions for the invalid
> combos will then be provably false at compile time and the patterns will be
> compiled out.
> 

Done.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md
(*_ashl): Renamed to...
(*_ashl): ...this.
(*zero_extend_lshr): Renamed to...
(*zero_extend_lshr): ...this.
(*extend_ashr): Rename to...
(*extend_ashr): ...this.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bitmove_1.c: New test.
* gcc.target/aarch64/bitmove_2.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d7684c93fba5b717d568e1a4fd712bde55c7c72e..d230bbb833f97813c8371aa07b587bd8b0292cee 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5711,40 +5711,43 @@ (define_insn "*extrsi5_insn_di"
   [(set_attr "type" "rotate_imm")]
 )
 
-(define_insn "*_ashl"
+(define_insn "*_ashl"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(ANY_EXTEND:GPI
-(ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
+(ashift:ALLX (match_operand:ALLX 1 "register_operand" "r")
   (match_operand 2 "const_int_operand" "n"]
-  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
+  " > 
+   && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
 {
-  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
+  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "bfiz\t%0, %1, %2, %3";
 }
   [(set_attr "type" "bfx")]
 )
 
-(define_insn "*zero_extend_lshr"
+(define_insn "*zero_extend_lshr"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(zero_extend:GPI
-(lshiftrt:SHORT (match_operand:SHORT 1 "register_operand" "r")
-(match_operand 2 "const_int_operand" "n"]
-  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
+(lshiftrt:ALLX (match_operand:ALLX 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n"]
+  " > 
+   && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
 {
-  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
+  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "ubfx\t%0, %1, %2, %3";
 }
   [(set_attr "type" "bfx")]
 )
 
-(define_insn "*extend_ashr"
+(define_insn "*extend_ashr"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(sign_extend:GPI
-(ashiftrt:SHORT (match_operand:SHORT 1 "register_operand" "r")
-(match_operand 2 "const_int_operand" "n"]
-  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
+(ashiftrt:ALLX (match_operand:ALLX 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n"]
+  " > 
+   && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
 {
-  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
+  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "sbfx\\t%0, %1, %2, %3";
 }
   [(set_attr "type" "bfx")]
diff --git a/gcc/testsuite/gcc.target/aarch64/bitmove_1.c b/gcc/testsuite/gcc.target/aarch64/bitmove_1.c
new file mode 100644
index ..5ea4265f55213d7e7e5193a3a3681c9350867b50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bitmove_1.c
@@ -0,0 +1,76 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -std=c99" } */
+/*

[PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-11-11 Thread Tamar Christina via Gcc-patches
Hi All,

While writing a patch series I started getting incorrect codegen out from
VEC_PERM on partial struct types.

It turns out that this was happening because the TARGET_CAN_CHANGE_MODE_CLASS
implementation has a slight bug in it.  The hook only checked for SIMD to
partial but never partial to SIMD.  This resulted in incorrect subregs being
generated by the fallback code in VEC_PERM_EXPR expansions.

I have unfortunately not been able to trigger it using a standalone testcase as
the mid-end optimizes away the permute every time I try to describe a permute
that would result in the bug.

The patch now rejects any conversion of partial SIMD struct types, unless they
are both partial structures with the same number of registers or one is a SIMD
type whose size is no larger than 8 bytes.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? And backport to GCC 12?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_can_change_mode_class): Restrict
conversions between partial struct types properly.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d3c3650d7d728f56adb65154127dc7b72386c5a7..84dbe2f4ea7d03b424602ed98a34e7824217dc91 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26471,9 +26471,10 @@ aarch64_can_change_mode_class (machine_mode from,
   bool from_pred_p = (from_flags & VEC_SVE_PRED);
   bool to_pred_p = (to_flags & VEC_SVE_PRED);
 
-  bool from_full_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT));
   bool to_partial_advsimd_struct_p = (to_flags == (VEC_ADVSIMD | VEC_STRUCT
   | VEC_PARTIAL));
+  bool from_partial_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT
+  | VEC_PARTIAL));
 
   /* Don't allow changes between predicate modes and other modes.
  Only predicate registers can hold predicate modes and only
@@ -26496,9 +26497,23 @@ aarch64_can_change_mode_class (machine_mode from,
 return false;
 
   /* Don't allow changes between partial and full Advanced SIMD structure
- modes.  */
-  if (from_full_advsimd_struct_p && to_partial_advsimd_struct_p)
-return false;
+ modes unless both are a partial struct with the same number of registers
+ or the vector bitsizes are the same.  */
+  if (to_partial_advsimd_struct_p ^ from_partial_advsimd_struct_p)
+{
+  /* If they're both partial structures, allow if they have the same number
+of registers.  */
+  if (to_partial_advsimd_struct_p == from_partial_advsimd_struct_p)
+   return known_eq (GET_MODE_SIZE (from), GET_MODE_SIZE (to));
+
+  /* If one is a normal SIMD register, allow only if no larger than 64-bit.  */
+  if ((to_flags & VEC_ADVSIMD) == to_flags)
+   return known_le (GET_MODE_SIZE (to), 8);
+  else if ((from_flags & VEC_ADVSIMD) == from_flags)
+   return known_le (GET_MODE_SIZE (from), 8);
+
+  return false;
+}
 
   if (maybe_ne (BITS_PER_SVE_VECTOR, 128u))
 {





[PATCH][i386]: Update ix86_can_change_mode_class target hook to accept QImode conversions

2022-11-11 Thread Tamar Christina via Gcc-patches
Hi All,

The current i386 implementation of the TARGET_CAN_CHANGE_MODE_CLASS hook is
not useful before regalloc.

In particular before regalloc optimization passes query the hook using ALL_REGS,
but because of the

  if (MAYBE_FLOAT_CLASS_P (regclass))
  return false;

The hook returns false for all modes, even integer ones because ALL_REGS
overlaps with floating point regs.

The vector permute fallback cases used to unconditionally convert vector integer
permutes to vector QImode ones as a fallback plan.  This is incorrect and can
result in incorrect code if the target doesn't support this conversion.

To fix this some more checks were added, however that ended up introducing ICEs
in the i386 backend because e.g. the hook would reject conversions between modes
like V2TImode and V32QImode.

My understanding is that for x87 we don't want to allow floating point
conversions, but integers are fine.  So I have modified the check such that it
also checks the modes, not just the register class groups.

The second part of the code is needed because now that integer modes aren't
uniformly rejected, the i386 backend triggers further optimizations.  However the
backend lacks instructions to deal with canonical RTL representations of
certain instructions.  for instance the back-end seems to prefer vec_select 0
instead of subregs.

So to prevent the canonicalization I reject integer modes when the sizes of to
and from don't match and when we would have exited with false previously.

This fixes all the ICEs and codegen regressions, but perhaps an x86 maintainer
should take a deeper look at this hook implementation.

Bootstrapped Regtested on x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/i386/i386.cc (ix86_can_change_mode_class): Update the target
hook.

--- inline copy of patch -- 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c4d0e36e9c0a2256f5dde1f4dc021c0328aa0cba..477dd007ea80272680751b61e35cc3eec79b66c3 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19682,7 +19682,15 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
 
   /* x87 registers can't do subreg at all, as all values are reformatted
  to extended precision.  */
-  if (MAYBE_FLOAT_CLASS_P (regclass))
+  if (MAYBE_FLOAT_CLASS_P (regclass)
+  && VALID_FP_MODE_P (from)
+  && VALID_FP_MODE_P (to))
+return false;
+
+  /* Reject integer modes if the sizes aren't the same.  It would have
+ normally exited above.  */
+  if (MAYBE_FLOAT_CLASS_P (regclass)
+  && GET_MODE_SIZE (from) != GET_MODE_SIZE (to))
 return false;
 
   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))









[PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-11 Thread Wilco Dijkstra via Gcc-patches
Add a new option -mdirect-extern-access similar to other targets.  This removes
GOT indirections on external symbols with -fPIE, resulting in significantly
better code quality.  With -fPIC it only affects protected symbols, allowing
for more efficient shared libraries which can be linked with standard PIE
binaries (this is what LLVM does by default, so this improves interoperability
with LLVM). This patch doesn't affect ABI, but in the future GCC and LLVM
should converge to using the same ABI.

Regress and bootstrap pass, OK for commit?

gcc/
* config/aarch64/aarch64.cc (aarch64_binds_local_p): New function.
(aarch64_symbol_binds_local_p): Refactor, support direct extern access.
* config/aarch64/aarch64-linux.h (TARGET_BINDS_LOCAL_P):
Use aarch64_binds_local_p.
* config/aarch64/aarch64-freebsd.h (TARGET_BINDS_LOCAL_P): Likewise.
* config/aarch64/aarch64-protos.h: Add aarch64_binds_local_p.
* doc/gcc/gcc-command-options/option-summary.rst: Add
-mdirect-extern-access.
* 
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
Add description of -mdirect-extern-access.

gcc/testsuite/
* gcc.target/aarch64/pr66912-2.c: New test.

---

diff --git a/gcc/config/aarch64/aarch64-freebsd.h b/gcc/config/aarch64/aarch64-freebsd.h
index 13beb3781b61afd82d767884f3c16ff8eead09cc..20bc0f48e484686cd3754613bf20bb3521079d48 100644
--- a/gcc/config/aarch64/aarch64-freebsd.h
+++ b/gcc/config/aarch64/aarch64-freebsd.h
@@ -71,7 +71,7 @@
strong definitions in dependent shared libraries, will resolve
to COPY relocated symbol in the executable.  See PR65780.  */
 #undef TARGET_BINDS_LOCAL_P
-#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
+#define TARGET_BINDS_LOCAL_P aarch64_binds_local_p
 
 /* Use the AAPCS type for wchar_t, override the one from
config/freebsd.h.  */
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 5e4553d79f5053f2da0eb381e0805f47aec964ae..6c962402155d60b82610d4f65af5182d6faa47ad 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -70,7 +70,7 @@
strong definitions in dependent shared libraries, will resolve
to COPY relocated symbol in the executable.  See PR65780.  */
 #undef TARGET_BINDS_LOCAL_P
-#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
+#define TARGET_BINDS_LOCAL_P aarch64_binds_local_p
 
 /* Define this to be nonzero if static stack checking is supported.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 238820581c5ee7617f8eed1df2cf5418b1127e19..fac754f78c1d7606ba90e1034820a62466b96b63 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1072,5 +1072,6 @@ const char *aarch64_sls_barrier (int);
 const char *aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
+extern bool aarch64_binds_local_p (const_tree);
 
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d1f979ebcf80333d957f8ad8631deef47dc693a5..ab4c42c34da5b15f6739c9b0a7ebaafda9488f2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19185,9 +19185,29 @@ aarch64_tlsdesc_abi_id ()
 static bool
 aarch64_symbol_binds_local_p (const_rtx x)
 {
-  return (SYMBOL_REF_DECL (x)
- ? targetm.binds_local_p (SYMBOL_REF_DECL (x))
- : SYMBOL_REF_LOCAL_P (x));
+  if (!SYMBOL_REF_DECL (x))
+return SYMBOL_REF_LOCAL_P (x);
+
+  if (targetm.binds_local_p (SYMBOL_REF_DECL (x)))
+return true;
+
+  /* In PIE binaries avoid a GOT indirection on non-weak data symbols if
+ aarch64_direct_extern_access is true.  */
+  if (flag_pie && aarch64_direct_extern_access && !SYMBOL_REF_WEAK (x)
+  && !SYMBOL_REF_FUNCTION_P (x))
+return true;
+
+  return false;
+}
+
+/* Implement TARGET_BINDS_LOCAL_P hook.  */
+
+bool
+aarch64_binds_local_p (const_tree exp)
+{
+  /* Protected symbols are local if aarch64_direct_extern_access is true.  */
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true,
+ !aarch64_direct_extern_access, !flag_pic);
 }
 
 /* Return true if SYMBOL_REF X is thread local */
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index b89b20450710592101b93f4f3b5dc33d152d1eb6..6251a36b544a03955361b445c9f5dfad3740eea8 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -299,3 +299,7 @@ Constant memset size in bytes from which to start using MOPS sequence.
 -param=aarch64-vect-unroll-limit=
 Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
 Limit how much the autovectorizer may unroll a loop.
+
+mdirect-extern-access
+Target Var(aarch64_direct_extern_access) Init(0)
+Do not indirect accesses to external

Re: [PATCH] range-op: Implement op[12]_range operators for {PLUS,MINUS,MULT,RDIV}_EXPR

2022-11-11 Thread Andrew MacLeod via Gcc-patches



On 11/11/22 09:22, Aldy Hernandez wrote:




A return of false means the operation is not handled, similar to what 
the default operators defined at the top of range-op*.cc do. The 
caller (gori?) is free to disregard the range altogether.  In practice 
this means VARYING, so you're getting the same behavior. But you 
should probably return TRUE,



When false is returned, the range is supposed to be ignored as it is not
guaranteed to be set.  It means, "I can't tell you anything additional to
what is already known" (which is similar to returning VARYING).


Andrew



RE: [PATCH][GCC] aarch64: Add support for Cortex-A715 CPU.

2022-11-11 Thread Srinath Parvathaneni via Gcc-patches
Hi,

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Friday, November 11, 2022 2:24 PM
> To: Srinath Parvathaneni ; gcc-
> patc...@gcc.gnu.org
> Cc: Richard Sandiford 
> Subject: RE: [PATCH][GCC] aarch64: Add support for Cortex-A715 CPU.
> 
> Hi Srinath,
> 
> > -Original Message-
> > From: Srinath Parvathaneni 
> > Sent: Friday, November 11, 2022 11:58 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Kyrylo Tkachov
> > 
> > Subject: [PATCH][GCC] aarch64: Add support for Cortex-A715 CPU.
> >
> > Hi,
> >
> > This patch adds support for Cortex-A715 CPU.
> >
> > Bootstrapped on aarch64-none-linux-gnu and found no regressions.
> >
> > Ok for GCC master?
> >
> 
> Ok. Please make sure aarch64-tune.md is appropriately regenerated in
> combination with the other -mcpu options you're adding.

Thank you Kyrill for approving the patches.  I have re-checked that
aarch64-tune.md was appropriately regenerated for the added CPUs and
committed the code to GCC master.

Regards,
Srinath.

> Thanks,
> Kyrill
>
> > Regards,
> > Srinath.
> >
> > gcc/ChangeLog:
> >
> > 2022-11-09  Srinath Parvathaneni  
> >
> > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add
> > Cortex-A715 CPU.
> > * config/aarch64/aarch64-tune.md: Regenerate.
> > *
> > doc/gcc/gcc-command-options/machine-dependent-options/aarch64-
> > options.rst:
> > Document Cortex-A715 CPU.
> >
> >
> > ### Attachment also inlined for ease of reply
> > ###
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def
> > b/gcc/config/aarch64/aarch64-cores.def
> > index
> >
> e9a4b622be018d92a790db10f4d5cf926bba512c..380bd8d90fdc7bddea2c846
> > 5522a30f938c2ffc5 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -167,6 +167,8 @@ AARCH64_CORE("cortex-a510",  cortexa510,
> > cortexa55, V9A,  (SVE2_BITPERM, MEMTAG,
> >
> >  AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, V9A,
> > (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd47, -1)
> >
> > +AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,
> > (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4d, -1)
> > +
> >  AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM,
> > MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
> >
> >  AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
> > diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
> > index 84e9bbf44f6222b3e5bcf4cbf8fab7ebf17015e1..f5b1482ba357d14f36e13ca3c4358865d4238e9a 100644
> > --- a/gcc/config/aarch64/aarch64-tune.md
> > +++ b/gcc/config/aarch64/aarch64-tune.md
> > @@ -1,5 +1,5 @@
> >  ;; -*- buffer-read-only: t -*-
> >  ;; Generated automatically by gentune.sh from aarch64-cores.def
> > (define_attr "tune"
> > -   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,neoversen2,demeter,neoversev2"
> > +   "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,neoversen2,demeter,neoversev2"
> > (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> > diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
> > index c2b23a6ee97ef2b7c74119f22c1d3e3d85385f4d..2e1bd6dbfb1fcff53dd562ec5e8923d0a21cf715 100644
> > --- a/gcc/doc/gcc/gcc-command-optio

[PATCH][GCC] aarch64: Add support for Cortex-X3 CPU.

2022-11-11 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This patch adds support for Cortex-X3 CPU.

Bootstrapped on aarch64-none-linux-gnu and found no regressions.

Ok for GCC master?

Regards,
Srinath.

gcc/ChangeLog:

2022-11-09  Srinath Parvathaneni  

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-X3 CPU.
* config/aarch64/aarch64-tune.md: Regenerate.
* 
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
Document Cortex-X3 CPU.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
3055da9b268b6b71bc3bd6db721812b387e8dd44..a2062468136bf1c38b941c53868d26dafedda276
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -172,6 +172,8 @@ AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG,
 
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
 
+AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd4e, -1)
+
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
 
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
22ec1be5a4c71b930221d2c4f1e62df57df0cadf..74c4384712b202058a58f1da0ca28adec97a6b9b
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,neoversen2,demeter,neoversev2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
index 
d97515d9e54feaa85a2ead4e9b73f0eb966cb39f..7cc369ef95e510e30873159b8e2130c4f77a57d3
 100644
--- 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
+++ 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
@@ -258,8 +258,8 @@ These options are defined for AArch64 implementations:
   :samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,
   :samp:`cortex-a75.cortex-a55`, :samp:`cortex-a76.cortex-a55`,
   :samp:`cortex-r82`, :samp:`cortex-x1`, :samp:`cortex-x1c`, :samp:`cortex-x2`,
-  :samp:`cortex-a510`, :samp:`cortex-a710`, :samp:`cortex-a715`, 
:samp:`ampere1`,
-  :samp:`native`.
+  :samp:`cortex-x3`, :samp:`cortex-a510`, :samp:`cortex-a710`,
+  :samp:`cortex-a715`, :samp:`ampere1`, :samp:`native`.
 
   The values :samp:`cortex-a57.cortex-a53`, :samp:`cortex-a72.cortex-a53`,
   :samp:`cortex-a73.cortex-a35`, :samp:`cortex-a73.cortex-a53`,



diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
3055da9b268b6b71bc3bd6db721812b387e8dd44..a2062468136bf1c38b941c53868d26dafedda276
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -172,6 +172,8 @@ AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG,
 
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2,

[wwwdocs] projects/gomp: TR11 + GCC13 update

2022-11-11 Thread Tobias Burnus via Gcc-patches
This patch adds TR11 to the history of OpenMP releases and updates the
implementation status.


OK?

Tobias

PS: The implementation-status changes had been lying around in that file for
a while. I think both the GCC 13 release notes and this file need some
updates for more recent changes. Nonetheless, while incomplete, the
changes themselves should be fine.
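As a rough sketch of the doacross syntax mentioned in the status table below (my own minimal example, not taken from the patch): `doacross(sink: i - 1)` is the new alias for `depend(sink: i - 1)` and `doacross(source:)` for `depend(source)` on the ordered directive. Compiled without -fopenmp the pragmas are simply ignored and the loop runs serially, producing the same result.

```c
#include <assert.h>

/* Sketch of the new doacross clause (a minimal invented example):
   each iteration waits for iteration i - 1 before reading a[i - 1],
   expressing the cross-iteration dependence explicitly.  Without
   -fopenmp the pragmas are ignored and the loop is plain serial code
   with identical results.  */
int prefix_chain (int n)
{
  int a[64];
  a[0] = 0;
#pragma omp parallel for ordered(1)
  for (int i = 1; i < n; i++)
    {
#pragma omp ordered doacross(sink: i - 1)
      a[i] = a[i - 1] + 1;          /* depends on the previous iteration */
#pragma omp ordered doacross(source:)
    }
  return a[n - 1];
}
```

For n == 16 the chain yields 15 whether the loop runs serially or with the ordered dependences enforced.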
projects/gomp: TR11 + GCC13 update

 htdocs/projects/gomp/index.html | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 713a4e16..46f393c8 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -677,7 +677,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 device-specific ICV settings with environment variables
-No
+GCC 13
 
   
   
@@ -771,10 +771,10 @@ than listed, depending on resolved corner cases and optimizations.
 No
 
   
-  
+  
 omp/ompx/omx sentinels and omp_/ompx_ namespaces
 N/A
-
+warning for ompx/omx sentinels (1)
   
   
 Clauses on end directive can be on directive
@@ -888,7 +888,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 New doacross clause as alias for depend with source/sink modifier
-No
+GCC 13
 
   
   
@@ -898,7 +898,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 omp_cur_iteration keyword
-No
+GCC 13
 
   
   
@@ -924,9 +924,22 @@ than listed, depending on resolved corner cases and optimizations.
 
 
 
+(1) The
+ompx sentinel as C/C++ pragma and C++ attributes are warned for
+with -Wunknown-pragmas (implied by -Wall) and
+-Wattributes (enabled by default), respectively; for Fortran
+free-source code, there is a warning enabled by default and, for fixed-source
+code, the omx sentinel is warned for with -Wsurprising
+(enabled by -Wall). Unknown clauses are always rejected with an
+error.
 
 OpenMP Releases and Status
 
+November 9, 2022
+https://www.openmp.org/wp-content/uploads/openmp-TR11.pdf";>OpenMP
+Technical Report 11 (first preview for the OpenMP API Version 6.0) has been
+released.
+
 November 9, 2021
 https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf";>OpenMP
 Version 5.2 has been released.


Re: [PATCH (pushed)] sphinx: stop using parallel mode

2022-11-11 Thread Andrew Pinski via Gcc-patches
On Fri, Nov 11, 2022 at 4:34 AM Martin Liška  wrote:
>
> Noticed that the documentation build can get stuck on a machine with
> many cores (160), and I identified a real sphinx problem:
> https://github.com/sphinx-doc/sphinx/issues/10969
>
> Note that parallelism helps only for some manuals, and it is not
> critical for us.

This alone should cause us to pause and just revert back to texinfo.
People are not going to upgrade sphinx all the time just to get fixes
for documentation layout.
Texinfo is stable and we should just revert back to it.

Thanks,
Andrew Pinski

>
> ChangeLog:
>
> * doc/Makefile: Disable -j auto.
> ---
>  doc/Makefile | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/doc/Makefile b/doc/Makefile
> index 9e305a8e7da..e08a43ecf2d 100644
> --- a/doc/Makefile
> +++ b/doc/Makefile
> @@ -2,7 +2,11 @@
>  #
>
>  # You can set these variables from the command line.
> -SPHINXOPTS   ?= -j auto -q
> +
> +# Disable parallel reading as it can be very slow on a machine with many CPUs:
> +# https://github.com/sphinx-doc/sphinx/issues/10969
> +
> +SPHINXOPTS   ?= -q
>  SPHINXBUILD  ?= sphinx-build
>  PAPER?=
>  SOURCEDIR = .
> --
> 2.38.1
>


[COMMITTED] process transitive inferred ranges in pre_fold_stmt.

2022-11-11 Thread Andrew MacLeod via Gcc-patches
I was processing the transitive inferred ranges in fold_stmt when it was 
the final statement in the block.  The substitute_and_fold engine 
actually does a bit of work before calling fold_stmt.  This patch moves 
the check to pre_fold_stmt instead so it gets done before the final 
statement in the block is processed... as was the original intention.


I also changed it so we always do this just before the last statement in 
any block.  This allows us to get transitive inferred ranges registered 
for returns, as well as for normal blocks that can feed other blocks. 
  The performance impact is minimal.
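For readers unfamiliar with inferred ranges, here is a rough source-level sketch of the kind of transitive inference involved (function and variable names invented, not taken from the patch):

```c
#include <assert.h>
#include <stddef.h>

/* The dereference of p lets ranger record an inferred range: p is
   non-null from that statement onwards.  Once the equivalence p == q is
   established later in the block, re-parsing the block can propagate
   that range transitively to q, so the q == NULL test becomes
   foldable.  */
int use (int *p, int *q)
{
  int sum = *p;        /* inferred: p is non-null after this statement */
  if (p == q)
    {
      if (q == NULL)   /* transitively inferred: q is non-null here */
        return -1;     /* unreachable; VRP can delete this branch */
      sum += *q;
    }
  return sum;
}
```

With x == 21, `use (&x, &x)` returns 42 and the inner null test is dead code.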


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From dab5d73959cfc8f03cba548777adda9a798e1f0e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 9 Nov 2022 10:58:15 -0500
Subject: [PATCH] process transitive inferred ranges in pre_fold_stmt.

The subst_and_fold engine can perform some folding activity before
calling fold_stmt, so do this work in pre_fold_stmt instead.

	* tree-vrp.cc (rvrp_folder::rvrp_folder): Init m_last_bb_stmt.
	(rvrp_folder::pre_fold_bb): Set m_last_bb_stmt.
	(rvrp_folder::pre_fold_stmt): Check for transitive inferred ranges.
	(rvrp_folder::fold_stmt): Check in pre_fold_stmt instead.
---
 gcc/tree-vrp.cc | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 3393c73a7db..a474d9d11e5 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -4442,6 +4442,7 @@ public:
   {
 m_ranger = r;
 m_pta = new pointer_equiv_analyzer (m_ranger);
+m_last_bb_stmt = NULL;
   }
 
   ~rvrp_folder ()
@@ -4485,6 +4486,7 @@ public:
 for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
 	 gsi_next (&gsi))
   m_ranger->register_inferred_ranges (gsi.phi ());
+m_last_bb_stmt = last_stmt (bb);
   }
 
   void post_fold_bb (basic_block bb) override
@@ -4497,19 +4499,14 @@ public:
   void pre_fold_stmt (gimple *stmt) override
   {
 m_pta->visit_stmt (stmt);
+// If this is the last stmt and there are inferred ranges, reparse the
+// block for transitive inferred ranges that occur earlier in the block.
+if (stmt == m_last_bb_stmt)
+  m_ranger->register_transitive_inferred_ranges (gimple_bb (stmt));
   }
 
   bool fold_stmt (gimple_stmt_iterator *gsi) override
   {
-gimple *s = gsi_stmt (*gsi);
-// If this is a block ending condition, and there are inferred ranges,
-// reparse the block to see if there are any transitive inferred ranges.
-if (is_a (s))
-  {
-	basic_block bb = gimple_bb (s);
-	if (bb && s == gimple_outgoing_range_stmt_p (bb))
-	  m_ranger->register_transitive_inferred_ranges (bb);
-  }
 bool ret = m_simplifier.simplify (gsi);
 if (!ret)
   ret = m_ranger->fold_stmt (gsi, follow_single_use_edges);
@@ -4523,6 +4520,7 @@ private:
   gimple_ranger *m_ranger;
   simplify_using_ranges m_simplifier;
   pointer_equiv_analyzer *m_pta;
+  gimple *m_last_bb_stmt;
 };
 
 /* Main entry point for a VRP pass using just ranger. This can be called
-- 
2.37.3



[PATCH] Handle epilogues that contain jumps

2022-11-11 Thread Richard Sandiford via Gcc-patches
The prologue/epilogue pass allows the prologue sequence
to contain jumps.  The sequence is then partitioned into
basic blocks using find_many_sub_basic_blocks.

This patch treats epilogues in the same way.  It's needed for
a follow-on aarch64 patch that adds conditional code to both
the prologue and the epilogue.

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* function.cc (thread_prologue_and_epilogue_insns): Handle
epilogues that contain jumps.
---
 gcc/function.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/function.cc b/gcc/function.cc
index d3da20ede7f..b54a1d81a3b 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6136,6 +6136,11 @@ thread_prologue_and_epilogue_insns (void)
  && returnjump_p (BB_END (e->src)))
e->flags &= ~EDGE_FALLTHRU;
}
+
+ auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+ bitmap_clear (blocks);
+   bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->index);
+ find_many_sub_basic_blocks (blocks);
}
   else if (next_active_insn (BB_END (exit_fallthru_edge->src)))
{
@@ -6234,6 +6239,11 @@ thread_prologue_and_epilogue_insns (void)
  set_insn_locations (seq, epilogue_location);
 
  emit_insn_before (seq, insn);
+
+ auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+ bitmap_clear (blocks);
+ bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index);
+ find_many_sub_basic_blocks (blocks);
}
 }
 
-- 
2.25.1



[PATCH] Allow prologues and epilogues to be inserted later

2022-11-11 Thread Richard Sandiford via Gcc-patches
Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions.  It can also change the current vector length.

There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.

It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well.  (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)

This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.
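As a rough illustration of how the two complementary pass gates behave, here is a toy model in plain C (all names are invented mocks; the real mechanism is the TARGET_USE_LATE_PROLOGUE_EPILOGUE hook dispatched through targetm, gating the existing pass and its new late twin):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of the gate pairing: the normal prologue/epilogue pass runs
   when the hook returns false, the late variant when it returns true,
   so each function gets exactly one insertion point.  */
typedef struct { bool (*use_late_prologue_epilogue) (void); } targetm_mock;

static bool hook_false (void) { return false; }  /* the default */
static bool hook_true (void)  { return true; }   /* e.g. a streaming-mode target */

const char *insertion_pass (const targetm_mock *t)
{
  if (!t->use_late_prologue_epilogue ())  /* gate of the existing pass */
    return "pro_and_epilogue";
  return "late_pro_and_epilogue";         /* runs close to md_reorg */
}
```

Because the gates are exact complements, no function can get its prologue and epilogue inserted twice or not at all.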

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* target.def (use_late_prologue_epilogue): New hook.
* doc/gccint/target-macros/miscellaneous-parameters.rst: Add
TARGET_USE_LATE_PROLOGUE_EPILOGUE.
* doc/gccint/target-macros/tm.rst.in: Regenerate.
* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
(pass_data_late_thread_prologue_and_epilogue): New pass variable.
(pass_late_thread_prologue_and_epilogue): New pass class.
(make_pass_late_thread_prologue_and_epilogue): New function.
---
 .../miscellaneous-parameters.rst  |  5 +++
 gcc/doc/gccint/target-macros/tm.rst.in| 22 ++
 gcc/function.cc   | 43 +++
 gcc/passes.def|  3 ++
 gcc/target.def| 21 +
 gcc/tree-pass.h   |  2 +
 6 files changed, 96 insertions(+)

diff --git a/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst 
b/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
index e4e348c2adc..b48f91d3fd2 100644
--- a/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
+++ b/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
@@ -551,6 +551,11 @@ Here are several miscellaneous parameters.
   of the if-block in the ``struct ce_if_block`` structure that is pointed
   to by :samp:`{ce_info}`.
 
+.. include:: tm.rst.in
+  :start-after: [TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+  :end-before: [TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+
+
 .. include:: tm.rst.in
   :start-after: [TARGET_MACHINE_DEPENDENT_REORG]
   :end-before: [TARGET_MACHINE_DEPENDENT_REORG]
diff --git a/gcc/doc/gccint/target-macros/tm.rst.in 
b/gcc/doc/gccint/target-macros/tm.rst.in
index 44f3a3b..2e789f8723d 100644
--- a/gcc/doc/gccint/target-macros/tm.rst.in
+++ b/gcc/doc/gccint/target-macros/tm.rst.in
@@ -3702,6 +3702,28 @@
 
 [TARGET_CC_MODES_COMPATIBLE]
 
+[TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+.. function:: bool TARGET_USE_LATE_PROLOGUE_EPILOGUE ()
+
+  Return true if the current function's prologue and epilogue should
+  be emitted late in the pass pipeline, instead of at the usual point.
+  
+  Normally, the prologue and epilogue sequences are introduced soon after
+  register allocation is complete.  The advantage of this approach is that
+  it allows the prologue and epilogue instructions to be optimized and
+  scheduled with other code in the function.  However, some targets
+  require the prologue and epilogue to be the first and last sequences
+  executed by the function, with no variation allowed.  This hook should
+  return true on such targets.
+  
+  The default implementation returns false, which is correct for most
+  targets.  The hook should only return true if there is a specific
+  target limitation that cannot be described in RTL.  For example,
+  the hook might return true if the prologue and epilogue need to switch
+  between instruction sets.
+
+[TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+
 [TARGET_MACHINE_DEPENDENT_REORG]
 .. function:: void TARGET_MACHINE_DEPENDENT_REORG (void)
 
diff --git a/gcc/function.cc b/gcc/function.cc
index b54a1d81a3b..3b1ab5d09e5 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6641,6 +6641,11 @@ public:
   {}
 
   /* opt_pass methods: */
+  bool gate (function *) final override
+{
+  return !targetm.use_late_prologue_epilogue ();
+}
+
   unsigned int execute (function *) final override
 {
   return rest_of_handle_thread_prologue_and_epilogue ();
@@ -6648,6 +6653,38 @@ public:
 
 }; // class pass_thread_prologue_and_epilogue
 
+const pass_data pass_data_late_thread_prologue_and_epilogue =
+{
+  RTL_PASS, /* type */
+  "late_pro_and_epilogue", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_THREAD_PROLOGUE_AND_EPILOGUE, /* tv_id */
+  0, /* propert

[PATCH] Add a target hook for sibcall epilogues

2022-11-11 Thread Richard Sandiford via Gcc-patches
Epilogues for sibling calls are generated using the
sibcall_epilogue pattern.  One disadvantage of this approach
is that the target doesn't know which call the epilogue is for,
even though the code that generates the pattern has the call
to hand.

Although call instructions are currently rtxes, and so could be
passed as an operand to the pattern, the main point of introducing
rtx_insn was to move towards separating the rtx and insn types
(a good thing IMO).  There also isn't an existing practice of
passing genuine instructions (as opposed to labels) to
instruction patterns.

This patch therefore adds a hook that can be defined as an
alternative to sibcall_epilogue.  The advantage is that it
can be passed the call; the disadvantage is that it can't
use .md conveniences like generating instructions from
textual patterns (although most epilogues are too complex
to benefit much from that anyway).
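A toy model of the resulting dispatch (the struct and types below are invented stand-ins; in GCC the hook receives the rtx_call_insn * and the legacy path is the sibcall_epilogue pattern):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the check in can_implement_as_sibling_call_p after the
   change: a sibcall epilogue can come from either the legacy
   sibcall_epilogue pattern or the new hook, which also gets the call
   instruction itself (modelled here as an opaque pointer).  */
typedef struct call_insn call_insn;            /* stands in for rtx_call_insn */
typedef void (*emit_epilogue_fn) (call_insn *);

typedef struct {
  bool have_sibcall_epilogue;                  /* .md pattern exists */
  emit_epilogue_fn emit_epilogue_for_sibcall;  /* NULL if hook undefined */
} target_mock;

void dummy_emit (call_insn *insn) { (void) insn; }  /* placeholder hook impl */

bool can_sibcall_p (const target_mock *t)
{
  return t->have_sibcall_epilogue || t->emit_epilogue_for_sibcall != NULL;
}
```

Either mechanism alone is enough to keep tail calls viable; only a target providing neither loses them.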

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* doc/gccint/target-macros/miscellaneous-parameters.rst:
Add TARGET_EMIT_EPILOGUE_FOR_SIBCALL.
* doc/gccint/target-macros/tm.rst.in: Regenerate.
* target.def (emit_epilogue_for_sibcall): New hook.
* calls.cc (can_implement_as_sibling_call_p): Use it.
* function.cc (thread_prologue_and_epilogue_insns): Likewise.
(reposition_prologue_and_epilogue_notes): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_expand_epilogue): Take
an rtx_call_insn * rather than a bool.
* config/aarch64/aarch64.cc (aarch64_expand_epilogue): Likewise.
(TARGET_EMIT_EPILOGUE_FOR_SIBCALL): Define.
* config/aarch64/aarch64.md (epilogue): Update call.
(sibcall_epilogue): Delete.
---
 gcc/calls.cc  |  3 ++-
 gcc/config/aarch64/aarch64-protos.h   |  2 +-
 gcc/config/aarch64/aarch64.cc | 11 +++
 gcc/config/aarch64/aarch64.md | 11 +--
 .../target-macros/miscellaneous-parameters.rst|  5 +
 gcc/doc/gccint/target-macros/tm.rst.in| 11 +++
 gcc/function.cc   | 15 +--
 gcc/target.def|  9 +
 8 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 6dd6f73e978..51b664f1b4d 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -2493,7 +2493,8 @@ can_implement_as_sibling_call_p (tree exp,
 tree addr,
 const args_size &args_size)
 {
-  if (!targetm.have_sibcall_epilogue ())
+  if (!targetm.have_sibcall_epilogue ()
+  && !targetm.emit_epilogue_for_sibcall)
 {
   maybe_complain_about_tail_call
(exp,
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 238820581c5..3d81c223b01 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -886,7 +886,7 @@ const char * aarch64_gen_far_branch (rtx *, int, const char 
*, const char *);
 const char * aarch64_output_probe_stack_range (rtx, rtx);
 const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
-void aarch64_expand_epilogue (bool);
+void aarch64_expand_epilogue (rtx_call_insn *);
 rtx aarch64_ptrue_all (unsigned int);
 opt_machine_mode aarch64_ptrue_all_mode (rtx);
 rtx aarch64_convert_sve_data_to_pred (rtx, machine_mode, rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d1f979ebcf8..41a2181a7d3 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -10003,7 +10003,7 @@ aarch64_use_return_insn_p (void)
from a deallocated stack, and we optimize the unwind records by
emitting them all together if possible.  */
 void
-aarch64_expand_epilogue (bool for_sibcall)
+aarch64_expand_epilogue (rtx_call_insn *sibcall)
 {
   poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
   HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
@@ -10151,7 +10151,7 @@ aarch64_expand_epilogue (bool for_sibcall)
   explicitly authenticate.
 */
   if (aarch64_return_address_signing_enabled ()
-  && (for_sibcall || !TARGET_ARMV8_3))
+  && (sibcall || !TARGET_ARMV8_3))
 {
   switch (aarch64_ra_sign_key)
{
@@ -10169,7 +10169,7 @@ aarch64_expand_epilogue (bool for_sibcall)
 }
 
   /* Stack adjustment for exception handler.  */
-  if (crtl->calls_eh_return && !for_sibcall)
+  if (crtl->calls_eh_return && !sibcall)
 {
   /* We need to unwind the stack by the offset computed by
 EH_RETURN_STACKADJ_RTX.  We have already reset the CFA
@@ -10180,7 +10180,7 @@ aarch64_expand_epilogue (bool for_sibcall)
 }
 
   emit_use (gen_rtx_REG (DImode, LR_REGNUM));
-  if (!for_sibcall)
+  if (!sibcall)
 emit_jump_insn (ret_rtx);
 }
 
@@ -27906,6 +27906,9 

[PATCH] Add a new target hook: TARGET_START_CALL_ARGS

2022-11-11 Thread Richard Sandiford via Gcc-patches
We have the following two hooks into the call expansion code:

- TARGET_CALL_ARGS is called for each argument before arguments
  are moved into hard registers.

- TARGET_END_CALL_ARGS is called after the end of the call
  sequence (specifically, after any return value has been
  moved to a pseudo).

This patch adds a TARGET_START_CALL_ARGS hook that is called before
the TARGET_CALL_ARGS sequence.  This means that TARGET_START_CALL_ARGS
and TARGET_END_CALL_ARGS bracket the region in which argument pseudos
might be live.  They also bracket a region in which the only call
emitted by target-independent code is the call to the target function
itself.  (For example, TARGET_START_CALL_ARGS happens after any use of
memcpy to copy arguments, and TARGET_END_CALL_ARGS happens before any
use of memcpy to copy the result.)

Also, the patch adds the cumulative argument structure as an argument
to the hooks, so that the target can use it to record and retrieve
information about the call as a whole.
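The bracketing described above can be sketched with a toy trace (only the hook names are real; the driver function and trace machinery are invented for illustration):

```c
#include <assert.h>
#include <string.h>

/* Toy trace of the hook ordering the patch establishes:
   start_call_args and end_call_args bracket both the per-argument
   call_args invocations and the call emission itself.  */
static char trace[128];
static void record (const char *s)
{ strcat (trace, s); strcat (trace, ";"); }

const char *expand_call_mock (int num_reg_args)
{
  trace[0] = '\0';
  record ("start_call_args");       /* new hook: before args are stored */
  for (int i = 0; i < num_reg_args; i++)
    record ("call_args");           /* once per register argument */
  record ("emit_call");             /* the call to the target function */
  record ("end_call_args");         /* after the return value is moved */
  return trace;
}
```

For a call with two register arguments the trace is start_call_args, call_args, call_args, emit_call, end_call_args, matching the bracketing described above.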

The TARGET_CALL_ARGS docs said:

   While generating RTL for a function call, this target hook is invoked once
   for each argument passed to the function, either a register returned by
   ``TARGET_FUNCTION_ARG`` or a memory location.  It is called just
-  before the point where argument registers are stored.

The last bit was true for normal calls, but for libcalls the hook was
invoked earlier, before stack arguments have been copied.  I don't think
this caused a practical difference for nvptx (the only port to use the
hooks) since I wouldn't expect any libcalls to take stack parameters.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested by
building cc1 for nvptx-none.  OK to install?

Richard


gcc/
* doc/gccint/target-macros/implementing-the-varargs-macros.rst:
Add TARGET_START_CALL_ARGS.
* doc/gccint/target-macros/tm.rst.in: Regenerate.
* target.def (start_call_args): New hook.
(call_args, end_call_args): Add a parameter for the cumulative
argument information.
* hooks.h (hook_void_rtx_tree): Delete.
* hooks.cc (hook_void_rtx_tree): Likewise.
* targhooks.h (hook_void_CUMULATIVE_ARGS): Declare.
(hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
* targhooks.cc (hook_void_CUMULATIVE_ARGS): New function.
(hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
* calls.cc (expand_call): Call start_call_args before computing
and storing stack parameters.  Pass the cumulative argument
information to call_args and end_call_args.
(emit_library_call_value_1): Likewise.
* config/nvptx/nvptx.cc (nvptx_call_args): Add a cumulative
argument parameter.
(nvptx_end_call_args): Likewise.
---
 gcc/calls.cc  | 61 ++-
 gcc/config/nvptx/nvptx.cc |  4 +-
 .../implementing-the-varargs-macros.rst   |  5 ++
 gcc/doc/gccint/target-macros/tm.rst.in| 53 +---
 gcc/hooks.cc  |  5 --
 gcc/hooks.h   |  1 -
 gcc/target.def| 56 +
 gcc/targhooks.cc  | 10 +++
 gcc/targhooks.h   |  5 +-
 9 files changed, 140 insertions(+), 60 deletions(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 51b664f1b4d..d3287bcc277 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -3542,15 +3542,26 @@ expand_call (tree exp, rtx target, int ignore)
sibcall_failure = 1;
}
 
+  /* Set up the next argument register.  For sibling calls on machines
+with register windows this should be the incoming register.  */
+  if (pass == 0)
+   next_arg_reg = targetm.calls.function_incoming_arg
+ (args_so_far, function_arg_info::end_marker ());
+  else
+   next_arg_reg = targetm.calls.function_arg
+ (args_so_far, function_arg_info::end_marker ());
+
+  targetm.calls.start_call_args (args_so_far);
+
   bool any_regs = false;
   for (i = 0; i < num_actuals; i++)
if (args[i].reg != NULL_RTX)
  {
any_regs = true;
-   targetm.calls.call_args (args[i].reg, funtype);
+   targetm.calls.call_args (args_so_far, args[i].reg, funtype);
  }
   if (!any_regs)
-   targetm.calls.call_args (pc_rtx, funtype);
+   targetm.calls.call_args (args_so_far, pc_rtx, funtype);
 
   /* Figure out the register where the value, if any, will come back.  */
   valreg = 0;
@@ -3613,15 +3624,6 @@ expand_call (tree exp, rtx target, int ignore)
 later safely search backwards to find the CALL_INSN.  */
   before_call = get_last_insn ();
 
-  /* Set up next argument register.  For sibling calls on machines
-with register windows this should be the incoming register.  */
-  if (pass == 0)
-   next_arg_reg = targetm.calls.function_incoming_arg
-   

[PATCH] c++: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-11 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch on top of Marek's P2448 PR106649 patch
(mainly because that patch implements the previous __cpp_constexpr
feature test macro bump so this can't go in earlier; OT,
P2280R4 doesn't have any feature test macro?) implements this
simple paper.

Ok for trunk if it passes bootstrap/regtest and is voted into C++23?

2022-11-11  Jakub Jelinek  

gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Bump __cpp_constexpr
value from 202207L to 202211L.
gcc/cp/
* constexpr.cc (cxx_eval_constant_expression): Implement C++23
P2647R1 - Permitting static constexpr variables in constexpr functions.
Allow decl_maybe_constant_var_p static or thread_local vars for
C++23.
(potential_constant_expression_1): Likewise.
gcc/testsuite/
* g++.dg/cpp23/constexpr-nonlit17.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected __cpp_constexpr
value.

--- gcc/c-family/c-cppbuiltin.cc.jj 2022-11-11 17:14:52.021613271 +0100
+++ gcc/c-family/c-cppbuiltin.cc2022-11-11 17:17:45.065265246 +0100
@@ -1074,7 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
- cpp_define (pfile, "__cpp_constexpr=202207L");
+ cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202110L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
--- gcc/cp/constexpr.cc.jj  2022-11-11 17:14:52.024613231 +0100
+++ gcc/cp/constexpr.cc 2022-11-11 17:16:54.384952917 +0100
@@ -7085,7 +7085,8 @@ cxx_eval_constant_expression (const cons
&& (TREE_STATIC (r)
|| (CP_DECL_THREAD_LOCAL_P (r) && !DECL_REALLY_EXTERN (r)))
/* Allow __FUNCTION__ etc.  */
-   && !DECL_ARTIFICIAL (r))
+   && !DECL_ARTIFICIAL (r)
+   && (cxx_dialect < cxx23 || !decl_maybe_constant_var_p (r)))
  {
if (!ctx->quiet)
  {
@@ -9577,7 +9578,10 @@ potential_constant_expression_1 (tree t,
   tmp = DECL_EXPR_DECL (t);
   if (VAR_P (tmp) && !DECL_ARTIFICIAL (tmp))
{
- if (CP_DECL_THREAD_LOCAL_P (tmp) && !DECL_REALLY_EXTERN (tmp))
+ if (CP_DECL_THREAD_LOCAL_P (tmp)
+ && !DECL_REALLY_EXTERN (tmp)
+ && (cxx_dialect < cxx23
+ || !decl_maybe_constant_var_p (tmp)))
{
  if (flags & tf_error)
constexpr_error (DECL_SOURCE_LOCATION (tmp), fundef_p,
@@ -9585,7 +9589,9 @@ potential_constant_expression_1 (tree t,
 "%<constexpr%> context", tmp);
  return false;
}
- else if (TREE_STATIC (tmp))
+ else if (TREE_STATIC (tmp)
+  && (cxx_dialect < cxx23
+  || !decl_maybe_constant_var_p (tmp)))
{
  if (flags & tf_error)
constexpr_error (DECL_SOURCE_LOCATION (tmp), fundef_p,
--- gcc/testsuite/g++.dg/cpp23/constexpr-nonlit17.C.jj  2022-11-11 
17:59:59.972852793 +0100
+++ gcc/testsuite/g++.dg/cpp23/constexpr-nonlit17.C 2022-11-11 
17:59:38.725141231 +0100
@@ -0,0 +1,12 @@
+// P2647R1 - Permitting static constexpr variables in constexpr functions
+// { dg-do compile { target c++23 } }
+
+constexpr char
+test ()
+{
+  static const int x = 5;
+  static constexpr char c[] = "Hello World";
+  return *(c + x);
+}
+
+static_assert (test () == ' ');
--- gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C.jj  2022-11-11 17:14:52.194610922 
+0100
+++ gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C 2022-11-11 17:48:56.038865825 
+0100
@@ -134,8 +134,8 @@
 
 #ifndef __cpp_constexpr
 #  error "__cpp_constexpr"
-#elif __cpp_constexpr != 202207
-#  error "__cpp_constexpr != 202207"
+#elif __cpp_constexpr != 202211
+#  error "__cpp_constexpr != 202211"
 #endif
 
 #ifndef __cpp_decltype_auto

Jakub



Re: [PATCH] libstdc++: Set active union member in constexpr std::string [PR103295]

2022-11-11 Thread Jonathan Wakely via Gcc-patches
On Fri, 11 Nov 2022 at 11:23, Nathaniel Shead via Libstdc++
 wrote:
>
> Hi,
>
> Below is a patch to fix std::string in constexpr contexts on Clang. This
> was originally fixed in the commits attached to PR103295, but a later
> commit 98a0d72a seems to have mistakenly undone this.
>
> Tested on x86_64-linux. Verified using clang-14 and clang-15 that the
> fix works. I haven't added anything to the test suite, since this issue
> is only detected by clang.
>
> This is my first time contributing, so please let me know if I've done
> anything wrong or missed something. Thanks!

Thanks for the patch, I'll get this committed today.

The only thing I had to fix was the indentation in the commit log. The
second line of the ChangeLog should be aligned with the * not the text
following it (so indented by a single tab).


>
> Nathaniel
>
> -- >8 --
>
> Clang still complains about using std::string in constexpr contexts due
> to the changes made in commit 98a0d72a. This patch ensures that we set
> the active member of the union as according to [class.union.general] p6.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/103295
> * include/bits/basic_string.h (_M_use_local_data): Set active
>   member to _M_local_buf.
>
> Signed-off-by: Nathaniel Shead 
> ---
>  libstdc++-v3/include/bits/basic_string.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/basic_string.h 
> b/libstdc++-v3/include/bits/basic_string.h
> index 9c2b57f5a1d..2790fd49b05 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -352,8 +352,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>{
>  #if __cpp_lib_is_constant_evaluated
> if (std::is_constant_evaluated())
> - for (_CharT& __c : _M_local_buf)
> -   __c = _CharT();
> + for (size_type i = 0; i <= _S_local_capacity; ++i)
> +   _M_local_buf[i] = _CharT();
>  #endif
> return _M_local_data();
>}
> --
> 2.34.1
>



Re: why does gccgit require pthread?

2022-11-11 Thread Jonathan Wakely via Gcc-patches
On Mon, 7 Nov 2022 at 13:51, Jonathan Wakely wrote:
>
> On Mon, 7 Nov 2022 at 13:33, LIU Hao wrote:
> >
> > 在 2022-11-07 20:57, Jonathan Wakely 写道:
> > > It would be a lot nicer if playback::context met the C++ Lockable
> > > requirements, and playback::context::compile () could just take a
> > > scoped lock on *this:
> > >
> > >
> >
> > Yeah yeah that makes a lot of sense. Would you please just commit that? I 
> > don't have write access to
> > GCC repo, and it takes a couple of hours for me to bootstrap GCC just for 
> > this tiny change.
>
> Somebody else needs to approve it first. I'll combine our patches and
> test and submit it properly for approval.

Here's a complete patch that actually builds now, although I'm seeing
a stage 2 vs stage 3 comparison error which I don't have time to look
into right now.
commit 5dde4bd09c4706617120a42c5953908ae39b5751
Author: Jonathan Wakely 
Date:   Fri Nov 11 12:48:29 2022

jit: Use std::mutex instead of pthread_mutex_t

This allows JIT to be built with a different thread model from posix,
where pthread isn't available.

By renaming the acquire_mutex () and release_mutex () member functions
to lock() and unlock() we make the playback::context type meet the C++
Lockable requirements. This allows it to be used with a scoped lock
(i.e. RAII) type such as std::lock_guard. This automatically releases the
mutex when leaving the scope.

Co-authored-by: LIU Hao 

gcc/jit/ChangeLog:

* jit-playback.cc (playback::context::scoped_lock): Define RAII
lock type.
(playback::context::compile): Use scoped_lock to acquire mutex
for the active playback context.
(jit_mutex): Change to std::mutex.
(playback::context::acquire_mutex): Rename to ...
(playback::context::lock): ... this.
(playback::context::release_mutex): Rename to ...
(playback::context::unlock): ... this.
* jit-playback.h (playback::context): Rename members and declare
scoped_lock.
* jit-recording.cc (INCLUDE_PTHREAD_H): Remove unused define.
* libgccjit.cc (version_mutex): Change to std::mutex.
(struct jit_version_info): Use std::lock_guard to acquire and
release mutex.

gcc/ChangeLog:

* system.h [INCLUDE_MUTEX]: Include header for std::mutex.

diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index d227d36283a..bf006903a44 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -19,7 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
-#define INCLUDE_PTHREAD_H
+#define INCLUDE_MUTEX
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
@@ -2302,6 +2302,20 @@ block (function *func,
   m_label_expr = NULL;
 }
 
+// This is basically std::lock_guard but it can call the private lock/unlock
+// members of playback::context.
+struct playback::context::scoped_lock
+{
+  scoped_lock (context &ctx) : m_ctx (&ctx) { m_ctx->lock (); }
+  ~scoped_lock () { m_ctx->unlock (); }
+
+  context *m_ctx;
+
+  // Not movable or copyable.
+  scoped_lock (scoped_lock &&) = delete;
+  scoped_lock &operator= (scoped_lock &&) = delete;
+};
+
 /* Compile a playback::context:
 
- Use the context's options to cconstruct command-line options, and
@@ -2353,15 +2367,12 @@ compile ()
   m_recording_ctxt->get_all_requested_dumps (&requested_dumps);
 
   /* Acquire the JIT mutex and set "this" as the active playback ctxt.  */
-  acquire_mutex ();
+  scoped_lock lock(*this);
 
   auto_string_vec fake_args;
   make_fake_args (&fake_args, ctxt_progname, &requested_dumps);
   if (errors_occurred ())
-{
-  release_mutex ();
-  return;
-}
+return;
 
   /* This runs the compiler.  */
   toplev toplev (get_timer (), /* external_timer */
@@ -2388,10 +2399,7 @@ compile ()
  followup activities use timevars, which are global state.  */
 
   if (errors_occurred ())
-{
-  release_mutex ();
-  return;
-}
+return;
 
   if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE))
 dump_generated_code ();
@@ -2403,8 +2411,6 @@ compile ()
  convert the .s file to the requested output format, and copy it to a
  given file (playback::compile_to_file).  */
   postprocess (ctxt_progname);
-
-  release_mutex ();
 }
 
 /* Implementation of class gcc::jit::playback::compile_to_memory,
@@ -2662,18 +2668,18 @@ playback::compile_to_file::copy_file (const char 
*src_path,
 /* This mutex guards gcc::jit::recording::context::compile, so that only
one thread can be accessing the bulk of GCC's state at once.  */
 
-static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER;
+static std::mutex jit_mutex;
 
 /* Acquire jit_mutex and set "this" as the active playback ctxt.  */
 
 void
-playback::context::acquire_mutex ()
+playback::context::lock ()
 {
   auto_tim

Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-11 Thread Arnaud Charlet via Gcc-patches


> Similarly to other manuals, we should include the page
> in HTML builder.
> 
> What Ada folks think about it?

The latest changes have broken our build of the Ada doc at AdaCore, so until 
further notice, please do not make any additional changes to the Ada doc while 
we review in detail all the recent changes and find a way to recover. Thank 
you.

Arno 


[PATCH] Allow targets to add USEs to asms

2022-11-11 Thread Richard Sandiford via Gcc-patches
Arm's SME has an array called ZA that for inline asm purposes
is effectively a form of special-purpose memory.  It doesn't
have an associated storage type and so can't be passed and
returned in normal C/C++ objects.

We'd therefore like "za" in a clobber list to mean that an inline
asm can read from and write to ZA.  (Just reading or writing
individually is unlikely to be useful, but we could add syntax
for that too if necessary.)

There is currently a TARGET_MD_ASM_ADJUST target hook that allows
targets to add clobbers to an asm instruction.  This patch
extends that to allow targets to add USEs as well.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested by
building cc1 for one target per affected CPU.  OK to install?

Richard


gcc/
* target.def (md_asm_adjust): Add a uses parameter.
* doc/gccint/target-macros/tm.rst.in: Regenerate.
* cfgexpand.cc (expand_asm_loc): Update call to md_asm_adjust.
Handle any USEs created by the target.
(expand_asm_stmt): Likewise.
* recog.cc (asm_noperands): Handle asms with USEs.
(decode_asm_operands): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add uses
parameter.
* config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise.
* config/arm/arm.cc (thumb1_md_asm_adjust): Likewise.
* config/avr/avr.cc (avr_md_asm_adjust): Likewise.
* config/cris/cris.cc (cris_md_asm_adjust): Likewise.
* config/i386/i386.cc (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.cc (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.cc (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.cc (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.cc (rs6000_md_asm_adjust): Likewise.
* config/s390/s390.cc (s390_md_asm_adjust): Likewise.
* config/vax/vax.cc (vax_md_asm_adjust): Likewise.
* config/visium/visium.cc (visium_md_asm_adjust): Likewise.
---
 gcc/cfgexpand.cc   | 37 +-
 gcc/config/arm/aarch-common-protos.h   |  2 +-
 gcc/config/arm/aarch-common.cc |  3 ++-
 gcc/config/arm/arm.cc  |  5 ++--
 gcc/config/avr/avr.cc  |  1 +
 gcc/config/cris/cris.cc|  6 +++--
 gcc/config/i386/i386.cc|  5 ++--
 gcc/config/mn10300/mn10300.cc  |  3 ++-
 gcc/config/nds32/nds32.cc  |  4 +--
 gcc/config/pdp11/pdp11.cc  |  6 +++--
 gcc/config/rs6000/rs6000.cc|  3 ++-
 gcc/config/s390/s390.cc|  3 ++-
 gcc/config/vax/vax.cc  |  4 ++-
 gcc/config/visium/visium.cc|  5 ++--
 gcc/doc/gccint/target-macros/tm.rst.in |  9 ---
 gcc/recog.cc   | 20 +-
 gcc/target.def |  9 ---
 17 files changed, 81 insertions(+), 44 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29c03..82cb6450281 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2873,6 +2873,7 @@ expand_asm_loc (tree string, int vol, location_t locus)
   auto_vec input_rvec, output_rvec;
   auto_vec input_mode;
   auto_vec constraints;
+  auto_vec use_rvec;
   auto_vec clobber_rvec;
   HARD_REG_SET clobbered_regs;
   CLEAR_HARD_REG_SET (clobbered_regs);
@@ -2882,16 +2883,20 @@ expand_asm_loc (tree string, int vol, location_t locus)
 
   if (targetm.md_asm_adjust)
targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
-  constraints, clobber_rvec, clobbered_regs,
-  locus);
+  constraints, use_rvec, clobber_rvec,
+  clobbered_regs, locus);
 
   asm_op = body;
   nclobbers = clobber_rvec.length ();
-  body = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (1 + nclobbers));
+  auto nuses = use_rvec.length ();
+  body = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (1 + nuses + nclobbers));
 
-  XVECEXP (body, 0, 0) = asm_op;
-  for (i = 0; i < nclobbers; i++)
-   XVECEXP (body, 0, i + 1) = gen_rtx_CLOBBER (VOIDmode, clobber_rvec[i]);
+  i = 0;
+  XVECEXP (body, 0, i++) = asm_op;
+  for (rtx use : use_rvec)
+   XVECEXP (body, 0, i++) = gen_rtx_USE (VOIDmode, use);
+  for (rtx clobber : clobber_rvec)
+   XVECEXP (body, 0, i++) = gen_rtx_CLOBBER (VOIDmode, clobber);
 }
 
   emit_insn (body);
@@ -3443,11 +3448,12 @@ expand_asm_stmt (gasm *stmt)
  maintaining source-level compatibility means automatically clobbering
  the flags register.  */
   rtx_insn *after_md_seq = NULL;
+  auto_vec use_rvec;
   if (targetm.md_asm_adjust)
 after_md_seq
= targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
-constraints, clobber_rvec, clobbered_regs,
-locus);
+constraints, us

[PATCH] aarch64: Use SVE's RDVL instruction

2022-11-11 Thread Richard Sandiford via Gcc-patches
We didn't previously use SVE's RDVL instruction, since the CNT*
forms are preferred and provide most of the range.  However,
there are some cases that RDVL can handle and CNT* can't,
and using RDVL-like instructions becomes important for SME.

Tested on aarch64-linux-gnu.  I plan to apply this soon if there
are no comments.

Thanks,
Richard


gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_rdvl_immediate_p)
(aarch64_output_sve_rdvl): Declare.
* config/aarch64/aarch64.cc (aarch64_sve_cnt_factor_p): New
function, split out from...
(aarch64_sve_cnt_immediate_p): ...here.
(aarch64_sve_rdvl_factor_p): New function.
(aarch64_sve_rdvl_immediate_p): Likewise.
(aarch64_output_sve_rdvl): Likewise.
(aarch64_offset_temporaries): Rewrite the SVE handling to use RDVL
for some cases.
(aarch64_expand_mov_immediate): Handle RDVL immediates.
(aarch64_mov_operand_p): Likewise.
* config/aarch64/constraints.md (Usr): New constraint.
* config/aarch64/aarch64.md (*mov_aarch64): Add an RDVL
alternative.
(*movsi_aarch64, *movdi_aarch64): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/cntb.c: Tweak expected output.
* gcc.target/aarch64/sve/acle/asm/cnth.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cntw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cntd.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfb.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd.c: Likewise.
* gcc.target/aarch64/sve/loop_add_4.c: Expect RDVL to be used
to calculate the -17 and 17 factors.
* gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise the 18 factor.
---
 gcc/config/aarch64/aarch64-protos.h   |   2 +
 gcc/config/aarch64/aarch64.cc | 196 --
 gcc/config/aarch64/aarch64.md |  58 +++---
 gcc/config/aarch64/constraints.md |   6 +
 .../gcc.target/aarch64/sve/acle/asm/cntb.c|  71 +--
 .../gcc.target/aarch64/sve/acle/asm/cntd.c|  12 +-
 .../gcc.target/aarch64/sve/acle/asm/cnth.c|  20 +-
 .../gcc.target/aarch64/sve/acle/asm/cntw.c|  16 +-
 .../gcc.target/aarch64/sve/acle/asm/prfb.c|   6 +-
 .../gcc.target/aarch64/sve/acle/asm/prfd.c|   4 +-
 .../gcc.target/aarch64/sve/acle/asm/prfh.c|   4 +-
 .../gcc.target/aarch64/sve/acle/asm/prfw.c|   4 +-
 .../gcc.target/aarch64/sve/loop_add_4.c   |   6 +-
 .../aarch64/sve/pcs/stack_clash_1.c   |   3 +-
 14 files changed, 260 insertions(+), 148 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 3d81c223b01..866d68ad4d7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -801,6 +801,7 @@ bool aarch64_sve_mode_p (machine_mode);
 HOST_WIDE_INT aarch64_fold_sve_cnt_pat (aarch64_svpattern, unsigned int);
 bool aarch64_sve_cnt_immediate_p (rtx);
 bool aarch64_sve_scalar_inc_dec_immediate_p (rtx);
+bool aarch64_sve_rdvl_immediate_p (rtx);
 bool aarch64_sve_addvl_addpl_immediate_p (rtx);
 bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
@@ -813,6 +814,7 @@ char *aarch64_output_sve_prefetch (const char *, rtx, const 
char *);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
 char *aarch64_output_sve_cnt_pat_immediate (const char *, const char *, rtx *);
 char *aarch64_output_sve_scalar_inc_dec (rtx);
+char *aarch64_output_sve_rdvl (rtx);
 char *aarch64_output_sve_addvl_addpl (rtx);
 char *aarch64_output_sve_vector_inc_dec (const char *, rtx);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 41a2181a7d3..a40ac6fd903 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5266,6 +5266,18 @@ aarch64_fold_sve_cnt_pat (aarch64_svpattern pattern, 
unsigned int nelts_per_vq)
   return -1;
 }
 
+/* Return true if a single CNT[BHWD] instruction can multiply FACTOR
+   by the number of 128-bit quadwords in an SVE vector.  */
+
+static bool
+aarch64_sve_cnt_factor_p (HOST_WIDE_INT factor)
+{
+  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
+  return (IN_RANGE (factor, 2, 16 * 16)
+ && (factor & 1) == 0
+ && factor <= 16 * (factor & -factor));
+}
+
 /* Return true if we can move VALUE into a register using a single
CNT[BHWD] instruction.  */
 
@@ -5273,11 +5285,7 @@ static bool
 aarch64_sve_cnt_immediate_p (poly_int64 value)
 {
   HOST_WIDE_INT factor = value.coeffs[0];
-  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
-  return (value.coeffs[1] == factor
- && IN_RANGE (factor, 2, 16 * 16)
- && (factor & 1) == 0
- && factor <= 16 * (fact

[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2022-11-11 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the VPR-generating vctp operation
    within the loop, provided there is exactly one such vctp.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/aarch64/aarch64.md: Add extra doloop_end arg.
    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_mve_get_loop_unique_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md:
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * config/ia64/ia64.md: Add extra doloop_end arg.
    * config/pru/pru.md: Add extra doloop_end arg.
    * config/rs6000/rs6000.md: Add extra doloop_end arg.
    * config/s390/s390.md: Add extra doloop_end arg.
    * config/v850/v850.md: Add extra doloop_end arg.
    * doc/tm.texi: Document new hook.
    * doc/tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target-insns.def (doloop_end): Add extra arg.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dlstp-int64x2.c: New test.
    * gcc.target/arm/dlstp-int8x16.c: New test.


### Inline copy of patch ###

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 
100644

--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7366,7 +7366,8 @@
 ;; knows what to generate.
 (define_expand "doloop_end"
   [(use (match_operand 0 "" ""))  ; loop pseudo
-   (use (match_operand 1 "" ""))] ; label
+   (use (match_operand 1 "" ""))  ; label
+   (use (match_operand 2 "" ""))] ; decrement constant
   "optimize > 0 && flag_modulo_sched"
 {
   rtx s0;
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 
100644

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx *, 
rtx *, rtx *, rtx *);

 extern bool arm_q_bit_access (void);
 extern bool arm_ge_bits_access (void);
 extern bool arm_target_insn_ok_for_lob (rtx);
-
+extern rtx arm_attempt_dlstp_transform (rtx, rtx);
 #ifdef RTX_CODE
 enum reg_class
 arm_mode_base_reg_class (machine_mode);
diff --git a/gcc/config/arm/a

[committed] libstdc++: Fix wstring conversions in filesystem::path [PR95048]

2022-11-11 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and x86_64-w64-mingw32 (via Wine).

Pushed to trunk. This needs to be backported too.

-- >8 --

In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
std::codecvt for conversions from all wide
strings to UTF-8, instead of using std::codecvt_utf8. This was
done because for 16-bit wchar_t, std::codecvt_utf8 only
supports UCS-2 and not UTF-16. The rationale for the change was sound,
but the actual fix was not. It's OK to use std::codecvt for char16_t or
char32_t, because the specializations for those types always use UTF-8,
but std::codecvt uses the current locale's
encodings, and the narrow encoding is probably ASCII and can't support
non-ASCII characters.

The correct fix is to use std::codecvt only for char16_t and char32_t.
For 32-bit wchar_t we could have continued using std::codecvt_utf8
because that uses UTF-32, which is fine; switching to std::codecvt broke
non-Windows targets with 32-bit wchar_t. For 16-bit wchar_t we did need
to change, but should have changed to std::codecvt_utf8_utf16
instead, as that always uses UTF-16 not UCS-2. I actually noted that in
the commit message for r9-7381-g91756c4abc1757 but didn't use that
option. Oops.

This replaces the unconditional std::codecvt
with a type defined via template specialization, so it can vary
depending on the wide character type. The code is also simplified to
remove some of the mess of #ifdef and if-constexpr conditions.

libstdc++-v3/ChangeLog:

PR libstdc++/95048
* include/bits/fs_path.h (path::_Codecvt): New class template
that selects the kind of code conversion done.
(path::_Codecvt): Select based on sizeof(wchar_t).
(_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
be used for Windows and POSIX.
(path::_S_convert(const EcharT*, const EcharT*)): Simplify by
using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
(path::_S_str_convert(basic_string_view, const A&)):
Simplify nested conditions.
* include/experimental/bits/fs_path.h (path::_Cvt): Define
nested typedef controlling type of code conversion done.
(path::_Cvt::_S_wconvert): Use new typedef.
(path::string(const A&)): Likewise.
* testsuite/27_io/filesystem/path/construct/95048.cc: New test.
* testsuite/experimental/filesystem/path/construct/95048.cc: New
test.
---
 libstdc++-v3/include/bits/fs_path.h   | 128 ++
 .../include/experimental/bits/fs_path.h   |  51 +--
 .../27_io/filesystem/path/construct/95048.cc  |  45 ++
 .../filesystem/path/construct/95048.cc|  47 +++
 4 files changed, 204 insertions(+), 67 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/27_io/filesystem/path/construct/95048.cc
 create mode 100644 
libstdc++-v3/testsuite/experimental/filesystem/path/construct/95048.cc

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 2fc7dcd98c9..b1835573c6a 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -727,6 +727,8 @@ namespace __detail
 _List _M_cmpts;
 
 struct _Parser;
+
+template struct _Codecvt;
   };
 
   /// @{
@@ -855,55 +857,72 @@ namespace __detail
 size_t _M_pos;
   };
 
+  // path::_Codecvt Performs conversions between C and path::string_type.
+  // The native encoding of char strings is the OS-dependent current
+  // encoding for pathnames. FIXME: We assume this is UTF-8 everywhere,
+  // but should use a Windows API to query it.
+
+  // Converts between native pathname encoding and char16_t or char32_t.
+  template
+struct path::_Codecvt
+// Need derived class here because std::codecvt has protected destructor.
+: std::codecvt<_EcharT, char, mbstate_t>
+{ };
+
+  // Converts between native pathname encoding and native wide encoding.
+  // The native encoding for wide strings is the execution wide-character
+  // set encoding. FIXME: We assume that this is either UTF-32 or UTF-16
+  // (depending on the width of wchar_t). That matches GCC's default,
+  // but can be changed with -fwide-exec-charset.
+  // We need a custom codecvt converting the native pathname encoding
+  // to/from the native wide encoding.
+  template<>
+struct path::_Codecvt
+: __conditional_t,   // UTF-8 <-> UTF-32
+ std::codecvt_utf8_utf16> // UTF-8 <-> UTF-16
+{ };
+
   template
 auto
 path::_S_convert(const _EcharT* __f, const _EcharT* __l)
 {
   static_assert(__detail::__is_encoded_char<_EcharT>);
 
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+# define _GLIBCXX_CONV_FROM_UTF8(S) __detail::__wstr_from_utf8(S)
+#else
+# define _GLIBCXX_CONV_FROM_UTF8(S) S
+#endif
+
   if constexpr (is_same_v<_EcharT, value_type>)
return basic_string_view(__f, __l - __f);
-#if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
+#ifdef _GLIBCXX_USE_CHAR8_T
   else if constexpr 

Re: [PATCH] libstdc++: Set active union member in constexpr std::string [PR103295]

2022-11-11 Thread Patrick Palka via Gcc-patches
On Fri, 11 Nov 2022, Jonathan Wakely via Libstdc++ wrote:

> On Fri, 11 Nov 2022 at 11:23, Nathaniel Shead via Libstdc++
>  wrote:
> >
> > Hi,
> >
> > Below is a patch to fix std::string in constexpr contexts on Clang. This
> > was originally fixed in the commits attached to PR103295, but a later
> > commit 98a0d72a seems to have mistakenly undone this.
> >
> > Tested on x86_64-linux. Verified using clang-14 and clang-15 that the
> > fix works. I haven't added anything to the test suite, since this issue
> > is only detected by clang.
> >
> > This is my first time contributing, so please let me know if I've done
> > anything wrong or missed something. Thanks!
> 
> Thanks for the patch, I'll get this committed today.
> 
> The only thing I had to fix was the indentation in the commit log. The
> second line of the ChangeLog should be aligned with the * not the text
> following it (so indented by a single tab).
> 
> 
> >
> > Nathaniel
> >
> > -- >8 --
> >
> > Clang still complains about using std::string in constexpr contexts due
> > to the changes made in commit 98a0d72a. This patch ensures that we set
> > the active member of the union as according to [class.union.general] p6.
> >
> > libstdc++-v3/ChangeLog:
> >
> > PR libstdc++/103295
> > * include/bits/basic_string.h (_M_use_local_data): Set active
> >   member to _M_local_buf.
> >
> > Signed-off-by: Nathaniel Shead 
> > ---
> >  libstdc++-v3/include/bits/basic_string.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/basic_string.h 
> > b/libstdc++-v3/include/bits/basic_string.h
> > index 9c2b57f5a1d..2790fd49b05 100644
> > --- a/libstdc++-v3/include/bits/basic_string.h
> > +++ b/libstdc++-v3/include/bits/basic_string.h
> > @@ -352,8 +352,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> >{
> >  #if __cpp_lib_is_constant_evaluated
> > if (std::is_constant_evaluated())
> > - for (_CharT& __c : _M_local_buf)
> > -   __c = _CharT();
> > + for (size_type i = 0; i <= _S_local_capacity; ++i)
> > +   _M_local_buf[i] = _CharT();

Just a minor nit, but we should probably uglify i to __i here.

> >  #endif
> > return _M_local_data();
> >}
> > --
> > 2.34.1
> >
> 
> 



Re: [PATCH] libstdc++: Set active union member in constexpr std::string [PR103295]

2022-11-11 Thread Jonathan Wakely via Gcc-patches
On Fri, 11 Nov 2022 at 17:55, Patrick Palka  wrote:
>
> On Fri, 11 Nov 2022, Jonathan Wakely via Libstdc++ wrote:
>
> > On Fri, 11 Nov 2022 at 11:23, Nathaniel Shead via Libstdc++
> >  wrote:
> > >
> > > Hi,
> > >
> > > Below is a patch to fix std::string in constexpr contexts on Clang. This
> > > was originally fixed in the commits attached to PR103295, but a later
> > > commit 98a0d72a seems to have mistakenly undone this.
> > >
> > > Tested on x86_64-linux. Verified using clang-14 and clang-15 that the
> > > fix works. I haven't added anything to the test suite, since this issue
> > > is only detected by clang.
> > >
> > > This is my first time contributing, so please let me know if I've done
> > > anything wrong or missed something. Thanks!
> >
> > Thanks for the patch, I'll get this committed today.
> >
> > The only thing I had to fix was the indentation in the commit log. The
> > second line of the ChangeLog should be aligned with the * not the text
> > following it (so indented by a single tab).
> >
> >
> > >
> > > Nathaniel
> > >
> > > -- >8 --
> > >
> > > Clang still complains about using std::string in constexpr contexts due
> > > to the changes made in commit 98a0d72a. This patch ensures that we set
> > > the active member of the union as according to [class.union.general] p6.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > PR libstdc++/103295
> > > * include/bits/basic_string.h (_M_use_local_data): Set active
> > >   member to _M_local_buf.
> > >
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >  libstdc++-v3/include/bits/basic_string.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/basic_string.h 
> > > b/libstdc++-v3/include/bits/basic_string.h
> > > index 9c2b57f5a1d..2790fd49b05 100644
> > > --- a/libstdc++-v3/include/bits/basic_string.h
> > > +++ b/libstdc++-v3/include/bits/basic_string.h
> > > @@ -352,8 +352,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> > >{
> > >  #if __cpp_lib_is_constant_evaluated
> > > if (std::is_constant_evaluated())
> > > - for (_CharT& __c : _M_local_buf)
> > > -   __c = _CharT();
> > > + for (size_type i = 0; i <= _S_local_capacity; ++i)
> > > +   _M_local_buf[i] = _CharT();
>
> Just a minor nit, but we should probably uglify i to __i here.

Good catch, thanks. Fixed and pushed.



[PATCH] [range-ops] Add ability to represent open intervals in frange.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
Currently we represent < and > with a closed interval.  So < 3.0 is
represented as [-INF, +3.0].  This means 3.0 is included in the range,
and though not ideal, is conservatively correct.  Jakub has found a
couple cases where properly representing < and > would help
optimizations and tests, and this patch allows representing open
intervals with real_nextafter.

There are a few caveats.

First, we leave MODE_COMPOSITE_P types pessimistically as a closed interval.

Second, for -ffinite-math-only, real_nextafter will saturate the
maximum representable number into +INF.  However, this will still do
the right thing, as frange::set() will crop things appropriately.

Finally, and most frustratingly, representing < and > of +-0.0 is
problematic because we flush denormals to zero.  Let me explain...

real_nextafter(+0.0, +INF) gives 0x0.8p-148 as expected, but setting a
range to this value yields [+0.0, 0x0.8p-148] because of the flushing.

On the other hand, real_nextafter(+0.0, -INF) (surprisingly) gives
-0x0.8p-148, but setting a range to that value yields [-0x0.8p-148,
-0.0].  I say surprising, because according to cppreference.com,
std::nextafter(+0.0, -INF) should give -0.0.  But that's neither here
nor there because our flushing denormals to zero prevents us from even
representing ranges involving small values around 0.0.  We ultimately
end up with ranges looking like this:

> +0.0  => [+0.0, INF]
> -0.0  => [+0.0, INF]
< +0.0  => [-INF, -0.0]
< -0.0  => [-INF, -0.0]

I suppose this is no worse off than what we had before with closed
intervals.  One could even argue that we're better because we at least
have the right sign now ;-).

All other (non-zero) values look sane.

Lightly tested.

Thoughts?
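As a standalone illustration of the construction (plain libm nextafter here,
not GCC's real_nextafter, so the mode and denormal caveats above do not
apply):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch: to encode the open bound of "x < b" in a closed interval,
// shrink the upper bound to the previous representable value, i.e.
// [-INF, nextafter(b, -INF)]; build_gt does the mirror image for
// "x > b".
double lt_upper_bound(double b) {
  return std::nextafter(b, -std::numeric_limits<double>::infinity());
}

double gt_lower_bound(double b) {
  return std::nextafter(b, std::numeric_limits<double>::infinity());
}
```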

gcc/ChangeLog:

* range-op-float.cc (build_lt): Adjust with frange_nextafter
instead of default to a closed range.
(build_gt): Same.
---
 gcc/range-op-float.cc | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 380142b4c14..402393097b2 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -381,9 +381,17 @@ build_lt (frange &r, tree type, const frange &val)
r.set_undefined ();
   return false;
 }
-  // We only support closed intervals.
+
   REAL_VALUE_TYPE ninf = frange_val_min (type);
-  r.set (type, ninf, val.upper_bound ());
+  REAL_VALUE_TYPE prev = val.upper_bound ();
+  machine_mode mode = TYPE_MODE (type);
+  // Default to the conservatively correct closed ranges for
+  // MODE_COMPOSITE_P, otherwise use nextafter.  Note that for
+  // !HONOR_INFINITIES, nextafter will yield -INF, but frange::set()
+  // will crop the range appropriately.
+  if (!MODE_COMPOSITE_P (mode))
+    frange_nextafter (mode, prev, ninf);
+  r.set (type, ninf, prev);
   return true;
 }
 
@@ -424,9 +432,16 @@ build_gt (frange &r, tree type, const frange &val)
   return false;
 }
 
-  // We only support closed intervals.
   REAL_VALUE_TYPE inf = frange_val_max (type);
-  r.set (type, val.lower_bound (), inf);
+  REAL_VALUE_TYPE next = val.lower_bound ();
+  machine_mode mode = TYPE_MODE (type);
+  // Default to the conservatively correct closed ranges for
+  // MODE_COMPOSITE_P, otherwise use nextafter.  Note that for
+  // !HONOR_INFINITIES, nextafter will yield +INF, but frange::set()
+  // will crop the range appropriately.
+  if (!MODE_COMPOSITE_P (mode))
+    frange_nextafter (mode, next, inf);
+  r.set (type, next, inf);
   return true;
 }
 
-- 
2.38.1



Re: why does gccgit require pthread?

2022-11-11 Thread Jonathan Wakely via Gcc-patches
On Fri, 11 Nov 2022 at 17:16, Jonathan Wakely wrote:
>
> On Mon, 7 Nov 2022 at 13:51, Jonathan Wakely wrote:
> >
> > On Mon, 7 Nov 2022 at 13:33, LIU Hao wrote:
> > >
> > > 在 2022-11-07 20:57, Jonathan Wakely 写道:
> > > > It would be a lot nicer if playback::context met the C++ Lockable
> > > > requirements, and playback::context::compile () could just take a
> > > > scoped lock on *this:
> > > >
> > > >
> > >
> > > Yeah yeah that makes a lot of sense. Would you please just commit that? I 
> > > don't have write access to
> > > GCC repo, and it takes a couple of hours for me to bootstrap GCC just for 
> > > this tiny change.
> >
> > Somebody else needs to approve it first. I'll combine our patches and
> > test and submit it properly for approval.
>
> Here's a complete patch that actually builds now, although I'm seeing
> a stage 2 vs stage 3 comparison error which I don't have time to look
> into right now.

A clean build fixed that. This patch bootstraps and passes testing on
x86_64-pc-linux-gnu (CentOS 8 Stream).

OK for trunk?
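For reference, the shape being proposed, sketched with illustrative names
(this is not the actual gccjit playback::context API, just the Lockable
pattern the thread describes):

```cpp
#include <cassert>
#include <mutex>

// Sketch: a class meeting the C++ Lockable requirements
// (lock/unlock/try_lock), so compile() can hold a scoped lock on *this
// for its whole duration instead of using a bare pthread mutex.
class context {
public:
  void lock() { m_mutex.lock(); }
  void unlock() { m_mutex.unlock(); }
  bool try_lock() { return m_mutex.try_lock(); }

  // Stand-in for playback::context::compile (): serialized per object.
  int compile() {
    std::lock_guard<context> guard(*this); // scoped lock on *this
    return ++m_compile_count;
  }

private:
  std::mutex m_mutex;
  int m_compile_count = 0;
};
```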


[PATCH 4/8] Modify test, to prevent the next patch breaking it

2022-11-11 Thread Andrew Carlotti via Gcc-patches
The upcoming c[lt]z idiom recognition patch eliminates the need for a
brute force computation of the iteration count of these loops. The test
is intended to verify that ivcanon can determine the loop count when the
condition is given by a chain of constant computations.

We replace the constant operations with a more complicated chain that should
resist future idiom recognition.

gcc/testsuite/ChangeLog:

* gcc.dg/pr77975.c: Make tests more robust.


--


diff --git a/gcc/testsuite/gcc.dg/pr77975.c b/gcc/testsuite/gcc.dg/pr77975.c
index 
148cebdded964da7fce148abdf2a430c55650513..a187ce2b50c2821841e71b5b6cb243a37a66fb57
 100644
--- a/gcc/testsuite/gcc.dg/pr77975.c
+++ b/gcc/testsuite/gcc.dg/pr77975.c
@@ -7,10 +7,11 @@
 unsigned int
 foo (unsigned int *b)
 {
-  unsigned int a = 3;
+  unsigned int a = 8;
   while (a)
 {
-  a >>= 1;
+  a += 5;
+  a &= 44;
   *b += a;
 }
   return a; 
@@ -21,10 +22,11 @@ foo (unsigned int *b)
 unsigned int
 bar (unsigned int *b)
 {
-  unsigned int a = 7;
+  unsigned int a = 3;
   while (a)
 {
-  a >>= 1;
+  a += 5;
+  a &= 44;
   *b += a;
 }
   return a; 
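For reference, the rewritten loops still terminate with small constant trip
counts that ivcanon can derive by constant propagation; the counts below are
my own hand simulation, not taken from the patch:

```cpp
#include <cassert>

// Simulate the new loop body (a += 5; a &= 44) to check the iteration
// counts that brute-force evaluation must discover.
unsigned trip_count(unsigned a) {
  unsigned n = 0;
  while (a) {
    a += 5;
    a &= 44;
    ++n;
  }
  return n;
}
```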


[PATCH] c++: init_priority and SUPPORTS_INIT_PRIORITY [PR107638]

2022-11-11 Thread Patrick Palka via Gcc-patches
The commit r13-3706-gd0a492faa6478c for correcting the result of
__has_attribute(init_priority) causes a bootstrap failure on hppa64-hpux
because it assumes SUPPORTS_INIT_PRIORITY expands to a simple constant,
but on this target SUPPORTS_INIT_PRIORITY is defined as

  #define SUPPORTS_INIT_PRIORITY (TARGET_GNU_LD ? 1 : 0)

(where TARGET_GNU_LD expands to something in terms of global_options)
which means we can't use this macro to statically exclude the entry
for init_priority when defining the cxx_attribute_table.

So instead of trying to exclude init_priority from the attribute table
for sake of __has_attribute, this patch just makes __has_attribute
handle init_priority specially.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Also sanity checked by artificially defining SUPPORTS_INIT_PRIORITY
to 0.
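The user-visible effect is that the feature probe now matches what the
target can do; a sketch of how client code might rely on it (illustrative,
not from the patch):

```cpp
#include <cassert>

// With the fix, __has_attribute(init_priority) is 1 only when the target
// supports init priorities, so a guard like this is reliable.
struct Logger {
  int value;
  Logger() : value(42) {}
};

#if defined(__has_attribute)
# if __has_attribute(init_priority)
// Constructed early relative to other dynamic initializers (priorities
// 1-100 are reserved for the implementation, hence 200 here).
Logger early_logger __attribute__((init_priority(200)));
# else
Logger early_logger;
# endif
#else
Logger early_logger;
#endif
```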

PR c++/107638

gcc/c-family/ChangeLog:

* c-lex.cc (c_common_has_attribute): Return 1 for init_priority
iff SUPPORTS_INIT_PRIORITY.

gcc/cp/ChangeLog:

* tree.cc (cxx_attribute_table): Don't conditionally exclude
the init_priority entry.
(handle_init_priority_attribute): Remove ATTRIBUTE_UNUSED.
Return error_mark_node if !SUPPORTS_INIT_PRIORITY.
---
 gcc/c-family/c-lex.cc |  9 +
 gcc/cp/tree.cc| 11 +++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 89c65aca28a..2fe562c7ccf 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -380,6 +380,15 @@ c_common_has_attribute (cpp_reader *pfile, bool std_syntax)
result = 201907;
  else if (is_attribute_p ("assume", attr_name))
result = 202207;
+ else if (is_attribute_p ("init_priority", attr_name))
+   {
+ /* The (non-standard) init_priority attribute is always
+included in the attribute table, but we don't want to
+advertise the attribute unless the target actually
+supports init priorities.  */
+ result = SUPPORTS_INIT_PRIORITY ? 1 : 0;
+ attr_name = NULL_TREE;
+   }
}
  else
{
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index c30bbeb0839..2324c2269fc 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5010,10 +5010,8 @@ const struct attribute_spec cxx_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
-#if SUPPORTS_INIT_PRIORITY
   { "init_priority",  1, 1, true,  false, false, false,
 handle_init_priority_attribute, NULL },
-#endif
   { "abi_tag", 1, -1, false, false, false, true,
 handle_abi_tag_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
@@ -5041,13 +5039,19 @@ const struct attribute_spec std_attribute_table[] =
 
 /* Handle an "init_priority" attribute; arguments as in
struct attribute_spec.handler.  */
-ATTRIBUTE_UNUSED static tree
+static tree
 handle_init_priority_attribute (tree* node,
tree name,
tree args,
int /*flags*/,
bool* no_add_attrs)
 {
+  if (!SUPPORTS_INIT_PRIORITY)
+/* Treat init_priority as an unrecognized attribute (mirroring the
+   result of __has_attribute) if the target doesn't support init
+   priorities.  */
+return error_mark_node;
+
   tree initp_expr = TREE_VALUE (args);
   tree decl = *node;
   tree type = TREE_TYPE (decl);
@@ -5105,7 +5109,6 @@ handle_init_priority_attribute (tree* node,
 pri);
 }
 
-  gcc_assert (SUPPORTS_INIT_PRIORITY);
   SET_DECL_INIT_PRIORITY (decl, pri);
   DECL_HAS_INIT_PRIORITY_P (decl) = 1;
   return NULL_TREE;
-- 
2.38.1.420.g319605f8f0



[PATCH 5/8] middle-end: Add cltz_complement idiom recognition

2022-11-11 Thread Andrew Carlotti via Gcc-patches
This recognises patterns of the form:
while (n) { n >>= 1 }

This patch results in improved (but still suboptimal) codegen:

foo (unsigned int b) {
int c = 0;

while (b) {
b >>= 1;
c++;
}

return c;
}

foo:
.LFB11:
.cfi_startproc
cbz w0, .L3
clz w1, w0
tst x0, 1
mov w0, 32
sub w0, w0, w1
csel    w0, w0, wzr, ne
ret

The conditional is unnecessary. phiopt could recognise a redundant csel
(using cond_removal_in_builtin_zero_pattern) when one of the inputs is a
clz call, but it cannot recognise the redundancy when the input is (e.g.)
(32 - clz).

I could perhaps extend this function to recognise this pattern in a later
patch, if this is a good place to recognise more patterns.
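In scalar terms, the substitution the pass makes can be sketched like this
(using GCC builtins for illustration; the exact internal-function/optab
choice in the patch may differ):

```cpp
#include <cassert>

// The idiom: count how many right shifts empty the value.
unsigned bit_length_loop(unsigned b) {
  unsigned c = 0;
  while (b) {
    b >>= 1;
    ++c;
  }
  return c;
}

// The closed form the niter analysis derives: PREC - clz(b) for b != 0.
unsigned bit_length_clz(unsigned b) {
  const unsigned prec = 8 * sizeof(unsigned);
  return b ? prec - (unsigned) __builtin_clz(b) : 0;
}
```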

gcc/ChangeLog:

* tree-scalar-evolution.cc (expression_expensive_p): Add checks
for c[lt]z optabs.
* tree-ssa-loop-niter.cc (build_cltz_expr): New.
(number_of_iterations_cltz_complement): New.
(number_of_iterations_bitcount): Add call to the above.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_clz)
(check_effective_target_clzl, check_effective_target_clzll)
(check_effective_target_ctz, check_effective_target_clzl)
(check_effective_target_ctzll): New.
* gcc.dg/tree-ssa/cltz-complement-max.c: New test.
* gcc.dg/tree-ssa/clz-complement-char.c: New test.
* gcc.dg/tree-ssa/clz-complement-int.c: New test.
* gcc.dg/tree-ssa/clz-complement-long-long.c: New test.
* gcc.dg/tree-ssa/clz-complement-long.c: New test.
* gcc.dg/tree-ssa/ctz-complement-char.c: New test.
* gcc.dg/tree-ssa/ctz-complement-int.c: New test.
* gcc.dg/tree-ssa/ctz-complement-long-long.c: New test.
* gcc.dg/tree-ssa/ctz-complement-long.c: New test.


--


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
new file mode 100644
index 
..1a29ca52e42e50822e4e3213b2cb008b766d0318
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */
+
+#define PREC (__CHAR_BIT__)
+
+int clz_complement_count1 (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b >>= 1;
+   c++;
+}
+if (c <= PREC)
+  return 0;
+else
+  return 34567;
+}
+
+int clz_complement_count2 (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b >>= 1;
+   c++;
+}
+if (c <= PREC - 1)
+  return 0;
+else
+  return 76543;
+}
+
+int ctz_complement_count1 (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b <<= 1;
+   c++;
+}
+if (c <= PREC)
+  return 0;
+else
+  return 23456;
+}
+
+int ctz_complement_count2 (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b <<= 1;
+   c++;
+}
+if (c <= PREC - 1)
+  return 0;
+else
+  return 65432;
+}
+/* { dg-final { scan-tree-dump-times "34567" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "76543" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "23456" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "65432" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c 
b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c
new file mode 100644
index 
..2ebe8fabcaf0ce88f3a6a46e9ba4ba79b7d3672e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target clz } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define PREC (__CHAR_BIT__)
+
+int
+__attribute__ ((noinline, noclone))
+foo (unsigned char b) {
+int c = 0;
+
+while (b) {
+   b >>= 1;
+   c++;
+}
+
+return c;
+}
+
+int main()
+{
+  if (foo(0) != 0)
+__builtin_abort ();
+  if (foo(5) != 3)
+__builtin_abort ();
+  if (foo(255) != 8)
+__builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c 
b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c
new file mode 100644
index 
..f2c5c23f6a7d84ecb637c6961698b0fc30d7426b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target clz } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
+
+int
+__attribute__ ((noinline, noclone))
+foo (unsigned int b) {
+int c = 0;
+
+while (b) {
+   b >>= 1;
+   c++;
+}
+
+return c;
+}
+
+int main()
+{
+  if (foo(0) != 0)
+__builtin_abort ();
+  if (foo(5) != 3)

Re: [PATCH v2] RISC-V missing __builtin_lceil and __builtin_lfloor

2022-11-11 Thread Kevin Lee
On Wed, Nov 9, 2022 at 1:49 AM Xi Ruoyao  wrote:
>
> On Mon, 2022-11-07 at 20:36 -0800, Kevin Lee wrote:
> I "shamelessly copied" your idea in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605456.html.
> During the review we found an issue.
>

> -fno-fp-int-builtin-inexact does not allow __builtin_ceil to raise
> inexact exception.  But fcvt.l.d may raise one.

Your solution of activating this only under -ffp-int-builtin-inexact seems
to be a good way to handle the issue. Thank you for the example. I'll
create a new patch based on the fix.
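For background, the C-level semantics the builtins must implement, sketched
with portable libm calls (this shows only the behaviour, not the RISC-V
expansion itself):

```cpp
#include <cassert>
#include <cmath>

// __builtin_lceil/__builtin_lfloor fold ceil/floor plus the conversion
// to long.  The question above is whether a single fcvt.l.d may be used
// when it can raise an inexact exception that -fno-fp-int-builtin-inexact
// forbids.
long lceil_ref(double x)  { return (long) std::ceil(x); }
long lfloor_ref(double x) { return (long) std::floor(x); }
```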

> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


[PATCH 6/8] docs: Add popcount, clz and ctz target attributes

2022-11-11 Thread Andrew Carlotti via Gcc-patches
gcc/ChangeLog:

* 
doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst:
Add missing target attributes.


--


diff --git 
a/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
 
b/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
index 
709e4ea2b903cfad4faed40899020b29bc9b5811..8410c40d38fceb83ea8c6ba3bbf0fba5db7929e5
 100644
--- 
a/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
+++ 
b/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
@@ -1075,6 +1075,24 @@ Other hardware attributes
 ``cell_hw``
   Test system can execute AltiVec and Cell PPU instructions.
 
+``clz``
+  Target supports a clz optab on int.
+
+``clzl``
+  Target supports a clz optab on long.
+
+``clzll``
+  Target supports a clz optab on long long.
+
+``ctz``
+  Target supports a ctz optab on int.
+
+``ctzl``
+  Target supports a ctz optab on long.
+
+``ctzll``
+  Target supports a ctz optab on long long.
+
 ``cmpccxadd``
   Target supports the execution of ``cmpccxadd`` instructions.
 
@@ -1096,6 +1114,15 @@ Other hardware attributes
 ``pie_copyreloc``
   The x86-64 target linker supports PIE with copy reloc.
 
+``popcount``
+  Target supports a popcount optab on int.
+
+``popcountl``
+  Target supports a popcount optab on long.
+
+``popcountll``
+  Target supports a popcount optab on long long.
+
 ``prefetchi``
   Target supports the execution of ``prefetchi`` instructions.
 


Re: [PATCH] c++: init_priority and SUPPORTS_INIT_PRIORITY [PR107638]

2022-11-11 Thread Andrew Pinski via Gcc-patches
On Fri, Nov 11, 2022 at 10:48 AM Patrick Palka via Gcc-patches
 wrote:
>
> The commit r13-3706-gd0a492faa6478c for correcting the result of
> __has_attribute(init_priority) causes a bootstrap failure on hppa64-hpux
> because it assumes SUPPORTS_INIT_PRIORITY expands to a simple constant,
> but on this target SUPPORTS_INIT_PRIORITY is defined as
>
>   #define SUPPORTS_INIT_PRIORITY (TARGET_GNU_LD ? 1 : 0)
>
> (where TARGET_GNU_LD expands to something in terms of global_options)
> which means we can't use this macro to statically exclude the entry
> for init_priority when defining the cxx_attribute_table.
>
> So instead of trying to exclude init_priority from the attribute table
> for sake of __has_attribute, this patch just makes __has_attribute
> handle init_priority specially.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?  Also sanity checked by artificially defining SUPPORTS_INIT_PRIORITY
> to 0.
>
> PR c++/107638
>
> gcc/c-family/ChangeLog:
>
> * c-lex.cc (c_common_has_attribute): Return 1 for init_priority
> iff SUPPORTS_INIT_PRIORITY.
>
> gcc/cp/ChangeLog:
>
> * tree.cc (cxx_attribute_table): Don't conditionally exclude
> the init_priority entry.
> (handle_init_priority_attribute): Remove ATTRIBUTE_UNUSED.
> Return error_mark_node if !SUPPORTS_INIT_PRIORITY.
> ---
>  gcc/c-family/c-lex.cc |  9 +
>  gcc/cp/tree.cc| 11 +++
>  2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> index 89c65aca28a..2fe562c7ccf 100644
> --- a/gcc/c-family/c-lex.cc
> +++ b/gcc/c-family/c-lex.cc
> @@ -380,6 +380,15 @@ c_common_has_attribute (cpp_reader *pfile, bool 
> std_syntax)
> result = 201907;
>   else if (is_attribute_p ("assume", attr_name))
> result = 202207;
> + else if (is_attribute_p ("init_priority", attr_name))
> +   {
> + /* The (non-standard) init_priority attribute is always
> +included in the attribute table, but we don't want to
> +advertise the attribute unless the target actually
> +supports init priorities.  */
> + result = SUPPORTS_INIT_PRIORITY ? 1 : 0;
> + attr_name = NULL_TREE;
> +   }
> }
>   else
> {
> diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> index c30bbeb0839..2324c2269fc 100644
> --- a/gcc/cp/tree.cc
> +++ b/gcc/cp/tree.cc
> @@ -5010,10 +5010,8 @@ const struct attribute_spec cxx_attribute_table[] =
>  {
>/* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> affects_type_identity, handler, exclude } */
> -#if SUPPORTS_INIT_PRIORITY
>{ "init_priority",  1, 1, true,  false, false, false,
>  handle_init_priority_attribute, NULL },
> -#endif

Hmm, seems like this could be better handled if attribute_spec has
another field which takes a function which returns if it is supported
or not.
And maybe remove some of the special cases inside c_common_has_attribute.
I suspect there are some target ones which need special casing in more
fancy way.
An example of this is arm_handle_cmse_nonsecure_entry in config/arm/arm.cc .
Where this attribute is not supported unless you use -mcmse option.
It seems it would be useful if you want to use __has_attribute on
cmse_nonsecure_entry to get the right value there too.

Note I am not blocking this patch for this but just thinking out loud
of how to improve this so special casing is not needed.

Thanks,
Andrew Pinski


>{ "abi_tag", 1, -1, false, false, false, true,
>  handle_abi_tag_attribute, NULL },
>{ NULL, 0, 0, false, false, false, false, NULL, NULL }
> @@ -5041,13 +5039,19 @@ const struct attribute_spec std_attribute_table[] =
>
>  /* Handle an "init_priority" attribute; arguments as in
> struct attribute_spec.handler.  */
> -ATTRIBUTE_UNUSED static tree
> +static tree
>  handle_init_priority_attribute (tree* node,
> tree name,
> tree args,
> int /*flags*/,
> bool* no_add_attrs)
>  {
> +  if (!SUPPORTS_INIT_PRIORITY)
> +/* Treat init_priority as an unrecognized attribute (mirroring the
> +   result of __has_attribute) if the target doesn't support init
> +   priorities.  */
> +return error_mark_node;
> +
>tree initp_expr = TREE_VALUE (args);
>tree decl = *node;
>tree type = TREE_TYPE (decl);
> @@ -5105,7 +5109,6 @@ handle_init_priority_attribute (tree* node,
>  pri);
>  }
>
> -  gcc_assert (SUPPORTS_INIT_PRIORITY);
>SET_DECL_INIT_PRIORITY (decl, pri);
>DECL_HAS_INIT_PRIORITY_P (decl) = 1;
>return NULL_TREE;
> --
> 2.38.1.420.g319605f8f0
>


[PATCH 7/8] middle-end: Add c[lt]z idiom recognition

2022-11-11 Thread Andrew Carlotti via Gcc-patches
This recognises the patterns of the form:
  while (!(n & 1)) { n >>= 1 }

Unfortunately there are currently two issues relating to this patch.

Firstly, simplify_using_initial_conditions does not recognise that
(n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0).

These preconditions arise following the loop copy-header pass, and the
assumptions returned by number_of_iterations_exit_assumptions then
prevent final value replacement from using the niter result.

I'm not sure what is the best way to fix this - one approach could be to
modify simplify_using_initial_conditions to handle this sort of case,
but it seems that it basically wants the information that ranger could
give anyway, so would something like that be a better option?

The second issue arises in the vectoriser, which is able to determine
that the niter->assumptions are always true.
When building with -march=armv8.4-a+sve -S -O3, we get this codegen:

foo (unsigned int b) {
int c = 0;

if (b == 0)
  return PREC;

while (!(b & (1 << (PREC - 1)))) {
b <<= 1;
c++;
}

return c;
}

foo:
.LFB0:
.cfi_startproc
cmp w0, 0
cbz w0, .L6
blt .L7
lsl w1, w0, 1
clz w2, w1
cmp w2, 14
bls .L8
mov x0, 0
cntw    x3
add w1, w2, 1
index   z1.s, #0, #1
whilelo p0.s, wzr, w1
.L4:
add x0, x0, x3
mov p1.b, p0.b
mov z0.d, z1.d
whilelo p0.s, w0, w1
incw    z1.s
b.any   .L4
add z0.s, z0.s, #1
lastb   w0, p1, z0.s
ret
.p2align 2,,3
.L8:
mov w0, 0
b   .L3
.p2align 2,,3
.L13:
lsl w1, w1, 1
.L3:
add w0, w0, 1
tbz w1, #31, .L13
ret
.p2align 2,,3
.L6:
mov w0, 32
ret
.p2align 2,,3
.L7:
mov w0, 0
ret
.cfi_endproc

In essence, the vectoriser uses the niter information to determine
exactly how many iterations of the loop it needs to run. It then uses
SVE whilelo instructions to run this number of iterations. The original
loop counter is also vectorised, despite only being used in the final
iteration, and then the final value of this counter is used as the
return value (which is the same as the number of iterations it computed
in the first place).

This vectorisation is obviously bad, and I think it exposes a latent
bug in the vectoriser, rather than being an issue caused by this
specific patch.
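For concreteness, the scalar equivalence behind the niter result
(independent of the vectoriser issue above; this sketch assumes a 32-bit
unsigned int):

```cpp
#include <cassert>

// For nonzero b, shifting left until the top bit is set counts exactly
// the leading zeros of b, which is what the new niter analysis captures.
unsigned clz_loop(unsigned b) {
  const unsigned prec = 8 * sizeof(unsigned);
  unsigned c = 0;
  while (!(b & (1u << (prec - 1)))) {
    b <<= 1;
    ++c;
  }
  return c;
}
```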

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
(number_of_iterations_bitcount): Add call to the above.
(number_of_iterations_exit_assumptions): Add EQ_EXPR case for
c[lt]z idiom recognition.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cltz-max.c: New test.
* gcc.dg/tree-ssa/clz-char.c: New test.
* gcc.dg/tree-ssa/clz-int.c: New test.
* gcc.dg/tree-ssa/clz-long-long.c: New test.
* gcc.dg/tree-ssa/clz-long.c: New test.
* gcc.dg/tree-ssa/ctz-char.c: New test.
* gcc.dg/tree-ssa/ctz-int.c: New test.
* gcc.dg/tree-ssa/ctz-long-long.c: New test.
* gcc.dg/tree-ssa/ctz-long.c: New test.


--


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
new file mode 100644
index 
..a6bea3d338940efee2e7e1c95a5941525945af9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */
+
+#define PREC (__CHAR_BIT__)
+
+int clz_count1 (unsigned char b) {
+int c = 0;
+
+if (b == 0)
+  return 0;
+
+while (!(b & (1 << (PREC - 1)))) {
+   b <<= 1;
+   c++;
+}
+if (c <= PREC - 1)
+  return 0;
+else
+  return 34567;
+}
+
+int clz_count2 (unsigned char b) {
+int c = 0;
+
+if (b == 0)
+  return 0;
+
+while (!(b & (1 << PREC - 1))) {
+   b <<= 1;
+   c++;
+}
+if (c <= PREC - 2)
+  return 0;
+else
+  return 76543;
+}
+
+int ctz_count1 (unsigned char b) {
+int c = 0;
+
+if (b == 0)
+  return 0;
+
+while (!(b & 1)) {
+   b >>= 1;
+   c++;
+}
+if (c <= PREC - 1)
+  return 0;
+else
+  return 23456;
+}
+
+int ctz_count2 (unsigned char b) {
+int c = 0;
+
+if (b == 0)
+  return 0;
+
+while (!(b & 1)) {
+   b >>= 1;
+   c++;
+}
+if (c <= PREC - 2)
+  return 0;
+else
+  return 65432;
+}
+/* { dg-final { scan-tree-dump-times "34567" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "76543" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "23456" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "65432" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-char.c 
b/gcc/test

[PATCH 8/8] middle-end: Expand comment for tree_niter_desc.max

2022-11-11 Thread Andrew Carlotti via Gcc-patches
This requirement is enforced by a gcc_checking_assert in
record_estimate.

gcc/ChangeLog:

* tree-ssa-loop.h (tree_niter_desc): Update comment.


--


diff --git a/gcc/tree-ssa-loop.h b/gcc/tree-ssa-loop.h
index 
6c70f795d171f22b3ed75873fec4920fea75255b..c24215be8822c31a05eaedcf4d3a26db0feab6cf
 100644
--- a/gcc/tree-ssa-loop.h
+++ b/gcc/tree-ssa-loop.h
@@ -52,7 +52,8 @@ public:
   may_be_zero == false), more precisely the number
   of executions of the latch of the loop.  */
   widest_int max;  /* The upper bound on the number of iterations of
-  the loop.  */
+  the loop.  If niter is constant, then these values
+  must agree.  */
 
   /* The simplified shape of the exit condition.  This information is used by
  loop unrolling.  If CMP is ERROR_MARK, then the loop cannot be unrolled.


Re: [PATCH] [range-ops] Add ability to represent open intervals in frange.

2022-11-11 Thread Aldy Hernandez via Gcc-patches
Passes tests for all languages. Passes lapack tests.

So ready to be installed unless you have any issues. Oh... I should
write some tests..

Aldy

On Fri, Nov 11, 2022, 19:11 Aldy Hernandez  wrote:

> Currently we represent < and > with a closed interval.  So < 3.0 is
> represented as [-INF, +3.0].  This means 3.0 is included in the range,
> and though not ideal, is conservatively correct.  Jakub has found a
> couple cases where properly representing < and > would help
> optimizations and tests, and this patch allows representing open
> intervals with real_nextafter.
>
> There are a few caveats.
>
> First, we leave MODE_COMPOSITE_P types pessimistically as a closed
> interval.
>
> Second, for -ffinite-math-only, real_nextafter will saturate the
> maximum representable number into +INF.  However, this will still do
> the right thing, as frange::set() will crop things appropriately.
>
> Finally, and most frustratingly, representing < and > of +-0.0 is
> problematic because we flush denormals to zero.  Let me explain...
>
> real_nextafter(+0.0, +INF) gives 0x0.8p-148 as expected, but setting a
> range to this value yields [+0.0, 0x0.8p-148] because of the flushing.
>
> On the other hand, real_nextafter(+0.0, -INF) (surprisingly) gives
> -0x0.8p-148, but setting a range to that value yields [-0x0.8p-148,
> -0.0].  I say surprising, because according to cppreference.com,
> std::nextafter(+0.0, -INF) should give -0.0.  But that's neither here
> nor there because our flushing denormals to zero prevents us from even
> representing ranges involving small values around 0.0.  We ultimately
> end up with ranges looking like this:
>
> > +0.0  => [+0.0, INF]
> > -0.0  => [+0.0, INF]
> < +0.0  => [-INF, -0.0]
> < -0.0  => [-INF, -0.0]
>
> I suppose this is no worse off than what we had before with closed
> intervals.  One could even argue that we're better because we at least
> have the right sign now ;-).
>
> All other (non-zero) values look sane.
>
> Lightly tested.
>
> Thoughts?
>
> gcc/ChangeLog:
>
> * range-op-float.cc (build_lt): Adjust with frange_nextafter
> instead of default to a closed range.
> (build_gt): Same.
> ---
>  gcc/range-op-float.cc | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> index 380142b4c14..402393097b2 100644
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -381,9 +381,17 @@ build_lt (frange &r, tree type, const frange &val)
> r.set_undefined ();
>return false;
>  }
> -  // We only support closed intervals.
> +
>REAL_VALUE_TYPE ninf = frange_val_min (type);
> -  r.set (type, ninf, val.upper_bound ());
> +  REAL_VALUE_TYPE prev = val.upper_bound ();
> +  machine_mode mode = TYPE_MODE (type);
> +  // Default to the conservatively correct closed ranges for
> +  // MODE_COMPOSITE_P, otherwise use nextafter.  Note that for
> +  // !HONOR_INFINITIES, nextafter will yield -INF, but frange::set()
> +  // will crop the range appropriately.
> +  if (!MODE_COMPOSITE_P (mode))
> +frange_nextafter (mode, prev, ninf);
> +  r.set (type, ninf, prev);
>return true;
>  }
>
> @@ -424,9 +432,16 @@ build_gt (frange &r, tree type, const frange &val)
>return false;
>  }
>
> -  // We only support closed intervals.
>REAL_VALUE_TYPE inf = frange_val_max (type);
> -  r.set (type, val.lower_bound (), inf);
> +  REAL_VALUE_TYPE next = val.lower_bound ();
> +  machine_mode mode = TYPE_MODE (type);
> +  // Default to the conservatively correct closed ranges for
> +  // MODE_COMPOSITE_P, otherwise use nextafter.  Note that for
> +  // !HONOR_INFINITIES, nextafter will yield +INF, but frange::set()
> +  // will crop the range appropriately.
> +  if (!MODE_COMPOSITE_P (mode))
> +frange_nextafter (mode, next, inf);
> +  r.set (type, next, inf);
>return true;
>  }
>
> --
> 2.38.1
>
>


Re: [PATCH] 1/19 modula2 front end: changes outside gcc/m2, libgm2 and gcc/testsuite.

2022-11-11 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

> On Mon, Oct 10, 2022 at 5:36 PM Gaius Mulley via Gcc-patches
>  wrote:
>>
>>
>>
>> This patch set contains the non machine generated changes found in /
>> for example the language die and documentation changes.  It also
>> contains the changes to the top level build Makefile infastructure
>> and the install.texi sourcebuild.texi documentation.
>
> I couldn't spot any issue besides the docs now being written in
> Sphinx, so this part
> is OK (with the docs ported)

awesome, many thanks - I'm working on the tool
(gcc/m2/tools-src/def2doc.py) which generates Sphinx from the modula-2
library sources and should post this in a few days time,

regards,
Gaius


[PATCH] gcc: m68k: fix PR target/107645

2022-11-11 Thread Max Filippov via Gcc-patches
gcc/
PR target/107645
* config/m68k/predicates.md (symbolic_operand): Return false
when UNSPEC is under the CONST node.
---
Regtested with --enable-checking=all for target=m68k-linux-uclibc, no
new regressions compared to the compiler built without checking.
Ok for master?

 gcc/config/m68k/predicates.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/m68k/predicates.md b/gcc/config/m68k/predicates.md
index f8dedd9f8375..094a18955534 100644
--- a/gcc/config/m68k/predicates.md
+++ b/gcc/config/m68k/predicates.md
@@ -141,6 +141,8 @@
 
 case CONST:
   op = XEXP (op, 0);
+  if (GET_CODE (op) == UNSPEC)
+return false;
   return ((GET_CODE (XEXP (op, 0)) == SYMBOL_REF
   || GET_CODE (XEXP (op, 0)) == LABEL_REF)
  && GET_CODE (XEXP (op, 1)) == CONST_INT);
-- 
2.30.2



[COMMITTED] PR tree-optimization/107523 - Don't add dependencies in update_stmt.

2022-11-11 Thread Andrew MacLeod via Gcc-patches
This adjusts gimple_ranger::update_stmt (which informs the range engine 
that a statement has changed under the covers).  I was calculating the 
statement using a fur_depend class instead of a fur_stmt.  (FUR is Fold 
Using Range.)


The difference between the two is that a fur_depend will register any 
relations or dependencies it sees with the oracle and GORI.  The problem 
is, update_stmt has no context of where this is being done.  The path 
ranger was performing a simplification, and a relation was being set 
globally even though it should not have been.


The fix is simple: we're only trying to recalculate the result, so limit 
it to that.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
commit 0a7b437ca71e2721e9bcf070762fc54ef7991aeb
Author: Andrew MacLeod 
Date:   Fri Nov 11 12:22:33 2022 -0500

Don't add dependencies in update_stmt.

gimple_ranger::update_stmt has no idea what the context of an update
is, and should not be adding relations when it re-evaluates a stmt.

PR tree-optimization/107523
gcc/
* gimple-range.cc (gimple_ranger::update_stmt): Use fur_stmt
rather than fur_depend.

gcc/testsuite/
* gcc.dg/pr107523.c: New.

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 2885d0fa21e..ecd6039e0fd 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -546,7 +546,7 @@ gimple_ranger::update_stmt (gimple *s)
   // Re-calculate a new value using just cache values.
   Value_Range tmp (TREE_TYPE (lhs));
   fold_using_range f;
-  fur_depend src (s, &(gori ()), &m_cache);
+  fur_stmt src (s, &m_cache);
   f.fold_stmt (tmp, s, src, lhs);
 
   // Combine the new value with the old value to check for a change.
diff --git a/gcc/testsuite/gcc.dg/pr107523.c b/gcc/testsuite/gcc.dg/pr107523.c
new file mode 100644
index 000..1e5ed46c636
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107523.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2 " } */
+
+int a, b = 1;
+unsigned int c = 1;
+int main() {
+  int d = 1, f;
+  if (b)
+d = 0;
+  a = -1;
+  b = ~d ^ 465984011;
+ L1:;
+  if (b < 2)
+f = b;
+  b = f;
+  if (f <= a) {
+int g = -(a && 1), h = g - f && a, i = ~(c / f) && 1 % (a | h);
+if (c) {
+  g = f;
+  if (i || (g && (g > -465984012)))
+goto L2;
+}
+c = g | f / c;
+  }
+  if (0)
+  L2:
+a = 0;
+  if (a <= c)
+goto L1;
+  return 0;
+}
+


Re: [PATCH] fix small const data for riscv

2022-11-11 Thread Andrew Pinski via Gcc-patches
On Fri, Nov 11, 2022 at 5:03 AM Oria Chen via Gcc-patches
 wrote:
>
> gcc/testsuite ChangeLog:
>
> 2022-11-11  Oria Chen  
>
> * gcc/testsuite/gcc.dg/pr25521.c: Add compile option 
> "-msmall-data-limit=0" to avoid using .srodata section.

I noticed g++.dg/cpp0x/constexpr-rom.C has some slightly different
handling here.
Seems like there should be a generic way to add
-G0/-msmall-data-limit=0 if we don't want small data for a testcase
rather than the current scheme of things.

Thanks,
Andrew Pinski

> ---
>  gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
> index 74fe2ae6626..628ddf1a761 100644
> --- a/gcc/testsuite/gcc.dg/pr25521.c
> +++ b/gcc/testsuite/gcc.dg/pr25521.c
> @@ -2,7 +2,8 @@
> sections.
>
> { dg-require-effective-target elf }
> -   { dg-do compile } */
> +   { dg-do compile }
> +   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
>
>  const volatile int foo = 30;
>
> --
> 2.37.2
>


Re: [PATCH] aarch64: Add support for +cssc

2022-11-11 Thread Andrew Pinski via Gcc-patches
On Fri, Nov 11, 2022 at 2:26 AM Kyrylo Tkachov via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch adds codegen for FEAT_CSSC from the 2022 Architecture extensions.
> It fits various existing optabs in GCC quite well.
> There are instructions for scalar signed/unsigned min/max, abs, ctz, popcount.
> We have expanders for these already, so they are wired up to emit single-insn
> patterns for the new TARGET_CSSC.
>
> These instructions are enabled by the +cssc command-line extension.
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> I'll push it once the Binutils patch from Andre for this gets committed



@@ -4976,8 +5020,14 @@ (define_expand "ffs2"
 (define_expand "popcount2"
   [(match_operand:GPI 0 "register_operand")
(match_operand:GPI 1 "register_operand")]
-  "TARGET_SIMD"
+  "TARGET_CSSC || TARGET_SIMD"
 {
+  if (TARGET_CSSC)
+{
+  emit_insn (gen_aarch64_popcount2_insn (operands[0], operands[1]));
+  DONE;
+}
+
   rtx v = gen_reg_rtx (V8QImode);
   rtx v1 = gen_reg_rtx (V8QImode);
   rtx in = operands[1];

I think the easy way is to do this instead:
 (define_expand "popcount2"
   [(set (match_operand:GPI 0 "register_operand")
 (popcount:GPI  (match_operand:GPI 1 "register_operand")))]
  "TARGET_CSSC || TARGET_SIMD"
{
  if (!TARGET_CSSC)
{
// Current code
 DONE;
}
}

And then you don't need to name the aarch64_popcount pattern. Or use a *.
Yes it does mess up the diff but the end result seems cleaner.
I suspect all of the expands you are changing should be done this
similar way too.

Thanks,
Andrew Pinski

>
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-option-extensions.def (cssc): Define.
> * config/aarch64/aarch64.h (AARCH64_ISA_CSSC): Define.
> (TARGET_CSSC): Likewise.
> * config/aarch64/aarch64.md (aarch64_abs2_insn): New 
> define_insn.
> (abs2): Adjust for the above.
> (aarch64_umax3_insn): New define_insn.
> (umax3): Adjust for the above.
> (aarch64_popcount2_insn): New define_insn.
> (popcount2): Adjust for the above.
> (3): New define_insn.
> * config/aarch64/constraints.md (Usm): Define.
> (Uum): Likewise.
> * 
> doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
> Document +cssc.
> * config/aarch64/iterators.md (MAXMIN_NOUMAX): New code iterator.
> * config/aarch64/predicates.md (aarch64_sminmax_immediate): Define.
> (aarch64_sminmax_operand): Likewise.
> (aarch64_uminmax_immediate): Likewise.
> (aarch64_uminmax_operand): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/cssc_1.c: New test.
> * gcc.target/aarch64/cssc_2.c: New test.
> * gcc.target/aarch64/cssc_3.c: New test.
> * gcc.target/aarch64/cssc_4.c: New test.
> * gcc.target/aarch64/cssc_5.c: New test.


gcc-patches@gcc.gnu.org

2022-11-11 Thread Marek Polacek via Gcc-patches
Non-const lvalue references can't bind to a temporary, so the
warning should not be emitted if we're initializing something of that
type.  I'm not disabling the warning when the function itself returns
a non-const lvalue reference, that would regress at least

  const int &r = std::any_cast(std::any());

in Wdangling-reference2.C where the any_cast returns an int&.

Unfortunately, this patch means we'll stop diagnosing

  int& fn(int&& x) { return static_cast(x); }
  void test ()
  {
int &r = fn(4);
  }

where there's a genuine dangling reference.  OTOH, the patch
should suppress false positives with iterators, like:

  auto &candidate = *candidates.begin ();

and arguably that's more important than detecting some relatively
obscure cases.  It's probably not worth making the warning more
complicated by, for instance, not warning when a fn returns 'int&'
but takes 'const int&' (because then it can't return its argument).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

* call.cc (maybe_warn_dangling_reference): Don't warn when initializing
a non-const lvalue reference.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/elision4.C: Remove dg-warning.
* g++.dg/warn/Wdangling-reference1.C: Turn dg-warning into dg-bogus.
* g++.dg/warn/Wdangling-reference7.C: New test.
---
 gcc/cp/call.cc   | 10 --
 gcc/testsuite/g++.dg/cpp23/elision4.C|  4 ++--
 gcc/testsuite/g++.dg/warn/Wdangling-reference1.C |  4 ++--
 gcc/testsuite/g++.dg/warn/Wdangling-reference7.C | 16 
 4 files changed, 28 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index bd3b64a7e26..ef618d5c485 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13679,8 +13679,14 @@ maybe_warn_dangling_reference (const_tree decl, tree 
init)
 {
   if (!warn_dangling_reference)
 return;
-  if (!(TYPE_REF_OBJ_P (TREE_TYPE (decl))
-   || std_pair_ref_ref_p (TREE_TYPE (decl
+  tree type = TREE_TYPE (decl);
+  /* Only warn if what we're initializing has type T&& or const T&, or
+ std::pair.  (A non-const lvalue reference can't
+ bind to a temporary.)  */
+  if (!((TYPE_REF_OBJ_P (type)
+&& (TYPE_REF_IS_RVALUE (type)
+|| CP_TYPE_CONST_P (TREE_TYPE (type
+   || std_pair_ref_ref_p (type)))
 return;
   /* Don't suppress the diagnostic just because the call comes from
  a system header.  If the DECL is not in a system header, or if
diff --git a/gcc/testsuite/g++.dg/cpp23/elision4.C 
b/gcc/testsuite/g++.dg/cpp23/elision4.C
index d39053ad741..77dcffcdaad 100644
--- a/gcc/testsuite/g++.dg/cpp23/elision4.C
+++ b/gcc/testsuite/g++.dg/cpp23/elision4.C
@@ -34,6 +34,6 @@ T& temporary2(T&& x) { return static_cast(x); }
 void
 test ()
 {
-  int& r1 = temporary1 (42); // { dg-warning "dangling reference" }
-  int& r2 = temporary2 (42); // { dg-warning "dangling reference" }
+  int& r1 = temporary1 (42);
+  int& r2 = temporary2 (42);
 }
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C
index 97c81ee716c..1718c28165e 100644
--- a/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C
@@ -139,6 +139,6 @@ struct Y {
 // x1 = Y::operator int&& (&TARGET_EXPR )
 int&& x1 = Y(); // { dg-warning "dangling reference" }
 int&& x2 = Y{}; // { dg-warning "dangling reference" }
-int& x3 = Y(); // { dg-warning "dangling reference" }
-int& x4 = Y{}; // { dg-warning "dangling reference" }
+int& x3 = Y(); // { dg-bogus "dangling reference" }
+int& x4 = Y{}; // { dg-bogus "dangling reference" }
 const int& t1 = Y().foo(10); // { dg-warning "dangling reference" }
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference7.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference7.C
new file mode 100644
index 000..4b0de2d8670
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference7.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wdangling-reference" }
+
+int& ref(const int&);
+int&& rref(const int&);
+
+void
+g ()
+{
+  const int& r1 = ref (1); // { dg-warning "dangling reference" }
+  int& r2 = ref (2); // { dg-bogus "dangling reference" }
+  auto& r3 = ref (3); // { dg-bogus "dangling reference" }
+  int&& r4 = rref (4); // { dg-warning "dangling reference" }
+  auto&& r5 = rref (5); // { dg-warning "dangling reference" }
+  const int&& r6 = rref (6); // { dg-warning "dangling reference" }
+}

base-commit: 0a7b437ca71e2721e9bcf070762fc54ef7991aeb
-- 
2.38.1



Re: [PATCH] fix small const data for riscv

2022-11-11 Thread Palmer Dabbelt

On Fri, 11 Nov 2022 11:56:08 PST (-0800), gcc-patches@gcc.gnu.org wrote:

On Fri, Nov 11, 2022 at 5:03 AM Oria Chen via Gcc-patches
 wrote:


gcc/testsuite ChangeLog:

2022-11-11  Oria Chen  

* gcc/testsuite/gcc.dg/pr25521.c: Add compile option 
"-msmall-data-limit=0" to avoid using .srodata section.


I noticed g++.dg/cpp0x/constexpr-rom.C has some slightly different
handling here.
Seems like there should be a generic way to add
-G0/-msmall-data-limit=0 if we don't want small data for a testcase
rather than the current scheme of things.


There are also a few tests like these where we modified the regex to match 
.sdata in addition to .data, which fixes the problem on MIPS too.
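A hypothetical variant of that approach for this test would widen the
final scan instead of suppressing small data (directive sketched, not
taken from the patch):

```c
/* Instead of forcing -msmall-data-limit=0, accept the small-data
   read-only section name as well:  */
/* { dg-final { scan-assembler "\\.s?rodata" } } */
```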




Thanks,
Andrew Pinski


---
 gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..628ddf1a761 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -2,7 +2,8 @@
sections.

{ dg-require-effective-target elf }
-   { dg-do compile } */
+   { dg-do compile }
+   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */

 const volatile int foo = 30;

--
2.37.2



Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-11 Thread Gerald Pfeifer
On Tue, 8 Nov 2022, Martin Liška wrote:
> After the migration, people should be able to build (and install) GCC 
> even if they miss Sphinx (similar happens now if you miss makeinfo). 

My nightly *install* (not build) on amd64-unknown-freebsd12.2 broke 
(from what I can tell due to this - it's been working fine most of 
the last several 1000 days):

  if [ -f doc/g++.1 ]; then rm -f 
/home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; /usr/bin/install -c -m 644 
doc/g++.1 /home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; chmod a-x 
/home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; fi
  make -C 
/scratch/tmp/gerald/GCC-HEAD/gcc/../doc man 
SOURCEDIR=/scratch/tmp/gerald/GCC-HEAD/gcc/fortran/doc/gfortran 
BUILDDIR=/scratch/tmp/gerald/OBJ--0954/gcc/doc/gfortran/man SPHINXBUILD=
  make[3]: make[3]: don't know how to make w. Stop
  make[3]: stopped in /scratch/tmp/gerald/GCC-HEAD/doc
  gmake[2]: *** [/scratch/tmp/gerald/GCC-HEAD/gcc/fortran/Make-lang.in:164: 
doc/gfortran/man/man/gfortran.1] Error 2
  gmake[2]: Leaving directory '/scratch/tmp/gerald/OBJ--0954/gcc'
  gmake[1]: *** [Makefile:5310: install-strip-gcc] Error 2
  gmake[1]: Leaving directory '/scratch/tmp/gerald/OBJ--0954'
  gmake: *** [Makefile:2734: install-strip] Error 2

(This appears to be the case with "make -j1 install-strip". Not sure where 
that "w" target is coming from?)

Gerald


Re: [committed] libstdc++: Avoid redundant checks in std::use_facet [PR103755]

2022-11-11 Thread Stephan Bergmann via Gcc-patches

On 11/11/22 06:30, Jonathan Wakely via Gcc-patches wrote:

As discussed in the PR, this makes it three times faster to construct
iostreams objects.

Tested x86_64-linux. Pushed to trunk.


I haven't yet tried to track down what's going on, but with various 
versions of Clang (e.g. clang-15.0.4-1.fc37.x86_64):



$ cat test.cc
#include 
int main(int, char ** argv) {
std::regex_traits().transform(argv[0], argv[0] + 1);
}



$ clang++ --gcc-toolchain=... -fsanitize=undefined -O2 test.cc
/usr/bin/ld: /tmp/test-c112b1.o: in function `std::__cxx11::basic_string, std::allocator > 
std::__cxx11::regex_traits::transform(char*, char*) const':
test.cc:(.text._ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_[_ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_]+0x1b):
 undefined reference to `std::__cxx11::collate const* 
std::__try_use_facet >(std::locale const&)'
clang-15: error: linker command failed with exit code 1 (use -v to see 
invocation)




Re: [PATCH 1/6] PowerPC: Add -mcpu=future

2022-11-11 Thread Peter Bergner via Gcc-patches
On 11/9/22 8:44 PM, Michael Meissner via Gcc-patches wrote:
> +  /* For now, make -mtune=future the same as -mtune=power10.  */
> +  if (rs6000_tune == PROCESSOR_FUTURE)
> +rs6000_tune = PROCESSOR_POWER10;

This comment matches the code...

> +  /* Some future processor.  For now, just use power10.  */
> +  if (rs6000_cpu == PROCESSOR_FUTURE)
> +return "future";

...but this doesn't.

Peter




Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-11 Thread Sandra Loosemore

On 11/11/22 13:52, Gerald Pfeifer wrote:

On Tue, 8 Nov 2022, Martin Liška wrote:

After the migration, people should be able to build (and install) GCC
even if they miss Sphinx (similar happens now if you miss makeinfo).


My nightly *install* (not build) on amd64-unknown-freebsd12.2 broke
(from what I can tell due to this - it's been working fine most of
the last several 1000 days):

   if [ -f doc/g++.1 ]; then rm -f 
/home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; /usr/bin/install -c -m 644 
doc/g++.1 /home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; chmod a-x 
/home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; fi
  make -C 
/scratch/tmp/gerald/GCC-HEAD/gcc/../doc man 
SOURCEDIR=/scratch/tmp/gerald/GCC-HEAD/gcc/fortran/doc/gfortran 
BUILDDIR=/scratch/tmp/gerald/OBJ--0954/gcc/doc/gfortran/man SPHINXBUILD=
   make[3]: make[3]: don't know how to make w. Stop
   make[3]: stopped in /scratch/tmp/gerald/GCC-HEAD/doc
   gmake[2]: *** [/scratch/tmp/gerald/GCC-HEAD/gcc/fortran/Make-lang.in:164: 
doc/gfortran/man/man/gfortran.1] Error 2
   gmake[2]: Leaving directory '/scratch/tmp/gerald/OBJ--0954/gcc'
   gmake[1]: *** [Makefile:5310: install-strip-gcc] Error 2
   gmake[1]: Leaving directory '/scratch/tmp/gerald/OBJ--0954'
   gmake: *** [Makefile:2734: install-strip] Error 2

(This appears to be the case with "make -j1 install-strip". Not sure where
that "w" target is coming from?)


I've seen something similar:  "make install" seems to be passing an 
empty SPHINXBUILD= option to the docs Makefile which is not equipped to 
handle that.  I know the fix is to get a recent-enough version of Sphinx 
installed (and I'm going to work on that over the weekend), but it ought 
to fail more gracefully, or not try to install docs that cannot be built 
without Sphinx.


-Sandra



[committed] analyzer: new warning: -Wanalyzer-infinite-recursion [PR106147]

2022-11-11 Thread David Malcolm via Gcc-patches
This patch adds a new -Wanalyzer-infinite-recursion warning to
-fanalyzer, which complains about certain cases of infinite recursion.

Specifically, when it detects recursion during its symbolic execution
of the user's code, it compares the state of memory to that at the
previous level of recursion, and if nothing appears to have effectively
changed, it issues a warning.

Unlike the middle-end warning -Winfinite-recursion (added by Martin
Sebor in GCC 12; r12-5483-g30ba058f77eedf), the analyzer warning
complains if there exists an interprocedural path in which recursion
occurs in which memory has not changed, whereas -Winfinite-recursion
complains if *every* intraprocedural path through the function leads to
a self-call.

Hence the warnings complement each other: there's some overlap, but each
also catches issues that the other misses.

For example, the new warning complains about a guarded recursion in
which the guard is passed unchanged:

void test_guarded (int flag)
{
  if (flag)
test_guarded (flag);
}

t.c: In function 'test_guarded':
t.c:4:5: warning: infinite recursion [CWE-674] [-Wanalyzer-infinite-recursion]
4 | test_guarded (flag);
  | ^~~
  'test_guarded': events 1-4
|
|1 | void test_guarded (int flag)
|  |  ^~~~
|  |  |
|  |  (1) initial entry to 'test_guarded'
|2 | {
|3 |   if (flag)
|  |  ~
|  |  |
|  |  (2) following 'true' branch (when 'flag != 0')...
|4 | test_guarded (flag);
|  | ~~~
|  | |
|  | (3) ...to here
|  | (4) calling 'test_guarded' from 'test_guarded'
|
+--> 'test_guarded': events 5-6
   |
   |1 | void test_guarded (int flag)
   |  |  ^~~~
   |  |  |
   |  |  (5) recursive entry to 'test_guarded'; previously 
entered at (1)
   |  |  (6) apparently infinite recursion
   |

whereas the existing warning doesn't complain, since when "flag" is
false the function doesn't recurse.

The new warning doesn't trigger for e.g.:

  void test_param_variant (int depth)
  {
if (depth > 0)
  test_param_variant (depth - 1);
  }

on the grounds that "depth" is changing, and appears to be a variant
that enforces termination of the recursion.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-3912-g12c583a2a3da79.

gcc/ChangeLog:
PR analyzer/106147
* Makefile.in (ANALYZER_OBJS): Add analyzer/infinite-recursion.o.

gcc/analyzer/ChangeLog:
PR analyzer/106147
* analyzer.opt (Wanalyzer-infinite-recursion): New.
* call-string.cc (call_string::count_occurrences_of_function):
New.
* call-string.h (call_string::count_occurrences_of_function): New
decl.
* checker-path.cc (function_entry_event::function_entry_event):
New ctor.
(checker_path::add_final_event): Delete.
* checker-path.h (function_entry_event::function_entry_event): New
ctor.
(function_entry_event::get_desc): Drop "final".
(checker_path::add_final_event): Delete.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Create the final
event via a new pending_diagnostic::add_final_event vfunc, rather
than checker_path::add_final_event.
(diagnostic_manager::add_events_for_eedge): Create function entry
events via a new pending_diagnostic::add_function_entry_event
vfunc.
* engine.cc (exploded_graph::process_node): When creating a new
PK_BEFORE_SUPERNODE node, call
exploded_graph::detect_infinite_recursion on it after adding the
in-edge.
* exploded-graph.h (exploded_graph::detect_infinite_recursion):
New decl.
(exploded_graph::find_previous_entry_to): New decl.
* infinite-recursion.cc: New file.
* pending-diagnostic.cc
(pending_diagnostic::add_function_entry_event): New.
(pending_diagnostic::add_final_event): New.
* pending-diagnostic.h
(pending_diagnostic::add_function_entry_event): New vfunc.
(pending_diagnostic::add_final_event): New vfunc.

gcc/ChangeLog:
PR analyzer/106147
* doc/gcc/gcc-command-options/options-that-control-static-analysis.rst:
Add -Wanalyzer-infinite-recursion.
* 
doc/gcc/gcc-command-options/options-to-request-or-suppress-warnings.rst
(-Winfinite-recursion): Mention -Wanalyzer-infinite-recursion.

gcc/testsuite/ChangeLog:
PR analyzer/106147
* g++.dg/analyzer/infinite-recursion-1.C: New test.
* g++.dg/analyzer/infinite-recursion-2.C: New test, copied from
g++.dg/warn/Winfinite-recursion-2.C.
* g++.dg/analyzer/infinite-recursion-3.C: New test, adapted from
g++.dg/warn/Winfinite-recursio

[committed] analyzer: split out checker_event classes to their own header

2022-11-11 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-3913-g65752c1f7c41c5.

gcc/analyzer/ChangeLog:
* checker-path.h: Split out checker_event and its subclasses to...
* checker-event.h: ...this new header.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/checker-event.h | 610 +++
 gcc/analyzer/checker-path.h  | 584 +
 2 files changed, 612 insertions(+), 582 deletions(-)
 create mode 100644 gcc/analyzer/checker-event.h

diff --git a/gcc/analyzer/checker-event.h b/gcc/analyzer/checker-event.h
new file mode 100644
index 000..18c44e600c8
--- /dev/null
+++ b/gcc/analyzer/checker-event.h
@@ -0,0 +1,610 @@
+/* Subclasses of diagnostic_event for analyzer diagnostics.
+   Copyright (C) 2019-2022 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_ANALYZER_CHECKER_EVENT_H
+#define GCC_ANALYZER_CHECKER_EVENT_H
+
+#include "tree-logical-location.h"
+
+namespace ana {
+
+/* An enum for discriminating between the concrete subclasses of
+   checker_event.  */
+
+enum event_kind
+{
+  EK_DEBUG,
+  EK_CUSTOM,
+  EK_STMT,
+  EK_REGION_CREATION,
+  EK_FUNCTION_ENTRY,
+  EK_STATE_CHANGE,
+  EK_START_CFG_EDGE,
+  EK_END_CFG_EDGE,
+  EK_CALL_EDGE,
+  EK_RETURN_EDGE,
+  EK_START_CONSOLIDATED_CFG_EDGES,
+  EK_END_CONSOLIDATED_CFG_EDGES,
+  EK_INLINED_CALL,
+  EK_SETJMP,
+  EK_REWIND_FROM_LONGJMP,
+  EK_REWIND_TO_SETJMP,
+  EK_WARNING
+};
+
+extern const char *event_kind_to_string (enum event_kind ek);
+
+/* Event subclasses.
+
+   The class hierarchy looks like this (using indentation to show
+   inheritance, and with event_kinds shown for the concrete subclasses):
+
+   diagnostic_event
+ checker_event
+   debug_event (EK_DEBUG)
+   custom_event (EK_CUSTOM)
+precanned_custom_event
+   statement_event (EK_STMT)
+   region_creation_event (EK_REGION_CREATION)
+   function_entry_event (EK_FUNCTION_ENTRY)
+   state_change_event (EK_STATE_CHANGE)
+   superedge_event
+ cfg_edge_event
+  start_cfg_edge_event (EK_START_CFG_EDGE)
+  end_cfg_edge_event (EK_END_CFG_EDGE)
+ call_event (EK_CALL_EDGE)
+ return_edge (EK_RETURN_EDGE)
+   start_consolidated_cfg_edges_event (EK_START_CONSOLIDATED_CFG_EDGES)
+   end_consolidated_cfg_edges_event (EK_END_CONSOLIDATED_CFG_EDGES)
+   inlined_call_event (EK_INLINED_CALL)
+   setjmp_event (EK_SETJMP)
+   rewind_event
+ rewind_from_longjmp_event (EK_REWIND_FROM_LONGJMP)
+rewind_to_setjmp_event (EK_REWIND_TO_SETJMP)
+   warning_event (EK_WARNING).  */
+
+/* Abstract subclass of diagnostic_event; the base class for use in
+   checker_path (the analyzer's diagnostic_path subclass).  */
+
+class checker_event : public diagnostic_event
+{
+public:
+  /* Implementation of diagnostic_event.  */
+
+  location_t get_location () const final override { return m_loc; }
+  tree get_fndecl () const final override { return m_effective_fndecl; }
+  int get_stack_depth () const final override { return m_effective_depth; }
+  const logical_location *get_logical_location () const final override
+  {
+if (m_effective_fndecl)
+  return &m_logical_loc;
+else
+  return NULL;
+  }
+  meaning get_meaning () const override;
+
+  /* Additional functionality.  */
+
+  int get_original_stack_depth () const { return m_original_depth; }
+
+  virtual void prepare_for_emission (checker_path *,
+pending_diagnostic *pd,
+diagnostic_event_id_t emission_id);
+  virtual bool is_call_p () const { return false; }
+  virtual bool is_function_entry_p () const  { return false; }
+  virtual bool is_return_p () const  { return false; }
+
+  /* For use with %@.  */
+  const diagnostic_event_id_t *get_id_ptr () const
+  {
+return &m_emission_id;
+  }
+
+  void dump (pretty_printer *pp) const;
+  void debug () const;
+
+  void set_location (location_t loc) { m_loc = loc; }
+
+protected:
+  checker_event (enum event_kind kind,
+location_t loc, tree fndecl, int depth);
+
+ public:
+  const enum event_kind m_kind;
+ protected:
+  location_t m_loc;
+  tree m_original_fndecl;
+  tree m_effective_fndecl;
+  int m_original_depth;
+  int m_effective_depth;
+  pe

Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-11 Thread Sam James via Gcc-patches


> On 8 Nov 2022, at 07:14, Sam James  wrote:
> 
> 1. This should speed up decompression for folks, as parallel xz
>   creates a different archive which can be decompressed in parallel.
> 
>   Note that this different method is enabled by default in a new
>   xz release coming shortly anyway (>= 5.3.3_alpha1).
> 
>   I build GCC regularly from the weekly snapshots
>   and so the decompression time adds up.
> 
> 2. It should speed up compression on the webserver a bit.
> 
>   Note that -T0 won't be the default in the new xz release,
>   only the parallel compression mode (which enables parallel
>   decompression).
> 
>   -T0 detects the number of cores available.
> 
>   So, if a different number of threads is preferred, it's fine
>   to set e.g. -T2, etc.
> 
> Signed-off-by: Sam James 
> ---
> maintainer-scripts/gcc_release | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 

Given no disagreements, anyone fancy pushing
this in time for Sunday evening for the next 13
snapshot? ;)
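For reference, the compression mode the patch switches to amounts to the
following (filename hypothetical; assumes xz >= 5.2):

```shell
# -T0 autodetects the core count and splits the stream into independent
# blocks, which is what later permits parallel decompression; a
# single-threaded `xz` produces one monolithic block instead.
set -e
printf 'gcc snapshot payload\n' > /tmp/snapshot.tar   # stand-in tarball
xz -9 -T0 -f -k /tmp/snapshot.tar                     # parallel compress
xz -d -T0 -c /tmp/snapshot.tar.xz                     # parallel decompress
```

Builders who want a fixed thread count rather than autodetection can use
-T2, -T4, etc., as the patch description notes.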



signature.asc
Description: Message signed with OpenPGP

