date:20220915

Re: [PATCH] Rewrite NAN and sign handling in frange

2022-09-15 Thread Richard Biener via Gcc-patches

On Thu, Sep 15, 2022 at 7:41 AM Aldy Hernandez  wrote:
>
> Hi Richard.  Hi all.
>
> The attached patch rewrites the NAN and sign handling, dropping both
> tristates in favor of a pair of boolean flags for NANs, and nothing at
> all for signs.  The signs are tracked in the range itself, so now it's
> possible to describe things like [-0.0, +0.0] +NAN, [+0, +0], [-5, +0],
> [+0, 3] -NAN, etc.
>
> There are a lot of changes, as the tristate was quite pervasive.  I
> could use another pair of eyes.  The code IMO is cleaner and handles
> all the cases we discussed.
>
> Here is an example of the various ranges and how they are displayed:
>
> [frange] float VARYING NAN ;; Varying includes NAN
> [frange] UNDEFINED  ;; Empty set as always
> [frange] float [] NAN   ;; Unknown sign NAN
> [frange] float [] -NAN  ;; -NAN
> [frange] float [] +NAN  ;; +NAN
> [frange] float [-0.0, 0.0]  ;; All zeros.
> [frange] float [-0.0, -0.0] NAN ;; -0 or NAN.
> [frange] float [-5.0e+0, -1.0e+0] +NAN  ;; [-5, -1] or +NAN
> [frange] float [-5.0e+0, -0.0] NAN  ;; [-5, -0] or +-NAN
> [frange] float [-5.0e+0, -0.0]  ;; [-5, -0]
> [frange] float [5.0e+0, 1.0e+1] ;; [5, 10]
>
> We could represent an unknown sign with +NAN -NAN if preferred.

maybe -+NAN or +-NAN?  I prefer to somehow show both signs for clarity

>
> Notice the NAN signs are decoupled from the range, so we can represent
> a negative range with a positive NAN.  For this range,
> frange::known_bit() would return false, as only when the signs of the
> NANs and range agree can we be certain.
>
> There is no longer any pessimization of ranges for intersects
> involving NANs.  Also, union and intersect work with signed zeros:
>
> //   [-0,  x] U [+0,  x] => [-0,  x]
> //   [ x, -0] U [ x, +0] => [ x, +0]
> //   [-0,  x] ^ [+0,  x] => [+0,  x]
> //   [ x, -0] ^ [ x, +0] => [ x, -0]
>
> The special casing for signed zeros in the singleton code is gone in
> favor of just making sure the signs in the range agree, that is
> [-0, -0] for example.
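
The four signed-zero rules quoted above amount to taking min/max of the bounds under an ordering in which -0.0 sorts strictly below +0.0. The following is only a sketch of that idea in plain C, not the frange code from the patch: `zero_aware_less`, `zmin`, `zmax`, `struct bounds`, `bounds_union` and `bounds_intersect` are hypothetical names, NaNs are assumed to be tracked elsewhere, and the two ranges are assumed to overlap.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: behaves like '<' except that -0.0 is ordered
   strictly below +0.0.  NaN bounds are not expected here.  */
static bool
zero_aware_less (double a, double b)
{
  assert (!isnan (a) && !isnan (b));
  if (a == 0.0 && b == 0.0)
    return signbit (a) && !signbit (b);
  return a < b;
}

static double
zmin (double a, double b)
{
  return zero_aware_less (b, a) ? b : a;
}

static double
zmax (double a, double b)
{
  return zero_aware_less (a, b) ? b : a;
}

/* Illustrative stand-in for the [min, max] part of a range.  */
struct bounds
{
  double lo, hi;
};

/* Union widens: smallest lower bound, largest upper bound.  */
static struct bounds
bounds_union (struct bounds a, struct bounds b)
{
  return (struct bounds) { zmin (a.lo, b.lo), zmax (a.hi, b.hi) };
}

/* Intersect narrows: largest lower bound, smallest upper bound.
   Assumes the inputs overlap.  */
static struct bounds
bounds_intersect (struct bounds a, struct bounds b)
{
  return (struct bounds) { zmax (a.lo, b.lo), zmin (a.hi, b.hi) };
}
```

With these helpers, the union of [-0, 5] and [+0, 5] keeps -0 as its lower bound, while their intersection keeps +0, matching the first and third rules.
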
>
> I have removed the idea that a known NAN is a "range", so a NAN is no
> longer in the endpoints itself.  Requesting the bound of a known NAN
> is a hard fail.  For that matter, we don't store the actual NAN in the
> range.  The only information we have are the set of boolean flags.
> This way we make sure nothing seeps into the frange.  This also means
> it's explicit that we don't track anything but the sign in NANs.  We
> can revisit this if we desire to track signalling or whatever
> concoction y'all can imagine.
>
> All in all, I'm quite happy with this.  It does look better, and we
> handle all the corner cases we couldn't before.  Thanks for the
> suggestion.
>
> Regstrapped with mpfr tests on x86-64 and ppc64le Linux.  Selftests
> were also run with -ffinite-math-only on x86-64.
>
> At Jakub's suggestion, I built lapack with associated tests.  They
> pass on x86-64 and ppc64le Linux with no regressions from mainline.
> As a sanity check, I also ran them for -ffinite-math-only on x86 which
> (as expected) returned:
>
> NaN arithmetic did not perform per the ieee spec
>
> Otherwise, all tests pass for -ffinite-math-only.
>
> How does this look?

Overall it looks good.

Reading ::intersect and ::union I find it less clear to spread out the _nan
cases into separate functions.

Can you add a comment to frange that its representation is
a single value-range specified by m_type, m_min, m_max
unioned with the set of { -NaN, +NaN }?  Also, the
::undefined_p vs. m_type == VR_UNDEFINED checks are a bit
confusing to the occasional reader; can we instead use
::nan_p to complement ::undefined_p?
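
To make the requested comment concrete, here is a rough sketch in plain C of "one value range unioned with a subset of { -NaN, +NaN }". It is not the frange class: `struct frange_sketch`, `undefined_p`, `known_nan_p` and `known_signbit` are hypothetical names, a `has_range` flag stands in for m_type, and min <= max is assumed whenever the range part is present.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model: an optional [min, max] range plus two NaN flags.  */
struct frange_sketch
{
  bool has_range;	/* False when the [min, max] part is empty.  */
  double min, max;	/* Only meaningful when has_range is true.  */
  bool pos_nan;		/* +NaN is a possible value.  */
  bool neg_nan;		/* -NaN is a possible value.  */
};

/* The true empty set: no range and no NaNs.  */
static bool
undefined_p (const struct frange_sketch *r)
{
  return !r->has_range && !r->pos_nan && !r->neg_nan;
}

/* Only NaNs are possible.  */
static bool
known_nan_p (const struct frange_sketch *r)
{
  return !r->has_range && (r->pos_nan || r->neg_nan);
}

/* Return true and set *SIGN when every possible value, NaNs included,
   has the same sign bit.  */
static bool
known_signbit (const struct frange_sketch *r, bool *sign)
{
  bool can_be_neg = r->neg_nan;
  bool can_be_pos = r->pos_nan;
  if (r->has_range)
    {
      assert (!(r->max < r->min));
      if (signbit (r->min))
	can_be_neg = true;
      if (!signbit (r->max))
	can_be_pos = true;
    }
  if (can_be_neg == can_be_pos)
    return false;	/* Mixed signs, or the empty set.  */
  *sign = can_be_neg;
  return true;
}
```

Under this model, [-5, -1] +NAN has no known sign bit because the range and the NaN disagree, whereas [-5, -0] is known negative.
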

Brain dump: maybe having a NaN-less frange with m_type, m_min, m_max
and then frange_with_nan having a frange member plus the nan bits
would make a better distinction?  Maybe we can use m_type == VR_RANGE
when the actual range is empty but we have NaNs somehow?  That we
need m_type to represent an empty range and VR_VARYING for the full
range is somehow duplicate - ]0,0[ would be an empty range as well,
but then we'd need inclusive/exclusive ranges.  NULL m_min/max might
be another (bad) representation.  Having m_type makes for efficient
checking as well, so that's a pro.  Maybe have m_type == VR_NAN for
the case of empty range but NaNs, leaving VR_UNDEFINED to the
true empty set?

Anyway, I think the patch is OK as-is with the NaN printing adjusted
and maybe avoiding the bare m_type == VR_UNDEFINED checks
(in all but the abstraction).

Thanks,
Richard.

>
> gcc/ChangeLog:
>
> * range-op-float.cc (frange_add_zeros): Replace set_signbit with
> union of zero.
> * value-query.cc (range_query::get_tree_range): Remove set_signbit
> use.
> * value-range-pretty-print.cc (vrange_printer::print_frange_prop):
> Remove.
> (vrange_printer::print

Ping: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant

2022-09-15 Thread Jiufu Guo via Gcc-patches



Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff(Jiufu)


Jiufu Guo  writes:

> Hi,
>
> When checking eq/ne against a constant that has only 16 significant
> bits, the comparison can instead be done on the rotated data.  This
> avoids building the constant in a register.
>
> As the example in PR103743:
> For "in == 0x8000LL", this patch generates:
> rotldi %r3,%r3,16
> cmpldi %cr0,%r3,32768
> instead of:
> li %r9,-1
> rldicr %r9,%r9,0,0
> cmpd %cr0,%r3,%r9
>
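
The equivalence behind the rotated compare can be checked in plain C. This is only a sketch: it assumes the constant in the example is 0x8000000000000000, which is what the `li %r9,-1; rldicr %r9,%r9,0,0` sequence builds, and `rotl64`, `eq_direct` and `eq_rotated` are hypothetical helper names rather than anything from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits (N in 0..63), as rotldi does.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  assert (n < 64);
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Direct form: compare against the full 64-bit constant, which has to
   be materialized in a register first.  */
static bool
eq_direct (uint64_t in)
{
  return in == UINT64_C (0x8000000000000000);
}

/* Rotated form: rotate the input by 16 and compare against a 16-bit
   immediate (32768), mirroring rotldi + cmpldi.  */
static bool
eq_rotated (uint64_t in)
{
  return rotl64 (in, 16) == UINT64_C (0x8000);
}
```

Because rotation is a bijection on 64-bit values, `in == C` holds exactly when `rotl64 (in, n) == rotl64 (C, n)`, so the two predicates agree for every input.
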
> Compared with the previous patches:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>
> This patch relaxes the condition on can_create_pseudo_p and adds
> clobbers so that the splitter can run both before and after RA.
>
> This is an updated patch based on the previous patch and comments:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>
> This patch passes bootstrap and regtest on ppc64 and ppc64le.
> Is it ok for trunk?  Thanks for comments!
>
> BR,
> Jeff(Jiufu)
>
>
>   PR target/103743
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
>   (compare_rotate_immediate_p): New.
>   * config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>   definition.
>   (compare_rotate_immediate_p): New definition.
>   * config/rs6000/rs6000.md (EQNE): New code_attr.
>   (*rotate_on_cmpdi): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr103743.c: New test.
>   * gcc.target/powerpc/pr103743_1.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000-protos.h |  2 +
>  gcc/config/rs6000/rs6000.cc   | 41 
>  gcc/config/rs6000/rs6000.md   | 62 +++-
>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++
>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++
>  5 files changed, 251 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index b3c16e7448d..78847e6b3db 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>  extern int num_insns_constant (rtx, machine_mode);
>  extern int small_data_operand (rtx, machine_mode);
>  extern bool mem_operand_gpr (rtx, machine_mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..a548db42660 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>  return reverse_condition (code);
>  }
>  
> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
> +   with at least CLZ bits zero.
> +
> +   Return the number by which C can be rotated from the immediate.
> +   Return -1 if C can not be rotated as from.  */
> +
> +int
> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
> +{
> +  /* case a. 0..0xxx: already at least clz zeros.  */
> +  int lz = clz_hwi (c);
> +  if (lz >= clz)
> +return 0;
> +
> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
> +  int tz = ctz_hwi (c);
> +  if (lz + tz >= clz)
> +return tz;
> +
> +  /* case c. xx10.0xx: rotate 'clz + 1' bits firstly, then check case b.
> +^bit -> Vbit
> +  00...00xxx100, 'clz + 1' >= bits of .  */
> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
> +  tz = ctz_hwi (rc);
> +  if (clz_hwi (rc) + tz >= clz)
> +return tz + rot_bits;
> +
> +  return -1;
> +}
> +
> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
> +
> +bool
> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
> +{
> +  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
> +  return rotate_from_leading_zeros_const (~c, 49) > 0
> +  || rotate_from_leading_zeros_const (c, 48) > 0;
> +}
> +
>  /* Generate a compare for CODE.  Return a brand-new rtx that
> represents the result of the compare.  */
>  
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index e9e5cd1e54d..cad3cfc98cd 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
>"xscvd

[PATCH] RISC-V: Support poly move manipulation and selftests.

2022-09-15 Thread juzhe . zhong

From: zhongjuzhe 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Change "static void" to "void".
* config.gcc: Add riscv-selftests.o
* config/riscv/predicates.md: Allow const_poly_int.
* config/riscv/riscv-protos.h (riscv_reinit): New function.
(riscv_parse_arch_string): Change to an extern function.
(riscv_run_selftests): New function.
* config/riscv/riscv.cc (riscv_cannot_force_const_mem): Don't allow 
poly into const pool.
(riscv_report_v_required): New function.
(riscv_expand_op): New function.
(riscv_expand_mult_with_const_int): New function.
(riscv_legitimize_poly_move): Ditto.
(riscv_legitimize_move): New function.
(riscv_hard_regno_mode_ok): Add VL/VTYPE register allocation and fix 
vector RA.
(riscv_convert_vector_bits): Fix riscv_vector_chunks configuration for 
-march without 'v'.
(riscv_reinit): New function.
(TARGET_RUN_TARGET_SELFTESTS): New target hook support.
* config/riscv/t-riscv: Add riscv-selftests.o.
* config/riscv/riscv-selftests.cc: New file.

gcc/testsuite/ChangeLog:

* selftests/riscv/empty-func.rtl: New test.

---
 gcc/common/config/riscv/riscv-common.cc  |   2 +-
 gcc/config.gcc   |   2 +-
 gcc/config/riscv/predicates.md   |   3 +
 gcc/config/riscv/riscv-protos.h  |   9 +
 gcc/config/riscv/riscv-selftests.cc  | 239 +++
 gcc/config/riscv/riscv.cc| 298 ++-
 gcc/config/riscv/t-riscv |   4 +
 gcc/testsuite/selftests/riscv/empty-func.rtl |   8 +
 8 files changed, 558 insertions(+), 7 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-selftests.cc
 create mode 100644 gcc/testsuite/selftests/riscv/empty-func.rtl

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 77219162eeb..c39ed2e2696 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1224,7 +1224,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
 /* Parse a RISC-V ISA string into an option mask.  Must clear or set all arch
dependent mask bits, in case more than one -march string is passed.  */
 
-static void
+void
 riscv_parse_arch_string (const char *isa,
 struct gcc_options *opts,
 location_t loc)
diff --git a/gcc/config.gcc b/gcc/config.gcc
index f4e757bd853..27ffce3fb50 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -515,7 +515,7 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o"
d_target_objs="riscv-d.o"
;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 862e72b0983..5e149b3a95f 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -146,6 +146,9 @@
 case CONST_INT:
   return !splittable_const_int_operand (op, mode);
 
+case CONST_POLY_INT:
+  return known_eq (rtx_to_poly_int64 (op), BYTES_PER_RISCV_VECTOR);
+
 case CONST:
 case SYMBOL_REF:
 case LABEL_REF:
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 649c5c977e1..f9a2baa46c7 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -74,6 +74,7 @@ extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
 extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
+extern void riscv_reinit (void);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
@@ -86,6 +87,7 @@ extern void riscv_init_builtins (void);
 
 /* Routines implemented in riscv-common.cc.  */
 extern std::string riscv_arch_str (bool version_p = true);
+extern void riscv_parse_arch_string (const char *, struct gcc_options *, location_t);
 
 extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
@@ -105,4 +107,11 @@ struct riscv_cpu_info {
 
 extern const riscv_cpu_info *riscv_find_cpu (const char *);
 
+/* Routines implemented in riscv-selftests.cc.  */
+#if CHECKING_P
+namespace selftest {
+extern void riscv_run_selftests (void);
+} // namespace selftest
+#endif
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-selftests.cc b/gcc/config/riscv/riscv-selftests.cc
new file mode 100644
index 000..167cd47c880
--- /dev/null
+++ b/gcc/config/riscv/riscv-selftests.cc
@@ -0,0 +1,239 @@
+/* This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your o

[PATCH]rs6000: Load high and low part of 64bit constant independently

2022-09-15 Thread Jiufu Guo via Gcc-patches

Hi,

For a complicated 64-bit constant, below is one instruction sequence to
build it:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16

while we can also use the sequence below to build it:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
This sequence uses two registers to build the high and low parts first,
and then merges them.
Because the two halves can be built in parallel, this sequence should be
faster.  (Of course, it uses one more register, which may add register
pressure.)
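
The two sequences can be emulated in plain C to check that they build the same value. This is a sketch for illustration only: `lis`, `ori`, `rldimi_32_0`, `build_serial` and `build_parallel` are hypothetical helpers that model the instruction semantics as I understand them (`lis` sign-extends its shifted immediate on 64-bit targets; `rldimi ra,rs,32,0` places the low 32 bits of rs into the high 32 bits of ra and keeps the low 32 bits of ra).

```c
#include <assert.h>
#include <stdint.h>

/* lis: load IMM << 16, sign-extended to 64 bits.  */
static uint64_t
lis (uint16_t imm)
{
  uint64_t v = (uint64_t) imm << 16;
  if (imm & 0x8000)
    v |= UINT64_C (0xffffffff00000000);
  return v;
}

/* ori: OR in a zero-extended 16-bit immediate.  */
static uint64_t
ori (uint64_t r, uint16_t imm)
{
  return r | imm;
}

/* rldimi ra,rs,32,0: high 32 bits come from the low half of RS,
   low 32 bits are kept from RA.  */
static uint64_t
rldimi_32_0 (uint64_t ra, uint64_t rs)
{
  return (rs << 32) | (ra & UINT64_C (0xffffffff));
}

/* One register: lis; ori; sldi 32; oris; ori.  Each step depends on
   the previous one.  */
static uint64_t
build_serial (uint64_t c)
{
  uint16_t ud1 = c, ud2 = c >> 16, ud3 = c >> 32, ud4 = c >> 48;
  uint64_t r = ori (lis (ud4), ud3);
  r <<= 32;			/* sldi r,r,32 */
  r |= (uint64_t) ud2 << 16;	/* oris r,r,ud2 */
  return ori (r, ud1);
}

/* Two registers: the halves are independent until the final merge.  */
static uint64_t
build_parallel (uint64_t c)
{
  uint16_t ud1 = c, ud2 = c >> 16, ud3 = c >> 32, ud4 = c >> 48;
  uint64_t lo = ori (lis (ud2), ud1);
  uint64_t hi = ori (lis (ud4), ud3);
  return rldimi_32_0 (lo, hi);
}
```

Both builders take five steps, but in `build_parallel` the two lis/ori pairs have no dependency on each other, which is where the expected speedup would come from. The sign-extension garbage that `lis` leaves in the upper half of `lo` is overwritten by the final insert.
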

Bootstrap and regtest pass on ppc64le.
Is this ok for trunk?


BR,
Jeff(Jiufu)


gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit
constant build.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/parall_5insn_const.c: New test.

---
 gcc/config/rs6000/rs6000.cc   | 45 +++
 .../gcc.target/powerpc/parall_5insn_const.c   | 27 +++
 2 files changed, 53 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a656cb32a47..759c6309677 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10180,26 +10180,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 }
   else
 {
-  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
-
-  emit_move_insn (copy_rtx (temp),
- GEN_INT (((ud4 << 16) ^ 0x8000) - 0x8000));
-  if (ud3 != 0)
-   emit_move_insn (copy_rtx (temp),
-   gen_rtx_IOR (DImode, copy_rtx (temp),
-GEN_INT (ud3)));
+  if (can_create_pseudo_p ())
+   {
+ /* lis A,U4; ori A,U3; lis B,U2; ori B,U1; rldimi A,B,32,0.  */
+ rtx H = gen_reg_rtx (DImode);
+ rtx L = gen_reg_rtx (DImode);
+ HOST_WIDE_INT num = (ud2 << 16) | ud1;
+ rs6000_emit_set_long_const (L, (num ^ 0x8000) - 0x8000);
+ num = (ud4 << 16) | ud3;
+ rs6000_emit_set_long_const (H, (num ^ 0x8000) - 0x8000);
+ emit_insn (gen_rotldi3_insert_3 (dest, H, GEN_INT (32), L,
+  GEN_INT (0x)));
+   }
+  else
+   {
+ /* lis A, U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1.  */
+ emit_move_insn (dest,
+ GEN_INT (((ud4 << 16) ^ 0x8000) - 0x8000));
+ if (ud3 != 0)
+   emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3)));
 
-  emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest,
- gen_rtx_ASHIFT (DImode, copy_rtx (temp),
- GEN_INT (32)));
-  if (ud2 != 0)
-   emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
-   gen_rtx_IOR (DImode, copy_rtx (temp),
-GEN_INT (ud2 << 16)));
-  if (ud1 != 0)
-   emit_move_insn (dest,
-   gen_rtx_IOR (DImode, copy_rtx (temp),
-GEN_INT (ud1)));
+ emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32)));
+ if (ud2 != 0)
+   emit_move_insn (dest,
+   gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16)));
+ if (ud1 != 0)
+   emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1)));
+   }
 }
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
new file mode 100644
index 000..ed8ccc73378
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7  -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mori\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
+
+void __attribute__ ((noinline)) foo (unsigned long long *a)
+{
+  /* 2lis+2ori+1rldimi for each constant.  */
+  *a++ = 0x800aabcdc167fa16ULL;
+  *a++ = 0x7543a876867f616ULL;
+}
+
+long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
+int
+main ()
+{
+  long long res[2];
+
+  foo (res);
+  if (__builtin_memcmp (res, A, sizeof (res)) != 0)
+__builtin_abort ();
+
+  return 0;
+}
-- 
2.17.1

[PATCH] RISC-V: Add RVV machine modes.

2022-09-15 Thread juzhe . zhong

From: zhongjuzhe 

gcc/ChangeLog:

* config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Define RVV mask 
modes.
(ADJUST_NUNITS): Adjust NUNITS using riscv_vector_chunks.
(ADJUST_ALIGNMENT): Adjust alignment.
(ADJUST_BYTESIZE): Adjust bytesize.
(RVV_MODES): New macro.
(VECTOR_MODE_WITH_PREFIX): Define RVV vector modes.
(VECTOR_MODES_WITH_PREFIX): Define RVV vector modes.
(NUM_POLY_INT_COEFFS): Remove redundant space.

---
 gcc/config/riscv/riscv-modes.def | 143 ++-
 1 file changed, 142 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 6e30c1a5595..649dc5877d2 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -22,6 +22,147 @@ along with GCC; see the file COPYING3.  If not see
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* Vector modes.  */
+
+/* Encode the ratio of SEW/LMUL into the mask types. There are the following
+ * mask types.  */
+
+/* | Mode | MIN_VLEN = 32 | MIN_VLEN = 64 |
+   |  | SEW/LMUL  | SEW/LMUL  |
+   | VNx1BI   | 32| 64|
+   | VNx2BI   | 16| 32|
+   | VNx4BI   | 8 | 16|
+   | VNx8BI   | 4 | 8 |
+   | VNx16BI  | 2 | 4 |
+   | VNx32BI  | 1 | 2 |
+   | VNx64BI  | N/A   | 1 |  */
+
+VECTOR_BOOL_MODE (VNx1BI, 1, BI, 8);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 8);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 8);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 8);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 8);
+VECTOR_BOOL_MODE (VNx32BI, 32, BI, 8);
+VECTOR_BOOL_MODE (VNx64BI, 64, BI, 8);
+
+ADJUST_NUNITS (VNx1BI, riscv_vector_chunks * 1);
+ADJUST_NUNITS (VNx2BI, riscv_vector_chunks * 2);
+ADJUST_NUNITS (VNx4BI, riscv_vector_chunks * 4);
+ADJUST_NUNITS (VNx8BI, riscv_vector_chunks * 8);
+ADJUST_NUNITS (VNx16BI, riscv_vector_chunks * 16);
+ADJUST_NUNITS (VNx32BI, riscv_vector_chunks * 32);
+ADJUST_NUNITS (VNx64BI, riscv_vector_chunks * 64);
+
+ADJUST_ALIGNMENT (VNx1BI, 1);
+ADJUST_ALIGNMENT (VNx2BI, 1);
+ADJUST_ALIGNMENT (VNx4BI, 1);
+ADJUST_ALIGNMENT (VNx8BI, 1);
+ADJUST_ALIGNMENT (VNx16BI, 1);
+ADJUST_ALIGNMENT (VNx32BI, 1);
+ADJUST_ALIGNMENT (VNx64BI, 1);
+
+ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx64BI, riscv_vector_chunks *riscv_bytes_per_vector_chunk);
+
+/*
+   | Mode| MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
+   | | LMUL|  SEW/LMUL   | LMUL| SEW/LMUL|
+   | VNx1QI  | MF4 |  32 | MF8 | 64  |
+   | VNx2QI  | MF2 |  16 | MF4 | 32  |
+   | VNx4QI  | M1  |  8  | MF2 | 16  |
+   | VNx8QI  | M2  |  4  | M1  | 8   |
+   | VNx16QI | M4  |  2  | M2  | 4   |
+   | VNx32QI | M8  |  1  | M4  | 2   |
+   | VNx64QI | N/A |  N/A| M8  | 1   |
+   | VNx1(HI|HF) | MF2 |  32 | MF4 | 64  |
+   | VNx2(HI|HF) | M1  |  16 | MF2 | 32  |
+   | VNx4(HI|HF) | M2  |  8  | M1  | 16  |
+   | VNx8(HI|HF) | M4  |  4  | M2  | 8   |
+   | VNx16(HI|HF)| M8  |  2  | M4  | 4   |
+   | VNx32(HI|HF)| N/A |  N/A| M8  | 2   |
+   | VNx1(SI|SF) | M1  |  32 | MF2 | 64  |
+   | VNx2(SI|SF) | M2  |  16 | M1  | 32  |
+   | VNx4(SI|SF) | M4  |  8  | M2  | 16  |
+   | VNx8(SI|SF) | M8  |  4  | M4  | 8   |
+   | VNx16(SI|SF)| N/A |  N/A| M8  | 4   |
+   | VNx1(DI|DF) | N/A |  N/A| M1  | 64  |
+   | VNx2(DI|DF) | N/A |  N/A| M2  | 32  |
+   | VNx4(DI|DF) | N/A |  N/A| M4  | 16  |
+   | VNx8(DI|DF) | N/A |  N/A| M8  | 8   |
+*/
+
+/* Define RVV modes whose sizes are multiples of 64-bit chunks.  */
+#define RVV_MODES(NVECS, VB, VH, VS, VD) \
+  VECTOR_MODES_WITH_PREFIX (VNx, INT, 8 * NVECS, 0);

Re: [PATCH] Move void_list_node init to common code

2022-09-15 Thread Richard Sandiford via Gcc-patches

Richard Biener via Gcc-patches  writes:
> All frontends replicate this, so move it.
>
> Bootstrap and regtest running for all languages on 
> x86_64-unknown-linux-gnu.
>
> OK if that succeeds?

LGTM FWIW.

Thanks,
Richard

> Thanks,
> Richard.
>
> gcc/
>   * tree.cc (build_common_tree_nodes): Initialize void_list_node
>   here.
>
> gcc/ada/
>   * gcc-interface/trans.cc (gigi): Do not initialize void_list_node.
>
> gcc/c-family/
>   * c-common.h (build_void_list_node): Remove.
>   * c-common.cc (c_common_nodes_and_builtins): Do not initialize
>   void_list_node.
>
> gcc/c/
>   * c-decl.cc (build_void_list_node): Remove.
>
> gcc/cp/
>   * decl.cc (cxx_init_decl_processing): Inline last
>   build_void_list_node call.
>   (build_void_list_node): Remove.
>
> gcc/d/
>   * d-builtins.cc (d_build_c_type_nodes): Do not initialize
>   void_list_node.
>
> gcc/fortran/
>   * f95-lang.c (gfc_init_decl_processing): Do not initialize
>   void_list_node.
>
> gcc/go/
>   * go-lang.cc (go_langhook_init): Do not initialize
>   void_list_node.
>
> gcc/jit/
>   * dummy-frontend.cc (jit_langhook_init): Do not initialize
>   void_list_node.
>
> gcc/lto/
>   * lto-lang.cc (lto_build_c_type_nodes): Do not initialize
>   void_list_node.
> ---
>  gcc/ada/gcc-interface/trans.cc |  1 -
>  gcc/c-family/c-common.cc   |  2 --
>  gcc/c-family/c-common.h|  1 -
>  gcc/c/c-decl.cc|  8 
>  gcc/cp/decl.cc | 10 +-
>  gcc/d/d-builtins.cc|  1 -
>  gcc/fortran/f95-lang.cc|  2 --
>  gcc/go/go-lang.cc  |  3 ---
>  gcc/jit/dummy-frontend.cc  |  3 ---
>  gcc/lto/lto-lang.cc|  1 -
>  gcc/tree.cc|  2 ++
>  11 files changed, 3 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
> index f2e0cb2299b..2d93947cb26 100644
> --- a/gcc/ada/gcc-interface/trans.cc
> +++ b/gcc/ada/gcc-interface/trans.cc
> @@ -413,7 +413,6 @@ gigi (Node_Id gnat_root,
>save_gnu_tree (gnat_literal, t, false);
>  
>/* Declare the building blocks of function nodes.  */
> -  void_list_node = build_tree_list (NULL_TREE, void_type_node);
>void_ftype = build_function_type_list (void_type_node, NULL_TREE);
>ptr_void_ftype = build_pointer_type (void_ftype);
>  
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index 0a5b7e120c9..c0f15f4cab1 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -4505,8 +4505,6 @@ c_common_nodes_and_builtins (void)
>  TYPE_NAME (void_type_node) = void_name;
>}
>  
> -  void_list_node = build_void_list_node ();
> -
>/* Make a type to be the domain of a few array types
>   whose domains don't really matter.
>   200 is small enough that it always fits in size_t
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index ce971a29b5d..2f592f5cd58 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -853,7 +853,6 @@ extern tree identifier_global_tag (tree);
>  extern bool names_builtin_p (const char *);
>  extern tree c_linkage_bindings (tree);
>  extern void record_builtin_type (enum rid, const char *, tree);
> -extern tree build_void_list_node (void);
>  extern void start_fname_decls (void);
>  extern void finish_fname_decls (void);
>  extern const char *fname_as_string (int);
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index 34f8feda897..b09c6393b91 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -10676,14 +10676,6 @@ record_builtin_type (enum rid rid_index, const char *name, tree type)
>  debug_hooks->type_decl (decl, false);
>  }
>  
> -/* Build the void_list_node (void_type_node having been created).  */
> -tree
> -build_void_list_node (void)
> -{
> -  tree t = build_tree_list (NULL_TREE, void_type_node);
> -  return t;
> -}
> -
>  /* Return a c_parm structure with the given SPECS, ATTRS and DECLARATOR.  */
>  
>  struct c_parm *
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 006e9affcba..070f673c3a2 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -4623,7 +4623,7 @@ cxx_init_decl_processing (void)
>record_unknown_type (init_list_type_node, "init list");
>  
>/* Used when parsing to distinguish parameter-lists () and (void).  */
> -  explicit_void_list_node = build_void_list_node ();
> +  explicit_void_list_node = build_tree_list (NULL_TREE, void_type_node);
>  
>{
>  /* Make sure we get a unique function type, so we can give
> @@ -18450,14 +18450,6 @@ cp_tree_node_structure (union lang_tree_node * t)
>  }
>  }
>  
> -/* Build the void_list_node (void_type_node having been created).  */
> -tree
> -build_void_list_node (void)
> -{
> -  tree t = build_tree_list (NULL_TREE, void_type_node);
> -  return t;
> -}
> -
>  bool
>  cp_missing_noreturn_ok_p (tree decl)
>  {
> diff --git a/gcc/d/d-builtins.cc b/gcc/d/d-builtins.cc
> i

[PATCH] Fix c-c++-common/goacc/mdc-2.c and g++.dg/goacc/mdc.C tests

2022-09-15 Thread Julian Brown

These testsuite hunks got left attached to the wrong patch in the series
I just posted. I will apply as obvious.

2022-09-15  Julian Brown  

gcc/testsuite/
* c-c++-common/goacc/mdc-2.c: Update expected errors.
* g++.dg/goacc/mdc.C: Likewise.
---
 gcc/testsuite/c-c++-common/goacc/mdc-2.c | 2 ++
 gcc/testsuite/g++.dg/goacc/mdc.C | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/mdc-2.c b/gcc/testsuite/c-c++-common/goacc/mdc-2.c
index df3ce543d30..246625c76a2 100644
--- a/gcc/testsuite/c-c++-common/goacc/mdc-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/mdc-2.c
@@ -37,7 +37,9 @@ t1 ()
 #pragma acc exit data detach(z[:]) /* { dg-error "expected single pointer in .detach. clause" } */
 /* { dg-error "has no data movement clause" "" { target *-*-* } .-1 } */
 #pragma acc enter data attach(z[3]) /* { dg-error "expected pointer in .attach. clause" } */
+/* { dg-error "has no data movement clause" "" { target *-*-* } .-1 } */
 #pragma acc exit data detach(z[3]) /* { dg-error "expected pointer in .detach. clause" } */
+/* { dg-error "has no data movement clause" "" { target *-*-* } .-1 } */
 
 #pragma acc enter data attach(s.e)
 #pragma acc exit data detach(s.e) attach(z) /* { dg-error ".attach. is not valid for" } */
diff --git a/gcc/testsuite/g++.dg/goacc/mdc.C b/gcc/testsuite/g++.dg/goacc/mdc.C
index e8ba1cceba2..9d460f286b4 100644
--- a/gcc/testsuite/g++.dg/goacc/mdc.C
+++ b/gcc/testsuite/g++.dg/goacc/mdc.C
@@ -43,7 +43,9 @@ t1 ()
 #pragma acc exit data detach(rz[:]) /* { dg-error "expected single pointer in .detach. clause" } */
 /* { dg-error "has no data movement clause" "" { target *-*-* } .-1 } */
 #pragma acc enter data attach(rz[3]) /* { dg-error "expected pointer in .attach. clause" } */
+/* { dg-error "has no data movement clause" "" { target *-*-* } .-1 } */
 #pragma acc exit data detach(rz[3]) /* { dg-error "expected pointer in .detach. clause" } */
+/* { dg-error "has no data movement clause" "" { target *-*-* } .-1 } */
 
 #pragma acc enter data attach(rs.e)
 #pragma acc exit data detach(rs.e) attach(rz) /* { dg-error ".attach. is not valid for" } */
-- 
2.29.2

Basic REG_EQUIV comprehension question

2022-09-15 Thread Robin Dapp via Gcc-patches

Hi,

I have been working on making better use of s390's vzero instruction.
Currently we prefer to zero a vector register once and copy it into other
registers via vlr instead of emitting multiple vzeros.

At IRA/reload point we e.g. have

(insn 8 5 19 2 (set (reg/v:V2DI 64 [ zero ])
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
])) "vzero-vs-vlr.c":18:17 412 {movv2di}
 (expr_list:REG_EQUIV (const_vector:V2DI [
(const_int 0 [0]) repeated x2 ])
(nil)))

so reg 64 is equivalent to a const_vector 0.  I expected ira/reload to
match our move pattern (abbreviated for readability)

(define_insn "mov"
  [(set (match_operand:V_64 0 "nonimmediate_operand" "v")
(match_operand:V_64 1 "general_operand"  "j00"))]
  "TARGET_ZARCH"
  "@
   vzero\t%v0 [..]

where the j00 constraint is simply

(define_constraint "j00"
  "Zero scalar or vector constant"
  (match_test "op == CONST0_RTX (GET_MODE (op))"))

Apparently this is not what's happening.  The vzero alternative is
rejected since the register is not actually a constant but only
equivalent to one.

It is possible to work around that by changing pattern decisions earlier
but I'd still like to understand what is supposed to happen here.
Should another pass perform the equiv replacement, or is this not how
any of this works?  I was also thinking in the direction of
register_move_costs and rtx_costs, but at least initial attempts did not
help.

Thanks for clarifying
 Robin

[PATCH] RISC-V: Add RVV machine modes.

2022-09-15 Thread juzhe . zhong

From: zhongjuzhe 

gcc/ChangeLog:

* config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add RVV mask modes.
(ADJUST_NUNITS): Adjust nunits using riscv_vector_chunks.
(ADJUST_ALIGNMENT): Adjust alignment.
(ADJUST_BYTESIZE): Adjust bytesize using riscv_vector_chunks.
(RVV_MODES): New macro.
(VECTOR_MODE_WITH_PREFIX): Add RVV vector modes.
(VECTOR_MODES_WITH_PREFIX): Add RVV vector modes.

---
 gcc/config/riscv/riscv-modes.def | 141 +++
 1 file changed, 141 insertions(+)

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 6e30c1a5595..95f69e87e23 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -22,6 +22,147 @@ along with GCC; see the file COPYING3.  If not see
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* Vector modes.  */
+
+/* Encode the ratio of SEW/LMUL into the mask types. There are the following
+ * mask types.  */
+
+/* | Mode     | MIN_VLEN = 32 | MIN_VLEN = 64 |
+   |          | SEW/LMUL      | SEW/LMUL      |
+   | VNx1BI   | 32            | 64            |
+   | VNx2BI   | 16            | 32            |
+   | VNx4BI   | 8             | 16            |
+   | VNx8BI   | 4             | 8             |
+   | VNx16BI  | 2             | 4             |
+   | VNx32BI  | 1             | 2             |
+   | VNx64BI  | N/A           | 1             |  */
+
+VECTOR_BOOL_MODE (VNx1BI, 1, BI, 8);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 8);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 8);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 8);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 8);
+VECTOR_BOOL_MODE (VNx32BI, 32, BI, 8);
+VECTOR_BOOL_MODE (VNx64BI, 64, BI, 8);
+
+ADJUST_NUNITS (VNx1BI, riscv_vector_chunks * 1);
+ADJUST_NUNITS (VNx2BI, riscv_vector_chunks * 2);
+ADJUST_NUNITS (VNx4BI, riscv_vector_chunks * 4);
+ADJUST_NUNITS (VNx8BI, riscv_vector_chunks * 8);
+ADJUST_NUNITS (VNx16BI, riscv_vector_chunks * 16);
+ADJUST_NUNITS (VNx32BI, riscv_vector_chunks * 32);
+ADJUST_NUNITS (VNx64BI, riscv_vector_chunks * 64);
+
+ADJUST_ALIGNMENT (VNx1BI, 1);
+ADJUST_ALIGNMENT (VNx2BI, 1);
+ADJUST_ALIGNMENT (VNx4BI, 1);
+ADJUST_ALIGNMENT (VNx8BI, 1);
+ADJUST_ALIGNMENT (VNx16BI, 1);
+ADJUST_ALIGNMENT (VNx32BI, 1);
+ADJUST_ALIGNMENT (VNx64BI, 1);
+
+ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+ADJUST_BYTESIZE (VNx64BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
+
+/*
+   | Mode         | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
+   |              | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
+   | VNx1QI       | MF4         | 32          | MF8         | 64          |
+   | VNx2QI       | MF2         | 16          | MF4         | 32          |
+   | VNx4QI       | M1          | 8           | MF2         | 16          |
+   | VNx8QI       | M2          | 4           | M1          | 8           |
+   | VNx16QI      | M4          | 2           | M2          | 4           |
+   | VNx32QI      | M8          | 1           | M4          | 2           |
+   | VNx64QI      | N/A         | N/A         | M8          | 1           |
+   | VNx1(HI|HF)  | MF2         | 32          | MF4         | 64          |
+   | VNx2(HI|HF)  | M1          | 16          | MF2         | 32          |
+   | VNx4(HI|HF)  | M2          | 8           | M1          | 16          |
+   | VNx8(HI|HF)  | M4          | 4           | M2          | 8           |
+   | VNx16(HI|HF) | M8          | 2           | M4          | 4           |
+   | VNx32(HI|HF) | N/A         | N/A         | M8          | 2           |
+   | VNx1(SI|SF)  | M1          | 32          | MF2         | 64          |
+   | VNx2(SI|SF)  | M2          | 16          | M1          | 32          |
+   | VNx4(SI|SF)  | M4          | 8           | M2          | 16          |
+   | VNx8(SI|SF)  | M8          | 4           | M4          | 8           |
+   | VNx16(SI|SF) | N/A         | N/A         | M8          | 4           |
+   | VNx1(DI|DF)  | N/A         | N/A         | M1          | 64          |
+   | VNx2(DI|DF)  | N/A         | N/A         | M2          | 32          |
+   | VNx4(DI|DF)  | N/A         | N/A         | M4          | 16          |
+   | VNx8(DI|DF)  | N/A         | N/A         | M8          | 8           |
+*/
+
+/* Define RVV modes whose sizes are multiples of 64-bit chunks.  */
+#define RVV_MODES(NVECS, VB, VH, VS, VD)                                       \
+  VECTOR_MODES_WITH_PREFIX (VNx, INT, 8 * NVECS, 0);                           \
+  VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 8

Re: Basic REG_EQUIV comprehension question

2022-09-15 Thread Robin Dapp via Gcc-patches

Small addition to clarify:  (insn 8) from the example is of course
matched to a vzero.  The "problem" begins when (reg 64) is later moved
into another register and the (const_vector) has been optimized to a
single definition e.g. by CSE, i.e. we have several

(insn yy (set (reg:V2DI xx) (reg:V2DI 64 [zero])) where (reg 64) is
equivalent to a (const_vector 0).

Regards
 Robin

Re: Basic REG_EQUIV comprehension question

2022-09-15 Thread Richard Sandiford via Gcc-patches

Robin Dapp via Gcc-patches  writes:
> Hi,
>
> I have been working on making better use of s390's vzero instruction.
> Currently we rather zero a vector register once and load it into other
> registers via vlr instead of emitting multiple vzeros.
>
> At IRA/reload point we e.g. have
>
> (insn 8 5 19 2 (set (reg/v:V2DI 64 [ zero ])
> (const_vector:V2DI [
> (const_int 0 [0]) repeated x2
> ])) "vzero-vs-vlr.c":18:17 412 {movv2di}
>  (expr_list:REG_EQUIV (const_vector:V2DI [
> (const_int 0 [0]) repeated x2 ])
> (nil)))
>
> so reg 64 is equivalent to a const_vector 0.  I expected ira/reload to
> match our move pattern (abbreviated for readability)
>
> (define_insn "mov"
>   [(set (match_operand:V_64 0 "nonimmediate_operand" "v")
> (match_operand:V_64 1 "general_operand"  "j00"))]
>   "TARGET_ZARCH"
>   "@
>vzero\t%v0 [..]
>
> where the j00 constraint is simply
>
> (define_constraint "j00"
>   "Zero scalar or vector constant"
>   (match_test "op == CONST0_RTX (GET_MODE (op))"))
>
> Apparently this is not what's happening.  The vzero alternative is
> rejected since the register is not actually a constant but only
> equivalent to one.
>
> It is possible to work around that by changing pattern decisions earlier
> but I'd still like to understand what is supposed to happen here.
> Should another pass perform the equiv replacement or is this not how all
> of this works entirely?  I was also thinking in the direction of
> register_move_costs and rtx_costs but at least initial attempts did not
> help.

Yeah, rtx_costs (or preferably insn_cost, if that works) seem like the
best way of addressing this.  If the target says that register moves are
cheaper than constant moves then it's a feature that CSE & co remove
duplicate constants.  The REG_EQUIV note is still useful in those cases
because the note tells IRA/LRA that if the source operand is spilled,
it would be possible to rematerialise the source value (rather than spill
the original source operand and reload it from the stack).

Thanks,
Richard

[PATCH] Fix c-c++-common/gomp/target-50.c test

2022-09-15 Thread Julian Brown

The expected scan dump output for this test will change after the
following patch is committed:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601558.html

But for now, this patch reverts to the old expected pattern so the test
passes. I will apply as obvious.

2022-09-15  Julian Brown  

gcc/testsuite/
* c-c++-common/gomp/target-50.c: Modify scan pattern.
---
 gcc/testsuite/c-c++-common/gomp/target-50.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/target-50.c b/gcc/testsuite/c-c++-common/gomp/target-50.c
index a30a25e0893..41f1d37845c 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-50.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-50.c
@@ -17,7 +17,7 @@ int main()
 
   #pragma omp target map(tofrom: tmp->arr[0:10]) map(to: tmp->arr)
   { }
-/* { dg-final { scan-tree-dump-times {map\(struct:\*tmp \[len: 1\]\) map\(alloc:tmp[._0-9]*->arr \[len: [0-9]+\]\) map\(tofrom:\*_[0-9]+ \[len: [0-9]+\]\) map\(attach:tmp[._0-9]*->arr \[bias: 0\]\)} 2 "gimple" { target { ! { nvptx*-*-* amdgcn*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times {map\(struct:\*tmp \[len: 1\]\) map\(to:tmp[._0-9]*->arr \[len: [0-9]+\]\) map\(tofrom:\*_[0-9]+ \[len: [0-9]+\]\) map\(attach:tmp[._0-9]*->arr \[bias: 0\]\)} 2 "gimple" { target { ! { nvptx*-*-* amdgcn*-*-* } } } } } */
 
   return 0;
 }
-- 
2.29.2

Re: Basic REG_EQUIV comprehension question

2022-09-15 Thread Robin Dapp via Gcc-patches

> Yeah, rtx_costs (or preferably insn_cost, if that works) seem like the
> best way of addressing this.  If the target says that register moves are
> cheaper than constant moves then it's a feature that CSE & co remove
> duplicate constants.  The REG_EQUIV note is still useful in those cases
> because the note tells IRA/LRA that if the source operand is spilled,
> it would be possible to rematerialise the source value (rather than spill
> the original source operand and reload it from the stack).

Thanks, that's reasonable.  So the mechanism I thought of (match
alternatives via REG_EQUIV) doesn't exist and we should generally make
sure not to end up in such situations.

Thanks
 Robin

[PATCH] tree-optimization/106922 - PRE and virtual operand translation

2022-09-15 Thread Richard Biener via Gcc-patches

PRE implicitly keeps virtual operands at the block's incoming version
but the explicit updating point during PHI translation fails to trigger
when there are no PHIs at all in a block.  Later lazy updating then
fails because of a too loose block check.  A similar issue plagues
reference invalidation when checking the ANTIC_OUT to ANTIC_IN
translation.  The following fixes both and makes the lazy updating
work.

The diagnostic testcase unfortunately requires boost so the
testcase is the one I reduced for a missed optimization in PRE.
The testcase fails with -m32 on x86_64 because we optimize too
much before PRE which causes PRE to not trigger so we fail to
eliminate a full redundancy.  I'm going to open a separate bug
for this.  Hopefully the !lp64 selector is good enough.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

PR tree-optimization/106922
* tree-ssa-pre.cc (translate_vuse_through_block): Only
keep the VUSE if its def dominates PHIBLOCK.
(prune_clobbered_mems): Rewrite logic so we check whether
a value dies in a block when the VUSE def doesn't dominate it.

* g++.dg/tree-ssa/pr106922.C: New testcase.
---
 gcc/testsuite/g++.dg/tree-ssa/pr106922.C | 91 
 gcc/tree-ssa-pre.cc  | 18 +++--
 2 files changed, 103 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr106922.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
new file mode 100644
index 000..faf379b0361
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
@@ -0,0 +1,91 @@
+// { dg-require-effective-target c++20 }
+// { dg-options "-O2 -fdump-tree-pre-details -fdump-tree-cddce3" }
+
+template  struct __new_allocator {
+  void deallocate(int *, int) { operator delete(0); }
+};
+template  using __allocator_base = __new_allocator<_Tp>;
+template  struct allocator : __allocator_base {
+  [[__gnu__::__always_inline__]] void deallocate(int *__p, int __n) {
+__allocator_base::deallocate(__p, __n);
+  }
+};
+template  struct allocator_traits;
+template  struct allocator_traits> {
+  using allocator_type = allocator<_Tp>;
+  using pointer = _Tp *;
+  using size_type = int;
+  template  using rebind_alloc = allocator<_Up>;
+  static void deallocate(allocator_type &__a, pointer __p, size_type __n) {
+__a.deallocate(__p, __n);
+  }
+};
+template  struct __alloc_traits : allocator_traits<_Alloc> {
+  typedef allocator_traits<_Alloc> _Base_type;
+  template  struct rebind {
+typedef _Base_type::template rebind_alloc<_Tp> other;
+  };
+};
+long _M_deallocate___n;
+struct _Vector_base {
+  typedef __alloc_traits>::rebind::other _Tp_alloc_type;
+  typedef __alloc_traits<_Tp_alloc_type>::pointer pointer;
+  struct _Vector_impl_data {
+pointer _M_start;
+  };
+  struct _Vector_impl : _Tp_alloc_type, _Vector_impl_data {};
+  ~_Vector_base() { _M_deallocate(_M_impl._M_start); }
+  _Vector_impl _M_impl;
+  void _M_deallocate(pointer __p) {
+if (__p)
+  __alloc_traits<_Tp_alloc_type>::deallocate(_M_impl, __p,
+ _M_deallocate___n);
+  }
+};
+struct vector : _Vector_base {};
+struct aligned_storage {
+  int dummy_;
+  int *ptr_ref0;
+  vector &ref() {
+vector *__trans_tmp_2;
+void *__trans_tmp_1 = &dummy_;
+union {
+  void *ap_pvoid;
+  vector *as_ptype;
+} caster{__trans_tmp_1};
+__trans_tmp_2 = caster.as_ptype;
+return *__trans_tmp_2;
+  }
+};
+struct optional_base {
+  optional_base operator=(optional_base &) {
+bool __trans_tmp_3 = m_initialized;
+if (__trans_tmp_3)
+  m_initialized = false;
+return *this;
+  }
+  ~optional_base() {
+if (m_initialized)
+  m_storage.ref().~vector();
+  }
+  bool m_initialized;
+  aligned_storage m_storage;
+};
+struct optional : optional_base {
+  optional() : optional_base() {}
+};
+template  using Optional = optional;
+struct Trans_NS___cxx11_basic_stringstream {};
+void operator<<(Trans_NS___cxx11_basic_stringstream, int);
+int testfunctionfoo_myStructs[10];
+void testfunctionfoo() {
+  Optional external, internal;
+  for (auto myStruct : testfunctionfoo_myStructs) {
+Trans_NS___cxx11_basic_stringstream address_stream;
+address_stream << myStruct;
+external = internal;
+  }
+}
+
+// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
+// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index e029bd36da3..2afc74fc57c 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -1236,7 +1236,11 @@ translate_vuse_through_block (vec 
operands,
   if (same_valid)
 *same_valid = true;
 
-  if (gimple_bb (phi) != phiblock)
+  /* If value-numbering provided a memory state for this
+ that dominates PHIBLOCK we can just use that.  */
+  if (gimple_nop_p

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-15 Thread Prathamesh Kulkarni via Gcc-patches

On Mon, 12 Sept 2022 at 19:57, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Mon, 5 Sept 2022 at 15:51, Richard Sandiford
> >  wrote:
> >>
> > >> Sorry for the slow reply.  I wrote a response a couple of weeks ago
> > >> but I think it got lost in a machine outage.
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi,
> >> > The attached prototype patch extends fold_vec_perm to fold VEC_PERM_EXPR
> >> > in VLA manner, and currently handles the following cases:
> >> > (a) fixed len arg0, arg1 and fixed len sel.
> >> > (b) fixed len arg0, arg1 and vla sel
> >> > (c) vla arg0, arg1 and vla sel with arg0, arg1 being VECTOR_CST.
> >> >
> >> > It seems to work for the VLA tests written in
> > >> > test_vec_perm_vla_folding (), and I am working through the fallout observed in
> > >> > regression testing.
> >> >
> >> > Does the approach taken in the patch look in the right direction ?
> >> > I am not sure if I have got the conversion from "sel_index"
> >> > to index of either arg0, or arg1 entirely correct.
> >> > I would be grateful for suggestions on the patch.
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >
> >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > index 4f4ec81c8d4..5e12260211e 100644
> >> > --- a/gcc/fold-const.cc
> >> > +++ b/gcc/fold-const.cc
> >> > @@ -85,6 +85,9 @@ along with GCC; see the file COPYING3.  If not see
> >> >  #include "vec-perm-indices.h"
> >> >  #include "asan.h"
> >> >  #include "gimple-range.h"
> >> > +#include "tree-pretty-print.h"
> >> > +#include "gimple-pretty-print.h"
> >> > +#include "print-tree.h"
> >> >
> >> >  /* Nonzero if we are folding constants inside an initializer or a C++
> >> > manifestly-constant-evaluated context; zero otherwise.
> >> > @@ -10496,40 +10499,6 @@ fold_mult_zconjz (location_t loc, tree type, 
> >> > tree expr)
> >> > build_zero_cst (itype));
> >> >  }
> >> >
> >> > -
> >> > -/* Helper function for fold_vec_perm.  Store elements of VECTOR_CST or
> >> > -   CONSTRUCTOR ARG into array ELTS, which has NELTS elements, and return
> >> > -   true if successful.  */
> >> > -
> >> > -static bool
> >> > -vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
> >> > -{
> >> > -  unsigned HOST_WIDE_INT i, nunits;
> >> > -
> >> > -  if (TREE_CODE (arg) == VECTOR_CST
> >> > -  && VECTOR_CST_NELTS (arg).is_constant (&nunits))
> >> > -{
> >> > -  for (i = 0; i < nunits; ++i)
> >> > - elts[i] = VECTOR_CST_ELT (arg, i);
> >> > -}
> >> > -  else if (TREE_CODE (arg) == CONSTRUCTOR)
> >> > -{
> >> > -  constructor_elt *elt;
> >> > -
> >> > -  FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (arg), i, elt)
> > >> > - if (i >= nelts || TREE_CODE (TREE_TYPE (elt->value)) == VECTOR_TYPE)
> >> > -   return false;
> >> > - else
> >> > -   elts[i] = elt->value;
> >> > -}
> >> > -  else
> >> > -return false;
> >> > -  for (; i < nelts; i++)
> >> > -elts[i]
> >> > -  = fold_convert (TREE_TYPE (TREE_TYPE (arg)), integer_zero_node);
> >> > -  return true;
> >> > -}
> >> > -
> >> >  /* Attempt to fold vector permutation of ARG0 and ARG1 vectors using SEL
> >> > selector.  Return the folded VECTOR_CST or CONSTRUCTOR if successful,
> >> > NULL_TREE otherwise.  */
> >> > @@ -10537,45 +10506,149 @@ vec_cst_ctor_to_array (tree arg, unsigned int 
> >> > nelts, tree *elts)
> >> >  tree
> > >> >  fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
> >> >  {
> >> > -  unsigned int i;
> >> > -  unsigned HOST_WIDE_INT nelts;
> >> > -  bool need_ctor = false;
> >> > +  poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > +  poly_uint64 arg1_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1));
> >> > +
> >> > +  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> >> > + sel.length ()));
> >> > +  gcc_assert (known_eq (arg0_len, arg1_len));
> >> >
> >> > -  if (!sel.length ().is_constant (&nelts))
> >> > -return NULL_TREE;
> >> > -  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> >> > -   && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> >> > -   && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), 
> >> > nelts));
> >> >if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> >> >|| TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> >> >  return NULL_TREE;
> >> >
> >> > -  tree *in_elts = XALLOCAVEC (tree, nelts * 2);
> >> > -  if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
> >> > -  || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
> >> > +  unsigned input_npatterns = 0;
> >> > +  unsigned out_npatterns = sel.encoding ().npatterns ();
> >> > +  unsigned out_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> >> > +
> >> > +  /* FIXME: How to reshape fixed length vector_cst, so that
> >> > + npatterns == vector.length () and nelts_per_pattern == 1 ?
> >> > + It seems the vector is canonicalized to minimize npatterns.

[pushed] MAINTAINERS: Add myself to Write After Approval

2022-09-15 Thread Torbjörn SVENSSON via Gcc-patches

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e89eb343528..be146855ed8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -641,6 +641,7 @@ YunQiang Su 

 Robert Suchanek
 Andrew Sutton  
 Gabriele Svelto
+Torbjörn Svensson  
 Toma Tabacu
 Omar Tahir 
 Sriraman Tallam
-- 
2.25.1

Re: [PATCH] tree-object-size: Support strndup and strdup

2022-09-15 Thread Siddhesh Poyarekar


Ping!

On 2022-09-07 15:21, Siddhesh Poyarekar wrote:

Ping!

On 2022-08-29 10:16, Siddhesh Poyarekar wrote:

Ping!

On 2022-08-15 15:23, Siddhesh Poyarekar wrote:

Use string length of input to strdup to determine the usable size of the
resulting object.  Avoid doing the same for strndup since there's a
chance that the input may be too large, resulting in unnecessary
overhead or, worse, the input may not be NUL-terminated, resulting in a
crash where there would otherwise have been none.
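
To make the intended results concrete, here is a minimal sketch of the sizes the new tests in this patch appear to expect.  It assumes, as the tests suggest, that strdup yields exactly strlen (in) + 1 bytes, while for strndup the input is not inspected, so the maximum estimate is bound + 1 and the minimum estimate is 0.  The helper names are hypothetical and do not appear in tree-object-size.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helpers (not from tree-object-size.cc) modelling the sizes
// the new tests expect.

// strdup copies the whole string, so the object is strlen + 1 bytes.
std::size_t
expected_strdup_size (const char *in)
{
  return std::strlen (in) + 1;
}

// For strndup the input is deliberately not measured: the maximum
// estimate is bound + 1 ...
std::size_t
expected_strndup_max_size (std::size_t bound)
{
  return bound + 1;
}

// ... and the minimum estimate is 0, whatever the bound.
std::size_t
expected_strndup_min_size (std::size_t)
{
  return 0;
}

// Mirrors the checks added to main () in builtin-dynamic-object-size-0.c.
void
check_expected_sizes ()
{
  const char *str = "hello world";
  assert (expected_strdup_size (str) == std::strlen (str) + 1);
  assert (expected_strndup_max_size (4) == 5);
  assert (expected_strndup_min_size (4) == 0);
}
```

This models only the expected numbers; the actual pass derives them from the call's arguments in the IL rather than from run-time strings.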

gcc/ChangeLog:

* tree-object-size.cc (get_whole_object): New function.
(addr_object_size): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.
(pass_data_object_sizes, pass_data_early_object_sizes): Set
todo_flags_finish to TODO_update_ssa_no_phi.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
---
  .../gcc.dg/builtin-dynamic-object-size-0.c    | 43 +++
  .../gcc.dg/builtin-dynamic-object-size-1.c    |  2 +-
  .../gcc.dg/builtin-dynamic-object-size-2.c    |  2 +-
  .../gcc.dg/builtin-dynamic-object-size-3.c    |  2 +-
  .../gcc.dg/builtin-dynamic-object-size-4.c    |  2 +-
  gcc/testsuite/gcc.dg/builtin-object-size-1.c  | 64 +++-
  gcc/testsuite/gcc.dg/builtin-object-size-2.c  | 63 ++-
  gcc/testsuite/gcc.dg/builtin-object-size-3.c  | 63 ++-
  gcc/testsuite/gcc.dg/builtin-object-size-4.c  | 63 ++-
  gcc/tree-object-size.cc   | 76 +--
  10 files changed, 366 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 01a280b2d7b..7f023708b15 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,40 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, int incr)
    return __builtin_dynamic_object_size (ptr, 0);
  }
+/* strdup/strndup.  */
+
+size_t
+__attribute__ ((noinline))
+test_strdup (const char *in)
+{
+  char *res = __builtin_strdup (in);
+  return __builtin_dynamic_object_size (res, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strndup (const char *in, size_t bound)
+{
+  char *res = __builtin_strndup (in, bound);
+  return __builtin_dynamic_object_size (res, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strdup_min (const char *in)
+{
+  char *res = __builtin_strdup (in);
+  return __builtin_dynamic_object_size (res, 2);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strndup_min (const char *in, size_t bound)
+{
+  char *res = __builtin_strndup (in, bound);
+  return __builtin_dynamic_object_size (res, 2);
+}
+
  /* Other tests.  */
  struct TV4
@@ -651,6 +685,15 @@ main (int argc, char **argv)
    int *t = test_pr105736 (&val3);
    if (__builtin_dynamic_object_size (t, 0) != -1)
  FAIL ();
+  const char *str = "hello world";
+  if (test_strdup (str) != __builtin_strlen (str) + 1)
+    FAIL ();
+  if (test_strndup (str, 4) != 5)
+    FAIL ();
+  if (test_strdup_min (str) != __builtin_strlen (str) + 1)
+    FAIL ();
+  if (test_strndup_min (str, 4) != 0)
+    FAIL ();
    if (nfails > 0)
  __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
index 7cc8b1c9488..8f17c8edcaf 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
@@ -1,5 +1,5 @@
  /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -Wno-stringop-overread" } */
  /* { dg-require-effective-target alloca } */
  #define __builtin_object_size __builtin_dynamic_object_size
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
index 267dbf48ca7..3677782ff1c 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
@@ -1,5 +1,5 @@
  /* { dg-do run } */
-/* { dg-

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Cleanup gnu-versioned-namespace.ver

2022-09-15 Thread Jonathan Wakely via Gcc-patches

On Wed, 14 Sept 2022 at 18:22, François Dumont via Libstdc++
 wrote:
>
>  libstdc++: [_GLIBCXX_INLINE_VERSION] Cleanup
> gnu-versioned-namespace.ver
>
>  Remove expressions for symbols in std::__detail::__8 namespace,
> they are obsolete since
>  version namespace applies only at std:: level, not at sub-levels.
>
>  libstdc++-v3/ChangeLog:
>
>  * config/abi/pre/gnu-versioned-namespace.ver: Remove
> obsolete std::__detail::__8
>  symbols.
>
> Tested under Linux x86_64.
>
> Ok to commit ?

Yes, thanks.

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Fix test dg-prune-output

2022-09-15 Thread Jonathan Wakely via Gcc-patches

On Wed, 14 Sept 2022 at 18:26, François Dumont via Libstdc++
 wrote:
>
>  libstdc++: [_GLIBCXX_INLINE_VERSION] Fix test dg-prune-output
>
>  libstdc++-v3/ChangeLog:
>
>  *
> testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc: Adapt

Please put the "Adapt" on the next line. The file path is already
longer than the maximum line for a ChangeLog :-(

OK with that adjustment, thanks!

> dg-prune-output to
>  _GLIBCXX_INLINE_VERSION mode.
>
>
> With this patch all tests are Ok in _GLIBCXX_INLINE_VERSION mode (at the
> time I'm writing this).
>
> Ok to commit ?
>
> François

Re: [PATCH] libstdc++: Implement ranges::chunk_by_view from P2443R1

2022-09-15 Thread Jonathan Wakely via Gcc-patches

On Wed, 14 Sept 2022 at 15:19, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK, thanks.

[PATCH] c++: constraint matching, TEMPLATE_ID_EXPR, current inst

2022-09-15 Thread Patrick Palka via Gcc-patches

Here we're crashing during constraint matching for the instantiated
hidden friends due to two issues with dependent substitution into a
TEMPLATE_ID_EXPR naming a template from the current instantiation
(as performed from maybe_substitute_reqs_for for C<3> with T=T):

  * tsubst_copy substitutes into such a TEMPLATE_DECL by looking it
up from the substituted class scope.  But for this to not fail when
the args are dependent, we need to pass entering_scope=true for the
class scope substitution so that we obtain the primary template type
A (which has TYPE_BINFO) instead of the implicit instantiation
A (which doesn't).
  * lookup_and_finish_template_variable shouldn't instantiate a
TEMPLATE_ID_EXPR that names a TEMPLATE_DECL which has more than
one level of (unsubstituted) parameters (such as A::C).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't
instantiate if the template's scope is dependent.
(tsubst_copy) : Pass entering_scope=true
when substituting the class scope.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend10.C: New test.
---
 gcc/cp/pt.cc  | 14 +++--
 .../g++.dg/cpp2a/concepts-friend10.C  | 21 +++
 2 files changed, 29 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index db4e808adec..bfcbe0b8670 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10475,14 +10475,15 @@ tree
 lookup_and_finish_template_variable (tree templ, tree targs,
 tsubst_flags_t complain)
 {
-  templ = lookup_template_variable (templ, targs);
-  if (!any_dependent_template_arguments_p (targs))
+  tree var = lookup_template_variable (templ, targs);
+  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
+  && !any_dependent_template_arguments_p (targs))
 {
-  templ = finish_template_variable (templ, complain);
-  mark_used (templ);
+  var = finish_template_variable (var, complain);
+  mark_used (var);
 }
 
-  return convert_from_reference (templ);
+  return convert_from_reference (var);
 }
 
 /* If the set of template parameters PARMS contains a template parameter
@@ -17282,7 +17283,8 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 TEMPLATE_DECL with `D' as its DECL_CONTEXT.  Now we
 have to substitute this with one having context `D'.  */
 
- tree context = tsubst (DECL_CONTEXT (t), args, complain, in_decl);
+ tree context = tsubst_aggr_type (DECL_CONTEXT (t), args, complain,
+  in_decl, /*entering_scope=*/true);
  return lookup_field (context, DECL_NAME(t), 0, false);
}
   else
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C b/gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
new file mode 100644
index 000..4b21a379f59
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++20 } }
+// Verify we don't crash during constraint matching containing
+// a TEMPLATE_ID_EXPR referring to a template from the current
+// instantiation.
+
+template
+struct A {
+  template static constexpr bool C = sizeof(T) > N;
+  friend constexpr void f(A) requires C<3> { }
+  friend constexpr void f(A) requires C<3> || true { }
+};
+
+template
+struct A {
+  template static constexpr bool C = sizeof(T) > N;
+  friend constexpr void g(A) requires C<3> { }
+  friend constexpr void g(A) requires C<3> || true { }
+};
+
+template struct A;
+template struct A;
-- 
2.37.3.662.g36f8e7ed7d

[PATCH] c++: 'mutable' within constexpr [PR92505]

2022-09-15 Thread Patrick Palka via Gcc-patches

This patch permits accessing 'mutable' members of local objects during
constexpr evaluation (which other compilers seem to accept in C++14
mode, while we reject), while continuing to reject it for global objects
(as in the last line of cpp0x/constexpr-mutable1.C, which other
compilers also reject).  To distinguish between the two cases, it looks
like we just need to additionally check CONSTRUCTOR_MUTABLE_POISON
alongside DECL_MUTABLE_P in cxx_eval_component_reference before
rejecting the access.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/92505

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_component_reference): Test non_constant_p
earlier.  In C++14 or later, reject DECL_MUTABLE_P member
accesses only if CONSTRUCTOR_MUTABLE_POISON is also set.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable3.C: New test.
* g++.dg/cpp1y/constexpr-mutable1.C: New test.
---
 gcc/cp/constexpr.cc | 11 +++
 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C |  7 +++
 gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C | 16 
 3 files changed, 30 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 57283eabf3c..10639876d9c 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4088,6 +4088,8 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, tree t,
   tree whole = cxx_eval_constant_expression (ctx, orig_whole,
 lval,
 non_constant_p, overflow_p);
+  if (*non_constant_p)
+return t;
   if (INDIRECT_REF_P (whole)
   && integer_zerop (TREE_OPERAND (whole, 0)))
 {
@@ -4108,20 +4110,21 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, tree t,
whole, part, NULL_TREE);
   /* Don't VERIFY_CONSTANT here; we only want to check that we got a
  CONSTRUCTOR.  */
-  if (!*non_constant_p && TREE_CODE (whole) != CONSTRUCTOR)
+  if (TREE_CODE (whole) != CONSTRUCTOR)
 {
   if (!ctx->quiet)
error ("%qE is not a constant expression", orig_whole);
   *non_constant_p = true;
+  return t;
 }
-  if (DECL_MUTABLE_P (part))
+  if ((cxx_dialect < cxx14 || CONSTRUCTOR_MUTABLE_POISON (whole))
+  && DECL_MUTABLE_P (part))
 {
   if (!ctx->quiet)
error ("mutable %qD is not usable in a constant expression", part);
   *non_constant_p = true;
+  return t;
 }
-  if (*non_constant_p)
-return t;
   bool pmf = TYPE_PTRMEMFUNC_P (TREE_TYPE (whole));
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (whole), i, field, value)
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C
new file mode 100644
index 000..46c9d8437be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C
@@ -0,0 +1,7 @@
+// PR c++/92505
+// { dg-do compile { target c++11 } }
+
+struct A { mutable int m; };
+constexpr int f(A a) { return a.m; }
+static_assert(f({42}) == 42, "");
+// { dg-error "non-constant|mutable" "" { target c++11_only } .-1 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C
new file mode 100644
index 000..6c47988c01a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C
@@ -0,0 +1,16 @@
+// PR c++/92505
+// { dg-do compile { target c++14 } }
+
+struct S { mutable int m; };
+
+static_assert(S{42}.m == 42, "");
+
+constexpr int f() {
+  S s = {40};
+  s.m++;
+  const auto& cs = s;
+  ++cs.m;
+  return cs.m;
+}
+
+static_assert(f() == 42, "");
-- 
2.37.3.662.g36f8e7ed7d

[PATCH] c++: modules ICE with typename friend declaration

2022-09-15 Thread Patrick Palka via Gcc-patches

A couple of xtreme-header-* modules tests began ICEing in C++23 mode
ever since r13-2650-g5d84a4418aa962 introduced into  the
dependently scoped friend declaration

  friend /* typename */ _OuterIter::value_type;

ultimately because the streaming code assumes a TYPE_P friend must
be a class type, but here it's a TYPENAME_TYPE, which doesn't have
a TEMPLATE_INFO or CLASSTYPE_BEFRIENDING_CLASSES.  This patch tries
to correct this in a minimal way.
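
For readers who have not met the construct, here is a minimal sketch of a dependently scoped friend in plain C++11, without modules. All names (Key, Traits, Vault, peek) are hypothetical and chosen only for illustration; this is not the reduced testcase from the patch.

```cpp
// Hypothetical names throughout; plain C++11, no modules involved.
struct Key;                                  // will be the befriended class
struct Traits { using type = Key; };

template<typename T>
class Vault
{
  // Dependently scoped friend: until Vault is instantiated, T::type is
  // only a dependent name rather than a known class type.
  friend typename T::type;
  int secret = 42;
};

struct Key
{
  // Allowed only because Vault<Traits> befriends Traits::type, i.e. Key.
  static constexpr int peek (const Vault<Traits> &v) { return v.secret; }
};

static_assert (Key::peek (Vault<Traits>{}) == 42, "friend access works");
```

Inside the template the friend is not yet a class type, which is the situation the streaming code described above did not expect.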

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

gcc/cp/ChangeLog:

* module.cc (friend_from_decl_list): Don't consider
CLASSTYPE_TEMPLATE_INFO for a TYPENAME_TYPE friend.
(trees_in::read_class_def): Don't add to
CLASSTYPE_BEFRIENDING_CLASSES for a TYPENAME_TYPE friend.

gcc/testsuite/ChangeLog:

* g++.dg/modules/typename-friend.C: New test.
---
 gcc/cp/module.cc   | 5 +++--
 gcc/testsuite/g++.dg/modules/typename-friend.C | 9 +
 2 files changed, 12 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/typename-friend.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f27f4d091e5..1a1ff5be574 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -4734,7 +4734,8 @@ friend_from_decl_list (tree frnd)
   if (TYPE_P (frnd))
{
  res = TYPE_NAME (frnd);
- if (CLASSTYPE_TEMPLATE_INFO (frnd))
+ if (CLASS_TYPE_P (frnd)
+ && CLASSTYPE_TEMPLATE_INFO (frnd))
tmpl = CLASSTYPE_TI_TEMPLATE (frnd);
}
   else if (DECL_TEMPLATE_INFO (frnd))
@@ -12121,7 +12122,7 @@ trees_in::read_class_def (tree defn, tree 
maybe_template)
{
  tree f = TREE_VALUE (friend_classes);
 
- if (TYPE_P (f))
+ if (CLASS_TYPE_P (f))
{
  CLASSTYPE_BEFRIENDING_CLASSES (f)
= tree_cons (NULL_TREE, type,
diff --git a/gcc/testsuite/g++.dg/modules/typename-friend.C 
b/gcc/testsuite/g++.dg/modules/typename-friend.C
new file mode 100644
index 000..d8faf7955c3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/typename-friend.C
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fmodules-ts" }
+
+export module x;
+
+template
+struct A {
+  friend typename T::type;
+  friend void f(A) { }
+};
-- 
2.37.3.662.g36f8e7ed7d

[PATCH, committed] Fortran: error recovery for bad deferred character length assignment [PR104314]

2022-09-15 Thread Harald Anlauf via Gcc-patches

Dear all,

the attached obvious patch fixes an ICE on a NULL pointer
dereference.  We didn't properly check that the types of
expressions are character before referencing the length.

The issue was originally investigated by Steve, so I made
him co-author.

Regtested on x86_64-pc-linux-gnu and pushed to mainline as
commit r13-2690-g7bd4deb2a7c1394550610ab27507d1ed2af817c2

Thanks,
Harald

From 7bd4deb2a7c1394550610ab27507d1ed2af817c2 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 15 Sep 2022 22:06:53 +0200
Subject: [PATCH] Fortran: error recovery for bad deferred character length
 assignment [PR104314]

gcc/fortran/ChangeLog:

	PR fortran/104314
	* resolve.cc (deferred_op_assign): Do not try to generate temporary
	for deferred character length assignment if types do not agree.

gcc/testsuite/ChangeLog:

	PR fortran/104314
	* gfortran.dg/pr104314.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/resolve.cc | 1 +
 gcc/testsuite/gfortran.dg/pr104314.f90 | 9 +
 2 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr104314.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index ca114750f65..ae7ebb624e4 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11803,6 +11803,7 @@ deferred_op_assign (gfc_code **code, gfc_namespace *ns)

   if (!((*code)->expr1->ts.type == BT_CHARACTER
 	 && (*code)->expr1->ts.deferred && (*code)->expr1->rank
+	 && (*code)->expr2->ts.type == BT_CHARACTER
 	 && (*code)->expr2->expr_type == EXPR_OP))
 return false;

diff --git a/gcc/testsuite/gfortran.dg/pr104314.f90 b/gcc/testsuite/gfortran.dg/pr104314.f90
new file mode 100644
index 000..510ded0b164
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr104314.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/104314 - ICE in deferred_op_assign
+! Contributed by G.Steinmetz
+
+program p
+  character(:), allocatable :: c(:)
+  c = ['123']
+  c = c == c  ! { dg-error "Cannot convert" }
+end
--
2.35.3

Re: [PATCH] Rewrite NAN and sign handling in frange

2022-09-15 Thread Aldy Hernandez via Gcc-patches

On Thu, Sep 15, 2022 at 9:06 AM Richard Biener
 wrote:
>
> On Thu, Sep 15, 2022 at 7:41 AM Aldy Hernandez  wrote:
> >
> > Hi Richard.  Hi all.
> >
> > The attached patch rewrites the NAN and sign handling, dropping both
> > tristates in favor of a pair of boolean flags for NANs, and nothing at
> > all for signs.  The signs are tracked in the range itself, so now it's
> > possible to describe things like [-0.0, +0.0] +NAN, [+0, +0], [-5, +0],
> > [+0, 3] -NAN, etc.
> >
> > There are a lot of changes, as the tristate was quite pervasive.  I
> > could use another pair of eyes.  The code IMO is cleaner and handles
> > all the cases we discussed.
> >
> > Here is an example of the various ranges and how they are displayed:
> >
> > [frange] float VARYING NAN ;; Varying includes NAN
> > [frange] UNDEFINED  ;; Empty set as always
> > [frange] float [] NAN   ;; Unknown sign NAN
> > [frange] float [] -NAN  ;; -NAN
> > [frange] float [] +NAN  ;; +NAN
> > [frange] float [-0.0, 0.0]  ;; All zeros.
> > [frange] float [-0.0, -0.0] NAN ;; -0 or NAN.
> > [frange] float [-5.0e+0, -1.0e+0] +NAN  ;; [-5, -1] or +NAN
> > [frange] float [-5.0e+0, -0.0] NAN  ;; [-5, -0] or +-NAN
> > [frange] float [-5.0e+0, -0.0]  ;; [-5, -0]
> > [frange] float [5.0e+0, 1.0e+1] ;; [5, 10]
> >
> > We could represent an unknown sign with +NAN -NAN if preferred.
>
> maybe -+NAN or +-NAN?  I prefer to somehow show both signs for clarity

Sure.

>
> >
> > Notice the NAN signs are decoupled from the range, so we can represent
> > a negative range with a positive NAN.  For this range,
> > frange::known_bit() would return false, as only when the signs of the
> > NANs and range agree can we be certain.
> >
> > There is no longer any pessimization of ranges for intersects
> > involving NANs.  Also, union and intersect work with signed zeros:
> >
> > //   [-0,  x] U [+0,  x] => [-0,  x]
> > //   [ x, -0] U [ x, +0] => [ x, +0]
> > //   [-0,  x] ^ [+0,  x] => [+0,  x]
> > //   [ x, -0] ^ [ x, +0] => [ x, -0]
> >
> > The special casing for signed zeros in the singleton code is gone in
> > favor of just making sure the signs in the range agree, that is
> > [-0, -0] for example.
> >
> > I have removed the idea that a known NAN is a "range", so a NAN is no
> > longer in the endpoints itself.  Requesting the bound of a known NAN
> > is a hard fail.  For that matter, we don't store the actual NAN in the
> > range.  The only information we have are the set of boolean flags.
> > This way we make sure nothing seeps into the frange.  This also means
> > it's explicit that we don't track anything but the sign in NANs.  We
> > can revisit this if we desire to track signalling or whatever
> > concoction y'all can imagine.
> >
> > All in all, I'm quite happy with this.  It does look better, and we
> > handle all the corner cases we couldn't before.  Thanks for the
> > suggestion.
> >
> > Regstrapped with mpfr tests on x86-64 and ppc64le Linux.  Selftests
> > were also run with -ffinite-math-only on x86-64.
> >
> > At Jakub's suggestion, I built lapack with associated tests.  They
> > pass on x86-64 and ppc64le Linux with no regressions from mainline.
> > As a sanity check, I also ran them for -ffinite-math-only on x86 which
> > (as expected) returned:
> >
> > NaN arithmetic did not perform per the ieee spec
> >
> > Otherwise, all tests pass for -ffinite-math-only.
> >
> > How does this look?
>
> Overall it looks good.
>
> Reading ::intersect and ::union I find it less clear to spread out the _nan
> cases into separate functions.

OK, will inline them.

>
> Can you add a comment to frange that its representation is
> a single value-range specified by m_type, m_min, m_max
> unioned with the set of { -NaN, +NaN }?  Because somehow
> the ::undefined_p vs. m_type == VR_UNDEFINED checks are
> a bit confusing to the occasional reader can we instead use
> ::nan_p to complement ::undefined_p?

Wouldn't that just make nan_p the same as known_nan?  Speaking of
which, I'm not a big fan of known_nan.  Perhaps we should rename all
the known_foo variants to foo_p variants?  Or...maybe even:

  // fpclassify like API
  bool isfinite () const;
  bool isinf () const;
  bool maybe_isinf () const;
  bool isnan () const;
  bool maybe_isnan () const;
  bool signbit_p (bool &signbit) const;

That would make it clear how they map to the fpclassify API.  And the
signbit_p() follows what we do for singleton_p(tree *).

isnan() would be your nan_p suggestion.
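
To make the proposed semantics concrete, here is a rough sketch of how three of these predicates could behave on a toy model: a closed interval plus two NAN sign flags. The class toy_frange, its constructor, and only_nan() are hypothetical and are not the frange code in the patch; the sign logic is just my reading of the description above, where the sign is known only when the range and the NAN flags agree.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical toy model, not GCC's frange: [min, max] plus NAN sign flags.
class toy_frange
{
public:
  // A numeric range that may additionally be a +NAN and/or a -NAN.
  toy_frange (double min, double max,
              bool pos_nan = false, bool neg_nan = false)
    : m_min (min), m_max (max), m_empty (false),
      m_pos_nan (pos_nan), m_neg_nan (neg_nan)
  {
    // NANs never live in the endpoints, only in the flags.
    assert (!std::isnan (min) && !std::isnan (max));
  }

  // No numeric range at all, only NAN flags.
  static toy_frange only_nan (bool pos_nan, bool neg_nan)
  {
    toy_frange r (0.0, 0.0, pos_nan, neg_nan);
    r.m_empty = true;
    return r;
  }

  // Definitely a NAN: nothing but NAN flags are present.
  bool isnan () const { return m_empty && maybe_isnan (); }

  // Possibly a NAN: at least one NAN flag is set.
  bool maybe_isnan () const { return m_pos_nan || m_neg_nan; }

  // Return true and set SIGNBIT if every possible value has the same sign.
  bool signbit_p (bool &signbit) const
  {
    bool can_be_neg = m_neg_nan;
    bool can_be_pos = m_pos_nan;
    if (!m_empty)
      {
        // std::signbit distinguishes -0.0 from +0.0.
        can_be_neg |= std::signbit (m_min);
        can_be_pos |= !std::signbit (m_max);
      }
    if (can_be_neg == can_be_pos)
      return false;   // Both signs possible, or nothing is possible.
    signbit = can_be_neg;
    return true;
  }

private:
  double m_min, m_max;
  bool m_empty;
  bool m_pos_nan, m_neg_nan;
};
```

With this model, toy_frange (-5.0, -0.0) reports a known negative sign, while the same range with a +NAN flag reports the sign as unknown, which matches the known_bit() behaviour described in the quoted text.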

>
> Brain dump: maybe having a NaN-less frange with m_type, m_min, m_max
> and then frange_with_nan having a frange member plus the nan bits
> would make a better distinction?  Maybe we can use m_type == VR_RANGE
> when the actual range is empty but we have NaNs somehow?  That we
> need m_type to represent an empty range and VR_VARYING for the full
> range is somehow duplicate - ]0,0[

[committed] libstdc++: Tweak TSan annotations for std::atomic<shared_ptr<T>>

2022-09-15 Thread Jonathan Wakely via Gcc-patches

On Wed, 14 Sept 2022 at 23:28, Jonathan Wakely wrote:
>
> On Wed, 14 Sept 2022 at 23:25, Jonathan Wakely wrote:
> >
> > On Wed, 14 Sept 2022 at 23:05, Jonathan Wakely via Libstdc++
> >  wrote:
> > > @@ -377,6 +401,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > ~_Atomic_count()
> > > {
> > >   auto __val = _M_val.load(memory_order_relaxed);
> > > + _GLIBCXX_TSAN_MUTEX_DESTROY(&_M_val);
> >
> > After further thought, I'm not sure this is right. This tells tsan
> > that the "mutex" at &_M_val cannot be locked or unlocked again after
> > this. But what happens if the address is reused by a different
> > > atomic<shared_ptr<T>> which happens to be at the same memory address?
> > Will tsan think that's an invalid use of the original "mutex" after
> > its destruction?
>
> We can't easily add a call to __tsan_mutex_create, which would begin
> the lifetime of a new object at that address, because the default
> constructor is constexpr, and the create function isn't.
>
> >
> > I will investigate.

I investigated.

There is a bug in my commit, but only that I pass
__tsan_mutex_not_static to the unlock annotations, and it's only valid
for the create and lock annotations. But it appears to simply be ignored
by the unlock functions, so it's harmless.

It seems that if __tsan_mutex_create has not been called, passing
__tsan_mutex_not_static to the lock functions implicitly begins the
lifetime of a lock. That means it's fine to "destroy" with an address
that then gets reused later for a second object, because we'll
implicitly create a new mutex (in tsan's mind) when we first lock it.
But it also means that tsan doesn't complain about this:

  using A = std::atomic<std::shared_ptr<int>>;
  alignas(A) unsigned char buf[sizeof(A)];
  A* a = new(buf) A();
  a->load();
  a->~A();
  a->load();

The second load() uses a destroyed mutex, but tsan just begins the
lifetime of a new one when we call __tsan_mutex_pre_lock(&_M_val,
__tsan_mutex_not_static). I don't think we can do anything about that,
but it's not tsan's job to detect use-after-free anyway.

Here's a patch to fix the incorrect flags being passed to the pre/post
unlock functions, and to make the lock annotations more fine-grained.
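
As a rough illustration of the finer-grained annotations, here is a standalone sketch of a CAS-loop lock on the low bit of an atomic word. SketchLock and the SKETCH_* macros are hypothetical names and this is not the libstdc++ code; only the __tsan_mutex_* functions and flags come from the sanitizer header, and the __SANITIZE_THREAD__ guard is my assumption for detecting a TSan build with GCC.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#if defined __SANITIZE_THREAD__ && __has_include(<sanitizer/tsan_interface.h>)
# include <sanitizer/tsan_interface.h>
# define SKETCH_TRY_LOCK(X) \
  __tsan_mutex_pre_lock(X, __tsan_mutex_not_static|__tsan_mutex_try_lock)
# define SKETCH_TRY_LOCK_FAILED(X) __tsan_mutex_post_lock(X, \
    __tsan_mutex_not_static|__tsan_mutex_try_lock_failed, 0)
# define SKETCH_LOCKED(X) \
  __tsan_mutex_post_lock(X, __tsan_mutex_not_static, 0)
# define SKETCH_PRE_UNLOCK(X) __tsan_mutex_pre_unlock(X, 0)
# define SKETCH_POST_UNLOCK(X) __tsan_mutex_post_unlock(X, 0)
#else
// Without TSan the annotations compile away.
# define SKETCH_TRY_LOCK(X) ((void)0)
# define SKETCH_TRY_LOCK_FAILED(X) ((void)0)
# define SKETCH_LOCKED(X) ((void)0)
# define SKETCH_PRE_UNLOCK(X) ((void)0)
# define SKETCH_POST_UNLOCK(X) ((void)0)
#endif

// Hypothetical lock: the least significant bit of VAL is the lock bit.
struct SketchLock
{
  static constexpr std::uintptr_t lock_bit = 1;
  std::atomic<std::uintptr_t> val{0};

  // Acquire the lock and return the value stored alongside the lock bit.
  std::uintptr_t lock () noexcept
  {
    auto cur = val.load (std::memory_order_relaxed);
    while (cur & lock_bit)                 // Spin until it looks unlocked.
      cur = val.load (std::memory_order_relaxed);

    SKETCH_TRY_LOCK (&val);                // Each CAS is one try-lock attempt.
    while (!val.compare_exchange_strong (cur, cur | lock_bit,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed))
      {
        SKETCH_TRY_LOCK_FAILED (&val);     // That attempt failed.
        cur &= ~lock_bit;                  // Expect the unlocked value again.
        SKETCH_TRY_LOCK (&val);
      }
    SKETCH_LOCKED (&val);                  // The last attempt succeeded.
    return cur;
  }

  void unlock () noexcept
  {
    assert (val.load (std::memory_order_relaxed) & lock_bit);
    SKETCH_PRE_UNLOCK (&val);              // No not_static flag on unlock.
    val.fetch_sub (lock_bit, std::memory_order_release);
    SKETCH_POST_UNLOCK (&val);
  }
};
```

Every failed compare-exchange is reported as a failed try-lock, so tsan sees one pre/post pair per attempt rather than a single pre_lock that stays open for the whole loop.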

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Do not use the __tsan_mutex_not_static flag for annotation functions
where it's not a valid flag.  Also use the try_lock and try_lock_failed
flags to more precisely annotate the CAS loop used to acquire a lock.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_atomic.h (_GLIBCXX_TSAN_MUTEX_PRE_LOCK):
Replace with ...
(_GLIBCXX_TSAN_MUTEX_TRY_LOCK): ... this, add try_lock flag.
(_GLIBCXX_TSAN_MUTEX_TRY_LOCK_FAILED): New macro using
try_lock_failed flag
(_GLIBCXX_TSAN_MUTEX_POST_LOCK): Rename to ...
(_GLIBCXX_TSAN_MUTEX_LOCKED): ... this.
(_GLIBCXX_TSAN_MUTEX_PRE_UNLOCK): Remove invalid flag.
(_GLIBCXX_TSAN_MUTEX_POST_UNLOCK): Remove invalid flag.
(_Sp_atomic::_Atomic_count::lock): Use new macros.
---
 libstdc++-v3/include/bits/shared_ptr_atomic.h | 26 +++
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h 
b/libstdc++-v3/include/bits/shared_ptr_atomic.h
index 4580807f42c..55d193d4bda 100644
--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -32,24 +32,26 @@
 
 #include 
 
+// Annotations for the custom locking in atomic<shared_ptr<T>>.
 #if defined _GLIBCXX_TSAN && __has_include()
 #include 
 #define _GLIBCXX_TSAN_MUTEX_DESTROY(X) \
   __tsan_mutex_destroy(X, __tsan_mutex_not_static)
-#define _GLIBCXX_TSAN_MUTEX_PRE_LOCK(X) \
-  __tsan_mutex_pre_lock(X, __tsan_mutex_not_static)
-#define _GLIBCXX_TSAN_MUTEX_POST_LOCK(X) \
+#define _GLIBCXX_TSAN_MUTEX_TRY_LOCK(X) \
+  __tsan_mutex_pre_lock(X, __tsan_mutex_not_static|__tsan_mutex_try_lock)
+#define _GLIBCXX_TSAN_MUTEX_TRY_LOCK_FAILED(X) __tsan_mutex_post_lock(X, \
+__tsan_mutex_not_static|__tsan_mutex_try_lock_failed, 0)
+#define _GLIBCXX_TSAN_MUTEX_LOCKED(X) \
   __tsan_mutex_post_lock(X, __tsan_mutex_not_static, 0)
-#define _GLIBCXX_TSAN_MUTEX_PRE_UNLOCK(X) \
-  __tsan_mutex_pre_unlock(X, __tsan_mutex_not_static)
-#define _GLIBCXX_TSAN_MUTEX_POST_UNLOCK(X) \
-  __tsan_mutex_post_unlock(X, __tsan_mutex_not_static)
+#define _GLIBCXX_TSAN_MUTEX_PRE_UNLOCK(X) __tsan_mutex_pre_unlock(X, 0)
+#define _GLIBCXX_TSAN_MUTEX_POST_UNLOCK(X) __tsan_mutex_post_unlock(X, 0)
 #define _GLIBCXX_TSAN_MUTEX_PRE_SIGNAL(X) __tsan_mutex_pre_signal(X, 0)
 #define _GLIBCXX_TSAN_MUTEX_POST_SIGNAL(X) __tsan_mutex_post_signal(X, 0)
 #else
 #define _GLIBCXX_TSAN_MUTEX_DESTROY(X)
-#define _GLIBCXX_TSAN_MUTEX_PRE_LOCK(X)
-#define _GLIBCXX_TSAN_MUTEX_POST_LOCK(X)
+#define _GLIBCXX_TSAN_MUTEX_TRY_LOCK(X)
+#define _GLIBCXX_TSAN_MUTEX_TRY_LOCK_FAILED(X)
+#define _GLIBCXX_TSAN_MUTEX_LOCKED(X)
 #define _GLIBCXX_TSAN_MUTEX_PRE_UNLOCK(X)
 #define _GLIBCXX_TSAN_MUTEX_POST_UNLOCK(X)
 #define _G

[PATCH, committed] Fortran: catch NULL pointer dereferences while simplifying PACK [PR106857]

2022-09-15 Thread Harald Anlauf via Gcc-patches

Dear all,

we hit a NULL pointer dereference when trying to simplify PACK
when the MASK argument was present.  The obvious and trivial
solution is to check for NULL pointer dereferences while looking
at the constructor for the ARRAY argument, which we already do
in the case the MASK is not present.
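
The shape of the fix can be sketched outside the compiler as a lockstep walk over two linked lists that stops as soon as either one runs out. Node, pack_sketch and pack_sketch_demo are hypothetical stand-ins, not gfortran data structures, and the walk is only an approximation of what the simplifier does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a constructor element list.
struct Node { int value; const Node *next; };

// Copy the elements of ARRAY whose MASK element is nonzero.  Checking
// both pointers means a short ARRAY list (e.g. after an earlier error)
// ends the loop instead of being dereferenced as NULL.
std::vector<int> pack_sketch (const Node *array_ctor, const Node *mask_ctor)
{
  std::vector<int> result;
  while (mask_ctor && array_ctor)
    {
      if (mask_ctor->value != 0)
        result.push_back (array_ctor->value);
      mask_ctor = mask_ctor->next;
      array_ctor = array_ctor->next;
    }
  return result;
}

// Usage check: a 2-element array with a 4-element mask must not crash.
inline void pack_sketch_demo ()
{
  Node a2{20, nullptr}, a1{10, &a2};
  Node m4{1, nullptr}, m3{0, &m4}, m2{1, &m3}, m1{0, &m2};
  std::vector<int> r = pack_sketch (&a1, &m1);
  assert (r.size () == 1 && r[0] == 20);
}
```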

Committed to mainline after regtesting on x86_64-pc-linux-gnu
as commit r13-2691-g2b75d5f533b9d6b39f4055949aff64ed0d22dd24

This is a 10/11/12/13 regression, so I will check if it can
be backported.

Thanks,
Harald

From 2b75d5f533b9d6b39f4055949aff64ed0d22dd24 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 15 Sep 2022 22:39:24 +0200
Subject: [PATCH] Fortran: catch NULL pointer dereferences while simplifying
 PACK [PR106857]

gcc/fortran/ChangeLog:

	PR fortran/106857
	* simplify.cc (gfc_simplify_pack): Check for NULL pointer dereferences
	while walking through constructors (error recovery).

gcc/testsuite/ChangeLog:

	PR fortran/106857
	* gfortran.dg/pr106857.f90: New test.
---
 gcc/fortran/simplify.cc|  2 +-
 gcc/testsuite/gfortran.dg/pr106857.f90 | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr106857.f90

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index bc178d54891..140c17721a7 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -6431,7 +6431,7 @@ gfc_simplify_pack (gfc_expr *array, gfc_expr *mask, gfc_expr *vector)
   /* Copy only those elements of ARRAY to RESULT whose
 	 MASK equals .TRUE..  */
   mask_ctor = gfc_constructor_first (mask->value.constructor);
-  while (mask_ctor)
+  while (mask_ctor && array_ctor)
 	{
 	  if (mask_ctor->expr->value.logical)
 	{
diff --git a/gcc/testsuite/gfortran.dg/pr106857.f90 b/gcc/testsuite/gfortran.dg/pr106857.f90
new file mode 100644
index 000..4b0f86a75a6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr106857.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! PR fortran/106857 - ICE in gfc_simplify_pack
+! Contributed by G.Steinmetz
+
+program p
+  type t
+ integer :: n
+  end type
+  type(t), parameter :: a(2,2) = t(1)
+  type(t), parameter :: b(4) = reshape(a, [2])  ! { dg-error "Different shape" }
+  type(t), parameter :: c(2) = pack(b, [.false.,.true.,.false.,.true.]) ! { dg-error "Different shape" }
+end
--
2.35.3

Ping: [PATCH] libcpp: Improve location for macro names [PR66290]

2022-09-15 Thread Lewis Hyatt via Gcc-patches

Hello-

https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
May I please ping this patch? Thank you.

-Lewis

On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
>
>
> When libcpp reports diagnostics whose locus is a macro name (such as for
> -Wunused-macros), it uses the location in the cpp_macro object that was
> stored by _cpp_new_macro. This is currently set to pfile->directive_line,
> which contains the line number only and no column information. This patch
> changes the stored location to the src_loc for the token defining the macro
> name, which includes the location and range information.
>
> libcpp/ChangeLog:
>
> PR c++/66290
> * macro.cc (_cpp_create_definition): Add location argument.
> * internal.h (_cpp_create_definition): Adjust prototype.
> * directives.cc (do_define): Pass new location argument to
> _cpp_create_definition.
> (do_undef): Stop passing inferior location to cpp_warning_with_line;
> the default from cpp_warning is better.
> (cpp_pop_definition): Pass new location argument to
> _cpp_create_definition.
> * pch.cc (cpp_read_state): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/66290
> * c-c++-common/cpp/macro-ranges.c: New test.
> * c-c++-common/cpp/line-2.c: Adapt to check for column information
> on macro-related libcpp warnings.
> * c-c++-common/cpp/line-3.c: Likewise.
> * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> * c-c++-common/cpp/pr58844-1.c: Likewise.
> * c-c++-common/cpp/pr58844-2.c: Likewise.
> * c-c++-common/cpp/warning-zero-location.c: Likewise.
> * c-c++-common/pragma-diag-14.c: Likewise.
> * c-c++-common/pragma-diag-15.c: Likewise.
> * g++.dg/modules/macro-2_d.C: Likewise.
> * g++.dg/modules/macro-4_d.C: Likewise.
> * g++.dg/modules/macro-4_e.C: Likewise.
> * g++.dg/spellcheck-macro-ordering.C: Likewise.
> * gcc.dg/builtin-redefine.c: Likewise.
> * gcc.dg/cpp/Wunused.c: Likewise.
> * gcc.dg/cpp/redef2.c: Likewise.
> * gcc.dg/cpp/redef3.c: Likewise.
> * gcc.dg/cpp/redef4.c: Likewise.
> * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> * gcc.dg/cpp/ucnid-11.c: Likewise.
> * gcc.dg/cpp/undef2.c: Likewise.
> * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> * gcc.dg/cpp/warn-redefined.c: Likewise.
> * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> ---
>
> Notes:
> Hello-
>
> The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was originally
> about the entirely wrong location for -Wunused-macros in C++ mode, which
> behavior was fixed by r13-1903, but before closing it out I wanted to also
> address a second point brought up in the PR comments, namely that we do 
> not
> include column information when emitting diagnostics for macro names, 
> such as
> is done for -Wunused-macros. The attached patch updates the location 
> stored in
> the cpp_macro object so that it includes the column and range information 
> for
> the token comprising the macro name; previously, the location was just the
> generic one pointing to the whole line.
>
> The change to libcpp is very small, the reason for all the testsuite 
> changes is
> that I have updated all tests explicitly looking for the columnless 
> diagnostics
> (with the "-:" syntax to dg-warning et al) so that they expect a column
> instead. I also added a new test which verifies the expected range 
> information
> in diagnostics with carets.
>
> Bootstrap + regtest on x86-64 Linux looks good. Please let me know if it 
> looks
> OK? Thanks!
>
> -Lewis
>
>  libcpp/directives.cc  |  13 +-
>  libcpp/internal.h |   2 +-
>  libcpp/macro.cc   |  12 +-
>  libcpp/pch.cc |   2 +-
>  gcc/testsuite/c-c++-common/cpp/line-2.c   |   2 +-
>  gcc/testsuite/c-c++-common/cpp/line-3.c   |   2 +-
>  .../c-c++-common/cpp/macro-arg-count-1.c  |   4 +-
>  gcc/testsuite/c-c++-common/cpp/macro-ranges.c |  52 ++
>  gcc/testsuite/c-c++-common/cpp/pr58844-1.c|   4 +-
>  gcc/testsuite/c-c++-common/cpp/pr58844-2.c|   4 +-
>  .../c-c++-common/cpp/warning-zero-location.c  |   2 +-
>  gcc/testsuite/c-c++-common/pragma-diag-14.c   |   2 +-
>  gcc/testsuite/c-c++-common/pragma-diag-15.c   |   2 +-
>  gcc/testsuite/g++.dg/modules/macro-2_d.C  |   4 +-
>  gcc/testsuite/g++.dg/modules/macro-4_d.C  |   4 +-
>  gcc/testsuite/g++.dg/modules/macro-4_e.C  |   2 +-
>  .../g++.dg/spellcheck-macro-ordering.C|   2 +-
>  gcc/testsuite/gcc.dg/builtin-redefine.c   |  18 +-
>  gcc/testsuite/gcc.dg/cpp/Wunused.c|   6 +-
>  gcc/testsuite/gcc.dg/cpp/redef2.c |  20 +-
>  gcc/testsu

[committed] libstdc++: Remove unnecessary header from <memory>

2022-09-15 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Previously <memory> included  so that std::copy,
std::fill etc. could be used by . But that
includes it explicitly now, so that it can be compiled as a header unit.
There's no need to include it in <memory>, where its purpose isn't
obvious.

libstdc++-v3/ChangeLog:

* include/std/memory: Do not include .
---
 libstdc++-v3/include/std/memory | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index 481fa42a618..20a55020a36 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -60,7 +60,6 @@
  * Smart pointers, etc.
  */
 
-#include 
 #include 
 #include 
 #include 
-- 
2.37.3

[PATCH] Modernize ix86_builtin_vectorized_function with corresponding expanders.

2022-09-15 Thread liuhongt via Gcc-patches

For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround, when the size of
in_mode is not equal to that of out_mode, the vectorizer doesn't go the
internal fn way, so that part is still left in
ix86_builtin_vectorized_function.

Remove the other builtins and add corresponding expanders.
Note the patch just refactors the code; it doesn't solve the related case
in the PR, which needs an extra expander for 64-bit vectors.

Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/106910
* config/i386/i386-builtins.cc
(ix86_builtin_vectorized_function): Modernized with
corresponding expanders.
* config/i386/sse.md (lrint2): New
expander.
(floor2): Ditto.
(lfloor2): Ditto.
(ceil2): Ditto.
(lceil2): Ditto.
(btrunc2): Ditto.
(lround2): Ditto.
(exp22): Ditto.
---
 gcc/config/i386/i386-builtins.cc | 185 +--
 gcc/config/i386/sse.md   |  80 +
 2 files changed, 84 insertions(+), 181 deletions(-)

diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 6a04fb57e65..af2faee245b 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -1540,21 +1540,16 @@ ix86_builtin_vectorized_function (unsigned int fn, tree 
type_out,
 
   switch (fn)
 {
-CASE_CFN_EXP2:
-  if (out_mode == SFmode && in_mode == SFmode)
-   {
- if (out_n == 16 && in_n == 16)
-   return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
-   }
-  break;
-
 CASE_CFN_IFLOOR:
 CASE_CFN_LFLOOR:
-CASE_CFN_LLFLOOR:
   /* The round insn does not trap on denormals.  */
   if (flag_trapping_math || !TARGET_SSE4_1)
break;
 
+  /* PR106910, currently vectorizer doesn't go direct internal fn way
+when out_n != in_n, so let's still keep this.
+Otherwise, it relies on expander of
+lceilmn2/lfloormn2/lroundmn2/lrintmn2.  */
   if (out_mode == SImode && in_mode == DFmode)
{
  if (out_n == 4 && in_n == 2)
@@ -1564,20 +1559,10 @@ ix86_builtin_vectorized_function (unsigned int fn, tree 
type_out,
  else if (out_n == 16 && in_n == 8)
return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
}
-  if (out_mode == SImode && in_mode == SFmode)
-   {
- if (out_n == 4 && in_n == 4)
-   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX);
- else if (out_n == 8 && in_n == 8)
-   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX256);
- else if (out_n == 16 && in_n == 16)
-   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX512);
-   }
   break;
 
 CASE_CFN_ICEIL:
 CASE_CFN_LCEIL:
-CASE_CFN_LLCEIL:
   /* The round insn does not trap on denormals.  */
   if (flag_trapping_math || !TARGET_SSE4_1)
break;
@@ -1591,20 +1576,10 @@ ix86_builtin_vectorized_function (unsigned int fn, tree 
type_out,
  else if (out_n == 16 && in_n == 8)
return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
}
-  if (out_mode == SImode && in_mode == SFmode)
-   {
- if (out_n == 4 && in_n == 4)
-   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX);
- else if (out_n == 8 && in_n == 8)
-   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX256);
- else if (out_n == 16 && in_n == 16)
-   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX512);
-   }
   break;
 
 CASE_CFN_IRINT:
 CASE_CFN_LRINT:
-CASE_CFN_LLRINT:
   if (out_mode == SImode && in_mode == DFmode)
{
  if (out_n == 4 && in_n == 2)
@@ -1614,20 +1589,10 @@ ix86_builtin_vectorized_function (unsigned int fn, tree 
type_out,
  else if (out_n == 16 && in_n == 8)
return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX512);
}
-  if (out_mode == SImode && in_mode == SFmode)
-   {
- if (out_n == 4 && in_n == 4)
-   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ);
- else if (out_n == 8 && in_n == 8)
-   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ256);
- else if (out_n == 16 && in_n == 16)
-   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ512);
-   }
   break;
 
 CASE_CFN_IROUND:
 CASE_CFN_LROUND:
-CASE_CFN_LLROUND:
   /* The round insn does not trap on denormals.  */
   if (flag_trapping_math || !TARGET_SSE4_1)
break;
@@ -1641,150 +1606,8 @@ ix86_builtin_vectorized_function (unsigned int fn, tree 
type_out,
  else if (out_n == 16 && in_n == 8)
return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512);
}
-  if (out_mode == SImode && in_mode == SFmode)
-   {
- if (out_n == 4 && in_n == 4)
-   return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ_SFIX);
- else if (out_n == 8 && in_n == 8)
-   return ix86_get_bu

Re: [PATCH] Modernize ix86_builtin_vectorized_function with corresponding expanders.

2022-09-15 Thread Hongtao Liu via Gcc-patches

On Fri, Sep 16, 2022 at 8:55 AM liuhongt  wrote:
>
> For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround, when the size of
> in_mode is not equal to that of out_mode, the vectorizer doesn't go the
> internal fn way, so that part is still left in
> ix86_builtin_vectorized_function.
>
> Remove the other builtins and add corresponding expanders.
> Note the patch just refactors the code; it doesn't solve the related case
> in the PR, which needs an extra expander for 64-bit vectors.
>
> Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/106910
> * config/i386/i386-builtins.cc
> (ix86_builtin_vectorized_function): Modernized with
> corresponding expanders.
> * config/i386/sse.md (lrint2): New
> expander.
> (floor2): Ditto.
> (lfloor2): Ditto.
> (ceil2): Ditto.
> (lceil2): Ditto.
> (btrunc2): Ditto.
> (lround2): Ditto.
> (exp22): Ditto.
> ---
>  gcc/config/i386/i386-builtins.cc | 185 +--
>  gcc/config/i386/sse.md   |  80 +
>  2 files changed, 84 insertions(+), 181 deletions(-)
>
> diff --git a/gcc/config/i386/i386-builtins.cc 
> b/gcc/config/i386/i386-builtins.cc
> index 6a04fb57e65..af2faee245b 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -1540,21 +1540,16 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>
>switch (fn)
>  {
> -CASE_CFN_EXP2:
> -  if (out_mode == SFmode && in_mode == SFmode)
> -   {
> - if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
> -   }
> -  break;
> -
>  CASE_CFN_IFLOOR:
>  CASE_CFN_LFLOOR:
> -CASE_CFN_LLFLOOR:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
>
> +  /* PR106910, currently vectorizer doesn't go direct internal fn way
> +when out_n != in_n, so let's still keep this.
> +Otherwise, it relies on expander of
> +lceilmn2/lfloormn2/lroundmn2/lrintmn2.  */
>if (out_mode == SImode && in_mode == DFmode)
> {
>   if (out_n == 4 && in_n == 2)
> @@ -1564,20 +1559,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX512);
> -   }
>break;
>
>  CASE_CFN_ICEIL:
>  CASE_CFN_LCEIL:
> -CASE_CFN_LLCEIL:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
> @@ -1591,20 +1576,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX512);
> -   }
>break;
>
>  CASE_CFN_IRINT:
>  CASE_CFN_LRINT:
> -CASE_CFN_LLRINT:
>if (out_mode == SImode && in_mode == DFmode)
> {
>   if (out_n == 4 && in_n == 2)
> @@ -1614,20 +1589,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ512);
> -   }
>break;
>
>  CASE_CFN_IROUND:
>  CASE_CFN_LROUND:
> -CASE_CFN_LLROUND:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
> @@ -1641,150 +1606,8 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_bu

[PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-15 Thread liuhongt via Gcc-patches

There's a peephole2, submitted in the 1990s, which splits cmp mem, 0 into
load mem, reg + test reg, reg. I don't know the exact reason why gcc does this.

For the latest x86 processors, ciscization should help the processor frontend
and also code size; for the processor backend, the two forms should be the
same (they have the same uops).

So the patch deletes the peephole2, and also modifies another splitter to
generate more cmp mem, 0 for 32-bit targets.

It will help instruction fetch.
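
For reference, the kind of source that reaches this pattern is a comparison of a value in memory against zero. The sketch below uses hypothetical names (is_zero, check_is_zero); the assembly in the comment is only the two shapes discussed above, and what a given compiler actually emits depends on its version and flags.

```cpp
#include <cassert>

// Hypothetical helper: the comparison reads its operand straight from memory.
bool is_zero (const int *p)
{
  // Depending on compiler version and flags, this may be emitted either as
  //   cmp mem, 0                     (one insn with a memory operand), or
  //   load mem, reg + test reg, reg  (the form the old peephole2 produced).
  return *p == 0;
}

// Usage check: the result is the same whichever form is generated.
inline void check_is_zero ()
{
  int a = 0, b = 7;
  assert (is_zero (&a));
  assert (!is_zero (&b));
}
```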

minmax-1.c, minmax-2.c, minmax-10.c and pr96861.c are supposed to scan that
there's no comparison to 1 or -1, so adjust the testcases, since for 32-bit
targets we now generate cmp mem, 0 instead of load + test.

Similarly for pr78035.c.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No performance impact for SPEC2017 on ICX/Znver3.

Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.md (*3_1): Replace
register_operand with nonimmediate_operand for operand 1. Also
force_reg it when mode is QImode.
(define_peephole2): Deleted related peephole2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/minmax-1.c: Scan-assemble-not for cmp with 1
or -1, also don't scan-assembler test for ia32.
* gcc.target/i386/minmax-10.c: Ditto.
* gcc.target/i386/minmax-2.c: Ditto.
* gcc.target/i386/pr78035.c: Ditto.
* gcc.target/i386/pr96861.c: Scan either cmp or test 3 times.
---
 gcc/config/i386/i386.md   | 18 +-
 gcc/testsuite/gcc.target/i386/minmax-1.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/minmax-10.c |  4 ++--
 gcc/testsuite/gcc.target/i386/minmax-2.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/pr78035.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr96861.c   |  4 ++--
 6 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1be9b669909..93b905beb72 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21871,7 +21871,7 @@ (define_insn_and_split "*3_doubleword"
 (define_insn_and_split "*3_1"
   [(set (match_operand:SWI 0 "register_operand")
(maxmin:SWI
- (match_operand:SWI 1 "register_operand")
+ (match_operand:SWI 1 "nonimmediate_operand")
  (match_operand:SWI 2 "general_operand")))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_CMOVE
@@ -21886,9 +21886,12 @@ (define_insn_and_split "*3_1"
 {
   machine_mode mode = mode;
   rtx cmp_op = operands[2];
-
   operands[2] = force_reg (mode, cmp_op);
 
+  /* movqicc_noc only support register_operand for op1.  */
+  if (mode == QImode)
+operands[1] = force_reg (mode, operands[1]);
+
   enum rtx_code code = ;
 
   if (cmp_op == const1_rtx)
@@ -22482,17 +22485,6 @@ (define_peephole2
   [(set (match_dup 2) (match_dup 1))
(set (match_dup 0) (match_dup 2))])
 
-;; Don't compare memory with zero, load and use a test instead.
-(define_peephole2
-  [(set (match_operand 0 "flags_reg_operand")
-   (match_operator 1 "compare_operator"
- [(match_operand:SI 2 "memory_operand")
-  (const_int 0)]))
-   (match_scratch:SI 3 "r")]
-  "optimize_insn_for_speed_p () && ix86_match_ccmode (insn, CCNOmode)"
-  [(set (match_dup 3) (match_dup 2))
-   (set (match_dup 0) (match_op_dup 1 [(match_dup 3) (const_int 0)]))])
-
 ;; NOT is not pairable on Pentium, while XOR is, but one byte longer.
 ;; Don't split NOTs with a displacement operand, because resulting XOR
 ;; will not be pairable anyway.
diff --git a/gcc/testsuite/gcc.target/i386/minmax-1.c 
b/gcc/testsuite/gcc.target/i386/minmax-1.c
index 0ec35b1c5a1..840b32c5414 100644
--- a/gcc/testsuite/gcc.target/i386/minmax-1.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=opteron -mno-stv" } */
-/* { dg-final { scan-assembler "test" } } */
-/* { dg-final { scan-assembler-not "cmp" } } */
+/* { dg-final { scan-assembler "test" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not {(?n)cmp.*[$]+1} } } */
 #define max(a,b) (((a) > (b))? (a) : (b))
 int
 t(int a)
diff --git a/gcc/testsuite/gcc.target/i386/minmax-10.c 
b/gcc/testsuite/gcc.target/i386/minmax-10.c
index b044462c5a9..1dd2eedf435 100644
--- a/gcc/testsuite/gcc.target/i386/minmax-10.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-10.c
@@ -34,5 +34,5 @@ unsigned int umin1(unsigned int x)
   return min(x,1);
 }
 
-/* { dg-final { scan-assembler-times "test" 6 } } */
-/* { dg-final { scan-assembler-not "cmp" } } */
+/* { dg-final { scan-assembler-times "test" 6 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not {(?n)cmp.*1} } } */
diff --git a/gcc/testsuite/gcc.target/i386/minmax-2.c 
b/gcc/testsuite/gcc.target/i386/minmax-2.c
index af9baeaaf7c..2c82f6cecb9 100644
--- a/gcc/testsuite/gcc.target/i386/minmax-2.c
+++ b/gcc/testsuite/gcc.target/i386/minmax-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-stv" } */
-/* { dg-final { scan-assembler "test" } } */
-/* { dg-final { scan-assembler-not "cmp" } } */
+/* { dg-final { scan-ass

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-15 Thread Hongtao Liu via Gcc-patches

On Fri, Sep 16, 2022 at 9:09 AM liuhongt via Gcc-patches
 wrote:
>
> There's peephole2 submit in 1990s which split cmp mem, 0 to load mem,
> reg + test reg, reg. I don't know exact reason why gcc do this.
>
> For latest x86 processors, ciscization should help processor frontend
> also codesize, for processor backend, they should be the same(has same
> uops).
>
> So the patch deleted the peephole2, and also modify another splitter to
> generate more cmp mem, 0 for 32-bit target.
>
> It will help instruction fetch.
>
> for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan there's no
> comparison to 1 or -1, so adjust the testcase since under 32-bit
> target, we now generate cmp mem, 0 instead of load + test.
>
> Similar for pr78035.c.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> No performance impact for SPEC2017 on ICX/Znver3.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*3_1): Replace
> register_operand with nonimmediate_operand for operand 1. Also
> force_reg it when mode is QImode.
> (define_peephole2): Deleted related peephole2.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/minmax-1.c: Scan-assemble-not for cmp with 1
> or -1, also don't scan-assembler test for ia32.
> * gcc.target/i386/minmax-10.c: Ditto.
> * gcc.target/i386/minmax-2.c: Ditto.
> * gcc.target/i386/pr78035.c: Ditto.
> * gcc.target/i386/pr96861.c: Scan either cmp or test 3 times.
> ---
>  gcc/config/i386/i386.md   | 18 +-
>  gcc/testsuite/gcc.target/i386/minmax-1.c  |  4 ++--
>  gcc/testsuite/gcc.target/i386/minmax-10.c |  4 ++--
>  gcc/testsuite/gcc.target/i386/minmax-2.c  |  4 ++--
>  gcc/testsuite/gcc.target/i386/pr78035.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr96861.c   |  4 ++--
>  6 files changed, 14 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 1be9b669909..93b905beb72 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -21871,7 +21871,7 @@ (define_insn_and_split "*3_doubleword"
>  (define_insn_and_split "*3_1"
>[(set (match_operand:SWI 0 "register_operand")
> (maxmin:SWI
> - (match_operand:SWI 1 "register_operand")
> + (match_operand:SWI 1 "nonimmediate_operand")
>   (match_operand:SWI 2 "general_operand")))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_CMOVE
> @@ -21886,9 +21886,12 @@ (define_insn_and_split "*3_1"
>  {
>machine_mode mode = mode;
>rtx cmp_op = operands[2];
> -
>operands[2] = force_reg (mode, cmp_op);
>
> +  /* movqicc_noc only support register_operand for op1.  */
> +  if (mode == QImode)
> +operands[1] = force_reg (mode, operands[1]);
> +
>enum rtx_code code = ;
>
>if (cmp_op == const1_rtx)
> @@ -22482,17 +22485,6 @@ (define_peephole2
>[(set (match_dup 2) (match_dup 1))
> (set (match_dup 0) (match_dup 2))])
>
> -;; Don't compare memory with zero, load and use a test instead.
> -(define_peephole2
> -  [(set (match_operand 0 "flags_reg_operand")
> -   (match_operator 1 "compare_operator"
> - [(match_operand:SI 2 "memory_operand")
> -  (const_int 0)]))
> -   (match_scratch:SI 3 "r")]
> -  "optimize_insn_for_speed_p () && ix86_match_ccmode (insn, CCNOmode)"
> -  [(set (match_dup 3) (match_dup 2))
> -   (set (match_dup 0) (match_op_dup 1 [(match_dup 3) (const_int 0)]))])
> -
>  ;; NOT is not pairable on Pentium, while XOR is, but one byte longer.
>  ;; Don't split NOTs with a displacement operand, because resulting XOR
>  ;; will not be pairable anyway.
> diff --git a/gcc/testsuite/gcc.target/i386/minmax-1.c 
> b/gcc/testsuite/gcc.target/i386/minmax-1.c
> index 0ec35b1c5a1..840b32c5414 100644
> --- a/gcc/testsuite/gcc.target/i386/minmax-1.c
> +++ b/gcc/testsuite/gcc.target/i386/minmax-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -march=opteron -mno-stv" } */
> -/* { dg-final { scan-assembler "test" } } */
> -/* { dg-final { scan-assembler-not "cmp" } } */
> +/* { dg-final { scan-assembler "test" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not {(?n)cmp.*[$]+1} } } */
>  #define max(a,b) (((a) > (b))? (a) : (b))
>  int
>  t(int a)
> diff --git a/gcc/testsuite/gcc.target/i386/minmax-10.c 
> b/gcc/testsuite/gcc.target/i386/minmax-10.c
> index b044462c5a9..1dd2eedf435 100644
> --- a/gcc/testsuite/gcc.target/i386/minmax-10.c
> +++ b/gcc/testsuite/gcc.target/i386/minmax-10.c
> @@ -34,5 +34,5 @@ unsigned int umin1(unsigned int x)
>return min(x,1);
>  }
>
> -/* { dg-final { scan-assembler-times "test" 6 } } */
> -/* { dg-final { scan-assembler-not "cmp" } } */
> +/* { dg-final { scan-assembler-times "test" 6 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not {(?n)cmp.*1} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/minmax-2.c 
> b/gcc/testsuite/gcc.target/i386/minmax-2.c
> index af9baeaaf7c..2c82f6cecb9 100

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-15 Thread Jeff Law via Gcc-patches




On 9/15/22 19:06, liuhongt via Gcc-patches wrote:

> There's peephole2 submit in 1990s which split cmp mem, 0 to load mem,
> reg + test reg, reg. I don't know exact reason why gcc do this.
>
> For latest x86 processors, ciscization should help processor frontend
> also codesize, for processor backend, they should be the same(has same
> uops).
>
> So the patch deleted the peephole2, and also modify another splitter to
> generate more cmp mem, 0 for 32-bit target.
>
> It will help instruction fetch.
>
> for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan there's no
> comparison to 1 or -1, so adjust the testcase since under 32-bit
> target, we now generate cmp mem, 0 instead of load + test.
>
> Similar for pr78035.c.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> No performance impact for SPEC2017 on ICX/Znver3.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*3_1): Replace
> register_operand with nonimmediate_operand for operand 1. Also
> force_reg it when mode is QImode.
> (define_peephole2): Deleted related peephole2.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/minmax-1.c: Scan-assemble-not for cmp with 1
> or -1, also don't scan-assembler test for ia32.
> * gcc.target/i386/minmax-10.c: Ditto.
> * gcc.target/i386/minmax-2.c: Ditto.
> * gcc.target/i386/pr78035.c: Ditto.
> * gcc.target/i386/pr96861.c: Scan either cmp or test 3 times.


It was almost certainly for PPro/P2, given it was rth's work from
1999.  Probably should have been conditionalized on PPro/P2 at the
time.  No worries losing it now...



Jeff

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-15 Thread Kong, Lingling via Gcc-patches

Hi Richard,

Thanks again for your review.

> Yes, use else if for the bitwise induction.  Can you also make the new case
> conditional on 'def' (the compute_overall_effect_of_inner_loop) being
> chrec_dont_know?  If that call produced something useful it will not be of
> either of the two special forms.
> Thus like
> 
>   if (def != chrec_dont_know)
> /* Already OK.  */
> ;
>  else if ((bitinv_def = ...)
> ..
>  else if (tree_fits_uhwi_p (niter)
>  ... bitwise induction case...)
> ...
>
Yes, I fixed it in the new patch. Thanks.
Ok for master ?
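
As background for the bitwise-induction case discussed in this thread, the
sketch below shows why a final value can be computed without running the
loop. All function names are hypothetical and nothing here is taken from the
patch itself; it only relies on AND with a fixed value being idempotent and
on XOR with a fixed value cancelling itself every second application.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the loop in the quoted dump: tmp is combined with a
// loop-invariant value on each of 32 iterations.
std::uint32_t and_in_loop(std::uint32_t tmp, std::uint32_t inv)
{
  for (unsigned bit = 0; bit < 32; ++bit)
    tmp = inv & tmp;
  return tmp;
}

// Closed form for AND: one application gives the same result.
std::uint32_t and_final_value(std::uint32_t tmp, std::uint32_t inv)
{
  return tmp & inv;
}

// XOR variant with an explicit iteration count.
std::uint32_t xor_in_loop(std::uint32_t tmp, std::uint32_t inv, unsigned niter)
{
  for (unsigned i = 0; i < niter; ++i)
    tmp ^= inv;
  return tmp;
}

// Closed form for XOR: only the parity of the iteration count matters.
std::uint32_t xor_final_value(std::uint32_t tmp, std::uint32_t inv,
                              unsigned niter)
{
  return (niter % 2 != 0) ? (tmp ^ inv) : tmp;
}

// Small self-check comparing the loops with their closed forms.
void bitop_self_check()
{
  assert(and_in_loop(0xF0F0F0F0u, 0x0FF00FF0u)
         == and_final_value(0xF0F0F0F0u, 0x0FF00FF0u));
  assert(xor_in_loop(5u, 3u, 3u) == xor_final_value(5u, 3u, 3u));
}
```

The OR case behaves like AND in this respect, since OR with a fixed value is
idempotent as well.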

Thanks,
Lingling

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, September 14, 2022 4:16 PM
> To: Kong, Lingling 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] Enhance final_value_replacement_loop to handle bitop
> with an invariant induction.[PR105735]
> 
> On Tue, Sep 13, 2022 at 9:54 AM Kong, Lingling 
> wrote:
> >
> > Hi Richard,
> >
> > Thank you so much for reviewing this patch.  I really appreciate it.  For
> > these review comments, I have made some changes.
> >
> > > That's a single-stmt match, you shouldn't use match.pd matching for this.
> > > Instead just do
> > >
> > >   if (is_gimple_assign (stmt)
> > >   && ((code = gimple_assign_rhs_code (stmt)), true)
> > >   && (code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code ==
> > > BIT_XOR_EXPR))
> >
> > Yes, I fixed it and dropped modification for match.pd.
> >
> > > and pick gimple_assign_rhs{1,2} (stmt) as the operands.  The :c in
> > > bit_op:c is redundant btw. - while the name suggests "with
> > > invariant" you don't actually check for that.  But again, given
> > > canonicalization rules the invariant will be rhs2 so above add
> > >
> > > && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
> >
> > For "with invariant", this requires that op1 is invariant, and I used
> > `expr_invariant_in_loop_p (loop, match_op[0])` to check that.
> > And op2 just being a PHI is OK.  If op2 is an INTEGER_CST, existing GCC
> > can already optimize it directly and needs no modification.
> >
> > > you probably need dg-require-effective-target longlong, but is it
> > > necessary to use long long for the testcases in the first place?
> > > The IV seems to be unused, if it should match the variables bit size
> > > use sizeof
> > > (type) * 8
> >
> > Yes, it is not necessary to use long long for the testcases.  I changed
> > the type to unsigned int.
> >
> > > > +  inv = PHI_ARG_DEF_FROM_EDGE (header_phi, loop_preheader_edge
> > > > + (loop));  return fold_build2 (code1, type, inv, match_op[0]); }
> > >
> > > The } goes to the next line.
> >
> > Sorry, there might be something wrong with my use of the gcc send-email format.
> >
> > > > +  tree bitinv_def;
> > > > +  if ((bitinv_def
> > >
> > > please use else if here
> >
> > Sorry, if I use else if here, there is no corresponding if above.  I'm not
> > sure whether you mean changing the bitwise induction expression's if to
> > else if.
> 
> Yes, use else if for the bitwise induction.  Can you also make the new case
> conditional on 'def' (the compute_overall_effect_of_inner_loop) being
> chrec_dont_know?  If that call produced something useful it will not be of
> either of the two special forms.
> Thus like
> 
>   if (def != chrec_dont_know)
> /* Already OK.  */
> ;
>  else if ((bitinv_def = ...)
> ..
>  else if (tree_fits_uhwi_p (niter)
>  ... bitwise induction case...)
> ...
> 
> ?
> 
> Otherwise looks OK now.
> 
> Thanks,
> Richard.
> 
> > Do you agree with these changes?  Thanks again for taking a look.
> >
> > Thanks,
> > Lingling
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 23, 2022 3:27 PM
> > > To: Kong, Lingling 
> > > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] Enhance final_value_replacement_loop to handle
> > > bitop with an invariant induction.[PR105735]
> > >
> > > > On Thu, Aug 18, 2022 at 8:48 AM Kong, Lingling via Gcc-patches wrote:
> > > >
> > > > Hi,
> > > >
> > > > This patch is for pr105735/pr101991. It will enable below optimization:
> > > > {
> > > > -  long unsigned int bit;
> > > > -
> > > > -   [local count: 32534376]:
> > > > -
> > > > -   [local count: 1041207449]:
> > > > -  # tmp_10 = PHI 
> > > > -  # bit_12 = PHI 
> > > > -  tmp_7 = bit2_6(D) & tmp_10;
> > > > -  bit_8 = bit_12 + 1;
> > > > -  if (bit_8 != 32)
> > > > -goto ; [96.97%]
> > > > -  else
> > > > -goto ; [3.03%]
> > > > -
> > > > -   [local count: 1009658865]:
> > > > -  goto ; [100.00%]
> > > > -
> > > > -   [local count: 32534376]:
> > > > -  # tmp_11 = PHI 
> > > > -  return tmp_11;
> > > > +  tmp_11 = tmp_4(D) & bit2_6(D);
> > > > +  return tmp_11;
> > > >
> > > > }
> > > >
> > > > Ok for master ?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR middle-end/105735
> > > > * match.pd (bitop_with_inv_p): New match.
> > > > * tree-scalar-evolution.cc (gimple

New French PO file for 'gcc' (version 12.2.0)

2022-09-15 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-12.2.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH] [x86] Adjust issue_rate for latest Intel processors.

2022-09-15 Thread liuhongt via Gcc-patches

For Skylake-based processors, the decoder is 4-way.
For Sunny Cove and Willow Cove, the decoder is 5-way.
For Golden Cove, the decoder is 6-way.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to install.
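
The decoder widths listed above can be summarized as a small lookup. This is
only a sketch: the Core enumeration and decoder_width are hypothetical names
and do not correspond to GCC's processor enumeration; the fallback value of 1
mirrors the default branch of the function being patched.

```cpp
#include <cassert>

// Hypothetical stand-in for a processor enumeration; only the
// microarchitectures named in this message are listed.
enum class Core { Skylake, SunnyCove, WillowCove, GoldenCove, Unknown };

// Decoder width per core, as stated in the message.
constexpr int decoder_width(Core core)
{
  switch (core)
    {
    case Core::Skylake:
      return 4;
    case Core::SunnyCove:
    case Core::WillowCove:
      return 5;
    case Core::GoldenCove:
      return 6;
    default:
      return 1;
    }
}

// Small self-check of the table.
void decoder_width_self_check()
{
  assert(decoder_width(Core::Skylake) == 4);
  assert(decoder_width(Core::GoldenCove) == 6);
}
```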

gcc/ChangeLog:

* config/i386/x86-tune-sched.cc (ix86_issue_rate): Adjust for
latest Intel processors.
---
 gcc/config/i386/x86-tune-sched.cc | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/i386/x86-tune-sched.cc 
b/gcc/config/i386/x86-tune-sched.cc
index 1ffaeef037c..e2765f81902 100644
--- a/gcc/config/i386/x86-tune-sched.cc
+++ b/gcc/config/i386/x86-tune-sched.cc
@@ -73,10 +73,24 @@ ix86_issue_rate (void)
 case PROCESSOR_SANDYBRIDGE:
 case PROCESSOR_HASWELL:
 case PROCESSOR_TREMONT:
+case PROCESSOR_SKYLAKE:
+case PROCESSOR_SKYLAKE_AVX512:
+case PROCESSOR_CASCADELAKE:
+case PROCESSOR_CANNONLAKE:
 case PROCESSOR_ALDERLAKE:
 case PROCESSOR_GENERIC:
   return 4;
 
+case PROCESSOR_ICELAKE_CLIENT:
+case PROCESSOR_ICELAKE_SERVER:
+case PROCESSOR_TIGERLAKE:
+case PROCESSOR_COOPERLAKE:
+case PROCESSOR_ROCKETLAKE:
+  return 5;
+
+case PROCESSOR_SAPPHIRERAPIDS:
+  return 6;
+
 default:
   return 1;
 }
-- 
2.18.1

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-15 Thread Uros Bizjak via Gcc-patches

On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches
 wrote:
>
>
> On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
> > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem,
> > reg + test reg, reg. I don't know exact reason why gcc do this.
> >
> > For latest x86 processors, ciscization should help processor frontend
> > also codesize, for processor backend, they should be the same(has same
> > uops).
> >
> > So the patch deleted the peephole2, and also modify another splitter to
> > generate more cmp mem, 0 for 32-bit target.
> >
> > It will help instruction fetch.
> >
> > for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan there's no
> > comparison to 1 or -1, so adjust the testcase since under 32-bit
> > target, we now generate cmp mem, 0 instead of load + test.
> >
> > Similar for pr78035.c.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > No performance impact for SPEC2017 on ICX/Znver3.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/i386.md (*3_1): Replace
> >   register_operand with nonimmediate_operand for operand 1. Also
> >   force_reg it when mode is QImode.
> >   (define_peephole2): Deleted related peephole2.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/minmax-1.c: Scan-assemble-not for cmp with 1
> >   or -1, also don't scan-assembler test for ia32.
> >   * gcc.target/i386/minmax-10.c: Ditto.
> >   * gcc.target/i386/minmax-2.c: Ditto.
> >   * gcc.target/i386/pr78035.c: Ditto.
> >   * gcc.target/i386/pr96861.c: Scan either cmp or test 3 times.
>
> It was almost certainly for PPro/P2 given it was rth's work from
> 1999.  Probably should have been conditionalized on PPro/P2 at the
> time.   No worries losing it now...

Please add a tune flag in x86-tune.def under "Historical relics" and
use it in the relevant peephole2 instead of deleting it.

Uros.

Re: [PATCH] Modernize ix86_builtin_vectorized_function with corresponding expanders.

2022-09-15 Thread Uros Bizjak via Gcc-patches

On Fri, Sep 16, 2022 at 2:55 AM liuhongt via Gcc-patches
 wrote:
>
> For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround when size of
> in_mode is not equal out_mode, vectorizer doesn't go to internal fn
> way,still left that part in the ix86_builtin_vectorized_function.
>
> Remove others builtins and add corresponding expanders.
> Note the patch just refactor the codes, doesn't solve the related case
> in the PR which needs extra expander for 64-bit vector.
>
> Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> gcc/ChangeLog:
>
> PR target/106910
> * config/i386/i386-builtins.cc
> (ix86_builtin_vectorized_function): Modernized with
> corresponding expanders.
> * config/i386/sse.md (lrint2): New
> expander.
> (floor2): Ditto.
> (lfloor2): Ditto.
> (ceil2): Ditto.
> (lceil2): Ditto.
> (btrunc2): Ditto.
> (lround2): Ditto.
> (exp22): Ditto.

LGTM.

Thanks,
Uros.
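
For readers following the out_n != in_n distinction in the quoted patch, here
is a sketch of the two loop shapes involved. The function names are
hypothetical. Whether a given GCC version vectorizes either loop depends on
the options in use; the quoted code, for instance, gives up when
flag_trapping_math is set or SSE4.1 is unavailable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// float -> int: input and output elements have the same size, so a
// vector holds as many inputs as outputs (out_n == in_n).
void floor_sf_to_si(const float *in, int *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<int>(std::floor(in[i]));
}

// double -> int: the output elements are narrower than the inputs, so a
// vector holds more outputs than inputs (for example out_n == 4, in_n == 2).
void floor_df_to_si(const double *in, int *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<int>(std::floor(in[i]));
}

// Small self-check of the scalar semantics.
void floor_self_check()
{
  const float in[2] = {1.5f, -1.5f};
  int out[2] = {0, 0};
  floor_sf_to_si(in, out, 2);
  assert(out[0] == 1 && out[1] == -2);
}
```

In the quoted diff, the SFmode to SImode cases are the ones removed from
ix86_builtin_vectorized_function, while the DFmode to SImode cases are kept.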

> ---
>  gcc/config/i386/i386-builtins.cc | 185 +--
>  gcc/config/i386/sse.md   |  80 +
>  2 files changed, 84 insertions(+), 181 deletions(-)
>
> diff --git a/gcc/config/i386/i386-builtins.cc 
> b/gcc/config/i386/i386-builtins.cc
> index 6a04fb57e65..af2faee245b 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -1540,21 +1540,16 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>
>switch (fn)
>  {
> -CASE_CFN_EXP2:
> -  if (out_mode == SFmode && in_mode == SFmode)
> -   {
> - if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
> -   }
> -  break;
> -
>  CASE_CFN_IFLOOR:
>  CASE_CFN_LFLOOR:
> -CASE_CFN_LLFLOOR:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
>
> +  /* PR106910, currently vectorizer doesn't go direct internal fn way
> +when out_n != in_n, so let's still keep this.
> +Otherwise, it relies on expander of
> +lceilmn2/lfloormn2/lroundmn2/lrintmn2.  */
>if (out_mode == SImode && in_mode == DFmode)
> {
>   if (out_n == 4 && in_n == 2)
> @@ -1564,20 +1559,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX512);
> -   }
>break;
>
>  CASE_CFN_ICEIL:
>  CASE_CFN_LCEIL:
> -CASE_CFN_LLCEIL:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
> @@ -1591,20 +1576,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX512);
> -   }
>break;
>
>  CASE_CFN_IRINT:
>  CASE_CFN_LRINT:
> -CASE_CFN_LLRINT:
>if (out_mode == SImode && in_mode == DFmode)
> {
>   if (out_n == 4 && in_n == 2)
> @@ -1614,20 +1589,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ512);
> -   }
>break;
>
>  CASE_CFN_IROUND:
>  CASE_CFN_LROUND:
> -CASE_CFN_LLROUND:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
> @@ -1641,150 +1606,8 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n
