Re: [PATCH] build: unbreak bootstrap on uclinux targets [PR112762]

2023-12-06 Thread Richard Biener
On Tue, Dec 5, 2023 at 7:50 PM Marek Polacek  wrote:
>
> Tested with .../configure --target=c6x-uclinux [...] && make all-gcc,
> ok for trunk?

OK

> -- >8 --
> Currently, cross-compiling with --target=c6x-uclinux (and several others)
> fails due to:
>
> ../../src/gcc/config/linux.h:221:45: error: 'linux_fortify_source_default_level' was not declared in this scope
>  #define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level
>
> In the PR Andrew mentions that another fix would be in config.gcc,
> but really, here I meant to use the target hook for glibc only, not
> uclibc.  This trivial patch fixes the build problem.  It means that
> -fhardened with uclibc will use -D_FORTIFY_SOURCE=2 and not =3.
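The intent described above can be sketched in plain C (an illustrative model only; the function name and its string parameter are hypothetical, not GCC's actual hook interface):

```c
#include <string.h>

/* Sketch, under the commit message's description: with -fhardened, the
   target hook raises the default _FORTIFY_SOURCE level to 3 on glibc
   only; other C libraries such as uclibc keep the default of 2.  */
int fortify_default_level (const char *libc)
{
  return strcmp (libc, "glibc") == 0 ? 3 : 2;
}
```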
>
> PR target/112762
>
> gcc/ChangeLog:
>
> * config/linux.h: Redefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL for
> glibc only.
> ---
>  gcc/config/linux.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/linux.h b/gcc/config/linux.h
> index 79b6537dcf1..73f39d3c603 100644
> --- a/gcc/config/linux.h
> +++ b/gcc/config/linux.h
> @@ -215,7 +215,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  # undef TARGET_LIBM_FUNCTION_MAX_ERROR
>  # define TARGET_LIBM_FUNCTION_MAX_ERROR linux_libm_function_max_error
>
> -#endif
> -
>  #undef TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL
>  #define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level
> +
> +#endif
>
> base-commit: 9c3a880feecf81c310b4ade210fbd7004c9aece7
> --
> 2.43.0
>


Re: [PATCH]middle-end: correct loop bounds for early breaks and peeled vector loops

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> While waiting for reviews I've continued to run more tests.
> This particular issue was found when running on 32-bit systems.
> 
> While we calculate the right latch count for the epilog,
> the vectorizer overrides SCEV and so unrolling goes wrong.
> 
> This updates the bounds for the case where we've peeled a
> vector iteration.
> 
> The testcase in the early-break testsuite is adjusted to test for
> this, and I'll merge this commit into the main one.
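For context, a minimal example of the kind of loop the early-break vectorizer handles (my own illustration, not the testsuite case): the vectorized loop can exit before the counted bound, and after peeling a vector iteration the scalar epilogue's latch-iteration bound is what the patch adjusts.

```c
/* An early-break counted loop: the exit can be taken before i reaches n,
   which is what forces the vectorizer to peel and run an epilogue.  */
int first_match (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)
      return i;   /* early break out of the counted loop */
  return -1;
}
```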
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_do_peeling): Adjust bounds.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 4edde4443ecd98775972f39b3fe839255db12b04..7d48502e2e46240553509dfa6d75fcab7fea36d3
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3457,6 +3457,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>if (bound_scalar.is_constant (&bound))
>   {
> gcc_assert (bound != 0);
> +   /* Adjust the upper bound by the extra peeled vector iteration if we
> +  are an epilogue of a peeled vect loop and not VLA.  For VLA the
> +  loop bounds are unknown.  */
> +   if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)
> +   && vf.is_constant ())
> + bound += vf.to_constant ();
> /* -1 to convert loop iterations to latch iterations.  */
> record_niter_bound (epilog, bound - 1, false, true);
> scale_loop_profile (epilog, profile_probability::always (),
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH v3 00/16] Support Intel APX NDD

2023-12-06 Thread Hongyu Wang
Hi,

Following up the discussion of the V2 patches in
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639368.html,
this patch series adds an early clobber for all TImode NDD alternatives
to avoid any potential overlap between the dest register and the src
register/memory. It also uses get_attr_isa (insn) == ISA_APX_NDD instead
of checking the alternative at the asm output stage.

Bootstrapped & regtested on x86_64-pc-linux-gnu{-m32,} and sde.

Ok for master?

Hongyu Wang (7):
  [APX NDD] Disable seg_prefixed memory usage for NDD add
  [APX NDD] Support APX NDD for left shift insns
  [APX NDD] Support APX NDD for right shift insns
  [APX NDD] Support APX NDD for rotate insns
  [APX NDD] Support APX NDD for shld/shrd insns
  [APX NDD] Support APX NDD for cmove insns
  [APX NDD] Support TImode shift for NDD

Kong Lingling (9):
  [APX NDD] Support Intel APX NDD for legacy add insn
  [APX NDD] Support APX NDD for optimization patterns of add
  [APX NDD] Support APX NDD for adc insns
  [APX NDD] Support APX NDD for sub insns
  [APX NDD] Support APX NDD for sbb insn
  [APX NDD] Support APX NDD for neg insn
  [APX NDD] Support APX NDD for not insn
  [APX NDD] Support APX NDD for and insn
  [APX NDD] Support APX NDD for or/xor insn

 gcc/config/i386/constraints.md|5 +
 gcc/config/i386/i386-expand.cc|  164 +-
 gcc/config/i386/i386-options.cc   |2 +
 gcc/config/i386/i386-protos.h |   16 +-
 gcc/config/i386/i386.cc   |   30 +-
 gcc/config/i386/i386.md   | 2325 +++--
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |6 +
 .../gcc.target/i386/apx-ndd-shld-shrd.c   |   24 +
 .../gcc.target/i386/apx-ndd-ti-shift.c|   91 +
 gcc/testsuite/gcc.target/i386/apx-ndd.c   |  202 ++
 12 files changed, 2141 insertions(+), 755 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

-- 
2.31.1



[PATCH 08/16] [APX NDD] Support APX NDD for not insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

For *one_cmplsi2_2_zext, it will be split to xor, so its NDD form will be
added together with the xor NDD support.

gcc/ChangeLog:

* config/i386/i386.md (one_cmpl2): Add new constraints for NDD
and adjust output template.
(*one_cmpl2_1): Likewise.
(*one_cmplqi2_1): Likewise.
(*one_cmpl2_doubleword): Likewise, and adopt '&' to NDD dest.
(*one_cmpl2_2): Likewise.
(*one_cmplsi2_1_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add not test.
---
 gcc/config/i386/i386.md | 58 ++---
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 11 +
 2 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e97c1784e9a..61b7b79543b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14006,57 +14006,63 @@ (define_expand "one_cmpl2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(not:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NOT, mode, operands); DONE;")
+  "ix86_expand_unary_operator (NOT, mode, operands,
+  TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*one_cmpl2_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
-   (not: (match_operand: 1 "nonimmediate_operand" "0")))]
-  "ix86_unary_operator_ok (NOT, mode, operands)"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,&r")
+   (not: (match_operand: 1 "nonimmediate_operand" "0,ro")))]
+  "ix86_unary_operator_ok (NOT, mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
(not:DWIH (match_dup 1)))
(set (match_dup 2)
(not:DWIH (match_dup 3)))]
-  "split_double_mode (mode, &operands[0], 2, &operands[0], 
&operands[2]);")
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[2]);"
+  [(set_attr "isa" "*,apx_ndd")])
 
 (define_insn "*one_cmpl2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,?k")
-   (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,k")))]
-  "ix86_unary_operator_ok (NOT, mode, operands)"
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+   (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, mode, operands, TARGET_APX_NDD)"
   "@
not{}\t%0
+   not{}\t{%1, %0|%0, %1}
#"
-  [(set_attr "isa" "*,")
-   (set_attr "type" "negnot,msklog")
+  [(set_attr "isa" "*,apx_ndd,")
+   (set_attr "type" "negnot,negnot,msklog")
(set_attr "mode" "")])
 
 (define_insn "*one_cmplsi2_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,?k")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,?k")
(zero_extend:DI
- (not:SI (match_operand:SI 1 "register_operand" "0,k"]
-  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands)"
+ (not:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,k"]
+  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands,
+  TARGET_APX_NDD)"
   "@
not{l}\t%k0
+   not{l}\t{%1, %k0|%k0, %1}
#"
-  [(set_attr "isa" "x64,avx512bw_512")
-   (set_attr "type" "negnot,msklog")
-   (set_attr "mode" "SI,SI")])
+  [(set_attr "isa" "x64,apx_ndd,avx512bw_512")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "mode" "SI,SI,SI")])
 
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,?k")
-   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,k")))]
-  "ix86_unary_operator_ok (NOT, QImode, operands)"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,r,?k")
+   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, QImode, operands, TARGET_APX_NDD)"
   "@
not{b}\t%0
not{l}\t%k0
+   not{b}\t{%1, %0|%0, %1}
#"
-  [(set_attr "isa" "*,*,avx512f")
-   (set_attr "type" "negnot,negnot,msklog")
+  [(set_attr "isa" "*,*,apx_ndd,avx512f")
+   (set_attr "type" "negnot,negnot,negnot,msklog")
(set (attr "mode")
(cond [(eq_attr "alternative" "1")
 (const_string "SI")
-   (and (eq_attr "alternative" "2")
+   (and (eq_attr "alternative" "3")
 (match_test "!TARGET_AVX512DQ"))
 (const_string "HI")
   ]
@@ -14086,14 +14092,16 @@ (define_insn_and_split "*one_cmpl_1_slp"
 
 (define_insn "*one_cmpl2_2"
   [(set (reg FLAGS_REG)
-   (compare (not:SWI (match_operand:SWI 1 "nonimmediate_operand" "0"))
+   (compare (not:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm"))
 (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,r")
(not:S

[PATCH 01/16] [APX NDD] Support Intel APX NDD for legacy add insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

APX NDD provides an extra destination register operand for several
GPR-related legacy insns, so a new alternative can be adopted, with
operand 1 taking an "r" constraint.

This first patch supports NDD for the add instruction, and keeps using
lea when all operands are registers, since lea has a shorter encoding.
For add operations containing a memory operand, NDD will be adopted to
save an extra move.

The legacy x86 binary-operation expander forces operands[0] and
operands[1] to be the same, so add a helper-function flag to allow the
NDD form pattern, in which operands[0] and operands[1] can differ.
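As a rough illustration of the difference (a C emulation of the operand semantics, not GCC code or actual instruction encodings):

```c
#include <stdint.h>

/* Legacy two-operand add: the destination doubles as the first source,
   so the input value in that register is destroyed; GCC must copy
   operands[1] into operands[0] first when they differ.  */
void legacy_add64 (uint64_t *dst_src1, uint64_t src2)
{
  *dst_src1 += src2;
}

/* APX NDD three-operand add: a distinct destination receives the
   result and both sources survive, so no preparatory copy is needed.  */
uint64_t ndd_add64 (uint64_t src1, uint64_t src2)
{
  return src1 + src2;
}
```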

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_fixup_binary_operands): Add
new use_ndd flag to check whether ndd can be used for this binop
and adjust operand emit.
(ix86_binary_operator_ok): Likewise.
(ix86_expand_binary_operator): Likewise, and avoid generating
the lea pattern in postreload expand when use_ndd is explicitly passed.
* config/i386/i386-options.cc (ix86_option_override_internal):
Prohibit apx subfeatures when not in 64bit mode.
* config/i386/i386-protos.h (ix86_binary_operator_ok):
Add use_ndd flag.
(ix86_fixup_binary_operands): Likewise.
(ix86_expand_binary_operator): Likewise.
* config/i386/i386.md (*add_1): Extend with new alternatives
to support NDD, and adjust output template.
(*addhi_1): Likewise.
(*addqi_1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: New test.
---
 gcc/config/i386/i386-expand.cc  |  19 ++---
 gcc/config/i386/i386-options.cc |   2 +
 gcc/config/i386/i386-protos.h   |   6 +-
 gcc/config/i386/i386.md | 102 ++--
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  21 +
 5 files changed, 96 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4bd7d4f39c8..3ecda989cf8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1260,14 +1260,14 @@ ix86_swap_binary_operands_p (enum rtx_code code, 
machine_mode mode,
   return false;
 }
 
-
 /* Fix up OPERANDS to satisfy ix86_binary_operator_ok.  Return the
destination to use for the operation.  If different from the true
-   destination in operands[0], a copy operation will be required.  */
+   destination in operands[0], a copy operation will be required except
+   under TARGET_APX_NDD.  */
 
 rtx
 ix86_fixup_binary_operands (enum rtx_code code, machine_mode mode,
-   rtx operands[])
+   rtx operands[], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1307,7 +1307,7 @@ ix86_fixup_binary_operands (enum rtx_code code, 
machine_mode mode,
 src1 = force_reg (mode, src1);
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
 src1 = force_reg (mode, src1);
 
   /* Improve address combine.  */
@@ -1338,11 +1338,11 @@ ix86_fixup_binary_operands_no_copy (enum rtx_code code,
 
 void
 ix86_expand_binary_operator (enum rtx_code code, machine_mode mode,
-rtx operands[])
+rtx operands[], bool use_ndd)
 {
   rtx src1, src2, dst, op, clob;
 
-  dst = ix86_fixup_binary_operands (code, mode, operands);
+  dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   src1 = operands[1];
   src2 = operands[2];
 
@@ -1352,7 +1352,8 @@ ix86_expand_binary_operator (enum rtx_code code, 
machine_mode mode,
 
   if (reload_completed
   && code == PLUS
-  && !rtx_equal_p (dst, src1))
+  && !rtx_equal_p (dst, src1)
+  && !use_ndd)
 {
   /* This is going to be an LEA; avoid splitting it later.  */
   emit_insn (op);
@@ -1451,7 +1452,7 @@ ix86_expand_vector_logical_operator (enum rtx_code code, 
machine_mode mode,
 
 bool
 ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
-rtx operands[3])
+rtx operands[3], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1475,7 +1476,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode 
mode,
 return false;
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
 /* Support "andhi/andsi/anddi" as a zero-extending move.  */
 return (code == AND
&& (mode == HImode
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index f86ad332aad..7d0a253e07f 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2129,6 +2129,8 @@ ix86_option_override_internal (bool main_args_p,
 
   if (TARGET_APX_F && !TARGET_64BIT)
 error ("%<-mapxf%

[PATCH 03/16] [APX NDD] Disable seg_prefixed memory usage for NDD add

2023-12-06 Thread Hongyu Wang
NDD uses the EVEX prefix, so when a segment prefix is also applied the
instruction could exceed its 15-byte length limit, especially when adding
immediates. This can happen when the "e" constraint accepts any
UNSPEC_TPOFF/UNSPEC_NTPOFF constant: the offset is applied relative to the
segment register, which is encoded using a segment prefix. Disable such
*POFF constants in the NDD add alternatives with a new constraint.
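A back-of-the-envelope byte count (my own sketch; the exact encoding varies by instruction and addressing mode) shows why the two prefixes together are a problem:

```c
/* Rough worst-case length of an NDD add of a 32-bit immediate to a
   TLS-offset memory operand: segment override + 4-byte EVEX prefix
   + opcode + ModRM + SIB + 4-byte displacement + 4-byte immediate.  */
int ndd_tls_add_len (void)
{
  return 1   /* segment override prefix */
       + 4   /* EVEX prefix */
       + 1   /* opcode */
       + 1   /* ModRM */
       + 1   /* SIB */
       + 4   /* disp32 */
       + 4;  /* imm32 */
}
/* 16 bytes, over the architectural 15-byte instruction length limit.  */
```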

gcc/ChangeLog:

* config/i386/constraints.md (je): New constraint.
* config/i386/i386-protos.h (x86_poff_operand_p): New prototype.
* config/i386/i386.cc (x86_poff_operand_p): New function to
check for any *POFF constant in operand.
* config/i386/i386.md (*add_1): Split out je alternative for add.
---
 gcc/config/i386/constraints.md |  5 +
 gcc/config/i386/i386-protos.h  |  1 +
 gcc/config/i386/i386.cc| 25 +
 gcc/config/i386/i386.md|  8 
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index cbee31fa40a..f4c3c3dd952 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -433,3 +433,8 @@ (define_address_constraint "jb"
 
 (define_register_constraint  "jc"
  "TARGET_APX_EGPR && !TARGET_AVX ? GENERAL_GPR16 : GENERAL_REGS")
+
+(define_constraint  "je"
+  "@internal constant that does not allow any unspec global offsets"
+  (and (match_operand 0 "x86_64_immediate_operand")
+   (match_test "!x86_poff_operand_p (op)")))
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a9d0c568bba..7dfeb6af225 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -66,6 +66,7 @@ extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_evex_reg_mentioned_p (rtx [], int);
+extern bool x86_poff_operand_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 7c5cab4e2c6..8aa33aef7e1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23331,6 +23331,31 @@ x86_evex_reg_mentioned_p (rtx operands[], int nops)
   return false;
 }
 
+/* Return true when rtx operand does not contain any UNSPEC_*POFF related
+   constant, to avoid APX_NDD insns exceeding the encoding length limit.  */
+bool
+x86_poff_operand_p (rtx operand)
+{
+  if (GET_CODE (operand) == CONST)
+{
+  rtx op = XEXP (operand, 0);
+  if (GET_CODE (op) == PLUS)
+   op = XEXP (op, 0);
+   
+  if (GET_CODE (op) == UNSPEC)
+   {
+ int unspec = XINT (op, 1);
+ return (unspec == UNSPEC_NTPOFF
+ || unspec == UNSPEC_TPOFF
+ || unspec == UNSPEC_DTPOFF
+ || unspec == UNSPEC_GOTTPOFF
+ || unspec == UNSPEC_GOTNTPOFF
+ || unspec == UNSPEC_INDNTPOFF);
+   }
+}
+  return false;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1e846183347..a1626121227 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6418,10 +6418,10 @@ (define_insn_and_split 
"*add3_doubleword_concat_zext"
  "split_double_mode (mode, &operands[0], 1, &operands[0], &operands[5]);")
 
 (define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
(plus:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r")
- (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,re,BM")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r,m,r")
+ (match_operand:SWI48 2 "x86_64_general_operand" 
"re,BM,0,le,r,e,je,BM")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands,
TARGET_APX_NDD)"
@@ -6457,7 +6457,7 @@ (define_insn "*add_1"
: "add{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd")
+  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd")
(set (attr "type")
  (cond [(eq_attr "alternative" "3")
   (const_string "lea")
-- 
2.31.1



[PATCH 10/16] [APX NDD] Support APX NDD for or/xor insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

Similar to the AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.

Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will
generate xor insn.
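The doubleword splitter below handles the halves whose constant part is zero: in the legacy form operands[0] and operands[1] are tied, so the no-op half can be deleted outright, but with NDD the result must still land in the (possibly different) destination. A C rendering of that decision, under the patch's stated intent:

```c
#include <stdbool.h>
#include <stdint.h>

/* One doubleword half of an OR whose constant half is zero.  With a
   tied destination (dst == src1) nothing needs to happen; with an NDD
   destination that differs from src1, an explicit move must still be
   emitted or the result half would be lost.  */
bool needs_move_for_zero_half (uint64_t *dst, const uint64_t *src1)
{
  if (dst == src1)
    return false;   /* legacy form: the insn is simply deleted */
  *dst = *src1;     /* NDD form: keep the copy */
  return true;
}
```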

gcc/ChangeLog:

* config/i386/i386.md (3): Add new alternative for NDD
and adjust output templates.
(*_1): Likewise.
(*qi_1): Likewise.
(*notxor_1): Likewise.
(*si_1_zext): Likewise.
(*notxorqi_1): Likewise.
(*_2): Likewise.
(*si_2_zext): Likewise.
(*si_2_zext_imm): Likewise.
(*si_1_zext_imm): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*one_cmplsi2_2_zext): Likewise.
(define_split for *one_cmplsi2_2_zext): Use nonimmediate_operand for
operands[3].
(*3_doubleword): Add NDD constraints, adopt '&' to NDD dest
and emit move for optimized case if operands[0] != operands[1] or
operands[4] != operands[5].
(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
form OR/XOR insn to qi_ext_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *3_1_slp.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add or and xor test.
---
 gcc/config/i386/i386.md | 186 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  26 
 2 files changed, 143 insertions(+), 69 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d2528e0dcf6..ad4c958a1e8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12703,17 +12703,19 @@ (define_expand "3"
   && !x86_64_hilo_general_operand (operands[2], mode))
 operands[2] = force_reg (mode, operands[2]);
 
-  ix86_expand_binary_operator (, mode, operands);
+  ix86_expand_binary_operator (, mode, operands,
+  TARGET_APX_NDD);
   DONE;
 })
 
 (define_insn_and_split "*3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,&r,&r")
(any_or:
-(match_operand: 1 "nonimmediate_operand" "%0,0")
-(match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+(match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+(match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -12725,20 +12727,29 @@ (define_insn_and_split "*3_doubleword"
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
 
   if (operands[2] == const0_rtx)
-emit_insn_deleted_note_p = true;
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  else
+   emit_insn_deleted_note_p = true;
+}
   else if (operands[2] == constm1_rtx)
 {
   if ( == IOR)
emit_move_insn (operands[0], constm1_rtx);
   else
-   ix86_expand_unary_operator (NOT, mode, &operands[0]);
+   ix86_expand_unary_operator (NOT, mode, &operands[0],
+   TARGET_APX_NDD);
 }
   else
-ix86_expand_binary_operator (, mode, &operands[0]);
+ix86_expand_binary_operator (, mode, &operands[0],
+TARGET_APX_NDD);
 
   if (operands[5] == const0_rtx)
 {
-  if (emit_insn_deleted_note_p)
+  if (!rtx_equal_p (operands[3], operands[4]))
+   emit_move_insn (operands[3], operands[4]);
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
 }
   else if (operands[5] == constm1_rtx)
@@ -12746,37 +12757,43 @@ (define_insn_and_split "*3_doubleword"
   if ( == IOR)
emit_move_insn (operands[3], constm1_rtx);
   else
-   ix86_expand_unary_operator (NOT, mode, &operands[3]);
+   ix86_expand_unary_operator (NOT, mode, &operands[3],
+   TARGET_APX_NDD);
 }
   else
-ix86_expand_binary_operator (, mode, &operands[3]);
+ix86_expand_binary_operator (, mode, &operands[3],
+TARGET_APX_NDD);
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,r,r,?k")
(any_or:SWI248
-(match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
-(match_operand:SWI248 2 "" "r,,k")))
+(match_operand:SWI248 1 "nonimmediate_operand" "%0,0,rm,r,k")
+(match_operand:SWI248 2 "" "r,,r,,k")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARG

[PATCH 04/16] [APX NDD] Support APX NDD for adc insns

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

Legacy adc patterns are commonly adopted for TImode add; when extending the
TImode add to its NDD version, operands[0] and operands[1] can be different,
so an extra move should be emitted if those patterns have an optimization
for adding const0_rtx.

For a TImode insn, the registers of operands[0] and operands[1] can overlap,
as x86 allocates TImode register pairs sequentially, like rax:rdi, rdi:rdx.
After the postreload split for TImode, the write to the 1st highpart rdi
would be overridden by the 2nd lowpart rdi if the 2nd lowpart rdi has a
different src as input; the write to the 1st highpart rdi would then be
lost and cause miscompilation.
In addition, when the input operands contain memory, the address register
may also overlap with the dest register if it is marked dead after one of
the highpart/lowpart operations is done.
So the earlyclobber modifier '&' should be added to the NDD dest to avoid
overlap between the dest and src operands.

NDD instructions automatically zero-extend the dest register to 64 bits, so
the zext patterns can adopt all NDD forms that have memory src input.
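The overlap hazard reads roughly like this when modeled in plain C (a simulation of overlapping register pairs, not GCC code):

```c
#include <stdint.h>

/* regs[0..2] model three consecutive 64-bit GPRs.  The TImode source
   pair is (regs[0], regs[1]) and the overlapping destination pair is
   (regs[1], regs[2]).  Copy the pair one 64-bit half at a time, as a
   postreload doubleword split would.  */
void copy_pair (uint64_t regs[3], int low_half_first)
{
  if (low_half_first)
    {
      regs[1] = regs[0];   /* clobbers the source high half ...        */
      regs[2] = regs[1];   /* ... so the wrong value lands in the high
                              half of the destination                  */
    }
  else
    {
      regs[2] = regs[1];   /* read the source high half while intact   */
      regs[1] = regs[0];
    }
}
```

The earlyclobber '&' sidesteps the problem entirely: the register allocator is forbidden from creating the overlapping assignment, so no split order can go wrong.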

gcc/ChangeLog:

* config/i386/i386.md (*add3_doubleword): Add ndd alternatives,
adopt '&' to ndd dest and move operands[1] to operands[0] when they are
not equal.
(*add3_doubleword_cc_overflow_1): Likewise.
(*addv4_doubleword): Likewise.
(*addv4_doubleword_1): Likewise.
(*add3_doubleword_zext): Likewise.
(addv4_overflow_1): Add ndd alternatives.
(*addv4_overflow_2): Likewise.
(@add3_carry): Likewise.
(*add3_carry_0): Likewise.
(*addsi3_carry_zext): Likewise.
(addcarry): Likewise.
(addcarry_0): Likewise.
(*addcarry_1): Likewise.
(*add3_eq): Likewise.
(*add3_ne): Likewise.
(*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-adc.c: New test.
---
 gcc/config/i386/i386.md | 193 
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c |  15 ++
 2 files changed, 136 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a1626121227..8dd8216041e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6294,12 +6294,12 @@ (define_expand "add3"
TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*add3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,&r,&r")
(plus:
- (match_operand: 1 "nonimmediate_operand" "%0,0")
- (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+ (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,r")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, mode, operands)"
+  "ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6319,24 +6319,34 @@ (define_insn_and_split "*add3_doubleword"
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
 {
+  /* Under NDD op0 and op1 may not equal, do not delete insn then.  */
+  bool emit_insn_deleted_note_p = true;
+  if (!rtx_equal_p (operands[0], operands[1]))
+   {
+ emit_move_insn (operands[0], operands[1]);
+ emit_insn_deleted_note_p = false;
+   }
   if (operands[5] != const0_rtx)
-   ix86_expand_binary_operator (PLUS, mode, &operands[3]);
+   ix86_expand_binary_operator (PLUS, mode, &operands[3],
+TARGET_APX_NDD);
   else if (!rtx_equal_p (operands[3], operands[4]))
emit_move_insn (operands[3], operands[4]);
-  else
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
   DONE;
 }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add3_doubleword_zext"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o,&r,&r")
(plus:
  (zero_extend:
-   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r")) 
- (match_operand: 1 "nonimmediate_operand" "0,0")))
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"))
+ (match_operand: 1 "nonimmediate_operand" "0,0,r,m")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6352,7 +6362,8 @@ (define_insn_and_split "*add3_doubleword_zext"
   (match_dup 4))
   

[PATCH 06/16] [APX NDD] Support APX NDD for sbb insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

Similar to *add3_doubleword, operands[1] may not equal operands[0], so an
extra move and an earlyclobber are required.

gcc/ChangeLog:

* config/i386/i386.md (*sub3_doubleword): Add new alternative for
NDD, adopt '&' modifier to NDD dest and emit move when operands[0] not
equal to operands[1].
(*sub3_doubleword_zext): Likewise.
(*subv4_doubleword): Likewise.
(*subv4_doubleword_1): Likewise.
(*subv4_overflow_1): Add NDD alternatives and adjust output
templates.
(*subv4_overflow_2): Likewise.
(@sub3_carry): Likewise.
(*addsi3_carry_zext_0r): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*subsi3_carry_zext): Likewise.
(subborrow): Parse TARGET_APX_NDD to ix86_binary_operator_ok.
(subborrow_0): Likewise.
(*sub3_eq): Likewise.
(*sub3_ne): Likewise.
(*sub3_eq_1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-sbb.c: New test.
---
 gcc/config/i386/i386.md | 160 
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c |   6 +
 2 files changed, 107 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6ec498725aa..90981e733bd 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7781,12 +7781,13 @@ (define_expand "sub3"
TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*sub3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,&r,&r")
(minus:
- (match_operand: 1 "nonimmediate_operand" "0,0")
- (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+ (match_operand: 1 "nonimmediate_operand" "0,0,ro,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7810,16 +7811,18 @@ (define_insn_and_split "*sub3_doubleword"
   TARGET_APX_NDD);
   DONE;
 }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*sub3_doubleword_zext"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o,&r,&r")
(minus:
- (match_operand: 1 "nonimmediate_operand" "0,0")
+ (match_operand: 1 "nonimmediate_operand" "0,0,r,o")
  (zero_extend:
-   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7833,7 +7836,8 @@ (define_insn_and_split "*sub3_doubleword_zext"
   (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
 (const_int 0)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (mode, &operands[0], 2, &operands[0], 
&operands[3]);")
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);"
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*sub_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r")
@@ -8167,14 +8171,15 @@ (define_insn_and_split "*subv4_doubleword"
(eq:CCO
  (minus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "0,0"))
+ (match_operand: 1 "nonimmediate_operand" "0,0,ro,r"))
(sign_extend:
- (match_operand: 2 "nonimmediate_operand" "r,o")))
+ (match_operand: 2 "nonimmediate_operand" "r,o,r,o")))
  (sign_extend:
(minus: (match_dup 1) (match_dup 2)
-   (set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+   (set (match_operand: 0 "nonimmediate_operand" "=ro,r,&r,&r")
(minus: (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -8202,22 +8207,24 @@ (define_insn_and_split "*subv4_doubleword"
 (match_dup 5)))])]
 {
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*subv4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
(eq:CCO
  (minus:
(sign_extend:
- (match_operand: 1 "no

[PATCH 07/16] [APX NDD] Support APX NDD for neg insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
parameter and adjust for NDD.
* config/i386/i386-protos.h: Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
and adjust for NDD.
* config/i386/i386.md (neg2): Add new constraint for NDD and
adjust output template.
(*neg_1): Likewise.
(*neg2_doubleword): Likewise and adopt '&' to NDD dest.
(*neg_2): Likewise.
(*neg_ccc_1): Likewise.
(*neg_ccc_2): Likewise.
(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternatives.
(*negsi_2_zext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add neg test.
---
 gcc/config/i386/i386-expand.cc  |  4 +-
 gcc/config/i386/i386-protos.h   |  5 +-
 gcc/config/i386/i386.cc |  5 +-
 gcc/config/i386/i386.md | 77 -
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 29 ++
 5 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 93ecde4b4a8..d4bbd33ce07 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1494,7 +1494,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
 
 void
 ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
-   rtx operands[])
+   rtx operands[], bool use_ndd)
 {
   bool matching_memory = false;
   rtx src, dst, op, clob;
@@ -1513,7 +1513,7 @@ ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
 }
 
   /* When source operand is memory, destination must match.  */
-  if (MEM_P (src) && !matching_memory)
+  if (!use_ndd && MEM_P (src) && !matching_memory)
 src = force_reg (mode, src);
 
   /* Emit the instruction.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 481527872e8..fa952409729 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -127,7 +127,7 @@ extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
 extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
-   rtx[]);
+   rtx[], bool = false);
 extern rtx ix86_build_const_vector (machine_mode, bool, rtx);
 extern rtx ix86_build_signbit_mask (machine_mode, bool, bool);
 extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx,
@@ -147,7 +147,8 @@ extern void ix86_split_fp_absneg_operator (enum rtx_code, machine_mode,
   rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_expand_xorsign (rtx []);
-extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2]);
+extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2],
+   bool = false);
 extern bool ix86_match_ccmode (rtx, machine_mode);
 extern bool ix86_match_ptest_ccmode (rtx);
 extern void ix86_expand_branch (enum rtx_code, rtx, rtx, rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8aa33aef7e1..4b6bad37c8f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16209,11 +16209,12 @@ ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn)
 bool
 ix86_unary_operator_ok (enum rtx_code,
machine_mode,
-   rtx operands[2])
+   rtx operands[2],
+   bool use_ndd)
 {
   /* If one of operands is memory, source and destination must match.  */
   if ((MEM_P (operands[0])
-   || MEM_P (operands[1]))
+   || (!use_ndd && MEM_P (operands[1])))
   && ! rtx_equal_p (operands[0], operands[1]))
 return false;
   return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 90981e733bd..e97c1784e9a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13287,13 +13287,14 @@ (define_expand "neg2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(neg:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NEG, mode, operands); DONE;")
+  "ix86_expand_unary_operator (NEG, mode, operands,
+  TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*neg2_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
-   (neg: (match_operand: 1 "nonimmediate_operand" "0")))
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,&r")
+   (neg: (match_operand: 1 "nonimmedia

[PATCH 09/16] [APX NDD] Support APX NDD for and insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

For the NDD form of the AND insn, three splitter fixes are needed after extending
the legacy patterns.

1. APX NDD does not support the high QImode registers ah, bh, ch, dh, so
optimization splitters that generate a highpart zero_extract for QImode must be
prohibited for the NDD patterns.

2. The legacy AND insn uses the r/qm/L constraints, and a post-reload splitter
transforms it into a zero_extend move. For the NDD form of AND, the splitter is
not strict enough: it assumes such an AND has a const_int operand matching the
constraint "L", while the NDD form allows a const_int with any QI value. Restrict
the splitter condition to match the "L" constraint, which strictly matches the
zero-extend semantics.

3. The legacy AND insn adopts the r/0/Z constraints, and a splitter tries to
optimize such a form into a strict_lowpart QImode AND when the 7th bit is not set.
But the splitter wrongly converts the non-zext NDD form of AND with a memory
source: the strict_lowpart transform then matches alternative 1 of *_slp_1 and
generates *movstrict_1, so the zext semantics are lost. This can leave the
highpart of the destination uncleared and generate wrong code. Disable the
splitter when NDD is adopted and operands[0] and operands[1] are not equal.

gcc/ChangeLog:

* config/i386/i386.md (and3): Add NDD alternatives and adjust
output template.
(*anddi_1): Likewise.
(*and_1): Likewise.
(*andqi_1): Likewise.
(*andsi_1_zext): Likewise.
(*anddi_2): Likewise.
(*andsi_2_zext): Likewise.
(*andqi_2_maybe_si): Likewise.
(*and_2): Likewise.
(*and3_doubleword): Add NDD alternative, adopt '&' to NDD dest and
emit move for optimized case if operands[0] not equal to operands[1].
(define_split for QI highpart AND): Prohibit splitter to split NDD
form AND insn to qi_ext_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *3_1_slp.
(define_split for zero_extend and optimization): Prohibit splitter to
split NDD form AND insn to zero_extend insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add and test.
---
 gcc/config/i386/i386.md | 175 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 2 files changed, 127 insertions(+), 61 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 61b7b79543b..d2528e0dcf6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11710,18 +11710,19 @@ (define_expand "and3"
   (operands[0], gen_lowpart (mode, operands[1]),
mode, mode, 1));
   else
-ix86_expand_binary_operator (AND, mode, operands);
+ix86_expand_binary_operator (AND, mode, operands,
+TARGET_APX_NDD);
 
   DONE;
 })
 
 (define_insn_and_split "*and3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,&r,&r")
(and:
-(match_operand: 1 "nonimmediate_operand" "%0,0")
-(match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+(match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+(match_operand: 2 "x86_64_hilo_general_operand" "r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (AND, mode, operands)"
+  "ix86_binary_operator_ok (AND, mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -11733,39 +11734,53 @@ (define_insn_and_split "*and3_doubleword"
   if (operands[2] == const0_rtx)
 emit_move_insn (operands[0], const0_rtx);
   else if (operands[2] == constm1_rtx)
-emit_insn_deleted_note_p = true;
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  else
+   emit_insn_deleted_note_p = true;
+}
   else
-ix86_expand_binary_operator (AND, mode, &operands[0]);
+ix86_expand_binary_operator (AND, mode, &operands[0],
+TARGET_APX_NDD);
 
   if (operands[5] == const0_rtx)
 emit_move_insn (operands[3], const0_rtx);
   else if (operands[5] == constm1_rtx)
 {
-  if (emit_insn_deleted_note_p)
+  if (!rtx_equal_p (operands[3], operands[4]))
+   emit_move_insn (operands[3], operands[4]);
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
 }
   else
-ix86_expand_binary_operator (AND, mode, &operands[3]);
+ix86_expand_binary_operator (AND, mode, &operands[3],
+TARGET_APX_NDD);
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,?k")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm,r,r,r,r,?k")
(and:DI
-(match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
-(match_operand:DI 2 "x86_64_szext_general_operand" "Z,

[PATCH 15/16] [APX NDD] Support APX NDD for cmove insns

2023-12-06 Thread Hongyu Wang
gcc/ChangeLog:

* config/i386/i386.md (*movcc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-cmov.c: New test.
---
 gcc/config/i386/i386.md  | 48 
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c | 16 +++
 2 files changed, 45 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5c6275430d6..017ab720293 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -24417,47 +24417,56 @@ (define_split
(neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0])
 
 (define_insn "*movcc_noc"
-  [(set (match_operand:SWI248 0 "register_operand" "=r,r")
+  [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r")
(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
- (match_operand:SWI248 2 "nonimmediate_operand" "rm,0")
- (match_operand:SWI248 3 "nonimmediate_operand" "0,rm")))]
+ (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r")
+ (match_operand:SWI248 3 "nonimmediate_operand" "0,rm,r,rm")))]
   "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %0|%0, %2}
-   cmov%O2%c1\t{%3, %0|%0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %0|%0, %3}
+   cmov%O2%C1\t{%2, %3, %0|%0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "")])
 
 (define_insn "*movsicc_noc_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(if_then_else:DI (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
  (zero_extend:DI
-   (match_operand:SI 2 "nonimmediate_operand" "rm,0"))
+   (match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r"))
  (zero_extend:DI
-   (match_operand:SI 3 "nonimmediate_operand" "0,rm"]
+   (match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"]
   "TARGET_64BIT
&& TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
 (define_insn "*movsicc_noc_zext_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,r")
(zero_extend:DI
  (if_then_else:SI (match_operator 1 "ix86_comparison_operator"
 [(reg FLAGS_REG) (const_int 0)])
-(match_operand:SI 2 "nonimmediate_operand" "rm,0")
-(match_operand:SI 3 "nonimmediate_operand" "0,rm"]
+(match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r")
+(match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"]
   "TARGET_64BIT
&& TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
 
@@ -24482,14 +24491,15 @@ (define_split
 })
 
 (define_insn "*movqicc_noc"
-  [(set (match_operand:QI 0 "register_operand" "=r,r")
+  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
(if_then_else:QI (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
- (match_operand:QI 2 "register_operand" "r,0")
- (match_operand:QI 3 "register_operand" "0,r")))]
+ (match_operand:QI 2 "register_operand" "r,0,r")
+ (match_operand:QI 3 "register_operand" "0,r,r")))]
   "TARGET_CMOVE && !TARGET_PARTIAL_REG_STALL"
   "#"
-  [(set_attr "type" "icmov")
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "QI")])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
new file mode 100644
index 000..459dc965342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -m64 -mapxf" } */
+/* { dg-final { scan-assembler-times "cmove\[^\n\r]*, %eax" 1 } } */
+/* 

[PATCH 14/16] [APX NDD] Support APX NDD for shld/shrd insns

2023-12-06 Thread Hongyu Wang
For the shld/shrd insns, the old patterns use match_dup 0 as the shift source,
with +r*m as its constraint. To support NDD, new define_insns are added that
handle the NDD form with an extra input operand and a destination operand fixed
in a register.

gcc/ChangeLog:

* config/i386/i386.md (x86_64_shld_ndd): New define_insn.
(x86_64_shld_ndd_1): Likewise.
(*x86_64_shld_ndd_2): Likewise.
(x86_shld_ndd): Likewise.
(x86_shld_ndd_1): Likewise.
(*x86_shld_ndd_2): Likewise.
(x86_64_shrd_ndd): Likewise.
(x86_64_shrd_ndd_1): Likewise.
(*x86_64_shrd_ndd_2): Likewise.
(x86_shrd_ndd): Likewise.
(x86_shrd_ndd_1): Likewise.
(*x86_shrd_ndd_2): Likewise.
(*x86_64_shld_shrd_1_nozext): Adjust codegen under TARGET_APX_NDD.
(*x86_shld_shrd_1_nozext): Likewise.
(*x86_64_shrd_shld_1_nozext): Likewise.
(*x86_shrd_shld_1_nozext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-shld-shrd.c: New test.
---
 gcc/config/i386/i386.md   | 322 +-
 .../gcc.target/i386/apx-ndd-shld-shrd.c   |  24 ++
 2 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6e4ac776f8a..5c6275430d6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14510,6 +14510,23 @@ (define_insn "x86_64_shld"
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+ (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
+ (const_int 63)))
+   (subreg:DI
+ (lshiftrt:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "register_operand" "r"))
+   (minus:QI (const_int 64)
+ (and:QI (match_dup 3) (const_int 63 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD"
+  "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")])
+
 (define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
@@ -14531,6 +14548,24 @@ (define_insn "x86_64_shld_1"
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+  (match_operand:QI 3 "const_0_to_63_operand"))
+   (subreg:DI
+ (lshiftrt:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "register_operand" "r"))
+   (match_operand:QI 4 "const_0_to_255_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")
+   (set_attr "length_immediate" "1")])
+
+
 (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   [(set (match_operand:DI 0 "nonimmediate_operand")
(ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand")
@@ -14556,6 +14591,23 @@ (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   operands[4] = force_reg (DImode, operands[4]);
   emit_insn (gen_x86_64_shrd_1 (operands[0], operands[4], operands[3], operands[2]));
 }
+  else if (TARGET_APX_NDD)
+{
+ rtx tmp = gen_reg_rtx (DImode);
+ if (MEM_P (operands[4]))
+   {
+operands[1] = force_reg (DImode, operands[1]);
+emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+  operands[2], operands[3]));
+   }
+ else if (MEM_P (operands[1]))
+   emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[1], operands[4],
+operands[3], operands[2]));
+ else
+   emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+operands[2], operands[3]));
+ emit_move_insn (operands[0], tmp);
+}
   else
{
  operands[1] = force_reg (DImode, operands[1]);
@@ -14588,6 +14640,33 @@ (define_insn_and_split "*x86_64_shld_2"
   (const_int 63 0)))
  (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_64_shld_ndd_2"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+   (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand")
+  (match_operand:QI 3 "nonmemory_operand"))
+   (lshiftrt:DI (match_operand:DI 2 "register_operand")
+(mi

[PATCH 02/16] [APX NDD] Support APX NDD for optimization patterns of add

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386.md: (addsi_1_zext): Add new alternatives for
NDD and adjust output templates.
(*add_2): Likewise.
(*addsi_2_zext): Likewise.
(*add_3): Likewise.
(*addsi_3_zext): Likewise.
(*adddi_4): Likewise.
(*add_4): Likewise.
(*add_5): Likewise.
(*addv4): Likewise.
(*addv4_1): Likewise.
(*add3_cconly_overflow_1): Likewise.
(*add3_cc_overflow_1): Likewise.
(*addsi3_zext_cc_overflow_1): Likewise.
(*add3_cconly_overflow_2): Likewise.
(*add3_cc_overflow_2): Likewise.
(*addsi3_zext_cc_overflow_2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add more test.
---
 gcc/config/i386/i386.md | 310 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  53 ++--
 2 files changed, 232 insertions(+), 131 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a5b123a51bd..1e846183347 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6479,13 +6479,15 @@ (define_insn "*add_1"
 ;; patterns constructed from addsi_1 to match.
 
 (define_insn "addsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r,r")
(zero_extend:DI
- (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r")
-  (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le"
+ (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,r,rm")
+  (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le,rBMe,re"
(clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -6493,11 +6495,13 @@ (define_insn "addsi_1_zext"
 
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return "inc{l}\t%k0";
+return use_ndd ? "inc{l}\t{%1, %k0|%k0, %1}"
+  : "inc{l}\t%k0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
-  return "dec{l}\t%k0";
+ return use_ndd ? "dec{l}\t{%1, %k0|%k0, %1}"
+: "dec{l}\t%k0";
}
 
 default:
@@ -6507,12 +6511,15 @@ (define_insn "addsi_1_zext"
 std::swap (operands[1], operands[2]);
 
   if (x86_maybe_negate_const_int (&operands[2], SImode))
-return "sub{l}\t{%2, %k0|%k0, %2}";
+return use_ndd ? "sub{l}\t{%2 ,%1, %k0|%k0, %1, %2}"
+  : "sub{l}\t{%2, %k0|%k0, %2}";
 
-  return "add{l}\t{%2, %k0|%k0, %2}";
+  return use_ndd ? "add{l}\t{%2 ,%1, %k0|%k0, %1, %2}"
+: "add{l}\t{%2, %k0|%k0, %2}";
 }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
  (cond [(eq_attr "alternative" "2")
  (const_string "lea")
(match_operand:SI 2 "incdec_operand")
@@ -6814,37 +6821,42 @@ (define_insn "*add_2"
   [(set (reg FLAGS_REG)
(compare
  (plus:SWI
-   (match_operand:SWI 1 "nonimmediate_operand" "%0,0,")
-   (match_operand:SWI 2 "" ",,0"))
+   (match_operand:SWI 1 "nonimmediate_operand" "%0,0,,rm,r")
+   (match_operand:SWI 2 "" ",,0,r,"))
  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,,r,r")
(plus:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (PLUS, mode, operands)"
+   && ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return "inc{}\t%0";
+return use_ndd ? "inc{}\t{%1, %0|%0, %1}"
+  : "inc{}\t%0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
-  return "dec{}\t%0";
+ return use_ndd ? "dec{}\t{%1, %0|%0, %1}"
+: "dec{}\t%0";
}
 
 default:
   if (which_alternative == 2)
 std::swap (operands[1], operands[2]);
 
-  gcc_assert (rtx_equal_p (operands[0], operands[1]));
   if (x86_maybe_negate_const_int (&operands[2], mode))
-return "sub{}\t{%2, %0|%0, %2}";
+return use_ndd ? "sub{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sub{}\t{%2, %0|%0, %2}";
 
-  return "add{}\t{%2, %0|%0, %2}";
+  return use_ndd ? "add{}\t{%2, %1, %0|%0, %1, %2}"
+: "add{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*

[PATCH 11/16] [APX NDD] Support APX NDD for left shift insns

2023-12-06 Thread Hongyu Wang
For left shift, the TARGET_DOUBLE_WITH_ADD optimization can turn shl by 1 into
an add. Since the NDD form of add requires the source operand to be a register
(NDD cannot take two memory sources), we currently keep the NDD form shift
instead of the add.

The TARGET_SHIFT1 optimization tries to drop the constant 1 to get a shorter
opcode, but for NDD the assembler automatically uses the shorter encoding
whether $1 is present or not, so NDD is not involved in this optimization.

The doubleword insns for left shift call ix86_expand_ashl, which assumes that
all shift-related patterns have the same operand[0] and operand[1]. These
patterns will be supported in a standalone patch.

gcc/ChangeLog:

* config/i386/i386.md (*ashl3_1): Extend with new
alternatives to support NDD, limit the new alternative to
generate sal only, and adjust output template for NDD.
(*ashlsi3_1_zext): Likewise.
(*ashlhi3_1): Likewise.
(*ashlqi3_1): Likewise.
(*ashl3_cmp): Likewise.
(*ashlsi3_cmp_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*ashl3_cconly): Likewise.
(*ashl3_doubleword_highpart): Adjust codegen for NDD.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add tests for sal.
---
 gcc/config/i386/i386.md | 172 
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  22 +++
 2 files changed, 136 insertions(+), 58 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ad4c958a1e8..c67896cf97c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14472,10 +14472,19 @@ (define_insn_and_split "*ashl3_doubleword_highpart"
 {
   split_double_mode (mode, &operands[0], 1, &operands[0], &operands[3]);
   int bits = INTVAL (operands[2]) - ( * BITS_PER_UNIT);
-  if (!rtx_equal_p (operands[3], operands[1]))
-emit_move_insn (operands[3], operands[1]);
-  if (bits > 0)
-emit_insn (gen_ashl3 (operands[3], operands[3], GEN_INT (bits)));
+  bool op_equal_p = rtx_equal_p (operands[3], operands[1]);
+  if (bits == 0)
+{
+  if (!op_equal_p)
+   emit_move_insn (operands[3], operands[1]);
+}
+  else
+{
+  if (!op_equal_p && !TARGET_APX_NDD)
+   emit_move_insn (operands[3], operands[1]);
+  rtx op_tmp = TARGET_APX_NDD ? operands[1] : operands[3];
+  emit_insn (gen_ashl3 (operands[3], op_tmp, GEN_INT (bits)));
+}
   ix86_expand_clear (operands[0]);
   DONE;
 })
@@ -14782,12 +14791,14 @@ (define_insn "*bmi2_ashl3_1"
(set_attr "mode" "")])
 
 (define_insn "*ashl3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
-   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
- (match_operand:QI 2 "nonmemory_operand" "c,M,r,")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k,r")
+   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,M,r,,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, mode, operands)"
+  "ix86_binary_operator_ok (ASHIFT, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -14802,18 +14813,25 @@ (define_insn "*ashl3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ /* For NDD form instructions related to TARGET_SHIFT1, the $1
+immediate do not need to be omitted as assembler will map it
+to use shorter encoding. */
+ && !use_ndd)
return "sal{}\t%0";
   else
-   return "sal{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sal{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,bmi2,")
+  [(set_attr "isa" "*,*,bmi2,,apx_ndd")
(set (attr "type")
  (cond [(eq_attr "alternative" "1")
  (const_string "lea")
(eq_attr "alternative" "2")
  (const_string "ishiftx")
+   (eq_attr "alternative" "4")
+ (const_string "ishift")
 (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
  (match_operand 0 "register_operand"))
 (match_operand 2 "const1_operand"))
@@ -14855,13 +14873,15 @@ (define_insn "*bmi2_ashlsi3_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*ashlsi3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(zero_extend:DI
- (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,rm")
-(match_operand:QI 2 "nonmemory_operand" "cI,M,r"
+ (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0

[PATCH 12/16] [APX NDD] Support APX NDD for right shift insns

2023-12-06 Thread Hongyu Wang
Similar to left shift, right shift does not need to omit $1 for the NDD form.

gcc/ChangeLog:

* config/i386/i386.md (ashr3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashr3_1): Likewise for SI/DI mode.
(*lshr3_1): Likewise.
(*si3_1_zext): Likewise.
(*ashr3_1): Likewise for QI/HI mode.
(*lshrqi3_1): Likewise.
(*lshrhi3_1): Likewise.
(3_cmp): Likewise.
(*3_cconly): Likewise.
(*ashrsi3_cvt_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*highpartdisi2): Likewise.
(*si3_cmp_zext): Likewise.
(3_carry): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add l/ashiftrt tests.
---
 gcc/config/i386/i386.md | 232 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  24 +++
 2 files changed, 166 insertions(+), 90 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c67896cf97c..d1eae7248d9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15808,39 +15808,45 @@ (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
 
 (define_insn "ashr3_cvt"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm,r")
(ashiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "*a,0")
+ (match_operand:SWI48 1 "nonimmediate_operand" "*a,0,rm")
  (match_operand:QI 2 "const_int_operand")))
(clobber (reg:CC FLAGS_REG))]
   "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)-1
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, mode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, mode, operands,
+  TARGET_APX_NDD)"
   "@

-   sar{}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{}\t{%2, %0|%0, %2}
+   sar{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
(set_attr "mode" "")])
 
 (define_insn "*ashrsi3_cvt_zext"
-  [(set (match_operand:DI 0 "register_operand" "=*d,r")
+  [(set (match_operand:DI 0 "register_operand" "=*d,r,r")
(zero_extend:DI
- (ashiftrt:SI (match_operand:SI 1 "register_operand" "*a,0")
+ (ashiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "*a,0,rm")
   (match_operand:QI 2 "const_int_operand"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && INTVAL (operands[2]) == 31
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands,
+  TARGET_APX_NDD)"
   "@
{cltd|cdq}
-   sar{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{l}\t{%2, %k0|%k0, %2}
+   sar{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
(set_attr "mode" "SI")])
 
 (define_expand "@x86_shift_adj_3"
@@ -15882,13 +15888,15 @@ (define_insn "*bmi2_3_1"
(set_attr "mode" "")])
 
 (define_insn "*ashr3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(ashiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,r,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, mode, operands)"
+  "ix86_binary_operator_ok (ASHIFTRT, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_ISHIFTX:
@@ -15896,14 +15904,16 @@ (define_insn "*ashr3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !use_ndd)
return "sar{}\t%0";
   else
-   return "sar{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sar{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "ishift,ishiftx")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set

[PATCH 05/16] [APX NDD] Support APX NDD for sub insns

2023-12-06 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
Add use_ndd parameter and parse it.
* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
Change define.
* config/i386/i386.md (sub3): Add new alternatives for NDD
and adjust output templates.
(*sub_1): Likewise.
(*sub_2): Likewise.
(subv4): Likewise.
(*subv4): Likewise.
(subv4_1): Likewise.
(usubv4): Likewise.
(*sub_3): Likewise.
(*subsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternatives.
(*subsi_2_zext): Likewise.
(*subsi_3_zext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add test for ndd sub.
---
 gcc/config/i386/i386-expand.cc  |   5 +-
 gcc/config/i386/i386-protos.h   |   2 +-
 gcc/config/i386/i386.md | 155 
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 4 files changed, 120 insertions(+), 55 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 3ecda989cf8..93ecde4b4a8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1326,9 +1326,10 @@ ix86_fixup_binary_operands (enum rtx_code code, 
machine_mode mode,
 
 void
 ix86_fixup_binary_operands_no_copy (enum rtx_code code,
-   machine_mode mode, rtx operands[])
+   machine_mode mode, rtx operands[],
+   bool use_ndd)
 {
-  rtx dst = ix86_fixup_binary_operands (code, mode, operands);
+  rtx dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   gcc_assert (dst == operands[0]);
 }
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7dfeb6af225..481527872e8 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -111,7 +111,7 @@ extern void ix86_expand_vector_move_misalign (machine_mode, 
rtx[]);
 extern rtx ix86_fixup_binary_operands (enum rtx_code,
   machine_mode, rtx[], bool = false);
 extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
-   machine_mode, rtx[]);
+   machine_mode, rtx[], bool = 
false);
 extern void ix86_expand_binary_operator (enum rtx_code,
 machine_mode, rtx[], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8dd8216041e..6ec498725aa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -,7 +,8 @@ (define_expand "sub3"
(minus:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")
 (match_operand:SDWIM 2 "")))]
   ""
-  "ix86_expand_binary_operator (MINUS, mode, operands); DONE;")
+  "ix86_expand_binary_operator (MINUS, mode, operands,
+   TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*sub3_doubleword"
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
@@ -7803,7 +7804,10 @@ (define_insn_and_split "*sub3_doubleword"
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
 {
-  ix86_expand_binary_operator (MINUS, mode, &operands[3]);
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  ix86_expand_binary_operator (MINUS, mode, &operands[3],
+  TARGET_APX_NDD);
   DONE;
 }
 })
@@ -7832,25 +7836,36 @@ (define_insn_and_split "*sub3_doubleword_zext"
   "split_double_mode (mode, &operands[0], 2, &operands[0], 
&operands[3]);")
 
 (define_insn "*sub_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r")
(minus:SWI
- (match_operand:SWI 1 "nonimmediate_operand" "0,0")
- (match_operand:SWI 2 "" ",")))
+ (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
+ (match_operand:SWI 2 "" ",,r,")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
-  "sub{}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   TARGET_APX_NDD)"
+  "@
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
(set_attr "mode" "")])
 
 (define_insn "*subsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
(zero_extend:DI
- (minus:SI (match_operand:SI 1 "register_operand" "0")
-   

[PATCH 16/16] [APX NDD] Support TImode shift for NDD

2023-12-06 Thread Hongyu Wang
TImode shifts are split by splitter functions, which assume operands[0]
and operands[1] are the same.  For the NDD alternative this assumption
may not hold, so add split functions for NDD that emit the NDD-form
instructions, and omit the handling of the !64-bit target split.

Although the NDD form allows a memory source, in the post-reload
splitter there is no extra register available to accept an NDD-form
shift, especially for shld/shrd.  So only accept the register
alternative for the shift source under NDD.
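
For exposition, the double-word composition that the constant-count
cases of the split emit can be sketched in plain C (this is an
illustrative model, not code from the patch; the helper name shl128 is
made up here):

```c
#include <stdint.h>

/* Illustrative model of splitting a 128-bit left shift into 64-bit
   halves, mirroring the constant-count case split in the ashl
   splitter.  Sketch only, not GCC code.  */
static void
shl128 (uint64_t *hi, uint64_t *lo, unsigned count)
{
  count &= 127;			/* shift count is masked to the mode size */
  if (count >= 64)
    {
      /* The whole low half moves into the high half; low is cleared.  */
      *hi = (count == 64) ? *lo : *lo << (count - 64);
      *lo = 0;
    }
  else if (count != 0)
    {
      /* shld-style double shift: the high half takes the bits that
	 spill out of the low half.  */
      *hi = (*hi << count) | (*lo >> (64 - count));
      *lo <<= count;
    }
}
```

The right-shift splitter is the mirror image, with the sign-propagating
case handled separately for arithmetic shifts.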

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_split_ashl_ndd): New
function to split NDD form lshift.
(ix86_split_rshift_ndd): Likewise for l/ashiftrt.
* config/i386/i386-protos.h (ix86_split_ashl_ndd): New
prototype.
(ix86_split_rshift_ndd): Likewise.
* config/i386/i386.md (ashl3_doubleword): Add NDD
alternative, call ndd split function when operands[0]
not equal to operands[1].
(define_split for doubleword lshift): Likewise.
(define_peephole for doubleword lshift): Likewise.
(3_doubleword): Likewise for l/ashiftrt.
(define_split for doubleword l/ashiftrt): Likewise.
(define_peephole for doubleword l/ashiftrt): Likewise.

gcc/ChangeLog:

* gcc.target/i386/apx-ndd-ti-shift.c: New test.
---
 gcc/config/i386/i386-expand.cc| 136 ++
 gcc/config/i386/i386-protos.h |   2 +
 gcc/config/i386/i386.md   |  56 ++--
 .../gcc.target/i386/apx-ndd-ti-shift.c|  91 
 4 files changed, 273 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index d4bbd33ce07..a53d69d5400 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6678,6 +6678,142 @@ ix86_split_lshr (rtx *operands, rtx scratch, 
machine_mode mode)
 }
 }
 
+/* Helper function to split TImode ashl under NDD.  */
+void
+ix86_split_ashl_ndd (rtx *operands, rtx scratch)
+{
+  gcc_assert (TARGET_APX_NDD);
+  int half_width = GET_MODE_BITSIZE (TImode) >> 1;
+
+  rtx low[2], high[2];
+  int count;
+
+  split_double_mode (TImode, operands, 2, low, high);
+  if (CONST_INT_P (operands[2]))
+{
+  count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1);
+
+  if (count >= half_width)
+   {
+ count = count - half_width;
+ if (count == 0)
+   {
+ if (!rtx_equal_p (high[0], low[1]))
+   emit_move_insn (high[0], low[1]);
+   }
+ else if (count == 1)
+   emit_insn (gen_adddi3 (high[0], low[1], low[1]));
+ else
+   emit_insn (gen_ashldi3 (high[0], low[1], GEN_INT (count)));
+
+ ix86_expand_clear (low[0]);
+   }
+  else if (count == 1)
+   {
+ rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+ rtx x4 = gen_rtx_LTU (TImode, x3, const0_rtx);
+ emit_insn (gen_add3_cc_overflow_1 (DImode, low[0],
+low[1], low[1]));
+ emit_insn (gen_add3_carry (DImode, high[0], high[1], high[1],
+x3, x4));
+   }
+  else
+   {
+ emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1],
+ GEN_INT (count)));
+ emit_insn (gen_ashldi3 (low[0], low[1], GEN_INT (count)));
+   }
+}
+  else
+{
+  emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1],
+ operands[2]));
+  emit_insn (gen_ashldi3 (low[0], low[1], operands[2]));
+  if (TARGET_CMOVE && scratch)
+   {
+ ix86_expand_clear (scratch);
+ emit_insn (gen_x86_shift_adj_1
+(DImode, high[0], low[0], operands[2], scratch));
+   }
+  else
+   emit_insn (gen_x86_shift_adj_2 (DImode, high[0], low[0], operands[2]));
+}
+}
+
+/* Helper function to split TImode l/ashr under NDD.  */
+void
+ix86_split_rshift_ndd (enum rtx_code code, rtx *operands, rtx scratch)
+{
+  gcc_assert (TARGET_APX_NDD);
+  int half_width = GET_MODE_BITSIZE (TImode) >> 1;
+  bool ashr_p = code == ASHIFTRT;
+  rtx (*gen_shr)(rtx, rtx, rtx) = ashr_p ? gen_ashrdi3
+: gen_lshrdi3;
+
+  rtx low[2], high[2];
+  int count;
+
+  split_double_mode (TImode, operands, 2, low, high);
+  if (CONST_INT_P (operands[2]))
+{
+  count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1);
+
+  if (ashr_p && (count == GET_MODE_BITSIZE (TImode) - 1))
+   {
+ emit_insn (gen_shr (high[0], high[1],
+ GEN_INT (half_width - 1)));
+ emit_move_insn (low[0], high[0]);
+   }
+  else if (count >= half_width)
+   {
+ if (ashr_p)
+   emit_insn (gen_shr (high[0], high[1],
+   GEN_INT (half_width - 1)));
+  

[PATCH 13/16] [APX NDD] Support APX NDD for rotate insns

2023-12-06 Thread Hongyu Wang
gcc/ChangeLog:

* config/i386/i386.md (*3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*si3_1_zext): Likewise.
(*3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternative.
(rcrdi2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
---
 gcc/config/i386/i386.md | 79 +++--
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 20 +++
 2 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d1eae7248d9..6e4ac776f8a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16667,13 +16667,15 @@ (define_insn "*bmi2_rorx3_1"
(set_attr "mode" "")])
 
 (define_insn "*3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(any_rotate:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
@@ -16681,14 +16683,16 @@ (define_insn "*3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !use_ndd)
return "{}\t%0";
   else
-   return "{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
(set (attr "preferred_for_size")
  (cond [(eq_attr "alternative" "0")
  (symbol_ref "true")]
@@ -16738,13 +16742,14 @@ (define_insn "*bmi2_rorxsi3_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*si3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
(zero_extend:DI
- (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
-(match_operand:QI 2 "nonmemory_operand" "cI,I"
+ (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
+(match_operand:QI 2 "nonmemory_operand" "cI,I,cI"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
@@ -16752,14 +16757,16 @@ (define_insn "*si3_1_zext"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !use_ndd)
return "{l}\t%k0";
   else
-   return "{l}\t{%2, %k0|%k0, %2}";
+   return use_ndd ? "{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  : "{l}\t{%2, %k0|%k0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
(set (attr "preferred_for_size")
  (cond [(eq_attr "alternative" "0")
  (symbol_ref "true")]
@@ -16803,19 +16810,25 @@ (define_split
(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2])
 
 (define_insn "*3_1"
-  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m")
-   (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
- (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m,r")
+   (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   if (operands[2] == const1_rtx
-  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+  && !use_ndd)
 return "{}\t%0";
   else
-return "{}\t{%2, %0|%0, %2}";
+return use_ndd
+  ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|

[patch-1v2, rs6000] enable fctiw on old archs [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi,
  SImode in float registers is only supported on Power7 and above.  As a
result, "fctiw" can't be generated on older 32-bit processors, since the
output operand of the fctiw insn is an SImode value in a float/double
register.  This patch fixes the problem by adding one expand and one
insn pattern for fctiw.  The output of the new pattern is DImode.  When
the target doesn't support SImode in float registers, the expand calls
the new insn pattern and converts the DImode result to SImode via the
stack.

  Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638860.html
the main change is that the mode of the output operand of the new insn
is changed from SFmode to DImode so that it can use the stfiwx pattern
directly, with no need for additional unspecs.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: enable fctiw on old archs

The powerpc 32-bit processors (e.g. the 7450) support the "fctiw"
instruction, but it can't be generated on such platforms as the insn is
guarded by TARGET_POPCNTD.  The root cause is that SImode in float
registers is only supported from Power7 on.  Actually, the
implementation of "fctiw" only needs stfiwx, which is supported by the
old 32-bit processors.  This patch enables the "fctiw" expand for these
processors.

gcc/
PR target/112707
* config/rs6000/rs6000.md (expand lrintsi2): New.
(insn lrintsi2): Rename to...
(*lrintsi): ...this.
(lrintsi_di): New.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..dfb7f19c6ad 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6722,7 +6722,27 @@ (define_insn "lrintdi2"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

-(define_insn "lrintsi2"
+(define_expand "lrintsi2"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+   (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && TARGET_STFIWX"
+{
+  /* For those old archs in which SImode can't be held in float registers,
+ call lrintsi_di to put the result in DImode, then convert it to
+ SImode via the stack.  */
+  if (!TARGET_POPCNTD)
+{
+  rtx tmp = gen_reg_rtx (DImode);
+  emit_insn (gen_lrintsi_di (tmp, operands[1]));
+  rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
+  emit_insn (gen_stfiwx (stack, tmp));
+  emit_move_insn (operands[0], stack);
+  DONE;
+}
+})
+
+(define_insn "*lrintsi"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTIW))]
@@ -6730,6 +6750,14 @@ (define_insn "lrintsi2"
   "fctiw %0,%1"
   [(set_attr "type" "fp")])

+(define_insn "lrintsi_di"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
+   (unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && !TARGET_POPCNTD"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
 (define_insn "btrunc2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
new file mode 100644
index 000..cce6bd7f690
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } }  */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } }  */
+
+int test1 (double a)
+{
+  return __builtin_irint (a);
+}
+
+int test2 (float a)
+{
+  return __builtin_irint (a);
+}


Re: [PATCH]middle-end: Fix peeled vect loop IV values.

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> While waiting for reviews I found this case where both loop exits need
> to go to the epilogue loop, but there was an IV-related variable that
> was used in the scalar iteration as well.
> 
> vect_update_ivs_after_vectorizer then blew the value away and replaced it with
> the value if it took the normal exit.
> 
> For these cases where we've peeled a vector iteration, we should skip
> vect_update_ivs_after_vectorizer since all exits are "alternate" exits.
> 
> For this to be correct, we have peeling insert the right LCSSA variables
> so that vectorizable_live_operations takes care of it.
> 
> This is triggered by new testcases 79 and 80 in early break testsuite
> and I'll merge this commit in the main one.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
>   Put right LCSSA var for peeled vect loops.
>   (vect_do_peeling): Skip vect_update_ivs_after_vectorizer.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 7d48502e2e46240553509dfa6d75fcab7fea36d3..bfdbeb7faaba29aad51c0561dace680c96759484
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1668,6 +1668,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>edge loop_entry = single_succ_edge (new_preheader);
>if (flow_loops)
>   {
> +   bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
> /* Link through the main exit first.  */
> for (auto gsi_from = gsi_start_phis (loop->header),
>  gsi_to = gsi_start_phis (new_loop->header);
> @@ -1692,11 +1693,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
> continue;
>   }
>   }
> +   /* If we have multiple exits and the vector loop is peeled then we
> +  need to use the value at start of loop.  */

This comment doesn't really match 'peeled_iters'?  Iff the main IV exit
source isn't loop->latch then won't we miscompute?  I realize the
complication is that slpeel_tree_duplicate_loop_to_edge_cfg is used from
elsewhere as well (so we can't check LOOP_VINFO_EARLY_BREAKS_VECT_PEELED).

> +   if (peeled_iters)
> + {
> +   tree tmp_arg = gimple_phi_result (from_phi);
> +   if (!new_phi_args.get (tmp_arg))
> + new_arg = tmp_arg;
> + }
>  
> tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
>  
> -   /* Main loop exit should use the final iter value.  */
> +   /* Otherwise, main loop exit should use the final iter value.  */
> SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
>  
> 
adjust_phi_and_debug_stmts 
(to_phi, loop_entry, new_res);
> @@ -3394,9 +3403,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>   update_e = single_succ_edge (e->dest);
>  
> -  /* Update the main exit.  */
> -  vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> - update_e);
> +  /* If we have a peeled vector iteration, all exits are the same, leave 
> it
> +  and so the main exit needs to be treated the same as the alternative
> +  exits in that we leave their updates to vectorizable_live_operations.
> +  */
> +  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> + vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> +   update_e);

and now we don't update the main exit?  What's
LOOP_VINFO_EARLY_BREAKS_VECT_PEELED again vs.
LOOP_VINFO_EARLY_BREAKS?

>if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>   {
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[patch-2v2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi,
  The "fctid" instruction is supported on 64-bit Power processors and
powerpc 476, so it needs a guard checking for them.  The patch fixes the
issue.

  Compared with last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638859.html
the main change is to define TARGET_FCTID to POWERPC64 or PPC476. Also
guard "lrintdi2" by TARGET_FCTID as it generates fctid.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: guard fctid on PPC64 and powerpc 476.

fctid is supported on 64-bit Power processors and powerpc 476. It should
be guarded by this condition. The patch fixes the issue.

gcc/
PR target/112707
* config/rs6000/rs6000.h (TARGET_FCTID): Define.
* config/rs6000/rs6000.md (lrintdi2): Add guard TARGET_FCTID.
(lrounddi2): Replace TARGET_FPRND with TARGET_FCTID.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707.h: New.
* gcc.target/powerpc/pr112707-2.c: New.
* gcc.target/powerpc/pr112707-3.c: New.
* gcc.target/powerpc/pr88558-p7.c: Remove fctid for ilp32 as it's
now guarded by powerpc64.
* gcc.target/powerpc/pr88558-p8.c: Likewise.
* gfortran.dg/nint_p7.f90: Add powerpc64 target requirement as
lrounddi2 is now guarded by powerpc64.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..8c29ca68ccf 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -467,6 +467,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
 #define TARGET_FCTIWUZ TARGET_POPCNTD
+/* Enable fctid on ppc64 and powerpc476.  */
+#define TARGET_FCTID   (TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
 #define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..3be79d49dc0 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6718,7 +6718,7 @@ (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_FCTID"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

@@ -6784,7 +6784,7 @@ (define_expand "lrounddi2"
(set (match_operand:DI 0 "gpc_reg_operand")
(unspec:DI [(match_dup 2)]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_VSX && TARGET_FCTID"
 {
   operands[2] = gen_reg_rtx (mode);
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
new file mode 100644
index 000..672e00691ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-final { scan-assembler-not {\mfctid\M} } }  */
+
+/* powerpc 7450 doesn't support ppc64 (-m32 -mpowerpc64), so skip it.  */
+
+#include "pr112707.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
new file mode 100644
index 000..924338fd390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=476fp" } */
+/* { dg-require-effective-target ilp32 } */
+
+/* powerpc 476fp has hard float enabled which is required by fctid */
+
+#include "pr112707.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707.h 
b/gcc/testsuite/gcc.target/powerpc/pr112707.h
new file mode 100644
index 000..e427dc6a72e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112707.h
@@ -0,0 +1,10 @@
+long long test1 (double a)
+{
+  return __builtin_llrint (a);
+}
+
+long long test2 (float a)
+{
+  return __builtin_llrint (a);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
index 3932656c5fd..13d433c4bdb 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -6,7 +6,6 @@
 #include "pr88558.h"

 /* { dg-final { scan-assembler-times {\mfctid\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mfctiw\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstfiwx\M} 2 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8.c 
b/gcc/t

Re: [PATCH 13/21]middle-end: Update loop form analysis to support early break

2023-12-06 Thread Richard Biener
On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
> patches are self contained.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
>   (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
>   (vect_transform_loop): Use it.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost 
> (loop_vec_info loop_vinfo)
>loop_vinfo->scalar_costs->finish_cost (nullptr);
>  }
>  
> -
>  /* Function vect_analyze_loop_form.
>  
> Verify that certain CFG restrictions hold, including:
> - the loop has a pre-header
> -   - the loop has a single entry and exit
> +   - the loop has a single entry
> +   - nested loops can have only a single exit.
> - the loop exit condition is simple enough
> - the number of iterations can be analyzed, i.e, a countable loop.  The
>   niter could be analyzed under some assumptions.  */
> @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, 
> vect_loop_form_info *info)
>  "not vectorized: latch block not empty.\n");
>  
>/* Make sure the exit is not abnormal.  */
> -  if (exit_e->flags & EDGE_ABNORMAL)
> -return opt_result::failure_at (vect_location,
> -"not vectorized:"
> -" abnormal loop exit edge.\n");
> +  auto_vec exits = get_loop_exit_edges (loop);
> +  for (edge e : exits)

Seeing this multiple times, this isn't the most efficient way to
iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.

Note to myself: fix (add to) the API.

> +{
> +  if (e->flags & EDGE_ABNORMAL)
> + return opt_result::failure_at (vect_location,
> +"not vectorized:"
> +" abnormal loop exit edge.\n");
> +}
>  
>info->conds
>  = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, 
> vec_info_shared *shared,
>  
>LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
>  
> +  /* Check to see if we're vectorizing multiple exits.  */
> +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> += !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> +

Seeing this, s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
might be good, if we in future avoid if-conversion in a separate
pass we will have other CONDs as well.

>if (info->inner_loop_cond)
>  {
>stmt_vec_info inner_loop_cond_info
> @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>/* Make sure there exists a single-predecessor exit bb.  Do this before 
>   versioning.   */
>edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> -  if (! single_pred_p (e->dest))
> +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>  {
>split_loop_exit_edge (e, true);

Note this splitting is done to fulfil versioning constraints on CFG
update.  Do you have test coverage with alias versioning and early
breaks?

Otherwise OK.

Thanks,
Richard.


Re: [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

2023-12-06 Thread Jiahao Xu



On 2023/12/6 3:04 PM, Jiahao Xu wrote:

LoongArch V1.1 adds support for approximate instructions, which are
utilized along with additional Newton-Raphson steps to implement
single-precision floating-point division, square root and reciprocal
square root operations for better throughput.
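
As a reminder of the numerics involved, one Newton-Raphson refinement
step applied to a hardware approximation can be sketched as below (an
illustrative model, not the patch's code; 'approx' stands in for the
frecipe/frsqrte result):

```c
#include <math.h>

/* One Newton-Raphson refinement step for reciprocal and reciprocal
   square root, as -mrecip-style sequences apply it.  Sketch only.  */
static float
refine_recip (float a, float approx)
{
  /* x1 = x0 * (2 - a * x0): roughly doubles the number of correct bits.  */
  return approx * (2.0f - a * approx);
}

static float
refine_rsqrt (float a, float approx)
{
  /* x1 = x0 * (1.5 - 0.5 * a * x0 * x0).  */
  return approx * (1.5f - 0.5f * a * approx * approx);
}
```

Division a/b is then formed as a * refine_recip(b, approx), trading one
long-latency divide for cheap multiplies and adds.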

The patches are modifications made based on the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html


The changes in version 3 compared to the previous version include:

*Enable -mfrecipe when using -march=la664.
*Implement builtin functions for frecipe and frsqrte instructions and 
introduce a new builtin macro "__loongarch_frecipe".

*Add corresponding test cases for the implemented builtin functions.
*Update the usage for the new intrinsic functions and builtin functions 
in extend.texi.

*Add reverse tests for scenarios where the -mrecip option is not enabled.

Jiahao Xu (5):
   LoongArch: Add support for LoongArch V1.1 approximate instructions.
   LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
 instructions.
   LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
   LoongArch: New options -mrecip and -mrecip= with ffast-math.
   LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
 when -mrecip is enabled.

  gcc/config/loongarch/genopts/isa-evolution.in |   1 +
  gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
  gcc/config/loongarch/larchintrin.h|  38 +++
  gcc/config/loongarch/lasx.md  |  89 ++-
  gcc/config/loongarch/lasxintrin.h |  34 +++
  gcc/config/loongarch/loongarch-builtins.cc|  66 +
  gcc/config/loongarch/loongarch-c.cc   |   3 +
  gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
  gcc/config/loongarch/loongarch-def.cc |   3 +-
  gcc/config/loongarch/loongarch-protos.h   |   2 +
  gcc/config/loongarch/loongarch-str.h  |   1 +
  gcc/config/loongarch/loongarch.cc | 252 +-
  gcc/config/loongarch/loongarch.h  |  18 ++
  gcc/config/loongarch/loongarch.md | 104 ++--
  gcc/config/loongarch/loongarch.opt|  15 ++
  gcc/config/loongarch/lsx.md   |  89 ++-
  gcc/config/loongarch/lsxintrin.h  |  34 +++
  gcc/config/loongarch/predicates.md|   8 +
  gcc/doc/extend.texi   |  35 +++
  gcc/doc/invoke.texi   |  54 
  gcc/testsuite/gcc.target/loongarch/divf.c |  10 +
  .../loongarch/larch-frecipe-builtin.c |  28 ++
  .../gcc.target/loongarch/recip-divf.c |   9 +
  .../gcc.target/loongarch/recip-sqrtf.c|  23 ++
  gcc/testsuite/gcc.target/loongarch/sqrtf.c|  24 ++
  .../loongarch/vector/lasx/lasx-divf.c |  13 +
  .../vector/lasx/lasx-frecipe-builtin.c|  30 +++
  .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
  .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
  .../loongarch/vector/lasx/lasx-recip.c|  24 ++
  .../loongarch/vector/lasx/lasx-rsqrt.c|  26 ++
  .../loongarch/vector/lasx/lasx-sqrtf.c|  29 ++
  .../loongarch/vector/lsx/lsx-divf.c   |  13 +
  .../vector/lsx/lsx-frecipe-builtin.c  |  30 +++
  .../loongarch/vector/lsx/lsx-recip-divf.c |  12 +
  .../loongarch/vector/lsx/lsx-recip-sqrtf.c|  28 ++
  .../loongarch/vector/lsx/lsx-recip.c  |  24 ++
  .../loongarch/vector/lsx/lsx-rsqrt.c  |  26 ++
  .../loongarch/vector/lsx/lsx-sqrtf.c  |  29 ++
  39 files changed, 1234 insertions(+), 42 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch

Re: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks

2023-12-06 Thread Richard Biener
On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This finishes wiring that didn't fit in any of the other patches.
> Essentially just adding related changes so peeling for early break works.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
>   vect_do_peeling): Support early breaks.
>   * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
>   * tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
> loop *loop,
> loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge 
> exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
>   class loop *loop, tree niters, tree step,
>   tree final_iv, bool niters_maybe_zero,
>   gimple_stmt_iterator loop_cond_gsi)
> @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /* 
> loop_vinfo */, edge exit_edge,
>gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
>  
>/* Record the number of latch iterations.  */
> -  if (limit == niters)
> +  if (limit == niters
> +  || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>  /* Case A: the loop iterates NITERS times.  Subtract one to get the
> latch count.  */
>  loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>  bound_epilog += vf - 1;
>if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>  bound_epilog += 1;
> +
> +  /* For early breaks the scalar loop needs to execute at most VF times
> + to find the element that caused the break.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +{
> +  bound_epilog = vf;
> +  /* Force a scalar epilogue as we can't vectorize the index finding.  */
> +  vect_epilogues = false;

This is originally initialized with

  bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;

so I think we should avoid filling that with LOOP_VINFO_EARLY_BREAKS
rather than fixing up after the fact?  That is in vect_analyze_loop
adjust

  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
 enabled, SIMDUID is not set, it is the innermost loop and we have
 either already found the loop's SIMDLEN or there was no SIMDLEN to
 begin with.
 TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
  bool vect_epilogues = (!simdlen
 && loop->inner == NULL
 && param_vect_epilogues_nomask
 && LOOP_VINFO_PEELING_FOR_NITER 
(first_loop_vinfo)
 && !loop->simduid);

and add !LOOP_VINFO_EARLY_BREAKS?

> +}
> +
>bool epilog_peeling = maybe_ne (bound_epilog, 0U);
>poly_uint64 bound_scalar = bound_epilog;
>  
> @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
> bound_prolog + bound_epilog)
> : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>|| vect_epilogues));
> +
> +  /* We only support early break vectorization on known bounds at this time.
> + This means that if the vector loop can't be entered then we won't 
> generate
> + it at all.  So for now force skip_vector off because the additional 
> control
> + flow messes with the BB exits and we've already analyzed them.  */
> + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> +

  bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
  ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
  bound_prolog + bound_epilog)
  : (!LOOP_REQUIRES_VERSIONING (loop_vinfo) 
 || vect_epilogues));

to me that looks like

  gcc_assert (!skip_vector || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo));

should work?  You are basically relying on cost modeling rejecting
vectorization that doesn't enter the vector loop.

>/* Epilog loop must be executed if the number of iterations for epilog
>   loop is known at compile time, otherwise we need to add a check at
>   the end of vector loop and skip to the end of epilog loop.  */
>bool skip_epilog = (prolog_peeling < 0
> || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> || !

Causes nvptx bootstrap to fail: [PATCH v5] Introduce strub: machine-independent stack scrubbing

2023-12-06 Thread Tobias Burnus

Hi,

CC'd Thomas.

FYI, the newly added file libgcc/strub.c from this patch (commit
r14-6201-gf0a90c7d7333fc) causes the nvptx bootstrap to fail with:

./gcc/as -v -o strub.o strub.s
Verifying sm_30 code with sm_50 code generation.
 ptxas -c -o /dev/null strub.o --gpu-name sm_50 -O0
ptxas strub.o, line 22; error   : Arguments mismatch for instruction 'st'
ptxas strub.o, line 22; error   : Unknown symbol '%frame'
ptxas strub.o, line 37; error   : Arguments mismatch for instruction 'setp'
ptxas strub.o, line 40; error   : Arguments mismatch for instruction 'st'
ptxas strub.o, line 37; error   : Unknown symbol '%frame'
ptxas strub.o, line 40; error   : Unknown symbol '%frame'
ptxas strub.o, line 59; error   : Arguments mismatch for instruction 'mov'
ptxas strub.o, line 67; error   : Arguments mismatch for instruction 'setp'
ptxas strub.o, line 59; error   : Unknown symbol '%stack'
ptxas strub.o, line 67; error   : Unknown symbol '%stack'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status

That's

.visible .func __strub_enter (.param.u64 %in_ar0)
{
.reg.u64 %ar0;
ld.param.u64 %ar0, [%in_ar0];
.reg.u64 %r23;
mov.u64 %r23, %ar0;
st.u64  [%r23], %frame;
...

setp.le.u64 %r26, %r25, %frame;

...

Tobias

-
Siemens Electronic Design Automation GmbH; address: Arnulfstraße 201, 80634 
München; limited liability company; managing directors: Thomas 
Heurung, Frank Thürauf; registered office: München; commercial register: 
München, HRB 106955


Re: [PATCH RFA (libstdc++)] c++: partial ordering of object parameter [PR53499]

2023-12-06 Thread Jonathan Wakely
On Wed, 6 Dec 2023 at 02:21, Jason Merrill wrote:
>
> Tested x86_64-pc-linux-gnu.  Are the library test changes OK?

Sure, they seem fine.

>  A reduced
> example of the issue is at https://godbolt.org/z/cPxrcnKjG
>
> -- 8< --
>
> Looks like we implemented option 1 (skip the object parameter) for CWG532
> before the issue was resolved, and never updated to the final resolution of
> option 2 (model it as a reference).  More recently CWG2445 extended this
> handling to static member functions; I think that's wrong, and have
> opened CWG2834 to address that and how explicit object member functions
> interact with it.
>
> The FIXME comments are to guide how the explicit object member function
> support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P.
>
> The library testsuite changes are to make partial ordering work again
> between the generic operator- in the testcase and
> _Pointer_adapter::operator-.
>
> DR 532
> PR c++/53499
>
> gcc/cp/ChangeLog:
>
> * pt.cc (more_specialized_fn): Fix object parameter handling.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/template/partial-order4.C: New test.
> * g++.dg/template/spec26.C: Adjust for CWG532.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/23_containers/vector/ext_pointer/types/1.cc
> * testsuite/23_containers/vector/ext_pointer/types/2.cc
> (N::operator-): Make less specialized.
> ---
>  gcc/cp/pt.cc  | 68 ++-
>  .../g++.dg/template/partial-order4.C  | 17 +
>  gcc/testsuite/g++.dg/template/spec26.C| 10 +--
>  .../vector/ext_pointer/types/1.cc |  4 +-
>  .../vector/ext_pointer/types/2.cc |  4 +-
>  5 files changed, 78 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/partial-order4.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 924a20973b4..4b2af4f7aca 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -25218,27 +25218,61 @@ more_specialized_fn (tree pat1, tree pat2, int len)
>bool lose1 = false;
>bool lose2 = false;
>
> -  /* Remove the this parameter from non-static member functions.  If
> - one is a non-static member function and the other is not a static
> - member function, remove the first parameter from that function
> - also.  This situation occurs for operator functions where we
> - locate both a member function (with this pointer) and non-member
> - operator (with explicit first operand).  */
> -  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1))
> +  /* C++17 [temp.func.order]/3 (CWG532)
> +
> + If only one of the function templates M is a non-static member of some
> + class A, M is considered to have a new first parameter inserted in its
> + function parameter list. Given cv as the cv-qualifiers of M (if any), 
> the
> + new parameter is of type "rvalue reference to cv A" if the optional
> + ref-qualifier of M is && or if M has no ref-qualifier and the first
> + parameter of the other template has rvalue reference type. Otherwise, 
> the
> + new parameter is of type "lvalue reference to cv A".  */
> +
> +  if (DECL_STATIC_FUNCTION_P (decl1) || DECL_STATIC_FUNCTION_P (decl2))
>  {
> -  len--; /* LEN is the number of significant arguments for DECL1 */
> -  args1 = TREE_CHAIN (args1);
> -  if (!DECL_STATIC_FUNCTION_P (decl2))
> -   args2 = TREE_CHAIN (args2);
> -}
> -  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
> -{
> -  args2 = TREE_CHAIN (args2);
> -  if (!DECL_STATIC_FUNCTION_P (decl1))
> +  /* Note C++20 DR2445 extended the above to static member functions, but
> +I think the old G++ behavior of just skipping the object
> +parameter when comparing to a static member function was better, so
> +let's stick with that for now.  This is CWG2834.  --jason 2023-12 */
> +  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1)) /* FIXME or explicit */
> {
> - len--;
> + len--; /* LEN is the number of significant arguments for DECL1 */
>   args1 = TREE_CHAIN (args1);
> }
> +  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2)) /* FIXME or 
> explicit */
> +   args2 = TREE_CHAIN (args2);
> +}
> +  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1) /* FIXME implicit only */
> +  && DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
> +{
> +  /* Note DR2445 also (IMO wrongly) removed the "only one" above, which
> +would break e.g.  cpp1y/lambda-generic-variadic5.C.  */
> +  len--;
> +  args1 = TREE_CHAIN (args1);
> +  args2 = TREE_CHAIN (args2);
> +}
> +  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1) /* FIXME implicit only */
> +  || DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
> +{
> +  /* The other is a non-member or explicit object member function;
> +rewrite the implicit object parameter to a reference.  */
> +  tree ns = D

RE: [PATCH]middle-end: Fix peeled vect loop IV values.

2023-12-06 Thread Tamar Christina
> > Hi All,
> >
> > While waiting for reviews I found this case where both loop exit needs to 
> > go to
> > epilogue loop, but there was an IV related variable that was used in the 
> > scalar
> > iteration as well.
> >
> > vect_update_ivs_after_vectorizer then blew the value away and replaced it 
> > with
> > the value if it took the normal exit.
> >
> > For these cases, where we've peeled a vector iteration, we should skip
> > vect_update_ivs_after_vectorizer since all exits are "alternate" exits.
> >
> > For this to be correct we have peeling put the right LCSSA variables so
> > vectorizable_live_operations takes care of it.
> >
> > This is triggered by new testcases 79 and 80 in early break testsuite
> > and I'll merge this commit in the main one.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > Put right LCSSA var for peeled vect loops.
> > (vect_do_peeling): Skip vect_update_ivs_after_vectorizer.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> 7d48502e2e46240553509dfa6d75fcab7fea36d3..bfdbeb7faaba29aad51c0561d
> ace680c96759484 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1668,6 +1668,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> *loop, edge loop_exit,
> >edge loop_entry = single_succ_edge (new_preheader);
> >if (flow_loops)
> > {
> > + bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
> >   /* Link through the main exit first.  */
> >   for (auto gsi_from = gsi_start_phis (loop->header),
> >gsi_to = gsi_start_phis (new_loop->header);
> > @@ -1692,11 +1693,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop, edge loop_exit,
> >   continue;
> > }
> > }
> > + /* If we have multiple exits and the vector loop is peeled then we
> > +need to use the value at start of loop.  */
> 
> This comment doesn't really match 'peeled_iters'?  Iff the main IV exit
> source isn't loop->latch then won't we miscompute?  I realize the
> complication is that slpeel_tree_duplicate_loop_to_edge_cfg is used from
> elsewhere as well (so we can't check LOOP_VINFO_EARLY_BREAKS_VECT_PEELED).
> 

No, because for both exits we restart the scalar iteration at the start of the
last vector iteration.
Normally, the counted main exit would be updated by vect_iters_bound_vf - vf,
which is the same as the induction value should we reach the final iteration.

More on it in your question below.

> > + if (peeled_iters)
> > +   {
> > + tree tmp_arg = gimple_phi_result (from_phi);
> > + if (!new_phi_args.get (tmp_arg))
> > +   new_arg = tmp_arg;
> > +   }
> >
> >   tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> >   gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> >
> > - /* Main loop exit should use the final iter value.  */
> > + /* Otherwise, main loop exit should use the final iter value.  */
> >   SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
> >
> >
> adjust_phi_and_debug_stmts
> (to_phi, loop_entry, new_res);
> > @@ -3394,9 +3403,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > update_e = single_succ_edge (e->dest);
> >
> > -  /* Update the main exit.  */
> > -  vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > -   update_e);
> > +  /* If we have a peeled vector iteration, all exits are the same, 
> > leave it
> > +and so the main exit needs to be treated the same as the alternative
> > +exits in that we leave their updates to vectorizable_live_operations.
> > +*/
> > +  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > +   vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > + update_e);
> 
> and now we don't update the main exit?  What's
> LOOP_VINFO_EARLY_BREAKS_VECT_PEELED again vs.
> LOOP_VINFO_EARLY_BREAKS?

So LOOP_VINFO_EARLY_BREAKS essentially says that there are multiple exits
that we can vectorize.

LOOP_VINFO_EARLY_BREAKS_VECT_PEELED says that in this loop we've picked as
the main exit an exit other than the one connected to the loop's latch.
This means that when we leave through the "main" exit in the final iteration
we may still have side effects to perform, so the final iteration should be
restarted.

Similarly, exiting through an early exit in this case also means having to
restart the loop at the same place to apply any partial side effects.  This
is because in these cases the loop exits at n - 1 iterations, the IV i

Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-06 Thread Jakub Jelinek
On Wed, Dec 06, 2023 at 07:33:21AM +, waffl3x wrote:
> Here is the next version, it feels very close to finished. As before, I
> haven't ran a bootstrap or the full testsuite yet but I did run the
> explicit-obj tests which completed as expected.
> 
> There's a few test cases that still need to be written but more tests
> can always be added. The behavior added by CWG2789 works in at least
> one case, but I have not added tests for it yet. The test cases for
> dependent lambda expressions need to be fleshed out more, but a few
> temporary ones are included to demonstrate that they do work and that
> the crash is fixed. Explicit object conversion functions work, but I
> need to add fleshed out tests for them, explicit-obj-basic5.C has that
> test.
> 
> I'll start the tests now and report back if anything fails, I'm
> confident everything will be fine though.
> 
> Alex

> From 937e12c57145bfd878a0bc4cd9735c2d3c4fcf22 Mon Sep 17 00:00:00 2001
> From: Waffl3x 
> Date: Tue, 5 Dec 2023 23:16:01 -0700
> Subject: [PATCH] P0847R7 (Deducing This) [PR102609]
>
> Another quick and dirty patch for review, hopefully the last.
>
> gcc/cp/ChangeLog:
> 

Please add
PR c++/102609
line above this.

>   * call.cc (build_this_conversion):

Note, for the final submission, all the ):
should be followed by descriptions what has changed in there (but not why).
Plus it would be good to mention somewhere early in the cp/ChangeLog
entry that the patch implements C++23 P0847R7 - Deducing this paper
(unfortunately the ChangeLog verifier doesn't allow such free text above
the ChangeLog entry, where it used to be written some years ago, and
only allows the PR line there; I usually put such text after the "):"
of the first entry now and only after it write exactly what changed
in that function).  Does the patch also implement CWG2586?

Also, I don't see in the patch the expected
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_explicit_this_parameter=202110L for C++23.
part plus gcc/testsuite/cpp{23,26}/feat-cxx*.C additions checking
for that macro presence and its value.

Jakub



RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break

2023-12-06 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, December 6, 2023 8:18 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 13/21]middle-end: Update loop form analysis to support 
> early
> break
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
> > patches are self contained.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > (vect_transform_loop): Use it.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991
> f07cd6052491d0 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> (loop_vec_info loop_vinfo)
> >loop_vinfo->scalar_costs->finish_cost (nullptr);
> >  }
> >
> > -
> >  /* Function vect_analyze_loop_form.
> >
> > Verify that certain CFG restrictions hold, including:
> > - the loop has a pre-header
> > -   - the loop has a single entry and exit
> > +   - the loop has a single entry
> > +   - nested loops can have only a single exit.
> > - the loop exit condition is simple enough
> > - the number of iterations can be analyzed, i.e, a countable loop.  The
> >   niter could be analyzed under some assumptions.  */
> > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> vect_loop_form_info *info)
> >"not vectorized: latch block not empty.\n");
> >
> >/* Make sure the exit is not abnormal.  */
> > -  if (exit_e->flags & EDGE_ABNORMAL)
> > -return opt_result::failure_at (vect_location,
> > -  "not vectorized:"
> > -  " abnormal loop exit edge.\n");
> > +  auto_vec exits = get_loop_exit_edges (loop);
> > +  for (edge e : exits)
> 
> Seeing this multiple times, this isn't the most efficient way to
> iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.
> 
> Note to myself: fix (add to) the API.
> 
> > +{
> > +  if (e->flags & EDGE_ABNORMAL)
> > +   return opt_result::failure_at (vect_location,
> > +  "not vectorized:"
> > +  " abnormal loop exit edge.\n");
> > +}
> >
> >info->conds
> >  = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop,
> vec_info_shared *shared,
> >
> >LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> >
> > +  /* Check to see if we're vectorizing multiple exits.  */
> > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > += !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > +
> 
> Seeing this, s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
> might be good, if we in future avoid if-conversion in a separate
> pass we will have other CONDs as well.
> 
> >if (info->inner_loop_cond)
> >  {
> >stmt_vec_info inner_loop_cond_info
> > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> gimple *loop_vectorized_call)
> >/* Make sure there exists a single-predecessor exit bb.  Do this before
> >   versioning.   */
> >edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -  if (! single_pred_p (e->dest))
> > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >  {
> >split_loop_exit_edge (e, true);
> 
> Note this splitting is done to fulfil versioning constraints on CFG
> update.  Do you have test coverage with alias versioning and early
> breaks?

No, only non-alias versioning.  I don't believe we can alias in the current
implementation because it's restricted to statically known objects with
a fixed size.

Thanks,
Tamar

> 
> Otherwise OK.
> 
> Thanks,
> Richard.


RE: [PATCH]middle-end: Fix peeled vect loop IV values.

2023-12-06 Thread Tamar Christina
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, December 6, 2023 8:48 AM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH]middle-end: Fix peeled vect loop IV values.
> 
> > > Hi All,
> > >
> > > While waiting for reviews I found this case where both loop exit needs to 
> > > go to
> > > epilogue loop, but there was an IV related variable that was used in the 
> > > scalar
> > > iteration as well.
> > >
> > > vect_update_ivs_after_vectorizer then blew the value away and replaced it 
> > > with
> > > the value if it took the normal exit.
> > >
> > > For these cases, where we've peeled a vector iteration, we should skip
> > > vect_update_ivs_after_vectorizer since all exits are "alternate" exits.
> > >
> > > For this to be correct we have peeling put the right LCSSA variables so
> > > vectorizable_live_operations takes care of it.
> > >
> > > This is triggered by new testcases 79 and 80 in early break testsuite
> > > and I'll merge this commit in the main one.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > >   Put right LCSSA var for peeled vect loops.
> > >   (vect_do_peeling): Skip vect_update_ivs_after_vectorizer.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> >
> 7d48502e2e46240553509dfa6d75fcab7fea36d3..bfdbeb7faaba29aad51c0561d
> > ace680c96759484 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1668,6 +1668,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> > *loop, edge loop_exit,
> > >edge loop_entry = single_succ_edge (new_preheader);
> > >if (flow_loops)
> > >   {
> > > +   bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
> > > /* Link through the main exit first.  */
> > > for (auto gsi_from = gsi_start_phis (loop->header),
> > >  gsi_to = gsi_start_phis (new_loop->header);
> > > @@ -1692,11 +1693,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop, edge loop_exit,
> > > continue;
> > >   }
> > >   }
> > > +   /* If we have multiple exits and the vector loop is peeled then we
> > > +  need to use the value at start of loop.  */
> >
> > This comment doesn't really match 'peeled_iters'?  Iff the main IV exit
> > source isn't loop->latch then won't we miscompute?  I realize the
> > complication is that slpeel_tree_duplicate_loop_to_edge_cfg is used from
> > elsewhere as well (so we can't check
> LOOP_VINFO_EARLY_BREAKS_VECT_PEELED).
> >
> 
> No, because for both exits we restart the scalar iteration at the start of
> the last vector iteration.
> Normally, the counted main exit would be updated by vect_iters_bound_vf - vf,
> which is the same as the induction value should we reach the final iteration.
> 
> More on it in your question below.
> 
> > > +   if (peeled_iters)
> > > + {
> > > +   tree tmp_arg = gimple_phi_result (from_phi);
> > > +   if (!new_phi_args.get (tmp_arg))
> > > + new_arg = tmp_arg;
> > > + }
> > >
> > > tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> > >
> > > -   /* Main loop exit should use the final iter value.  */
> > > +   /* Otherwise, main loop exit should use the final iter value.  */
> > > SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
> > >
> > >
> > adjust_phi_and_debug_stmts
> > (to_phi, loop_entry, new_res);
> > > @@ -3394,9 +3403,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >   update_e = single_succ_edge (e->dest);
> > >
> > > -  /* Update the main exit.  */
> > > -  vect_update_ivs_after_vectorizer (loop_vinfo, 
> > > niters_vector_mult_vf,
> > > - update_e);
> > > +  /* If we have a peeled vector iteration, all exits are the same, 
> > > leave it
> > > +  and so the main exit needs to be treated the same as the alternative
> > > +  exits in that we leave their updates to vectorizable_live_operations.
> > > +  */
> > > +  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > + vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > +   update_e);
> >
> > and now we don't update the main exit?  What's
> > LOOP_VINFO_EARLY_BREAKS_VECT_PEELED again vs.
> > LOOP_VINFO_EARLY_BREAKS?
> 
> So LOOP_VINFO_EARLY_BREAKS essentially says that there are multiple exits
> that we can vectorize.
> 
> LOOP_VINFO_EARLY_BREAKS_VECT_PEELED is saying that in this loop, we've
> picked as the mai

[PATCH][V3] RISC-V: Nan-box the result of movhf on soft-fp16

2023-12-06 Thread KuanLin Chen
According to the spec, fmv.h checks whether the input operands are correctly
 NaN-boxed; if not, the input value is treated as an n-bit canonical NaN.
 This patch fixes the issue that operands returned by soft-fp16 libgcc
 (i.e., __truncdfhf2) were not correctly NaN-boxed.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_legitimize_move): Expand movhf
	with Nan-boxing value.
	* config/riscv/riscv.md (*movhf_softfloat_boxing): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/_Float16-nanboxing.c: New test.


0001-RISC-V-Nan-box-the-result-of-movhf-on-soft-fp16.patch
Description: Binary data


RE: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks

2023-12-06 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, December 6, 2023 8:32 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 12/21]middle-end: Add remaining changes to peeling and
> vectorizer to support early breaks
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This finishes wiring that didn't fit in any of the other patches.
> > Essentially just adding related changes so peeling for early break works.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
> > vect_do_peeling): Support early breaks.
> > * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
> > * tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7
> 605b7a3d3ecdd7 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> (class loop *loop,
> > loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> > class loop *loop, tree niters, tree step,
> > tree final_iv, bool niters_maybe_zero,
> > gimple_stmt_iterator loop_cond_gsi)
> > @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /*
> loop_vinfo */, edge exit_edge,
> >gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> >
> >/* Record the number of latch iterations.  */
> > -  if (limit == niters)
> > +  if (limit == niters
> > +  || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >  /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > latch count.  */
> >  loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >  bound_epilog += vf - 1;
> >if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> >  bound_epilog += 1;
> > +
> > +  /* For early breaks the scalar loop needs to execute at most VF times
> > + to find the element that caused the break.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +{
> > +  bound_epilog = vf;
> > +  /* Force a scalar epilogue as we can't vectorize the index finding.  
> > */
> > +  vect_epilogues = false;
> 
> This is originally initialized with
> 
>   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> 
> so I think we should avoid filling that with LOOP_VINFO_EARLY_BREAKS
> rather than fixing up after the fact?  That is in vect_analyze_loop
> adjust
> 
>   /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
>  enabled, SIMDUID is not set, it is the innermost loop and we have
>  either already found the loop's SIMDLEN or there was no SIMDLEN to
>  begin with.
>  TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
>   bool vect_epilogues = (!simdlen
>  && loop->inner == NULL
>  && param_vect_epilogues_nomask
>  && LOOP_VINFO_PEELING_FOR_NITER
> (first_loop_vinfo)
>  && !loop->simduid);
> 
> and add !LOOP_VINFO_EARLY_BREAKS?
> 
> > +}
> > +
> >bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> >poly_uint64 bound_scalar = bound_epilog;
> >
> > @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >   bound_prolog + bound_epilog)
> >   : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> >  || vect_epilogues));
> > +
> > +  /* We only support early break vectorization on known bounds at this 
> > time.
> > + This means that if the vector loop can't be entered then we won't 
> > generate
> > + it at all.  So for now force skip_vector off because the additional 
> > control
> > + flow messes with the BB exits and we've already analyzed them.  */
> > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > +
> 
>   bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>   ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
>   bound_prolog + bound_epilog)
>   : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>  || vect_epilogues));
> 
> to me that looks like
> 
> >   gcc_assert (!skip_vector || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo));

RE: [PATCH]middle-end: Fix peeled vect loop IV values.

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > Hi All,
> > >
> > > While waiting for reviews I found this case where both loop exits need
> > > to go to the epilogue loop, but there was an IV-related variable that
> > > was used in the scalar iteration as well.
> > >
> > > vect_update_ivs_after_vectorizer then blew the value away and replaced it 
> > > with
> > > the value if it took the normal exit.
> > >
> > > For these cases where we've peeled a vector iteration, we should skip
> > > vect_update_ivs_after_vectorizer since all exits are "alternate" exits.
> > >
> > > For this to be correct we have peeling put in the right LCSSA variables so
> > > vectorizable_live_operations takes care of it.
> > >
> > > This is triggered by new testcases 79 and 80 in early break testsuite
> > > and I'll merge this commit in the main one.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > >   Put right LCSSA var for peeled vect loops.
> > >   (vect_do_peeling): Skip vect_update_ivs_after_vectorizer.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > 7d48502e2e46240553509dfa6d75fcab7fea36d3..bfdbeb7faaba29aad51c0561d
> > ace680c96759484 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1668,6 +1668,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> > *loop, edge loop_exit,
> > >edge loop_entry = single_succ_edge (new_preheader);
> > >if (flow_loops)
> > >   {
> > > +   bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
> > > /* Link through the main exit first.  */
> > > for (auto gsi_from = gsi_start_phis (loop->header),
> > >  gsi_to = gsi_start_phis (new_loop->header);
> > > @@ -1692,11 +1693,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop, edge loop_exit,
> > > continue;
> > >   }
> > >   }
> > > +   /* If we have multiple exits and the vector loop is peeled then we
> > > +  need to use the value at start of loop.  */
> > 
> > This comment doesn't really match 'peeled_iters'?  Iff the main IV exit
> > source isn't loop->latch then won't we miscompute?  I realize the
> > complication is that slpeel_tree_duplicate_loop_to_edge_cfg is used from
> > elsewhere as well (so we can't check LOOP_VINFO_EARLY_BREAKS_VECT_PEELED).
> > 
> 
> No, because in both exits we restart the scalar iteration at the start of
> the last vector iteration.
> Normally, the counted main exit would be updated by vect_iters_bound_vf - vf,
> which is the same as the induction value should we get to the final
> iteration.
> 
> More on it in your question below.
> 
> > > +   if (peeled_iters)
> > > + {
> > > +   tree tmp_arg = gimple_phi_result (from_phi);
> > > +   if (!new_phi_args.get (tmp_arg))
> > > + new_arg = tmp_arg;
> > > + }
> > >
> > > tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> > >
> > > -   /* Main loop exit should use the final iter value.  */
> > > +   /* Otherwise, main loop exit should use the final iter value.  */
> > > SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
> > >
> > >
> > adjust_phi_and_debug_stmts
> > (to_phi, loop_entry, new_res);
> > > @@ -3394,9 +3403,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >   update_e = single_succ_edge (e->dest);
> > >
> > > -  /* Update the main exit.  */
> > > -  vect_update_ivs_after_vectorizer (loop_vinfo, 
> > > niters_vector_mult_vf,
> > > - update_e);
> > > +  /* If we have a peeled vector iteration, all exits are the same, 
> > > leave it
> > > +  and so the main exit needs to be treated the same as the alternative
> > > +  exits in that we leave their updates to vectorizable_live_operations.
> > > +  */
> > > +  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > + vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > +   update_e);
> > 
> > and now we don't update the main exit?  What's
> > LOOP_VINFO_EARLY_BREAKS_VECT_PEELED again vs.
> > LOOP_VINFO_EARLY_BREAKS?
> 
> So LOOP_VINFO_EARLY_BREAKS essentially says that there are multiple exits
> that we can vectorize.
> 
> LOOP_VINFO_EARLY_BREAKS_VECT_PEELED is saying that in this loop, we've
> picked as the main exit not the loop's latch-connected exit.  This means
> when we exit from the "main" exit in the final iteration we may still have
> side effects to perform, and so the final iteration should be res

RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, December 6, 2023 8:18 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH 13/21]middle-end: Update loop form analysis to support 
> > early
> > break
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
> > > patches are self contained.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > >   (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > >   (vect_transform_loop): Use it.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991
> > f07cd6052491d0 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> > (loop_vec_info loop_vinfo)
> > >loop_vinfo->scalar_costs->finish_cost (nullptr);
> > >  }
> > >
> > > -
> > >  /* Function vect_analyze_loop_form.
> > >
> > > Verify that certain CFG restrictions hold, including:
> > > - the loop has a pre-header
> > > -   - the loop has a single entry and exit
> > > +   - the loop has a single entry
> > > +   - nested loops can have only a single exit.
> > > - the loop exit condition is simple enough
> > > - the number of iterations can be analyzed, i.e, a countable loop.  
> > > The
> > >   niter could be analyzed under some assumptions.  */
> > > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > >  "not vectorized: latch block not empty.\n");
> > >
> > >/* Make sure the exit is not abnormal.  */
> > > -  if (exit_e->flags & EDGE_ABNORMAL)
> > > -return opt_result::failure_at (vect_location,
> > > -"not vectorized:"
> > > -" abnormal loop exit edge.\n");
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +  for (edge e : exits)
> > 
> > Seeing this multiple times, this isn't the most efficient way to
> > iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.
> > 
> > Note to myself: fix (add to) the API.
> > 
> > > +{
> > > +  if (e->flags & EDGE_ABNORMAL)
> > > + return opt_result::failure_at (vect_location,
> > > +"not vectorized:"
> > > +" abnormal loop exit edge.\n");
> > > +}
> > >
> > >info->conds
> > >  = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop,
> > vec_info_shared *shared,
> > >
> > >LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > >
> > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > += !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > +
> > 
> > Seeing this, s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
> > might be good, if we in future avoid if-conversion in a separate
> > pass we will have other CONDs as well.
> > 
> > >if (info->inner_loop_cond)
> > >  {
> > >stmt_vec_info inner_loop_cond_info
> > > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> > gimple *loop_vectorized_call)
> > >/* Make sure there exists a single-predecessor exit bb.  Do this before
> > >   versioning.   */
> > >edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > -  if (! single_pred_p (e->dest))
> > > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >  {
> > >split_loop_exit_edge (e, true);
> > 
> > Note this splitting is done to fulfil versioning constraints on CFG
> > update.  Do you have test coverage with alias versioning and early
> > breaks?
> 
> No, only non-alias versioning.  I don't believe we can alias in the current
> implementation because it's restricted to statically known objects with
> a fixed size.

Hm, if side-effects are all correctly in place do we still have that
restriction?

int x;
void foo (int *a, int *b)
{
  int local_x = x;
  for (int i = 0; i < 1024; ++i)
{
  if (i % local_x == 13)
break;
  a[i] = 2 * b[i];
}
}

the early exit isn't SCEV analyzable but doesn't depend on any
memory and all side-effects are after the exit already.  But
vectorizing would require alias versioning.

Richard.


[PATCH] treat argp-based mem as frame related in dse

2023-12-06 Thread Jiufu Guo
Hi,

The issue mentioned in PR112525 can be handled by updating dse.cc to treat
arg_pointer_rtx similarly to frame_pointer_rtx.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271#c10 also mentioned
this idea.

One thing: the argp area may be used to pass arguments to the callee, so we
need to check whether call insns are using that mem.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)


PR rtl-optimization/112525

gcc/ChangeLog:

* dse.cc (get_group_info): Add arg_pointer_rtx as frame_related.
(check_mem_read_rtx): Add parameter to indicate if it is checking mem
for call insn.
(scan_insn): Add mem checking on call usage.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr112525.c: New test.

---
 gcc/dse.cc  | 17 +
 gcc/testsuite/gcc.target/powerpc/pr112525.c | 15 +++
 2 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr112525.c

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 1a85dae1f8c..f43b7bf8ba6 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -682,7 +682,8 @@ get_group_info (rtx base)
   gi->group_kill = BITMAP_ALLOC (&dse_bitmap_obstack);
   gi->process_globally = false;
   gi->frame_related =
-   (base == frame_pointer_rtx) || (base == hard_frame_pointer_rtx);
+   (base == frame_pointer_rtx) || (base == hard_frame_pointer_rtx)
+   || (base == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM]);
   gi->offset_map_size_n = 0;
   gi->offset_map_size_p = 0;
   gi->offset_map_n = NULL;
@@ -2157,7 +2158,7 @@ replace_read (store_info *store_info, insn_info_t 
store_insn,
be active.  */
 
 static void
-check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
+check_mem_read_rtx (rtx *loc, bb_info_t bb_info, bool used_in_call = false)
 {
   rtx mem = *loc, mem_addr;
   insn_info_t insn_info;
@@ -2302,7 +2303,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
 stored, rewrite the read.  */
  else
{
- if (store_info->rhs
+ if (!used_in_call
+ && store_info->rhs
  && known_subrange_p (offset, width, store_info->offset,
   store_info->width)
  && all_positions_needed_p (store_info,
@@ -2368,7 +2370,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
 
  /* If this read is just reading back something that we just
 stored, rewrite the read.  */
- if (store_info->rhs
+ if (!used_in_call
+ && store_info->rhs
  && store_info->group_id == -1
  && store_info->cse_base == base
  && known_subrange_p (offset, width, store_info->offset,
@@ -2650,6 +2653,12 @@ scan_insn (bb_info_t bb_info, rtx_insn *insn, int 
max_active_local_stores)
that is not relative to the frame.  */
 add_non_frame_wild_read (bb_info);
 
+  for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
+  link != NULL_RTX;
+  link = XEXP (link, 1))
+   if (GET_CODE (XEXP (link, 0)) == USE && MEM_P (XEXP (XEXP (link, 0),0)))
+ check_mem_read_rtx (&XEXP (XEXP (link, 0),0), bb_info, true);
+
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr112525.c 
b/gcc/testsuite/gcc.target/powerpc/pr112525.c
new file mode 100644
index 000..428598188e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112525.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+
+typedef struct teststruct
+{
+  double d;
+  int arr[15]; 
+} teststruct;
+
+void
+foo (teststruct p)
+{
+}
+
+/* { dg-final { scan-assembler-not {\mstd\M} } } */
-- 
2.25.1



RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break

2023-12-06 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, December 6, 2023 9:15 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 13/21]middle-end: Update loop form analysis to support 
> early
> break
> 
> On Wed, 6 Dec 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, December 6, 2023 8:18 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > > Subject: Re: [PATCH 13/21]middle-end: Update loop form analysis to support
> early
> > > break
> > >
> > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the
> other
> > > > patches are self contained.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > > > (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > > (vect_transform_loop): Use it.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991
> > > f07cd6052491d0 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> > > (loop_vec_info loop_vinfo)
> > > >loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > >  }
> > > >
> > > > -
> > > >  /* Function vect_analyze_loop_form.
> > > >
> > > > Verify that certain CFG restrictions hold, including:
> > > > - the loop has a pre-header
> > > > -   - the loop has a single entry and exit
> > > > +   - the loop has a single entry
> > > > +   - nested loops can have only a single exit.
> > > > - the loop exit condition is simple enough
> > > > - the number of iterations can be analyzed, i.e, a countable loop.  
> > > > The
> > > >   niter could be analyzed under some assumptions.  */
> > > > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >"not vectorized: latch block not 
> > > > empty.\n");
> > > >
> > > >/* Make sure the exit is not abnormal.  */
> > > > -  if (exit_e->flags & EDGE_ABNORMAL)
> > > > -return opt_result::failure_at (vect_location,
> > > > -  "not vectorized:"
> > > > -  " abnormal loop exit edge.\n");
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  for (edge e : exits)
> > >
> > > Seeing this multiple times, this isn't the most efficient way to
> > > iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.
> > >
> > > Note to myself: fix (add to) the API.
> > >
> > > > +{
> > > > +  if (e->flags & EDGE_ABNORMAL)
> > > > +   return opt_result::failure_at (vect_location,
> > > > +  "not vectorized:"
> > > > +  " abnormal loop exit edge.\n");
> > > > +}
> > > >
> > > >info->conds
> > > >  = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > > > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop,
> > > vec_info_shared *shared,
> > > >
> > > >LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > > >
> > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > += !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > +
> > >
> > > Seeing this,
> s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
> > > might be good, if we in future avoid if-conversion in a separate
> > > pass we will have other CONDs as well.
> > >
> > > >if (info->inner_loop_cond)
> > > >  {
> > > >stmt_vec_info inner_loop_cond_info
> > > > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info
> loop_vinfo,
> > > gimple *loop_vectorized_call)
> > > >/* Make sure there exists a single-predecessor exit bb.  Do this 
> > > > before
> > > >   versioning.   */
> > > >edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > -  if (! single_pred_p (e->dest))
> > > > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo))
> > > >  {
> > > >split_loop_exit_edge (e, true);
> > >
> > > Note this splitting is done to fulfil versioning constraints on CFG
> > > update.  Do you have test coverage with alias versioning and early
> > > breaks?
> >
> > No, only non-alias versioning.  I don't believe we can alias in the current
> > implementation because it's restricted to statically known objects with
> > a fixed size.
> 
> Hm, if side-effects are all correctly in place do we still have that
> restriction?

RE: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, December 6, 2023 8:32 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH 12/21]middle-end: Add remaining changes to peeling and
> > vectorizer to support early breaks
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This finishes wiring that didn't fit in any of the other patches.
> > > Essentially just adding related changes so peeling for early break works.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
> > >   vect_do_peeling): Support early breaks.
> > >   * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
> > >   * tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7
> > 605b7a3d3ecdd7 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> > (class loop *loop,
> > > loop handles exactly VF scalars per iteration.  */
> > >
> > >  static gcond *
> > > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > exit_edge,
> > > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> > >   class loop *loop, tree niters, tree step,
> > >   tree final_iv, bool niters_maybe_zero,
> > >   gimple_stmt_iterator loop_cond_gsi)
> > > @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /*
> > loop_vinfo */, edge exit_edge,
> > >gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > >
> > >/* Record the number of latch iterations.  */
> > > -  if (limit == niters)
> > > +  if (limit == niters
> > > +  || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >  /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > > latch count.  */
> > >  loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >  bound_epilog += vf - 1;
> > >if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > >  bound_epilog += 1;
> > > +
> > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > + to find the element that caused the break.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +{
> > > +  bound_epilog = vf;
> > > +  /* Force a scalar epilogue as we can't vectorize the index 
> > > finding.  */
> > > +  vect_epilogues = false;
> > 
> > This is originally initialized with
> > 
> >   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> > 
> > so I think we should avoid filling that with LOOP_VINFO_EARLY_BREAKS
> > rather than fixing up after the fact?  That is in vect_analyze_loop
> > adjust
> > 
> >   /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> >  enabled, SIMDUID is not set, it is the innermost loop and we have
> >  either already found the loop's SIMDLEN or there was no SIMDLEN to
> >  begin with.
> >  TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
> >   bool vect_epilogues = (!simdlen
> >  && loop->inner == NULL
> >  && param_vect_epilogues_nomask
> >  && LOOP_VINFO_PEELING_FOR_NITER
> > (first_loop_vinfo)
> >  && !loop->simduid);
> > 
> > and add !LOOP_VINFO_EARLY_BREAKS?
> > 
> > > +}
> > > +
> > >bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > >poly_uint64 bound_scalar = bound_epilog;
> > >
> > > @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > > bound_prolog + bound_epilog)
> > > : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > >|| vect_epilogues));
> > > +
> > > +  /* We only support early break vectorization on known bounds at this 
> > > time.
> > > + This means that if the vector loop can't be entered then we won't 
> > > generate
> > > + it at all.  So for now force skip_vector off because the additional 
> > > control
> > > + flow messes with the BB exits and we've already analyzed them.  */
> > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > +
> > 
> >   bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >   ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >   

Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-06 Thread waffl3x






On Wednesday, December 6th, 2023 at 1:48 AM, Jakub Jelinek  
wrote:


> 
> 
> On Wed, Dec 06, 2023 at 07:33:21AM +, waffl3x wrote:
> 
> > Here is the next version, it feels very close to finished. As before, I
> > haven't ran a bootstrap or the full testsuite yet but I did run the
> > explicit-obj tests which completed as expected.
> > 
> > There's a few test cases that still need to be written but more tests
> > can always be added. The behavior added by CWG2789 works in at least
> > one case, but I have not added tests for it yet. The test cases for
> > dependent lambda expressions need to be fleshed out more, but a few
> > temporary ones are included to demonstrate that they do work and that
> > the crash is fixed. Explicit object conversion functions work, but I
> > need to add fleshed out tests for them, explicit-obj-basic5.C has that
> > test.
> > 
> > I'll start the tests now and report back if anything fails, I'm
> > confident everything will be fine though.
> > 
> > Alex
> 
> > From 937e12c57145bfd878a0bc4cd9735c2d3c4fcf22 Mon Sep 17 00:00:00 2001
> > From: Waffl3x waff...@protonmail.com
> > Date: Tue, 5 Dec 2023 23:16:01 -0700
> > Subject: [PATCH] P0847R7 (Deducing This) [PR102609]
> >
> > Another quick and dirty patch for review, hopefully the last.
> >
> > gcc/cp/ChangeLog:
> 
> 
> Please add
> PR c++/102609
> line above this.
> 
> > * call.cc (build_this_conversion):
> 
> 
> Note, for the final submission, all the ):
> should be followed by descriptions what has changed in there (but not why).

Yeah, I remember the drill, it just takes me a long time so I've been
slacking.

> Plus it would be good to mention somewhere early in the cp/ChangeLog
> entry that the patch implements C++23 P0847R7 - Deducing this paper
> (unfortunately the ChangeLog verifier doesn't allow such free text above
> the ChangeLog entry where it used to be written some years ago,
> only allows there the PR line; I usually put such text after the ):
> of the first entry now and only after it write exactly what changed
> in that function. Does the patch also implement CWG2586?

Oh jeez, I had been doing it the way you're saying is rejected.
Shouldn't the ChangeLog verifier be changed to allow this?

The patch does not implement CWG2586 at this time. I couldn't determine
if it were ready to go or not. I have a skeleton of tests for it that I
never finished, but as far as I know the implementation does conform to
CWG2789; this just happened to be how it worked out.

> 
> Also, I don't see in the patch the expected
> gcc/c-family/
> * c-cppbuiltin.cc (c_cpp_builtins): Predefine
> __cpp_explicit_this_parameter=202110L for C++23.
> part plus gcc/testsuite/cpp{23,26}/feat-cxx*.C additions checking
> for that macro presence and its value.
> 
> Jakub

Yeah, I was meaning to look into how to do that. I originally added the
test and then never included it in any of the patches, or that's what I
remember anyway. This saves me the work though, I'll be sure to add
that.

Alex



RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > is the exit edge you are looking for without iterating over all loop 
> > > > exits.
> > > >
> > > > > + gimple *tmp_vec_stmt = vec_stmt;
> > > > > + tree tmp_vec_lhs = vec_lhs;
> > > > > + tree tmp_bitstart = bitstart;
> > > > > + /* For early exit where the exit is not in the BB that 
> > > > > leads
> > > > > +to the latch then we're restarting the iteration in 
> > > > > the
> > > > > +scalar loop.  So get the first live value.  */
> > > > > + restart_loop = restart_loop || exit_e != main_e;
> > > > > + if (restart_loop)
> > > > > +   {
> > > > > + tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > > + tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > > + tmp_bitstart = build_zero_cst (TREE_TYPE 
> > > > > (bitstart));
> > > >
> > > > Hmm, that gets you the value after the first iteration, not the one 
> > > > before which
> > > > would be the last value of the preceeding vector iteration?
> > > > (but we don't keep those, we'd need a PHI)
> > >
> > > I don't fully follow.  The comment on top of this hunk under if 
> > > (loop_vinfo) states
> > > that lhs should be pointing to a PHI.
> > >
> > > When I inspect the statement I see
> > >
> > > i_14 = PHI 
> > >
> > > so i_14 is the value at the start of the current iteration.  If we're
> > > coming from the header it's 0, otherwise i_11, which is the value of
> > > the previous iteration?
> > >
> > > The peeling code explicitly leaves i_14 in the merge block and not i_11 
> > > for this
> > exact reason.
> > > So I'm confused, my understanding is that we're already *at* the right 
> > > PHI.
> > >
> > > Is it perhaps that you thought we put i_11 here for the early exits? In 
> > > which case
> > > Yes I'd agree that that would be wrong, and there we would have had to 
> > > look at
> > > The defs, but i_11 is the def.
> > >
> > > I already kept this in mind and leveraged peeling to make this part 
> > > easier.
> > > i_11 is used in the main exit and i_14 in the early one.
> > 
> > I think the important detail is that this code is only executed for
> > vect_induction_defs which are indeed PHIs and so we're sure the
> > value live is before any modification so fine to feed as initial
> > value for the PHI in the epilog.
> > 
> > Maybe we can assert the def type here?
> 
> We can't assert because until cfg cleanup the dead value is still seen and 
> still
> vectorized.  That said I've added a guard here.  We vectorize the 
> non-induction
> value as normal now and if it's ever used it'll fail.
> 
> > 
> > > >
> > > > Why again do we need (non-induction) live values from the vector loop
> > > > to the epilogue loop?
> > >
> > > They can appear as the result value of the main exit.
> > >
> > > e.g. in testcase (vect-early-break_17.c)
> > >
> > > #define N 1024
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >vect_b[i] = x + i;
> > >if (vect_a[i] > x)
> > >  return vect_a[i];
> > >vect_a[i] = x;
> > >ret = vect_a[i] + vect_b[i];
> > >  }
> > >  return ret;
> > > }
> > >
> > > The only situation they can appear in as an early break is when
> > > we have a case where the main exit != the latch-connected exit.
> > >
> > > However in these cases they are unused, and only there because
> > > normally you would have exited (i.e. there was a return) but the
> > > vector loop needs to start over so we ignore it.
> > >
> > > These happen in testcase vect-early-break_74.c and
> > > vect-early-break_78.c
> > 
> > Hmm, so in that case their value is incorrect (but doesn't matter,
> > we ignore it)?
> > 
> 
> Correct, they're placed there due to exit redirection, but in these inverted
> testcases where we've peeled the vector iteration you can't ever skip the
> epilogue.  So they are guaranteed not to be used.
> 
> > > > > + gimple_stmt_iterator exit_gsi;
> > > > > + tree new_tree
> > > > > +   = vectorizable_live_operation_1 (loop_vinfo, 
> > > > > stmt_info,
> > > > > +exit_e, vectype, 
> > > > > ncopies,
> > > > > +slp_node, bitsize,
> > > > > +tmp_bitstart, 
> > > > > tmp_vec_lhs,
> > > > > +lhs_type, 
> > > > > restart_loop,
> > > > > +&exit_gsi);
> > > > > +
> > > > > + /* Use the empty block on the exit to materialize the 
> > > > > new
> > > > stmts
> > > > > +so we can then update the PHI here.  */
> > > > > + if (gimple_phi_num_args (use_stmt) == 1)
> > > > > +   {
> > >

RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > +
> > > > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > > > + TYPE_MODE (truth_type);  int ncopies;
> > > > +
> > 
> > more line break issues ... (also below, check yourself)
> > 
> > shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> > it looks to be set wrongly (or shouldn't be set at all)
> > 
> 
> Fixed, I now leverage the existing vect_recog_bool_pattern to update the types
> if needed and determine the initial type in vect_get_vector_types_for_stmt.
> 
> > > > +  if (slp_node)
> > > > +ncopies = 1;
> > > > +  else
> > > > +ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > > +
> > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> > > > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> > 
> > what about with_len?
> 
> Should be easy to add, but don't know how it works.
> 
> > 
> > > > +  /* Analyze only.  */
> > > > +  if (!vec_stmt)
> > > > +{
> > > > +  if (direct_optab_handler (cbranch_optab, mode) == 
> > > > CODE_FOR_nothing)
> > > > +   {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +  "can't vectorize early exit because the "
> > > > +  "target doesn't support flag setting 
> > > > vector "
> > > > +  "comparisons.\n");
> > > > + return false;
> > > > +   }
> > > > +
> > > > +  if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> > 
> > Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> > emitting
> > 
> >  mask = op0 CMP op1;
> >  if (mask != 0)
> > 
> > I think you need to check for CMP, not NE_EXPR.
> 
> Well CMP is checked by vectorizable_comparison_1, but I realized this
> check is not checking what I wanted and the cbranch requirements
> already do.  So removed.
> 
> > 
> > > > +   {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +  "can't vectorize early exit because the "
> > > > +  "target does not support boolean vector "
> > > > +  "comparisons for type %T.\n", 
> > > > truth_type);
> > > > + return false;
> > > > +   }
> > > > +
> > > > +  if (ncopies > 1
> > > > + && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > > +   {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +  "can't vectorize early exit because the "
> > > > +  "target does not support boolean vector 
> > > > OR for "
> > > > +  "type %T.\n", truth_type);
> > > > + return false;
> > > > +   }
> > > > +
> > > > +  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, 
> > > > code, gsi,
> > > > + vec_stmt, slp_node, cost_vec))
> > > > +   return false;
> > 
> > I suppose vectorizable_comparison_1 will check this again, so the above
> > is redundant?
> > 
> 
> The IOR? No, vectorizable_comparison_1 doesn't reduce so may not check it
> depending on the condition.
> 
> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +{
> > > > +  /* We build the reductions in a way to maintain as much 
> > > > parallelism as
> > > > +possible.  */
> > > > +  auto_vec<tree> workset (stmts.length ());
> > > > +  workset.splice (stmts);
> > > > +  while (workset.length () > 1)
> > > > +   {
> > > > + new_temp = make_temp_ssa_name (truth_type, NULL,
> > > > "vexit_reduc");
> > > > + tree arg0 = workset.pop ();
> > > > + tree arg1 = workset.pop ();
> > > > + new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > > > arg1);
> > > > + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +  &cond_gsi);
> > > > + if (slp_node)
> > > > +   slp_node->push_vec_def (new_stmt);
> > > > + else
> > > > +   STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > > + workset.quick_insert (0, new_temp);
> > 
> > Reduction epilogue handling has similar code to reduce a set of vectors
> > to a single one with an operation.  I think we want to share that code.
> > 
> 
> I've taken a look but that code isn't suitable here since they have different
> constraints.  I don't require an in-order reduction since for the comparison
> all we care about is whether in a lane any bit is set or not.  This means:
> 
> 1. we can reduce using a fast operation like IOR.
> 2. we can reduce in as much parallelism as possible.
> 
> The comp

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-06 Thread Hao Liu OS
Hi,

Update the patch to fix problems in the test case:
 - add "-details" option to the dump command
 - add dg-require and target filters to avoid potential failures on platforms 
that don't support vectorization.

Thanks,
-Hao

gcc/ChangeLog:

PR tree-optimization/112774
* tree-pretty-print.cc (dump_generic_node): If the nonwrapping flag is
set, print the chrec with additional <nw> info.
* tree-scalar-evolution.cc: Add record_nonwrapping_chrec and
nonwrapping_chrec_p to set and check the new flag respectively.
* tree-scalar-evolution.h: Likewise.
* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
scev_probably_wraps_p): Call record_nonwrapping_chrec before
record_nonwrapping_iv; call nonwrapping_chrec_p to check that the flag
is set and return false from scev_probably_wraps_p.
* tree-vect-loop.cc (vect_analyze_loop): Call
free_numbers_of_iterations_estimates explicitly.
* tree.h: Add CHREC_NOWRAP (NODE); base.nothrow_flag is used
to represent the nonwrapping info.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/scev-16.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++
 gcc/tree-pretty-print.cc|  2 +-
 gcc/tree-scalar-evolution.cc| 24 
 gcc/tree-scalar-evolution.h |  2 ++
 gcc/tree-ssa-loop-niter.cc  | 21 -
 gcc/tree-vect-loop.cc   |  4 
 gcc/tree.h  |  8 +---
 7 files changed, 70 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 000..120f40c0b6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+  int sum = 0;
+
+  for (unsigned i = 0; i < N; i++)
+sum += A[i + offset];
+
+  return sum;
+}
+
+/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 1fadd752d05..0dabb6d1580 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
   dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
   pp_string (pp, ", +, ");
   dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
-  pp_string (pp, "}_");
+  pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
   pp_scalar (pp, "%u", CHREC_VARIABLE (node));
   is_stmt = false;
   break;
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index f61277c32df..81630603c12 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
   return res;
 }

+/* If CHREC doesn't overflow, set the nonwrapping flag.  */
+
+void record_nonwrapping_chrec (tree chrec)
+{
+  CHREC_NOWRAP (chrec) = 1;
+
+  if (dump_file && (dump_flags & TDF_SCEV))
+{
+  fprintf (dump_file, "(record_nonwrapping_chrec: ");
+  print_generic_expr (dump_file, chrec);
+  fprintf (dump_file, ")\n");
+}
+}
+
+/* Return true if CHREC's nonwrapping flag is set.  */
+
+bool nonwrapping_chrec_p (tree chrec)
+{
+  if (!chrec || TREE_CODE (chrec) != POLYNOMIAL_CHREC)
+return false;
+
+  return CHREC_NOWRAP (chrec);
+}
+
 /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */

 static tree
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index a64ed78fe63..f57fde12ee2 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, 
struct affine_iv *,
   bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void record_nonwrapping_chrec (tree);
+extern bool nonwrapping_chrec_p (tree);

 /* Returns the basic block preceding LOOP, or the CFG entry block when
the loop is function's body.  */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 2098bef9a97..d465e0ed7e1 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)

   /* If access is not executed on every iteration, we must ensure that overflow
  may

RE: [PATCH 10/21]middle-end: implement relevancy analysis support for control flow

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > + && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
> > > > +   *relevant = vect_used_in_scope;
> > 
> > but why not simply mark all gconds as vect_used_in_scope?
> > 
> 
> We break outer-loop vectorization since doing so would pull the inner loop's
> exit into scope for the outer loop.  Also we can't force the loop's main IV
> exit to be in scope, since it will be replaced by the vectorizer.
> 
> I've updated the code to remove the quadratic lookup.
> 
> > > > +}
> > > >
> > > >/* changing memory.  */
> > > >if (gimple_code (stmt_info->stmt) != GIMPLE_PHI) @@ -374,6 +379,11 @@
> > > > vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > *relevant = vect_used_in_scope;
> > > >}
> > > >
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);  auto_bitmap
> > > > + exit_bbs;  for (edge exit : exits)
> > 
> > is it your mail client messing patches up?  missing line-break
> > again.
> > 
> 
> Yeah, seems it was, hopefully fixed now.
> 
> > > > +bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > +
> > 
> > you don't seem to use the bitmap?
> > 
> > > >/* uses outside the loop.  */
> > > >FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > > SSA_OP_DEF)
> > > >  {
> > > > @@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > >   /* We expect all such uses to be in the loop exit phis
> > > >  (because of loop closed form)   */
> > > >   gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > - gcc_assert (bb == single_exit (loop)->dest);
> > > >
> > > >*live_p = true;
> > > > }
> > > > @@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
> > > > loop_vinfo, bool *fatal)
> > > > return res;
> > > > }
> > > >   }
> > > > +   }
> > > > + else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > +   {
> > > > + enum tree_code rhs_code = gimple_cond_code (cond);
> > > > + gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > + opt_result res
> > > > +   = process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > +  loop_vinfo, relevant, &worklist, false);
> > > > + if (!res)
> > > > +   return res;
> > > > + res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > +   loop_vinfo, relevant, &worklist, false);
> > > > + if (!res)
> > > > +   return res;
> > > >  }
> > 
> > I guess we're missing an
> > 
> >   else
> > gcc_unreachable ();
> > 
> > to catch not handled stmt kinds (do we have gcond patterns yet?)
> > 
> > > >   else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > > {
> > > > @@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  node_instance, cost_vec);
> > > >if (!res)
> > > > return res;
> > > > -   }
> > > > +}
> > > > +
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > 
> > I think it should rather be vect_condition_def.  It's also not
> > this functions business to set STMT_VINFO_DEF_TYPE.  If we ever
> > get to handle not if-converted code (or BB vectorization of that)
> > then a gcond would define the mask stmts are under.
> > 
> 
> Hmm sure, I've had to place it in multiple other places but moved it
> away from here.  The main ones are set during dataflow analysis when
> we determine which statements need to be moved.

I'd have set it where we set STMT_VINFO_TYPE on conds to
loop_exit_ctrl_vec_info_type.

The patch below has it in vect_mark_pattern_stmts only?  Guess it's
in the other patch(es) now.

> > > >switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > >  {
> > > >case vect_internal_def:
> > > > +  case vect_early_exit_def:
> > > >  break;
> > > >
> > > >case vect_reduction_def:
> > > > @@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  {
> > > >gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > >gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > + || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > >   || (call && gimple_call_lhs (call) == NULL_TREE));
> > > >*need_to_vectorize = true;
> > > >  }
> > > > @@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo,
> > > > stmt_vec_info stmt, slp_tree slp_node,
> > > >   else
> > > > *op = gimple_op (ass, operand + 1);
> > > > }
> > > > +  else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> > > > +   {
> > > > + gimple_match_op m_op;
> > > > +

[PATCH] libgcc: Avoid -Wbuiltin-declaration-mismatch warnings in emutls.c

2023-12-06 Thread Jakub Jelinek
Hi!

When libgcc is being built in --disable-tls configuration or on
a target without native TLS support, one gets annoying warnings:
../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in 
function ‘__emutls_get_address’; expected ‘void *(void *)’ 
[-Wbuiltin-declaration-mismatch]
   61 | void *__emutls_get_address (struct __emutls_object *);
  |   ^~~~
../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in 
function ‘__emutls_register_common’; expected ‘void(void *, unsigned int,  
unsigned int,  void *)’ [-Wbuiltin-declaration-mismatch]
   63 | void __emutls_register_common (struct __emutls_object *, word, word, 
void *);
  |  ^~~~
../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in 
function ‘__emutls_get_address’; expected ‘void *(void *)’ 
[-Wbuiltin-declaration-mismatch]
  140 | __emutls_get_address (struct __emutls_object *obj)
  | ^~~~
../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in 
function ‘__emutls_register_common’; expected ‘void(void *, unsigned int,  
unsigned int,  void *)’ [-Wbuiltin-declaration-mismatch]
  204 | __emutls_register_common (struct __emutls_object *obj,
  | ^~~~
The thing is that in that case __emutls_get_address and
__emutls_register_common are builtins, and are declared with void *
arguments rather than struct __emutls_object *.
Now, struct __emutls_object is a type private to libgcc/emutls.c, and the
middle-end creates a similar structure on demand when calling the builtins
(with small differences, like not having the union in there).

We have a precedent for this e.g. for fprintf or strftime builtins where
the builtins are created with magic fileptr_type_node or const_tm_ptr_type_node
types and then match it with user definition of pointers to some structure,
but I think for this case users should never define these functions
themselves nor call them and having special types for them in the compiler
would mean extra compile time spent during compiler initialization and more
GC data, so I think it is better to keep the compiler as is.

On the library side, there is an option to just follow what the
compiler is doing and do
 EMUTLS_ATTR void
-__emutls_register_common (struct __emutls_object *obj,
+__emutls_register_common (void *xobj,
   word size, word align, void *templ)
 {
+  struct __emutls_object *obj = (struct __emutls_object *) xobj;
but that will make e.g. libabigail complain about ABI change in libgcc.

So, the patch just turns the warning off.

Tested on x86_64-linux with --disable-tls, ok for trunk?

2023-12-06  Thomas Schwinge  
Jakub Jelinek  

PR libgcc/109289
* emutls.c: Add GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
pragma.

--- libgcc/emutls.c.jj  2023-01-16 11:52:16.780723793 +0100
+++ libgcc/emutls.c 2023-12-06 10:49:46.438060090 +0100
@@ -57,6 +57,14 @@ struct __emutls_array
 #  define EMUTLS_ATTR
 #endif
 
+/* __emutls_get_address and __emutls_register_common are registered as
+   builtins, but the compiler struct __emutls_object doesn't have
+   a union in there and is only created when actually needed for
+   the calls to the builtins, so the builtins are created with void *
+   arguments rather than struct __emutls_object *.  Avoid
+   -Wbuiltin-declaration-mismatch warnings.  */
+#pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
+
 EMUTLS_ATTR
 void *__emutls_get_address (struct __emutls_object *);
 EMUTLS_ATTR

Jakub



Re: [PATCH v7] Introduce attribute sym_alias

2023-12-06 Thread Jan Hubicka
> On Nov 30, 2023, Jan Hubicka  wrote:
> 
> >> +  if (VAR_P (replaced))
> >> +  varpool_node::create_alias (sym_node->decl, replacement);
> >> +  else
> >> +  cgraph_node::create_alias (sym_node->decl, replacement);
> 
> Unfortunately, this change didn't work.  Several of the C++ tests
> regressed with it.  Going back to same-body aliases, they work.
> 
> I suspect this may have to do with the infrastructure put in to deal
> with cdtors clones.

Do you have a short testcase for this?  The main oddities with same-body
aliases come from the fact that the C++ FE creates them early during
parsing, before all declaration flags are finished.

Later we do:

  /* Ugly, but the fixup cannot happen at a time same body alias is created;
 C++ FE is confused about the COMDAT groups being right.  */
  if (symtab->cpp_implicit_aliases_done)
FOR_EACH_SYMBOL (node)
  if (node->cpp_implicit_alias)
  node->fixup_same_cpp_alias_visibility (node->get_alias_target ());

Fixup copies some flags, such as inline flags, visibility and comdat
groups, which can change during the parsing process.

Since you produce aliases late, at finalization time, I do not see how
this could be relevant.  Perhaps unless you manage to copy wrong flags
from implicit aliases before the fixup happens, which would be a simple
ordering problem.

Honza
> 
> I've also found some discrepancies between C and C++ WRT sym_alias in
> static local variables, and failure to detect and report symbol name
> clashes between sym_aliases and unrelated declarations.  Thanks, Joseph,
> for pushing me to consider other cases I hadn't thought of before :-)
> I'm going to look into these, but for now, the patch below gets a full
> pass, with these issues XFAILed.
> 
> 
> > The IPA bits are fine.  I will take a look on your second patch.
> 
> Thanks!
> 
> 
> Introduce attribute sym_alias
> 
> This patch introduces an attribute to add extra asm names (aliases)
> for a decl when its definition is output.  The main goal is to ease
> interfacing C++ with Ada, as C++ mangled names have to be named, and
> in some cases (e.g. when using stdint.h typedefs in function
> arguments) the symbol names may vary across platforms.
> 
> The attribute is usable in C and C++, presumably in all C-family
> languages.  It can be attached to global variables and functions.  In
> C++, it can also be attached to class types, namespace-scoped
> variables and functions, static data members, member functions,
> explicit instantiations and specializations of template functions,
> members and classes.
> 
> When applied to constructors or destructors, additional sym aliases
> with _Base and _Del suffixes are defined for variants other than
> complete-object ones.  This changes the assumption that clones always
> carry the same attributes as their abstract declarations, so there is
> now a function to adjust them.
> 
> C++ also had a bug in which attributes from local extern declarations
> failed to be propagated to a preexisting corresponding
> namespace-scoped decl.  I've fixed that, and adjusted acc tests that
> distinguished between C and C++ in this regard.
> 
> Applying the attribute to class types is only valid in C++, and the
> effect is to attach the alias to the RTTI object associated with the
> class type.
> 
> for  gcc/ChangeLog
> 
>   * attribs.cc: Include cgraph.h.
>   (decl_attributes): Allow late introduction of sym_alias in
>   types.
>   (create_sym_alias_decl, create_sym_alias_decls): New.
>   * attribs.h: Declare them.
>   (FOR_EACH_SYM_ALIAS): New macro.
>   * cgraph.cc (cgraph_node::create): Create sym_alias decls.
>   * varpool.cc (varpool_node::get_create): Create sym_alias
>   decls.
>   * cgraph.h (symtab_node::remap_sym_alias_target): New.
>   * symtab.cc (symtab_node::remap_sym_alias_target): Define.
>   * cgraphunit.cc (cgraph_node::analyze): Create alias_target
>   node if needed.
>   (analyze_functions): Fixup visibility of implicit alias only
>   after its node is analyzed.
>   * doc/extend.texi (sym_alias): Document for variables,
>   functions and types.
> 
> for  gcc/ada/ChangeLog
> 
>   * doc/gnat_rm/interfacing_to_other_languages.rst: Mention
>   attribute sym_alias to give RTTI symbols mnemonic names.
>   * doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
>   aliases.  Fix incorrect ref to C1 ctor variant.
> 
> for  gcc/c-family/ChangeLog
> 
>   * c-ada-spec.cc (pp_asm_name): Use first sym_alias if
>   available.
>   * c-attribs.cc (handle_sym_alias_attribute): New.
>   (c_common_attribute_table): Add sym_alias.
>   (handle_copy_attribute): Do not copy sym_alias attribute.
> 
> for  gcc/c/ChangeLog
> 
>   * c-decl.cc (duplicate_decls): Remap sym_alias target.
> 
> for  gcc/cp/ChangeLog
> 
>   * class.cc (adjust_clone_attributes): New.
>   (copy_fndecl_with_name, build_clone): Call it.
>   * cp-tree.h (adjust_clone_attributes): Decla

Re: [PATCH v5] Introduce strub: machine-independent stack scrubbing

2023-12-06 Thread Jan Hubicka
Hi,
I am sorry for sending this late.  I think the IPA changes are generally
fine.  There are a few things which were not clear to me.
> for  gcc/ChangeLog
> 
>   * Makefile.in (OBJS): Add ipa-strub.o.
>   (GTFILES): Add ipa-strub.cc.
>   * builtins.def (BUILT_IN_STACK_ADDRESS): New.
>   (BUILT_IN___STRUB_ENTER): New.
>   (BUILT_IN___STRUB_UPDATE): New.
>   (BUILT_IN___STRUB_LEAVE): New.
>   * builtins.cc: Include ipa-strub.h.
>   (STACK_STOPS, STACK_UNSIGNED): Define.
>   (expand_builtin_stack_address): New.
>   (expand_builtin_strub_enter): New.
>   (expand_builtin_strub_update): New.
>   (expand_builtin_strub_leave): New.
>   (expand_builtin): Call them.
>   * common.opt (fstrub=*): New options.
>   * doc/extend.texi (strub): New type attribute.
>   (__builtin_stack_address): New function.
>   (Stack Scrubbing): New section.
>   * doc/invoke.texi (-fstrub=*): New options.
>   (-fdump-ipa-*): New passes.
>   * gengtype-lex.l: Ignore multi-line pp-directives.
>   * ipa-inline.cc: Include ipa-strub.h.
>   (can_inline_edge_p): Test strub_inlinable_to_p.
>   * ipa-split.cc: Include ipa-strub.h.
>   (execute_split_functions): Test strub_splittable_p.
>   * ipa-strub.cc, ipa-strub.h: New.
>   * passes.def: Add strub_mode and strub passes.
>   * tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
>   * tree-pass.h (make_pass_ipa_strub_mode): Declare.
>   (make_pass_ipa_strub): Declare.
>   (make_pass_ipa_function_and_variable_visibility): Fix
>   formatting.
>   * tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
>   before strub leave.
>   * multiple_target.cc (pass_target_clone::gate): Test seen_error.
>   * attribs.cc: Include ipa-strub.h.
>   (decl_attributes): Support applying attributes to function
>   type, rather than pointer type, at handler's request.
>   (comp_type_attributes): Combine strub_comptypes and target
>   comp_type results.
>   * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
>   (TARGET_STRUB_MAY_USE_MEMSET): New.
>   * doc/tm.texi: Rebuilt.
>   * cgraph.h (symtab_node::reset): Add preserve_comdat_group
>   param, with a default.
>   * cgraphunit.cc (symtab_node::reset): Use it.
> 
> for  gcc/c-family/ChangeLog
> 
>   * c-attribs.cc: Include ipa-strub.h.
>   (handle_strub_attribute): New.
>   (c_common_attribute_table): Add strub.
> 
> for  gcc/ada/ChangeLog
> 
>   * gcc-interface/trans.cc: Include ipa-strub.h.
>   (gigi): Make internal decls for targets of compiler-generated
>   calls strub-callable too.
>   (build_raise_check): Likewise.
>   * gcc-interface/utils.cc: Include ipa-strub.h.
>   (handle_strub_attribute): New.
>   (gnat_internal_attribute_table): Add strub.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * c-c++-common/strub-O0.c: New.
>   * c-c++-common/strub-O1.c: New.
>   * c-c++-common/strub-O2.c: New.
>   * c-c++-common/strub-O2fni.c: New.
>   * c-c++-common/strub-O3.c: New.
>   * c-c++-common/strub-O3fni.c: New.
>   * c-c++-common/strub-Og.c: New.
>   * c-c++-common/strub-Os.c: New.
>   * c-c++-common/strub-all1.c: New.
>   * c-c++-common/strub-all2.c: New.
>   * c-c++-common/strub-apply1.c: New.
>   * c-c++-common/strub-apply2.c: New.
>   * c-c++-common/strub-apply3.c: New.
>   * c-c++-common/strub-apply4.c: New.
>   * c-c++-common/strub-at-calls1.c: New.
>   * c-c++-common/strub-at-calls2.c: New.
>   * c-c++-common/strub-defer-O1.c: New.
>   * c-c++-common/strub-defer-O2.c: New.
>   * c-c++-common/strub-defer-O3.c: New.
>   * c-c++-common/strub-defer-Os.c: New.
>   * c-c++-common/strub-internal1.c: New.
>   * c-c++-common/strub-internal2.c: New.
>   * c-c++-common/strub-parms1.c: New.
>   * c-c++-common/strub-parms2.c: New.
>   * c-c++-common/strub-parms3.c: New.
>   * c-c++-common/strub-relaxed1.c: New.
>   * c-c++-common/strub-relaxed2.c: New.
>   * c-c++-common/strub-short-O0-exc.c: New.
>   * c-c++-common/strub-short-O0.c: New.
>   * c-c++-common/strub-short-O1.c: New.
>   * c-c++-common/strub-short-O2.c: New.
>   * c-c++-common/strub-short-O3.c: New.
>   * c-c++-common/strub-short-Os.c: New.
>   * c-c++-common/strub-strict1.c: New.
>   * c-c++-common/strub-strict2.c: New.
>   * c-c++-common/strub-tail-O1.c: New.
>   * c-c++-common/strub-tail-O2.c: New.
>   * c-c++-common/torture/strub-callable1.c: New.
>   * c-c++-common/torture/strub-callable2.c: New.
>   * c-c++-common/torture/strub-const1.c: New.
>   * c-c++-common/torture/strub-const2.c: New.
>   * c-c++-common/torture/strub-const3.c: New.
>   * c-c++-common/torture/strub-const4.c: New.
>   * c-c++-common/torture/strub-data1.c: New.
>   * c-c++-common/torture/strub-data2.c: New.
>   * c-c++-common/tor

Re: [PATCH] libgcc: Avoid -Wbuiltin-declaration-mismatch warnings in emutls.c

2023-12-06 Thread Richard Biener
On Wed, 6 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> When libgcc is being built in --disable-tls configuration or on
> a target without native TLS support, one gets annoying warnings:
> ../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in 
> function ‘__emutls_get_address’; expected ‘void *(void *)’ 
> [-Wbuiltin-declaration-mismatch]
>61 | void *__emutls_get_address (struct __emutls_object *);
>   |   ^~~~
> ../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in 
> function ‘__emutls_register_common’; expected ‘void(void *, unsigned int,  
> unsigned int,  void *)’ [-Wbuiltin-declaration-mismatch]
>63 | void __emutls_register_common (struct __emutls_object *, word, word, 
> void *);
>   |  ^~~~
> ../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in 
> function ‘__emutls_get_address’; expected ‘void *(void *)’ 
> [-Wbuiltin-declaration-mismatch]
>   140 | __emutls_get_address (struct __emutls_object *obj)
>   | ^~~~
> ../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in 
> function ‘__emutls_register_common’; expected ‘void(void *, unsigned int,  
> unsigned int,  void *)’ [-Wbuiltin-declaration-mismatch]
>   204 | __emutls_register_common (struct __emutls_object *obj,
>   | ^~~~
> The thing is that in that case __emutls_get_address and
> __emutls_register_common are builtins, and are declared with void *
> arguments rather than struct __emutls_object *.
> Now, struct __emutls_object is a type private to libgcc/emutls.c and the
> middle-end creates on demand when calling the builtins a similar structure
> (with small differences, like not having the union in there).
> 
> We have a precedent for this e.g. for fprintf or strftime builtins where
> the builtins are created with magic fileptr_type_node or 
> const_tm_ptr_type_node
> types and then match it with user definition of pointers to some structure,
> but I think for this case users should never define these functions
> themselves nor call them and having special types for them in the compiler
> would mean extra compile time spent during compiler initialization and more
> GC data, so I think it is better to keep the compiler as is.
> 
> On the library side, there is an option to just follow what the
> compiler is doing and do
>  EMUTLS_ATTR void
> -__emutls_register_common (struct __emutls_object *obj,
> +__emutls_register_common (void *xobj,
>word size, word align, void *templ)
>  {
> +  struct __emutls_object *obj = (struct __emutls_object *) xobj;
> but that will make e.g. libabigail complain about ABI change in libgcc.
> 
> So, the patch just turns the warning off.
> 
> Tested on x86_64-linux with --disable-tls, ok for trunk?

Works for me.

Richard.

> 2023-12-06  Thomas Schwinge  
>   Jakub Jelinek  
> 
>   PR libgcc/109289
>   * emutls.c: Add GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
>   pragma.
> 
> --- libgcc/emutls.c.jj2023-01-16 11:52:16.780723793 +0100
> +++ libgcc/emutls.c   2023-12-06 10:49:46.438060090 +0100
> @@ -57,6 +57,14 @@ struct __emutls_array
>  #  define EMUTLS_ATTR
>  #endif
>  
> +/* __emutls_get_address and __emutls_register_common are registered as
> +   builtins, but the compiler struct __emutls_object doesn't have
> +   a union in there and is only created when actually needed for
> +   the calls to the builtins, so the builtins are created with void *
> +   arguments rather than struct __emutls_object *.  Avoid
> +   -Wbuiltin-declaration-mismatch warnings.  */
> +#pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
> +
>  EMUTLS_ATTR
>  void *__emutls_get_address (struct __emutls_object *);
>  EMUTLS_ATTR
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-06 Thread waffl3x
Follow up to this, bootstrapped and tested with no regressions.

On Wednesday, December 6th, 2023 at 12:33 AM, waffl3x  
wrote:


> 
> 
> Here is the next version, it feels very close to finished. As before, I
> haven't ran a bootstrap or the full testsuite yet but I did run the
> explicit-obj tests which completed as expected.
> 
> There's a few test cases that still need to be written but more tests
> can always be added. The behavior added by CWG2789 works in at least
> one case, but I have not added tests for it yet. The test cases for
> dependent lambda expressions need to be fleshed out more, but a few
> temporary ones are included to demonstrate that they do work and that
> the crash is fixed. Explicit object conversion functions work, but I
> need to add fleshed out tests for them, explicit-obj-basic5.C has that
> test.
> 
> I'll start the tests now and report back if anything fails, I'm
> confident everything will be fine though.
> 
> Alex


Re: Causes to nvptx bootstrap fail: [PATCH v5] Introduce strub: machine-independent stack scrubbing

2023-12-06 Thread Thomas Schwinge
Hi Alexandre!

On 2023-12-06T09:36:33+0100, Tobias Burnus  wrote:
> FYI the newly added file libgcc/strub.c of this patch (aka commit 
> r14-6201-gf0a90c7d7333fc )
> causes that nvptx does not bootstrap, failing with:

('s%bootstrap%build'.)

> ./gcc/as -v -o strub.o strub.s
> Verifying sm_30 code with sm_50 code generation.
>   ptxas -c -o /dev/null strub.o --gpu-name sm_50 -O0
> ptxas strub.o, line 22; error   : Arguments mismatch for instruction 'st'
> ptxas strub.o, line 22; error   : Unknown symbol '%frame'
> [...]

Per commit r14-6201-gf0a90c7d7333fc7f554b906245c84bdf04d716d7
"Introduce strub: machine-independent stack scrubbing", we have:

A function associated with @code{at-calls} @code{strub} mode
(@code{strub("at-calls")}, or just @code{strub}) undergoes interface
changes.  Its callers are adjusted to match the changes, and to scrub
(overwrite with zeros) the stack space used by the called function after
it returns.

As I understand things, this cannot be implemented (at the call site) for
nvptx, given that the callee's stack is not visible there: PTX is unusual
in that the concept of a "standard" stack isn't exposed.

Instead of allowing "strub" pieces that can be implemented, should this
whole machinery generally be disabled (forced '-fstrub=disable', or via a
new target hook?)?  The libgcc functions should then not get defined
(thus, linker error upon accidental use), or should just '__builtin_trap'
if that makes more sense?  Need an effective-target for the test cases.

Alternatively, we may also leave the generic middle end handling alive,
and 'sorry' (or similar) in the nvptx back end, as necessary?


Regards
 Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
Munich; limited liability company; Managing Directors: Thomas Heurung, Frank
Thürauf; Registered office: Munich; Commercial Register: Munich, HRB 106955


Re: [PATCH v7 3/5] OpenMP: Pointers and member mappings

2023-12-06 Thread Tobias Burnus

Hi Julian,

LGTM, except for:

* The 'target exit data' handling - comments below - looks a bit
fishy/inconsistent.

I intend to have a closer look with more code context, but maybe you
should have a look at it as well.

BTW: Fortran deep-mapping is not yet on mainline. Are you aware of
changes or in particular testcases on OG13 related to your patch series
that should be included when upstreaming that auto-mapping of
allocatable components patch?

* * *

On 19.08.23 00:47, Julian Brown wrote:

This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate for changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".

We change the mapping nodes used for a derived-type mapping like this:

   type T
   integer, pointer, dimension(:) :: arrptr
   end type T

   type(T) :: tvar
   [...]
   !$omp target map(tofrom: tvar%arrptr)

So that the nodes used look like this:

   1) map(to: tvar%arrptr)   -->
   GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
   GOMP_MAP_TO_PSETtvar%arrptr(the descriptor)
   GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

   2) map(tofrom: tvar%arrptr(3:8))   -->
   GOMP_MAP_TOFROM *tvar%arrptr%data(3)  (size 8-3+1, etc.)


[Clarification for the patch readers - not to the patch writer]

At least to me, the wording and implications of (1) were not fully
clear. However, it is intended that the implicit mapping ensures that the
pointer is actually mapped.

Fortunately, that's indeed the case :-), as both testing and the new testcase
libgomp/testsuite/libgomp.fortran/map-subarray-4.f90 show.

The related part comes from OpenMP 5.1 [349:9-12] or slightly refined
later, here the TR12 wording:

If a list item in a map clause is an associated pointer that is not 
attach-ineligible and the pointer is
not the base pointer of another list item in a map clause on the same 
construct, then it is treated as if
its pointer target is implicitly mapped in the same clause. For the purposes of 
the map clause, the
mapped pointer target is treated as if its base pointer is the associated 
pointer."  [213:18-21]

* * *


[...]
In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

   GOMP_MAP_STRUCT   tvar (len: 1)
   GOMP_MAP_TO_PSET  tvar%arr (size: 64, etc.)  <--. moved here
   [...]   |
   GOMP_MAP_TOFROM *tvar%arrptr%data(3) ___|
   GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

   i = 1
   j = 1
   map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i) and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

   GOMP_MAP_STRUCT dtvar  (len: 2)
   GOMP_MAP_TO_PSETdtvar(i)%arrptr
   GOMP_MAP_TO_PSETdtvar(j)%arrptr
   [...]
   GOMP_MAP_TOFROM *dtvar(j)%arrptr%data(3)  (size: 8-3+1)
   GOMP_MAP_ATTACH_DETACH  dtvar(j)%arrptr%data
   GOMP_MAP_TO [implicit]  *dtvar(i)%arrptr%data(1)  (size: whole array)
   GOMP_MAP_ATTACH_DETACH  dtvar(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

   #pragma omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

   GOMP_MAP_TOdv

   GOMP_MAP_TO*dv%arr%data
   GOMP_MAP_TO_PSET   dv%arr <-- deleted (array section mapping)
   GOMP_MAP_ATTACH_DETACH dv%arr%data

To accommodate for recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.  A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.

This version of the patch is based on the version posted for the og13
branch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/62.html

[...]


+  if (OMP_CLAUSE_MAP_KIND (node) == GOMP_MAP_DELETE
+   || OMP_CLAUSE_MAP_KIND (node) == G

Re: [PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-06 Thread Prathamesh Kulkarni
On Tue, 5 Dec 2023 at 06:18, Marek Polacek  wrote:
>
> On Mon, Dec 04, 2023 at 04:49:29PM -0500, Jason Merrill wrote:
> > On 12/4/23 15:23, Marek Polacek wrote:
> > > +/* FN is not a consteval function, but may become one.  Remember to
> > > +   escalate it after all pending templates have been instantiated.  */
> > > +
> > > +void
> > > +maybe_store_immediate_escalating_fn (tree fn)
> > > +{
> > > +  if (unchecked_immediate_escalating_function_p (fn))
> > > +remember_escalating_expr (fn);
> > > +}
> >
> > > +++ b/gcc/cp/decl.cc
> > > @@ -18441,7 +18441,10 @@ finish_function (bool inline_p)
> > >if (!processing_template_decl
> > >&& !DECL_IMMEDIATE_FUNCTION_P (fndecl)
> > >&& !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
> > > -cp_fold_function (fndecl);
> > > +{
> > > +  cp_fold_function (fndecl);
> > > +  maybe_store_immediate_escalating_fn (fndecl);
> > > +}
> >
> > I think maybe_store_, and the call to it from finish_function, are unneeded;
> > we will have already decided whether we need to remember the function during
> > the call to cp_fold_function.
>
> 'Tis true.
>
> > OK with that change.
>
> Here's what I pushed after another regtest.  Thanks!
Hi Marek,
It seems the patch caused following regressions on aarch64:

Running g++:g++.dg/modules/modules.exp ...
FAIL: g++.dg/modules/xtreme-header-4_b.C -std=c++2b (internal compiler
error: tree check: expected class 'type', have 'declaration'
(template_decl) in get_originating_module_decl, at cp/module.cc:18659)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2b (internal compiler
error: tree check: expected class 'type', have 'declaration'
(template_decl) in get_originating_module_decl, at cp/module.cc:18659)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (internal compiler
error: tree check: expected class 'type', have 'declaration'
(template_decl) in get_originating_module_decl, at cp/module.cc:18659)

Log files: 
https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/1299/artifact/artifacts/00-sumfiles/

Thanks,
Prathamesh
>
> -- >8 --
> This patch implements P2564, described at , whereby
> certain functions are promoted to consteval.  For example:
>
>   consteval int id(int i) { return i; }
>
>   template 
>   constexpr int f(T t)
>   {
> return t + id(t); // id causes f to be promoted to consteval
>   }
>
>   void g(int i)
>   {
> f (3);
>   }
>
> now compiles.  Previously the code was ill-formed: we would complain
> that 't' in 'f' is not a constant expression.  Since 'f' is now
> consteval, it means that the call to id(t) is in an immediate context,
> so doesn't have to produce a constant -- this is how we allow consteval
> functions composition.  But making 'f' consteval also means that
> the call to 'f' in 'g' must yield a constant; failure to do so results
> in an error.  I made the effort to have cc1plus explain to us what's
> going on.  For example, calling f(i) produces this neat diagnostic:
>
> w.C:11:11: error: call to consteval function 'f(i)' is not a constant 
> expression
>11 | f (i);
>   | ~~^~~
> w.C:11:11: error: 'i' is not a constant expression
> w.C:6:22: note: 'constexpr int f(T) [with T = int]' was promoted to an 
> immediate function because its body contains an immediate-escalating 
> expression 'id(t)'
> 6 | return t + id(t); // id causes f to be promoted to 
> consteval
>   |~~^~~
>
> which hopefully makes it clear what's going on.
>
> Implementing this proposal has been tricky.  One problem was delayed
> instantiation: instantiating a function can set off a domino effect
> where one call promotes a function to consteval but that then means
> that another function should also be promoted, etc.
>
> In v1, I addressed the delayed instantiation problem by instantiating
> trees early, so that we can escalate functions right away.  That caused
> a number of problems, and in certain cases, like consteval-prop3.C, it
> can't work, because we need to wait till EOF to see the definition of
> the function anyway.  Overeager instantiation tends to cause diagnostic
> problems too.
>
> In v2, I attempted to move the escalation to the gimplifier, at which
> point all templates have been instantiated.  That attempt flopped,
> however, because once we've gimplified a function, its body is discarded
> and as a consequence, you can no longer evaluate a call to that function
> which is required for escalating, which needs to decide if a call is
> a constant expression or not.
>
> Therefore, we have to perform the escalation before gimplifying, but
> after instantiate_pending_templates.  That's not easy because we have
> no way to walk all the trees.  In the v2 patch, I use two vectors: one
> to store function decls that may become consteval, and another to
> remember references to immediate-escalating functions.  Unfortunately
> the latter must also stash functions that call immediate-escalating
> functions.  Consider:
>
>   

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-06 Thread Richard Biener
On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS  wrote:
>
> Hi,
>
> Update the patch to fix problems in the test case:
>  - add "-details" option to the dump command
>  - add dg-require and target filters to avoid potential failures on platforms 
> that don't support vectorization.

Interesting simple trick - the downside is that this makes the
recursive dependence
of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
set the flag on CHRECs that are not necessarily cached, so I'm not sure how
effective this will be ...

Can you try to do some statistics on, say, SPEC CPU?  I usually build
(with -j1) with -fopt-info-vec and diff the build logs; you can then see
how many more loops (and which ones) we vectorize additionally.

Thanks,
Richard.

> Thanks,
> -Hao
>
> gcc/ChangeLog:
>
> PR tree-optimization/112774
> * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> printed with additional  info.
> * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> nonwrapping_chrec_p to set and check the new flag respectively.
> * tree-scalar-evolution.h: Likewise.
> * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> infer_loop_bounds_from_pointer_arith, 
> infer_loop_bounds_from_signedness,
> scev_probably_wraps_p): call record_nonwrapping_chrec before
> record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is 
> set and
> return false from scev_probably_wraps_p.
> * tree-vect-loop.cc (vect_analyze_loop): call
> free_numbers_of_iterations_estimates explicitly.
> * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
> to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++
>  gcc/tree-pretty-print.cc|  2 +-
>  gcc/tree-scalar-evolution.cc| 24 
>  gcc/tree-scalar-evolution.h |  2 ++
>  gcc/tree-ssa-loop-niter.cc  | 21 -
>  gcc/tree-vect-loop.cc   |  4 
>  gcc/tree.h  |  8 +---
>  7 files changed, 70 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 000..120f40c0b6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* Loop can be vectorized by referring "i + offset" is nonwrapping from 
> array.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { 
> avr-*-* msp430-*-* pru-*-* } } } } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int 
> spc, dump_flags_t flags,
>dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>pp_string (pp, ", +, ");
>dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -  pp_string (pp, "}_");
> +  pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}_");
>pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>is_stmt = false;
>break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index f61277c32df..81630603c12 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +{
> +  fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +  print_generic_expr (dump_file, chrec);
> +  fprintf (dump_file, ")\n");
> +}
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern

Re: [PATCH v3 10/11] c: Turn -Wincompatible-pointer-types into a permerror

2023-12-06 Thread Prathamesh Kulkarni
On Mon, 20 Nov 2023 at 15:28, Florian Weimer  wrote:
>
> The change to build_conditional_expr drops the downgrade
> from a pedwarn to warning for builtins for C99 and later
> language dialects.  It remains a warning in C89 mode (not
> a permerror), as the -std=gnu89 -fno-permissive test shows.
Hi Florian,
It seems this patch caused a fallout for
gcc.dg/fixed-point/composite-type.c on arm, where the tests for
warnings fail.
For instance:
FAIL: gcc.dg/fixed-point/composite-type.c  (test for warnings, line 71)
Excess errors:
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/fixed-point/composite-type.c:71:3:
error: passing argument 1 of 'f2_sf' from incompatible pointer type
[-Wincompatible-pointer-types]
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/fixed-point/composite-type.c:71:3:
error: passing argument 1 of 'f2_sf' from incompatible pointer type
[-Wincompatible-pointer-types]
(snipped rest)

Should these warnings be now upgraded to dg-error ?

Thanks,
Prathamesh
>
> gcc/
>
> * doc/invoke.texi (Warning Options): Document changes.
>
> gcc/c/
>
> PR c/96284
> * c-typeck.cc (build_conditional_expr): Upgrade most pointer
> type mismatches to a permerror.
> (convert_for_assignment): Use permerror_opt and
> permerror_init for OPT_Wincompatible_pointer_types warnings.
>
> gcc/testsuite/
>
> * gcc.dg/permerror-default.c (incompatible_pointer_types):
> Expect new permerror.
> * gcc.dg/permerror-gnu89-nopermissive.c
> (incompatible_pointer_types):   Likewise.
> * gcc.dg/permerror-pedantic.c (incompatible_pointer_types):
> Likewise.
> * gcc.dg/permerror-system.c: Likewise.
> * gcc.dg/Wincompatible-pointer-types-2.c: Compile with
> -fpermissivedue to expected errors.
> * gcc.dg/Wincompatible-pointer-types-5.c: New test.  Copied
> from gcc.dg/Wincompatible-pointer-types-2.c.  Expect errors.
> * gcc.dg/anon-struct-11.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/anon-struct-11a.c: New test.  Copied from
> gcc.dg/anon-struct-11.c.  Expect errors.
> * gcc.dg/anon-struct-13.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/anon-struct-13a.c: New test.  Copied from
> gcc.dg/anon-struct-13.c.  Expect errors.
> * gcc.dg/builtin-arith-overflow-4.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/builtin-arith-overflow-4a.c: New test.  Copied from
> gcc.dg/builtin-arith-overflow-4.c.  Expect errors.
> * gcc.dg/c23-qual-4.c: Expect -Wincompatible-pointer-types errors.
> * gcc.dg/dfp/composite-type.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/dfp/composite-type-2.c: New test.  Copied from
> gcc.dg/dfp/composite-type.c.  Expect errors.
> * gcc.dg/diag-aka-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/diag-aka-1a.c: New test.  Copied from gcc.dg/diag-aka-1a.c.
> Expect errors.
> * gcc.dg/enum-compat-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/enum-compat-2.c: New test.  Copied from
> gcc.dg/enum-compat-1.c.  Expect errors.
> * gcc.dg/func-ptr-conv-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/func-ptr-conv-2.c: New test.  Copied from
> gcc.dg/func-ptr-conv-1.c.  Expect errors.
> * gcc.dg/init-bad-7.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/init-bad-7a.c: New test.  Copied from gcc.dg/init-bad-7.c.
> Expect errors.
> * gcc.dg/noncompile/incomplete-3.c (foo): Expect
> -Wincompatible-pointer-types error.
> * gcc.dg/param-type-mismatch-2.c (test8): Likewise.
> * gcc.dg/pointer-array-atomic.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/pointer-array-atomic-2.c: New test.  Copied from
> gcc.dg/pointer-array-atomic.c.  Expect errors.
> * gcc.dg/pointer-array-quals-1.c (test): Expect
> -Wincompatible-pointer-types errors.
> * gcc.dg/transparent-union-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/transparent-union-1a.c: New test.  Copied from
> gcc.dg/transparent-union-1.c.  Expect errors.
> * gcc.target/aarch64/acle/memtag_2a.c
> (test_memtag_warning_return_qualifier): Expect additional
> errors.
> * gcc.target/aarch64/sve/acle/general-c/load_2.c (f1): Likewise.
> * gcc.target/aarch64/sve/acle/general-c/load_ext_gather_offset_1.c
> (f1): Likewise.
> * gcc.target/aarch64/sve/acle/general-c/load_ext_gather_offset_2.c
> (f1): Likewise.
> * gcc.target/aarch64/sve/acle/general-c/load

Re: [PATCH v3 00/16] Support Intel APX NDD

2023-12-06 Thread Uros Bizjak
On Wed, Dec 6, 2023 at 9:08 AM Hongyu Wang  wrote:
>
> Hi,
>
> Following up the discussion of V2 patches in
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639368.html,
> this patch series add early clobber for all TImode NDD alternatives
> to avoid any potential overlapping between dest register and src
> register/memory. Also use get_attr_isa (insn) == ISA_APX_NDD instead of
> checking alternative at asm output stage.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu{-m32,} and sde.
>
> Ok for master?

LGTM, but Hongtao should have the final approval here.

Thanks,
Uros.

>
> Hongyu Wang (7):
>   [APX NDD] Disable seg_prefixed memory usage for NDD add
>   [APX NDD] Support APX NDD for left shift insns
>   [APX NDD] Support APX NDD for right shift insns
>   [APX NDD] Support APX NDD for rotate insns
>   [APX NDD] Support APX NDD for shld/shrd insns
>   [APX NDD] Support APX NDD for cmove insns
>   [APX NDD] Support TImode shift for NDD
>
> Kong Lingling (9):
>   [APX NDD] Support Intel APX NDD for legacy add insn
>   [APX NDD] Support APX NDD for optimization patterns of add
>   [APX NDD] Support APX NDD for adc insns
>   [APX NDD] Support APX NDD for sub insns
>   [APX NDD] Support APX NDD for sbb insn
>   [APX NDD] Support APX NDD for neg insn
>   [APX NDD] Support APX NDD for not insn
>   [APX NDD] Support APX NDD for and insn
>   [APX NDD] Support APX NDD for or/xor insn
>
>  gcc/config/i386/constraints.md|5 +
>  gcc/config/i386/i386-expand.cc|  164 +-
>  gcc/config/i386/i386-options.cc   |2 +
>  gcc/config/i386/i386-protos.h |   16 +-
>  gcc/config/i386/i386.cc   |   30 +-
>  gcc/config/i386/i386.md   | 2325 +++--
>  gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
>  gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
>  gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |6 +
>  .../gcc.target/i386/apx-ndd-shld-shrd.c   |   24 +
>  .../gcc.target/i386/apx-ndd-ti-shift.c|   91 +
>  gcc/testsuite/gcc.target/i386/apx-ndd.c   |  202 ++
>  12 files changed, 2141 insertions(+), 755 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c
>
> --
> 2.31.1
>


Re: [PATCH] testsuite: Adjust for the new permerror -Wincompatible-pointer-types

2023-12-06 Thread Florian Weimer
* Yang Yujie:

> From: Yang Yujie 
> Subject: [PATCH] testsuite: Adjust for the new permerror
>  -Wincompatible-pointer-types
> To: gcc-patches@gcc.gnu.org
> Cc: r...@cebitec.uni-bielefeld.de, mikest...@comcast.net, fwei...@redhat.com,
>  Yang Yujie 
> Date: Wed,  6 Dec 2023 10:29:31 +0800 (9 hours, 42 minutes, 7 seconds ago)
> Message-ID: <20231206022931.33437-1-yangyu...@loongson.cn>
>
> r14-6037 turned -Wincompatible-pointer-types into a permerror,
> which causes the following tests to fail.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/fixed-point/composite-type.c: replace dg-warning with dg-error.
> ---
>  .../gcc.dg/fixed-point/composite-type.c   | 64 +--
>  1 file changed, 32 insertions(+), 32 deletions(-)

Looks reasonable to me, but I can't approve it.

Thanks,
Florian



Re: [PATCH v3 10/11] c: Turn -Wincompatible-pointer-types into a permerror

2023-12-06 Thread Florian Weimer
* Prathamesh Kulkarni:

> On Mon, 20 Nov 2023 at 15:28, Florian Weimer  wrote:
>>
>> The change to build_conditional_expr drops the downgrade
>> from a pedwarn to warning for builtins for C99 and later
>> language dialects.  It remains a warning in C89 mode (not
>> a permerror), as the -std=gnu89 -fno-permissive test shows.
> Hi Florian,
> It seems this patch caused a fallout for
> gcc.dg/fixed-point/composite-type.c on arm, where the tests for
> warnings fail.
> For instance:
> FAIL: gcc.dg/fixed-point/composite-type.c  (test for warnings, line 71)
> Excess errors:
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/fixed-point/composite-type.c:71:3:
> error: passing argument 1 of 'f2_sf' from incompatible pointer type
> [-Wincompatible-pointer-types]
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/fixed-point/composite-type.c:71:3:
> error: passing argument 1 of 'f2_sf' from incompatible pointer type
> [-Wincompatible-pointer-types]
> (snipped rest)
>
> Should these warnings be now upgraded to dg-error ?

Right, a patch for this was posted (but I don't see it in the archives):

From: Yang Yujie 
Subject: [PATCH] testsuite: Adjust for the new permerror
 -Wincompatible-pointer-types
To: gcc-patches@gcc.gnu.org
Cc: r...@cebitec.uni-bielefeld.de, mikest...@comcast.net, fwei...@redhat.com,
 Yang Yujie 
Date: Wed,  6 Dec 2023 10:29:31 +0800 (9 hours, 42 minutes, 7 seconds ago)
Message-ID: <20231206022931.33437-1-yangyu...@loongson.cn>

Thanks,
Florian



Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Jakub Jelinek
On Mon, Sep 18, 2023 at 11:26:49PM +0200, Martin Uecker via Gcc-patches wrote:
> Add option Walloc-size that warns about allocations that have
> insufficient storage for the target type of the pointer the
> storage is assigned to.
> 
>   PR c/71219
> gcc:
>   * doc/invoke.texi: Document -Walloc-size option.
> 
> gcc/c-family:
> 
>   * c.opt (Walloc-size): New option.
> 
> gcc/c:
>   * c-typeck.cc (convert_for_assignment): Add warning.
> 
> gcc/testsuite:
> 
>   * gcc.dg/Walloc-size-1.c: New test.

> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -7384,6 +7384,33 @@ convert_for_assignment (location_t location, 
> location_t expr_loc, tree type,
>   "request for implicit conversion "
>   "from %qT to %qT not permitted in C++", rhstype, type);
>  
> +  /* Warn of new allocations that are not big enough for the target
> +  type.  */
> +  tree fndecl;
> +  if (warn_alloc_size
> +   && TREE_CODE (rhs) == CALL_EXPR
> +   && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
> +   && DECL_IS_MALLOC (fndecl))
> + {
> +   tree fntype = TREE_TYPE (fndecl);
> +   tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
> +   tree alloc_size = lookup_attribute ("alloc_size", fntypeattrs);
> +   if (alloc_size)
> + {
> +   tree args = TREE_VALUE (alloc_size);
> +   int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> +   /* For calloc only use the second argument.  */
> +   if (TREE_CHAIN (args))
> + idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1;

I wonder if this part isn't too pedantic, or more a matter of code style.
Some packages fail to build with this with -Werror because they do
  struct S *p = calloc (sizeof (struct S), 1);
or similar.  It is true that the calloc arguments are documented to be
nmemb, size, but given sufficient alignment (which is not really different
between either order of arguments), isn't it completely valid to allocate
a char array with sizeof (struct S) elements and then store a struct S object
into it?
Given
  struct S { int a, b; int flex[]; };
  struct S *a = calloc (sizeof (struct S), 1);
  struct S *b = calloc (1, sizeof (struct S));
  struct S *c = calloc (sizeof (struct S), n);
  struct S *d = calloc (n, sizeof (struct S));
  struct S *e = calloc (n * sizeof (struct S), 1);
  struct S *f = calloc (1, n * sizeof (struct S));
  struct S *g = calloc (offsetof (struct S, flex[0]) + n * sizeof (int), 1);
  struct S *h = calloc (1, offsetof (struct S, flex[0]) + n * sizeof (int));
which of these feels like a potentially dangerous use, as opposed to
just a style choice that some project might want to enforce?
I'd say b, d, h deserve no warning at all, the rest are just style warnings
(e and f might have warnings desirable as a potential security problem
because it doesn't deal with overflows).

So, for -Walloc-size as written, wouldn't it be better to check both
calloc arguments?  If both are constant, either just multiply them and check
the product, or check whether both are smaller than TYPE_SIZE_UNIT; if only
one of them is constant, check that one, with a possible exception for 1 (in
which case try, say, whether the other argument is multiple_of_p or something
similar).

> +   tree arg = CALL_EXPR_ARG (rhs, idx);
> +   if (TREE_CODE (arg) == INTEGER_CST
> +   && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> +  warning_at (location, OPT_Walloc_size, "allocation of "
> +  "insufficient size %qE for type %qT with "
> +  "size %qE", arg, ttl, TYPE_SIZE_UNIT (ttl));
> + }
> + }
> +

Jakub



Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-06 Thread Thomas Schwinge
Hi Alexandre!

On 2023-12-06T02:28:42-0300, Alexandre Oliva  wrote:
> libsupc++: try cxa_thread_atexit_impl at runtime
>
> g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
> a platform that lacks __cxa_thread_atexit_impl, even if the program is
> built and run using that toolchain on a (later) platform that offers
> __cxa_thread_atexit_impl.
>
> This patch adds runtime testing for __cxa_thread_atexit_impl on select
> platforms (GNU variants, for starters) that support weak symbols.

Need something like:

--- libstdc++-v3/libsupc++/atexit_thread.cc
+++ libstdc++-v3/libsupc++/atexit_thread.cc
@@ -164,2 +164,4 @@ __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
 return __cxa_thread_atexit_impl (dtor, obj, dso_handle);
+#else
+  (void) dso_handle;
 #endif

... to avoid:

[...]/source-gcc/libstdc++-v3/libsupc++/atexit_thread.cc: In function ‘int __cxxabiv1::__cxa_thread_atexit(void (*)(void*), void*, void*)’:
[...]/source-gcc/libstdc++-v3/libsupc++/atexit_thread.cc:151:51: error: unused parameter ‘dso_handle’ [-Werror=unused-parameter]
  151 |  void *obj, void *dso_handle)
  | ~~^~
cc1plus: all warnings being treated as errors
make[4]: *** [atexit_thread.lo] Error 1

With that, GCC/nvptx then is back to:

UNSUPPORTED: g++.dg/tls/thread_local6.C  -std=c++98
PASS: g++.dg/tls/thread_local6.C  -std=c++14 (test for excess errors)
PASS: g++.dg/tls/thread_local6.C  -std=c++14 execution test
PASS: g++.dg/tls/thread_local6.C  -std=c++17 (test for excess errors)
PASS: g++.dg/tls/thread_local6.C  -std=c++17 execution test
PASS: g++.dg/tls/thread_local6.C  -std=c++20 (test for excess errors)
PASS: g++.dg/tls/thread_local6.C  -std=c++20 execution test


Grüße
 Thomas


> for  libstdc++-v3/ChangeLog
>
>   * config/os/gnu-linux/os_defines.h
>   (_GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL): Define.
>   * libsupc++/atexit_thread.cc [__GXX_WEAK__ &&
>   _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL]
>   (__cxa_thread_atexit): Add dynamic detection of
>   __cxa_thread_atexit_impl.
> ---
>  libstdc++-v3/config/os/gnu-linux/os_defines.h |5 +
>  libstdc++-v3/libsupc++/atexit_thread.cc   |   23 ++-
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h 
> b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> index 87317031fcd71..a2e4baec069d5 100644
> --- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
> +++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> @@ -60,6 +60,11 @@
>  # define _GLIBCXX_HAVE_FLOAT128_MATH 1
>  #endif
>
> +// Enable __cxa_thread_atexit to rely on a (presumably libc-provided)
> +// __cxa_thread_atexit_impl, if it happens to be defined, even if
> +// configure couldn't find it during the build.
> +#define _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL 1
> +
>  #ifdef __linux__
>  // The following libpthread properties only apply to Linux, not GNU/Hurd.
>
> diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc 
> b/libstdc++-v3/libsupc++/atexit_thread.cc
> index 9346d50f5dafe..aa4ed5312bfe3 100644
> --- a/libstdc++-v3/libsupc++/atexit_thread.cc
> +++ b/libstdc++-v3/libsupc++/atexit_thread.cc
> @@ -138,11 +138,32 @@ namespace {
>}
>  }
>
> +#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
> +extern "C"
> +int __attribute__ ((__weak__))
> +__cxa_thread_atexit_impl (void (_GLIBCXX_CDTOR_CALLABI *func) (void *),
> +   void *arg, void *d);
> +#endif
> +
> +// ??? We can't make it an ifunc, can we?
>  extern "C" int
>  __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
> -  void *obj, void */*dso_handle*/)
> +  void *obj, void *dso_handle)
>_GLIBCXX_NOTHROW
>  {
> +#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
> +  if (__cxa_thread_atexit_impl)
> +// Rely on a (presumably libc-provided) __cxa_thread_atexit_impl,
> +// if it happens to be defined, even if configure couldn't find it
> +// during the build.  _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
> +// may be defined e.g. in os_defines.h on platforms where some
> +// versions of libc have a __cxa_thread_atexit_impl definition,
> +// but whose earlier versions didn't.  This enables programs build
> +// by toolchains compatible with earlier libc versions to still
> +// benefit from a libc-provided __cxa_thread_atexit_impl.
> +return __cxa_thread_atexit_impl (dtor, obj, dso_handle);
> +#endif
> +
>// Do this initialization once.
>if (__gthread_active_p ())
>  {

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Xi Ruoyao
On Wed, 2023-12-06 at 13:24 +0100, Jakub Jelinek wrote:
> I wonder if this part isn't too pedantic or more of a code style.
> Some packages fail to build with this with -Werror because they do
>   struct S *p = calloc (sizeof (struct S), 1);
> or similar.  It is true that calloc arguments are documented to be
> nmemb, size, but given sufficient alignment (which is not really different
> between either order of arguments) isn't it completely valid to allocate
> char array with sizeof (struct S) elements and then store a struct S object
> into it?

In PR112364 Martin Uecker has pointed out the alignment may be different
with the different order of arguments, per C23 (N2293).  With earlier
versions of the standard some people believe the alignment should not be
different, while the other people disagree (as the text is not very
clear).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v3 0/5] target_version and aarch64 function multiversioning

2023-12-06 Thread Andrew Carlotti
This series adds support for function multiversioning on aarch64.

Patches 1-3 are already approved, with just one minor change from the previous
version of patch 1 suggested by Richard Sandiford.

Patches 4-5 are updated based on Richard's reviews.  The only major change is
replacing the EXPANDED_CLONES_ATTRIBUTE target hook with the
TARGET_HAS_FMV_TARGET_ATTRIBUTE macro.  I've also reorganised
dispatch_function_versions and aarch64_mangle_decl_assembler_name, along with
several other minor fixes.

The updated series passes regression testing on both aarch64 for C and C++.
The previous version passed testing on x86; I haven't retested it since.

Ok for master?

Thanks,
Andrew


[1/5] aarch64: Add cpu feature detection to libgcc

2023-12-06 Thread Andrew Carlotti
This is added to enable function multiversioning, but can also be used
directly.  The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.

The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Include cpuinfo.c
* config/aarch64/cpuinfo.c: New file
(__init_cpu_features_constructor) New.
(__init_cpu_features_resolver) New.
(__init_cpu_features) New.

Co-authored-by: Pavel Iliin 


diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
new file mode 100644
index 
..634f591c194bc70048f714d7eb0ace1f2f4137ea
--- /dev/null
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -0,0 +1,500 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+  
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#if __has_include()
+#include 
+
+#if __has_include()
+#include 
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+#endif
+
+#if __has_include()
+#include 
+
+/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+   in __aarch64_cpu_features.  */
+  FEAT_INIT  /* Used as flag of features initialization completion.  */
+};
+
+/* Architecture features used in Function Multi Versioning.  */
+struct {
+  unsigned long long features;
+  /* As features grows new fields could be added.  */
+} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
+
+#ifndef _IFUNC_ARG_HWCAP
+#define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+#ifndef HWCAP_CPUID
+#define HWCAP_CPUID (1 << 11)
+#endif
+#ifndef HWCAP_FP
+#define HWCAP_FP (1 << 0)
+#endif
+#ifndef HWCAP_ASIMD
+#define HWCAP_ASIMD (1 << 1)
+#endif
+#ifndef HWCAP_AES
+#define HWCAP_AES (1 << 3)
+#endif
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL (1 << 4)
+#endif
+#ifndef HWCAP_SHA1
+#define HWCAP_SHA1 (1 << 5)
+#endif
+#ifndef HWCAP_SHA2
+#define HWCAP_SHA2 (1 << 6)
+#endif
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+#ifndef HWCAP_ASIMDRDM
+#define HWCAP_ASIMDRDM (1 << 12)
+#endif
+#ifndef HWCAP_JSCVT
+#define HWCAP_JSCVT (1 << 13)
+#endif
+#ifndef HWCAP_FCMA
+#define HWCAP_FCMA (1 << 14)
+#endif
+#ifndef HWCAP_LRCPC
+#define HWCAP_LRCPC (1 << 15)
+#endif
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+#ifndef HWCAP_SHA3
+#define HWCAP_SHA3 (1 << 17)
+#endif
+#ifndef HWCAP_SM3
+#define HWCAP_SM3 (1 << 18)
+#endif
+#ifndef HWCAP_SM4
+#define HWCAP_SM4 (1 << 19)

[PATCH v3 2/5] c-family: Simplify attribute exclusion handling

2023-12-06 Thread Andrew Carlotti
This patch changes the handling of mutual exclusions involving the
target and target_clones attributes to use the generic attribute
exclusion lists.  Additionally, the duplicate handling for the
always_inline and noinline attribute exclusion is removed.

The only change in functionality is the choice of warning message
displayed - due to either a change in the wording for mutual exclusion
warnings, or a change in the order in which different checks occur.

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_always_inline_exclusions): New.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(c_common_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_always_inline_attribute): Ditto.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mvc2.C:
* g++.target/i386/mvc3.C:


diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 
461732f60f7c4031cc6692000fbdddb9f726a035..b3b41ef123a0f171f57acb1b7f7fdde716428c00
 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -214,6 +214,13 @@ static const struct attribute_spec::exclusions 
attr_inline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] 
=
+{
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
@@ -221,6 +228,19 @@ static const struct attribute_spec::exclusions 
attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] 
=
+{
+  ATTR_EXCL ("always_inline", true, true, true),
+  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
 {
   ATTR_EXCL ("alloc_align", true, true, true),
@@ -332,7 +352,7 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_leaf_attribute, NULL },
   { "always_inline",  0, 0, true,  false, false, false,
  handle_always_inline_attribute,
- attr_inline_exclusions },
+ attr_always_inline_exclusions },
   { "gnu_inline", 0, 0, true,  false, false, false,
  handle_gnu_inline_attribute,
  attr_inline_exclusions },
@@ -483,9 +503,11 @@ const struct attribute_spec c_common_attribute_table[] =
   { "error", 1, 1, true,  false, false, false,
  handle_error_attribute, NULL },
   { "target", 1, -1, true, false, false, false,
- handle_target_attribute, NULL },
+ handle_target_attribute,
+ attr_target_exclusions },
   { "target_clones",  1, -1, true, false, false, false,
- handle_target_clones_attribute, NULL },
+ handle_target_clones_attribute,
+ attr_target_clones_exclusions },
   { "optimize",   1, -1, true, false, false, false,
  handle_optimize_attribute, NULL },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -1397,16 +1419,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1487,22 +1500,9 @@ handle_always_inline_attribute (tree *node, tree name,
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
 {
-  if (lookup_attribute ("noinline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with %qs attribute", name, "noinline");
- *no_add_attrs = true;
-   }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-   {
- 

[PATCH v3 3/5] ada: Improve attribute exclusion handling

2023-12-06 Thread Andrew Carlotti
Change the handling of some attribute mutual exclusions to use the
generic attribute exclusion lists, and fix some asymmetric exclusions by
adding the exclusions for always_inline after noinline or target_clones.

Aside from the new always_inline exclusions, the only change in
functionality is the choice of warning message displayed.  All warnings
about attribute mutual exclusions now use the same message.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_noinline_exclusions): New.
(attr_always_inline_exclusions): Ditto.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(gnat_internal_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 
e7b5c7783b1f1c702130c8879c79b7e329764b09..f2c504ddf8d3df11abe81aec695c9eea0b39da6c
 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -130,6 +130,32 @@ static const struct attribute_spec::exclusions 
attr_stack_protect_exclusions[] =
   { NULL, false, false, false },
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] 
=
+{
+  { "noinline", true, true, true },
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] 
=
+{
+  { "always_inline", true, true, true },
+  { "target", true, true, true },
+  { NULL, false, false, false },
+};
+
 /* Fake handler for attributes we don't properly support, typically because
they'd require dragging a lot of the common-c front-end circuitry.  */
 static tree fake_attribute_handler (tree *, tree, tree, int, bool *);
@@ -165,7 +191,7 @@ const struct attribute_spec gnat_internal_attribute_table[] 
=
   { "strub",   0, 1, false, true, false, true,
 handle_strub_attribute, NULL },
   { "noinline", 0, 0,  true,  false, false, false,
-handle_noinline_attribute, NULL },
+handle_noinline_attribute, attr_noinline_exclusions },
   { "noclone",  0, 0,  true,  false, false, false,
 handle_noclone_attribute, NULL },
   { "no_icf",   0, 0,  true,  false, false, false,
@@ -175,7 +201,7 @@ const struct attribute_spec gnat_internal_attribute_table[] 
=
   { "leaf", 0, 0,  true,  false, false, false,
 handle_leaf_attribute, NULL },
   { "always_inline",0, 0,  true,  false, false, false,
-handle_always_inline_attribute, NULL },
+handle_always_inline_attribute, attr_always_inline_exclusions },
   { "malloc",   0, 0,  true,  false, false, false,
 handle_malloc_attribute, NULL },
   { "type generic", 0, 0,  false, true,  true,  false,
@@ -192,9 +218,9 @@ const struct attribute_spec gnat_internal_attribute_table[] 
=
   { "simd", 0, 1,  true,  false, false, false,
 handle_simd_attribute, NULL },
   { "target",   1, -1, true,  false, false, false,
-handle_target_attribute, NULL },
+handle_target_attribute, attr_target_exclusions },
   { "target_clones",1, -1, true,  false, false, false,
-handle_target_clones_attribute, NULL },
+handle_target_clones_attribute, attr_target_clones_exclusions },
 
   { "vector_size",  1, 1,  false, true,  false, false,
 handle_vector_size_attribute, NULL },
@@ -6755,16 +6781,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -7063,12 +7080,6 @@ handle_target_attribute (tree *node, tree name, tree 
args, int flags,
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
 }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-{
-  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with %qs attribute", name, "target_clones");
-  *no_add_attrs = true;
-}
   else if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
 *no_add_attrs = true;
 
@@ -7096,23 +7107,8 @@ 

[PATCH v3 4/5] Add support for target_version attribute

2023-12-06 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
TARGET_HAS_FMV_TARGET_ATTRIBUTE target macro.

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Ok for master?

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro.
* target.def (valid_version_attribute_p): New hook.
* doc/tm.texi.in: Add new hook.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target macro to pick attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

* d-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 
f2c504ddf8d3df11abe81aec695c9eea0b39da6c..5d946c33b212c5ea50e7a73524e8c1d062280956
 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -145,14 +145,16 @@ static const struct attribute_spec::exclusions 
attr_noinline_exclusions[] =
 
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] 
=
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 
c7209c26acc9faf699774b0ef669ec6748b9073d..19cccf2d7ca4fdd6a46a01884393c6779333dbc5
 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+ get_identifier ("target"),
  current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
@@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   not support the "target_version" attribute.  */
 
 bool
 is_function_default_version (const tree decl)
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c

[PATCH v3 5/5] aarch64: Add function multiversioning support

2023-12-06 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
lets callers unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  This current patch raises an
  error instead.

[1] 
https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

---

I believe the support present in this patch correctly handles function
multiversioning within a single translation unit for all features in the ACLE
specification with option extension support.

Is it ok to push this patch in its current state? I'd then continue working on
incremental improvements to the supported feature extensions and the ABI issues
in followup patches, along with corresponding changes and improvements to the
ACLE specification.


gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h (fmv_deps_):
Define aarch64_feature_flags mask foreach FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
Set target macro.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
  new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
  copy in gcc/common

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Jakub Jelinek
On Wed, Dec 06, 2023 at 08:31:12PM +0800, Xi Ruoyao wrote:
> On Wed, 2023-12-06 at 13:24 +0100, Jakub Jelinek wrote:
> > I wonder if this part isn't too pedantic or more of a code style.
> > Some packages fail to build with this with -Werror because they do
> >   struct S *p = calloc (sizeof (struct S), 1);
> > or similar.  It is true that calloc arguments are documented to be
> > nmemb, size, but given sufficient alignment (which is not really different
> > between either order of arguments) isn't it completely valid to allocate
> > char array with sizeof (struct S) elements and then store a struct S object
> > into it?
> 
> In PR112364 Martin Uecker has pointed out the alignment may be different
> with the different order of arguments, per C23 (N2293).  With earlier
> versions of the standard some people believe the alignment should not be
> different, while the other people disagree (as the text is not very
> clear).

I can understand implementations which use smaller alignment based on
allocation size, but are there any which consider for that just the second
calloc argument rather than the product of both arguments?
I think they'd quickly break a lot of real-world code.
Further I think
"size less than or equal to the size requested"
is quite ambiguous in the calloc case, isn't the size requested in the
calloc case actually nmemb * size rather than just size?

Jakub



[PATCH] RISC-V: Fix VSETVL PASS bug

2023-12-06 Thread Juzhe-Zhong
As PR112855 mentions, the VSETVL pass inserts vsetvli in an unexpected 
location.

This is due to 2 reasons:
1. Incorrect transparent LCM data computation: we need to check VL operand 
defs and uses.
2. Incorrect fusion of an unrelated edge, i.e. an edge that never reaches the 
vsetvl expression.

PR target/112855

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(pre_vsetvl::compute_lcm_local_properties): Fix transparent LCM data.
(pre_vsetvl::earliest_fuse_vsetvl_info): Disable earliest fusion for 
unrelated edge.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112855.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 63 ++-
 .../gcc.target/riscv/rvv/autovec/pr112855.c   | 26 
 2 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112855.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 484a8b3a514..f0dd43bece7 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2034,7 +2034,7 @@ private:
 gcc_unreachable ();
   }
 
-  bool anticpatable_exp_p (const vsetvl_info &header_info)
+  bool anticipated_exp_p (const vsetvl_info &header_info)
   {
 if (!header_info.has_nonvlmax_reg_avl () && !header_info.has_vl ())
   return true;
@@ -2645,7 +2645,7 @@ pre_vsetvl::compute_lcm_local_properties ()
}
}
 
- for (const insn_info *insn : bb->real_nondebug_insns ())
+ for (insn_info *insn : bb->real_nondebug_insns ())
{
  if (info.has_nonvlmax_reg_avl ()
  && find_access (insn->defs (), REGNO (info.get_avl (
@@ -2653,6 +2653,59 @@ pre_vsetvl::compute_lcm_local_properties ()
  bitmap_clear_bit (m_transp[bb_index], i);
  break;
}
+
+ if (info.has_vl ()
+ && reg_mentioned_p (info.get_vl (), insn->rtl ()))
+   {
+ if (find_access (insn->defs (), REGNO (info.get_vl (
+   /* We can't fuse vsetvl into the blocks that modify the
+  VL operand since successors of such blocks will need
+  the value those blocks are defining.
+
+ bb 4: def a5
+ /   \
+ bb 5:use a5  bb 6:vsetvl a5, 5
+
+  The example above shows that we can't fuse vsetvl
+  from bb 6 into bb 4 since the successor bb 5 is using
+  the value defined in bb 4.  */
+   ;
+ else
+   {
+ /* We can't fuse vsetvl into the blocks that use the
+VL operand which has a different value from the
+vsetvl info.
+
+   bb 4: def a5
+ |
+   bb 5: use a5
+ |
+   bb 6: def a5
+ |
+   bb 7: use a5
+
+The example above shows that we can't fuse vsetvl
+from bb 6 into bb 5 since their value is different.
+  */
+ resource_info resource
+   = full_register (REGNO (info.get_vl ()));
+ def_lookup dl = crtl->ssa->find_def (resource, insn);
+ def_info *def
+   = dl.matching_set_or_last_def_of_prev_group ();
+ gcc_assert (def);
+ insn_info *def_insn = extract_single_source (
+   dyn_cast (def));
+ if (def_insn && vsetvl_insn_p (def_insn->rtl ()))
+   {
+ vsetvl_info def_info = vsetvl_info (def_insn);
+ if (m_dem.compatible_p (def_info, info))
+   continue;
+   }
+   }
+
+ bitmap_clear_bit (m_transp[bb_index], i);
+ break;
+   }
}
}
 
@@ -2663,7 +2716,7 @@ pre_vsetvl::compute_lcm_local_properties ()
   vsetvl_info &footer_info = block_info.get_exit_info ();
 
   if (header_info.valid_p ()
- && (anticpatable_exp_p (header_info) || block_info.full_available))
+ && (anticipated_exp_p (header_info) || block_info.full_available))
bitmap_set_bit (m_antloc[bb_index],
get_expr_index (m_exprs, header_i

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Martin Uecker
Am Mittwoch, dem 06.12.2023 um 13:57 +0100 schrieb Jakub Jelinek:
> On Wed, Dec 06, 2023 at 08:31:12PM +0800, Xi Ruoyao wrote:
> > On Wed, 2023-12-06 at 13:24 +0100, Jakub Jelinek wrote:
> > > I wonder if this part isn't too pedantic or more of a code style.
> > > Some packages fail to build with this with -Werror because they do
> > >   struct S *p = calloc (sizeof (struct S), 1);
> > > or similar.  It is true that calloc arguments are documented to be
> > > nmemb, size, but given sufficient alignment (which is not really different
> > > between either order of arguments) isn't it completely valid to allocate
> > > char array with sizeof (struct S) elements and then store a struct S 
> > > object
> > > into it?
> > 
> > In PR112364 Martin Uecker has pointed out the alignment may be different
> > with the different order of arguments, per C23 (N2293).  With earlier
> > versions of the standard some people believe the alignment should not be
> > different, while the other people disagree (as the text is not very
> > clear).
> 
> I can understand implementations which use smaller alignment based on
> allocation size, but are there any which consider for that just the second
> calloc argument rather than the product of both arguments?

Not that I know of.  

> I think they'd quickly break a lot of real-world code.

There are quite a few projects which use calloc with swapped
arguments.

> Further I think
> "size less than or equal to the size requested"
> is quite ambiguous in the calloc case, isn't the size requested in the
> calloc case actually nmemb * size rather than just size?

This is unclear but it can be understood this way.
This was also Joseph's point.

I am happy to submit a patch that changes the code so
that the swapped arguments to calloc do not cause a warning
anymore.

On the other hand, the only feedback I got so far was
from people who were then happy to get this warning.

Martin
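
For reference, the two argument orders under discussion allocate the same
number of bytes; the debate is only about which argument C23 ties the
alignment guarantee to. A minimal sketch (names hypothetical, not from any
patch in this thread):

```c
#include <stdlib.h>

struct S { int a; double b; };

/* Allocate with either calloc argument order and report whether the
   result is usable and zero-filled.  Both orders request the same total
   byte count; the swapped order is what -Walloc-size flags, because C23
   (N2293) ties the alignment guarantee to the size argument.  */
int alloc_ok (int swapped)
{
  struct S *p = swapped ? calloc (sizeof (struct S), 1)
                        : calloc (1, sizeof (struct S));
  if (!p)
    return 0;
  int ok = (p->a == 0 && p->b == 0.0);  /* calloc zero-fills either way */
  free (p);
  return ok;
}
```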

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Martin Uecker
Am Mittwoch, dem 06.12.2023 um 14:34 +0100 schrieb Martin Uecker:
> Am Mittwoch, dem 06.12.2023 um 13:57 +0100 schrieb Jakub Jelinek:
> > On Wed, Dec 06, 2023 at 08:31:12PM +0800, Xi Ruoyao wrote:
> > > On Wed, 2023-12-06 at 13:24 +0100, Jakub Jelinek wrote:
> > > > I wonder if this part isn't too pedantic or more of a code style.
> > > > Some packages fail to build with this with -Werror because they do
> > > >   struct S *p = calloc (sizeof (struct S), 1);
> > > > or similar.  It is true that calloc arguments are documented to be
> > > > nmemb, size, but given sufficient alignment (which is not really 
> > > > different
> > > > between either order of arguments) isn't it completely valid to allocate
> > > > char array with sizeof (struct S) elements and then store a struct S 
> > > > object
> > > > into it?
> > > 
> > > In PR112364 Martin Uecker has pointed out the alignment may be different
> > > with the different order of arguments, per C23 (N2293).  With earlier
> > > versions of the standard some people believe the alignment should not be
> > > different, while the other people disagree (as the text is not very
> > > clear).
> > 
> > I can understand implementations which use smaller alignment based on
> > allocation size, but are there any which consider for that just the second
> > calloc argument rather than the product of both arguments?
> 
> Not that I know of.  
> 
> > I think they'd quickly break a lot of real-world code.
> 
> There are quite a few projects which use calloc with swapped
> arguments.
> 
> > Further I think
> > "size less than or equal to the size requested"
> > is quite ambiguous in the calloc case, isn't the size requested in the
> > calloc case actually nmemb * size rather than just size?
> 
> This is unclear but it can be understood this way.
> This was also Joseph's point.
> 
> I am happy to submit a patch that changes the code so
> that the swapped arguments to calloc do not cause a warning
> anymore.
> 
> On the other hand, the only feedback I got so far was
> from people who were then happy to get this warning.

Note that it is now part of -Wextra.



[PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Manos Anagnostakis
This is an RTL pass that detects store forwarding from stores to larger loads 
(load pairs).

This optimization is SPEC2017-driven and was found to be beneficial for some
benchmarks through testing on ampere1/ampere1a machines.

For example, it can transform cases like

str  d5, [sp, #320]
fmul d5, d31, d29
ldp  d31, d17, [sp, #312] # Large load from small store

to

str  d5, [sp, #320]
fmul d5, d31, d29
ldr  d31, [sp, #312]
ldr  d17, [sp, #320]
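
At the C level, the hazard being avoided can be sketched as follows. This is
an illustrative example with hypothetical names, not code from the patch,
and whether a load pair actually materializes depends on the target and
optimization flags:

```c
#include <string.h>

/* A narrow store into a buffer immediately followed by a wider load that
   covers the freshly stored bytes.  On targets that form load pairs, the
   wider access can fail store-to-load forwarding and stall, which is what
   splitting the load pair back into two single loads avoids.  */
double overlap_pattern (double x)
{
  double buf[2] = { 1.0, 2.0 };
  buf[1] = x;                          /* narrow store */
  double pair[2];
  memcpy (pair, buf, sizeof pair);     /* wider load covering the store */
  return pair[0] + pair[1];
}
```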

Currently, the pass is disabled by default on all architectures and enabled by 
a target-specific option.

If deemed beneficial enough for a default, it will be enabled on
ampere1/ampere1a, or other architectures as well, without needing to be turned
on by this option.

Bootstrapped and regtested on aarch64-linux.

gcc/ChangeLog:

* config.gcc: Add aarch64-store-forwarding.o to extra_objs.
* config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
* config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): 
Declare.
* config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
(aarch64-store-forwarding-threshold): New param.
* config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
* doc/invoke.texi: Document new option and new param.
* config/aarch64/aarch64-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
* gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
* gcc.target/aarch64/ldp_ssll_overlap.c: New test.

Signed-off-by: Manos Anagnostakis 
Co-Authored-By: Manolis Tsamis 
Co-Authored-By: Philipp Tomsich 
---
Changes in v6:
- An obvious change. insn_cnt was incremented only on
  stores and not for every insn in the bb. Now restored.

 gcc/config.gcc|   1 +
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 .../aarch64/aarch64-store-forwarding.cc   | 318 ++
 gcc/config/aarch64/aarch64.opt|   9 +
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/doc/invoke.texi   |  11 +-
 .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
 .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
 .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
 10 files changed, 449 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6450448f2f0..7c48429eb82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -350,6 +350,7 @@ aarch64*-*-*)
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
+   extra_objs="${extra_objs} aarch64-store-forwarding.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 662a13fd5e6..94ced0aebf6 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -24,3 +24,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 
1, pass_switch_pstat
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
+INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60ff61f6d54..8f5f2ca4710 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1069,6 +1069,7 @@ rtl_opt_pass *make_pass_tag_collision_avoidance 
(gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
 rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt);
+rtl_opt_pass *make_pass_avoid_store_forwarding (gcc::context *ctxt);

 poly_uint64 aarch64_regmode_natural_size (machine_mode);

diff --git a/gcc/config/aarch64/aarch64-store-forwarding.cc 
b/gcc/config/aarch64/aarch64-store-forwarding.cc
new file mode 100644
index 000..8a6faefd8c0
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-store-forwardin

Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-06 Thread Jonathan Wakely
On Wed, 6 Dec 2023 at 12:30, Thomas Schwinge  wrote:
>
> Hi Alexandre!
>
> On 2023-12-06T02:28:42-0300, Alexandre Oliva  wrote:
> > libsupc++: try cxa_thread_atexit_impl at runtime
> >
> > g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
> > a platform that lacks __cxa_thread_atexit_impl, even if the program is
> > built and run using that toolchain on a (later) platform that offers
> > __cxa_thread_atexit_impl.
> >
> > This patch adds runtime testing for __cxa_thread_atexit_impl on select
> > platforms (GNU variants, for starters) that support weak symbols.
>
> Need something like:
>
> --- libstdc++-v3/libsupc++/atexit_thread.cc
> +++ libstdc++-v3/libsupc++/atexit_thread.cc
> @@ -164,2 +164,4 @@ __cxxabiv1::__cxa_thread_atexit (void 
> (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
>  return __cxa_thread_atexit_impl (dtor, obj, dso_handle);
> +#else
> +  (void) dso_handle;
>  #endif

I would prefer:

--- a/libstdc++-v3/libsupc++/atexit_thread.cc
+++ b/libstdc++-v3/libsupc++/atexit_thread.cc
@@ -148,7 +148,7 @@ __cxa_thread_atexit_impl (void
(_GLIBCXX_CDTOR_CALLABI *func) (void *),
 // ??? We can't make it an ifunc, can we?
 extern "C" int
 __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
-void *obj, void *dso_handle)
+void *obj, [[maybe_unused]] void *dso_handle)
   _GLIBCXX_NOTHROW
 {
 #if __GXX_WEAK__
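
The runtime-detection idiom this thread revolves around rests on weak
symbols: a weak undefined function resolves to a null address when no
definition is present at link/load time. A minimal standalone sketch
(function names are hypothetical, not the actual libstdc++ symbols):

```c
/* Weak declaration: if nothing defines this symbol, its address is null
   at run time instead of causing a link error.  */
extern int optional_impl (int) __attribute__ ((__weak__));

/* Call the optional implementation when it exists, else fall back --
   mirroring how __cxa_thread_atexit can defer to a libc-provided
   __cxa_thread_atexit_impl only when one happens to be present.  */
int call_or_fallback (int x)
{
  if (optional_impl)
    return optional_impl (x);
  return -x;
}
```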

>
> ... to avoid:
>
> [...]/source-gcc/libstdc++-v3/libsupc++/atexit_thread.cc: In function 
> ‘int __cxxabiv1::__cxa_thread_atexit(void (*)(void*), void*, void*)’:
> [...]/source-gcc/libstdc++-v3/libsupc++/atexit_thread.cc:151:51: error: 
> unused parameter ‘dso_handle’ [-Werror=unused-parameter]
>   151 |  void *obj, void *dso_handle)
>   | ~~^~
> cc1plus: all warnings being treated as errors
> make[4]: *** [atexit_thread.lo] Error 1
>
> With that, GCC/nvptx then is back to:
>
> UNSUPPORTED: g++.dg/tls/thread_local6.C  -std=c++98
> PASS: g++.dg/tls/thread_local6.C  -std=c++14 (test for excess errors)
> PASS: g++.dg/tls/thread_local6.C  -std=c++14 execution test
> PASS: g++.dg/tls/thread_local6.C  -std=c++17 (test for excess errors)
> PASS: g++.dg/tls/thread_local6.C  -std=c++17 execution test
> PASS: g++.dg/tls/thread_local6.C  -std=c++20 (test for excess errors)
> PASS: g++.dg/tls/thread_local6.C  -std=c++20 execution test
>
>
> Grüße
>  Thomas
>
>
> > for  libstdc++-v3/ChangeLog
> >
> >   * config/os/gnu-linux/os_defines.h
> >   (_GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL): Define.
> >   * libsupc++/atexit_thread.cc [__GXX_WEAK__ &&
> >   _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL]
> >   (__cxa_thread_atexit): Add dynamic detection of
> >   __cxa_thread_atexit_impl.
> > ---
> >  libstdc++-v3/config/os/gnu-linux/os_defines.h |5 +
> >  libstdc++-v3/libsupc++/atexit_thread.cc   |   23 
> > ++-
> >  2 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h 
> > b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> > index 87317031fcd71..a2e4baec069d5 100644
> > --- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
> > +++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> > @@ -60,6 +60,11 @@
> >  # define _GLIBCXX_HAVE_FLOAT128_MATH 1
> >  #endif
> >
> > +// Enable __cxa_thread_atexit to rely on a (presumably libc-provided)
> > +// __cxa_thread_atexit_impl, if it happens to be defined, even if
> > +// configure couldn't find it during the build.
> > +#define _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL 1
> > +
> >  #ifdef __linux__
> >  // The following libpthread properties only apply to Linux, not GNU/Hurd.
> >
> > diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc 
> > b/libstdc++-v3/libsupc++/atexit_thread.cc
> > index 9346d50f5dafe..aa4ed5312bfe3 100644
> > --- a/libstdc++-v3/libsupc++/atexit_thread.cc
> > +++ b/libstdc++-v3/libsupc++/atexit_thread.cc
> > @@ -138,11 +138,32 @@ namespace {
> >}
> >  }
> >
> > +#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
> > +extern "C"
> > +int __attribute__ ((__weak__))
> > +__cxa_thread_atexit_impl (void (_GLIBCXX_CDTOR_CALLABI *func) (void *),
> > +   void *arg, void *d);
> > +#endif
> > +
> > +// ??? We can't make it an ifunc, can we?
> >  extern "C" int
> >  __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void 
> > *),
> > -  void *obj, void */*dso_handle*/)
> > +  void *obj, void *dso_handle)
> >_GLIBCXX_NOTHROW
> >  {
> > +#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
> > +  if (__cxa_thread_atexit_impl)
> > +// Rely on a (presumably libc-provided) __cxa_thread_atexit_impl,
> > +// if it happens to be defined, even if configur

Re: [PATCH RFA (libstdc++)] c++: partial ordering of object parameter [PR53499]

2023-12-06 Thread Jason Merrill

On 12/6/23 01:02, waffl3x wrote:

On Tuesday, December 5th, 2023 at 9:36 PM, Jason Merrill  
wrote:

On 12/5/23 23:23, waffl3x wrote:


Does CWG2834 effect this weird edge case?


2834 affects all partial ordering with explicit object member functions;


Both in relation to each other, and to iobj and static member functions?


currently the working draft says that they get an additional fake object
parameter, which is clearly wrong.


Yeah, that's really weird. I was under the impression that's how static
member functions worked, I didn't realize it was also how it's
specified for xobj member functions. I still find it weird for static
member functions. I guess I'll have to study template partial ordering,
what it is, how it's specified and whatnot. I think I understand it
intuitively but not at a language law level.


Right, adding it to static member functions was a recent change, and IMO 
also wrong.



I couldn't quite grasp the
standardese so I'm not really sure. These are a few cases from a test
that I finalized last night. I ran this by jwakely and he agreed that
the behavior as shown is correct by the standard. I'll also add that
this is also the current behavior of my patch.

template concept Constrain = true;

inline constexpr int iobj_fn = 5;
inline constexpr int xobj_fn = 10;

struct S {
int f(Constrain auto) { return iobj_fn; };
int f(this S&&, auto) { return xobj_fn; };

int g(auto) { return iobj_fn; };
int g(this S&&, Constrain auto) { return xobj_fn; };
};
int main() {
S s{};
s.f (0) == iobj_fn;


Yes, the xobj fn isn't viable because it takes an rvalue ref.


static_cast(s).f (0) == iobj_fn;


Yes, the functions look the same to partial ordering, so we compare
constraints and the iobj fn is more constrained.


s.g (0) == iobj_fn;


Yes, the xobj fn isn't viable.


static_cast(s).g (0) == xobj_fn;


Yes, the xobj fn is more constrained.

Jason


It's funny to see you effortlessly agree with what took me a number of
hours pondering.


Well, I've also been thinking about this area, thus the patch.  :)


So just to confirm, you're also saying the changes proposed by CWG2834
will not change the behavior of this example?


I'm saying the changes I'm advocating for CWG2834 (the draft you saw on 
github is not at all final) will establish that behavior.


Jason



Re: [PATCH] RISC-V: Fix VSETVL PASS bug

2023-12-06 Thread Robin Dapp
LGTM.

+ /* Don't perform earliest fusion on unrelated edge.  */
+ if (bitmap_count_bits (e) != 1)
+   continue;

This could still use a comment why e is "unrelated" in that case
(no v2 needed).

Regards
 Robin


[PATCH] c++: Don't diagnose ignoring of attributes if all ignored attributes are attribute_ignored_p

2023-12-06 Thread Jakub Jelinek
On Tue, Dec 05, 2023 at 11:01:20AM -0500, Jason Merrill wrote:
> > And there is another thing I wonder about: with -Wno-attributes= we are
> > supposed to ignore the attributes altogether, but we are actually still
> > warning about them when we emit these generic warnings about ignoring
> > all attributes which appertain to this and that (perhaps with some
> > exceptions we first remove from the attribute chain), like:
> > void foo () { [[foo::bar]]; }
> > with -Wattributes -Wno-attributes=foo::bar
> > Shouldn't we call some helper function in cases like this and warn
> > not when std_attrs (or how the attribute chain var is called) is non-NULL,
> > but if it is non-NULL and contains at least one non-attribute_ignored_p
> > attribute?
> 
> Sounds good.

The following patch implements it.
I've kept warnings for cases where the C++ standard says explicitly any
attributes aren't ok - 
"If an attribute-specifier-seq appertains to a friend declaration, that
declaration shall be a definition."

For some changes I haven't figured out how I could cover them in the
testsuite.

So far tested with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp=Wno-attributes* ubsan.exp=Wno-attributes*"
(which is all tests that use -Wno-attributes=), ok for trunk if it passes
full bootstrap/regtest?

Note, C uses a different strategy, it has c_warn_unused_attributes
function which warns about all the attributes one by one unless they
are ignored (or allowed in certain position).
Though that is just a single diagnostic wording, while the C++ FE just warns
that there are some ignored attributes and doesn't name them individually
(except for namespace and using namespace) and uses different wordings in
different spots.
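
The construct at issue reduces to the following; the attribute name is an
arbitrary vendor attribute and the flags in the comment are the ones from
the discussion above:

```cpp
// With -Wattributes alone, GCC warns that the attribute at the beginning
// of the statement is ignored.  The patch makes
// -Wattributes -Wno-attributes=foo::bar suppress that generic warning as
// well, since every attribute in the chain is then an ignored one.
int foo ()
{
  [[foo::bar]];   // vendor attribute appertaining to a null statement
  return 42;
}
```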

2023-12-06  Jakub Jelinek  

gcc/
* attribs.h (any_nonignored_attribute_p): Declare.
* attribs.cc (any_nonignored_attribute_p): New function.
gcc/cp/
* parser.cc (cp_parser_statement, cp_parser_expression_statement,
cp_parser_declaration, cp_parser_elaborated_type_specifier,
cp_parser_asm_definition): Don't diagnose ignored attributes
if !any_nonignored_attribute_p.
* decl.cc (grokdeclarator): Likewise.
* name-lookup.cc (handle_namespace_attrs, finish_using_directive):
Don't diagnose ignoring of attr_ignored_p attributes.
gcc/testsuite/
* g++.dg/warn/Wno-attributes-1.C: New test.

--- gcc/attribs.h.jj2023-12-06 12:03:27.421176109 +0100
+++ gcc/attribs.h   2023-12-06 12:36:55.704884514 +0100
@@ -48,6 +48,7 @@ extern void apply_tm_attr (tree, tree);
 extern tree make_attribute (const char *, const char *, tree);
 extern bool attribute_ignored_p (tree);
 extern bool attribute_ignored_p (const attribute_spec *const);
+extern bool any_nonignored_attribute_p (tree);
 
 extern struct scoped_attributes *
   register_scoped_attributes (const scoped_attribute_specs &, bool = false);
--- gcc/attribs.cc.jj   2023-12-06 12:03:27.386176602 +0100
+++ gcc/attribs.cc  2023-12-06 12:36:55.704884514 +0100
@@ -584,6 +584,19 @@ attribute_ignored_p (const attribute_spe
   return as->max_length == -2;
 }
 
+/* Return true if the ATTRS chain contains at least one attribute which
+   is not ignored.  */
+
+bool
+any_nonignored_attribute_p (tree attrs)
+{
+  for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
+if (!attribute_ignored_p (attr))
+  return true;
+
+  return false;
+}
+
 /* See whether LIST contains at least one instance of attribute ATTR
(possibly with different arguments).  Return the first such attribute
if so, otherwise return null.  */
--- gcc/cp/parser.cc.jj 2023-12-06 12:03:27.502174967 +0100
+++ gcc/cp/parser.cc2023-12-06 12:36:55.704884514 +0100
@@ -12778,9 +12778,8 @@ cp_parser_statement (cp_parser* parser,
 SET_EXPR_LOCATION (statement, statement_location);
 
   /* Allow "[[fallthrough]];" or "[[assume(cond)]];", but warn otherwise.  */
-  if (std_attrs != NULL_TREE)
-warning_at (attrs_loc,
-   OPT_Wattributes,
+  if (std_attrs != NULL_TREE && any_nonignored_attribute_p (std_attrs))
+warning_at (attrs_loc, OPT_Wattributes,
"attributes at the beginning of statement are ignored");
 }
 
@@ -12986,7 +12985,7 @@ cp_parser_expression_statement (cp_parse
 }
 
   /* Allow "[[fallthrough]];", but warn otherwise.  */
-  if (attr != NULL_TREE)
+  if (attr != NULL_TREE && any_nonignored_attribute_p (attr))
 warning_at (loc, OPT_Wattributes,
"attributes at the beginning of statement are ignored");
 
@@ -15191,7 +15190,7 @@ cp_parser_declaration (cp_parser* parser
}
}
 
-  if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
+  if (std_attrs != NULL_TREE && any_nonignored_attribute_p (std_attrs))
warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
OPT_Wattributes, "attribute ignored");
   if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
@@ -2109

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Jakub Jelinek
On Wed, Dec 06, 2023 at 02:34:10PM +0100, Martin Uecker wrote:
> > Further I think
> > "size less than or equal to the size requested"
> > is quite ambiguous in the calloc case, isn't the size requested in the
> > calloc case actually nmemb * size rather than just size?
> 
> This is unclear but it can be understood this way.
> This was also Joseph's point.
> 
> I am happy to submit a patch that changes the code so
> that the swapped arguments to calloc do not cause a warning
> anymore.

That would be my preference because then the allocation size is
correct and it is purely a style warning.
It doesn't follow how the warning is described:
"Warn about calls to allocation functions decorated with attribute
@code{alloc_size} that specify insufficient size for the target type of
the pointer the result is assigned to"
when the size is certainly sufficient.

But wonder what others think about it.

BTW, shouldn't the warning be for C++ as well?  Sure, I know,
people use operator new more often, but still, the 
allocators are used in there as well.

We have the -Wmemset-transposed-args warning, couldn't we
have a similar one for calloc, and perhaps do it solely in
the case where one uses sizeof of the type used in the cast
pointer?
So warn for
(struct S *) calloc (sizeof (struct S), 1)
or
(struct S *) calloc (sizeof (struct S), n)
but not for
(struct S *) calloc (4, 15)
or
(struct S *) calloc (sizeof (struct T), 1)
or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.

Jakub



[Committed V2] RISC-V: Fix VSETVL PASS bug

2023-12-06 Thread Juzhe-Zhong
As PR112855 mentions, the VSETVL pass inserts vsetvli in an unexpected
location.

This is due to 2 reasons:
1. Incorrect transparent LCM data computation: we need to check VL operand
defs and uses.
2. Incorrect fusion of an unrelated edge, i.e. an edge that never reaches
the vsetvl expression.

PR target/112855

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(pre_vsetvl::compute_lcm_local_properties): Fix transparent LCM data.
(pre_vsetvl::earliest_fuse_vsetvl_info): Disable earliest fusion for 
unrelated edge.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112855.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 66 ++-
 .../gcc.target/riscv/rvv/autovec/pr112855.c   | 26 
 2 files changed, 89 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112855.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 484a8b3a514..68f0be7e81d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2034,7 +2034,7 @@ private:
 gcc_unreachable ();
   }
 
-  bool anticpatable_exp_p (const vsetvl_info &header_info)
+  bool anticipated_exp_p (const vsetvl_info &header_info)
   {
 if (!header_info.has_nonvlmax_reg_avl () && !header_info.has_vl ())
   return true;
@@ -2645,7 +2645,7 @@ pre_vsetvl::compute_lcm_local_properties ()
}
}
 
- for (const insn_info *insn : bb->real_nondebug_insns ())
+ for (insn_info *insn : bb->real_nondebug_insns ())
{
  if (info.has_nonvlmax_reg_avl ()
  && find_access (insn->defs (), REGNO (info.get_avl (
@@ -2653,6 +2653,59 @@ pre_vsetvl::compute_lcm_local_properties ()
  bitmap_clear_bit (m_transp[bb_index], i);
  break;
}
+
+ if (info.has_vl ()
+ && reg_mentioned_p (info.get_vl (), insn->rtl ()))
+   {
+ if (find_access (insn->defs (), REGNO (info.get_vl (
+   /* We can't fuse vsetvl into the blocks that modify the
+  VL operand since successors of such blocks will need
+  the value those blocks are defining.
+
+ bb 4: def a5
+ /   \
+ bb 5:use a5  bb 6:vsetvl a5, 5
+
+  The example above shows that we can't fuse vsetvl
+  from bb 6 into bb 4 since the successor bb 5 is using
+  the value defined in bb 4.  */
+   ;
+ else
+   {
+ /* We can't fuse vsetvl into the blocks that use the
+VL operand which has a different value from the
+vsetvl info.
+
+   bb 4: def a5
+ |
+   bb 5: use a5
+ |
+   bb 6: def a5
+ |
+   bb 7: use a5
+
+The example above shows that we can't fuse vsetvl
+from bb 6 into bb 5 since their value is different.
+  */
+ resource_info resource
+   = full_register (REGNO (info.get_vl ()));
+ def_lookup dl = crtl->ssa->find_def (resource, insn);
+ def_info *def
+   = dl.matching_set_or_last_def_of_prev_group ();
+ gcc_assert (def);
+ insn_info *def_insn = extract_single_source (
+   dyn_cast (def));
+ if (def_insn && vsetvl_insn_p (def_insn->rtl ()))
+   {
+ vsetvl_info def_info = vsetvl_info (def_insn);
+ if (m_dem.compatible_p (def_info, info))
+   continue;
+   }
+   }
+
+ bitmap_clear_bit (m_transp[bb_index], i);
+ break;
+   }
}
}
 
@@ -2663,7 +2716,7 @@ pre_vsetvl::compute_lcm_local_properties ()
   vsetvl_info &footer_info = block_info.get_exit_info ();
 
   if (header_info.valid_p ()
- && (anticpatable_exp_p (header_info) || block_info.full_available))
+ && (anticipated_exp_p (header_info) || block_info.full_available))
bitmap_set_bit (m_antloc[bb_index],
get_expr_index (m_exprs, header_i

Re: [PATCH] driver: Fix memory leak.

2023-12-06 Thread Costas Argyris
Attached a new patch with these changes.

On Mon, 4 Dec 2023 at 12:15, Jonathan Wakely  wrote:

> On Sat, 2 Dec 2023 at 21:24, Costas Argyris wrote:
> >
> > Use std::vector instead of malloc'd pointer
> > to get automatic freeing of memory.
>
> You can't include  there. Instead you need to define
> INCLUDE_VECTOR before "system.h"
>
> Shouldn't you be using resize, not reserve? Otherwise mdswitches[i] is
> undefined.
>
>


0001-driver-Fix-memory-leak.patch
Description: Binary data
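
The substance of the review comments can be seen in a small sketch (names
hypothetical): a std::vector sized with resize() frees itself automatically
and gives defined element values, whereas reserve() only allocates capacity,
so indexing after reserve() alone would be undefined.

```cpp
#include <vector>

// Build and sum 0..n-1 with no manual malloc/free.  resize()
// value-initializes the elements, so v[i] is defined before we write it;
// the vector releases its storage automatically when it goes out of scope.
int sum_first_n (int n)
{
  std::vector<int> v;
  v.resize (n);
  for (int i = 0; i < n; ++i)
    v[i] = i;
  int s = 0;
  for (int x : v)
    s += x;
  return s;
}
```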


[PATCH] libstdc++: Make __gnu_debug::vector usable in constant expressions [PR109536]

2023-12-06 Thread Jonathan Wakely
Any comments on this approach?

-- >8 --

This makes constexpr std::vector (mostly) work in Debug Mode. All safe
iterator instrumentation and checking is disabled during constant
evaluation, because it requires mutex locks and calls to non-inline
functions defined in libstdc++.so. It should be OK to disable the safety
checks, because most UB should be detected during constant evaluation
anyway.

We could try to enable the full checking in constexpr, but it would mean
wrapping all the non-inline functions like _M_attach with an inline
_M_constexpr_attach that does the iterator housekeeping inline without
mutex locks when calling for constant evaluation, and calls the
non-inline function at runtime. That could be done in future if we find
that we've lost safety or useful checking by disabling the safe
iterators.

There are a few test failures in C++20 mode, which I'm unable to
explain. The _Safe_iterator::operator++() member gives errors for using
non-constexpr functions during constant evaluation, even though those
functions are guarded by std::is_constant_evaluated() checks. The same
code works fine for C++23 and up.
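
The guard pattern described above can be sketched independently of the
library internals. The GCC builtin is used here so the sketch does not
depend on a particular -std level, and the function name is hypothetical:

```cpp
// Skip runtime-only work (mutex locks, non-inline library calls) when
// evaluated in a constant expression, keeping the function usable in
// both constexpr and runtime contexts.
constexpr int checked_double (int n)
{
  if (!__builtin_is_constant_evaluated ())
    {
      // runtime-only instrumentation would go here
    }
  return n * 2;
}

static_assert (checked_double (3) == 6, "usable in constant evaluation");
```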

libstdc++-v3/ChangeLog:

PR libstdc++/109536
* include/bits/c++config (__glibcxx_constexpr_assert): Remove
macro.
* include/bits/stl_algobase.h (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr to overloads for
debug mode iterators.
* include/debug/helper_functions.h (__unsafe): Add constexpr.
* include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
macro, folding it into ...
(_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
__glibcxx_constexpr_assert.
* include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
to some member functions. Omit attaching, detaching and checking
operations during constant evaluation.
* include/debug/safe_container.h (_Safe_container): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator): Likewise.
* include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr.
* include/debug/vector (_Safe_vector, vector): Add constexpr.
Omit safe iterator operations during constant evaluation.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc:
Remove dg-xfail-if for debug mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/bool/cons/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/1.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/capacity/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
* testsuite/23_containers/vector/data_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
Likewise.
---
 libstdc++-v3/include/bits/c++config   |   9 -
 libstdc++-v3/include/bits/stl_algobase.h  |  15 ++
 libstdc++-v3/include/debug/helper_functions.h |   1 +
 libstdc++-v3/include/debug/macros.h   |   9 +-
 libstdc++-v3/include/debug/safe_base.h|  35 +++-
 libstdc++-v3/include/debug/safe_container.h   |  15 +-
 libstdc++-v3/include/debug/safe_iterator.h| 186 +++---
 libstdc++-v3/include/debug/safe_iterator.tcc  |  15 ++
 libstdc++-v3/include/debug/vector | 146 --
 .../vector/bool/capacity/constexpr.cc |   1 -
 .../23_containers/vector/bool/cmp_c++20.cc|   1 -
 .../vector/bool/cons/constexpr.cc |   1 -
 .../vector/bool/element_access/1.cc   |   1 -
 .../vector/bool/element_access/constexpr.cc   |   1 -
 .../vector/bool/modifiers/assign/constexpr.cc |   1 -
 .../vector/bool/modifiers/constexpr.cc|   1 -
 .../vector/bool/modifiers/swap/constexpr.cc   |   3 +-
 .../vector/capacity/constexpr.cc  |   1 -
 .../23_containers/vector/cmp_c++20.cc |   1 -
 .../23_containers/vector/cons/constexpr.cc|   1 -
 .../vector/data_access/constexpr.cc   

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Siddhesh Poyarekar

On 2023-12-06 09:21, Jakub Jelinek wrote:

On Wed, Dec 06, 2023 at 02:34:10PM +0100, Martin Uecker wrote:

Further I think
"size less than or equal to the size requested"
is quite ambiguous in the calloc case, isn't the size requested in the
calloc case actually nmemb * size rather than just size?


This is unclear but it can be understood this way.
This was also Joseph's point.

I am happy to submit a patch that changes the code so
that the swapped arguments to calloc do not cause a warning
anymore.


That would be my preference because then the allocation size is
correct and it is purely a style warning.
It doesn't follow how the warning is described:
"Warn about calls to allocation functions decorated with attribute
@code{alloc_size} that specify insufficient size for the target type of
the pointer the result is assigned to"
when the size is certainly sufficient.

But wonder what others think about it.


+1, from a libc perspective, the transposed arguments don't make a 
difference; a typical allocator will produce a sufficiently sized 
allocation for the calloc call.



BTW, shouldn't the warning be for C++ as well?  Sure, I know,
people use operator new more often, but still, the 
allocators are used in there as well.

We have the -Wmemset-transposed-args warning, couldn't we
have a similar one for calloc, and perhaps do it solely in
the case where one uses sizeof of the type used in the cast
pointer?
So warn for
(struct S *) calloc (sizeof (struct S), 1)
or
(struct S *) calloc (sizeof (struct S), n)
but not for
(struct S *) calloc (4, 15)
or
(struct S *) calloc (sizeof (struct T), 1)
or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.


+1, this could be an analyzer warning, since in practice it is just a 
code cleanliness issue.


Thanks,
Sid


Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Richard Biener
On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
 wrote:
>
> This is an RTL pass that detects store forwarding from stores to larger loads 
> (load pairs).
>
> This optimization is SPEC2017-driven and was found to be beneficial for some 
> benchmarks,
> through testing on ampere1/ampere1a machines.
>
> For example, it can transform cases like
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldp  d31, d17, [sp, #312] # Large load from small store
>
> to
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldr  d31, [sp, #312]
> ldr  d17, [sp, #320]
>
> Currently, the pass is disabled by default on all architectures and enabled 
> by a target-specific option.
>
> If deemed beneficial enough for a default, it will be enabled on 
> ampere1/ampere1a,
> or other architectures as well, without needing to be turned on by this 
> option.

What is aarch64-specific about the pass?

I see an increasingly large number of target-specific passes pop up (probably
with the excuse that we can generalize them if necessary).  But GCC isn't LLVM
and this feels like getting out of hand?

The x86 backend also has its store-forwarding "pass" as part of mdreorg
in ix86_split_stlf_stall_load.

Richard.

> Bootstrapped and regtested on aarch64-linux.
>
> gcc/ChangeLog:
>
> * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> * config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): 
> Declare.
> * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
> (aarch64-store-forwarding-threshold): New param.
> * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> * doc/invoke.texi: Document new option and new param.
> * config/aarch64/aarch64-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>
> Signed-off-by: Manos Anagnostakis 
> Co-Authored-By: Manolis Tsamis 
> Co-Authored-By: Philipp Tomsich 
> ---
> Changes in v6:
> - An obvious change. insn_cnt was incremented only on
>   stores and not for every insn in the bb. Now restored.
>
>  gcc/config.gcc|   1 +
>  gcc/config/aarch64/aarch64-passes.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h   |   1 +
>  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
>  gcc/config/aarch64/aarch64.opt|   9 +
>  gcc/config/aarch64/t-aarch64  |  10 +
>  gcc/doc/invoke.texi   |  11 +-
>  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
>  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
>  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
>  10 files changed, 449 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6450448f2f0..7c48429eb82 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -350,6 +350,7 @@ aarch64*-*-*)
> cxx_target_objs="aarch64-c.o"
> d_target_objs="aarch64-d.o"
> extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
> aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
> cortex-a57-fma-steering.o aarch64-speculation.o 
> falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> +   extra_objs="${extra_objs} aarch64-store-forwarding.o"
> target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
> target_has_targetm_common=yes
> ;;
> diff --git a/gcc/config/aarch64/aarch64-passes.def 
> b/gcc/config/aarch64/aarch64-passes.def
> index 662a13fd5e6..94ced0aebf6 100644
> --- a/gcc/config/aarch64/aarch64-passes.def
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 
> 1, pass_switch_pstat
>  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
>  INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
>  INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
> +INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 60ff61f6d54..8f5f2ca4710 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1069,6

Re: [PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-06 Thread Marek Polacek
On Wed, Dec 06, 2023 at 05:09:21PM +0530, Prathamesh Kulkarni wrote:
> On Tue, 5 Dec 2023 at 06:18, Marek Polacek  wrote:
> >
> > On Mon, Dec 04, 2023 at 04:49:29PM -0500, Jason Merrill wrote:
> > > On 12/4/23 15:23, Marek Polacek wrote:
> > > > +/* FN is not a consteval function, but may become one.  Remember to
> > > > +   escalate it after all pending templates have been instantiated.  */
> > > > +
> > > > +void
> > > > +maybe_store_immediate_escalating_fn (tree fn)
> > > > +{
> > > > +  if (unchecked_immediate_escalating_function_p (fn))
> > > > +remember_escalating_expr (fn);
> > > > +}
> > >
> > > > +++ b/gcc/cp/decl.cc
> > > > @@ -18441,7 +18441,10 @@ finish_function (bool inline_p)
> > > >if (!processing_template_decl
> > > >&& !DECL_IMMEDIATE_FUNCTION_P (fndecl)
> > > >&& !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
> > > > -cp_fold_function (fndecl);
> > > > +{
> > > > +  cp_fold_function (fndecl);
> > > > +  maybe_store_immediate_escalating_fn (fndecl);
> > > > +}
> > >
> > > I think maybe_store_, and the call to it from finish_function, are 
> > > unneeded;
> > > we will have already decided whether we need to remember the function 
> > > during
> > > the call to cp_fold_function.
> >
> > 'Tis true.
> >
> > > OK with that change.
> >
> > Here's what I pushed after another regtest.  Thanks!
> Hi Marek,
> It seems the patch caused following regressions on aarch64:
> 
> Running g++:g++.dg/modules/modules.exp ...
> FAIL: g++.dg/modules/xtreme-header-4_b.C -std=c++2b (internal compiler
> error: tree check: expected class 'type', have 'declaration'
> (template_decl) in get_originating_module_decl, at cp/module.cc:18659)
> FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2b (internal compiler
> error: tree check: expected class 'type', have 'declaration'
> (template_decl) in get_originating_module_decl, at cp/module.cc:18659)
> FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (internal compiler
> error: tree check: expected class 'type', have 'declaration'
> (template_decl) in get_originating_module_decl, at cp/module.cc:18659)
> 
> Log files: 
> https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/1299/artifact/artifacts/00-sumfiles/

Are you sure it's caused by my patch?  I reckon I've seen that FAIL
many times before.

Marek



[PATCH] libstdc++: Fix testsuite with -Wformat

2023-12-06 Thread Gwenole Beauchesne
Tested on x86_64-pc-linux-gnu with --enable-languages=c,c++ and
-Wformat added to CXXFLAGS.

-- >8 --

Fix testsuite when compiling with -Wformat. Use nonnull arguments so
that -Wformat does not cause extraneous output to be reported as an
error.

FAIL: tr1/8_c_compatibility/cinttypes/functions.cc (test for excess errors)

libstdc++-v3/ChangeLog:

* testsuite/tr1/8_c_compatibility/cinttypes/functions.cc: Use
nonnull arguments to strtoimax() and wcstoimax() functions.

Signed-off-by: Gwenole Beauchesne 
---
 .../testsuite/tr1/8_c_compatibility/cinttypes/functions.cc| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/tr1/8_c_compatibility/cinttypes/functions.cc 
b/libstdc++-v3/testsuite/tr1/8_c_compatibility/cinttypes/functions.cc
index 518ddf49875..21f5263b5cc 100644
--- a/libstdc++-v3/testsuite/tr1/8_c_compatibility/cinttypes/functions.cc
+++ b/libstdc++-v3/testsuite/tr1/8_c_compatibility/cinttypes/functions.cc
@@ -29,10 +29,10 @@ void test01()
 #if _GLIBCXX_USE_C99_INTTYPES_TR1
 
   std::tr1::intmax_t i = 0, numer = 0, denom = 0, base = 0;
-  const char* s = 0;
+  const char* s = "0";
   char** endptr = 0;
 #if defined(_GLIBCXX_USE_WCHAR_T) && _GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1
-  const wchar_t* ws = 0;
+  const wchar_t* ws = L"0";
   wchar_t** wendptr = 0;
 #endif
 
-- 
2.39.2



Re: [PATCH] driver: Fix memory leak.

2023-12-06 Thread Jakub Jelinek
On Wed, Dec 06, 2023 at 02:29:25PM +, Costas Argyris wrote:
> Attached a new patch with these changes.
> 
> On Mon, 4 Dec 2023 at 12:15, Jonathan Wakely  wrote:
> 
> > On Sat, 2 Dec 2023 at 21:24, Costas Argyris wrote:
> > >
> > > Use std::vector instead of malloc'd pointer
> > > to get automatic freeing of memory.
> >
> > You can't include  there. Instead you need to define
> > INCLUDE_VECTOR before "system.h"
> >
> > Shouldn't you be using resize, not reserve? Otherwise mdswitches[i] is
> > undefined.

Any reason not to use vec.h instead?
I especially don't like the fact that with a global
std::vector var;
it means a runtime __cxa_atexit registration for the var's destruction, which it
really doesn't need on exit.

We really don't need to free the memory at exit time; that is just wasted
cycles.  All we need is that it is freed before the pointer or vector is
cleared.

Jakub



Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-06 Thread Jonathan Wakely
On Wed, 6 Dec 2023 at 13:53, Jonathan Wakely  wrote:
>
> On Wed, 6 Dec 2023 at 12:30, Thomas Schwinge  wrote:
> >
> > Hi Alexandre!
> >
> > On 2023-12-06T02:28:42-0300, Alexandre Oliva  wrote:
> > > libsupc++: try cxa_thread_atexit_impl at runtime
> > >
> > > g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
> > > a platform that lacks __cxa_thread_atexit_impl, even if the program is
> > > built and run using that toolchain on a (later) platform that offers
> > > __cxa_thread_atexit_impl.
> > >
> > > This patch adds runtime testing for __cxa_thread_atexit_impl on select
> > > platforms (GNU variants, for starters) that support weak symbols.
> >
> > Need something like:
> >
> > --- libstdc++-v3/libsupc++/atexit_thread.cc
> > +++ libstdc++-v3/libsupc++/atexit_thread.cc
> > @@ -164,2 +164,4 @@ __cxxabiv1::__cxa_thread_atexit (void 
> > (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
> >  return __cxa_thread_atexit_impl (dtor, obj, dso_handle);
> > +#else
> > +  (void) dso_handle;
> >  #endif
>
> I would prefer:
>
> --- a/libstdc++-v3/libsupc++/atexit_thread.cc
> +++ b/libstdc++-v3/libsupc++/atexit_thread.cc
> @@ -148,7 +148,7 @@ __cxa_thread_atexit_impl (void
> (_GLIBCXX_CDTOR_CALLABI *func) (void *),
>  // ??? We can't make it an ifunc, can we?
>  extern "C" int
>  __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
> -void *obj, void *dso_handle)
> +void *obj, [[maybe_unused]] void *dso_handle)
>_GLIBCXX_NOTHROW
>  {
>  #if __GXX_WEAK__


The patch is OK with that change.


>
>
>
>
> >
> > ... to avoid:
> >
> > [...]/source-gcc/libstdc++-v3/libsupc++/atexit_thread.cc: In function 
> > ‘int __cxxabiv1::__cxa_thread_atexit(void (*)(void*), void*, void*)’:
> > [...]/source-gcc/libstdc++-v3/libsupc++/atexit_thread.cc:151:51: error: 
> > unused parameter ‘dso_handle’ [-Werror=unused-parameter]
> >   151 |  void *obj, void *dso_handle)
> >   | ~~^~
> > cc1plus: all warnings being treated as errors
> > make[4]: *** [atexit_thread.lo] Error 1
> >
> > With that, GCC/nvptx then is back to:
> >
> > UNSUPPORTED: g++.dg/tls/thread_local6.C  -std=c++98
> > PASS: g++.dg/tls/thread_local6.C  -std=c++14 (test for excess errors)
> > PASS: g++.dg/tls/thread_local6.C  -std=c++14 execution test
> > PASS: g++.dg/tls/thread_local6.C  -std=c++17 (test for excess errors)
> > PASS: g++.dg/tls/thread_local6.C  -std=c++17 execution test
> > PASS: g++.dg/tls/thread_local6.C  -std=c++20 (test for excess errors)
> > PASS: g++.dg/tls/thread_local6.C  -std=c++20 execution test
> >
> >
> > Grüße
> >  Thomas
> >
> >
> > > for  libstdc++-v3/ChangeLog
> > >
> > >   * config/os/gnu-linux/os_defines.h
> > >   (_GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL): Define.
> > >   * libsupc++/atexit_thread.cc [__GXX_WEAK__ &&
> > >   _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL]
> > >   (__cxa_thread_atexit): Add dynamic detection of
> > >   __cxa_thread_atexit_impl.
> > > ---
> > >  libstdc++-v3/config/os/gnu-linux/os_defines.h |5 +
> > >  libstdc++-v3/libsupc++/atexit_thread.cc   |   23 
> > > ++-
> > >  2 files changed, 27 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h 
> > > b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> > > index 87317031fcd71..a2e4baec069d5 100644
> > > --- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
> > > +++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
> > > @@ -60,6 +60,11 @@
> > >  # define _GLIBCXX_HAVE_FLOAT128_MATH 1
> > >  #endif
> > >
> > > +// Enable __cxa_thread_atexit to rely on a (presumably libc-provided)
> > > +// __cxa_thread_atexit_impl, if it happens to be defined, even if
> > > +// configure couldn't find it during the build.
> > > +#define _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL 1
> > > +
> > >  #ifdef __linux__
> > >  // The following libpthread properties only apply to Linux, not GNU/Hurd.
> > >
> > > diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc 
> > > b/libstdc++-v3/libsupc++/atexit_thread.cc
> > > index 9346d50f5dafe..aa4ed5312bfe3 100644
> > > --- a/libstdc++-v3/libsupc++/atexit_thread.cc
> > > +++ b/libstdc++-v3/libsupc++/atexit_thread.cc
> > > @@ -138,11 +138,32 @@ namespace {
> > >}
> > >  }
> > >
> > > +#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
> > > +extern "C"
> > > +int __attribute__ ((__weak__))
> > > +__cxa_thread_atexit_impl (void (_GLIBCXX_CDTOR_CALLABI *func) (void *),
> > > +   void *arg, void *d);
> > > +#endif
> > > +
> > > +// ??? We can't make it an ifunc, can we?
> > >  extern "C" int
> > >  __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI 
> > > *dtor)(void *),
> > > -  void *obj, void */*dso_handle*/)

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Jakub Jelinek
On Wed, Dec 06, 2023 at 09:30:32AM -0500, Siddhesh Poyarekar wrote:
> > We have the -Wmemset-transposed-args warning, couldn't we
> > have a similar one for calloc, and perhaps do it solely in
> > the case where one uses sizeof of the type used in the cast
> > pointer?
> > So warn for
> > (struct S *) calloc (sizeof (struct S), 1)
> > or
> > (struct S *) calloc (sizeof (struct S), n)
> > but not for
> > (struct S *) calloc (4, 15)
> > or
> > (struct S *) calloc (sizeof (struct T), 1)
> > or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.
> 
> +1, this could be an analyzer warning, since in practice it is just a code
> cleanliness issue.

We don't do such things in the analyzer, nor is it possible; by the time the
analyzer sees the IL, all the sizeofs etc. are folded.  The analyzer is for
expensive-to-compute warnings; code style warnings are normally implemented
in the FEs.

Jakub



Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Siddhesh Poyarekar

On 2023-12-06 09:41, Jakub Jelinek wrote:

On Wed, Dec 06, 2023 at 09:30:32AM -0500, Siddhesh Poyarekar wrote:

We have the -Wmemset-transposed-args warning, couldn't we
have a similar one for calloc, and perhaps do it solely in
the case where one uses sizeof of the type used in the cast
pointer?
So warn for
(struct S *) calloc (sizeof (struct S), 1)
or
(struct S *) calloc (sizeof (struct S), n)
but not for
(struct S *) calloc (4, 15)
or
(struct S *) calloc (sizeof (struct T), 1)
or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.


+1, this could be an analyzer warning, since in practice it is just a code
cleanliness issue.


We don't do such things in the analyzer, nor is it possible; by the time the
analyzer sees the IL, all the sizeofs etc. are folded.  The analyzer is for
expensive-to-compute warnings; code style warnings are normally implemented
in the FEs.


Thanks, understood.  A separate FE warning is fine as well.

Thanks,
Sid


Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Martin Uecker
Am Mittwoch, dem 06.12.2023 um 15:21 +0100 schrieb Jakub Jelinek:
> On Wed, Dec 06, 2023 at 02:34:10PM +0100, Martin Uecker wrote:
> > > Further I think
> > > "size less than or equal to the size requested"
> > > is quite ambiguous in the calloc case, isn't the size requested in the
> > > calloc case actually nmemb * size rather than just size?
> > 
> > This is unclear but it can be understood this way.
> > This was also Joseph's point.
> > 
> > I am happy to submit a patch that changes the code so
> > that the swapped arguments to calloc do not cause a warning
> > anymore.
> 
> That would be my preference because then the allocation size is
> correct and it is purely a style warning.
> It doesn't follow how the warning is described:
> "Warn about calls to allocation functions decorated with attribute
> @code{alloc_size} that specify insufficient size for the target type of
> the pointer the result is assigned to"
> when the size is certainly sufficient.

The C standard defines the semantics of calloc as allocating space 
for 'nmemb' objects of size 'size', so I would say
the warning and its description are correct, because
if you call calloc with '1' as the size argument but
the object size is larger, then you specify an 
insufficient size for the object, given the semantic
description of calloc in the standard.

If this does not affect alignment, then this should 
not matter, but it is still not really correct. 
> 
> But wonder what others think about it.
> 
> BTW, shouldn't the warning be for C++ as well?  Sure, I know,
> people use operator new more often, but still, the 
> allocators are used in there as well.

We should, but it I am not familiar with the C++ FE.

> 
> We have the -Wmemset-transposed-args warning, couldn't we
> have a similar one for calloc, and perhaps do it solely in
> the case where one uses sizeof of the type used in the cast
> pointer?
> So warn for
> (struct S *) calloc (sizeof (struct S), 1)
> or
> (struct S *) calloc (sizeof (struct S), n)
> but not for
> (struct S *) calloc (4, 15)
> or
> (struct S *) calloc (sizeof (struct T), 1)
> or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.
> 
>   Jakub

Yes, although in contrast to -Wmemset-transposed-args
this would be considered a "style" option, which then
nobody would activate.  And if we put it into -Wextra
then we have the same situation as today.

Martin





Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Jakub Jelinek
On Wed, Dec 06, 2023 at 03:56:10PM +0100, Martin Uecker wrote:
> > That would be my preference because then the allocation size is
> > correct and it is purely a style warning.
> > It doesn't follow how the warning is described:
> > "Warn about calls to allocation functions decorated with attribute
> > @code{alloc_size} that specify insufficient size for the target type of
> > the pointer the result is assigned to"
> > when the size is certainly sufficient.
> 
> The C standard defines the semantics of calloc as allocating space 
> for 'nmemb' objects of size 'size', so I would say
> the warning and its description are correct, because
> if you call calloc with '1' as the size argument but
> the object size is larger, then you specify an 
> insufficient size for the object, given the semantic
> description of calloc in the standard.

1 is sizeof (char), so you ask for an array of sizeof (struct ...)
chars and store the struct into it.

> > We have the -Wmemset-transposed-args warning, couldn't we
> > have a similar one for calloc, and perhaps do it solely in
> > the case where one uses sizeof of the type used in the cast
> > pointer?
> > So warn for
> > (struct S *) calloc (sizeof (struct S), 1)
> > or
> > (struct S *) calloc (sizeof (struct S), n)
> > but not for
> > (struct S *) calloc (4, 15)
> > or
> > (struct S *) calloc (sizeof (struct T), 1)
> > or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.
> 
> Yes, although in contrast to -Wmeset-transposed-args
> this would be considered a "style" option which then
> nobody would activate.  And if we put it into -Wextra
> then we have the same situation as today.

Well, the significant difference would be that users would
know that they got the size for the allocation right, just
that a coding style says it is better to put the type's size
as the second argument rather than first, and they could disable
that warning separately from -Walloc-size and still get warnings
on (struct S *) calloc (1, 1) or (struct S *) malloc (3) if
sizeof (struct S) is 24...

Jakub



Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Martin Uecker
Am Mittwoch, dem 06.12.2023 um 16:01 +0100 schrieb Jakub Jelinek:
> On Wed, Dec 06, 2023 at 03:56:10PM +0100, Martin Uecker wrote:
> > > That would be my preference because then the allocation size is
> > > correct and it is purely a style warning.
> > > It doesn't follow how the warning is described:
> > > "Warn about calls to allocation functions decorated with attribute
> > > @code{alloc_size} that specify insufficient size for the target type of
> > > the pointer the result is assigned to"
> > > when the size is certainly sufficient.
> > 
> > The C standard defines the semantics of calloc as allocating space 
> > for 'nmemb' objects of size 'size', so I would say
> > the warning and its description are correct, because
> > if you call calloc with '1' as the size argument but
> > the object size is larger, then you specify an 
> > insufficient size for the object, given the semantic
> > description of calloc in the standard.
> 
> 1 is sizeof (char), so you ask for an array of sizeof (struct ...)
> chars and store the struct into it.

If you use

char *p = calloc(sizeof(struct foo), 1);

it does not warn.

> 
> > > We have the -Wmemset-transposed-args warning, couldn't we
> > > have a similar one for calloc, and perhaps do it solely in
> > > the case where one uses sizeof of the type used in the cast
> > > pointer?
> > > So warn for
> > > (struct S *) calloc (sizeof (struct S), 1)
> > > or
> > > (struct S *) calloc (sizeof (struct S), n)
> > > but not for
> > > (struct S *) calloc (4, 15)
> > > or
> > > (struct S *) calloc (sizeof (struct T), 1)
> > > or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.
> > 
> > Yes, although in contrast to -Wmemset-transposed-args
> > this would be considered a "style" option, which then
> > nobody would activate.  And if we put it into -Wextra
> > then we have the same situation as today.
> 
> Well, the significant difference would be that users would
> know that they got the size for the allocation right, just
> that a coding style says it is better to put the type's size
> as the second argument rather than first, and they could disable
> that warning separately from -Walloc-size and still get warnings
> on (struct S *) calloc (1, 1) or (struct S *) malloc (3) if
> sizeof (struct S) is 24...

Ok. 

Note that another limitation of the current version is that it
does not warn for

... = (struct S*) calloc (...)

with the cast (which is non-idiomatic in C).  This is also
something I would like to address in the future and would be
more important for the C++ version.  But for this case it
should probably use the type of the cast and the warning
needs to be added somewhere else in the FE.


Martin



Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Manos Anagnostakis
Hi Richard,

thanks for the useful comments.

On Wed, Dec 6, 2023 at 4:32 PM Richard Biener 
wrote:

> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>  wrote:
> >
> > This is an RTL pass that detects store forwarding from stores to larger
> loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for
> some benchmarks,
> > through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldp  d31, d17, [sp, #312] # Large load from small store
> >
> > to
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldr  d31, [sp, #312]
> > ldr  d17, [sp, #320]
> >
> > Currently, the pass is disabled by default on all architectures and
> enabled by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a,
> > or other architectures as well, without needing to be turned on by this
> option.
>
> What is aarch64-specific about the pass?
>
The pass was designed to target load pairs, which are aarch64-specific,
so it cannot handle generic loads.

>
> I see an increasingly large number of target specific passes pop up
> (probably
> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> and this feels like getting out of hand?
>
> The x86 backend also has its store-forwarding "pass" as part of mdreorg
> in ix86_split_stlf_stall_load.
>
> Richard.
>
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
> pass.
> > * config/aarch64/aarch64-protos.h
> (make_pass_avoid_store_forwarding): Declare.
> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
> option.
> > (aarch64-store-forwarding-threshold): New param.
> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > * doc/invoke.texi: Document new option and new param.
> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > Co-Authored-By: Philipp Tomsich 
> > ---
> > Changes in v6:
> > - An obvious change. insn_cnt was incremented only on
> >   stores and not for every insn in the bb. Now restored.
> >
> >  gcc/config.gcc|   1 +
> >  gcc/config/aarch64/aarch64-passes.def |   1 +
> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
> >  gcc/config/aarch64/aarch64.opt|   9 +
> >  gcc/config/aarch64/t-aarch64  |  10 +
> >  gcc/doc/invoke.texi   |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> >  10 files changed, 449 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 6450448f2f0..7c48429eb82 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> > cxx_target_objs="aarch64-c.o"
> > d_target_objs="aarch64-d.o"
> > extra_objs="aarch64-builtins.o aarch-common.o
> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o
> aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o
> aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o
> falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> > +   extra_objs="${extra_objs} aarch64-store-forwarding.o"
> > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
> > target_has_targetm_common=yes
> > ;;
> > diff --git a/gcc/config/aarch64/aarch64-passes.def
> b/gcc/config/aarch64/aarch64-passes.def
> > index 662a13fd5e6..94ced0aebf6 100644
> > --- a/gcc/config/aarch64/aarch64-passes.def
> > +++ b/gcc/config/aarch64/aarch64-passes.def
> > @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE
> (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstat
> >  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
> >  INSERT_PASS_BEFORE (pass_shorten_bra

Re: HELP: one issue during the implementation for counted_by attribute

2023-12-06 Thread Qing Zhao
Just an update on this issue.

Finally, I resolved this issue with the following solution:

For the source code (portion):

"
struct annotated {
  size_t foo;
  char array[] __attribute__((counted_by (foo)));
};

p2->array[8] = 0;
"

C FE will generate the following: (*.005t.original)

*(.ACCESS_WITH_SIZE (p2->array, &p2->foo, 1, 8, -1) + 8) = 0;

i.e., the RETURN type of the call to .ACCESS_WITH_SIZE should be a pointer
type to char, i.e. char *.
(Previously, the RETURN type of the call was char [].)

This resolved the issue nicely. 

Let me know if you see any obvious issue with this solution. 

thanks.

Qing


> On Nov 30, 2023, at 11:07 AM, Qing Zhao  wrote:
> 
> Hi, 
> 
> 1. For the following source code (portion):
> 
> struct annotated {
>  size_t foo;
>  char b;
>  char array[] __attribute__((counted_by (foo)));
> };
> 
> static void noinline bar ()
> {
>  struct annotated *p2 = alloc_buf (10);
>  p2->array[8] = 0;
>  return;
> }
> 
> 2. I modified C FE to generate the following code for the routine “bar”:
> 
> ;; Function bar (null)
> ;; enabled by -tree-original
> {
>  struct annotated * p2 = alloc_buf (10);
> 
>struct annotated * p2 = alloc_buf (10);
>  .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1)[8] = 0;
>  return;
> }
> 
> The gimpliflication asserted at:/home/opc/Install/latest-d/bin/gcc -O2 
> -fdump-tree-all ttt_1.c
> ttt_1.c: In function ‘bar’:
> ttt_1.c:29:5: internal compiler error: in create_tmp_var, at 
> gimple-expr.cc:488
>   29 |   p2->array[8] = 0;
>  |   ~~^~~
> 
> 3. The reason for this assertion failure is:  (in gcc/gimplify.cc)
> 
> 16686 case CALL_EXPR:
> 16687   ret = gimplify_call_expr (expr_p, pre_p, fallback != fb_none);
> 16688 
> 16689   /* C99 code may assign to an array in a structure returned
> 16690  from a function, and this has undefined behavior only on
> 16691  execution, so create a temporary if an lvalue is
> 16692  required.  */
> 16693   if (fallback == fb_lvalue)
> 16694 {
> 16695   *expr_p = get_initialized_tmp_var (*expr_p, pre_p, 
> post_p, false);
> 16696   mark_addressable (*expr_p);
> 16697   ret = GS_OK;
> 16698 }
> 16699   break;
> 
> At Line 16695, when gimplifier tried to create a temporary value for the 
> .ACCESS_WITH_SIZE function as:
>   tmp = .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1);
> 
> It asserted since the TYPE of the function .ACCESS_WITH_SIZE is an 
> INCOMPLETE_TYPE (it’s the TYPE of p2->array, which is an incomplete type).
> 
> 4. I am stuck on how to resolve this issue properly:
> The first question is:
> 
> Where should  we generate
>  tmp = .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1)
> 
> In C FE or in middle-end gimplification? 
> 
> Thanks a lot for your help.
> 
> Qing
> 



[committed] Fix c-c++-common/fhardened-[12].c test fails on hppa

2023-12-06 Thread John David Anglin
Tested on hppa-unknown-linux-gnu.  Committed to trunk.

Dave
---

Fix c-c++-common/fhardened-[12].c test fails on hppa

The -fstack-protector and -fstack-protector-strong options are
not supported on hppa since the stack grows up.

2023-12-06  John David Anglin  

gcc/testsuite/ChangeLog:

* c-c++-common/fhardened-1.c: Ignore __SSP_STRONG__ define
if __hppa__ is defined.
* c-c++-common/fhardened-2.c: Ignore __SSP__ define
if __hppa__ is defined.

diff --git a/gcc/testsuite/c-c++-common/fhardened-1.c 
b/gcc/testsuite/c-c++-common/fhardened-1.c
index 7e6740655fe..23478be76b2 100644
--- a/gcc/testsuite/c-c++-common/fhardened-1.c
+++ b/gcc/testsuite/c-c++-common/fhardened-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target *-*-linux* *-*-gnu* } } */
 /* { dg-options "-fhardened -O" } */
 
-#ifndef __SSP_STRONG__
+#if !defined(__SSP_STRONG__) && !defined(__hppa__)
 # error "-fstack-protector-strong not enabled"
 #endif
 
diff --git a/gcc/testsuite/c-c++-common/fhardened-2.c 
b/gcc/testsuite/c-c++-common/fhardened-2.c
index 280ff96eb15..6ac66f9f6b7 100644
--- a/gcc/testsuite/c-c++-common/fhardened-2.c
+++ b/gcc/testsuite/c-c++-common/fhardened-2.c
@@ -4,7 +4,7 @@
 #ifdef __SSP_STRONG__
 # error "-fstack-protector-strong enabled when it should not be"
 #endif
-#ifndef __SSP__
+#if !defined(__SSP__) && !defined(__hppa__)
 # error "-fstack-protector not enabled"
 #endif
 




[PATCH] c++: Handle '#pragma GCC target optimize' early [PR48026]

2023-12-06 Thread Gwenole Beauchesne
Tested on x86_64-pc-linux-gnu with --enable-languages=c,c++

-- >8 --

Handle '#pragma GCC optimize' earlier as the __OPTIMIZE__ macro may need
to be defined as well for certain usages.

This is a follow-up to r14-4967-g8697d3a1. Also add more tests for the
'#pragma GCC target' case with auto-vectorization enabled and multiple
combinations of namespaces and/or class member functions.

gcc/c-family/ChangeLog:

PR c++/48026
	* c-pragma.cc (init_pragma): Register `#pragma GCC optimize' in
preprocess-only mode, and enable early handling.

gcc/testsuite/ChangeLog:

PR c++/48026
PR c++/41201
* g++.target/i386/vect-pragma-target-1.C: New test.
* g++.target/i386/vect-pragma-target-2.C: New test.
* gcc.target/i386/vect-pragma-target-1.c: New test.
	* gcc.target/i386/vect-pragma-target-2.c: New test.

Signed-off-by: Gwenole Beauchesne 
---
 gcc/c-family/c-pragma.cc  |   4 +-
 .../g++.target/i386/vect-pragma-target-1.C|   6 +
 .../g++.target/i386/vect-pragma-target-2.C|   6 +
 .../gcc.target/i386/vect-pragma-target-1.c| 194 ++
 .../gcc.target/i386/vect-pragma-target-2.c|   7 +
 5 files changed, 216 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/i386/vect-pragma-target-1.C
 create mode 100644 gcc/testsuite/g++.target/i386/vect-pragma-target-2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-pragma-target-2.c

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 849f8ac8c8b..26d4c0c71e0 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1852,7 +1852,9 @@ init_pragma (void)
   c_register_pragma_with_early_handler ("GCC", "target",
handle_pragma_target,
handle_pragma_target);
-  c_register_pragma ("GCC", "optimize", handle_pragma_optimize);
+  c_register_pragma_with_early_handler ("GCC", "optimize",
+   handle_pragma_optimize,
+   handle_pragma_optimize);
   c_register_pragma_with_early_handler ("GCC", "push_options",
handle_pragma_push_options,
handle_pragma_push_options);
diff --git a/gcc/testsuite/g++.target/i386/vect-pragma-target-1.C 
b/gcc/testsuite/g++.target/i386/vect-pragma-target-1.C
new file mode 100644
index 000..2f360cf50e1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/vect-pragma-target-1.C
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O0" } */
+/* { dg-final { scan-assembler-times "paddd.+xmm\[0-9]+" 1 } }   */
+/* { dg-final { scan-assembler-times "vfmadd132ps.+ymm\[0-9]+"  1 } }   */
+/* { dg-final { scan-assembler-times "vpaddw.+zmm\[0-9]+"   1 } }   */
+#include "../../gcc.target/i386/vect-pragma-target-1.c"
diff --git a/gcc/testsuite/g++.target/i386/vect-pragma-target-2.C 
b/gcc/testsuite/g++.target/i386/vect-pragma-target-2.C
new file mode 100644
index 000..b85bc93d845
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/vect-pragma-target-2.C
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O0" } */
+/* { dg-final { scan-assembler-times "paddd.+xmm\[0-9]+" 1 } }   */
+/* { dg-final { scan-assembler-times "vfmadd132ps.+ymm\[0-9]+"  1 } }   */
+/* { dg-final { scan-assembler-times "vpaddw.+zmm\[0-9]+"   1 } }   */
+#include "../../gcc.target/i386/vect-pragma-target-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c 
b/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c
new file mode 100644
index 000..f5e71e453ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c
@@ -0,0 +1,194 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O0" } */
+/* { dg-final { scan-assembler-times "paddd.+xmm\[0-9]+" 1 } }   */
+/* { dg-final { scan-assembler-times "vfmadd132ps.+ymm\[0-9]+"  1 } }   */
+/* { dg-final { scan-assembler-times "vpaddw.+zmm\[0-9]+"   1 } }   */
+#ifndef CHECK_DEFINES
+#define CHECK_DEFINES 0
+#endif
+
+#define N 1024
+
+/* Optimization flags and tree vectorizer shall be disabled at this point */
+#if CHECK_DEFINES && defined(__OPTIMIZE__)
+#error "__OPTIMIZE__ is defined (not compiled with -O0?)"
+#endif
+
+#pragma GCC push_options
+#pragma GCC optimize ("O2", "tree-vectorize")
+
+/* Optimization flags and tree vectorizer shall be enabled at this point */
+#if CHECK_DEFINES && !defined(__OPTIMIZE__)
+#error "__OPTIMIZE__ is not defined"
+#endif
+
+#pragma GCC push_options
+#pragma GCC target ("

{Patch, fortran] PR112834 - Class array function selector causes chain of syntax and other spurious errors

2023-12-06 Thread Paul Richard Thomas
Dear All,

This patch was rescued from my ill-fated and long-winded attempt to provide
a fix-up for function selector references, where the function is parsed
after the procedure containing the associate/select type construct (PRs
89645 and 99065). The fix-ups broke down completely once these constructs
were enclosed by another associate construct, where the selector is a
derived type or class function. My inclination now is to introduce two-pass
parsing for contained procedures.

Returning to PR112834, the patch is simple enough and is well described by
the change logs. PR111853 was fixed as a side effect of the bigger patch.
Steve Kargl had also posted the same fix on the PR.

Regression tests - OK for trunk and 13-branch?

Paul


diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 9e3571d3dbe..cecd2940dcf 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -6436,9 +6436,9 @@ build_associate_name (const char *name, gfc_expr **e1, gfc_expr **e2)

   sym = expr1->symtree->n.sym;
   if (expr2->ts.type == BT_UNKNOWN)
-  sym->attr.untyped = 1;
+sym->attr.untyped = 1;
   else
-  copy_ts_from_selector_to_associate (expr1, expr2);
+copy_ts_from_selector_to_associate (expr1, expr2);

   sym->attr.flavor = FL_VARIABLE;
   sym->attr.referenced = 1;
@@ -6527,6 +6527,7 @@ select_type_set_tmp (gfc_typespec *ts)
   gfc_symtree *tmp = NULL;
   gfc_symbol *selector = select_type_stack->selector;
   gfc_symbol *sym;
+  gfc_expr *expr2;

   if (!ts)
 {
@@ -6550,7 +6551,19 @@ select_type_set_tmp (gfc_typespec *ts)
   sym = tmp->n.sym;
   gfc_add_type (sym, ts, NULL);

-  if (selector->ts.type == BT_CLASS && selector->attr.class_ok
+  /* If the SELECT TYPE selector is a function we might be able to obtain
+	 a typespec from the result. Since the function might not have been
+	 parsed yet we have to check that there is indeed a result symbol.  */
+  if (selector->ts.type == BT_UNKNOWN
+	  && gfc_state_stack->construct
+	  && (expr2 = gfc_state_stack->construct->expr2)
+	  && expr2->expr_type == EXPR_FUNCTION
+	  && expr2->symtree
+	  && expr2->symtree->n.sym && expr2->symtree->n.sym->result)
+	selector->ts = expr2->symtree->n.sym->result->ts;
+
+  if (selector->ts.type == BT_CLASS
+  && selector->attr.class_ok
 	  && selector->ts.u.derived && CLASS_DATA (selector))
 	{
 	  sym->attr.pointer
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index abd3a424f38..c1fa751d0e8 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -5131,7 +5131,7 @@ parse_associate (void)
   gfc_current_ns = my_ns;
   for (a = new_st.ext.block.assoc; a; a = a->next)
 {
-  gfc_symbol* sym;
+  gfc_symbol *sym, *tsym;
   gfc_expr *target;
   int rank;

@@ -5195,6 +5195,16 @@ parse_associate (void)
 	  sym->ts.type = BT_DERIVED;
 	  sym->ts.u.derived = derived;
 	}
+	  else if (target->symtree && (tsym = target->symtree->n.sym))
+	{
+	  sym->ts = tsym->result ? tsym->result->ts : tsym->ts;
+	  if (sym->ts.type == BT_CLASS)
+		{
+		  if (CLASS_DATA (sym)->as)
+		target->rank = CLASS_DATA (sym)->as->rank;
+		  sym->attr.class_ok = 1;
+		}
+	}
 	}

   rank = target->rank;
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 166b702cd9a..92678b816a1 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5669,7 +5669,7 @@ gfc_expression_rank (gfc_expr *e)
   if (ref->type != REF_ARRAY)
 	continue;

-  if (ref->u.ar.type == AR_FULL)
+  if (ref->u.ar.type == AR_FULL && ref->u.ar.as)
 	{
 	  rank = ref->u.ar.as->rank;
 	  break;
diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
index 50b71e67234..b70c079fc55 100644
--- a/gcc/fortran/trans-stmt.cc
+++ b/gcc/fortran/trans-stmt.cc
@@ -1746,6 +1746,7 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
   e = sym->assoc->target;

   class_target = (e->expr_type == EXPR_VARIABLE)
+		&& e->ts.type == BT_CLASS
 		&& (gfc_is_class_scalar_expr (e)
 			|| gfc_is_class_array_ref (e, NULL));

@@ -2037,7 +2038,12 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)

   /* Class associate-names come this way because they are
 	 unconditionally associate pointers and the symbol is scalar.  */
-  if (sym->ts.type == BT_CLASS && CLASS_DATA (sym)->attr.dimension)
+  if (sym->ts.type == BT_CLASS && e->expr_type ==EXPR_FUNCTION)
+	{
+	  gfc_conv_expr (&se, e);
+	  se.expr = gfc_evaluate_now (se.expr, &se.pre);
+	}
+  else if (sym->ts.type == BT_CLASS && CLASS_DATA (sym)->attr.dimension)
 	{
 	  tree target_expr;
 	  /* For a class array we need a descriptor for the selector.  */
! { dg-do run }
!
! Test the fix for PR112834 in which class array function selectors caused
! problems for both ASSOCIATE and SELECT_TYPE.
!
! Contributed by Paul Thomas  
!
module m
  implicit none
  type t
integer :: i = 0
  end type t
  integer :: i = 0
  type(t), 

Re: [PATCH] RISC-V: Remove xfail from ssa-fre-3.c testcase

2023-12-06 Thread Palmer Dabbelt

On Tue, 05 Dec 2023 16:39:06 PST (-0800), e...@rivosinc.com wrote:

Ran the test case at 122e7b4f9d0c2d54d865272463a1d812002d0a5c where the xfail


That's the original port submission; I'm actually kind of surprised it 
still builds/works at all.



was introduced. The test did pass at that hash and has continued to pass since
then. Remove the xfail

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-fre-3.c: Remove xfail

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
index 224dd4f72ef..b2924837a22 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
@@ -18,4 +18,4 @@ foo (int a, int b)
   return aa + bb;
 }

-/* { dg-final { scan-tree-dump "Replaced \\\(int\\\) aa_.*with a_" "fre1" { xfail { riscv*-*-* && lp64 } } } } */
+/* { dg-final { scan-tree-dump "Replaced \\\(int\\\) aa_.*with a_" "fre1" } } */


Reviewed-by: Palmer Dabbelt 

Though Kito did all the test suite stuff back then, so not sure if he 
happens to remember anything specific about what was going on.


Thanks!


RE: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-12-06 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 28, 2023 5:56 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for
> Advanced SIMD
> 
> Richard Sandiford  writes:
> > Tamar Christina  writes:
> >> Hi All,
> >>
> >> This adds an implementation for conditional branch optab for AArch64.
> >>
> >> For e.g.
> >>
> >> void f1 ()
> >> {
> >>   for (int i = 0; i < N; i++)
> >> {
> >>   b[i] += a[i];
> >>   if (a[i] > 0)
> >>break;
> >> }
> >> }
> >>
> >> For 128-bit vectors we generate:
> >>
> >> cmgtv1.4s, v1.4s, #0
> >> umaxp   v1.4s, v1.4s, v1.4s
> >> fmovx3, d1
> >> cbnzx3, .L8
> >>
> >> and of 64-bit vector we can omit the compression:
> >>
> >> cmgtv1.2s, v1.2s, #0
> >> fmovx2, d1
> >> cbz x2, .L13
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-simd.md (cbranch4): New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> >> index
> 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd7
> 0a92924f62524c15 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -3830,6 +3830,46 @@ (define_expand
> "vcond_mask_"
> >>DONE;
> >>  })
> >>
> >> +;; Patterns comparing two vectors and conditionally jump
> >> +
> >> +(define_expand "cbranch4"
> >> +  [(set (pc)
> >> +(if_then_else
> >> +  (match_operator 0 "aarch64_equality_operator"
> >> +[(match_operand:VDQ_I 1 "register_operand")
> >> + (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> >> +  (label_ref (match_operand 3 ""))
> >> +  (pc)))]
> >> +  "TARGET_SIMD"
> >> +{
> >> +  auto code = GET_CODE (operands[0]);
> >> +  rtx tmp = operands[1];
> >> +
> >> +  /* If comparing against a non-zero vector we have to do a comparison 
> >> first
> >> + so we can have a != 0 comparison with the result.  */
> >> +  if (operands[2] != CONST0_RTX (mode))
> >> +emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
> >> +  operands[2]));
> >> +
> >> +  /* For 64-bit vectors we need no reductions.  */
> >> +  if (known_eq (128, GET_MODE_BITSIZE (mode)))
> >> +{
> >> +  /* Always reduce using a V4SI.  */
> >> +  rtx reduc = gen_lowpart (V4SImode, tmp);
> >> +  rtx res = gen_reg_rtx (V4SImode);
> >> +  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> >> +  emit_move_insn (tmp, gen_lowpart (mode, res));
> >> +}
> >> +
> >> +  rtx val = gen_reg_rtx (DImode);
> >> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
> >> +
> >> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> >> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> >> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> >> +  DONE;
> >
> > Are you sure this is correct for the operands[2] != const0_rtx case?
> > It looks like it uses the same comparison code for the vector comparison
> > and the scalar comparison.
> >
> > E.g. if the pattern is passed a comparison:
> >
> >   (eq (reg:V2SI x) (reg:V2SI y))
> >
> > it looks like we'd generate a CMEQ for the x and y, then branch
> > when the DImode bitcast of the CMEQ result equals zero.  This means
> > that we branch when no elements of x and y are equal, rather than
> > when all elements of x and y are equal.
> >
> > E.g. for:
> >
> >{ 1, 2 } == { 1, 2 }
> >
> > CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
> > and the branch won't be taken.
> >
> > ISTM it would be easier for the operands[2] != const0_rtx case to use
> > EOR instead of a comparison.  That gives a zero result if the input
> > vectors are equal and a nonzero result if the input vectors are
> > different.  We can then branch on the result using CODE and const0_rtx.
> >
> > (Hope I've got that right.)
> >
> > Maybe that also removes the need for patch 18.
> 
> Sorry, I forgot to say: we can't use operands[1] as a temporary,
> since it's only an input to the pattern.  The EOR destination would
> need to be a fresh register.

I've updated the patch but it doesn't help since cbranch doesn't really push
comparisons in.  So we don't seem to ever really get called with anything 
non-zero.

That said, I'm not entirely convinced that the == case is correct.  Since ==
means all bits equal instead of any bit set, it needs to generate cbz instead
of cbnz, and I'm not sure that's guaranteed.

I do have a failing testcase with this but haven'

Re: [PATCH] [arm] testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

2023-12-06 Thread Richard Earnshaw

Sorry, I only just spotted this while looking at something else.


On 23/05/2023 15:41, Christophe Lyon via Gcc-patches wrote:

Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the
'wrong' version:
   invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
   invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.

2023-05-23  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.
---
  .../mve_intrinsic_type_overloads-int.c| 28 ++-
  1 file changed, 15 insertions(+), 13 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
index 7947dc024bc..ab51cc8b323 100644
--- 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
+++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
@@ -47,14 +47,22 @@ foo2 (short * addr, int16x8_t value)
vst1q (addr, value);
  }
  
-void

-foo3 (int * addr, int32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } } */
-}
+/* Glibc defines int32_t as 'int' while newlib defines it as 'long int'.
+
+   Although these correspond to the same size, g++ complains when using the
+   'wrong' version:
+  invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
+
+  The trick below is to make this test pass whether using glibc-based or
+  newlib-based toolchains.  */
  
+#if defined(__GLIBC__)

+#define word_type int
+#else
+#define word_type long int
+#endif


GCC #defines __INT32_TYPE__ for this, which should be more reliable than 
trying to detect one specific library implementation.  Did you try that?



  void
-foo4 (long * addr, int32x4_t value)
+foo3 (word_type * addr, int32x4_t value)
  {
vst1q (addr, value);
  }
@@ -78,13 +86,7 @@ foo7 (unsigned short * addr, uint16x8_t value)
  }
  
  void

-foo8 (unsigned int * addr, uint32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } } */
-}
-
-void
-foo9 (unsigned long * addr, uint32x4_t value)
+foo8 (unsigned word_type * addr, uint32x4_t value)
  {
vst1q (addr, value);
  }


R.


[PATCH] libiberty/buildargv: POSIX behaviour for backslash handling

2023-12-06 Thread Andrew Burgess
GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.

I have recently been working to improve this area of GDB, and have
tracked down some of the unexpected behaviour to the libiberty
function buildargv and how it handles backslash escapes.

For reference, I've been mostly reading:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

The issues that I would like to fix are:

  1. Backslashes within single quotes should not be treated as an
  escape, thus: '\a' should split to \a, retaining the backslash.

  2. Backslashes within double quotes should only act as an escape if
  they are immediately before one of the characters $ (dollar),
  ` (backtick), " (double quote), \ (backslash), or \n (newline).  In
  all other cases a backslash should not be treated as an escape
  character.  Thus: "\a" should split to \a, but "\$" should split to
  $.

  3. A backslash-newline sequence should be treated as a line
  continuation, both the backslash and the newline should be removed.

I've updated libiberty and also added some tests.  All the existing
libiberty tests continue to pass, but I'm not sure if there is more
testing that should be done; buildargv is used within lto-wrapper.cc,
so maybe there's some testing folk can suggest that I run?
---
 libiberty/argv.c  |  8 +--
 libiberty/testsuite/test-expandargv.c | 34 +++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/libiberty/argv.c b/libiberty/argv.c
index c2823d3e4ba..6bae4ca2ee9 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -224,9 +224,13 @@ char **buildargv (const char *input)
  if (bsquote)
{
  bsquote = 0;
- *arg++ = *input;
+ if (*input != '\n')
+   *arg++ = *input;
}
- else if (*input == '\\')
+ else if (*input == '\\'
+  && !squote
+  && (!dquote
+  || strchr ("$`\"\\\n", *(input + 1)) != NULL))
{
  bsquote = 1;
}
diff --git a/libiberty/testsuite/test-expandargv.c 
b/libiberty/testsuite/test-expandargv.c
index 30f2337ef77..b8dcc6a269a 100644
--- a/libiberty/testsuite/test-expandargv.c
+++ b/libiberty/testsuite/test-expandargv.c
@@ -142,6 +142,40 @@ const char *test_data[] = {
   "b",
   0,
 
+  /* Test 7 - No backslash removal within single quotes.  */
+  "'a\\$VAR' '\\\"'",/* Test 7 data */
+  ARGV0,
+  "@test-expandargv-7.lst",
+  0,
+  ARGV0,
+  "a\\$VAR",
+  "\\\"",
+  0,
+
+  /* Test 8 - Remove backslash / newline pairs.  */
+  "\"ab\\\ncd\" ef\\\ngh",/* Test 8 data */
+  ARGV0,
+  "@test-expandargv-8.lst",
+  0,
+  ARGV0,
+  "abcd",
+  "efgh",
+  0,
+
+  /* Test 9 - Backslash within double quotes.  */
+  "\"\\$VAR\" \"\\`\" \"\\\"\" \"\" \"\\n\" \"\\t\"",/* Test 9 data */
+  ARGV0,
+  "@test-expandargv-9.lst",
+  0,
+  ARGV0,
+  "$VAR",
+  "`",
+  "\"",
+  "\\",
+  "\\n",
+  "\\t",
+  0,
+
   0 /* Test done marker, don't remove. */
 };
 

base-commit: 458e7c937924bbcef80eb006af0b61420dbfc1c1
-- 
2.25.4



[committed v4 0/3] libgomp: OpenMP low-latency omp_alloc

2023-12-06 Thread Andrew Stubbs
Thank you, Tobias, for approving the v3 patch series with minor changes.

https://patchwork.sourceware.org/project/gcc/list/?series=27815&state=%2A&archive=both

These patches are what I've actually committed.  Besides the requested
changes there were one or two bug fixes and minor tweaks, but otherwise
the patches are the same.

The series implements device-specific allocators and adds a low-latency
allocator for both GPU architectures.

Andrew Stubbs (3):
  libgomp, nvptx: low-latency memory allocator
  openmp, nvptx: low-lat memory access traits
  amdgcn, libgomp: low-latency allocator

 gcc/config/gcn/gcn-builtins.def   |   2 +
 gcc/config/gcn/gcn.cc |  16 +-
 libgomp/allocator.c   | 266 +++-
 libgomp/basic-allocator.c | 382 ++
 libgomp/config/gcn/allocator.c| 127 ++
 libgomp/config/gcn/libgomp-gcn.h  |   6 +
 libgomp/config/gcn/team.c |  12 +
 libgomp/config/nvptx/allocator.c  | 141 +++
 libgomp/config/nvptx/team.c   |  18 +
 libgomp/libgomp.h |   3 -
 libgomp/libgomp.texi  |  42 +-
 libgomp/plugin/plugin-gcn.c   |  35 +-
 libgomp/plugin/plugin-nvptx.c |  23 +-
 libgomp/testsuite/libgomp.c/omp_alloc-1.c |  66 +++
 libgomp/testsuite/libgomp.c/omp_alloc-2.c |  72 
 libgomp/testsuite/libgomp.c/omp_alloc-3.c |  49 +++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c | 200 +
 libgomp/testsuite/libgomp.c/omp_alloc-5.c |  71 
 libgomp/testsuite/libgomp.c/omp_alloc-6.c | 121 ++
 .../testsuite/libgomp.c/omp_alloc-traits.c|  66 +++
 20 files changed, 1603 insertions(+), 115 deletions(-)
 create mode 100644 libgomp/basic-allocator.c
 create mode 100644 libgomp/config/gcn/allocator.c
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-traits.c

-- 
2.41.0



[committed v4 2/3] openmp, nvptx: low-lat memory access traits

2023-12-06 Thread Andrew Stubbs

The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".  This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_init_allocator): Use MEMSPACE_VALIDATE.
(omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
(OMP_LOW_LAT_MEM_ALLOC_INVALID): New define.
* libgomp.texi: Document low-latency implementation details.
* testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-traits.c: New test.
---
 libgomp/allocator.c   | 20 ++
 libgomp/config/nvptx/allocator.c  | 21 ++
 libgomp/libgomp.texi  | 18 +
 libgomp/testsuite/libgomp.c/omp_alloc-1.c | 10 +++
 libgomp/testsuite/libgomp.c/omp_alloc-2.c |  8 +++
 libgomp/testsuite/libgomp.c/omp_alloc-3.c |  7 ++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c |  7 +-
 libgomp/testsuite/libgomp.c/omp_alloc-5.c |  8 +++
 libgomp/testsuite/libgomp.c/omp_alloc-6.c |  7 +-
 .../testsuite/libgomp.c/omp_alloc-traits.c| 66 +++
 10 files changed, 166 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-traits.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index fa398128368..a8a80f8028d 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -56,6 +56,10 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
 #endif
+#ifndef MEMSPACE_VALIDATE
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  (((void)(MEMSPACE), (void)(ACCESS), 1))
+#endif
 
 /* Map the predefined allocators to the correct memory space.
The index to this table is the omp_allocator_handle_t enum value.
@@ -439,6 +443,10 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
   if (data.pinned)
 return omp_null_allocator;
 
+  /* Reject unsupported memory spaces.  */
+  if (!MEMSPACE_VALIDATE (data.memspace, data.access))
+return omp_null_allocator;
+
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
   *ret = data;
 #ifndef HAVE_SYNC_BUILTINS
@@ -522,6 +530,10 @@ retry:
 new_size += new_alignment - sizeof (void *);
   if (__builtin_add_overflow (size, new_size, &new_size))
 goto fail;
+#ifdef OMP_LOW_LAT_MEM_ALLOC_INVALID
+  if (allocator == omp_low_lat_mem_alloc)
+goto fail;
+#endif
 
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
@@ -820,6 +832,10 @@ retry:
 goto fail;
   if (__builtin_add_overflow (size_temp, new_size, &new_size))
 goto fail;
+#ifdef OMP_LOW_LAT_MEM_ALLOC_INVALID
+  if (allocator == omp_low_lat_mem_alloc)
+goto fail;
+#endif
 
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
@@ -1054,6 +1070,10 @@ retry:
   if (__builtin_add_overflow (size, new_size, &new_size))
 goto fail;
   old_size = data->size;
+#ifdef OMP_LOW_LAT_MEM_ALLOC_INVALID
+  if (allocator == omp_low_lat_mem_alloc)
+goto fail;
+#endif
 
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6014fba177f..a3302411bcb 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -108,6 +108,21 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 return realloc (addr, size);
 }
 
+static inline int
+nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+#if __PTX_ISA_VERSION_MAJOR__ > 4 \
+|| (__PTX_ISA_VERSION_MAJOR__ == 4 && __PTX_ISA_VERSION_MINOR__ >= 1)
+  /* Disallow use of low-latency memory when it must be accessible by
+ all threads.  */
+  return (memspace != omp_low_lat_mem_space
+	  || access != omp_atv_all);
+#else
+  /* Low-latency memory is not available before PTX 4.1.  */
+  return (memspace != omp_low_lat_mem_space);
+#endif
+}
+
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
 #define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
@@ -116,5 +131,11 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
 #d
