[PATCH v2] i386: Refactor ssedoublemode

2024-07-05 Thread Hu, Lin1
I modified the changelog and comments.

ssedoublemode's double should mean double type, like SI -> DI.
And we need to refactor some patterns with <ssedoublevecmode> instead of
<ssedoublemode>.

BRs,
Lin

gcc/ChangeLog:

* config/i386/sse.md (ssedoublemode): Remove mappings to double
  of elements and mapping vector mode to the same number of
  double sized elements.
  (define_split for vec_concat_minus_plus): Change mode_attr from
  ssedoublemode to ssedoublevecmode.
  (define_split for vec_concat_plus_minus): Ditto.
  (avx512dq_shuf_64x2_1):
  Ditto.
  (avx512f_shuf_64x2_1): Ditto.
  (avx512vl_shuf_32x4_1): Ditto.
  (avx512f_shuf_32x4_1): Ditto.
---
 gcc/config/i386/sse.md | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d71b0f2567e..bda66d5e121 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -808,13 +808,12 @@ (define_mode_attr ssedoublemodelower
(V8HI "v8si")   (V16HI "v16si") (V32HI "v32si")
(V4SI "v4di")   (V8SI "v8di")   (V16SI "v16di")])
 
+;; Map vector mode to the same number of double sized elements.
 (define_mode_attr ssedoublemode
-  [(V4SF "V8SF") (V8SF "V16SF") (V16SF "V32SF")
-   (V2DF "V4DF") (V4DF "V8DF") (V8DF "V16DF")
+  [(V4SF "V4DF") (V8SF "V8DF") (V16SF "V16DF")
(V16QI "V16HI") (V32QI "V32HI") (V64QI "V64HI")
(V8HI "V8SI") (V16HI "V16SI") (V32HI "V32SI")
-   (V4SI "V4DI") (V8SI "V16SI") (V16SI "V32SI")
-   (V4DI "V8DI") (V8DI "V16DI")])
+   (V4SI "V4DI") (V8SI "V8DI") (V16SI "V16DI")])
 
 (define_mode_attr ssebytemode
   [(V8DI "V64QI") (V4DI "V32QI") (V2DI "V16QI")
@@ -3319,7 +3318,7 @@ (define_split
 (define_split
   [(set (match_operand:VF_128_256 0 "register_operand")
(match_operator:VF_128_256 7 "addsub_vs_operator"
- [(vec_concat:<ssedoublemode>
+ [(vec_concat:<ssedoublevecmode>
 (minus:VF_128_256
   (match_operand:VF_128_256 1 "register_operand")
   (match_operand:VF_128_256 2 "vector_operand"))
@@ -3353,7 +3352,7 @@ (define_split
 (define_split
   [(set (match_operand:VF_128_256 0 "register_operand")
(match_operator:VF_128_256 7 "addsub_vs_operator"
- [(vec_concat:<ssedoublemode>
+ [(vec_concat:<ssedoublevecmode>
 (plus:VF_128_256
   (match_operand:VF_128_256 1 "vector_operand")
   (match_operand:VF_128_256 2 "vector_operand"))
@@ -19869,7 +19868,7 @@ (define_expand "avx512dq_shuf_64x2_mask"
 (define_insn "avx512dq_shuf_64x2_1"
   [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
(vec_select:VI8F_256
- (vec_concat:<ssedoublemode>
+ (vec_concat:<ssedoublevecmode>
(match_operand:VI8F_256 1 "register_operand" "x,v")
(match_operand:VI8F_256 2 "nonimmediate_operand" "xjm,vm"))
  (parallel [(match_operand 3 "const_0_to_3_operand")
@@ -19922,7 +19921,7 @@ (define_expand "avx512f_shuf_64x2_mask"
 (define_insn "avx512f_shuf_64x2_1"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
(vec_select:V8FI
- (vec_concat:<ssedoublemode>
+ (vec_concat:<ssedoublevecmode>
(match_operand:V8FI 1 "register_operand" "v")
(match_operand:V8FI 2 "nonimmediate_operand" "vm"))
  (parallel [(match_operand 3 "const_0_to_7_operand")
@@ -20020,7 +20019,7 @@ (define_expand "avx512vl_shuf_32x4_mask"
 (define_insn "avx512vl_shuf_32x4_1"
   [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
(vec_select:VI4F_256
- (vec_concat:<ssedoublemode>
+ (vec_concat:<ssedoublevecmode>
(match_operand:VI4F_256 1 "register_operand" "x,v")
(match_operand:VI4F_256 2 "nonimmediate_operand" "xjm,vm"))
  (parallel [(match_operand 3 "const_0_to_7_operand")
@@ -20091,7 +20090,7 @@ (define_expand "avx512f_shuf_32x4_mask"
 (define_insn "avx512f_shuf_32x4_1"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
(vec_select:V16FI
- (vec_concat:<ssedoublemode>
+ (vec_concat:<ssedoublevecmode>
(match_operand:V16FI 1 "register_operand" "v")
(match_operand:V16FI 2 "nonimmediate_operand" "vm"))
  (parallel [(match_operand 3 "const_0_to_15_operand")
-- 
2.31.1



Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-05 Thread Sébastien Michelland

Hi Oleg!


I don't understand why this is being limited to SH3 and SH4 only?
Almost all SH4 systems out there have an FPU (unless special configurations
are used).  So I'd say if switching to soft-fp, then for SH-anything, not
just SH3/SH4.

If it yields some improvements for some users, I'm all for it.


Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
hardware I have. (My main platform also happens to be one of these SH4 
without an FPU, the SH4AL-DSP.)


Once this is tested/validated on simulator, I'll happily simplify the 
patch to apply to all SH.



I think it would make sense to test it using sh-sim on SH2 big-endian and
little endian at least, as that doesn't have an FPU and hence would run
tests utilizing soft-fp.

After building the toolchain for --target=sh-elf, you can use this to run
the testsuite in the simulator:

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"

(add make -j parameter according to your needs -- it will be slow)


Alright, it might take a little bit.

Building the combined tree of gcc/binutils/newlib masters (again 
following [1]) gives me an ICE in libstdc++v3/src/libbacktrace, 
irrespective of my libgcc change:


---
during RTL pass: final
elf.c: In function ‘elf_zstd_decompress’:
elf.c:4999:1: internal compiler error: in output_296, at 
config/sh/sh.md:8408

 4999 | }
  | ^
0x1c8765e internal_error(char const*, ...)
../../combined/gcc/diagnostic-global-context.cc:491
0x881269 fancy_abort(char const*, int, char const*)
../../combined/gcc/diagnostic.cc:1725
0x83b73b output_296
../../combined/gcc/config/sh/sh.md:8408
0x83b73b output_296
../../combined/gcc/config/sh/sh.md:8063
0xb783c2 final_scan_insn_1
../../combined/gcc/final.cc:2773
0xb78938 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
../../combined/gcc/final.cc:2886
0xb78b5f final_1
../../combined/gcc/final.cc:1977
0xb796a8 rest_of_handle_final
../../combined/gcc/final.cc:4239
0xb796a8 execute
../../combined/gcc/final.cc:4317
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).

Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[9]: *** [Makefile:628: std_stacktrace-elf.lo] Error 1
make[9]: *** Waiting for unfinished jobs
make[9]: Leaving directory 
'/home/el/Programs/sh-elf-gcc/build-combined2/sh-elf/m2a/libstdc++-v3/src/libbacktrace'

---

My configure, for reference (--disable-source-highlight came up from a 
configure error earlier):


../combined/configure   \
--prefix="$PREFIX"  \
--target="sh-elf"   \
--enable-languages="c,c++"  \
--disable-gdb   \
--disable-source-highlight

The libbacktrace build in gcc (make all-libbacktrace) works without an 
issue.


I'll have to prepare a bug report (I couldn't find anything related), 
but bisecting on a triplet of repos doesn't sound very fun and I believe 
I do need the newlib to build libstdc++ in a reproducible way.


Any advice before I go on that tangent?

Sébastien

[1] https://gcc.gnu.org/simtest-howto.html


[PATCH] middle-end: Add debug function to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan
Hi,

This adds a debug function to dump the dominator tree in dot/graphviz
format.  The idea is that the function can be called in GDB, the output
copy/pasted into a .dot file and then rendered using graphviz.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* dominance.cc (debug_dominance_tree_dot): New.
diff --git a/gcc/dominance.cc b/gcc/dominance.cc
index 0357210ed27..b536f193abc 100644
--- a/gcc/dominance.cc
+++ b/gcc/dominance.cc
@@ -1658,6 +1658,20 @@ debug_dominance_info (enum cdi_direction dir)
   fprintf (stderr, "%i %i\n", bb->index, bb2->index);
 }
 
+/* Print the dominance tree (in direction DIR) in dot form.  This allows easily
+   visualizing the tree using graphviz.  */
+
+DEBUG_FUNCTION void
+debug_dominance_tree_dot (enum cdi_direction dir)
+{
+  fprintf (stderr, "digraph {\n");
+  basic_block bb, idom;
+  FOR_EACH_BB_FN (bb, cfun)
+if ((idom = get_immediate_dominator (dir, bb)))
+  fprintf (stderr, "%i -> %i;\n", idom->index, bb->index);
+  fprintf (stderr, "}\n");
+}
+
 /* Prints to stderr representation of the dominance tree (for direction DIR)
rooted in ROOT, indented by INDENT tabulators.  If INDENT_FIRST is false,
the first line of the output is not indented.  */


[PATCH v3] MIPS: Output $0 instead of 0 for conditional trap if one operand is zero

2024-07-05 Thread YunQiang Su
We have done so for MIPSr6, which removes the support of conditional
trap with IMM.  To be consistent, let's do so for pre-R6.

We also add 2 new tests:
1) be sure that $0 is used.
2) be sure we expand the conditional trap compare with constant,
   instead of leaving it to GAS.

The reason we decided to do so for MIPSr6 is that we found a problem
with code like:
	.set noreorder
	.set nomacro
	teq $2,0
GAS expands it instead of converting it to `teq $2,$0`:
	li  $3,0
	teq $2,$3
This is wrong, as we asked for `nomacro`.

GCC works well with `teq $2,IMM`, if IMM is not zero.  To be
sure that it will always be so in the future, let's add a test for it.

gcc/ChangeLog:

* config/mips/mips.md (*conditional_trap): Output $0 instead of 0.

gcc/testsuite/ChangeLog:

* gcc.target/mips/trap-compare-0.c: New test to be sure that
$0 is used instead of immediate 0 for conditional trap.
* gcc.target/mips/trap-compare-imm-r6.c: New test to be sure
that we expand conditional trap compare with constant.
---
 gcc/config/mips/mips.md   |  2 +-
 .../gcc.target/mips/trap-compare-0.c  | 31 
 .../gcc.target/mips/trap-compare-imm-r6.c | 36 +++
 3 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/trap-compare-0.c
 create mode 100644 gcc/testsuite/gcc.target/mips/trap-compare-imm-r6.c

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index fd64d3d001a..591ae3cb438 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -1254,7 +1254,7 @@ (define_insn "*conditional_trap"
 (match_operand:GPR 2 "arith_operand" "dI")])
(const_int 0))]
   "ISA_HAS_COND_TRAPI"
-  "t%C0\t%z1,%2"
+  "t%C0\t%z1,%z2"
   [(set_attr "type" "trap")])
 
 ;;
diff --git a/gcc/testsuite/gcc.target/mips/trap-compare-0.c 
b/gcc/testsuite/gcc.target/mips/trap-compare-0.c
new file mode 100644
index 000..fb0078d34f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/trap-compare-0.c
@@ -0,0 +1,31 @@
+/* Check that we use $0 instead of 0 in conditional trap.  */
+/* { dg-do compile } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+NOMIPS16
+void teq0 (int i) {
+  if (i == 0)
+__builtin_trap();
+}
+
+NOMIPS16
+void tne0 (int i) {
+  if (i != 0)
+__builtin_trap();
+}
+
+NOMIPS16
+void tge0 (int i) {
+  if (i >= 0)
+__builtin_trap();
+}
+NOMIPS16
+void tlt0 (int i) {
+  if (i < 0)
+__builtin_trap();
+}
+
+/* { dg-final { scan-assembler "teq0:.*\tteq\t\\\$4,\\\$0" } } */
+/* { dg-final { scan-assembler "tne0:.*\ttne\t\\\$4,\\\$0" } } */
+/* { dg-final { scan-assembler "tge0:.*\ttge\t\\\$4,\\\$0" } } */
+/* { dg-final { scan-assembler "tlt0:.*\ttlt\t\\\$4,\\\$0" } } */
diff --git a/gcc/testsuite/gcc.target/mips/trap-compare-imm-r6.c 
b/gcc/testsuite/gcc.target/mips/trap-compare-imm-r6.c
new file mode 100644
index 000..b12e40672d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/trap-compare-imm-r6.c
@@ -0,0 +1,36 @@
+/* Check that no teq $2,imm macro is used for R6.  */
+/* { dg-do compile } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-options "isa_rev>=6" } */
+
+NOMIPS16
+void teq5 (int i) {
+  if (i == 5)
+__builtin_trap();
+}
+
+NOMIPS16
+void tne5 (int i) {
+  if (i != 5)
+__builtin_trap();
+}
+
+NOMIPS16
+void tge5 (int i) {
+  if (i >= 5)
+__builtin_trap();
+}
+NOMIPS16
+void tlt5 (int i) {
+  if (i < 5)
+__builtin_trap();
+}
+
+/* { dg-final { scan-assembler "teq5:.*\tli\t\\\$2,5.*\tteq\t\\\$4,\\\$2" } } 
*/
+/* { dg-final { scan-assembler-not "teq5:.*\tteq\t\\\$4,5" } } */
+/* { dg-final { scan-assembler "tne5:.*\tli\t\\\$2,5.*\ttne\t\\\$4,\\\$2" } } 
*/
+/* { dg-final { scan-assembler-not "tne5:.*\ttne\t\\\$4,5" } } */
+/* { dg-final { scan-assembler "tge5:.*\tli\t\\\$2,4.*\ttge\t\\\$2,\\\$4" } } 
*/
+/* { dg-final { scan-assembler-not "tge5:.*\ttge\t\\\$4,5" } } */
+/* { dg-final { scan-assembler "tlt5:.*\tli\t\\\$2,4.*\ttge\t\\\$2,\\\$4" } } 
*/
+/* { dg-final { scan-assembler-not "tlt5:.*\ttlt\t\\\$4,5" } } */
-- 
2.39.3 (Apple Git-146)



Re: [PATCH] middle-end: Add debug function to dump dominator tree in dot format

2024-07-05 Thread Richard Biener
On Fri, 5 Jul 2024, Alex Coplan wrote:

> Hi,
> 
> This adds a debug function to dump the dominator tree in dot/graphviz
> format.  The idea is that the function can be called in GDB, the output
> copy/pasted into a .dot file and then rendered using graphviz.
> 
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Can you follow other APIs here and rename and use a FILE * arg?

DEBUG_FUNCTION void
dot_dominance_tree (FILE *f, enum cdi_direction dir)
...

that way in gdb you can do

(gdb) p fopen ("/tmp/x.dot", "w")
(gdb) p dot_dominance_tree ($1, CDI_DOMINATORS);
(gdb) p fclose ($1);

and then dot the file?  It's also easier to use this from a
gdb python wrapper which can do the above as well.  In other
places there's then an overload with a const char *fname argument
doing the fopen/fclose itself.

Thanks,
Richard.

> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   * dominance.cc (debug_dominance_tree_dot): New.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] i386: Refactor ssedoublemode

2024-07-05 Thread Uros Bizjak
On Fri, Jul 5, 2024 at 9:07 AM Hu, Lin1  wrote:
>
> I Modified the changelog and comments.
>
> ssedoublemode's double should mean double type, like SI -> DI.
> And we need to refactor some patterns with  instead of
> .
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (ssedoublemode): Remove mappings to double
>   of elements and mapping vector mode to the same number of
>   double sized elements.

Better write: "Remove mappings to twice the number of same-sized
elements.  Add mappings to the same number of double-sized elements."

>   (define_split for vec_concat_minus_plus): Change mode_attr from
>   ssedoublemode to ssedoublevecmode.
>   (define_split for vec_concat_plus_minus): Ditto.
>   (avx512dq_shuf_64x2_1):
>   Ditto.
>   (avx512f_shuf_64x2_1): Ditto.
>   (avx512vl_shuf_32x4_1): Ditto.
>   (avx512f_shuf_32x4_1): Ditto.

OK with the above ChangeLog adjustment.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d71b0f2567e..bda66d5e121 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -808,13 +808,12 @@ (define_mode_attr ssedoublemodelower
> (V8HI "v8si")   (V16HI "v16si") (V32HI "v32si")
> (V4SI "v4di")   (V8SI "v8di")   (V16SI "v16di")])
>
> +;; Map vector mode to the same number of double sized elements.
>  (define_mode_attr ssedoublemode
> -  [(V4SF "V8SF") (V8SF "V16SF") (V16SF "V32SF")
> -   (V2DF "V4DF") (V4DF "V8DF") (V8DF "V16DF")
> +  [(V4SF "V4DF") (V8SF "V8DF") (V16SF "V16DF")
> (V16QI "V16HI") (V32QI "V32HI") (V64QI "V64HI")
> (V8HI "V8SI") (V16HI "V16SI") (V32HI "V32SI")
> -   (V4SI "V4DI") (V8SI "V16SI") (V16SI "V32SI")
> -   (V4DI "V8DI") (V8DI "V16DI")])
> +   (V4SI "V4DI") (V8SI "V8DI") (V16SI "V16DI")])
>
>  (define_mode_attr ssebytemode
>[(V8DI "V64QI") (V4DI "V32QI") (V2DI "V16QI")
> @@ -3319,7 +3318,7 @@ (define_split
>  (define_split
>[(set (match_operand:VF_128_256 0 "register_operand")
> (match_operator:VF_128_256 7 "addsub_vs_operator"
> - [(vec_concat:
> + [(vec_concat:
>  (minus:VF_128_256
>(match_operand:VF_128_256 1 "register_operand")
>(match_operand:VF_128_256 2 "vector_operand"))
> @@ -3353,7 +3352,7 @@ (define_split
>  (define_split
>[(set (match_operand:VF_128_256 0 "register_operand")
> (match_operator:VF_128_256 7 "addsub_vs_operator"
> - [(vec_concat:
> + [(vec_concat:
>  (plus:VF_128_256
>(match_operand:VF_128_256 1 "vector_operand")
>(match_operand:VF_128_256 2 "vector_operand"))
> @@ -19869,7 +19868,7 @@ (define_expand "avx512dq_shuf_64x2_mask"
>  (define_insn "avx512dq_shuf_64x2_1"
>[(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
> (vec_select:VI8F_256
> - (vec_concat:
> + (vec_concat:
> (match_operand:VI8F_256 1 "register_operand" "x,v")
> (match_operand:VI8F_256 2 "nonimmediate_operand" "xjm,vm"))
>   (parallel [(match_operand 3 "const_0_to_3_operand")
> @@ -19922,7 +19921,7 @@ (define_expand "avx512f_shuf_64x2_mask"
>  (define_insn "avx512f_shuf_64x2_1"
>[(set (match_operand:V8FI 0 "register_operand" "=v")
> (vec_select:V8FI
> - (vec_concat:
> + (vec_concat:
> (match_operand:V8FI 1 "register_operand" "v")
> (match_operand:V8FI 2 "nonimmediate_operand" "vm"))
>   (parallel [(match_operand 3 "const_0_to_7_operand")
> @@ -20020,7 +20019,7 @@ (define_expand "avx512vl_shuf_32x4_mask"
>  (define_insn "avx512vl_shuf_32x4_1"
>[(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
> (vec_select:VI4F_256
> - (vec_concat:
> + (vec_concat:
> (match_operand:VI4F_256 1 "register_operand" "x,v")
> (match_operand:VI4F_256 2 "nonimmediate_operand" "xjm,vm"))
>   (parallel [(match_operand 3 "const_0_to_7_operand")
> @@ -20091,7 +20090,7 @@ (define_expand "avx512f_shuf_32x4_mask"
>  (define_insn "avx512f_shuf_32x4_1"
>[(set (match_operand:V16FI 0 "register_operand" "=v")
> (vec_select:V16FI
> - (vec_concat:
> + (vec_concat:
> (match_operand:V16FI 1 "register_operand" "v")
> (match_operand:V16FI 2 "nonimmediate_operand" "vm"))
>   (parallel [(match_operand 3 "const_0_to_15_operand")
> --
> 2.31.1
>


Re: [PATCH] middle-end: Add debug function to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan
On 05/07/2024 09:59, Richard Biener wrote:
> On Fri, 5 Jul 2024, Alex Coplan wrote:
> 
> > Hi,
> > 
> > This adds a debug function to dump the dominator tree in dot/graphviz
> > format.  The idea is that the function can be called in GDB, the output
> > copy/pasted into a .dot file and then rendered using graphviz.
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> 
> Can you follow other APIs here and rename and use a FILE * arg?
> 
> DEBUG_FUNCTION void
> dot_dominance_tree (FILE *f, enum cdi_direction dir)
> ...
> 
> that way in gdb you can do
> 
> (gdb) p fopen ("/tmp/x.dot", "w")
> (gdb) p dot_dominance_tree ($1, CDI_DOMINATORS);
> (gdb) p fclose ($1);
> 
> and then dot the file?  It's also easier to use this from a
> gdb python wrapper which can do the above as well.  In other
> places there's then an overload with a const char *fname argument
> doing the fopen/fclose itself.

Yes, that sounds much better (it would certainly be more useable in
bigger functions that way).  I'll give that a go (including the
convenience overload), thanks.

Alex

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > * dominance.cc (debug_dominance_tree_dot): New.
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH 1/2]AArch64: make aarch64_simd_vec_unpack_lo_/_hi_ consistent.

2024-07-05 Thread Tamar Christina
> > The principle is that, say:
> >
> >   (vec_select:V2SI (reg:V2DI R) (parallel [(const_int 0) (const_int 1)]))
> >
> > is (for little-endian) equivalent to:
> >
> >   (subreg:V2SI (reg:V2DI R) 0)
> 
> Sigh, of course I meant V4SI rather than V2DI in the above :)
> 
> > and similarly for the equivalent big-endian pair.  Simplification rules
> > are now supposed to ensure that only the second (simpler) form is generated
> > by target-independent code.  We should fix any cases where that doesn't
> > happen, since it would be a missed optimisation for any instructions
> > that take (in this case) V2SI inputs.
> >
> > There's no equivalent simplification for _hi because it isn't possible
> > to refer directly to the upper 64 bits of a 128-bit register using subregs.
> >

This removes aarch64_simd_vec_unpack_lo_.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_simd_vec_unpack_lo_): Remove.
(vec_unpack_lo_topbits_shuffle_be"
 
 ;; Widening operations.
 
-(define_insn "aarch64_simd_vec_unpack_lo_"
-  [(set (match_operand: 0 "register_operand" "=w")
-(ANY_EXTEND: (vec_select:
-  (match_operand:VQW 1 "register_operand" "w")
-  (match_operand:VQW 2 "vect_par_cnst_lo_half" "")
-   )))]
-  "TARGET_SIMD"
-  "xtl\t%0., %1."
-  [(set_attr "type" "neon_shift_imm_long")]
-)
-
 (define_insn_and_split "aarch64_simd_vec_unpack_hi_"
   [(set (match_operand: 0 "register_operand" "=w")
 (ANY_EXTEND: (vec_select:
@@ -1952,14 +1941,11 @@ (define_expand "vec_unpack_hi_"
 )
 
 (define_expand "vec_unpack_lo_"
-  [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
+  [(set (match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand")))]
   "TARGET_SIMD"
   {
-rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-emit_insn (gen_aarch64_simd_vec_unpack_lo_ (operands[0],
- operands[1], p));
-DONE;
+operands[1] = lowpart_subreg (mode, operands[1], mode);
   }
 )
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
6b106a72e49f11b8128485baceaaaddcbf139863..469eb938953a70bc6b0ce3d4aa16f773e40ee03e
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23188,7 +23188,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
to split without that restriction and instead recombine shared zeros
if they turn out not to be worthwhile.  This would allow splits in
single-block functions and would also cope more naturally with
-   rematerialization.  */
+   rematerialization.  The downside of not doing this is that we lose the
+   optimizations for vector epilogues as well.  */
 
 bool
 aarch64_split_simd_shift_p (rtx_insn *insn)


rb18605.patch
Description: rb18605.patch


Re: [PATCH] Build/Cross: Look for target headers from include if sys-include doesn't exist

2024-07-05 Thread YunQiang Su
Ping again.


RE: [PATCH 2/2]AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.

2024-07-05 Thread Tamar Christina
> > +v16qi f3b (v16qi a)
> > +{
> > +  v16qi zeros = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
> > +  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6, 2, 7, 3, 8, 4, 9, 
> > 5, 10, 6, 11,
> 7, 12);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {tbl\tv[0-9]+.16b, \{v[0-9]+.16b\}, 
> > v[0-
> 9]+.16b} 5 } } */
> 
> It'd be good to test with zeros as the first argument too.
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
and zero_op1_p.
(aarch64_evpc_tbl): Implement register value remapping.
(aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
before it's forced to a reg.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/tbl_with_zero_1.c: New test.
* gcc.target/aarch64/tbl_with_zero_2.c: New test.

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
469eb938953a70bc6b0ce3d4aa16f773e40ee03e..2d596c19a31a09b4ccbc957d42dce91e453a0dec
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25413,6 +25413,7 @@ struct expand_vec_perm_d
   unsigned int vec_flags;
   unsigned int op_vec_flags;
   bool one_vector_p;
+  bool zero_op0_p, zero_op1_p;
   bool testing_p;
 };
 
@@ -25909,13 +25910,38 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
   /* to_constant is safe since this routine is specific to Advanced SIMD
  vectors.  */
   unsigned int nelt = d->perm.length ().to_constant ();
+
+  /* If one register is the constant vector of 0 then we only need
+ a one reg TBL and we map any accesses to the vector of 0 to -1.  We can't
+ do this earlier since vec_perm_indices clamps elements to within range so
+ we can only do it during codegen.  */
+  if (d->zero_op0_p)
+d->op0 = d->op1;
+  else if (d->zero_op1_p)
+d->op1 = d->op0;
+
   for (unsigned int i = 0; i < nelt; ++i)
-/* If big-endian and two vectors we end up with a weird mixed-endian
-   mode on NEON.  Reverse the index within each word but not the word
-   itself.  to_constant is safe because we checked is_constant above.  */
-rperm[i] = GEN_INT (BYTES_BIG_ENDIAN
-   ? d->perm[i].to_constant () ^ (nelt - 1)
-   : d->perm[i].to_constant ());
+{
+  auto val = d->perm[i].to_constant ();
+
+  /* If we're selecting from a 0 vector, we can just use an out of range
+index instead.  */
+  if ((d->zero_op0_p && val < nelt) || (d->zero_op1_p && val >= nelt))
+   rperm[i] = constm1_rtx;
+  else
+   {
+ /* If we are remapping a zero register as the first parameter we need
+to adjust the indices of the non-zero register.  */
+ if (d->zero_op0_p)
+   val = val % nelt;
+
+ /* If big-endian and two vectors we end up with a weird mixed-endian
+mode on NEON.  Reverse the index within each word but not the word
+itself.  to_constant is safe because we checked is_constant
+above.  */
+ rperm[i] = GEN_INT (BYTES_BIG_ENDIAN ? val ^ (nelt - 1) : val);
+   }
+}
 
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);
@@ -26161,6 +26187,7 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
  const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
+  d.zero_op0_p = d.zero_op1_p = false;
 
   /* Check whether the mask can be applied to a single vector.  */
   if (sel.ninputs () == 1
@@ -26179,6 +26206,8 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
   else
 d.one_vector_p = false;
 
+  d.zero_op0_p = op0 == CONST0_RTX (op_mode);
+  d.zero_op1_p = op1 == CONST0_RTX (op_mode);
   d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
 sel.nelts_per_input ());
   d.vmode = vmode;
diff --git a/gcc/testsuite/gcc.target/aarch64/tbl_with_zero_1.c 
b/gcc/testsuite/gcc.target/aarch64/tbl_with_zero_1.c
new file mode 100644
index 
..5595127f3302164b1eb06be50d5c37d41095eb06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tbl_with_zero_1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O1" } */
+
+typedef unsigned int v4si __attribute__ ((vector_size (16)));
+
+v4si f1 (v4si a)
+{
+  v4si zeros = {0,0,0,0};
+  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6);
+}
+
+typedef unsigned short v8hi __attribute__ ((vector_size (16)));
+
+v8hi f2a (v8hi a)
+{
+  v8hi zeros = {0,0,0,0,0,0,0,0};
+  return __builtin_shufflevector (a, zeros, 0, 9, 1, 10, 2, 11, 3, 12);
+}
+
+v8hi f2b (v8hi a)
+{
+  v8hi zeros = {0,0,0,0,0,0,0,0};
+  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6, 2, 7, 3, 8);
+}
+
+typedef unsigned char v16qi __attribute__ ((vector_size (16)

Re: [PATCH 1/2]AArch64: make aarch64_simd_vec_unpack_lo_/_hi_ consistent.

2024-07-05 Thread Richard Sandiford
Tamar Christina  writes:
>> > The principle is that, say:
>> >
>> >   (vec_select:V2SI (reg:V2DI R) (parallel [(const_int 0) (const_int 1)]))
>> >
>> > is (for little-endian) equivalent to:
>> >
>> >   (subreg:V2SI (reg:V2DI R) 0)
>> 
>> Sigh, of course I meant V4SI rather than V2DI in the above :)
>> 
>> > and similarly for the equivalent big-endian pair.  Simplification rules
>> > are now supposed to ensure that only the second (simpler) form is generated
>> > by target-independent code.  We should fix any cases where that doesn't
>> > happen, since it would be a missed optimisation for any instructions
>> > that take (in this case) V2SI inputs.
>> >
>> > There's no equivalent simplification for _hi because it isn't possible
>> > to refer directly to the upper 64 bits of a 128-bit register using subregs.
>> >
>
> This removes aarch64_simd_vec_unpack_lo_.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md
>   (aarch64_simd_vec_unpack_lo_): Remove.
>   (vec_unpack_lo_   * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
>   comment.

OK, thanks.

Richard

> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> fd10039f9a27d0da51624d6d3a6d8b2a532f5625..bbeee221f37c4875056cdf52932a787f8ac1c2aa
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1904,17 +1904,6 @@ (define_insn 
> "*aarch64_topbits_shuffle_be"
>  
>  ;; Widening operations.
>  
> -(define_insn "aarch64_simd_vec_unpack_lo_"
> -  [(set (match_operand: 0 "register_operand" "=w")
> -(ANY_EXTEND: (vec_select:
> -(match_operand:VQW 1 "register_operand" "w")
> -(match_operand:VQW 2 "vect_par_cnst_lo_half" "")
> - )))]
> -  "TARGET_SIMD"
> -  "xtl\t%0., %1."
> -  [(set_attr "type" "neon_shift_imm_long")]
> -)
> -
>  (define_insn_and_split "aarch64_simd_vec_unpack_hi_"
>[(set (match_operand: 0 "register_operand" "=w")
>  (ANY_EXTEND: (vec_select:
> @@ -1952,14 +1941,11 @@ (define_expand "vec_unpack_hi_"
>  )
>  
>  (define_expand "vec_unpack_lo_"
> -  [(match_operand: 0 "register_operand")
> -   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
> +  [(set (match_operand: 0 "register_operand")
> + (ANY_EXTEND: (match_operand:VQW 1 "register_operand")))]
>"TARGET_SIMD"
>{
> -rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> -emit_insn (gen_aarch64_simd_vec_unpack_lo_ (operands[0],
> -   operands[1], p));
> -DONE;
> +operands[1] = lowpart_subreg (mode, operands[1], mode);
>}
>  )
>  
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 6b106a72e49f11b8128485baceaaaddcbf139863..469eb938953a70bc6b0ce3d4aa16f773e40ee03e
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23188,7 +23188,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
> to split without that restriction and instead recombine shared zeros
> if they turn out not to be worthwhile.  This would allow splits in
> single-block functions and would also cope more naturally with
> -   rematerialization.  */
> +   rematerialization.  The downside of not doing this is that we lose the
> +   optimizations for vector epilogues as well.  */
>  
>  bool
>  aarch64_split_simd_shift_p (rtx_insn *insn)


[PATCH] x86, Darwin: Fix bootstrap for 32b multilibs/hosts.

2024-07-05 Thread Iain Sandoe
This is Darwin-local, and I would like to apply it today to restore
bootstrap before my weekend test-runs, but would welcome any comments
or suggestions.

thanks
Iain

--- 8< ---

r15-1735-ge62ea4fb8ffcab06ddd  contained changes that altered the
codegen for 32b Darwin (whether hosted on a 64b or a 32b host) such
that the per-function picbase load is emitted multiple times in some
cases.  Darwin's back end is not expecting this (and indeed some of
the handling depends on a single instance).

This fixes the issue by marking those instructions as not copyable
(as suggested by Andrew Pinski).

The change is Darwin-specific.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_cannot_copy_insn_p): New.
(TARGET_CANNOT_COPY_INSN_P): New.

Signed-off-by: Iain Sandoe 
---
 gcc/config/i386/i386.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 99def8d4a77..f75250f79de 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -27025,6 +27025,29 @@ ix86_libm_function_max_error (unsigned cfn, 
machine_mode mode,
 #undef TARGET_LIBM_FUNCTION_MAX_ERROR
 #define TARGET_LIBM_FUNCTION_MAX_ERROR ix86_libm_function_max_error
 
+#if TARGET_MACHO
+static bool
+ix86_cannot_copy_insn_p (rtx_insn *insn)
+{
+  if (TARGET_64BIT)
+return false;
+
+  rtx set = single_set (insn);
+  if (set)
+{
+  rtx src = SET_SRC (set);
+  if (GET_CODE (src) == UNSPEC
+ && XINT (src, 1) == UNSPEC_SET_GOT)
+   return true;
+}
+  return false;
+}
+
+#undef TARGET_CANNOT_COPY_INSN_P
+#define TARGET_CANNOT_COPY_INSN_P ix86_cannot_copy_insn_p
+
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
-- 
2.39.2 (Apple Git-143)



Re: [PATCH 2/2]AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.

2024-07-05 Thread Richard Sandiford
Tamar Christina  writes:
>> > +v16qi f3b (v16qi a)
>> > +{
>> > +  v16qi zeros = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
>> > +  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6, 2, 7, 3, 8, 4, 9, 
>> > 5, 10, 6, 11,
>> 7, 12);
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler-times {tbl\tv[0-9]+.16b, \{v[0-9]+.16b\}, 
>> > v[0-
>> 9]+.16b} 5 } } */
>> 
>> It'd be good to test with zeros as the first argument too.
>> 
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
>   and zero_op_p1.
>   (aarch64_evpc_tbl): Implement register value remapping.
>   (aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
>   before it's forced to a reg.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/tbl_with_zero_1.c: New test.
>   * gcc.target/aarch64/tbl_with_zero_2.c: New test.
>
> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 469eb938953a70bc6b0ce3d4aa16f773e40ee03e..2d596c19a31a09b4ccbc957d42dce91e453a0dec
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25413,6 +25413,7 @@ struct expand_vec_perm_d
>unsigned int vec_flags;
>unsigned int op_vec_flags;
>bool one_vector_p;
> +  bool zero_op0_p, zero_op1_p;
>bool testing_p;
>  };
>  
> @@ -25909,13 +25910,38 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
>/* to_constant is safe since this routine is specific to Advanced SIMD
>   vectors.  */
>unsigned int nelt = d->perm.length ().to_constant ();
> +
> +  /* If one register is the constant vector of 0 then we only need
> + a one reg TBL and we map any accesses to the vector of 0 to -1.  We 
> can't
> + do this earlier since vec_perm_indices clamps elements to within range 
> so
> + we can only do it during codegen.  */
> +  if (d->zero_op0_p)
> +d->op0 = d->op1;
> +  else if (d->zero_op1_p)
> +d->op1 = d->op0;
> +
>for (unsigned int i = 0; i < nelt; ++i)
> -/* If big-endian and two vectors we end up with a weird mixed-endian
> -   mode on NEON.  Reverse the index within each word but not the word
> -   itself.  to_constant is safe because we checked is_constant above.  */
> -rperm[i] = GEN_INT (BYTES_BIG_ENDIAN
> - ? d->perm[i].to_constant () ^ (nelt - 1)
> - : d->perm[i].to_constant ());
> +{
> +  auto val = d->perm[i].to_constant ();
> +
> +  /* If we're selecting from a 0 vector, we can just use an out of range
> +  index instead.  */
> +  if ((d->zero_op0_p && val < nelt) || (d->zero_op1_p && val >= nelt))
> + rperm[i] = constm1_rtx;
> +  else
> + {
> +   /* If we are remapping a zero register as the first parameter we need
> +  to adjust the indices of the non-zero register.  */
> +   if (d->zero_op0_p)
> + val = val % nelt;
> +
> +   /* If big-endian and two vectors we end up with a weird mixed-endian
> +  mode on NEON.  Reverse the index within each word but not the word
> +  itself.  to_constant is safe because we checked is_constant
> +  above.  */
> +   rperm[i] = GEN_INT (BYTES_BIG_ENDIAN ? val ^ (nelt - 1) : val);
> + }
> +}
>  
>sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
>sel = force_reg (vmode, sel);
> @@ -26161,6 +26187,7 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, 
> machine_mode op_mode,
> const vec_perm_indices &sel)
>  {
>struct expand_vec_perm_d d;
> +  d.zero_op0_p = d.zero_op1_p = false;

This is redundant with the assignments below.  OK with that line removed.

Not sure whether the test will work on Darwin & mingw, but we can restrict
to ELF targets later if necessary.

Thanks,
Richard

>  
>/* Check whether the mask can be applied to a single vector.  */
>if (sel.ninputs () == 1
> @@ -26179,6 +26206,8 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, 
> machine_mode op_mode,
>else
>  d.one_vector_p = false;
>  
> +  d.zero_op0_p = op0 == CONST0_RTX (op_mode);
> +  d.zero_op1_p = op1 == CONST0_RTX (op_mode);
>d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
>sel.nelts_per_input ());
>d.vmode = vmode;
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbl_with_zero_1.c 
> b/gcc/testsuite/gcc.target/aarch64/tbl_with_zero_1.c
> new file mode 100644
> index 
> ..5595127f3302164b1eb06be50d5c37d41095eb06
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbl_with_zero_1.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O1" } */
> +
> +typedef unsigned int v4si __attribute__ ((vector_size (16)));
> +
> +v4si f1 (v4si a)
> +{
> +  v4si zeros = {0,0,0,0};
> +  return __builtin_shufflevec

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-07-05 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Biener  writes:
>> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
>>  wrote:
>>>
>>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>>>  wrote:
>>> >
>>> >
>>> >
>>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford 
>>> > > :
>>> > >
>>> > > Richard Biener  writes:
>>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>>> > >>>  wrote:
>>> > >>>
>>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt  
>>> > >>> wrote:
>>> > 
>>> >  for the testcase in the PR115406, here is part of the dump.
>>> > 
>>> >   char D.4882;
>>> >   vector(1)  _1;
>>> >   vector(1) signed char _2;
>>> >   char _5;
>>> > 
>>> >    :
>>> >   _1 = { -1 };
>>> > 
>>> >  When assign { -1 } to vector(1} {signed-boolean:8},
>>> >  Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of 
>>> >  dest
>>> >  with each vector elemnet. But i think the bit setting should only 
>>> >  apply for
>>> >  TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>>> >  , it will be assigned as -1, instead of 1.
>>> >  Is there any specific reason vector(1)  is handled
>>> >  differently from vector<1> ?
>>> > 
>>> >  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>> >  Ok for trunk?
>>> > >>>
>>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>>> > >>> code should work for 8 bit
>>> > >>> entities as well, it seems we only set the LSB of each element in the
>>> > >>> "mask".  ISTR that SVE
>>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>>> > >>> maybe that's why
>>> > >>> <= BITS_PER_UNIT.
>>> > >
>>> > > Yeah.
>>> >
>>> > So is it necessary that only one bit is set for SVE?
>
> TBH I can't remember now.  It matches what SVE instructions produce, and
> lines up with the associated RTL code (which at the time was SVE-specific).
> But when dealing with multibyte elements, upper predicate element bits
> are ignored on read, so matching the instructions might not matter.
>
>>> > >>> So maybe instead of just setting one bit in
>>> > >>>
>>> > >>>  ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>> > >>>
>>> > >>> we should set elt_bits bits, aka (without testing)
>>> > >>>
>>> > >>>  ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>>> > >>> % BITS_PER_UNIT);
>>> > >>>
>>> > >>> ?
>>> > >>
>>> > >> Alternatively
>>> > >>
>>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>>> > >>  && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>>> > >>
>>> > >> should be amended with
>>> > >>
>>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>>> > >
>>> > > How about:
>>> > >
>>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>>> > >{
>>> > >  gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>>> > >
>>> > > ?
>>> >
>>> > Note the path is also necessary for avx512 and gcn mask modes which are 
>>> > integer modes.
>>> >
>>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
>>> > > would be stable.)
>>> >
>>> > That’s a good question and also related to GCC vector extension which can 
>>> > result in both BLKmode and integer modes to be used.  But I’m not sure 
>>> > how we expose masks to the middle end here.  A too large vector bool 
>>> > could be lowered to AVX512 mode.  Maybe we should simply reject 
>>> > interpret/encode of BLKmode vectors and make sure to never assign integer 
>>> > modes to vector bools (if the target didn’t specify that mode)?
>>> >
>>> > I guess some test coverage would be nice here.
>>>
>>> To continue on that, we do not currently have a way to capture a
>>> vector comparison output
>>> but the C++ frontend has vector ?:
>>>
>>> typedef int v8si __attribute__((vector_size(32)));
>>>
>>> void foo (v8si *a, v8si *b, v8si *c)
>>> {
>>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>>> }
>>>
>>> with SSE2 we get a  temporary, with AVX512 enabled
>>> that becomes .  When we enlarge the vector to size 128
>>> then even with AVX512 enabled I see  here and
>>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>>> just a missed optimization).  But note that to decompose this into two
>>> AVX512 vectors the temporary would have to change from 
>>> elements to .
>>>
>>> The not supported vector bool types have BLKmode sofar.
>>>
>>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>>
>>> typedef short v2hi __attribute__((vector_size(4)));
>>>
>>> void foo (v2hi *a, v2hi *b, v2hi *c)
>>> {
>>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>>> }
>>>
>>> we get a SImode vector  as I feared.  That means
>>>  (the BITS_PER_UNIT case) can be ambiguous
>>> between SVE (bool for a 8byte data vector) and emulated vectors
>>> ("word-mode" vectors; for 1byte d

[PATCH] [RFC] load and store-lanes with SLP

2024-07-05 Thread Richard Biener
The following is a prototype for how to represent load/store-lanes
within SLP.  I've for now settled with having a single load node
with multiple permute nodes, one for each loaded lane and a single
store node plus a single permute node feeding it.  For

  for (int i = 0; i < 1024; ++i)
{
  a[2*i] = b[2*i] + 7;
  a[2*i+1] = b[2*i+1] * 3;
}

you have the following SLP graph where I explain how things are set
up and code-generated:

t.c:3:21: note:   SLP graph after lowering permutations:
t.c:3:21: note:   node 0x578af60 (max_nunits=1, refcnt=1) vector([4,4]) int
t.c:3:21: note:   op template: *_6 = _7;
t.c:3:21: note: stmt 0 *_6 = _7;
t.c:3:21: note: stmt 1 *_12 = _13;
t.c:3:21: note: children 0x578aff8

This is the store node, it's marked with ldst_lanes = true during
SLP discovery.  This node code-generates

  .MASK_LEN_STORE_LANES (vectp_a.10_30, 32B, { -1, ... }, loop_len_38, 0, 
vect_array.9);

t.c:3:21: note:   node 0x578aff8 (max_nunits=1, refcnt=1) vector([4,4]) int
t.c:3:21: note:   op: VEC_PERM_EXPR
t.c:3:21: note: { }
t.c:3:21: note: lane permutation { 0[0] 1[0] }
t.c:3:21: note: children 0x578ab38 0x578ad98

This is the store-lane shuffling node, it's also marked with ldst_lanes
= true during SLP discovery.  This node code-generates

  vect_array.9[0] = vect__7.7_35;
  vect_array.9[1] = vect__13.8_33;

it records virtual defs as vectorized defs.  This is a bit awkward,
on the store side it would be nicer to elide 'vect_array' and instead
make .MASK_LEN_STORE_LANES "varargs", having the vector defs as
arguments.

t.c:3:21: note:   node 0x578ab38 (max_nunits=4, refcnt=2) vector([4,4]) int
t.c:3:21: note:   op template: _7 = _5 + 7;
t.c:3:21: note: stmt 0 _7 = _5 + 7;
t.c:3:21: note: children 0x578abd0 0x578ac68
t.c:3:21: note:   node (constant) 0x578ac68 (max_nunits=1, refcnt=1)
t.c:3:21: note: { 7 }
t.c:3:21: note:   node 0x578b978 (max_nunits=2, refcnt=2) vector(2) int
t.c:3:21: note:   op template: _13 = _11 * 3;
t.c:3:21: note: stmt 0 _13 = _11 * 3;
t.c:3:21: note: children 0x578b4b8 0x578b8e0
t.c:3:21: note:   node (constant) 0x578b8e0 (max_nunits=1, refcnt=1)
t.c:3:21: note: { 3 }
t.c:3:21: note:   node 0x578abd0 (max_nunits=4, refcnt=2) vector([4,4]) int
t.c:3:21: note:   op: VEC_PERM_EXPR
t.c:3:21: note: stmt 0 _5 = *_4;
t.c:3:21: note: lane permutation { 0[0] }
t.c:3:21: note: children 0x578b090
t.c:3:21: note:   node 0x578b4b8 (max_nunits=2, refcnt=2) vector(2) int
t.c:3:21: note:   op: VEC_PERM_EXPR
t.c:3:21: note: stmt 0 _11 = *_10;
t.c:3:21: note: lane permutation { 0[1] }
t.c:3:21: note: children 0x578bbd8

The above two are the load-lane shufflings, they are marked with
ldst_lanes during load permutation lowering.  They code generate

  _34 = vect_array.6[1];
  _36 = vect_array.6[0];

they pick up the vect_array.6 by looking at the virtual operand def
in the children node.  As with the store side this is awkward but
much harder to avoid since we lack multiple SSA defs support.  What
could be done though is to write vect_array.6 into SSA and use
BIT_FIELD_REF for the extracts?  (yes, that would require ARRAY_TYPE
gimple registers)

t.c:3:21: note:   node 0x578b090 (max_nunits=4, refcnt=3) vector([4,4]) int
t.c:3:21: note:   op template: _5 = *_4;
t.c:3:21: note: stmt 0 _5 = *_4;
t.c:3:21: note: stmt 1 _11 = *_10;
t.c:3:21: note: load permutation { 0 1 }

This is the load itself, marked ldst_lanes during load permutation
lowering.  It code generates

  vect_array.6 = .MASK_LEN_LOAD_LANES (vectp_b.4_40, 32B, { -1, ... }, 
loop_len_38, 0);

The patch below (tested with just the above testcase and on riscv)
runs into the following where I have not figured out what's wrong.

t.c:1:6: internal compiler error: Segmentation fault
1 | void foo (int * __restrict a, int *b)
  |  ^~~
0x3ea25f5 internal_error(char const*, ...)
/home/rguenther/src/gcc/gcc/diagnostic-global-context.cc:491
0x1ac12b6 crash_signal
/home/rguenther/src/gcc/gcc/toplev.cc:319
0x10af391 tree_check(tree_node const*, char const*, int, char const*, tree_code)
/home/rguenther/src/gcc/gcc/tree.h:3919
0x1140a0a TYPE_VECTOR_SUBPARTS(tree_node const*)
/home/rguenther/src/gcc/gcc/tree.h:4254
0x1e7ab36 vect_set_loop_controls_directly
/home/rguenther/src/gcc/gcc/tree-vect-loop-manip.cc:514

The patch gets ncopies wrong (I'm sure of that), and the way I create
virtual operands is surely going to break things.  It likely also
gets the support for doing load lanes for multi-lane SLP sub-groups
wrong.

The way this SLP representation forces us to separate IFN and array
access code generation to be separated introduces problems not present
with non-SLP.  Keeping the operation in one node would be managable
on the store side by recording an incoming lane permutation for it.
On the load side we could code gen

[PATCH v2] middle-end: Add debug functions to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan
This is a v2 patch which implements richi's feedback.

OK if it survives bootstrap on aarch64?

Thanks,
Alex

-- >8 --

This adds debug functions to dump the dominator tree in dot format.
There are two overloads: one which takes a FILE * and another which
takes a const char *fname and wraps the first with fopen/fclose for
convenience.

gcc/ChangeLog:

* dominance.cc (dot_dominance_tree): New.
diff --git a/gcc/dominance.cc b/gcc/dominance.cc
index 0357210ed27..c14d997ded7 100644
--- a/gcc/dominance.cc
+++ b/gcc/dominance.cc
@@ -1658,6 +1658,36 @@ debug_dominance_info (enum cdi_direction dir)
   fprintf (stderr, "%i %i\n", bb->index, bb2->index);
 }
 
+/* Dump the dominance tree in direction DIR to the file F in dot form.
+   This allows easily visualizing the tree using graphviz.  */
+
+DEBUG_FUNCTION void
+dot_dominance_tree (FILE *f, enum cdi_direction dir)
+{
+  fprintf (f, "digraph {\n");
+  basic_block bb, idom;
+  FOR_EACH_BB_FN (bb, cfun)
+if ((idom = get_immediate_dominator (dir, bb)))
+  fprintf (f, "%i -> %i;\n", idom->index, bb->index);
+  fprintf (f, "}\n");
+}
+
+/* Convenience wrapper around the above that dumps the dominance tree in
+   direction DIR to the file at path FNAME in dot form.  */
+
+DEBUG_FUNCTION void
+dot_dominance_tree (const char *fname, enum cdi_direction dir)
+{
+  FILE *f = fopen (fname, "w");
+  if (f)
+{
+  dot_dominance_tree (f, dir);
+  fclose (f);
+}
+  else
+fprintf (stderr, "failed to open %s: %s\n", fname, xstrerror (errno));
+}
+
 /* Prints to stderr representation of the dominance tree (for direction DIR)
rooted in ROOT, indented by INDENT tabulators.  If INDENT_FIRST is false,
the first line of the output is not indented.  */
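
For illustration, a minimal usage sketch of the new convenience overload
(the file name is just an example; assumes cfun is set):

  /* Dominators are usually already available inside a pass; otherwise
     compute them first.  */
  calculate_dominance_info (CDI_DOMINATORS);
  /* Dump the tree; render the file afterwards, e.g. with "dot -Tpdf".  */
  dot_dominance_tree ("/tmp/domtree.dot", CDI_DOMINATORS);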


Re: [PATCH] Fix MinGW option -mcrtdll=

2024-07-05 Thread Pali Rohár
On Monday 24 June 2024 10:03:26 Jonathan Yong wrote:
> On 6/23/24 16:40, Pali Rohár wrote:
> > Add missing msvcr40* and msvcrtd* cases to CPP_SPEC and
> > document missing _UCRT macro and msvcr71* case.
> > 
> > Fixes commit 453cb585f0f8673a5d69d1b420ffd4b3f53aca00.
> Thanks, pushed to master branch.

Hello, thanks for quick reply.

I would like to ask one thing. I see that the mentioned commit was
branched into release gcc-14. Should this fixup commit be included
in the gcc-14 release too?


[PATCH 0/2] [RISC-V] c implies zca, and conditionally zcf & zcd

2024-07-05 Thread Fei Gao
According to Zc-1.0.4-3.pdf from
https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
The rule is that:
1. C always implies Zca
2. C+F implies Zcf (RV32 only)
3. C+D implies Zcd
Patch 2 handles this implication.

Without Patch 1, Patch 2 triggers an ICE when compiling 
gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c with -march=rv32imafcv.
The ICE is due to failure of gcc_assert (check_implied_ext ())
in riscv_subset_list::finalize ().
check_implied_ext checks if ALL implied extensions have been added
after handle_implied_ext.
In the rv32imafcv case, handle_implied_ext first handles 'c'.
As there is no 'd' in the subset list yet, zcd is not added.
Then handle_implied_ext handles 'v': 'v' implies 'zve64d',
and 'zve64d' implies 'd', so 'd' gets added to the subset list.
However, after handle_implied_ext, check_implied_ext finds both
'c' and 'd' in the subset list but no 'zcd', causing the assert
to fail.

Patch 1 solves the assert failure by calling handle_implied_ext
repeatedly until no new subset is added to the subset list; a sketch
of that fixed-point idea follows below.
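
For illustration only, the fixed point Patch 1 iterates towards can be
sketched independently of GCC's actual data structures.  Everything below
(the rule struct, the rule table and close_over_implications) is hypothetical
and merely mirrors the idea of re-applying implication rules until the
subset list stops growing:

#include <functional>
#include <set>
#include <string>
#include <vector>

using extset = std::set<std::string>;

/* Hypothetical implication rule: EXT is implied whenever WHEN holds.  */
struct rule
{
  std::string ext;
  std::function<bool (const extset &)> when;
};

/* A few rules from the scenario above; deliberately not in topological
   order, so a single pass over them is not enough.  */
static const std::vector<rule> rules = {
  {"zcd",    [] (const extset &s) { return s.count ("c") && s.count ("d"); }},
  {"zca",    [] (const extset &s) { return s.count ("c"); }},
  {"d",      [] (const extset &s) { return s.count ("zve64d"); }},
  {"zve64d", [] (const extset &s) { return s.count ("v"); }},
};

/* Apply the rules until the set stops growing (a fixed point), mirroring
   the repeated handle_implied_ext calls in Patch 1.  Starting from
   {"c", "v"}, "zcd" only appears in the third pass.  */
void
close_over_implications (extset &exts)
{
  size_t before;
  do
    {
      before = exts.size ();
      for (const auto &r : rules)
	if (r.when (exts))
	  exts.insert (r.ext);
    }
  while (exts.size () != before);
}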

This patch series also helps to expose the conflict between
c+d and zcmp/zcmt.

Regression tested rv64gc with no new failures.

Fei Gao (2):
  [RISC-V] add implied extension repeatedly until stable
  [RISC-V] c implies zca, and conditionally zcf & zcd

 gcc/common/config/riscv/riscv-common.cc   | 26 ---
 gcc/config/riscv/riscv-subset.h   |  3 +++
 gcc/testsuite/gcc.target/riscv/arch-39.c  |  7 +
 gcc/testsuite/gcc.target/riscv/arch-40.c  |  7 +
 gcc/testsuite/gcc.target/riscv/attribute-15.c |  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-18.c |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110696.c |  2 +-
 .../riscv/rvv/base/abi-callee-saved-1-zcmp.c  |  2 +-
 .../riscv/rvv/base/abi-callee-saved-2-zcmp.c  |  2 +-
 .../gcc.target/riscv/rvv/base/pr114352-1.c|  4 +--
 .../gcc.target/riscv/rvv/base/pr114352-3.c|  8 +++---
 11 files changed, 51 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-40.c

-- 
2.17.1



[PATCH 1/2] [RISC-V] add implied extension repeatedly until stable

2024-07-05 Thread Fei Gao
Call handle_implied_ext repeatedly until no new
subset is added to the subset list.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::riscv_subset_list): Init m_subset_num to 0.
(riscv_subset_list::add): Increase m_subset_num once a subset is added.
(riscv_subset_list::finalize): Call handle_implied_ext repeatedly
until there is no change in m_subset_num.
* config/riscv/riscv-subset.h: Add m_subset_num member.

Signed-off-by: Fei Gao 
---
 gcc/common/config/riscv/riscv-common.cc | 14 +++---
 gcc/config/riscv/riscv-subset.h |  3 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 410e673f5e0..cad3551feb6 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -566,7 +566,8 @@ riscv_subset_t::riscv_subset_t ()
 }
 
 riscv_subset_list::riscv_subset_list (const char *arch, location_t loc)
-  : m_arch (arch), m_loc (loc), m_head (NULL), m_tail (NULL), m_xlen (0)
+  : m_arch (arch), m_loc (loc), m_head (NULL), m_tail (NULL), m_xlen (0),
+m_subset_num (0)
 {
 }
 
@@ -812,6 +813,7 @@ riscv_subset_list::add (const char *subset, int 
major_version,
   return;
 }
 
+  m_subset_num++;
   riscv_subset_t *s = new riscv_subset_t ();
   riscv_subset_t *itr;
 
@@ -1586,9 +1588,15 @@ void
 riscv_subset_list::finalize ()
 {
   riscv_subset_t *subset;
+  unsigned pre_subset_num;
 
-  for (subset = m_head; subset != NULL; subset = subset->next)
-handle_implied_ext (subset->name.c_str ());
+  do
+{
+  pre_subset_num = m_subset_num;
+  for (subset = m_head; subset != NULL; subset = subset->next)
+   handle_implied_ext (subset->name.c_str ());
+}
+  while (pre_subset_num != m_subset_num);
 
   gcc_assert (check_implied_ext ());
 
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index fe7f54d8bc5..7dc196a2007 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -62,6 +62,9 @@ private:
   /* X-len of m_arch. */
   unsigned m_xlen;
 
+  /* Number of subsets. */
+  unsigned m_subset_num;
+
   riscv_subset_list (const char *, location_t);
 
   const char *parsing_subset_version (const char *, const char *, unsigned *,
-- 
2.17.1



[PATCH 2/2] [RISC-V] c implies zca, and conditionally zcf & zcd

2024-07-05 Thread Fei Gao
According to Zc-1.0.4-3.pdf from
https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
The rule is that:
1. C always implies Zca
2. C+F implies Zcf (RV32 only)
3. C+D implies Zcd

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info): Make c
imply zca, and conditionally zcf and zcd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: Adapt test.
* gcc.target/riscv/attribute-18.c: Likewise.
* gcc.target/riscv/pr110696.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: Likewise.
* gcc.target/riscv/arch-39.c: New test.
* gcc.target/riscv/arch-40.c: New test.


Signed-off-by: Fei Gao 
---
 gcc/common/config/riscv/riscv-common.cc  | 12 
 gcc/testsuite/gcc.target/riscv/arch-39.c |  7 +++
 gcc/testsuite/gcc.target/riscv/arch-40.c |  7 +++
 gcc/testsuite/gcc.target/riscv/attribute-15.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-18.c|  2 +-
 gcc/testsuite/gcc.target/riscv/pr110696.c|  2 +-
 .../riscv/rvv/base/abi-callee-saved-1-zcmp.c |  2 +-
 .../riscv/rvv/base/abi-callee-saved-2-zcmp.c |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c |  8 
 10 files changed, 37 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-40.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index cad3551feb6..a02f1fe19a0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -82,6 +82,18 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"a", "zaamo"},
   {"a", "zalrsc"},
 
+  {"c", "zca"},
+  {"c", "zcf",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->xlen () == 32 && subset_list->lookup ("f");
+   }},
+  {"c", "zcd",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->lookup ("d");
+   }},
+
   {"zdinx", "zfinx"},
   {"zfinx", "zicsr"},
   {"zdinx", "zicsr"},
diff --git a/gcc/testsuite/gcc.target/riscv/arch-39.c 
b/gcc/testsuite/gcc.target/riscv/arch-39.c
new file mode 100644
index 000..beeb81e44c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-39.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64idc_zcmt -mabi=lp64d" } */
+int
+foo ()
+{}
+
+/* { dg-error "zcd conflicts with zcmt" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-40.c 
b/gcc/testsuite/gcc.target/riscv/arch-40.c
new file mode 100644
index 000..eaefaf1d0d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-40.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64idc_zcmp -mabi=lp64d" } */
+int
+foo ()
+{}
+
+/* { dg-error "zcd conflicts with zcmp" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-15.c 
b/gcc/testsuite/gcc.target/riscv/attribute-15.c
index a2e394b6489..ac6caaecd4f 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-15.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-15.c
@@ -3,4 +3,4 @@
 int foo()
 {
 }
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0\"" } } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0\"" 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-18.c 
b/gcc/testsuite/gcc.target/riscv/attribute-18.c
index eefd602103d..9f7199f331a 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-18.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-18.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "-mriscv-attribute -march=rv64imafdc -mabi=lp64d 
-misa-spec=2.2" } */
 int foo() {}
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0\"" } } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0\"" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr110696.c 
b/gcc/testsuite/gcc.target/riscv/pr110696.c
index 08682a047e0..8aa6cd07f4f 100644
--- a/gcc/testsuite/gcc.target/riscv/pr110696.c
+++ b/gcc/testsuite/gcc.target/riscv/pr110696.c
@@ -4,4 +4,4 @@ int foo()
 {
 }
 
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl2048b1p0_zvl256b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0\""
 } } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv

Re: [PATCH v2] middle-end: Add debug functions to dump dominator tree in dot format

2024-07-05 Thread Richard Biener
On Fri, 5 Jul 2024, Alex Coplan wrote:

> This is a v2 patch which implements richi's feedback.
> 
> OK if it survives bootstrap on aarch64?

OK.

Thanks,
Richard.

> Thanks,
> Alex
> 
> -- >8 --
> 
> This adds debug functions to dump the dominator tree in dot format.
> There are two overloads: one which takes a FILE * and another which
> takes a const char *fname and wraps the first with fopen/fclose for
> convenience.
> 
> gcc/ChangeLog:
> 
>   * dominance.cc (dot_dominance_tree): New.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[r15-1848 Regression] FAIL: gcc.dg/vect/slp-54.c scan-tree-dump vect "vectorizing stmts using SLP" on Linux/x86_64

2024-07-05 Thread haochen.jiang
On Linux/x86_64,

7eb8b65780d9dc3e266056383279b00d5e152bea is the first bad commit
commit 7eb8b65780d9dc3e266056383279b00d5e152bea
Author: Richard Biener 
Date:   Wed Jul 3 13:50:59 2024 +0200

Support group size of three in SLP store permute lowering

caused

FAIL: gcc.dg/vect/slp-54.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorizing stmts using SLP"
FAIL: gcc.dg/vect/slp-54.c scan-tree-dump vect "vectorizing stmts using SLP"

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1848/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-54.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-54.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the 
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v8 07/12] Enable musttail tail conversion even when not optimizing

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 8:59 PM Andi Kleen  wrote:
>
> Enable the tailcall optimization for non optimizing builds,
> but in this case only checks calls that have the musttail attribute set.
> This makes musttail work without optimization.
>
> This is done with a new late musttail pass that is only active when
> not optimizing. The new pass relies on tree-cfg to discover musttails.
> This avoids a ~0.8% compiler run time penalty at -O0.
>
> gcc/ChangeLog:
>
> * function.h (struct function): Add has_musttail.
> * lto-streamer-in.cc (input_struct_function_base): Stream
> has_musttail.
> * lto-streamer-out.cc (output_struct_function_base): Dito.
> * passes.def (pass_musttail): Add.
> * tree-cfg.cc (notice_special_calls): Record has_musttail.
> (clear_special_calls): Clear has_musttail.
> * tree-pass.h (make_pass_musttail): Add.
> * tree-tailcall.cc (find_tail_calls): Handle only_musttail
>   argument.
> (tree_optimize_tail_calls_1): Pass on only_musttail.
> (execute_tail_calls): Pass only_musttail as false.
> (class pass_musttail): Add.
> (make_pass_musttail): Add.
> ---
>  gcc/function.h  |  3 ++
>  gcc/lto-streamer-in.cc  |  1 +
>  gcc/lto-streamer-out.cc |  1 +
>  gcc/passes.def  |  1 +
>  gcc/tree-cfg.cc |  3 ++
>  gcc/tree-pass.h |  1 +
>  gcc/tree-tailcall.cc| 66 +++--
>  7 files changed, 67 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/function.h b/gcc/function.h
> index c0ba6cc1531a..fbeadeaf4104 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -430,6 +430,9 @@ struct GTY(()) function {
>/* Nonzero when the tail call has been identified.  */
>unsigned int tail_call_marked : 1;
>
> +  /* Has musttail marked calls.  */
> +  unsigned int has_musttail : 1;
> +
>/* Nonzero if the current function contains a #pragma GCC unroll.  */
>unsigned int has_unroll : 1;
>
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index ad0ca24007a0..2e592be80823 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1325,6 +1325,7 @@ input_struct_function_base (struct function *fn, class 
> data_in *data_in,
>fn->calls_eh_return = bp_unpack_value (&bp, 1);
>fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
>fn->has_simduid_loops = bp_unpack_value (&bp, 1);
> +  fn->has_musttail = bp_unpack_value (&bp, 1);
>fn->assume_function = bp_unpack_value (&bp, 1);
>fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
>fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
> diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
> index d4f728094ed5..0be381abbd96 100644
> --- a/gcc/lto-streamer-out.cc
> +++ b/gcc/lto-streamer-out.cc
> @@ -2290,6 +2290,7 @@ output_struct_function_base (struct output_block *ob, 
> struct function *fn)
>bp_pack_value (&bp, fn->calls_eh_return, 1);
>bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
>bp_pack_value (&bp, fn->has_simduid_loops, 1);
> +  bp_pack_value (&bp, fn->has_musttail, 1);
>bp_pack_value (&bp, fn->assume_function, 1);
>bp_pack_value (&bp, fn->va_list_fpr_size, 8);
>bp_pack_value (&bp, fn->va_list_gpr_size, 8);
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 041229e47a68..5b5390e6ac0b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -444,6 +444,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_tsan_O0);
>NEXT_PASS (pass_sanopt);
>NEXT_PASS (pass_cleanup_eh);
> +  NEXT_PASS (pass_musttail);
>NEXT_PASS (pass_lower_resx);
>NEXT_PASS (pass_nrv);
>NEXT_PASS (pass_gimple_isel);
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index 7fb7b92966be..e6fd1294b958 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -2290,6 +2290,8 @@ notice_special_calls (gcall *call)
>  cfun->calls_alloca = true;
>if (flags & ECF_RETURNS_TWICE)
>  cfun->calls_setjmp = true;
> +  if (gimple_call_must_tail_p (call))
> +cfun->has_musttail = true;
>  }
>
>
> @@ -2301,6 +2303,7 @@ clear_special_calls (void)
>  {
>cfun->calls_alloca = false;
>cfun->calls_setjmp = false;
> +  cfun->has_musttail = false;
>  }
>
>  /* Remove PHI nodes associated with basic block BB and all edges out of BB.  
> */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index edebb2be245d..59e53558034f 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -368,6 +368,7 @@ extern gimple_opt_pass *make_pass_sra (gcc::context 
> *ctxt);
>  extern gimple_opt_pass *make_pass_sra_early (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_musttail (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt);
> 

Re: [PATCH v8 01/12] Improve must tail in RTL backend

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 8:57 PM Andi Kleen  wrote:
>
> - Give error messages for all causes of non sibling call generation
> - When giving error messages clear the musttail flag to avoid ICEs
> - Error out when tree-tailcall failed to mark a must-tail call as a
> sibcall. In this case it doesn't know the true reason and only gives
> a vague message.

We seem to have concluded that we need the tailcall pass run and
both markings after all.

Thus OK.

Richard.

> PR83324
>
> gcc/ChangeLog:
>
> * calls.cc (maybe_complain_about_tail_call): Clear must tail
> flag on error.
> (expand_call): Give error messages for all musttail failures.
> ---
>  gcc/calls.cc | 32 ++--
>  1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 21d78f9779fe..883eb9971257 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1249,6 +1249,7 @@ maybe_complain_about_tail_call (tree call_expr, const 
> char *reason)
>  return;
>
>error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
> +  CALL_EXPR_MUST_TAIL_CALL (call_expr) = 0;
>  }
>
>  /* Fill in ARGS_SIZE and ARGS array based on the parameters found in
> @@ -2650,7 +2651,13 @@ expand_call (tree exp, rtx target, int ignore)
>/* The type of the function being called.  */
>tree fntype;
>bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> +  /* tree-tailcall decided not to do tail calls. Error for the musttail case,
> + unfortunately we don't know the reason so it's fairly vague.
> + When tree-tailcall reported an error it already cleared the flag,
> + so this shouldn't really happen unless the
> + musttail pass gave up walking before finding the call.  */
> +  if (!try_tail_call)
> +  maybe_complain_about_tail_call (exp, "other reasons");
>int pass;
>
>/* Register in which non-BLKmode value will be returned,
> @@ -3022,10 +3029,21 @@ expand_call (tree exp, rtx target, int ignore)
>   pushed these optimizations into -O2.  Don't try if we're already
>   expanding a call, as that means we're an argument.  Don't try if
>   there's cleanups, as we know there's code to follow the call.  */
> -  if (currently_expanding_call++ != 0
> -  || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> -  || args_size.var
> -  || dbg_cnt (tail_call) == false)
> +  if (currently_expanding_call++ != 0)
> +{
> +  maybe_complain_about_tail_call (exp, "inside another call");
> +  try_tail_call = 0;
> +}
> +  if (!flag_optimize_sibling_calls
> +   && !CALL_FROM_THUNK_P (exp)
> +   && !CALL_EXPR_MUST_TAIL_CALL (exp))
> +try_tail_call = 0;
> +  if (args_size.var)
> +{
> +  maybe_complain_about_tail_call (exp, "variable size arguments");
> +  try_tail_call = 0;
> +}
> +  if (dbg_cnt (tail_call) == false)
>  try_tail_call = 0;
>
>/* Workaround buggy C/C++ wrappers around Fortran routines with
> @@ -3046,13 +3064,15 @@ expand_call (tree exp, rtx target, int ignore)
> if (MEM_P (*iter))
>   {
> try_tail_call = 0;
> +   maybe_complain_about_tail_call (exp,
> +   "hidden string length argument passed on 
> stack");
> break;
>   }
> }
>
>/* If the user has marked the function as requiring tail-call
>   optimization, attempt it.  */
> -  if (must_tail_call)
> +  if (CALL_EXPR_MUST_TAIL_CALL (exp))
>  try_tail_call = 1;
>
>/*  Rest of purposes for tail call optimizations to fail.  */
> --
> 2.45.2
>


Re: [PATCH v8 02/12] Fix pro_and_epilogue for sibcalls at -O0

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 8:57 PM Andi Kleen  wrote:
>
> Some of the cfg fixups in pro_and_epilogue for sibcalls were dependent on 
> "optimize".
> Make them check cfun->tail_call_marked instead to handle the -O0 musttail
> case. This fixes the musttail test cases on arm targets.
>
> PR115255
>
> gcc/ChangeLog:
>
> * function.cc (thread_prologue_and_epilogue_insns): Check
>   cfun->tail_call_marked for sibcalls too.
> (rest_of_handle_thread_prologue_and_epilogue): Dito.
> ---
>  gcc/function.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/function.cc b/gcc/function.cc
> index 4edd4da12474..7c9b181423d4 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -6261,6 +6261,7 @@ thread_prologue_and_epilogue_insns (void)
>/* Threading the prologue and epilogue changes the artificial refs in the
>   entry and exit blocks, and may invalidate DF info for tail calls.  */
>if (optimize
> +  || cfun->tail_call_marked

Can you add comments in both places as to this being because of [[musttail]]
which marks tail-calls even with -fno-optimize-sibling-calls?

OK with those change.

>|| flag_optimize_sibling_calls
>|| flag_ipa_icf_functions
>|| in_lto_p)
> @@ -6557,7 +6558,7 @@ rest_of_handle_thread_prologue_and_epilogue (function 
> *fun)
>  {
>/* prepare_shrink_wrap is sensitive to the block structure of the control
>   flow graph, so clean it up first.  */
> -  if (optimize)
> +  if (cfun->tail_call_marked || optimize)
>  cleanup_cfg (0);
>
>/* On some machines, the prologue and epilogue code, or parts thereof,
> --
> 2.45.2
>


Re: [PATCH v8 03/12] Add a musttail generic attribute to the c-attribs table

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 8:57 PM Andi Kleen  wrote:
>
> It does nothing currently since statement attributes are handled
> directly in the parser.

Is this needed at all?  a "'musttail' attribute ignored" diagnostic isn't
much more helpful than "'foo' attribute directive ignored"?  Or does
stmt attribute parsing rely on this table as well?

Richard.

> gcc/c-family/ChangeLog:
>
> * c-attribs.cc (handle_musttail_attribute): Add.
> * c-common.h (handle_musttail_attribute): Add.
> ---
>  gcc/c-family/c-attribs.cc | 15 +++
>  gcc/c-family/c-common.h   |  1 +
>  2 files changed, 16 insertions(+)
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index f9b229aba7fc..5adc7b775eaf 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -340,6 +340,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
>{ "common", 0, 0, true,  false, false, false,
>   handle_common_attribute,
>   attr_common_exclusions },
> +  { "musttail",  0, 0, false, false, false,
> + false, handle_musttail_attribute, NULL },
>/* FIXME: logically, noreturn attributes should be listed as
>   "false, true, true" and apply to function types.  But implementing this
>   would require all the places in the compiler that use TREE_THIS_VOLATILE
> @@ -1222,6 +1224,19 @@ handle_common_attribute (tree *node, tree name, tree 
> ARG_UNUSED (args),
>return NULL_TREE;
>  }
>
> +/* Handle a "musttail" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +tree
> +handle_musttail_attribute (tree ARG_UNUSED (*node), tree name, tree 
> ARG_UNUSED (args),
> +  int ARG_UNUSED (flags), bool *no_add_attrs)
> +{
> +  /* Currently only a statement attribute, handled directly in parser.  */
> +  warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  *no_add_attrs = true;
> +  return NULL_TREE;
> +}
> +
>  /* Handle a "noreturn" attribute; arguments as in
> struct attribute_spec.handler.  */
>
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 48c89b603bcd..e84c9c47513b 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1643,6 +1643,7 @@ extern tree find_tm_attribute (tree);
>  extern const struct attribute_spec::exclusions attr_cold_hot_exclusions[];
>  extern const struct attribute_spec::exclusions attr_noreturn_exclusions[];
>  extern tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
> +extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
>  extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
>  extern tree build_attr_access_from_parms (tree, bool);
>
> --
> 2.45.2
>


Re: [PATCH v8 06/12] Add tests for C/C++ musttail attributes

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 8:59 PM Andi Kleen  wrote:
>
> Some adopted from the existing C musttail plugin tests.

Looks good to me.

> gcc/testsuite/ChangeLog:
>
> * c-c++-common/musttail1.c: New test.
> * c-c++-common/musttail2.c: New test.
> * c-c++-common/musttail3.c: New test.
> * c-c++-common/musttail4.c: New test.
> * c-c++-common/musttail7.c: New test.
> * c-c++-common/musttail8.c: New test.
> * g++.dg/musttail6.C: New test.
> * g++.dg/musttail9.C: New test.
> * g++.dg/musttail10.C: New test.
> ---
>  gcc/testsuite/c-c++-common/musttail1.c | 14 ++
>  gcc/testsuite/c-c++-common/musttail2.c | 33 ++
>  gcc/testsuite/c-c++-common/musttail3.c | 29 +
>  gcc/testsuite/c-c++-common/musttail4.c | 17 
>  gcc/testsuite/c-c++-common/musttail5.c | 28 
>  gcc/testsuite/c-c++-common/musttail7.c | 14 ++
>  gcc/testsuite/c-c++-common/musttail8.c | 17 
>  gcc/testsuite/g++.dg/musttail10.C  | 34 +++
>  gcc/testsuite/g++.dg/musttail6.C   | 59 ++
>  gcc/testsuite/g++.dg/musttail9.C   | 10 +
>  10 files changed, 255 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/musttail1.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail2.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail3.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail4.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail5.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail7.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail8.c
>  create mode 100644 gcc/testsuite/g++.dg/musttail10.C
>  create mode 100644 gcc/testsuite/g++.dg/musttail6.C
>  create mode 100644 gcc/testsuite/g++.dg/musttail9.C
>
> diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
> b/gcc/testsuite/c-c++-common/musttail1.c
> new file mode 100644
> index ..74efcc2a0bc6
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
> +
> +int __attribute__((noinline,noclone,noipa))
> +callee (int i)
> +{
> +  return i * i;
> +}
> +
> +int __attribute__((noinline,noclone,noipa))
> +caller (int i)
> +{
> +  [[gnu::musttail]] return callee (i + 1);
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
> b/gcc/testsuite/c-c++-common/musttail2.c
> new file mode 100644
> index ..86f2c3d77404
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +
> +struct box { char field[256]; int i; };
> +
> +int __attribute__((noinline,noclone,noipa))
> +test_2_callee (int i, struct box b)
> +{
> +  if (b.field[0])
> +return 5;
> +  return i * i;
> +}
> +
> +int __attribute__((noinline,noclone,noipa))
> +test_2_caller (int i)
> +{
> +  struct box b;
> +  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot 
> tail-call: " } */
> +}
> +
> +extern void setjmp (void);
> +void
> +test_3 (void)
> +{
> +  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
> +}
> +
> +extern float f7(void);
> +
> +int
> +test_6 (void)
> +{
> +  [[gnu::musttail]] return f7(); /* { dg-error "cannot tail-call: " } */
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
> b/gcc/testsuite/c-c++-common/musttail3.c
> new file mode 100644
> index ..ea9589c59ef2
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail3.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +
> +extern int foo2 (int x, ...);
> +
> +struct str
> +{
> +  int a, b;
> +};
> +
> +struct str
> +cstruct (int x)
> +{
> +  if (x < 10)
> +[[clang::musttail]] return cstruct (x + 1);
> +  return ((struct str){ x, 0 });
> +}
> +
> +int
> +foo (int x)
> +{
> +  if (x < 10)
> +[[clang::musttail]] return foo2 (x, 29);
> +  if (x < 100)
> +{
> +  int k = foo (x + 1);
> +  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } 
> */
> +}
> +  return x;
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
> b/gcc/testsuite/c-c++-common/musttail4.c
> new file mode 100644
> index ..23f4b5e1cd68
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +
> +struct box { char field[64]; int i; };
> +
> +struct box __attribute__((noinline,noclone,noipa))
> +returns_struct (int i)
> +{
> +  struct box b;
> +  b.i = i * i;
> +  return b;
> +}
> +
> +int __attribute__((noinline,noclone))
> +test_1 (int i)
> +{
> +  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "cannot 
> tail-call: " } */
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail5.c 
> b/gcc/testsuite/c-c++-common/musttail5

Re: [PATCH v8 08/12] Give better error messages for musttail

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 9:01 PM Andi Kleen  wrote:
>
> When musttail is set, make tree-tailcall give error messages
> when it cannot handle a call. This avoids vague "other reasons"
> error messages later at expand time when it sees a musttail
> function not marked tail call.
>
> In various cases this requires delaying the error until
> the call is discovered.
>
> gcc/ChangeLog:
>
> * tree-tailcall.cc (maybe_error_musttail): New function.
> (bb_get_succ_edge_count): New function.
> (find_tail_calls): Add error reporting. Handle EH edges
> for error reporting.
> ---
>  gcc/tree-tailcall.cc | 116 +--
>  1 file changed, 102 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
> index 0c6df10e64f7..4687e20e61d0 100644
> --- a/gcc/tree-tailcall.cc
> +++ b/gcc/tree-tailcall.cc
> @@ -40,9 +40,11 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-eh.h"
>  #include "dbgcnt.h"
>  #include "cfgloop.h"
> +#include "intl.h"
>  #include "common/common-target.h"
>  #include "ipa-utils.h"
>  #include "tree-ssa-live.h"
> +#include "diagnostic-core.h"
>
>  /* The file implements the tail recursion elimination.  It is also used to
> analyze the tail calls in general, passing the results to the rtl level
> @@ -402,6 +404,41 @@ propagate_through_phis (tree var, edge e)
>return var;
>  }
>
> +/* Report an error for failing to tail convert must call CALL
> +   with error message ERR. Also clear the flag to prevent further
> +   errors.  */
> +
> +static void
> +maybe_error_musttail (gcall *call, const char *err)
> +{
> +  if (gimple_call_must_tail_p (call))
> +{
> +  error_at (call->location, "cannot tail-call: %s", err);
> +  /* Avoid another error. ??? If there are multiple reasons why tail
> +calls fail it might be useful to report them all to avoid
> +whack-a-mole for the user. But currently there is too much
> +redundancy in the reporting, so keep it simple.  */
> +  gimple_call_set_must_tail (call, false); /* Avoid another error.  */
> +  gimple_call_set_tail (call, false);
> +}
> +}
> +
> +/* Count succ edges for BB and return in NUM_OTHER and NUM_EH.  */
> +
> +static void
> +bb_get_succ_edge_count (basic_block bb, int &num_other, int &num_eh)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  num_eh = 0;
> +  num_other = 0;
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +if (e->flags & EDGE_EH)
> +  num_eh++;
> +else
> +  num_other++;
> +}
> +
>  /* Argument for compute_live_vars/live_vars_at_stmt and what 
> compute_live_vars
> returns.  Computed lazily, but just once for the function.  */
>  static live_vars_map *live_vars;
> @@ -426,8 +463,16 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
> bool only_musttail)
>tree var;
>
>if (!single_succ_p (bb))
> -return;
> +{
> +  int num_eh, num_other;
> +  bb_get_succ_edge_count (bb, num_eh, num_other);
> +  /* Allow EH edges so that we can give a better
> +error message later.  */

Please instead use has_abnormal_or_eh_outgoing_edge_p (bb) instead
to avoid adding another function like this.  Also only continue searching
for a musttail call if cfun->has_musttail

> +  if (num_other != 1)
> +   return;
> +}
>
> +  bool bad_stmt = false;
>for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
>  {
>stmt = gsi_stmt (gsi);
> @@ -448,6 +493,12 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
> bool only_musttail)
>   /* Handle only musttail calls when not optimizing.  */
>   if (only_musttail && !gimple_call_must_tail_p (call))
> return;
> + if (bad_stmt)
> +   {
> + maybe_error_musttail (call,
> + _("memory reference or volatile after call"));
> + return;
> +   }
>   ass_var = gimple_call_lhs (call);
>   break;
> }
> @@ -462,9 +513,14 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
> bool only_musttail)
>/* If the statement references memory or volatile operands, fail.  */
>if (gimple_references_memory_p (stmt)
>   || gimple_has_volatile_ops (stmt))
> -   return;
> +   {
> + bad_stmt = true;

break here when !cfun->has_musttail?

> +   }
>  }
> +  if (bad_stmt)
> +return;
> +
>if (gsi_end_p (gsi))
>  {
>edge_iterator ei;
> @@ -489,13 +545,26 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
> bool only_musttail)
>if (ass_var
>&& !is_gimple_reg (ass_var)
>&& !auto_var_in_fn_p (ass_var, cfun->decl))
> -return;
> +{
> +  maybe_error_musttail (call, _("return value in memory"));
> +  return;
> +}
> +
> +  if (cfun->calls_setjmp)
> +{
> +  maybe_error_musttail (call, _("caller uses setjmp"));
> +  return;
> +}
>
>/* If the ca

[patch, avr, applied] Fix PR87376: Wrong code with __memx and long long addition

2024-07-05 Thread Georg-Johann Lay

The DImode expanders from avr-dimode.md have code like

emit_move_insn(acc_a, operands[1])

where acc_a is a hard register and operands[1] may be a
mem reference to a non-generic address-space. The latter
may require a libcall, and since DImode moves are split into
8 QImode moves, these libcalls might clobber some of the
hard regs that hold acc_a or acc_b.

This patch simply denies non-generic address-space refs
as DImode expander's inputs.

Johann

--

AVR: target/87376 - Use nop_general_operand for DImode inputs.

The avr-dimode.md expanders have code like emit_move_insn (acc_a, 
operands[1]);

where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference.  Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-bit moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.
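
As a rough illustration, the affected pattern looks like this (a made-up
sketch, not the pr87376.c testcase itself):

  /* 64-bit addition whose input lives in the non-generic __memx address
     space; loading it may go through libgcc helpers, which can clobber
     the hard registers the DImode expander has already filled.  */
  const __memx long long big = 0x1122334455667788LL;

  long long
  add_big (long long x)
  {
    return big + x;
  }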

This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander codes from avr-dimode.md are run.

PR target/87376
gcc/
* config/avr/avr-dimode.md: Use "nop_general_operand" instead
of "general_operand" as predicate for all input operands.

gcc/testsuite/
* gcc.target/avr/torture/pr87376.c: New test.

diff --git a/gcc/config/avr/avr-dimode.md b/gcc/config/avr/avr-dimode.md
index 4b74e77e5e5..c357213e211 100644
--- a/gcc/config/avr/avr-dimode.md
+++ b/gcc/config/avr/avr-dimode.md
@@ -62,8 +62,8 @@ (define_mode_iterator ALL8S [ DQ  DA  TA])
 ;; "addta3" "adduta3"
 (define_expand "add3"
   [(parallel [(match_operand:ALL8 0 "general_operand" "")
-  (match_operand:ALL8 1 "general_operand" "")
-  (match_operand:ALL8 2 "general_operand" "")])]
+  (match_operand:ALL8 1 "nop_general_operand")
+  (match_operand:ALL8 2 "nop_general_operand")])]
   "avr_have_dimode"
   {
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
@@ -178,8 +178,8 @@ (define_insn "*add3_const_insn"
 ;; "subta3" "subuta3"
 (define_expand "sub3"
   [(parallel [(match_operand:ALL8 0 "general_operand" "")
-  (match_operand:ALL8 1 "general_operand" "")
-  (match_operand:ALL8 2 "general_operand" "")])]
+  (match_operand:ALL8 1 "nop_general_operand")
+  (match_operand:ALL8 2 "nop_general_operand")])]
   "avr_have_dimode"
   {
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
@@ -259,8 +259,8 @@ (define_insn "*sub3_const_insn"
 
 (define_expand "3"
   [(set (match_operand:ALL8S 0 "general_operand" "")
-(ss_addsub:ALL8S (match_operand:ALL8S 1 "general_operand" "")
- (match_operand:ALL8S 2 "general_operand" "")))]
+(ss_addsub:ALL8S (match_operand:ALL8S 1 "nop_general_operand")
+ (match_operand:ALL8S 2 "nop_general_operand")))]
   "avr_have_dimode"
   {
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
@@ -332,8 +332,8 @@ (define_insn "*3_const_insn"
 
 (define_expand "3"
   [(set (match_operand:ALL8U 0 "general_operand" "")
-(us_addsub:ALL8U (match_operand:ALL8U 1 "general_operand" "")
- (match_operand:ALL8U 2 "general_operand" "")))]
+(us_addsub:ALL8U (match_operand:ALL8U 1 "nop_general_operand")
+ (match_operand:ALL8U 2 "nop_general_operand")))]
   "avr_have_dimode"
   {
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
@@ -405,7 +405,7 @@ (define_insn "*3_const_insn"
 
 (define_expand "negdi2"
   [(parallel [(match_operand:DI 0 "general_operand" "")
-  (match_operand:DI 1 "general_operand" "")])]
+  (match_operand:DI 1 "nop_general_operand")])]
   "avr_have_dimode"
   {
 rtx acc_a = gen_rtx_REG (DImode, ACC_A);
@@ -602,8 +602,8 @@ (define_code_iterator di_shifts
 ;; "ashluta3"  "ashruta3"  "lshruta3"  "rotluta3"
 (define_expand "3"
   [(parallel [(match_operand:ALL8 0 "general_operand" "")
-  (di_shifts:ALL8 (match_operand:ALL8 1 "general_operand" "")
-  (match_operand:QI 2 "general_operand" ""))])]
+  (di_shifts:ALL8 (match_operand:ALL8 1 "nop_general_operand")
+  (match_operand:QI 2 "nop_general_operand"))])]
   "avr_have_dimode"
   {
 rtx acc_a = gen_rtx_REG (mode, ACC_A);
@@ -648,8 +648,8 @@ (define_insn "*3_insn"
 ;; "mulsidi3"
 (define_expand "mulsidi3"
   [(parallel [(match_operand:DI 0 "register_operand" "")
-  (match_operand:SI 1 "general_operand" "")
-  (match_operand:SI 2 "general_operand" "")
+  (match_operand:SI 1 "nop_general_operand")
+  (match_operand:SI 2 "nop_general_operand")
   ;; Just to mention the iterator 
   (clobber (any_extend:SI (match_dup 1)))])]
   "avr_have_dimode
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr87376.c b/gcc/testsuite/gcc.target/avr/torture/pr87376.c
new

[committed v3] libstdc++: Use memchr to optimize std::find [PR88545]

2024-07-05 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

Patch v3 adds the in-range check to ranges::find to avoid false positive
matches:

  if (static_cast>(__value) != __value)
return __last;

-- >8 --

This optimizes std::find to use memchr when searching for an integer in
a range of bytes.
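
A small usage sketch (not taken from the new test) of the behaviour this
preserves:

  #include <algorithm>

  int main()
  {
    char buf[4] = { 'a', 'b', 'c', 'd' };

    // Byte range + integer value: eligible for the memchr fast path.
    char* hit = std::find (buf, buf + 4, 'c');         // buf + 2

    // 'c' + 256 changes when converted to char, so equality can never
    // hold; memchr alone would truncate it and report a false match,
    // hence the in-range check makes find return last here.
    char* miss = std::find (buf, buf + 4, 'c' + 256);  // buf + 4

    return (hit - buf) + (miss - buf);
  }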

libstdc++-v3/ChangeLog:

PR libstdc++/88545
PR libstdc++/115040
* include/bits/cpp_type_traits.h (__can_use_memchr_for_find):
New variable template.
* include/bits/ranges_util.h (__find_fn): Use memchr when
possible.
* include/bits/stl_algo.h (find): Likewise.
* testsuite/25_algorithms/find/bytes.cc: New test.
---
 libstdc++-v3/include/bits/cpp_type_traits.h   |  13 ++
 libstdc++-v3/include/bits/ranges_util.h   |  21 +++
 libstdc++-v3/include/bits/stl_algo.h  |  35 -
 .../testsuite/25_algorithms/find/bytes.cc | 135 ++
 4 files changed, 202 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/find/bytes.cc

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index abe0c7603e3..4bfb4521e06 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -35,6 +35,10 @@
 #pragma GCC system_header
 
 #include 
+#include 
+#if __glibcxx_type_trait_variable_templates
+# include  // is_same_v, is_integral_v
+#endif
 
 //
 // This file provides some compile-time information about various types.
@@ -547,6 +551,15 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 { static constexpr bool __value = false; };
 #endif
 
+#if __glibcxx_type_trait_variable_templates
+  template
+constexpr bool __can_use_memchr_for_find
+// Can only use memchr to search for narrow characters and std::byte.
+  = __is_byte<_ValT>::__value
+   // And only if the value to find is an integer (or is also std::byte).
+ && (is_same_v<_Tp, _ValT> || is_integral_v<_Tp>);
+#endif
+
   //
   // Move iterator type
   //
diff --git a/libstdc++-v3/include/bits/ranges_util.h 
b/libstdc++-v3/include/bits/ranges_util.h
index 9b79c3a229d..186acae4f70 100644
--- a/libstdc++-v3/include/bits/ranges_util.h
+++ b/libstdc++-v3/include/bits/ranges_util.h
@@ -34,6 +34,7 @@
 # include 
 # include 
 # include 
+# include  // __can_use_memchr_for_find
 
 #ifdef __glibcxx_ranges
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -494,6 +495,26 @@ namespace ranges
   operator()(_Iter __first, _Sent __last,
 const _Tp& __value, _Proj __proj = {}) const
   {
+   if constexpr (is_same_v<_Proj, identity>)
+ if constexpr(__can_use_memchr_for_find, _Tp>)
+   if constexpr (sized_sentinel_for<_Sent, _Iter>)
+ if constexpr (contiguous_iterator<_Iter>)
+   if (!is_constant_evaluated())
+ {
+   if (static_cast>(__value) != __value)
+ return __last;
+
+   auto __n = __last - __first;
+   if (__n > 0)
+ {
+   const int __ival = static_cast(__value);
+   const void* __p0 = std::to_address(__first);
+   if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
+ __n = (const char*)__p1 - (const char*)__p0;
+ }
+   return __first + __n;
+ }
+
while (__first != __last
&& !(std::__invoke(__proj, *__first) == __value))
  ++__first;
diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 1a996aa61da..45c3b591326 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3838,14 +3838,45 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   template
 _GLIBCXX20_CONSTEXPR
 inline _InputIterator
-find(_InputIterator __first, _InputIterator __last,
-const _Tp& __val)
+find(_InputIterator __first, _InputIterator __last, const _Tp& __val)
 {
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
   __glibcxx_function_requires(_EqualOpConcept<
typename iterator_traits<_InputIterator>::value_type, _Tp>)
   __glibcxx_requires_valid_range(__first, __last);
+
+#if __cpp_if_constexpr && __glibcxx_type_trait_variable_templates
+  using _ValT = typename iterator_traits<_InputIterator>::value_type;
+  if constexpr (__can_use_memchr_for_find<_ValT, _Tp>)
+   {
+ // If converting the value to the 1-byte value_type alters its value,
+ // then it would not be found by std::find using equality comparison.
+ // We need to check this here, because otherwise something like
+ // memchr("a", 'a'+256, 1) would give a false positive match.
+ if (!(static_cast<_ValT>(__val) == __val))
+   return __last;
+ else if (!__is_constant_evaluated())
+   

Re: [PATCH 1/3] libstdc++: Use RAII in

2024-07-05 Thread Jonathan Wakely
Thanks for the feedback - pushed.

On Wed, 3 Jul 2024 at 16:37, Ville Voutilainen
 wrote:
>
> On Wed, 3 Jul 2024 at 18:33, Jonathan Wakely  wrote:
> >
> > On Thu, 27 Jun 2024 at 11:52, Jonathan Wakely wrote:
> > >
> > > This refactoring to use RAII doesn't seem to make any difference in
> > > benchmarks, although the generated code for some std::vector operations
> > > seems to be slightly larger. Maybe it will be faster (or slower) in some
> > > cases I didn't test?
> > >
> > > I think I like the change anyway - any other opinions on whether it's an
> > > improvement?
> >
> > Any thoughts before I push this? Better? Worse? Needs more cowbell?
>
> I think the patch is an improvement. Push it.
>



[committed] libstdc++: Add dg-error for new -Wdelete-incomplete diagnostics [PR115747]

2024-07-05 Thread Jonathan Wakely
These errors only show up when testing with -std=c++26 which isn't done
by default.

Tested x86_64-linux. Pushed to trunk.

-- >8 --

Since r15-1794-gbeb7a418aaef2e the -Wdelete-incomplete diagnostic is a
permerror instead of a (suppressed in system headers) warning. Add
dg-error directives.
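
For reference, the construct the diagnostic fires on is essentially this
(a hypothetical sketch, not the code in the testsuite file):

  struct X;               // incomplete type

  void destroy (X* p)
  {
    delete p;             // -Wdelete-incomplete; a permerror in C++26 modes
  }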

libstdc++-v3/ChangeLog:

PR c++/115747
* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
Add dg-error for new C++26 diagnostics.
---
 .../tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc   | 3 +++
 1 file changed, 3 insertions(+)

diff --git 
a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc 
b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc
index d4cb45d0e06..a4c99ca1775 100644
--- 
a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc
+++ 
b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc
@@ -39,6 +39,9 @@ void test01()
   // { dg-error "incomplete" "" { target *-*-* } 600 }
 }
 
+// { dg-error "-Wdelete-incomplete" "" { target c++26 } 283 }
+// { dg-error "-Wdelete-incomplete" "" { target c++26 } 305 }
+
 // Ignore additional diagnostic given with -Wsystem-headers:
 // { dg-prune-output "has incomplete type" }
 // { dg-prune-output "possible problem detected" }
-- 
2.45.2



Re: [PATCH v8 09/12] Delay caller error reporting for musttail

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 9:00 PM Andi Kleen  wrote:
>
> Move the error reporting for caller attributes to be
> after the tail call discovery, so that we can give proper
> error messages tagged to the calls.

Hmm.  This all gets a bit awkward.  I realize that early checking
saves compile time otherwise spent unnecessarily searching for
a tail call - but at least for the musttail case, parsing constraints
should put a practical limit on how far to look?

So what I wonder is whether it would be better to separate
searching for a (musttail) candidate from validation?

We could for example invoke find_tail_calls twice, once to
find a musttail candidate (can there be multiple ones?) and once
to validate and error?  Would that make the delaying less awkward?

> gcc/ChangeLog:
>
> * tree-tailcall.cc (maybe_error_musttail): Declare.
> (suitable_for_tail_opt_p): Take call and report errors.
> (suitable_for_tail_call_opt_p): Take call and report errors.
> (find_tail_calls): Report caller errors after discovery.
> (tree_optimize_tail_calls_1): Remove caller suitableness check.
> ---
>  gcc/tree-tailcall.cc | 62 ++--
>  1 file changed, 43 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
> index 4687e20e61d0..a77fa1511415 100644
> --- a/gcc/tree-tailcall.cc
> +++ b/gcc/tree-tailcall.cc
> @@ -133,14 +133,20 @@ static tree m_acc, a_acc;
>
>  static bitmap tailr_arg_needs_copy;
>
> +static void maybe_error_musttail (gcall *call, const char *err);
> +
>  /* Returns false when the function is not suitable for tail call optimization
> -   from some reason (e.g. if it takes variable number of arguments).  */
> +   from some reason (e.g. if it takes variable number of arguments). CALL
> +   is call to report for.  */
>
>  static bool
> -suitable_for_tail_opt_p (void)
> +suitable_for_tail_opt_p (gcall *call)
>  {
>if (cfun->stdarg)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses stdargs"));
> +  return false;
> +}
>
>return true;
>  }
> @@ -148,35 +154,47 @@ suitable_for_tail_opt_p (void)
>  /* Returns false when the function is not suitable for tail call optimization
> for some reason (e.g. if it takes variable number of arguments).
> This test must pass in addition to suitable_for_tail_opt_p in order to 
> make
> -   tail call discovery happen.  */
> +   tail call discovery happen. CALL is call to report error for.  */
>
>  static bool
> -suitable_for_tail_call_opt_p (void)
> +suitable_for_tail_call_opt_p (gcall *call)
>  {
>tree param;
>
>/* alloca (until we have stack slot life analysis) inhibits
>   sibling call optimizations, but not tail recursion.  */
>if (cfun->calls_alloca)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses alloca"));
> +  return false;
> +}
>
>/* If we are using sjlj exceptions, we may need to add a call to
>   _Unwind_SjLj_Unregister at exit of the function.  Which means
>   that we cannot do any sibcall transformations.  */
>if (targetm_common.except_unwind_info (&global_options) == UI_SJLJ
>&& current_function_has_exception_handlers ())
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses sjlj exceptions"));
> +  return false;
> +}
>
>/* Any function that calls setjmp might have longjmp called from
>   any called function.  ??? We really should represent this
>   properly in the CFG so that this needn't be special cased.  */
>if (cfun->calls_setjmp)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses setjmp"));
> +  return false;
> +}
>
>/* Various targets don't handle tail calls correctly in functions
>   that call __builtin_eh_return.  */
>if (cfun->calls_eh_return)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses __builtin_eh_return"));
> +  return false;
> +}
>
>/* ??? It is OK if the argument of a function is taken in some cases,
>   but not in all cases.  See PR15387 and PR19616.  Revisit for 4.1.  */
> @@ -184,7 +202,10 @@ suitable_for_tail_call_opt_p (void)
> param;
> param = DECL_CHAIN (param))
>  if (TREE_ADDRESSABLE (param))
> -  return false;
> +  {
> +   maybe_error_musttail (call, _("address of caller arguments taken"));
> +   return false;
> +  }
>
>return true;
>  }
> @@ -445,10 +466,12 @@ static live_vars_map *live_vars;
>  static vec live_vars_vec;
>
>  /* Finds tailcalls falling into basic block BB. The list of found tailcalls 
> is
> -   added to the start of RET. When ONLY_MUSTTAIL is set only handle 
> musttail.  */
> +   added to the start of RET. When ONLY_MUSTTAIL is set only handle musttail.
> +   Update OPT_TAILCALLS as output parameter.  */
>
>  static void
> -find_tail_calls (basic_block bb, struct tailcall **ret, 

[RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Robin Dapp
Hi,

in PR115336 we have the following

  vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, 
... }, _482, 0);
  vect_iftmp.44 = vect_patt_391 | { 1, ... };
  .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);

which assumes that a maskload sets the masked-out elements to zero.
RVV does not guarantee this and, unfortunately, leaves it to the
hardware implementation to do basically anything it wants (even keep
the previous value).  In the PR this leads to us reusing a previous
vector register and stale, nonzero values causing an invalid result.

Based on Richard Sandiford's feedback, I started with the attached
patch - it's more an RFC in its current shape and obviously not
tested exhaustively.

The idea is:
 - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
   after a MASK_LOAD if the target requires it.
 - Elide the VCOND_MASK when there is a COND_OP with a replacing else
   value later.
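
Roughly, in terms of the GIMPLE ifcvt would emit (made-up names, just a
sketch of the intent, not actual dump output):

  _1      = .MASK_LOAD (addr_2, 8B, mask_3);
  iftmp_4 = VEC_COND_EXPR <mask_3, _1, { 0, ... }>;

where the VEC_COND_EXPR can be dropped again if a later COND_* operation
(e.g. a .COND_IOR with its own else value) overwrites the masked lanes
anyway.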

Is this, generally, reasonable or is there a better way?

My initial idea was to add an else value to MASK_LOAD.  Richard's
concern was, though, that this might make nonzero else values
appear inexpensive when they are actually not.

Even though I mechanically added match.pd patterns to catch
the most common cases (already at a point where a separate function,
maybe in gimple-match-exports, would make more sense), there is
still significant code-quality fallout.
The regressing cases are often of a form where the VCOND_MASK is
not just a conversion away but rather hidden behind some unmasked
operation.  I'm not sure if we could ever recognize everything that
way without descending very deep.

Regards
 Robin

gcc/ChangeLog:

PR middle-end/115336

* config/riscv/riscv.cc (riscv_preferred_else_value): Add
MASK_LOAD.
* config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO):
Set to true.
* defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New.
* doc/tm.texi: Document.
* doc/tm.texi.in: Document.
* match.pd: Add patterns to allow replacing the else value of a
VEC_COND.
* tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND
after a MASK_LOAD if the target does not guarantee zeroing.
(predicate_statements): Add temporary lhs argument.
* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold
result.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
---
 gcc/config/riscv/riscv.cc |   2 +
 gcc/config/riscv/riscv.h  |   2 +
 gcc/defaults.h|   4 +
 gcc/doc/tm.texi   |   5 +
 gcc/doc/tm.texi.in|   5 +
 gcc/match.pd  | 125 +-
 .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 +++
 gcc/tree-if-conv.cc   |  57 +++-
 gcc/tree-ssa-math-opts.cc |   8 ++
 9 files changed, 217 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c17141d909a..e10bc6824b9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, 
unsigned int nops,
 {
   if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
 return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
+  else if (ifn == IFN_MASK_LOAD)
+return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
 
   return default_preferred_else_value (ifn, vectype, nops, ops);
 }
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 57910eecd3e..dc15eb5e60f 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
 /* Check TLS Descriptors mechanism is selected.  */
 #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
 
+#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 92f3e07f742..6ffbdaea229 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1457,6 +1457,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
 If not, see
 #define DWARF_VERSION_DEFAULT 5
 #endif
 
+#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
+#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0
+#endif
+
 #ifndef USED_FOR_TARGET
 /* Done this way to keep gengtype happy.  */
 #if BITS_PER_UNIT == 8
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index be5543b72f8..2fdebe7fd21 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12683,6 +12683,11 @@ maintainer is familiar with.
 
 @end defmac
 
+@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
+Bla
+
+@end defmac
+
 @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool 
@var{act

Re: [PATCH v8 10/12] Add documentation for musttail attribute

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 9:00 PM Andi Kleen  wrote:
>
> gcc/ChangeLog:

OK.

> * doc/extend.texi: Document [[musttail]]
> ---
>  gcc/doc/extend.texi | 25 +++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b2e41a581dd1..f83e643da19c 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -9921,7 +9921,7 @@ same manner as the @code{deprecated} attribute.
>  @section Statement Attributes
>  @cindex Statement Attributes
>
> -GCC allows attributes to be set on null statements.  @xref{Attribute Syntax},
> +GCC allows attributes to be set on statements.  @xref{Attribute Syntax},
>  for details of the exact syntax for using attributes.  Other attributes are
>  available for functions (@pxref{Function Attributes}), variables
>  (@pxref{Variable Attributes}), labels (@pxref{Label Attributes}), enumerators
> @@ -9978,6 +9978,25 @@ foo (int x, int y)
>  @code{y} is not actually incremented and the compiler can but does not
>  have to optimize it to just @code{return 42 + 42;}.
>
> +@cindex @code{musttail} statement attribute
> +@item musttail
> +
> +The @code{gnu::musttail} or @code{clang::musttail} attribute
> +can be applied to a @code{return} statement with a return-value expression
> +that is a function call.  It asserts that the call must be a tail call that
> +does not allocate extra stack space, so it is safe to use tail recursion
> +to implement long running loops.
> +
> +@smallexample
> +[[gnu::musttail]] return foo();
> +@end smallexample
> +
> +If the compiler cannot generate a @code{musttail} tail call it will report
> +an error. On some targets tail calls may never be supported.
> +Tail calls cannot reference locals in memory, which may affect
> +builds without optimization when passing small structures, or passing
> +or returning large structures. Enabling -O1 or -O2 can improve
> +the success of tail calls.
>  @end table
>
>  @node Attribute Syntax
> @@ -10101,7 +10120,9 @@ the constant expression, if present.
>
>  @subsubheading Statement Attributes
>  In GNU C, an attribute specifier list may appear as part of a null
> -statement.  The attribute goes before the semicolon.
> +statement. The attribute goes before the semicolon.
> +Some attributes in new style syntax are also supported
> +on non-null statements.
>
>  @subsubheading Type Attributes
>
> --
> 2.45.2
>


Re: [PATCH v8 11/12] Dump reason for missing tail call into dump file

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 9:01 PM Andi Kleen  wrote:
>
> gcc/ChangeLog:

OK.  Btw, I wonder if you can squash all error/dump related patches to
tree-tailcall.cc

> * tree-tailcall.cc (maybe_error_musttail): Print reason to
> dump_file.
> (find_tail_calls): Print gimple stmt or other reasons that stop
> the search for tail calls into dump file.
> ---
>  gcc/tree-tailcall.cc | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
> index a77fa1511415..f69a9ad40bda 100644
> --- a/gcc/tree-tailcall.cc
> +++ b/gcc/tree-tailcall.cc
> @@ -442,6 +442,11 @@ maybe_error_musttail (gcall *call, const char *err)
>gimple_call_set_must_tail (call, false); /* Avoid another error.  */
>gimple_call_set_tail (call, false);
>  }
> +  if (dump_file)
> +{
> +  print_gimple_stmt (dump_file, call, 0, TDF_SLIM);
> +  fprintf (dump_file, "Cannot convert: %s\n", err);
> +}
>  }
>
>  /* Count succ edges for BB and return in NUM_OTHER and NUM_EH.  */
> @@ -492,7 +497,12 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
> bool only_musttail,
>/* Allow EH edges so that we can give a better
>  error message later.  */
>if (num_other != 1)
> -   return;
> +   {
> + if (dump_file)
> +   fprintf (dump_file, "Basic block %d has %d eh / %d other edges\n",
> +  bb->index, num_eh, num_other);
> + return;
> +   }
>  }
>
>bool bad_stmt = false;
> @@ -537,6 +547,11 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
> bool only_musttail,
>if (gimple_references_memory_p (stmt)
>   || gimple_has_volatile_ops (stmt))
> {
> + if (dump_file)
> +   {
> + fprintf (dump_file, "Cannot handle ");
> + print_gimple_stmt (dump_file, stmt, 0);
> +   }
>   bad_stmt = true;
> }
>  }
> --
> 2.45.2
>


Re: [PATCH v8 12/12] Mark expand musttail error messages for translation

2024-07-05 Thread Richard Biener
On Sat, Jun 22, 2024 at 8:58 PM Andi Kleen  wrote:
>
> The musttail error messages are reported to the user, so must be
> translated.

OK.

> gcc/ChangeLog:
>
> * calls.cc (initialize_argument_information): Mark messages
> for translation.
> (can_implement_as_sibling_call_p): Dito.
> (expand_call): Dito.
> ---
>  gcc/calls.cc | 56 ++--
>  1 file changed, 28 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 883eb9971257..f28c58217fdf 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1420,9 +1420,9 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
> {
>   *may_tailcall = false;
>   maybe_complain_about_tail_call (exp,
> - "a callee-copied argument 
> is"
> - " stored in the current"
> - " function's frame");
> + _("a callee-copied argument 
> is"
> +   " stored in the current"
> +   " function's frame"));
> }
>
>   args[i].tree_value = build_fold_addr_expr_loc (loc,
> @@ -1489,8 +1489,8 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
>   type = TREE_TYPE (args[i].tree_value);
>   *may_tailcall = false;
>   maybe_complain_about_tail_call (exp,
> - "argument must be passed"
> - " by copying");
> + _("argument must be passed"
> +   " by copying"));
> }
>   arg.pass_by_reference = true;
> }
> @@ -2508,8 +2508,8 @@ can_implement_as_sibling_call_p (tree exp,
>  {
>maybe_complain_about_tail_call
> (exp,
> -"machine description does not have"
> -" a sibcall_epilogue instruction pattern");
> +_("machine description does not have"
> +  " a sibcall_epilogue instruction pattern"));
>return false;
>  }
>
> @@ -2519,7 +2519,7 @@ can_implement_as_sibling_call_p (tree exp,
>   sibling calls will return a structure.  */
>if (structure_value_addr != NULL_RTX)
>  {
> -  maybe_complain_about_tail_call (exp, "callee returns a structure");
> +  maybe_complain_about_tail_call (exp, _("callee returns a structure"));
>return false;
>  }
>
> @@ -2528,8 +2528,8 @@ can_implement_as_sibling_call_p (tree exp,
>if (!targetm.function_ok_for_sibcall (fndecl, exp))
>  {
>maybe_complain_about_tail_call (exp,
> - "target is not able to optimize the"
> - " call into a sibling call");
> + _("target is not able to optimize the"
> +   " call into a sibling call"));
>return false;
>  }
>
> @@ -2537,18 +2537,18 @@ can_implement_as_sibling_call_p (tree exp,
>   optimized.  */
>if (flags & ECF_RETURNS_TWICE)
>  {
> -  maybe_complain_about_tail_call (exp, "callee returns twice");
> +  maybe_complain_about_tail_call (exp, _("callee returns twice"));
>return false;
>  }
>if (flags & ECF_NORETURN)
>  {
> -  maybe_complain_about_tail_call (exp, "callee does not return");
> +  maybe_complain_about_tail_call (exp, _("callee does not return"));
>return false;
>  }
>
>if (TYPE_VOLATILE (TREE_TYPE (TREE_TYPE (addr
>  {
> -  maybe_complain_about_tail_call (exp, "volatile function type");
> +  maybe_complain_about_tail_call (exp, _("volatile function type"));
>return false;
>  }
>
> @@ -2567,7 +2567,7 @@ can_implement_as_sibling_call_p (tree exp,
>   the argument areas are shared.  */
>if (fndecl && decl_function_context (fndecl) == current_function_decl)
>  {
> -  maybe_complain_about_tail_call (exp, "nested function");
> +  maybe_complain_about_tail_call (exp, _("nested function"));
>return false;
>  }
>
> @@ -2579,8 +2579,8 @@ can_implement_as_sibling_call_p (tree exp,
> crtl->args.size - crtl->args.pretend_args_size))
>  {
>maybe_complain_about_tail_call (exp,
> - "callee required more stack slots"
> - " than the caller");
> + _("callee required more stack slots"
> +   " than the caller"));
>return false;
>  }
>
> @@ -2594,15 +2594,15 @@ can_implement_as_sibling_call_p (tree exp,
> crtl->arg

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-07-05 Thread Richard Biener
On Fri, Jul 5, 2024 at 11:09 AM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Richard Biener  writes:
> >> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> >>  wrote:
> >>>
> >>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
> >>>  wrote:
> >>> >
> >>> >
> >>> >
> >>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford 
> >>> > > :
> >>> > >
> >>> > > Richard Biener  writes:
> >>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> >>> > >>>  wrote:
> >>> > >>>
> >>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt  
> >>> > >>> wrote:
> >>> > 
> >>> >  for the testcase in the PR115406, here is part of the dump.
> >>> > 
> >>> >   char D.4882;
> >>> >   vector(1)  _1;
> >>> >   vector(1) signed char _2;
> >>> >   char _5;
> >>> > 
> >>> >    :
> >>> >   _1 = { -1 };
> >>> > 
> >>> >  When assign { -1 } to vector(1} {signed-boolean:8},
> >>> >  Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit 
> >>> >  of dest
> >>> >  with each vector elemnet. But i think the bit setting should only 
> >>> >  apply for
> >>> >  TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> >>> >  , it will be assigned as -1, instead of 1.
> >>> >  Is there any specific reason vector(1)  is 
> >>> >  handled
> >>> >  differently from vector<1> ?
> >>> > 
> >>> >  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>> >  Ok for trunk?
> >>> > >>>
> >>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> >>> > >>> code should work for 8 bit
> >>> > >>> entities as well, it seems we only set the LSB of each element in 
> >>> > >>> the
> >>> > >>> "mask".  ISTR that SVE
> >>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> >>> > >>> maybe that's why
> >>> > >>> <= BITS_PER_UNIT.
> >>> > >
> >>> > > Yeah.
> >>> >
> >>> > So is it necessary that only one bit is set for SVE?
> >
> > TBH I can't remember now.  It matches what SVE instructions produce, and
> > lines up with the associated RTL code (which at the time was SVE-specific).
> > But when dealing with multibyte elements, upper predicate element bits
> > are ignored on read, so matching the instructions might not matter.
> >
> >>> > >>> So maybe instead of just setting one bit in
> >>> > >>>
> >>> > >>>  ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> >>> > >>>
> >>> > >>> we should set elt_bits bits, aka (without testing)
> >>> > >>>
> >>> > >>>  ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> >>> > >>> % BITS_PER_UNIT);
> >>> > >>>
> >>> > >>> ?
> >>> > >>
> >>> > >> Alternatively
> >>> > >>
> >>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> >>> > >>  && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> >>> > >>
> >>> > >> should be amended with
> >>> > >>
> >>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> >>> > >
> >>> > > How about:
> >>> > >
> >>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == 
> >>> > > MODE_VECTOR_BOOL)
> >>> > >{
> >>> > >  gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> >>> > >
> >>> > > ?
> >>> >
> >>> > Note the path is also necessary for avx512 and gcn mask modes which are 
> >>> > integer modes.
> >>> >
> >>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, 
> >>> > > especially
> >>> > > since it can change with the target attribute?  (At least 
> >>> > > TYPE_MODE_RAW
> >>> > > would be stable.)
> >>> >
> >>> > That’s a good question and also related to GCC vector extension which 
> >>> > can result in both BLKmode and integer modes to be used.  But I’m not 
> >>> > sure how we expose masks to the middle end here.  A too large vector 
> >>> > bool could be lowered to AVX512 mode.  Maybe we should simply reject 
> >>> > interpret/encode of BLKmode vectors and make sure to never assign 
> >>> > integer modes to vector bools (if the target didn’t specify that mode)?
> >>> >
> >>> > I guess some test coverage would be nice here.
> >>>
> >>> To continue on that, we do not currently have a way to capture a
> >>> vector comparison output
> >>> but the C++ frontend has vector ?:
> >>>
> >>> typedef int v8si __attribute__((vector_size(32)));
> >>>
> >>> void foo (v8si *a, v8si *b, v8si *c)
> >>> {
> >>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : 
> >>> (v8si){0,0,0,0,0,0,0,0};
> >>> }
> >>>
> >>> with SSE2 we get a  temporary, with AVX512 enabled
> >>> that becomes .  When we enlarge the vector to size 128
> >>> then even with AVX512 enabled I see  here and
> >>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
> >>> just a missed optimization).  But note that to decompose this into two
> >>> AVX512 vectors the temporary would have to change from 
> >>> elements to .
> >>>
> >>> The unsupported vector bool types have BLKmode so far.
> >>>
> >>> But for example on i?86-linux with -mno-sse (like -march=i586) f

Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Richard Biener
On Fri, 5 Jul 2024, Robin Dapp wrote:

> Hi,
> 
> in PR115336 we have the following
> 
>   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, 
> ... }, _482, 0);
>   vect_iftmp.44 = vect_patt_391 | { 1, ... };
>   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
> 
> which assumes that a maskload sets the masked-out elements to zero.

To me this looks like mis-applying of match.pd:6083?

Applying pattern match.pd:6083, gimple-match-1.cc:45749
gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
new phi replacement stmt
iftmp.0_62 = iftmp.0_61 | _219;

so originally it wasn't

  iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
  iftmp.0_62 = iftmp.0_61 | _219;

but sth like

  iftmp.0_62 =  ? _219 : iftmp.0_61;

and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
if-conversion always turn the COND_EXPR into a .COND_IOR here?


> RVV does not guarantee this and, unfortunately, leaves it to the
> hardware implementation to do basically anything it wants (even keep
> the previous value).  In the PR this leads to us reusing a previous
> vector register and stale, nonzero values causing an invalid result.
> 
> Based on a Richard Sandiford's feedback I started with the attached
> patch - it's more an RFC in its current shape and obviously not
> tested exhaustively.
> 
> The idea is:
>  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>after a MASK_LOAD if the target requires it.
>  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>value later.
> 
> Is this, generally, reasonable or is there a better way?
> 
> My initial idea was to add an else value to MASK_LOAD.  Richard's
> concern was, though, that this might make nonzero else values
> appear inexpensive when they are actually not.
> 
> Even though I, mechanically, added match.pd patterns to catch
> the most common cases (already at a point where a separate function
> maybe in gimple-match-exports? would make more sense), there is
> still significant code-quality fallout.
> The regressing cases are often of a form where the VCOND_MASK is
> not just a conversion away but rather hidden behind some unmasked
> operation.  I'm not sure if we could ever recognize everything that
> way without descending very deep.
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   PR middle-end/115336
> 
>   * config/riscv/riscv.cc (riscv_preferred_else_value): Add
>   MASK_LOAD.
>   * config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO):
>   Set to true.
>   * defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New.
>   * doc/tm.texi: Document.
>   * doc/tm.texi.in: Document.
>   * match.pd: Add patterns to allow replacing the else value of a
>   VEC_COND.
>   * tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND
>   after a MASK_LOAD if the target does not guarantee zeroing.
>   (predicate_statements): Add temporary lhs argument.
>   * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold
>   result.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/pr115336.c: New test.
> ---
>  gcc/config/riscv/riscv.cc |   2 +
>  gcc/config/riscv/riscv.h  |   2 +
>  gcc/defaults.h|   4 +
>  gcc/doc/tm.texi   |   5 +
>  gcc/doc/tm.texi.in|   5 +
>  gcc/match.pd  | 125 +-
>  .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 +++
>  gcc/tree-if-conv.cc   |  57 +++-
>  gcc/tree-ssa-math-opts.cc |   8 ++
>  9 files changed, 217 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> 
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index c17141d909a..e10bc6824b9 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree 
> vectype, unsigned int nops,
>  {
>if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
>  return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> +  else if (ifn == IFN_MASK_LOAD)
> +return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
>  
>return default_preferred_else_value (ifn, vectype, nops, ops);
>  }
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 57910eecd3e..dc15eb5e60f 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls 
> (void);
>  /* Check TLS Descriptors mechanism is selected.  */
>  #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
>  
> +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1
> +
>  #endif /* ! GCC_RISCV_H */
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index 92f3e07f742..6ffbdaea229 100644
> 

Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Richard Biener
On Fri, 5 Jul 2024, Richard Biener wrote:

> On Fri, 5 Jul 2024, Robin Dapp wrote:
> 
> > Hi,
> > 
> > in PR115336 we have the following
> > 
> >   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 
> > 0, ... }, _482, 0);
> >   vect_iftmp.44 = vect_patt_391 | { 1, ... };
> >   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
> > 
> > which assumes that a maskload sets the masked-out elements to zero.
> 
> To me this looks like mis-applying of match.pd:6083?
> 
> Applying pattern match.pd:6083, gimple-match-1.cc:45749
> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
> new phi replacement stmt
> iftmp.0_62 = iftmp.0_61 | _219;
> 
> so originally it wasn't
> 
>   iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>   iftmp.0_62 = iftmp.0_61 | _219;
> 
> but sth like
> 
>   iftmp.0_62 =  ? _219 : iftmp.0_61;
> 
> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
> if-conversion always turn the COND_EXPR into a .COND_IOR here?
> 
> > RVV does not guarantee this and, unfortunately, leaves it to the
> > hardware implementation to do basically anything it wants (even keep
> > the previous value).  In the PR this leads to us reusing a previous
> > vector register and stale, nonzero values causing an invalid result.
> > 
> > Based on a Richard Sandiford's feedback I started with the attached
> > patch - it's more an RFC in its current shape and obviously not
> > tested exhaustively.
> > 
> > The idea is:
> >  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
> >after a MASK_LOAD if the target requires it.
> >  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
> >value later.
> > 
> > Is this, generally, reasonable or is there a better way?
> > 
> > My initial idea was to add an else value to MASK_LOAD.  Richard's
> > concern was, though, that this might make nonzero else values
> > appear inexpensive when they are actually not.

In general I'd agree that we want an else value for .MASK_LOAD
and also possibly for .MASK_STORE (but then we need to represent an
else value that says do-not-change/merge).

> > Even though I, mechanically, added match.pd patterns to catch
> > the most common cases (already at a point where a separate function
> > maybe in gimple-match-exports? would make more sense), there is
> > still significant code-quality fallout.
> > The regressing cases are often of a form where the VCOND_MASK is
> > not just a conversion away but rather hidden behind some unmasked
> > operation.  I'm not sure if we could ever recognize everything that
> > way without descending very deep.

I think the vectorizer would need to track masks in a better way;
much like we have the SLP permute optimization phase, we'd have a
mask optimization phase.

Richard.

> > Regards
> >  Robin
> > 
> > gcc/ChangeLog:
> > 
> > PR middle-end/115336
> > 
> > * config/riscv/riscv.cc (riscv_preferred_else_value): Add
> > MASK_LOAD.
> > * config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO):
> > Set to true.
> > * defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New.
> > * doc/tm.texi: Document.
> > * doc/tm.texi.in: Document.
> > * match.pd: Add patterns to allow replacing the else value of a
> > VEC_COND.
> > * tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND
> > after a MASK_LOAD if the target does not guarantee zeroing.
> > (predicate_statements): Add temporary lhs argument.
> > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold
> > result.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/rvv/autovec/pr115336.c: New test.
> > ---
> >  gcc/config/riscv/riscv.cc |   2 +
> >  gcc/config/riscv/riscv.h  |   2 +
> >  gcc/defaults.h|   4 +
> >  gcc/doc/tm.texi   |   5 +
> >  gcc/doc/tm.texi.in|   5 +
> >  gcc/match.pd  | 125 +-
> >  .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 +++
> >  gcc/tree-if-conv.cc   |  57 +++-
> >  gcc/tree-ssa-math-opts.cc |   8 ++
> >  9 files changed, 217 insertions(+), 11 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> > 
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index c17141d909a..e10bc6824b9 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree 
> > vectype, unsigned int nops,
> >  {
> >if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
> >  return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> > +  else if (ifn == IFN_MASK_LOAD)
> > +return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> >  
> >return default_preferred_else_value (ifn, vectype, nops, 

[PATCH v1] Match: Support form 2 for the .SAT_TRUNC

2024-07-05 Thread pan2 . li
From: Pan Li 

This patch would like to add form 2 support for the .SAT_TRUNC.  Aka:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return overflow ? (NT)-1 : (NT)x;  \
  }

DEF_SAT_U_TRUC_FMT_2(uint32, uint64)

Before this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
   6   │ {
   7   │   uint32_t _1;
   8   │   long unsigned int _3;
   9   │
  10   │ ;;   basic block 2, loop depth 0
  11   │ ;;pred:   ENTRY
  12   │   _3 = MIN_EXPR ;
  13   │   _1 = (uint32_t) _3;
  14   │   return _1;
  15   │ ;;succ:   EXIT
  16   │
  17   │ }

After this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
   6   │ {
   7   │   uint32_t _1;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _1 = .SAT_TRUNC (x_2(D)); [tail call]
  12   │   return _1;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }
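
For reference, the identity the new match.pd entry relies on can be checked
at the source level.  Below is a standalone C sketch (not part of the patch;
the function names are made up) showing that form 2 and the MIN_EXPR form
the earlier passes produce are equivalent:

#include <stdint.h>

/* Form 2 as written in the source.  */
static uint32_t
trunc_form2 (uint64_t x)
{
  _Bool overflow = x > (uint64_t) (uint32_t) -1;
  return overflow ? (uint32_t) -1 : (uint32_t) x;
}

/* What the earlier passes turn form 2 into: a MIN plus a truncating
   conversion, which is exactly what the new pattern matches as
   .SAT_TRUNC.  For x <= UINT32_MAX both functions return x, otherwise
   both return UINT32_MAX, so they always agree.  */
static uint32_t
trunc_min (uint64_t x)
{
  uint64_t m = x < 0xffffffffu ? x : 0xffffffffu;   /* MIN_EXPR <x, 4294967295> */
  return (uint32_t) m;
}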

The below test suites have passed for this patch:
1. The x86 bootstrap test.
2. The x86 full regression test.
3. The rv64gcv full regression test.

gcc/ChangeLog:

* match.pd: Add form 2 for .SAT_TRUNC.
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
Add new case NOP_EXPR,  and try to match SAT_TRUNC.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 17 -
 gcc/tree-ssa-math-opts.cc |  4 
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4edfa2ae2c9..3759c64d461 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3234,7 +3234,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
-/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
+/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
 (match (unsigned_integer_sat_trunc @0)
  (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
@@ -3250,6 +3250,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
 
+/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
+   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
+(match (unsigned_integer_sat_trunc @0)
+ (convert (min @0 INTEGER_CST@1))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0)))
+ (with
+  {
+   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
+   unsigned otype_precision = TYPE_PRECISION (type);
+   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
+   wide_int int_cst = wi::to_wide (@1, itype_precision);
+  }
+  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index a35caf5f058..ac86be8eb94 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -6170,6 +6170,10 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
  match_unsigned_saturation_sub (&gsi, as_a (stmt));
  break;
 
+   case NOP_EXPR:
+ match_unsigned_saturation_trunc (&gsi, as_a (stmt));
+ break;
+
default:;
}
}
-- 
2.34.1



[RFC WIP] RAW_DATA_CST for #embed optimization

2024-07-05 Thread Jakub Jelinek
Hi!

Just to show what I'm working on on top of the already posted #embed
patches.
Working on the C FE only for now, the patch emits CPP_EMBED tokens
when preprocessing C (but still with the important simplification that
CPP_EMBED token is always preceded by {CPP_NUMBER,CPP_EMBED} CPP_COMMA
and followed by CPP_COMMA {CPP_NUMBER,CPP_EMBED}, without that the FE code
would be significantly larger).

The patch introduces RAW_DATA_CST tree, which is shorthand for a (possibly huge)
sequence of INTEGER_CSTs in CONSTRUCTOR of ARRAY_TYPE.  TREE_TYPE of this
is the type of each element in the sequence (rather than say some array type
like in the STRING_CST case) and it intentionally doesn't own the underlying
data, but has (so far unused) RAW_DATA_OWNER subtree for the GC owner of
that data.  If RAW_DATA_OWNER is NULL, the data is owned directly by libcpp
buffers, but e.g. for #embed in headers compiled into PCH I think we'll need
to add a STRING_CST as the owner of the data.
This way, it is cheap to peel them off or split them apart (see e.g. the
testcases which create a possibly huge RAW_DATA_CST and then use a designated
initializer to store some other element in the middle of it).
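
For illustration, the kind of split such a testcase exercises might look
like this (a made-up sketch; the file name and the overridden index are
just placeholders, not the actual testcase contents):

const unsigned char a[] = {
  /* With this patch the C FE represents the whole expansion below as a
     single (possibly huge) RAW_DATA_CST.  */
#embed "blob.bin"
  /* A designated initializer overriding one element in the middle; the
     RAW_DATA_CST is cheaply split around it instead of being expanded
     into individual INTEGER_CSTs.  */
  , [1000] = 0x42
};
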
The new testcases already work with the patch; what is missing is handling
it where needed in the middle-end (it surely needs to be handled e.g. in the
gimplifier if somebody uses #embed in automatic var initializers, in IPA-ICF
hashing, or when trying to extract a constant from some large array
initializer or LTO), plus decide what to do with CPP_EMBED in the FE (one
possibility is to keep it as CPP_EMBED only when peeking non-raw token
when some new parser flag is set and otherwise already during token peeking
split it off into individual CPP_NUMBER tokens, another possibility if
we manage to come up with all spots where CPP_EMBED can validly appear in
the grammar (my current understanding besides the initializers already
handled is in function arguments or attributes (both using
c_parser_expr_list, so that function could be taught to parse CPP_EMBED
into list of INTEGER_CSTs), in comma expressions (so teach
c_parser_expression about it, but in that case I wonder if we just don't
need to ensure correct warnings and that is it, because whether a comma
expression is something,somethingelse or
something,1,2,3,4,5,6,7,8,somethingelse
shouldn't be significantly different) and I think in OpenMP sizes clause
(but that could be changed to use c_parser_expr_list); for C++ I think
template arguments, or multi-dimensional array indices are other spots).
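
As a concrete (made-up) illustration of the two C positions mentioned
above, the expression-list and comma-expression cases would look roughly
like this, assuming "four.bin" is a 4-byte input file:

extern void sink (int, int, int, int);

void
in_call_args (void)
{
  /* c_parser_expr_list position: the directive expands into the
     argument list.  */
  sink (
#embed "four.bin"
  );
}

int
in_comma_expr (void)
{
  /* Comma-expression position: the value is the last embedded byte.  */
  return (0,
#embed "four.bin"
  );
}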

I don't have a bootstrapped compiler with this patch yet, but -O0 built
cc1 can compile at -O0
unsigned char a[] = {
#embed "cc1plus"
};
where cc1plus is a 261M binary, compiles in under 20s for me (most time spent
in assemble_string).  If I preprocess the same with C++ (so that it
just expands it into a sequence of 261M numbers), I had to Ctrl-C the
compilation after a couple of minutes.  Obviously I'll do a proper
test with optimized compiler later.

Anyway, thoughts on this before I spend too much time on it?

And while clang 19 now claims to have #embed implemented, most of the
testcases in the patch are miscompiled by it.

--- libcpp/files.cc.jj  2024-07-03 14:52:12.231817485 +0200
+++ libcpp/files.cc 2024-07-03 15:44:39.248913032 +0200
@@ -1241,7 +1241,10 @@ finish_embed (cpp_reader *pfile, _cpp_fi
 limit = params->limit;
 
   size_t embed_tokens = 0;
-  if (CPP_OPTION (pfile, directives_only) && limit >= 64)
+  if ((CPP_OPTION (pfile, directives_only)
+   || !CPP_OPTION (pfile, cplusplus))
+  && CPP_OPTION (pfile, lang) != CLK_ASM
+  && limit >= 64)
 embed_tokens = ((limit - 2) / INT_MAX) + (((limit - 2) % INT_MAX) != 0);
 
   size_t max = INTTYPE_MAXIMUM (size_t) / sizeof (cpp_token);
--- gcc/varasm.cc.jj2024-05-07 18:10:10.674871087 +0200
+++ gcc/varasm.cc   2024-07-04 14:58:33.570465411 +0200
@@ -4875,6 +4875,7 @@ initializer_constant_valid_p_1 (tree val
 case FIXED_CST:
 case STRING_CST:
 case COMPLEX_CST:
+case RAW_DATA_CST:
   return null_pointer_node;
 
 case ADDR_EXPR:
@@ -5468,6 +5469,9 @@ array_size_for_constructor (tree val)
 {
   if (TREE_CODE (index) == RANGE_EXPR)
index = TREE_OPERAND (index, 1);
+  if (value && TREE_CODE (value) == RAW_DATA_CST)
+   index = size_binop (PLUS_EXPR, index,
+   size_int (RAW_DATA_LENGTH (value) - 1));
   if (max_index == NULL_TREE || tree_int_cst_lt (max_index, index))
max_index = index;
 }
@@ -5659,6 +5663,12 @@ output_constructor_regular_field (oc_loc
   /* Output the element's initial value.  */
   if (local->val == NULL_TREE)
 assemble_zeros (fieldsize);
+  else if (local->val && TREE_CODE (local->val) == RAW_DATA_CST)
+{
+  fieldsize *= RAW_DATA_LENGTH (local->val);
+  assemble_string (RAW_DATA_POINTER (local->val),
+  RAW_DATA_LENGTH (local->val));
+}
   else
 fieldsize = output_constant (local->

[COMMITED] MAINTAINERS: Fix order in DCO

2024-07-05 Thread Filip Kastl
Hi,

I've noticed a wrong ordering in the Contributing under the DCO section of the
MAINTAINERS file.  I'm committing the fix as obvious.

Filip Kastl

-- 8< --

ChangeLog:

* MAINTAINERS: Fix order in Contributing under the DCO.

Signed-off-by: Filip Kastl 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b4739f29107..762b91256c4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -766,6 +766,7 @@ Robin Dapp  

 Robin Dapp 
 Michal Jires   
 Matthias Kretz 
+Prathamesh Kulkarni
 Tim Lange  
 Jeff Law   
 Jeff Law   
@@ -791,4 +792,3 @@ Jonathan Wakely 

 Alexander Westbrooks   
 Chung-Ju Wu
 Pengxuan Zheng 
-Prathamesh Kulkarni
-- 
2.45.2



Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Richard Sandiford
Robin Dapp  writes:
> Hi,
>
> in PR115336 we have the following
>
>   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, 
> ... }, _482, 0);
>   vect_iftmp.44 = vect_patt_391 | { 1, ... };
>   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
>
> which assumes that a maskload sets the masked-out elements to zero.
> RVV does not guarantee this and, unfortunately, leaves it to the
> hardware implementation to do basically anything it wants (even keep
> the previous value).  In the PR this leads to us reusing a previous
> vector register and stale, nonzero values causing an invalid result.
>
> Based on a Richard Sandiford's feedback I started with the attached
> patch - it's more an RFC in its current shape and obviously not
> tested exhaustively.
>
> The idea is:
>  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>after a MASK_LOAD if the target requires it.
>  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>value later.
>
> Is this, generally, reasonable or is there a better way?
>
> My initial idea was to add an else value to MASK_LOAD.  Richard's
> concern was, though, that this might make nonzero else values
> appear inexpensive when they are actually not.

Yeah, but like I said off-list, I still think an else operand is the way
to go.  It would be quite a bit of work to retrofit at this point, but
it feels like the right long-term solution.

FTR, my concern & suggestion was:

  I suppose the difficulty is that we might make:

MASK_LOAD (mask, ptr, some-arbitrary-else-value)

  seem as cheap as:

MASK_LOAD (mask, ptr, { 0, 0,. ... 0})

  which definitely isn't the case for SVE (and I'm guessing also
  for AVX).  It would be better to keep the:

COND_EXPR (mask, ..., some-arbitrary-else-value)

  separate and try to optimise it with surrounding code.

  Perhaps we could use the optab's insn predicates to test whether
  a given else value is acceptable?  We already do something similar
  for the gather load arguments.

Since then, there's been discussion about taking the same
predicated-based approach for vec_cmp.

That is, we'd add the else value to the maskload optab and MASK_LOAD
internal function, but use predicates to test whether a target supports
a particular else value.

Thanks,
Richard


Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 5 Jul 2024, Richard Biener wrote:
>
>> On Fri, 5 Jul 2024, Robin Dapp wrote:
>> 
>> > Hi,
>> > 
>> > in PR115336 we have the following
>> > 
>> >   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 
>> > 0, ... }, _482, 0);
>> >   vect_iftmp.44 = vect_patt_391 | { 1, ... };
>> >   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
>> > 
>> > which assumes that a maskload sets the masked-out elements to zero.
>> 
>> To me this looks like mis-applying of match.pd:6083?
>> 
>> Applying pattern match.pd:6083, gimple-match-1.cc:45749
>> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
>> new phi replacement stmt
>> iftmp.0_62 = iftmp.0_61 | _219;
>> 
>> so originally it wasn't
>> 
>>   iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>>   iftmp.0_62 = iftmp.0_61 | _219;
>> 
>> but sth like
>> 
>>   iftmp.0_62 =  ? _219 : iftmp.0_61;
>> 
>> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
>> if-conversion always turn the COND_EXPR into a .COND_IOR here?
>> 
>> > RVV does not guarantee this and, unfortunately, leaves it to the
>> > hardware implementation to do basically anything it wants (even keep
>> > the previous value).  In the PR this leads to us reusing a previous
>> > vector register and stale, nonzero values causing an invalid result.
>> > 
>> > Based on a Richard Sandiford's feedback I started with the attached
>> > patch - it's more an RFC in its current shape and obviously not
>> > tested exhaustively.
>> > 
>> > The idea is:
>> >  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>> >after a MASK_LOAD if the target requires it.
>> >  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>> >value later.
>> > 
>> > Is this, generally, reasonable or is there a better way?
>> > 
>> > My initial idea was to add an else value to MASK_LOAD.  Richard's
>> > concern was, though, that this might make nonzero else values
>> > appear inexpensive when they are actually not.
>
> In general I'd agree that we want an else value for .MASK_LOAD
> and also possibly for .MASK_STORE (but then we need to represent an
> else value that says do-not-change/merge).

Not sure about MASK_STORE.  AFAICT, an else value for MASK_STORE would
be directly equivalent to:

  tmp = vec_cond_expr ;
  *ptr = tmp;

and IMO it'd be better to keep representing it like that instead.

That is, MASK_STORE is changing something that already has a definition
(the target memory), and so there's no ambiguity in having a masked
operation that only changes part of the target.

But for MASK_LOAD and other masked vector-producing functions, we're
creating a new full vector based on a partial vector operation.
There's no existing target to modify, and so we need to get the
definition of the inactive elements from somewhere else.

Thanks,
Richard


[PATCH v2] Vect: Distribute truncation into .SAT_SUB operands

2024-07-05 Thread pan2 . li
From: Pan Li 

To get better vectorized code for .SAT_SUB, we would like to avoid the
truncating operation for the assignment.  For example, as below.

unsigned int _1;
unsigned int _2;
_9 = (unsigned short int).SAT_SUB (_1, _2);

If we make sure that _1 is in the range of unsigned short int, such
as with a def similar to:

_1 = (unsigned short int)_4;

Then we can distribute the truncation operation to:

_3 = MIN_EXPR (_2, 65535);
_9 = .SAT_SUB ((unsigned short int)_1, (unsigned short int)_3);
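
As a scalar sanity check (a standalone C sketch, not part of the patch;
the function names are made up), the clamp is value-preserving whenever
the first operand already fits in the narrow type:

#include <stdint.h>

/* Reference: the original narrowing saturating subtraction.  */
static uint16_t
sat_sub_ref (uint16_t a, unsigned int b)
{
  return (uint16_t) (a >= b ? a - b : 0);
}

/* Distributed form: clamp b with MIN first, then subtract narrowly.
   For b > 65535 both versions yield 0 (a can be at most 65535), and
   for b <= 65535 the clamp is a no-op, so the results always agree.  */
static uint16_t
sat_sub_dist (uint16_t a, unsigned int b)
{
  uint16_t bt = (uint16_t) (b < 65535 ? b : 65535);   /* MIN_EXPR (b, 65535) */
  return a >= bt ? a - bt : 0;
}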

Let's take RISC-V vector as an example to show the changes.  For the below
sample code:

__attribute__((noinline))
void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  uint16_t *p = x;

  do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0);
  } while (--n);
}

Before this patch:
  ...
  .L3:
  vle16.v   v1,0(a3)
  vrsub.vx  v5,v2,t1
  mvt3,a4
  addw  a4,a4,t5
  vrgather.vv   v3,v1,v5
  vsetvli   zero,zero,e32,m1,ta,ma
  vzext.vf2 v1,v3
  vssubu.vx v1,v1,a1
  vsetvli   zero,zero,e16,mf2,ta,ma
  vncvt.x.x.w   v1,v1
  vrgather.vv   v3,v1,v5
  vse16.v   v3,0(a3)
  sub   a3,a3,t4
  bgtu  t6,a4,.L3
  ...

After this patch:
test:
  ...
  .L3:
  vle16.v   v3,0(a3)
  vrsub.vx  v5,v2,a6
  mva7,a4
  addw  a4,a4,t3
  vrgather.vv   v1,v3,v5
  vssubu.vv v1,v1,v6
  vrgather.vv   v3,v1,v5
  vse16.v   v3,0(a3)
  sub   a3,a3,t1
  bgtu  t4,a4,.L3
  ...

The below test suites have passed for this patch:
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_sat_sub_pattern_distribute):
Add new func impl to perform the truncation distribution.
(vect_recog_sat_sub_pattern): Perform above optimize before
generate .SAT_SUB call.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-patterns.cc | 75 +++
 1 file changed, 75 insertions(+)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 86e893a1c43..90449bd0ddd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4566,6 +4566,79 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
   return NULL;
 }
 
+/*
+ * Try to distribute the truncation for the .SAT_SUB pattern, which mostly
+ * occurs in the benchmark zip.  Aka:
+ *
+ *   unsigned int _1;
+ *   unsigned int _2;
+ *   _9 = (unsigned short int).SAT_SUB (_1, _2);
+ *
+ *   if _1 is known to be in the range of unsigned short int.  For example
+ *   there is a def _1 = (unsigned short int)_4.  Then we can distribute the
+ *   truncation to:
+ *
+ *   _3 = MIN (65535, _2);
+ *   _9 = .SAT_SUB (_4, (unsigned short int)_3);
+ *
+ *   Then, we can generate better vectorized code and avoid the unnecessary narrowing
+ *   stmt during vectorization.
+ */
+static void
+vect_recog_sat_sub_pattern_distribute (vec_info *vinfo,
+  stmt_vec_info stmt_vinfo,
+  gimple *stmt, tree lhs, tree *ops)
+{
+  tree otype = TREE_TYPE (lhs);
+  tree itype = TREE_TYPE (ops[0]);
+  unsigned itype_prec = TYPE_PRECISION (itype);
+  unsigned otype_prec = TYPE_PRECISION (otype);
+
+  if (types_compatible_p (otype, itype) || otype_prec >= itype_prec)
+return;
+
+  int_range_max r;
+  gimple_ranger granger;
+
+  if (granger.range_of_expr (r, ops[0], stmt) && !r.undefined_p ())
+{
+  wide_int bound = r.upper_bound ();
+  wide_int otype_max = wi::mask (otype_prec, /* negate */false, 
itype_prec);
+  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
+  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (!wi::leu_p (bound, otype_max) || v_otype == NULL || v_itype == NULL
+   || !target_supports_op_p (v_itype, MIN_EXPR, optab_vector))
+   return;
+
+  /* 1. Build truncated op_0  */
+  vect_unpromoted_value unprom;
+  tree tmp = vect_look_through_possible_promotion (vinfo, ops[0], &unprom);
+
+  if (tmp == NULL_TREE || TYPE_PRECISION (unprom.type) != otype_prec)
+   {
+ tmp = vect_recog_temp_ssa_var (otype, NULL);
+ gimple *op_0_cast = gimple_build_assign (tmp, NOP_EXPR, ops[0]);
+ append_pattern_def_seq (vinfo, stmt_vinfo, op_0_cast, v_otype);
+   }
+
+  ops[0] = tmp;
+
+  /* 2. Build MIN_EXPR (op_1, 65536)  */
+  tree max = wide_int_to_tree (itype, otype_max);
+  tree op_1_in = vect_recog_temp_ssa_var (itype, NULL);
+  gimple *op_1_min = gimple_build_assign (op_1_in, MIN_EXPR, ops[1], max);
+  append_pattern_def_seq (vinfo, stmt_vinfo, op_1_min, v_itype);
+
+  /* 3. Build truncated op_1  */
+  tmp = vect_recog_temp_ssa_var (otype, NULL);
+  gimple *op_1_cast = gimple_build_assign (tmp, NOP_EXPR, op_1_in);
+  append_pattern_def_seq (vinfo, stmt_vinfo, op_1_cast, v_otype);
+
+  ops[1] = tmp;
+}
+}
+
 /*
  * Try to detect saturation sub pattern 

[PATCH] c++, contracts: Fix ICE in create_tmp_var [PR113968]

2024-07-05 Thread Nina Dinka Ranns
Certain places in contract parsing currently do not check for errors.
This results in contracts with embedded errors which eventually confuse
gimplify.  Checks for errors have been added in grok_contract() and
cp_parser_contract_attribute_spec() to exit early if an error is
encountered.

Tested on x86_64-pc-linux-gnu
---

PR c++/113968

gcc/cp/ChangeLog:

* contracts.cc (grok_contract): Check for error_mark_node early
  exit
* parser.cc (cp_parser_contract_attribute_spec): Check for
  error_mark_node early exit

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr113968.C: New test.

Signed-off-by: Nina Ranns 


---
 gcc/cp/contracts.cc   |  7 ++
 gcc/cp/parser.cc  |  3 +++
 gcc/testsuite/g++.dg/contracts/pr113968.C | 29 +++
 3 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/contracts/pr113968.C

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 634e3cf4fa9..a7d0fdacf6e 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -750,6 +750,9 @@ tree
 grok_contract (tree attribute, tree mode, tree result, cp_expr condition,
location_t loc)
 {
+  if (condition == error_mark_node)
+return error_mark_node;
+
   tree_code code;
   if (is_attribute_p ("assert", attribute))
 code = ASSERTION_STMT;
@@ -785,6 +788,10 @@ grok_contract (tree attribute, tree mode, tree
result, cp_expr condition,

   /* The condition is converted to bool.  */
   condition = finish_contract_condition (condition);
+
+  if (condition == error_mark_node)
+return error_mark_node;
+
   CONTRACT_CONDITION (contract) = condition;

   return contract;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 31ae9c2fb54..22c5e760aee 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -30835,6 +30835,9 @@ cp_parser_contract_attribute_spec (cp_parser
*parser, tree attribute)
   return error_mark_node;
 }

+  if (contract == error_mark_node)
+return error_mark_node;
+
   return finish_contract_attribute (attribute, contract);
 }

diff --git a/gcc/testsuite/g++.dg/contracts/pr113968.C
b/gcc/testsuite/g++.dg/contracts/pr113968.C
new file mode 100644
index 000..fbaad1c930d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/contracts/pr113968.C
@@ -0,0 +1,29 @@
+// check that an invalid contract condition doesn't cause an ICE
+// { dg-do compile }
+// { dg-options "-std=c++2a -fcontracts " }
+
+struct A
+{
+  A (A&);
+};
+struct S
+{
+  void f(A a)
+[[ pre : a]] // { dg-error "could not convert" }
+[[ pre : a.b]]// { dg-error "has no member" }
+{
+
+}
+};
+void f(A a)
+  [[ pre : a]] // { dg-error "could not convert" }
+  [[ pre : a.b]]// { dg-error "has no member" }
+  {
+[[ assert : a ]]; // { dg-error "could not convert" }
+[[ assert : a.b ]];// { dg-error "has no member" }
+  }
+
+int
+main ()
+{
+}
-- 
2.45.2


Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Richard Biener



> Am 05.07.2024 um 16:00 schrieb Richard Sandiford :
> 
> Richard Biener  writes:
>>> On Fri, 5 Jul 2024, Richard Biener wrote:
>>> 
 On Fri, 5 Jul 2024, Robin Dapp wrote:
>>> 
 Hi,
 
 in PR115336 we have the following
 
  vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 
 0, ... }, _482, 0);
  vect_iftmp.44 = vect_patt_391 | { 1, ... };
  .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
 
 which assumes that a maskload sets the masked-out elements to zero.
>>> 
>>> To me this looks like mis-applying of match.pd:6083?
>>> 
>>> Applying pattern match.pd:6083, gimple-match-1.cc:45749
>>> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
>>> new phi replacement stmt
>>> iftmp.0_62 = iftmp.0_61 | _219;
>>> 
>>> so originally it wasn't
>>> 
>>>  iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>>>  iftmp.0_62 = iftmp.0_61 | _219;
>>> 
>>> but sth like
>>> 
>>>  iftmp.0_62 =  ? _219 : iftmp.0_61;
>>> 
>>> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
>>> if-conversion always turn the COND_EXPR into a .COND_IOR here?
>>> 
 RVV does not guarantee this and, unfortunately, leaves it to the
 hardware implementation to do basically anything it wants (even keep
 the previous value).  In the PR this leads to us reusing a previous
 vector register and stale, nonzero values causing an invalid result.
 
 Based on a Richard Sandiford's feedback I started with the attached
 patch - it's more an RFC in its current shape and obviously not
 tested exhaustively.
 
 The idea is:
 - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
   after a MASK_LOAD if the target requires it.
 - Elide the VCOND_MASK when there is a COND_OP with a replacing else
   value later.
 
 Is this, generally, reasonable or is there a better way?
 
 My initial idea was to add an else value to MASK_LOAD.  Richard's
 concern was, though, that this might make nonzero else values
 appear inexpensive when they are actually not.
>> 
>> In general I'd agree that we want an else value for .MASK_LOAD
>> and also possibly for .MASK_STORE (but then we need to represent an
>> else value that says do-not-change/merge).
> 
> Not sure about MASK_STORE.  AFAICT, an else value for MASK_STORE would
> be directly equivalent to:
> 
>  tmp = vec_cond_expr ;
>  *ptr = tmp;
> 
> and IMO it'd be better to keep representing it like that instead.
> 
> That is, MASK_STORE is changing something that already has a definition
> (the target memory), and so there's no ambiguity in having a masked
> operation that only changes part of the target.
> 
> But for MASK_LOAD and other masked vector-producing functions, we're
> creating a new full vector based on a partial vector operation.
> There's no existing target to modify, and so we need to get the
> definition of the inactive elements from somewhere else.

Agreed.  I also think the predicate idea for the else value in MASK_LOAD is 
good.

Richard 

> Thanks,
> Richard


Re: [PATCH] c++, contracts: Fix ICE in create_tmp_var [PR113968]

2024-07-05 Thread Iain Sandoe
Hi Nina,

thanks for the patch!

> On 5 Jul 2024, at 15:25, Nina Dinka Ranns  wrote:
> 
> Certain places in contract parsing currently do not check for errors.
> This results in contracts
> with embedded errors which eventually confuse gimplify. Checks for
> errors added in
> grok_contract() and cp_parser_contract_attribute_spec() to exit early
> if an error is encountered.
> 
> Tested on x86_64-pc-linux-gnu

this LGTM, but I cannot approve it.
Iain

> ---
> 
>PR c++/113968
> 
> gcc/cp/ChangeLog:
> 
>* contracts.cc (grok_contract): Check for error_mark_node early
>  exit
>* parser.cc (cp_parser_contract_attribute_spec): Check for
>  error_mark_node early exit
> 
> gcc/testsuite/ChangeLog:
> 
>* g++.dg/contracts/pr113968.C: New test.
> 
> Signed-off-by: Nina Ranns 
> 
> 
> ---
> gcc/cp/contracts.cc   |  7 ++
> gcc/cp/parser.cc  |  3 +++
> gcc/testsuite/g++.dg/contracts/pr113968.C | 29 +++
> 3 files changed, 39 insertions(+)
> create mode 100644 gcc/testsuite/g++.dg/contracts/pr113968.C
> 
> diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
> index 634e3cf4fa9..a7d0fdacf6e 100644
> --- a/gcc/cp/contracts.cc
> +++ b/gcc/cp/contracts.cc
> @@ -750,6 +750,9 @@ tree
> grok_contract (tree attribute, tree mode, tree result, cp_expr condition,
>location_t loc)
> {
> +  if (condition == error_mark_node)
> +return error_mark_node;
> +
>   tree_code code;
>   if (is_attribute_p ("assert", attribute))
> code = ASSERTION_STMT;
> @@ -785,6 +788,10 @@ grok_contract (tree attribute, tree mode, tree
> result, cp_expr condition,
> 
>   /* The condition is converted to bool.  */
>   condition = finish_contract_condition (condition);
> +
> +  if (condition == error_mark_node)
> +return error_mark_node;
> +
>   CONTRACT_CONDITION (contract) = condition;
> 
>   return contract;
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 31ae9c2fb54..22c5e760aee 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -30835,6 +30835,9 @@ cp_parser_contract_attribute_spec (cp_parser
> *parser, tree attribute)
>   return error_mark_node;
> }
> 
> +  if (contract == error_mark_node)
> +return error_mark_node;
> +
>   return finish_contract_attribute (attribute, contract);
> }
> 
> diff --git a/gcc/testsuite/g++.dg/contracts/pr113968.C
> b/gcc/testsuite/g++.dg/contracts/pr113968.C
> new file mode 100644
> index 000..fbaad1c930d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/contracts/pr113968.C
> @@ -0,0 +1,29 @@
> +// check that an invalid contract condition doesn't cause an ICE
> +// { dg-do compile }
> +// { dg-options "-std=c++2a -fcontracts " }
> +
> +struct A
> +{
> +  A (A&);
> +};
> +struct S
> +{
> +  void f(A a)
> +[[ pre : a]] // { dg-error "could not convert" }
> +[[ pre : a.b]]// { dg-error "has no member" }
> +{
> +
> +}
> +};
> +void f(A a)
> +  [[ pre : a]] // { dg-error "could not convert" }
> +  [[ pre : a.b]]// { dg-error "has no member" }
> +  {
> +[[ assert : a ]]; // { dg-error "could not convert" }
> +[[ assert : a.b ]];// { dg-error "has no member" }
> +  }
> +
> +int
> +main ()
> +{
> +}
> -- 
> 2.45.2



[pushed] wwwdocs: contribute: Uppercase spelling of Bugzilla

2024-07-05 Thread Gerald Pfeifer


---
 htdocs/contribute.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/contribute.html b/htdocs/contribute.html
index 24f8d30b..caff1f2a 100644
--- a/htdocs/contribute.html
+++ b/htdocs/contribute.html
@@ -322,7 +322,7 @@ not be confused with other patches.
 
 If your patch relates a bug in the compiler for which there is an
 existing PR number the bug number should be stated.  Use the
-short-form variant [PRn] wwwdocs: without the bugzilla
+short-form variant [PRn] wwwdocs: without the Bugzilla
 component identifier and with no space between 'PR' and the number.
 The body of the commit message should still contain the full form
 (PR /n) within the body of
-- 
2.45.2


Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Robin Dapp
> FTR, my concern & suggestion was:
> 
>   I suppose the difficulty is that we might make:
> 
> MASK_LOAD (mask, ptr, some-arbitrary-else-value)
> 
>   seem as cheap as:
> 
> MASK_LOAD (mask, ptr, { 0, 0,. ... 0})
> 
>   which definitely isn't the case for SVE (and I'm guessing also
>   for AVX).  It would be better to keep the:
> 
> COND_EXPR (mask, ..., some-arbitrary-else-value)
> 
>   separate and try to optimise it with surrounding code.

Ah, I rather understood that as an "either or" rather than an "and".

> That is, we'd add the else value to the maskload optab and MASK_LOAD
> internal function, but use predicates to test whether a target supports
> a particular else value.

So e.g. for SVE we'd have

  res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
  COND_EXPR (mask, res, {0, 0, ...})

which would always be folded/optimized away, while for RVV
we'd have

  res = MASK_LOAD (mask, ptr, whatever_else_value)
  COND_EXPR (mask, res, {0, 0, ...})

that could/should result in a separate instruction depending on
the rest of the code?

Regards
 Robin


Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Robin Dapp
> To me this looks like mis-applying of match.pd:6083?
> 
> Applying pattern match.pd:6083, gimple-match-1.cc:45749
> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
> new phi replacement stmt
> iftmp.0_62 = iftmp.0_61 | _219;
> 
> so originally it wasn't
> 
>   iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>   iftmp.0_62 = iftmp.0_61 | _219;
> 
> but sth like
> 
>   iftmp.0_62 =  ? _219 : iftmp.0_61;
> 
> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
> if-conversion always turn the COND_EXPR into a .COND_IOR here?

Yes, just that the mask is inverted in the original COND_EXPR.
Hmm, with COND_IOR we'd explicitly set the else value to the
target's preferred else value then I presume.  That's probably
more elegant than an extra VEC_COND but I could imagine seeing
the same problems of redundant VEC_CONDs not being folded.

Regards
 Robin



Re: [PATCH v3] Arm: Fix ldrd offset range [PR115153]

2024-07-05 Thread Richard Earnshaw (lists)
On 27/06/2024 17:25, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> The Linaro CI is reporting an ICE while building libgfortran with this 
>> change.
> 
> So it looks like Thumb-2 oddly enough restricts the negative range of DFmode
> even though that is unnecessary and inefficient.  The easiest workaround turned
> out to be to avoid using checked adjust_address.

It is necessary because DFmode has constraints such as "r" (GENERAL_REGS) and 
"m" in a load, so the range needs to be legitimate here as well as when a 
VFP_REG is used.

It might be possible to fix this, but it would involve a lot of rewriting the 
code to get it right.

But you're right that we don't need to validate this change: we just assume 
it's right anyway and if it were wrong something bigger has already gone wrong 
earlier and we'd end up with invalid offsets in the assembly code.

OK.

R.

> 
> Cheers,
> Wilco
> 
> 
> v3: Use adjust_address_nv with DFmode.
> 
> The valid offset range of LDRD in arm_legitimate_index_p is increased to
> -1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
> Fix this by moving the LDRD check earlier.
> 
> Passes bootstrap & regress, OK for commit and backport to GCC14.2?
> 
> gcc:
> PR target/115153
> * config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before 
> NEON.
> (thumb2_legitimate_index_p): Update comments.
> (output_move_neon): Use DFmode for vldr/vstr and non-checking 
> adjust_address.
> 
> gcc/testsuite:
> PR target/115153
> * gcc.target/arm/pr115153.c: Add new test.
> * lib/target-supports.exp: Add arm_arch_v7ve_neon target support.
> 
> ---
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 
> 7d67d2cfee9f4edc91f187e940be40c07ff726cd..5e6608a30f17bf8185464e3fd0b202a71ff83fc8
>  100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -8858,6 +8858,28 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
> RTX_CODE outer,
>   && INTVAL (index) > -1024
>   && (INTVAL (index) & 3) == 0);
>  
> +  if (arm_address_register_rtx_p (index, strict_p)
> +  && (GET_MODE_SIZE (mode) <= 4))
> +return 1;
> +
> +  /* This handles DFmode only if !TARGET_HARD_FLOAT.  */
> +  if (mode == DImode || mode == DFmode)
> +{
> +  if (code == CONST_INT)
> + {
> +   HOST_WIDE_INT val = INTVAL (index);
> +
> +   /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> +  If vldr is selected it uses arm_coproc_mem_operand.  */
> +   if (TARGET_LDRD)
> + return val > -256 && val < 256;
> +   else
> + return val > -4096 && val < 4092;
> + }
> +
> +  return TARGET_LDRD && arm_address_register_rtx_p (index, strict_p);
> +}
> +
>/* For quad modes, we restrict the constant offset to be slightly less
>   than what the instruction format permits.  We do this because for
>   quad mode moves, we will actually decompose them into two separate
> @@ -8870,7 +8892,7 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
> RTX_CODE outer,
>   && (INTVAL (index) & 3) == 0);
>  
>/* We have no such constraint on double mode offsets, so we permit the
> - full range of the instruction format.  */
> + full range of the instruction format.  Note DImode is included here.  */
>if (TARGET_NEON && VALID_NEON_DREG_MODE (mode))
>  return (code == CONST_INT
>   && INTVAL (index) < 1024
> @@ -8883,27 +8905,6 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
> RTX_CODE outer,
>   && INTVAL (index) > -1024
>   && (INTVAL (index) & 3) == 0);
>  
> -  if (arm_address_register_rtx_p (index, strict_p)
> -  && (GET_MODE_SIZE (mode) <= 4))
> -return 1;
> -
> -  if (mode == DImode || mode == DFmode)
> -{
> -  if (code == CONST_INT)
> - {
> -   HOST_WIDE_INT val = INTVAL (index);
> -
> -   /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> -  If vldr is selected it uses arm_coproc_mem_operand.  */
> -   if (TARGET_LDRD)
> - return val > -256 && val < 256;
> -   else
> - return val > -4096 && val < 4092;
> - }
> -
> -  return TARGET_LDRD && arm_address_register_rtx_p (index, strict_p);
> -}
> -
>if (GET_MODE_SIZE (mode) <= 4
>&& ! (arm_arch4
>   && (mode == HImode
> @@ -9006,7 +9007,7 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
> index, int strict_p)
>   && (INTVAL (index) & 3) == 0);
>  
>/* We have no such constraint on double mode offsets, so we permit the
> - full range of the instruction format.  */
> + full range of the instruction format.  Note DImode is included here.  */
>if (TARGET_NEON && VALID_NEON_DREG_MODE (mode))
>  return (code == CONST_INT
>   && INTVAL (index) < 1024
> @@ -9017,6 +9018,7 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
> index, int strict_p)
>&& (GET_MODE_SIZE (mode) <= 4))
>  return 1;
>  
> +  /* This

[PATCH] c++: normalizing ttp parm constraints [PR115656]

2024-07-05 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14 and perhaps 13?

Alternatively we can set current_template_parms from weakly_subsumes
instead, which has only one caller anyway.

-- >8 --

Here we normalize the constraint same_as for the first
time during constraint subsumption checking of B / TT as part of ttp
coercion.  During this normalization the set of in-scope template
parameters i.e. current_template_parms is empty, which tricks the
satisfaction cache into thinking that the satisfaction value of the
constraint is independent of its template parameters, and we incorrectly
conflate the satisfaction value with auto = bool vs auto = long and
accept the specialization A.

This patch fixes this by setting current_template_parms appropriately
during subsumption checking.

PR c++/115656

gcc/cp/ChangeLog:

* pt.cc (is_compatible_template_arg): Set current_template_parms
around the call to weakly_subsumes.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp7.C: New test.
---
 gcc/cp/pt.cc   |  4 
 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C | 12 
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 017cc7fd0ab..1f6553790a5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -8493,6 +8493,10 @@ is_compatible_template_arg (tree parm, tree arg, tree 
args)
 return false;
 }
 
+  /* Normalization needs to know the effective set of in-scope
+ template parameters.  */
+  auto ctp = make_temp_override (current_template_parms,
+DECL_TEMPLATE_PARMS (arg));
   return weakly_subsumes (parm_cons, arg);
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
new file mode 100644
index 000..bc123ecf75e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
@@ -0,0 +1,12 @@
+// PR c++/115656
+// { dg-do compile { target c++20 } }
+
+template concept same_as = __is_same(T, U);
+
+template U, template> class TT>
+struct A { };
+
+template> class B;
+
+A a1;
+A a2; // { dg-error "constraint failure" }
-- 
2.45.2.746.g06e570c0df



Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-05 Thread Richard Sandiford
Robin Dapp  writes:
>> FTR, my concern & suggestion was:
>> 
>>   I suppose the difficulty is that we might make:
>> 
>> MASK_LOAD (mask, ptr, some-arbitrary-else-value)
>> 
>>   seem as cheap as:
>> 
>> MASK_LOAD (mask, ptr, { 0, 0,. ... 0})
>> 
>>   which definitely isn't the case for SVE (and I'm guessing also
>>   for AVX).  It would be better to keep the:
>> 
>> COND_EXPR (mask, ..., some-arbitrary-else-value)
>> 
>>   separate and try to optimise it with surrounding code.
>
> Ah, I rather understood that as an "either or" rather than an "and".

Yeah, could have been clearer, sorry.

>> That is, we'd add the else value to the maskload optab and MASK_LOAD
>> internal function, but use predicates to test whether a target supports
>> a particular else value.
>
> So e.g. for SVE we'd have
>
>   res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
>   COND_EXPR (mask, res, {0, 0, ...})
>
> which would always be folded/optimized away, while for RVV
> we'd have
>
>   res = MASK_LOAD (mask, ptr, whatever_else_value)
>   COND_EXPR (mask, res, {0, 0, ...})
>
> that could/should result in a separate instruction depending on
> the rest of the code?

Yeah, I think so.  I guess for RVV there's a choice between:

(1) making the insn predicate accept all else values and making
the insn emit an explicit blend between the loaded result
and the else value

(2) making the insn predicate only accept “undefined” (SCRATCH in
rtl terms)

(2) sounds more in keeping with Juzhe's fix for PR110751.

Thanks,
Richard


Re: [PATCH] c++, contracts: Fix ICE in create_tmp_var [PR113968]

2024-07-05 Thread Jason Merrill

On 7/5/24 10:25 AM, Nina Dinka Ranns wrote:

Certain places in contract parsing currently do not check for errors.
This results in contracts
with embedded errors which eventually confuse gimplify. Checks for
errors added in
grok_contract() and cp_parser_contract_attribute_spec() to exit early
if an error is encountered.


Thanks for the patch!


Tested on x86_64-pc-linux-gnu
---

 PR c++/113968

gcc/cp/ChangeLog:

 * contracts.cc (grok_contract): Check for error_mark_node early
   exit


These hunks are OK.


 * parser.cc (cp_parser_contract_attribute_spec): Check for
   error_mark_node early exit


This seems redundant, since finish_contract_attribute already checks for 
error_mark_node and we're returning its result unchanged.


Also, the convention is for wrapped lines in ChangeLog entries to line 
up with the *, and to finish sentences with a period.



gcc/testsuite/ChangeLog:

 * g++.dg/contracts/pr113968.C: New test.

Signed-off-by: Nina Ranns 


---
  gcc/cp/contracts.cc   |  7 ++
  gcc/cp/parser.cc  |  3 +++
  gcc/testsuite/g++.dg/contracts/pr113968.C | 29 +++
  3 files changed, 39 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/contracts/pr113968.C

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 634e3cf4fa9..a7d0fdacf6e 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -750,6 +750,9 @@ tree
  grok_contract (tree attribute, tree mode, tree result, cp_expr condition,
 location_t loc)
  {
+  if (condition == error_mark_node)
+return error_mark_node;
+
tree_code code;
if (is_attribute_p ("assert", attribute))
  code = ASSERTION_STMT;
@@ -785,6 +788,10 @@ grok_contract (tree attribute, tree mode, tree
result, cp_expr condition,

/* The condition is converted to bool.  */
condition = finish_contract_condition (condition);
+
+  if (condition == error_mark_node)
+return error_mark_node;
+
CONTRACT_CONDITION (contract) = condition;

return contract;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 31ae9c2fb54..22c5e760aee 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -30835,6 +30835,9 @@ cp_parser_contract_attribute_spec (cp_parser
*parser, tree attribute)
return error_mark_node;
  }

+  if (contract == error_mark_node)
+return error_mark_node;
+
return finish_contract_attribute (attribute, contract);
  }

diff --git a/gcc/testsuite/g++.dg/contracts/pr113968.C
b/gcc/testsuite/g++.dg/contracts/pr113968.C
new file mode 100644
index 000..fbaad1c930d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/contracts/pr113968.C
@@ -0,0 +1,29 @@
+// check that an invalid contract condition doesn't cause an ICE
+// { dg-do compile }
+// { dg-options "-std=c++2a -fcontracts " }
+
+struct A
+{
+  A (A&);
+};
+struct S
+{
+  void f(A a)
+[[ pre : a]] // { dg-error "could not convert" }
+[[ pre : a.b]]// { dg-error "has no member" }
+{
+
+}
+};
+void f(A a)
+  [[ pre : a]] // { dg-error "could not convert" }
+  [[ pre : a.b]]// { dg-error "has no member" }
+  {
+[[ assert : a ]]; // { dg-error "could not convert" }
+[[ assert : a.b ]];// { dg-error "has no member" }
+  }
+
+int
+main ()
+{
+}




[PATCH] c++: missing SFINAE during alias CTAD [PR115296]

2024-07-05 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?

-- >8 --

During the alias CTAD transformation, if substitution failed for some
guide we should just discard the guide silently.  We currently do
discard the guide, but not silently, which causes us to reject the
below testcase due to forming a too-large array type when transforming
the user-defined deduction guides.

This patch fixes this by passing complain=tf_none instead of
=tf_warning_or_error in the couple of spots where we expect substitution
to possibly fail and so subsequently check for error_mark_node.

PR c++/115296

gcc/cp/ChangeLog:

* pt.cc (alias_ctad_tweaks): Pass complain=tf_none to
tsubst_decl and tsubst_constraint_info.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias23.C: New test.
---
 gcc/cp/pt.cc  |  4 ++--
 .../g++.dg/cpp2a/class-deduction-alias23.C| 20 +++
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d1316483e24..a382dce8788 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30451,7 +30451,7 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  /* Parms are to have DECL_CHAIN tsubsted, which would be skipped
 if cp_unevaluated_operand.  */
  cp_evaluated ev;
- g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain,
+ g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, tf_none,
   /*use_spec_table=*/false);
}
  if (g == error_mark_node)
@@ -30478,7 +30478,7 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
{
  if (tree outer_targs = outer_template_args (f))
ci = tsubst_constraint_info (ci, outer_targs, complain, 
in_decl);
- ci = tsubst_constraint_info (ci, targs, complain, in_decl);
+ ci = tsubst_constraint_info (ci, targs, tf_none, in_decl);
}
  if (ci == error_mark_node)
continue;
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C
new file mode 100644
index 000..e5766586761
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C
@@ -0,0 +1,20 @@
+// PR c++/115296
+// { dg-do compile { target c++20 } }
+
+using size_t = decltype(sizeof(0));
+
+template
+struct span { span(T); };
+
+template
+span(T(&)[N]) -> span; // { dg-bogus "array exceeds maximum" }
+
+template
+requires (sizeof(T(&)[N]) != 42) // { dg-bogus "array exceeds maximum" }
+span(T*) -> span;
+
+template
+using array_view = span;
+
+span x = 0;
+array_view y = 0;
-- 
2.45.2.746.g06e570c0df



Re: [PATCH] c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].

2024-07-05 Thread Iain Sandoe



> On 17 Jun 2024, at 13:15, Iain Sandoe  wrote:
> 
> This patch came out of a discussion on Mattermost about how to deal
> with contracts/coroutines integration.  Actually, it would also allow
> some semantic checking to be deferred until the same spot - at which
> time there are no dependent types, which can simplify the process.
> 
> NOTE: this is a fix for bugs in the existing '2a' contracts impl.  It
> does not attempt to make any of the changes required by P2900 to 
> either code-gen or constexpr handling.
> 
> Tested on x86_64-darwin, so far, OK for trunk if testing succeeds on
> x86_64/powerpc64 linux too?

Testing was successful and these patches have been on the contracts-noattr
compiler explorer instance for a while now.

OK for trunk?
thanks
Iain

> thanks,
> Iain
> 
> --- 8< ---
> 
> The current implementation of contracts emits the checks into function
> bodies in three places; for pre-conditions at the start of the body,
> for asserts in-line in the function body and for post-conditions as an
> addition to return statements.
> 
> In general (at least with existing "2a" contract semantics) the in-line
> contract asserts behave as expected.
> 
> However, the mechanism is not applicable to:
> 
> * Handling pre conditions in coroutines since, for those, the standard
>  specifies a wrapping of the original function body by functionality
>  implementing initial and final suspends (along with some housekeeping
>  to route exceptions).  Thus for such transformed function bodies, the
>  preconditions then get actioned after the initial suspend, which does
>  not behave as intended.
> 
>  * Handling post conditions in functions that do not have return
>statements (which applies to coroutines and void functions).
> 
> In the following, we identify a potentially transformed function body
> (in the case of coroutines, this is usually called the "ramp()" function).
> 
> The patch here re-implements the code insertion in one of the two following
> ways (code for exposition only):
> 
>  * For functions with no post-conditions we wrap the potentially
>transformed function as follows:
> 
>  {
> handle_pre_condition_checking ();
> potentially_transformed_function_body ();
>  }
> 
>  This implements the intent that the preconditions are processed after
>  the function parameters are initialised but before any other actions.
> 
>  * For functions with post-conditions:
> 
>  try
>   {
> if (preconditions_exist)
>   handle_pre_condition_checking ();
> potentially_transformed_function_body ();
>   }
>  finally
>   {
> handle_post_condition_checking ();
>   }
>  else [only if the function is not marked noexcept(true) ]
>   {
> __rethrow ();
>   }
> 
> In this, post-conditions [that might apply to the return value etc.]
> are evaluated on every non-exceptional edge out of the function.
> 
> At present, the model here is that exceptions thrown by the function
> propagate upwards as if there were no contracts present.  If the desired
> semantic becomes that an exception is counted as equivalent to a contract
> violation - then we can add a second handler in place of the rethrow.
> 
> At constexpr time we need to evaluate the contract conditions, but not
> the exceptional path, which is handled by a flag on the EH_ELSE_EXPR that
> indicates it is in use for contract handling.
> 
> This patch specifically does not address changes to code-gen and constexpr
> handling that are contained in P2900.
> 
>   PR c++/115434
>   PR c++/110871
>   PR c++/110872
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR.
>   * contracts.cc (finish_contract_attribute): Remove excess line.
>   (build_contract_condition_function): Post condition handlers are
>   void now.
>   (emit_postconditions_cleanup): Remove.
>   (emit_postconditions): New.
>   (add_pre_condition_fn_call): New.
>   (add_post_condition_fn_call): New.
>   (apply_preconditions): New.
>   (apply_postconditions): New.
>   (maybe_apply_function_contracts): New.
>   (apply_postcondition_to_return): Remove.
>   * contracts.h (apply_postcondition_to_return): Remove.
>   (maybe_apply_function_contracts): Add.
>   * coroutines.cc (coro_build_actor_or_destroy_function): Do not
>   copy contracts to coroutine helpers.
>   * cp-tree.h (CONTRACT_EH_ELSE_P): New.
>   * decl.cc (finish_function): Handle wrapping a possibly
>   transformed function body in contract checks.
>   * typeck.cc (check_return_expr): Remove handling of post
>   conditions on return expressions.
> 
> gcc/ChangeLog:
> 
>   * gimplify.cc (struct gimplify_ctx): Add a flag to show we are
>   expanding a handler.
>   (gimplify_expr): When we are expanding a handler, and the body
>   transforms might have re-written DECL_RESULT into a gimple var,
>   ensure that handler references to DECL_RESULT are also re-written
>   to r

Re: [Fortran, Patch, PR 96992, V3] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-05 Thread Harald Anlauf

Hi Andre,

Am 03.07.24 um 12:58 schrieb Andre Vehreschild:

Hi Harald,

I am sorry for the long delay, but fixing the negative stride led from one
issue to the next. I finally got a version that does not regress. Please have a
look.

This patch has two parts:
1. The runtime library part in pr96992_3p1.patch and
2. the compiler changes in pr96992_3p2.patch.

In my branch also the two patches from Paul for pr59104 and pr102689 are
living, which might lead to small shifts during application of the patches.

NOTE, this patch adds internal packing and unpacking of class arrays similar to
the regular pack and unpack. I think this is necessary, because the regular
un-/pack does not use the vptr's _copy routine for moving data and therefore
may produce bugs.

The un-/pack_class routines are currently only used for converting a derived type
array to a class array.  Extending their use when UN-/PACK() is applied to a
class array is still to be done (as part of another PR).

Regtests fine on x86_64-pc-linux-gnu/ Fedora 39.


this is a really huge patch to review, and I am not sure that I can do
this without help from others.  Paul?  Anybody else?

As far as I can tell for now:

- pr96992_3p1.patch (the libgfortran part) looks good to me.

- git had some whitespace issues with pr96992_3p2.patch as attached,
  but I could fix that locally and do some testing parallel to reading.

A few advance comments on the latter patch:

- my understanding is that the PR at the end of a summary line should be
  like in:

Fortran: Fix rejecting class arrays of different ranks as storage
association argument [PR96992]

  I was told that this helps people explicitly scanning for the PR
  number in that place.

- some rewrites of logical conditions change the coding style from
  the recommended GNU coding style, and I find the more compact
  way used in some places harder to grok (but that may be just me).
  Example:

@@ -8850,20 +8857,24 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr
* expr, bool g77,
   /* There is no need to pack and unpack the array, if it is contiguous
  and not a deferred- or assumed-shape array, or if it is simply
  contiguous.  */
-  no_pack = ((sym && sym->as
- && !sym->attr.pointer
- && sym->as->type != AS_DEFERRED
- && sym->as->type != AS_ASSUMED_RANK
- && sym->as->type != AS_ASSUMED_SHAPE)
- ||
-(ref && ref->u.ar.as
- && ref->u.ar.as->type != AS_DEFERRED
+  no_pack = false;
+  gfc_array_spec *as;
+  if (sym)
+{
+  symbol_attribute *attr
+   = &(IS_CLASS_ARRAY (sym) ? CLASS_DATA (sym)->attr : sym->attr);
+  as = IS_CLASS_ARRAY (sym) ? CLASS_DATA (sym)->as : sym->as;
+  no_pack
+   = (as && !attr->pointer && as->type != AS_DEFERRED
+  && as->type != AS_ASSUMED_RANK && as->type != AS_ASSUMED_SHAPE);
+}
+  if (ref && ref->u.ar.as)
+no_pack = no_pack
+ || (ref->u.ar.as->type != AS_DEFERRED
  && ref->u.ar.as->type != AS_ASSUMED_RANK
- && ref->u.ar.as->type != AS_ASSUMED_SHAPE)
- ||
-gfc_is_simply_contiguous (expr, false, true));
-
-  no_pack = contiguous && no_pack;
+ && ref->u.ar.as->type != AS_ASSUMED_SHAPE);
+  no_pack
+= contiguous && (no_pack || gfc_is_simply_contiguous (expr, false,
true));

   /* If we have an EXPR_OP or a function returning an explicit-shaped
  or allocatable array, an array temporary will be generated which


I understand that this may be your personal coding style, but you
might keep in mind that reviewers have to understand the code, too...

I have not fully understood your logic when packing is now invoked.
We not only need to do it for explicit-size arrays, but also for
assumed-size.  This still fails for my slightly extended testcase
(see attached) where I pass the class array via:

  subroutine d4(x,n)
integer, intent(in) :: n
!   class (foo), intent(inout) :: x(n)  ! OK
class (foo), intent(inout) :: x(*)  ! not OK
call d3(x,n)! Simply pass assumed-size array
  end subroutine d4

I am unable to point to the places in your patch where you need to
handle that in addition.

Otherwise I was unable to see any obvious, major problem with the
patch, but then I am not fluent enough in class handling in the
gfortran FE.  So if e.g. Paul jumps in here within the next 72 hours,
it would be great.

So here comes the issue with the attached code variant.
After your patch, this prints as the last 4 relevant lines:

 full: -43  44  45 -46  47  48 -49  50
 d3_1: -43  44  45
 d3_2:  43 -44 -45
 full:  43 -44 -45 -46  47  48 -49  50

while when switching the declaration of the dummy argument of d4:

 full: -43  44  45 

ping: [PATCH] libcpp: Support extended characters for #pragma {push,pop}_macro [PR109704]

2024-07-05 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html

May I please ping this one again? It's the largest remaining gap in
UTF-8 support for libcpp that I know of. Thanks!
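
For reference, the construct at issue (saving and restoring a macro whose
name uses extended characters) looks roughly like this; an illustrative
sketch only, with an arbitrary UTF-8 identifier not taken from the patch's
testsuite:

  #define café 1
  #pragma push_macro("café")
  #undef café
  #define café 2
  #pragma pop_macro("café")
  #if café != 1
  #error "café was not restored"
  #endif

Without the fix, libcpp does not accept the extended characters inside the
push_macro/pop_macro string.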

-Lewis

On Tue, May 28, 2024 at 7:46 PM Lewis Hyatt  wrote:
>
> Hello-
>
> May I please ping this one (now for GCC 15)? Thanks!
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html
>
> -Lewis
>
> On Sat, Feb 10, 2024 at 9:02 AM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html
> >
> > May I please ping this one? Thanks!
> >
> > On Sat, Jan 13, 2024 at 5:12 PM Lewis Hyatt  wrote:
> > >
> > > Hello-
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109704
> > >
> > > The below patch fixes the issue noted in the PR that extended characters
> > > cannot appear in the identifier passed to a #pragma push_macro or #pragma
> > > pop_macro. Bootstrap + regtest all languages on x86-64 Linux. Is it OK for
> > > GCC 13 please?


Re: [wwwdocs] [PATCH 0/4] Fix various typos

2024-07-05 Thread Gerald Pfeifer
Hi Pokechu22,

On Tue, 22 Mar 2022, Pokechu22 via Gcc-patches wrote:
> While working on a separate patch, I found several typos on the website.
> I have only looked within the htdocs directory, not its subdirectories.

thanks for those patches, and sorry they fell through the cracks 
originally. I'll keep my eyes open for potential future ones.

> These are individual patches per file, since that seemed reasonable to 
> me.

Agreed.

> I believe this is a small enough change that I do not need to go through
> copyright assignment or DCO certification (if I had supplied a list of
> files and line numbers, I suspect anyone else would have made the same
> patches).  I would rather remain pseudonymous.

For DCO certification we need real names. Luckily, as you mentioned, these 
were not big, and for any still relevant I did some spell checking on those
files and compared the findings. And some of your catches were found and 
re-created independently.

I believe we have now handled everything you reported/submitted. Please 
advise if there is anything open (or newly found)!

Thank you,
Gerald


[PATCH, pushed] Fortran: switch test to use issignaling() built-in

2024-07-05 Thread FX Coudert
Pushed after testing on x86_64-unknown-linux-gnu.

The macro may not be present in all libc's, but the built-in is always 
available.
See https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656409.html for context.
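
As a rough illustration of the difference (not taken from the test itself;
the function name is made up):

  /* issignaling() is a macro that <math.h> may or may not provide,
     depending on the libc; the GCC built-in needs no header at all.  */
  int is_snan (double x)
  {
    return __builtin_issignaling (x);
  }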


gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/signaling_2.f90: Adjust test.
* gfortran.dg/ieee/signaling_2_c.c: Adjust test.




0001-Fortran-switch-test-to-use-issignaling-built-in.patch
Description: Binary data


Re: [PATCH, gfortran] libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic

2024-07-05 Thread FX Coudert
> This part of the patch is quite old, but from the remaining log it looks 
> like I got an error here:
> On second thought, perhaps this did not require a fix. We can drop it.

I have addressed this: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656484.html
The test should now be run on all targets, regardless of the issignaling() 
macro availability.


> By the way, do we have some point of comparison from other ppc32 platforms, 
> Linux or BSD (for the recent gcc master)?

You can have a look at testresults here: 
https://gcc.gnu.org/pipermail/gcc-testresults/

FX

Re: [PATCH] x86, Darwin: Fix bootstrap for 32b multilibs/hosts.

2024-07-05 Thread Iain Sandoe



> On 5 Jul 2024, at 09:34, Iain Sandoe  wrote:
> 
> This is Darwin-local, and I would like to apply it today to restore
> bootstrap before my weekend test-runs,

Tested on x86_64-linux (m32/m64), x86_64-darwin17 (m32/m64),
i686-darwin17 (m32/m64), i686-darwin9 (m32/m64) and x86_64-darwin21 (m64);
pushed to trunk to restore bootstrap on the affected targets,
thanks
Iain

> but would welcome any comments
> or suggestions.
> 
> thanks
> Iain
> 
> --- 8< ---
> 
> r15-1735-ge62ea4fb8ffcab06ddd  contained changes that altered the
> codegen for 32b Darwin (whether hosted on 64b or as 32b host) such
> that the per function picbase load is called multiple times in some
> cases.  Darwin's back end is not expecting this (and indeed some of
> the handling depends on a single instance).
> 
> This fixes the issue by marking those instructions as not copyable
> (as suggested by Andrew Pinski).
> 
> The change is Darwin-specific.
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386.cc (ix86_cannot_copy_insn_p): New.
>   (TARGET_CANNOT_COPY_INSN_P): New.
> 
> Signed-off-by: Iain Sandoe 
> ---
> gcc/config/i386/i386.cc | 23 +++
> 1 file changed, 23 insertions(+)
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 99def8d4a77..f75250f79de 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -27025,6 +27025,29 @@ ix86_libm_function_max_error (unsigned cfn, 
> machine_mode mode,
> #undef TARGET_LIBM_FUNCTION_MAX_ERROR
> #define TARGET_LIBM_FUNCTION_MAX_ERROR ix86_libm_function_max_error
> 
> +#if TARGET_MACHO
> +static bool
> +ix86_cannot_copy_insn_p (rtx_insn *insn)
> +{
> +  if (TARGET_64BIT)
> +    return false;
> +
> +  rtx set = single_set (insn);
> +  if (set)
> +    {
> +      rtx src = SET_SRC (set);
> +      if (GET_CODE (src) == UNSPEC
> +          && XINT (src, 1) == UNSPEC_SET_GOT)
> +        return true;
> +    }
> +  return false;
> +}
> +
> +#undef TARGET_CANNOT_COPY_INSN_P
> +#define TARGET_CANNOT_COPY_INSN_P ix86_cannot_copy_insn_p
> +
> +#endif
> +
> #if CHECKING_P
> #undef TARGET_RUN_TARGET_SELFTESTS
> #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
> -- 
> 2.39.2 (Apple Git-143)
>