[gcc r15-1724] [PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-06-29 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:69bc5fb97dc3fada81869e00fa65d39f7def6acf

commit r15-1724-g69bc5fb97dc3fada81869e00fa65d39f7def6acf
Author: Maciej W. Rozycki 
Date:   Sat Jun 29 23:26:55 2024 +0100

[PR115565] cse: Don't use a valid regno for non-register in comparison_qty

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.

Using -1 makes the `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression and consequently make the
resulting code incorrectly execute or fail to execute conditional
blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.

The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"), and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"), where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore.  However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.

Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.

gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.

Diff:
---
 gcc/cse.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index c53deecbe54..65794ac5f2c 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -239,7 +239,7 @@ static int next_qty;
the constant being compared against, or zero if the comparison
is not against a constant.  `comparison_qty' holds the quantity
being compared against when the result is known.  If the comparison
-   is not with a register, `comparison_qty' is -1.  */
+   is not with a register, `comparison_qty' is INT_MIN.  */
 
 struct qty_table_elem
 {
@@ -4058,7 +4058,7 @@ record_jump_cond (enum rtx_code code, machine_mode mode, rtx op0, rtx op1)
   else
{
  ent->comparison_const = op1;
- ent->comparison_qty = -1;
+ ent->comparison_qty = INT_MIN;
}
 
   return;


[gcc r15-1881] ada: Make the names of uninstalled cross-gnattools consistent across builds

2024-07-07 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:d364c4ced8823bc41fda84f182e1d64e7870549e

commit r15-1881-gd364c4ced8823bc41fda84f182e1d64e7870549e
Author: Maciej W. Rozycki 
Date:   Sun Jul 7 15:04:51 2024 +0100

ada: Make the names of uninstalled cross-gnattools consistent across builds

We suffer from an inconsistency in the names of uninstalled gnattools
executables in cross-compiler configurations.  The cause is a recipe we
have:

ada.all.cross:
for tool in $(ADA_TOOLS) ; do \
  if [ -f $$tool$(exeext) ] ; \
  then \
$(MV) $$tool$(exeext) $$tool-cross$(exeext); \
  fi; \
done

the intent of which is to give the names of gnattools executables the
'-cross' suffix, consistently with the compiler drivers: 'gcc-cross',
'g++-cross', etc.

A problem with the recipe is that this 'make' target is called too early
in the build process, before gnattools have been made.  Consequently no
renames happen, and because they are conditional on the presence of the
individual executables, the recipe succeeds while doing nothing.

However if a target is requested later on such as 'make pdf' that does
not cause gnattools executables to be rebuilt, then 'ada.all.cross' does
succeed in renaming the executables already present in the build tree.
Then if the 'gnat' testsuite, which expects a non-suffixed
'gnatmake' executable, is run later on, it does not find the 'gnatmake-cross' executable
in the build tree and may either catastrophically fail or incorrectly
use a system-installed copy of 'gnatmake'.

Of course if a target is requested such as `make all' that does cause
gnattools executables to be rebuilt, then both suffixed and non-suffixed
uninstalled executables result.

Fix the problem by moving the renaming of gnattools to a separate 'make'
recipe, pasted into a new 'gnattools-cross-mv' target and the existing
legacy 'cross-gnattools' target.  Then invoke the new target explicitly
from the 'gnattools-cross' recipe in gnattools/.

Update the test harness accordingly, so that suffixed gnattools are used
in cross-compilation testsuite runs.

gcc/ada/
* gcc-interface/Make-lang.in (ada.all.cross): Move recipe to...
(GNATTOOLS_CROSS_MV): ... this new variable.
(cross-gnattools): Paste it here.
(gnattools-cross-mv): New target.

gnattools/
* Makefile.in (gnattools-cross): Also build 'gnattools-cross-mv'
in GCC_DIR.

gcc/testsuite/
* lib/gnat.exp (local_find_gnatmake, find_gnatclean): Use
'-cross' suffix where testing a cross-compiler.

Diff:
---
 gcc/ada/gcc-interface/Make-lang.in | 19 ---
 gcc/testsuite/lib/gnat.exp | 22 ++
 gnattools/Makefile.in  |  1 +
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/gcc/ada/gcc-interface/Make-lang.in 
b/gcc/ada/gcc-interface/Make-lang.in
index ebf1f70de78..b2841104651 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -780,6 +780,7 @@ regnattools:
 cross-gnattools: force
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools1-re
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools2
+   $(GNATTOOLS_CROSS_MV)
 
 canadian-gnattools: force
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools1-re
@@ -795,19 +796,23 @@ gnatlib gnatlib-sjlj gnatlib-zcx gnatlib-shared: force
   FORCE_DEBUG_ADAFLAGS="$(FORCE_DEBUG_ADAFLAGS)" \
   $@
 
+gnattools-cross-mv:
+   $(GNATTOOLS_CROSS_MV)
+
+GNATTOOLS_CROSS_MV=\
+  for tool in $(ADA_TOOLS) ; do \
+if [ -f $$tool$(exeext) ] ; \
+then \
+  $(MV) $$tool$(exeext) $$tool-cross$(exeext); \
+fi; \
+  done
+
 # use only for native compiler
 gnatlib_and_tools: gnatlib gnattools
 
 # Build hooks:
 
 ada.all.cross:
-   for tool in $(ADA_TOOLS) ; do \
- if [ -f $$tool$(exeext) ] ; \
- then \
-   $(MV) $$tool$(exeext) $$tool-cross$(exeext); \
- fi; \
-   done
-
 ada.start.encap:
 ada.rest.encap:
 ada.man:
diff --git a/gcc/testsuite/lib/gnat.exp b/gcc/testsuite/lib/gnat.exp
index 471f83e9844..c278cb7f044 100644
--- a/gcc/testsuite/lib/gnat.exp
+++ b/gcc/testsuite/lib/gnat.exp
@@ -199,12 +199,19 @@ proc prune_gnat_output { text } {
 # which prevent multilib from working, so define a new one.
 
 proc local_find_gnatmake {} {
+global target_triplet
 global tool_root_dir
+global host_triplet
 
 if ![is_remote host] {
-set file [lookfor_file $tool_root_dir gnatmake]
+   if { "$host_triplet" == "$target_triplet" } {
+   set gnatmake gnatmake
+   } else {
+   set gnatmake gnatmake-cross
+   }
+   set file [lookfor_file $tool_root_dir $gnatmake]
 if { $file == "" } {
-   s

[gcc r14-10485] [PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-07-22 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:323d010fa5d433e6eb5ec5124544f19fb4b4eee6

commit r14-10485-g323d010fa5d433e6eb5ec5124544f19fb4b4eee6
Author: Maciej W. Rozycki 
Date:   Sat Jun 29 23:26:55 2024 +0100

[PR115565] cse: Don't use a valid regno for non-register in comparison_qty

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.

Using -1 makes the `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression and consequently make the
resulting code incorrectly execute or fail to execute conditional
blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.

The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"), and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"), where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore.  However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.

Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.

gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.

(cherry picked from commit 69bc5fb97dc3fada81869e00fa65d39f7def6acf)

Diff:
---
 gcc/cse.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index c53deecbe547..65794ac5f2ca 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -239,7 +239,7 @@ static int next_qty;
the constant being compared against, or zero if the comparison
is not against a constant.  `comparison_qty' holds the quantity
being compared against when the result is known.  If the comparison
-   is not with a register, `comparison_qty' is -1.  */
+   is not with a register, `comparison_qty' is INT_MIN.  */
 
 struct qty_table_elem
 {
@@ -4058,7 +4058,7 @@ record_jump_cond (enum rtx_code code, machine_mode mode, rtx op0, rtx op1)
   else
{
  ent->comparison_const = op1;
- ent->comparison_qty = -1;
+ ent->comparison_qty = INT_MIN;
}
 
   return;


[gcc r13-8932] [PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-07-22 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:4ce7c81212c7819dfe6dbbe2399220fb12da6d71

commit r13-8932-g4ce7c81212c7819dfe6dbbe2399220fb12da6d71
Author: Maciej W. Rozycki 
Date:   Sat Jun 29 23:26:55 2024 +0100

[PR115565] cse: Don't use a valid regno for non-register in comparison_qty

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.

Using -1 makes the `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression and consequently make the
resulting code incorrectly execute or fail to execute conditional
blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.

The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"), and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"), where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore.  However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.

Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.

gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.

(cherry picked from commit 69bc5fb97dc3fada81869e00fa65d39f7def6acf)

Diff:
---
 gcc/cse.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 8fbda4ecc867..9a399e312f17 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -239,7 +239,7 @@ static int next_qty;
the constant being compared against, or zero if the comparison
is not against a constant.  `comparison_qty' holds the quantity
being compared against when the result is known.  If the comparison
-   is not with a register, `comparison_qty' is -1.  */
+   is not with a register, `comparison_qty' is INT_MIN.  */
 
 struct qty_table_elem
 {
@@ -4068,7 +4068,7 @@ record_jump_cond (enum rtx_code code, machine_mode mode, rtx op0,
   else
{
  ent->comparison_const = op1;
- ent->comparison_qty = -1;
+ ent->comparison_qty = INT_MIN;
}
 
   return;


[gcc r12-10633] [PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-07-22 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:8d8f804b18e4a38671957b3e4c239ef625506317

commit r12-10633-g8d8f804b18e4a38671957b3e4c239ef625506317
Author: Maciej W. Rozycki 
Date:   Sat Jun 29 23:26:55 2024 +0100

[PR115565] cse: Don't use a valid regno for non-register in comparison_qty

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.

Using -1 makes the `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression and consequently make the
resulting code incorrectly execute or fail to execute conditional
blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.

The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"), and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"), where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore.  However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.

Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.

gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.

(cherry picked from commit 69bc5fb97dc3fada81869e00fa65d39f7def6acf)

Diff:
---
 gcc/cse.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index ca53974810ed..32e8ea79980d 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -239,7 +239,7 @@ static int next_qty;
the constant being compared against, or zero if the comparison
is not against a constant.  `comparison_qty' holds the quantity
being compared against when the result is known.  If the comparison
-   is not with a register, `comparison_qty' is -1.  */
+   is not with a register, `comparison_qty' is INT_MIN.  */
 
 struct qty_table_elem
 {
@@ -4068,7 +4068,7 @@ record_jump_cond (enum rtx_code code, machine_mode mode, rtx op0,
   else
{
  ent->comparison_const = op1;
- ent->comparison_qty = -1;
+ ent->comparison_qty = INT_MIN;
}
 
   return;


[gcc r15-843] vax: Fix descriptions of the FP format options [PR79646]

2024-05-26 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:a7f6543f21303583356fd2d2d1805bffbecc1bc5

commit r15-843-ga7f6543f21303583356fd2d2d1805bffbecc1bc5
Author: Abe Skolnik 
Date:   Mon May 27 05:07:32 2024 +0100

vax: Fix descriptions of the FP format options [PR79646]

Replace "Target" with "Generate" consistently and place a hyphen in
"double-precision" as this is used as an adjective here.

gcc/ChangeLog:

PR target/79646
* config/vax/vax.opt (md, md-float, mg, mg-float): Correct
descriptions.

Diff:
---
 gcc/config/vax/vax.opt | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/vax/vax.opt b/gcc/config/vax/vax.opt
index 2cc66e543fe..fa2be78e9fa 100644
--- a/gcc/config/vax/vax.opt
+++ b/gcc/config/vax/vax.opt
@@ -20,19 +20,19 @@
 
 md
 Target RejectNegative InverseMask(G_FLOAT)
-Target DFLOAT double precision code.
+Generate DFLOAT double-precision code.
 
 md-float
 Target RejectNegative InverseMask(G_FLOAT)
-Target DFLOAT double precision code.
+Generate DFLOAT double-precision code.
 
 mg
 Target RejectNegative Mask(G_FLOAT)
-Generate GFLOAT double precision code.
+Generate GFLOAT double-precision code.
 
 mg-float
 Target RejectNegative Mask(G_FLOAT)
-Generate GFLOAT double precision code.
+Generate GFLOAT double-precision code.
 
 mgnu
 Target RejectNegative InverseMask(UNIX_ASM)


[gcc r15-844] VAX/doc: Fix issues with FP format option documentation

2024-05-26 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:314448fc65f40c98ee8bc02dfb54ea49d2f2c60d

commit r15-844-g314448fc65f40c98ee8bc02dfb54ea49d2f2c60d
Author: Maciej W. Rozycki 
Date:   Mon May 27 05:07:32 2024 +0100

VAX/doc: Fix issues with FP format option documentation

Use the correct names of the D_floating and G_floating data formats as
per the VAX ISA nomenclature[1].  Document the `-md', `-md-float', and
`-mg-float' options.

References:

[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
1.2 "Data Types", pp. 1-7, 1-9

gcc/
* doc/invoke.texi (Option Summary): Add `-md', `-md-float', and
`-mg-float' options.  Reorder, matching VAX Options.
(VAX Options): Reword the description of `-mg' option.  Add
`-md', `-md-float', and `-mg-float' options.

Diff:
---
 gcc/doc/invoke.texi | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c9d8f6b37b6..2cba380718b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1429,7 +1429,7 @@ See RS/6000 and PowerPC Options.
 -mbig-switch}
 
 @emph{VAX Options}
-@gccoptlist{-mg  -mgnu  -munix  -mlra}
+@gccoptlist{-munix  -mgnu  -md  -md-float  -mg  -mg-float  -mlra}
 
 @emph{Visium Options}
 @gccoptlist{-mdebug  -msim  -mfpu  -mno-fpu  -mhard-float  -msoft-float
@@ -34129,9 +34129,19 @@ ranges.
 Do output those jump instructions, on the assumption that the
 GNU assembler is being used.
 
+@opindex md
+@opindex md-float
+@item -md
+@itemx -md-float
+Use the D_floating data format for double-precision floating-point numbers
+instead of G_floating.
+
 @opindex mg
+@opindex mg-float
 @item -mg
-Output code for G-format floating-point numbers instead of D-format.
+@itemx -mg-float
+Use the G_floating data format for double-precision floating-point numbers
+instead of D_floating.
 
 @opindex mlra
 @opindex mno-lra


[gcc r15-5374] Alpha: Remove leftover `;;' for "unaligned_store"

2024-11-17 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:4a8eb5c6d87f3a1ccdf6eb248e6a7dd4cffbb7d4

commit r15-5374-g4a8eb5c6d87f3a1ccdf6eb248e6a7dd4cffbb7d4
Author: Maciej W. Rozycki 
Date:   Mon Nov 18 03:02:59 2024 +

Alpha: Remove leftover `;;' for "unaligned_store"

Remove stray `;;' from the middle of the introductory comment for the
"unaligned_store" expander, clearly a leftover from a previous
edition.

gcc/
* config/alpha/alpha.md (unaligned_store): Remove stray
`;;'.

Diff:
---
 gcc/config/alpha/alpha.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index bd92392878e2..e57a9d31e013 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -4201,7 +4201,7 @@
 })
 
 ;; For the unaligned byte and halfword cases, we use code similar to that
-;; in the ;; Architecture book, but reordered to lower the number of registers
+;; in the Architecture book, but reordered to lower the number of registers
 ;; required.  Operand 0 is the address.  Operand 1 is the data to store.
 ;; Operands 2, 3, and 4 are DImode temporaries, where operands 2 and 4 may
 ;; be the same temporary, if desired.  If the address is in a register,


[gcc r15-6079] testsuite: Mark gcc.c-torture/execute/memcpy-a?.c tests expensive

2024-12-10 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:34dfb30ca8dba6bc184e563b0ddc26a5239294e3

commit r15-6079-g34dfb30ca8dba6bc184e563b0ddc26a5239294e3
Author: Maciej W. Rozycki 
Date:   Tue Dec 10 14:24:18 2024 +

testsuite: Mark gcc.c-torture/execute/memcpy-a?.c tests expensive

These tests can take several seconds per compilation to complete, taking
total elapsed time measured in minutes.  Mark them as expensive so as to
let people skip them where they want to save on testing time.

gcc/testsuite/
* gcc.c-torture/execute/memcpy-a1.c: Mark as expensive.
* gcc.c-torture/execute/memcpy-a2.c: Likewise.
* gcc.c-torture/execute/memcpy-a4.c: Likewise.
* gcc.c-torture/execute/memcpy-a8.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c | 1 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c | 1 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c | 1 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c | 1 +
 4 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c
index 9730e39c85a0..cf5245ca7104 100644
--- a/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target run_expensive_tests } */
 /* { dg-timeout-factor 8 } */
 /* { dg-skip-if "memory full + time hog" { "avr-*-*" } } */
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c
index f3b5a42c8336..5ec6131745ec 100644
--- a/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target run_expensive_tests } */
 /* { dg-timeout-factor 8 } */
 /* { dg-skip-if "memory full + time hog" { "avr-*-*" } } */
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c
index 44c30126cff6..03da5c393777 100644
--- a/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target run_expensive_tests } */
 /* { dg-timeout-factor 8 } */
 /* { dg-skip-if "memory full + time hog" { "avr-*-*" } } */
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c
index baee56aee332..fc55b7836818 100644
--- a/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target run_expensive_tests } */
 /* { dg-timeout-factor 8 } */
 /* { dg-skip-if "memory full + time hog" { "avr-*-*" } } */


[gcc r15-5610] testsuite: Expand coverage for `__builtin_memcpy'

2024-11-23 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:75405ead52a987adb70576b66a4220ec490c523d

commit r15-5610-g75405ead52a987adb70576b66a4220ec490c523d
Author: Maciej W. Rozycki 
Date:   Sat Nov 23 14:02:43 2024 +

testsuite: Expand coverage for `__builtin_memcpy'

Expand coverage for `__builtin_memcpy', primarily for "cpymemM" block
copy pattern, although with smaller sizes open-coded sequences may be
produced instead.

This verifies block sizes in bytes from 1 to 64, across byte alignments
of 1, 2, 4, 8 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test cases) both for
the source and the destination, making sure all data is copied and no
data is changed outside the area meant to be written.

The choice of ranges for the parameters has come from the Alpha
backend, whose "cpymemM" pattern covers copies being made of up to 64
bytes and has various corner cases related to base alignment and the
misalignment within.

The test cases have turned invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded these
tests qualify for generic verification and do not have to be limited to
the Alpha-specific subset of the testsuite.

On the implementation side the tests turned out to be quite stressful
for GCC, and the original simpler version that just expanded all code inline
took a lot of time to complete compilation.  Depending on the target and
compilation options elapsed times up to 40 minutes (!) have been seen,
especially with GCC built at `-O0' for debugging purposes.

At the cost of increased complexity, where a pair of macros is required
per variant rather than just one, I have split the code into individual
functions forced not to be inlined, and this improved compilation times
considerably without losing coverage.

Example compilation times with reasonably fast POWER9@2.166GHz at `-O2'
optimization and GCC built at `-O2' for various targets:

mips-linux-gnu:23s
vax-netbsdelf: 29s
alphaev56-linux-gnu:   39s
alpha-linux-gnu:   43s
powerpc64le-linux-gnu: 48s

With GCC built at `-O0':

alphaev56-linux-gnu: 3m37s
alpha-linux-gnu: 3m54s

I have therefore set the timeout factor accordingly so as to take slower
test hosts into account.

gcc/testsuite/
* gcc.c-torture/execute/memcpy-a1.c: New file.
* gcc.c-torture/execute/memcpy-a2.c: New file.
* gcc.c-torture/execute/memcpy-a4.c: New file.
* gcc.c-torture/execute/memcpy-a8.c: New file.
* gcc.c-torture/execute/memcpy-ax.h: New file.

Diff:
---
 gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c |   4 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c |   4 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c |   4 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c |   4 +
 gcc/testsuite/gcc.c-torture/execute/memcpy-ax.h | 243 
 5 files changed, 259 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c
new file mode 100644
index ..086a7c0b7052
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a1.c
@@ -0,0 +1,4 @@
+/* { dg-timeout-factor 8 } */
+
+#define ax_t a1_t
+#include "memcpy-ax.h"
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c
new file mode 100644
index ..57f4aa64d951
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a2.c
@@ -0,0 +1,4 @@
+/* { dg-timeout-factor 8 } */
+
+#define ax_t a2_t
+#include "memcpy-ax.h"
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c
new file mode 100644
index ..274845044e4e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a4.c
@@ -0,0 +1,4 @@
+/* { dg-timeout-factor 8 } */
+
+#define ax_t a4_t
+#include "memcpy-ax.h"
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c
new file mode 100644
index ..59548ceae947
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-a8.c
@@ -0,0 +1,4 @@
+/* { dg-timeout-factor 8 } */
+
+#define ax_t a8_t
+#include "memcpy-ax.h"
diff --git a/gcc/testsuite/gcc.c-torture/execute/memcpy-ax.h 
b/gcc/testsuite/gcc.c-torture/execute/memcpy-ax.h
new file mode 100644
index ..3fcaca8ccaa9
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memcpy-ax.h
@@ -0,0 +1,243 @@
+typedef unsigned int __attribute__ ((mode (QI))) int08_t;
+typedef unsigned int __attribute__ ((mode (HI))) int16_t;
+typedef unsigned int __attribute__ ((mode (SI))) int32_t;
+typedef unsigned int __attribute__ ((mode (DI))) int64_t;
+
+typedef union
+  {
+int08_t v[88];
+  }
+a1

[gcc r15-5609] build: Discard obsolete references to $(GCC_PARTS)

2024-11-23 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:38cbee0b3c8da58c7e195c582867ffacee3b0850

commit r15-5609-g38cbee0b3c8da58c7e195c582867ffacee3b0850
Author: Maciej W. Rozycki 
Date:   Sat Nov 23 14:02:42 2024 +

build: Discard obsolete references to $(GCC_PARTS)

The $(GCC_PARTS) variable was deleted with the Makefile rework in commit
fa9585134f6f ("libgcc move to the top level")[1] back in 2007, and yet
the Ada and Modula 2 frontends added references to this variable later
on, with commit e972fd5281b7 ("[Ada] clean ups in Makefiles")[2] back in
2011 and commit 1eee94d35177 ("Merge modula-2 front end onto gcc.") back
in 2022 respectively.

I guess it's because the frontends lived too long externally.  Discard
the references then, they serve no purpose nowadays.

References:

[1] 


[2] 


gcc/ada/
* gcc-interface/Make-lang.in (gnattools): Remove $(GCC_PARTS).

gcc/m2/
* Make-lang.in (m2 modula-2 modula2): Remove $(GCC_PARTS).

Diff:
---
 gcc/ada/gcc-interface/Make-lang.in | 2 +-
 gcc/m2/Make-lang.in| 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/Make-lang.in 
b/gcc/ada/gcc-interface/Make-lang.in
index 0b8f2dd56406..f3009f1d612c 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -793,7 +793,7 @@ gnatbind$(exeext): ada/b_gnatb.o $(CONFIG_H) 
$(GNATBIND_OBJS) $(EXTRA_HOST_OBJS)
+$(GCC_LINK) -o $@ $(CFLAGS) ada/b_gnatb.o $(GNATBIND_OBJS) 
$(EXTRA_HOST_OBJS) ggc-none.o libcommon-target.a $(LIBS) $(SYSLIBS) $(GNATLIB)
 
 # use target-gcc target-gnatmake target-gnatbind target-gnatlink
-gnattools: $(GCC_PARTS) $(CONFIG_H) prefix.o force
+gnattools: $(CONFIG_H) prefix.o force
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools1
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools2
 
diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
index e2a152f78d76..7515a9b10192 100644
--- a/gcc/m2/Make-lang.in
+++ b/gcc/m2/Make-lang.in
@@ -65,8 +65,7 @@ RSTSRC =  $(srcdir)/doc/gm2.texi \
   m2/Builtins.rst
 
 # Define the names for selecting modula-2 in LANGUAGES.
-m2 modula-2 modula2: gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext) \
- $(GCC_PASSES) $(GCC_PARTS)
+m2 modula-2 modula2: gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext) $(GCC_PASSES)
 m2.serial = cc1gm2$(exeext)
 
 m2.tags: force


[gcc r15-6833] Alpha: Restore frame pointer last in `builtin_longjmp' [PR64242]

2025-01-12 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:3cf0e6ab2aa9e7cb9a406079ff19856a6461d9f0

commit r15-6833-g3cf0e6ab2aa9e7cb9a406079ff19856a6461d9f0
Author: Maciej W. Rozycki 
Date:   Sun Jan 12 16:48:53 2025 +

Alpha: Restore frame pointer last in `builtin_longjmp' [PR64242]

Add similar arrangements to `builtin_longjmp' for Alpha as with commit
71b144289c1c ("re PR middle-end/64242 (Longjmp expansion incorrect)")
and commit 511ed59d0b04 ("Fix PR64242 - Longjmp expansion incorrect"),
so as to restore the frame pointer last, so that accesses to a local
buffer supplied can still be fulfilled with memory accesses via the
original frame pointer, fixing:

FAIL: gcc.c-torture/execute/pr64242.c   -O0  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test

and adding no regressions in `alpha-linux-gnu' testing.

gcc/
PR middle-end/64242
* config/alpha/alpha.md (`builtin_longjmp'): Restore frame
pointer last.  Add frame clobber and schedule blockage.

Diff:
---
 gcc/config/alpha/alpha.md | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 35c8030422f5..178ce992206d 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -5005,14 +5005,28 @@
   rtx pv = gen_rtx_REG (Pmode, 27);
 
   /* This bit is the same as expand_builtin_longjmp.  */
+
   emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
   emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
-  emit_move_insn (hard_frame_pointer_rtx, fp);
+
   emit_move_insn (pv, lab);
+
+  /* Restore the frame pointer and stack pointer.  We must use a
+ temporary since the setjmp buffer may be a local.  */
+  fp = copy_to_reg (fp);
   emit_stack_restore (SAVE_NONLOCAL, stack);
+
+  /* Ensure the frame pointer move is not optimized.  */
+  emit_insn (gen_blockage ());
+  emit_clobber (hard_frame_pointer_rtx);
+  emit_clobber (frame_pointer_rtx);
+  emit_move_insn (hard_frame_pointer_rtx, fp);
+
   emit_use (hard_frame_pointer_rtx);
   emit_use (stack_pointer_rtx);
 
+  /* End of the bit corresponding to expand_builtin_longjmp.  */
+
   /* Load the label we are jumping through into $27 so that we know
  where to look for it when we get back to setjmp's function for
  restoring the gp.  */


[gcc r15-6834] Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse

2025-01-12 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:19fdb9f3792d4c3c9ff3d18dc4566bb16e62de60

commit r15-6834-g19fdb9f3792d4c3c9ff3d18dc4566bb16e62de60
Author: Maciej W. Rozycki 
Date:   Sun Jan 12 16:48:53 2025 +

Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse

Contrary to user documentation the `-mbwx', `-mcix', `-mfix', `-mmax'
feature options and their inverse forms are ignored whenever `-mcpu='
option is in effect, either by having been given explicitly or where
configured as the default such as with the `alphaev56-linux-gnu' target.
In the latter case there is no way to change the settings these options
are supposed to tweak other than with `-mcpu=' and the settings cannot
be individually controlled, making all the feature options permanently
inactive.

It seems a regression from commit 7816bea0e23b ("config.gcc: Reorganize
--with-cpu logic.") back in 2003, which replaced the setting of the
default feature mask with the setting of the default CPU across a few
targets, and the complementing logic in the Alpha backend wasn't updated
accordingly.

Fix this by making the individual feature options take precedence over
`-mcpu='.  Add test cases to verify this is the case, and to cover the
defaults as well for the boundary cases.

This has a drawback where the order of the options is ignored between
`-mcpu=' and these individual options, so e.g. `-mno-bwx -mcpu=ev6' will
keep the BWX feature disabled even though `-mcpu=ev6' comes later in the
command line.  This may affect some scenarios involving user overrides
such as with CFLAGS passed to `configure' and `make' invocations.  I do
believe it has been our practice anyway for more fine-grained options to
override group options regardless of their relative order on the command
line and in any case using `-mcpu=ev6 -mbwx' as the override will do the
right thing if required, canceling any previous `-mno-bwx'.

This has been spotted with `alphaev56-linux-gnu' target verification and
a recently added test case:

FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\sldq_u\\s 2
FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\smskwh\\s 1
FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\smskwl\\s 1
FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\sstq_u\\s 2

(and similarly for the remaining optimization levels covered) which this
fix has addressed.

gcc/
* config/alpha/alpha.cc (alpha_option_override): Ignore CPU
flags corresponding to features the enabling or disabling of
which has been requested with an individual feature option.

gcc/testsuite/
* gcc.target/alpha/target-bwx-1.c: New file.
* gcc.target/alpha/target-bwx-2.c: New file.
* gcc.target/alpha/target-bwx-3.c: New file.
* gcc.target/alpha/target-bwx-4.c: New file.
* gcc.target/alpha/target-cix-1.c: New file.
* gcc.target/alpha/target-cix-2.c: New file.
* gcc.target/alpha/target-cix-3.c: New file.
* gcc.target/alpha/target-cix-4.c: New file.
* gcc.target/alpha/target-fix-1.c: New file.
* gcc.target/alpha/target-fix-2.c: New file.
* gcc.target/alpha/target-fix-3.c: New file.
* gcc.target/alpha/target-fix-4.c: New file.
* gcc.target/alpha/target-max-1.c: New file.
* gcc.target/alpha/target-max-2.c: New file.
* gcc.target/alpha/target-max-3.c: New file.
* gcc.target/alpha/target-max-4.c: New file.

Diff:
---
 gcc/config/alpha/alpha.cc | 5 +++--
 gcc/testsuite/gcc.target/alpha/target-bwx-1.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-bwx-2.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-bwx-3.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-bwx-4.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-cix-1.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-cix-2.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-cix-3.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-cix-4.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-fix-1.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-fix-2.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-fix-3.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-fix-4.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-max-1.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-max-2.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-max-3.c | 6 ++
 gcc/testsuite/gcc.target/alpha/target-max-4.c | 6 ++
 17 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 030dc7728859..958a785ffd0e 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -460,8 +460,9 @@ alpha_option_override (void)

[gcc r15-6835] Alpha: Optimize block moves coming from longword-aligned source

2025-01-12 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:4e557210b7f9fd669ff66c6958327eb2d4262d80

commit r15-6835-g4e557210b7f9fd669ff66c6958327eb2d4262d80
Author: Maciej W. Rozycki 
Date:   Sun Jan 12 16:48:53 2025 +

Alpha: Optimize block moves coming from longword-aligned source

Now that we have proper alignment determination for block moves in place
the case of copying a block of longword-aligned data has become real, so
implement the merging of loaded data from pairs of SImode registers into
single DImode registers for the purpose of using with unaligned stores
efficiently, as suggested by a comment in `alpha_expand_block_move' and
discard the comment.  Provide test cases accordingly.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Merge loaded
data from pairs of SImode registers into single DImode registers
if to be used with unaligned stores.

gcc/testsuite/
* gcc.target/alpha/memcpy-si-aligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src-bwx.c: New file.

Diff:
---
 gcc/config/alpha/alpha.cc  | 45 +++
 gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c | 16 +++
 .../gcc.target/alpha/memcpy-si-unaligned-dst.c | 16 +++
 .../gcc.target/alpha/memcpy-si-unaligned-src-bwx.c | 11 +
 .../gcc.target/alpha/memcpy-si-unaligned-src.c | 15 +++
 .../gcc.target/alpha/memcpy-si-unaligned.c | 51 ++
 6 files changed, 146 insertions(+), 8 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 958a785ffd0e..8ec9e8c5d399 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -3931,14 +3931,44 @@ alpha_expand_block_move (rtx operands[])
 {
   words = bytes / 4;
 
-  for (i = 0; i < words; ++i)
-   data_regs[nregs + i] = gen_reg_rtx (SImode);
+  /* Load an even quantity of SImode data pieces only.  */
+  unsigned int hwords = words / 2;
+  for (i = 0; i / 2 < hwords; ++i)
+   {
+ data_regs[nregs + i] = gen_reg_rtx (SImode);
+ emit_move_insn (data_regs[nregs + i],
+ adjust_address (orig_src, SImode, ofs + i * 4));
+   }
 
-  for (i = 0; i < words; ++i)
-   emit_move_insn (data_regs[nregs + i],
-   adjust_address (orig_src, SImode, ofs + i * 4));
+  /* If we'll be using unaligned stores, merge data from pairs
+of SImode registers into DImode registers so that we can
+store it more efficiently via quadword unaligned stores.  */
+  unsigned int j;
+  if (dst_align < 32)
+   for (i = 0, j = 0; i < words / 2; ++i, j = i * 2)
+ {
+   rtx hi = expand_simple_binop (DImode, ASHIFT,
+ data_regs[nregs + j + 1],
+ GEN_INT (32), NULL_RTX,
+ 1, OPTAB_WIDEN);
+   data_regs[nregs + i] = expand_simple_binop (DImode, IOR, hi,
+   data_regs[nregs + j],
+   NULL_RTX,
+   1, OPTAB_WIDEN);
+ }
+  else
+   j = i;
 
-  nregs += words;
+  /* Take care of any remaining odd trailing SImode data piece.  */
+  if (j < words)
+   {
+ data_regs[nregs + i] = gen_reg_rtx (SImode);
+ emit_move_insn (data_regs[nregs + i],
+ adjust_address (orig_src, SImode, ofs + j * 4));
+ ++i;
+   }
+
+  nregs += i;
   bytes -= words * 4;
   ofs += words * 4;
 }
@@ -4057,13 +4087,12 @@ alpha_expand_block_move (rtx operands[])
 }
 
   /* Due to the above, this won't be aligned.  */
-  /* ??? If we have more than one of these, consider constructing full
- words in registers and using alpha_expand_unaligned_store_words.  */
   while (i < nregs && GET_MODE (data_regs[i]) == SImode)
 {
   alpha_expand_unaligned_store (orig_dst, data_regs[i], 4, ofs);
   ofs += 4;
   i++;
+  gcc_assert (i == nregs || GET_MODE (data_regs[i]) != SImode);
 }
 
   if (dst_align >= 16)
diff --git a/gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c 
b/gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c
new file mode 100644
index ..2572a3187e9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+unsigned int aligned_src_si[17] = { [0 ... 16] = 0xeaebeced };
+unsigned int aligned_dst_si[17] = { [0 ... 16] = 0xdcdbdad9 };
+
+void
+memcpy_aligned_data_si

[gcc r15-6836] Alpha: Fix a block move pessimisation with zero-extension after LDWU

2025-01-12 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:ed8cd42d138fa048e0c0eff1ea28b39f5abe1c29

commit r15-6836-ged8cd42d138fa048e0c0eff1ea28b39f5abe1c29
Author: Maciej W. Rozycki 
Date:   Sun Jan 12 16:48:54 2025 +

Alpha: Fix a block move pessimisation with zero-extension after LDWU

For the BWX case we have a pessimisation in `alpha_expand_block_move'
for HImode loads where we place the data loaded into a HImode register
as well, therefore losing information that indeed the data loaded has
already been zero-extended to the full DImode width of the register.
Later on when we store this data in QImode quantities into an unaligned
destination, we zero-extend it again for the purpose of right-shifting,
such as with the test case included producing code at `-O2' as follows:

ldah $2,unaligned_src_hi($29)   !gprelhigh
lda $1,unaligned_src_hi($2) !gprellow
ldwu $6,unaligned_src_hi($2)!gprellow
ldwu $5,2($1)
ldwu $4,4($1)
bis $31,$31,$31
zapnot $6,3,$3  # Redundant!
ldbu $7,6($1)
zapnot $5,3,$2  # Redundant!
stb $6,0($16)
zapnot $4,3,$1  # Redundant!
stb $5,2($16)
srl $3,8,$3
stb $4,4($16)
srl $2,8,$2
stb $3,1($16)
srl $1,8,$1
stb $2,3($16)
stb $1,5($16)
stb $7,6($16)

The non-BWX case is unaffected, because there we use byte insertion, so
we don't care that data is held in a HImode register.

Address this by making the holding RTX a HImode subreg of the original
DImode register, which the RTL passes can then see through and eliminate
the zero-extension where otherwise required, resulting in this shortened
code:

ldah $2,unaligned_src_hi($29)   !gprelhigh
lda $1,unaligned_src_hi($2) !gprellow
ldwu $4,unaligned_src_hi($2)!gprellow
ldwu $3,2($1)
ldwu $2,4($1)
bis $31,$31,$31
srl $4,8,$6
ldbu $1,6($1)
srl $3,8,$5
stb $4,0($16)
stb $6,1($16)
srl $2,8,$4
stb $3,2($16)
stb $5,3($16)
stb $2,4($16)
stb $4,5($16)
stb $1,6($16)

While at it reformat the enclosing do-while statement according to the
GNU Coding Standards, observing that in this case it does not obfuscate
the change owing to the odd original indentation.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode
subreg of a DImode register to hold data from an aligned HImode
load.

Diff:
---
 gcc/config/alpha/alpha.cc   | 17 +++--
 .../gcc.target/alpha/memcpy-hi-unaligned-dst.c  | 16 
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 8ec9e8c5d399..6965ece16d0b 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -3999,14 +3999,19 @@ alpha_expand_block_move (rtx operands[])
   if (bytes >= 2)
 {
   if (src_align >= 16)
-   {
- do {
-   data_regs[nregs++] = tmp = gen_reg_rtx (HImode);
-   emit_move_insn (tmp, adjust_address (orig_src, HImode, ofs));
+   do
+ {
+   tmp = gen_reg_rtx (DImode);
+   emit_move_insn (tmp,
+   expand_simple_unop (DImode, SET,
+   adjust_address (orig_src,
+   HImode, ofs),
+   NULL_RTX, 1));
+   data_regs[nregs++] = gen_rtx_SUBREG (HImode, tmp, 0);
bytes -= 2;
ofs += 2;
- } while (bytes >= 2);
-   }
+ }
+   while (bytes >= 2);
   else if (! TARGET_BWX)
{
  data_regs[nregs++] = tmp = gen_reg_rtx (HImode);
diff --git a/gcc/testsuite/gcc.target/alpha/memcpy-hi-unaligned-dst.c 
b/gcc/testsuite/gcc.target/alpha/memcpy-hi-unaligned-dst.c
new file mode 100644
index ..4e3c02f5b906
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/memcpy-hi-unaligned-dst.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+unsigned short unaligned_src_hi[4];
+
+void
+memcpy_unaligned_dst_hi (void *dst)
+{
+  __builtin_memcpy (dst, unaligned_src_hi, 7);
+}
+
+/* { dg-final { scan-assembler-times "\\sldwu\\s" 3 } } */
+/* { dg-final { scan-assembler-times "\\sldbu\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstb\\s" 7 } } */
+/* { dg-final { scan-assembler-not "\\szapnot\\s" } } */


[gcc r15-6832] Alpha: Add memory clobbers to `builtin_longjmp' expansion

2025-01-12 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:46861167f548ec622918d95acd2424b64f56797d

commit r15-6832-g46861167f548ec622918d95acd2424b64f56797d
Author: Maciej W. Rozycki 
Date:   Sun Jan 12 16:48:53 2025 +

Alpha: Add memory clobbers to `builtin_longjmp' expansion

Add the same memory clobbers to `builtin_longjmp' for Alpha as with
commit 41439bf6a647 ("builtins.c (expand_builtin_longjmp): Added two
memory clobbers."), to prevent instructions that access memory via the
frame or stack pointer from being moved across the write to the frame
pointer.

gcc/
* config/alpha/alpha.md (builtin_longjmp): Add memory clobbers.

Diff:
---
 gcc/config/alpha/alpha.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 376c4cba90c5..35c8030422f5 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -5005,6 +5005,8 @@
   rtx pv = gen_rtx_REG (Pmode, 27);
 
   /* This bit is the same as expand_builtin_longjmp.  */
+  emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
+  emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
   emit_move_insn (hard_frame_pointer_rtx, fp);
   emit_move_insn (pv, lab);
   emit_stack_restore (SAVE_NONLOCAL, stack);


[gcc r15-6436] Alpha: Permit constant zero source for "insvmisaligndi"

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:3c99ea19d26a2458302c54a33fbd17abfbef787a

commit r15-6436-g3c99ea19d26a2458302c54a33fbd17abfbef787a
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:39 2024 +

Alpha: Permit constant zero source for "insvmisaligndi"

Eliminate a redundant bitwise inclusive OR operation on the insertion of
constant zero into a bit-field, improving code produced at `-O2' from an
output sequence such as:

mov $31,$3  # Redundant!
ldq_u $1,7($16)
insqh $3,$16,$3 # Redundant!
ldq_u $2,0($16)
mskqh $1,$16,$1
mskql $2,$16,$2
bis $1,$3,$1# Redundant!
stq_u $1,7($16)
stq_u $2,0($16)
ret $31,($26),1

to:

ldq_u $2,7($16)
ldq_u $1,0($16)
mskqh $2,$16,$2
stq_u $2,7($16)
mskql $1,$16,$1
stq_u $1,0($16)
ret $31,($26),1

for a quadword unaligned store operation.  As shown in the example this
only triggers for the high-part store (and therefore only for 2-byte,
4-byte, and 8-byte stores), because `insXl' insns are fully expressed in
terms of RTL and therefore the insertion of zero is eliminated in later
RTL passes, however corresponding `insXh' insns are unspecs only, making
them impossible to see through.

We can get this optimal code right from expand though, given that our handler
for "insvmisaligndi", i.e. `alpha_expand_unaligned_store', has explicit
provisions for `const0_rtx' source.

gcc/
* config/alpha/alpha.md (insvmisaligndi): Use "reg_or_0_operand"
rather than "register_operand" for operand 3.

gcc/testsuite/
* gcc.target/alpha/stlx0.c: New file.
* gcc.target/alpha/stqx0.c: New file.
* gcc.target/alpha/stwx0.c: New file.
* gcc.target/alpha/stwx0-bwx.c: New file.

Diff:
---
 gcc/config/alpha/alpha.md  |  2 +-
 gcc/testsuite/gcc.target/alpha/stlx0.c | 28 
 gcc/testsuite/gcc.target/alpha/stqx0.c | 28 
 gcc/testsuite/gcc.target/alpha/stwx0-bwx.c | 19 +++
 gcc/testsuite/gcc.target/alpha/stwx0.c | 28 
 5 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 22ea4db057c3..2faa94252573 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -4626,7 +4626,7 @@
   [(set (zero_extract:DI (match_operand:BLK 0 "memory_operand")
 (match_operand:DI 1 "const_int_operand")
 (match_operand:DI 2 "const_int_operand"))
-   (match_operand:DI 3 "register_operand"))]
+   (match_operand:DI 3 "reg_or_0_operand"))]
   ""
 {
   /* We can do 16, 32 and 64 bit fields, if aligned on byte boundaries.  */
diff --git a/gcc/testsuite/gcc.target/alpha/stlx0.c 
b/gcc/testsuite/gcc.target/alpha/stlx0.c
new file mode 100644
index ..876eceb1cae4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/stlx0.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+typedef struct { int v __attribute__ ((packed)); } intx;
+
+void
+stlx0 (intx *p)
+{
+  p->v = 0;
+}
+
+/* Expect assembly such as:
+
+   ldq_u $2,3($16)
+   ldq_u $1,0($16)
+   msklh $2,$16,$2
+   stq_u $2,3($16)
+   mskll $1,$16,$1
+   stq_u $1,0($16)
+
+   without any INSLH, INSLL, or BIS instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sldq_u\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\smsklh\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\smskll\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:bis|inslh|insll)\\s" } } */
diff --git a/gcc/testsuite/gcc.target/alpha/stqx0.c 
b/gcc/testsuite/gcc.target/alpha/stqx0.c
new file mode 100644
index ..042cdf0749fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/stqx0.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+typedef struct { long v __attribute__ ((packed)); } longx;
+
+void
+stqx0 (longx *p)
+{
+  p->v = 0;
+}
+
+/* Expect assembly such as:
+
+   ldq_u $2,7($16)
+   ldq_u $1,0($16)
+   mskqh $2,$16,$2
+   stq_u $2,7($16)
+   mskql $1,$16,$1
+   stq_u $1,0($16)
+
+   without any INSQH, INSQL, or BIS instructions.  */
+
+/* { dg-final { scan-assembler-times "\\sldq_u\\s" 2 } } */
+/* { dg-final { scan-assembler-times "\\smskqh\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\smskql\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:bis|insqh|insql)\\s" } } */

[gcc r15-6438] Alpha: Adjust MEM alignment for block clear [PR115459]

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:2984a3fac3d6b98e2cd6d7ee1c701159be86af78

commit r15-6438-g2984a3fac3d6b98e2cd6d7ee1c701159be86af78
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:40 2024 +

Alpha: Adjust MEM alignment for block clear [PR115459]

By inference it appears to me that the same fix for PR target/115459
needs to be applied to the block clear operation that has been done for
block move, as implemented by commit ccfe71518039 ("[alpha] adjust MEM
alignment for block move [PR115459]").

gcc/
PR target/115459
* config/alpha/alpha.cc (alpha_expand_block_clear): Adjust MEM
to match inferred alignment.

Diff:
---
 gcc/config/alpha/alpha.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 58da4a886321..7c28743f2ee3 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -4076,6 +4076,12 @@ alpha_expand_block_clear (rtx operands[])
   else if (a >= 16)
align = a, alignofs = 2 - c % 2;
}
+
+  if (MEM_P (orig_dst) && MEM_ALIGN (orig_dst) < align)
+   {
+ orig_dst = shallow_copy_rtx (orig_dst);
+ set_mem_align (orig_dst, align);
+   }
 }
 
   /* Handle an unaligned prefix first.  */


[gcc r15-6434] testsuite: Expand coverage for `__builtin_memset' with 0

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:5a089689a29173cbd1d4eeb93d6e3861890fde18

commit r15-6434-g5a089689a29173cbd1d4eeb93d6e3861890fde18
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:39 2024 +

testsuite: Expand coverage for `__builtin_memset' with 0

Expand coverage for `__builtin_memset' for the special case of clearing
a block, primarily for "setmemM" block set pattern, though with smaller
sizes open-coded sequences may be produced instead.

This verifies block sizes in bytes from 1 to 64 across byte alignments
of 1, 2, 4, 8 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test case), making
sure all the intended area is cleared and no data is changed outside it.

These choice of the ranges for the parameters has come from the Alpha
backend, whose "setmemM" pattern has various corner cases related to
base alignment and the misalignment within.

The test case has turned invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded this
test qualifies for generic verification and does not have to be limited
to the Alpha-specific subset of the testsuite.

Just as with `__builtin_memcpy' tests this code turned out to require
quite a lot of time to compile, although a bit less than the former.

Example compilation times with reasonably fast POWER9@2.166GHz at `-O2'
optimization and GCC built at `-O2' for various targets:

mips-linux-gnu:19s
vax-netbsdelf: 27s
alphaev56-linux-gnu:   30s
alpha-linux-gnu:   31s
powerpc64le-linux-gnu: 47s

With GCC built at `-O0':

alphaev56-linux-gnu: 2m59s
alpha-linux-gnu: 3m06s

I have therefore set the timeout factor accordingly so as to take slower
test hosts into account.

gcc/testsuite/
* gcc.c-torture/execute/memclr.c: New file.

Diff:
---
 gcc/testsuite/gcc.c-torture/execute/memclr.c | 233 +++
 1 file changed, 233 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/execute/memclr.c 
b/gcc/testsuite/gcc.c-torture/execute/memclr.c
new file mode 100644
index ..f45adb5339c9
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memclr.c
@@ -0,0 +1,233 @@
+/* { dg-require-effective-target run_expensive_tests } */
+/* { dg-timeout-factor 4 } */
+/* { dg-skip-if "memory full + time hog" { "avr-*-*" } } */
+
+typedef unsigned int __attribute__ ((mode (QI))) int08_t;
+typedef unsigned int __attribute__ ((mode (HI))) int16_t;
+typedef unsigned int __attribute__ ((mode (SI))) int32_t;
+typedef unsigned int __attribute__ ((mode (DI))) int64_t;
+
+typedef union
+  {
+int08_t v[88];
+  }
+a1_t;
+
+typedef union
+  {
+int08_t v[88];
+int16_t a;
+  }
+a2_t;
+
+typedef union
+  {
+int08_t v[88];
+int32_t a;
+  }
+a4_t;
+
+typedef union
+  {
+int08_t v[88];
+int64_t a;
+  }
+a8_t;
+
+#define MEMCLR_DEFINE_ONE(align, offset, count)
\
+  static void __attribute__ ((noinline))   \
+  memclr_check_one_ ## align ## offset ## count (void) \
+{  \
+  static a ## align ## _t dst = {{ [0 ... 87] = 0xaa }};   \
+  int i;   \
+   \
+  __builtin_memset (dst.v + 8 + offset, 0, count); \
+  asm ("" : : : "memory"); \
+  for (i = 0; i < 8 + offset; i++) \
+   if (dst.v[i] != 0xaa)   \
+ __builtin_abort ();   \
+  for (; i < 8 + offset + count; i++)  \
+   if (dst.v[i] != 0x00)   \
+ __builtin_abort ();   \
+  for (; i < sizeof (dst.v); i++)  \
+   if (dst.v[i] != 0xaa)   \
+ __builtin_abort ();   \
+}
+
+#define MEMCLR_DEFINE_ONE_ALIGN_OFFSET(align, offset)  \
+  MEMCLR_DEFINE_ONE (align, offset,  1)
\
+  MEMCLR_DEFINE_ONE (align, offset,  2)
\
+  MEMCLR_DEFINE_ONE (align, offset,  3)
\
+  MEMCLR_DEFINE_ONE (align, offset,  4)
\
+  MEMCLR_DEFINE_ONE (align, offset,  5)
\
+  MEMCLR_DEFINE_ONE (align, offset,  6)
\
+  MEMCLR_DEFINE_ONE (align, offset,  7) 

[gcc r15-6433] Alpha/testsuite: Run target testing over all the usual optimization levels

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:46cb538cc0ad58936748538166562e8e2a31487e

commit r15-6433-g46cb538cc0ad58936748538166562e8e2a31487e
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:39 2024 +

Alpha/testsuite: Run target testing over all the usual optimization levels

Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
Alpha testsuite as several targets already do.  Add `-Og -g' and `-Oz'
as well via ADDITIONAL_TORTURE_OPTIONS to expand coverage.  Adjust test
options across individual test cases accordingly where required.

Discard base-2.c, cix-2.c, and max-2.c test cases as they merely are
optimization variants of base-1.c, cix-1.c, and max-1.c respectively,
run at `-O2' rather than the default level (`-O0'), now covered by the
framework with the latter ones in a generic way.

Old test results:

=== gcc Summary ===

# of expected passes44

vs new ones:
=== gcc Summary ===

# of expected passes364
# of unsupported tests  5

gcc/testsuite/
* gcc.target/alpha/alpha.exp: Use `gcc-dg-runtest' rather than
`dg-runtest'.  Add `-Og -g' and `-Oz' variants via
ADDITIONAL_TORTURE_OPTIONS.
* gcc.target/alpha/2715-1.c: Adjust test options
accordingly.
* gcc.target/alpha/20011018-1.c: Likewise.
* gcc.target/alpha/980217-1.c: Likewise.
* gcc.target/alpha/asm-1.c: Likewise.
* gcc.target/alpha/pr105209.c: Likewise.
* gcc.target/alpha/pr106966.c: Likewise.
* gcc.target/alpha/pr115297.c: Likewise.
* gcc.target/alpha/pr115526.c: Likewise.
* gcc.target/alpha/pr19518.c: Likewise.
* gcc.target/alpha/pr22093.c: Likewise.
* gcc.target/alpha/pr24178.c: Likewise.
* gcc.target/alpha/pr39740.c: Likewise.
* gcc.target/alpha/pr42113.c: Likewise.
* gcc.target/alpha/pr42269-1.c: Likewise.
* gcc.target/alpha/pr42448-1.c: Likewise.
* gcc.target/alpha/pr42448-2.c: Likewise.
* gcc.target/alpha/pr42774.c: Likewise.
* gcc.target/alpha/pr61586.c: Likewise.
* gcc.target/alpha/pr66140.c: Likewise.
* gcc.target/alpha/pr83628-1.c: Likewise.
* gcc.target/alpha/pr83628-2.c: Likewise.
* gcc.target/alpha/pr83628-3.c: Likewise.
* gcc.target/alpha/pr86984.c: Likewise.
* gcc.target/alpha/sqrt.c: Likewise.
* gcc.target/alpha/base-2.c: Remove file.
* gcc.target/alpha/cix-2.c: Remove file.
* gcc.target/alpha/max-2.c: Remove file.

Diff:
---
 gcc/testsuite/gcc.target/alpha/2715-1.c | 2 +-
 gcc/testsuite/gcc.target/alpha/20011018-1.c | 2 +-
 gcc/testsuite/gcc.target/alpha/980217-1.c   | 2 +-
 gcc/testsuite/gcc.target/alpha/alpha.exp| 4 +++-
 gcc/testsuite/gcc.target/alpha/asm-1.c  | 2 +-
 gcc/testsuite/gcc.target/alpha/base-2.c | 5 -
 gcc/testsuite/gcc.target/alpha/cix-2.c  | 5 -
 gcc/testsuite/gcc.target/alpha/max-2.c  | 5 -
 gcc/testsuite/gcc.target/alpha/pr105209.c   | 2 +-
 gcc/testsuite/gcc.target/alpha/pr106966.c   | 2 +-
 gcc/testsuite/gcc.target/alpha/pr115297.c   | 2 +-
 gcc/testsuite/gcc.target/alpha/pr115526.c   | 2 +-
 gcc/testsuite/gcc.target/alpha/pr19518.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr22093.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr24178.c| 3 ++-
 gcc/testsuite/gcc.target/alpha/pr39740.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr42113.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr42269-1.c  | 3 ++-
 gcc/testsuite/gcc.target/alpha/pr42448-1.c  | 2 +-
 gcc/testsuite/gcc.target/alpha/pr42448-2.c  | 2 +-
 gcc/testsuite/gcc.target/alpha/pr42774.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr61586.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr66140.c| 2 +-
 gcc/testsuite/gcc.target/alpha/pr83628-1.c  | 3 ++-
 gcc/testsuite/gcc.target/alpha/pr83628-2.c  | 3 ++-
 gcc/testsuite/gcc.target/alpha/pr83628-3.c  | 3 ++-
 gcc/testsuite/gcc.target/alpha/pr86984.c| 2 +-
 gcc/testsuite/gcc.target/alpha/sqrt.c   | 2 +-
 28 files changed, 32 insertions(+), 40 deletions(-)

diff --git a/gcc/testsuite/gcc.target/alpha/2715-1.c 
b/gcc/testsuite/gcc.target/alpha/2715-1.c
index 3ff15604eb96..8b81022a315f 100644
--- a/gcc/testsuite/gcc.target/alpha/2715-1.c
+++ b/gcc/testsuite/gcc.target/alpha/2715-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mieee" } */
+/* { dg-options "-mieee" } */
 
 float foo(unsigned char n)
 {
diff --git a/gcc/testsuite/gcc.target/alpha/20011018-1.c 
b/gcc/testsuite/gcc.target/alpha/20011018-1.c
index e01fcf5c4ad9..a68054c5c9e0 100644
--- a/gcc/testsuite/gcc.target/alpha/20011018-1.c
+++ b/gcc/testsuite/gcc.target/alpha/20011018-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-option

[gcc r15-6439] Alpha: Fix offset adjustment in unaligned access helpers

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:524fedd7f658f9c57e5f230f21cadf406c5d5011

commit r15-6439-g524fedd7f658f9c57e5f230f21cadf406c5d5011
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:40 2024 +

Alpha: Fix offset adjustment in unaligned access helpers

Correct the offset adjustment made in the multi-word unaligned access
helpers such that it is actually used by the unaligned load and store
instructions, fixing a bug introduced with commit 1eb356b98df2 ("alpha
gprel optimizations")[1] back in 2001, which replaced address changes
made directly according to the argument of the MEM expression passed
with one made according to an address previously extracted from said MEM
expression.  The address is however incorrectly extracted from said MEM
before an adjustment has been made to it for the offset supplied.

This bug is usually covered by the fact that our block move and clear
operations are hardly ever provided with correct block alignment data
and we also usually fail to fetch that information from the MEM supplied
(although PR target/115459 shows it does happen sometimes).  Instead the
bit alignment of 8 is usually conservatively used, meaning that a zero
offset is passed to `alpha_expand_unaligned_store_words', and code has
been written such that neither `alpha_expand_unaligned_load_words'
nor `alpha_expand_unaligned_store_words' can ever be called with a
nonzero offset from `alpha_expand_block_move'.

The only situation where `alpha_expand_unaligned_store_words' can be
called with nonzero offset is from `alpha_expand_block_clear' with a BWX
target for a misaligned block that has been embedded in a data object of
a higher alignment such that there is a small unaligned prefix our code
decides to handle so as to align further stores.

For instance it happens when a block clear is called for a block of 9
bytes embedded at offset 1 in a structure aligned to a 2-byte word, as
illustrated by the test case included.  Now this test case does not work
without the change that comes next applied, because the backend cannot
see the word alignment of the struct and uses the bit alignment of 8
instead.
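
The scenario described can be sketched in C as follows.  This is a guess
at the shape of the included test case (a 9-byte field at byte offset 1
of a structure aligned to a 2-byte word), not a copy of it; the struct
and function names are hypothetical:

```c
#include <string.h>

/* A 9-byte member at offset 1 of a 2-byte-aligned structure; clearing
   `b' makes the backend emit the unaligned store-words helper with a
   nonzero offset once the word alignment of the struct is visible.  */
struct s
{
  char a;
  char b[9];
} __attribute__ ((aligned (2)));

void
clear_b (struct s *p)
{
  memset (p->b, 0, sizeof (p->b));
}
```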

Should this change be swapped with the next one incorrect code such as:

stb $31,1($16)
lda $3,1($16)
ldq_u $2,8($16)
ldq_u $1,1($16)
mskqh $2,$3,$2
stq_u $2,8($16)
mskql $1,$3,$1
stq_u $1,1($16)

would be produced, where the unadjusted offsets of 1/8 can be seen with
the LDQ_U/STQ_U operations along with byte masks calculated accordingly
rather than the expected offsets of 2/9.  As a result the byte at the
offset of 9 fails to get cleared.  In these circumstances this would
also show as execution failures with the memclr.c test:

FAIL: gcc.c-torture/execute/memclr.c   -O1  execution test
FAIL: gcc.c-torture/execute/memclr.c   -Os  execution test

-- not at `-O0' though, as the higher alignment cannot be retrieved in
that case, and then not at `-O2' or higher optimization levels either,
because then we choose to open-code this block clear instead:

ldbu $1,0($16)
stw $31,8($16)
stq $1,0($16)

avoiding the bug in `alpha_expand_unaligned_store_words'.

I am leaving the pattern match test case XFAIL-ed here for documentation
purposes and it will be un-XFAIL-ed along with the fix to retrieve the
correct alignment.  The run test is of course never expected to fail.

References:

[1] 


gcc/
* config/alpha/alpha.cc (alpha_expand_unaligned_load_words):
Move address extraction until after the MEM referred has been
adjusted for the offset supplied.
(alpha_expand_unaligned_store_words): Likewise.

gcc/testsuite/
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: New file.
* gcc.target/alpha/memclr-a2-o1-c9-run.c: New file.

Diff:
---
 gcc/config/alpha/alpha.cc  | 16 +++
 .../gcc.target/alpha/memclr-a2-o1-c9-ptr.c | 50 ++
 .../gcc.target/alpha/memclr-a2-o1-c9-run.c | 25 +++
 3 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 7c28743f2ee3..07753297c387 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -3625,10 +3625,6 @@ alpha_expand_unaligned_load_words (rtx *out_regs, rtx 
smem,
   rtx sreg, areg, tmp, smema;
   HOST_WIDE_INT i;
 
-  smema = XEXP (smem, 0);
-  if (GET_CODE (smema) == LO_SUM)
-smema = force_reg (Pmode, smema);
-
   /* Generate all the tmp registers we need.  */
   for (i = 0; i < words; ++i)
 {
@@ -3640,6 +363

[gcc r15-6435] testsuite: Expand coverage for unaligned memory stores

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:665e0f9c08a9922ba06aeaa719fd4adc5a689df7

commit r15-6435-g665e0f9c08a9922ba06aeaa719fd4adc5a689df7
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:39 2024 +

testsuite: Expand coverage for unaligned memory stores

Expand coverage for unaligned memory stores, for the "insvmisalignM"
patterns, for 2-byte, 4-byte, and 8-byte scalars, across byte alignments
of 1, 2, 4 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test case), making
sure all data is written and no data is changed outside the area meant
to be written.

The test case has turned invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded this
test qualifies for generic verification and does not have to be limited
to the Alpha-specific subset of the testsuite.

gcc/testsuite/
* gcc.c-torture/execute/misalign.c: New file.

Diff:
---
 gcc/testsuite/gcc.c-torture/execute/misalign.c | 84 ++
 1 file changed, 84 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/execute/misalign.c 
b/gcc/testsuite/gcc.c-torture/execute/misalign.c
new file mode 100644
index ..f63b960932f2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/misalign.c
@@ -0,0 +1,84 @@
+typedef unsigned int __attribute__ ((mode (QI))) intw1_t;
+typedef unsigned int __attribute__ ((mode (HI))) intw2_t;
+typedef unsigned int __attribute__ ((mode (SI))) intw4_t;
+typedef unsigned int __attribute__ ((mode (DI))) intw8_t;
+
+#define MISALIGN_DEFINE_ONE(align, width, offset)  \
+  static void  \
+  misalign_check_one_ ## align ## width ## offset (void)   \
+{  \
+  static union \
+   {   \
+ intw1_t v[32];\
+ struct __attribute__ ((packed))   \
+   {   \
+ intw1_t o[8 + offset];\
+ intw ## width ## _t x;\
+   } x;\
+ intw ## align ## _t a;\
+   }   \
+  dst = {{ [0 ... 31] = 0xaa }};   \
+  static const union   \
+   {   \
+ intw1_t v[8]; \
+ intw ## width ## _t x;\
+   }   \
+  src = {{ 1, 2, 3, 4, 5, 6, 7, 8 }};  \
+  int i, j;							\
+   \
+  dst.x.x = src.x; \
+  asm ("" : : : "memory"); \
+  for (i = 0; i < 8 + offset; i++) \
+   if (dst.v[i] != 0xaa)   \
+ __builtin_abort ();   \
+  for (j = 0; i < 8 + offset + width; i++, j++)\
+   if (dst.v[i] != src.v[j])   \
+ __builtin_abort ();   \
+  for (; i < sizeof (dst.v); i++)  \
+   if (dst.v[i] != 0xaa)   \
+ __builtin_abort ();   \
+}
+
+#define MISALIGN_DEFINE_ONE_ALIGN_WIDTH(align, width)  \
+  MISALIGN_DEFINE_ONE (align, width, 1)			\
+  MISALIGN_DEFINE_ONE (align, width, 2)			\
+  MISALIGN_DEFINE_ONE (align, width, 3)			\
+  MISALIGN_DEFINE_ONE (align, width, 4)			\
+  MISALIGN_DEFINE_ONE (align, width, 5)			\
+  MISALIGN_DEFINE_ONE (align, width, 6)			\
+  MISALIGN_DEFINE_ONE (align, width, 7)
+
+MISALIGN_DEFINE_ONE_ALIGN_WIDTH (1, 2)
+MISALIGN_DEFINE_ONE_ALIGN_WIDTH (1, 4)
+MISALIGN_DEFINE_ONE_ALIGN_WIDTH (1, 8)
+MISALIGN_DEFINE_ONE_ALIGN_WIDTH (2, 4)
+MISALIGN_DEFINE_ONE_ALIGN_WIDTH (2, 8)
+MISALIGN_DEFINE_ONE_

[gcc r15-6437] Alpha: Remove code duplication in block clear trailer

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:6036a1a479154706a3a7c779ee28e74b03357c55

commit r15-6437-g6036a1a479154706a3a7c779ee28e74b03357c55
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:39 2024 +

Alpha: Remove code duplication in block clear trailer

Remove code duplication in the part of `alpha_expand_block_clear' that
handles any aligned trailing part of the block, observing that the two
legs of code only differ by the machine mode and that we already take
the same approach with handling any unaligned prefix earlier on.  No
functional change, just code shuffling.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_clear): Fold two
legs of a conditional together.

Diff:
---
 gcc/config/alpha/alpha.cc | 41 -
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index f196524dfa12..58da4a886321 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -4236,40 +4236,23 @@ alpha_expand_block_clear (rtx operands[])
 
   /* If we have appropriate alignment (and it wouldn't take too many
  instructions otherwise), mask out the bytes we need.  */
-  if (TARGET_BWX ? words > 2 : bytes > 0)
+  if ((TARGET_BWX ? words > 2 : bytes > 0)
+  && (align >= 64 || (align >= 32 && bytes < 4)))
 {
-  if (align >= 64)
-   {
- rtx mem, tmp;
- HOST_WIDE_INT mask;
+  machine_mode mode = (align >= 64 ? DImode : SImode);
+  rtx mem, tmp;
+  HOST_WIDE_INT mask;
 
- mem = adjust_address (orig_dst, DImode, ofs);
- set_mem_alias_set (mem, 0);
+  mem = adjust_address (orig_dst, mode, ofs);
+  set_mem_alias_set (mem, 0);
 
- mask = HOST_WIDE_INT_M1U << (bytes * 8);
+  mask = HOST_WIDE_INT_M1U << (bytes * 8);
 
- tmp = expand_binop (DImode, and_optab, mem, GEN_INT (mask),
- NULL_RTX, 1, OPTAB_WIDEN);
+  tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask),
+ NULL_RTX, 1, OPTAB_WIDEN);
 
- emit_move_insn (mem, tmp);
- return 1;
-   }
-  else if (align >= 32 && bytes < 4)
-   {
- rtx mem, tmp;
- HOST_WIDE_INT mask;
-
- mem = adjust_address (orig_dst, SImode, ofs);
- set_mem_alias_set (mem, 0);
-
- mask = HOST_WIDE_INT_M1U << (bytes * 8);
-
- tmp = expand_binop (SImode, and_optab, mem, GEN_INT (mask),
- NULL_RTX, 1, OPTAB_WIDEN);
-
- emit_move_insn (mem, tmp);
- return 1;
-   }
+  emit_move_insn (mem, tmp);
+  return 1;
 }
 
   if (!TARGET_BWX && bytes >= 4)


[gcc r15-6440] Alpha: Also use tree information to get base block alignment

2024-12-25 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:e0dae4da4c45e3959b0624551f80283c45a60446

commit r15-6440-ge0dae4da4c45e3959b0624551f80283c45a60446
Author: Maciej W. Rozycki 
Date:   Wed Dec 25 22:23:40 2024 +

Alpha: Also use tree information to get base block alignment

We hardly ever emit code using machine instructions for aligned memory
accesses in block move and clear operations.  The reason appears to be
that suboptimal alignment is often passed by the caller, and we then
only try to find a better alignment by checking pseudo register pointer
alignment information, which from observation is most often only set
for stack frame references.

This code originates from before Tree SSA days and we can do better
nowadays, by looking up the original tree node associated with a MEM
RTL, so implement this approach, factoring out repeating code from
`alpha_expand_block_move' and `alpha_expand_block_clear' to a new
function.

In some cases, however, tree information is not available while pointer
alignment is, such as in the case concerned with PR target/115459,
where we have:

(gdb) pr orig_src
(mem:BLK (plus:DI (reg/f:DI 65 virtual-stack-vars [ lock.206_2 ])
(const_int 8368 [0x20b0])) [8  S18 A8])
(gdb) pr orig_dst
(mem/j/c:BLK (plus:DI (reg/f:DI 65 virtual-stack-vars [ lock.206_2 ])
(const_int 8208 [0x2010])) [8 MEM[(struct 
gnat__debug_pools__print_info_stdout__internal__L_18__B1182b__S1183b___PAD 
*)_339].F[1 ...]{lb: 1 sz: 1}+0 S18 A128])
(gdb)

showing no tree information and the alignment of 8 only for `orig_src',
while indeed REGNO_POINTER_ALIGN returns 128 for pseudo 65.  So retain
the old approach and return the largest alignment determined and its
associated offset.

Add test cases accordingly and remove XFAILs from memclr-a2-o1-c9-ptr.c
now that aligned code does get produced for it.

gcc/
* config/alpha/alpha.cc
(alpha_get_mem_rtx_alignment_and_offset): New function.
(alpha_expand_block_move, alpha_expand_block_clear): Use it for
alignment retrieval.

gcc/testsuite/
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Remove XFAILs.
* gcc.target/alpha/memcpy-di-aligned.c: New file.
* gcc.target/alpha/memcpy-di-unaligned.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-src.c: New file.

Diff:
---
 gcc/config/alpha/alpha.cc  | 160 +
 .../gcc.target/alpha/memclr-a2-o1-c9-ptr.c |  10 +-
 gcc/testsuite/gcc.target/alpha/memcpy-di-aligned.c |  16 +++
 .../gcc.target/alpha/memcpy-di-unaligned-dst.c |  16 +++
 .../gcc.target/alpha/memcpy-di-unaligned-src.c |  15 ++
 .../gcc.target/alpha/memcpy-di-unaligned.c |  51 +++
 6 files changed, 206 insertions(+), 62 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 07753297c387..3b3a237a955f 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -3771,6 +3771,78 @@ alpha_expand_unaligned_store_words (rtx *data_regs, rtx 
dmem,
   emit_move_insn (st_addr_1, st_tmp_1);
 }
 
+/* Get the base alignment and offset of EXPR in A and O respectively.
+   Check for any pseudo register pointer alignment and for any tree
+   node information and return the largest alignment determined and
+   its associated offset.  */
+
+static void
+alpha_get_mem_rtx_alignment_and_offset (rtx expr, int &a, HOST_WIDE_INT &o)
+{
+  HOST_WIDE_INT tree_offset = 0, reg_offset = 0, mem_offset = 0;
+  int tree_align = 0, reg_align = 0, mem_align = MEM_ALIGN (expr);
+
+  gcc_assert (MEM_P (expr));
+
+  rtx addr = XEXP (expr, 0);
+  switch (GET_CODE (addr))
+{
+case REG:
+  reg_align = REGNO_POINTER_ALIGN (REGNO (addr));
+  break;
+
+case PLUS:
+  if (REG_P (XEXP (addr, 0)) && CONST_INT_P (XEXP (addr, 1)))
+   {
+ reg_offset = INTVAL (XEXP (addr, 1));
+ reg_align = REGNO_POINTER_ALIGN (REGNO (XEXP (addr, 0)));
+   }
+  break;
+
+default:
+  break;
+}
+
+  tree mem = MEM_EXPR (expr);
+  if (mem != NULL_TREE)
+switch (TREE_CODE (mem))
+  {
+  case MEM_REF:
+   tree_offset = mem_ref_offset (mem).force_shwi ();
+   tree_align = get_object_alignment (get_base_address (mem));
+   break;
+
+  case COMPONENT_REF:
+   {
+ tree byte_offset = component_ref_field_offset (mem);
+ tree bit_offset = DECL_FIELD_BIT_OFFSET (TREE_OPERAND (mem, 1));
+ poly_int64 offset;
+ if (!byte_offset
+ || !poly_int_tree_p (byte_offset, &offset)
+ || !tree_fits_shwi_p (bit_offset))
+   break;
+ tree_offset = offset + tree_to_shwi (bit_offset) / BITS_PER_UNIT;
+   }
+   tree_align = get_object_alignment (get_base

[gcc r15-9036] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]

2025-03-30 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:3d4d82211c8cbfde0b852bde1603b5d549426df7

commit r15-9036-g3d4d82211c8cbfde0b852bde1603b5d549426df7
Author: Maciej W. Rozycki 
Date:   Sun Mar 30 15:24:50 2025 +0100

Alpha: Add option to avoid data races for sub-longword memory stores 
[PR117759]

With non-BWX Alpha implementations we have a problem of data races where
an 8-bit byte or 16-bit word quantity is to be written to memory, because
in those cases we use an unprotected RMW access of a 32-bit longword or
64-bit quadword width.  If the contents of the longword or quadword
accessed outside the byte or word to be written are changed midway
through by a concurrent write executing on the same CPU, such as by a
signal handler, or a parallel write executing on another CPU, such as by
another thread or via a shared memory segment, then the concluding write
of the RMW access will clobber them.  This is especially important for
the safety of RCU algorithms, but it is an issue otherwise anyway.
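
As a rough C model (not the actual machine sequence), an unprotected
non-BWX byte store amounts to fetching the containing quadword,
inserting the byte, and writing the whole quadword back; any concurrent
change to the other seven bytes made in between is lost:

```c
#include <stdint.h>

/* Conceptual model of an unprotected non-BWX byte store: the value
   returned is what the concluding quadword write stores back, computed
   entirely from the quadword as read at the start of the sequence.  */
uint64_t
store_byte_rmw (uint64_t qword, unsigned offset, uint8_t byte)
{
  uint64_t mask = (uint64_t) 0xff << (offset * 8);
  return (qword & ~mask) | ((uint64_t) byte << (offset * 8));
}
```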

To guard against these data races with byte and aligned word quantities
introduce the `-msafe-bwa' command-line option (standing for Safe Byte &
Word Access) that instructs the compiler to instead use an atomic RMW
access sequence where byte and word memory access machine instructions
are not available.  There is no change to code produced for BWX targets.

It would be sufficient for the secondary reload handler to use a pair of
scratch registers, as requested by `reload_out', but it would end
up producing poor code, as one of the scratches would be occupied by
data retrieved and the other one would have to be reloaded with repeated
calculations, all within the LL/SC sequence.

Therefore I chose to add a dedicated `reload_out_safe_bwa' handler
and ask for more scratches there by defining a 256-bit OI integer mode.
While reload is documented in our manual to support an arbitrary number
of scratches in reality it hasn't been implemented for IRA:

/* ??? It would be useful to be able to handle only two, or more than
   three, operands, but for now we can only handle the case of having
   exactly three: output, input and one temp/scratch.  */

and it seems to be the case for LRA as well.  Do what everyone else does
then and just have one wide multi-register scratch.

I note that the atomic sequences emitted are suboptimal performance-wise
as the looping branch for the unsuccessful completion of the sequence
points backwards, which means it will be predicted as taken even though
in most cases it will fall through.  I do not see it as a deficiency of
this change proposed as it takes care of recording that the branch is
unlikely to be taken, by calling `alpha_emit_unlikely_jump'.  Therefore
generic code elsewhere should instead be investigated and adjusted
accordingly for the arrangement to actually take effect.

Add test cases accordingly.

There are notable regressions between a plain `-mno-bwx' configuration
and a `-mno-bwx -msafe-bwa' one:

FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O0  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O1  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -Os  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
FAIL: g++.dg/init/array25.C  -std=c++17 execution test
FAIL: g++.dg/init/array25.C  -std=c++98 execution test
FAIL: g++.dg/init/array25.C  -std=c++26 execution test

They come from the fact that these test cases play tricks with alignment
and end up calling code that expects a reference to aligned data but is
handed one to unaligned data.

This doesn't cause a visible problem with plain `-mno-bwx' code, because
the resulting alignment exception is fixed up by Linux.  There's no such
handling currently implemented for LDL_L or LDQ_L instructions (which
are first in the sequence) and consequently the offender is issued with
SIGBUS instead.  Suitable handling will be added to Linux to complement
this change that will emulate the trapping instructions[1], so these
interim regressions are seen as harmless and expected.

References:

[1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency",



gcc/
PR target/117759
* config/alpha/alpha-modes.def (OI): New integer mode.
* config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): N

[gcc r15-9034] Alpha: Export `emit_unlikely_jump' for a subsequent change to use

2025-03-30 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:47a48c7f42b5ad908f087bf612615632319cf445

commit r15-9034-g47a48c7f42b5ad908f087bf612615632319cf445
Author: Maciej W. Rozycki 
Date:   Sun Mar 30 15:24:50 2025 +0100

Alpha: Export `emit_unlikely_jump' for a subsequent change to use

Rename `emit_unlikely_jump' function to `alpha_emit_unlikely_jump', so
as to avoid namespace pollution, updating callers accordingly and export
it for use in the machine description.  Make it return the insn emitted.

gcc/
* config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New
prototype.
* config/alpha/alpha.cc (emit_unlikely_jump): Rename to...
(alpha_emit_unlikely_jump): ... this.  Return the insn emitted.
(alpha_split_atomic_op, alpha_split_compare_and_swap)
(alpha_split_compare_and_swap_12, alpha_split_atomic_exchange)
(alpha_split_atomic_exchange_12): Update call sites accordingly.

Diff:
---
 gcc/config/alpha/alpha-protos.h |  1 +
 gcc/config/alpha/alpha.cc   | 19 ++-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/config/alpha/alpha-protos.h b/gcc/config/alpha/alpha-protos.h
index 1bc5520e5d55..6d28fa88ecf9 100644
--- a/gcc/config/alpha/alpha-protos.h
+++ b/gcc/config/alpha/alpha-protos.h
@@ -59,6 +59,7 @@ extern rtx alpha_expand_zap_mask (HOST_WIDE_INT);
 extern void alpha_expand_builtin_vector_binop (rtx (*)(rtx, rtx, rtx),
   machine_mode,
   rtx, rtx, rtx);
+extern rtx alpha_emit_unlikely_jump (rtx, rtx);
 extern void alpha_expand_builtin_establish_vms_condition_handler (rtx, rtx);
 extern void alpha_expand_builtin_revert_vms_condition_handler (rtx);
 
diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 6965ece16d0b..d3e8a3a9756e 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -4421,12 +4421,13 @@ alpha_expand_builtin_vector_binop (rtx (*gen) (rtx, 
rtx, rtx),
 /* A subroutine of the atomic operation splitters.  Jump to LABEL if
COND is true.  Mark the jump as unlikely to be taken.  */
 
-static void
-emit_unlikely_jump (rtx cond, rtx label)
+rtx
+alpha_emit_unlikely_jump (rtx cond, rtx label)
 {
   rtx x = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, label, pc_rtx);
   rtx_insn *insn = emit_jump_insn (gen_rtx_SET (pc_rtx, x));
   add_reg_br_prob_note (insn, profile_probability::very_unlikely ());
+  return insn;
 }
 
 /* Subroutines of the atomic operation splitters.  Emit barriers
@@ -4518,7 +4519,7 @@ alpha_split_atomic_op (enum rtx_code code, rtx mem, rtx 
val, rtx before,
   emit_insn (gen_store_conditional (mode, cond, mem, scratch));
 
   x = gen_rtx_EQ (DImode, cond, const0_rtx);
-  emit_unlikely_jump (x, label);
+  alpha_emit_unlikely_jump (x, label);
 
   alpha_post_atomic_barrier (model);
 }
@@ -4568,7 +4569,7 @@ alpha_split_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (cond, x));
   x = gen_rtx_EQ (DImode, cond, const0_rtx);
 }
-  emit_unlikely_jump (x, label2);
+  alpha_emit_unlikely_jump (x, label2);
 
   emit_move_insn (cond, newval);
   emit_insn (gen_store_conditional
@@ -4577,7 +4578,7 @@ alpha_split_compare_and_swap (rtx operands[])
   if (!is_weak)
 {
   x = gen_rtx_EQ (DImode, cond, const0_rtx);
-  emit_unlikely_jump (x, label1);
+  alpha_emit_unlikely_jump (x, label1);
 }
 
   if (!is_mm_relaxed (mod_f))
@@ -4680,7 +4681,7 @@ alpha_split_compare_and_swap_12 (rtx operands[])
   emit_insn (gen_rtx_SET (cond, x));
   x = gen_rtx_EQ (DImode, cond, const0_rtx);
 }
-  emit_unlikely_jump (x, label2);
+  alpha_emit_unlikely_jump (x, label2);
 
   emit_insn (gen_mskxl (cond, scratch, mask, addr));
 
@@ -4692,7 +4693,7 @@ alpha_split_compare_and_swap_12 (rtx operands[])
   if (!is_weak)
 {
   x = gen_rtx_EQ (DImode, cond, const0_rtx);
-  emit_unlikely_jump (x, label1);
+  alpha_emit_unlikely_jump (x, label1);
 }
 
   if (!is_mm_relaxed (mod_f))
@@ -4732,7 +4733,7 @@ alpha_split_atomic_exchange (rtx operands[])
   emit_insn (gen_store_conditional (mode, cond, mem, scratch));
 
   x = gen_rtx_EQ (DImode, cond, const0_rtx);
-  emit_unlikely_jump (x, label);
+  alpha_emit_unlikely_jump (x, label);
 
   alpha_post_atomic_barrier (model);
 }
@@ -4806,7 +4807,7 @@ alpha_split_atomic_exchange_12 (rtx operands[])
   emit_insn (gen_store_conditional (DImode, scratch, mem, scratch));
 
   x = gen_rtx_EQ (DImode, scratch, const0_rtx);
-  emit_unlikely_jump (x, label);
+  alpha_emit_unlikely_jump (x, label);
 
   alpha_post_atomic_barrier (model);
 }


[gcc r15-9037] Alpha: Add option to avoid data races for partial writes [PR117759]

2025-03-30 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:1b85c548e2480116c74a7f74b487e3787c770056

commit r15-9037-g1b85c548e2480116c74a7f74b487e3787c770056
Author: Maciej W. Rozycki 
Date:   Sun Mar 30 15:24:51 2025 +0100

Alpha: Add option to avoid data races for partial writes [PR117759]

Similarly to data races with 8-bit byte or 16-bit word quantity memory
writes on non-BWX Alpha implementations we have the same problem even on
BWX implementations with partial memory writes produced for unaligned
stores as well as block memory move and clear operations.  This happens
at the boundaries of the area written where we produce unprotected RMW
sequences, such as for example:

ldbu $1,0($3)
stw $31,8($3)
stq $1,0($3)

to zero a 9-byte member at the byte offset of 1 of a quadword-aligned
struct, happily clobbering a 1-byte member at the beginning of said
struct if concurrent write happens while executing on the same CPU such
as in a signal handler or a parallel write happens while executing on
another CPU such as in another thread or via a shared memory segment.
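
The lost-update window can be simulated in C (little-endian byte order
assumed, as on Alpha); the function and its interleaving point are an
illustrative model of the ldbu/stw/stq sequence quoted above, not code
GCC emits.  A concurrent store to byte 0 landing between the initial
byte load and the final quadword write-back is silently undone:

```c
#include <stdint.h>
#include <string.h>

/* Simulate clearing 9 bytes at offset 1 of a quadword-aligned 16-byte
   block via the ldbu/stw/stq sequence, with a concurrent write to
   byte 0 interleaved mid-sequence.  */
void
clear9_with_race (uint8_t mem[16], uint8_t concurrent)
{
  uint8_t saved = mem[0];	/* ldbu $1,0($16) */
  mem[0] = concurrent;		/* concurrent write on another CPU */
  mem[8] = 0;			/* stw $31,8($16) */
  mem[9] = 0;
  uint64_t q = saved;		/* byte 0 from stale copy, bytes 1..7 zero */
  memcpy (mem, &q, 8);		/* stq $1,0($16), clobbering byte 0 */
}
```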

To guard against these data races with partial memory write accesses
introduce the `-msafe-partial' command-line option that instructs the
compiler to protect boundaries of the data quantity accessed by instead
using a longer code sequence composed of narrower memory writes where
suitable machine instructions are available (i.e. with BWX targets) or
atomic RMW access sequences where byte and word memory access machine
instructions are not available (i.e. with non-BWX targets).

Owing to the desire to avoid branches there are redundant overlapping
writes in unaligned cases where STQ_U operations are used in the middle
of a block, so as to make sure no part of the data to be written is
lost regardless of run-time alignment.  For the non-BWX case it means
that with blocks whose size is not a multiple of 8 there are additional
atomic RMW sequences issued towards the end of the block in addition to
the always required pair enclosing the block from each end.

Only one such additional atomic RMW sequence is actually required, but
code currently issues two for the sake of simplicity.  An improvement
might be added to `alpha_expand_unaligned_store_words_safe_partial' in
the future, by folding `alpha_expand_unaligned_store_safe_partial' code
for handling multi-word blocks whose size is not a multiple of 8 (i.e.
with a trailing partial-word part).  It would improve performance a bit,
but current code is correct regardless.

Update test cases with `-mno-safe-partial' where required and add new
ones accordingly.

In some cases GCC chooses to open-code block memory write operations, so
with non-BWX targets `-msafe-partial' will in the usual case have to be
used together with `-msafe-bwa'.

Credit to Magnus Lindholm  for sharing hardware for
the purpose of verifying the BWX side of this change.

gcc/
PR target/117759
* config/alpha/alpha-protos.h
(alpha_expand_unaligned_store_safe_partial): New prototype.
* config/alpha/alpha.cc (alpha_expand_movmisalign)
(alpha_expand_block_move, alpha_expand_block_clear): Handle
TARGET_SAFE_PARTIAL.
(alpha_expand_unaligned_store_safe_partial)
(alpha_expand_unaligned_store_words_safe_partial)
(alpha_expand_clear_safe_partial_nobwx): New functions.
* config/alpha/alpha.md (insvmisaligndi): Handle
TARGET_SAFE_PARTIAL.
* config/alpha/alpha.opt (msafe-partial): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.

gcc/testsuite/
PR target/117759
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add
`-mno-safe-partial'.
* gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stlx0-safe-partial.c: New file.
* gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stqx0-safe-partial.c: New file.
  

[gcc r15-9035] IRA+LRA: Let the backend request to split basic blocks

2025-03-30 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:89f03fd59fbe151b2efee354c616862a9b194ed9

commit r15-9035-g89f03fd59fbe151b2efee354c616862a9b194ed9
Author: Maciej W. Rozycki 
Date:   Sun Mar 30 15:24:50 2025 +0100

IRA+LRA: Let the backend request to split basic blocks

The next change for Alpha will produce extra labels and branches in
reload, which in turn requires basic blocks to be split at completion.
We do this already for functions that can trap, so just extend the
arrangement with a flag for the backend to use whenever it finds it
necessary.

gcc/
* function.h (struct function): Add
`split_basic_blocks_after_reload' member.
* lra.cc (lra): Handle it.
* reload1.cc (reload): Likewise.

Diff:
---
 gcc/function.h | 3 +++
 gcc/lra.cc | 6 --
 gcc/reload1.cc | 6 --
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index e8aa52fc780a..2260d6704ecc 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -449,6 +449,9 @@ struct GTY(()) function {
   /* Set for artificial function created for [[assume (cond)]].
  These should be GIMPLE optimized, but not expanded to RTL.  */
   unsigned int assume_function : 1;
+
+  /* Nonzero if reload will have to split basic blocks.  */
+  unsigned int split_basic_blocks_after_reload : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/lra.cc b/gcc/lra.cc
index 8c6991751e5c..2b3014f160cf 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -2615,8 +2615,10 @@ lra (FILE *f, int verbose)
 
   inserted_p = fixup_abnormal_edges ();
 
-  /* We've possibly turned single trapping insn into multiple ones.  */
-  if (cfun->can_throw_non_call_exceptions)
+  /* Split basic blocks if we've possibly turned single trapping insn
+ into multiple ones or otherwise the backend requested to do so.  */
+  if (cfun->can_throw_non_call_exceptions
+  || cfun->split_basic_blocks_after_reload)
 {
   auto_sbitmap blocks (last_basic_block_for_fn (cfun));
   bitmap_ones (blocks);
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index fe4fe58981c9..64ec74e2bf5f 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -1272,8 +1272,10 @@ reload (rtx_insn *first, int global)
 
   inserted = fixup_abnormal_edges ();
 
-  /* We've possibly turned single trapping insn into multiple ones.  */
-  if (cfun->can_throw_non_call_exceptions)
+  /* Split basic blocks if we've possibly turned single trapping insn
+ into multiple ones or otherwise the backend requested to do so.  */
+  if (cfun->can_throw_non_call_exceptions
+  || cfun->split_basic_blocks_after_reload)
 {
   auto_sbitmap blocks (last_basic_block_for_fn (cfun));
   bitmap_ones (blocks);


[gcc r16-33] Alpha: Fix base block alignment calculation regression

2025-04-19 Thread Maciej W. Rozycki via Gcc-cvs
https://gcc.gnu.org/g:1dd769b3d0d9251649dcb645d7ed6c4ba2202306

commit r16-33-g1dd769b3d0d9251649dcb645d7ed6c4ba2202306
Author: Maciej W. Rozycki 
Date:   Sat Apr 19 14:10:25 2025 +0100

Alpha: Fix base block alignment calculation regression

In determination of base block alignment we only examine a COMPONENT_REF
tree node at hand without ever checking if its ultimate alignment has
been reduced by the combined offset going back to the outermost object.
Consequently cases have been observed where quadword accesses have been
produced for a memory location referring to a nested struct member only
aligned to the longword boundary, causing emulation to trigger.

Address this issue by recursing into COMPONENT_REF tree nodes until the
outermost one has been reached, which is supposed to be a MEM_REF one,
accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
Also use tree information to get base block alignment") regression.

Bail out and refrain from using tree information for alignment if we end
up at something different or we are unable to calculate the offset at
any point.
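
A hypothetical layout of the kind at issue (not taken from the test
cases added, and assuming a typical LP64 ABI such as Alpha's, with
4-byte int and 8-byte long long alignment): the outer object is
quadword-aligned, yet the combined offset of the nested member leaves
the member itself only longword-aligned, so a quadword access derived
from the member's containing object's alignment would trap:

```c
#include <stddef.h>

struct inner
{
  int a;
  int b;		/* 8 bytes wide, but only longword-aligned */
};

struct outer
{
  long long q;		/* forces quadword alignment of the whole object */
  int pad;
  struct inner in;	/* offset 12: quadword-misaligned 8-byte member */
};
```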

gcc/
* config/alpha/alpha.cc
(alpha_get_mem_rtx_alignment_and_offset): Recurse into
COMPONENT_REF nodes.

gcc/testsuite/
* gcc.target/alpha/memcpy-nested-offset-long.c: New file.
* gcc.target/alpha/memcpy-nested-offset-quad.c: New file.

Diff:
---
 gcc/config/alpha/alpha.cc  | 23 +++
 .../gcc.target/alpha/memcpy-nested-offset-long.c   | 76 ++
 .../gcc.target/alpha/memcpy-nested-offset-quad.c   | 64 ++
 3 files changed, 150 insertions(+), 13 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index ba470d9e75ec..14e7da57ca6f 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -4291,14 +4291,10 @@ alpha_get_mem_rtx_alignment_and_offset (rtx expr, int &a, HOST_WIDE_INT &o)
 
   tree mem = MEM_EXPR (expr);
   if (mem != NULL_TREE)
-switch (TREE_CODE (mem))
-  {
-  case MEM_REF:
-   tree_offset = mem_ref_offset (mem).force_shwi ();
-   tree_align = get_object_alignment (get_base_address (mem));
-   break;
+{
+  HOST_WIDE_INT comp_offset = 0;
 
-  case COMPONENT_REF:
+  for (; TREE_CODE (mem) == COMPONENT_REF; mem = TREE_OPERAND (mem, 0))
{
  tree byte_offset = component_ref_field_offset (mem);
  tree bit_offset = DECL_FIELD_BIT_OFFSET (TREE_OPERAND (mem, 1));
@@ -4307,14 +4303,15 @@ alpha_get_mem_rtx_alignment_and_offset (rtx expr, int &a, HOST_WIDE_INT &o)
  || !poly_int_tree_p (byte_offset, &offset)
  || !tree_fits_shwi_p (bit_offset))
break;
- tree_offset = offset + tree_to_shwi (bit_offset) / BITS_PER_UNIT;
+ comp_offset += offset + tree_to_shwi (bit_offset) / BITS_PER_UNIT;
}
-   tree_align = get_object_alignment (get_base_address (mem));
-   break;
 
-  default:
-   break;
-  }
+  if (TREE_CODE (mem) == MEM_REF)
+   {
+ tree_offset = comp_offset + mem_ref_offset (mem).force_shwi ();
+ tree_align = get_object_alignment (get_base_address (mem));
+   }
+}
 
   if (reg_align > mem_align)
 {
diff --git a/gcc/testsuite/gcc.target/alpha/memcpy-nested-offset-long.c b/gcc/testsuite/gcc.target/alpha/memcpy-nested-offset-long.c
new file mode 100644
index ..631d14f3de27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/memcpy-nested-offset-long.c
@@ -0,0 +1,76 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int64_t;
+typedef unsigned int __attribute__ ((mode (SI))) int32_t;
+
+typedef union
+  {
+int32_t l[8];
+  }
+val;
+
+typedef struct
+  {
+int32_t l[2];
+val v;
+  }
+tre;
+
+typedef struct
+  {
+int32_t l[3];
+tre t;
+  }
+due;
+
+typedef struct
+  {
+val v;
+int64_t q;
+int32_t l[2];
+due d;
+  }
+uno;
+
+void
+memcpy_nested_offset_long (uno *u)
+{
+  u->d.t.v = u->v;
+}
+
+/* Expect assembly such as:
+
+   ldq $4,0($16)
+   ldq $3,8($16)
+   ldq $2,16($16)
+   srl $4,32,$7
+   ldq $1,24($16)
+   srl $3,32,$6
+   stl $4,68($16)
+   srl $2,32,$5
+   stl $7,72($16)
+   srl $1,32,$4
+   stl $3,76($16)
+   stl $6,80($16)
+   stl $2,84($16)
+   stl $5,88($16)
+   stl $1,92($16)
+   stl $4,96($16)
+
+   that is with four quadword loads at offsets 0, 8, 16, 24 each and
+   eight longword stores at offsets 68, 72, 76, 80, 84, 88, 92, 96 each.  */
+
/* { dg-final { scan-assembler-times "\\sldq\\s\\\$\[0-9\]+,0\\\(\\\$16\\\)\\s" 1 } } */
/* { dg-final { scan-assembler-times "\\sldq\\s\\\$\[0-9\]+,8\\\(\\\$16\\\)\\s" 1 } } */
+/* { dg-final { scan-assembler-times 
"\\sldq\\s\\