Re: [PATCH htdocs] gcc-14: document -ftrampoline-impl

2025-02-09 Thread Iain Sandoe



> On 10 Feb 2025, at 08:28, Sam James  wrote:
> 
> ---
> Iain, does this look OK?
> 
> htdocs/gcc-14/changes.html | 7 +++
> 1 file changed, 7 insertions(+)
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index ba4780ca..2db442b2 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -123,6 +123,13 @@ You may also want to check out our
> not an optimization, to avoid relying on library
> implementations.
>   
> +  
> +   New option
> +href="https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Code-Gen-Options.html#index-ftrampoline-impl";>-ftrampoline-impl=,
> +   to choose the position of generated trampolines.

maybe “placement” instead of “position” - but not a show-stopper for me
otherwise LGTM.
Iain

> The default remains the stack
> +   for almost all targets. To override the default,
> +   -ftrampoline-impl=heap can be passed on supported ABIs
> +   (x86_64 Darwin, x86_64 and aarch64 Linux).
>   
> 
> New function attribute
> -- 
> 2.48.1
> 



Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]

2025-02-09 Thread Richard Biener
On Mon, Feb 10, 2025 at 6:46 AM liuhongt  wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9
>
> >Pranav Gorantla 2025-02-06 04:30:05 UTC
> >Facing similar issue in gcc-13. Is it possible to backport the fix of this 
> >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well.
>
> This series is to ask approval for the backport of r14-172 and r14-1252 to 
> GCC13 and GCC12 release branch.
> Note r14-1252 is a fix to r14-172 which regressed powerpc testcase in 
> PR109610.
> I have verified the fix also works on GCC13/GCC12 branch for PR109610.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and 
> aarch64-linux-gnu.
> Ok for backport

No, please avoid backporting missed-optimization fixes, esp. in code
areas as sensitive as RA.
The only exception should be the newest active branch (gcc-14) at this
point, and even there
you need to be very careful.  The PR itself isn't even marked as
regression btw. (not that this
would change my assessment much).

Thanks,
Richard.

>
> liuhongt (3):
>   Use NO_REGS in cost calculation when the preferred register class are
> not known yet.
>   Only use NO_REGS in cost calculation when !hard_regno_mode_ok for
> GENERAL_REGS and mode.
>   Adjust testcases after better RA decision.
>
>  gcc/ira-costs.cc  |   7 +
>  .../i386/avx2-dest-false-dep-for-glc.c|  28 +-
>  .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++---
>  .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++
>  .../i386/avx512fp16-dest-false-dep-for-glc.c  | 118 --
>  .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +---
>  gcc/testsuite/gcc.target/i386/pr108707.c  |  16 +
>  7 files changed, 813 insertions(+), 204 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c
>
> --
> 2.34.1
>


[PATCH htdocs] gcc-14: document -ftrampoline-impl

2025-02-09 Thread Sam James
---
Iain, does this look OK?

 htdocs/gcc-14/changes.html | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index ba4780ca..2db442b2 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -123,6 +123,13 @@ You may also want to check out our
 not an optimization, to avoid relying on library
 implementations.
   
+  
+   New option
+   https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Code-Gen-Options.html#index-ftrampoline-impl";>-ftrampoline-impl=,
+   to choose the position of generated trampolines. The default remains the 
stack
+   for almost all targets. To override the default,
+   -ftrampoline-impl=heap can be passed on supported ABIs
+   (x86_64 Darwin, x86_64 and aarch64 Linux).
   
 
 New function attribute
-- 
2.48.1



[patch,avr] Add -mno-call-main to tweak running main()

2025-02-09 Thread Georg-Johann Lay

On devices with very limited resources, it may be desirable to run
main in a more efficient way than provided by the startup code

   XCALL main
   XJMP  exit

from section .init9.  In AVR-LibC v2.3, that code has been moved to
libmcu.a, hence symbol __call_main can be satisfied so that the
respective code is no more pulled in from that library.
Instead, main can be run by putting it in section .init9.

The patch adds attributes noreturn and section(".init9"), and
sets __call_main=0 when it encounters main().

Ok for trunk?

Johann

--

AVR: target/118806 - Add -mno-call-main to tweak running main().

On devices with very limited resources, it may be desirable to run
main in a more efficient way than provided by the startup code

   XCALL main
   XJMP  exit

from section .init9.  In AVR-LibC v2.3, that code has been moved to
libmcu.a, hence symbol __call_main can be satisfied so that the
respective code is no more pulled in from that library.
Instead, main can be run by putting it in section .init9.

The patch adds attributes noreturn and section(".init9"), and
sets __call_main=0 when it encounters main().

gcc/
PR target/118806
* config/avr/avr.opt (-mcall-main): New option and...
(avropt_call_main): ...variable.
* config/avr/avr.cc (avr_no_call_main_p): New variable.
(avr_insert_attributes) [-mno-call-main, main]: Add attributes
noreturn and section(".init9") to main.  Set avr_no_call_main_p.
(avr_file_end) [avr_no_call_main_p]: Define symbol __call_main.
* doc/invoke.texi (AVR Options) <-mno-call-main>: Document.
<-mnodevicelib>: Extend explanation.AVR: target/118806 - Add -mno-call-main to tweak running main().

On devices with very limited resources, it may be desirable to run
main in a more efficient way than provided by the startup code

   XCALL main
   XJMP  exit

from section .init9.  In AVR-LibC v2.3, that code has been moved to
libmcu.a, hence symbol __call_main can be satisfied so that the
respective code is no more pulled in from that library.
Instead, main can be run by putting it in section .init9.

The patch adds attributes noreturn and section(".init9"), and
sets __call_main=0 when it encounters main().

gcc/
PR target/118806
* config/avr/avr.opt (-mcall-main): New option and...
(avropt_call_main): ...variable.
* config/avr/avr.cc (avr_no_call_main_p): New variable.
(avr_insert_attributes) [-mno-call-main, main]: Add attributes
noreturn and section(".init9") to main.  Set avr_no_call_main_p.
(avr_file_end) [avr_no_call_main_p]: Define symbol __call_main.
* doc/invoke.texi (AVR Options) <-mno-call-main>: Document.
<-mnodevicelib>: Extend explanation.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 9bfb3785ddb..c31d06d6d20 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -230,6 +230,9 @@ bool avr_need_clear_bss_p = false;
 bool avr_need_copy_data_p = false;
 bool avr_has_rodata_p = false;
 
+/* To track if we satisfy __call_main from AVR-LibC.  */
+bool avr_no_call_main_p = false;
+
 /* Counts how often pass avr-fuse-add has been executed.  Is is kept in
sync with cfun->machine->n_avr_fuse_add_executed and serves as an
insn condition for shift insn splitters.  */
@@ -11712,6 +11715,32 @@ avr_insert_attributes (tree node, tree *attributes)
 			   NULL, *attributes);
 }
 
+#if defined WITH_AVRLIBC
+  if (avropt_call_main == 0
+  && TREE_CODE (node) == FUNCTION_DECL
+  && MAIN_NAME_P (DECL_NAME (node)))
+{
+  if (lookup_attribute ("section", *attributes))
+	{
+	  warning (OPT_Wattributes, "% attribute on main function"
+		   " inhibits %<-mno-call-main%>");
+	}
+  else
+	{
+	  if (!lookup_attribute ("noreturn", *attributes))
+	*attributes = tree_cons (get_identifier ("noreturn"),
+ NULL_TREE, *attributes);
+	  // Put main into section .init9 so that it is executed even
+	  // though it's not called.
+	  tree init9 = build_string (1 + strlen (".init9"), ".init9");
+	  tree arg = build_tree_list (NULL_TREE, init9);
+	  *attributes = tree_cons (get_identifier ("section"),
+   arg, *attributes);
+	  avr_no_call_main_p = true;
+	}
+} // -mno-call-main
+#endif // AVR-LibC
+
   avr_handle_isr_attribute (node, attributes, "signal");
   avr_handle_isr_attribute (node, attributes, "interrupt");
 
@@ -12311,6 +12340,15 @@ avr_file_end (void)
 
   if (avr_need_clear_bss_p)
 fputs (".global __do_clear_bss\n", asm_out_file);
+
+  /* Don't let __call_main call main() and exit().
+ Defining this symbol will keep the code from being pulled
+ in from lib.a as requested by AVR-LibC's gcrt1.S.
+ We invoke main() by other means: putting it in .init9.  */
+
+  if (avr_no_call_main_p)
+fputs (".global __call_main\n"
+	   "__call_main = 0\n", asm_out_file);
 }
 
 

[PATCH 2/3] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2025-02-09 Thread liuhongt
r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register class are not known yet.
It regressed powerpc PR109610 and PR109858, it looks too aggressive to use
NO_REGS when mode can be allocated with GENERAL_REGS.
The patch takes a step back, still use GENERAL_REGS when
hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS.

gcc/ChangeLog:

PR target/109610
PR target/109858
* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.

(cherry picked from commit 4fb66b2329319e9b47e89200d613b6f741a114fc)
---
 gcc/ira-costs.cc | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 003963f2a19..d9e700e8947 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1572,12 +1572,16 @@ scan_one_insn (rtx_insn *insn)
   && (! ira_use_lra_p || ! pic_offset_table_rtx
  || ! contains_symbol_ref_p (XEXP (note, 0
 {
-  /* Costs for NO_REGS are used in cost calculation on the
-1st pass when the preferred register classes are not
-known yet.  In this case we take the best scenario.  */
-  enum reg_class cl = NO_REGS;
+  enum reg_class cl = GENERAL_REGS;
   rtx reg = SET_DEST (set);
   int num = COST_INDEX (REGNO (reg));
+  /* Costs for NO_REGS are used in cost calculation on the
+1st pass when the preferred register classes are not
+known yet.  In this case we take the best scenario when
+mode can't be put into GENERAL_REGS.  */
+  if (!targetm.hard_regno_mode_ok (ira_class_hard_regs[cl][0],
+  GET_MODE (reg)))
+   cl = NO_REGS;
 
   COSTS (costs, num)->mem_cost
-= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;
-- 
2.34.1



[PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]

2025-02-09 Thread liuhongt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9

>Pranav Gorantla 2025-02-06 04:30:05 UTC
>Facing similar issue in gcc-13. Is it possible to backport the fix of this Bug 
>108707 and Bug 109610 to gcc-13, gcc-12 as well.

This series is to ask approval for the backport of r14-172 and r14-1252 to 
GCC13 and GCC12 release branch.
Note r14-1252 is a fix to r14-172 which regressed powerpc testcase in PR109610.
I have verified the fix also works on GCC13/GCC12 branch for PR109610.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and aarch64-linux-gnu.
Ok for backport


liuhongt (3):
  Use NO_REGS in cost calculation when the preferred register class are
not known yet.
  Only use NO_REGS in cost calculation when !hard_regno_mode_ok for
GENERAL_REGS and mode.
  Adjust testcases after better RA decision.

 gcc/ira-costs.cc  |   7 +
 .../i386/avx2-dest-false-dep-for-glc.c|  28 +-
 .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++---
 .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++
 .../i386/avx512fp16-dest-false-dep-for-glc.c  | 118 --
 .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +---
 gcc/testsuite/gcc.target/i386/pr108707.c  |  16 +
 7 files changed, 813 insertions(+), 204 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c

-- 
2.34.1



Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]

2025-02-09 Thread Hongtao Liu
On Mon, Feb 10, 2025 at 1:43 PM liuhongt  wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9
>
> >Pranav Gorantla 2025-02-06 04:30:05 UTC
> >Facing similar issue in gcc-13. Is it possible to backport the fix of this 
> >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well.
>
> This series is to ask approval for the backport of r14-172 and r14-1252 to 
> GCC13 and GCC12 release branch.
> Note r14-1252 is a fix to r14-172 which regressed powerpc testcase in 
> PR109610.
> I have verified the fix also works on GCC13/GCC12 branch for PR109610.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and 
> aarch64-linux-gnu.
> Ok for backport
>
>
> liuhongt (3):
>   Use NO_REGS in cost calculation when the preferred register class are
> not known yet.
>   Only use NO_REGS in cost calculation when !hard_regno_mode_ok for
> GENERAL_REGS and mode.
>   Adjust testcases after better RA decision.
>
>  gcc/ira-costs.cc  |   7 +
>  .../i386/avx2-dest-false-dep-for-glc.c|  28 +-
>  .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++---
>  .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++
>  .../i386/avx512fp16-dest-false-dep-for-glc.c  | 118 --
>  .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +---
>  gcc/testsuite/gcc.target/i386/pr108707.c  |  16 +
>  7 files changed, 813 insertions(+), 204 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c
>
> --
> 2.34.1
>


-- 
BR,
Hongtao


[PATCH 1/3] Use NO_REGS in cost calculation when the preferred register class are not known yet.

2025-02-09 Thread liuhongt
gcc/ChangeLog:

PR rtl-optimization/108707
* ira-costs.cc (scan_one_insn): Use NO_REGS instead of
GENERAL_REGS when preferred reg_class is not known.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108707.c: New test.

(cherry picked from commit 0368d169492017cfab5622d38b15be94154d458c)
---
 gcc/ira-costs.cc |  5 -
 gcc/testsuite/gcc.target/i386/pr108707.c | 16 
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index bdb1356af91..003963f2a19 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn)
   && (! ira_use_lra_p || ! pic_offset_table_rtx
  || ! contains_symbol_ref_p (XEXP (note, 0
 {
-  enum reg_class cl = GENERAL_REGS;
+  /* Costs for NO_REGS are used in cost calculation on the
+1st pass when the preferred register classes are not
+known yet.  In this case we take the best scenario.  */
+  enum reg_class cl = NO_REGS;
   rtx reg = SET_DEST (set);
   int num = COST_INDEX (REGNO (reg));
 
diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c 
b/gcc/testsuite/gcc.target/i386/pr108707.c
new file mode 100644
index 000..6405cfe7cdc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108707.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} { target { ! ia32 
} } } } */
+/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */
+
+#include
+
+void
+foo (__m512 pv, __m512 a, __m512 b, __m512 c,
+ __m512* pdest, __m512* p1)
+{
+  __m512 t = *p1;
+pdest[0] = _mm512_fmadd_ps (t, pv, a);
+pdest[1] = _mm512_fmadd_ps (t, pv, b);
+pdest[2] = _mm512_fmadd_ps (t, pv, c);
+}
-- 
2.34.1



[PATCH 3/3] Adjust testcases after better RA decision.

2025-02-09 Thread liuhongt
After optimization for RA, memory op is not propagated into
instructions(>1), and it make testcases not generate vxorps since
the memory is loaded into the dest, and the dest is never unused now.

So rewrite testcases to make the codegen more stable.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite
testcase to make the codegen more stable.
* gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto
* gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto.
* gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto.
* gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.

(cherry picked from commit 525713ed9db904ed2504decc5bde9d58bd5d98b4)
---
 .../i386/avx2-dest-false-dep-for-glc.c|  28 +-
 .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++---
 .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++
 .../i386/avx512fp16-dest-false-dep-for-glc.c  | 118 --
 .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +---
 5 files changed, 790 insertions(+), 204 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c 
b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
index fe331fe5e2c..e260888627f 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
@@ -5,16 +5,28 @@
 
 #include 
 
-extern __m256i i1, i2, i3, i4;
-extern __m256d d1, d2;
-extern __m256 f1, f2;
+__m256i
+foo0 (__m256i i3, __m256i i1, __m256i i2)
+{
+  return _mm256_permutevar8x32_epi32 (i1, i2);
+}
+
+__m256i
+foo1 (__m256i i2, __m256i i1)
+{
+  return _mm256_permute4x64_epi64 (i1, 12);
+}
+
+__m256d
+foo2 (__m256d d2, __m256d d1)
+{
+  return _mm256_permute4x64_pd (d1, 12);
+}
 
-void vperm_test (void)
+__m256
+foo3 (__m256 f2, __m256i i2, __m256 f1)
 {
-  i3 = _mm256_permutevar8x32_epi32 (i1, i2);
-  i4 = _mm256_permute4x64_epi64 (i1, 12);
-  d2 = _mm256_permute4x64_pd (d1, 12);
-  f2 = _mm256_permutevar8x32_ps (f1, i2);
+  return _mm256_permutevar8x32_ps (f1, i2);
 }
 
 /* { dg-final { scan-assembler-times "vxorps" 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c 
b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
index b334b88194b..b615b8d 100644
--- a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
@@ -13,56 +13,219 @@ extern __m512 f1, f11;
 extern __m256 f2;
 extern __m128 f3, f33;
 
-__mmask32 m32;
 __mmask16 m16;
 __mmask8 m8;
 
-void mullo_test (void)
-{
-  i1 = _mm512_mullo_epi64 (i1, i1);
-  i1 = _mm512_mask_mullo_epi64 (i1, m8, i1, i1);
-  i1 = _mm512_maskz_mullo_epi64 (m8, i1, i1);
-  i2 = _mm256_mullo_epi64 (i2, i2);
-  i2 = _mm256_mask_mullo_epi64 (i2, m8, i2, i2);
-  i2 = _mm256_maskz_mullo_epi64 (m8, i2, i2);
-  i3 = _mm_mullo_epi64 (i3, i3);
-  i3 = _mm_mask_mullo_epi64 (i3, m8, i3, i3);
-  i3 = _mm_maskz_mullo_epi64 (m8, i3, i3);
-}
-
-void range_test (void)
-{
-  d1 = _mm512_range_pd (d1, d11, 15);
-  d11 = _mm512_range_round_pd (d11, d1, 15, 8);
-  d1 = _mm512_mask_range_pd (d1, m8, d11, d11, 15);
-  d11 = _mm512_mask_range_round_pd (d11, m8, d1, d1, 15, 8);
-  d1 = _mm512_maskz_range_pd (m8, d11, d11, 15);
-  d11 = _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8);
-  d2 = _mm256_range_pd (d2, d2, 15);
-  d2 = _mm256_mask_range_pd (d2, m8, d2, d2, 15);
-  d2 = _mm256_maskz_range_pd (m8, d2, d2, 15);
-  d3 = _mm_range_pd (d3, d3, 15);
-  d3 = _mm_mask_range_pd (d3, m8, d3, d3, 15);
-  d3 = _mm_maskz_range_pd (m8, d3, d3, 15);
-  d33 = _mm_range_sd (d33, d33, 15);
-  d33 = _mm_mask_range_sd (d33, m8, d33, d33, 15);
-  d33 = _mm_maskz_range_sd (m8, d33, d33, 15);
-
-  f1 = _mm512_range_ps (f1, f11, 15);
-  f11 = _mm512_range_round_ps (f11, f1, 15, 8);
-  f1 = _mm512_mask_range_ps (f1, m16, f11, f11, 15);
-  f11 = _mm512_mask_range_round_ps (f11, m16, f1, f1, 15, 8);
-  f1 = _mm512_maskz_range_ps (m16, f11, f11, 15);
-  f11 = _mm512_maskz_range_round_ps (m16, f1, f1, 15, 8);
-  f2 = _mm256_range_ps (f2, f2, 15);
-  f2 = _mm256_mask_range_ps (f2, m8, f2, f2, 15);
-  f2 = _mm256_maskz_range_ps (m8, f2, f2, 15);
-  f3 = _mm_range_ps (f3, f3, 15);
-  f3 = _mm_mask_range_ps (f3, m8, f3, f3, 15);
-  f3 = _mm_maskz_range_ps (m8, f3, f3, 15);
-  f33 = _mm_range_ss (f33, f33, 15);
-  f33 = _mm_mask_range_ss (f33, m8, f33, f33, 15);
-  f33 = _mm_maskz_range_ss (m8, f33, f33, 15);
+#define MULLO(func, type)  \
+  type \
+  mullo##type (type i2, type i1)   \
+  {\
+return func (i1, i1);  \
+  }
+
+#define MULLO_MASK(func, type) \
+  type \
+  mullo_mask##type (type i2, type i1)  \
+  {\
+return func (i

[PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813]

2025-02-09 Thread Haochen Jiang
Hi all,

When moving intrins around for AVX10 implementation in GCC 14,
the intrin _kshiftli_mask32 and _kshiftri_mask32 are wrongly
wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__",
leading to the intrin file not `-Wsystem-headers -Wundef` clean
since r14-4490.

Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

PR target/118813
* config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__
wrap.
---
 gcc/config/i386/avx512bwintrin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 187e15a80ca..47c4c03e796 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -199,7 +199,7 @@ _kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
  (__mmask32) __B);
 }
 
-#if __OPTIMIZE__
+#ifdef __OPTIMIZE__
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kshiftli_mask32 (__mmask32 __A, unsigned int __B)
-- 
2.31.1



RE: [PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813]

2025-02-09 Thread Liu, Hongtao



> -Original Message-
> From: Jiang, Haochen 
> Sent: Monday, February 10, 2025 2:10 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR
> 118813]
> 
> Hi all,
> 
> When moving intrins around for AVX10 implementation in GCC 14, the intrin
> _kshiftli_mask32 and _kshiftri_mask32 are wrongly wrapped by "#if
> __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__", leading to the intrin file not
> `-Wsystem-headers -Wundef` clean since r14-4490.
> 
> Ok for trunk?
Ok, and please backport to GCC14 release branch.
> 
> Thx,
> Haochen
> 
> gcc/ChangeLog:
> 
>   PR target/118813
>   * config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__
>   wrap.
> ---
>  gcc/config/i386/avx512bwintrin.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/avx512bwintrin.h
> b/gcc/config/i386/avx512bwintrin.h
> index 187e15a80ca..47c4c03e796 100644
> --- a/gcc/config/i386/avx512bwintrin.h
> +++ b/gcc/config/i386/avx512bwintrin.h
> @@ -199,7 +199,7 @@ _kunpackw_mask32 (__mmask16 __A, __mmask16
> __B)
> (__mmask32) __B);
>  }
> 
> -#if __OPTIMIZE__
> +#ifdef __OPTIMIZE__
>  extern __inline __mmask32
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _kshiftli_mask32 (__mmask32 __A, unsigned int __B)
> --
> 2.31.1



Re: [PATCH] i386: Change RTL representation of bt[lq] [PR118623]

2025-02-09 Thread Jakub Jelinek
On Sun, Feb 09, 2025 at 09:24:30AM +0100, Uros Bizjak wrote:
> "For commutative and comparison operators, a constant is always made
> the second operand."

Isn't that just for commutative comparison operators (eq, ne, ordered,
unordered, ltgt, uneq)?  Compare itself even isn't RTX_COMPARE at all,
it is RTX_BIN_ARITH, so similar to minus.
And I must say I can't find where simplify-rtx.cc or combine.cc would be
trying to canonicalize that order (note, it would need to change the related
uses of flags as well).

Jakub



Re: [PATCH 0/1] gdc: define ELFv1, ELFv2 and D_PPCUseIEEE128 versions for powerpc

2025-02-09 Thread Iain Buclaw
Excerpts from liushuyu's message of Februar 6, 2025 2:02 am:
> From: Zixing Liu 
> 
> This set of patches will add proper IEEE 128 quad precision marking to GDC
> compiler, so that it works with the new changes in D standard library
> where POWER system can use any math functions inside the standard library
> that requires the "real" type.
> 
> The patch also adds the ELFv1 and ELFv2 version identifiers to bridge
> the gap between LLVM D Compiler (LDC) and GNU D Compiler (GDC) so that
> the user can reliably use the "version(...)" syntax to check for which
> ABI is currently in use.

Thanks. I wonder if something could be done to predefine ELFv1 for other 
targets too. Unconditionally calling add_builtin_version in glibc-d.cc, 
freebsd-d.cc, ..., doesn't seem like the best thing to do, but I'm open 
for suggestions.

> +
> +  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
> +d_add_builtin_version ("D_PPCUseIEEE128");

It says in the spec that all version identifiers derived from any 
predefined versions by appending any character(s) are reserved.

So there's no need for the `D_` prefix, `PPC_UseIEEE128` will suffice.

> +
> +// { dg-final { scan-assembler "test_version" } }
> +extern (C) bool test_version() {
> +// { dg-final { scan-assembler "li 3,1" } }
> +version (D_PPCUseIEEE128) return true;
> +else return false;
> +}
> +
> +// { dg-final { scan-assembler "test_elf_version" } }
> +extern (C) bool test_elf_version() {
> +// { dg-final { scan-assembler "li 3,1" } }
> +version (ELFv2) return true;
> +else return false;
> +}

These two tests should return a different value, otherwise you could 
just end up matching the same function return twice.

Kind regards,
Iain.


Re: [PATCH] i386: Change RTL representation of bt[lq] [PR118623]

2025-02-09 Thread Uros Bizjak
On Sun, Feb 9, 2025 at 11:31 PM Jakub Jelinek  wrote:
>
> On Sun, Feb 09, 2025 at 09:24:30AM +0100, Uros Bizjak wrote:
> > "For commutative and comparison operators, a constant is always made
> > the second operand."
>
> Isn't that just for commutative comparison operators (eq, ne, ordered,
> unordered, ltgt, uneq)?  Compare itself even isn't RTX_COMPARE at all,
> it is RTX_BIN_ARITH, so similar to minus.
> And I must say I can't find where simplify-rtx.cc or combine.cc would be
> trying to canonicalize that order (note, it would need to change the related
> uses of flags as well).

When combine combines some RTX with any operand of COMPARE RTX, it
canonicalizes COMPARE RTX in simplify_compare_const:

/* Try to simplify a comparison between OP0 and a constant OP1,
   where CODE is the comparison code that will be tested, into a
   (CODE OP0 const0_rtx) form.

   The result is a possibly different comparison code to use.
   *POP0 and *POP1 may be updated.  */

This form won't match the new BT RTX.

At the moment, there is no instruction using compare:CCC that could be
combined with e.g. ZERO_EXTRACT RTX to form "*bt" insn, so the
canonicalization rules do not matter, because BT is always expanded
from splitters in a kind-of manual way. This is the reason that patch
is OK, but without adding the new form to
ix86_canonicalize_comparison, combine won't be able to match the new
BT RTX in future.

Please see ix86_canonicalize_comparison for a couple of examples on
how to allow combine to match non-canonical compares.

Uros.


Re: [PATCH] c++/modules: Better handle no-linkage decls in unnamed namespaces [PR118799]

2025-02-09 Thread Nathaniel Shead
On Sun, Feb 09, 2025 at 01:16:00AM +1100, Nathaniel Shead wrote:
> Tested on x86_64-pc-linux-gnu, OK for trunk if full bootstrap + regtest
> passes?
> 
> -- >8 --
> 
> There are two issues with no-linkage decls (e.g. explicit type aliases)
> in unnamed namespaces that this patch fixes.
> 
> Firstly, we don't currently handle exporting no-linkage decls in unnamed
> namespaces.  This should be ill-formed in [module.export], since having
> an exported declaration within a namespace-definition makes the
> namespace definition exported (p2), but an unnamed namespace has
> internal linkage thus violating p3.
> 
> Secondly, by the standard it appears to be possible to emit unnamed
> namespaces from named modules in certain scenarios.  This patch makes
> the adjustments needed to ensure we don't error in this case.
> 

One thing to note with this is that it means that the following sample:

  export module M;
  namespace {
struct Internal {};
using Alias = Internal;
  }

will still error: 

test.cpp:4:9: error: ‘using {anonymous}::Alias = struct {anonymous}::Internal’ 
exposes TU-local entity ‘struct {anonymous}::Internal’
4 |   using Alias = Internal;
  | ^
test.cpp:3:10: note: ‘struct {anonymous}::Internal’ declared with internal 
linkage
3 |   struct Internal {};
  |  ^~~~

https://eel.is/c++draft/basic.link#17 is the relevant paragraph here,
but I'm not 100% sure what it says about this example; I suppose that
given Alias isn't really an entity maybe this should be OK?  But either
way we definitely don't want to emit this alias.

Maybe the correct approach here is to mark an explicit type alias
TU-local iff the type it refers to is itself TU-local?  So something
like this on top of this patch:

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 21251f98a35..7aa7f93df57 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13331,6 +13331,8 @@ bool
 depset::hash::is_tu_local_entity (tree decl, bool explain/*=false*/)
 {
   gcc_checking_assert (DECL_P (decl));
+  location_t loc = DECL_SOURCE_LOCATION (decl);
+  tree type = TREE_TYPE (decl);

   /* Only types, functions, variables, and template (specialisations)
  can be TU-local.  */
@@ -13340,15 +13342,22 @@ depset::hash::is_tu_local_entity (tree decl, bool 
explain/*=false*/)
   && TREE_CODE (decl) != TEMPLATE_DECL)
 return false;

-  /* An explicit type alias is not an entity, and so is never TU-local.
- Neither are the built-in declarations of 'int' and such.  */
+  /* An explicit type alias is not an entity, but rather inherits
+ whether it's TU-local from the type that it aliases.
+ The built-in declarations of 'int' and such are never TU-local.  */
   if (TREE_CODE (decl) == TYPE_DECL
   && !DECL_SELF_REFERENCE_P (decl)
   && !DECL_IMPLICIT_TYPEDEF_P (decl))
-return false;
-
-  location_t loc = DECL_SOURCE_LOCATION (decl);
-  tree type = TREE_TYPE (decl);
+{
+  if (tree orig = DECL_ORIGINAL_TYPE (decl))
+{
+  if (explain)
+inform (loc, "%qD is an alias of TU-local type %qT", decl, orig);
+  return is_tu_local_entity (TYPE_NAME (orig), explain);
+}
+  else
+return false;
+}

   /* Check specializations first for slightly better explanations.  */
   int use_tpl = -1;

Thoughts?

>   PR c++/118799
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (depset::hash::is_tu_local_entity): Only types,
>   functions, variables, and template (specialisations) can be
>   TU-local.
>   (module_state::write_namespaces): Allow unnamed namespaces in
>   named modules.
>   (check_module_decl_linkage): Error for all exported declarations
>   in an unnamed namespace.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/export-6.C: Adjust error message, add check for
>   no-linkage decls in namespace.
>   * g++.dg/modules/internal-4_b.C: Allow exposing a namespace with
>   internal linkage.
>   * g++.dg/modules/using-30_a.C: New test.
>   * g++.dg/modules/using-30_b.C: New test.
>   * g++.dg/modules/using-30_c.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc| 35 -
>  gcc/testsuite/g++.dg/modules/export-6.C | 33 ++-
>  gcc/testsuite/g++.dg/modules/internal-4_b.C |  4 +--
>  gcc/testsuite/g++.dg/modules/using-30_a.C   |  9 ++
>  gcc/testsuite/g++.dg/modules/using-30_b.C   |  6 
>  gcc/testsuite/g++.dg/modules/using-30_c.C   | 13 
>  6 files changed, 77 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/using-30_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/using-30_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/using-30_c.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 59716e1873e..21251f98a35 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -13332,6 +13332,14 @@ depset::hash::is_tu_loca

Re: [PATCH v2] RISC-V: Vector pesudoinsns with x0 operand to use imm 0

2025-02-09 Thread Vineet Gupta
On 2/8/25 23:02, Jeff Law wrote:
> On 2/7/25 9:34 PM, Vineet Gupta wrote:
>> A couple of Vector pseudoinstructions use x0 scalar which being regfile
>> crosser could be inefficient on certain wider uarches.
>>
>> Use the imm 0 form, which should be functionally equivalent.
>>
>>   pseudoinsnorig insn with x0 this patch
>>       ---
>>   vneg.v vd,vs  vrsub.vx vd,vs,x0 vrsub.vi vd,vs,0
>>   vncvt.x.x.w vd,vs,vm  vnsrl.wx vd,vs,x0,vm  vnsrl.wi vd,vs,0,vm
>>   vwcvt.x.x.v vd,vs,vm  vwadd.vx vd,vs,x0,vm  (imm not supported)
>>
>> This passes my testsuite A/B run but obviously wait for the CI tester to
>> give a green light.
>>
>> gcc/ChangeLog:
>>  * config/riscv/vector.md: vncvt substitute vnsrl.
>>  vnsrl with x0 replace with immediate 0.
>>  vneg substitute vrsub.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: 
>> Change
>>  expected pattern.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: 
>> Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: 
>> Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: 
>> Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto
>>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
>>  * gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
>>  * gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
>>  * gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.
> LGTM.  I think the only question is whether or not to make an exception 
> for this or not.  We are in stage4 after all ;-)  Figure we can make a 
> decision on the Tues call if you're available.

I don't have a strong opinion either way, just wanted to get it out of my tree 
:-)
Yeah sure, 9 PM IST is manageable.

-Vineet


Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg

2025-02-09 Thread Georg-Johann Lay

CCing Denis

Am 08.02.25 um 23:51 schrieb Jeff Law:

On 2/8/25 1:52 PM, Georg-Johann Lay wrote:

Am 08.02.25 um 18:23 schrieb Jeff Law:

On 2/8/25 3:04 AM, Georg-Johann Lay wrote:
That test case from https://gcc.gnu.org/bugzilla/show_bug.cgi? 
id=116389#c7

still ICEs with that change in http://gcc.gnu.org/r15-7428

pr116389-red.c: In function 'func':
pr116389-red.c:20:1: error: insn does not satisfy its constraints:
    20 | }
   | ^
(insn 27 69 28 5 (set (mem/c:SI (plus:HI (reg/f:HI 28 r28)
 (const_int 2 [0x2])) [4 %sfp+2 S4 A8])
 (reg:SI 30 r30)) "pr116389-red.c":16:19 146 {*movsi_split}
  (nil))
during RTL pass: postreload
pr116389-red.c:20:1: internal compiler error: in 
extract_constrain_insn, at recog.cc:2783


Reason is that R30 is the last GPR and can hold HImode at most,
but due to a paradoxical subreg, there is r30:SI.

Bummer as that was the kind of scenario it's supposed to fix.

What did that insn look like before IRA and for whatever pseudo was 
in that operand, what hard register did IRA think it should be 
allocated to?


jeff


The .ira dump has several paradoxical subregs like:

(insn 22 21 60 4 (set (reg/v:SI 51 [ val32 ])
 (subreg:SI (reg:HI 53 [ t$val ]) 0)) "pr116389-red.c":14:14 
146 {*movsi_split}


(insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ])
 (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 
146 {*movsi_split}


(insn 35 34 36 7 (set (reg:HI 54 [ _7 ])
 (ashift:HI (subreg:HI (reg/v:SI 51 [ val32 ]) 0)
 (const_int 1 [0x1]))) "pr116389-red.c":18:13 667 {ashlhi3}

I don't know which one is causing the trouble.  Maybe the attached
dumps will help.
I would suggest looking at the .IRA dump.  You want to know what 
register was assigned to pseudo 52.  I'm guessing it'll be r30, at which 
point you'll probably want to debug the same code I just changed to 
figure out why it's not working in the expected manner.


jeff


That subreg handling in ira-build.c triggers 3 times:

create_insn_allocnos[func:ira(324)]: outer=(subreg:HI (reg/v:SI 51 [ 
val32 ]) 0)
create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg/v:HI 52 [ 
diff ]) 0)
create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg:HI 53 [ t$val 
]) 0)


Insn 27 in the .ira dump reads:

(insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ])
(subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 
146 {*movsi_split}

 (expr_list:REG_DEAD (reg/v:HI 52 [ diff ])

As far as I can see, IRA doesn't allocate a pseudo but expects a spill 
for r52:


Allocno a3r52 of GENERAL_REGS(14) has 2 avail. regs  18-19, node:  18-19 
obj 0 (confl regs =  0-17 20-36),  obj 1 (confl regs =  0-17 20-36)


Pushing a3(r52,l0)(potential spill: pri=12000, cost=12000)

  Popping a4(r53,l0)  -- assign reg 18
  Popping a3(r52,l0)  -- spill
  Popping a0(r54,l0)  -- assign reg 24

Disposition:
1:r51  l0   mem3:r52  l0   mem4:r53  l0180:r54  l0 
  24
6:r55  l0208:r56  l0247:r57  l0245:r58  l0 
  24

2:r59  l024

And spill code generation is up to reload? .reload reads:

insn=27, live_throughout: 32, dead_or_set: 51, 52

Spilling for insn 27.
Using reg 20 for reload 1

Reloads for insn # 27
Reload 0: reload_out (SI) = (reg/v:SI 51 [ val32 ])
NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
reload_out_reg: (reg/v:SI 51 [ val32 ])
Reload 1: reload_in (HI) = (reg/v:HI 52 [ diff ])
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg/v:HI 52 [ diff ])
reload_reg_rtx: (reg:HI 30 r30)


(insn 69 26 27 5 (set (reg:HI 30 r30)
(mem/c:HI (plus:HI (reg/f:HI 28 r28)
(const_int 2 [0x2])) [4 %sfp+2 S2 A8])) 
"pr116389-red.c":16:19 128 {*movhi_split}

 (nil))

(insn 27 69 28 5 (set (mem/c:SI (plus:HI (reg/f:HI 28 r28)
(const_int 2 [0x2])) [4 %sfp+2 S4 A8])
(reg:SI 30 r30)) "pr116389-red.c":16:19 146 {*movsi_split}
 (nil))

So it loads HI:30 from fp+2 and then pushes it as SI:30 to fp+2.
Insn 69 is generated by reload.

It seems reload is generating these insns to do a paradoxical
subreg in memory since it cannot be done in registers?

Johann



Re: [PATCH] i386: Change RTL representation of bt[lq] [PR118623]

2025-02-09 Thread Uros Bizjak
On Sat, Feb 8, 2025 at 9:40 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase is miscompiled because of RTL represententation
> of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it
> actually does.
> Let's look e.g. at
> (define_insn_and_split "*jcc_bt"
>   [(set (pc)
> (if_then_else (match_operator 0 "bt_comparison_operator"
> [(zero_extract:SWI48
>(match_operand:SWI48 1 "nonimmediate_operand")
>(const_int 1)
>(match_operand:QI 2 "nonmemory_operand"))
>  (const_int 0)])
>   (label_ref (match_operand 3))
>   (pc)))
>(clobber (reg:CC FLAGS_REG))]
>   "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
>&& (CONST_INT_P (operands[2])
>? (INTVAL (operands[2]) < GET_MODE_BITSIZE (mode)
>   && INTVAL (operands[2])
>>= (optimize_function_for_size_p (cfun) ? 8 : 32))
>: !memory_operand (operands[1], mode))
>&& ix86_pre_reload_split ()"
>   "#"
>   "&& 1"
>   [(set (reg:CCC FLAGS_REG)
> (compare:CCC
>   (zero_extract:SWI48
> (match_dup 1)
> (const_int 1)
> (match_dup 2))
>   (const_int 0)))
>(set (pc)
> (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
>   (label_ref (match_dup 3))
>   (pc)))]
> {
>   operands[0] = shallow_copy_rtx (operands[0]);
>   PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
> })
> The define_insn part in RTL describes exactly what it does,
> jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
> The problem is with what it splits into.
> put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
> nc for NE and GEU and ICEs otherwise.
> CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
> in those cases e.g. for add we have
> (set (reg:CCC flags) (compare:CCC (plus:M x y) x))
> and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
> (geu (reg:CCC flags) (const_int 0)) for carry not set.  These cases
> model in RTL what is actually happening, compare in infinite precision
> x from the result of finite precision addition in M mode and if it is
> less than unsigned (i.e. overflow happened), carry is set.
> Another use of CCCmode is in UNSPEC_* patterns, those are used with
> (eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
> given the UNSPEC no big deal, the middle-end doesn't know what means
> set or unset.
> But for the bt{l,q}; j{c,nc} case the above splits it into
> (set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
> for bt and
> (set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
> for the bit set case (so that the jump expands to jc) and ne for
> the bit not set case (so that the jump expands to jnc).
> Similarly for the different splitters for cmov and set{c,nc} etc.
> The problem is that when the middle-end reads this RTL, it feels
> the exact opposite to it.  If zero_extract is 1, flags is set
> to comparison of 1 and 0 and that would mean using ne ne in the
> if_then_else, and vice versa.
>
> So, in order to better describe in RTL what is actually happening,
> one possibility would be to swap the behavior of put_condition_code
> and use NE + LTU -> c and EQ + GEU -> nc rather than the current
> EQ + LTU -> c and NE + GEU -> nc; and adjust everything.  The
> following patch uses a more limited approach, instead of representing
> bt{l,q}; j{c,nc} case as written above it uses
> (set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
> and
> (set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
> which uses the existing put_condition_code but describes what the
> insns actually do in RTL clearly.  If zero_extract is 1,
> then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
> 0U >= 0U.  The patch adjusts the *bt define_insn and all the
> splitters to it and its comparisons/conditional moves/setXX.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2025-02-08  Jakub Jelinek  
>
> PR target/118623
> * config/i386/i386.md (*bt): Represent bt as
> compare:CCC of const0_rtx and zero_extract rather than
> zero_extract and const0_rtx.
> (*bt_mask): Likewise.
> (*jcc_bt): Likewise.  Use LTU and GEU as flags test
> instead of EQ and NE.
> (*jcc_bt_mask): Likewise.
> (*jcc_bt_mask_1): Likewise.
> (Help combine recognize bt followed by cmov splitter): Likewise.
> (*bt_setcqi): Likewise.
> (*bt_setncqi): Likewise.
> (*bt_setnc): Likewise.
> (*bt_setncqi_2): Likewise.
> (*bt_setc_mask): Likewise.
>
> * gcc.c-torture/execute/pr118623.c: New test.

OK, but please note that gene

Re: [Patch, Fortran] Fix PR 24878, checking actual arguments against global symbols

2025-02-09 Thread Thomas Koenig

Am 08.02.25 um 22:46 schrieb Harald Anlauf:

looks good, just two minor comments:

+  actual_name = act_sym->name ? act_sym->name : act_sym->name;

Why not just

+  actual_name = act_sym->name;

?


That was a leftover from a previous commit.




+  gfc_error ("Type mismatch passing global function %qs "
+ "declared at %L at %L (%s/%s)", actual_name,
+ &gsym->where, &actual->where,
+ gfc_typename (&global_asym->ts),
+ gfc_dummy_typename (&formal->ts));

These result in lines exceeding column 80.


Fixed.


I am also not a native speaker, but "at %L at %L" sounds strange to me.
Could you find a minor rewording?


I tried, but I could not find anything better...

So, if anybody can think of a more clever wording, the patch for
this is pre-approved :-)

Committed as r15-7449, thanks for the review!

Best regards

Thomas



[committed][PR middle-end/117263] Avoid unused-but-set warning in genautomata

2025-02-09 Thread Jeff Law
This is a trivial bug where a user wanted to define NDEBUG when building 
genautomata, presumably trying to debug its behavior.  This resulted in 
a unused-but-set warning which caused the build to fail.


Dario included the trivial fixes in the PR which I put through the usual 
bootstrap & regression test as well as compiling genautomata with NDEBUG.


Pushing to the trunk.

I'm not addressing whether or not NDEBUG is still useful.  That would be 
Vlad's call.


Jeff

commit b81bb3ed216213fdaba82addae9fc34619ad6ec7
Author: Dario Gjorgjevski 
Date:   Sun Feb 9 09:16:31 2025 -0700

[PR middle-end/117263] Avoid unused-but-set warning in genautomata

This is a trivial bug where a user wanted to define NDEBUG when building
genautomata, presumably trying to debug its behavior.  This resulted in a
unused-but-set warning which caused the build to fail.

Dario included the trivial fixes in the PR which I put through the usual
bootstrap & regression test as well as compiling genautomata with NDEBUG.

Pushing to the trunk.

PR middle-end/117263
gcc/
* genautomata.cc (output_statistics): Avoid set but unnused warnings
when compiling with NDEBUG.

diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc
index 69f856d141c..4059a229f96 100644
--- a/gcc/genautomata.cc
+++ b/gcc/genautomata.cc
@@ -9088,8 +9088,8 @@ static void
 output_statistics (FILE *f)
 {
   automaton_t automaton;
-  int states_num;
 #ifndef NDEBUG
+  int states_num;
   int transition_comb_vect_els = 0;
   int transition_full_vect_els = 0;
   int min_issue_delay_vect_els = 0;
@@ -9106,13 +9106,17 @@ output_statistics (FILE *f)
   automaton->NDFA_states_num, automaton->NDFA_arcs_num);
   fprintf (f, "%5d DFA states,   %5d DFA arcs\n",
   automaton->DFA_states_num, automaton->DFA_arcs_num);
+#ifndef NDEBUG
   states_num = automaton->DFA_states_num;
+#endif
   if (!no_minimization_flag)
{
  fprintf (f, "%5d minimal DFA states,   %5d minimal DFA arcs\n",
   automaton->minimal_DFA_states_num,
   automaton->minimal_DFA_arcs_num);
+#ifndef NDEBUG
  states_num = automaton->minimal_DFA_states_num;
+#endif
}
   fprintf (f, "%5d all insns  %5d insn equivalence classes\n",
   description->insns_num, automaton->insn_equiv_classes_num);


Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg

2025-02-09 Thread Jeff Law

On 2/9/25 1:10 AM, Georg-Johann Lay wrote:



The .ira dump has several paradoxical subregs like:

(insn 22 21 60 4 (set (reg/v:SI 51 [ val32 ])
 (subreg:SI (reg:HI 53 [ t$val ]) 0)) "pr116389-red.c":14:14 
146 {*movsi_split}


(insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ])
 (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 
146 {*movsi_split}


(insn 35 34 36 7 (set (reg:HI 54 [ _7 ])
 (ashift:HI (subreg:HI (reg/v:SI 51 [ val32 ]) 0)
 (const_int 1 [0x1]))) "pr116389-red.c":18:13 667 {ashlhi3}

I don't know which one is causing the trouble.  Maybe the attached
dumps will help.
I would suggest looking at the .IRA dump.  You want to know what 
register was assigned to pseudo 52.  I'm guessing it'll be r30, at 
which point you'll probably want to debug the same code I just changed 
to figure out why it's not working in the expected manner.


jeff


That subreg handling in ira-build.c triggers 3 times:

create_insn_allocnos[func:ira(324)]: outer=(subreg:HI (reg/v:SI 51 
[ val32 ]) 0)
create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg/v:HI 52 
[ diff ]) 0)
create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg:HI 53 
[ t$val ]) 0)


Insn 27 in the .ira dump reads:

(insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ])
     (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 
146 {*movsi_split}

  (expr_list:REG_DEAD (reg/v:HI 52 [ diff ])

As far as I can see, IRA doesn't allocate a pseudo but expects a spill 
for r52:


Allocno a3r52 of GENERAL_REGS(14) has 2 avail. regs  18-19, node:  18-19 
obj 0 (confl regs =  0-17 20-36),  obj 1 (confl regs =  0-17 20-36)


Pushing a3(r52,l0)(potential spill: pri=12000, cost=12000)

   Popping a4(r53,l0)  -- assign reg 18
   Popping a3(r52,l0)  -- spill
   Popping a0(r54,l0)  -- assign reg 24

Disposition:
     1:r51  l0   mem    3:r52  l0   mem    4:r53  l0    18    0:r54  l0 
   24
     6:r55  l0    20    8:r56  l0    24    7:r57  l0    24    5:r58  l0 
   24

     2:r59  l0    24

And spill code generation is up to reload? .reload reads:
Yes.  What this all says is that IRA is doing something sensible.  So 
the problem is reload or something in the avr port.





insn=27, live_throughout: 32, dead_or_set: 51, 52

Spilling for insn 27.
Using reg 20 for reload 1

Reloads for insn # 27
Reload 0: reload_out (SI) = (reg/v:SI 51 [ val32 ])
 NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
 reload_out_reg: (reg/v:SI 51 [ val32 ])
Reload 1: reload_in (HI) = (reg/v:HI 52 [ diff ])
 GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
 reload_in_reg: (reg/v:HI 52 [ diff ])
 reload_reg_rtx: (reg:HI 30 r30)


(insn 69 26 27 5 (set (reg:HI 30 r30)
     (mem/c:HI (plus:HI (reg/f:HI 28 r28)
     (const_int 2 [0x2])) [4 %sfp+2 S2 A8])) "pr116389- 
red.c":16:19 128 {*movhi_split}

  (nil))

(insn 27 69 28 5 (set (mem/c:SI (plus:HI (reg/f:HI 28 r28)
     (const_int 2 [0x2])) [4 %sfp+2 S4 A8])
     (reg:SI 30 r30)) "pr116389-red.c":16:19 146 {*movsi_split}
  (nil))

So it loads HI:30 from fp+2 and then pushes it as SI:30 to fp+2.
Insn 69 is generated by reload.

It seems reload is generating these insns to do a paradoxical
subreg in memory since it cannot be done in registers?
My working memory of reload is diminishing by the day, especially 
anything related to subregs.  I wouldn't necessarily make the assumption 
that it cannot be done, it's more likely there just aren't enough 
registers available at the right points.  So the object gets forced into 
memory irrespective of whether or not a paradoxical subreg is involved. 
But that's speculation -- subreg handling in reload is nontrivial and 
there may be places where it just gives up a generates reloads.


If someone wanted to chase down reload's behavior, I'd start with the 
selection of r30 as the reload register.  It makes sense in HImode, but 
not when there's a outer paradoxical subreg at the use point.


jeff




Johann





[committed][RISC-V][PR target/115123] Fix testsuite fallout from sinking heuristic change

2025-02-09 Thread Jeff Law
Code sinking is just semantic preserving code motions, so it's a lot 
like scheduling in that code motions can change the vector configuration 
needed at various program points.  That in turn can also change the 
number of vsetvls as we may or may not be able to merge them after the 
code motions.


The sinking heuristics were twiddled several months ago resulting in a 
handful of scan-asm failures.  This patch adjusts the tests 
appropriately fixing pr115123 (P3 regression).


Pushing to the trunk.

jeff

commit 22e30d60b971eed9a4754ea920d05b1b7e89090a
Author: Jeff Law 
Date:   Sun Feb 9 09:55:56 2025 -0700

[PR target/115123] Fix testsuite fallout from sinking heuristic change

Code sinking is just semantic preserving code motions, so it's a lot like
scheduling in that code motions can change the vector configuration needed 
at
various program points.  That in turn can also change the number of vsetvls 
as
we may or may not be able to merge them after the code motions.

The sinking heuristics were twiddled several months ago resulting in a 
handful
of scan-asm failures.  This patch adjusts the tests appropriately fixing
pr115123 (P3 regression).

PR target/115123
gcc/testsuite
* gcc.target/riscv/rvv/base/pr114352-3.c: Adjust expected output.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-91.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-92.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c
index a764afbbbc1..9bfa39c5268 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c
@@ -4,6 +4,7 @@
 
 /*
 ** test_1:
+** ...
 ** sext\.w\s+[atx][0-9]+,\s*[atx][0-9]+
 ** ...
 */
@@ -56,6 +57,7 @@ test_3 (int *a, int *b, int *out, unsigned count)
 
 /*
 ** test_4:
+** ...
 ** sext\.w\s+[atx][0-9]+,\s*[atx][0-9]+
 ** ...
 */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
index 21bc0729cf6..cdb1a4ee921 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
@@ -37,4 +37,4 @@ void f (void * restrict in, void * restrict out, int l, int 
n, int m, int cond)
   }
 }
 
-/* { dg-final { scan-assembler 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts 
"-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+.L[0-9]+:\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts 
"-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
index 5539486b506..c7c9e1f91ff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
@@ -36,4 +36,4 @@ void f (void * restrict in, void * restrict out, int l, int 
n, int m, int cond)
   }
 }
 
-/* { dg-final { scan-assembler 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts 
"-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+.L[0-9]+:\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts 
"-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c
index 6e995461c6f..c174845f7bc 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c
@@ -17,5 +17,4 @@ void f2 (void * restrict in, void * restrict out, int l, int 
n, int m, size_t vl
   }
 }
 
-/* { dg-final {