Re: [PATCH htdocs] gcc-14: document -ftrampoline-impl
> On 10 Feb 2025, at 08:28, Sam James wrote: > > --- > Iain, does this look OK? > > htdocs/gcc-14/changes.html | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html > index ba4780ca..2db442b2 100644 > --- a/htdocs/gcc-14/changes.html > +++ b/htdocs/gcc-14/changes.html > @@ -123,6 +123,13 @@ You may also want to check out our > not an optimization, to avoid relying on library > implementations. > > + > + New option > +href="https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Code-Gen-Options.html#index-ftrampoline-impl";>-ftrampoline-impl=, > + to choose the position of generated trampolines. maybe “placement” instead of “position” - but not a show-stopper for me otherwise LGTM. Iain > The default remains the stack > + for almost all targets. To override the default, > + -ftrampoline-impl=heap can be passed on supported ABIs > + (x86_64 Darwin, x86_64 and aarch64 Linux). > > > New function attribute > -- > 2.48.1 >
Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]
On Mon, Feb 10, 2025 at 6:46 AM liuhongt wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9 > > >Pranav Gorantla 2025-02-06 04:30:05 UTC > >Facing similar issue in gcc-13. Is it possible to backport the fix of this > >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well. > > This series is to ask approval for the backport of r14-172 and r14-1252 to > GCC13 and GCC12 release branch. > Note r14-1252 is a fix to r14-172 which regressed powerpc testcase in > PR109610. > I have verified the fix also works on GCC13/GCC12 branch for PR109610. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and > aarch64-linux-gnu. > Ok for backport No, please avoid backporting missed-optimization fixes, esp. in code areas as sensitive as RA. The only exception should be the newest active branch (gcc-14) at this point, and even there you need to be very careful. The PR itself isn't even marked as regression btw. (not that this would change my assessment much). Thanks, Richard. > > liuhongt (3): > Use NO_REGS in cost calculation when the preferred register class are > not known yet. > Only use NO_REGS in cost calculation when !hard_regno_mode_ok for > GENERAL_REGS and mode. > Adjust testcases after better RA decision. > > gcc/ira-costs.cc | 7 + > .../i386/avx2-dest-false-dep-for-glc.c| 28 +- > .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++--- > .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++ > .../i386/avx512fp16-dest-false-dep-for-glc.c | 118 -- > .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +--- > gcc/testsuite/gcc.target/i386/pr108707.c | 16 + > 7 files changed, 813 insertions(+), 204 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c > > -- > 2.34.1 >
[PATCH htdocs] gcc-14: document -ftrampoline-impl
--- Iain, does this look OK? htdocs/gcc-14/changes.html | 7 +++ 1 file changed, 7 insertions(+) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index ba4780ca..2db442b2 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@ -123,6 +123,13 @@ You may also want to check out our not an optimization, to avoid relying on library implementations. + + New option + https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Code-Gen-Options.html#index-ftrampoline-impl";>-ftrampoline-impl=, + to choose the position of generated trampolines. The default remains the stack + for almost all targets. To override the default, + -ftrampoline-impl=heap can be passed on supported ABIs + (x86_64 Darwin, x86_64 and aarch64 Linux). New function attribute -- 2.48.1
[patch,avr] Add -mno-call-main to tweak running main()
On devices with very limited resources, it may be desirable to run main in a more efficient way than provided by the startup code XCALL main XJMP exit from section .init9. In AVR-LibC v2.3, that code has been moved to libmcu.a, hence symbol __call_main can be satisfied so that the respective code is no more pulled in from that library. Instead, main can be run by putting it in section .init9. The patch adds attributes noreturn and section(".init9"), and sets __call_main=0 when it encounters main(). Ok for trunk? Johann -- AVR: target/118806 - Add -mno-call-main to tweak running main(). On devices with very limited resources, it may be desirable to run main in a more efficient way than provided by the startup code XCALL main XJMP exit from section .init9. In AVR-LibC v2.3, that code has been moved to libmcu.a, hence symbol __call_main can be satisfied so that the respective code is no more pulled in from that library. Instead, main can be run by putting it in section .init9. The patch adds attributes noreturn and section(".init9"), and sets __call_main=0 when it encounters main(). gcc/ PR target/118806 * config/avr/avr.opt (-mcall-main): New option and... (avropt_call_main): ...variable. * config/avr/avr.cc (avr_no_call_main_p): New variable. (avr_insert_attributes) [-mno-call-main, main]: Add attributes noreturn and section(".init9") to main. Set avr_no_call_main_p. (avr_file_end) [avr_no_call_main_p]: Define symbol __call_main. * doc/invoke.texi (AVR Options) <-mno-call-main>: Document. <-mnodevicelib>: Extend explanation.AVR: target/118806 - Add -mno-call-main to tweak running main(). On devices with very limited resources, it may be desirable to run main in a more efficient way than provided by the startup code XCALL main XJMP exit from section .init9. In AVR-LibC v2.3, that code has been moved to libmcu.a, hence symbol __call_main can be satisfied so that the respective code is no more pulled in from that library. Instead, main can be run by putting it in section .init9. The patch adds attributes noreturn and section(".init9"), and sets __call_main=0 when it encounters main(). gcc/ PR target/118806 * config/avr/avr.opt (-mcall-main): New option and... (avropt_call_main): ...variable. * config/avr/avr.cc (avr_no_call_main_p): New variable. (avr_insert_attributes) [-mno-call-main, main]: Add attributes noreturn and section(".init9") to main. Set avr_no_call_main_p. (avr_file_end) [avr_no_call_main_p]: Define symbol __call_main. * doc/invoke.texi (AVR Options) <-mno-call-main>: Document. <-mnodevicelib>: Extend explanation. diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc index 9bfb3785ddb..c31d06d6d20 100644 --- a/gcc/config/avr/avr.cc +++ b/gcc/config/avr/avr.cc @@ -230,6 +230,9 @@ bool avr_need_clear_bss_p = false; bool avr_need_copy_data_p = false; bool avr_has_rodata_p = false; +/* To track if we satisfy __call_main from AVR-LibC. */ +bool avr_no_call_main_p = false; + /* Counts how often pass avr-fuse-add has been executed. Is is kept in sync with cfun->machine->n_avr_fuse_add_executed and serves as an insn condition for shift insn splitters. */ @@ -11712,6 +11715,32 @@ avr_insert_attributes (tree node, tree *attributes) NULL, *attributes); } +#if defined WITH_AVRLIBC + if (avropt_call_main == 0 + && TREE_CODE (node) == FUNCTION_DECL + && MAIN_NAME_P (DECL_NAME (node))) +{ + if (lookup_attribute ("section", *attributes)) + { + warning (OPT_Wattributes, "% attribute on main function" + " inhibits %<-mno-call-main%>"); + } + else + { + if (!lookup_attribute ("noreturn", *attributes)) + *attributes = tree_cons (get_identifier ("noreturn"), + NULL_TREE, *attributes); + // Put main into section .init9 so that it is executed even + // though it's not called. + tree init9 = build_string (1 + strlen (".init9"), ".init9"); + tree arg = build_tree_list (NULL_TREE, init9); + *attributes = tree_cons (get_identifier ("section"), + arg, *attributes); + avr_no_call_main_p = true; + } +} // -mno-call-main +#endif // AVR-LibC + avr_handle_isr_attribute (node, attributes, "signal"); avr_handle_isr_attribute (node, attributes, "interrupt"); @@ -12311,6 +12340,15 @@ avr_file_end (void) if (avr_need_clear_bss_p) fputs (".global __do_clear_bss\n", asm_out_file); + + /* Don't let __call_main call main() and exit(). + Defining this symbol will keep the code from being pulled + in from lib.a as requested by AVR-LibC's gcrt1.S. + We invoke main() by other means: putting it in .init9. */ + + if (avr_no_call_main_p) +fputs (".global __call_main\n" + "__call_main = 0\n", asm_out_file); }
[PATCH 2/3] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.
r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost calculation when the preferred register class are not known yet. It regressed powerpc PR109610 and PR109858, it looks too aggressive to use NO_REGS when mode can be allocated with GENERAL_REGS. The patch takes a step back, still use GENERAL_REGS when hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS. gcc/ChangeLog: PR target/109610 PR target/109858 * ira-costs.cc (scan_one_insn): Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode, otherwise still use GENERAL_REGS. (cherry picked from commit 4fb66b2329319e9b47e89200d613b6f741a114fc) --- gcc/ira-costs.cc | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc index 003963f2a19..d9e700e8947 100644 --- a/gcc/ira-costs.cc +++ b/gcc/ira-costs.cc @@ -1572,12 +1572,16 @@ scan_one_insn (rtx_insn *insn) && (! ira_use_lra_p || ! pic_offset_table_rtx || ! contains_symbol_ref_p (XEXP (note, 0 { - /* Costs for NO_REGS are used in cost calculation on the -1st pass when the preferred register classes are not -known yet. In this case we take the best scenario. */ - enum reg_class cl = NO_REGS; + enum reg_class cl = GENERAL_REGS; rtx reg = SET_DEST (set); int num = COST_INDEX (REGNO (reg)); + /* Costs for NO_REGS are used in cost calculation on the +1st pass when the preferred register classes are not +known yet. In this case we take the best scenario when +mode can't be put into GENERAL_REGS. */ + if (!targetm.hard_regno_mode_ok (ira_class_hard_regs[cl][0], + GET_MODE (reg))) + cl = NO_REGS; COSTS (costs, num)->mem_cost -= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency; -- 2.34.1
[PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9 >Pranav Gorantla 2025-02-06 04:30:05 UTC >Facing similar issue in gcc-13. Is it possible to backport the fix of this Bug >108707 and Bug 109610 to gcc-13, gcc-12 as well. This series is to ask approval for the backport of r14-172 and r14-1252 to GCC13 and GCC12 release branch. Note r14-1252 is a fix to r14-172 which regressed powerpc testcase in PR109610. I have verified the fix also works on GCC13/GCC12 branch for PR109610. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and aarch64-linux-gnu. Ok for backport liuhongt (3): Use NO_REGS in cost calculation when the preferred register class are not known yet. Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode. Adjust testcases after better RA decision. gcc/ira-costs.cc | 7 + .../i386/avx2-dest-false-dep-for-glc.c| 28 +- .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++--- .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++ .../i386/avx512fp16-dest-false-dep-for-glc.c | 118 -- .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +--- gcc/testsuite/gcc.target/i386/pr108707.c | 16 + 7 files changed, 813 insertions(+), 204 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c -- 2.34.1
Re: [PATCH 0/3] GCC13/GCC12 backport [PR108707][PR109610]
On Mon, Feb 10, 2025 at 1:43 PM liuhongt wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707#c9 > > >Pranav Gorantla 2025-02-06 04:30:05 UTC > >Facing similar issue in gcc-13. Is it possible to backport the fix of this > >Bug 108707 and Bug 109610 to gcc-13, gcc-12 as well. > > This series is to ask approval for the backport of r14-172 and r14-1252 to > GCC13 and GCC12 release branch. > Note r14-1252 is a fix to r14-172 which regressed powerpc testcase in > PR109610. > I have verified the fix also works on GCC13/GCC12 branch for PR109610. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and > aarch64-linux-gnu. > Ok for backport > > > liuhongt (3): > Use NO_REGS in cost calculation when the preferred register class are > not known yet. > Only use NO_REGS in cost calculation when !hard_regno_mode_ok for > GENERAL_REGS and mode. > Adjust testcases after better RA decision. > > gcc/ira-costs.cc | 7 + > .../i386/avx2-dest-false-dep-for-glc.c| 28 +- > .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++--- > .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++ > .../i386/avx512fp16-dest-false-dep-for-glc.c | 118 -- > .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +--- > gcc/testsuite/gcc.target/i386/pr108707.c | 16 + > 7 files changed, 813 insertions(+), 204 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c > > -- > 2.34.1 > -- BR, Hongtao
[PATCH 1/3] Use NO_REGS in cost calculation when the preferred register class are not known yet.
gcc/ChangeLog: PR rtl-optimization/108707 * ira-costs.cc (scan_one_insn): Use NO_REGS instead of GENERAL_REGS when preferred reg_class is not known. gcc/testsuite/ChangeLog: * gcc.target/i386/pr108707.c: New test. (cherry picked from commit 0368d169492017cfab5622d38b15be94154d458c) --- gcc/ira-costs.cc | 5 - gcc/testsuite/gcc.target/i386/pr108707.c | 16 2 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc index bdb1356af91..003963f2a19 100644 --- a/gcc/ira-costs.cc +++ b/gcc/ira-costs.cc @@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn) && (! ira_use_lra_p || ! pic_offset_table_rtx || ! contains_symbol_ref_p (XEXP (note, 0 { - enum reg_class cl = GENERAL_REGS; + /* Costs for NO_REGS are used in cost calculation on the +1st pass when the preferred register classes are not +known yet. In this case we take the best scenario. */ + enum reg_class cl = NO_REGS; rtx reg = SET_DEST (set); int num = COST_INDEX (REGNO (reg)); diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c b/gcc/testsuite/gcc.target/i386/pr108707.c new file mode 100644 index 000..6405cfe7cdc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr108707.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */ + +#include + +void +foo (__m512 pv, __m512 a, __m512 b, __m512 c, + __m512* pdest, __m512* p1) +{ + __m512 t = *p1; +pdest[0] = _mm512_fmadd_ps (t, pv, a); +pdest[1] = _mm512_fmadd_ps (t, pv, b); +pdest[2] = _mm512_fmadd_ps (t, pv, c); +} -- 2.34.1
[PATCH 3/3] Adjust testcases after better RA decision.
After optimization for RA, memory op is not propagated into instructions(>1), and it make testcases not generate vxorps since the memory is loaded into the dest, and the dest is never unused now. So rewrite testcases to make the codegen more stable. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite testcase to make the codegen more stable. * gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto * gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto. * gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto. * gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto. (cherry picked from commit 525713ed9db904ed2504decc5bde9d58bd5d98b4) --- .../i386/avx2-dest-false-dep-for-glc.c| 28 +- .../i386/avx512dq-dest-false-dep-for-glc.c| 257 ++--- .../i386/avx512f-dest-false-dep-for-glc.c | 348 ++ .../i386/avx512fp16-dest-false-dep-for-glc.c | 118 -- .../i386/avx512vl-dest-false-dep-for-glc.c| 243 +--- 5 files changed, 790 insertions(+), 204 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c index fe331fe5e2c..e260888627f 100644 --- a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c +++ b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c @@ -5,16 +5,28 @@ #include -extern __m256i i1, i2, i3, i4; -extern __m256d d1, d2; -extern __m256 f1, f2; +__m256i +foo0 (__m256i i3, __m256i i1, __m256i i2) +{ + return _mm256_permutevar8x32_epi32 (i1, i2); +} + +__m256i +foo1 (__m256i i2, __m256i i1) +{ + return _mm256_permute4x64_epi64 (i1, 12); +} + +__m256d +foo2 (__m256d d2, __m256d d1) +{ + return _mm256_permute4x64_pd (d1, 12); +} -void vperm_test (void) +__m256 +foo3 (__m256 f2, __m256i i2, __m256 f1) { - i3 = _mm256_permutevar8x32_epi32 (i1, i2); - i4 = _mm256_permute4x64_epi64 (i1, 12); - d2 = _mm256_permute4x64_pd (d1, 12); - f2 = _mm256_permutevar8x32_ps (f1, i2); + return _mm256_permutevar8x32_ps (f1, i2); } /* { dg-final { scan-assembler-times "vxorps" 4 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c index b334b88194b..b615b8d 100644 --- a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c +++ b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c @@ -13,56 +13,219 @@ extern __m512 f1, f11; extern __m256 f2; extern __m128 f3, f33; -__mmask32 m32; __mmask16 m16; __mmask8 m8; -void mullo_test (void) -{ - i1 = _mm512_mullo_epi64 (i1, i1); - i1 = _mm512_mask_mullo_epi64 (i1, m8, i1, i1); - i1 = _mm512_maskz_mullo_epi64 (m8, i1, i1); - i2 = _mm256_mullo_epi64 (i2, i2); - i2 = _mm256_mask_mullo_epi64 (i2, m8, i2, i2); - i2 = _mm256_maskz_mullo_epi64 (m8, i2, i2); - i3 = _mm_mullo_epi64 (i3, i3); - i3 = _mm_mask_mullo_epi64 (i3, m8, i3, i3); - i3 = _mm_maskz_mullo_epi64 (m8, i3, i3); -} - -void range_test (void) -{ - d1 = _mm512_range_pd (d1, d11, 15); - d11 = _mm512_range_round_pd (d11, d1, 15, 8); - d1 = _mm512_mask_range_pd (d1, m8, d11, d11, 15); - d11 = _mm512_mask_range_round_pd (d11, m8, d1, d1, 15, 8); - d1 = _mm512_maskz_range_pd (m8, d11, d11, 15); - d11 = _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8); - d2 = _mm256_range_pd (d2, d2, 15); - d2 = _mm256_mask_range_pd (d2, m8, d2, d2, 15); - d2 = _mm256_maskz_range_pd (m8, d2, d2, 15); - d3 = _mm_range_pd (d3, d3, 15); - d3 = _mm_mask_range_pd (d3, m8, d3, d3, 15); - d3 = _mm_maskz_range_pd (m8, d3, d3, 15); - d33 = _mm_range_sd (d33, d33, 15); - d33 = _mm_mask_range_sd (d33, m8, d33, d33, 15); - d33 = _mm_maskz_range_sd (m8, d33, d33, 15); - - f1 = _mm512_range_ps (f1, f11, 15); - f11 = _mm512_range_round_ps (f11, f1, 15, 8); - f1 = _mm512_mask_range_ps (f1, m16, f11, f11, 15); - f11 = _mm512_mask_range_round_ps (f11, m16, f1, f1, 15, 8); - f1 = _mm512_maskz_range_ps (m16, f11, f11, 15); - f11 = _mm512_maskz_range_round_ps (m16, f1, f1, 15, 8); - f2 = _mm256_range_ps (f2, f2, 15); - f2 = _mm256_mask_range_ps (f2, m8, f2, f2, 15); - f2 = _mm256_maskz_range_ps (m8, f2, f2, 15); - f3 = _mm_range_ps (f3, f3, 15); - f3 = _mm_mask_range_ps (f3, m8, f3, f3, 15); - f3 = _mm_maskz_range_ps (m8, f3, f3, 15); - f33 = _mm_range_ss (f33, f33, 15); - f33 = _mm_mask_range_ss (f33, m8, f33, f33, 15); - f33 = _mm_maskz_range_ss (m8, f33, f33, 15); +#define MULLO(func, type) \ + type \ + mullo##type (type i2, type i1) \ + {\ +return func (i1, i1); \ + } + +#define MULLO_MASK(func, type) \ + type \ + mullo_mask##type (type i2, type i1) \ + {\ +return func (i
[PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813]
Hi all, When moving intrins around for AVX10 implementation in GCC 14, the intrin _kshiftli_mask32 and _kshiftri_mask32 are wrongly wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__", leading to the intrin file not `-Wsystem-headers -Wundef` clean since r14-4490. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/118813 * config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__ wrap. --- gcc/config/i386/avx512bwintrin.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h index 187e15a80ca..47c4c03e796 100644 --- a/gcc/config/i386/avx512bwintrin.h +++ b/gcc/config/i386/avx512bwintrin.h @@ -199,7 +199,7 @@ _kunpackw_mask32 (__mmask16 __A, __mmask16 __B) (__mmask32) __B); } -#if __OPTIMIZE__ +#ifdef __OPTIMIZE__ extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _kshiftli_mask32 (__mmask32 __A, unsigned int __B) -- 2.31.1
RE: [PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813]
> -Original Message- > From: Jiang, Haochen > Sent: Monday, February 10, 2025 2:10 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR > 118813] > > Hi all, > > When moving intrins around for AVX10 implementation in GCC 14, the intrin > _kshiftli_mask32 and _kshiftri_mask32 are wrongly wrapped by "#if > __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__", leading to the intrin file not > `-Wsystem-headers -Wundef` clean since r14-4490. > > Ok for trunk? Ok, and please backport to GCC14 release branch. > > Thx, > Haochen > > gcc/ChangeLog: > > PR target/118813 > * config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__ > wrap. > --- > gcc/config/i386/avx512bwintrin.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/i386/avx512bwintrin.h > b/gcc/config/i386/avx512bwintrin.h > index 187e15a80ca..47c4c03e796 100644 > --- a/gcc/config/i386/avx512bwintrin.h > +++ b/gcc/config/i386/avx512bwintrin.h > @@ -199,7 +199,7 @@ _kunpackw_mask32 (__mmask16 __A, __mmask16 > __B) > (__mmask32) __B); > } > > -#if __OPTIMIZE__ > +#ifdef __OPTIMIZE__ > extern __inline __mmask32 > __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > _kshiftli_mask32 (__mmask32 __A, unsigned int __B) > -- > 2.31.1
Re: [PATCH] i386: Change RTL representation of bt[lq] [PR118623]
On Sun, Feb 09, 2025 at 09:24:30AM +0100, Uros Bizjak wrote: > "For commutative and comparison operators, a constant is always made > the second operand." Isn't that just for commutative comparison operators (eq, ne, ordered, unordered, ltgt, uneq)? Compare itself even isn't RTX_COMPARE at all, it is RTX_BIN_ARITH, so similar to minus. And I must say I can't find where simplify-rtx.cc or combine.cc would be trying to canonicalize that order (note, it would need to change the related uses of flags as well). Jakub
Re: [PATCH 0/1] gdc: define ELFv1, ELFv2 and D_PPCUseIEEE128 versions for powerpc
Excerpts from liushuyu's message of Februar 6, 2025 2:02 am: > From: Zixing Liu > > This set of patches will add proper IEEE 128 quad precision marking to GDC > compiler, so that it works with the new changes in D standard library > where POWER system can use any math functions inside the standard library > that requires the "real" type. > > The patch also adds the ELFv1 and ELFv2 version identifiers to bridge > the gap between LLVM D Compiler (LDC) and GNU D Compiler (GDC) so that > the user can reliably use the "version(...)" syntax to check for which > ABI is currently in use. Thanks. I wonder if something could be done to predefine ELFv1 for other targets too. Unconditionally calling add_builtin_version in glibc-d.cc, freebsd-d.cc, ..., doesn't seem like the best thing to do, but I'm open for suggestions. > + > + if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128) > +d_add_builtin_version ("D_PPCUseIEEE128"); It says in the spec that all version identifiers derived from any predefined versions by appending any character(s) are reserved. So there's no need for the `D_` prefix, `PPC_UseIEEE128` will suffice. > + > +// { dg-final { scan-assembler "test_version" } } > +extern (C) bool test_version() { > +// { dg-final { scan-assembler "li 3,1" } } > +version (D_PPCUseIEEE128) return true; > +else return false; > +} > + > +// { dg-final { scan-assembler "test_elf_version" } } > +extern (C) bool test_elf_version() { > +// { dg-final { scan-assembler "li 3,1" } } > +version (ELFv2) return true; > +else return false; > +} These two tests should return a different value, otherwise you could just end up matching the same function return twice. Kind regards, Iain.
Re: [PATCH] i386: Change RTL representation of bt[lq] [PR118623]
On Sun, Feb 9, 2025 at 11:31 PM Jakub Jelinek wrote: > > On Sun, Feb 09, 2025 at 09:24:30AM +0100, Uros Bizjak wrote: > > "For commutative and comparison operators, a constant is always made > > the second operand." > > Isn't that just for commutative comparison operators (eq, ne, ordered, > unordered, ltgt, uneq)? Compare itself even isn't RTX_COMPARE at all, > it is RTX_BIN_ARITH, so similar to minus. > And I must say I can't find where simplify-rtx.cc or combine.cc would be > trying to canonicalize that order (note, it would need to change the related > uses of flags as well). When combine combines some RTX with any operand of COMPARE RTX, it canonicalizes COMPARE RTX in simplify_compare_const: /* Try to simplify a comparison between OP0 and a constant OP1, where CODE is the comparison code that will be tested, into a (CODE OP0 const0_rtx) form. The result is a possibly different comparison code to use. *POP0 and *POP1 may be updated. */ This form won't match the new BT RTX. At the moment, there is no instruction using compare:CCC that could be combined with e.g. ZERO_EXTRACT RTX to form "*bt" insn, so the canonicalization rules do not matter, because BT is always expanded from splitters in a kind-of manual way. This is the reason that patch is OK, but without adding the new form to ix86_canonicalize_comparison, combine won't be able to match the new BT RTX in future. Please see ix86_canonicalize_comparison for a couple of examples on how to allow combine to match non-canonical compares. Uros.
Re: [PATCH] c++/modules: Better handle no-linkage decls in unnamed namespaces [PR118799]
On Sun, Feb 09, 2025 at 01:16:00AM +1100, Nathaniel Shead wrote: > Tested on x86_64-pc-linux-gnu, OK for trunk if full bootstrap + regtest > passes? > > -- >8 -- > > There are two issues with no-linkage decls (e.g. explicit type aliases) > in unnamed namespaces that this patch fixes. > > Firstly, we don't currently handle exporting no-linkage decls in unnamed > namespaces. This should be ill-formed in [module.export], since having > an exported declaration within a namespace-definition makes the > namespace definition exported (p2), but an unnamed namespace has > internal linkage thus violating p3. > > Secondly, by the standard it appears to be possible to emit unnamed > namespaces from named modules in certain scenarios. This patch makes > the adjustments needed to ensure we don't error in this case. > One thing to note with this is that it means that the following sample: export module M; namespace { struct Internal {}; using Alias = Internal; } will still error: test.cpp:4:9: error: ‘using {anonymous}::Alias = struct {anonymous}::Internal’ exposes TU-local entity ‘struct {anonymous}::Internal’ 4 | using Alias = Internal; | ^ test.cpp:3:10: note: ‘struct {anonymous}::Internal’ declared with internal linkage 3 | struct Internal {}; | ^~~~ https://eel.is/c++draft/basic.link#17 is the relevant paragraph here, but I'm not 100% sure what it says about this example; I suppose that given Alias isn't really an entity maybe this should be OK? But either way we definitely don't want to emit this alias. Maybe the correct approach here is to mark an explicit type alias TU-local iff the type it refers to is itself TU-local? So something like this on top of this patch: diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 21251f98a35..7aa7f93df57 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -13331,6 +13331,8 @@ bool depset::hash::is_tu_local_entity (tree decl, bool explain/*=false*/) { gcc_checking_assert (DECL_P (decl)); + location_t loc = DECL_SOURCE_LOCATION (decl); + tree type = TREE_TYPE (decl); /* Only types, functions, variables, and template (specialisations) can be TU-local. */ @@ -13340,15 +13342,22 @@ depset::hash::is_tu_local_entity (tree decl, bool explain/*=false*/) && TREE_CODE (decl) != TEMPLATE_DECL) return false; - /* An explicit type alias is not an entity, and so is never TU-local. - Neither are the built-in declarations of 'int' and such. */ + /* An explicit type alias is not an entity, but rather inherits + whether it's TU-local from the type that it aliases. + The built-in declarations of 'int' and such are never TU-local. */ if (TREE_CODE (decl) == TYPE_DECL && !DECL_SELF_REFERENCE_P (decl) && !DECL_IMPLICIT_TYPEDEF_P (decl)) -return false; - - location_t loc = DECL_SOURCE_LOCATION (decl); - tree type = TREE_TYPE (decl); +{ + if (tree orig = DECL_ORIGINAL_TYPE (decl)) +{ + if (explain) +inform (loc, "%qD is an alias of TU-local type %qT", decl, orig); + return is_tu_local_entity (TYPE_NAME (orig), explain); +} + else +return false; +} /* Check specializations first for slightly better explanations. */ int use_tpl = -1; Thoughts? > PR c++/118799 > > gcc/cp/ChangeLog: > > * module.cc (depset::hash::is_tu_local_entity): Only types, > functions, variables, and template (specialisations) can be > TU-local. > (module_state::write_namespaces): Allow unnamed namespaces in > named modules. > (check_module_decl_linkage): Error for all exported declarations > in an unnamed namespace. > > gcc/testsuite/ChangeLog: > > * g++.dg/modules/export-6.C: Adjust error message, add check for > no-linkage decls in namespace. > * g++.dg/modules/internal-4_b.C: Allow exposing a namespace with > internal linkage. > * g++.dg/modules/using-30_a.C: New test. > * g++.dg/modules/using-30_b.C: New test. > * g++.dg/modules/using-30_c.C: New test. > > Signed-off-by: Nathaniel Shead > --- > gcc/cp/module.cc| 35 - > gcc/testsuite/g++.dg/modules/export-6.C | 33 ++- > gcc/testsuite/g++.dg/modules/internal-4_b.C | 4 +-- > gcc/testsuite/g++.dg/modules/using-30_a.C | 9 ++ > gcc/testsuite/g++.dg/modules/using-30_b.C | 6 > gcc/testsuite/g++.dg/modules/using-30_c.C | 13 > 6 files changed, 77 insertions(+), 23 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/modules/using-30_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/using-30_b.C > create mode 100644 gcc/testsuite/g++.dg/modules/using-30_c.C > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc > index 59716e1873e..21251f98a35 100644 > --- a/gcc/cp/module.cc > +++ b/gcc/cp/module.cc > @@ -13332,6 +13332,14 @@ depset::hash::is_tu_loca
Re: [PATCH v2] RISC-V: Vector pesudoinsns with x0 operand to use imm 0
On 2/8/25 23:02, Jeff Law wrote: > On 2/7/25 9:34 PM, Vineet Gupta wrote: >> A couple of Vector pseudoinstructions use x0 scalar which being regfile >> crosser could be inefficient on certain wider uarches. >> >> Use the imm 0 form, which should be functionally equivalent. >> >> pseudoinsnorig insn with x0 this patch >> --- >> vneg.v vd,vs vrsub.vx vd,vs,x0 vrsub.vi vd,vs,0 >> vncvt.x.x.w vd,vs,vm vnsrl.wx vd,vs,x0,vm vnsrl.wi vd,vs,0,vm >> vwcvt.x.x.v vd,vs,vm vwadd.vx vd,vs,x0,vm (imm not supported) >> >> This passes my testsuite A/B run but obviously wait for the CI tester to >> give a green light. >> >> gcc/ChangeLog: >> * config/riscv/vector.md: vncvt substitute vnsrl. >> vnsrl with x0 replace with immediate 0. >> vneg substitute vrsub. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: >> Change >> expected pattern. >> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: >> Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: >> Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: >> Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto. >> * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto. >> * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto. >> * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto. >> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto >> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto. >> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto. >> * gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto. >> * gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto. >> * gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto. > LGTM. I think the only question is whether or not to make an exception > for this or not. We are in stage4 after all ;-) Figure we can make a > decision on the Tues call if you're available. I don't have a strong opinion either way, just wanted to get it out of my tree :-) Yeah sure, 9 PM IST is manageable. -Vineet
Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg
CCing Denis Am 08.02.25 um 23:51 schrieb Jeff Law: On 2/8/25 1:52 PM, Georg-Johann Lay wrote: Am 08.02.25 um 18:23 schrieb Jeff Law: On 2/8/25 3:04 AM, Georg-Johann Lay wrote: That test case from https://gcc.gnu.org/bugzilla/show_bug.cgi? id=116389#c7 still ICEs with that change in http://gcc.gnu.org/r15-7428 pr116389-red.c: In function 'func': pr116389-red.c:20:1: error: insn does not satisfy its constraints: 20 | } | ^ (insn 27 69 28 5 (set (mem/c:SI (plus:HI (reg/f:HI 28 r28) (const_int 2 [0x2])) [4 %sfp+2 S4 A8]) (reg:SI 30 r30)) "pr116389-red.c":16:19 146 {*movsi_split} (nil)) during RTL pass: postreload pr116389-red.c:20:1: internal compiler error: in extract_constrain_insn, at recog.cc:2783 Reason is that R30 is the last GPR and can hold HImode at most, but due to a paradoxical subreg, there is r30:SI. Bummer as that was the kind of scenario it's supposed to fix. What did that insn look like before IRA and for whatever pseudo was in that operand, what hard register did IRA think it should be allocated to? jeff The .ira dump has several paradoxical subregs like: (insn 22 21 60 4 (set (reg/v:SI 51 [ val32 ]) (subreg:SI (reg:HI 53 [ t$val ]) 0)) "pr116389-red.c":14:14 146 {*movsi_split} (insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ]) (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 146 {*movsi_split} (insn 35 34 36 7 (set (reg:HI 54 [ _7 ]) (ashift:HI (subreg:HI (reg/v:SI 51 [ val32 ]) 0) (const_int 1 [0x1]))) "pr116389-red.c":18:13 667 {ashlhi3} I don't know which one is causing the trouble. Maybe the attached dumps will help. I would suggest looking at the .IRA dump. You want to know what register was assigned to pseudo 52. I'm guessing it'll be r30, at which point you'll probably want to debug the same code I just changed to figure out why it's not working in the expected manner. jeff That subreg handling in ira-build.c triggers 3 times: create_insn_allocnos[func:ira(324)]: outer=(subreg:HI (reg/v:SI 51 [ val32 ]) 0) create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg/v:HI 52 [ diff ]) 0) create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg:HI 53 [ t$val ]) 0) Insn 27 in the .ira dump reads: (insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ]) (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 146 {*movsi_split} (expr_list:REG_DEAD (reg/v:HI 52 [ diff ]) As far as I can see, IRA doesn't allocate a pseudo but expects a spill for r52: Allocno a3r52 of GENERAL_REGS(14) has 2 avail. regs 18-19, node: 18-19 obj 0 (confl regs = 0-17 20-36), obj 1 (confl regs = 0-17 20-36) Pushing a3(r52,l0)(potential spill: pri=12000, cost=12000) Popping a4(r53,l0) -- assign reg 18 Popping a3(r52,l0) -- spill Popping a0(r54,l0) -- assign reg 24 Disposition: 1:r51 l0 mem3:r52 l0 mem4:r53 l0180:r54 l0 24 6:r55 l0208:r56 l0247:r57 l0245:r58 l0 24 2:r59 l024 And spill code generation is up to reload? .reload reads: insn=27, live_throughout: 32, dead_or_set: 51, 52 Spilling for insn 27. Using reg 20 for reload 1 Reloads for insn # 27 Reload 0: reload_out (SI) = (reg/v:SI 51 [ val32 ]) NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional reload_out_reg: (reg/v:SI 51 [ val32 ]) Reload 1: reload_in (HI) = (reg/v:HI 52 [ diff ]) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1) reload_in_reg: (reg/v:HI 52 [ diff ]) reload_reg_rtx: (reg:HI 30 r30) (insn 69 26 27 5 (set (reg:HI 30 r30) (mem/c:HI (plus:HI (reg/f:HI 28 r28) (const_int 2 [0x2])) [4 %sfp+2 S2 A8])) "pr116389-red.c":16:19 128 {*movhi_split} (nil)) (insn 27 69 28 5 (set (mem/c:SI (plus:HI (reg/f:HI 28 r28) (const_int 2 [0x2])) [4 %sfp+2 S4 A8]) (reg:SI 30 r30)) "pr116389-red.c":16:19 146 {*movsi_split} (nil)) So it loads HI:30 from fp+2 and then pushes it as SI:30 to fp+2. Insn 69 is generated by reload. It seems reload is generating these insns to do a paradoxical subreg in memory since it cannot be done in registers? Johann
Re: [PATCH] i386: Change RTL representation of bt[lq] [PR118623]
On Sat, Feb 8, 2025 at 9:40 AM Jakub Jelinek wrote: > > Hi! > > The following testcase is miscompiled because of RTL represententation > of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it > actually does. > Let's look e.g. at > (define_insn_and_split "*jcc_bt" > [(set (pc) > (if_then_else (match_operator 0 "bt_comparison_operator" > [(zero_extract:SWI48 >(match_operand:SWI48 1 "nonimmediate_operand") >(const_int 1) >(match_operand:QI 2 "nonmemory_operand")) > (const_int 0)]) > (label_ref (match_operand 3)) > (pc))) >(clobber (reg:CC FLAGS_REG))] > "(TARGET_USE_BT || optimize_function_for_size_p (cfun)) >&& (CONST_INT_P (operands[2]) >? (INTVAL (operands[2]) < GET_MODE_BITSIZE (mode) > && INTVAL (operands[2]) >>= (optimize_function_for_size_p (cfun) ? 8 : 32)) >: !memory_operand (operands[1], mode)) >&& ix86_pre_reload_split ()" > "#" > "&& 1" > [(set (reg:CCC FLAGS_REG) > (compare:CCC > (zero_extract:SWI48 > (match_dup 1) > (const_int 1) > (match_dup 2)) > (const_int 0))) >(set (pc) > (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)]) > (label_ref (match_dup 3)) > (pc)))] > { > operands[0] = shallow_copy_rtx (operands[0]); > PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0]))); > }) > The define_insn part in RTL describes exactly what it does, > jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ). > The problem is with what it splits into. > put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU, > nc for NE and GEU and ICEs otherwise. > CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb, > in those cases e.g. for add we have > (set (reg:CCC flags) (compare:CCC (plus:M x y) x)) > and use (ltu (reg:CCC flags) (const_int 0)) for carry set and > (geu (reg:CCC flags) (const_int 0)) for carry not set. These cases > model in RTL what is actually happening, compare in infinite precision > x from the result of finite precision addition in M mode and if it is > less than unsigned (i.e. overflow happened), carry is set. > Another use of CCCmode is in UNSPEC_* patterns, those are used with > (eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset, > given the UNSPEC no big deal, the middle-end doesn't know what means > set or unset. > But for the bt{l,q}; j{c,nc} case the above splits it into > (set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0))) > for bt and > (set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc))) > for the bit set case (so that the jump expands to jc) and ne for > the bit not set case (so that the jump expands to jnc). > Similarly for the different splitters for cmov and set{c,nc} etc. > The problem is that when the middle-end reads this RTL, it feels > the exact opposite to it. If zero_extract is 1, flags is set > to comparison of 1 and 0 and that would mean using ne ne in the > if_then_else, and vice versa. > > So, in order to better describe in RTL what is actually happening, > one possibility would be to swap the behavior of put_condition_code > and use NE + LTU -> c and EQ + GEU -> nc rather than the current > EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The > following patch uses a more limited approach, instead of representing > bt{l,q}; j{c,nc} case as written above it uses > (set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract))) > and > (set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc))) > which uses the existing put_condition_code but describes what the > insns actually do in RTL clearly. If zero_extract is 1, > then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU, > 0U >= 0U. The patch adjusts the *bt define_insn and all the > splitters to it and its comparisons/conditional moves/setXX. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2025-02-08 Jakub Jelinek > > PR target/118623 > * config/i386/i386.md (*bt): Represent bt as > compare:CCC of const0_rtx and zero_extract rather than > zero_extract and const0_rtx. > (*bt_mask): Likewise. > (*jcc_bt): Likewise. Use LTU and GEU as flags test > instead of EQ and NE. > (*jcc_bt_mask): Likewise. > (*jcc_bt_mask_1): Likewise. > (Help combine recognize bt followed by cmov splitter): Likewise. > (*bt_setcqi): Likewise. > (*bt_setncqi): Likewise. > (*bt_setnc): Likewise. > (*bt_setncqi_2): Likewise. > (*bt_setc_mask): Likewise. > > * gcc.c-torture/execute/pr118623.c: New test. OK, but please note that gene
Re: [Patch, Fortran] Fix PR 24878, checking actual arguments against global symbols
Am 08.02.25 um 22:46 schrieb Harald Anlauf: looks good, just two minor comments: + actual_name = act_sym->name ? act_sym->name : act_sym->name; Why not just + actual_name = act_sym->name; ? That was a leftover from a previous commit. + gfc_error ("Type mismatch passing global function %qs " + "declared at %L at %L (%s/%s)", actual_name, + &gsym->where, &actual->where, + gfc_typename (&global_asym->ts), + gfc_dummy_typename (&formal->ts)); These result in lines exceeding column 80. Fixed. I am also not a native speaker, but "at %L at %L" sounds strange to me. Could you find a minor rewording? I tried, but I could not find anything better... So, if anybody can think of a more clever wording, the patch for this is pre-approved :-) Committed as r15-7449, thanks for the review! Best regards Thomas
[committed][PR middle-end/117263] Avoid unused-but-set warning in genautomata
This is a trivial bug where a user wanted to define NDEBUG when building genautomata, presumably trying to debug its behavior. This resulted in a unused-but-set warning which caused the build to fail. Dario included the trivial fixes in the PR which I put through the usual bootstrap & regression test as well as compiling genautomata with NDEBUG. Pushing to the trunk. I'm not addressing whether or not NDEBUG is still useful. That would be Vlad's call. Jeff commit b81bb3ed216213fdaba82addae9fc34619ad6ec7 Author: Dario Gjorgjevski Date: Sun Feb 9 09:16:31 2025 -0700 [PR middle-end/117263] Avoid unused-but-set warning in genautomata This is a trivial bug where a user wanted to define NDEBUG when building genautomata, presumably trying to debug its behavior. This resulted in a unused-but-set warning which caused the build to fail. Dario included the trivial fixes in the PR which I put through the usual bootstrap & regression test as well as compiling genautomata with NDEBUG. Pushing to the trunk. PR middle-end/117263 gcc/ * genautomata.cc (output_statistics): Avoid set but unnused warnings when compiling with NDEBUG. diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc index 69f856d141c..4059a229f96 100644 --- a/gcc/genautomata.cc +++ b/gcc/genautomata.cc @@ -9088,8 +9088,8 @@ static void output_statistics (FILE *f) { automaton_t automaton; - int states_num; #ifndef NDEBUG + int states_num; int transition_comb_vect_els = 0; int transition_full_vect_els = 0; int min_issue_delay_vect_els = 0; @@ -9106,13 +9106,17 @@ output_statistics (FILE *f) automaton->NDFA_states_num, automaton->NDFA_arcs_num); fprintf (f, "%5d DFA states, %5d DFA arcs\n", automaton->DFA_states_num, automaton->DFA_arcs_num); +#ifndef NDEBUG states_num = automaton->DFA_states_num; +#endif if (!no_minimization_flag) { fprintf (f, "%5d minimal DFA states, %5d minimal DFA arcs\n", automaton->minimal_DFA_states_num, automaton->minimal_DFA_arcs_num); +#ifndef NDEBUG states_num = automaton->minimal_DFA_states_num; +#endif } fprintf (f, "%5d all insns %5d insn equivalence classes\n", description->insns_num, automaton->insn_equiv_classes_num);
Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg
On 2/9/25 1:10 AM, Georg-Johann Lay wrote: The .ira dump has several paradoxical subregs like: (insn 22 21 60 4 (set (reg/v:SI 51 [ val32 ]) (subreg:SI (reg:HI 53 [ t$val ]) 0)) "pr116389-red.c":14:14 146 {*movsi_split} (insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ]) (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 146 {*movsi_split} (insn 35 34 36 7 (set (reg:HI 54 [ _7 ]) (ashift:HI (subreg:HI (reg/v:SI 51 [ val32 ]) 0) (const_int 1 [0x1]))) "pr116389-red.c":18:13 667 {ashlhi3} I don't know which one is causing the trouble. Maybe the attached dumps will help. I would suggest looking at the .IRA dump. You want to know what register was assigned to pseudo 52. I'm guessing it'll be r30, at which point you'll probably want to debug the same code I just changed to figure out why it's not working in the expected manner. jeff That subreg handling in ira-build.c triggers 3 times: create_insn_allocnos[func:ira(324)]: outer=(subreg:HI (reg/v:SI 51 [ val32 ]) 0) create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg/v:HI 52 [ diff ]) 0) create_insn_allocnos[func:ira(324)]: outer=(subreg:SI (reg:HI 53 [ t$val ]) 0) Insn 27 in the .ira dump reads: (insn 27 26 28 5 (set (reg/v:SI 51 [ val32 ]) (subreg:SI (reg/v:HI 52 [ diff ]) 0)) "pr116389-red.c":16:19 146 {*movsi_split} (expr_list:REG_DEAD (reg/v:HI 52 [ diff ]) As far as I can see, IRA doesn't allocate a pseudo but expects a spill for r52: Allocno a3r52 of GENERAL_REGS(14) has 2 avail. regs 18-19, node: 18-19 obj 0 (confl regs = 0-17 20-36), obj 1 (confl regs = 0-17 20-36) Pushing a3(r52,l0)(potential spill: pri=12000, cost=12000) Popping a4(r53,l0) -- assign reg 18 Popping a3(r52,l0) -- spill Popping a0(r54,l0) -- assign reg 24 Disposition: 1:r51 l0 mem 3:r52 l0 mem 4:r53 l0 18 0:r54 l0 24 6:r55 l0 20 8:r56 l0 24 7:r57 l0 24 5:r58 l0 24 2:r59 l0 24 And spill code generation is up to reload? .reload reads: Yes. What this all says is that IRA is doing something sensible. So the problem is reload or something in the avr port. insn=27, live_throughout: 32, dead_or_set: 51, 52 Spilling for insn 27. Using reg 20 for reload 1 Reloads for insn # 27 Reload 0: reload_out (SI) = (reg/v:SI 51 [ val32 ]) NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional reload_out_reg: (reg/v:SI 51 [ val32 ]) Reload 1: reload_in (HI) = (reg/v:HI 52 [ diff ]) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1) reload_in_reg: (reg/v:HI 52 [ diff ]) reload_reg_rtx: (reg:HI 30 r30) (insn 69 26 27 5 (set (reg:HI 30 r30) (mem/c:HI (plus:HI (reg/f:HI 28 r28) (const_int 2 [0x2])) [4 %sfp+2 S2 A8])) "pr116389- red.c":16:19 128 {*movhi_split} (nil)) (insn 27 69 28 5 (set (mem/c:SI (plus:HI (reg/f:HI 28 r28) (const_int 2 [0x2])) [4 %sfp+2 S4 A8]) (reg:SI 30 r30)) "pr116389-red.c":16:19 146 {*movsi_split} (nil)) So it loads HI:30 from fp+2 and then pushes it as SI:30 to fp+2. Insn 69 is generated by reload. It seems reload is generating these insns to do a paradoxical subreg in memory since it cannot be done in registers? My working memory of reload is diminishing by the day, especially anything related to subregs. I wouldn't necessarily make the assumption that it cannot be done, it's more likely there just aren't enough registers available at the right points. So the object gets forced into memory irrespective of whether or not a paradoxical subreg is involved. But that's speculation -- subreg handling in reload is nontrivial and there may be places where it just gives up a generates reloads. If someone wanted to chase down reload's behavior, I'd start with the selection of r30 as the reload register. It makes sense in HImode, but not when there's a outer paradoxical subreg at the use point. jeff Johann
[committed][RISC-V][PR target/115123] Fix testsuite fallout from sinking heuristic change
Code sinking is just semantic preserving code motions, so it's a lot like scheduling in that code motions can change the vector configuration needed at various program points. That in turn can also change the number of vsetvls as we may or may not be able to merge them after the code motions. The sinking heuristics were twiddled several months ago resulting in a handful of scan-asm failures. This patch adjusts the tests appropriately fixing pr115123 (P3 regression). Pushing to the trunk. jeff commit 22e30d60b971eed9a4754ea920d05b1b7e89090a Author: Jeff Law Date: Sun Feb 9 09:55:56 2025 -0700 [PR target/115123] Fix testsuite fallout from sinking heuristic change Code sinking is just semantic preserving code motions, so it's a lot like scheduling in that code motions can change the vector configuration needed at various program points. That in turn can also change the number of vsetvls as we may or may not be able to merge them after the code motions. The sinking heuristics were twiddled several months ago resulting in a handful of scan-asm failures. This patch adjusts the tests appropriately fixing pr115123 (P3 regression). PR target/115123 gcc/testsuite * gcc.target/riscv/rvv/base/pr114352-3.c: Adjust expected output. * gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-91.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-92.c: Likewise. diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c index a764afbbbc1..9bfa39c5268 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c @@ -4,6 +4,7 @@ /* ** test_1: +** ... ** sext\.w\s+[atx][0-9]+,\s*[atx][0-9]+ ** ... */ @@ -56,6 +57,7 @@ test_3 (int *a, int *b, int *out, unsigned count) /* ** test_4: +** ... ** sext\.w\s+[atx][0-9]+,\s*[atx][0-9]+ ** ... */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c index 21bc0729cf6..cdb1a4ee921 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c @@ -37,4 +37,4 @@ void f (void * restrict in, void * restrict out, int l, int n, int m, int cond) } } -/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+.L[0-9]+:\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c index 5539486b506..c7c9e1f91ff 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c @@ -36,4 +36,4 @@ void f (void * restrict in, void * restrict out, int l, int n, int m, int cond) } } -/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+ble\s+[a-x0-9]+,\s*zero,\.L[0-9]+\s+.L[0-9]+:\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c index 6e995461c6f..c174845f7bc 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-66.c @@ -17,5 +17,4 @@ void f2 (void * restrict in, void * restrict out, int l, int n, int m, size_t vl } } -/* { dg-final {