[r15-3052 Regression] FAIL: gfortran.dg/sizeof_6.f90 -O1 execution test on Linux/x86_64
On Linux/x86_64,

c7b76a076cb2c6ded7ae208464019b04cb0531a2 is the first bad commit
commit c7b76a076cb2c6ded7ae208464019b04cb0531a2
Author: Andrew Pinski
Date: Mon Aug 19 08:06:36 2024 -0700

    match: Reject non-ssa name/min invariants in gimple_extract [PR116412]

caused

FAIL: gfortran.dg/sizeof_6.f90 -O1 execution test

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3052/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/sizeof_6.f90 --target_board='unix{-m64}'"

(Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the command line may help.)
(However, please make sure that there are no potential problems with AVX512.)
[PATCH v3] RISC-V: Enable -gvariable-location-views by default
This affects only the RISC-V targets, where the compiler options -gvariable-location-views and consequently also -ginline-points are disabled by default, which is unexpected and disables some useful features of the generated debug info.

Due to a bug in the gas assembler, the .loc statement is not usable to generate location view debug info. That is detected by configure:

configure:31500: checking assembler for dwarf2 debug_view support
configure:31509: .../riscv-unknown-elf/bin/as -o conftest.o conftest.s >&5
conftest.s: Assembler messages:
conftest.s:5: Error: .uleb128 only supports constant or subtract expressions
conftest.s:6: Error: .uleb128 only supports constant or subtract expressions
configure:31512: $? = 1
configure: failed program was
	.file 1 "conftest.s"
	.loc 1 3 0 view .LVU1
	nop
	.data
	.uleb128 .LVU1
	.uleb128 .LVU1
configure:31523: result: no

This results in dwarf2out_as_locview_support being set to false, which triggers a sequence of events whose end result is that most inlined functions either have no DW_AT_entry_pc, or one with a wrong entry pc value.

But the location views can also be generated without using any .loc statements, therefore we should enable the option -gvariable-location-views by default, regardless of the status of -gas-locview-support.

Note, however, that the combination of the following compiler options

-g -O2 -gvariable-location-views -gno-as-loc-support

turned out to create invalid assembler intermediate files, with lots of assembler errors like:

Error: leb128 operand is an undefined symbol: .LVU3

This affects all targets, except RISC-V of course ;-) and is fixed by the changes in dwarf2out.cc.

Finally, the .debug_loclists created without assembler locview support did contain location view pairs like v000 v000 which is the value from FORCE_RESET_NEXT_VIEW, but that is most likely not as expected either, so change that as well.
gcc/ChangeLog:

	* dwarf2out.cc (dwarf2out_maybe_output_loclist_view_pair,
	output_loc_list): Correct handling of -gno-as-loc-support,
	use ZERO_VIEW_P to output view number as zero value.
	* toplev.cc (process_options): Do not automatically disable
	-gvariable-location-views when -gno-as-loc-support or
	-gno-as-locview-support is used.

gcc/testsuite/ChangeLog:

	* gcc.dg/debug/dwarf2/inline2.c: Add checks for inline entry_pc.
	* gcc.dg/debug/dwarf2/inline6.c: Add -gno-as-loc-support and check
	the resulting location views.
---
 gcc/dwarf2out.cc| 16 ++--
 gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c | 3 +++
 gcc/testsuite/gcc.dg/debug/dwarf2/inline6.c | 7 ++-
 gcc/toplev.cc | 4 +---
 4 files changed, 20 insertions(+), 10 deletions(-)

v2: fixed the bootstrap error that v1 triggered on every target except RISC-V. It came from libstdc++-v3/src/c++11/cxx11-ios_failure-lt.s, which is generated using -gno-as-loc-support and exposed a latent issue.
v3: added some tests for the fixed bugs.

Regression-tested on x86_64-pc-linux-gnu, riscv-unknown-elf and riscv64-unknown-elf, OK for trunk?

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc index 346feeb53c8..79f97b5a55e 100644 --- a/gcc/dwarf2out.cc +++ b/gcc/dwarf2out.cc @@ -10374,7 +10374,7 @@ dwarf2out_maybe_output_loclist_view_pair (dw_loc_list_ref curr) #ifdef DW_LLE_view_pair dw2_asm_output_data (1, DW_LLE_view_pair, "DW_LLE_view_pair"); - if (dwarf2out_as_locview_support) + if (dwarf2out_as_locview_support && dwarf2out_as_loc_support) { if (ZERO_VIEW_P (curr->vbegin)) dw2_asm_output_data_uleb128 (0, "Location view begin"); @@ -10396,8 +10396,10 @@ dwarf2out_maybe_output_loclist_view_pair (dw_loc_list_ref curr) } else { - dw2_asm_output_data_uleb128 (curr->vbegin, "Location view begin"); - dw2_asm_output_data_uleb128 (curr->vend, "Location view end"); + dw2_asm_output_data_uleb128 (ZERO_VIEW_P (curr->vbegin) + ? 0 : curr->vbegin, "Location view begin"); + dw2_asm_output_data_uleb128 (ZERO_VIEW_P (curr->vend) + ?
0 : curr->vend, "Location view end"); } #endif /* DW_LLE_view_pair */ @@ -10430,7 +10432,7 @@ output_loc_list (dw_loc_list_ref list_head) vcount++; /* ?? dwarf_split_debug_info? */ - if (dwarf2out_as_locview_support) + if (dwarf2out_as_locview_support && dwarf2out_as_loc_support) { char label[MAX_ARTIFICIAL_LABEL_BYTES]; @@ -10460,10 +10462,12 @@ output_loc_list (dw_loc_list_ref list_head) } else { - dw2_asm_output_data_uleb128 (curr->vbegin, + dw2_asm_output_data_uleb128 (ZERO_VIEW_P (curr->vbegin) + ? 0 : curr->vbegin,
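To make the fallback path in this patch concrete: without assembler locview support, each view number is written out as a literal ULEB128 value, so any "zero view" marker has to be mapped to 0 before encoding, otherwise the sentinel itself ends up in .debug_loclists. The sketch below is a simplified model under stated assumptions: `kForcedResetView` is a hypothetical stand-in for whatever FORCE_RESET_NEXT_VIEW produces, and `zero_view_p` only approximates ZERO_VIEW_P; neither is GCC's actual definition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for a "forced reset" view; GCC's real value comes
// from FORCE_RESET_NEXT_VIEW in dwarf2out.cc and may differ.
constexpr uint32_t kForcedResetView = UINT32_MAX;

// Simplified stand-in for ZERO_VIEW_P: 0 and the sentinel both mean view 0.
constexpr bool zero_view_p(uint32_t v) {
  return v == 0 || v == kForcedResetView;
}

// Append V as unsigned LEB128: 7 payload bits per byte, least significant
// group first, continuation bit (0x80) set on every byte but the last.
void append_uleb128(std::vector<uint8_t> &out, uint64_t v) {
  do {
    uint8_t byte = static_cast<uint8_t>(v & 0x7f);
    v >>= 7;
    if (v != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (v != 0);
}

// Encode a (begin, end) location view pair as two ULEB128 numbers,
// emitting a literal 0 for any zero-view marker.
std::vector<uint8_t> encode_view_pair(uint32_t vbegin, uint32_t vend) {
  std::vector<uint8_t> out;
  append_uleb128(out, zero_view_p(vbegin) ? 0 : vbegin);
  append_uleb128(out, zero_view_p(vend) ? 0 : vend);
  return out;
}
```

For example, encode_view_pair(kForcedResetView, 3) yields the two bytes 0x00 0x03, whereas encoding the raw sentinel would take five bytes (0xff 0xff 0xff 0xff 0x0f).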
Re: [PATCH] Do not emit a redundant DW_TAG_lexical_block for inlined subroutines
On Tue, 20 Aug 2024, Bernd Edlinger wrote:

> On 8/20/24 13:00, Richard Biener wrote:
> > On Fri, Aug 16, 2024 at 12:49 PM Bernd Edlinger
> > wrote:
> >>
> >> While this already works correctly for the case when an inlined
> >> subroutine contains only one subrange, a redundant DW_TAG_lexical_block
> >> is still emitted when the subroutine has multiple blocks.
> >
> > Huh. The point is that the inline context is a single scope block with no
> > siblings - how did that get messed up? The patch unfortunately does not
> > contain a testcase.
> >
>
> Well, I became aware of this because I am working on a gdb patch,
> which improves the debug experience of optimized C code, and to my surprise
> the test case did not work with gcc-8, while gcc-9 and following were fine.
> Initially I did not see what is wrong, therefore I started to bisect when
> this changed, and so I found your patch, which removed some lexical blocks
> in the debug info of this gdb test case:
>
> from binutils-gdb/gdb/testsuite/gdb.cp/step-and-next-inline.cc
> in case you have the binutils-gdb already downloaded you can skip this:
> $ git clone git://sourceware.org/git/binutils-gdb
> $ cd binutils-gdb/gdb/testsuite/gdb.cp
> $ gcc -g -O2 step-and-next-inline.cc
>
> when you look at the debug info with readelf -w a.out
> you will see that the function "tree_check"
> is inlined three times, one looks like this
>  <2><86b>: Abbrev Number: 40 (DW_TAG_inlined_subroutine)
>     <86c>   DW_AT_abstract_origin: <0x95b>
>     <870>   DW_AT_entry_pc: 0x1175
>     <878>   DW_AT_GNU_entry_view: 0
>     <879>   DW_AT_ranges : 0x21
>     <87d>   DW_AT_call_file : 1
>     <87e>   DW_AT_call_line : 52
>     <87f>   DW_AT_call_column : 10
>     <880>   DW_AT_sibling : <0x8bf>
>  <3><884>: Abbrev Number: 8 (DW_TAG_formal_parameter)
>     <885>   DW_AT_abstract_origin: <0x974>
>     <889>   DW_AT_location: 0x37 (location list)
>     <88d>   DW_AT_GNU_locviews: 0x35
>  <3><891>: Abbrev Number: 8 (DW_TAG_formal_parameter)
>     <892>   DW_AT_abstract_origin: <0x96c>
>     <896>   DW_AT_location: 0x47 (location list)
>     <89a>   DW_AT_GNU_locviews: 0x45
>  <3><89e>: Abbrev Number: 41 (DW_TAG_lexical_block)
>     <89f>   DW_AT_ranges : 0x21
>
> see the lexical block has the same DW_AT_ranges as the
> inlined subroutine, but the other invocations do not
> have this lexical block, since your original fix removed
> those.
> And this lexical block triggered an unexpected issue
> in my gdb patch; I owe you one for helping me
> find it :-)
>
> Before that I had never looked at these lexical blocks,
> but all I can say is that while compiling this test case,
> in the first invocation of gen_inlined_subroutine_die
> there are several SUBBLOCKS linked via BLOCK_CHAIN,
> and only the first one is used to emit the lexical_block,
> while the other siblings must be fully decoded, otherwise
> there is an internal error, which I found by trial and error.
> I thought that was because the subroutine is split over several
> places, and therefore it appeared natural to me that the
> subroutine is also using several SUBBLOCKS.

OK, so the case in question looks like

{ Scope block #8 step-and-next-inline.cc:52 Originating from : static struct tree * tree_check (struct tree *, int); Fragment chain : #16 #17
  struct tree * t;
  int i;
  { Scope block #9 Originating from :#0 Fragment chain : #10 #11
    struct tree * x;
  }
  { Scope block #10 Originating from :#0 Fragment of : #9
    struct tree * x;
  }
  { Scope block #11 Originating from :#0 Fragment of : #9
    struct tree * x;
  }
}

so we have fragments here which we should ignore, but then fragments are to collect multiple ranges which, when we do not emit a lexical block for block #9 above, we will likely fail to emit and which we instead should associate with block #8, the DW_TAG_inlined_subroutine. Somehow it seems to "work" as to associate DW_AT_ranges with the DW_TAG_inlined_subroutine.

I've used the following - there's no need to process BLOCK_CHAIN as fragments are ignored by gen_block_die.
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc index d5144714c6e..4e6ad2ab7e1 100644 --- a/gcc/dwarf2out.cc +++ b/gcc/dwarf2out.cc @@ -25194,8 +25194,13 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die) Do that by doing the recursion to subblocks on the single subblock of STMT. */ bool unwrap_one = false; - if (BLOCK_SUBBLOCKS (stmt) && !BLOCK_CHAIN (BLOCK_SUBBLOCKS (stmt))) + if (BLOCK_SUBBLOCKS (stmt)) { + tree subblock = BLOCK_SUBBLOCKS (stmt); + /* We should never elide that BLOCK, but we may have multiple fragments. +Assert that there's only a single real inline-scope block. */ + for (tree next = BLOCK_CHAIN (subblock); next; next = BLOCK_CHAIN (next)) + gcc_checking_assert (BLOCK_FRAGMENT_ORIGIN (next) == subblock); tree origin = block_ultimate_origin (BLOCK_SUBBLOCKS (stmt)); if
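The invariant asserted in the diff above (every sibling after the first subblock is a fragment of that first subblock) can be modelled outside GCC with a toy block structure. This is a sketch only: `Block` and `single_real_subblock` are hypothetical, they mirror BLOCK_SUBBLOCKS, BLOCK_CHAIN and BLOCK_FRAGMENT_ORIGIN loosely, and where the patch uses gcc_checking_assert the sketch simply returns nullptr.

```cpp
#include <cassert>

// Toy model of a scope BLOCK with only the links this check needs.
// Hypothetical struct; GCC uses tree accessors such as BLOCK_SUBBLOCKS,
// BLOCK_CHAIN and BLOCK_FRAGMENT_ORIGIN instead.
struct Block {
  const Block *subblocks = nullptr;        // first child scope
  const Block *chain = nullptr;            // next sibling
  const Block *fragment_origin = nullptr;  // set only for fragments
};

// Return the single "real" scope block under STMT, or nullptr when STMT
// has no children or when some sibling is not a fragment of the first one.
const Block *single_real_subblock(const Block &stmt) {
  const Block *first = stmt.subblocks;
  if (!first)
    return nullptr;
  for (const Block *next = first->chain; next; next = next->chain)
    if (next->fragment_origin != first)
      return nullptr;
  return first;
}
```

Wiring up blocks #8 to #11 from the dump above this way (with #10 and #11 as fragments of #9), block #9 comes back as the single real scope block.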
[PATCH] rs6000: allow split vsx_stxvd2x4_le_const after RA[pr116030]
Hi,

Previously, vsx_stxvd2x4_le_const_ was introduced for the 'split1' pass, so it is guarded by "can_create_pseudo_p ()". However, the pattern of this insn can also be matched during/after RA, so this patch updates the insn to make it work for the split pass after RA as well.

Bootstrap&regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu) Guo

	PR target/116030

gcc/ChangeLog:

	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): Allow insn
	after RA.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr116030.c: New test.

---
 gcc/config/rs6000/vsx.md| 9 +
 gcc/testsuite/gcc.target/powerpc/pr116030.c | 17 +
 2 files changed, 22 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116030.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 27069d070e1..2dd87b7a9db 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -3454,12 +3454,12 @@ (define_insn "*vsx_stxvd2x4_le_" (define_insn_and_split "vsx_stxvd2x4_le_const_" [(set (match_operand:VSX_W 0 "memory_operand" "=Z") - (match_operand:VSX_W 1 "immediate_operand" "W"))] + (match_operand:VSX_W 1 "immediate_operand" "W")) + (clobber (match_scratch:VSX_W 2 "=&wa"))] "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode) && !TARGET_P9_VECTOR - && const_vec_duplicate_p (operands[1]) - && can_create_pseudo_p ()" + && const_vec_duplicate_p (operands[1])" "#" "&& 1" [(set (match_dup 2) @@ -3472,7 +3472,8 @@ (define_insn_and_split "vsx_stxvd2x4_le_const_" { /* Here all the constants must be loaded without memory.
*/ gcc_assert (easy_altivec_constant (operands[1], mode)); - operands[2] = gen_reg_rtx (mode); + if (GET_CODE(operands[2]) == SCRATCH) +operands[2] = gen_reg_rtx (mode); } [(set_attr "type" "vecstore") (set_attr "length" "8")]) diff --git a/gcc/testsuite/gcc.target/powerpc/pr116030.c b/gcc/testsuite/gcc.target/powerpc/pr116030.c new file mode 100644 index 000..ada0a4fd2b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr116030.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-mdejagnu-cpu=power8 -Os -fno-forward-propagate -ftrivial-auto-var-init=zero -save-temps" } */ + +/* Verify we do not ICE on the tests below. */ +union U128 +{ + _Decimal128 d; + unsigned long long int u[2]; +}; + +union U128 +foo () +{ + volatile union U128 u128; + u128.d = 0.99e+39DL; + return u128; +} -- 2.25.1
Re: [PATCH 1/2] SVE intrinsics: Fold constant operands for svdiv
> On 19 Aug 2024, at 17:03, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Hi Richard,
>>
>>> On 19 Aug 2024, at 14:57, Richard Sandiford
>>> wrote:
>>>
>>> Jennifer Schmitz writes:

This patch implements constant folding for svdiv. A new gimple_folder method was added that uses const_binop to fold binary operations using a given tree_code. For svdiv, this method is used to fold constant operands. Additionally, if at least one of the operands is a zero vector, svdiv is folded to a zero vector (in case of ptrue, _x, or _z). Tests were added to check the produced assembly for different predicates and signed and unsigned integers. Currently, constant folding is only implemented for integers and binary operations, but extending it to float types and other operations is planned for a future follow-up.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline?

Signed-off-by: Jennifer Schmitz

gcc/

	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
	Add constant folding.
	* config/aarch64/aarch64-sve-builtins.cc (gimple_folder::const_fold):
	New method.
	* config/aarch64/aarch64-sve-builtins.h (gimple_folder::const_fold):
	Add function declaration.

gcc/testsuite/

	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
	* gcc.target/aarch64/sve/const_fold_div_zero.c: Likewise.

From 79355d876503558f661b46ebbeaa11c74ce176cb Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz
Date: Thu, 15 Aug 2024 05:42:06 -0700
Subject: [PATCH 1/2] SVE intrinsics: Fold constant operands for svdiv

---
 .../aarch64/aarch64-sve-builtins-base.cc | 30 ++-
 gcc/config/aarch64/aarch64-sve-builtins.cc| 25 +++
 gcc/config/aarch64/aarch64-sve-builtins.h | 1 +
 .../gcc.target/aarch64/sve/const_fold_div_1.c | 128
 .../aarch64/sve/const_fold_div_zero.c | 186 ++
 .../aarch64/sve/const_fold_mul_zero.c | 95 +
 6 files changed, 462 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_zero.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_zero.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index d55bee0b72f..7f948ecc0c7 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -755,8 +755,32 @@ public: gimple * fold (gimple_folder &f) const override { -tree divisor = gimple_call_arg (f.call, 2); -tree divisor_cst = uniform_integer_cst_p (divisor); +tree pg = gimple_call_arg (f.call, 0); +tree op1 =
gimple_call_arg (f.call, 1); +tree op2 = gimple_call_arg (f.call, 2); + +/* For integer division, if the dividend or divisor are all zeros, + fold to zero vector. */ +int step = f.type_suffix (0).element_bytes; +if (f.pred != PRED_m || is_ptrue (pg, step)) + { + if (vector_cst_all_same (op1, step) + && integer_zerop (VECTOR_CST_E
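The folding rule described in this patch can be sketched as element-wise evaluation over constant vectors. This is a minimal model, not the gimple_folder API: it assumes 32-bit signed lanes and Arm integer-division semantics (a zero divisor yields 0, which is consistent with the patch folding an all-zero divisor to a zero vector), `fold_svdiv` and `Pred` are hypothetical names, and _x predication is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Pred { z, m };  // zeroing (_z) vs merging (_m) predication

// One lane of signed division with Arm SDIV semantics: x / 0 == 0 and
// INT32_MIN / -1 wraps to INT32_MIN (both are UB in plain C++, so they
// are handled explicitly here).
int32_t sdiv_lane(int32_t a, int32_t b) {
  if (b == 0)
    return 0;
  if (a == INT32_MIN && b == -1)
    return INT32_MIN;
  return a / b;  // C++ truncates toward zero, like SDIV
}

// Fold svdiv on constant vectors OP1 / OP2 under the predicate PG.
// Inactive lanes keep OP1 for _m and become 0 for _z.
std::vector<int32_t> fold_svdiv(const std::vector<bool> &pg,
                                const std::vector<int32_t> &op1,
                                const std::vector<int32_t> &op2, Pred pred) {
  std::vector<int32_t> out(op1.size());
  for (std::size_t i = 0; i < op1.size(); ++i)
    out[i] = pg[i] ? sdiv_lane(op1[i], op2[i])
                   : (pred == Pred::m ? op1[i] : 0);
  return out;
}
```

With an all-true predicate, {6, -7, 5, 0} / {3, 2, 0, 4} folds to {2, -3, 0, 0}; an all-zero divisor folds to a zero vector, matching the special case the patch adds.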
Re: [PATCH] rs6000: allow split vsx_stxvd2x4_le_const after RA[pr116030]
Jiufu Guo writes: > Hi, > > Previous, vsx_stxvd2x4_le_const_ is introduced for 'split1' pass, > so it is guarded by "can_create_pseudo_p ()". > While, it would be possible to match the pattern of this insn during/after > RA, so this insn could be updated to make it work for split pass after RA. > > Bootstrap&regtest pass on ppc64{,le}. > Is this ok for trunk? > > BR, > Jeff (Jiufu) Guo > > > PR target/116030 > > gcc/ChangeLog: > > * config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): Allow insn > after RA. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr116030.c: New test. > > --- > gcc/config/rs6000/vsx.md| 9 + > gcc/testsuite/gcc.target/powerpc/pr116030.c | 17 + > 2 files changed, 22 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116030.c > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 27069d070e1..2dd87b7a9db 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -3454,12 +3454,12 @@ (define_insn "*vsx_stxvd2x4_le_" > > (define_insn_and_split "vsx_stxvd2x4_le_const_" >[(set (match_operand:VSX_W 0 "memory_operand" "=Z") > - (match_operand:VSX_W 1 "immediate_operand" "W"))] > + (match_operand:VSX_W 1 "immediate_operand" "W")) > + (clobber (match_scratch:VSX_W 2 "=&wa"))] >"!BYTES_BIG_ENDIAN > && VECTOR_MEM_VSX_P (mode) > && !TARGET_P9_VECTOR > - && const_vec_duplicate_p (operands[1]) > - && can_create_pseudo_p ()" > + && const_vec_duplicate_p (operands[1])" >"#" >"&& 1" >[(set (match_dup 2) > @@ -3472,7 +3472,8 @@ (define_insn_and_split "vsx_stxvd2x4_le_const_" > { >/* Here all the constants must be loaded without memory.
*/ >gcc_assert (easy_altivec_constant (operands[1], mode)); > - operands[2] = gen_reg_rtx (mode); > + if (GET_CODE(operands[2]) == SCRATCH) > +operands[2] = gen_reg_rtx (mode); > } >[(set_attr "type" "vecstore") > (set_attr "length" "8")]) > diff --git a/gcc/testsuite/gcc.target/powerpc/pr116030.c > b/gcc/testsuite/gcc.target/powerpc/pr116030.c > new file mode 100644 > index 000..ada0a4fd2b1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr116030.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mdejagnu-cpu=power8 -Os -fno-forward-propagate > -ftrivial-auto-var-init=zero -save-temps" } */ Is -save-temps needed here? > + > +/* Verify we do not ICE on the tests below. */ > +union U128 > +{ > + _Decimal128 d; > + unsigned long long int u[2]; > +}; > + > +union U128 > +foo () > +{ > + volatile union U128 u128; > + u128.d = 0.99e+39DL; > + return u128; > +} signature.asc Description: PGP signature
Re: [Ping x2 , Fortran, Patch, PR77518, (coarray), v4] Fix ICE in sizeof(coarray)
Hi Jerry, thank you for the review. Committed as gcc-15-3062-g515730fd65a Thanks again, Andre On Tue, 20 Aug 2024 09:16:50 -0700 Jerry D wrote: > On 8/20/24 5:35 AM, Andre Vehreschild wrote: > > Hi all, > > > > pinging this patch. > > > > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? > > > > Regards, > > Andre > > > > Your approach looks reasonable so I think OK to push. > > Thanks, > > Jerry > -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] rs6000: allow split vsx_stxvd2x4_le_const after RA[pr116030]
Hi, Sam James writes: > Jiufu Guo writes: > >> Hi, >> >> Previous, vsx_stxvd2x4_le_const_ is introduced for 'split1' pass, >> so it is guarded by "can_create_pseudo_p ()". >> While, it would be possible to match the pattern of this insn during/after >> RA, so this insn could be updated to make it work for split pass after RA. >> >> Bootstrap&regtest pass on ppc64{,le}. >> Is this ok for trunk? >> >> BR, >> Jeff (Jiufu) Guo >> >> >> PR target/116030 >> >> gcc/ChangeLog: >> >> * config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): Allow insn >> after RA. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr116030.c: New test. >> >> --- >> gcc/config/rs6000/vsx.md| 9 + >> gcc/testsuite/gcc.target/powerpc/pr116030.c | 17 + >> 2 files changed, 22 insertions(+), 4 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116030.c >> >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >> index 27069d070e1..2dd87b7a9db 100644 >> --- a/gcc/config/rs6000/vsx.md >> +++ b/gcc/config/rs6000/vsx.md >> @@ -3454,12 +3454,12 @@ (define_insn "*vsx_stxvd2x4_le_" >> >> (define_insn_and_split "vsx_stxvd2x4_le_const_" >>[(set (match_operand:VSX_W 0 "memory_operand" "=Z") >> -(match_operand:VSX_W 1 "immediate_operand" "W"))] >> +(match_operand:VSX_W 1 "immediate_operand" "W")) >> + (clobber (match_scratch:VSX_W 2 "=&wa"))] >>"!BYTES_BIG_ENDIAN >> && VECTOR_MEM_VSX_P (mode) >> && !TARGET_P9_VECTOR >> - && const_vec_duplicate_p (operands[1]) >> - && can_create_pseudo_p ()" >> + && const_vec_duplicate_p (operands[1])" >>"#" >>"&& 1" >>[(set (match_dup 2) >> @@ -3472,7 +3472,8 @@ (define_insn_and_split "vsx_stxvd2x4_le_const_" >> { >>/* Here all the constants must be loaded without memory.
*/ >>gcc_assert (easy_altivec_constant (operands[1], mode)); >> - operands[2] = gen_reg_rtx (mode); >> + if (GET_CODE(operands[2]) == SCRATCH) >> +operands[2] = gen_reg_rtx (mode); >> } >>[(set_attr "type" "vecstore") >> (set_attr "length" "8")]) >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr116030.c >> b/gcc/testsuite/gcc.target/powerpc/pr116030.c >> new file mode 100644 >> index 000..ada0a4fd2b1 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr116030.c >> @@ -0,0 +1,17 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mdejagnu-cpu=power8 -Os -fno-forward-propagate >> -ftrivial-auto-var-init=zero -save-temps" } */ > > Is -save-temps needed here? Thanks for catching this! We could remove this option for this patch. BR, Jeff (Jiufu) Guo. > >> + >> +/* Verify we do not ICE on the tests below. */ >> +union U128 >> +{ >> + _Decimal128 d; >> + unsigned long long int u[2]; >> +}; >> + >> +union U128 >> +foo () >> +{ >> + volatile union U128 u128; >> + u128.d = 0.99e+39DL; >> + return u128; >> +}
Re: [RFC] Generalize formation of lane-reducing ops in loop reduction
>> >>
>> >> 1. Background
>> >>
>> >> For loop reduction of accumulating result of a widening operation, the
>> >> preferred pattern is lane-reducing operation, if supported by target.
>> >> Because this kind of operation need not preserve intermediate results of
>> >> widening operation, and only produces a reduced number of final results
>> >> for accumulation, choosing the pattern could lead to pretty compact codegen.
>> >>
>> >> Three lane-reducing opcodes are defined in gcc, belonging to two kinds of
>> >> operations: dot-product (DOT_PROD_EXPR) and sum-of-absolute-difference
>> >> (SAD_EXPR). WIDEN_SUM_EXPR could be seen as a degenerate dot-product with
>> >> a constant operand as "1". Currently, gcc only supports recognition of the
>> >> simple lane-reducing case, in which each accumulation statement of loop
>> >> reduction forms one pattern:
>> >>
>> >>  char *d0, *d1;
>> >>  short *s0, *s1;
>> >>
>> >>  for (i) {
>> >>    sum += d0[i] * d1[i];      // = DOT_PROD <char>
>> >>    sum += abs(s0[i] - s1[i]); // = SAD
>> >>  }
>> >>
>> >> We could rewrite the example as the below using only one statement, whose
>> >> non-reduction addend is the sum of the above right-side parts. As a whole,
>> >> the addend would match nothing, while its two sub-expressions could be
>> >> recognized as corresponding lane-reducing patterns.
>> >>
>> >>  for (i) {
>> >>    sum += d0[i] * d1[i] + abs(s0[i] - s1[i]);
>> >>  }
>> >
>> > Note we try to recognize the original form as SLP reduction (which of
>> > course fails).
>> >
>> >> This case might be too elaborately crafted to be very common in reality.
>> >> Still, we do find superficially different but essentially similar code
>> >> patterns in some AI applications, which use matrix-vector operations
>> >> extensively, where some usages are just a single loop reduction composed
>> >> of multiple dot-products. A code snippet from ggml:
>> >>
>> >>  for (int j = 0; j < qk/2; ++j) {
>> >>    const uint8_t xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
>> >>    const uint8_t xh_1 = ((qh >> (j + 12)) ) & 0x10;
>> >>
>> >>    const int32_t x0 = (x[i].qs[j] & 0xF) | xh_0;
>> >>    const int32_t x1 = (x[i].qs[j] >> 4) | xh_1;
>> >>
>> >>    sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
>> >>  }
>> >>
>> >> At the source level, it appears to be a natural and minor scaling-up of
>> >> the simple one-lane-reducing pattern, but it is beyond the capability of
>> >> the current vectorization pattern recognition, and needs some kind of
>> >> generic extension to the framework.
>>
>> Sorry for the late response.
>>
>> > So this is about re-associating lane-reducing ops to alternative
>> > lane-reducing ops to save repeated accumulation steps?
>>
>> You mean re-associating slp-based lane-reducing ops to loop-based?
>
> Yes.
>
>> > The thing is that IMO pattern recognition as we do now is limiting and
>> > should eventually move to the SLP side where we should be able to more
>> > freely "undo" and associate.
>>
>> Whether pattern recognition is done prior to or within SLP, the essential
>> thing is that we need to figure out which op is qualified for lane-reducing
>> by some means.
>>
>> For example, when seeing a mult in a loop with vectorization-favored shape,
>>  ...
>>  t = a * b; // char a, b
>>  ...
>>
>> we could not say it is decidedly applicable for reduced computation via
>> dot-product even if the corresponding target ISA is available.
>
> True. Note there's a PR which shows SLP lane-reducing written out like
>
>   a[i] = b[4*i] * 3 + b[4*i+1] * 3 + b[4*i+2] * 3 + b[4*i+3] * 3;
>
> which we cannot change to a DOT_PROD because we do not know which
> lanes are reduced. My point was there are non-reduction cases where knowing
> which actual lanes get reduced would help. For reductions it's not important
> and associating in a way to expose more possible (reduction) lane reductions
> is almost always going to be a win.
>
>> Recognition of normal patterns merely involves local statement-based
>> matching, while for lane-reducing, the validity check requires global
>> loop-wise analysis on the structure of the reduction, probably not the same
>> as, but close to what is proposed in the RFC. The basic logic, IMHO, is
>> independent of where pattern recognition is implemented. As a matter of
>> fact, this is not about "associating", but "tagging" (marking all
>> lane-reducing quantifiable statements). After that process, the
>> "re-associator" could play its role to guide selection of either
>> loop-based or slp-based lane-reducing op.
>>
>> > I've searched twice now, a few days ago I read that the optabs not
>> > specifying which lanes are combined/reduced is a limitation. Yes, it is -
>> > I hope we can rectify this, so if this is motivation enough we should
>> > split the optabs up into even/odd/hi/lo (or whatever else interesting
>> > targets actually do).
>>
>> Actually, how lanes are combined/reduced does
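The lane-reducing idea discussed in this thread can be illustrated with a scalar model of a DOT_PROD-style step: each accumulator lane absorbs several narrow products per step, and the lanes are summed once after the loop. The grouping of four consecutive bytes per lane is an assumption (the thread notes that the optabs do not specify which lanes are combined), and all function names are hypothetical. The second function models the ggml-style statement, with its two products re-associated into two lane-reducing steps that feed the same accumulator.

```cpp
#include <cassert>
#include <cstdint>

// One DOT_PROD-style step over 16 signed bytes starting at index I:
// accumulator lane L absorbs the 4 products at I + 4*L .. I + 4*L + 3.
// (Which products land in which lane is an assumption; targets differ.)
void dot_prod_step(int32_t acc[4], const int8_t *a, const int8_t *b, int i) {
  for (int lane = 0; lane < 4; ++lane)
    for (int k = 0; k < 4; ++k) {
      int j = i + lane * 4 + k;
      acc[lane] += a[j] * b[j];
    }
}

// Scalar reference: sum += x0[j] * y0[j] + x1[j] * y1[j].
int32_t two_dots_scalar(const int8_t *x0, const int8_t *y0,
                        const int8_t *x1, const int8_t *y1, int n) {
  int32_t sum = 0;
  for (int j = 0; j < n; ++j)
    sum += x0[j] * y0[j] + x1[j] * y1[j];
  return sum;
}

// Same reduction, re-associated into two lane-reducing steps per iteration
// that share one 4-lane accumulator; N must be a multiple of 16.
int32_t two_dots_lane_reduced(const int8_t *x0, const int8_t *y0,
                              const int8_t *x1, const int8_t *y1, int n) {
  int32_t acc[4] = {0, 0, 0, 0};
  for (int i = 0; i < n; i += 16) {
    dot_prod_step(acc, x0, y0, i);
    dot_prod_step(acc, x1, y1, i);
  }
  // Final horizontal reduction of the accumulator lanes.
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because the final result only depends on the total, either lane assignment gives the same answer for a reduction, which is the point made above about reductions versus the non-reduction SLP case.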
Re: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
On Wed, 21 Aug 2024, Robin Dapp wrote:

> > > > When predicating a load we implicitly assume that the else value is
> > > > zero. In order to formalize this, this patch queries the target for
> > > > its supported else operand and uses that for the maskload call.
> > > > Subsequently, if the else operand is nonzero, a cond_expr enforcing
> > > > a zero else value is emitted.
> >
> > Why? I don't think the vectorizer relies on a particular else
> > value? I'd say it would be appropriate for if-conversion to
> > use "ANY" and for the vectorizer to then pick a supported
> > version and/or enforce the else value it needs via a blend?
>
> In PR115336 we have something like
>
>   _Bool iftmp.0_113;
>   _Bool iftmp.0_114;
>   iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D));
>   iftmp.0_114 = _47 | iftmp.0_113;
>
> which assumes zeroing.

I see - is that some trick ifcvt performs? I can't immediately see the connection to the PR and it only contains RISC-V assembly analysis.

> In order to circumvent that we could use COND_IOR
> but I suppose that wouldn't be optimized away even on targets that zero
> masked elements? "ANY" would seem to be wrong here.

What I was trying to say is that of course any transform we perform that requires zero-masking should either make .MASK_LOAD perform that or add a COND_EXPR. But it shouldn't be required to make all .MASK_LOADs be zero-masked, no?

I'm of course fine if you think that's the best way for RISC-V given other targets are likely unaffected as they can perform zero-masking.

> So instead, right now the flow is to emit a COND_EXPR after the MASK_LOAD
> here if the target does not zero and have the vectorizer vectorize it into
> a blend (or something else if the surrounding code allows).
>
> What I didn't do (in the posted version, just locally) is an explicit
> VEC_COND_EXPR after each masked (gather/load lanes) call the vectorizer
> does. Do we need that? AFAICT loop masking (be it len style or
> fully-masked style) should be safe.
Well, why should we need that? There seems to be the assumption that .MASK_LOAD is zero-masked in very few places (PR115336, but not identified there); if we assumed this everywhere, wouldn't there be way more issues with RISC-V? For example, when we do loop masking I think we elide .COND_op for "safe" operations. But even that doesn't assume zero-masking.

Richard.
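The else-value question in this thread can be shown with a scalar model of a masked load: an OR that consumes the loaded value (as in the PR115336 snippet) is only correct if inactive lanes read as zero, and a blend, i.e. the COND_EXPR discussed above, restores that when the target leaves something else there. `mask_load`, `blend_zero` and `or_lanes` are hypothetical helpers, and lanes hold int 0/1 to stand in for _Bool; this is an illustration, not how the vectorizer represents these operations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;  // one int (0 or 1) per _Bool lane

// Masked load: active lanes read SRC; inactive lanes get ELSE_VALUE, which
// models a target that does not zero masked-off elements.
Lanes mask_load(const Lanes &src, const Lanes &mask, int else_value) {
  Lanes out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = mask[i] ? src[i] : else_value;
  return out;
}

// Blend (COND_EXPR) forcing inactive lanes to zero: mask ? v : 0.
Lanes blend_zero(const Lanes &v, const Lanes &mask) {
  Lanes out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = mask[i] ? v[i] : 0;
  return out;
}

// Lane-wise model of iftmp.0_114 = _47 | iftmp.0_113.
Lanes or_lanes(const Lanes &a, const Lanes &b) {
  Lanes out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = a[i] | b[i];
  return out;
}
```

With a nonzero else value, the OR picks up garbage from masked-off lanes; inserting blend_zero between the load and the OR gives the same result as a zero-masking load, which is the trade-off being weighed for targets that cannot zero-mask.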
[pushed] c++, coroutines: Check for malformed functions before splitting.
tested on x86_64-darwin, powerpc64-linux and against cppcoro and folly coroutines tests, pushed to trunk as obvious, thanks, Iain --- 8< --- This performs the same basic check that is done by finish_function to catch cases where the function is so badly malformed that we do not have a consistent binding level. gcc/cp/ChangeLog: * coroutines.cc (split_coroutine_body_from_ramp): Check that the binding level is as expected before attempting to outline the function body. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index f7791cbfb9a..7af2a188561 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -4553,10 +4553,16 @@ coro_rewrite_function_body (location_t fn_start, tree fnbody, tree orig, static tree split_coroutine_body_from_ramp (tree fndecl) { - tree body; + /* Sanity-check and punt if we have a nonsense tree because of earlier + parse errors, perhaps. */ + if (!current_binding_level + || current_binding_level->kind != sk_function_parms) +return NULL_TREE; + /* Once we've tied off the original user-authored body in fn_body. Start the replacement synthesized ramp body. */ + tree body; if (use_eh_spec_block (fndecl)) { body = pop_stmt_list (TREE_OPERAND (current_eh_spec_block, 0)); -- 2.39.2 (Apple Git-143)
[PATCH] c++, coroutines: Tidy up awaiter variable checks.
Tested on x86_64-darwin, powerpc64le-linux, and against cppcoro and folly coroutines testsuites, OK for trunk? thanks Iain --- 8< --- When we build an await expression, we might need to materialise the awaiter if it is a prvalue. This re-implements this using core APIs instead of local code. gcc/cp/ChangeLog: * coroutines.cc (build_co_await): Simplify checks for the cases that we need to materialise an awaiter. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 59 +--- 1 file changed, 11 insertions(+), 48 deletions(-) diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 3356e7f5b24..1f1ea5c2fe4 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -1149,55 +1149,18 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind, if (!awrs_meth || awrs_meth == error_mark_node) return error_mark_node; - /* To complete the lookups, we need an instance of 'e' which is built from - 'o' according to [expr.await] 3.4. - - If we need to materialize this as a temporary, then that will have to be - 'promoted' to a coroutine frame var. However, if the awaitable is a - user variable, parameter or comes from a scope outside this function, - then we must use it directly - or we will see unnecessary copies. - - If o is a variable, find the underlying var. */ - tree e_proxy = STRIP_NOPS (o); - if (INDIRECT_REF_P (e_proxy)) -e_proxy = TREE_OPERAND (e_proxy, 0); - while (TREE_CODE (e_proxy) == COMPONENT_REF) -{ - e_proxy = TREE_OPERAND (e_proxy, 0); - if (INDIRECT_REF_P (e_proxy)) - e_proxy = TREE_OPERAND (e_proxy, 0); - if (TREE_CODE (e_proxy) == CALL_EXPR) - { - /* We could have operator-> here too. 
*/ - tree op = TREE_OPERAND (CALL_EXPR_FN (e_proxy), 0); - if (DECL_OVERLOADED_OPERATOR_P (op) - && DECL_OVERLOADED_OPERATOR_IS (op, COMPONENT_REF)) - { - e_proxy = CALL_EXPR_ARG (e_proxy, 0); - STRIP_NOPS (e_proxy); - gcc_checking_assert (TREE_CODE (e_proxy) == ADDR_EXPR); - e_proxy = TREE_OPERAND (e_proxy, 0); - } - } - STRIP_NOPS (e_proxy); -} - - /* Only build a temporary if we need it. */ - STRIP_NOPS (e_proxy); - if (TREE_CODE (e_proxy) == PARM_DECL - || (VAR_P (e_proxy) && !is_local_temp (e_proxy))) + /* [expr.await]/3.3 If o would be a prvalue, the temporary + materialization conversion ([conv.rval]) is applied. */ + if (!glvalue_p (o) && !xvalue_p (o)) +o = build_target_expr_with_type (o, TREE_TYPE (o), tf_warning_or_error); + + tree e_proxy = o; + if (glvalue_p (o)) +o = NULL_TREE; /* Use the existing entity. */ + else /* We need to materialise it. */ { - e_proxy = o; - o = NULL_TREE; /* The var is already present. */ -} - else -{ - tree p_type = o_type; - if (glvalue_p (o)) - p_type = cp_build_reference_type (p_type, !lvalue_p (o)); - e_proxy = get_awaitable_var (suspend_kind, p_type); - o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o, - tf_warning_or_error); + e_proxy = get_awaitable_var (suspend_kind, o_type); + o = cp_build_init_expr (loc, e_proxy, convert_from_reference (o)); e_proxy = convert_from_reference (e_proxy); } -- 2.39.2 (Apple Git-143)
Re: [PATCH 0/7] v3 of libdiagnostics
On Wed, Aug 21, 2024 at 2:01 AM David Malcolm wrote: > > On Tue, 2024-08-20 at 11:49 +0200, Richard Biener wrote: > > On Thu, Aug 15, 2024 at 8:13 PM David Malcolm > > wrote: > > > > > > Here's v3 of my patch kit for "libdiagnostics", which makes GCC's > > > diagnostics subsystem available as a shared library; see: > > > https://gcc.gnu.org/wiki/libdiagnostics > > > > So this is to make use of this from gas? > > One of the clients of the library would be gas, yes (although > optionally, to avoid complicating bootstrap of binutils). > > However other clients would be possible, including those that are not > under the "GNU toolchain" umbrella (provided that they can be linked > against GPLv3 code). > > For example, I've been experimenting with Python bindings, which would > allow Python scripts to reuse GCC's diagnostics code (e.g. for SARIF, > fix-it hints, etc). I've also been experimenting with IDE integration > (see PR 115970), and it would be nice if users of libdiagnostics got > that "for free". > > Another example of client code is the sarif-replay tool in patch 7, as > a sarif consumer (if nothing else, writing this has exposed various > bugs in our existing SARIF-writing code). The analyzer's integration > test suite generates 10s of thousands of .sarif files, so having > tooling to work with them is "scratching my own itch". > > > Is the plan to move > > sources (and dependences) to the toplevel then, possibly > > building a static convenience lib for GCCs use? > > The problem with moving things from "gcc" to, say, a new subdirectory > of the top level source directory is that our "gcc" subdirectory has a > lot of support code that would also need refactoring/moving. Off the > top of my head: > - selftest framework > - DejaGnu .exp stuff below gcc/testsuite/lib > - C++11 support, such as our "make-unique.h" > - libcpp: the big one: diagnostics uses libcpp > - probably some configure/Makefile.in entanglements > > Fixing the above would be a major task. 
Yeah - I was wondering about dependences ... > So the patch kit punts on this by adding/moving stuff within "gcc" (and > requiring an opt-in via --enable-libdiagnostics). OK, fair. > > > > Note I'm missing documentation (which is probably there > > in the libdiagnostics.h header); an addition to sourcebuild.texi > > might be nice at least and documenting --enable-libdiagnostics > > in install.texi. > > The libdiagnostics.h header has comments, but, yes, I should probably > add docs similar to that of libgccjit.h (tutorial and API reference). > I'll do that for the next iteration of the patch. > > Patch 2 of the kit documents --enable-libdiagnostics in install.texi. > I'll add notes to sourcebuild.texi in the next iteration of the patch. > > > Thanks; hope the above makes sense. Sure. Note I'm not super-happy with adding maintenance burden on the GCC side for external users when the component is so deeply intertwined with GCC internals. But I won't object ;) Richard. > Dave > > > > > > New in v3: > > > * it bootstraps and passes regression tests > > > * I added an opt-in configure flag: --enable-libdiagnostics, which > > > must be enabled to build it (along with --enable-host-shared) > > > * a new "sarif-replay" command-line tool that takes .sarif files > > > and replays the diagnostics within them as if they were GCC > > > diagnostics, in GCC's textual format (i.e. GCC as a SARIF > > > *consumer*, > > > as well as producer). This is implemented on top of > > > libdiagnostics > > > hence I've been "eating my own dogfood" > > > * support for execution paths in libdiagnostics API > > > * lots of fixes > > > > > > Patch 1 has libdiagnostics.h, the public header file > > > Patch 2 has the implementation > > > Patch 3 has the C++ wrapper API I added in v2 > > > Patch 4 has a refactoring of gcc-dg.exp I needed for patch 5.
> > > Patch 5 has the testsuite for libdiagnostics itself > > > Patch 6 implements JSON parsing support > > > Patch 7 implements the sarif-replay command-line tool, and its > > > testsuite, exercising various valid, invalid, and malformed > > > input files. > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, > > > both with and without --enable-libdiagnostics. > > > With --enable-libdiagnostics the patch kit has this effect: > > > > > > # of .sum files: 20->22 (+2) > > > FAIL: 110 > > > PASS: 617481->617679 (+198) 100.03% > > > XFAIL: 4512 > > > XPASS: 13 > > > UNTESTED: 136 > > > UNSUPPORTED: 8058 > > > > > > where the two new .sum files are: > > > > > > BUILD/gcc/testsuite/libdiagnostics/libdiagnostics.sum: > > > PASS: 132 tests > > > > > > BUILD/gcc/testsuite/sarif-replay/sarif-replay.sum: > > > PASS: 66 tests > > > > > > OK for trunk? > > > > > > David Malcolm (7): > > > libdiagnostics v3: header > > > libdiagnostics v3: implementation > > > libdiagnostics v3: add C++ wrapper API > > > testsuite: move dg-test cleanup code from gcc-dg.exp to its own >
[PATCH V2] add rlwinm pattern for DImode for constant building
Hi, the 'rlwinm' pattern is already widely used for SImode. Because this instruction can write the whole 64-bit register, some DImode constants can be built with 'lis/li+rlwinm'. To achieve this, a new 'rlwinm' pattern is added, and it is used when a constant can be built by 'lis/li; rlwinm'. Compared with the previous patch, https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649792.html this version adds the option '-mpowerpc64' so that the test cases can also be run with -m32. Bootstrap and regtest pass on ppc64{,le}. Is it ok for trunk? Jeff (Jiufu Guo). gcc/ChangeLog: * config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): Add new parameter. * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rlwinm): New function. (rs6000_emit_set_long_const): Generate 'lis/li+rlwinm'. (can_be_rotated_to_lowbits): Add new parameter. * config/rs6000/rs6000.md (rlwinm_di_mask): New pattern. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr93012.c: Update to match 'rlwinm'. * gcc.target/powerpc/rlwinm4di-1.c: New test. * gcc.target/powerpc/rlwinm4di-2.c: New test. * gcc.target/powerpc/rlwinm4di.c: New test. * gcc.target/powerpc/rlwinm4di.h: New test.
--- gcc/config/rs6000/rs6000-protos.h | 2 +- gcc/config/rs6000/rs6000.cc | 65 ++- gcc/config/rs6000/rs6000.md | 18 + gcc/testsuite/gcc.target/powerpc/pr93012.c| 2 +- .../gcc.target/powerpc/rlwinm4di-1.c | 25 +++ .../gcc.target/powerpc/rlwinm4di-2.c | 19 ++ gcc/testsuite/gcc.target/powerpc/rlwinm4di.c | 6 ++ gcc/testsuite/gcc.target/powerpc/rlwinm4di.h | 25 +++ 8 files changed, 158 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.h diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 09a57a806fa..10505a8061a 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -36,7 +36,7 @@ extern bool vspltisw_vupkhsw_constant_p (rtx, machine_mode, int * = nullptr); extern int vspltis_shifted (rtx); extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); extern bool macho_lo_sum_memory_operand (rtx, machine_mode); -extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *); +extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *, bool = false); extern bool can_be_rotated_to_positive_16bits (HOST_WIDE_INT); extern bool can_be_rotated_to_negative_15bits (HOST_WIDE_INT); extern int num_insns_constant (rtx, machine_mode); diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 6ba9df4f02e..853eaede673 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -10454,6 +10454,51 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask) return false; } +/* Check if value C can be generated by 2 instructions, one instruction + is li/lis, another instruction is rlwinm. 
*/ + +static bool +can_be_built_by_li_lis_and_rlwinm (HOST_WIDE_INT c, HOST_WIDE_INT *val, + int *shift, HOST_WIDE_INT *mask) +{ + unsigned HOST_WIDE_INT low = c & 0xULL; + unsigned HOST_WIDE_INT high = (c >> 32) & 0xULL; + unsigned HOST_WIDE_INT v; + + /* diff of high and low (high ^ low) should be the mask position. */ + unsigned HOST_WIDE_INT m = low ^ high; + int tz = ctz_hwi (m); + int lz = clz_hwi (m); + if (m != 0) +m = ((HOST_WIDE_INT_M1U >> (lz + tz)) << tz); + if (high != 0) +m = ~m; + v = high != 0 ? high : ((low | ~m) & 0x); + + if ((high != 0) && ((v & m) != low || lz < 33 || tz < 1)) +return false; + + /* rotl32 on positive/negative value of 'li' 15/16bits. */ + int n; + if (!can_be_rotated_to_lowbits (v, 15, &n, true) + && !can_be_rotated_to_lowbits ((~v) & 0xULL, 15, &n, true)) +{ + /* rotate32 from a negative value of 'lis'. */ + if (!can_be_rotated_to_lowbits (v & 0xULL, 16, &n, true)) + return false; + n += 16; +} + n = 32 - (n % 32); + n %= 32; + v = ((v >> n) | (v << (32 - n))) & 0x; + if (v & 0x8000ULL) +v |= HOST_WIDE_INT_M1U << 32; + *mask = m; + *val = v; + *shift = n; + return true; +} + /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode. Output insns to set DEST equal to the constant C as a series of lis, ori and shl instructions. If NUM_INSNS is not NULL, then @@ -10553,6 +10598,18 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, int *num_insns) return; } + HOST_WIDE_INT val; + if (can_be_built_by_li_lis_and_rlwinm (c, &val, &shift, &mask)) +{ +
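To see why `rlwinm` can produce DImode constants, here is a small C model of its 64-bit behavior as described in the Power ISA: the low word of RS is rotated left by SH and replicated into both halves of the result, which is then ANDed with MASK(MB+32, ME+32). When MB > ME the mask wraps around and covers the entire high word, so the rotated 32-bit value reappears in the upper half. This is an illustrative sketch only; `ppc_mask` and `rlwinm` are hypothetical helper names, not functions from the patch or from GCC.

```c
#include <assert.h>   /* assert() is used by the accompanying checks */
#include <stdint.h>

/* MASK(x, y) with Power ISA bit numbering: bit 0 is the MSB, bit 63 the
   LSB.  If x <= y the ones run from bit x to bit y; otherwise the mask
   wraps and covers bits x..63 plus bits 0..y.  */
static uint64_t ppc_mask(unsigned x, unsigned y)
{
  uint64_t from_x = ~0ULL >> x;        /* bits x..63 */
  uint64_t to_y = ~0ULL << (63 - y);   /* bits 0..y  */
  return x <= y ? (from_x & to_y) : (from_x | to_y);
}

/* rlwinm RA,RS,SH,MB,ME in 64-bit mode: rotate the low word of RS left
   by SH (ROTL32 replicates it into both halves), then AND with
   MASK(MB+32, ME+32).  */
static uint64_t rlwinm(uint64_t rs, unsigned sh, unsigned mb, unsigned me)
{
  uint32_t w = (uint32_t) rs;
  sh &= 31;
  uint32_t r32 = sh ? (w << sh) | (w >> (32 - sh)) : w;
  uint64_t r = ((uint64_t) r32 << 32) | r32;
  return r & ppc_mask(mb + 32, me + 32);
}
```

For example, under this model `li 9,-1` followed by `rlwinm 9,9,0,20,3` leaves 0xfffffffff0000fff in r9, and `li 9,0x7fff` followed by `rlwinm 9,9,0,17,16` leaves 0x00007fff00007fff, while a non-wrapping mask such as MB=16, ME=31 behaves like the familiar 32-bit form.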
Re: [PATCH v3 5/7] OpenMP: common C/C++ testcases for dispatch + adjust_args
Here is an updated version following Tobias's review on the ME patch. The differences compared to the previous version are: * updated DejaGnu patterns * added testcase dispatch-8.c -- PA commit 533f2693680f109837f03cda2e123b155bbb5c60 Author: Paul-Antoine Arras Date: Fri May 24 19:04:35 2024 +0200 OpenMP: common C/C++ testcases for dispatch + adjust_args gcc/testsuite/ChangeLog: * c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives. * c-c++-common/gomp/adjust-args-1.c: New test. * c-c++-common/gomp/adjust-args-2.c: New test. * c-c++-common/gomp/dispatch-1.c: New test. * c-c++-common/gomp/dispatch-2.c: New test. * c-c++-common/gomp/dispatch-3.c: New test. * c-c++-common/gomp/dispatch-4.c: New test. * c-c++-common/gomp/dispatch-5.c: New test. * c-c++-common/gomp/dispatch-6.c: New test. * c-c++-common/gomp/dispatch-7.c: New test. * c-c++-common/gomp/dispatch-8.c: New test. diff --git gcc/testsuite/c-c++-common/gomp/adjust-args-1.c gcc/testsuite/c-c++-common/gomp/adjust-args-1.c new file mode 100644 index 000..728abe62092 --- /dev/null +++ gcc/testsuite/c-c++-common/gomp/adjust-args-1.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-fdump-tree-gimple" } */ + +int f (int a, void *b, float c[2]); + +#pragma omp declare variant (f) match (construct={dispatch}) adjust_args (nothing: a) adjust_args (need_device_ptr: b, c) +int f0 (int a, void *b, float c[2]); +#pragma omp declare variant (f) match (construct={dispatch}) adjust_args (nothing: a) adjust_args (need_device_ptr: b) adjust_args (need_device_ptr: c) +int f1 (int a, void *b, float c[2]); + +int test () { + int a; + void *b; + float c[2]; + struct {int a;} s; + + s.a = f0 (a, b, c); + #pragma omp dispatch + s.a = f0 (a, b, c); + + f1 (a, b, c); + #pragma omp dispatch + s.a = f1 (a, b, c); + + return s.a; +} + +/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device \\(\\);" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */ diff --git gcc/testsuite/c-c++-common/gomp/adjust-args-2.c gcc/testsuite/c-c++-common/gomp/adjust-args-2.c new file mode 100644 index 000..d2a4a5f4ec4 --- /dev/null +++ gcc/testsuite/c-c++-common/gomp/adjust-args-2.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-fdump-tree-gimple" } */ + +int f (int a, void *b, float c[2]); + +#pragma omp declare variant (f) match (construct={dispatch}) adjust_args (nothing: a) adjust_args (need_device_ptr: b, c) +int f0 (int a, void *b, float c[2]); +#pragma omp declare variant (f) adjust_args (need_device_ptr: b, c) match (construct={dispatch}) adjust_args (nothing: a) +int f1 (int a, void *b, float c[2]); + +void test () { + int a; + void *b; + float c[2]; + + #pragma omp dispatch + f0 (a, b, c); + + #pragma omp dispatch device (-4852) + f0 (a, b, c); + + #pragma omp dispatch device (a + a) + f0 (a, b, c); +} + +/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device \\(\\);" 3 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(&c, -4852\\);" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(b, -4852\\);" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-not "#pragma omp dispatch device" "gimple" } } */ diff --git gcc/testsuite/c-c++-common/gomp/declare-variant-2.c gcc/testsuite/c-c++-common/gomp/declare-variant-2.c index 05e485ef6a8..50d9b2dcf4b 100644 --- gcc/testsuite/c-c++-common/gomp/declare-variant-2.c +++ gcc/testsuite/c-c++-common/gomp/declare-variant-2.c 
@@ -8,9 +8,9 @@ void f3 (void); void f4 (void); #pragma omp declare variant match(user={condition(0)}) /* { dg-error "expected '\\(' before 'match'" } */ void f5 (void); -#pragma omp declare variant (f1) /* { dg-error "expected 'match' before end of line" } */ +#pragma omp declare variant (f1) /* { dg-error "expected 'match' or 'adjust_args' before end of line" } */ void f6 (void); -#pragma omp declare variant (f1) simd /* { dg-error "expected 'match' before 'simd'" } */ +#pragma omp declare variant (f1) simd /* { dg-error "expected 'match' or 'adjust_args' before 'simd'" } */ void f7 (void); #pragma omp declare variant (f1) match /* { dg-error "expected '\\(' before end of line" } */ void f8 (void); diff --git gcc/testsuite/c-c++-common/gomp/dispatch-1.c gcc/testsuite/c-c++-common/gomp/dispatch-1.c new file m
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Tue, Aug 20, 2024 at 3:41 PM Qing Zhao wrote: > > > > > On Aug 20, 2024, at 05:58, Richard Biener > > wrote: > > > > On Tue, Aug 13, 2024 at 5:34 PM Qing Zhao wrote: > >> > >> With the addition of the 'counted_by' attribute and its wide roll-out > >> within the Linux kernel, a use case has been found that would be very > >> nice to have for object allocators: being able to set the counted_by > >> counter variable without knowing its name. > >> > >> For example, given: > >> > >> struct foo { > >>... > >>int counter; > >>... > >>struct bar array[] __attribute__((counted_by (counter))); > >> } *p; > >> > >> The existing Linux object allocators are roughly: > >> > >> #define alloc(P, FAM, COUNT) ({ \ > >>size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \ > >>kmalloc(__size, GFP); \ > >> }) > >> > >> Right now, any addition of a counted_by annotation must also > >> include an open-coded assignment of the counter variable after > >> the allocation: > >> > >> p = alloc(p, array, how_many); > >> p->counter = how_many; > >> > >> In order to avoid the tedious and error-prone work of manually adding > >> the open-coded counted-by initializations everywhere in the Linux > >> kernel, a new GCC builtin __builtin_get_counted_by will be very useful > >> to be added to help the adoption of the counted-by attribute. > >> > >> -- Built-in Function: TYPE __builtin_get_counted_by (PTR) > >> The built-in function '__builtin_get_counted_by' checks whether the > >> array object pointed to by the pointer PTR has another object > >> associated with it that represents the number of elements in the > >> array object through the 'counted_by' attribute (i.e. the > >> counted-by object). If so, returns a pointer to the corresponding > >> counted-by object. If such counted-by object does not exist, > >> returns a NULL pointer. > >> > >> This built-in function is only available in C for now. > >> > >> The argument PTR must be a pointer to an array.
The TYPE of the > >> returned value must be a pointer type pointing to the corresponding > >> type of the counted-by object or a pointer type pointing to the > >> SIZE_T in case of a NULL pointer being returned. > >> > >> With this new builtin, the central allocator could be updated to: > >> > >> #define alloc(P, FAM, COUNT) ({ \ > >>typeof(P) __p; \ > >>size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \ > >>__p = kmalloc(__size, GFP); \ > >>if (__builtin_get_counted_by (__p->FAM)) \ > >> *(__builtin_get_counted_by(__p->FAM)) = COUNT; \ > >>__p; \ > >> }) > >> > >> And then structs can gain the counted_by attribute without needing > >> additional open-coded counter assignments for each struct, and > >> unannotated structs could still use the same allocator. > > > > Did you consider a __builtin_set_counted_by (PTR, VALUE)? > > Yes, that’s the initial request from Kees. -) > > The title of PR116016 is: add __builtin_set_counted_by(P->FAM, COUNT) or > equivalent > > After extensive discussion (Martin Uecker raised the initial idea in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016#c24, more discussions > followed, till comments #31). we decided to provide > __builtin_get_counted_by(PTR) instead of __builtin_set_counted_by(PTR, VALUE) > due to the following two reasons: > > 1. __builtin_get_counted_by should be enough to provide the functionality, > and even simpler; > 2. More flexible to be used by the programmer to be able to both WRITE and > READ the counted-by field. > > > > > > > Note that __builtin_get_counted_by to me suggests it returns the > > value and not a pointer to the value. > > The syntax of __builtin_get_counted_by is: > > TYPE __builtin_get_counted_by (PTR) > > The returned value is: > > returns a pointer to the corresponding > counted-by object. If such counted-by object does not exist, > returns a NULL pointer. > > This built-in function is only available in C for now. > > The argument PTR must be a pointer to an array. 
The TYPE of the > returned value must be a pointer type pointing to the corresponding > type of the counted-by object or a pointer type pointing to the > SIZE_T in case of a NULL pointer being returned. > > > > A more proper language extension might involve a keyword > > like __real, so __counted_by X would produce an lvalue, selecting > > the counted-by member. > > Yes, if the returned value could be a LVALUE instead of a Pointer, that’s > even simpler and cleaner. > However, then as you mentioned below, another builtin > “__builtin_has_attribute(PTR, counted_by)” need > to be queried first to make sure the counted_by field exists. > > We have discussed this approach, and I preferred this approach too. > > However, the main reason we gave up on that direction is: > > There is NO __builtin_has_attribute (PTR, counted_by) been supported by > CLANG, and not sure how difficult for CLANG to add this new bu
Re: [PATCH] Do not emit a redundant DW_TAG_lexical_block for inlined subroutines
On Wed, 21 Aug 2024, Richard Biener wrote: > On Tue, 20 Aug 2024, Bernd Edlinger wrote: > > > On 8/20/24 13:00, Richard Biener wrote: > > > On Fri, Aug 16, 2024 at 12:49 PM Bernd Edlinger > > > wrote: > > >> > > >> While this already works correctly for the case when an inlined > > >> subroutine contains only one subrange, a redundant DW_TAG_lexical_block > > >> is still emitted when the subroutine has multiple blocks. > > > > > > Huh. The point is that the inline context is a single scope block with no > > > siblings - how did that get messed up? The patch unfortunately does not > > > contain a testcase. > > > > > > > Well, I became aware of this because I am working on a gdb patch, > > which improves the debug experience of optimized C code, and to my surprise > > the test case did not work with gcc-8, while gcc-9 and following were fine. > > Initially I did not see what is wrong, therefore I started to bisect when > > this changed, and so I found your patch, which removed some lexical blocks > > in the debug info of this gdb test case: > > > > from binutils-gdb/gdb/testsuite/gdb.cp/step-and-next-inline.cc > > in case you have the binutils-gdb already downloaded you can skip this: > > $ git clone git://sourceware.org/git/binutils-gdb > > $ cd binutils-gdb/gdb/testsuite/gdb.cp > > $ gcc -g -O2 step-and-next-inline.cc > > > > when you look at the debug info with readelf -w a.out > > you will see, that the function "tree_check" > > is inlined three times, one looks like this > > <2><86b>: Abbrev Number: 40 (DW_TAG_inlined_subroutine) > > <86c> DW_AT_abstract_origin: <0x95b> > > <870> DW_AT_entry_pc: 0x1175 > > <878> DW_AT_GNU_entry_view: 0 > > <879> DW_AT_ranges : 0x21 > > <87d> DW_AT_call_file : 1 > > <87e> DW_AT_call_line : 52 > > <87f> DW_AT_call_column : 10 > > <880> DW_AT_sibling : <0x8bf> > > <3><884>: Abbrev Number: 8 (DW_TAG_formal_parameter) > > <885> DW_AT_abstract_origin: <0x974> > > <889> DW_AT_location: 0x37 (location list) > > <88d> DW_AT_GNU_locviews: 
0x35 > > <3><891>: Abbrev Number: 8 (DW_TAG_formal_parameter) > > <892> DW_AT_abstract_origin: <0x96c> > > <896> DW_AT_location: 0x47 (location list) > > <89a> DW_AT_GNU_locviews: 0x45 > > <3><89e>: Abbrev Number: 41 (DW_TAG_lexical_block) > > <89f> DW_AT_ranges : 0x21 > > > > see the lexical block has the same DW_AT_ranges, as the > > inlined subroutine, but the other invocations do not > > have this lexical block, since your original fix removed > > those. > > And this lexical block triggered an unexpected issue > > in my gdb patch, which I owe you one, for helping me > > finding it :-) > > > > Before that I have never looked at these lexical blocks, > > but all I can say is that while compiling this test case, > > in the first invocation of gen_inlined_subroutine_die > > there are several SUBBLOCKS linked via BLOCK_CHAIN, > > and only the first one is used to emit the lexical_block, > > while the other siblings must be fully decoded, otherwise > > there is an internal error, that I found by try-and-error. > > I thought that is since the subroutine is split over several > > places, and therefore it appeared natural to me, that the > > subroutine is also using several SUBBLOCKS. > > OK, so the case in question looks like > > { Scope block #8 step-and-next-inline.cc:52 Originating from : static > struct tree * tree_check (struct tree *, int); Fragment chain : #16 #17 > struct tree * t; > int i; > > { Scope block #9 Originating from :#0 Fragment chain : #10 #11 > struct tree * x; > > } > > { Scope block #10 Originating from :#0 Fragment of : #9 > struct tree * x; > > } > > { Scope block #11 Originating from :#0 Fragment of : #9 > struct tree * x; > > } > > } > > so we have fragments here which we should ignore, but then fragments > are to collect multiple ranges which, when we do not emit a > lexical block for block #9 above, we will likely fail to emit and > which we instead should associate with block #8, the > DW_TAG_inlined_subroutine. 
> > Somehow it seems to "work" as to associate DW_AT_ranges with the > DW_TAG_inlined_subroutine. > > I've used the following - there's no need to process BLOCK_CHAIN > as fragments are ignored by gen_block_die. > > diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc > index d5144714c6e..4e6ad2ab7e1 100644 > --- a/gcc/dwarf2out.cc > +++ b/gcc/dwarf2out.cc > @@ -25194,8 +25194,13 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref > context_die) > Do that by doing the recursion to subblocks on the single subblock > of STMT. */ >bool unwrap_one = false; > - if (BLOCK_SUBBLOCKS (stmt) && !BLOCK_CHAIN (BLOCK_SUBBLOCKS (stmt))) > + if (BLOCK_SUBBLOCKS (stmt)) > { > + tree subblock = BLOCK_SUBBLOCKS (stmt); > + /* We should never elide that BLOCK, but we may have multiple > fragments. > +Assert t
Re: [PATCH v3 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args
Here is an updated version following Tobias's review on the ME patch. The main differences compared to the previous version are: * Added support for !$OMP END DISPATCH * Added testcase dispatch-9.f90 * Updated DejaGnu patterns -- PA commit d427c071326c7bf6ccf7ccbc06d6d1cbbb29e73a Author: Paul-Antoine Arras Date: Fri May 24 19:13:50 2024 +0200 OpenMP: Fortran front-end support for dispatch + adjust_args This patch adds support for the `dispatch` construct and the `adjust_args` clause to the Fortran front-end. Handling of `adjust_args` across translation units is missing due to PR115271. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_clauses): Handle novariants and nocontext clauses. (show_omp_node): Handle EXEC_OMP_DISPATCH. (show_code_node): Likewise. * frontend-passes.cc (gfc_code_walker): Handle novariants and nocontext. * gfortran.h (enum gfc_statement): Add ST_OMP_DISPATCH. (symbol_attribute): Add omp_declare_variant_need_device_ptr. (gfc_omp_clauses): Add novariants and nocontext. (gfc_omp_declare_variant): Add need_device_ptr_arg_list. (enum gfc_exec_op): Add EXEC_OMP_DISPATCH. * match.h (gfc_match_omp_dispatch): Declare. * openmp.cc (gfc_free_omp_clauses): Free novariants and nocontext clauses. (gfc_free_omp_declare_variant_list): Free need_device_ptr_arg_list namelist. (enum omp_mask2): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (gfc_match_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (OMP_DISPATCH_CLAUSES): Define. (gfc_match_omp_dispatch): New function. (gfc_match_omp_declare_variant): Parse adjust_args. (resolve_omp_clauses): Handle adjust_args, novariants and nocontext. Adjust handling of OMP_LIST_IS_DEVICE_PTR. (icode_code_error_callback): Handle EXEC_OMP_DISPATCH. (omp_code_to_statement): Likewise. (resolve_omp_dispatch): New function. (gfc_resolve_omp_directive): Handle EXEC_OMP_DISPATCH. * parse.cc (decode_omp_directive): Match dispatch. (next_statement): Handle ST_OMP_DISPATCH. (gfc_ascii_statement): Likewise. 
(parse_omp_dispatch): New function. (parse_executable): Handle ST_OMP_DISPATCH. * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_DISPATCH. * st.cc (gfc_free_statement): Likewise. * trans-decl.cc (create_function_arglist): Declare. (gfc_get_extern_function_decl): Call it. * trans-openmp.cc (gfc_trans_omp_clauses): Handle novariants and nocontext. (gfc_trans_omp_dispatch): New function. (gfc_trans_omp_directive): Handle EXEC_OMP_DISPATCH. (gfc_trans_omp_declare_variant): Handle adjust_args. * trans.cc (trans_code): Handle EXEC_OMP_DISPATCH. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/declare-variant-2.f90: Update dg-error. * gfortran.dg/gomp/declare-variant-21.f90: New test (xfail). * gfortran.dg/gomp/declare-variant-21-aux.f90: New test. * gfortran.dg/gomp/adjust-args-1.f90: New test. * gfortran.dg/gomp/adjust-args-2.f90: New test. * gfortran.dg/gomp/adjust-args-3.f90: New test. * gfortran.dg/gomp/adjust-args-4.f90: New test. * gfortran.dg/gomp/adjust-args-5.f90: New test. * gfortran.dg/gomp/dispatch-1.f90: New test. * gfortran.dg/gomp/dispatch-2.f90: New test. * gfortran.dg/gomp/dispatch-3.f90: New test. * gfortran.dg/gomp/dispatch-4.f90: New test. * gfortran.dg/gomp/dispatch-5.f90: New test. * gfortran.dg/gomp/dispatch-6.f90: New test. * gfortran.dg/gomp/dispatch-7.f90: New test. * gfortran.dg/gomp/dispatch-8.f90: New test. * gfortran.dg/gomp/dispatch-9.f90: New test.
diff --git gcc/fortran/dump-parse-tree.cc gcc/fortran/dump-parse-tree.cc index 80aa8ef84e7..a15a17c086c 100644 --- gcc/fortran/dump-parse-tree.cc +++ gcc/fortran/dump-parse-tree.cc @@ -2139,6 +2139,18 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) } fputc (')', dumpfile); } + if (omp_clauses->novariants) +{ + fputs (" NOVARIANTS(", dumpfile); + show_expr (omp_clauses->novariants); + fputc (')', dumpfile); +} + if (omp_clauses->nocontext) +{ + fputs (" NOCONTEXT(", dumpfile); + show_expr (omp_clauses->nocontext); + fputc (')', dumpfile); +} } /* Show a single OpenMP or OpenACC directive node and everything underneath it @@ -2176,6 +2188,9 @@ show_omp_node (int level, gfc_code *c) case EXEC_OMP_CANCEL: name = "CANCEL"; break; case EXEC_OMP_CANCELLA
[wwwdocs v2] gcc-15: Mention c++ header dependency changes () in porting_to.html
Hi, this is the second version of my patch. See version 1 here: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659584.html Changes made: - Removed plural when referring to the single changed header. Of the two versions of the text I considered, I chose the one with fewer changes, as Jonathan suggested. - Changed "in libstdc++" to "within libstdc++". Validated with the W3 Validator. Is this ok to be pushed? Cheers, Filip Kastl --- htdocs/gcc-15/changes.html| 3 +- htdocs/gcc-15/porting_to.html | 54 +++ 2 files changed, 55 insertions(+), 2 deletions(-) create mode 100644 htdocs/gcc-15/porting_to.html diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index fe7cf3c1..d0d6d147 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -17,9 +17,8 @@ This page is a "brief" summary of some of the huge number of improvements in GCC 15. - diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html new file mode 100644 index ..702cf507 --- /dev/null +++ b/htdocs/gcc-15/porting_to.html @@ -0,0 +1,54 @@ + + + + + +Porting to GCC 15 +https://gcc.gnu.org/gcc.css";> + + + +Porting to GCC 15 + + +The GCC 15 release series differs from previous GCC releases in +a number of ways. Some of these are a result +of bug fixing, and some old behaviors have been intentionally changed +to support new standards, or relaxed in standards-conforming ways to +facilitate compilation or run-time performance. + + + +Some of these changes are user visible and can cause grief when +porting to GCC 15. This document is an effort to identify common issues +and provide solutions. Let us know if you have suggestions for improvements! + + +Note: GCC 15 has not been released yet, so this document is +a work-in-progress. + + + +C++ language issues + +Header dependency changes +Some C++ Standard Library headers have been changed to no longer include +other headers that were being used internally by the library.
+As such, C++ programs that used standard library components without +including the right headers will no longer compile. + + +In particular, the following header is used less widely within libstdc++ and +may need to be included explicitly when compiling with GCC 15: + + ++ (for std::int8_t, std::int32_t etc.) + + + + + + + + -- 2.45.2
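Here is a minimal illustration of the porting note above. The fixed-width integer types such as std::int8_t and std::int32_t are declared in <cstdint>. Code that uses them should therefore include that header itself, rather than relying on some other standard header to pull it in transitively. The function parse_count is a made-up example and is not part of libstdc++.

```cpp
#include <cassert>
#include <cstdint>  // declares std::int8_t, std::int32_t, ...
#include <string>   // declares std::string and std::stoi

// Hypothetical example: include <cstdint> directly for std::int32_t instead
// of relying on <string> (or any other standard header) to provide it.
std::int32_t parse_count(const std::string& text)
{
  return static_cast<std::int32_t>(std::stoi(text));
}
```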
Re: [PATCH] Align ix86_{move_max,store_max} with vectorizer.
On Wed, Aug 21, 2024 at 7:40 AM liuhongt wrote: > > When none of mprefer-vector-width, avx256_optimal/avx128_optimal, > avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will > set ix86_{move_max,store_max} as max available vector length except > for AVX part. > > if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) > && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) > opts->x_ix86_move_max = PVW_AVX512; > else > opts->x_ix86_move_max = PVW_AVX128; > > So for -mavx2, vectorizer will choose 256-bit for vectorization, but > 128-bit is used for struct copy, there could be a potential STLF issue > due to this "misalign". > > The patch fixes that and improved 538.imagick_r by ~30% for -march=x86-64-v3 > -O2. > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Any comments? Should we look at the avx128_optimal tune and/or avx256_split_regs and avx256_optimal also for 512? Because IIRC the vectorizer's default looks at that as well (OTOH larger stores should be fine for STLF). > gcc/ChangeLog: > > * config/i386/i386-options.cc (ix86_option_override_internal): > set ix86_{move_max,store_max} to PVW_AVX256 when TARGET_AVX > instead of PVW_AVX128. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/pieces-memcpy-10.c: Add -mprefer-vector-width=128. > * gcc.target/i386/pieces-memcpy-6.c: Ditto. > * gcc.target/i386/pieces-memset-38.c: Ditto. > * gcc.target/i386/pieces-memset-40.c: Ditto. > * gcc.target/i386/pieces-memset-41.c: Ditto. > * gcc.target/i386/pieces-memset-42.c: Ditto. > * gcc.target/i386/pieces-memset-43.c: Ditto. > * gcc.target/i386/pieces-strcpy-2.c: Ditto. > * gcc.target/i386/pieces-memcpy-22.c: New test. > * gcc.target/i386/pieces-memset-51.c: New test. > * gcc.target/i386/pieces-strcpy-3.c: New test.
> --- > gcc/config/i386/i386-options.cc | 6 ++ > gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c | 12 > gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memset-38.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memset-40.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memset-41.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memset-42.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memset-43.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-memset-51.c | 12 > gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c | 2 +- > gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c | 15 +++ > 12 files changed, 53 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c > create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-51.c > create mode 100644 gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c > > diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc > index f423455b363..f79257cc764 100644 > --- a/gcc/config/i386/i386-options.cc > +++ b/gcc/config/i386/i386-options.cc > @@ -3023,6 +3023,9 @@ ix86_option_override_internal (bool main_args_p, > if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) > && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) > opts->x_ix86_move_max = PVW_AVX512; > + /* Align with vectorizer to avoid potential STLF issue. */ > + else if (TARGET_AVX_P (opts->x_ix86_isa_flags)) > + opts->x_ix86_move_max = PVW_AVX256; > else > opts->x_ix86_move_max = PVW_AVX128; > } > @@ -3047,6 +3050,9 @@ ix86_option_override_internal (bool main_args_p, > if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) > && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) > opts->x_ix86_store_max = PVW_AVX512; > + /* Align with vectorizer to avoid potential STLF issue. 
*/ > + else if (TARGET_AVX_P (opts->x_ix86_isa_flags)) > + opts->x_ix86_store_max = PVW_AVX256; > else > opts->x_ix86_store_max = PVW_AVX128; > } > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c > b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c > index 5faee21f9b9..53ad0b3be44 100644 > --- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c > +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ > +/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 > -mtune=sandybridge" } */ > > extern char *dst, *src; > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c > b/gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c > new file mode 100644 > index 000..605b3623ffc > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pieces-
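The default width selection in this patch can be reduced to a simplified, hypothetical model. It applies only when none of -mprefer-vector-width, avx256_optimal/avx128_optimal or avx256_store_by_pieces/avx512_store_by_pieces is given. The enum and function names below are invented for the sketch. The real code in i386-options.cc tests ISA flags on the options structure and assigns PVW_AVX512, PVW_AVX256 or PVW_AVX128.

```cpp
#include <cassert>

// Hypothetical stand-ins for the PVW_* values.
enum class VectorWidth { avx128 = 128, avx256 = 256, avx512 = 512 };

// Simplified model of the default ix86_move_max/ix86_store_max choice after
// the patch; the three booleans replace the real ISA flag tests.
VectorWidth default_move_max(bool has_avx512f, bool has_evex512, bool has_avx)
{
  if (has_avx512f && has_evex512)
    return VectorWidth::avx512;
  // New case: match the 256-bit vectorizer choice when AVX is enabled, so
  // struct copies and vector code use the same width (avoids STLF stalls).
  if (has_avx)
    return VectorWidth::avx256;
  return VectorWidth::avx128;
}
```

Before the patch, the middle branch did not exist, so an AVX2-only configuration fell through to the 128-bit case.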
Re: [wwwdocs v2] gcc-15: Mention c++ header dependency changes () in porting_to.html
On Wed, 21 Aug 2024 at 09:48, Filip Kastl wrote: > > Hi, > > this is the second version of my patch. See version 1 here: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659584.html > > Changes made: > - Removed plural when referring to the single changed header. From the two > versions of the text I considered I chose the one with less changes as > Jonathan suggested. > - Changed "in libstdc++" to "within libstdc++". > > Validated with the W3 Validator. Is this ok to be pushed? LGTM. I think I can approve this, since it's documenting libstdc++ changes and I can approve libstdc++ patches, so unless Gerald has any further suggestions, please do push - thanks! > > Cheers, > Filip Kastl > > > --- > htdocs/gcc-15/changes.html| 3 +- > htdocs/gcc-15/porting_to.html | 54 +++ > 2 files changed, 55 insertions(+), 2 deletions(-) > create mode 100644 htdocs/gcc-15/porting_to.html > > diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html > index fe7cf3c1..d0d6d147 100644 > --- a/htdocs/gcc-15/changes.html > +++ b/htdocs/gcc-15/changes.html > @@ -17,9 +17,8 @@ > > This page is a "brief" summary of some of the huge number of improvements > in GCC 15. > - > > diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html > new file mode 100644 > index ..702cf507 > --- /dev/null > +++ b/htdocs/gcc-15/porting_to.html > @@ -0,0 +1,54 @@ > + > + > + > + > + > +Porting to GCC 15 > +https://gcc.gnu.org/gcc.css";> > + > + > + > +Porting to GCC 15 > + > + > +The GCC 15 release series differs from previous GCC releases in > +a number of ways. Some of these are a result > +of bug fixing, and some old behaviors have been intentionally changed > +to support new standards, or relaxed in standards-conforming ways to > +facilitate compilation or run-time performance. > + > + > + > +Some of these changes are user visible and can cause grief when > +porting to GCC 15. This document is an effort to identify common issues > +and provide solutions. 
Let us know if you have suggestions for improvements! > + > + > +Note: GCC 15 has not been released yet, so this document is > +a work-in-progress. > + > + > + > +C++ language issues > + > +Header dependency changes > +Some C++ Standard Library headers have been changed to no longer include > +other headers that were being used internally by the library. > +As such, C++ programs that used standard library components without > +including the right headers will no longer compile. > + > + > +In particular, the following header is used less widely within libstdc++ and > +may need to be included explicitly when compiling with GCC 15: > + > + > +> + (for std::int8_t, std::int32_t etc.) > + > + > + > + > + > + > + > + > -- > 2.45.2 >
[committed] libstdc++: Fix std::variant to reject array types [PR116381]
Tested x86_64-linux. Pushed to trunk. Probably worth backporting too. It could potentially cause new errors for people using arrays in std::variant, but that's forbidden by the standard. -- >8 -- libstdc++-v3/ChangeLog: PR libstdc++/116381 * include/std/variant (variant): Fix conditions for static_assert to match the spec. * testsuite/20_util/variant/types_neg.cc: New test. --- libstdc++-v3/include/std/variant| 6 ++ .../testsuite/20_util/variant/types_neg.cc | 17 + 2 files changed, 19 insertions(+), 4 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/variant/types_neg.cc diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 12108d07f0b..5fb7770d889 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -1457,10 +1457,8 @@ namespace __detail::__variant static_assert(sizeof...(_Types) > 0, "variant must have at least one alternative"); - static_assert(!(std::is_reference_v<_Types> || ...), - "variant must have no reference alternative"); - static_assert(!(std::is_void_v<_Types> || ...), - "variant must have no void alternative"); + static_assert(((std::is_object_v<_Types> && !is_array_v<_Types>) && ...), + "variant alternatives must be non-array object types"); using _Base = __detail::__variant::_Variant_base<_Types...>; diff --git a/libstdc++-v3/testsuite/20_util/variant/types_neg.cc b/libstdc++-v3/testsuite/20_util/variant/types_neg.cc new file mode 100644 index 000..5cd3d02154b --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/variant/types_neg.cc @@ -0,0 +1,17 @@ +// { dg-do compile { target c++17 } } + +# include + +std::variant<> v0; // { dg-error "here" } +// { dg-error "must have at least one alternative" "" { target *-*-* } 0 } +std::variant v1; // { dg-error "here" } +std::variant v2; // { dg-error "here" } +std::variant v3; // { dg-error "here" } +std::variant v4; // { dg-error "here" } +std::variant v5; // { dg-error "here" } +std::variant v6; // { dg-error "here" } +// { 
dg-error "must be non-array object types" "" { target *-*-* } 0 } + +// All of variant's base classes are instantiated before checking any +// static_assert, so we get lots of errors before the expected errors above. +// { dg-excess-errors "" } -- 2.46.0
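The new static_assert condition can be checked in isolation. The sketch below is a standalone trait with a made-up name, not libstdc++'s internals, but it applies the same fold expression. std::is_object_v already excludes references, void and function types, and the added !std::is_array_v excludes arrays.

```cpp
#include <type_traits>

// Same condition as the new static_assert in std::variant: every
// alternative must be an object type that is not an array.
template<typename... Types>
constexpr bool valid_variant_alternatives =
  ((std::is_object_v<Types> && !std::is_array_v<Types>) && ...);

static_assert(valid_variant_alternatives<int, double, const char*>);
static_assert(!valid_variant_alternatives<int&>);   // reference
static_assert(!valid_variant_alternatives<void>);   // void
static_assert(!valid_variant_alternatives<int()>);  // function type
static_assert(!valid_variant_alternatives<int[2]>); // array (the PR116381 case)
```

A fold over && with an empty pack yields true, so an empty pack passes this trait. That is why the separate sizeof...(_Types) > 0 check is still needed.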
Re: [PATCH] optabs-query: Guard smallest_int_mode_for_size [PR115495].
On Wed, Aug 21, 2024 at 8:37 AM Robin Dapp wrote: > > Hi, > > in get_best_extraction_insn we use smallest_int_mode_for_size with > struct_bits as size argument. In PR115495 struct_bits = 256 and we > don't have a mode for that. This patch just bails for such cases. > > This does not happen on the current trunk anymore (so the test passes > unpatched) but we've seen it internally. Does it still make sense > to install it (and backport to 14)? > > Bootstrapped and regtested on x86 and aarch64. Regtested on rv64gcv. > > Regards > Robin > > PR middle-end/115495 > > gcc/ChangeLog: > > * optabs-query.cc (get_best_extraction_insn): Return if > smallest_int_mode_for_size might not find a mode. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/pr115495.c: New test. > --- > gcc/optabs-query.cc | 4 > gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c | 9 + > 2 files changed, 13 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c > > diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc > index 93c1d7b8485..dc2690e720f 100644 > --- a/gcc/optabs-query.cc > +++ b/gcc/optabs-query.cc > @@ -208,6 +208,10 @@ get_best_extraction_insn (extraction_insn *insn, > machine_mode field_mode) > { >opt_scalar_int_mode mode_iter; > + > + if (maybe_gt (struct_bits, GET_MODE_PRECISION (MAX_MODE_INT))) > +return false; > + >FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits)) I think we should instead change this iteration to use FOR_EACH_MODE_IN_CLASS (like smallest_mode_for_size does) and skip modes that are too small?
> { >scalar_int_mode mode = mode_iter.require (); > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c > new file mode 100644 > index 000..bbf4d720f63 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */ > + > +extern short a[]; > +short b; > +int main() { > + for (char c = 0; c < 18; c += 1) > +a[c + 0] = b; > +} > -- > 2.46.0 >
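Richard's suggestion can be sketched with a hypothetical mode table. The loop walks every integer mode from narrowest to widest and skips the ones that are too small. A 256-bit struct_bits, as in PR115495, then simply finds no candidate instead of asking for a smallest mode that does not exist.

The precision table and the function name are assumptions for illustration. The real loop would also check each candidate mode for a usable extraction insn, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the MODE_INT class: precisions in bits, ordered
// from narrowest to widest (QI, HI, SI, DI, TI).
constexpr unsigned int_mode_bits[] = {8, 16, 32, 64, 128};
constexpr std::size_t num_int_modes =
  sizeof int_mode_bits / sizeof int_mode_bits[0];

// Return the index of the first (narrowest) mode that can hold STRUCT_BITS,
// or -1 if no integer mode is wide enough.
int find_extraction_mode(unsigned struct_bits)
{
  for (std::size_t i = 0; i < num_int_modes; ++i)
    {
      if (int_mode_bits[i] < struct_bits)
        continue;  // too small: skip it rather than failing up front
      return static_cast<int>(i);
    }
  return -1;
}
```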
Re: [committed] libstdc++: Fix std::variant to reject array types [PR116381]
On Wed, 21 Aug 2024 at 09:55, Jonathan Wakely wrote: > > Tested x86_64-linux. Pushed to trunk. > > Probably worth backporting too. It could potentially cause new errors > for people using arrays in std::variant, but that's forbidden by the > standard. Notably, both libc++ and MSVC STL reject array types in std::variant. Only libstdc++ had the bug that allowed them.
[PATCH] libstdc++: Simplify C++20 implementation of std::variant
Tested x86_64-linux. This should improve compile times for C++20 and up. I need to test this with Clang, but then I plan to push it if all goes well. -- >8 -- For C++20 the __detail::__variant::_Uninitialized primary template can be used for all types, because _Variant_union can have a non-trivially destructible union member in C++20, and the constrained user-provided destructor will ensure we don't destroy inactive objects. Since we always use the primary template for C++20, we don't need the _Uninitialized::_M_get accessors to abstract the difference between the primary template and the partial specialization. That allows us to simplify __get_n for C++20 too. Also improve the comments that explain the uses of _Uninitialized and when/why _Variant_union needs a user-provided destructor. libstdc++-v3/ChangeLog: * include/std/variant [C++20] (_Uninitialized): Always use the primary template. [C++20] (__get_n): Access the _M_storage member directly. --- libstdc++-v3/include/std/variant | 83 ++-- 1 file changed, 37 insertions(+), 46 deletions(-) diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 5fb7770d889..08c5395b54d 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -54,10 +54,9 @@ // C++ < 20 || __cpp_concepts < 202002L || __cpp_constexpr < 201811L #if __cpp_lib_variant < 202106L -# include // Use __aligned_membuf instead of union. +# include // Use __aligned_membuf for storage. #endif - namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -209,17 +208,18 @@ namespace __variant __as(const std::variant<_Types...>&& __v) noexcept { return std::move(__v); } - // For C++17: - // _Uninitialized is guaranteed to be a trivially destructible type, - // even if T is not. - // For C++20: - // _Uninitialized is trivially destructible iff T is, so _Variant_union - // needs a constrained non-trivial destructor. 
+#if __cpp_lib_variant < 202106L template> struct _Uninitialized; +#else + template +struct _Uninitialized; +#endif - template -struct _Uninitialized<_Type, true> + // The primary template is used for trivially destructible types in C++17, + // and for all types in C++20. + template +struct _Uninitialized { template constexpr @@ -227,6 +227,7 @@ namespace __variant : _M_storage(std::forward<_Args>(__args)...) { } +#if __cpp_lib_variant < 202106L constexpr const _Type& _M_get() const & noexcept { return _M_storage; } @@ -238,46 +239,18 @@ namespace __variant constexpr _Type&& _M_get() && noexcept { return std::move(_M_storage); } +#endif _Type _M_storage; }; +#if __cpp_lib_variant < 202106L + // This partial specialization is used for non-trivially destructible types + // in C++17, so that _Uninitialized is trivially destructible and can be + // used as a union member in _Variadic_union. template struct _Uninitialized<_Type, false> { -#if __cpp_lib_variant >= 202106L - template - constexpr - _Uninitialized(in_place_index_t<0>, _Args&&... __args) - : _M_storage(std::forward<_Args>(__args)...) - { } - - constexpr ~_Uninitialized() { } - - _Uninitialized(const _Uninitialized&) = default; - _Uninitialized(_Uninitialized&&) = default; - _Uninitialized& operator=(const _Uninitialized&) = default; - _Uninitialized& operator=(_Uninitialized&&) = default; - - constexpr const _Type& _M_get() const & noexcept - { return _M_storage; } - - constexpr _Type& _M_get() & noexcept - { return _M_storage; } - - constexpr const _Type&& _M_get() const && noexcept - { return std::move(_M_storage); } - - constexpr _Type&& _M_get() && noexcept - { return std::move(_M_storage); } - - struct _Empty_byte { }; - - union { - _Empty_byte _M_empty; - _Type _M_storage; - }; -#else template constexpr _Uninitialized(in_place_index_t<0>, _Args&&... 
__args) @@ -299,7 +272,6 @@ namespace __variant { return std::move(*_M_storage._M_ptr()); } __gnu_cxx::__aligned_membuf<_Type> _M_storage; -#endif }; template @@ -316,6 +288,22 @@ namespace __variant return __variant::__get_n<_Np - 3>( std::forward<_Union>(__u)._M_rest._M_rest._M_rest); } +#else + template +constexpr auto&& +__get_n(_Union&& __u) noexcept +{ + if constexpr (_Np == 0) + return std::forward<_Union>(__u)._M_first._M_storage; + else if constexpr (_Np == 1) + return std::forward<_Union>(__u)._M_rest._M_first._M_storage; + else if constexpr (_Np == 2) + return std::forward<_Union>(__u)._M_rest._M_rest._M_first._M_storage; + else + return __variant::__get_n<_Np - 3>( +std::forward<_Union>(__u)._M_rest._M_rest
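The simplified C++20 __get_n can be illustrated with a cut-down recursive union. In the sketch below, Storage, Holder and get_n are hypothetical stand-ins for _Variadic_union, _Uninitialized and __get_n. Each level holds one alternative in first and the remaining ones in rest. get_n reaches into the storage member directly, peeling three levels per recursive step as the patch does.

To stay valid C++17, the sketch only supports trivially destructible alternatives. The constrained destructor needed for other types is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for _Uninitialized<T> when the primary template is
// always used: the value is a plain public member, so no _M_get() is needed.
template<typename T>
struct Holder
{
  T storage;
};

// Hypothetical stand-in for _Variadic_union (empty base case).
template<typename... Ts>
union Storage { };

template<typename T, typename... Rest>
union Storage<T, Rest...>
{
  // Activate alternative 0.
  template<typename... Args>
  constexpr Storage(std::in_place_index_t<0>, Args&&... args)
  : first{std::forward<Args>(args)...} { }

  // Activate alternative N > 0 by delegating to the nested union.
  template<std::size_t N, typename... Args>
  constexpr Storage(std::in_place_index_t<N>, Args&&... args)
  : rest(std::in_place_index<N - 1>, std::forward<Args>(args)...) { }

  Holder<T> first;
  Storage<Rest...> rest;
};

// Access alternative N through the storage member directly, unrolling
// three levels per recursive call; value category is preserved by forward.
template<std::size_t N, typename U>
constexpr auto&& get_n(U&& u) noexcept
{
  if constexpr (N == 0)
    return std::forward<U>(u).first.storage;
  else if constexpr (N == 1)
    return std::forward<U>(u).rest.first.storage;
  else if constexpr (N == 2)
    return std::forward<U>(u).rest.rest.first.storage;
  else
    return get_n<N - 3>(std::forward<U>(u).rest.rest.rest);
}
```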
Re: [committed] libstdc++: Fix std::variant to reject array types [PR116381]
On Wed, Aug 21, 2024 at 1:56 AM Jonathan Wakely wrote: > > Tested x86_64-linux. Pushed to trunk. > > Probably worth backporting too. It could potentially cause new errors > for people using arrays in std::variant, but that's forbidden by the > standard. It might be worth mentioning in the porting_to guide, just in case. You never know: we have had bug reports about code that started being rejected after a change in GCC, even though clang/MSVC also rejected it. Thanks, Andrew > > -- >8 -- > > libstdc++-v3/ChangeLog: > > PR libstdc++/116381 > * include/std/variant (variant): Fix conditions for > static_assert to match the spec. > * testsuite/20_util/variant/types_neg.cc: New test. > --- > libstdc++-v3/include/std/variant| 6 ++ > .../testsuite/20_util/variant/types_neg.cc | 17 + > 2 files changed, 19 insertions(+), 4 deletions(-) > create mode 100644 libstdc++-v3/testsuite/20_util/variant/types_neg.cc > > diff --git a/libstdc++-v3/include/std/variant > b/libstdc++-v3/include/std/variant > index 12108d07f0b..5fb7770d889 100644 > --- a/libstdc++-v3/include/std/variant > +++ b/libstdc++-v3/include/std/variant > @@ -1457,10 +1457,8 @@ namespace __detail::__variant > >static_assert(sizeof...(_Types) > 0, > "variant must have at least one alternative"); > - static_assert(!(std::is_reference_v<_Types> || ...), > - "variant must have no reference alternative"); > - static_assert(!(std::is_void_v<_Types> || ...), > - "variant must have no void alternative"); > + static_assert(((std::is_object_v<_Types> && !is_array_v<_Types>) && > ...), > + "variant alternatives must be non-array object types"); > >using _Base = __detail::__variant::_Variant_base<_Types...>; > > diff --git a/libstdc++-v3/testsuite/20_util/variant/types_neg.cc > b/libstdc++-v3/testsuite/20_util/variant/types_neg.cc > new file mode 100644 > index 000..5cd3d02154b > --- /dev/null > +++ b/libstdc++-v3/testsuite/20_util/variant/types_neg.cc > @@ -0,0 +1,17 @@ > +// { dg-do compile { target c++17 } } > + > +# include > + >
+std::variant<> v0; // { dg-error "here" } > +// { dg-error "must have at least one alternative" "" { target *-*-* } 0 } > +std::variant v1; // { dg-error "here" } > +std::variant v2; // { dg-error "here" } > +std::variant v3; // { dg-error "here" } > +std::variant v4; // { dg-error "here" } > +std::variant v5; // { dg-error "here" } > +std::variant v6; // { dg-error "here" } > +// { dg-error "must be non-array object types" "" { target *-*-* } 0 } > + > +// All of variant's base classes are instantiated before checking any > +// static_assert, so we get lots of errors before the expected errors above. > +// { dg-excess-errors "" } > -- > 2.46.0 >
Re: [PATCH 2/8] ifn: Add else-operand handling.
On Sun, 11 Aug 2024, Robin Dapp wrote: > This patch adds else-operand handling to the internal functions. LGTM. > gcc/ChangeLog: > > * internal-fn.cc (add_mask_and_len_args): Rename... > (add_mask_else_and_len_args): ...to this and add else handling. > (expand_partial_load_optab_fn): Use adjusted function. > (expand_partial_store_optab_fn): Ditto. > (expand_scatter_store_optab_fn): Ditto. > (expand_gather_load_optab_fn): Ditto. > (internal_fn_len_index): Adjust for masked loads. > (internal_fn_else_index): Add masked loads. > --- > gcc/internal-fn.cc | 69 ++ > 1 file changed, 58 insertions(+), 11 deletions(-) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 8a2e07f2f96..586978e8f3f 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -331,17 +331,18 @@ get_multi_vector_move (tree array_type, convert_optab > optab) >return convert_optab_handler (optab, imode, vmode); > } > > -/* Add mask and len arguments according to the STMT. */ > +/* Add mask, else, and len arguments according to the STMT. */ > > static unsigned int > -add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt) > +add_mask_else_and_len_args (expand_operand *ops, unsigned int opno, gcall > *stmt) > { >internal_fn ifn = gimple_call_internal_fn (stmt); >int len_index = internal_fn_len_index (ifn); >/* BIAS is always consecutive next of LEN. */ >int bias_index = len_index + 1; >int mask_index = internal_fn_mask_index (ifn); > - /* The order of arguments are always {len,bias,mask}. */ > + > + /* The order of arguments is always {mask, else, len, bias}. 
*/ >if (mask_index >= 0) > { >tree mask = gimple_call_arg (stmt, mask_index); > @@ -362,6 +363,23 @@ add_mask_and_len_args (expand_operand *ops, unsigned int > opno, gcall *stmt) > >create_input_operand (&ops[opno++], mask_rtx, > TYPE_MODE (TREE_TYPE (mask))); > + > +} > + > + int els_index = internal_fn_else_index (ifn); > + if (els_index >= 0) > +{ > + tree els = gimple_call_arg (stmt, els_index); > + tree els_type = TREE_TYPE (els); > + if (TREE_CODE (els) == SSA_NAME > + && SSA_NAME_IS_DEFAULT_DEF (els) > + && VAR_P (SSA_NAME_VAR (els))) > + create_undefined_input_operand (&ops[opno++], TYPE_MODE (els_type)); > + else > + { > + rtx els_rtx = expand_normal (els); > + create_input_operand (&ops[opno++], els_rtx, TYPE_MODE (els_type)); > + } > } >if (len_index >= 0) > { > @@ -3014,7 +3032,7 @@ static void > expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab > optab) > { >int i = 0; > - class expand_operand ops[5]; > + class expand_operand ops[6]; >tree type, lhs, rhs, maskt; >rtx mem, target; >insn_code icode; > @@ -3044,7 +3062,7 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall > *stmt, convert_optab optab) >target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); >create_call_lhs_operand (&ops[i++], target, TYPE_MODE (type)); >create_fixed_operand (&ops[i++], mem); > - i = add_mask_and_len_args (ops, i, stmt); > + i = add_mask_else_and_len_args (ops, i, stmt); >expand_insn (icode, i, ops); > >assign_call_lhs (lhs, target, &ops[0]); > @@ -3090,7 +3108,7 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall > *stmt, convert_optab optab >reg = expand_normal (rhs); >create_fixed_operand (&ops[i++], mem); >create_input_operand (&ops[i++], reg, TYPE_MODE (type)); > - i = add_mask_and_len_args (ops, i, stmt); > + i = add_mask_else_and_len_args (ops, i, stmt); >expand_insn (icode, i, ops); > } > > @@ -3676,7 +3694,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall > *stmt, direct_optab optab) >create_integer_operand (&ops[i++], 
TYPE_UNSIGNED (TREE_TYPE (offset))); >create_integer_operand (&ops[i++], scale_int); >create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs))); > - i = add_mask_and_len_args (ops, i, stmt); > + i = add_mask_else_and_len_args (ops, i, stmt); > >insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE > (rhs)), > TYPE_MODE (TREE_TYPE (offset))); > @@ -3705,7 +3723,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, > direct_optab optab) >create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE > (offset))); >create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset))); >create_integer_operand (&ops[i++], scale_int); > - i = add_mask_and_len_args (ops, i, stmt); > + i = add_mask_else_and_len_args (ops, i, stmt); >insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE > (lhs)), > TYPE_MODE (TREE_TYPE (offset))); >expand_insn (icode, i, ops); > @@ -4590,6 +4608,18 @@ g
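Conceptually, the else operand supplies the value of the inactive lanes of a masked load. Below is a scalar model, assuming a fixed vector length N and using a hypothetical helper name.

The "undefined" case handled above via create_undefined_input_operand has no direct equivalent in this model, because a concrete value must be passed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a masked vector load with an else operand:
// active lanes read memory, inactive lanes take ELSE_VALUE and never
// dereference their memory slot.
template<typename T, std::size_t N>
std::array<T, N> masked_load(const T* mem, const std::array<bool, N>& mask,
                             T else_value)
{
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = mask[i] ? mem[i] : else_value;
  return result;
}
```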
Re: [committed] libstdc++: Fix std::variant to reject array types [PR116381]
On Wed, 21 Aug 2024 at 10:03, Andrew Pinski wrote: > > On Wed, Aug 21, 2024 at 1:56 AM Jonathan Wakely wrote: > > > > Tested x86_64-linux. Pushed to trunk. > > > > Probably worth backporting too. It could potentially cause new errors > > for people using arrays in std::variant, but that's forbidden by the > > standard. > > It might be worth mentioning in porting_to guide just in case. You > never know since we have gotten bug reports about broken code that was > also rejected by clang/MSVC due to a change in GCC. Let's wait and see if it breaks anything when distros start building with the change. I don't expect any real world code to break, and if I'm right then there's no point documenting the change. For the backports maybe it makes sense to do it conditionally:

#ifdef __STRICT_ANSI__
      static_assert(((std::is_object_v<_Types> && !is_array_v<_Types>) && ...),
                    "variant alternatives must be non-array object types");
#else
      static_assert((std::is_object_v<_Types> && ...),
                    "variant alternatives must be object types");
#endif

If you're asking for strict conformance, you shouldn't be using arrays in std::variant. This would avoid changing behaviour on release branches for non-strict modes.
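Here is a standalone sketch of the proposed conditional check. It uses a made-up trait name instead of the real static_assert inside std::variant. GCC defines __STRICT_ANSI__ for strict modes such as -std=c++17, but not for GNU modes such as -std=gnu++17. Only strict modes would therefore also reject arrays.

```cpp
#include <type_traits>

// Strict modes: non-array object types only (the standard's requirement).
// GNU modes: any object type, keeping the old behaviour for arrays.
template<typename T>
constexpr bool allowed_alternative =
#ifdef __STRICT_ANSI__
  std::is_object_v<T> && !std::is_array_v<T>;
#else
  std::is_object_v<T>;
#endif
```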
Re: [PATCH] Align ix86_{move_max,store_max} with vectorizer.
On Wed, Aug 21, 2024 at 4:49 PM Richard Biener wrote: > > On Wed, Aug 21, 2024 at 7:40 AM liuhongt wrote: > > > > When none of mprefer-vector-width, avx256_optimal/avx128_optimal, > > avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will > > set ix86_{move_max,store_max} as max available vector length except > > for AVX part. > > > > if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) > > && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) > > opts->x_ix86_move_max = PVW_AVX512; > > else > > opts->x_ix86_move_max = PVW_AVX128; > > > > So for -mavx2, vectorizer will choose 256-bit for vectorization, but > > 128-bit is used for struct copy, there could be a potential STLF issue > > due to this "misalign". > > > > The patch fixes that and improved 538.imagick_r by ~30% for > > -march=x86-64-v3 -O2. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Any comments? > > Should we look at the avx128_optimal tune and/or avx256_split_regs and > avx256_optimal > also for 512? Because IIRC the vectorizers default looks at that as > well (OTOH larger > stores should be fine for STLF). For double-pumped processors such as SRF, there's no STLF issue for a 128-bit store followed by a 256-bit load, since the 256-bit load is split into two 128-bit loads. I guess it should be similar for Znver1/Znver4, so the mismatch between the struct copy size and the vectorizer size should be fine there. One exception is SPR, where we use 256-bit vectors for vectorization but 512-bit moves for struct copy; that could be an issue when the struct copy comes after the vectorized code. But I haven't observed any such cases yet, and when there is no STLF stall, a 512-bit copy should be better than a 256-bit copy on SPR, so I'll leave it as is. (There's an ongoing plan to enable 512-bit vectorization for SPR by default.) > > > gcc/ChangeLog: > > > > * config/i386/i386-options.cc (ix86_option_override_internal): > > set ix86_{move_max,store_max} to PVW_AVX256 when TARGET_AVX > > instead of PVW_AVX128.
> > > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pieces-memcpy-10.c: Add -mprefer-vector-width=128. > > * gcc.target/i386/pieces-memcpy-6.c: Ditto. > > * gcc.target/i386/pieces-memset-38.c: Ditto. > > * gcc.target/i386/pieces-memset-40.c: Ditto. > > * gcc.target/i386/pieces-memset-41.c: Ditto. > > * gcc.target/i386/pieces-memset-42.c: Ditto. > > * gcc.target/i386/pieces-memset-43.c: Ditto. > > * gcc.target/i386/pieces-strcpy-2.c: Ditto. > > * gcc.target/i386/pieces-memcpy-22.c: New test. > > * gcc.target/i386/pieces-memset-51.c: New test. > > * gcc.target/i386/pieces-strcpy-3.c: New test. > > --- > > gcc/config/i386/i386-options.cc | 6 ++ > > gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c | 12 > > gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memset-38.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memset-40.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memset-41.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memset-42.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memset-43.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-memset-51.c | 12 > > gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c | 2 +- > > gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c | 15 +++ > > 12 files changed, 53 insertions(+), 8 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-51.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c > > > > diff --git a/gcc/config/i386/i386-options.cc > > b/gcc/config/i386/i386-options.cc > > index f423455b363..f79257cc764 100644 > > --- a/gcc/config/i386/i386-options.cc > > +++ b/gcc/config/i386/i386-options.cc > > @@ -3023,6 +3023,9 @@ ix86_option_override_internal (bool main_args_p, > > if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) > > && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) > > opts->x_ix86_move_max = 
PVW_AVX512; > > + /* Align with vectorizer to avoid potential STLF issue. */ > > + else if (TARGET_AVX_P (opts->x_ix86_isa_flags)) > > + opts->x_ix86_move_max = PVW_AVX256; > > else > > opts->x_ix86_move_max = PVW_AVX128; > > } > > @@ -3047,6 +3050,9 @@ ix86_option_override_internal (bool main_args_p, > > if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) > > && TARGET_EVEX512_P (opts->x_ix86_isa_flags2)) > > opts->x_ix86_store_max = PVW_AVX512; > > + /* Align with vectorizer to avoid potential STLF issue. */ > > + else if (TARGET_AVX_P (opts->x_ix86_isa_
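The STLF argument can be made concrete with a deliberately simplified model. This model is an assumption for illustration, not a description of any particular microarchitecture: a load is forwarded only if it lies entirely within one earlier store.

- A struct copied with two 128-bit (16-byte) stores and then read by one 256-bit (32-byte) load fails that test.
- A core that splits the 256-bit load into two 128-bit halves, as described for double-pumped designs, can forward each half.

The names Access, forwards_from_one_store and forwards_split are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Access
{
  unsigned offset;  // byte offset
  unsigned size;    // width in bytes
};

// Simplified rule: a load is forwarded only if a single earlier store
// fully contains it.
bool forwards_from_one_store(const std::vector<Access>& stores, Access load)
{
  for (const Access& s : stores)
    if (load.offset >= s.offset
        && load.offset + load.size <= s.offset + s.size)
      return true;
  return false;
}

// A "double-pumped" core issues the load as independent PART-byte pieces;
// each piece only needs some store covering it. Assumes PART divides
// load.size.
bool forwards_split(const std::vector<Access>& stores, Access load,
                    unsigned part)
{
  for (unsigned off = 0; off < load.size; off += part)
    if (!forwards_from_one_store(stores, {load.offset + off, part}))
      return false;
  return true;
}
```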
Re: [Fortran, Patch, PR86468, v1] Fix propagation of corank to array components in derived types.
Hi Harald, thanks for the review. I have changed the style of the code. Interestingly, contrib/check_GNU_style.(py|sh) complained about neither the old style nor the new style. I tend to just trust clang-format to do a reproducible job and stick with that. Committed as: gcc-15-3066-g723b30bee4e Thanks again, Andre On Tue, 20 Aug 2024 22:32:32 +0200 Harald Anlauf wrote: > Hi Andre, > > Am 20.08.24 um 13:52 schrieb Andre Vehreschild: > > Hi all, > > > > attached patch fixes an ICE in gimplify by assuring that the corank of a > > non-pointer, non-coarray array component in a derived type is zero. > > Previously (erroneously) the full corank of the type has been used. There > > is one exception for pointer typed array components in coarray derived > > types. These can be associated only to coarray array targets (compare F2018 > > C1024 and C1026). Therefore for those the corank is still propagated. > > the patch is OK for mainline, but the formatting violates the coding > style here: > > @@ -2909,12 +2909,14 @@ gfc_get_derived_type (gfc_symbol * derived, int > codimen) > else > akind = GFC_ARRAY_ALLOCATABLE; > /* Pointers to arrays aren't actually pointer types. The > - descriptors are separate, but the data is common. */ > - field_type = gfc_build_array_type (field_type, c->as, akind, > - !c->attr.target > - && !c->attr.pointer, > - c->attr.contiguous, > - codimen); > + descriptors are separate, but the data is common. Every > + array pointer in a coarray derived type needs to provide > space > + for the coarray management, too. Therefore treat coarrays > + and pointers to coarrays in derived types the same. */ > + field_type = gfc_build_array_type ( > ^^^ > Please move this opening parenthesis to the next line, > otherwise the indenting with emacs goes sideways. > > + field_type, c->as, akind, !c->attr.target && > !c->attr.pointer, > + c->attr.contiguous, > + c->attr.codimension || c->attr.pointer ?
codimen : 0); > } > else > field_type = gfc_get_nodesc_array_type (field_type, c->as, > > > Thanks, > Harald > > > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? > > > > Regards, > > Andre > > -- > > Andre Vehreschild * Email: vehre ad gmx dot de > -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH 4/8] vect: Add maskload else value support.
On Sun, 11 Aug 2024, Robin Dapp wrote: > This patch adds an else operand to vectorized masked load calls. > The current implementation adds else-value arguments to the respective > target-querying functions that is used to supply the vectorizer with the > proper else value. > > Right now, the only spot where a zero else value is actually enforced is > tree-ifcvt. Loop masking and other instances of masked loads in the > vectorizer itself do not use vec_cond_exprs. > > gcc/ChangeLog: > > * internal-fn.cc (internal_gather_scatter_fn_supported_p): Add > else argument. > * internal-fn.h (internal_gather_scatter_fn_supported_p): Ditto. > (MASK_LOAD_ELSE_NONE): Define. > (MASK_LOAD_ELSE_ZERO): Ditto. > (MASK_LOAD_ELSE_M1): Ditto. > (MASK_LOAD_ELSE_UNDEFINED): Ditto. > * optabs-query.cc (supports_vec_convert_optab_p): Return icode. > (get_supported_else_val): Return supported else value for > optab's operand at index. > (supports_vec_gather_load_p): Add else argument. > (supports_vec_scatter_store_p): Ditto. > * optabs-query.h (supports_vec_gather_load_p): Ditto. > (get_supported_else_val): Ditto. > * optabs-tree.cc (target_supports_mask_load_store_p): Ditto. > (can_vec_mask_load_store_p): Ditto. > (target_supports_len_load_store_p): Ditto. > (get_len_load_store_mode): Ditto. > * optabs-tree.h (target_supports_mask_load_store_p): Ditto. > (can_vec_mask_load_store_p): Ditto. > * tree-vect-data-refs.cc (vect_lanes_optab_supported_p): Ditto. > (vect_gather_scatter_fn_p): Ditto. > (vect_check_gather_scatter): Ditto. > (vect_load_lanes_supported): Ditto. > * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): > Ditto. > * tree-vect-slp.cc (vect_get_operand_map): Adjust indices for > else operand. > (vect_slp_analyze_node_operations): Skip undefined else operand. > * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): > Add else operand handling. > (vect_get_vec_defs_for_operand): Handle undefined else operand. 
> (check_load_store_for_partial_vectors): Add else argument. > (vect_truncate_gather_scatter_offset): Ditto. > (vect_use_strided_gather_scatters_p): Ditto. > (get_group_load_store_type): Ditto. > (get_load_store_type): Ditto. > (vect_get_mask_load_else): Ditto. > (vect_get_else_val_from_tree): Ditto. > (vect_build_one_gather_load_call): Add zero else operand. > (vectorizable_load): Use else operand. > * tree-vectorizer.h (vect_gather_scatter_fn_p): Add else > argument. > (vect_load_lanes_supported): Ditto. > (vect_get_mask_load_else): Ditto. > (vect_get_else_val_from_tree): Ditto. > --- > gcc/internal-fn.cc | 19 +++- > gcc/internal-fn.h | 11 +- > gcc/optabs-query.cc| 83 +++--- > gcc/optabs-query.h | 3 +- > gcc/optabs-tree.cc | 43 +--- > gcc/optabs-tree.h | 8 +- > gcc/tree-vect-data-refs.cc | 39 +-- > gcc/tree-vect-patterns.cc | 17 ++- > gcc/tree-vect-slp.cc | 22 +++- > gcc/tree-vect-stmts.cc | 218 + > gcc/tree-vectorizer.h | 11 +- > 11 files changed, 367 insertions(+), 107 deletions(-) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 586978e8f3f..2fc676e397c 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4988,12 +4988,15 @@ internal_fn_stored_value_index (internal_fn fn) > or stored. OFFSET_VECTOR_TYPE is the vector type that holds the > offset from the shared base address of each loaded or stored element. > SCALE is the amount by which these offsets should be multiplied > - *after* they have been extended to address width. */ > + *after* they have been extended to address width. > + If the target supports the gather load the supported else value > + will be written to the position ELSVAL points to if it is nonzero. 
*/ > > bool > internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, > tree memory_element_type, > - tree offset_vector_type, int scale) > + tree offset_vector_type, int scale, > + int *elsval) > { >if (!tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (vector_type)), > TYPE_SIZE (memory_element_type))) > @@ -5006,9 +5009,15 @@ internal_gather_scatter_fn_supported_p (internal_fn > ifn, tree vector_type, > TYPE_MODE (offset_vector_type)); >int output_ops = internal_load_fn_p (ifn) ? 1 : 0; >bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (offset_vector_type)); > - return (icode != CODE_FOR_nothing > - && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p)) > -
Re: [PATCH] sra: Avoid risking x87 mangling binary representation of a replacement (PR 58416)
On Mon, 19 Aug 2024, Martin Jambor wrote: > Hi, > > PR 58416 shows that storing non-floating point data to floating point > scalar registers can lead to miscompilations when the data is > normalized or otherwise processed upon loading to a register. To > avoid that risk, this patch detects situations where we have multiple > types and we decide to represent the data in a type with a mode that > is known to not be able to transfer actual bits reliably using the new > TARGET_MODE_CAN_TRANSFER_BITS hook. > > Bootstrapped and tested on x86_64-linux. OK for trunk? OK (well, you know SRA best). > Any back-ports to release branches would of course need a back-port of > the hook itself, unfortunately. Of course. Thanks, Richard. > Thanks, > > Martin > > > gcc/ChangeLog: > > 2024-08-19 Martin Jambor > > PR target/58416 > * tree-sra.cc (types_risk_mangled_binary_repr_p): New function. > (sort_and_splice_var_accesses): Use it. > (propagate_subaccesses_from_rhs): Likewise. > > gcc/testsuite/ChangeLog: > > 2024-08-19 Martin Jambor > > PR target/58416 > * gcc.dg/torture/pr58416.c: New test.
> --- > gcc/testsuite/gcc.dg/torture/pr58416.c | 32 ++ > gcc/tree-sra.cc| 28 +- > 2 files changed, 59 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr58416.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr58416.c > b/gcc/testsuite/gcc.dg/torture/pr58416.c > new file mode 100644 > index 000..0922b0e7089 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr58416.c > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > + > +struct s { > + char s[sizeof(long double)]; > +}; > + > +union u { > + long double d; > + struct s s; > +}; > + > +int main() > +{ > + union u x = {0}; > +#if __SIZEOF_LONG_DOUBLE__ == 16 > + x.s = (struct s){""}; > +#elif __SIZEOF_LONG_DOUBLE__ == 12 > + x.s = (struct s){""}; > +#elif __SIZEOF_LONG_DOUBLE__ == 8 > + x.s = (struct s){""}; > +#elif __SIZEOF_LONG_DOUBLE__ == 4 > + x.s = (struct s){""}; > +#endif > + > + union u y = x; > + > + for (unsigned char *p = (unsigned char *)&y + sizeof y; > + p-- > (unsigned char *)&y;) > +if (*p != (unsigned char)'x') > + __builtin_abort (); > + return 0; > +} > diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc > index 8040b0c5645..64e2f007d68 100644 > --- a/gcc/tree-sra.cc > +++ b/gcc/tree-sra.cc > @@ -2335,6 +2335,19 @@ same_access_path_p (tree exp1, tree exp2) >return true; > } > > +/* Return true when either T1 is a type that, when loaded into a register and > + stored back to memory will yield the same bits or when both T1 and T2 are > + compatible. */ > + > +static bool > +types_risk_mangled_binary_repr_p (tree t1, tree t2) > +{ > + if (mode_can_transfer_bits (TYPE_MODE (t1))) > +return false; > + > + return !types_compatible_p (t1, t2); > +} > + > /* Sort all accesses for the given variable, check for partial overlaps and > return NULL if there are any. If there are none, pick a representative > for > each combination of offset and size and create a linked list out of them. 
> @@ -2461,6 +2474,17 @@ sort_and_splice_var_accesses (tree var) > } > unscalarizable_region = true; > } > + else if (types_risk_mangled_binary_repr_p (access->type, ac2->type)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Cannot scalarize the following access " > +"because data would be held in a mode which is not " > +"guaranteed to preserve all bits.\n "); > + dump_access (dump_file, access, false); > + } > + unscalarizable_region = true; > + } > > if (grp_same_access_path > && !same_access_path_p (access->expr, ac2->expr)) > @@ -3127,7 +3151,9 @@ propagate_subaccesses_from_rhs (struct access *lacc, > struct access *racc) > ret = true; > subtree_mark_written_and_rhs_enqueue (lacc); > } > - if (!lacc->first_child && !racc->first_child) > + if (!lacc->first_child > + && !racc->first_child > + && !types_risk_mangled_binary_repr_p (racc->type, lacc->type)) > { > /* We are about to change the access type from aggregate to scalar, >so we need to put the reverse flag onto the access, if any. */ > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook
On Tue, Aug 20, 2024 at 3:24 PM H.J. Lu wrote: > > On Tue, Aug 20, 2024 at 2:03 AM Richard Biener > wrote: > > > > On Wed, Aug 14, 2024 at 3:15 PM H.J. Lu wrote: > > > > > > The new hook allows the linker plugin to distinguish calls to > > > claim_file_handler that know the object is being used by the linker > > > (from ldmain.c:add_archive_element), from calls that don't know it's > > > being used by the linker (from elf_link_is_defined_archive_symbol); in > > > the latter case, the plugin should avoid including the unused LTO archive > > > members in linker output. To get the proper support for archives with > > > LTO common symbols, the linker fix for > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=32083 > > > > > > is required. > > > > > > PR lto/116361 > > > * lto-plugin.c (claim_file_handler_v2): Include the LTO object > > > only if it is known to be used for link output. > > > > > > Signed-off-by: H.J. Lu > > > --- > > > lto-plugin/lto-plugin.c | 20 > > > 1 file changed, 12 insertions(+), 8 deletions(-) > > > > > > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c > > > index 152648338b9..2d2bfa60d42 100644 > > > --- a/lto-plugin/lto-plugin.c > > > +++ b/lto-plugin/lto-plugin.c > > > @@ -1286,13 +1286,17 @@ claim_file_handler_v2 (const struct > > > ld_plugin_input_file *file, int *claimed, > > > lto_file.symtab.syms); > > >check (status == LDPS_OK, LDPL_FATAL, "could not add symbols"); > > > > We are still doing add_symbols, shouldn't what we do depend on what > > that does? The > > If status != LDPS_OK, the plugin will abort because of LDPL_FATAL. > > > function comment says > > > >If KNOWN_USED, the object is known by the linker > >to be used, or an older API version is in use that does not provide that > >information; otherwise, the linker is only determining whether this is > >a plugin object and it should not be registered as having offload data if > >not claimed by the plugin. 
> > > > where do you check "if not claimed by the plugin"? I think this at least > > needs > > clarification with the change. > > See my reply below. > > > > - LOCK_SECTION; > > > - num_claimed_files++; > > > - claimed_files = > > > - xrealloc (claimed_files, > > > - num_claimed_files * sizeof (struct plugin_file_info)); > > > - claimed_files[num_claimed_files - 1] = lto_file; > > > - UNLOCK_SECTION; > > > + /* Include it only if it is known to be used for link output. */ > > > + if (known_used) > > > + { > > > + LOCK_SECTION; > > > + num_claimed_files++; > > > + claimed_files = > > > + xrealloc (claimed_files, > > > + num_claimed_files * sizeof (struct > > > plugin_file_info)); > > > + claimed_files[num_claimed_files - 1] = lto_file; > > > + UNLOCK_SECTION; > > > + } > > > > > >*claimed = 1; > > > } > > > @@ -1313,7 +1317,7 @@ claim_file_handler_v2 (const struct > > > ld_plugin_input_file *file, int *claimed, > > >if (*claimed && !obj.offload && offload_files_last_lto == NULL) > > > offload_files_last_lto = offload_files_last; > > > > > > - if (obj.offload && (known_used || obj.found > 0)) > > > + if (obj.offload && known_used && obj.found > 0) > > The offload data is included when it is claimed by the plugin > even if known_used is 0. It looks quite odd to me. To me the whole 'known_used' thing looks odd - I would have expected the linker to do two round-trips for archives maybe; first with known_used == 0, just getting the add_symbol calls (aka, get the LTO symbol table), then the linker computes whether the archive is used and if it is, re-do the claim_file hook with known_used == 1. Is that how it is done? Otherwise how should the plugin know whether the file should be added or not? Will the linker take care of that then? Where is the API documented? I think how known_used is to be used needs better documentation. Did you look at how other linkers use known_used? Sorry for just asking questions and having no answers ;) Richard.
> Since > can't test it and it isn't needed for PR lto/116361, I dropped > this change in the v2 patch: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660539.html > > If you agree that this change is correct, I can include it and update > comments. > > > > { > > >/* Add file to the list. The order must be exactly the same as > > > the final > > > order after recompilation and linking, otherwise host and target > > > tables > > > -- > > > 2.46.0 > > > > > > > -- > H.J.
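Under the patch quoted above, the plugin still sets `*claimed` for every LTO object but records it in `claimed_files` only when `known_used` is set. A minimal C model of that registration step, as a sketch for following the questions in this thread: `struct file_info`, `record_claim` and `check_record_claim` are hypothetical names, plain `realloc` stands in for `xrealloc`, and `LOCK_SECTION`/`UNLOCK_SECTION` are omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct plugin_file_info.  */
struct file_info
{
  const char *name;
};

static struct file_info *claimed_files;
static unsigned num_claimed_files;

/* Model of the patched hunk: an LTO object is always reported as
   claimed, but it is only recorded for link output when the linker
   says it is known to be used.  Returns false if growing the array
   fails.  */
static bool
record_claim (struct file_info file, bool known_used, int *claimed)
{
  if (known_used)
    {
      struct file_info *grown
        = realloc (claimed_files,
                   (num_claimed_files + 1) * sizeof *claimed_files);
      if (grown == NULL)
        return false;
      claimed_files = grown;
      claimed_files[num_claimed_files++] = file;
    }
  *claimed = 1;
  return true;
}

/* A probe (as from elf_link_is_defined_archive_symbol) claims the
   object without recording it; a real use (as from
   add_archive_element) records it.  */
static void
check_record_claim (void)
{
  int claimed = 0;
  struct file_info probe = { "libfoo.a(a.o)" };
  struct file_info used = { "libfoo.a(b.o)" };

  bool ok = record_claim (probe, false, &claimed);
  assert (ok && claimed == 1 && num_claimed_files == 0);

  claimed = 0;
  ok = record_claim (used, true, &claimed);
  assert (ok && claimed == 1 && num_claimed_files == 1);
}
```

Whether the linker later calls the hook again with `known_used` set for a probed member is exactly the open question in Richard's reply.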
Re: [PATCH] MATCH: add abs support for half float
Hi Richard, > On 20 Aug 2024, at 6:09 pm, Richard Biener wrote: > > External email: Use caution opening links or attachments > > > On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah > wrote: >> >> Thanks for the comments. >> >>> On 2 Aug 2024, at 8:36 pm, Richard Biener >>> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah >>> wrote: > On 1 Aug 2024, at 10:46 pm, Richard Biener > wrote: > > External email: Use caution opening links or attachments > > > On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah > wrote: >> >> >> On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski wrote: >>> >>> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah >>> wrote: On Thu, Jul 25, 2024 at 10:19 PM Richard Biener wrote: > > On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah > wrote: >> >> On Tue, Jul 23, 2024 at 11:56 PM Richard Biener >> wrote: >>> >>> On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah >>> wrote: On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski wrote: > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah > wrote: >> >> Revised based on the comment and moved it into existing patterns >> as. >> >> gcc/ChangeLog: >> >> * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A. >> Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/tree-ssa/absfloat16.c: New test. > > The testcase needs to make sure it runs only for targets that > support > float16 so like: > > /* { dg-require-effective-target float16 } */ > /* { dg-add-options float16 } */ Added in the attached version. >>> >>> + /* (type)A >=/> 0 ? 
A : -A same as abs (A) */ >>> (for cmp (ge gt) >>> (simplify >>> - (cnd (cmp @0 zerop) @1 (negate @1)) >>> -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0)) >>> -&& !TYPE_UNSIGNED (TREE_TYPE(@0)) >>> -&& bitwise_equal_p (@0, @1)) >>> + (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2)) >>> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1)) >>> +&& !TYPE_UNSIGNED (TREE_TYPE (@1)) >>> +&& ((VECTOR_TYPE_P (type) >>> + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE >>> (@1))) >>> + || (!VECTOR_TYPE_P (type) >>> + && (TYPE_PRECISION (TREE_TYPE (@1)) >>> + <= TYPE_PRECISION (TREE_TYPE (@0) >>> +&& bitwise_equal_p (@1, @2)) >>> >>> I wonder about the bitwise_equal_p which tests @1 against @2 now >>> with the convert still applied to @1 - that looks odd. You are >>> allowing >>> sign-changing conversions but doesn't that change ge/gt behavior? >>> Also why are sign/zero-extensions not OK for vector types? >> Thanks for the review. >> My main motivation here is for _Float16 as below. >> >> _Float16 absfloat16 (_Float16 x) >> { >> float _1; >> _Float16 _2; >> _Float16 _4; >> [local count: 1073741824]: >> _1 = (float) x_3(D); >> if (_1 < 0.0) >> goto ; [41.00%] >> else >> goto ; [59.00%] >> [local count: 440234144]: >> _4 = -x_3(D); >> [local count: 1073741824]: >> # _2 = PHI <_4(3), x_3(D)(2)> >> return _2; >> } >> >> This is why I added bitwise_equal_p test of @1 against @2 with >> TYPE_PRECISION checks. >> I agree that I will have to check for sign-changing conversions. >> >> Just to keep it simple, I disallowed vector types. I am not sure if >> this would hit vec types. I am happy to handle this if that is >> needed. > > I think with __builtin_convertvector you should be able to construct > a testcase that does Thanks. For the pattern, ``` /* A >=/> 0 ?
A : -A same as abs (A) */ (for cmp (ge gt) (simplify (cnd (cmp @0 zerop) @1 (negate @1)) (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0)) && !TYPE_UNSIGNED (TREE_TYPE(@0)) && bitwise_equal_p (@0, @1)) (if (TYPE_UNSIGNED (type)) (absu:type @0) (abs @0) ``` the vector type doesn't se
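The `!HONOR_SIGNED_ZEROS` guard in these patterns is what makes the rewrite valid: `A >= 0 ? A : -A` and `abs (A)` differ for a negative zero. A minimal C sketch of the source shape, using `float` widened to `double` as a portable stand-in for the `_Float16`-to-`float` conversion in the example (widening between IEEE formats is exact, so the comparison result does not change); `cond_abs` and `check_cond_abs` are hypothetical names.

```c
#include <assert.h>
#include <math.h>    /* signbit */

/* Source-level shape of "(type)A >= 0 ? A : -A": compare the widened
   value, but select between the original operand and its negation.  */
static float
cond_abs (float a)
{
  return (double) a >= 0.0 ? a : -a;
}

/* Where cond_abs agrees with a true absolute value, and where it
   does not.  */
static void
check_cond_abs (void)
{
  assert (cond_abs (-2.5f) == 2.5f);
  assert (cond_abs (3.0f) == 3.0f);
  /* -0.0 >= 0.0 is true, so -0.0 is returned unchanged, whereas abs
     would return +0.0.  The values compare equal; the sign bits do
     not.  */
  assert (cond_abs (-0.0f) == 0.0f);
  assert (signbit (cond_abs (-0.0f)));
}
```

With default flags GCC honors signed zeros for `float`, so this form should not be turned into `ABS_EXPR`; with `-fno-signed-zeros` the pattern may apply.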
Re: [PATCH] ifcvt: Fix force_operand ICE due to noce_convert_multiple_sets [PR116353]
On Tue, Aug 13, 2024 at 10:48 PM Jeff Law wrote: > > > > On 8/13/24 5:57 AM, Manolis Tsamis wrote: > > Now that more operations are allowed for noce_convert_multiple_sets, we > > need to > > check noce_can_force_operand on the sequence before calling > > try_emit_cmove_seq. > > Otherwise an inappropriate argument may be given to copy_to_mode_reg and > > result > > in an ICE. > > > > Fix-up for the recent ifcvt commit 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 > > > > PR tree-optimization/116353 > > > > gcc/ChangeLog: > > > > * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check > > noce_can_force_operand. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr116353.c: New test. > OK. > > Note I'm not entirely sure that noce_can_force_operand is sufficient > based on what I've seen on the v8 port. > > What I'm seeing on the v8 port is that we're trying to create a > conditional move where one arm is a rotate expression. That in and of > itself isn't an issue. The port does (of course) expect a register > operand, so we use force_operand to get the result into a GPR. So far, > so good. > > The ifcvt code does check noce_can_force_operand which returns true. I > don't remember the precise details other than it looked reasonable. So > still, so far, so good. > > The problem is that the v850 doesn't have a generalized rotation pattern. > The expander will FAIL for most rotate counts and there's no alternate > synthesis currently defined in the optabs interface for a word mode > rotate. So that in turn causes force_operand to return NULL_RTX to its > caller and boom! > > The rotation patterns are allowed to FAIL if I'm reading the docs > correctly.
So it seems like what the v8 port is doing is valid, but it > is causing a segfault and this testsuite failure: > > > Tests that now fail, but worked before (4 tests): > > > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: > > gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer > > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > > errors) > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: > > gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer > > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > > errors) > > v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c > > -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer > > -finline-functions (test for excess errors) > > v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 > > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer > > -finline-functions (test for excess errors) > > > But perhaps it isn't that bad in practice. I can fix this by removing a > bit of what I expect is unnecessary code in the v850 port. > Hi Jeff, Yes, what you're saying is right, testing noce_can_force_operand for SET_SRC is not sufficient. I don't think you should change the v850 backend, this should be fixed in ifcvt. As far as my analysis goes this looks to be the same issue with PR116358. We can force_operand SET_SRC just fine, but that doesn't imply we can force_operand its individual arguments. I'm testing a solution that involves checking noce_can_force_operand in try_emit_cmove_seq, so that we know if it's safe to call emit_conditional_move (essentially an extension of this initial fix). Thanks, Manolis > Jeff
[Fortran, Patch, PR86468, v1] Follow up: Remove obsolete VIEW_CONVERT
Hi all, the attached small patch removes a VIEW_CONVERT that I erroneously inserted while patching pr110033. PR86468 fixes the (co-)rank computation and therefore this VIEW_CONVERT is IMO obsolete. I think it may cause hard-to-find runtime bugs in the future and would therefore like to remove it. Regtests ok on x86_64-pc-linux-gnu. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 3a83901e64568967600d5ba643723ae2ad80e0ac Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 21 Aug 2024 11:22:57 +0200 Subject: [PATCH] [Fortran] Remove unnecessary view_convert obsoleted by [PR86468]. This patch removes an unnecessary view_convert in trans_associate to prevent hard-to-find runtime errors in the future. The view_convert was erroneously introduced without understanding why the ranks of the arrays being assigned differ. The ranks are fixed by PR86468 now and the view_convert is obsolete. gcc/fortran/ChangeLog: PR fortran/86468 * trans-stmt.cc (trans_associate_var): Remove superfluous view_convert. --- gcc/fortran/trans-stmt.cc | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc index 023b1739b85..d92ca6477e4 100644 --- a/gcc/fortran/trans-stmt.cc +++ b/gcc/fortran/trans-stmt.cc @@ -2031,9 +2031,7 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block) gfc_class_data_get (GFC_DECL_SAVED_DESCRIPTOR (tmp))); } else - gfc_add_modify (&se.pre, sym->backend_decl, - build1 (VIEW_CONVERT_EXPR, -TREE_TYPE (sym->backend_decl), se.expr)); + gfc_add_modify (&se.pre, sym->backend_decl, se.expr); if (unlimited) { -- 2.46.0
Re: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
> > > Why? I don't think the vectorizer relies on a particular else > > > value? I'd say it would be appropriate for if-conversion to > > > use "ANY" and for the vectorizer to then pick a supported > > > version and/or enforce the else value it needs via a blend? > > > > In PR115336 we have something like > > > > _Bool iftmp.0_113; > > _Bool iftmp.0_114; > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D)); > > iftmp.0_114 = _47 | iftmp.0_113; > > > > which assumes zeroing. > > I see - is that some trick ifcvt performs? I can't immediately > see the connection to the PR and it only contains RISC-V assembly > analysis. It happens in predicate_scalar_phi where we build the cond_expr. After converting iftmp.0_114 = PHI we have _BoolD.2746 _47; iftmp.0_114 = _47 ? 1 : iftmp.0_113; which is folded into iftmp.0_114 = _47 | iftmp.0_113; I should really have documented that and more in the PR already... So it's not an ifcvt trick but rather match. Another related case is PR116059. > > In order to circumvent that we could use COND_IOR > > but I suppose that wouldn't be optimized away even on targets that zero > > masked elements? "ANY" would seem to be wrong here. > > What I was trying to say is that of course any transform we perform > that requires zero-masking should either make .MASK_LOAD perform > that or add a COND_EXPR. But it shouldn't be required to make > all .MASK_LOADs be zero-masked, no? > > I'm of course fine if you think that's the best way for RISC-V given > other targets are likely unaffected as they can perform zero-masking. No, the less zeroing the better of course :) Richard S's point was to make the COND_EXPR explicit, so that e.g. a MASK_LOAD (mask, ..., 1) does not appear as cheap as MASK_LOAD (mask, ..., 0) on zeroing targets. From this I kind of jumped to the conclusion (see below) that we might want it everywhere.
With the patches as is, ifcvt would enforce the zero here while all other masked-load occurrences in the vectorizer would just query the target's preferred else value and simply use that without blend/cond_expr. > > What I didn't do (in the posted version, just locally) is an explicit > > VEC_COND_EXPR after each masked (gather/load lanes) call the vectorizer > > does. Do we need that? AFAICT loop masking (be it len style or > > fully-masked style) should be safe. > > Well, why should we need that? There seems to be the assumption that > .MASK_LOAD is zero-masked in very few places (PR115336, but not > identified there), if we'd assume this everywhere there would be > way more issues with RISC-V? Ok, I was already pretty sure we don't need it - and I'm glad to hear it confirmed. I was just thinking for consistency reasons we might want a masked load to always look like foo123 = .MASK_..._LOAD (mask, ..., else) foo124 = COND_EXPR (mask, foo123, 0); where foo124 would be optimized away (or not even emitted) for zeroing targets. That way subsequent code could always rely on zero. But as this requirement seems very rare it doesn't look like a useful invariant to enforce. All in all, it seems we don't need major changes to the approach. I'm going to work on the comments for the other patches. -- Regards Robin
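Why the `_47 | iftmp.0_113` fold depends on zeroed inactive lanes can be modelled lane by lane in C. This is a sketch under stated assumptions: `masked_load` is a hypothetical scalar stand-in for `.MASK_LOAD` with an explicit else value, `cond` plays the role of `_47`, and the load is assumed to be active exactly where `cond` is false, as on the PHI path it replaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical scalar model of .MASK_LOAD: active lanes read SRC,
   inactive lanes receive ELSE_VAL.  */
static void
masked_load (uint8_t *dst, const uint8_t *src, const bool *mask,
             uint8_t else_val)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = mask[i] ? src[i] : else_val;
}

/* Return true if "cond ? 1 : loaded" (the COND_EXPR built by
   if-conversion) and its fold "cond | loaded" agree on every lane
   when inactive lanes hold ELSE_VAL.  */
static bool
ior_fold_is_safe (uint8_t else_val)
{
  const bool cond[LANES] = { true, false, true, false };  /* _47 */
  const uint8_t mem[LANES] = { 1, 0, 0, 1 };  /* valid _Bool bytes */
  bool mask[LANES];
  uint8_t loaded[LANES];

  /* Assumed: the load is only needed where cond is false.  */
  for (int i = 0; i < LANES; i++)
    mask[i] = !cond[i];
  masked_load (loaded, mem, mask, else_val);

  for (int i = 0; i < LANES; i++)
    {
      uint8_t sel = cond[i] ? 1 : loaded[i];
      uint8_t ior = (uint8_t) (cond[i] | loaded[i]);
      if (sel != ior)
        return false;
    }
  return true;
}

/* A zero else value keeps the fold valid; an all-ones else value
   (-1 as a byte) yields 0xff instead of 1 in the masked lanes.  */
static void
check_else_values (void)
{
  assert (ior_fold_is_safe (0x00));
  assert (!ior_fold_is_safe (0xff));
}
```

Either a zero else value or an explicit `COND_EXPR` after the load restores agreement, which is the trade-off discussed in this thread.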
Ping^2: C++/ME patch ping
Hi, Pinging these patches again: - https://inbox.sourceware.org/20240807131613.526335-1-ar...@aarsen.me/ - https://inbox.sourceware.org/20240802211503.3992610-2-ar...@aarsen.me/ Thanks in advance, have a lovely day. -- Arsen Arsenović
Re: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
On Wed, 21 Aug 2024, Robin Dapp wrote: > > > > Why? I don't think the vectorizer relies on a particular else > > > > value? I'd say it would be appropriate for if-conversion to > > > > use "ANY" and for the vectorizer to then pick a supported > > > > version and/or enforce the else value it needs via a blend? > > > > > > In PR115336 we have something like > > > > > > _Bool iftmp.0_113; > > > _Bool iftmp.0_114; > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D)); > > > iftmp.0_114 = _47 | iftmp.0_113; > > > > > > which assumes zeroing. > > > > I see - is that some trick ifcvt performs? I can't immediately > > see the connection to the PR and it only contains RISC-V assembly > > analysis. > > It happens in predicate_scalar_phi where we build the cond_expr. > > After converting > > iftmp.0_114 = PHI > > we have > > _BoolD.2746 _47; > iftmp.0_114 = _47 ? 1 : iftmp.0_113; > which is folded into > iftmp.0_114 = _47 | iftmp.0_113; > > I should really have documented that and more in the PR already... > So it's not an ifcvt trick but rather match. _47 was the .MASK_LOAD def, right? It's not exactly obvious what goes wrong - the transform above is correct - it's only "unexpected" for the lanes that were masked. So the actual bug must be downstream from iftmp.0_114. I think one can try to reason about the ifcvt (scalar) code by assuming the .MASK_LOAD def would be undefined. Then we'd have _47(D) ? 1 : iftmp.0_113 -> _47(D) | iftmp.0_113, I think that's at most fishy as the COND_EXPR has a well-defined value while the IOR might spill "undefined" elsewhere causing divergence. Is that what is actually happening? Richard. > Another related case is PR116059. > > > > In order to circumvent that we could use COND_IOR > > > but I suppose that wouldn't be optimized away even on targets that zero > > > masked elements? "ANY" would seem to be wrong here.
> > > > What I was trying to say is that of course any transform we perform > > that requires zero-masking should either make .MASK_LOAD perform > > that or add a COND_EXPR. But it shouldn't be required to make > > all .MASK_LOADs be zero-masked, no? > > > > I'm of course fine if you think that's the best way for RISC-V given > > other targets are likely unaffected as they can perform zero-masking. > > No, the less zeroing the better of course :) > > Richard S's point was to make the COND_EXPR explicit, so that e.g. > a MASK_LOAD (mask, ..., 1) does not appear as cheap as > MASK_LOAD (mask, ..., 0) on zeroing targets. > > From this I kind of jumped to the conclusion (see below) that we might want > it everywhere. > > With the patches as is, ifcvt would enforce the zero here while all other > masked-load occurrences in the vectorizer would just query the target's > preferred else value and simply use that without blend/cond_expr. > > > > What I didn't do (in the posted version, just locally) is an explicit > > > VEC_COND_EXPR after each masked (gather/load lanes) call the vectorizer > > > does. Do we need that? AFAICT loop masking (be it len style or > > > fully-masked style) should be safe. > > > > Well, why should we need that? There seems to be the assumption that > > .MASK_LOAD is zero-masked in very few places (PR115336, but not > > identified there), if we'd assume this everywhere there would be > > way more issues with RISC-V? > > Ok, I was already pretty sure we don't need it - and I'm glad to hear it confirmed. > I was just thinking for consistency reasons we might want a masked > load to always look like > foo123 = .MASK_..._LOAD (mask, ..., else) > foo124 = COND_EXPR (mask, foo123, 0); > where foo124 would be optimized away (or not even emitted) for zeroing > targets. That way subsequent code could always rely on zero. > But as this requirement seems very rare it doesn't look like a useful > invariant to enforce.
> > All in all, it seems we don't need major changes to the approach. > I'm going to work on the comments for the other patches. > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] libstdc++: Check ios::uppercase for ios::fixed floating-point output [PR114862]
This is still pending a decision by LEWG, but I've pushed it to trunk anyway. We can always revert it before GCC 15 is released if the committee decides against it, but this way we might get user feedback on it. On Thu, 1 Aug 2024 at 22:41, Jonathan Wakely wrote: > > Tested x86_64-linux. > > -- >8 -- > > This is LWG 4084 which I filed recently. LWG seems to support making the > change, so that std::num_put can use the %F format for floating-point > numbers. > > libstdc++-v3/ChangeLog: > > PR libstdc++/114862 > * src/c++98/locale_facets.cc (__num_base::_S_format_float): > Check uppercase flag for fixed format. > * testsuite/22_locale/num_put/put/char/lwg4084.cc: New test. > --- > libstdc++-v3/src/c++98/locale_facets.cc | 13 -- > .../22_locale/num_put/put/char/lwg4084.cc | 46 +++ > 2 files changed, 54 insertions(+), 5 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/22_locale/num_put/put/char/lwg4084.cc > > diff --git a/libstdc++-v3/src/c++98/locale_facets.cc > b/libstdc++-v3/src/c++98/locale_facets.cc > index fa469b1b872..02f53fd5ec1 100644 > --- a/libstdc++-v3/src/c++98/locale_facets.cc > +++ b/libstdc++-v3/src/c++98/locale_facets.cc > @@ -84,17 +84,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > if (__mod) >*__fptr++ = __mod; > -// [22.2.2.2.2] Table 58 > +// C++11 [facet.num.put.virtuals] Table 88 > +// _GLIBCXX_RESOLVE_LIB_DEFECTS > +// 4084. std::fixed ignores std::uppercase > +bool __upper = __flags & ios_base::uppercase; > if (__fltfield == ios_base::fixed) > - *__fptr++ = 'f'; > + *__fptr++ = __upper ? 'F' : 'f'; > else if (__fltfield == ios_base::scientific) > - *__fptr++ = (__flags & ios_base::uppercase) ? 'E' : 'e'; > + *__fptr++ = __upper ? 'E' : 'e'; > #if _GLIBCXX_USE_C99_STDIO > else if (__fltfield == (ios_base::fixed | ios_base::scientific)) > - *__fptr++ = (__flags & ios_base::uppercase) ? 'A' : 'a'; > + *__fptr++ = __upper ? 'A' : 'a'; > #endif > else > - *__fptr++ = (__flags & ios_base::uppercase) ? 'G' : 'g'; > + *__fptr++ = __upper ? 
'G' : 'g'; > *__fptr = '\0'; >} > > diff --git a/libstdc++-v3/testsuite/22_locale/num_put/put/char/lwg4084.cc > b/libstdc++-v3/testsuite/22_locale/num_put/put/char/lwg4084.cc > new file mode 100644 > index 000..b7c7da11f86 > --- /dev/null > +++ b/libstdc++-v3/testsuite/22_locale/num_put/put/char/lwg4084.cc > @@ -0,0 +1,46 @@ > +// { dg-do run } > +// LWG 4084. std::fixed ignores std::uppercase > +// PR libstdc++/114862 std::uppercase not applying to nan's and inf's > + > +#include > +#include > +#include > +#include > + > +void > +test_nan() > +{ > + std::ostringstream out; > + double nan = std::numeric_limits::quiet_NaN(); > + out << std::fixed; > + out << ' ' << nan << ' ' << -nan; > + out << std::uppercase; > + out << ' ' << nan << ' ' << -nan; > + out << std::showpoint; > + out << ' ' << nan << ' ' << -nan; > + out << std::showpos; > + out << ' ' << nan << ' ' << -nan; > + VERIFY( out.str() == " nan -nan NAN -NAN NAN -NAN +NAN -NAN" ); > +} > + > +void > +test_inf() > +{ > + std::ostringstream out; > + double inf = std::numeric_limits::infinity(); > + out << std::fixed; > + out << ' ' << inf << ' ' << -inf; > + out << std::uppercase; > + out << ' ' << inf << ' ' << -inf; > + out << std::showpoint; > + out << ' ' << inf << ' ' << -inf; > + out << std::showpos; > + out << ' ' << inf << ' ' << -inf; > + VERIFY( out.str() == " inf -inf INF -INF INF -INF +INF -INF" ); > +} > + > +int main() > +{ > + test_nan(); > + test_inf(); > +} > -- > 2.45.2 >
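To make the effect of the locale_facets.cc change concrete, here is a small standalone sketch (not libstdc++ code) of how the printf conversion character could be chosen from the stream flags once std::uppercase is honoured for std::fixed, and what printf-style formatting then produces for infinity. The names conversion_for, format_value and demo are hypothetical. The real __num_base::_S_format_float also writes flag, precision and length-modifier characters (the diff shows the __mod modifier), which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ios>
#include <limits>
#include <string>

// Hypothetical helper: pick the conversion character the way the patched
// code does, i.e. every branch (including std::fixed) honours uppercase.
char conversion_for(std::ios_base::fmtflags flags)
{
  const std::ios_base::fmtflags field = flags & std::ios_base::floatfield;
  const bool upper = (flags & std::ios_base::uppercase) != 0;
  if (field == std::ios_base::fixed)
    return upper ? 'F' : 'f';            // LWG 4084: %F instead of always %f
  if (field == std::ios_base::scientific)
    return upper ? 'E' : 'e';
  if (field == (std::ios_base::fixed | std::ios_base::scientific))
    return upper ? 'A' : 'a';            // hexfloat
  return upper ? 'G' : 'g';              // default floatfield
}

// Hypothetical helper: format one value with a bare "%<conv>" string
// (no precision, showpos or showpoint handling).
std::string format_value(double v, std::ios_base::fmtflags flags)
{
  const char fmt[] = {'%', conversion_for(flags), '\0'};
  char buf[512];                          // large enough for %f of any double
  const int n = std::snprintf(buf, sizeof buf, fmt, v);
  assert(n >= 0 && n < static_cast<int>(sizeof buf));
  return std::string(buf, static_cast<std::size_t>(n));
}

// Usage: mirrors the inf part of lwg4084.cc at the printf level.
void demo()
{
  const double inf = std::numeric_limits<double>::infinity();
  assert(format_value(inf, std::ios_base::fixed) == "inf");
  assert(format_value(-inf, std::ios_base::fixed | std::ios_base::uppercase)
         == "-INF");
}
```

Since %F is a C99 conversion that prints infinities and NaNs in upper case, choosing it for std::fixed | std::uppercase lets the C library produce the "INF"/"NAN" spellings that the new test expects.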
RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
> -Original Message- > From: Richard Biener > Sent: Tuesday, August 20, 2024 10:36 AM > To: Richard Sandiford > Cc: Prathamesh Kulkarni ; Thomas Schwinge > ; gcc-patches@gcc.gnu.org > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for > accelerator > > External email: Use caution opening links or attachments > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford > : > > > > Prathamesh Kulkarni writes: > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index > >> cbf6041fd68..0420183faf8 100644 > >> --- a/gcc/lto-streamer-in.cc > >> +++ b/gcc/lto-streamer-in.cc > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not > see > >> #include "debug.h" > >> #include "alloc-pool.h" > >> #include "toplev.h" > >> +#include "stor-layout.h" > >> > >> /* Allocator used to hold string slot entries for line map > streaming. > >> */ static struct object_allocator > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@ lto_read_tree_1 > (class lto_input_block *ib, class data_in *data_in, tree expr) > >> with -g1, see for example PR113488. */ > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == > expr) > >>DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE; > >> + > >> +#ifdef ACCEL_COMPILER > >> + /* For decl with aggregate type, host streams out VOIDmode. > >> + Compute the correct DECL_MODE by calling relayout_decl. */ > >> + if ((VAR_P (expr) > >> + || TREE_CODE (expr) == PARM_DECL > >> + || TREE_CODE (expr) == FIELD_DECL) > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)) > >> + && DECL_MODE (expr) == VOIDmode) > >> +relayout_decl (expr); > >> +#endif > > > > Genuine question, but: is relayout_decl safe in this context? It > does > > a lot more than just reset the mode. It also applies the target > ABI's > > preferences wrt alignment, padding, and so on, rather than > preserving > > those of the host's. > > It would be better to just recompute the mode here. 
Hi,
The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
I checked that layout_decl_type does the same thing when setting the decl mode, except for bit-fields. Since bit-fields cannot have aggregate type, I am assuming that setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) would be OK in this case?

Sorry if this sounds like a silly question, but why would it be unsafe to call relayout_decl for variables that are mapped to the accelerator, even if it would not preserve the host's properties? I assumed we want to assign the accelerator's ABI properties to mapped decls (mode being one of them), or am I misunderstanding?

Signed-off-by: Prathamesh Kulkarni

Thanks,
Prathamesh

> > Richard > > > Thanks, > > Richard > > > > > >> } > >> } > >> > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index > >> 10c0809914c..0ff8bd1171e 100644 > >> --- a/gcc/stor-layout.cc > >> +++ b/gcc/stor-layout.cc > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char > *name, tree fields, > >> layout_decl (TYPE_NAME (type), 0); > >> } > >> > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */ > >> + > >> +void compute_array_mode (tree type) > >> +{ > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE); > >> + > >> + SET_TYPE_MODE (type, BLKmode); > >> + if (TYPE_SIZE (type) != 0 > >> + && ! targetm.member_type_forces_blk (type, VOIDmode) > >> + /* BLKmode elements force BLKmode aggregate; > >> + else extract/store fields may lose.
*/ > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type > >> +{ > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type), > >> + TYPE_SIZE (type))); > >> + if (TYPE_MODE (type) != BLKmode > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE > (type))) > >> +{ > >> + TYPE_NO_FORCE_BLK (type) = 1; > >> + SET_TYPE_MODE (type, BLKmode); > >> +} > >> +} > >> +} > >> + > >> /* Calculate the mode, size, and alignment for TYPE. > >>For an array type, calculate the element separation as well. > >>Record TYPE on the chain of permanent or temporary types @@ > >> -2709,24 +2735,7 @@ layout_type (tree type) > >>align = MAX (align, BITS_PER_UNIT); #endif > >>SET_TYPE_ALIGN (type, align); > >> -SET_TYPE_MODE (type, BLKmode); > >> -if (TYPE_SIZE (type) != 0 > >> -&& ! targetm.member_type_forces_blk (type, VOIDmode) > >> -/* BLKmode elements force BLKmode aggregate; > >> - else extract/store fields may lose. */ > >> -&& (TYPE_MODE (TREE_TYPE (type)) != BLKmode > >> -|| TYPE_NO_FORCE_BLK (TREE_TYPE (type > >> - { > >> -SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type), > >> -
Re: [RFC] Support single lane SLP early break
On Tue, 20 Aug 2024, Tamar Christina wrote: > Hi, > > I've been working on a prototype of moving early break to SLP. > > As we've discussed on IRC I've decided to first try adding the gconds as roots > and start SLP discovery using them as roots. > > This works great and doesn't require any changed to build_slp, it also has the > additional benefit in that we can easily (as a follow up) add groups of > gconds and then try to SLP the roots together if the operations are the same > and then decompose the tree based on the roots if not. > > So it looks like using the roots are the best approach. However I've hit some > issues that I could solve, but would require me to modify large chunks of code > and would like your input before I start. > > 1. roots are currently not analyzed or code-gened through vectorizable_*. >this is because it looks like only things used as roots so far are things >that all targets support (like constructors) or that will be lowered by >veclower later. This is easy to fix I can work roots into the analysis >part in vect_slp_analyze_node_operations and pass enough information to >vectorize_slp_instance_root_stmt to be able to use > vectorizable_early_break. >I have a prototype of this currently working but it's a hack and need to do >it properly if it's the way you'd like to go. There is currently no "explicit" separate analysis of the root but only vect_slp_analyze_operations doing &cost_vec) /* CTOR instances require vectorized defs for the SLP tree root. */ || (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor && (SLP_TREE_DEF_TYPE (SLP_INSTANCE_TREE (instance)) != vect_internal_def /* Make sure we vectorized with the expected type. */ || !useless_type_conversion_p (TREE_TYPE (TREE_TYPE (gimple_assign_rhs1 (instance->root_stmts[0]->stmt))), TREE_TYPE (SLP_TREE_VECTYPE (SLP_INSTANCE_TREE (instance)) /* Check we can vectorize the reduction. 
*/ || (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc && !vectorizable_bb_reduc_epilogue (instance, &cost_vec))) for the transform phase we do have vectorize_slp_instance_root_stmt (called by vect_schedule_slp). Both do not really fit the vectorizable_* API since how the root looks like really depends on the SLP instance kind. So it would be above where you'd hook in the required code, adding a slp_inst_kind_early_break or so. Factoring the analysis part into a vectorizable_slp_instance_root () function would be an improvement of course. > 2. consider the loop: > > #ifndef N > #define N 800 > #endif > unsigned vect_a[N]; > unsigned vect_b[N]; > > unsigned test4(unsigned x) > { > unsigned ret = 0; > for (int i = 0; i < N; i++) > { >vect_b[i] = x + i; >if (vect_a[i]*2 != x) > break; >vect_a[i] = x; > > } > return ret; > } > > The build part looks like: > > note: === vect_analyze_slp === > note: Analyzing vectorizable control flow: if (patt_6 != 0) > note: Starting SLP discovery for > note: patt_6 = _4 != x_9(D); > note: starting SLP discovery for node 0x5141280 > note: Build SLP for patt_6 = _4 != x_9(D); > note: precomputed vectype: vector(4) > note: nunits = 4 > note: vect_is_simple_use: operand x_9(D), type of def: external > note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, > +INF] MASK 0xfffe VALUE 0x0 > _3 * 2, type of def: internal > note: starting SLP discovery for node 0x51413a0 > note: Build SLP for _4 = _3 * 2; > note: precomputed vectype: vector(4) unsigned int > note: nunits = 4 > note: vect_is_simple_use: operand # VUSE <.MEM_10> > vect_aD.4416[i_15], type of def: internal > note: vect_is_simple_use: operand 2, type of def: constant > note: vect_is_simple_use: operand # VUSE <.MEM_10> > vect_aD.4416[i_15], type of def: internal > note: vect_is_simple_use: operand 2, type of def: constant > note: starting SLP discovery for node 0x5141430 > note: Build SLP for _3 = vect_a[i_15]; > note: precomputed vectype: vector(4) 
unsigned int > note: nunits = 4 > note: SLP discovery for node 0x5141430 succeeded > note: SLP discovery for node 0x51413a0 succeeded > note: SLP discovery for node 0x5141280 succeeded > note: SLP size 3 vs. limit 10. > note: Final SLP tree for instance 0x5208e30: > note: node 0x5141280 (max_nunits=4, refcnt=2) vector(4) > note: op template: patt_6 = _4 != x_9(D); > note: stmt 0 patt_6 = _4 != x_9(D); > note: children 0x5141310 0x51413a0 > note: node (external) 0x5141310 (max_nunits=1, refcnt=1) > note: { x_9(D) } > note: node 0x51413a0 (max_nunits=4, refcnt=2) vector(4) unsigned int > note: op template: _4 = _3 * 2; > note:
Re: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
> > > > _Bool iftmp.0_113; > > > > _Bool iftmp.0_114; > > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D)); > > > > iftmp.0_114 = _47 | iftmp.0_113; > > _BoolD.2746 _47; > > iftmp.0_114 = _47 ? 1 : iftmp.0_113; > > which is folded into > > iftmp.0_114 = _47 | iftmp.0_113; > > _47 was the .MASK_LOAD def, right? _47 is the inverted load mask, iftmp.0_113 is the MASK_LOAD result. Its mask is _169 where _169 = ~_47; > It's not exactly obvious what goes wrong - the transform above > is correct - it's only "unexpected" for the lanes that were > masked. So the actual bug must be downstream from iftmp.0_144. > > I think one can try to reason on the ifcvt (scalar) code by > assuming the .MASK_LOAD def would be undefined. Then we'd > have _47(D) ? 1 : iftmp.0_133 -> _47(D) | iftmp.0_133, I think > that's at most phishy as the COND_EXPR has a well-defined > value while the IOR might spill "undefined" elsewhere causing > divergence. Is that what is actually happening? After vectorization we recognize the mask (_47) as degenerate, i.e. all ones and, conversely, the masked load mask (_169) is all zeros. So we shouldn't really load anything. Optimized we have vect_patt_384.36_436 = .MASK_LEN_GATHER_LOAD (_435, vect__47.35_432, 1, { 0, ... }, { 0, ... }, _471, 0); vect_iftmp.37_439 = vect_patt_384.36_436 | { 1, ... }; We then re-use a non-zero vector register as masked load result. Its stale values cause the wrong result (which should be 1 everywhere). -- Regards Robin
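The value-level problem described above can be sketched with plain byte lanes. This model is hypothetical: four uint8_t lanes stand in for a vector of _Bool, and masked_load, select_form, ior_form and demo are illustrative helpers. It does not model RVV register reuse. It only shows why folding `_47 ? 1 : iftmp` into `_47 | iftmp` is safe only when the masked-off lanes of the load hold 0 or 1, which a guaranteed zero else value ensures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane stand-in for a vector of byte-sized _Bool values.
using Lanes = std::array<std::uint8_t, 4>;

// Model of a masked load: active lanes read memory, inactive lanes yield
// else_val (e.g. whatever stale value the destination register held).
Lanes masked_load(const Lanes& mem, const Lanes& mask, const Lanes& else_val)
{
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = mask[i] ? mem[i] : else_val[i];
  return r;
}

// The if-converted form before folding: cond ? 1 : x.
Lanes select_form(const Lanes& cond, const Lanes& x)
{
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = cond[i] ? std::uint8_t{1} : x[i];
  return r;
}

// The folded form: cond | x.  Matches select_form only if cond lanes are
// 0/1 and x lanes are 0/1 wherever cond is set.
Lanes ior_form(const Lanes& cond, const Lanes& x)
{
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>(cond[i] | x[i]);
  return r;
}

void demo()
{
  const Lanes cond = {1, 1, 1, 1};       // _47: degenerate, all lanes true
  const Lanes load_mask = {0, 0, 0, 0};  // _169 = ~_47: nothing is loaded
  const Lanes mem = {0, 1, 0, 1};
  const Lanes stale = {2, 0, 6, 1};      // leftover register contents

  const Lanes x = masked_load(mem, load_mask, stale);
  assert(select_form(cond, x) == (Lanes{1, 1, 1, 1}));
  assert(ior_form(cond, x) == (Lanes{3, 1, 7, 1}));  // lanes 0 and 2 wrong

  // With a zero else value the fold gives the expected all-ones result.
  const Lanes z = masked_load(mem, load_mask, Lanes{});
  assert(ior_form(cond, z) == select_form(cond, z));
}
```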
Re: [PATCH] vect: Multistep float->int conversion only with no trapping math
On Tue, Aug 20, 2024 at 3:35 PM Juergen Christ wrote: > > Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener: > > On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ > > wrote: > > > > > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > > > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ > > > > wrote: > > > > > > > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ > > > > > > wrote: > > > > > > > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping > > > > > > > > > math is > > > > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) > > > > > > > > > for the > > > > > > > > > intermediate steps. This only affects undefined behaviour > > > > > > > > > (casting > > > > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): > > > > > > > > > multi-step > > > > > > > > > float to int conversion only with trapping math and > > > > > > > > > correct > > > > > > > > > sign. > > > > > > > > > > > > > > > > > > Signed-off-by: Juergen Christ > > > > > > > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? 
> > > > > > > > > > > > > > > > > > --- > > > > > > > > > gcc/tree-vect-stmts.cc | 8 +--- > > > > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info > > > > > > > > > *vinfo, > > > > > > > > > break; > > > > > > > > > > > > > > > > > > cvt_type > > > > > > > > > - = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > > + = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > > + TYPE_UNSIGNED > > > > > > > > > (lhs_type)); > > > > > > > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > > > > FLOAT_EXPR (int -> float) > > > > > > > > a signed integer type is the natural one to use - as it's 2x > > > > > > > > wider > > > > > > > > than the original > > > > > > > > RHS type it's signedness doesn't matter. Note all float types > > > > > > > > should be > > > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on > > > > > > > > the intent IMO. > > > > > > > > > > > > > > > > Please drop it. > > > > > > > > > > > > > > Will do. Sorry about that. 
> > > > > > > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, > > > > > > > > > vectype_in); > > > > > > > > > if (cvt_type == NULL_TREE) > > > > > > > > > goto unsupported; > > > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info > > > > > > > > > *vinfo, > > > > > > > > >if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE > > > > > > > > > (rhs_mode)) > > > > > > > > > goto unsupported; > > > > > > > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > > > > { > > > > > > > > > cvt_type > > > > > > > > > - = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > > + = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > > + TYPE_UNSIGNED > > > > > > > > > (lhs_type)); > > > > > > > > > > > > > > > > Here it might be relevant for correctness - we have to choose > > > > > > > > between > > > > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be > > > > > > > > independent of the integer > > > > > > > > type we convert to? > > > > > > > > > > > > > > I was looking at this little program which contains undefined > > > > > > > behaviour: > > > > > > > > > > > > > > #include > > > > > > > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > > > > void > > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > > > > > > > void > > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > > > > { > > > > > > >
Re: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
On Wed, 21 Aug 2024, Robin Dapp wrote: > > > > > _Bool iftmp.0_113; > > > > > _Bool iftmp.0_114; > > > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D)); > > > > > iftmp.0_114 = _47 | iftmp.0_113; > > > > _BoolD.2746 _47; > > > iftmp.0_114 = _47 ? 1 : iftmp.0_113; > > > which is folded into > > > iftmp.0_114 = _47 | iftmp.0_113; > > > > > _47 was the .MASK_LOAD def, right? > > _47 is the inverted load mask, iftmp.0_113 is the MASK_LOAD result. > Its mask is _169 where _169 = ~_47; > > > It's not exactly obvious what goes wrong - the transform above > > is correct - it's only "unexpected" for the lanes that were > > masked. So the actual bug must be downstream from iftmp.0_144. > > > > I think one can try to reason on the ifcvt (scalar) code by > > assuming the .MASK_LOAD def would be undefined. Then we'd > > have _47(D) ? 1 : iftmp.0_133 -> _47(D) | iftmp.0_133, I think > > that's at most phishy as the COND_EXPR has a well-defined > > value while the IOR might spill "undefined" elsewhere causing > > divergence. Is that what is actually happening? > > After vectorization we recognize the mask (_47) as degenerate, > i.e. all ones and, conversely, the masked load mask (_169) is all zeros. > So we shouldn't really load anything. > > Optimized we have > > vect_patt_384.36_436 = .MASK_LEN_GATHER_LOAD (_435, vect__47.35_432, 1, { > 0, ... }, { 0, ... }, _471, 0); > vect_iftmp.37_439 = vect_patt_384.36_436 | { 1, ... }; > > We then re-use a non-zero vector register as masked load result. Its > stale values cause the wrong result (which should be 1 everywhere). And we fail to fold vect_patt_384.36_436 | { 1, ... } to { 1, ... }? Or is the issue that vector masks contain padding and with non-zero masking we'd have garbage in the padding and that leaks here? That is, _47 ? 1 : iftmp.0_113 -> _47 | iftmp.0_113 assumes there's exactly one bit in a bool, specifically it has assumptions on padding - I'd guess that *(char *)p = 17; _Bool _47 = *(_Bool *)p; ... = _47 ? 
1 : b; would have similar issues but eventually loading 17 into a _Bool invokes undefined behavior. So maybe the COND_EXPRs are only required for .MASK_LOADs of _Bool (or any other type with padding)? Richard.
Re: [PATCH] Do not emit a redundant DW_TAG_lexical_block for inlined subroutines
On 8/21/24 10:45, Richard Biener wrote: > On Wed, 21 Aug 2024, Richard Biener wrote: > >> On Tue, 20 Aug 2024, Bernd Edlinger wrote: >> >>> On 8/20/24 13:00, Richard Biener wrote: On Fri, Aug 16, 2024 at 12:49 PM Bernd Edlinger wrote: > > While this already works correctly for the case when an inlined > subroutine contains only one subrange, a redundant DW_TAG_lexical_block > is still emitted when the subroutine has multiple blocks. Huh. The point is that the inline context is a single scope block with no siblings - how did that get messed up? The patch unfortunately does not contain a testcase. >>> >>> Well, I became aware of this because I am working on a gdb patch, >>> which improves the debug experience of optimized C code, and to my surprise >>> the test case did not work with gcc-8, while gcc-9 and following were fine. >>> Initially I did not see what is wrong, therefore I started to bisect when >>> this changed, and so I found your patch, which removed some lexical blocks >>> in the debug info of this gdb test case: >>> >>> from binutils-gdb/gdb/testsuite/gdb.cp/step-and-next-inline.cc >>> in case you have the binutils-gdb already downloaded you can skip this: >>> $ git clone git://sourceware.org/git/binutils-gdb >>> $ cd binutils-gdb/gdb/testsuite/gdb.cp >>> $ gcc -g -O2 step-and-next-inline.cc >>> >>> when you look at the debug info with readelf -w a.out >>> you will see, that the function "tree_check" >>> is inlined three times, one looks like this >>> <2><86b>: Abbrev Number: 40 (DW_TAG_inlined_subroutine) >>> <86c> DW_AT_abstract_origin: <0x95b> >>> <870> DW_AT_entry_pc: 0x1175 >>> <878> DW_AT_GNU_entry_view: 0 >>> <879> DW_AT_ranges : 0x21 >>> <87d> DW_AT_call_file : 1 >>> <87e> DW_AT_call_line : 52 >>> <87f> DW_AT_call_column : 10 >>> <880> DW_AT_sibling : <0x8bf> >>> <3><884>: Abbrev Number: 8 (DW_TAG_formal_parameter) >>> <885> DW_AT_abstract_origin: <0x974> >>> <889> DW_AT_location: 0x37 (location list) >>> <88d> DW_AT_GNU_locviews: 0x35 >>> 
<3><891>: Abbrev Number: 8 (DW_TAG_formal_parameter) >>> <892> DW_AT_abstract_origin: <0x96c> >>> <896> DW_AT_location: 0x47 (location list) >>> <89a> DW_AT_GNU_locviews: 0x45 >>> <3><89e>: Abbrev Number: 41 (DW_TAG_lexical_block) >>> <89f> DW_AT_ranges : 0x21 >>> >>> see the lexical block has the same DW_AT_ranges, as the >>> inlined subroutine, but the other invocations do not >>> have this lexical block, since your original fix removed >>> those. >>> And this lexical block triggered an unexpected issue >>> in my gdb patch, which I owe you one, for helping me >>> finding it :-) >>> >>> Before that I have never looked at these lexical blocks, >>> but all I can say is that while compiling this test case, >>> in the first invocation of gen_inlined_subroutine_die >>> there are several SUBBLOCKS linked via BLOCK_CHAIN, >>> and only the first one is used to emit the lexical_block, >>> while the other siblings must be fully decoded, otherwise >>> there is an internal error, that I found by try-and-error. >>> I thought that is since the subroutine is split over several >>> places, and therefore it appeared natural to me, that the >>> subroutine is also using several SUBBLOCKS. >> >> OK, so the case in question looks like >> >> { Scope block #8 step-and-next-inline.cc:52 Originating from : static >> struct tree * tree_check (struct tree *, int); Fragment chain : #16 #17 >> struct tree * t; >> int i; >> >> { Scope block #9 Originating from :#0 Fragment chain : #10 #11 >> struct tree * x; >> >> } >> >> { Scope block #10 Originating from :#0 Fragment of : #9 >> struct tree * x; >> >> } >> >> { Scope block #11 Originating from :#0 Fragment of : #9 >> struct tree * x; >> >> } >> >> } >> >> so we have fragments here which we should ignore, but then fragments >> are to collect multiple ranges which, when we do not emit a >> lexical block for block #9 above, we will likely fail to emit and >> which we instead should associate with block #8, the >> DW_TAG_inlined_subroutine. 
>> >> Somehow it seems to "work" as to associate DW_AT_ranges with the >> DW_TAG_inlined_subroutine. >> >> I've used the following - there's no need to process BLOCK_CHAIN >> as fragments are ignored by gen_block_die. >> >> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc >> index d5144714c6e..4e6ad2ab7e1 100644 >> --- a/gcc/dwarf2out.cc >> +++ b/gcc/dwarf2out.cc >> @@ -25194,8 +25194,13 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref >> context_die) >> Do that by doing the recursion to subblocks on the single subblock >> of STMT. */ >>bool unwrap_one = false; >> - if (BLOCK_SUBBLOCKS (stmt) && !BLOCK_CHAIN (BLOCK_SUBBLOCKS (stmt))) >> + if (BLOCK_SUBBLOCKS (stmt)) >> { >> + tree subblock = BLOCK_SUBBLOCKS (stmt); >> + /* We should never elide that
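Richard's observation, that fragments of a scope block should be ignored when deciding whether the inline context has a single subblock, can be illustrated with a toy block tree. This is a hypothetical model, not GCC's tree API and not the (truncated) patch above. Block, single_real_subblock and demo are illustrative names; fragment_origin loosely stands in for BLOCK_FRAGMENT_ORIGIN, and the subblocks vector stands in for BLOCK_SUBBLOCKS plus its BLOCK_CHAIN siblings.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified scope-block node.
struct Block
{
  int number;
  const Block* fragment_origin;         // non-null: this block is a fragment
  std::vector<const Block*> subblocks;  // first subblock plus its chain
};

// Return the only non-fragment subblock of an inlined-call scope, or
// nullptr when there are none or several.  Fragments only carry extra
// address ranges of their origin block, so they are not separate scopes.
const Block* single_real_subblock(const Block& stmt)
{
  const Block* found = nullptr;
  for (const Block* b : stmt.subblocks)
    {
      if (b->fragment_origin != nullptr)
        continue;
      if (found != nullptr)
        return nullptr;
      found = b;
    }
  return found;
}

// Usage: the tree_check scope from step-and-next-inline.cc, where block #8
// has subblock #9 followed by its fragments #10 and #11.
void demo()
{
  Block b9{9, nullptr, {}};
  Block b10{10, &b9, {}};
  Block b11{11, &b9, {}};
  Block b8{8, nullptr, {&b9, &b10, &b11}};

  // A naive "exactly one subblock" check fails because of the fragments...
  assert(b8.subblocks.size() == 3);
  // ...but ignoring fragments finds the single real scope to unwrap.
  assert(single_real_subblock(b8) == &b9);
}
```

In this model the three-subblock case still unwraps to #9. Whether the fragments' address ranges then end up on the DW_TAG_inlined_subroutine is a separate question that the sketch does not address.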
RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote: > > > > -Original Message- > > From: Richard Biener > > Sent: Tuesday, August 20, 2024 10:36 AM > > To: Richard Sandiford > > Cc: Prathamesh Kulkarni ; Thomas Schwinge > > ; gcc-patches@gcc.gnu.org > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for > > accelerator > > > > External email: Use caution opening links or attachments > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford > > : > > > > > > Prathamesh Kulkarni writes: > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index > > >> cbf6041fd68..0420183faf8 100644 > > >> --- a/gcc/lto-streamer-in.cc > > >> +++ b/gcc/lto-streamer-in.cc > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not > > see > > >> #include "debug.h" > > >> #include "alloc-pool.h" > > >> #include "toplev.h" > > >> +#include "stor-layout.h" > > >> > > >> /* Allocator used to hold string slot entries for line map > > streaming. > > >> */ static struct object_allocator > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@ lto_read_tree_1 > > (class lto_input_block *ib, class data_in *data_in, tree expr) > > >> with -g1, see for example PR113488. */ > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == > > expr) > > >>DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE; > > >> + > > >> +#ifdef ACCEL_COMPILER > > >> + /* For decl with aggregate type, host streams out VOIDmode. > > >> + Compute the correct DECL_MODE by calling relayout_decl. */ > > >> + if ((VAR_P (expr) > > >> + || TREE_CODE (expr) == PARM_DECL > > >> + || TREE_CODE (expr) == FIELD_DECL) > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)) > > >> + && DECL_MODE (expr) == VOIDmode) > > >> +relayout_decl (expr); > > >> +#endif > > > > > > Genuine question, but: is relayout_decl safe in this context? It > > does > > > a lot more than just reset the mode. 
It also applies the target > > ABI's > > > preferences wrt alignment, padding, and so on, rather than > > preserving > > > those of the host's. > > > > It would be better to just recompute the mode here. > Hi, > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) in > lto_read_tree_1 instead of calling relayout_decl (expr). > I checked layout_decl_type does the same thing for setting decl mode, except > for bit fields. Since bit-fields cannot have > aggregate type, I am assuming setting DECL_MODE (expr) to TYPE_MODE > (TREE_TYPE (expr)) would be OK in this case ? Yep, that should work. > Sorry if this sounds like a silly ques -- Why would it be unsafe to call > relayout_decl for variables that are mapped to accelerator even > if it'd not preserve host's properties ? I assumed we want to assign accel's > ABI properties for mapped decls (mode being one of them), > or am I misunderstanding ? Structure layout need not be compatible but we are preserving that of the host instead of re-layouting in target context. Likewise type <-> mode mapping doesn't have to agree. Richard. > Signed-off-by: Prathamesh Kulkarni > > Thanks, > Prathamesh > > > > > Richard > > > > > Thanks, > > > Richard > > > > > > > > >> } > > >> } > > >> > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index > > >> 10c0809914c..0ff8bd1171e 100644 > > >> --- a/gcc/stor-layout.cc > > >> +++ b/gcc/stor-layout.cc > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char > > *name, tree fields, > > >> layout_decl (TYPE_NAME (type), 0); > > >> } > > >> > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */ > > >> + > > >> +void compute_array_mode (tree type) > > >> +{ > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE); > > >> + > > >> + SET_TYPE_MODE (type, BLKmode); > > >> + if (TYPE_SIZE (type) != 0 > > >> + && ! 
targetm.member_type_forces_blk (type, VOIDmode) > > >> + /* BLKmode elements force BLKmode aggregate; > > >> + else extract/store fields may lose. */ > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type > > >> +{ > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type), > > >> + TYPE_SIZE (type))); > > >> + if (TYPE_MODE (type) != BLKmode > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE > > (type))) > > >> +{ > > >> + TYPE_NO_FORCE_BLK (type) = 1; > > >> + SET_TYPE_MODE (type, BLKmode); > > >> +} > > >> +} > > >> +} > > >> + > > >> /* Calculate the mode, size, and alignment for TYPE. > > >>For an array type, calculate the element separation as well. > > >>Record TYPE on the chain of permanent or temporary types @@ > > >> -2709,24 +2735,7 @@ layout_type (tree type) > > >>align = MAX (align, BITS_PER_UNIT); #end
Re: [PATCH] Do not emit a redundant DW_TAG_lexical_block for inlined subroutines
On Wed, 21 Aug 2024, Bernd Edlinger wrote: > On 8/21/24 10:45, Richard Biener wrote: > > On Wed, 21 Aug 2024, Richard Biener wrote: > > > >> On Tue, 20 Aug 2024, Bernd Edlinger wrote: > >> > >>> On 8/20/24 13:00, Richard Biener wrote: > On Fri, Aug 16, 2024 at 12:49 PM Bernd Edlinger > wrote: > > > > While this already works correctly for the case when an inlined > > subroutine contains only one subrange, a redundant DW_TAG_lexical_block > > is still emitted when the subroutine has multiple blocks. > > Huh. The point is that the inline context is a single scope block with > no > siblings - how did that get messed up? The patch unfortunately does not > contain a testcase. > > >>> > >>> Well, I became aware of this because I am working on a gdb patch, > >>> which improves the debug experience of optimized C code, and to my > >>> surprise > >>> the test case did not work with gcc-8, while gcc-9 and following were > >>> fine. > >>> Initially I did not see what is wrong, therefore I started to bisect when > >>> this changed, and so I found your patch, which removed some lexical blocks > >>> in the debug info of this gdb test case: > >>> > >>> from binutils-gdb/gdb/testsuite/gdb.cp/step-and-next-inline.cc > >>> in case you have the binutils-gdb already downloaded you can skip this: > >>> $ git clone git://sourceware.org/git/binutils-gdb > >>> $ cd binutils-gdb/gdb/testsuite/gdb.cp > >>> $ gcc -g -O2 step-and-next-inline.cc > >>> > >>> when you look at the debug info with readelf -w a.out > >>> you will see, that the function "tree_check" > >>> is inlined three times, one looks like this > >>> <2><86b>: Abbrev Number: 40 (DW_TAG_inlined_subroutine) > >>> <86c> DW_AT_abstract_origin: <0x95b> > >>> <870> DW_AT_entry_pc: 0x1175 > >>> <878> DW_AT_GNU_entry_view: 0 > >>> <879> DW_AT_ranges : 0x21 > >>> <87d> DW_AT_call_file : 1 > >>> <87e> DW_AT_call_line : 52 > >>> <87f> DW_AT_call_column : 10 > >>> <880> DW_AT_sibling : <0x8bf> > >>> <3><884>: Abbrev Number: 8 
(DW_TAG_formal_parameter) > >>> <885> DW_AT_abstract_origin: <0x974> > >>> <889> DW_AT_location: 0x37 (location list) > >>> <88d> DW_AT_GNU_locviews: 0x35 > >>> <3><891>: Abbrev Number: 8 (DW_TAG_formal_parameter) > >>> <892> DW_AT_abstract_origin: <0x96c> > >>> <896> DW_AT_location: 0x47 (location list) > >>> <89a> DW_AT_GNU_locviews: 0x45 > >>> <3><89e>: Abbrev Number: 41 (DW_TAG_lexical_block) > >>> <89f> DW_AT_ranges : 0x21 > >>> > >>> see the lexical block has the same DW_AT_ranges, as the > >>> inlined subroutine, but the other invocations do not > >>> have this lexical block, since your original fix removed > >>> those. > >>> And this lexical block triggered an unexpected issue > >>> in my gdb patch, which I owe you one, for helping me > >>> finding it :-) > >>> > >>> Before that I have never looked at these lexical blocks, > >>> but all I can say is that while compiling this test case, > >>> in the first invocation of gen_inlined_subroutine_die > >>> there are several SUBBLOCKS linked via BLOCK_CHAIN, > >>> and only the first one is used to emit the lexical_block, > >>> while the other siblings must be fully decoded, otherwise > >>> there is an internal error, that I found by try-and-error. > >>> I thought that is since the subroutine is split over several > >>> places, and therefore it appeared natural to me, that the > >>> subroutine is also using several SUBBLOCKS. 
> >>
> >> OK, so the case in question looks like
> >>
> >> { Scope block #8 step-and-next-inline.cc:52 Originating from : static
> >> struct tree * tree_check (struct tree *, int); Fragment chain : #16 #17
> >>   struct tree * t;
> >>   int i;
> >>
> >>   { Scope block #9 Originating from :#0 Fragment chain : #10 #11
> >>     struct tree * x;
> >>
> >>   }
> >>
> >>   { Scope block #10 Originating from :#0 Fragment of : #9
> >>     struct tree * x;
> >>
> >>   }
> >>
> >>   { Scope block #11 Originating from :#0 Fragment of : #9
> >>     struct tree * x;
> >>
> >>   }
> >>
> >> }
> >>
> >> so we have fragments here which we should ignore, but then fragments
> >> are to collect multiple ranges which, when we do not emit a
> >> lexical block for block #9 above, we will likely fail to emit and
> >> which we instead should associate with block #8, the
> >> DW_TAG_inlined_subroutine.
> >>
> >> Somehow it seems to "work" as to associate DW_AT_ranges with the
> >> DW_TAG_inlined_subroutine.
> >>
> >> I've used the following - there's no need to process BLOCK_CHAIN
> >> as fragments are ignored by gen_block_die.
> >>
> >> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> >> index d5144714c6e..4e6ad2ab7e1 100644
> >> --- a/gcc/dwarf2out.cc
> >> +++ b/gcc/dwarf2out.cc
> >> @@ -25194,8 +25194,13 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref
> >> context_die)
> >>          Do that by doing the recursion
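Richard's point, that fragments produce no DIE of their own and only contribute extra address ranges to the block they are a fragment of, can be modelled in a few lines. This is a hypothetical, simplified sketch: the `block` struct and both helpers are illustrative names, not GCC's tree API, and the range counts are assumed. Fragments are skipped when counting lexical-block DIEs, and their ranges are folded back into their origin, which is where the DW_AT_ranges have to end up (on the lexical block, or on the DW_TAG_inlined_subroutine when no lexical block is emitted).

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified stand-in for a GCC BLOCK node.
struct block
{
  int id;               // block number, as in "Scope block #9"
  int fragment_origin;  // id of the block this is a fragment of, or -1
  int n_ranges;         // number of address ranges the block covers
};

// One DW_TAG_lexical_block per non-fragment subblock: fragments are
// skipped, mirroring "fragments are ignored by gen_block_die".
template <std::size_t N>
constexpr int
count_lexical_block_dies (const std::array<block, N> &subblocks)
{
  int dies = 0;
  for (const block &b : subblocks)
    if (b.fragment_origin < 0)
      dies++;
  return dies;
}

// Ranges that belong to ORIGIN_ID once its fragments are folded back in.
template <std::size_t N>
constexpr int
ranges_of (const std::array<block, N> &blocks, int origin_id)
{
  int total = 0;
  for (const block &b : blocks)
    if (b.id == origin_id || b.fragment_origin == origin_id)
      total += b.n_ranges;
  return total;
}
```

For the dump above, assuming blocks #9, #10 and #11 each cover one range, `count_lexical_block_dies` gives 1 and `ranges_of (..., 9)` gives 3: a single DIE has to carry all three ranges.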
Re: [Ping, Patch, Fortran, 77871, v1] Allow for class typed coarray parameter as dummy [PR77871]
Hi all, pinging this patch for the first time. Rebased and regtested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? - Andre On Thu, 15 Aug 2024 14:39:25 +0200 Andre Vehreschild wrote: > Hi all, > > attached patch fixes another regression on coarrays. This time for class typed > coarrays as dummys. > > Regtested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? > > Regards, > Andre > -- > Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de From eeacd9a2c5cc4ddfe6201ad335adb0f48767fba1 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Thu, 15 Aug 2024 13:49:49 +0200 Subject: [PATCH] [Fortran] Allow for class type coarray parameters. [PR77871] gcc/fortran/ChangeLog: PR fortran/77871 * trans-expr.cc (gfc_conv_derived_to_class): Assign token when converting a coarray to class. (gfc_get_tree_for_caf_expr): For classes get the caf decl from the saved descriptor. (gfc_get_caf_token_offset):Assert that coarray=lib is set and cover more cases where the tree having the coarray token can be. * trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Use unified test for pointers. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/dummy_3.f90: New test. --- gcc/fortran/trans-expr.cc | 36 --- gcc/fortran/trans-intrinsic.cc| 2 +- gcc/testsuite/gfortran.dg/coarray/dummy_3.f90 | 33 + 3 files changed, 58 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/coarray/dummy_3.f90 diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 8801a15c3a8..4681a131139 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -810,6 +810,16 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym, /* Now set the data field. 
*/ ctree = gfc_class_data_get (var); + if (flag_coarray == GFC_FCOARRAY_LIB && CLASS_DATA (fsym)->attr.codimension) +{ + tree token; + tmp = gfc_get_tree_for_caf_expr (e); + if (POINTER_TYPE_P (TREE_TYPE (tmp))) + tmp = build_fold_indirect_ref (tmp); + gfc_get_caf_token_offset (parmse, &token, nullptr, tmp, NULL_TREE, e); + gfc_add_modify (&parmse->pre, gfc_conv_descriptor_token (ctree), token); +} + if (optional) cond_optional = gfc_conv_expr_present (e->symtree->n.sym); @@ -2368,6 +2378,10 @@ gfc_get_tree_for_caf_expr (gfc_expr *expr) if (expr->symtree->n.sym->ts.type == BT_CLASS) { + if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl) + && GFC_DECL_SAVED_DESCRIPTOR (caf_decl)) + caf_decl = GFC_DECL_SAVED_DESCRIPTOR (caf_decl); + if (expr->ref && expr->ref->type == REF_ARRAY) { caf_decl = gfc_class_data_get (caf_decl); @@ -2432,16 +2446,12 @@ gfc_get_caf_token_offset (gfc_se *se, tree *token, tree *offset, tree caf_decl, { tree tmp; + gcc_assert (flag_coarray == GFC_FCOARRAY_LIB); + /* Coarray token. 
*/ if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (caf_decl))) -{ - gcc_assert (GFC_TYPE_ARRAY_AKIND (TREE_TYPE (caf_decl)) - == GFC_ARRAY_ALLOCATABLE - || expr->symtree->n.sym->attr.select_type_temporary - || expr->symtree->n.sym->assoc); *token = gfc_conv_descriptor_token (caf_decl); -} - else if (DECL_LANG_SPECIFIC (caf_decl) + else if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl) && GFC_DECL_TOKEN (caf_decl) != NULL_TREE) *token = GFC_DECL_TOKEN (caf_decl); else @@ -2459,7 +2469,7 @@ gfc_get_caf_token_offset (gfc_se *se, tree *token, tree *offset, tree caf_decl, && (GFC_TYPE_ARRAY_AKIND (TREE_TYPE (caf_decl)) == GFC_ARRAY_ALLOCATABLE || GFC_TYPE_ARRAY_AKIND (TREE_TYPE (caf_decl)) == GFC_ARRAY_POINTER)) *offset = build_int_cst (gfc_array_index_type, 0); - else if (DECL_LANG_SPECIFIC (caf_decl) + else if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl) && GFC_DECL_CAF_OFFSET (caf_decl) != NULL_TREE) *offset = GFC_DECL_CAF_OFFSET (caf_decl); else if (GFC_TYPE_ARRAY_CAF_OFFSET (TREE_TYPE (caf_decl)) != NULL_TREE) @@ -2526,11 +2536,13 @@ gfc_get_caf_token_offset (gfc_se *se, tree *token, tree *offset, tree caf_decl, } else if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (caf_decl))) tmp = gfc_conv_descriptor_data_get (caf_decl); + else if (INDIRECT_REF_P (caf_decl)) +tmp = TREE_OPERAND (caf_decl, 0); else - { - gcc_assert (POINTER_TYPE_P (TREE_TYPE (caf_decl))); - tmp = caf_decl; - } +{ + gcc_assert (POINTER_TYPE_P (TREE_TYPE (caf_decl))); + tmp = caf_decl; +} *offset = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type, fold_convert (gfc_array_index_type, *offset), diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc index 0632e3e4d2f..ceda7843fa9 100644 --- a/gcc/fortran/trans-intrinsic.cc +++ b/gcc/fortran/trans-intrinsic.cc @@ -1900,7 +1900,7 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind, gf
Re: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
> And we fail to fold vect_patt_384.36_436 | { 1, ... } to { 1, ... }?
> Or is the issue that vector masks contain padding and with
> non-zero masking we'd have garbage in the padding and that leaks
> here? That is, _47 ? 1 : iftmp.0_113 -> _47 | iftmp.0_113 assumes
> there's exactly one bit in a bool, specifically it has assumptions
> on padding - I'd guess that
>
>   *(char *)p = 17;
>   _Bool _47 = *(_Bool *)p;
>   ... = _47 ? 1 : b;
>
> would have similar issues but eventually loading 17 into a _Bool
> invokes undefined behavior. So maybe the COND_EXPRs are only
> required for .MASK_LOADs of _Bool (or any other type with padding)?

Hmm yeah, if you put it like that, very likely. I have only seen it
with _Bool/mask types. In this PR here the significant bit is correct
and it's the others leaking.

-- 
Regards
 Robin
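Robin's observation, that the significant bit is correct while the other bits leak, can be illustrated with a byte-sized model. This sketch assumes a layout chosen only for illustration (one significant bit per mask byte, the rest padding) and does not describe any real target's mask registers. The select form only looks at the significant bit, while the `|` fold also propagates whatever garbage sits in the padding bits.

```cpp
#include <cstdint>

// Hypothetical model: a mask element is one byte whose only significant
// bit is bit 0; bits 1-7 are padding that may hold garbage.

// What the COND_EXPR "_47 ? 1 : b" means: test the significant bit only.
constexpr std::uint8_t
cond_select (std::uint8_t mask, std::uint8_t b)
{
  return (mask & 1u) ? std::uint8_t{1} : b;
}

// The folded form "_47 | b": ORs the whole element, padding included.
constexpr std::uint8_t
or_fold (std::uint8_t mask, std::uint8_t b)
{
  return static_cast<std::uint8_t> (mask | b);
}

// With clean masks (0 or 1) and a boolean B the two forms agree.
static_assert (cond_select (0, 0) == or_fold (0, 0));
static_assert (cond_select (0, 1) == or_fold (0, 1));
static_assert (cond_select (1, 0) == or_fold (1, 0));
static_assert (cond_select (1, 1) == or_fold (1, 1));
```

With the mask byte 0x02 (significant bit clear, one padding bit set) and b == 0, `cond_select` returns 0 while `or_fold` returns 2. That is the kind of leak that keeping the COND_EXPR for _Bool/mask .MASK_LOADs would avoid.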
[PATCH] tree-optimization/116380 - bogus SSA update with loop distribution
When updating LC PHIs after copying loops we have to handle defs
defined outside of the loop appropriately (by not setting them to
NULL ...). This mimics how we handle this in the SSA updating code
of the vectorizer.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

	PR tree-optimization/116380
	* tree-loop-distribution.cc (copy_loop_before): Handle
	out-of-loop defs appropriately.

	* gcc.dg/torture/pr116380.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr116380.c | 16
 gcc/tree-loop-distribution.cc           |  3 +++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116380.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116380.c b/gcc/testsuite/gcc.dg/torture/pr116380.c
new file mode 100644
index 000..5ffd99459d2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116380.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
+
+int a[3], d[3], c;
+int f(int e, int b)
+{
+  for (; e < 3; e++)
+    {
+      a[0] = 0;
+      if (b)
+	c = b;
+      d[e] = 0;
+      a[e] = 0;
+    }
+  return e;
+}
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index f87393ee94d..10f261a8769 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -980,6 +980,9 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
 	      if (TREE_CODE (USE_FROM_PTR (use_p)) == SSA_NAME)
 		{
 		  tree new_def = get_current_def (USE_FROM_PTR (use_p));
+		  if (!new_def)
+		    /* Something defined outside of the loop.  */
+		    continue;
 		  SET_USE (use_p, new_def);
 		}
 	    }
-- 
2.43.0
Re: [PATCH] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook
On Wed, Aug 21, 2024 at 2:38 AM Richard Biener wrote: > > On Tue, Aug 20, 2024 at 3:24 PM H.J. Lu wrote: > > > > On Tue, Aug 20, 2024 at 2:03 AM Richard Biener > > wrote: > > > > > > On Wed, Aug 14, 2024 at 3:15 PM H.J. Lu wrote: > > > > > > > > The new hook allows the linker plugin to distinguish calls to > > > > claim_file_handler that know the object is being used by the linker > > > > (from ldmain.c:add_archive_element), from calls that don't know it's > > > > being used by the linker (from elf_link_is_defined_archive_symbol); in > > > > the latter case, the plugin should avoid including the unused LTO > > > > archive > > > > members in linker output. To get the proper support for archives with > > > > LTO common symbols, the linker fix for > > > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=32083 > > > > > > > > is required. > > > > > > > > PR lto/116361 > > > > * lto-plugin.c (claim_file_handler_v2): Include the LTO object > > > > only if it is known to be used for link output. > > > > > > > > Signed-off-by: H.J. Lu > > > > --- > > > > lto-plugin/lto-plugin.c | 20 > > > > 1 file changed, 12 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c > > > > index 152648338b9..2d2bfa60d42 100644 > > > > --- a/lto-plugin/lto-plugin.c > > > > +++ b/lto-plugin/lto-plugin.c > > > > @@ -1286,13 +1286,17 @@ claim_file_handler_v2 (const struct > > > > ld_plugin_input_file *file, int *claimed, > > > > lto_file.symtab.syms); > > > >check (status == LDPS_OK, LDPL_FATAL, "could not add symbols"); > > > > > > We are still doing add_symbols, shouldn't what we do depend on what > > > that does? The > > > > If status != LDPS_OK, the plugin will abort because of LDPL_FATAL. 
> > > > > function comment says > > > > > >If KNOWN_USED, the object is known by the linker > > >to be used, or an older API version is in use that does not provide > > > that > > >information; otherwise, the linker is only determining whether this is > > >a plugin object and it should not be registered as having offload data > > > if > > >not claimed by the plugin. > > > > > > where do you check "if not claimed by the plugin"? I think this at least > > > needs > > > clarification with the change. > > > > See my reply below. > > > > > > - LOCK_SECTION; > > > > - num_claimed_files++; > > > > - claimed_files = > > > > - xrealloc (claimed_files, > > > > - num_claimed_files * sizeof (struct plugin_file_info)); > > > > - claimed_files[num_claimed_files - 1] = lto_file; > > > > - UNLOCK_SECTION; > > > > + /* Include it only if it is known to be used for link output. */ > > > > + if (known_used) > > > > + { > > > > + LOCK_SECTION; > > > > + num_claimed_files++; > > > > + claimed_files = > > > > + xrealloc (claimed_files, > > > > + num_claimed_files * sizeof (struct > > > > plugin_file_info)); > > > > + claimed_files[num_claimed_files - 1] = lto_file; > > > > + UNLOCK_SECTION; > > > > + } > > > > > > > >*claimed = 1; > > > > } > > > > @@ -1313,7 +1317,7 @@ claim_file_handler_v2 (const struct > > > > ld_plugin_input_file *file, int *claimed, > > > >if (*claimed && !obj.offload && offload_files_last_lto == NULL) > > > > offload_files_last_lto = offload_files_last; > > > > > > > > - if (obj.offload && (known_used || obj.found > 0)) > > > > + if (obj.offload && known_used && obj.found > 0) > > > > The offload data is included when it is claimed by the plugin > > even if known_used is 0. It looks quite odd to me. 
>
> To me the whole 'known_used' thing looks odd - I would have expected
> the linker to do two round-trips for archives maybe; first with
> knwon_used == 0, just getting the add_symbol calls (aka, get
> the LTO symbol table), then the linker computes whether the archive
> is used and if it is, re-do the claim_file hook with known_used == 1.
>
> Is that how it is done?

Yes.

> Otherwise how should the plugin know whether the file should be added or not?
> Will the linker take care of that then? Where is the API documented? I think

Yes, the linker will do the right thing after

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208

> how known_used is to be used needs better documentation.

The existing documentation is in the comments for claim_file_handler_v2.

> Did you look at how other linkers use known_used?

The BFD linker uses it to support common symbols in archives. It calls
claim_file_handler_v2 with known_used == 0 to check for a non-common
definition in an archive member. If there is one, the archive member is
included in the output; otherwise, it is excluded. Other linkers never
do this for common symbols.

> Sorry for just asking questions and having no answer
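The two-round-trip protocol confirmed above can be sketched as a tiny state machine. This is a hypothetical model: `plugin_state`, `claim_file_hook` and `process_archive_member` are illustrative names that match neither lto-plugin.c nor the BFD linker. The plugin registers symbols on every call but records the file only when `known_used` is true, as the patch does. The linker probes first and repeats the call only for members it decides to include.

```cpp
// Hypothetical model of the protocol; names are illustrative only.
struct plugin_state
{
  int hook_calls = 0;     // symbols are registered on every call
  int claimed_files = 0;  // members recorded for the LTO link
};

// Plugin side: mirrors the patch, which appends to claimed_files only
// when KNOWN_USED is true.
constexpr void
claim_file_hook (plugin_state &st, bool known_used)
{
  st.hook_calls++;
  if (known_used)
    st.claimed_files++;
}

// Linker side, as described for BFD: probe an archive member with
// known_used == false; if it provides a non-common definition, include
// it and call the hook again with known_used == true.
constexpr plugin_state
process_archive_member (bool has_non_common_def)
{
  plugin_state st{};
  claim_file_hook (st, /*known_used=*/false);
  if (has_non_common_def)
    claim_file_hook (st, /*known_used=*/true);
  return st;
}
```

In this model an unused member is probed once and never claimed, while a used member is probed, then claimed exactly once on the second call.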
Re: [PING] [PATCH v2] Support if conversion for switches
On Tue, Aug 13, 2024 at 7:34 PM Andi Kleen wrote: > > Andi Kleen writes: > > I wanted to ping this patch. I believe Richard ok'ed most of it earlier > but need an ok for the changes resulting from his review too > (but they were mostly only test suite and comment fixes > apart from some minor tweaks) OK. Thanks, Richard. > -Andi > > > The gimple-if-to-switch pass converts if statements with > > multiple equal checks on the same value to a switch. This breaks > > vectorization which cannot handle switches. > > > > Teach the tree-if-conv pass used by the vectorizer to handle > > simple switch statements, like those created by if-to-switch earlier. > > These are switches that only have a single non default block, > > They are handled similar to COND in if conversion. > > > > This makes the vect-bitfield-read-1-not test fail. The test > > checks for a bitfield analysis failing, but it actually > > relied on the ifcvt erroring out early because the test > > is using a switch. The if conversion still does not > > work because the switch is not in a form that this > > patch can handle, but it fails much later and the bitfield > > analysis succeeds, which makes the test fail. I marked > > it xfail because it doesn't seem to be testing what it wants > > to test. > > > > [v2: Fix tests to run correctly. Update comments and commit log. > > Fix gimple switch accessor use.] > > > > gcc/ChangeLog: > > > > PR tree-opt/115866 > > * tree-if-conv.cc (if_convertible_switch_p): New function. > > (if_convertible_stmt_p): Check for switch. > > (get_loop_body_in_if_conv_order): Handle switch. > > (predicate_bbs): Likewise. > > (predicate_statements): Likewise. > > (remove_conditions_and_labels): Likewise. > > (ifcvt_split_critical_edges): Likewise. > > (ifcvt_local_dce): Likewise. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/vect/vect-switch-ifcvt-1.c: New test. > > * gcc.dg/vect/vect-switch-ifcvt-2.c: New test. > > * gcc.dg/vect/vect-switch-search-line-fast.c: New test. 
> > * gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail. > > --- > > gcc/doc/cfg.texi | 4 +- > > .../gcc.dg/vect/vect-bitfield-read-1-not.c| 2 +- > > .../gcc.dg/vect/vect-switch-ifcvt-1.c | 115 ++ > > .../gcc.dg/vect/vect-switch-ifcvt-2.c | 49 > > .../vect/vect-switch-search-line-fast.c | 17 +++ > > gcc/tree-if-conv.cc | 93 +- > > 6 files changed, 272 insertions(+), 8 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > > > > diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi > > index 9a22420f91f..a6f2b9f97d6 100644 > > --- a/gcc/doc/cfg.texi > > +++ b/gcc/doc/cfg.texi > > @@ -83,13 +83,13 @@ lexicographical order, except @code{ENTRY_BLOCK} and > > @code{EXIT_BLOCK}. > > The macro @code{FOR_ALL_BB} also visits all basic blocks in > > lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}. > > > > -@findex post_order_compute, inverted_post_order_compute, > > walk_dominator_tree > > +@findex post_order_compute, inverted_post_order_compute, dom_walker::walk > > The functions @code{post_order_compute} and > > @code{inverted_post_order_compute} > > can be used to compute topological orders of the CFG. The orders are > > stored as vectors of basic block indices. The @code{BASIC_BLOCK} array > > can be used to iterate each basic block by index. > > Dominator traversals are also possible using > > -@code{walk_dominator_tree}. Given two basic blocks A and B, block A > > +@code{dom_walker::walk}. Given two basic blocks A and B, block A > > dominates block B if A is @emph{always} executed before B@. 
> > > > Each @code{basic_block} also contains pointers to the first > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c > > b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c > > index 0d91067ebb2..85f4de8464a 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c > > @@ -55,6 +55,6 @@ int main (void) > >return 0; > > } > > > > -/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */ > > +/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { > > xfail *-*-* } } } */ > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c > > b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c > > new file mode 100644 > > index 000..f5352ef8ed7 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c > > @@ -0,0 +1,115 @@ > > +/* { dg-require-effective-target vect_int } */ > > +#include "tree-vect.h" > > + > > +extern void abort (void); > > + > > +int > > +f1 (char *s)
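The transformation the switch patch enables can be illustrated at source level. Both functions below are assumptions for illustration: the names and the vowel-counting body are not taken from the vect-switch-ifcvt tests, and tree-if-conv works on GIMPLE rather than source. The first loop body is a switch with a single non-default block, the shape if-to-switch produces from a chain of equality tests. The second is the branch-free form that if-conversion conceptually produces, where the case labels become a predicate and the block executes under it.

```cpp
// Loop body is a switch with a single non-default block.
constexpr int
count_switch (const char *s, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    switch (s[i])
      {
      case 'a':
      case 'e':
      case 'o':
        count++;
        break;
      default:
        break;
      }
  return count;
}

// If-converted form: the case labels are OR-ed into a predicate (with
// non-short-circuit '|') and the block's effect is applied under it.
constexpr int
count_ifcvt (const char *s, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    {
      bool pred = (s[i] == 'a') | (s[i] == 'e') | (s[i] == 'o');
      count += pred ? 1 : 0;
    }
  return count;
}
```

The second form has no control flow inside the loop, which is what makes it a candidate for the vectorizer. Both versions compute the same count, for example 3 for "banana".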
Re: [PATCH 1/2] Makefile.tpl: drop leftover intermodule cruft
On Thu, Aug 15, 2024 at 12:14 AM Sam James wrote:
>
> intermodule supported was dropped in r0-103106-gde6ba7aee152a0 with some
> remaining bits for Fortran removed in r14-1696-gecc96eb5d2a0e5.

OK

> Remove some small leftovers.
>
> 	* Makefile.in: Regenerate.
> 	* Makefile.tpl (STAGE1_CONFIGURE_FLAGS: Remove --disable-intermodule.
> ---
>  Makefile.in  | 11 ---
>  Makefile.tpl | 11 ---
>  2 files changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/Makefile.in b/Makefile.in
> index 34c5550beca2..a1a56bb5dd2c 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -610,14 +610,11 @@ STAGEautofeedback_CONFIGURE_FLAGS =
> $(STAGE_CONFIGURE_FLAGS)
>  STAGE1_CFLAGS = @stage1_cflags@
>  STAGE1_CHECKING = @stage1_checking@
>  STAGE1_LANGUAGES = @stage1_languages@
> -# * We force-disable intermodule optimizations, even if
> -# --enable-intermodule was passed, since the installed compiler
> -# probably can't handle them. Luckily, autoconf always respects
> -# the last argument when conflicting --enable arguments are passed.
> -# * Likewise, we force-disable coverage flags, since the installed
> -# compiler probably has never heard of them.
> +# * We force-disable coverage flags, since the installed compiler probably
> +# has never heard of them. Luckily, autoconf always respects the last
> +# argument when conflicting --enable arguments are passed.
>  # * We also disable -Wformat, since older GCCs don't understand newer %s.
> -STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \
> +STAGE1_CONFIGURE_FLAGS = $(STAGE1_CHECKING) \
>  	  --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \
>  	  --disable-build-format-warnings
>
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 8f4bf297918c..cbb3c6789dcf 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -533,14 +533,11 @@ STAGE[+id+]_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
>  STAGE1_CFLAGS = @stage1_cflags@
>  STAGE1_CHECKING = @stage1_checking@
>  STAGE1_LANGUAGES = @stage1_languages@
> -# * We force-disable intermodule optimizations, even if
> -# --enable-intermodule was passed, since the installed compiler
> -# probably can't handle them. Luckily, autoconf always respects
> -# the last argument when conflicting --enable arguments are passed.
> -# * Likewise, we force-disable coverage flags, since the installed
> -# compiler probably has never heard of them.
> +# * We force-disable coverage flags, since the installed compiler probably
> +# has never heard of them. Luckily, autoconf always respects the last
> +# argument when conflicting --enable arguments are passed.
>  # * We also disable -Wformat, since older GCCs don't understand newer %s.
> -STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \
> +STAGE1_CONFIGURE_FLAGS = $(STAGE1_CHECKING) \
>  	  --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \
>  	  --disable-build-format-warnings
>
> --
> 2.45.2
>
Re: [PATCH 1/2] SVE intrinsics: Fold constant operands for svdiv
Jennifer Schmitz writes:
> thank you for the feedback. I would like to summarize what I understand from
> your suggestions before I start revising to make sure we are on the same page:
>
> 1. The new setup for constant folding of SVE intrinsics for binary operations
> where both operands are constant vectors looks like this:
>
> In gcc/fold-const.cc:
> NEW: vector_const_binop: Handles vector part of const_binop element-wise
> const_binop: For vector arguments, calls vector_const_binop with const_binop
> as callback
> poly_int_binop: Is now public and -if necessary- we can implement missing
> codes (e.g. TRUNC_DIV_EXPR)

Yeah. And specifically: I think we can move:

  if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
    {
      wide_int warg1 = wi::to_wide (arg1), res;
      wide_int warg2 = wi::to_wide (arg2, TYPE_PRECISION (type));
      if (!wide_int_binop (res, code, warg1, warg2, sign, &overflow))
	return NULL_TREE;
      poly_res = res;
    }

into poly_int_binop. It shouldn't affect compile times on non-poly
targets too much, since poly_int_tree_p (arg1) just checks for
INTEGER_CST there.

> In aarch64 backend:
> NEW: aarch64_vector_const_binop: adapted from int_const_binop, but calls
> poly_int_binop

Yes. The main differences are that we shouldn't treat any operation as
overflowing, and that we can handle cases that are well-defined for
intrinsics but not for gimple.

> intrinsic_impl::fold: calls vector_const_binop with
> aarch64_vector_const_binop as callback

Yeah.

> 2. Folding where only one operand is constant (0/x, x/0, 0*x etc.) can be
> handled individually in intrinsic_impl, but in separate patches. If there is
> already code to check for uniform vectors (e.g. in the svdiv->svasrd case),
> we try to share code.

Yeah. And in particular, we should try to handle (and test) vector-scalar
_n intrinsics as well as vector-vector intrinsics.

> Does that cover what you proposed? Otherwise, please feel free to correct any
> misunderstandings.

SGTM.

Thanks,
Richard
[patch] libgomp: Add interop types and routines to OpenMP's headers and module
This patch adds 'interop' to C/C++'s omp.h and to Fortran's omp_lib.h and
omp_lib module. The implementation should match OpenMP 5.1 (which added
interop) and also TR13; the Fortran routine support is new in TR13. It
also adds 'hsa' as foreign object enum/parameter, which is currently
being added to the additional-definitions document.

* * *

The routine interface does not exactly match the OpenMP spec, as some
VALUE and BIND(C) attributes and one c_int have been used to reduce
pointless differences between OpenMP and C/C++. This shouldn't affect
usage, as almost no user will worry about the API used for a procedure
reference. But if a user defines the routine interface him-/herself, this
will fail. (But why should (s)he? There is 'omp_lib.h' and the 'omp_lib'
module, after all – and several items in those files are implementation
defined.)

On the C/C++ side, there are also some differences (at least with TR13)
with regard to unsigned vs. signed and to enum (of size __UINTPTR_T__)
vs. 'typedef (u)intptr_t', but they shouldn't matter either (effectively
the same API) – and, again, there is an omp.h, which any sensible user
should use.

* * *

While there is a stub implementation for the routines, two things are
missing to make them really useful: support for the 'interop' directive
in the compiler itself (+ a libgomp function for it) and support for some
foreign runtime types in the libgomp plugin. Also missing is the
documentation of the added routines in libgomp.texi. All of this will be
added in later patches.

Built and tested on x86-64-gnu-linux (with offloading enabled, but that's
not yet relevant).

Comments, remarks, suggestions before I commit it?

Tobias

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.
While a stub implementation is provided, the routines only become really
useful once the libgomp GPU plugins support foreign runtimes and the
compiler supports the 'interop' directive.

libgomp/ChangeLog:

	* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
	omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
	* libgomp.map (GOMP_5.1.3): New; add interop routines.
	* omp.h.in: Add interop typedefs, enum and prototypes.
	* omp_lib.f90.in: Add parameters and interfaces for interop.
	* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
	-ffree-length-80.
	* target.c (omp_get_num_interop_properties, omp_get_interop_int,
	omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
	omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
	* testsuite/libgomp.c/interop-routines-1.c: New test.
	* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-3.F: New test.
	* testsuite/libgomp.fortran/interop-routines-4.F: New test.
	* testsuite/libgomp.fortran/interop-routines-5.F: New test.
	* testsuite/libgomp.fortran/interop-routines-6.F: New test.
libgomp/fortran.c | 41 libgomp/libgomp.map| 15 ++ libgomp/omp.h.in | 69 ++ libgomp/omp_lib.f90.in | 99 + libgomp/omp_lib.h.in | 167 -- libgomp/target.c | 91 libgomp/testsuite/libgomp.c/interop-routines-1.c | 246 + .../libgomp.fortran/interop-routines-1.F90 | 222 +++ .../libgomp.fortran/interop-routines-2.F90 | 3 + .../testsuite/libgomp.fortran/interop-routines-3.F | 2 + .../testsuite/libgomp.fortran/interop-routines-4.F | 4 + .../testsuite/libgomp.fortran/interop-routines-5.F | 4 + .../testsuite/libgomp.fortran/interop-routines-6.F | 4 + 13 files changed, 945 insertions(+), 22 deletions(-) diff --git a/libgomp/fortran.c b/libgomp/fortran.c index cfbea32b022..b62a3f29916 100644 --- a/libgomp/fortran.c +++ b/libgomp/fortran.c @@ -102,6 +102,10 @@ ialias_redirect (omp_set_default_allocator) ialias_redirect (omp_get_default_allocator) ialias_redirect (omp_display_env) ialias_redirect (omp_fulfill_event) +ialias_redirect (omp_get_interop_str) +ialias_redirect (omp_get_interop_name) +ialias_redirect (omp_get_interop_type_desc) +ialias_redirect (omp_get_interop_rc_desc) #endif #ifndef LIBGOMP_GNU_SYMBOL_VERSIONING @@ -807,4 +811,41 @@ omp_display_env_8_ (const int64_t *verbose) omp_display_env (!!*verbose); } +void +omp_get_interop_str_ (const char **res, size_t *res_len, + const omp_interop_t interop, + omp_interop_property_t property_id, + omp_interop_rc_t *ret_code) +{ + *res = omp_get_interop_str (interop, property_id, ret_code); + *res_len = *res ? strlen (*res) : 0; +} + +void +omp_get_inter
[PATCH 3/2] libstdc++: Optimize std::projected
Tested on x86_64-pc-linux-gnu, does this look OK for trunk? I'm not
sure if the current specification of 'projected' strictly speaking
allows for this optimization, but it seems like a natural one that
should be allowed.

-- >8 --

Algorithms that are generalized to take projections usually default the
projection to std::identity, which really means no projection at all.
In that case, I believe we could shortcut the projection logic to return
the indirectly readable type unchanged rather than a wrapped version of
it. This should help with compile times especially after P2609R3 which
made the indirect invocability concepts more expensive to check when
projections are in the picture.

libstdc++-v3/ChangeLog:

	* include/bits/iterator_concepts.h (__detail::__projected):
	Define partial specialization for a std::identity projection.
---
 libstdc++-v3/include/bits/iterator_concepts.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h
index d849ddc32fc..642c709fee0 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -803,6 +803,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    using __projected_Proj = _Proj;
 	  };
       };
+
+    // Optimize the common case of the projection being std::identity.
+    template<typename _Iter>
+      struct __projected<_Iter, identity>
+      { using __type = _Iter; };
   } // namespace __detail
 
   /// [projected], projected
-- 
2.46.0.267.gbb9c16bd4f
[PATCH] rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]
Our power8 swap optimization pass has some special handling for optimizing swaps of TImode variables. The test case reported in bugzilla uses a call to __atomic_compare_exchange, which introduces a variable of PTImode and that does not get the same treatment as TImode leading to wrong code generation. The simple fix is to treat PTImode identically to TImode. This passed bootstrap and regtesting on powerpc64le-linux with no regressions. I also confirmed the testcase is correctly not run on -m32 BE and passes on -m64 BE. Ok for trunk? This is broken back to GCC 12, so ok for the releases branches after some bake-in time on trunk? Peter gcc/ PR target/116415 * config/rs6000/rs6000-p8swap.cc (rs6000_analyze_swaps): Handle PTImode identically to TImode. gcc/testsuite/ PR target/116415 * gcc.target/powerpc/pr116415.c: New test. diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc index 639f477d782..15e44bb63a6 100644 --- a/gcc/config/rs6000/rs6000-p8swap.cc +++ b/gcc/config/rs6000/rs6000-p8swap.cc @@ -2469,10 +2469,11 @@ rs6000_analyze_swaps (function *fun) mode = V4SImode; } - if (ALTIVEC_OR_VSX_VECTOR_MODE (mode) || mode == TImode) + if (ALTIVEC_OR_VSX_VECTOR_MODE (mode) || mode == TImode + || mode == PTImode) { insn_entry[uid].is_relevant = 1; - if (mode == TImode || mode == V1TImode + if (mode == TImode || mode == PTImode || mode == V1TImode || FLOAT128_VECTOR_P (mode)) insn_entry[uid].is_128_int = 1; if (DF_REF_INSN_INFO (mention)) @@ -2497,10 +2498,11 @@ rs6000_analyze_swaps (function *fun) && ALTIVEC_OR_VSX_VECTOR_MODE (GET_MODE (SET_DEST (insn mode = GET_MODE (SET_DEST (insn)); - if (ALTIVEC_OR_VSX_VECTOR_MODE (mode) || mode == TImode) + if (ALTIVEC_OR_VSX_VECTOR_MODE (mode) || mode == TImode + || mode == PTImode) { insn_entry[uid].is_relevant = 1; - if (mode == TImode || mode == V1TImode + if (mode == TImode || mode == PTImode || mode == V1TImode || FLOAT128_VECTOR_P (mode)) insn_entry[uid].is_128_int = 1; if (DF_REF_INSN_INFO 
(mention)) diff --git a/gcc/testsuite/gcc.target/powerpc/pr116415.c b/gcc/testsuite/gcc.target/powerpc/pr116415.c new file mode 100644 index 000..5fad810ceb0 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr116415.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-require-effective-target int128 } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ + +/* PR 116415: Verify our Power8 swap optimization pass doesn't incorrectly swap + PTImode values. They should be handled identically to TImode values. */ + +#include +#include +#include + +typedef union { + struct { +uint64_t a; +uint64_t b; + } t; + __uint128_t data; +} Value; +Value value, next; + +void +bug (Value *val, Value *nxt) +{ + for (;;) { +nxt->t.a = val->t.a + 1; +nxt->t.b = val->t.b + 2; +if (__atomic_compare_exchange (&val->data, &val->data, &nxt->data, + 0, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE)) + break; + } +} + +int +main (void) +{ + bug (&value, &next); + printf ("%lu %lu\n", value.t.a, value.t.b); + if (value.t.a != 1 || value.t.b != 2) +abort (); + return 0; +}
Re: [PATCH 3/2] libstdc++: Optimize std::projected
On Wed, 21 Aug 2024 at 13:58, Patrick Palka wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk? I'm not
> sure if the current specification of 'projected' strictly speaking
> allows for this optimization, but it seems like a natural one that
> should be allowed.

Yeah, I can't see any conformance problems with doing this. I'm sure
somebody smarter than me will point it out if there's a problem, so
let's try it and see.

OK for trunk, thanks.

>
> -- >8 --
>
> Algorithms that are generalized to take projections usually default the
> projection to std::identity, which really means no projection at all.
> In that case, I believe we could shortcut the projection logic to return
> the indirectly readable type unchanged rather than a wrapped version of
> it. This should help with compile times especially after P2609R3 which
> made the indirect invocability concepts more expensive to check when
> projections are in the picture.
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/iterator_concepts.h (__detail::__projected):
>         Define partial specialization for a std::identity projection.
> ---
>  libstdc++-v3/include/bits/iterator_concepts.h | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h
> index d849ddc32fc..642c709fee0 100644
> --- a/libstdc++-v3/include/bits/iterator_concepts.h
> +++ b/libstdc++-v3/include/bits/iterator_concepts.h
> @@ -803,6 +803,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>             using __projected_Proj = _Proj;
>           };
>        };
> +
> +    // Optimize the common case of the projection being std::identity.
> +    template
> +      struct __projected<_Iter, identity>
> +      { using __type = _Iter; };
>    } // namespace __detail
>
>    /// [projected], projected
> --
> 2.46.0.267.gbb9c16bd4f
>
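The identity shortcut is a plain partial specialization. This minimal sketch uses a hypothetical `projected_sketch` trait in place of `__detail::__projected`. It also uses a hypothetical `identity_fn` stand-in for `std::identity`, so that it builds as C++17 too. The general case returns a wrapper type, and the identity case returns the iterator type itself.

```cpp
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::identity (C++20).
struct identity_fn
{
  template<typename T>
    constexpr T&& operator() (T&& t) const noexcept
    { return std::forward<T> (t); }
};

// General case: wrap the iterator together with its projection.
template<typename Iter, typename Proj>
  struct projected_sketch
  {
    struct type
    {
      using iterator = Iter;
      using projection = Proj;
    };
  };

// Shortcut: projecting through the identity changes nothing, so hand back
// the iterator type unchanged and skip the wrapper entirely.
template<typename Iter>
  struct projected_sketch<Iter, identity_fn>
  { using type = Iter; };

template<typename Iter, typename Proj>
  using projected_sketch_t = typename projected_sketch<Iter, Proj>::type;
```

Constraint checks on `projected_sketch_t<It, identity_fn>` then see the original iterator type. The compiler does not have to instantiate an extra wrapper, and the concepts involved do not have to look through one.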
Re: [PATCH] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook
On Wed, Aug 21, 2024 at 2:27 PM H.J. Lu wrote: > > On Wed, Aug 21, 2024 at 2:38 AM Richard Biener > wrote: > > > > On Tue, Aug 20, 2024 at 3:24 PM H.J. Lu wrote: > > > > > > On Tue, Aug 20, 2024 at 2:03 AM Richard Biener > > > wrote: > > > > > > > > On Wed, Aug 14, 2024 at 3:15 PM H.J. Lu wrote: > > > > > > > > > > The new hook allows the linker plugin to distinguish calls to > > > > > claim_file_handler that know the object is being used by the linker > > > > > (from ldmain.c:add_archive_element), from calls that don't know it's > > > > > being used by the linker (from elf_link_is_defined_archive_symbol); in > > > > > the latter case, the plugin should avoid including the unused LTO > > > > > archive > > > > > members in linker output. To get the proper support for archives with > > > > > LTO common symbols, the linker fix for > > > > > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=32083 > > > > > > > > > > is required. > > > > > > > > > > PR lto/116361 > > > > > * lto-plugin.c (claim_file_handler_v2): Include the LTO object > > > > > only if it is known to be used for link output. > > > > > > > > > > Signed-off-by: H.J. Lu > > > > > --- > > > > > lto-plugin/lto-plugin.c | 20 > > > > > 1 file changed, 12 insertions(+), 8 deletions(-) > > > > > > > > > > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c > > > > > index 152648338b9..2d2bfa60d42 100644 > > > > > --- a/lto-plugin/lto-plugin.c > > > > > +++ b/lto-plugin/lto-plugin.c > > > > > @@ -1286,13 +1286,17 @@ claim_file_handler_v2 (const struct > > > > > ld_plugin_input_file *file, int *claimed, > > > > > lto_file.symtab.syms); > > > > >check (status == LDPS_OK, LDPL_FATAL, "could not add symbols"); > > > > > > > > We are still doing add_symbols, shouldn't what we do depend on what > > > > that does? The > > > > > > If status != LDPS_OK, the plugin will abort because of LDPL_FATAL. 
> > > > > > > function comment says > > > > > > > >If KNOWN_USED, the object is known by the linker > > > >to be used, or an older API version is in use that does not provide > > > > that > > > >information; otherwise, the linker is only determining whether this > > > > is > > > >a plugin object and it should not be registered as having offload > > > > data if > > > >not claimed by the plugin. > > > > > > > > where do you check "if not claimed by the plugin"? I think this at > > > > least needs > > > > clarification with the change. > > > > > > See my reply below. > > > > > > > > - LOCK_SECTION; > > > > > - num_claimed_files++; > > > > > - claimed_files = > > > > > - xrealloc (claimed_files, > > > > > - num_claimed_files * sizeof (struct > > > > > plugin_file_info)); > > > > > - claimed_files[num_claimed_files - 1] = lto_file; > > > > > - UNLOCK_SECTION; > > > > > + /* Include it only if it is known to be used for link output. > > > > > */ > > > > > + if (known_used) > > > > > + { > > > > > + LOCK_SECTION; > > > > > + num_claimed_files++; > > > > > + claimed_files = > > > > > + xrealloc (claimed_files, > > > > > + num_claimed_files * sizeof (struct > > > > > plugin_file_info)); > > > > > + claimed_files[num_claimed_files - 1] = lto_file; > > > > > + UNLOCK_SECTION; > > > > > + } > > > > > > > > > >*claimed = 1; > > > > > } > > > > > @@ -1313,7 +1317,7 @@ claim_file_handler_v2 (const struct > > > > > ld_plugin_input_file *file, int *claimed, > > > > >if (*claimed && !obj.offload && offload_files_last_lto == NULL) > > > > > offload_files_last_lto = offload_files_last; > > > > > > > > > > - if (obj.offload && (known_used || obj.found > 0)) > > > > > + if (obj.offload && known_used && obj.found > 0) > > > > > > The offload data is included when it is claimed by the plugin > > > even if known_used is 0. It looks quite odd to me. 
> > > > To me the whole 'known_used' thing looks odd - I would have expected > > the linker to do two round-trips for archives maybe; first with > > known_used == 0, just getting the add_symbol calls (aka, get > > the LTO symbol table), then the linker computes whether the archive > > is used and if it is, re-do the claim_file hook with known_used == 1. > > > > Is that how it is done? > > Yes. > > > Otherwise how should the plugin know whether the file should be added or > > not? > > Will the linker take care of that then? Where is the API documented? I > > think > > Yes, linker will do the right thing after > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208 > > > how known_used is to be used needs better documentation. > > The known documentation is in the comments for > claim_file_handler_v2. OK, I find that lacking. Specifically ", the linker is only determining whether this is a p
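The two round-trips described above (and confirmed by H.J.) can be modelled in a few lines. In this minimal sketch, `plugin_state` and `claim_file` are hypothetical simplifications, not the real `lto-plugin.c` types or signatures. The plugin always reports symbols so the linker can decide whether the archive member is needed. It records the file for LTO output only when called with `known_used` set.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of plugin state; the real plugin keeps an array of
// plugin_file_info guarded by LOCK_SECTION/UNLOCK_SECTION.
struct plugin_state
{
  std::vector<std::string> claimed_files;  // files that reach LTO output
  int symbol_tables_added = 0;             // models add_symbols calls
};

// Hypothetical claim hook following the patched logic.
bool claim_file (plugin_state &st, const std::string &name, bool known_used)
{
  ++st.symbol_tables_added;          // symbols are always reported
  if (known_used)                    // include only if used for link output
    st.claimed_files.push_back (name);
  return true;                       // models *claimed = 1
}
```

In this model the first call (`known_used == false`) only exposes the LTO symbol table. The archive member is added for LTO only if the linker later calls again with `known_used == true`.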
Re: [PATCH] testsuite: Add -fwrapv to signbit-5.c
On 2024-08-20 14:37, Tamar Christina wrote:
-----Original Message-----
From: Richard Biener
Sent: Tuesday, August 20, 2024 12:33 PM
To: Torbjorn SVENSSON
Cc: Jeff Law ; gcc-patches@gcc.gnu.org; Richard Earnshaw ; quic_apin...@quicinc.com; yvan.r...@foss.st.com; Tamar Christina
Subject: Re: [PATCH] testsuite: Add -fwrapv to signbit-5.c

On Fri, Aug 16, 2024 at 4:30 PM Torbjorn SVENSSON wrote:
On 2024-08-16 16:07, Jeff Law wrote:
On 8/16/24 4:12 AM, Torbjörn SVENSSON wrote:
Ok for trunk and releases/gcc-14?

Verified this on x86_64 and arm-none-eabi.
Don't know if the other "truth type" dg-lines can be removed as well.

--

On Cortex-M55 with MVE, the test case fails due to -INT_MIN being undefined. Adding -fwrapv solves the issue.

Regtested on x86_64-pc-linux and arm-none-eabi for Cortex-M0/M3/M4/M7/M33/M55/M85/A7.

gcc/testsuite/ChangeLog:

        * gcc.dg/signbit-5.c: Add -fwrapv and remove x86 exception.

Presumably the -x[i] when i == 0 cases? I'm a bit surprised that doing a -INT_MIN didn't produce -INT_MIN, but it's still a bad thing to do due to the overflow.

On the Cortex-M55 with MVE, -INT_MIN will result in INT_MIN, i.e. a large negative value. The negated INT_MIN value cannot be represented in two's complement form with the same number of bits.

Note this is what should happen everywhere. I think the testcase should simply avoid this special value - not sure why it was written this way. We shouldn't turn the test into -fwrapv only.

Tamar, why did you cover negating INT_MIN here?

Precisely because it is the special case. While -INT_MIN is undefined behaviour without -fwrapv, it was hiding the constant from the language to test that the implementation-defined behavior of scalar and vector code matched. As you pointed out, -INT_MIN should be INT_MIN on most architectures, and that's what the test expects. However, right shift of a negative value is implementation-defined. This likely indicates that this isn't the same for vector and scalar on MVE.
I guess in hindsight the test is doing too much, as it is also testing whether the code can be vectorized, and the implementation-defined behavior testing belongs in the backend. (This was 4 years ago, live and learn 😊.)

Can we use INT_MIN + 1 instead?

Yes, I think that's the better solution; testing -fwrapv here doesn't seem right, as signbit-4.c already tests the wrapv behavior, just with scalar code. Also, using -fwrapv tests a different part of the code, as the pattern is guarded by TYPE_OVERFLOW_UNDEFINED.

Are you going to provide a patch changing this, or am I supposed to do it? I don't mind doing it, but I have a feeling that you know better than I do what is expected here. :)

Kind regards,
Torbjörn

Thanks,
Tamar

Richard.

So, OK for the trunk and release branches. If we need to adjust RISC-V we'll know in a few days :-)

Ok. Pushed as r15-2950 and r14-10592.

Kind regards,
Torbjörn
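The distinction in this thread fits in a few lines. Negating `INT_MIN + 1` is always well defined. Negating `INT_MIN` overflows unless wrapping semantics (what `-fwrapv` provides) are in effect. In this minimal sketch, `wrapping_neg` is a hypothetical helper that emulates the wrapping result portably by doing the arithmetic in `unsigned`.

```cpp
#include <cassert>
#include <climits>

// Negation with wrap-around semantics: unsigned arithmetic wraps, and the
// conversion back to int is modular (guaranteed from C++20; earlier
// standards leave it implementation-defined, and GCC defines it as modular).
int wrapping_neg (int x)
{
  return static_cast<int> (0u - static_cast<unsigned> (x));
}
```

With this helper, `wrapping_neg (INT_MIN)` yields `INT_MIN`. By contrast, `-(INT_MIN + 1)` is simply `INT_MAX` with no overflow at all, which is why that value sidesteps the undefined case.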
[PATCH] tree-optimization/116406 - ICE with int<->float punning prevention
The following does away with the idea of using non-symmetrical testing
of mode_can_transfer_bits in hash-table equality testing. It isn't
feasible to always control query order to maintain consistency.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

        PR tree-optimization/116406
        * tree-ssa-sccvn.cc (vn_reference_eq): Never equate float and
        int when the float mode cannot transfer bits. Do not try to
        anticipate which is the mode we actually load from.

        * gcc.dg/tree-ssa/pr116406.c: New testcase.
        * gcc.dg/tree-ssa/ssa-pre-30.c: On x86 add -msse -mfpmath=sse.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr116406.c   | 21 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c |  1 +
 gcc/tree-ssa-sccvn.cc                      |  3 ++-
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116406.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116406.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116406.c
new file mode 100644
index 000..6643c49218f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116406.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -finstrument-functions-once" } */
+/* { dg-additional-options "-mfpmath=387" { target { x86_64-*-* i?86-*-* } } } */
+
+typedef union {
+  float f32;
+  double f64;
+  long i64;
+} U;
+
+_Bool
+foo (int c, U u)
+{
+  switch (c)
+    {
+    case 1:
+      return u.f32 - u.f64;
+    case 0:
+      return u.i64;
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
index cf9317372d6..29dc1812338 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target int32 } */
 /* { dg-options "-O2 -fdump-tree-pre-details" } */
+/* { dg-additional-options "-msse -mfpmath=sse" { target { x86_64-*-* i?86-*-* } } } */
 
 int f;
 int g;
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 4370d09d9d8..abf7d38d15c 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -838,7 +838,8 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2)
         return false;
     }
   else if (TYPE_MODE (vr1->type) != TYPE_MODE (vr2->type)
-           && !mode_can_transfer_bits (TYPE_MODE (vr1->type)))
+           && (!mode_can_transfer_bits (TYPE_MODE (vr1->type))
+               || !mode_can_transfer_bits (TYPE_MODE (vr2->type))))
     return false;
 
   i = 0;
-- 
2.43.0
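Why symmetry matters: a hash table may compare a probe against a stored entry in either order. An equality predicate must therefore satisfy `eq (a, b) == eq (b, a)`. In this minimal sketch, `Mode`, `can_transfer_bits` and `ref` are hypothetical stand-ins for GCC's modes, `mode_can_transfer_bits` and `vn_reference_t`. Here x87 extended float (`XF`) is assumed unable to transfer arbitrary bits. The sketch contrasts the old, one-sided check with the fixed check on both operands.

```cpp
#include <cassert>

// Hypothetical modes; XF models an x87 mode that cannot round-trip bits.
enum class Mode { SI, SF, XF };

// Hypothetical stand-in for mode_can_transfer_bits.
bool can_transfer_bits (Mode m) { return m != Mode::XF; }

struct ref { Mode mode; int base; };

// Old shape: only vr1's mode is checked, so the result depends on order.
bool eq_old (const ref &vr1, const ref &vr2)
{
  if (vr1.mode != vr2.mode && !can_transfer_bits (vr1.mode))
    return false;
  return vr1.base == vr2.base;
}

// Fixed shape: refuse to equate differing modes if either side
// cannot transfer bits, which makes the predicate symmetric.
bool eq_new (const ref &vr1, const ref &vr2)
{
  if (vr1.mode != vr2.mode
      && (!can_transfer_bits (vr1.mode) || !can_transfer_bits (vr2.mode)))
    return false;
  return vr1.base == vr2.base;
}
```

With `a` in `SI` and `b` in `XF`, `eq_old (a, b)` is true while `eq_old (b, a)` is false. That inconsistency is the kind that can trip hash-table verification. `eq_new` returns false in both orders.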
Re: [PATCH 1/2] libstdc++: Implement P2609R3 changes to the indirect invocability concepts
On Wed, 21 Aug 2024 at 01:40, Patrick Palka wrote: > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps > 14? > > -- >8 -- > > This implements the changes of this C++23 paper as a DR against C++20. It's a little unfortunate that we can't bump the __cpp_lib_ranges macro for C++20 to advertise support for this. That's because for C++20 it's stuck at 202110 which is the last value before P2387R3 "Pipe support for user-defined range adaptors" and P2494R2 "Relaxing range adaptors to allow for move only types", which are not C++20 DRs. We already implement the later P2602R2 "Poison Pills are Too Toxic" as a C++20 DR, and this patch does the same for P2609R3. But that's a more general problem of feature test macros sometimes being too coarse. The alternative would be a new macro for every feature, but the cure would be worse than the disease. This patch is OK for trunk. I think it makes sense for gcc-14 too, but please wait a while before doing the backport. > Note that since the later P2538R1 "ADL-proof std::projected" which we > already implement, we can't use a simple partial specialization to match > specializations of the 'projected' alias template. So instead we identify > such specializations by giving them a pair of distinguishing member aliases > that we can check for. > > libstdc++-v3/ChangeLog: > > * include/bits/iterator_concepts.h (__detail::__indirect_value): > Define for C++23. > (__indirect_value_t): Define for C++23 as per P2609R3. > (iter_common_reference_t): Adjust for C++23 as per P2609R3. > (indirectly_unary_invocable): Likewise. > (indirectly_regular_unary_invocable): Likewise. > (indirect_unary_predicate): Likewise. > (indirect_binary_predicate): Likewise. > (indirect_equivalence_relation): Likewise. > (indirect_strict_weak_order): Likewise. > (__detail::__projected::__type): Define member aliases > __projected_Iter and __projected_Proj providing the > template arguments of the current specialization for C++23. 
> * include/bits/version.def (ranges): Update value for C++23. > * include/bits/version.h: Regenerate. > * testsuite/24_iterators/indirect_callable/p2609r3.cc: New test. > * testsuite/std/ranges/version_c++23.cc: Update expected value > of __cpp_lib_ranges macro. > --- > libstdc++-v3/include/bits/iterator_concepts.h | 61 ++- > libstdc++-v3/include/bits/version.def | 2 +- > libstdc++-v3/include/bits/version.h | 4 +- > .../24_iterators/indirect_callable/p2609r3.cc | 27 > .../testsuite/std/ranges/version_c++23.cc | 2 +- > 5 files changed, 77 insertions(+), 19 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/24_iterators/indirect_callable/p2609r3.cc > > diff --git a/libstdc++-v3/include/bits/iterator_concepts.h > b/libstdc++-v3/include/bits/iterator_concepts.h > index ce0b8a10f88..9306b7bd194 100644 > --- a/libstdc++-v3/include/bits/iterator_concepts.h > +++ b/libstdc++-v3/include/bits/iterator_concepts.h > @@ -552,9 +552,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > concept indirectly_readable >= __detail::__indirectly_readable_impl>; > > + namespace __detail > + { > +template > + struct __indirect_value > + { using type = iter_value_t<_Tp>&; }; > + > +// __indirect_value> is defined later. > + } // namespace __detail > + > + template > +using __indirect_value_t = typename > __detail::__indirect_value<_Tp>::type; > + >template > using iter_common_reference_t > - = common_reference_t, iter_value_t<_Tp>&>; > + = common_reference_t, __indirect_value_t<_Tp>>; > >/// Requirements for writing a value into an iterator's referenced object. 
>template > @@ -710,24 +722,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >template > concept indirectly_unary_invocable = indirectly_readable<_Iter> > - && copy_constructible<_Fn> && invocable<_Fn&, iter_value_t<_Iter>&> > + && copy_constructible<_Fn> && invocable<_Fn&, > __indirect_value_t<_Iter>> >&& invocable<_Fn&, iter_reference_t<_Iter>> >&& invocable<_Fn&, iter_common_reference_t<_Iter>> > - && common_reference_with&>, > + && common_reference_with __indirect_value_t<_Iter>>, >invoke_result_t<_Fn&, > iter_reference_t<_Iter>>>; > >template > concept indirectly_regular_unary_invocable = indirectly_readable<_Iter> >&& copy_constructible<_Fn> > - && regular_invocable<_Fn&, iter_value_t<_Iter>&> > + && regular_invocable<_Fn&, __indirect_value_t<_Iter>> >&& regular_invocable<_Fn&, iter_reference_t<_Iter>> >&& regular_invocable<_Fn&, iter_common_reference_t<_Iter>> > - && common_reference_with&>, > + && common_reference_with __indirect_value_t<_Iter>>, >invoke_result_t<_Fn&, > iter_reference_t<_Iter>>>; >
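The core of P2609R3 is that the "indirect value" of a projected iterator is the projection applied to the underlying iterator's indirect value. It is no longer a reference to the projection's value type. As the patch notes, the real `projected` cannot be matched by partial specialization. Specializations are therefore identified by member aliases. This minimal sketch imitates that detection with hypothetical names (`projected_sketch`, `projected_iter`, `projected_proj`, `indirect_value`). These are not the libstdc++ internals.

```cpp
#include <functional>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical projected-like type exposing its template arguments as
// member aliases so it can be recognized without partial specialization.
template<typename Iter, typename Proj>
  struct projected_sketch
  {
    using projected_iter = Iter;
    using projected_proj = Proj;
  };

// Primary case: an lvalue of the iterator's value type.
template<typename T, typename = void>
  struct indirect_value
  { using type = typename std::iterator_traits<T>::value_type&; };

// Projected case, detected via the member aliases: apply the projection
// to the underlying iterator's indirect value.
template<typename T>
  struct indirect_value<T, std::void_t<typename T::projected_iter,
                                       typename T::projected_proj>>
  {
    using type
      = std::invoke_result_t<typename T::projected_proj&,
                             typename indirect_value<typename T::projected_iter>::type>;
  };

template<typename T>
  using indirect_value_t = typename indirect_value<T>::type;
```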
[PATCH v1 1/2] RISC-V: Add testcases for unsigned vector .SAT_TRUNC form 2
From: Pan Li

This patch adds test cases for the unsigned vector .SAT_TRUNC form 2,
i.e.:

Form 2:
  #define DEF_VEC_SAT_U_TRUNC_FMT_2(NT, WT)                             \
  void __attribute__((noinline))                                        \
  vec_sat_u_trunc_##NT##_##WT##_fmt_2 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT max = (WT)(NT)-1;                                            \
        out[i] = in[i] > max ? (NT)max : (NT)in[i];                     \
      }                                                                 \
  }

  DEF_VEC_SAT_U_TRUNC_FMT_2 (uint32_t, uint64_t)

The test below passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-12.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-9.c: New test.
Signed-off-by: Pan Li --- .../rvv/autovec/unop/vec_sat_u_trunc-10.c | 19 +++ .../rvv/autovec/unop/vec_sat_u_trunc-11.c | 21 + .../rvv/autovec/unop/vec_sat_u_trunc-12.c | 19 +++ .../rvv/autovec/unop/vec_sat_u_trunc-7.c | 19 +++ .../rvv/autovec/unop/vec_sat_u_trunc-8.c | 21 + .../rvv/autovec/unop/vec_sat_u_trunc-9.c | 23 +++ .../rvv/autovec/unop/vec_sat_u_trunc-run-10.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-11.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-12.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-7.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-8.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-9.c | 16 + .../riscv/rvv/autovec/vec_sat_arith.h | 18 +++ 13 files changed, 236 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-9.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c new file mode 100644 index 000..f5084e503eb --- 
/dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-skip-if "" { *-*-* } { "-flto" } } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "../vec_sat_arith.h" + +/* +** vec_sat_u_trunc_uint16_t_uint32_t_fmt_2: +** ... +** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*mf2,\s*ta,\s*ma +** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\) +** vnclipu\.wi\s+v[0-9]+,\s*v[0-9]+,\s*0 +** vse16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\) +** ... +*/ +DEF_VEC_SAT_U_TRUNC_FMT_2 (uint16_t, uint32_t) + +/* { dg-fin
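As a scalar reference for what these tests exercise, the loop body of `DEF_VEC_SAT_U_TRUNC_FMT_2` clamps the wide value to the narrow type's maximum and then narrows. This minimal sketch writes the same loop as a C++ template (`sat_u_trunc_fmt_2` is a hypothetical name). The test above expects the vectorizer to turn this loop into a narrowing clip (`vnclipu.wi ..., 0`) on RVV.

```cpp
#include <cassert>
#include <cstdint>

// Scalar form-2 saturating truncation: out[i] = min (in[i], NT max).
template<typename NT, typename WT>
void sat_u_trunc_fmt_2 (NT *out, const WT *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    {
      WT max = (WT)(NT)-1;            // all-ones in NT, widened back to WT
      out[i] = in[i] > max ? (NT)max : (NT)in[i];
    }
}
```

For `uint32_t` to `uint16_t`, the inputs `{0, 65535, 65536, 0xffffffff}` become `{0, 65535, 65535, 65535}`.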
[PATCH v1 2/2] RISC-V: Add testcases for unsigned vector .SAT_TRUNC form 3
From: Pan Li

This patch adds test cases for the unsigned vector .SAT_TRUNC form 3,
i.e.:

Form 3:
  #define DEF_VEC_SAT_U_TRUNC_FMT_3(NT, WT)                             \
  void __attribute__((noinline))                                        \
  vec_sat_u_trunc_##NT##_##WT##_fmt_3 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT max = (WT)(NT)-1;                                            \
        out[i] = in[i] <= max ? (NT)in[i] : (NT)max;                    \
      }                                                                 \
  }

  DEF_VEC_SAT_U_TRUNC_FMT_3 (uint32_t, uint64_t)

The test below passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-13.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-14.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-15.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-16.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-17.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-18.c: New test.
Signed-off-by: Pan Li --- .../rvv/autovec/unop/vec_sat_u_trunc-13.c | 19 +++ .../rvv/autovec/unop/vec_sat_u_trunc-14.c | 21 + .../rvv/autovec/unop/vec_sat_u_trunc-15.c | 23 +++ .../rvv/autovec/unop/vec_sat_u_trunc-16.c | 19 +++ .../rvv/autovec/unop/vec_sat_u_trunc-17.c | 21 + .../rvv/autovec/unop/vec_sat_u_trunc-18.c | 19 +++ .../rvv/autovec/unop/vec_sat_u_trunc-run-13.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-14.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-15.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-16.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-17.c | 16 + .../rvv/autovec/unop/vec_sat_u_trunc-run-18.c | 16 + .../riscv/rvv/autovec/vec_sat_arith.h | 18 +++ 13 files changed, 236 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-13.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-14.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-15.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-16.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-17.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-18.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c new file mode 100644 index 
000..49bdbdc3606 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-skip-if "" { *-*-* } { "-flto" } } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "../vec_sat_arith.h" + +/* +** vec_sat_u_trunc_uint8_t_uint16_t_fmt_3: +** ... +** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*mf2,\s*ta,\s*ma +** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\) +** vnclipu\.wi\s+v[0-9]+,\s*v[0-9]+,\s*0 +** vse8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\) +** ... +*/ +DEF_VEC_SAT_U_TRUNC_FMT_3 (uint8_t, uint16_t) + +/*
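Form 3 is the same clamp with the comparison flipped (`in[i] <= max ? in[i] : max`). For unsigned values, both forms select `min (in[i], max)`, so the same narrowing-clip pattern should be recognized. This minimal sketch checks the form-3 loop against a plain `min`-style reference. Both template names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scalar form-3 saturating truncation, mirroring DEF_VEC_SAT_U_TRUNC_FMT_3.
template<typename NT, typename WT>
void sat_u_trunc_fmt_3 (NT *out, const WT *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    {
      WT max = (WT)(NT)-1;
      out[i] = in[i] <= max ? (NT)in[i] : (NT)max;
    }
}

// Reference semantics that both form 2 and form 3 reduce to.
template<typename NT, typename WT>
NT sat_u_trunc_ref (WT x)
{
  WT max = (WT)(NT)-1;
  return (NT)(x < max ? x : max);
}
```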
Re: [PATCH 2/2] libstdc++: Implement P2997R1 changes to the indirect invocability concepts
On Wed, 21 Aug 2024 at 01:40, Patrick Palka wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
> 14?
>
> -- >8 --
>
> This implements the changes of this C++26 paper as a DR against C++20.
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/iterator_concepts.h (indirectly_unary_invocable):
>         Relax as per P2997R1.
>         (indirectly_regular_unary_invocable): Likewise.
>         (indirect_unary_predicate): Likewise.
>         (indirect_binary_predicate): Likewise.
>         (indirect_equivalence_relation): Likewise.
>         (indirect_strict_weak_order): Likewise.
>         * version.def (ranges): Update value for C++26.
>         * version.h: Regenerate.
>         * testsuite/24_iterators/indirect_callable/p2997r1.cc: New test.
>         * testsuite/std/ranges/version_c++23.cc: Restrict to C++23 mode.
>         * testsuite/std/ranges/version_c++26.cc: New test.

Can we get rid of version_c++23.cc (and not add version_c++26.cc) and
just expand the check in std/ranges/synopsis.cc instead?

Currently it does:

#ifndef __cpp_lib_ranges
# error "Feature test macro for ranges is missing in "
#elif __cpp_lib_ranges < 201911L
# error "Feature test macro for ranges has wrong value in "
#endif

but that could be:

#ifndef __cpp_lib_ranges
# error "Feature test macro for ranges is missing in "
#elif __cplusplus > 202302 && __cpp_lib_ranges < 202406L
# error "Feature test macro for ranges has wrong value in "
#elif __cplusplus == 202302 && __cpp_lib_ranges < 202302L
# error "Feature test macro for ranges has wrong value in "
#elif __cpp_lib_ranges < 201911L
# error "Feature test macro for ranges has wrong value in "
#endif

or define EXPECTED_VALUE to the appropriate value for each __cplusplus
dialect, then have one test of __cpp_lib_ranges != EXPECTED_VALUE.

This looks fine apart from that quibble.
> --- > libstdc++-v3/include/bits/iterator_concepts.h | 17 ++--- > libstdc++-v3/include/bits/version.def | 5 +++ > libstdc++-v3/include/bits/version.h | 7 +++- > .../24_iterators/indirect_callable/p2997r1.cc | 37 +++ > .../testsuite/std/ranges/version_c++23.cc | 2 +- > .../testsuite/std/ranges/version_c++26.cc | 10 + > 6 files changed, 63 insertions(+), 15 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/24_iterators/indirect_callable/p2997r1.cc > create mode 100644 libstdc++-v3/testsuite/std/ranges/version_c++26.cc > > diff --git a/libstdc++-v3/include/bits/iterator_concepts.h > b/libstdc++-v3/include/bits/iterator_concepts.h > index 9306b7bd194..d849ddc32fc 100644 > --- a/libstdc++-v3/include/bits/iterator_concepts.h > +++ b/libstdc++-v3/include/bits/iterator_concepts.h > @@ -724,7 +724,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > concept indirectly_unary_invocable = indirectly_readable<_Iter> >&& copy_constructible<_Fn> && invocable<_Fn&, > __indirect_value_t<_Iter>> >&& invocable<_Fn&, iter_reference_t<_Iter>> > - && invocable<_Fn&, iter_common_reference_t<_Iter>> >&& common_reference_with __indirect_value_t<_Iter>>, >invoke_result_t<_Fn&, > iter_reference_t<_Iter>>>; > > @@ -733,15 +732,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >&& copy_constructible<_Fn> >&& regular_invocable<_Fn&, __indirect_value_t<_Iter>> >&& regular_invocable<_Fn&, iter_reference_t<_Iter>> > - && regular_invocable<_Fn&, iter_common_reference_t<_Iter>> >&& common_reference_with __indirect_value_t<_Iter>>, >invoke_result_t<_Fn&, > iter_reference_t<_Iter>>>; > >template > concept indirect_unary_predicate = indirectly_readable<_Iter> >&& copy_constructible<_Fn> && predicate<_Fn&, > __indirect_value_t<_Iter>> > - && predicate<_Fn&, iter_reference_t<_Iter>> > - && predicate<_Fn&, iter_common_reference_t<_Iter>>; > + && predicate<_Fn&, iter_reference_t<_Iter>>; > >template > concept indirect_binary_predicate > @@ -750,9 +747,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >&& predicate<_Fn&, 
__indirect_value_t<_I1>, __indirect_value_t<_I2>> >&& predicate<_Fn&, __indirect_value_t<_I1>, iter_reference_t<_I2>> >&& predicate<_Fn&, iter_reference_t<_I1>, __indirect_value_t<_I2>> > - && predicate<_Fn&, iter_reference_t<_I1>, iter_reference_t<_I2>> > - && predicate<_Fn&, iter_common_reference_t<_I1>, > - iter_common_reference_t<_I2>>; > + && predicate<_Fn&, iter_reference_t<_I1>, iter_reference_t<_I2>>; > >template > concept indirect_equivalence_relation > @@ -762,9 +757,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >&& equivalence_relation<_Fn&, __indirect_value_t<_I1>, > iter_reference_t<_I2>> >&& equivalence_relation<_Fn&, iter_reference_t<_I1>, > __indirect_value_t<_I2>> >&& equivalence_relation<_Fn&, iter_reference_t<_I1>, > - iter_reference_t<_I2>> > - && equiva
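The "one EXPECTED_VALUE per dialect" suggestion can also be expressed as a `constexpr` mapping. This minimal sketch uses a hypothetical helper, `expected_ranges_value`. It assumes at least C++20, and it assumes these macro values: 202110L for C++20 (frozen there, as noted in the review of patch 1/2), 202302L for C++23 (P2609R3) and 202406L for C++26 (P2997R1).

```cpp
#include <cassert>

// Hypothetical helper: map a __cplusplus value to the __cpp_lib_ranges
// value a test would expect, so a single comparison replaces the #elif chain.
constexpr long expected_ranges_value (long cplusplus)
{
  return cplusplus > 202302L ? 202406L     // C++26 and later: P2997R1
       : cplusplus == 202302L ? 202302L    // C++23: P2609R3
       : 202110L;                          // C++20
}
```

A test could then check `static_assert (__cpp_lib_ranges == expected_ranges_value (__cplusplus));` after including `<ranges>`.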
Re: [PATCH] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook
On Wed, Aug 21, 2024 at 6:23 AM Richard Biener wrote: > > On Wed, Aug 21, 2024 at 2:27 PM H.J. Lu wrote: > > > > On Wed, Aug 21, 2024 at 2:38 AM Richard Biener > > wrote: > > > > > > On Tue, Aug 20, 2024 at 3:24 PM H.J. Lu wrote: > > > > > > > > On Tue, Aug 20, 2024 at 2:03 AM Richard Biener > > > > wrote: > > > > > > > > > > On Wed, Aug 14, 2024 at 3:15 PM H.J. Lu wrote: > > > > > > > > > > > > The new hook allows the linker plugin to distinguish calls to > > > > > > claim_file_handler that know the object is being used by the linker > > > > > > (from ldmain.c:add_archive_element), from calls that don't know it's > > > > > > being used by the linker (from elf_link_is_defined_archive_symbol); > > > > > > in > > > > > > the latter case, the plugin should avoid including the unused LTO > > > > > > archive > > > > > > members in linker output. To get the proper support for archives > > > > > > with > > > > > > LTO common symbols, the linker fix for > > > > > > > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=32083 > > > > > > > > > > > > is required. > > > > > > > > > > > > PR lto/116361 > > > > > > * lto-plugin.c (claim_file_handler_v2): Include the LTO > > > > > > object > > > > > > only if it is known to be used for link output. > > > > > > > > > > > > Signed-off-by: H.J. 
Lu > > > > > > --- > > > > > > lto-plugin/lto-plugin.c | 20 > > > > > > 1 file changed, 12 insertions(+), 8 deletions(-) > > > > > > > > > > > > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c > > > > > > index 152648338b9..2d2bfa60d42 100644 > > > > > > --- a/lto-plugin/lto-plugin.c > > > > > > +++ b/lto-plugin/lto-plugin.c > > > > > > @@ -1286,13 +1286,17 @@ claim_file_handler_v2 (const struct > > > > > > ld_plugin_input_file *file, int *claimed, > > > > > > lto_file.symtab.syms); > > > > > >check (status == LDPS_OK, LDPL_FATAL, "could not add > > > > > > symbols"); > > > > > > > > > > We are still doing add_symbols, shouldn't what we do depend on what > > > > > that does? The > > > > > > > > If status != LDPS_OK, the plugin will abort because of LDPL_FATAL. > > > > > > > > > function comment says > > > > > > > > > >If KNOWN_USED, the object is known by the linker > > > > >to be used, or an older API version is in use that does not > > > > > provide that > > > > >information; otherwise, the linker is only determining whether > > > > > this is > > > > >a plugin object and it should not be registered as having offload > > > > > data if > > > > >not claimed by the plugin. > > > > > > > > > > where do you check "if not claimed by the plugin"? I think this at > > > > > least needs > > > > > clarification with the change. > > > > > > > > See my reply below. > > > > > > > > > > - LOCK_SECTION; > > > > > > - num_claimed_files++; > > > > > > - claimed_files = > > > > > > - xrealloc (claimed_files, > > > > > > - num_claimed_files * sizeof (struct > > > > > > plugin_file_info)); > > > > > > - claimed_files[num_claimed_files - 1] = lto_file; > > > > > > - UNLOCK_SECTION; > > > > > > + /* Include it only if it is known to be used for link > > > > > > output. 
*/ > > > > > > + if (known_used) > > > > > > + { > > > > > > + LOCK_SECTION; > > > > > > + num_claimed_files++; > > > > > > + claimed_files = > > > > > > + xrealloc (claimed_files, > > > > > > + num_claimed_files * sizeof (struct > > > > > > plugin_file_info)); > > > > > > + claimed_files[num_claimed_files - 1] = lto_file; > > > > > > + UNLOCK_SECTION; > > > > > > + } > > > > > > > > > > > >*claimed = 1; > > > > > > } > > > > > > @@ -1313,7 +1317,7 @@ claim_file_handler_v2 (const struct > > > > > > ld_plugin_input_file *file, int *claimed, > > > > > >if (*claimed && !obj.offload && offload_files_last_lto == NULL) > > > > > > offload_files_last_lto = offload_files_last; > > > > > > > > > > > > - if (obj.offload && (known_used || obj.found > 0)) > > > > > > + if (obj.offload && known_used && obj.found > 0) > > > > > > > > The offload data is included when it is claimed by the plugin > > > > even if known_used is 0. It looks quite odd to me. > > > > > > To me the whole 'known_used' thing looks odd - I would have expected > > > the linker to do two round-trips for archives maybe; first with > > > known_used == 0, just getting the add_symbol calls (aka, get > > > the LTO symbol table), then the linker computes whether the archive > > > is used and if it is, re-do the claim_file hook with known_used == 1. > > > > > > Is that how it is done? > > > > Yes. > > > > > Otherwise how should the plugin know whether the file should be added or > > > not? > > > Will the linker take care of that then? Where is the API documented? I > > > think > > > > Yes, linker will do the right thing after
Re: [RFC/RFA][PATCH v4 06/12] aarch64: Implement new expander for efficient CRC computation
Mariam Arutunian writes: > This patch introduces two new expanders for the aarch64 backend, > dedicated to generate optimized code for CRC computations. > The new expanders are designed to leverage specific hardware capabilities > to achieve faster CRC calculations, > particularly using the crc32, crc32c and pmull instructions when supported > by the target architecture. > > Expander 1: Bit-Forward CRC (crc4) > For targets that support pmul instruction (TARGET_AES), > the expander will generate code that uses the pmull (crypto_pmulldi) > instruction for CRC computation. > > Expander 2: Bit-Reversed CRC (crc_rev4) > The expander first checks if the target supports the CRC32* instruction set > (TARGET_CRC32) > and the polynomial in use is 0x1EDC6F41 (iSCSI) or 0x04C11DB7 (HDLC). If > the conditions are met, > it emits calls to the corresponding crc32* instruction (depending on the > data size and the polynomial). > If the target does not support crc32* but supports pmull, it then uses the > pmull (crypto_pmulldi) instruction for bit-reversed CRC computation. > Otherwise table-based CRC is generated. > > gcc/config/aarch64/ > > * aarch64-protos.h (aarch64_expand_crc_using_pmull): New extern > function declaration. > (aarch64_expand_reversed_crc_using_pmull): Likewise. > * aarch64.cc (aarch64_expand_crc_using_pmull): New function. > (aarch64_expand_reversed_crc_using_pmull): Likewise. > * aarch64.md (crc_rev4): New expander for > reversed CRC. > (crc4): New expander for bit-forward CRC. > * iterators.md (crc_data_type): New mode attribute. > > gcc/testsuite/gcc.target/aarch64/ > > * crc-1-pmul.c: New test. > * crc-10-pmul.c: Likewise. > * crc-12-pmul.c: Likewise. > * crc-13-pmul.c: Likewise. > * crc-14-pmul.c: Likewise. > * crc-17-pmul.c: Likewise. > * crc-18-pmul.c: Likewise. > * crc-21-pmul.c: Likewise. > * crc-22-pmul.c: Likewise. > * crc-23-pmul.c: Likewise. > * crc-4-pmul.c: Likewise. > * crc-5-pmul.c: Likewise. > * crc-6-pmul.c: Likewise. 
> * crc-7-pmul.c: Likewise. > * crc-8-pmul.c: Likewise. > * crc-9-pmul.c: Likewise. > * crc-CCIT-data16-pmul.c: Likewise. > * crc-CCIT-data8-pmul.c: Likewise. > * crc-coremark-16bitdata-pmul.c: Likewise. > * crc-crc32-data16.c: Likewise. > * crc-crc32-data32.c: Likewise. > * crc-crc32-data8.c: Likewise. > * crc-crc32c-data16.c: Likewise. > * crc-crc32c-data32.c: Likewise. > * crc-crc32c-data8.c: Likewise. OK for trunk once the prerequisites are approved. Thanks for all your work on this. Which other parts of the series still need review? I can try to help out with the target-independent bits. (That said, I'm not sure I'm the best person to review the tree recognition pass, but I can have a go.) Richard > > Signed-off-by: Mariam Arutunian > Co-authored-by: Richard Sandiford > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 42639e9efcf..469111e3b17 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1112,5 +1112,8 @@ extern void aarch64_adjust_reg_alloc_order (); > > bool aarch64_optimize_mode_switching (aarch64_mode_entity); > void aarch64_restore_za (rtx); > +void aarch64_expand_crc_using_pmull (scalar_mode, scalar_mode, rtx *); > +void aarch64_expand_reversed_crc_using_pmull (scalar_mode, scalar_mode, rtx > *); > + > > #endif /* GCC_AARCH64_PROTOS_H */ > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 7f0cc47d0f0..0cb8f3e8090 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -30314,6 +30314,137 @@ aarch64_retrieve_sysreg (const char *regname, bool > write_p, bool is128op) >return sysreg->encoding; > } > > +/* Generate assembly to calculate CRC > + using carry-less multiplication instruction. > + OPERANDS[1] is input CRC, > + OPERANDS[2] is data (message), > + OPERANDS[3] is the polynomial without the leading 1. 
*/ > + > +void > +aarch64_expand_crc_using_pmull (scalar_mode crc_mode, > + scalar_mode data_mode, > + rtx *operands) > +{ > + /* Check and keep arguments. */ > + gcc_assert (!CONST_INT_P (operands[0])); > + gcc_assert (CONST_INT_P (operands[3])); > + rtx crc = operands[1]; > + rtx data = operands[2]; > + rtx polynomial = operands[3]; > + > + unsigned HOST_WIDE_INT crc_size = GET_MODE_BITSIZE (crc_mode); > + unsigned HOST_WIDE_INT data_size = GET_MODE_BITSIZE (data_mode); > + gcc_assert (crc_size <= 32); > + gcc_assert (data_size <= crc_size); > + > + /* Calculate the quotient. */ > + unsigned HOST_WIDE_INT > + q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size); > + /* CRC calculation's main part. */ > + if (crc_size > data_size) > +crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size, > + NULL_RTX,
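To make the bit-forward expander's idea concrete, here is a small software model of one CRC step computed with two carry-less multiplies (Barrett reduction), checked against the plain shift-and-XOR loop. It is a sketch under stated assumptions, not the aarch64 expander: `clmul64` stands in for PMULL, `barrett_mu` plays a role similar to what `gf2n_poly_long_div_quotient` presumably computes (the quotient of x^(2n) by P(x)), and all function names are hypothetical. Widths are limited to n <= 31 so every product fits in 64 bits; a full 32-bit CRC needs the 128-bit product that PMULL provides.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less (GF(2)) multiply, low 64 bits: a software stand-in for PMULL.  */
static uint64_t
clmul64 (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* Barrett constant mu = floor (x^(2n) / P(x)), where P(x) = x^n + POLY_LOW
   (POLY_LOW is the polynomial without its leading 1).  Long division over GF(2).  */
static uint64_t
barrett_mu (uint32_t poly_low, unsigned n)
{
  assert (n >= 1 && n <= 31);  /* keeps x^(2n) inside 64 bits */
  uint64_t p = (1ULL << n) | poly_low;
  uint64_t r = 1ULL << (2 * n);
  uint64_t q = 0;
  for (int i = 2 * (int) n; i >= (int) n; i--)
    if ((r >> i) & 1)
      {
        q |= 1ULL << (i - (int) n);
        r ^= p << (i - (int) n);
      }
  return q;
}

/* Reference: feed D data bits, MSB first, into an N-bit forward CRC.  */
static uint32_t
crc_step_bitwise (uint32_t crc, uint32_t data, uint32_t poly_low,
                  unsigned n, unsigned d)
{
  assert (d >= 1 && d <= n && n <= 31);
  crc ^= data << (n - d);
  for (unsigned i = 0; i < d; i++)
    crc = ((crc >> (n - 1)) & 1) ? (crc << 1) ^ poly_low : crc << 1;
  return crc & (uint32_t) ((1ULL << n) - 1);
}

/* Same step via two carry-less multiplies (Barrett reduction).  */
static uint32_t
crc_step_clmul (uint32_t crc, uint32_t data, uint32_t poly_low,
                unsigned n, unsigned d)
{
  assert (d >= 1 && d <= n);
  uint64_t mask = (1ULL << n) - 1;
  uint64_t mu = barrett_mu (poly_low, n);
  uint64_t a = ((uint64_t) crc >> (n - d)) ^ data;        /* A(x), degree < d */
  uint64_t lo = (uint64_t) crc & ((1ULL << (n - d)) - 1); /* bits not consumed */
  uint64_t q = clmul64 (a, mu) >> n;                      /* floor (A x^n / P) */
  uint64_t rem = clmul64 (q, poly_low) & mask;            /* (A x^n) mod P */
  return (uint32_t) ((rem ^ (lo << d)) & mask);
}
```

For example, feeding the bytes of "123456789" through either step with `poly_low = 0x1021`, `n = 16`, `d = 8` and a zero initial value gives the CRC-16/XMODEM check value 0x31C3.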
Re: [PATCH] PR target/116365: Add user-friendly arguments to --param aarch64-autovec-preference=N
Kyrylo Tkachov writes: >> On 20 Aug 2024, at 19:11, Richard Sandiford >> wrote: >> Jennifer Schmitz writes: >>> The param aarch64-autovec-preference=N is a useful tool for testing >>> auto-vectorisation in GCC as it allows the user to force a particular >>> strategy. So far, N could be a numerical value between 0 and 4. >>> This patch adds more user-friendly names to distinguish the options. >>> For backwards compatibility, the numerical values are retained, but are made >>> aliases of the new user-readable strings. >>> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no >>> regression. >>> Ok for mainline? >> >> User-readable names are good, but just to check: who is the intended >> user here? Is it just to save developers/experimenters the effort of >> having to look up the magic numbers (and yes, I have to do that each >> time I use it :) )? Or is it for more general users? >> >> If this is something that is being recommended for general use, >> then we should probably promote it from a --param to a full -m option. >> >> The difference is that --params are intended to be developer options, >> so they can be removed or changed without warning. For this reason, >> if we keep it as a --param, I think we should remove the old numeric >> values and just go with the symbolic ones. >> >> Instead, -m options are intended as user options and are more stable. >> If an -m option becomes redundant later, we'd probably just turn it into >> a no-op rather than remove it. > > I recommended to Jennifer to keep the old names as aliases because I had seen some internal scripts that did some specialist analysis use them and I got the impression there may be more such users. > But those scripts can easily be updated and indeed this is a param that we don’t promise to keep backwards-compatible. > I’ve been in two minds over whether to keep the old options; I’m okay with > switching to just the enum values.
> I wouldn’t want to make this an -m* option as I don’t want to see this used > in production code to override what the compiler should be getting right on > its own. Ah, ok, thanks. In that case I agree we should just keep it as a --param. I've a slight preference for dropping the numeric arguments, but definitely no objection to keeping them if that's more convenient. Richard > > Thanks, > Kyrill > > >> >>> >>> Signed-off-by: Jennifer Schmitz >>> >>> gcc/ >>> PR target/116365 >>> * config/aarch64/aarch64-opts.h >>> (enum aarch64_autovec_preference_enum): New enum. >>> * config/aarch64/aarch64.cc (aarch64_cmp_autovec_modes): >>> Change numerical to enum values. >>> (aarch64_autovectorize_vector_modes): Change numerical to enum >>> values. >>> (aarch64_vector_costs::record_potential_advsimd_unrolling): >>> Change numerical to enum values. >>> * config/aarch64/aarch64.opt: Change param type to enum. >>> * doc/invoke.texi: Update documentation. >>> >>> gcc/testsuite/ >>> PR target/116365 >>> * gcc.target/aarch64/autovec_param_0.c: New test. >>> * gcc.target/aarch64/autovec_param_1.c: Likewise. >>> * gcc.target/aarch64/autovec_param_2.c: Likewise. >>> * gcc.target/aarch64/autovec_param_3.c: Likewise. >>> * gcc.target/aarch64/autovec_param_4.c: Likewise. >>> * gcc.target/aarch64/autovec_param_asimd-only.c: Likewise. >>> * gcc.target/aarch64/autovec_param_default.c: Likewise. >>> * gcc.target/aarch64/autovec_param_prefer-asimd.c: Likewise. >>> * gcc.target/aarch64/autovec_param_prefer-sve.c: Likewise. >>> * gcc.target/aarch64/autovec_param_sve-only.c: Likewise. >>> * gcc.target/aarch64/neoverse_v1_2.c: Update parameter value. >>> * gcc.target/aarch64/neoverse_v1_3.c: Likewise. >>> * gcc.target/aarch64/sve/cond_asrd_1.c: Likewise. >>> * gcc.target/aarch64/sve/cond_cnot_4.c: Likewise. >>> * gcc.target/aarch64/sve/cond_unary_5.c: Likewise. >>> * gcc.target/aarch64/sve/cond_uxt_5.c: Likewise. >>> * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise. 
>>> * gcc.target/aarch64/sve/pr98268-1.c: Likewise. >>> * gcc.target/aarch64/sve/pr98268-2.c: Likewise. >>> >>> From 2e8693143f1c9c0668dea7bad77b3eadac6a4835 Mon Sep 17 00:00:00 2001 >>> From: Jennifer Schmitz >>> Date: Mon, 19 Aug 2024 08:42:55 -0700 >>> Subject: [PATCH] PR target/116365: Add user-friendly arguments to --param >>> aarch64-autovec-preference=N >>> >>> The param aarch64-autovec-preference=N is a useful tool for testing >>> auto-vectorisation in GCC as it allows the user to force a particular >>> strategy. So far, N could be a numerical value between 0 and 4. >>> This patch adds more user-friendly names to distinguish the options. >>> For backwards compatibility, the numerical values are retained, but are made >>> aliases of the new user-readable strings. >>> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no >>> regression. >>> Ok for mainline
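As a rough illustration of what the patch does at the option level, the sketch below maps the new symbolic arguments, and the legacy numeric aliases, onto one enum. The enum and function names are hypothetical, the symbolic spellings are taken from the new test file names, and the numeric mapping follows the existing documentation of values 0 to 4 as we understand it; the real patch uses GCC's `.opt` enum machinery rather than hand-written parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the real enum is added to aarch64-opts.h.  */
enum autovec_preference
{
  AUTOVEC_DEFAULT,       /* 0: use the default heuristics */
  AUTOVEC_ASIMD_ONLY,    /* 1: Advanced SIMD only */
  AUTOVEC_SVE_ONLY,      /* 2: SVE only */
  AUTOVEC_PREFER_ASIMD,  /* 3: both; prefer Advanced SIMD on equal cost */
  AUTOVEC_PREFER_SVE     /* 4: both; prefer SVE on equal cost */
};

static const struct
{
  const char *name;
  enum autovec_preference value;
} autovec_names[] = {
  { "default", AUTOVEC_DEFAULT },
  { "asimd-only", AUTOVEC_ASIMD_ONLY },
  { "sve-only", AUTOVEC_SVE_ONLY },
  { "prefer-asimd", AUTOVEC_PREFER_ASIMD },
  { "prefer-sve", AUTOVEC_PREFER_SVE },
  /* Legacy numeric aliases.  */
  { "0", AUTOVEC_DEFAULT },
  { "1", AUTOVEC_ASIMD_ONLY },
  { "2", AUTOVEC_SVE_ONLY },
  { "3", AUTOVEC_PREFER_ASIMD },
  { "4", AUTOVEC_PREFER_SVE },
};

/* Look up ARG; return false for an unknown value (the option machinery
   would report an error in that case).  */
static bool
parse_autovec_preference (const char *arg, enum autovec_preference *out)
{
  assert (arg != NULL && out != NULL);
  for (size_t i = 0; i < sizeof autovec_names / sizeof autovec_names[0]; i++)
    if (strcmp (arg, autovec_names[i].name) == 0)
      {
        *out = autovec_names[i].value;
        return true;
      }
  return false;
}
```

Dropping the numeric arguments, as Richard slightly prefers for a --param, would amount to deleting the last five table rows.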
Re: [PATCH v2] aarch64: Implement popcountti2 pattern [PR113042]
Andrew Pinski writes: > When CSSC is not enabled, 128bit popcount can be implemented > just via the vector (v16qi) cnt instruction followed by a reduction, > like how the 64bit one is currently implemented instead of > splitting into 2 64bit popcount. > > Changes since v1: > * v2: Make operand 0 be DImode instead of TImode and simplify. > > Built and tested for aarch64-linux-gnu. > > PR target/113042 > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (popcountti2): New define_expand. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/popcnt10.c: New test. > * gcc.target/aarch64/popcnt9.c: New test. OK, thanks. Richard > > Signed-off-by: Andrew Pinski > --- > gcc/config/aarch64/aarch64.md | 13 +++ > gcc/testsuite/gcc.target/aarch64/popcnt10.c | 25 + > gcc/testsuite/gcc.target/aarch64/popcnt9.c | 25 + > 3 files changed, 63 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt10.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt9.c > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 12dcc16529a..c54b29cd64b 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -5378,6 +5378,19 @@ (define_expand "popcount<mode>2" > } > }) > > +(define_expand "popcountti2" > + [(match_operand:DI 0 "register_operand") > + (match_operand:TI 1 "register_operand")] > + "TARGET_SIMD && !TARGET_CSSC" > +{ > + rtx v = gen_reg_rtx (V16QImode); > + rtx v1 = gen_reg_rtx (V16QImode); > + emit_move_insn (v, gen_lowpart (V16QImode, operands[1])); > + emit_insn (gen_popcountv16qi2 (v1, v)); > + emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (operands[0], v1)); > + DONE; > +}) > + > (define_insn "clrsb<mode>2" >[(set (match_operand:GPI 0 "register_operand" "=r") > (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))] > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt10.c > b/gcc/testsuite/gcc.target/aarch64/popcnt10.c > new file mode 100644 > index 000..4d01fc67022 > --- /dev/null > +++
b/gcc/testsuite/gcc.target/aarch64/popcnt10.c > @@ -0,0 +1,25 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > +/* PR target/113042 */ > + > +#pragma GCC target "+cssc" > + > +/* > +** h128: > +** ldp x([0-9]+), x([0-9]+), \[x0\] > +** cnt x([0-9]+), x([0-9]+) > +** cnt x([0-9]+), x([0-9]+) > +** add w0, w([0-9]+), w([0-9]+) > +** ret > +*/ > + > + > +unsigned h128 (const unsigned __int128 *a) { > + return __builtin_popcountg (a[0]); > +} > + > +/* popcount with CSSC should be split into 2 sections. */ > +/* { dg-final { scan-tree-dump-not "POPCOUNT " "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " __builtin_popcount" 2 "optimized" } } > */ > + > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt9.c > b/gcc/testsuite/gcc.target/aarch64/popcnt9.c > new file mode 100644 > index 000..c778fc7f420 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt9.c > @@ -0,0 +1,25 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > +/* PR target/113042 */ > + > +#pragma GCC target "+nocssc" > + > +/* > +** h128: > +** ldr q([0-9]+), \[x0\] > +** cnt v([0-9]+).16b, v\1.16b > +** addv b([0-9]+), v\2.16b > +** fmov w0, s\3 > +** ret > +*/ > + > + > +unsigned h128 (const unsigned __int128 *a) { > + return __builtin_popcountg (a[0]); > +} > + > +/* There should be only one POPCOUNT. */ > +/* { dg-final { scan-tree-dump-times "POPCOUNT " 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not " __builtin_popcount" "optimized" } } */ > +
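The two `h128` expectations above encode two ways of counting the bits of a 128-bit value. A plain C model of both, with hypothetical function names, may help when reading the patterns: without CSSC the value is treated as 16 bytes whose per-byte counts (CNT on `.16b`) are summed by one horizontal add (ADDV), while with CSSC it is split into two 64-bit counts.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set in one byte: what CNT computes for each .16b lane.  */
static unsigned
popcount_byte (uint8_t b)
{
  unsigned c = 0;
  while (b)
    {
      b &= (uint8_t) (b - 1);  /* clear the lowest set bit */
      c++;
    }
  return c;
}

/* Non-CSSC shape: 16 per-byte counts followed by one horizontal add.  */
static unsigned
popcount128_bytes (unsigned __int128 x)
{
  unsigned sum = 0;
  for (int i = 0; i < 16; i++)
    sum += popcount_byte ((uint8_t) (x >> (8 * i)));
  assert (sum <= 128);  /* small enough for a byte-sized ADDV result */
  return sum;
}

/* CSSC shape: two 64-bit popcounts added together.  */
static unsigned
popcount128_split (unsigned __int128 x)
{
  return (unsigned) __builtin_popcountll ((uint64_t) x)
         + (unsigned) __builtin_popcountll ((uint64_t) (x >> 64));
}
```

Because the total is at most 128, reducing the byte counts into a single byte register cannot overflow.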
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
(Resend since the previous one has no subject). > On Aug 21, 2024, at 04:44, Richard Biener wrote: > > On Tue, Aug 20, 2024 at 3:41 PM Qing Zhao wrote: >> >> >> >>> On Aug 20, 2024, at 05:58, Richard Biener >>> wrote: >>> >>> On Tue, Aug 13, 2024 at 5:34 PM Qing Zhao wrote: With the addition of the 'counted_by' attribute and its wide roll-out within the Linux kernel, a use case has been found that would be very nice to have for object allocators: being able to set the counted_by counter variable without knowing its name. For example, given: struct foo { ... int counter; ... struct bar array[] __attribute__((counted_by (counter))); } *p; The existing Linux object allocators are roughly: #define alloc(P, FAM, COUNT) ({ \ size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \ kmalloc(__size, GFP); \ }) Right now, any addition of a counted_by annotation must also include an open-coded assignment of the counter variable after the allocation: p = alloc(p, array, how_many); p->counter = how_many; In order to avoid the tedious and error-prone work of manually adding the open-coded counted-by initializations everywhere in the Linux kernel, a new GCC builtin __builtin_get_counted_by would be very useful to help the adoption of the counted-by attribute. -- Built-in Function: TYPE __builtin_get_counted_by (PTR) The built-in function '__builtin_get_counted_by' checks whether the array object pointed to by the pointer PTR has another object associated with it that represents the number of elements in the array object through the 'counted_by' attribute (i.e. the counted-by object). If so, it returns a pointer to the corresponding counted-by object. If no such counted-by object exists, it returns a NULL pointer. This built-in function is only available in C for now. The argument PTR must be a pointer to an array.
The TYPE of the returned value must be a pointer type pointing to the corresponding type of the counted-by object or a pointer type pointing to the SIZE_T in case of a NULL pointer being returned. With this new builtin, the central allocator could be updated to: #define alloc(P, FAM, COUNT) ({ \ typeof(P) __p; \ size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \ __p = kmalloc(__size, GFP); \ if (__builtin_get_counted_by (__p->FAM)) \ *(__builtin_get_counted_by(__p->FAM)) = COUNT; \ __p; \ }) And then structs can gain the counted_by attribute without needing additional open-coded counter assignments for each struct, and unannotated structs could still use the same allocator. >>> >>> Did you consider a __builtin_set_counted_by (PTR, VALUE)? >> >> Yes, that’s the initial request from Kees. -) >> >> The title of PR116016 is: add __builtin_set_counted_by(P->FAM, COUNT) or >> equivalent >> >> After extensive discussion (Martin Uecker raised the initial idea in >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016#c24, more discussions >> followed, till comments #31). we decided to provide >> __builtin_get_counted_by(PTR) instead of __builtin_set_counted_by(PTR, >> VALUE) due to the following two reasons: >> >> 1. __builtin_get_counted_by should be enough to provide the functionality, >> and even simpler; >> 2. More flexible to be used by the programmer to be able to both WRITE and >> READ the counted-by field. >> >> >> >>> >>> Note that __builtin_get_counted_by to me suggests it returns the >>> value and not a pointer to the value. >> >> The syntax of __builtin_get_counted_by is: >> >> TYPE __builtin_get_counted_by (PTR) >> >> The returned value is: >> >> returns a pointer to the corresponding >>counted-by object. If such counted-by object does not exist, >>returns a NULL pointer. >> >> This built-in function is only available in C for now. >> >>The argument PTR must be a pointer to an array. 
The TYPE of the >>returned value must be a pointer type pointing to the corresponding >>type of the counted-by object or a pointer type pointing to the >>SIZE_T in case of a NULL pointer being returned. >> >> >>> A more proper language extension might involve a keyword >>> like __real, so __counted_by X would produce an lvalue, selecting >>> the counted-by member. >> >> Yes, if the returned value could be a LVALUE instead of a Pointer, that’s >> even simpler and cleaner. >> However, then as you mentioned below, another builtin >> “__builtin_has_attribute(PTR, counted_by)” need >> to be queried first to make sure the counted_by field exists. >> >> We have discussed this approach, and I preferred this approach too. >> >> However, the main reason we gave up on tha
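Since `__builtin_get_counted_by` and the `counted_by` attribute may not be available in a given compiler, the following sketch emulates the proposed allocator pattern with a hypothetical `_Generic`-based `GET_COUNTED_BY` macro that lists annotated struct types by hand. The real builtin would instead read the attribute, and it takes the flexible array member (`__p->FAM`) rather than the struct pointer. Like the proposal, the macro yields a pointer to the counter or a null `size_t *`, which keeps `*GET_COUNTED_BY (p) = COUNT` well typed even for unannotated structs. `malloc` replaces `kmalloc`; the statement expression and `__typeof__` are GNU C extensions, as in the original macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct bar { int x; };

/* In the proposal: struct bar array[] __attribute__((counted_by (counter))).  */
struct foo { int other; int counter; struct bar array[]; };

/* Not annotated: there is no counted-by object.  */
struct baz { size_t n; struct bar array[]; };

/* Hypothetical stand-in for __builtin_get_counted_by: a pointer to the
   counter for known annotated types, or a null size_t pointer otherwise.  */
#define GET_COUNTED_BY(P)                                \
  _Generic ((P),                                         \
            struct foo *: &((struct foo *) (P))->counter, \
            default: (size_t *) NULL)

/* Allocator in the style of the proposed kernel macro.  */
#define ALLOC(P, FAM, COUNT)                                               \
  ({                                                                        \
    __typeof__ (P) p_ = malloc (sizeof (*(P)) + sizeof (*(P)->FAM) * (COUNT)); \
    if (p_)                                                                 \
      {                                                                     \
        __typeof__ (GET_COUNTED_BY (p_)) cnt_ = GET_COUNTED_BY (p_);        \
        if (cnt_)                                                           \
          *cnt_ = (COUNT);                                                  \
      }                                                                     \
    p_;                                                                     \
  })
```

Unlike the macro quoted in the proposal, this version only writes the counter after checking that the allocation succeeded.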
Re: [PATCH] optabs-query: Guard smallest_int_mode_for_size [PR115495].
Richard Biener writes: > On Wed, Aug 21, 2024 at 8:37 AM Robin Dapp wrote: >> >> Hi, >> >> in get_best_extraction_insn we use smallest_int_mode_for_size with >> struct_bits as size argument. In PR115495 struct_bits = 256 and we >> don't have a mode for that. This patch just bails for such cases. >> >> This does not happen on the current trunk anymore (so the test passes >> unpatched) but we've seen it internally. Does it still make sense >> to install it (and backport to 14)? >> >> Bootstrapped and regtested on x86 and aarch64. Regtested on rv64gcv. >> >> Regards >> Robin >> >> PR middle-end/115495 >> >> gcc/ChangeLog: >> >> * optabs-query.cc (get_best_extraction_insn): Return if >> smallest_int_mode_for_size might not find a mode. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/rvv/autovec/pr115495.c: New test. >> --- >> gcc/optabs-query.cc | 4 >> gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c | 9 + >> 2 files changed, 13 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c >> >> diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc >> index 93c1d7b8485..dc2690e720f 100644 >> --- a/gcc/optabs-query.cc >> +++ b/gcc/optabs-query.cc >> @@ -208,6 +208,10 @@ get_best_extraction_insn (extraction_insn *insn, >> machine_mode field_mode) >> { >>opt_scalar_int_mode mode_iter; >> + >> + if (maybe_gt (struct_bits, GET_MODE_PRECISION (MAX_MODE_INT))) >> +return false; >> + >>FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits)) > > I think we instead should change this iteration to use FOR_EACH_MODE_IN_CLASS > (like smallest_mode_for_size does) and skip to small modes? I can't remember whether we rely on the int_n stuff here. (If we do though, it'd only be in a limited way, since the loop only tries int_n for the first size.) An alternative would be to make smallest_int_mode_for_size return an optional mode, which arguably it should be doing anyway. 
Thanks, Richard > >> { >>scalar_int_mode mode = mode_iter.require (); >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c >> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c >> new file mode 100644 >> index 000..bbf4d720f63 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115495.c >> @@ -0,0 +1,9 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */ >> + >> +extern short a[]; >> +short b; >> +int main() { >> + for (char c = 0; c < 18; c += 1) >> +a[c + 0] = b; >> +} >> -- >> 2.46.0 >>
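Richard's alternative, having `smallest_int_mode_for_size` return an optional mode, can be modelled in plain C as a lookup that reports failure instead of asserting. The width table and function name below are hypothetical stand-ins for the target's integer modes; inside GCC the natural return type would presumably be `opt_scalar_int_mode`, the type `mode_iter` already has in the quoted loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical integer mode widths for a target (cf. QImode .. TImode).  */
static const unsigned int_mode_bits[] = { 8, 16, 32, 64, 128 };

/* Optional-style lookup: store the smallest available integer width that
   holds BITS bits in *WIDTH, or return false if there is none.  */
static bool
smallest_int_width_for_size (unsigned bits, unsigned *width)
{
  assert (width != NULL);
  for (size_t i = 0; i < sizeof int_mode_bits / sizeof int_mode_bits[0]; i++)
    if (int_mode_bits[i] >= bits)
      {
        *width = int_mode_bits[i];
        return true;
      }
  return false;
}
```

A caller such as `get_best_extraction_insn` could then simply bail out when the lookup fails for `struct_bits = 256`, instead of needing a separate `maybe_gt` guard.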
[PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook
This hook allows the BFD linker plugin to distinguish calls to claim_file_handler that know the object is being used by the linker (from ldmain.c:add_archive_element), from calls that don't know it's being used by the linker (from elf_link_is_defined_archive_symbol); in the latter case, the plugin should avoid including the unused LTO archive members in link output. To get the proper support for archives with LTO common symbols, the linker fix commit a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208 Author: H.J. Lu Date: Wed Aug 14 20:50:02 2024 -0700 lto: Don't include unused LTO archive members in output is required. PR lto/116361 * lto-plugin.c (claim_file_handler_v2): Rename claimed to can_be_claimed. Include the LTO object only if it is known to be included in link output. Signed-off-by: H.J. Lu --- lto-plugin/lto-plugin.c | 53 - 1 file changed, 31 insertions(+), 22 deletions(-) diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index 152648338b9..61b0de62f52 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -1191,16 +1191,19 @@ process_offload_section (void *data, const char *name, off_t offset, off_t len) return 1; } -/* Callback used by a linker to check if the plugin will claim FILE. Writes - the result in CLAIMED. If KNOWN_USED, the object is known by the linker - to be used, or an older API version is in use that does not provide that - information; otherwise, the linker is only determining whether this is - a plugin object and it should not be registered as having offload data if - not claimed by the plugin. */ +/* Callback used by a linker to check if the plugin can claim FILE. + Writes the result in CAN_BE_CLAIMED. If KNOWN_USED != 0, the object + is known by the linker to be included in link output, or an older API + version is in use that does not provide that information. Otherwise, + the linker is only determining whether this is a plugin object and + only the symbol table is needed by the linker. 
In this case, the + object should not be included in link output and this function will + be called by the linker again with KNOWN_USED != 0 after the linker + decides the object should be included in link output. */ static enum ld_plugin_status -claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed, - int known_used) +claim_file_handler_v2 (const struct ld_plugin_input_file *file, + int *can_be_claimed, int known_used) { enum ld_plugin_status status; struct plugin_objfile obj; @@ -1229,7 +1232,7 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed, } lto_file.handle = file->handle; - *claimed = 0; + *can_be_claimed = 0; obj.file = file; obj.found = 0; obj.offload = false; @@ -1286,15 +1289,19 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed, lto_file.symtab.syms); check (status == LDPS_OK, LDPL_FATAL, "could not add symbols"); - LOCK_SECTION; - num_claimed_files++; - claimed_files = - xrealloc (claimed_files, - num_claimed_files * sizeof (struct plugin_file_info)); - claimed_files[num_claimed_files - 1] = lto_file; - UNLOCK_SECTION; + /* Include it only if it is known to be used for link output. */ + if (known_used) + { + LOCK_SECTION; + num_claimed_files++; + claimed_files = + xrealloc (claimed_files, + num_claimed_files * sizeof (struct plugin_file_info)); + claimed_files[num_claimed_files - 1] = lto_file; + UNLOCK_SECTION; + } - *claimed = 1; + *can_be_claimed = 1; } LOCK_SECTION; @@ -1310,10 +1317,10 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed, /* If this is an LTO file without offload, and it is the first LTO file, save the pointer to the last offload file in the list. Further offload LTO files will be inserted after it, if any. 
*/ - if (*claimed && !obj.offload && offload_files_last_lto == NULL) + if (*can_be_claimed && !obj.offload && offload_files_last_lto == NULL) offload_files_last_lto = offload_files_last; - if (obj.offload && (known_used || obj.found > 0)) + if (obj.offload && known_used && obj.found > 0) { /* Add file to the list. The order must be exactly the same as the final order after recompilation and linking, otherwise host and target tables @@ -1324,7 +1331,9 @@ claim_file_handler_v2 (const struct ld_plugin_input_file *file, int *claimed, ofld->name = lto_file.name; ofld->next = NULL; - if (*claimed && offload_files_last_lto == NULL && file->offset != 0 + if (*can_be_claimed + && offload_files_last_lto == NULL + && file->offset != 0 && go
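The two-phase protocol described in the new function comment can be sketched as a tiny model: a first call with `known_used == 0` only answers whether the file can be claimed, and the file is recorded for link output only when the linker calls again with `known_used != 0`. The struct and function below are hypothetical simplifications that omit locking, symbol tables and offload handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLAIMED 16

/* Hypothetical, heavily simplified model of the plugin's bookkeeping.  */
struct plugin_model
{
  const char *claimed_files[MAX_CLAIMED];  /* objects that go to link output */
  size_t num_claimed_files;
};

/* Return true if NAME can be claimed (it is an LTO object); register it
   for link output only when KNOWN_USED is set.  */
static bool
claim_file_model (struct plugin_model *pl, const char *name, bool is_lto,
                  bool known_used)
{
  assert (pl != NULL && name != NULL);
  if (!is_lto)
    return false;
  /* The real handler calls add_symbols here in both phases.  */
  if (known_used)
    {
      assert (pl->num_claimed_files < MAX_CLAIMED);
      pl->claimed_files[pl->num_claimed_files++] = name;
    }
  return true;
}
```

An archive member that the linker probes but never selects is therefore answered as claimable without ever reaching the list of claimed files.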
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Wednesday, 21.08.2024 at 14:12 +, Qing Zhao wrote: ... > > > > > > + if (__builtin_get_counted_by (__p->FAM)) \ > > > + *(__builtin_get_counted_by(__p->FAM)) = COUNT; \ > > > > > > How to improve it? (Thanks a lot for your suggestion). > > > > There's lack of syntactic guarantee that __builtin_get_counted_by (...) != > > 0 is > > a constant expression. __builtin_set_counted_by (...) would avoid this > > when it would be documented to expand to nothing for a type without a > > counted_by > > member. Does it matter? > > Writing > > > > size_t __fake; > > __builtin_choose_expr (__builtin_get_counted_by (__p->FAM) != 0, > > > > *(__builtin_get_counted_by(__p->FAM)), __fake) = COUNT; > > > > would ensure this but of course requiring the __fake lvalue is ugly, too. > > Yes, you are right. When I wrote the testing case, I felt it was weird too. (:- > > And another issue with the returned value of __builtin_get_counted_by(PTR) > is, since it returns a pointer, the TYPE of the pointee matters, especially > when it returns a NULL pointer, I used a pointer type pointing to the size_t in > case of a NULL pointer being returned to avoid some strict-aliasing issue. > please see PR116316, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116316) for > details. It needs to return a pointer to the actual type of the count field. For NULL, it probably needs to be size_t* for above assignment to work. But if we changed it to return a void pointer, we could make this a compile-time check: auto ret = __builtin_get_counted_by(__p->FAM); _Generic(ret, void*: (void)0, default: *ret = COUNT); > > Yes, I do feel that the approach __builtin_get_counted_by is not very good. > Maybe it’s better to provide > A. __builtin_set_counted_by > or > B. The unary operator __counted_by(PTR) to return an lvalue, in this case, > we need a __builtin_has_attribute first to check whether PTR has the > counted_by attribute first. You could potentially do the same with __counted_by and test for type void.
_Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = COUNT); Martin > > Any suggestion? > > thanks. > > Qing > > > > > > Richard. > > > > > > > > Qing > > > > > > > > > > > No objection to the patch but I wanted to share my thoughts here. > > > > > > > > Richard. > > > > > > > > > Bootstrapped and regression tested on both X86 and aarch64, no issue. > > > > > > > > > > Okay for trunk? > > > > > > > > > > thanks. > > > > > > > > > > Qing. > > > > > > > > > > > > > > > PR c/116016 > > > > > > > > > > gcc/c-family/ChangeLog: > > > > > > > > > > * c-common.cc: Add new __builtin_get_counted_by. > > > > > * c-common.h (enum rid): Add RID_BUILTIN_GET_COUNTED_BY. > > > > > > > > > > gcc/c/ChangeLog: > > > > > > > > > > * c-decl.cc (names_builtin_p): Add RID_BUILTIN_GET_COUNTED_BY. > > > > > * c-parser.cc (has_counted_by_object): New routine. > > > > > (get_counted_by_ref): New routine. > > > > > (c_parser_postfix_expression): Handle New > > > > > RID_BUILTIN_GET_COUNTED_BY. > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * doc/extend.texi: Add documentation for > > > > > __builtin_get_counted_by. > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > * gcc.dg/builtin-get-counted-by-1.c: New test. > > > > > * gcc.dg/builtin-get-counted-by.c: New test. 
> > > > > --- > > > > > gcc/c-family/c-common.cc | 1 + > > > > > gcc/c-family/c-common.h | 1 + > > > > > gcc/c/c-decl.cc | 1 + > > > > > gcc/c/c-parser.cc | 72 +++ > > > > > gcc/doc/extend.texi | 55 +++ > > > > > .../gcc.dg/builtin-get-counted-by-1.c | 91 +++ > > > > > gcc/testsuite/gcc.dg/builtin-get-counted-by.c | 54 +++ > > > > > 7 files changed, 275 insertions(+) > > > > > create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by-1.c > > > > > create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by.c > > > > > > > > > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc > > > > > index e7e371fd26f..4b27c6bfeeb 100644 > > > > > --- a/gcc/c-family/c-common.cc > > > > > +++ b/gcc/c-family/c-common.cc > > > > > @@ -430,6 +430,7 @@ const struct c_common_resword c_common_reswords[] > > > > > = > > > > > { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY }, > > > > > { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY }, > > > > > { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 }, > > > > > + { "__builtin_get_counted_by", RID_BUILTIN_GET_COUNTED_BY, D_CONLY > > > > > }, > > > > > { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 }, > > > > > { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY }, > > > > > { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 }, > > > > > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h > > > > > i
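Martin's `_Generic` idea can be tried out today with a stand-in for the builtin. One caveat: every association of a generic selection must still be a valid expression even when it is not selected, so an association such as `default: *ret = COUNT` would be rejected by the compiler when `ret` has type `void *`. The illustrative sketch below works around this by selecting a *pointer* inside `_Generic` and dereferencing it outside, sending the store to a throwaway compound literal when there is no counter. `COUNT_REF`, `SET_COUNT`, the two helper functions and both structs are hypothetical names invented here; they are not part of the patch or of GCC.

```c
#include <stddef.h>

/* Hypothetical structs: one FAM with a counter field, one without.
   With the real feature, fam in the first struct would carry
   __attribute__((counted_by (n))); it is omitted so this sketch
   builds on compilers that lack the attribute.  */
struct with_count    { size_t n; int fam[]; };
struct without_count { int tag;  int fam[]; };

/* Hypothetical stand-ins for the proposed builtin: a pointer to the
   counter, or a null void pointer when there is no counter.  */
static inline size_t *with_count_ref (struct with_count *p) { return &p->n; }
static inline void *without_count_ref (struct without_count *p)
{
  (void) p;
  return NULL;
}

/* Select the helper by the static type of p, then call it.  */
#define COUNT_REF(p)                                        \
  _Generic ((p), struct with_count *: with_count_ref,       \
                 struct without_count *: without_count_ref) (p)

/* Both associations are valid expressions whatever p is; only the
   selected one is evaluated.  For void * the store goes to an unused
   compound literal instead of a real counter.  */
#define SET_COUNT(p, count)                                 \
  (*_Generic (COUNT_REF (p),                                \
              void *: &(size_t){0},                         \
              default: COUNT_REF (p)) = (count))
```

With the real builtin, `COUNT_REF (p)` would correspond to `__builtin_get_counted_by (p->FAM)`, assuming it were changed to return a void pointer when there is no counter, as Martin suggests; that void-pointer form is what makes the type test possible.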
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Wednesday, 21.08.2024 at 16:34 +0200, Martin Uecker wrote: > On Wednesday, 21.08.2024 at 14:12 +, Qing Zhao wrote: > > > > > Yes, I do feel that the approach __builtin_get_counted_by is not very good. > > Maybe it’s better to provide > > A. __builtin_set_counted_by > > or > > B. The unary operator __counted_by(PTR) to return a Lvalue, in this case, > > we need a __builtin_has_attribute first to check whether PTR has the > > counted_by attribute first. > > You could potentially do the same __counted_by and test for type void. > > _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = COUNT); But just doing A. also seems ok. Martin
Re: [wwwdocs v2] gcc-15: Mention c++ header dependency changes () in porting_to.html
On Wed 2024-08-21 09:50:39, Jonathan Wakely wrote: > On Wed, 21 Aug 2024 at 09:48, Filip Kastl wrote: > > > > Hi, > > > > this is the second version of my patch. See version 1 here: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659584.html > > > > Changes made: > > - Removed plural when referring to the single changed header. From the two > > versions of the text I considered I chose the one with less changes as > > Jonathan suggested. > > - Changed "in libstdc++" to "within libstdc++". > > > > Validated with the W3 Validator. Is this ok to be pushed? > > LGTM. I think I can approve this, since it's documenting libstdc++ > changes and I can approve libstdc++ patches, so unless Gerald has any > further suggestions, please do push - thanks! > I've waited a few hours in case Gerald chimes in. I'm going to push it now. Hope that's ok. Thanks for the feedback! Filip Kastl > > > > Cheers, > > Filip Kastl > > > > > > --- > > htdocs/gcc-15/changes.html| 3 +- > > htdocs/gcc-15/porting_to.html | 54 +++ > > 2 files changed, 55 insertions(+), 2 deletions(-) > > create mode 100644 htdocs/gcc-15/porting_to.html > > > > diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html > > index fe7cf3c1..d0d6d147 100644 > > --- a/htdocs/gcc-15/changes.html > > +++ b/htdocs/gcc-15/changes.html > > @@ -17,9 +17,8 @@ > > > > This page is a "brief" summary of some of the huge number of improvements > > in GCC 15. > > - > > > > diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html > > new file mode 100644 > > index ..702cf507 > > --- /dev/null > > +++ b/htdocs/gcc-15/porting_to.html > > @@ -0,0 +1,54 @@ > > + > > + > > + > > + > > + > > +Porting to GCC 15 > > +https://gcc.gnu.org/gcc.css";> > > + > > + > > + > > +Porting to GCC 15 > > + > > + > > +The GCC 15 release series differs from previous GCC releases in > > +a number of ways. 
Some of these are a result > > +of bug fixing, and some old behaviors have been intentionally changed > > +to support new standards, or relaxed in standards-conforming ways to > > +facilitate compilation or run-time performance. > > + > > + > > + > > +Some of these changes are user visible and can cause grief when > > +porting to GCC 15. This document is an effort to identify common issues > > +and provide solutions. Let us know if you have suggestions for > > improvements! > > + > > + > > +Note: GCC 15 has not been released yet, so this document is > > +a work-in-progress. > > + > > + > > + > > +C++ language issues > > + > > +Header dependency changes > > +Some C++ Standard Library headers have been changed to no longer include > > +other headers that were being used internally by the library. > > +As such, C++ programs that used standard library components without > > +including the right headers will no longer compile. > > + > > + > > +In particular, the following header is used less widely within libstdc++ > > and > > +may need to be included explicitly when compiling with GCC 15: > > + > > + > > +> > + (for std::int8_t, std::int32_t etc.) > > + > > + > > + > > + > > + > > + > > + > > + > > -- > > 2.45.2 > > >
[PATCH v2] combine.cc (make_more_copies): Copy attributes from the original pseudo, PR115883
The only thing that's changed with the patch in v2 since the first version (pinged once) is the commit message. CC to the nexts-of-kin as a heads-up. Regtested cross to cris-elf and native x86_64-linux-gnu at r15-3043-g64028d626a50. That gcc.dg/guality/pr54200.c is magically fixed was also noticed in an earlier test run, at r15-1880-gce34fcc572a0. I see on gcc-testresults that this test fails for several targets. Ok to commit? -- >8 -- The first of the late-combine passes propagates some of the copies made during the (in-time-)combine pass in make_more_copies into the users of the "original" pseudo registers and removes the "old" pseudos. That effectively removes attributes such as REG_POINTER, which matter to LRA. The quoted PR is for an ICE-manifesting bug that was exposed by the late-combine pass; with this patch the bug went back into hiding, until commit r15-2937-g3673b7054ec2 (the fix for PR116236) actually fixed it. To wit, this patch is only incidentally related to that bug: the REG_POINTER attribute should not be required for LRA to work correctly. This patch merely restores, for those propagated register uses, the state they had before late-combine. For reasons not investigated, this fixes a failing test "FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3" for x86_64-linux-gnu. PR middle-end/115883 * combine.cc (make_more_copies): Copy attributes from the original pseudo to the new copy. --- gcc/combine.cc | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/combine.cc b/gcc/combine.cc index 3b50bc3529c4..6e6e710aae08 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -15102,6 +15102,12 @@ make_more_copies (void) continue; rtx new_reg = gen_reg_rtx (GET_MODE (dest)); + + // The "original" pseudo copies have important attributes + // attached, like pointerness. We want that for these copies + // too, for use by insn recognition and later passes.
+ set_reg_attrs_from_value (new_reg, dest); + rtx_insn *new_insn = gen_move_insn (new_reg, src); SET_SRC (set) = new_reg; emit_insn_before (new_insn, insn); -- 2.30.2 brgds, H-P
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
> On Aug 21, 2024, at 10:34, Martin Uecker wrote: > > On Wednesday, 21.08.2024 at 14:12 +, Qing Zhao wrote: > > ... >> >>> + if (__builtin_get_counted_by (__p->FAM)) \ + *(__builtin_get_counted_by(__p->FAM)) = COUNT; \ How to improve it? (Thanks a lot for your suggestion). >>> >>> There's lack of syntactic guarantee that __builtin_get_counted_by (...) != >>> 0 is >>> a constant expression. __builtin_set_counted_by (...) would avoid this >>> when it would be documented to expand to nothing for a type without a >>> counted_by >>> member. > > Does it matter? > >>> Writing >>> >>> size_t fake; >>> __builtin_choose_expr (__builtin_get_counted_by (__p-->FAM) != 0, >>> >>> *(__builtin_get_counted_by(__p->FAM)), __fake) = COUNT; >>> >>> would ensure this but of course requiring the __fake lvalue is ugly, too. >> >> Yes, you are right. When I wrote the testing case, I felt wield too. (:- >> >> And another issue with the returned value of __builtin_get_counted_by(PTR) >> is, since it returns a pointer, the TYPE of the pointee matters, especially >> when returns a NULL pointer, I used a pointer type pointing to the size_t in >> case of a NULL pointer being returned to avoid some strict-aliasing issue. >> please see PR116316, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116316) >> for details. > > It needs to return a pointer to the actual type of the count field. > For NULL, it probably needs to be size_t* for above assignment > to work. Yes, that’s what I did for the patch after understanding PR116316. -:) The current doc of the returned type is: The TYPE of the returned value must be a pointer type pointing to the corresponding type of the counted-by object or a pointer type pointing to the SIZE_T in case of a NULL pointer being returned.
> > But if we changed it to return a void pointer, we could make this > a compile-time check: > > auto ret = __builtin_get_counted_by(__p->FAM); > > _Generic(ret, void*: (void)0, default: *ret = COUNT); Is there any benefit to return a void pointer than a SIZE_T pointer for the NULL pointer? > > >> >> Yes, I do feel that the approach __builtin_get_counted_by is not very good. >> Maybe it’s better to provide >> A. __builtin_set_counted_by >> or >> B. The unary operator __counted_by(PTR) to return a Lvalue, in this case, >> we need a __builtin_has_attribute first to check whether PTR has the >> counted_by attribute first. > > You could potentially do the same __counted_by and test for type void. > > _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = COUNT); Oh, so, is there any benefit for the unary operator __counted_by(PTR) than the current __builtin_get_counted_by? Thanks. Qing > > Martin > >> >> Any suggestion? >> >> thanks. >> >> Qing >> >> >>> >>> Richard. >>> Qing > > No objection to the patch but I wanted to share my thoughts here. > > Richard. > >> Bootstrapped and regression tested on both X86 and aarch64, no issue. >> >> Okay for trunk? >> >> thanks. >> >> Qing. >> >> >> PR c/116016 >> >> gcc/c-family/ChangeLog: >> >> * c-common.cc: Add new __builtin_get_counted_by. >> * c-common.h (enum rid): Add RID_BUILTIN_GET_COUNTED_BY. >> >> gcc/c/ChangeLog: >> >> * c-decl.cc (names_builtin_p): Add RID_BUILTIN_GET_COUNTED_BY. >> * c-parser.cc (has_counted_by_object): New routine. >> (get_counted_by_ref): New routine. >> (c_parser_postfix_expression): Handle New >> RID_BUILTIN_GET_COUNTED_BY. >> >> gcc/ChangeLog: >> >> * doc/extend.texi: Add documentation for __builtin_get_counted_by. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/builtin-get-counted-by-1.c: New test. >> * gcc.dg/builtin-get-counted-by.c: New test. 
>> --- >> gcc/c-family/c-common.cc | 1 + >> gcc/c-family/c-common.h | 1 + >> gcc/c/c-decl.cc | 1 + >> gcc/c/c-parser.cc | 72 +++ >> gcc/doc/extend.texi | 55 +++ >> .../gcc.dg/builtin-get-counted-by-1.c | 91 +++ >> gcc/testsuite/gcc.dg/builtin-get-counted-by.c | 54 +++ >> 7 files changed, 275 insertions(+) >> create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by-1.c >> create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by.c >> >> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc >> index e7e371fd26f..4b27c6bfeeb 100644 >> --- a/gcc/c-family/c-common.cc >> +++ b/gcc/c-family/c-common.cc >> @@ -430,6 +430,7 @@ const struct c_common_resword c_common_reswords[] = >> { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY }, >> { "__builtin_complex", RID_
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
> On Aug 21, 2024, at 10:45, Martin Uecker wrote: > > On Wednesday, 21.08.2024 at 16:34 +0200, Martin Uecker wrote: >> On Wednesday, 21.08.2024 at 14:12 +, Qing Zhao wrote: >> >>> >>> Yes, I do feel that the approach __builtin_get_counted_by is not very good. >>> Maybe it’s better to provide >>> A. __builtin_set_counted_by >>> or >>> B. The unary operator __counted_by(PTR) to return a Lvalue, in this case, >>> we need a __builtin_has_attribute first to check whether PTR has the >>> counted_by attribute first. >> >> You could potentially do the same __counted_by and test for type void. >> >> _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = >> COUNT); > > But just doing A. also seems ok. I am fine with A. It’s easier for end users to use. The only potential problem with A is that the functionality of READing the counted-by field is missing. Is that okay? Kees? Thanks. Qing > > Martin
Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned vector .SAT_TRUNC form 2
LGTM. -- Regards Robin
Re: [PATCH v1 2/2] RISC-V: Add testcases for unsigned vector .SAT_TRUNC form 3
LGTM. -- Regards Robin
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Wednesday, 21.08.2024 at 15:24 +, Qing Zhao wrote: > > > > But if we changed it to return a void pointer, we could make this > > a compile-time check: > > > > auto ret = __builtin_get_counted_by(__p->FAM); > > > > _Generic(ret, void*: (void)0, default: *ret = COUNT); > > Is there any benefit to return a void pointer than a SIZE_T pointer for > the NULL pointer? Yes! You can test with _Generic (or __builtin_types_compatible_p) at compile-time based on the type whether you can set *ret to COUNT or not as in the example above. So it is not a weird run-time test which needs to be optimized away. > > > > > > > > > > > Yes, I do feel that the approach __builtin_get_counted_by is not very > > > good. > > > Maybe it’s better to provide > > > A. __builtin_set_counted_by > > > or > > > B. The unary operator __counted_by(PTR) to return a Lvalue, in this case, > > > we need a __builtin_has_attribute first to check whether PTR has the > > > counted_by attribute first. > > > > You could potentially do the same __counted_by and test for type void. > > > > _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = > > COUNT); > > Oh, so, is there any benefit for the unary operator __counted_by(PTR) than > the current __builtin_get_counted_by? I don't know. You suggested it ;-) It probably makes it harder to test the type because you need the typeof / C2Y Generic combination, but maybe there are other ways to test. Martin > > Thanks. > > Qing > > > > Martin > > > > > > > > Any suggestion? > > > > > > thanks. > > > > > > Qing > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > Qing > > > > > > > > > > > > > > > > > No objection to the patch but I wanted to share my thoughts here. > > > > > > > > > > > > Richard. > > > > > > > > > > > > > Bootstrapped and regression tested on both X86 and aarch64, no > > > > > > > issue. > > > > > > > > > > > > > > Okay for trunk? > > > > > > > > > > > > > > thanks. > > > > > > > > > > > > > > Qing.
> > > > > > > > > > > > > > > > > > > > > PR c/116016 > > > > > > > > > > > > > > gcc/c-family/ChangeLog: > > > > > > > > > > > > > > * c-common.cc: Add new __builtin_get_counted_by. > > > > > > > * c-common.h (enum rid): Add RID_BUILTIN_GET_COUNTED_BY. > > > > > > > > > > > > > > gcc/c/ChangeLog: > > > > > > > > > > > > > > * c-decl.cc (names_builtin_p): Add > > > > > > > RID_BUILTIN_GET_COUNTED_BY. > > > > > > > * c-parser.cc (has_counted_by_object): New routine. > > > > > > > (get_counted_by_ref): New routine. > > > > > > > (c_parser_postfix_expression): Handle New > > > > > > > RID_BUILTIN_GET_COUNTED_BY. > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > * doc/extend.texi: Add documentation for > > > > > > > __builtin_get_counted_by. > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > * gcc.dg/builtin-get-counted-by-1.c: New test. > > > > > > > * gcc.dg/builtin-get-counted-by.c: New test. > > > > > > > --- > > > > > > > gcc/c-family/c-common.cc | 1 + > > > > > > > gcc/c-family/c-common.h | 1 + > > > > > > > gcc/c/c-decl.cc | 1 + > > > > > > > gcc/c/c-parser.cc | 72 +++ > > > > > > > gcc/doc/extend.texi | 55 +++ > > > > > > > .../gcc.dg/builtin-get-counted-by-1.c | 91 > > > > > > > +++ > > > > > > > gcc/testsuite/gcc.dg/builtin-get-counted-by.c | 54 +++ > > > > > > > 7 files changed, 275 insertions(+) > > > > > > > create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by-1.c > > > > > > > create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by.c > > > > > > > > > > > > > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc > > > > > > > index e7e371fd26f..4b27c6bfeeb 100644 > > > > > > > --- a/gcc/c-family/c-common.cc > > > > > > > +++ b/gcc/c-family/c-common.cc > > > > > > > @@ -430,6 +430,7 @@ const struct c_common_resword > > > > > > > c_common_reswords[] = > > > > > > > { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY }, > > > > > > > { "__builtin_complex", RID_BUILTIN_COMPLEX, 
D_CONLY }, > > > > > > > { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 }, > > > > > > > + { "__builtin_get_counted_by", RID_BUILTIN_GET_COUNTED_BY, > > > > > > > D_CONLY }, > > > > > > > { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 }, > > > > > > > { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY }, > > > > > > > { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 }, > > > > > > > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h > > > > > > > index 2510ee4dbc9..5d5a297012f 100644 > > > > > > > --- a/gcc/c-family/c-common.h > > > > > > > +++ b/gcc/c-family/c-common.h > > > > > > > @@ -110,6 +110,7 @@ enum rid > > > > > > > RID_TYPES_COMPATIBLE_P, RID_BUILTIN_COMPLEX,
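Martin's point that the decision happens at compile time, instead of as a run-time null test that has to be optimized away, can be made concrete with GCC's `__builtin_types_compatible_p` and `__builtin_choose_expr`. This sketch is again hypothetical: `COUNT_REF`, `HAS_COUNTER`, `SET_COUNT` and the structs are invented stand-ins for the proposed builtin, defined here so that the example builds on its own. Because `HAS_COUNTER` is an integer constant expression, it can feed `_Static_assert` and `__builtin_choose_expr` directly.

```c
#include <stddef.h>

/* Hypothetical structs; a real use would annotate fam with
   __attribute__((counted_by (n))), omitted so this builds without it.  */
struct with_count    { size_t n; int fam[]; };
struct without_count { int tag;  int fam[]; };

/* Hypothetical stand-in for the proposed builtin: size_t * when there
   is a counter, a null void * otherwise.  */
static inline size_t *with_count_ref (struct with_count *p) { return &p->n; }
static inline void *without_count_ref (struct without_count *p)
{
  (void) p;
  return NULL;
}

#define COUNT_REF(p)                                        \
  _Generic ((p), struct with_count *: with_count_ref,       \
                 struct without_count *: without_count_ref) (p)

/* 1 if COUNT_REF (p) yields a usable pointer, 0 if it yields void *.
   The operand of __typeof__ is not evaluated here, so neither is p.  */
#define HAS_COUNTER(p) \
  (!__builtin_types_compatible_p (__typeof__ (COUNT_REF (p)), void *))

/* The branch is chosen at compile time; both branches are valid
   expressions and only the chosen one is evaluated.  */
#define SET_COUNT(p, count)                                 \
  (*__builtin_choose_expr (HAS_COUNTER (p), COUNT_REF (p),  \
                           &(size_t){0}) = (count))

/* The check needs no object at all, only a type.  */
_Static_assert (HAS_COUNTER ((struct with_count *) 0), "has a counter");
_Static_assert (!HAS_COUNTER ((struct without_count *) 0), "no counter");
```

In the `without_count` case the store lands in a compound literal that is never read, so no run-time branch on a null pointer is involved.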
Re: [PATCH] c, v2: Add support for unsequenced and reproducible attributes
On Thu, 1 Aug 2024, Jakub Jelinek wrote: > +Unsequenced functions without pointer or reference arguments are similar > +to functions with the @code{const} attribute, except that @code{const} > +attribute also requires finitness. So, both functions with @code{const} s/finitness/finiteness/ (in all places). > --- gcc/testsuite/gcc.dg/c23-attr-reproducible-4.c.jj 2024-08-01 > 14:37:23.948824359 +0200 > +++ gcc/testsuite/gcc.dg/c23-attr-reproducible-4.c2024-08-01 > 14:37:23.948824359 +0200 > @@ -0,0 +1,12 @@ > +/* Test C23 reproducible attribute: duplicates (allowed after N2557). */ The reference to N2557 seems anachronistic here, since the restrictions on duplicates were removed some time before the unsequenced and reproducible attributes were added to the working draft; there never was a time when C23 supported those attributes without allowing duplicates. (The test itself is fine; testing duplicates is a good thing to do.) > --- gcc/testsuite/gcc.dg/c23-attr-reproducible-6.c.jj 2024-08-01 > 14:37:23.948824359 +0200 > +++ gcc/testsuite/gcc.dg/c23-attr-reproducible-6.c2024-08-01 > 14:37:23.948824359 +0200 > @@ -0,0 +1,21 @@ > +/* Test C23 reproducible attribute: composite type on ?:. */ > +/* { dg-do compile } */ > +/* { dg-options "-std=c23 -pedantic-errors" } */ > + > +int > +f1 () [[reproducible]] > +{ > + return 42; > +} > + > +int > +f2 () > +{ > + return 43; > +} > + > +int > +f3 () > +{ > + return 44; > +} I don't see how this test relates to the "composite type on ?:" comment (or that it's doing anything useful). > --- gcc/testsuite/gcc.dg/c23-attr-unsequenced-4.c.jj 2024-08-01 > 14:37:23.949824346 +0200 > +++ gcc/testsuite/gcc.dg/c23-attr-unsequenced-4.c 2024-08-01 > 14:37:23.949824346 +0200 > @@ -0,0 +1,12 @@ > +/* Test C23 unsequenced attribute: duplicates (allowed after N2557). */ Likewise here. 
> --- gcc/testsuite/gcc.dg/c23-attr-unsequenced-6.c.jj 2024-08-01 > 14:37:23.949824346 +0200 > +++ gcc/testsuite/gcc.dg/c23-attr-unsequenced-6.c 2024-08-01 > 14:37:23.949824346 +0200 > @@ -0,0 +1,21 @@ > +/* Test C23 unsequenced attribute: composite type on ?:. */ > +/* { dg-do compile } */ > +/* { dg-options "-std=c23 -pedantic-errors" } */ > + > +int > +f1 () [[unsequenced]] > +{ > + return 42; > +} > + > +int > +f2 () > +{ > + return 43; > +} > + > +int > +f3 () > +{ > + return 44; > +} And likewise here. The patch is OK with the above fixed in the absence of any objections within a week. (I'm supposing fixed here means removing c23-attr-reproducible-6.c and c23-attr-unsequenced-6.c; if there's some different test meant to be there, or some reason those tests are in fact useful, then the updated / new tests should be reviewed.) -- Joseph S. Myers josmy...@redhat.com
[PATCH] arm: Always use vmov.f64 instead of vmov.f32 with MVE
With MVE, vmov.f64 is always supported (no need for +fp.dp extension). This patch updates two patterns: - in movdi_vfp, we incorrectly checked TARGET_VFP_SINGLE || TARGET_HAVE_MVE instead of TARGET_VFP_SINGLE && !TARGET_HAVE_MVE, and didn't take into account these two possibilities when computing the length attribute. - in thumb2_movdf_vfp, we checked only TARGET_VFP_SINGLE. No need to update movdf_vfp, since it is enabled only for TARGET_ARM (which is not the case when MVE is enabled). The patch also updates gcc.target/arm/armv8_1m-fp64-move-1.c, to accept only vmov.f64 instead of vmov.f32. Tested on arm-none-eabi with: qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp+fp.dp 2024-08-21 Christophe Lyon gcc/ * config/arm/vfp.md (movdi_vfp, thumb2_movdf_vfp): Handle MVE case. gcc/testsuite/ * gcc.target/arm/armv8_1m-fp64-move-1.c: Update expected code. 
--- gcc/config/arm/vfp.md | 8 gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c | 8 +--- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 773f55664a9..3212d9c7aa1 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -367,7 +367,7 @@ case 8: return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\"; case 9: - if (TARGET_VFP_SINGLE || TARGET_HAVE_MVE) + if (TARGET_VFP_SINGLE && !TARGET_HAVE_MVE) return \"vmov%?.f32\\t%0, %1\\t%@ int\;vmov%?.f32\\t%p0, %p1\\t%@ int\"; else return \"vmov%?.f64\\t%P0, %P1\\t%@ int\"; @@ -385,7 +385,7 @@ (symbol_ref "arm_count_output_move_double_insns (operands) * 4") (eq_attr "alternative" "9") (if_then_else - (match_test "TARGET_VFP_SINGLE") + (match_test "TARGET_VFP_SINGLE && !TARGET_HAVE_MVE") (const_int 8) (const_int 4))] (const_int 4))) @@ -744,7 +744,7 @@ case 6: case 7: case 9: return output_move_double (operands, true, NULL); case 8: - if (TARGET_VFP_SINGLE) + if (TARGET_VFP_SINGLE && !TARGET_HAVE_MVE) return \"vmov%?.f32\\t%0, %1\;vmov%?.f32\\t%p0, %p1\"; else return \"vmov%?.f64\\t%P0, %P1\"; @@ -758,7 +758,7 @@ (set (attr "length") (cond [(eq_attr "alternative" "6,7,9") (const_int 8) (eq_attr "alternative" "8") (if_then_else -(match_test "TARGET_VFP_SINGLE") +(match_test "TARGET_VFP_SINGLE && !TARGET_HAVE_MVE") (const_int 8) (const_int 4))] (const_int 4))) diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c index d236f0826c3..4a3cf0a5afb 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c @@ -33,13 +33,7 @@ w_r () /* ** w_w: -** ( -** vmov.f32s2, s0 -** vmov.f32s3, s1 -** | -** vmov.f32s3, s1 -** vmov.f32s2, s0 -** ) +** vmov.f64d1, d0 ** bx lr */ void -- 2.34.1
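The movdi_vfp change can be read as a truth table over TARGET_VFP_SINGLE and TARGET_HAVE_MVE. The following is a hypothetical C model written for illustration, not code from vfp.md: the helper names are invented, and it assumes each VFP `vmov` is a 4-byte instruction, so a correct `length` attribute is 4 bytes per emitted `vmov`. Before the patch, the case "MVE without single-precision-only VFP" emitted two `vmov.f32` but reported a length of 4.

```c
#include <stdbool.h>

/* Hypothetical model of alternative 9 of movdi_vfp (not GCC code).
   Each vmov is assumed to be 4 bytes, so the length attribute should
   be 4 * (number of vmov instructions emitted).  */

/* Before: output template and length attribute used different tests.  */
static int insns_before (bool vfp_single, bool have_mve)
{
  return (vfp_single || have_mve) ? 2 : 1;  /* 2 x vmov.f32 or 1 x vmov.f64 */
}

static int length_before (bool vfp_single, bool have_mve)
{
  (void) have_mve;                          /* MVE was not considered */
  return vfp_single ? 8 : 4;
}

/* After: both use TARGET_VFP_SINGLE && !TARGET_HAVE_MVE.  */
static int insns_after (bool vfp_single, bool have_mve)
{
  return (vfp_single && !have_mve) ? 2 : 1;
}

static int length_after (bool vfp_single, bool have_mve)
{
  return (vfp_single && !have_mve) ? 8 : 4;
}

/* True if the length matches the emitted code for all four flag
   combinations.  */
static bool lengths_consistent (int (*insns) (bool, bool),
                                int (*length) (bool, bool))
{
  for (int s = 0; s < 2; s++)
    for (int m = 0; m < 2; m++)
      if (length (s, m) != 4 * insns (s, m))
        return false;
  return true;
}
```

Under this model the old pair is inconsistent for `vfp_single = false, have_mve = true`, while the new pair agrees everywhere and always emits a single `vmov.f64` when MVE is present.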
Re: [PATCH] testsuite: Accept vmov.f64
On Wed, 14 Aug 2024 at 22:04, Torbjörn SVENSSON wrote: > > Ok for trunk and releases/gcc-14? > > -- > > On Cortex-M55 with fpv5-d16, the vmov.f64 instruction is used. Hi Torbjorn, Thanks for the patch: after looking further I realized that we can always generate vmov.f64 with MVE, so I propose this patch instead: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661064.html Thanks, Christophe > > gcc/testsuite/ChangeLog: > > * armv8_1m-fp64-move-1.c: Accept vmov.f64 instruction. > > Signed-off-by: Torbjörn SVENSSON > --- > gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c > b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c > index d236f0826c3..44abfcf1518 100644 > --- a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c > +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c > @@ -2,7 +2,7 @@ > /* { dg-options "-O" } */ > /* { dg-require-effective-target arm_v8_1m_mve_ok } */ > /* { dg-add-options arm_v8_1m_mve } */ > -/* { dg-additional-options "-mfloat-abi=hard" } * > +/* { dg-additional-options "-mfloat-abi=hard" } */ > /* { dg-final { check-function-bodies "**" "" } } */ > > /* > @@ -39,6 +39,8 @@ w_r () > ** | > ** vmov.f32s3, s1 > ** vmov.f32s2, s0 > +** | > +** vmov.f64d1, d0 > ** ) > ** bx lr > */ > -- > 2.25.1 >
[PATCH v2] tree-optimization/116024 - match.pd: add 4 int-compare simplifications
Hi, sending a v2 of https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659851.html after changing variable types in all new testcases from standard to fixed-width. Could anyone please assist with reviewing and/or pushing to trunk/14 since I don't have commit access? Many thanks, Artemiy -- 8< This patch implements match.pd patterns for the following transformations (rcmp denotes cmp with its operands swapped, e.g. < becomes >):
(1) (UB-on-overflow types) C1 - X cmp C2 -> X rcmp C1 - C2
(2) (unsigned types) C1 - X cmp C2 ->
    (a) X cmp C1 - C2, when cmp is !=, ==
    (b) X - (C1 - C2) cmp C2, when cmp is <=, >
    (c) X - (C1 - C2 + 1) cmp C2, when cmp is <, >=
(3) (signed wrapping types) C1 - X cmp C2 ->
    (a) X cmp C1 - C2, when cmp is !=, ==
    (b) X - (C1 + 1) rcmp -(C2 + 1), otherwise
(4) (all wrapping types) X + C1 cmp C2 ->
    (a) X cmp C2 - C1, when cmp is !=, ==
    (b) X cmp -C1, when cmp is <=, > and C2 - C1 == max
    (c) X cmp -C1, when cmp is <, >= and C2 - C1 == min
Testcases for all the aforementioned changes are included. This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32. Existing tests were adjusted where necessary. gcc/ChangeLog: PR tree-optimization/116024 * match.pd: New transformations around integer comparison. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr116024.c: New test. * gcc.dg/tree-ssa/pr116024-1.c: Ditto. * gcc.dg/tree-ssa/pr116024-1-fwrapv.c: Ditto. * gcc.dg/tree-ssa/pr116024-2.c: Ditto. * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto. * gcc.dg/pr67089-6.c: Adjust. * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Ditto.
Signed-off-by: Artemiy Volkov --- gcc/match.pd | 75 ++- gcc/testsuite/gcc.dg/pr67089-6.c | 4 +- .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c | 74 ++ gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c| 74 ++ .../gcc.dg/tree-ssa/pr116024-2-fwrapv.c | 38 ++ gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c| 39 ++ gcc/testsuite/gcc.dg/tree-ssa/pr116024.c | 74 ++ .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c | 2 +- 8 files changed, 376 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c diff --git a/gcc/match.pd b/gcc/match.pd index 65a3aae2243..bf3ccef7437 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -8800,6 +8800,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (cmp @0 { TREE_OVERFLOW (res) ? drop_tree_overflow (res) : res; } (for cmp (lt le gt ge) + rcmp (gt ge lt le) (for op (plus minus) rop (minus plus) (simplify @@ -8827,7 +8828,79 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) "X cmp C2 -+ C1"), WARN_STRICT_OVERFLOW_COMPARISON); } - (cmp @0 { res; }) + (cmp @0 { res; }) +/* For wrapping types, simplify X + C1 CMP C2 to X CMP -C1 when possible. */ + (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) + (with + { + wide_int max = wi::max_value (TREE_TYPE (@0)); + wide_int min = wi::min_value (TREE_TYPE (@0)); + + wide_int c2 = rop == PLUS_EXPR + ? wi::add (wi::to_wide (@2), wi::to_wide (@1)) + : wi::sub (wi::to_wide (@2), wi::to_wide (@1)); + } + (if (((cmp == LE_EXPR || cmp == GT_EXPR) && wi::eq_p (c2, max)) + || ((cmp == LT_EXPR || cmp == GE_EXPR) && wi::eq_p (c2, min))) + (with + { +wide_int c1 = rop == PLUS_EXPR + ? 
wi::add (min, wi::to_wide (@1)) + : wi::sub (min, wi::to_wide (@1)); +tree c1_cst = build_uniform_cst (TREE_TYPE (@0), + wide_int_to_tree (TREE_TYPE (@0), c1)); + } + (rcmp @0 { c1_cst; }) + +/* Invert sign of X in comparisons of the form C1 - X CMP C2. */ + +(for cmp (lt le gt ge eq ne) + rcmp (gt ge lt le eq ne) + (simplify + (cmp (minus INTEGER_CST@0 @1) INTEGER_CST@2) + (if (!TREE_OVERFLOW (@0) && !TREE_OVERFLOW (@2) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1))) + (with { tree res = int_const_binop (MINUS_EXPR, @0, @2); } + (if (TREE_OVERFLOW (res)) + (with + { + fold_overflow_warning (("assuming signed overflow does not occur " + "when simplifying conditional to constant"), + WARN_STRICT_OVERFLOW_CONDITIONAL); + } + (switch +(if (cmp == NE_EXPR) + { constant_b
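The unsigned rewrites listed under (2) can be checked exhaustively at 8 bits. The brute-force program below is a sketch written for this summary, not part of the patch's testsuite, and the function names are invented. It compares `C1 - X cmp C2`, evaluated with wrap-around, against forms (2)(a) to (2)(c) for every `C1`, `C2` and `X` in `uint8_t` (the `!=`, `>` and `>=` cases follow by negation). It also shows that the naive rewrite `C1 - X <= C2` -> `X >= C1 - C2`, which is only valid when overflow cannot happen, breaks under wrap-around (for example `C1 = C2 = 0`, `X = 1`).

```c
#include <stdbool.h>
#include <stdint.h>

/* Exhaustively compare C1 - X cmp C2 (8-bit unsigned, wrapping) with the
   rewritten forms (2)(a)-(c).  */
static bool unsigned_rewrites_hold (void)
{
  for (unsigned c1 = 0; c1 < 256; c1++)
    for (unsigned c2 = 0; c2 < 256; c2++)
      for (unsigned x = 0; x < 256; x++)
        {
          uint8_t lhs = (uint8_t) (c1 - x);                      /* C1 - X */
          uint8_t a = (uint8_t) (c1 - c2);                       /* C1 - C2 */
          uint8_t b = (uint8_t) (x - (uint8_t) (c1 - c2));       /* X - (C1 - C2) */
          uint8_t c = (uint8_t) (x - (uint8_t) (c1 - c2 + 1));   /* X - (C1 - C2 + 1) */

          if ((lhs == c2) != (x == a))     /* (a): ==, != by negation */
            return false;
          if ((lhs <= c2) != (b <= c2))    /* (b): <=, > by negation */
            return false;
          if ((lhs < c2) != (c < c2))      /* (c): <, >= by negation */
            return false;
        }
  return true;
}

/* The plain reversal C1 - X <= C2  ->  X >= C1 - C2 ignores wrap-around;
   this returns false (first mismatch at C1 = C2 = 0, X = 1).  */
static bool naive_reversal_holds (void)
{
  for (unsigned c1 = 0; c1 < 256; c1++)
    for (unsigned c2 = 0; c2 < 256; c2++)
      for (unsigned x = 0; x < 256; x++)
        if (((uint8_t) (c1 - x) <= c2) != (x >= (uint8_t) (c1 - c2)))
          return false;
  return true;
}
```

The same brute-force shape extends to `int8_t` with `-fwrapv`-style arithmetic if one wants to probe the signed-wrapping forms in (3).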
Re: [Fortran, Patch, PR86468, v1] Follow up: Remove obsolete VIEW_CONVERT
On Wed, Aug 21, 2024 at 12:17:46PM +0200, Andre Vehreschild wrote: > > attached small patch removes a VIEW_CONVERT that I erroneously inserted during > patching pr110033. PR86468 fixes the (co-)rank computation and therefore this > VIEW_CONVERT is IMO obsolete. I think it may cause hard to find runtime bugs > in > the future and therefore like to remove it. > > Regtests ok on x86_64-pc-linux-gnu. Ok for mainline? > Yes. -- Steve
Re: [PATCH 2/2] libstdc++: Implement P2997R1 changes to the indirect invocability concepts
On Wed, 21 Aug 2024, Jonathan Wakely wrote: > On Wed, 21 Aug 2024 at 01:40, Patrick Palka wrote: > > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps > > 14? > > > > -- >8 -- > > > > This implements the changes of this C++26 paper as a DR against C++20. > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/iterator_concepts.h (indirectly_unary_invocable): > > Relax as per P2997R1. > > (indirectly_regular_unary_invocable): Likewise. > > (indirect_unary_predicate): Likewise. > > (indirect_binary_predicate): Likewise. > > (indirect_equivalence_relation): Likewise. > > (indirect_strict_weak_order): Likewise. > > * version.def (ranges): Update value for C++26. > > * version.h: Regenerate. > > * testsuite/24_iterators/indirect_callable/p2997r1.cc: New test. > > * testsuite/std/ranges/version_c++23.cc: Restrict to C++23 mode. > > * testsuite/std/ranges/version_c++26.cc: New test. > > Can we get rid of version_c++23.cc (and not add version_c++26.cc) and > just expand the check in std/ranges/synopsis.cc instead? > > Currently it does: > > #ifndef __cpp_lib_ranges > # error "Feature test macro for ranges is missing in " > #elif __cpp_lib_ranges < 201911L > # error "Feature test macro for ranges has wrong value in " > #endif > > but that could be: > > #ifndef __cpp_lib_ranges > # error "Feature test macro for ranges is missing in " > #elif __cplusplus > 202302 && __cpp_lib_ranges < 202406L > # error "Feature test macro for ranges has wrong value in " > #elif __cplusplus == 202302 && __cpp_lib_ranges < 202406L > # error "Feature test macro for ranges has wrong value in " > #elif __cpp_lib_ranges < 201911L > # error "Feature test macro for ranges has wrong value in " > #endif > > or define EXPECTED_VALUE to the appropriate value for each __cplusplus > dialect, then have one test of __cpp_lib_ranges != EXPECTED_VALUE. Sounds good, though I opted to test for the exact __cpp_lib_ranges value instead of using an inequality. Like so? 
-- >8 -- Subject: [PATCH] libstdc++: Implement P2997R1 changes to the indirect invocability concepts This implements the changes of this C++26 paper as a DR against C++20. In passing, this patch removes the std/ranges/version_c++23.cc test, which is now mostly obsolete after the version.def FTM refactoring, and instead expands the __cpp_lib_ranges checks in another test file so that they test the exact expected value of the FTM on a per-language-version basis. libstdc++-v3/ChangeLog: * include/bits/iterator_concepts.h (indirectly_unary_invocable): Relax as per P2997R1. (indirectly_regular_unary_invocable): Likewise. (indirect_unary_predicate): Likewise. (indirect_binary_predicate): Likewise. (indirect_equivalence_relation): Likewise. (indirect_strict_weak_order): Likewise. * version.def (ranges): Update value for C++26. * version.h: Regenerate. * testsuite/24_iterators/indirect_callable/p2997r1.cc: New test. * testsuite/std/ranges/version_c++23.cc: Remove. * testsuite/std/ranges/headers/ranges/synopsis.cc: Refine the __cpp_lib_ranges checks. 
--- libstdc++-v3/include/bits/iterator_concepts.h | 17 ++--- libstdc++-v3/include/bits/version.def | 5 ++ libstdc++-v3/include/bits/version.h | 7 +- .../24_iterators/indirect_callable/p2997r1.cc | 37 ++ .../std/ranges/headers/ranges/synopsis.cc | 6 +- .../testsuite/std/ranges/version_c++23.cc | 70 --- 6 files changed, 57 insertions(+), 85 deletions(-) create mode 100644 libstdc++-v3/testsuite/24_iterators/indirect_callable/p2997r1.cc delete mode 100644 libstdc++-v3/testsuite/std/ranges/version_c++23.cc diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h index 9306b7bd194..d849ddc32fc 100644 --- a/libstdc++-v3/include/bits/iterator_concepts.h +++ b/libstdc++-v3/include/bits/iterator_concepts.h @@ -724,7 +724,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION concept indirectly_unary_invocable = indirectly_readable<_Iter> && copy_constructible<_Fn> && invocable<_Fn&, __indirect_value_t<_Iter>> && invocable<_Fn&, iter_reference_t<_Iter>> - && invocable<_Fn&, iter_common_reference_t<_Iter>> && common_reference_with>, invoke_result_t<_Fn&, iter_reference_t<_Iter>>>; @@ -733,15 +732,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION && copy_constructible<_Fn> && regular_invocable<_Fn&, __indirect_value_t<_Iter>> && regular_invocable<_Fn&, iter_reference_t<_Iter>> - && regular_invocable<_Fn&, iter_common_reference_t<_Iter>> && common_reference_with>, invoke_result_t<_Fn&, iter_reference_t<_Iter>>>; template concept indirect_unary_predicate = indirectly_readable<_Iter> && copy_constructible<
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Wed, Aug 21, 2024 at 03:27:56PM +, Qing Zhao wrote: > > On Aug 21, 2024, at 10:45, Martin Uecker wrote: > > > > Am Mittwoch, dem 21.08.2024 um 16:34 +0200 schrieb Martin Uecker: > >> Am Mittwoch, dem 21.08.2024 um 14:12 + schrieb Qing Zhao: > >> > >>> > >>> Yes, I do feel that the approach __builtin_get_counted_by is not very > >>> good. > >>> Maybe it’s better to provide > >>> A. __builtin_set_counted_by > >>> or > >>> B. The unary operator __counted_by(PTR) to return a Lvalue, in this case, > >>> we need a __builtin_has_attribute first to check whether PTR has the > >>> counted_by attribute first. > >> > >> You could potentially do the same __counted_by and test for type void. > >> > >> _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = > >> COUNT); > > > > But just doing A. also seems ok. > > I am fine with A. It’s easier to be used by the end users. > > The only potential problem with A is, the functionality of READing the > counted-by field is missing. > Is that okay? Kees? After seeing the utility of __builtin_get_counted_by() I realized that we really do want it for the ability to examine the _type_ of the counter member, otherwise we run the risk of assignment truncation. For example: struct flex { unsigned char counter; int array[] __attribute__((counted_by(counter))); } *p; count = 1000; ... __builtin_set_counted_by(p->array, count); What value would p->counter end up with? (I assume it would wrap around, which is bad). And there would be no way to catch it at run-time without a way to check the type. For example with __builtin_get_counted_by, we can do: if (__builtin_get_counted_by(p->array)) { size_t max_value = type_max(typeof(*__builtin_get_counted_by(p->array))); if (count > max_value) ...fail cleanly... *__builtin_get_counted_by(p->array) = count; } I don't strictly need to READ the value (but it seems nice). 
Currently I can already do a READ with something like this: size_t count = __builtin_dynamic_object_size(p->array, 1) / sizeof(*p->array); But I don't have a way to examine the counter _type_ without __builtin_get_counted_by, so I prefer it over __builtin_set_counted_by. Thanks! -Kees -- Kees Cook
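The truncation concern can be made concrete without the proposed builtin. The following is a minimal C sketch that makes several assumptions:

- It uses the `struct flex` from the example above, but drops the `counted_by` attribute so that it builds on any C11 compiler.
- A `_Generic` selection on the counter member stands in for the kernel's `type_max()`.
- `COUNTER_MAX`, `set_counter_unchecked` and `set_counter_checked` are hypothetical names, not part of any GCC interface.

With 8-bit `unsigned char`, storing 1000 wraps to 1000 % 256 == 232. The checked version rejects the value instead.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* struct flex from the example, minus __attribute__((counted_by(counter)))
   so it compiles everywhere.  */
struct flex {
    unsigned char counter;
    int array[];
};

/* Hypothetical stand-in for type_max(typeof(x)): pick the maximum value
   of the counter's (unsigned) type at compile time.  */
#define COUNTER_MAX(x)                          \
    _Generic((x),                               \
             unsigned char: (size_t)UCHAR_MAX,  \
             unsigned short: (size_t)USHRT_MAX, \
             unsigned int: (size_t)UINT_MAX,    \
             default: (size_t)SIZE_MAX)

/* Unchecked store: the value is converted modulo 2^CHAR_BIT, so 1000
   silently becomes 232 with 8-bit chars.  */
static void set_counter_unchecked(struct flex *p, size_t count)
{
    p->counter = (unsigned char)count;
}

/* Checked store: refuse values the counter type cannot represent.
   Returns 0 on success and -1 (leaving the counter untouched) on overflow. */
static int set_counter_checked(struct flex *p, size_t count)
{
    if (count > COUNTER_MAX(p->counter))
        return -1;
    p->counter = (unsigned char)count;
    return 0;
}
```

The check only needs the counter's type, not its value. That is the information `__builtin_set_counted_by` alone would not expose.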
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Wed, Aug 21, 2024 at 05:43:42PM +0200, Martin Uecker wrote: > Am Mittwoch, dem 21.08.2024 um 15:24 + schrieb Qing Zhao: > > > > > > But if we changed it to return a void pointer, we could make this > > > a compile-time check: > > > > > > auto ret = __builtin_get_counted_by(__p->FAM); > > > > > > _Generic(ret, void*: (void)0, default: *ret = COUNT); > > > > Is there any benefit to return a void pointer than a SIZE_T pointer for > > the NULL pointer? > > Yes! You can test with _Generic (or __builtin_types_compatible_p) > at compile-time based on the type whether you can set *ret to COUNT > or not as in the example above. > > So it is not a weird run-time test which needs to be optimized > away. I don't have a strong opinion here, but I would tend to agree that returning "void *" is a better signal that it is not valid. And I do really like the _Generic example there, which makes it even easier to do the "set if counted_by" action. -- Kees Cook
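One practical wrinkle with the `_Generic` form: every association of a generic selection must still be a valid expression even when it is not selected. As a result, `*ret = COUNT` is typically rejected when `ret` has type `void *`.

Below is a minimal C sketch of one workaround. It assumes the builtin would return either a pointer to the counter or a `void *` null pointer, and a plain pointer variable stands in for the builtin's result. `HAS_COUNTER`, `SET_IF_COUNTED` and `counted_by_sink` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Dummy object that lets the (unselected) store branch type-check when
   RET is a void pointer.  It is never actually written.  */
static size_t counted_by_sink;

/* Compile-time test: 1 if RET points at a real counter, 0 for void *.  */
#define HAS_COUNTER(ret) _Generic((ret), void *: 0, default: 1)

/* Store VALUE through RET only when RET is not a void pointer.  The inner
   _Generic substitutes &counted_by_sink in the void * case so that the
   default branch is still a valid assignment; RET is evaluated at most
   once, since controlling expressions are not evaluated.  */
#define SET_IF_COUNTED(ret, value)                                 \
    _Generic((ret),                                                \
             void *: (void)0,                                      \
             default: (void)(*_Generic((ret),                      \
                                       void *: &counted_by_sink,   \
                                       default: (ret)) = (value)))
```

The decision is made entirely from the type of `ret`, so no run-time test is left for the optimizer to remove. That is the property Martin points out above. `__builtin_types_compatible_p` could drive the same selection.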
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
> On Aug 21, 2024, at 11:43, Martin Uecker wrote: > > Am Mittwoch, dem 21.08.2024 um 15:24 + schrieb Qing Zhao: >>> >>> But if we changed it to return a void pointer, we could make this >>> a compile-time check: >>> >>> auto ret = __builtin_get_counted_by(__p->FAM); >>> >>> _Generic(ret, void*: (void)0, default: *ret = COUNT); >> >> Is there any benefit to return a void pointer than a SIZE_T pointer for >> the NULL pointer? > > Yes! You can test with _Generic (or __builtin_types_compatible_p) > at compile-time based on the type whether you can set *ret to COUNT > or not as in the example above. > > So it is not a weird run-time test which needs to be optimized > away. Okay, I see. Yes, this makes good sense to me. > > >> >>> >>> Yes, I do feel that the approach __builtin_get_counted_by is not very good. Maybe it’s better to provide A. __builtin_set_counted_by or B. The unary operator __counted_by(PTR) to return a Lvalue, in this case, we need a __builtin_has_attribute first to check whether PTR has the counted_by attribute first. >>> >>> You could potentially do the same __counted_by and test for type void. >>> >>> _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = >>> COUNT); >> >> Oh, so, is there any benefit for the unary operator __counted_by(PTR) than >> the current __builtin_get_counted_by? > > I don't know. You suggested it ;-) > > It probably makes it harder to test the type because you need the > typeof / C2Y Generic combination, but maybe there are other ways > to test. For the unary operator __counted_by(PTR), “PTR” must have a counted_by attribute; if not, there will be a compile-time error. Then the user could write the following code: if (__builtin_has_attribute (PTR, counted_by)) __counted_by(PTR) = COUNT; From a design point of view, I think this might be the cleanest solution. However, Clang currently doesn’t have __builtin_has_attribute. In order to provide a consistent interface, __builtin_get_counted_by(PTR) is fine. 
Thanks. Qing > > > Martin > > >> >> Thanks. >> >> Qing >>> >>> Martin >>> Any suggestion? thanks. Qing > > Richard. > >> >> Qing >> >>> >>> No objection to the patch but I wanted to share my thoughts here. >>> >>> Richard. >>> Bootstrapped and regression tested on both X86 and aarch64, no issue. Okay for trunk? thanks. Qing. PR c/116016 gcc/c-family/ChangeLog: * c-common.cc: Add new __builtin_get_counted_by. * c-common.h (enum rid): Add RID_BUILTIN_GET_COUNTED_BY. gcc/c/ChangeLog: * c-decl.cc (names_builtin_p): Add RID_BUILTIN_GET_COUNTED_BY. * c-parser.cc (has_counted_by_object): New routine. (get_counted_by_ref): New routine. (c_parser_postfix_expression): Handle New RID_BUILTIN_GET_COUNTED_BY. gcc/ChangeLog: * doc/extend.texi: Add documentation for __builtin_get_counted_by. gcc/testsuite/ChangeLog: * gcc.dg/builtin-get-counted-by-1.c: New test. * gcc.dg/builtin-get-counted-by.c: New test. --- gcc/c-family/c-common.cc | 1 + gcc/c-family/c-common.h | 1 + gcc/c/c-decl.cc | 1 + gcc/c/c-parser.cc | 72 +++ gcc/doc/extend.texi | 55 +++ .../gcc.dg/builtin-get-counted-by-1.c | 91 +++ gcc/testsuite/gcc.dg/builtin-get-counted-by.c | 54 +++ 7 files changed, 275 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by-1.c create mode 100644 gcc/testsuite/gcc.dg/builtin-get-counted-by.c diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index e7e371fd26f..4b27c6bfeeb 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -430,6 +430,7 @@ const struct c_common_resword c_common_reswords[] = { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY }, { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY }, { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 }, + { "__builtin_get_counted_by", RID_BUILTIN_GET_COUNTED_BY, D_CONLY }, { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 }, { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY }, { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 }, diff --git a/gcc/c-family/c-common.h 
b/gcc/c-family/c-common.h index
[patch] libgomp.texi: Document OpenMP's Interoperability Routines
Add documentation for OpenMP's interoperability routines. This obviously depends on the actual implementation patch, posted at: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661035.html (albeit I will post a v2 in a moment). I am sure there will be comments, suggestions and remarks :-) Tobias PS: I am not 100% sure whether adding the implementation detail makes sense or not. libgomp.texi: Document OpenMP's Interoperability Routines libgomp/ChangeLog: * libgomp.texi (Interoperability Routines): Add. diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index fe25d879788..ecc60882d72 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -656,7 +656,7 @@ specification in version 5.2. * Lock Routines:: * Timing Routines:: * Event Routine:: -@c * Interoperability Routines:: +* Interoperability Routines:: * Memory Management Routines:: @c * Tool Control Routine:: * Environment Display Routine:: @@ -2884,21 +2884,294 @@ event handle that has already been fulfilled is also undefined. -@c @node Interoperability Routines -@c @section Interoperability Routines -@c -@c Routines to obtain properties from an @code{omp_interop_t} object. -@c They have C linkage and do not throw exceptions. -@c -@c @menu -@c * omp_get_num_interop_properties:: -@c * omp_get_interop_int:: -@c * omp_get_interop_ptr:: -@c * omp_get_interop_str:: -@c * omp_get_interop_name:: -@c * omp_get_interop_type_desc:: -@c * omp_get_interop_rc_desc:: -@c @end menu +@node Interoperability Routines +@section Interoperability Routines + +Routines to obtain properties from an object of OpenMP interop type. +They have C linkage and do not throw exceptions. 
+ +@menu +* omp_get_num_interop_properties:: Get the number of implementation-specific properties +* omp_get_interop_int:: Obtain integer-valued interoperability property +* omp_get_interop_ptr:: Obtain pointer-valued interoperability property +* omp_get_interop_str:: Obtain string-valued interoperability property +* omp_get_interop_name:: Obtain the name of an interop_property value as string +* omp_get_interop_type_desc:: Obtain type and description for an interop_property +* omp_get_interop_rc_desc:: Obtain error string for an interop_rc error code +@end menu + + + +@node omp_get_num_interop_properties +@subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties +@table @asis +@item @emph{Description}: +The @code{omp_get_num_interop_properties} function returns the number of +implementation-defined interoperability properties available for the passed +@var{interop}, extending the OpenMP-defined properties. The available OpenMP +interop_property-type values range from @code{omp_ipr_first} to the value +returned by @code{omp_get_num_interop_properties} minus one. + +No implementation-defined properties are currently defined in GCC. + +Implementation remark: In GCC, the Fortran interface differs from the one shown +below: the function has C binding, @var{interop} is passed by value and an +integer of @code{c_int} kind is returned, so that it has the same ABI as the +C function. This does not affect the usage of the function when GCC's +@code{omp_lib} module or @code{omp_lib.h} header is used. 
+ +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{int omp_get_num_interop_properties(const omp_interop_t interop)} +@end multitable + +@item @emph{Fortran}: +@multitable @columnfractions .20 .80 +@item @emph{Interface}: @tab @code{integer function omp_get_num_interop_properties(interop)} +@item @tab @code{integer(omp_interop_kind), intent(in) :: interop} +@end multitable + +@item @emph{See also}: +@ref{omp_get_interop_name}, @ref{omp_get_interop_type_desc} + +@item @emph{Reference}: +@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.1, +@uref{https://www.openmp.org, OpenMP specification TR13}, Section 26.1 +@end table + + + +@node omp_get_interop_int +@subsection @code{omp_get_interop_int} -- Obtain integer-valued interoperability property +@table @asis +@item @emph{Description}: +The @code{omp_get_interop_int} function returns the integer value associated +with the @var{property_id} interoperability property of the passed @var{interop} +object. If successful, @var{ret_code} is set to @code{omp_irc_success}. + +Implementation remark: In GCC, the Fortran interface differs from the one shown +below: the function has C binding and @var{interop} and @var{property_id} are +passed by value, so that it has the same ABI as the C function. This does +not affect the usage of the function when GCC's @code{omp_lib} module or +@code{omp_lib.h} header is used. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop, + omp_interop_property_t property_id, int *ret_code)} +@end multitable + +@item @emph{Fortran
[PATCH v3 1/2] c++: improve location of parsed RETURN_EXPRs
For clarity, here's the entire split-up patch I intend to push, if it looks OK. Tested on x86_64-pc-linux-gnu. I've renamed the field we've discussed and also a few parameters that refer to 'kw' to be less specific. The code is functionally identical. OK for trunk? TIA, have a lovely day. -- >8 -- This patch improves the EXPR_LOCATION associated with parsed RETURN_EXPRs so that they can be used in diagnostics later. This change also happened to un-suppress an analyzer false-negative that was happening because the location of RETURN_EXPR was entirely within the NULL macro, which was defined in a system header. PR analyzer/116304. gcc/cp/ChangeLog: * cp-tree.h (finish_return_stmt): Add optional location_t parameter, defaulting to input_location. * parser.cc (cp_parser_jump_statement): Improve return and co_return locations so that they span their entire statements. * semantics.cc (finish_return_stmt): Use the new stmt_loc parameter in place of input_location. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/inlining-4-multiline.c: Adjust locations in diagnostics. * c-c++-common/analyzer/malloc-paths-9-noexcept.c: Ditto. * c-c++-common/analyzer/malloc-CWE-401-example.c: Accept the new warning on line 34 (fixed false negative). 
--- gcc/cp/cp-tree.h | 2 +- gcc/cp/parser.cc | 9 +++-- gcc/cp/semantics.cc| 4 ++-- .../c-c++-common/analyzer/inlining-4-multiline.c | 6 -- .../c-c++-common/analyzer/malloc-CWE-401-example.c | 1 + .../c-c++-common/analyzer/malloc-paths-9-noexcept.c| 10 +- 6 files changed, 16 insertions(+), 16 deletions(-) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index a53fbcb43ec4..83be768420aa 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7794,7 +7794,7 @@ extern void finish_while_stmt (tree); extern tree begin_do_stmt (void); extern void finish_do_body (tree); extern void finish_do_stmt (tree, tree, bool, tree, bool); -extern tree finish_return_stmt (tree); +extern tree finish_return_stmt (tree, location_t = input_location); extern tree begin_for_scope(tree *); extern tree begin_for_stmt (tree, tree); extern void finish_init_stmt (tree); diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index c9654cfff9d2..68b3f0a0f5c4 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -14952,14 +14952,19 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs) set_musttail_on_return (expr, token->location, musttail_p); } + /* A location spanning the whole statement (up to ';'). */ + auto stmt_loc = make_location (token->location, + token->location, + input_location); + /* Build the return-statement, check co-return first, since type deduction is not valid there. */ if (keyword == RID_CO_RETURN) - statement = finish_co_return_stmt (token->location, expr); + statement = finish_co_return_stmt (stmt_loc, expr); else if (FNDECL_USED_AUTO (current_function_decl) && in_discarded_stmt) /* Don't deduce from a discarded return statement. */; else - statement = finish_return_stmt (expr); + statement = finish_return_stmt (expr, stmt_loc); /* Look for the final `;'. 
*/ cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); } diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 5ab2076b673c..734c613a474e 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -1400,7 +1400,7 @@ finish_do_stmt (tree cond, tree do_stmt, bool ivdep, tree unroll, indicated. */ tree -finish_return_stmt (tree expr) +finish_return_stmt (tree expr, location_t stmt_loc) { tree r; bool no_warning; @@ -1423,7 +1423,7 @@ finish_return_stmt (tree expr) verify_sequence_points (expr); } - r = build_stmt (input_location, RETURN_EXPR, expr); + r = build_stmt (stmt_loc, RETURN_EXPR, expr); RETURN_EXPR_LOCAL_ADDR_P (r) = dangling; if (no_warning) suppress_warning (r, OPT_Wreturn_type); diff --git a/gcc/testsuite/c-c++-common/analyzer/inlining-4-multiline.c b/gcc/testsuite/c-c++-common/analyzer/inlining-4-multiline.c index 5c971c581ae4..235b715cff96 100644 --- a/gcc/testsuite/c-c++-common/analyzer/inlining-4-multiline.c +++ b/gcc/testsuite/c-c++-common/analyzer/inlining-4-multiline.c @@ -109,15 +109,9 @@ outer (int flag) | 'const char* inner(int)': event 5 (depth 3) | - | - | #define NULL - | - |
[PATCH v3 2/2] c++: improve diagnostic of 'return's in coroutines
We now point out why a function is a coroutine, and where in the function the first 'return' is. gcc/cp/ChangeLog: * coroutines.cc (struct coroutine_info): Rename first_coro_keyword -> first_coro_expr. The former name is no longer accurate. (coro_promise_type_found_p): Adjust accordingly. (coro_function_valid_p): Change how we diagnose 'return' statements in coroutines to also point out where a function was made a coroutine, and where 'return' was used. (find_coro_traits_template_decl): Rename kw parameter to loc, since it does not always refer to a keyword. (instantiate_coro_traits): Ditto. (find_coro_handle_template_decl): Ditto. (get_handle_type_address): Ditto. (get_handle_type_from_address): Ditto. (instantiate_coro_handle_for_promise_type): Ditto. (build_template_co_await_expr): Ditto. (finish_co_await_expr): Ditto. (finish_co_yield_expr): Ditto. (finish_co_return_stmt): Ditto. gcc/testsuite/ChangeLog: * g++.dg/coroutines/co-return-syntax-08-bad-return.C: Update to match new diagnostic. Test more keyword combinations. --- gcc/cp/coroutines.cc | 127 ++ .../co-return-syntax-08-bad-return.C | 52 ++- 2 files changed, 119 insertions(+), 60 deletions(-) diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index f7791cbfb9a6..81096784b4d7 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -93,8 +93,8 @@ struct GTY((for_user)) coroutine_info tree promise_proxy; /* Likewise, a proxy promise instance. */ tree from_address; /* handle_type from_address function. */ tree return_void; /* The expression for p.return_void() if it exists. */ - location_t first_coro_keyword; /* The location of the keyword that made this - function into a coroutine. */ + location_t first_coro_expr; /* The location of the expression that turned +this function into a coroutine. */ /* Flags to avoid repeated errors for per-function issues. 
*/ bool coro_ret_type_error_emitted; bool coro_promise_error_emitted; @@ -285,7 +285,7 @@ static GTY(()) tree void_coro_handle_address; Lookup the coroutine_traits template decl. */ static tree -find_coro_traits_template_decl (location_t kw) +find_coro_traits_template_decl (location_t loc) { /* If we are missing fundamental information, such as the traits, (or the declaration found is not a type template), then don't emit an error for @@ -300,7 +300,7 @@ find_coro_traits_template_decl (location_t kw) { if (!traits_error_emitted) { - gcc_rich_location richloc (kw); + gcc_rich_location richloc (loc); error_at (&richloc, "coroutines require a traits template; cannot" " find %<%E::%E%>", std_node, coro_traits_identifier); inform (&richloc, "perhaps %<#include %> is missing"); @@ -315,7 +315,7 @@ find_coro_traits_template_decl (location_t kw) /* Instantiate Coroutine traits for the function signature. */ static tree -instantiate_coro_traits (tree fndecl, location_t kw) +instantiate_coro_traits (tree fndecl, location_t loc) { /* [coroutine.traits.primary] So now build up a type list for the template . @@ -358,7 +358,7 @@ instantiate_coro_traits (tree fndecl, location_t kw) if (traits_class == error_mark_node) { - error_at (kw, "cannot instantiate %"); + error_at (loc, "cannot instantiate %"); return NULL_TREE; } @@ -368,7 +368,7 @@ instantiate_coro_traits (tree fndecl, location_t kw) /* [coroutine.handle] */ static tree -find_coro_handle_template_decl (location_t kw) +find_coro_handle_template_decl (location_t loc) { /* As for the coroutine traits, this error is per TU, so only emit it once. 
*/ @@ -380,7 +380,7 @@ find_coro_handle_template_decl (location_t kw) || !DECL_CLASS_TEMPLATE_P (handle_decl)) { if (!coro_handle_error_emitted) - error_at (kw, "coroutines require a handle class template;" + error_at (loc, "coroutines require a handle class template;" " cannot find %<%E::%E%>", std_node, coro_handle_identifier); coro_handle_error_emitted = true; return NULL_TREE; @@ -394,21 +394,21 @@ find_coro_handle_template_decl (location_t kw) void*. If that is not the case, signals an error and returns NULL_TREE. */ static tree -get_handle_type_address (location_t kw, tree handle_type) +get_handle_type_address (location_t loc, tree handle_type) { tree addr_getter = lookup_member (handle_type, coro_address_identifier, 1, 0, tf_warning_or_error); if (!addr_getter || addr_getter == error_mark_node) { qualified_name_lookup_error (handle_type, coro_address_identifier, - error_
Re: [PATCH] c++: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]
On 8/14/24 3:41 AM, Jakub Jelinek wrote: Hi! The following patch partially implements CWG 2867 - Order of initialization for structured bindings. The DR requires that initialization of e is sequenced before r_i and that r_i initialization is sequenced before r_j for j > i. We already do it that way: the former ordering is a necessity so that the get calls are actually emitted on an already initialized variable, and the latter holds just because we implemented it that way, by going through the structured binding vars in ascending order and doing their initialization. The hard part not implemented yet is the lifetime extension of the temporaries from the e initialization to after the get calls (if any). Unlike the range-for lifetime extension patch which I've posted recently, where IMO we can just ignore lifetime extension of reference bound temporaries because all the temporaries are extended to the same spot, here lifetime extension of reference bound temporaries should last until the end of lifetime of e, while other temporaries should be destroyed only after all the get calls. The patch just attempts to deal with automatic structured bindings for now; I'll post a patch for static locals incrementally, and I don't have a patch for namespace scope structured bindings yet, so this patch should just keep existing behavior for both static locals and namespace scope structured bindings. What GCC currently emits is a CLEANUP_POINT_EXPR around the e initialization, followed optionally by nested CLEANUP_STMTs for cleanups like the e dtor if any and dtors of lifetime extended temporaries from reference binding; inside of the CLEANUP_STMT CLEANUP_BODY then the initialization of the individual variables for the tuple case, again with optional CLEANUP_STMT if e.g. lifetime extended temporaries from reference binding are needed in those. 
The following patch drops that first CLEANUP_POINT_EXPR and instead wraps the whole sequence of the e initialization and the individual variable initialization with get calls after it into a single CLEANUP_POINT_EXPR. If there are any CLEANUP_STMTs needed, they are all emitted first, with the CLEANUP_POINT_EXPR for e initialization and the individual variable initialization inside of those, and a guard variable set after different phases in those expressions guarding the corresponding cleanups, so that they aren't invoked until the respective variables are constructed. This is implemented by cp_finish_decl doing cp_finish_decomp on its own when !processing_template_decl (otherwise we often don't cp_finish_decl or process it at a different time from when we want to call cp_finish_decomp) and the decl is not erroneous (cp_finish_decl has too many early returns for erroneous cases, and for those we can actually call it even multiple times; for the non-erroneous, non-processing_template_decl cases we need to call it just once). The two testcases try to construct various temporaries and variables and verify the order in which the temporaries and variables are constructed and destructed. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-08-14 Jakub Jelinek PR c++/115769 * cp-tree.h: Partially implement CWG 2867 - Order of initialization for structured bindings. (cp_finish_decomp): Add bool argument defaulted to false. * decl.cc (initialize_local_var): Add DECOMP argument, if true, don't build cleanup and temporarily override stmts_are_full_exprs_p to 0 rather than 1. Formatting fix. (cp_finish_decl): Invoke cp_finish_decomp for structured bindings here if !processing_template_decl, first with TEST_P true. 
For automatic structured binding bases if the test cp_finish_decomp returned true wrap the initialization together with what non-test cp_finish_decomp emits with a CLEANUP_POINT_EXPR, and if there are any CLEANUP_STMTs needed, emit them around the whole CLEANUP_POINT_EXPR with guard variables for the cleanups. (cp_finish_decomp): Add TEST_P argument, change return type from void to bool, if TEST_P, return true instead of emitting actual code for the tuple case, otherwise return false. * parser.cc (cp_convert_range_for): Don't call cp_finish_decomp unless range_decl is erroneous. (cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE before cp_finish_decl call, call cp_finish_decomp after it only if processing_template_decl or decl is erroneous. (cp_finish_omp_range_for): Call cp_finish_decomp only if processing_template_decl or decl is erroneous. * pt.cc (tsubst_stmt): Likewise. * g++.dg/DRs/dr2867-1.C: New test. * g++.dg/DRs/dr2867-2.C: New test. +If there are any cleanups, either extend_ref_init_temps +created ones or e.g. array destruction, push those first +with the cleanups guarded on a bool temporary, initially
Re: [PATCH v3 1/2] c++: improve location of parsed RETURN_EXPRs
On 8/21/24 1:52 PM, Arsen Arsenović wrote: For clarity, here's the entire split-up patch I intend to push, if it looks OK. Tested on x86_64-pc-linux-gnu. I've renamed the field we've discussed and also a few parameters that refer to 'kw' to be less specific. The code is functionally identical. OK for trunk? TIA, have a lovely day. -- >8 -- This patch improves the EXPR_LOCATION associated with parsed RETURN_EXPRs so that they can be used in diagnostics later. This change also happened to un-suppress an analyzer false-negative that was happening because the location of RETURN_EXPR was entirely within the NULL macro, which was defined in a system header. PR analyzer/116304. The PR number should be on its own line, and in the subject line. gcc/cp/ChangeLog: * cp-tree.h (finish_return_stmt): Add optional location_t parameter, defaulting to input_location. * parser.cc (cp_parser_jump_statement): Improve return and co_return locations so that they span their entire statements. * semantics.cc (finish_return_stmt): Use the new stmt_loc parameter in place of input_location. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/inlining-4-multiline.c: Adjust locations in diagnostics. This doesn't look like a location adjustment, but removing testing of expected output. If the output changed, please change the test to check the new output rather than not at all. * c-c++-common/analyzer/malloc-paths-9-noexcept.c: Ditto. ...like you do properly in this test. * c-c++-common/analyzer/malloc-CWE-401-example.c: Accept the new warning on line 34 (fixed false negative). I'd think the new dg-warning could replace the obsolete TODO? 
--- gcc/cp/cp-tree.h | 2 +- gcc/cp/parser.cc | 9 +++-- gcc/cp/semantics.cc| 4 ++-- .../c-c++-common/analyzer/inlining-4-multiline.c | 6 -- .../c-c++-common/analyzer/malloc-CWE-401-example.c | 1 + .../c-c++-common/analyzer/malloc-paths-9-noexcept.c| 10 +- 6 files changed, 16 insertions(+), 16 deletions(-) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index a53fbcb43ec4..83be768420aa 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7794,7 +7794,7 @@ extern void finish_while_stmt (tree); extern tree begin_do_stmt (void); extern void finish_do_body(tree); extern void finish_do_stmt(tree, tree, bool, tree, bool); -extern tree finish_return_stmt (tree); +extern tree finish_return_stmt (tree, location_t = input_location); extern tree begin_for_scope (tree *); extern tree begin_for_stmt(tree, tree); extern void finish_init_stmt (tree); diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index c9654cfff9d2..68b3f0a0f5c4 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -14952,14 +14952,19 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs) set_musttail_on_return (expr, token->location, musttail_p); } + /* A location spanning the whole statement (up to ';'). */ + auto stmt_loc = make_location (token->location, + token->location, + input_location); + /* Build the return-statement, check co-return first, since type deduction is not valid there. */ if (keyword == RID_CO_RETURN) - statement = finish_co_return_stmt (token->location, expr); + statement = finish_co_return_stmt (stmt_loc, expr); else if (FNDECL_USED_AUTO (current_function_decl) && in_discarded_stmt) /* Don't deduce from a discarded return statement. */; else - statement = finish_return_stmt (expr); + statement = finish_return_stmt (expr, stmt_loc); /* Look for the final `;'. 
*/ cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); } diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 5ab2076b673c..734c613a474e 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -1400,7 +1400,7 @@ finish_do_stmt (tree cond, tree do_stmt, bool ivdep, tree unroll, indicated. */ tree -finish_return_stmt (tree expr) +finish_return_stmt (tree expr, location_t stmt_loc) { tree r; bool no_warning; @@ -1423,7 +1423,7 @@ finish_return_stmt (tree expr) verify_sequence_points (expr); } - r = build_stmt (input_location, RETURN_EXPR, expr); + r = build_stmt (stmt_loc, RETURN_EXPR, expr); RETURN_EXPR_LOCAL_ADDR_P (r) = dangling; if (no_warning) suppress_warning (r, OPT_Wreturn_type); diff --git a/gcc/testsuite/c-c++-common/analyzer/inlining-4-multiline.c b/gcc/testsuite/c-c++-common/analyzer/inlining-4-
[PATCH 0/9] c++, coroutines: Patch set for ramp function fixes.
This is a series of patches that addresses the majority of the open PRs related to the coroutine ramp function.
It is presented as a series because the actual bug fixes depend on some preparatory patches (which are also used to resolve issues with other PR fixes, e.g. Arsen's fix for PR109867).
The series has been tested incrementally against the GCC testsuite, cppcoro and the folly coroutines tests.
- Iain
Iain Sandoe (9):
  c++, coroutines: Split the ramp build into a separate function.
  c++, coroutines: Separate the analysis, ramp and outlined function synthesis.
  c++, coroutines: Separate allocator work from the ramp body build.
  c++, coroutines: Fix handling of early exceptions [PR113773].
  c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].
  c++, coroutines: Allow convertible get_return_on_allocation_fail [PR109682].
  c++, coroutines: Fix ordering of return object conversions [PR115908].
  c++, coroutines: Rework handling of throwing_cleanups [PR102051].
  c++, coroutines: Look through initial_await target exprs [PR110635].
gcc/cp/coroutines.cc | 1391 -
gcc/cp/coroutines.h | 132 ++
gcc/cp/cp-tree.h | 1 -
gcc/cp/decl.cc | 80 +-
.../coro-bad-gro-00-class-gro-scalar-return.C | 4 +-
.../coro-bad-gro-01-void-gro-non-class-coro.C | 4 +-
.../coroutines/coro-bad-grooaf-00-static.C | 6 +-
gcc/testsuite/g++.dg/coroutines/pr102051.C | 16 +
gcc/testsuite/g++.dg/coroutines/pr102489.C | 2 +-
gcc/testsuite/g++.dg/coroutines/pr103868.C | 2 +-
gcc/testsuite/g++.dg/coroutines/pr109682.C | 28 +
gcc/testsuite/g++.dg/coroutines/pr110635.C | 72 +
gcc/testsuite/g++.dg/coroutines/pr115908.C | 75 +
.../g++.dg/coroutines/pr94879-folly-1.C | 3 +-
.../g++.dg/coroutines/pr94883-folly-2.C | 39 +-
gcc/testsuite/g++.dg/coroutines/pr96749-2.C | 2 +-
.../g++.dg/coroutines/ramp-return-b.C | 8 +-
.../g++.dg/coroutines/torture/pr113773.C | 66 +
18 files changed, 1135 insertions(+), 796 deletions(-)
create mode 100644 gcc/cp/coroutines.h
create mode 100644 gcc/testsuite/g++.dg/coroutines/pr102051.C
create mode 100644 gcc/testsuite/g++.dg/coroutines/pr109682.C
create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110635.C
create mode 100644 gcc/testsuite/g++.dg/coroutines/pr115908.C
create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr113773.C
--
2.39.2 (Apple Git-143)
[PATCH 1/9] c++, coroutines: Split the ramp build into a separate function.
This is primarily preparation to partition the functionality of the coroutine transform into analysis, ramp generation and then (later) synthesis of the coroutine body.
The patch does fix one latent issue in the ordering of DTORs for frame parameter copies (to ensure that they are processed in reverse order to the copy creation).
gcc/cp/ChangeLog:
	* coroutines.cc (build_actor_fn): Arrange to apply any required parameter copy DTORs in reverse order to their creation.
	(morph_fn_to_coro): Split the ramp function completion into a separate function.
	(build_ramp_function): New.
Signed-off-by: Iain Sandoe
---
gcc/cp/coroutines.cc | 360 +++
1 file changed, 192 insertions(+), 168 deletions(-)
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 1f1ea5c2fe4..50362fc3556 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -2298,7 +2298,7 @@ static void build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody, tree orig, hash_map *local_var_uses, hash_map *suspend_points, - vec *param_dtor_list, + vec *param_dtor_list, tree resume_idx_var, unsigned body_count, tree frame_size) { verify_stmt_tree (fnbody); @@ -2513,19 +2513,15 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody, fnf2_x = build1 (CONVERT_EXPR, integer_type_node, fnf2_x); tree cmp = build2 (NE_EXPR, integer_type_node, fnf2_x, integer_zero_node); finish_if_stmt_cond (cmp, need_free_if); - if (param_dtor_list != NULL) + while (!param_dtor_list->is_empty ()) { - int i; - tree pid; - FOR_EACH_VEC_ELT (*param_dtor_list, i, pid) - { - tree m - = lookup_member (coro_frame_type, pid, 1, 0, tf_warning_or_error); - tree a = build_class_member_access_expr (actor_frame, m, NULL_TREE, - false, tf_warning_or_error); - if (tree dtor = cxx_maybe_build_cleanup (a, tf_warning_or_error)) - add_stmt (dtor); - } + tree pid = param_dtor_list->pop (); + tree m = lookup_member (coro_frame_type, pid, 1, 0, tf_warning_or_error); + gcc_checking_assert (m); + tree a =
build_class_member_access_expr (actor_frame, m, NULL_TREE, + false, tf_warning_or_error); + if (tree dtor = cxx_maybe_build_cleanup (a, tf_warning_or_error)) + add_stmt (dtor); } /* Build the frame DTOR. */ @@ -4553,147 +4549,28 @@ split_coroutine_body_from_ramp (tree fndecl) return body; } -/* Here we: - a) Check that the function and promise type are valid for a - coroutine. - b) Carry out the initial morph to create the skeleton of the - coroutine ramp function and the rewritten body. - - Assumptions. +/* Build the ramp function. + Here we take the original function definition which has now had its body + removed, and use it as the declaration of the ramp which both replaces the + user's written function at call sites, and is responsible for starting + the coroutine it defined. + returns NULL_TREE on error or an expression for the frame size. - 1. We only hit this code once all dependencies are resolved. - 2. The function body will be either a bind expr or a statement list - 3. That cfun and current_function_decl are valid for the case we're - expanding. - 4. 'input_location' will be of the final brace for the function. - - We do something like this: - declare a dummy coro frame. - struct _R_frame { - using handle_type = coro::coroutine_handle; - void (*_Coro_resume_fn)(_R_frame *); - void (*_Coro_destroy_fn)(_R_frame *); - coro1::promise_type _Coro_promise; - bool _Coro_frame_needs_free; free the coro frame mem if set. - bool _Coro_i_a_r_c; [dcl.fct.def.coroutine] / 5.3 - short _Coro_resume_index; - handle_type _Coro_self_handle; - parameter copies (were required). - local variables saved (including awaitables) - (maybe) trailing space. - }; */ + We should arrive here with the state of the compiler as if we had just + executed start_preparsed_function(). 
*/ -bool -morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) +static tree +build_ramp_function (tree orig, location_t fn_start, tree coro_frame_ptr, +tree coro_frame_type, +hash_map *param_uses, +tree act_des_fn_ptr, tree actor, tree destroy, +vec *param_dtor_list) { - gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL); - - *resumer = error_mark_node; - *destroyer = error_mark_node; - if (!coro_function_valid_p (orig)) -{ - /* For early errors, we do not want a diagnostic about the missing -ramp return value, since the user cannot fix this - a 'return' is -not allowed in a coroutine. */ - suppress_warni
[PATCH 5/9] c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].
Require that the value returned by get_return_object is convertible to the ramp return type. This means that the only time we allow a void get_return_object is when the ramp is also a void function.
We diagnose this early to allow us to exit the ramp build if the return values are incompatible.
PR c++/100476
gcc/cp/ChangeLog:
	* coroutines.cc (cp_coroutine_transform::build_ramp_function): Remove special handling of void get_return_object expressions.
gcc/testsuite/ChangeLog:
	* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C: Adjust expected diagnostic.
	* g++.dg/coroutines/pr102489.C: Avoid void get_return_object.
	* g++.dg/coroutines/pr103868.C: Likewise.
	* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
	* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
	* g++.dg/coroutines/pr96749-2.C: Likewise.
Signed-off-by: Iain Sandoe
---
gcc/cp/coroutines.cc | 48 +--
.../coro-bad-gro-01-void-gro-non-class-coro.C | 2 +-
gcc/testsuite/g++.dg/coroutines/pr102489.C | 2 +-
gcc/testsuite/g++.dg/coroutines/pr103868.C | 2 +-
.../g++.dg/coroutines/pr94879-folly-1.C | 3 +-
.../g++.dg/coroutines/pr94883-folly-2.C | 39 +++
gcc/testsuite/g++.dg/coroutines/pr96749-2.C | 2 +-
7 files changed, 48 insertions(+), 50 deletions(-)
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 2faf198c206..d152ad20dca 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -4640,6 +4640,7 @@ cp_coroutine_transform::build_ramp_function () tree promise_type = get_coroutine_promise_type (orig_fn_decl); tree fn_return_type = TREE_TYPE (TREE_TYPE (orig_fn_decl)); + bool void_ramp_p = VOID_TYPE_P (fn_return_type); /* [dcl.fct.def.coroutine] / 10 (part1) The unqualified-id get_return_object_on_allocation_failure is looked up @@ -4720,6 +4721,19 @@ cp_coroutine_transform::build_ramp_function () return; } + /* Check for a bad get return object type. */ + tree gro_return_type = FUNC_OR_METHOD_TYPE_P (TREE_TYPE (get_ro_meth)) + ?
TREE_TYPE (TREE_TYPE (get_ro_meth)) + : TREE_TYPE (get_ro_meth); + if (VOID_TYPE_P (gro_return_type) && !void_ramp_p) +{ + error_at (fn_start, "no viable conversion from % provided by" + " % to return type %qT", fn_return_type); + valid_coroutine = false; + input_location = save_input_loc; + return; +} + /* So now construct the Ramp: */ tree stmt = begin_function_body (); /* Now build the ramp function pieces. */ @@ -4816,7 +4830,7 @@ cp_coroutine_transform::build_ramp_function () tree cond = build1 (CONVERT_EXPR, frame_ptr_type, nullptr_node); cond = build2 (EQ_EXPR, boolean_type_node, coro_fp, cond); finish_if_stmt_cond (cond, if_stmt); - if (VOID_TYPE_P (fn_return_type)) + if (void_ramp_p) { /* Execute the get-return-object-on-alloc-fail call... */ finish_expr_stmt (grooaf); @@ -5028,7 +5042,6 @@ cp_coroutine_transform::build_ramp_function () tree gro_context_body = push_stmt_list (); tree gro_type = TREE_TYPE (get_ro); - bool gro_is_void_p = VOID_TYPE_P (gro_type); tree gro = NULL_TREE; tree gro_bind_vars = NULL_TREE; @@ -5037,8 +5050,11 @@ cp_coroutine_transform::build_ramp_function () tree gro_cleanup_stmt = NULL_TREE; /* We have to sequence the call to get_return_object before initial suspend. */ - if (gro_is_void_p) -r = get_ro; + if (void_ramp_p) +{ + gcc_checking_assert (VOID_TYPE_P (gro_type)); + r = get_ro; +} else if (same_type_p (gro_type, fn_return_type)) { /* [dcl.fct.def.coroutine] / 7 @@ -5122,31 +5138,11 @@ cp_coroutine_transform::build_ramp_function () for an object of the return type. */ if (same_type_p (gro_type, fn_return_type)) -r = gro_is_void_p ? NULL_TREE : DECL_RESULT (orig_fn_decl); - else if (!gro_is_void_p) +r = void_ramp_p ? NULL_TREE : DECL_RESULT (orig_fn_decl); + else /* check_return_expr will automatically return gro as an rvalue via treat_lvalue_as_rvalue_p. */ r = gro; - else if (CLASS_TYPE_P (fn_return_type)) -{ - /* For class type return objects, we can attempt to construct, -even if the gro is void. ??? Citation ??? 
c++/100476 */ - r = build_special_member_call (NULL_TREE, -complete_ctor_identifier, NULL, -fn_return_type, LOOKUP_NORMAL, -tf_warning_or_error); - r = build_cplus_new (fn_return_type, r, tf_warning_or_error); -} - else -{ - /* We can't initialize a non-class return value from void. */ - error_at (fn_start, "cannot initialize a return object of type" - " %qT with an rvalue of type %", fn_return_type); - r = error