Re: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]
On 1/6/25 5:59 PM, Linus Torvalds wrote: On Mon, 6 Jan 2025 at 16:13, Jeff Law wrote: But in the case of concurrent accesses, shouldn't these objects be declared as atomic? No. They aren't concurrent accesses to the same variable. They are concurrent accesses to *different* memory locations, and the compiler is not allowed to mess them up. You're absolutely right. Definitely not kosher. Thanks. Jeff
Re: [PATCH 1/2] testsuite: add testcase for fixed PR117546
Jeff Law writes: > On 1/3/25 11:11 AM, Sam James wrote: >> Sam James writes: >> >>> PR117546 was fixed by Eric's r14-10693-gadab597af288d6 change, but >>> the testcase here is sufficiently different to be worth including >>> in torture/. >>> >>> gcc/testsuite/ChangeLog: >>> PR ipa/117546 >>> * gcc.dg/torture/pr117546.c: New test. >>> --- >> Sorry, forgot to ask: OK? (for both) > OK for both. Thanks! Pushed. > > jeff
[PATCH] LoongArch: Adjust the cost of ADDRESS_REG_REG [PR114978].
After changing this cost from 1 to 3, the performance of spec2006 401 473 416 465 482 can be improved by about 2% on LA664. Add option '-maddr-reg-reg-cost='. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in: Add option '-maddr-reg-reg-cost='. * config/loongarch/loongarch-def.cc (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize addr_reg_reg_cost to 3. * config/loongarch/loongarch-opts.cc (loongarch_target_option_override): If '-maddr-reg-reg-cost=' is not used, set it to the initial value. * config/loongarch/loongarch-tune.h (struct loongarch_rtx_cost_data): Add the member addr_reg_reg_cost and its assignment function to the structure loongarch_rtx_cost_data. * config/loongarch/loongarch.cc (loongarch_address_insns): Use la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch.opt.urls: Regenerate. * doc/invoke.texi: Add description of '-maddr-reg-reg-cost='. gcc/testsuite/ChangeLog: * gcc.target/loongarch/const-double-zero-stx.c: Add '-maddr-reg-reg-cost=1'. * gcc.target/loongarch/stack-check-alloca-1.c: Likewise. --- gcc/config/loongarch/genopts/loongarch.opt.in | 4 gcc/config/loongarch/loongarch-def.cc | 1 + gcc/config/loongarch/loongarch-opts.cc | 3 +++ gcc/config/loongarch/loongarch-tune.h | 7 +++ gcc/config/loongarch/loongarch.cc | 2 +- gcc/config/loongarch/loongarch.opt | 4 gcc/config/loongarch/loongarch.opt.urls| 3 +++ gcc/doc/invoke.texi| 7 ++- gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c | 2 +- gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c | 2 +- 10 files changed, 31 insertions(+), 4 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 8c292c8600d..39c1545e540 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -177,6 +177,10 @@ mbranch-cost= Target RejectNegative Joined UInteger Var(la_branch_cost) Save -mbranch-cost=COST Set the cost of branches to roughly COST instructions. +maddr-reg-reg-cost= +Target RejectNegative Joined UInteger Var(la_addr_reg_reg_cost) Save +-maddr-reg-reg-cost=COST Set the cost of ADDRESS_REG_REG to the value calculated by COST. + mcheck-zero-division Target Mask(CHECK_ZERO_DIV) Save Trap on integer divide by zero. diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index b0271eb3b9a..5f235a04ef2 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -136,6 +136,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data () movcf2gr (COSTS_N_INSNS (7)), movgr2cf (COSTS_N_INSNS (15)), branch_cost (6), +addr_reg_reg_cost (3), memory_latency (4) {} /* The following properties cannot be looked up directly using "cpucfg". diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index 36342cc9373..c2a63f75fc2 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -1010,6 +1010,9 @@ loongarch_target_option_override (struct loongarch_target *target, if (!opts_set->x_la_branch_cost) opts->x_la_branch_cost = loongarch_cost->branch_cost; + if (!opts_set->x_la_addr_reg_reg_cost) +opts->x_la_addr_reg_reg_cost = loongarch_cost->addr_reg_reg_cost; + /* other stuff */ if (ABI_LP64_P (target->abi.base)) opts->x_flag_pcc_struct_return = 0; diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h index e69173ebf79..f7819fe7678 100644 --- a/gcc/config/loongarch/loongarch-tune.h +++ b/gcc/config/loongarch/loongarch-tune.h @@ -38,6 +38,7 @@ struct loongarch_rtx_cost_data unsigned short movcf2gr; unsigned short movgr2cf; unsigned short branch_cost; + unsigned short addr_reg_reg_cost; unsigned short memory_latency; /* Default RTX cost initializer, implemented in loongarch-def.cc. */ @@ -115,6 +116,12 @@ struct loongarch_rtx_cost_data return *this; } + loongarch_rtx_cost_data addr_reg_reg_cost_ (unsigned short _addr_reg_reg_cost) + { +addr_reg_reg_cost = _addr_reg_reg_cost; +return *this; + } + loongarch_rtx_cost_data memory_latency_ (unsigned short _memory_latency) { memory_latency = _memory_latency; diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 89237c377e7..d5e90bfd1e1 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -2383,7 +2383,7 @@ loongarch_address_insns (rtx x, machin
RE: [PATCH v1 1/4] Match: Refactor the signed SAT_SUB match patterns [NFC]
Kindly ping for the series. Pan -Original Message- From: Li, Pan2 Sent: Monday, December 23, 2024 3:09 PM To: gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v1 1/4] Match: Refactor the signed SAT_SUB match patterns [NFC] Kindly ping for this series, and Merry Christmas! Pan -Original Message- From: Li, Pan2 Sent: Thursday, December 12, 2024 4:42 PM To: gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v1 1/4] Match: Refactor the signed SAT_SUB match patterns [NFC] From: Pan Li This patch would like to refactor the all signed SAT_ADD patterns, aka: * Extract type check outside. * Re-arrange the related match pattern forms together. The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. gcc/ChangeLog: * match.pd: Refactor sorts of signed SAT_SUB match patterns. Signed-off-by: Pan Li --- gcc/match.pd | 98 +--- 1 file changed, 40 insertions(+), 58 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index dd5302015c7..1ef504f141f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3375,6 +3375,46 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0))) ) +(if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)) + (match (signed_integer_sat_sub @0 @1) + /* T Z = (T)((UT)X - (UT)Y); + SAT_S_SUB = (X ^ Y) & (X ^ Z) < 0 ? (-(T)(X < 0) ^ MAX) : Z */ + (cond^ (lt (bit_and:c (bit_xor:c @0 @1) + (bit_xor @0 (nop_convert@2 (minus (nop_convert @0) + (nop_convert @1) +integer_zerop) +(bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) +@2)) + (match (signed_integer_sat_sub @0 @1) + /* T Z = (T)((UT)X - (UT)Y); + SAT_S_SUB = (X ^ Y) & (X ^ Z) >= 0 ? Z : (-(T)(X < 0) ^ MAX) */ + (cond^ (ge (bit_and:c (bit_xor:c @0 @1) + (bit_xor @0 (nop_convert@2 (minus (nop_convert @0) + (nop_convert @1) +integer_zerop) +@2 +(bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))) + (match (signed_integer_sat_sub @0 @1) + /* T Z = (T)((UT)X - (UT)Y); + SAT_S_SUB = (X ^ Y) < 0 & (X ^ Z) < 0 ? (-(T)(X < 0) ^ MAX) : Z */ + (cond^ (bit_and:c (lt (bit_xor @0 (nop_convert@2 (minus (nop_convert @0) + (nop_convert @1 + integer_zerop) + (lt (bit_xor:c @0 @1) integer_zerop)) +(bit_xor:c (nop_convert (negate (nop_convert (convert + (lt @0 integer_zerop) + max_value) +@2)) + (match (signed_integer_sat_sub @0 @1) + /* Z = .SUB_OVERFLOW (X, Y) + SAT_S_SUB = IMAGPART (Z) != 0 ? (-(T)(X < 0) ^ MAX) : REALPART (Z) */ + (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop) +(bit_xor:c (nop_convert? + (negate (nop_convert? (convert (lt @0 integer_zerop) + max_value) +(realpart @2)) + (if (types_match (type, @0, @1) + /* The boundary condition for case 10: IMM = 1: SAT_U_SUB = X >= IMM ? (X - IMM) : 0. simplify (X != 0 ? X + ~0 : 0) to X - (X != 0). */ @@ -3386,64 +3426,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (with { tree itype = TREE_TYPE (@2); } (convert (minus @2 (convert:itype @1)) -/* Signed saturation sub, case 1: - T minus = (T)((UT)X - (UT)Y); - SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus; - - The T and UT are type pair like T=int8_t, UT=uint8_t. */ -(match (signed_integer_sat_sub @0 @1) - (cond^ (lt (bit_and:c (bit_xor:c @0 @1) - (bit_xor @0 (nop_convert@2 (minus (nop_convert @0) -(nop_convert @1) - integer_zerop) - (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) - @2) - (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type - -/* Signed saturation sub, case 2: - T minus = (T)((UT)X - (UT)Y); - SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus; - - The T and UT are type pair like T=int8_t, UT=uint8_t. */ -(match (signed_integer_sat_sub @0 @1) - (cond^ (ge (bit_and:c (bit_xor:c @0 @1) - (bit_xor @0 (nop_convert@2 (minus (nop_convert @0) -(nop_convert @1) - integer_zerop) - @2 - (bit_xor:c (negate (convert (lt @0 integer_zerop
[PATCH] c-pretty-print.cc (pp_c_tree_decl_identifier): Strip private name encoding, PR118303
Regtested native x86_64-linux. Also tested mmix-knuth-mmixware, where it fixes ONE testcase, but one which is a regression on master. The PR component is currently ipa, changed from the original middle-end. IIUC this bug-fix doesn't fit the ipa category IMHO, but rather more general tree-optimization or rather middle-end, to which I'll change the component unless I see a reason for this fitting ipa stated. Ok to commit? -- >8 -- This is a part of PR118303. It fixes FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c (test for excess errors) FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c inbuf.data (test for warnings, line 62) for targets where the parameter on that line is subject to TARGET_CALLEE_COPIES being true. c-family: PR middle-end/118303 * c-pretty-print.cc (c_pretty_printer::primary_expression) : Call primary_expression for all SSA_NAME_VAR nodes and instead move the DECL_ARTIFICIAL private name stripping to... (pp_c_tree_decl_identifier): ...here. --- gcc/c-family/c-pretty-print.cc | 39 +- 1 file changed, 15 insertions(+), 24 deletions(-) diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc index 22a71d1e3558..0b6810e12242 100644 --- a/gcc/c-family/c-pretty-print.cc +++ b/gcc/c-family/c-pretty-print.cc @@ -1398,29 +1398,7 @@ c_pretty_printer::primary_expression (tree e) case SSA_NAME: if (SSA_NAME_VAR (e)) - { - tree var = SSA_NAME_VAR (e); - if (tree id = SSA_NAME_IDENTIFIER (e)) - { - const char *name = IDENTIFIER_POINTER (id); - const char *dot; - if (DECL_ARTIFICIAL (var) && (dot = strchr (name, '.'))) - { - /* Print the name without the . suffix (such as in VLAs). -Use pp_c_identifier so that it can be converted into -the appropriate encoding. */ - size_t size = dot - name; - char *ident = XALLOCAVEC (char, size + 1); - memcpy (ident, name, size); - ident[size] = '\0'; - pp_c_identifier (this, ident); - } - else - primary_expression (var); - } - else - primary_expression (var); - } + primary_expression (SSA_NAME_VAR (e)); else if (gimple_assign_single_p (SSA_NAME_DEF_STMT (e))) { /* Print only the right side of the GIMPLE assignment. */ @@ -3033,7 +3011,20 @@ pp_c_tree_decl_identifier (c_pretty_printer *pp, tree t) gcc_assert (DECL_P (t)); if (DECL_NAME (t)) -name = IDENTIFIER_POINTER (DECL_NAME (t)); +{ + const char *dot; + name = IDENTIFIER_POINTER (DECL_NAME (t)); + if (DECL_ARTIFICIAL (t) && (dot = strchr (name, '.'))) + { + /* Print the name without the . suffix (such as in VLAs and +callee-copied parameters). */ + size_t size = dot - name; + char *ident = XALLOCAVEC (char, size + 1); + memcpy (ident, name, size); + ident[size] = '\0'; + name = ident; + } +} else { static char xname[8]; -- 2.39.2
Re: [PATCH] testsuite: generalized field-merge tests for <32-bit int [PR118025]
On Jan 6, 2025, at 3:05 PM, Alexandre Oliva wrote: > > On Dec 22, 2024, Alexandre Oliva wrote: > >> for gcc/testsuite/ChangeLog > >> PR testsuite/118025 >> * gcc.dg/field-merge-1.c: Convert constants to desired types. >> * gcc.dg/field-merge-3.c: Likewise. >> * gcc.dg/field-merge-4.c: Likewise. >> * gcc.dg/field-merge-5.c: Likewise. >> * gcc.dg/field-merge-11.c: Likewise. >> * gcc.dg/field-merge-17.c: Don't mess with padding bits. > > Ping? > https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672194.html Ok.
Re: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]
On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: With non-BWX Alpha implementations we have a problem of data races where a 8-bit byte or 16-bit word quantity is to be written to memory in that in those cases we use an unprotected RMW access of a 32-bit longword or 64-bit quadword width. If contents of the longword or quadword accessed outside the byte or word to be written are changed midway through by a concurrent write executing on the same CPU such as by a signal handler or a parallel write executing on another CPU such as by another thread or via a shared memory segment, then the concluding write of the RMW access will clobber them. This is especially important for the safety of RCU algorithms, but is otherwise an issue anyway. But in the case of concurrent accesses, shouldn't these objects be declared as atomic? Similarly for objects potentially accessed in a signal hnadler shouldn't they be accessed via sig_atomic_t? Point being I'm not 100% sure we really need to tackle this problem in a fully generic manner for all 8/16 bit accesses What am I missing here? jeff
Re: [PATCH 1/2] testsuite: add testcase for fixed PR117546
On 1/3/25 11:11 AM, Sam James wrote: Sam James writes: PR117546 was fixed by Eric's r14-10693-gadab597af288d6 change, but the testcase here is sufficiently different to be worth including in torture/. gcc/testsuite/ChangeLog: PR ipa/117546 * gcc.dg/torture/pr117546.c: New test. --- Sorry, forgot to ask: OK? (for both) OK for both. jeff
Re: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]
On Mon, Jan 06, 2025 at 05:12:57PM -0700, Jeff Law wrote: > > > On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: > > With non-BWX Alpha implementations we have a problem of data races where > > a 8-bit byte or 16-bit word quantity is to be written to memory in that > > in those cases we use an unprotected RMW access of a 32-bit longword or > > 64-bit quadword width. If contents of the longword or quadword accessed > > outside the byte or word to be written are changed midway through by a > > concurrent write executing on the same CPU such as by a signal handler > > or a parallel write executing on another CPU such as by another thread > > or via a shared memory segment, then the concluding write of the RMW > > access will clobber them. This is especially important for the safety > > of RCU algorithms, but is otherwise an issue anyway. > But in the case of concurrent accesses, shouldn't these objects be declared > as atomic? Similarly for objects potentially accessed in a signal hnadler > shouldn't they be accessed via sig_atomic_t? > > Point being I'm not 100% sure we really need to tackle this problem in a > fully generic manner for all 8/16 bit accesses > > What am I missing here? Doesn't the behavior Maciej is describing constitute a data race injected by the compiler? As of C11 and C++11, this is forbidden, correct? Thanx, Paul
Re: [PATCH v3] aarch64: remove extra XTN in vector concatenation
Akram Ahmad writes: > Hi Richard, > > Thanks for the feedback. I've copied in the resulting patch here- if > this is okay, please could it be committed on my behalf? The patch > continues below. > > Many thanks, > > Akram Thanks. LGTM. Pushed to trunk. Richard > --- > > GIMPLE code which performs a narrowing truncation on the result of a > vector concatenation currently results in an unnecessary XTN being > emitted following a UZP1 to concate the operands. In cases such as this, > UZP1 should instead use a smaller arrangement specifier to replace the > XTN instruction. This is seen in cases such as in this GIMPLE example: > > int32x2_t foo (svint64_t a, svint64_t b) > { > vector(2) int vect__2.8; > long int _1; > long int _3; > vector(2) long int _12; > > [local count: 1073741824]: > _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D)); > _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D)); > _12 = {_1, _3}; > vect__2.8_13 = (vector(2) int) _12; > return vect__2.8_13; > > } > > Original assembly generated: > > bar: > ptrue p3.b, all > uaddv d0, p3, z0.d > uaddv d1, p3, z1.d > uzp1v0.2d, v0.2d, v1.2d > xtn v0.2s, v0.2d > ret > > This patch therefore defines the *aarch64_trunc_concat insn which > truncates the concatenation result, rather than concatenating the > truncated operands (such as in *aarch64_narrow_trunc), resulting > in the following optimised assembly being emitted: > > bar: > ptrue p3.b, all > uaddv d0, p3, z0.d > uaddv d1, p3, z1.d > uzp1v0.2s, v0.2s, v1.2s > ret > > This patch passes all regression tests on aarch64 with no new failures. > A supporting test for this optimisation is also written and passes. > > OK for master? I do not have commit rights so I cannot push the patch > myself. > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md: (*aarch64_trunc_concat) > new insn definition. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/truncated_concatenation_1.c: new test > for the above example and other modes covered by insn > definitions. > --- > gcc/config/aarch64/aarch64-simd.md| 16 ++ > .../aarch64/sve/truncated_concatenation_1.c | 32 +++ > 2 files changed, 48 insertions(+) > create mode 100644 > gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index cfe95bd4c31..6c129d6c4a8 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -1872,6 +1872,22 @@ >[(set_attr "type" "neon_permute")] > ) > > +(define_insn "*aarch64_trunc_concat" > + [(set (match_operand: 0 "register_operand" "=w") > + (truncate: > + (vec_concat:VQN > + (match_operand: 1 "register_operand" "w") > + (match_operand: 2 "register_operand" "w"] > + "TARGET_SIMD" > +{ > + if (!BYTES_BIG_ENDIAN) > +return "uzp1\\t%0., %1., %2."; > + else > +return "uzp1\\t%0., %2., %1."; > +} > + [(set_attr "type" "neon_permute")] > +) > + > ;; Packing doubles. > > (define_expand "vec_pack_trunc_" > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c > b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c > new file mode 100644 > index 000..95577a1a9ef > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -Wall -march=armv8.2-a+sve" } */ > + > +#include > +#include > + > +int8x8_t f1 (int16x4_t a, int16x4_t b) { > +int8x8_t ab = vdup_n_s8 (0); > +int16x8_t ab_concat = vcombine_s16 (a, b); > +ab = vmovn_s16 (ab_concat); > +return ab; > +} > + > +int16x4_t f2 (int32x2_t a, int32x2_t b) { > +int16x4_t ab = vdup_n_s16 (0); > +int32x4_t ab_concat = vcombine_s32 (a, b); > +ab = vmovn_s32 (ab_concat); > +return ab; > +} > + > +int32x2_t f3 (svint64_t a, svint64_t b) { > +int32x2_t ab = vdup_n_s32 (0); > +ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), a), ab, 0); > +ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), b), ab, 1); > +return ab; > +} > + > +/* { dg-final { scan-assembler-not {\txtn\t} } }*/ > +/* { dg-final { scan-assembler-not {\tfcvtn\t} } }*/ > +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.8b, v[0-9]+\.8b, > v[0-9]+\.8b} 1 } }*/ > +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.4h, v[0-9]+\.4h, > v[0-9]+\.4h} 1 } }*/ > +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.2s, v[0-9]+\.2s, > v[0-9]+\.2s} 1 } }*/ > \ No newline at end of file
[COMMITTED 06/30] ada: Silence unused parameter warning on linux
From: Tonu Naks In __gnat_locate_exec_on_path (char *exec_name, int current_dir_on_windows) the recently added second parameter is for windows only. On non-windows platforms its usage is removed by the preprocessor and the compiler reports unused parameter. gcc/ada/ChangeLog: * adaint.c: void parameter on non-windows platforms Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/adaint.c | 4 1 file changed, 4 insertions(+) diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c index 0b6d4bb6b4e..0459956ff5b 100644 --- a/gcc/ada/adaint.c +++ b/gcc/ada/adaint.c @@ -3110,6 +3110,10 @@ __gnat_locate_exec_on_path (char *exec_name, int current_dir_on_windows) } #else + /* Tell the compiler that we are not going to use this parameter + on non-windows platforms. */ + (void)current_dir_on_windows; + const char *path_val = getenv ("PATH"); /* If PATH is not defined, proceed with __gnat_locate_exec anyway, so we can -- 2.43.0
[COMMITTED 07/30] ada: Follow-on to Use inheritance in Gen_IL
From: Bob Duff Fix too-long line. gcc/ada/ChangeLog: * gen_il-gen.adb: Fix too-long line. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gen_il-gen.adb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/ada/gen_il-gen.adb b/gcc/ada/gen_il-gen.adb index 5064e8c6eb7..c200abc0223 100644 --- a/gcc/ada/gen_il-gen.adb +++ b/gcc/ada/gen_il-gen.adb @@ -983,7 +983,8 @@ package body Gen_IL.Gen is end loop; Iterate_Types (Node_Kind, Pre => Check_Potential_Inheritance'Access); - Iterate_Types (Entity_Kind, Pre => Check_Potential_Inheritance'Access); + Iterate_Types + (Entity_Kind, Pre => Check_Potential_Inheritance'Access); if Could_Be_Inherited_Error then raise Illegal with "some fields could be inherited"; -- 2.43.0
[COMMITTED 01/30] ada: Remove workaround for RM_Size being unable to represent unknown size
From: Piotr Trojanek When validating instances of Ada.Unchecked_Conversion, we explicitly detected null arrays, because a size of 0 was used to indicate both an unknown size and an actual size of 0. This limitation has been lifted, so we can remove detection of null arrays. Code cleanup; behavior is unaffected. gcc/ada/ChangeLog: * sem_ch13.adb (Validate_Unchecked_Conversions): Remove detection of null arrays; remove tests for sizes being present, which are redundant after calling Known_Static_RM_Size. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch13.adb | 42 +++--- 1 file changed, 3 insertions(+), 39 deletions(-) diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb index f425048222e..fe6429fea97 100644 --- a/gcc/ada/sem_ch13.adb +++ b/gcc/ada/sem_ch13.adb @@ -18870,39 +18870,6 @@ package body Sem_Ch13 is procedure Validate_Unchecked_Conversions is - function Is_Null_Array (T : Entity_Id) return Boolean; - -- We want to warn in the case of converting to a wrong-sized array of - -- bytes, including the zero-size case. This returns True in that case, - -- which is necessary because a size of 0 is used to indicate both an - -- unknown size and a size of 0. It's OK for this to return True in - -- other zero-size cases, but we don't go out of our way; for example, - -- we don't bother with multidimensional arrays. - - function Is_Null_Array (T : Entity_Id) return Boolean is - begin - if Is_Array_Type (T) and then Is_Constrained (T) then -declare - Index : constant Node_Id := First_Index (T); - R : Node_Id; -- N_Range -begin - case Nkind (Index) is - when N_Range => - R := Index; - when N_Subtype_Indication => - R := Range_Expression (Constraint (Index)); - when N_Identifier | N_Expanded_Name => - R := Scalar_Range (Entity (Index)); - when others => - raise Program_Error; - end case; - - return Is_Null_Range (Low_Bound (R), High_Bound (R)); -end; - end if; - - return False; - end Is_Null_Array; - begin for N in Unchecked_Conversions.First .. Unchecked_Conversions.Last loop declare @@ -18933,9 +18900,8 @@ package body Sem_Ch13 is goto Continue; end if; -if (Known_Static_RM_Size (Source) - and then Known_Static_RM_Size (Target)) - or else Is_Null_Array (Target) +if Known_Static_RM_Size (Source) + and then Known_Static_RM_Size (Target) then -- This validation check, which warns if we have unequal sizes -- for unchecked conversion, and thus implementation dependent @@ -18946,9 +18912,7 @@ package body Sem_Ch13 is Source_Siz := RM_Size (Source); Target_Siz := RM_Size (Target); - if Present (Source_Siz) and then Present (Target_Siz) - and then Source_Siz /= Target_Siz - then + if Source_Siz /= Target_Siz then Error_Msg ("?z?types for unchecked conversion have different sizes!", Eloc, Act_Unit); -- 2.43.0
[COMMITTED 02/30] ada: Fix markup in user's guide
From: Ronan Desplanques gcc/ada/ChangeLog: * doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix markup. * gnat_ugn.texi: Regenerate. Tested on x86_64-pc-linux-gnu, committed on master. --- .../doc/gnat_ugn/building_executable_programs_with_gnat.rst | 2 +- gcc/ada/gnat_ugn.texi | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst index 48e47eb8336..4f46fba93ae 100644 --- a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst +++ b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst @@ -6747,7 +6747,7 @@ be presented in subsequent sections. .. index:: -o (gnatbind) :switch:`-o {file}` - Name the output file ``file`` (default is :file:`b~`xxx`.adb`). + Name the output file ``file`` (default is :file:`b~{xxx}.adb`). Note that if this option is used, then linking must be done manually, gnatlink cannot be used. diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi index 544d73e7f36..252b32f74ae 100644 --- a/gcc/ada/gnat_ugn.texi +++ b/gcc/ada/gnat_ugn.texi @@ -19,7 +19,7 @@ @copying @quotation -GNAT User's Guide for Native Platforms , Dec 12, 2024 +GNAT User's Guide for Native Platforms , Jan 03, 2025 AdaCore @@ -16077,7 +16077,7 @@ equivalent @code{gnatmake} flag (@ref{d1,,Switches for gnatmake}). @item @code{-o `file'} -Name the output file @code{file} (default is @code{b~`xxx}.adb`). +Name the output file @code{file} (default is @code{b~`xxx'.adb}). Note that if this option is used, then linking must be done manually, gnatlink cannot be used. -- 2.43.0
[PATCH v3 0/6] c++: Add some missing LAMBDA_EXPR_EXTRA_SCOPEs
This patch series fixes some ABI issues in lambdas, with a side effect of fixing some issues with module streaming. This doesn't completely fix the ABI for lambdas (in particular, namespace scope aliases are still broken) but it at least improves the situation. Successfully bootstrapped and regtested on x86_64-pc-linux-gnu. Nathaniel Shead (6): c++: Fix mangling of lambdas in static data member initializers [PR107741] c++: Fix mangling of otherwise unattached class-scope lambdas [PR118245] c++: Fix ABI for lambdas declared in alias templates [PR116568] c++: Update mangling of lambdas in expressions c++/modules: Add testcase for fixed ICE [PR116568] c++/modules: Diagnose TU-local lambdas, give mangling scope to lambdas in concepts gcc/c-family/c-opts.cc| 2 +- gcc/common.opt| 5 +- gcc/cp/cp-tree.h | 9 +- gcc/cp/decl2.cc | 77 gcc/cp/lambda.cc | 33 +- gcc/cp/mangle.cc | 15 ++- gcc/cp/module.cc | 7 +- gcc/cp/parser.cc | 111 -- gcc/cp/pt.cc | 50 +--- gcc/doc/invoke.texi | 3 + gcc/testsuite/g++.dg/abi/lambda-ctx2-19.C | 10 ++ gcc/testsuite/g++.dg/abi/lambda-ctx2-19vs20.C | 8 ++ gcc/testsuite/g++.dg/abi/lambda-ctx2-20.C | 10 ++ gcc/testsuite/g++.dg/abi/lambda-ctx2.h| 27 + gcc/testsuite/g++.dg/abi/lambda-ctx3.C| 21 gcc/testsuite/g++.dg/abi/lambda-ctx4.C| 29 + gcc/testsuite/g++.dg/abi/macro0.C | 2 +- gcc/testsuite/g++.dg/abi/mangle74.C | 4 +- .../g++.dg/cpp0x/static-member-init-1.C | 5 + .../g++.dg/cpp2a/lambda-generic-mangle1.C | 2 +- .../g++.dg/cpp2a/lambda-generic-mangle1a.C| 2 +- gcc/testsuite/g++.dg/cpp2a/lambda-uneval20.C | 7 ++ gcc/testsuite/g++.dg/modules/internal-4_b.C | 6 +- gcc/testsuite/g++.dg/modules/lambda-8.h | 7 ++ gcc/testsuite/g++.dg/modules/lambda-8_a.H | 5 + gcc/testsuite/g++.dg/modules/lambda-8_b.C | 5 + gcc/testsuite/g++.dg/modules/lambda-9.h | 2 + gcc/testsuite/g++.dg/modules/lambda-9_a.H | 4 + gcc/testsuite/g++.dg/modules/lambda-9_b.C | 6 + gcc/testsuite/g++.dg/modules/late-ret-3_a.H | 2 +- gcc/testsuite/g++.dg/other/fold1.C| 2 +- 31 files changed, 402 insertions(+), 76 deletions(-) create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2-19.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2-19vs20.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2-20.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2.h create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx3.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx4.C create mode 100644 gcc/testsuite/g++.dg/cpp0x/static-member-init-1.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval20.C create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8.h create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8_a.H create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8_b.C create mode 100644 gcc/testsuite/g++.dg/modules/lambda-9.h create mode 100644 gcc/testsuite/g++.dg/modules/lambda-9_a.H create mode 100644 gcc/testsuite/g++.dg/modules/lambda-9_b.C -- 2.47.0
[PATCH v3 1/6] c++: Fix mangling of lambdas in static data member initializers [PR107741]
This fixes an issue where lambdas declared in the initializer of a static data member within the class body do not get a mangling scope of that variable; this results in mangled names that do not conform to the ABI spec. To do this, the patch splits up grokfield for this case specifically, allowing a declaration to be build and used in start_lambda_scope before parsing the initializer, so that record_lambda_scope works correctly. As a drive-by, this also fixes the issue of a static member not being visible within its own initializer. PR c++/107741 gcc/c-family/ChangeLog: * c-opts.cc (c_common_post_options): Bump ABI version. gcc/ChangeLog: * common.opt: Add -fabi-version=20. * doc/invoke.texi: Likewise. gcc/cp/ChangeLog: * cp-tree.h (start_initialized_static_member): Declare. (finish_initialized_static_member): Declare. * decl2.cc (start_initialized_static_member): New function. (finish_initialized_static_member): New function. * lambda.cc (record_lambda_scope): Support falling back to old ABI (maybe with warning). * parser.cc (cp_parser_member_declaration): Build decl early when parsing an initialized static data member. gcc/testsuite/ChangeLog: * g++.dg/abi/macro0.C: Bump ABI version. * g++.dg/abi/mangle74.C: Remove XFAILs. * g++.dg/other/fold1.C: Restore originally raised error. * g++.dg/abi/lambda-ctx2-19.C: New test. * g++.dg/abi/lambda-ctx2-19vs20.C: New test. * g++.dg/abi/lambda-ctx2-20.C: New test. * g++.dg/abi/lambda-ctx2.h: New test. * g++.dg/cpp0x/static-member-init-1.C: New test. Signed-off-by: Nathaniel Shead --- gcc/c-family/c-opts.cc| 2 +- gcc/common.opt| 5 +- gcc/cp/cp-tree.h | 3 + gcc/cp/decl2.cc | 77 +++ gcc/cp/lambda.cc | 33 ++-- gcc/cp/parser.cc | 74 -- gcc/doc/invoke.texi | 3 + gcc/testsuite/g++.dg/abi/lambda-ctx2-19.C | 10 +++ gcc/testsuite/g++.dg/abi/lambda-ctx2-19vs20.C | 8 ++ gcc/testsuite/g++.dg/abi/lambda-ctx2-20.C | 10 +++ gcc/testsuite/g++.dg/abi/lambda-ctx2.h| 27 +++ gcc/testsuite/g++.dg/abi/macro0.C | 2 +- gcc/testsuite/g++.dg/abi/mangle74.C | 4 +- .../g++.dg/cpp0x/static-member-init-1.C | 5 ++ gcc/testsuite/g++.dg/other/fold1.C| 2 +- 15 files changed, 230 insertions(+), 35 deletions(-) create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2-19.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2-19vs20.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2-20.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2.h create mode 100644 gcc/testsuite/g++.dg/cpp0x/static-member-init-1.C diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index b81d1350b1a..87b231861a6 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -1084,7 +1084,7 @@ c_common_post_options (const char **pfilename) /* Change flag_abi_version to be the actual current ABI level, for the benefit of c_cpp_builtins, and to make comparison simpler. */ - const int latest_abi_version = 19; + const int latest_abi_version = 20; /* Generate compatibility aliases for ABI v13 (8.2) by default. */ const int abi_compat_default = 13; diff --git a/gcc/common.opt b/gcc/common.opt index e2ac99df1d0..4c2560a0632 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1034,10 +1034,13 @@ Driver Undocumented ; Default in G++ 13. ; ; 19: Emits ABI tags if needed in structured binding mangled names. -; Ignores cv-quals on [[no_unique_object]] members. +; Ignores cv-quals on [[no_unique_address]] members. ; Mangles constraints on function templates. ; Default in G++ 14. ; +; 20: Fix mangling of lambdas in static data member initializers. +; Default in G++ 15. +; ; Additional positive integers will be assigned as new versions of ; the ABI become the default version of the ABI. fabi-version= diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 1d741ecedc3..f77a325bdd0 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7224,6 +7224,9 @@ extern tree grokfield (const cp_declarator *, cp_decl_specifier_seq *, tree, bool, tree, tree); extern tree grokbitfield (const cp_declarator *, cp_decl_specifier_seq *, tree, tree, tree); +extern tree start_initialized_static_member(const cp_declarator *, +cp_decl_specifier_seq *, tree); +extern void finish_initialized_static_member (tree, tree, tree); extern tree splice_template_attributes (tree *, tree); extern bool any_dependent_type_attributes_p(tree); extern tree cp_reconstruct_complex_type
[PATCH v3 2/6] c++: Fix mangling of otherwise unattached class-scope lambdas [PR118245]
Something like this should probably be backported to GCC 14 too, since my change in r14-9232-g3685fae23bb008 inadvertantly caused ICEs that this fixes. But without the previous patch this patch will cause ABI changes, and I'm not sure how easily it would be to divorce those changes from the fix here. I suppose probably the true issue is that r14-9232 inadvertantly changed ABI for lambdas in base classes, and we should just revert the parser.cc changes for 14.3? (And accept that it'll regress those modules tests.) -- >8 -- This is a step closer to implementing the suggested changes for https://github.com/itanium-cxx-abi/cxx-abi/pull/85. Most lambdas defined within a class should have an extra scope of that class so that uses across different TUs are properly merged by the linker. This also needs to happen during template instantiation. While I was working on this I found some other cases where the mangling of lambdas was incorrect and causing issues, notably the testcase lambda-ctx3.C which currently emits the same mangling for the base class and member lambdas, causing mysterious assembler errors since r14-9232. This is also the root cause of PR c++/118245. One notable case not handled either here or in the ABI is what is supposed to happen with lambdas declared in alias templates; see lambda-ctx4.C. I believe that by the C++ standard, such lambdas should also dedup across TUs, but this isn't currently implemented (for class-scope or not). I wasn't able to work out how to fix the mangling logic for this case easily so I've just excluded alias templates from the class-scope mangling rules in template instantiation. Since this should only affect usage of lambdas in unevaluated contexts (a C++20 feature) this patch does not add an ABI flag to control this behaviour. PR c++/118245 gcc/cp/ChangeLog: * cp-tree.h (LAMBDA_EXPR_EXTRA_SCOPE): Adjust comment. * parser.cc (cp_parser_class_head): Start (and do not finish) lambda scope for all valid types. (cp_parser_class_specifier): Finish lambda scope after parsing members instead. (cp_parser_member_declaration): Adjust comment to mention missing lambda scoping for static member initializers. * pt.cc (instantiate_class_template): Add lambda scoping. (instantiate_template): Likewise. gcc/testsuite/ChangeLog: * g++.dg/abi/lambda-ctx3.C: New test. * g++.dg/abi/lambda-ctx4.C: New test. * g++.dg/cpp2a/lambda-uneval20.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/cp-tree.h | 3 ++- gcc/cp/parser.cc | 23 gcc/cp/pt.cc | 14 +++- gcc/testsuite/g++.dg/abi/lambda-ctx3.C | 21 ++ gcc/testsuite/g++.dg/abi/lambda-ctx4.C | 22 +++ gcc/testsuite/g++.dg/cpp2a/lambda-uneval20.C | 7 ++ 6 files changed, 79 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx3.C create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx4.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval20.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index f77a325bdd0..824b7bb61ae 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -1562,7 +1562,8 @@ enum cp_lambda_default_capture_mode_type { (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->locus) /* The mangling scope for the lambda: FUNCTION_DECL, PARM_DECL, VAR_DECL, - FIELD_DECL or NULL_TREE. If this is NULL_TREE, we have no linkage. */ + FIELD_DECL, TYPE_DECL, or NULL_TREE. If this is NULL_TREE, we have no + linkage. */ #define LAMBDA_EXPR_EXTRA_SCOPE(NODE) \ (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->extra_scope) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index dd98649..c851185a86d 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -27313,6 +27313,8 @@ cp_parser_class_specifier (cp_parser* parser) if (!braces.require_open (parser)) { pop_deferring_access_checks (); + if (type != error_mark_node) + finish_lambda_scope (); return error_mark_node; } @@ -27377,7 +27379,10 @@ cp_parser_class_specifier (cp_parser* parser) if (cp_parser_allow_gnu_extensions_p (parser)) attributes = cp_parser_gnu_attributes_opt (parser); if (type != error_mark_node) -type = finish_struct (type, attributes); +{ + type = finish_struct (type, attributes); + finish_lambda_scope (); +} if (nested_name_specifier_p) pop_inner_scope (old_scope, scope); @@ -28217,6 +28222,12 @@ cp_parser_class_head (cp_parser* parser, if (flag_concepts) type = associate_classtype_constraints (type); + /* Lambdas in bases and members must have the same mangling scope for ABI. + We open this scope now, and will close it in cp_parser_class_specifier + after parsing the member list. */ + if (type && type != erro
[PATCH v3 3/6] c++: Fix ABI for lambdas declared in alias templates [PR116568]
I'm not 100% sure I've handled this properly, any feedback welcome. In particular, maybe should I use `DECL_IMPLICIT_TYPEDEF_P` in the mangling logic instead of `!TYPE_DECL_ALIAS_P`? They both seem to work in this case but not sure which would be clearer. I also looked into trying do a limited form of 'start_decl' before parsing the type but there were too many circular dependencies for me to work through, so I think any such changes would have to wait till GCC16 (if they're even possible at all). -- >8 -- This adds mangling support for lambdas with a mangling context of an alias template, and gives that context when instantiating such a lambda. This only currently works for class-scope alias templates, however, due to the if (LAMBDA_EXPR_EXTRA_SCOPE (t)) record_lambda_scope (r); condition in 'tsubst_lambda_scope'. For namespace-scope alias templates, we can't easily add the mangling context: we can't build the TYPE_DECL to record against until after we've parsed the type (and already recorded lambda scope), as `start_decl` relies on the type being passed in correctly, and setting the mangling scope after parsing is too late because e.g. 'template_class_depth' (called from grokfndecl when building the lambda functions while parsing the type) relies on the LAMBDA_EXPR_EXTRA_SCOPE already being properly set. This will also likely matter for 'determine_visibility'. I'm not sure what a good way to break this recursive dependency is. This change also requires a slight adjustment to the late-ret-3 testcase, as changing the order of creating the node for the alias adjusted the tiebreak sorting of cluster members when logging. PR c++/116568 gcc/cp/ChangeLog: * mangle.cc (maybe_template_info): Support getting template info of alias templates. (canonicalize_for_substitution): Don't canonicalise aliases. (decl_mangling_context): Don't treat aliases as lambda closure types. (write_unqualified_name): Likewise. * pt.cc (tsubst_decl): Start lambda scope for alias templates. (instantiate_template): No longer need to special case alias templates here. gcc/testsuite/ChangeLog: * g++.dg/abi/lambda-ctx4.C: Adjust mangling, include namespace scope alias templates (XFAILed for now). * g++.dg/modules/late-ret-3_a.H: Adjust cluster order. Signed-off-by: Nathaniel Shead --- gcc/cp/mangle.cc| 13 +++- gcc/cp/pt.cc| 23 + gcc/testsuite/g++.dg/abi/lambda-ctx4.C | 21 --- gcc/testsuite/g++.dg/modules/late-ret-3_a.H | 2 +- 4 files changed, 33 insertions(+), 26 deletions(-) diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc index 170dafd52c1..9d457e2a2f3 100644 --- a/gcc/cp/mangle.cc +++ b/gcc/cp/mangle.cc @@ -292,7 +292,7 @@ abi_check (int ver) static tree maybe_template_info (const tree decl) { - if (TREE_CODE (decl) == TYPE_DECL) + if (TREE_CODE (decl) == TYPE_DECL && !TYPE_DECL_ALIAS_P (decl)) { /* TYPE_DECLs are handled specially. Look at its type to decide if this is a template instantiation. */ @@ -305,7 +305,7 @@ maybe_template_info (const tree decl) { /* Check if the template is a primary template. */ if (DECL_LANG_SPECIFIC (decl) != NULL - && VAR_OR_FUNCTION_DECL_P (decl) + && (VAR_OR_FUNCTION_DECL_P (decl) || TREE_CODE (decl) == TYPE_DECL) && DECL_TEMPLATE_INFO (decl) && PRIMARY_TEMPLATE_P (DECL_TI_TEMPLATE (decl))) return DECL_TEMPLATE_INFO (decl); @@ -402,8 +402,8 @@ write_exception_spec (tree spec) static inline tree canonicalize_for_substitution (tree node) { - /* For a TYPE_DECL, use the type instead. */ - if (TREE_CODE (node) == TYPE_DECL) + /* For a non-alias TYPE_DECL, use the type instead. */ + if (TREE_CODE (node) == TYPE_DECL && !TYPE_DECL_ALIAS_P (node)) node = TREE_TYPE (node); if (TYPE_P (node) && TYPE_CANONICAL (node) != node @@ -1044,6 +1044,7 @@ decl_mangling_context (tree decl) decl = DECL_TEMPLATE_RESULT (decl); if (TREE_CODE (decl) == TYPE_DECL + && !TYPE_DECL_ALIAS_P (decl) && LAMBDA_TYPE_P (TREE_TYPE (decl))) { tree extra = LAMBDA_TYPE_EXTRA_SCOPE (TREE_TYPE (decl)); @@ -1588,7 +1589,9 @@ write_unqualified_name (tree decl) if (TREE_CODE (decl) == TYPE_DECL && TYPE_UNNAMED_P (type)) write_unnamed_type_name (type); - else if (TREE_CODE (decl) == TYPE_DECL && LAMBDA_TYPE_P (type)) + else if (TREE_CODE (decl) == TYPE_DECL + && !TYPE_DECL_ALIAS_P (decl) + && LAMBDA_TYPE_P (type)) write_closure_type_name (type); else write_source_name (DECL_NAME (decl)); diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index cb234b53a55..cba7b97ef70 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -15753,6 +15753,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t com
[PATCH v3 4/6] c++: Update mangling of lambdas in expressions
https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that mangling a lambda expression should use 'L' rather than "tl". This only affects C++20 (and later) so no ABI flag is given. gcc/cp/ChangeLog: * mangle.cc (write_expression): Update mangling for lambdas. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling. * g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise. Signed-off-by: Nathaniel Shead --- gcc/cp/mangle.cc | 2 +- gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C | 2 +- gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc index 9d457e2a2f3..e2e8fb2c71b 100644 --- a/gcc/cp/mangle.cc +++ b/gcc/cp/mangle.cc @@ -3769,7 +3769,7 @@ write_expression (tree expr) equivalent. So just use the closure type mangling. */ - write_string ("tl"); + write_char ('L'); write_type (LAMBDA_EXPR_CLOSURE (expr)); write_char ('E'); } diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C index 0051307f53d..306959a4f9f 100644 --- a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1.C @@ -6,4 +6,4 @@ struct C { void f(decltype([](T, auto) { return 0; })) {} }; void g() { C().f({}); } -// { dg-final { scan-assembler "_ZN1C1fIiEEvDTtlNS_UlT_TL0__E_EEE" } } +// { dg-final { scan-assembler "_ZN1C1fIiEEvDTLNS_UlT_TL0__E_EEE" } } diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C index dc7b0125631..b7bd4ecdd46 100644 --- a/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-generic-mangle1a.C @@ -7,4 +7,4 @@ struct C { void f(decltype([](T, auto) { return 0; })) {} }; void g() { C().f({}); } -// { dg-final { scan-assembler "_ZN1C1fIiEEvDTtlNS_UlT_T_E_EEE" } } +// { dg-final { scan-assembler "_ZN1C1fIiEEvDTLNS_UlT_T_E_EEE" } } -- 2.47.0
[PATCH v3 6/6] c++/modules: Diagnose TU-local lambdas, give mangling scope to lambdas in concepts
Happy to defer this till GCC16 if preferred. -- >8 -- This fills in a hole left in r15-6378-g9016c5ac94c557 with regards to detection of TU-local lambdas. Now that LAMBDA_EXPR_EXTRA_SCOPE is properly set for most lambdas we can use it to detect lambdas that are TU-local. Lambdas in concept definitions I believe should not be considered TU-local, since they are always unevaluated and should never be emitted. This patch gives these lambdas a mangling scope (though it will never be actually used in name mangling). Namespace-scope alias declarations currently do not have an extra scope set either, so this patch will cause them to be falsely detected as TU-local and error. However, I believe this is better than the alternative, as such lambdas cannot be properly referred to from other TUs now anyway causing strange issues down the line if they were allowed. (Clang makes such lambdas internal linkage.) If in the future we ever correctly give an extra scope to these lambdas they will automatically start working in modules again, but in the meantime they should be avoided (or wrapped in an unnamed namespace). gcc/cp/ChangeLog: * cp-tree.h (finish_concept_definition): Adjust parameters. (start_concept_definition): Declare. * module.cc (depset::hash::is_tu_local_entity): Use LAMBDA_EXPR_EXTRA_SCOPE to detect TU-local lambdas. * parser.cc (cp_parser_concept_definition): Start a lambda scope for concept definitions. * pt.cc (tsubst_lambda_expr): Namespace-scope lambdas may now have extra scope. (finish_concept_definition): Split into... (start_concept_definition): ...this new function. gcc/testsuite/ChangeLog: * g++.dg/modules/internal-4_b.C: Remove XFAILs, add new XFAIL for lambda alias. * g++.dg/modules/lambda-9.h: New test. * g++.dg/modules/lambda-9_a.H: New test. * g++.dg/modules/lambda-9_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/cp-tree.h| 3 ++- gcc/cp/module.cc| 7 --- gcc/cp/parser.cc| 14 +- gcc/cp/pt.cc| 21 +++-- gcc/testsuite/g++.dg/modules/internal-4_b.C | 6 +- gcc/testsuite/g++.dg/modules/lambda-9.h | 2 ++ gcc/testsuite/g++.dg/modules/lambda-9_a.H | 4 gcc/testsuite/g++.dg/modules/lambda-9_b.C | 6 ++ 8 files changed, 51 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/lambda-9.h create mode 100644 gcc/testsuite/g++.dg/modules/lambda-9_a.H create mode 100644 gcc/testsuite/g++.dg/modules/lambda-9_b.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 824b7bb61ae..ec8935243de 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -8679,7 +8679,8 @@ struct diagnosing_failed_constraint extern cp_expr finish_constraint_or_expr (location_t, cp_expr, cp_expr); extern cp_expr finish_constraint_and_expr (location_t, cp_expr, cp_expr); extern cp_expr finish_constraint_primary_expr (cp_expr); -extern tree finish_concept_definition (cp_expr, tree, tree); +extern tree start_concept_definition (cp_expr); +extern tree finish_concept_definition (tree, tree, tree); extern tree combine_constraint_expressions (tree, tree); extern tree append_constraint (tree, tree); extern tree get_constraints (const_tree); diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 5350e6c4bad..aa0fee1c6cf 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -13365,9 +13365,10 @@ depset::hash::is_tu_local_entity (tree decl, bool explain/*=false*/) tree main_decl = TYPE_MAIN_DECL (type); if (!DECL_CLASS_SCOPE_P (main_decl) && !decl_function_context (main_decl) - /* FIXME: Lambdas defined outside initializers. We'll need to more -thoroughly set LAMBDA_TYPE_EXTRA_SCOPE to check this. */ - && !LAMBDA_TYPE_P (type)) + /* LAMBDA_EXPR_EXTRA_SCOPE will be set for lambdas defined in +contexts where they would not be TU-local. */ + && !(LAMBDA_TYPE_P (type) + && LAMBDA_TYPE_EXTRA_SCOPE (type))) { if (explain) inform (loc, "%qT has no name and is not defined within a class, " diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index c851185a86d..f29d2fd0b00 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -31680,8 +31680,20 @@ cp_parser_concept_definition (cp_parser *parser) return error_mark_node; } + tree decl = start_concept_definition (id); + if (decl == error_mark_node) +{ + cp_parser_skip_to_end_of_statement (parser); + cp_parser_consume_semicolon_at_end_of_statement (parser); + return error_mark_node; +} + processing_constraint_expression_sentinel parsing_constraint; + + start_lambda_scope (decl); tree init = cp_parser_constr
Re: [PATCH v2] Replace uptr by usize/SIZE_T in interfaces
On Mon, Jan 06, 2025 at 11:59:08AM +0100, Stefan Schulze Frielinghaus wrote: > For some targets uptr is mapped to unsigned int and size_t to unsigned > long and sizeof(int)==sizeof(long) holds. Still, these are distinct > types and type checking may fail. Therefore, replace uptr by > usize/SIZE_T wherever a size_t is expected. > > Part of #116957 > > Cherry picked from LLVM commit 9a156f6b2b0c892d8713ba907f07f027b24953d8 > (removed memprof, msan, and nsan parts). > > libsanitizer/ChangeLog: > > PR sanitizer/117725 > * asan/asan_interceptors.cpp: Cherry picked LLVM commit > 9a156f6b2b0c892d8713ba907f07f027b24953d8. > * asan/asan_interceptors.h: Ditto. > * asan/asan_interceptors_memintrinsics.h: Ditto. > * asan/sanitizer_common_interceptors.inc: Ditto. > * sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc: > Ditto. > * sanitizer_common/sanitizer_platform_limits_posix.h: > Ditto. > * tsan/tsan_interceptors_posix.cpp: Ditto. LGTM, thanks. Jakub
libgo patch committed: fix Config.Time in tests with expired certificates
PR 118286 points out that some libgo tests are starting to fail because they use test certificates that expired on January 1. This libgo patch is a backport of https://go.dev/cl/640237 in the main repo. It uses the existing config.Time field to avoid these test failures. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian ed1493e12ed75e837e9b9aa794ed24daf397df7c diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index d189a9c6df0..7c93a2e9123 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -f4956f807f1a33e406cf1b3bf3479a9ac1c1015a +96014b17d9a846d1d878ac4732c2baaf5ee8b2d2 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/libgo/go/crypto/tls/handshake_client_test.go b/libgo/go/crypto/tls/handshake_client_test.go index 0950bb0ac45..40409a42c5b 100644 --- a/libgo/go/crypto/tls/handshake_client_test.go +++ b/libgo/go/crypto/tls/handshake_client_test.go @@ -881,6 +881,7 @@ func testResumption(t *testing.T, version uint16) { MaxVersion: version, CipherSuites: []uint16{TLS_RSA_WITH_RC4_128_SHA, TLS_ECDHE_RSA_WITH_RC4_128_SHA}, Certificates: testConfig.Certificates, + Time: testTime, } issuer, err := x509.ParseCertificate(testRSACertificateIssuer) @@ -897,6 +898,7 @@ func testResumption(t *testing.T, version uint16) { ClientSessionCache: NewLRUClientSessionCache(32), RootCAs:rootCAs, ServerName: "example.golang", + Time: testTime, } testResumeState := func(test string, didResume bool) { @@ -944,20 +946,20 @@ func testResumption(t *testing.T, version uint16) { } // An old session ticket can resume, but the server will provide a ticket encrypted with a fresh key. - serverConfig.Time = func() time.Time { return time.Now().Add(24*time.Hour + time.Minute) } + serverConfig.Time = func() time.Time { return testTime().Add(24*time.Hour + time.Minute) } testResumeState("ResumeWithOldTicket", true) if bytes.Equal(ticket[:ticketKeyNameLen], getTicket()[:ticketKeyNameLen]) { t.Fatal("old first ticket matches the fresh one") } // Now the session tickey key is expired, so a full handshake should occur. - serverConfig.Time = func() time.Time { return time.Now().Add(24*8*time.Hour + time.Minute) } + serverConfig.Time = func() time.Time { return testTime().Add(24*8*time.Hour + time.Minute) } testResumeState("ResumeWithExpiredTicket", false) if bytes.Equal(ticket, getTicket()) { t.Fatal("expired first ticket matches the fresh one") } - serverConfig.Time = func() time.Time { return time.Now() } // reset the time back + serverConfig.Time = func() time.Time { return testTime() } // reset the time back key1 := randomKey() serverConfig.SetSessionTicketKeys([][32]byte{key1}) @@ -974,11 +976,11 @@ func testResumption(t *testing.T, version uint16) { testResumeState("KeyChangeFinish", true) // Age the session ticket a bit, but not yet expired. - serverConfig.Time = func() time.Time { return time.Now().Add(24*time.Hour + time.Minute) } + serverConfig.Time = func() time.Time { return testTime().Add(24*time.Hour + time.Minute) } testResumeState("OldSessionTicket", true) ticket = getTicket() // Expire the session ticket, which would force a full handshake. - serverConfig.Time = func() time.Time { return time.Now().Add(24*8*time.Hour + time.Minute) } + serverConfig.Time = func() time.Time { return testTime().Add(24*8*time.Hour + 2*time.Minute) } testResumeState("ExpiredSessionTicket", false) if bytes.Equal(ticket, getTicket()) { t.Fatal("new ticket wasn't provided after old ticket expired") @@ -988,7 +990,7 @@ func testResumption(t *testing.T, version uint16) { d := 0 * time.Hour for i := 0; i < 13; i++ { d += 12 * time.Hour - serverConfig.Time = func() time.Time { return time.Now().Add(d) } + serverConfig.Time = func() time.Time { return testTime().Add(d) } testResumeState("OldSessionTicket", true) } // Expire it (now a little more than 7 days) and make sure a full @@ -996,7 +998,7 @@ func testResumption(t *testing.T, version uint16) { // TLS 1.3 since the client should be using a fresh ticket sent over // by the server. d += 12 * time.Hour - serverConfig.Time = func() time.Time { return time.Now().Add(d) } + serverConfig.Time = func() time.Time { return testTime().Add(d) } if version == VersionTLS13 { testResumeState("ExpiredSessionTicket", true) } else
Re: [PATCH v2 3/7] Alpha: Fix a block move pessimisation with zero-extension after LDWU
On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: For the BWX case we have a pessimisation in `alpha_expand_block_move' for HImode loads where we place the data loaded into a HImode register as well, therefore losing information that indeed the data loaded has already been zero-extended to the full DImode width of the register. Later on when we store this data in QImode quantities into an unaligned destination, we zero-extend it again for the purpose of right-shifting, such as with the test case included producing code at `-O2' as follows: ldah $2,unaligned_src_hi($29) !gprelhigh lda $1,unaligned_src_hi($2) !gprellow ldwu $6,unaligned_src_hi($2)!gprellow ldwu $5,2($1) ldwu $4,4($1) bis $31,$31,$31 zapnot $6,3,$3 # Redundant! ldbu $7,6($1) zapnot $5,3,$2 # Redundant! stb $6,0($16) zapnot $4,3,$1 # Redundant! stb $5,2($16) srl $3,8,$3 stb $4,4($16) srl $2,8,$2 stb $3,1($16) srl $1,8,$1 stb $2,3($16) stb $1,5($16) stb $7,6($16) The non-BWX case is unaffected, because there we use byte insertion, so we don't care that data is held in a HImode register. Address this by making the holding RTX a HImode subreg of the original DImode register, which the RTL passes can then see through and eliminate the zero-extension where otherwise required, resulting in this shortened code: ldah $2,unaligned_src_hi($29) !gprelhigh lda $1,unaligned_src_hi($2) !gprellow ldwu $4,unaligned_src_hi($2)!gprellow ldwu $3,2($1) ldwu $2,4($1) bis $31,$31,$31 srl $4,8,$6 ldbu $1,6($1) srl $3,8,$5 stb $4,0($16) stb $6,1($16) srl $2,8,$4 stb $3,2($16) stb $5,3($16) stb $2,4($16) stb $4,5($16) stb $1,6($16) While at it reformat the enclosing do-while statement according to the GNU Coding Standards, observing that in this case it does not obfuscate the change owing to the odd original indentation. gcc/ * config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode subreg of a DImode register to hold data from an aligned HImode load. Generally OK. + data_regs[nregs++] = gen_rtx_SUBREG (HImode, tmp, 0); Would this be better as a gen_lowpart or something like that rather than generating the SUBREG directly? Though I guess you don't have to worry about big endian at all, so i may not matter in practice. Your call. jeff
Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
Jennifer Schmitz writes: >> It would also be good to check for performance regressions, now that we have >> a patch to test: >> I will run SPEC2017 with -mcpu=generic and -mcpu=native on Grace, but we >> would appreciate help with benchmarking on other platforms. >> Tamar, would you still be willing to test the patch on other platforms? >> >> If there are no other changes necessary and assuming there are no >> performance regressions, I was planning to commit the patch in January after >> returning from christmas break. >> >> In the meantime I wish everyone happy holidays. >> Jennifer > On Grace, the patch has no non-noise impact on performance for SPEC2017 with > -mcpu=generic and -mcpu=native. I also re-validated on aarch64 today, no > regression. > Do you advise to run additional performance tests or is the patch ready to be > pushed to trunk? Go for it :) The patch is clearly moving in the right direction and it's a question of "when" not "if" we remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS. We'll also get more performance coverage on non-aarch64 targets once the patch is pushed. Thanks again for doing this. Richard
RE: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -Original Message- > From: Richard Sandiford > Sent: Monday, January 6, 2025 5:54 PM > To: Jennifer Schmitz > Cc: Richard Biener ; Richard Biener > ; Tamar Christina ; > gcc-patches@gcc.gnu.org; Kyrylo Tkachov > Subject: Re: [RFC][PATCH] AArch64: Remove > AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > > Jennifer Schmitz writes: > >> It would also be good to check for performance regressions, now that we > >> have a > patch to test: > >> I will run SPEC2017 with -mcpu=generic and -mcpu=native on Grace, but we > would appreciate help with benchmarking on other platforms. > >> Tamar, would you still be willing to test the patch on other platforms? > >> > >> If there are no other changes necessary and assuming there are no > >> performance > regressions, I was planning to commit the patch in January after returning > from > christmas break. > >> > >> In the meantime I wish everyone happy holidays. > >> Jennifer > > On Grace, the patch has no non-noise impact on performance for SPEC2017 with > -mcpu=generic and -mcpu=native. I also re-validated on aarch64 today, no > regression. > > Do you advise to run additional performance tests or is the patch ready to > > be > pushed to trunk? > > Go for it :) The patch is clearly moving in the right direction and it's a > question of "when" not "if" we remove > AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS. > We'll also get more performance coverage on non-aarch64 targets once the > patch is pushed. > > Thanks again for doing this. Same, Sorry I had missed the question from me above. Just go for it and I'll triage any issues after it's been committed. We can address any fallout in stage-4. Thanks for sticking with it. Tamar. > > Richard
Re: [PATCH] c: Restore warning for incomplete structures declared in parameter list [PR117866]
On Mon, 6 Jan 2025, Martin Uecker wrote: > > Happy new year! Please consider the following patch. > > Bootstrapped and regression tested on x86_64. > > > c: Restore warning for incomplete structures declared in parameter list > [PR117866] > > In C23 mode the warning about declaring structures and union in > parameter lists was removed, because it is possible to redeclare > a compatible type elsewhere. This is not the case for incomplete types, > so restore the warning for those types. > > PR c/117886 > > gcc/c/ChangeLog: > * c-decl.cc (get_parm_info): Change condition for warning. > > gcc/testsuite/ChangeLog: > * gcc.dg/pr117866.c: New test. > * gcc.dg/struct-pr118007.c: Adapt. OK. -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH v2 2/7] Alpha: Optimize block moves coming from longword-aligned source
On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: Now that we have proper alignment determination for block moves in place the case of copying a block of longword-aligned data has become real, so implement the merging of loaded data from pairs of SImode registers into single DImode registers for the purpose of using with unaligned stores efficiently, as suggested by a comment in `alpha_expand_block_move' and discard the comment. Provide test cases accordingly. gcc/ * config/alpha/alpha.cc (alpha_expand_block_move): Merge loaded data from pairs of SImode registers into single DImode registers if to be used with unaligned stores. gcc/testsuite/ * gcc.target/alpha/memcpy-si-aligned.c: New file. * gcc.target/alpha/memcpy-si-unaligned.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-si-unaligned-src.c: New file. * gcc.target/alpha/memcpy-si-unaligned-src-bwx.c: New file. OK jeff
Re: [PATCH] testsuite: generalized field-merge tests for <32-bit int [PR118025]
On Dec 22, 2024, Alexandre Oliva wrote: > for gcc/testsuite/ChangeLog > PR testsuite/118025 > * gcc.dg/field-merge-1.c: Convert constants to desired types. > * gcc.dg/field-merge-3.c: Likewise. > * gcc.dg/field-merge-4.c: Likewise. > * gcc.dg/field-merge-5.c: Likewise. > * gcc.dg/field-merge-11.c: Likewise. > * gcc.dg/field-merge-17.c: Don't mess with padding bits. Ping? https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672194.html -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
[PATCH 0/4] vect, cfgloopmanip: Fix profile consistency of early break loops [PR117790]
This patch series aims to fix the consistency of the CFG profile for vectorized early break loops. I.e., if the CFG profile entering the vectorizer is consistent for a given early break loop, this series aims to ensure that the final vectorized loop also has a consistent profile. For SPEC CPU 2017 on Neoverse V1, the series shows a 4.8% improvement in imagick with no significant regressions elsewhere. The series is structured as follows: - 1/4 adds missing code to set the counts of the exit blocks when creating the CFG for a vectorized early break loop. - 2/4 does most of the heavy lifting, adding new code to cfgloopmanip to correctly scale the profile of a loop with multiple exits, adjusting scale_loop_profile to use it and exposing new helpers in cfgloopmanip.h used by subsequent vect patches. - 3/4 makes various fixes to ensure adding an epilog skip edge preserves consistency of the profile. - 4/4 adjusts scale_profile_for_vect_loop to make it correctly handle loops with multiple exits. Many thanks to Andrew Carlotti who gave me significant help in debugging and discussing the ideas used in this patch series. Bootstrapped/regtested on aarch64-linux-gnu, arm-linux-gnueabihf, and x86_64-linux-gnu. Alex Coplan (4): vect: Set counts of early break exit blocks correctly [PR117790] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790] vect: Ensure profile consistency when adding epilog guard [PR117790] vect: Fix scale_profile_for_vect_loop for multiple exits [PR117790] gcc/cfgloopmanip.cc | 309 -- gcc/cfgloopmanip.h| 7 + .../gcc.dg/vect/vect-early-break-profile-1.c | 10 + .../gcc.dg/vect/vect-early-break-profile-2.c | 21 ++ gcc/tree-vect-loop-manip.cc | 58 +++- gcc/tree-vect-loop.cc | 21 +- 6 files changed, 378 insertions(+), 48 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c
[PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]
As it stands, scale_loop_profile doesn't correctly handle loops with multiple exits. In particular, in the case where the expected niters exceeds iteration_bound, scale_loop_profile attempts to reduce the number of iterations with a call to scale_loop_frequencies, which multiplies the count of each BB by a given probability. This transformation preserves the relationships between the counts of the BBs within the loop (and thus the edge probabilities stay the same) but this cannot possibly work for loops with multiple exits, since in order for the expected niters to reduce (and counts along exit edges to remain the same), the exit edge probabilities must increase, thus decreasing the probabilities of the internal edges, meaning that the ratios of the counts of the BBs inside the loop must change. So we need a different approach (not a straightforward multiplicative scaling) to adjust the expected niters of a loop with multiple exits. This patch introduces a new helper, flow_scale_loop_freqs, which can be used to correctly scale the profile of a loop with multiple exits. It is parameterized by a probability (with which to scale the header and therefore the expected niters) and a lambda which gives the desired counts for the exit edges. In this patch, to make things simpler, flow_scale_loop_freqs only handles loop shapes without internal control flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test whether a given loop meets these criteria. This restriction is reasonable since this patch is motivated by fixing the profile consistency for early break vectorization, and we don't currently vectorize loops with internal control flow. We also fall back to a multiplicative scaling (the status quo) for loops that flow_scale_loop_freqs can't handle, so the patch should be a net improvement. We wrap the call to flow_scale_loop_freqs in a helper scale_loop_freqs_with_exit_counts which handles the above-mentioned fallback. This wrapper is still generic in that it accepts a lambda to allow overriding the desired exit edge counts. We specialize this with another wrapper, scale_loop_freqs_hold_exit_counts (keeping the counts along exit edges fixed), which is then used to implement the niters-scaling case of scale_loop_profile, thus fixing this path through the function for loops with multiple exits. Finally, we expose two new wrapper functions in cfgloopmanip.h for use in subsequent vectorizer patches. scale_loop_profile_hold_exit_counts is a variant of scale_loop_profile which assumes we want to keep the counts along exit edges of the loop fixed through both parts of the transformation (including the initial probability scale). scale_loop_freqs_with_new_exit_count is intended to be used in a subsequent patch when adding a skip edge around the epilog, where the reduction of count entering the loop is mirrored by a reduced count along a given exit edge. Bootstrapped/regtested as a series on aarch64-linux-gnu, x86_64-linux-gnu, and arm-linux-gnueabihf. OK for trunk? Thanks, Alex gcc/ChangeLog: PR tree-optimization/117790 * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New. (flow_scale_loop_freqs): New. (scale_loop_freqs_with_exit_counts): New. (scale_loop_freqs_hold_exit_counts): New. (scale_loop_profile): Refactor to use the newly-added scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to correctly handle reducing the expected niters for loops with multiple exits. (scale_loop_freqs_with_new_exit_count): New. (scale_loop_profile_1): New. (scale_loop_profile_hold_exit_counts): New. * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New. (scale_loop_freqs_with_new_exit_count): New. --- gcc/cfgloopmanip.cc | 309 gcc/cfgloopmanip.h | 7 + 2 files changed, 294 insertions(+), 22 deletions(-) diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc index 534e556e1e4..1714f312d42 100644 --- a/gcc/cfgloopmanip.cc +++ b/gcc/cfgloopmanip.cc @@ -677,17 +677,243 @@ update_loop_exit_probability_scale_dom_bbs (class loop *loop, return exit_edge; } -/* Scale profile in LOOP by P. - If ITERATION_BOUND is not -1, scale even further if loop is predicted - to iterate too many times. - Before caling this function, preheader block profile should be already - scaled to final count. This is necessary because loop iterations are - determined by comparing header edge count to latch ege count and thus - they need to be scaled synchronously. */ +/* Check if LOOP is a simple loop for scaling purposes; that is, it + has no internal control flow and at most one exit from each BB. + + Also check if the sum of the new exit counts given by + GET_EXIT_COUNT will differ significantly from the count coming in to + the loop. Return false if so, since flow_scale_loop_freqs relies on + loops having consistent profiles in
[PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]
This adds missing code to correctly set the counts of the exit blocks we create when building the CFG for a vectorized early break loop. Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and x86_64-linux-gnu. OK for trunk? Thanks, Alex gcc/ChangeLog: PR tree-optimization/117790 * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Set profile counts for {main,alt}_loop_exit_block. --- gcc/tree-vect-loop-manip.cc | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 5d1b70aea43..53d36eaa25f 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -1686,6 +1686,16 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit, set_immediate_dominator (CDI_DOMINATORS, new_preheader, loop->header); + + /* Fix up the profile counts of the new exit blocks. + main_loop_exit_block was created by duplicating the + preheader, so needs its count scaling according to the main + exit edge's probability. The remaining count from the + preheader goes to the alt_loop_exit_block, since all + alternative exits have been redirected there. */ + main_loop_exit_block->count = loop_exit->count (); + alt_loop_exit_block->count + = preheader->count - main_loop_exit_block->count; } /* Adjust the epilog loop PHI entry values to continue iteration.
[PATCH 3/4] vect: Ensure profile consistency when adding epilog guard [PR117790]
This patch tries to make the CFG profile consistent when adding a guard edge to skip the epilog during peeling. The changes can be summarized as follows: - We avoid adding the guard edge entirely if the guard condition folds to false, otherwise the profile will become inconsistent since the cfgcleanup code doesn't attempt to update it on removing the dead edge. - If the guard condition instead folds to true, we account for this by giving the skip edge 100% probability (otherwise the profile will again become inconsistent when removing the other now-dead edge). - Finally, we use the new helper scale_loop_freqs_with_new_exit_count instead of scale_loop_profile to update the epilog frequencies / probabiltiies. We make the assumption here that if the IV exit is taken in the vector loop, then it will also be taken in the epilog (and not an early exit). Since we add the guard to the vector iv exit, we know any reduction in count associated with the epilog skip should be accounted for by a reduction in the epilog's iv exit edge count. Bootstrapped/regtested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and x86_64-linux-gnu. OK for trunk? Thanks, Alex gcc/ChangeLog: PR tree-optimization/117790 * tree-vect-loop-manip.cc (vect_do_peeling): Attempt to maintain consistency of the CFG profile when adding an epilog skip edge. gcc/testsuite/ChangeLog: PR tree-optimization/117790 * gcc.dg/vect/vect-early-break-profile-1.c: New test. --- .../gcc.dg/vect/vect-early-break-profile-1.c | 10 gcc/tree-vect-loop-manip.cc | 48 ++- 2 files changed, 47 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c new file mode 100644 index 000..5387e3a0465 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-add-options vect_early_break } */ +/* { dg-additional-options "-fdump-tree-vect-blocks-details" } */ +int a[100]; +void f() +{ + for (int i = 0; i < 100 && a[i]; i++) +a[i]++; +} +/* { dg-final { scan-tree-dump-not "Invalid sum" "vect" } } */ diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 53d36eaa25f..4d472ab56ab 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -3546,18 +3546,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, /* If we have a peeled vector iteration we will never skip the epilog loop and we can simplify the cfg a lot by not doing the edge split. */ - if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, +niters, niters_vector_mult_vf); + if ((skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + && !integer_zerop (guard_cond)) { - guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, -niters, niters_vector_mult_vf); + profile_probability prob_skip + = integer_onep (guard_cond) + ? profile_probability::always () + : prob_epilog.invert (); guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; + edge enter_e = single_succ_edge (guard_bb); edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo); guard_to = epilog_e->dest; guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to, skip_vector ? anchor : guard_bb, - prob_epilog.invert (), - irred_flag); + prob_skip, irred_flag); doms.safe_push (guard_to); if (vect_epilogues) epilogue_vinfo->skip_this_loop_edge = guard_e; @@ -3586,15 +3591,36 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, } } - /* Only need to handle basic block before epilog loop if it's not - the guard_bb, which is the case when skip_vector is true. */ - if (guard_bb != bb_before_epilog) + basic_block epilog_ph = loop_preheader_edge (epilog)->src; + + profile_probability epilog_scale + = (epilog_ph->count - guard_e->count ()).probability_in (epilog_ph->count); + + enter_e->dest->count -= guard_e->count (); + + /* If we added a vector skip then ENTER_E may not target the epilog + preheader but a block that has the epilog preheader as its single + successor: handle that case. */ + if (enter_e->dest != epilog_ph) { - prob_epilog = prob_vector * prob_epilog + prob_vector.invert (); + gcc_assert (single_succ (enter_e->dest) == epilog_ph); + epilog_ph->count -= guard_e->count (); + } - scale_bbs_frequencies (&bb_before_epilog, 1, prob_epilog); + if (epilog_e->count () < guard_e->count ()) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "epilog iv exit count < guard count, " + "profile will become inco
[PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple exits [PR117790]
This adjusts scale_profile_for_vect_loop to DTRT for loops with multiple exits, namely using scale_loop_profile_hold_exit_counts instead and scaling the expected niters by 1 / VF. Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and x86_64-linux-gnu. OK for trunk? Thanks, Alex gcc/ChangeLog: PR tree-optimization/117790 * tree-vect-loop.cc (scale_profile_for_vect_loop): Use scale_loop_profile_hold_exit_counts instead of scale_loop_profile. Drop the exit edge parameter, since the code now handles multiple exits. Adjust the caller ... (vect_transform_loop): ... here. gcc/testsuite/ChangeLog: PR tree-optimization/117790 * gcc.dg/vect/vect-early-break-profile-2.c: New test. --- .../gcc.dg/vect/vect-early-break-profile-2.c | 21 +++ gcc/tree-vect-loop.cc | 21 ++- 2 files changed, 27 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c new file mode 100644 index 000..03c67802b74 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-add-options vect_early_break } */ +/* { dg-additional-options "-fdump-tree-vect-blocks-details" } */ +int DECPOWERS[11]; +int multies[] = {5, 3, 1049, 0}; +short decNumberSquareRoot_accnext; +int decNumberSquareRoot_accunits; +void decGetDigits(short *, int); +void decNumberSquareRoot() { + int exponent, drop = 0; + for (;; drop++) { +if (exponent >= 0) + break; +if (decNumberSquareRoot_accnext * multies[drop] >> 7 * DECPOWERS[drop]) + break; +exponent++; + } + if (drop) +decGetDigits(&decNumberSquareRoot_accnext, decNumberSquareRoot_accunits); +} +/* { dg-final { scan-tree-dump-not "Invalid sum" "vect" } } */ diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 2b9d5956635..ca65c0058db 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -12022,7 +12022,7 @@ vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi, profile. */ static void -scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool flat) +scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat) { /* For flat profiles do not scale down proportionally by VF and only cap by known iteration count bounds. */ @@ -12053,18 +12053,10 @@ scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool fl vf /= 2; } - if (entry_count.nonzero_p ()) -set_edge_probability_and_rescale_others - (exit_e, - entry_count.probability_in (loop->header->count / vf)); - /* Avoid producing very large exit probability when we do not have - sensible profile. */ - else if (exit_e->probability < profile_probability::always () / (vf * 2)) -set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf); - loop->latch->count = single_pred_edge (loop->latch)->count (); - - scale_loop_profile (loop, profile_probability::always () / vf, - get_likely_max_loop_iterations_int (loop)); + const auto likely_max_niters = get_likely_max_loop_iterations_int (loop); + scale_loop_profile_hold_exit_counts (loop, + profile_probability::always () / vf, + likely_max_niters); } /* For a vectorized stmt DEF_STMT_INFO adjust all vectorized PHI @@ -12874,8 +12866,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) assumed_vf) - 1 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed, assumed_vf) - 1); - scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo), - assumed_vf, flat); + scale_profile_for_vect_loop (loop, assumed_vf, flat); if (dump_enabled_p ()) {
Re: [PATCH] c: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]
On Sun, Dec 15, 2024 at 08:02:57PM -0800, Kees Cook wrote: > When initializing a nonstring char array when compiled with > -Wunterminated-string-initialization the warning trips even when > truncating the trailing NUL character from the string constant. Only > warn about this when running under -Wc++-compat since under C++ we should > not initialize nonstrings from C strings. Thanks for the patch. > > PR c/117178 > > gcc/c/ChangeLog: > > * c-typeck.cc (digest_init): Check for nonstring attribute and > avoid warning when only the trailing NUL is truncated. > (build_c_cast): Update function call prototype. > (store_init_value): Ditto. > (output_init_element): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.dg/Wunterminated-string-initialization.c: Update for > new warning text and check for nonstring cases. > * gcc.dg/Wunterminated-string-initialization-c++.c: Duplicate > C test for -Wc++-compat. > --- > gcc/c/c-typeck.cc | 73 --- > .../Wunterminated-string-initialization-c++.c | 28 +++ > .../Wunterminated-string-initialization.c | 26 ++- > 3 files changed, 98 insertions(+), 29 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/Wunterminated-string-initialization-c++.c > > diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc > index 10b02da8752d..5cd0c07b87b1 100644 > --- a/gcc/c/c-typeck.cc > +++ b/gcc/c/c-typeck.cc > @@ -116,8 +116,8 @@ static void push_member_name (tree); > static int spelling_length (void); > static char *print_spelling (char *); > static void warning_init (location_t, int, const char *); > -static tree digest_init (location_t, tree, tree, tree, bool, bool, bool, > bool, > - bool, bool); > +static tree digest_init (location_t, tree, tree, tree, tree, bool, bool, > bool, > + bool, bool, bool); > static void output_init_element (location_t, tree, tree, bool, tree, tree, > bool, >bool, struct obstack *); > static void output_pending_init_elements (int, struct obstack *); > @@ -6731,7 +6731,7 @@ build_c_cast (location_t loc, tree type, tree expr) > t = build_constructor_single (type, field, t); > if (!maybe_const) > t = c_wrap_maybe_const (t, true); > - t = digest_init (loc, type, t, > + t = digest_init (loc, field, type, t, > NULL_TREE, false, false, false, true, false, false); > TREE_CONSTANT (t) = TREE_CONSTANT (value); > return t; > @@ -8646,8 +8646,8 @@ store_init_value (location_t init_loc, tree decl, tree > init, tree origtype) > } >bool constexpr_p = (VAR_P (decl) > && C_DECL_DECLARED_CONSTEXPR (decl)); > - value = digest_init (init_loc, type, init, origtype, npc, int_const_expr, > -arith_const_expr, true, > + value = digest_init (init_loc, decl, type, init, origtype, npc, > +int_const_expr, arith_const_expr, true, > TREE_STATIC (decl) || constexpr_p, constexpr_p); > >/* Store the expression if valid; else report error. */ > @@ -8996,8 +8996,8 @@ check_constexpr_init (location_t loc, tree type, tree > init, > on initializers for 'constexpr' objects apply. */ > > static tree > -digest_init (location_t init_loc, tree type, tree init, tree origtype, > - bool null_pointer_constant, bool int_const_expr, > +digest_init (location_t init_loc, tree field, tree type, tree init, > + tree origtype, bool null_pointer_constant, bool int_const_expr, >bool arith_const_expr, bool strict_string, >bool require_constant, bool require_constexpr) Please document the new FIELD parameter. > { > @@ -9132,27 +9132,46 @@ digest_init (location_t init_loc, tree type, tree > init, tree origtype, > && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) > { > unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init); > - unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT; > - > - /* Subtract the size of a single (possibly wide) character > - because it's ok to ignore the terminating null char > - that is counted in the length of the constant. */ > - if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0) > - pedwarn_init (init_loc, 0, > - ("initializer-string for array of %qT " > -"is too long"), typ1); > - else if (warn_unterminated_string_initialization > -&& compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) > - warning_at (init_loc, OPT_Wunterminated_string_initialization, > - ("initializer-string for array of %qT " > - "is too long"), typ1); > + > if (compare_tree_int (TYPE_SIZE_UNIT (type), len) <
Re: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]
On Mon, 6 Jan 2025 at 16:13, Jeff Law wrote: > > But in the case of concurrent accesses, shouldn't these objects be > declared as atomic? No. They aren't concurrent accesses to the same variable. They are concurrent accesses to *different* memory locations, and the compiler is not allowed to mess them up. IOW, if you have struct myvar { pthread_mutex_t buffer_lock; pthread_mutex_t another_var_lock; char buffer[7]; char another_var; }; and "buffer_lock" serializes accesses to "buffer[]", and "another_var_lock" serializes accesses to "another_var", then the compiler IS NOT ALLOWED TO TOUCH "another_var" when the code touches "buffer[]". So if a compiler turns "memset(var->buffer, 0, 7)" into "load 8 bytes, clear 7 of the bytes, store 8 bytes", then the compiler is buggy. Because that messes up another thread that accesses "another_var", and the 8-byte write may write back an old value that is no longer valid. There is absolutely no gray area here. It was always buggy, and the alpha architecture was always completely and fundamentally mis-designed. C11 made it explicitly clear: "Different threads of execution are always allowed to access (read and modify) different memory locations concurrently, with no interference and no synchronization requirements" but honestly, that was just codifying something that should have been blindingly obvious even before. Linus
Re: [PATCH] COBOL 1/8 hdr: header files
On Sat, 4 Jan 2025, James K. Lowden wrote: > On Fri, 3 Jan 2025 19:46:38 +0100 > Jakub Jelinek wrote: > > > Again, the question is if it needs to be supported everywhere, or > > just error out on targets which don't have _Float128 > > Our preference is simply to error out on targets that don't support > _Float128. "error out" when? GCC should build --enable-languages=all for all targets. But some languages or libraries might get disabled if not supported for the target. So it shouldn't literally be failing the build of GCC, though it might be disabling COBOL or the COBOL runtime libraries. Furthermore, in general whether _Float128 is supported depends on the multilib, which strongly indicates that this should affect the configure and build logic for the runtime libraries rather than the build of the compiler. Specifically, look at libquadmath/configure.ac: the configure script is still run when __float128 isn't supported, but BUILD_LIBQUADMATH is disabled so that no library actually gets built when the type isn't supported. I don't know how well the powerpc64*-linux-gnu --enable-targets=all configurations actually work in practice, but in principle they have both BE and LE multilibs (LE supports _Float128, BE doesn't). -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH v2 1/7] Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse
On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: Contrary to user documentation the `-mbwx', `-mcix', `-mfix', `-mmax' feature options and their inverse forms are ignored whenever `-mcpu=' option is in effect, either by having been given explicitly or where configured as the default such as with the `alphaev56-linux-gnu' target. In the latter case there is no way to change the settings these options are supposed to tweak other than with `-mcpu=' and the settings cannot be individually controlled, making all the feature options permanently inactive. It seems a regression from commit 7816bea0e23b ("config.gcc: Reorganize --with-cpu logic.") back in 2003, which replaced the setting of the default feature mask with the setting of the default CPU across a few targets, and the complementing logic in the Alpha backend wasn't updated accordingly. Fix this by making the individual feature options take precedence over `-mcpu='. Add test cases to verify this is the case, and to cover the defaults as well for the boundary cases. This has a drawback where the order of the options is ignored between `-mcpu=' and these individual options, so e.g. `-mno-bwx -mcpu=ev6' will keep the BWX feature disabled even though `-mcpu=ev6' comes later in the command line. This may affect some scenarios involving user overrides such as with CFLAGS passed to `configure' and `make' invocations. I do believe it has been our practice anyway for more finegrained options to override group options regardless of their relative order on the command line and in any case using `-mcpu=ev6 -mbwx' as the override will do the right thing if required, canceling any previous `-mno-bwx'. This has been spotted with `alphaev56-linux-gnu' target verification and a recently added test case: FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\sldq_u\\s 2 FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\smskwh\\s 1 FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\smskwl\\s 1 FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\sstq_u\\s 2 (and similarly for the remaining optimization levels covered) which this fix has addressed. gcc/ * config/alpha/alpha.cc (alpha_option_override): Ignore CPU flags corresponding to features the enabling or disabling of which has been requested with an individual feature option. gcc/testsuite/ * gcc.target/alpha/target-bwx-1.c: New file. * gcc.target/alpha/target-bwx-2.c: New file. * gcc.target/alpha/target-bwx-3.c: New file. * gcc.target/alpha/target-bwx-4.c: New file. * gcc.target/alpha/target-cix-1.c: New file. * gcc.target/alpha/target-cix-2.c: New file. * gcc.target/alpha/target-cix-3.c: New file. * gcc.target/alpha/target-cix-4.c: New file. * gcc.target/alpha/target-fix-1.c: New file. * gcc.target/alpha/target-fix-2.c: New file. * gcc.target/alpha/target-fix-3.c: New file. * gcc.target/alpha/target-fix-4.c: New file. * gcc.target/alpha/target-max-1.c: New file. * gcc.target/alpha/target-max-2.c: New file. * gcc.target/alpha/target-max-3.c: New file. * gcc.target/alpha/target-max-4.c: New file. OK jeff
Re: [PATCH v2 4/7] Alpha: Export `emit_unlikely_jump' for a subsequent change to use
On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: Rename `emit_unlikely_jump' function to `alpha_emit_unlikely_jump', so as to avoid namespace pollution, updating callers accordingly and export it for use in the machine description. Make it return the insn emitted. gcc/ * config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New prototype. * config/alpha/alpha.cc (emit_unlikely_jump): Rename to... (alpha_emit_unlikely_jump): ... this. Return the insn emitted. (alpha_split_atomic_op, alpha_split_compare_and_swap) (alpha_split_compare_and_swap_12, alpha_split_atomic_exchange) (alpha_split_atomic_exchange_12): Update call sites accordingly. OK jeff
Re: [PATCH] ifcombine field-merge: improve handling of dwords
On Dec 21, 2024, Alexandre Oliva wrote: > On Dec 20, 2024, Jakub Jelinek wrote: >> On Wed, Dec 18, 2024 at 12:59:11AM -0300, Alexandre Oliva wrote: >>> * gcc.dg/field-merge-16.c: New. >> Note the test FAILs on i686-linux or on x86_64-linux with -m32. > Indeed, thanks. Here's a fix. Ping? https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672161.html > for gcc/ChangeLog > * gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit > boundary choice by word size as well. Try aligned double-word > loads as a last resort. > for gcc/testsuite/ChangeLog > * gcc.dg/field-merge-17.c: New. -- Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]
On 1/6/25 6:03 AM, Maciej W. Rozycki wrote: With non-BWX Alpha implementations we have a problem of data races where a 8-bit byte or 16-bit word quantity is to be written to memory in that in those cases we use an unprotected RMW access of a 32-bit longword or 64-bit quadword width. If contents of the longword or quadword accessed outside the byte or word to be written are changed midway through by a concurrent write executing on the same CPU such as by a signal handler or a parallel write executing on another CPU such as by another thread or via a shared memory segment, then the concluding write of the RMW access will clobber them. This is especially important for the safety of RCU algorithms, but is otherwise an issue anyway. To guard against these data races with byte and aligned word quantities introduce the `-msafe-bwa' command-line option (standing for Safe Byte & Word Access) that instructs the compiler to instead use an atomic RMW access sequence where byte and word memory access machine instructions are not available. There is no change to code produced for BWX targets. It would be sufficient for the secondary reload handle to use a pair of scratch registers, as requested by `reload_out', but it would end with poor code produced as one of the scratches would be occupied by data retrieved and the other one would have to be reloaded with repeated calculations, all within the LL/SC sequence. Therefore I chose to add a dedicated `reload_out_safe_bwa' handler and ask for more scratches there by defining a 256-bit OI integer mode. While reload is documented in our manual to support an arbitrary number of scratches in reality it hasn't been implemented for IRA: /* ??? It would be useful to be able to handle only two, or more than three, operands, but for now we can only handle the case of having exactly three: output, input and one temp/scratch. */ and it seems to be the case for LRA as well. Do what everyone else does then and just have one wide multi-register scratch. I note that the atomic sequences emitted are suboptimal performance-wise as the looping branch for the unsuccessful completion of the sequence points backwards, which means it will be predicted as taken despite that in most cases it will fall through. I do not see it as a deficiency of this change proposed as it takes care of recording that the branch is unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore generic code elsewhere shou Looks like this got truncated. Anyway, easy to forget how limited branch prediction was in this era. I haven't pondered static branch prediction in forever. I wouldn't worry too much about this case -- we can always come back to it if generic doesn't do the right thing with code layout. Add test cases accordingly. There are notable regressions between a plain `-mno-bwx' configuration and a `-mno-bwx -msafe-bwa' one: FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: g++.dg/init/array25.C -std=c++17 execution test FAIL: g++.dg/init/array25.C -std=c++98 execution test FAIL: g++.dg/init/array25.C -std=c++26 execution test They come from the fact that these test cases play tricks with alignment and end up calling code that expects a reference to aligned data but is handed one to unaligned data. This doesn't cause a visible problem with plain `-mno-bwx' code, because the resulting alignment exception is fixed up by Linux. There's no such handling currently implemented for LDL_L or LDQ_L instructions (which are first in the sequence) and consequently the offender is issued with SIGBUS instead. Suitable handling will be added to Linux to complement this change, so these regressions are seen as harmless and expected. gcc/ PR target/117759 * config/alpha/alpha-modes.def (OI): New integer mode. * config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New prototype. * config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New function. (alpha_secondary_reload): Handle TARGET_SAFE_BWA. * config/alpha/alpha.md (aligned_store_safe_bwa) (unaligned_store_safe_bwa, reload_out_safe_bwa) (reload_out_unaligned_safe_bwa): New expanders. (mov, movcqi, reload_out_aligned): Handle TARGET_SAFE_BWA. (reload_out): Guard against TARGET_SAFE_BWA. * config/alpha/alpha.opt (msafe-bwa): New option. * config/alpha/alpha.opt.urls: Regenerate.
[PATCH v1] LoongArch: Opitmize the cost of vec_construct.
When analyzing 525 on LoongArch architecture, it was found that the for loop of hotspot function x264_pixel_satd_8x4 could not be quantized 256-bit due to the cost of vec_construct setting. After re-adjusting vec_construct, the performance of 525 program was improved by 16.57%. It was found that this function can be vectorized on the aarch64 and x86 architectures, see [PR98138]. Co-Authored-By: Deng Jianbo . gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): Modify the construction cost of the vec_construct vector. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-slp-two-operator.c: New test. --- gcc/config/loongarch/loongarch.cc | 6 ++-- .../loongarch/vect-slp-two-operator.c | 36 +++ 2 files changed, 39 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 89237c377e7..ff27b96c31e 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -4127,10 +4127,10 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, case vec_construct: elements = TYPE_VECTOR_SUBPARTS (vectype); - if (ISA_HAS_LASX) - return elements + 1; + if (LASX_SUPPORTED_MODE_P (mode) && !LSX_SUPPORTED_MODE_P (mode)) + return elements / 2 + 3; else - return elements; + return elements / 2 + 1; default: gcc_unreachable (); diff --git a/gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c b/gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c new file mode 100644 index 000..f27492e10f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vect-slp-two-operator.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx -ftree-vectorize -fdump-tree-vect -fdump-tree-vect-details" } */ + +typedef unsigned char uint8_t; +typedef unsigned int uint32_t; + +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\ +int t0 = s0 + s1;\ +int t1 = s0 - s1;\ +int t2 = s2 + s3;\ +int t3 = s2 - s3;\ +d0 = t0 + t2;\ +d1 = t1 + t3;\ +d2 = t0 - t2;\ +d3 = t1 - t3;\ +} + +void sink(uint32_t tmp[4][4]); + +int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 ) +{ +uint32_t tmp[4][4]; +int sum = 0; +for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 ) +{ +uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16); +uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16); +uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16); +uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16); +HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 ); +} +sink(tmp); +} + +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ -- 2.20.1
Re: [PATCH] arm: [MVE intrinsics] Another fix for moves of tuples (PR target/118131)
ping? On Fri, 20 Dec 2024 at 23:53, Christophe Lyon wrote: > > Commit r15-6389-g670df03e5294a3 only partially fixed support for moves > of large modes: despite the introduction of V2x* and V4x* modes in > r15-6245-g4f4e13dd235b to support MVE tuples, we still need to support > TI, OI and XI modes, which appear for instance in gcc.dg/pr100887.c. > > The problem was noticed when running the testsuite with > -mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto > where several tests would ICE in output_move_neon. > > gcc/ChangeLog: > > PR target/118131 > * config/arm/arm.h (VALID_MVE_STRUCT_MODE): Accept TI, OI and XI > modes again. > --- > gcc/config/arm/arm.h | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index b2044db938b..4246b34b6e0 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -1128,7 +1128,10 @@ extern const int arm_arch_cde_coproc_bits[]; > || (MODE) == CImode || (MODE) == XImode) > > #define VALID_MVE_STRUCT_MODE(MODE)\ > - ((MODE) == V2x16QImode \ > + ((MODE) == TImode\ > + || (MODE) == OImode \ > + || (MODE) == XImode \ > + || (MODE) == V2x16QImode\ > || (MODE) == V2x8HImode \ > || (MODE) == V2x4SImode \ > || (MODE) == V2x8HFmode \ > -- > 2.34.1 >
Re: [PATCH v2 2/3] cfgexpand: Rewrite add_scope_conflicts_2 to use cache and look back further [PR111422]
On Tue, Dec 31, 2024 at 2:04 PM Tamar Christina wrote: > > > -Original Message- > > From: Richard Biener > > Sent: Wednesday, November 20, 2024 11:28 AM > > To: Andrew Pinski > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH v2 2/3] cfgexpand: Rewrite add_scope_conflicts_2 to use > > cache and look back further [PR111422] > > > > On Sat, Nov 16, 2024 at 5:25 AM Andrew Pinski > > wrote: > > > > > > After fixing loop-im to do the correct overflow rewriting > > > for pointer types too. We end up with code like: > > > ``` > > > _9 = (unsigned long) &g; > > > _84 = _9 + 18446744073709551615; > > > _11 = _42 + _84; > > > _44 = (signed char *) _11; > > > ... > > > *_44 = 10; > > > g ={v} {CLOBBER(eos)}; > > > ... > > > n[0] = &f; > > > *_44 = 8; > > > g ={v} {CLOBBER(eos)}; > > > ``` > > > > > > Which was not being recongized by the scope conflicts code. > > > This was because it only handled one level walk backs rather than > > > multiple ones. > > > This fixes the issue by having a cache which records all references to > > > addresses > > > of stack variables. > > > > > > Unlike the previous patch, this only records and looks at addresses of > > > stack > > variables. > > > The cache uses a bitmap and uses the index as the bit to look at. > > > > > > PR middle-end/117426 > > > PR middle-end/111422 > > > gcc/ChangeLog: > > > > > > * cfgexpand.cc (struct vars_ssa_cache): New class. > > > (vars_ssa_cache::vars_ssa_cache): New constructor. > > > (vars_ssa_cache::~vars_ssa_cache): New deconstructor. > > > (vars_ssa_cache::create): New method. > > > (vars_ssa_cache::exists): New method. > > > (vars_ssa_cache::add_one): New method. > > > (vars_ssa_cache::update): New method. > > > (vars_ssa_cache::dump): New method. > > > (add_scope_conflicts_2): Factor mostly out to > > > vars_ssa_cache::operator(). New cache argument. > > > Walk the bitmap cache for the stack variables addresses. > > > (vars_ssa_cache::operator()): New method factored out from > > > add_scope_conflicts_2. Rewrite to be a full walk of all operands > > > and use a worklist. > > > (add_scope_conflicts_1): Add cache new argument for the addr > > > cache. > > > Just call add_scope_conflicts_2 for the phi result instead of > > > calling > > > for the uses and don't call walk_stmt_load_store_addr_ops for > > > phis. > > > Update call to add_scope_conflicts_2 to add cache argument. > > > (add_scope_conflicts): Add cache argument and update calls to > > > add_scope_conflicts_1. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/torture/pr117426-1.c: New test. > > > > > > Signed-off-by: Andrew Pinski > > > --- > > > gcc/cfgexpand.cc | 292 +++--- > > > gcc/testsuite/gcc.dg/torture/pr117426-1.c | 53 > > > 2 files changed, 308 insertions(+), 37 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/torture/pr117426-1.c > > > > > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > > > index b88e8827667..841d3c1254e 100644 > > > --- a/gcc/cfgexpand.cc > > > +++ b/gcc/cfgexpand.cc > > > @@ -585,35 +585,243 @@ visit_conflict (gimple *, tree op, tree, void > > > *data) > > >return false; > > > } > > > > > > -/* Helper function for add_scope_conflicts_1. For USE on > > > - a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to > > > be > > > - based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. */ > > > +/* A cache for ssa name to address of stack variables. > > > + When taking into account if a ssa name refers to an > > > + address of a stack variable, we need to walk the > > > + expressions backwards to find the addresses. This > > > + cache is there so we don't need to walk the expressions > > > + all the time. */ > > > +struct vars_ssa_cache > > > +{ > > > +private: > > > + /* Currently an entry is a bitmap of all of the known stack variables > > > + addresses that are referenced by the ssa name. > > > + When the bitmap is the nullptr, then there is no cache. > > > + Currently only empty bitmaps are shared. > > > + The reason for why empty cache is not just a null is so we know the > > > + cache for an entry is filled in. */ > > > + struct entry > > > + { > > > +bitmap bmap = nullptr; > > > + }; > > > + entry *vars_ssa_caches; > > > +public: > > > > > > -static inline void > > > -add_scope_conflicts_2 (tree use, bitmap work, > > > - walk_stmt_load_store_addr_fn visit) > > > + vars_ssa_cache(); > > > + ~vars_ssa_cache(); > > > + const_bitmap operator() (tree name); > > > + void dump (FILE *file); > > > + > > > +private: > > > + /* Can't copy. */ > > > + vars_ssa_cache(const vars_ssa_cache&) = delete; > > > + vars_ssa_cache(vars_ssa_cache&&) = delete; > > > + > > > + /* The shared empty bitmap. */
Re: [PATCH] or1k: add .note.GNU-stack section on linux
On 1/6/25 10:02 AM, Stafford Horne wrote: On Mon, Jan 06, 2025 at 07:37:56AM -0700, Jeff Law wrote: On 1/6/25 6:01 AM, Stafford Horne wrote: In the OpenRISC build we get the following warning: ld: warning: __modsi3_s.o: missing .note.GNU-stack section implies executable stack ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker Fix this by adding a .note.GNU-stack to indicate the stack does not need to be executable for the lib1funcs. Note, this is also needed for the upcoming glibc 2.41. libgcc/ * config/or1k/lib1funcs.S: Add .note.GNU-stack section on linux. OK for the trunk. You've got write privs, correct? Thanks, Yes I should still have access. Also, I would like to add to the release-14 branch. Is there anything special I need to do to add the patch there? Nothing really special on the release branches. jeff
Re: [Ping, Fortran, Patch, PR116669, v1] Fix ICE in deallocation of derived types having cyclic dependencies
On 1/6/25 2:06 AM, Andre Vehreschild wrote: Hi all, pinging attached rebased patch. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? - Andre On Thu, 12 Dec 2024 14:50:13 +0100 Andre Vehreschild wrote: Hi all, attached patch improves analysis of cycles in derived types, i.e. type dependencies ala: type(T) type(T2), allocatable :: c end type type(T2) type(T), allocatable :: t end type are now detected and deallocating an object that is of any of the types now no longer crashes the compiler because of an endless recursion. To accomplish this, I stored the symbols of the types seen in a C++ set and checked if a component's type is already present in there. When a type has such an indirect self-reference, it gets marked by setting its symbol_attribute::recursive flag. Later steps then can make use of it. Furthermore are _deallocate members of the vtab populated when a type has the recursive and the alloc_comp flag set. Bootstraps and regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? This one is good to go. Jerry Note: The patch was developed on top of my coarray patch, but should apply with delta on a regular trunk w/o issues. Regards, Andre -- Andre Vehreschild * Email: vehre ad gcc dot gnu dot org -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [-Ping-, Fortran, Patch, PR116669, v3] Fix ICE in deallocation of derived types having cyclic dependencies
On 1/6/25 6:21 AM, Andre Vehreschild wrote: Hi all, during looking for something completely different, I figured, that gcc does not use std::set internally, but its implementation of hash_set. I therefore adapted the patch to use it. Nothing more changed. Still regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? ... and this tweak is OK. Proceed. Jerry Regards, Andre On Mon, 6 Jan 2025 11:06:46 +0100 Andre Vehreschild wrote: Hi all, pinging attached rebased patch. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? - Andre On Thu, 12 Dec 2024 14:50:13 +0100 Andre Vehreschild wrote: Hi all, attached patch improves analysis of cycles in derived types, i.e. type dependencies ala: type(T) type(T2), allocatable :: c end type type(T2) type(T), allocatable :: t end type are now detected and deallocating an object that is of any of the types now no longer crashes the compiler because of an endless recursion. To accomplish this, I stored the symbols of the types seen in a C++ set and checked if a component's type is already present in there. When a type has such an indirect self-reference, it gets marked by setting its symbol_attribute::recursive flag. Later steps then can make use of it. Furthermore are _deallocate members of the vtab populated when a type has the recursive and the alloc_comp flag set. Bootstraps and regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Note: The patch was developed on top of my coarray patch, but should apply with delta on a regular trunk w/o issues. Regards, Andre -- Andre Vehreschild * Email: vehre ad gcc dot gnu dot org -- Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [Ping, Fortran, Patch, PR114612, v1] Fix missing deep-copy for allocatable components of derived types having cycles.
On 1/6/25 2:08 AM, Andre Vehreschild wrote: Hi all, attached patch has been rebased to latest trunk. Just pinging! Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? - Andre On Fri, 13 Dec 2024 12:10:58 +0100 Andre Vehreschild wrote: Hi all, attached patch fixes deep-copying (or rather its former absence) for allocatable components of derived types having cyclic dependencies. Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Yes, OK. Thanks. Jerry Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH v3] aarch64: remove extra XTN in vector concatenation
Hi Richard, Thanks for the feedback. I've copied in the resulting patch here- if this is okay, please could it be committed on my behalf? The patch continues below. Many thanks, Akram --- GIMPLE code which performs a narrowing truncation on the result of a vector concatenation currently results in an unnecessary XTN being emitted following a UZP1 to concate the operands. In cases such as this, UZP1 should instead use a smaller arrangement specifier to replace the XTN instruction. This is seen in cases such as in this GIMPLE example: int32x2_t foo (svint64_t a, svint64_t b) { vector(2) int vect__2.8; long int _1; long int _3; vector(2) long int _12; [local count: 1073741824]: _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D)); _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D)); _12 = {_1, _3}; vect__2.8_13 = (vector(2) int) _12; return vect__2.8_13; } Original assembly generated: bar: ptrue p3.b, all uaddv d0, p3, z0.d uaddv d1, p3, z1.d uzp1v0.2d, v0.2d, v1.2d xtn v0.2s, v0.2d ret This patch therefore defines the *aarch64_trunc_concat insn which truncates the concatenation result, rather than concatenating the truncated operands (such as in *aarch64_narrow_trunc), resulting in the following optimised assembly being emitted: bar: ptrue p3.b, all uaddv d0, p3, z0.d uaddv d1, p3, z1.d uzp1v0.2s, v0.2s, v1.2s ret This patch passes all regression tests on aarch64 with no new failures. A supporting test for this optimisation is also written and passes. OK for master? I do not have commit rights so I cannot push the patch myself. gcc/ChangeLog: * config/aarch64/aarch64-simd.md: (*aarch64_trunc_concat) new insn definition. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/truncated_concatenation_1.c: new test for the above example and other modes covered by insn definitions. --- gcc/config/aarch64/aarch64-simd.md| 16 ++ .../aarch64/sve/truncated_concatenation_1.c | 32 +++ 2 files changed, 48 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cfe95bd4c31..6c129d6c4a8 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1872,6 +1872,22 @@ [(set_attr "type" "neon_permute")] ) +(define_insn "*aarch64_trunc_concat" + [(set (match_operand: 0 "register_operand" "=w") + (truncate: + (vec_concat:VQN + (match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w"] + "TARGET_SIMD" +{ + if (!BYTES_BIG_ENDIAN) +return "uzp1\\t%0., %1., %2."; + else +return "uzp1\\t%0., %2., %1."; +} + [(set_attr "type" "neon_permute")] +) + ;; Packing doubles. (define_expand "vec_pack_trunc_" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c new file mode 100644 index 000..95577a1a9ef --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -Wall -march=armv8.2-a+sve" } */ + +#include +#include + +int8x8_t f1 (int16x4_t a, int16x4_t b) { +int8x8_t ab = vdup_n_s8 (0); +int16x8_t ab_concat = vcombine_s16 (a, b); +ab = vmovn_s16 (ab_concat); +return ab; +} + +int16x4_t f2 (int32x2_t a, int32x2_t b) { +int16x4_t ab = vdup_n_s16 (0); +int32x4_t ab_concat = vcombine_s32 (a, b); +ab = vmovn_s32 (ab_concat); +return ab; +} + +int32x2_t f3 (svint64_t a, svint64_t b) { +int32x2_t ab = vdup_n_s32 (0); +ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), a), ab, 0); +ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), b), ab, 1); +return ab; +} + +/* { dg-final { scan-assembler-not {\txtn\t} } }*/ +/* { dg-final { scan-assembler-not {\tfcvtn\t} } }*/ +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.8b, v[0-9]+\.8b, v[0-9]+\.8b} 1 } }*/ +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.4h, v[0-9]+\.4h, v[0-9]+\.4h} 1 } }*/ +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.2s, v[0-9]+\.2s, v[0-9]+\.2s} 1 } }*/ \ No newline at end of file -- 2.34.1
Re: [PATCH] or1k: add .note.GNU-stack section on linux
On Mon, Jan 06, 2025 at 10:03:50AM -0700, Jeff Law wrote: > > > On 1/6/25 10:02 AM, Stafford Horne wrote: > > On Mon, Jan 06, 2025 at 07:37:56AM -0700, Jeff Law wrote: > > > > > > > > > On 1/6/25 6:01 AM, Stafford Horne wrote: > > > > In the OpenRISC build we get the following warning: > > > > > > > > ld: warning: __modsi3_s.o: missing .note.GNU-stack section > > > > implies executable stack > > > > ld: NOTE: This behaviour is deprecated and will be removed in a > > > > future version of the linker > > > > > > > > Fix this by adding a .note.GNU-stack to indicate the stack does not > > > > need to be > > > > executable for the lib1funcs. > > > > > > > > Note, this is also needed for the upcoming glibc 2.41. > > > > > > > > libgcc/ > > > > * config/or1k/lib1funcs.S: Add .note.GNU-stack section on linux. > > > OK for the trunk. You've got write privs, correct? > > > > Thanks, > > > > Yes I should still have access. Also, I would like to add to the release-14 > > branch. Is there anything special I need to do to add the patch there? > Nothing really special on the release branches. Thanks, I have pushed the patch to trunk and releases/gcc-14. -Stafford
RE: [PATCH v2 2/3] cfgexpand: Rewrite add_scope_conflicts_2 to use cache and look back further [PR111422]
> -Original Message- > From: Tamar Christina > Sent: Tuesday, December 31, 2024 1:04 PM > To: Richard Biener ; Andrew Pinski > > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH v2 2/3] cfgexpand: Rewrite add_scope_conflicts_2 to use > cache and look back further [PR111422] > > > -Original Message- > > From: Richard Biener > > Sent: Wednesday, November 20, 2024 11:28 AM > > To: Andrew Pinski > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH v2 2/3] cfgexpand: Rewrite add_scope_conflicts_2 to use > > cache and look back further [PR111422] > > > > On Sat, Nov 16, 2024 at 5:25 AM Andrew Pinski > > wrote: > > > > > > After fixing loop-im to do the correct overflow rewriting > > > for pointer types too. We end up with code like: > > > ``` > > > _9 = (unsigned long) &g; > > > _84 = _9 + 18446744073709551615; > > > _11 = _42 + _84; > > > _44 = (signed char *) _11; > > > ... > > > *_44 = 10; > > > g ={v} {CLOBBER(eos)}; > > > ... > > > n[0] = &f; > > > *_44 = 8; > > > g ={v} {CLOBBER(eos)}; > > > ``` > > > > > > Which was not being recongized by the scope conflicts code. > > > This was because it only handled one level walk backs rather than multiple > ones. > > > This fixes the issue by having a cache which records all references to > > > addresses > > > of stack variables. > > > > > > Unlike the previous patch, this only records and looks at addresses of > > > stack > > variables. > > > The cache uses a bitmap and uses the index as the bit to look at. > > > > > > PR middle-end/117426 > > > PR middle-end/111422 > > > gcc/ChangeLog: > > > > > > * cfgexpand.cc (struct vars_ssa_cache): New class. > > > (vars_ssa_cache::vars_ssa_cache): New constructor. > > > (vars_ssa_cache::~vars_ssa_cache): New deconstructor. > > > (vars_ssa_cache::create): New method. > > > (vars_ssa_cache::exists): New method. > > > (vars_ssa_cache::add_one): New method. > > > (vars_ssa_cache::update): New method. > > > (vars_ssa_cache::dump): New method. > > > (add_scope_conflicts_2): Factor mostly out to > > > vars_ssa_cache::operator(). New cache argument. > > > Walk the bitmap cache for the stack variables addresses. > > > (vars_ssa_cache::operator()): New method factored out from > > > add_scope_conflicts_2. Rewrite to be a full walk of all operands > > > and use a worklist. > > > (add_scope_conflicts_1): Add cache new argument for the addr > > > cache. > > > Just call add_scope_conflicts_2 for the phi result instead of > > > calling > > > for the uses and don't call walk_stmt_load_store_addr_ops for > > > phis. > > > Update call to add_scope_conflicts_2 to add cache argument. > > > (add_scope_conflicts): Add cache argument and update calls to > > > add_scope_conflicts_1. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/torture/pr117426-1.c: New test. > > > > > > Signed-off-by: Andrew Pinski > > > --- > > > gcc/cfgexpand.cc | 292 +++--- > > > gcc/testsuite/gcc.dg/torture/pr117426-1.c | 53 > > > 2 files changed, 308 insertions(+), 37 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/torture/pr117426-1.c > > > > > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > > > index b88e8827667..841d3c1254e 100644 > > > --- a/gcc/cfgexpand.cc > > > +++ b/gcc/cfgexpand.cc > > > @@ -585,35 +585,243 @@ visit_conflict (gimple *, tree op, tree, void > > > *data) > > >return false; > > > } > > > > > > -/* Helper function for add_scope_conflicts_1. For USE on > > > - a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to > > > be > > > - based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. */ > > > +/* A cache for ssa name to address of stack variables. > > > + When taking into account if a ssa name refers to an > > > + address of a stack variable, we need to walk the > > > + expressions backwards to find the addresses. This > > > + cache is there so we don't need to walk the expressions > > > + all the time. */ > > > +struct vars_ssa_cache > > > +{ > > > +private: > > > + /* Currently an entry is a bitmap of all of the known stack variables > > > + addresses that are referenced by the ssa name. > > > + When the bitmap is the nullptr, then there is no cache. > > > + Currently only empty bitmaps are shared. > > > + The reason for why empty cache is not just a null is so we know the > > > + cache for an entry is filled in. */ > > > + struct entry > > > + { > > > +bitmap bmap = nullptr; > > > + }; > > > + entry *vars_ssa_caches; > > > +public: > > > > > > -static inline void > > > -add_scope_conflicts_2 (tree use, bitmap work, > > > - walk_stmt_load_store_addr_fn visit) > > > + vars_ssa_cache(); > > > + ~vars_ssa_cache(); > > > + const_bitmap operator() (tree name); > > > + void dum
[COMMITTED 18/30] ada: Fix crash on Depends contract with homonym functions
From: Piotr Trojanek When resolving names in flow contracts, we refine the ordinary analysis by knowing that an overloaded name must refer to an abstract state and not a function. However, when all overloadings refer to function, we shouldn't crash, but instead let the error to be diagnosed later. gcc/ada/ChangeLog: * sem_prag.adb (Resolve_State): Continue ordinary processing. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_prag.adb | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb index 081716a4027..1841a0b79ad 100644 --- a/gcc/ada/sem_prag.adb +++ b/gcc/ada/sem_prag.adb @@ -33804,11 +33804,7 @@ package body Sem_Prag is State := Homonym (State); end loop; --- A function can never act as a state. If the homonym chain does --- not contain a corresponding state, then something went wrong in --- the overloading mechanism. - -raise Program_Error; +-- A function can never act as a state; it will be diagnosed later end if; end if; end Resolve_State; -- 2.43.0
[COMMITTED 12/30] ada: Ada version used to compile runtime is constant
From: Piotr Trojanek Code cleanup. gcc/ada/ChangeLog: * opt.ads (Ada_Version_Runtime): Now a constant, since it cannot and should never be modified. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/opt.ads | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads index aea52f3ad69..a09a1ccffa6 100644 --- a/gcc/ada/opt.ads +++ b/gcc/ada/opt.ads @@ -127,7 +127,7 @@ package Opt is -- remains set to Ada_Version_Default). This is used in the rare cases -- (notably pragma Obsolescent) where we want the explicit version set. - Ada_Version_Runtime : Ada_Version_Type := Ada_With_All_Extensions; + Ada_Version_Runtime : constant Ada_Version_Type := Ada_With_All_Extensions; -- GNAT -- Ada version used to compile the runtime. Used to set Ada_Version (but -- not Ada_Version_Explicit) when compiling predefined or internal units. -- 2.43.0
[COMMITTED 13/30] ada: Avoid null-exclusion checks for Node_Field_Table
From: Piotr Trojanek By generating the type of Node_Field_Table with a "not null" qualifier we check the null exclusion of its elements only once, at the object declaration. Tiny performance improvement for the debug builds (because in production builds checks are disabled anyway); semantics is unaffected. gcc/ada/ChangeLog: * gen_il-gen.adb (Put_Tables): Add "not null" to the generated code. * rtsfind.adb (Cstring_Ptr): Same for table with predefined RE_Id error messages. * impunit.adb (Aunit_Record): Same for array of alternative unit names. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gen_il-gen.adb | 2 +- gcc/ada/impunit.adb| 2 +- gcc/ada/rtsfind.adb| 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/ada/gen_il-gen.adb b/gcc/ada/gen_il-gen.adb index c200abc0223..bac17ebc356 100644 --- a/gcc/ada/gen_il-gen.adb +++ b/gcc/ada/gen_il-gen.adb @@ -2214,7 +2214,7 @@ package body Gen_IL.Gen is Field_Enum_Type_Name & "_Index range <>) of " & Field_Enum_Type_Name & ";" & LF); Put (S, "type " & Field_Enum_Type_Name & - "_Array_Ref is access constant " & Field_Enum_Type_Name & + "_Array_Ref is not null access constant " & Field_Enum_Type_Name & "_Array;" & LF); Put (S, "subtype A is " & Field_Enum_Type_Name & "_Array;" & LF); -- Short name to make allocators below more readable diff --git a/gcc/ada/impunit.adb b/gcc/ada/impunit.adb index 47a5459792e..f8771772b52 100644 --- a/gcc/ada/impunit.adb +++ b/gcc/ada/impunit.adb @@ -649,7 +649,7 @@ package body Impunit is type Aunit_Record is record Fname : String (1 .. 6); - Aname : String_Ptr_Const; + Aname : not null String_Ptr_Const; end record; -- Array of alternative unit names diff --git a/gcc/ada/rtsfind.adb b/gcc/ada/rtsfind.adb index 6697184fae0..16deb82973c 100644 --- a/gcc/ada/rtsfind.adb +++ b/gcc/ada/rtsfind.adb @@ -153,7 +153,7 @@ package body Rtsfind is --packed component size of 43 is not supported - type CString_Ptr is access constant String; + type CString_Ptr is not null access constant String; type PRE_Id_Entry is record Str : CString_Ptr; -- 2.43.0
[COMMITTED 04/30] ada: Fix incorrect incomplete type error
From: Viljar Indus In Ada 2005 even if the formal is using a tagged limited type then the type should not be considered incomplete. gcc/ada/ChangeLog: * sem_ch6.adb (Analyze_Subprogram_Body_Helper): Exchange_Limited_Views also in Ada 2005. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch6.adb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb index 2f09284ebda..ab12f22ad51 100644 --- a/gcc/ada/sem_ch6.adb +++ b/gcc/ada/sem_ch6.adb @@ -4536,8 +4536,10 @@ package body Sem_Ch6 is -- may now appear in parameter and result profiles. Since the analysis -- of a subprogram body may use the parameter and result profile of the -- spec, swap any limited views with their non-limited counterpart. + -- + -- Note that the non-limited view should also be exchanged in Ada 2005. - if Ada_Version >= Ada_2012 and then Present (Spec_Id) then + if Ada_Version >= Ada_2005 and then Present (Spec_Id) then Exch_Views := Exchange_Limited_Views (Spec_Id); end if; -- 2.43.0
[COMMITTED 20/30] ada: Fix assertion failure on 'Old in post-condition with -gnat2022
From: Eric Botcazou It comes from a small oversight in the updated implementation for Ada 2022. gcc/ada/ChangeLog: PR ada/117956 * sem_util.adb (Is_Known_On_Entry): Be prepared for constants coming from a renaming declaration. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_util.adb | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb index 595d3d10d0f..ffc631a7bee 100644 --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -31188,8 +31188,17 @@ package body Sem_Util is -- removal, e.g. by validity checks. if not Comes_From_Source (Obj) then - return -Is_Known_On_Entry (Expression (Parent (Obj))); + declare + D : constant Node_Id := + Declaration_Node (Obj); + begin + if Nkind (D) = N_Object_Renaming_Declaration + then +return Is_Known_On_Entry (Name (D)); + else +return Is_Known_On_Entry (Expression (D)); + end if; + end; end if; -- return False if not "all views are constant". -- 2.43.0
[COMMITTED 08/30] ada: Elide copy for calls as default values of nonlimited by-reference components
From: Eric Botcazou This prevents a temporary from being created on the primary stack to hold the result of the function calls before it is copied to the object being elaborated in the nonlimited by-reference case. That's already not done in the nonlimited non-by-reference case and there is no reason to do it in the former case either. The main issue are the calls to Remove_Side_Effects in Expand_Ctrl_Function_Call (controlled case only) and in Expand_N_Assignment_Statement, which serve various purposes including very technical ones beside removing side effects. The change is therefore very conservative and only removes the copy in the case of a naked function call for the time being. gcc/ada/ChangeLog: * einfo.ads (Returns_By_Ref): Fix description. * exp_ch3.adb (Build_Record_Init_Proc.Build_Assignment): Do not adjust the component manually (if need be), set No_Finalize_Actions instead of No_Ctrl_Actions for this purpose. Do not adjust when the expression is a naked function call. * exp_ch5.adb (Make_Tag_Ctrl_Assignment): Document the quirks of the function. Assert that the LHS of the assignment does not have side effects and replace calls to Duplicate_Subexpr_No_Checks with calls to New_Copy_Tree. Rename local variable Asn to New_N. (Expand_N_Assignment_Statement): In the tagged or controlled record case, do remove side effects from both operands on entry. Remove them in the controlled record case, except if the RHS is a function call and the assignment has the No_Ctrl_Actions flag set. * exp_ch6.adb (Expand_Ctrl_Function_Call): Bail out when the parent node is an assignment statement with the No_Ctrl_Actions flag set. * sem_util.adb (Statically_Different): Return True for a function call that does not return its result by reference. * sinfo.ads (No_Ctrl_Actions): Adjust description and add a note for the code generator. (No_Finalize_Actions): Likewise. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/einfo.ads| 4 +- gcc/ada/exp_ch3.adb | 80 +++- gcc/ada/exp_ch5.adb | 143 ++- gcc/ada/exp_ch6.adb | 21 +-- gcc/ada/sem_util.adb | 28 +++-- gcc/ada/sinfo.ads| 22 --- 6 files changed, 180 insertions(+), 118 deletions(-) diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads index d283358c0c0..1946e68b3c7 100644 --- a/gcc/ada/einfo.ads +++ b/gcc/ada/einfo.ads @@ -4278,8 +4278,8 @@ package Einfo is --Returns_By_Ref -- Defined in subprogram type entities and functions. Set if a function --- (or an access-to-function type) returns a result by reference, either --- because the result is built in place, or its type is by-reference. +-- (or a function type) returns a result by reference, either because the +-- result is built in place or its type is limited in Ada 95. --Reverse_Bit_Order [base type only] -- Defined in all record type entities. Set if entity has a Bit_Order diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb index 9419d5d2fac..65d8eb7c433 100644 --- a/gcc/ada/exp_ch3.adb +++ b/gcc/ada/exp_ch3.adb @@ -2618,11 +2618,10 @@ package body Exp_Ch3 is Default_Loc : constant Source_Ptr := Sloc (Default); Typ : constant Entity_Id := Underlying_Type (Etype (Id)); - Adj_Call : Node_Id; - Exp : Node_Id; - Exp_Q: Node_Id; - Lhs : Node_Id; - Res : List_Id; + Exp : Node_Id; + Exp_Q : Node_Id; + Lhs : Node_Id; + Res : List_Id; begin Lhs := @@ -2677,57 +2676,48 @@ package body Exp_Ch3 is Name => Lhs, Expression => Exp)); - Set_No_Ctrl_Actions (First (Res)); - Exp_Q := Unqualify (Exp); - -- Adjust the tag if tagged (because of possible view conversions). - -- Suppress the tag adjustment when not Tagged_Type_Expansion because - -- tags are represented implicitly in objects, and when the record is - -- initialized with a raise expression. - - if Is_Tagged_Type (Typ) - and then Tagged_Type_Expansion - and then Nkind (Exp_Q) /= N_Raise_Expression - then --- Get the relevant type for the call to --- Make_Tag_Assignment_From_Type, which, for concurrent types is --- their corresponding record. - -declare - T : Entity_Id := Underlying_Type (Typ); -begin - if Ekind (T) in E_Protected_Type | E_Task_Type then - T := Corresponding_Record_Type (T); - end if; - - Append_To (Res, - Make_Tag_Assignment_From_Type - (Default_Loc, -New_Copy_Tree (Lhs, Ne
[COMMITTED 09/30] ada: C++ exception hierarchies: adjust for gnat-llvm
From: Alexandre Oliva gnat-llvm doesn't support C++ imports, so arrange for the GNAT.CPP* units to be omitted from gnat-llvm builds. Drop G++ exception interoperability support from raise-gcc.c, so as to not require the GNAT.CPP* units that define some of the required symbols. Co-Authored-By: Olivier Hainque gcc/ada/ChangeLog: * Makefile.rtl (LLVM_BUILD): Define based on LLVM_CONFIG. (GNATRTL_NONTASKING_OBJS): Make g-cpp, g-cppstd, and g-cstyin conditional on -gcc or -arm EH, and on no LLVM_BUILD. * raise-gcc.c (GXX_EH_INTEROP): Define as 0 on gnat-llvm or CERT, and 1 otherwise. (__gnat_get_cxx_dependent_exception) Omit on !GXX_EH_INTEROP. (__gnat_maybe_get_cxx_dependent_exception): Likewise. (__gnat_get_cxx_exception_type_info): Likewise. (__gnat_obtain_caught_object): Likewise. (is_handled_by): Omit eid parameter and G++ interop on !GXX_EH_INTEROP. Adjust callers. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/Makefile.rtl | 23 ++- gcc/ada/raise-gcc.c | 40 +--- 2 files changed, 47 insertions(+), 16 deletions(-) diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl index cd112d3c744..34bc5c80431 100644 --- a/gcc/ada/Makefile.rtl +++ b/gcc/ada/Makefile.rtl @@ -26,6 +26,12 @@ ifndef ADAC ADAC=$(CC) endif +ifeq ($(LLVM_CONFIG),) +LLVM_BUILD := $(shell $(ADAC) -v | grep ^llvm-gcc) +else +LLVM_BUILD := llvm-gcc +endif + # Objects needed only for tasking GNATRTL_TASKING_OBJS= \ a-dispat$(objext) \ @@ -419,9 +425,6 @@ GNATRTL_NONTASKING_OBJS= \ g-comlin$(objext) \ g-comver$(objext) \ g-crc32$(objext) \ - g-cpp$(objext) \ - g-cppstd$(objext) \ - g-cstyin$(objext) \ g-ctrl_c$(objext) \ g-curexc$(objext) \ g-debpoo$(objext) \ @@ -3152,7 +3155,12 @@ ifeq ($(EH_MECHANISM),-gcc) s-excmac.ads # include @@ -975,6 +991,8 @@ __gnat_exception_language_is_ada (_Unwind_Exception *except) return (exception_class_eq (except, GNAT_EXCEPTION_CLASS)); } +#if GXX_EH_INTEROP + /* Check whether *THROWN_PTR of EXCEPT_TYPEINFO is to be caught by a CHOICE_TYPEINFO handler under LANG convention. Implemented by GNAT.CPP_Exception.Convert_Caught_Object. */ @@ -1099,20 +1117,17 @@ __gnat_obtain_caught_object (int *success_p, void **thrown_ptr_p, } } +#endif /* GXX_EH_INTEROP */ + /* Return how CHOICE matches PROPAGATED_EXCEPTION. */ static enum action_kind is_handled_by (Exception_Id choice, -#ifndef CERT +#if GXX_EH_INTEROP Exception_Id *eid, -#endif +#endif /* GXX_EH_INTEROP */ _Unwind_Exception *propagated_exception) { -#ifndef CERT - char lang; - bool primary; -#endif - /* All others choice match everything. */ if (choice == GNAT_ALL_OTHERS) return handler; @@ -1144,7 +1159,10 @@ is_handled_by (Exception_Id choice, ) return handler; -#ifndef CERT +#if GXX_EH_INTEROP + char lang; + bool primary; + /* C++ exception occurrences with exact (C) or base (B) type matching. */ if (((primary = exception_class_eq (propagated_exception, CXX_EXCEPTION_CLASS)) @@ -1171,7 +1189,7 @@ is_handled_by (Exception_Id choice, return handler; } } -#endif +#endif /* GXX_EH_INTEROP */ return nothing; } @@ -1259,9 +1277,9 @@ get_action_description_for (_Unwind_Ptr ip, = (Exception_Id) get_ttype_entry_for (region, ar_filter); act = is_handled_by (choice, -#ifndef CERT +#if GXX_EH_INTEROP eid, -#endif +#endif /* GXX_EH_INTEROP */ uw_exception); if (act != nothing) { -- 2.43.0
[COMMITTED 03/30] ada: Fix finalization issue introduced by previous change
From: Eric Botcazou When detecting calls to subprograms specified for aspects of a type, the entity denoted by the aspects must go through Ultimate_Alias, since that of the name of the calls did the same. gcc/ada/ChangeLog: * exp_ch6.adb (Expand_Call_Helper): Call Ultimate_Alias for the detection of calls to subprograms specified for Constant_Indexing. * exp_util.adb (Is_Indexed_Container): Likewise. (Is_Iterated_Container): Likewise for Default_Iterator. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_ch6.adb | 5 +++-- gcc/ada/exp_util.adb | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb index c65ea91cb13..e6e5d50dbbf 100644 --- a/gcc/ada/exp_ch6.adb +++ b/gcc/ada/exp_ch6.adb @@ -5331,16 +5331,17 @@ package body Exp_Ch6 is elsif Nkind (Call_Node) = N_Function_Call and then Nkind (Parent (Call_Node)) = N_Function_Call +and then Is_Entity_Name (Name (Parent (Call_Node))) then declare Aspect : constant Node_Id := Find_Value_Of_Aspect (Etype (Call_Node), Aspect_Constant_Indexing); +Subp : constant Entity_Id := Entity (Name (Parent (Call_Node))); begin if Present (Aspect) - and then Is_Entity_Name (Name (Parent (Call_Node))) - and then Entity (Name (Parent (Call_Node))) = Entity (Aspect) + and then Subp = Ultimate_Alias (Entity (Aspect)) then -- Resolution is now finished, make sure we don't start -- analysis again because of the duplication. diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb index a9424b95880..25f9f077174 100644 --- a/gcc/ada/exp_util.adb +++ b/gcc/ada/exp_util.adb @@ -8945,7 +8945,7 @@ package body Exp_Util is Aspect := Find_Value_Of_Aspect (Typ, Aspect_Constant_Indexing); if Present (Aspect) then -Index := Entity (Aspect); +Index := Ultimate_Alias (Entity (Aspect)); -- Examine the statements following the container object and -- look for a call to the default indexing routine where the @@ -9030,7 +9030,7 @@ package body Exp_Util is Aspect := Find_Value_Of_Aspect (Typ, Aspect_Default_Iterator); if Present (Aspect) then -Iter := Entity (Aspect); +Iter := Ultimate_Alias (Entity (Aspect)); -- Examine the statements following the container object and -- look for a call to the default iterate routine where the -- 2.43.0
[COMMITTED 21/30] ada: Fix memory leak when failing to initialize newly allocated memory
From: Eric Botcazou This makes the compiler generate cleanup code to deallocate the memory when the evaluation of the expression of an allocator raises an exception, if the expression is a call to a function that may raise, i.e. is not declared with the No_Raise aspect/pragma. This can also be disabled by means of -gnatdQ. gcc/ada/ChangeLog: * debug.adb (dQ): Document usage. * exp_ch4.ads (Build_Cleanup_For_Allocator): New declaration. * exp_ch4.adb (Build_Cleanup_For_Allocator): New procedure. (Expand_Allocator_Expression): Build a cleanup to deallocate the memory when the evaluation of the expression raises an exception. * exp_ch6.adb (Make_Build_In_Place_Call_In_Allocator): Likewise. * exp_util.adb (Build_Allocate_Deallocate_Proc): Do not generate the detachment if the deallocation is for the cleanup of an allocator. * gen_il-fields.ads (Opt_Field_Enum): Add For_Allocator. * gen_il-gen-gen_nodes.adb (N_Free_Statement): Likewise. * sinfo.ads (For_Allocator): Document usage on N_Free_Statement. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/debug.adb| 5 +- gcc/ada/exp_ch4.adb | 123 +-- gcc/ada/exp_ch4.ads | 9 +++ gcc/ada/exp_ch6.adb | 16 +++- gcc/ada/exp_util.adb | 5 ++ gcc/ada/gen_il-fields.ads| 1 + gcc/ada/gen_il-gen-gen_nodes.adb | 3 +- gcc/ada/sinfo.ads| 5 ++ 8 files changed, 140 insertions(+), 27 deletions(-) diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb index 2d0c32b0f09..c4b6d035e5c 100644 --- a/gcc/ada/debug.adb +++ b/gcc/ada/debug.adb @@ -74,7 +74,7 @@ package body Debug is -- dN No file name information in exception messages -- dO Output immediate error messages -- dP Do not check for controlled objects in preelaborable packages - -- dQ + -- dQ Do not generate cleanups for qualified expressions of allocators -- dR Bypass check for correct version of s-rpc -- dS Never convert numbers to machine numbers in Sem_Eval -- dT Convert to machine numbers only for constant declarations @@ -640,6 +640,9 @@ package body Debug is -- in preelaborable packages, but this restriction is a huge pain, -- especially in the predefined library units. + -- dQ Do not generate cleanups to deallocate the memory in case qualified + -- expressions of allocators raise an exception. + -- dR Bypass the check for a proper version of s-rpc being present -- to use the -gnatz? switch. This allows debugging of the use -- of stubs generation without needing to have GLADE (or some diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb index 18656ea24fd..75d79019f80 100644 --- a/gcc/ada/exp_ch4.adb +++ b/gcc/ada/exp_ch4.adb @@ -437,6 +437,37 @@ package body Exp_Ch4 is return; end Build_Boolean_Array_Proc_Call; + - + -- Build_Cleanup_For_Allocator -- + - + + function Build_Cleanup_For_Allocator + (Loc : Source_Ptr; + Obj_Id : Entity_Id; + Pool: Entity_Id; + Actions : List_Id) return Node_Id + is + Free_Stmt : constant Node_Id := +Make_Free_Statement (Loc, New_Occurrence_Of (Obj_Id, Loc)); + + begin + Set_For_Allocator (Free_Stmt); + Set_Storage_Pool (Free_Stmt, Pool); + + return +Make_Block_Statement (Loc, + Handled_Statement_Sequence => +Make_Handled_Sequence_Of_Statements (Loc, + Statements => Actions, + Exception_Handlers => New_List ( +Make_Exception_Handler (Loc, + Exception_Choices => New_List ( +Make_Others_Choice (Loc)), + Statements=> New_List ( +Free_Stmt, +Make_Raise_Statement (Loc)); + end Build_Cleanup_For_Allocator; + --- -- Build_Eq_Call -- --- @@ -574,7 +605,12 @@ package body Exp_Ch4 is T : constant Entity_Id := Entity (Indic); PtrT : constant Entity_Id := Etype (N); DesigT : constant Entity_Id := Designated_Type (PtrT); + Pool : constant Node_Id:= Storage_Pool (N); Special_Return : constant Boolean:= For_Special_Return_Object (N); + Special_Pool : constant Boolean:= +Present (Pool) + and then +(Is_RTE (Pool, RE_RS_Pool) or else Is_RTE (Pool, RE_SS_Pool)); Static_Match : constant Boolean:= not Is_Constrained (DesigT) or else Subtypes_Statically_Match (T, DesigT); @@ -586,8 +622,7 @@ package body Exp_Ch4 is -- of Exp into the newly allocated memory. procedure Build_Explicit_Assignment (Temp : Entity_Id;
[COMMITTED 27/30] ada: Fix incorrect RM reference in s-imagef.adb
From: Bob Duff gcc/ada/ChangeLog: * libgnat/s-imagef.adb (Set_Image_Integer): Change "RM A.3.10" to be "RM A.10.9". Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/libgnat/s-imagef.adb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/ada/libgnat/s-imagef.adb b/gcc/ada/libgnat/s-imagef.adb index 56992064fb1..06cbed11369 100644 --- a/gcc/ada/libgnat/s-imagef.adb +++ b/gcc/ada/libgnat/s-imagef.adb @@ -115,7 +115,7 @@ package body System.Image_F is -- Q * 10**(-D) -- This value can be written to the output file or to the result string - -- according to the format described in RM A.3.10. The details of this + -- according to the format described in RM A.10.9. The details of this -- operation are omitted here. -- A 64-bit value can represent all integers with 18 decimal digits, but -- 2.43.0
[COMMITTED 26/30] ada: Reduce footprint of C++ exception interoperation support
From: Alexandre Oliva The initial C++ base-type exception interoperation support change brought all of GNAT.CPP* along with raise-gcc, because of [__gnat_]Convert_Caught_Object. Move that private but pragma-exported function to GNAT.CPP.Std.Type_Info, so that it can rely on the C++ virtual/dispatch calls that justified the introduction of the Ada wrapper type, to avoid emulating virtual calls in C or bringing in a dependency on the C++ compiler and runtime. Drop the CharPtr package instantiation, that brought a huge amount of unnecessary code, and use string and storage primitives instead, using the strcmp builtin directly for the C string compares. Move the conversion to Ada String in Name to the wrapper interface in GNAT.CPP.Std, adjusting the private internal type to shave off a few more bytes from the only unit that raise-gcc will still need. Finally, disable heap finalization for Type_Info_Ptr, to avoid dragging in all of the finalization code. Thank to Eric Botcazou for the suggestion. gcc/ada/ChangeLog: * libgnat/g-cppexc.adb (Convert_Caught_Object): Move... * libgnat/g-cstyin.adb (Convert_Caught_Object): ... here. Use object call notation. (strcmp): New. (Char_Arr, CharPtr, Char_Pointer, To_chars_ptr): Drop. Do not import Interfaces.C.Pointers. (To_Pointer): Convert from System.Address. (Name_Starts_With_Asterisk): Rename local variable. (Name_Past_Asterisk): Rewrite with System.Address and strcmp. Import System.Storage_Elements. (Equals): Use strcmp. (Before): Fix logic error. Use strcmp. (Name): Move conversion to String... * libgnat/g-cppstd.adb (Name): ... here. Import Interfaces.C.Strings. * libgnat/g-cppstd.ads (Type_Info_Ptr): Disable heap finalization. * libgnat/g-cstyin.ads (Name): Change return type. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/libgnat/g-cppexc.adb | 40 -- gcc/ada/libgnat/g-cppstd.adb | 3 +- gcc/ada/libgnat/g-cppstd.ads | 4 +- gcc/ada/libgnat/g-cstyin.adb | 80 ++-- gcc/ada/libgnat/g-cstyin.ads | 2 +- 5 files changed, 65 insertions(+), 64 deletions(-) diff --git a/gcc/ada/libgnat/g-cppexc.adb b/gcc/ada/libgnat/g-cppexc.adb index 11022880670..bad748fdfe3 100644 --- a/gcc/ada/libgnat/g-cppexc.adb +++ b/gcc/ada/libgnat/g-cppexc.adb @@ -267,44 +267,4 @@ package body GNAT.CPP_Exceptions is end Get_Type_Info; - function Convert_Caught_Object (Choice, Except : Type_Info_Ptr; - Thrown : in out Address; - Lang : Character) - return Interfaces.C.C_bool; - pragma Export (Cpp, Convert_Caught_Object, "__gnat_convert_caught_object"); - -- Convert the exception object at Thrown, under Lang convention, - -- from type Except to type Choice, adjusting Thrown as needed and - -- returning True, or returning False in case the conversion fails. - - --- - -- Convert_Caught_Object -- - --- - - function Convert_Caught_Object (Choice, Except : Type_Info_Ptr; - Thrown : in out Address; - Lang : Character) - return Interfaces.C.C_bool is - begin - if Equals (Choice, Except) then - return C_bool'(True); - end if; - - if Lang = 'B' then - if Is_Pointer_P (Except) then -declare - Thrown_Indirect : Address; - for Thrown_Indirect'Address use Thrown; -begin - Thrown := Thrown_Indirect; -end; - end if; - - if Do_Catch (Choice, Except, Thrown, 1) then -return C_bool'(True); - end if; - end if; - - return C_bool'(False); - end Convert_Caught_Object; - end GNAT.CPP_Exceptions; diff --git a/gcc/ada/libgnat/g-cppstd.adb b/gcc/ada/libgnat/g-cppstd.adb index 000dd474c5c..8cb64edaffe 100644 --- a/gcc/ada/libgnat/g-cppstd.adb +++ b/gcc/ada/libgnat/g-cppstd.adb @@ -34,6 +34,7 @@ with GNAT.CPP.Std.Type_Info; with Ada.Unchecked_Conversion; +with Interfaces.C.Strings; use Interfaces.C.Strings; package body GNAT.CPP.Std is -- @@ -53,7 +54,7 @@ package body GNAT.CPP.Std is function Name (this : Type_Info_Ptr) return String - is (this.all.Name); + is (Value (this.all.Name)); --- -- Before --- diff --git a/gcc/ada/libgnat/g-cppstd.ads b/gcc/ada/libgnat/g-cppstd.ads index 63ef03e43dd..be8907c4f77 100644 --- a/gcc/ada/libgnat/g-cppstd.ads +++ b/gcc/ada/libgnat/g-cppstd.ads @@ -50,7 +50,8 @@ package GNAT.CPP.Std is function Name (this : Type_Info_Ptr) -- return Interfaces.C.Strings.chars_ptr;
[COMMITTED 05/30] ada: Use inheritance in Gen_IL
From: Bob Duff In Gen_IL, detect cases where fields could be inherited from an abstract type instead of being defined in each of two or more descendants of that type. Raise Illegal when that is the case, except in specific cases called out as exceptions to this rule. For every such case, either update the types declared in Gen_Nodes and Gen_Entities to use inheritance, or add the case to the list of exceptions where we do not want to use inheritance. gcc/ada/ChangeLog: * gen_il-internals.ads: Split Fields field into two fields Imm_Fields and Fields. * gen_il-gen.adb: Modify the field-inheritance algorithm to inherit at each level of the type hierarchy, rather than just inheriting into concrete types. For example, if C is a concrete type derived from B, which is in turn derived from A, we now set the Fields of B to include those of A. (We had always set the Fields of C to include those of A and B, and we still do that.) (Compute_Fields_For_One_Type): Detect cases where a given field is declared for all descendants of a given abstract type, in which case we should consider declaring it in the abstract type, and inheriting it in those descendants. (Exception_To_Inheritance_Rule): These are the cases where we could inherit, but we don't want to. * gen_il-gen-gen_nodes.adb: Move fields up the type hierarchy, so they are inherited instead of being defined separately. * gen_il-gen-gen_entities.adb: Likewise. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gen_il-gen-gen_entities.adb | 18 +- gcc/ada/gen_il-gen-gen_nodes.adb| 291 gcc/ada/gen_il-gen.adb | 270 +++--- gcc/ada/gen_il-internals.ads| 6 +- 4 files changed, 289 insertions(+), 296 deletions(-) diff --git a/gcc/ada/gen_il-gen-gen_entities.adb b/gcc/ada/gen_il-gen-gen_entities.adb index f887e0c3c99..8cf66b2611d 100644 --- a/gcc/ada/gen_il-gen-gen_entities.adb +++ b/gcc/ada/gen_il-gen-gen_entities.adb @@ -302,7 +302,8 @@ begin -- Gen_IL.Gen.Gen_Entities -- but not getters; the Ekind is modified before any such getters are -- called. - Ab (Exception_Or_Object_Kind, Entity_Kind); + Ab (Exception_Or_Object_Kind, Entity_Kind, + (Sm (Esize, Uint))); Ab (Object_Kind, Exception_Or_Object_Kind, (Sm (Current_Value, Node_Id), @@ -313,7 +314,6 @@ begin -- Gen_IL.Gen.Gen_Entities Sm (Component_Clause, Node_Id), Sm (Corresponding_Record_Component, Node_Id), Sm (Entry_Formal, Node_Id), -Sm (Esize, Uint), Sm (Interface_Name, Node_Id), Sm (Normalized_First_Bit, Uint), Sm (Normalized_Position, Uint), @@ -334,7 +334,6 @@ begin -- Gen_IL.Gen.Gen_Entities Ab (Allocatable_Kind, Object_Kind, (Sm (Activation_Record_Component, Node_Id), Sm (Alignment, Unat), -Sm (Esize, Uint), Sm (Finalization_Master_Node, Node_Id), Sm (Interface_Name, Node_Id), Sm (Is_Finalized_Transient, Flag), @@ -403,7 +402,6 @@ begin -- Gen_IL.Gen.Gen_Entities Sm (Default_Expr_Function, Node_Id), Sm (Default_Value, Node_Id), Sm (Entry_Component, Node_Id), -Sm (Esize, Uint), Sm (Extra_Accessibility, Node_Id), Sm (Extra_Constrained, Node_Id), Sm (Extra_Formal, Node_Id), @@ -433,8 +431,7 @@ begin -- Gen_IL.Gen.Gen_Entities Ab (Formal_Object_Kind, Object_Kind, -- Generic formal objects are also objects - (Sm (Entry_Component, Node_Id), -Sm (Esize, Uint))); + (Sm (Entry_Component, Node_Id))); Cc (E_Generic_In_Out_Parameter, Formal_Object_Kind, -- A generic in out parameter, created by the use of a generic in out @@ -993,7 +990,8 @@ begin -- Gen_IL.Gen.Gen_Entities Sm (Static_Call_Helper, Node_Id), Sm (SPARK_Pragma, Node_Id), Sm (SPARK_Pragma_Inherited, Flag), -Sm (Subps_Index, Unat))); +Sm (Subps_Index, Unat), +Sm (LSP_Subprogram, Node_Id))); Cc (E_Function, Subprogram_Kind, -- A function, created by a function declaration or a function body @@ -1020,7 +1018,6 @@ begin -- Gen_IL.Gen.Gen_Entities Sm (Is_Predicate_Function, Flag), Sm (Is_Primitive_Wrapper, Flag), Sm (Is_Private_Primitive, Flag), -Sm (LSP_Subprogram, Node_Id), Sm (Mechanism, Mechanism_Type), Sm (Next_Inlined_Subprogram, Node_Id), Sm (Original_Protected_Subprogram, Node_Id), @@ -1039,8 +1036,7 @@ begin -- Gen_IL.Gen.Gen_Entities -- defined concatenation operator created whenever an array is declared. -- We do not make normal derived operators explicit in the tree, but the -- concatenation operators are made explicit. - (Sm (Extra_Accessibility_Of_Result, Node_Id), -Sm (LSP_Subpro
[COMMITTED 23/30] ada: Small housekeeping work in Exp_Aggr
From: Eric Botcazou This moves a few declarations around and tweaks a few comments. gcc/ada/ChangeLog: * exp_aggr.adb (Case_Table_Type): Fix reference in comment. (In_Place_Assign_OK): Move declaration around. (Is_Build_In_Place_Aggregate_Return): Likewise and adjust. (Expand_Array_Aggregate): Streamline for the sake of consistency. (Aggr_Assignment_OK_For_Backend): Remove reference to Gigi/gcc. (Backend_Processing_Possible): Likewise. (Expand_Array_Aggregate): Add comment. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_aggr.adb | 41 +++-- 1 file changed, 19 insertions(+), 22 deletions(-) diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb index dae9d4e5009..fb5cfc6d1f0 100644 --- a/gcc/ada/exp_aggr.adb +++ b/gcc/ada/exp_aggr.adb @@ -85,7 +85,7 @@ package body Exp_Aggr is end record; type Case_Table_Type is array (Nat range <>) of Case_Bounds; - -- Table type used by Check_Case_Choices procedure + -- Table type used by Sort_Case_Table procedure function Get_Base_Object (N : Node_Id) return Entity_Id; -- Return the base object, i.e. the outermost prefix object, that N refers @@ -106,6 +106,17 @@ package body Exp_Aggr is -- Comp_Typ of aggregate N. Init_Expr denotes the initialization -- expression of the component. All generated code is added to Stmts. + function In_Place_Assign_OK + (N : Node_Id; + Target_Object : Entity_Id := Empty) return Boolean; + -- Predicate to determine whether an aggregate assignment can be done in + -- place, because none of the new values can depend on the components of + -- the target of the assignment. + + function Is_Build_In_Place_Aggregate_Return (N : Node_Id) return Boolean; + -- Return True if N is a simple return whose expression needs to be built + -- in place in the return object, assuming the expression is an aggregate. + function Is_Static_Dispatch_Table_Aggregate (N : Node_Id) return Boolean; -- Returns true if N is an aggregate used to initialize the components -- of a statically allocated dispatch table. @@ -162,14 +173,6 @@ package body Exp_Aggr is -- Local subprograms for Record Aggregate Expansion -- -- - function Is_Build_In_Place_Aggregate_Return (N : Node_Id) return Boolean; - -- Return True if N is a simple return whose expression needs to be built - -- in place in the return object, assuming the expression is an aggregate, - -- possibly qualified or a dependent expression of a conditional expression - -- (and possibly recursively). Such qualified and conditional expressions - -- are transparent for this purpose since an enclosing return is propagated - -- resp. distributed into these expressions by the expander. - function Build_Record_Aggr_Code (N : Node_Id; Typ : Entity_Id; @@ -210,13 +213,6 @@ package body Exp_Aggr is -- defaults. An aggregate for a type with mutable components must be -- expanded into individual assignments. - function In_Place_Assign_OK - (N : Node_Id; - Target_Object : Entity_Id := Empty) return Boolean; - -- Predicate to determine whether an aggregate assignment can be done in - -- place, because none of the new values can depend on the components of - -- the target of the assignment. - procedure Initialize_Discriminants (N : Node_Id; Typ : Entity_Id); -- If the type of the aggregate is a type extension with renamed discrimi- -- nants, we must initialize the hidden discriminants of the parent. @@ -310,8 +306,7 @@ package body Exp_Aggr is -- these are cases we handle in there. procedure Expand_Array_Aggregate (N : Node_Id); - -- This is the top-level routine for array aggregate expansion. - -- N is the N_Aggregate node to be expanded. + -- This is the top-level routine for array aggregate expansion procedure Expand_Delta_Array_Aggregate (N : Node_Id; Deltas : List_Id); -- This is the top-level routine for delta array aggregate expansion @@ -351,8 +346,8 @@ package body Exp_Aggr is -- Aggr_Assignment_OK_For_Backend -- - -- Back-end processing by Gigi/gcc is possible only if all the following - -- conditions are met: + -- Back-end processing is possible only if all the following conditions + -- are met: --1. N consists of a single OTHERS choice, possibly recursively, or -- of a single choice, possibly recursively, if it is surrounded by @@ -806,8 +801,8 @@ package body Exp_Aggr is -- Backend_Processing_Possible -- - - -- Backend processing by Gigi/gcc is possible only if all the following - -- conditions are met: + -- Back-end processing is possible only if all the following conditions + -- are met:
[COMMITTED 22/30] ada: cleanup documentation for shift and rotate
From: Bob Duff Documentation updated. gcc/ada/ChangeLog: * sinfo.ads (Shift_Count_OK): Update comments. (Is_Power_Of_2_For_Shift): Likewise. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sinfo.ads | 34 -- 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/gcc/ada/sinfo.ads b/gcc/ada/sinfo.ads index 3db084ef391..2b82ab345d6 100644 --- a/gcc/ada/sinfo.ads +++ b/gcc/ada/sinfo.ads @@ -1812,16 +1812,15 @@ package Sinfo is --A flag set on an aggregate that uses parentheses as delimiters -- Is_Power_Of_2_For_Shift - --A flag present only in N_Op_Expon nodes. It is set when the - --exponentiation is of the form 2 ** N, where the type of N is an - --unsigned integral subtype whose size does not exceed the size of - --Standard_Integer (i.e. a type that can be safely converted to - --Natural), and the exponentiation appears as the right operand of an - --integer multiplication or an integer division where the dividend is - --unsigned. It is also required that overflow checking is off for both - --the exponentiation and the multiply/divide node. If this set of - --conditions holds, and the flag is set, then the division or - --multiplication can be (and is) converted to a shift. + --Present in N_Op_Expon nodes. It is set when the exponentiation is of + --the form 2 ** N, where the type of N is an integer subtype whose size + --does not exceed the size of Standard.Integer (i.e. a type that can be + --safely converted to Natural), and the exponentiation appears as the + --right operand of an integer multiplication or an integer division + --where the dividend is unsigned. It is also required that overflow + --checking is off for both the exponentiation and the multiply/divide + --node. If this set of conditions holds, and the flag is set, then the + --division or multiplication is converted to a shift. -- Is_Preelaborable_Call --Present in call marker nodes. Set when the related call is non-static @@ -2291,7 +2290,8 @@ package Sinfo is --known to be in range, i.e. is in the range from zero to word length --minus one. If this flag is not set, then the shift count may be --outside this range, i.e. larger than the word length, and the code - --must ensure that such shift counts give the appropriate result. + --generated by the back end must ensure that such shift counts give the + --appropriate result. -- Source_Type --Used in an N_Validate_Unchecked_Conversion node to point to the @@ -7738,15 +7738,13 @@ package Sinfo is -- operator names. Note that for a given shift operation, one node -- covers all possible types, as for normal operators. - -- Note: it is perfectly permissible for the expander to generate - -- shift operation nodes directly, in which case they will be analyzed - -- and parsed in the usual manner. - -- Sprint syntax: shift-function-name!(expr, count) - -- Note: the Left_Opnd field holds the first argument (the value to - -- be shifted). The Right_Opnd field holds the second argument (the - -- shift count). The Chars field is the name of the intrinsic function. + -- The Left_Opnd field holds the first argument (the value to be + -- shifted). The Right_Opnd field holds the second argument (the shift + -- count). The Chars field is the name of the intrinsic function. + -- The back end is responsible for generating correct code in the case + -- of large shift counts (in which case Shift_Count_OK will be False). -- N_Op_Rotate_Left -- Sloc points to the function name -- 2.43.0
[COMMITTED 17/30] ada: Crash in prefix notation with access to class-wide object
From: Javier Miranda The compiler crashes analyzing a prefix notation call when its prefix is an access to a class-wide object, an actual parameter is missing, and the sources are compiled with language extensions (-gnatX) and full errors (-gnatf). gcc/ada/ChangeLog: * sem_ch4.adb (Try_Object_Operation): if no candidate interpretation matches the context, redo the same analysis with Report_Error True to report the error. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch4.adb | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb index 18b3a4fc22f..94e43429814 100644 --- a/gcc/ada/sem_ch4.adb +++ b/gcc/ada/sem_ch4.adb @@ -10498,6 +10498,11 @@ package body Sem_Ch4 is Set_Etype (Subprog, Any_Type); Set_Parent (New_Call_Node, Parent (Node_To_Replace)); + -- Perform the analysis searching for a candidate with Report_Error + -- set to False (see above); if no candidate interpretation matches + -- the context, this analysis will be redone with Report_Error set + -- to True to provide additional information. + if not Is_Overloaded (Obj) then Try_One_Prefix_Interpretation (Obj_Type); @@ -10537,18 +10542,22 @@ package body Sem_Ch4 is if All_Errors_Mode then Report_Error := True; -if Try_Primitive_Operation - (Call_Node => New_Call_Node, - Node_To_Replace => Node_To_Replace) - or else -Try_Class_Wide_Operation - (Call_Node => New_Call_Node, - Node_To_Replace => Node_To_Replace) -then - null; -end if; +if not Is_Overloaded (Obj) then + Try_One_Prefix_Interpretation (Obj_Type); +else + declare + I : Interp_Index; + It : Interp; + begin + Get_First_Interp (Obj, I, It); + while Present (It.Nam) loop + Try_One_Prefix_Interpretation (It.Typ); + Get_Next_Interp (I, It); + end loop; + end; +end if; else Analyze_One_Call (N => New_Call_Node, -- 2.43.0
[COMMITTED 14/30] ada: Remove level attribute from Rules in the SARIF report
From: Viljar Indus A Rule object in the SARIF report does not have a level attribute. Result objects are the elements in the SARIF reprot that have a level attribute that ultimately determines the level of each diagnostic object. Rules can have a defaultConfiguration attribute which has a level attribute that can be overridden in multiple ways. This can make the overall report more complex than it needs to be. It is simpler to remove the attribute from rules where it does not really matter and add it back in under the defaultConfiguration when there is an explicit need for it. gcc/ada/ChangeLog: * diagnostics-sarif_emitter.adb (Print_Rule): Remove printing of the level attribute since it does not match the SARIF schema. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/diagnostics-sarif_emitter.adb | 6 -- 1 file changed, 6 deletions(-) diff --git a/gcc/ada/diagnostics-sarif_emitter.adb b/gcc/ada/diagnostics-sarif_emitter.adb index b6035c2970d..f0be97d1a1e 100644 --- a/gcc/ada/diagnostics-sarif_emitter.adb +++ b/gcc/ada/diagnostics-sarif_emitter.adb @@ -229,12 +229,10 @@ package body Diagnostics.SARIF_Emitter is procedure Print_Rule (Diag : Diagnostic_Type); -- Print a rule node that consists of the following attributes: -- * ruleId - -- * level -- * name -- -- { --"id": , - --"level": , --"name": -- }, @@ -1006,10 +1004,6 @@ package body Diagnostics.SARIF_Emitter is Write_Char (','); NL_And_Indent; - Write_String_Attribute ("level", Kind_To_String (Diag)); - Write_Char (','); - NL_And_Indent; - if Human_Id = null then Write_String_Attribute ("name", "Uncategorized_Diagnostic"); else -- 2.43.0
[COMMITTED 10/30] ada: Cleanup preanalysis of static expressions
From: Javier Miranda Complete previous patch; required to avoid regressions in GNATProve. gcc/ada/ChangeLog: * sem_ch6.adb (Analyze_Expression_Function): Set the parent of the new node to be the parent of the original to get the proper context, which is needed for complete error reporting and for semantic analysis. Patch suggested by Eric Botcazou. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch6.adb | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb index ab12f22ad51..2e619d89444 100644 --- a/gcc/ada/sem_ch6.adb +++ b/gcc/ada/sem_ch6.adb @@ -427,6 +427,12 @@ package body Sem_Ch6 is -- As elsewhere, we do not emit freeze nodes within a generic unit. if not Inside_A_Generic then +-- Set the parent of the new node to be the parent of the original +-- to get the proper context, which is needed for complete error +-- reporting and for semantic analysis. + +Set_Parent (New_Body, Parent (N)); + Freeze_Expr_Types (Def_Id => Def_Id, Typ=> Typ, -- 2.43.0
[COMMITTED 19/30] ada: Declare that the new argument may not be used
From: Tonu Naks gcc/ada/ChangeLog: * adaint.c (__gnat_locate_exec_on_path): modify function signature Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/adaint.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c index 0459956ff5b..b906ac092ef 100644 --- a/gcc/ada/adaint.c +++ b/gcc/ada/adaint.c @@ -3079,7 +3079,8 @@ __gnat_locate_exec (char *exec_name, char *path_val) /* Locate an executable using the Systems default PATH. */ char * -__gnat_locate_exec_on_path (char *exec_name, int current_dir_on_windows) +__gnat_locate_exec_on_path (char *exec_name, + int current_dir_on_windows ATTRIBUTE_UNUSED) { char *apath_val; @@ -3110,10 +3111,6 @@ __gnat_locate_exec_on_path (char *exec_name, int current_dir_on_windows) } #else - /* Tell the compiler that we are not going to use this parameter - on non-windows platforms. */ - (void)current_dir_on_windows; - const char *path_val = getenv ("PATH"); /* If PATH is not defined, proceed with __gnat_locate_exec anyway, so we can -- 2.43.0
[PATCH v3 5/6] c++/modules: Add testcase for fixed ICE [PR116568]
This ICE was fixed by ensuring that the lambdas had LAMBDA_EXPR_EXTRA_SCOPE properly set. PR c++/116568 gcc/testsuite/ChangeLog: * g++.dg/modules/lambda-8.h: New test. * g++.dg/modules/lambda-8_a.H: New test. * g++.dg/modules/lambda-8_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/testsuite/g++.dg/modules/lambda-8.h | 7 +++ gcc/testsuite/g++.dg/modules/lambda-8_a.H | 5 + gcc/testsuite/g++.dg/modules/lambda-8_b.C | 5 + 3 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8.h create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8_a.H create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8_b.C diff --git a/gcc/testsuite/g++.dg/modules/lambda-8.h b/gcc/testsuite/g++.dg/modules/lambda-8.h new file mode 100644 index 000..0c66f053b20 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/lambda-8.h @@ -0,0 +1,7 @@ +template struct S { + template static constexpr auto x = []{}; + template using t = decltype([]{}); +}; + +inline auto x = S::x; +using t = S::t; diff --git a/gcc/testsuite/g++.dg/modules/lambda-8_a.H b/gcc/testsuite/g++.dg/modules/lambda-8_a.H new file mode 100644 index 000..d20958ee140 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/lambda-8_a.H @@ -0,0 +1,5 @@ +// PR c++/116568 +// { dg-additional-options "-fmodules-ts -std=c++20" } +// { dg-module-cmi {} } + +#include "lambda-8.h" diff --git a/gcc/testsuite/g++.dg/modules/lambda-8_b.C b/gcc/testsuite/g++.dg/modules/lambda-8_b.C new file mode 100644 index 000..05ea4afd8c1 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/lambda-8_b.C @@ -0,0 +1,5 @@ +// PR c++/116568 +// { dg-additional-options "-fmodules-ts -std=c++20" } + +#include "lambda-8.h" +import "lambda-8_a.H"; -- 2.47.0
Re: [Ping, Fortran, Patch, PR114612, v1] Fix missing deep-copy for allocatable components of derived types having cycles.
Hi all, attached patch has been rebased to latest trunk. Just pinging! Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? - Andre On Fri, 13 Dec 2024 12:10:58 +0100 Andre Vehreschild wrote: > Hi all, > > attached patch fixes deep-copying (or rather its former absence) for > allocatable components of derived types having cyclic dependencies. > > Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline? > > Regards, > Andre > -- > Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de From 13b7fa360ccd246bb6dc93c0fc177b85d12f578f Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Fri, 13 Dec 2024 12:07:01 +0100 Subject: [PATCH] Fortran: Ensure deep copy of allocatable components in cylic types [PR114612] gcc/fortran/ChangeLog: PR fortran/114612 * trans-array.cc (structure_alloc_comps): Ensure deep copy is also done for types having cycles. gcc/testsuite/ChangeLog: * gfortran.dg/alloc_comp_deep_copy_4.f03: New test. --- gcc/fortran/trans-array.cc| 7 ++--- .../gfortran.dg/alloc_comp_deep_copy_4.f03| 29 +++ 2 files changed, 32 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03 diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 67e139664c2..764e51989cb 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -10588,10 +10588,9 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, tree dest, false, false, NULL_TREE, NULL_TREE); gfc_add_expr_to_block (&fnblock, tmp); } - else if ((c->attr.allocatable) - && !c->attr.proc_pointer && !same_type - && (!(cmp_has_alloc_comps && c->as) || c->attr.codimension - || caf_in_coarray (caf_mode))) + else if (c->attr.allocatable && !c->attr.proc_pointer + && (!(cmp_has_alloc_comps && c->as) || c->attr.codimension + || caf_in_coarray (caf_mode))) { rank = c->as ? c->as->rank : 0; if (c->attr.codimension) diff --git a/gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03 b/gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03 new file mode 100644 index 000..3c445be032f --- /dev/null +++ b/gcc/testsuite/gfortran.dg/alloc_comp_deep_copy_4.f03 @@ -0,0 +1,29 @@ +!{ dg-do run } +! +! Contributed Vladimir Terzi +! Check that deep-copy for b=a works. + +program pr114672 +type node +integer::val +type(node),allocatable::next +end type + +type(node)::a,b + +allocate(a%next) +a%val=1 +a%next%val=2 +!print*,a%val,a%next%val +b=a +b%val=3 +b%next%val=4 +if (loc(b) == loc(a)) stop 1 +if (loc(b%next) == loc(a%next)) stop 2 +!print*,a%val,a%next%val +deallocate(b%next) +if (.NOT. allocated(a%next)) stop 3 +!print*,a%val,a%next%val +deallocate(a%next) +end + -- 2.47.1
Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive
> On Jan 4, 2025, at 12:58, Jeff Law wrote: > > > > On 1/3/25 10:30 AM, Qing Zhao wrote: >>> On Jan 3, 2025, at 11:41, Richard Biener wrote: >>> >>> >>> Am 03.01.2025 um 16:22 schrieb Jeff Law : So this is an implementation of an idea I had a few years back and prototyped last spring to fix pr92539. pr92539 is a false positive Warray-bounds warning triggered by loop unrolling. The warning is in code that will never execute, but none of the optimizers clean things up well enough or early enough in the pipeline to avoid the warning. To optimize away the code we can take advantage of the fact that we're comparing a value to a bogus pointer. So for example an EQ comparison against &"aa" + 3 is always false and a NE comparison against that would always be true. >>> >>> Forming the pointer invokes UB, the comparison is neither false nor true. >>> I’d say this is a classical case for path isolation on the pointer forming, >>> not only discarding either the true or false path of the compare. >>> >>> The path isolation pass is probably too late in this case? >> Path isolation pass is After bounds pass. >> From my understanding, path isolation pass only insert TRAP on the >> problematic path. The false positive warning on that path still there. >> Please see: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106762 > Correct. But we have choices here. Forming the address doesn't actually > cause user visible UB, though it is UB in the IL. > > Given it doesn't cause user visible UB, we could insert the trap *before* the > UB inducing statement. That would then make the statement unreachable and > it'd get removed avoiding the false positive diagnostic. Yes, that’s a good idea. However, in order to distinguish a user visible UB and a UB in the IL that is introduced purely by compiler, we might need some new marking in the IR? Qing > > But path-isolation is too late in the gimple pipeline to really help us here. > > >> My work of adding -fdiagnostics-explain-harder also handled path isolation >> pass. >> Should the loop unrolling pass need to be adjusted to avoid forming the >> invalid pointers? > It it can reasonably be adjusted, it'd probably be good. It's an area I'm > not terribly familiar with. > > jeff
Re: [PATCH] or1k: add .note.GNU-stack section on linux
On 1/6/25 6:01 AM, Stafford Horne wrote: In the OpenRISC build we get the following warning: ld: warning: __modsi3_s.o: missing .note.GNU-stack section implies executable stack ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker Fix this by adding a .note.GNU-stack to indicate the stack does not need to be executable for the lib1funcs. Note, this is also needed for the upcoming glibc 2.41. libgcc/ * config/or1k/lib1funcs.S: Add .note.GNU-stack section on linux. OK for the trunk. You've got write privs, correct? jeff
[PATCH] c: Restore warning for incomplete structures declared in parameter list [PR117866]
Happy new year! Please consider the following patch. Bootstrapped and regression tested on x86_64. c: Restore warning for incomplete structures declared in parameter list [PR117866] In C23 mode the warning about declaring structures and union in parameter lists was removed, because it is possible to redeclare a compatible type elsewhere. This is not the case for incomplete types, so restore the warning for those types. PR c/117886 gcc/c/ChangeLog: * c-decl.cc (get_parm_info): Change condition for warning. gcc/testsuite/ChangeLog: * gcc.dg/pr117866.c: New test. * gcc.dg/struct-pr118007.c: Adapt. diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc index f60b2a54a17..56d13884182 100644 --- a/gcc/c/c-decl.cc +++ b/gcc/c/c-decl.cc @@ -8677,7 +8677,7 @@ get_parm_info (bool ellipsis, tree expr) if (b->id) { /* The %s will be one of 'struct', 'union', or 'enum'. */ - if (!flag_isoc23) + if (!flag_isoc23 || !COMPLETE_TYPE_P (decl)) warning_at (b->locus, 0, "%<%s %E%> declared inside parameter list" " will not be visible outside of this definition or" diff --git a/gcc/testsuite/gcc.dg/pr117866.c b/gcc/testsuite/gcc.dg/pr117866.c new file mode 100644 index 000..3caf707791d --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr117866.c @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c23" } */ + +void convert(struct fractpoint *pt); /* { dg-warning "declared inside parameter list" } */ + diff --git a/gcc/testsuite/gcc.dg/strub-pr118007.c b/gcc/testsuite/gcc.dg/strub-pr118007.c index 6c24cad6529..51f48245b3f 100644 --- a/gcc/testsuite/gcc.dg/strub-pr118007.c +++ b/gcc/testsuite/gcc.dg/strub-pr118007.c @@ -2,4 +2,4 @@ /* { dg-do compile } */ /* { dg-options "-fstrub=all -O2" } */ -void rb_ec_error_print(struct rb_execution_context_struct *volatile) {} +void rb_ec_error_print(struct rb_execution_context_struct *volatile) {} /* { dg-warning "declared inside parameter list" } */
Re: [PATCH v2 1/4] testsuite: RISC-V: Add effective target for E ABI variant
On 1/4/25 11:01 AM, Dimitar Dimitrov wrote: Add new effective target check for either ILP32E or ILP64E ABI variants. Initial implementation only checks for RV32E or RV64E ISA, which in turn implies that ILP32E/ILP64E ABI is used. The RV32I+ILP32E and RV64I+ILP64E combinations are not yet caught by the check, but they do not seem to be widely used currently. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_riscv_abi_e): New procedure. All 4 patches in this series are fine. I do fear that trying to keep the E variants "clean" will be a long term maintenance task though. Hell, we're struggling with keeping the existing stuff clean. But IMHO that's not a good reason to reject, especially with someone that's planning to test this regularly (and presumably send fixes regularly). Jeff
Re: [ping][PATCH] testsuite/118127: Pass fortran tests on ppc64le for IEEE128 long doubles
On Mon, Jan 06, 2025 at 11:01:18AM -0500, Siddhesh Poyarekar wrote: > Ping! > > On 2024-12-19 08:16, Siddhesh Poyarekar wrote: > > Denormal behaviour is well defined for IEEE128 long doubles, so don't > > XFAIL some gfortran tests on ppc64le when configured with the IEEE128 > > long double ABI. > > > > gcc/testsuite/ChangeLog: > > > > PR testsuite/118127 > > * gfortran.dg/default_format_2.f90: Don't xfail for > > ppc_ieee128_ok. > > * gfortran.dg/default_format_denormal_2.f90: Likewise. > > * gfortran.dg/large_real_kind_form_io_2.f90: Likewise. > > > > Signed-off-by: Siddhesh Poyarekar That looks incorrect to me. ppc_ieee128_ok just means that one can use the __ieee128 type (and only if -mfloat128 option is passed). What the tests care is whether real(16) is IEEE128 or IBM128. That is dependent on what glibc gcc has been configured against, with what configure options and whether -mabi=ieeelongdouble or -mabi=ibmlongdouble options were used. The long_double_ibm128 and long_double_ieee128 effective targets would be slightly closer to what you want, but they actually test whether that is the case when one uses dg-add-options long_double_ibm128 or dg-add-options long_double_ieee128. For the xfail, guess you want a test which will check what is the default... So, e.g. (for powerpc* only) try to compile integer function foo () integer, parameter :: kl = selected_real_kind (precision (0.0_8) + 1) foo = precision (0.0_kl) end and see if it returns 33 (that is IEEE quad) or 31 (IBM double double). Jakub
Re: [PATCH] or1k: add .note.GNU-stack section on linux
On Mon, Jan 06, 2025 at 07:37:56AM -0700, Jeff Law wrote: > > > On 1/6/25 6:01 AM, Stafford Horne wrote: > > In the OpenRISC build we get the following warning: > > > > ld: warning: __modsi3_s.o: missing .note.GNU-stack section implies > > executable stack > > ld: NOTE: This behaviour is deprecated and will be removed in a future > > version of the linker > > > > Fix this by adding a .note.GNU-stack to indicate the stack does not need to > > be > > executable for the lib1funcs. > > > > Note, this is also needed for the upcoming glibc 2.41. > > > > libgcc/ > > * config/or1k/lib1funcs.S: Add .note.GNU-stack section on linux. > OK for the trunk. You've got write privs, correct? Thanks, Yes I should still have access. Also, I would like to add to the release-14 branch. Is there anything special I need to do to add the patch there? -Stafford
Re: [-Ping-, Fortran, Patch, PR116669, v3] Fix ICE in deallocation of derived types having cyclic dependencies
Hi all, during looking for something completely different, I figured, that gcc does not use std::set internally, but its implementation of hash_set. I therefore adapted the patch to use it. Nothing more changed. Still regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre On Mon, 6 Jan 2025 11:06:46 +0100 Andre Vehreschild wrote: > Hi all, > > pinging attached rebased patch. > > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? > > - Andre > > On Thu, 12 Dec 2024 14:50:13 +0100 > Andre Vehreschild wrote: > > > Hi all, > > > > attached patch improves analysis of cycles in derived types, i.e. type > > dependencies ala: > > > > type(T) > > type(T2), allocatable :: c > > end type > > > > type(T2) > > type(T), allocatable :: t > > end type > > > > are now detected and deallocating an object that is of any of the types > > now no longer crashes the compiler because of an endless recursion. To > > accomplish this, I stored the symbols of the types seen in a C++ set > > and checked if a component's type is already present in there. When a > > type has such an indirect self-reference, it gets marked by setting its > > symbol_attribute::recursive flag. Later steps then can make use of it. > > > > Furthermore are _deallocate members of the vtab populated when a type > > has the recursive and the alloc_comp flag set. > > > > Bootstraps and regtests ok on x86_64-pc-linux-gnu / F41. Ok for > > mainline? > > > > Note: The patch was developed on top of my coarray patch, but should > > apply with delta on a regular trunk w/o issues. > > > > Regards, > > Andre > > -- > > Andre Vehreschild * Email: vehre ad gcc dot gnu dot org > > > -- > Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Email: vehre ad gmx dot de From e12066f328cb7952e28996775739592c1978c066 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Mon, 9 Dec 2024 14:56:27 +0100 Subject: [PATCH] Fortran: Extend cylic type detection for deallocate [PR116669] Using cycles in derived/class types lead to the compiler doing a endless recursion in several locations, when the cycle was not immediate. An immediate cyclic dependency is present in, for example T T::comp. Cylcic dependencies of the form T T2::comp; T2 T::comp2; are now detected and the recursive bit in the derived type's attr is set. gcc/fortran/ChangeLog: PR fortran/116669 * class.cc (gfc_find_derived_vtab): Use attr to determine cyclic type dependendies. * expr.cc (gfc_has_default_initializer): Prevent endless recursion by storing already visited derived types. * resolve.cc (resolve_cyclic_derived_type): Determine if a type is used in its hierarchy in a cyclic way. (resolve_fl_derived0): Call resolve_cyclic_derived_type. (resolve_fl_derived): Ensure vtab is generated when cyclic derived types have allocatable components. * trans-array.cc (structure_alloc_comps): Prevent endless loop for derived type cycles. * trans-expr.cc (gfc_get_ultimate_alloc_ptr_comps_caf_token): Off topic, just prevent memory leaks. gcc/testsuite/ChangeLog: * gfortran.dg/class_array_15.f03: Freeing more memory. * gfortran.dg/recursive_alloc_comp_6.f90: New test. --- gcc/fortran/class.cc | 19 +- gcc/fortran/expr.cc | 38 gcc/fortran/resolve.cc| 58 +-- gcc/fortran/trans-array.cc| 25 +--- gcc/fortran/trans-expr.cc | 10 +++- gcc/testsuite/gfortran.dg/class_array_15.f03 | 2 +- .../gfortran.dg/recursive_alloc_comp_6.f90| 28 + 7 files changed, 136 insertions(+), 44 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/recursive_alloc_comp_6.f90 diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc index e0dd571cd68..3e0dce1b54d 100644 --- a/gcc/fortran/class.cc +++ b/gcc/fortran/class.cc @@ -2507,20 +2507,6 @@ gfc_find_derived_vtab (gfc_symbol *derived) { gfc_component *c; gfc_symbol *parent = NULL, *parent_vtab = NULL; - bool rdt = false; - - /* Is this a derived type with recursive allocatable - components? */ - c = (derived->attr.unlimited_polymorphic - || derived->attr.abstract) ? - NULL : derived->components; - for (; c; c= c->next) - if (c->ts.type == BT_DERIVED - && c->ts.u.derived == derived) - { - rdt = true; - break; - } gfc_get_symbol (name, ns, &vtype); if (!gfc_add_flavor (&vtype->attr, FL_DERIVED, NULL, @@ -2703,9 +2689,8 @@ gfc_find_derived_vtab (gfc_symbol *derived) c->attr.access = ACCESS_PRIVATE; c->tb = XCNEW (gfc_typebound_proc); c->tb->ppc = 1; - if (derived->attr.unlimited_polymorphic - || derived->attr.abstract - || !rdt) + if (derived->attr.unlimited_polymorphic || derived->attr.abstract + || !derived->attr.recursive) c->initializer = gfc_get_null_expr (NULL); else { diff --git a/gcc/fort
Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive
On 1/6/25 7:11 AM, Qing Zhao wrote: Given it doesn't cause user visible UB, we could insert the trap *before* the UB inducing statement. That would then make the statement unreachable and it'd get removed avoiding the false positive diagnostic. Yes, that’s a good idea. However, in order to distinguish a user visible UB and a UB in the IL that is introduced purely by compiler, we might need some new marking in the IR? I don't think we've ever really tackled that question; the closest I can think of would be things like integer overflow which we try to avoid allowing the compiler to introduce. If we take the integer overflow as the model, then that would say we should be tackling this during loop unrolling. jeff
[PATCH] Do not call cp_parser_omp_dispatch directly in cp_parser_pragma
This is a followup to ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args. The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct. gcc/cp/ChangeLog: * parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch with cp_parser_omp_construct. --- gcc/cp/parser.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index f548dc31c2b..7434c1d91ba 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -53060,7 +53060,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p) break; case PRAGMA_OMP_DISPATCH: - cp_parser_omp_dispatch (parser, pragma_tok); + cp_parser_omp_construct (parser, pragma_tok, if_p); return true; case PRAGMA_IVDEP: -- 2.45.2
Re: [PATCH] Only apply adjust_args in OpenMP dispatch if variant substitution occurs
Apologies, I forgot to add the testcase. Please find attached an updated patch. On 06/01/2025 17:12, Paul-Antoine Arras wrote: This is a followup to 084ea8ad584 OpenMP: middle-end support for dispatch + adjust_args. This patch fixes a bug that caused arguments in an OpenMP dispatch call to be modified even when no variant substitution occurred. gcc/ChangeLog: * gimplify.cc (gimplify_call_expr): Create variable variant_substituted_p to control whether adjust_args applies. --- gcc/gimplify.cc | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index bd324be926a..251d581f44c 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -3857,7 +3857,8 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) enum gimplify_status ret; int i, nargs; gcall *call; - bool builtin_va_start_p = false, omp_dispatch_p = false; + bool builtin_va_start_p = false, omp_dispatch_p = false, + variant_substituted_p = false; location_t loc = EXPR_LOCATION (*expr_p); gcc_assert (TREE_CODE (*expr_p) == CALL_EXPR); @@ -4035,7 +4036,10 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) { tree variant = omp_resolve_declare_variant (fndecl); if (variant != fndecl) - CALL_EXPR_FN (*expr_p) = build1 (ADDR_EXPR, fnptrtype, variant); + { + CALL_EXPR_FN (*expr_p) = build1 (ADDR_EXPR, fnptrtype, variant); + variant_substituted_p = true; + } } /* There is a sequence point before the call, so any side effects in @@ -4325,8 +4329,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) } } - if ((need_device_ptr && !is_device_ptr) - || (need_device_addr && !has_device_addr)) + if (variant_substituted_p + && ((need_device_ptr && !is_device_ptr) + || (need_device_addr && !has_device_addr))) { if (dispatch_device_num == NULL_TREE) { -- PAFrom 4d58719cb35b379950767bec8da0446570bd735e Mon Sep 17 00:00:00 2001 From: Paul-Antoine Arras Date: Mon, 6 Jan 2025 17:00:10 +0100 Subject: [PATCH] Only apply adjust_args in OpenMP dispatch if variant substitution occurs This is a followup to 084ea8ad584 OpenMP: middle-end support for dispatch + adjust_args. This patch fixes a bug that caused arguments in an OpenMP dispatch call to be modified even when no variant substitution occurred. gcc/ChangeLog: * gimplify.cc (gimplify_call_expr): Create variable variant_substituted_p to control whether adjust_args applies. gcc/testsuite/ChangeLog: * c-c++-common/gomp/adjust-args-4.c: New test. --- gcc/gimplify.cc | 13 ++ .../c-c++-common/gomp/adjust-args-4.c | 24 +++ 2 files changed, 33 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-4.c diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index bd324be926a..251d581f44c 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -3857,7 +3857,8 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) enum gimplify_status ret; int i, nargs; gcall *call; - bool builtin_va_start_p = false, omp_dispatch_p = false; + bool builtin_va_start_p = false, omp_dispatch_p = false, + variant_substituted_p = false; location_t loc = EXPR_LOCATION (*expr_p); gcc_assert (TREE_CODE (*expr_p) == CALL_EXPR); @@ -4035,7 +4036,10 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) { tree variant = omp_resolve_declare_variant (fndecl); if (variant != fndecl) - CALL_EXPR_FN (*expr_p) = build1 (ADDR_EXPR, fnptrtype, variant); + { + CALL_EXPR_FN (*expr_p) = build1 (ADDR_EXPR, fnptrtype, variant); + variant_substituted_p = true; + } } /* There is a sequence point before the call, so any side effects in @@ -4325,8 +4329,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) } } - if ((need_device_ptr && !is_device_ptr) - || (need_device_addr && !has_device_addr)) + if (variant_substituted_p + && ((need_device_ptr && !is_device_ptr) + || (need_device_addr && !has_device_addr))) { if (dispatch_device_num == NULL_TREE) { diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-4.c b/gcc/testsuite/c-c++-common/gomp/adjust-args-4.c new file mode 100644 index 000..377932e1b9c --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-4.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-fdump-tree-gimple" } */ + +/* Ensure that adjust_args is only applied when variant substitution happens. */ + +void h(int *); +void f(int *);
Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive
On 1/6/25 9:01 AM, Richard Biener wrote: Note unrolling doesn't introduce UB - it makes conditional UB "obvious". That's fair and it's how I often view these kinds of things when they pop out via jump threading. So unless the condition guarding the UB unrolling exposes is visibly false to the compiler but we fail to exploit that (missed optimization) there's not much that we can do. I think "folding" away the UB like what Jeff proposes trades false negatives for the false positive diagnostics. I hadn't looked at it in that way, but I can certainly see that point of view. Of course the counter to that is all kinds of optimizations can do the same. That said - I think for these unroller exposed cases of apparent false positives we should improve the path pruning in the unroller itself. For the other cases the path diagnostic might help clarify that the UB happens on the 'n-th' iteration of the loop when some additional condition is true/false. Agreed. So in the end Jeff - I think your patch isn't a good approach for the issue at hand. Understood. It was an idea, I don't mind setting it aside. Jeff
Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive
> On Jan 6, 2025, at 11:01, Richard Biener wrote: > > On Mon, Jan 6, 2025 at 3:43 PM Qing Zhao wrote: >> >> >> >>> On Jan 6, 2025, at 09:21, Jeff Law wrote: >>> >>> >>> >>> On 1/6/25 7:11 AM, Qing Zhao wrote: > > Given it doesn't cause user visible UB, we could insert the trap *before* > the UB inducing statement. That would then make the statement > unreachable and it'd get removed avoiding the false positive diagnostic. Yes, that’s a good idea. However, in order to distinguish a user visible UB and a UB in the IL that is introduced purely by compiler, we might need some new marking in the IR? >>> I don't think we've ever really tackled that question; the closest I can >>> think of would be things like integer overflow which we try to avoid >>> allowing the compiler to introduce. If we take the integer overflow as the >>> model, then that would say we should be tackling this during loop unrolling. >> >> UB that is introduced by compiler transformation is one important cause of >> false positive warnings. >> >> There are two approaches to tackle this problem from my understanding: >> >> 1. Avoid generating such UB from the beginning. i.e, for every compiler >> transformation that might introduce such UB, we should add check to avoid >> generating it. >> >> 2. Marking the IR portion that were generated by compiler transformations, >> then check whether the UB is compiler generated when issue static checker >> warnings. >> >> Are there other approaches? > > Note unrolling doesn't introduce UB - it makes conditional UB > "obvious”. So, you mean this is the same issue as PR109071 (and PR85788, PR88771, etc), i.e, the compiler optimization make the conditional UB that’s originally in the source code “obvious” after code duplication? (I need to study the testing case in PR92539 more carefully to make sure this is the case...) If so, then the claimed false positive warning in PR92539 actually is a real bug in the original source code, and my patch that introduced the new option “--fdiagnostics-details” should also include loop unrolling to provide more details on the warning introduced by loop unrolling. > Note -Warray-bounds wants to > diagnose UB, so doing path isolation and removing the UB would make > -Warray-bounds useless. > > So unless the condition guarding the UB unrolling exposes is visibly > false to the compiler but we fail > to exploit that (missed optimization) there's not much that we can do. > I think "folding" away the UB > like what Jeff proposes trades false negatives for the false positive > diagnostics. > > Note the unroller knows UB that effectively bounds the number of > iterations, even on conditional > paths and it uses this to limit the number of copies _and_ to prune > unreachable paths (exploiting > UB, avoiding diagnostics). But one of the limitations is that it only > prunes paths in the last unrolled > copy which can be insufficient (ISTR some PR where I noticed this). > > That said - I think for these unroller exposed cases of apparent false > positives we should improve > the path pruning in the unroller itself. For the other cases the path > diagnostic might help clarify > that the UB happens on the 'n-th' iteration of the loop when some > additional condition is true/false. So, the “other cases” refer to the situation similar as PR109071, i.e, “conditional UB” in the original source code is made obvious after loop unrolling? Yes, for such cases, the new option I have been trying to add, “-fdiagnostic-details” should be able to track and provide more details on the conditions that lead to the UB. Is this understanding correct? thanks. Qing > > So in the end Jeff - I think your patch isn't a good approach for the > issue at hand. > > Richard. > >> The above is very rough and initial idea at this moment. >> >> Qing >> >>> >>> jeff
Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> On 19 Dec 2024, at 14:10, Jennifer Schmitz wrote: > > > >> On 19 Dec 2024, at 11:14, Richard Sandiford >> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Jennifer Schmitz writes: >>> @@ -8834,22 +8834,7 @@ vectorizable_store (vec_info *vinfo, >>> { >>> if (costing_p) >>> { >>> - /* Only need vector extracting when there are more >>> - than one stores. */ >>> - if (nstores > 1) >>> - inside_cost >>> - += record_stmt_cost (cost_vec, 1, vec_to_scalar, >>> -stmt_info, slp_node, >>> -0, vect_body); >>> - /* Take a single lane vector type store as scalar >>> - store to avoid ICE like 110776. */ >>> - if (VECTOR_TYPE_P (ltype) >>> - && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U)) >>> - n_adjacent_stores++; >>> - else >>> - inside_cost >>> - += record_stmt_cost (cost_vec, 1, scalar_store, >>> -stmt_info, 0, vect_body); >>> + n_adjacent_stores++; >>> continue; >>> } >>> tree newref, newoff; >>> @@ -8905,9 +8890,26 @@ vectorizable_store (vec_info *vinfo, >>> if (costing_p) >>> { >>> if (n_adjacent_stores > 0) >>> - vect_get_store_cost (vinfo, stmt_info, slp_node, >>> n_adjacent_stores, >>> - alignment_support_scheme, misalignment, >>> - &inside_cost, cost_vec); >>> + { >>> + /* Take a single lane vector type store as scalar >>> + store to avoid ICE like 110776. */ >>> + if (VECTOR_TYPE_P (ltype) >>> + && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U)) >> >> Sorry to ask, since it's pre-existing, but could you change this to >> maybe_ne while you're there? nunits==1+1X should be treated as a vector >> rather than a scalar. > Sure, I made the change (see patch below) and re-validated on aarch64. > > It would also be good to check for performance regressions, now that we have > a patch to test: > I will run SPEC2017 with -mcpu=generic and -mcpu=native on Grace, but we > would appreciate help with benchmarking on other platforms. > Tamar, would you still be willing to test the patch on other platforms? > > If there are no other changes necessary and assuming there are no performance > regressions, I was planning to commit the patch in January after returning > from christmas break. > > In the meantime I wish everyone happy holidays. > Jennifer On Grace, the patch has no non-noise impact on performance for SPEC2017 with -mcpu=generic and -mcpu=native. I also re-validated on aarch64 today, no regression. Do you advise to run additional performance tests or is the patch ready to be pushed to trunk? Thanks, Jennifer > > This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and > use_new_vector_costs entry in aarch64-tuning-flags.def and makes the > AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the > default. To that end, the function aarch64_use_new_vector_costs_p and its uses > were removed. To prevent costing vec_to_scalar operations with 0, as > described in > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665481.html, > we adjusted vectorizable_store such that the variable n_adjacent_stores > also covers vec_to_scalar operations. This way vec_to_scalar operations > are not costed individually, but as a group. > As suggested by Richard Sandiford, the "known_ne" in the multilane-check > was replaced by "maybe_ne" in order to treat nunits==1+1X as a vector > rather than a scalar. > > Two tests were adjusted due to changes in codegen. In both cases, the > old code performed loop unrolling once, but the new code does not: > Example from gcc.target/aarch64/sve/strided_load_2.c (compiled with > -O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic > -moverride=tune=none): > f_int64_t_32: >cbz w3, .L92 >mov x4, 0 >uxtwx3, w3 > + cntdx5 > + whilelo p7.d, xzr, x3 > + mov z29.s, w5 >mov z31.s, w2 > - whilelo p6.d, xzr, x3 > - mov x2, x3 > - index z30.s, #0, #1 > - uqdecd x2 > - ptrue p5.b, all > - whilelo p7.d, xzr, x2 > + index z30.d, #0, #1 > + ptrue p6.b, all >.p2align 3,,7 > .L94: > - ld1dz27.d, p7/z, [x0, #1, mul vl] > - ld1dz28.d, p6/z, [x0] > - movprfx z29, z31 > - mul z29.s, p5/m, z29.s, z30.s > - incwx4 > - uunpklo z0.d, z29.s > - uunpkhi z29.d, z29.s > - ld1dz25.d, p6/z, [x1, z0.d, lsl 3] >
Re: [PATCH 1/2] Alpha: Add memory clobbers to `builtin_longjmp' expansion
On 1/5/25 9:40 AM, Maciej W. Rozycki wrote: Add the same memory clobbers to `builtin_longjmp' for Alpha as with commit 41439bf6a647 ("builtins.c (expand_builtin_longjmp): Added two memory clobbers."), to prevent instructions that access memory via the frame or stack pointer from being moved across the write to the frame pointer. gcc/ * config/alpha/alpha.md (builtin_longjmp): Add memory clobbers. OK jeff
Re: [PATCH 2/2] Alpha: Restore frame pointer last in `builtin_longjmp' [PR64242]
On 1/5/25 9:40 AM, Maciej W. Rozycki wrote: Add similar arrangements to `builtin_longjmp' for Alpha as with commit 71b144289c1c ("re PR middle-end/64242 (Longjmp expansion incorrect)") and commit 511ed59d0b04 ("Fix PR64242 - Longjmp expansion incorrect"), so as to restore the frame pointer last, so that accesses to a local buffer supplied can still be fulfilled with memory accesses via the original frame pointer, fixing: FAIL: gcc.c-torture/execute/pr64242.c -O0 execution test FAIL: gcc.c-torture/execute/pr64242.c -O1 execution test FAIL: gcc.c-torture/execute/pr64242.c -O2 execution test FAIL: gcc.c-torture/execute/pr64242.c -O3 -g execution test FAIL: gcc.c-torture/execute/pr64242.c -Os execution test FAIL: gcc.c-torture/execute/pr64242.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr64242.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test and adding no regressions in `alpha-linux-gnu' testing. gcc/ PR middle-end/64242 * gcc/config/alpha/alpha.md (`builtin_longjmp'): Restore frame pointer last. Add frame clobber and schedule blockage. OK. jeff
Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive
> On Jan 6, 2025, at 09:21, Jeff Law wrote: > > > > On 1/6/25 7:11 AM, Qing Zhao wrote: >>> >>> Given it doesn't cause user visible UB, we could insert the trap *before* >>> the UB inducing statement. That would then make the statement unreachable >>> and it'd get removed avoiding the false positive diagnostic. >> Yes, that’s a good idea. >> However, in order to distinguish a user visible UB and a UB in the IL that >> is introduced purely by compiler, we might need some new marking in the IR? > I don't think we've ever really tackled that question; the closest I can > think of would be things like integer overflow which we try to avoid allowing > the compiler to introduce. If we take the integer overflow as the model, > then that would say we should be tackling this during loop unrolling. UB that is introduced by compiler transformation is one important cause of false positive warnings. There are two approaches to tackle this problem from my understanding: 1. Avoid generating such UB from the beginning. i.e, for every compiler transformation that might introduce such UB, we should add check to avoid generating it. 2. Marking the IR portion that were generated by compiler transformations, then check whether the UB is compiler generated when issue static checker warnings. Are there other approaches? The above is very rough and initial idea at this moment. Qing > > jeff >
Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive
On Mon, Jan 6, 2025 at 3:43 PM Qing Zhao wrote: > > > > > On Jan 6, 2025, at 09:21, Jeff Law wrote: > > > > > > > > On 1/6/25 7:11 AM, Qing Zhao wrote: > >>> > >>> Given it doesn't cause user visible UB, we could insert the trap *before* > >>> the UB inducing statement. That would then make the statement > >>> unreachable and it'd get removed avoiding the false positive diagnostic. > >> Yes, that’s a good idea. > >> However, in order to distinguish a user visible UB and a UB in the IL that > >> is introduced purely by compiler, we might need some new marking in the IR? > > I don't think we've ever really tackled that question; the closest I can > > think of would be things like integer overflow which we try to avoid > > allowing the compiler to introduce. If we take the integer overflow as the > > model, then that would say we should be tackling this during loop unrolling. > > UB that is introduced by compiler transformation is one important cause of > false positive warnings. > > There are two approaches to tackle this problem from my understanding: > > 1. Avoid generating such UB from the beginning. i.e, for every compiler > transformation that might introduce such UB, we should add check to avoid > generating it. > > 2. Marking the IR portion that were generated by compiler transformations, > then check whether the UB is compiler generated when issue static checker > warnings. > > Are there other approaches? Note unrolling doesn't introduce UB - it makes conditional UB "obvious". Note -Warray-bounds wants to diagnose UB, so doing path isolation and removing the UB would make -Warray-bounds useless. So unless the condition guarding the UB unrolling exposes is visibly false to the compiler but we fail to exploit that (missed optimization) there's not much that we can do. I think "folding" away the UB like what Jeff proposes trades false negatives for the false positive diagnostics. Note the unroller knows UB that effectively bounds the number of iterations, even on conditional paths and it uses this to limit the number of copies _and_ to prune unreachable paths (exploiting UB, avoiding diagnostics). But one of the limitations is that it only prunes paths in the last unrolled copy which can be insufficient (ISTR some PR where I noticed this). That said - I think for these unroller exposed cases of apparent false positives we should improve the path pruning in the unroller itself. For the other cases the path diagnostic might help clarify that the UB happens on the 'n-th' iteration of the loop when some additional condition is true/false. So in the end Jeff - I think your patch isn't a good approach for the issue at hand. Richard. > The above is very rough and initial idea at this moment. > > Qing > > > > > jeff > > >
[ping][PATCH] testsuite/118127: Pass fortran tests on ppc64le for IEEE128 long doubles
Ping! On 2024-12-19 08:16, Siddhesh Poyarekar wrote: Denormal behaviour is well defined for IEEE128 long doubles, so don't XFAIL some gfortran tests on ppc64le when configured with the IEEE128 long double ABI. gcc/testsuite/ChangeLog: PR testsuite/118127 * gfortran.dg/default_format_2.f90: Don't xfail for ppc_ieee128_ok. * gfortran.dg/default_format_denormal_2.f90: Likewise. * gfortran.dg/large_real_kind_form_io_2.f90: Likewise. Signed-off-by: Siddhesh Poyarekar --- gcc/testsuite/gfortran.dg/default_format_2.f90 | 2 +- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 | 2 +- gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90 | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gfortran.dg/default_format_2.f90 b/gcc/testsuite/gfortran.dg/default_format_2.f90 index 5ad7b3a6429..e76dce33b35 100644 --- a/gcc/testsuite/gfortran.dg/default_format_2.f90 +++ b/gcc/testsuite/gfortran.dg/default_format_2.f90 @@ -1,4 +1,4 @@ -! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } } +! { dg-do run { xfail { { powerpc*-apple-darwin* || powerpc*-*-linux* } && { ! ppc_ieee128_ok } } } } ! { dg-require-effective-target fortran_large_real } ! Test XFAILed on these platforms because the system's printf() lacks ! proper support for denormalized long doubles. See PR24685 diff --git a/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 b/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 index e9ccf5e8f61..1872396764d 100644 --- a/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 +++ b/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 @@ -1,4 +1,4 @@ -! { dg-do run { xfail powerpc*-*-* } } +! { dg-do run { xfail { powerpc*-*-* && { ! ppc_ieee128_ok } } } } ! { dg-require-effective-target fortran_large_real } ! Test XFAILed on this platform because the system's printf() lacks ! proper support for denormalized long doubles. See PR24685 diff --git a/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90 b/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90 index 34b8aec462c..3a4a9e1b078 100644 --- a/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90 +++ b/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90 @@ -1,4 +1,4 @@ -! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } } +! { dg-do run { xfail { { powerpc*-apple-darwin* || powerpc*-*-linux* } && { ! ppc_ieee128_ok } } } } ! Test XFAILed on these platforms because the system's printf() lacks ! proper support for denormalized long doubles. See PR24685 ! { dg-require-effective-target fortran_large_real }
[PATCH] Only apply adjust_args in OpenMP dispatch if variant substitution occurs
This is a followup to 084ea8ad584 OpenMP: middle-end support for dispatch + adjust_args. This patch fixes a bug that caused arguments in an OpenMP dispatch call to be modified even when no variant substitution occurred. gcc/ChangeLog: * gimplify.cc (gimplify_call_expr): Create variable variant_substituted_p to control whether adjust_args applies. --- gcc/gimplify.cc | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index bd324be926a..251d581f44c 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -3857,7 +3857,8 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) enum gimplify_status ret; int i, nargs; gcall *call; - bool builtin_va_start_p = false, omp_dispatch_p = false; + bool builtin_va_start_p = false, omp_dispatch_p = false, + variant_substituted_p = false; location_t loc = EXPR_LOCATION (*expr_p); gcc_assert (TREE_CODE (*expr_p) == CALL_EXPR); @@ -4035,7 +4036,10 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) { tree variant = omp_resolve_declare_variant (fndecl); if (variant != fndecl) - CALL_EXPR_FN (*expr_p) = build1 (ADDR_EXPR, fnptrtype, variant); + { + CALL_EXPR_FN (*expr_p) = build1 (ADDR_EXPR, fnptrtype, variant); + variant_substituted_p = true; + } } /* There is a sequence point before the call, so any side effects in @@ -4325,8 +4329,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) } } - if ((need_device_ptr && !is_device_ptr) - || (need_device_addr && !has_device_addr)) + if (variant_substituted_p + && ((need_device_ptr && !is_device_ptr) + || (need_device_addr && !has_device_addr))) { if (dispatch_device_num == NULL_TREE) { -- 2.45.2
Re: [PATCH v2 5/7] IRA+LRA: Let the backend request to split basic blocks
"Maciej W. Rozycki" writes: > The next change for Alpha will produce extra labels and branches in > reload, which in turn requires basic blocks to be split at completion. > We do this already for functions that can trap, so just extend the > arrangement with a flag for the backend to use whenever it finds it > necessary. > > gcc/ > * function.h (struct function): Add > `split_basic_blocks_after_reload' member. > * lra.cc (lra): Handle it. > * reload1.cc (reload): Likewise. > --- > This was approved by Richard Sandiford in v1, so I'll commit it along with > 6/7, which relies on it. Still stands, but I didn't think of this before, so thought I'd ask: Would it also work to record max_label_num () before RA and then compare that recorded value with max_label_num () after RA? Or would that have too many false positives? Thanks, Richard > No change from v1. > --- > gcc/function.h |3 +++ > gcc/lra.cc |6 -- > gcc/reload1.cc |6 -- > 3 files changed, 11 insertions(+), 4 deletions(-) > > gcc-split-basic-blocks-after-reload.diff > > Index: gcc/gcc/function.h > === > --- gcc.orig/gcc/function.h > +++ gcc/gcc/function.h > @@ -449,6 +449,9 @@ struct GTY(()) function { >/* Set for artificial function created for [[assume (cond)]]. > These should be GIMPLE optimized, but not expanded to RTL. */ >unsigned int assume_function : 1; > + > + /* Nonzero if reload will have to split basic blocks. */ > + unsigned int split_basic_blocks_after_reload : 1; > }; > > /* Add the decl D to the local_decls list of FUN. */ > Index: gcc/gcc/lra.cc > === > --- gcc.orig/gcc/lra.cc > +++ gcc/gcc/lra.cc > @@ -2594,8 +2594,10 @@ lra (FILE *f, int verbose) > >inserted_p = fixup_abnormal_edges (); > > - /* We've possibly turned single trapping insn into multiple ones. */ > - if (cfun->can_throw_non_call_exceptions) > + /* Split basic blocks if we've possibly turned single trapping insn > + into multiple ones or otherwise the backend requested to do so. */ > + if (cfun->can_throw_non_call_exceptions > + || cfun->split_basic_blocks_after_reload) > { >auto_sbitmap blocks (last_basic_block_for_fn (cfun)); >bitmap_ones (blocks); > Index: gcc/gcc/reload1.cc > === > --- gcc.orig/gcc/reload1.cc > +++ gcc/gcc/reload1.cc > @@ -1272,8 +1272,10 @@ reload (rtx_insn *first, int global) > >inserted = fixup_abnormal_edges (); > > - /* We've possibly turned single trapping insn into multiple ones. */ > - if (cfun->can_throw_non_call_exceptions) > + /* Split basic blocks if we've possibly turned single trapping insn > + into multiple ones or otherwise the backend requested to do so. */ > + if (cfun->can_throw_non_call_exceptions > + || cfun->split_basic_blocks_after_reload) > { >auto_sbitmap blocks (last_basic_block_for_fn (cfun)); >bitmap_ones (blocks);
[PATCH] Accept commas between clauses in OpenMP declare variant
Add support to the Fortran parser for the new OpenMP syntax that allows a comma after the directive name and between clauses of declare variant. The C and C++ parsers already support this syntax so only a new test is added. gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_declare_variant): Match comma after directive name and between clauses. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/declare-variant-2.f90: Remove error test for a comma after the directive name. * c-c++-common/gomp/adjust-args-5.c: New test. * gfortran.dg/gomp/adjust-args-11.f90: New test. --- gcc/fortran/openmp.cc | 4 ++ .../c-c++-common/gomp/adjust-args-5.c | 21 + .../gfortran.dg/gomp/adjust-args-11.f90 | 45 +++ .../gfortran.dg/gomp/declare-variant-2.f90| 3 -- 4 files changed, 70 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-5.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/adjust-args-11.f90 diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 79c0f1b2e62..9d28dc9 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -6595,6 +6595,10 @@ gfc_match_omp_declare_variant (void) for (;;) { + gfc_gobble_whitespace (); + gfc_match_char (','); + gfc_gobble_whitespace (); + enum clause { match, diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-5.c b/gcc/testsuite/c-c++-common/gomp/adjust-args-5.c new file mode 100644 index 000..863b77458e4 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-5.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ + +/* Check that the OpenMP 6 syntax with commas after the directive name and + between clauses is supported. */ + +int f (int a, void *b, float c[2]); + +#pragma omp declare variant (f), match (construct={dispatch}), adjust_args (nothing: a), adjust_args (need_device_ptr: b, c) +int f0 (int a, void *b, float c[2]); + +int test () { + int a; + void *b; + float c[2]; + struct {int a;} s; + + #pragma omp dispatch, novariants(0), nocontext(1) + s.a = f0 (a, b, c); + + return s.a; +} diff --git a/gcc/testsuite/gfortran.dg/gomp/adjust-args-11.f90 b/gcc/testsuite/gfortran.dg/gomp/adjust-args-11.f90 new file mode 100644 index 000..3b26f1b0868 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/adjust-args-11.f90 @@ -0,0 +1,45 @@ +! { dg-do compile } + +! Check that the OpenMP 5.1 syntax with commas between clauses is supported. +! A comma after the directive name is introduced in 5.2, which is not supported +! yet. + +module main + use iso_c_binding, only: c_ptr + implicit none + + type :: struct +integer :: a +real :: b + end type + + interface +integer function f(a, b, c) + import c_ptr + integer, intent(in) :: a + type(c_ptr), intent(inout) :: b + type(c_ptr), intent(out) :: c(:) +end function +integer function f0(a, b, c) + import c_ptr + integer, intent(in) :: a + type(c_ptr), intent(inout) :: b + type(c_ptr), intent(out) :: c(:) + !$omp declare variant (f), match (construct={dispatch}), & + !$omp& adjust_args (nothing: a), adjust_args (need_device_ptr: b, c) +end function + end interface + +contains +subroutine test + integer :: a + type(c_ptr) :: b + type(c_ptr) :: c(2) + type(struct) :: s + + !!$omp dispatch, nocontext(.false.), novariants(.false.) ! Not supported yet + !$omp dispatch nocontext(.false.), novariants(.false.) + s%a = f0 (a, b, c) + +end subroutine +end module diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-2.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-2.f90 index 62d2cb96fac..beea713efba 100644 --- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-2.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-2.f90 @@ -182,9 +182,6 @@ contains subroutine f74 () !$omp declare variant (f1) match(construct={requires}) ! { dg-warning "unknown selector 'requires' for context selector set 'construct' at .1." } end subroutine - subroutine f75 () -!$omp declare variant (f1),match(construct={parallel}) ! { dg-error "expected 'match' or 'adjust_args' at .1." } - end subroutine subroutine f76 () !$omp declare variant (f1) match(implementation={atomic_default_mem_order("relaxed")}) ! { dg-error "expected identifier at .1." } end subroutine -- 2.45.2
[PATCH] i386: Add br_mispredict_scale in cost table.
Hi, For later processors, the pipeline went deeper so the penalty for untaken branch can be larger than before. Add a new parameter br_mispredict_scale to describe the penalty, and adopt to noce_max_ifcvt_seq_cost hook to allow longer sequence to be converted with cmove. This improves cpu2017 544 with -Ofast -march=native for 14% on P-core SPR, and 8% on E-core SRF, no other regression observed. Bootstrapped & regtested on x86-64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_noce_max_ifcvt_seq_cost): Adjust cost with ix86_tune_cost->br_mispredict_scale. * config/i386/i386.h (processor_costs): Add br_mispredict_scale. * config/i386/x86-tune-costs.h: Add new br_mispredict_scale to all processor_costs, in which icelake_cost/alderlake_cost with value COSTS_N_INSNS (2) + 3 and other processor with value COSTS_N_INSNS (2). gcc/testsuite/ChangeLog: * gcc.target/i386/cmov12.c: New test. --- gcc/config/i386/i386.cc| 8 ++- gcc/config/i386/i386.h | 2 ++ gcc/config/i386/x86-tune-costs.h | 33 ++ gcc/testsuite/gcc.target/i386/cmov12.c | 21 4 files changed, 63 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/cmov12.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 655335e2f47..11770aa8a50 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -25088,7 +25088,13 @@ ix86_max_noce_ifcvt_seq_cost (edge e) return param_max_rtl_if_conversion_unpredictable_cost; } - return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2); + /* For modern machines with deeper pipeline, the penalty for branch + misprediction could be higher than before to reset the pipeline + slots. Add parameter br_mispredict_scale as a factor to describe + the impact of reseting the pipeline. */ + + return BRANCH_COST (true, predictable_p) +* ix86_tune_cost->br_mispredict_scale; } /* Return true if SEQ is a good candidate as a replacement for the diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 00626d539a9..e8e528c7811 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -232,6 +232,8 @@ struct processor_costs { to be unrolled. */ const unsigned small_unroll_factor; /* Unroll factor for small loop to be unrolled. */ + const int br_mispredict_scale; /* Branch mispredict scale for ifcvt + threshold. */ }; extern const struct processor_costs *ix86_cost; diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h index 56a09f12b94..a4a128cd5dd 100644 --- a/gcc/config/i386/x86-tune-costs.h +++ b/gcc/config/i386/x86-tune-costs.h @@ -137,6 +137,7 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */ NULL,/* Func alignment. */ 4, /* Small unroll limit. */ 2, /* Small unroll factor. */ + COSTS_N_INSNS (2), /* Branch mispredict scale. */ }; /* Processor costs (relative to an add) */ @@ -248,6 +249,7 @@ struct processor_costs i386_cost = {/* 386 specific costs */ "4", /* Func alignment. */ 4, /* Small unroll limit. */ 2, /* Small unroll factor. */ + COSTS_N_INSNS (2), /* Branch mispredict scale. */ }; static stringop_algs i486_memcpy[2] = { @@ -360,6 +362,7 @@ struct processor_costs i486_cost = {/* 486 specific costs */ "16",/* Func alignment. */ 4, /* Small unroll limit. */ 2, /* Small unroll factor. */ + COSTS_N_INSNS (2), /* Branch mispredict scale. */ }; static stringop_algs pentium_memcpy[2] = { @@ -470,6 +473,7 @@ struct processor_costs pentium_cost = { "16",/* Func alignment. */ 4, /* Small unroll limit. */ 2, /* Small unroll factor. */ + COSTS_N_INSNS (2), /* Branch mispredict scale. */ }; static const @@ -573,6 +577,7 @@ struct processor_costs lakemont_cost = { "16",/* Func alignment. */ 4, /* Small unroll limit. */ 2, /* Small unroll factor. */ + COSTS_N_INSNS (2), /* Branch mispredict scale. */ }; /* PentiumPro has optimized rep instructions for blocks aligned by 8 bytes @@ -691,6 +696,7 @@ struct processor_costs pentiumpro_cost =
Re: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]
On Mon, 6 Jan 2025 at 16:59, Linus Torvalds wrote: > > There is absolutely no gray area here. It was always buggy, and the > alpha architecture was always completely and fundamentally > mis-designed. Note that I really do want to re-emphasize that while I think it's kind of interesting that Maciej is trying to make gcc DTRT on alpha, the non-BWX machines really are a completely broken architecture and almost entirely unfixable. Yeah, yeah, Maciej also has patches to avoid all the ldq_u/stq_u sequences for regular byte accesses into actually using ldl_l / stl_c sequences, but those instructions take hundreds of cycles and go out on the bus outside the CPU. So using actual thread-safe byte ops with ldl_l / stl_c turns those non-BWX alpha CPU's into something very sad and pointless. You might as well go full retro and use a 6502 or something. And even the newer alphas that *had* BWX were designed to still do byte operations with quadword ops and masking. Yes, byte ops existed, but they were still very much designed as a "only when you really can't use the masked word model". For example, the standard "memset()" library routine for EV6 literally does exactly the thing that I said would be a compiler bug to do. Because that's literally the core design of the architecture: it's buggy. So while I applaud Maciej's efforts, I'm not convinced they are all that productive. Even with a fixed compiler, it's all broken anyway. Of course, most of the time, you won't ever see the breakage. It's there, but hitting it in practice is almost impossible. The Linux kernel uses the known-broken memcpy and memset library code. All user space does the same. They are hopelessly and fundamentally broken, but they work in practice as long as you don't have concurrent accesses "near-by" in space-time. Linus
Re: [PATCH] [sanitizer] Replace uptr by usize/SIZE_T in interfaces
On Mon, Jan 06, 2025 at 10:49:31AM +0100, Jakub Jelinek wrote: > On Mon, Jan 06, 2025 at 09:55:16AM +0100, Stefan Schulze Frielinghaus wrote: > > For some targets uptr is mapped to unsigned int and size_t to unsigned > > long and sizeof(int)==sizeof(long) holds. Still, these are distinct > > types and type checking may fail. Therefore, replace uptr by > > usize/SIZE_T wherever a size_t is expected. > > > > Part of #116957 > > > > Cherry picked from LLVM commit 9a156f6b2b0c892d8713ba907f07f027b24953d8 > > (removed memprof, msan, nsan, and tsan parts). > > > > PR sanitizer/117725 > > libsanitizer/ChangeLog: > > > > * asan/asan_interceptors.cpp: Cherry picked LLVM commit > > 9a156f6b2b0c892d8713ba907f07f027b24953d8. > > * asan/asan_interceptors.h: Ditto. > > * asan/asan_interceptors_memintrinsics.h: Ditto. > > * asan/sanitizer_common_interceptors.inc: Ditto. > > * sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc: > > Ditto. > > * sanitizer_common/sanitizer_platform_limits_posix.h: > > Ditto. > > Why have you removed the tsan parts? > compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp > is in > libsanitizer/tsan/tsan_interceptors_posix.cpp Thanks for spotting this! I've added this back and will send a v2 patch promptly. I've also moved/indented the line PR sanitizer/117725 in all patches accordingly. Thanks, Stefan
[PATCH v2] Replace uptr by usize/SIZE_T in interfaces
For some targets uptr is mapped to unsigned int and size_t to unsigned long and sizeof(int)==sizeof(long) holds. Still, these are distinct types and type checking may fail. Therefore, replace uptr by usize/SIZE_T wherever a size_t is expected. Part of #116957 Cherry picked from LLVM commit 9a156f6b2b0c892d8713ba907f07f027b24953d8 (removed memprof, msan, and nsan parts). libsanitizer/ChangeLog: PR sanitizer/117725 * asan/asan_interceptors.cpp: Cherry picked LLVM commit 9a156f6b2b0c892d8713ba907f07f027b24953d8. * asan/asan_interceptors.h: Ditto. * asan/asan_interceptors_memintrinsics.h: Ditto. * asan/sanitizer_common_interceptors.inc: Ditto. * sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc: Ditto. * sanitizer_common/sanitizer_platform_limits_posix.h: Ditto. * tsan/tsan_interceptors_posix.cpp: Ditto. --- Added tsan parts. libsanitizer/asan/asan_interceptors.cpp | 6 ++-- libsanitizer/asan/asan_interceptors.h | 6 ++-- .../asan/asan_interceptors_memintrinsics.h| 4 +-- .../sanitizer_common_interceptors.inc | 26 +++--- ...izer_common_interceptors_memintrinsics.inc | 34 +-- .../sanitizer_platform_limits_posix.h | 2 +- libsanitizer/tsan/tsan_interceptors_posix.cpp | 4 +-- 7 files changed, 41 insertions(+), 41 deletions(-) diff --git a/libsanitizer/asan/asan_interceptors.cpp b/libsanitizer/asan/asan_interceptors.cpp index c13bcf2382b..48f7636fbcb 100644 --- a/libsanitizer/asan/asan_interceptors.cpp +++ b/libsanitizer/asan/asan_interceptors.cpp @@ -85,7 +85,7 @@ int OnExit() { // -- Wrappers {{{1 using namespace __asan; -DECLARE_REAL_AND_INTERCEPTOR(void *, malloc, uptr) +DECLARE_REAL_AND_INTERCEPTOR(void *, malloc, usize) DECLARE_REAL_AND_INTERCEPTOR(void, free, void *) #define COMMON_INTERCEPT_FUNCTION_VER(name, ver) \ @@ -529,7 +529,7 @@ DEFINE_REAL(char*, index, const char *string, int c) return REAL(strcat)(to, from); } -INTERCEPTOR(char*, strncat, char *to, const char *from, uptr size) { +INTERCEPTOR(char*, strncat, char *to, const char *from, usize size) { void *ctx; ASAN_INTERCEPTOR_ENTER(ctx, strncat); AsanInitFromRtl(); @@ -617,7 +617,7 @@ INTERCEPTOR(char*, __strdup, const char *s) { } #endif // ASAN_INTERCEPT___STRDUP -INTERCEPTOR(char*, strncpy, char *to, const char *from, uptr size) { +INTERCEPTOR(char*, strncpy, char *to, const char *from, usize size) { void *ctx; ASAN_INTERCEPTOR_ENTER(ctx, strncpy); AsanInitFromRtl(); diff --git a/libsanitizer/asan/asan_interceptors.h b/libsanitizer/asan/asan_interceptors.h index 652379d39a3..85cde07ca7e 100644 --- a/libsanitizer/asan/asan_interceptors.h +++ b/libsanitizer/asan/asan_interceptors.h @@ -129,11 +129,11 @@ void InitializePlatformInterceptors(); # define ASAN_INTERCEPT_PTHREAD_ATFORK 0 #endif -DECLARE_REAL(int, memcmp, const void *a1, const void *a2, uptr size) +DECLARE_REAL(int, memcmp, const void *a1, const void *a2, SIZE_T size) DECLARE_REAL(char*, strchr, const char *str, int c) DECLARE_REAL(SIZE_T, strlen, const char *s) -DECLARE_REAL(char*, strncpy, char *to, const char *from, uptr size) -DECLARE_REAL(uptr, strnlen, const char *s, uptr maxlen) +DECLARE_REAL(char*, strncpy, char *to, const char *from, SIZE_T size) +DECLARE_REAL(SIZE_T, strnlen, const char *s, SIZE_T maxlen) DECLARE_REAL(char*, strstr, const char *s1, const char *s2) # if !SANITIZER_APPLE diff --git a/libsanitizer/asan/asan_interceptors_memintrinsics.h b/libsanitizer/asan/asan_interceptors_memintrinsics.h index eb44f8f2f72..14727a5d665 100644 --- a/libsanitizer/asan/asan_interceptors_memintrinsics.h +++ b/libsanitizer/asan/asan_interceptors_memintrinsics.h @@ -18,8 +18,8 @@ #include "asan_mapping.h" #include "interception/interception.h" -DECLARE_REAL(void *, memcpy, void *to, const void *from, uptr size) -DECLARE_REAL(void *, memset, void *block, int c, uptr size) +DECLARE_REAL(void *, memcpy, void *to, const void *from, SIZE_T size) +DECLARE_REAL(void *, memset, void *block, int c, SIZE_T size) namespace __asan { diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc index b8627f8557a..542176e6e5a 100644 --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc @@ -445,7 +445,7 @@ INTERCEPTOR(SIZE_T, strnlen, const char *s, SIZE_T maxlen) { #endif #if SANITIZER_INTERCEPT_STRNDUP -INTERCEPTOR(char*, strndup, const char *s, uptr size) { +INTERCEPTOR(char*, strndup, const char *s, usize size) { void *ctx; COMMON_INTERCEPTOR_STRNDUP_IMPL(ctx, s, size); } @@ -455,7 +455,7 @@ INTERCEPTOR(char*, strndup, const char *s, uptr size) { #endif // SANITIZER_INTERCEPT_STRNDUP #if SANITIZER_INTERCEPT___STRNDUP -INTERCEPTOR(char*, __str
[PATCH] [sanitizer] Fix few size types in memprof (#119114)
From: Vitaly Buka Fix type in a few related Min() calls. Follow up to #116957. Co-authored-by: Stefan Schulze Frielinghaus Cherry picked from LLVM commit 6dec33834d1fd89f16e271dde9607c1de9554144 (removed memprof part). PR sanitizer/117725 libsanitizer/ChangeLog: * asan/asan_interceptors.cpp: Cherry picked from LLVM commit 6dec33834d1fd89f16e271dde9607c1de9554144. * sanitizer_common/sanitizer_common_interceptors.inc: Ditto. --- libsanitizer/asan/asan_interceptors.cpp| 4 ++-- .../sanitizer_common/sanitizer_common_interceptors.inc| 2 +- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/libsanitizer/asan/asan_interceptors.cpp b/libsanitizer/asan/asan_interceptors.cpp index be214f3ebeb6..e6c95dfb82c3 100644 --- a/libsanitizer/asan/asan_interceptors.cpp +++ b/libsanitizer/asan/asan_interceptors.cpp @@ -535,7 +535,7 @@ INTERCEPTOR(char*, strncat, char *to, const char *from, usize size) { AsanInitFromRtl(); if (flags()->replace_str) { uptr from_length = MaybeRealStrnlen(from, size); -uptr copy_length = Min(size, from_length + 1); +uptr copy_length = Min(size, from_length + 1); ASAN_READ_RANGE(ctx, from, copy_length); uptr to_length = internal_strlen(to); ASAN_READ_STRING_OF_LEN(ctx, to, to_length, to_length); @@ -622,7 +622,7 @@ INTERCEPTOR(char*, strncpy, char *to, const char *from, usize size) { ASAN_INTERCEPTOR_ENTER(ctx, strncpy); AsanInitFromRtl(); if (flags()->replace_str) { -uptr from_size = Min(size, MaybeRealStrnlen(from, size) + 1); +uptr from_size = Min(size, MaybeRealStrnlen(from, size) + 1); CHECK_RANGES_OVERLAP("strncpy", to, from_size, from, from_size); ASAN_READ_RANGE(ctx, from, from_size); ASAN_WRITE_RANGE(ctx, to, size); diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc index f6b695defce4..99ad3b244d4e 100644 --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc @@ -347,7 +347,7 @@ extern const short *_tolower_tab_; uptr copy_length = internal_strnlen(s, size); \ char *new_mem = (char *)WRAP(malloc)(copy_length + 1); \ if (common_flags()->intercept_strndup) {\ -COMMON_INTERCEPTOR_READ_STRING(ctx, s, Min(size, copy_length + 1)); \ +COMMON_INTERCEPTOR_READ_STRING(ctx, s, Min(size, copy_length + 1)); \ } \ if (new_mem) { \ COMMON_INTERCEPTOR_COPY_STRING(ctx, new_mem, s, copy_length); \ -- 2.47.0
[PATCH 0/4] libsanitizer: Fix PR117725
These four cherry picks resolve PR117725, i.e., bootstrap is restored on s390 -m31. Ok for mainline? -- 2.47.0
[PATCH] [sanitizer] Fix type in some Min() calls (#119248)
This is a follow-up to 6dec33834d1fd89f16e271dde9607c1de9554144 and #116957 and #119114. Cherry picked from LLVM commit 65a2eb0b1589590ae78cc1e5f05cd004b3b3bec5. PR sanitizer/117725 libsanitizer/ChangeLog: * sanitizer_common/sanitizer_common_interceptors.inc: Cherry picked from LLVM commit 65a2eb0b1589590ae78cc1e5f05cd004b3b3bec5. --- .../sanitizer_common_interceptors.inc | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc index 47436a6cd20f..24a8a2d4dc55 100644 --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc @@ -520,14 +520,14 @@ INTERCEPTOR(int, strncmp, const char *s1, const char *s2, usize size) { void *ctx; COMMON_INTERCEPTOR_ENTER(ctx, strncmp, s1, s2, size); unsigned char c1 = 0, c2 = 0; - uptr i; + usize i; for (i = 0; i < size; i++) { c1 = (unsigned char)s1[i]; c2 = (unsigned char)s2[i]; if (c1 != c2 || c1 == '\0') break; } - uptr i1 = i; - uptr i2 = i; + usize i1 = i; + usize i2 = i; if (common_flags()->strict_string_checks) { for (; i1 < size && s1[i1]; i1++) {} for (; i2 < size && s2[i2]; i2++) {} @@ -583,14 +583,14 @@ INTERCEPTOR(int, strncasecmp, const char *s1, const char *s2, SIZE_T size) { void *ctx; COMMON_INTERCEPTOR_ENTER(ctx, strncasecmp, s1, s2, size); unsigned char c1 = 0, c2 = 0; - uptr i; + usize i; for (i = 0; i < size; i++) { c1 = (unsigned char)s1[i]; c2 = (unsigned char)s2[i]; if (CharCaseCmp(c1, c2) != 0 || c1 == '\0') break; } - uptr i1 = i; - uptr i2 = i; + usize i1 = i; + usize i2 = i; if (common_flags()->strict_string_checks) { for (; i1 < size && s1[i1]; i1++) {} for (; i2 < size && s2[i2]; i2++) {} @@ -851,7 +851,7 @@ int MemcmpInterceptorCommon(void *ctx, unsigned char c1 = 0, c2 = 0; const unsigned char *s1 = (const unsigned char*)a1; const unsigned char *s2 = (const unsigned char*)a2; - uptr i; + usize i; for (i = 0; i < size; i++) { c1 = s1[i]; c2 = s2[i]; -- 2.47.0
[PATCH] [sanitizer] Replace uptr by usize/SIZE_T in interfaces
For some targets uptr is mapped to unsigned int and size_t to unsigned long and sizeof(int)==sizeof(long) holds. Still, these are distinct types and type checking may fail. Therefore, replace uptr by usize/SIZE_T wherever a size_t is expected. Part of #116957 Cherry picked from LLVM commit 9a156f6b2b0c892d8713ba907f07f027b24953d8 (removed memprof, msan, nsan, and tsan parts). PR sanitizer/117725 libsanitizer/ChangeLog: * asan/asan_interceptors.cpp: Cherry picked LLVM commit 9a156f6b2b0c892d8713ba907f07f027b24953d8. * asan/asan_interceptors.h: Ditto. * asan/asan_interceptors_memintrinsics.h: Ditto. * asan/sanitizer_common_interceptors.inc: Ditto. * sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc: Ditto. * sanitizer_common/sanitizer_platform_limits_posix.h: Ditto. --- libsanitizer/asan/asan_interceptors.cpp| 6 ++-- libsanitizer/asan/asan_interceptors.h | 6 ++-- .../asan/asan_interceptors_memintrinsics.h| 4 +-- .../sanitizer_common_interceptors.inc | 26 +++--- ...izer_common_interceptors_memintrinsics.inc | 34 +-- .../sanitizer_platform_limits_posix.h | 2 +- 14 files changed, 65 insertions(+), 66 deletions(-) diff --git a/libsanitizer/asan/asan_interceptors.cpp b/libsanitizer/asan/asan_interceptors.cpp index 4129ee807612..be214f3ebeb6 100644 --- a/libsanitizer/asan/asan_interceptors.cpp +++ b/libsanitizer/asan/asan_interceptors.cpp @@ -85,7 +85,7 @@ int OnExit() { // -- Wrappers {{{1 using namespace __asan; -DECLARE_REAL_AND_INTERCEPTOR(void *, malloc, uptr) +DECLARE_REAL_AND_INTERCEPTOR(void *, malloc, usize) DECLARE_REAL_AND_INTERCEPTOR(void, free, void *) #define COMMON_INTERCEPT_FUNCTION_VER(name, ver) \ @@ -529,7 +529,7 @@ DEFINE_REAL(char*, index, const char *string, int c) return REAL(strcat)(to, from); } -INTERCEPTOR(char*, strncat, char *to, const char *from, uptr size) { +INTERCEPTOR(char*, strncat, char *to, const char *from, usize size) { void *ctx; ASAN_INTERCEPTOR_ENTER(ctx, strncat); AsanInitFromRtl(); @@ -617,7 +617,7 @@ INTERCEPTOR(char*, __strdup, const char *s) { } #endif // ASAN_INTERCEPT___STRDUP -INTERCEPTOR(char*, strncpy, char *to, const char *from, uptr size) { +INTERCEPTOR(char*, strncpy, char *to, const char *from, usize size) { void *ctx; ASAN_INTERCEPTOR_ENTER(ctx, strncpy); AsanInitFromRtl(); diff --git a/libsanitizer/asan/asan_interceptors.h b/libsanitizer/asan/asan_interceptors.h index 826b45f5ada8..3e2386eaf809 100644 --- a/libsanitizer/asan/asan_interceptors.h +++ b/libsanitizer/asan/asan_interceptors.h @@ -124,11 +124,11 @@ void InitializePlatformInterceptors(); # define ASAN_INTERCEPT_PTHREAD_ATFORK 0 #endif -DECLARE_REAL(int, memcmp, const void *a1, const void *a2, uptr size) +DECLARE_REAL(int, memcmp, const void *a1, const void *a2, SIZE_T size) DECLARE_REAL(char*, strchr, const char *str, int c) DECLARE_REAL(SIZE_T, strlen, const char *s) -DECLARE_REAL(char*, strncpy, char *to, const char *from, uptr size) -DECLARE_REAL(uptr, strnlen, const char *s, uptr maxlen) +DECLARE_REAL(char*, strncpy, char *to, const char *from, SIZE_T size) +DECLARE_REAL(SIZE_T, strnlen, const char *s, SIZE_T maxlen) DECLARE_REAL(char*, strstr, const char *s1, const char *s2) # if !SANITIZER_APPLE diff --git a/libsanitizer/asan/asan_interceptors_memintrinsics.h b/libsanitizer/asan/asan_interceptors_memintrinsics.h index eb44f8f2f729..14727a5d665e 100644 --- a/libsanitizer/asan/asan_interceptors_memintrinsics.h +++ b/libsanitizer/asan/asan_interceptors_memintrinsics.h @@ -18,8 +18,8 @@ #include "asan_mapping.h" #include "interception/interception.h" -DECLARE_REAL(void *, memcpy, void *to, const void *from, uptr size) -DECLARE_REAL(void *, memset, void *block, int c, uptr size) +DECLARE_REAL(void *, memcpy, void *to, const void *from, SIZE_T size) +DECLARE_REAL(void *, memset, void *block, int c, SIZE_T size) namespace __asan { diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc index ba3693dbd11f..f6b695defce4 100644 --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc @@ -445,7 +445,7 @@ INTERCEPTOR(SIZE_T, strnlen, const char *s, SIZE_T maxlen) { #endif #if SANITIZER_INTERCEPT_STRNDUP -INTERCEPTOR(char*, strndup, const char *s, uptr size) { +INTERCEPTOR(char*, strndup, const char *s, usize size) { void *ctx; COMMON_INTERCEPTOR_STRNDUP_IMPL(ctx, s, size); } @@ -455,7 +455,7 @@ INTERCEPTOR(char*, strndup, const char *s, uptr size) { #endif // SANITIZER_INTERCEPT_STRNDUP #if SANITIZER_INTERCEPT___STRNDUP -INTERCEPTOR(char*, __strndup, const char *s, uptr size) { +INTERCEPTOR(char*, __strndup, const char *s, usize size) { void *ctx; COMMON_INTERCEP
[PATCH] [sanitizer] Add type __sanitizer::ssize (#116957)
Since the sanitizer merge in commit r15-5164-gfa321004f3f628 of GCC which entails LLVM commit 61a6439f35b6de28ff4aff4450d6fca970292fd5, GCCs bootstrap is broken on s390 -m31. This is due to commit ec68dc1ca4d967b599f1202855917d5ec9cae52f which introduces stricter type checking which is why GCC bootstrap fails with ``` In file included from /gcc/src/libsanitizer/interception/interception.h:18, from /gcc/src/libsanitizer/interception/interception_type_test.cpp:14: /gcc/src/libsanitizer/interception/interception_type_test.cpp:30:61: error: static assertion failed 30 | COMPILER_CHECK((__sanitizer::is_same<::SSIZE_T, ::ssize_t>::value)); |~^~ /gcc/src/libsanitizer/sanitizer_common/sanitizer_internal_defs.h:363:44: note: in definition of macro 'COMPILER_CHECK' 363 | #define COMPILER_CHECK(pred) static_assert(pred, "") |^~~~ make[8]: *** [Makefile:469: interception_type_test.lo] Error 1 ``` The culprit seems to be that we don't check for equality of type sizes anymore but rather whether the types are indeed the same. On s390 -m31 we have that `sizeof(int)==sizeof(long)` holds which is why previously the checks succeeded. They fail now because ``` size_t => unsigned long ssize_t => long ptrdiff_t => int ::SSIZE_T => __sanitizer::sptr => int ::PTRDIFF_T => __sanitizer::sptr => int ``` This is fixed by mapping `SSIZE_T` to `long` in the end. ``` #if defined(__s390__) && !defined(__s390x__) typedef long ssize; #else typedef sptr ssize; #endif #define SSIZE_T __sanitizer::ssize ``` Cherry picked from LLVM commit ce44640fe29550461120d22b0358e6cac4aed822. PR sanitizer/117725 libsanitizer/ChangeLog: * interception/interception.h: Cherry picked from LLVM commit ce44640fe29550461120d22b0358e6cac4aed822. * sanitizer_common/sanitizer_internal_defs.h: Ditto. --- libsanitizer/interception/interception.h| 2 +- libsanitizer/sanitizer_common/sanitizer_internal_defs.h | 6 ++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/libsanitizer/interception/interception.h b/libsanitizer/interception/interception.h index 0580d97edda6..3cb6b446638e 100644 --- a/libsanitizer/interception/interception.h +++ b/libsanitizer/interception/interception.h @@ -37,7 +37,7 @@ #endif #define SIZE_T __sanitizer::usize -#define SSIZE_T __sanitizer::sptr +#define SSIZE_T __sanitizer::ssize typedef __sanitizer::sptrPTRDIFF_T; typedef __sanitizer::s64 INTMAX_T; typedef __sanitizer::u64 UINTMAX_T; diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h index 9208b12552ff..fff60c96f632 100644 --- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h +++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h @@ -203,6 +203,12 @@ typedef __SIZE_TYPE__ usize; typedef uptr usize; #endif +#if defined(__s390__) && !defined(__s390x__) +typedef long ssize; +#else +typedef sptr ssize; +#endif + typedef u64 tid_t; // --- ATTENTION - -- 2.47.0