Re: [PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]
Hi, On 15/2/2022 10:56 PM, Segher Boessenkool wrote: > On Tue, Feb 15, 2022 at 11:01:03AM +0800, HAO CHEN GUI wrote: > Hi! > >> On 15/2/2022 5:36 AM, Segher Boessenkool wrote: >>> On Wed, Feb 09, 2022 at 10:43:17AM +0800, HAO CHEN GUI wrote: >>> All that are arguments for expanding to split form, not for removing >>> TImode from the iterator. And you leave PTImode, which *always* is in >>> GPRs! >> From my understanding, PTImode has limitation that it needs to be assigned >> with an even/odd register pair. So it can't be split before the reload pass. > > TImode is put in an even/odd pair always as well. What is special about > PTImode here? TI is allowed in any GPR pair: it can be placed in r3/r4 or r4/r5 (both odd/even and even/odd), while PTI can only be placed in r4/r5 (even/odd). So if we split PTI before reload, that constraint would be lost and PTI could end up in any GPRs, I think. > >> Currently it is split after reload. > > This prevents almost all optimisations. Splits after reload should be a > last resort thing. They almost always cause bigger problems than what > they are meant to solve. There aren't many splitters that *have* to run > after RA! > >>> (You'll also have to show it is *correct*, you need to prove (or show it >>> really likely :-) ) that after this change there are no TImode things >>> generated anywhere (anywhere!) that are no longer handled now). >>> >> Yes, the TI may be generated after expand pass and causes ICEs. So how about >> creating two mode iterators? One is for expand which doesn't include TImode, >> another is for the split which include TImode and make TImode to be split >> as early as possible? > > You can also have the expanders fail for TImode? That gives you a good > place to put in a code comment as well ;-) > Yes, I will take it. > > Segher
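To make the placement difference concrete, here is a minimal sketch of the two rules as stated in the reply, assuming 32 GPRs and a 128-bit value occupying regno and regno + 1. The helper names `ti_pair_ok` and `pti_pair_ok` are hypothetical; this is not the rs6000 backend's actual register-validity code, and it ignores reserved registers such as r1/r2.

```c
#include <stdbool.h>

/* Hypothetical model only: a 128-bit value uses GPRs regno and regno + 1.
   Reserved registers (r1, r2, ...) are deliberately ignored. */
#define LAST_GPR 31u

/* TImode: any pair of consecutive GPRs, e.g. r3/r4 or r4/r5. */
static bool
ti_pair_ok (unsigned regno)
{
  return regno + 1 <= LAST_GPR;
}

/* PTImode: the pair must additionally start on an even register, e.g. r4/r5. */
static bool
pti_pair_ok (unsigned regno)
{
  return ti_pair_ok (regno) && regno % 2 == 0;
}
```

Once PTI is split into two independent 64-bit halves before reload, nothing ties the halves to an even/odd pair any more, which is the concern raised above.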
[committed] openmp: For min/max omp atomic compare forms verify arg types with build_binary_op [PR104531]
Hi! The MIN_EXPR/MAX_EXPR handling in *build_binary_op is minimal (especially for C FE), because min/max aren't expressions the languages contain directly. I'm using those for the #pragma omp atomic x = x < y ? y : x; forms, but e.g. for the attached testcase we normally reject _Complex int vs. int comparisons, in C++ due to MIN/MAX_EXPR we were diagnosing it as invalid types for while in C we accept it and ICEd later on. The following patch will try build_binary_op with LT_EXPR on the operands first to get needed diagnostics and fail if it returns error_mark_node. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2022-02-16 Jakub Jelinek PR c/104531 * c-omp.cc (c_finish_omp_atomic): For MIN_EXPR/MAX_EXPR, try first build_binary_op with LT_EXPR and only if that doesn't return error_mark_node call build_modify_expr. * c-c++-common/gomp/atomic-31.c: New test. --- gcc/c-family/c-omp.cc.jj2022-02-11 00:19:22.107067674 +0100 +++ gcc/c-family/c-omp.cc 2022-02-15 16:06:24.173311609 +0100 @@ -353,8 +353,13 @@ c_finish_omp_atomic (location_t loc, enu } bool save = in_late_binary_op; in_late_binary_op = true; - x = build_modify_expr (loc, blhs ? blhs : lhs, NULL_TREE, opcode, -loc, rhs, NULL_TREE); + if ((opcode == MIN_EXPR || opcode == MAX_EXPR) + && build_binary_op (loc, LT_EXPR, blhs ? blhs : lhs, rhs, + true) == error_mark_node) +x = error_mark_node; + else +x = build_modify_expr (loc, blhs ? blhs : lhs, NULL_TREE, opcode, + loc, rhs, NULL_TREE); in_late_binary_op = save; if (x == error_mark_node) return error_mark_node; --- gcc/testsuite/c-c++-common/gomp/atomic-31.c.jj 2022-02-15 17:02:17.938486108 +0100 +++ gcc/testsuite/c-c++-common/gomp/atomic-31.c 2022-02-15 17:04:23.811729201 +0100 @@ -0,0 +1,11 @@ +/* c/104531 */ +/* { dg-do compile } */ + +int x; + +void +foo (_Complex int y) +{ + #pragma omp atomic compare /* { dg-error "invalid operands" } */ + x = x > y ? y : x; +} Jakub
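For contrast with the new atomic-31.c test, where `y` is `_Complex int` and the two operands cannot be compared, a well-typed use of the same min form is sketched below. It assumes a compiler with OpenMP 5.1 `atomic compare` support (for example GCC 12 with -fopenmp). With `int` operands the preliminary `LT_EXPR` check succeeds and the atomic update is built as before. Without -fopenmp the pragma is ignored and this is a plain conditional store.

```c
/* Well-typed "atomic compare" min form; both operands are int, so the
   x > y comparison is valid and no diagnostic is expected. */
int x;

void
update_min (int y)
{
  #pragma omp atomic compare
  x = x > y ? y : x;
}
```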
[PATCH v2, rs6000] Enable absolute jump table for PPC AIX and Linux
Hi, This patch enables absolute jump tables on PPC AIX and Linux. For AIX, the jump table is placed in the data section. For Linux, it is placed in the RELRO section when relocation is needed. Bootstrapped and tested on AIX, Linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-02-16 Haochen Gui gcc/ * config/rs6000/aix.h (JUMP_TABLES_IN_TEXT_SECTION): Define. * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Likewise. * config/rs6000/rs6000.cc (rs6000_option_override_internal): Enable absolute jump tables for AIX and Linux. (rs6000_xcoff_function_rodata_section): Implement. * config/rs6000/xcoff.h (TARGET_ASM_FUNCTION_RODATA_SECTION): Define. diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h index ad3238bf09a..b52208c2ee7 100644 --- a/gcc/config/rs6000/aix.h +++ b/gcc/config/rs6000/aix.h @@ -253,7 +253,7 @@ /* Indicate that jump tables go in the text section. */ -#define JUMP_TABLES_IN_TEXT_SECTION 1 +#define JUMP_TABLES_IN_TEXT_SECTION 0 /* Define any extra SPECS that the compiler needs to generate. */ #undef SUBTARGET_EXTRA_SPECS diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h index b2a7afabc73..16df9ef167f 100644 --- a/gcc/config/rs6000/linux64.h +++ b/gcc/config/rs6000/linux64.h @@ -239,7 +239,7 @@ extern int dot_symbols; /* Indicate that jump tables go in the text section. */ #undef JUMP_TABLES_IN_TEXT_SECTION -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT +#define JUMP_TABLES_IN_TEXT_SECTION 0 /* The linux ppc64 ABI isn't explicit on whether aggregates smaller than a doubleword should be padded upward or downward. 
You could diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index bc3ef0721a4..e9c5552c082 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4954,6 +4954,10 @@ rs6000_option_override_internal (bool global_init_p) warning (0, "%qs is deprecated and not recommended in any circumstances", "-mno-speculate-indirect-jumps"); + /* Enable absolute jump tables for AIX and Linux. */ + if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2) +rs6000_relative_jumptables = 0; + return ret; } @@ -28751,6 +28755,15 @@ constant_generates_xxspltidp (vec_const_128bit_type *vsx_const) return sf_value; } +section * rs6000_xcoff_function_rodata_section (tree decl ATTRIBUTE_UNUSED, + bool relocatable) +{ + if (relocatable) +return data_section; + else +return readonly_data_section; +} + struct gcc_target targetm = TARGET_INITIALIZER; diff --git a/gcc/config/rs6000/xcoff.h b/gcc/config/rs6000/xcoff.h index cd0f99cb9c6..0dacd86eed9 100644 --- a/gcc/config/rs6000/xcoff.h +++ b/gcc/config/rs6000/xcoff.h @@ -98,7 +98,7 @@ #define TARGET_ASM_SELECT_SECTION rs6000_xcoff_select_section #define TARGET_ASM_SELECT_RTX_SECTION rs6000_xcoff_select_rtx_section #define TARGET_ASM_UNIQUE_SECTION rs6000_xcoff_unique_section -#define TARGET_ASM_FUNCTION_RODATA_SECTION default_no_function_rodata_section +#define TARGET_ASM_FUNCTION_RODATA_SECTION rs6000_xcoff_function_rodata_section #define TARGET_STRIP_NAME_ENCODING rs6000_xcoff_strip_name_encoding #define TARGET_SECTION_TYPE_FLAGS rs6000_xcoff_section_type_flags #ifdef HAVE_AS_TLS
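The difference between the two jump-table layouts can be illustrated with GNU C labels as values. This is only an analogy: the compiler emits switch tables itself rather than from code like this, and the function and label names below are illustrative. An absolute table stores full code addresses, so in PIC code each entry needs a load-time relocation, which is why such a table has to go to a data or RELRO section rather than the text section. A relative table stores offsets from a base label and needs no relocations.

```c
/* Absolute table: each entry is a full code address (needs relocations
   when the code is position independent). */
int
dispatch_absolute (int i)
{
  static void *const table[] = { &&case0, &&case1, &&case2 };
  goto *table[i];
 case0: return 10;
 case1: return 20;
 case2: return 30;
}

/* Relative table: each entry is an offset from a base label, so the
   table itself is position independent and can stay read-only. */
int
dispatch_relative (int i)
{
  static const int offsets[] = { &&case0 - &&case0, &&case1 - &&case0,
                                 &&case2 - &&case0 };
  goto *(&&case0 + offsets[i]);
 case0: return 10;
 case1: return 20;
 case2: return 30;
}
```

Both functions behave identically. The trade-off is an extra add at dispatch time for the relative form versus relocated, writable-at-load-time data for the absolute form.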
[PATCH] combine: Fix up -fcompare-debug issue in the combiner [PR104544]
Hi! On the following testcase on aarch64-linux, we behave differently with -g and -g0. The problem is that on: (insn 10011 10010 10012 2 (set (reg:CC 66 cc) (compare:CC (reg:DI 105) (const_int 0 [0]))) "pr104544.c":18:3 407 {cmpdi} (expr_list:REG_DEAD (reg:DI 105) (nil))) (insn 10012 10011 10013 2 (set (reg:SI 109) (eq:SI (reg:CC 66 cc) (const_int 0 [0]))) "pr104544.c":18:3 444 {aarch64_cstoresi} (expr_list:REG_DEAD (reg:CC 66 cc) (nil))) (insn 10013 10012 10016 2 (set (reg:DI 110) (zero_extend:DI (reg:SI 109))) "pr104544.c":18:3 111 {*zero_extendsidi2_aarch64} (expr_list:REG_DEAD (reg:SI 109) (nil))) (insn 10016 10013 10017 2 (parallel [ (set (reg:CC 66 cc) (compare:CC (const_int 0 [0]) (reg:DI 110))) (set (reg:DI 111) (neg:DI (reg:DI 110))) ]) "pr104544.c":18:3 281 {negdi_carryout} (expr_list:REG_DEAD (reg:DI 110) (nil))) ... (debug_insn 6 5 7 2 (var_location:SI y (debug_expr:SI D#5)) "pr104544.c":18:3 -1 (nil)) (debug_insn 7 6 10033 2 (debug_marker) "pr104544.c":11:3 -1 (nil)) (insn 10033 7 10034 2 (set (reg:DI 117 [ _14 ]) (ior:DI (reg:DI 111) (reg:DI 112))) "pr104544.c":11:6 496 {iordi3} (expr_list:REG_DEAD (reg:DI 112) (expr_list:REG_DEAD (reg:DI 111) (nil we successfully split 3 insns into two: Trying 10011, 10013 -> 10016: 10011: cc:CC=cmp(r105:DI,0) REG_DEAD r105:DI 10013: r110:DI=cc:CC==0 REG_DEAD cc:CC 10016: {cc:CC=cmp(0,r110:DI);r111:DI=-r110:DI;} REG_DEAD r110:DI Failed to match this instruction: (parallel [ (set (reg:CC 66 cc) (compare:CC (reg:DI 105) (const_int 0 [0]))) (set (reg:DI 111) (neg:DI (eq:DI (reg:DI 105) (const_int 0 [0] ]) Failed to match this instruction: (parallel [ (set (reg:CC 66 cc) (compare:CC (reg:DI 105) (const_int 0 [0]))) (set (reg:DI 111) (neg:DI (eq:DI (reg:DI 105) (const_int 0 [0] ]) Successfully matched this instruction: (set (reg:DI 111) (neg:DI (eq:DI (reg:DI 105) (const_int 0 [0] Successfully matched this instruction: (set (reg:CC 66 cc) (compare:CC (reg:DI 105) (const_int 0 [0]))) Successfully matched this instruction: 
(set (reg:DI 112) (neg:DI (eq:DI (reg:CC 66 cc) (const_int 0 [0] allowing combination of insns 10011, 10013 and 10016 original costs 4 + 4 + 4 = 16 replacement costs 4 + 4 = 12 deferring deletion of insn with uid = 10011. but the code that searches forward for insns to update their log links (before the change there is a link from insn 10033 to insn 10016 for pseudo 111) only finds insn 10033 and updates the log link if -g isn't enabled, otherwise it stops earlier because there are debug insns in between. So, with -g LOG_LINKS of 10033 isn't updated, points eventually to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other insns, while with -g0 we do. The following patch fixes that by instead ignoring debug insns during the searching. We can still check BLOCK_FOR_INSN (insn) on those, because if we notice DEBUG_INSN in a following basic block, necessarily there won't be any further normal insns in the current block after it. Bootstrapped/regtested on x86_64-linux and i686-linux, bootstrapped on aarch64-linux, regtest on aarch64-linux still pending, ok for trunk if it succeeds? 2022-02-16 Jakub Jelinek PR rtl-optimization/104544 * combine.cc (try_combine): When looking for insn whose links should be updated from i3 to i2, don't stop on debug insns, instead skip over them. * gcc.dg/pr104544.c: New test. 
--- gcc/combine.cc.jj 2022-02-11 13:51:56.294928090 +0100 +++ gcc/combine.cc 2022-02-15 14:15:41.663012950 +0100 @@ -4223,10 +4223,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, for (rtx_insn *insn = NEXT_INSN (i3); !done && insn - && NONDEBUG_INSN_P (insn) + && INSN_P (insn) && BLOCK_FOR_INSN (insn) == this_basic_block; insn = NEXT_INSN (insn)) { + if (DEBUG_INSN_P (insn)) + continue; struct insn_link *link; FOR_EACH_LOG_LINK (link, insn) if (link->insn == i3 && link->regno == regno) --- gcc/testsuite/gcc.dg/pr104544.c.jj 2022-02-15 14:17:50.154221461 +0100 +++ gcc/testsuite/gcc.dg/pr104544.c 2022-02-15 14:17:34.441440536 +0100 @@ -0,0 +1,19 @@ +/* PR rtl-optimization/104544 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -fcompare-debug" } */ + +int m, n; +__int128 q; + +void +bar (unsigned __int128 x, int y) +{ + if (x) +q += y; +} + +void +foo (void) +{ + bar (!!q - 1, (m += m ? m : 1)
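A toy model of the loop change may help show why the old form made -g and -g0 diverge. The `toy_insn` structure and both search functions are hypothetical stand-ins for the rtx_insn walk in try_combine, not GCC code. With the old condition, a debug insn ends the walk, so a later user of the register is never found. With the fix, debug insns are skipped while the basic-block check still bounds the search.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy insn stream, not GCC's rtx_insn. */
struct toy_insn
{
  bool is_debug;          /* like DEBUG_INSN_P */
  int block;              /* like BLOCK_FOR_INSN */
  int link_to;            /* uid its log link points at, or -1 */
  struct toy_insn *next;
};

/* Old behaviour: the loop condition rejects debug insns, so the walk
   stops at the first one (only happens with -g). */
static struct toy_insn *
find_user_old (struct toy_insn *start, int block, int i3)
{
  for (struct toy_insn *insn = start;
       insn && !insn->is_debug && insn->block == block;
       insn = insn->next)
    if (insn->link_to == i3)
      return insn;
  return NULL;
}

/* Fixed behaviour: skip debug insns, still stop at the end of the block. */
static struct toy_insn *
find_user_new (struct toy_insn *start, int block, int i3)
{
  for (struct toy_insn *insn = start;
       insn && insn->block == block;
       insn = insn->next)
    {
      if (insn->is_debug)
        continue;
      if (insn->link_to == i3)
        return insn;
    }
  return NULL;
}
```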
[PATCH] Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of the same type when convert is extension.
> > +(match (cond_expr_convert_p @0 @2 @3 @6) > > + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3)) > > + (if (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) > > But in principle @2 or @3 could safely differ in sign, you'd then need to > ensure > to insert sign conversions to @2/@3 to the signedness of @4/@5. > It turns out that differing in sign is not safe for extension (though it is fine for truncation), because zero_extend and sign_extend give different results. The patch adds a types_match check when the convert is an extension. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}, and natively bootstrapped and regtested on CLX. Ok for trunk? gcc/ChangeLog: PR tree-optimization/104551 PR tree-optimization/103771 * match.pd (cond_expr_convert_p): Add types_match check when convert is extension. * tree-vect-patterns.cc (gimple_cond_expr_convert_p): Adjust comments. (vect_recog_cond_expr_convert_pattern): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr104551.c: New test. --- gcc/match.pd | 8 +--- gcc/testsuite/gcc.target/i386/pr104551.c | 24 gcc/tree-vect-patterns.cc| 6 -- 3 files changed, 33 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr104551.c diff --git a/gcc/match.pd b/gcc/match.pd index 05a10ab6bfd..8e80b9f1576 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -7692,11 +7692,13 @@ and, (if (INTEGRAL_TYPE_P (type) && INTEGRAL_TYPE_P (TREE_TYPE (@2)) && INTEGRAL_TYPE_P (TREE_TYPE (@0)) - && INTEGRAL_TYPE_P (TREE_TYPE (@3)) && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0)) && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) - && TYPE_PRECISION (TREE_TYPE (@0)) - == TYPE_PRECISION (TREE_TYPE (@3)) + && (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) + || ((TYPE_PRECISION (TREE_TYPE (@0)) + == TYPE_PRECISION (TREE_TYPE (@3))) + && INTEGRAL_TYPE_P (TREE_TYPE (@3)) + && TYPE_PRECISION (TREE_TYPE (@3)) > TYPE_PRECISION (type))) && single_use (@4) && single_use (@5 diff --git a/gcc/testsuite/gcc.target/i386/pr104551.c 
b/gcc/testsuite/gcc.target/i386/pr104551.c new file mode 100644 index 000..6300f25c0d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104551.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mavx2" } */ +/* { dg-require-effective-target avx2 } */ + +unsigned int +__attribute__((noipa)) +test(unsigned int a, unsigned char p[16]) { + unsigned int res = 0; + for (unsigned b = 0; b < a; b += 1) +res = p[b] ? p[b] : (char) b; + return res; +} + +int main () +{ + unsigned int a = 16U; + unsigned char p[16]; + for (int i = 0; i != 16; i++) +p[i] = (unsigned char)128; + unsigned int res = test (a, p); + if (res != 128) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index a8f96d59643..217bdfd7045 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -929,8 +929,10 @@ vect_reassociating_reduction_p (vec_info *vinfo, with conditions: 1) @1, @2, c, d, a, b are all integral type. 2) There's single_use for both @1 and @2. - 3) a, c and d have same precision. + 3) a, c have same precision. 4) c and @1 have different precision. + 5) c, d are the same type or they can differ in sign when convert is + truncation. record a and c and d and @3. */ @@ -952,7 +954,7 @@ extern bool gimple_cond_expr_convert_p (tree, tree*, tree (*)(tree)); TYPE_PRECISION (TYPE_E) != TYPE_PRECISION (TYPE_CD); TYPE_PRECISION (TYPE_AB) == TYPE_PRECISION (TYPE_CD); single_use of op_true and op_false. - TYPE_AB could differ in sign. + TYPE_AB could differ in sign when (TYPE_E) A is a truncation. Input: -- 2.18.1
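The pr104551.c test mixes an `unsigned char` arm (`p[b]`) with a `(char) b` arm, and `char` is signed on x86. The following minimal C illustration shows why two sources that differ only in sign can share one narrowing conversion but not one widening conversion: widening depends on whether the 8-bit source is zero- or sign-extended, while narrowing just keeps the low bits. The function names are illustrative.

```c
#include <stdint.h>

/* Widening an 8-bit value: the result depends on the source signedness. */
uint32_t widen_unsigned (uint8_t v) { return (uint32_t) v; }  /* zero_extend */
uint32_t widen_signed (int8_t v) { return (uint32_t) v; }     /* sign_extend */

/* Narrowing to 8 bits keeps only the low bits, so the signedness of the
   32-bit source makes no difference. */
uint8_t narrow_from_unsigned (uint32_t v) { return (uint8_t) v; }
uint8_t narrow_from_signed (int32_t v) { return (uint8_t) v; }
```

The same 8-bit pattern 0x80 widens to 128 or to 0xFFFFFF80 depending on its type, which is exactly the `res != 128` failure the new test checks for.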
Re: [PATCH] rs6000: Retry tbegin. instructions that can fail intermittently
On Tue, Feb 15, 2022 at 04:59:45PM -0600, Peter Bergner wrote: > > That is the way any HTM code should be written in the first place > > (except for rollback-only transactions, but let's not go there -- > > besides, it is normal for those to fail as well, and there needs to be a > > fallback there as well :-) ) > > Agreed and I'm not sure why I didn't write it that way to begin with. > Maybe I thought it was so simple that the likelihood of it failing was > so small we'd never see it? Anyway, we do now, so... Yeah, and you perhaps were misled by not seeing it fail in any testing (it fails only .02% of the time you said). For that reason it helps to make testcases fail *more* often. That isn't very trivial to do with HTM of course. Since we don't do HTM anymore it will all fade away, and let's not bother, why am I typing still :-) Segher
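The "retry, then fall back" structure discussed above can be sketched without real HTM. In the sketch below, `try_begin_transaction` is a hypothetical stand-in for `tbegin.` that fails a configurable number of times. Real code would use the target's HTM builtins and put a genuine non-transactional path (such as taking a lock) in the fallback branch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for tbegin.: fails for the first
   `pending_failures` attempts, then succeeds. */
static int pending_failures;

static bool
try_begin_transaction (void)
{
  if (pending_failures > 0)
    {
      pending_failures--;
      return false;
    }
  return true;
}

enum path { TRANSACTIONAL, FALLBACK };

/* Retry a bounded number of times, then take the non-transactional path. */
static enum path
run_with_retry (int max_attempts)
{
  for (int attempt = 0; attempt < max_attempts; attempt++)
    if (try_begin_transaction ())
      return TRANSACTIONAL;   /* transaction body and tend. would go here */
  return FALLBACK;            /* e.g. acquire a lock instead */
}
```

Forcing `pending_failures` high is also one cheap way to make the failure path run far more often than the roughly .02% seen in practice.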
[pushed] aarch64: Extend PR100056 patterns to +
pr100056.c contains things like: int or_shift_u3a (unsigned i) { i &= 7; return i | (i << 11); } After g:96146e61cd7aee62c21c2845916ec42152918ab7, the preferred gimple representation of this is a multiplication: i_2 = i_1(D) & 7; _5 = i_2 * 2049; Expand then open-codes the multiplication back to individual shifts, but (of course) it uses + rather than | to combine the shifts. This means that we end up with the RTL equivalent of: i + (i << 11) I wondered about canonicalising the + to | (*back* to | in this case) when the operands have no set bits in common and when one of the operands is &, | or ^, but that didn't seem to be a popular idea when I asked on IRC. The feeling seemed to be that + is inherently simpler than |, so we shouldn't be “simplifying” the other way. This patch therefore adjusts the PR100056 patterns to handle + as well as |, in cases where the operands are provably disjoint. For: int or_shift_u8 (unsigned char i) { return i | (i << 11); } the instructions: 2: r95:SI=zero_extend(x0:QI) REG_DEAD x0:QI 7: r98:SI=r95:SI<<0xb are combined into: (parallel [ (set (reg:SI 98) (and:SI (ashift:SI (reg:SI 0 x0 [ i ]) (const_int 11 [0xb])) (const_int 522240 [0x7f800]))) (set (reg/v:SI 95 [ i ]) (zero_extend:SI (reg:QI 0 x0 [ i ]))) ]) which fails to match, but which is then split into its individual (independent) sets. Later the zero_extend is combined with the add to get an ADD UXTB: (set (reg:SI 99) (plus:SI (zero_extend:SI (reg:QI 0 x0 [ i ])) (reg:SI 98))) This means that there is never a 3-insn combo to match the split against. The end result is therefore: ubfiz w1, w0, 11, 8 add w0, w1, w0, uxtb This is a bit redundant, since it's doing the zero_extend twice. It is at least 2 instructions though, rather than the 3 that we had before the original patch for PR100056. or_shift_u8_asm is affected similarly. The net effect is that we do still have 2 UBFIZs, but we're at least back down to 2 instructions per function, as for GCC 11. 
I think that's good enough for now. There are probably other instructions that should be extended to support + as well as | (e.g. the EXTR ones), but those aren't regressions and so are GCC 13 material. Tested on aarch64-linux-gnu & pushed. Richard gcc/ PR target/100056 * config/aarch64/iterators.md (LOGICAL_OR_PLUS): New iterator. * config/aarch64/aarch64.md: Extend the PR100056 patterns to handle plus in the same way as ior, if the operands have no set bits in common. gcc/testsuite/ PR target/100056 * gcc.target/aarch64/pr100056.c: XFAIL the original UBFIZ test and instead expect two UBFIZs + two ADD UXTBs. --- gcc/config/aarch64/aarch64.md | 33 ++--- gcc/config/aarch64/iterators.md | 3 ++ gcc/testsuite/gcc.target/aarch64/pr100056.c | 4 ++- 3 files changed, 29 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 64cc21d5802..590918464b8 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4558,7 +4558,7 @@ (define_insn "*_3" (define_split [(set (match_operand:GPI 0 "register_operand") - (LOGICAL:GPI + (LOGICAL_OR_PLUS:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") (match_operand:QI 2 "aarch64_shift_imm_")) (match_operand:GPI 3 "const_int_operand")) @@ -4571,16 +4571,23 @@ (define_split && REGNO (operands[1]) == REGNO (operands[4]))) && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[4])) << INTVAL (operands[2]), mode) - == INTVAL (operands[3]))" + == INTVAL (operands[3])) + && ( != PLUS + || (GET_MODE_MASK (GET_MODE (operands[4])) + & INTVAL (operands[3])) == 0)" [(set (match_dup 5) (zero_extend:GPI (match_dup 4))) - (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 5) (match_dup 2)) - (match_dup 5)))] - "operands[5] = gen_reg_rtx (mode);" + (set (match_dup 0) (match_dup 6))] + { +operands[5] = gen_reg_rtx (mode); +rtx shift = gen_rtx_ASHIFT (mode, operands[5], operands[2]); +rtx_code new_code = ( == PLUS ? 
IOR : ); +operands[6] = gen_rtx_fmt_ee (new_code, mode, shift, operands[5]); + } ) (define_split [(set (match_operand:GPI 0 "register_operand") - (LOGICAL:GPI + (LOGICAL_OR_PLUS:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") (match_operand:QI 2 "aarch64_shift_imm_")) (match_operand:GPI 4 "const_int_operand")) @@ -4589,11 +4596,17 @@ (define_split && pow2_or_zerop (UINTVAL (operands[3]
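The new `!= PLUS` condition relies on a simple identity: when two values have no set bits in common, adding them produces no carries, so `a + b == (a | b)`. For or_shift_u8 the zero-extended byte covers bits 0-7 (mask 0xff), while the shifted copy is confined to 0x7f800 (522240), so the masks are disjoint. The same reasoning explains why the gimple multiplication by 2049 (2^11 + 1) reproduces the original `|` form. The helper names below are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a and b share no set bits; then a + b == (a | b). */
static bool
disjoint (uint32_t a, uint32_t b)
{
  return (a & b) == 0;
}

/* or_shift_u8 written with | and with +. */
static uint32_t or_form (uint8_t i) { return i | ((uint32_t) i << 11); }
static uint32_t plus_form (uint8_t i) { return i + ((uint32_t) i << 11); }

/* The gimple form of or_shift_u3a: (i & 7) * 2049, where 2049 = 2^11 + 1. */
static uint32_t mul_form (uint32_t i) { return (i & 7) * 2049; }
```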
[pushed] aarch64: Remove XFAIL for bic-bitmask-1.c
bic-bitmask-1.c is now passing, so remove the XFAIL. Tested on aarch64-linux-gnu & pushed. Richard gcc/testsuite/ * gcc.target/aarch64/bic-bitmask-1.c: Remove XFAIL. --- gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c b/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c index 568c1ffc8bc..bcb9cdd494e 100644 --- a/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c +++ b/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c @@ -10,4 +10,4 @@ uint32x4_t foo (int32x4_t a) return vceqq_s32 (vbicq_s32 (a, cst), zero); } -/* { dg-final { scan-assembler-not {\tbic\t} { xfail { aarch64*-*-* } } } } */ +/* { dg-final { scan-assembler-not {\tbic\t} } } */ -- 2.25.1
[pushed] aarch64: Tweak atomic-inst-cas.c options
atomic-inst-cas.c has code to skip __atomic_compare_exchange_n calls for invalid memory orderings, but -Winvalid-memory-model applies before the dead code is removed (which is the right behaviour IMO). This patch therefore suppresses the warning for this test. Tested on aarch64-linux-gnu & pushed. Richard gcc/testsuite/ * gcc.target/aarch64/atomic-inst-cas.c: Add -Wno-invalid-memory-model. --- gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c index f6f28922319..0b4533adade 100644 --- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c +++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c @@ -1,5 +1,7 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -march=armv8-a+lse" } */ +/* -Winvalid-memory-model warnings are issued before the dead invalid calls + are removed. */ +/* { dg-options "-O2 -march=armv8-a+lse -Wno-invalid-memory-model" } */ /* Test ARMv8.1-A CAS instruction. */ -- 2.25.1
Re: [PATCH], PR target/99708 - Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__
On Tue, Feb 15, 2022 at 06:05:06PM -0500, Michael Meissner wrote: > On Tue, Feb 15, 2022 at 04:05:11PM -0600, Segher Boessenkool wrote: > > On all older compilers these macros will not be defined, but the types > > often are. If you are willing to not support older compilers properly > > anyway, you could just *always* use the types, which will work with most > > very old compilers as well (and the approach using these proposed > > predefines will *not*!) > > The types are not defined on older systems. They have been defined since GCC 6. Using new macros to see if the types exist will miss all of GCC 6, 7, 8, 9, 10, 11. > Both __ibm128 (ibm128_float_type_node) and __float128 > (ieee128_float_type_node) > are only defined if TARGET_FLOAT128_TYPE is true. > > TARGET_FLOAT128_TYPE is only true if both TARGET_FLOAT128_ENABLE_TYPE and > TARGET_VSX are true. > > TARGET_FLOAT128_ENABLE_TYPE is only true on linux64 systems. > > Now, the code to set __SIZEOF_IBM128__ and __SIZEOF_FLOAT128__ is in the code > that also defines __FLOAT128__. This code checks whether the __float128 and > __ibm128 keywords are allowed. These keywords are only set if > TARGET_FLOAT128_TYPE is true, and if the user did not use the -mno-float128 > option. In the GCC 7 time frame, we did not set this by default, but in the > modern compilers, it is always set by default on Linux 64-bit systems. __SIZEOF_IBM128__ should be defined based on what we have for the __ibm128 type, not on what we have for the __float128 type. Yes, I know we *currently* define those under the same conditions, but why write code that is more fragile than needed? Please don't. It is easy to do it correctly, so it is no real hassle for the writer of the code, and it is much better for the reader. Segher
Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).
On 05/02/2022 19:09, Hafiz Abid Qadeer wrote: > On 04/02/2022 11:25, Hafiz Abid Qadeer wrote: >> On 04/02/2022 09:46, Thomas Schwinge wrote: >> >>> >>> Abid, are you going to address these? I think it does make sense if the >>> C/C++ and Fortran test cases match as much as feasible. >>> >> Sure. I will do that. > > The attached patch addresses those issues, apart from removing the pool_size trait. Is this change ok to commit? Thanks, -- Hafiz Abid Qadeer
[PATCH] selftest: Move C-specific tests to c_family
When trying to make use of the selftest framework over on the rust frontend, we ran into issues where rust1 was expected to produce errors containing C-like type names such as `int`. I had gotten in contact with David Malcolm on the gcc mailing list [1], who advised moving some test functions to a better location. The offending functions have also been renamed in order to better fit the C family of tests, and are thus not called when performing general selftests anymore. Kindly, [1]: https://gcc.gnu.org/pipermail/gcc/2021-November/237703.html 2022-02-16 Arthur Cohen * diagnostic.cc (diagnostic_cc_tests): Rename to... (c_diagnostic_cc_tests): ...this. * opt_problem.cc (opt_problem_cc_tests): Rename to... (c_opt_problem_cc_tests): ...this. --- gcc/c-family/c-common.cc | 2 ++ gcc/c-family/c-common.h | 2 ++ gcc/diagnostic.cc | 2 +- gcc/opt-problem.cc| 2 +- gcc/selftest-run-tests.cc | 2 -- gcc/selftest.h| 2 -- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index 7203d761df1..d034837bb5b 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -9120,6 +9120,8 @@ c_family_tests (void) c_indentation_cc_tests (); c_pretty_print_cc_tests (); c_spellcheck_cc_tests (); + c_diagnostic_cc_tests (); + c_opt_problem_cc_tests (); } } // namespace selftest diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index a8d6f82bb2c..ed20c5837be 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1513,8 +1513,10 @@ extern tree braced_lists_to_strings (tree, tree); namespace selftest { /* Declarations for specific families of tests within c-family, by source file, in alphabetical order. 
*/ + extern void c_diagnostic_cc_tests (void); extern void c_format_cc_tests (void); extern void c_indentation_cc_tests (void); + extern void c_opt_problem_cc_tests (void); extern void c_pretty_print_cc_tests (void); extern void c_spellcheck_cc_tests (void); diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc index 87eb473d2f3..73324a728fe 100644 --- a/gcc/diagnostic.cc +++ b/gcc/diagnostic.cc @@ -2472,7 +2472,7 @@ test_num_digits () /* Run all of the selftests within this file. */ void -diagnostic_cc_tests () +c_diagnostic_cc_tests () { test_print_escaped_string (); test_print_parseable_fixits_none (); diff --git a/gcc/opt-problem.cc b/gcc/opt-problem.cc index e45d14e94b6..11fec57d679 100644 --- a/gcc/opt-problem.cc +++ b/gcc/opt-problem.cc @@ -324,7 +324,7 @@ test_opt_result_failure_at (const line_table_case &case_) /* Run all of the selftests within this file. */ void -opt_problem_cc_tests () +c_opt_problem_cc_tests () { test_opt_result_success (); for_each_line_table_case (test_opt_result_failure_at); diff --git a/gcc/selftest-run-tests.cc b/gcc/selftest-run-tests.cc index 99c35423253..d59e0aeddee 100644 --- a/gcc/selftest-run-tests.cc +++ b/gcc/selftest-run-tests.cc @@ -76,7 +76,6 @@ selftest::run_tests () json_cc_tests (); cgraph_cc_tests (); optinfo_emit_json_cc_tests (); - opt_problem_cc_tests (); ordered_hash_map_tests_cc_tests (); splay_tree_cc_tests (); @@ -95,7 +94,6 @@ selftest::run_tests () /* Higher-level tests, or for components that other selftests don't rely on. 
*/ diagnostic_show_locus_cc_tests (); - diagnostic_cc_tests (); diagnostic_format_json_cc_tests (); edit_context_cc_tests (); fold_const_cc_tests (); diff --git a/gcc/selftest.h b/gcc/selftest.h index 7a715631c62..7568a6d24d4 100644 --- a/gcc/selftest.h +++ b/gcc/selftest.h @@ -222,7 +222,6 @@ extern void attribs_cc_tests (); extern void bitmap_cc_tests (); extern void cgraph_cc_tests (); extern void convert_cc_tests (); -extern void diagnostic_cc_tests (); extern void diagnostic_format_json_cc_tests (); extern void diagnostic_show_locus_cc_tests (); extern void digraph_cc_tests (); @@ -238,7 +237,6 @@ extern void hash_map_tests_cc_tests (); extern void hash_set_tests_cc_tests (); extern void input_cc_tests (); extern void json_cc_tests (); -extern void opt_problem_cc_tests (); extern void optinfo_emit_json_cc_tests (); extern void opts_cc_tests (); extern void ordered_hash_map_tests_cc_tests (); -- 2.35.1 -- Arthur Cohen Toolchain Engineer Embecosm GmbH Managing Director: Jeremy Bennett Branch: Nürnberg Commercial register: HR-B 36368 www.embecosm.de Fürther Str. 27 90429 Nürnberg Tel.: 091 - 128 707 040 Fax: 091 - 128 707 077
Re: [PATCH] combine: Fix up -fcompare-debug issue in the combiner [PR104544]
Hi! On Wed, Feb 16, 2022 at 09:53:34AM +0100, Jakub Jelinek wrote: > On the following testcase on aarch64-linux, we behave differently > with -g and -g0. [ huge snip ] > The following patch fixes that by instead ignoring debug insns during the > searching. We can still check BLOCK_FOR_INSN (insn) on those, because > if we notice DEBUG_INSN in a following basic block, necessarily there won't > be any further normal insns in the current block after it. > --- gcc/combine.cc.jj 2022-02-11 13:51:56.294928090 +0100 > +++ gcc/combine.cc2022-02-15 14:15:41.663012950 +0100 > @@ -4223,10 +4223,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, > for (rtx_insn *insn = NEXT_INSN (i3); > !done > && insn > -&& NONDEBUG_INSN_P (insn) > +&& INSN_P (insn) > && BLOCK_FOR_INSN (insn) == this_basic_block; > insn = NEXT_INSN (insn)) > { > + if (DEBUG_INSN_P (insn)) > + continue; > struct insn_link *link; > FOR_EACH_LOG_LINK (link, insn) > if (link->insn == i3 && link->regno == regno) About half of the similar loops in combine.c are still broken this way, from a quick sampling :-( Okay for trunk and all backports you may want. Thanks! Segher
Re: [PATCH] combine: Fix up -fcompare-debug issue in the combiner [PR104544]
On Wed, Feb 16, 2022 at 04:44:58AM -0600, Segher Boessenkool wrote: > > --- gcc/combine.cc.jj 2022-02-11 13:51:56.294928090 +0100 > > +++ gcc/combine.cc 2022-02-15 14:15:41.663012950 +0100 > > @@ -4223,10 +4223,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, > > for (rtx_insn *insn = NEXT_INSN (i3); > >!done > >&& insn > > - && NONDEBUG_INSN_P (insn) > > + && INSN_P (insn) > >&& BLOCK_FOR_INSN (insn) == this_basic_block; > >insn = NEXT_INSN (insn)) > > { > > + if (DEBUG_INSN_P (insn)) > > + continue; > > struct insn_link *link; > > FOR_EACH_LOG_LINK (link, insn) > > if (link->insn == i3 && link->regno == regno) > > About half of the similar loops in combine.c are still broken this way, > from a quick sampling :-( Looking for just NONDEBUG_INSN_P, I don't see any other than this. > Okay for trunk and all backports you may want. Thanks! Thanks. Jakub
[PATCH][gcc][middle-end] PR104498: Fix comparing symbol reference
Hi, As reported in PR104498, the issue is that compare_base_symbol_refs swaps x and y but doesn't take that into account when computing the distance. This patch makes sure that if x and y are swapped, we correct the distance computation by multiplying it by -1, so that we end up with the expected result of the original Y_BASE - X_BASE. Bootstrapped and regression tested on aarch64-none-linux. OK for trunk? gcc/ChangeLog: PR middle-end/104498 * alias.cc (compare_base_symbol_refs): Correct distance computation when swapping x and y. diff --git a/gcc/alias.cc b/gcc/alias.cc index 3fd71cff2e2b488bc39fcf7d937e118b96f491ab..8c08452e0acfcbf1bfd8fd2e8cd420b5b929d6b4 100644 --- a/gcc/alias.cc +++ b/gcc/alias.cc @@ -2195,6 +2195,7 @@ compare_base_symbol_refs (const_rtx x_base, const_rtx y_base, tree x_decl = SYMBOL_REF_DECL (x_base); tree y_decl = SYMBOL_REF_DECL (y_base); bool binds_def = true; + bool swap = false; if (XSTR (x_base, 0) == XSTR (y_base, 0)) return 1; @@ -2204,6 +2205,7 @@ compare_base_symbol_refs (const_rtx x_base, const_rtx y_base, { if (!x_decl) { + swap = true; std::swap (x_decl, y_decl); std::swap (x_base, y_base); } @@ -2238,8 +2240,8 @@ compare_base_symbol_refs (const_rtx x_base, const_rtx y_base, if (SYMBOL_REF_BLOCK (x_base) != SYMBOL_REF_BLOCK (y_base)) return 0; if (distance) - *distance += (SYMBOL_REF_BLOCK_OFFSET (y_base) - - SYMBOL_REF_BLOCK_OFFSET (x_base)); + *distance += (swap ? -1 : 1) * (SYMBOL_REF_BLOCK_OFFSET (y_base) + - SYMBOL_REF_BLOCK_OFFSET (x_base)); return binds_def ? 1 : -1; } /* Either the symbols are equal (via aliasing) or they refer to
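A reduced model of the fix shows why the sign correction is needed. In this hypothetical sketch, the offsets stand in for SYMBOL_REF_BLOCK_OFFSET and `x_has_decl` for SYMBOL_REF_DECL being non-null. The caller always expects `y - x` in terms of the *original* arguments, so after an internal swap the computed difference must be negated.

```c
#include <stdbool.h>

/* Hypothetical reduced model of compare_base_symbol_refs' distance logic. */
static long
base_distance (long x_offset, bool x_has_decl, long y_offset)
{
  bool swap = false;
  if (!x_has_decl)
    {
      /* Mirrors std::swap (x_base, y_base) in the real code. */
      long tmp = x_offset;
      x_offset = y_offset;
      y_offset = tmp;
      swap = true;
    }
  /* Without the (swap ? -1 : 1) factor a swapped call would return the
     negated distance. */
  return (swap ? -1 : 1) * (y_offset - x_offset);
}
```

With offsets 10 and 30, both the unswapped and the swapped calls now return 20; before the fix, the swapped call would have returned -20.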
[wwwdocs PATCH v2] gcc-12: Mention -mno-direct-extern-access
--- htdocs/gcc-12/changes.html | 4 1 file changed, 4 insertions(+) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index b6341fda..7d253f29 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -399,6 +399,10 @@ a work-in-progress. Add CS prefix to call and jmp to indirect thunk with branch target in r8-r15 registers via -mindirect-branch-cs-prefix. + Always use global offset table (GOT) to access external data and + function symbols when the new -mno-direct-extern-access + command-line option is specified. + -- 2.35.1
Re: [wwwdocs PATCH] gcc-12: Mention -mno-direct-extern-access
On Sat, Feb 12, 2022 at 2:27 PM Gerald Pfeifer wrote: > > On Sat, 12 Feb 2022, H.J. Lu via Gcc-patches wrote: > > + Always use GOT to access external data and function symbols via > > + -mno-direct-extern-access. > > Maybe say "global offset table (GOT)"? Fixed, > And at first I was confused reading this, so I suggest something like > > "...when the new -mno-direct-extern-access command-line > option is specified" Fixed. > or > > "New command-line option ... that ..." ? > > Gerald Fixed in the v2 patch. Thanks. -- H.J.
Re: [GCC 11 PATCH 0/5] x86: Backport straight-line-speculation mitigation
On Tue, Feb 15, 2022 at 10:52 PM Hongtao Liu wrote: > > On Tue, Feb 1, 2022 at 2:55 AM H.J. Lu via Gcc-patches > wrote: > > > > Backport -mindirect-branch-cs-prefix: > > > > commit 48a4ae26c225eb018ecb59f131e2c4fd4f3cf89a > > Author: H.J. Lu > > Date: Wed Oct 27 06:27:15 2021 -0700 > > > > x86: Add -mindirect-branch-cs-prefix > > > > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to > > indirect thunk with branch target in r8-r15 registers so that the call > > and jmp instruction length is 6 bytes to allow them to be replaced with > > "lfence; call *%r8-r15" or "lfence; jmp *%r8-r15" at run-time. > > > > commit 63738e176726d31953deb03f7e32cf8b760735ac > > Author: H.J. Lu > > Date: Wed Oct 27 07:48:54 2021 -0700 > > > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > > > > Add -mharden-sls= to mitigate against straight line speculation (SLS) > > for function return and indirect branch by adding an INT3 instruction > > after function return and indirect branch. > > > > and followup commits to support Linux kernel commits: > > > > commit e463a09af2f0677b9485a7e8e4e70b396b2ffb6f > > Author: Peter Zijlstra > > Date: Sat Dec 4 14:43:44 2021 +0100 > > > > x86: Add straight-line-speculation mitigation > > > > commit 68cf4f2a72ef8786e6b7af6fd9a89f27ac0f520d > > Author: Peter Zijlstra > > Date: Fri Nov 19 17:50:25 2021 +0100 > > > > x86: Use -mindirect-branch-cs-prefix for RETPOLINE builds > > > > H.J. Lu (5): > > x86: Remove "%!" before ret > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > > x86: Add -mindirect-branch-cs-prefix > > x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp > > x86: Generate INT3 for __builtin_eh_return > The patch LGTM. I am pushing this patch set into GCC 11 branch. Thanks. 
> > > > gcc/config/i386/i386-opts.h | 7 > > gcc/config/i386/i386.c| 38 +-- > > gcc/config/i386/i386.md | 2 +- > > gcc/config/i386/i386.opt | 24 > > gcc/doc/invoke.texi | 18 - > > gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 +++ > > gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 +++ > > gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 +++ > > gcc/testsuite/gcc.target/i386/harden-sls-4.c | 16 > > gcc/testsuite/gcc.target/i386/harden-sls-5.c | 17 + > > gcc/testsuite/gcc.target/i386/harden-sls-6.c | 18 + > > .../i386/indirect-thunk-cs-prefix-1.c | 14 +++ > > .../i386/indirect-thunk-cs-prefix-2.c | 15 > > 13 files changed, 198 insertions(+), 13 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c > > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c > > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c > > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c > > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-5.c > > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-6.c > > create mode 100644 > > gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c > > create mode 100644 > > gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c > > > > -- > > 2.34.1 > > > > > -- > BR, > Hongtao -- H.J.
Re: [PATCH] Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of the same type when convert is extension.
On Wed, Feb 16, 2022 at 05:03:09PM +0800, liuhongt via Gcc-patches wrote: > > > +(match (cond_expr_convert_p @0 @2 @3 @6) > > > + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3)) > > > + (if (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) > > > > But in principle @2 or @3 could safely differ in sign, you'd then need to > > ensure > > to insert sign conversions to @2/@3 to the signedness of @4/@5. > > > It turns out that differing in sign is not suitable for extension (but OK for > truncation), > because it's zero_extend vs sign_extend. > > The patch adds a types_match check when the convert is an extension. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > And native Bootstrapped and regtested on CLX. > > Ok for trunk? > > gcc/ChangeLog: > > PR tree-optimization/104551 > PR tree-optimization/103771 > * match.pd (cond_expr_convert_p): Add types_match check when > convert is extension. > * tree-vect-patterns.cc > (gimple_cond_expr_convert_p): Adjust comments. > (vect_recog_cond_expr_convert_pattern): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr104551.c: New test.
> --- > gcc/match.pd | 8 +--- > gcc/testsuite/gcc.target/i386/pr104551.c | 24 > gcc/tree-vect-patterns.cc| 6 -- > 3 files changed, 33 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr104551.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 05a10ab6bfd..8e80b9f1576 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -7692,11 +7692,13 @@ and, >(if (INTEGRAL_TYPE_P (type) > && INTEGRAL_TYPE_P (TREE_TYPE (@2)) > && INTEGRAL_TYPE_P (TREE_TYPE (@0)) > - && INTEGRAL_TYPE_P (TREE_TYPE (@3)) > && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0)) > && TYPE_PRECISION (TREE_TYPE (@0)) > == TYPE_PRECISION (TREE_TYPE (@2)) > - && TYPE_PRECISION (TREE_TYPE (@0)) > - == TYPE_PRECISION (TREE_TYPE (@3)) > + && (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) > +|| ((TYPE_PRECISION (TREE_TYPE (@0)) > + == TYPE_PRECISION (TREE_TYPE (@3))) > +&& INTEGRAL_TYPE_P (TREE_TYPE (@3)) > +&& TYPE_PRECISION (TREE_TYPE (@3)) > TYPE_PRECISION (type))) > && single_use (@4) > && single_use (@5 I find this quite unreadable; it looks as if @2 and @3 are treated differently. I think keeping the old 3 lines and just adding && (TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (type) || (TYPE_UNSIGNED (TREE_TYPE (@2)) == TYPE_UNSIGNED (TREE_TYPE (@3)))) after it, ideally with a comment explaining why, would be better. Note, if the precision of @0 and type is the same, I think signedness can still differ, no? Jakub
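The zero_extend vs sign_extend point in the patch description can be seen in scalar C. The sketch below is a hypothetical model, not the vectorizer's code. It compares widening each arm before the select with selecting on the narrow values first and widening once, which is roughly the shape vect_recog_cond_expr_convert_pattern aims for. When one arm is signed and the other unsigned, the single widening has to pick one kind of extension and gets the other arm wrong. The model relies on GCC's documented modulo behavior for the out-of-range conversion to int8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: each arm is widened before the select, so the signed
   arm is sign-extended and the unsigned arm is zero-extended.  */
static int32_t
widen_then_select (int c, int8_t a, uint8_t b)
{
  return c ? (int32_t) a : (int32_t) b;
}

/* Narrow form: select on the 8-bit values first, then widen once.
   That single widening is a sign extension here, which is wrong for
   the unsigned arm whenever its top bit is set.  */
static int32_t
select_then_widen (int c, int8_t a, uint8_t b)
{
  /* GCC reduces the out-of-range value modulo 2^8: 0xFF becomes -1.  */
  int8_t narrow = c ? a : (int8_t) b;
  return (int32_t) narrow;
}

/* The two forms agree on the signed arm but not on the unsigned one.  */
static void
check_widening_forms (void)
{
  assert (widen_then_select (1, -1, 0xFF) == select_then_widen (1, -1, 0xFF));
  assert (widen_then_select (0, -1, 0xFF) == 255);
  assert (select_then_widen (0, -1, 0xFF) == -1);
}
```

Truncation does not have this problem, because dropping the high bits gives the same result whatever the signedness of the source. That matches the patch description, which only requires matching types when the convert is an extension.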
[wwwdocs PATCH] gcc-11.3: Mention -mharden-sls= and -mindirect-branch-cs-prefix
--- htdocs/gcc-11/changes.html | 7 +++ 1 file changed, 7 insertions(+) diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html index fbd1b8ba..8e6d4ec8 100644 --- a/htdocs/gcc-11/changes.html +++ b/htdocs/gcc-11/changes.html @@ -1129,6 +1129,13 @@ are not listed here). no longer changes how they are passed nor returned. This ABI change is now diagnosed with -Wpsabi. + Mitigation against straight line speculation (SLS) for function + return and indirect jump is supported via + -mharden-sls=[none|all|return|indirect-jmp]. + + Add CS prefix to call and jmp to indirect thunk with branch target + in r8-r15 registers via -mindirect-branch-cs-prefix. + -- 2.35.1
Re: [PATCH] combine: Fix up -fcompare-debug issue in the combiner [PR104544]
On Wed, Feb 16, 2022 at 11:55:23AM +0100, Jakub Jelinek wrote: > On Wed, Feb 16, 2022 at 04:44:58AM -0600, Segher Boessenkool wrote: > > About half of the similar loops in combine.c are still broken this way, > > from a quick sampling :-( > > Looking for just NONDEBUG_INSN_P, I don't see any other than this. Ah yes, I was confused by !NONDEBUG_INSN. Too many inversions make my head spin (NONDEBUG_INSN really means RTX_INSN && !DEBUG_INSN). So everything looks fine here now. Thanks for double checking! Segher
Re: [PATCH] [PATCH,v5,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack
On 2/15/22 10:02, Richard Sandiford wrote: Dan Li writes: Shadow Call Stack can be used to protect the return address of a function at runtime, and clang already supports this feature[1]. Looks good, thanks. However, when I bootstrap it on aarch64-linux-gnu I get: .../gcc/ubsan.cc: In function ‘bool ubsan_expand_null_ifn(gimple_stmt_iterator*)’: .../gcc/ubsan.cc:835:50: error: enumerated and non-enumerated type in conditional expression [-Werror=extra] 835 | = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT : 0) | ^~~~ .../gcc/ubsan.cc:836:51: error: enumerated and non-enumerated type in conditional expression [-Werror=extra] 836 | | (check_null ? SANITIZE_NULL : 0))) |~~~^~~ I think this is because you're taking the last available bit of the enum :-) A hacky fix is to add "+ 0" to SANITIZE_ALIGNMENT and SANITIZE_NULL in the code quoted above (i.e. the code in the error messages). That seems slightly more robust than a cast to unsigned int (say), since "+ 0" will work even if the values become 64-bit quantities in future. Richard Ah, apologies for my mistake! I had passed --disable-werror to ./configure from the beginning, so I didn't see this problem before. As you suggested, I'm using the following patch: diff --git a/gcc/ubsan.cc b/gcc/ubsan.cc index 5641d3cc3be..a858994c841 100644 --- a/gcc/ubsan.cc +++ b/gcc/ubsan.cc @@ -832,8 +832,8 @@ ubsan_expand_null_ifn (gimple_stmt_iterator *gsip) else { enum built_in_function bcode - = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT : 0) - | (check_null ? SANITIZE_NULL : 0))) + = (flag_sanitize_recover & ((check_align ? SANITIZE_ALIGNMENT + 0 : 0) + | (check_null ? SANITIZE_NULL + 0 : 0))) ? BUILT_IN_UBSAN_HANDLE_TYPE_MISMATCH_V1 : BUILT_IN_UBSAN_HANDLE_TYPE_MISMATCH_V1_ABORT; tree fn = builtin_decl_implicit (bcode); It tests fine with a native x86_64 build; I will include the change in the next version.
BTW: The platform I'm using is x86-64, so I'm trying to find a way to reproduce this issue when cross-compiling for aarch64. I haven't found one so far; the issue only seems to happen with a native build. Since most of the code changes are for the aarch64 platform, is it enough for me to run the following tests before submitting the patch? 1) A full build of GCC on an x86_64 host (make; make install; make bootstrap;) 2) All testsuites in an aarch64 cross-compile environment (make -k check) Thanks, Dan
[committed] testsuite: Add testcase for already fixed PR [PR104448]
Hi! This PR has been fixed with r12-7147-g2f9ab267e725ddf2. Tested on x86_64-linux -m32/-m64, committed to trunk as obvious. 2022-02-16 Jakub Jelinek PR target/104448 * gcc.target/i386/pr104448.c: New test. --- gcc/testsuite/gcc.target/i386/pr104448.c.jj 2022-02-16 17:02:45.172189326 +0100 +++ gcc/testsuite/gcc.target/i386/pr104448.c 2022-02-16 17:01:50.481951141 +0100 @@ -0,0 +1,9 @@ +/* PR target/104448 */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ +/* { dg-options "-mavx5124vnniw -mno-xsave -mabi=ms" } */ + +int +main () +{ + return 0; +} Jakub
Re: [PATCH v7] c++: Add diagnostic when operator= is used as truth cond [PR25689]
On 2/16/22 02:16, Zhao Wei Liew wrote: On Wed Feb 16, 2022 at 4:06 AM +08, Jason Merrill wrote: Ah, I see. I found it a bit odd that gcc-commit-mklog auto-generated a subject with "c:", but I just went with it as I didn't know any better. Unfortunately, I can't change it now on the current thread. That came from this line in the testcase: > +/* PR c/25689 */ The PR should be c++/25689. Also, sometimes the bugzilla component isn't the same as the area of the compiler you're changing; the latter is what you want in the patch subject, so that the right people know to review it. Oh, I see. Thanks for the explanation. I've fixed the line. Ah, I didn't notice that. Sorry about that! I'm kinda new to the whole mailing list setup so there are some kinks I have to iron out. FWIW it's often easier to send the patch as an attachment. Alright, I'll send patches as attachments instead. I originally sent them as text as it is easier to comment on them. It is a bit more of a hassle in this case because your mail sender doesn't mark the patch as text, but rather application/mbox or application/x-patch, so my mail reader for patch review (Thunderbird) doesn't display it inline. I tried sending myself a patch through the gmail web interface, and it used text/x-patch, which is OK; what are you using to send? Maybe renaming the file to .txt before sending would help? +/* Test non-empty class */ +void f2(B b1, B b2) +{ + if (b1 = 0); /* { dg-warning "suggest parentheses" } */ + if (b1 = 0.); /* { dg-warning "suggest parentheses" } */ + if (b1 = b2); /* { dg-warning "suggest parentheses" } */ + if (b1.operator= (0)); + + /* Ideally, we wouldn't warn for non-empty classes using trivial + operator= (below), but we currently do as it is a MODIFY_EXPR. */ + // if (b1.operator= (b2)); You can avoid it by calling suppress_warning on that MODIFY_EXPR in build_over_call. Unfortunately, that also affects the warning for if (b1 = b2) just 5 lines above. 
Both expressions seem to generate the same tree structure. True, you would need to put the call to suppress_warning in build_new_op around where CALL_EXPR_OPERATOR_SYNTAX is set. Jason
Re: [pushed] aarch64: Tweak atomic-inst-cas.c options
On Wed, Feb 16, 2022 at 2:25 AM Richard Sandiford via Gcc-patches wrote: > > atomic-inst-cas.c has code to skip __atomic_compare_exchange_n > calls for invalid memory orderings, but -Winvalid-memory-model > applies before the dead code is removed (which is the right > behaviour IMO). This patch therefore suppresses the warning > for this test. It is a bit more complex than that, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104200#c3 for the reduced testcase. The arguments to __atomic_compare_exchange_n only really become undefined (invalid) after constant propagation, which is not done at -O0, though the warning does it itself. So the warning propagates constants into the arguments but does not take into account whether the call is conditionally executed. Most likely waccess should do something similar to what was done for the uninitialized warnings in https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589983.html . Thanks, Andrew Pinski > > Tested on aarch64-linux-gnu & pushed. > > Richard > > > gcc/testsuite/ > * gcc.target/aarch64/atomic-inst-cas.c: Add > -Wno-invalid-memory-model. > --- > gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c > b/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c > index f6f28922319..0b4533adade 100644 > --- a/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c > +++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c > @@ -1,5 +1,7 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -march=armv8-a+lse" } */ > +/* -Winvalid-memory-model warnings are issued before the dead invalid calls > + are removed. */ > +/* { dg-options "-O2 -march=armv8-a+lse -Wno-invalid-memory-model" } */ > > /* Test ARMv8.1-A CAS instruction. */ > > -- > 2.25.1
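For reference, the sketch below shows what a valid call to the builtin looks like. GCC documents that the failure order for __atomic_compare_exchange_n may be neither __ATOMIC_RELEASE nor __ATOMIC_ACQ_REL and may not be stronger than the success order. Passing only a literal pair that satisfies those rules gives -Winvalid-memory-model nothing to flag, whether or not the call is reachable. This is not a replacement for atomic-inst-cas.c, which deliberately iterates over order pairs. The wrapper name `cas_int` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Atomically store DESIRED into *P if *P equals EXPECTED.  *SEEN gets
   the value that was in *P before the operation (EXPECTED on success,
   the conflicting value on failure).  The order pair ACQ_REL/ACQUIRE is
   valid: the failure order is neither RELEASE nor ACQ_REL and is not
   stronger than the success order.  */
static bool
cas_int (int *p, int expected, int desired, int *seen)
{
  assert (p && seen);
  bool ok = __atomic_compare_exchange_n (p, &expected, desired,
                                         false /* strong CAS */,
                                         __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
  /* On failure the builtin writes the current value back into EXPECTED.  */
  *seen = expected;
  return ok;
}
```

With `-march=armv8-a+lse`, GCC can expand a strong compare-and-swap like this one to the ARMv8.1-A CAS family of instructions that the test looks for.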
[PATCH] rs6000: Workaround for new ifcvt behavior [PR104335]
Hi, since r12-6747-gaa8cfe785953a0 ifcvt not only passes real comparisons but also "cc comparisons" (i.e. the representation of the result of a comparison) to the backend. rs6000_emit_int_cmove () is not prepared to handle this. Therefore, this patch makes it return false in such a case in order to avoid an ICE. I bootstrapped (with --enable-languages=all on P10, --enable-languages="c, c++, fortran, go, lto, objc, obj-c++" otherwise) and regtested on Power7, Power8, Power9 and Power10. Testsuite is unchanged on P7 and P9. On P8 I hit some different FAILs vs master but they look unrelated and seem to be caused by "spawn failed" i.e. out of memory or so. On P10 I compared the testsuite of the last commit before the breaking one (r12-6746-ge9ebb86799fd77, but commenting out a line that would still result in a "-Wformat-diag" bootstrap error then) vs. the patched master: No regressions. Is it OK? Regards Robin -- PR target/104335 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_int_cmove): Return false if the expected comparison's first operand is of mode MODE_CC. From b9f053bf266bd1518e0eac36509ebde57266 Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Thu, 10 Feb 2022 09:01:51 -0600 Subject: [PATCH] rs6000: Workaround for new ifcvt behavior [PR104335]. Since r12-6747-gaa8cfe785953a0 ifcvt passes a "cc comparison" i.e. the representation of the result of a comparison to the backend. rs6000_emit_int_cmove () is not prepared to handle this. Therefore, this patch makes it return false in such a case. PR target/104335 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_int_cmove): Return false if the expected comparison's first operand is of mode MODE_CC.
--- gcc/config/rs6000/rs6000.cc | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index eaba9a2d698..ebc5b0cefdc 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -16175,6 +16175,12 @@ rs6000_emit_int_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) if (mode != SImode && (!TARGET_POWERPC64 || mode != DImode)) return false; + /* PR104335: We now need to expect CC-mode "comparisons" + coming from ifcvt. The following code expects proper + comparisons so better abort here. */ + if (XEXP (op, 0) && GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC) +return false; + /* We still have to do the compare, because isel doesn't do a compare, it just looks at the CRx bits set by a previous compare instruction. */ -- 2.31.1
libbacktrace patch committed: Initialize DWARF 5 fields of unit
When I added the DWARF 5 support to libbacktrace on 2019-12-13, I forgot to initialize the new fields of the unit data structure. Whoops. Fixed with this patch. Bootstrapped and ran the libbacktrace and Go testsuites on x86_64-pc-linux-gnu. Committed to mainline. Ian * dwarf.c (build_address_map): Initialize DWARF 5 fields of unit. ab59cb2055658a72fdccba0be76eeadd222ffef6 diff --git a/libbacktrace/dwarf.c b/libbacktrace/dwarf.c index c0bae0e501e..2158bc14065 100644 --- a/libbacktrace/dwarf.c +++ b/libbacktrace/dwarf.c @@ -2221,6 +2221,9 @@ build_address_map (struct backtrace_state *state, uintptr_t base_address, u->comp_dir = NULL; u->abs_filename = NULL; u->lineoff = 0; + u->str_offsets_base = 0; + u->addr_base = 0; + u->rnglists_base = 0; /* The actual line number mappings will be read as needed. */ u->lines = NULL;
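An alternative pattern, sketched here and not what libbacktrace does, is to zero the whole structure before filling in any non-zero fields. That way, a field added later (like the three DWARF 5 bases) cannot be left holding whatever the allocator returned. `struct toy_unit` is a hypothetical, trimmed-down stand-in for libbacktrace's `struct unit`. Its field names mirror the patch, but the layout and types are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for libbacktrace's struct unit:
   only the fields touched by the patch, not the real layout.  */
struct toy_unit
{
  const char *comp_dir;
  const char *abs_filename;
  uint64_t lineoff;
  uint64_t str_offsets_base;  /* DWARF 5 */
  uint64_t addr_base;         /* DWARF 5 */
  uint64_t rnglists_base;     /* DWARF 5 */
  void *lines;
};

/* Reset every field of *U.  Members not named in the compound literal
   are zero-initialized (pointers become null pointers), so a field added
   to the struct later is covered without touching this function.  */
static void
toy_unit_init (struct toy_unit *u)
{
  assert (u != NULL);
  *u = (struct toy_unit) { 0 };
}
```

A compound literal gives pointer members real null-pointer values. By contrast, memset to zero does so only where the null pointer is all-bits-zero, although that holds on the usual hosts.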
Re: [PATCH] c++: NON_DEPENDENT_EXPR is not potentially constant [PR104507]
On Tue, 15 Feb 2022, Jason Merrill wrote: > On 2/15/22 17:00, Patrick Palka wrote: > > On Tue, 15 Feb 2022, Jason Merrill wrote: > > > > > On 2/15/22 15:13, Patrick Palka wrote: > > > > On Tue, 15 Feb 2022, Patrick Palka wrote: > > > > > > > > > Here we're crashing from potential_constant_expression because it > > > > > tries > > > > > to perform trial evaluation of the first operand '(bool)__r' of the > > > > > conjunction (which is overall wrapped in a NON_DEPENDENT_EXPR), but > > > > > cxx_eval_constant_expression ICEs on unhandled trees (of which > > > > > CAST_EXPR > > > > > is one). > > > > > > > > > > Since cxx_eval_constant_expression always treats NON_DEPENDENT_EXPR > > > > > as non-constant, and since NON_DEPENDENT_EXPR is also opaque to > > > > > instantiate_non_dependent_expr, it seems futile to have p_c_e_1 ever > > > > > return true for NON_DEPENDENT_EXPR, so let's just instead return false > > > > > and avoid recursing. > > > > > > Well, in a template we use pce1 to decide whether to complain about > > > something > > > that needs to be constant but can't be. We aren't trying to get a value > > > yet. > > > > Makes sense.. though for NON_DEPENDENT_EXPR in particular, ISTM this > > tree is always used in a context where a constant expression isn't > > required, e.g. in the build_x_* functions. > > Fair enough. The patch is OK with a comment to that effect. Thanks, I committed the following as r12-7264: -- >8 -- Subject: [PATCH] c++: treat NON_DEPENDENT_EXPR as not potentially constant [PR104507] Here we're crashing from potential_constant_expression because it tries to perform trial evaluation of the first operand '(bool)__r' of the conjunction (which is overall wrapped in a NON_DEPENDENT_EXPR), but cxx_eval_constant_expression ICEs on unsupported trees (of which CAST_EXPR is one). The sequence of events is: 1. build_non_dependent_expr for the array subscript yields NON_DEPENDENT_EXPR<<<(bool)__r && __s>>> ? 1 : 2 2. 
cp_build_array_ref calls fold_non_dependent_expr on this subscript (after this point, processing_template_decl is cleared) 3. during which, the COND_EXPR case of tsubst_copy_and_build calls fold_non_dependent_expr on the first operand 4. during which, we crash from p_c_e_1 because it attempts trial evaluation of the CAST_EXPR '(bool)__r'. Note that even if this crash didn't happen, fold_non_dependent_expr from cp_build_array_ref would still ultimately be one big no-op here since neither constexpr evaluation nor tsubst handle NON_DEPENDENT_EXPR. In light of this and of the observation that we should never see NON_DEPENDENT_EXPR in a context where a constant expression is needed (it's used primarily in the build_x_* family of functions), it seems futile for p_c_e_1 to ever return true for NON_DEPENDENT_EXPR. And the otherwise inconsistent handling of NON_DEPENDENT_EXPR between p_c_e_1, cxx_evaluate_constexpr_expression and tsubst apparently leads to weird bugs such as this one. PR c++/104507 gcc/cp/ChangeLog: * constexpr.cc (potential_constant_expression_1) : Return false instead of recursing. Assert tf_error isn't set. gcc/testsuite/ChangeLog: * g++.dg/template/non-dependent21.C: New test. --- gcc/cp/constexpr.cc | 9 - gcc/testsuite/g++.dg/template/non-dependent21.C | 9 + 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/template/non-dependent21.C diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index 7274c3b760e..4716694cb71 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -9065,6 +9065,14 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now, case BIND_EXPR: return RECUR (BIND_EXPR_BODY (t), want_rval); +case NON_DEPENDENT_EXPR: + /* Treat NON_DEPENDENT_EXPR as non-constant: it's not handled by +constexpr evaluation or tsubst, so fold_non_dependent_expr can't +do anything useful with it. 
And we shouldn't see it in a context +where a constant expression is strictly required, hence the assert. */ + gcc_checking_assert (!(flags & tf_error)); + return false; + case CLEANUP_POINT_EXPR: case MUST_NOT_THROW_EXPR: case TRY_CATCH_EXPR: @@ -9072,7 +9080,6 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now, case EH_SPEC_BLOCK: case EXPR_STMT: case PAREN_EXPR: -case NON_DEPENDENT_EXPR: /* For convenience. */ case LOOP_EXPR: case EXIT_EXPR: diff --git a/gcc/testsuite/g++.dg/template/non-dependent21.C b/gcc/testsuite/g++.dg/template/non-dependent21.C new file mode 100644 index 000..89900837b8b --- /dev/null +++ b/gcc/testsuite/g++.dg/template/non-dependent21.C @@ -0,0 +1,9 @@ +// PR c++/104507 + +extern const char *_k_errmsg[]; + +template +const char* DoFoo(int __r, int __s) { + const char* n = _k_errmsg[(bool)__r &&
Re: [PATCH] c++: return-type-req in constraint using only outer tparms [PR104527]
On Tue, 15 Feb 2022, Jason Merrill wrote: > On 2/14/22 11:32, Patrick Palka wrote: > > Here the template context for the atomic constraint has two levels of > > template arguments, but since it depends only on the innermost argument > > T we use a single-level argument vector during substitution into the > > constraint (built by get_mapped_args). We eventually pass this vector > > to do_auto_deduction as part of checking the return-type-requirement > > inside the atom, but do_auto_deduction expects outer_targs to be a full > > set of arguments for sake of satisfaction. > > Could we note the current number of levels in the map and use that in > get_mapped_args instead of the highest level parameter we happened to use? Ah yeah, that seems to work nicely. IIUC it should suffice to remember whether the atomic constraint expression came from a concept definition. If it did, then the depth of the argument vector returned by get_mapped_args must be one, otherwise (as in the testcase) it must be the same as the template depth of the constrained entity, which is the depth of ARGS. How does the following look? Bootstrapped and regtested on x86_64-pc-linux-gnu and also on cmcstl2 and range-v3. -- >8 -- Subject: [PATCH] c++: return-type-req in constraint using only outer tparms [PR104527] Here the template context for the atomic constraint has two levels of template parameters, but since it depends only on the innermost parameter T we use a single-level argument vector (built by get_mapped_args) during substitution into the atom. We eventually pass this vector to do_auto_deduction as part of checking the return-type-requirement within the atom, but do_auto_deduction expects outer_targs to be a full set of arguments for sake of satisfaction. 
This patch fixes this by making get_mapped_args always return an argument vector whose depth corresponds to the template depth of the context in which the atomic constraint expression was written, instead of the highest parameter level that the expression happens to use. PR c++/104527 gcc/cp/ChangeLog: * constraint.cc (normalize_atom): Set ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P appropriately. (get_mapped_args): Make static, adjust parameters. Always return a vector whose depth corresponds to the template depth of the context of the atomic constraint expression. Micro-optimize by passing false as exact to safe_grow_cleared and by collapsing a multi-level depth-one argument vector. (satisfy_atom): Adjust call to get_mapped_args and diagnose_atomic_constraint. (diagnose_atomic_constraint): Replace map parameter with an args parameter. * cp-tree.h (ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P): Define. (get_mapped_args): Remove declaration. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-return-req4.C: New test. --- gcc/cp/constraint.cc | 64 +++ gcc/cp/cp-tree.h | 7 +- .../g++.dg/cpp2a/concepts-return-req4.C | 24 +++ 3 files changed, 69 insertions(+), 26 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 12db7e5cf14..306e28955c6 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -764,6 +764,8 @@ normalize_atom (tree t, tree args, norm_info info) tree ci = build_tree_list (t, info.context); tree atom = build1 (ATOMIC_CONSTR, ci, map); + if (info.in_decl && concept_definition_p (info.in_decl)) +ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (atom) = true; if (!info.generate_diagnostics ()) { /* Cache the ATOMIC_CONSTRs that we return, so that sat_hasher::equal @@ -2826,33 +2828,37 @@ satisfaction_value (tree t) return boolean_true_node; } -/* Build a new template argument list with template arguments corresponding - to the parameters used in an atomic constraint. 
*/ +/* Build a new template argument vector according to the parameter + mapping of the atomic constraint T, using arguments from ARGS. */ -tree -get_mapped_args (tree map) +static tree +get_mapped_args (tree t, tree args) { + tree map = ATOMIC_CONSTR_MAP (t); + /* No map, no arguments. */ if (!map) return NULL_TREE; - /* Find the mapped parameter with the highest level. */ - int count = 0; - for (tree p = map; p; p = TREE_CHAIN (p)) -{ - int level; - int index; - template_parm_level_and_index (TREE_VALUE (p), &level, &index); - if (level > count) -count = level; -} + /* Determine the depth of the resulting argument vector. */ + int depth; + if (ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (t)) +/* The expression of this atomic constraint comes from a concept definition, + whose template depth is always one, so the resulting argument vector + will also have depth one. */ +depth = 1; + else +/* Otherwise, the e
Re: libgo patch committed: Update to Go1.18beta2 release
On Tue, Feb 15, 2022 at 1:19 AM Eric Botcazou wrote:
>
> > I've committed a change to update libgo to the Go1.18beta2 release.
>
> This apparently broke the build on SPARC/Solaris 11.3:

I've committed this patch to fix these problems.  Bootstrapped and ran
Go testsuite on x86_64-pc-linux-gnu and x86_64-solaris.

Ian

24ca97325cab7bc454c785d55f37120fe7ea6f74
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 745132a3d9d..3742414c828 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-0af68c0552341a44f1fb12301f9eff954b9dde88
+3742e8a154bfec805054b4ebf0809f12dc7694da
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/net/fcntl_libc_test.go b/libgo/go/net/fcntl_libc_test.go
index f59a1aa33ba..c935c4540cf 100644
--- a/libgo/go/net/fcntl_libc_test.go
+++ b/libgo/go/net/fcntl_libc_test.go
@@ -6,7 +6,10 @@ package net
 
-import "syscall"
+import (
+	"syscall"
+	_ "unsafe"
+)
 
 // Use a helper function to call fcntl. This is defined in C in
 // libgo/runtime.
diff --git a/libgo/go/os/signal/internal/pty/pty.go b/libgo/go/os/signal/internal/pty/pty.go
index e5ee3f6dc01..01c3908becf 100644
--- a/libgo/go/os/signal/internal/pty/pty.go
+++ b/libgo/go/os/signal/internal/pty/pty.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build (aix || darwin || dragonfly || freebsd || hurd || (linux && !android) || netbsd || openbsd) && cgo
+//go:build (aix || darwin || dragonfly || freebsd || hurd || (linux && !android) || netbsd || openbsd || solaris) && cgo
 
 // Package pty is a simple pseudo-terminal package for Unix systems,
 // implemented by calling C functions via cgo.
diff --git a/libgo/go/runtime/os3_solaris.go b/libgo/go/runtime/os3_solaris.go
index ec23ce2cc0c..6c825746fbc 100644
--- a/libgo/go/runtime/os3_solaris.go
+++ b/libgo/go/runtime/os3_solaris.go
@@ -36,6 +36,14 @@ func solarisExecutablePath() string {
 	return executablePath
 }
 
+func setProcessCPUProfiler(hz int32) {
+	setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+	setThreadCPUProfilerHz(hz)
+}
+
 //go:nosplit
 func validSIGPROF(mp *m, c *sigctxt) bool {
 	return true
diff --git a/libgo/go/runtime/stubs2.go b/libgo/go/runtime/stubs2.go
index 0b9e60587e1..587109209d1 100644
--- a/libgo/go/runtime/stubs2.go
+++ b/libgo/go/runtime/stubs2.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build !aix && !darwin && !js && !openbsd && !plan9 && !solaris && !windows
+//go:build !js && !plan9 && !windows
 
 package runtime
 
diff --git a/libgo/go/syscall/exec_bsd.go b/libgo/go/syscall/exec_bsd.go
index c05ae138811..ff88bc45366 100644
--- a/libgo/go/syscall/exec_bsd.go
+++ b/libgo/go/syscall/exec_bsd.go
@@ -143,13 +143,13 @@ func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr
 	// User and groups
 	if cred := sys.Credential; cred != nil {
 		ngroups := len(cred.Groups)
-		var groups *Gid_t
+		var groups unsafe.Pointer
 		if ngroups > 0 {
 			gids := make([]Gid_t, ngroups)
 			for i, v := range cred.Groups {
 				gids[i] = Gid_t(v)
 			}
-			groups = &gids[0]
+			groups = unsafe.Pointer(&gids[0])
 		}
 		if !cred.NoSetGroups {
 			err1 = raw_setgroups(ngroups, groups)
diff --git a/libgo/go/syscall/export_unix_test.go b/libgo/go/syscall/export_unix_test.go
index 184eb84c0b1..bd904c70f36 100644
--- a/libgo/go/syscall/export_unix_test.go
+++ b/libgo/go/syscall/export_unix_test.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build dragonfly || freebsd || hurd || linux || netbsd || openbsd
+//go:build dragonfly || freebsd || hurd || linux || netbsd || openbsd || solaris
 
 package syscall
 
diff --git a/libgo/go/syscall/syscall_solaris.go b/libgo/go/syscall/syscall_solaris.go
index 13c60a493d9..673ba8223fc 100644
--- a/libgo/go/syscall/syscall_solaris.go
+++ b/libgo/go/syscall/syscall_solaris.go
@@ -6,8 +6,6 @@ package syscall
 
 import "unsafe"
 
-const _F_DUP2FD_CLOEXEC = F_DUP2FD_CLOEXEC
-
 func (ts *Timestruc) Unix() (sec int64, nsec int64) {
 	return int64(ts.Sec), int64(ts.Nsec)
 }
[PATCH] c++: double non-dep folding from finish_compound_literal [PR104565]
In finish_compound_literal, we perform non-dependent expr folding before
calling check_narrowing (ever since r9-5973).  But ever since r10-7096,
check_narrowing also performs non-dependent expr folding of its own.
This double folding causes tsubst to see non-templated trees during the
second folding, which causes a spurious error in the below testcase.

This patch removes this first folding operation; it now seems obviated
by the second one.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/10/11?

	PR c++/104565

gcc/cp/ChangeLog:

	* semantics.cc (finish_compound_literal): Don't perform
	non-dependent expr folding before calling check_narrowing.

gcc/testsuite/ChangeLog:

	* g++.dg/template/non-dependent22.C: New test.
---
 gcc/cp/semantics.cc                             | 10 +++---
 gcc/testsuite/g++.dg/template/non-dependent22.C | 12
 2 files changed, 15 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent22.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb17a6a8ab..114baa48710 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -3203,13 +3203,9 @@ finish_compound_literal (tree type, tree compound_literal,
     return error_mark_node;
   compound_literal = reshape_init (type, compound_literal, complain);
   if (SCALAR_TYPE_P (type)
-      && !BRACE_ENCLOSED_INITIALIZER_P (compound_literal))
-    {
-      tree t = instantiate_non_dependent_expr_sfinae (compound_literal,
-                                                      complain);
-      if (!check_narrowing (type, t, complain))
-        return error_mark_node;
-    }
+      && !BRACE_ENCLOSED_INITIALIZER_P (compound_literal)
+      && !check_narrowing (type, compound_literal, complain))
+    return error_mark_node;
   if (TREE_CODE (type) == ARRAY_TYPE
       && TYPE_DOMAIN (type) == NULL_TREE)
     {
diff --git a/gcc/testsuite/g++.dg/template/non-dependent22.C b/gcc/testsuite/g++.dg/template/non-dependent22.C
new file mode 100644
index 000..83a6a13f15b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent22.C
@@ -0,0 +1,12 @@
+// PR c++/104565
+// { dg-do compile { target c++11 } }
+
+struct apa {
+  constexpr int n() const { return 3; }
+};
+
+template
+int f() {
+  apa foo;
+  return int{foo.n()}; // no matching function for call to 'apa::n(apa*)'
+}
-- 
2.35.1.129.gb80121027d
[PATCH] libgomp : OMPD implementation
Hi,

I am sorry that the previous patch was buggy.
This patch contains the header files and source files of the functions
specified in the OpenMP Application Programming Interface book, sections
5.1, 5.2, 5.3, 5.4, 5.5.1 and 5.5.2.  The functions were tested using the
gdb plugin and the results are correct.
Please review this patch and reply to us.

2022-02-16  Mohamed Atef

	* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
	(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
	libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script,
	libgompd.ver-sun, libgompd.ver, libgompd_version_info): Defined.
	* Makefile.in: Regenerate.
	* aclocal.m4: Regenerate.
	* config/darwin/plugin-suffix.h: Removed ().
	* config/hpux/plugin-suffix.h: Removed ().
	* config/posix/plugin-suffix.h: Removed ().
	* configure: Regenerate.
	* env.c: (#include "ompd-support.h") : Added.
	(initialize_env) : Call ompd_load().
	* parallel.c:(#include "ompd-support.h"): Added.
	(GOMP_parallel) : Call ompd_bp_parallel_begin and ompd_bp_parallel_end.
	* libgomp.map: Add OMP_5.0.3 symbol versions.
	* libgompd.map: New file.
	* omp-tools.h.in : New file.
	* omp-types.h.in : New file.
	* ompd-support.h : New file.
	* ompd-support.c : New file.
	* ompd-helper.h : New file.
	* ompd-helper.c: New file.
	* ompd-init.c: New file.
	* testsuite/Makefile.in: Regenerate.
diff --git a/configure b/configure index 9c2d7df1bb2..c270ea34098 100755 --- a/configure +++ b/configure @@ -766,6 +766,7 @@ infodir docdir oldincludedir includedir +runstatedir localstatedir sharedstatedir sysconfdir @@ -936,6 +937,7 @@ datadir='${datarootdir}' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' +runstatedir='${localstatedir}/run' includedir='${prefix}/include' oldincludedir='/usr/include' docdir='${datarootdir}/doc/${PACKAGE}' @@ -1188,6 +1190,15 @@ do | -silent | --silent | --silen | --sile | --sil) silent=yes ;; + -runstatedir | --runstatedir | --runstatedi | --runstated \ + | --runstate | --runstat | --runsta | --runst | --runs \ + | --run | --ru | --r) +ac_prev=runstatedir ;; + -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \ + | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \ + | --run=* | --ru=* | --r=*) +runstatedir=$ac_optarg ;; + -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ @@ -1325,7 +1336,7 @@ fi for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \ datadir sysconfdir sharedstatedir localstatedir includedir \ oldincludedir docdir infodir htmldir dvidir pdfdir psdir \ - libdir localedir mandir + libdir localedir mandir runstatedir do eval ac_val=\$$ac_var # Remove trailing slashes. 
@@ -1485,6 +1496,7 @@ Fine tuning of the installation directories: --sysconfdir=DIRread-only single-machine data [PREFIX/etc] --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com] --localstatedir=DIR modifiable single-machine data [PREFIX/var] + --runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run] --libdir=DIRobject code libraries [EPREFIX/lib] --includedir=DIRC header files [PREFIX/include] --oldincludedir=DIR C header files for non-gcc [/usr/include] * testsuite/libgomp.fortran/allocate-1.f90: Remove spurious diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am index f8b2a06d63e..22a27df105e 100644 --- a/libgomp/Makefile.am +++ b/libgomp/Makefile.am @@ -20,7 +20,7 @@ AM_CPPFLAGS = $(addprefix -I, $(search_path)) AM_CFLAGS = $(XCFLAGS) AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS) -toolexeclib_LTLIBRARIES = libgomp.la +toolexeclib_LTLIBRARIES = libgomp.la libgompd.la nodist_toolexeclib_HEADERS = libgomp.spec if LIBGOMP_BUILD_VERSIONED_SHLIB @@ -32,13 +32,21 @@ libgomp.ver: $(top_srcdir)/libgomp.map $(EGREP) -v '#(#| |$$)' $< | \ $(PREPROCESS) -P -include config.h - > $@ || (rm -f $@ ; exit 1) +libgompd.ver: $(top_srcdir)/libgompd.map + $(EGREP) -v '#(#| |$$)' $< | \ + $(PREPROCESS) -P -include config.h - > $@ || (rm -f $@ ; exit 1) + if LIBGOMP_BUILD_VERSIONED_SHLIB_GNU libgomp_version_script = -Wl,--version-script,libgomp.ver +libgompd_version_script = -Wl,--version-script,libgompd.ver libgomp_version_dep = libgomp.ver +libgompd_version_dep = libgompd.ver endif if LIBGOMP_BUILD_VERSIONED_SHLIB_SUN libgomp_version_script = -Wl,-M,libgomp.ver-sun +libgompd_version_script = -Wl,-M,libgompd.ver-sun libgomp_version_dep = libgomp.ver-sun +libgompd_version_dep = libgompd.ver-sun libgomp.ver-sun : libgomp.
Re: [PATCH] libgomp : OMPD implementation
Sorry, I forgot to uncomment two lines; here's the patch again.

Thanks
Mohamed

On Wed, Feb 16, 2022 at 10:54 PM Mohamed Atef wrote:
> HI,
> I am sorry that the previous patch was buggy.
> This patch contains the header files and source files of functions that
> are specified in OpenMP Application ProgrammingInterface book from sections
> (5.1, 5.2, 5.3, 5.4, 5.5.1, 5.5.2) the functions are tested using the gdb
> plugin and the results are correct.
> Please Review this Patch and reply to us.
>
> 2022-02-16  Mohamed Atef
>
>         * Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
>         (libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
>         libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script,
>         libgompd.ver-sun, libgompd.ver, libgompd_version_info): Defined.
>         * Makefile.in: Regenerate.
>         * aclocal.m4: Regenerate.
>         * config/darwin/plugin-suffix.h: Removed ().
>         * config/hpux/plugin-suffix.h: Removed ().
>         * config/posix/plugin-suffix.h: Removed ().
>         * configure: Regenerate.
>         * env.c: (#include "ompd-support.h") : Added.
>         (initialize_env) : Call ompd_load().
>         * parallel.c:(#include "ompd-support.h"): Added.
>         (GOMP_parallel) : Call ompd_bp_parallel_begin and
>         ompd_bp_parallel_end.
>         * libgomp.map: Add OMP_5.0.3 symobl versions.
>         * libgompd.map: New file.
>         * omp-tools.h.in : New file.
>         * omp-types.h.in : New file.
>         * ompd-support.h : New file.
>         * ompd-support.c : New file.
>         * ompd-helper.h : New file.
>         * ompd-helper.c: New file.
>         * ompd-init.c: New file.
>         * testsuite/Makfile.in: Regenerate.
> > > > diff --git a/configure b/configure index 9c2d7df1bb2..c270ea34098 100755 --- a/configure +++ b/configure @@ -766,6 +766,7 @@ infodir docdir oldincludedir includedir +runstatedir localstatedir sharedstatedir sysconfdir @@ -936,6 +937,7 @@ datadir='${datarootdir}' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' +runstatedir='${localstatedir}/run' includedir='${prefix}/include' oldincludedir='/usr/include' docdir='${datarootdir}/doc/${PACKAGE}' @@ -1188,6 +1190,15 @@ do | -silent | --silent | --silen | --sile | --sil) silent=yes ;; + -runstatedir | --runstatedir | --runstatedi | --runstated \ + | --runstate | --runstat | --runsta | --runst | --runs \ + | --run | --ru | --r) +ac_prev=runstatedir ;; + -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \ + | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \ + | --run=* | --ru=* | --r=*) +runstatedir=$ac_optarg ;; + -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ @@ -1325,7 +1336,7 @@ fi for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \ datadir sysconfdir sharedstatedir localstatedir includedir \ oldincludedir docdir infodir htmldir dvidir pdfdir psdir \ - libdir localedir mandir + libdir localedir mandir runstatedir do eval ac_val=\$$ac_var # Remove trailing slashes. 
@@ -1485,6 +1496,7 @@ Fine tuning of the installation directories: --sysconfdir=DIRread-only single-machine data [PREFIX/etc] --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com] --localstatedir=DIR modifiable single-machine data [PREFIX/var] + --runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run] --libdir=DIRobject code libraries [EPREFIX/lib] --includedir=DIRC header files [PREFIX/include] --oldincludedir=DIR C header files for non-gcc [/usr/include] diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog index 7905565c420..be4ce1dbe12 100644 --- a/libgomp/ChangeLog +++ b/libgomp/ChangeLog @@ -1,3 +1,30 @@ +2022-02-16 Mohamed Atef + +* Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la. +(libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK, +libgompd_la_SOURCES, libgompd_version_dep, libgompd_version_script, +libgompd.ver-sun, libgompd.ver, libgompd_version_info): Defined. +* Makefile.in: Regenerate. +* aclocal.m4: Regenerate. +* config/darwin/plugin-suffix.h: Removed (). +* config/hpux/plugin-suffix.h: Removed (). +* config/posix/plugin-suffix.h: Removed (). +* configure: Regenerate. +* env.c: (#include "ompd-support.h") : Added. +(initialize_env) : Call ompd_load(). +* parallel.c:(#include "ompd-support.h"): Added. +(GOMP_parallel) : Call ompd_bp_parallel_begin and ompd_bp_parallel_end. +* libgomp.map: Add OMP_5.0.3 symobl versions. +* libgompd.map: New file. +* omp-tools.h.in : New file. +* omp-types.h.in : New fi
[PATCH] PR fortran/104573 - ICE in resolve_structure_cons, at fortran/resolve.cc:1299
Dear Fortranners, while we detect invalid uses of type(*), we may run into other issues later when the declared variable is used, leading to an ICE due to a NULL pointer dereference. This is demonstrated by Gerhard's testcase. Steve and I came to rather similar fixes, see PR. Mine is attached. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald From 01d629506edca711f02912e2cc124f8894cfa389 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Wed, 16 Feb 2022 22:13:02 +0100 Subject: [PATCH] Fortran: error recovery after invalid assumed type declaration gcc/fortran/ChangeLog: PR fortran/104573 * resolve.cc (resolve_structure_cons): Avoid NULL pointer dereference when there is no valid component. gcc/testsuite/ChangeLog: PR fortran/104573 * gfortran.dg/assumed_type_14.f90: New test. --- gcc/fortran/resolve.cc| 8 +--- gcc/testsuite/gfortran.dg/assumed_type_14.f90 | 12 2 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/assumed_type_14.f90 diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index 266e41e25b1..2fa1acdbd6d 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -1288,15 +1288,17 @@ resolve_structure_cons (gfc_expr *expr, int init) } } - cons = gfc_constructor_first (expr->value.constructor); - /* A constructor may have references if it is the result of substituting a parameter variable. In this case we just pull out the component we want. */ if (expr->ref) comp = expr->ref->u.c.sym->components; - else + else if (expr->ts.u.derived) comp = expr->ts.u.derived->components; + else +return false; + + cons = gfc_constructor_first (expr->value.constructor); for (; comp && cons; comp = comp->next, cons = gfc_constructor_next (cons)) { diff --git a/gcc/testsuite/gfortran.dg/assumed_type_14.f90 b/gcc/testsuite/gfortran.dg/assumed_type_14.f90 new file mode 100644 index 000..6cfe2e4fb73 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/assumed_type_14.f90 @@ -0,0 +1,12 @@ +! { dg-do compile } +! 
PR fortran/104573 - ICE in resolve_structure_cons +! Contributed by G.Steinmetz + +program p + type t + end type + type(*), parameter :: x = t() ! { dg-error "Assumed type of variable" } + print *, x +end + +! { dg-prune-output "Cannot convert" } -- 2.34.1
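The shape of the resolve_structure_cons fix (take the component list from the reference, else from the derived type, else bail out) can be sketched in isolation. This is a simplified model, not gfortran code: `struct component`, `struct derived_type`, `struct expr` and `first_component` are hypothetical stand-ins for the real gfc_* structures, which have very different layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for gfortran's
   component/symbol/expression structures; not the real layouts.  */
struct component { struct component *next; };
struct derived_type { struct component *components; };
struct expr
{
  struct derived_type *ref_type;  /* non-NULL when the expr has a reference */
  struct derived_type *derived;   /* may be NULL after an invalid type(*) decl */
};

/* Pick the component list the way the patched code does: from the
   reference if there is one, else from the derived type, and report
   failure instead of dereferencing a NULL derived type.  */
bool
first_component (const struct expr *e, struct component **out)
{
  assert (e != NULL && out != NULL);
  if (e->ref_type)
    *out = e->ref_type->components;
  else if (e->derived)
    *out = e->derived->components;
  else
    return false;
  return true;
}
```

Returning false mirrors the patch's `return false;`, which lets the caller treat the constructor as unresolvable after the earlier "Assumed type of variable" error, rather than crashing.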
Re: [committed] d: Merge upstream dmd 52844d4b1, druntime dbd0c874, phobos 896b1d0e1.
Hi Iain, > This patch merges the D front-end implementation with upstream dmd > 52844d4b1, as well as the D runtime libraries with druntime dbd0c874, > and phobos 896b1d0e1, including the latest features and bug-fixes ahead of > the 2.099.0-beta1 release. this patch broke Solaris bootstrap: /vol/gcc/src/hg/master/local/libphobos/libdruntime/core/sys/posix/sys/ipc.d:193:5: error: static assert: "Unsupported platform" 193 | static assert(false, "Unsupported platform"); | ^ The attached patch fixes this. Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University diff --git a/libphobos/libdruntime/core/sys/posix/sys/ipc.d b/libphobos/libdruntime/core/sys/posix/sys/ipc.d --- a/libphobos/libdruntime/core/sys/posix/sys/ipc.d +++ b/libphobos/libdruntime/core/sys/posix/sys/ipc.d @@ -188,6 +188,31 @@ else version (DragonFlyBSD) enum IPC_SET= 1; enum IPC_STAT = 2; } +else version (Solaris) +{ +struct ipc_perm +{ + uid_t uid; + gid_t gid; + uid_t cuid; + gid_t cgid; + mode_t mode; + uintseq; + key_t key; +version (D_LP64) {} else + int[4] pad; +} + +enum IPC_CREAT = 0x0200; +enum IPC_EXCL = 0x0400; +enum IPC_NOWAIT = 0x0800; + +enum key_t IPC_PRIVATE = 0; + +enum IPC_RMID = 10; +enum IPC_SET= 11; +enum IPC_STAT = 12; +} else { static assert(false, "Unsupported platform"); @@ -233,6 +258,10 @@ else version (CRuntime_UClibc) { key_t ftok(const scope char*, int); } +else version (Solaris) +{ +key_t ftok(const scope char*, int); +} else { static assert(false, "Unsupported platform");
[PATCH, V3] PR target/99708- Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__
[PATCH, V3] Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__.

Define the sizes of the PowerPC specific types __float128 and __ibm128 if
those types are enabled.

This patch will define __SIZEOF_IBM128__ and __SIZEOF_FLOAT128__ if their
respective types are created in the compiler.  Currently, this means both
of these will be defined if float128 support is enabled.  But at some point
in the future, __ibm128 could be enabled without enabling float128 support,
and __SIZEOF_IBM128__ would still be defined.

I have tested this on a little-endian power9 system and there were no
regressions.  I verified by hand that if I compile with -mno-vsx,
__SIZEOF_IBM128__ is not defined.

Can I check this into the master branch?  Ideally, it should also be
backported to GCC 11 and 10.

2022-02-16  Michael Meissner

gcc/
	PR target/99708
	* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
	__SIZEOF_IBM128__ if the IBM 128-bit long double type is created.
	Define __SIZEOF_FLOAT128__ if we have float128 support.

gcc/testsuite/
	PR target/99708
	* gcc.target/powerpc/pr99708.c: New test.
---
 gcc/config/rs6000/rs6000-c.cc              |  7 ++-
 gcc/testsuite/gcc.target/powerpc/pr99708.c | 21 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99708.c

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 15251efc209..ec4e5c3f53a 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -622,8 +622,13 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
     builtin_define ("__RSQRTE__");
   if (TARGET_FRSQRTES)
     builtin_define ("__RSQRTEF__");
+  if (ibm128_float_type_node)
+    builtin_define ("__SIZEOF_IBM128__=16");
   if (TARGET_FLOAT128_TYPE)
-    builtin_define ("__FLOAT128_TYPE__");
+    {
+      builtin_define ("__FLOAT128_TYPE__");
+      builtin_define ("__SIZEOF_FLOAT128__=16");
+    }
 #ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
   builtin_define ("__BUILTIN_CPU_SUPPORTS__");
 #endif
diff --git a/gcc/testsuite/gcc.target/powerpc/pr99708.c b/gcc/testsuite/gcc.target/powerpc/pr99708.c
new file mode 100644
index 000..d478f7bc4c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr99708.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { require-effective-target ppc_float128_sw } */
+/* { dg-options "-O2 -mvsx -mfloat128" } */
+
+/*
+ * PR target/99708
+ *
+ * Verify that __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__ are properly defined.
+ */
+
+#include <stdlib.h>
+
+int main (void)
+{
+  if (__SIZEOF_FLOAT128__ != sizeof (__float128)
+      || __SIZEOF_IBM128__ != sizeof (__ibm128))
+    abort ();
+
+  return 0;
+}
+
-- 
2.35.1

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
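With the macros in place, user code can test for the types portably with `#ifdef` instead of hard-coding target checks. A minimal sketch, assuming only the documented meaning of the predefined `__SIZEOF_*__` macros; the helper names are hypothetical. On targets that do not predefine a macro, the corresponding helper simply returns 0.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of __float128 as advertised by the compiler, or 0 when
   __SIZEOF_FLOAT128__ is not predefined for this target/options.  */
size_t
advertised_float128_size (void)
{
#ifdef __SIZEOF_FLOAT128__
  /* When the macro is defined, the type should exist and match it,
     which is what the new pr99708.c test checks at run time.  */
  _Static_assert (__SIZEOF_FLOAT128__ == sizeof (__float128),
                  "__SIZEOF_FLOAT128__ disagrees with sizeof (__float128)");
  return __SIZEOF_FLOAT128__;
#else
  return 0;
#endif
}

/* Same idea for the PowerPC-only __ibm128 type.  */
size_t
advertised_ibm128_size (void)
{
#ifdef __SIZEOF_IBM128__
  return __SIZEOF_IBM128__;
#else
  return 0;
#endif
}
```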
Re: [PATCH] rs6000: Workaround for new ifcvt behavior [PR104335]
Hi! On Wed, Feb 16, 2022 at 08:11:17PM +0100, Robin Dapp wrote: > since r12-6747-gaa8cfe785953a0 ifcvt not only passes real comparisons > but also "cc comparisons" (i.e. the representation of the result of a > comparison) to the backend. rs6000_emit_int_cmove () is not prepared to > handle this. Therefore, this patch makes it return false in such a case > in order to avoid an ICE. > On P10 I compared the testsuite of the last commit before the breaking > one (r12-6746-ge9ebb86799fd77, but commenting out a line that would > still result in a "-Wformat-diag" bootstrap error then) I have used --disable-werror for weeks already :-( > PR target/104335 > > gcc/ChangeLog: > > * config/rs6000/rs6000.cc (rs6000_emit_int_cmove): Return false > if the expected comparison's first operand is of mode MODE_CC. Please send patches as plain text, not as base64. > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -16175,6 +16175,12 @@ rs6000_emit_int_cmove (rtx dest, rtx op, rtx > true_cond, rtx false_cond) >if (mode != SImode && (!TARGET_POWERPC64 || mode != DImode)) > return false; > > + /* PR104335: We now need to expect CC-mode "comparisons" > + coming from ifcvt. The following code expects proper > + comparisons so better abort here. */ > + if (XEXP (op, 0) && GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC) > +return false; Why that first test? XEXP (op, 0) is required to not be nil. The patch is okay without that (if it passes testing of course :-) ) Thanks! Segher
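The guard being reviewed amounts to "refuse when the first operand of the comparison is already a CC-mode value". A toy model of that decision, with Segher's suggestion applied (no separate NULL test, since the first operand always exists). Everything here is hypothetical: `enum mode_class`, `struct cmp_rtx` and `can_emit_int_cmove` only stand in for GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) and rs6000_emit_int_cmove.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's mode classes and an RTL comparison;
   the real code inspects GET_MODE_CLASS (GET_MODE (XEXP (op, 0))).  */
enum mode_class { MODE_INT, MODE_CC };

struct cmp_rtx
{
  enum mode_class op0_class;  /* mode class of the first operand */
};

/* Returns false when ifcvt hands over a CC-mode "comparison", i.e. the
   already-computed result of a compare rather than a comparison of two
   integer values, so the caller falls back instead of ICEing.  */
bool
can_emit_int_cmove (const struct cmp_rtx *op)
{
  if (op->op0_class == MODE_CC)
    return false;
  return true;
}
```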
Re: [PATCH, V3] PR target/99708- Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__
Hi!

On Wed, Feb 16, 2022 at 06:03:53PM -0500, Michael Meissner wrote:
> [PATCH, V3] Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__.
>
> Define the sizes of the PowerPC specific types __float128 and __ibm128 if
> those types are enabled.
>
> This patch will define __SIZEOF_IBM128__ and __SIZEOF_FLOAT128__ if their
> respective types are created in the compiler.

> gcc/
> 	PR target/99708
> 	* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
> 	__SIZEOF_IBM128__ if the IBM 128-bit long double type is created.
> 	Define __SIZEOF_FLOAT128__ if we have float128 support.

> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -622,8 +622,13 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
>      builtin_define ("__RSQRTE__");
>    if (TARGET_FRSQRTES)
>      builtin_define ("__RSQRTEF__");
> +  if (ibm128_float_type_node)
> +    builtin_define ("__SIZEOF_IBM128__=16");
>    if (TARGET_FLOAT128_TYPE)
> -    builtin_define ("__FLOAT128_TYPE__");
> +    {
> +      builtin_define ("__FLOAT128_TYPE__");
> +      builtin_define ("__SIZEOF_FLOAT128__=16");
> +    }

  if (TARGET_FLOAT128_TYPE)
    builtin_define ("__FLOAT128_TYPE__");
  if (float128_type_node)
    builtin_define ("__SIZEOF_FLOAT128__=16");
  if (ibm128_float_type_node)
    builtin_define ("__SIZEOF_IBM128__=16");

Okay like that.  Thanks!


Segher
[committed] analyzer: fixes to free of non-heap detection [PR104560]
PR analyzer/104560 reports various false positives from
-Wanalyzer-free-of-non-heap seen with rdma-core, on what's
effectively:

  free (&ptr->field)

where in this case "field" is the first element of its struct, and thus
&ptr->field == ptr, and could be on the heap.

The root cause is that malloc_state_machine::on_stmt makes
"LHS = &EXPR;" transition LHS from start to non_heap when EXPR is not
a MEM_REF; this assumption doesn't hold for the above case.

This patch eliminates that state transition, instead relying on
malloc_state_machine::get_default_state to detect regions known to
not be on the heap.

Doing so fixes the false positive, but eliminates some events relating
to free-of-alloca identifying the alloca, so the patch also reworks
free_of_non_heap to capture which region has been freed, adding
region creation events to diagnostic paths, so that the alloca calls
can be identified, and using the memory space of the region for more
precise wording of the diagnostic.

The improvement to malloc_state_machine::get_default_state also
means we now detect attempts to free VLAs, functions and code labels.

In doing so I spotted that I wasn't adding region creation events for
regions for global variables, and for cases where an allocation is the
last stmt within its basic block, so the patch also fixes these issues.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7268-ga61aaee63848d422e8443e17bbec3257ee59d5d8.

gcc/analyzer/ChangeLog:
	PR analyzer/104560
	* diagnostic-manager.cc (diagnostic_manager::build_emission_path):
	Add region creation events for globals of interest.
	(null_assignment_sm_context::get_old_program_state): New.
	(diagnostic_manager::add_events_for_eedge): Move check for
	changing dynamic extents from PK_BEFORE_STMT case to after the
	switch on the dst_point's kind so that we can emit them for the
	final stmt in a basic block.
	* engine.cc (impl_sm_context::get_old_program_state): New.
	* sm-malloc.cc (malloc_state_machine::get_default_state): Rewrite
	detection of m_non_heap to use get_memory_space.
	(free_of_non_heap::free_of_non_heap): Add freed_reg param.
	(free_of_non_heap::subclass_equal_p): Update for changes to fields.
	(free_of_non_heap::emit): Drop m_kind in favor of
	get_memory_space.
	(free_of_non_heap::describe_state_change): Remove logic for
	detecting alloca.
	(free_of_non_heap::mark_interesting_stuff): Add region-creation of
	m_freed_reg.
	(free_of_non_heap::get_memory_space): New.
	(free_of_non_heap::kind): Drop enum.
	(free_of_non_heap::m_freed_reg): New field.
	(free_of_non_heap::m_kind): Drop field.
	(malloc_state_machine::on_stmt): Drop transition to m_non_heap.
	(malloc_state_machine::handle_free_of_non_heap): New function,
	split out from on_deallocator_call and on_realloc_call, adding
	detection of the freed region.
	(malloc_state_machine::on_deallocator_call): Use it.
	(malloc_state_machine::on_realloc_call): Likewise.
	* sm.h (sm_context::get_old_program_state): New vfunc.

gcc/testsuite/ChangeLog:
	PR analyzer/104560
	* g++.dg/analyzer/placement-new.C: Update expected wording.
	* g++.dg/analyzer/pr100244.C: Likewise.
	* gcc.dg/analyzer/attr-malloc-1.c (test_7): Likewise.
	* gcc.dg/analyzer/malloc-1.c (test_24): Likewise.
	(test_25): Likewise.
	(test_26): Likewise.
	(test_50a, test_50b, test_50c): New.
	* gcc.dg/analyzer/malloc-callbacks.c (test_5): Update expected
	wording.
	* gcc.dg/analyzer/malloc-paths-8.c: Likewise.
	* gcc.dg/analyzer/pr104560-1.c: New test.
	* gcc.dg/analyzer/pr104560-2.c: New test.
	* gcc.dg/analyzer/realloc-1.c (test_7): Updated expected wording.
	* gcc.dg/analyzer/vla-1.c (test_2): New.  Prune output from
	-Wfree-nonheap-object.

Signed-off-by: David Malcolm
---
 gcc/analyzer/diagnostic-manager.cc            | 105 +-
 gcc/analyzer/engine.cc                        |   5 +
 gcc/analyzer/sm-malloc.cc                     | 134 +-
 gcc/analyzer/sm.h                             |   4 +
 gcc/testsuite/g++.dg/analyzer/placement-new.C |   4 +-
 gcc/testsuite/g++.dg/analyzer/pr100244.C      |   2 +-
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-1.c |   2 +-
 gcc/testsuite/gcc.dg/analyzer/malloc-1.c      |  32 -
 .../gcc.dg/analyzer/malloc-callbacks.c        |   5 +-
 .../gcc.dg/analyzer/malloc-paths-8.c          |   4 +-
 gcc/testsuite/gcc.dg/analyzer/pr104560-1.c    |  43 ++
 gcc/testsuite/gcc.dg/analyzer/pr104560-2.c    |  26
 gcc/testsuite/gcc.dg/analyzer/realloc-1.c     |   4 +-
 gcc/testsuite/gcc.dg/analyzer/vla-1.c         |   9 ++
 14 files changed, 262 insertions(+), 117 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr1
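The first-member case behind the false positive can be reproduced in a few lines of C. This is a sketch; `struct node`, `release_via_first_member` and `first_member_aliases_struct` are hypothetical names, not taken from rdma-core or the new pr104560 tests. C guarantees that a pointer to a struct, suitably converted, points to its initial member, so freeing `&ptr->field` releases the heap block itself.

```c
#include <assert.h>
#include <stdlib.h>

struct node
{
  int field;  /* first member: same address as the struct itself */
  int other;
};

/* Nonzero when &p->field and p are the same address, which C
   guarantees for the first member of a struct.  */
int
first_member_aliases_struct (const struct node *p)
{
  return (const void *) &p->field == (const void *) p;
}

/* Equivalent to free (ptr): &ptr->field points at the start of the
   malloc'd block, so this must not be reported as a free of non-heap
   memory.  */
void
release_via_first_member (struct node *ptr)
{
  free (&ptr->field);
}
```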
Re: Merge from trunk to gccgo branch
I merged trunk revision 24ca97325cab7bc454c785d55f37120fe7ea6f74 to the gccgo branch. Ian
Re: [PATCH] c++: double non-dep folding from finish_compound_literal [PR104565]
On 2/16/22 15:26, Patrick Palka wrote: In finish_compound_literal, we perform non-dependent expr folding before calling check_narrowing (ever since r9-5973). But ever since r10-7096, check_narrowing also performs non-dependent expr folding of its own. This double folding cause tsubst to see non-templated trees during the second folding, which causes a spurious error in the below testcase. This patch removes this first folding operation; it now seems obviated by the second one. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/10/11? OK for trunk now, release branches in about a month. PR c++/104565 gcc/cp/ChangeLog: * semantics.cc (finish_compound_literal): Don't perform non-dependent expr folding before calling check_narrowing. gcc/testsuite/ChangeLog: * g++.dg/template/non-dependent22.C: New test. --- gcc/cp/semantics.cc | 10 +++--- gcc/testsuite/g++.dg/template/non-dependent22.C | 12 2 files changed, 15 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/non-dependent22.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 0cb17a6a8ab..114baa48710 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -3203,13 +3203,9 @@ finish_compound_literal (tree type, tree compound_literal, return error_mark_node; compound_literal = reshape_init (type, compound_literal, complain); if (SCALAR_TYPE_P (type) - && !BRACE_ENCLOSED_INITIALIZER_P (compound_literal)) -{ - tree t = instantiate_non_dependent_expr_sfinae (compound_literal, - complain); - if (!check_narrowing (type, t, complain)) - return error_mark_node; -} + && !BRACE_ENCLOSED_INITIALIZER_P (compound_literal) + && !check_narrowing (type, compound_literal, complain)) +return error_mark_node; if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type) == NULL_TREE) { diff --git a/gcc/testsuite/g++.dg/template/non-dependent22.C b/gcc/testsuite/g++.dg/template/non-dependent22.C new file mode 100644 index 000..83a6a13f15b --- /dev/null +++ 
b/gcc/testsuite/g++.dg/template/non-dependent22.C @@ -0,0 +1,12 @@ +// PR c++/104565 +// { dg-do compile { target c++11 } } + +struct apa { + constexpr int n() const { return 3; } +}; + +template +int f() { + apa foo; + return int{foo.n()}; // no matching function for call to 'apa::n(apa*)' +}
Re: [PATCH] Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of the same type when convert is extension.
On Wed, Feb 16, 2022 at 10:17 PM Jakub Jelinek via Gcc-patches wrote: > > On Wed, Feb 16, 2022 at 05:03:09PM +0800, liuhongt via Gcc-patches wrote: > > > > +(match (cond_expr_convert_p @0 @2 @3 @6) > > > > + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3)) > > > > + (if (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) > > > > > > But in principle @2 or @3 could safely differ in sign, you'd then need to > > > ensure > > > to insert sign conversions to @2/@3 to the signedness of @4/@5. > > > > > It turns out differ in sign is not suitable for extension(but ok for > > truncation), > > because it's zero_extend vs sign_extend. > > > > The patch add types_match check when convert is extension. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > And native Bootstrapped and regtested on CLX. > > > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR tree-optimization/104551 > > PR tree-optimization/103771 > > * match.pd (cond_expr_convert_p): Add types_match check when > > convert is extension. > > * tree-vect-patterns.cc > > (gimple_cond_expr_convert_p): Adjust comments. > > (vect_recog_cond_expr_convert_pattern): Ditto. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr104551.c: New test. 
> > --- > > gcc/match.pd | 8 +--- > > gcc/testsuite/gcc.target/i386/pr104551.c | 24 > > gcc/tree-vect-patterns.cc| 6 -- > > 3 files changed, 33 insertions(+), 5 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr104551.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 05a10ab6bfd..8e80b9f1576 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -7692,11 +7692,13 @@ and, > >(if (INTEGRAL_TYPE_P (type) > > && INTEGRAL_TYPE_P (TREE_TYPE (@2)) > > && INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > - && INTEGRAL_TYPE_P (TREE_TYPE (@3)) > > && TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0)) > > && TYPE_PRECISION (TREE_TYPE (@0)) > > == TYPE_PRECISION (TREE_TYPE (@2)) > > - && TYPE_PRECISION (TREE_TYPE (@0)) > > - == TYPE_PRECISION (TREE_TYPE (@3)) > > + && (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) > > +|| ((TYPE_PRECISION (TREE_TYPE (@0)) > > + == TYPE_PRECISION (TREE_TYPE (@3))) > > +&& INTEGRAL_TYPE_P (TREE_TYPE (@3)) > > +&& TYPE_PRECISION (TREE_TYPE (@3)) > TYPE_PRECISION (type))) > > && single_use (@4) > > && single_use (@5 > > I find this quite unreadable, it looks like if @2 and @3 are treated > differently. I think keeping the old 3 lines and just adding > && (TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (type) > || (TYPE_UNSIGNED (TREE_TYPE (@2)) > == TYPE_UNSIGNED (TREE_TYPE (@3 Yes, good idea. > after it ideally with a comment why would be better. > Note, if the precision of @0 and type is the same, I think signedness can > still differ, no? We have TYPE_PRECISION (type) != TYPE_PRECISION (TREE_TYPE (@0)). > > Jakub > -- BR, Hongtao
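The "zero_extend vs sign_extend" point can be seen with plain C conversions. This only illustrates the arithmetic and is not vectorizer or match.pd code; the function names are made up. Widening the same 8-bit pattern gives different results depending on the source's signedness, while narrowing keeps only the low bits, so signedness does not matter there.

```c
#include <assert.h>
#include <stdint.h>

/* Widening: sign extension copies the top bit, zero extension fills
   the new high bits with zeros.  */
int32_t widen_signed (int8_t x) { return x; }
int32_t widen_unsigned (uint8_t x) { return x; }

/* Narrowing: only the low 8 bits survive, so the signedness of the
   wider source does not change the resulting bit pattern.  */
uint8_t narrow_signed (int32_t x) { return (uint8_t) x; }
uint8_t narrow_unsigned (uint32_t x) { return (uint8_t) x; }
```

For the bit pattern 0xff, `widen_signed` yields -1 and `widen_unsigned` yields 255, while both narrowing helpers map -1 and 0xffffffff to 0xff. That is why mixed signedness between @2 and @3 is only harmless when the convert is a truncation.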
[committed] analyzer: const functions have no side effects [PR104576]
PR analyzer/104576 tracks that we issue a false positive from
-Wanalyzer-use-of-uninitialized-value for the reproducers of PR 63311
when optimization is disabled.

The root cause is that the analyzer was considering that a call to
__builtin_sinf could have side-effects.

This patch fixes things by generalizing the handling for "pure"
functions to also consider "const" functions.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7270-g5fbcbcaff7248604e04b39464f4fbd64fbf6e43b.

gcc/analyzer/ChangeLog:
	PR analyzer/104576
	* region-model.cc: Include "calls.h".
	(region_model::on_call_pre): Use flags_from_decl_or_type to
	generalize check for DECL_PURE_P to also check for ECF_CONST.

gcc/testsuite/ChangeLog:
	PR analyzer/104576
	* gcc.dg/analyzer/torture/uninit-pr63311.c: New test.
	* gcc.dg/analyzer/uninit-pr104576.c: New test.
	* gfortran.dg/analyzer/uninit-pr63311.f90: New test.

Signed-off-by: David Malcolm
---
 gcc/analyzer/region-model.cc                  |   6 +-
 .../gcc.dg/analyzer/torture/uninit-pr63311.c  | 134 ++
 .../gcc.dg/analyzer/uninit-pr104576.c         |  16 +++
 .../gfortran.dg/analyzer/uninit-pr63311.f90   |  39 +
 4 files changed, 193 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/uninit-pr63311.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/uninit-pr104576.c
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/uninit-pr63311.f90

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 69e8fa7d1e3..d4d7816e0d5 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -72,6 +72,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-phinodes.h"
 #include "tree-ssa-operands.h"
 #include "ssa-iterators.h"
+#include "calls.h"
 
 #if ENABLE_ANALYZER
 
@@ -1271,13 +1272,14 @@ region_model::on_call_pre (const gcall *call, region_model_context *ctxt,
      in region-model-impl-calls.cc.
Having them split out into separate functions makes it easier to put breakpoints on the handling of specific functions. */ + int callee_fndecl_flags = flags_from_decl_or_type (callee_fndecl); if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL) && gimple_builtin_call_types_compatible_p (call, callee_fndecl)) switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl)) { default: - if (!DECL_PURE_P (callee_fndecl)) + if (!(callee_fndecl_flags & (ECF_CONST | ECF_PURE))) unknown_side_effects = true; break; case BUILT_IN_ALLOCA: @@ -1433,7 +1435,7 @@ region_model::on_call_pre (const gcall *call, region_model_context *ctxt, /* Handle in "on_call_post". */ } else if (!fndecl_has_gimple_body_p (callee_fndecl) - && !DECL_PURE_P (callee_fndecl) + && (!(callee_fndecl_flags & (ECF_CONST | ECF_PURE))) && !fndecl_built_in_p (callee_fndecl)) unknown_side_effects = true; } diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/uninit-pr63311.c b/gcc/testsuite/gcc.dg/analyzer/torture/uninit-pr63311.c new file mode 100644 index 000..a73289cb83f --- /dev/null +++ b/gcc/testsuite/gcc.dg/analyzer/torture/uninit-pr63311.c @@ -0,0 +1,134 @@ +/* { dg-additional-options "-Wno-analyzer-too-complex" } */ + +int foo () +{ + static volatile int v = 42; + int __result_foo; + + __result_foo = (int) v; + return __result_foo; +} + +void test (int * restrict n, int * restrict flag) +{ + int i; + int j; + int k; + double t; + int tt; + double v; + + if (*flag) +{ + t = 4.2e+1; + tt = foo (); +} + L_1: ; + v = 0.0; + { +int D_3353; + +D_3353 = *n; +i = 1; +if (i <= D_3353) + { +while (1) + { +{ + int D_3369; + + v = 0.0; + if (*flag) +{ + if (tt == i) +{ + { +double M_0; + +M_0 = v; +if (t > M_0 || (int) (M_0 != M_0)) + { +M_0 = t; + } +v = M_0; + } +} + L_5:; +} + L_4:; + { +int D_3359; + +D_3359 = *n; +j = 1; +if (j <= D_3359) + { +while (1) + { +{ + int D_3368; + + { +int D_3362; + +D_3362 = *n; +k = 1; +if (k <= D_3362) + { +
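The core of the change replaces a DECL_PURE_P check with a mask test on the flags that flags_from_decl_or_type returns. Below is a minimal standalone sketch of the predicate before and after the patch. It assumes hypothetical HYP_ECF_* constants standing in for GCC's internal ECF_CONST and ECF_PURE bits (their real values are not reproduced here), and the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's internal ECF_CONST / ECF_PURE flags.  */
enum
{
  HYP_ECF_CONST = 1 << 0,
  HYP_ECF_PURE = 1 << 1
};

/* Before the patch: only a "pure" callee was assumed free of side
   effects, so a callee that was merely "const" counted as clobbering.  */
bool
old_unknown_side_effects (int flags)
{
  return !(flags & HYP_ECF_PURE);
}

/* After the patch: either flag means the call cannot write memory.  */
bool
new_unknown_side_effects (int flags)
{
  return !(flags & (HYP_ECF_CONST | HYP_ECF_PURE));
}

/* Usage: the only behavioral difference is for const-only callees.  */
void
check_side_effect_predicates (void)
{
  assert (old_unknown_side_effects (HYP_ECF_CONST));
  assert (!new_unknown_side_effects (HYP_ECF_CONST));
  assert (!old_unknown_side_effects (HYP_ECF_PURE));
  assert (!new_unknown_side_effects (HYP_ECF_PURE));
  assert (new_unknown_side_effects (0));
}
```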
[PATCH] [i386] Clean up MPX-related bit_{MPX,BNDREGS,BNDCSR}.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	* config/i386/cpuid.h (bit_MPX): Removed.
	(bit_BNDREGS): Ditto.
	(bit_BNDCSR): Ditto.
---
 gcc/config/i386/cpuid.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index ed6113009bb..8b3dc2b1dde 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -86,7 +86,6 @@
 #define bit_AVX2	(1 << 5)
 #define bit_BMI2	(1 << 8)
 #define bit_RTM	(1 << 11)
-#define bit_MPX	(1 << 14)
 #define bit_AVX512F	(1 << 16)
 #define bit_AVX512DQ	(1 << 17)
 #define bit_RDSEED	(1 << 18)
@@ -136,10 +135,6 @@
 #define bit_AMX_TILE	(1 << 24)
 #define bit_AMX_INT8	(1 << 25)
 
-/* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */
-#define bit_BNDREGS	(1 << 3)
-#define bit_BNDCSR	(1 << 4)
-
 /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
 #define bit_XSAVEOPT	(1 << 0)
 #define bit_XSAVEC	(1 << 1)
-- 
2.18.1
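Out-of-tree code that still tests bit_MPX will no longer compile against the updated cpuid.h. Below is a minimal sketch of one way such code could keep building, assuming it still wants to probe the raw bit. The fallback value is copied from the deleted line (CPUID leaf 7, subleaf 0, EBX bit 14), and cpu_reports_mpx is a hypothetical helper built on cpuid.h's __get_cpuid_count.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Local fallback for the macro removed from cpuid.h: leaf 7, subleaf 0,
   EBX bit 14, matching the deleted definition.  */
#ifndef bit_MPX
#define bit_MPX (1 << 14)
#endif

/* Hypothetical helper: does the CPU advertise MPX?  Always false on
   non-x86 hosts.  */
bool
cpu_reports_mpx (void)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;
  /* __get_cpuid_count returns 0 when leaf 7 is not supported.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return (ebx & bit_MPX) != 0;
#else
  return false;
#endif
}

/* Usage: the fallback keeps the same bit position as the old macro.  */
void
check_mpx_fallback (void)
{
  assert (bit_MPX == (1 << 14));
  (void) cpu_reports_mpx ();
}
```

The #ifndef guard keeps the sketch compatible with older cpuid.h versions that still define the macro.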
Re: [PATCH] [i386] Clean up MPX-related bit_{MPX,BNDREGS,BNDCSR}.
On Thu, Feb 17, 2022 at 12:00 PM liuhongt wrote: > > Bootstrap and regrestest on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/cpuid.h (bit_MPX): Removed. > (bit_BNDREGS): Ditto. > (bit_BNDCSR): Ditto. > --- > gcc/config/i386/cpuid.h | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > index ed6113009bb..8b3dc2b1dde 100644 > --- a/gcc/config/i386/cpuid.h > +++ b/gcc/config/i386/cpuid.h > @@ -86,7 +86,6 @@ > #define bit_AVX2 (1 << 5) > #define bit_BMI2 (1 << 8) > #define bit_RTM(1 << 11) > -#define bit_MPX(1 << 14) > #define bit_AVX512F(1 << 16) > #define bit_AVX512DQ (1 << 17) > #define bit_RDSEED (1 << 18) > @@ -136,10 +135,6 @@ > #define bit_AMX_TILE(1 << 24) > #define bit_AMX_INT8(1 << 25) > > -/* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */ > -#define bit_BNDREGS (1 << 3) > -#define bit_BNDCSR (1 << 4) > - > /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */ > #define bit_XSAVEOPT (1 << 0) > #define bit_XSAVEC (1 << 1) > -- > 2.18.1 > -- BR, Hongtao
libbacktrace patch committed: Handle skeleton units
This libbacktrace patch handles DWARF 5 skeleton units, which are used when part of the DWARF information is stored in a separate file. This doesn't actually look in the separate file, as the line number information, which is all that we care about, is normally kept in the main executable because it needs relocations. For this patch bootstrapped and ran libbacktrace and Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian * dwarf.c (find_address_ranges): Handle skeleton units. (read_function_entry): Likewise. 3c16999f983331301384f51fc1cdc04f7d51ef6c diff --git a/libbacktrace/dwarf.c b/libbacktrace/dwarf.c index 2158bc14065..45cc9e77e40 100644 --- a/libbacktrace/dwarf.c +++ b/libbacktrace/dwarf.c @@ -1989,14 +1989,16 @@ find_address_ranges (struct backtrace_state *state, uintptr_t base_address, break; case DW_AT_stmt_list: - if (abbrev->tag == DW_TAG_compile_unit + if ((abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) && (val.encoding == ATTR_VAL_UINT || val.encoding == ATTR_VAL_REF_SECTION)) u->lineoff = val.u.uint; break; case DW_AT_name: - if (abbrev->tag == DW_TAG_compile_unit) + if (abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) { name_val = val; have_name_val = 1; @@ -2004,7 +2006,8 @@ find_address_ranges (struct backtrace_state *state, uintptr_t base_address, break; case DW_AT_comp_dir: - if (abbrev->tag == DW_TAG_compile_unit) + if (abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) { comp_dir_val = val; have_comp_dir_val = 1; @@ -2012,19 +2015,22 @@ find_address_ranges (struct backtrace_state *state, uintptr_t base_address, break; case DW_AT_str_offsets_base: - if (abbrev->tag == DW_TAG_compile_unit + if ((abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) && val.encoding == ATTR_VAL_REF_SECTION) u->str_offsets_base = val.u.uint; break; case DW_AT_addr_base: - if (abbrev->tag == DW_TAG_compile_unit + if ((abbrev->tag == 
DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) && val.encoding == ATTR_VAL_REF_SECTION) u->addr_base = val.u.uint; break; case DW_AT_rnglists_base: - if (abbrev->tag == DW_TAG_compile_unit + if ((abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) && val.encoding == ATTR_VAL_REF_SECTION) u->rnglists_base = val.u.uint; break; @@ -2052,7 +2058,8 @@ find_address_ranges (struct backtrace_state *state, uintptr_t base_address, } if (abbrev->tag == DW_TAG_compile_unit - || abbrev->tag == DW_TAG_subprogram) + || abbrev->tag == DW_TAG_subprogram + || abbrev->tag == DW_TAG_skeleton_unit) { if (!add_ranges (state, dwarf_sections, base_address, is_bigendian, u, pcrange.lowpc, &pcrange, @@ -2060,9 +2067,10 @@ find_address_ranges (struct backtrace_state *state, uintptr_t base_address, (void *) addrs)) return 0; - /* If we found the PC range in the DW_TAG_compile_unit, we -can stop now. */ - if (abbrev->tag == DW_TAG_compile_unit + /* If we found the PC range in the DW_TAG_compile_unit or +DW_TAG_skeleton_unit, we can stop now. */ + if ((abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) && (pcrange.have_ranges || (pcrange.have_lowpc && pcrange.have_highpc))) return 1; @@ -3274,7 +3282,8 @@ read_function_entry (struct backtrace_state *state, struct dwarf_data *ddata, /* The compile unit sets the base address for any address ranges in the function entries. */ - if (abbrev->tag == DW_TAG_compile_unit + if ((abbrev->tag == DW_TAG_compile_unit + || abbrev->tag == DW_TAG_skeleton_unit) && abbrev->attrs[i].name == DW_AT_low_pc) { if (val.encoding == ATTR_VAL_ADDRESS)
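Every hunk in the patch widens the same test from DW_TAG_compile_unit alone to DW_TAG_compile_unit or DW_TAG_skeleton_unit. Below is a minimal sketch of that test factored into helpers. The tag values come from the DWARF 5 specification and are spelled out locally rather than taken from libbacktrace's own enum. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* DWARF 5 tag values, defined locally so the sketch needs no <dwarf.h>.  */
enum hyp_dwarf_tag
{
  HYP_DW_TAG_compile_unit = 0x11,
  HYP_DW_TAG_subprogram = 0x2e,
  HYP_DW_TAG_skeleton_unit = 0x4a
};

/* Hypothetical helper: true for unit DIEs whose DW_AT_stmt_list,
   DW_AT_name, DW_AT_comp_dir, DW_AT_str_offsets_base, DW_AT_addr_base
   and DW_AT_rnglists_base are recorded.  A skeleton unit stands in for
   a full compile unit whose remaining DWARF lives in a separate file.  */
bool
is_unit_die (int tag)
{
  return tag == HYP_DW_TAG_compile_unit || tag == HYP_DW_TAG_skeleton_unit;
}

/* Hypothetical helper mirroring the add_ranges condition: unit DIEs and
   subprograms contribute PC ranges.  */
bool
die_contributes_ranges (int tag)
{
  return is_unit_die (tag) || tag == HYP_DW_TAG_subprogram;
}

/* Usage: a subprogram adds ranges but is not itself a unit.  */
void
check_unit_tags (void)
{
  assert (is_unit_die (HYP_DW_TAG_compile_unit));
  assert (is_unit_die (HYP_DW_TAG_skeleton_unit));
  assert (!is_unit_die (HYP_DW_TAG_subprogram));
  assert (die_contributes_ranges (HYP_DW_TAG_subprogram));
}
```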
[PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER
Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid the SSE <-> AVX transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to generate a vzeroupper instruction after loading all-zero YMM/ZMM registers and enable it by default. gcc/ PR target/101456 * config/i386/i386.cc (ix86_avx_u128_mode_needed): Skip the vzeroupper optimization if target needs vzeroupper after reading all-zero YMM/ZMM registers. * config/i386/i386.h (TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER): New. * config/i386/x86-tune.def (X86_TUNE_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER): New. gcc/testsuite/ PR target/101456 * gcc.target/i386/pr101456-1.c (dg-options): Add -mtune-ctrl=^read_zero_ymm_zmm_need_vzeroupper. * gcc.target/i386/pr101456-2.c: Likewise. * gcc.target/i386/pr101456-3.c: New test. * gcc.target/i386/pr101456-4.c: Likewise. --- gcc/config/i386/i386.cc| 51 -- gcc/config/i386/i386.h | 2 + gcc/config/i386/x86-tune.def | 5 +++ gcc/testsuite/gcc.target/i386/pr101456-1.c | 2 +- gcc/testsuite/gcc.target/i386/pr101456-2.c | 2 +- gcc/testsuite/gcc.target/i386/pr101456-3.c | 33 ++ gcc/testsuite/gcc.target/i386/pr101456-4.c | 33 ++ 7 files changed, 103 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-4.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index cf246e74e57..1f8b4caf24c 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -14502,33 +14502,38 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) subrtx_iterator::array_type array; - rtx set = single_set (insn); - if (set) + if (!TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER) { - rtx dest = SET_DEST (set); - rtx src = SET_SRC (set); - if (ix86_check_avx_upper_register (dest)) + /* Perform this vzeroupper optimization if target doesn't need +vzeroupper after reading all-zero YMM/YMM registers.
*/ + rtx set = single_set (insn); + if (set) { - /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the -source isn't zero. */ - if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) - return AVX_U128_DIRTY; + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + if (ix86_check_avx_upper_register (dest)) + { + /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the +source isn't zero. */ + if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) + return AVX_U128_DIRTY; + else + return AVX_U128_ANY; + } else - return AVX_U128_ANY; - } - else - { - FOR_EACH_SUBRTX (iter, array, src, NONCONST) - if (ix86_check_avx_upper_register (*iter)) - { - int status = ix86_avx_u128_mode_source (insn, *iter); - if (status == AVX_U128_DIRTY) - return status; - } - } + { + FOR_EACH_SUBRTX (iter, array, src, NONCONST) + if (ix86_check_avx_upper_register (*iter)) + { + int status = ix86_avx_u128_mode_source (insn, *iter); + if (status == AVX_U128_DIRTY) + return status; + } + } - /* This isn't YMM/ZMM load/store. */ - return AVX_U128_ANY; + /* This isn't YMM/ZMM load/store. */ + return AVX_U128_ANY; + } } /* Require DIRTY mode if a 256bit or 512bit AVX register is referenced. 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index f41e0908250..98c2e200027 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -425,6 +425,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; #define TARGET_AVOID_MFENCE ix86_tune_features[X86_TUNE_AVOID_MFENCE] #define TARGET_EMIT_VZEROUPPER \ ix86_tune_features[X86_TUNE_EMIT_VZEROUPPER] +#define TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER \ + ix86_tune_features[X86_TUNE_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER] #define TARGET_EXPAND_ABS \ ix86_tune_features[X86_TUNE_EXPAND_ABS] #define TARGET_V2DF_REDUCTION_PREFER_HADDPD \ diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 82ca0ae63ac..0a068c09202 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -649,3 +649,8 @@ DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", m_NONE) /* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion before a transfer of control flow out of the function. */ DEF_TUNE (X86_TUNE_EMIT_VZEROUPPER, "emit_vzeroupper", ~m_KNL)
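Below is a simplified model of the new control flow in ix86_avx_u128_mode_needed for an insn that sets a YMM/ZMM register. It is a sketch, not the RTL code. The two boolean parameters stand in for TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER and for the standard_sse_constant_p (src, ...) == 1 all-zero check. The enum only mirrors the AVX_U128_* names used in the patch, and ymm_load_mode is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the AVX_U128_* state names used in the patch.  */
enum avx_u128_state { AVX_U128_CLEAN, AVX_U128_DIRTY, AVX_U128_ANY };

/* Hypothetical model: mode needed by an insn that writes a YMM/ZMM reg.  */
enum avx_u128_state
ymm_load_mode (bool read_zero_need_vzeroupper, bool src_is_all_zero)
{
  if (!read_zero_need_vzeroupper)
    /* Optimization kept: an all-zero load does not force the DIRTY
       state; any other source does.  */
    return src_is_all_zero ? AVX_U128_ANY : AVX_U128_DIRTY;

  /* Tune set (the patch's default): skip the optimization and fall
     through to the generic rule, which requires DIRTY whenever a
     256-bit or 512-bit register is referenced.  */
  return AVX_U128_DIRTY;
}

/* Usage: only the all-zero load changes behavior with the tune.  */
void
check_ymm_load_mode (void)
{
  assert (ymm_load_mode (false, true) == AVX_U128_ANY);
  assert (ymm_load_mode (true, true) == AVX_U128_DIRTY);
  assert (ymm_load_mode (false, false) == AVX_U128_DIRTY);
}
```

The pr101456-1.c and pr101456-2.c tests add -mtune-ctrl=^read_zero_ymm_zmm_need_vzeroupper, which corresponds to calling this model with the first argument false.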
Re: Merge from trunk to gccgo branch
And another one: I merged trunk revision 837eb12629dd8a8a45fac9b8db57b29ecda46f14 to the gccgo branch. Ian
Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER
On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches wrote: > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bride, > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX > transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to > generate vzeroupper instruction after loading all-zero YMM/YMM registers > and enable it by default. Wouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sound a bit smoother? Originally we needed to add vzeroupper to all avx<->sse cases; now it's a tune to indicate that we don't need to add it in some cases. > > gcc/ > > PR target/101456 > * config/i386/i386.cc (ix86_avx_u128_mode_needed): Skip the > vzeroupper optimization if target needs vzeroupper after reading > all-zero YMM/YMM registers. > * config/i386/i386.h (TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER): > New. > * config/i386/x86-tune.def > (X86_TUNE_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER): New. > > gcc/testsuite/ > > PR target/101456 > * gcc.target/i386/pr101456-1.c (dg-options): Add > -mtune-ctrl=^read_zero_ymm_zmm_need_vzeroupper. > * gcc.target/i386/pr101456-2.c: Likewise. > * gcc.target/i386/pr101456-3.c: New test. > * gcc.target/i386/pr101456-4.c: Likewise.
> --- > gcc/config/i386/i386.cc| 51 -- > gcc/config/i386/i386.h | 2 + > gcc/config/i386/x86-tune.def | 5 +++ > gcc/testsuite/gcc.target/i386/pr101456-1.c | 2 +- > gcc/testsuite/gcc.target/i386/pr101456-2.c | 2 +- > gcc/testsuite/gcc.target/i386/pr101456-3.c | 33 ++ > gcc/testsuite/gcc.target/i386/pr101456-4.c | 33 ++ > 7 files changed, 103 insertions(+), 25 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-4.c > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index cf246e74e57..1f8b4caf24c 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -14502,33 +14502,38 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) > >subrtx_iterator::array_type array; > > - rtx set = single_set (insn); > - if (set) > + if (!TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER) > { > - rtx dest = SET_DEST (set); > - rtx src = SET_SRC (set); > - if (ix86_check_avx_upper_register (dest)) > + /* Perform this vzeroupper optimization if target doesn't need > +vzeroupper after reading all-zero YMM/YMM registers. */ > + rtx set = single_set (insn); > + if (set) > { > - /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the > -source isn't zero. */ > - if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) > - return AVX_U128_DIRTY; > + rtx dest = SET_DEST (set); > + rtx src = SET_SRC (set); > + if (ix86_check_avx_upper_register (dest)) > + { > + /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the > +source isn't zero. 
*/ > + if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) > + return AVX_U128_DIRTY; > + else > + return AVX_U128_ANY; > + } > else > - return AVX_U128_ANY; > - } > - else > - { > - FOR_EACH_SUBRTX (iter, array, src, NONCONST) > - if (ix86_check_avx_upper_register (*iter)) > - { > - int status = ix86_avx_u128_mode_source (insn, *iter); > - if (status == AVX_U128_DIRTY) > - return status; > - } > - } > + { > + FOR_EACH_SUBRTX (iter, array, src, NONCONST) > + if (ix86_check_avx_upper_register (*iter)) > + { > + int status = ix86_avx_u128_mode_source (insn, *iter); > + if (status == AVX_U128_DIRTY) > + return status; > + } > + } > > - /* This isn't YMM/ZMM load/store. */ > - return AVX_U128_ANY; > + /* This isn't YMM/ZMM load/store. */ > + return AVX_U128_ANY; > + } > } > >/* Require DIRTY mode if a 256bit or 512bit AVX register is referenced. > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index f41e0908250..98c2e200027 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -425,6 +425,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; > #define TARGET_AVOID_MFENCE ix86_tune_features[X86_TUNE_AVOID_MFENCE] > #define TARGET_EMIT_VZEROUPPER \ > ix86_tune_features[X86_TUNE_EMIT_VZEROUPPER] > +#define TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER \ > + ix86_tune_features[X86_TUNE_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER] > #define TARGET_EXPAND_ABS \ > ix86_tune_features[X86_TUNE_EXPAND_ABS] > #define T
[PATCH V2] Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of the same type when convert is extension.
> I find this quite unreadable, it looks like if @2 and @3 are treated > differently. I think keeping the old 3 lines and just adding > && (TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (type) > || (TYPE_UNSIGNED (TREE_TYPE (@2)) > == TYPE_UNSIGNED (TREE_TYPE (@3 > after it ideally with a comment why would be better. Update patch. gcc/ChangeLog: PR tree-optimization/104551 PR tree-optimization/103771 * match.pd (cond_expr_convert_p): Add types_match check when convert is extension. * tree-vect-patterns.cc (gimple_cond_expr_convert_p): Adjust comments. (vect_recog_cond_expr_convert_pattern): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr104551.c: New test. --- gcc/match.pd | 6 ++ gcc/testsuite/gcc.target/i386/pr104551.c | 24 gcc/tree-vect-patterns.cc| 6 -- 3 files changed, 34 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr104551.c diff --git a/gcc/match.pd b/gcc/match.pd index 05a10ab6bfd..8b6f22f1065 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -7698,5 +7698,11 @@ and, == TYPE_PRECISION (TREE_TYPE (@2)) && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@3)) + /* For vect_recog_cond_expr_convert_pattern, @2 and @3 can differ in + signess when convert is truncation, but not ok for extension since + it's sign_extend vs zero_extend. */ + && (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type) + || (TYPE_UNSIGNED (TREE_TYPE (@2)) + == TYPE_UNSIGNED (TREE_TYPE (@3 && single_use (@4) && single_use (@5 diff --git a/gcc/testsuite/gcc.target/i386/pr104551.c b/gcc/testsuite/gcc.target/i386/pr104551.c new file mode 100644 index 000..6300f25c0d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104551.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mavx2" } */ +/* { dg-require-effective-target avx2 } */ + +unsigned int +__attribute__((noipa)) +test(unsigned int a, unsigned char p[16]) { + unsigned int res = 0; + for (unsigned b = 0; b < a; b += 1) +res = p[b] ? 
p[b] : (char) b; + return res; +} + +int main () +{ + unsigned int a = 16U; + unsigned char p[16]; + for (int i = 0; i != 16; i++) +p[i] = (unsigned char)128; + unsigned int res = test (a, p); + if (res != 128) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index a8f96d59643..217bdfd7045 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -929,8 +929,10 @@ vect_reassociating_reduction_p (vec_info *vinfo, with conditions: 1) @1, @2, c, d, a, b are all integral type. 2) There's single_use for both @1 and @2. - 3) a, c and d have same precision. + 3) a, c have same precision. 4) c and @1 have different precision. + 5) c, d are the same type or they can differ in sign when convert is + truncation. record a and c and d and @3. */ @@ -952,7 +954,7 @@ extern bool gimple_cond_expr_convert_p (tree, tree*, tree (*)(tree)); TYPE_PRECISION (TYPE_E) != TYPE_PRECISION (TYPE_CD); TYPE_PRECISION (TYPE_AB) == TYPE_PRECISION (TYPE_CD); single_use of op_true and op_false. - TYPE_AB could differ in sign. + TYPE_AB could differ in sign when (TYPE_E) A is a truncation. Input: -- 2.18.1
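Below is a plain C illustration (not GCC code) of the rule the V2 comment states: @2 and @3 may differ in signedness when the conversion truncates, but not when it extends. The comparison is simplified to an int flag, and the 8-bit and 32-bit widths match the pr104551.c test. The signed char conversion of 128 relies on GCC's documented modulo behavior for out-of-range integer conversions. All function names are illustrative.

```c
#include <assert.h>

/* Source semantics: convert each arm, then select.  As with (char) b in
   pr104551.c, the unsigned arm is zero-extended and the signed arm
   sign-extended.  */
unsigned int
select_then_widen_per_arm (int c, unsigned char u, signed char s)
{
  return c ? (unsigned int) u : (unsigned int) s;
}

/* Hoisted form: select in one 8-bit type and widen once.  If that type
   is signed char, the unsigned arm gets sign-extended too.  */
unsigned int
select_in_signed_then_widen (int c, unsigned char u, signed char s)
{
  signed char sel = c ? (signed char) u : s;
  return (unsigned int) sel;
}

/* Truncation is safe: the low 8 bits do not depend on signedness.  */
unsigned char
truncate_per_arm (int c, unsigned int a, int b)
{
  return c ? (unsigned char) a : (unsigned char) b;
}

unsigned char
select_then_truncate (int c, unsigned int a, int b)
{
  unsigned int sel = c ? a : (unsigned int) b;
  return (unsigned char) sel;
}

/* Usage: u = 128 (0x80), the value stored in p[] by pr104551.c.  */
void
check_signedness_examples (void)
{
  assert (select_then_widen_per_arm (1, 128, 0) == 128u);
  /* (signed char) 128 == -128 under GCC; widened: 0xffffff80.  */
  assert (select_in_signed_then_widen (1, 128, 0) == 0xffffff80u);
  assert (truncate_per_arm (0, 0, -1) == select_then_truncate (0, 0, -1));
  assert (truncate_per_arm (1, 0x1234u, 0) == 0x34);
}
```

The V2 condition accepts the pattern only when TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type), which is a truncation, or when TYPE_UNSIGNED matches for @2 and @3. Together these rule out the mismatch shown by the second function.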
Re: [PATCH] tree-optimization/96881 - CD-DCE and CLOBBERs
On Tue, 15 Feb 2022, Jan Hubicka wrote: > > @@ -1272,7 +1275,7 @@ maybe_optimize_arith_overflow (gimple_stmt_iterator > > *gsi, > > contributes nothing to the program, and can be deleted. */ > > > > static bool > > -eliminate_unnecessary_stmts (void) > > +eliminate_unnecessary_stmts (bool aggressive) > > { > >bool something_changed = false; > >basic_block bb; > > @@ -1366,7 +1369,9 @@ eliminate_unnecessary_stmts (void) > > break; > > } > > } > > - if (!dead) > > + if (!dead > > + && (!aggressive > > + || bitmap_bit_p (visited_control_parents, bb->index))) > > It seems to me that it may be worth to consider case where > visited_control_parents is 0 while all basic blocks in the CD relation > are live for different reasons. I suppose this can happen in more > complex CFGs when the other arms of conditionals are live... It's a bit difficult to do in this place though since we might already have altered those blocks (and we need to check not for the block being live but for its control stmt). I suppose we could use the last_stmt_necessary bitmap. I'll do some statistics to see whether this helps. Richard.
Re: [PATCH] [i386] Clean up MPX-related bit_{MPX,BNDREGS,BNDCSR}.
On Thu, Feb 17, 2022 at 5:01 AM Hongtao Liu wrote: > > On Thu, Feb 17, 2022 at 12:00 PM liuhongt wrote: > > > > Bootstrap and regrestest on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/i386/cpuid.h (bit_MPX): Removed. > > (bit_BNDREGS): Ditto. > > (bit_BNDCSR): Ditto. OK. Thanks, Uros. > > --- > > gcc/config/i386/cpuid.h | 5 - > > 1 file changed, 5 deletions(-) > > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > > index ed6113009bb..8b3dc2b1dde 100644 > > --- a/gcc/config/i386/cpuid.h > > +++ b/gcc/config/i386/cpuid.h > > @@ -86,7 +86,6 @@ > > #define bit_AVX2 (1 << 5) > > #define bit_BMI2 (1 << 8) > > #define bit_RTM(1 << 11) > > -#define bit_MPX(1 << 14) > > #define bit_AVX512F(1 << 16) > > #define bit_AVX512DQ (1 << 17) > > #define bit_RDSEED (1 << 18) > > @@ -136,10 +135,6 @@ > > #define bit_AMX_TILE(1 << 24) > > #define bit_AMX_INT8(1 << 25) > > > > -/* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */ > > -#define bit_BNDREGS (1 << 3) > > -#define bit_BNDCSR (1 << 4) > > - > > /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */ > > #define bit_XSAVEOPT (1 << 0) > > #define bit_XSAVEC (1 << 1) > > -- > > 2.18.1 > > > > > -- > BR, > Hongtao
Re: [PATCH] x86: Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER
On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches wrote: > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches > wrote: > > > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bride, > > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX > > transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to > > generate vzeroupper instruction after loading all-zero YMM/YMM registers > > and enable it by default. > Shouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sounds a bit smoother? > Because originally we needed to add vzeroupper to all avx<->sse cases, > now it's a tune to indicate that we don't need to add it in some Perhaps we should go from the other side and use X86_TUNE_OPTIMIZE_AVX_READ for new processors? Uros. > cases. > > > > gcc/ > > > > PR target/101456 > > * config/i386/i386.cc (ix86_avx_u128_mode_needed): Skip the > > vzeroupper optimization if target needs vzeroupper after reading > > all-zero YMM/YMM registers. > > * config/i386/i386.h (TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER): > > New. > > * config/i386/x86-tune.def > > (X86_TUNE_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER): New. > > > > gcc/testsuite/ > > > > PR target/101456 > > * gcc.target/i386/pr101456-1.c (dg-options): Add > > -mtune-ctrl=^read_zero_ymm_zmm_need_vzeroupper. > > * gcc.target/i386/pr101456-2.c: Likewise. > > * gcc.target/i386/pr101456-3.c: New test. > > * gcc.target/i386/pr101456-4.c: Likewise. 
> > --- > > gcc/config/i386/i386.cc| 51 -- > > gcc/config/i386/i386.h | 2 + > > gcc/config/i386/x86-tune.def | 5 +++ > > gcc/testsuite/gcc.target/i386/pr101456-1.c | 2 +- > > gcc/testsuite/gcc.target/i386/pr101456-2.c | 2 +- > > gcc/testsuite/gcc.target/i386/pr101456-3.c | 33 ++ > > gcc/testsuite/gcc.target/i386/pr101456-4.c | 33 ++ > > 7 files changed, 103 insertions(+), 25 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-3.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-4.c > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index cf246e74e57..1f8b4caf24c 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -14502,33 +14502,38 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) > > > >subrtx_iterator::array_type array; > > > > - rtx set = single_set (insn); > > - if (set) > > + if (!TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER) > > { > > - rtx dest = SET_DEST (set); > > - rtx src = SET_SRC (set); > > - if (ix86_check_avx_upper_register (dest)) > > + /* Perform this vzeroupper optimization if target doesn't need > > +vzeroupper after reading all-zero YMM/YMM registers. */ > > + rtx set = single_set (insn); > > + if (set) > > { > > - /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the > > -source isn't zero. */ > > - if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) > > - return AVX_U128_DIRTY; > > + rtx dest = SET_DEST (set); > > + rtx src = SET_SRC (set); > > + if (ix86_check_avx_upper_register (dest)) > > + { > > + /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the > > +source isn't zero. 
*/ > > + if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) > > + return AVX_U128_DIRTY; > > + else > > + return AVX_U128_ANY; > > + } > > else > > - return AVX_U128_ANY; > > - } > > - else > > - { > > - FOR_EACH_SUBRTX (iter, array, src, NONCONST) > > - if (ix86_check_avx_upper_register (*iter)) > > - { > > - int status = ix86_avx_u128_mode_source (insn, *iter); > > - if (status == AVX_U128_DIRTY) > > - return status; > > - } > > - } > > + { > > + FOR_EACH_SUBRTX (iter, array, src, NONCONST) > > + if (ix86_check_avx_upper_register (*iter)) > > + { > > + int status = ix86_avx_u128_mode_source (insn, *iter); > > + if (status == AVX_U128_DIRTY) > > + return status; > > + } > > + } > > > > - /* This isn't YMM/ZMM load/store. */ > > - return AVX_U128_ANY; > > + /* This isn't YMM/ZMM load/store. */ > > + return AVX_U128_ANY; > > + } > > } > > > >/* Require DIRTY mode if a 256bit or 512bit AVX register is referenced. > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > index f41e0908250..98c2e200027 100644 > > --- a/gcc/config/i386/i386.h > > +++ b/gcc/config/i386/i386.h > > @@ -425,6 +425,8 @@ extern unsigned char ix86_tune_feat