Re: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
On Fri, Mar 4, 2022 at 3:28 PM liuhongt via Gcc-patches wrote: > > For parameter passing through stack, vectorized load from parm_decl > in callee may trigger serious STF issue. This is why GCC12 regresses > 50% for cray at -O2 compared to GCC11. > > The patch add an extremely large number to stmt_cost to prevent > vectorization for loads from parm_decl under very-cheap cost model, > this can at least prevent O2 regression due to STF issue, but may lose > some perf where there's no such issue(1 vector_load vs n scalar_load + > CTOR). > > No impact for SPEC2017 for both plain O2 and native O2 on ICX. > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/101908 > * config/i386/i386.cc (ix86_load_maybe_stfs_p): New. > (ix86_vector_costs::add_stmt_cost): Add extra cost for > vector_load/unsigned_load which may have stall forward issue. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr101908-1.c: New test. > * gcc.target/i386/pr101908-2.c: New test. > --- > gcc/config/i386/i386.cc| 31 ++ > gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 + > gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 + > 3 files changed, 55 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index b2bf90576d5..3bbaaf65ea8 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, > struct noce_if_info *if_info) >return default_noce_conversion_profitable_p (seq, if_info); > } > > +/* Return true if REF may have STF issue, otherwise false. */ > +static bool > +ix86_load_maybe_stfs_p (tree ref) > +{ > + tree addr = get_base_address (ref); > + > + if (TREE_CODE (addr) != PARM_DECL > + || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr))) > + || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD) > +return false; > + return true; > +} > + > /* x86-specific vector costs. */ > class ix86_vector_costs : public vector_costs > { > @@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, > vect_cost_for_stmt kind, > if (TREE_CODE (op) == SSA_NAME) > TREE_VISITED (op) = 0; > } > + > + /* Prevent vectorization for load from parm_decl at O2 to avoid STF issue. > + Performance may lose when there's no STF issue(1 vector_load vs n > + scalar_load + CTOR). > + TODO: both extra cost(2000) and ix86_load_maybe_stfs_p need to be fine > + tuned. */ > + if ((kind == vector_load || kind == unaligned_load) > + && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP > + && stmt_info > + && stmt_info->slp_type == pure_slp > + && stmt_info->stmt > + && gimple_assign_load_p (stmt_info->stmt) > + && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt))) > +{ > + stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign); > + stmt_cost += 2000; > +} > + >if (stmt_cost == -1) > stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign); > > diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c > b/gcc/testsuite/gcc.target/i386/pr101908-1.c > new file mode 100644 > index 000..f8e0f2e26bb > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-slp-details" } */ > +/* { dg-final { scan-tree-dump {(?n)add new stmt:.*MEM \ double\>} "slp2" } } */ > + > +struct X { double x[2]; }; > +typedef double v2df __attribute__((vector_size(16))); > + > +v2df __attribute__((noipa)) > +foo (struct X* x, struct X* y) > +{ > + return (v2df) {x->x[1], x->x[0] } + (v2df) { y->x[1], y->x[0] }; > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr101908-2.c > b/gcc/testsuite/gcc.target/i386/pr101908-2.c > new file mode 100644 > index 000..7f2f00cebab > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr101908-2.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-slp-details" } */ > +/* { dg-final { scan-tree-dump-not {(?n)add new stmt:.*MEM \ double\>} "slp2" } } */ > + > +struct X { double x[2]; }; > +typedef double v2df __attribute__((vector_size(16))); > + > +v2df __attribute__((noipa)) > +foo (struct X x, struct X y) > +{ > + return (v2df) {x.x[1], x.x[0] } + (v2df) { y.x[1], y.x[0] }; > +} > -- > 2.18.1 > -- BR, Hongtao
Re: [PATCH] i386: Fix up cond_{and,ior,xor,mul}* [PR104779]
On Mon, Mar 07, 2022 at 10:15:48AM +0800, Hongtao Liu wrote: > > Note, the predicates on cond_fma* and other FMA variants look all wrong to > > me, usually the fma instructions require nonimmediate_operand operands, > > but the cond_* patterns use vector_operand. Besides what this patch > cut from predicate.md- > 1142; Return true when OP is operand acceptable for vector memory operand. > 1143; Only AVX can have misaligned memory operand. > 1144(define_predicate "vector_memory_operand" > 1145 (and (match_operand 0 "memory_operand") > 1146 (ior (match_test "TARGET_AVX") > 1147(match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)" > 1148 > 1149; Return true when OP is register_operand or vector_memory_operand. > 1150(define_predicate "vector_operand" > 1151 (ior (match_operand 0 "register_operand") > 1152 (match_operand 0 "vector_memory_operand"))) > cut end > > vector_operand is a subset of nonimmediate_operands, so it's more like > a potential optimization issue rather than a correctness one? You're right. And fma apparently doesn't use something like ix86_fixup_binary_operands*, allows all 3 inputs as MEMs and only deals with it at RA time. Jakub
Re: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
On Fri, Mar 4, 2022 at 8:27 AM liuhongt wrote: > > For parameter passing through stack, vectorized load from parm_decl > in callee may trigger serious STF issue. This is why GCC12 regresses > 50% for cray at -O2 compared to GCC11. > > The patch add an extremely large number to stmt_cost to prevent > vectorization for loads from parm_decl under very-cheap cost model, > this can at least prevent O2 regression due to STF issue, but may lose > some perf where there's no such issue(1 vector_load vs n scalar_load + > CTOR). Note this is just heuristics in that by-value passed parameters are usually stored to the stack close before the function call. It does not catch the similar case from foo (const X &bar) { ... } where a foo ({ 1., 2. }) will have the object passed by reference constructed right before the call. In the end a full solution will need to perform some IPA analysis that computes the initialization distance from the call and uses should factor in the use distance from function entry. > No impact for SPEC2017 for both plain O2 and native O2 on ICX. > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/101908 > * config/i386/i386.cc (ix86_load_maybe_stfs_p): New. > (ix86_vector_costs::add_stmt_cost): Add extra cost for > vector_load/unsigned_load which may have stall forward issue. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr101908-1.c: New test. > * gcc.target/i386/pr101908-2.c: New test. > --- > gcc/config/i386/i386.cc| 31 ++ > gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 + > gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 + > 3 files changed, 55 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index b2bf90576d5..3bbaaf65ea8 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, > struct noce_if_info *if_info) >return default_noce_conversion_profitable_p (seq, if_info); > } > > +/* Return true if REF may have STF issue, otherwise false. */ > +static bool > +ix86_load_maybe_stfs_p (tree ref) > +{ > + tree addr = get_base_address (ref); > + > + if (TREE_CODE (addr) != PARM_DECL > + || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr))) > + || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD) > +return false; > + return true; > +} > + > /* x86-specific vector costs. */ > class ix86_vector_costs : public vector_costs > { > @@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, > vect_cost_for_stmt kind, > if (TREE_CODE (op) == SSA_NAME) > TREE_VISITED (op) = 0; > } > + > + /* Prevent vectorization for load from parm_decl at O2 to avoid STF issue. > + Performance may lose when there's no STF issue(1 vector_load vs n > + scalar_load + CTOR). > + TODO: both extra cost(2000) and ix86_load_maybe_stfs_p need to be fine > + tuned. */ > + if ((kind == vector_load || kind == unaligned_load) > + && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP This check doesn't make much sense, I'd rather remove it. > + && stmt_info > + && stmt_info->slp_type == pure_slp > + && stmt_info->stmt > + && gimple_assign_load_p (stmt_info->stmt) > + && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt))) I'd pass down STMT_VINFO_DATA_REF instead and have ix86_load_maybe_stfs_p and use tree addr = DR_BASE_ADDRESS (dr); if (TREE_CODE (addr) != ADDR_EXPR) return false; addr = get_base_address (TREE_OPERAND (addr, 0)); ... since that gets you a more reliable way to look at the actual object referenced. > +{ > + stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign); > + stmt_cost += 2000; > +} > + Maybe handle this like the Bonell case and thus put it after the stmt_cost == -1 handling, just bumping the cost (also noting the actual number is arbitrary). It would be nice to have a better estimate on the penalty than "2000", maybe formulate it in terms of the target costs simple-sse op at least. That said, it might be interesting to micro-benchmark v2df __attribute__((noipa)) foo (struct X* x, struct X* y) { double temx0, temx1, temy0, temy1; temx0 = x->x[0]; temx0 += temx0; ... temx1 = x->x[1]; temx1 += temx1; ... return (v2df) {temx1, temx0 } + (v2df) { temy1, temy0 }; } (without -ffast-math) to see how many vector adds we'd need to compensate the STLF penalty (just to have an idea whether the magic number is closer to 200, 2000 or 2). Maybe also put that respective kernel into the i386 testsuite with a specific -mtune (and make the thing a target tunable?). >if (stmt_cost == -1) >
[PATCH] PR tree-optimization/98335: Improvements to DSE's compute_trims.
This patch is the main middle-end piece of a fix for PR tree-opt/98335, which is a code-quality regression affecting mainline. The issue occurs in DSE's (dead store elimination's) compute_trims function that determines where a store to memory can be trimmed. In the testcase given in the PR, this function notices that the first byte of a DImode store is dead, and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store at (unaligned) offset one. Most architectures can store a power-of-two bytes (up to a maximum) in single instruction, so writing 7 bytes requires more instructions than writing 8 bytes. This patch follows Jakub Jelinek's suggestion in comment 5, that compute_trims needs improved heuristics. In this patch, decision of whether and how to align trim_head is based on the number of bytes being written, the alignment of the start of the object and where within the object the first byte is written. The first tests check whether we're already writing to the start of the object, and that we're writing three or more bytes. If we're only writing one or two bytes, there's no benefit from providing additional alignment. Then we determine the alignment of the object, which is either 1, 2, 4, 8 or 16 byte aligned (capping at 16 guarantees that we never write more than 7 bytes beyond the minimum required). If the buffer is only 1 or 2 byte aligned there's no benefit from additional alignment. For the remaining cases, alignment of trim_head is based upon where within each aligned block (word) the first byte is written. For example, storing the last byte (or last half-word) of a word can be performed with a single insn. On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from: movl$0, -24(%rsp) movabsq $72057594037927935, %rdx movl$0, -21(%rsp) andq-24(%rsp), %rdx movq%rdx, %rax salq$8, %rax movbc(%rip), %al ret to xorl%eax, %eax movbc(%rip), %al ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. I've also added new testcases for the original motivating PR tree-optimization/86010, to ensure that those remain optimized (in future). Ok for mainline? 2022-03-07 Roger Sayle gcc/ChangeLog PR tree-optimization/98335 * tree-ssa-dse.cc (compute_trims): Improve logic deciding whether to align trim_head, writing more bytes by using fewer instructions. gcc/testsuite/ChangeLog PR tree-optimization/98335 * g++.dg/pr98335.C: New test case. * gcc.dg/pr86010.c: New test case. * gcc.dg/pr86010-2.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc index 2b22a61..080e406 100644 --- a/gcc/tree-ssa-dse.cc +++ b/gcc/tree-ssa-dse.cc @@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail, int first_live = bitmap_first_set_bit (live); *trim_head = first_live - first_orig; - /* If more than a word remains, then make sure to keep the - starting point at least word aligned. */ - if (last_live - first_live > UNITS_PER_WORD) -*trim_head &= ~(UNITS_PER_WORD - 1); + /* If REF is aligned, try to maintain this alignment if it reduces + the number of (power-of-two sized aligned) writes to memory. + First check that we're writing >= 3 bytes at a non-zero offset. */ + if (first_live + && last_live - first_live >= 2) +{ + unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base)); + if (DECL_P (ref->base) && DECL_ALIGN_UNIT (ref->base) > align) + align = DECL_ALIGN_UNIT (ref->base); + if (align > UNITS_PER_WORD) + align = UNITS_PER_WORD; + if (align > 16) + align = 16; + if (align > 2) + { + /* ALIGN is 4, 8 or 16. */ + unsigned int low = first_live & (align - 1); + if (low * 2 < align) + { + if (align == 16 && low >= 4 && last_live < 15) + *trim_head &= ~3; + else + *trim_head &= ~(align - 1); + } + else if (low + 3 == align) + *trim_head &= ~1; + else if (low > 8 && low < 12) + *trim_head &= ~3; + } +} if ((*trim_head || *trim_tail) && dump_file && (dump_flags & TDF_DETAILS)) diff --git a/gcc/testsuite/g++.dg/pr98335.C b/gcc/testsuite/g++.dg/pr98335.C new file mode 100644 index 000..c54f4d9 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr98335.C @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +struct Data { + char a; + int b; +}; + +char c; + +Data val(int idx) { + return { c }; // { dg-warning "extended initializer" "c++ 98" { target { c++98_only } } } +} + +/* { dg-final { scan-tree-dump-not " + 1B] = {}" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr86010-2.c b/gcc/te
[PATCH][ARM] translation: reuse string and use switch for codes
Hi. The patch simplifies translation strings. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error message. --- gcc/config/arm/arm-builtins.cc | 127 - 1 file changed, 79 insertions(+), 48 deletions(-) diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index a7acc1d71e7..bdccba2bc88 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp, || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW) && (imm < 0 || imm > 32)) { - if (fcode == ARM_BUILTIN_WRORHI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WRORWI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WRORH) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code"); - else - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code"); + const char *builtin; + switch (fcode) + { + case ARM_BUILTIN_WRORHI: + builtin = "_mm_rori_pi16"; + break; + case ARM_BUILTIN_WRORWI: + builtin = "_mm_rori_pi32"; + break; + case ARM_BUILTIN_WRORH: + builtin = "_mm_ror_pi16"; + break; + default: + builtin = "_mm_ror_pi32"; + break; + } + error ("the range of count should be in 0 to 32; please check the intrinsic %qs in code", builtin); } else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD) && (imm < 0 || imm > 64)) { - if (fcode == ARM_BUILTIN_WRORDI) - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code"); - else - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code"); + const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : "_mm_ror_si64"; + error ("the range of count should be in 0 to 64; please check the intrinsic %qs in code", builtin); } else if (imm < 0) { - if (fcode == ARM_BUILTIN_WSRLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSLLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRAHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRAWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRADI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLW) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLD) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi16%> in code"); - else if
Re: [PATCH] s390: Fix up *cmp_and_trap_unsigned_int constraints [PR104775]
On 3/5/22 09:33, Jakub Jelinek wrote: > Hi! > > The following testcase fails to assemble due to clgte %r6,0(%r1,%r10) > insn not being accepted by assembler. > My rough understanding is that in the RSY-b insn format the spot > in other formats used for index registers is used instead for M3 what > kind of comparison it is, so this patch follows what other similar > instructions use for constraint (i.e. one without index register). > > Bootstrapped on s390x-linux, regtest there still pending, ok for > trunk if it passes it? > > 2022-03-05 Jakub Jelinek > > PR target/104775 > * config/s390/s390.md (*cmp_and_trap_unsigned_int): Use > S constraint instead of T in the last alternative. > > * gcc.target/s390/pr104775.c: New test. Ok. Thanks for the fix! Bye, Andreas
Re: [PATCH][ARM] translation: reuse string and use switch for codes
On Mon, Mar 07, 2022 at 11:09:26AM +0100, Martin Liška wrote: > gcc/ChangeLog: > > * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error > message. Will defer review to ARM maintainers, just wanted to mention some of the lines are way too long (before or after). > --- a/gcc/config/arm/arm-builtins.cc > +++ b/gcc/config/arm/arm-builtins.cc > @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp, > || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW) > && (imm < 0 || imm > 32)) > { > - if (fcode == ARM_BUILTIN_WRORHI) > - error ("the range of count should be in 0 to 32; please check > the intrinsic %<_mm_rori_pi16%> in code"); > - else if (fcode == ARM_BUILTIN_WRORWI) > - error ("the range of count should be in 0 to 32; please check > the intrinsic %<_mm_rori_pi32%> in code"); > - else if (fcode == ARM_BUILTIN_WRORH) > - error ("the range of count should be in 0 to 32; please check > the intrinsic %<_mm_ror_pi16%> in code"); > - else > - error ("the range of count should be in 0 to 32; please check > the intrinsic %<_mm_ror_pi32%> in code"); > + const char *builtin; > + switch (fcode) > + { > + case ARM_BUILTIN_WRORHI: > + builtin = "_mm_rori_pi16"; > + break; > + case ARM_BUILTIN_WRORWI: > + builtin = "_mm_rori_pi32"; > + break; > + case ARM_BUILTIN_WRORH: > + builtin = "_mm_ror_pi16"; > + break; > + default: > + builtin = "_mm_ror_pi32"; > + break; > + } > + error ("the range of count should be in 0 to 32; please check the > intrinsic %qs in code", builtin); > } > else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD) > && (imm < 0 || imm > 64)) > { > - if (fcode == ARM_BUILTIN_WRORDI) > - error ("the range of count should be in 0 to 64; please check > the intrinsic %<_mm_rori_si64%> in code"); > - else > - error ("the range of count should be in 0 to 64; please check > the intrinsic %<_mm_ror_si64%> in code"); > + const char *builtin = fcode == ARM_BUILTIN_WRORDI ? > "_mm_rori_si64" : "_mm_ror_si64"; > + error ("the range of count should be in 0 to 64; please check the > intrinsic %qs in code", builtin); > } > else if (imm < 0) > { > - if (fcode == ARM_BUILTIN_WSRLHI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srli_pi16%> in code"); > - else if (fcode == ARM_BUILTIN_WSRLWI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srli_pi32%> in code"); > - else if (fcode == ARM_BUILTIN_WSRLDI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srli_si64%> in code"); > - else if (fcode == ARM_BUILTIN_WSLLHI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_slli_pi16%> in code"); > - else if (fcode == ARM_BUILTIN_WSLLWI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_slli_pi32%> in code"); > - else if (fcode == ARM_BUILTIN_WSLLDI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_slli_si64%> in code"); > - else if (fcode == ARM_BUILTIN_WSRAHI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srai_pi16%> in code"); > - else if (fcode == ARM_BUILTIN_WSRAWI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srai_pi32%> in code"); > - else if (fcode == ARM_BUILTIN_WSRADI) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srai_si64%> in code"); > - else if (fcode == ARM_BUILTIN_WSRLH) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srl_pi16%> in code"); > - else if (fcode == ARM_BUILTIN_WSRLW) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srl_pi32%> in code"); > - else if (fcode == ARM_BUILTIN_WSRLD) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_srl_si64%> in code"); > - else if (fcode == ARM_BUILTIN_WSLLH) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_sll_pi16%> in code"); > - else if (fcode == ARM_BUILTIN_WSLLW) > - error ("the count should be no less than 0; please check the > intrinsic %<_mm_sll_pi32%> in code"); > - else if (
Re: [PATCH][ARM] translation: reuse string and use switch for codes
Hello. There's V2 where I fixed fact that: ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORH can have valid range on [1, 16]. Ready to be installed? Thanks, MartinFrom 879262c3a7aefd3ab9552cec881248b240f53818 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Mon, 7 Mar 2022 10:56:43 +0100 Subject: [PATCH] translation: reuse string and use switch for codes gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error message. --- gcc/config/arm/arm-builtins.cc | 125 - 1 file changed, 75 insertions(+), 50 deletions(-) diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index a7acc1d71e7..77d47d1d059 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -3922,65 +3922,90 @@ arm_expand_builtin (tree exp, if (GET_MODE (op1) == VOIDmode) { imm = INTVAL (op1); - if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI - || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW) + if ((fcode == ARM_BUILTIN_WRORWI || fcode == ARM_BUILTIN_WRORW) && (imm < 0 || imm > 32)) { - if (fcode == ARM_BUILTIN_WRORHI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WRORWI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WRORH) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code"); - else - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code"); + const char *builtin = (fcode == ARM_BUILTIN_WRORWI + ? "_mm_rori_pi32" : "_mm_ror_pi32"); + error ("the range of count should be in 0 to 32; please check the intrinsic %qs in code", + builtin); + } + else if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORH) + && (imm < 0 || imm > 16)) + { + const char *builtin = (fcode == ARM_BUILTIN_WRORHI + ? "_mm_rori_pi16" : "_mm_ror_pi16"); + error ("the range of count should be in 0 to 16; please check the intrinsic %qs in code", + builtin); } else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD) && (imm < 0 || imm > 64)) { - if (fcode == ARM_BUILTIN_WRORDI) - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code"); - else - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code"); + const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : "_mm_ror_si64"; + error ("the range of count should be in 0 to 64; please check the intrinsic %qs in code", builtin); } else if (imm < 0) { - if (fcode == ARM_BUILTIN_WSRLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSLLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRAHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRAWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRADI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLW) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLD) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLW) - error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi32%> in code"); - else if (f
Re: [PATCH][ARM] translation: reuse string and use switch for codes
On 3/7/22 11:12, Jakub Jelinek wrote: Will defer review to ARM maintainers, just wanted to mention some of the lines are way too long (before or after). Sure, fixed in V3. MartinFrom 34a195d905bb1719e13c2021ee475a0581b62a5b Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Mon, 7 Mar 2022 10:56:43 +0100 Subject: [PATCH] translation: reuse string and use switch for codes gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error message. --- gcc/config/arm/arm-builtins.cc | 127 - 1 file changed, 77 insertions(+), 50 deletions(-) diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index a7acc1d71e7..6c0b1bda66a 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -3922,65 +3922,92 @@ arm_expand_builtin (tree exp, if (GET_MODE (op1) == VOIDmode) { imm = INTVAL (op1); - if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI - || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW) + if ((fcode == ARM_BUILTIN_WRORWI || fcode == ARM_BUILTIN_WRORW) && (imm < 0 || imm > 32)) { - if (fcode == ARM_BUILTIN_WRORHI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WRORWI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WRORH) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code"); - else - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code"); + const char *builtin = (fcode == ARM_BUILTIN_WRORWI + ? "_mm_rori_pi32" : "_mm_ror_pi32"); + error ("the range of count should be in 0 to 32; " + "please check the intrinsic %qs in code", builtin); + } + else if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORH) + && (imm < 0 || imm > 16)) + { + const char *builtin = (fcode == ARM_BUILTIN_WRORHI + ? "_mm_rori_pi16" : "_mm_ror_pi16"); + error ("the range of count should be in 0 to 16; " + "please check the intrinsic %qs in code", builtin); } else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD) && (imm < 0 || imm > 64)) { - if (fcode == ARM_BUILTIN_WRORDI) - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code"); - else - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code"); + const char *builtin = (fcode == ARM_BUILTIN_WRORDI + ? "_mm_rori_si64" : "_mm_ror_si64"); + error ("the range of count should be in 0 to 64; " + "please check the intrinsic %qs in code", builtin); } else if (imm < 0) { - if (fcode == ARM_BUILTIN_WSRLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSLLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRAHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRAWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRADI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLW) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLD) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLW) - error ("the count should be no less than 0; please check the intrinsic %
Re: [PATCH][ARM] translation: reuse string and use switch for codes
Hi! On 3/7/22 11:09, Martin Liška wrote: Hi. The patch simplifies translation strings. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. This is an arm-only patch, it's not clear if you built a cross-compiler for arm, and that these regression tests passed? Ready to be installed? Thanks, Martin gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error message. I think this is about PR target/104794, so this should be mentioned? Thanks, Christophe --- gcc/config/arm/arm-builtins.cc | 127 - 1 file changed, 79 insertions(+), 48 deletions(-) diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index a7acc1d71e7..bdccba2bc88 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp, || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW) && (imm < 0 || imm > 32)) { - if (fcode == ARM_BUILTIN_WRORHI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WRORWI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WRORH) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code"); - else - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code"); + const char *builtin; + switch (fcode) + { + case ARM_BUILTIN_WRORHI: + builtin = "_mm_rori_pi16"; + break; + case ARM_BUILTIN_WRORWI: + builtin = "_mm_rori_pi32"; + break; + case ARM_BUILTIN_WRORH: + builtin = "_mm_ror_pi16"; + break; + default: + builtin = "_mm_ror_pi32"; + break; + } + error ("the range of count should be in 0 to 32; please check the intrinsic %qs in code", builtin); } else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD) && (imm < 0 || imm > 64)) { - if (fcode == ARM_BUILTIN_WRORDI) - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code"); - else - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code"); + const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : "_mm_ror_si64"; + error ("the range of count should be in 0 to 64; please check the intrinsic %qs in code", builtin); } else if (imm < 0) { - if (fcode == ARM_BUILTIN_WSRLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSLLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRAHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRAWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRADI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLW) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLD) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_sll_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLW) - error ("t
Re: [PATCH][ARM] translation: reuse string and use switch for codes
On 3/7/22 11:18, Christophe Lyon via Gcc-patches wrote: Hi! On 3/7/22 11:09, Martin Liška wrote: Hi. The patch simplifies translation strings. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. This is an arm-only patch, it's not clear if you built a cross-compiler for arm, and that these regression tests passed? Sorry, I meant I can built the cross compiler. If I see correctly there are no tests that would check the error messages. Ready to be installed? Thanks, Martin gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error message. I think this is about PR target/104794, so this should be mentioned? Yep. There's V4. Is it ready for master? Martin Thanks, Christophe --- gcc/config/arm/arm-builtins.cc | 127 - 1 file changed, 79 insertions(+), 48 deletions(-) diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index a7acc1d71e7..bdccba2bc88 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp, || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW) && (imm < 0 || imm > 32)) { - if (fcode == ARM_BUILTIN_WRORHI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WRORWI) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_rori_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WRORH) - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi16%> in code"); - else - error ("the range of count should be in 0 to 32; please check the intrinsic %<_mm_ror_pi32%> in code"); + const char *builtin; + switch (fcode) + { + case ARM_BUILTIN_WRORHI: + builtin = "_mm_rori_pi16"; + break; + case ARM_BUILTIN_WRORWI: + builtin = "_mm_rori_pi32"; + break; + case ARM_BUILTIN_WRORH: + builtin = "_mm_ror_pi16"; + break; + default: + builtin = "_mm_ror_pi32"; + break; + } + error ("the range of count should be in 0 to 32; please check the intrinsic %qs in code", builtin); } else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD) && (imm < 0 || imm > 64)) { - if (fcode == ARM_BUILTIN_WRORDI) - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_rori_si64%> in code"); - else - error ("the range of count should be in 0 to 64; please check the intrinsic %<_mm_ror_si64%> in code"); + const char *builtin = fcode == ARM_BUILTIN_WRORDI ? "_mm_rori_si64" : "_mm_ror_si64"; + error ("the range of count should be in 0 to 64; please check the intrinsic %qs in code", builtin); } else if (imm < 0) { - if (fcode == ARM_BUILTIN_WSRLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSLLHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSLLWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSLLDI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_slli_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRAHI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRAWI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRADI) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srai_si64%> in code"); - else if (fcode == ARM_BUILTIN_WSRLH) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi16%> in code"); - else if (fcode == ARM_BUILTIN_WSRLW) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_pi32%> in code"); - else if (fcode == ARM_BUILTIN_WSRLD) - error ("the count should be no less than 0; please check the intrinsic %<_mm_srl_si64%> in code"); -
[PATCH] arm: fix option quoting in error messages.
This fixes option quoting in error messages. Ready to be installed? Thanks, Martin PR target/104794 gcc/ChangeLog: * config/arm/arm.cc (arm_option_override_internal): Fix quoting of options in error messages. (arm_option_reconfigure_globals): Likewise. --- gcc/config/arm/arm.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index c1103d9fad6..4bf1ef0517d 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -3185,8 +3185,8 @@ arm_option_override_internal (struct gcc_options *opts, if (arm_stack_protector_guard == SSP_GLOBAL && opts->x_arm_stack_protector_guard_offset_str) { - error ("incompatible options %'-mstack-protector-guard=global%' and" -"%'-mstack-protector-guard-offset=%qs%'", + error ("incompatible options %<-mstack-protector-guard=global%> and" +"%<-mstack-protector-guard-offset=%s%>", arm_stack_protector_guard_offset_str); } @@ -3880,7 +3880,7 @@ arm_option_reconfigure_globals (void) } if (!TARGET_HARD_TP && arm_stack_protector_guard == SSP_TLSREG) -error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); +error("%<-mstack-protector-guard=tls%> needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the -- 2.35.1
RE: [PATCH][ARM] translation: reuse string and use switch for codes
> -Original Message- > From: Gcc-patches bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Martin Liška > Sent: Monday, March 7, 2022 10:22 AM > To: Christophe Lyon ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH][ARM] translation: reuse string and use switch for codes > > On 3/7/22 11:18, Christophe Lyon via Gcc-patches wrote: > > Hi! > > > > On 3/7/22 11:09, Martin Liška wrote: > >> Hi. > >> > >> The patch simplifies translation strings. > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > This is an arm-only patch, it's not clear if you built a cross-compiler for > > arm, > and that these regression tests passed? > > Sorry, I meant I can built the cross compiler. If I see correctly there are no > tests that > would check the error messages. > > > > >> > >> Ready to be installed? > >> Thanks, > >> Martin > >> > >> gcc/ChangeLog: > >> > >> * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error > >> message. > > > > I think this is about PR target/104794, so this should be mentioned? > > Yep. > > There's V4. > > Is it ready for master? Ok. Thanks, Kyrill > > Martin > > > > > Thanks, > > > > Christophe > > > > > >> --- > >> gcc/config/arm/arm-builtins.cc | 127 - > >> 1 file changed, 79 insertions(+), 48 deletions(-) > >> > >> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm- > builtins.cc > >> index a7acc1d71e7..bdccba2bc88 100644 > >> --- a/gcc/config/arm/arm-builtins.cc > >> +++ b/gcc/config/arm/arm-builtins.cc > >> @@ -3926,61 +3926,92 @@ arm_expand_builtin (tree exp, > >> || fcode == ARM_BUILTIN_WRORH || fcode == > ARM_BUILTIN_WRORW) > >> && (imm < 0 || imm > 32)) > >> { > >> - if (fcode == ARM_BUILTIN_WRORHI) > >> - error ("the range of count should be in 0 to 32; please check the > intrinsic %<_mm_rori_pi16%> in code"); > >> - else if (fcode == ARM_BUILTIN_WRORWI) > >> - error ("the range of count should be in 0 to 32; please check the > intrinsic %<_mm_rori_pi32%> in code"); > >> - else if (fcode == ARM_BUILTIN_WRORH) > >> - error ("the range of count should be in 0 to 32; please check the > intrinsic %<_mm_ror_pi16%> in code"); > >> - else > >> - error ("the range of count should be in 0 to 32; please check the > intrinsic %<_mm_ror_pi32%> in code"); > >> + const char *builtin; > >> + switch (fcode) > >> + { > >> + case ARM_BUILTIN_WRORHI: > >> + builtin = "_mm_rori_pi16"; > >> + break; > >> + case ARM_BUILTIN_WRORWI: > >> + builtin = "_mm_rori_pi32"; > >> + break; > >> + case ARM_BUILTIN_WRORH: > >> + builtin = "_mm_ror_pi16"; > >> + break; > >> + default: > >> + builtin = "_mm_ror_pi32"; > >> + break; > >> + } > >> + error ("the range of count should be in 0 to 32; please check > >> the > intrinsic %qs in code", builtin); > >> } > >> else if ((fcode == ARM_BUILTIN_WRORDI || fcode == > ARM_BUILTIN_WRORD) > >> && (imm < 0 || imm > 64)) > >> { > >> - if (fcode == ARM_BUILTIN_WRORDI) > >> - error ("the range of count should be in 0 to 64; please check the > intrinsic %<_mm_rori_si64%> in code"); > >> - else > >> - error ("the range of count should be in 0 to 64; please check the > intrinsic %<_mm_ror_si64%> in code"); > >> + const char *builtin = fcode == ARM_BUILTIN_WRORDI ? > "_mm_rori_si64" : "_mm_ror_si64"; > >> + error ("the range of count should be in 0 to 64; please check > >> the > intrinsic %qs in code", builtin); > >> } > >> else if (imm < 0) > >> { > >> - if (fcode == ARM_BUILTIN_WSRLHI) > >> - error ("the count should be no less than 0; please check the > >> intrinsic > %<_mm_srli_pi16%> in code"); > >> - else if (fcode == ARM_BUILTIN_WSRLWI) > >> - error ("the count should be no less than 0; please check the > >> intrinsic > %<_mm_srli_pi32%> in code"); > >> - else if (fcode == ARM_BUILTIN_WSRLDI) > >> - error ("the count should be no less than 0; please check the > >> intrinsic > %<_mm_srli_si64%> in code"); > >> - else if (fcode == ARM_BUILTIN_WSLLHI) > >> - error ("the count should be no less than 0; please check the > >> intrinsic > %<_mm_slli_pi16%> in code"); > >> - else if (fcode == ARM_BUILTIN_WSLLWI) > >> - error ("the count should be no less than 0; please check the > >> intrinsic > %<_mm_slli_pi32%> in code"); > >> - else if (fcode == ARM_BUILTIN_WSLLDI) > >> - error ("the count should be no less than 0; please check the > >> intrinsic > %<_mm_slli_si64%> in code"); > >> - else if (fcode == ARM_BUILTIN_WSRAHI) > >> - error ("the count should be no less than 0; please check the >
RE: [PATCH] arm: fix option quoting in error messages.
> -Original Message- > From: Gcc-patches bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Martin Liška > Sent: Monday, March 7, 2022 10:28 AM > To: gcc-patches@gcc.gnu.org > Cc: Christophe Lyon > Subject: [PATCH] arm: fix option quoting in error messages. > > This fixes option quoting in error messages. > > Ready to be installed? Ok. Thanks, Kyrill > Thanks, > Martin > > PR target/104794 > > gcc/ChangeLog: > > * config/arm/arm.cc (arm_option_override_internal): Fix quoting > of options in error messages. > (arm_option_reconfigure_globals): Likewise. > --- > gcc/config/arm/arm.cc | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc > index c1103d9fad6..4bf1ef0517d 100644 > --- a/gcc/config/arm/arm.cc > +++ b/gcc/config/arm/arm.cc > @@ -3185,8 +3185,8 @@ arm_option_override_internal (struct gcc_options > *opts, > if (arm_stack_protector_guard == SSP_GLOBAL > && opts->x_arm_stack_protector_guard_offset_str) > { > - error ("incompatible options %'-mstack-protector-guard=global%' and" > - "%'-mstack-protector-guard-offset=%qs%'", > + error ("incompatible options %<-mstack-protector-guard=global%> and" > + "%<-mstack-protector-guard-offset=%s%>", >arm_stack_protector_guard_offset_str); > } > > @@ -3880,7 +3880,7 @@ arm_option_reconfigure_globals (void) > } > > if (!TARGET_HARD_TP && arm_stack_protector_guard == SSP_TLSREG) > -error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); > +error("%<-mstack-protector-guard=tls%> needs a hardware TLS register"); > } > > /* Perform some validation between the desired architecture and the rest > of the > -- > 2.35.1
[PATCH][pushed] MSP430: fix error message.
PR target/104797 gcc/ChangeLog: * config/msp430/msp430.cc (msp430_expand_delay_cycles): Remove parenthesis from built-in name. --- gcc/config/msp430/msp430.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/msp430/msp430.cc b/gcc/config/msp430/msp430.cc index eb219fd5e71..7a378ceac56 100644 --- a/gcc/config/msp430/msp430.cc +++ b/gcc/config/msp430/msp430.cc @@ -2744,7 +2744,7 @@ msp430_expand_delay_cycles (rtx arg) if (GET_CODE (arg) != CONST_INT) { - error ("%<__delay_cycles()%> only takes constant arguments"); + error ("%<__delay_cycles%> only takes constant arguments"); return NULL_RTX; } -- 2.35.1
[x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl
This patch is the backend piece of my proposed fix to PR tree-opt/98335, to allow C++ partial struct initialization to be as efficient/optimized as full struct initialization. With the middle-end patch just posted to gcc-patches, the test case in the PR compiles on x86_64-pc-linux-gnu with -O2 to: xorl%eax, %eax movbc(%rip), %al ret with this additional peephole2 (actually pair of peephole2s): movzbl c(%rip), %eax ret This patch has been tested on x86_64-pc-linux-gnu, on top of the middle-end piece, with make bootstrap and make -k check with no new failures. Posted in pieces to simplify review. Ok for mainline? 2022-03-07 Roger Sayle gcc/ChangeLog PR tree-optimization/98335 * config/i386/i386.md (peephole2): Transform xorl followed by a suitable movb or movw into the equivalent movz[bw]l. gcc/testsuite/ChangeLog PR tree-optimization/98335 * g++.target/i386/pr98335.C: New test case. Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index d15170e..f1eb62a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4276,6 +4276,26 @@ [(set_attr "isa" "*,avx512dq,avx512dq") (set_attr "type" "imovx,mskmov,mskmov") (set_attr "mode" "SI,QI,QI")]) + +;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l. +(define_peephole2 + [(parallel [(set (match_operand:DI 0 "general_reg_operand") + (const_int 0)) + (clobber (reg:CC FLAGS_REG))]) + (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand")) + (match_operand:SWI12 2 "nonimmediate_operand"))] + "TARGET_64BIT + && REGNO (operands[0]) == REGNO (operands[1])" + [(set (match_dup 0) (zero_extend:DI (match_dup 2)))]) + +;; Likewise, but preserving FLAGS_REG. +(define_peephole2 + [(set (match_operand:DI 0 "general_reg_operand") (const_int 0)) + (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand")) + (match_operand:SWI12 2 "nonimmediate_operand"))] + "TARGET_64BIT + && REGNO (operands[0]) == REGNO (operands[1])" + [(set (match_dup 0) (zero_extend:DI (match_dup 2)))]) ;; Sign extension instructions diff --git a/gcc/testsuite/g++.target/i386/pr98335.C b/gcc/testsuite/g++.target/i386/pr98335.C new file mode 100644 index 000..2581b83 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr98335.C @@ -0,0 +1,18 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2" } */ + +struct Data { + char a; + int b; +}; + +char c; + +Data val(int idx) { + return { c }; // { dg-warning "extended initializer" "c++ 98" { target { c++98_only } } } +} + +/* { dg-final { scan-assembler "movzbl" } } */ +/* { dg-final { scan-assembler-not "xorl" } } */ +/* { dg-final { scan-assembler-not "movb" } } */ +
Re: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl;movb -> movzbl
On Mon, Mar 7, 2022 at 11:55 AM Roger Sayle wrote: > > > This patch is the backend piece of my proposed fix to PR tree-opt/98335, > to allow C++ partial struct initialization to be as efficient/optimized > as full struct initialization. > > With the middle-end patch just posted to gcc-patches, the test case > in the PR compiles on x86_64-pc-linux-gnu with -O2 to: > > xorl%eax, %eax > movbc(%rip), %al > ret > > with this additional peephole2 (actually pair of peephole2s): > > movzbl c(%rip), %eax > ret > > > This patch has been tested on x86_64-pc-linux-gnu, on top of the > middle-end piece, with make bootstrap and make -k check with no > new failures. Posted in pieces to simplify review. Ok for mainline? Is there a reason that only inserts to DImode registers are implemented? IMO, these peepholes should also handle inserts to SImode. Uros. > > 2022-03-07 Roger Sayle > > gcc/ChangeLog > PR tree-optimization/98335 > * config/i386/i386.md (peephole2): Transform xorl followed by > a suitable movb or movw into the equivalent movz[bw]l. > > gcc/testsuite/ChangeLog > PR tree-optimization/98335 > * g++.target/i386/pr98335.C: New test case. > > > Thanks in advance, > Roger > -- >
Re: [PATCH V2] [i386] Optimize v4si broadcast for noavx512vl.
On Mon, Mar 7, 2022 at 6:11 AM liuhongt wrote: > > >What happens if you set preferred_for_speed to false for alternative 1? > It works, and I've removed the newly added splitter in this patch. > Also i tried to do similar things to *vec_dup with mode iterator > AVX2_VEC_DUP_MODE, but it hit ICE during reload since x86 don't have direct > move for QImode from gpr to sse register. so in this patch i only handle > *vec_dupv4si. > > >> +(define_split > >> + [(set (match_operand:V4SI 0 "sse_reg_operand") > >> + (vec_duplicate:V4SI > >> + (match_operand:SI 1 "general_reg_operand")))] > >> + "TARGET_SSE && reload_completed > >> + /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is > >> + available, because then we can broadcast from GPRs directly. */ > > >I think avx512vl_vec_dup_gprv4si should be merged with the above > >pattern instead. > Remove this splitter. > > This will enable below > > - vbroadcastss.LC1(%rip), %xmm0 > + movl$-45, %edx > + vmovd %edx, %xmm0 > + vpshufd $0, %xmm0, %xmm0 > > According to microbenchmark, it's faster than broadcast from memory > for TARGET_INTER_UNIT_MOVES_TO_VEC. > > gcc/ChangeLog: > > * config/i386/sse.md (*vec_dupv4si): Disable memory operand > for !TARGET_INTER_UNIT_MOVES_TO_VEC when prefer_for_speed. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr100865-8a.c: Adjust testcase. > * gcc.target/i386/pr100865-8c.c: Ditto. > * gcc.target/i386/pr100865-9c.c: Ditto. LGTM. Thanks, Uros. > --- > gcc/config/i386/sse.md | 7 ++- > gcc/testsuite/gcc.target/i386/pr100865-8a.c | 2 +- > gcc/testsuite/gcc.target/i386/pr100865-8c.c | 2 +- > gcc/testsuite/gcc.target/i386/pr100865-9c.c | 2 +- > 4 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 3066ea3734a..a091853065e 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -25134,7 +25134,12 @@ (define_insn "*vec_dupv4si" > (set_attr "length_immediate" "1,0,1") > (set_attr "prefix_extra" "0,1,*") > (set_attr "prefix" "maybe_vex,maybe_evex,orig") > - (set_attr "mode" "TI,V4SF,V4SF")]) > + (set_attr "mode" "TI,V4SF,V4SF") > + (set (attr "preferred_for_speed") > + (cond [(eq_attr "alternative" "1") > + (symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC") > + ] > + (symbol_ref "true")))]) > > (define_insn "*vec_dupv2di" >[(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x") > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c > b/gcc/testsuite/gcc.target/i386/pr100865-8a.c > index 911b14d4a25..544a14db6f7 100644 > --- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c > +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c > @@ -20,5 +20,5 @@ foo (void) > array[i] = MK_CONST128_BROADCAST_SIGNED (-45); > } > > -/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t > \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t > \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ > /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } > */ > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c > b/gcc/testsuite/gcc.target/i386/pr100865-8c.c > index 00682edb8c9..efee0488614 100644 > --- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c > +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c > @@ -3,5 +3,5 @@ > > #include "pr100865-8a.c" > > -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, > %xmm\[0-9\]+" 1 { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, > %xmm\[0-9\]+" 1 } } */ > /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } > */ > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9c.c > b/gcc/testsuite/gcc.target/i386/pr100865-9c.c > index 8ffcdc1629d..e6f25902c1d 100644 > --- a/gcc/testsuite/gcc.target/i386/pr100865-9c.c > +++ b/gcc/testsuite/gcc.target/i386/pr100865-9c.c > @@ -3,5 +3,5 @@ > > #include "pr100865-9a.c" > > -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, > %xmm\[0-9\]+" 1 { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, > %xmm\[0-9\]+" 1 } } */ > /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } > */ > -- > 2.18.1 >
[PATCH] tree-optimization/104782 - adjust PR101636 fix
This reverts the reversion of r10-5979 amending the CTOR case with a comment as to why the conversion is not necessary there. It also adds a testcase (but not for the CTOR case). Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2022-03-07 Richard Biener PR tree-optimization/104782 * tree-vect-slp.cc (vectorize_slp_instance_root_stmt): Re-instantiate r10-5979 fix, add comment. * gcc.dg/vect/pr104782.c: New testcase. --- gcc/testsuite/gcc.dg/vect/pr104782.c | 18 ++ gcc/tree-vect-slp.cc | 7 +++ 2 files changed, 25 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/pr104782.c diff --git a/gcc/testsuite/gcc.dg/vect/pr104782.c b/gcc/testsuite/gcc.dg/vect/pr104782.c new file mode 100644 index 000..7b8ca6ca25b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr104782.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3" } */ +/* { dg-additional-options "-march=armv8.2-a+sve -msve-vector-bits=128" { target aarch64-*-* } } */ + +int a, b, c; +static int d; +short *q; +void f() { + int *p = &d; + b = 9; + for (b = 9; b; b--) { +a = 2; +for (c = 2; c <= 9; c++) { + for (int i = 0; i < 3; i++) +*p |= (*q)++; +} + } +} diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index f9bbc872a99..4ac2b70303c 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -7382,6 +7382,10 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) gimple *child_stmt = SLP_TREE_VEC_STMTS (node)[0]; tree vect_lhs = gimple_get_lhs (child_stmt); tree root_lhs = gimple_get_lhs (instance->root_stmts[0]->stmt); + if (!useless_type_conversion_p (TREE_TYPE (root_lhs), + TREE_TYPE (vect_lhs))) + vect_lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (root_lhs), + vect_lhs); rstmt = gimple_build_assign (root_lhs, vect_lhs); } else if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) > 1) @@ -7392,6 +7396,9 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance) vec *v; vec_alloc (v, nelts); + /* A CTOR can handle V16HI composition from VNx8HI so we +do not need to convert vector elements if the types +do not match. */ FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt) CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, gimple_get_lhs (child_stmt)); -- 2.34.1
RE: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl
Hi Uros, > Is there a reason that only inserts to DImode registers are implemented? > IMO, these peepholes should also handle inserts to SImode. I wasn't able to construct a test case that produced a byte or word insert into an SImode register. The front-ends and middle-end end up producing different code sequences, and -m32 changes the ABI so that small structs get passed in memory rather than in registers. Here's the expanded testcase that I investigated: struct DataCL { char a; int b; }; struct DataWL { short a; int b; }; struct DataIL { int a; int b; }; struct DataCI { char a; short b; }; struct DataWI { short a; short b; }; char c; short w; int i; DataCL foo_cl(int idx) { return { c }; } DataCL bar_cl(int idx) { return { c, 0 }; } DataWL foo_wl(int idx) { return { w }; } DataWL bar_wl(int idx) { return { w, 0 }; } DataIL foo_il(int idx) { return { i }; } DataIL bar_il(int idx) { return { i, 0 }; } DataCI foo_ci(int idx) { return { c }; } DataCI bar_ci(int idx) { return { c, 0 }; } DataWI foo_wi(int idx) { return { w }; } DataWI bar_wi(int idx) { return { w, 0 }; } I agree that for completeness similar peepholes handling inserts into SImode would be a good thing, but these wouldn't be restricted by TARGET_64BIT and would therefore require additional -m32 testing. The DImode peepholes I can justify for stage4 as PR tree-opt/98335 is a regression, SImode peepholes would be more of a "leap of faith". If you’d be willing to accept a patch without a testcase, let me know. It's also a pity that subreg handling in combine doesn't allow merging these inserts into zero registers to be combined to zero_extends in a machine independent way. My recent patch for PR 95126 (awaiting review) should also allow front-ends and middle-end passes more flexibility in optimizing small struct constructors. Thanks (as always) for reviewing patches so quickly. Roger --
Re: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl
On Mon, Mar 07, 2022 at 11:51:50AM -, Roger Sayle wrote: > I agree that for completeness similar peepholes handling inserts into > SImode would be a good thing, but these wouldn't be restricted by > TARGET_64BIT and would therefore require additional -m32 testing. > The DImode peepholes I can justify for stage4 as PR tree-opt/98335 > is a regression, SImode peepholes would be more of a "leap of faith". > If you’d be willing to accept a patch without a testcase, let me know. For testcase, wouldn't something like: struct Data { char a; short b; }; char c; void val(void) { __asm__ __volatile__ ("" : : "r" ((struct Data) { c } )); } do it? Jakub
[C++ PATCH] PR c++/96329: ICE-on-invalid-code error recovery.
This patch fixes PR c++/96329 which is an ICE-on-invalid-code regression affecting mainline. This patch has been tested on x86_64-pc-linux-gnu, enabling languages c and c++, with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-07 Roger Sayle gcc/cp/ChangeLog PR c++/96329 * parser.cc (cp_parser_linkage_specification): Treat the case where linkage is error_mark_node as "invalid linkage-specification". gcc/testsuite/ChangeLog PR c++/96329 * g++.dg/pr96329.C: New test case. Thanks in advance, Roger -- diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 03d99ab..d2993c7 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -16148,8 +16148,9 @@ cp_parser_linkage_specification (cp_parser* parser, tree prefix_attr) /* Transform the literal into an identifier. If the literal is a wide-character string, or contains embedded NULs, then we can't handle it as the user wants. */ - if (strlen (TREE_STRING_POINTER (linkage)) - != (size_t) (TREE_STRING_LENGTH (linkage) - 1)) + if (linkage == error_mark_node + || strlen (TREE_STRING_POINTER (linkage)) +!= (size_t) (TREE_STRING_LENGTH (linkage) - 1)) { cp_parser_error (parser, "invalid linkage-specification"); /* Assume C++ linkage. */ diff --git a/gcc/testsuite/g++.dg/pr96329.C b/gcc/testsuite/g++.dg/pr96329.C new file mode 100644 index 000..c9f44b7 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr96329.C @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-excess-errors "" } */ + +extern "C" ""a
[C++ PATCH] PR c++/96437: ICE-on-invalid-code error recovery.
This patch fixes PR c++/96437 which is an ICE-on-invalid-code regression affecting mainline. This patch has been tested on x86_64-pc-linux-gnu, enabling languages c and c++, with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-07 Roger Sayle gcc/cp/ChangeLog PR c++/96437 * parser.cc (synthesize_implicit_template_parm): Check that TREE_VALUE (new_parm) isn't error_mark_node before setting its DECL_VIRTUAL_P. gcc/testsuite/ChangeLog PR c++/96437 * g++.dg/pr96437.C: New test case. Thanks in advance, Roger -- diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 03d99ab..992f839 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -48217,7 +48217,8 @@ synthesize_implicit_template_parm (cp_parser *parser, tree constr) function template is equivalent to an explicit template. Note that DECL_ARTIFICIAL is used elsewhere for template parameters. */ - DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true; + if (TREE_VALUE (new_parm) != error_mark_node) +DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true; // Chain the new parameter to the list of implicit parameters. if (parser->implicit_template_parms) diff --git a/gcc/testsuite/g++.dg/pr96437.C b/gcc/testsuite/g++.dg/pr96437.C new file mode 100644 index 000..00c4141 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr96437.C @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-excess-errors "" } */ + +template <()> void A(auto){}
[C++ PATCH] PR c++/96440: ICE-on-invalid-code error recovery.
This patch fixes PR c++/96440 which is an ICE-on-invalid-code regression affecting mainline. This patch has been tested on x86_64-pc-linux-gnu, enabling languages c and c++, with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-07 Roger Sayle gcc/cp/ChangeLog PR c++/96440 * decl.cc (start_decl): Defend against prefix_attributes being error_mark_node. gcc/testsuite/ChangeLog PR c++/96440 * g++.dg/pr96440.C: New test case. Roger -- diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 7f80f9d..de41b4d 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -5483,13 +5483,15 @@ start_decl (const cp_declarator *declarator, *pushed_scope_p = NULL_TREE; - attributes = chainon (attributes, prefix_attributes); + if (prefix_attributes != error_mark_node) +attributes = chainon (attributes, prefix_attributes); decl = grokdeclarator (declarator, declspecs, NORMAL, initialized, &attributes); if (decl == NULL_TREE || VOID_TYPE_P (decl) - || decl == error_mark_node) + || decl == error_mark_node + || prefix_attributes == error_mark_node) return error_mark_node; context = CP_DECL_CONTEXT (decl); diff --git a/gcc/testsuite/g++.dg/pr96440.C b/gcc/testsuite/g++.dg/pr96440.C new file mode 100644 index 000..212271f --- /dev/null +++ b/gcc/testsuite/g++.dg/pr96440.C @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-excess-errors "" } */ + +auto alignas a [[]] [[const]] ();
Re: [PATCH] arm: fix option quoting in error messages.
On 07/03/2022 10:27, Martin Liška wrote: This fixes option quoting in error messages. Ready to be installed? Thanks, Martin PR target/104794 gcc/ChangeLog: * config/arm/arm.cc (arm_option_override_internal): Fix quoting of options in error messages. (arm_option_reconfigure_globals): Likewise. --- gcc/config/arm/arm.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index c1103d9fad6..4bf1ef0517d 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -3185,8 +3185,8 @@ arm_option_override_internal (struct gcc_options *opts, if (arm_stack_protector_guard == SSP_GLOBAL && opts->x_arm_stack_protector_guard_offset_str) { - error ("incompatible options %'-mstack-protector-guard=global%' and" - "%'-mstack-protector-guard-offset=%qs%'", + error ("incompatible options %<-mstack-protector-guard=global%> and" + "%<-mstack-protector-guard-offset=%s%>", Wasn't the complaint also that there should be a space in the text between the 'and"' and the '"%<' on the subsequent line? R. arm_stack_protector_guard_offset_str); } @@ -3880,7 +3880,7 @@ arm_option_reconfigure_globals (void) } if (!TARGET_HARD_TP && arm_stack_protector_guard == SSP_TLSREG) - error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); + error("%<-mstack-protector-guard=tls%> needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the
Re: [PATCH] arm: fix option quoting in error messages.
On 3/7/22 14:14, Richard Earnshaw wrote: Wasn't the complaint also that there should be a space in the text between the 'and"' and the '"%<' on the subsequent line? Yeah, I overlooked that. I'm going to push the following patch. Cheers, MartinFrom b1d8198e7df616ea80cb648a2c831e2c21f4319f Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Mon, 7 Mar 2022 14:16:21 +0100 Subject: [PATCH] arm: add missing space to error. PR target/104794 gcc/ChangeLog: * config/arm/arm.cc (arm_option_override_internal): Add missing space. --- gcc/config/arm/arm.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 4bf1ef0517d..14b6c804455 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -3185,7 +3185,7 @@ arm_option_override_internal (struct gcc_options *opts, if (arm_stack_protector_guard == SSP_GLOBAL && opts->x_arm_stack_protector_guard_offset_str) { - error ("incompatible options %<-mstack-protector-guard=global%> and" + error ("incompatible options %<-mstack-protector-guard=global%> and " "%<-mstack-protector-guard-offset=%s%>", arm_stack_protector_guard_offset_str); } -- 2.35.1
Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P
On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu wrote: > > On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote: > > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches > > wrote: > > > > > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold memcpy. > > > The default is > > > > > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun)) > > > > > > For x86, it is MOVE_MAX to restore the old behavior before > > > > I know we've discussed this to death in the PR, I just want to repeat here > > that the GIMPLE folding expects to generate a single load and a single > > store (that is what it does on the GIMPLE level) which is why MOVE_MAX > > was chosen originally (it's documented to what a "single instruction" does). > > In practice MOVE_MAX does not seem to cover vector register sizes > > so Richard pulled MOVE_RATIO which is really intended to cover > > the case of using multiple instructions for moving memory (but then I > > don't remember whether for the ARM case the single load/store GIMPLE > > will be expanded to multiple load/store instructions). > > > > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution, > > being very specific for memcpy folding (we also fold memmove btw). > > > > There is also MOVE_MAX_PIECES which _might_ be more appropriate > > than MOVE_MAX here and still honor the idea of single instructions. > > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX, > > not MOVE_MAX * MOVE_RATIO. > > > > So if we need a new hook then that hook should at least get the > > 'speed' argument of MOVE_RATIO and it should get a better name. > > > > I still think that it should be possible to improve the insn check to > > avoid use of "disabled" modes, maybe that's also a point to add > > a new hook like .move_with_mode_p or so? To quote, we do > > Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P. Again I'd like to shine light on MOVE_MAX_PIECES which explicitely mentions "a load or store used TO COPY MEMORY" (emphasis mine) and whose x86 implementation would already be fine (doing larger moves and also not doing too large moves). But appearantly the arm folks decided that that's not fit and instead (mis-?)used MOVE_MAX * MOVE_RATIO. Yes, MOVE_MAX_PIECES is documented to apply to move_by_pieces. Still GIMPLE memcpy/memmove inlining wants to mimic exactly that but restrict itself to a single load and a single store. > > > > scalar_int_mode mode; > > if (int_mode_for_size (ilen * 8, 0).exists (&mode) > > && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8 > > && have_insn_for (SET, mode) > > /* If the destination pointer is not aligned we must be > > able > > to emit an unaligned store. */ > > && (dest_align >= GET_MODE_ALIGNMENT (mode) > > || !targetm.slow_unaligned_access (mode, dest_align) > > || (optab_handler (movmisalign_optab, mode) > > != CODE_FOR_nothing))) > > > > where I understand the ISA is enabled and if the user explicitely > > uses it that's OK but -mprefer-avx128 should tell GCC to never > > generate AVX256 code where the user was not explicitely using it > > (still for example glibc might happily use AVX256 code to implement > > the memcpy we are folding!) > > > > Note the BB vectorizer also might end up with using AVX256 because > > in places it also relies on optab queries and the vector_mode_supported_p > > check (but the memcpy folding uses the fake integer modes). So > > x86 might need to implement the related_mode hook to avoid "auto"-using > > a larger vector mode which the default implementation would happily do. > > > > Richard. > > OK for master? Looking for opinions from others as well. Btw, there's a similar use in expand_DEFERRED_INIT: && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT, 0).exists (&var_mode) && have_insn_for (SET, var_mode)) So it occured to me that maybe targetm.move_with_mode_p should eventually check have_insn_for (SET, var_mode) or we should abstract checking the two things to a generic API somewhere (in optabs-query.h maybe, or expmed.h, not sure where it would be more appropriate). > +@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode > @var{mode}) > +This target hook returns true if move with mode @var{mode} can be > +generated implicitly. The default definition returns true. > +@end deftypefn I know what you mean but I'm not sure "can be generated implicitly" captures that. The compiler always generated stuff explicitely. It's also "should", not "can", no? Thanks, Richard. > Thanks. > > H.J. > --- > Add TARGET_MOVE_WITH_MODE_P to return true if move with mode can be > generated implicitly. The default definition returns true. The x86 > version returns true if the mode size <= MOVE_MAX, which is the max > number o
Re: [PATCH] opts: fix -gtoggle + optimize attribute
On Fri, Mar 4, 2022 at 2:12 PM Martin Liška wrote: > > On 3/1/22 09:48, Richard Biener wrote: > > I think moving flag_gtoggle handling before the flag_syntax_only handling > > is a good thing. But I don't quite understand the flag_var_tracking > > disabling > > or how it worked before. > > Well, as you know, the debugging options are a can of worms. During GCC 12 > development I moved > most of the option logic to finish_options which is a place that is used both > for command line > option processing and optimize/target pragma/attribute processing. > > That's why we see this problem. OPT_LEVELS_1_PLUS enables flag_var_tracking > but we have to drop > debug debug_info_level == DINFO_LEVEL_NONE. > > > At least I think you want to check for > > debug_info_level == NONE, no? Why should DINFO_LEVEL_TERSE be > > special? > > No, sending updated version of the patch. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK. > Thanks, > Martin
[PATCH v2] x86: Disable SSE on unwind-c.c and unwind-dw2.c
Since eh_return doesn't work with stack realignment, disable SSE on unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte incoming stack to avoid SSE usage which is caused by commit 609e8c492d62d92465460eae3d43dfc4b2c68288 Author: H.J. Lu Date: Sat Feb 26 14:17:23 2022 -0800 x86: Always return pseudo register in ix86_gen_scratch_sse_rtx when pseudo vector registers are used to expand memset. gcc/ PR target/104781 * config/i386/i386.cc (ix86_expand_epilogue): Assert there is no stack realignment for eh_return. gcc/testsuite/ PR target/104781 * gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4. * gcc.target/i386/eh_return-2.c: Likewise. libgcc/ PR target/104781 * config.host (tmake_file): Add i386/32/t-eh-return-no-sse for 32-bit x86 Cygwin, MinGW and Solaris. * config/i386/32/t-eh-return-no-sse: New file. --- gcc/config/i386/i386.cc | 5 ++--- gcc/testsuite/gcc.target/i386/eh_return-1.c | 2 +- gcc/testsuite/gcc.target/i386/eh_return-2.c | 2 +- libgcc/config.host | 13 + libgcc/config/i386/32/t-eh-return-no-sse| 5 + 5 files changed, 22 insertions(+), 5 deletions(-) create mode 100644 libgcc/config/i386/32/t-eh-return-no-sse diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 95219902694..1c675304e32 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -9444,9 +9444,8 @@ ix86_expand_epilogue (int style) rtx sa = EH_RETURN_STACKADJ_RTX; rtx_insn *insn; - /* %ecx can't be used for both DRAP register and eh_return. */ - if (crtl->drap_reg) - gcc_assert (REGNO (crtl->drap_reg) != CX_REG); + /* Stack realignment doesn't work with eh_return. */ + gcc_assert (!crtl->stack_realign_needed); /* regparm nested functions don't work with eh_return. */ gcc_assert (!ix86_static_chain_on_stack); diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c index b21fd75fc93..43f94f01a97 100644 --- a/gcc/testsuite/gcc.target/i386/eh_return-1.c +++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -march=haswell -mno-avx512f -mtune-ctrl=avx256_move_by_pieces" } */ +/* { dg-options "-O2 -mincoming-stack-boundary=4 -march=haswell -mno-avx512f -mtune-ctrl=avx256_move_by_pieces" } */ struct _Unwind_Context { diff --git a/gcc/testsuite/gcc.target/i386/eh_return-2.c b/gcc/testsuite/gcc.target/i386/eh_return-2.c index f23f4492dac..cb762f92cc2 100644 --- a/gcc/testsuite/gcc.target/i386/eh_return-2.c +++ b/gcc/testsuite/gcc.target/i386/eh_return-2.c @@ -1,6 +1,6 @@ /* PR target/101772 */ /* { dg-do compile } */ -/* { dg-additional-options "-O0 -march=x86-64 -mstackrealign" } */ +/* { dg-additional-options "-O0 -mincoming-stack-boundary=4 -march=x86-64 -mstackrealign" } */ struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context; diff --git a/libgcc/config.host b/libgcc/config.host index 094fd3ad254..1588ab6cf20 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1583,3 +1583,16 @@ case ${host} in tmake_file="$tmake_file t-gthr-noweak" ;; esac + +case ${host} in +i[34567]86-*-* | x86_64-*-*) + if test "${host_address}" = 32; then +case ${host} in +*-*-cygwin* | *-*-mingw* | *-*-solaris2*) + # Disable SSE on unwind-c.c and unwind-dw2.c to avoid stack + # realignment with the 4-byte aligned incoming stack. + tmake_file="${tmake_file} i386/${host_address}/t-eh-return-no-sse" + ;; +esac + fi +esac diff --git a/libgcc/config/i386/32/t-eh-return-no-sse b/libgcc/config/i386/32/t-eh-return-no-sse new file mode 100644 index 000..5a8c3135911 --- /dev/null +++ b/libgcc/config/i386/32/t-eh-return-no-sse @@ -0,0 +1,5 @@ +# Since eh_return doesn't work with stack realignment, disable SSE on +# unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte +# incoming stack. + +unwind-c.o unwind-c_s.o unwind-dw2.o unwind-dw2_s.o : HOST_LIBGCC2_CFLAGS += -mno-sse -- 2.35.1
Re: [PATCH] x86: Disable SSE on unwind-c.c and unwind-dw2.c
On Sun, Mar 6, 2022 at 10:07 AM Eric Botcazou wrote: > > > PR target/104781 > > * config.host (tmake_file): Add i386/32/t-eh-return-no-sse for > > 32-bit x86 Cygwin and Solaris. > > * config/i386/32/t-eh-return-no-sse: New file. > > What about MinGW here? It should be included. Here is the v2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591298.html Thanks. -- H.J.
[Patch] Fortran: Fix gfc_maybe_dereference_var [PR104430]
The problem is that inside the main program, y = f(z) where the the result of z is type(t) :: z(size(x%a)) and 'x' is a dummy argument. 'x' looses the attr.dummy in gfc_add_interface_mapping and this leads to an additional indirect ref in gfc_maybe_dereference_var - but after the first indirect ref, TREE_TYPE is alread a RECORD_TYPE ... The simple fix is to simply punt when the argument is not a pointer or reference. This patch additionally fixes a comment in conv_parent_component_references. Those parts are obvious. The only potentially risky change is the comparison change from a name-wise comparison to a pointer-based comparison. I think it is fine and the testsuite did not find any issue, but you prefer, I can also exclude it. OK for mainline? How about GCC 10/11 backports? (I think leaving out the last change, it should be very safe to do.) Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Fortran: Fix gfc_maybe_dereference_var [PR104430] PR fortran/104430 gcc/fortran/ChangeLog: * trans-expr.cc (conv_parent_component_references): Fix comment; simplify comparison. (gfc_maybe_dereference_var): Avoid derefereing a nonpointer gcc/testsuite/ChangeLog: * gfortran.dg/class_result_10.f90: New test. gcc/fortran/trans-expr.cc | 6 -- gcc/testsuite/gfortran.dg/class_result_10.f90 | 25 + 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index eb6a78c3a62..e441952818b 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -2805,9 +2805,9 @@ conv_parent_component_references (gfc_se * se, gfc_ref * ref) dt = ref->u.c.sym; c = ref->u.c.component; - /* Return if the component is in the parent type. */ + /* Return if the component is in this type, i.e. not in the parent type. */ for (cmp = dt->components; cmp; cmp = cmp->next) -if (strcmp (c->name, cmp->name) == 0) +if (c == cmp) return; /* Build a gfc_ref to recursively call gfc_conv_component_ref. */ @@ -2867,6 +2867,8 @@ tree gfc_maybe_dereference_var (gfc_symbol *sym, tree var, bool descriptor_only_p, bool is_classarray) { + if (!POINTER_TYPE_P (TREE_TYPE (var))) +return var; if (is_CFI_desc (sym, NULL)) return build_fold_indirect_ref_loc (input_location, var); diff --git a/gcc/testsuite/gfortran.dg/class_result_10.f90 b/gcc/testsuite/gfortran.dg/class_result_10.f90 new file mode 100644 index 000..acd9a73ebda --- /dev/null +++ b/gcc/testsuite/gfortran.dg/class_result_10.f90 @@ -0,0 +1,25 @@ +! { dg-do run} +! +! PR fortran/104430 +! +! Contributed by G. Steinmetz + +module m + type t + integer :: a + end type +contains + function f(x) result(z) + class(t) :: x(:) + type(t) :: z(size(x%a)) + z%a = 42 + end +end +program p + use m + class(t), allocatable :: y(:), z(:) + allocate (y(32)) + z = f(y) + if (size(z) /= 32) stop 1 + if (any (z%a /= 42)) stop 2 +end
[committed] Fix up duplicated duplicated words in comments
Hi! Like in r10-7215-g700d4cb08c88aec37c13e21e63dd61fd698baabc 2 years ago, I've run grep -v 'long long\|optab optab\|template template\|double double' *.{[chS],cc} */*.{[chS],cc} *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 ' and for the cases that looked clearly wrong changed them, mostly by removing one of the duplicated words but in some cases with other changes. Committed to trunk as obvious. 2022-03-07 Jakub Jelinek gcc/ * tree-ssa-propagate.cc: Fix up duplicated word issue in a comment. * config/riscv/riscv.cc: Likewise. * config/darwin.h: Likewise. * config/i386/i386.cc: Likewise. * config/aarch64/thunderx3t110.md: Likewise. * config/aarch64/fractional-cost.h: Likewise. * config/vax/vax.cc: Likewise. * config/rs6000/pcrel-opt.md: Likewise. * config/rs6000/predicates.md: Likewise. * ctfc.h: Likewise. * tree-ssa-uninit.cc: Likewise. * value-relation.h: Likewise. * gimple-range-gori.cc: Likewise. * ipa-polymorphic-call.cc: Likewise. * pointer-query.cc: Likewise. * ipa-sra.cc: Likewise. * internal-fn.cc: Likewise. * varasm.cc: Likewise. * gimple-ssa-warn-access.cc: Likewise. gcc/analyzer/ * store.cc: Fix up duplicated word issue in a comment. * analyzer.cc: Likewise. * engine.cc: Likewise. * sm-taint.cc: Likewise. gcc/c-family/ * c-attribs.cc: Fix up duplicated word issue in a comment. gcc/cp/ * cvt.cc: Fix up duplicated word issue in a comment. * pt.cc: Likewise. * module.cc: Likewise. * coroutines.cc: Likewise. gcc/fortran/ * trans-expr.cc: Fix up duplicated word issue in a comment. * gfortran.h: Likewise. * scanner.cc: Likewise. gcc/jit/ * libgccjit.h: Fix up duplicated word issue in a comment. --- gcc/tree-ssa-propagate.cc.jj2022-01-18 11:59:00.090974799 +0100 +++ gcc/tree-ssa-propagate.cc 2022-03-07 14:33:28.033829512 +0100 @@ -697,7 +697,7 @@ private: gimple_stmt_iterator new_gsi); }; -/* Call post_new_stmt for each each new statement that has been added +/* Call post_new_stmt for each new statement that has been added to the current BB. OLD_GSI is the statement iterator before the BB changes ocurred. NEW_GSI is the iterator which may contain new statements. */ --- gcc/config/riscv/riscv.cc.jj2022-02-04 14:36:54.467612813 +0100 +++ gcc/config/riscv/riscv.cc 2022-03-07 14:50:54.717372413 +0100 @@ -4984,7 +4984,7 @@ riscv_option_override (void) target_flags |= MASK_FDIV; /* Handle -mtune, use -mcpu if -mtune is not given, and use default -mtune - if -mtune and -mcpu both not not given. */ + if -mtune and -mcpu both not given. */ cpu = riscv_parse_tune (riscv_tune_string ? riscv_tune_string : (riscv_cpu_string ? riscv_cpu_string : RISCV_TUNE_STRING_DEFAULT)); --- gcc/config/darwin.h.jj 2022-01-18 11:58:59.078989257 +0100 +++ gcc/config/darwin.h 2022-03-07 14:36:18.924463533 +0100 @@ -340,7 +340,7 @@ extern GTY(()) int darwin_ms_struct; " %:version-compare(>= 10.6 mmacosx-version-min= -no_compact_unwind) " /* In Darwin linker specs we can put -lcrt0.o and ld will search the library - path for crt0.o or -lcrtx.a and it will search for for libcrtx.a. As for + path for crt0.o or -lcrtx.a and it will search for libcrtx.a. As for other ports, we can also put xxx.{o,a}%s and get the appropriate complete startfile absolute directory. This latter point is important when we want to override ld's rule of .dylib being found ahead of .a and the user wants --- gcc/config/i386/i386.cc.jj 2022-03-04 09:35:58.674788325 +0100 +++ gcc/config/i386/i386.cc 2022-03-07 14:50:08.093016106 +0100 @@ -20334,7 +20334,7 @@ ix86_division_cost (const struct process /* Return cost of shift in MODE. If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL. - AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE + AND_IN_OP1 specify in op1 is result of AND and SHIFT_AND_TRUNCATE if op1 is a result of subreg. SKIP_OP0/1 is set to true if cost of OP0/1 should be ignored. */ --- gcc/config/aarch64/thunderx3t110.md.jj 2022-01-11 23:11:21.68935 +0100 +++ gcc/config/aarch64/thunderx3t110.md 2022-03-07 14:47:39.710064661 +0100 @@ -138,7 +138,7 @@ (define_insn_reservation "thunderx3t110_ logic_shift_imm,logics_shift_imm")) "thunderx3t110_i01") -; we are going for the the optimistic answer (13) +; we are going for the optimistic answer (13) ; for now, the worst case is 23 (define_insn_reservation "thunderx3t110_div" 13 (and (eq_attr "tune" "thunderx3t110") --- gcc/config/aarch64/fractional-cost.h.jj 2022-01-11 23:11:21.687300033 +0100 +++ gcc/config/aarch64/fractional-cost.h2022-03-07 14:47:16.8
[Patch] Fortran: Fix gfc_conv_gfc_desc_to_cfi_desc with NULL [PR104126]
Pre-remark: Related NULL, there some accepts-invalid issues, not addressed in this patch. See https://gcc.gnu.org/PR104819 This patch fixes an ICE (12 regression) with NULL() that has no MOLD argument. OK for mainline? Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Fortran: Fix gfc_conv_gfc_desc_to_cfi_desc with NULL [PR104126] PR fortran/104126 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_gfc_desc_to_cfi_desc): Handle NULL without MOLD. gcc/testsuite/ChangeLog: * gfortran.dg/null_actual_2.f90: New test. gcc/fortran/trans-expr.cc | 13 + gcc/testsuite/gfortran.dg/null_actual_2.f90 | 16 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index eb6a78c3a62..e8a78ccf4e1 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -5608,8 +5608,11 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym) itype = (e->ts.u.derived->intmod_sym_id == ISOCBINDING_FUNPTR ? CFI_type_cfunptr : CFI_type_cptr); else -switch (e->ts.type) - { +{ + if (e->expr_type == EXPR_NULL && e->ts.type == BT_UNKNOWN) + e->ts = fsym->ts; + switch (e->ts.type) + { case BT_INTEGER: case BT_LOGICAL: case BT_REAL: @@ -5647,7 +5650,8 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym) case BT_UNKNOWN: // FIXME: Really unreachable? Or reachable for type(*) ? If so, CFI_type_other? gcc_unreachable (); - } + } +} tmp = gfc_get_cfi_desc_type (cfi); gfc_add_modify (&block, tmp, @@ -5700,7 +5704,8 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym) gfc_init_block (&block2); /* Set elem_len, which may be only known at run time. */ - if (e->ts.type == BT_CHARACTER) + if (e->ts.type == BT_CHARACTER + && (e->expr_type != EXPR_NULL || gfc_strlen != NULL_TREE)) { gcc_assert (gfc_strlen); tmp = gfc_strlen; diff --git a/gcc/testsuite/gfortran.dg/null_actual_2.f90 b/gcc/testsuite/gfortran.dg/null_actual_2.f90 new file mode 100644 index 000..de481f01295 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/null_actual_2.f90 @@ -0,0 +1,16 @@ +! { dg-do compile } +! +! PR fortran/104126 +! +! Contributed by G. Steinmetz +! +program p + use iso_c_binding, only: c_char + character(len=:,kind=c_char), pointer :: d + call s(null(d)) + call s(null()) +contains + subroutine s(x) bind(c) + character(len=:, kind=c_char), pointer, intent(in) :: x + end +end
Re: [arm] MVE: Relax addressing modes for full loads and stores
On 17/01/2022 07:48, Christophe Lyon wrote: Hi André, On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches wrote: Hi Christophe, This patch relaxes the addressing modes for the mve full load and stores (by full loads and stores I mean non-widening or narrowing loads and stores resp). The code before was requiring a LO_REGNUM for these, where this is only a requirement if the load is widening or the store narrowing. So with this your patch should not be necessary. Regression tested on arm-none-eabi-gcc. Can you please confirm this fixes the issue you were seeing too? Yes, I confirm this fixes the problem I was fixing with my patch #15 in my MVE/VCMP/VCOND series. I'll drop it. Thanks! Christophe gcc/ChangeLog: * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO. * config/arm/arm.c (mve_vector_mem_operand): Relax constraint on base register for non widening loads or narrowing stores. Kind Regards, Andre Vieira Ping, I noticed this also fixes PR 104790. Kind regards, Andre
Re: [PATCH v2] x86: Disable SSE on unwind-c.c and unwind-dw2.c
On Mon, Mar 07, 2022 at 05:57:40AM -0800, H.J. Lu via Gcc-patches wrote: > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -9444,9 +9444,8 @@ ix86_expand_epilogue (int style) > rtx sa = EH_RETURN_STACKADJ_RTX; > rtx_insn *insn; > > - /* %ecx can't be used for both DRAP register and eh_return. */ > - if (crtl->drap_reg) > - gcc_assert (REGNO (crtl->drap_reg) != CX_REG); > + /* Stack realignment doesn't work with eh_return. */ > + gcc_assert (!crtl->stack_realign_needed); Shouldn't this be error or sorry? Otherwise, won't calls to __builtin_eh_return from random testcases where stack realignment is needed or even with regparm nested functions just ICE? > > /* regparm nested functions don't work with eh_return. */ > gcc_assert (!ix86_static_chain_on_stack); Jakub
[committed] Add missing space in various string literals
Hi! After more than 2 years I've run my https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00844.html script again. While it has lots of false positives, it discovered two bugs. Committed to trunk as obvious. 2022-03-07 Jakub Jelinek gcc/c/ * c-parser.cc (c_parser_omp_clause_map): Add missing space in string literal. gcc/cp/ * parser.cc (cp_parser_omp_clause_map): Add missing space in string literal. --- gcc/c/c-parser.cc.jj2022-02-17 10:28:51.780275535 +0100 +++ gcc/c/c-parser.cc 2022-03-07 15:11:03.749647618 +0100 @@ -16230,8 +16230,8 @@ c_parser_omp_clause_map (c_parser *parse else { c_parser_error (parser, "%<#pragma omp target%> with " - "modifier other than % or %" - "on % clause"); + "modifier other than % or " + "% on % clause"); parens.skip_until_found_close (parser); return list; } --- gcc/cp/parser.cc.jj 2022-02-11 00:19:22.230066001 +0100 +++ gcc/cp/parser.cc2022-03-07 15:11:58.438889997 +0100 @@ -39432,8 +39432,8 @@ cp_parser_omp_clause_map (cp_parser *par else { cp_parser_error (parser, "%<#pragma omp target%> with " - "modifier other than % or %" - "on % clause"); + "modifier other than % or " + "% on % clause"); cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true, /*or_comma=*/false, Jakub
Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]
On Fri, 4 Mar 2022, Jason Merrill wrote: > On 3/4/22 14:24, Patrick Palka wrote: > > Here we're failing to communicate to cp_finish_decl from tsubst_expr > > that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING > > flag), which causes do_class_deduction to always consider explicit > > deduction guides when performing CTAD for a templated variable initializer. > > > > We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when > > calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction > > can determine if we're in a copy-init context by simply inspecting the > > initializer, and thus render its flags parameter unnecessary, which is > > what this patch implements. (If we were to fix this in tsubst_expr > > instead, I think we'd have to inspect the initializer in the same way > > in order to detect a copy-init context?) > > Hmm, does this affect conversions as well? > > Looks like it does: > > struct A > { > explicit operator int(); > }; > > template void f() > { > T t = A(); > } > > int main() > { > f(); // wrongly accepted > } > > The reverse, initializing via an explicit constructor, is caught by code in > build_aggr_init much like the code your patch adds to do_auto_deduction; > perhaps we should move/copy that code to cp_finish_decl? Ah, makes sense. Moving that code from build_aggr_init to cp_finish_decl broke things, but using it in both spots seems to work well. And I suppose we might as well use it in do_class_deduction too, since doing so lets us remove the flags parameter. -- >8 -- Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137] Here we're failing to communicate to cp_finish_decl from tsubst_expr that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING flag), which causes us to always consider explicit deduction guides when performing CTAD for a templated variable initializer. It turns out this bug also affects consideration of explicit conversion operators for the same reason. But consideration of explicit constructors is unaffected and seems to work correctly thanks to code in build_aggr_init that sets LOOKUP_ONLYCONVERTING when the initializer represents copy-initialization. This patch factors out the copy-initialization test from build_aggr_init and reuses it in tsubst_expr for sake of cp_finish_decl. And we might as well use it in do_class_deduction too, since doing so lets us get rid of its flags parameter (which is used only to control whether explicit deduction guides are considered). Bootstrapped and regtestd on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/102137 PR c++/87820 gcc/cp/ChangeLog: * cp-tree.h (is_copy_initialization): Declare. (do_auto_deduction): Remove flags parameter. * decl.cc (cp_finish_decl): Adjust call to do_auto_deduction. * init.cc (build_aggr_init): Split out copy-initialization check into ... (is_copy_initialization): ... here. * pt.cc (convert_template_argument): Adjust call to do_auto_deduction. (tsubst_expr) : Pass LOOKUP_ONLYCONVERTING to cp_finish_decl when is_copy_initialization. (do_class_deduction): Remove flags parameter, and instead call is_copy_initialization to determine if we're in a copy-init context. (do_auto_deduction): Adjust call to do_class_deduction. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/explicit15.C: New test. * g++.dg/cpp1z/class-deduction108.C: New test. --- gcc/cp/cp-tree.h | 4 +-- gcc/cp/decl.cc| 2 +- gcc/cp/init.cc| 20 +--- gcc/cp/pt.cc | 23 -- gcc/testsuite/g++.dg/cpp0x/explicit15.C | 31 +++ .../g++.dg/cpp1z/class-deduction108.C | 31 +++ 6 files changed, 93 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index ac723901098..be556d2c441 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7039,6 +7039,7 @@ extern void emit_mem_initializers (tree); extern tree build_aggr_init(tree, tree, int, tsubst_flags_t); extern int is_class_type (tree, int); +extern bool is_copy_initialization (tree); extern tree build_zero_init(tree, tree, bool); extern tree build_value_init (tree, tsubst_flags_t); extern tree build_value_init_noctor(tree, tsubst_flags_t); @@ -7279,8 +7280,7 @@ extern tree do_auto_deduction (tree, tree, tree, = tf_warning_or_error,
[PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c
Since eh_return doesn't work with stack realignment, disable SSE on unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte incoming stack to avoid SSE usage which is caused by commit 609e8c492d62d92465460eae3d43dfc4b2c68288 Author: H.J. Lu Date: Sat Feb 26 14:17:23 2022 -0800 x86: Always return pseudo register in ix86_gen_scratch_sse_rtx when pseudo vector registers are used to expand memset. gcc/ PR target/104781 * config/i386/i386.cc (ix86_expand_epilogue): Sorry if there is stack realignment with eh_return or regparm nested function. gcc/testsuite/ PR target/104781 * gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4. * gcc.target/i386/eh_return-2.c: Likewise. libgcc/ PR target/104781 * config.host (tmake_file): Add i386/32/t-eh-return-no-sse for 32-bit x86 Cygwin, MinGW and Solaris. * config/i386/32/t-eh-return-no-sse: New file. --- gcc/config/i386/i386.cc | 11 +++ gcc/testsuite/gcc.target/i386/eh_return-1.c | 2 +- gcc/testsuite/gcc.target/i386/eh_return-2.c | 2 +- libgcc/config.host | 13 + libgcc/config/i386/32/t-eh-return-no-sse| 5 + 5 files changed, 27 insertions(+), 6 deletions(-) create mode 100644 libgcc/config/i386/32/t-eh-return-no-sse diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index efa947f9795..4121f986221 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -9444,12 +9444,15 @@ ix86_expand_epilogue (int style) rtx sa = EH_RETURN_STACKADJ_RTX; rtx_insn *insn; - /* %ecx can't be used for both DRAP register and eh_return. */ - if (crtl->drap_reg) - gcc_assert (REGNO (crtl->drap_reg) != CX_REG); + /* Stack realignment doesn't work with eh_return. */ + if (crtl->stack_realign_needed) + sorry ("Stack realignment not supported with " + "%<__builtin_eh_return%>"); /* regparm nested functions don't work with eh_return. */ - gcc_assert (!ix86_static_chain_on_stack); + if (ix86_static_chain_on_stack) + sorry ("regparm nested function not supported with " + "%<__builtin_eh_return%>"); if (frame_pointer_needed) { diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c index b21fd75fc93..43f94f01a97 100644 --- a/gcc/testsuite/gcc.target/i386/eh_return-1.c +++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -march=haswell -mno-avx512f -mtune-ctrl=avx256_move_by_pieces" } */ +/* { dg-options "-O2 -mincoming-stack-boundary=4 -march=haswell -mno-avx512f -mtune-ctrl=avx256_move_by_pieces" } */ struct _Unwind_Context { diff --git a/gcc/testsuite/gcc.target/i386/eh_return-2.c b/gcc/testsuite/gcc.target/i386/eh_return-2.c index f23f4492dac..cb762f92cc2 100644 --- a/gcc/testsuite/gcc.target/i386/eh_return-2.c +++ b/gcc/testsuite/gcc.target/i386/eh_return-2.c @@ -1,6 +1,6 @@ /* PR target/101772 */ /* { dg-do compile } */ -/* { dg-additional-options "-O0 -march=x86-64 -mstackrealign" } */ +/* { dg-additional-options "-O0 -mincoming-stack-boundary=4 -march=x86-64 -mstackrealign" } */ struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context; diff --git a/libgcc/config.host b/libgcc/config.host index 094fd3ad254..1588ab6cf20 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1583,3 +1583,16 @@ case ${host} in tmake_file="$tmake_file t-gthr-noweak" ;; esac + +case ${host} in +i[34567]86-*-* | x86_64-*-*) + if test "${host_address}" = 32; then +case ${host} in +*-*-cygwin* | *-*-mingw* | *-*-solaris2*) + # Disable SSE on unwind-c.c and unwind-dw2.c to avoid stack + # realignment with the 4-byte aligned incoming stack. + tmake_file="${tmake_file} i386/${host_address}/t-eh-return-no-sse" + ;; +esac + fi +esac diff --git a/libgcc/config/i386/32/t-eh-return-no-sse b/libgcc/config/i386/32/t-eh-return-no-sse new file mode 100644 index 000..5a8c3135911 --- /dev/null +++ b/libgcc/config/i386/32/t-eh-return-no-sse @@ -0,0 +1,5 @@ +# Since eh_return doesn't work with stack realignment, disable SSE on +# unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte +# incoming stack. + +unwind-c.o unwind-c_s.o unwind-dw2.o unwind-dw2_s.o : HOST_LIBGCC2_CFLAGS += -mno-sse -- 2.35.1
Re: [PATCH v2] x86: Disable SSE on unwind-c.c and unwind-dw2.c
On Mon, Mar 7, 2022 at 6:23 AM Jakub Jelinek wrote: > > On Mon, Mar 07, 2022 at 05:57:40AM -0800, H.J. Lu via Gcc-patches wrote: > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -9444,9 +9444,8 @@ ix86_expand_epilogue (int style) > > rtx sa = EH_RETURN_STACKADJ_RTX; > > rtx_insn *insn; > > > > - /* %ecx can't be used for both DRAP register and eh_return. */ > > - if (crtl->drap_reg) > > - gcc_assert (REGNO (crtl->drap_reg) != CX_REG); > > + /* Stack realignment doesn't work with eh_return. */ > > + gcc_assert (!crtl->stack_realign_needed); > > Shouldn't this be error or sorry? Otherwise, won't > calls to __builtin_eh_return from random testcases where > stack realignment is needed or even with regparm nested functions > just ICE? Fixed in the v3 patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591307.html Thanks. > > > > /* regparm nested functions don't work with eh_return. */ > > gcc_assert (!ix86_static_chain_on_stack); > > Jakub > -- H.J.
Re: [committed] libstdc++: Ensure __glibcxx_assert_fail has default visibility
On Sat, 5 Mar 2022 at 20:34, Jonathan Wakely via Libstdc++ wrote: > > Tested powerpc64le-linux, pushed to trunk. > > -- >8 -- > > This ensures there's no linker error if libstdc++ headers are included > following a pragma that sets hidden visibility. > > Similarly for std::__terminate, which is always-inline so shouldn't > matter, but it's not wrong to do this anyway. > > libstdc++-v3/ChangeLog: > > * include/bits/c++config (__glibcxx_assert_fail): Add visibility > attribute. > (__terminate): Likewise. This broke mingw targets. Fixed like so. Tested x86_64-linux, pushed to trunk. commit 4cb935cb69f12088975fa7f6907c6ace0580e2dd Author: Jonathan Wakely Date: Mon Mar 7 15:07:05 2022 libstdc++: Use visibility pragmas instead of attributes [PR104807] The _GLIBCXX_PSEUDO_VISIBILITY macro isn't defined until after including os_defines.h, so we can't use _GLIBCXX_VISIBILITY early in c++config. Replace the uses of that macro with #pragma visibility push(default) instead. libstdc++-v3/ChangeLog: PR libstdc++/104807 * include/bits/c++config (__terminate, __glibcxx_assert_fail): Replace _GLIBCXX_VISIBILITY on function with visibility pragma. (__is_constant_evaluated): Add visibility pragma. diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config index 6c134f13509..2798b9786dc 100644 --- a/libstdc++-v3/include/bits/c++config +++ b/libstdc++-v3/include/bits/c++config @@ -302,15 +302,16 @@ namespace std typedef decltype(nullptr)nullptr_t; #endif +#pragma GCC visibility push(default) // This allows the library to terminate without including all of // and without making the declaration of std::terminate visible to users. extern "C++" __attribute__ ((__noreturn__, __always_inline__)) - _GLIBCXX_VISIBILITY(default) inline void __terminate() _GLIBCXX_USE_NOEXCEPT { void terminate() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__noreturn__)); terminate(); } +#pragma GCC visibility pop } #define _GLIBCXX_USE_DUAL_ABI @@ -506,6 +507,7 @@ namespace std namespace std { +#pragma GCC visibility push(default) // Internal version of std::is_constant_evaluated(). // This can be used without checking if the compiler supports the feature. // The macro _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED can be used to check if @@ -523,6 +525,7 @@ namespace std return false; #endif } +#pragma GCC visibility pop } // Debug Mode implies checking assertions. @@ -553,13 +556,15 @@ namespace std # ifdef _GLIBCXX_VERBOSE_ASSERT namespace std { +#pragma GCC visibility push(default) // Avoid the use of assert, because we're trying to keep the // include out of the mix. - extern "C++" _GLIBCXX_NORETURN _GLIBCXX_VISIBILITY(default) + extern "C++" _GLIBCXX_NORETURN void __glibcxx_assert_fail(const char* __file, int __line, const char* __function, const char* __condition) _GLIBCXX_NOEXCEPT; +#pragma GCC visibility pop } #define __glibcxx_assert_impl(_Condition) \ if (__builtin_expect(!bool(_Condition), false)) \
[committed] doc: Remove redundant sentence about modules being in C++20
Discussed with Jason and then committed as obvious. -- >8 -- As C++20 has already been published, we don't need to link to the draft (which is now the C++23 draft anyway). And there's no need to say it's part of the C++20 spec, or that there might be defect reports. That's true for everything in C++20, so calling it out here just for Modules isn't needed. gcc/ChangeLog: * doc/invoke.texi (C++ Modules): Remove anachronism. --- gcc/doc/invoke.texi | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 248ed534aee..b01ffab566a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -34151,9 +34151,7 @@ provides a modular compilation system, intending to provide both faster builds and better library isolation. The ``Merging Modules'' paper @uref{https://wg21.link/p1103}, provides the easiest to read set of changes to the standard, although it does not capture later -changes. That specification is now part of C++20, -@uref{git@@github.com:cplusplus/draft.git}, it is considered complete -(there may be defect reports to come). +changes. @emph{G++'s modules support is not complete.} Other than bugs, the known missing pieces are: -- 2.34.1
RE: [arm] MVE: Relax addressing modes for full loads and stores
Ok, please include PR 104790 in the ChangeLog. Thanks, Kyrill From: Andre Vieira (lists) Sent: Monday, March 7, 2022 2:17 PM To: Kyrylo Tkachov Cc: GCC Patches Subject: Re: [arm] MVE: Relax addressing modes for full loads and stores On 17/01/2022 07:48, Christophe Lyon wrote: Hi André, On Fri, Jan 14, 2022 at 6:03 PM Andre Vieira (lists) via Gcc-patches mailto:gcc-patches@gcc.gnu.org>> wrote: Hi Christophe, This patch relaxes the addressing modes for the mve full load and stores (by full loads and stores I mean non-widening or narrowing loads and stores resp). The code before was requiring a LO_REGNUM for these, where this is only a requirement if the load is widening or the store narrowing. So with this your patch should not be necessary. Regression tested on arm-none-eabi-gcc. Can you please confirm this fixes the issue you were seeing too? Yes, I confirm this fixes the problem I was fixing with my patch #15 in my MVE/VCMP/VCOND series. I'll drop it. Thanks! Christophe gcc/ChangeLog: * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO. * config/arm/arm.c (mve_vector_mem_operand): Relax constraint on base register for non widening loads or narrowing stores. Kind Regards, Andre Vieira Ping, I noticed this also fixes PR 104790. Kind regards, Andre
[committed] Fortran: Fix typos
I saw that Jakub fixed some typos/duplicated words, I thought I could do likewise – and I found some more in gcc/fortran. Committed as r12-7524. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit e3ca3e7993696affe95a3ea24c2b133c14a056e4 Author: Tobias Burnus Date: Mon Mar 7 17:20:52 2022 +0100 Fortran: Fix typos gcc/fortran/ChangeLog: * array.cc (gfc_ref_dimen_size): Fix comment typo. * dump-parse-tree.cc (gfc_dump_c_prototypes): Likewise. * frontend-passes.cc (cfe_code): Likewise. * gfortran.texi: Likewise. * resolve.cc (generate_component_assignments): Likewise. * simplify.cc (gfc_simplify_this_image): Likewise. * trans-expr.cc (trans_scalar_class_assign, gfc_maybe_dereference_var): Likewise. * intrinsic.texi: Remove word duplication. * invoke.texi: Likewise. diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc index f1d92e00c98..eb9ed8580a0 100644 --- a/gcc/fortran/array.cc +++ b/gcc/fortran/array.cc @@ -2420,7 +2420,7 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t *result, mpz_t *end) gfc_free_expr(stride_expr); } - /* Calculate the number of elements via gfc_dep_differce, but only if + /* Calculate the number of elements via gfc_dep_difference, but only if start and end are both supplied in the reference or the array spec. This is to guard against strange but valid code like diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 322416e6556..3635460bffd 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -3543,7 +3543,7 @@ gfc_dump_c_prototypes (gfc_namespace *ns, FILE *file) gfc_traverse_ns (ns, write_interop_decl); } -/* Loop over all global symbols, writing out their declrations. */ +/* Loop over all global symbols, writing out their declarations. */ void gfc_dump_external_c_prototypes (FILE * file) diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc index 22f1bb56a2b..4033f27df99 100644 --- a/gcc/fortran/frontend-passes.cc +++ b/gcc/fortran/frontend-passes.cc @@ -974,7 +974,7 @@ cfe_code (gfc_code **c, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) changed_statement = NULL; /* Do not do anything inside a WHERE statement; scalar assignments, BLOCKs - and allocation on assigment are prohibited inside WHERE, and finally + and allocation on assignment are prohibited inside WHERE, and finally masking an expression would lead to wrong-code when replacing WHERE (a>0) diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 2a55676791b..f8737f4d323 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -1093,7 +1093,7 @@ variable. The maximum number of bytes of user data in a subrecord is 2147483639 (2 GiB - 9) for a four-byte record marker. This limit can be lowered -with the @option{-fmax-subrecord-length} option, altough this is +with the @option{-fmax-subrecord-length} option, although this is rarely useful. If the length of a logical record exceeds this limit, the data is distributed among several subrecords. diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi index f182cacb4b8..e3cd8279960 100644 --- a/gcc/fortran/intrinsic.texi +++ b/gcc/fortran/intrinsic.texi @@ -12897,7 +12897,7 @@ end program real_kinds @table @asis @item @emph{Description}: @code{SET_EXPONENT(X, I)} returns the real number whose fractional part -is that that of @var{X} and whose exponent part is @var{I}. +is that of @var{X} and whose exponent part is @var{I}. @item @emph{Standard}: Fortran 90 and later @@ -12917,7 +12917,7 @@ Elemental function @item @emph{Return value}: The return value is of the same type and kind as @var{X}. The real number whose fractional part -is that that of @var{X} and whose exponent part if @var{I} is returned; +is that of @var{X} and whose exponent part if @var{I} is returned; it is @code{FRACTION(X) * RADIX(X)**I}. @item @emph{Example}: diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi index 6435dc4d4de..5c7501a28b1 100644 --- a/gcc/fortran/invoke.texi +++ b/gcc/fortran/invoke.texi @@ -1858,7 +1858,7 @@ except when optimizing for size via @option{-Os}. If the code contains a very large number of argument that have to be packed, code size and also compilation time may become excessive. If that is the case, it may be better to disable this option. Instances of packing -can be found by using by using @option{-Warray-temporaries}. +can be found by using @option{-Warray-temporaries}. @item -fexternal-blas @opindex @code{fexternal-blas} @@ -2068,7 +2068,7 @@ does not generate p
Re: [PATCH v8 05/12] LoongArch Port: Machine description C files and .h files.
Hi, Some comments below, but otherwise it looks good to me. A few of the comments are about removing hook or macro definitions that are the same as the default. Doing that helps people who want to update a hook interface in future, since there are then fewer places to adjust. xucheng...@loongson.cn writes: > […] > +/* Describes how a symbol is used. > + > + SYMBOL_CONTEXT_CALL > + The symbol is used as the target of a call instruction. > + > + SYMBOL_CONTEXT_LEA > + The symbol is used in a load-address operation. > + > + SYMBOL_CONTEXT_MEM > + The symbol is used as the address in a MEM. */ > +enum loongarch_symbol_context { > + SYMBOL_CONTEXT_CALL, > + SYMBOL_CONTEXT_LEA, > + SYMBOL_CONTEXT_MEM > +}; It looks like this is unused: loongarch_classify_symbol takes an argument of this type, but ignores it. > […] > +/* Classifies a type of call. > + > + LARCH_CALL_NORMAL > + A normal call or call_value pattern. > + > + LARCH_CALL_SIBCALL > + A sibcall or sibcall_value pattern. > + > + LARCH_CALL_EPILOGUE > + A call inserted in the epilogue. */ > +enum loongarch_call_type { > + LARCH_CALL_NORMAL, > + LARCH_CALL_SIBCALL, > + LARCH_CALL_EPILOGUE > +}; This also looks unused. > + > +/* Controls the conditions under which certain instructions are split. > + > + SPLIT_IF_NECESSARY > + Only perform splits that are necessary for correctness > + (because no unsplit version exists). > + > + SPLIT_FOR_SPEED > + Perform splits that are necessary for correctness or > + beneficial for code speed. > + > + SPLIT_FOR_SIZE > + Perform splits that are necessary for correctness or > + beneficial for code size. */ > +enum loongarch_split_type { > + SPLIT_IF_NECESSARY, > + SPLIT_FOR_SPEED, > + SPLIT_FOR_SIZE > +}; It looks like this is also unused: loongarch_split_move_p takes an argument of this type, but ignores it. I think it'd be better to remove these three enums for now and add them back later if they become useful. > […] > +/* RTX costs of various operations on the different architectures. */ > +struct loongarch_rtx_cost_data > +{ > + unsigned short fp_add; > + unsigned short fp_mult_sf; > + unsigned short fp_mult_df; > + unsigned short fp_div_sf; > + unsigned short fp_div_df; > + unsigned short int_mult_si; > + unsigned short int_mult_di; > + unsigned short int_div_si; > + unsigned short int_div_di; > + unsigned short branch_cost; > + unsigned short memory_latency; > +}; > + > +/* Default RTX cost initializer. */ > +#define COSTS_N_INSNS(N) ((N) * 4) > +#define DEFAULT_COSTS\ > +.fp_add = COSTS_N_INSNS (1),\ > +.fp_mult_sf = COSTS_N_INSNS (2),\ > +.fp_mult_df = COSTS_N_INSNS (2),\ > +.fp_div_sf = COSTS_N_INSNS (4),\ > +.fp_div_df = COSTS_N_INSNS (4),\ > +.int_mult_si = COSTS_N_INSNS (1),\ > +.int_mult_di = COSTS_N_INSNS (1),\ > +.int_div_si = COSTS_N_INSNS (1),\ > +.int_div_di = COSTS_N_INSNS (1),\ > +.branch_cost = 2,\ > +.memory_latency = 4 Unfortunately we need to stay within standard C++(11), so the initialisers can't use “.name = value” notation. > […] > +/* Classifies an address. > + > + ADDRESS_REG > + A natural register + offset address. The register satisfies > + loongarch_valid_base_register_p and the offset is a > const_arith_operand. > + > + ADDRESS_REG_REG > + A base register indexed by (optionally scaled) register. > + > + ADDRESS_CONST_INT > + A signed 16-bit constant address. > + > + ADDRESS_SYMBOLIC: > + A constant symbolic address. */ > +enum loongarch_address_type > +{ > + ADDRESS_REG, > + ADDRESS_REG_REG, > + ADDRESS_CONST_INT, > + ADDRESS_SYMBOLIC > +}; > + > + > +/* Information about an address described by loongarch_address_type. > + > + ADDRESS_CONST_INT > + No fields are used. > + > + ADDRESS_REG > + REG is the base register and OFFSET is the constant offset. > + > + ADDRESS_SYMBOLIC > + SYMBOL_TYPE is the type of symbol that the address references. */ > +struct loongarch_address_info > +{ > + enum loongarch_address_type type; > + rtx reg; > + rtx offset; > + enum loongarch_symbol_type symbol_type; > +}; It'd be worth documenting what the fields mean for ADDRESS_REG_REG too. It looks like OFFSET is the index register in that case. > […] > +/* The value of TARGET_ATTRIBUTE_TABLE. */ > +static const struct attribute_spec loongarch_attribute_table[] = { > +/* { name, min_len, max_len, decl_req, type_req, fn_type_req, > + affects_type_identity, handler, exclude } */ > + { NULL, 0, 0, false, false, false, false, NULL, NULL } > +}; I think it'd be better to leave this and TARGET_ATTRIBUTE_TABLE undefined until a non-terminator entry is needed. > […] > +/* See whether TYPE is
[PATCH] c++: non-constant non-dependent decltype folding [PR104823]
instantiate_non_dependent_expr_sfinae instantiates only potentially constant expressions, but when processing a non-dependent decltype operand we want to instantiate it even if it's non-constant since such non-dependent decltype is always resolved ahead of time. Currently finish_decltype_type uses the former function, which causes us to miss issuing a narrowing diagnostic for S{id(v)} in the below testcase because we never instantiate this non-constant non-dependent decltype operand. So this patch makes finish_decltype_type use i_n_d_e_internal instead of _sfinae. And afterward, we need to keep processing_template_decl cleared for sake of the later call to lvalue_kind, which handles templated and non-templated COND_EXPR differently. Otherwise we'd incorrectly reject the declaration of g in cpp0x/cond2.C with: error: ‘g’ declared as function returning a function Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/104823 gcc/cp/ChangeLog: * semantics.cc (finish_decltype_type): Use i_n_d_e_internal instead of _sfinae when instantiating a non-dependent decltype operand, and keep processing_template_decl cleared afterward. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/Wnarrowing19.C: New test. --- gcc/cp/semantics.cc | 11 ++- gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C | 8 2 files changed, 18 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 07cae993efe..66d90c2f7be 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -11217,6 +11217,8 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, /* decltype is an unevaluated context. */ cp_unevaluated u; + processing_template_decl_sentinel ptds (/*reset=*/false); + /* Depending on the resolution of DR 1172, we may later need to distinguish instantiation-dependent but not type-dependent expressions so that, say, A::U doesn't require 'typename'. */ @@ -11232,7 +11234,14 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, } else if (processing_template_decl) { - expr = instantiate_non_dependent_expr_sfinae (expr, complain); + /* Instantiate the non-dependent operand to diagnose any ill-formed +expressions. And keep processing_template_decl cleared for the rest +of the function (for sake of the call to lvalue_kind below, which +handles templated and non-templated COND_EXPR differently). */ + processing_template_decl = 0; + /* Since we want to do this even for non-constant expressions, we +use i_n_d_e_internal here instead of _sfinae. */ + expr = instantiate_non_dependent_expr_internal (expr, complain); if (expr == error_mark_node) return error_mark_node; } diff --git a/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C new file mode 100644 index 000..bd9fd2eb6f9 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C @@ -0,0 +1,8 @@ +// PR c++/104823 +// { dg-do compile { target c++11 } } + +struct S { S(int); }; + +double id(double v); + +template auto f(double v) -> decltype(S{id(v)}); // { dg-error "narrowing" } -- 2.35.1.354.g715d08a9e5
Re: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]
On 3/7/22 14:41, Patrick Palka wrote: instantiate_non_dependent_expr_sfinae instantiates only potentially constant expressions Hmm, that now strikes me as a problematic interface, as we don't know whether what we get back is template or non-template trees. Maybe we want to change instantiate_non_dependent_expr to checking_assert that the argument is non-dependent (callers are already checking that), and drop the potentially-constant test? but when processing a non-dependent decltype operand we want to instantiate it even if it's non-constant since such non-dependent decltype is always resolved ahead of time. Currently finish_decltype_type uses the former function, which causes us to miss issuing a narrowing diagnostic for S{id(v)} in the below testcase because we never instantiate this non-constant non-dependent decltype operand. So this patch makes finish_decltype_type use i_n_d_e_internal instead of _sfinae. And afterward, we need to keep processing_template_decl cleared for sake of the later call to lvalue_kind, which handles templated and non-templated COND_EXPR differently. Otherwise we'd incorrectly reject the declaration of g in cpp0x/cond2.C with: error: ‘g’ declared as function returning a function Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/104823 gcc/cp/ChangeLog: * semantics.cc (finish_decltype_type): Use i_n_d_e_internal instead of _sfinae when instantiating a non-dependent decltype operand, and keep processing_template_decl cleared afterward. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/Wnarrowing19.C: New test. --- gcc/cp/semantics.cc | 11 ++- gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C | 8 2 files changed, 18 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 07cae993efe..66d90c2f7be 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -11217,6 +11217,8 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, /* decltype is an unevaluated context. */ cp_unevaluated u; + processing_template_decl_sentinel ptds (/*reset=*/false); + /* Depending on the resolution of DR 1172, we may later need to distinguish instantiation-dependent but not type-dependent expressions so that, say, A::U doesn't require 'typename'. */ @@ -11232,7 +11234,14 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p, } else if (processing_template_decl) { - expr = instantiate_non_dependent_expr_sfinae (expr, complain); + /* Instantiate the non-dependent operand to diagnose any ill-formed +expressions. And keep processing_template_decl cleared for the rest +of the function (for sake of the call to lvalue_kind below, which +handles templated and non-templated COND_EXPR differently). */ + processing_template_decl = 0; + /* Since we want to do this even for non-constant expressions, we +use i_n_d_e_internal here instead of _sfinae. */ + expr = instantiate_non_dependent_expr_internal (expr, complain); if (expr == error_mark_node) return error_mark_node; } diff --git a/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C new file mode 100644 index 000..bd9fd2eb6f9 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/Wnarrowing19.C @@ -0,0 +1,8 @@ +// PR c++/104823 +// { dg-do compile { target c++11 } } + +struct S { S(int); }; + +double id(double v); + +template auto f(double v) -> decltype(S{id(v)}); // { dg-error "narrowing" }
Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]
On 3/7/22 10:47, Patrick Palka wrote: On Fri, 4 Mar 2022, Jason Merrill wrote: On 3/4/22 14:24, Patrick Palka wrote: Here we're failing to communicate to cp_finish_decl from tsubst_expr that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING flag), which causes do_class_deduction to always consider explicit deduction guides when performing CTAD for a templated variable initializer. We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction can determine if we're in a copy-init context by simply inspecting the initializer, and thus render its flags parameter unnecessary, which is what this patch implements. (If we were to fix this in tsubst_expr instead, I think we'd have to inspect the initializer in the same way in order to detect a copy-init context?) Hmm, does this affect conversions as well? Looks like it does: struct A { explicit operator int(); }; template void f() { T t = A(); } int main() { f(); // wrongly accepted } The reverse, initializing via an explicit constructor, is caught by code in build_aggr_init much like the code your patch adds to do_auto_deduction; perhaps we should move/copy that code to cp_finish_decl? Ah, makes sense. Moving that code from build_aggr_init to cp_finish_decl broke things, but using it in both spots seems to work well. And I suppose we might as well use it in do_class_deduction too, since doing so lets us remove the flags parameter. Before removing the flags parameter please try asserting that it now matches is_copy_initialization and see if anything breaks. -- >8 -- Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137] Here we're failing to communicate to cp_finish_decl from tsubst_expr that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING flag), which causes us to always consider explicit deduction guides when performing CTAD for a templated variable initializer. It turns out this bug also affects consideration of explicit conversion operators for the same reason. But consideration of explicit constructors is unaffected and seems to work correctly thanks to code in build_aggr_init that sets LOOKUP_ONLYCONVERTING when the initializer represents copy-initialization. This patch factors out the copy-initialization test from build_aggr_init and reuses it in tsubst_expr for sake of cp_finish_decl. And we might as well use it in do_class_deduction too, since doing so lets us get rid of its flags parameter (which is used only to control whether explicit deduction guides are considered). Bootstrapped and regtestd on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/102137 PR c++/87820 gcc/cp/ChangeLog: * cp-tree.h (is_copy_initialization): Declare. (do_auto_deduction): Remove flags parameter. * decl.cc (cp_finish_decl): Adjust call to do_auto_deduction. * init.cc (build_aggr_init): Split out copy-initialization check into ... (is_copy_initialization): ... here. * pt.cc (convert_template_argument): Adjust call to do_auto_deduction. (tsubst_expr) : Pass LOOKUP_ONLYCONVERTING to cp_finish_decl when is_copy_initialization. (do_class_deduction): Remove flags parameter, and instead call is_copy_initialization to determine if we're in a copy-init context. (do_auto_deduction): Adjust call to do_class_deduction. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/explicit15.C: New test. * g++.dg/cpp1z/class-deduction108.C: New test. --- gcc/cp/cp-tree.h | 4 +-- gcc/cp/decl.cc| 2 +- gcc/cp/init.cc| 20 +--- gcc/cp/pt.cc | 23 -- gcc/testsuite/g++.dg/cpp0x/explicit15.C | 31 +++ .../g++.dg/cpp1z/class-deduction108.C | 31 +++ 6 files changed, 93 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index ac723901098..be556d2c441 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7039,6 +7039,7 @@ extern void emit_mem_initializers (tree); extern tree build_aggr_init (tree, tree, int, tsubst_flags_t); extern int is_class_type (tree, int); +extern bool is_copy_initialization (tree); extern tree build_zero_init (tree, tree, bool); extern tree build_value_init (tree, tsubst_flags_t); extern tree build_value_init_noctor (tree, tsubst_flags_t); @@ -7279,8 +7280,7 @@ extern tree do_auto_deduction (tree, tree, tree,
[committed] analyzer: fix leak suppression at end of 'main' [PR101983]
PR analyzer/101983 reports what I thought were false positives from -Wanalyzer-malloc-leak, but on closer inspection, the analyzer is correctly reporting heap-allocated buffers that are no longer reachable. However, these "leaks" occur at the end of "main". The analyzer already has some logic to avoid reporting leaks at the end of main, where the leak is detected at the end of the EXIT basic block. However, in this case, the leak is detected at the clobber in BB 2 here: : func (&res); res ={v} {CLOBBER(eol)}; _4 = 0; : : return _4; where we have a chain BB 2 -> BB 3 -> EXIT BB. This patch generalizes the "are we at the end of 'main'" detection to handle such cases, silencing -Wanalyzer-malloc-leak on them. There's a remaining issue where the analyzer unhelpfully describes one of the leaking values as '', rather than 'res.a', but I'm leaving that for a followup (covered by PR analyzer/99771). Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r12-7525-g0af37ad4422052be4b7f779737e14c80e57d0ad9. gcc/analyzer/ChangeLog: PR analyzer/101983 * engine.cc (returning_from_function_p): New. (impl_region_model_context::on_state_leak): Use it when rejecting leaks at the return from "main". gcc/testsuite/ChangeLog: PR analyzer/101983 * gcc.dg/analyzer/pr101983-main.c: New test. * gcc.dg/analyzer/pr101983-not-main.c: New test. Signed-off-by: David Malcolm --- gcc/analyzer/engine.cc| 48 ++- gcc/testsuite/gcc.dg/analyzer/pr101983-main.c | 38 +++ .../gcc.dg/analyzer/pr101983-not-main.c | 40 3 files changed, 124 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101983-main.c create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101983-not-main.c diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index c910d0519d8..bcea3ca4dfd 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -740,6 +740,51 @@ readability_comparator (const void *p1, const void *p2) return 0; } +/* Return true is SNODE is the EXIT node of a function, or is one + of the final snodes within its function. + + Specifically, handle the final supernodes before the EXIT node, + for the case of clobbers that happen immediately before exiting. + We need a run of snodes leading to the return_p snode, where all edges are + intraprocedural, and every snode has just one successor. + + We use this when suppressing leak reports at the end of "main". */ + +static bool +returning_from_function_p (const supernode *snode) +{ + if (!snode) +return false; + + unsigned count = 0; + const supernode *iter = snode; + while (true) +{ + if (iter->return_p ()) + return true; + if (iter->m_succs.length () != 1) + return false; + const superedge *sedge = iter->m_succs[0]; + if (sedge->get_kind () != SUPEREDGE_CFG_EDGE) + return false; + iter = sedge->m_dest; + + /* Impose a limit to ensure we terminate for pathological cases. + +We only care about the final 3 nodes, due to cases like: + BB: +(clobber causing leak) + + BB: + : + return _val; + + EXIT BB.*/ + if (++count > 3) + return false; +} +} + /* Find the best tree for SVAL and call SM's on_leak vfunc with it. If on_leak returns a pending_diagnostic, queue it up to be reported, so that we potentially complain about a leak of SVAL in the given STATE. */ @@ -794,8 +839,7 @@ impl_region_model_context::on_state_leak (const state_machine &sm, gcc_assert (m_enode_for_diag); /* Don't complain about leaks when returning from "main". */ - if (m_enode_for_diag->get_supernode () - && m_enode_for_diag->get_supernode ()->return_p ()) + if (returning_from_function_p (m_enode_for_diag->get_supernode ())) { tree fndecl = m_enode_for_diag->get_function ()->decl; if (id_equal (DECL_NAME (fndecl), "main")) diff --git a/gcc/testsuite/gcc.dg/analyzer/pr101983-main.c b/gcc/testsuite/gcc.dg/analyzer/pr101983-main.c new file mode 100644 index 000..a84353be35a --- /dev/null +++ b/gcc/testsuite/gcc.dg/analyzer/pr101983-main.c @@ -0,0 +1,38 @@ +/* { dg-additional-options "-Wno-analyzer-too-complex -fno-analyzer-call-summaries" } */ + +#include +#include +#include + +struct list { + struct list* next; + void *a; +}; + +void func(struct list **res) +{ + struct list *cur = NULL; + do { + struct list *n = malloc(sizeof(struct list)); + void *a = malloc(1); + if (n == NULL || a == NULL) { + if (n != NULL) free(n); + if (a != NULL) free(a); + break; + } + + if (cur == NULL) { + *res = cur = n; + } else { +
Re: [Patch] Fortran: Fix gfc_maybe_dereference_var [PR104430]
Hi Tobias, Am 07.03.22 um 15:01 schrieb Tobias Burnus: The problem is that inside the main program, y = f(z) where the the result of z is type(t) :: z(size(x%a)) and 'x' is a dummy argument. 'x' looses the attr.dummy in gfc_add_interface_mapping and this leads to an additional indirect ref in gfc_maybe_dereference_var - but after the first indirect ref, TREE_TYPE is alread a RECORD_TYPE ... The simple fix is to simply punt when the argument is not a pointer or reference. This patch additionally fixes a comment in conv_parent_component_references. LGTM. Regarding the commit message, there should be a period at the end of >(gfc_maybe_dereference_var): Avoid derefereing a nonpointer I think there are other PRs which profit from this fix. Can you please have a look at PR99585, and in particular the link in comment#0? ;-) Those parts are obvious. The only potentially risky change is the comparison change from a name-wise comparison to a pointer-based comparison. I think it is fine and the testsuite did not find any issue, but you prefer, I can also exclude it. I was thinking about this, too. But your change will prevent running into trouble in the (unlikely?) case c being a NULL pointer. OK for mainline? How about GCC 10/11 backports? (I think leaving out the last change, it should be very safe to do.) OK, as this is both an ICE-on-valid and a regression. Thanks for the patch! Harald Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH v8 04/12] LoongArch Port: Machine description files.
On Fri, 2022-03-04 at 15:18 +0800, xucheng...@loongson.cn wrote: > * config/loongarch/loongarch.md: New file. An ICE happens building OpenSSH-8.9p1. Investigation shows it's caused by the flag "-fzero-call-used-regs=". It's because the compiler attempts to clear FCCx registers but can't figure out how. This flag also triggers ICE for other targets (for example, PR 104820 for MIPS), and the related tests (zero-scratch-regs-{8,9,10,11}.c) are marked dg-skip for many targets. But it's unfortunate that packages like OpenSSH have already start to use this flag... I guess they just enabled it once they saw it was working for i386 :(. So it's better to solve the problem for a new target. A "quick fix" is adding an insn to clear FCCx. This is enough to build OpenSSH and make zero-scratch-regs-{8,9,10,11}.c PASS. diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index a9a8bc4b038..76c5ded9fe4 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2020,6 +2020,12 @@ DONE; }) +;; Clear one FCC register +(define_insn "movfcc" [(set (match_operand:FCC 0 "register_operand" "=z") + (const_int 0))] + "" + "movgr2cf\t%0,$r0") + ;; Conditional move instructions. (define_insn "*sel_using_" -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]
Bootstrapped and regtested on mips64el-linux-gnuabi64. I'm not sure if it's "correct" to clobber other registers during the zeroing of scratch registers. But I can't really come up with a better idea: on MIPS there is no simple way to clear one bit in FCSR (i. e. FCC[x]). We can't just use "c.f.s $fccx,$f0,$f0" because it will raise an exception if $f0 contains a sNaN. -- This fixes the ICEs using -fzero-call-used-regs=all for MIPS target. OpenSSH-8.9p1 has started to enable this by default, giving us a reason to fix -fzero-call-used-regs for more targets. gcc/ PR target/104817 PR target/104820 * config/mips/mips.cc (mips_zero_call_used_regs): New function. (TARGET_ZERO_CALL_USED_REGS): Define. * config/mips/mips.md (mips_zero_fcc): New insn. gcc/testsuite * c-c++-common/zero-scratch-regs-8.c: Enable for MIPS. * c-c++-common/zero-scratch-regs-9.c: Likewise. * c-c++-common/zero-scratch-regs-10.c: Likewise. * c-c++-common/zero-scratch-regs-11.c: Likewise. --- gcc/config/mips/mips.cc | 47 +++ gcc/config/mips/mips.md | 6 +++ .../c-c++-common/zero-scratch-regs-10.c | 2 +- .../c-c++-common/zero-scratch-regs-11.c | 2 +- .../c-c++-common/zero-scratch-regs-8.c| 2 +- .../c-c++-common/zero-scratch-regs-9.c| 2 +- 6 files changed, 57 insertions(+), 4 deletions(-) diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc index 4f9683e8bf4..f0d6c8f564c 100644 --- a/gcc/config/mips/mips.cc +++ b/gcc/config/mips/mips.cc @@ -22611,6 +22611,51 @@ mips_asm_file_end (void) if (NEED_INDICATE_EXEC_STACK) file_end_indicate_exec_stack (); } + +static HARD_REG_SET +mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs) +{ + HARD_REG_SET zeroed_hardregs; + CLEAR_HARD_REG_SET (zeroed_hardregs); + + if (TEST_HARD_REG_BIT (need_zeroed_hardregs, HI_REGNUM)) +{ + /* Clear HI and LO altogether. MIPS target treats HILO as a +double-word register. */ + machine_mode dword_mode = TARGET_64BIT ? TImode : DImode; + rtx hilo = gen_rtx_REG (dword_mode, MD_REG_FIRST); + rtx zero = CONST0_RTX (dword_mode); + emit_move_insn (hilo, zero); + + SET_HARD_REG_BIT (zeroed_hardregs, HI_REGNUM); + if (TEST_HARD_REG_BIT (need_zeroed_hardregs, LO_REGNUM)) + SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM); + else + emit_clobber (gen_rtx_REG (word_mode, LO_REGNUM)); +} + + bool zero_fcc = false; + for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++) +if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i)) + zero_fcc = true; + + /* MIPS does not have a simple way to clear one bit in FCC. We just + clear FCC with ctc1 and clobber all FCC bits. */ + if (zero_fcc) +{ + emit_insn (gen_mips_zero_fcc ()); + for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++) + if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i)) + SET_HARD_REG_BIT (zeroed_hardregs, i); + else + emit_clobber (gen_rtx_REG (CCmode, i)); +} + + need_zeroed_hardregs &= ~zeroed_hardregs; + return zeroed_hardregs | +default_zero_call_used_regs (need_zeroed_hardregs); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP @@ -22919,6 +22964,8 @@ mips_asm_file_end (void) #undef TARGET_ASM_FILE_END #define TARGET_ASM_FILE_END mips_asm_file_end +#undef TARGET_ZERO_CALL_USED_REGS +#define TARGET_ZERO_CALL_USED_REGS mips_zero_call_used_regs struct gcc_target targetm = TARGET_INITIALIZER; diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md index e0f0a582732..edf58710cdd 100644 --- a/gcc/config/mips/mips.md +++ b/gcc/config/mips/mips.md @@ -96,6 +96,7 @@ (define_c_enum "unspec" [ ;; Floating-point environment. UNSPEC_GET_FCSR UNSPEC_SET_FCSR + UNSPEC_ZERO_FCC ;; HI/LO moves. UNSPEC_MFHI @@ -7670,6 +7671,11 @@ (define_insn "*mips_set_fcsr" "TARGET_HARD_FLOAT" "ctc1\t%0,$31") +(define_insn "mips_zero_fcc" + [(unspec_volatile [(const_int 0)] UNSPEC_ZERO_FCC)] + "TARGET_HARD_FLOAT" + "ctc1\t$0,$25") + ;; See tls_get_tp_mips16_ for why this form is used. (define_insn "mips_set_fcsr_mips16_" [(unspec_volatile:SI [(match_operand:P 0 "call_insn_operand" "dS") diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c index 96e0b79b328..c23b2ceb391 100644 --- a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* nvptx*-*-* s390*-*-* } } } */ +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* nvptx*-*-* s390*-*-* mips*-*-* } } } */ /* { dg-options "-O2" } */ #include diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c b/gcc/test
Re: [PATCH] rs6000: Improve .machine
This breaks the build of libgcc for powerpc-linux-gnu (32-bit, default CPU; configured --disable-multilib --enable-secureplt). cc1: warning: The '-mfloat128' option may not be fully supported /tmp/ccHCPzSh.s: Assembler messages: /tmp/ccHCPzSh.s:163: Error: unrecognized opcode: `lfiwzx' /scratch/jmyers/glibc-bot/src/gcc/libgcc/shared-object.mk:14: recipe for target '_kf_to_sd.o' failed make[3]: *** [_kf_to_sd.o] Error 1 https://sourceware.org/pipermail/libc-testresults/2022q1/009421.html -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] rs6000: Fix up __SIZEOF_{FLOAT, IBM}128__ defines [PR99708]
Hi! On Sat, Mar 05, 2022 at 09:21:51AM +0100, Jakub Jelinek wrote: > As mentioned in the PR, right now on powerpc* __SIZEOF_{FLOAT,IBM}128__ > macros are predefined unconditionally, because {ieee,ibm}128_float_type_node > is always non-NULL, doesn't reflect whether __ieee128 or __ibm128 are > actually supported or not. > > The following patch: > 1) makes those {ieee,ibm}128_float_type_node trees NULL if we don't >really support them instead of equal to long_double_type_node > 2) adjusts the builtins code to use >ibm128_float_type_node ? ibm128_float_type_node : long_double_type_node >for the 2 builtins, so that we don't ICE during builtins creation >if ibm128_float_type_node is NULL (using error_mark_node instead of >long_double_type_node sounds more risky to me) I feel the opposite way: (potentially) using the wrong thing is just a ticking time bomb, never "safer". > 3) in rs6000_type_string will not match NULL as __ibm128, and adds >a __ieee128 case > 4) removes the clearly unused ptr_{ieee,ibm}128_float_type_node trees; >if something needs them in the future, they can be easily added back, >but wasting GTY just in case... > 5) actually syncs __SIZEOF_FLOAT128__ with the __float128 macro being >defined in addition to __ieee128 being usable > > Now, in the PR Segher said he doesn't like 5), but I think it is better > to match the reality and get this PR fixed and if we want to rethink > how __float128 is defined (whether it is a macro, or perhaps another > builtin type like __ieee128 which could be easily done by >lang_hooks.types.register_builtin_type (ieee128_float_type_node, > "__ieee128"); >lang_hooks.types.register_builtin_type (ieee128_float_type_node, > "__float128"); > perhaps under some conditions, rethink if the -mfloat128 option exists > and what it does etc., it can be done incrementally and the rs6000-c.cc > hunks in the patch can be easily reverted (appart from the formatting > fix). There needs to be a __SIZEOF_IEEE128__ as well, if you like reality :-) Sorry I did not pick up on that earlier. The basic types, that always exist if they are supported at all, are __ibm128 and __ieee128. Long double picks one of those, or some other type; and __float128 is a source of confusion (it tries to make things more similar to x86, but things are *not* similar anyway :-( ) Any code that cares about the actual mode used, or that supports more than one type, should use the explicit types. Code that just wants what "long double" stands for can just use that, of course. > Bootstrapped/regtested on powerpc{64le,64}-linux, on powerpc64-linux > with -m32/-m64 testing, ok for trunk? Tested with which -mcpu=? You need at least power7 to get __ieee128 support, but it should be tested *without* that as well. (The actual default for powerpc64-linux is power4 (so not even 970), and for powerpc-linux it is 603 or such.) > * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define > __SIZEOF_FLOAT128__ here and only if __float128 macro is defined. (if and only if?) > * config/rs6000/rs6000-gen-builtins.cc (TYPE_MAP_SIZE): Set to > 85 instead of 86. This should be auto-generated, or just not exist at all ("sizeof type_map / sizeof type_map[0]" in the one place it is used, instead -- "one place" is after removing it from the definition of the array of course, where it is unneeded :-) ) > static > const char *rs6000_type_string (tree type_node) > { > + if (type_node == NULL_TREE) > +return "unknown"; Please say this is NULL? If you want to indicate it is problematic, you can say "***NULL***" or such? > +{ "if", "ibm128_float_type_node " > + "? ibm128_float_type_node " > + ": long_double" }, I don't think that is correct. If there is no __ibm128, there is no IFmode either (they are two sides of the same coin). Using DFmode instead (which is the only thing that "long double" could be if not IFmode) will just ICE later, if this is used at all. So best case this will confuse the reader. Segher
Re: [PATCH] rs6000: Improve .machine
On Mon, Mar 07, 2022 at 08:53:42PM +, Joseph Myers wrote: > This breaks the build of libgcc for powerpc-linux-gnu (32-bit, default > CPU; configured --disable-multilib --enable-secureplt). > > cc1: warning: The '-mfloat128' option may not be fully supported > /tmp/ccHCPzSh.s: Assembler messages: > /tmp/ccHCPzSh.s:163: Error: unrecognized opcode: `lfiwzx' > /scratch/jmyers/glibc-bot/src/gcc/libgcc/shared-object.mk:14: recipe for > target '_kf_to_sd.o' failed > make[3]: *** [_kf_to_sd.o] Error 1 > > https://sourceware.org/pipermail/libc-testresults/2022q1/009421.html Thanks for the report! This is now PR104829. Segher
Re: [PATCH v3] libgo: Don't use pt_regs member in mcontext_t
On Sun, Mar 6, 2022 at 11:11 PM wrote: > > +#ifdef __PPC64__ > + ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32]; > +#else > + ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32]; > +#endif Have you tested this in 32-bit mode? It does not look correct based on the glibc definitions. Looking at glibc it seems that it ought to be reg.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32]; Ian
Re: [pushed] c++: (*(fn))() [PR104618]
On 2/27/22 18:52, Jason Merrill wrote: The patch for PR90451 deferred marking to the point of actual use; we missed this one because of the parens. Tested x86_64-pc-linux-gnu, applying to trunk. PR c++/104618 gcc/cp/ChangeLog: * typeck.cc (cp_build_addr_expr_1): Also maybe_undo_parenthesized_ref. gcc/testsuite/ChangeLog: * g++.dg/overload/paren1.C: New test. --- gcc/cp/typeck.cc | 6 +++--- gcc/testsuite/g++.dg/overload/paren1.C | 7 +++ 2 files changed, 10 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.dg/overload/paren1.C diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index f796337f73c..bddc83759ad 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -6884,9 +6884,9 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue, tsubst_flags_t complain) so we can just form an ADDR_EXPR with the correct type. */ if (processing_template_decl || TREE_CODE (arg) != COMPONENT_REF) { - tree stripped_arg = tree_strip_any_location_wrapper (arg); - if (TREE_CODE (stripped_arg) == FUNCTION_DECL - && !mark_used (stripped_arg, complain) && !(complain & tf_error)) + tree stripped_arg + = tree_strip_any_location_wrapper (maybe_undo_parenthesized_ref (arg)); + if (!mark_single_function (stripped_arg, complain)) It occurred to me that it would be cleaner to do the stripping in mark_single_function. From efea883ae8a912f9e1081abf51b1e581232e5f30 Mon Sep 17 00:00:00 2001 From: Jason Merrill Date: Mon, 28 Feb 2022 12:05:34 -0400 Subject: [PATCH] c++: tweak to (*(fn))() patch [PR104618] To: gcc-patches@gcc.gnu.org Other callers of mark_single_function might also want to look through these wrapapers. PR c++/104618 gcc/cp/ChangeLog: * decl2.cc (mark_single_function): Look through parens and location wrapper. * typeck.cc (cp_build_addr_expr_1): Not here. --- gcc/cp/decl2.cc | 3 +++ gcc/cp/typeck.cc | 4 +--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index 22edc2ba7f9..e85bb87c955 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -5737,6 +5737,9 @@ decl_dependent_p (tree decl) bool mark_single_function (tree expr, tsubst_flags_t complain) { + expr = maybe_undo_parenthesized_ref (expr); + expr = tree_strip_any_location_wrapper (expr); + if (is_overloaded_fn (expr) == 1 && !mark_used (expr, complain) && (complain & tf_error)) diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index bddc83759ad..ea6a485bbbf 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -6884,9 +6884,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue, tsubst_flags_t complain) so we can just form an ADDR_EXPR with the correct type. */ if (processing_template_decl || TREE_CODE (arg) != COMPONENT_REF) { - tree stripped_arg - = tree_strip_any_location_wrapper (maybe_undo_parenthesized_ref (arg)); - if (!mark_single_function (stripped_arg, complain)) + if (!mark_single_function (arg, complain)) return error_mark_node; val = build_address (arg); if (TREE_CODE (arg) == OFFSET_REF) -- 2.27.0
[PATCH] RISCV: Strengthen libatomic lrsc pairs
Currently, libatomic's _sync_fetch_and_#op# and __sync_val_compare_and_swap methods are not sufficiently strong for the ATOMIC_SEQ_CST memory model. This can be shown using the following Herd litmus test: RISCV LRSC-LIB-CALL { 0:x6=a; 0:x8=b; 0:x10=c; 1:x6=a; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x1,x0,1 | lw x9,0(x10) ; sw x1,0(x10)| fence rw,rw ; lr.w.aq x7,0(x8)| lw x5,0(x6) ; ori x7,x0,1 | ; sc.w.rl x3,x7,0(x8) | ; sw x1,0(x6) | ; exists (1:x9=1 /\ 1:x5=0 /\ b=1) This patch enforces SEQ_CST by setting the .aqrl bits on the LR and SC ops. 2022-03-07 Patrick O'Neill PR target/104831 * atomic.c: Change LR.aq/SC.rl pairs into sequentially consistent LR.aqrl/SC.aqrl pair. Signed-off-by: Patrick O'Neill --- RISCV LRSC-BUGFIX { 0:x6=a; 0:x8=b; 0:x10=c; 1:x6=a; 1:x8=b; 1:x10=c; } P0| P1 ; ori x1,x0,1 | lw x9,0(x10) ; sw x1,0(x10) | fence rw,rw ; lr.w.aqrl x7,0(x8)| lw x5,0(x6) ; ori x7,x0,1 | ; sc.w.aqrl x3,x7,0(x8) | ; sw x1,0(x6) | ; ~exists (1:x9=1 /\ 1:x5=0 /\ b=1) --- Below are other Herd litmus tests used to test the LR.aqrl/SC.aqrl fix. --- RISCV LRSC-READ (* LR/SC with .aq .rl bits does not allow read operations to be reordered within/above it. *) { 0:x6=a; 0:x8=b; 1:x6=a; 1:x8=b; } P0 | P1 ; lr.w.aq.rl x7,0(x8)| ori x1,x0,1 ; ori x7,x0,1| sw x1,0(x6) ; sc.w.aq.rl x1,x7,0(x8) | fence rw,rw ; lw x5,0(x6)| lw x7,0(x8) ; ~exists (0:x5=0 /\ 1:x7=0 /\ b=1) --- RISCV READ-LRSC (* LR/SC with .aq .rl bits does not allow read operations to be reordered within/beneath it. *) { 0:x6=a; 0:x8=b; 1:x6=a; 1:x8=b; } P0 | P1 ; lw x5,0(x6)| ori x1,x0,1 ; lr.w.aq.rl x7,0(x8)| sw x1,0(x8) ; ori x1,x0,1| fence rw,rw ; sc.w.aq.rl x1,x1,0(x8) | sw x1,0(x6) ; ~exists (0:x5=1 /\ 0:x7=0 /\ b=1) --- RISCV LRSC-WRITE (* LR/SC with .aq .rl bits does not allow write operations to be reordered within/above it. *) { 0:x8=b; 0:x10=c; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x9,x0,1| lw x9,0(x10); lr.w.aq.rl x7,0(x8)| fence rw,rw ; ori x7,x0,1| lw x7,0(x8) ; sc.w.aq.rl x1,x7,0(x8) | ; sw x9,0(x10) | ; ~exists (1:x9=1 /\ 1:x7=0 /\ b=1) --- RISCV WRITE-LRSC (* LR/SC with .aq .rl bits does not allow write operations to be reordered within/beneath it. *) { 0:x8=b; 0:x10=c; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x1,x0,1| ori x1,x0,1 ; sw x1,0(x10) | sw x1,0(x8) ; lr.w.aq.rl x7,0(x8)| fence rw,rw ; sc.w.aq.rl x1,x1,0(x8) | lw x9,0(x10) ; ~exists (0:x7=0 /\ 1:x9=0 /\ b=1) --- libgcc/config/riscv/atomic.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c index 7007e7a20e4..0c85a6d00ea 100644 --- a/libgcc/config/riscv/atomic.c +++ b/libgcc/config/riscv/atomic.c @@ -39,13 +39,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see unsigned old, tmp1, tmp2; \ \ asm volatile ("1:\n\t" \ - "lr.w.aq %[old], %[mem]\n\t" \ + "lr.w.aqrl %[old], %[mem]\n\t"\ #insn " %[tmp1], %[old], %[value]\n\t"\ invert\ "and %[tmp1], %[tmp1], %[mask]\n\t" \ "and %[tmp2], %[old], %[not_mask]\n\t"\ "or %[tmp2], %[tmp2], %[tmp1]\n\t"\ - "sc.w.rl %[tmp1], %[tmp2], %[mem]\n\t"\ + "sc.w.aqrl %[tmp1], %[tmp2], %[mem]\n\t" \ "bnez %[tmp1], 1b"\ : [old] "=&r" (old), \ [mem] "+A" (*(volatile unsigned*) aligned_addr),\ @@ -73,12 +73,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see unsigned old, tmp1; \ \ asm volatile ("1:\n\t" \ - "lr.w.aq %[old], %[mem]\n\t" \ + "lr.w.aqrl %[old], %[mem]\n\t" \ "and %[tmp1], %[old], %[mask]\n\t"\ "
Re: [C++ PATCH] PR c++/96437: ICE-on-invalid-code error recovery.
On 3/7/22 08:33, Roger Sayle wrote: This patch fixes PR c++/96437 which is an ICE-on-invalid-code regression affecting mainline. This patch has been tested on x86_64-pc-linux-gnu, enabling languages c and c++, with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-07 Roger Sayle gcc/cp/ChangeLog PR c++/96437 * parser.cc (synthesize_implicit_template_parm): Check that TREE_VALUE (new_parm) isn't error_mark_node before setting its DECL_VIRTUAL_P. gcc/testsuite/ChangeLog PR c++/96437 * g++.dg/pr96437.C: New test case. + if (TREE_VALUE (new_parm) != error_mark_node) +DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true; Hmm, I wonder about returning early if there was an error, but this is fine as is. +/* { dg-options "-O2" } */ This also seems unneeded for this test. Also, please put tests in a subdirectory of g++.dg, not the top directory. This one might go in cpp2a/ since it's a C++20 feature. OK with those adjustments. Jason
Re: [C++ PATCH] PR c++/96329: ICE-on-invalid-code error recovery.
On 3/7/22 08:04, Roger Sayle wrote: This patch fixes PR c++/96329 which is an ICE-on-invalid-code regression affecting mainline. This patch has been tested on x86_64-pc-linux-gnu, enabling languages c and c++, with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-07 Roger Sayle gcc/cp/ChangeLog PR c++/96329 * parser.cc (cp_parser_linkage_specification): Treat the case where linkage is error_mark_node as "invalid linkage-specification". gcc/testsuite/ChangeLog PR c++/96329 * g++.dg/pr96329.C: New test case. +/* { dg-options "-O2" } */ Is this necessary for the failure? That would be surprising for a parser issue. And this should probably go in g++.dg/template. OK with those adjustments. Jason
Re: [C++ PATCH] PR c++/96440: ICE-on-invalid-code error recovery.
On 3/7/22 08:55, Roger Sayle wrote: This patch fixes PR c++/96440 which is an ICE-on-invalid-code regression affecting mainline. This patch has been tested on x86_64-pc-linux-gnu, enabling languages c and c++, with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-07 Roger Sayle gcc/cp/ChangeLog PR c++/96440 * decl.cc (start_decl): Defend against prefix_attributes being error_mark_node. gcc/testsuite/ChangeLog PR c++/96440 * g++.dg/pr96440.C: New test case. +/* { dg-options "-O2" } */ Please remove this line and put the test in g++.dg/cpp0x/ OK with those changes. Jason
[PATCH] rtl: ICE with thread_local and inline asm [PR104777]
In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can jump to a label, the insn should be a JUMP_INSN. However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC doesn't guarantee that the rtx has any actual labels it can branch to. Here, the rtvec has 0 elements because of the __thread variable: we perform ix86_rewrite_tls_address which calls copy_isns and that allocates the rtvec: XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i)); This causes an ICE in update_br_prob_note: BRANCH_EDGE (bb) crashes because there's no branch edge. I think we can fix this by checking that there is at least one label the asm can jump to before wrapping the ASM_OPERANDS in a JUMP_INSN. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11? PR rtl-optimization/104777 gcc/ChangeLog: * rtl.cc (classify_insn): For ASM_OPERANDS, return JUMP_INSN only if ASM_OPERANDS_LABEL_VEC has at least one element. gcc/testsuite/ChangeLog: * gcc.dg/torture/tls/pr104777.c: New test. --- gcc/rtl.cc | 7 +++-- gcc/testsuite/gcc.dg/torture/tls/pr104777.c | 30 + 2 files changed, 35 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/tls/pr104777.c diff --git a/gcc/rtl.cc b/gcc/rtl.cc index f17474bfee1..a29d5e1371e 100644 --- a/gcc/rtl.cc +++ b/gcc/rtl.cc @@ -765,7 +765,9 @@ classify_insn (rtx x) return CALL_INSN; if (ANY_RETURN_P (x)) return JUMP_INSN; - if (GET_CODE (x) == ASM_OPERANDS && ASM_OPERANDS_LABEL_VEC (x)) + if (GET_CODE (x) == ASM_OPERANDS + && ASM_OPERANDS_LABEL_VEC (x) + && GET_NUM_ELEM (ASM_OPERANDS_LABEL_VEC (x)) > 0) return JUMP_INSN; if (GET_CODE (x) == SET) { @@ -794,7 +796,8 @@ classify_insn (rtx x) if (has_return_p) return JUMP_INSN; if (GET_CODE (XVECEXP (x, 0, 0)) == ASM_OPERANDS - && ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0))) + && ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0)) + && GET_NUM_ELEM (ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0))) > 0) return JUMP_INSN; } #ifdef GENERATOR_FILE diff --git a/gcc/testsuite/gcc.dg/torture/tls/pr104777.c b/gcc/testsuite/gcc.dg/torture/tls/pr104777.c new file mode 100644 index 000..abaf59731fc --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/tls/pr104777.c @@ -0,0 +1,30 @@ +/* PR rtl-optimization/104777 */ +/* { dg-do compile } */ +/* { dg-require-effective-target tls } */ + +int savestate_r; +int savestate_ssb; +extern void abort(); +__thread int loop; +void f (void) +{ + int savestate_r0_5; + int savestate_r1_6; + + __asm__("" : "=m" (savestate_ssb), "=r" (savestate_r)); + savestate_r0_5 = savestate_r; + if (savestate_r0_5 == 0) + { +__asm__ __volatile__("" : : "m" (loop)); +abort (); + } + + __asm__("" : "=m" (savestate_ssb), "=r" (savestate_r)); + savestate_r1_6 = savestate_r; + if (savestate_r1_6 != 0) +return; + + __asm__ __volatile__("" : : "m" (loop)); + abort (); + +} base-commit: b7dbe870bc0193c76345f893d94c5bf6c99a6afe -- 2.35.1
Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]
On Mon, Mar 07, 2022 at 07:03:17PM -0500, Marek Polacek via Gcc-patches wrote: > In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can > jump to a label, the insn should be a JUMP_INSN. > > However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC > doesn't guarantee that the rtx has any actual labels it can branch to. But it should. > Here, the rtvec has 0 elements because of the __thread variable: we perform > ix86_rewrite_tls_address which calls copy_isns and that allocates the rtvec: > > XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i)); So fix *that* instead? Everywhere else does not use length zero RTL vectors. copy_rtx makes sure to do the right thing here, for example. We do not have notation to create zero-length vectors in RTL source code either, btw.: case 'V': /* 'V' is an optional vector: if a closeparen follows, just store NULL for this element. */ (optional vectors are at the end of an RTX), and if you write [] you will hit fatal_with_file_and_line ("vector must have at least one element"); Segher
Re: [PATCH] [i386] Don't fold builtin into gimple when isa mismatches.
ping^1 On Fri, Feb 25, 2022 at 1:51 PM Hongtao Liu wrote: > > On Fri, Feb 25, 2022 at 1:50 PM liuhongt wrote: > > > > The patch fixes ICE in ix86_gimple_fold_builtin. > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for main trunk? > > > gcc/ChangeLog: > > > > PR target/104666 > > * config/i386/i386-expand.cc > > (ix86_check_builtin_isa_match): New func. > > (ix86_expand_builtin): Move code to > > ix86_check_builtin_isa_match and call it. > > * config/i386/i386-protos.h > > (ix86_check_builtin_isa_match): Declare. > > * config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold > > builtin into gimple when isa mismatches. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr104666.c: New test. > > --- > > gcc/config/i386/i386-expand.cc | 97 ++-- > > gcc/config/i386/i386-protos.h| 5 ++ > > gcc/config/i386/i386.cc | 4 + > > gcc/testsuite/gcc.target/i386/pr104666.c | 49 > > 4 files changed, 115 insertions(+), 40 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr104666.c > > > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc > > index faa0191c6dd..1d132f0181d 100644 > > --- a/gcc/config/i386/i386-expand.cc > > +++ b/gcc/config/i386/i386-expand.cc > > @@ -12232,46 +12232,14 @@ ix86_expand_vec_set_builtin (tree exp) > >return target; > > } > > > > -/* Expand an expression EXP that calls a built-in function, > > - with result going to TARGET if that's convenient > > - (and in mode MODE if that's convenient). > > - SUBTARGET may be used as the target for computing one of EXP's operands. > > - IGNORE is nonzero if the value is to be ignored. */ > > - > > -rtx > > -ix86_expand_builtin (tree exp, rtx target, rtx subtarget, > > -machine_mode mode, int ignore) > > +/* Return true if the necessary isa options for this builtin exist, > > + else false. > > + fcode = DECL_MD_FUNCTION_CODE (fndecl); */ > > +bool > > +ix86_check_builtin_isa_match (unsigned int fcode, > > + HOST_WIDE_INT* pbisa, > > + HOST_WIDE_INT* pbisa2) > > { > > - size_t i; > > - enum insn_code icode, icode2; > > - tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); > > - tree arg0, arg1, arg2, arg3, arg4; > > - rtx op0, op1, op2, op3, op4, pat, pat2, insn; > > - machine_mode mode0, mode1, mode2, mode3, mode4; > > - unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl); > > - > > - /* For CPU builtins that can be folded, fold first and expand the fold. > > */ > > - switch (fcode) > > -{ > > -case IX86_BUILTIN_CPU_INIT: > > - { > > - /* Make it call __cpu_indicator_init in libgcc. */ > > - tree call_expr, fndecl, type; > > -type = build_function_type_list (integer_type_node, NULL_TREE); > > - fndecl = build_fn_decl ("__cpu_indicator_init", type); > > - call_expr = build_call_expr (fndecl, 0); > > - return expand_expr (call_expr, target, mode, EXPAND_NORMAL); > > - } > > -case IX86_BUILTIN_CPU_IS: > > -case IX86_BUILTIN_CPU_SUPPORTS: > > - { > > - tree arg0 = CALL_EXPR_ARG (exp, 0); > > - tree fold_expr = fold_builtin_cpu (fndecl, &arg0); > > - gcc_assert (fold_expr != NULL_TREE); > > - return expand_expr (fold_expr, target, mode, EXPAND_NORMAL); > > - } > > -} > > - > >HOST_WIDE_INT isa = ix86_isa_flags; > >HOST_WIDE_INT isa2 = ix86_isa_flags2; > >HOST_WIDE_INT bisa = ix86_builtins_isa[fcode].isa; > > @@ -12321,7 +12289,56 @@ ix86_expand_builtin (tree exp, rtx target, rtx > > subtarget, > >bisa |= OPTION_MASK_ISA_SSE2; > > } > > > > - if ((bisa & isa) != bisa || (bisa2 & isa2) != bisa2) > > + if (pbisa) > > +*pbisa = bisa; > > + if (pbisa2) > > +*pbisa2 = bisa2; > > + > > + return (bisa & isa) == bisa && (bisa2 & isa2) == bisa2; > > +} > > + > > +/* Expand an expression EXP that calls a built-in function, > > + with result going to TARGET if that's convenient > > + (and in mode MODE if that's convenient). > > + SUBTARGET may be used as the target for computing one of EXP's operands. > > + IGNORE is nonzero if the value is to be ignored. */ > > + > > +rtx > > +ix86_expand_builtin (tree exp, rtx target, rtx subtarget, > > +machine_mode mode, int ignore) > > +{ > > + size_t i; > > + enum insn_code icode, icode2; > > + tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); > > + tree arg0, arg1, arg2, arg3, arg4; > > + rtx op0, op1, op2, op3, op4, pat, pat2, insn; > > + machine_mode mode0, mode1, mode2, mode3, mode4; > > + unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl); > > + HOST_WIDE_INT bisa, bisa2; > > + > > + /* For CPU builtins that can be folded, fold first and expand the fold. > > */ > > + switch (fcode) > > +{ > > +case IX86_BUI
[PATCH v2] RISCV: Strengthen libatomic lrsc pairs
Currently, libatomic's _sync_fetch_and_#op# and __sync_val_compare_and_swap methods are not sufficiently strong for the ATOMIC_SEQ_CST memory model. This can be shown using the following Herd litmus test: RISCV LRSC-LIB-CALL { 0:x6=a; 0:x8=b; 0:x10=c; 1:x6=a; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x1,x0,1 | lw x7,0(x8) ; sw x1,0(x10)| fence rw,rw ; lr.w.aq x7,0(x8)| lw x5,0(x6) ; ori x7,x0,1 | ; sc.w.rl x3,x7,0(x8) | ; fence rw,w | ; sw x1,0(x6) | ; exists (1:x7=1 /\ 1:x5=0 /\ b=1) This patch enforces SEQ_CST by setting the .aqrl bits on the LR and .rl bits on SC ops. 2022-03-07 Patrick O'Neill PR target/104831 * atomic.c: Change LR.aq/SC.rl pairs into sequentially consistent LR.aqrl/SC.rl pair. Signed-off-by: Patrick O'Neill --- Changelog v2: - Weakened LR/SC pairs to be in-line with ISA manual. - Updated litmus tests to reflect the relevant leading fences present in the RISCV implementation. --- RISCV LRSC-BUGFIX { 0:x6=a; 0:x8=b; 0:x10=c; 1:x6=a; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x1,x0,1 | lw x7,0(x8) ; sw x1,0(x10)| fence rw,rw ; lr.w.aq x7,0(x8)| lw x5,0(x6) ; ori x7,x0,1 | ; sc.w.rl x3,x7,0(x8) | ; fence rw,w | ; sw x1,0(x6) | ; exists (1:x7=1 /\ 1:x5=0 /\ b=1) --- Below are other Herd litmus tests used to test the LR.aqrl/SC.aqrl fix. --- RISCV LRSC-READ (* LR/SC with .aq .rl bits does not allow read operations to be reordered within/above it. *) { 0:x6=a; 0:x8=b; 1:x6=a; 1:x8=b; } P0 | P1 ; lr.w.aq.rl x7,0(x8) | ori x1,x0,1 ; ori x7,x0,1 | sw x1,0(x6) ; sc.w.rl x1,x7,0(x8) | fence rw,rw ; fence rw,rw | lw x7,0(x8) ; lw x5,0(x6) | ; ~exists (0:x5=0 /\ 1:x7=0 /\ b=1) --- RISCV READ-LRSC (* LR/SC with .aq .rl bits does not allow read operations to be reordered within/beneath it. *) { 0:x6=a; 0:x8=b; 1:x6=a; 1:x8=b; } P0 | P1 ; lw x5,0(x6) | ori x1,x0,1 ; lr.w.aq.rl x7,0(x8) | sw x1,0(x8) ; ori x1,x0,1 | fence rw,rw ; sc.w.rl x1,x1,0(x8) | sw x1,0(x6) ; ~exists (0:x5=1 /\ 0:x7=0 /\ b=1) --- RISCV LRSC-WRITE (* LR/SC with .aq .rl bits does not allow write operations to be reordered within/above it. *) { 0:x8=b; 0:x10=c; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x9,x0,1 | lw x9,0(x10); lr.w.aq.rl x7,0(x8) | fence rw,rw ; ori x7,x0,1 | lw x7,0(x8) ; sc.w.rl x1,x7,0(x8) | ; fence rw,w | ; sw x9,0(x10)| ; ~exists (1:x9=1 /\ 1:x7=0 /\ b=1) --- RISCV WRITE-LRSC (* LR/SC with .aq .rl bits does not allow write operations to be reordered within/beneath it. *) { 0:x8=b; 0:x10=c; 1:x8=b; 1:x10=c; } P0 | P1 ; ori x1,x0,1 | ori x1,x0,1 ; sw x1,0(x10)| sw x1,0(x8) ; lr.w.aq.rl x7,0(x8) | fence rw,rw ; sc.w.rl x1,x1,0(x8) | lw x9,0(x10) ; ~exists (0:x7=0 /\ 1:x9=0 /\ b=1) --- libgcc/config/riscv/atomic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c index 7007e7a20e4..fa0e428963f 100644 --- a/libgcc/config/riscv/atomic.c +++ b/libgcc/config/riscv/atomic.c @@ -39,7 +39,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see unsigned old, tmp1, tmp2; \ \ asm volatile ("1:\n\t" \ - "lr.w.aq %[old], %[mem]\n\t" \ + "lr.w.aqrl %[old], %[mem]\n\t"\ #insn " %[tmp1], %[old], %[value]\n\t"\ invert\ "and %[tmp1], %[tmp1], %[mask]\n\t" \ @@ -73,7 +73,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see unsigned old, tmp1; \ \ asm volatile ("1:\n\t" \ - "lr.w.aq %[old], %[mem]\n\t" \ + "lr.w.aqrl %[old], %[mem]\n\t" \ "and %[tmp1], %[old], %[mask]\n\t"\ "bne %[tmp1], %[o], 1f\n\t" \ "and %[tmp1], %[old], %[not_mask]\n\t"\ -- 2.25.1
[PATCH] RISC-V: Handle combine extension in canonical ordering.
From: LiaoShihua The crypto extension have several shorthand extensions that don't consist of any extra instructions. Take zk for example, while the extension would imply zkn, zkr, zkt. The 3 extensions should also combine back into zk to maintain the canonical order in isa strings. This patch addresses the above. And if the other extension has the same situation, you can add them in riscv_combine_info[] gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::handle_combine_ext):Combine back into zk to maintain the canonical order in isa strings. (riscv_subset_list::parse):Ditto. * config/riscv/riscv-subset.h:Declare handle_combine_ext(); gcc/testsuite/ChangeLog: * gcc.target/riscv/predef-17.c: New test. --- gcc/common/config/riscv/riscv-common.cc| 56 +++ gcc/config/riscv/riscv-subset.h| 1 + gcc/testsuite/gcc.target/riscv/predef-17.c | 63 ++ 3 files changed, 120 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/predef-17.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index a904893b9ed..1c06f83cc1c 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -189,6 +189,16 @@ static const struct riscv_ext_version riscv_ext_version_table[] = {NULL, ISA_SPEC_CLASS_NONE, 0, 0} }; +/* Combine extensions defined in this table */ +static const struct riscv_ext_version riscv_combine_info[] = +{ + {"zk", ISA_SPEC_CLASS_NONE, 1, 0}, + {"zkn", ISA_SPEC_CLASS_NONE, 1, 0}, + {"zks", ISA_SPEC_CLASS_NONE, 1, 0}, + /* Terminate the list. */ + {NULL, ISA_SPEC_CLASS_NONE, 0, 0} +}; + static const riscv_cpu_info riscv_cpu_tables[] = { #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \ @@ -813,6 +823,50 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t *ext) } } +/* Check any combine extensions for EXT. */ +void +riscv_subset_list::handle_combine_ext (riscv_subset_list *subset_list) +{ + const riscv_ext_version *combine_info; + const riscv_implied_info_t *implied_info; + bool IsCombined = false; + + for (combine_info = &riscv_combine_info[0]; combine_info->name; ++combine_info) + { + +/* Skip if combine extensions are present */ +if (subset_list->lookup(combine_info->name)) + continue; + +/* Find all extensions of the combine extension */ +for (implied_info = &riscv_implied_info[0]; implied_info->ext; ++implied_info) +{ + /* Skip if implied extension don't match combine extension */ + if (strcmp(combine_info->name, implied_info->ext) != 0) +continue; + + if (subset_list->lookup(implied_info->implied_ext)) + { +IsCombined = true; + } + else + { +IsCombined = false; +break; + } +} + +/* Add combine extensions */ +if (IsCombined) +{ + if (subset_list->lookup(combine_info->name) == NULL) + { +subset_list->add (combine_info->name, combine_info->major_version, combine_info->minor_version, false, true); + } +} + } +} + /* Parsing function for multi-letter extensions. Return Value: @@ -992,6 +1046,8 @@ riscv_subset_list::parse (const char *arch, location_t loc) subset_list->handle_implied_ext (itr); } + subset_list->handle_combine_ext (subset_list); + return subset_list; fail: diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h index 4f3556a8d9b..da2e22d34f2 100644 --- a/gcc/config/riscv/riscv-subset.h +++ b/gcc/config/riscv/riscv-subset.h @@ -68,6 +68,7 @@ private: const char *); void handle_implied_ext (riscv_subset_t *); + void handle_combine_ext (riscv_subset_list *); public: ~riscv_subset_list (); diff --git a/gcc/testsuite/gcc.target/riscv/predef-17.c b/gcc/testsuite/gcc.target/riscv/predef-17.c new file mode 100644 index 000..68f5f95a66c --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/predef-17.c @@ -0,0 +1,63 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64i_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh_zkr_zkt -mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */ + +int main () { + +#ifndef __riscv_arch_test +#error "__riscv_arch_test" +#endif + +#if __riscv_xlen != 64 +#error "__riscv_xlen" +#endif + +#if !defined(__riscv_i) +#error "__riscv_i" +#endif + +#if !defined(__riscv_zk) +#error "__riscv_zk" +#endif + +#if !defined(__riscv_zkr) +#error "__riscv_zkr" +#endif + +#if !defined(__riscv_zkn) +#error "__riscv_zkn" +#endif + +#if !defined(__riscv_zks) +#error "__riscv_zks" +#endif + +#if !defined(__riscv_zbkb) +#error "__riscv_zbkb" +#endif + +#if !defined(__riscv_zbkc) +#error "__riscv_zbkc" +#endif + +#if !defined(__riscv_zbkx) +#error "__riscv_zbkx" +#endif + +#if !defined(__riscv_zknd) +#error "__riscv_zknd" +#endif + +#if !defined(__riscv_zkne) +#error "__riscv_zkne" +#endif
Re: [PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
On Mon, Mar 7, 2022 at 5:37 PM Richard Biener via Gcc-patches wrote: > > On Fri, Mar 4, 2022 at 8:27 AM liuhongt wrote: > > > > For parameter passing through stack, vectorized load from parm_decl > > in callee may trigger serious STF issue. This is why GCC12 regresses > > 50% for cray at -O2 compared to GCC11. > > > > The patch add an extremely large number to stmt_cost to prevent > > vectorization for loads from parm_decl under very-cheap cost model, > > this can at least prevent O2 regression due to STF issue, but may lose > > some perf where there's no such issue(1 vector_load vs n scalar_load + > > CTOR). > > Note this is just heuristics in that by-value passed parameters are usually > stored to the stack close before the function call. It does not catch the > similar case from > > foo (const X &bar) { ... } > > where a > > foo ({ 1., 2. }) > > will have the object passed by reference constructed right before the > call. In the end a full solution will need to perform some IPA analysis > that computes the initialization distance from the call and uses should > factor in the use distance from function entry. Yes, this patch only deals with by-value passed objects which has nothing to do with ipa, but only depends on psABI. For the case of passing references, IPA is necessary to help analyze some obvious STFS scenarios, but static analysis still has limitations, for example, for the store across the cache line (or page) boundary case, IPA can not give a judgment. > > > No impact for SPEC2017 for both plain O2 and native O2 on ICX. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR target/101908 > > * config/i386/i386.cc (ix86_load_maybe_stfs_p): New. > > (ix86_vector_costs::add_stmt_cost): Add extra cost for > > vector_load/unsigned_load which may have stall forward issue. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr101908-1.c: New test. > > * gcc.target/i386/pr101908-2.c: New test. > > --- > > gcc/config/i386/i386.cc| 31 ++ > > gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 + > > gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 + > > 3 files changed, 55 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index b2bf90576d5..3bbaaf65ea8 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, > > struct noce_if_info *if_info) > >return default_noce_conversion_profitable_p (seq, if_info); > > } > > > > +/* Return true if REF may have STF issue, otherwise false. */ > > +static bool > > +ix86_load_maybe_stfs_p (tree ref) > > +{ > > + tree addr = get_base_address (ref); > > + > > + if (TREE_CODE (addr) != PARM_DECL > > + || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr))) > > + || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD) > > +return false; > > + return true; > > +} > > + > > /* x86-specific vector costs. */ > > class ix86_vector_costs : public vector_costs > > { > > @@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, > > vect_cost_for_stmt kind, > > if (TREE_CODE (op) == SSA_NAME) > > TREE_VISITED (op) = 0; > > } > > + > > + /* Prevent vectorization for load from parm_decl at O2 to avoid STF > > issue. > > + Performance may lose when there's no STF issue(1 vector_load vs n > > + scalar_load + CTOR). > > + TODO: both extra cost(2000) and ix86_load_maybe_stfs_p need to be fine > > + tuned. */ > > + if ((kind == vector_load || kind == unaligned_load) > > + && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP > > This check doesn't make much sense, I'd rather remove it. Will change. > > > + && stmt_info > > + && stmt_info->slp_type == pure_slp > > + && stmt_info->stmt > > + && gimple_assign_load_p (stmt_info->stmt) > > + && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt))) > > I'd pass down STMT_VINFO_DATA_REF instead and have ix86_load_maybe_stfs_p > and use > > tree addr = DR_BASE_ADDRESS (dr); > if (TREE_CODE (addr) != ADDR_EXPR) > return false; > addr = get_base_address (TREE_OPERAND (addr, 0)); > ... > > since that gets you a more reliable way to look at the actual object > referenced. Yes. > > > +{ > > + stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, > > misalign); > > + stmt_cost += 2000; > > +} > > + > > Maybe handle this like the Bonell case and thus put it after the > stmt_cost == -1 handling, just bumping the cost (also noting the actual number > is arbitrary). It would be nice to have a better estimate on the penalty than > "2000", maybe formulate
[PATCH] c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]
Hi! On the following testcase, we emit "did you mean '__dt '?" in the error message. "__dt " shows there because it is dtor_identifier, but we shouldn't suggest those to the user, they are purely internal and can't be really typed by the user because of the final space in it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-03-08 Jakub Jelinek PR c++/104806 * search.cc (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore identifiers with space at the end. * g++.dg/spellcheck-pr104806.C: New test. --- gcc/cp/search.cc.jj 2022-01-18 11:58:59.407984557 +0100 +++ gcc/cp/search.cc2022-03-07 10:44:33.455673155 +0100 @@ -1275,6 +1275,13 @@ lookup_field_fuzzy_info::fuzzy_lookup_fi if (is_lambda_ignored_entity (field)) continue; + /* Ignore special identifiers with space at the end like cdtor or +conversion op identifiers. */ + if (TREE_CODE (DECL_NAME (field)) == IDENTIFIER_NODE) + if (unsigned int len = IDENTIFIER_LENGTH (DECL_NAME (field))) + if (IDENTIFIER_POINTER (DECL_NAME (field))[len - 1] == ' ') + continue; + m_candidates.safe_push (DECL_NAME (field)); } } --- gcc/testsuite/g++.dg/spellcheck-pr104806.C.jj 2022-03-07 10:34:07.224499657 +0100 +++ gcc/testsuite/g++.dg/spellcheck-pr104806.C 2022-03-07 10:43:41.900399808 +0100 @@ -0,0 +1,5 @@ +// PR c++/104806 + +struct S {}; +int main() { S s; s.__d; } // { dg-bogus "'struct S' has no member named '__d'; did you mean '__\[a-z]* '" } + // { dg-error "'struct S' has no member named '__d'" "" { target *-*-* } .-1 } Jakub