Re: [PATCH] Expose stable sort algorithm to gcc_sort_r and add vec::stablesort
On Thu, 10 Jun 2021, Alexander Monakov wrote:
> On Thu, 10 Jun 2021, Richard Biener wrote:
>
> > This makes it possible to apply GCC's stable sort algorithm to vec<>
> > and also use it with the qsort_r compatible interface.
> >
> > Alex, any comments?
>
> I'm afraid the patch is not correct, see below (I'll also point out
> errors in comments while at it).

Whoops - thanks for noticing.  This is what you get when copy-editing
while adding gcc_stablesort_r (which I had originally omitted) ...

Fixed, re-bootstrapped and tested on x86_64-unknown-linux-gnu and
pushed.

Thanks,
Richard.

>From 0a9a35b9b07dfc82239545ec43dacbc9091543fa Mon Sep 17 00:00:00 2001
From: Richard Biener
Date: Thu, 10 Jun 2021 11:03:55 +0200
Subject: [PATCH] Expose stable sort algorithm to gcc_sort_r and add
 vec::stablesort
To: gcc-patches@gcc.gnu.org

This makes it possible to apply GCC's stable sort algorithm to vec<>
and also use it with the qsort_r compatible interface.

2021-06-10  Richard Biener

	* system.h (gcc_stablesort_r): Declare.
	* sort.cc (gcc_sort_r): Support stable sort.
	(gcc_stablesort_r): Define.
	* vec.h (vec<>::stablesort): Add.
---
 gcc/sort.cc  | 14 +-
 gcc/system.h |  1 +
 gcc/vec.h    | 24
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/sort.cc b/gcc/sort.cc
index fe499b5ec73..1c83c62008d 100644
--- a/gcc/sort.cc
+++ b/gcc/sort.cc
@@ -277,8 +277,12 @@ gcc_sort_r (void *vbase, size_t n, size_t size, sort_r_cmp_fn *cmp, void *data)
 {
   if (n < 2)
     return;
+  size_t nlim = 5;
+  bool stable = (ssize_t) size < 0;
+  if (stable)
+    nlim = 3, size = ~size;
   char *base = (char *)vbase;
-  sort_r_ctx c = {data, cmp, base, n, size, 5};
+  sort_r_ctx c = {data, cmp, base, n, size, nlim};
   long long scratch[32];
   size_t bufsz = (n / 2) * size;
   void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
@@ -296,3 +300,11 @@ gcc_stablesort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
 {
   gcc_qsort (vbase, n, ~size, cmp);
 }
+
+/* Stable sort, signature-compatible to Glibc qsort_r.  */
+void
+gcc_stablesort_r (void *vbase, size_t n, size_t size, sort_r_cmp_fn *cmp,
+		  void *data)
+{
+  gcc_sort_r (vbase, n, ~size, cmp, data);
+}
diff --git a/gcc/system.h b/gcc/system.h
index 3c856266cc2..adde3e264b6 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1250,6 +1250,7 @@
 void gcc_sort_r (void *, size_t, size_t, sort_r_cmp_fn *, void *);
 void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
 void gcc_stablesort (void *, size_t, size_t, int (*)(const void *, const void *));
+void gcc_stablesort_r (void *, size_t, size_t, sort_r_cmp_fn *, void *data);
 /* Redirect four-argument qsort calls to gcc_qsort; one-argument invocations
    correspond to vec::qsort, and use C qsort internally.  */
 #define PP_5th(a1, a2, a3, a4, a5, ...) a5
diff --git a/gcc/vec.h b/gcc/vec.h
index 24df2db0eeb..193377cb69c 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -612,6 +612,7 @@ public:
   void block_remove (unsigned, unsigned);
   void qsort (int (*) (const void *, const void *));
   void sort (int (*) (const void *, const void *, void *), void *);
+  void stablesort (int (*) (const void *, const void *, void *), void *);
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   T *bsearch (const void *key,
	      int (*compar)(const void *, const void *, void *), void *);
@@ -1160,6 +1161,17 @@ vec<T, A, vl_embed>::sort (int (*cmp) (const void *, const void *, void *),
   gcc_sort_r (address (), length (), sizeof (T), cmp, data);
 }

+/* Sort the contents of this vector with gcc_stablesort_r.  CMP is the
+   comparison function to pass to qsort.  */
+
+template<typename T, typename A>
+inline void
+vec<T, A, vl_embed>::stablesort (int (*cmp) (const void *, const void *,
					     void *), void *data)
+{
+  if (length () > 1)
+    gcc_stablesort_r (address (), length (), sizeof (T), cmp, data);
+}

 /* Search the contents of the sorted vector with a binary search.
    CMP is the comparison function to pass to bsearch.  */
@@ -1488,6 +1500,7 @@ public:
   void block_remove (unsigned, unsigned);
   void qsort (int (*) (const void *, const void *));
   void sort (int (*) (const void *, const void *, void *), void *);
+  void stablesort (int (*) (const void *, const void *, void *), void *);
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   T *bsearch (const void *key,
	      int (*compar)(const void *, const void *, void *), void *);
@@ -2053,6 +2066,17 @@ vec<T, va_heap, vl_ptr>::sort (int (*cmp) (const void *, const void *,
   m_vec->sort (cmp, data);
 }

+/* Sort the contents of this vector with gcc_stablesort_r.  CMP is the
+   comparison function to pass to qsort.  */
+
+template<typename T>
+inline void
+vec<T, va_heap, vl_ptr>::stablesort (int (*cmp) (const void *, const void *,
						 void *), void *data)
+{
+  if (m_vec)
+    m_vec->stablesort (cmp, data);
+}
[PATCH] Use stablesort for sorting association chain
This should preserve the original association order as much as possible
for the initial SLP discovery attempt and also improve consistency.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-11  Richard Biener

	* tree-vect-slp.c (vect_build_slp_tree_2): Use stablesort
	to sort operands of the associative chain.
---
 gcc/tree-vect-slp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c4f8f38012f..6237a61ffd4 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1865,7 +1865,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
       /* Now we have a set of chains with the same length.  */
       /* 1. pre-sort according to def_type and operation.  */
       for (unsigned lane = 0; lane < group_size; ++lane)
-	chains[lane].sort (dt_sort_cmp, vinfo);
+	chains[lane].stablesort (dt_sort_cmp, vinfo);
       if (dump_enabled_p ())
	 {
	   dump_printf_loc (MSG_NOTE, vect_location,
-- 
2.26.2
Re: [Patch] contrib/gcc-changelog: Check that PR in subject is in changelog
On 6/10/21 5:32 PM, Tobias Burnus wrote:
> On 10.06.21 16:46, Martin Liška wrote:
>> Note that flake8 has "plugins".  At openSUSE, I install: ...
>
> None of those are available on Ubuntu – I probably should nag doko or
> start using my private computer for the tests ...

I support the patch, please install it (except the unintentional change
in gcc/c/c-parser.c).  Then, please ask Jonathan to update the server
hook.

Cheers,
Martin

Updated as suggested and with your flake8-fix patch applied on top.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
Frank Thürauf
Re: [PATCH] Add statistics counting to PHI-OPT
On Fri, Jun 11, 2021 at 6:15 AM apinski--- via Gcc-patches wrote: > > From: Andrew Pinski > > This should have been done before I started to work on connecting > PHI-OPT to match-and-simplify to see quickly if we miss anything > but it is better late than never. > Anyways there was no statistics counting in PHI-OPT before so adding > it is the right thing to do. > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. OK (minor nit - the strings have inconsistent starting with/without caps) Thanks, Richard. > gcc/ChangeLog: > > * tree-ssa-phiopt.c (replace_phi_edge_with_variable): > Add counting of how many times it is done. > (factor_out_conditional_conversion): Likewise. > (match_simplify_replacement): Likewise. > (value_replacement): Likewise. > (spaceship_replacement): Likewise. > (cond_store_replacement): Likewise. > (cond_if_else_store_replacement_1): Likewise. > (hoist_adjacent_loads): Likewise. > --- > gcc/tree-ssa-phiopt.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c > index 76f4e7ec843..02e26f974a5 100644 > --- a/gcc/tree-ssa-phiopt.c > +++ b/gcc/tree-ssa-phiopt.c > @@ -419,6 +419,8 @@ replace_phi_edge_with_variable (basic_block cond_block, >gsi = gsi_last_bb (cond_block); >gsi_remove (&gsi, true); > > + statistics_counter_event (cfun, "Replace PHI with variable", 1); > + >if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > "COND_EXPR in block %d and PHI in block %d converted to > straightline code.\n", > @@ -618,6 +620,9 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi > *phi, >/* Remove the original PHI stmt. 
*/ >gsi = gsi_for_stmt (phi); >gsi_remove (&gsi, true); > + > + statistics_counter_event (cfun, "factored out cast", 1); > + >return newphi; > } > > @@ -893,6 +898,11 @@ match_simplify_replacement (basic_block cond_bb, > basic_block middle_bb, > >replace_phi_edge_with_variable (cond_bb, e1, phi, result); > > + /* Add Statistic here even though replace_phi_edge_with_variable already > + does it as we want to be able to count when match-simplify happens vs > + the others. */ > + statistics_counter_event (cfun, "match-simplify PHI replacement", 1); > + >/* Note that we optimized this PHI. */ >return true; > } > @@ -1196,6 +1206,8 @@ value_replacement (basic_block cond_bb, basic_block > middle_bb, > } >else > { > + statistics_counter_event (cfun, "Replace PHI with > variable/value_replacement", 1); > + > /* Replace the PHI arguments with arg. */ > SET_PHI_ARG_DEF (phi, e0->dest_idx, arg); > SET_PHI_ARG_DEF (phi, e1->dest_idx, arg); > @@ -2320,6 +2332,7 @@ spaceship_replacement (basic_block cond_bb, basic_block > middle_bb, > >gimple_stmt_iterator psi = gsi_for_stmt (phi); >remove_phi_node (&psi, true); > + statistics_counter_event (cfun, "spaceship replacement", 1); > >return true; > } > @@ -2982,6 +2995,7 @@ cond_store_replacement (basic_block middle_bb, > basic_block join_bb, >fprintf (dump_file, "\nInserted a new PHI statement in joint > block:\n"); >print_gimple_stmt (dump_file, new_stmt, 0, TDF_VOPS|TDF_MEMSYMS); > } > + statistics_counter_event (cfun, "conditional store replacement", 1); > >return true; > } > @@ -3056,6 +3070,8 @@ cond_if_else_store_replacement_1 (basic_block then_bb, > basic_block else_bb, >else > gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT); > > + statistics_counter_event (cfun, "if-then-else store replacement", 1); > + >return true; > } > > @@ -3469,6 +3485,7 @@ hoist_adjacent_loads (basic_block bb0, basic_block bb1, >gsi_move_to_bb_end (&gsi2, bb0); >gsi2 = gsi_for_stmt (def2); >gsi_move_to_bb_end (&gsi2, bb0); > + 
statistics_counter_event (cfun, "hoisted loads", 1); > >if (dump_file && (dump_flags & TDF_DETAILS)) > { > -- > 2.27.0 >
Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)
On Fri, 11 Jun 2021 at 01:29, Martin Sebor via Gcc-patches wrote:
>
> This patch removes the uses of %K from error() calls in the aarch64
> and arm back ends.  I tested this change only by building a
> cross-compiler, but I can't easily run the tests so I'd appreciate
> some help validating it.  The fallout from the change should be
> limited to changes to error messages, so in the worst case it could
> be addressed after committing the patch.

I've submitted a validation with your patch; I'll let you know the
results (in a few hours).

Thanks,
Christophe
[PATCH] i386: Fix up *vec_concat_0_1 [PR101007]
On Fri, Apr 23, 2021 at 12:53:58PM +0800, Hongtao Liu via Gcc-patches wrote: > -(define_insn "*vec_concatv4si_0" > - [(set (match_operand:V4SI 0 "register_operand" "=v,x") > - (vec_concat:V4SI > - (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y") > - (match_operand:V2SI 2 "const0_operand" " C,C")))] > +(define_insn "*vec_concat_0" > + [(set (match_operand:VI124_128 0 "register_operand" "=v,x") > + (vec_concat:VI124_128 > + (match_operand: 1 "nonimmediate_operand" "vm,?!*y") > + (match_operand: 2 "const0_operand" " C,C")))] >"TARGET_SSE2" >"@ > %vmovq\t{%1, %0|%0, %1} > @@ -22154,6 +22157,24 @@ (define_insn "avx_vec_concat" > (set_attr "prefix" "maybe_evex") > (set_attr "mode" "")]) > > +(define_insn_and_split "*vec_concat_0" > + [(set (match_operand:V 0 "register_operand") > + (vec_select:V > + (vec_concat: > + (match_operand:V 1 "nonimmediate_operand") > + (match_operand:V 2 "const0_operand")) > + (match_parallel 3 "movq_parallel" > + [(match_operand 4 "const_int_operand")])))] > + "ix86_pre_reload_split ()" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (vec_concat:V (match_dup 1) (match_dup 5)))] > +{ > + operands[1] = gen_lowpart (mode, operands[1]); > + operands[5] = CONST0_RTX (mode); > +}) This regressed the following testcase with -msse -mno-sse2. The define_insn_and_split splits the permutation into *vec_concat_0 or *vec_concatv2di_0 insns which both have TARGET_SSE2 in their conditions (for the former you can see it above), but the define_insn_and_split matches always when the V mode's condition do, which for V16QI/V8HI/V4SI/V2DI/V4SF modes is always (well, when those modes are valid, which is TARGET_SSE). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-06-11 Jakub Jelinek PR target/101007 * config/i386/sse.md (*vec_concat_0_1): Require TARGET_SSE2. * gcc.target/i386/sse-pr101007.c: New test. 
--- gcc/config/i386/sse.md.jj 2021-06-07 09:24:57.706689972 +0200 +++ gcc/config/i386/sse.md 2021-06-10 11:14:52.407588679 +0200 @@ -22395,7 +22395,7 @@ (define_insn_and_split "*vec_concat= 0), v, 4, 2); +} Jakub
[PATCH] simplify-rtx: Fix up simplify_logical_relational_operation for vector IOR [PR101008]
Hi! simplify_relational_operation callees typically return just const0_rtx or const_true_rtx and then simplify_relational_operation attempts to fix that up if the comparison result has vector mode, or floating mode, or punt if it has scalar mode and vector mode operands (it doesn't know how exactly to deal with the scalar masks). But, simplify_logical_relational_operation has a special case, where it attempts to fold (x < y) | (x >= y) etc. and if it determines it is always true, it just returns const_true_rtx, without doing the dances that simplify_relational_operation does. That results in an ICE on the following testcase, where such folding happens during expansion (of debug stmts into DEBUG_INSNs) and we ICE because all of sudden a VOIDmode rtx appears where it expects a vector (V4SImode) rtx. The following patch fixes that by moving the adjustement into a separate helper routine and using it from both simplify_relational_operation and simplify_logical_relational_operation. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-06-11 Jakub Jelinek PR rtl-optimization/101008 * simplify-rtx.c (relational_result): New function. (simplify_logical_relational_operation, simplify_relational_operation): Use it. * gcc.dg/pr101008.c: New test. --- gcc/simplify-rtx.c.jj 2021-05-04 21:02:24.0 +0200 +++ gcc/simplify-rtx.c 2021-06-10 13:56:48.946628822 +0200 @@ -2294,6 +2294,53 @@ comparison_code_valid_for_mode (enum rtx gcc_unreachable (); } } + +/* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right + false/true value of comparison with MODE where comparison operands + have CMP_MODE. 
*/ + +static rtx +relational_result (machine_mode mode, machine_mode cmp_mode, rtx res) +{ + if (SCALAR_FLOAT_MODE_P (mode)) +{ + if (res == const0_rtx) +return CONST0_RTX (mode); +#ifdef FLOAT_STORE_FLAG_VALUE + REAL_VALUE_TYPE val = FLOAT_STORE_FLAG_VALUE (mode); + return const_double_from_real_value (val, mode); +#else + return NULL_RTX; +#endif +} + if (VECTOR_MODE_P (mode)) +{ + if (res == const0_rtx) + return CONST0_RTX (mode); +#ifdef VECTOR_STORE_FLAG_VALUE + rtx val = VECTOR_STORE_FLAG_VALUE (mode); + if (val == NULL_RTX) + return NULL_RTX; + if (val == const1_rtx) + return CONST1_RTX (mode); + + return gen_const_vec_duplicate (mode, val); +#else + return NULL_RTX; +#endif +} + /* For vector comparison with scalar int result, it is unknown + if the target means here a comparison into an integral bitmask, + or comparison where all comparisons true mean const_true_rtx + whole result, or where any comparisons true mean const_true_rtx + whole result. For const0_rtx all the cases are the same. */ + if (VECTOR_MODE_P (cmp_mode) + && SCALAR_INT_MODE_P (mode) + && res == const_true_rtx) +return NULL_RTX; + + return res; +} /* Simplify a logical operation CODE with result mode MODE, operating on OP0 and OP1, which should be both relational operations. 
Return 0 if no such @@ -2329,7 +2376,7 @@ simplify_context::simplify_logical_relat int mask = mask0 | mask1; if (mask == 15) -return const_true_rtx; +return relational_result (mode, GET_MODE (op0), const_true_rtx); code = mask_to_comparison (mask); @@ -5315,51 +5362,7 @@ simplify_context::simplify_relational_op tem = simplify_const_relational_operation (code, cmp_mode, op0, op1); if (tem) -{ - if (SCALAR_FLOAT_MODE_P (mode)) - { - if (tem == const0_rtx) -return CONST0_RTX (mode); -#ifdef FLOAT_STORE_FLAG_VALUE - { - REAL_VALUE_TYPE val; - val = FLOAT_STORE_FLAG_VALUE (mode); - return const_double_from_real_value (val, mode); - } -#else - return NULL_RTX; -#endif - } - if (VECTOR_MODE_P (mode)) - { - if (tem == const0_rtx) - return CONST0_RTX (mode); -#ifdef VECTOR_STORE_FLAG_VALUE - { - rtx val = VECTOR_STORE_FLAG_VALUE (mode); - if (val == NULL_RTX) - return NULL_RTX; - if (val == const1_rtx) - return CONST1_RTX (mode); - - return gen_const_vec_duplicate (mode, val); - } -#else - return NULL_RTX; -#endif - } - /* For vector comparison with scalar int result, it is unknown -if the target means here a comparison into an integral bitmask, -or comparison where all comparisons true mean const_true_rtx -whole result, or where any comparisons true mean const_true_rtx -whole result. For const0_rtx all the cases are the same. */ - if (VECTOR_MODE_P (cmp_mode) - && SCALAR_INT_MODE_P (mode) - && tem == const_true_rtx) - return NULL_RTX; - - return tem; -} +return relational_result (mode,
[PATCH] middle-end/101009 - fix distance vector recording
This fixes recording of distance vectors in case the DDR has just
constant equal indexes.  In that case we expect distance vectors with
zero distances to be recorded, which is what was done when any distance
was computed for affine indexes.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-11  Richard Biener

	PR middle-end/101009
	* tree-data-ref.c (build_classic_dist_vector_1): Make sure
	to set *init_b to true when we encounter a constant equal
	index pair.
	(compute_affine_dependence): Also dump the actual DR_REF.

	* gcc.dg/torture/pr101009.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101009.c | 17 +
 gcc/tree-data-ref.c                     | 10 --
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101009.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101009.c b/gcc/testsuite/gcc.dg/torture/pr101009.c
new file mode 100644
index 000..2bbed1dfc2c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101009.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fno-tree-sra -fno-tree-pre -ftree-loop-distribution" } */
+
+struct a {
+  unsigned b;
+  unsigned c;
+} e, *f = &e;
+int d = 1;
+int main() {
+  for (; d; d--) {
+    struct a g[] = {{2, 1}, {2, 1}};
+    *f = g[1];
+  }
+  if (e.c != 1)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index b1f64684840..b37c234aea8 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5121,6 +5121,8 @@ build_classic_dist_vector_1 (struct data_dependence_relation *ddr,
	      non_affine_dependence_relation (ddr);
	      return false;
	    }
+	  else
+	    *init_b = true;
	}

   return true;
@@ -5616,9 +5618,13 @@ compute_affine_dependence (struct data_dependence_relation *ddr,
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       fprintf (dump_file, "(compute_affine_dependence\n");
-      fprintf (dump_file, "  stmt_a: ");
+      fprintf (dump_file, "  ref_a: ");
+      print_generic_expr (dump_file, DR_REF (dra));
+      fprintf (dump_file, ", stmt_a: ");
       print_gimple_stmt (dump_file, DR_STMT (dra), 0, TDF_SLIM);
-      fprintf (dump_file, "  stmt_b: ");
+      fprintf (dump_file, "  ref_b: ");
+      print_generic_expr (dump_file, DR_REF (drb));
+      fprintf (dump_file, ", stmt_b: ");
       print_gimple_stmt (dump_file, DR_STMT (drb), 0, TDF_SLIM);
     }
-- 
2.26.2
Re: [PATCH] i386: Fix up *vec_concat_0_1 [PR101007]
On Fri, Jun 11, 2021 at 10:59 AM Jakub Jelinek wrote: > > On Fri, Apr 23, 2021 at 12:53:58PM +0800, Hongtao Liu via Gcc-patches wrote: > > -(define_insn "*vec_concatv4si_0" > > - [(set (match_operand:V4SI 0 "register_operand" "=v,x") > > - (vec_concat:V4SI > > - (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y") > > - (match_operand:V2SI 2 "const0_operand" " C,C")))] > > +(define_insn "*vec_concat_0" > > + [(set (match_operand:VI124_128 0 "register_operand" "=v,x") > > + (vec_concat:VI124_128 > > + (match_operand: 1 "nonimmediate_operand" "vm,?!*y") > > + (match_operand: 2 "const0_operand" " C,C")))] > >"TARGET_SSE2" > >"@ > > %vmovq\t{%1, %0|%0, %1} > > @@ -22154,6 +22157,24 @@ (define_insn "avx_vec_concat" > > (set_attr "prefix" "maybe_evex") > > (set_attr "mode" "")]) > > > > +(define_insn_and_split "*vec_concat_0" > > + [(set (match_operand:V 0 "register_operand") > > + (vec_select:V > > + (vec_concat: > > + (match_operand:V 1 "nonimmediate_operand") > > + (match_operand:V 2 "const0_operand")) > > + (match_parallel 3 "movq_parallel" > > + [(match_operand 4 "const_int_operand")])))] > > + "ix86_pre_reload_split ()" > > + "#" > > + "&& 1" > > + [(set (match_dup 0) > > + (vec_concat:V (match_dup 1) (match_dup 5)))] > > +{ > > + operands[1] = gen_lowpart (mode, operands[1]); > > + operands[5] = CONST0_RTX (mode); > > +}) > > This regressed the following testcase with -msse -mno-sse2. > The define_insn_and_split splits the permutation into *vec_concat_0 > or *vec_concatv2di_0 insns which both have TARGET_SSE2 in their > conditions (for the former you can see it above), but the > define_insn_and_split matches always when the V mode's condition do, > which for V16QI/V8HI/V4SI/V2DI/V4SF modes is always (well, when those > modes are valid, which is TARGET_SSE). > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, > ok for trunk? > > 2021-06-11 Jakub Jelinek > > PR target/101007 > * config/i386/sse.md (*vec_concat_0_1): Require TARGET_SSE2. 
> > * gcc.target/i386/sse-pr101007.c: New test. OK, even as obvious patch. Thanks, Uros. > --- gcc/config/i386/sse.md.jj 2021-06-07 09:24:57.706689972 +0200 > +++ gcc/config/i386/sse.md 2021-06-10 11:14:52.407588679 +0200 > @@ -22395,7 +22395,7 @@ (define_insn_and_split "*vec_concat (match_operand:V 2 "const0_operand")) > (match_parallel 3 "movq_parallel" > [(match_operand 4 "const_int_operand")])))] > - "ix86_pre_reload_split ()" > + "TARGET_SSE2 && ix86_pre_reload_split ()" >"#" >"&& 1" >[(set (match_dup 0) > --- gcc/testsuite/gcc.target/i386/sse-pr101007.c.jj 2021-06-10 > 11:41:25.818609527 +0200 > +++ gcc/testsuite/gcc.target/i386/sse-pr101007.c2021-06-10 > 11:38:39.301910017 +0200 > @@ -0,0 +1,14 @@ > +/* PR target/101007 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msse -mno-sse2" } */ > + > +typedef unsigned __attribute__((__vector_size__ (8))) U; > +typedef unsigned __attribute__((__vector_size__ (16))) V; > +V v; > +U *p; > + > +void > +foo (void) > +{ > + *p = (U) __builtin_shufflevector ((V)(0 == (V){} >= 0), v, 4, 2); > +} > > Jakub >
Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns
Christophe Lyon writes: > On Tue, 8 Jun 2021 at 14:10, Richard Sandiford > wrote: >> >> Christophe Lyon writes: >> > This patch adds vec_unpack_hi_, vec_unpack_lo_, >> > vec_pack_trunc_ patterns for MVE. >> > >> > It does so by moving the unpack patterns from neon.md to >> > vec-common.md, while adding them support for MVE. The pack expander is >> > derived from the Neon one (which in turn is renamed into >> > neon_quad_vec_pack_trunc_). >> > >> > The patch introduces mve_vec_pack_trunc_ to avoid the need for a >> > zero-initialized temporary, which is needed if the >> > vec_pack_trunc_ expander calls @mve_vmovn[bt]q_ >> > instead. >> > >> > With this patch, we can now vectorize the 16 and 8-bit versions of >> > vclz and vshl, although the generated code could still be improved. >> > For test_clz_s16, we now generate >> > vldrh.16q3, [r1] >> > vmovlb.s16 q2, q3 >> > vmovlt.s16 q3, q3 >> > vclz.i32 q2, q2 >> > vclz.i32 q3, q3 >> > vmovnb.i32 q1, q2 >> > vmovnt.i32 q1, q3 >> > vstrh.16q1, [r0] >> > which could be improved to >> > vldrh.16q3, [r1] >> > vclz.i16q1, q3 >> > vstrh.16q1, [r0] >> > if we could avoid the need for unpack/pack steps. >> >> Yeah, there was a PR about fixing this for popcount. I guess the same >> approach would apply here too. >> >> > For reference, clang-12 generates: >> > vldrh.s32 q0, [r1] >> > vldrh.s32 q1, [r1, #8] >> > vclz.i32q0, q0 >> > vstrh.32q0, [r0] >> > vclz.i32q0, q1 >> > vstrh.32q0, [r0, #8] >> > >> > 2021-06-03 Christophe Lyon >> > >> > gcc/ >> > * config/arm/mve.md (mve_vmovltq_): Prefix with '@'. >> > (mve_vmovlbq_): Likewise. >> > (mve_vmovnbq_): Likewise. >> > (mve_vmovntq_): Likewise. >> > (@mve_vec_pack_trunc_): New pattern. >> > * config/arm/neon.md (vec_unpack_hi_): Move to >> > vec-common.md. >> > (vec_unpack_lo_): Likewise. >> > (vec_pack_trunc_): Rename to >> > neon_quad_vec_pack_trunc_. >> > * config/arm/vec-common.md (vec_unpack_hi_): New >> > pattern. >> > (vec_unpack_lo_): New. >> > (vec_pack_trunc_): New. 
>> > >> > gcc/testsuite/ >> > * gcc.target/arm/simd/mve-vclz.c: Update expected results. >> > * gcc.target/arm/simd/mve-vshl.c: Likewise. >> > --- >> > gcc/config/arm/mve.md| 20 - >> > gcc/config/arm/neon.md | 39 + >> > gcc/config/arm/vec-common.md | 89 >> > gcc/testsuite/gcc.target/arm/simd/mve-vclz.c | 7 +- >> > gcc/testsuite/gcc.target/arm/simd/mve-vshl.c | 5 +- >> > 5 files changed, 114 insertions(+), 46 deletions(-) >> > >> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md >> > index 99e46d0bc69..b18292c07d3 100644 >> > --- a/gcc/config/arm/mve.md >> > +++ b/gcc/config/arm/mve.md >> > @@ -510,7 +510,7 @@ (define_insn "mve_vrev32q_" >> > ;; >> > ;; [vmovltq_u, vmovltq_s]) >> > ;; >> > -(define_insn "mve_vmovltq_" >> > +(define_insn "@mve_vmovltq_" >> >[ >> > (set (match_operand: 0 "s_register_operand" "=w") >> > (unspec: [(match_operand:MVE_3 1 >> > "s_register_operand" "w")] >> > @@ -524,7 +524,7 @@ (define_insn "mve_vmovltq_" >> > ;; >> > ;; [vmovlbq_s, vmovlbq_u]) >> > ;; >> > -(define_insn "mve_vmovlbq_" >> > +(define_insn "@mve_vmovlbq_" >> >[ >> > (set (match_operand: 0 "s_register_operand" "=w") >> > (unspec: [(match_operand:MVE_3 1 >> > "s_register_operand" "w")] >> > @@ -2187,7 +2187,7 @@ (define_insn "mve_vmlsldavxq_s" >> > ;; >> > ;; [vmovnbq_u, vmovnbq_s]) >> > ;; >> > -(define_insn "mve_vmovnbq_" >> > +(define_insn "@mve_vmovnbq_" >> >[ >> > (set (match_operand: 0 "s_register_operand" "=w") >> > (unspec: [(match_operand: 1 >> > "s_register_operand" "0") >> > @@ -2202,7 +2202,7 @@ (define_insn "mve_vmovnbq_" >> > ;; >> > ;; [vmovntq_s, vmovntq_u]) >> > ;; >> > -(define_insn "mve_vmovntq_" >> > +(define_insn "@mve_vmovntq_" >> >[ >> > (set (match_operand: 0 "s_register_operand" "=w") >> > (unspec: [(match_operand: 1 >> > "s_register_operand" "0") >> > @@ -2214,6 +2214,18 @@ (define_insn "mve_vmovntq_" >> >[(set_attr "type" "mve_move") >> > ]) >> > >> > +(define_insn "@mve_vec_pack_trunc_" >> > + [(set (match_operand: 0 "register_operand" 
"=&w") >> > + (vec_concat: >> > + (truncate: >> > + (match_operand:MVE_5 1 "register_operand" "w")) >> > + (truncate: >> > + (match_operand:MVE_5 2 "register_operand" "w"] >> > + "TARGET_HAVE_MVE" >> > + "vmovnb.i%q0, %q1\;vmovnt.i %q0, %q2" >> > + [(set_attr "type" "mve_move")] >> > +) >> > + >> >> I realise this is (like you say) based on the neon.md pattern, >> but we should use separate vmovnb and vm
Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)
Martin Sebor via Gcc-patches writes:
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 7b37e1b602c..7cdc824730c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -13242,13 +13242,8 @@ bounds_check (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high,
>     lane = INTVAL (operand);
>
>   if (lane < low || lane >= high)
> -    {
> -      if (exp)
> -	error ("%K%s %wd out of range %wd - %wd",
> -	       exp, desc, lane, low, high - 1);
> -      else
> -	error ("%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
> -    }
> +    error_at (EXPR_LOCATION (exp),
> +	      "%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
> }
>
> /* Bounds-check lanes.  */

This part doesn't look safe: “exp” is null when called from
arm_const_bounds.

Thanks,
Richard
Re: [PATCH] Add gnu::diagnose_as attribute
How can we make progress here?

I could try to produce some "Tony Tables" of diagnostic output of my
modified stdx::simd.  I believe it's a major productivity boost to see
abbreviated / "obfuscated" diagnostics *out of the box* (with the
possibility to opt out).  Actually, it already *is* a productivity boost
to me.  Understanding diagnostics has improved from

"1. ooof, I'm not going to read this, let me rather guess what the issue
 was
 2. sh** I have to read it
 3. several minutes later: I finally found the five words to understand
 the problem; I could use a break"

to

"1. right, let me check that"

For reference I'll attach my stdx::simd diagnose_as patch.

We could also talk about extending the feature to provide more
information about the diagnose_as substitution.  E.g. print a list of
all diagnose_as substitutions which were used at the end of the output
stream.  Or, simpler, print "note: some identifiers were simplified, use
-fno-diagnostics-use-aliases to see their real names".

On Tuesday, 1 June 2021 21:12:18 CEST Jason Merrill wrote:
> > Right, but then two of my design goals can't be met:
> >
> > 1. Diagnostics have an improved signal-to-noise ratio out of the box.
> >
> > 2. We can use replacement names that are not valid identifiers.
>
> This is the basic disconnect: I think that these goals are
> contradictory, and that replacement names that are not valid
> identifiers will just confuse users that don't know about them.
>
> If a user sees stdx::foo in a diagnostic and then tries to refer to
> stdx::foo and gets an error, the diagnostic is not more helpful than
> one that uses the fully qualified name.
>
> Jonathan, David, any thoughts on this issue?

-- 
──
Dr.
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ── diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 43331134301..8e0cceff860 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -80,13 +80,13 @@ using __m512d [[__gnu__::__vector_size__(64)]] = double; using __m512i [[__gnu__::__vector_size__(64)]] = long long; #endif -namespace simd_abi { +namespace simd_abi [[__gnu__::__diagnose_as__("simd_abi")]] { // simd_abi forward declarations {{{ // implementation details: -struct _Scalar; + struct [[__gnu__::__diagnose_as__("scalar")]] _Scalar; template - struct _Fixed; + struct [[__gnu__::__diagnose_as__("fixed_size")]] _Fixed; // There are two major ABIs that appear on different architectures. // Both have non-boolean values packed into an N Byte register @@ -105,28 +105,11 @@ template template struct _VecBltnBtmsk; -template - using _VecN = _VecBuiltin; - -template - using _Sse = _VecBuiltin<_UsedBytes>; - -template - using _Avx = _VecBuiltin<_UsedBytes>; - -template - using _Avx512 = _VecBltnBtmsk<_UsedBytes>; - -template - using _Neon = _VecBuiltin<_UsedBytes>; - -// implementation-defined: -using __sse = _Sse<>; -using __avx = _Avx<>; -using __avx512 = _Avx512<>; -using __neon = _Neon<>; -using __neon128 = _Neon<16>; -using __neon64 = _Neon<8>; +#if defined __i386__ || defined __x86_64__ +using __sse [[__gnu__::__diagnose_as__("[SSE]")]] = _VecBuiltin<16>; +using __avx [[__gnu__::__diagnose_as__("[AVX]")]] = _VecBuiltin<32>; +using __avx512 [[__gnu__::__diagnose_as__("[AVX512]")]] = _VecBltnBtmsk<64>; +#endif // standard: template @@ -364,7 +347,7 @@ namespace __detail * users link TUs compiled with different flags. This is especially important * for using simd in libraries. 
*/ - using __odr_helper + using __odr_helper [[__gnu__::__diagnose_as__("[ODR helper]")]] = conditional_t<__machine_flags() == 0, _OdrEnforcer, _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>; @@ -689,7 +672,7 @@ template __is_avx512_abi() { constexpr auto _Bytes = __abi_bytes_v<_Abi>; -return _Bytes <= 64 && is_same_v, _Abi>; +return _Bytes <= 64 && is_same_v, _Abi>; } // }}} diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h index 78ad33f74e4..1f127cd0d52 100644 --- a/libstdc++-v3/include/experimental/bits/simd_detail.h +++ b/libstdc++-v3/include/experimental/bits/simd_detail.h @@ -36,7 +36,7 @@ {\ _GLIBCXX_BEGIN_NAMESPACE_VERSION \ namespace experimental { \ - inline namespace parallelism_v2 { + inline namespace parallelism_v2 [[__gnu__::__diagnose_as__("std\u209
Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns
On Fri, 11 Jun 2021 at 11:38, Richard Sandiford wrote: > > Christophe Lyon writes: > > On Tue, 8 Jun 2021 at 14:10, Richard Sandiford > > wrote: > >> > >> Christophe Lyon writes: > >> > This patch adds vec_unpack_hi_, vec_unpack_lo_, > >> > vec_pack_trunc_ patterns for MVE. > >> > > >> > It does so by moving the unpack patterns from neon.md to > >> > vec-common.md, while adding them support for MVE. The pack expander is > >> > derived from the Neon one (which in turn is renamed into > >> > neon_quad_vec_pack_trunc_). > >> > > >> > The patch introduces mve_vec_pack_trunc_ to avoid the need for a > >> > zero-initialized temporary, which is needed if the > >> > vec_pack_trunc_ expander calls @mve_vmovn[bt]q_ > >> > instead. > >> > > >> > With this patch, we can now vectorize the 16 and 8-bit versions of > >> > vclz and vshl, although the generated code could still be improved. > >> > For test_clz_s16, we now generate > >> > vldrh.16q3, [r1] > >> > vmovlb.s16 q2, q3 > >> > vmovlt.s16 q3, q3 > >> > vclz.i32 q2, q2 > >> > vclz.i32 q3, q3 > >> > vmovnb.i32 q1, q2 > >> > vmovnt.i32 q1, q3 > >> > vstrh.16q1, [r0] > >> > which could be improved to > >> > vldrh.16q3, [r1] > >> > vclz.i16q1, q3 > >> > vstrh.16q1, [r0] > >> > if we could avoid the need for unpack/pack steps. > >> > >> Yeah, there was a PR about fixing this for popcount. I guess the same > >> approach would apply here too. > >> > >> > For reference, clang-12 generates: > >> > vldrh.s32 q0, [r1] > >> > vldrh.s32 q1, [r1, #8] > >> > vclz.i32q0, q0 > >> > vstrh.32q0, [r0] > >> > vclz.i32q0, q1 > >> > vstrh.32q0, [r0, #8] > >> > > >> > 2021-06-03 Christophe Lyon > >> > > >> > gcc/ > >> > * config/arm/mve.md (mve_vmovltq_): Prefix with '@'. > >> > (mve_vmovlbq_): Likewise. > >> > (mve_vmovnbq_): Likewise. > >> > (mve_vmovntq_): Likewise. > >> > (@mve_vec_pack_trunc_): New pattern. > >> > * config/arm/neon.md (vec_unpack_hi_): Move to > >> > vec-common.md. > >> > (vec_unpack_lo_): Likewise. 
> >> > (vec_pack_trunc_): Rename to > >> > neon_quad_vec_pack_trunc_. > >> > * config/arm/vec-common.md (vec_unpack_hi_): New > >> > pattern. > >> > (vec_unpack_lo_): New. > >> > (vec_pack_trunc_): New. > >> > > >> > gcc/testsuite/ > >> > * gcc.target/arm/simd/mve-vclz.c: Update expected results. > >> > * gcc.target/arm/simd/mve-vshl.c: Likewise. > >> > --- > >> > gcc/config/arm/mve.md| 20 - > >> > gcc/config/arm/neon.md | 39 + > >> > gcc/config/arm/vec-common.md | 89 > >> > gcc/testsuite/gcc.target/arm/simd/mve-vclz.c | 7 +- > >> > gcc/testsuite/gcc.target/arm/simd/mve-vshl.c | 5 +- > >> > 5 files changed, 114 insertions(+), 46 deletions(-) > >> > > >> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > >> > index 99e46d0bc69..b18292c07d3 100644 > >> > --- a/gcc/config/arm/mve.md > >> > +++ b/gcc/config/arm/mve.md > >> > @@ -510,7 +510,7 @@ (define_insn "mve_vrev32q_" > >> > ;; > >> > ;; [vmovltq_u, vmovltq_s]) > >> > ;; > >> > -(define_insn "mve_vmovltq_" > >> > +(define_insn "@mve_vmovltq_" > >> >[ > >> > (set (match_operand: 0 "s_register_operand" "=w") > >> > (unspec: [(match_operand:MVE_3 1 > >> > "s_register_operand" "w")] > >> > @@ -524,7 +524,7 @@ (define_insn "mve_vmovltq_" > >> > ;; > >> > ;; [vmovlbq_s, vmovlbq_u]) > >> > ;; > >> > -(define_insn "mve_vmovlbq_" > >> > +(define_insn "@mve_vmovlbq_" > >> >[ > >> > (set (match_operand: 0 "s_register_operand" "=w") > >> > (unspec: [(match_operand:MVE_3 1 > >> > "s_register_operand" "w")] > >> > @@ -2187,7 +2187,7 @@ (define_insn "mve_vmlsldavxq_s" > >> > ;; > >> > ;; [vmovnbq_u, vmovnbq_s]) > >> > ;; > >> > -(define_insn "mve_vmovnbq_" > >> > +(define_insn "@mve_vmovnbq_" > >> >[ > >> > (set (match_operand: 0 "s_register_operand" "=w") > >> > (unspec: [(match_operand: 1 > >> > "s_register_operand" "0") > >> > @@ -2202,7 +2202,7 @@ (define_insn "mve_vmovnbq_" > >> > ;; > >> > ;; [vmovntq_s, vmovntq_u]) > >> > ;; > >> > -(define_insn "mve_vmovntq_" > >> > +(define_insn "@mve_vmovntq_" > >> >[ > >> > 
(set (match_operand: 0 "s_register_operand" "=w") > >> > (unspec: [(match_operand: 1 > >> > "s_register_operand" "0") > >> > @@ -2214,6 +2214,18 @@ (define_insn "mve_vmovntq_" > >> >[(set_attr "type" "mve_move") > >> > ]) > >> > > >> > +(define_insn "@mve_vec_pack_trunc_" > >> > + [(set (match_operand: 0 "register_operand" "=&w") > >> > + (vec_concat: > >> > + (truncate: > >> > + (match_operand:MVE_5 1 "register_operand" "w")) > >> > + (truncate: > >> > +
Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns
Christophe Lyon writes: > In the meantime, I tried to make some progress, and looked at how > things work on aarch64. > > This led me to define (in mve.md): > > (define_insn "@mve_vec_pack_trunc_lo_" > [(set (match_operand: 0 "register_operand" "=w") >(truncate: (match_operand:MVE_5 1 "register_operand" "w")))] > "TARGET_HAVE_MVE" > "vmovnb.i %q0, %q1\;" > [(set_attr "type" "mve_move")] > ) > > (define_insn "@mve_vec_pack_trunc_hi_" > [(set (match_operand: 0 "register_operand" "=w") >(vec_concat: > (match_operand: 1 "register_operand" "0") > (truncate: > (match_operand:MVE_5 2 "register_operand" "w"] > "TARGET_HAVE_MVE" > "vmovnt.i %q0, %q2\;" > [(set_attr "type" "mve_move")] > ) > > and in vec-common.md, for > (define_expand "vec_pack_trunc_" > [(set (match_operand: 0 "register_operand") >(vec_concat: > (truncate: > (match_operand:VN 1 "register_operand")) > (truncate: > (match_operand:VN 2 "register_operand"] > > I expand this for MVE: > rtx tmpreg = gen_reg_rtx (mode); > emit_insn (gen_mve_vec_pack_trunc_lo (mode, tmpreg, operands[1])); > emit_insn (gen_mve_vec_pack_trunc_hi (mode, operands[0], > tmpreg, operands[2])); > > I am getting an error in reload: > error: unable to generate reloads for: > (insn 10 9 11 2 (set (reg:V4HI 122 [ vect__4.16 ]) > (truncate:V4HI (reg:V4SI 118 [ vect__4.16 ]))) > "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3609 > {mve_vec_pack_trunc_lo_v4si} > (expr_list:REG_DEAD (reg:V4SI 118 [ vect__4.16 ]) > (nil))) > > The next insn is: > (insn 11 10 12 2 (set (reg:V8HI 121 [ vect__7.18 ]) > (vec_concat:V8HI (reg:V4HI 122 [ vect__4.16 ]) > (truncate:V4HI (reg:V4SI 119 [ vect__4.17 ] > "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3611 > {mve_vec_pack_trunc_hi_v4si} > (expr_list:REG_DEAD (reg:V4HI 122 [ vect__4.16 ]) > (expr_list:REG_DEAD (reg:V4SI 119 [ vect__4.17 ]) > (nil > > What is causing the reload error? For this to work on MVE, we'd need to allow the 64-bit V_narrow modes to be stored in MVE registers. 
I'm not sure off-hand whether allowing that would be a good idea or not. If we continue to allow only 128-bit vectors in MVE registers then we'll need to continue to use unspecs instead of truncate and vec_concat. Thanks, Richard
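For readers following the RTL above, the generic `vec_pack_trunc` semantics being expanded is `(vec_concat (truncate op1) (truncate op2))`. A scalar sketch of just those semantics (function and variable names are ours, not GCC's; this models the RTL pattern, not the MVE vmovn instructions themselves):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of vec_pack_trunc for V4SI -> V8HI: each 32-bit lane of
// the two inputs is truncated to 16 bits, and the two truncated halves
// are concatenated, first input in the low half, second in the high.
std::array<uint16_t, 8> pack_trunc (const std::array<uint32_t, 4> &lo,
                                    const std::array<uint32_t, 4> &hi)
{
  std::array<uint16_t, 8> r{};
  for (int i = 0; i < 4; ++i)
    {
      r[i] = static_cast<uint16_t> (lo[i]);     // truncate first operand
      r[i + 4] = static_cast<uint16_t> (hi[i]); // truncate second operand
    }
  return r;
}
```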
[PATCH v2] c++: Add C++23 consteval if support - P1938R3 [PR100974]
On Thu, Jun 10, 2021 at 03:00:54PM -0400, Jason Merrill wrote: > The second is clearer about the fix, the first is clearer about the problem. > Maybe add a fixit to the first error? Done. > OK, just add a comment then. > > OK, just add a comment then. And added comments. > > > > --- gcc/cp/cp-gimplify.c.jj 2021-06-09 21:54:39.473193977 +0200 > > > > +++ gcc/cp/cp-gimplify.c2021-06-10 09:49:35.898557178 +0200 > > > > @@ -161,7 +161,9 @@ genericize_if_stmt (tree *stmt_p) > > > > if (!else_) > > > >else_ = build_empty_stmt (locus); > > > > - if (integer_nonzerop (cond) && !TREE_SIDE_EFFECTS (else_)) > > > > + if (IF_STMT_CONSTEVAL_P (stmt)) > > > > +stmt = else_; > > > > > > This seems redundant, since you're using boolean_false_node for the > > > condition. > > > > It is only when !TREE_SIDE_EFFECTS (else_). > > I think that is about having labels in the then_/else_ blocks and > > gotos jumping into those from outside. > > But IF_STMT_CONSTEVAL_P guarantees that is not the case (and we verify > > earlier), so we can do that (and in fact, for IF_STMT_CONSTEVAL_P have > > to, because then_ could contain consteval calls not constant expression > > evaluated). > > I guess we could do that for IF_STMT_CONSTEXPR_P too, that also > > doesn't allow gotos/switch into the branches. For this too. > > > > > > --- gcc/cp/call.c.jj2021-06-09 21:54:39.436194489 +0200 > > > > +++ gcc/cp/call.c 2021-06-10 09:49:35.949556470 +0200 > > > > @@ -8840,6 +8840,7 @@ immediate_invocation_p (tree fn, int nar > > > > || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)) > > > > && (current_binding_level->kind != sk_function_parms > > > > || !current_binding_level->immediate_fn_ctx_p) > > > > + && !in_immediate_fn_ctx_p > > > > > > Now that we have this flag, shouldn't we set it in actual immediate > > > function > > > context? Or rename the flag to match when it's actually set. > > > > I guess I can try that. 
Though, I'm not sure if we could also get rid of > > the current_binding_level->immediate_fn_ctx_p for sk_function_parms case. Had a look at this, but could not handle it. So instead I've renamed it to in_consteval_if_p. Ok for trunk if it passes bootstrap/regtest? 2021-06-11 Jakub Jelinek PR c++/100974 - P1938R3 - if consteval gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Predefine __cpp_if_consteval for -std=c++2b. gcc/cp/ * cp-tree.h (struct saved_scope): Add consteval_if_p member. Formatting fix for the discarded_stmt comment. (in_consteval_if_p, IF_STMT_CONSTEVAL_P): Define. * parser.c (cp_parser_lambda_expression): Temporarily disable in_consteval_if_p when parsing lambda body. (cp_parser_selection_statement): Parse consteval if. * decl.c (struct named_label_entry): Add in_consteval_if member. (level_for_consteval_if): New function. (poplevel_named_label_1, check_previous_goto_1, check_goto): Handle consteval if. * constexpr.c (cxx_eval_builtin_function_call): Clarify in comment why CP_BUILT_IN_IS_CONSTANT_EVALUATED needs to set *non_constant_p for !ctx->manifestly_const_eval. (cxx_eval_conditional_expression): For IF_STMT_CONSTEVAL_P evaluate condition as if it was __builtin_is_constant_evaluated call. (potential_constant_expression_1): For IF_STMT_CONSTEVAL_P always recurse on both branches. * cp-gimplify.c (genericize_if_stmt): Genericize IF_STMT_CONSTEVAL_P as the else branch. * pt.c (tsubst_expr) : Copy IF_STMT_CONSTEVAL_P. Temporarily set in_consteval_if_p when recursing on IF_STMT_CONSTEVAL_P then branch. (tsubst_lambda_expr): Temporarily disable in_consteval_if_p when instantiating lambda body. * call.c (immediate_invocation_p): Return false when in_consteval_if_p. gcc/testsuite/ * g++.dg/cpp23/consteval-if1.C: New test. * g++.dg/cpp23/consteval-if2.C: New test. * g++.dg/cpp23/consteval-if3.C: New test. * g++.dg/cpp23/consteval-if4.C: New test. * g++.dg/cpp23/consteval-if5.C: New test. * g++.dg/cpp23/consteval-if6.C: New test.
* g++.dg/cpp23/consteval-if7.C: New test. * g++.dg/cpp23/consteval-if8.C: New test. * g++.dg/cpp23/consteval-if9.C: New test. * g++.dg/cpp23/consteval-if10.C: New test. * g++.dg/cpp23/feat-cxx2b.C: Add __cpp_if_consteval tests. --- gcc/c-family/c-cppbuiltin.c.jj 2021-06-10 19:56:20.584335765 +0200 +++ gcc/c-family/c-cppbuiltin.c 2021-06-11 11:34:15.408578098 +0200 @@ -1029,6 +1029,7 @@ c_cpp_builtins (cpp_reader *pfile) { /* Set feature test macros for C++23. */ cpp_define (pfile, "__cpp_size_t_suffix=202011L"); + cpp_define (pfile, "__cpp_if_consteval=202106L"); } if (flag_concepts) { --- gcc/cp/cp-tree.h.jj 2
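As the ChangeLog notes, the patch evaluates an IF_STMT_CONSTEVAL_P condition as if it were a `__builtin_is_constant_evaluated ()` call, so the new C++23 `if consteval` behaves like this pre-C++23 sketch (function name is illustrative):

```cpp
#include <cassert>

// Approximation of C++23 `if consteval` (P1938R3) via the builtin the
// patch maps it to: during constant evaluation the first branch is
// taken, at runtime the second.
constexpr int twice_or_inc (int x)
{
  if (__builtin_is_constant_evaluated ())
    return x * 2;   // constant-evaluation path
  return x + 1;     // runtime path
}
```

Unlike the builtin, `if consteval` additionally makes the taken branch an immediate function context, which is what the in_consteval_if_p plumbing above tracks.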
[PATCH] i386: Try to avoid variable permutation instruction [PR101021]
Some permutations can be implemented without costly PSHUFB instruction, e.g.: { 8,9,10,11,12,13,14,15, 0,1,2,3,4,5,6,7 } with PALIGNR, { 0,1,2,3, 4,5,6,7, 4,5,6,7, 12,13,14,15 } with PSHUFD, { 0,1, 2,3, 2,3, 6,7, 8,9,10,11,12,13,14,15 } with PSHUFLW and { 0,1,2,3,4,5,6,7, 8,9, 10,11, 10,11, 14,15 } with PSHUFHW. All these instructions have constant shuffle control mask and do not need to load shuffle mask from a memory to a temporary XMM register. 2021-06-11 Uroš Bizjak gcc/ PR target/101021 * config/i386/i386-expand.c (expand_vec_perm_pshufb): Return false if the permutation can be implemented with constant permutation instruction in wider mode. (canonicalize_vector_int_perm): Move above expand_vec_perm_pshufb. Handle V8QImode and V4HImode. gcc/testsuite/ PR target/101021 * gcc.target/i386/pr101021-1.c: New test. * gcc.target/i386/pr101021-2.c: Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Additionally tested with: GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx dg-torture.exp=vshuf*.c' Pushed to master. Uros. diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 9ee5257adf9..2fa3a18dc6a 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -17354,6 +17354,59 @@ expand_vec_perm_vpermil (struct expand_vec_perm_d *d) return true; } +/* For V*[QHS]Imode permutations, check if the same permutation + can't be performed in a 2x, 4x or 8x wider inner mode. 
*/ + +static bool +canonicalize_vector_int_perm (const struct expand_vec_perm_d *d, + struct expand_vec_perm_d *nd) +{ + int i; + machine_mode mode = VOIDmode; + + switch (d->vmode) +{ +case E_V8QImode: mode = V4HImode; break; +case E_V16QImode: mode = V8HImode; break; +case E_V32QImode: mode = V16HImode; break; +case E_V64QImode: mode = V32HImode; break; +case E_V4HImode: mode = V2SImode; break; +case E_V8HImode: mode = V4SImode; break; +case E_V16HImode: mode = V8SImode; break; +case E_V32HImode: mode = V16SImode; break; +case E_V4SImode: mode = V2DImode; break; +case E_V8SImode: mode = V4DImode; break; +case E_V16SImode: mode = V8DImode; break; +default: return false; +} + for (i = 0; i < d->nelt; i += 2) +if ((d->perm[i] & 1) || d->perm[i + 1] != d->perm[i] + 1) + return false; + nd->vmode = mode; + nd->nelt = d->nelt / 2; + for (i = 0; i < nd->nelt; i++) +nd->perm[i] = d->perm[2 * i] / 2; + if (GET_MODE_INNER (mode) != DImode) +canonicalize_vector_int_perm (nd, nd); + if (nd != d) +{ + nd->one_operand_p = d->one_operand_p; + nd->testing_p = d->testing_p; + if (d->op0 == d->op1) + nd->op0 = nd->op1 = gen_lowpart (nd->vmode, d->op0); + else + { + nd->op0 = gen_lowpart (nd->vmode, d->op0); + nd->op1 = gen_lowpart (nd->vmode, d->op1); + } + if (d->testing_p) + nd->target = gen_raw_REG (nd->vmode, LAST_VIRTUAL_REGISTER + 1); + else + nd->target = gen_reg_rtx (nd->vmode); +} + return true; +} + /* Return true if permutation D can be performed as VMODE permutation instead. */ @@ -17391,6 +17444,7 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) unsigned i, nelt, eltsz, mask; unsigned char perm[64]; machine_mode vmode = V16QImode; + struct expand_vec_perm_d nd; rtx rperm[64], vperm, target, op0, op1; nelt = d->nelt; @@ -17539,6 +17593,10 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return false; } + /* Try to avoid variable permutation instruction. 
*/ + if (canonicalize_vector_int_perm (d, &nd) && expand_vec_perm_1 (&nd)) +return false; + if (d->testing_p) return true; @@ -17617,57 +17675,6 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d) return true; } -/* For V*[QHS]Imode permutations, check if the same permutation - can't be performed in a 2x, 4x or 8x wider inner mode. */ - -static bool -canonicalize_vector_int_perm (const struct expand_vec_perm_d *d, - struct expand_vec_perm_d *nd) -{ - int i; - machine_mode mode = VOIDmode; - - switch (d->vmode) -{ -case E_V16QImode: mode = V8HImode; break; -case E_V32QImode: mode = V16HImode; break; -case E_V64QImode: mode = V32HImode; break; -case E_V8HImode: mode = V4SImode; break; -case E_V16HImode: mode = V8SImode; break; -case E_V32HImode: mode = V16SImode; break; -case E_V4SImode: mode = V2DImode; break; -case E_V8SImode: mode = V4DImode; break; -case E_V16SImode: mode = V8DImode; break; -default: return false; -} - for (i = 0; i < d->nelt; i += 2) -if ((d->perm[i] & 1) || d->perm[i + 1] != d->perm[i] + 1) - return false; - nd->vmode = mode; - nd->nelt = d->nelt / 2; - for (i = 0; i < nd->nelt; i++) -nd->perm[i] = d->perm[2 * i] / 2; - if (GET_MODE_INNER (mode) != DImode
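The core of canonicalize_vector_int_perm above is a pairing check plus index halving; a standalone sketch of that step (names are ours):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the narrowing step in canonicalize_vector_int_perm: a
// permutation can be re-expressed on half as many elements of twice
// the width iff every even position selects an aligned, ascending pair,
// i.e. perm[i] is even and perm[i + 1] == perm[i] + 1.
static bool
narrow_perm (const std::vector<std::size_t> &perm,
             std::vector<std::size_t> &out)
{
  for (std::size_t i = 0; i < perm.size (); i += 2)
    if ((perm[i] & 1) || perm[i + 1] != perm[i] + 1)
      return false;
  out.resize (perm.size () / 2);
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = perm[2 * i] / 2;   // same pair, half the index
  return true;
}
```

For example, the PALIGNR byte permutation { 8,9,10,11,12,13,14,15, 0,1,2,3,4,5,6,7 } narrows to { 4,5,6,7, 0,1,2,3 } on 16-bit elements and eventually to { 1, 0 } on 64-bit elements, so no variable-mask PSHUFB is needed.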
Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation for vector IOR [PR101008]
On Fri, 11 Jun 2021, Jakub Jelinek wrote: > Hi! > > simplify_relational_operation callees typically return just const0_rtx > or const_true_rtx and then simplify_relational_operation attempts to fix > that up if the comparison result has vector mode, or floating mode, > or punt if it has scalar mode and vector mode operands (it doesn't know how > exactly to deal with the scalar masks). > But, simplify_logical_relational_operation has a special case, where > it attempts to fold (x < y) | (x >= y) etc. and if it determines it is > always true, it just returns const_true_rtx, without doing the dances that > simplify_relational_operation does. > That results in an ICE on the following testcase, where such folding happens > during expansion (of debug stmts into DEBUG_INSNs) and we ICE because > all of sudden a VOIDmode rtx appears where it expects a vector (V4SImode) > rtx. > > The following patch fixes that by moving the adjustement into a separate > helper routine and using it from both simplify_relational_operation and > simplify_logical_relational_operation. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Richard. > 2021-06-11 Jakub Jelinek > > PR rtl-optimization/101008 > * simplify-rtx.c (relational_result): New function. > (simplify_logical_relational_operation, > simplify_relational_operation): Use it. > > * gcc.dg/pr101008.c: New test. > > --- gcc/simplify-rtx.c.jj 2021-05-04 21:02:24.0 +0200 > +++ gcc/simplify-rtx.c2021-06-10 13:56:48.946628822 +0200 > @@ -2294,6 +2294,53 @@ comparison_code_valid_for_mode (enum rtx > gcc_unreachable (); > } > } > + > +/* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right > + false/true value of comparison with MODE where comparison operands > + have CMP_MODE. 
*/ > + > +static rtx > +relational_result (machine_mode mode, machine_mode cmp_mode, rtx res) > +{ > + if (SCALAR_FLOAT_MODE_P (mode)) > +{ > + if (res == const0_rtx) > +return CONST0_RTX (mode); > +#ifdef FLOAT_STORE_FLAG_VALUE > + REAL_VALUE_TYPE val = FLOAT_STORE_FLAG_VALUE (mode); > + return const_double_from_real_value (val, mode); > +#else > + return NULL_RTX; > +#endif > +} > + if (VECTOR_MODE_P (mode)) > +{ > + if (res == const0_rtx) > + return CONST0_RTX (mode); > +#ifdef VECTOR_STORE_FLAG_VALUE > + rtx val = VECTOR_STORE_FLAG_VALUE (mode); > + if (val == NULL_RTX) > + return NULL_RTX; > + if (val == const1_rtx) > + return CONST1_RTX (mode); > + > + return gen_const_vec_duplicate (mode, val); > +#else > + return NULL_RTX; > +#endif > +} > + /* For vector comparison with scalar int result, it is unknown > + if the target means here a comparison into an integral bitmask, > + or comparison where all comparisons true mean const_true_rtx > + whole result, or where any comparisons true mean const_true_rtx > + whole result. For const0_rtx all the cases are the same. */ > + if (VECTOR_MODE_P (cmp_mode) > + && SCALAR_INT_MODE_P (mode) > + && res == const_true_rtx) > +return NULL_RTX; > + > + return res; > +} > > /* Simplify a logical operation CODE with result mode MODE, operating on OP0 > and OP1, which should be both relational operations. 
Return 0 if no such > @@ -2329,7 +2376,7 @@ simplify_context::simplify_logical_relat >int mask = mask0 | mask1; > >if (mask == 15) > -return const_true_rtx; > +return relational_result (mode, GET_MODE (op0), const_true_rtx); > >code = mask_to_comparison (mask); > > @@ -5315,51 +5362,7 @@ simplify_context::simplify_relational_op > >tem = simplify_const_relational_operation (code, cmp_mode, op0, op1); >if (tem) > -{ > - if (SCALAR_FLOAT_MODE_P (mode)) > - { > - if (tem == const0_rtx) > -return CONST0_RTX (mode); > -#ifdef FLOAT_STORE_FLAG_VALUE > - { > - REAL_VALUE_TYPE val; > - val = FLOAT_STORE_FLAG_VALUE (mode); > - return const_double_from_real_value (val, mode); > - } > -#else > - return NULL_RTX; > -#endif > - } > - if (VECTOR_MODE_P (mode)) > - { > - if (tem == const0_rtx) > - return CONST0_RTX (mode); > -#ifdef VECTOR_STORE_FLAG_VALUE > - { > - rtx val = VECTOR_STORE_FLAG_VALUE (mode); > - if (val == NULL_RTX) > - return NULL_RTX; > - if (val == const1_rtx) > - return CONST1_RTX (mode); > - > - return gen_const_vec_duplicate (mode, val); > - } > -#else > - return NULL_RTX; > -#endif > - } > - /* For vector comparison with scalar int result, it is unknown > - if the target means here a comparison into an integral bitmask, > - or comparison where all comparisons true mean const_true_rtx > - whole result, or where any comparisons true mean const_true_r
Re: [PATCH 04/11 v2] libstdc++: Make use of __builtin_bit_cast
While testing newer patches I found several missing conversions from __bit_cast to simd_bit_cast in this patch (i.e. where bit casting to / from fixed_size was sometimes required). Corrected patch attached. From: Matthias Kretz The __bit_cast function was a hack to achieve what __builtin_bit_cast can do, therefore use __builtin_bit_cast if possible. However, __builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since it isn't trivially copyable (in the language sense — in principle it is). Therefore add __proposed::simd_bit_cast to enable the use case required in the test framework. Signed-off-by: Matthias Kretz libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h (__bit_cast): Implement via __builtin_bit_cast #if available. (__proposed::simd_bit_cast): Add overloads for simd and simd_mask, which use __builtin_bit_cast (or __bit_cast #if not available), which return an object of the requested type with the same bits as the argument. * include/experimental/bits/simd_math.h: Use simd_bit_cast instead of __bit_cast to allow casts to fixed_size_simd. (copysign): Remove branch that was only required if __bit_cast cannot be constexpr. * testsuite/experimental/simd/tests/bits/test_values.h: Switch from __bit_cast to __proposed::simd_bit_cast since the former will not cast fixed_size objects anymore. --- libstdc++-v3/include/experimental/bits/simd.h | 57 ++- .../include/experimental/bits/simd_math.h | 36 +--- .../simd/tests/bits/test_values.h | 8 +-- 3 files changed, 75 insertions(+), 26 deletions(-) -- ── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ──diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 163f1b574e2..852d0b62012 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -1598,7 +1598,9 @@ template _GLIBCXX_SIMD_INTRINSIC constexpr _To __bit_cast(const _From __x) { -// TODO: implement with / replace by __builtin_bit_cast ASAP +#if __has_builtin(__builtin_bit_cast) +return __builtin_bit_cast(_To, __x); +#else static_assert(sizeof(_To) == sizeof(_From)); constexpr bool __to_is_vectorizable = is_arithmetic_v<_To> || is_enum_v<_To>; @@ -1629,6 +1631,7 @@ template reinterpret_cast(&__x), sizeof(_To)); return __r; } +#endif } // }}} @@ -2900,6 +2903,58 @@ template (__x)}; } + +template + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd<_Up, _Abi>& __x) + { +using _Tp = typename _To::value_type; +using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; +using _From = simd<_Up, _Abi>; +using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember; +// with concepts, the following should be constraints +static_assert(sizeof(_To) == sizeof(_From)); +static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>); +static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>); +#if __has_builtin(__builtin_bit_cast) +return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))}; +#else +return {__private_init, __bit_cast<_ToMember>(__data(__x))}; +#endif + } + +template + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd_mask<_Up, _Abi>& __x) + { +using _From = simd_mask<_Up, _Abi>; +static_assert(sizeof(_To) == sizeof(_From)); +static_assert(is_trivially_copyable_v<_From>); +// _To can be simd, 
specifically simd> in which case _To is not trivially +// copyable. +if constexpr (is_simd_v<_To>) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + static_assert(is_trivially_copyable_v<_ToMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __x)}; +#else + return {__private_init, __bit_cast<_ToMember>(__x)}; +#endif + } +else + { + static_assert(is_trivially_copyable_v<_To>); +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else + return __bit_cast<_To>(__x); +#endif + } + } } // namespace __proposed // simd_cast {{{2 diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index d954e761eee..afd8b5a028f 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experi
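The `#else` branches above fall back to the pre-C++20 idiom for a bit cast; in isolation (with illustrative names, not the libstdc++ internals) it amounts to:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// memcpy-based fallback used when __builtin_bit_cast is unavailable:
// copying the object representation is the well-defined way to
// reinterpret trivially copyable types of equal size.
template <typename To, typename From>
To
bit_cast_fallback (const From &x)
{
  static_assert (sizeof (To) == sizeof (From), "sizes must match");
  To r;
  std::memcpy (&r, &x, sizeof (To));
  return r;
}
```

The builtin has the advantage of being usable in constant expressions, which is why the patch prefers it when `__has_builtin(__builtin_bit_cast)` holds.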
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
On Tue, 8 Jun 2021, Kees Cook wrote: > On Tue, Jun 08, 2021 at 09:41:38AM +0200, Richard Biener wrote: > > On Mon, 7 Jun 2021, Qing Zhao wrote: > > > > > Hi, > > > > > > > On Jun 7, 2021, at 2:53 AM, Richard Biener wrote: > > > > > > > >> > > > >> To address the above suggestion: > > > >> > > > >> My study shows: the call to __builtin_clear_padding is expanded during > > > >> gimplification phase. > > > >> And there is no __bultin_clear_padding expanding during rtx expanding > > > >> phase. > > > >> However, for -ftrivial-auto-var-init, padding initialization should be > > > >> done both in gimplification phase and rtx expanding phase. > > > >> since the __builtin_clear_padding might not be good for rtx expanding, > > > >> reusing __builtin_clear_padding might not work. > > > >> > > > >> Let me know if you have any more comments on this. > > > > > > > > Yes, I didn't suggest to literally emit calls to > > > > __builtin_clear_padding > > > > but instead to leverage the lowering code, more specifically share the > > > > code that figures _what_ is to be initialized (where the padding is) > > > > and eventually the actual code generation pieces. That might need some > > > > refactoring but the code where padding resides should be present only > > > > a single time (since it's quite complex). > > > > > > Okay, I see your point here. > > > > > > > > > > > Which is also why I suggested to split out the padding initialization > > > > bits to a separate patch (and option). > > > > > > Personally, I am okay with splitting padding initialization from this > > > current patch, > > > Kees, what’s your opinion on this? i.e, the current > > > -ftrivial-auto-var-init will NOT initialize padding, we will add another > > > option to > > > Explicitly initialize padding. > > > > It would also be possible to have -fauto-var-init, -fauto-var-init-padding > > and have -ftrivial-auto-var-init for clang compatibility enabling both. > > Sounds good to me! 
> > Or -fauto-var-init={zero,pattern,padding} and allow > > -fauto-var-init=pattern,padding to be specified. Note there's also > > padding between auto variables on the stack - that "trailing" > > padding isn't initialized either? (yes, GCC sorts variables to minimize > > that padding) For example for > > > > void foo() > > { > > char a[3]; > > bar (a); > > } > > > > there's 12 bytes padding after 'a', shouldn't we initialize that? If not, > > why's other padding important to be initialized? > > This isn't a situation that I'm aware of causing real-world problems. > The issues have all come from padding within an addressable object. I > haven't tested Clang's behavior on this (and I have no kernel tests for > this padding), but I do check for trailing padding, like: > > struct test_trailing_hole { > char *one; > char *two; > char *three; > char four; > /* "sizeof(unsigned long) - 1" byte padding hole here. */ > }; Any justification why tail padding for struct foo { double x; char y[3]; } a; is important but not for char y[3]; ? It does look like an odd inconsistency to me. Richard.
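The trailing hole in the struct quoted above can be measured directly; a sketch (the arithmetic assumes a typical ABI where pointers are self-aligned):

```cpp
#include <cassert>
#include <cstddef>

// The struct from the thread: three pointers and one char leave
// sizeof (char *) - 1 bytes of trailing padding, which is what a
// padding-initialization option would have to cover.
struct test_trailing_hole
{
  char *one;
  char *two;
  char *three;
  char four;
};

// Bytes of padding after the last member.
constexpr std::size_t trailing_padding
  = sizeof (test_trailing_hole)
    - (offsetof (test_trailing_hole, four) + sizeof (char));
```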
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
On Thu, 10 Jun 2021, Qing Zhao wrote: > Hi, Richard, > > I need more discussion on the following comments you raised: > > > On May 26, 2021, at 6:18 AM, Richard Biener wrote: > > > > +/* Expand the IFN_DEFERRED_INIT function according to its second > > argument. */ > > +static void > > +expand_DEFERRED_INIT (internal_fn, gcall *stmt) > > +{ > > + tree var = gimple_call_lhs (stmt); > > + tree init = NULL_TREE; > > + enum auto_init_type init_type > > += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); > > + > > + switch (init_type) > > +{ > > +default: > > + gcc_unreachable (); > > +case AUTO_INIT_PATTERN: > > + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); > > + expand_assignment (var, init, false); > > + break; > > +case AUTO_INIT_ZERO: > > + init = build_zero_cst (TREE_TYPE (var)); > > + expand_assignment (var, init, false); > > + break; > > +} > > > > I think actually building build_pattern_cst_for_auto_init can generate > > massive garbage and for big auto vars code size is also a concern and > > ideally on x86 you'd produce rep movq. So I don't think going > > via expand_assignment is good. Instead you possibly want to lower > > .DEFERRED_INIT to MEMs following expand_builtin_memset and > > eventually enhance that to allow storing pieces larger than a byte. > > When I tried to lower .DEFERRED_INIT to MEMs for “AUTO_INIT_PATTERN”, I have > the following questions: > > 1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we > basically initialize the whole memory covering the > auto variable, including paddings. Right? Yes. > 2. Only when the value that is used to initialization has a repeated >byte-pattern, we can lower it through “memset”. Otherwise, If the >value that is used to initialization does Not have a repeated >byte-pattern, we can NOT lower it through “memset”, right? Yes. This is why I said you should do it _similar_ to how memcpy is implemented. 
OTOH I don't see a good reason to support patterns that are bigger than a byte ... > Currently, for the values that are used to initialize for > “AUTO_INIT_PATTERN”, we have: > > /* The following value is a guaranteed unmappable pointer value and has a > repeated byte-pattern which makes it easier to synthesize. We use it for > pointers as well as integers so that aggregates are likely to be > initialized with this repeated value. */ > uint64_t largevalue = 0xull; > /* For 32-bit platforms it's a bit trickier because, across systems, only > the > zero page can reasonably be expected to be unmapped, and even then we > need > a very low address. We use a smaller value, and that value sadly doesn't > have a repeated byte-pattern. We don't use it for integers. */ > uint32_t smallvalue = 0x00AA; > > In additional to the above, for BOOLEAN_TYPE: > > case BOOLEAN_TYPE: > /* We think that initializing the boolean variable to 0 other than 1 > is better even for pattern initialization. */ > > Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a > repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer > types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” > initialization through “memset” is not always possible. > > Let me know if I miss anything in the above. Do you have other suggestions? The main point is that you need to avoid building the explicit initializer only to have it consumed by assignment expansion. If you want to keep all the singing and dancing (as opposed to maybe initializing with a 0x1 byte pattern) then I think for efficiency you still want to block-initialize the variable and then only fixup the special fields. But as said, all this is quite over-designed IMHO and simply zeroing everything would be much simpler and good enough. Richard.
Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus
Hi Bernhard, Please find attached my latest patch, it includes (hopefully) all your feedback. Thank you for comments, Claudiu >From 03075b3d9194120d7adb3cdc2aa0f58e3ea9dd1d Mon Sep 17 00:00:00 2001 From: Claudiu Zissulescu Date: Wed, 21 Oct 2020 16:11:43 +0300 Subject: [PATCH] arc: Add --with-fpu support for ARCv2 cpus Support for a compile-time default FPU. The --with-fpu configuration option is ignored if -mfpu compiler option is specified. The FPU options are only available for ARCv2 cpus. gcc/ -mm-dd Claudiu Zissulescu * config.gcc (arc): Add support for with_cpu option. * config/arc/arc.h (OPTION_DEFAULT_SPECS): Add fpu. Signed-off-by: Claudiu Zissulescu --- gcc/config.gcc | 44 +++- gcc/config/arc/arc.h | 4 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 13c2004e3c52..5581dae88b52 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4258,18 +4258,52 @@ case "${target}" in ;; arc*-*-*) - supported_defaults="cpu" + supported_defaults="cpu fpu" + new_cpu=hs38_linux if [ x"$with_cpu" = x ] \ - || grep "^ARC_CPU ($with_cpu," \ - ${srcdir}/config/arc/arc-cpus.def \ - > /dev/null; then + || grep -q -E "^ARC_CPU[[:blank:]]*\($with_cpu," \ + ${srcdir}/config/arc/arc-cpus.def + then # Ok - true + new_cpu=$with_cpu else echo "Unknown cpu used in --with-cpu=$with_cpu" 1>&2 exit 1 fi + + # see if --with-fpu matches any of the supported FPUs + case "$with_fpu" in + "") + # OK + ;; + fpus | fpus_div | fpus_fma | fpus_all) + # OK if em or hs + flags_ok="[emhs]+" + ;; + fpuda | fpuda_div | fpuda_fma | fpuda_all) + # OK only em + flags_ok="em" + ;; + fpud | fpud_div | fpud_fma | fpud_all) + # OK only hs + flags_ok="hs" + ;; + *) + echo "Unknown floating point type used in "\ + "--with-fpu=$with_fpu" 1>&2 + exit 1 + ;; + esac + + if [ -n "$flags_ok" ] \ + && ! 
grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:blank:]]*$flags_ok," \ + ${srcdir}/config/arc/arc-cpus.def + then + echo "Unknown floating point type used in "\ + "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2 + exit 1 + fi ;; csky-*-*) diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index 2ccebfa7afe6..e78f6e8202b3 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -100,7 +100,11 @@ extern const char *arc_cpu_to_as (int argc, const char **argv); "%:cpu_to_as(%{mcpu=*:%*}) %{mspfp*} %{mdpfp*} " \ "%{mfpu=fpuda*:-mfpuda} %{mcode-density}" +/* Support for a compile-time default CPU and FPU. The rules are: + --with-cpu is ignored if -mcpu, mARC*, marc*, mA7, mA6 are specified. + --with-fpu is ignored if -mfpu is specified. */ #define OPTION_DEFAULT_SPECS \ + {"fpu", "%{!mfpu=*:-mfpu=%(VALUE)}"}, \ {"cpu", "%{!mcpu=*:%{!mARC*:%{!marc*:%{!mA7:%{!mA6:-mcpu=%(VALUE)}" } #ifndef DRIVER_ENDIAN_SELF_SPECS -- 2.31.1
[PATCH] tree-optimization/101026 - fix SLP re-association
Since we cannot yet encode the operation in the SLP node itself but need a representative stmt require an existing one for now to avoid the need to build a fake GIMPLE stmt. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-06-11 Richard Biener PR tree-optimization/101026 * tree-vect-slp.c (vect_build_slp_tree_2): Make sure we have a representative for the associated chain nodes. * gfortran.dg/pr101026.f: New testcase. --- gcc/testsuite/gfortran.dg/pr101026.f | 11 +++ gcc/tree-vect-slp.c | 4 +++- 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/pr101026.f diff --git a/gcc/testsuite/gfortran.dg/pr101026.f b/gcc/testsuite/gfortran.dg/pr101026.f new file mode 100644 index 000..9576d8802ca --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr101026.f @@ -0,0 +1,11 @@ +! { dg-do compile } +! { dg-options "-Ofast -frounding-math" } + SUBROUTINE PASSB4 (CC,CH) + DIMENSION CC(IDO,4,L1), CH(IDO,L1,*) + DO 103 I=2,IDO,2 +TI4 = CC0-CC(I,4,K) +CI4 = TI1-TI4 +CH(I-1,K,4) = CI4 +CH(I,K,4) = CI4 + 103CONTINUE + END diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 6237a61ffd4..897bd6f37b9 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1860,7 +1860,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, chains.quick_push (chain.copy ()); chain.truncate (0); } - if (chains.length () == group_size) + if (chains.length () == group_size + /* We cannot yet use SLP_TREE_CODE to communicate the operation. */ + && op_stmt_info) { /* Now we have a set of chains with the same length. */ /* 1. pre-sort according to def_type and operation. */ -- 2.26.2
Re: [PATCH] PR tree-optimization/96392 Optimize x+0.0 if x is an integer
On Thu, Jun 10, 2021 at 9:45 PM Roger Sayle wrote: > > > The patch implements a missed optimization enhancement. Under usual > IEEE rules, x+0.0 can't be simplified to x when x might potentially > be an IEEE minus zero (-0.0). The current logic in the middle-end > checks whether the type of x should honor signed zeros, but with this > patch we introduce tree_expr_maybe_real_minus_zero_p that allows us > to confirm that the value can't possibly be -0.0, for example, the result > of a conversion from an integer type, or the result of fabs (or has a > type that doesn't honor signed zero). > > Whilst modifying match.pd, I also converted some additional folding > transformations from "testing the type" to "testing the value". > > The following patch has been tested on x86_64-pc-linux-gnu with > a make bootstrap and make -k check with no new failures. > > Ok for mainline? OK. Maybe we can at some point record & propagate these FP value predicates on SSA names just similar to SSA_NAME_RANGE_INFO ... Thanks, Richard. > > 2020-06-10 Roger Sayle > > gcc/ChangeLog > PR tree-optimization/96392 > * fold-const.c (fold_real_zero_addition_p): Take both arguments > of the addition or subtraction, not just the zero. Use this > other argument in tests for signaling NaNs and signed zeros. > (tree_expr_maybe_real_minus_zero_p): New predicate. > * fold-const.h (fold_real_zero_addition_p): Update prototype. > (tree_expr_maybe_real_minus_zero_p): New function prototype. > * match.pd: Update calls to fold_real_zero_addition_p. > Replace HONOR_NANS with tree_expr_maybe_nan_p. > Replace HONOR_SIGNED_ZEROS with tree_expr_maybe_real_minus_zero_p. > Replace HONOR_SNANS with tree_expr_maybe_signaling_nan_p. > * tree-ssa-reassoc.c (eliminate_using_constants): Update > call to fold_real_zero_addition_p. > > gcc/testsuite/ChangeLog > PR tree-optimization/96392 > * gcc.dg/pr96392.c: New test. > > Roger > -- > Roger Sayle > NextMove Software > Cambridge, UK >
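The distinction the patch draws between "testing the type" and "testing the value" can be illustrated with a hedged C sketch (not part of the patch itself): an addition of +0.0 is only foldable when the other operand provably cannot be -0.0, for instance because it results from an integer conversion.

```c
#include <math.h>

/* Sketch: with the new tree_expr_maybe_real_minus_zero_p predicate,
   the middle-end may fold x + 0.0 to x when x provably cannot be -0.0.  */

double
add_zero_from_int (int i)
{
  double x = (double) i;   /* an integer conversion never yields -0.0 */
  return x + 0.0;          /* safe to fold to x */
}

double
add_zero_general (double x)
{
  /* Not foldable under IEEE rules: if x is -0.0, the addition must
     produce +0.0, so the result is not always bitwise equal to x.  */
  return x + 0.0;
}
```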
[GCC][Patch] arm: Fix the mve multilib for the broken cmse support (pr99939).
Hi Richard, I have addressed all your review comments in https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571739.html in the following patch. The current CMSE support in the multilib build for "-march=armv8.1-m.main+mve -mfloat-abi=hard -mfpu=auto" is broken as specified in PR99939 and this patch fixes the issue. Regression tested on arm-none-eabi and found no regressions. Ok for master? and Ok for GCC-10 branch? Regards, Srinath. gcc/testsuite/ChangeLog: 2021-06-11 Srinath Parvathaneni * gcc.target/arm/cmse/cmse-18.c: Add separate scan-assembler directives check for target is v8.1-m.main+mve or not before comparing the assembly output. * gcc.target/arm/cmse/cmse-20.c: New test. libgcc/ChangeLog: 2021-06-11 Srinath Parvathaneni * config/arm/cmse_nonsecure_call.S: Add __ARM_FEATURE_MVE macro. * config/arm/t-arm: To link cmse.o and cmse_nonsecure_call.o on passing -mcmse option. ### Attachment also inlined for ease of reply### diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c index e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c @@ -8,4 +8,5 @@ void bar(f func, int a) func(a); } -/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { target { ! arm_v8_1m_mve_ok } } } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" { target { arm_v8_1m_mve_ok } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c new file mode 100644 index ..08e89bff6378f1f96950fc40f3ab3946bd433773 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c @@ -0,0 +1,28 @@ +/* This test is executed only if the execution engine supports CMSE instructions. 
*/ +/* { dg-options "--save-temps -mcmse -Wl,--section-start,.gnu.sgstubs=0x0040" } */ + +#include +#include +#include + +void __attribute__((cmse_nonsecure_entry)) +secure_fun (int a, int *p) +{ + void *b = cmse_check_address_range ((void *)p, a, 1); + + if (b == NULL) + __builtin_abort (); + printf("%d", *((int *)b)); +} + +int +main (void) +{ + int *ptr; + int size = 1; + ptr = (int *) calloc (1, sizeof(int *)); + *ptr = 1315852292; + secure_fun (size, ptr); + free (ptr); + return 0; +} diff --git a/libgcc/config/arm/cmse_nonsecure_call.S b/libgcc/config/arm/cmse_nonsecure_call.S index 146f3ed52e9c7e915e5dbd9b70624ec3bd7cd5b5..00830ade98ea650c328c709d5d308fbc96f7f21c 100644 --- a/libgcc/config/arm/cmse_nonsecure_call.S +++ b/libgcc/config/arm/cmse_nonsecure_call.S @@ -25,7 +25,7 @@ .syntax unified #ifdef __ARM_PCS_VFP -# if __ARM_FP & 0x8 +# if (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1) .fpu fpv5-d16 # else .fpu fpv4-sp-d16 @@ -59,7 +59,7 @@ vmov s24, s25, r5, r5 vmov s26, s27, r5, r5 vmov s28, s29, r5, r5 vmov s30, s31, r5, r5 -#elif __ARM_FP & 0x08 +#elif (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1) vmov.f64d9, d8 vmov.f64d10, d8 vmov.f64d11, d8 diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm index 3625a2590beec4e4e0e0881be9ad284c595c7190..c1553d4e5d80751b13dc2e9c9e36d5ebe82e5f8c 100644 --- a/libgcc/config/arm/t-arm +++ b/libgcc/config/arm/t-arm @@ -3,18 +3,17 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi _thumb1_case_shi \ _thumb1_case_uhi _thumb1_case_si _speculation_barrier HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E - /dev/null 2>/dev/null; echo $$?),0) CMSE_OPTS:=-mcmse endif ifdef HAVE_CMSE -ifndef HAVE_V81M + libgcc-objects += cmse.o cmse_nonsecure_call.o cmse.o: $(srcdir)/config/arm/cmse.c $(gcc_compile) -c $(CMSE_OPTS) $< + cmse_nonsecure_call.o: $(srcdir)/config/arm/cmse_nonsecure_call.S $(gcc_compile) -c $< endif -endif diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c index e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c @@ -8,4 +8,5 @@ void bar(f func, int a) func(a); } -/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { target { ! arm_v8_1m_mve_ok } } } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" { target { arm_v8_1m_mve_ok } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c new file mode 100644 index 0
[PATCH] tree-optimization/101028 - fix endless SLP reassoc discovery
This fixes a missing clearing of mismatched lanes from the fatal fail path in SLP reassoc discovery in the most conservative way. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2021-06-11 Richard Biener PR tree-optimization/101028 * tree-vect-slp.c (vect_build_slp_tree_2): When SLP reassoc discovery fails fatally, mark appropriate lanes in matches[] so. * gcc.dg/pr101028.c: New testcase. --- gcc/testsuite/gcc.dg/pr101028.c | 34 + gcc/tree-vect-slp.c | 26 ++--- 2 files changed, 57 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr101028.c diff --git a/gcc/testsuite/gcc.dg/pr101028.c b/gcc/testsuite/gcc.dg/pr101028.c new file mode 100644 index 000..501e6af37cf --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr101028.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast" } */ + +typedef struct { + double x, y; +} PointInfo; + +typedef struct { + PointInfo point; +} PrimitiveInfo; + +int TraceBezier_alpha, TraceBezier_i; +double TraceBezier_weight; +PointInfo *TraceBezier_points; +PrimitiveInfo *TraceBezier_primitive_info; + +void TracePath() { + double *coefficients; + PointInfo point; + long j; + for (; TraceBezier_i; TraceBezier_i++) { +point.x = point.y = TraceBezier_alpha = 1.0; +j = 0; +for (; j < 4; j++) { + point.x += TraceBezier_alpha * coefficients[j] * + TraceBezier_primitive_info->point.x; + point.y += TraceBezier_alpha * TraceBezier_primitive_info->point.y; + TraceBezier_alpha *= TraceBezier_weight; + TraceBezier_primitive_info++; +} +TraceBezier_points[TraceBezier_i] = point; +TraceBezier_weight += 1.0; + } +} diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 897bd6f37b9..9ded58592c8 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1849,14 +1849,28 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, mismatch still hard-FAIL. */ if (chain_len == 0) hard_fail = false; + else + { + matches[lane] = false; + /* ??? 
We might want to process the other lanes, but +make sure to not give false matching hints to the +caller for lanes we did not process. */ + if (lane != group_size - 1) + matches[0] = false; + } break; } else if (chain_len == 0) chain_len = chain.length (); else if (chain.length () != chain_len) - /* ??? Here we could slip in magic to compensate with - neutral operands. */ - break; + { + /* ??? Here we could slip in magic to compensate with +neutral operands. */ + matches[lane] = false; + if (lane != group_size - 1) + matches[0] = false; + break; + } chains.quick_push (chain.copy ()); chain.truncate (0); } @@ -1905,6 +1919,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, dump_printf_loc (MSG_NOTE, vect_location, "giving up on chain due to mismatched " "def types\n"); + matches[lane] = false; + if (lane != group_size - 1) + matches[0] = false; goto out; } if (dt == vect_constant_def @@ -1983,6 +2000,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, dump_printf_loc (MSG_NOTE, vect_location, "failed to match up op %d\n", n); op_stmts.release (); + matches[lane] = false; + if (lane != group_size - 1) + matches[0] = false; goto out; } if (dump_enabled_p ()) -- 2.26.2
Re: [PATCH] rs6000: Support more short/char to float conversion
Hi Segher, on 2021/6/10 6:58 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jun 10, 2021 at 05:32:23PM +0800, Kewen.Lin wrote: >> +/* { dg-do compile { target lp64 } } */ > > One final thing: what requires lp64 here? Could you try without please? > The lp64 is required for lxsi[bh]zx and mvexts[bh]2d, without it -m32 testing will have some failures as below: FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mlxsibzx\\M FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mlxsihzx\\M FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mvextsb2d\\M FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mvextsh2d\\M > Okay for trunk with that considered. Thanks! > Thanks! It's committed in r12-1378. BR, Kewen
Re: [PATCH v2] c++: Add C++23 consteval if support - P1938R3 [PR100974]
On 6/11/21 6:28 AM, Jakub Jelinek wrote: On Thu, Jun 10, 2021 at 03:00:54PM -0400, Jason Merrill wrote: The second is clearer about the fix, the first is clearer about the problem. Maybe add a fixit to the first error? Done. OK, just add a comment then. OK, just add a comment then. And added comments. --- gcc/cp/cp-gimplify.c.jj 2021-06-09 21:54:39.473193977 +0200 +++ gcc/cp/cp-gimplify.c2021-06-10 09:49:35.898557178 +0200 @@ -161,7 +161,9 @@ genericize_if_stmt (tree *stmt_p) if (!else_) else_ = build_empty_stmt (locus); - if (integer_nonzerop (cond) && !TREE_SIDE_EFFECTS (else_)) + if (IF_STMT_CONSTEVAL_P (stmt)) +stmt = else_; This seems redundant, since you're using boolean_false_node for the condition. It is only when !TREE_SIDE_EFFECTS (else_). I think that is about having labels in the then_/else_ blocks and gotos jumping into those from outside. But IF_STMT_CONSTEVAL_P guarantees that is not the case (and we verify earlier), so we can do that (and in fact, for IF_STMT_CONSTEVAL_P have to, because then_ could contain consteval calls not constant expression evaluated). I guess we could do that for IF_STMT_CONSTEXPR_P too, that also doesn't allow gotos/switch into the branches. For this too. --- gcc/cp/call.c.jj2021-06-09 21:54:39.436194489 +0200 +++ gcc/cp/call.c 2021-06-10 09:49:35.949556470 +0200 @@ -8840,6 +8840,7 @@ immediate_invocation_p (tree fn, int nar || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)) && (current_binding_level->kind != sk_function_parms || !current_binding_level->immediate_fn_ctx_p) + && !in_immediate_fn_ctx_p Now that we have this flag, shouldn't we set it in actual immediate function context? Or rename the flag to match when it's actually set. I guess I can try that. Though, I'm not sure if we could also get rid of the current_binding_level->immediate_fn_ctx_p for sk_function_parms case. Had a look at this, but could handle it. So instead I've renamed it to in_consteval_if_p. Ok for trunk if it passes bootstrap/regtest? OK. 
2021-06-11 Jakub Jelinek PR c++/100974 - P1938R3 - if consteval gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Predefine __cpp_if_consteval for -std=c++2b. gcc/cp/ * cp-tree.h (struct saved_scope): Add consteval_if_p member. Formatting fix for the discarded_stmt comment. (in_consteval_if_p, IF_STMT_CONSTEVAL_P): Define. * parser.c (cp_parser_lambda_expression): Temporarily disable in_consteval_if_p when parsing lambda body. (cp_parser_selection_statement): Parse consteval if. * decl.c (struct named_label_entry): Add in_consteval_if member. (level_for_consteval_if): New function. (poplevel_named_label_1, check_previous_goto_1, check_goto): Handle consteval if. * constexpr.c (cxx_eval_builtin_function_call): Clarify in comment why CP_BUILT_IN_IS_CONSTANT_EVALUATED needs to *non_constant_p for !ctx->manifestly_const_eval. (cxx_eval_conditional_expression): For IF_STMT_CONSTEVAL_P evaluate condition as if it was __builtin_is_constant_evaluated call. (potential_constant_expression_1): For IF_STMT_CONSTEVAL_P always recurse on both branches. * cp-gimplify.c (genericize_if_stmt): Genericize IF_STMT_CONSTEVAL_P as the else branch. * pt.c (tsubst_expr) : Copy IF_STMT_CONSTEVAL_P. Temporarily set in_consteval_if_p when recursing on IF_STMT_CONSTEVAL_P then branch. (tsubst_lambda_expr): Temporarily disable in_consteval_if_p when instantiating lambda body. * call.c (immediate_invocation_p): Return false when in_consteval_if_p. gcc/testsuite/ * g++.dg/cpp23/consteval-if1.C: New test. * g++.dg/cpp23/consteval-if2.C: New test. * g++.dg/cpp23/consteval-if3.C: New test. * g++.dg/cpp23/consteval-if4.C: New test. * g++.dg/cpp23/consteval-if5.C: New test. * g++.dg/cpp23/consteval-if6.C: New test. * g++.dg/cpp23/consteval-if7.C: New test. * g++.dg/cpp23/consteval-if8.C: New test. * g++.dg/cpp23/consteval-if9.C: New test. * g++.dg/cpp23/consteval-if10.C: New test. * g++.dg/cpp23/feat-cxx2b.C: Add __cpp_if_consteval tests. 
--- gcc/c-family/c-cppbuiltin.c.jj 2021-06-10 19:56:20.584335765 +0200 +++ gcc/c-family/c-cppbuiltin.c 2021-06-11 11:34:15.408578098 +0200 @@ -1029,6 +1029,7 @@ c_cpp_builtins (cpp_reader *pfile) { /* Set feature test macros for C++23. */ cpp_define (pfile, "__cpp_size_t_suffix=202011L"); + cpp_define (pfile, "__cpp_if_consteval=202106L"); } if (flag_concepts) { --- gcc/cp/cp-tree.h.jj 2021-06-11 11:32:35.857980773 +0200 +++ gcc/cp/cp-tree.h2021-06-11 11:35:52.468210524 +0200 @@ -478,6 +478,7 @@ extern GTY(()) tree cp_global_trees[CPTI
[GCC][Patch] arm: Fix the mve multilib for the broken cmse support (pr99939).
Hi Richard, I have addressed all your review comments in https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571739.html in the following patch. The current CMSE support in the multilib build for "-march=armv8.1-m.main+mve -mfloat-abi=hard -mfpu=auto" is broken as specified in PR99939 and this patch fixes the issue. Regression tested on arm-none-eabi and found no regressions. Ok for master? and Ok for GCC-10 branch? Regards, Srinath. gcc/testsuite/ChangeLog: 2021-06-11 Srinath Parvathaneni PR target/99939 * gcc.target/arm/cmse/cmse-18.c: Add separate scan-assembler directives check for target is v8.1-m.main+mve or not before comparing the assembly output. * gcc.target/arm/cmse/cmse-20.c: New test. libgcc/ChangeLog: 2021-06-11 Srinath Parvathaneni PR target/99939 * config/arm/cmse_nonsecure_call.S: Add __ARM_FEATURE_MVE macro. * config/arm/t-arm: To link cmse.o and cmse_nonsecure_call.o on passing -mcmse option. ### Attachment also inlined for ease of reply### diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c index e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c @@ -8,4 +8,5 @@ void bar(f func, int a) func(a); } -/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { target { ! arm_v8_1m_mve_ok } } } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" { target { arm_v8_1m_mve_ok } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c new file mode 100644 index ..08e89bff6378f1f96950fc40f3ab3946bd433773 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c @@ -0,0 +1,28 @@ +/* This test is executed only if the execution engine supports CMSE instructions. 
*/ +/* { dg-options "--save-temps -mcmse -Wl,--section-start,.gnu.sgstubs=0x0040" } */ + +#include +#include +#include + +void __attribute__((cmse_nonsecure_entry)) +secure_fun (int a, int *p) +{ + void *b = cmse_check_address_range ((void *)p, a, 1); + + if (b == NULL) + __builtin_abort (); + printf("%d", *((int *)b)); +} + +int +main (void) +{ + int *ptr; + int size = 1; + ptr = (int *) calloc (1, sizeof(int *)); + *ptr = 1315852292; + secure_fun (size, ptr); + free (ptr); + return 0; +} diff --git a/libgcc/config/arm/cmse_nonsecure_call.S b/libgcc/config/arm/cmse_nonsecure_call.S index 146f3ed52e9c7e915e5dbd9b70624ec3bd7cd5b5..00830ade98ea650c328c709d5d308fbc96f7f21c 100644 --- a/libgcc/config/arm/cmse_nonsecure_call.S +++ b/libgcc/config/arm/cmse_nonsecure_call.S @@ -25,7 +25,7 @@ .syntax unified #ifdef __ARM_PCS_VFP -# if __ARM_FP & 0x8 +# if (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1) .fpu fpv5-d16 # else .fpu fpv4-sp-d16 @@ -59,7 +59,7 @@ vmov s24, s25, r5, r5 vmov s26, s27, r5, r5 vmov s28, s29, r5, r5 vmov s30, s31, r5, r5 -#elif __ARM_FP & 0x08 +#elif (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1) vmov.f64d9, d8 vmov.f64d10, d8 vmov.f64d11, d8 diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm index 3625a2590beec4e4e0e0881be9ad284c595c7190..c1553d4e5d80751b13dc2e9c9e36d5ebe82e5f8c 100644 --- a/libgcc/config/arm/t-arm +++ b/libgcc/config/arm/t-arm @@ -3,18 +3,17 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi _thumb1_case_shi \ _thumb1_case_uhi _thumb1_case_si _speculation_barrier HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E - /dev/null 2>/dev/null; echo $$?),0) CMSE_OPTS:=-mcmse endif ifdef HAVE_CMSE -ifndef HAVE_V81M + libgcc-objects += cmse.o cmse_nonsecure_call.o cmse.o: $(srcdir)/config/arm/cmse.c $(gcc_compile) -c $(CMSE_OPTS) $< + cmse_nonsecure_call.o: $(srcdir)/config/arm/cmse_nonsecure_call.S $(gcc_compile) -c $< endif -endif diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c index e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c @@ -8,4 +8,5 @@ void bar(f func, int a) func(a); } -/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { target { ! arm_v8_1m_mve_ok } } } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" { target { arm_v8_1m_mve_ok } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c b/gcc/testsuite/gcc.target/arm/
Re: [GCC][Patch] arm: Fix the mve multilib for the broken cmse support (pr99939).
On 11/06/2021 14:02, Srinath Parvathaneni via Gcc-patches wrote: Hi Richard, I have addressed all your review comments in https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571739.html in the following patch. The current CMSE support in the multilib build for "-march=armv8.1-m.main+mve -mfloat-abi=hard -mfpu=auto" is broken as specified in PR99939 and this patch fixes the issue. Regression tested on arm-none-eabi and found no regressions. Ok for master? and Ok for GCC-10 branch? Regards, Srinath. gcc/testsuite/ChangeLog: 2021-06-11 Srinath Parvathaneni PR target/99939 * gcc.target/arm/cmse/cmse-18.c: Add separate scan-assembler directives check for target is v8.1-m.main+mve or not before comparing the assembly output. * gcc.target/arm/cmse/cmse-20.c: New test. libgcc/ChangeLog: 2021-06-11 Srinath Parvathaneni PR target/99939 * config/arm/cmse_nonsecure_call.S: Add __ARM_FEATURE_MVE macro. * config/arm/t-arm: To link cmse.o and cmse_nonsecure_call.o on passing -mcmse option. OK. R. ### Attachment also inlined for ease of reply### diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c index e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c @@ -8,4 +8,5 @@ void bar(f func, int a) func(a); } -/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { target { ! 
arm_v8_1m_mve_ok } } } } */ +/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" { target { arm_v8_1m_mve_ok } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c new file mode 100644 index ..08e89bff6378f1f96950fc40f3ab3946bd433773 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c @@ -0,0 +1,28 @@ +/* This test is executed only if the execution engine supports CMSE instructions. */ +/* { dg-options "--save-temps -mcmse -Wl,--section-start,.gnu.sgstubs=0x0040" } */ + +#include +#include +#include + +void __attribute__((cmse_nonsecure_entry)) +secure_fun (int a, int *p) +{ + void *b = cmse_check_address_range ((void *)p, a, 1); + + if (b == NULL) + __builtin_abort (); + printf("%d", *((int *)b)); +} + +int +main (void) +{ + int *ptr; + int size = 1; + ptr = (int *) calloc (1, sizeof(int *)); + *ptr = 1315852292; + secure_fun (size, ptr); + free (ptr); + return 0; +} diff --git a/libgcc/config/arm/cmse_nonsecure_call.S b/libgcc/config/arm/cmse_nonsecure_call.S index 146f3ed52e9c7e915e5dbd9b70624ec3bd7cd5b5..00830ade98ea650c328c709d5d308fbc96f7f21c 100644 --- a/libgcc/config/arm/cmse_nonsecure_call.S +++ b/libgcc/config/arm/cmse_nonsecure_call.S @@ -25,7 +25,7 @@ .syntax unified #ifdef __ARM_PCS_VFP -# if __ARM_FP & 0x8 +# if (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1) .fpu fpv5-d16 # else .fpu fpv4-sp-d16 @@ -59,7 +59,7 @@ vmov s24, s25, r5, r5 vmov s26, s27, r5, r5 vmov s28, s29, r5, r5 vmov s30, s31, r5, r5 -#elif __ARM_FP & 0x08 +#elif (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1) vmov.f64d9, d8 vmov.f64d10, d8 vmov.f64d11, d8 diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm index 3625a2590beec4e4e0e0881be9ad284c595c7190..c1553d4e5d80751b13dc2e9c9e36d5ebe82e5f8c 100644 --- a/libgcc/config/arm/t-arm +++ b/libgcc/config/arm/t-arm @@ -3,18 +3,17 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi _thumb1_case_shi \ _thumb1_case_uhi _thumb1_case_si 
_speculation_barrier HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E - -HAVE_V81M:=$(findstring armv8.1-m.main,$(gcc_compile_bare)) ifeq ($(shell $(gcc_compile_bare) -E -mcmse - /dev/null 2>/dev/null; echo $$?),0) CMSE_OPTS:=-mcmse endif ifdef HAVE_CMSE -ifndef HAVE_V81M + libgcc-objects += cmse.o cmse_nonsecure_call.o cmse.o: $(srcdir)/config/arm/cmse.c $(gcc_compile) -c $(CMSE_OPTS) $< + cmse_nonsecure_call.o: $(srcdir)/config/arm/cmse_nonsecure_call.S $(gcc_compile) -c $< endif -endif
Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)
On Fri, 11 Jun 2021 at 09:53, Christophe Lyon wrote: > > On Fri, 11 Jun 2021 at 01:29, Martin Sebor via Gcc-patches > wrote: > > > > This patch removes the uses of %K from error() calls in the aarch64 > > and arm back ends. I tested this change only by building a cross- > > compiler but I can't easily run the tests so I'd appreciate some help > > validating it. The fallout from the change should be limited to changes > > to error messages so in the worst case it could be addressed after > > committing the patch. > > I've submitted a validation with your patch, I'll let you know the > results (in a few hours) > I haven't noticed any regression with that patch alone. I just noticed it is patch 3/4: I didn't apply patches 1 and 2 before this one, do you want me to relaunch the tests with the 3 patches combined? Thanks, Christophe > Thanks > > Christophe
[PATCH v2] combine: Tweak the condition of last_set invalidation
Hi Segher, Thanks for the review! on 2021/6/10 4:17 AM, Segher Boessenkool wrote: > Hi! > > On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote: >> Currently we have the check: >> >> if (!insn >>|| (value && rsp->last_set_table_tick >= label_tick_ebb_start)) >> rsp->last_set_invalid = 1; >> >> which means if we want to record some value for some reg and >> this reg got referred to before in a valid scope, > > If we already know it is *set* in this same extended basic block. > Possibly by the same instruction btw. > >> we invalidate the >> set of reg (last_set_invalid to 1). It avoids finding the wrong >> set for one reg reference, such as the case like: >> >>... op regX // this regX could find wrong last_set below >>regX = ... // if we think this set is valid >>... op regX > > Yup, exactly. > >> But because of retry's existence, the last_set_table_tick could >> be set by some later reference insns, but we see it's set due >> to retry on the set (for that reg) insn again, such as: >> >>insn 1 >>insn 2 >> >>regX = ... --> (a) >>... op regX--> (b) >> >>insn 3 >> >>// assume all in the same BB. >> >> Assuming we combine 1, 2 -> 3 successfully and replace them as two >> (3 insns -> 2 insns), > > This will delete insn 1 and write the combined result to insns 2 and 3. > >> retrying from insn1 or insn2 again: > > Always 2, but your point remains valid. > >> it will scan insn (a) again, the below condition holds for regX: >> >> (value && rsp->last_set_table_tick >= label_tick_ebb_start) >> >> it will mark this set as invalid set. But actually the >> last_set_table_tick here is set by insn (b) before retrying, so it >> should be safe to be taken as valid set. > > Yup. > >> This proposal is to check whether the last_set_table safely happens >> after the current set, make the set still valid if so. > >> Full SPEC2017 building shows this patch gets more successful combines >> from 1902208 to 1902243 (trivial though). > > Do you have some example, or maybe even a testcase?
:-) > Sorry for the late reply, it took some time to get one reduced case. typedef struct SA *pa_t; struct SC { int h; pa_t elem[]; }; struct SD { struct SC *e; }; struct SA { struct { struct SD f[1]; } g; }; void foo(pa_t *k, char **m) { int l, i; pa_t a; l = (int)a->g.f[5].e; i = 0; for (; i < l; i++) { k[i] = a->g.f[5].e->elem[i]; m[i] = ""; } } Baseline is r12-0 and the option is "-O3 -mcpu=power9 -fno-strict-aliasing", with this patch, the generated assembly can save two rlwinm s. >> + /* Record the luid of the insn whose expression involving register n. */ >> + >> + int last_set_table_luid; > > "Record the luid of the insn for which last_set_table_tick was set", > right? > But it can be updated later to one smaller luid, how about the wording like: + /* Record the luid of the insn which uses register n, the insn should + be the first one using register n in that block of the insn which + last_set_table_tick was set for. */ >> -static void update_table_tick (rtx); >> +static void update_table_tick (rtx, int); > > Please remove this declaration instead, the function is not used until > after its actual definition :-) > Done. >> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x) >>for (r = regno; r < endregno; r++) >> { >>reg_stat_type *rsp = ®_stat[r]; >> - rsp->last_set_table_tick = label_tick; >> + if (rsp->last_set_table_tick >= label_tick_ebb_start) >> +{ >> + /* Later references should not have lower ticks. */ >> + gcc_assert (label_tick >= rsp->last_set_table_tick); > > This should be obvious, but checking it won't hurt, okay. > >> + /* Should pick up the lowest luid if the references >> + are in the same block. */ >> + if (label_tick == rsp->last_set_table_tick >> + && rsp->last_set_table_luid > insn_luid) >> +rsp->last_set_table_luid = insn_luid; > > Why? Is it conservative for the check you will do later? Please spell > this out, it is crucial! 
> Since later combinations involving this insn may make the register be used in an insn sitting earlier (which has a smaller luid than the one which was recorded before). Yes, it's very conservative; this ensures that we always use the luid of the first insn using this register in the block. The last_set invalidation is going to catch the case like: ... regX // avoid the set used here ... regX = ... ... Once we have the smallest luid of all insns which use register X, any unsafe regX sets should be caught. I updated the comments to: + /* Since combination may generate some instructions + to replace some foregoing instructions with the + references to register r (using register r), we + need to
[committed] analyzer: tweak priority of callstrings in worklist::key_t::cmp
While debugging another issue I noticed that the analyzer could fail to merge nodes for control flow in which one path had called a function and another path hadn't:

         BB
        /  \
       /    \
  fn call    no fn call
       \    /
        \  /
      join BB

The root cause was that the worklist sort function wasn't prioritizing call strings, and thus it was fully exploring the "no function called" path to the exit BB, and only then exploring the "within the function call" parts of the "function called" path. This patch prioritizes call strings when sorting the worklist so that the nodes with deeper call strings are processed before those with shallower call strings, thus allowing such nodes to be merged at the joinpoint. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as 9d20ec97475b1102d6ca005ad165056d34615a3d. gcc/analyzer/ChangeLog: * engine.cc (worklist::key_t::cmp): Move sort by call_string to before SCC. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c: Update expected number of enodes after the loop. * gcc.dg/analyzer/paths-8.c: New test. Signed-off-by: David Malcolm --- gcc/analyzer/engine.cc| 25 ++- .../loop-0-up-to-n-by-1-with-iter-obj.c | 3 +-- gcc/testsuite/gcc.dg/analyzer/paths-8.c | 17 + 3 files changed, 37 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-8.c diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 5b519fdf385..48320bc062e 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -2004,7 +2004,25 @@ worklist::key_t::cmp (const worklist::key_t &ka, const worklist::key_t &kb) return cmp; } - /* First, order by SCC. */ + /* Sort by callstring, so that nodes with deeper call strings are processed + before those with shallower call strings.
+ If we have + splitting BB + / \ +/\ + fn call no fn call +\/ + \ / +join BB + then we want the path inside the function call to be fully explored up + to the return to the join BB before we explore on the "no fn call" path, + so that both enodes at the join BB reach the front of the worklist at + the same time and thus have a chance of being merged. */ + int cs_cmp = call_string::cmp (call_string_a, call_string_b); + if (cs_cmp) +return cs_cmp; + + /* Order by SCC. */ int scc_id_a = ka.get_scc_id (ka.m_enode); int scc_id_b = kb.get_scc_id (kb.m_enode); if (scc_id_a != scc_id_b) @@ -2033,11 +2051,6 @@ worklist::key_t::cmp (const worklist::key_t &ka, const worklist::key_t &kb) gcc_assert (snode_a == snode_b); - /* The points might vary by callstring; try sorting by callstring. */ - int cs_cmp = call_string::cmp (call_string_a, call_string_b); - if (cs_cmp) -return cs_cmp; - /* Order within supernode via program point. */ int within_snode_cmp = function_point::cmp_within_supernode (point_a.get_function_point (), diff --git a/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c b/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c index 2b0352711ae..0172c9b324c 100644 --- a/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c +++ b/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c @@ -69,6 +69,5 @@ void test(int n) free (it); - __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed enodes" } */ - // TODO: why 2 enodes here, rather than 1 + __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */ } diff --git a/gcc/testsuite/gcc.dg/analyzer/paths-8.c b/gcc/testsuite/gcc.dg/analyzer/paths-8.c new file mode 100644 index 000..b350d4d7dbd --- /dev/null +++ b/gcc/testsuite/gcc.dg/analyzer/paths-8.c @@ -0,0 +1,17 @@ +#include "analyzer-decls.h" + +static void __attribute__((noinline)) +__analyzer_callee_1 (void) +{ + /* empty. 
*/ +} + +void +test_1 (int flag) +{ + if (flag) +__analyzer_callee_1 (); + + /* Verify that we merge state, whether or not the call happens. */ + __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */ +} -- 2.26.3
[PATCH] tree-optimization/101025 - fix store-motion dependence checking
This plugs a hole in store-motion where it fails to perform dependence checking on conditionally executed but not store-motioned refs. Bootstrapped on x86_64-unknown-linux-gnu, regtest in progress. 2021-06-11 Richard Biener PR tree-optimization/101025 * tree-ssa-loop-im.c (sm_seq_valid_bb): Make sure to process all refs that require dependence checking. * gcc.dg/torture/pr101025.c: New testcase. --- gcc/testsuite/gcc.dg/torture/pr101025.c | 23 +++ gcc/tree-ssa-loop-im.c | 38 +++-- 2 files changed, 59 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr101025.c diff --git a/gcc/testsuite/gcc.dg/torture/pr101025.c b/gcc/testsuite/gcc.dg/torture/pr101025.c new file mode 100644 index 000..483e0ff7997 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr101025.c @@ -0,0 +1,23 @@ +/* { dg-do run } */ + +int a[10]; +int b, d, g; +volatile char c; +short e; +volatile int f; +int main() +{ + for (; d <= 9; d++) { + b = e = 0; + for (; e < 4; e++) +a[e] = 4; + for (; b <= 3; b++) +if (g) + f = 0; +else + a[b] = c; + } + if (a[1] != 0) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 8034cf68d27..1c865b28fd6 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -2169,6 +2169,7 @@ execute_sm (class loop *loop, im_mem_ref *ref, enum sm_kind { sm_ord, sm_unord, sm_other }; struct seq_entry { + seq_entry () {} seq_entry (unsigned f, sm_kind k, tree fr = NULL) : first (f), second (k), from (fr) {} unsigned first; @@ -2352,6 +2353,8 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree vdef, unsigned min_len = MIN(first_edge_seq.length (), edge_seq.length ()); /* Incrementally merge seqs into first_edge_seq. */ + int first_uneq = -1; + auto_vec extra_refs; for (unsigned int i = 0; i < min_len; ++i) { /* ??? 
We can more intelligently merge when we face different @@ -2367,6 +2370,11 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree vdef, bitmap_set_bit (refs_not_supported, edge_seq[i].first); first_edge_seq[i].second = sm_other; first_edge_seq[i].from = NULL_TREE; + /* Record the dropped refs for later processing. */ + if (first_uneq == -1) + first_uneq = i; + extra_refs.safe_push (seq_entry (edge_seq[i].first, + sm_other, NULL_TREE)); } /* sm_other prevails. */ else if (first_edge_seq[i].second != edge_seq[i].second) @@ -2399,10 +2407,36 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree vdef, } else if (edge_seq.length () > first_edge_seq.length ()) { + if (first_uneq == -1) + first_uneq = first_edge_seq.length (); for (unsigned i = first_edge_seq.length (); i < edge_seq.length (); ++i) - if (edge_seq[i].second == sm_ord) - bitmap_set_bit (refs_not_supported, edge_seq[i].first); + { + if (edge_seq[i].second == sm_ord) + bitmap_set_bit (refs_not_supported, edge_seq[i].first); + extra_refs.safe_push (seq_entry (edge_seq[i].first, + sm_other, NULL_TREE)); + } + } + /* Put unmerged refs at first_uneq to force dependence checking +on them. */ + if (first_uneq != -1) + { + /* Missing ordered_splice_at. */ + if ((unsigned)first_uneq == first_edge_seq.length ()) + first_edge_seq.safe_splice (extra_refs); + else + { + unsigned fes_length = first_edge_seq.length (); + first_edge_seq.safe_grow (fes_length + + extra_refs.length ()); + memmove (&first_edge_seq[first_uneq + extra_refs.length ()], + &first_edge_seq[first_uneq], + (fes_length - first_uneq) * sizeof (seq_entry)); + memcpy (&first_edge_seq[first_uneq], + extra_refs.address (), + extra_refs.length () * sizeof (seq_entry)); + } } } /* Use the sequence from the first edge and push SMs down. */ -- 2.26.2
For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain
Hi! See attached. OK to push after testing -- or would there ever be a reason where this is not appropriate? For example, for 'gcc.dg/gomp/simd-clones-2.c', this changes ("corrects") the '*.optimized' dump file as follows: @@ -1,7 +1,7 @@ ;; Function addit (addit, funcdef_no=0, decl_uid=2073, cgraph_uid=1, symbol_order=0) -__attribute__((omp declare simd (notinbranch), omp declare simd (inbranch))) +__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare simd (inbranch uniform(2) linear(1:66 int addit (int a, int b, int * c) { int _3; @@ -16,7 +16,7 @@ int addit (int a, int b, int * c) ;; Function setArray (setArray, funcdef_no=1, decl_uid=2078, cgraph_uid=2, symbol_order=1) -__attribute__((omp declare simd (notinbranch))) +__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) linear(2:1 float setArray (float * a, float x, int k) { long unsigned int _1; @@ -40,7 +40,7 @@ float setArray (float * a, float x, int k) ;; Function setArray.simdclone.0 (_ZGVbN4ua32vl_setArray, funcdef_no=2, decl_uid=2089, cgraph_uid=3, symbol_order=2) -__attribute__((omp declare simd (notinbranch))) +__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) linear(2:1 vector(4) float setArray.simdclone.0 (float * a, vector(4) float simd.4, int k) { vector(4) float vect__8.86; @@ -65,7 +65,7 @@ vector(4) float setArray.simdclone.0 (float * a, vector(4) float simd.4, int k) ;; Function setArray.simdclone.1 (_ZGVcN8ua32vl_setArray, funcdef_no=3, decl_uid=2100, cgraph_uid=5, symbol_order=4) -__attribute__((omp declare simd (notinbranch))) +__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) linear(2:1 vector(8) float setArray.simdclone.1 (float * a, vector(8) float simd.8, int k) { vector(8) float vect__8.98; @@ -90,7 +90,7 @@ vector(8) float setArray.simdclone.1 (float * a, vector(8) float simd.8, int k) ;; Function setArray.simdclone.2 (_ZGVdN8ua32vl_setArray, funcdef_no=4, decl_uid=2372, cgraph_uid=6, symbol_order=5) 
-__attribute__((omp declare simd (notinbranch))) +__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) linear(2:1 vector(8) float setArray.simdclone.2 (float * a, vector(8) float simd.12, int k) { vector(8) float vect__8.110; @@ -115,7 +115,7 @@ vector(8) float setArray.simdclone.2 (float * a, vector(8) float simd.12, int k) ;; Function setArray.simdclone.3 (_ZGVeN16ua32vl_setArray, funcdef_no=5, decl_uid=2558, cgraph_uid=7, symbol_order=6) -__attribute__((omp declare simd (notinbranch))) +__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) linear(2:1 vector(16) float setArray.simdclone.3 (float * a, vector(16) float simd.16, int k) { vector(16) float vect__8.122; @@ -140,7 +140,7 @@ vector(16) float setArray.simdclone.3 (float * a, vector(16) float simd.16, int ;; Function addit.simdclone.0 (_ZGVbN4vvva32_addit, funcdef_no=6, decl_uid=3029, cgraph_uid=8, symbol_order=7) -__attribute__((omp declare simd (notinbranch), omp declare simd (inbranch))) +__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare simd (inbranch uniform(2) linear(1:66 vector(4) int addit.simdclone.0 (vector(4) int simd.22, vector(4) int simd.23, vector(2) unsigned long simd.24, vector(2) unsigned long simd.25) { vector(4) int vect__3.134; @@ -155,7 +155,7 @@ vector(4) int addit.simdclone.0 (vector(4) int simd.22, vector(4) int simd.23, v ;; Function addit.simdclone.1 (_ZGVcN4vvva32_addit, funcdef_no=7, decl_uid=3046, cgraph_uid=9, symbol_order=8) -__attribute__((omp declare simd (notinbranch), omp declare simd (inbranch))) +__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare simd (inbranch uniform(2) linear(1:66 vector(4) int addit.simdclone.1 (vector(4) int simd.31, vector(4) int simd.32, vector(2) unsigned long simd.33, vector(2) unsigned long simd.34) { vector(4) int vect__3.144; @@ -170,7 +170,7 @@ vector(4) int addit.simdclone.1 (vector(4) int simd.31, vector(4) int simd.32, v ;; Function addit.simdclone.2 
(_ZGVdN8vvva32_addit, funcdef_no=8, decl_uid=3063, cgraph_uid=10, symbol_order=9) -__attribute__((omp declare simd (notinbranch), omp declare simd (inbranch))) +__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare simd (inbranch uniform(2) linear(1:66 vector(8) int addit.simdclone.2 (vector(8) int simd.40, vector(8) int simd.41, vector(4) unsigned long simd.42, vector(4) unsigned long simd.43) { vector(8) int vect__3.154; @@ -185,7 +185,7 @@ vector(8) int addit.simdclone.2 (vector(8) int simd.40, vector(8) int simd.41, v ;; Function addit.simdclone.3 (_ZGVeN16vvva32_addit, funcdef_n
Re: For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain
On Fri, Jun 11, 2021 at 04:04:35PM +0200, Thomas Schwinge wrote: > >From db04d261071eb5691906fd46436032a7db8a0b02 Mon Sep 17 00:00:00 2001 > From: Thomas Schwinge > Date: Fri, 11 Jun 2021 15:37:33 +0200 > Subject: [PATCH] For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP > clause chain > > ... instead of just the first clause. > > gcc/ > * tree-pretty-print.h (dump_omp_clauses): Add 'bool = true' > default argument. > * tree-pretty-print.c (dump_omp_clauses): Update. > (dump_generic_node) : Use it. > gcc/testsuite/ > * gcc.dg/gomp/simd-clones-2.c: Enhance. LGTM, thanks. Jakub
Re: [PATCH] c++: Substitute into function parms in lexical order [PR96560]
On Thu, Apr 29, 2021 at 7:48 AM Patrick Palka wrote: > > On Wed, 28 Apr 2021, Jason Merrill wrote: > > > On 4/28/21 2:24 PM, Patrick Palka wrote: > > > This makes tsubst_arg_types substitute into a function's parameter types > > > in left-to-right order instead of right-to-left order, in accordance with > > > DR 1227. > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for > > > trunk? [ diff generated with -w to hide noisy whitespace changes ] > > > > OK. We'll still substitute lambda default args in reverse order, but that > > shouldn't happen in sfinae context, so that shouldn't be a problem. Maybe > > add > > an assert (complain & tf_error) in the lambda case? > > Apparently cpp2a/lambda-uneval2.C trips over such an assert during > deduction for: > > template > auto j(T t) -> decltype([](auto x) -> decltype(x.invalid) { } (t)); > > which makes sense, I suppose. > > It looks like we can just move up the default argument processing to > before the recursive call to tsubst_arg_types, as in the below. > Bootstrapped and regtested on x86_64-pc-linux-gnu. Ping > > BTW, now that LAMBDA_EXPR has LAMBDA_EXPR_REGEN_INFO I think we have > enough information to defer substituting into lambda default arguments > until they're actually needed. Would that be something to look into as > a followup patch? > > -- >8 -- > > Subject: [PATCH] c++: Substitute into function parms in lexical order > [PR96560] > > This makes tsubst_arg_types substitute into a function's parameter types > in left-to-right order instead of right-to-left order, in accordance with > DR 1227. > > gcc/cp/ChangeLog: > > DR 1227 > PR c++/96560 > * pt.c (tsubst_arg_types): Rearrange so that we substitute into > TYPE_ARG_TYPES in forward order while short circuiting > appropriately. Adjust formatting. > > gcc/testsuite/ChangeLog: > > DR 1227 > PR c++/96560 > * g++.dg/template/sfinae-dr1227.C: New test. 
> --- > gcc/cp/pt.c | 51 ++- > gcc/testsuite/g++.dg/template/sfinae-dr1227.C | 23 + > 2 files changed, 51 insertions(+), 23 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/template/sfinae-dr1227.C > > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c > index eaf46659f85..e6d65595e2f 100644 > --- a/gcc/cp/pt.c > +++ b/gcc/cp/pt.c > @@ -15068,20 +15068,13 @@ tsubst_arg_types (tree arg_types, > tsubst_flags_t complain, > tree in_decl) > { > - tree remaining_arg_types; >tree type = NULL_TREE; > - int i = 1; > + int len = 1; >tree expanded_args = NULL_TREE; > - tree default_arg; > >if (!arg_types || arg_types == void_list_node || arg_types == end) > return arg_types; > > - remaining_arg_types = tsubst_arg_types (TREE_CHAIN (arg_types), > - args, end, complain, in_decl); > - if (remaining_arg_types == error_mark_node) > -return error_mark_node; > - >if (PACK_EXPANSION_P (TREE_VALUE (arg_types))) > { >/* For a pack expansion, perform substitution on the > @@ -15092,7 +15085,7 @@ tsubst_arg_types (tree arg_types, > >if (TREE_CODE (expanded_args) == TREE_VEC) > /* So that we'll spin through the parameters, one by one. */ > -i = TREE_VEC_LENGTH (expanded_args); > + len = TREE_VEC_LENGTH (expanded_args); >else > { >/* We only partially substituted into the parameter > @@ -15101,14 +15094,15 @@ tsubst_arg_types (tree arg_types, >expanded_args = NULL_TREE; > } > } > + else > +type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl); > > - while (i > 0) { > ---i; > - > + /* Check if a substituted type is erroneous before substituting into > + the rest of the chain. 
*/ > + for (int i = 0; i < len; i++) > +{ >if (expanded_args) > type = TREE_VEC_ELT (expanded_args, i); > -else if (!type) > - type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl); > >if (type == error_mark_node) > return error_mark_node; > @@ -15122,15 +15116,12 @@ tsubst_arg_types (tree arg_types, > } > return error_mark_node; > } > - > -/* Do array-to-pointer, function-to-pointer conversion, and ignore > - top-level qualifiers as required. */ > -type = cv_unqualified (type_decays_to (type)); > +} > >/* We do not substitute into default arguments here. The standard > mandates that they be instantiated only when needed, which is > done in build_over_call. */ > -default_arg = TREE_PURPOSE (arg_types); > + tree default_arg = TREE_PURPOSE (arg_types); > >/* Except that we do substitute default arguments under tsubst_lambda_expr, > since the new op() won't have any associated template arguments for us > @@ -15139,20 +15130,34 @@ tsubst_arg_types (tree arg_types, > default_arg = tsu
Re: GCC documentation: porting to Sphinx
On 6/10/21 6:49 PM, Joseph Myers wrote: On Thu, 10 Jun 2021, Martin Liška wrote: 1) Can we organize the new documentation in the $gccroot/doc folder similarly to what I have in the texi2rst-generated repo? Would be beneficial as we can have a single Makefile and shared content will be at the same depth as the individual manuals. Where languages have their own manuals, I think it's more appropriate for those to go under the language-specific directories. So it will require the following folder structure:

$gccroot/gcc/doc/gcc - for GCC documentation
$gccroot/gcc/doc/gccint - for GCC internal documentation
$gccroot/gcc/doc/gfortran - for Fortran documentation
$gccroot/gcc/doc/gccgo - for Go documentation
...
$gccroot/doc/share - shared components
$gccroot/libgomp/doc - for libgomp documentation
...

Are you fine with that? That doesn't stop the use of shared makefile code. Make-lang.in is a fragment included from gcc/Makefile.in ("-include $(LANG_MAKEFRAGS)"). I certainly expect it should be possible to write GNU make code in gcc/Makefile.in for building and installing manuals, such that subdirectories only need to define a few variables describing what manuals they have and everything else is handled by common code. The Sphinx Makefile will be capable of e.g. make html -C $gccroot/gcc/doc/gcc BUILDDIR=`pwd`/put_it_somewhere and the only configure dependencies will be VERSION_PACKAGE and BUGURL, which will be provided in env: https://github.com/marxin/texi2rst-generated/blob/6cfcb7b8ae6497d49ea23a38262dfa26854bdb40/sphinx/baseconf.py#L38-L39 Martin
Re: GCC documentation: porting to Sphinx
On 6/11/21 1:48 AM, Martin Sebor wrote: On 6/10/21 7:18 AM, Martin Liška wrote: On 6/10/21 11:07 AM, Martin Liška wrote: Doing that, one has 2 unique links, which would be needed for the get_option_url function. Plus, both :option:`-Wfoo` and :option:`-Wno-foo` references are going to work. And I've actually done the transformation and one can see it e.g. here: https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/options-to-request-or-suppress-warnings.html#cmdoption-Wprio-ctor-dtor I find the style you have below right now clearer than keeping both options in the same heading and adding a Note explaining the default etc. I.e., this

-Wchar-subscripts
  Warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall.

-Wno-char-subscripts
  Default option value for -Wchar-subscripts.

Yes, fully agree with you! Thanks, Martin

on this page right now: https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/options-to-request-or-suppress-warnings.html#cmdoption-Wchar-subscripts seems better than this:

-Wno-shift-overflow, -Wshift-overflow=n, -Wshift-overflow
  Note: Default value is -Wno-shift-overflow; -Wshift-overflow is enabled by -Wall.
  These options control warnings about left shift overflows.

and also better than the alternative with (non-default) after the option in the heading. https://splichal.eu/scripts/sphinx/demo/_build/html/#cmdoption-Wno-shift-overflow3 Martin Martin
Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)
On 6/11/21 3:58 AM, Richard Sandiford wrote: Martin Sebor via Gcc-patches writes: diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7b37e1b602c..7cdc824730c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -13242,13 +13242,8 @@ bounds_check (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high, lane = INTVAL (operand); if (lane < low || lane >= high) -{ - if (exp) - error ("%K%s %wd out of range %wd - %wd", - exp, desc, lane, low, high - 1); - else - error ("%s %wd out of range %wd - %wd", desc, lane, low, high - 1); -} +error_at (EXPR_LOCATION (exp), + "%s %wd out of range %wd - %wd", desc, lane, low, high - 1); } /* Bounds-check lanes. */ This part doesn't look safe: “exp” is null when called from arm_const_bounds. Doh! Yes, will fix, thanks. Martin Thanks, Richard
Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)
On 6/11/21 7:10 AM, Christophe Lyon wrote: On Fri, 11 Jun 2021 at 09:53, Christophe Lyon wrote: On Fri, 11 Jun 2021 at 01:29, Martin Sebor via Gcc-patches wrote: This patch removes the uses of %K from error() calls in the aarch64 and arm back ends. I tested this change only by building a cross- compiler but I can't easily run the tests so I'd appreciate some help validating it. The fallout from the change should be limited to changes to error messages so in the worst case it could be addressed after committing the patch. I've submitted a validation with your patch, I'll let you know the results (in a few hours) I haven't noticed any regression with that patch alone. I just noticed it is patch 3/4: I didn't apply patches 1 and 2 before this one, do you want me to relaunch the tests with the 3 patches combined? That would be great, as soon as I fix the thinko Richard Sandiford pointed out. Thanks for the offer! Martin Thanks, Christophe Thanks Christophe
[PATCH v2] inline-asm: Fix ICE with bitfields in "m" operands [PR100785]
On Wed, Jun 02, 2021 at 12:14:51PM -0400, Jason Merrill wrote: > On 6/2/21 11:25 AM, Jakub Jelinek wrote: > > On Wed, Jun 02, 2021 at 11:09:45AM -0400, Jason Merrill wrote: > > > On 6/2/21 3:59 AM, Jakub Jelinek wrote: > > > > if (!allows_reg && !cxx_mark_addressable (*op)) > > > > operand = error_mark_node; > > > > + else if (!allows_reg && bitfield_p (*op)) > > > > + { > > > > + error_at (loc, "attempt to take address of > > > > bit-field"); > > > > > > Hmm, why aren't we already catching this in cxx_mark_addressable? > > > > That is certainly possible, but it goes against Eric's patch > > https://gcc.gnu.org/pipermail/gcc-patches/2015-June/421483.html > > Hmm, I wonder what his rationale was? No idea (and see below, nothing in the testsuite seems to require that). Anyway, sorry for forgetting about this. Here is a variant of the patch that adds the errors to {c,cxx}_mark_addressable but keeps them in build_unary_op/cp_build_addr_expr_1 too where for C++ it handles SFINAE there etc. Bootstrapped/regtested on x86_64-linux and i686-linux. 2021-06-11 Jakub Jelinek PR inline-asm/100785 gcc/ * gimplify.c (gimplify_asm_expr): Don't diagnose errors if output or input operands were already error_mark_node. * cfgexpand.c (expand_asm_stmt): If errors are emitted, remove all inputs, outputs and clobbers from the asm and set template to "". gcc/c/ * c-typeck.c (c_mark_addressable): Diagnose trying to make bit-fields addressable. gcc/cp/ * semantics.c (cxx_mark_addressable): Diagnose trying to make bit-fields addressable. gcc/testsuite/ * c-c++-common/pr100785.c: New test. * gcc.dg/pr48552-1.c: Don't expect invalid lvalue errors. * gcc.dg/pr48552-2.c: Likewise. 
--- gcc/gimplify.c.jj 2021-06-10 15:27:35.141299425 +0200 +++ gcc/gimplify.c 2021-06-11 13:27:51.212867419 +0200 @@ -6321,12 +6321,14 @@ gimplify_asm_expr (tree *expr_p, gimple_ if (!allows_reg && allows_mem) mark_addressable (TREE_VALUE (link)); + tree orig = TREE_VALUE (link); tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p, is_inout ? is_gimple_min_lval : is_gimple_lvalue, fb_lvalue | fb_mayfail); if (tret == GS_ERROR) { - error ("invalid lvalue in % output %d", i); + if (orig != error_mark_node) + error ("invalid lvalue in % output %d", i); ret = tret; } @@ -6521,8 +6523,9 @@ gimplify_asm_expr (tree *expr_p, gimple_ mark_addressable (TREE_VALUE (link)); if (tret == GS_ERROR) { - error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location), - "memory input %d is not directly addressable", i); + if (inputv != error_mark_node) + error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location), + "memory input %d is not directly addressable", i); ret = tret; } } --- gcc/cfgexpand.c.jj 2021-06-02 10:07:47.632826557 +0200 +++ gcc/cfgexpand.c 2021-06-11 13:27:51.212867419 +0200 @@ -3082,6 +3082,7 @@ expand_asm_stmt (gasm *stmt) unsigned ninputs = gimple_asm_ninputs (stmt); unsigned nlabels = gimple_asm_nlabels (stmt); unsigned i; + bool error_seen = false; /* ??? Diagnose during gimplification? */ if (ninputs + noutputs + nlabels > MAX_RECOG_OPERANDS) @@ -3140,6 +3141,7 @@ expand_asm_stmt (gasm *stmt) { /* ??? Diagnose during gimplification? 
*/ error ("unknown register name %qs in %", regname); + error_seen = true; } else if (j == -4) { @@ -3202,7 +3204,10 @@ expand_asm_stmt (gasm *stmt) && REG_P (DECL_RTL (output_tvec[j])) && HARD_REGISTER_P (DECL_RTL (output_tvec[j])) && output_hregno == REGNO (DECL_RTL (output_tvec[j]))) - error ("invalid hard register usage between output operands"); + { + error ("invalid hard register usage between output operands"); + error_seen = true; + } /* Verify matching constraint operands use the same hard register and that the non-matching constraint operands do not use the same @@ -3225,13 +3230,19 @@ expand_asm_stmt (gasm *stmt) } if (i == match && output_hregno != input_hregno) - error ("invalid hard register usage between output operand " -"and matching constraint operand"); + { + error ("invalid hard register usage between output " + "operand and matching constraint operand"); + error_seen = true; +
Re: [PATCH v2] inline-asm: Fix ICE with bitfields in "m" operands [PR100785]
On 6/11/21 10:50 AM, Jakub Jelinek wrote: On Wed, Jun 02, 2021 at 12:14:51PM -0400, Jason Merrill wrote: On 6/2/21 11:25 AM, Jakub Jelinek wrote: On Wed, Jun 02, 2021 at 11:09:45AM -0400, Jason Merrill wrote: On 6/2/21 3:59 AM, Jakub Jelinek wrote: if (!allows_reg && !cxx_mark_addressable (*op)) operand = error_mark_node; + else if (!allows_reg && bitfield_p (*op)) + { + error_at (loc, "attempt to take address of bit-field"); Hmm, why aren't we already catching this in cxx_mark_addressable? That is certainly possible, but it goes against Eric's patch https://gcc.gnu.org/pipermail/gcc-patches/2015-June/421483.html Hmm, I wonder what his rationale was? No idea (and see below, nothing in the testsuite seems to require that). Anyway, sorry for forgetting about this. Here is a variant of the patch that adds the errors to {c,cxx}_mark_addressable but keeps them in build_unary_op/cp_build_addr_expr_1 too where for C++ it handles SFINAE there etc. Bootstrapped/regtested on x86_64-linux and i686-linux. C++ change is OK; the whole patch is OK on Tuesday if no other comments. 2021-06-11 Jakub Jelinek PR inline-asm/100785 gcc/ * gimplify.c (gimplify_asm_expr): Don't diagnose errors if output or input operands were already error_mark_node. * cfgexpand.c (expand_asm_stmt): If errors are emitted, remove all inputs, outputs and clobbers from the asm and set template to "". gcc/c/ * c-typeck.c (c_mark_addressable): Diagnose trying to make bit-fields addressable. gcc/cp/ * semantics.c (cxx_mark_addressable): Diagnose trying to make bit-fields addressable. gcc/testsuite/ * c-c++-common/pr100785.c: New test. * gcc.dg/pr48552-1.c: Don't expect invalid lvalue errors. * gcc.dg/pr48552-2.c: Likewise. 
--- gcc/gimplify.c.jj 2021-06-10 15:27:35.141299425 +0200 +++ gcc/gimplify.c 2021-06-11 13:27:51.212867419 +0200 @@ -6321,12 +6321,14 @@ gimplify_asm_expr (tree *expr_p, gimple_ if (!allows_reg && allows_mem) mark_addressable (TREE_VALUE (link)); + tree orig = TREE_VALUE (link); tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p, is_inout ? is_gimple_min_lval : is_gimple_lvalue, fb_lvalue | fb_mayfail); if (tret == GS_ERROR) { - error ("invalid lvalue in % output %d", i); + if (orig != error_mark_node) + error ("invalid lvalue in % output %d", i); ret = tret; } @@ -6521,8 +6523,9 @@ gimplify_asm_expr (tree *expr_p, gimple_ mark_addressable (TREE_VALUE (link)); if (tret == GS_ERROR) { - error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location), - "memory input %d is not directly addressable", i); + if (inputv != error_mark_node) + error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location), + "memory input %d is not directly addressable", i); ret = tret; } } --- gcc/cfgexpand.c.jj 2021-06-02 10:07:47.632826557 +0200 +++ gcc/cfgexpand.c 2021-06-11 13:27:51.212867419 +0200 @@ -3082,6 +3082,7 @@ expand_asm_stmt (gasm *stmt) unsigned ninputs = gimple_asm_ninputs (stmt); unsigned nlabels = gimple_asm_nlabels (stmt); unsigned i; + bool error_seen = false; /* ??? Diagnose during gimplification? */ if (ninputs + noutputs + nlabels > MAX_RECOG_OPERANDS) @@ -3140,6 +3141,7 @@ expand_asm_stmt (gasm *stmt) { /* ??? Diagnose during gimplification? 
*/ error ("unknown register name %qs in %", regname); + error_seen = true; } else if (j == -4) { @@ -3202,7 +3204,10 @@ expand_asm_stmt (gasm *stmt) && REG_P (DECL_RTL (output_tvec[j])) && HARD_REGISTER_P (DECL_RTL (output_tvec[j])) && output_hregno == REGNO (DECL_RTL (output_tvec[j]))) - error ("invalid hard register usage between output operands"); + { + error ("invalid hard register usage between output operands"); + error_seen = true; + } /* Verify matching constraint operands use the same hard register and that the non-matching constraint operands do not use the same @@ -3225,13 +3230,19 @@ expand_asm_stmt (gasm *stmt) } if (i == match && output_hregno != input_hregno) - error ("invalid hard register usage between output operand " -"and matching constraint operand"); + { + error ("invalid hard register usage between output " + "operand and matching constraint op
Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns
On Fri, 11 Jun 2021 at 12:20, Richard Sandiford wrote: > > Christophe Lyon writes: > > In the meantime, I tried to make some progress, and looked at how > > things work on aarch64. > > > > This led me to define (in mve.md): > > > > (define_insn "@mve_vec_pack_trunc_lo_" > > [(set (match_operand: 0 "register_operand" "=w") > >(truncate: (match_operand:MVE_5 1 "register_operand" > > "w")))] > > "TARGET_HAVE_MVE" > > "vmovnb.i %q0, %q1\;" > > [(set_attr "type" "mve_move")] > > ) > > > > (define_insn "@mve_vec_pack_trunc_hi_" > > [(set (match_operand: 0 "register_operand" "=w") > >(vec_concat: > > (match_operand: 1 "register_operand" "0") > > (truncate: > > (match_operand:MVE_5 2 "register_operand" "w"] > > "TARGET_HAVE_MVE" > > "vmovnt.i %q0, %q2\;" > > [(set_attr "type" "mve_move")] > > ) > > > > and in vec-common.md, for > > (define_expand "vec_pack_trunc_" > > [(set (match_operand: 0 "register_operand") > >(vec_concat: > > (truncate: > > (match_operand:VN 1 "register_operand")) > > (truncate: > > (match_operand:VN 2 "register_operand"] > > > > I expand this for MVE: > > rtx tmpreg = gen_reg_rtx (mode); > > emit_insn (gen_mve_vec_pack_trunc_lo (mode, tmpreg, > > operands[1])); > > emit_insn (gen_mve_vec_pack_trunc_hi (mode, operands[0], > > tmpreg, operands[2])); > > > > I am getting an error in reload: > > error: unable to generate reloads for: > > (insn 10 9 11 2 (set (reg:V4HI 122 [ vect__4.16 ]) > > (truncate:V4HI (reg:V4SI 118 [ vect__4.16 ]))) > > "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3609 > > {mve_vec_pack_trunc_lo_v4si} > > (expr_list:REG_DEAD (reg:V4SI 118 [ vect__4.16 ]) > > (nil))) > > > > The next insn is: > > (insn 11 10 12 2 (set (reg:V8HI 121 [ vect__7.18 ]) > > (vec_concat:V8HI (reg:V4HI 122 [ vect__4.16 ]) > > (truncate:V4HI (reg:V4SI 119 [ vect__4.17 ] > > "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3611 > > {mve_vec_pack_trunc_hi_v4si} > > (expr_list:REG_DEAD (reg:V4HI 122 [ vect__4.16 ]) > > (expr_list:REG_DEAD (reg:V4SI 119 [ vect__4.17 
]) > > (nil

> > What is causing the reload error?
>
> For this to work on MVE, we'd need to allow the 64-bit V_narrow modes to be
> stored in MVE registers.  I'm not sure off-hand whether allowing that
> would be a good idea or not.
>
> If we continue to allow only 128-bit vectors in MVE registers then
> we'll need to continue to use unspecs instead of truncate and vec_concat.

Thanks for the feedback. How about v2 attached?

Do you want me to merge neon_vec_unpack and mve_vec_unpack and only have
different assembly?

if (TARGET_HAVE_MVE)
  return "vmovlb.%# %q0, %q1";
else
  return "vmovlb.%# %q0, %q1";

> Thanks,
> Richard

From 5abc6196c934438421e293b6b4733a98ec80e982 Mon Sep 17 00:00:00 2001
From: Christophe Lyon
Date: Thu, 3 Jun 2021 14:35:50 +
Subject: [PATCH v2] arm: Auto-vectorization for MVE: add pack/unpack patterns

This patch adds vec_unpack_hi_, vec_unpack_lo_, vec_pack_trunc_ patterns
for MVE.  It does so by moving the unpack patterns from neon.md to
vec-common.md, while adding MVE support to them.  The pack expander is
derived from the Neon one (which in turn is renamed into
neon_quad_vec_pack_trunc_).

The patch introduces mve_vec_unpack_lo_ and mve_vec_unpack_hi_ which are
similar to their Neon counterparts, except for the assembly syntax.  I did
not prefix them with '@', because I couldn't find how to call
gen_mve_vec_unpack_[lo|hi] with the mode as first parameter.  Anyway, we
can keep the VU iterator instead of MVE_3 since the offending V4SI mode is
disabled in the expander.

The patch introduces mve_vec_pack_trunc_lo_ to avoid the need for a
zero-initialized temporary, which is needed if the vec_pack_trunc_
expander calls @mve_vmovn[bt]q_ instead.

With this patch, we can now vectorize the 16 and 8-bit versions of vclz
and vshl, although the generated code could still be improved.
For test_clz_s16, we now generate:
	vldrh.16	q3, [r1]
	vmovlb.s16	q2, q3
	vmovlt.s16	q3, q3
	vclz.i32	q2, q2
	vclz.i32	q3, q3
	vmovnb.i32	q1, q2
	vmovnt.i32	q1, q3
	vstrh.16	q1, [r0]
which could be improved to:
	vldrh.16	q3, [r1]
	vclz.i16	q1, q3
	vstrh.16	q1, [r0]
if we could avoid the need for unpack/pack steps.

For reference, clang-12 generates:
	vldrh.s32	q0, [r1]
	vldrh.s32	q1, [r1, #8]
	vclz.i32	q0, q0
	vstrh.32	q0, [r0]
	vclz.i32	q0, q1
	vstrh.32	q0, [r0, #8]

2021-06-11  Christophe Lyon

gcc/
	* config/arm/mve.md (mve_vec_unpack_lo_): New pattern.
	(mve_vec_unpack_hi_): New pattern.
	(@mve_vec_pack_trunc_lo_): New pattern.
	(mve_vmovntq_): Prefix with '@'.
	* config/arm/neon.md (vec_unpack_hi_): Move to vec-comm
Re: [PATCH 0/3]: C N2653 char8_t implementation
On 6/7/21 5:03 PM, Joseph Myers wrote: On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote: These changes do not impact default gcc behavior. The existing -fchar8_t option is extended to C compilation to enable the N2653 changes, and -fno-char8_t is extended to explicitly disable them. N2653 has not yet been accepted by WG14, so no changes are made to handling of the C2X language dialect. Why is that option needed? Normally I'd expect features to be enabled or disabled based on the selected language version, rather than having separate options to adjust the configuration for one very specific feature in a language version. Adding extra language dialects not corresponding to any standard version but to some peculiar mix of versions (such as C17 with a changed type for u8"", or C2X with a changed type for u8'') needs a strong reason for those language dialects to be useful (for example, the -fgnu89-inline option was justified by widespread use of GNU-style extern inline in headers). The option is needed because it impacts core language backward compatibility (for both C and C++, the type of u8 string literals; for C++, the type of u8 character literals and the new char8_t fundamental type). The ability to opt-in or opt-out of the feature eases migration by enabling source code compatibility. C and C++ standards are not published at the same cadence. A project that targets C++20 and C17 may therefore have a need to either opt-out of char8_t support on the C++ side (already possible via -fno-char8_t), or to opt-in to char8_t support on the C side until such time as the targets change to C++20(+) and C23(+); assuming WG14 approval at some point. I think the whole patch series would best wait until after the proposal has been considered by a WG14 meeting, in addition to not increasing the number of language dialects supported. As an opt-in feature, this is useful to gain implementation and deployment experience for WG14. 
It would be appropriate to document this as an experimental feature pending WG14 approval. If WG14 declines it or approves it with different behavior, the feature can then be removed or changed. The option could also be introduced as -fexperimental-char8_t if that eases concerns, though I do not favor that approach due to misalignment with the existing option for C++. Tom.
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
> On Jun 11, 2021, at 6:12 AM, Richard Biener wrote: > > On Thu, 10 Jun 2021, Qing Zhao wrote: > >> Hi, Richard, >> >> I need more discussion on the following comments you raised: >> >>> On May 26, 2021, at 6:18 AM, Richard Biener wrote: >>> >>> +/* Expand the IFN_DEFERRED_INIT function according to its second >>> argument. */ >>> +static void >>> +expand_DEFERRED_INIT (internal_fn, gcall *stmt) >>> +{ >>> + tree var = gimple_call_lhs (stmt); >>> + tree init = NULL_TREE; >>> + enum auto_init_type init_type >>> += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); >>> + >>> + switch (init_type) >>> +{ >>> +default: >>> + gcc_unreachable (); >>> +case AUTO_INIT_PATTERN: >>> + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); >>> + expand_assignment (var, init, false); >>> + break; >>> +case AUTO_INIT_ZERO: >>> + init = build_zero_cst (TREE_TYPE (var)); >>> + expand_assignment (var, init, false); >>> + break; >>> +} >>> >>> I think actually building build_pattern_cst_for_auto_init can generate >>> massive garbage and for big auto vars code size is also a concern and >>> ideally on x86 you'd produce rep movq. So I don't think going >>> via expand_assignment is good. Instead you possibly want to lower >>> .DEFERRED_INIT to MEMs following expand_builtin_memset and >>> eventually enhance that to allow storing pieces larger than a byte. >> >> When I tried to lower .DEFERRED_INIT to MEMs for “AUTO_INIT_PATTERN”, I >> have the following questions: >> >> 1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we >> basically initialize the whole memory covering the >> auto variable, including paddings. Right? > > Yes. > >> 2. Only when the value that is used to initialization has a repeated >> byte-pattern, we can lower it through “memset”. Otherwise, If the >> value that is used to initialization does Not have a repeated >> byte-pattern, we can NOT lower it through “memset”, right? > > Yes. 
This is why I said you should do it _similar_ to how memcpy
> is implemented.  OTOH I don't see a good reason to support patterns
> that are bigger than a byte ...
>
>> Currently, for the values that are used to initialize for
>> “AUTO_INIT_PATTERN”, we have:
>>
>> /* The following value is a guaranteed unmappable pointer value and has a
>>    repeated byte-pattern which makes it easier to synthesize.  We use it for
>>    pointers as well as integers so that aggregates are likely to be
>>    initialized with this repeated value.  */
>> uint64_t largevalue = 0xull;
>> /* For 32-bit platforms it's a bit trickier because, across systems, only the
>>    zero page can reasonably be expected to be unmapped, and even then we need
>>    a very low address.  We use a smaller value, and that value sadly doesn't
>>    have a repeated byte-pattern.  We don't use it for integers.  */
>> uint32_t smallvalue = 0x00AA;
>>
>> In addition to the above, for BOOLEAN_TYPE:
>>
>>   case BOOLEAN_TYPE:
>>     /* We think that initializing the boolean variable to 0 rather than 1
>>        is better even for pattern initialization.  */
>>
>> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
>> repeated byte-pattern for variables that include BOOLEAN_TYPE or pointer
>> types.  Therefore, lowering the .DEFERRED_INIT for “PATTERN”
>> initialization through “memset” is not always possible.
>>
>> Let me know if I miss anything in the above.  Do you have other suggestions?
>
> The main point is that you need to avoid building the explicit initializer
> only to have it consumed by assignment expansion.  If you want to keep
> all the singing and dancing (as opposed to maybe initializing with a
> 0x1 byte pattern) then I think for efficiency you still want to
> block-initialize the variable and then only fix up the special fields.

Yes, this is a good idea.

We can memset the whole structure with the repeated pattern “0xAA” first,
then fix up BOOLEAN_TYPE and POINTER_TYPE for 32-bit platforms.
That might be more efficient.
> > But as said, all this is quite over-designed IMHO and simply
> > zeroing everything would be much simpler and good enough.

So, the fundamental questions are:

1. Do we need the functionality of “Pattern Initialization” for debugging
purposes?  I see that other compilers (Clang and the Microsoft compiler)
support both zero initialization and pattern initialization:

http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/

Pattern init is used in development builds for debugging purposes; zero
init is used in production builds for security purposes.  So, I assume that
GCC might want to provide similar functionality?  But I am open on this.

Kees, will the kernel use the “Pattern initialization” feature?

2. Since “Pattern initialization” is just used for debugging pur
Re: GCC documentation: porting to Sphinx
On Fri, 11 Jun 2021, Martin Liška wrote: > > Where languages have their own manuals, I think it's more appropriate for > > those to go under the language-specific directories. > > So it will require the following folder structure: > > $gccroot/gcc/doc/gcc - for GCC documentation > $gccroot/gcc/doc/gccint - for GCC internal documentation > $gccroot/gcc/doc/gfortran - for Fortran documentation > $gccroot/gcc/doc/gccgo - for GO documentation I'm thinking of $gccroot/gcc/fortran/doc $gccroot/gcc/go/doc (or subdirectories thereof if desired) for the Fortran and Go manuals, so they go alongside the front end sources. > The Sphinx Makefile will be capable of e.g. My concern with makefiles is what the main GCC build system does, with "make" run at the top level of the build tree and with the targets defined by the GNU Coding Standards, not with what happens if someone manually does make in a subdirectory of the source or build tree. "make" at top level should build all the info manuals and man pages, as at present (if a suitable Sphinx version is installed), and "make install" should install them, in the same directories as at present. "make html" at top level should build all the HTML manuals, and "make install-html" should install them. "make pdf" and "make install-pdf" at top level should work likewise. "make install-html" and "make install-pdf" should put things under $(DESTDIR)$(htmldir) and $(DESTDIR)$(pdfdir) as at present. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/3]: C N2653 char8_t: Language support
On 6/7/21 5:11 PM, Joseph Myers wrote: On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote: When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro is predefined. This is the mechanism proposed to glibc to opt-in to declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed in N2653. See [2]. I don't think glibc should have such a feature test macro, and I don't think GCC should define such feature test macros either - _*_SOURCE macros are generally for the *user* to define to decide what namespace they want visible, not for the compiler to define. Without proliferating new language dialects, __STDC_VERSION__ ought to be sufficient to communicate from the compiler to the library (including to GCC's own headers such as stdatomic.h). In general I agree, but I think an exception is warranted in this case for a few reasons: 1. The feature includes both core language changes (the change of type for u8 string literals) and library changes. The library changes are not actually dependent on the core language change, but they are intended to be used together. 2. Existing use of the char8_t identifier can be found in existing open source projects and likely exists in some closed source projects as well. An opt-in approach avoids conflict and the need to conditionalize code based on gcc version. 3. An opt-in approach enables evaluation of the feature prior to any WG14 approval. Tom.
Re: [PATCH 1/3]: C N2653 char8_t: Language support
On 6/7/21 5:12 PM, Joseph Myers wrote: Also, it seems odd to add a new field to cpp_options without any code in libcpp that uses the value of that field. Ah, thank you. That appears to be leftover code from prior experimentation and I failed to identify it as such when preparing the patch. I'll provide a revised patch. Tom.
Re: [PATCH 1/3]: C N2653 char8_t: Language support
On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote: > On 6/7/21 5:11 PM, Joseph Myers wrote: > > On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote: > > > > > When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE > > > macro > > > is predefined. This is the mechanism proposed to glibc to opt-in to > > > declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions > > > proposed > > > in N2653. See [2]. > > I don't think glibc should have such a feature test macro, and I don't > > think GCC should define such feature test macros either - _*_SOURCE macros > > are generally for the *user* to define to decide what namespace they want > > visible, not for the compiler to define. Without proliferating new > > language dialects, __STDC_VERSION__ ought to be sufficient to communicate > > from the compiler to the library (including to GCC's own headers such as > > stdatomic.h). > > > In general I agree, but I think an exception is warranted in this case for a > few reasons: > > 1. The feature includes both core language changes (the change of type >for u8 string literals) and library changes. The library changes >are not actually dependent on the core language change, but they are >intended to be used together. > 2. Existing use of the char8_t identifier can be found in existing open >source projects and likely exists in some closed source projects as >well. An opt-in approach avoids conflict and the need to >conditionalize code based on gcc version. > 3. An opt-in approach enables evaluation of the feature prior to any >WG14 approval. But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything else. In C++, there is __cpp_char8_t 201811L predefined macro for char8_t. Using that in C is not right, sure. 
Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
presence checks of the types, like

#ifdef __SIZEOF_INT128__
__int128 i;
#else
long long i;
#endif

etc.; while char8_t has sizeof (char8_t) == 1, perhaps predefining
__SIZEOF_CHAR8_T__ 1 instead of _CHAR8_T_SOURCE would be better?

	Jakub
Re: [PATCH 1/3]: C N2653 char8_t: Language support
On 6/11/21 12:01 PM, Jakub Jelinek wrote: On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote: On 6/7/21 5:11 PM, Joseph Myers wrote: On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote: When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro is predefined. This is the mechanism proposed to glibc to opt-in to declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed in N2653. See [2]. I don't think glibc should have such a feature test macro, and I don't think GCC should define such feature test macros either - _*_SOURCE macros are generally for the *user* to define to decide what namespace they want visible, not for the compiler to define. Without proliferating new language dialects, __STDC_VERSION__ ought to be sufficient to communicate from the compiler to the library (including to GCC's own headers such as stdatomic.h). In general I agree, but I think an exception is warranted in this case for a few reasons: 1. The feature includes both core language changes (the change of type for u8 string literals) and library changes. The library changes are not actually dependent on the core language change, but they are intended to be used together. 2. Existing use of the char8_t identifier can be found in existing open source projects and likely exists in some closed source projects as well. An opt-in approach avoids conflict and the need to conditionalize code based on gcc version. 3. An opt-in approach enables evaluation of the feature prior to any WG14 approval. But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything else. In C++, there is __cpp_char8_t 201811L predefined macro for char8_t. Using that in C is not right, sure. 
Often we use __SIZEOF_type__ macros not just for sizeof(), but also for presence check of the types, like #ifdef __SIZEOF_INT128__ __int128 i; #else long long i; #endif etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining __SIZEOF_CHAR8_T__ 1 instead of _CHAR8_T_SOURCE would be better? I'm open to whatever signaling mechanism would be preferred. It took me a while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't find much for other precedents. I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is unusual with respect to other feature test macros. Is that what you find to be weird and inconsistent? Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__, but kind of strange too since the size is always 1. Perhaps a better approach would be to follow the __CHAR16_TYPE__ and __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char. That is likewise a bit strange since the type would always be unsigned char, but it does provide a bit more symmetry. That could potentially have some use as well; for C++, it could be defined as char8_t and thereby reflect the difference between the two languages. Perhaps it could be useful in the future as well if WG14 were to add distinct char8_t, char16_t, and char32_t types as C++ did (I'm not offering any prediction regarding the likelihood of that happening). Tom. Jakub
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
On Fri, Jun 11, 2021 at 03:49:02PM +, Qing Zhao wrote:
> > On Jun 11, 2021, at 6:12 AM, Richard Biener wrote:
> > [...]
> >
> > The main point is that you need to avoid building the explicit initializer
> > only to have it consumed by assignment expansion.  If you want to keep
> > all the singing and dancing (as opposed to maybe initializing with a
> > 0x1 byte pattern) then I think for efficiency you still want to
> > block-initialize the variable and then only fixup the special fields.
>
> Yes, this is a good idea.
>
> We can memset the whole structure with the repeated pattern “0xAA” first,
> then fix up BOOLEAN_TYPE and POINTER_TYPE for 32-bit platforms.
> That might be more efficient.
>
> > But as said, all this is quite over-designed IMHO and simply
> > zeroing everything would be much simpler and good enough.
>
> So, the fundamental questions are:
>
> 1. Do we need the functionality of “Pattern Initialization” for debugging
> purposes?
> I see that other compilers (Clang and the Microsoft compiler) support both
> zero initialization and pattern initialization:
>
> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
> https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
>
> Pattern init is used in development builds for debugging purposes, zero
> init is used in production builds for security purposes.

Correct -- "pattern" is much better about triggering all kinds of problems,
and suitable for debug builds. "zero" is less disruptive and generally
provides a safer failure mode for production builds.

> So, I assume that GCC might want to provide similar functionality? But I am
> open on this.
>
> Kees, will the kernel use the “Pattern initialization” feature?

It is currently supported, yes. Note that if I had to choose between
"nothing" and "only zero", I will happily take "only zero". However, it
seems like pattern init isn't much more of an addition in this series.

> 2.
> Since “Pattern initialization” is just used for debugging purposes, the
> runtime and code size overhead might not be that important at all, right?

That has been my impression, yes.

Thanks!

-Kees

--
Kees Cook
Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns
Christophe Lyon writes:
> Thanks for the feedback. How about v2 attached?
> Do you want me to merge neon_vec_unpack and
> mve_vec_unpack and only have different assembly?
> if (TARGET_HAVE_MVE)
>   return "vmovlb.%# %q0, %q1";
> else
>   return "vmovlb.%# %q0, %q1";

I think it'd be better to keep them separate, given that they have
different attributes.

> @@ -2199,10 +,23 @@ (define_insn "mve_vmovnbq_"
>    [(set_attr "type" "mve_move")
> ])
>
> +;; vmovnb pattern used by the vec_pack_trunc expander to avoid the
> +;; need for an extra register.

“to avoid the need for an uninitialized input operand” might be clearer.

> +(define_insn "@mve_vec_pack_trunc_lo_"
> +  [
> +   (set (match_operand: 0 "s_register_operand" "=w")
> +        (unspec: [(match_operand:MVE_5 1 "s_register_operand" "w")]
> +         VMOVNBQ_S))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vmovnb.i%# %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> […]
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 430a92ce966..3afb3e1d891 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -632,3 +632,91 @@ (define_expand "clz2"
>    "ARM_HAVE__ARITH
>    && !TARGET_REALLY_IWMMXT"
> )
> +
> +;; vmovl[tb] are not available for V4SI on MVE
> +(define_expand "vec_unpack_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (SE: (match_operand:VU 1 "register_operand"))]
> +  "ARM_HAVE__ARITH
> +  && !TARGET_REALLY_IWMMXT
> +  && !
(mode == V4SImode && TARGET_HAVE_MVE) > + && !BYTES_BIG_ENDIAN" > + { > +rtvec v = rtvec_alloc (/2); > +rtx t1; > +int i; > +for (i = 0; i < (/2); i++) > + RTVEC_ELT (v, i) = GEN_INT ((/2) + i); > + > +t1 = gen_rtx_PARALLEL (mode, v); > + > +if (TARGET_NEON) > + emit_insn (gen_neon_vec_unpack_hi_ (operands[0], > + operands[1], > + t1)); > +else > + emit_insn (gen_mve_vec_unpack_hi_ (operands[0], > +operands[1], > +t1)); > +DONE; > + } > +) Since the patterns are the same for both MVE and Neon (a good thing!), it would be simpler to write this as: (define_insn "mve_vec_unpack_lo_" [(set (match_operand: 0 "register_operand" "=w") (SE: (vec_select: (match_operand:VU 1 "register_operand" "w") (match_dup 2] … and then set operands[2] to what is t1 above. There would then be no need to call emit_insn & DONE, and we could use the natural iterator (MVE_3) for the MVE patterns. Same idea for the lo version. > […] > +;; vmovn[tb] are not available for V2DI on MVE > +(define_expand "vec_pack_trunc_" > + [(set (match_operand: 0 "register_operand") > + (vec_concat: > + (truncate: > + (match_operand:VN 1 "register_operand")) > + (truncate: > + (match_operand:VN 2 "register_operand"] > + "ARM_HAVE__ARITH > + && !TARGET_REALLY_IWMMXT > + && ! (mode == V2DImode && TARGET_HAVE_MVE) > + && !BYTES_BIG_ENDIAN" > + { > + if (TARGET_NEON) > + { > + emit_insn (gen_neon_quad_vec_pack_trunc_ (operands[0], > operands[1], > +operands[2])); > + } > + else > + { > + rtx tmpreg = gen_reg_rtx (mode); > + emit_insn (gen_mve_vec_pack_trunc_lo (mode, tmpreg, > operands[1])); > + emit_insn (gen_mve_vmovntq (VMOVNTQ_S, mode, > +tmpreg, tmpreg, operands[2])); > + emit_move_insn (operands[0], tmpreg); vmovntq can assign directly to operands[0], without a separate move. OK with those changes, thanks. Richard > + } > + DONE; > + } > +) > […]
RE: [GCC][PATCH] arm: Fix polymorphic variants failing with undefined reference to `__ARM_undef` error.
> -----Original Message-----
> From: Srinath Parvathaneni
> Sent: 10 June 2021 17:14
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> Subject: [GCC][PATCH] arm: Fix polymorphic variants failing with undefined
> reference to `__ARM_undef` error.
>
> Hi,
>
> This patch fixes the issue mentioned in PR101016, which is mve polymorphic
> variants failing at linking with undefined reference to "__ARM_undef" error.
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for master?

Ok.
Thanks,
Kyrill

> Regards,
> Srinath.
>
> gcc/ChangeLog:
>
> 2021-06-10  Srinath Parvathaneni
>
> 	PR target/101016
> 	* config/arm/arm_mve.h (__arm_vld1q): Change __ARM_mve_coerce(p0,
> 	int8_t const *) to __ARM_mve_coerce1(p0, int8_t *) in the argument for
> 	the polymorphic variants matching code.
> 	(__arm_vld1q_z): Likewise.
> 	(__arm_vld2q): Likewise.
> 	(__arm_vld4q): Likewise.
> 	(__arm_vldrbq_gather_offset): Likewise.
> 	(__arm_vldrbq_gather_offset_z): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2021-06-10  Srinath Parvathaneni
>
> 	PR target/101016
> 	* gcc.target/arm/mve/intrinsics/pr101016.c: New test.
> > > > ### Attachment also inlined for ease of reply > ### > > > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h > index > 1380f3acbfe64026bc882c308bb1c243e27ac4b3..83f10036990fc3df956fb2fa > 4818d1304138b485 100644 > --- a/gcc/config/arm/arm_mve.h > +++ b/gcc/config/arm/arm_mve.h > @@ -37565,47 +37565,47 @@ extern void *__ARM_undef; > > #define __arm_vld1q(p0) (\ >_Generic( (int (*)[__ARM_mve_typeid(p0)])0, \ > - int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 > (__ARM_mve_coerce(p0, int8_t const *)), \ > - int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 > (__ARM_mve_coerce(p0, int16_t const *)), \ > - int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 > (__ARM_mve_coerce(p0, int32_t const *)), \ > - int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 > (__ARM_mve_coerce(p0, uint8_t const *)), \ > - int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 > (__ARM_mve_coerce(p0, uint16_t const *)), \ > - int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 > (__ARM_mve_coerce(p0, uint32_t const *)), \ > - int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16 > (__ARM_mve_coerce(p0, float16_t const *)), \ > - int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32 > (__ARM_mve_coerce(p0, float32_t const * > + int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 > (__ARM_mve_coerce1(p0, int8_t *)), \ > + int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 > (__ARM_mve_coerce1(p0, int16_t *)), \ > + int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 > (__ARM_mve_coerce1(p0, int32_t *)), \ > + int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 > (__ARM_mve_coerce1(p0, uint8_t *)), \ > + int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 > (__ARM_mve_coerce1(p0, uint16_t *)), \ > + int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 > (__ARM_mve_coerce1(p0, uint32_t *)), \ > + int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16 > (__ARM_mve_coerce1(p0, float16_t *)), \ > + int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32 > 
(__ARM_mve_coerce1(p0, float32_t * > > #define __arm_vld1q_z(p0,p1) ( \ >_Generic( (int (*)[__ARM_mve_typeid(p0)])0, \ > - int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 > (__ARM_mve_coerce(p0, int8_t const *), p1), \ > - int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_z_s16 > (__ARM_mve_coerce(p0, int16_t const *), p1), \ > - int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_z_s32 > (__ARM_mve_coerce(p0, int32_t const *), p1), \ > - int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_z_u8 > (__ARM_mve_coerce(p0, uint8_t const *), p1), \ > - int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_z_u16 > (__ARM_mve_coerce(p0, uint16_t const *), p1), \ > - int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_z_u32 > (__ARM_mve_coerce(p0, uint32_t const *), p1), \ > - int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_z_f16 > (__ARM_mve_coerce(p0, float16_t const *), p1), \ > - int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_z_f32 > (__ARM_mve_coerce(p0, float32_t const *), p1))) > + int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 > (__ARM_mve_coerce1(p0, int8_t *), p1), \ > + int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_z_s16 > (__ARM_mve_coerce1(p0, int16_t *), p1), \ > + int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_z_s32 > (__ARM_mve_coerce1(p0, int32_t *), p1), \ > + int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_z_u8 > (__ARM_mve_coerce1(p0, uint8_t *), p1), \ > + int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_z_u16 > (__ARM_mve_coerce1(p0, uint16_t *), p1), \ > + int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_z_u32 > (__ARM_mve_coerce1(p0, uint32_t *), p1), \ > + int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_z_f16 > (__ARM_mve_coerce1(p
Re: [PATCH 1/3]: C N2653 char8_t: Language support
On Fri, Jun 11, 2021 at 12:20:48PM -0400, Tom Honermann wrote: > I'm open to whatever signaling mechanism would be preferred. It took me a > while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't > find much for other precedents. > > I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is > unusual with respect to other feature test macros. Is that what you find to > be weird and inconsistent? > > Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__, > but kind of strange too since the size is always 1. > > Perhaps a better approach would be to follow the __CHAR16_TYPE__ and > __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char. That > is likewise a bit strange since the type would always be unsigned char, but > it does provide a bit more symmetry. That could potentially have some use > as well; for C++, it could be defined as char8_t and thereby reflect the > difference between the two languages. Perhaps it could be useful in the > future as well if WG14 were to add distinct char8_t, char16_t, and char32_t > types as C++ did (I'm not offering any prediction regarding the likelihood > of that happening). C++ already predefines #define __CHAR8_TYPE__ unsigned char #define __CHAR16_TYPE__ short unsigned int #define __CHAR32_TYPE__ unsigned int for -std={c,gnu}++2{0,a,3,b} or -fchar8_t (unless -fno-char8_t), so I agree just making sure __CHAR8_TYPE__ is defined to unsigned char even for C is best. And you probably don't need to do anything in the C patch for it, void c_stddef_cpp_builtins(void) { builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0); ... if (flag_char8_t) builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0); builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0); builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0); will do that. Jakub
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches mailto:gcc-patches@gcc.gnu.org>> wrote: On Jun 11, 2021, at 6:12 AM, Richard Biener mailto:rguent...@suse.de>> wrote: On Thu, 10 Jun 2021, Qing Zhao wrote: Hi, Richard, I need more discussion on the following comments you raised: On May 26, 2021, at 6:18 AM, Richard Biener mailto:rguent...@suse.de>> wrote: +/* Expand the IFN_DEFERRED_INIT function according to its second argument. */ +static void +expand_DEFERRED_INIT (internal_fn, gcall *stmt) +{ + tree var = gimple_call_lhs (stmt); + tree init = NULL_TREE; + enum auto_init_type init_type += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); + + switch (init_type) +{ +default: + gcc_unreachable (); +case AUTO_INIT_PATTERN: + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; +case AUTO_INIT_ZERO: + init = build_zero_cst (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; +} I think actually building build_pattern_cst_for_auto_init can generate massive garbage and for big auto vars code size is also a concern and ideally on x86 you'd produce rep movq. So I don't think going via expand_assignment is good. Instead you possibly want to lower .DEFERRED_INIT to MEMs following expand_builtin_memset and eventually enhance that to allow storing pieces larger than a byte. When I tried to lower .DEFERRED_INIT to MEMs for “AUTO_INIT_PATTERN”, I have the following questions: 1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we basically initialize the whole memory covering the auto variable, including paddings. Right? Yes. 2. Only when the value that is used to initialization has a repeated byte-pattern, we can lower it through “memset”. Otherwise, If the value that is used to initialization does Not have a repeated byte-pattern, we can NOT lower it through “memset”, right? Yes. This is why I said you should do it _similar_ to how memcpy is implemented. 
OTOH I don't see a good reason to support patterns that are bigger than a byte ...

Currently, for the values that are used to initialize for "AUTO_INIT_PATTERN", we have:

/* The following value is a guaranteed unmappable pointer value and has a
   repeated byte-pattern which makes it easier to synthesize.  We use it for
   pointers as well as integers so that aggregates are likely to be
   initialized with this repeated value.  */
uint64_t largevalue = 0xAAAAAAAAAAAAAAAAull;
/* For 32-bit platforms it's a bit trickier because, across systems, only
   the zero page can reasonably be expected to be unmapped, and even then we
   need a very low address.  We use a smaller value, and that value sadly
   doesn't have a repeated byte-pattern.  We don't use it for integers.  */
uint32_t smallvalue = 0x00AA;

In addition to the above, for BOOLEAN_TYPE:

case BOOLEAN_TYPE:
  /* We think that initializing the boolean variable to 0 rather than 1 is
     better even for pattern initialization.  */

Due to BOOLEAN_TYPE and POINTER_TYPE, we cannot always have a repeated byte-pattern for variables that include boolean or pointer types. Therefore, lowering the .DEFERRED_INIT for "PATTERN" initialization through "memset" is not always possible. Let me know if I missed anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer only to have it consumed by assignment expansion. If you want to keep all the singing and dancing (as opposed to maybe initializing with a 0x1 byte pattern) then I think for efficiency you still want to block-initialize the variable and then only fix up the special fields.

Yes, this is a good idea. We can memset the whole structure with the repeated pattern "0xAA" first, then fix up BOOLEAN_TYPE and POINTER_TYPE for 32-bit platforms. That might be more efficient. However, the paddings will be initialized to "0xAA". But this should be fine since with -fauto-var-init the paddings can be any value. So, still should be fine.

Qing
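[Editorial note] The block-initialize-then-fixup strategy agreed on above can be sketched in plain C. This is an illustration only, not code from the patch under review; `struct S` and `pattern_init` are made-up names standing in for what the lowering of .DEFERRED_INIT would emit:

```c
#include <string.h>
#include <stdbool.h>

/* A hypothetical auto variable mixing a pointer, a boolean, and padding.  */
struct S {
  void *ptr;   /* repeated 0xAA bytes form the "unmappable pointer" value */
  bool flag;   /* pattern init wants 0 here, not 0xAA */
  /* trailing padding on most ABIs */
};

/* Block-initialize the whole object with the repeated byte, padding
   included, then patch only the fields whose pattern differs from it.  */
void pattern_init (struct S *s)
{
  memset (s, 0xAA, sizeof *s);  /* one cheap block store */
  s->flag = false;              /* fixup for BOOLEAN_TYPE */
}
```

The single memset also covers question 1 above: the padding bytes receive the 0xAA pattern for free, and only the special fields need a follow-up store.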
Re: [PATCH 1/4] introduce diagnostic infrastructure changes (PR 98512)
On Thu, 2021-06-10 at 17:26 -0600, Martin Sebor wrote: > This diff introduces the diagnostic infrastructure changes to support > controlling warnings at any call site in the inlining stack and > printing > the inlining context without the %K and %G directives. Thanks for working on this, looks very promising. > Improve warning suppression for inlined functions. > > Resolves: > PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at declaration site > PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in > conjunction with alias attribute Am I right in thinking that you add test coverage for both of these in patch 2 of the kit? > > gcc/ChangeLog: > > * diagnostic.c (update_inlining_context): New. > (update_effective_level_from_pragmas): Handle inlining context. > (diagnostic_report_diagnostic): Same. > * diagnostic.h (struct diagnostic_info): Add ctor. > (struct diagnostic_context): Add members. > * tree-diagnostic.c (get_inlining_locations): New. > (set_inlining_location): New. > (tree_diagnostics_defaults): Set new callback pointers. [..snip...] > @@ -1204,7 +1256,7 @@ diagnostic_report_diagnostic (diagnostic_context > *context, >/* We do this to avoid giving the message for -pedantic-errors. */ >orig_diag_kind = diagnostic->kind; > } > - > + Stray whitespace change? Though it looks like a fix of a stray space, so not a big deal. >if (diagnostic->kind == DK_NOTE && context->inhibit_notes_p) > return false; > [..snip...] > diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h > index 1b9d6b1f64d..b95ee23dda0 100644 > --- a/gcc/diagnostic.h > +++ b/gcc/diagnostic.h > @@ -87,6 +87,10 @@ enum diagnostics_extra_output_kind > list in diagnostic.def. */ > struct diagnostic_info > { > + diagnostic_info () > +: message (), richloc (), metadata (), x_data (), kind (), option_index > () > + { } > + Why does the patch add this ctor? >/* Text to be formatted. 
 */
>    text_info message;
> 
> @@ -343,6 +347,32 @@ struct diagnostic_context
> 
>    /* Callback for final cleanup.  */
>    void (*final_cb) (diagnostic_context *context);
> +
> +  /* The inlining context of the diagnostic (may have just one
> +     element if a diagnostic is not for an inlined expression).  */
> +  struct inlining_ctx
> +  {
> +    void reset ()
> +    {
> +      ilocs.release ();
> +      loc = UNKNOWN_LOCATION;
> +      ao = NULL;
> +      allsyslocs = false;
> +    }
> +
> +    /* Locations along the inlining stack.  */
> +    auto_vec ilocs;
> +    /* The locus of the diagnostic.  */
> +    location_t loc;
> +    /* The abstract origin of the location.  */
> +    void *ao;
> +    /* Set of every ILOCS element is in a system header.  */
> +    bool allsyslocs;
> +  } ictx;

Why is the inlining ctx part of the diagnostic_context? That feels strange to me. This inlining information relates to a particular diagnostic, so it seems more appropriate to me that it should be part of the diagnostic_info (which might thus necessitate having a ctor for diagnostic_info). Doing that might avoid the need for "reset", if I'm right in assuming that getting the data is done once per diagnostic during diagnostic_report_diagnostic. Alternatively, could this be state that's created on the stack during diagnostic_report_diagnostic and passed around by pointer as another parameter? (putting it in diagnostic_info might be simplest though)

Maybe rename it to "inlining_info"? How involved would it be to make it be a class with private fields? Can the field names have "m_" prefixes, please?

> +  /* Callbacks to get and set the inlining context.  */

Probably should spell out in the comment here that doing so requires knowledge of trees, which is why it's a callback (to avoid diagnostic.c from having to know about trees).
> +  void (*get_locations_cb)(diagnostic_context *, diagnostic_info *);
> +  void (*set_location_cb)(const diagnostic_context *, diagnostic_info *);
> };
> 
> static inline void
> diff --git a/gcc/tree-diagnostic.c b/gcc/tree-diagnostic.c
> index 95b8ef30070..a8c5484849a 100644
> --- a/gcc/tree-diagnostic.c
> +++ b/gcc/tree-diagnostic.c
> @@ -305,6 +305,84 @@ default_tree_printer (pretty_printer *pp, text_info *text, const char *spec,
>    return true;
> }
> 
> +/* Get the inlining stack corresponding to the DIAGNOSTIC location.  */
> +
> +static void
> +get_inlining_locations (diagnostic_context *context,
> +			diagnostic_info *diagnostic)
> +{
> +  context->ictx.reset ();
> +
> +  location_t loc = diagnostic_location (diagnostic);
> +  tree block = LOCATION_BLOCK (loc);
> +
> +  /* Count the number of locations in system headers.  When all are,
> +     warnings are suppressed by -Wno-system-headers.  Otherwise, they
> +     involve some user code, possibly inlined into a function in a system
> +     header, and are not treated as coming from system headers.  */
> +  unsigned nsys
Re: [PATCH v2] inline-asm: Fix ICE with bitfields in "m" operands [PR100785]
On Fri, 11 Jun 2021, Jakub Jelinek via Gcc-patches wrote: > gcc/c/ > * c-typeck.c (c_mark_addressable): Diagnose trying to make > bit-fields addressable. The C front-end changes are OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
On Fri, Jun 11, 2021 at 01:04:09PM +0200, Richard Biener wrote:
> On Tue, 8 Jun 2021, Kees Cook wrote:
> 
> > On Tue, Jun 08, 2021 at 09:41:38AM +0200, Richard Biener wrote:
> > > On Mon, 7 Jun 2021, Qing Zhao wrote:
> > > 
> > > > Hi,
> > > > 
> > > > > On Jun 7, 2021, at 2:53 AM, Richard Biener wrote:
> > > > > 
> > > > >> To address the above suggestion:
> > > > >> 
> > > > >> My study shows: the call to __builtin_clear_padding is expanded
> > > > >> during gimplification phase.
> > > > >> And there is no __builtin_clear_padding expanding during rtx
> > > > >> expanding phase.
> > > > >> However, for -ftrivial-auto-var-init, padding initialization should
> > > > >> be done both in gimplification phase and rtx expanding phase.
> > > > >> Since the __builtin_clear_padding might not be good for rtx
> > > > >> expanding, reusing __builtin_clear_padding might not work.
> > > > >> 
> > > > >> Let me know if you have any more comments on this.
> > > > > 
> > > > > Yes, I didn't suggest to literally emit calls to __builtin_clear_padding
> > > > > but instead to leverage the lowering code, more specifically share the
> > > > > code that figures _what_ is to be initialized (where the padding is)
> > > > > and eventually the actual code generation pieces. That might need some
> > > > > refactoring but the code where padding resides should be present only
> > > > > a single time (since it's quite complex).
> > > > 
> > > > Okay, I see your point here.
> > > > 
> > > > > Which is also why I suggested to split out the padding initialization
> > > > > bits to a separate patch (and option).
> > > > 
> > > > Personally, I am okay with splitting padding initialization from this
> > > > current patch.
> > > > Kees, what's your opinion on this? I.e., the current
> > > > -ftrivial-auto-var-init will NOT initialize padding, we will add
> > > > another option to explicitly initialize padding.
> > > It would also be possible to have -fauto-var-init, -fauto-var-init-padding
> > > and have -ftrivial-auto-var-init for clang compatibility enabling both.
> > 
> > Sounds good to me!
> > 
> > > Or -fauto-var-init={zero,pattern,padding} and allow
> > > -fauto-var-init=pattern,padding to be specified. Note there's also
> > > padding between auto variables on the stack - that "trailing"
> > > padding isn't initialized either? (yes, GCC sorts variables to minimize
> > > that padding) For example for
> > > 
> > > void foo()
> > > {
> > >   char a[3];
> > >   bar (a);
> > > }
> > > 
> > > there's 12 bytes padding after 'a', shouldn't we initialize that? If not,
> > > why's other padding important to be initialized?
> > 
> > This isn't a situation that I'm aware of causing real-world problems.
> > The issues have all come from padding within an addressable object. I
> > haven't tested Clang's behavior on this (and I have no kernel tests for
> > this padding), but I do check for trailing padding, like:
> > 
> > struct test_trailing_hole {
> >         char *one;
> >         char *two;
> >         char *three;
> >         char four;
> >         /* "sizeof(unsigned long) - 1" byte padding hole here. */
> > };
> 
> Any justification why tail padding for
> 
>   struct foo { double x; char x[3]; } a;
> 
> is important but not for
> 
>   char x[3];
> 
> ? It does look like an odd inconsistency to me.

The problem is with sizeof() and the various compounding results related to it. Namely, things that do whole-struct copies (which is unfortunately common in the kernel) will include the padding for "a" since it is within the object, as represented by sizeof(), but not for x:

#include <stdio.h>

int main(void)
{
	struct foo { double y; char x[3]; } a;
	char x[3];

	printf("a: %zu (a.y: %zu, a.x: %zu)\n", sizeof(a), sizeof(a.y), sizeof(a.x));
	printf("x: %zu\n", sizeof(x));
	return 0;
}

a: 16 (a.y: 8, a.x: 3)
x: 3

And it gets worse with structs-within-structs, etc.

-- 
Kees Cook
Re: [PATCH 0/3]: C N2653 char8_t implementation
On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote: > The option is needed because it impacts core language backward compatibility > (for both C and C++, the type of u8 string literals; for C++, the type of u8 > character literals and the new char8_t fundamental type). Lots of new features in new standard versions can affect backward compatibility. We generally bundle all of those up into a single -std option rather than having an explosion of different language variants with different features enabled or disabled. I don't think this feature, for C, reaches the threshold that would justify having a separate option to control it, especially given that people can use -Wno-pointer-sign or pointer casts or their own local char8_t typedef as an intermediate step if they want code using u8"" strings to work for both old and new standard versions. I don't think u8"" strings are widely used in C library headers in a way where the choice of type matters. (Use of a feature in library headers is a key thing that can justify options such as -fgnu89-inline, because it means the choice of language version is no longer fully under control of a single project.) The only feature proposed for C2x that I think is likely to have significant compatibility implications in practice for a lot of code is making bool, true and false into keywords. I still don't think a separate option makes sense there. (If that feature is accepted for C2x, what would be useful is for people to do distribution rebuilds with -std=gnu2x as the default to find and fix code that breaks, in advance of the default actually changing in GCC. But the workaround for not-yet-fixed code would be -std=gnu11, not a separate option for that one feature.) > > I think the whole patch series would best wait until after the proposal > > has been considered by a WG14 meeting, in addition to not increasing the > > number of language dialects supported. 
> > As an opt-in feature, this is useful to gain implementation and deployment > experience for WG14. I think this feature is one of the cases where experience in C++ is sufficiently relevant for C (although there are certainly cases of other language features where the languages are sufficiently different that using C++ experience like that can be problematic). E.g. we didn't need -fdigit-separators for C before digit separators were added to C2x, and we don't need -fno-digit-separators now they are in C2x (the feature is just enabled or disabled based on the language version), although that's one of many features that do affect compatibility in corner cases. -- Joseph S. Myers jos...@codesourcery.com
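[Editorial note] The intermediate step Joseph describes, a local char8_t typedef plus a cast, can be sketched as follows. This is a hypothetical example, not part of the patch series; under N2653 the typedef would instead come from <uchar.h> and the cast would become a no-op:

```c
/* Local fallback typedef: today u8"" has type char[] (C11/C17), so code
   that wants unsigned-char semantics declares its own char8_t.  */
typedef unsigned char char8_t;

const char8_t *u8_greeting (void)
{
  /* The explicit cast avoids the signedness mismatch now (the workaround
     -Wno-pointer-sign addresses) and keeps compiling unchanged once u8""
     literals change type to char8_t[].  */
  return (const char8_t *) u8"hi";
}
```

Code written this way works for both old and new standard versions, which is the portability point being made above.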
Re: [PATCH] libgccjit: Handle truncation and extension for casts [PR 95498]
On Thu, 2021-05-27 at 21:22 -0400, Antoni Boucher wrote: > Here's the patch with the condition removed. > I believe everything is now fixed. > Thanks! Thanks; this looks good to me. Is this the latest version of the patch; would you like me to apply it? Dave > > Le jeudi 27 mai 2021 à 18:21 -0400, David Malcolm a écrit : > > On Tue, 2021-05-25 at 20:16 -0400, Antoni Boucher wrote: > > > I updated the patch according to the comments by Tom Tromey. > > > > > > There's one question left about your question regarding > > > C_MAYBE_CONST_EXPR, David: > > > > > > I am not sure if we can get a C_MAYBE_CONST_EXPR from libgccjit, > > > and > > > it > > > indeed seems like it's only created in c-family. > > > However, we do use it in libgccjit here: > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/jit/jit-playback.c#L1180 > > > > > > I tried removing the condition `if (TREE_CODE (t_ret) != > > > C_MAYBE_CONST_EXPR)` and all the tests of libgccjit still pass. > > > > > > That code was copied from here: > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/c/c-convert.c#L175 > > > and might not be needed in libgccjit. > > > > > > Should I just remove the condition, then? > > > > I think so. > > > > Thanks > > Dave > > > > > > > > Le jeudi 13 mai 2021 à 19:58 -0400, David Malcolm a écrit : > > > > On Thu, 2021-05-13 at 19:31 -0400, Antoni Boucher wrote: > > > > > Thanks for your answer. > > > > > > > > > > See my answers below: > > > > > > > > > > Le jeudi 13 mai 2021 à 18:13 -0400, David Malcolm a écrit : > > > > > > On Sat, 2021-02-20 at 17:17 -0500, Antoni Boucher via Gcc- > > > > > > patches > > > > > > wrote: > > > > > > > Hi. > > > > > > > Thanks for your feedback! > > > > > > > > > > > > > > > > > > > Sorry about the delay in responding. 
> > > > > > > > > > > > In the past I was hesitant about adding more cast support > > > > > > to > > > > > > libgccjit > > > > > > since I felt that the user could always just create a union > > > > > > to > > > > > > do > > > > > > the > > > > > > cast. Then I tried actually using the libgccjit API to do > > > > > > this, > > > > > > and > > > > > > realized how much work it adds, so I now think we do want > > > > > > to > > > > > > support > > > > > > casting more types. > > > > > > > > > > > > > > > > > > > See answers below: > > > > > > > > > > > > > > On Sat, Feb 20, 2021 at 11:20:35AM -0700, Tom Tromey > > > > > > > wrote: > > > > > > > > > > > > > "Antoni" == Antoni Boucher via Gcc-patches > > > > > > > > > > > > > < > > > > > > > > > > > > > gcc-patches@gcc.gnu.org> writes: > > > > > > > > > > > > > > > > Antoni> gcc/jit/ > > > > > > > > Antoni> PR target/95498 > > > > > > > > Antoni> * jit-playback.c: Add support to handle > > > > > > > > truncation > > > > > > > > and extension > > > > > > > > Antoni> in the convert function. 
> > > > > > > > > > > > > > > > Antoni> + switch (dst_code) > > > > > > > > Antoni> + { > > > > > > > > Antoni> + case INTEGER_TYPE: > > > > > > > > Antoni> + case ENUMERAL_TYPE: > > > > > > > > Antoni> + t_ret = convert_to_integer (dst_type, > > > > > > > > expr); > > > > > > > > Antoni> + goto maybe_fold; > > > > > > > > Antoni> + > > > > > > > > Antoni> + default: > > > > > > > > Antoni> + gcc_assert > > > > > > > > (gcc::jit::active_playback_ctxt); > > > > > > > > Antoni> + gcc::jit::active_playback_ctxt- > > > > > > > > >add_error > > > > > > > > (NULL, > > > > > > > > "unhandled conversion"); > > > > > > > > Antoni> + fprintf (stderr, "input expression:\n"); > > > > > > > > Antoni> + debug_tree (expr); > > > > > > > > Antoni> + fprintf (stderr, "requested type:\n"); > > > > > > > > Antoni> + debug_tree (dst_type); > > > > > > > > Antoni> + return error_mark_node; > > > > > > > > Antoni> + > > > > > > > > Antoni> + maybe_fold: > > > > > > > > Antoni> + if (TREE_CODE (t_ret) != > > > > > > > > C_MAYBE_CONST_EXPR) > > > > > > > > > > > > Do we even get C_MAYBE_CONST_EXPR in libgccjit? That tree > > > > > > code > > > > > > is > > > > > > defined in c-family/c-common.def; how can nodes of that > > > > > > kind > > > > > > be > > > > > > created > > > > > > outside of the c-family? > > > > > > > > > > I am not sure, but that seems like it's only created in c- > > > > > family > > > > > indeed. > > > > > However, we do use it in libgccjit here: > > > > > > > > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/jit/jit-playback.c#L1180 > > > > > > > > > > > > > > > > > > > Antoni> + t_ret = fold (t_ret); > > > > > > > > Antoni> + return t_ret; > > > > > > > > > > > > > > > > It seems weird to have a single 'goto' to maybe_fold, > > > > > > > > especially > > > > > > > > inside > > > > > > > > a switch like this. 
> > > > > > > > > > > > > > > > If you think the maybe_fold code won't be reused, then > > > > > > > > it > > > > > > > > should > > > > > > > > just > > > > > > > > be > > > > > > > > hoisted
[committed] libstdc++: Fix filesystem::path comparisons for C++23
In C++23 there is a basic_string_view(Range&&) constructor, which is constrained to take a range (specifically, a contiguous_range). When the filesystem::path comparison operators call lhs.compare(rhs) the overload taking a string_view is considered, which means checking whether path satisfies the range concept. That satisfaction result changes depending whether path::iterator is complete, which is ill-formed; no diagnostic required. To avoid the problem, this change ensures that the overload resolution is performed in a context where path::iterator is complete and the range concept is satisfied. (The result of overload resolution is always that the compare(const path&) overload is the best match, but we still have to consider the compare(basic_string_view) one to decide if it even participates in overload resolution). For std::filesystem::path we can't define the comparison operators later in the file, because they are hidden friends, so a new helper is introduced that gets defined when everything else is complete. For std::experimental::filesystem::path we can just move the definitions of the comparison operators later in the file. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * include/bits/fs_path.h (operator==, operator<=>): Use new _S_compare function. (path::_S_compare): New function to call path::compare in a context where path::iterator is complete. * include/experimental/bits/fs_path.h (operator<, operator==): Define after path::iterator is complete. * testsuite/27_io/filesystem/path/native/conv_c++23.cc: New test. * testsuite/experimental/filesystem/path/native/conv_c++23.cc: New test. Tested powerpc64le-linux. Committed to trunk. This also needs to be backported to gcc-11 where the string_view constructor is present for C++23 mode. 
commit 1e690757d30775ed340a368b9a9463b2ad68de01
Author: Jonathan Wakely
Date: Fri Jun 11 19:18:11 2021

    libstdc++: Fix filesystem::path comparisons for C++23
diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index ad2518f414c..5e285204527 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -513,13 +513,13 @@ namespace __detail
     /// Compare paths
     friend bool operator==(const path& __lhs, const path& __rhs) noexcept
-    { return __lhs.compare(__rhs) == 0; }
+    { return path::_S_compare(__lhs, __rhs) == 0; }
 
 #if __cpp_lib_three_way_comparison
     /// Compare paths
     friend strong_ordering operator<=>(const path& __lhs, const path& __rhs) noexcept
-    { return __lhs.compare(__rhs) <=> 0; }
+    { return path::_S_compare(__lhs, __rhs) <=> 0; }
 #else
     /// Compare paths
     friend bool operator!=(const path& __lhs, const path& __rhs) noexcept
@@ -627,6 +627,11 @@ namespace __detail
     static basic_string<_CharT, _Traits, _Allocator>
     _S_str_convert(basic_string_view, const _Allocator&);
 
+    // Returns lhs.compare(rhs), but defined after path::iterator is complete.
+    __attribute__((__always_inline__))
+    static int
+
[committed] d: foreach over a tuple doesn't work on 16-bit targets (PR100999)
Hi, This patch improves semantic passes in the front-end around the `foreach' and `static foreach' statements to be more resilient to compiling in a minimal D runtime environment. Checking of the index type has been improved as well so now there won't be needless compiler errors when using 8 or 16-bit integers as index types when the size fits the expected loop range. Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, committed to mainline, and backported to the gcc-10 and gcc-11 release branches. Regards, Iain. --- gcc/d/ChangeLog: PR d/100999 * dmd/MERGE: Merge upstream dmd 7a3808254. libphobos/ChangeLog: PR d/100999 * src/MERGE: Merge upstream phobos 55bb17543. --- gcc/d/dmd/MERGE | 2 +- gcc/d/dmd/cond.c | 29 +--- gcc/d/dmd/dinterpret.c| 9 +++ gcc/d/dmd/expression.c| 2 +- gcc/d/dmd/expressionsem.c | 12 ++-- gcc/d/dmd/statementsem.c | 36 +- .../compilable/extra-files/minimal/object.d | 1 + .../gdc.test/compilable/interpret5.d | 30 gcc/testsuite/gdc.test/compilable/minimal3.d | 36 ++ .../gdc.test/compilable/staticforeach.d | 38 ++ gcc/testsuite/gdc.test/compilable/test21742.d | 13 gcc/testsuite/gdc.test/compilable/test22006.d | 14 .../gdc.test/fail_compilation/b12504.d| 64 + .../gdc.test/fail_compilation/diag16976.d | 69 ++- .../gdc.test/fail_compilation/fail117.d | 6 +- .../gdc.test/fail_compilation/fail22006.d | 22 ++ .../gdc.test/fail_compilation/fail238_m32.d | 8 +-- .../gdc.test/fail_compilation/fail238_m64.d | 8 +-- .../gdc.test/fail_compilation/fail7424b.d | 2 +- .../gdc.test/fail_compilation/fail7424c.d | 2 +- .../gdc.test/fail_compilation/fail7424d.d | 2 +- .../gdc.test/fail_compilation/fail7424e.d | 2 +- .../gdc.test/fail_compilation/fail7424f.d | 2 +- .../gdc.test/fail_compilation/fail7424g.d | 2 +- .../gdc.test/fail_compilation/fail7424h.d | 2 +- .../gdc.test/fail_compilation/fail7424i.d | 2 +- .../gdc.test/fail_compilation/fail9766.d | 4 +- .../gdc.test/fail_compilation/ice9406.d | 3 +- .../gdc.test/fail_compilation/test21927.d | 20 ++ 
.../gdc.test/fail_compilation/test21939.d | 9 +++ libphobos/src/MERGE | 2 +- libphobos/src/std/typecons.d | 15 ++-- 32 files changed, 388 insertions(+), 80 deletions(-) create mode 100644 gcc/testsuite/gdc.test/compilable/extra-files/minimal/object.d create mode 100644 gcc/testsuite/gdc.test/compilable/interpret5.d create mode 100644 gcc/testsuite/gdc.test/compilable/minimal3.d create mode 100644 gcc/testsuite/gdc.test/compilable/test21742.d create mode 100644 gcc/testsuite/gdc.test/compilable/test22006.d create mode 100644 gcc/testsuite/gdc.test/fail_compilation/b12504.d create mode 100644 gcc/testsuite/gdc.test/fail_compilation/fail22006.d create mode 100644 gcc/testsuite/gdc.test/fail_compilation/test21927.d create mode 100644 gcc/testsuite/gdc.test/fail_compilation/test21939.d diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE index a617f285eac..d20785d9126 100644 --- a/gcc/d/dmd/MERGE +++ b/gcc/d/dmd/MERGE @@ -1,4 +1,4 @@ -4a4e46a6f304a667e0c05d4455706ec2056ffddc +7a3808254878df8cb70a055bea58afc79187b778 The first line of this file holds the git revision number of the last merge done from the dlang/dmd repository. 
diff --git a/gcc/d/dmd/cond.c b/gcc/d/dmd/cond.c index 6f112ad909c..6c7dc9eb1a3 100644 --- a/gcc/d/dmd/cond.c +++ b/gcc/d/dmd/cond.c @@ -112,6 +112,7 @@ static void lowerArrayAggregate(StaticForeach *sfe, Scope *sc) sfe->aggrfe->aggr = new TupleExp(aggr->loc, es); sfe->aggrfe->aggr = expressionSemantic(sfe->aggrfe->aggr, sc); sfe->aggrfe->aggr = sfe->aggrfe->aggr->optimize(WANTvalue); +sfe->aggrfe->aggr = sfe->aggrfe->aggr->ctfeInterpret(); } else { @@ -198,7 +199,8 @@ static TypeStruct *createTupleType(Loc loc, Expressions *e) Type *ty = new TypeTypeof(loc, new TupleExp(loc, e)); sdecl->members->push(new VarDeclaration(loc, ty, fid, NULL)); TypeStruct *r = (TypeStruct *)sdecl->type; -r->vtinfo = TypeInfoStructDeclaration::create(r); // prevent typeinfo from going to object file +if (global.params.useTypeInfo && Type::dtypeinfo) +r->vtinfo = TypeInfoStructDeclaration::create(r); // prevent typeinfo from going to object file return r; } @@ -312,15 +314,25 @@ static void lowerNonArrayAggregate(StaticForeach *sfe, Scope *sc) Identifier *idres = Identifier::generateId("__res"); VarDeclaration *vard = new VarDeclaration(aloc, aty, idres, NULL); Statements *s2 = new Statements(); -s2->push(new ExpStatement(aloc, vard)); -Expression *catass = new CatAssignE
PING [PATCH] PR fortran/95502 - ICE in gfc_check_do_variable, at fortran/parse.c:4446
*PING* BTW this patch also addresses PR95501, so I'll update the Changelog. Thanks, Harald > Gesendet: Freitag, 04. Juni 2021 um 23:12 Uhr > Von: "Harald Anlauf" > An: "fortran" , "gcc-patches" > Betreff: [PATCH] PR fortran/95502 - ICE in gfc_check_do_variable, at > fortran/parse.c:4446 > > ICE-on-invalid issues during error recovery. Testcase by Gerhard, > initial patch by Steve. I found another variant which needed an > additional fix for a NULL pointer dereference. > > Regtested on x86_64-pc-linux-gnu. > > OK for mainline / 11-branch? > > Thanks, > Harald > > > Fortran - ICE in gfc_check_do_variable, at fortran/parse.c:4446 > > Avoid NULL pointer dereferences during error recovery. > > gcc/fortran/ChangeLog: > > PR fortran/95502 > * expr.c (gfc_check_pointer_assign): Avoid NULL pointer > dereference. > * match.c (gfc_match_pointer_assignment): Likewise. > * parse.c (gfc_check_do_variable): Avoid comparison with NULL > symtree. > > gcc/testsuite/ChangeLog: > > PR fortran/95502 > * gfortran.dg/pr95502.f90: New test. > >
Re: GCC documentation: porting to Sphinx
> On Jun 11, 2021, at 11:50 AM, Joseph Myers wrote: > > ... > > "make" at top level should build all the info manuals and man pages, as at > present (if a suitable Sphinx version is installed), and "make install" > should install them, in the same directories as at present. > > "make html" at top level should build all the HTML manuals, and "make > install-html" should install them. > > "make pdf" and "make install-pdf" at top level should work likewise. > > "make install-html" and "make install-pdf" should put things under > $(DESTDIR)$(htmldir) and $(DESTDIR)$(pdfdir) as at present. And in addition, it would be nice to have additional make and make install- targets for other output formats that Sphinx can generate for us, at least some of them. "epub" comes to mind as an example I would like to have. paul
[PATCH,rs6000] Do not check if SMS succeeds on powerpc
These tests have become unstable and SMS either succeeds or doesn't depending on things like changes in instruction latency. Removing the scan-rtl-dump-times checks for powerpc*-*-*.

If bootstrap/regtest passes, ok for trunk and backport to 11?

Thanks!
Aaron

gcc/testsuite
	* gcc.dg/sms-1.c: Remove scan-rtl-dump-times check.
	* gcc.dg/sms-2.c: Remove scan-rtl-dump-times check.
	* gcc.dg/sms-3.c: Remove scan-rtl-dump-times check.
	* gcc.dg/sms-4.c: Remove scan-rtl-dump-times check.
	* gcc.dg/sms-6.c: Remove scan-rtl-dump-times check.
	* gcc.dg/sms-8.c: Remove scan-rtl-dump-times check.
	* gcc.dg/sms-10.c: Remove scan-rtl-dump-times check.
---
 gcc/testsuite/gcc.dg/sms-1.c  | 2 --
 gcc/testsuite/gcc.dg/sms-10.c | 3 ---
 gcc/testsuite/gcc.dg/sms-2.c  | 2 --
 gcc/testsuite/gcc.dg/sms-3.c  | 3 ---
 gcc/testsuite/gcc.dg/sms-4.c  | 3 ---
 gcc/testsuite/gcc.dg/sms-6.c  | 2 --
 gcc/testsuite/gcc.dg/sms-8.c  | 4 ----
 7 files changed, 19 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/sms-1.c b/gcc/testsuite/gcc.dg/sms-1.c
index 26868c34c71..098e1aa6e45 100644
--- a/gcc/testsuite/gcc.dg/sms-1.c
+++ b/gcc/testsuite/gcc.dg/sms-1.c
@@ -40,5 +40,3 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-10.c b/gcc/testsuite/gcc.dg/sms-10.c
index d85e8e2a274..df3bba24ed0 100644
--- a/gcc/testsuite/gcc.dg/sms-10.c
+++ b/gcc/testsuite/gcc.dg/sms-10.c
@@ -113,6 +113,3 @@ main ()
   return 0;
 }
 
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-2.c b/gcc/testsuite/gcc.dg/sms-2.c
index 7b96f550262..f8375f9f05d 100644
--- a/gcc/testsuite/gcc.dg/sms-2.c
+++ b/gcc/testsuite/gcc.dg/sms-2.c
@@ -31,5 +31,3 @@ fun (nb)
       sy = 0;
     }
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS loop many exits" 1 "sms" { target powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/sms-3.c b/gcc/testsuite/gcc.dg/sms-3.c
index 822b516af2f..5e56ecf761c 100644
--- a/gcc/testsuite/gcc.dg/sms-3.c
+++ b/gcc/testsuite/gcc.dg/sms-3.c
@@ -38,6 +38,3 @@ main ()
   foo (6, 3);
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-4.c b/gcc/testsuite/gcc.dg/sms-4.c
index f5ebb55a2f4..8416b8b9ce9 100644
--- a/gcc/testsuite/gcc.dg/sms-4.c
+++ b/gcc/testsuite/gcc.dg/sms-4.c
@@ -34,6 +34,3 @@ main ()
   foo (5, a, b, c, dst);
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-6.c b/gcc/testsuite/gcc.dg/sms-6.c
index e57e01539eb..d6fa45a2cf9 100644
--- a/gcc/testsuite/gcc.dg/sms-6.c
+++ b/gcc/testsuite/gcc.dg/sms-6.c
@@ -41,5 +41,3 @@ int main()
   return 0;
 }
 
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 3 "sms" { target powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/sms-8.c b/gcc/testsuite/gcc.dg/sms-8.c
index 7ccaa454125..dc0a3fc1f9b 100644
--- a/gcc/testsuite/gcc.dg/sms-8.c
+++ b/gcc/testsuite/gcc.dg/sms-8.c
@@ -34,7 +34,3 @@ main ()
   res = foo (3, 4);
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target powerpc*-*-* } } } */
-
-
-- 
2.27.0
Re: [PATCH,rs6000] Do not check if SMS succeeds on powerpc
Hi! On Fri, Jun 11, 2021 at 01:56:58PM -0500, Aaron Sawdey wrote: > These tests have become unstable and SMS either succeeds or doesn't > depending on things like changes in instruction latency. Removing > the scan-rtl-dump-times checks for powerpc*-*-*. > > If bootstrap/regtest passes, ok for trunk and backport to 11? Yes. Thank you! Segher
Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]
> > Jeff et al. > > > On 9 Jun 2021, at 17:23, Jeff Law via Gcc-patches > > wrote: > > On 5/25/2021 2:23 PM, Paul Eggert wrote: > >> The GCC manual's documentation of -fno-trampolines was apparently > >> written from an Ada point of view. However, when I read it I > >> understandably mistook it to say that -fno-trampolines also works for > >> C, C++, etc. It doesn't: it is silently ignored for these languages, > >> and I assume for any language other than Ada. > >> > >> This confusion caused me to go in the wrong direction in a Gnulib > >> discussion, as I mistakenly thought that entire C apps with nested > >> functions could be compiled with -fno-trampolines and then use nested > >> C functions in stack overflow handlers where the alternate stack > >> is allocated via malloc. I was wrong, as this won't work on common > >> platforms like x86-64 where malloc yields non-executable storage. > >> > >> gcc/ > >> * doc/invoke.texi (Code Gen Options): > >> * doc/tm.texi.in (Trampolines): > >> Document that -fno-trampolines and -ftrampolines work > >> only with Ada. > > So Martin Uecker probably has the most state on this. IIRC when we last > > discussed -fno-trampolines the belief was that it could be easily made to work independent > of the language, but > that it was ultimately an ABI change. That ultimately derailed plans to use > -fno-trampolines for > other languages in the immediate term. > > This is correct, it's not technically too hard to make it work for another > language (I have a hack > in my arm64-darwin branch that does this for gfortran). As noted, for most > ports it is an ABI > break and thus not usable outside such a work-around. > > For the record (for the arm64-darwin port in the first instance), together > with some of my friends > at Embecosm we plan to implement a solution to the trampoline problem that does not > require executable > stack and does not require an ABI break.
Perhaps such a solution will be of > interest to other > ports that do not want executable stack. > > We're not quite ready to post the design yet - but will do so in the next few > weeks (all being > well). > For reference, the discussion and PATCH for C can be found here: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html FWIW: There is a proposal discussed in WG14 to add lambdas. In this context a more general and portable solution to this problem would also be useful. One related idea is to add a wider function pointer type to C that includes a data pointer. A regular function pointer could be converted to the wider pointer, but the back conversion only works if it originally was a regular function pointer. This pointer type could then also be used for interoperability with other languages that have callable objects or closures. If GCC could add something like this and there is then implementation experience, we could later try to standardize it. (there was also positive feedback to this idea from some C++ people I spoke to) Martin
Re: [PATCH] c++: Substitute into function parms in lexical order [PR96560]
On 4/29/21 7:48 AM, Patrick Palka wrote: On Wed, 28 Apr 2021, Jason Merrill wrote: On 4/28/21 2:24 PM, Patrick Palka wrote: This makes tsubst_arg_types substitute into a function's parameter types in left-to-right order instead of right-to-left order, in accordance with DR 1227. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? [ diff generated with -w to hide noisy whitespace changes ] OK. We'll still substitute lambda default args in reverse order, but that shouldn't happen in sfinae context, so that shouldn't be a problem. Maybe add an assert (complain & tf_error) in the lambda case? Apparently cpp2a/lambda-uneval2.C trips over such an assert during deduction for: template <class T> auto j(T t) -> decltype([](auto x) -> decltype(x.invalid) { } (t)); which makes sense, I suppose. It looks like we can just move up the default argument processing to before the recursive call to tsubst_arg_types, as in the below. Bootstrapped and regtested on x86_64-pc-linux-gnu. OK. BTW, now that LAMBDA_EXPR has LAMBDA_EXPR_REGEN_INFO I think we have enough information to defer substituting into lambda default arguments until they're actually needed. Would that be something to look into as a followup patch? I think the language currently specifies that they're substituted immediately, so probably not unless that changes. -- >8 -- Subject: [PATCH] c++: Substitute into function parms in lexical order [PR96560] This makes tsubst_arg_types substitute into a function's parameter types in left-to-right order instead of right-to-left order, in accordance with DR 1227. gcc/cp/ChangeLog: DR 1227 PR c++/96560 * pt.c (tsubst_arg_types): Rearrange so that we substitute into TYPE_ARG_TYPES in forward order while short circuiting appropriately. Adjust formatting. gcc/testsuite/ChangeLog: DR 1227 PR c++/96560 * g++.dg/template/sfinae-dr1227.C: New test.
--- gcc/cp/pt.c | 51 ++- gcc/testsuite/g++.dg/template/sfinae-dr1227.C | 23 + 2 files changed, 51 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/sfinae-dr1227.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index eaf46659f85..e6d65595e2f 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -15068,20 +15068,13 @@ tsubst_arg_types (tree arg_types, tsubst_flags_t complain, tree in_decl) { - tree remaining_arg_types; tree type = NULL_TREE; - int i = 1; + int len = 1; tree expanded_args = NULL_TREE; - tree default_arg; if (!arg_types || arg_types == void_list_node || arg_types == end) return arg_types; - remaining_arg_types = tsubst_arg_types (TREE_CHAIN (arg_types), - args, end, complain, in_decl); - if (remaining_arg_types == error_mark_node) -return error_mark_node; - if (PACK_EXPANSION_P (TREE_VALUE (arg_types))) { /* For a pack expansion, perform substitution on the @@ -15092,7 +15085,7 @@ tsubst_arg_types (tree arg_types, if (TREE_CODE (expanded_args) == TREE_VEC) /* So that we'll spin through the parameters, one by one. */ -i = TREE_VEC_LENGTH (expanded_args); + len = TREE_VEC_LENGTH (expanded_args); else { /* We only partially substituted into the parameter @@ -15101,14 +15094,15 @@ tsubst_arg_types (tree arg_types, expanded_args = NULL_TREE; } } + else +type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl); - while (i > 0) { ---i; - + /* Check if a substituted type is erroneous before substituting into + the rest of the chain. */ + for (int i = 0; i < len; i++) +{ if (expanded_args) type = TREE_VEC_ELT (expanded_args, i); -else if (!type) - type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl); if (type == error_mark_node) return error_mark_node; @@ -15122,15 +15116,12 @@ tsubst_arg_types (tree arg_types, } return error_mark_node; } - -/* Do array-to-pointer, function-to-pointer conversion, and ignore - top-level qualifiers as required. 
*/ -type = cv_unqualified (type_decays_to (type)); +} /* We do not substitute into default arguments here. The standard mandates that they be instantiated only when needed, which is done in build_over_call. */ -default_arg = TREE_PURPOSE (arg_types); + tree default_arg = TREE_PURPOSE (arg_types); /* Except that we do substitute default arguments under tsubst_lambda_expr, since the new op() won't have any associated template arguments for us @@ -15139,20 +15130,34 @@ tsubst_arg_types (tree arg_types, default_arg = tsubst_copy_and_build (default_arg, args, complain, in_decl, false/*fn*/, false/*con
Re: [C PATCH] qualifiers of pointers to arrays in C2X [PR 98397]
(PING. In case you missed this. Sorry, forgot to CC you.) On Monday, 24.05.2021 at 08:05 +0200, Martin Uecker wrote: > Hi Joseph, > > I found some time to update this patch. The only real change > of the patch is the qualifier in the conditional expression for > pointers to arrays in C2X. All the rest are the warnings, > which were wrong in the last version. > > I hope I got this correct this time in combination with > -pedantic-errors and -Wc11-c2x-compat. > > Martin > > > 2021-05-16 Martin Uecker > > gcc/c/ > PR c/98397 > * c-typeck.c (comp_target_types): Change pedwarn to pedwarn_c11 > for pointers to arrays with qualifiers. > (build_conditional_expr): For C23 don't lose qualifiers for pointers > to arrays when the other pointer is a void pointer. Update warnings. > (convert_for_assignment): Update warnings for C2X when converting from > void* with qualifiers to a pointer to array with the same qualifiers. > > gcc/testsuite/ > PR c/98397 > * gcc.dg/c11-qual-1.c: New test. > * gcc.dg/c2x-qual-1.c: New test. > * gcc.dg/c2x-qual-2.c: New test. > * gcc.dg/c2x-qual-3.c: New test. > * gcc.dg/c2x-qual-4.c: New test. > * gcc.dg/c2x-qual-5.c: New test. > * gcc.dg/c2x-qual-6.c: New test. > * gcc.dg/pointer-array-quals-1.c: Remove unnecessary flag. > * gcc.dg/pointer-array-quals-2.c: Remove unnecessary flag.
> > > diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c > index fc64ef96fb8..5b13656c090 100644 > --- a/gcc/c/c-typeck.c > +++ b/gcc/c/c-typeck.c > @@ -1328,8 +1328,8 @@ comp_target_types (location_t location, tree ttl, tree > ttr) >val = comptypes_check_enum_int (mvl, mvr, &enum_and_int_p); > >if (val == 1 && val_ped != 1) > -pedwarn (location, OPT_Wpedantic, "pointers to arrays with different > qualifiers " > - "are incompatible in ISO C"); > +pedwarn_c11 (location, OPT_Wpedantic, "invalid use of pointers to arrays > with different > qualifiers " > + "in ISO C before C2X"); > >if (val == 2) > pedwarn (location, OPT_Wpedantic, "types are not quite compatible"); > @@ -5396,39 +5396,40 @@ build_conditional_expr (location_t colon_loc, tree > ifexp, bool ifexp_bcp, > "used in conditional expression"); > return error_mark_node; > } > - else if (VOID_TYPE_P (TREE_TYPE (type1)) > -&& !TYPE_ATOMIC (TREE_TYPE (type1))) > - { > - if ((TREE_CODE (TREE_TYPE (type2)) == ARRAY_TYPE) > - && (TYPE_QUALS (strip_array_types (TREE_TYPE (type2))) > - & ~TYPE_QUALS (TREE_TYPE (type1 > - warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers, > - "pointer to array loses qualifier " > - "in conditional expression"); > - > - if (TREE_CODE (TREE_TYPE (type2)) == FUNCTION_TYPE) > + else if ((VOID_TYPE_P (TREE_TYPE (type1)) > + && !TYPE_ATOMIC (TREE_TYPE (type1))) > +|| (VOID_TYPE_P (TREE_TYPE (type2)) > +&& !TYPE_ATOMIC (TREE_TYPE (type2 > + { > + tree t1 = TREE_TYPE (type1); > + tree t2 = TREE_TYPE (type2); > + if (!VOID_TYPE_P (t1)) > +{ > + /* roles are swapped */ > + t1 = t2; > + t2 = TREE_TYPE (type1); > +} > + tree t2_stripped = strip_array_types (t2); > + if ((TREE_CODE (t2) == ARRAY_TYPE) > + && (TYPE_QUALS (t2_stripped) & ~TYPE_QUALS (t1))) > + { > + if (!flag_isoc2x) > + warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers, > + "pointer to array loses qualifier " > + "in conditional expression"); > + else if (warn_c11_c2x_compat > 0) > + warning_at (colon_loc, 
OPT_Wc11_c2x_compat, > + "pointer to array loses qualifier " > + "in conditional expression in ISO C before C2X"); > + } > + if (TREE_CODE (t2) == FUNCTION_TYPE) > pedwarn (colon_loc, OPT_Wpedantic, >"ISO C forbids conditional expr between " >"% and function pointer"); > - result_type = build_pointer_type (qualify_type (TREE_TYPE (type1), > - TREE_TYPE (type2))); > - } > - else if (VOID_TYPE_P (TREE_TYPE (type2)) > -&& !TYPE_ATOMIC (TREE_TYPE (type2))) > - { > - if ((TREE_CODE (TREE_TYPE (type1)) == ARRAY_TYPE) > - && (TYPE_QUALS (strip_array_types (TREE_TYPE (type1))) > - & ~TYPE_QUALS (TREE_TYPE (type2 > - warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers, > - "pointer to array loses qualifier " > - "in conditional expression"); > - > - if (TREE_CODE (TREE_TYPE (type1)) == FUNCTION_TYPE) > - pe
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On Thu, Jun 10, 2021 at 03:11:08PM +0800, Xionghu Luo wrote: > On 2021/6/10 00:24, Segher Boessenkool wrote: > >>"!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && > >> !TARGET_P9_VECTOR > >> && !altivec_indexed_or_indirect_operand (operands[0], mode)" > >>[(const_int 0)] > >> { > >>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); > >>rs6000_emit_le_vsx_permute (operands[0], operands[1], mode); > >>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); > >>DONE; > >> }) > > > > So it seems like it is only 3 insns in the very unlucky case? Normally > > it will end up as just one simple store? > > I am afraid there is no "simple store" for *TImode on P8LE*. There is only > stxvd2x, which rotates the elements (stvx requires memory to be aligned, so it is not > a suitable pattern), so every vsx_le_perm_store_v1ti must be split into 3 > instructions for alternative 0, and it seems incorrect to force the cost to be 4. Often it could be done as just two insns though? If the value stored is not used elsewhere? So we could make the first alternative cost 8 then as well, which will also work out for combine, right? Alternatively we could have what is now the second alternative be the first, if that is realistic -- that one already has cost 8 (it is just two machine instructions). Segher
Re: [PATCH] libgccjit: Add function to set the initial value of a global variable [PR96089]
David: this one hasn't been reviewed by you yet, so you can review it. On Thursday, May 20, 2021 at 21:27 -0400, Antoni Boucher wrote: > Hi. > > I made this patch to set an arbitrary initial value for a global variable. > > This patch suffers from the same issue as inline assembly > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100380), i.e. it > segfaults if the `rvalue` is created after the global variable. > It seems to be a design issue, so I'm not sure what the fix > for > it would be; having it fixed would allow me to test this new function much > more and see if things are missing (i.e. it might require a way to > create a constant struct). > See the link above for an explanation of this issue. > > Thanks for the review.
[PATCH] x86: Replace ix86_red_zone_size with ix86_red_zone_used
Add red_zone_used to machine_function to track if red zone is used. When expanding function prologue, set red_zone_used to true if red zone is used. gcc/ PR target/pr101023 * config/i386/i386.c (ix86_expand_prologue): Set red_zone_used to true if red zone is used. (ix86_output_indirect_jmp): Replace ix86_red_zone_size with ix86_red_zone_used. * config/i386/i386.h (machine_function): Add red_zone_used. (ix86_red_zone_size): Removed. (ix86_red_zone_used): New. * config/i386/i386.md (peephole2 patterns): Replace ix86_red_zone_size with ix86_red_zone_used. gcc/testsuite/ PR target/pr101023 * g++.target/i386/pr101023a.C: New test. * g++.target/i386/pr101023b.C: Likewise. --- gcc/config/i386/i386.c| 6 ++- gcc/config/i386/i386.h| 5 +- gcc/config/i386/i386.md | 8 +-- gcc/testsuite/g++.target/i386/pr101023a.C | 62 +++ gcc/testsuite/g++.target/i386/pr101023b.C | 5 ++ 5 files changed, 80 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/pr101023a.C create mode 100644 gcc/testsuite/g++.target/i386/pr101023b.C diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 05b8dc806cd..a61255857ff 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -8401,10 +8401,14 @@ ix86_expand_prologue (void) || frame.stack_pointer_offset < CHECK_STACK_LIMIT)) { ix86_emit_save_regs_using_mov (frame.reg_save_offset); + cfun->machine->red_zone_used = true; int_registers_saved = true; } } + if (frame.red_zone_size != 0) +cfun->machine->red_zone_used = true; + if (stack_realign_fp) { int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; @@ -15915,7 +15919,7 @@ ix86_output_indirect_jmp (rtx call_op) { /* We can't have red-zone since "call" in the indirect thunk pushes the return address onto stack, destroying red-zone. 
*/ - if (ix86_red_zone_size != 0) + if (ix86_red_zone_used) gcc_unreachable (); ix86_output_indirect_branch (call_op, "%0", true); diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 919d0b2418a..182b3275991 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2663,6 +2663,9 @@ struct GTY(()) machine_function { invalid calls. */ BOOL_BITFIELD silent_p : 1; + /* True if red zone is used. */ + BOOL_BITFIELD red_zone_used : 1; + /* The largest alignment, in bytes, of stack slot actually used. */ unsigned int max_used_stack_alignment; @@ -2693,7 +2696,7 @@ extern GTY(()) tree ms_va_list_type_node; #define ix86_current_function_calls_tls_descriptor \ (ix86_tls_descriptor_calls_expanded_in_cfun && df_regs_ever_live_p (SP_REG)) #define ix86_static_chain_on_stack (cfun->machine->static_chain_on_stack) -#define ix86_red_zone_size (cfun->machine->frame.red_zone_size) +#define ix86_red_zone_used (cfun->machine->red_zone_used) /* Control behavior of x86_file_start. */ #define X86_FILE_START_VERSION_DIRECTIVE false diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 7743c61ec86..6e4abf32e7c 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -20491,7 +20491,7 @@ (define_peephole2 (clobber (mem:BLK (scratch)))])] "(TARGET_SINGLE_PUSH || optimize_insn_for_size_p ()) && INTVAL (operands[0]) == -GET_MODE_SIZE (word_mode) - && ix86_red_zone_size == 0" + && !ix86_red_zone_used" [(clobber (match_dup 1)) (parallel [(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1)) (clobber (mem:BLK (scratch)))])]) @@ -20505,7 +20505,7 @@ (define_peephole2 (clobber (mem:BLK (scratch)))])] "(TARGET_DOUBLE_PUSH || optimize_insn_for_size_p ()) && INTVAL (operands[0]) == -2*GET_MODE_SIZE (word_mode) - && ix86_red_zone_size == 0" + && !ix86_red_zone_used" [(clobber (match_dup 1)) (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1)) (parallel [(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1)) @@ -20520,7 +20520,7 @@ 
(define_peephole2 (clobber (reg:CC FLAGS_REG))])] "(TARGET_SINGLE_PUSH || optimize_insn_for_size_p ()) && INTVAL (operands[0]) == -GET_MODE_SIZE (word_mode) - && ix86_red_zone_size == 0" + && !ix86_red_zone_used" [(clobber (match_dup 1)) (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))]) @@ -20532,7 +20532,7 @@ (define_peephole2 (clobber (reg:CC FLAGS_REG))])] "(TARGET_DOUBLE_PUSH || optimize_insn_for_size_p ()) && INTVAL (operands[0]) == -2*GET_MODE_SIZE (word_mode) - && ix86_red_zone_size == 0" + && !ix86_red_zone_used" [(clobber (match_dup 1)) (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1)) (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))]) diff --git a/gcc/testsuite/g++.target/i386/p
[PATCH] Fix effective target for check-builtin-vec_rlnm-runnable.c test
GCC maintainers: The gcc test suite compiles and attempts to run the check-builtin-vec_rlnm-runnable.c test on Power 8 platforms. The test should only be run on Power 9 and newer platforms. The attached patch fixes the target for the executable test so it only runs on Power 9 and newer platforms. The patch was tested on powerpc64-linux (Power 8 BE). The test harness reports 1 unsupported test. The patch was also tested on: powerpc64le-linux (Power 9 LE) powerpc64le-linux (Power 10 LE) The test harness reports 3 expected passes and no failures. Please let me know if the patch looks OK for mainline. Thanks. Carl - The effective target for a Power 9 runnable test should be p9vector_hw. 2021-06-11 Carl Love gcc/testsuite/ChangeLog * gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c (dg-require-effective-target): Change target to p9vector_hw. --- .../gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c index cd67b06afbe..55935eaafd2 100644 --- a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-require-effective-target p9vector_hw } */ /* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */ /* Verify the vec_rlm and vec_rlmi builtins works correctly. */ -- 2.17.1
[PATCH] c++: Don't complain about [[maybe_unused]] on non-static data members
On Fri, Jun 11, 2021 at 10:35:58PM +0200, Jakub Jelinek wrote: > > But I also agree that GCC shouldn't warn here. > > We could do that by using a wrapper around handle_unused_attribute > for the maybe_unused attribute, that way warn on unused attribute on > FIELD_DECLs, but not for maybe_unused (until we actually implement some > warning that uses it). Here it is in patch form. Ok for trunk if it passes bootstrap/regtest? 2021-06-11 Jakub Jelinek * tree.c (handle_maybe_unused_attribute): New function. (std_attribute_table): Use it as handler for maybe_unused attribute. * g++.dg/cpp1z/maybe_unused2.C: New test. --- gcc/cp/tree.c.jj2021-05-28 11:03:19.490884332 +0200 +++ gcc/cp/tree.c 2021-06-11 23:41:36.503413441 +0200 @@ -4872,6 +4872,23 @@ handle_likeliness_attribute (tree *node, return error_mark_node; } +/* The C++17 [[maybe_unused]] attribute maps to GNU unused attribute, + except that we don't want -Wattributes to warn for [[maybe_unused]] + on non-static data members. */ + +static tree +handle_maybe_unused_attribute (tree *node, tree name, tree args, + int flags, bool *no_add_attrs) +{ + if (TREE_CODE (*node) == FIELD_DECL) +{ + *no_add_attrs = true; + return NULL_TREE; +} + else +return handle_unused_attribute (node, name, args, flags, no_add_attrs); +} + /* Table of valid C++ attributes. 
*/ const struct attribute_spec cxx_attribute_table[] = { @@ -4890,7 +4907,7 @@ const struct attribute_spec std_attribut /* { name, min_len, max_len, decl_req, type_req, fn_type_req, affects_type_identity, handler, exclude } */ { "maybe_unused", 0, 0, false, false, false, false, -handle_unused_attribute, NULL }, +handle_maybe_unused_attribute, NULL }, { "nodiscard", 0, 1, false, false, false, false, handle_nodiscard_attribute, NULL }, { "no_unique_address", 0, 0, true, false, false, false, --- gcc/testsuite/g++.dg/cpp1z/maybe_unused2.C.jj 2021-06-11 23:40:51.742027943 +0200 +++ gcc/testsuite/g++.dg/cpp1z/maybe_unused2.C 2021-06-11 23:40:47.642084225 +0200 @@ -0,0 +1,7 @@ +// { dg-do compile { target c++11 } } +// { dg-options "-Wunused -Wextra" } + +struct [[maybe_unused]] A { + [[maybe_unused]] static int i; + [[maybe_unused]] int a; +}; Jakub
Re: [PATCH] Fix effective target for check-builtin-vec_rlnm-runnable.c test
Hi! On Fri, Jun 11, 2021 at 02:19:40PM -0700, Carl Love wrote: > The gcc test suite compiles and attempts to run the check-builtin- > vec_rlnm-runnable.c test on Power 8 platforms. The test should only be > run on Power 9 and newer platforms. The attached patch fixes the > target for the executable test so it only runs on Power 9 and newer > platforms. > * gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c > (dg-require-effective-target): > Change target to p9vector_hw. You would put the ( on a new line, like:

	* gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
	(dg-require-effective-target): Change target to p9vector_hw.

Okay for trunk like that. Thanks! Segher
[PATCH 1/1] Fix a typo in an AutoFDO error string
Trivial change, committed to trunk. [PATCH 1/1] Fix a typo in an AutoFDO error string gcc/ChangeLog: * auto-profile.c (read_profile): Fix a typo in an error string. --- gcc/auto-profile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c index 2a6d9a1fc24..a4601243dc9 100644 --- a/gcc/auto-profile.c +++ b/gcc/auto-profile.c @@ -939,7 +939,7 @@ read_profile (void) unsigned version = gcov_read_unsigned (); if (version != AUTO_PROFILE_VERSION) { - error ("AutoFDO profile version %u does match %u", + error ("AutoFDO profile version %u does not match %u", version, AUTO_PROFILE_VERSION); return; } -- 2.25.1
[pushed] c++: speed up looking up the current class
While looking at template instantiation tracing, I noticed that we were frequently looking up a particular class template instance while instantiating it. This patch shortcuts that lookup, and speeds up compiling stdc++.h with my (checking/unoptimized) compiler by about 3%. Tested x86_64-pc-linux-gnu, applying to trunk. gcc/cp/ChangeLog: * pt.c (lookup_template_class_1): Shortcut current_class_type. --- gcc/cp/pt.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 141388ad2e5..d4bb5cc5eaf 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -9833,6 +9833,13 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context, /* From here on, we're only interested in the most general template. */ + /* Shortcut looking up the current class scope again. */ + if (current_class_type) + if (tree ti = CLASSTYPE_TEMPLATE_INFO (current_class_type)) + if (gen_tmpl == most_general_template (TI_TEMPLATE (ti)) + && comp_template_args (arglist, TI_ARGS (ti))) + return current_class_type; + /* Calculate the BOUND_ARGS. These will be the args that are actually tsubst'd into the definition to create the instantiation. */ base-commit: f16f65f8364b5bf23c72a8fdbba4974ecadc5cb6 -- 2.27.0