Re: [match.pd] PR83750 - CSE erf/erfc pair
On Mon, 18 Oct 2021 at 17:23, Richard Biener wrote: > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener wrote: > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener wrote: > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > Hi Richard, > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns > > > > > > to match.pd: > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). > > > > > > > > > > > > This works to remove call to erfc for the following test: > > > > > > double f(double x) > > > > > > { > > > > > > double g(double, double); > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > double t2 = __builtin_erfc (x); > > > > > > return g(t1, t2); > > > > > > } > > > > > > > > > > > > with .optimized dump shows: > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > However, for the following test: > > > > > > double f(double x) > > > > > > { > > > > > > double g(double, double); > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > return t1; > > > > > > } > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 - > > > > > > erf(x) to erfc(x) again > > > > > > post canonicalization. > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied, > > > > > > but then it tries to > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up > > > > > > with erfc(x) transformed to > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal. > > > > > > Could you suggest how to proceed ? > > > > > > > > > > I applied your patch manually and it does the intended > > > > > simplifications so I wonder what I am missing? > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's > > > > no erf(x) in the source ? > > > > > > I do think it's reasonable to expect erfc to be available when erf > > > is and vice versa but note both are C99 specified functions (either > > > requires -lm). > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ? > > Yes, but I'm confused because you say the patch doesn't work for you? The patch works for me to CSE erf/erfc pair. However when there's only erfc in the source, it canonicalizes erfc(x) to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to erfc(x) with -O3 -funsafe-math-optimizations. For, t1 = __builtin_erfc(x), .optimized dump shows: _2 = __builtin_erf (x_1(D)); t1_3 = 1.0e+0 - _2; and for, double t1 = x + __builtin_erfc(x); .optimized dump shows: _3 = __builtin_erf (x_2(D)); _7 = x_2(D) + 1.0e+0; t1_4 = _7 - _3; I assume in both cases, we want erfc in the code-gen instead ? I think the reason uncaonicalization fails is because the pattern 1 - erf(x) to erfc(x) gets applied, but then it fails in resimplifying erfc(x), and we end up with 1 - erf(x) in code-gen. >From gimple-match.c, it hits the simplification: gimple_seq *lseq = seq; if (__builtin_expect (!dbg_cnt (match), 0)) goto next_after_fail1172; if (__builtin_expect (dump_file && (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__); { res_op->set_op (CFN_BUILT_IN_ERFC, type, 1); res_op->ops[0] = captures[0]; res_op->resimplify (lseq, valueize); return true; } But res_op->resimplify returns false, and doesn't end up adding to lseq. As you suggest, should we instead handle this in fre to transform erfc(x) to 1 - erf(x), only when there's a matching erf(x) in the source ? Thanks, Prathamesh > > Btw, please add the testcase from the PR and also a testcase that shows > the canonicalization is undone. Maybe you can also double-check that > we handle x + erfc (x) because I see we associate that as > (x + 1) - erf (x) which is then not recognized back to erfc. > > The less surprising (as to preserve the function called in the source) > variant for the PR would be to teach CSE to lookup erf(x) when > visiting erfc(x) and when found synthesize 1 - erf(x). > > That said, a mathematician should chime in on how important it is > to preserve erfc vs. erf (precision or even speed). > > Thanks, > Richard. > > > Thanks, > > Prathamesh > > > > > > > > Richard. > > > > > > > So for the following test: > > > > double f(double x) > > > > { > > > > t1 = __builtin_erfc(x) > > > > return t1; > > > > } > > > > > > > > .optimized dump shows: > > > > double f (double x) > > > > { > > > > double t
[PATCH] Remove check_aligned parameter from vect_supportable_dr_alignment
There are two calls with true as parameter, one is only relevant for the case of the misalignment being unknown which means the access is never aligned there, the other is in the peeling hash insert code used conditional on the unlimited cost model which adds an artificial count. But the way it works right now is that it boosts the count if the specific misalignment when not peeling is unsupported - in particular when the access is currently aligned we'll query the backend with a misalign value of zero. I've changed it to boost the peeling when unknown alignment is not supported instead and noted how we could in principle improve this. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-19 Richard Biener * tree-vectorizer.h (vect_supportable_dr_alignment): Remove check_aligned argument. * tree-vect-data-refs.c (vect_supportable_dr_alignment): Likewise. (vect_peeling_hash_insert): Add supportable_if_not_aligned argument and do not call vect_supportable_dr_alignment here. (vect_peeling_supportable): Adjust. (vect_enhance_data_refs_alignment): Compute whether the access is supported with different alignment here and pass that down to vect_peeling_hash_insert. (vect_vfa_access_size): Adjust. * tree-vect-stmts.c (vect_get_store_cost): Likewise. (vect_get_load_cost): Likewise. (get_negative_load_store_type): Likewise. (get_group_load_store_type): Likewise. (get_load_store_type): Likewise. --- gcc/tree-vect-data-refs.c | 41 ++- gcc/tree-vect-stmts.c | 12 ++-- gcc/tree-vectorizer.h | 2 +- 3 files changed, 30 insertions(+), 25 deletions(-) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 4c9215874c9..7c95f9ad69e 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -1465,14 +1465,10 @@ peel_info_hasher::equal (const _vect_peel_info *a, const _vect_peel_info *b) static void vect_peeling_hash_insert (hash_table *peeling_htab, loop_vec_info loop_vinfo, dr_vec_info *dr_info, - int npeel) + int npeel, bool supportable_if_not_aligned) { struct _vect_peel_info elem, *slot; _vect_peel_info **new_slot; - tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt); - bool supportable_dr_alignment -= (vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, true) - != dr_unaligned_unsupported); elem.npeel = npeel; slot = peeling_htab->find (&elem); @@ -1488,7 +1484,9 @@ vect_peeling_hash_insert (hash_table *peeling_htab, *new_slot = slot; } - if (!supportable_dr_alignment + /* If this DR is not supported with unknown misalignment then bias + this slot when the cost model is disabled. */ + if (!supportable_if_not_aligned && unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) slot->count += VECT_MAX_COST; } @@ -1661,7 +1659,7 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info, vect_update_misalignment_for_peel (dr_info, dr0_info, npeel); tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt); supportable_dr_alignment - = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, false); + = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype); SET_DR_MISALIGNMENT (dr_info, save_misalignment); if (supportable_dr_alignment == dr_unaligned_unsupported) @@ -1803,7 +1801,6 @@ opt_result vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) { class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - enum dr_alignment_support supportable_dr_alignment; dr_vec_info *first_store = NULL; dr_vec_info *dr0_info = NULL; struct data_reference *dr; @@ -1912,8 +1909,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) stmt_vec_info stmt_info = dr_info->stmt; tree vectype = STMT_VINFO_VECTYPE (stmt_info); - supportable_dr_alignment - = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, true); do_peeling = vector_alignment_reachable_p (dr_info); if (do_peeling) { @@ -1957,11 +1952,20 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) } /* Save info about DR in the hash table. Also include peeling -amounts according to the explanation above. */ +amounts according to the explanation above. Indicate +the alignment status when the ref is not aligned. +??? Rather than using unknown alignment here we should +prune all entries from the peeling hashtable which cause +DRs to be not supported. */ + bool supportable_if_not_aligned + = targetm.vectorize.support_vector_misalignment + (TYPE_MODE (vectype), TREE_TYPE (DR_REF (
Re: [PATCH] tree-object-size: Make unknown a computation
On 10/19/21 12:28, Jakub Jelinek wrote: On Tue, Oct 19, 2021 at 09:47:45AM +0530, Siddhesh Poyarekar wrote: Compute the unknown size value as a function of the min/max bit of object_size_type. This transforms into a neat little branchless sequence on x86_64: movl%edi, %eax sarl%eax xorl$1, %eax negl%eax cltq which should be faster than loading the value from memory. A quick unscientific test using `time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"` But if you use some other higher bit of object_size_type for the mode (normal vs. dynamic) you'd need to mask it away, so it will become longer. Anyway, I guess that part is ok. Yes, my WIP patchset has: ((unsigned HOST_WIDE_INT) -(((object_size_type & MIN_BIT) >> 1) ^ 1)); which results in: xorl%eax, %eax andl$2, %edi sete%al negq%rax -/* Compute object_sizes for PTR, defined to an unknown value. */ - -static void -unknown_object_size (struct object_size_info *osi, tree ptr) -{ - int object_size_type = osi->object_size_type; - unsigned int varno = SSA_NAME_VERSION (ptr); - unsigned HOST_WIDE_INT bytes; - - gcc_assert (object_sizes[object_size_type][varno] - != unknown[object_size_type]); - gcc_assert (osi->pass == 0); - - bytes = unknown[object_size_type]; - - if ((object_size_type & 2) == 0) -{ - if (object_sizes[object_size_type][varno] < bytes) - object_sizes[object_size_type][varno] = bytes; -} - else -{ - if (object_sizes[object_size_type][varno] > bytes) - object_sizes[object_size_type][varno] = bytes; -} -} But I don't think removing this function is desirable. Can it be greatly simplified? Yes, certainly. The assert verifies it is not unknown before, and then for mode 0 or 1 uses maximum which will always be unknown and for mode 2 or 3 minimum which will always be unknown as well. But I'd keep the asserts in there. So it can become int object_size_type = osi->object_size_type; unsigned int varno = SSA_NAME_VERSION (ptr); unsigned HOST_WIDE_INT bytes = unknown (object_size_type); gcc_checking_assert (object_sizes[object_size_type][varno] != bytes); gcc_checking_assert (osi->pass == 0); object_sizes[object_size_type][varno] = bytes; OK, I'll send an updated patch. Thanks, Siddhesh
Re: [match.pd] PR83750 - CSE erf/erfc pair
On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches wrote: > > On Mon, 18 Oct 2021 at 17:23, Richard Biener wrote: > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener wrote: > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener > > > > > wrote: > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > Hi Richard, > > > > > > > As suggested in PR, I have attached WIP patch that adds two > > > > > > > patterns > > > > > > > to match.pd: > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). > > > > > > > > > > > > > > This works to remove call to erfc for the following test: > > > > > > > double f(double x) > > > > > > > { > > > > > > > double g(double, double); > > > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > > double t2 = __builtin_erfc (x); > > > > > > > return g(t1, t2); > > > > > > > } > > > > > > > > > > > > > > with .optimized dump shows: > > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > > > However, for the following test: > > > > > > > double f(double x) > > > > > > > { > > > > > > > double g(double, double); > > > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > > return t1; > > > > > > > } > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 - > > > > > > > erf(x) to erfc(x) again > > > > > > > post canonicalization. > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets > > > > > > > applied, > > > > > > > but then it tries to > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end > > > > > > > up > > > > > > > with erfc(x) transformed to > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal. > > > > > > > Could you suggest how to proceed ? > > > > > > > > > > > > I applied your patch manually and it does the intended > > > > > > simplifications so I wonder what I am missing? > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's > > > > > no erf(x) in the source ? > > > > > > > > I do think it's reasonable to expect erfc to be available when erf > > > > is and vice versa but note both are C99 specified functions (either > > > > requires -lm). > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ? > > > > Yes, but I'm confused because you say the patch doesn't work for you? > The patch works for me to CSE erf/erfc pair. > However when there's only erfc in the source, it canonicalizes erfc(x) > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to > erfc(x) > with -O3 -funsafe-math-optimizations. > > For, > t1 = __builtin_erfc(x), > > .optimized dump shows: > _2 = __builtin_erf (x_1(D)); > t1_3 = 1.0e+0 - _2; > > and for, > double t1 = x + __builtin_erfc(x); > > .optimized dump shows: > _3 = __builtin_erf (x_2(D)); > _7 = x_2(D) + 1.0e+0; > t1_4 = _7 - _3; > > I assume in both cases, we want erfc in the code-gen instead ? > I think the reason uncaonicalization fails is because the pattern 1 - > erf(x) to erfc(x) > gets applied, but then it fails in resimplifying erfc(x), and we end > up with 1 - erf(x) in code-gen. > > From gimple-match.c, it hits the simplification: > > gimple_seq *lseq = seq; > if (__builtin_expect (!dbg_cnt > (match), 0)) goto next_after_fail1172; > if (__builtin_expect (dump_file && > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__); > { > res_op->set_op (CFN_BUILT_IN_ERFC, type, 1); > res_op->ops[0] = captures[0]; > res_op->resimplify (lseq, valueize); > return true; > } > > But res_op->resimplify returns false, and doesn't end up adding to lseq. There's nothing to add to lseq since there's also nothing to resimplify. The only thing that could happen is that the replacement is not done because replace_stmt_with_simplification via maybe_push_res_to_seq doesn't pass the builtin_decl_implicit test: /* Find the function we want to call. */ tree decl = builtin_decl_implicit (as_builtin_fn (fn)); if (!decl) return NULL; btw, it did work for me since the call was present before and gimplification should then mark the function eligible for implicit generation. > As you suggest, should we instead handle this in fre to transform > erfc(x) to 1 - erf(x), only when > there's a matching erf(x) in the source ? Note that's strictly l
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On Tue, Oct 19, 2021 at 8:52 AM Richard Biener wrote: > > On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez wrote: > > > > > > > > On 10/18/21 3:41 PM, Aldy Hernandez wrote: > > > > > I've been experimenting with reducing the total number of threading > > > passes, and I'd like to see if there's consensus/stomach for altering > > > the pipeline. Note, that the goal is to remove forward threader clients, > > > not the other way around. So, we should prefer to remove a VRP threader > > > instance over a *.thread one immediately before VRP. > > > > > > After some playing, it looks like if we enable fully-resolving mode in > > > the *.thread passes immediately preceeding VRP, we can remove the VRP > > > threading passes altogether, thus removing 2 threading passes (and > > > forward threading passes at that!). > > > > It occurs to me that we could also remove the threading before VRP > > passes, and enable a fully-resolving backward threader after VRP. I > > haven't played with this scenario, but it should be just as good. That > > being said, I don't know the intricacies of why we had both pre and post > > VRP threading passes, and if one is ideally better than the other. > > It was done because they were different threaders. Since the new threader > uses built-in VRP it shouldn't really matter whether it's before or after > VRP _for the threading_, but it might be that if threading runs before VRP > then VRP itself can do a better job on cleaning up the IL. Good point. FWIW, earlier this season I played with replacing the VRPs with evrp instances (which fold far more conditionals) and I found that the threaders can actually find LESS opportunities after *vrp fold away things. I don't know if this is a good or a bad thing. Perhaps we should benchmark three alternatives: 1. Mainline 2. Fully resolving threader -> VRP -> No threading. 3. No threading -> VRP -> Full resolving threader. ...and see what the actual effect is, regardless of number of threaded paths. Speak of which, what's the blessed way of benchmarking performance nowadays? I've seen some PRs fly that measure some more lightweight benchmarks (open source?) than a full blown SPEC. > > + /* ?? Is this still needed. ?? */ >/* Threading can leave many const/copy propagations in the IL. > Clean them up. Instead of just copy_prop, we use ccp to > compute alignment and nonzero bits. */ > > Yes, it's still needed but not for the stated reason - the VRP > substitution and folding stage should deal with copy/constant propagation > but we replaced the former copy propagation with CCP to re-compute > nonzero bits & alignment so I'd change the comment to > >/* Run CCP to compute alignment and nonzero bits. */ Ahh.. There's another similar comment after DOM. Is this comment still relevant? NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */); /* Threading can leave many const/copy propagations in the IL. Clean them up. Failure to do so well can lead to false positives from warnings for erroneous code. */ NEXT_PASS (pass_copy_prop); /* Identify paths that should never be executed in a conforming program and isolate those paths. */ Thanks. Aldy
[PATCH v2] tree-object-size: Make unknown a computation
Compute the unknown size value as a function of the min/max bit of object_size_type. This transforms into a neat little branchless sequence on x86_64: movl%edi, %eax sarl%eax xorl$1, %eax negl%eax cltq which should be faster than loading the value from memory. A quick unscientific test using `time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"` shaves about half a second off execution time with this. Also simplify implementation of unknown_object_size. gcc/ChangeLog: * tree-object-size.c (unknown): Make into a function. Adjust all uses. (unknown_object_size): Simplify implementation. Signed-off-by: Siddhesh Poyarekar --- gcc/tree-object-size.c | 100 ++--- 1 file changed, 43 insertions(+), 57 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 46a976dfe10..4334e05ef70 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -45,13 +45,6 @@ struct object_size_info unsigned int *stack, *tos; }; -static const unsigned HOST_WIDE_INT unknown[4] = { - HOST_WIDE_INT_M1U, - HOST_WIDE_INT_M1U, - 0, - 0 -}; - static tree compute_object_offset (const_tree, const_tree); static bool addr_object_size (struct object_size_info *, const_tree, int, unsigned HOST_WIDE_INT *); @@ -82,6 +75,11 @@ static bitmap computed[4]; /* Maximum value of offset we consider to be addition. */ static unsigned HOST_WIDE_INT offset_limit; +static inline unsigned HOST_WIDE_INT +unknown (int object_size_type) +{ + return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1)); +} /* Initialize OFFSET_LIMIT variable. */ static void @@ -204,7 +202,7 @@ decl_init_size (tree decl, bool min) /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR. OBJECT_SIZE_TYPE is the second argument from __builtin_object_size. - If unknown, return unknown[object_size_type]. */ + If unknown, return unknown (object_size_type). */ static bool addr_object_size (struct object_size_info *osi, const_tree ptr, @@ -216,7 +214,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, /* Set to unknown and overwrite just before returning if the size could be determined. */ - *psize = unknown[object_size_type]; + *psize = unknown (object_size_type); pt_var = TREE_OPERAND (ptr, 0); while (handled_component_p (pt_var)) @@ -244,9 +242,9 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, SSA_NAME_VERSION (var))) sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; else - sz = unknown[object_size_type]; + sz = unknown (object_size_type); } - if (sz != unknown[object_size_type]) + if (sz != unknown (object_size_type)) { offset_int mem_offset; if (mem_ref_offset (pt_var).is_constant (&mem_offset)) @@ -257,13 +255,13 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, else if (wi::fits_uhwi_p (dsz)) sz = dsz.to_uhwi (); else - sz = unknown[object_size_type]; + sz = unknown (object_size_type); } else - sz = unknown[object_size_type]; + sz = unknown (object_size_type); } - if (sz != unknown[object_size_type] && sz < offset_limit) + if (sz != unknown (object_size_type) && sz < offset_limit) pt_var_size = size_int (sz); } else if (DECL_P (pt_var)) @@ -445,7 +443,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, /* Compute __builtin_object_size for CALL, which is a GIMPLE_CALL. Handles calls to functions declared with attribute alloc_size. OBJECT_SIZE_TYPE is the second argument from __builtin_object_size. - If unknown, return unknown[object_size_type]. */ + If unknown, return unknown (object_size_type). */ static unsigned HOST_WIDE_INT alloc_object_size (const gcall *call, int object_size_type) @@ -459,7 +457,7 @@ alloc_object_size (const gcall *call, int object_size_type) calltype = gimple_call_fntype (call); if (!calltype) -return unknown[object_size_type]; +return unknown (object_size_type); /* Set to positions of alloc_size arguments. */ int arg1 = -1, arg2 = -1; @@ -479,7 +477,7 @@ alloc_object_size (const gcall *call, int object_size_type) || (arg2 >= 0 && (arg2 >= (int)gimple_call_num_args (call) || TREE_CODE (gimple_call_arg (call, arg2)) != INTEGER_CST))) -return unknown[object_size_type]; +return unknown (object_size_type); tree bytes = NULL_TREE; if (arg2 >= 0) @@ -492,7 +490,7 @@ alloc_object_size (const gcall *call, int object_size_type) if (bytes && tree_fits_uhwi_p (bytes)) return tree_to_uhwi (bytes); - return unknown[object_size_type]; + return unknown (object_size_type
Re: [PATCH v2] tree-object-size: Make unknown a computation
On Tue, Oct 19, 2021 at 01:08:30PM +0530, Siddhesh Poyarekar wrote: > Compute the unknown size value as a function of the min/max bit of > object_size_type. This transforms into a neat little branchless > sequence on x86_64: > > movl%edi, %eax > sarl%eax > xorl$1, %eax > negl%eax > cltq > > which should be faster than loading the value from memory. A quick > unscientific test using > > `time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"` > > shaves about half a second off execution time with this. Also simplify > implementation of unknown_object_size. > > gcc/ChangeLog: > > * tree-object-size.c (unknown): Make into a function. Adjust > all uses. > (unknown_object_size): Simplify implementation. > > Signed-off-by: Siddhesh Poyarekar Ok for trunk if it passes bootstrap/regtest. Jakub
Re: [SVE] Adjust PR93183 test-case to compile with -march=armv8.3-a+sve
Prathamesh Kulkarni writes: > Hi, > The attached patch removes "-mcpu=generic+sve" from dg-options, > because it conflicts > with -march=armv8.3-a+sve, and resulted in: > > cc1: warning: switch '-mcpu=generic+sve' conflicts with > '-march=armv8.3-a+sve' switch^M > FAIL: gcc.target/aarch64/sve/pr93183.c (test for excess errors) > Excess errors: > cc1: warning: switch '-mcpu=generic+sve' conflicts with > '-march=armv8.3-a+sve' switch > > Is the patch OK to commit ? Yes, thanks. Richard > Thanks, > Prathamesh > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > index 2f92224cecb..8d1ee418baf 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -mcpu=generic+sve" } */ > +/* { dg-options "-O3" } */ > > typedef unsigned char uint8_t; >
[PATCH] Add misalignment output parameter to get_load_store_type
This makes us compute the misalignment alongside the alignment support scheme in get_load_store_type, removing some out-of-place calls to the DR alignment API. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-18 Richard Biener * tree-vect-stmts.c (get_group_load_store_type): Add misalignment output parameter and initialize it. (get_group_load_store_type): Likewise. (vectorizable_store): Remove now redundant queries. (vectorizable_load): Likewise. --- gcc/tree-vect-stmts.c | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 5d8fe42e200..c3690769d8f 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -2109,6 +2109,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, bool masked_p, vec_load_store_type vls_type, vect_memory_access_type *memory_access_type, dr_alignment_support *alignment_support_scheme, + int *misalignment, gather_scatter_info *gs_info) { loop_vec_info loop_vinfo = dyn_cast (vinfo); @@ -2294,10 +2295,16 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, if (*memory_access_type == VMAT_GATHER_SCATTER || *memory_access_type == VMAT_ELEMENTWISE) -*alignment_support_scheme = dr_unaligned_supported; +{ + *alignment_support_scheme = dr_unaligned_supported; + *misalignment = DR_MISALIGNMENT_UNKNOWN; +} else -*alignment_support_scheme - = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype); +{ + *alignment_support_scheme + = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype); + *misalignment = dr_misalignment (first_dr_info, vectype); +} if (vls_type != VLS_LOAD && first_stmt_info == stmt_info) { @@ -2337,7 +2344,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, storing it in *MEMORY_ACCESS_TYPE if so. If we decide to use gathers or scatters, fill in GS_INFO accordingly. In addition *ALIGNMENT_SUPPORT_SCHEME is filled out and false is returned if - the target does not support the alignment scheme. + the target does not support the alignment scheme. *MISALIGNMENT + is set according to the alignment of the access (including + DR_MISALIGNMENT_UNKNOWN when it is unknown). SLP says whether we're performing SLP rather than loop vectorization. MASKED_P is true if the statement is conditional on a vectorized mask. @@ -2351,10 +2360,12 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int ncopies, vect_memory_access_type *memory_access_type, dr_alignment_support *alignment_support_scheme, +int *misalignment, gather_scatter_info *gs_info) { loop_vec_info loop_vinfo = dyn_cast (vinfo); poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); + *misalignment = DR_MISALIGNMENT_UNKNOWN; if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) { *memory_access_type = VMAT_GATHER_SCATTER; @@ -2402,7 +2413,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, if (!get_group_load_store_type (vinfo, stmt_info, vectype, slp_node, masked_p, vls_type, memory_access_type, - alignment_support_scheme, gs_info)) + alignment_support_scheme, + misalignment, gs_info)) return false; } else if (STMT_VINFO_STRIDED_P (stmt_info)) @@ -2439,6 +2451,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, = vect_supportable_dr_alignment (vinfo, STMT_VINFO_DR_INFO (stmt_info), vectype); + *misalignment = dr_misalignment (STMT_VINFO_DR_INFO (stmt_info), + vectype); } } @@ -7337,9 +7351,10 @@ vectorizable_store (vec_info *vinfo, vect_memory_access_type memory_access_type; enum dr_alignment_support alignment_support_scheme; + int misalignment; if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, vls_type, ncopies, &memory_access_type, - &alignment_support_scheme, &gs_info)) + &alignment_support_scheme, &misalignment, &gs_info)) return false; if (mask) @@ -8218,14 +8233,13 @@ vectorizable_store (vec_info *vinfo, gcc_assert (aligned_access_p (first_dr_info, vectype)); misalign = 0; } - else if (dr_misalignment
Re: [SVE] Adjust PR93183 test-case to compile with -march=armv8.3-a+sve
On Tue, 19 Oct 2021 at 13:32, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > Hi, > > The attached patch removes "-mcpu=generic+sve" from dg-options, > > because it conflicts > > with -march=armv8.3-a+sve, and resulted in: > > > > cc1: warning: switch '-mcpu=generic+sve' conflicts with > > '-march=armv8.3-a+sve' switch^M > > FAIL: gcc.target/aarch64/sve/pr93183.c (test for excess errors) > > Excess errors: > > cc1: warning: switch '-mcpu=generic+sve' conflicts with > > '-march=armv8.3-a+sve' switch > > > > Is the patch OK to commit ? > > Yes, thanks. Thanks, committed as 6b4c18b98127087d7f14062b81bc678f0589cd36. Thanks, Prathamesh > > Richard > > > Thanks, > > Prathamesh > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > > b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > > index 2f92224cecb..8d1ee418baf 100644 > > --- a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O3 -mcpu=generic+sve" } */ > > +/* { dg-options "-O3" } */ > > > > typedef unsigned char uint8_t; > >
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On Tue, Oct 19, 2021 at 9:33 AM Aldy Hernandez wrote: > > On Tue, Oct 19, 2021 at 8:52 AM Richard Biener > wrote: > > > > On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez wrote: > > > > > > > > > > > > On 10/18/21 3:41 PM, Aldy Hernandez wrote: > > > > > > > I've been experimenting with reducing the total number of threading > > > > passes, and I'd like to see if there's consensus/stomach for altering > > > > the pipeline. Note, that the goal is to remove forward threader > > > > clients, > > > > not the other way around. So, we should prefer to remove a VRP threader > > > > instance over a *.thread one immediately before VRP. > > > > > > > > After some playing, it looks like if we enable fully-resolving mode in > > > > the *.thread passes immediately preceeding VRP, we can remove the VRP > > > > threading passes altogether, thus removing 2 threading passes (and > > > > forward threading passes at that!). > > > > > > It occurs to me that we could also remove the threading before VRP > > > passes, and enable a fully-resolving backward threader after VRP. I > > > haven't played with this scenario, but it should be just as good. That > > > being said, I don't know the intricacies of why we had both pre and post > > > VRP threading passes, and if one is ideally better than the other. > > > > It was done because they were different threaders. Since the new threader > > uses built-in VRP it shouldn't really matter whether it's before or after > > VRP _for the threading_, but it might be that if threading runs before VRP > > then VRP itself can do a better job on cleaning up the IL. > > Good point. > > FWIW, earlier this season I played with replacing the VRPs with evrp > instances (which fold far more conditionals) and I found that the > threaders can actually find LESS opportunities after *vrp fold away > things. I don't know if this is a good or a bad thing. Probably a sign that either threading theads stuff that's pointless (does not consider conditions on the path that always evaluate false?) or that after VRP and removing redundant conditions blocks are now bigger and we run into some --param limit (that would suggest the limits behave odd if it would thread when we'd artificially split blocks). Examples might be interesting to look at to understand what's going "wrong". > Perhaps we > should benchmark three alternatives: > > 1. Mainline > 2. Fully resolving threader -> VRP -> No threading. > 3. No threading -> VRP -> Full resolving threader. > > ...and see what the actual effect is, regardless of number of threaded paths. As said, only 2. makes "sense" to me unless examples show why we really have the usual pass ordering issue. As said, I think threading exposes new VRP (esp. constant/copy prop) opportunities but VRP shouldn't expose new threading opportunities. > Speak of which, what's the blessed way of benchmarking performance > nowadays? I've seen some PRs fly that measure some more lightweight > benchmarks (open source?) than a full blown SPEC. The answer is SPEC. The other answer would be GCC bootstrap time and resulting code size. > > > > + /* ?? Is this still needed. ?? */ > >/* Threading can leave many const/copy propagations in the IL. > > Clean them up. Instead of just copy_prop, we use ccp to > > compute alignment and nonzero bits. */ > > > > Yes, it's still needed but not for the stated reason - the VRP > > substitution and folding stage should deal with copy/constant propagation > > but we replaced the former copy propagation with CCP to re-compute > > nonzero bits & alignment so I'd change the comment to > > > >/* Run CCP to compute alignment and nonzero bits. */ > > Ahh.. > > There's another similar comment after DOM. Is this comment still relevant? Yes, since DOM still does threading the threaded paths lack const/copy propagation. > NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */); > /* Threading can leave many const/copy propagations in the IL. > Clean them up. Failure to do so well can lead to false > positives from warnings for erroneous code. */ > NEXT_PASS (pass_copy_prop); > /* Identify paths that should never be executed in a conforming > program and isolate those paths. */ > > Thanks. > Aldy >
[COMMITTED] Change threading comment before pass_ccp pass.
As suggested. Thanks. gcc/ChangeLog: * passes.def: Change threading comment before pass_ccp pass. --- gcc/passes.def | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index c11c237f6d2..4c54176328b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -339,9 +339,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_thread_jumps); NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */); NEXT_PASS (pass_vrp_threader); - /* Threading can leave many const/copy propagations in the IL. -Clean them up. Instead of just copy_prop, we use ccp to -compute alignment and nonzero bits. */ + /* Run CCP to compute alignment and nonzero bits. */ NEXT_PASS (pass_ccp, true /* nonzero_p */); NEXT_PASS (pass_warn_restrict); NEXT_PASS (pass_dse); -- 2.31.1
Re: [PATCH] rs6000: Remove unnecessary option manipulation.
On 10/15/21 17:24, Martin Liška wrote: P.S. Next time, please CC me. All right, I tested the updated patch and it works fine on ppc64le-linux-gnu. May I install it? Thanks, Martin
Re: PATCH, rs6000] Optimization for vec_xl_sext
Committed as r12-4494. Thanks to all of you. Gui Haochen On 15/10/2021 上午 2:53, David Edelsohn wrote: On Thu, Oct 14, 2021 at 2:17 AM HAO CHEN GUI wrote: Hi, The patch optimizes the code generation for vec_xl_sext builtin. Now all the sign extensions are done on VSX registers directly. Bootstrapped and tested on powerpc64le-linux with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. I refined the patch according to Bill and David's advice. I put the patch.diff and ChangeLog in attachment also in case the indentation doesn't show correctly in email body. ChangeLog 2021-10-11 Haochen Gui gcc/ * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin): Modify the expansion for sign extension. All extensions are done within VSX registers. gcc/testsuite/ * gcc.target/powerpc/p10_vec_xl_sext.c: New test. This is okay. Thanks, David
[PATCH] Adjust testcase for O2 vectorization.
updated patch: 1. Add documents in doc/sourcebuild.texi (Effective-Target Keywords). 2. Reduce -novec.c testcases to contain only new failed parted which is caused by O2 vectorization. 3. Add PR in dg-warning comment. As discussed in [1], this patch add xfail/target selector to those testcases, also make a copy of them so that they can be tested w/o vectorization. Newly added xfail/target selectors are used to check the vectorization capability of continuous byte/double bytes storage, these scenarios are exactly the part of the testcases that regressed after O2 vectorization. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581456.html. gcc/ChangeLog * doc/sourcebuild.texi (Effective-Target Keywords): Document vect_slp_v2qi_store, vect_slp_v4qi_store, vect_slp_v8qi_store, vect_slp_v16qi_store, vect_slp_v2hi_store, vect_slp_v4hi_store, vect_slp_v2si_store, vect_slp_v4si_store. gcc/testsuite/ChangeLog PR middle-end/102722 PR middle-end/102697 PR middle-end/102462 PR middle-end/102706 PR middle-end/102744 * c-c++-common/Wstringop-overflow-2.c: Adjust testcase with new xfail/target selector. * gcc.dg/Warray-bounds-51.c: Ditto. * gcc.dg/Warray-parameter-3.c: Ditto. * gcc.dg/Wstringop-overflow-14.c: Ditto. * gcc.dg/Wstringop-overflow-21.c: Ditto. * gcc.dg/Wstringop-overflow-68.c: Ditto. * gcc.dg/Wstringop-overflow-76.c: Ditto. * gcc.dg/Warray-bounds-48.c: Ditto. * gcc.dg/Wzero-length-array-bounds-2.c: Ditto. * lib/target-supports.exp (check_vect_slp_aligned_store_usage): New function. (check_effective_target_vect_slp_v2qi_store): Ditto. (check_effective_target_vect_slp_v4qi_store): Ditto. (check_effective_target_vect_slp_v8qi_store): Ditto. (check_effective_target_vect_slp_v16qi_store): Ditto. (check_effective_target_vect_slp_v2hi_store): Ditto. (check_effective_target_vect_slp_v4hi_store): Ditto. (check_effective_target_vect_slp_v2si_store): Ditto. (check_effective_target_vect_slp_v4si_store): Ditto. * c-c++-common/Wstringop-overflow-2-novec.c: New test. * gcc.dg/Warray-bounds-51-novec.c: New test. * gcc.dg/Warray-bounds-48-novec.c: New test. * gcc.dg/Warray-parameter-3-novec.c: New test. * gcc.dg/Wstringop-overflow-14-novec.c: New test. * gcc.dg/Wstringop-overflow-21-novec.c: New test. * gcc.dg/Wstringop-overflow-76-novec.c: New test. * gcc.dg/Wzero-length-array-bounds-2-novec.c: New test. --- gcc/doc/sourcebuild.texi | 32 ++ .../c-c++-common/Wstringop-overflow-2-novec.c | 126 ++ .../c-c++-common/Wstringop-overflow-2.c | 20 +- gcc/testsuite/gcc.dg/Warray-bounds-48-novec.c | 364 ++ gcc/testsuite/gcc.dg/Warray-bounds-48.c | 4 +- gcc/testsuite/gcc.dg/Warray-bounds-51-novec.c | 21 + gcc/testsuite/gcc.dg/Warray-bounds-51.c | 2 +- .../gcc.dg/Warray-parameter-3-novec.c | 16 + gcc/testsuite/gcc.dg/Warray-parameter-3.c | 2 +- .../gcc.dg/Wstringop-overflow-14-novec.c | 16 + gcc/testsuite/gcc.dg/Wstringop-overflow-14.c | 4 +- .../gcc.dg/Wstringop-overflow-21-novec.c | 34 ++ gcc/testsuite/gcc.dg/Wstringop-overflow-21.c | 8 +- gcc/testsuite/gcc.dg/Wstringop-overflow-68.c | 17 +- .../gcc.dg/Wstringop-overflow-76-novec.c | 88 + gcc/testsuite/gcc.dg/Wstringop-overflow-76.c | 18 +- .../Wzero-length-array-bounds-2-novec.c | 45 +++ .../gcc.dg/Wzero-length-array-bounds-2.c | 2 +- gcc/testsuite/lib/target-supports.exp | 182 + 19 files changed, 967 insertions(+), 34 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/Wstringop-overflow-2-novec.c create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-48-novec.c create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-51-novec.c create mode 100644 gcc/testsuite/gcc.dg/Warray-parameter-3-novec.c create mode 100644 gcc/testsuite/gcc.dg/Wstringop-overflow-14-novec.c create mode 100644 gcc/testsuite/gcc.dg/Wstringop-overflow-21-novec.c create mode 100644 gcc/testsuite/gcc.dg/Wstringop-overflow-76-novec.c create mode 100644 gcc/testsuite/gcc.dg/Wzero-length-array-bounds-2-novec.c diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index b1fffd5e90f..6a165767630 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1845,6 +1845,38 @@ Target supports loop vectorization with partial vectors and @item vect_partial_vectors Target supports loop vectorization with partial vectors and @code{vect-partial-vector-usage} is nonzero. + +@item vect_slp_v2qi_store +Target supports vectorization of 2-byte char stores with 2-byte aligned +address at plain @option{-O2}. + +@item vect_slp_v4qi_store +Target supports vectorization of 4-byte char stores with 4-byte aligned +address at plai
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On 10/19/21 10:40 AM, Richard Biener wrote: On Tue, Oct 19, 2021 at 9:33 AM Aldy Hernandez wrote: On Tue, Oct 19, 2021 at 8:52 AM Richard Biener wrote: On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez wrote: On 10/18/21 3:41 PM, Aldy Hernandez wrote: I've been experimenting with reducing the total number of threading passes, and I'd like to see if there's consensus/stomach for altering the pipeline. Note, that the goal is to remove forward threader clients, not the other way around. So, we should prefer to remove a VRP threader instance over a *.thread one immediately before VRP. After some playing, it looks like if we enable fully-resolving mode in the *.thread passes immediately preceeding VRP, we can remove the VRP threading passes altogether, thus removing 2 threading passes (and forward threading passes at that!). It occurs to me that we could also remove the threading before VRP passes, and enable a fully-resolving backward threader after VRP. I haven't played with this scenario, but it should be just as good. That being said, I don't know the intricacies of why we had both pre and post VRP threading passes, and if one is ideally better than the other. It was done because they were different threaders. Since the new threader uses built-in VRP it shouldn't really matter whether it's before or after VRP _for the threading_, but it might be that if threading runs before VRP then VRP itself can do a better job on cleaning up the IL. Good point. FWIW, earlier this season I played with replacing the VRPs with evrp instances (which fold far more conditionals) and I found that the threaders can actually find LESS opportunities after *vrp fold away things. I don't know if this is a good or a bad thing. Probably a sign that either threading theads stuff that's pointless (does not consider conditions on the path that always evaluate false?) At least in the backward threader, we don't keep looking back if we can resolve the conditional at the end of an in-progress path, so it's certainly possible we thread paths that are unreachable. I'm pretty sure that's also possible in the forward threader. For example, we if we have a candidate path that ends in x > 1234 and we know on entry to the path that x is [2000,3000], there's no need to chase further back to see if the path itself is reachable. or that after VRP and removing redundant conditions blocks are now bigger and we run into some --param limit (that would suggest the limits behave odd if it would thread when we'd artificially split blocks). Examples might be interesting to look at to understand what's going "wrong". Perhaps we should benchmark three alternatives: 1. Mainline 2. Fully resolving threader -> VRP -> No threading. 3. No threading -> VRP -> Full resolving threader. ...and see what the actual effect is, regardless of number of threaded paths. As said, only 2. makes "sense" to me unless examples show why we really have the usual pass ordering issue. As said, I think threading exposes new VRP (esp. constant/copy prop) opportunities but VRP shouldn't expose new threading opportunities. Excellent! This saves me time, as I've mostly played with option #2 ;-). Thanks. Aldy
Re: [PATCH] options: Fix variable tracking option processing.
On Fri, Oct 15, 2021 at 5:22 PM Martin Liška wrote: > > All right, and there's second part that moves the code > from toplev.c to opts.c (finish_options) as I've done in the original version. > > The patch also handles PR102766 where nvptx.c target sets: > debug_nonbind_markers_p = 0; > > So the easiest approach is marking the flag as set in global_options_set, > I haven't found a better approach :/ Reason is that nvptx_option_override > is called before finish_options. So currently nvptx_option_override is called before we do this code blob (it's called at the beginning of process_options). Why's the solution not to move this setting to finish_options as well? (and disabling it along var-tracking when we end with no -g in process_options) IMHO the target should have the last say, so the hook should be invoked after we are finished overriding stuff and finish_options should be _only_ doing diagnostics and disabling stuff we cannot handle. Richard. > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? > Thanks, > Martin
[PATCH][aarch64] target: Support whitespaces in target attr/pragma.
Hello. The patch does the same as g:df592811f950301ed3b10a08e476dad0f2eff26a for aarch64. Tested locally with cross compiler. Ready for master? Thanks, Martin PR target/102375 gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_process_one_target_attr): Strip whitespaces. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr102375.c: New test. --- gcc/config/aarch64/aarch64.c| 1 + gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 2 files changed, 5 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 76d99d247ae..4c3e491ab14 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -17624,6 +17624,7 @@ aarch64_process_one_target_attr (char *arg_str) bool invert = false; size_t len = strlen (arg_str); + arg_str = strip_whitespaces (arg_str, &len); if (len == 0) { diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c b/gcc/testsuite/gcc.target/aarch64/pr102375.c new file mode 100644 index 000..fa75d319b2d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c @@ -0,0 +1,4 @@ +/* PR target/102375 */ +/* { dg-do compile } */ + +void calculate(void) __attribute__ ((target ("+sve, +sve2"))); -- 2.33.1
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On Tue, Oct 19, 2021 at 11:06 AM Aldy Hernandez wrote: > > > > On 10/19/21 10:40 AM, Richard Biener wrote: > > On Tue, Oct 19, 2021 at 9:33 AM Aldy Hernandez wrote: > >> > >> On Tue, Oct 19, 2021 at 8:52 AM Richard Biener > >> wrote: > >>> > >>> On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez wrote: > > > > On 10/18/21 3:41 PM, Aldy Hernandez wrote: > > > I've been experimenting with reducing the total number of threading > > passes, and I'd like to see if there's consensus/stomach for altering > > the pipeline. Note, that the goal is to remove forward threader > > clients, > > not the other way around. So, we should prefer to remove a VRP threader > > instance over a *.thread one immediately before VRP. > > > > After some playing, it looks like if we enable fully-resolving mode in > > the *.thread passes immediately preceeding VRP, we can remove the VRP > > threading passes altogether, thus removing 2 threading passes (and > > forward threading passes at that!). > > It occurs to me that we could also remove the threading before VRP > passes, and enable a fully-resolving backward threader after VRP. I > haven't played with this scenario, but it should be just as good. That > being said, I don't know the intricacies of why we had both pre and post > VRP threading passes, and if one is ideally better than the other. > >>> > >>> It was done because they were different threaders. Since the new threader > >>> uses built-in VRP it shouldn't really matter whether it's before or after > >>> VRP _for the threading_, but it might be that if threading runs before VRP > >>> then VRP itself can do a better job on cleaning up the IL. > >> > >> Good point. > >> > >> FWIW, earlier this season I played with replacing the VRPs with evrp > >> instances (which fold far more conditionals) and I found that the > >> threaders can actually find LESS opportunities after *vrp fold away > >> things. I don't know if this is a good or a bad thing. > > > > Probably a sign that either threading theads stuff that's pointless > > (does not consider conditions on the path that always evaluate false?) > > At least in the backward threader, we don't keep looking back if we can > resolve the conditional at the end of an in-progress path, so it's > certainly possible we thread paths that are unreachable. I'm pretty > sure that's also possible in the forward threader. > > For example, we if we have a candidate path that ends in x > 1234 and we > know on entry to the path that x is [2000,3000], there's no need to > chase further back to see if the path itself is reachable. For that matter, when I was working on replacing the DOM threader, I found out that the forward threader + evrp routinely tried to thread paths that were unreachable, and I had to trim them from my comparison tally. The new backward threader engine suffers less from this, because if there is an UNDEFINED range as part of the in-path calculation, we can trim the path as unreachable (and avoid further searches in that direction). However, as I said, if the range is known on entry, we do no further lookups and happily thread away. Aldy
Re: [PATCH] options: Fix variable tracking option processing.
On 10/19/21 11:12, Richard Biener wrote: On Fri, Oct 15, 2021 at 5:22 PM Martin Liška wrote: All right, and there's second part that moves the code from toplev.c to opts.c (finish_options) as I've done in the original version. The patch also handles PR102766 where nvptx.c target sets: debug_nonbind_markers_p = 0; So the easiest approach is marking the flag as set in global_options_set, I haven't found a better approach :/ Reason is that nvptx_option_override is called before finish_options. So currently nvptx_option_override is called before we do this code blob (it's called at the beginning of process_options). Yes, happens early in process_options. Why's the solution not to move this setting to finish_options as well? I would like to, but the option detection depends on target hooks and some debuginfo functionality, leading to: g++ -no-pie -g -DIN_GCC -fPIC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc -o Tlto-wrapper \ lto-wrapper.o collect-utils.o ggc-none.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libbacktrace/.libs/libbacktrace.a ../libiberty/pic/libiberty.a ../libdecnumber/libdecnumber.a /home/marxin/Programming/gcc/gcc/opts.c:1382: error: undefined reference to 'dwarf2out_default_as_loc_support()' /home/marxin/Programming/gcc/gcc/opts.c:1384: error: undefined reference to 'dwarf2out_default_as_locview_support()' /home/marxin/Programming/gcc/gcc/opts.c:1409: error: undefined reference to 'targetm' Or can we do better? (and disabling it along var-tracking when we end with no -g in process_options) IMHO the target should have the last say, so the hook should be invoked after we are finished overriding stuff and finish_options should be _only_ doing diagnostics and disabling stuff we cannot handle. Well, that sounds good, but we are quite far from that :/ Martin Richard. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin
[PATCH] tree-optimization/102827 - avoid stmts in preheader
The PR shows that when carefully crafting the runtime alias condition in the vectorizer we might end up using defs from the loop preheader but will end up inserting the condition before the .LOOP_VECTORIZED call. So the following makes sure to insert invariants before that when we versioned the loop, preserving the invariant the vectorizer relies on. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-19 Richard Biener PR tree-optimization/102827 * tree-if-conv.c (predicate_statements): Add pe parameter and use that edge to insert invariant stmts on. (combine_blocks): Pass through pe. (tree_if_conversion): Compute the edge to insert invariant stmts on and pass it along. * gcc.dg/pr102827.c: New testcase. --- gcc/testsuite/gcc.dg/pr102827.c | 13 + gcc/tree-if-conv.c | 21 +++-- 2 files changed, 28 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr102827.c diff --git a/gcc/testsuite/gcc.dg/pr102827.c b/gcc/testsuite/gcc.dg/pr102827.c new file mode 100644 index 000..eed3eba32d1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr102827.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize --param ssa-name-def-chain-limit=0" } */ +/* { dg-additional-options "-mavx" { target { x86_64-*-* i?86-*-* } } } */ + +void +test_double_double_nugt_var (double *dest, double *src, int b, int i) +{ + while (i < 1) +{ + dest[i] = b ? src[i] : 0.0; + ++i; +} +} diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 15dcc1e2b94..b165dc0c17f 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -2490,7 +2490,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond, */ static void -predicate_statements (loop_p loop) +predicate_statements (loop_p loop, edge pe) { unsigned int i, orig_loop_num_nodes = loop->num_nodes; auto_vec vect_sizes; @@ -2596,8 +2596,7 @@ predicate_statements (loop_p loop) if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2)) && expr_invariant_in_loop_p (loop, gimple_assign_rhs1 (stmt2))) - gsi_insert_on_edge_immediate (loop_preheader_edge (loop), - stmt2); + gsi_insert_on_edge_immediate (pe, stmt2); else if (first) { gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT); @@ -2679,7 +2678,7 @@ remove_conditions_and_labels (loop_p loop) blocks. Replace PHI nodes with conditional modify expressions. */ static void -combine_blocks (class loop *loop) +combine_blocks (class loop *loop, edge pe) { basic_block bb, exit_bb, merge_target_bb; unsigned int orig_loop_num_nodes = loop->num_nodes; @@ -2692,7 +2691,7 @@ combine_blocks (class loop *loop) predicate_all_scalar_phis (loop); if (need_to_predicate || need_to_rewrite_undefined) -predicate_statements (loop); +predicate_statements (loop, pe); /* Merge basic blocks. */ exit_bb = NULL; @@ -3187,6 +3186,7 @@ tree_if_conversion (class loop *loop, vec *preds) bool aggressive_if_conv; class loop *rloop; bitmap exit_bbs; + edge pe; again: rloop = NULL; @@ -3218,6 +3218,9 @@ tree_if_conversion (class loop *loop, vec *preds) || loop->dont_vectorize)) goto cleanup; + /* The edge to insert invariant stmts on. */ + pe = loop_preheader_edge (loop); + /* Since we have no cost model, always version loops unless the user specified -ftree-loop-if-convert or unless versioning is required. Either version this loop, or if the pattern is right for outer-loop @@ -3255,12 +3258,18 @@ tree_if_conversion (class loop *loop, vec *preds) gcc_assert (nloop->inner && nloop->inner->next == NULL); rloop = nloop->inner; } + else + /* If we versioned loop then make sure to insert invariant + stmts before the .LOOP_VECTORIZED check since the vectorizer + will re-use that for things like runtime alias versioning + whose condition can end up using those invariants. */ + pe = single_pred_edge (gimple_bb (preds->last ())); } /* Now all statements are if-convertible. Combine all the basic blocks into one huge basic block doing the if-conversion on-the-fly. */ - combine_blocks (loop); + combine_blocks (loop, pe); /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. CSE only the loop body, not the entry -- 2.31.1
[PATCH] Compute negative offset in get_load_store_type
This moves the computation of a negative offset that needs to be applied when we vectorize a negative stride access to get_load_store_type alongside where we compute the actual access method. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-19 Richard Biener * tree-vect-stmts.c (get_negative_load_store_type): Add offset output parameter and initialize it. (get_group_load_store_type): Likewise. (get_load_store_type): Likewise. (vectorizable_store): Use offset as computed by get_load_store_type. (vectorizable_load): Likewise. --- gcc/tree-vect-stmts.c | 33 + 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index c3690769d8f..09a97b44c91 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -1967,13 +1967,14 @@ perm_mask_for_reverse (tree vectype) /* A subroutine of get_load_store_type, with a subset of the same arguments. Handle the case where STMT_INFO is a load or store that - accesses consecutive elements with a negative step. */ + accesses consecutive elements with a negative step. Sets *POFFSET + to the offset to be applied to the DR for the first access. */ static vect_memory_access_type get_negative_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, tree vectype, vec_load_store_type vls_type, - unsigned int ncopies) + unsigned int ncopies, poly_int64 *poffset) { dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info); dr_alignment_support alignment_support_scheme; @@ -2003,6 +2004,7 @@ get_negative_load_store_type (vec_info *vinfo, dump_printf_loc (MSG_NOTE, vect_location, "negative step with invariant source;" " no permute needed.\n"); + *poffset = -TYPE_VECTOR_SUBPARTS (vectype) + 1; return VMAT_CONTIGUOUS_DOWN; } @@ -2014,6 +2016,7 @@ get_negative_load_store_type (vec_info *vinfo, return VMAT_ELEMENTWISE; } + *poffset = -TYPE_VECTOR_SUBPARTS (vectype) + 1; return VMAT_CONTIGUOUS_REVERSE; } @@ -2108,6 +2111,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, tree vectype, slp_tree slp_node, bool masked_p, vec_load_store_type vls_type, vect_memory_access_type *memory_access_type, + poly_int64 *poffset, dr_alignment_support *alignment_support_scheme, int *misalignment, gather_scatter_info *gs_info) @@ -2210,7 +2214,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, /* ??? The VMAT_CONTIGUOUS_REVERSE code generation is only correct for single element "interleaving" SLP. */ *memory_access_type = get_negative_load_store_type - (vinfo, stmt_info, vectype, vls_type, 1); +(vinfo, stmt_info, vectype, vls_type, 1, poffset); else { /* Try to use consecutive accesses of DR_GROUP_SIZE elements, @@ -2359,6 +2363,7 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, bool masked_p, vec_load_store_type vls_type, unsigned int ncopies, vect_memory_access_type *memory_access_type, +poly_int64 *poffset, dr_alignment_support *alignment_support_scheme, int *misalignment, gather_scatter_info *gs_info) @@ -2366,6 +2371,7 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, loop_vec_info loop_vinfo = dyn_cast (vinfo); poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); *misalignment = DR_MISALIGNMENT_UNKNOWN; + *poffset = 0; if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) { *memory_access_type = VMAT_GATHER_SCATTER; @@ -2412,7 +2418,7 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, { if (!get_group_load_store_type (vinfo, stmt_info, vectype, slp_node, masked_p, - vls_type, memory_access_type, + vls_type, memory_access_type, poffset, alignment_support_scheme, misalignment, gs_info)) return false; @@ -2444,7 +2450,7 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, { if (cmp < 0) *memory_access_type = get_negative_load_store_type - (vinfo, stmt_info, vectype, vls_type, ncopies); + (vinfo, stmt_info, vectype, vls_ty
[committed] libstdc++: Fix mem-initializer in std::move_only_function [PR102825]
libstdc++-v3/ChangeLog: PR libstdc++/102825 * include/bits/mofunc_impl.h (move_only_function): Remove invalid base initializer. * testsuite/20_util/move_only_function/cons.cc: Instantiate constructors to check bodies. Tested powerpc64le-linux. Committed to trunk. commit 9890b12c72c02828c691f22198c3e0afd8678991 Author: Jonathan Wakely Date: Tue Oct 19 09:16:56 2021 libstdc++: Fix mem-initializer in std::move_only_function [PR102825] libstdc++-v3/ChangeLog: PR libstdc++/102825 * include/bits/mofunc_impl.h (move_only_function): Remove invalid base initializer. * testsuite/20_util/move_only_function/cons.cc: Instantiate constructors to check bodies. diff --git a/libstdc++-v3/include/bits/mofunc_impl.h b/libstdc++-v3/include/bits/mofunc_impl.h index 543c6f547b7..968d235f867 100644 --- a/libstdc++-v3/include/bits/mofunc_impl.h +++ b/libstdc++-v3/include/bits/mofunc_impl.h @@ -108,7 +108,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION move_only_function(in_place_type_t<_Tp>, initializer_list<_Up> __il, _Args&&... __args) noexcept(_S_nothrow_init<_Tp, initializer_list<_Up>&, _Args...>()) - : _Mofunc_base(nullptr), _M_invoke(&_S_invoke<_Tp>) + : _M_invoke(&_S_invoke<_Tp>) { static_assert(is_same_v, _Tp>); _M_init<_Tp>(__il, std::forward<_Args>(__args)...); diff --git a/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc b/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc index 0992f107003..d8a0a4ab2b0 100644 --- a/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc +++ b/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc @@ -96,3 +96,30 @@ static_assert( ! is_nothrow_constructible_v, in_place_type_t, int> ); static_assert( is_nothrow_constructible_v, in_place_type_t, int, int> ); + +struct I { + I(int, const char*); + I(std::initializer_list); + int operator()() const noexcept; +}; + +static_assert( is_constructible_v, + std::in_place_type_t, + int, const char*> ); +static_assert( is_constructible_v, + std::in_place_type_t, + std::initializer_list> ); + +void +test_instantiation() +{ + // Instantiate the constructor bodies + move_only_function f0; + move_only_function f1(nullptr); + move_only_function f2( I(1, "two") ); + move_only_function f3(std::in_place_type, 3, "four"); + move_only_function f4(std::in_place_type, // PR libstdc++/102825 + { 'P', 'R', '1', '0', '2', '8', '2', '5'}); + auto f5 = std::move(f4); + f4 = std::move(f5); +}
Re: [PATCH][aarch64] target: Support whitespaces in target attr/pragma.
Martin Liška writes: > Hello. > > The patch does the same as g:df592811f950301ed3b10a08e476dad0f2eff26a for > aarch64. > > Tested locally with cross compiler. > > Ready for master? > > Thanks, > Martin > > PR target/102375 > > gcc/ChangeLog: > > * config/aarch64/aarch64.c (aarch64_process_one_target_attr): > Strip whitespaces. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/pr102375.c: New test. > --- > gcc/config/aarch64/aarch64.c| 1 + > gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 > 2 files changed, 5 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 76d99d247ae..4c3e491ab14 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -17624,6 +17624,7 @@ aarch64_process_one_target_attr (char *arg_str) > bool invert = false; > > size_t len = strlen (arg_str); > + arg_str = strip_whitespaces (arg_str, &len); It looks like this ought to happen after the alloca and copy, since it modifies the string. Thanks, Richard > > if (len == 0) > { > diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c > b/gcc/testsuite/gcc.target/aarch64/pr102375.c > new file mode 100644 > index 000..fa75d319b2d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c > @@ -0,0 +1,4 @@ > +/* PR target/102375 */ > +/* { dg-do compile } */ > + > +void calculate(void) __attribute__ ((target ("+sve, +sve2")));
Re: [PATCH] options: Fix variable tracking option processing.
On Tue, Oct 19, 2021 at 11:34 AM Martin Liška wrote: > > On 10/19/21 11:12, Richard Biener wrote: > > On Fri, Oct 15, 2021 at 5:22 PM Martin Liška wrote: > >> > >> All right, and there's second part that moves the code > >> from toplev.c to opts.c (finish_options) as I've done in the original > >> version. > >> > >> The patch also handles PR102766 where nvptx.c target sets: > >> debug_nonbind_markers_p = 0; > >> > >> So the easiest approach is marking the flag as set in global_options_set, > >> I haven't found a better approach :/ Reason is that nvptx_option_override > >> is called before finish_options. > > > > So currently nvptx_option_override is called before we do this code > > blob (it's called at the beginning of process_options). > > Yes, happens early in process_options. > > > > > Why's the solution not to move this setting to finish_options as well? > > I would like to, but the option detection depends on target hooks and some > debuginfo > functionality, leading to: > > g++ -no-pie -g -DIN_GCC -fPIC -DCROSS_DIRECTORY_STRUCTURE > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall > -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag > -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long > -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H > -static-libstdc++ -static-libgcc -o Tlto-wrapper \ > lto-wrapper.o collect-utils.o ggc-none.o libcommon-target.a libcommon.a > ../libcpp/libcpp.a ../libbacktrace/.libs/libbacktrace.a > ../libiberty/pic/libiberty.a ../libdecnumber/libdecnumber.a > /home/marxin/Programming/gcc/gcc/opts.c:1382: error: undefined reference to > 'dwarf2out_default_as_loc_support()' > /home/marxin/Programming/gcc/gcc/opts.c:1384: error: undefined reference to > 'dwarf2out_default_as_locview_support()' > /home/marxin/Programming/gcc/gcc/opts.c:1409: error: undefined reference to > 'targetm' > > Or can we do better? Meh ... :/ Well, move the target override hook call down (try to shuffle things so diagnostics happen after but "inits" happen before). > > (and disabling it along var-tracking when we end with no -g in > > process_options) > > > > IMHO the target should have the last say, so the hook should be invoked > > after we are finished overriding stuff and finish_options should be > > _only_ doing diagnostics and disabling stuff we cannot handle. > > Well, that sounds good, but we are quite far from that :/ Yes, I know... > Martin > > > > > Richard. > > > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > >> Thanks, > >> Martin >
Re: [COMMITTED] Change threading comment before pass_ccp pass.
On Tue, Oct 19, 2021 at 10:48 AM Aldy Hernandez wrote: > > As suggested. Thanks. > > gcc/ChangeLog: > > * passes.def: Change threading comment before pass_ccp pass. > --- > gcc/passes.def | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/gcc/passes.def b/gcc/passes.def > index c11c237f6d2..4c54176328b 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -339,9 +339,7 @@ along with GCC; see the file COPYING3. If not see >NEXT_PASS (pass_thread_jumps); >NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */); >NEXT_PASS (pass_vrp_threader); > - /* Threading can leave many const/copy propagations in the IL. > -Clean them up. Instead of just copy_prop, we use ccp to > -compute alignment and nonzero bits. */ Well, the comment is still true as long as pass_vrp_threader is there ;) > + /* Run CCP to compute alignment and nonzero bits. */ >NEXT_PASS (pass_ccp, true /* nonzero_p */); >NEXT_PASS (pass_warn_restrict); >NEXT_PASS (pass_dse); > -- > 2.31.1 >
Re: [match.pd] PR83750 - CSE erf/erfc pair
On Tue, 19 Oct 2021 at 13:02, Richard Biener wrote: > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches > wrote: > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener wrote: > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener wrote: > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > Hi Richard, > > > > > > > > As suggested in PR, I have attached WIP patch that adds two > > > > > > > > patterns > > > > > > > > to match.pd: > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). > > > > > > > > > > > > > > > > This works to remove call to erfc for the following test: > > > > > > > > double f(double x) > > > > > > > > { > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > > > double t2 = __builtin_erfc (x); > > > > > > > > return g(t1, t2); > > > > > > > > } > > > > > > > > > > > > > > > > with .optimized dump shows: > > > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > > > > > However, for the following test: > > > > > > > > double f(double x) > > > > > > > > { > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > > > return t1; > > > > > > > > } > > > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform > > > > > > > > 1 - > > > > > > > > erf(x) to erfc(x) again > > > > > > > > post canonicalization. > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets > > > > > > > > applied, > > > > > > > > but then it tries to > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we > > > > > > > > end up > > > > > > > > with erfc(x) transformed to > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal. > > > > > > > > Could you suggest how to proceed ? > > > > > > > > > > > > > > I applied your patch manually and it does the intended > > > > > > > simplifications so I wonder what I am missing? > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when > > > > > > there's > > > > > > no erf(x) in the source ? > > > > > > > > > > I do think it's reasonable to expect erfc to be available when erf > > > > > is and vice versa but note both are C99 specified functions (either > > > > > requires -lm). > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ? > > > > > > Yes, but I'm confused because you say the patch doesn't work for you? > > The patch works for me to CSE erf/erfc pair. > > However when there's only erfc in the source, it canonicalizes erfc(x) > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to > > erfc(x) > > with -O3 -funsafe-math-optimizations. > > > > For, > > t1 = __builtin_erfc(x), > > > > .optimized dump shows: > > _2 = __builtin_erf (x_1(D)); > > t1_3 = 1.0e+0 - _2; > > > > and for, > > double t1 = x + __builtin_erfc(x); > > > > .optimized dump shows: > > _3 = __builtin_erf (x_2(D)); > > _7 = x_2(D) + 1.0e+0; > > t1_4 = _7 - _3; > > > > I assume in both cases, we want erfc in the code-gen instead ? > > I think the reason uncaonicalization fails is because the pattern 1 - > > erf(x) to erfc(x) > > gets applied, but then it fails in resimplifying erfc(x), and we end > > up with 1 - erf(x) in code-gen. > > > > From gimple-match.c, it hits the simplification: > > > > gimple_seq *lseq = seq; > > if (__builtin_expect (!dbg_cnt > > (match), 0)) goto next_after_fail1172; > > if (__builtin_expect (dump_file && > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__); > > { > > res_op->set_op (CFN_BUILT_IN_ERFC, type, > > 1); > > res_op->ops[0] = captures[0]; > > res_op->resimplify (lseq, valueize); > > return true; > > } > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq. > > There's nothing to add to lseq since there's also nothing to resimplify. > The only thing that could happen is that the replacement is not done > because replace_stmt_with_simplification via maybe_push_res_to_seq > doesn't pass the builtin_decl_implicit test: > > /* Find the function we want to call. */ > tree decl = builtin_decl_implicit (as_builtin_fn (fn)); > if (!decl) >
[PATCH] MAINTAINERS: Add myself for write after approval
ChangeLog: 2021-10-19 Clément Chigot * MAINTAINERS: Add myself for write after approval. From 13ddc381ea7bde6df9e48fb968d9324564f7a540 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= Date: Tue, 19 Oct 2021 13:20:14 +0200 Subject: [PATCH] MAINTAINERS: Add myself for write after approval MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ChangeLog: 2021-10-19 Clément Chigot * MAINTAINERS: Add myself for write after approval. --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index bf4006c779f..b22f930583a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -352,6 +352,7 @@ Gabriel Charette Chandra Chavva Dehao Chen Fabien Chêne +Clément Chigot Harshit Chopra Tamar Christina Eric Christopher -- 2.25.1
Re: [match.pd] PR83750 - CSE erf/erfc pair
On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote: > On Tue, 19 Oct 2021 at 13:02, Richard Biener > wrote: > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches > > wrote: > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener wrote: > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener > > > > > wrote: > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > Hi Richard, > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two > > > > > > > > > patterns > > > > > > > > > to match.pd: > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). > > > > > > > > > > > > > > > > > > This works to remove call to erfc for the following test: > > > > > > > > > double f(double x) > > > > > > > > > { > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > > > > double t2 = __builtin_erfc (x); > > > > > > > > > return g(t1, t2); > > > > > > > > > } > > > > > > > > > > > > > > > > > > with .optimized dump shows: > > > > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > > > > > > > However, for the following test: > > > > > > > > > double f(double x) > > > > > > > > > { > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > > > > return t1; > > > > > > > > > } > > > > > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not > > > > > > > > > transform 1 - > > > > > > > > > erf(x) to erfc(x) again > > > > > > > > > post canonicalization. > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets > > > > > > > > > applied, > > > > > > > > > but then it tries to > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we > > > > > > > > > end up > > > > > > > > > with erfc(x) transformed to > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal. > > > > > > > > > Could you suggest how to proceed ? > > > > > > > > > > > > > > > > I applied your patch manually and it does the intended > > > > > > > > simplifications so I wonder what I am missing? > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when > > > > > > > there's > > > > > > > no erf(x) in the source ? > > > > > > > > > > > > I do think it's reasonable to expect erfc to be available when erf > > > > > > is and vice versa but note both are C99 specified functions (either > > > > > > requires -lm). > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ? > > > > > > > > Yes, but I'm confused because you say the patch doesn't work for you? > > > The patch works for me to CSE erf/erfc pair. > > > However when there's only erfc in the source, it canonicalizes erfc(x) > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to > > > erfc(x) > > > with -O3 -funsafe-math-optimizations. > > > > > > For, > > > t1 = __builtin_erfc(x), > > > > > > .optimized dump shows: > > > _2 = __builtin_erf (x_1(D)); > > > t1_3 = 1.0e+0 - _2; > > > > > > and for, > > > double t1 = x + __builtin_erfc(x); > > > > > > .optimized dump shows: > > > _3 = __builtin_erf (x_2(D)); > > > _7 = x_2(D) + 1.0e+0; > > > t1_4 = _7 - _3; > > > > > > I assume in both cases, we want erfc in the code-gen instead ? > > > I think the reason uncaonicalization fails is because the pattern 1 - > > > erf(x) to erfc(x) > > > gets applied, but then it fails in resimplifying erfc(x), and we end > > > up with 1 - erf(x) in code-gen. > > > > > > From gimple-match.c, it hits the simplification: > > > > > > gimple_seq *lseq = seq; > > > if (__builtin_expect (!dbg_cnt > > > (match), 0)) goto next_after_fail1172; > > > if (__builtin_expect (dump_file && > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__); > > > { > > > res_op->set_op (CFN_BUILT_IN_ERFC, > > > type, 1); > > > res_op->ops[0] = captures[0]; > > > res_op->resimplify (lseq, valueize); > > > return true; > > > } > > > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq. > > > > There's nothing to add to lseq since there's also nothing to resimplify. > > The only thing that could happen is t
Re: [COMMITTED] Change threading comment before pass_ccp pass.
On 10/19/21 12:55 PM, Richard Biener wrote: On Tue, Oct 19, 2021 at 10:48 AM Aldy Hernandez wrote: As suggested. Thanks. gcc/ChangeLog: * passes.def: Change threading comment before pass_ccp pass. --- gcc/passes.def | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index c11c237f6d2..4c54176328b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -339,9 +339,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_thread_jumps); NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */); NEXT_PASS (pass_vrp_threader); - /* Threading can leave many const/copy propagations in the IL. -Clean them up. Instead of just copy_prop, we use ccp to -compute alignment and nonzero bits. */ Well, the comment is still true as long as pass_vrp_threader is there ;) Ha ha! Oops. You're right. Tell you what, if we decide not to rearrange the pipeline, I'll put the comment back :). Aldy
[PATCH] Refactor load/store costing
This passes down the already available alignment scheme and misalignment to the load/store costing routines, removing redundant queries. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-19 Richard Biener * tree-vect-data-refs.c (vect_get_data_access_cost): Get alignment support scheme and misalignment as arguments and pass them down. (vect_get_peeling_costs_all_drs): Compute that info here and note that we shouldn't need to. * tree-vect-stmts.c (vect_model_store_cost): Get alignment support scheme and misalignment as arguments. (vect_get_store_cost): Likewise. (vect_model_load_cost): Likewise. (vect_get_load_cost): Likewise. (vectorizable_store): Pass down alignment support scheme and misalignment to costing. (vectorizable_load): Likewise. --- gcc/tree-vect-data-refs.c | 20 +++ gcc/tree-vect-stmts.c | 41 --- gcc/tree-vectorizer.h | 4 +++- 3 files changed, 40 insertions(+), 25 deletions(-) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 7c95f9ad69e..0db6aec7312 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -1396,7 +1396,9 @@ vector_alignment_reachable_p (dr_vec_info *dr_info) static void vect_get_data_access_cost (vec_info *vinfo, dr_vec_info *dr_info, - unsigned int *inside_cost, + dr_alignment_support alignment_support_scheme, + int misalignment, + unsigned int *inside_cost, unsigned int *outside_cost, stmt_vector_for_cost *body_cost_vec, stmt_vector_for_cost *prologue_cost_vec) @@ -1411,10 +1413,12 @@ vect_get_data_access_cost (vec_info *vinfo, dr_vec_info *dr_info, ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info)); if (DR_IS_READ (dr_info->dr)) -vect_get_load_cost (vinfo, stmt_info, ncopies, true, inside_cost, +vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, + misalignment, true, inside_cost, outside_cost, prologue_cost_vec, body_cost_vec, false); else -vect_get_store_cost (vinfo,stmt_info, ncopies, inside_cost, body_cost_vec); +vect_get_store_cost (vinfo,stmt_info, ncopies, alignment_support_scheme, +misalignment, inside_cost, body_cost_vec); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -1545,7 +1549,15 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo, vect_dr_misalign_for_aligned_access (dr0_info)); else vect_update_misalignment_for_peel (dr_info, dr0_info, npeel); - vect_get_data_access_cost (loop_vinfo, dr_info, inside_cost, outside_cost, + /* ??? We should be able to avoid both the adjustment before and the +call to vect_supportable_dr_alignment below. */ + tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt); + int misalignment = dr_misalignment (dr_info, vectype); + dr_alignment_support alignment_support_scheme + = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype); + vect_get_data_access_cost (loop_vinfo, dr_info, +alignment_support_scheme, misalignment, +inside_cost, outside_cost, body_cost_vec, prologue_cost_vec); SET_DR_MISALIGNMENT (dr_info, save_misalignment); } diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 09a97b44c91..afc3ef17834 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -909,6 +909,8 @@ cfun_returns (tree decl) static void vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, vect_memory_access_type memory_access_type, + dr_alignment_support alignment_support_scheme, + int misalignment, vec_load_store_type vls_type, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { @@ -969,7 +971,8 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, scalar_store, stmt_info, 0, vect_body); } else -vect_get_store_cost (vinfo, stmt_info, ncopies, &inside_cost, cost_vec); +vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, +misalignment, &inside_cost, cost_vec); if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_STRIDED_SLP) @@ -1021,15 +1024,12 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies, /* Calculate cost of DR's memory access. */ void -vect_get_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, in
[PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]
On Mon, Oct 18, 2021 at 12:42:00PM -0400, Jason Merrill wrote: > > --- gcc/cp/typeck.c.jj 2021-10-05 09:53:55.382734051 +0200 > > +++ gcc/cp/typeck.c 2021-10-15 19:28:38.034213437 +0200 > > @@ -6773,9 +6773,21 @@ cp_build_addr_expr_1 (tree arg, bool str > > return error_mark_node; > > } > > + if (TREE_CODE (t) == FUNCTION_DECL > > + && DECL_IMMEDIATE_FUNCTION_P (t) > > + && cp_unevaluated_operand == 0 > > + && (current_function_decl == NULL_TREE > > + || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl))) > > This doesn't cover some of the other cases of immediate context; we should > probably factor most of immediate_invocation_p out into a function called > something like in_immediate_context and use it here, and in several other > places as well. You're right, I've done that for the two spots in cp_build_addr_expr_1 and added testsuite coverage for where it changed behavior. While doing that I've discovered further issues. One is that we weren't diagnosing PMFs referring to immediate methods returned from immediate functions (either directly or embedded in aggregates). I'm not sure if it can only appear as PTRMEM_CST which I've handled (cp_walk_subtree only walks the type and not the PTRMEM_CST_MEMBER) or something else. Another issue is that while default arg in immediate function containing &immediate_fn works properly, if it is immediate_fn instead, we were incorrectly rejecting it. I've handled this in build_over_call, though with this usage in_consteval_if_p is slightly misnamed, it stands for in consteval if or some other reason why we are currently in immediate function context. Though, that flag alone can't be all the reasons for being in immediate function contexts, as I've tried the other reasons can't be handled in such a bool and need to be tested too. And another thing isn't in a patch, but I'm wondering whether we don't handle it incorrectly. constexpr.c has: /* Check that immediate invocation does not return an expression referencing any immediate function decls. They need to be allowed while parsing immediate functions, but can't leak outside of them. */ if (is_consteval && t != r && (current_function_decl == NULL_TREE || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl))) as condition for the discovery of embedded immediate FUNCTION_DECLs (or now PTRMEM_CSTs). If I remove the && (current... ..._decl)) then g++.dg/cpp2a/consteval7.C's struct S { int b; int (*c) (); }; consteval S baz () { return { 5, foo }; } consteval int qux () { S s = baz (); return s.b + s.c (); } consteval int quux () { constexpr S s = baz (); return s.b + s.c (); } quux line fails, but based on http://eel.is/c++draft/expr.const#11 I wonder if it shouldn't fail (clang++ -std=c++20 rejects it), and be only accepted without the constexpr keyword before S s. Also wonder about e.g. consteval int foo () { return 42; } consteval int bar () { auto fn1 = foo; // This must be ok constexpr auto fn2 = foo; // Isn't this an error? return fn1 () + fn2 (); } constexpr int baz () { if consteval { auto fn1 = foo; // This must be ok constexpr auto fn2 = foo; // Isn't this an error? return fn1 () + fn2 (); } return 0; } auto a = bar (); static_assert (bar () == 84); static_assert (baz () == 84); (again, clang++ -std=c++20 rejects the fn2 = foo; case, but doesn't implement consteval if, so can't test the other one). For taking address of an immediate function or method if it is taken outside of immediate function context we already have diagnostics about it, but shouldn't the immediate FUNCTION_DECL discovery in cxx_eval_outermost_constant_expression be instead guarded with something like if (is_consteval || in_immediate_context ()) and be done regardless of whether t != r? 2021-10-19 Jakub Jelinek PR c++/102753 * cp-tree.h (in_immediate_context): Declare. * call.c (in_immediate_context): New function. (immediate_invocation_p): Use it. (build_over_call): Temporarily set in_consteval_if_p for convert_default_arg calls of immediate invocations. * typeck.c (cp_build_addr_expr_1): Diagnose taking address of an immediate method. Use t instead of TREE_OPERAND (arg, 1). Use in_immediate_context function. * constexpr.c (find_immediate_fndecl): Handle PTRMEM_CST which refers to immediate function decl. * g++.dg/cpp2a/consteval20.C: New test. * g++.dg/cpp2a/consteval21.C: New test. * g++.dg/cpp2a/consteval22.C: New test. * g++.dg/cpp2a/consteval23.C: New test. * g++.dg/cpp23/consteval-if11.C: New test. --- gcc/cp/cp-tree.h.jj 2021-10-15 11:58:44.968133548 +0200 +++ gcc/cp/cp-tree.h2021-10-19 10:40:58.375799274 +0200 @@ -6547,6 +6547,7 @@ extern tree perform_direct_initializatio tsubst_flags_t); extern vec *resolve_args (vec*,
[PATCH 1/2] libstdc++: Implement LWG 3523 changes to ranges::iota_view
libstdc++-v3/ChangeLog: * include/std/ranges (iota_view::_Iterator): Befriend iota_view. (iota_view::_Sentinel): Likewise. (iota_view::iota_view): Add three overloads each taking an iterator/sentinel pair as per LWG 3523. * testsuite/std/ranges/iota/iota_view.cc (test06): New test. --- libstdc++-v3/include/std/ranges | 21 +++ .../testsuite/std/ranges/iota/iota_view.cc| 21 +++ 2 files changed, 42 insertions(+) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index b8de400dfbb..85f232d8fb9 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -532,6 +532,7 @@ namespace ranges private: _Winc _M_value = _Winc(); + friend iota_view; friend _Sentinel; }; @@ -568,6 +569,8 @@ namespace ranges operator-(const _Sentinel& __x, const _Iterator& __y) requires sized_sentinel_for<_Bound, _Winc> { return __x._M_distance_from(__y); } + + friend iota_view; }; _Winc _M_value = _Winc(); @@ -590,6 +593,24 @@ namespace ranges __glibcxx_assert( bool(__value <= __bound) ); } + constexpr + iota_view(_Iterator __first, _Iterator __last) + requires same_as<_Winc, _Bound> + : iota_view(__first._M_value, __last._M_value) + { } + + constexpr + iota_view(_Iterator __first, unreachable_sentinel_t __last) + requires same_as<_Bound, unreachable_sentinel_t> + : iota_view(__first._M_value, __last) + { } + + constexpr + iota_view(_Iterator __first, _Sentinel __last) + requires (!same_as<_Winc, _Bound>) && (!same_as<_Bound, unreachable_sentinel_t>) + : iota_view(__first._M_value, __last._M_bound) + { } + constexpr _Iterator begin() const { return _Iterator{_M_value}; } diff --git a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc index 362ef1f7f78..5bebe4be693 100644 --- a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc +++ b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc @@ -18,6 +18,7 @@ // { dg-options "-std=gnu++2a" } // { dg-do run { target c++2a } } +#include #include #include @@ -90,6 +91,25 @@ test05() VERIFY( r.begin() - r.end() == -3 ); } +void +test06() +{ + // Verify LWG 3523 changes. + auto v1 = std::views::iota(0, 5); + auto w1 = decltype(v1)(v1.begin(), v1.end()); + VERIFY( std::ranges::equal(v1, w1) ); + + auto v2 = std::views::iota(0); + auto w2 = decltype(v2)(v2.begin(), v2.end()); + static_assert(std::same_as); + VERIFY( *w2.begin() == 0 ); + + auto v3 = std::views::iota(0, 5l); + auto w3 = decltype(v3)(v3.begin(), v3.end()); + static_assert(!std::ranges::common_range); + VERIFY( std::ranges::equal(v3, w3) ); +} + int main() { @@ -98,4 +118,5 @@ main() test03(); test04(); test05(); + test06(); } -- 2.33.1.637.gf443b226ca
[PATCH 2/2] libstdc++: Implement P1739R4 changes to views::take/drop/counted
This implements P1739R4 along with the resolution for LWG 3407 which corrects the paper's wording. Tested on x86_64-pc-linux-gnu, does this look OK for trunk? libstdc++-v3/ChangeLog: * include/bits/ranges_util.h (views::_Drop): Forward declare. (subrange): Befriend views::_Drop. (subrange::_S_store_size): Declare constexpr instead of just const, remove obsolete comment. * include/std/ranges (views::__detail::__is_empty_view): Define. (views::__detail::__is_basic_string_view): Likewise. (views::__detail::__is_subrange): Likewise. (views::__detail::__is_iota_view): Likewise. (views::__detail::__can_take_view): Rename template parm _Tp to _Dp. (views::_Take): Rename template parm _Tp to _Dp, make it non-deducible and fix its type to range_difference_t<_Range>. Implement P1739R4 and LWG 3407 changes. (views::__detail::__can_drop_view): Rename template parm _Tp to _Dp. (views::_Drop): As with views::_Take. (views::_Counted): Implement P1739R4 changes. * include/std/span (__detail::__is_std_span): Rename to ... (__detail::__is_span): ... this and turn it into a variable template. (__detail::__is_std_array): Turn it into a variable template. (span::span): Adjust uses of __is_std_span and __is_std_array accordingly. * testsuite/std/ranges/adaptors/p1739.cc: New test. --- libstdc++-v3/include/bits/ranges_util.h | 7 +- libstdc++-v3/include/std/ranges | 106 +++--- libstdc++-v3/include/std/span | 12 +- .../testsuite/std/ranges/adaptors/p1739.cc| 88 +++ 4 files changed, 192 insertions(+), 21 deletions(-) create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/p1739.cc diff --git a/libstdc++-v3/include/bits/ranges_util.h b/libstdc++-v3/include/bits/ranges_util.h index 7e7b958d274..1afa66d298c 100644 --- a/libstdc++-v3/include/bits/ranges_util.h +++ b/libstdc++-v3/include/bits/ranges_util.h @@ -211,6 +211,8 @@ namespace ranges } // namespace __detail + namespace views { struct _Drop; } + enum class subrange_kind : bool { unsized, sized }; /// The ranges::subrange class template @@ -221,8 +223,9 @@ namespace ranges class subrange : public view_interface> { private: - // XXX: gcc complains when using constexpr here - static const bool _S_store_size + friend struct views::_Drop; // Needs to inspect _S_store_size. + + static constexpr bool _S_store_size = _Kind == subrange_kind::sized && !sized_sentinel_for<_Sent, _It>; _It _M_begin = _It(); diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index 85f232d8fb9..64396027c1b 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -42,6 +42,7 @@ #include #include #include +#include #include #include #include @@ -2147,19 +2148,66 @@ namespace views::__adaptor { namespace __detail { - template + template + constexpr bool __is_empty_view = false; + + template + constexpr bool __is_empty_view> = true; + + template + constexpr bool __is_basic_string_view = false; + + template + constexpr bool __is_basic_string_view> = true; + + template + constexpr bool __is_subrange = false; + + template + constexpr bool __is_subrange> = true; + + template + constexpr bool __is_iota_view = false; + + template + constexpr bool __is_iota_view> = true; + + template concept __can_take_view - = requires { take_view(std::declval<_Range>(), std::declval<_Tp>()); }; + = requires { take_view(std::declval<_Range>(), std::declval<_Dp>()); }; } // namespace __detail struct _Take : __adaptor::_RangeAdaptor<_Take> { - template - requires __detail::__can_take_view<_Range, _Tp> + template> + requires __detail::__can_take_view<_Range, _Dp> constexpr auto - operator() [[nodiscard]] (_Range&& __r, _Tp&& __n) const + operator() [[nodiscard]] (_Range&& __r, type_identity_t<_Dp> __n) const { - return take_view(std::forward<_Range>(__r), std::forward<_Tp>(__n)); + using _Tp = remove_cvref_t<_Range>; + if constexpr (__detail::__is_empty_view<_Tp>) + return _Tp(); + else if constexpr (random_access_range<_Tp> +&& sized_range<_Tp> +&& (std::__detail::__is_span<_Tp> +|| __detail::__is_basic_string_view<_Tp> +|| __detail::__is_subrange<_Tp> +|| __detail::__is_iota_view<_Tp>)) + { + __n = std::min<_Dp>(ranges::distance(__r), __n); + auto __begin = ranges::begin(__r); + auto __end = __begin + __n; +
Re: [PATCH][aarch64] target: Support whitespaces in target attr/pragma.
On 10/19/21 12:52, Richard Sandiford wrote: It looks like this ought to happen after the alloca and copy, since it modifies the string. Oh yeah, good point. Ready to be installed with the change? Thanks, MartinFrom 68df4cba3bccb714a14e3c795e6d9e4a44c54318 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Tue, 19 Oct 2021 11:11:16 +0200 Subject: [PATCH] target: Support whitespaces in target attr/pragma. PR target/102375 gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_process_one_target_attr): Strip whitespaces. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr102375.c: New test. --- gcc/config/aarch64/aarch64.c| 1 + gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 2 files changed, 5 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 76d99d247ae..fdf341812f4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -17633,6 +17633,7 @@ aarch64_process_one_target_attr (char *arg_str) char *str_to_check = (char *) alloca (len + 1); strcpy (str_to_check, arg_str); + str_to_check = strip_whitespaces (str_to_check, &len); /* We have something like __attribute__ ((target ("+fp+nosimd"))). It is easier to detect and handle it explicitly here rather than going diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c b/gcc/testsuite/gcc.target/aarch64/pr102375.c new file mode 100644 index 000..fa75d319b2d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c @@ -0,0 +1,4 @@ +/* PR target/102375 */ +/* { dg-do compile } */ + +void calculate(void) __attribute__ ((target ("+sve, +sve2"))); -- 2.33.1
Re: [PATCH][aarch64] target: Support whitespaces in target attr/pragma.
Martin Liška writes: > On 10/19/21 12:52, Richard Sandiford wrote: >> It looks like this ought to happen after the alloca and copy, since it >> modifies the string. > > Oh yeah, good point. > > Ready to be installed with the change? > Thanks, > Martin > > From 68df4cba3bccb714a14e3c795e6d9e4a44c54318 Mon Sep 17 00:00:00 2001 > From: Martin Liska > Date: Tue, 19 Oct 2021 11:11:16 +0200 > Subject: [PATCH] target: Support whitespaces in target attr/pragma. > > PR target/102375 > > gcc/ChangeLog: > > * config/aarch64/aarch64.c (aarch64_process_one_target_attr): > Strip whitespaces. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/pr102375.c: New test. OK, thanks. Richard > --- > gcc/config/aarch64/aarch64.c| 1 + > gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 > 2 files changed, 5 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 76d99d247ae..fdf341812f4 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -17633,6 +17633,7 @@ aarch64_process_one_target_attr (char *arg_str) > >char *str_to_check = (char *) alloca (len + 1); >strcpy (str_to_check, arg_str); > + str_to_check = strip_whitespaces (str_to_check, &len); > >/* We have something like __attribute__ ((target ("+fp+nosimd"))). > It is easier to detect and handle it explicitly here rather than going > diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c > b/gcc/testsuite/gcc.target/aarch64/pr102375.c > new file mode 100644 > index 000..fa75d319b2d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c > @@ -0,0 +1,4 @@ > +/* PR target/102375 */ > +/* { dg-do compile } */ > + > +void calculate(void) __attribute__ ((target ("+sve, +sve2")));
Cleanup compute_points_to_sets
Hi, this patch fixes two issues I noticed while proofreading the code. First is that I have added conditional around setting of nonlocal and escaped flags (since they may be set from solver) while keeping the variable in assignment that is confusing. Second is that we still do not set pt in the case function has no memory side effects. In this case the call use is not going to be used since uses_global_memory is false only if either function is const or modref determined that all loads are from memory pointed to by parameters. In both cases we will disambiguate earlier before asking PTA oracle, but it is better to avoid stale PTA sets (which shows in -alias dumps etc.) Most of builtins are not modifying global memory, one option would be to stick another flag into the fnspecs strings for this property. Bootstrapped/regtested x86_64-linux, OK? Honza gcc/ChangeLog: * tree-ssa-structalias.c (compute_points_to_sets): Cleanup. diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index 2e6513bb72a..35971a54e02 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -7550,8 +7550,8 @@ compute_points_to_sets (void) always escaped. */ if (uses_global_memory) { - pt->nonlocal = uses_global_memory; - pt->escaped = uses_global_memory; + pt->nonlocal = 1; + pt->escaped = 1; } } else if (uses_global_memory) @@ -7561,6 +7561,8 @@ compute_points_to_sets (void) *pt = cfun->gimple_df->escaped; pt->nonlocal = 1; } + else + memset (pt, 0, sizeof (struct pt_solution)); } pt = gimple_call_clobber_set (stmt); @@ -7582,8 +7584,8 @@ compute_points_to_sets (void) always escaped. */ if (writes_global_memory) { - pt->nonlocal = writes_global_memory; - pt->escaped = writes_global_memory; + pt->nonlocal = 1; + pt->escaped = 1; } } else if (writes_global_memory) @@ -7593,6 +7595,8 @@ compute_points_to_sets (void) *pt = cfun->gimple_df->escaped; pt->nonlocal = 1; } + else + memset (pt, 0, sizeof (struct pt_solution)); } } }
Re: [PATCH 1/2] libstdc++: Implement LWG 3523 changes to ranges::iota_view
On Tue, 19 Oct 2021 at 13:18, Patrick Palka via Libstdc++ wrote: > > libstdc++-v3/ChangeLog: > > * include/std/ranges (iota_view::_Iterator): Befriend iota_view. > (iota_view::_Sentinel): Likewise. > (iota_view::iota_view): Add three overloads each taking an > iterator/sentinel pair as per LWG 3523. > * testsuite/std/ranges/iota/iota_view.cc (test06): New test. OK for trunk and gcc-11 (after some soak time), thanks. > --- > libstdc++-v3/include/std/ranges | 21 +++ > .../testsuite/std/ranges/iota/iota_view.cc| 21 +++ > 2 files changed, 42 insertions(+) > > diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges > index b8de400dfbb..85f232d8fb9 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -532,6 +532,7 @@ namespace ranges >private: > _Winc _M_value = _Winc(); > > + friend iota_view; > friend _Sentinel; >}; > > @@ -568,6 +569,8 @@ namespace ranges > operator-(const _Sentinel& __x, const _Iterator& __y) > requires sized_sentinel_for<_Bound, _Winc> > { return __x._M_distance_from(__y); } > + > + friend iota_view; >}; > >_Winc _M_value = _Winc(); > @@ -590,6 +593,24 @@ namespace ranges > __glibcxx_assert( bool(__value <= __bound) ); >} > > + constexpr > + iota_view(_Iterator __first, _Iterator __last) > + requires same_as<_Winc, _Bound> > + : iota_view(__first._M_value, __last._M_value) > + { } > + > + constexpr > + iota_view(_Iterator __first, unreachable_sentinel_t __last) > + requires same_as<_Bound, unreachable_sentinel_t> > + : iota_view(__first._M_value, __last) > + { } > + > + constexpr > + iota_view(_Iterator __first, _Sentinel __last) > + requires (!same_as<_Winc, _Bound>) && (!same_as<_Bound, > unreachable_sentinel_t>) > + : iota_view(__first._M_value, __last._M_bound) > + { } > + >constexpr _Iterator >begin() const { return _Iterator{_M_value}; } > > diff --git a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc > b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc > index 362ef1f7f78..5bebe4be693 100644 > --- a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc > +++ b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc > @@ -18,6 +18,7 @@ > // { dg-options "-std=gnu++2a" } > // { dg-do run { target c++2a } } > > +#include > #include > #include > > @@ -90,6 +91,25 @@ test05() >VERIFY( r.begin() - r.end() == -3 ); > } > > +void > +test06() > +{ > + // Verify LWG 3523 changes. > + auto v1 = std::views::iota(0, 5); > + auto w1 = decltype(v1)(v1.begin(), v1.end()); > + VERIFY( std::ranges::equal(v1, w1) ); > + > + auto v2 = std::views::iota(0); > + auto w2 = decltype(v2)(v2.begin(), v2.end()); > + static_assert(std::same_as std::unreachable_sentinel_t>); > + VERIFY( *w2.begin() == 0 ); > + > + auto v3 = std::views::iota(0, 5l); > + auto w3 = decltype(v3)(v3.begin(), v3.end()); > + static_assert(!std::ranges::common_range); > + VERIFY( std::ranges::equal(v3, w3) ); > +} > + > int > main() > { > @@ -98,4 +118,5 @@ main() >test03(); >test04(); >test05(); > + test06(); > } > -- > 2.33.1.637.gf443b226ca >
[PATCH] aix: ensure reference to __tls_get_addr is in text section.
The garbage collector of AIX linker might remove the reference to __tls_get_addr if it's added inside an unused csect, which can be the case of .data with very simple programs. gcc/ChangeLog: 2021-10-19 Clément Chigot * config/rs6000/rs6000.c (rs6000_xcoff_file_end): Move __tls_get_addr reference to .text csect. Approved offline by David Edelson. From 52e9e4554d8dba9f9c9c56267789fc1d08b1de98 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= Date: Thu, 14 Oct 2021 09:03:13 +0200 Subject: [PATCH] aix: ensure reference to __tls_get_addr is in text section. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The garbage collector of AIX linker might remove the reference to __tls_get_addr if it's added inside an unused csect, which can be the case of .data with very simple programs. gcc/ChangeLog: 2021-10-19 Clément Chigot * config/rs6000/rs6000.c (rs6000_xcoff_file_end): Move __tls_get_addr reference to .text csect. --- gcc/config/rs6000/rs6000.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 68111c3fe6a..bac959f4ef4 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -21626,17 +21626,17 @@ static void rs6000_xcoff_file_end (void) { switch_to_section (text_section); + if (xcoff_tls_exec_model_detected) +{ + /* Add a .ref to __tls_get_addr to force libpthread dependency. */ + fputs ("\t.extern __tls_get_addr\n\t.ref __tls_get_addr\n", asm_out_file); +} fputs ("_section_.text:\n", asm_out_file); switch_to_section (data_section); fputs (TARGET_32BIT ? "\t.long _section_.text\n" : "\t.llong _section_.text\n", asm_out_file); - if (xcoff_tls_exec_model_detected) -{ - /* Add a .ref to __tls_get_addr to force libpthread dependency. */ - fputs ("\t.extern __tls_get_addr\n\t.ref __tls_get_addr\n", asm_out_file); -} } struct declare_alias_data -- 2.25.1
Re: [PATCH] AArch64: Tune case-values-threshold
Hi Richard, > I'm just concerned that here we're using the same explanation but with > different numbers. Why are the new numbers more right than the old ones > (especially when it comes to code size, where the trade-off hasn't > really changed)? Like all tuning/costing parameters, these values are never fixed but change over time due to optimizations, micro architectures and workloads. The previous values were out of date so that's why I retuned them by benchmarking different values and choosing the best combinations. > It would be good to have more discussion of why certain numbers are > too small or too high, and why 8 is the right pivot point for -Os. You mean add more discussion in the comment? That comment is already overly large and too specific - it would be better to reduce it. The -Os value was never tuned, and 8 turns out to be faster and smaller than GCC's default. Cheers, Wilco
[PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran
Hi Jakub, On 2021/9/18 12:11 AM, Jakub Jelinek wrote: @@ -3496,7 +3509,8 @@ static match match_omp (gfc_exec_op op, const omp_mask mask) { gfc_omp_clauses *c; - if (gfc_match_omp_clauses (&c, mask) != MATCH_YES) + if (gfc_match_omp_clauses (&c, mask, true, true, false, +(op == EXEC_OMP_TARGET)) != MATCH_YES) The ()s around op == EXEC_OMP_TARGET are unnecessary. Fixed. --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code) static tree gfc_trans_omp_taskgroup (gfc_code *code) { + stmtblock_t block; + gfc_start_block (&block); tree body = gfc_trans_code (code->block->next); tree stmt = make_node (OMP_TASKGROUP); TREE_TYPE (stmt) = void_type_node; OMP_TASKGROUP_BODY (stmt) = body; - OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE; - return stmt; + OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, + code->ext.omp_clauses, + code->loc); + gfc_add_expr_to_block (&block, stmt); If this was missing, then I'm afraid we lack a lot of testsuite coverage for Fortran task reductions. It doesn't need to be covered in this patch, but would be good to cover it incrementally. Because the above means nothing with taskgroup with task_reduction clause(s) could work properly at runtime. Actually, the testcases do somewhat exercise taskgroup task_reductions, but like you said, only lightly. --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -1317,9 +1317,13 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) if (is_omp_target (ctx->stmt)) { tree at = decl; + omp_context *scan_ctx = ctx; if (ctx->outer) - scan_omp_op (&at, ctx->outer); - tree nt = omp_copy_decl_1 (at, ctx); + { + scan_omp_op (&at, ctx->outer); + scan_ctx = ctx->outer; + } + tree nt = omp_copy_decl_1 (at, scan_ctx); splay_tree_insert (ctx->field_map, (splay_tree_key) &DECL_CONTEXT (decl), (splay_tree_value) nt); You're right that the var remembered with &DECL_CONTEXT (whatever) key is used outside of the target construct rather than inside of it. So, if ctx->outer is non-NULL, it seems right to create the var in that outer context. But, if ctx->outer is NULL, which can happen if the target construct is orphaned, consider e.g. extern int &x; extern int &y; void foo () { #pragma omp target in_reduction (+: x, y) { x = x + 8; y = y + 16; } } void bar () { #pragma omp taskgroup task_reduction (+: x, y) foo (); } then those artificial decls (copies of x and y) should appear to be at the function scope and not inside of the target region. Therefore, I wonder if omp_copy_decl_2 shouldn't do the DECL_CONTEXT (copy) = current_function_decl; DECL_CHAIN (copy) = ctx->block_vars; ctx->block_vars = copy; (the last one can be moved next to the others) only if ctx != NULL and otherwise call gimple_add_tmp_var (copy); instead and then just call omp_copy_decl_1 at that spot with unconditional ctx->outer. I see what you mean. I tried gimple_add_tmp_var but didn't work due to a !DECL_SEEN_IN_BIND_EXPR_P() assert fail, but record_vars() appears to work. Also, this isn't the only place that should have such a change, there is also if (ctx->outer) scan_omp_op (&at, ctx->outer); tree nt = omp_copy_decl_1 (at, ctx); splay_tree_insert (ctx->field_map, (splay_tree_key) &DECL_CONTEXT (t), (splay_tree_value) nt); a few lines above this and I'd expect that it should be (at, ctx->outer) as well. Fixed. @@ -1339,7 +1343,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) if (!is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))) { by_ref = use_pointer_for_field (decl, ctx); - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IN_REDUCTION) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IN_REDUCTION + && !splay_tree_lookup (ctx->field_map, +(splay_tree_key) decl)) install_var_field (decl, by_ref, 3, ctx); } install_var_local (decl, ctx); When exactly do you need this? It doesn't trigger on the new libgomp testcase... I remember there was a testcase with triggered an ICE without this, but for some reason can't find it anymore. I don't have any more evidence this is needed, so removed now. --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-in-reduction-1.f90 @@ -0,0 +1,33 @@ +! { dg-do run } + +subrouti
Re: [PATCH] rs6000: Remove unnecessary option manipulation.
Hi Martin, On 10/19/21 3:53 AM, Martin Liška wrote: > On 10/15/21 17:24, Martin Liška wrote: >> P.S. Next time, please CC me. > > All right, I tested the updated patch and it works fine > on ppc64le-linux-gnu. > > May I install it? I'm not a maintainer, so can't approve -- CCing those who can. Thanks! Bill > Thanks, > Martin
Re: [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers
Hi Paul, On 10/18/21 8:15 PM, Paul A. Clarke wrote: > Fix an ommission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665. > > 2021-10-18 Paul A. Clarke > > gcc > * config/config.gcc (extra_headers): Add nmmintrin.h. > --- > gcc/config.gcc | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/config.gcc b/gcc/config.gcc > index aa5bd5d14590..1cb9303b3a85 100644 > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -490,6 +490,7 @@ powerpc*-*-*) > extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" > extra_headers="${extra_headers} mmintrin.h x86intrin.h" > extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h" > + extra_headers="${extra_headers} nmmintrin.h" > extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h > si2vmx.h" > extra_headers="${extra_headers} amo.h" > case x$with_cpu in In my opinion, you can commit this one as obvious. Thanks, Bill
[Patch, committed] Fortran: Fix "str" to scalar descriptor conversion [PR92482]
Here, the problem is that the param.expr was: &"abc" -> type: "char*" as that's an ADDR_EXPR, the previous code dereferrenced it: *&"abc" -> type *(char*) but that's the type 'char'. Thus, at the end, the result was scalar = 'a' -> type char instead of scalar "abc" -> type char array of size 3 Solution: Do what the comment does – remove the ADDR_EXPR insead of dereferrencing the result. Build + regtested on x86_64-gnu-linux + installed as r12-4505-g6920d5a1a2834e9c62d441b8f4c6186b01107d13 Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit 6920d5a1a2834e9c62d441b8f4c6186b01107d13 Author: Tobias Burnus Date: Tue Oct 19 15:16:01 2021 +0200 Fortran: Fix "str" to scalar descriptor conversion [PR92482] PR fortran/92482 gcc/fortran/ChangeLog: * trans-expr.c (gfc_conv_procedure_call): Use TREE_OPERAND not build_fold_indirect_ref_loc to undo an ADDR_EXPR. gcc/testsuite/ChangeLog: * gfortran.dg/bind-c-char-descr.f90: Remove xfail; extend a bit. --- gcc/fortran/trans-expr.c| 2 +- gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 | 57 - 2 files changed, 39 insertions(+), 20 deletions(-) diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 01389373065..29697e69e75 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6640,7 +6640,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, { tmp = parmse.expr; if (TREE_CODE (tmp) == ADDR_EXPR) - tmp = build_fold_indirect_ref_loc (input_location, tmp); + tmp = TREE_OPERAND (tmp, 0); parmse.expr = gfc_conv_scalar_to_descriptor (&parmse, tmp, fsym->attr); parmse.expr = gfc_build_addr_expr (NULL_TREE, diff --git a/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 b/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 index 3b01ad3b63d..8829fd1f71b 100644 --- a/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 +++ b/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 @@ -2,7 +2,6 @@ ! ! Contributed by José Rui Faustino de Sousa ! -! Note the xfail issue below for 'strg_print_2("abc") program strp_p @@ -24,13 +23,18 @@ program strp_p if (len(str) /= 3 .or. str /= "abc") stop 1 if (len(strp_1) /= 3 .or. strp_1 /= "abc") stop 2 if (len(strp_2) /= 3 .or. strp_2 /= "abc") stop 3 - call strg_print_0("abc") ! Error (10.0.0) or segmentation fault (9.1.0) - call strg_print_0(str) ! Error (10.0.0) or segmentation fault (9.1.0) - call strg_print_0(strp_1) ! Error (10.0.0) or segmentation fault (9.1.0) - call strg_print_0(strp_2) ! Error (10.0.0) or segmentation fault (9.1.0) - call strg_print_1(strp_1) ! Not yet supported + call strg_print_0("abc") + call strg_print_0(str) + call strg_print_0(strp_1) + call strg_print_0(strp_2) + call strg_print_0_c("abc") + call strg_print_0_c(str) + call strg_print_0_c(strp_1) + call strg_print_0_c(strp_2) + call strg_print_1(strp_1) + call strg_print_1_c(strp_1) - call strg_print_2("abc", xfail=.true.) + call strg_print_2("abc") call strg_print_2(str) call strg_print_2(strp_1) call strg_print_2(strp_2) @@ -42,14 +46,21 @@ program strp_p contains - subroutine strg_print_0(this) bind(c) ! Error (10.0.0 20191106) or warning (9.1.0) issued with bind(c) + subroutine strg_print_0 (this) character(len=*, kind=c_char), target, intent(in) :: this if (len (this) /= 3) stop 10 if (this /= "abc") stop 11 end subroutine strg_print_0 + + subroutine strg_print_0_c (this) bind(c) +character(len=*, kind=c_char), target, intent(in) :: this + +if (len (this) /= 3) stop 10 +if (this /= "abc") stop 11 + end subroutine strg_print_0_c - subroutine strg_print_1(this) bind(c) ! Not yet supported with bind(c) + subroutine strg_print_1 (this) bind(c) character(len=:, kind=c_char), pointer, intent(in) :: this character(len=:), pointer :: strn @@ -63,26 +74,34 @@ contains if (this /= "abc") stop 25 end if end subroutine strg_print_1 + + subroutine strg_print_1_c (this) bind(c) +character(len=:, kind=c_char), pointer, intent(in) :: this +character(len=:), pointer :: strn + +if (.not. associated (this)) stop 20 +if (len (this) /= 3) stop 21 +if (this /= "abc") stop 22 + strn => this + if (.not. associated (strn)) stop 23 + if(associated(strn))then + if (len (this) /= 3) stop 24 + if (this /= "abc") stop 25 + end if + end subroutine strg_print_1_c - subroutine strg_print_2(this, xfail) + subroutine strg_print_2(this) use, intrinsic :: iso_c_binding, only: & c_loc, c_f_pointer type(*), target, intent(in) :: this(..) -logical, opt
Re: [PATCH] AArch64: Tune case-values-threshold
Wilco Dijkstra writes: > Hi Richard, > >> I'm just concerned that here we're using the same explanation but with >> different numbers. Why are the new numbers more right than the old ones >> (especially when it comes to code size, where the trade-off hasn't >> really changed)? > > Like all tuning/costing parameters, these values are never fixed but change > over time due to optimizations, micro architectures and workloads. > The previous values were out of date so that's why I retuned them by > benchmarking different values and choosing the best combinations. > >> It would be good to have more discussion of why certain numbers are >> too small or too high, and why 8 is the right pivot point for -Os. > > You mean add more discussion in the comment? That comment is already overly > large and too specific - it would be better to reduce it. The -Os value was > never > tuned, and 8 turns out to be faster and smaller than GCC's default. The problem is that you're effectively asking for these values to be taken on faith without providing any analysis and without describing how you arrived at the new numbers. Did you try other values too? If so, how did they compare with the numbers that you finally chose? At least that would give an indication of where the boundaries are. For example, it's easier to believe that 8 is the right value for -Os if you say that you tried 9 and 7 as well, and they were worse than 8 by X% and Y%. This would also help anyone who wants to tweak the numbers again in future. BTW, which CPU did you use to do the experiments? Are the tuning parameters for that CPU already consistent with the new generic values? Thanks, Richard
[PATCH] c++: Reject addresses of immediate functions in constexpr vars inside of immediate functions or consteval if [PR102753]
On Tue, Oct 19, 2021 at 02:00:21PM +0200, Jakub Jelinek via Gcc-patches wrote: > And another thing isn't in a patch, but I'm wondering whether we don't > handle it incorrectly. constexpr.c has: > /* Check that immediate invocation does not return an expression referencing > any immediate function decls. They need to be allowed while parsing > immediate functions, but can't leak outside of them. */ > if (is_consteval > && t != r > && (current_function_decl == NULL_TREE > || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl))) > as condition for the discovery of embedded immediate FUNCTION_DECLs > (or now PTRMEM_CSTs). If I remove the && (current... ..._decl)) > then g++.dg/cpp2a/consteval7.C's > struct S { int b; int (*c) (); }; > consteval S baz () { return { 5, foo }; } > consteval int qux () { S s = baz (); return s.b + s.c (); } > consteval int quux () { constexpr S s = baz (); return s.b + s.c (); } > quux line fails, but based on > http://eel.is/c++draft/expr.const#11 > I wonder if it shouldn't fail (clang++ -std=c++20 rejects it), > and be only accepted without the constexpr keyword before S s. Here is an incremental patch that implements that. 2021-10-19 Jakub Jelinek PR c++/102753 * constexpr.c (cxx_eval_outermost_constant_expr): Perform find_immediate_fndecl discovery if is_consteval or in_immediate_context () rather than if is_consteval, t != r and not in immediate function's body. * g++.dg/cpp2a/consteval7.C: Expect diagnostics on quux. * g++.dg/cpp2a/consteval24.C: New test. * g++.dg/cpp23/consteval-if12.C: New test. --- gcc/cp/constexpr.c.jj 2021-10-19 12:22:35.583964001 +0200 +++ gcc/cp/constexpr.c 2021-10-19 13:58:22.545182032 +0200 @@ -7472,12 +7472,8 @@ cxx_eval_outermost_constant_expr (tree t } /* Check that immediate invocation does not return an expression referencing - any immediate function decls. They need to be allowed while parsing - immediate functions, but can't leak outside of them. */ - if (is_consteval - && t != r - && (current_function_decl == NULL_TREE - || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl))) + any immediate function decls. */ + if (is_consteval || in_immediate_context ()) if (tree immediate_fndecl = cp_walk_tree_without_duplicates (&r, find_immediate_fndecl, NULL)) --- gcc/testsuite/g++.dg/cpp2a/consteval7.C.jj 2020-01-12 11:54:37.140402440 +0100 +++ gcc/testsuite/g++.dg/cpp2a/consteval7.C 2021-10-19 13:59:54.033897061 +0200 @@ -7,7 +7,7 @@ constexpr auto a = bar (); // { dg-error struct S { int b; int (*c) (); }; consteval S baz () { return { 5, foo }; } consteval int qux () { S s = baz (); return s.b + s.c (); } -consteval int quux () { constexpr S s = baz (); return s.b + s.c (); } +consteval int quux () { constexpr S s = baz (); return s.b + s.c (); } // { dg-error "immediate evaluation returns address of immediate function 'consteval int foo\\(\\)'" } constexpr auto d = baz (); // { dg-error "immediate evaluation returns address of immediate function 'consteval int foo\\(\\)'" } constexpr auto e = qux (); constexpr auto f = quux (); --- gcc/testsuite/g++.dg/cpp2a/consteval24.C.jj 2021-10-19 14:32:51.858019368 +0200 +++ gcc/testsuite/g++.dg/cpp2a/consteval24.C2021-10-19 14:49:11.618177303 +0200 @@ -0,0 +1,30 @@ +// PR c++/102753 +// { dg-do compile { target c++20 } } + +struct S { + constexpr S () : s (0) {} + consteval int foo () { return 1; } + virtual consteval int bar () { return 2; } + int s; +}; + +consteval int foo () { return 42; } +consteval auto baz () { return foo; } +consteval auto qux () { return &S::foo; } +consteval auto corge () { return &S::bar; } + +consteval int +bar () +{ + S s; + constexpr auto fn1 = foo;// { dg-error "immediate evaluation returns address of immediate function" } + constexpr auto fn2 = &foo; // { dg-error "immediate evaluation returns address of immediate function" } + constexpr auto fn3 = &S::foo;// { dg-error "immediate evaluation returns address of immediate function" } + constexpr auto fn4 = &S::bar;// { dg-error "immediate evaluation returns address of immediate function" } + constexpr auto fn5 = baz (); // { dg-error "immediate evaluation returns address of immediate function" } + constexpr auto fn6 = qux (); // { dg-error "immediate evaluation returns address of immediate function" } + constexpr auto fn7 = corge (); // { dg-error "immediate evaluation returns address of immediate function" } + return fn1 () + fn2 () + (s.*fn3) () + (s.*fn4) () + fn5 () + (s.*fn6) () + (s.*fn7) (); +} + +auto a = bar (); --- gcc/testsuite/g++.dg/cpp23/consteval-if12.C.jj 2021-10-19 14:54:05.123023731 +0200 +++ gcc/testsuite/g++.dg/cpp23/consteval-if12.C 2021-10-19 14:55:29.02684403
Re: Cleanup compute_points_to_sets
On October 19, 2021 2:35:47 PM GMT+02:00, Jan Hubicka via Gcc-patches wrote: >Hi, >this patch fixes two issues I noticed while proofreading the code. >First is that I have added conditional around setting of nonlocal and >escaped flags (since they may be set from solver) while keeping the >variable in assignment that is confusing. > >Second is that we still do not set pt in the case function has no memory >side effects. In this case the call use is not going to be used since >uses_global_memory is false only if either function is const or modref >determined that all loads are from memory pointed to by parameters. In >both cases we will disambiguate earlier before asking PTA oracle, but it >is better to avoid stale PTA sets (which shows in -alias dumps etc.) > >Most of builtins are not modifying global memory, one option would be to >stick another flag into the fnspecs strings for this property. > >Bootstrapped/regtested x86_64-linux, OK? Ok. Richard. >Honza > >gcc/ChangeLog: > > * tree-ssa-structalias.c (compute_points_to_sets): Cleanup. > >diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c >index 2e6513bb72a..35971a54e02 100644 >--- a/gcc/tree-ssa-structalias.c >+++ b/gcc/tree-ssa-structalias.c >@@ -7550,8 +7550,8 @@ compute_points_to_sets (void) >always escaped. */ > if (uses_global_memory) > { >-pt->nonlocal = uses_global_memory; >-pt->escaped = uses_global_memory; >+pt->nonlocal = 1; >+pt->escaped = 1; > } > } > else if (uses_global_memory) >@@ -7561,6 +7561,8 @@ compute_points_to_sets (void) > *pt = cfun->gimple_df->escaped; > pt->nonlocal = 1; > } >+else >+ memset (pt, 0, sizeof (struct pt_solution)); > } > > pt = gimple_call_clobber_set (stmt); >@@ -7582,8 +7584,8 @@ compute_points_to_sets (void) >always escaped. */ > if (writes_global_memory) > { >-pt->nonlocal = writes_global_memory; >-pt->escaped = writes_global_memory; >+pt->nonlocal = 1; >+pt->escaped = 1; > } > } > else if (writes_global_memory) >@@ -7593,6 +7595,8 @@ compute_points_to_sets (void) > *pt = cfun->gimple_df->escaped; > pt->nonlocal = 1; > } >+else >+ memset (pt, 0, sizeof (struct pt_solution)); > } > } > }
[committed] libstdc++: Fix std::stack deduction guide
libstdc++-v3/ChangeLog: * include/bits/stl_stack.h (stack(Iterator, Iterator)): Remove non-deducible template parameter from deduction guide. * testsuite/23_containers/stack/deduction.cc: Check new C++23 deduction guides. Tested powerpc64le-linux. Committed to trunk. commit c4ecb11e4f7ea15f636e463248c8b14083bef05d Author: Jonathan Wakely Date: Tue Oct 19 11:38:26 2021 libstdc++: Fix std::stack deduction guide libstdc++-v3/ChangeLog: * include/bits/stl_stack.h (stack(Iterator, Iterator)): Remove non-deducible template parameter from deduction guide. * testsuite/23_containers/stack/deduction.cc: Check new C++23 deduction guides. diff --git a/libstdc++-v3/include/bits/stl_stack.h b/libstdc++-v3/include/bits/stl_stack.h index 429743f5514..76b2e242c37 100644 --- a/libstdc++-v3/include/bits/stl_stack.h +++ b/libstdc++-v3/include/bits/stl_stack.h @@ -322,7 +322,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> stack; #ifdef __cpp_lib_adaptor_iterator_pair_constructor - template::value_type, typename = _RequireInputIter<_InputIterator>> diff --git a/libstdc++-v3/testsuite/23_containers/stack/deduction.cc b/libstdc++-v3/testsuite/23_containers/stack/deduction.cc index 169a063687b..dea7ba060d9 100644 --- a/libstdc++-v3/testsuite/23_containers/stack/deduction.cc +++ b/libstdc++-v3/testsuite/23_containers/stack/deduction.cc @@ -87,3 +87,17 @@ test02() std::stack s8(std::move(l), l.get_allocator()); check_type>>(s8); } + +#if __cpp_lib_adaptor_iterator_pair_constructor +void +test03() +{ + std::list l; + + std::stack s1(l.begin(), l.end()); + check_type>(s1); + + std::stack s2(l.begin(), l.end(), std::allocator()); + check_type>(s1); +} +#endif
[committed] libstdc++: Implement monadic operations for std::optional (P0798R8)
Another new addition to the C++23 working draft. The new member functions of std::optional are only defined for C++23, but the new members of _Optional_payload_base are defined for C++20 so that they can be used in non-propagating-cache in . The _Optional_payload_base::_M_construct member can also be used in non-propagating-cache now, because it's constexpr since r12-4389. There will be an LWG issue about the feature test macro, suggesting that we should just bump the value of __cpp_lib_optional instead. I haven't done that here, but it can be changed once consensus is reached on the change. libstdc++-v3/ChangeLog: * include/std/optional (_Optional_payload_base::_Storage): Add constructor taking a callable function to invoke. (_Optional_payload_base::_M_apply): New function. (__cpp_lib_monadic_optional): Define for C++23. (optional::and_then, optional::transform, optional::or_else): Define for C++23. * include/std/ranges (__detail::__cached): Remove. (__detail::__non_propagating_cache): Remove use of __cached for contained value. Use _Optional_payload_base::_M_construct and _Optional_payload_base::_M_apply to set the contained value. * include/std/version (__cpp_lib_monadic_optional): Define. * testsuite/20_util/optional/monadic/and_then.cc: New test. * testsuite/20_util/optional/monadic/or_else.cc: New test. * testsuite/20_util/optional/monadic/or_else_neg.cc: New test. * testsuite/20_util/optional/monadic/transform.cc: New test. * testsuite/20_util/optional/monadic/version.cc: New test. Tested powerpc64le-linux. Committed to trunk. commit 82b2e4f8cf5a01c6724fe3f465a77ee03cfcaae2 Author: Jonathan Wakely Date: Tue Oct 19 11:06:56 2021 libstdc++: Implement monadic operations for std::optional (P0798R8) Another new addition to the C++23 working draft. The new member functions of std::optional are only defined for C++23, but the new members of _Optional_payload_base are defined for C++20 so that they can be used in non-propagating-cache in . The _Optional_payload_base::_M_construct member can also be used in non-propagating-cache now, because it's constexpr since r12-4389. There will be an LWG issue about the feature test macro, suggesting that we should just bump the value of __cpp_lib_optional instead. I haven't done that here, but it can be changed once consensus is reached on the change. libstdc++-v3/ChangeLog: * include/std/optional (_Optional_payload_base::_Storage): Add constructor taking a callable function to invoke. (_Optional_payload_base::_M_apply): New function. (__cpp_lib_monadic_optional): Define for C++23. (optional::and_then, optional::transform, optional::or_else): Define for C++23. * include/std/ranges (__detail::__cached): Remove. (__detail::__non_propagating_cache): Remove use of __cached for contained value. Use _Optional_payload_base::_M_construct and _Optional_payload_base::_M_apply to set the contained value. * include/std/version (__cpp_lib_monadic_optional): Define. * testsuite/20_util/optional/monadic/and_then.cc: New test. * testsuite/20_util/optional/monadic/or_else.cc: New test. * testsuite/20_util/optional/monadic/or_else_neg.cc: New test. * testsuite/20_util/optional/monadic/transform.cc: New test. * testsuite/20_util/optional/monadic/version.cc: New test. diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional index b69268b3642..eac91d3c160 100644 --- a/libstdc++-v3/include/std/optional +++ b/libstdc++-v3/include/std/optional @@ -1,6 +1,7 @@ // -*- C++ -*- // Copyright (C) 2013-2021 Free Software Foundation, Inc. +// Copyright The GNU Toolchain Authors. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the @@ -44,6 +45,10 @@ #include // in_place_t #if __cplusplus > 201703L # include +# include // std::__invoke +#endif +#if __cplusplus > 202002L +# include #endif namespace std _GLIBCXX_VISIBILITY(default) @@ -81,6 +86,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// Tag to disengage optional objects. inline constexpr nullopt_t nullopt { nullopt_t::_Construct::_Token }; + template struct _Optional_func { _Fn& _M_f; }; + /** * @brief Exception class thrown when a disengaged optional object is * dereferenced. @@ -211,6 +218,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_value(__il, std::forward<_Args>(__args)...) { } +#if __cplusplus >= 202002L + template + constexpr + _Storage(_Optional_func<_Fn> __f, _Arg&& __arg) + : _M_value(std::__invoke(std::forward<_Fn>(__f._M_f), +
[committed] libstdc++: Change std::variant union member to empty struct
This more clearly expresses the intent (a completely unused, trivial type) than using char. It's also consistent with the unions in std::optional. libstdc++-v3/ChangeLog: * include/std/variant (_Uninitialized): Use an empty struct for the unused union member, instead of char. Tested powerpc64le-linux. Committed to trunk. commit 5a8832b1659e311437d25b7ec8b078be27ae54b8 Author: Jonathan Wakely Date: Tue Oct 19 11:53:27 2021 libstdc++: Change std::variant union member to empty struct This more clearly expresses the intent (a completely unused, trivial type) than using char. It's also consistent with the unions in std::optional. libstdc++-v3/ChangeLog: * include/std/variant (_Uninitialized): Use an empty struct for the unused union member, instead of char. diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index d18365fde22..3da7dad1e82 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -293,8 +293,10 @@ namespace __variant constexpr _Type&& _M_get() && noexcept { return std::move(_M_storage); } + struct _Empty_byte { }; + union { - char _M_nope; + _Empty_byte _M_empty; _Type _M_storage; }; #else
Re: [PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran
On Tue, Oct 19, 2021 at 09:03:06PM +0800, Chung-Lin Tang wrote: > 2021-10-19 Chung-Lin Tang > > gcc/fortran/ChangeLog: > > * openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default > false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case. > (gfc_match_omp_clauses): Add 'openmp_target' default false parameter, > adjust call to gfc_match_omp_clause_reduction. > (match_omp): Adjust call to gfc_match_omp_clauses > * trans-openmp.c (gfc_trans_omp_taskgroup): Add call to > gfc_match_omp_clause, create and return block. > > gcc/ChangeLog: > > * omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy > as local variable. > (scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in > ctx->outer instead of ctx. > > gcc/testsuite/ChangeLog: > > * gfortran.dg/gomp/reduction4.f90: Adjust omp target in_reduction' scan > pattern. > > libgomp/ChangeLog: > > * testsuite/libgomp.fortran/target-in-reduction-1.f90: New test. > * testsuite/libgomp.fortran/target-in-reduction-2.f90: New test. LGTM, thanks. Jakub
[PATCH] Refactor vect_supportable_dr_alignment
This refactors vect_supportable_dr_alignment to get the misalignment as input parameter which allows us to elide modifying/restoring of DR_MISALIGNMENT during alignment peeling analysis which eventually makes it more straight-forward to split out the negative step handling. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk. 2021-10-19 Richard Biener * tree-vectorizer.h (vect_supportable_dr_alignment): Add misalignment parameter. * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Do not change DR_MISALIGNMENT in place, instead pass the adjusted misalignment to vect_supportable_dr_alignment. (vect_peeling_supportable): Likewise. (vect_peeling_hash_get_lowest_cost): Adjust. (vect_enhance_data_refs_alignment): Likewise. (vect_vfa_access_size): Likewise. (vect_supportable_dr_alignment): Add misalignment parameter and simplify. * tree-vect-stmts.c (get_negative_load_store_type): Adjust. (get_group_load_store_type): Likewise. (get_load_store_type): Likewise. --- gcc/tree-vect-data-refs.c | 113 +++--- gcc/tree-vect-stmts.c | 26 + gcc/tree-vectorizer.h | 2 +- 3 files changed, 85 insertions(+), 56 deletions(-) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 0db6aec7312..556ae9725f1 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -1529,37 +1529,49 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo, unsigned int *outside_cost, stmt_vector_for_cost *body_cost_vec, stmt_vector_for_cost *prologue_cost_vec, - unsigned int npeel, - bool unknown_misalignment) + unsigned int npeel) { vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo); + bool dr0_alignment_known_p += (dr0_info + && known_alignment_for_access_p (dr0_info, + STMT_VINFO_VECTYPE (dr0_info->stmt))); + for (data_reference *dr : datarefs) { dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr); if (!vect_relevant_for_alignment_p (dr_info)) continue; - int save_misalignment; - save_misalignment = dr_info->misalignment; + tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt); + dr_alignment_support alignment_support_scheme; + int misalignment; + unsigned HOST_WIDE_INT alignment; + if (npeel == 0) - ; - else if (unknown_misalignment && dr_info == dr0_info) - SET_DR_MISALIGNMENT (dr_info, -vect_dr_misalign_for_aligned_access (dr0_info)); + misalignment = dr_misalignment (dr_info, vectype); + else if (dr_info == dr0_info + || vect_dr_aligned_if_peeled_dr_is (dr_info, dr0_info)) + misalignment = 0; + else if (!dr0_alignment_known_p + || !known_alignment_for_access_p (dr_info, vectype) + || !DR_TARGET_ALIGNMENT (dr_info).is_constant (&alignment)) + misalignment = DR_MISALIGNMENT_UNKNOWN; else - vect_update_misalignment_for_peel (dr_info, dr0_info, npeel); - /* ??? We should be able to avoid both the adjustment before and the -call to vect_supportable_dr_alignment below. */ - tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt); - int misalignment = dr_misalignment (dr_info, vectype); - dr_alignment_support alignment_support_scheme - = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype); + { + misalignment = dr_misalignment (dr_info, vectype); + misalignment += npeel * TREE_INT_CST_LOW (DR_STEP (dr_info->dr)); + misalignment &= alignment - 1; + } + alignment_support_scheme + = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, +misalignment); + vect_get_data_access_cost (loop_vinfo, dr_info, alignment_support_scheme, misalignment, inside_cost, outside_cost, body_cost_vec, prologue_cost_vec); - SET_DR_MISALIGNMENT (dr_info, save_misalignment); } } @@ -1583,7 +1595,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot, vect_get_peeling_costs_all_drs (loop_vinfo, elem->dr_info, &inside_cost, &outside_cost, &body_cost_vec, - &prologue_cost_vec, elem->npeel, false); + &prologue_cost_vec, elem->npeel); body_cost_vec.release (); @@ -1655,25 +1667,37 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info, vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo); enum dr_alignment_support supportable_dr_alignment; + bo
Re: [PATCH] rs6000: Remove unnecessary option manipulation.
Hi! On Thu, Oct 14, 2021 at 09:49:30AM +0200, Martin Liška wrote: > gcc/ChangeLog: > * config/rs6000/rs6000.c (rs6000_override_options_after_change): > Do not set flag_rename_registers, it's already default behavior. It defaults to *off*? frename-registers Common Var(flag_rename_registers) Optimization EnabledBy(funroll-loops) Perform a register renaming optimization pass. > Use EnabledBy for unroll_only_small_loops. > * config/rs6000/rs6000.opt: Use EnabledBy for > unroll_only_small_loops. > --- > gcc/config/rs6000/rs6000.c | 7 +-- > gcc/config/rs6000/rs6000.opt | 2 +- > 2 files changed, 2 insertions(+), 7 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index acba4d9f26c..40146179e06 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void) >/* Explicit -funroll-loops turns -munroll-only-small-loops off, and > turns -frename-registers on. */ >if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops) > - || (OPTION_SET_P (flag_unroll_all_loops) > -&& flag_unroll_all_loops)) > + || (OPTION_SET_P (flag_unroll_all_loops && flag_unroll_all_loops))) That doesn't do what the changelog said, and it is not obvious at all that this is correct? Maybe this works with the current implementation of that macro (I assume you tested it works :-) ), but that is not something you can depend on. This expands to global_options_set.x_flag_unroll_all_loops && flag_unroll_all_loops just like the previous code did, but that one was obvious, and this is not. > { > - if (!OPTION_SET_P (unroll_only_small_loops)) > - unroll_only_small_loops = 0; > - if (!OPTION_SET_P (flag_rename_registers)) > - flag_rename_registers = 1; >if (!OPTION_SET_P (flag_cunroll_grow_size)) > flag_cunroll_grow_size = 1; > } > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > index 9d7878f144a..faeb7423ca7 100644 > --- a/gcc/config/rs6000/rs6000.opt > +++ b/gcc/config/rs6000/rs6000.opt > @@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) > Save > Analyze and remove doubleword swaps from VSX computations. > > munroll-only-small-loops > -Target Undocumented Var(unroll_only_small_loops) Init(0) Save > +Target Undocumented Var(unroll_only_small_loops) Init(0) Save > EnabledBy(funroll-loops) Your patches cannot apply. Please send them non-wordwrapped. This isn't the endpoint of the changes here I hope? The macro games make everything less readable (so, harder to change) and more fragile. Segher
Re: [PATCH] rs6000: Remove unnecessary option manipulation.
On Fri, Oct 15, 2021 at 05:24:32PM +0200, Martin Liška wrote: > On 10/14/21 17:10, Bill Schmidt via Gcc-patches wrote: > >Looks like you got your parentheses wrong here. > > Whoops, thanks for the heads up. > > I'm testing this fixed version. Please start a new thread for every new patch (series). I missed this one like this, instead I reviewed the older one. [-- Attachment #2: 0001-rs6000-Remove-unnecessary-option-manipulation.patch --] [-- Type: text/x-patch, Encoding: base64, Size: 2.6K --] Don't encode as non-text. Don't use x-anything if you can help it, it is meaningless in email. Use git send-email, it makes everything work :-) Segher
Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr
Prathamesh Kulkarni writes: > Hi, > The attached patch emits a more verbose diagnostic for target attribute that > is an architecture extension needing a leading '+'. > > For the following test, > void calculate(void) __attribute__ ((__target__ ("sve"))); > > With patch, the compiler now emits: > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’ > 1 | void calculate(void) __attribute__ ((__target__ ("sve"))); > | ^~~~ > > instead of: > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid > 1 | void calculate(void) __attribute__ ((__target__ ("sve"))); > | ^~~~ Nice :-) > (This isn't specific to sve though). > OK to commit after bootstrap+test ? > > Thanks, > Prathamesh > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index a9a1800af53..975f7faf968 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args) >num_attrs++; >if (!aarch64_process_one_target_attr (token)) > { > - error ("pragma or attribute % is not valid", token); > + /* Check if token is possibly an arch extension without > + leading '+'. */ > + char *str = (char *) xmalloc (strlen (token) + 2); > + str[0] = '+'; > + strcpy(str + 1, token); I think std::string would be better here, e.g.: auto with_plus = std::string ("+") + token; > + if (aarch64_handle_attr_isa_flags (str)) > + error("arch extension %<%s%> should be prepended with %<+%>", > token); Nit: should be a space before the “(”. In principle, a fixit hint would have been nice here, but I don't think we have enough information to provide one. (Just saying for the record.) Thanks, Richard > + else > + error ("pragma or attribute % is not valid", > token); > + free (str); > return false; > } >
Re: [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers
On Tue, Oct 19, 2021 at 08:10:06AM -0500, Bill Schmidt via Gcc-patches wrote: > Hi Paul, > > On 10/18/21 8:15 PM, Paul A. Clarke wrote: > > Fix an ommission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665. (Typo, s/mm/m/) > > 2021-10-18 Paul A. Clarke > > > > gcc > > * config/config.gcc (extra_headers): Add nmmintrin.h. > > --- > > gcc/config.gcc | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > index aa5bd5d14590..1cb9303b3a85 100644 > > --- a/gcc/config.gcc > > +++ b/gcc/config.gcc > > @@ -490,6 +490,7 @@ powerpc*-*-*) > > extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" > > extra_headers="${extra_headers} mmintrin.h x86intrin.h" > > extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h" > > + extra_headers="${extra_headers} nmmintrin.h" > > extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h > > si2vmx.h" > > extra_headers="${extra_headers} amo.h" > > case x$with_cpu in > > In my opinion, you can commit this one as obvious. Or as trivial. Or as obvious and trivial :-) Segher
Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations
On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches wrote: > Some compatibility implementations of x86 intrinsics include > Power intrinsics which require POWER8. Guard them. I assume this improves on all previous commented things (you don't say if it does). > gcc > PR target/101893 > PR target/102719 > * config/rs6000/emmintrin.h: Guard POWER8 intrinsics. > * config/rs6000/pmmintrin.h: Same. > * config/rs6000/smmintrin.h: Same. > * config/rs6000/tmmintrin.h: Same. Okay for trunk. Thanks! Segher
[PATCH] libstdc++: Implement LWG 3549 changes to ranges::enable_view
This patch also reverts the r11-3504 workaround since it's made obsolete by this resolution. Tested on x86_64-pc-linux-gnu, does this look OK for trunk only? libstdc++-v3/ChangeLog: * include/bits/ranges_base.h (view_interface): Forward declare. (__detail::__is_derived_from_view_interface_fn): Declare (__detail::__is_derived_from_view_interface): Define as per LWG 3549. (enable_view): Adjust as per LWG 3549. * include/bits/ranges_util.h (view_interface): Don't derive from view_base. * include/std/ranges (filter_view): Revert r11-3504 change. (transform_view): Likewise. (take_view): Likewise. (take_while_view): Likewise. (drop_view): Likewise. (drop_while_view): Likewise. (join_view): Likewise. (split_view): Likewise. (reverse_view): Likewise. * testsuite/std/ranges/adaptors/sizeof.cc: Update expected sizes. * testsuite/std/ranges/view.cc (test_view::test_view): Remove now that views no longer need to be default-initializable. (test01): New test. --- libstdc++-v3/include/bits/ranges_base.h | 21 - libstdc++-v3/include/bits/ranges_util.h | 2 +- libstdc++-v3/include/std/ranges | 45 +-- .../testsuite/std/ranges/adaptors/sizeof.cc | 6 +-- libstdc++-v3/testsuite/std/ranges/view.cc | 28 ++-- 5 files changed, 70 insertions(+), 32 deletions(-) diff --git a/libstdc++-v3/include/bits/ranges_base.h b/libstdc++-v3/include/bits/ranges_base.h index 01d0c35f4b4..7801b2fd023 100644 --- a/libstdc++-v3/include/bits/ranges_base.h +++ b/libstdc++-v3/include/bits/ranges_base.h @@ -614,12 +614,31 @@ namespace ranges template using range_size_t = decltype(ranges::size(std::declval<_Range&>())); + template +requires is_class_v<_Derived> && same_as<_Derived, remove_cv_t<_Derived>> +class view_interface; // defined in + + namespace __detail + { +template + requires (!same_as<_Tp, view_interface<_Up>>) + void __is_derived_from_view_interface_fn(const _Tp&, + const view_interface<_Up>&); // not defined + +// Returns true iff _Tp has exactly one public base class that's a +// specialization of view_interface. +template + concept __is_derived_from_view_interface + = requires (_Tp __t) { __is_derived_from_view_interface_fn(__t, __t); }; + } + /// [range.view] The ranges::view_base type. struct view_base { }; /// [range.view] The ranges::enable_view boolean. template -inline constexpr bool enable_view = derived_from<_Tp, view_base>; +inline constexpr bool enable_view = derived_from<_Tp, view_base> + || __detail::__is_derived_from_view_interface<_Tp>; /// [range.view] The ranges::view concept. template diff --git a/libstdc++-v3/include/bits/ranges_util.h b/libstdc++-v3/include/bits/ranges_util.h index 1afa66d298c..5c0bef26220 100644 --- a/libstdc++-v3/include/bits/ranges_util.h +++ b/libstdc++-v3/include/bits/ranges_util.h @@ -61,7 +61,7 @@ namespace ranges /// The ranges::view_interface class template template requires is_class_v<_Derived> && same_as<_Derived, remove_cv_t<_Derived>> -class view_interface : public view_base +class view_interface { private: constexpr _Derived& _M_derived() noexcept diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index 64396027c1b..e47fc075bbe 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -1533,9 +1533,9 @@ namespace views::__adaptor { return __y.__equal(__x); } }; + _Vp _M_base = _Vp(); [[no_unique_address]] __detail::__box<_Pred> _M_pred; [[no_unique_address]] __detail::_CachedPosition<_Vp> _M_cached_begin; - _Vp _M_base = _Vp(); public: filter_view() requires (default_initializable<_Vp> @@ -1544,7 +1544,7 @@ namespace views::__adaptor constexpr filter_view(_Vp __base, _Pred __pred) - : _M_pred(std::move(__pred)), _M_base(std::move(__base)) + : _M_base(std::move(__base)), _M_pred(std::move(__pred)) { } constexpr _Vp @@ -1900,8 +1900,8 @@ namespace views::__adaptor friend _Sentinel; }; - [[no_unique_address]] __detail::__box<_Fp> _M_fun; _Vp _M_base = _Vp(); + [[no_unique_address]] __detail::__box<_Fp> _M_fun; public: transform_view() requires (default_initializable<_Vp> @@ -1910,7 +1910,7 @@ namespace views::__adaptor constexpr transform_view(_Vp __base, _Fp __fun) - : _M_fun(std::move(__fun)), _M_base(std::move(__base)) + : _M_base(std::move(__base)), _M_fun(std::move(__fun)) { } constexpr _Vp @@ -2037,15 +2037,15 @@ namespace views::__adaptor friend _Sentinel; }; - range_difference_t<_Vp> _M_count = 0; _Vp
Re: [PATCH] AArch64: Tune case-values-threshold
On 19 October 2021 15:23:58 CEST, Richard Sandiford via Gcc-patches wrote: >Wilco Dijkstra writes: >> Hi Richard, >> >>> I'm just concerned that here we're using the same explanation but with >>> different numbers. Why are the new numbers more right than the old ones >>> (especially when it comes to code size, where the trade-off hasn't >>> really changed)? >> >> Like all tuning/costing parameters, these values are never fixed but change >> over time due to optimizations, micro architectures and workloads. >> The previous values were out of date so that's why I retuned them by >> benchmarking different values and choosing the best combinations. >> >>> It would be good to have more discussion of why certain numbers are >>> too small or too high, and why 8 is the right pivot point for -Os. >> >> You mean add more discussion in the comment? That comment is already overly >> large and too specific - it would be better to reduce it. The -Os value was >> never >> tuned, and 8 turns out to be faster and smaller than GCC's default. > >The problem is that you're effectively asking for these values to be >taken on faith without providing any analysis and without describing >how you arrived at the new numbers. Did you try other values too? >If so, how did they compare with the numbers that you finally chose? >At least that would give an indication of where the boundaries are. Maybe you can show csibe benchmark numbers to show the effects: http://szeged.github.io/csibe/ thanks, > >For example, it's easier to believe that 8 is the right value for -Os if >you say that you tried 9 and 7 as well, and they were worse than 8 by X% >and Y%. This would also help anyone who wants to tweak the numbers >again in future. > >BTW, which CPU did you use to do the experiments? Are the tuning >parameters for that CPU already consistent with the new generic values? > >Thanks, >Richard
Re: [PATCH] rs6000: Remove unnecessary option manipulation.
On 10/19/21 16:23, Segher Boessenkool wrote: On Fri, Oct 15, 2021 at 05:24:32PM +0200, Martin Liška wrote: On 10/14/21 17:10, Bill Schmidt via Gcc-patches wrote: Looks like you got your parentheses wrong here. Whoops, thanks for the heads up. I'm testing this fixed version. Please start a new thread for every new patch (series). I missed this one like this, instead I reviewed the older one. Is it really best practice. My impression is that patch review (iterating over a patch) happens in the same thread (in most cases). It's caused by discussion in between sender reviewers. [-- Attachment #2: 0001-rs6000-Remove-unnecessary-option-manipulation.patch --] [-- Type: text/x-patch, Encoding: base64, Size: 2.6K --] Meh :) If I need a reply to somebody's questions, I always attach patch as an attachment. And I can't likely influence how Thunderbird is going to mark it. Don't encode as non-text. Don't use x-anything if you can help it, it is meaningless in email. Use git send-email, it makes everything work :-) I know. Anyway, sending updated version of the patch. Cheers, Martin Segher From d32d9445d8ec868abc965f167fffa10ab19417e5 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Wed, 13 Oct 2021 14:20:33 +0200 Subject: [PATCH] rs6000: Remove unnecessary option manipulation. gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_override_options_after_change): Do not set flag_rename_registers, it's already enabled with EnabledBy(funroll-loops). Use EnabledBy for unroll_only_small_loops. * config/rs6000/rs6000.opt: Use EnabledBy for unroll_only_small_loops. --- gcc/config/rs6000/rs6000.c | 7 +-- gcc/config/rs6000/rs6000.opt | 2 +- 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 01a95591a5d..b9dddcd0aa1 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void) /* Explicit -funroll-loops turns -munroll-only-small-loops off, and turns -frename-registers on. */ if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops) - || (OPTION_SET_P (flag_unroll_all_loops) - && flag_unroll_all_loops)) + || (OPTION_SET_P (flag_unroll_all_loops) && flag_unroll_all_loops)) { - if (!OPTION_SET_P (unroll_only_small_loops)) - unroll_only_small_loops = 0; - if (!OPTION_SET_P (flag_rename_registers)) - flag_rename_registers = 1; if (!OPTION_SET_P (flag_cunroll_grow_size)) flag_cunroll_grow_size = 1; } diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 9d7878f144a..faeb7423ca7 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. munroll-only-small-loops -Target Undocumented Var(unroll_only_small_loops) Init(0) Save +Target Undocumented Var(unroll_only_small_loops) Init(0) Save EnabledBy(funroll-loops) ; Use conservative small loop unrolling. mpower9-misc -- 2.33.1
Re: [PATCH] c++: Fix up push_local_extern_decl_alias error recovery [PR102642]
On 10/11/21 07:18, Jakub Jelinek wrote: Hi! My recent push_local_extern_decl_alias change broke error-recovery, do_pushdecl can return error_mark_node and set_decl_tls_model can't be called on that. There are other code paths that store error_mark_node into DECL_LOCAL_DECL_ALIAS, with the intent to differentiate the cases where we haven't yet tried to push it into the namespace scope (NULL) and one where we have tried it but it failed (error_mark_node), but looking around, there are other spots where we call functions or do processing which doesn't tolerate error_mark_node. So, the first hunk with the testcase fixes the testcase, the others fix what I've spotted and the fix was easy to figure out (there are I think 3 other spots mainly for function multiversioning). What if we use NULL_TREE for the error case instead of error_mark_node, i.e. - DECL_LOCAL_DECL_ALIAS (decl) = alias; + if (alias != error_mark_node) +DECL_LOCAL_DECL_ALIAS (decl) = alias; ? That ought to avoid the need to change other functions, since they already check for null. Ok for trunk and 11.3 (where I've backported the tls fix before) if it passes bootstrap/regtest? 2021-10-11 Jakub Jelinek PR c++/102642 * name-lookup.c (push_local_extern_decl_alias): Don't call set_decl_tls_model on error_mark_node. * decl.c (make_rtl_for_nonlocal_decl): Don't call set_user_assembler_name on error_mark_node. * parser.c (cp_parser_oacc_declare): Ignore DECL_LOCAL_DECL_ALIAS if it is error_mark_node. (cp_parser_omp_declare_target): Likewise. * g++.dg/tls/pr102642.C: New test. --- gcc/cp/name-lookup.c.jj 2021-10-01 10:30:07.674588541 +0200 +++ gcc/cp/name-lookup.c2021-10-11 12:43:39.261051228 +0200 @@ -3474,7 +3474,9 @@ push_local_extern_decl_alias (tree decl) push_nested_namespace (ns); alias = do_pushdecl (alias, /* hiding= */true); pop_nested_namespace (ns); - if (VAR_P (decl) && CP_DECL_THREAD_LOCAL_P (decl)) + if (VAR_P (decl) + && CP_DECL_THREAD_LOCAL_P (decl) + && alias != error_mark_node) set_decl_tls_model (alias, DECL_TLS_MODEL (decl)); } } --- gcc/cp/decl.c.jj2021-10-09 10:07:51.883704975 +0200 +++ gcc/cp/decl.c 2021-10-11 12:49:33.810977118 +0200 @@ -7373,7 +7373,8 @@ make_rtl_for_nonlocal_decl (tree decl, t This is horrible, as we're affecting a possibly-shared decl. Again, a one-true-decl model breaks down. */ - set_user_assembler_name (ns_decl, asmspec); + if (ns_decl != error_mark_node) + set_user_assembler_name (ns_decl, asmspec); } } --- gcc/cp/parser.c.jj 2021-10-09 10:14:24.043098112 +0200 +++ gcc/cp/parser.c 2021-10-11 12:47:21.220874667 +0200 @@ -44437,7 +44437,8 @@ cp_parser_oacc_declare (cp_parser *parse dependent local extern variable decls are as rare as hen's teeth. */ if (auto alias = DECL_LOCAL_DECL_ALIAS (decl)) - decl = alias; + if (alias != error_mark_node) + decl = alias; if (OMP_CLAUSE_MAP_KIND (t) == GOMP_MAP_LINK) id = get_identifier ("omp declare target link"); @@ -45665,7 +45666,8 @@ cp_parser_omp_declare_target (cp_parser if (VAR_OR_FUNCTION_DECL_P (t) && DECL_LOCAL_DECL_P (t) && DECL_LANG_SPECIFIC (t) - && DECL_LOCAL_DECL_ALIAS (t)) + && DECL_LOCAL_DECL_ALIAS (t) + && DECL_LOCAL_DECL_ALIAS (t) != error_mark_node) handle_omp_declare_target_clause (c, DECL_LOCAL_DECL_ALIAS (t), device_type); } --- gcc/testsuite/g++.dg/tls/pr102642.C.jj 2021-10-11 13:00:35.889503002 +0200 +++ gcc/testsuite/g++.dg/tls/pr102642.C 2021-10-11 13:00:20.388724721 +0200 @@ -0,0 +1,10 @@ +// PR c++/102642 +// { dg-do compile { target c++11 } } + +thread_local int *z; // { dg-message "previous declaration" } + +void +foo () +{ + extern thread_local int z; // { dg-error "conflicting declaration" } +} Jakub
[Patch, committed] Fortran: Fix 'fn spec' for deferred character length (was: Re: [r12-4457 Regression] FAIL: gfortran.dg/deferred_type_param_6.f90 -Os execution test on Linux/x86_64)
On 16.10.21 20:54, Jan Hubicka wrote: I wrote: Fortran has for a long time 'character(len=5), allocatable" or "character(len=*)". In the first case, the "5" can be ignored as both caller and callee know the length. In the second case, the length is determined by the argument, but it cannot be changed. Since a not-that-short while, 'len=:' together with allocatable/pointer is supported. In the latter case, the value can be change when the array association/allocation is changed. ... + if (!sym->ts.u.cl->length + && ((sym->attr.allocatable && sym->attr.target) + || sym->attr.pointer)) +spec[spec_len++] = '.'; + if (!sym->ts.u.cl->length && sym->attr.allocatable) +spec[spec_len++] = 'w'; + else +spec[spec_len++] = 'R'; Also escaping is quite important bit of information so it would be good to figure out if it really can escape rather than playing safe. The pointer to the string length variable itself does not escape, only its integer string value: subroutine foo(x) character(len=:), pointer :: x character(len=:), pointer :: y y => x has in the dump: .y = *_x; y = (character(kind=1)[1:.y] *) *x; Thus, 'w' can always be used. Committed as obvious as r12-4511-gff0eec94e87dfb7dc387f120ca5ade2707aecf50 Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit ff0eec94e87dfb7dc387f120ca5ade2707aecf50 Author: Tobias Burnus Date: Tue Oct 19 16:38:56 2021 +0200 Fortran: Fix 'fn spec' for deferred character length Shows now up with gfortran.dg/deferred_type_param_6.f90 due to more ME optimizations, causing fails without this commit. gcc/fortran/ChangeLog: * trans-types.c (create_fn_spec): For allocatable/pointer character(len=:), use 'w' not 'R' as fn spec for the length dummy argument. --- gcc/fortran/trans-types.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c index 50fceebc941..42778067dbe 100644 --- a/gcc/fortran/trans-types.c +++ b/gcc/fortran/trans-types.c @@ -3014,7 +3014,11 @@ create_fn_spec (gfc_symbol *sym, tree fntype) } if (sym->ts.type == BT_CHARACTER) { - spec[spec_len++] = 'R'; + if (!sym->ts.u.cl->length + && (sym->attr.allocatable || sym->attr.pointer)) + spec[spec_len++] = 'w'; + else + spec[spec_len++] = 'R'; spec[spec_len++] = ' '; } }
Re: [PATCH] AArch64: Tune case-values-threshold
Hi Richard, > The problem is that you're effectively asking for these values to be > taken on faith without providing any analysis and without describing > how you arrived at the new numbers. Did you try other values too? > If so, how did they compare with the numbers that you finally chose? > At least that would give an indication of where the boundaries are. Yes, I obviously tried other values, pretty much all in range 1-20. There is generally a range of 4-5 values that are very similar in size, and then you choose one in the middle which also looks good for performance. > For example, it's easier to believe that 8 is the right value for -Os if > you say that you tried 9 and 7 as well, and they were worse than 8 by X% > and Y%. This would also help anyone who wants to tweak the numbers > again in future. For -Os, the size range for values 6-10 is within 0.01% so they are virtually identical and I picked the median. Whether this will remain best in the future is unclear since it depends on so many things, so at some point it needs to be looked at again, just like most other tunings. > BTW, which CPU did you use to do the experiments? Are the tuning > parameters for that CPU already consistent with the new generic values? This was done on Neoverse N1. Almost no CPUs use per-CPU tuning for this. Cheers, Wilco
Re: [PATCH] Add a simulate_record_decl lang hook
On 10/18/21 16:35, Richard Sandiford wrote: Jason Merrill writes: On 9/24/21 13:53, Richard Sandiford wrote: + if (type == error_mark_node) +return lhd_simulate_record_decl (loc, name, fields); Why fall back to the language-independent function on error? Is there a case where that gives better error recovery than just returning error_mark_node? I don't think falling back necessarily improves future error messages (or makes them worse). The reason was more that the code to handle target builtins generally expects to be able to create whatever types and functions it wants. If we return something unexpected, even it's error_mark_node, then there's a higher risk of ICEs later on. I guess that's a bit defeatist. But in practice, the first code that uses the hook will be code that previously ran at start-up and so didn't have to worry about these errors. In practice I think errors will be extremely rare. + xref_basetypes (type, NULL_TREE); + type = begin_class_definition (type); + if (type == error_mark_node) +return lhd_simulate_record_decl (loc, name, fields); + + for (tree field : fields) +finish_member_declaration (field); + + type = finish_struct (type, NULL_TREE); + + tree decl = build_decl (loc, TYPE_DECL, ident, type); + TYPE_NAME (type) = decl; + TYPE_STUB_DECL (type) = decl; Setting TYPE_NAME and TYPE_STUB_DECL to the typedef is wrong; it should work to just remove these two lines. I expect they're also wrong for C. For C++ only, I wonder if you need this typedef at all. If you do want it, you need to use set_underlying_type to create a real typedef. I expect that's also true for C. Ah, yeah, thanks for the pointer. Fixed in the patch below. I wanted the hook to simulate the typedef even for C++ because its first user will be arm_neon.h. The spec for arm_neon.h says that the types must be declared as: typedef struct int32x2x4_t { … } int32x2x4_t; etc. So, although it's a silly edge case, code that tries to take advantage of the struct stat hack, such as: #include struct int32x2x4_t int32x2x4_t = {}; should continue to be rejected for C++ as well as C. Maybe in future we could add a flag to suppress the typedef if some callers prefer that behaviour. Tested as before. Can the C++ hook go in cp-lang.c (which already includes langhooks-def.h) instead of decl.c? With that change, the patch is OK in a week if nobody else has feedback. gcc/ * langhooks.h (lang_hooks_for_types::simulate_record_decl): New hook. * langhooks-def.h (lhd_simulate_record_decl): Declare. (LANG_HOOKS_SIMULATE_RECORD_DECL): Define. (LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it. * langhooks.c (lhd_simulate_record_decl): New function. gcc/c/ * c-tree.h (c_simulate_record_decl): Declare. * c-objc-common.h (LANG_HOOKS_SIMULATE_RECORD_DECL): Override. * c-decl.c (c_simulate_record_decl): New function. gcc/cp/ * decl.c: Include langhooks-def.h. (cxx_simulate_record_decl): New function. * cp-objcp-common.h (cxx_simulate_record_decl): Declare. (LANG_HOOKS_SIMULATE_RECORD_DECL): Override. --- gcc/c/c-decl.c | 30 ++ gcc/c/c-objc-common.h| 2 ++ gcc/c/c-tree.h | 2 ++ gcc/cp/cp-objcp-common.h | 4 gcc/cp/decl.c| 37 + gcc/langhooks-def.h | 4 gcc/langhooks.c | 19 +++ gcc/langhooks.h | 10 ++ 8 files changed, 108 insertions(+) diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index 771efa3eadf..186fa1692c1 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -9436,6 +9436,36 @@ c_simulate_enum_decl (location_t loc, const char *name, input_location = saved_loc; return enumtype; } + +/* Implement LANG_HOOKS_SIMULATE_RECORD_DECL. */ + +tree +c_simulate_record_decl (location_t loc, const char *name, + array_slice fields) +{ + location_t saved_loc = input_location; + input_location = loc; + + class c_struct_parse_info *struct_info; + tree ident = get_identifier (name); + tree type = start_struct (loc, RECORD_TYPE, ident, &struct_info); + + for (unsigned int i = 0; i < fields.size (); ++i) +{ + DECL_FIELD_CONTEXT (fields[i]) = type; + if (i > 0) + DECL_CHAIN (fields[i - 1]) = fields[i]; +} + + finish_struct (loc, type, fields[0], NULL_TREE, struct_info); + + tree decl = build_decl (loc, TYPE_DECL, ident, type); + set_underlying_type (decl); + lang_hooks.decls.pushdecl (decl); + + input_location = saved_loc; + return type; +} /* Create the FUNCTION_DECL for a function definition. DECLSPECS, DECLARATOR and ATTRIBUTES are the parts of diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h index 7d35a0621e4..f4e8271f06c 100644 --- a/gcc/c/c-objc-common.h +++ b/gcc/c/c-objc-common.h @@ -81,6 +81,8 @@ along with GCC; see the file COPYING3. I
Re: [PATCH v2 0/4] libffi: Sync with upstream
Hi, H.J. My colleague built GCC, including GCC Go, with your patch: "I was able to build libgo and test it partially. The results are similar to the current master without libffi updates. But 64bit tests aren't working in both cases. It's related to LIBPATH issues..." - David On Mon, Oct 18, 2021 at 11:09 AM H.J. Lu wrote: > > On Mon, Oct 18, 2021 at 8:04 AM David Edelsohn wrote: > > > > Hi, H.J. > > > > My colleague responded that GCC Go builds and works on AIX, but it > > currently requires a special, custom version of GNU objcopy that adds > > support for the types of features that Go requires to operate on AIX > > XCOFF files. Those changes have not yet been updated and contributed > > to GNU Binutils. > > > > I will see if I can install that version of objcopy standalone. We > > also can ask Clement and ATOS to test GCC Go build with your proposed > > libffi patch, or is it vanilla libffi trunk? > > My libffi branch: > > https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/libffi/master > > synced with libffi v3.4.2, not master. > > BTW, the current master branch won't build libgo: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102796 > > Thanks. > > > Thanks, David > > > > > > On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu wrote: > > > > > > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn wrote: > > > > > > > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu wrote: > > > > > > > > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn > > > > > wrote: > > > > > > > > > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu wrote: > > > > > > > > > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 > > > > > > > > > > > > > > and cherry-picked fixes > > > > > > > > > > > > > > from upstream over the last 10+ years. In the > > > > > > > > > > > > > > meantime, libffi upstream > > > > > > > > > > > > > > has been changed significantly with new features, > > > > > > > > > > > > > > bug fixes and new target > > > > > > > > > > > > > > support. Here is a set of patches to sync with > > > > > > > > > > > > > > libffi 3.4.2 release and > > > > > > > > > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > > > > > > > > > 2. Add scripts to help sync with upstream. > > > > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. > > > > > > > > > > > > > > It is availale at > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f > > > > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC. > > > > > > > > > > > > > > > > > > > > > > > > > > How did you test this? It looks like libgo is the > > > > > > > > > > > > > only consumer of > > > > > > > > > > > > > libffi these days. > > > > > > > > > > > > > In particular go/libgo seems to be supported on > > > > > > > > > > > > > almost all targets besides > > > > > > > > > > > > > darwin/windows - did you test cross and canadian > > > > > > > > > > > > > configurations? > > > > > > > > > > > > > > > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64. My > > > > > > > > > > > > understanding is that > > > > > > > > > > > > the upstream libffi works on Darwin and Windows. > > > > > > > > > > > > > > > > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear > > > > > > > > > > > > > you won't get any "review" > > > > > > > > > > > > > of this massive diff. > > > > > > > > > > > > > > > > > > > > > > > > I believe that it should just work. Our libffi is very > > > > > > > > > > > > much out of date. > > > > > > > > > > > > > > > > > > > > > > Yes, you can hope. And yes, our libffi is out of date. > > > > > > > > > > > > > > > > > > > > > > Can you please do the extra step to test one weird > > > > > > > > > > > architecture, namely > > > > > > > > > > > powerpc64-aix which is available on the compile-farm? > > >
Re: [PATCH] c++: Fix up push_local_extern_decl_alias error recovery [PR102642]
On Tue, Oct 19, 2021 at 10:48:04AM -0400, Jason Merrill wrote: > What if we use NULL_TREE for the error case instead of error_mark_node, i.e. > > - DECL_LOCAL_DECL_ALIAS (decl) = alias; > + if (alias != error_mark_node) > +DECL_LOCAL_DECL_ALIAS (decl) = alias; > > ? That ought to avoid the need to change other functions, since they > already check for null. True, but I'm worried about e.g. maybe_version_functions which does if (DECL_LOCAL_DECL_P (olddecl)) { olddecl = DECL_LOCAL_DECL_ALIAS (olddecl); maybe_mark_function_versioned (olddecl); } (I think the above will crash of DECL_LOCAL_DECL_ALIAS is NULL) and if (DECL_LOCAL_DECL_P (newdecl)) { /* Unfortunately, we can get here before pushdecl naturally calls push_local_extern_decl_alias, so we need to call it directly. */ if (!DECL_LOCAL_DECL_ALIAS (newdecl)) push_local_extern_decl_alias (newdecl); newdecl = DECL_LOCAL_DECL_ALIAS (newdecl); maybe_mark_function_versioned (newdecl); } which means that if there is an error from push_local_extern_decl_alias, we'd report it twice rather than once (once from here, another time from do_pushdecl). I'm afraid maybe_version_functions needs fixing no matter what, but the NULL vs. error_mark_node decision is probably dependent on whether it is ok to error twice or not. Jakub
Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations
On Tue, Oct 19, 2021 at 09:32:20AM -0500, Segher Boessenkool wrote: > On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches > wrote: > > Some compatibility implementations of x86 intrinsics include > > Power intrinsics which require POWER8. Guard them. > > I assume this improves on all previous commented things (you don't say > if it does). Sorry, I summarized the changes in the v4 cover letter. This patch required no changes other than adding a new PR addressed by it. The reasons for no changes was in my reply to your review of v3. > > gcc > > PR target/101893 > > PR target/102719 > > * config/rs6000/emmintrin.h: Guard POWER8 intrinsics. > > * config/rs6000/pmmintrin.h: Same. > > * config/rs6000/smmintrin.h: Same. > > * config/rs6000/tmmintrin.h: Same. > > Okay for trunk. Thanks! Thanks! PC
[committed] libstdc++: Fix doxygen generation to work with relative paths
In r12-826 I tried to remove some redundant steps from the doxygen build, but they are needed when configure is run as a relative path. The use of pwd is to resolve the relative path to an absolute one. libstdc++-v3/ChangeLog: * doc/Makefile.am (stamp-html-doxygen, stamp-html-doxygen) (stamp-latex-doxygen, stamp-man-doxygen): Fix recipes for relative ${top_srcdir}. * doc/Makefile.in: Regenerate. Tested x86_64-linux. Committed to trunk. I'll backport to gcc-11 after testing finishes. commit 04d392e8430ca66a3f12b7db4f3cb84788269a48 Author: Jonathan Wakely Date: Tue Oct 19 16:00:13 2021 libstdc++: Fix doxygen generation to work with relative paths In r12-826 I tried to remove some redundant steps from the doxygen build, but they are needed when configure is run as a relative path. The use of pwd is to resolve the relative path to an absolute one. libstdc++-v3/ChangeLog: * doc/Makefile.am (stamp-html-doxygen, stamp-html-doxygen) (stamp-latex-doxygen, stamp-man-doxygen): Fix recipes for relative ${top_srcdir}. * doc/Makefile.in: Regenerate. diff --git a/libstdc++-v3/doc/Makefile.am b/libstdc++-v3/doc/Makefile.am index 487e8621b23..0aacf3f27de 100644 --- a/libstdc++-v3/doc/Makefile.am +++ b/libstdc++-v3/doc/Makefile.am @@ -226,10 +226,11 @@ ${doxygen_outdir}/man: mkdir -p ${doxygen_outdir}/man stamp-xml-doxygen: ${doxygen_outdir}/xml - @builddir=`cd ..; ${PWD_COMMAND}`; \ + -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \ + builddir=`cd ..; ${PWD_COMMAND}`; \ ${SHELL} ${doxygen_script} \ --host_alias=${host_alias} --mode=xml \ - "${top_srcdir}" "$${builddir}" NO || true + "$${srcdir}" "$${builddir}" NO $(STAMP) stamp-xml-doxygen stamp-xml-single-doxygen: stamp-xml-doxygen @@ -239,17 +240,19 @@ stamp-xml-single-doxygen: stamp-xml-doxygen $(STAMP) stamp-xml-single-doxygen stamp-html-doxygen: ${doxygen_outdir}/html - @builddir=`cd ..; ${PWD_COMMAND}`; \ + -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \ + builddir=`cd ..; ${PWD_COMMAND}`; \ ${SHELL} ${doxygen_script} \ --host_alias=${host_alias} --mode=html \ - "${top_srcdir}" "$${builddir}" YES || true + "$${srcdir}" "$${builddir}" YES $(STAMP) stamp-html-doxygen stamp-latex-doxygen: ${doxygen_outdir}/latex - @builddir=`cd ..; ${PWD_COMMAND}`; \ + -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \ + builddir=`cd ..; ${PWD_COMMAND}`; \ ${SHELL} ${doxygen_script} \ --host_alias=${host_alias} --mode=latex --latex_cmd=$(LATEX_CMD) \ - "${top_srcdir}" "$${builddir}" NO || true + "$${srcdir}" "$${builddir}" NO $(STAMP) stamp-latex-doxygen # Chance of loonnggg creation time on this rule. Iff this fails, @@ -274,10 +277,11 @@ stamp-pdf-doxygen: stamp-latex-doxygen ${doxygen_outdir}/pdf $(STAMP) stamp-pdf-doxygen stamp-man-doxygen: ${doxygen_outdir}/man - @builddir=`cd ..; ${PWD_COMMAND}`; \ + -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \ + builddir=`cd ..; ${PWD_COMMAND}`; \ ${SHELL} ${doxygen_script} \ --host_alias=${host_alias} --mode=man \ - "${top_srcdir}" "$${builddir}" YES || true + "$${srcdir}" "$${builddir}" YES $(STAMP) stamp-man-doxygen doc-xml-doxygen: stamp-xml-doxygen
[committed] libstdc++: Implement std::random_device::entropy() for other sources
Currently this function only returns a non-zero value for /dev/random and /dev/urandom. When a hardware instruction such as RDRAND is in use it should (in theory) be perfectly random and produce 32 bits of entropy in each 32-bit result. Add a helper function to identify the source of randomness from the _M_func and _M_file data members, and return a suitable value when RDRAND or RDSEED is being used. libstdc++-v3/ChangeLog: * src/c++11/random.cc (which_source): New helper function. (random_device::_M_getentropy()): Use which_source and return suitable values for sources other than device files. * testsuite/26_numerics/random/random_device/entropy.cc: New test. Tested powerpc64le-linux. Committed to trunk. commit 58f339fc5eaae7db9526f81ab91f282ad4a9b8cc Author: Jonathan Wakely Date: Tue Oct 19 12:31:06 2021 libstdc++: Implement std::random_device::entropy() for other sources Currently this function only returns a non-zero value for /dev/random and /dev/urandom. When a hardware instruction such as RDRAND is in use it should (in theory) be perfectly random and produce 32 bits of entropy in each 32-bit result. Add a helper function to identify the source of randomness from the _M_func and _M_file data members, and return a suitable value when RDRAND or RDSEED is being used. libstdc++-v3/ChangeLog: * src/c++11/random.cc (which_source): New helper function. (random_device::_M_getentropy()): Use which_source and return suitable values for sources other than device files. * testsuite/26_numerics/random/random_device/entropy.cc: New test. diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc index 44b9f30e4a9..4b64bde00ea 100644 --- a/libstdc++-v3/src/c++11/random.cc +++ b/libstdc++-v3/src/c++11/random.cc @@ -192,6 +192,51 @@ namespace std _GLIBCXX_VISIBILITY(default) return lcg(); } #endif + +enum Which { + rand_s = 1, rdseed = 2, rdrand = 4, device_file = 8, prng = 16, + any = 0x +}; + +inline Which +which_source(random_device::result_type (*func [[maybe_unused]])(void*), +void* file [[maybe_unused]]) +{ +#ifdef _GLIBCXX_USE_CRT_RAND_S + if (func == &__winxp_rand_s) + return rand_s; +#endif + +#ifdef USE_RDSEED +#ifdef USE_RDRAND + if (func == &__x86_rdseed_rdrand) + return rdseed; +#endif + if (func == &__x86_rdseed) + return rdseed; +#endif + +#ifdef USE_RDRAND + if (func == &__x86_rdrand) + return rdrand; +#endif + +#ifdef _GLIBCXX_USE_DEV_RANDOM + if (file != nullptr) + return device_file; +#endif + +#ifdef USE_LCG + if (func == &__lcg) + return prng; +#endif + +#ifdef USE_MT19937 + return prng; +#endif + + return any; // should be unreachable +} } void @@ -209,10 +254,7 @@ namespace std _GLIBCXX_VISIBILITY(default) const char* fname [[gnu::unused]] = nullptr; -enum { - rand_s = 1, rdseed = 2, rdrand = 4, device_file = 8, prng = 16, - any = 0x -} which; +Which which; if (token == "default") { @@ -449,10 +491,25 @@ namespace std _GLIBCXX_VISIBILITY(default) double random_device::_M_getentropy() const noexcept { +const int max = sizeof(result_type) * __CHAR_BIT__; + +switch(which_source(_M_func, _M_file)) +{ +case rdrand: +case rdseed: + return (double) max; +case rand_s: +case prng: + return 0.0; +case device_file: + // handled below + break; +default: + return 0.0; +} + #if defined _GLIBCXX_USE_DEV_RANDOM \ && defined _GLIBCXX_HAVE_SYS_IOCTL_H && defined RNDGETENTCNT -if (!_M_file) - return 0.0; #ifdef USE_POSIX_FILE_IO const int fd = _M_fd; @@ -469,7 +526,6 @@ namespace std _GLIBCXX_VISIBILITY(default) if (ent < 0) return 0.0; -const int max = sizeof(result_type) * __CHAR_BIT__; if (ent > max) ent = max; diff --git a/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc new file mode 100644 index 000..9ef1538d2bb --- /dev/null +++ b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc @@ -0,0 +1,37 @@ +// { dg-do run { target c++11 } } + +#include +#include +#include + +void +test01() +{ + for (auto token : { "mt19937", "prng", "rand_s" }) +if (__gnu_test::random_device_available(token)) + VERIFY( std::random_device(token).entropy() == 0.0 ); + + using result_type = std::random_device::result_type; + const double max = std::log2(std::numeric_limits::max() + 1.0); + + for (auto token : { "/dev/random", "/dev/urandom" }) +if (__gnu_test::random_device_available(token)) +{ + const double entropy = std::random_device(token).entropy(); + VERIFY( entropy >= 0.0 ); + VERIFY( entropy <= max
[PATCH] libstdc++: Add support for POWER9 DARN instruction to std::random_device
The ISA-3.0 instruction set includes DARN ("deliver a random number") which can be used similar to the existing support for RDRAND and RDSEED. libstdc++-v3/ChangeLog: * src/c++11/random.cc (USE_DARN): Define. (__ppc_darn): New function to use POWER9 DARN instruction. (Which): Add 'darn' enumerator. (which_source): Check for __ppc_darn. (random_device::_M_init): Support "darn" and "hw" tokens. (random_device::_M_getentropy): Add darn to switch. * testsuite/26_numerics/random/random_device/cons/token.cc: Check "darn" token. * testsuite/26_numerics/random/random_device/entropy.cc: Likewise. Tested powerpc64le-linux (power8 and power9) and x86_64-linux. The new "darn" (power-specific) and "hw" (x86 and power) strings should be documented, but I'll do that if this gets committed. Most of this patch is just "more of the same", similar to the existing code for RDRAND and RDSEED on x86, but the parts of the patch I'd like more eyes on are: +#elif defined __powerpc__ && defined __BUILTIN_CPU_SUPPORTS__ +# define USE_DARN 1 #endif #include and: @@ -135,6 +137,15 @@ namespace std _GLIBCXX_VISIBILITY(default) #endif #endif +#ifdef USE_DARN +unsigned int +__attribute__((target("power9"))) +__ppc_darn(void*) +{ + return __builtin_darn_32(); +} +#endif + and: @@ -346,6 +375,17 @@ namespace std _GLIBCXX_VISIBILITY(default) } #endif // USE_RDRAND +#ifdef USE_DARN +if (which & darn) + { + if (__builtin_cpu_supports("darn")) + { + _M_func = &__ppc_darn; + return; + } + } +#endif // USE_DARN + #ifdef _GLIBCXX_USE_DEV_RANDOM if (which & device_file) { commit 5cfef2d435b5cb3b3e959e14e4b1edde8edea473 Author: Jonathan Wakely Date: Tue Oct 19 12:53:00 2021 libstdc++: Add support for POWER9 DARN instruction to std::random_device The ISA-3.0 instruction set includes DARN ("deliver a random number") which can be used similar to the existing support for RDRAND and RDSEED. libstdc++-v3/ChangeLog: * src/c++11/random.cc (USE_DARN): Define. (__ppc_darn): New function to use POWER9 DARN instruction. (Which): Add 'darn' enumerator. (which_source): Check for __ppc_darn. (random_device::_M_init): Support "darn" and "hw" tokens. (random_device::_M_getentropy): Add darn to switch. * testsuite/26_numerics/random/random_device/cons/token.cc: Check "darn" token. * testsuite/26_numerics/random/random_device/entropy.cc: Likewise. diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc index 4b64bde00ea..96d59799c2c 100644 --- a/libstdc++-v3/src/c++11/random.cc +++ b/libstdc++-v3/src/c++11/random.cc @@ -37,6 +37,8 @@ # ifdef _GLIBCXX_X86_RDSEED # define USE_RDSEED 1 # endif +#elif defined __powerpc__ && defined __BUILTIN_CPU_SUPPORTS__ +# define USE_DARN 1 #endif #include @@ -69,7 +71,7 @@ #if defined _GLIBCXX_USE_CRT_RAND_S || defined _GLIBCXX_USE_DEV_RANDOM // The OS provides a source of randomness we can use. # pragma GCC poison _M_mt -#elif defined USE_RDRAND || defined USE_RDSEED +#elif defined USE_RDRAND || defined USE_RDSEED || defined USE_DARN // Hardware instructions might be available, but use cpuid checks at runtime. # pragma GCC poison _M_mt // If the runtime cpuid checks fail we'll use a linear congruential engine. @@ -135,6 +137,15 @@ namespace std _GLIBCXX_VISIBILITY(default) #endif #endif +#ifdef USE_DARN +unsigned int +__attribute__((target("power9"))) +__ppc_darn(void*) +{ + return __builtin_darn_32(); +} +#endif + #ifdef _GLIBCXX_USE_CRT_RAND_S unsigned int __winxp_rand_s(void*) @@ -193,11 +204,16 @@ namespace std _GLIBCXX_VISIBILITY(default) } #endif -enum Which { - rand_s = 1, rdseed = 2, rdrand = 4, device_file = 8, prng = 16, +enum Which : unsigned { + device_file = 1, prng = 2, rand_s = 4, + rdseed = 64, rdrand = 128, darn = 256, any = 0x }; +constexpr Which +operator|(Which l, Which r) noexcept +{ return Which(unsigned(l) | unsigned(r)); } + inline Which which_source(random_device::result_type (*func [[maybe_unused]])(void*), void* file [[maybe_unused]]) @@ -221,6 +237,11 @@ namespace std _GLIBCXX_VISIBILITY(default) return rdrand; #endif +#ifdef USE_DARN + if (func == &__ppc_darn) + return darn; +#endif + #ifdef _GLIBCXX_USE_DEV_RANDOM if (file != nullptr) return device_file; @@ -269,6 +290,14 @@ namespace std _GLIBCXX_VISIBILITY(default) else if (token == "rdrand" || token == "rdrnd") which = rdrand; #endif // USE_RDRAND +#ifdef USE_DARN +else if (token == "darn") + which = darn; +#endif +#if defined USE_RDRAND || defined USE_RDSEED || defined USE_DA
Re: [PATCH v2 0/4] libffi: Sync with upstream
On Tue, Oct 19, 2021 at 8:03 AM David Edelsohn wrote: > > Hi, H.J. > > My colleague built GCC, including GCC Go, with your patch: > > "I was able to build libgo and test it partially. The results are > similar to the current master without libffi updates. But 64bit tests > aren't working in both cases. It's related to LIBPATH issues..." > Thanks for checking. I will rebase and retest. If there is no regression, I will check them in. Thanks. -- H.J.
[PATCH] x86: Adjust gcc.target/i386/pr22076.c
commit 247c407c83f0015f4b92d5f71e45b63192f6757e Author: Roger Sayle Date: Mon Oct 18 12:15:40 2021 +0100 Try placing RTL folded constants in the constant pool. My recent attempts to come up with a testcase for my patch to evaluate ss_plus in simplify-rtx.c, identified a missed optimization opportunity (that's potentially a long-time regression): The RTL optimizers no longer place constants in the constant pool. changed -m32 codegen from movq.LC1, %mm0 paddb .LC0, %mm0 movq%mm0, x ret to movl$807671820, %eax movl$1616136252, %edx movl%eax, x movl%edx, x+4 ret and -m64 codegen from movq.LC1(%rip), %mm0 paddb .LC0(%rip), %mm0 movq%xmm0, x(%rip) ret to movq.LC2(%rip), %rax movq%rax, x(%rip) ret Adjust pr22076.c to check that MMX register isn't used since avoiding MMX register isn't a bad thing. PR testsuite/102840 * gcc.target/i386/pr22076.c: Updated to check that MMX register isn't used. --- gcc/testsuite/gcc.target/i386/pr22076.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/pr22076.c b/gcc/testsuite/gcc.target/i386/pr22076.c index 427ffcd4920..aa06f057690 100644 --- a/gcc/testsuite/gcc.target/i386/pr22076.c +++ b/gcc/testsuite/gcc.target/i386/pr22076.c @@ -15,5 +15,6 @@ void test () x = _mm_add_pi8 (mm0, mm1); } -/* { dg-final { scan-assembler-times "movq" 2 } } */ -/* { dg-final { scan-assembler-not "movl" { target nonpic } } } */ +/* { dg-final { scan-assembler-times "movq" 2 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "movl" 4 { target { nonpic && ia32 } } } } */ +/* { dg-final { scan-assembler-not "%mm" } } */ -- 2.32.0
[PATCH] libstdc++: Implement LWG 3470 change to ranges::subrange
Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches? libstdc++-v3/ChangeLog: * include/bits/ranges_util.h (__detail::__uses_nonqualification_pointer_conversion): Define and use it ... (__convertible_to_nonslicing): ... here, as per LWG 3470. * testsuite/std/ranges/subrange/1.cc: New test. --- libstdc++-v3/include/bits/ranges_util.h | 13 + .../testsuite/std/ranges/subrange/1.cc| 19 +++ 2 files changed, 28 insertions(+), 4 deletions(-) create mode 100644 libstdc++-v3/testsuite/std/ranges/subrange/1.cc diff --git a/libstdc++-v3/include/bits/ranges_util.h b/libstdc++-v3/include/bits/ranges_util.h index 7e7b958d274..765848e327d 100644 --- a/libstdc++-v3/include/bits/ranges_util.h +++ b/libstdc++-v3/include/bits/ranges_util.h @@ -184,11 +184,16 @@ namespace ranges namespace __detail { -template +template + concept __uses_nonqualification_pointer_conversion + = is_pointer_v<_From> && is_pointer_v<_To> + && !convertible_to(*)[], +remove_pointer_t<_To>(*)[]>; + +template concept __convertible_to_non_slicing = convertible_to<_From, _To> - && !(is_pointer_v> && is_pointer_v> - && __different_from>, - remove_pointer_t>>); + && !__uses_nonqualification_pointer_conversion, + decay_t<_To>>; template concept __pair_like diff --git a/libstdc++-v3/testsuite/std/ranges/subrange/1.cc b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc new file mode 100644 index 000..8a53261c78c --- /dev/null +++ b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc @@ -0,0 +1,19 @@ +// { dg-options "-std=gnu++20" } +// { dg-do run { target c++20 } } + +#include + +void +test01() +{ + // LWG 3470 + int a[3] = {1,2,3}; + int* b[3] = {&a[2], &a[0], &a[1]}; + auto c = std::ranges::subrange(b); +} + +int +main() +{ + test01(); +} -- 2.33.1.711.g9d530dc002
[PATCH] libstdc++: Implement LWG 3568 change to ranges::basic_istream_view
Tested on x86_64-pc-linux-gnu, does this look OK for trunk? (The branches don't have P2325R3.) libstdc++-v3/ChangeLog: * include/std/ranges (basic_istream_view::_M_object): Value initialize as per LWG 3568. --- libstdc++-v3/include/std/ranges | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index 5e18c98eb2f..4a90f115d2f 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -711,7 +711,7 @@ namespace views private: basic_istream<_CharT, _Traits>* _M_stream; - _Val _M_object; + _Val _M_object = _Val(); struct _Iterator { -- 2.33.1.711.g9d530dc002
[PATCH] libstdc++: Implement LWG 3580 changes to ranges::iota_view
Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches? libstdc++-v3/ChangeLog: * include/std/ranges (iota_view::operator+): Adjust definition as per LWG 3580. (iota_view::operator-): Likewise. --- libstdc++-v3/include/std/ranges | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index 18bd087985c..5e18c98eb2f 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -497,7 +497,10 @@ namespace ranges friend constexpr _Iterator operator+(_Iterator __i, difference_type __n) requires __detail::__advanceable<_Winc> - { return __i += __n; } + { + __i += __n; + return __i; + } friend constexpr _Iterator operator+(difference_type __n, _Iterator __i) @@ -507,7 +510,10 @@ namespace ranges friend constexpr _Iterator operator-(_Iterator __i, difference_type __n) requires __detail::__advanceable<_Winc> - { return __i -= __n; } + { + __i -= __n; + return __i; + } friend constexpr difference_type operator-(const _Iterator& __x, const _Iterator& __y) -- 2.33.1.711.g9d530dc002
Re: [PATCH] libstdc++: Implement LWG 3580 changes to ranges::iota_view
On Tue, 19 Oct 2021, 19:33 Patrick Palka via Libstdc++, < libstd...@gcc.gnu.org> wrote: > Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches? > Yes, thanks. > libstdc++-v3/ChangeLog: > > * include/std/ranges (iota_view::operator+): Adjust definition > as per LWG 3580. > (iota_view::operator-): Likewise. > --- > libstdc++-v3/include/std/ranges | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/libstdc++-v3/include/std/ranges > b/libstdc++-v3/include/std/ranges > index 18bd087985c..5e18c98eb2f 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -497,7 +497,10 @@ namespace ranges > friend constexpr _Iterator > operator+(_Iterator __i, difference_type __n) > requires __detail::__advanceable<_Winc> > - { return __i += __n; } > + { > + __i += __n; > + return __i; > + } > > friend constexpr _Iterator > operator+(difference_type __n, _Iterator __i) > @@ -507,7 +510,10 @@ namespace ranges > friend constexpr _Iterator > operator-(_Iterator __i, difference_type __n) > requires __detail::__advanceable<_Winc> > - { return __i -= __n; } > + { > + __i -= __n; > + return __i; > + } > > friend constexpr difference_type > operator-(const _Iterator& __x, const _Iterator& __y) > -- > 2.33.1.711.g9d530dc002 > >
Re: [PATCH] libstdc++: Implement LWG 3568 change to ranges::basic_istream_view
On Tue, 19 Oct 2021, 19:31 Patrick Palka via Libstdc++, < libstd...@gcc.gnu.org> wrote: > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > Yes, thanks. (The branches don't have P2325R3.) > > libstdc++-v3/ChangeLog: > > * include/std/ranges (basic_istream_view::_M_object): Value > initialize as per LWG 3568. > --- > libstdc++-v3/include/std/ranges | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/libstdc++-v3/include/std/ranges > b/libstdc++-v3/include/std/ranges > index 5e18c98eb2f..4a90f115d2f 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -711,7 +711,7 @@ namespace views > > private: >basic_istream<_CharT, _Traits>* _M_stream; > - _Val _M_object; > + _Val _M_object = _Val(); > >struct _Iterator >{ > -- > 2.33.1.711.g9d530dc002 > >
Re: [PATCH] libstdc++: Implement LWG 3470 change to ranges::subrange
On Tue, 19 Oct 2021, 19:30 Patrick Palka via Libstdc++, < libstd...@gcc.gnu.org> wrote: > Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches? > Yes, thanks. > libstdc++-v3/ChangeLog: > > * include/bits/ranges_util.h > (__detail::__uses_nonqualification_pointer_conversion): Define > and use it ... > (__convertible_to_nonslicing): ... here, as per LWG 3470. > * testsuite/std/ranges/subrange/1.cc: New test. > --- > libstdc++-v3/include/bits/ranges_util.h | 13 + > .../testsuite/std/ranges/subrange/1.cc| 19 +++ > 2 files changed, 28 insertions(+), 4 deletions(-) > create mode 100644 libstdc++-v3/testsuite/std/ranges/subrange/1.cc > > diff --git a/libstdc++-v3/include/bits/ranges_util.h > b/libstdc++-v3/include/bits/ranges_util.h > index 7e7b958d274..765848e327d 100644 > --- a/libstdc++-v3/include/bits/ranges_util.h > +++ b/libstdc++-v3/include/bits/ranges_util.h > @@ -184,11 +184,16 @@ namespace ranges > >namespace __detail >{ > -template > +template > + concept __uses_nonqualification_pointer_conversion > + = is_pointer_v<_From> && is_pointer_v<_To> > + && !convertible_to(*)[], > +remove_pointer_t<_To>(*)[]>; > + > +template >concept __convertible_to_non_slicing = convertible_to<_From, _To> > - && !(is_pointer_v> && is_pointer_v> > - && __different_from>, > - remove_pointer_t>>); > + && !__uses_nonqualification_pointer_conversion, > + decay_t<_To>>; > > template >concept __pair_like > diff --git a/libstdc++-v3/testsuite/std/ranges/subrange/1.cc > b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc > new file mode 100644 > index 000..8a53261c78c > --- /dev/null > +++ b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc > @@ -0,0 +1,19 @@ > +// { dg-options "-std=gnu++20" } > +// { dg-do run { target c++20 } } > + > +#include > + > +void > +test01() > +{ > + // LWG 3470 > + int a[3] = {1,2,3}; > + int* b[3] = {&a[2], &a[0], &a[1]}; > + auto c = std::ranges::subrange(b); > +} > + > +int > +main() > +{ > + test01(); > +} > -- > 2.33.1.711.g9d530dc002 > >
Re: [PATCH] c++: Fix up push_local_extern_decl_alias error recovery [PR102642]
On 10/19/21 11:21, Jakub Jelinek wrote: On Tue, Oct 19, 2021 at 10:48:04AM -0400, Jason Merrill wrote: What if we use NULL_TREE for the error case instead of error_mark_node, i.e. - DECL_LOCAL_DECL_ALIAS (decl) = alias; + if (alias != error_mark_node) +DECL_LOCAL_DECL_ALIAS (decl) = alias; ? That ought to avoid the need to change other functions, since they already check for null. True, but I'm worried about e.g. maybe_version_functions which does if (DECL_LOCAL_DECL_P (olddecl)) { olddecl = DECL_LOCAL_DECL_ALIAS (olddecl); maybe_mark_function_versioned (olddecl); } (I think the above will crash of DECL_LOCAL_DECL_ALIAS is NULL) and if (DECL_LOCAL_DECL_P (newdecl)) { /* Unfortunately, we can get here before pushdecl naturally calls push_local_extern_decl_alias, so we need to call it directly. */ if (!DECL_LOCAL_DECL_ALIAS (newdecl)) push_local_extern_decl_alias (newdecl); newdecl = DECL_LOCAL_DECL_ALIAS (newdecl); maybe_mark_function_versioned (newdecl); } which means that if there is an error from push_local_extern_decl_alias, we'd report it twice rather than once (once from here, another time from do_pushdecl). I'm afraid maybe_version_functions needs fixing no matter what, but the NULL vs. error_mark_node decision is probably dependent on whether it is ok to error twice or not. Ah, true. Your earlier patch is OK. Jason
Re: [PATCH] c++: error message for dependent template members [PR70417]
On 10/10/21 07:28, Anthony Sharp wrote: Hi Jason, Hope you are well. Thanks for the feedback. I like adding the configurability, but I think let's keep committing as the default behavior. And adding the parameter to the rollback function seems unnecessary. For the behavior argument, let's use an enum to be clearer. Sure, all done. The declaration of the enum could have gone in many places it seems but I put it in cp-tree.h. I'd put it just above the definition of saved_token_sentinel in parser.c. I've removed the awkward call to cp_parser_require and put it (along with the call to cp_parser_skip_to_end_of_template_parameter_list) in its own new function called cp_parser_ensure_reached_end_of_template_parameter_list. Maybe cp_parser_require_end_of_template_parameter_list? Either way is fine. I think we don't want to return early here, we want to go through the same ( check that we do for regular names. I have changed it to do that, but could something like "operator- <" ever be intended as something other than the start of a template-id? Hmm, good point; operators that are member functions must be non-static, so we couldn't be doing a comparison of the address of the function. +// { dg-warning "expected \"template\" keyword before dependent template name" { target c++20 } .-1 } +// ^ bogus warning caused by the above being treated as an expression, not a declaration > Hmm, that's a problem. Can you avoid it by checking declarator_p? We actually already check declarator_p in cp_parser_id_expression in that case. The reason it throws a warning is because typename14.C is intentionally malformed; in the eyes of the compiler it's written like an expression because it's missing the return type (although, even adding a return type would not make it valid). I'm not sure there's any worthwhile way around this really. But it isn't written like an expression: the error comes when trying to diagnose it as an invalid type in a declaration context. So declarator_p should be true there. I'll fix that. Also, some more missing template keywords seemed to crop up in the regression tests, so I had to sprinkle some more template keywords in a few. I guess there was a hidden bug that was missing a few scenarios. Just out of curiosity, how pedantic would you say I should be when submitting patches to GCC etc? I regression test and bootstrap all my changes, but I'm always worrying about what might happen if I somehow forgot to stage a file in git, or attached the wrong patch file, or interpreted the .sum file wrong etc. Do patches go through another round of testing after submitting? Not automatically (yet), but often I test other people's patches before applying them. Sometimes I wonder whether I should be applying the patch locally and then bootstrapping and regression testing it again, although that's hardly fun since that usually takes around 2-3 hours even with -j 12. Maybe I ought to think about getting a dedicated Linux PC. You could also apply for a GCC Compile Farm account: https://gcc.gnu.org/wiki/CompileFarm + if (next_token->keyword == RID_TEMPLATE) +{ + /* But at least make sure it's properly formed (e.g. see PR19397). */ + if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_NAME) + return 1; + + return -1; +} + + /* Could be a ~ referencing the destructor of a class template. */ + if (next_token->type == CPP_COMPL) +{ + /* It could only be a template. */ + if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_NAME) + return 1; + + return -1; +} Why don't these check for the < ? + /* E.g. "::operator- <>". */ + if (next_token->keyword == RID_OPERATOR) +{ + /* Consume it so it doesn't get in our way. */ + cp_lexer_consume_token (parser->lexer); + next_token = cp_lexer_peek_token (parser->lexer); + found_operator_keyword = true; +} ... + if (!found_operator_keyword && next_token->type != CPP_NAME) +return -1; These could be if/else if so you don't need the found_operator_keyword variable. + if (next_token->type == CPP_TEMPLATE_ID) +return 1; This could move above the saved_token_sentinel; you won't have a CPP_TEMPLATE_ID after RID_OPERATOR. + /* If this is a function template then we would see a "(" after the + final ">". It could also be a class template constructor. */ + if (next_token->type == CPP_OPEN_PAREN + /* If this is a class template then we could see a scope token after + the final ">". */ + || next_token->type == CPP_SCOPE + /* We could see a ";" after a variable template. */ + || next_token->type == CPP_SEMICOLON + /* We could see something like +friend vect (::operator- <>)( const vect&, const vect& ); +*/ + || next_token->type == CPP_CLOSE_PAREN) +return 1; This seems too limited. As I was saying before,
Re: [PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]
On Thu, Oct 14, 2021 at 5:13 PM Fangrui Song wrote: > > On 2021-10-06, Fangrui Song wrote: > >On 2021-09-27, Fangrui Song wrote: > >>On 2021-09-27, Florian Weimer wrote: > >>>* Fangrui Song: > >>> > Sanitizer runtimes need static TLS boundaries for a variety of use cases. > > * asan/hwasan/msan/tsan need to unpoison static TLS blocks to prevent > false > positives due to reusing the TLS blocks with a previous thread. > * lsan needs TCB for pointers into pthread_setspecific regions. > > See https://maskray.me/blog/2021-02-14-all-about-thread-local-storage > for details. > > compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp GetTls has > to infer the static TLS bounds from TP, _dl_get_tls_static_info, and > hard-coded TCB sizes. Currently this is somewhat robust for > aarch64/powerpc64/x86-64 but is brittle for many other architectures. > > This patch implements __libc_get_static_tls_bounds@@GLIBC_PRIVATE which > is available in Android bionic since API level 31. This API allows the > sanitizer code to be more robust. _dl_get_tls_static_info@@GLIBC_PRIVATE > can probably be removed when Clang/GCC sanitizers drop reliance on it. > I am unclear whether the version should be GLIBC_2.*. > >>> > >>>Does this really cover the right memory region? I assume LSAN needs > >>>something that identifies pointers to malloc'ed memory that are stored > >>>in non-malloc'ed (mmap'ed) memory. The static TLS region is certainly a > >>>place where such pointers can be stored. But struct pthread also > >>>contains other such pointers: the DTV, the TPP data, and POSIX TLS > >>>(pthread_setspecific) data, and struct pthread is not obviously part of > >>>the static TLS region. > >> > >>I know the pthread_setspecific leak detection is brittle but it is > >>currently implemented this way ;-) > >> > >>https://maskray.me/blog/2021-02-14-all-about-thread-local-storage says > >> > >>"On glibc, GetTls returned range includes > >>pthread::{specific_1stblock,specific} for thread-specific data keys. > >>There is currently a hack to ignore allocations from ld.so allocated > >>dynamic TLS blocks. Note: if the pthread::{specific_1stblock,specific} > >>pointers are encrypted, lsan cannot track the allocation." > >> > >>If pthread::{specific_1stblock,specific} use an XOR technique (like > >>__cxa_atexit/setjmp) the pthread_setspecific leak detection will stop > >>working :( > >> > >>--- > >> > >>In any case, the pthread_setspecific leak detection is a relatively > >>minor issue. The big issue is asan/msan/tsan false positives due to > >>reusing an (exited) thread stack or its TLS blocks. > >> > >>Around > >>https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp.html#435 > >>there is very long messy code hard coding the thread descriptor size in > >>glibc. > >> > >>Android `__libc_get_static_tls_bounds(&start_addr, &end_addr);` is the > >>most robust one. > >> > >>--- > >> > >>I ported sanitizers to musl (https://reviews.llvm.org/D93848) > >>in LLVM 12.0.0 and fixed some TLS block detection aarch64/ppc64 issues > >>(https://reviews.llvm.org/D98926 and its follow-up, due to the > >>complexity I couldn't get it right in the first place), so I have some > >>understanding about sanitizers' TLS usage. > > > >Adhemerval showed me that the __libc_get_static_tls_bounds behavior is > >expected on aarch64 as well ( > >__libc_get_static_tls_bounds should match sanitizer GetTls) > > > >From https://gist.github.com/MaskRay/e035b85dce008f0c6d4997b98354d355 > >``` > >$ ./testrun.sh ./test-tls-boundary > >+++GetTls: 0x7f9c5fd6c000 4416 > >get_tls=0x7f9c600b4050 > >_dl_get_tls_static_info: 4416 64 > >get_static=0x7f9c600b4070 > >__libc_get_static_tls_bounds: 0x7f9c5fd6c000 4416 > >``` > > > > > > > >Is there any concern adding the interface? > > Gentle ping... CC gcc-patches which ports compiler-rt and may be interested in more reliable sanitizers.
[committed] doc: Fix typo in name of PowerPC __builtin_cpu_supports built-in
gcc/ChangeLog: * doc/extend.texi (Basic PowerPC Built-in Functions): Fix typo. Committed as obvious. commit c6a1fdd6dde3a95997731c8339d70970aca67594 Author: Jonathan Wakely Date: Tue Oct 19 20:37:53 2021 doc: Fix typo in name of PowerPC __builtin_cpu_supports built-in gcc/ChangeLog: * doc/extend.texi (Basic PowerPC Built-in Functions): Fix typo. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 10d466fae9a..3c942d81c32 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -17698,7 +17698,7 @@ macro @code{__BUILTIN_CPU_SUPPORTS__} if the @code{__builtin_cpu_supports} built-in function is fully supported. If GCC was configured to use a GLIBC before 2.23, the built-in -function @code{__builtin_cpu_suports} always returns a 0 and the +function @code{__builtin_cpu_supports} always returns a 0 and the compiler issues a warning. The following features can be
Re: [PATCH] libstdc++: Implement LWG 3549 changes to ranges::enable_view
On Tue, 19 Oct 2021 at 15:35, Patrick Palka via Libstdc++ < libstd...@gcc.gnu.org> wrote: > This patch also reverts the r11-3504 workaround since it's made obsolete > by this resolution. > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk only? > OK, thanks. > libstdc++-v3/ChangeLog: > > * include/bits/ranges_base.h (view_interface): Forward declare. > (__detail::__is_derived_from_view_interface_fn): Declare > (__detail::__is_derived_from_view_interface): Define as per LWG > 3549. > (enable_view): Adjust as per LWG 3549. > * include/bits/ranges_util.h (view_interface): Don't derive from > view_base. > * include/std/ranges (filter_view): Revert r11-3504 change. > (transform_view): Likewise. > (take_view): Likewise. > (take_while_view): Likewise. > (drop_view): Likewise. > (drop_while_view): Likewise. > (join_view): Likewise. > (split_view): Likewise. > (reverse_view): Likewise. > * testsuite/std/ranges/adaptors/sizeof.cc: Update expected > sizes. > * testsuite/std/ranges/view.cc (test_view::test_view): Remove > now that views no longer need to be default-initializable. > (test01): New test. > --- > libstdc++-v3/include/bits/ranges_base.h | 21 - > libstdc++-v3/include/bits/ranges_util.h | 2 +- > libstdc++-v3/include/std/ranges | 45 +-- > .../testsuite/std/ranges/adaptors/sizeof.cc | 6 +-- > libstdc++-v3/testsuite/std/ranges/view.cc | 28 ++-- > 5 files changed, 70 insertions(+), 32 deletions(-) > > diff --git a/libstdc++-v3/include/bits/ranges_base.h > b/libstdc++-v3/include/bits/ranges_base.h > index 01d0c35f4b4..7801b2fd023 100644 > --- a/libstdc++-v3/include/bits/ranges_base.h > +++ b/libstdc++-v3/include/bits/ranges_base.h > @@ -614,12 +614,31 @@ namespace ranges >template > using range_size_t = decltype(ranges::size(std::declval<_Range&>())); > > + template > +requires is_class_v<_Derived> && same_as<_Derived, > remove_cv_t<_Derived>> > +class view_interface; // defined in > + > + namespace __detail > + { > +template > + requires (!same_as<_Tp, view_interface<_Up>>) > + void __is_derived_from_view_interface_fn(const _Tp&, > + const > view_interface<_Up>&); // not defined > + > +// Returns true iff _Tp has exactly one public base class that's a > +// specialization of view_interface. > +template > + concept __is_derived_from_view_interface > + = requires (_Tp __t) { __is_derived_from_view_interface_fn(__t, > __t); }; > + } > + >/// [range.view] The ranges::view_base type. >struct view_base { }; > >/// [range.view] The ranges::enable_view boolean. >template > -inline constexpr bool enable_view = derived_from<_Tp, view_base>; > +inline constexpr bool enable_view = derived_from<_Tp, view_base> > + || __detail::__is_derived_from_view_interface<_Tp>; > >/// [range.view] The ranges::view concept. >template > diff --git a/libstdc++-v3/include/bits/ranges_util.h > b/libstdc++-v3/include/bits/ranges_util.h > index 1afa66d298c..5c0bef26220 100644 > --- a/libstdc++-v3/include/bits/ranges_util.h > +++ b/libstdc++-v3/include/bits/ranges_util.h > @@ -61,7 +61,7 @@ namespace ranges >/// The ranges::view_interface class template >template > requires is_class_v<_Derived> && same_as<_Derived, > remove_cv_t<_Derived>> > -class view_interface : public view_base > +class view_interface > { > private: >constexpr _Derived& _M_derived() noexcept > diff --git a/libstdc++-v3/include/std/ranges > b/libstdc++-v3/include/std/ranges > index 64396027c1b..e47fc075bbe 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -1533,9 +1533,9 @@ namespace views::__adaptor > { return __y.__equal(__x); } >}; > > + _Vp _M_base = _Vp(); >[[no_unique_address]] __detail::__box<_Pred> _M_pred; >[[no_unique_address]] __detail::_CachedPosition<_Vp> > _M_cached_begin; > - _Vp _M_base = _Vp(); > > public: >filter_view() requires (default_initializable<_Vp> > @@ -1544,7 +1544,7 @@ namespace views::__adaptor > >constexpr >filter_view(_Vp __base, _Pred __pred) > - : _M_pred(std::move(__pred)), _M_base(std::move(__base)) > + : _M_base(std::move(__base)), _M_pred(std::move(__pred)) >{ } > >constexpr _Vp > @@ -1900,8 +1900,8 @@ namespace views::__adaptor > friend _Sentinel; > }; > > - [[no_unique_address]] __detail::__box<_Fp> _M_fun; >_Vp _M_base = _Vp(); > + [[no_unique_address]] __detail::__box<_Fp> _M_fun; > > public: >transform_view() requires (default_initializable<_Vp> > @@ -1910,7 +1910,7 @@ namespace views::__adaptor > >constexpr >tra
Re: [PATCH] gcc: implement AIX-style constructors
Clement, + /* Use __C_runtime_pstartup to run ctors and register dtors. + This whole part should normally be in libgcc but as + AIX cdtors format is currently not the default, managed + that in collect2. */ Why are you emitting the special startup function call in collect2.c instead of placing it in libgcc. The comment mentions that the special startup function should be defined in libgcc. Yes, the AIX ld bcdtors mechanism is not the default, but what is the harm? The symbol will be defined and exported by libgcc. If the AIX linker -bcdtors functionality is not invoked, the symbol is not used. And if a user does invoke the AIX linker with -bcdtors, the behavior will be the same (either the program was compiled to use AIX cdtors or not, which is the same if the startup function is emitted by collect2.c. Also, the patch should include documentation of the option. The documentation should mention that this is for interoperability with IBM XL Compiler, and the option will not operate correctly unless the application and the GCC runtime are built with the option. Thanks, David On Mon, Oct 18, 2021 at 3:55 AM CHIGOT, CLEMENT wrote: > > AIX linker now supports constructors and destructors detection. For such > functions to be detected, their name must starts with __sinit or __sterm. > and -bcdtors must be passed to linker calls. It will create "_cdtors" > symbol which can be used to launch the initialization. > > This patch creates a new RS6000 flag "-mcdtors=". > With "-mcdtors=aix", gcc will generate these new constructors/destructors. > With "-mcdtors=gcc", which is currently the default, gcc will continue > to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D > symbols). > Ideally, it would have been better to enable the AIX format by default > instead of using collect2. However, the compatibility between the > previously-built binaries and the new ones is too complex to be done. > > gcc/ChangeLog: > 2021-10-04 Clément Chigot > > * collect2.c (aixbcdtors_flags): New variable. > (main): Use it to detect -bcdtors and remove -binitfini flag. > (write_c_file_stat): Adapt to new AIX format. > * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define. > (FILE_STERM_FORMAT): New define. > (TARGET_FILE_FUNCTION_FORMAT): New define. > * config/rs6000/aix64.opt: Add -mcdtors flag. > * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when > -mcdtors=aix is passed. > * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise. > * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise. > * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum. > * tree.c (get_file_function_name): Add > TARGET_FILE_FUNCTION_FORMAT support. > > gcc/testsuite/ChangeLog: > 2021-10-04 Clément Chigot > > * gcc.target/powerpc/constructor-aix.c: New test. > >
[r12-4475 Regression] FAIL: gcc.target/i386/pr22076.c scan-assembler-times movq 2 on Linux/x86_64
On Linux/x86_64, 247c407c83f0015f4b92d5f71e45b63192f6757e is the first bad commit commit 247c407c83f0015f4b92d5f71e45b63192f6757e Author: Roger Sayle Date: Mon Oct 18 12:15:40 2021 +0100 Try placing RTL folded constants in the constant pool. caused FAIL: gcc.target/i386/pr22076.c scan-assembler-not movl FAIL: gcc.target/i386/pr22076.c scan-assembler-times movq 2 with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4475/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr22076.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr22076.c --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
how does vrp2 rearrange this?
using testcase ifcvt-4.c: typedef int word __attribute__((mode(word))); word foo (word x, word y, word a) { word i = x; word j = y; /* Try to make taking the branch likely. */ __builtin_expect (x > y, 1); if (x > y) { i = a; j = i; } return i * j; The current VRP2 pass takes: if (x_3(D) > y_4(D)) goto ; [50.00%] <<--- note the THEN target else goto ; [50.00%] ;; succ: 3 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;; 4 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;; basic block 3, loop depth 0, count 536870912 (estimated locally), maybe hot ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) ;; pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;; succ: 4 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), maybe hot ;; prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) ;; pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;; 3 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) # i_1 = PHI # j_2 = PHI _6 = i_1 * j_2; # VUSE <.MEM_7(D)> return _6; and turns it into : ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot ;; prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) ;; pred: ENTRY [always] count:1073741824 (estimated locally) (FALLTHRU,EXECUTABLE) if (x_3(D) > y_4(D)) goto ; [50.00%] <<-- has been reversed. else goto ; [50.00%] ;; succ: 4 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;; 3 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;; basic block 3, loop depth 0, count 536870912 (estimated locally), maybe hot ;; prev block 2, next block 4, flags: (NEW, VISITED) ;; pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;; succ: 4 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), maybe hot ;; prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) ;; pred: 3 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) ;; 2 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) # i_1 = PHI # j_2 = PHI _6 = i_1 * j_2; # VUSE <.MEM_7(D)> return _6; So the IF has reversed the targets (but not the condition), and the PHIs in the target block have been adjusted. Where does this happen? I cannot find it. There doesnt seem to be anything in the IL which is reflecting the "expect" from the original code.. and if I run ranger vrp instead of classic_vrp, we don't do this... so I'm missing something Thanks Andrew
Re: [PATCH] Cleanup --params for backward threader.
Hi Aldy! On Thu, 2021-10-14 16:25:48 +0200, Aldy Hernandez via Gcc-patches wrote: > The new backward threader makes some of the --param knobs used to > control it questionable at best or no longer applicable at worst. > > The fsm-maximum-phi-arguments param is unused and can be removed. > > The max-fsm-thread-length param is block based which is a bit redundant, > since we already restrict paths based on instruction estimates. > > The max-fsm-thread-paths restricts the total number of threadable paths > in a function. We probably don't need this. Besides, the forward > threader has no such restriction. > > OK pending tests? This causes a regression for me. I'm auto-building lots of GCC cross-compilers and use these to cross-build the Linux kernel. Using binutils/gas/gcc configured for --target=sh-linux (actual configuration for GCC is this: .../gcc/configure --target=sh-linux --enable-werror-always \ --enable-languages=all --disable-gcov\ --disable-shared --disable-threads \ --without-headers \ --prefix=/var/lib/laminar/run/gcc-sh-linux/13/toolchain-install ) Then, building Linux for a good number of default configurations for ARCH=sh and ARCH=arm, GCC will just loop: $ make ARCH=sh distclean $ cp arch/sh/configs/r7780mp_defconfig .config $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- oldconfig < /dev/null $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- prepare $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- all [...] sh-linux-gcc -Wp,-MMD,drivers/ata/.libata-core.o.d -nostdinc -isystem /tmp/testbed/install/lib/gcc/sh-linux/12.0.0/include -I./arch/sh/include -I./arch/sh/include/generated -I./include -I./arch/sh/include/uapi -I./arch/sh/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I ./arch/sh/include/mach-common -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -pipe -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I ./arch/sh/include/mach-common -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-address-of-packed-member -O2 -fno-allow-store-data-races -Wframe-larger-than=1024 -fstack-protector-strong -Wimplicit-fallthrough=5 -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang -fno-stack-clash-protection -g -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation -Wno-zero-length-bounds -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned-DKBUILD_MODFILE='"drivers/ata/libata"' -DKBUILD_BASENAME='"libata_core"' -DKBUILD_MODNAME='"libata"' -D__KBUILD_MODNAME=kmod_libata -c -o drivers/ata/libata-core.o drivers/ata/libata-core.c (gdb) bt #0 0x0100318e in vec::operator[] (ix=0, this=0x50ee6e0) at ../../gcc/gcc/vec.h:1495 #1 back_jt_path_registry::adjust_paths_after_duplication (this=0x7ffdf8b6e868, curr_path_num=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2315 #2 0x01003c0d in back_jt_path_registry::duplicate_thread_path (this=0x7ffdf8b6e868, entry=0x7f92651000c0, exit=, region=, n_region=8, current_path_no=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2546 #3 0x010051e4 in back_jt_path_registry::update_cfg (this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadupdate.c:2656 #4 0x01003ecc in jt_path_registry::thread_through_all_blocks (this=0x7ffdf8b6e868, peel_loop_headers=) at ../../gcc/gcc/tree-ssa-threadupdate.c:2604 #5 0x00ffb5a7 in back_threader_registry::thread_through_all_blocks (may_peel_loop_headers=true, this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadbackward.c:556 #6 back_threader::thread_through_all_blocks (may_peel_loop_headers=true, this=0x7ffdf8b6e860) at ../../gcc/gcc/tree-ssa-threadbackward.c:501 #7 (anonymous namespace)::try_thread_blocks (fun=fun@entry=0x7f926eb389c0) at ../../gcc/gcc/tree-ssa-threadbackward.c:946 #8 0x00ffb5eb in (anonymous namespa
Re: [PATCH] Cleanup --params for backward threader.
That's odd. Yeah, please open a PR. Thanks. Aldy On Tue, Oct 19, 2021, 22:47 Jan-Benedict Glaw wrote: > Hi Aldy! > > On Thu, 2021-10-14 16:25:48 +0200, Aldy Hernandez via Gcc-patches < > gcc-patches@gcc.gnu.org> wrote: > > The new backward threader makes some of the --param knobs used to > > control it questionable at best or no longer applicable at worst. > > > > The fsm-maximum-phi-arguments param is unused and can be removed. > > > > The max-fsm-thread-length param is block based which is a bit redundant, > > since we already restrict paths based on instruction estimates. > > > > The max-fsm-thread-paths restricts the total number of threadable paths > > in a function. We probably don't need this. Besides, the forward > > threader has no such restriction. > > > > OK pending tests? > > This causes a regression for me. I'm auto-building lots of GCC > cross-compilers and use these to cross-build the Linux kernel. > > Using binutils/gas/gcc configured for --target=sh-linux (actual > configuration for GCC is this: > > .../gcc/configure --target=sh-linux --enable-werror-always \ > --enable-languages=all --disable-gcov\ > --disable-shared --disable-threads \ > --without-headers \ > > --prefix=/var/lib/laminar/run/gcc-sh-linux/13/toolchain-install > ) > > Then, building Linux for a good number of default configurations for > ARCH=sh and ARCH=arm, GCC will just loop: > > $ make ARCH=sh distclean > $ cp arch/sh/configs/r7780mp_defconfig .config > $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- oldconfig < /dev/null > $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- prepare > $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- all > [...] > sh-linux-gcc -Wp,-MMD,drivers/ata/.libata-core.o.d -nostdinc -isystem > /tmp/testbed/install/lib/gcc/sh-linux/12.0.0/include -I./arch/sh/include > -I./arch/sh/include/generated -I./include -I./arch/sh/include/uapi > -I./arch/sh/include/generated/uapi -I./include/uapi > -I./include/generated/uapi -include ./include/linux/compiler-version.h > -include ./include/linux/kconfig.h -include > ./include/linux/compiler_types.h -D__KERNEL__ -m4 -m4-nofpu -m4a -m4a-nofpu > -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I > ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I > ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I > ./arch/sh/include/mach-common -fmacro-prefix-map=./= -Wall -Wundef > -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common > -fshort-wchar -fno-PIE -Werror=implicit-function-declaration > -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 > -pipe -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up > -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 > -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I > ./arch/sh/include/mach-common -fno-delete-null-pointer-checks > -Wno-frame-address -Wno-format-truncation -Wno-format-overflow > -Wno-address-of-packed-member -O2 -fno-allow-store-data-races > -Wframe-larger-than=1024 -fstack-protector-strong -Wimplicit-fallthrough=5 > -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable > -fomit-frame-pointer -ftrivial-auto-var-init=zero > -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang > -fno-stack-clash-protection -g -Wdeclaration-after-statement -Wvla > -Wno-pointer-sign -Wno-stringop-truncation -Wno-zero-length-bounds > -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict > -Wno-maybe-uninitialized -fno-strict-overflow -fno-stack-check > -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types > -Werror=designated-init -Wno-packed-not-aligned > -DKBUILD_MODFILE='"drivers/ata/libata"' -DKBUILD_BASENAME='"libata_core"' > -DKBUILD_MODNAME='"libata"' -D__KBUILD_MODNAME=kmod_libata -c -o > drivers/ata/libata-core.o drivers/ata/libata-core.c > > > (gdb) bt > #0 0x0100318e in vec vl_ptr>::operator[] (ix=0, this=0x50ee6e0) at ../../gcc/gcc/vec.h:1495 > #1 back_jt_path_registry::adjust_paths_after_duplication > (this=0x7ffdf8b6e868, curr_path_num=0) at > ../../gcc/gcc/tree-ssa-threadupdate.c:2315 > #2 0x01003c0d in back_jt_path_registry::duplicate_thread_path > (this=0x7ffdf8b6e868, entry=0x7f92651000c0, exit=, > region=, n_region=8, > current_path_no=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2546 > #3 0x010051e4 in back_jt_path_registry::update_cfg > (this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadupdate.c:2656 > #4 0x01003ecc in jt_path_registry::thread_through_all_blocks > (this=0x7ffdf8b6e868, peel_loop_headers=) at > ../../gcc/gcc/tree-ssa-threadupdate.c:2604 > #5 0x00ffb5a7 in > back_threader_registry::thread_through_all_blocks > (may_peel_loop_headers=true, this=0x7ffdf8b6e868) at > ../../gcc/gcc/tree-ssa-threadbackward.c:556 > #6 back_threader::thread_through_all_blocks (may_peel_loop_heade
Re: how does vrp2 rearrange this?
On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches wrote: > > using testcase ifcvt-4.c: > > > typedef int word __attribute__((mode(word))); > > word > foo (word x, word y, word a) > { >word i = x; >word j = y; >/* Try to make taking the branch likely. */ >__builtin_expect (x > y, 1); >if (x > y) > { >i = a; >j = i; > } >return i * j; > > The current VRP2 pass takes: > >if (x_3(D) > y_4(D)) > goto ; [50.00%]<<--- note the THEN target >else > goto ; [50.00%] > ;;succ: 3 [50.0% (guessed)] count:536870912 (estimated > locally) (TRUE_VALUE,EXECUTABLE) > ;;4 [50.0% (guessed)] count:536870912 (estimated > locally) (FALSE_VALUE,EXECUTABLE) > > ;; basic block 3, loop depth 0, count 536870912 (estimated locally), > maybe hot > ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) > ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > locally) (TRUE_VALUE,EXECUTABLE) > ;;succ: 4 [always] count:536870912 (estimated locally) > (FALLTHRU,EXECUTABLE) > > ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), > maybe hot > ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) > ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > locally) (FALSE_VALUE,EXECUTABLE) > ;;3 [always] count:536870912 (estimated locally) > (FALLTHRU,EXECUTABLE) ># i_1 = PHI ># j_2 = PHI >_6 = i_1 * j_2; ># VUSE <.MEM_7(D)> >return _6; > > and turns it into : > > ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), > maybe hot > ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) > ;;pred: ENTRY [always] count:1073741824 (estimated locally) > (FALLTHRU,EXECUTABLE) >if (x_3(D) > y_4(D)) > goto ; [50.00%]<<-- has been reversed. >else > goto ; [50.00%] > ;;succ: 4 [50.0% (guessed)] count:536870912 (estimated > locally) (TRUE_VALUE,EXECUTABLE) > ;;3 [50.0% (guessed)] count:536870912 (estimated > locally) (FALSE_VALUE,EXECUTABLE) > > ;; basic block 3, loop depth 0, count 536870912 (estimated locally), > maybe hot > ;;prev block 2, next block 4, flags: (NEW, VISITED) > ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > locally) (FALSE_VALUE,EXECUTABLE) > ;;succ: 4 [always] count:536870912 (estimated locally) > (FALLTHRU,EXECUTABLE) > > ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), > maybe hot > ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) > ;;pred: 3 [always] count:536870912 (estimated locally) > (FALLTHRU,EXECUTABLE) > ;;2 [50.0% (guessed)] count:536870912 (estimated > locally) (TRUE_VALUE,EXECUTABLE) ># i_1 = PHI ># j_2 = PHI >_6 = i_1 * j_2; ># VUSE <.MEM_7(D)> >return _6; > > So the IF has reversed the targets (but not the condition), and the PHIs > in the target block have been adjusted. > > Where does this happen? I cannot find it. There doesnt seem to be > anything in the IL which is reflecting the "expect" from the original > code.. and if I run ranger vrp instead of classic_vrp, we don't do > this... so I'm missing something I suspect this is an artifact of inserting and removing the assert expressions in VRP rather than anything else. Thanks, Andrew > > Thanks > > Andrew >
Re: how does vrp2 rearrange this?
On 10/19/21 5:13 PM, Andrew Pinski wrote: On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches wrote: using testcase ifcvt-4.c: typedef int word __attribute__((mode(word))); word foo (word x, word y, word a) { word i = x; word j = y; /* Try to make taking the branch likely. */ __builtin_expect (x > y, 1); if (x > y) { i = a; j = i; } return i * j; The current VRP2 pass takes: if (x_3(D) > y_4(D)) goto ; [50.00%]<<--- note the THEN target else goto ; [50.00%] ;;succ: 3 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;;4 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;; basic block 3, loop depth 0, count 536870912 (estimated locally), maybe hot ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;;succ: 4 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), maybe hot ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;;3 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) # i_1 = PHI # j_2 = PHI _6 = i_1 * j_2; # VUSE <.MEM_7(D)> return _6; and turns it into : ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) ;;pred: ENTRY [always] count:1073741824 (estimated locally) (FALLTHRU,EXECUTABLE) if (x_3(D) > y_4(D)) goto ; [50.00%]<<-- has been reversed. else goto ; [50.00%] ;;succ: 4 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;;3 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;; basic block 3, loop depth 0, count 536870912 (estimated locally), maybe hot ;;prev block 2, next block 4, flags: (NEW, VISITED) ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) ;;succ: 4 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), maybe hot ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) ;;pred: 3 [always] count:536870912 (estimated locally) (FALLTHRU,EXECUTABLE) ;;2 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) # i_1 = PHI # j_2 = PHI _6 = i_1 * j_2; # VUSE <.MEM_7(D)> return _6; So the IF has reversed the targets (but not the condition), and the PHIs in the target block have been adjusted. Where does this happen? I cannot find it. There doesnt seem to be anything in the IL which is reflecting the "expect" from the original code.. and if I run ranger vrp instead of classic_vrp, we don't do this... so I'm missing something I suspect this is an artifact of inserting and removing the assert expressions in VRP rather than anything else. Well, this fails the test because we eventually check in RTL land to make sure the branch was reversed due to the expect, I guess. and it seems that only VRP2 does the reversing. so its expected.. just not obvious to me why, when or how. of course, the pass is th rtl.ce1 pass... and the check is for "2 true changes made." . and if the IF isn't reversed , it doesnt "report" that. I have no idea what thats about. regardless... if we DONT do this, then the assembly at the end has an extra branch in it, so it seems like a necessary thing. Andrew
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On 10/18/2021 7:41 AM, Aldy Hernandez wrote: After some playing, it looks like if we enable fully-resolving mode in the *.thread passes immediately preceeding VRP, we can remove the VRP threading passes altogether, thus removing 2 threading passes (and forward threading passes at that!). Whoo whoo! The numbers look really good. We get 6874 more jump threading passes over my boostrap .ii files for a total 3.74% increase. And we get that while running marginally faster (0.19% faster, so noise). The details are: *** Mainline (with the loop rotation patch): ethread:64722 dom:31246 thread:73709 vrp-thread:14357 total: 184034 *** Removing all the VRP threaders. ethread:64722 thread-full:76493 dom:33648 thread:16045 total: 190908 Notice that not only do we get a lot more threads in thread-full (resolving mode), but even DOM can get more jump threads. This doesn't come without risks though. The main issue is that we would be removing one engine (forward threader), with another one (backward threader). But the good news is that (a) we've been using the new backward threader for a while now (b) even the VRP threader in mainline is using the backward threader solver. So, all that would really be changing would be the path discovery bits and custom copier in the forward threader, with the backward threader bit and the generic copier. I personally don't think this is a big risk, because we've done all the hard work already and it's all being stressed in one way or another. I don't see the risk as significantly different than any other big chunk of development work. Furthermore, this is a major step on the path we've been discussing the last couple years. There'll be some testing fallout, but I think that's manageable and ultimately worth the pain to work through. I can express how happy I am that we're at the point of zapping the two VRP threading passes. The untested patch below is all that would need to happen, albeit with copius changes to tests. I'd like to see where we all stand on this before I start chugging away at testing and other time consuming tasks. Note, that all the relevant bits will still be tested in this release, so I'm not gonna cry one way or another. But it'd be nice to start reducing passes, especially if we get a 3.74% increase in jump threads for no time penalty. Finally, even if we all agree, I think we should give us a week after the loop rotation restrictions go in, because threading changes always cause a party of unexpected things to happen. Sure. And FWIW, the loop rotation changes are fine IMHO. So commit those when you're ready, then drop this in a week later. All in time for stage1 close :-) Jeff
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On 10/18/2021 8:03 AM, Aldy Hernandez wrote: On 10/18/21 3:41 PM, Aldy Hernandez wrote: I've been experimenting with reducing the total number of threading passes, and I'd like to see if there's consensus/stomach for altering the pipeline. Note, that the goal is to remove forward threader clients, not the other way around. So, we should prefer to remove a VRP threader instance over a *.thread one immediately before VRP. After some playing, it looks like if we enable fully-resolving mode in the *.thread passes immediately preceeding VRP, we can remove the VRP threading passes altogether, thus removing 2 threading passes (and forward threading passes at that!). It occurs to me that we could also remove the threading before VRP passes, and enable a fully-resolving backward threader after VRP. I haven't played with this scenario, but it should be just as good. That being said, I don't know the intricacies of why we had both pre and post VRP threading passes, and if one is ideally better than the other. The only post-VRP threading pass that (in my mind) makes sense is the one sitting between VRP and DOM and it should replace the DOM based threader. Jeff
Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
On 10/19/2021 1:33 AM, Aldy Hernandez wrote: On Tue, Oct 19, 2021 at 8:52 AM Richard Biener wrote: On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez wrote: On 10/18/21 3:41 PM, Aldy Hernandez wrote: I've been experimenting with reducing the total number of threading passes, and I'd like to see if there's consensus/stomach for altering the pipeline. Note, that the goal is to remove forward threader clients, not the other way around. So, we should prefer to remove a VRP threader instance over a *.thread one immediately before VRP. After some playing, it looks like if we enable fully-resolving mode in the *.thread passes immediately preceeding VRP, we can remove the VRP threading passes altogether, thus removing 2 threading passes (and forward threading passes at that!). It occurs to me that we could also remove the threading before VRP passes, and enable a fully-resolving backward threader after VRP. I haven't played with this scenario, but it should be just as good. That being said, I don't know the intricacies of why we had both pre and post VRP threading passes, and if one is ideally better than the other. It was done because they were different threaders. Since the new threader uses built-in VRP it shouldn't really matter whether it's before or after VRP _for the threading_, but it might be that if threading runs before VRP then VRP itself can do a better job on cleaning up the IL. Good point. FWIW, earlier this season I played with replacing the VRPs with evrp instances (which fold far more conditionals) and I found that the threaders can actually find LESS opportunities after *vrp fold away things. I don't know if this is a good or a bad thing. Perhaps we should benchmark three alternatives: This is expected. VRP and DOM will sometimes find conditionals that they can fully optimize away. If those conditionals are left in the IL, the threaders would sometimes pick them up. So as we fold more in VRP/DOM, I'm not surprised there's fewer things for the threaders to find. In general, if a conditional can be removed by VRP/DOM, that's the preference. 1. Mainline 2. Fully resolving threader -> VRP -> No threading. 3. No threading -> VRP -> Full resolving threader. ...and see what the actual effect is, regardless of number of threaded paths. Speak of which, what's the blessed way of benchmarking performance nowadays? I've seen some PRs fly that measure some more lightweight benchmarks (open source?) than a full blown SPEC. Can't speak for anyone else, but I benchmark these days with spec in a cycle-approximate simulator :-) I'm isolated from the whims of turbo modes and the like. Of course it takes considerably longer than running on real hardware :-) + /* ?? Is this still needed. ?? */ /* Threading can leave many const/copy propagations in the IL. Clean them up. Instead of just copy_prop, we use ccp to compute alignment and nonzero bits. */ Yes, it's still needed but not for the stated reason - the VRP substitution and folding stage should deal with copy/constant propagation but we replaced the former copy propagation with CCP to re-compute nonzero bits & alignment so I'd change the comment to /* Run CCP to compute alignment and nonzero bits. */ The threaders (really the copiers) don't make much, if any, attempt to clean up the IL. So, for example, they'll often leave degenerate PHIs in the IL. We need to clean that crap up or we'll get false positives in the middle-end diagnostics. Jeff
Re: [PATCH 1/4] Add dump prints when execute_fixup_cfg removes a write only var store.
On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski While debugging PR 102703, I found it was hard to figure out where the store was being removed as there was no pass which was outputting why the store was removed. This adds to execute_fixup_cfg the output. Also note most of removals happen when execute_fixup_cfg is called from the inliner. gcc/ChangeLog: * tree-cfg.c (execute_fixup_cfg): Output when the statement is removed when it is a write only var. OK jeff
Re: [PATCH 2/4] Remove outdated comment about execute_fixup_cfg
On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski The comment about execute_fixup_cfg not being able to run as a standalone pass is not true for a long time now. It has been a standalone pass for a while now. gcc/ChangeLog: * tree-cfg.c (execute_fixup_cfg): Remove comment about standalone pass. OK jeff
Re: [PATCH 3/4] Factor out removal of write only stores from execute_fixup_cfg
On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski To make it easier to fix PR 102703, factoring this code out to its own function makes it easier to read and less indentions too. gcc/ChangeLog: * tree-cfg.c (maybe_remove_writeonly_store): New function factored out from ... (execute_fixup_cfg): Here. Call maybe_remove_writeonly_store. OK jeff
Re: [PATCH 4/4] Improve maybe_remove_writeonly_store to do a simple DCE for defining statement
On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski Instead of putting a full blow DCE after execute_fixup_cfg, it makes sense to try to remove the defining statement for the store that is being removed. Right now we only handle PHI node statements as there needs no extra checks except for it is only used once in the store statement. gcc/ChangeLog: * tree-cfg.c (maybe_remove_writeonly_store): Remove defining (PHI) statement of the store if possible. This is the only part that I consider at all controversial. Is the case you're trying to handle such that you have to eliminate the PHI immediately and can't wait until the next DCE pass? If so and we want to go this direction, should we pull this out into a little routine? I'm a bit surprised we don't already have one or more that do basically the same thing. Jeff
Re: how does vrp2 rearrange this?
On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod wrote: > > On 10/19/21 5:13 PM, Andrew Pinski wrote: > > On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches > > wrote: > >> using testcase ifcvt-4.c: > >> > >> > >> typedef int word __attribute__((mode(word))); > >> > >> word > >> foo (word x, word y, word a) > >> { > >> word i = x; > >> word j = y; > >> /* Try to make taking the branch likely. */ > >> __builtin_expect (x > y, 1); > >> if (x > y) > >> { > >> i = a; > >> j = i; > >> } > >> return i * j; > >> The testcase is broken anyways. The builtin_expect should be inside the if to have any effect. Look at the estimated values: if (x_3(D) > y_4(D)) goto ; [50.00%]<<-- has been reversed. else goto ; [50.00%] ;;succ: 4 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;;3 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) See how it is 50/50? The testcase is not even testing what it says it is testing. Just happened to work previously does not mean anything. Move the builtin_expect inside the if and try again. I am shocked it took this long to find the testcase issue really. Thanks, Andrew Pinski > >> The current VRP2 pass takes: > >> > >> if (x_3(D) > y_4(D)) > >> goto ; [50.00%]<<--- note the THEN target > >> else > >> goto ; [50.00%] > >> ;;succ: 3 [50.0% (guessed)] count:536870912 (estimated > >> locally) (TRUE_VALUE,EXECUTABLE) > >> ;;4 [50.0% (guessed)] count:536870912 (estimated > >> locally) (FALSE_VALUE,EXECUTABLE) > >> > >> ;; basic block 3, loop depth 0, count 536870912 (estimated locally), > >> maybe hot > >> ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) > >> ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > >> locally) (TRUE_VALUE,EXECUTABLE) > >> ;;succ: 4 [always] count:536870912 (estimated locally) > >> (FALLTHRU,EXECUTABLE) > >> > >> ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), > >> maybe hot > >> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) > >> ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > >> locally) (FALSE_VALUE,EXECUTABLE) > >> ;;3 [always] count:536870912 (estimated locally) > >> (FALLTHRU,EXECUTABLE) > >> # i_1 = PHI > >> # j_2 = PHI > >> _6 = i_1 * j_2; > >> # VUSE <.MEM_7(D)> > >> return _6; > >> > >> and turns it into : > >> > >> ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), > >> maybe hot > >> ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) > >> ;;pred: ENTRY [always] count:1073741824 (estimated locally) > >> (FALLTHRU,EXECUTABLE) > >> if (x_3(D) > y_4(D)) > >> goto ; [50.00%]<<-- has been reversed. > >> else > >> goto ; [50.00%] > >> ;;succ: 4 [50.0% (guessed)] count:536870912 (estimated > >> locally) (TRUE_VALUE,EXECUTABLE) > >> ;;3 [50.0% (guessed)] count:536870912 (estimated > >> locally) (FALSE_VALUE,EXECUTABLE) > >> > >> ;; basic block 3, loop depth 0, count 536870912 (estimated locally), > >> maybe hot > >> ;;prev block 2, next block 4, flags: (NEW, VISITED) > >> ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > >> locally) (FALSE_VALUE,EXECUTABLE) > >> ;;succ: 4 [always] count:536870912 (estimated locally) > >> (FALLTHRU,EXECUTABLE) > >> > >> ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), > >> maybe hot > >> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) > >> ;;pred: 3 [always] count:536870912 (estimated locally) > >> (FALLTHRU,EXECUTABLE) > >> ;;2 [50.0% (guessed)] count:536870912 (estimated > >> locally) (TRUE_VALUE,EXECUTABLE) > >> # i_1 = PHI > >> # j_2 = PHI > >> _6 = i_1 * j_2; > >> # VUSE <.MEM_7(D)> > >> return _6; > >> > >> So the IF has reversed the targets (but not the condition), and the PHIs > >> in the target block have been adjusted. > >> > >> Where does this happen? I cannot find it. There doesnt seem to be > >> anything in the IL which is reflecting the "expect" from the original > >> code.. and if I run ranger vrp instead of classic_vrp, we don't do > >> this... so I'm missing something > > I suspect this is an artifact of inserting and removing the assert > > expressions in VRP rather than anything else. > > > Well, this fails the test because we eventually check in RTL land to > make sure the branch was reversed due to the expect, I guess. and it > seems that only VRP2 does the reversing. so its expected.. just not > obvious to me why, when or how. > > of course, the pass is th rtl.ce1 pass... and the check is for "2 true > changes made." . and if the IF isn't reversed , it doesnt "report" > that. I have no id
[PATCH] gcc-changelog: Add libffi/ to ignored_prefixes
Add libffi/ to ignored_prefixes for syncing with libffi upstream: commit c095f8f2e6f26bfc2ff8e3276c6af23ab153f5ff Author: H.J. Lu Date: Tue Aug 31 07:14:47 2021 -0700 libffi: Sync with libffi 3.4.2 Merged commit: f9ea41683444ebe11cfa45b05223899764df28fb to avoid remote: *** The following commit was rejected by your hooks.commit-extra-checker script (status: 1) remote: *** commit: c095f8f2e6f26bfc2ff8e3276c6af23ab153f5ff remote: *** ChangeLog format failed: remote: *** ERR: cannot find a ChangeLog location in message remote: *** remote: *** Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs remote: *** remote: error: hook declined to update refs/heads/master * gcc-changelog/git_commit.py (ignored_prefixes): Add libffi/. --- contrib/gcc-changelog/git_commit.py | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py index cf29f761964..60377b68ba1 100755 --- a/contrib/gcc-changelog/git_commit.py +++ b/contrib/gcc-changelog/git_commit.py @@ -134,6 +134,7 @@ ignored_prefixes = { 'gcc/go/gofrontend/', 'gcc/testsuite/gdc.test/', 'gcc/testsuite/go.test/test/', +'libffi/', 'libgo/', 'libphobos/libdruntime/', 'libphobos/src/', -- 2.32.0
[PATCH] Use fold_build2 instead fold_binary for TRUTH_AND
In tree_simplify_using_condition_1, there is code which should be logic: "op0 || op1"/"op0 && op1". When creating expression for TRUTH_OR_EXPR and TRUTH_AND_EXPR, fold_build2 would be used instead fold_binary which always return NULL_TREE for this kind of expr. Bootstrap and regtest pass on ppc and ppc64le. Is this ok for trunk? BR, Jiufu gcc/ChangeLog: * tree-ssa-loop-niter.c (tree_simplify_using_condition_1): Replace fold_binary with fold_build2 fir logical OR/AND. --- gcc/tree-ssa-loop-niter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 75109407124..27e11a29707 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2290,12 +2290,12 @@ tree_simplify_using_condition_1 (tree cond, tree expr) /* Check whether COND ==> EXPR. */ notcond = invert_truthvalue (cond); - e = fold_binary (TRUTH_OR_EXPR, boolean_type_node, notcond, expr); + e = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, notcond, expr); if (e && integer_nonzerop (e)) return e; /* Check whether COND ==> not EXPR. */ - e = fold_binary (TRUTH_AND_EXPR, boolean_type_node, cond, expr); + e = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, cond, expr); if (e && integer_zerop (e)) return e; -- 2.17.1