On Wed, 26 Feb 2020, Jakub Jelinek wrote: > Hi! > > The following testcase is miscompiled in 8+. > The problem is that check_no_overlap has a special case for INTEGER_CST > marked stores (i.e. stores of constants), if both all currenly merged stores > and the one under consideration for merging with them are marked that way, > it anticipates that other INTEGER_CST marked stores that overlap with those > and precede those (have smaller info->order) could be merged with those and > doesn't punt for them. > In PR86844 and PR87859 fixes I've then added quite large code that is > performed after check_no_overlap and tries to find out if we need and can > merge further INTEGER_CST marked stores, or need to punt. > Unfortunately, that code is there only in the overlapping case code and > the testcase below shows that we really need it even in the adjacent store > case. After sort_by_bitpos we have: > bitpos width order rhs_code > 96 32 3 INTEGER_CST > 128 32 1 INTEGER_CST > 128 128 2 INTEGER_CST > 192 32 0 MEM_REF > Because of the missing PR86844/PR87859-ish code in the adjacent store > case, we merge the adjacent (memory wise) stores 96/32/3 and 128/32/1, > and then we consider the 128-bit store which is in program-order in between > them, but in this case we punt, because the merging would extend the > merged store region from bitpos 96 and 64-bits to bitpos 96 and 160-bits > and that has an overlap with an incompatible store (the MEM_REF one). > The problem is that we can't really punt this way, because the 128-bit > store is in between those two we've merged already, so either we manage > to merge even that one together with the others, or would need to avoid > already merging the 96/32/3 and 128/32/1 stores together. > Now, rather than copying around the PR86844/PR87859 code to the other spot, > we can actually just use the overlapping code, merge_overlapping is really > a superset of merge_into, so that is what the patch does. If doing > adjacent store merge for rhs_code other than INTEGER_CST, I believe the > current code is already fine, check_no_overlap in that case doesn't make > the exception and will punt if there is some earlier (smaller order) > non-mergeable overlapping store. There is just one case that could be > problematic, if the merged_store has BIT_INSERT_EXPRs in them and the > new store is a constant store (INTEGER_CST rhs_code), then check_no_overlap > would do the exception and still would allow the special case. But we > really shouldn't have the special case in that case, so this patch also > changes check_no_overlap to just have a bool whether we should have the > special case or not. > > Bootstrapped/regtested on x86_64-linux and i686-linux on the trunk as well > as coprresponding backport to 8 branch, ok for trunk and release branches?
OK. > Note, as I said in the PR, for GCC11 we could consider performing some kind > of cheap DSE during the store merging (perhaps guarded with flag_tree_dse). > And another thing to consider is only consider as problematic non-mergeable > stores that not only have order smaller than last_order as currently, but > also have order larger than first_order, as in this testcase if we actually > ignored (not merged with anything at all) the 192/32/0 store, because it is > not in between the other stores we'd merge, it would be fine to merge the > other 3 stores, though of course the testcase can be easily adjusted by > putting the 192/32 store after the 128/32 store and then this patch would be > still needed. Though, I think I'd need more time thinking this over. When looking at the PR I thought that the issue must be with the sorting since if there's no intermediate store breaking the chain we should be able to "merge" all INTEGER_CSTs by simply native encoding them in program order? (similar to what VN does with partial defs) Richard. > 2020-02-25 Jakub Jelinek <ja...@redhat.com> > > PR tree-optimization/93820 > * gimple-ssa-store-merging.c (check_no_overlap): Change RHS_CODE > argument to ALL_INTEGER_CST_P boolean. > (imm_store_chain_info::try_coalesce_bswap): Adjust caller. > (imm_store_chain_info::coalesce_immediate_stores): Likewise. Handle > adjacent INTEGER_CST store into merged_store->only_constants like > overlapping one. > > * gcc.dg/pr93820.c: New test. > > --- gcc/gimple-ssa-store-merging.c.jj 2020-02-13 10:03:39.000000000 +0100 > +++ gcc/gimple-ssa-store-merging.c 2020-02-25 18:47:40.303092315 +0100 > @@ -2370,8 +2370,9 @@ gather_bswap_load_refs (vec<tree> *refs, > /* Check if there are any stores in M_STORE_INFO after index I > (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap > a potential group ending with END that have their order > - smaller than LAST_ORDER. RHS_CODE is the kind of store in the > - group. Return true if there are no such stores. > + smaller than LAST_ORDER. ALL_INTEGER_CST_P is true if > + all the stores already merged and the one under consideration > + have rhs_code of INTEGER_CST. Return true if there are no such stores. > Consider: > MEM[(long long int *)p_28] = 0; > MEM[(long long int *)p_28 + 8B] = 0; > @@ -2394,13 +2395,13 @@ gather_bswap_load_refs (vec<tree> *refs, > the MEM[(long long int *)p_28 + 8B] = 0; would now be before it, > so we need to refuse merging MEM[(long long int *)p_28 + 8B] = 0; > into the group. That way it will be its own store group and will > - not be touched. If RHS_CODE is INTEGER_CST and there are overlapping > + not be touched. If ALL_INTEGER_CST_P and there are overlapping > INTEGER_CST stores, those are mergeable using merge_overlapping, > so don't return false for those. */ > > static bool > check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned int i, > - enum tree_code rhs_code, unsigned int last_order, > + bool all_integer_cst_p, unsigned int last_order, > unsigned HOST_WIDE_INT end) > { > unsigned int len = m_store_info.length (); > @@ -2410,7 +2411,7 @@ check_no_overlap (vec<store_immediate_in > if (info->bitpos >= end) > break; > if (info->order < last_order > - && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST)) > + && (!all_integer_cst_p || info->rhs_code != INTEGER_CST)) > return false; > } > return true; > @@ -2563,7 +2564,7 @@ imm_store_chain_info::try_coalesce_bswap > if (n.base_addr == NULL_TREE && !is_gimple_val (n.src)) > return false; > > - if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order, end)) > + if (!check_no_overlap (m_store_info, last, false, last_order, end)) > return false; > > /* Don't handle memory copy this way if normal non-bswap processing > @@ -2713,7 +2714,14 @@ imm_store_chain_info::coalesce_immediate > |---store 2---| > Overlapping stores. */ > else if (IN_RANGE (info->bitpos, merged_store->start, > - merged_store->start + merged_store->width - 1)) > + merged_store->start + merged_store->width - 1) > + /* |---store 1---||---store 2---| > + Handle also the consecutive INTEGER_CST stores case here, > + as we have here the code to deal with overlaps. */ > + || (info->bitregion_start <= merged_store->bitregion_end > + && info->rhs_code == INTEGER_CST > + && merged_store->only_constants > + && merged_store->can_be_merged_into (info))) > { > /* Only allow overlapping stores of constants. */ > if (info->rhs_code == INTEGER_CST > @@ -2725,8 +2733,7 @@ imm_store_chain_info::coalesce_immediate > unsigned HOST_WIDE_INT end > = MAX (merged_store->start + merged_store->width, > info->bitpos + info->bitsize); > - if (check_no_overlap (m_store_info, i, INTEGER_CST, > - last_order, end)) > + if (check_no_overlap (m_store_info, i, true, last_order, end)) > { > /* check_no_overlap call above made sure there are no > overlapping stores with non-INTEGER_CST rhs_code > @@ -2879,7 +2886,7 @@ imm_store_chain_info::coalesce_immediate > std::swap (info->ops[0], info->ops[1]); > info->ops_swapped_p = true; > } > - if (check_no_overlap (m_store_info, i, info->rhs_code, > + if (check_no_overlap (m_store_info, i, false, > MAX (merged_store->last_order, info->order), > MAX (merged_store->start + merged_store->width, > info->bitpos + info->bitsize))) > --- gcc/testsuite/gcc.dg/pr93820.c.jj 2020-02-25 18:34:01.535154614 +0100 > +++ gcc/testsuite/gcc.dg/pr93820.c 2020-02-25 14:32:31.600207611 +0100 > @@ -0,0 +1,26 @@ > +/* PR tree-optimization/93820 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fno-tree-dse" } */ > + > +typedef int v4si __attribute__((vector_size(4 * sizeof (int)))); > +int a[10]; > + > +__attribute__((noipa)) int > +foo (int *p) > +{ > + a[6] = *p; > + a[4] = 1; > + *(((v4si *)&a[0]) + 1) = (v4si) { 0, 0, 0, 0 }; > + a[3] = 0; > +} > + > +int > +main () > +{ > + int i = 0; > + foo (&i); > + for (i = 0; i < 10; i++) > + if (a[i]) > + __builtin_abort (); > + return 0; > +} > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)