On Mon, 5 Nov 2018, Jakub Jelinek wrote: > Hi! > > My recent change for PR86844 broke the following testcases. > > The problem is that we are processing the stores in the bitpos/bitsize > order. The change PR86844 change was about if we try to merge two > overlapping stores of INTEGER_CST, we need to go through all other stores > that are overlapping those and come in program order before the last of > those; if they are INTEGER_CST stores, we can merge them, if they are other > stores, we punt completely. If there are any stores ordered in the > bitpos/bitsize ordering in between those (thus also overlapping), we keep > them unmerged as is. > > The problem is that if we this way skip any stores, we need to ensure that > we don't merge this store group with any statements beyond that, because > merged store group is stored at the point of the last statement, and that > would overwrite the skipped stores. Let's use the testcase: > i.f.i[0] = 0; // bitpos 0, bitsize 32 > i.f.i[1] = 0; // bitpos 32, bitsize 32 > i.f.i[2] = 0; // bitpos 64, bitsize 32 > i.f.q.f7 = 1; // all other stores are bitpos the number after f, bitsize 1 > i.f.q.f2 = 1; > i.f.q.f21 = 1; > i.f.q.f19 = 1; > i.f.q.f14 = 1; > i.f.q.f5 = 1; > i.f.q.f0 = 1; > i.f.q.f15 = 1; > i.f.q.f16 = 1; > i.f.q.f6 = 1; > i.f.q.f9 = 1; > i.f.q.f17 = 1; > i.f.q.f1 = 1; > i.f.q.f8 = 1; > i.f.q.f13 = 1; > i.f.q.f66 = 1; > In the bitpos/bitsize order, this is: > i.f.i[0] = 0; // bitpos 0, bitsize 32 > i.f.q.f0 = 1; > i.f.q.f1 = 1; > i.f.q.f2 = 1; > i.f.q.f5 = 1; > i.f.q.f6 = 1; > i.f.q.f7 = 1; > i.f.q.f8 = 1; > i.f.q.f9 = 1; > i.f.q.f13 = 1; > i.f.q.f14 = 1; > i.f.q.f15 = 1; > i.f.q.f16 = 1; > i.f.q.f17 = 1; > i.f.q.f19 = 1; > i.f.q.f21 = 1; > i.f.i[1] = 0; // bitpos 32, bitsize 32 > i.f.i[2] = 0; // bitpos 64, bitsize 32 > i.f.q.f66 = 1; > and when trying to merge overlapping i.f.i[0] = 0; with i.f.q.f0 = 1;, > we also merge in overlapping f7, f2, f21, f19, f14, f5 and skip > overlapping, but coming after the f0 store, stores f1 .. f17 other than the > above mentioned ones (so those stores remain in the source). Now, in this > case the merged store group would be emitted where the f0 store is and the > optimization would be still correct. The problem is that we keep going and > merge that store group with i[1] and i[2] stores (that is ok, those are > before the f0 store, so again, still at the spot of f0 store), but then also > merge with f66 store and that makes it incorrect, the > i.f.q.f15 = 1; > i.f.q.f16 = 1; > i.f.q.f6 = 1; > i.f.q.f9 = 1; > i.f.q.f17 = 1; > i.f.q.f1 = 1; > i.f.q.f8 = 1; > i.f.q.f13 = 1; > stores remain in the IL, but because the last store of the merged group is > now at f66 store, that is where those bits are overwritten with the store > of 96 bits together. The following patch fixes that by making sure that if > we skip this way any stores (keep them in the IL), then we ensure that we > don't grow the store group beyond that last_order point. > > Now, the first testcase also shows that we skip those stmts completely > unnecessarily, while they are overlapping stores that come after the > latter of the two overlapping stores being merged, they are still > INTEGER_CST stores and in the past we've merged them all into a constant > store just fine. The reason PR86844 was done is that there can be > non-INTEGER_CST stores that prevent the merging. So, this patch also > remembers which of the overlapping stores we would skip are INTEGER_CST > stores and which are other stores; if we see any to be skipped INTEGER_CSTs, > we retry the analysis with including those too; if that works out, we use > that, if it doesn't, we revert to what we chose before (skipping those > stores with magic to ensure we don't merge with other stores later in > program order). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 8.3?
OK. Thanks, Richard. > 2018-11-05 Jakub Jelinek <ja...@redhat.com> > > PR tree-optimization/87859 > * gimple-ssa-store-merging.c (struct merged_store_group): Add > only_constants and first_nonmergeable_order members. > (merged_store_group::merged_store_group): Initialize them. > (merged_store_group::do_merge): Clear only_constants member if > adding something other than INTEGER_CST store. > (imm_store_chain_info::coalesce_immediate_stores): Don't merge > stores with order >= first_nonmergeable_order. Use > merged_store->only_constants instead of always recomputing it. > Set merged_store->first_nonmergeable_order if we've skipped any > stores. Attempt to merge overlapping INTEGER_CST stores that > we would otherwise skip. > > * gcc.dg/store_merging_24.c: New test. > * gcc.dg/store_merging_25.c: New test. > > --- gcc/gimple-ssa-store-merging.c.jj 2018-11-02 15:36:30.802999681 +0100 > +++ gcc/gimple-ssa-store-merging.c 2018-11-05 10:20:25.980370963 +0100 > @@ -1429,6 +1429,8 @@ struct merged_store_group > unsigned int first_order; > unsigned int last_order; > bool bit_insertion; > + bool only_constants; > + unsigned int first_nonmergeable_order; > > auto_vec<store_immediate_info *> stores; > /* We record the first and last original statements in the sequence because > @@ -1821,6 +1823,8 @@ merged_store_group::merged_store_group ( > val = NULL; > mask = NULL; > bit_insertion = false; > + only_constants = info->rhs_code == INTEGER_CST; > + first_nonmergeable_order = ~0U; > unsigned HOST_WIDE_INT align_bitpos = 0; > get_object_alignment_1 (gimple_assign_lhs (info->stmt), > &align, &align_bitpos); > @@ -1936,6 +1940,8 @@ merged_store_group::do_merge (store_imme > first_order = info->order; > first_stmt = stmt; > } > + if (info->rhs_code != INTEGER_CST) > + only_constants = false; > } > > /* Merge a store recorded by INFO into this merged store. > @@ -2696,32 +2702,25 @@ imm_store_chain_info::coalesce_immediate > } > } > > + if (info->order >= merged_store->first_nonmergeable_order) > + ; > + > /* |---store 1---| > |---store 2---| > Overlapping stores. */ > - if (IN_RANGE (info->bitpos, merged_store->start, > - merged_store->start + merged_store->width - 1)) > + else if (IN_RANGE (info->bitpos, merged_store->start, > + merged_store->start + merged_store->width - 1)) > { > /* Only allow overlapping stores of constants. */ > - if (info->rhs_code == INTEGER_CST) > + if (info->rhs_code == INTEGER_CST && merged_store->only_constants) > { > - bool only_constants = true; > - store_immediate_info *infoj; > - unsigned int j; > - FOR_EACH_VEC_ELT (merged_store->stores, j, infoj) > - if (infoj->rhs_code != INTEGER_CST) > - { > - only_constants = false; > - break; > - } > unsigned int last_order > = MAX (merged_store->last_order, info->order); > unsigned HOST_WIDE_INT end > = MAX (merged_store->start + merged_store->width, > info->bitpos + info->bitsize); > - if (only_constants > - && check_no_overlap (m_store_info, i, INTEGER_CST, > - last_order, end)) > + if (check_no_overlap (m_store_info, i, INTEGER_CST, > + last_order, end)) > { > /* check_no_overlap call above made sure there are no > overlapping stores with non-INTEGER_CST rhs_code > @@ -2741,36 +2740,97 @@ imm_store_chain_info::coalesce_immediate > store_by_bitpos order it comes after the last store that > we can't merge with them. We can merge the first 3 stores > and keep the last store as is though. */ > - unsigned int len = m_store_info.length (), k = i; > - for (unsigned int j = i + 1; j < len; ++j) > + unsigned int len = m_store_info.length (); > + unsigned int try_order = last_order; > + unsigned int first_nonmergeable_order; > + unsigned int k; > + bool last_iter = false; > + int attempts = 0; > + do > { > - store_immediate_info *info2 = m_store_info[j]; > - if (info2->bitpos >= end) > - break; > - if (info2->order < last_order) > + unsigned int max_order = 0; > + unsigned first_nonmergeable_int_order = ~0U; > + unsigned HOST_WIDE_INT this_end = end; > + k = i; > + first_nonmergeable_order = ~0U; > + for (unsigned int j = i + 1; j < len; ++j) > { > - if (info2->rhs_code != INTEGER_CST) > + store_immediate_info *info2 = m_store_info[j]; > + if (info2->bitpos >= this_end) > + break; > + if (info2->order < try_order) > { > - /* Normally check_no_overlap makes sure this > - doesn't happen, but if end grows below, then > - we need to process more stores than > - check_no_overlap verified. Example: > - MEM[(int *)p_5] = 0; > - MEM[(short *)p_5 + 3B] = 1; > - MEM[(char *)p_5 + 4B] = _9; > - MEM[(char *)p_5 + 2B] = 2; */ > - k = 0; > - break; > + if (info2->rhs_code != INTEGER_CST) > + { > + /* Normally check_no_overlap makes sure this > + doesn't happen, but if end grows below, > + then we need to process more stores than > + check_no_overlap verified. Example: > + MEM[(int *)p_5] = 0; > + MEM[(short *)p_5 + 3B] = 1; > + MEM[(char *)p_5 + 4B] = _9; > + MEM[(char *)p_5 + 2B] = 2; */ > + k = 0; > + break; > + } > + k = j; > + this_end = MAX (this_end, > + info2->bitpos + info2->bitsize); > } > - k = j; > - end = MAX (end, info2->bitpos + info2->bitsize); > + else if (info2->rhs_code == INTEGER_CST > + && !last_iter) > + { > + max_order = MAX (max_order, info2->order + 1); > + first_nonmergeable_int_order > + = MIN (first_nonmergeable_int_order, > + info2->order); > + } > + else > + first_nonmergeable_order > + = MIN (first_nonmergeable_order, info2->order); > } > + if (k == 0) > + { > + if (last_order == try_order) > + break; > + /* If this failed, but only because we grew > + try_order, retry with the last working one, > + so that we merge at least something. */ > + try_order = last_order; > + last_iter = true; > + continue; > + } > + last_order = try_order; > + /* Retry with a larger try_order to see if we could > + merge some further INTEGER_CST stores. */ > + if (max_order > + && (first_nonmergeable_int_order > + < first_nonmergeable_order)) > + { > + try_order = MIN (max_order, > + first_nonmergeable_order); > + try_order > + = MIN (try_order, > + merged_store->first_nonmergeable_order); > + if (try_order > last_order && ++attempts < 16) > + continue; > + } > + first_nonmergeable_order > + = MIN (first_nonmergeable_order, > + first_nonmergeable_int_order); > + end = this_end; > + break; > } > + while (1); > > if (k != 0) > { > merged_store->merge_overlapping (info); > > + merged_store->first_nonmergeable_order > + = MIN (merged_store->first_nonmergeable_order, > + first_nonmergeable_order); > + > for (unsigned int j = i + 1; j <= k; j++) > { > store_immediate_info *info2 = m_store_info[j]; > @@ -2778,7 +2838,8 @@ imm_store_chain_info::coalesce_immediate > if (info2->order < last_order) > { > gcc_assert (info2->rhs_code == INTEGER_CST); > - merged_store->merge_overlapping (info2); > + if (info != info2) > + merged_store->merge_overlapping (info2); > } > /* Other stores are kept and not merged in any > way. */ > --- gcc/testsuite/gcc.dg/store_merging_24.c.jj 2018-11-02 > 15:36:59.829518310 +0100 > +++ gcc/testsuite/gcc.dg/store_merging_24.c 2018-11-02 15:36:59.829518310 > +0100 > @@ -0,0 +1,75 @@ > +/* PR tree-optimization/87859 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-store-merging-details" } */ > +/* { dg-final { scan-tree-dump "New sequence of \[23] stores to replace old > one of 19 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */ > +/* { dg-final { scan-tree-dump "New sequence of 1 stores to replace old one > of 6 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */ > + > +struct S { > + union F { > + struct T { > +#define A(n) unsigned n:1; > +#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \ > + A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) > + B(f) B(f1) B(f2) B(f3) B(f4) B(f5) > + A(f60) A(f61) A(f62) A(f63) A(f64) A(f65) A(f66) > + } q; > + unsigned int i[3]; > + } f; > +}; > + > +struct S s = { > + .f.q.f0 = 1, .f.q.f1 = 1, .f.q.f2 = 1, .f.q.f5 = 1, .f.q.f6 = 1, > + .f.q.f7 = 1, .f.q.f8 = 1, .f.q.f9 = 1, .f.q.f13 = 1, .f.q.f14 = 1, > + .f.q.f15 = 1, .f.q.f16 = 1, .f.q.f17 = 1, .f.q.f19 = 1, .f.q.f21 = 1, > + .f.q.f66 = 1 > +}; > + > +__attribute__((noipa)) void > +bar (unsigned *x) > +{ > + if (x[0] != s.f.i[0] || x[1] != s.f.i[1] || x[2] != s.f.i[2]) > + __builtin_abort (); > +} > + > +__attribute__((noipa)) void > +foo (unsigned char *state) > +{ > + struct S i; > + i.f.i[0] = 0; > + i.f.i[1] = 0; > + i.f.i[2] = 0; > + i.f.q.f7 = 1; > + i.f.q.f2 = 1; > + i.f.q.f21 = 1; > + i.f.q.f19 = 1; > + i.f.q.f14 = 1; > + i.f.q.f5 = 1; > + i.f.q.f0 = 1; > + i.f.q.f15 = 1; > + i.f.q.f16 = 1; > + i.f.q.f6 = 1; > + i.f.q.f9 = 1; > + i.f.q.f17 = 1; > + i.f.q.f1 = 1; > + i.f.q.f8 = 1; > + i.f.q.f13 = 1; > + i.f.q.f66 = 1; > + if (*state) > + { > + i.f.q.f37 = 1; > + i.f.q.f38 = 1; > + i.f.q.f39 = 1; > + i.f.q.f40 = 1; > + i.f.q.f41 = 1; > + i.f.q.f36 = 1; > + } > + bar (i.f.i); > +} > + > +int > +main () > +{ > + unsigned char z = 0; > + foo (&z); > + return 0; > +} > --- gcc/testsuite/gcc.dg/store_merging_25.c.jj 2018-11-05 > 10:28:41.544249761 +0100 > +++ gcc/testsuite/gcc.dg/store_merging_25.c 2018-11-05 10:28:04.033864475 > +0100 > @@ -0,0 +1,75 @@ > +/* PR tree-optimization/87859 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-store-merging-details" } */ > +/* { dg-final { scan-tree-dump "New sequence of \[23] stores to replace old > one of 14 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */ > +/* { dg-final { scan-tree-dump "New sequence of 1 stores to replace old one > of 6 stores" "store-merging" { target i?86-*-* x86_64-*-* } } } */ > + > +struct S { > + union F { > + struct T { > +#define A(n) unsigned n:1; > +#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \ > + A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) > + B(f) B(f1) B(f2) B(f3) B(f4) B(f5) > + A(f60) A(f61) A(f62) A(f63) A(f64) A(f65) A(f66) > + } q; > + unsigned int i[3]; > + } f; > +}; > + > +struct S s = { > + .f.q.f0 = 1, .f.q.f1 = 1, .f.q.f2 = 1, .f.q.f5 = 1, .f.q.f6 = 1, > + .f.q.f7 = 1, .f.q.f8 = 1, .f.q.f9 = 1, .f.q.f13 = 1, .f.q.f14 = 1, > + .f.q.f15 = 1, .f.q.f16 = 1, .f.q.f17 = 1, .f.q.f19 = 1, .f.q.f21 = 1, > + .f.q.f66 = 1 > +}; > + > +__attribute__((noipa)) void > +bar (unsigned *x) > +{ > + if (x[0] != s.f.i[0] || x[1] != s.f.i[1] || x[2] != s.f.i[2]) > + __builtin_abort (); > +} > + > +__attribute__((noipa)) void > +foo (unsigned char *state, unsigned char c) > +{ > + struct S i; > + i.f.i[0] = 0; > + i.f.i[1] = 0; > + i.f.i[2] = 0; > + i.f.q.f7 = 1; > + i.f.q.f2 = 1; > + i.f.q.f21 = 1; > + i.f.q.f19 = 1; > + i.f.q.f14 = 1; > + i.f.q.f5 = 1; > + i.f.q.f0 = 1; > + i.f.q.f15 = 1; > + i.f.q.f16 = 1; > + i.f.q.f6 = 1; > + i.f.q.f9 = 1; > + i.f.q.f17 = c; > + i.f.q.f1 = 1; > + i.f.q.f8 = 1; > + i.f.q.f13 = 1; > + i.f.q.f66 = 1; > + if (*state) > + { > + i.f.q.f37 = 1; > + i.f.q.f38 = 1; > + i.f.q.f39 = 1; > + i.f.q.f40 = 1; > + i.f.q.f41 = 1; > + i.f.q.f36 = 1; > + } > + bar (i.f.i); > +} > + > +int > +main () > +{ > + unsigned char z = 0; > + foo (&z, 1); > + return 0; > +} > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)