On Wed, 26 Feb 2020, Richard Biener wrote:
> On Wed, 26 Feb 2020, luoxhu wrote:
>
> > On 2020/2/18 17:57, Richard Biener wrote:
> > > On Tue, 18 Feb 2020, Xionghu Luo wrote:
> > >
> > >> Store-merging pass should run twice, the reason is pass fre/pre will do
> > >> some kind of optimizations to instructions by:
> > >> 1. Converting the load from address to load from function arguments
> > >> (store_merging_30.c:foo1).
> > >> 2. Converting the byte access to
> > >> BIT_FIELD_REF(store_merging_30.c:foo2).
> > >> 3. Other bitfield combinations or potential interference
> > >> optimizations etc.
> > >> These optimizations will break the store chain, store-merging pass fails
> > >> to catch such kind of pattern so stores are not merged in middle end,
> > >> then consecutive stb/sth instructions(should be merged to stw) are
> > >> emitted
> > >> finally.
> > >>
> > >> And why not directly move store-merging pass(numbered 194) just before
> > >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by
> > >> store_merging1, and 4 merges are done fore store_merge2. So, keep the
> > >> original store_merge as store_merge2 as store merge may be still
> > >> available
> > >> after other pass optimizations. Most of the 30 store_merging_N.c test
> > >> case dg-final pass name would be updated from store-merging to
> > >> store-merging1 once this RFC patch idea got confirmed.
> > >> Any comments? Thanks.
> > >
> > > Generally trying to solve a pass ordering issue by re-ordering/duplicating
> > > passes might help a single testcase but will surely pessimize others.
> > > So that's a no-go.
> > >
> > > What the testcase shows is that store-merging needs to operate
> > > similar to bswap when trying to find a "common" source for a combined
> > > store. That is, it should appropriately follow defs. For foo2 I expect
> > > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity
> > > here (but Jakub is working on that one IIRC).
> >
> > Thanks Richard, not sure about my understanding and please correct if any.
> >
> > I tried Jukub's latest patch of "sccvn: Handle bitfields in
> > push_partial_def".
> > Got to know fre pass checks the load instruction's vuse chain and do the
> > constant
> > bitfield combines in push_partial_def, then
> > native_encode_expr/native_interpret_expr
> > can decode and encode the constant content and shift/combine the data.
> > This should be based on one change to my test case(by adding return
> > page->counters;)
> > to trigger the fre pass push all SSA name's partial_defs. Currently, for
> > SSA variables,
> > this encode/interpret is not supported yet, I suppose this is the
> > opportunity you mean.
> > As this case's input is not constant, so Jukub's patch doesn't fix it.
> >
> > struct page
> > {
> > union
> > {
> > unsigned counters;
> > struct
> > {
> > union
> > {
> > struct
> > {
> > unsigned inuse : 16;
> > unsigned inuse2 : 8;
> > unsigned objects : 5;
> > unsigned frozen : 3;
> > };
> > };
> > };
> > };
> > };
> >
> > unsigned
> > foo1 (struct page *page, unsigned long counters_new)
> > {
> > struct page tmp;
> > tmp.counters = counters_new;
> > page->inuse = tmp.inuse;
> > page->inuse2 = tmp.inuse2;
> > page->objects = tmp.objects;
> > page->frozen = tmp.frozen;
> > return page->counters;
> > }
> >
> > If "return page->counters;" is removed, this case won't match the fre's
> > current code path
> > in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters)
> > to walk all the vuses.
> > So it seems not a fre opportunity but exactly a store merging issue(Also
> > checked llvm that it
> > doesn't generate byte store and short store so it only produces 1 stw for
> > all patterns).
>
> Huh, but the code has the return page->counters and you expec it to
> return counters_new unchanged. Yes, you also expect the code to be
> optimized to
>
> page->counters = counters_new;
> return counters_new;
>
> and that part would indeed be a store-merging opportunity. Currently
> FRE does
>
> _1 = (unsigned int) counters_new_6(D);
> tmp.D.1943.counters = _1;
> _18 = (short unsigned int) counters_new_6(D);
> page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18;
> _19 = BIT_FIELD_REF <_1, 8, 16>;
> page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19;
> _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects;
> page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4;
> _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen;
> page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5;
>
> so it fails to optimize the other loads to BIT_FIELD_REFs of _1. If it
> did that then store-merging woudl already handle the code I think.
FRE fails to do that because of
/* ??? We can't handle bitfield precision extracts without
either using an alternate type for the BIT_FIELD_REF and
then doing a conversion or possibly adjusting the offset
according to endianness. */
&& (! INTEGRAL_TYPE_P (vr->type)
|| known_eq (ref->size, TYPE_PRECISION (vr->type)))
&& multiple_p (ref->size, BITS_PER_UNIT))
...
|| type_has_mode_precision_p (TREE_TYPE (def_rhs)))
where I applied the FUD rule and backed off possible endianess issues...
Richard.
> > Optimize this case in pass store-merging is reasonable, just as you said,
> > "trying to find a "common" source for a combined store.", it requires break
> > store-merging pass's behavior: so far store-merging pass track all store
> > instruction's RHS and stops when RHS is a load instruction with <base_addr,
> > bitsize,
> > bitpos, bitregion_start, bitregion_end> extracted, convert instructions and
> > bitfield
> > instructions generated by pass fre are ignored in
> > pass_store_merging::process_store.
> > To teach the process_store capture real common source requires refactoring
> > handled_load
> > to continue tracking even RHS is a load instruction and support the convert
> > instructions
> > and bitfield_instructions, this seems to be a big change.
>
> Well, as said above FRE would elide the loads and you'll see a series of
> stores with BIT_FIELD_REF as sources.
>
> > Regarding to Jakub's comments, merging could reduce many byte stores and
> > half stores
> > to improve performance for this type of case. There is already an
> > 033t.esra running before,
> > and not sure whether SRA should replace such kind of bitfield operations.
> > Adding a store-merging pass is so simple and many passes run more than
> > once...
> > So which would be best for this optimization, please? Thanks again :)
>
> I think the testcase as-is might be also optimized with Andrews pending
> work to lower bitfield accesses. So a slightly adjusted testcase not
> using bitfields but maybe char (or mixed char/short) members would be
> useful to track as well. There SRA might apply (SRA backs off of
> doing anything to bitfields currently).
>
> Richard.
>
> > Xionghu
> >
> > >
> > > Richard.
> > >
> > >> PS:
> > >> Before this patch, store_merging_30.c.035t.fre1:
> > >>
> > >> ... foo1:
> > >> Inserted _13 = (short unsigned int) counters_new_5(D);
> > >> Replaced tmp.D.2912.D.2911.D.2910.D.2909.inuse with _13 in all uses of
> > >> _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse;
> > >> Removing dead stmt _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse;
> > >> ... foo2:
> > >> Inserted _17 = BIT_FIELD_REF <_1, 8, 16>;
> > >> Replaced tmp.D.2926.D.2925.D.2924.D.2923.objects with _17 in all uses of
> > >> _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects;
> > >> Removing dead stmt _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects;
> > >>
> > >> foo1 asm:
> > >> rldicl 9,4,48,48
> > >> sth 4,0(3)
> > >> sth 9,2(3)
> > >> blr
> > >>
> > >> With this patch(similar for foo2):
> > >>
> > >> stw r4,0(r3)
> > >> blr
> > >>
> > >> gcc/ChangeLog
> > >>
> > >> 2020-02-18 Xiong Hu Luo <[email protected]>
> > >>
> > >> Part of PR middle-end/71509
> > >> gimple-ssa-store-merging.c (clone): New.
> > >> passes.def (pass_store_merging): New.
> > >>
> > >> gcc/testsuite/ChangeLog
> > >>
> > >> 2020-02-18 Xiong Hu Luo <[email protected]>
> > >>
> > >> Part of PR middle-end/71509
> > >> testsuite/gcc.dg/store_merging_14.c: Update.
> > >> testsuite/gcc.dg/store_merging_30.c: New.
> > >> ---
> > >> gcc/gimple-ssa-store-merging.c | 2 +
> > >> gcc/passes.def | 1 +
> > >> gcc/testsuite/gcc.dg/store_merging_14.c | 3 +-
> > >> gcc/testsuite/gcc.dg/store_merging_30.c | 86 +++++++++++++++++++++++++
> > >> 4 files changed, 91 insertions(+), 1 deletion(-)
> > >> create mode 100644 gcc/testsuite/gcc.dg/store_merging_30.c
> > >>
> > >> diff --git a/gcc/gimple-ssa-store-merging.c
> > >> b/gcc/gimple-ssa-store-merging.c
> > >> index 8371323ef4a..9a5bd49fc3a 100644
> > >> --- a/gcc/gimple-ssa-store-merging.c
> > >> +++ b/gcc/gimple-ssa-store-merging.c
> > >> @@ -2156,6 +2156,8 @@ public:
> > >> {
> > >> }
> > >>
> > >> + opt_pass * clone () { return new pass_store_merging (m_ctxt); }
> > >> +
> > >> /* Pass not supported for PDP-endian, nor for insane hosts or
> > >> target character sizes where native_{encode,interpret}_expr
> > >> doesn't work properly. */
> > >> diff --git a/gcc/passes.def b/gcc/passes.def
> > >> index 2bf2cb78fc5..e531531cb14 100644
> > >> --- a/gcc/passes.def
> > >> +++ b/gcc/passes.def
> > >> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3. If not see
> > >> /* pass_build_ealias is a dummy pass that ensures that we
> > >> execute TODO_rebuild_alias at this point. */
> > >> NEXT_PASS (pass_build_ealias);
> > >> + NEXT_PASS (pass_store_merging);
> > >> NEXT_PASS (pass_fre, true /* may_iterate */);
> > >> NEXT_PASS (pass_early_vrp);
> > >> NEXT_PASS (pass_merge_phi);
> > >> diff --git a/gcc/testsuite/gcc.dg/store_merging_14.c
> > >> b/gcc/testsuite/gcc.dg/store_merging_14.c
> > >> index 9310aaf3489..bd120d18ac6 100644
> > >> --- a/gcc/testsuite/gcc.dg/store_merging_14.c
> > >> +++ b/gcc/testsuite/gcc.dg/store_merging_14.c
> > >> @@ -214,4 +214,5 @@ main ()
> > >> return 0;
> > >> }
> > >>
> > >> -/* { dg-final { scan-tree-dump-times "Merging successful" 9
> > >> "store-merging" } } */
> > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 5
> > >> "store-merging1" } } */
> > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 4
> > >> "store-merging2" } } */
> > >> diff --git a/gcc/testsuite/gcc.dg/store_merging_30.c
> > >> b/gcc/testsuite/gcc.dg/store_merging_30.c
> > >> new file mode 100644
> > >> index 00000000000..71369c3b196
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.dg/store_merging_30.c
> > >> @@ -0,0 +1,86 @@
> > >> +/* { dg-do run } */
> > >> +/* { dg-require-effective-target store_merge } */
> > >> +/* { dg-options "-O2 -fdump-tree-store-merging" } */
> > >> +
> > >> +typedef unsigned int atomic_t;
> > >> +
> > >> +struct page
> > >> +{
> > >> + union
> > >> + {
> > >> + unsigned long counters;
> > >> + struct
> > >> + {
> > >> + union
> > >> + {
> > >> + struct
> > >> + {
> > >> + unsigned inuse : 16;
> > >> + unsigned objects : 15;
> > >> + unsigned frozen : 1;
> > >> + };
> > >> + };
> > >> + };
> > >> + };
> > >> +};
> > >> +
> > >> +struct page2
> > >> +{
> > >> + union
> > >> + {
> > >> + unsigned counters;
> > >> + struct
> > >> + {
> > >> + union
> > >> + {
> > >> + struct
> > >> + {
> > >> + unsigned inuse : 16;
> > >> + unsigned objects : 8;
> > >> + unsigned frozen : 8;
> > >> + };
> > >> + };
> > >> + };
> > >> + };
> > >> +};
> > >> +
> > >> +__attribute__((noipa)) void
> > >> +foo1 (struct page *page, unsigned long counters_new)
> > >> +{
> > >> + struct page tmp;
> > >> + tmp.counters = counters_new;
> > >> + page->inuse = tmp.inuse;
> > >> + page->objects = tmp.objects;
> > >> + page->frozen = tmp.frozen;
> > >> +}
> > >> +
> > >> +__attribute__((noipa)) void
> > >> +foo2 (struct page2 *page2, unsigned long counters_new)
> > >> +{
> > >> + struct page2 tmp;
> > >> + tmp.counters = counters_new;
> > >> + page2->inuse = tmp.inuse;
> > >> + page2->objects = tmp.objects;
> > >> + page2->frozen = tmp.frozen;
> > >> +}
> > >> +
> > >> +int main ()
> > >> +{
> > >> + struct page page;
> > >> + foo1 (&page, 0x12345678ABCDEFFEUL);
> > >> +
> > >> + asm volatile ("" : : : "memory");
> > >> + if (page.frozen != 1 || page.objects != 0x2bcd || page.inuse !=
> > >> 0xeffe)
> > >> + __builtin_abort ();
> > >> +
> > >> + struct page2 page2;
> > >> + foo2 (&page2, 0x12345678ABCDEFFEUL);
> > >> +
> > >> + asm volatile ("" : : : "memory");
> > >> +
> > >> + if (page2.frozen != 0xab || page2.objects != 0xcd || page2.inuse !=
> > >> 0xeffe)
> > >> + __builtin_abort ();
> > >> + return 0;
> > >> +}
> > >> +
> > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 2
> > >> "store-merging1" } } */
> > >>
> > >
> >
> >
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)