On Wed, 26 Feb 2020, Richard Biener wrote: > On Wed, 26 Feb 2020, luoxhu wrote: > > > On 2020/2/18 17:57, Richard Biener wrote: > > > On Tue, 18 Feb 2020, Xionghu Luo wrote: > > > > > >> Store-merging pass should run twice, the reason is pass fre/pre will do > > >> some kind of optimizations to instructions by: > > >> 1. Converting the load from address to load from function arguments > > >> (store_merging_30.c:foo1). > > >> 2. Converting the byte access to > > >> BIT_FIELD_REF(store_merging_30.c:foo2). > > >> 3. Other bitfield combinations or potential interference > > >> optimizations etc. > > >> These optimizations will break the store chain, store-merging pass fails > > >> to catch such kind of pattern so stores are not merged in middle end, > > >> then consecutive stb/sth instructions(should be merged to stw) are > > >> emitted > > >> finally. > > >> > > >> And why not directly move store-merging pass(numbered 194) just before > > >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by > > >> store_merging1, and 4 merges are done fore store_merge2. So, keep the > > >> original store_merge as store_merge2 as store merge may be still > > >> available > > >> after other pass optimizations. Most of the 30 store_merging_N.c test > > >> case dg-final pass name would be updated from store-merging to > > >> store-merging1 once this RFC patch idea got confirmed. > > >> Any comments? Thanks. > > > > > > Generally trying to solve a pass ordering issue by re-ordering/duplicating > > > passes might help a single testcase but will surely pessimize others. > > > So that's a no-go. > > > > > > What the testcase shows is that store-merging needs to operate > > > similar to bswap when trying to find a "common" source for a combined > > > store. That is, it should appropriately follow defs. For foo2 I expect > > > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity > > > here (but Jakub is working on that one IIRC). > > > > Thanks Richard, not sure about my understanding and please correct if any. > > > > I tried Jukub's latest patch of "sccvn: Handle bitfields in > > push_partial_def". > > Got to know fre pass checks the load instruction's vuse chain and do the > > constant > > bitfield combines in push_partial_def, then > > native_encode_expr/native_interpret_expr > > can decode and encode the constant content and shift/combine the data. > > This should be based on one change to my test case(by adding return > > page->counters;) > > to trigger the fre pass push all SSA name's partial_defs. Currently, for > > SSA variables, > > this encode/interpret is not supported yet, I suppose this is the > > opportunity you mean. > > As this case's input is not constant, so Jukub's patch doesn't fix it. > > > > struct page > > { > > union > > { > > unsigned counters; > > struct > > { > > union > > { > > struct > > { > > unsigned inuse : 16; > > unsigned inuse2 : 8; > > unsigned objects : 5; > > unsigned frozen : 3; > > }; > > }; > > }; > > }; > > }; > > > > unsigned > > foo1 (struct page *page, unsigned long counters_new) > > { > > struct page tmp; > > tmp.counters = counters_new; > > page->inuse = tmp.inuse; > > page->inuse2 = tmp.inuse2; > > page->objects = tmp.objects; > > page->frozen = tmp.frozen; > > return page->counters; > > } > > > > If "return page->counters;" is removed, this case won't match the fre's > > current code path > > in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters) > > to walk all the vuses. > > So it seems not a fre opportunity but exactly a store merging issue(Also > > checked llvm that it > > doesn't generate byte store and short store so it only produces 1 stw for > > all patterns). > > Huh, but the code has the return page->counters and you expec it to > return counters_new unchanged. Yes, you also expect the code to be > optimized to > > page->counters = counters_new; > return counters_new; > > and that part would indeed be a store-merging opportunity. Currently > FRE does > > _1 = (unsigned int) counters_new_6(D); > tmp.D.1943.counters = _1; > _18 = (short unsigned int) counters_new_6(D); > page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18; > _19 = BIT_FIELD_REF <_1, 8, 16>; > page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19; > _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects; > page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4; > _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen; > page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5; > > so it fails to optimize the other loads to BIT_FIELD_REFs of _1. If it > did that then store-merging woudl already handle the code I think.
FRE fails to do that because of /* ??? We can't handle bitfield precision extracts without either using an alternate type for the BIT_FIELD_REF and then doing a conversion or possibly adjusting the offset according to endianness. */ && (! INTEGRAL_TYPE_P (vr->type) || known_eq (ref->size, TYPE_PRECISION (vr->type))) && multiple_p (ref->size, BITS_PER_UNIT)) ... || type_has_mode_precision_p (TREE_TYPE (def_rhs))) where I applied the FUD rule and backed off possible endianess issues... Richard. > > Optimize this case in pass store-merging is reasonable, just as you said, > > "trying to find a "common" source for a combined store.", it requires break > > store-merging pass's behavior: so far store-merging pass track all store > > instruction's RHS and stops when RHS is a load instruction with <base_addr, > > bitsize, > > bitpos, bitregion_start, bitregion_end> extracted, convert instructions and > > bitfield > > instructions generated by pass fre are ignored in > > pass_store_merging::process_store. > > To teach the process_store capture real common source requires refactoring > > handled_load > > to continue tracking even RHS is a load instruction and support the convert > > instructions > > and bitfield_instructions, this seems to be a big change. > > Well, as said above FRE would elide the loads and you'll see a series of > stores with BIT_FIELD_REF as sources. > > > Regarding to Jakub's comments, merging could reduce many byte stores and > > half stores > > to improve performance for this type of case. There is already an > > 033t.esra running before, > > and not sure whether SRA should replace such kind of bitfield operations. > > Adding a store-merging pass is so simple and many passes run more than > > once... > > So which would be best for this optimization, please? Thanks again :) > > I think the testcase as-is might be also optimized with Andrews pending > work to lower bitfield accesses. So a slightly adjusted testcase not > using bitfields but maybe char (or mixed char/short) members would be > useful to track as well. There SRA might apply (SRA backs off of > doing anything to bitfields currently). > > Richard. > > > Xionghu > > > > > > > > Richard. > > > > > >> PS: > > >> Before this patch, store_merging_30.c.035t.fre1: > > >> > > >> ... foo1: > > >> Inserted _13 = (short unsigned int) counters_new_5(D); > > >> Replaced tmp.D.2912.D.2911.D.2910.D.2909.inuse with _13 in all uses of > > >> _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse; > > >> Removing dead stmt _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse; > > >> ... foo2: > > >> Inserted _17 = BIT_FIELD_REF <_1, 8, 16>; > > >> Replaced tmp.D.2926.D.2925.D.2924.D.2923.objects with _17 in all uses of > > >> _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects; > > >> Removing dead stmt _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects; > > >> > > >> foo1 asm: > > >> rldicl 9,4,48,48 > > >> sth 4,0(3) > > >> sth 9,2(3) > > >> blr > > >> > > >> With this patch(similar for foo2): > > >> > > >> stw r4,0(r3) > > >> blr > > >> > > >> gcc/ChangeLog > > >> > > >> 2020-02-18 Xiong Hu Luo <luo...@linux.ibm.com> > > >> > > >> Part of PR middle-end/71509 > > >> gimple-ssa-store-merging.c (clone): New. > > >> passes.def (pass_store_merging): New. > > >> > > >> gcc/testsuite/ChangeLog > > >> > > >> 2020-02-18 Xiong Hu Luo <luo...@linux.ibm.com> > > >> > > >> Part of PR middle-end/71509 > > >> testsuite/gcc.dg/store_merging_14.c: Update. > > >> testsuite/gcc.dg/store_merging_30.c: New. > > >> --- > > >> gcc/gimple-ssa-store-merging.c | 2 + > > >> gcc/passes.def | 1 + > > >> gcc/testsuite/gcc.dg/store_merging_14.c | 3 +- > > >> gcc/testsuite/gcc.dg/store_merging_30.c | 86 +++++++++++++++++++++++++ > > >> 4 files changed, 91 insertions(+), 1 deletion(-) > > >> create mode 100644 gcc/testsuite/gcc.dg/store_merging_30.c > > >> > > >> diff --git a/gcc/gimple-ssa-store-merging.c > > >> b/gcc/gimple-ssa-store-merging.c > > >> index 8371323ef4a..9a5bd49fc3a 100644 > > >> --- a/gcc/gimple-ssa-store-merging.c > > >> +++ b/gcc/gimple-ssa-store-merging.c > > >> @@ -2156,6 +2156,8 @@ public: > > >> { > > >> } > > >> > > >> + opt_pass * clone () { return new pass_store_merging (m_ctxt); } > > >> + > > >> /* Pass not supported for PDP-endian, nor for insane hosts or > > >> target character sizes where native_{encode,interpret}_expr > > >> doesn't work properly. */ > > >> diff --git a/gcc/passes.def b/gcc/passes.def > > >> index 2bf2cb78fc5..e531531cb14 100644 > > >> --- a/gcc/passes.def > > >> +++ b/gcc/passes.def > > >> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3. If not see > > >> /* pass_build_ealias is a dummy pass that ensures that we > > >> execute TODO_rebuild_alias at this point. */ > > >> NEXT_PASS (pass_build_ealias); > > >> + NEXT_PASS (pass_store_merging); > > >> NEXT_PASS (pass_fre, true /* may_iterate */); > > >> NEXT_PASS (pass_early_vrp); > > >> NEXT_PASS (pass_merge_phi); > > >> diff --git a/gcc/testsuite/gcc.dg/store_merging_14.c > > >> b/gcc/testsuite/gcc.dg/store_merging_14.c > > >> index 9310aaf3489..bd120d18ac6 100644 > > >> --- a/gcc/testsuite/gcc.dg/store_merging_14.c > > >> +++ b/gcc/testsuite/gcc.dg/store_merging_14.c > > >> @@ -214,4 +214,5 @@ main () > > >> return 0; > > >> } > > >> > > >> -/* { dg-final { scan-tree-dump-times "Merging successful" 9 > > >> "store-merging" } } */ > > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 5 > > >> "store-merging1" } } */ > > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 4 > > >> "store-merging2" } } */ > > >> diff --git a/gcc/testsuite/gcc.dg/store_merging_30.c > > >> b/gcc/testsuite/gcc.dg/store_merging_30.c > > >> new file mode 100644 > > >> index 00000000000..71369c3b196 > > >> --- /dev/null > > >> +++ b/gcc/testsuite/gcc.dg/store_merging_30.c > > >> @@ -0,0 +1,86 @@ > > >> +/* { dg-do run } */ > > >> +/* { dg-require-effective-target store_merge } */ > > >> +/* { dg-options "-O2 -fdump-tree-store-merging" } */ > > >> + > > >> +typedef unsigned int atomic_t; > > >> + > > >> +struct page > > >> +{ > > >> + union > > >> + { > > >> + unsigned long counters; > > >> + struct > > >> + { > > >> + union > > >> + { > > >> + struct > > >> + { > > >> + unsigned inuse : 16; > > >> + unsigned objects : 15; > > >> + unsigned frozen : 1; > > >> + }; > > >> + }; > > >> + }; > > >> + }; > > >> +}; > > >> + > > >> +struct page2 > > >> +{ > > >> + union > > >> + { > > >> + unsigned counters; > > >> + struct > > >> + { > > >> + union > > >> + { > > >> + struct > > >> + { > > >> + unsigned inuse : 16; > > >> + unsigned objects : 8; > > >> + unsigned frozen : 8; > > >> + }; > > >> + }; > > >> + }; > > >> + }; > > >> +}; > > >> + > > >> +__attribute__((noipa)) void > > >> +foo1 (struct page *page, unsigned long counters_new) > > >> +{ > > >> + struct page tmp; > > >> + tmp.counters = counters_new; > > >> + page->inuse = tmp.inuse; > > >> + page->objects = tmp.objects; > > >> + page->frozen = tmp.frozen; > > >> +} > > >> + > > >> +__attribute__((noipa)) void > > >> +foo2 (struct page2 *page2, unsigned long counters_new) > > >> +{ > > >> + struct page2 tmp; > > >> + tmp.counters = counters_new; > > >> + page2->inuse = tmp.inuse; > > >> + page2->objects = tmp.objects; > > >> + page2->frozen = tmp.frozen; > > >> +} > > >> + > > >> +int main () > > >> +{ > > >> + struct page page; > > >> + foo1 (&page, 0x12345678ABCDEFFEUL); > > >> + > > >> + asm volatile ("" : : : "memory"); > > >> + if (page.frozen != 1 || page.objects != 0x2bcd || page.inuse != > > >> 0xeffe) > > >> + __builtin_abort (); > > >> + > > >> + struct page2 page2; > > >> + foo2 (&page2, 0x12345678ABCDEFFEUL); > > >> + > > >> + asm volatile ("" : : : "memory"); > > >> + > > >> + if (page2.frozen != 0xab || page2.objects != 0xcd || page2.inuse != > > >> 0xeffe) > > >> + __builtin_abort (); > > >> + return 0; > > >> +} > > >> + > > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 2 > > >> "store-merging1" } } */ > > >> > > > > > > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)