On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: >Hi Richard, > >On 21/10/16 13:37, Richard Biener wrote: >> On Tue, 18 Oct 2016, Kyrill Tkachov wrote: >> >>> Hi Richard, >>> >>> This patch is a merge of [1] and [2] and implements the manual >merging of >>> bitfields >>> as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN >too. >>> It caused me a lot of headeache because the bit offset is counted >from the >>> most significant bit >>> in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN >looks >>> irrelevant for store merging >>> anyway as it's just used to described RTL extract operations). >>> I've included ASCII diagrams of the steps in the merging algorithm. >> Heh, thanks. >> >>> Bootstrapped and tested on arm-none-linux-gnueabihf, >aarch64-none-linux-gnu, >>> x86_64-unknown-linux-gnu. >>> Also tested on aarch64_be-none-elf. >>> >>> How does this version look now? >> Mostly good. For >> >> +bool >> +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree >base, >> + gimple *stmt) >> +{ >> ... >> + /* Check if the assignment destination (BASE) is part of a store >chain. >> + This is to catch non-constant stores to destinations that may >be >> part >> + of a chain. */ >> + if (base) >> + { >> + chain_info = m_stores.get (base); >> + if (chain_info) >> + { >> + struct store_immediate_info *info; >> + unsigned int i; >> + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) >> + { >> + if (refs_may_alias_p (info->dest, dest)) >> + { >> >> I suppose the chain is not yet sorted in any way? >> >> At least for 'dest' which do not have a known constant offset we >> could do >> >> if (base) >> terminate_and_release_chain (base); > >Do you mean when get_inner_reference returns non-NULL for POFFSET?
Yes. >Or do you think we should try to look into dest in this function? > >> to speed things up? IIRC we do not terminate chains early in >> this phase when we have enough stores to form a group, so >> writing a testcase that triggers quadraticness would be as simple >> as having >> >> char a[1000000]; >> >> void foo () >> { >> a[0] = 1; >> a[1] = 2; >> .... >> a[9999999] = 3; >> } >> >> ? >> >> so I think you probably want to limit the number of stores you >> ever put onto a chain and if you reach that limit, terminate >> and release it? Like just choose 16 or 64? (and experiment >> with the above kind of testcases) > >I was initially thinking of imposing such a limit as well but >later I thought we'd want to extend the output code to be able to emit >a memcpy (or memset) call for large regions, so detecting the largest >possible >regions would be needed. But then we need a better data structure here to avoid the quadraticness. But that is not implemented yet (though I have >experimented >with it) so I can add a limit here. Should I just hardcode a limit or >should I make it >into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)? A param is preferred. >> >> + bit_off = byte_off << LOG2_BITS_PER_UNIT; >> + if (!wi::neg_p (bit_off) && wi::fits_shwi_p >(bit_off)) >> + { >> + bitpos += bit_off.to_shwi (); >> + >> >> I think you want bit_off += bitpos before the fits_shwi check >> otherwise this add may still overflow. >> >> + base_addr = copy_node (base_addr); >> + TREE_OPERAND (base_addr, 1) >> + = build_zero_cst (TREE_TYPE (TREE_OPERAND ( >> + base_addr, >1))); >> >> I'd prefer >> >> base_addr = build2 (MEM_REF, ...); >> >> here. > >Thanks for the feedback, >Kyrill > >> >> Thanks, >> Richard. >> >>> Thanks, >>> Kyrill >>> >>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html >>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html >>> >>> 2016-10-18 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> >>> PR middle-end/22141 >>> * Makefile.in (OBJS): Add gimple-ssa-store-merging.o. >>> * common.opt (fstore-merging): New Optimization option. >>> * opts.c (default_options_table): Add entry for >>> OPT_ftree_store_merging. >>> * fold-const.h (can_native_encode_type_p): Declare prototype. >>> * fold-const.c (can_native_encode_type_p): Define. >>> * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define. >>> * passes.def: Insert pass_tree_store_merging. >>> * tree-pass.h (make_pass_store_merging): Declare extern >>> prototype. >>> * gimple-ssa-store-merging.c: New file. >>> * doc/invoke.texi (Optimization Options): Document >>> -fstore-merging. >>> >>> 2016-10-18 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> Jakub Jelinek <ja...@redhat.com> >>> Andrew Pinski <pins...@gmail.com> >>> >>> PR middle-end/22141 >>> PR rtl-optimization/23684 >>> * gcc.c-torture/execute/pr22141-1.c: New test. >>> * gcc.c-torture/execute/pr22141-2.c: Likewise. >>> * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging. >>> * gcc.target/aarch64/ldp_stp_4.c: Likewise. >>> * gcc.dg/store_merging_1.c: New test. >>> * gcc.dg/store_merging_2.c: Likewise. >>> * gcc.dg/store_merging_3.c: Likewise. >>> * gcc.dg/store_merging_4.c: Likewise. >>> * gcc.dg/store_merging_5.c: Likewise. >>> * gcc.dg/store_merging_6.c: Likewise. >>> * gcc.dg/store_merging_7.c: Likewise. >>> * gcc.target/i386/pr22141.c: Likewise. >>> * gcc.target/i386/pr34012.c: Add -fno-store-merging to >dg-options. >>> * g++.dg/init/new17.C: Likewise. >>>