On August 25, 2016 3:57:11 PM GMT+02:00, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>Hi Richard,
>
>On 25/08/16 14:01, Richard Biener wrote:
>> On Mon, Aug 22, 2016 at 12:39 PM, Kyrill Tkachov
>> <kyrylo.tkac...@foss.arm.com> wrote:
>>> Hi all,
>>>
>>> This is a v2 of the patch at [1] addressing feedback from Richi.
>>> The biggest changes are in the first phase of the pass that detects
>>> store chains.  This now uses a hash_map to track multiple chains to
>>> different bases, eliminating the need to iterate over a basic block
>>> multiple times and thus reducing the complexity.
>>>
>>> Additionally, merging has been enabled for bitfields that are not a
>>> multiple of BITS_PER_UNIT, as long as the merged group is of a size
>>> that is a multiple of BITS_PER_UNIT.  Handling those shouldn't be
>>> too hard in the future, I think, but would require fiddling with
>>> BIT_FIELD_REF expressions that I'm not very familiar with at the
>>> moment.  Multiple constant stores to the same location are now
>>> supported and the code should do the right thing merging them,
>>> effectively performing a bit of dead store elimination.
>>>
>>> Also, the merging code keeps track of the first and last statement
>>> in the original statement sequence and outputs the new merged
>>> sequence after the last original statement.
>>>
>>> The option for this pass has been renamed to -fstore-merging and
>>> the new file is now named gimple-ssa-store-merging.c.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu, x86_64-linux-gnu.  Additionally tested on
>>> aarch64_be-none-elf.
>>>
>>> Does this look better?
>>>
>>> I'll experiment with placing the pass earlier in the pipeline but I
>>> thought I'd send this version out for more comments.
>>
>> +   Note on endianness and example:
>> +   The bit position that is written in each access is determined by
>> +   get_inner_reference which calculates the position in the object
>> +   rather than the memory itself
>>
>> actually they are the same -- but the endianness and the size of the
>> immediate of course matter when you put it into the "virtual" field
>> at a (memory-)offset.
>>
>> native_encode_* might help you here if you use an unsigned char[]
>> buffer of group size and simply encode the immediates at the
>> get_inner_reference position.  Then you can native_interpret_* the
>> pieces you want to use for the actual merged stores.  That way you
>> can handle arbitrary tcc_constant trees as well.  You can also
>> simply gobble up overlapping stores to the group by encoding them in
>> execution order.
>
>Unfortunately I don't see how that would deal with sub-byte bitfields.
>native_encode_* and native_interpret_* deal with byte quantities
>(native_encode_* zero- or sign-extends the value to the next byte)
>so I can't tell them to start encoding or extracting at a particular
>bit position.

True.  Though you can encode into a byte-aligned buffer and merge that
'shifted'.  IIRC the bitpos is the memory-order bit position and thus
not subject to BITS_BIG_ENDIAN.
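
Untested sketch of what I mean (encode_at_bitpos is a made-up helper
name, the scratch buffer sizing is arbitrary, and masking off bits
that native_encode_expr sign-extends beyond bitsize is omitted):

/* Encode EXPR byte-aligned into a scratch buffer, then OR it into
   GROUP_BUF at bit position BITPOS, carrying bits across byte
   boundaries.  A little-endian sub-byte layout is assumed; a
   big-endian target would shift the other way.  */

static void
encode_at_bitpos (tree expr, unsigned char *group_buf,
                  unsigned HOST_WIDE_INT bitsize,
                  unsigned HOST_WIDE_INT bitpos)
{
  unsigned int shift = bitpos % BITS_PER_UNIT;
  unsigned HOST_WIDE_INT first_byte = bitpos / BITS_PER_UNIT;
  unsigned HOST_WIDE_INT nbytes
    = (bitsize + shift + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
  unsigned char scratch[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT] = {};

  /* Byte-aligned encoding of the constant, as a memory image.  */
  if (native_encode_expr (expr, scratch, sizeof scratch) == 0)
    return;  /* Not encodable; the caller would give up on the chain.  */

  /* Merge 'shifted' into the group buffer.  */
  unsigned char carry = 0;
  for (unsigned HOST_WIDE_INT i = 0; i < nbytes; i++)
    {
      unsigned char byte = (scratch[i] << shift) | carry;
      carry = shift ? scratch[i] >> (BITS_PER_UNIT - shift) : 0;
      group_buf[first_byte + i] |= byte;
    }
}

native_interpret_* can then read the merged pieces straight out of
group_buf.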

Richard.

>Thanks for your valuable feedback.  I'll be working through it.
>
>Kyrill
>
>> This way you should be able to handle merging of two vector stores
>> to one, for example.  Or four half-floats to a larger integer one.
>> But yeah - it'll change your internal rep from wide_int to an
>> unsigned char buffer of variable size ...
>>
>> There is also WORDS_BIG_ENDIAN (for PDP endianness) which you don't
>> seem to handle.
>>
>> +      if (is_gimple_assign (stmt))
>> +        {
>> +          tree lhs = gimple_assign_lhs (stmt);
>> +          tree rhs = gimple_assign_rhs1 (stmt);
>> +
>> +          HOST_WIDE_INT bitsize, bitpos;
>> +          machine_mode mode;
>> +          int unsignedp = 0, reversep = 0, volatilep = 0;
>> +          tree offset, base_addr;
>> +          base_addr
>> +            = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
>> +                                   &unsignedp, &reversep, &volatilep);
>>
>> Please avoid this if !lhs_code_valid_for_store_merging_p.
>>
>> +          /* The ultimate base object may be volatile so catch it here.  */
>> +          if (TREE_THIS_VOLATILE (base_addr))
>>
>> this is wrong (the access may be non-volatile).  Checking for
>> gimple_has_volatile_ops above is enough.
>>
>> I'd say instead of using MAX_STORE_BITSIZE you should be using
>> MOVE_MAX.
>>
>> As you are only interested in stores the is_gimple_assign check
>> should be gimple_assign_single_p (stmt) && gimple_vdef (stmt)
>> instead.
>>
>> +              struct imm_store_chain_info **chain_info
>> +                = m_stores.get (base_addr);
>> +              if (invalid)
>> +                {
>> +                  if (gimple_vuse (stmt) || gimple_vdef (stmt))
>> +                    terminate_all_aliasing_chains (lhs, base_addr, stmt);
>> +
>> +                  continue;
>>
>> looks like you can as well handle this by falling through to (xxx)
>>
>> +                }
>> +              else
>> +                {
>> +
>> +                  store_immediate_info *info;
>> +                  if (chain_info)
>> +                    {
>> +                      info = new store_immediate_info (
>> +                        bitsize, bitpos, rhs, lhs, stmt,
>> +                        (*chain_info)->m_store_info.length ());
>> +                      if (dump_file)
>> +                        {
>> +                          fprintf (dump_file,
>> +                                   "Recording immediate store from stmt:\n");
>> +                          print_gimple_stmt (dump_file, stmt, 0, 0);
>> +                        }
>> +                      (*chain_info)->m_store_info.safe_push (info);
>> +                      continue;
>> +                    }
>> +                  else if (invalid)
>>
>> AFAICS this is never true
>>
>> +                    {
>> +                      terminate_all_aliasing_chains (lhs, base_addr, stmt);
>> +                      continue;
>> +                    }
>> +
>> +                  /* Store aliases any existing chain?  */
>> +                  terminate_all_aliasing_chains (lhs, base_addr, stmt);
>>
>> likewise with respect to fallthrough to (xxx)
>>
>> (xxx)
>> +      /* Check if load/store aliases any existing chains.  */
>> +      hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
>> +        iter
>> +        = m_stores.begin ();
>> +      for (; iter != m_stores.end (); ++iter)
>> +        {
>> +          tree key = (*iter).first;
>> +          if (stmt_may_clobber_ref_p (stmt, key)
>> +              || ref_maybe_used_by_stmt_p (stmt, key))
>> +            terminate_and_release_chain (key);
>> +        }
>>
>> So here it's cheaper to guard the whole loop with
>>
>>   if (gimple_vuse (stmt))
>>
>> I suppose implementing terminate_all_aliasing_chains the same way by
>> simply passing in 'stmt' should simplify it (and factor out that
>> last loop as well).
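
I.e. something like this untested sketch (assuming the walk becomes a
member of the pass class that owns m_stores; the names follow your
patch):

/* Walk all recorded chains once and terminate those that STMT
   clobbers or uses.  Statements without a memory access skip the
   walk entirely.  */

void
pass_store_merging::terminate_all_aliasing_chains (gimple *stmt)
{
  if (!gimple_vuse (stmt))
    return;

  hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
    iter = m_stores.begin ();
  for (; iter != m_stores.end (); ++iter)
    {
      tree key = (*iter).first;
      if (stmt_may_clobber_ref_p (stmt, key)
          || ref_maybe_used_by_stmt_p (stmt, key))
        terminate_and_release_chain (key);
    }
}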
>>
>> So far for now.
>>
>> Richard.
>>
>>> Thanks,
>>> Kyrill
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
>>>
>>> 2016-08-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>     PR middle-end/22141
>>>     * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>>>     * common.opt (fstore-merging): New Optimization option.
>>>     * opts.c (default_options_table): Add entry for
>>>     OPT_ftree_store_merging.
>>>     * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>>>     * passes.def: Insert pass_tree_store_merging.
>>>     * tree-pass.h (make_pass_store_merging): Declare extern
>>>     prototype.
>>>     * gimple-ssa-store-merging.c: New file.
>>>     * doc/invoke.texi (Optimization Options): Document
>>>     -fstore-merging.
>>>
>>> 2016-08-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>             Jakub Jelinek  <ja...@redhat.com>
>>>
>>>     PR middle-end/22141
>>>     * gcc.c-torture/execute/pr22141-1.c: New test.
>>>     * gcc.c-torture/execute/pr22141-2.c: Likewise.
>>>     * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
>>>     * gcc.dg/store_merging_1.c: New test.
>>>     * gcc.dg/store_merging_2.c: Likewise.
>>>     * gcc.dg/store_merging_3.c: Likewise.
>>>     * gcc.dg/store_merging_4.c: Likewise.
>>>     * gcc.dg/store_merging_5.c: Likewise.
>>>     * gcc.dg/store_merging_6.c: Likewise.
>>>     * gcc.target/i386/pr22141.c: Likewise.