On August 25, 2016 3:57:11 PM GMT+02:00, Kyrill Tkachov 
<kyrylo.tkac...@foss.arm.com> wrote:
>Hi Richard,
>
>On 25/08/16 14:01, Richard Biener wrote:
>> On Mon, Aug 22, 2016 at 12:39 PM, Kyrill Tkachov
>> <kyrylo.tkac...@foss.arm.com> wrote:
>>> Hi all,
>>>
>>> This is a v2 of the patch at [1] addressing feedback from Richi.
>>> The most major changes are in the first phase of the pass that detects
>>> store chains.
>>> This now uses a hash_map to track multiple chains to different bases,
>>> eliminating the need to iterate over a basic block multiple times, thus
>>> reducing the complexity.
>>>
>>> Additionally the merging has been enabled for bitfields that are not a
>>> multiple of BITS_PER_UNIT as long as the merged group is of a size that
>>> is a multiple of BITS_PER_UNIT.  Handling those shouldn't be too hard in
>>> the future, I think, but would require fiddling with BIT_FIELD_REF
>>> expressions that I'm not very familiar with at the moment.  Multiple
>>> constant stores to the same location are now supported and the code
>>> should do the right thing merging them, effectively performing a bit of
>>> dead store elimination.
>>>
>>> Also, the merging code keeps track of the first and last statement in
>>> the original statement sequence and outputs the new merged sequence
>>> after the last original statement.
>>>
>>> The option for this pass has been renamed to -fstore-merging and the
>>> new file is now named gimple-ssa-store-merging.c.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu, x86_64-linux-gnu.
>>> Additionally tested on aarch64_be-none-elf.
>>>
>>> Does this look better?
>>>
>>> I'll experiment with placing the pass earlier in the pipeline but I
>>> thought I'd send this version out for more comments.
>> +   Note on endianness and example:
>> +   The bitposition that is written in each access is determined by
>> +   get_inner_reference which calculates the position in the object
>> +   rather than the memory itself
>>
>> actually they are the same -- but the endianness and the size of the
>> immediate of course matter when you put it into the "virtual" field at
>> a (memory-)offset.
>>
>> native_encode_* might help you here if you use an unsigned char[]
>> buffer of group size and simply encode the immediates at the
>> get_inner_reference position.  Then you can native_interpret_* the
>> pieces you want to use for the actual merged stores.  That way you can
>> handle arbitrary tcc_constant trees as well.  You can also simply
>> gobble up overlapping stores to the group by encoding them in
>> execution order.
>
>Unfortunately I don't see how that would deal with sub-byte bitfields.
>native_encode_* and native_interpret_* deal with byte quantities
>(native_encode_* zero- or sign-extends the value to the next byte
>boundary), so I can't tell them to start encoding or extracting at a
>particular bit position.

True.  Though you can encode into a byte-aligned buffer and merge that
'shifted'.  IIRC the bitpos is the memory-order bit position and thus not
subject to BITS_BIG_ENDIAN.
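
Something along these lines, completely untested and just to sketch the
idea (group_buf / group_size here stand for however you end up describing
the merged group, with bitpos taken relative to the start of the group and
group_buf zero-initialised; it also assumes little-endian sub-byte layout,
a big-endian target would want the shifts the other way around):

  /* Encode the constant byte-aligned into a scratch buffer first.  */
  unsigned char tmp[16];    /* Large enough for anything merged here.  */
  memset (tmp, 0, sizeof (tmp));
  int len = native_encode_expr (rhs, tmp, sizeof (tmp));
  if (len == 0)
    return false;    /* Constant not supported by native_encode_expr.  */

  /* The encoding is whole bytes, so for a write that is not a multiple
     of BITS_PER_UNIT mask off the zero-/sign-extension bits above
     bitsize.  */
  unsigned int nbytes = (bitsize + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
  if (bitsize % BITS_PER_UNIT != 0)
    tmp[nbytes - 1] &= (1u << (bitsize % BITS_PER_UNIT)) - 1;

  /* Then OR the bytes into the group-wide buffer, shifted to the
     (possibly sub-byte) destination position.  */
  unsigned HOST_WIDE_INT first_byte = bitpos / BITS_PER_UNIT;
  unsigned int shift = bitpos % BITS_PER_UNIT;
  for (unsigned int i = 0; i < nbytes; i++)
    {
      group_buf[first_byte + i] |= tmp[i] << shift;
      if (shift && first_byte + i + 1 < group_size)
        group_buf[first_byte + i + 1] |= tmp[i] >> (BITS_PER_UNIT - shift);
    }

native_interpret_* on slices of group_buf then gives you the constants for
the actual merged stores.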

Richard.

>Thanks for your valuable feedback. I'll be working through it.
>
>Kyrill
>
>> This way you should be able to handle merging of two vector stores to
>> one for example.  Or four half-floats to a larger integer one.
>> But yeah - it'll change your internal rep from wide_int to an unsigned
>> char buffer of variable size ...
>>
>> There is also WORDS_BIG_ENDIAN (for PDP endianness) which you don't
>> seem to handle.
>>
>> +         if (is_gimple_assign (stmt))
>> +           {
>> +             tree lhs = gimple_assign_lhs (stmt);
>> +             tree rhs = gimple_assign_rhs1 (stmt);
>> +
>> +             HOST_WIDE_INT bitsize, bitpos;
>> +             machine_mode mode;
>> +             int unsignedp = 0, reversep = 0, volatilep = 0;
>> +             tree offset, base_addr;
>> +             base_addr
>> +               = get_inner_reference (lhs, &bitsize, &bitpos, &offset,
>> +                                      &mode, &unsignedp, &reversep,
>> +                                      &volatilep);
>>
>> Please avoid this if !lhs_code_valid_for_store_merging_p.
>>
>> +             /* The ultimate base object may be volatile so catch it
>> +                here.  */
>> +             if (TREE_THIS_VOLATILE (base_addr))
>>
>> this is wrong (the access may be non-volatile).  Checking for
>> gimple_has_volatile_ops above
>> is enough.
>>
>> I'd say instead of using MAX_STORE_BITSIZE you should be using
>> MOVE_MAX.
>>
>> As you are only interested in stores the is_gimple_assign check should
>> be gimple_assign_single_p (stmt) && gimple_vdef (stmt) instead.
>>
>> +             struct imm_store_chain_info **chain_info
>> +               = m_stores.get (base_addr);
>> +             if (invalid)
>> +               {
>> +                 if (gimple_vuse (stmt) || gimple_vdef (stmt))
>> +                   terminate_all_aliasing_chains (lhs, base_addr, stmt);
>> +
>> +                 continue;
>>
>> looks like you can as well handle this by falling through to (xxx)
>>
>> +               }
>> +             else
>> +               {
>> +
>> +                 store_immediate_info *info;
>> +                 if (chain_info)
>> +                   {
>> +                     info = new store_immediate_info (
>> +                       bitsize, bitpos, rhs, lhs, stmt,
>> +                       (*chain_info)->m_store_info.length ());
>> +                     if (dump_file)
>> +                       {
>> +                         fprintf (dump_file,
>> +                                  "Recording immediate store from stmt:\n");
>> +                         print_gimple_stmt (dump_file, stmt, 0, 0);
>> +                       }
>> +                     (*chain_info)->m_store_info.safe_push (info);
>> +                     continue;
>> +                   }
>> +                 else if (invalid)
>>
>> AFAICS this is never true
>>
>> +                   {
>> +                     terminate_all_aliasing_chains (lhs, base_addr, stmt);
>> +                     continue;
>> +                   }
>> +
>> +                 /* Store aliases any existing chain?  */
>> +                 terminate_all_aliasing_chains (lhs, base_addr, stmt);
>>
>> likewise with respect to fallthrough to (xxx)
>>
>>
>> (xxx)
>> +         /* Check if load/store aliases any existing chains.  */
>> +         hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
>> +           iter = m_stores.begin ();
>> +         for (; iter != m_stores.end (); ++iter)
>> +           {
>> +             tree key = (*iter).first;
>> +             if (stmt_may_clobber_ref_p (stmt, key)
>> +                 || ref_maybe_used_by_stmt_p (stmt, key))
>> +               terminate_and_release_chain (key);
>> +           }
>>
>> So here it's cheaper to guard the whole loop with
>>
>>    if (gimple_vuse (stmt))
>>
>> I suppose implementing terminate_all_aliasing_chains the same way by
>> simply passing in 'stmt' should simplify it (and factor out that last
>> loop as well).
>>
>> Sofar for now.
>>
>> Richard.
>>
>>
>>
>>
>>
>>> Thanks,
>>> Kyrill
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
>>>
>>> 2016-08-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      PR middle-end/22141
>>>      * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>>>      * common.opt (fstore-merging): New Optimization option.
>>>      * opts.c (default_options_table): Add entry for
>>>      OPT_ftree_store_merging.
>>>      * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>>>      * passes.def: Insert pass_tree_store_merging.
>>>      * tree-pass.h (make_pass_store_merging): Declare extern
>>>      prototype.
>>>      * gimple-ssa-store-merging.c: New file.
>>>      * doc/invoke.texi (Optimization Options): Document
>>>      -fstore-merging.
>>>
>>> 2016-08-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>              Jakub Jelinek  <ja...@redhat.com>
>>>
>>>      PR middle-end/22141
>>>      * gcc.c-torture/execute/pr22141-1.c: New test.
>>>      * gcc.c-torture/execute/pr22141-2.c: Likewise.
>>>      * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
>>>      * gcc.dg/store_merging_1.c: New test.
>>>      * gcc.dg/store_merging_2.c: Likewise.
>>>      * gcc.dg/store_merging_3.c: Likewise.
>>>      * gcc.dg/store_merging_4.c: Likewise.
>>>      * gcc.dg/store_merging_5.c: Likewise.
>>>      * gcc.dg/store_merging_6.c: Likewise.
>>>      * gcc.target/i386/pr22141.c: Likewise.

