On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachov 
<kyrylo.tkac...@foss.arm.com> wrote:
>Hi Richard,
>
>On 21/10/16 13:37, Richard Biener wrote:
>> On Tue, 18 Oct 2016, Kyrill Tkachov wrote:
>>
>>> Hi Richard,
>>>
>>> This patch is a merge of [1] and [2] and implements the manual
>merging of
>>> bitfields
>>> as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN
>too.
>>> It caused me a lot of headeache because the bit offset is counted
>from the
>>> most significant bit
>>> in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN
>looks
>>> irrelevant for store merging
>>> anyway as it's just used to described RTL extract operations).
>>> I've included ASCII diagrams of the steps in the merging algorithm.
>> Heh, thanks.
>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>aarch64-none-linux-gnu,
>>> x86_64-unknown-linux-gnu.
>>> Also tested on aarch64_be-none-elf.
>>>
>>> How does this version look now?
>> Mostly good.  For
>>
>> +bool
>> +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree
>base,
>> +                                                  gimple *stmt)
>> +{
>> ...
>> +  /* Check if the assignment destination (BASE) is part of a store
>chain.
>> +     This is to catch non-constant stores to destinations that may
>be
>> part
>> +     of a chain.  */
>> +  if (base)
>> +    {
>> +      chain_info = m_stores.get (base);
>> +      if (chain_info)
>> +       {
>> +         struct store_immediate_info *info;
>> +         unsigned int i;
>> +         FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
>> +           {
>> +             if (refs_may_alias_p (info->dest, dest))
>> +               {
>>
>> I suppose the chain is not yet sorted in any way?
>>
>> At least for 'dest' which do not have a known constant offset we
>> could do
>>
>>     if (base)
>>       terminate_and_release_chain (base);
>
>Do you mean when get_inner_reference returns non-NULL for POFFSET?

Yes.

>Or do you think we should try to look into dest in this function?
>
>> to speed things up?  IIRC we do not terminate chains early in
>> this phase when we have enough stores to form a group, so
>> writing a testcase that triggers quadraticness would be as simple
>> as having
>>
>> char a[1000000];
>>
>> void foo ()
>> {
>>   a[0] = 1;
>>   a[1] = 2;
>>   ....
>>   a[9999999] = 3;
>> }
>>
>> ?
>>
>> so I think you probably want to limit the number of stores you
>> ever put onto a chain and if you reach that limit, terminate
>> and release it?  Like just choose 16 or 64?  (and experiment
>> with the above kind of testcases)
>
>I was initially thinking of imposing such a limit as well but
>later I thought we'd want to extend the output code to be able to emit
>a memcpy (or memset) call for large regions, so detecting the largest
>possible
>regions would be needed.

But then we need a better data structure here to avoid the quadraticness.

 But that is not implemented yet (though I have
>experimented
>with it) so I can add a limit here. Should I just hardcode a limit or
>should I make it
>into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)?

A param is preferred.

>>
>> +                 bit_off = byte_off << LOG2_BITS_PER_UNIT;
>> +                 if (!wi::neg_p (bit_off) && wi::fits_shwi_p
>(bit_off))
>> +                   {
>> +                     bitpos += bit_off.to_shwi ();
>> +
>>
>> I think you want bit_off += bitpos before the fits_shwi check
>> otherwise this add may still overflow.
>>
>> +                     base_addr = copy_node (base_addr);
>> +                     TREE_OPERAND (base_addr, 1)
>> +                       = build_zero_cst (TREE_TYPE (TREE_OPERAND (
>> +                                                      base_addr,
>1)));
>>
>> I'd prefer
>>
>>        base_addr = build2 (MEM_REF, ...);
>>
>> here.
>
>Thanks for the feedback,
>Kyrill
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Kyrill
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
>>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html
>>>
>>> 2016-10-18  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      PR middle-end/22141
>>>      * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>>>      * common.opt (fstore-merging): New Optimization option.
>>>      * opts.c (default_options_table): Add entry for
>>>      OPT_ftree_store_merging.
>>>      * fold-const.h (can_native_encode_type_p): Declare prototype.
>>>      * fold-const.c (can_native_encode_type_p): Define.
>>>      * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>>>      * passes.def: Insert pass_tree_store_merging.
>>>      * tree-pass.h (make_pass_store_merging): Declare extern
>>>      prototype.
>>>      * gimple-ssa-store-merging.c: New file.
>>>      * doc/invoke.texi (Optimization Options): Document
>>>      -fstore-merging.
>>>
>>> 2016-10-18  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>              Jakub Jelinek  <ja...@redhat.com>
>>>              Andrew Pinski  <pins...@gmail.com>
>>>
>>>      PR middle-end/22141
>>>      PR rtl-optimization/23684
>>>      * gcc.c-torture/execute/pr22141-1.c: New test.
>>>      * gcc.c-torture/execute/pr22141-2.c: Likewise.
>>>      * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
>>>      * gcc.target/aarch64/ldp_stp_4.c: Likewise.
>>>      * gcc.dg/store_merging_1.c: New test.
>>>      * gcc.dg/store_merging_2.c: Likewise.
>>>      * gcc.dg/store_merging_3.c: Likewise.
>>>      * gcc.dg/store_merging_4.c: Likewise.
>>>      * gcc.dg/store_merging_5.c: Likewise.
>>>      * gcc.dg/store_merging_6.c: Likewise.
>>>      * gcc.dg/store_merging_7.c: Likewise.
>>>      * gcc.target/i386/pr22141.c: Likewise.
>>>      * gcc.target/i386/pr34012.c: Add -fno-store-merging to
>dg-options.
>>>      * g++.dg/init/new17.C: Likewise.
>>>


Reply via email to