Hi Richard,

On 21/10/16 13:37, Richard Biener wrote:
On Tue, 18 Oct 2016, Kyrill Tkachov wrote:

Hi Richard,

This patch is a merge of [1] and [2] and implements the manual merging of
bitfields
as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN too.
It caused me a lot of headeache because the bit offset is counted from the
most significant bit
in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN looks
irrelevant for store merging
anyway as it's just used to described RTL extract operations).
I've included ASCII diagrams of the steps in the merging algorithm.
Heh, thanks.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64-unknown-linux-gnu.
Also tested on aarch64_be-none-elf.

How does this version look now?
Mostly good.  For

+bool
+pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base,
+                                                  gimple *stmt)
+{
...
+  /* Check if the assignment destination (BASE) is part of a store chain.
+     This is to catch non-constant stores to destinations that may be
part
+     of a chain.  */
+  if (base)
+    {
+      chain_info = m_stores.get (base);
+      if (chain_info)
+       {
+         struct store_immediate_info *info;
+         unsigned int i;
+         FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
+           {
+             if (refs_may_alias_p (info->dest, dest))
+               {

I suppose the chain is not yet sorted in any way?

At least for 'dest' which do not have a known constant offset we
could do

    if (base)
      terminate_and_release_chain (base);

Do you mean when get_inner_reference returns non-NULL for POFFSET?
Or do you think we should try to look into dest in this function?

to speed things up?  IIRC we do not terminate chains early in
this phase when we have enough stores to form a group, so
writing a testcase that triggers quadraticness would be as simple
as having

char a[1000000];

void foo ()
{
  a[0] = 1;
  a[1] = 2;
  ....
  a[9999999] = 3;
}

?

so I think you probably want to limit the number of stores you
ever put onto a chain and if you reach that limit, terminate
and release it?  Like just choose 16 or 64?  (and experiment
with the above kind of testcases)

I was initially thinking of imposing such a limit as well but
later I thought we'd want to extend the output code to be able to emit
a memcpy (or memset) call for large regions, so detecting the largest possible
regions would be needed. But that is not implemented yet (though I have 
experimented
with it) so I can add a limit here. Should I just hardcode a limit or should I 
make it
into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)?


+                 bit_off = byte_off << LOG2_BITS_PER_UNIT;
+                 if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
+                   {
+                     bitpos += bit_off.to_shwi ();
+

I think you want bit_off += bitpos before the fits_shwi check
otherwise this add may still overflow.

+                     base_addr = copy_node (base_addr);
+                     TREE_OPERAND (base_addr, 1)
+                       = build_zero_cst (TREE_TYPE (TREE_OPERAND (
+                                                      base_addr, 1)));

I'd prefer

       base_addr = build2 (MEM_REF, ...);

here.

Thanks for the feedback,
Kyrill


Thanks,
Richard.

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html

2016-10-18  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

     PR middle-end/22141
     * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
     * common.opt (fstore-merging): New Optimization option.
     * opts.c (default_options_table): Add entry for
     OPT_ftree_store_merging.
     * fold-const.h (can_native_encode_type_p): Declare prototype.
     * fold-const.c (can_native_encode_type_p): Define.
     * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
     * passes.def: Insert pass_tree_store_merging.
     * tree-pass.h (make_pass_store_merging): Declare extern
     prototype.
     * gimple-ssa-store-merging.c: New file.
     * doc/invoke.texi (Optimization Options): Document
     -fstore-merging.

2016-10-18  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
             Jakub Jelinek  <ja...@redhat.com>
             Andrew Pinski  <pins...@gmail.com>

     PR middle-end/22141
     PR rtl-optimization/23684
     * gcc.c-torture/execute/pr22141-1.c: New test.
     * gcc.c-torture/execute/pr22141-2.c: Likewise.
     * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
     * gcc.target/aarch64/ldp_stp_4.c: Likewise.
     * gcc.dg/store_merging_1.c: New test.
     * gcc.dg/store_merging_2.c: Likewise.
     * gcc.dg/store_merging_3.c: Likewise.
     * gcc.dg/store_merging_4.c: Likewise.
     * gcc.dg/store_merging_5.c: Likewise.
     * gcc.dg/store_merging_6.c: Likewise.
     * gcc.dg/store_merging_7.c: Likewise.
     * gcc.target/i386/pr22141.c: Likewise.
     * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
     * g++.dg/init/new17.C: Likewise.


Reply via email to