On Mon, Aug 22, 2016 at 12:39 PM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
> Hi all,
>
> This is a v2 of the patch at [1] addressing feedback from Richi.
> The most significant changes are in the first phase of the pass, which
> detects store chains.  This now uses a hash_map to track multiple chains
> to different bases, eliminating the need to iterate over a basic block
> multiple times and thus reducing the complexity.
>
> Additionally, merging has been enabled for bitfields whose size is not a
> multiple of BITS_PER_UNIT, as long as the merged group has a size that is
> a multiple of BITS_PER_UNIT.  Handling the remaining cases shouldn't be
> too hard in the future, I think, but would require fiddling with
> BIT_FIELD_REF expressions that I'm not very familiar with at the moment.
> Multiple constant stores to the same location are now supported and the
> code should do the right thing when merging them, effectively performing
> a bit of dead store elimination.
>
> Also, the merging code keeps track of the first and last statement in
> the original statement sequence and outputs the new merged sequence
> after the last original statement.
>
> The option for this pass has been renamed to -fstore-merging, and the
> new file is now named gimple-ssa-store-merging.c.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-linux-gnu.
> Additionally tested on aarch64_be-none-elf.
>
> Does this look better?
>
> I'll experiment with placing the pass earlier in the pipeline, but I
> thought I'd send this version out for more comments.

+   Note on endianness and example:
+   The bitposition that is written in each access is determined by
+   get_inner_reference which calculates the position in the object rather than
+   the memory itself

Actually they are the same -- but the endianness and the size of the
immediate of course matter when you put it into the "virtual" field at a
(memory) offset.

native_encode_* might help you here: if you use an unsigned char[] buffer
of the group size you can simply encode the immediates at their
get_inner_reference positions.  Then you can native_interpret_* the pieces
you want to use for the actual merged stores.  That way you can handle
arbitrary tcc_constant trees as well.  You can also simply gobble up
overlapping stores to the group by encoding them in execution order.
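Something along these lines (a rough, untested sketch; names like
group_size_in_bytes and the info fields are made up, not from your patch,
and it assumes byte-aligned positions for simplicity):

  /* Encode each constant store into a byte buffer covering the whole
     group, in execution order, so later stores overwrite earlier ones.  */
  unsigned char *buf = XCNEWVEC (unsigned char, group_size_in_bytes);
  unsigned int i;
  store_immediate_info *info;
  FOR_EACH_VEC_ELT (stores, i, info)
    {
      unsigned HOST_WIDE_INT pos
        = (info->bitpos - group_start_bitpos) / BITS_PER_UNIT;
      /* native_encode_expr takes care of target endianness and handles
         arbitrary tcc_constant trees (integers, reals, vectors, ...).  */
      native_encode_expr (info->rhs, buf + pos, group_size_in_bytes - pos);
    }

  /* To emit a merged store of TYPE covering bytes [off, off + len),
     interpret that slice of the buffer back into a constant tree.  */
  tree val = native_interpret_expr (type, buf + off, len);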

This way you should be able to handle merging two vector stores into one,
for example, or four half-floats into a larger integer store.  But yeah --
it'll change your internal representation from wide_int to an unsigned
char buffer of variable size ...

There is also WORDS_BIG_ENDIAN (for PDP endianness), which you don't seem
to handle.

+         if (is_gimple_assign (stmt))
+           {
+             tree lhs = gimple_assign_lhs (stmt);
+             tree rhs = gimple_assign_rhs1 (stmt);
+
+             HOST_WIDE_INT bitsize, bitpos;
+             machine_mode mode;
+             int unsignedp = 0, reversep = 0, volatilep = 0;
+             tree offset, base_addr;
+             base_addr
+               = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
+                                      &unsignedp, &reversep, &volatilep);

Please avoid calling get_inner_reference at all if
!lhs_code_valid_for_store_merging_p.
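I.e. something like this (sketch; I'm guessing at the argument that
lhs_code_valid_for_store_merging_p takes):

  /* Don't decompose the reference if we won't be able to record the
     store anyway.  */
  if (lhs_code_valid_for_store_merging_p (TREE_CODE (lhs)))
    base_addr
      = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
                             &unsignedp, &reversep, &volatilep);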

+             /* The ultimate base object may be volatile so catch it here.  */
+             if (TREE_THIS_VOLATILE (base_addr))

This is wrong (the access itself may be non-volatile).  Checking for
gimple_has_volatile_ops above is enough.
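I.e. (sketch):

  /* The statement-level check is sufficient ...  */
  if (gimple_has_volatile_ops (stmt))
    invalid = true;

  /* ... whereas TREE_THIS_VOLATILE (base_addr) can be set even when
     this particular access is non-volatile, so don't test it.  */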

I'd say instead of using MAX_STORE_BITSIZE you should be using MOVE_MAX.
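Since MOVE_MAX is in bytes, that would be something like (sketch):

  /* Widest store, in bits, that a single instruction can move.  */
  #define MAX_STORE_BITSIZE (MOVE_MAX * BITS_PER_UNIT)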

As you are only interested in stores, the is_gimple_assign check should be
gimple_assign_single_p (stmt) && gimple_vdef (stmt) instead.
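I.e. (sketch):

  /* Only plain assignments that actually write to memory are
     interesting: gimple_assign_single_p rules out multi-operand
     assignments and a non-null gimple_vdef rules out loads.  */
  if (gimple_assign_single_p (stmt) && gimple_vdef (stmt))
    {
      tree lhs = gimple_assign_lhs (stmt);
      tree rhs = gimple_assign_rhs1 (stmt);
      ...
    }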

+             struct imm_store_chain_info **chain_info
+               = m_stores.get (base_addr);
+             if (invalid)
+               {
+                 if (gimple_vuse (stmt) || gimple_vdef (stmt))
+                   terminate_all_aliasing_chains (lhs, base_addr, stmt);
+
+                 continue;

Looks like you can just as well handle this by falling through to (xxx).

+               }
+             else
+               {
+
+                 store_immediate_info *info;
+                 if (chain_info)
+                   {
+                     info = new store_immediate_info (
+                       bitsize, bitpos, rhs, lhs, stmt,
+                       (*chain_info)->m_store_info.length ());
+                     if (dump_file)
+                       {
+                         fprintf (dump_file,
+                                  "Recording immediate store from stmt:\n");
+                         print_gimple_stmt (dump_file, stmt, 0, 0);
+                       }
+                     (*chain_info)->m_store_info.safe_push (info);
+                     continue;
+                   }
+                 else if (invalid)

AFAICS this is never true.

+                   {
+                     terminate_all_aliasing_chains (lhs, base_addr, stmt);
+                     continue;
+                   }
+
+                 /* Store aliases any existing chain?  */
+                 terminate_all_aliasing_chains (lhs, base_addr, stmt);

Likewise with respect to the fallthrough to (xxx).
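I.e., roughly (sketch):

  if (!invalid && chain_info)
    {
      /* Record the store in the existing chain, as before.  */
      ...
      continue;
    }
  /* Otherwise simply fall through to (xxx) below: the generic walk
     there already terminates every chain this statement may clobber
     or use, which makes the explicit terminate_all_aliasing_chains
     calls redundant.  */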


(xxx)
+         /* Check if load/store aliases any existing chains.  */
+         hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
+           iter
+           = m_stores.begin ();
+         for (; iter != m_stores.end (); ++iter)
+           {
+             tree key = (*iter).first;
+             if (stmt_may_clobber_ref_p (stmt, key)
+                 || ref_maybe_used_by_stmt_p (stmt, key))
+               terminate_and_release_chain (key);
+           }

So here it's cheaper to guard the whole loop with

  if (gimple_vuse (stmt))

I suppose implementing terminate_all_aliasing_chains the same way, by
simply passing in 'stmt', should simplify it (and factor out that last
loop as well).
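Something like (sketch; the class and member names are my reading of your
patch, and the signature change is just a suggestion):

  void
  pass_store_merging::terminate_all_aliasing_chains (gimple *stmt)
  {
    /* Statements without a virtual use cannot read from or clobber
       any chain's base.  */
    if (!gimple_vuse (stmt))
      return;

    hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
      iter = m_stores.begin ();
    for (; iter != m_stores.end (); ++iter)
      {
        tree key = (*iter).first;
        if (stmt_may_clobber_ref_p (stmt, key)
            || ref_maybe_used_by_stmt_p (stmt, key))
          terminate_and_release_chain (key);
      }
  }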

So far for now.

Richard.

> Thanks,
> Kyrill
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
>
> 2016-08-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR middle-end/22141
>     * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>     * common.opt (fstore-merging): New Optimization option.
>     * opts.c (default_options_table): Add entry for
>     OPT_fstore_merging.
>     * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>     * passes.def: Insert pass_tree_store_merging.
>     * tree-pass.h (make_pass_store_merging): Declare extern
>     prototype.
>     * gimple-ssa-store-merging.c: New file.
>     * doc/invoke.texi (Optimization Options): Document
>     -fstore-merging.
>
> 2016-08-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>             Jakub Jelinek  <ja...@redhat.com>
>
>     PR middle-end/22141
>     * gcc.c-torture/execute/pr22141-1.c: New test.
>     * gcc.c-torture/execute/pr22141-2.c: Likewise.
>     * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
>     * gcc.dg/store_merging_1.c: New test.
>     * gcc.dg/store_merging_2.c: Likewise.
>     * gcc.dg/store_merging_3.c: Likewise.
>     * gcc.dg/store_merging_4.c: Likewise.
>     * gcc.dg/store_merging_5.c: Likewise.
>     * gcc.dg/store_merging_6.c: Likewise.
>     * gcc.target/i386/pr22141.c: Likewise.
