Hi Richard,
On 25/08/16 14:01, Richard Biener wrote:
On Mon, Aug 22, 2016 at 12:39 PM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
Hi all,
This is a v2 of the patch at [1] addressing feedback from Richi.
The biggest changes are in the first phase of the pass, which detects
store chains. It now uses a hash_map to track multiple chains to
different bases, eliminating the need to iterate over a basic block
multiple times and thus reducing the complexity.
Additionally, merging has been enabled for bitfields that are not a
multiple of BITS_PER_UNIT in size, as long as the merged group is of a
size that is a multiple of BITS_PER_UNIT. Handling the remaining cases
shouldn't be too hard in the future, I think, but would require
fiddling with BIT_FIELD_REF expressions that I'm not very familiar with
at the moment. Multiple constant stores to the same location are now
supported and the code should do the right thing when merging them,
effectively performing a bit of dead store elimination.
Also, the merging code keeps track of the first and last statement in
the original statement sequence and outputs the new merged sequence
after the last original statement.
The option for this pass has been renamed to -fstore-merging and the
new file is now named gimple-ssa-store-merging.c.
Bootstrapped and tested on arm-none-linux-gnueabihf,
aarch64-none-linux-gnu and x86_64-linux-gnu.
Additionally tested on aarch64_be-none-elf.
Does this look better? I'll experiment with placing the pass earlier in
the pipeline, but I thought I'd send this version out for more comments.
+ Note on endianness and example:
+ The bit position that is written in each access is determined by
+ get_inner_reference which calculates the position in the object rather than
+ the memory itself
actually they are the same -- but the endianness and the size of the
immediate of course matter when you put it into the "virtual" field at
a (memory) offset. native_encode_* might help you here if you use an
unsigned char[] buffer of group size and simply encode the immediates
at the get_inner_reference position. Then you can native_interpret_*
the pieces you want to use for the actual merged stores. That way you
can handle arbitrary tcc_constant trees as well. You can also simply
gobble up overlapping stores to the group by encoding them in
execution order.
Unfortunately I don't see how that would deal with sub-byte bitfields.
native_encode_* and native_interpret_* deal with byte quantities
(native_encode_* zero- or sign-extends the value to the next byte), so
I can't tell them to start encoding or extracting at a particular bit
position.
Thanks for your valuable feedback. I'll be working through it.
Kyrill
This way you should be able to handle merging of two vector stores
into one, for example, or four half-floats into a larger integer one.
But yeah - it'll change your internal rep from wide_int to an unsigned
char buffer of variable size ...
There is also WORDS_BIG_ENDIAN (for PDP endianness) which you don't
seem to handle.
+ if (is_gimple_assign (stmt))
+ {
+ tree lhs = gimple_assign_lhs (stmt);
+ tree rhs = gimple_assign_rhs1 (stmt);
+
+ HOST_WIDE_INT bitsize, bitpos;
+ machine_mode mode;
+ int unsignedp = 0, reversep = 0, volatilep = 0;
+ tree offset, base_addr;
+ base_addr
+ = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
+ &unsignedp, &reversep, &volatilep);
Please avoid this if !lhs_code_valid_for_store_merging_p.
+ /* The ultimate base object may be volatile so catch it here. */
+ if (TREE_THIS_VOLATILE (base_addr))
this is wrong (the access may be non-volatile). Checking for
gimple_has_volatile_ops above is enough.
I'd say instead of using MAX_STORE_BITSIZE you should be using MOVE_MAX.
As you are only interested in stores, the is_gimple_assign check
should be gimple_assign_single_p (stmt) && gimple_vdef (stmt) instead.
+ struct imm_store_chain_info **chain_info
+ = m_stores.get (base_addr);
+ if (invalid)
+ {
+ if (gimple_vuse (stmt) || gimple_vdef (stmt))
+ terminate_all_aliasing_chains (lhs, base_addr, stmt);
+
+ continue;
looks like you can as well handle this by falling through to (xxx)
+ }
+ else
+ {
+
+ store_immediate_info *info;
+ if (chain_info)
+ {
+ info = new store_immediate_info (
+ bitsize, bitpos, rhs, lhs, stmt,
+ (*chain_info)->m_store_info.length ());
+ if (dump_file)
+ {
+ fprintf (dump_file,
+ "Recording immediate store from stmt:\n");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+ }
+ (*chain_info)->m_store_info.safe_push (info);
+ continue;
+ }
+ else if (invalid)
AFAICS this is never true
+ {
+ terminate_all_aliasing_chains (lhs, base_addr, stmt);
+ continue;
+ }
+
+ /* Store aliases any existing chain? */
+ terminate_all_aliasing_chains (lhs, base_addr, stmt);
likewise with respect to fallthrough to (xxx)
(xxx)
+ /* Check if load/store aliases any existing chains. */
+ hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
+ iter
+ = m_stores.begin ();
+ for (; iter != m_stores.end (); ++iter)
+ {
+ tree key = (*iter).first;
+ if (stmt_may_clobber_ref_p (stmt, key)
+ || ref_maybe_used_by_stmt_p (stmt, key))
+ terminate_and_release_chain (key);
+ }
So here it's cheaper to guard the whole loop with
if (gimple_vuse (stmt)).
I suppose implementing terminate_all_aliasing_chains the same way, by
simply passing in 'stmt', should simplify it (and factor out that last
loop as well).
So far for now.
Richard.
Thanks,
Kyrill
[1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00942.html
2016-08-22 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
PR middle-end/22141
* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
* common.opt (fstore-merging): New Optimization option.
* opts.c (default_options_table): Add entry for
OPT_ftree_store_merging.
* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
* passes.def: Insert pass_tree_store_merging.
* tree-pass.h (make_pass_store_merging): Declare extern
prototype.
* gimple-ssa-store-merging.c: New file.
* doc/invoke.texi (Optimization Options): Document
-fstore-merging.
2016-08-22 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Jakub Jelinek <ja...@redhat.com>
PR middle-end/22141
* gcc.c-torture/execute/pr22141-1.c: New test.
* gcc.c-torture/execute/pr22141-2.c: Likewise.
* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
* gcc.dg/store_merging_1.c: New test.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_3.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise.
* gcc.dg/store_merging_6.c: Likewise.
* gcc.target/i386/pr22141.c: Likewise.