On Mon, 7 May 2012, Aldy Hernandez wrote:
> Hi. Sorry for the delay. There were various tricky hiccups along the way to
> bootstrappability and regression cleanliness...
>
> On 04/26/12 04:51, Richard Guenther wrote:
> > On Wed, 25 Apr 2012, Aldy Hernandez wrote:
> >
> > > On 04/25/12 06:45, Richard Guenther wrote:
> > > > > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez <[email protected]>
> > > > > wrote:
> > > > > On 04/13/12 03:46, Richard Guenther wrote:
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez <[email protected]>
> > > > > > wrote:
>
> > > Speaking of loads, I am keeping the information as an additional bitmap
> > > in `memory_accesses', since ->refs_in_loop was set for stores as well
> > > and I couldn't depend on it. Let me know if you have another idea.
> >
> > Hmm, refs_in_loop & ~all_refs_stored_in_loop, so instead of
> >
> > +  bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> > +                            loop->num);
> > +  ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
> >
> > ref_read_in_loop_p = bitmap_bit_p (refs, ref->id) && !bitmap_bit_p
> > (stores, ref->id);
> >
> > ? But maybe that doesn't work if a ref is both read and stored to.
> > Btw, rather than adding a bitmap to memory_accesses I'd rather add
> > a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> > both into one) and a bitmap to struct mem_ref.
>
> I have done so as mark_ref_loaded(). I first tried to merge both into a
> generic mark_ref(), to avoid iterating twice, but I was concerned about the
> early loop exit on !bitmap_bit_p(ref->stored, loop->num), since it wasn't
> clear whether the merged variant should exit the loop altogether at that
> point. Either way, the code came out somewhat convoluted, so I opted for a
> separate, clean function.
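For the archives, a mark_ref_loaded mirroring mark_ref_stored would look
something like this (untested sketch; it assumes the new bitmap in struct
mem_ref ended up being called ->loaded):

  static void
  mark_ref_loaded (mem_ref_p ref, struct loop *loop)
  {
    /* Mark REF as loaded in LOOP and every enclosing loop; once an
       outer loop already has the bit set, all loops further out do
       too, which is the early exit mentioned above.  */
    for (;
         loop != current_loops->tree_root
         && !bitmap_bit_p (ref->loaded, loop->num);
         loop = loop_outer (loop))
      bitmap_set_bit (ref->loaded, loop->num);
  }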
>
> In scanning the gimple stream for the loads I noticed that instances of two
> component refs (say foo.bar) were not pointer comparable, so I am asking the
> alias oracle with refs_may_alias_p. It also seemed to make more sense wrt
> memory chunks that may alias. What do you think?
No, that is the wrong query - refs_may_alias_p only tells you that the two
references may alias, but here you need to know that they are the same
access (it will be replaced, after all!).
You should use operand_equal_p to compare reference trees.
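For reference, the equality test would be something like (untested):

  /* Structural equality of the two reference trees; flags == 0
     requires matching types and operands.  */
  if (operand_equal_p (t, ref->mem, 0))
    mark_ref_loaded (ref, loop);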
But why go through the following dance
+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+         compared with pointer equality, so ask the alias oracle. */
+      if (ref->mem == t
+          || ((TREE_CODE (t) == SSA_NAME
+               || DECL_P (t)
+               || handled_component_p (t)
+               || TREE_CODE (t) == MEM_REF
+               || TREE_CODE (t) == TARGET_MEM_REF)
+              && refs_may_alias_p (t, ref->mem)))
+        mark_ref_loaded (ref, loop);
+    }
at all and not just
  if (is_stored)
    mark_ref_stored (ref, loop);
  else
    mark_ref_loaded (ref, loop);
and similarly, in the !mem case, mark the ref loaded unconditionally:
  mark_ref_loaded (ref, loop);
  if (gimple_vdef (stmt))
    mark_ref_stored (ref, loop);
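Putting the two together, the tail of gather_mem_refs_stmt would end up
roughly like this (untested sketch; the allocation and hash lookup of REF
are elided):

  tree *mem = simple_mem_ref_in_stmt (stmt, &is_stored);
  if (!mem)
    {
      /* REF is the special entry for an unanalyzable ref here; it
         conservatively acts as a load and, if the statement has a
         virtual definition, as a store too.  */
      mark_ref_loaded (ref, loop);
      if (gimple_vdef (stmt))
        mark_ref_stored (ref, loop);
      return;
    }
  /* ... hashtable lookup of REF for *MEM elided ...  */
  if (is_stored)
    mark_ref_stored (ref, loop);
  else
    mark_ref_loaded (ref, loop);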
> This patch is far more robust and fully tested. Two wrenches to complicate
> things though...
>
> First, while inserting the code that writes the LSM temporaries back into
> the original variables, I found a handful of tests where the order of the
> writes mattered (aliased locations, inlined code, etc., IIRC). [This is in
> loops where multiple stores are moved.] So iterating over the exit edges
> and inserting there caused the stores to be emitted in reverse. I am now
> saving the edges and chaining the generated blocks, so the stores happen
> in the correct order:
>
>   if (foo_flag_lsm)
>     foo = foo_lsm;
>   if (foo2_flag_lsm)
>     foo2 = foo2_lsm;
> actual_exit:
>
> This required some edge redirection so we don't jump to the original exit
> prematurely when handling multiple moved stores. The dominator tree needed
> to be updated, and there is one instance where I couldn't find a clever way
> of preserving the order without jumping through an inordinate number of
> hoops. It seemed easier to call recompute_dominator.
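FWIW, recomputing the immediate dominator of a single block is cheap; the
usual idiom is (sketch; ex_bb stands in for the original exit block):

  /* After redirecting the exit edges through the new flag-test
     blocks, EX_BB's immediate dominator may have changed.  */
  set_immediate_dominator (CDI_DOMINATORS, ex_bb,
                           recompute_dominator (CDI_DOMINATORS, ex_bb));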
>
> Second, avoiding the load unless it happens in the loop, as you suggested,
> brought about some bogus uninitialized-use warnings for the LSM
> temporaries. What happens is that compute_ssa() creates uses of the LSM
> temporaries that are not yet defined, so we end up with PHI arguments that
> are default definitions (the (D) entries in the dumps). Since we are sure
> the eventual uses will be gated by their corresponding flag, I seed the
> temporaries with an initial value of 0 in the preheader of the loops in
> which they are used. This solved the problem. I also played with
> initializing to 0 in the entry block, but that seemed a bit too invasive.
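For the archives, such a preheader initialization amounts to something like
this (untested sketch; tmp_lsm is a placeholder for the LSM temporary):

  /* Give TMP_LSM a zero definition on the preheader edge so the loop
     PHIs get a real argument instead of a default definition.  The
     insertion is committed later via gsi_commit_edge_inserts ().  */
  gimple init = gimple_build_assign (tmp_lsm,
                                     build_zero_cst (TREE_TYPE (tmp_lsm)));
  gsi_insert_on_edge (loop_preheader_edge (loop), init);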
Hm. Btw, see also PR39612, which you might be able to solve with that?
> Andrew suggested the correct fix was to add a new pass that was able to do
> some ?? flow sensitive data flow analysis ?? that could discover these
> unreachable paths and insert the 0 phis at the start of the blocks
> automatically. But that seemed like far too much work, considering how long
> it's taken me to get this far ;-).
>
> Bootstrapped and fully tested with multi_threaded_model_p=true hard-coded.
> This is the revision I'd like to contribute, sans the hardcoded value.
>
> What do you think?
I notice
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
Inserting code into the latch block is bad - the vectorizer will be
confused by non-empty latches.
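The vectorizer's loop-form analysis rejects such loops outright; roughly
(paraphrased from memory):

  /* A loop whose latch block contains code is not vectorized.  */
  if (!empty_block_p (loop->latch))
    return NULL;  /* not vectorized: latch block not empty.  */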
You might at first (as you suggested yourself ;)) avoid trying to
tackle the load issue. The patch is already complex enough, and
a candidate for splitting out is the BB_IN_TRANSACTION change
(hereby pre-approved).
Your ChangeLog mentions changes that are no longer necessary
(the target hook).
I didn't quite get the store order issue - we _do_ preserve store
ordering right now, don't we? So how come introducing the flags
changes that?
Thanks,
Richard.