On Mon, 7 May 2012, Aldy Hernandez wrote:
> Hi. Sorry for the delay. There were various tricky hiccups along the way to
> bootstrappability and regression cleanliness...
>
> On 04/26/12 04:51, Richard Guenther wrote:
> > On Wed, 25 Apr 2012, Aldy Hernandez wrote:
> >
> > > On 04/25/12 06:45, Richard Guenther wrote:
> > > > > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez <[email protected]>
> > > > > wrote:
> > > > > On 04/13/12 03:46, Richard Guenther wrote:
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez <[email protected]>
> > > > > > wrote:
>
> > > Speaking of loads, I am keeping the information as an additional bitmap
> > > in `memory_accesses', since ->refs_in_loop was set for stores as well
> > > and I couldn't depend on it. Let me know if you have another idea.
> >
> > Hmm, refs_in_loop & ~all_refs_stored_in_loop, so instead of
> >
> > +  bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> > +                            loop->num);
> > +  ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
> >
> > ref_read_in_loop_p = bitmap_bit_p (refs, ref->id) && !bitmap_bit_p
> > (stores, ref->id);
> >
> > ? But maybe that doesn't work if a ref is both read and stored to.
> > Btw, rather than adding a bitmap to memory_accesses I'd rather add
> > a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> > both into one) and a bitmap to struct mem_ref.
>
> I have done so as mark_ref_loaded(). I first tried to merge both into a
> generic mark_ref(), to avoid iterating twice, but I was concerned about the
> early loop exit on !bitmap_bit_p(ref->stored, loop->num), since it wasn't
> clear whether the merged variant should exit the loop altogether at that
> point. Either way, the code came out somewhat convoluted, so I opted for a
> separate, clean function.
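For the archives, a mark_ref_loaded mirroring mark_ref_stored would look
something like this (untested sketch; it assumes the new bitmap in struct
mem_ref ended up being called ->loaded):

  static void
  mark_ref_loaded (mem_ref_p ref, struct loop *loop)
  {
    /* Mark REF as loaded in LOOP and every enclosing loop; once an
       outer loop already has the bit set, all loops further out do
       too, which is the early exit mentioned above.  */
    for (;
         loop != current_loops->tree_root
         && !bitmap_bit_p (ref->loaded, loop->num);
         loop = loop_outer (loop))
      bitmap_set_bit (ref->loaded, loop->num);
  }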
>
> In scanning the gimple stream for the loads I noticed that instances of two
> component refs (say foo.bar) were not pointer comparable, so I am asking the
> alias oracle with refs_may_alias_p. It also seemed to make more sense wrt
> memory chunks that may alias. What do you think?
No, that is the wrong query - refs_may_alias_p only tells you that the two
references may alias, but here you need to know that they are the same
access (it will be replaced, after all!).
You should use operand_equal_p to compare reference trees.
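For reference, the equality test would be something like (untested):

  /* Structural equality of the two reference trees; flags == 0
     requires matching types and operands.  */
  if (operand_equal_p (t, ref->mem, 0))
    mark_ref_loaded (ref, loop);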
But why go through the following dance
+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+         compared with pointer equality, so ask the alias oracle. */
+      if (ref->mem == t
+          || ((TREE_CODE (t) == SSA_NAME
+               || DECL_P (t)
+               || handled_component_p (t)
+               || TREE_CODE (t) == MEM_REF
+               || TREE_CODE (t) == TARGET_MEM_REF)
+              && refs_may_alias_p (t, ref->mem)))
+        mark_ref_loaded (ref, loop);
+    }
at all and not just
  if (is_stored)
    mark_ref_stored (ref, loop);
  else
    mark_ref_loaded (ref, loop);
and similarly, in the !mem case, mark the ref loaded unconditionally:
  mark_ref_loaded (ref, loop);
  if (gimple_vdef (stmt))
    mark_ref_stored (ref, loop);
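Putting the two together, the tail of gather_mem_refs_stmt would end up
roughly like this (untested sketch; the allocation and hash lookup of REF
are elided):

  tree *mem = simple_mem_ref_in_stmt (stmt, &is_stored);
  if (!mem)
    {
      /* REF is the special entry for an unanalyzable ref here; it
         conservatively acts as a load and, if the statement has a
         virtual definition, as a store too.  */
      mark_ref_loaded (ref, loop);
      if (gimple_vdef (stmt))
        mark_ref_stored (ref, loop);
      return;
    }
  /* ... hashtable lookup of REF for *MEM elided ...  */
  if (is_stored)
    mark_ref_stored (ref, loop);
  else
    mark_ref_loaded (ref, loop);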
> This patch is far more robust and fully tested. Two wrenches to complicate
> things though...
>
> First, while inserting the code that writes the LSM temporaries back into
> the original variables, I found a handful of tests where the order of the
> writes mattered (aliased locations, inlined code, etc., IIRC). [This is in
> loops where multiple stores are moved.] So iterating over the exit edges
> and inserting there caused the stores to be emitted in reverse. I am now
> saving the edges and chaining the generated blocks, so the stores happen
> in the correct order:
>
>   if (foo_flag_lsm)
>     foo = foo_lsm;
>   if (foo2_flag_lsm)
>     foo2 = foo2_lsm;
> actual_exit:
>
> This required some edge redirection so we don't jump to the original exit
> prematurely when handling multiple moved stores. The dominator tree needed
> to be updated, and there is one instance where I couldn't find a clever way
> of preserving the order without jumping through an inordinate number of
> hoops. It seemed easier to call recompute_dominator.
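FWIW, recomputing the immediate dominator of a single block is cheap; the
usual idiom is (sketch; ex_bb stands in for the original exit block):

  /* After redirecting the exit edges through the new flag-test
     blocks, EX_BB's immediate dominator may have changed.  */
  set_immediate_dominator (CDI_DOMINATORS, ex_bb,
                           recompute_dominator (CDI_DOMINATORS, ex_bb));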
>
> Second, avoiding the load unless it happens in the loop, as you suggested,
> brought about some bogus uninitialized-use warnings for the LSM
> temporaries. What happens is that compute_ssa() creates uses of the LSM
> temporaries that are not yet defined, so we end up with PHI arguments that
> are default definitions (the (D) entries in the dumps). Since we are sure
> the eventual uses will be gated by their corresponding flag, I seed the
> temporaries with an initial value of 0 in the preheader of the loops in
> which they are used. This solved the problem. I also played with
> initializing to 0 in the entry block, but that seemed a bit too invasive.
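For the archives, such a preheader initialization amounts to something like
this (untested sketch; tmp_lsm is a placeholder for the LSM temporary):

  /* Give TMP_LSM a zero definition on the preheader edge so the loop
     PHIs get a real argument instead of a default definition.  The
     insertion is committed later via gsi_commit_edge_inserts ().  */
  gimple init = gimple_build_assign (tmp_lsm,
                                     build_zero_cst (TREE_TYPE (tmp_lsm)));
  gsi_insert_on_edge (loop_preheader_edge (loop), init);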
Hm. Btw, see also PR39612, which you might be able to solve with that?
> Andrew suggested the correct fix was to add a new pass that was able to do
> some ?? flow sensitive data flow analysis ?? that could discover these
> unreachable paths and insert the 0 phis at the start of the blocks
> automatically. But that seemed like far too much work, considering how long
> it's taken me to get this far ;-).
>
> Bootstrapped and fully tested with multi_threaded_model_p=true hard-coded.
> This is the revision I'd like to contribute, sans the hardcoded value.
>
> What do you think?
I notice
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
Inserting code into the latch block is bad - the vectorizer will be
confused by non-empty latches.
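The vectorizer's loop-form analysis rejects such loops outright; roughly
(paraphrased from memory):

  /* A loop whose latch block contains code is not vectorized.  */
  if (!empty_block_p (loop->latch))
    return NULL;  /* not vectorized: latch block not empty.  */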
You might at first (as you suggested yourself ;)) avoid trying to
tackle the load issue. The patch is already complex enough, and
a candidate for splitting out is the BB_IN_TRANSACTION change
(hereby pre-approved).
Your ChangeLog mentions changes that are no longer necessary
(the target hook).
I didn't quite get the store order issue - we _do_ preserve store
ordering right now, don't we? So how come introducing the flags
changes that?
Thanks,
Richard.