On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: > On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <al...@redhat.com> wrote: > > On 02/23/12 12:19, Aldy Hernandez wrote: > > > >> about hit me. Instead now I save all loads in a function and iterate > >> through them in a brute force way. I'd like to rewrite this into a hash > >> of some sort, but before I go any further I'm interested to know if the > >> main idea is ok. > > > > > > For the record, it may be ideal to reuse some of the iterations we already > > do over the function's basic blocks, so we don't have to iterate yet again > > over the IL stream. Though it may complicate the pass unnecessarily. > > It's definitely ugly that you need to do this.
Indeed. But that's simply the price we have to pay for making publication with transactions easier for the programmer yet still at acceptable performance. Also note that what we primarily have to care about for this PR is to not hoist loads _within_ transactions if they would violate publication safety. I didn't have time to look at Aldy's patch yet, but a first safe and conservative way would be to treat transactions as full transformation barriers, and prevent publication-safety-violating transformations _within_ transactions. Which I would prefer until we're confident that we understood all of it. For hoisting out of or across transactions, we have to reason about more than just publication safety. > And yes, you have to look at > very many passes I guess, PRE comes to my mind at least. > > I still do not understand the situation very well, at least why for > > transaction { for () if (b) load; } load; > > it should be safe to hoist load out of the transaction This one is funny. *Iff* this is an atomic txn, we can assume that the transaction does not wait for a concurrent event. If it is a relaxed txn, then we must not have a loop which terminates depending on an atomic load (which we don't in that example); otherwise, we cannot hoist load to before the txn (same reason as why we must not hoist to before an atomic load with memory_order_acquire). Now, if these things hold, then the load will always be executed after the txn. Thus, we can assume that it will happen anyway and nothing can stop it (irrespective of which position the txn gets in the Transactional Synchronization Order (see the C++ TM spec)). It is a nonatomic load, and we can rely on the program being data-race-free, so we cannot have other threads storing to the same location, so we can hoist it across because in that part of the program, the location is guaranteed to be thread-local data (or immutable). As I said, this isn't related to just pub safety anymore. And this is tricky enough that I'd rather not try to optimize this currently but instead wait until we have more confidence in our understanding of the matter. > while for > > load; transaction { for () if (b) load; } > > it is not. If the same assumptions as above hold, I think you can hoist it out, because again you can assume that it targets thread-local/immutable data: the nontransactional (+nonatomic!) load can happen at any time, essentially, irrespective of b's value or how/when other threads modify it. Thus, it cannot have been changed between the two loads in a data-race-free program. > Neither do I understand why it's not ok for > > transaction { for () if (b) load; } > > to hoist the load out of the transaction. You would violate publication safety. Also, you don't have any reason to believe that load targets thread-local/immutable data, so you must not make it nontransactional (otherwise, you could introduce a race condition). > > I assume all the issue is about hoisting things over the trasaction start. > So - why does the transaction start not simply clobber all global memory? > That would avoid the hoisting. I assume that transforming the above to > > transaction { tem = load; for () if (b) = tem; } > > is ok? No, it is not. Actually, this is precisely the transformation that we need to prevent from happening. As Aldy said, please see the explanations in the PR, or have a look at the C++ TM specification and the C++11 memory model alternatively. We could also discuss this on IRC, if this would be easier. Torvald