Hi,

On Fri, 26 Oct 2007, Tomash Brechko wrote:

> On Fri, Oct 26, 2007 at 21:45:03 +0400, Tomash Brechko wrote:
> > Note that it doesn't cancel cmoves, as those are loads, not stores.
>
> I just checked with x86 instruction reference, CMOVcc is reg->reg or
> mem->reg, never reg->mem.  You know God's deed when you see it. :)

I wasn't precise about what the important optimization actually is.
The important thing about this loop is that the data is basically
random, so branch prediction has no chance to do any good.
Consequently all branches in that loop have a pretty high cost.  So
high, in fact, that it's better to replace them with conditional moves
on the value to store and make the store unconditional.

So, yes, there are no conditional store instructions on x86, but the
branches need to be removed anyway for performance, and for that we
need to make the stores unconditional (even at the cost of perhaps
introducing another load).

You are also right that for that example we can determine that an
unconditional store already dominates (and postdominates) the
conditional stores in question, so the code would already be
thread-unsafe, and the transformation would be okay even with thread
safety in mind.  I was merely showing that this transformation _does_
matter in some cases, to refute the opposite claims, which seemed to be
made too lightly in this thread.

Now there are multiple ways out of this dilemma that retain the
transformation without breaking threaded code:

1) do the transformation only if there are already other stores in an
   outer control region.  I see that already being worked on
   down-thread.

2) do the transformation but also conditionalize the address of the
   store:

     if (cond)
       *p = val;
   --->
     __typeof__ (*p) dummy;
     if (!cond)
       p = &dummy;  // dummy is a stack slot, hence no trap and no
                    // thread implications
     *p = val;

I plan to work on the latter at some point anyway, as it also allows
me to do the transformation if unconditional non-trappingness can't be
proven.


Ciao,
Michael.

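To make the variants discussed above concrete, here is a minimal sketch in C of the branchy store, the unconditional store with a selected value, and option 2 with a selected address. The function names, the `int` element type (instead of `__typeof__ (*p)`) and the small `self_check` driver are assumptions made for illustration, not code from the compiler. Whether a given compiler actually emits a conditional move for the `?:` expressions depends on target and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the store happens only when cond is true.  With
   unpredictable cond the branch is mispredicted often. */
static void store_branchy(int *p, int val, int cond)
{
    if (cond)
        *p = val;
}

/* Unconditional store, value selected: costs an extra load of *p and
   writes *p even when cond is false.  That write is what can break
   threaded code if another thread updates *p concurrently. */
static void store_select_value(int *p, int val, int cond)
{
    int old = *p;
    *p = cond ? val : old;
}

/* Option 2: unconditional store, address selected.  When cond is false
   the write goes to a local stack slot, so *p is neither read nor
   written and p may even be an invalid pointer. */
static void store_select_address(int *p, int val, int cond)
{
    int dummy;
    int *target = cond ? p : &dummy;
    *target = val;
}

/* Hypothetical driver: runs all three variants over pseudo-random
   conditions and checks that they leave identical arrays behind. */
static int self_check(void)
{
    enum { N = 64 };
    int a[N], b[N], c[N];
    unsigned state = 12345u;
    size_t i;

    for (i = 0; i < N; i++)
        a[i] = b[i] = c[i] = -1;

    for (i = 0; i < N; i++) {
        int cond;
        state = state * 1103515245u + 12345u;  /* simple LCG step */
        cond = (int)((state >> 16) & 1u);
        store_branchy(&a[i], (int)i, cond);
        store_select_value(&b[i], (int)i, cond);
        store_select_address(&c[i], (int)i, cond);
    }

    for (i = 0; i < N; i++)
        assert(a[i] == b[i] && a[i] == c[i]);
    return 1;
}
```

Calling `self_check()` from a `main` exercises all three variants. Single-threaded they are indistinguishable; the difference only shows up when another thread touches `*p`, or when `p` is not known to be dereferenceable.
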
P.S.: I'm still somewhat disappointed by the way this discussion is
going; it reminds me of the ugly one about signed integer overflow.
There, an overly vocal set of people refusing to write ISO C led to a
very intrusive change in the compiler.  Now this seems to be happening
again (though no such intrusive changes would be required right now,
but perhaps for the other memory model).  Then as now, the presumed
"deficiencies" had already existed for years, but for some unfathomable
reason only recently resulted in a tempest in a teacup.  I don't think
it's a good strategy to change the compiler in a, strictly speaking,
wrong direction whenever the loudness of whiners reaches a certain
level.