Re: type promotion
Sorry for the late answer.

Umesh Kalappa writes:
> Was porting gcc 4.8.1 to a private target which has 8 bit regs that
> can be used as a pair for 16 bit, like AB or CD, but not BC or AD.
>
> I was stuck in the type promotion like
>
> int i;
> unsigned char c;
>
> int test ()
> {
>   i = c;
> }
>
> defined the zero_extendqihi2 pattern for the above construct like
>
> (define_expand "zero_extendqihi2"
>   [(set (operand:hi 0 "" "")
>         (zero_extend:hi (operand:qi 1)))]
>   ""
>   if (!reload_completed)
>     {
>       if (operands[1] != REG)
>         operands[1] = force_reg (QI, operands[1]);
>
>       /* Here I need to force gcc to use the next consecutive paired reg,
>          like B if operands[1] is in A, or D if operands[1] is in C.  */
>     }
> )
>
> How do I model the above requirement in the backend?

You might have already solved this by now, but this kind of restriction
is usually modelled via HARD_REGNO_MODE_OK.  A and C would be "OK" for
HImode but B and D wouldn't.

Did you single out zero_extend because the QImode input also needs to
follow the same rules?  If so, the way to model that depends on the
restriction.  If the QImode input to the zero_extend must be in the low
8 bits of the output register then you can use matching constraints
like "0" for the input operand.  (It's OK to match operands of
different sizes like this.)

If instead the QImode input is independent of the HImode output but can
only go in certain registers (perhaps just A or C, for example) then
you can define a register class that just includes those registers.
You'd then need an associated define_register_constraint.

Thanks,
Richard
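As a concrete illustration of the first approach Richard describes, here
is a minimal sketch of the target macros for a hypothetical port whose
8-bit registers A, B, C, D are hard registers 0-3 and whose 16-bit
values must live in the pairs AB or CD.  The register numbering and the
macro bodies are assumptions for illustration only, not taken from the
poster's actual port.

  /* Hypothetical target: four 8-bit hard registers A=0, B=1, C=2, D=3.
     UNITS_PER_WORD is 1, so an HImode value occupies two consecutive
     hard registers.  */
  #define HARD_REGNO_NREGS(REGNO, MODE) \
    ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)

  /* QImode is fine in any register; HImode may only start in A or C,
     which yields the pairs AB and CD but never BC or AD.  */
  #define HARD_REGNO_MODE_OK(REGNO, MODE) \
    (GET_MODE_SIZE (MODE) == 1 || ((REGNO) & 1) == 0)

With this in place the register allocator simply never puts an HImode
value in B or D, so the expander does not need to force the pairing by
hand.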
gcc-4.9-20140209 is now available
Snapshot gcc-4.9-20140209 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140209/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 207646

You'll find:

 gcc-4.9-20140209.tar.bz2             Complete GCC

  MD5=37b98103b5d83989a7c62539d33a284b
  SHA1=ccadc64501bc9298c5d5cc26a47d6098c3d11ff0

Diffs from 4.9-20140202 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, 2014-02-06 at 20:20 -0800, Paul E. McKenney wrote:
> On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote:
> > On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote:
> > > > On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> > > > > On Thu, Feb 06, 2014 at 06:59:10PM +, Will Deacon wrote:
> > > > > > There are also so many ways to blow your head off it's untrue.
> > > > > > For example, cmpxchg takes a separate memory model parameter for
> > > > > > failure and success, but then there are restrictions on the sets
> > > > > > you can use for each.  It's not hard to find well-known
> > > > > > memory-ordering experts shouting "Just use memory_model_seq_cst
> > > > > > for everything, it's too hard otherwise".  Then there's the fun
> > > > > > of load-consume vs load-acquire (arm64 GCC completely ignores
> > > > > > consume atm and optimises all of the data dependencies away) as
> > > > > > well as the definition of "data races", which seem to be used as
> > > > > > an excuse to miscompile a program at the earliest opportunity.
> > > > >
> > > > > Trust me, rcu_dereference() is not going to be defined in terms of
> > > > > memory_order_consume until the compilers implement it both
> > > > > correctly and efficiently.  They are not there yet, and there is
> > > > > currently no shortage of compiler writers who would prefer to
> > > > > ignore memory_order_consume.
> > > >
> > > > Do you have any input on
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448?  In particular,
> > > > the language standard's definition of dependencies?
> > >
> > > Let's see...  1.10p9 says that a dependency must be carried unless:
> > >
> > > — B is an invocation of any specialization of std::kill_dependency
> > >   (29.3), or
> > > — A is the left operand of a built-in logical AND (&&, see 5.14) or
> > >   logical OR (||, see 5.15) operator, or
> > > — A is the left operand of a conditional (?:, see 5.16) operator, or
> > > — A is the left operand of the built-in comma (,) operator (5.18);
> > >
> > > So the use of "flag" before the "?" is ignored.  But the "flag - flag"
> > > after the "?" will carry a dependency, so the code fragment in 59448
> > > needs to do the ordering rather than just optimizing "flag - flag" out
> > > of existence.  One way to do that on both ARM and Power is to actually
> > > emit code for "flag - flag", but there are a number of other ways to
> > > make that work.
> >
> > And that's what would concern me, considering that these requirements
> > seem to be able to creep out easily.  Also, whereas the other atomics
> > just constrain compilers wrt. reordering across atomic accesses or
> > changes to the atomic accesses themselves, the dependencies are new
> > requirements on pieces of otherwise non-synchronizing code.  The latter
> > seems far more involved to me.
>
> Well, the wording of 1.10p9 is pretty explicit on this point.
> There are only a few exceptions to the rule that dependencies from
> memory_order_consume loads must be tracked.  And to your point about
> requirements being placed on pieces of otherwise non-synchronizing code,
> we already have that with plain old load acquire and store release --
> both of these put ordering constraints that affect the surrounding
> non-synchronizing code.

I think there's a significant difference.  With acquire/release or more
general memory orders, it's true that we can't order _across_ the atomic
access.  However, we can reorder and optimize without additional
constraints if we do not reorder across it.  This is not the case with
consume memory order, as the (p + flag - flag) example shows.

> This issue got a lot of discussion, and the compromise is that
> dependencies cannot leak into or out of functions unless the relevant
> parameters or return values are annotated with [[carries_dependency]].
> This means that the compiler can see all the places where dependencies
> must be tracked.  This is described in 7.6.4.

I wasn't aware of 7.6.4 (but it isn't referred to as an additional
constraint--which it is--in 1.10, so I guess at least that should be
fixed).

Also, AFAIU, 7.6.4p3 is wrong in that the attribute does make a semantic
difference, at least if one is assuming that normal optimization of
sequential code is the default, and that maintaining things such as
(flag-flag) is not; if optimizing away (flag-flag) would require the
insertion of fences unless there is the carries_dependency attribute,
then this would be bad I think.

IMHO, the dependency constructs (carries_dependency, kill_dependency)
seem backwards to me.  They assume that the compiler preserves all those
dependencies, including (flag-flag), by default, which prohibits
meaningful optimizations.  Instead, I guess there should be a construct
for explicitly exploiting the dependencies (e.g.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, 2014-02-07 at 10:02 -0800, Paul E. McKenney wrote:
> On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> > Hi Paul,
> >
> > On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > > Hopefully some discussion of out-of-thin-air values as well.
> > > >
> > > > Yes, absolutely shoot store speculation in the head already.  Then
> > > > drive a wooden stake through its heart.
> > > >
> > > > C11/C++11 should not be allowed to claim itself a memory model until
> > > > that is sorted.
> > >
> > > There actually is a proposal being put forward, but it might not make
> > > ARM and Power people happy because it involves adding a compare, a
> > > branch, and an ISB/isync after every relaxed load...  Me, I agree with
> > > you, much preferring the no-store-speculation approach.
> >
> > Can you elaborate a bit on this please?  We don't permit speculative
> > stores in the ARM architecture, so it seems counter-intuitive that GCC
> > needs to emit any additional instructions to prevent that from
> > happening.
>
> Requiring a compare/branch/ISB after each relaxed load enables a
> simple(r) proof that out-of-thin-air values cannot be observed in the
> face of any compiler optimization that refrains from reordering a prior
> relaxed load with a subsequent relaxed store.
>
> > Stores can, of course, be observed out-of-order but that's a lot more
> > reasonable :)
>
> So let me try an example.  I am sure that Torvald Riegel will jump in
> with any needed corrections or amplifications:
>
> Initial state: x == y == 0
>
> T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
>     atomic_store_explicit(r1, y, memory_order_relaxed);
>
> T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
>     atomic_store_explicit(r2, x, memory_order_relaxed);
>
> One would intuitively expect r1 == r2 == 0 as the only possible outcome.
> But suppose that the compiler used specialization optimizations, as it
> would if there was a function that has a very lightweight implementation
> for some values and a very heavyweight one for others.  In particular,
> suppose that the lightweight implementation was for the value 42.
> Then the compiler might do something like the following:
>
> Initial state: x == y == 0
>
> T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
>     if (r1 == 42)
>       atomic_store_explicit(42, y, memory_order_relaxed);
>     else
>       atomic_store_explicit(r1, y, memory_order_relaxed);
>
> T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
>     atomic_store_explicit(r2, x, memory_order_relaxed);
>
> Suddenly we have an explicit constant 42 showing up.  Of course, if
> the compiler carefully avoided speculative stores (as both Peter and
> I believe that it should if its code generation is to be regarded as
> anything other than an act of vandalism, the words in the standard
> notwithstanding), there would be no problem.
>
> But currently, a number of compiler writers see absolutely nothing wrong
> with transforming the optimized-for-42 version above with something like
> this:
>
> Initial state: x == y == 0
>
> T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
>     atomic_store_explicit(42, y, memory_order_relaxed);
>     if (r1 != 42)
>       atomic_store_explicit(r1, y, memory_order_relaxed);
>
> T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
>     atomic_store_explicit(r2, x, memory_order_relaxed);

Intuitively, this is wrong because this lets the program take a step the
abstract machine wouldn't do.  This is different to the sequential code
that Peter posted because it uses atomics, and thus one can't easily
assume that the difference is not observable.

For this to be correct, the compiler would actually have to prove that
the speculative store is "as-if correct", which in turn would mean that
it needs to be aware of all potential observers, and check whether those
observers aren't actually affected by the speculative store.

I would guess that the compilers you have in mind don't really do that.
If they do, then I don't see why this should be okay, unless you think
out-of-thin-air values are something good (which I wouldn't agree with).

> And then it is a short and uncontroversial step to the following:
>
> Initial state: x == y == 0
>
> T1: atomic_store_explicit(42, y, memory_order_relaxed);
>     r1 = atomic_load_explicit(x, memory_order_relaxed);
>     if (r1 != 42)
>       atomic_store_explicit(r1, y, memory_order_relaxed);
>
> T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
>     atomic_store_explicit(r2, x, memory_order_relaxed);
>
> This can of course result in r1 == r2 == 42, even though the constant
> 42 never appeared in the original code.  This is one way to generate
> an out-of-thin-air value.
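For reference, the original two-thread example can be written as a
self-contained C11 program roughly as follows.  Note that the pseudocode
above writes the value before the atomic object, while the standard
atomic_store_explicit takes a pointer to the atomic object first and the
value second, which is the order used here.  The use of <threads.h> is an
assumption (not every C library ships it); the two thread bodies are the
point of the sketch.

  #include <stdatomic.h>
  #include <stdio.h>
  #include <threads.h>

  /* Initial state: x == y == 0. */
  static _Atomic(int) x = 0, y = 0;

  /* T1: copy x into y using relaxed atomics. */
  static int t1(void *arg)
  {
      (void)arg;
      int r1 = atomic_load_explicit(&x, memory_order_relaxed);
      atomic_store_explicit(&y, r1, memory_order_relaxed);
      return r1;
  }

  /* T2: copy y into x using relaxed atomics. */
  static int t2(void *arg)
  {
      (void)arg;
      int r2 = atomic_load_explicit(&y, memory_order_relaxed);
      atomic_store_explicit(&x, r2, memory_order_relaxed);
      return r2;
  }

  int main(void)
  {
      thrd_t a, b;
      int r1, r2;
      thrd_create(&a, t1, NULL);
      thrd_create(&b, t2, NULL);
      thrd_join(a, &r1);
      thrd_join(b, &r2);
      /* The intuitive outcome; the thread is about whether a compiler
         may ever manufacture anything else. */
      printf("r1 = %d, r2 = %d\n", r1, r2);
      return 0;
  }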
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
>
> I wouldn't characterize the situation like this (although I can't speak
> for others, obviously).  IMHO, it's perfectly fine on sequential /
> non-synchronizing code, because we know the difference isn't observable
> by a correct program.

What BS is that?  If you use an "atomic_store_explicit()", by definition
you're either

 (a) f*cking insane
 (b) not doing sequential non-synchronizing code

and a compiler that assumes that the programmer is insane may actually be
correct more often than not, but it's still a shit compiler.  Agreed?

So I don't see how any sane person can say that speculative writes are
ok.  They are clearly not ok.

Speculative stores are a bad idea in general.  They are completely
invalid for anything that says "atomic".  This is not even worth
discussing.

Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, 2014-02-09 at 16:56 -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
> >
> > I wouldn't characterize the situation like this (although I can't speak
> > for others, obviously).  IMHO, it's perfectly fine on sequential /
> > non-synchronizing code, because we know the difference isn't observable
> > by a correct program.
>
> What BS is that?  If you use an "atomic_store_explicit()", by
> definition you're either
>
> (a) f*cking insane
> (b) not doing sequential non-synchronizing code
>
> and a compiler that assumes that the programmer is insane may actually
> be correct more often than not, but it's still a shit compiler.
> Agreed?

Due to all the expletives, I can't really understand what you are saying.
Nor does what I guess it might mean seem to relate to what I said.

(a) seems to say that you don't like requiring programmers to mark atomic
accesses specially.  Is that the case?  If so, I have to disagree.  If
you're writing concurrent code, marking the atomic accesses is the least
of your problems.  Instead, for the concurrent code I've written, it
rather improved readability; it made code more verbose, but in turn it
made the synchronization easier to see.

Besides this question of taste (and I don't care what the kernel style
guides are), there is a real technical argument here, though: requiring
all synchronizing memory accesses to be annotated makes the compiler
aware of what is sequential code and what is not.  Without it, one can't
exploit data-race-freedom.  So unless we want to slow down optimization
of sequential code, we need to tell the compiler what is what.  If every
access is potentially synchronizing, then this doesn't just prevent
speculative stores.

(b) seems as if you are saying that if there is a specially annotated
atomic access in the code, then this isn't sequential/non-synchronizing
code anymore.  I don't agree with that, obviously.

> So I don't see how any sane person can say that speculative writes are
> ok.  They are clearly not ok.

We are discussing programs written against the C11/C++11 memory model
here.  At least that's the part that got forwarded to gcc@gcc.gnu.org,
and was the subject of the nearest emails in the thread.  This memory
model requires programs to be data-race-free.  Thus, we can optimize
accordingly.  If you don't like that, then this isn't C11/C++11 anymore.
Which would be fine, but then complain about that specifically.

> Speculative stores are a bad idea in general.

I disagree in the context of C11/C++11 programs.  At least from a
correctness point of view.

> They are completely
> invalid for anything that says "atomic".

I agree, as you will see when you read the other emails I posted as part
of this thread.  But those two things are separate.
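To make the "annotate the synchronizing accesses" argument concrete,
here is a small sketch: the atomic flag is the only datum other threads
touch, so it is marked, while the accumulator stays plain and can be
treated as ordinary sequential code under the data-race-freedom
assumption.  The function and variable names are invented for
illustration, not taken from the thread.

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* 'done' is shared with other threads and therefore atomic (marked);
     'total' and 'buf' are only touched here, so the data-race-freedom
     assumption lets the compiler optimize them as plain sequential
     code, e.g. keep 'total' in a register across the whole loop.  */
  long sum_until_done(const long *buf, size_t n, _Atomic(bool) *done)
  {
      long total = 0;
      for (size_t i = 0; i < n; i++) {
          /* The marked load tells the compiler this access may
             synchronize with other threads and must not be treated
             as ordinary non-shared data.  */
          if (atomic_load_explicit(done, memory_order_relaxed))
              break;
          total += buf[i];
      }
      return total;
  }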
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 9, 2014 at 5:16 PM, Torvald Riegel wrote: > > (a) seems to say that you don't like requiring programmers to mark > atomic accesses specially. Is that the case? In Paul's example, they were marked specially. And you seemed to argue that Paul's example could possibly return anything but 0/0. If you really think that, I hope to God that you have nothing to do with the C standard or any actual compiler I ever use. Because such a standard or compiler would be shit. It's sadly not too uncommon. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, 2014-02-09 at 17:24 -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 5:16 PM, Torvald Riegel wrote:
> >
> > (a) seems to say that you don't like requiring programmers to mark
> > atomic accesses specially.  Is that the case?
>
> In Paul's example, they were marked specially.
>
> And you seemed to argue that Paul's example could possibly return
> anything but 0/0.

Just read my reply to Paul again.  Here's an excerpt:

> Initial state: x == y == 0
>
> T1: atomic_store_explicit(42, y, memory_order_relaxed);
>     r1 = atomic_load_explicit(x, memory_order_relaxed);
>     if (r1 != 42)
>       atomic_store_explicit(r1, y, memory_order_relaxed);
>
> T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
>     atomic_store_explicit(r2, x, memory_order_relaxed);

Intuitively, this is wrong because this lets the program take a step the
abstract machine wouldn't do.  This is different to the sequential code
that Peter posted because it uses atomics, and thus one can't easily
assume that the difference is not observable.

IOW, I wrote that such a compiler transformation would be wrong in my
opinion.  Thus, it should *not* return 42.  If you still see how what I
wrote could be misunderstood, please let me know because I really don't
see it.  Yes, I don't do so by swearing or calling others insane, and I
try to see the reasoning that those compilers' authors might have had,
even if I don't agree with it.  In my personal opinion, that's a good
thing.

> If you really think that, I hope to God that you have nothing to do
> with the C standard or any actual compiler I ever use.
>
> Because such a standard or compiler would be shit.  It's sadly not too
> uncommon.

Thanks for the kind words.  Perhaps writing that was quicker than reading
what I actually wrote.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 9, 2014 at 5:46 PM, Torvald Riegel wrote:
>
> IOW, I wrote that such a compiler transformation would be wrong in my
> opinion.  Thus, it should *not* return 42.

Ahh, I am happy to have misunderstood.  The "intuitively" threw me,
because I thought that was building up to a "but", and misread the rest.

I then react strongly, because I've seen so much total crap (the
type-based C aliasing rules topping my list) etc coming out of standards
groups because it allows them to generate wrong code that goes faster,
that I just assume compiler people are out to do stupid things in the
name of "..but the standard allows it".

Linus
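For context on the type-based aliasing rules mentioned here, a classic
hypothetical fragment (not from this thread): under C's strict-aliasing
rules the compiler may assume a store through an int pointer cannot
modify a float object, so it may keep using the old value of the float,
which is exactly the "wrong code that goes faster" complaint when the
program was written against the traditional everything-may-alias model.

  float f;

  /* If 'ip' actually points at 'f', strict aliasing lets the compiler
     assume the store through 'ip' cannot change 'f', so it may fold the
     comparison to 1 even though memory now holds 0.0f.  */
  int aliasing_example(int *ip)
  {
      f = 1.0f;
      *ip = 0;
      return f == 1.0f;
  }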
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote: > On Fri, 2014-02-07 at 10:02 -0800, Paul E. McKenney wrote: > > On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote: > > > Hi Paul, > > > > > > On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote: > > > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote: > > > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote: > > > > > > Hopefully some discussion of out-of-thin-air values as well. > > > > > > > > > > Yes, absolutely shoot store speculation in the head already. Then > > > > > drive > > > > > a wooden stake through its hart. > > > > > > > > > > C11/C++11 should not be allowed to claim itself a memory model until > > > > > that > > > > > is sorted. > > > > > > > > There actually is a proposal being put forward, but it might not make > > > > ARM > > > > and Power people happy because it involves adding a compare, a branch, > > > > and an ISB/isync after every relaxed load... Me, I agree with you, > > > > much preferring the no-store-speculation approach. > > > > > > Can you elaborate a bit on this please? We don't permit speculative stores > > > in the ARM architecture, so it seems counter-intuitive that GCC needs to > > > emit any additional instructions to prevent that from happening. > > > > Requiring a compare/branch/ISB after each relaxed load enables a simple(r) > > proof that out-of-thin-air values cannot be observed in the face of any > > compiler optimization that refrains from reordering a prior relaxed load > > with a subsequent relaxed store. > > > > > Stores can, of course, be observed out-of-order but that's a lot more > > > reasonable :) > > > > So let me try an example. I am sure that Torvald Riegel will jump in > > with any needed corrections or amplifications: > > > > Initial state: x == y == 0 > > > > T1: r1 = atomic_load_explicit(x, memory_order_relaxed); > > atomic_store_explicit(r1, y, memory_order_relaxed); > > > > T2: r2 = atomic_load_explicit(y, memory_order_relaxed); > > atomic_store_explicit(r2, x, memory_order_relaxed); > > > > One would intuitively expect r1 == r2 == 0 as the only possible outcome. > > But suppose that the compiler used specialization optimizations, as it > > would if there was a function that has a very lightweight implementation > > for some values and a very heavyweight one for other. In particular, > > suppose that the lightweight implementation was for the value 42. > > Then the compiler might do something like the following: > > > > Initial state: x == y == 0 > > > > T1: r1 = atomic_load_explicit(x, memory_order_relaxed); > > if (r1 == 42) > > atomic_store_explicit(42, y, memory_order_relaxed); > > else > > atomic_store_explicit(r1, y, memory_order_relaxed); > > > > T2: r2 = atomic_load_explicit(y, memory_order_relaxed); > > atomic_store_explicit(r2, x, memory_order_relaxed); > > > > Suddenly we have an explicit constant 42 showing up. Of course, if > > the compiler carefully avoided speculative stores (as both Peter and > > I believe that it should if its code generation is to be regarded as > > anything other than an act of vandalism, the words in the standard > > notwithstanding), there would be no problem. 
But currently, a number > > of compiler writers see absolutely nothing wrong with transforming > > the optimized-for-42 version above with something like this: > > > > Initial state: x == y == 0 > > > > T1: r1 = atomic_load_explicit(x, memory_order_relaxed); > > atomic_store_explicit(42, y, memory_order_relaxed); > > if (r1 != 42) > > atomic_store_explicit(r1, y, memory_order_relaxed); > > > > T2: r2 = atomic_load_explicit(y, memory_order_relaxed); > > atomic_store_explicit(r2, x, memory_order_relaxed); > > Intuitively, this is wrong because this let's the program take a step > the abstract machine wouldn't do. This is different to the sequential > code that Peter posted because it uses atomics, and thus one can't > easily assume that the difference is not observable. > > For this to be correct, the compiler would actually have to prove that > the speculative store is "as-if correct", which in turn would mean that > it needs to be aware of all potential observers, and check whether those > observers aren't actually affected by the speculative store. > > I would guess that the compilers you have in mind don't really do that. > If they do, then I don't see why this should be okay, unless you think > out-of-thin-air values are something good (which I wouldn't agree with). OK, we agree that pulling the atomic store to y out of its "if" statement is a bad thing. Very good! Now we just have to convince others on the committee. ;-) Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote: > On Fri, 2014-02-07 at 10:02 -0800, Paul E. McKenney wrote: > > On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote: [ . . . ] > > And then it is a short and uncontroversial step to the following: > > > > Initial state: x == y == 0 > > > > T1: atomic_store_explicit(42, y, memory_order_relaxed); > > r1 = atomic_load_explicit(x, memory_order_relaxed); > > if (r1 != 42) > > atomic_store_explicit(r1, y, memory_order_relaxed); > > > > T2: r2 = atomic_load_explicit(y, memory_order_relaxed); > > atomic_store_explicit(r2, x, memory_order_relaxed); > > > > This can of course result in r1 == r2 == 42, even though the constant > > 42 never appeared in the original code. This is one way to generate > > an out-of-thin-air value. > > > > As near as I can tell, compiler writers hate the idea of prohibiting > > speculative-store optimizations because it requires them to introduce > > both control and data dependency tracking into their compilers. > > I wouldn't characterize the situation like this (although I can't speak > for others, obviously). IMHO, it's perfectly fine on sequential / > non-synchronizing code, because we know the difference isn't observable > by a correct program. For synchronizing code, compilers just shouldn't > do it, or they would have to truly prove that speculation is harmless. > That will be hard, so I think it should just be avoided. > > Synchronization code will likely have been tuned anyway (especially if > it uses relaxed MO), so I don't see a large need for trying to optimize > using speculative atomic stores. > > Thus, I think there's an easy and practical solution. I like this approach, but there has been resistance to it in the past. Definitely worth a good try, though! > > Many of > > them seem to hate dependency tracking with a purple passion. At least, > > such a hatred would go a long way towards explaining the incomplete > > and high-overhead implementations of memory_order_consume, the long > > and successful use of idioms based on the memory_order_consume pattern > > notwithstanding [*]. ;-) > > I still think that's different because it blurs the difference between > sequential code and synchronizing code (ie, atomic accesses). With > consume MO, the simple solution above doesn't work anymore, because > suddenly synchronizing code does affect optimizations in sequential > code, even if that wouldn't reorder across the synchronizing code (which > would be clearly "visible" to the implementation of the optimization). I understand that memory_order_consume is a bit harder on compiler writers than the other memory orders, but it is also pretty valuable. Thanx, Paul
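As a reminder of why memory_order_consume is considered valuable despite
the implementation burden, here is a sketch of the rcu_dereference-style
publish/read pattern it is meant to support: the reader's ordering comes
from the address dependency on the consume-loaded pointer rather than
from an acquire fence.  Names and structure are illustrative only, not
taken from the thread.

  #include <stdatomic.h>
  #include <stdlib.h>

  struct node { int a; int b; };
  static _Atomic(struct node *) published;

  void publisher(void)
  {
      struct node *n = malloc(sizeof *n);
      if (!n)
          return;
      n->a = 1;
      n->b = 2;
      /* Release store: publishes the fully initialized node.  */
      atomic_store_explicit(&published, n, memory_order_release);
  }

  int reader(void)
  {
      struct node *p =
          atomic_load_explicit(&published, memory_order_consume);
      if (p == NULL)
          return -1;
      /* p->a carries a dependency on the consume load, so it is ordered
         after the publisher's initialization without an extra fence on
         ARM/Power -- provided the compiler tracks the dependency, which
         is exactly the implementation difficulty discussed above.  */
      return p->a;
  }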
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 01:06:48AM +0100, Torvald Riegel wrote: > On Thu, 2014-02-06 at 20:20 -0800, Paul E. McKenney wrote: > > On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote: > > > On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote: > > > > On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote: > > > > > On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote: > > > > > > On Thu, Feb 06, 2014 at 06:59:10PM +, Will Deacon wrote: > > > > > > > There are also so many ways to blow your head off it's untrue. > > > > > > > For example, > > > > > > > cmpxchg takes a separate memory model parameter for failure and > > > > > > > success, but > > > > > > > then there are restrictions on the sets you can use for each. > > > > > > > It's not hard > > > > > > > to find well-known memory-ordering experts shouting "Just use > > > > > > > memory_model_seq_cst for everything, it's too hard otherwise". > > > > > > > Then there's > > > > > > > the fun of load-consume vs load-acquire (arm64 GCC completely > > > > > > > ignores consume > > > > > > > atm and optimises all of the data dependencies away) as well as > > > > > > > the definition > > > > > > > of "data races", which seem to be used as an excuse to miscompile > > > > > > > a program > > > > > > > at the earliest opportunity. > > > > > > > > > > > > Trust me, rcu_dereference() is not going to be defined in terms of > > > > > > memory_order_consume until the compilers implement it both > > > > > > correctly and > > > > > > efficiently. They are not there yet, and there is currently no > > > > > > shortage > > > > > > of compiler writers who would prefer to ignore memory_order_consume. > > > > > > > > > > Do you have any input on > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448? In particular, the > > > > > language standard's definition of dependencies? > > > > > > > > Let's see... 1.10p9 says that a dependency must be carried unless: > > > > > > > > — B is an invocation of any specialization of std::kill_dependency > > > > (29.3), or > > > > — A is the left operand of a built-in logical AND (&&, see 5.14) or > > > > logical OR (||, see 5.15) operator, > > > > or > > > > — A is the left operand of a conditional (?:, see 5.16) operator, or > > > > — A is the left operand of the built-in comma (,) operator (5.18); > > > > > > > > So the use of "flag" before the "?" is ignored. But the "flag - flag" > > > > after the "?" will carry a dependency, so the code fragment in 59448 > > > > needs to do the ordering rather than just optimizing "flag - flag" out > > > > of existence. One way to do that on both ARM and Power is to actually > > > > emit code for "flag - flag", but there are a number of other ways to > > > > make that work. > > > > > > And that's what would concern me, considering that these requirements > > > seem to be able to creep out easily. Also, whereas the other atomics > > > just constrain compilers wrt. reordering across atomic accesses or > > > changes to the atomic accesses themselves, the dependencies are new > > > requirements on pieces of otherwise non-synchronizing code. The latter > > > seems far more involved to me. > > > > Well, the wording of 1.10p9 is pretty explicit on this point. > > There are only a few exceptions to the rule that dependencies from > > memory_order_consume loads must be tracked. 
And to your point about > > requirements being placed on pieces of otherwise non-synchronizing code, > > we already have that with plain old load acquire and store release -- > > both of these put ordering constraints that affect the surrounding > > non-synchronizing code. > > I think there's a significant difference. With acquire/release or more > general memory orders, it's true that we can't order _across_ the atomic > access. However, we can reorder and optimize without additional > constraints if we do not reorder. This is not the case with consume > memory order, as the (p + flag - flag) example shows. Agreed, memory_order_consume does introduce additional restrictions. > > This issue got a lot of discussion, and the compromise is that > > dependencies cannot leak into or out of functions unless the relevant > > parameters or return values are annotated with [[carries_dependency]]. > > This means that the compiler can see all the places where dependencies > > must be tracked. This is described in 7.6.4. > > I wasn't aware of 7.6.4 (but it isn't referred to as an additional > constraint--what it is--in 1.10, so I guess at least that should be > fixed). > Also, AFAIU, 7.6.4p3 is wrong in that the attribute does make a semantic > difference, at least if one is assuming that normal optimization of > sequential code is the default, and that maintaining things such as > (flag-flag) is not; if optimizing away (flag-flag) would require the > insertion of fences unless there is the carries_dependency attribute, > then this would be bad I think. No, the attribute does not