Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote:
> > Initial state: x == y == 0
> >
> > T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
> >     atomic_store_explicit(42, y, memory_order_relaxed);
> >     if (r1 != 42)
> >         atomic_store_explicit(r1, y, memory_order_relaxed);
> >
> > T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
> >     atomic_store_explicit(r2, x, memory_order_relaxed);
>
> Intuitively, this is wrong because this lets the program take a step
> the abstract machine wouldn't do. This is different to the sequential
> code that Peter posted because it uses atomics, and thus one can't
> easily assume that the difference is not observable.

Yeah, my bad for not being familiar with the atrocious crap C11 made of
atomics :/
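To make the quoted litmus test concrete, here is a minimal single-threaded sketch in C11 that replays hand-picked schedules instead of creating threads. It rests on several assumptions: the untransformed T1 is taken to be a plain copy of x into y (which is what the quoted code reduces to when run sequentially), the names `struct outcome`, `run_original` and `run_speculated` are hypothetical, and the "speculated" schedule assumes the relaxed store of 42 becomes visible before T1's load, a reordering that relaxed ordering does not forbid. Note that the real `atomic_store_explicit` takes the object first and the value second, unlike the shorthand in the quoted mail.

```c
#include <stdatomic.h>

/* Shared variables from the litmus test; static storage starts at 0. */
static atomic_int x, y;

/* Hypothetical result record: the values each thread read. */
struct outcome { int r1, r2; };

static void reset(void)
{
    atomic_store_explicit(&x, 0, memory_order_relaxed);
    atomic_store_explicit(&y, 0, memory_order_relaxed);
}

/* T2 as quoted: copy y into x and report what was read. */
static int t2(void)
{
    int r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);
    return r2;
}

/* Assumed untransformed T1 (copy x into y), with T2 run before or after. */
struct outcome run_original(int t2_first)
{
    struct outcome o = { 0, 0 };

    reset();
    if (t2_first)
        o.r2 = t2();
    o.r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, o.r1, memory_order_relaxed);
    if (!t2_first)
        o.r2 = t2();
    return o;
}

/* Transformed T1 from the quoted mail, replayed in one specific order:
 * speculative store first, then all of T2, then T1's load and fix-up. */
struct outcome run_speculated(void)
{
    struct outcome o = { 0, 0 };

    reset();
    atomic_store_explicit(&y, 42, memory_order_relaxed); /* speculated store */
    o.r2 = t2();                                         /* T2 observes 42   */
    o.r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (o.r1 != 42)                                      /* fix-up is skipped */
        atomic_store_explicit(&y, o.r1, memory_order_relaxed);
    return o;
}
```

In the two original schedules replayed here both threads read 0, while the speculated schedule ends with r1 == r2 == 42, a value that no thread in the untransformed program ever writes. That is the extra step "the abstract machine wouldn't do".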
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> As near as I can tell, compiler writers hate the idea of prohibiting
> speculative-store optimizations because it requires them to introduce
> both control and data dependency tracking into their compilers. Many of
> them seem to hate dependency tracking with a purple passion. At least,
> such a hatred would go a long way towards explaining the incomplete
> and high-overhead implementations of memory_order_consume, the long
> and successful use of idioms based on the memory_order_consume pattern
> notwithstanding [*]. ;-)

Just tell them that because the hardware provides control dependencies
we actually use and rely on them.

Not that I expect they care too much what we do, given the current state
of things.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > As near as I can tell, compiler writers hate the idea of prohibiting
> > speculative-store optimizations because it requires them to introduce
> > both control and data dependency tracking into their compilers. Many of
> > them seem to hate dependency tracking with a purple passion. At least,
> > such a hatred would go a long way towards explaining the incomplete
> > and high-overhead implementations of memory_order_consume, the long
> > and successful use of idioms based on the memory_order_consume pattern
> > notwithstanding [*]. ;-)
>
> Just tell them that because the hardware provides control dependencies
> we actually use and rely on them.

s/control/address/ ?

Will
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > speculative-store optimizations because it requires them to introduce
> > > both control and data dependency tracking into their compilers. Many of
> > > them seem to hate dependency tracking with a purple passion. At least,
> > > such a hatred would go a long way towards explaining the incomplete
> > > and high-overhead implementations of memory_order_consume, the long
> > > and successful use of idioms based on the memory_order_consume pattern
> > > notwithstanding [*]. ;-)
> >
> > Just tell them that because the hardware provides control dependencies
> > we actually use and rely on them.
>
> s/control/address/ ?

Nope, control. Since stores cannot be speculated and thus require linear
control flow history we can use it to order LOAD -> STORE when the LOAD
is required for the control flow decision and the STORE depends on the
control flow path.

Also see commit 18c03c61444a211237f3d4782353cb38dba795df to
Documentation/memory-barriers.txt

---
commit c7f2e3cd6c1f4932ccc4135d050eae3f7c7aef63
Author: Peter Zijlstra
Date:   Mon Nov 25 11:49:10 2013 +0100

    perf: Optimize ring-buffer write by depending on control dependencies

    Remove a full barrier from the ring-buffer write path by relying on
    a control dependency to order a LOAD -> STORE scenario.

    Cc: "Paul E. McKenney"
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnx...@git.kernel.org
    Signed-off-by: Ingo Molnar

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index e8b168af135b..146a5792b1d2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -61,19 +61,20 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 *   kernel                             user
 	 *
-	 *   READ ->data_tail                   READ ->data_head
-	 *   smp_mb()     (A)                   smp_rmb()     (C)
-	 *   WRITE $data                        READ $data
-	 *   smp_wmb()    (B)                   smp_mb()      (D)
-	 *   STORE ->data_head                  WRITE ->data_tail
+	 *   if (LOAD ->data_tail) {            LOAD ->data_head
+	 *                      (A)             smp_rmb()     (C)
+	 *      STORE $data                     LOAD $data
+	 *      smp_wmb()       (B)             smp_mb()      (D)
+	 *      STORE ->data_head               STORE ->data_tail
+	 *   }
 	 *
 	 * Where A pairs with D, and B pairs with C.
 	 *
-	 * I don't think A needs to be a full barrier because we won't in fact
-	 * write data until we see the store from userspace. So we simply don't
-	 * issue the data WRITE until we observe it. Be conservative for now.
+	 * In our case (A) is a control dependency that separates the load of
+	 * the ->data_tail and the stores of $data. In case ->data_tail
+	 * indicates there is no room in the buffer to store $data we do not.
 	 *
-	 * OTOH, D needs to be a full barrier since it separates the data READ
+	 * D needs to be a full barrier since it separates the data READ
 	 * from the tail WRITE.
 	 *
 	 * For B a WMB is sufficient since it separates two WRITEs, and for C
@@ -81,7 +82,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 * See perf_output_begin().
 	 */
-	smp_wmb();
+	smp_wmb(); /* B, matches C */
 	rb->user_page->data_head = head;
 
 	/*
@@ -144,17 +145,26 @@ int perf_output_begin(struct perf_output_handle *handle,
 		if (!rb->overwrite &&
 		    unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
 			goto fail;
+
+		/*
+		 * The above forms a control dependency barrier separating the
+		 * @tail load above from the data stores below. Since the @tail
+		 * load is required to compute the branch to fail below.
+		 *
+		 * A, matches D; the full memory barrier userspace SHOULD issue
+		 * after reading the data and before storing the new tail
+		 * position.
+		 *
+		 * See perf_output_put_handle().
+		 */
+
 		head += size;
 	} while (local_cmpxchg(&rb->head, offset, head) != offset);
 
 	/*
-	 * Separate the userpage->tail read from the data stores below.
-	 * Matches the MB userspace SHOULD issue after reading the data
-	 * and before storing the new tail position.
-	 *
-	 * See perf_output_put_handle().
+	 * We rely o
Re: LLVM collaboration?
On Fri, Feb 7, 2014 at 5:07 PM, Renato Golin wrote:
> * GCC and LLVM collaboration / The Open Source Compiler Initiative
>
> With LLVM mature enough to feature as the default toolchain in some
> Unix distributions, and with the inherent (and profitable) share of
> solutions, ideas and code between the two, we need to start talking at
> a more profound level. There will always be problems that can't be
> included in any standard (language, extension, or machine-specific)
> and are intrinsic to the compilation infrastructure. For those, and
> other common problems, we need common solutions to at least both LLVM
> and GCC, but ideally any open source (and even closed source)
> toolchain. In this BoF session, we shall discuss to what extent this
> collaboration can take us, how we should start and what are the next
> steps to make this happen.

Looks good. Registered.

Thanks. Diego.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > speculative-store optimizations because it requires them to introduce
> > > both control and data dependency tracking into their compilers. Many of
> > > them seem to hate dependency tracking with a purple passion. At least,
> > > such a hatred would go a long way towards explaining the incomplete
> > > and high-overhead implementations of memory_order_consume, the long
> > > and successful use of idioms based on the memory_order_consume pattern
> > > notwithstanding [*]. ;-)
> >
> > Just tell them that because the hardware provides control dependencies
> > we actually use and rely on them.
>
> s/control/address/ ?

Both are important, but as Peter's reply noted, it was control
dependencies under discussion. Data dependencies (which include the
ARM/PowerPC notion of address dependencies) are called out by the standard
already, but control dependencies are not. I am not all that satisfied
by current implementations of data dependencies, admittedly. Should
be an interesting discussion. ;-)

Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 03:04:43PM +, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> > On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> > > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > > speculative-store optimizations because it requires them to introduce
> > > > both control and data dependency tracking into their compilers. Many of
> > > > them seem to hate dependency tracking with a purple passion. At least,
> > > > such a hatred would go a long way towards explaining the incomplete
> > > > and high-overhead implementations of memory_order_consume, the long
> > > > and successful use of idioms based on the memory_order_consume pattern
> > > > notwithstanding [*]. ;-)
> > >
> > > Just tell them that because the hardware provides control dependencies
> > > we actually use and rely on them.
> >
> > s/control/address/ ?
>
> Both are important, but as Peter's reply noted, it was control
> dependencies under discussion. Data dependencies (which include the
> ARM/PowerPC notion of address dependencies) are called out by the standard
> already, but control dependencies are not. I am not all that satisfied
> by current implementations of data dependencies, admittedly. Should
> be an interesting discussion. ;-)

Ok, but since you can't use control dependencies to order LOAD -> LOAD,
it's a pretty big ask of the compiler to make use of them for things like
consume, where a data dependency will suffice for any combination of
accesses.

Will
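The LOAD -> STORE control dependency discussed in this thread can be illustrated with a small userspace ring buffer. This is only a sketch under stated assumptions, not the kernel's perf code: `struct ring`, `rb_put`, `rb_get` and `RB_SIZE` are hypothetical names, C11 atomics stand in for the kernel's `smp_wmb()`/`smp_rmb()`/`smp_mb()`, and the letters (A) to (D) mirror the barrier-pairing comment in the kernel patch for this ring buffer. Most importantly, ISO C11 gives no ordering guarantee from the branch at (A); that is the gap being debated. A strictly portable version would load `tail` with `memory_order_acquire` and store it with `memory_order_release`.

```c
#include <stdatomic.h>

#define RB_SIZE 8u  /* hypothetical capacity */

struct ring {
    atomic_uint head;              /* advanced by the producer */
    atomic_uint tail;              /* advanced by the consumer */
    unsigned char data[RB_SIZE];
};

/* Producer: returns 1 on success, 0 when there is no room. */
int rb_put(struct ring *rb, unsigned char byte)
{
    unsigned head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    /* LOAD ->tail: its value decides the branch below. */
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);

    if (head - tail >= RB_SIZE)
        return 0;                  /* no room: the data store is never issued */

    /* (A) The store is reached only through the branch above. Hardware
     * that does not speculate stores orders it after the tail load;
     * the C11 standard itself does not promise this. */
    rb->data[head % RB_SIZE] = byte;

    /* (B) Release store stands in for smp_wmb() + STORE ->head. */
    atomic_store_explicit(&rb->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: returns 1 and fills *out, or 0 when the ring is empty. */
int rb_get(struct ring *rb, unsigned char *out)
{
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    /* (C) Acquire load stands in for LOAD ->head + smp_rmb(). */
    unsigned head = atomic_load_explicit(&rb->head, memory_order_acquire);

    if (head == tail)
        return 0;

    *out = rb->data[tail % RB_SIZE];

    /* (D) Full fence between the data read and the tail store,
     * standing in for smp_mb(). */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&rb->tail, tail + 1, memory_order_relaxed);
    return 1;
}
```

Filling the ring makes the next `rb_put()` return 0 without touching `data`; after one `rb_get()` frees a slot, `rb_put()` succeeds again.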
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
>
> Intuitively, this is wrong because this lets the program take a step
> the abstract machine wouldn't do. This is different to the sequential
> code that Peter posted because it uses atomics, and thus one can't
> easily assume that the difference is not observable.

Btw, what is the definition of "observable" for the atomics?

Because I'm hoping that it's not the same as for volatiles, where
"observable" is about the virtual machine itself, and as such volatile
accesses cannot be combined or optimized at all.

Now, I claim that atomic accesses cannot be done speculatively for
writes, and not re-done for reads (because the value could change),
but *combining* them would be possible and good.

For example, we often have multiple independent atomic accesses that
could certainly be combined: testing the individual bits of an atomic
value with helper functions, causing things like "load atomic, test
bit, load same atomic, test another bit". The two atomic loads could
be done as a single load without possibly changing semantics on a real
machine, but if "visibility" is defined in the same way it is for
"volatile", that wouldn't be a valid transformation. Right now we use
"volatile" semantics for these kinds of things, and they really can
hurt.

Same goes for multiple writes (possibly due to setting bits):
combining multiple accesses into a single one is generally fine, it's
*adding* write accesses speculatively that is broken by design..

At the same time, you can't combine atomic loads or stores infinitely
- "visibility" on a real machine definitely is about timeliness.
Removing all but the last write when there are multiple consecutive
writes is generally fine, even if you unroll a loop to generate those
writes. But if what remains is a loop, it might be a busy-loop
basically waiting for something, so it would be wrong ("untimely") to
hoist a store in a loop entirely past the end of the loop, or hoist a
load in a loop to before the loop.

Does the standard allow for that kind of behavior?

Linus
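The "load atomic, test bit, load same atomic, test another bit" pattern described above can be sketched in C11 as follows. The flag names and helper functions are hypothetical and are not kernel APIs. The second function shows the combined form written by hand; whether a compiler may perform that combination on its own is the open question in the message, so nothing here should be read as a statement about what the standard permits.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_A 0x1u  /* hypothetical bit assignments */
#define FLAG_B 0x2u

/* Helper-style accessors: each one performs its own atomic load. */
static inline bool flag_a_set(atomic_uint *flags)
{
    return (atomic_load_explicit(flags, memory_order_relaxed) & FLAG_A) != 0;
}

static inline bool flag_b_set(atomic_uint *flags)
{
    return (atomic_load_explicit(flags, memory_order_relaxed) & FLAG_B) != 0;
}

/* As typically written: two loads of the same atomic, one per helper. */
bool both_set_two_loads(atomic_uint *flags)
{
    bool a = flag_a_set(flags);
    bool b = flag_b_set(flags);
    return a && b;
}

/* Combined by hand: one load, both bits tested on the same snapshot. */
bool both_set_one_load(atomic_uint *flags)
{
    unsigned v = atomic_load_explicit(flags, memory_order_relaxed);
    return (v & FLAG_A) != 0 && (v & FLAG_B) != 0;
}
```

With no concurrent writer the two functions return the same result. With a concurrent writer the two-load version may test bits taken from different values of `flags`, while the one-load version always tests a single snapshot.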
Conditional execution over emit_move_insn
Hi,

I'd like to hardcode conditional execution of emit_move_insn based on
the predicate checking that the address in the destination argument is
non-NULL. The platform supports conditional execution, but doesn't have
explicitly defined conditional moves (target=tic6x).

I have already tried to find any look-alike pieces in the gcc code tree
but without success - I am new here.

As for the background - I am trying to work around the bug I submitted
(id=60123) before there's an official patch for it available.

I appreciate any help.

Thanks,
Wojciech
Re: Fwd: LLVM collaboration?
> 1. There IS an unnecessary fence between GCC and LLVM.
>
> License arguments are one reason why we can't share code as easily as
> we would like, but there is no argument against sharing ideas,
> cross-reporting bugs, helping each other implement a better
> compiler/linker/assembler/libraries just because of an artificial
> wall. We need to break this wall.
>
> I rarely see GCC folks reporting bugs on our side, or people saying
> "we should check with the GCC folks" actually doing it. We're not
> contagious folks, you know. Talking to GCC engineers won't make me a
> lesser LLVM engineer, and vice-versa.

One practical experience I have with LLVM developers is sharing
experiences about getting Firefox to work with LTO with Rafael Espindola,
and I think it was useful for both of us. I am definitely open to more
discussion.

Let's try a specific topic that has been on my TODO list for some time.
I would like to make it possible for multiple compilers to be used to
LTO a single binary. As we are all making LTO more useful, I think it is
a matter of time until people start shipping LTO object files by default
and users end up feeding them into different compilers or incompatible
versions of the same compiler. We probably want to make this work, even
though the cross-module optimization will not happen in this case.

The plugin interface in binutils seems to do its job well both for GCC
and LLVM, and I hope that open64 and ICC will eventually join, too.

The trouble, however, is that one needs to pass an explicit --plugin
argument specifying the particular plugin to load, and so GCC ships with
its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
LLVM does a similar thing.

It may be smoother if binutils were able to load multiple plugins at once
and grab plugins from system and user-installed compilers without an
explicit --plugin argument.

Binutils probably should also have a way to detect LTO object files and
produce more useful diagnostics than they do now, when there is no plugin
claiming them.

There are some PRs filed on the topic
http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
but not much progress on them.

I wonder if we can get this designed and implemented.

On the other hand, GCC currently maintains a non-plugin path for LTO that
is now only used by the darwin port due to the lack of a plugin-enabled
LD there. It seems that the liblto used by darwin is loosely compatible
with the plugin API, but it makes it harder to have different compilers
share it (one has to LD_PRELOAD liblto to a different one prior to
executing the linker?)

I wonder, is there a chance to implement linker plugin API to libLTO glue
or add plugin support to native Darwin tools?

Honza
Re: Google Summer of Code -- Admin needed
On 6/02/2014, at 7:45 am, Moore, Catherine wrote:
> Hi All,
>
> I acted as the Google Summer of Code Administrator in 2013 and I do not wish
> to continue.
>
> There is an upcoming deadline (February 14th) for an organization to submit
> their applications to the Google Summer of Code. Is there anyone who would
> like to act as the gcc admin for 2014?
> I assume that folks would like to have the gcc project continue to
> participate; we need to find someone to submit the application and commit to
> the admin duties.
>
> The bulk of the work is organizational. There are some web forms to fill
> out, evaluations need to be completed, an irc meeting was required, plus
> finding projects and mentors for the projects.
>
> I hope someone will pick this up.

I want to admin GCC's GSoC this year.

In the next several days I will be bugging past GCC GSoC admins and
mentors to get an idea of what I'm getting myself into. Please send me a
note if you haven't been a GSoC mentor in the past years, but want to try
this year.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org
Re: Google Summer of Code -- Admin needed
On Mon, Feb 10, 2014 at 6:51 PM, Maxim Kuvyrkov wrote:
>
> I want to admin GCC's GSoC this year.
>
> In the next several days I will be bugging past GCC GSoC admins and mentors
> to get an idea of what I'm getting myself into. Please send me a note if you
> haven't been a GSoC mentor in the past years, but want to try this year.

Thanks for doing this.

I encourage all GCC maintainers to sign up to be a mentor. You don't
have to actually mentor a student; it gives you the opportunity to
comment on proposals.

Ian