Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Peter Zijlstra
On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote:
> > Initial state: x == y == 0
> > 
> > T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
> > atomic_store_explicit(42, y, memory_order_relaxed);
> > if (r1 != 42)
> > atomic_store_explicit(r1, y, memory_order_relaxed);
> > 
> > T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
> > atomic_store_explicit(r2, x, memory_order_relaxed);
> 
> Intuitively, this is wrong because this lets the program take a step
> the abstract machine wouldn't do.  This is different to the sequential
> code that Peter posted because it uses atomics, and thus one can't
> easily assume that the difference is not observable.

Yeah, my bad for not being familiar with the atrocious crap C11 made of
atomics :/
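
For reference, the quoted litmus test can be replayed deterministically on one thread. This is only a sketch: it uses the ISO C11 argument order (object, value, order) rather than the order in the quoted mail, the function names are made up for illustration, and the "as written" form of T1 (a plain copy of x into y) is an assumption about the untransformed program, which is not quoted above. One particular interleaving is hard-coded; real threads are not involved.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;

static void reset(void)
{
    atomic_store_explicit(&x, 0, memory_order_relaxed);
    atomic_store_explicit(&y, 0, memory_order_relaxed);
}

/* T1 with the speculative store of 42; T2 runs between the speculative
 * store and the fix-up. Returns the value T2 read from y. */
static int replay_speculated(void)
{
    reset();
    int r1 = atomic_load_explicit(&x, memory_order_relaxed);
    assert(r1 == 0);                       /* initial state */
    atomic_store_explicit(&y, 42, memory_order_relaxed);

    /* T2 runs in the window opened by the speculative store. */
    int r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);

    if (r1 != 42)                          /* T1 fix-up, too late for T2 */
        atomic_store_explicit(&y, r1, memory_order_relaxed);
    return r2;
}

/* Assumed untransformed T1 (y = x) under the same schedule. */
static int replay_as_written(void)
{
    reset();
    int r1 = atomic_load_explicit(&x, memory_order_relaxed);

    int r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);

    atomic_store_explicit(&y, r1, memory_order_relaxed);
    return r2;
}
```

Under this schedule the transformed program lets T2 observe 42, a value the assumed untransformed program never stores, which is the observable difference being objected to.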



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Peter Zijlstra
On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> As near as I can tell, compiler writers hate the idea of prohibiting
> speculative-store optimizations because it requires them to introduce
> both control and data dependency tracking into their compilers.  Many of
> them seem to hate dependency tracking with a purple passion.  At least,
> such a hatred would go a long way towards explaining the incomplete
> and high-overhead implementations of memory_order_consume, the long
> and successful use of idioms based on the memory_order_consume pattern
> notwithstanding [*].  ;-)

Just tell them that because the hardware provides control dependencies
we actually use and rely on them.

Not that I expect they care too much what we do, given the current state
of things.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Will Deacon
On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > As near as I can tell, compiler writers hate the idea of prohibiting
> > speculative-store optimizations because it requires them to introduce
> > both control and data dependency tracking into their compilers.  Many of
> > them seem to hate dependency tracking with a purple passion.  At least,
> > such a hatred would go a long way towards explaining the incomplete
> > and high-overhead implementations of memory_order_consume, the long
> > and successful use of idioms based on the memory_order_consume pattern
> > notwithstanding [*].  ;-)
> 
> Just tell them that because the hardware provides control dependencies
> we actually use and rely on them.

s/control/address/ ?

Will


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Peter Zijlstra
On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > speculative-store optimizations because it requires them to introduce
> > > both control and data dependency tracking into their compilers.  Many of
> > > them seem to hate dependency tracking with a purple passion.  At least,
> > > such a hatred would go a long way towards explaining the incomplete
> > > and high-overhead implementations of memory_order_consume, the long
> > > and successful use of idioms based on the memory_order_consume pattern
> > > notwithstanding [*].  ;-)
> > 
> > Just tell them that because the hardware provides control dependencies
> > we actually use and rely on them.
> 
> s/control/address/ ?

Nope, control.

Since stores cannot be speculated and thus require linear control flow
history we can use it to order LOAD -> STORE when the LOAD is required
for the control flow decision and the STORE depends on the control flow
path.

Also see commit 18c03c61444a211237f3d4782353cb38dba795df to
Documentation/memory-barriers.txt
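
A minimal user-space sketch of the LOAD -> STORE pattern described above, written with C11 atomics. The struct, RB_SIZE and ring_put() are hypothetical names, not the perf code. Note the hedge: ISO C11 does not promise that a relaxed load followed by a dependent branch orders later stores; the sketch only shows the shape that the hardware guarantee is applied to, and the release store stands in for smp_wmb() plus the head update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 8u                      /* hypothetical capacity */

struct ring {
    _Atomic size_t head;                /* advanced by the producer */
    _Atomic size_t tail;                /* advanced by the consumer */
    int data[RB_SIZE];
};

static bool ring_put(struct ring *rb, int value)
{
    assert(rb != NULL);

    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    /* LOAD ->data_tail: feeds the branch below. */
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);

    if (head - tail >= RB_SIZE)         /* (A) the control dependency */
        return false;                   /* no room: no data store at all */

    /* STORE $data: only reached on the path chosen by the tail load. */
    rb->data[head % RB_SIZE] = value;

    /* (B) publish the data before the new head becomes visible. */
    atomic_store_explicit(&rb->head, head + 1, memory_order_release);
    return true;
}
```

The point of the shape is that the data store sits on a path selected by the tail load, so hardware that never makes speculative stores visible cannot perform it before that load has resolved.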

---
commit c7f2e3cd6c1f4932ccc4135d050eae3f7c7aef63
Author: Peter Zijlstra 
Date:   Mon Nov 25 11:49:10 2013 +0100

perf: Optimize ring-buffer write by depending on control dependencies

Remove a full barrier from the ring-buffer write path by relying on
a control dependency to order a LOAD -> STORE scenario.

Cc: "Paul E. McKenney" 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnx...@git.kernel.org
Signed-off-by: Ingo Molnar 

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index e8b168af135b..146a5792b1d2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -61,19 +61,20 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 *
 *   kernel                             user
 *
-*   READ ->data_tail                   READ ->data_head
-*   smp_mb()   (A)                     smp_rmb()   (C)
-*   WRITE $data                        READ $data
-*   smp_wmb()  (B)                     smp_mb()    (D)
-*   STORE ->data_head                  WRITE ->data_tail
+*   if (LOAD ->data_tail) {            LOAD ->data_head
+*                      (A)             smp_rmb()   (C)
+*      STORE $data                     LOAD $data
+*      smp_wmb()       (B)             smp_mb()    (D)
+*      STORE ->data_head               STORE ->data_tail
+*   }
 *
 * Where A pairs with D, and B pairs with C.
 *
-* I don't think A needs to be a full barrier because we won't in fact
-* write data until we see the store from userspace. So we simply don't
-* issue the data WRITE until we observe it. Be conservative for now.
+* In our case (A) is a control dependency that separates the load of
+* the ->data_tail and the stores of $data. In case ->data_tail
+* indicates there is no room in the buffer to store $data we do not.
 *
-* OTOH, D needs to be a full barrier since it separates the data READ
+* D needs to be a full barrier since it separates the data READ
 * from the tail WRITE.
 *
 * For B a WMB is sufficient since it separates two WRITEs, and for C
@@ -81,7 +82,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 *
 * See perf_output_begin().
 */
-   smp_wmb();
+   smp_wmb(); /* B, matches C */
rb->user_page->data_head = head;
 
/*
@@ -144,17 +145,26 @@ int perf_output_begin(struct perf_output_handle *handle,
if (!rb->overwrite &&
unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
goto fail;
+
+   /*
+* The above forms a control dependency barrier separating the
+* @tail load above from the data stores below. Since the @tail
+* load is required to compute the branch to fail below.
+*
+* A, matches D; the full memory barrier userspace SHOULD issue
+* after reading the data and before storing the new tail
+* position.
+*
+* See perf_output_put_handle().
+*/
+
head += size;
} while (local_cmpxchg(&rb->head, offset, head) != offset);
 
/*
-* Separate the userpage->tail read from the data stores below.
-* Matches the MB userspace SHOULD issue after reading the data
-* and before storing the new tail position.
-*
-* See perf_output_put_handle().
+* We rely o

Re: LLVM collaboration?

2014-02-10 Thread Diego Novillo
On Fri, Feb 7, 2014 at 5:07 PM, Renato Golin wrote:

> * GCC and LLVM collaboration / The Open Source Compiler Initiative
>
> With LLVM mature enough to feature as the default toolchain in some
> Unix distributions, and with the inherent (and profitable) share of
> solutions, ideas and code between the two, we need to start talking at
> a more profound level. There will always be problems that can't be
> included in any standard (language, extension, or machine-specific)
> and are intrinsic to the compilation infrastructure. For those, and
> other common problems, we need common solutions to at least both LLVM
> and GCC, but ideally any open source (and even closed source)
> toolchain. In this BoF session, we shall discuss to what extent this
> collaboration can take us, how we should start and what are the next
> steps to make this happen.

Looks good.  Registered.


Thanks. Diego.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Paul E. McKenney
On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > speculative-store optimizations because it requires them to introduce
> > > both control and data dependency tracking into their compilers.  Many of
> > > them seem to hate dependency tracking with a purple passion.  At least,
> > > such a hatred would go a long way towards explaining the incomplete
> > > and high-overhead implementations of memory_order_consume, the long
> > > and successful use of idioms based on the memory_order_consume pattern
> > > notwithstanding [*].  ;-)
> > 
> > Just tell them that because the hardware provides control dependencies
> > we actually use and rely on them.
> 
> s/control/address/ ?

Both are important, but as Peter's reply noted, it was control
dependencies under discussion.  Data dependencies (which include the
ARM/PowerPC notion of address dependencies) are called out by the standard
already, but control dependencies are not.  I am not all that satisfied
by current implementations of data dependencies, admittedly.  Should
be an interesting discussion.  ;-)

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Will Deacon
On Mon, Feb 10, 2014 at 03:04:43PM +, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> > On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
> > > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > > speculative-store optimizations because it requires them to introduce
> > > > both control and data dependency tracking into their compilers.  Many of
> > > > them seem to hate dependency tracking with a purple passion.  At least,
> > > > such a hatred would go a long way towards explaining the incomplete
> > > > and high-overhead implementations of memory_order_consume, the long
> > > > and successful use of idioms based on the memory_order_consume pattern
> > > > notwithstanding [*].  ;-)
> > > 
> > > Just tell them that because the hardware provides control dependencies
> > > we actually use and rely on them.
> > 
> > s/control/address/ ?
> 
> Both are important, but as Peter's reply noted, it was control
> dependencies under discussion.  Data dependencies (which include the
> ARM/PowerPC notion of address dependencies) are called out by the standard
> already, but control dependencies are not.  I am not all that satisfied
> by current implementations of data dependencies, admittedly.  Should
> be an interesting discussion.  ;-)

Ok, but since you can't use control dependencies to order LOAD -> LOAD, it's
a pretty big ask of the compiler to make use of them for things like
consume, where a data dependency will suffice for any combination of
accesses.
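
To make the LOAD -> LOAD case concrete, here is a small sketch of the consume-style idiom: the second load goes through the pointer returned by the first, so it carries a data (address) dependency. The names msg, slot, published, publish() and read_payload() are hypothetical. As far as I know, current compilers generally implement memory_order_consume by promoting it to acquire, so this shows the intended source pattern rather than what gets emitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct msg {
    int payload;
};

static struct msg slot;
static _Atomic(struct msg *) published;   /* NULL until publish() runs */

static void publish(int v)
{
    assert(v >= 0);                       /* -1 is the "nothing yet" value */
    slot.payload = v;                     /* initialise before publishing */
    atomic_store_explicit(&published, &slot, memory_order_release);
}

static int read_payload(void)
{
    /* First load: the pointer. */
    struct msg *p = atomic_load_explicit(&published, memory_order_consume);

    /* Second load: through that pointer, so its address depends on the
     * first load. A load that merely sat behind "if (flag)" would have
     * only a control dependency, which does not order LOAD -> LOAD. */
    return p ? p->payload : -1;
}
```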

Will


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Linus Torvalds
On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
>
> Intuitively, this is wrong because this lets the program take a step
> the abstract machine wouldn't do.  This is different to the sequential
> code that Peter posted because it uses atomics, and thus one can't
> easily assume that the difference is not observable.

Btw, what is the definition of "observable" for the atomics?

Because I'm hoping that it's not the same as for volatiles, where
"observable" is about the virtual machine itself, and as such volatile
accesses cannot be combined or optimized at all.

Now, I claim that atomic accesses cannot be done speculatively for
writes, and not re-done for reads (because the value could change),
but *combining* them would be possible and good.

For example, we often have multiple independent atomic accesses that
could certainly be combined: testing the individual bits of an atomic
value with helper functions, causing things like "load atomic, test
bit, load same atomic, test another bit". The two atomic loads could
be done as a single load without possibly changing semantics on a real
machine, but if "visibility" is defined in the same way it is for
"volatile", that wouldn't be a valid transformation. Right now we use
"volatile" semantics for these kinds of things, and they really can
hurt.
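
The "load atomic, test bit, load same atomic, test another bit" case might look like the sketch below. The flag names and helpers are invented for illustration. The second function is the hand-combined form, i.e. what one would like the compiler to be allowed to produce from the first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_READY 0x1u                   /* hypothetical flag bits */
#define FLAG_DIRTY 0x2u

static atomic_uint flags;

/* Helper in the style described: every call does its own atomic load. */
static bool test_flag(unsigned int bit)
{
    assert(bit != 0);
    return (atomic_load_explicit(&flags, memory_order_relaxed) & bit) != 0;
}

/* Up to two atomic loads of the same object, one per helper call. */
static bool ready_and_dirty_two_loads(void)
{
    return test_flag(FLAG_READY) && test_flag(FLAG_DIRTY);
}

/* One atomic load, both bits tested on the same snapshot. */
static bool ready_and_dirty_one_load(void)
{
    unsigned int v = atomic_load_explicit(&flags, memory_order_relaxed);
    return (v & FLAG_READY) != 0 && (v & FLAG_DIRTY) != 0;
}
```

Any result the one-load version returns is also a possible result of the two-load version (the case where nothing changed between the two loads), which is why the combination is harmless on a real machine.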

Same goes for multiple writes (possibly due to setting bits):
combining multiple accesses into a single one is generally fine, it's
*adding* write accesses speculatively that is broken by design.

At the same time, you can't combine atomic loads or stores infinitely
- "visibility" on a real machine definitely is about timeliness.
Removing all but the last write when there are multiple consecutive
writes is generally fine, even if you unroll a loop to generate those
writes. But if what remains is a loop, it might be a busy-loop
basically waiting for something, so it would be wrong ("untimely") to
hoist a store in a loop entirely past the end of the loop, or hoist a
load in a loop to before the loop.
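
A sketch of the busy-loop case, with a bound on the spin so it can be run without a second thread. The names ready and spin_until_ready() are hypothetical, as is the bound itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int ready;

/* Polls the flag up to max_spins times; true once the flag is seen set. */
static bool spin_until_ready(unsigned long max_spins)
{
    assert(max_spins > 0);
    for (unsigned long i = 0; i < max_spins; i++) {
        /* The load is repeated on every iteration. Hoisting it to before
         * the loop would turn this into a test of one stale value, so a
         * store made by another thread during the loop would never be
         * noticed. */
        if (atomic_load_explicit(&ready, memory_order_acquire))
            return true;
    }
    return false;
}
```

This is the "timeliness" side of the argument: merging adjacent loads is fine, but the per-iteration load is what lets the loop ever terminate on another thread's store.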

Does the standard allow for that kind of behavior?

  Linus


Conditional execution over emit_move_insn

2014-02-10 Thread Wojciech Migda
Hi,

I'd like to hardcode conditional execution of emit_move_insn based on the 
predicate checking that the address in the destination argument is non-NULL.
The platform supports conditional execution, but doesn't have explicitly 
defined conditional moves (target=tic6x).
I have already tried to find any look-alike pieces in the gcc code tree but 
without success - I am new here.
As for the background - I am trying to work around the bug I submitted 
(id=60123) before there's an official patch for it available.

I appreciate any help.

Thanks,

Wojciech


Re: Fwd: LLVM collaboration?

2014-02-10 Thread Jan Hubicka
> 1. There IS an unnecessary fence between GCC and LLVM.
> 
> License arguments are one reason why we can't share code as easily as
> we would like, but there is no argument against sharing ideas,
> cross-reporting bugs, helping each other implement a better
> compiler/linker/assembler/libraries just because of an artificial
> wall. We need to break this wall.
> 
> I rarely see GCC folks reporting bugs on our side, or people saying
> "we should check with the GCC folks" actually doing it. We're not
> contagious folks, you know. Talking to GCC engineers won't make me a
> lesser LLVM engineer, and vice-versa.

One practical experience I have with LLVM developers is sharing experiences
about getting Firefox to work with LTO with Rafael Espindola and I think it was
useful for both of us. I am definitely open to more discussion.

Let's try a specific topic that has been on my TODO list for some time.

I would like to make it possible for multiple compilers to be used to LTO a
single binary. As we are all making LTO more useful, I think it is a matter of
time until people start shipping LTO object files by default and users
end up feeding them into different compilers or incompatible versions of
the same compiler. We probably want to make this work, even though
cross-module optimization will not happen in this case.

The plugin interface in binutils seems to do its job well both for GCC and LLVM
and I hope that open64 and ICC will eventually join, too.

The trouble, however, is that one needs to pass an explicit --plugin argument
specifying the particular plugin to load, and so GCC ships with its own wrappers
(gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while LLVM does a similar thing.

It may be smoother if binutils was able to load multiple plugins at once and
grab plugins from system and user installed compilers without explicit --plugin
argument.

Binutils probably should also have a way to detect LTO object files and produce
a more useful diagnostic than it does now when there is no plugin claiming them.

There are some PRs filed on the topic
http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
but not much progress on them.

I wonder if we can get this designed and implemented.

On the other hand, GCC currently maintains a non-plugin path for LTO that is now
only used by the Darwin port due to the lack of a plugin-enabled LD there.  It seems
that the liblto used by Darwin is loosely compatible with the plugin API, but it
makes it harder to have different compilers share it (one has to LD_PRELOAD liblto
to a different one prior to executing the linker?)

I wonder, is there a chance to implement glue from the linker plugin API to
libLTO, or to add plugin support to the native Darwin tools?

Honza


Re: Google Summer of Code -- Admin needed

2014-02-10 Thread Maxim Kuvyrkov
On 6/02/2014, at 7:45 am, Moore, Catherine wrote:

> Hi All,
> 
> I acted as the Google Summer of Code Administrator in 2013 and I do not wish 
> to continue.
> 
> There is an upcoming deadline (February 14th) for organizations to submit
> their applications to the Google Summer of Code.  Is there anyone who would
> like to act as the gcc admin for 2014?
> I assume that folks would like to have the gcc project continue to 
> participate;  we need to find someone to submit the application and commit to 
> the admin duties.
> 
> The bulk of the work is organizational.  There are some web forms to fill 
> out, evaluations need to be completed, an irc meeting was required, plus 
> finding projects and mentors for the projects.
> 
> I hope someone will pick this up.

I want to admin GCC's GSoC this year.

In the next several days I will be bugging past GCC GSoC admins and mentors to 
get an idea of what I'm getting myself into.  Please send me a note if you 
haven't been a GSoC mentor in past years, but want to try this year.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org



Re: Google Summer of Code -- Admin needed

2014-02-10 Thread Ian Lance Taylor
On Mon, Feb 10, 2014 at 6:51 PM, Maxim Kuvyrkov wrote:
>
> I want to admin GCC's GSoC this year.
>
> In the next several days I will be bugging past GCC GSoC admins and mentors 
> to get an idea of what I'm getting myself into.  Please send me a note if you 
> haven't been a GSoC mentor in past years, but want to try this year.

Thanks for doing this.

I encourage all GCC maintainers to sign up to be a mentor.  You don't
have to actually mentor a student; it gives you the opportunity to
comment on proposals.

Ian