Re: Vectorizer Pragmas

2014-02-17 Thread Renato Golin
On 16 February 2014 23:44, Tim Prince  wrote:
> I don't think many people want to use both OpenMP 4 and older Intel
> directives together.

I have less and less incentive to use anything other than OMP4,
Cilk and whatever. I think we should be able to map all our internal
needs to those pragmas.

On the other hand, if you guys have any cross-discussion with Intel
folks about it, I'd love to hear. Since our support for those
directives is a bit behind, it would be good not to duplicate the
efforts in the long run.

Thanks!
--renato


Re: TYPE_BINFO and canonical types at LTO

2014-02-17 Thread Richard Biener
On Mon, 17 Feb 2014, Jan Hubicka wrote:

> > On Fri, 14 Feb 2014, Jan Hubicka wrote:
> > 
> > > > > This smells bad, since it is given a canonical type that is after the
> > > > > structural equivalency merging that ignores BINFOs, so it may be 
> > > > > completely
> > > > > different class with completely different bases than the original.  
> > > > > Bases are
> > > > > structuraly merged, too and may be exchanged for normal fields because
> > > > > DECL_ARTIFICIAL (that separate bases and fields) does not seem to be 
> > > > > part of
> > > > > the canonical type definition in LTO.
> > > > 
> > > > Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is 
> > > > broken
> > > > by considering all fields during that merging?
> > > 
> > > To make the code work with LTO, one cannot merge
> > > struct B { struct A a; };
> > > struct B : A {};
> > > 
> > > these IMO differ only by the DECL_ARTIFICIAL flag on the fields.
> > 
> > "The code" == that BINFO walk?  Is that because we walk a completely
> 
> Yes.
> 
> > unrelated BINFO chain?  I'd say we should have merged its types
> > so that difference shouldn't matter.
> > 
> > Hopefully ;)
> 
> I am trying to make the point that it will matter.  Here is the complete testcase:
> 
> struct A {int a;};
> struct C:A {};
> struct B {struct A a;};
> struct C *p2;
> struct B *p1;
> int
> t()
> {
>   p1->a.a = 2;
>   return p2->a;
> }
> 
> With patch I get:
> 
> Index: lto/lto.c
> ===
> --- lto/lto.c   (revision 20)
> +++ lto/lto.c   (working copy)
> @@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.
>  #include "data-streamer.h"
>  #include "context.h"
>  #include "pass_manager.h"
> +#include "print-tree.h"
>  
>  
>  /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
> @@ -619,6 +621,15 @@ gimple_canonical_type_eq (const void *p1
>  {
>const_tree t1 = (const_tree) p1;
>const_tree t2 = (const_tree) p2;
> +  if (gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1),
> + CONST_CAST_TREE (t2))
> +  && TREE_CODE (CONST_CAST_TREE (t1)) == RECORD_TYPE)
> + {
> +   debug_tree (CONST_CAST_TREE (t1));
> +   fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO (t1))->length());
> +   debug_tree (CONST_CAST_TREE (t2));
> +   fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO (t2))->length());
> + }
>return gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1),
>   CONST_CAST_TREE (t2));
>  }
> 
> [debug_tree output abbreviated here; it was badly mangled in transit.
> The two RECORD_TYPEs compare as canonically compatible.  The first dump
> is struct B (t.C line 3, already assigned a canonical type) and reports
> bases:0; the second is struct C (t.C line 2, still marked "structural
> equality") and reports bases:1.]
> 
> So structure B prevails over structure C.  One has bases to walk, the 
> other doesn't.  If that BINFO walk in alias.c (on canonical types) did 
> something useful, we would have a wrong-code bug.

Yeah, ok.  But we treat those types (B and C) TBAA equivalent because
structurally they are the same ;)  Luckily C has a "proper" field
for its base (proper means that offset and size are correct as well
as the type).  It indeed has DECL_ARTIFICIAL set and yes, we treat
those as "real" fields when doing the structural comparison.

More interesting is of course when we can re-use tail-padding in
one but not the other (works as expected - not merged).

struct A { A (); short x; bool a;};
struct C:A { bool b; };
struct B {struct A a; bool b;};
struct C *p2;
struct B *p1;
int
t()
{
  p1->a.a = 2;
  return p2->a;
}

> Yes, zero sized classes are those having no fields (but other stuff, 
> type decls, bases etc.)

Yeah, but TBAA obviously doesn't care about type decls and bases.

Richard.


Re: Need help: Is a VAR_DECL type builtin or not?

2014-02-17 Thread Dominik Vogt
On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote:
> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt  wrote:
> > Given a specific VAR_DECL tree node, I need to find out whether
> > its type is built in or not.  Up to now I have
> >
> >   tree tn = TYPE_NAME (TREE_TYPE (var_decl));
> >   if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
> > {
> >   ...
> > }
> >
> > This if-condition is true for both,
> >
> >   int x;
> >   const int x;
> >   ...
> >
> > and
> >
> >   typedef int i_t;
> >   i_t x;
> >   const i_t x;
> >   ...
> >
> > I need to weed out the class of VAR_DECLs that directly use built
> > in types.
> 
> Try DECL_IS_BUILTIN.  But I question how you define "builtin" here?

Well, actually I'm working on the variable output function in
godump.c.  At the moment, if the code comes across

  typedef char c_t;
  char c1;
  c_t c2;

it emits

  type _c_t byte
  var c1 byte
  var c2 byte

This is fine for c1, but for c2 it should really use the type:

  var c2 _c_t

So the rule I'm trying to implement is:

  Given a Tree node that is a VAR_DECL, if its type is an "alias"
  (defined with typedef/union/struct/class etc.), use the name of
  the alias, otherwise resolve the type recursively until only
  types built into the language are left.

It's really only about the underlying data types (int, float,
_Complex etc.), not about storage classes, pointers, attributes,
qualifiers etc.

Well, since godump.c already caches all declarations it has come
across, I could assume that these declarations are not built-in
and use that in the "rule" above.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: Need help: Is a VAR_DECL type builtin or not?

2014-02-17 Thread Richard Biener
On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt  wrote:
> On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote:
>> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt  
>> wrote:
>> > Given a specific VAR_DECL tree node, I need to find out whether
>> > its type is built in or not.  Up to now I have
>> >
>> >   tree tn = TYPE_NAME (TREE_TYPE (var_decl));
>> >   if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
>> > {
>> >   ...
>> > }
>> >
>> > This if-condition is true for both,
>> >
>> >   int x;
>> >   const int x;
>> >   ...
>> >
>> > and
>> >
>> >   typedef int i_t;
>> >   i_t x;
>> >   const i_t x;
>> >   ...
>> >
>> > I need to weed out the class of VAR_DECLs that directly use built
>> > in types.
>>
>> Try DECL_IS_BUILTIN.  But I question how you define "builtin" here?
>
> Well, actually I'm working on the variable output function in
> godump.c.  At the moment, if the code comes across
>
>   typedef char c_t;
>   char c1;
>   c_t c2;
>
> it emits
>
>   type _c_t byte
>   var c1 byte
>   var c2 byte
>
> This is fine for c1, but for c2 it should really use the type:
>
>   var c2 _c_t
>
> So the rule I'm trying to implement is:
>
>   Given a Tree node that is a VAR_DECL, if its type is an "alias"
>   (defined with typedef/union/struct/class etc.), use the name of
>   the alias, otherwise resolve the type recursively until only
>   types built into the language are left.
>
> It's really only about the underlying data types (int, float,
> _Complex etc.), not about storage classes, pointers, attributes,
> qualifiers etc.
>
> Well, since godump.c already caches all declarations it has come
> across, I could assume that these declarations are not built-in
> and use that in the "rule" above.

Not sure what Go presents us as location info, but DECL_IS_BUILTIN
checks whether the line on which the type was declared is something
"impossible" (reserved, and supposed to be used for all types that do
not have to be declared).

Richard.

> Ciao
>
> Dominik ^_^  ^_^
>
> --
>
> Dominik Vogt
> IBM Germany
>


Re: Vectorizer Pragmas

2014-02-17 Thread Tim Prince


On 2/17/2014 4:42 AM, Renato Golin wrote:

> On 16 February 2014 23:44, Tim Prince wrote:
>
>> I don't think many people want to use both OpenMP 4 and older Intel
>> directives together.
>
> I have less and less incentive to use anything other than OMP4,
> Cilk and whatever. I think we should be able to map all our internal
> needs to those pragmas.
>
> On the other hand, if you guys have any cross-discussion with Intel
> folks about it, I'd love to hear. Since our support for those
> directives is a bit behind, it would be good not to duplicate the
> efforts in the long run.


I'm continuing discussions with former Intel colleagues.  If you are 
asking for insight into how Intel priorities vary over time, I don't 
expect much, unless the next beta compiler provides some inferences.  
They have talked about implementing all of OpenMP 4.0 except user 
defined reduction this year.  That would imply more activity in that 
area than on cilkplus, although some fixes have come in the latter.  On 
the other hand I had an issue on omp simd reduction(max: ) closed with 
the decision "will not be fixed."
I have an icc problem report in on fixing omp simd safelen so it is more 
like the standard and less like the obsolete pragma simd vectorlength.  
Also, I have some problem reports active attempting to get clarification 
of their omp target implementation.


You may have noticed that omp parallel for simd in current Intel 
compilers can be used for combined thread and simd parallelism, 
including the case where the outer loop is parallelizable and 
vectorizable but the inner one is not.


--
Tim Prince



Re: Vectorizer Pragmas

2014-02-17 Thread Renato Golin
On 17 February 2014 14:47, Tim Prince  wrote:
> I'm continuing discussions with former Intel colleagues.  If you are asking
> for insight into how Intel priorities vary over time, I don't expect much,
> unless the next beta compiler provides some inferences.  They have talked
> about implementing all of OpenMP 4.0 except user defined reduction this
> year.  That would imply more activity in that area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be
purely temporary and not endorsed in any way.


> although some fixes have come in the latter.  On the other hand I had an
> issue on omp simd reduction(max: ) closed with the decision "will not be
> fixed."

We still haven't got pragmas for induction/reduction logic, so I'm not
too worried about them.


> I have an icc problem report in on fixing omp simd safelen so it is more
> like the standard and less like the obsolete pragma simd vectorlength.

Our width metadata is slightly different in that it means "try to use
that length", rather than "it's safe to use that length"; this is why
I'm holding off on using safelen for the moment.


> Also, I have some problem reports active attempting to get clarification of
> their omp target implementation.

Same here... RTFM is not enough in this case. ;)


> You may have noticed that omp parallel for simd in current Intel compilers
> can be used for combined thread and simd parallelism, including the case
> where the outer loop is parallelizable and vectorizable but the inner one is
> not.

That's my fear of going with omp simd directly. I don't want to be
throwing threads all over the place when all I really want is vector
code.

For the time being, my proposal is to use the legacy pragmas:
vector/novector, unroll/nounroll and simd vectorlength, which map nicely
to the metadata we already have and don't incur OpenMP overhead. Later
on, if OpenMP ends up with simple non-threaded pragmas, we should use
those and deprecate the legacy ones.

If GCC is trying to do the same thing regarding non-threaded vector
code, I'd be glad to be involved in the discussion. Some LLVM folks
think this should be an OpenMP discussion; I personally think it's
pushing the boundaries a bit too much on an inherently threaded
library extension.

cheers,
--renato


Re: [RFC] Offloading Support in libgomp

2014-02-17 Thread Ilya Verbin
On 14 Feb 16:43, Jakub Jelinek wrote:
> So, perhaps we should just stop for now oring the copyfrom in and just use
> the copyfrom from the very first mapping only, and wait for what the committee
> actually agrees on.
> 
>   Jakub

Like this?

@@ -171,11 +171,16 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
"[%p..%p) is already mapped",
(void *) newn->host_start, (void *) newn->host_end,
(void *) oldn->host_start, (void *) oldn->host_end);
+#if 0
+  /* FIXME: Remove this when OpenMP 4.0 will be standardized.  Currently it's
+ unclear regarding overwriting copy_from for the existing mapping.
+ See http://gcc.gnu.org/ml/gcc/2014-02/msg00208.html for details.  */
   if (((kind & 7) == 2 || (kind & 7) == 3)
   && !oldn->copy_from
   && oldn->host_start == newn->host_start
   && oldn->host_end == newn->host_end)
 oldn->copy_from = true;
+#endif
   oldn->refcount++;
 }

  -- Ilya


Re: [RFC] Offloading Support in libgomp

2014-02-17 Thread Jakub Jelinek
On Mon, Feb 17, 2014 at 07:59:16PM +0400, Ilya Verbin wrote:
> On 14 Feb 16:43, Jakub Jelinek wrote:
> > So, perhaps we should just stop for now oring the copyfrom in and just use
> > the copyfrom from the very first mapping only, and wait for what the 
> > committee
> > actually agrees on.
> > 
> > Jakub
> 
> Like this?
> 
> @@ -171,11 +171,16 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
>   "[%p..%p) is already mapped",
>   (void *) newn->host_start, (void *) newn->host_end,
>   (void *) oldn->host_start, (void *) oldn->host_end);
> +#if 0
> +  /* FIXME: Remove this when OpenMP 4.0 will be standardized.  Currently it's
> + unclear regarding overwriting copy_from for the existing mapping.
> + See http://gcc.gnu.org/ml/gcc/2014-02/msg00208.html for details.  */
>if (((kind & 7) == 2 || (kind & 7) == 3)
>&& !oldn->copy_from
>&& oldn->host_start == newn->host_start
>&& oldn->host_end == newn->host_end)
>  oldn->copy_from = true;
> +#endif
>oldn->refcount++;
>  }

Well, OpenMP 4.0 is a released standard, just in some cases ambiguous or
buggy.  I'd just remove the code rather than putting it into #if 0, patch
preapproved.  It will stay in the SVN history...

Jakub


Re: Need help: Is a VAR_DECL type builtin or not?

2014-02-17 Thread Ian Lance Taylor
On Mon, Feb 17, 2014 at 5:28 AM, Richard Biener
 wrote:
> On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt  wrote:
>> On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote:
>>> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt  
>>> wrote:
>>> > Given a specific VAR_DECL tree node, I need to find out whether
>>> > its type is built in or not.  Up to now I have
>>> >
>>> >   tree tn = TYPE_NAME (TREE_TYPE (var_decl));
>>> >   if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
>>> > {
>>> >   ...
>>> > }
>>> >
>>> > This if-condition is true for both,
>>> >
>>> >   int x;
>>> >   const int x;
>>> >   ...
>>> >
>>> > and
>>> >
>>> >   typedef int i_t;
>>> >   i_t x;
>>> >   const i_t x;
>>> >   ...
>>> >
>>> > I need to weed out the class of VAR_DECLs that directly use built
>>> > in types.
>>>
>>> Try DECL_IS_BUILTIN.  But I question how you define "builtin" here?
>>
>> Well, actually I'm working on the variable output function in
>> godump.c.  At the moment, if the code comes across
>>
>>   typedef char c_t;
>>   char c1;
>>   c_t c2;
>>
>> it emits
>>
>>   type _c_t byte
>>   var c1 byte
>>   var c2 byte
>>
>> This is fine for c1, but for c2 it should really use the type:
>>
>>   var c2 _c_t
>>
>> So the rule I'm trying to implement is:
>>
>>   Given a Tree node that is a VAR_DECL, if its type is an "alias"
>>   (defined with typedef/union/struct/class etc.), use the name of
>>   the alias, otherwise resolve the type recursively until only
>>   types built into the language are left.
>>
>> It's really only about the underlying data types (int, float,
>> _Complex etc.), not about storage classes, pointers, attributes,
>> qualifiers etc.
>>
>> Well, since godump.c already caches all declarations it has come
>> across, I could assume that these declarations are not built-in
>> and use that in the "rule" above.
>
> Not sure what Go presents us as location info, but DECL_IS_BUILTIN
> checks whether the line on which the type was declared is something
> "impossible" (reserved, and supposed to be used for all types that do
> not have to be declared).

godump.c is actually not used by the Go frontend.  The purpose of
godump.c is to read C header files and dump them in a Go
representation.  It's used when building the Go library, to get Go
versions of system structures like struct stat.

I'm not quite sure what Dominik is after.  For system structures using
the basic type, the underlying type of a typedef, is normally what you
want.  But to answer the question as stated, I think I would look at
functions like is_naming_typedef_decl in dwarf2out.c, since this
sounds like the kind of question that debug info needs to sort out.

Ian


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> > You need volatile semantics to force the compiler to ignore any proofs
> > it might otherwise attempt to construct.  Hence all the ACCESS_ONCE()
> > calls in my email to Torvald.  (Hopefully I translated your example
> > reasonably.)
> 
> My brain gave out for today; but it did appear to have the right
> structure.

I can relate.  ;-)

> I would prefer it C11 would not require the volatile casts. It should
> simply _never_ speculate with atomic writes, volatile or not.

I agree with not needing volatiles to prevent speculated writes.  However,
they will sometimes be needed to prevent excessive load/store combining.
The compiler doesn't have the runtime feedback mechanisms that the
hardware has, and thus will need help from the developer from time
to time.

Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics
until the compiler has learned to be sufficiently conservative in its
load-store combining decisions.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Joseph S. Myers
On Sat, 15 Feb 2014, Torvald Riegel wrote:

> glibc is a counterexample that comes to mind, although it's a smaller
> code base.  (It's currently not using C11 atomics, but transitioning
> there makes sense, and some thing I want to get to eventually.)

glibc is using C11 atomics (GCC builtins rather than _Atomic /
<stdatomic.h>, but using __atomic_* with an explicitly specified memory
model rather than the older __sync_*) on AArch64, plus in certain cases
on ARM and MIPS.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Will Deacon
On Mon, Feb 17, 2014 at 06:59:31PM +, Joseph S. Myers wrote:
> On Sat, 15 Feb 2014, Torvald Riegel wrote:
> 
> > glibc is a counterexample that comes to mind, although it's a smaller
> > code base.  (It's currently not using C11 atomics, but transitioning
> > there makes sense, and some thing I want to get to eventually.)
> 
> glibc is using C11 atomics (GCC builtins rather than _Atomic /
> <stdatomic.h>, but using __atomic_* with an explicitly specified
> memory model rather than the older __sync_*) on AArch64, plus in
> certain cases on ARM and MIPS.

Hmm, actually that results in a change in behaviour for the __sync_*
primitives on AArch64. The documentation for those states that:

  `In most cases, these built-in functions are considered a full barrier. That
  is, no memory operand is moved across the operation, either forward or
  backward. Further, instructions are issued as necessary to prevent the
  processor from speculating loads across the operation and from queuing stores
  after the operation.'

which is stronger than simply mapping them to memory_model_seq_cst, which
seems to be what the AArch64 compiler is doing (so you get acquire + release
instead of a full fence).

Will


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 18:59 +, Joseph S. Myers wrote:
> On Sat, 15 Feb 2014, Torvald Riegel wrote:
> 
> > glibc is a counterexample that comes to mind, although it's a smaller
> > code base.  (It's currently not using C11 atomics, but transitioning
> > there makes sense, and some thing I want to get to eventually.)
> 
> glibc is using C11 atomics (GCC builtins rather than _Atomic /
> <stdatomic.h>, but using __atomic_* with an explicitly specified
> memory model rather than the older __sync_*) on AArch64, plus in
> certain cases on ARM and MIPS.

I think the major steps remaining are moving the other architectures
over and rechecking concurrent code (e.g., the code that I have seen
was either asm variants (e.g., on x86) or built before C11; ARM
pthread_once was lacking memory barriers (see the "pthread_once
unification" patches I posted)).  We also need to move towards using
relaxed-MO atomic loads instead of plain loads.





Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote:
> On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel  wrote:
> >
> > I think a major benefit of C11's memory model is that it gives a
> > *precise* specification for how a compiler is allowed to optimize.
> 
> Clearly it does *not*. This whole discussion is proof of that. It's
> not at all clear,

It might not be an easy-to-understand specification, but as far as I'm
aware it is precise.  The Cambridge group's formalization certainly is
precise.  From that, one can derive (together with the usual rules for
as-if etc.) what a compiler is allowed to do (assuming that the standard
is indeed precise).  My replies in this discussion have been based on
reasoning about the standard, and not secret knowledge (with the
exception of no-out-of-thin-air, which is required in the standard's
prose but not yet formalized).

I agree that I'm using the formalization as a kind of placeholder for
the standard's prose (which isn't all that easy to follow for me
either), but I guess there's no way around an ISO standard using prose.

If you see a case in which the standard isn't precise, please bring it
up or open a C++ CWG issue for it.

> and the standard apparently is at least debatably
> allowing things that shouldn't be allowed.

Which example do you have in mind here?  Haven't we resolved all the
debated examples, or did I miss any?

> It's also a whole lot more
> complicated than "volatile", so the likelihood of a compiler writer
> actually getting it right - even if the standard does - is lower.

It's not easy, that's for sure, but none of the high-performance
alternatives are easy either.  There are testing tools out there based
on the formalization of the model, and we've found bugs with them.

And the alternative of using something not specified by the standard is
even worse, I think, because then you have to guess what a compiler
might do, without having any constraints; IOW, one is resorting to "no
sane compiler would do that", and that doesn't seem very robust
either.




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel  wrote:
>
> Which example do you have in mind here?  Haven't we resolved all the
> debated examples, or did I miss any?

Well, Paul seems to still think that the standard possibly allows
speculative writes or possibly value speculation in ways that break
the hardware-guaranteed orderings.

And personally, I can't read standards paperwork. It is invariably
written in some basically impossible-to-understand lawyeristic mode,
and then it is read by people (compiler writers) that intentionally
try to mis-use the words and do language-lawyering ("that depends on
what the meaning of 'is' is"). The whole "lvalue vs rvalue expression
vs 'what is a volatile access'" thing for C++ was/is a great example
of that.

So quite frankly, as a result I refuse to have anything to do with the
process directly.

 Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Mon, Feb 17, 2014 at 08:55:47PM +0100, Torvald Riegel wrote:
> On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote:
> > On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel  wrote:
> > >
> > > I think a major benefit of C11's memory model is that it gives a
> > > *precise* specification for how a compiler is allowed to optimize.
> > 
> > Clearly it does *not*. This whole discussion is proof of that. It's
> > not at all clear,
> 
> It might not be an easy-to-understand specification, but as far as I'm
> aware it is precise.  The Cambridge group's formalization certainly is
> precise.  From that, one can derive (together with the usual rules for
> as-if etc.) what a compiler is allowed to do (assuming that the standard
> is indeed precise).  My replies in this discussion have been based on
> reasoning about the standard, and not secret knowledge (with the
> exception of no-out-of-thin-air, which is required in the standard's
> prose but not yet formalized).
> 
> I agree that I'm using the formalization as a kind of placeholder for
> the standard's prose (which isn't all that easy to follow for me
> either), but I guess there's no way around an ISO standard using prose.
> 
> If you see a case in which the standard isn't precise, please bring it
> up or open a C++ CWG issue for it.

I suggest that I go through the Linux kernel's requirements for atomics
and memory barriers and see how they map to C11 atomics.  With that done,
we would have very specific examples to go over.  Without that done, the
discussion won't converge very well.

Seem reasonable?

Thanx, Paul

> > and the standard apparently is at least debatably
> > allowing things that shouldn't be allowed.
> 
> Which example do you have in mind here?  Haven't we resolved all the
> debated examples, or did I miss any?
> 
> > It's also a whole lot more
> > complicated than "volatile", so the likelihood of a compiler writer
> > actually getting it right - even if the standard does - is lower.
> 
> It's not easy, that's for sure, but none of the high-performance
> alternatives are easy either.  There are testing tools out there based
> on the formalization of the model, and we've found bugs with them.
> 
> And the alternative of using something not specified by the standard is
> even worse, I think, because then you have to guess what a compiler
> might do, without having any constraints; IOW, one is resorting to "no
> sane compiler would do that", and that doesn't seem very robust
> either.
> 
> 



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Richard Biener
On February 17, 2014 7:18:15 PM GMT+01:00, "Paul E. McKenney" 
 wrote:
>On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote:
>> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
>> > You need volatile semantics to force the compiler to ignore any
>proofs
>> > it might otherwise attempt to construct.  Hence all the
>ACCESS_ONCE()
>> > calls in my email to Torvald.  (Hopefully I translated your example
>> > reasonably.)
>> 
>> My brain gave out for today; but it did appear to have the right
>> structure.
>
>I can relate.  ;-)
>
>> I would prefer it C11 would not require the volatile casts. It should
>> simply _never_ speculate with atomic writes, volatile or not.
>
>I agree with not needing volatiles to prevent speculated writes. 
>However,
>they will sometimes be needed to prevent excessive load/store
>combining.
>The compiler doesn't have the runtime feedback mechanisms that the
>hardware has, and thus will need help from the developer from time
>to time.
>
>Or maybe the Linux kernel simply waits to transition to C11 relaxed
>atomics
>until the compiler has learned to be sufficiently conservative in its
>load-store combining decisions.

Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure 
we'll eventually add something. But if testing coverage is zero outside then 
surely things get worse, not better with time.

Richard.

>   Thanx, Paul




FreeBSD users of gcc

2014-02-17 Thread Loren James Rittle
Greetings,

I am the named maintainer of the freebsd port.  I have been for
approximately twelve years; although I haven't been very active the
last four years.

The last major work I put into the freebsd port was at the end of
2009.  I have reviewed others' patches since then; but it really
hasn't required anything major since David O'Brien and I did
foundational work in the early 200Xs (which itself was based on many
others' foundations).  Gerald Pfeifer has also done much to keep the
port in a good shape.  (I also don't want to ignore the many patches
that came from members of the FreeBSD core team and other FreeBSD
users.)

To complicate matters, I haven't been using FreeBSD on my primary
desktop or otherwise since early 2011.

FreeBSD is listed as a tier one platform.  Therefore, I am looking for
someone to whom both the GCC steering committee and I would be willing
to hand over the reins before I drop my officially-listed
maintainership.

The expected person will likely already have Write After Approval status.

Please contact me directly, if you are qualified and interested in
becoming the freebsd OS port maintainer.

Regards,
Loren


Re: TYPE_BINFO and canonical types at LTO

2014-02-17 Thread Jan Hubicka
> 
> Yeah, ok.  But we treat those types (B and C) TBAA equivalent because
> structurally they are the same ;)  Luckily C has a "proper" field
> for its base (proper means that offset and size are correct as well
> as the type).  It indeed has DECL_ARTIFICIAL set and yes, we treat
> those as "real" fields when doing the structural comparison.

Yep, the difference is that depending on which of the two wins, we will end up
walking the BINFO or not.  So we should not depend on the BINFO walk for
correctness.
> 
> More interesting is of course when we can re-use tail-padding in
> one but not the other (works as expected - not merged).

Yep.
> 
> struct A { A (); short x; bool a;};
> struct C:A { bool b; };
> struct B {struct A a; bool b;};
> struct C *p2;
> struct B *p1;
> int
> t()
> {
>   p1->a.a = 2;
>   return p2->a;
> }
> 
> > Yes, zero sized classes are those having no fields (but other stuff, 
> > type decls, bases etc.)
> 
> Yeah, but TBAA obviously doesn't care about type decls and bases.

So I guess the conclussion is that the BINFO walk in alias.c is pointless?

Concerning the merging details and LTO aliasing, I think for 4.10 we should
make C++ compute mangled names of types (i.e. call DECL_ASSEMBLER_NAME on
the associated TYPE_DECL and explicitly mark that the type is governed by
the ODR) and then we can do merging driven by the ODR rule.

Non-ODR types born from other frontends will then need to be made to alias
all the ODR variants; that can be done by storing them into the current
canonical type hash.
(I wonder if we want to support cross language aliasing for non-POD?)

I also think we want an explicit representation of types known to be local to
the compilation unit - anonymous namespaces in C/C++, types defined within
function bodies in C, and god knows what in Ada/Fortran/Java.

Honza
> 
> Richard.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 12:23 -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 08:55:47PM +0100, Torvald Riegel wrote:
> > On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote:
> > > On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel  
> > > wrote:
> > > >
> > > > I think a major benefit of C11's memory model is that it gives a
> > > > *precise* specification for how a compiler is allowed to optimize.
> > > 
> > > Clearly it does *not*. This whole discussion is proof of that. It's
> > > not at all clear,
> > 
> > It might not be an easy-to-understand specification, but as far as I'm
> > aware it is precise.  The Cambridge group's formalization certainly is
> > precise.  From that, one can derive (together with the usual rules for
> > as-if etc.) what a compiler is allowed to do (assuming that the standard
> > is indeed precise).  My replies in this discussion have been based on
> > reasoning about the standard, and not secret knowledge (with the
> > exception of no-out-of-thin-air, which is required in the standard's
> > prose but not yet formalized).
> > 
> > I agree that I'm using the formalization as a kind of placeholder for
> > the standard's prose (which isn't all that easy to follow for me
> > either), but I guess there's no way around an ISO standard using prose.
> > 
> > If you see a case in which the standard isn't precise, please bring it
> > up or open a C++ CWG issue for it.
> 
> I suggest that I go through the Linux kernel's requirements for atomics
> and memory barriers and see how they map to C11 atomics.  With that done,
> we would have very specific examples to go over.  Without that done, the
> discussion won't converge very well.
> 
> Seem reasonable?

Sounds good!



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel  wrote:
> >
> > Which example do you have in mind here?  Haven't we resolved all the
> > debated examples, or did I miss any?
> 
> Well, Paul seems to still think that the standard possibly allows
> speculative writes or possibly value speculation in ways that break
> the hardware-guaranteed orderings.

That's true, I just didn't see any specific examples so far.

> And personally, I can't read standards paperwork. It is invariably
> written in some basically impossible-to-understand lawyeristic mode,

Yeah, it's not the most intuitive form for things like the memory model.

> and then it is read by people (compiler writers) that intentionally
> try to mis-use the words and do language-lawyering ("that depends on
> what the meaning of 'is' is").

That assumption about people working on compilers is a little too broad,
don't you think?

I think that it is important to stick to a specification, in the same
way that one wouldn't expect a program with undefined behavior to make
any sense, magically, in cases where stuff is undefined.

However, that of course doesn't include trying to exploit weasel-wording
(BTW, both users and compiler writers try to do it).  IMHO,
weasel-wording in a standard is a problem in itself even if not
exploited, and often it indicates that there is a real issue.  There
might be reasons to have weasel-wording (e.g., because there's no known
better way to express it like in case of the not really precise
no-out-of-thin-air rule today), but nonetheless those aren't ideal.

> The whole "lvalue vs rvalue expression
> vs 'what is a volatile access'" thing for C++ was/is a great example
> of that.

I'm not aware of the details of this.

> So quite frankly, as a result I refuse to have anything to do with the
> process directly.

That's unfortunate.  Then please work with somebody that isn't
uncomfortable with participating directly in the process.  But be
warned, it may very well be a person working on compilers :)

Have you looked at the formalization of the model by Batty et al.?  The
overview of this is prose, but the formalized model itself is all formal
relations and logic.  So there should be no language-lawyering issues
with that form.  (For me, the formalized model is much easier to reason
about.)



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel  wrote:
> On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote:
>> and then it is read by people (compiler writers) that intentionally
>> try to mis-use the words and do language-lawyering ("that depends on
>> what the meaning of 'is' is").
>
> That assumption about people working on compilers is a little too broad,
> don't you think?

Let's just say that *some* are that way, and those are the ones that I
end up butting heads with.

The sane ones I never have to argue with - point them at a bug, and
they just say "yup, bug". The insane ones say "we don't need to fix
that, because if you read this copy of the standards that have been
translated to chinese and back, it clearly says that this is
acceptable".

>> The whole "lvalue vs rvalue expression
>> vs 'what is a volatile access'" thing for C++ was/is a great example
>> of that.
>
> I'm not aware of the details of this.

The argument was that an lvalue doesn't actually "access" the memory
(an rvalue does), so this:

   volatile int *p = ...;

   *p;

doesn't need to generate a load from memory, because "*p" is still an
lvalue (since you could assign things to it).

This isn't an issue in C, because in C, expression statements are
always rvalues, but C++ changed that. The people involved with the C++
standards have generally been totally clueless about their subtle
changes.

I may have misstated something, but basically some C++ people tried
very hard to make "volatile" useless.

We had other issues too. Like C compiler people who felt that the
type-based aliasing should always override anything else, even if the
variable accessed (through different types) was statically clearly
aliasing and used the exact same pointer. That made it impossible to
do a syntactically clean model of "this aliases", since the _only_
exception to the type-based aliasing rule was to generate a union for
every possible access pairing.

We turned off type-based aliasing (as I've mentioned before, I think
it's a fundamentally broken feature to begin with, and a horrible
horrible hack that adds no value for anybody but the HPC people).

Gcc eventually ended up having some sane syntax for overriding it, but
by then I was too disgusted with the people involved to even care.

   Linus
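The "sane syntax for overriding it" that gcc eventually grew is, presumably, the `may_alias` type attribute (alongside the `-fno-strict-aliasing` flag the kernel builds with). A minimal sketch of how it is used; the function and type names here are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit integer type exempt from type-based alias analysis: accesses
 * through it may alias objects of any type, just as accesses through a
 * char pointer may. */
typedef uint32_t __attribute__((may_alias)) u32a;

/* Reinterpret a float's bit pattern through an integer pointer.  Without
 * may_alias (or -fno-strict-aliasing) this access would violate the
 * strict aliasing rules and could be miscompiled at -O2. */
uint32_t float_bits(const float *f)
{
    return *(const u32a *)f;
}
```

The point is that the exemption is attached to the type, so no union needs to be declared for every access pairing.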


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote:
> On Sat, Feb 15, 2014 at 9:30 AM, Torvald Riegel  wrote:
> >
> > I think the example is easy to misunderstand, because the context isn't
> > clear.  Therefore, let me first try to clarify the background.
> >
> > (1) The abstract machine does not write speculatively.
> > (2) Emitting a branch instruction and executing a branch at runtime is
> > not part of the specified behavior of the abstract machine.  Of course,
> > the abstract machine performs conditional execution, but that just
> > specifies the output / side effects that it must produce (e.g., volatile
> > stores) -- not with which hardware instructions it is producing this.
> > (3) A compiled program must produce the same output as if executed by
> > the abstract machine.
> 
> Ok, I'm fine with that.
> 
> > Thus, we need to be careful what "speculative store" is meant to refer
> > to.  A few examples:
> >
> > if (atomic_load(&x, mo_relaxed) == 1)
> >   atomic_store(&y, 3, mo_relaxed));
> 
> No, please don't use this idiotic example. It is wrong.

It won't be useful in practice in a lot of cases, but that doesn't mean
it's wrong.  It's clearly not illegal code.  It also serves a purpose: a
simple example to reason about a few aspects of the memory model.

> The fact is, if a compiler generates anything but the obvious sequence
> (read/cmp/branch/store - where branch/store might obviously be done
> with some other machine conditional like a predicate), the compiler is
> wrong.

Why?  I've reasoned why (1) to (3) above allow in certain cases (i.e.,
the first load always returning 1) for the branch (or other machine
conditional) to not be emitted.  So please either poke holes into this
reasoning, or clarify that you don't in fact, contrary to what you wrote
above, agree with (1) to (3).

> Anybody who argues anything else is wrong, or confused, or confusing.

I appreciate your opinion, and maybe I'm just one of the three things
above (my vote is on "confusing").  But without you saying why doesn't
help me see what's the misunderstanding here.

> Instead, argue about *other* sequences where the compiler can do something.

I'd prefer if we could clarify the misunderstanding for the simple case
first that doesn't involve stronger ordering requirements in the form of
non-relaxed MOs.

> For example, this sequence:
> 
>atomic_store(&x, a, mo_relaxed);
>b = atomic_load(&x, mo_relaxed);
> 
> can validly be transformed to
> 
>atomic_store(&x, a, mo_relaxed);
>b = (typeof(x)) a;
> 
> and I think everybody agrees about that. In fact, that optimization
> can be done even for mo_strict.

Yes.

> But even that "obvious" optimization has subtle cases. What if the
> store is relaxed, but the load is strict? You can't do the
> optimization without a lot of though, because dropping the strict load
> would drop an ordering point. So even the "store followed by exact
> same load" case has subtle issues.

Yes if a compiler wants to optimize that, it has to give it more
thought.  My gut feeling is that either the store should get the
stronger ordering, or the accesses should be merged.  But I'd have to
think more about that one (which I can do on request).

> With similar caveats, it is perfectly valid to merge two consecutive
> loads, and to merge two consecutive stores.
> 
> Now that means that the sequence
> 
> atomic_store(&x, 1, mo_relaxed);
> if (atomic_load(&x, mo_relaxed) == 1)
> atomic_store(&y, 3, mo_relaxed);
> 
> can first be optimized to
> 
> atomic_store(&x, 1, mo_relaxed);
> if (1 == 1)
> atomic_store(&y, 3, mo_relaxed);
> 
> and then you get the end result that you wanted in the first place
> (including the ability to re-order the two stores due to the relaxed
> ordering, assuming they can be proven to not alias - and please don't
> use the idiotic type-based aliasing rules).
> 
> Bringing up your first example is pure and utter confusion.

Sorry if it was confusing.  But then maybe we need to talk about it
more, because it shouldn't be confusing if we agree on what the memory
model allows and what not.  I had originally picked the example because
it was related to the example Paul/Peter brought up.

> Don't do
> it. Instead, show what are obvious and valid transformations, and then
> you can bring up these kinds of combinations as "look, this is
> obviously also correct".

I have my doubts whether the best way to reason about the memory model
is by thinking about specific compiler transformations.  YMMV,
obviously.

The -- kind of vague -- reason is that the allowed transformations will
be more complicated to reason about than the allowed output of a
concurrent program when understanding the memory model (ie, ordering and
interleaving of memory accesses, etc.).  However, I can see that when
trying to optimize with a hardware memory model in mind, this might look
appealing.

What the compiler will do is exploiting knowledge about all possible
executions

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote:
> On February 17, 2014 7:18:15 PM GMT+01:00, "Paul E. McKenney" 
>  wrote:
> >On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote:
> >> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> >> > You need volatile semantics to force the compiler to ignore any
> >proofs
> >> > it might otherwise attempt to construct.  Hence all the
> >ACCESS_ONCE()
> >> > calls in my email to Torvald.  (Hopefully I translated your example
> >> > reasonably.)
> >> 
> >> My brain gave out for today; but it did appear to have the right
> >> structure.
> >
> >I can relate.  ;-)
> >
> >> I would prefer it C11 would not require the volatile casts. It should
> >> simply _never_ speculate with atomic writes, volatile or not.
> >
> >I agree with not needing volatiles to prevent speculated writes. 
> >However,
> >they will sometimes be needed to prevent excessive load/store
> >combining.
> >The compiler doesn't have the runtime feedback mechanisms that the
> >hardware has, and thus will need help from the developer from time
> >to time.
> >
> >Or maybe the Linux kernel simply waits to transition to C11 relaxed
> >atomics
> >until the compiler has learned to be sufficiently conservative in its
> >load-store combining decisions.
> 
> Sounds backwards. Currently the compiler does nothing to the atomics. I'm 
> sure we'll eventually add something. But if testing coverage is zero outside 
> then surely things get worse, not better with time.

Perhaps we solve this chicken-and-egg problem by creating a test suite?

Thanx, Paul
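The volatile casts Paul refers to are the kernel's ACCESS_ONCE() macro. The macro definition below matches the kernel's classic one; the code around it is an illustrative sketch of what it prevents:

```c
#include <assert.h>

/* The Linux kernel's classic ACCESS_ONCE(): a volatile cast that forces
 * the compiler to emit exactly one load or store for this access,
 * preventing it from merging, re-reading, or caching the value. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int shared = 42;

/* Without ACCESS_ONCE the compiler may fold both reads into a single
 * load; with it, two distinct loads must be emitted.  The computed
 * result is the same here, but the generated code differs. */
int read_twice(void)
{
    int a = ACCESS_ONCE(shared);
    int b = ACCESS_ONCE(shared);
    return a + b;
}
```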



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel  wrote:
> > On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote:
> >> and then it is read by people (compiler writers) that intentionally
> >> try to mis-use the words and do language-lawyering ("that depends on
> >> what the meaning of 'is' is").
> >
> > That assumption about people working on compilers is a little too broad,
> > don't you think?
> 
> Let's just say that *some* are that way, and those are the ones that I
> end up butting heads with.
> 
> The sane ones I never have to argue with - point them at a bug, and
> they just say "yup, bug". The insane ones say "we don't need to fix
> that, because if you read this copy of the standards that have been
> translated to chinese and back, it clearly says that this is
> acceptable".
> 
> >> The whole "lvalue vs rvalue expression
> >> vs 'what is a volatile access'" thing for C++ was/is a great example
> >> of that.
> >
> > I'm not aware of the details of this.
> 
> The argument was that an lvalue doesn't actually "access" the memory
> (an rvalue does), so this:
> 
>volatile int *p = ...;
> 
>*p;
> 
> doesn't need to generate a load from memory, because "*p" is still an
> lvalue (since you could assign things to it).
> 
> This isn't an issue in C, because in C, expression statements are
> always rvalues, but C++ changed that.

Huhh.  I can see the problems that this creates in terms of C/C++
compatibility.

> The people involved with the C++
> standards have generally been totally clueless about their subtle
> changes.

This isn't a fair characterization.  There are many people that do care,
and certainly not all are clueless.  But it's a limited set of people,
bugs happen, and not all of them will have the same goals.

I think one way to prevent such problems in the future could be to have
someone in the kernel community volunteer to look through standard
revisions before they are published.  The standard needs to be fixed,
because compilers need to conform to the standard (e.g., a compiler's
extension "fixing" the above wouldn't be conforming anymore because it
emits more volatile reads than specified).

Or maybe those of us working on the standard need to flag potential
changes of interest to the kernel folks.  But that may be less reliable
than someone from the kernel side looking at them; I don't know.



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel  wrote:
> On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote:
>> >
>> > if (atomic_load(&x, mo_relaxed) == 1)
>> >   atomic_store(&y, 3, mo_relaxed));
>>
>> No, please don't use this idiotic example. It is wrong.
>
> It won't be useful in practice in a lot of cases, but that doesn't mean
> it's wrong.  It's clearly not illegal code.  It also serves a purpose: a
> simple example to reason about a few aspects of the memory model.

It's not illegal code, but if you claim that you can make that store
unconditional, it's a pointless and wrong example.

>> The fact is, if a compiler generates anything but the obvious sequence
>> (read/cmp/branch/store - where branch/store might obviously be done
>> with some other machine conditional like a predicate), the compiler is
>> wrong.
>
> Why?  I've reasoned why (1) to (3) above allow in certain cases (i.e.,
> the first load always returning 1) for the branch (or other machine
> conditional) to not be emitted.  So please either poke holes into this
> reasoning, or clarify that you don't in fact, contrary to what you wrote
> above, agree with (1) to (3).

The thing is, the first load DOES NOT RETURN 1. It returns whatever
that memory location contains. End of story.

Stop claiming it "can return 1".. It *never* returns 1 unless you do
the load and *verify* it, or unless the load itself can be made to go
away. And with the code sequence given, that just doesn't happen. END
OF STORY.

So your argument is *shit*. Why do you continue to argue it?

I told you how that load can go away, and you agreed. But IT CANNOT GO
AWAY any other way. You cannot claim "the compiler knows". The
compiler doesn't know. It's that simple.

>> So why do I say you are wrong, after I just gave you an example of how
>> it happens? Because my example went back to the *real* issue, and
>> there are actual real semantically meaningful details with doing
>> things like load merging.
>>
>> To give an example, let's rewrite things a bit more to use an extra variable:
>>
>> atomic_store(&x, 1, mo_relaxed);
>> a = atomic_load(&x, mo_relaxed);
>> if (a == 1)
>> atomic_store(&y, 3, mo_relaxed);
>>
>> which looks exactly the same.
>
> I'm confused.  Is this a new example?

That is a new example. The important part is that it has left a
"trace" for the programmer: because 'a' contains the value, the
programmer can now look at the value later and say "oh, we know we did
a store iff a was 1"

>> This sequence:
>>
>> atomic_store(&x, 1, mo_relaxed);
>> a = atomic_load(&x, mo_relaxed);
>> atomic_store(&y, 3, mo_relaxed);
>>
>> is actually - and very seriously - buggy.
>>
>> Why? Because you have effectively split the atomic_load into two loads
>> - one for the value of 'a', and one for your 'proof' that the store is
>> unconditional.
>
> I can't follow that, because it isn't clear to me which code sequences
> are meant to belong together, and which transformations the compiler is
> supposed to make.  If you would clarify that, then I can reply to this
> part.

Basically, if the compiler allows the condition of "I wrote 3 to the
y, but the programmer sees 'a' has another value than 1 later" then
the compiler is one buggy pile of shit. It fundamentally broke the
whole concept of atomic accesses. Basically the "atomic" access to 'x'
turned into two different accesses: the one that "proved" that x had
the value 1 (and caused the value 3 to be written), and the other load
that then write that other value into 'a'.

It's really not that complicated.

And this is why descriptions like this should ABSOLUTELY NOT BE
WRITTEN as "if the compiler can prove that 'x' had the value 1, it can
remove the branch". Because that IS NOT SUFFICIENT. That was not a
valid transformation of the atomic load.

The only valid transformation was the one I stated, namely to remove
the load entirely and replace it with the value written earlier in the
same execution context.

Really, why is so hard to understand?

   Linus
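The one transformation Linus does accept can be written out with C11 atomics (a sketch; the function names are illustrative, and `mo_relaxed` from the thread's pseudocode becomes `memory_order_relaxed`):

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int x, y;

/* The original sequence: store, load back, branch, conditional store. */
void original(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (atomic_load_explicit(&x, memory_order_relaxed) == 1)
        atomic_store_explicit(&y, 3, memory_order_relaxed);
}

/* The valid transformation: the relaxed load of x immediately after the
 * relaxed store to x in the same thread may be replaced by the stored
 * value; the branch then folds away and the store to y becomes
 * unconditional.  The load is removed entirely - it is not split into a
 * "proof" load and a separate value load. */
void transformed(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_store_explicit(&y, 3, memory_order_relaxed);
}
```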


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 14:14 -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote:
> > On February 17, 2014 7:18:15 PM GMT+01:00, "Paul E. McKenney" 
> >  wrote:
> > >On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote:
> > >> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> > >> > You need volatile semantics to force the compiler to ignore any
> > >proofs
> > >> > it might otherwise attempt to construct.  Hence all the
> > >ACCESS_ONCE()
> > >> > calls in my email to Torvald.  (Hopefully I translated your example
> > >> > reasonably.)
> > >> 
> > >> My brain gave out for today; but it did appear to have the right
> > >> structure.
> > >
> > >I can relate.  ;-)
> > >
> > >> I would prefer it C11 would not require the volatile casts. It should
> > >> simply _never_ speculate with atomic writes, volatile or not.
> > >
> > >I agree with not needing volatiles to prevent speculated writes. 
> > >However,
> > >they will sometimes be needed to prevent excessive load/store
> > >combining.
> > >The compiler doesn't have the runtime feedback mechanisms that the
> > >hardware has, and thus will need help from the developer from time
> > >to time.
> > >
> > >Or maybe the Linux kernel simply waits to transition to C11 relaxed
> > >atomics
> > >until the compiler has learned to be sufficiently conservative in its
> > >load-store combining decisions.
> > 
> > Sounds backwards. Currently the compiler does nothing to the atomics. I'm 
> > sure we'll eventually add something. But if testing coverage is zero 
> > outside then surely things get worse, not better with time.
> 
> Perhaps we solve this chicken-and-egg problem by creating a test suite?

Perhaps.  The test suite might also be a good set of examples showing
which cases we expect to be optimized in a certain way, and which not.
I suppose the uses of (the equivalent) of atomics in the kernel would be
a good start.
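One shape such a test-suite entry could take (the framing is illustrative, not from the thread): assert the observable behavior of the debated pattern, so that a compiler which made the store unconditional would fail it:

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int x, y;

/* The debated pattern: the relaxed store to y is allowed only on
 * executions where the relaxed load of x observes 1.  A compiler that
 * hoisted the store out of the branch would be observably wrong here,
 * because nothing lets it prove x == 1. */
void conditional_store(void)
{
    if (atomic_load_explicit(&x, memory_order_relaxed) == 1)
        atomic_store_explicit(&y, 3, memory_order_relaxed);
}
```

A single-threaded check like this only catches the grossest miscompilation; a real suite would pair it with inspection of the generated code and multi-threaded litmus tests.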



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel  wrote:
> On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote:
>>
>> The argument was that an lvalue doesn't actually "access" the memory
>> (an rvalue does), so this:
>>
>>volatile int *p = ...;
>>
>>*p;
>>
>> doesn't need to generate a load from memory, because "*p" is still an
>> lvalue (since you could assign things to it).
>>
>> This isn't an issue in C, because in C, expression statements are
>> always rvalues, but C++ changed that.
>
> Huhh.  I can see the problems that this creates in terms of C/C++
> compatibility.

That's not the biggest problem.

The biggest problem is that you have compiler writers that don't care
about sane *use* of the features they write a compiler for, they just
care about the standard.

So they don't care about C vs C++ compatibility. Even more
importantly, they don't care about the *user* that uses only C++ and
the fact that their reading of the standard results in *meaningless*
behavior. They point to the standard and say "that's what the standard
says, suck it", and silently generate code (or in this case, avoid
generating code) that makes no sense.

So it's not about C++ being incompatible with C, it's about C++ having
insane and bad semantics unless you just admit that "oh, ok, I need to
not just read the standard, I also need to use my brain, and admit
that a C++ statement expression needs to act as if it is an "access"
wrt volatile variables".

In other words, as a compiler person, you do need to read more than
the paper of standard. You need to also take into account what is
reasonable behavior even when the standard could possibly be read some
other way. And some compiler people don't.

The "volatile access in statement expression" did get resolved,
sanely, at least in gcc. I think gcc warns about some remaining cases.

Btw, afaik, C++11 actually clarifies the standard to require the
reads, because everybody *knew* that not requiring the read was insane
and meaningless behavior, and clearly against the intent of
"volatile".

But that didn't stop compiler writers from saying "hey, the standard
allows my insane and meaningless behavior, so I'll implement it and
not consider it a bug".

Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Joseph S. Myers
On Mon, 17 Feb 2014, Torvald Riegel wrote:

> On Mon, 2014-02-17 at 18:59 +, Joseph S. Myers wrote:
> > On Sat, 15 Feb 2014, Torvald Riegel wrote:
> > 
> > > glibc is a counterexample that comes to mind, although it's a smaller
> > > code base.  (It's currently not using C11 atomics, but transitioning
> > > there makes sense, and some thing I want to get to eventually.)
> > 
> > glibc is using C11 atomics (GCC builtins rather than _Atomic /
> > <stdatomic.h>, but using __atomic_* with explicitly specified memory model
> > rather than the older __sync_*) on AArch64, plus in certain cases on ARM
> > and MIPS.
> 
> I think the major steps remaining is moving the other architectures
> over, and rechecking concurrent code (e.g., for the code that I have

I don't think we'll be ready to require GCC >= 4.7 to build glibc for 
another year or two, although probably we could move the requirement up 
from 4.4 to 4.6.  (And some platforms only had the C11 atomics optimized 
later than 4.7.)

-- 
Joseph S. Myers
jos...@codesourcery.com
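The two builtin families Joseph contrasts look like this (a minimal sketch; the builtins are real GCC ones, the wrapper functions are illustrative):

```c
#include <assert.h>

/* Legacy __sync_* builtins: no memory-order argument; each operation
 * implies a full barrier. */
int inc_sync(int *p)
{
    return __sync_fetch_and_add(p, 1);
}

/* C11-style __atomic_* builtins (GCC >= 4.7): the memory order is an
 * explicit argument, so weaker orderings become expressible. */
int inc_atomic(int *p)
{
    return __atomic_fetch_add(p, 1, __ATOMIC_RELAXED);
}
```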


Re: MSP430 in gcc4.9 ... enable interrupts?

2014-02-17 Thread DJ Delorie

> I presume these will be part of the headers for the library
> distributed for msp430 gcc by TI/Redhat?

I can't speak for TI's or Red Hat's plans.  GNU's typical non-custom
embedded runtime is newlib/libgloss, which usually doesn't have that
much in the way of chip-specific headers or library functions.

> is that for the "critical" attribute that exists in the old msp430
> port (which disables interrupts for the duration of the function)?

Yes, for things like that.  They're documented under "Function
Attributes" in the "Extensions to the C Language Family" chapter of
the current GCC manual.
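Usage of the attribute in question looks roughly like this (a sketch only; check the "Function Attributes" section of your GCC manual for the exact set of MSP430 attributes your version supports):

```c
#include <assert.h>

static volatile int shared_counter;

/* On the msp430 target, the critical attribute makes GCC disable
 * interrupts on entry and restore the previous interrupt state on exit,
 * protecting state shared with interrupt handlers.  On other targets
 * the attribute is ignored (with a warning). */
void __attribute__((critical)) bump_counter(void)
{
    shared_counter++;
}
```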


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Alec Teal

On 17/02/14 20:18, Linus Torvalds wrote:

On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel  wrote:

Which example do you have in mind here?  Haven't we resolved all the
debated examples, or did I miss any?

Well, Paul seems to still think that the standard possibly allows
speculative writes or possibly value speculation in ways that break
the hardware-guaranteed orderings.

And personally, I can't read standards paperwork. It is invariably

Can't => Don't - evidently.

written in some basically impossible-to-understand lawyeristic mode,
You mean "unambiguous" - try reading a patent (Apple have 1000s of 
trivial ones, I tried reading one once thinking "how could they have 
phrased it so this got approved", their technique was to make the reader 
want to start cutting themselves to prove they weren't numb to everything)

and then it is read by people (compiler writers) that intentionally
try to mis-use the words and do language-lawyering ("that depends on
what the meaning of 'is' is"). The whole "lvalue vs rvalue expression
vs 'what is a volatile access'" thing for C++ was/is a great example
of that.
I'm not going to teach you what rvalues and lvalues are, but
http://lmgtfy.com/?q=what+are+rvalues might help.


So quite frankly, as a result I refuse to have anything to do with the
process directly.

Is this goodbye?


  Linus
That aside, what is the problem? If the compiler has created code that
has different program states than what would be created without
optimisation please file a bug report and/or send something to the 
mailing list USING A CIVIL TONE, there's no need for swear-words and 
profanities all the time - use them when you want to emphasise 
something. Additionally if you are always angry, start calling that 
state "normal" then reserve such words for when you are outraged.


There are so many emails from you bitching about stuff, I've lost track 
of what you're bitching about you bitch that much about it. Like this 
standards stuff above (notice I said stuff, not "crap" or "shit").


What exactly is your problem, if the compiler is doing something the 
standard does not permit, or optimising something wrongly (read: "puts 
the program in a different state than if the optimisation was not 
applied") that is REALLY serious, you are right to report it; but 
whining like a n00b on Stack-overflow when a question gets closed is not 
helping.


I tried reading back through the emails (I dismissed them previously) but 
there's just so much ranting, and rants about the standard too (I would 
trash this if I deemed the effort required to delete was less than the 
storage of the bytes the message takes up) standardised behaviour is 
VERY important.


So start again, what is the serious problem, have you got any code that 
would let me replicate it, what is your version of GCC?


Oh and lastly! Optimisations are not as casual as "oh, we could do this 
and it'd work better" unlike kernel work or any other software that is 
being improved, it is very formal (and rightfully so). I seriously 
recommend you read the first 40 pages at least of a book called 
"Compiler Design, Analysis and Transformation" it's not about the 
parsing phases or anything, but it develops a good introduction and 
later a good foundation for exploring the field further. Compilers do 
not operate on what I call "A-level logic" and to show what I mean I use 
the shovel-to-the-face of real analysis, "of course 1/x tends towards 0, 
it's not gonna be 5!!" = A-level logic. "Let epsilon > 0 be given, then 
there exists an N" - formal proof. So when one says "the compiler 
can prove" it's not some silly thing powered by A-level logic, it is the 
implementation of something that can be proven to be correct (in the 
sense of the program states mentioned before)


So yeah, calm down and explain - no lashing out at standards bodies, 
what is the problem?


Alec


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 14:32 -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel  wrote:
> > On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote:
> >> >
> >> > if (atomic_load(&x, mo_relaxed) == 1)
> >> >   atomic_store(&y, 3, mo_relaxed));
> >>
> >> No, please don't use this idiotic example. It is wrong.
> >
> > It won't be useful in practice in a lot of cases, but that doesn't mean
> > it's wrong.  It's clearly not illegal code.  It also serves a purpose: a
> > simple example to reason about a few aspects of the memory model.
> 
> It's not illegal code, but i you claim that you can make that store
> unconditional, it's a pointless and wrong example.
> 
> >> The fact is, if a compiler generates anything but the obvious sequence
> >> (read/cmp/branch/store - where branch/store might obviously be done
> >> with some other machine conditional like a predicate), the compiler is
> >> wrong.
> >
> > Why?  I've reasoned why (1) to (3) above allow in certain cases (i.e.,
> > the first load always returning 1) for the branch (or other machine
> > conditional) to not be emitted.  So please either poke holes into this
> > reasoning, or clarify that you don't in fact, contrary to what you wrote
> > above, agree with (1) to (3).
> 
> The thing is, the first load DOES NOT RETURN 1. It returns whatever
> that memory location contains. End of story.

The memory location is just an abstraction for state, if it's not
volatile.

> Stop claiming it "can return 1".. It *never* returns 1 unless you do
> the load and *verify* it, or unless the load itself can be made to go
> away. And with the code sequence given, that just doesn't happen. END
> OF STORY.

void foo()
{
  atomic_int x = 1;
  if (atomic_load(&x, mo_relaxed) == 1)
    atomic_store(&y, 3, mo_relaxed);
}

This is a counter example to your claim, and yes, the compiler has proof
that x is 1.  It's deliberately simple, but I can replace this with
other more advanced situations.  For example, if x comes out of malloc
(or, on the kernel side, something else that returns non-aliasing
memory) and hasn't provably escaped to other threads yet.

I haven't posted this full example, but I've *clearly* said that *if*
the compiler can prove that the load would always return 1, it can
remove it.  And it's simple to see why that's the case: If this holds,
then in all allowed executions it would load from a known store, the
relaxed_mo gives no further ordering guarantees so we can just take the
value, and we're good.

> So your argument is *shit*. Why do you continue to argue it?

Maybe because it isn't?  Maybe you should try to at least trust that my
intentions are good, even if distrusting my ability to reason.

> I told you how that load can go away, and you agreed. But IT CANNOT GO
> AWAY any other way. You cannot claim "the compiler knows". The
> compiler doesn't know. It's that simple.

Oh yes it can.  Because of the same rules that allow you to perform the
other transformations.  Please try to see the similarities here.  You
previously said you don't want to mix volatile semantics and atomics.
This is something that's being applied in this example.

> >> So why do I say you are wrong, after I just gave you an example of how
> >> it happens? Because my example went back to the *real* issue, and
> >> there are actual real semantically meaningful details with doing
> >> things like load merging.
> >>
> >> To give an example, let's rewrite things a bit more to use an extra 
> >> variable:
> >>
> >> atomic_store(&x, 1, mo_relaxed);
> >> a = atomic_load(&x, mo_relaxed);
> >> if (a == 1)
> >> atomic_store(&y, 3, mo_relaxed);
> >>
> >> which looks exactly the same.
> >
> > I'm confused.  Is this a new example?
> 
> That is a new example. The important part is that it has left a
> "trace" for the programmer: because 'a' contains the value, the
> programmer can now look at the value later and say "oh, we know we did
> a store iff a was 1"
> 
> >> This sequence:
> >>
> >> atomic_store(&x, 1, mo_relaxed);
> >> a = atomic_load(&x, mo_relaxed);
> >> atomic_store(&y, 3, mo_relaxed);
> >>
> >> is actually - and very seriously - buggy.
> >>
> >> Why? Because you have effectively split the atomic_load into two loads
> >> - one for the value of 'a', and one for your 'proof' that the store is
> >> unconditional.
> >
> > I can't follow that, because it isn't clear to me which code sequences
> > are meant to belong together, and which transformations the compiler is
> > supposed to make.  If you would clarify that, then I can reply to this
> > part.
> 
> Basically, if the compiler allows the condition of "I wrote 3 to the
> y, but the programmer sees 'a' has another value than 1 later" then
> the compiler is one buggy pile of shit. It fundamentally broke the
> whole concept of atomic accesses. Basically the "atomic" access to 'x'
> turned into two different accesses: the one that "proved" that x had
> the value 1 (and caused the value 3 to

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Torvald Riegel
On Mon, 2014-02-17 at 14:47 -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel  wrote:
> > On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote:
> >>
> >> The argument was that an lvalue doesn't actually "access" the memory
> >> (an rvalue does), so this:
> >>
> >>volatile int *p = ...;
> >>
> >>*p;
> >>
> >> doesn't need to generate a load from memory, because "*p" is still an
> >> lvalue (since you could assign things to it).
> >>
> >> This isn't an issue in C, because in C, expression statements are
> >> always rvalues, but C++ changed that.
> >
> > Huhh.  I can see the problems that this creates in terms of C/C++
> > compatibility.
> 
> That's not the biggest problem.
> 
> The biggest problem is that you have compiler writers that don't care
> about sane *use* of the features they write a compiler for, they just
> care about the standard.
> 
> So they don't care about C vs C++ compatibility. Even more
> importantly, they don't care about the *user* that uses only C++ and
> the fact that their reading of the standard results in *meaningless*
> behavior. They point to the standard and say "that's what the standard
> says, suck it", and silently generate code (or in this case, avoid
> generating code) that makes no sense.

There's an underlying problem here that's independent from the actual
instance that you're worried about here: "no sense" is ultimately a
matter of taste/objectives/priorities as long as the respective
specification is logically consistent.

If you want to be independent of your sanity being different from other
people's sanity (e.g., compiler writers), you need to make sure that the
specification is precise and says what you want.  IOW, think about the
specification being the program, and the people being computers; you
better want a well-defined program in this case.

> So it's not about C++ being incompatible with C, it's about C++ having
> insane and bad semantics unless you just admit that "oh, ok, I need to
> not just read the standard, I also need to use my brain, and admit
> that a C++ statement expression needs to act as if it is an "access"
> wrt volatile variables".

1) I agree that (IMO) a good standard strives for being easy to
understand.

2) In practice, there is a trade-off between "Easy to understand" and
actually producing a specification.  A standard is not a tutorial.  And
that's for good reason, because (a) there might be more than one way to
teach something and that should be allowed and (b) that the standard
should carry the full precision but still be compact enough to be
manageable.

3) Implementations can try to be nice to users by helping them avoiding
error-prone corner cases or such.  A warning for common problems is such
a case.  But an implementation has to draw a line somewhere, demarcating
cases where it fully exploits what the standard says (eg, to allow
optimizations) from cases where it is more conservative and does what
the standard allows but in a potentially more intuitive way.  That's
especially the case if it's being asked to produce high-performance
code.

4) There will be arguments for where the line actually is, simply
because different users will have different goals.

5) The way to reduce 4) is to either make the standard more specific, or
to provide better user documentation.  If the standard has strict
requirements, then there will be less misunderstanding.

6) To achieve 5), one way is to get involved in the standards process.




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 3:10 PM, Alec Teal  wrote:
>
> You mean "unambiguous" - try reading a patent (Apple have 1000s of trivial
> ones, I tried reading one once thinking "how could they have phrased it so
> this got approved", their technique was to make the reader want to start
> cutting themselves to prove they weren't numb to everything)

Oh, I agree, patent language is worse.

> I'm not going to teach you what rvalues and lvalues are, but!

I know what lvalues and rvalues are. I *understand* the thinking that
goes on behind the "let's not do the access, because it's not an
rvalue, so there is no 'access' to the object".

I understand it from a technical perspective.

I don't understand the compiler writer that uses a *technicality* to
argue against generating sane code that is obviously what the user
actually asked for.

See the difference?

> So start again, what is the serious problem, have you got any code that
> would let me replicate it, what is your version of GCC?

The volatile problem is long fixed. The people who argued for the
"legalistically correct", but insane behavior lost (and as mentioned,
I think C++11 actually fixed the legalistic reading too).

I'm bringing it up because I've had too many cases where compiler
writers pointed to standard and said "that is ambiguous or undefined,
so we can do whatever the hell we want, regardless of whether that's
sensible, or regardless of whether there is a sensible way to get the
behavior you want or not".


> Oh and lastly! Optimisations are not as casual as "oh, we could do this and
> it'd work better" unlike kernel work or any other software that is being
> improved, it is very formal (and rightfully so)

Alec, I know compilers. I don't do code generation (quite frankly,
register allocation and instruction choice is when I give up), but I
did actually write my own for static analysis, including turning
things into SSA etc.

No, I'm not a "compiler person", but I actually do know enough that I
understand what goes on.

And exactly because I know enough, I would *really* like atomics to be
well-defined, and have very clear - and *local* - rules about how they
can be combined and optimized.

None of this "if you can prove that the read has value X" stuff. And
things like value speculation should simply not be allowed, because
that actually breaks the dependency chain that the CPU architects give
guarantees for. Instead, make the rules be very clear, and very
simple, like my suggestion. You can never remove a load because you
can "prove" it has some value, but you can combine two consecutive
atomic accesses.

For example, CPU people actually do tend to give guarantees for
certain things, like stores that are causally related being visible in
a particular order. If the compiler starts doing value speculation on
atomic accesses, you are quite possibly breaking things like that.
It's just not a good idea. Don't do it. Write the standard so that it
clearly is disallowed.

Because you may think that a C standard is machine-independent, but
that isn't really the case. The people who write code still write code
for a particular machine. Our code works (in the general case) on
different byte orderings, different register sizes, different memory
ordering models. But in each *instance* we still end up actually
coding for each machine.

So the rules for atomics should be simple and *specific* enough that
when you write code for a particular architecture, you can take the
architecture memory ordering *and* the C atomics orderings into
account, and do the right thing for that architecture.

And that very much means that doing things like value speculation MUST
NOT HAPPEN. See? Even if you can prove that your code is "equivalent",
it isn't.

So for example, let's say that you have a pointer, and you have some
reason to believe that the pointer has a particular value. So you
rewrite following the pointer from this:

  value = ptr->val;

into

  value = speculated->value;
  tmp = ptr;
  if (unlikely(tmp != speculated))
value = tmp->value;

and maybe you can now make the critical code-path for the speculated
case go faster (since now there is no data dependency for the
speculated case, and the actual pointer chasing load is now no longer
in the critical path), and you made things faster because your
profiling showed that the speculated case was true 99% of the time.
Wonderful, right? And clearly, the code "provably" does the same
thing.

EXCEPT THAT IS NOT TRUE AT ALL.

It very much does not do the same thing at all, and by doing value
speculation and "proving" something was true, the only thing you did
was to make incorrect code run faster. Because now the causally
related load of value from the pointer isn't actually causally related
at all, and you broke the memory ordering.

This is why I don't like it when I see Torvald talk about "proving"
things. It's bullshit. You can "prove" pretty much anything, and in
the process lose sight of the bigger issue, namely tha

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 3:17 PM, Torvald Riegel  wrote:
> On Mon, 2014-02-17 at 14:32 -0800,
>
>> Stop claiming it "can return 1".. It *never* returns 1 unless you do
>> the load and *verify* it, or unless the load itself can be made to go
>> away. And with the code sequence given, that just doesn't happen. END
>> OF STORY.
>
> void foo()
> {
>   atomic_int x = 1;
>   if (atomic_load(&x, mo_relaxed) == 1)
> atomic_store(&y, 3, mo_relaxed);
> }

This is the very example I gave, where the real issue is not that "you
prove that load returns 1", you instead say "store followed by a load
can be combined".

I (in another email I just wrote) tried to show why the "prove
something is true" is a very dangerous model.  Seriously, it's pure
crap. It's broken.

If the C standard defines atomics in terms of "provable equivalence",
it's broken. Exactly because on a *virtual* machine you can prove
things that are not actually true in a *real* machine. I have the
example of value speculation changing the memory ordering model of the
actual machine.

See?

Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel  wrote:
>
> There's an underlying problem here that's independent from the actual
> instance that you're worried about here: "no sense" is ultimately a
> matter of taste/objectives/priorities as long as the respective
> specification is logically consistent.

Yes. But I don't think it's "independent".

Exactly *because* some people will read standards without applying
"does the resulting code generation actually make sense for the
programmer that wrote the code", the standard has to be pretty clear.

The standard often *isn't* pretty clear. It wasn't clear enough when
it came to "volatile", and yet that was a *much* simpler concept than
atomic accesses and memory ordering.

And most of the time it's not a big deal. But because the C standard
generally tries to be very portable, and cover different machines,
there tends to be a mindset that anything inherently unportable is
"undefined" or "implementation defined", and then the compiler writer
is basically given free rein to do anything they want (with
"implementation defined" at least requiring that it is reliably the
same thing).

And when it comes to memory ordering, *everything* is basically
non-portable, because different CPU's very much have different rules.
I worry that that means that the standard then takes the stance that
"well, compiler re-ordering is no worse than CPU re-ordering, so we
let the compiler do anything". And then we have to either add
"volatile" to make sure the compiler doesn't do that, or use an overly
strict memory model at the compiler level that makes it all pointless.

So I really really hope that the standard doesn't give compiler
writers free hands to do anything that they can prove is "equivalent"
in the virtual C machine model. That's not how you get reliable
results.

   Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Mon, Feb 17, 2014 at 04:18:52PM -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel  wrote:
> >
> > There's an underlying problem here that's independent from the actual
> > instance that you're worried about here: "no sense" is ultimately a
> > matter of taste/objectives/priorities as long as the respective
> > specification is logically consistent.
> 
> Yes. But I don't think it's "independent".
> 
> Exactly *because* some people will read standards without applying
> "does the resulting code generation actually make sense for the
> programmer that wrote the code", the standard has to be pretty clear.
> 
> The standard often *isn't* pretty clear. It wasn't clear enough when
> it came to "volatile", and yet that was a *much* simpler concept than
> atomic accesses and memory ordering.
> 
> And most of the time it's not a big deal. But because the C standard
> generally tries to be very portable, and cover different machines,
> there tends to be a mindset that anything inherently unportable is
> "undefined" or "implementation defined", and then the compiler writer
> is basically given free rein to do anything they want (with
> "implementation defined" at least requiring that it is reliably the
> same thing).
> 
> And when it comes to memory ordering, *everything* is basically
> non-portable, because different CPU's very much have different rules.
> I worry that that means that the standard then takes the stance that
> "well, compiler re-ordering is no worse than CPU re-ordering, so we
> let the compiler do anything". And then we have to either add
> "volatile" to make sure the compiler doesn't do that, or use an overly
> strict memory model at the compiler level that makes it all pointless.

For whatever it is worth, this line of reasoning has been one reason why
I have been objecting strenuously every time someone on the committee
suggests eliminating "volatile" from the standard.

Thanx, Paul

> So I really really hope that the standard doesn't give compiler
> writers free hands to do anything that they can prove is "equivalent"
> in the virtual C machine model. That's not how you get reliable
> results.
> 
>Linus
> 



RE: Vectorizer Pragmas

2014-02-17 Thread Geva, Robert
The way Intel presents #pragma simd (to users, to the OpenMP committee, to the C 
and C++ committees, etc.) is that it is not a hint; it has a meaning.
The meaning is defined in term of evaluation order.
Both C and C++ define an evaluation order for sequential programs. #pragma simd 
relaxes the sequential order into a partial order:
0. subsequent iterations of the loop are chunked together and execute in 
lockstep
1. there is no change in the order of evaluation of expression within an 
iteration
2. if X and Y are expressions in the loop, and X(i) is the evaluation of X in 
iteration i, then for X sequenced before Y and iteration i evaluated before 
iteration j, X(i) is sequenced before Y(j).

A corollary is that the sequential order is always allowed, since it satisfies 
the partial order.
However, the partial order allows the compiler to group copies of the same 
expression next to each other, and then to combine the scalar instructions into 
a vector instruction.
There are other corollaries, such as that if multiple loop iterations write 
into an object defined outside of the loop then it has to be undefined 
behavior, the vector moral equivalent of a data race. That is why induction 
variables and reductions are a necessary exception to this rule and require 
explicit support.

As far as correctness, by this definition, the programmer expressed that it is 
correct, and the compiler should not try to prove correctness. 

On performance heuristics side, the Intel compiler tries to not second guess 
the user. There are users who work much harder than just add a #pragma simd on 
unmodified sequential loops. There are various changes that may be necessary, 
and users who worked hard to get their loops in a good shape are unhappy if the 
compiler does second guess them.

Robert.

-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Renato 
Golin
Sent: Monday, February 17, 2014 7:14 AM
To: tpri...@computer.org
Cc: gcc
Subject: Re: Vectorizer Pragmas

On 17 February 2014 14:47, Tim Prince  wrote:
> I'm continuing discussions with former Intel colleagues.  If you are 
> asking for insight into how Intel priorities vary over time, I don't 
> expect much, unless the next beta compiler provides some inferences.  
> They have talked about implementing all of OpenMP 4.0 except user 
> defined reduction this year.  That would imply more activity in that 
> area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be purely 
temporary and not endorsed in any way.


> although some fixes have come in the latter.  On the other hand I had 
> an issue on omp simd reduction(max: ) closed with the decision "will 
> not be fixed."

We still haven't got pragmas for induction/reduction logic, so I'm not too 
worried about them.


> I have an icc problem report in on fixing omp simd safelen so it is 
> more like the standard and less like the obsolete pragma simd vectorlength.

Our width metadata is slightly different in that it means "try to use that 
length", rather than "it's safe to use that length", which is why I'm holding 
off on using safelen for the moment.


> Also, I have some problem reports active attempting to get 
> clarification of their omp target implementation.

Same here... RTFM is not enough in this case. ;)


> You may have noticed that omp parallel for simd in current Intel 
> compilers can be used for combined thread and simd parallelism, 
> including the case where the outer loop is parallelizable and 
> vectorizable but the inner one is not.

That's my fear of going with omp simd directly. I don't want to be throwing 
threads all over the place when all I really want is vector code.

For the time, my proposal is to use legacy pragmas: vector/novector, 
unroll/nounroll and simd vectorlength which map nicely to the metadata we 
already have and don't incur OpenMP overhead. Later on, if OpenMP ends up
with simple non-threaded pragmas, we should use those and deprecate the legacy 
ones.

If GCC is trying to do the same thing regarding non-threaded-vector code, I'd 
be glad to be involved in the discussion. Some LLVM folks think this should be 
an OpenMP discussion, I personally think it's pushing the boundaries a bit too 
much on an inherently threaded library extension.

cheers,
--renato


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Mon, Feb 17, 2014 at 12:18:21PM -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel  wrote:
> >
> > Which example do you have in mind here?  Haven't we resolved all the
> > debated examples, or did I miss any?
> 
> Well, Paul seems to still think that the standard possibly allows
> speculative writes or possibly value speculation in ways that break
> the hardware-guaranteed orderings.

It is not that I know of any specific problems, but rather that I
know I haven't looked under all the rocks.  Plus my impression from
my few years on the committee is that the standard will be pushed to
the limit when it comes time to add optimizations.

One example that I learned about last week uses the branch-prediction
hardware to validate value speculation.  And no, I am not at all a fan
of value speculation, in case you were curious.  However, it is still
an educational example.

This is where you start:

p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
do_something(p->a, p->b, p->c);
p->d = 1;

Then you leverage branch-prediction hardware as follows:

p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
if (p == GUESS) {
do_something(GUESS->a, GUESS->b, GUESS->c);
GUESS->d = 1;
} else {
do_something(p->a, p->b, p->c);
p->d = 1;
}

The CPU's branch-prediction hardware squashes speculation in the case where
the guess was wrong, and this prevents the speculative store to ->d from
ever being visible.  However, the then-clause breaks dependencies, which
means that the loads -could- be speculated, so that do_something() gets
passed pre-initialization values.

Now, I hope and expect that the wording in the standard about dependency
ordering prohibits this sort of thing.  But I do not yet know for certain.

And yes, I am being paranoid.  But not unnecessarily paranoid.  ;-)

Thanx, Paul

> And personally, I can't read standards paperwork. It is invariably
> written in some basically impossible-to-understand lawyeristic mode,
> and then it is read by people (compiler writers) that intentionally
> try to mis-use the words and do language-lawyering ("that depends on
> what the meaning of 'is' is"). The whole "lvalue vs rvalue expression
> vs 'what is a volatile access'" thing for C++ was/is a great example
> of that.
> 
> So quite frankly, as a result I refuse to have anything to do with the
> process directly.
> 
>  Linus
> 



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney
 wrote:
>
> One example that I learned about last week uses the branch-prediction
> hardware to validate value speculation.  And no, I am not at all a fan
> of value speculation, in case you were curious.

Heh. See the example I used in my reply to Alec Teal. It basically
broke the same dependency the same way.

Yes, value speculation of reads is simply wrong, the same way
speculative writes are simply wrong. The dependency chain matters, and
is meaningful, and breaking it is actively bad.

As far as I can tell, the intent is that you can't do value
speculation (except perhaps for the "relaxed", which quite frankly
sounds largely useless). But then I do get very very nervous when
people talk about "proving" certain values.

Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Linus Torvalds
On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds
 wrote:
>
> As far as I can tell, the intent is that you can't do value
> speculation (except perhaps for the "relaxed", which quite frankly
> sounds largely useless).

Hmm. The language I see for "consume" is not obvious:

  "Consume operation: no reads in the current thread dependent on the
value currently loaded can be reordered before this load"

and it could make a compiler writer say that value speculation is
still valid, if you do it like this (with "ptr" being the atomic
variable):

  value = ptr->val;

into

  tmp = ptr;
  value = speculated.value;
  if (unlikely(tmp != &speculated))
value = tmp->value;

which is still bogus. The load of "ptr" does happen before the load of
"value = speculated.value" in the instruction stream, but it would
still result in the CPU possibly moving the value read before the
pointer read at least on ARM and power.

So if you're a compiler person, you think you followed the letter of
the spec - as far as *you* were concerned, no load dependent on the
value of the atomic load moved to before the atomic load. You go home,
happy, knowing you've done your job. Never mind that you generated
code that doesn't actually work.

I dread having to explain to the compiler person that he may be right
in some theoretical virtual machine, but the code is subtly broken and
nobody will ever understand why (and likely not be able to create a
test-case showing the breakage).

But maybe the full standard makes it clear that "reordered before this
load" actually means on the real hardware, not just in the generated
instruction stream. Reading it with understanding of the *intent* and
understanding all the different memory models that requirement should
be obvious (on alpha, you need an "rmb" instruction after the load),
but ...

Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Mon, Feb 17, 2014 at 07:24:56PM -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney
>  wrote:
> >
> > One example that I learned about last week uses the branch-prediction
> > hardware to validate value speculation.  And no, I am not at all a fan
> > of value speculation, in case you were curious.
> 
> Heh. See the example I used in my reply to Alec Teal. It basically
> broke the same dependency the same way.

;-)

> Yes, value speculation of reads is simply wrong, the same way
> speculative writes are simply wrong. The dependency chain matters, and
> is meaningful, and breaking it is actively bad.
> 
> As far as I can tell, the intent is that you can't do value
> speculation (except perhaps for the "relaxed", which quite frankly
> sounds largely useless). But then I do get very very nervous when
> people talk about "proving" certain values.

That was certainly my intent, but as you might have notice in the
discussion earlier in this thread, the intent can get lost pretty
quickly.  ;-)

The HPC guys appear to be the most interested in breaking dependencies.
Their software doesn't rely on dependencies, and from their viewpoint
anything that has any chance of leaving an FP unit of any type idle is
a very bad thing.  But there are probably other benchmarks for which
breaking dependencies gives a few percent performance boost.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-17 Thread Paul E. McKenney
On Mon, Feb 17, 2014 at 07:42:42PM -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds
>  wrote:
> >
> > As far as I can tell, the intent is that you can't do value
> > speculation (except perhaps for the "relaxed", which quite frankly
> > sounds largely useless).
> 
> Hmm. The language I see for "consume" is not obvious:
> 
>   "Consume operation: no reads in the current thread dependent on the
> value currently loaded can be reordered before this load"
> 
> and it could make a compiler writer say that value speculation is
> still valid, if you do it like this (with "ptr" being the atomic
> variable):
> 
>   value = ptr->val;
> 
> into
> 
>   tmp = ptr;
>   value = speculated.value;
>   if (unlikely(tmp != &speculated))
> value = tmp->value;
> 
> which is still bogus. The load of "ptr" does happen before the load of
> "value = speculated.value" in the instruction stream, but it would
> still result in the CPU possibly moving the value read before the
> pointer read at least on ARM and power.
> 
> So if you're a compiler person, you think you followed the letter of
> the spec - as far as *you* were concerned, no load dependent on the
> value of the atomic load moved to before the atomic load. You go home,
> happy, knowing you've done your job. Never mind that you generated
> code that doesn't actually work.

Agreed, that would be bad.  But please see below.

> I dread having to explain to the compiler person that he may be right
> in some theoretical virtual machine, but the code is subtly broken and
> nobody will ever understand why (and likely not be able to create a
> test-case showing the breakage).

If things go as they usually do, such explanations will be required
a time or two.

> But maybe the full standard makes it clear that "reordered before this
> load" actually means on the real hardware, not just in the generated
> instruction stream. Reading it with understanding of the *intent* and
> understanding all the different memory models that requirement should
> be obvious (on alpha, you need an "rmb" instruction after the load),
> but ...

The key point with memory_order_consume is that it must be paired with
some sort of store-release, a category that includes stores tagged
with memory_order_release (surprise!), memory_order_acq_rel, and
memory_order_seq_cst.  This pairing is analogous to the memory-barrier
pairing in the Linux kernel.

So you have something like this for the rcu_assign_pointer() side:

p = kmalloc(...);
if (unlikely(!p))
return -ENOMEM;
p->a = 1;
p->b = 2;
p->c = 3;
/* The following would be buried within rcu_assign_pointer(). */
atomic_store_explicit(&gp, p, memory_order_release);

And something like this for the rcu_dereference() side:

/* The following would be buried within rcu_dereference(). */
q = atomic_load_explicit(&gp, memory_order_consume);
do_something_with(q->a);

So, let's look at the C11 draft, section 5.1.2.4 "Multi-threaded
executions and data races".

5.1.2.4p14 says that the atomic_load_explicit() carries a dependency to
the argument of do_something_with().

5.1.2.4p15 says that the atomic_store_explicit() is dependency-ordered
before the atomic_load_explicit().

5.1.2.4p15 also says that the atomic_store_explicit() is
dependency-ordered before the argument of do_something_with().  This is
because if A is dependency-ordered before X and X carries a dependency
to B, then A is dependency-ordered before B.

5.1.2.4p16 says that the atomic_store_explicit() inter-thread happens
before the argument of do_something_with().

The assignment to p->a is sequenced before the atomic_store_explicit().

Therefore, combining these last two, the assignment to p->a happens
before the argument of do_something_with(), and that means that
do_something_with() had better see the "1" assigned to p->a or some
later value.

But as far as I know, compiler writers currently take the approach of
treating memory_order_consume as if it were memory_order_acquire.
Which certainly works, as long as ARM and PowerPC people don't mind
an extra memory barrier out of each rcu_dereference().

Which is one thing that compiler writers are permitted to do according
to the standard -- substitute a memory-barrier instruction for any
given dependency...

Thanx, Paul



Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...

2014-02-17 Thread Mohsin Khan
Hi,

 I am developing plugins for the GCC-4.8.2. I am a newbie in plugins.
I wrote a plugin and tried to count and see the Goto Statements using
the gimple_stmt_iterator. I get gimple statements printed on my
stdout, but I am not able to find the line which has goto statements.
I only get other lines such as variable declaration and logic
statements, but no goto statements.
  When I open the Gimple/SSA/CFG dump file separately using the vim editor
I find that the goto statements are actually present.
  So, can anyone help me? How can I actually get the count of goto
statements, or at least access these goto statements using some
iterator?
  I have used -fdump-tree-all and -fdump-tree-cfg as flags.

Here is the pseudocode:

struct register_pass_info pass_info = {
  &(pass_plugin.pass),  /* Address of the new pass: the 'struct opt_pass'
                           field of the 'gimple_opt_pass' defined above.  */
  "ssa",                /* Name of the reference pass for hooking up
                           the new pass.  */
  0,                    /* Insert the pass at the specified instance
                           number of the reference pass; 0 means every
                           instance.  */
  PASS_POS_INSERT_AFTER /* How to insert the new pass: before, after,
                           or replace.  Here the new pass is inserted
                           after the "ssa" pass.  */
};

.

static unsigned int dead_code_elimination (void)
{
  basic_block bb;
  gimple_stmt_iterator gsi;
  gimple g;

  FOR_EACH_BB_FN (bb, cfun)
    {
      /* Iterate over each gimple statement in the basic block.  */
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
        {
          g = gsi_stmt (gsi);
          print_gimple_stmt (stdout, g, 0, 0);

          if (gimple_code (g) == GIMPLE_GOTO)
            printf ("\nFound GOTO stmt\n");
        }
    }
  return 0;
}


Re: Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...

2014-02-17 Thread Basile Starynkevitch
On Tue, 2014-02-18 at 11:17 +0530, Mohsin Khan wrote:
> Hi,
> 
>  I am developing plugins for the GCC-4.8.2. I am a newbie in plugins.
> I wrote a plugin and tried to count and see the Goto Statements using
> the gimple_stmt_iterator. I get gimple statements printed on my
> stdout, but I am not able to find the line which has goto statements.

I guess that most GOTOs are just becoming implicit as the link to the
next basic block.

Probably 

   if (!cond) goto end;
   something;
  end:;

has nearly the same Gimple representation as

   if (cond) {
 something;
   }

BTW, did you consider using MELT http://gcc-melt.org/ to code your GCC
extension?

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***