Re: Vectorizer Pragmas
On 16 February 2014 23:44, Tim Prince wrote:
> I don't think many people want to use both OpenMP 4 and older Intel
> directives together.

I'm having less and less incentive to use anything other than omp4, cilk and whatever. I think we should be able to map all our internal needs to those pragmas. On the other hand, if you guys have any cross discussion with Intel folks about it, I'd love to hear. Since our support for those directives is a bit behind, it would be good not to duplicate the efforts in the long run.

Thanks!
--renato
Re: TYPE_BINFO and canonical types at LTO
On Mon, 17 Feb 2014, Jan Hubicka wrote: > > On Fri, 14 Feb 2014, Jan Hubicka wrote: > > > > > > > This smells bad, since it is given a canonical type that is after the > > > > > structural equivalency merging that ignores BINFOs, so it may be > > > > > completely > > > > > different class with completely different bases than the original. > > > > > Bases are > > > > > structuraly merged, too and may be exchanged for normal fields because > > > > > DECL_ARTIFICIAL (that separate bases and fields) does not seem to be > > > > > part of > > > > > the canonical type definition in LTO. > > > > > > > > Can you elaborate on that DECL_ARTIFICIAL thing? That is, what is > > > > broken > > > > by considering all fields during that merging? > > > > > > To make the code work with LTO, one can not merge > > > struct B {struct A a} > > > struct B: A {} > > > > > > these IMO differ only by DECL_ARTIFICIAL flag on the fields. > > > > "The code" == that BINFO walk? Is that because we walk a completely > > Yes. > > > unrelated BINFO chain? I'd say we should have merged its types > > so that difference shouldn't matter. > > > > Hopefully ;) > > I am trying to make point that will matter. Here is completed testcase above: > > struct A {int a;}; > struct C:A {}; > struct B {struct A a;}; > struct C *p2; > struct B *p1; > int > t() > { > p1->a.a = 2; > return p2->a; > } > > With patch I get: > > Index: lto/lto.c > === > --- lto/lto.c (revision 20) > +++ lto/lto.c (working copy) > @@ -49,6 +49,8 @@ along with GCC; see the file COPYING3. > #include "data-streamer.h" > #include "context.h" > #include "pass_manager.h" > +#include "print-tree.h" > > > /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver. > */ > @@ -619,6 +621,15 @@ gimple_canonical_type_eq (const void *p1 > { >const_tree t1 = (const_tree) p1; >const_tree t2 = (const_tree) p2; > + if (gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1), > + CONST_CAST_TREE (t2)) > + && TREE_CODE (CONST_CAST_TREE (t1)) == RECORD_TYPE) > + { > + debug_tree (CONST_CAST_TREE (t1)); > + fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO > (t1))->length()); > + debug_tree (CONST_CAST_TREE (t2)); > + fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO > (t2))->length()); > + } >return gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1), > CONST_CAST_TREE (t2)); > } > > size bitsizetype> constant 32> > unit size sizetype> constant 4> > align 32 symtab 0 alias set -1 canonical type 0x76c52888 > fields type 0x76ae83a0 32> unit size > align 32 symtab 0 alias set -1 canonical type 0x76c52738 > fields context 0x76af2e60 D.2821> > chain > > nonlocal SI file t.C line 3 col 20 size 32> unit size > align 32 offset_align 128 > offset > bit offset context > > chain > nonlocal VOID file t.C line 3 col 10 > align 1 context result > >> context 0x76af2e60 D.2821> > pointer_to_this chain 0x76c550b8 B>> > bases:0 > size bitsizetype> constant 32> > unit size sizetype> constant 4> > align 32 symtab 0 alias set -1 structural equality > fields type 0x76ae83a0 32> unit size > align 32 symtab 0 alias set -1 canonical type 0x76c52738 > fields context 0x76af2e60 D.2821> > chain > > ignored SI file t.C line 2 col 8 size > unit size > align 32 offset_align 128 > offset > bit offset context > > chain > nonlocal VOID file t.C line 2 col 12 > align 1 context result > >> context 0x76af2e60 D.2821> > chain > > bases:1 > > So we prevail structure B with structure C. One has bases to walk other > doesn't. 
> If that BINFO walk in alias.c (on canonical types) did
> something useful, we have a wrong code bug.

Yeah, ok. But we treat those types (B and C) as TBAA equivalent because structurally they are the same ;) Luckily C has a "proper" field for its base (proper means that offset and size are correct as well as the type). It indeed has DECL_ARTIFICIAL set and yes, we treat those as "real" fields when doing the structural comparison.

More interesting is of course when we can re-use tail-padding in one but not the other (works as expected - not merged).

struct A { A (); short x; bool a;};
struct C:A { bool b; };
struct B {struct A a; bool b;};
struct C *p2;
struct B *p1;
int
t()
{
  p1->a.a = 2;
  return p2->a;
}

> Yes, zero sized classes are those having no fields (but other stuff,
> type decls, bases etc.)

Yeah, but TBAA obviously doesn't care about type decls and bases.

Richard.
Re: Need help: Is a VAR_DECL type builtin or not?
On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote:
> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt wrote:
> > Given a specific VAR_DECL tree node, I need to find out whether
> > its type is built in or not. Up to now I have
> >
> >   tree tn = TYPE_NAME (TREE_TYPE (var_decl));
> >   if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
> >     {
> >       ...
> >     }
> >
> > This if-condition is true for both,
> >
> >   int x;
> >   const int x;
> >   ...
> >
> > and
> >
> >   typedef int i_t;
> >   i_t x;
> >   const i_t x;
> >   ...
> >
> > I need to weed out the class of VAR_DECLs that directly use built
> > in types.
>
> Try DECL_IS_BUILTIN. But I question how you define "builtin" here?

Well, actually I'm working on the variable output function in godump.c. At the moment, if the code comes across

  typedef char c_t;
  char c1;
  c_t c2;

it emits

  type _c_t byte
  var c1 byte
  var c2 byte

This is fine for c1, but for c2 it should really use the type:

  var c2 _c_t

So the rule I'm trying to implement is:

Given a tree node that is a VAR_DECL, if its type is an "alias" (defined with typedef/union/struct/class etc.), use the name of the alias; otherwise resolve the type recursively until only types built into the language are left.

It's really only about the underlying data types (int, float, _Complex etc.), not about storage classes, pointers, attributes, qualifiers etc.

Well, since godump.c already caches all declarations it has come across, I could assume that these declarations are not built-in and use that in the "rule" above.

Ciao

Dominik ^_^  ^_^

--

Dominik Vogt
IBM Germany
Re: Need help: Is a VAR_DECL type builtin or not?
On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt wrote: > On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote: >> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt >> wrote: >> > Given a specific VAR_DECL tree node, I need to find out whether >> > its type is built in or not. Up to now I have >> > >> > tree tn = TYPE_NAME (TREE_TYPE (var_decl)); >> > if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn)) >> > { >> > ... >> > } >> > >> > This if-condition is true for both, >> > >> > int x; >> > const int x; >> > ... >> > >> > and >> > >> > typedef int i_t; >> > i_t x; >> > const i_t x; >> > ... >> > >> > I need to weed out the class of VAR_DECLs that directly use built >> > in types. >> >> Try DECL_IS_BUILTIN. But I question how you define "builtin" here? > > Well, actually I'm working on the variable output function in > godump.c. At the moment, if the code comes across > > typedef char c_t > chat c1; > c_t c2; > > it emits > > type _c_t byte > var c1 byte > var c2 byte > > This is fine for c1, but for c2 it should really use the type: > > var c2 _c_t > > So the rule I'm trying to implement is: > > Given a Tree node that is a VAR_DECL, if its type is an "alias" > (defined with typedef/union/struct/class etc.), use the name of > the alias, otherwise resolve the type recursively until only > types built into the language are left. > > It's really only about the underlying data types (int, float, > _Complex etc.), not about storage classes, pointers, attributes, > qualifiers etc. > > Well, since godump.c already caches all declarations it has come > across, I could assume that these declarations are not built-in > and use that in the "rule" above. Not sure what GO presents us as location info, but DECL_IS_BUILTIN looks if the line the type was declared is sth "impossible" (reserved and supposed to be used for all types that do not have to be declared). Richard. > Ciao > > Dominik ^_^ ^_^ > > -- > > Dominik Vogt > IBM Germany >
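As a purely illustrative sketch of the rule Dominik describes, using the tree accessors mentioned in this thread -- the helper name and the exact combination of checks below are assumptions, not godump.c's actual logic:

/* Illustrative sketch only -- not the actual godump.c code.  Decide whether
   a VAR_DECL's type should be emitted under its own name (a user-written
   alias) or resolved down to a language builtin.  */
static bool
use_type_name_p (tree var_decl)
{
  tree type = TREE_TYPE (var_decl);
  tree tn = TYPE_NAME (type);

  /* No named TYPE_DECL at all: resolve the type structurally.  */
  if (tn == NULL_TREE || TREE_CODE (tn) != TYPE_DECL || !DECL_NAME (tn))
    return false;

  /* Declared at the reserved builtin location, i.e. never written by the
     user: keep resolving.  */
  if (DECL_IS_BUILTIN (tn))
    return false;

  /* A typedef's TYPE_DECL records the aliased type in DECL_ORIGINAL_TYPE;
     struct/union/enum tags are likewise user-introduced names.  */
  return (DECL_ORIGINAL_TYPE (tn) != NULL_TREE
          || RECORD_OR_UNION_TYPE_P (type)
          || TREE_CODE (type) == ENUMERAL_TYPE);
}

DECL_ORIGINAL_TYPE is only set on TYPE_DECLs the front end creates for typedefs, which is roughly what separates the c_t case from the plain char case in the example above.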
Re: Vectorizer Pragmas
On 2/17/2014 4:42 AM, Renato Golin wrote: On 16 February 2014 23:44, Tim Prince wrote: I don't think many people want to use both OpenMP 4 and older Intel directives together. I'm having less and less incentives to use anything other than omp4, cilk and whatever. I think we should be able to map all our internal needs to those pragmas. On the other hand, if you guys have any cross discussion with Intel folks about it, I'd love to hear. Since our support for those directives are a bit behind, would be good not to duplicate the efforts in the long run. I'm continuing discussions with former Intel colleagues. If you are asking for insight into how Intel priorities vary over time, I don't expect much, unless the next beta compiler provides some inferences. They have talked about implementing all of OpenMP 4.0 except user defined reduction this year. That would imply more activity in that area than on cilkplus, although some fixes have come in the latter. On the other hand I had an issue on omp simd reduction(max: ) closed with the decision "will not be fixed." I have an icc problem report in on fixing omp simd safelen so it is more like the standard and less like the obsolete pragma simd vectorlength. Also, I have some problem reports active attempting to get clarification of their omp target implementation. You may have noticed that omp parallel for simd in current Intel compilers can be used for combined thread and simd parallelism, including the case where the outer loop is parallelizable and vectorizable but the inner one is not. -- Tim Prince
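A minimal sketch of the combined construct Tim refers to, assuming an OpenMP 4 compiler that threads and vectorizes the outer loop while leaving the inner loop scalar; the function and array names are invented for the example:

/* Outer loop is both parallelized across threads and vectorized; the inner
   reduction loop stays scalar inside each outer iteration.  */
void row_sums (float *restrict out, const float *restrict in, int n, int m)
{
#pragma omp parallel for simd
  for (int i = 0; i < n; i++)
    {
      float s = 0.0f;
      for (int j = 0; j < m; j++)   /* inner loop: not vectorized */
        s += in[i * m + j];
      out[i] = s;
    }
}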
Re: Vectorizer Pragmas
On 17 February 2014 14:47, Tim Prince wrote:
> I'm continuing discussions with former Intel colleagues. If you are asking
> for insight into how Intel priorities vary over time, I don't expect much,
> unless the next beta compiler provides some inferences. They have talked
> about implementing all of OpenMP 4.0 except user defined reduction this
> year. That would imply more activity in that area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way.

> although some fixes have come in the latter. On the other hand I had an
> issue on omp simd reduction(max: ) closed with the decision "will not be
> fixed."

We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them.

> I have an icc problem report in on fixing omp simd safelen so it is more
> like the standard and less like the obsolete pragma simd vectorlength.

Our width metadata is slightly different in that it means "try to use that length", rather than "it's safe to use that length"; this is why I'm holding off on using safelen for the moment.

> Also, I have some problem reports active attempting to get clarification of
> their omp target implementation.

Same here... RTFM is not enough in this case. ;)

> You may have noticed that omp parallel for simd in current Intel compilers
> can be used for combined thread and simd parallelism, including the case
> where the outer loop is parallelizable and vectorizable but the inner one is
> not.

That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code. For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones.

If GCC is trying to do the same thing regarding non-threaded-vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension.

cheers,
--renato
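For comparison, a hedged sketch of the two spellings under discussion; which pragma forms a given compiler accepts varies, so treat these as illustrative rather than canonical:

/* Legacy, vectorization-only hint (Intel-style spelling).  */
void scale_legacy (float *a, const float *b, int n)
{
#pragma simd vectorlength(8)
  for (int i = 0; i < n; i++)
    a[i] = 2.0f * b[i];
}

/* OpenMP 4 spelling: safelen asserts that a vector length of 8 is safe;
   note there is no "parallel" here, so no threads are introduced.  */
void scale_omp (float *a, const float *b, int n)
{
#pragma omp simd safelen(8)
  for (int i = 0; i < n; i++)
    a[i] = 2.0f * b[i];
}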
Re: [RFC] Offloading Support in libgomp
On 14 Feb 16:43, Jakub Jelinek wrote:
> So, perhaps we should just stop for now oring the copyfrom in and just use
> the copyfrom from the very first mapping only, and wait for what the committee
> actually agrees on.
>
> Jakub

Like this?

@@ -171,11 +171,16 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
 		    "[%p..%p) is already mapped",
 		    (void *) newn->host_start, (void *) newn->host_end,
 		    (void *) oldn->host_start, (void *) oldn->host_end);
+#if 0
+  /* FIXME: Remove this when OpenMP 4.0 will be standardized.  Currently it's
+     unclear regarding overwriting copy_from for the existing mapping.
+     See http://gcc.gnu.org/ml/gcc/2014-02/msg00208.html for details.  */
   if (((kind & 7) == 2 || (kind & 7) == 3)
       && !oldn->copy_from
       && oldn->host_start == newn->host_start
       && oldn->host_end == newn->host_end)
     oldn->copy_from = true;
+#endif
   oldn->refcount++;
 }

-- Ilya
Re: [RFC] Offloading Support in libgomp
On Mon, Feb 17, 2014 at 07:59:16PM +0400, Ilya Verbin wrote: > On 14 Feb 16:43, Jakub Jelinek wrote: > > So, perhaps we should just stop for now oring the copyfrom in and just use > > the copyfrom from the very first mapping only, and wait for what the > > committee > > actually agrees on. > > > > Jakub > > Like this? > > @@ -171,11 +171,16 @@ gomp_map_vars_existing (splay_tree_key oldn, > splay_tree_key newn, > "[%p..%p) is already mapped", > (void *) newn->host_start, (void *) newn->host_end, > (void *) oldn->host_start, (void *) oldn->host_end); > +#if 0 > + /* FIXME: Remove this when OpenMP 4.0 will be standardized. Currently it's > + unclear regarding overwriting copy_from for the existing mapping. > + See http://gcc.gnu.org/ml/gcc/2014-02/msg00208.html for details. */ >if (((kind & 7) == 2 || (kind & 7) == 3) >&& !oldn->copy_from >&& oldn->host_start == newn->host_start >&& oldn->host_end == newn->host_end) > oldn->copy_from = true; > +#endif >oldn->refcount++; > } Well, OpenMP 4.0 is a released standard, just in some cases ambiguous or buggy. I'd just remove the code rather than putting it into #if 0, patch preapproved. It will stay in the SVN history... Jakub
Re: Need help: Is a VAR_DECL type builtin or not?
On Mon, Feb 17, 2014 at 5:28 AM, Richard Biener wrote: > On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt wrote: >> On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote: >>> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt >>> wrote: >>> > Given a specific VAR_DECL tree node, I need to find out whether >>> > its type is built in or not. Up to now I have >>> > >>> > tree tn = TYPE_NAME (TREE_TYPE (var_decl)); >>> > if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn)) >>> > { >>> > ... >>> > } >>> > >>> > This if-condition is true for both, >>> > >>> > int x; >>> > const int x; >>> > ... >>> > >>> > and >>> > >>> > typedef int i_t; >>> > i_t x; >>> > const i_t x; >>> > ... >>> > >>> > I need to weed out the class of VAR_DECLs that directly use built >>> > in types. >>> >>> Try DECL_IS_BUILTIN. But I question how you define "builtin" here? >> >> Well, actually I'm working on the variable output function in >> godump.c. At the moment, if the code comes across >> >> typedef char c_t >> chat c1; >> c_t c2; >> >> it emits >> >> type _c_t byte >> var c1 byte >> var c2 byte >> >> This is fine for c1, but for c2 it should really use the type: >> >> var c2 _c_t >> >> So the rule I'm trying to implement is: >> >> Given a Tree node that is a VAR_DECL, if its type is an "alias" >> (defined with typedef/union/struct/class etc.), use the name of >> the alias, otherwise resolve the type recursively until only >> types built into the language are left. >> >> It's really only about the underlying data types (int, float, >> _Complex etc.), not about storage classes, pointers, attributes, >> qualifiers etc. >> >> Well, since godump.c already caches all declarations it has come >> across, I could assume that these declarations are not built-in >> and use that in the "rule" above. > > Not sure what GO presents us as location info, but DECL_IS_BUILTIN > looks if the line the type was declared is sth "impossible" (reserved > and supposed to be used for all types that do not have to be declared). godump.c is actually not used by the Go frontend. The purpose of godump.c is to read C header files and dump them in a Go representation. It's used when building the Go library, to get Go versions of system structures like struct stat. I'm not quite sure what Dominik is after. For system structures using the basic type, the underlying type of a typedef, is normally what you want. But to answer the question as stated, I think I would look at functions like is_naming_typedef_decl in dwarf2out.c, since this sounds like the kind of question that debug info needs to sort out. Ian
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: > On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: > > You need volatile semantics to force the compiler to ignore any proofs > > it might otherwise attempt to construct. Hence all the ACCESS_ONCE() > > calls in my email to Torvald. (Hopefully I translated your example > > reasonably.) > > My brain gave out for today; but it did appear to have the right > structure. I can relate. ;-) > I would prefer it C11 would not require the volatile casts. It should > simply _never_ speculate with atomic writes, volatile or not. I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions. Thanx, Paul
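To illustrate the load-combining point, the macro below mirrors the Linux kernel's ACCESS_ONCE() idiom referred to in this thread (GNU C typeof); the two wrapper functions are invented for the example:

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

extern int flag;

void wait_plain (void)
{
	while (flag == 0)		/* compiler may hoist the load and spin forever */
		;
}

void wait_forced (void)
{
	while (ACCESS_ONCE(flag) == 0)	/* must reload flag on every iteration */
		;
}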
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 15 Feb 2014, Torvald Riegel wrote:

> glibc is a counterexample that comes to mind, although it's a smaller
> code base. (It's currently not using C11 atomics, but transitioning
> there makes sense, and something I want to get to eventually.)

glibc is using C11 atomics (GCC builtins rather than _Atomic / <stdatomic.h>, but using __atomic_* with explicitly specified memory model rather than the older __sync_*) on AArch64, plus in certain cases on ARM and MIPS.

--
Joseph S. Myers
jos...@codesourcery.com
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 06:59:31PM +, Joseph S. Myers wrote: > On Sat, 15 Feb 2014, Torvald Riegel wrote: > > > glibc is a counterexample that comes to mind, although it's a smaller > > code base. (It's currently not using C11 atomics, but transitioning > > there makes sense, and some thing I want to get to eventually.) > > glibc is using C11 atomics (GCC builtins rather than _Atomic / > , but using __atomic_* with explicitly specified memory model > rather than the older __sync_*) on AArch64, plus in certain cases on ARM > and MIPS. Hmm, actually that results in a change in behaviour for the __sync_* primitives on AArch64. The documentation for those states that: `In most cases, these built-in functions are considered a full barrier. That is, no memory operand is moved across the operation, either forward or backward. Further, instructions are issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.' which is stronger than simply mapping them to memory_model_seq_cst, which seems to be what the AArch64 compiler is doing (so you get acquire + release instead of a full fence). Will
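For concreteness, a small example of the two builtin families being contrasted; both builtins exist in GCC, while the wrapper function names are invented:

/* Older family: documented as a full barrier.  */
int fetch_add_sync (int *p)
{
  return __sync_fetch_and_add (p, 1);
}

/* Newer family: the memory model is explicit.  Mapping the __sync_* call to
   __ATOMIC_SEQ_CST, as described above, yields acquire + release semantics
   on AArch64 rather than a full fence.  */
int fetch_add_atomic (int *p)
{
  return __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);
}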
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 18:59 +, Joseph S. Myers wrote: > On Sat, 15 Feb 2014, Torvald Riegel wrote: > > > glibc is a counterexample that comes to mind, although it's a smaller > > code base. (It's currently not using C11 atomics, but transitioning > > there makes sense, and some thing I want to get to eventually.) > > glibc is using C11 atomics (GCC builtins rather than _Atomic / > , but using __atomic_* with explicitly specified memory model > rather than the older __sync_*) on AArch64, plus in certain cases on ARM > and MIPS. I think the major steps remaining is moving the other architectures over, and rechecking concurrent code (e.g., for the code that I have seen, it was either asm variants (eg, on x86), or built before C11; ARM pthread_once was lacking memory_barriers (see "pthread_once unification" patches I posted)). We also need/should to move towards using relaxed-MO atomic loads instead of plain loads.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote: > On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel wrote: > > > > I think a major benefit of C11's memory model is that it gives a > > *precise* specification for how a compiler is allowed to optimize. > > Clearly it does *not*. This whole discussion is proof of that. It's > not at all clear, It might not be an easy-to-understand specification, but as far as I'm aware it is precise. The Cambridge group's formalization certainly is precise. From that, one can derive (together with the usual rules for as-if etc.) what a compiler is allowed to do (assuming that the standard is indeed precise). My replies in this discussion have been based on reasoning about the standard, and not secret knowledge (with the exception of no-out-of-thin-air, which is required in the standard's prose but not yet formalized). I agree that I'm using the formalization as a kind of placeholder for the standard's prose (which isn't all that easy to follow for me either), but I guess there's no way around an ISO standard using prose. If you see a case in which the standard isn't precise, please bring it up or open a C++ CWG issue for it. > and the standard apparently is at least debatably > allowing things that shouldn't be allowed. Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any? > It's also a whole lot more > complicated than "volatile", so the likelihood of a compiler writer > actually getting it right - even if the standard does - is lower. It's not easy, that's for sure, but none of the high-performance alternatives are easy either. There are testing tools out there based on the formalization of the model, and we've found bugs with them. And the alternative of using something not specified by the standard is even worse, I think, because then you have to guess what a compiler might do, without having any constraints; IOW, one is resorting to "no sane compiler would do that", and that doesn't seem to very robust either.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel wrote: > > Which example do you have in mind here? Haven't we resolved all the > debated examples, or did I miss any? Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings. And personally, I can't read standards paperwork. It is invariably written in some basically impossible-to-understand lawyeristic mode, and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering ("that depends on what the meaning of 'is' is"). The whole "lvalue vs rvalue expression vs 'what is a volatile access'" thing for C++ was/is a great example of that. So quite frankly, as a result I refuse to have anything to do with the process directly. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 08:55:47PM +0100, Torvald Riegel wrote: > On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote: > > On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel wrote: > > > > > > I think a major benefit of C11's memory model is that it gives a > > > *precise* specification for how a compiler is allowed to optimize. > > > > Clearly it does *not*. This whole discussion is proof of that. It's > > not at all clear, > > It might not be an easy-to-understand specification, but as far as I'm > aware it is precise. The Cambridge group's formalization certainly is > precise. From that, one can derive (together with the usual rules for > as-if etc.) what a compiler is allowed to do (assuming that the standard > is indeed precise). My replies in this discussion have been based on > reasoning about the standard, and not secret knowledge (with the > exception of no-out-of-thin-air, which is required in the standard's > prose but not yet formalized). > > I agree that I'm using the formalization as a kind of placeholder for > the standard's prose (which isn't all that easy to follow for me > either), but I guess there's no way around an ISO standard using prose. > > If you see a case in which the standard isn't precise, please bring it > up or open a C++ CWG issue for it. I suggest that I go through the Linux kernel's requirements for atomics and memory barriers and see how they map to C11 atomics. With that done, we would have very specific examples to go over. Without that done, the discussion won't converge very well. Seem reasonable? Thanx, Paul > > and the standard apparently is at least debatably > > allowing things that shouldn't be allowed. > > Which example do you have in mind here? Haven't we resolved all the > debated examples, or did I miss any? > > > It's also a whole lot more > > complicated than "volatile", so the likelihood of a compiler writer > > actually getting it right - even if the standard does - is lower. > > It's not easy, that's for sure, but none of the high-performance > alternatives are easy either. There are testing tools out there based > on the formalization of the model, and we've found bugs with them. > > And the alternative of using something not specified by the standard is > even worse, I think, because then you have to guess what a compiler > might do, without having any constraints; IOW, one is resorting to "no > sane compiler would do that", and that doesn't seem to very robust > either. > >
Re: [RFC][PATCH 0/5] arch: atomic rework
On February 17, 2014 7:18:15 PM GMT+01:00, "Paul E. McKenney" wrote: >On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: >> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: >> > You need volatile semantics to force the compiler to ignore any >proofs >> > it might otherwise attempt to construct. Hence all the >ACCESS_ONCE() >> > calls in my email to Torvald. (Hopefully I translated your example >> > reasonably.) >> >> My brain gave out for today; but it did appear to have the right >> structure. > >I can relate. ;-) > >> I would prefer it C11 would not require the volatile casts. It should >> simply _never_ speculate with atomic writes, volatile or not. > >I agree with not needing volatiles to prevent speculated writes. >However, >they will sometimes be needed to prevent excessive load/store >combining. >The compiler doesn't have the runtime feedback mechanisms that the >hardware has, and thus will need help from the developer from time >to time. > >Or maybe the Linux kernel simply waits to transition to C11 relaxed >atomics >until the compiler has learned to be sufficiently conservative in its >load-store combining decisions. Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time. Richard. > Thanx, Paul
FreeBSD users of gcc
Greetings, I am the named maintainer of the freebsd port. I have been for approximately twelve years, although I haven't been very active the last four years. The last major work I put into the freebsd port was at the end of 2009. I have reviewed others' patches since then; but it really hasn't required anything major since David O'Brien and I did foundational work in the early 200Xs (which itself was based on many others' foundations). Gerald Pfeifer has also done much to keep the port in good shape. (I also don't want to ignore the many patches that came from members of the FreeBSD core team and other FreeBSD users.) To complicate matters, I haven't been using FreeBSD on my primary desktop or otherwise since early 2011. FreeBSD is listed as a tier one platform. Therefore, I am looking for someone to whom both the GCC steering committee and I would be willing to hand over the reins before I drop my officially-listed maintainership. The expected person will likely already have Write After Approval status. Please contact me directly, if you are qualified and interested in becoming the freebsd OS port maintainer.

Regards,
Loren
Re: TYPE_BINFO and canonical types at LTO
> Yeah, ok. But we treat those types (B and C) as TBAA equivalent because
> structurally they are the same ;) Luckily C has a "proper" field
> for its base (proper means that offset and size are correct as well
> as the type). It indeed has DECL_ARTIFICIAL set and yes, we treat
> those as "real" fields when doing the structural comparison.

Yep, the difference is that depending on whether C or D wins, we will end up walking the BINFO or not. So we should not depend on the BINFO walk for correctness.

> More interesting is of course when we can re-use tail-padding in
> one but not the other (works as expected - not merged).

Yep.

> struct A { A (); short x; bool a;};
> struct C:A { bool b; };
> struct B {struct A a; bool b;};
> struct C *p2;
> struct B *p1;
> int
> t()
> {
>   p1->a.a = 2;
>   return p2->a;
> }
>
> > Yes, zero sized classes are those having no fields (but other stuff,
> > type decls, bases etc.)
>
> Yeah, but TBAA obviously doesn't care about type decls and bases.

So I guess the conclusion is that the BINFO walk in alias.c is pointless?

Concerning the merging details and LTO aliasing, I think for 4.10 we should make C++ compute mangled names of types (i.e. call DECL_ASSEMBLER_NAME on the associated type_decl + explicitly mark that the type is driven by ODR) and then we can do merging driven by the ODR rule. Non-ODR types born from other frontends will then need to be made to alias all the ODR variants; that can be done by storing them into the current canonical type hash. (I wonder if we want to support cross language aliasing for non-POD?)

I also think we want explicit representation of types known to be local to the compilation unit - anonymous namespaces in C/C++, types defined within function bodies in C and god knows what in Ada/Fortran/Java.

Honza
>
> Richard.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 12:23 -0800, Paul E. McKenney wrote: > On Mon, Feb 17, 2014 at 08:55:47PM +0100, Torvald Riegel wrote: > > On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote: > > > On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel > > > wrote: > > > > > > > > I think a major benefit of C11's memory model is that it gives a > > > > *precise* specification for how a compiler is allowed to optimize. > > > > > > Clearly it does *not*. This whole discussion is proof of that. It's > > > not at all clear, > > > > It might not be an easy-to-understand specification, but as far as I'm > > aware it is precise. The Cambridge group's formalization certainly is > > precise. From that, one can derive (together with the usual rules for > > as-if etc.) what a compiler is allowed to do (assuming that the standard > > is indeed precise). My replies in this discussion have been based on > > reasoning about the standard, and not secret knowledge (with the > > exception of no-out-of-thin-air, which is required in the standard's > > prose but not yet formalized). > > > > I agree that I'm using the formalization as a kind of placeholder for > > the standard's prose (which isn't all that easy to follow for me > > either), but I guess there's no way around an ISO standard using prose. > > > > If you see a case in which the standard isn't precise, please bring it > > up or open a C++ CWG issue for it. > > I suggest that I go through the Linux kernel's requirements for atomics > and memory barriers and see how they map to C11 atomics. With that done, > we would have very specific examples to go over. Without that done, the > discussion won't converge very well. > > Seem reasonable? Sounds good!
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel wrote: > > > > Which example do you have in mind here? Haven't we resolved all the > > debated examples, or did I miss any? > > Well, Paul seems to still think that the standard possibly allows > speculative writes or possibly value speculation in ways that break > the hardware-guaranteed orderings. That's true, I just didn't see any specific examples so far. > And personally, I can't read standards paperwork. It is invariably > written in some basically impossible-to-understand lawyeristic mode, Yeah, it's not the most intuitive form for things like the memory model. > and then it is read by people (compiler writers) that intentionally > try to mis-use the words and do language-lawyering ("that depends on > what the meaning of 'is' is"). That assumption about people working on compilers is a little too broad, don't you think? I think that it is important to stick to a specification, in the same way that one wouldn't expect a program with undefined behavior make any sense of it, magically, in cases where stuff is undefined. However, that of course doesn't include trying to exploit weasel-wording (BTW, both users and compiler writers try to do it). IMHO, weasel-wording in a standard is a problem in itself even if not exploited, and often it indicates that there is a real issue. There might be reasons to have weasel-wording (e.g., because there's no known better way to express it like in case of the not really precise no-out-of-thin-air rule today), but nonetheless those aren't ideal. > The whole "lvalue vs rvalue expression > vs 'what is a volatile access'" thing for C++ was/is a great example > of that. I'm not aware of the details of this. > So quite frankly, as a result I refuse to have anything to do with the > process directly. That's unfortunate. Then please work with somebody that isn't uncomfortable with participating directly in the process. But be warned, it may very well be a person working on compilers :) Have you looked at the formalization of the model by Batty et al.? The overview of this is prose, but the formalized model itself is all formal relations and logic. So there should be no language-lawyering issues with that form. (For me, the formalized model is much easier to reason about.)
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel wrote: > On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: >> and then it is read by people (compiler writers) that intentionally >> try to mis-use the words and do language-lawyering ("that depends on >> what the meaning of 'is' is"). > > That assumption about people working on compilers is a little too broad, > don't you think? Let's just say that *some* are that way, and those are the ones that I end up butting heads with. The sane ones I never have to argue with - point them at a bug, and they just say "yup, bug". The insane ones say "we don't need to fix that, because if you read this copy of the standards that have been translated to chinese and back, it clearly says that this is acceptable". >> The whole "lvalue vs rvalue expression >> vs 'what is a volatile access'" thing for C++ was/is a great example >> of that. > > I'm not aware of the details of this. The argument was that an lvalue doesn't actually "access" the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because "*p" is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. The people involved with the C++ standards have generally been totally clueless about their subtle changes. I may have misstated something, but basically some C++ people tried very hard to make "volatile" useless. We had other issues too. Like C compiler people who felt that the type-based aliasing should always override anything else, even if the variable accessed (through different types) was statically clearly aliasing and used the exact same pointer. That made it impossible to do a syntactically clean model of "this aliases", since the _only_ exception to the type-based aliasing rule was to generate a union for every possible access pairing. We turned off type-based aliasing (as I've mentioned before, I think it's a fundamentally broken feature to begin with, and a horrible horrible hack that adds no value for anybody but the HPC people). Gcc eventually ended up having some sane syntax for overriding it, but by then I was too disgusted with the people involved to even care. Linus
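For readers who have not run into the type-based aliasing issue described above, a hedged sketch; the function and union names are invented:

/* Under type-based aliasing the compiler may assume the float store cannot
   modify *ip and fold the return value to 1, even if the pointers overlap.  */
int readback (int *ip, float *fp)
{
  *ip = 1;
  *fp = 2.0f;
  return *ip;
}

/* The union pairing mentioned above is the sanctioned way of saying the
   accesses alias; GCC also offers -fno-strict-aliasing and
   __attribute__((may_alias)).  */
union int_or_float { int i; float f; };

int readback_union (union int_or_float *u)
{
  u->i = 1;
  u->f = 2.0f;
  return u->i;
}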
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: > On Sat, Feb 15, 2014 at 9:30 AM, Torvald Riegel wrote: > > > > I think the example is easy to misunderstand, because the context isn't > > clear. Therefore, let me first try to clarify the background. > > > > (1) The abstract machine does not write speculatively. > > (2) Emitting a branch instruction and executing a branch at runtime is > > not part of the specified behavior of the abstract machine. Of course, > > the abstract machine performs conditional execution, but that just > > specifies the output / side effects that it must produce (e.g., volatile > > stores) -- not with which hardware instructions it is producing this. > > (3) A compiled program must produce the same output as if executed by > > the abstract machine. > > Ok, I'm fine with that. > > > Thus, we need to be careful what "speculative store" is meant to refer > > to. A few examples: > > > > if (atomic_load(&x, mo_relaxed) == 1) > > atomic_store(&y, 3, mo_relaxed)); > > No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. > The fact is, if a compiler generates anything but the obvious sequence > (read/cmp/branch/store - where branch/store might obviously be done > with some other machine conditional like a predicate), the compiler is > wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). > Anybody who argues anything else is wrong, or confused, or confusing. I appreciate your opinion, and maybe I'm just one of the three things above (my vote is on "confusing"). But without you saying why doesn't help me see what's the misunderstanding here. > Instead, argue about *other* sequences where the compiler can do something. I'd prefer if we could clarify the misunderstanding for the simple case first that doesn't involve stronger ordering requirements in the form of non-relaxed MOs. > For example, this sequence: > >atomic_store(&x, a, mo_relaxed); >b = atomic_load(&x, mo_relaxed); > > can validly be transformed to > >atomic_store(&x, a, mo_relaxed); >b = (typeof(x)) a; > > and I think everybody agrees about that. In fact, that optimization > can be done even for mo_strict. Yes. > But even that "obvious" optimization has subtle cases. What if the > store is relaxed, but the load is strict? You can't do the > optimization without a lot of though, because dropping the strict load > would drop an ordering point. So even the "store followed by exact > same load" case has subtle issues. Yes if a compiler wants to optimize that, it has to give it more thought. My gut feeling is that either the store should get the stronger ordering, or the accesses should be merged. But I'd have to think more about that one (which I can do on request). > With similar caveats, it is perfectly valid to merge two consecutive > loads, and to merge two consecutive stores. 
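For reference, a compilable C11 (stdatomic.h) rendering of the store-followed-by-load sequence discussed above; mo_relaxed in the pseudo-code corresponds to memory_order_relaxed, and the function name is invented:

#include <stdatomic.h>

atomic_int x;

int store_then_load (int a)
{
  atomic_store_explicit (&x, a, memory_order_relaxed);
  /* A compiler may forward 'a' here instead of emitting a second access;
     if this load were seq_cst, the ordering point would have to be kept,
     as noted above.  */
  return atomic_load_explicit (&x, memory_order_relaxed);
}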
> > Now that means that the sequence > > atomic_store(&x, 1, mo_relaxed); > if (atomic_load(&x, mo_relaxed) == 1) > atomic_store(&y, 3, mo_relaxed); > > can first be optimized to > > atomic_store(&x, 1, mo_relaxed); > if (1 == 1) > atomic_store(&y, 3, mo_relaxed); > > and then you get the end result that you wanted in the first place > (including the ability to re-order the two stores due to the relaxed > ordering, assuming they can be proven to not alias - and please don't > use the idiotic type-based aliasing rules). > > Bringing up your first example is pure and utter confusion. Sorry if it was confusing. But then maybe we need to talk about it more, because it shouldn't be confusing if we agree on what the memory model allows and what not. I had originally picked the example because it was related to the example Paul/Peter brought up. > Don't do > it. Instead, show what are obvious and valid transformations, and then > you can bring up these kinds of combinations as "look, this is > obviously also correct". I have my doubts whether the best way to reason about the memory model is by thinking about specific compiler transformations. YMMV, obviously. The -- kind of vague -- reason is that the allowed transformations will be more complicated to reason about than the allowed output of a concurrent program when understanding the memory model (ie, ordering and interleaving of memory accesses, etc.). However, I can see that when trying to optimize with a hardware memory model in mind, this might look appealing. What the compiler will do is exploiting knowledge about all possible executions
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote: > On February 17, 2014 7:18:15 PM GMT+01:00, "Paul E. McKenney" > wrote: > >On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: > >> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: > >> > You need volatile semantics to force the compiler to ignore any > >proofs > >> > it might otherwise attempt to construct. Hence all the > >ACCESS_ONCE() > >> > calls in my email to Torvald. (Hopefully I translated your example > >> > reasonably.) > >> > >> My brain gave out for today; but it did appear to have the right > >> structure. > > > >I can relate. ;-) > > > >> I would prefer it C11 would not require the volatile casts. It should > >> simply _never_ speculate with atomic writes, volatile or not. > > > >I agree with not needing volatiles to prevent speculated writes. > >However, > >they will sometimes be needed to prevent excessive load/store > >combining. > >The compiler doesn't have the runtime feedback mechanisms that the > >hardware has, and thus will need help from the developer from time > >to time. > > > >Or maybe the Linux kernel simply waits to transition to C11 relaxed > >atomics > >until the compiler has learned to be sufficiently conservative in its > >load-store combining decisions. > > Sounds backwards. Currently the compiler does nothing to the atomics. I'm > sure we'll eventually add something. But if testing coverage is zero outside > then surely things get worse, not better with time. Perhaps we solve this chicken-and-egg problem by creating a test suite? Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel wrote: > > On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: > >> and then it is read by people (compiler writers) that intentionally > >> try to mis-use the words and do language-lawyering ("that depends on > >> what the meaning of 'is' is"). > > > > That assumption about people working on compilers is a little too broad, > > don't you think? > > Let's just say that *some* are that way, and those are the ones that I > end up butting heads with. > > The sane ones I never have to argue with - point them at a bug, and > they just say "yup, bug". The insane ones say "we don't need to fix > that, because if you read this copy of the standards that have been > translated to chinese and back, it clearly says that this is > acceptable". > > >> The whole "lvalue vs rvalue expression > >> vs 'what is a volatile access'" thing for C++ was/is a great example > >> of that. > > > > I'm not aware of the details of this. > > The argument was that an lvalue doesn't actually "access" the memory > (an rvalue does), so this: > >volatile int *p = ...; > >*p; > > doesn't need to generate a load from memory, because "*p" is still an > lvalue (since you could assign things to it). > > This isn't an issue in C, because in C, expression statements are > always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. > The people involved with the C++ > standards have generally been totally clueless about their subtle > changes. This isn't a fair characterization. There are many people that do care, and certainly not all are clueless. But it's a limited set of people, bugs happen, and not all of them will have the same goals. I think one way to prevent such problems in the future could be to have someone in the kernel community volunteer to look through standard revisions before they are published. The standard needs to be fixed, because compilers need to conform to the standard (e.g., a compiler's extension "fixing" the above wouldn't be conforming anymore because it emits more volatile reads than specified). Or maybe those of us working on the standard need to flag potential changes of interest to the kernel folks. But that may be less reliable than someone from the kernel side looking at them; I don't know.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel wrote: > On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: >> > >> > if (atomic_load(&x, mo_relaxed) == 1) >> > atomic_store(&y, 3, mo_relaxed)); >> >> No, please don't use this idiotic example. It is wrong. > > It won't be useful in practice in a lot of cases, but that doesn't mean > it's wrong. It's clearly not illegal code. It also serves a purpose: a > simple example to reason about a few aspects of the memory model. It's not illegal code, but i you claim that you can make that store unconditional, it's a pointless and wrong example. >> The fact is, if a compiler generates anything but the obvious sequence >> (read/cmp/branch/store - where branch/store might obviously be done >> with some other machine conditional like a predicate), the compiler is >> wrong. > > Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., > the first load always returning 1) for the branch (or other machine > conditional) to not be emitted. So please either poke holes into this > reasoning, or clarify that you don't in fact, contrary to what you wrote > above, agree with (1) to (3). The thing is, the first load DOES NOT RETURN 1. It returns whatever that memory location contains. End of story. Stop claiming it "can return 1".. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY. So your argument is *shit*. Why do you continue to argue it? I told you how that load can go away, and you agreed. But IT CANNOT GO AWAY any other way. You cannot claim "the compiler knows". The compiler doesn't know. It's that simple. >> So why do I say you are wrong, after I just gave you an example of how >> it happens? Because my example went back to the *real* issue, and >> there are actual real semantically meaningful details with doing >> things like load merging. >> >> To give an example, let's rewrite things a bit more to use an extra variable: >> >> atomic_store(&x, 1, mo_relaxed); >> a = atomic_load(&1, mo_relaxed); >> if (a == 1) >> atomic_store(&y, 3, mo_relaxed); >> >> which looks exactly the same. > > I'm confused. Is this a new example? That is a new example. The important part is that it has left a "trace" for the programmer: because 'a' contains the value, the programmer can now look at the value later and say "oh, we know we did a store iff a was 1" >> This sequence: >> >> atomic_store(&x, 1, mo_relaxed); >> a = atomic_load(&x, mo_relaxed); >> atomic_store(&y, 3, mo_relaxed); >> >> is actually - and very seriously - buggy. >> >> Why? Because you have effectively split the atomic_load into two loads >> - one for the value of 'a', and one for your 'proof' that the store is >> unconditional. > > I can't follow that, because it isn't clear to me which code sequences > are meant to belong together, and which transformations the compiler is > supposed to make. If you would clarify that, then I can reply to this > part. Basically, if the compiler allows the condition of "I wrote 3 to the y, but the programmer sees 'a' has another value than 1 later" then the compiler is one buggy pile of shit. It fundamentally broke the whole concept of atomic accesses. Basically the "atomic" access to 'x' turned into two different accesses: the one that "proved" that x had the value 1 (and caused the value 3 to be written), and the other load that then write that other value into 'a'. It's really not that complicated. 
And this is why descriptions like this should ABSOLUTELY NOT BE WRITTEN as "if the compiler can prove that 'x' had the value 1, it can remove the branch". Because that IS NOT SUFFICIENT. That was not a valid transformation of the atomic load. The only valid transformation was the one I stated, namely to remove the load entirely and replace it with the value written earlier in the same execution context.

Really, why is it so hard to understand?

Linus
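A hedged C11 sketch of the contrast being drawn: the load may disappear only by forwarding the value stored earlier in the same execution, never by being split into a "proof" load and a separate value load:

#include <stdatomic.h>

atomic_int x, y;

int original (void)
{
  atomic_store_explicit (&x, 1, memory_order_relaxed);
  int a = atomic_load_explicit (&x, memory_order_relaxed);
  if (a == 1)
    atomic_store_explicit (&y, 3, memory_order_relaxed);
  return a;
}

/* Valid transformation: the load is removed and its value forwarded from
   the store, so 'a' and the store to y can never disagree.  */
int transformed (void)
{
  atomic_store_explicit (&x, 1, memory_order_relaxed);
  int a = 1;
  atomic_store_explicit (&y, 3, memory_order_relaxed);
  return a;
}

/* Invalid: emitting one load to "prove" x == 1 (and store 3 to y) plus a
   second load to produce 'a' splits a single atomic access in two.  */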
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:14 -0800, Paul E. McKenney wrote: > On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote: > > On February 17, 2014 7:18:15 PM GMT+01:00, "Paul E. McKenney" > > wrote: > > >On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: > > >> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: > > >> > You need volatile semantics to force the compiler to ignore any > > >proofs > > >> > it might otherwise attempt to construct. Hence all the > > >ACCESS_ONCE() > > >> > calls in my email to Torvald. (Hopefully I translated your example > > >> > reasonably.) > > >> > > >> My brain gave out for today; but it did appear to have the right > > >> structure. > > > > > >I can relate. ;-) > > > > > >> I would prefer it C11 would not require the volatile casts. It should > > >> simply _never_ speculate with atomic writes, volatile or not. > > > > > >I agree with not needing volatiles to prevent speculated writes. > > >However, > > >they will sometimes be needed to prevent excessive load/store > > >combining. > > >The compiler doesn't have the runtime feedback mechanisms that the > > >hardware has, and thus will need help from the developer from time > > >to time. > > > > > >Or maybe the Linux kernel simply waits to transition to C11 relaxed > > >atomics > > >until the compiler has learned to be sufficiently conservative in its > > >load-store combining decisions. > > > > Sounds backwards. Currently the compiler does nothing to the atomics. I'm > > sure we'll eventually add something. But if testing coverage is zero > > outside then surely things get worse, not better with time. > > Perhaps we solve this chicken-and-egg problem by creating a test suite? Perhaps. The test suite might also be a good set of examples showing which cases we expect to be optimized in a certain way, and which not. I suppose the uses of (the equivalent) of atomics in the kernel would be a good start.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel wrote: > On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: >> >> The argument was that an lvalue doesn't actually "access" the memory >> (an rvalue does), so this: >> >>volatile int *p = ...; >> >>*p; >> >> doesn't need to generate a load from memory, because "*p" is still an >> lvalue (since you could assign things to it). >> >> This isn't an issue in C, because in C, expression statements are >> always rvalues, but C++ changed that. > > Huhh. I can see the problems that this creates in terms of C/C++ > compatibility. That's not the biggest problem. The biggest problem is that you have compiler writers that don't care about sane *use* of the features they write a compiler for, they just care about the standard. So they don't care about C vs C++ compatibility. Even more importantly, they don't care about the *user* that uses only C++ and the fact that their reading of the standard results in *meaningless* behavior. They point to the standard and say "that's what the standard says, suck it", and silently generate code (or in this case, avoid generating code) that makes no sense. So it's not about C++ being incompatible with C, it's about C++ having insane and bad semantics unless you just admit that "oh, ok, I need to not just read the standard, I also need to use my brain, and admit that a C++ statement expression needs to act as if it is an "access" wrt volatile variables". In other words, as a compiler person, you do need to read more than the paper of standard. You need to also take into account what is reasonable behavior even when the standard could possibly be read some other way. And some compiler people don't. The "volatile access in statement expression" did get resolved, sanely, at least in gcc. I think gcc warns about some remaining cases. Btw, afaik, C++11 actually clarifies the standard to require the reads, because everybody *knew* that not requiring the read was insane and meaningless behavior, and clearly against the intent of "volatile". But that didn't stop compiler writers from saying "hey, the standard allows my insane and meaningless behavior, so I'll implement it and not consider it a bug". Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 17 Feb 2014, Torvald Riegel wrote: > On Mon, 2014-02-17 at 18:59 +, Joseph S. Myers wrote: > > On Sat, 15 Feb 2014, Torvald Riegel wrote: > > > > > glibc is a counterexample that comes to mind, although it's a smaller > > > code base. (It's currently not using C11 atomics, but transitioning > > > there makes sense, and some thing I want to get to eventually.) > > > > glibc is using C11 atomics (GCC builtins rather than _Atomic / > > , but using __atomic_* with explicitly specified memory model > > rather than the older __sync_*) on AArch64, plus in certain cases on ARM > > and MIPS. > > I think the major steps remaining is moving the other architectures > over, and rechecking concurrent code (e.g., for the code that I have I don't think we'll be ready to require GCC >= 4.7 to build glibc for another year or two, although probably we could move the requirement up from 4.4 to 4.6. (And some platforms only had the C11 atomics optimized later than 4.7.) -- Joseph S. Myers jos...@codesourcery.com
Re: MSP430 in gcc4.9 ... enable interrupts?
> I presume these will be part of the headers for the library > distributed for msp430 gcc by TI/Redhat? I can't speak for TI's or Red Hat's plans. GNU's typical non-custom embedded runtime is newlib/libgloss, which usually doesn't have that much in the way of chip-specific headers or library functions. > is that for the "critical" attribute that exists in the old msp430 > port (which disables interrupts for the duration of the function)? Yes, for things like that. They're documented under "Function Attributes" in the "Extensions to the C Language Family" chapter of the current GCC manual.
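A hedged example of the kind of attribute being referred to; the vector number and function names are invented, and the authoritative spellings are the ones in the "Function Attributes" section of the GCC manual mentioned above:

/* Interrupt handler; the attribute argument selects the vector.  */
void __attribute__((interrupt(9))) port1_isr (void)
{
  /* handle the interrupt */
}

/* Runs with interrupts disabled for its duration, like the old port's
   "critical" attribute.  */
void __attribute__((critical)) update_shared_state (void)
{
  /* touch data shared with interrupt handlers */
}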
Re: [RFC][PATCH 0/5] arch: atomic rework
On 17/02/14 20:18, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel wrote:
> > Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any?
>
> Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings.
>
> And personally, I can't read standards paperwork. It is invariably

Can't => Don't - evidently.

> written in some basically impossible-to-understand lawyeristic mode,

You mean "unambiguous" - try reading a patent (Apple have 1000s of trivial ones; I tried reading one once thinking "how could they have phrased it so this got approved" - their technique was to make the reader want to start cutting themselves to prove they weren't numb to everything).

> and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering ("that depends on what the meaning of 'is' is"). The whole "lvalue vs rvalue expression vs 'what is a volatile access'" thing for C++ was/is a great example of that.

I'm not going to teach you what rvalues and lvalues are, but! http://lmgtfy.com/?q=what+are+rvalues might help.

> So quite frankly, as a result I refuse to have anything to do with the process directly.

Is this goodbye?

> Linus

That aside, what is the problem? If the compiler has created code that has different program states than what would be created without optimisation, please file a bug report and/or send something to the mailing list USING A CIVIL TONE. There's no need for swear-words and profanities all the time - use them when you want to emphasise something. Additionally, if you are always angry you end up calling that state "normal", so reserve such words for when you are genuinely outraged. There are so many emails from you bitching about stuff that I've lost track of what you're actually bitching about. Like this standards stuff above (notice I said stuff, not "crap" or "shit").

What exactly is your problem? If the compiler is doing something the standard does not permit, or optimising something wrongly (read: "puts the program in a different state than if the optimisation was not applied"), that is REALLY serious and you are right to report it; but whining like a n00b on Stack Overflow when a question gets closed is not helping. I tried reading back through the emails (I dismissed them previously) but there's just so much ranting, and rants about the standard too (I would trash this if I deemed the effort required to delete it was less than the storage of the bytes the message takes up) - standardised behaviour is VERY important.

So start again: what is the serious problem, have you got any code that would let me replicate it, and what is your version of GCC?

Oh, and lastly! Optimisations are not as casual as "oh, we could do this and it'd work better"; unlike kernel work or any other software that is being improved, it is very formal (and rightfully so). I seriously recommend you read at least the first 40 pages of a book called "Compiler Design, Analysis and Transformation"; it's not about the parsing phases or anything, but it develops a good introduction and later a good foundation for exploring the field further. Compilers do not operate on what I call "A-level logic", and to show what I mean I use the shovel-to-the-face of real analysis: "of course 1/x tends towards 0, it's not gonna be 5!!" = A-level logic. "Let epsilon > 0 be given, then there exists an N..." = formal proof.

So when one says "the compiler can prove", it's not some silly thing powered by A-level logic; it is the implementation of something that can be proven to be correct (in the sense of the program states mentioned before).

So yeah, calm down and explain - no lashing out at standards bodies. What is the problem?

Alec
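To make that last point concrete, here is a minimal, non-atomic sketch (not from the thread; the names are made up) of the sort of fact a compiler establishes formally before folding a branch. The controversial question in this thread is only whether the same reasoning may be applied once x is a relaxed atomic.

  /* Sketch only: the uncontroversial, non-atomic version of the "x == 1"
     example.  x is local, never escapes, and has a single reaching
     definition, so data-flow analysis proves the test is true and the
     branch folds into an unconditional store to y.  */
  int y;

  void foo (void)
  {
    int x = 1;
    if (x == 1)     /* provably true; the test and branch disappear */
      y = 3;
  }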
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:32 -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel wrote: > > On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: > >> > > >> > if (atomic_load(&x, mo_relaxed) == 1) > >> > atomic_store(&y, 3, mo_relaxed); > >> > >> No, please don't use this idiotic example. It is wrong. > > > > It won't be useful in practice in a lot of cases, but that doesn't mean > > it's wrong. It's clearly not illegal code. It also serves a purpose: a > > simple example to reason about a few aspects of the memory model. > > It's not illegal code, but if you claim that you can make that store > unconditional, it's a pointless and wrong example. > > >> The fact is, if a compiler generates anything but the obvious sequence > >> (read/cmp/branch/store - where branch/store might obviously be done > >> with some other machine conditional like a predicate), the compiler is > >> wrong. > > > > Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., > > the first load always returning 1) for the branch (or other machine > > conditional) to not be emitted. So please either poke holes into this > > reasoning, or clarify that you don't in fact, contrary to what you wrote > > above, agree with (1) to (3). > > The thing is, the first load DOES NOT RETURN 1. It returns whatever > that memory location contains. End of story. The memory location is just an abstraction for state, if it's not volatile. > Stop claiming it "can return 1".. It *never* returns 1 unless you do > the load and *verify* it, or unless the load itself can be made to go > away. And with the code sequence given, that just doesn't happen. END > OF STORY.

void foo()
{
  atomic x = 1;
  if (atomic_load(&x, mo_relaxed) == 1)
    atomic_store(&y, 3, mo_relaxed);
}

This is a counter example to your claim, and yes, the compiler has proof that x is 1. It's deliberately simple, but I can replace this with other more advanced situations. For example, if x comes out of malloc (or, on the kernel side, something else that returns non-aliasing memory) and hasn't provably escaped to other threads yet. I haven't posted this full example, but I've *clearly* said that *if* the compiler can prove that the load would always return 1, it can remove it. And it's simple to see why that's the case: If this holds, then in all allowed executions it would load from a known store, the mo_relaxed gives no further ordering guarantees so we can just take the value, and we're good. > So your argument is *shit*. Why do you continue to argue it? Maybe because it isn't? Maybe you should try to at least trust that my intentions are good, even if distrusting my ability to reason. > I told you how that load can go away, and you agreed. But IT CANNOT GO > AWAY any other way. You cannot claim "the compiler knows". The > compiler doesn't know. It's that simple. Oh yes it can. Because of the same rules that allow you to perform the other transformations. Please try to see the similarities here. You previously said you don't want to mix volatile semantics and atomics. This is something that's being applied in this example. > >> So why do I say you are wrong, after I just gave you an example of how > >> it happens? Because my example went back to the *real* issue, and > >> there are actual real semantically meaningful details with doing > >> things like load merging.
> >> > >> To give an example, let's rewrite things a bit more to use an extra > >> variable: > >> > >> atomic_store(&x, 1, mo_relaxed); > >> a = atomic_load(&x, mo_relaxed); > >> if (a == 1) > >> atomic_store(&y, 3, mo_relaxed); > >> > >> which looks exactly the same. > > > > I'm confused. Is this a new example? > > That is a new example. The important part is that it has left a > "trace" for the programmer: because 'a' contains the value, the > programmer can now look at the value later and say "oh, we know we did > a store iff a was 1" > > >> This sequence: > >> > >> atomic_store(&x, 1, mo_relaxed); > >> a = atomic_load(&x, mo_relaxed); > >> atomic_store(&y, 3, mo_relaxed); > >> > >> is actually - and very seriously - buggy. > >> > >> Why? Because you have effectively split the atomic_load into two loads > >> - one for the value of 'a', and one for your 'proof' that the store is > >> unconditional. > > > > I can't follow that, because it isn't clear to me which code sequences > > are meant to belong together, and which transformations the compiler is > > supposed to make. If you would clarify that, then I can reply to this > > part. > > Basically, if the compiler allows the condition of "I wrote 3 to the > y, but the programmer sees 'a' has another value than 1 later" then > the compiler is one buggy pile of shit. It fundamentally broke the > whole concept of atomic accesses. Basically the "atomic" access to 'x' > turned into two different accesses: the one that "proved" that x had > the value 1 (and caused the value 3 to
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:47 -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel wrote: > > On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: > >> > >> The argument was that an lvalue doesn't actually "access" the memory > >> (an rvalue does), so this: > >> > >> volatile int *p = ...; > >> > >> *p; > >> > >> doesn't need to generate a load from memory, because "*p" is still an > >> lvalue (since you could assign things to it). > >> > >> This isn't an issue in C, because in C, expression statements are > >> always rvalues, but C++ changed that. > > > > Huhh. I can see the problems that this creates in terms of C/C++ > > compatibility. > > That's not the biggest problem. > > The biggest problem is that you have compiler writers that don't care > about sane *use* of the features they write a compiler for, they just > care about the standard. > > So they don't care about C vs C++ compatibility. Even more > importantly, they don't care about the *user* that uses only C++ and > the fact that their reading of the standard results in *meaningless* > behavior. They point to the standard and say "that's what the standard > says, suck it", and silently generate code (or in this case, avoid > generating code) that makes no sense. There's an underlying problem here that's independent from the actual instance that you're worried about here: "no sense" is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. If you want to be independent of your sanity being different from other people's sanity (e.g., compiler writers), you need to make sure that the specification is precise and says what you want. IOW, think about the specification being the program, and the people being computers; you better want a well-defined program in this case. > So it's not about C++ being incompatible with C, it's about C++ having > insane and bad semantics unless you just admit that "oh, ok, I need to > not just read the standard, I also need to use my brain, and admit > that a C++ statement expression needs to act as if it is an "access" > wrt volatile variables". 1) I agree that (IMO) a good standard strives for being easy to understand. 2) In practice, there is a trade-off between "easy to understand" and actually producing a specification. A standard is not a tutorial. And that's for good reason, because (a) there might be more than one way to teach something and that should be allowed, and (b) the standard should carry the full precision but still be compact enough to be manageable. 3) Implementations can try to be nice to users by helping them avoid error-prone corner cases or such. A warning for common problems is such a case. But an implementation has to draw a line somewhere, demarcating cases where it fully exploits what the standard says (e.g., to allow optimizations) from cases where it is more conservative and does what the standard allows but in a potentially more intuitive way. That's especially the case if it's being asked to produce high-performance code. 4) There will be arguments for where the line actually is, simply because different users will have different goals. 5) The way to reduce 4) is to either make the standard more specific, or to provide better user documentation. If the standard has strict requirements, then there will be less misunderstanding. 6) To achieve 5), one way is to get involved in the standards process.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:10 PM, Alec Teal wrote: > > You mean "unambiguous" - try reading a patent (Apple have 1000s of trivial > ones, I tried reading one once thinking "how could they have phrased it so > this got approved", their technique was to make the reader want to start > cutting themselves to prove they weren't numb to everything) Oh, I agree, patent language is worse. > I'm not going to teach you what rvalues and lvalues are, but! I know what lvalues and rvalues are. I *understand* the thinking that goes on behind the "let's not do the access, because it's not an rvalue, so there is no 'access' to the object". I understand it from a technical perspective. I don't understand the compiler writer that uses a *technicality* to argue against generating sane code that is obviously what the user actually asked for. See the difference? > So start again, what is the serious problem, have you got any code that > would let me replicate it, what is your version of GCC? The volatile problem is long fixed. The people who argued for the "legalistically correct", but insane behavior lost (and as mentioned, I think C++11 actually fixed the legalistic reading too). I'm bringing it up because I've had too many cases where compiler writers pointed to the standard and said "that is ambiguous or undefined, so we can do whatever the hell we want, regardless of whether that's sensible, or regardless of whether there is a sensible way to get the behavior you want or not". > Oh and lastly! Optimisations are not as casual as "oh, we could do this and > it'd work better" unlike kernel work or any other software that is being > improved, it is very formal (and rightfully so) Alec, I know compilers. I don't do code generation (quite frankly, register allocation and instruction choice is when I give up), but I did actually write my own for static analysis, including turning things into SSA etc. No, I'm not a "compiler person", but I actually do know enough that I understand what goes on. And exactly because I know enough, I would *really* like atomics to be well-defined, and have very clear - and *local* - rules about how they can be combined and optimized. None of this "if you can prove that the read has value X" stuff. And things like value speculation should simply not be allowed, because that actually breaks the dependency chain that the CPU architects give guarantees for. Instead, make the rules be very clear, and very simple, like my suggestion. You can never remove a load because you can "prove" it has some value, but you can combine two consecutive atomic accesses. For example, CPU people actually do tend to give guarantees for certain things, like stores that are causally related being visible in a particular order. If the compiler starts doing value speculation on atomic accesses, you are quite possibly breaking things like that. It's just not a good idea. Don't do it. Write the standard so that it clearly is disallowed. Because you may think that a C standard is machine-independent, but that isn't really the case. The people who write code still write code for a particular machine. Our code works (in the general case) on different byte orderings, different register sizes, different memory ordering models. But in each *instance* we still end up actually coding for each machine.
So the rules for atomics should be simple and *specific* enough that when you write code for a particular architecture, you can take the architecture memory ordering *and* the C atomics orderings into account, and do the right thing for that architecture. And that very much means that doing things like value speculation MUST NOT HAPPEN. See? Even if you can prove that your code is "equivalent", it isn't. So for example, let's say that you have a pointer, and you have some reason to believe that the pointer has a particular value. So you rewrite following the pointer from this: value = ptr->val; into value = speculated->value; tmp = ptr; if (unlikely(tmp != speculated)) value = tmp->value; and maybe you can now make the critical code-path for the speculated case go faster (since now there is no data dependency for the speculated case, and the actual pointer chasing load is now no longer in the critical path), and you made things faster because your profiling showed that the speculated case was true 99% of the time. Wonderful, right? And clearly, the code "provably" does the same thing. EXCEPT THAT IS NOT TRUE AT ALL. It very much does not do the same thing at all, and by doing value speculation and "proving" something was true, the only thing you did was to make incorrect code run faster. Because now the causally related load of value from the pointer isn't actually causally related at all, and you broke the memory ordering. This is why I don't like it when I see Torvald talk about "proving" things. It's bullshit. You can "prove" pretty much anything, and in the process lose sight of the bigger issue, namely tha
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:17 PM, Torvald Riegel wrote: > On Mon, 2014-02-17 at 14:32 -0800, > >> Stop claiming it "can return 1".. It *never* returns 1 unless you do >> the load and *verify* it, or unless the load itself can be made to go >> away. And with the code sequence given, that just doesn't happen. END >> OF STORY. > > void foo() > { > atomic x = 1; > if (atomic_load(&x, mo_relaxed) == 1) > atomic_store(&y, 3, mo_relaxed); > } This is the very example I gave, where the real issue is not that "you prove that load returns 1", you instead say "store followed by a load can be combined". I (in another email I just wrote) tried to show why the "prove something is true" approach is a very dangerous model. Seriously, it's pure crap. It's broken. If the C standard defines atomics in terms of "provable equivalence", it's broken. Exactly because on a *virtual* machine you can prove things that are not actually true in a *real* machine. I have the example of value speculation changing the memory ordering model of the actual machine. See? Linus
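For reference, a well-formed C11 rendering of the example being argued over - just a sketch, with the atomic type spelled out and y declared so it actually compiles; the comment describes the store-to-load combining transformation under discussion, not what any particular compiler is claimed to do.

  #include <stdatomic.h>

  atomic_int y;

  void foo(void)
  {
    atomic_int x = ATOMIC_VAR_INIT(1);   /* local, never escapes this thread */

    if (atomic_load_explicit(&x, memory_order_relaxed) == 1)
      atomic_store_explicit(&y, 3, memory_order_relaxed);

    /* The debated transformation: forward the store of 1 into the relaxed
       load (no other thread can observe x), fold the comparison, and emit
       only the unconditional store of 3 to y.  */
  }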
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel wrote: > > There's an underlying problem here that's independent from the actual > instance that you're worried about here: "no sense" is a ultimately a > matter of taste/objectives/priorities as long as the respective > specification is logically consistent. Yes. But I don't think it's "independent". Exactly *because* some people will read standards without applying "does the resulting code generation actually make sense for the programmer that wrote the code", the standard has to be pretty clear. The standard often *isn't* pretty clear. It wasn't clear enough when it came to "volatile", and yet that was a *much* simpler concept than atomic accesses and memory ordering. And most of the time it's not a big deal. But because the C standard generally tries to be very portable, and cover different machines, there tends to be a mindset that anything inherently unportable is "undefined" or "implementation defined", and then the compiler writer is basically given free reign to do anything they want (with "implementation defined" at least requiring that it is reliably the same thing). And when it comes to memory ordering, *everything* is basically non-portable, because different CPU's very much have different rules. I worry that that means that the standard then takes the stance that "well, compiler re-ordering is no worse than CPU re-ordering, so we let the compiler do anything". And then we have to either add "volatile" to make sure the compiler doesn't do that, or use an overly strict memory model at the compiler level that makes it all pointless. So I really really hope that the standard doesn't give compiler writers free hands to do anything that they can prove is "equivalent" in the virtual C machine model. That's not how you get reliable results. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 04:18:52PM -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel wrote: > > > > There's an underlying problem here that's independent from the actual > > instance that you're worried about here: "no sense" is a ultimately a > > matter of taste/objectives/priorities as long as the respective > > specification is logically consistent. > > Yes. But I don't think it's "independent". > > Exactly *because* some people will read standards without applying > "does the resulting code generation actually make sense for the > programmer that wrote the code", the standard has to be pretty clear. > > The standard often *isn't* pretty clear. It wasn't clear enough when > it came to "volatile", and yet that was a *much* simpler concept than > atomic accesses and memory ordering. > > And most of the time it's not a big deal. But because the C standard > generally tries to be very portable, and cover different machines, > there tends to be a mindset that anything inherently unportable is > "undefined" or "implementation defined", and then the compiler writer > is basically given free reign to do anything they want (with > "implementation defined" at least requiring that it is reliably the > same thing). > > And when it comes to memory ordering, *everything* is basically > non-portable, because different CPU's very much have different rules. > I worry that that means that the standard then takes the stance that > "well, compiler re-ordering is no worse than CPU re-ordering, so we > let the compiler do anything". And then we have to either add > "volatile" to make sure the compiler doesn't do that, or use an overly > strict memory model at the compiler level that makes it all pointless. For whatever it is worth, this line of reasoning has been one reason why I have been objecting strenuously every time someone on the committee suggests eliminating "volatile" from the standard. Thanx, Paul > So I really really hope that the standard doesn't give compiler > writers free hands to do anything that they can prove is "equivalent" > in the virtual C machine model. That's not how you get reliable > results. > >Linus >
RE: Vectorizer Pragmas
The way Intel presents #pragma simd (to users, to the OpenMP committee, to the C and C++ committees, etc) is that it is not a hint, it has a meaning. The meaning is defined in terms of evaluation order. Both C and C++ define an evaluation order for sequential programs. #pragma simd relaxes the sequential order into a partial order:

0. subsequent iterations of the loop are chunked together and execute in lockstep
1. there is no change in the order of evaluation of expressions within an iteration
2. if X and Y are expressions in the loop, and X(i) is the evaluation of X in iteration i, then for X sequenced before Y and iteration i evaluated before iteration j, X(i) is sequenced before Y(j).

A corollary is that the sequential order is always allowed, since it satisfies the partial order. However, the partial order allows the compiler to group copies of the same expression next to each other, and then to combine the scalar instructions into a vector instruction. There are other corollaries, such as that if multiple loop iterations write into an object defined outside of the loop then it has to be undefined behavior - the vector moral equivalent of a data race. That is why induction variables and reductions are necessary exceptions to this rule and require explicit support. As far as correctness goes, by this definition the programmer has expressed that the loop is correct, and the compiler should not try to prove correctness. On the performance heuristics side, the Intel compiler tries to not second-guess the user. There are users who work much harder than just adding a #pragma simd to unmodified sequential loops. There are various changes that may be necessary, and users who worked hard to get their loops in good shape are unhappy if the compiler does second-guess them. Robert. -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Renato Golin Sent: Monday, February 17, 2014 7:14 AM To: tpri...@computer.org Cc: gcc Subject: Re: Vectorizer Pragmas On 17 February 2014 14:47, Tim Prince wrote: > I'm continuing discussions with former Intel colleagues. If you are > asking for insight into how Intel priorities vary over time, I don't > expect much, unless the next beta compiler provides some inferences. > They have talked about implementing all of OpenMP 4.0 except user > defined reduction this year. That would imply more activity in that > area than on cilkplus, I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way. > although some fixes have come in the latter. On the other hand I had > an issue on omp simd reduction(max: ) closed with the decision "will > not be fixed." We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them. > I have an icc problem report in on fixing omp simd safelen so it is > more like the standard and less like the obsolete pragma simd vectorlength. Our width metadata is slightly different in that it means "try to use that length", rather than "it's safe to use that length", which is why I'm holding off on using safelen for the moment. > Also, I have some problem reports active attempting to get > clarification of their omp target implementation. Same here... RTFM is not enough in this case. ;) > You may have noticed that omp parallel for simd in current Intel > compilers can be used for combined thread and simd parallelism, > including the case where the outer loop is parallelizable and > vectorizable but the inner one is not.
That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code. For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones. If GCC is trying to do the same thing regarding non-threaded vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension. cheers, --renato
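To make the evaluation-order description earlier in this thread concrete, here is a rough sketch of an Intel-style #pragma simd loop (clause spelling as I understand Intel's documentation; treat it as illustrative, not authoritative). The element-wise statement already satisfies the relaxed partial order; the accumulation into sum is a cross-iteration write, which is exactly why it has to be named in an explicit reduction clause rather than left for the compiler to prove safe.

  /* Illustrative only: chunked, lockstep execution of the iterations is
     allowed by the relaxed order; the reduction clause covers the one
     cross-iteration dependence.  */
  float saxpy_sum(int n, float a, const float *x, float *y)
  {
    float sum = 0.0f;
  #pragma simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
      y[i] = a * x[i] + y[i];   /* independent per-iteration work */
      sum += y[i];              /* declared reduction: the allowed exception */
    }
    return sum;
  }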
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 12:18:21PM -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel wrote: > > > > Which example do you have in mind here? Haven't we resolved all the > > debated examples, or did I miss any? > > Well, Paul seems to still think that the standard possibly allows > speculative writes or possibly value speculation in ways that break > the hardware-guaranteed orderings. It is not that I know of any specific problems, but rather that I know I haven't looked under all the rocks. Plus my impression from my few years on the committee is that the standard will be pushed to the limit when it comes time to add optimizations. One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. However, it is still an educational example. This is where you start:

  p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
  do_something(p->a, p->b, p->c);
  p->d = 1;

Then you leverage branch-prediction hardware as follows:

  p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
  if (p == GUESS) {
    do_something(GUESS->a, GUESS->b, GUESS->c);
    GUESS->d = 1;
  } else {
    do_something(p->a, p->b, p->c);
    p->d = 1;
  }

The CPU's branch-prediction hardware squashes speculation in the case where the guess was wrong, and this prevents the speculative store to ->d from ever being visible. However, the then-clause breaks dependencies, which means that the loads -could- be speculated, so that do_something() gets passed pre-initialization values. Now, I hope and expect that the wording in the standard about dependency ordering prohibits this sort of thing. But I do not yet know for certain. And yes, I am being paranoid. But not unnecessarily paranoid. ;-) Thanx, Paul > And personally, I can't read standards paperwork. It is invariably > written in some basically impossible-to-understand lawyeristic mode, > and then it is read by people (compiler writers) that intentionally > try to mis-use the words and do language-lawyering ("that depends on > what the meaning of 'is' is"). The whole "lvalue vs rvalue expression > vs 'what is a volatile access'" thing for C++ was/is a great example > of that. > > So quite frankly, as a result I refuse to have anything to do with the > process directly. > > Linus >
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney wrote: > > One example that I learned about last week uses the branch-prediction > hardware to validate value speculation. And no, I am not at all a fan > of value speculation, in case you were curious. Heh. See the example I used in my reply to Alec Teal. It basically broke the same dependency the same way. Yes, value speculation of reads is simply wrong, the same way speculative writes are simply wrong. The dependency chain matters, and is meaningful, and breaking it is actively bad. As far as I can tell, the intent is that you can't do value speculation (except perhaps for the "relaxed", which quite frankly sounds largely useless). But then I do get very very nervous when people talk about "proving" certain values. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds wrote: > > As far as I can tell, the intent is that you can't do value > speculation (except perhaps for the "relaxed", which quite frankly > sounds largely useless). Hmm. The language I see for "consume" is not obvious: "Consume operation: no reads in the current thread dependent on the value currently loaded can be reordered before this load" and it could make a compiler writer say that value speculation is still valid, if you do it like this (with "ptr" being the atomic variable):

  value = ptr->val;

into

  tmp = ptr;
  value = speculated.value;
  if (unlikely(tmp != &speculated))
    value = tmp->value;

which is still bogus. The load of "ptr" does happen before the load of "value = speculated->value" in the instruction stream, but it would still result in the CPU possibly moving the value read before the pointer read at least on ARM and power. So if you're a compiler person, you think you followed the letter of the spec - as far as *you* were concerned, no load dependent on the value of the atomic load moved to before the atomic load. You go home, happy, knowing you've done your job. Never mind that you generated code that doesn't actually work. I dread having to explain to the compiler person that he may be right in some theoretical virtual machine, but the code is subtly broken and nobody will ever understand why (and likely not be able to create a test-case showing the breakage). But maybe the full standard makes it clear that "reordered before this load" actually means on the real hardware, not just in the generated instruction stream. Reading it with understanding of the *intent* and understanding all the different memory models that requirement should be obvious (on alpha, you need an "rmb" instruction after the load), but ... Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 07:24:56PM -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney > wrote: > > > > One example that I learned about last week uses the branch-prediction > > hardware to validate value speculation. And no, I am not at all a fan > > of value speculation, in case you were curious. > > Heh. See the example I used in my reply to Alec Teal. It basically > broke the same dependency the same way. ;-) > Yes, value speculation of reads is simply wrong, the same way > speculative writes are simply wrong. The dependency chain matters, and > is meaningful, and breaking it is actively bad. > > As far as I can tell, the intent is that you can't do value > speculation (except perhaps for the "relaxed", which quite frankly > sounds largely useless). But then I do get very very nervous when > people talk about "proving" certain values. That was certainly my intent, but as you might have noticed in the discussion earlier in this thread, the intent can get lost pretty quickly. ;-) The HPC guys appear to be the most interested in breaking dependencies. Their software doesn't rely on dependencies, and from their viewpoint anything that has any chance of leaving an FP unit of any type idle is a very bad thing. But there are probably other benchmarks for which breaking dependencies gives a few percent performance boost. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 07:42:42PM -0800, Linus Torvalds wrote: > On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds > wrote: > > > > As far as I can tell, the intent is that you can't do value > > speculation (except perhaps for the "relaxed", which quite frankly > > sounds largely useless). > > Hmm. The language I see for "consume" is not obvious: > > "Consume operation: no reads in the current thread dependent on the > value currently loaded can be reordered before this load" > > and it could make a compiler writer say that value speculation is > still valid, if you do it like this (with "ptr" being the atomic > variable): > > value = ptr->val; > > into > > tmp = ptr; > value = speculated.value; > if (unlikely(tmp != &speculated)) > value = tmp->value; > > which is still bogus. The load of "ptr" does happen before the load of > "value = speculated->value" in the instruction stream, but it would > still result in the CPU possibly moving the value read before the > pointer read at least on ARM and power. > > So if you're a compiler person, you think you followed the letter of > the spec - as far as *you* were concerned, no load dependent on the > value of the atomic load moved to before the atomic load. You go home, > happy, knowing you've done your job. Never mind that you generated > code that doesn't actually work. Agreed, that would be bad. But please see below. > I dread having to explain to the compiler person that he may be right > in some theoretical virtual machine, but the code is subtly broken and > nobody will ever understand why (and likely not be able to create a > test-case showing the breakage). If things go as they usually do, such explanations will be required a time or two. > But maybe the full standard makes it clear that "reordered before this > load" actually means on the real hardware, not just in the generated > instruction stream. Reading it with understanding of the *intent* and > understanding all the different memory models that requirement should > be obvious (on alpha, you need an "rmb" instruction after the load), > but ... The key point with memory_order_consume is that it must be paired with some sort of store-release, a category that includes stores tagged with memory_order_release (surprise!), memory_order_acq_rel, and memory_order_seq_cst. This pairing is analogous to the memory-barrier pairing in the Linux kernel. So you have something like this for the rcu_assign_pointer() side:

  p = kmalloc(...);
  if (unlikely(!p))
    return -ENOMEM;
  p->a = 1;
  p->b = 2;
  p->c = 3;
  /* The following would be buried within rcu_assign_pointer(). */
  atomic_store_explicit(&gp, p, memory_order_release);

And something like this for the rcu_dereference() side:

  /* The following would be buried within rcu_dereference(). */
  q = atomic_load_explicit(&gp, memory_order_consume);
  do_something_with(q->a);

So, let's look at the C11 draft, section 5.1.2.4 "Multi-threaded executions and data races". 5.1.2.4p14 says that the atomic_load_explicit() carries a dependency to the argument of do_something_with(). 5.1.2.4p15 says that the atomic_store_explicit() is dependency-ordered before the atomic_load_explicit(). 5.1.2.4p15 also says that the atomic_store_explicit() is dependency-ordered before the argument of do_something_with(). This is because if A is dependency-ordered before X and X carries a dependency to B, then A is dependency-ordered before B. 5.1.2.4p16 says that the atomic_store_explicit() inter-thread happens before the argument of do_something_with().
The assignment to p->a is sequenced before the atomic_store_explicit(). Therefore, combining these last two, the assignment to p->a happens before the argument of do_something_with(), and that means that do_something_with() had better see the "1" assigned to p->a or some later value. But as far as I know, compiler writers currently take the approach of treating memory_order_consume as if it was memory_order_acquire. Which certainly works, as long as ARM and PowerPC people don't mind an extra memory barrier out of each rcu_dereference(). Which is one thing that compiler writers are permitted to do according to the standard -- substitute a memory-barrier instruction for any given dependency... Thanx, Paul
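Pulling those fragments together, here is a self-contained C11 sketch of the release/consume pairing described above (the function names, the struct, and the error handling are mine; the rcu_assign_pointer()/rcu_dereference() analogy is Paul's):

  #include <stdatomic.h>
  #include <stdlib.h>

  struct foo { int a, b, c, d; };

  /* gp plays the role of the RCU-protected global pointer. */
  static _Atomic(struct foo *) gp;

  /* Publisher side, analogous to initialization + rcu_assign_pointer(). */
  int publish(void)
  {
    struct foo *p = malloc(sizeof *p);
    if (!p)
      return -1;
    p->a = 1;
    p->b = 2;
    p->c = 3;
    /* Store-release: the initialization above is ordered before the
       pointer becomes visible to a consume (or acquire) load.  */
    atomic_store_explicit(&gp, p, memory_order_release);
    return 0;
  }

  /* Reader side, analogous to rcu_dereference(). */
  void reader(void (*do_something_with)(int))
  {
    struct foo *q = atomic_load_explicit(&gp, memory_order_consume);
    if (q)
      do_something_with(q->a);   /* carries a dependency from the load */
  }

As noted above, a compiler may also implement the consume load as an acquire load, i.e. substitute a barrier for the dependency, at the cost of an extra memory-barrier instruction on architectures such as ARM and PowerPC.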
Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...
Hi, I am developing plugins for the GCC-4.8.2. I am a newbie in plugins. I wrote a plugin and tried to count and see the Goto Statements using the gimple_stmt_iterator. I get gimple statements printed on my stdout, but I am not able to find the line which has goto statements. I only get other lines such as variable declarations and logic statements, but no goto statements. When I open the Gimple/SSA/CFG file separately using the vim editor I find the goto statements are actually present. So, can anyone help me? How can I actually get the count of Goto statements or at least access these goto statements using some iterator? I have used -fdump-tree-all, -fdump-tree-cfg as flags. Here is the pseudocode:

struct register_pass_info pass_info = {
  &(pass_plugin.pass),   /* Address of new pass, here, the 'struct opt_pass' field of 'gimple_opt_pass' defined above */
  "ssa",                 /* Name of the reference pass for hooking up the new pass. ??? */
  0,                     /* Insert the pass at the specified instance number of the reference pass. Do it for every instance if it is 0. */
  PASS_POS_INSERT_AFTER  /* How to insert the new pass: before, after, or replace. Here we are inserting a pass named 'plug' after the pass named 'pta' */
};

...

static unsigned int
dead_code_elimination (void)
{
  basic_block bb;
  gimple_stmt_iterator gsi, gsi2;
  gimple g;

  FOR_EACH_BB_FN (bb, cfun)
    {
      /* gimple_dump_bb (stdout, bb, 0, 0); */
      /* printf ("\nIn New BB"); */
      gsi2 = gsi_after_labels (bb);
      print_gimple_stmt (stdout, gsi_stmt (gsi2), 0, 0);

      /* Iterating over each gimple statement in a basic block.  */
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
        {
          g = gsi_stmt (gsi);
          print_gimple_stmt (stdout, g, 0, 0);
          if (gimple_code (g) == GIMPLE_GOTO)
            printf ("\nFound GOTO stmt\n");
          print_gimple_stmt (stdout, gsi_stmt (gsi), 0, 0);
          /* analyze_gimple_statement (gsi); */
        }
    }
  return 0;
}
Re: Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...
On Tue, 2014-02-18 at 11:17 +0530, Mohsin Khan wrote: > Hi, > > I am developing plugins for the GCC-4.8.2. I am a newbie in plugins. > I wrote a plugin and tried to count and see the Goto Statements using > the gimple_stmt_iterator. I get gimple statements printed on my > stdout, but I am not able to find the line which has goto statements. I guess that most GOTOs just become implicit, as edges to the next basic block. Probably

  if (!cond) goto end;
  something;
  end:;

has nearly the same Gimple representation as

  if (cond) { something; }

BTW, did you consider using MELT http://gcc-melt.org/ to code your GCC extension? -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
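Following up on that point, here is a rough, untested sketch against GCC 4.8-era internals of how the count could be done at the CFG level: after the "ssa" pass the goto statements have mostly been turned into edges, so walk each basic block's outgoing edges instead of searching for GIMPLE_GOTO. The function name and counters are made up; it is meant to slot into the same pass skeleton as the code above.

  static unsigned int
  count_implicit_gotos (void)
  {
    basic_block bb;
    int edges = 0, plain_jumps = 0;

    FOR_EACH_BB_FN (bb, cfun)
      {
        edge e;
        edge_iterator ei;

        /* Every outgoing edge is a (possibly implicit) jump to another
           basic block.  */
        FOR_EACH_EDGE (e, ei, bb->succs)
          edges++;

        /* A block with exactly one successor ends in an unconditional
           transfer - the closest analogue of an explicit goto.  */
        if (single_succ_p (bb))
          plain_jumps++;
      }

    fprintf (stderr, "outgoing CFG edges: %d, unconditional transfers: %d\n",
             edges, plain_jumps);
    return 0;
  }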