Re: asking your advice about bug
On 02/17/2014 06:50 PM, Roman Gareev wrote: Hi Tobias, thanks for the answer! Hi Roman, sorry for missing this mail. I think that the segfault is being caused by NULL arguments being passedto compute_deps by loop_level_carries_dependences. *This is **causing **an* *assignment of** NULL values to the following parameters of **compute_deps:* must_raw_no_source, may_raw_no_source, must_war_no_source, may_war_no_source, must_waw_no_source, may_waw_no_source. They are being passed to subtract_commutative_associative_deps and dereferenced in the following statements: *must_raw_no_source = isl_union_map_subtract (*must_raw_no_source, x_must_raw_no_source); *may_raw_no_source = isl_union_map_subtract (*may_raw_no_source, x_may_raw_no_source); *must_war_no_source = isl_union_map_subtract (*must_war_no_source, x_must_war_no_source); *may_war_no_source = isl_union_map_subtract (*may_war_no_source, x_may_war_no_source); *must_waw_no_source = isl_union_map_subtract (*must_waw_no_source, x_must_waw_no_source); *may_waw_no_source = isl_union_map_subtract (*may_waw_no_source, x_may_waw_no_source); This is the reason of segfault. (All functions mentioned above are located in gcc/graphite-dependences.c) Interesting analysis. I think that this can be solved by the addition to subtract_commutative_associative_deps of NULL checking of the following variables: must_raw_no_source, may_raw_no_source, must_war_no_source, may_war_no_source, must_waw_no_source, may_waw_no_source. I've implemented this in the patch, which can be found below. Yes, this would be a 'solution'. However, I am in fact surprised that those variables are NULL at all. Do you have an idea why this is the case? Understanding this would help to understand if the patch you propose is actually the right solution or if it is just hiding a previous bug. Cheers, Tobias
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi, On Fri, 21 Feb 2014, Paul E. McKenney wrote: > > And with conservative I mean "everything is a source of a dependency, and > > hence can't be removed, reordered or otherwise fiddled with", and that > > includes code sequences where no atomic objects are anywhere in sight [1]. > > In the light of that the only realistic way (meaning to not have to > > disable optimization everywhere) to implement consume as currently > > specified is to map it to acquire. At which point it becomes pointless. > > No, only memory_order_consume loads and [[carries_dependency]] > function arguments are sources of dependency chains. I don't see [[carries_dependency]] in the C11 final draft (yeah, should get a real copy, I know, but let's assume it's the same language as the standard). Therefore, yes, only consume loads are sources of dependencies. The problem with the definition of the "carries a dependency" relation is not the sources, but rather where it stops. It's transitively closed over "value of evaluation A is used as operand in evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations can contain function calls, so if there's _any_ chance that an operand of an evaluation might even indirectly use something resulting from a consume load then that evaluation must be compiled in a way to not break dependency chains. I don't see a way to generally assume that e.g. the value of a function argument can impossibly result from a consume load, therefore the compiler must assume that all function arguments _can_ result from such loads, and so must disable all depchain breaking optimization (which are many). > > [1] Simple example of what type of transformations would be disallowed: > > > > int getzero (int i) { return i - i; } > > This needs to be as follows: > > [[carries_dependency]] int getzero(int i [[carries_dependency]]) > { > return i - i; > } > > Otherwise dependencies won't get carried through it. So, with the above do you agree that in absense of any other magic (see below) the compiler is not allowed to transform my initial getzero() (without the carries_dependency markers) implementation into "return 0;" because of the C11 rules for "carries-a-dependency"? If so, do you then also agree that the specification of "carries a dependency" is somewhat, err, shall we say, overbroad? > > depchains don't matter, could _then_ optmize it to zero. But that's > > insane, especially considering that it's hard to detect if a given context > > doesn't care for depchains, after all the depchain relation is constructed > > exactly so that it bleeds into nearly everywhere. So we would most of > > the time have to assume that the ultimate context will be depchain-aware > > and therefore disable many transformations. > > Any function that does not contain a memory_order_consume load and that > doesn't have any arguments marked [[carries_dependency]] can be > optimized just as before. And as such marker doesn't exist we must conservatively assume that it's on _all_ parameters, so I'll stand by my claim. > > Then inlining getzero would merely add another "# j.dep = i.dep" > > relation, so depchains are still there but the value optimization can > > happen before inlining. Having to do something like that I'd find > > disgusting, and rather rewrite consume into acquire :) Or make the > > depchain relation somehow realistically implementable. > > I was actually OK with arithmetic cancellation breaking the dependency > chains. Others on the committee felt otherwise, and I figured that (1) > I wouldn't be writing that kind of function anyway and (2) they knew > more about writing compilers than I. I would still be OK saying that > things like "i-i", "i*0", "i%1", "i&0", "i|~0" and so on just break the > dependency chain. Exactly. I can see the problem that people had with that, though. There are very many ways to write conceiled zeros (or generally neutral elements of the function in question). My getzero() function is one (it could e.g. be an assembler implementation). The allowance to break dependency chains would have to apply to such cancellation as well, and so can't simply itemize all cases in which cancellation is allowed. Rather it would have had to argue about something like "value dependency", ala "evaluation B depends on A, if there exist at least two different values A1 and A2 (results from A), for which evaluation B (with otherwise same operands) yields different values B1 and B2". Alas, it doesn't, except if you want to understand the term "the value of A is used as an operand of B" in that way. Even then you'd still have the second case of the depchain definition, via intermediate not even atomic memory stores and loads to make two evaluations be ordered per carries-a-dependency. And even that understanding of "is used" wouldn't be enough, because there are cases where the cancellation happens in steps, and where it int
Non-temporal move
I could see "storent" pattern in x86 machine descriptions (in sse.md)., but internals doc don't mention it. Should we add a description about this in the internals doc? Regards Ganesh
RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Richard Sandiford writes > Matthew Fortune writes: > > All, > > > > Imagination Technologies would like to introduce the design of an O32 > > ABI extension for MIPS to allow it to be used in conjunction with MIPS > > FPUs having 64-bit floating-point registers. This is a wide-reaching > > design that involves changes to all components of the MIPS toolchain > > it is being posted to GCC first and will progress on to other tools. > > This ABI extension is compatible with the existing O32 ABI definition > > and will not require the introduction of new build variants > (multilibs). > > > > The design document is relatively large and has been placed on the > > MIPS Compiler Team wiki to facilitate review: > > > > http://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinkin > > g > > Looks good to me. It'll be interesting to see whether making the odd- > numbered call-saved-in-fr0 registers available for frx pays off or > whether it ends up being better to avoid them. Indeed, I suspect they should be avoided except for leaf functions. You would have to be pretty desperate for a register if you use the caller-and-callee save registers! > I understand the need to deprecate the current -mgp32 -mfp64 behaviour. > I don't think we should deprecate -mfp64 itself though. Instead, why > not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for > modeless? So rather than deprecating the -mgp32 -mfp64 combination and > adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in > which the odd-numbered registers are call-clobbered rather than the old > form in which they were call-saved. Extreme caution is the only reason why the design avoided changing fp64 behaviour (basically in case anyone had serious objection). If you would be happy with a change of behaviour for -mgp32 -mfp64 then that is a great start. > AIUI the old form never really worked reliably due to things like > newlib's setjmp not preserving the odd-numbered registers, so it doesn't > seem worth keeping around. Also, the old form is identified by the GNU > attribute (4, 4) so it'd be easy for the linker to reject links between > the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing fp64 as currently defined but having found this new solution then such fixes are no longer important. The lack of support for gp32 fp64 in linux is further reason to permit redefining it. Would you be happy to retain the same builtin defines for FP64 if changing its behaviour (i.e. __mips_fpr=64)? > The corresponding asm would then be ".set fp=xx". > > Either way, a new .set option would be better than a specific .fr > directive because it gives you access to the option stack (".set > push"/".set pop"). > > I'm not sure about: > > If an assembly directive is seen prior to the start of the text > section then this modifies the default mode for the module. > > This isn't how any of the existing options work and I think the > inconsistency would be confusing. It also means that if the first > function in a file happens to force a local mode (e.g. > because it's an ifunc implementation) then you'd have to remember to > write: > > .fr x > .fr 1 > > so that the first sets the mode for the module and the second sets it > for the first function. The different treatment of the two lines > wouldn't be obvious at first glance. > > How about instead having a separate directive that explicitly sets the > global value of an option? I.e. something like ".module", taking the > same options as ".set". Better names welcome. :-) Use of a different directive to actually affect the overall mode of a module sounds like a good plan and it avoids the weird behaviour. The only thing specifically needed is that the assembly file records the mode it was written for. Getting the wrong command line option would otherwise lead to unusual runtime failures. We have been/are still discussing this point so it's no surprise you have commented on it too. I'll wait for any further comments on this area and update accordingly. > The scheme allows an ifunc to request a mode and effectively gives the > choice to the firstcomer. Every other ifunc then has to live with the > choice. I don't think that's a good idea, since the order that ifuncs > are called isn't well-defined or under direct user control. > > Since ifuncs would have to live with a choice made by other ifuncs, in > practice they must all be prepared to deal with FR=1 if linked into a > fully-modeless or FR1 program, and to deal with FR=0 if linked into a > fully-modeless or FR0 program. So IMO the dynamic linker should simply > set FR=1 for modeless programs if the hardware supports it and set it to > 0 for modeless programs otherwise, like you say in the first paragraph > of 9.4. The ifunc interaction should possibly be moved to a different proposal. We could reduce this down to a simple statem
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds wrote: > > Let me think about it some more, but my gut feel is that just tweaking > the definition of what "ordered" means is sufficient. > > So to go back to the suggested ordering rules (ignoring the "restrict" > part, which is just to clarify that ordering through other means to > get to the object doesn't matter), I suggested: > > "the consume ordering guarantees the ordering between that > atomic read and the accesses to the object that the pointer > points to" > > and I think the solution is to just say that this ordering acts as a > fence. It doesn't say exactly *where* the fence is, but it says that > there is *some* fence between the load of the pointer and any/all > accesses to the object through that pointer. I'm wrong. That doesn't work. At all. There is no ordering except through the pointer chain. So I think saying just that, and nothing else (no magic fences, no nothing) is the right thing: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to directly or indirectly through a chain of pointers" The thing is, anything but a chain of pointers (and maybe relaxing it to "indexes in tables" in addition to pointers) doesn't really work. The current standard tries to break it at "obvious" points that can lose the data dependency (either by turning it into a control dependency, or by just dropping the value, like the left-hand side of a comma-expression), but the fact is, it's broken. It's broken not just because the value can be lost other ways (ie the "p-p" example), it's broken because the value can be turned into a control dependency so many other ways too. Compilers regularly turn arithmetic ops with logical comparisons into branches. So an expression like "a = !!ptr" carries a dependency in the current C standard, but it's entirely possible that a compiler ends up turning it into a compare-and-branch rather than a compare-and-set-conditional, depending on just exactly how "a" ends up being used. That's true even on an architecture like ARM that has a lot of conditional instructions (there are way less if you compile for Thumb, for example, but compilers also do things like "if there are more than N predicated instructions I'll just turn it into a branch-over instead"). So I think the C standard needs to just explicitly say that you can walk a chain of pointers (with that possible "indexes in arrays" extension), and nothing more. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 4:57 PM, Linus Torvalds wrote: > On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds > wrote: >> >> Let me think about it some more, but my gut feel is that just tweaking >> the definition of what "ordered" means is sufficient. >> >> So to go back to the suggested ordering rules (ignoring the "restrict" >> part, which is just to clarify that ordering through other means to >> get to the object doesn't matter), I suggested: >> >> "the consume ordering guarantees the ordering between that >> atomic read and the accesses to the object that the pointer >> points to" >> >> and I think the solution is to just say that this ordering acts as a >> fence. It doesn't say exactly *where* the fence is, but it says that >> there is *some* fence between the load of the pointer and any/all >> accesses to the object through that pointer. > > I'm wrong. That doesn't work. At all. There is no ordering except > through the pointer chain. > > So I think saying just that, and nothing else (no magic fences, no > nothing) is the right thing: > > "the consume ordering guarantees the ordering between that > atomic read and the accesses to the object that the pointer > points to directly or indirectly through a chain of pointers" To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of a int * orders against any 'int' memory operation but not against 'float' memory operations? Eh ... Just jumping in to throw in my weird-2-cents. Richard.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:27 AM, Richard Biener wrote: > > To me that reads like > > int i; > int *q = &i; > int **p = &q; > > atomic_XXX (p, CONSUME); > > orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they > want to say that it orders against aliased storage - but then go further > and include "indirectly through a chain of pointers"?! Thus an > atomic read of a int * orders against any 'int' memory operation but > not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that "int" (the "int" can be part of a structure that then has pointers to other structures etc). So in your example, ptr = atomic_read(p, CONSUME); would indeed order against the subsequent access of the chain through *that* pointer (the whole "restrict" thing that I left out as a separate thing, which was probably a mistake), but certainly not against any integer pointer, and certainly not against any aliasing pointer chains. So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q') _and_ '**ptr' (getting 'i'), but nothing else - including just the aliasing access of dereferencing 'i' directly. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:37 AM, Linus Torvalds wrote: > > So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q') > _and_ '**ptr' (getting 'i'), but nothing else - including just the > aliasing access of dereferencing 'i' directly. Btw, what CPU architects and memory ordering guys tend to do in documentation is give a number of "litmus test" pseudo-code sequences to show the effects and intent of the language. I think giving those kinds of litmus tests for both "this is ordered" and "this is not ordered" cases like the above is would be a great clarification. Partly because the language is going to be somewhat legalistic and thus hard to wrap your mind around, and partly to really hit home the *intent* of the language, which I think is actually fairly clear to both compiler writers and to programmers. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: > > To me that reads like > > > > int i; > > int *q = &i; > > int **p = &q; > > > > atomic_XXX (p, CONSUME); > > > > orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they > > want to say that it orders against aliased storage - but then go further > > and include "indirectly through a chain of pointers"?! Thus an > > atomic read of a int * orders against any 'int' memory operation but > > not against 'float' memory operations? > > No, it's not about type at all, and the "chain of pointers" can be > much more complex than that, since the "int *" can point to within an > object that contains other things than just that "int" (the "int" can > be part of a structure that then has pointers to other structures > etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers"(dereferences I assume), e.g. if p is result of consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return &ptr->here->there->next; } B * p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptrderef chain business would be only an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case, assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Ciao, Michael.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 07:57:24AM -0800, Linus Torvalds wrote: > On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds > wrote: > > > > Let me think about it some more, but my gut feel is that just tweaking > > the definition of what "ordered" means is sufficient. > > > > So to go back to the suggested ordering rules (ignoring the "restrict" > > part, which is just to clarify that ordering through other means to > > get to the object doesn't matter), I suggested: > > > > "the consume ordering guarantees the ordering between that > > atomic read and the accesses to the object that the pointer > > points to" > > > > and I think the solution is to just say that this ordering acts as a > > fence. It doesn't say exactly *where* the fence is, but it says that > > there is *some* fence between the load of the pointer and any/all > > accesses to the object through that pointer. > > I'm wrong. That doesn't work. At all. There is no ordering except > through the pointer chain. > > So I think saying just that, and nothing else (no magic fences, no > nothing) is the right thing: > > "the consume ordering guarantees the ordering between that > atomic read and the accesses to the object that the pointer > points to directly or indirectly through a chain of pointers" > > The thing is, anything but a chain of pointers (and maybe relaxing it > to "indexes in tables" in addition to pointers) doesn't really work. > > The current standard tries to break it at "obvious" points that can > lose the data dependency (either by turning it into a control > dependency, or by just dropping the value, like the left-hand side of > a comma-expression), but the fact is, it's broken. > > It's broken not just because the value can be lost other ways (ie the > "p-p" example), it's broken because the value can be turned into a > control dependency so many other ways too. > > Compilers regularly turn arithmetic ops with logical comparisons into > branches. So an expression like "a = !!ptr" carries a dependency in > the current C standard, but it's entirely possible that a compiler > ends up turning it into a compare-and-branch rather than a > compare-and-set-conditional, depending on just exactly how "a" ends up > being used. That's true even on an architecture like ARM that has a > lot of conditional instructions (there are way less if you compile for > Thumb, for example, but compilers also do things like "if there are > more than N predicated instructions I'll just turn it into a > branch-over instead"). > > So I think the C standard needs to just explicitly say that you can > walk a chain of pointers (with that possible "indexes in arrays" > extension), and nothing more. I am comfortable with this. My desire for also marking the later pointers does not make sense without some automated way of validating them, which I don't immediately see a way to do. So let me try laying out the details. Sticking with pointers for the moment, if we reach agreement on these, I will try expanding to integers. 1. A pointer value obtained from a memory_order_consume load is part of a pointer chain. I am calling the pointer itself a "chained pointer" for the moment. 2. Note that it is the value that qualifies as being chained, not the variable. For example, given pointer variable might hold a chained pointer at one point in the code, then a non-chained pointer later. Therefore, "q = p", where "q" is a pointer and "p" is a chained pointer results in "q" containing a chained pointer. 3. Adding or subtracting an integer to/from a chained pointer results in another chained pointer in that same pointer chain. 4. Bitwise operators ("&", "|", "^", and I suppose also "~") applied to a chained pointer and an integer results in another chained pointer in that same pointer chain. 5. Consider a sequence as follows: dereference operator (unary "*", "[]", "->") optionally followed by a series of direct selection operators ("."), finally (and unconditionally) followed by a unary "&" operator. Applying such a sequence to a chained pointer results in another chained pointer in the same chain. Given a chained pointer "p", examples include "&p[3]", "&p->a.b.c.d.e.f.g", and "&*p". 6. The expression "p->f", where "p" is a chained pointer and "f" is a pointer, results in a chained pointer. FWIW, this means that pointer chains can overlap as in this example: p = atomic_load_explicit(&gp, memory_order_consume); q = atomic_load_explicit(&p->ap, memory_order_consume); x = q->a; This should be fine, I don't see any problems with this. 7. Applying a pointer cast to a chained pointer results in a chained pointer. 8. Applying any of the following operators to a chained pointer res
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote: > Hi, > > On Mon, 24 Feb 2014, Linus Torvalds wrote: > > > > To me that reads like > > > > > > int i; > > > int *q = &i; > > > int **p = &q; > > > > > > atomic_XXX (p, CONSUME); > > > > > > orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they > > > want to say that it orders against aliased storage - but then go further > > > and include "indirectly through a chain of pointers"?! Thus an > > > atomic read of a int * orders against any 'int' memory operation but > > > not against 'float' memory operations? > > > > No, it's not about type at all, and the "chain of pointers" can be > > much more complex than that, since the "int *" can point to within an > > object that contains other things than just that "int" (the "int" can > > be part of a structure that then has pointers to other structures > > etc). > > So, let me try to poke holes into your definition or increase my > understanding :) . You said "chain of pointers"(dereferences I assume), > e.g. if p is result of consume load, then access to > p->here->there->next->prev->stuff is supposed to be ordered with that load > (or only when that last load/store itself is also an atomic load or > store?). > > So, what happens if the pointer deref chain is partly hidden in some > functions: > > A * adjustptr (B *ptr) { return &ptr->here->there->next; } > B * p = atomic_XXX (&somewhere, consume); > adjustptr(p)->prev->stuff = bla; > > As far as I understood you, this whole ptrderef chain business would be > only an optimization opportunity, right? So if the compiler can't be sure > how p is actually used (as in my function-using case, assume adjustptr is > defined in another unit), then the consume load would simply be > transformed into an acquire (or whatever, with some barrier I mean)? Only > _if_ the compiler sees all obvious uses of p (indirectly through pointer > derefs) can it, yeah, do what with the consume load? Good point, I left that out of my list. Adding it: 13. By default, pointer chains do not propagate into or out of functions. In implementations having attributes, a [[carries_dependency]] may be used to mark a function argument or return as passing a pointer chain into or out of that function. If a function does not contain memory_order_consume loads and also does not contain [[carries_dependency]] attributes, then that function may be compiled using any desired dependency-breaking optimizations. The ordering effects are implementation defined when a given pointer chain passes into or out of a function through a parameter or return not marked with a [[carries_dependency]] attributed. Note that this last paragraph differs from the current standard, which would require ordering regardless. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz wrote: > > So, let me try to poke holes into your definition or increase my > understanding :) . You said "chain of pointers"(dereferences I assume), > e.g. if p is result of consume load, then access to > p->here->there->next->prev->stuff is supposed to be ordered with that load > (or only when that last load/store itself is also an atomic load or > store?). It's supposed to be ordered wrt the first load (the consuming one), yes. > So, what happens if the pointer deref chain is partly hidden in some > functions: No problem. The thing is, the ordering is actually handled by the CPU in all relevant cases. So the compiler doesn't actually need to *do* anything. All this legalistic stuff is just to describe the semantics and the guarantees. The problem is two cases: (a) alpha (which doesn't really order any accesses at all, not even dependent loads), but for a compiler alpha is actually trivial: just add a "rmb" instruction after the load, and you can't really do anything else (there's a few optimizations you can do wrt the rmb, but they are very specific and simple). So (a) is a "problem", but the solution is actually really simple, and gives very *strong* guarantees: on alpha, a "consume" ends up being basically the same as a read barrier after the load, with only very minimal room for optimization. (b) ARM and powerpc and similar architectures, that guarantee the data dependency as long as it is an *actual* data dependency, and never becomes a control dependency. On ARM and powerpc, control dependencies do *not* order accesses (the reasons boil down to essentially: branch prediction breaks the dependency, and instructions that come after the branch can be happily executed before the branch). But it's almost impossible to describe that in the standard, since compilers can (and very much do) turn a control dependency into a data dependency and vice versa. So the current standard tries to describe that "control vs data" dependency, and tries to limit it to a data dependency. It fails. It fails for multiple reasons - it doesn't allow for trivial optimizations that just remove the data dependency, and it also doesn't allow for various trivial cases where the compiler *does* turn the data dependency into a control dependency. So I really really think that the current C standard language is broken. Unfixably broken. I'm trying to change the "syntactic data dependency" that the current standard uses into something that is clearer and correct. The "chain of pointers" thing is still obviously a data dependency, but by limiting it to pointers, it simplifies the language, clarifies the meaning, avoids all syntactic tricks (ie "p-p" is clearly a syntactic dependency on "p", but does *not* involve in any way following the pointer) and makes it basically impossible for the compiler to break the dependency without doing value prediction, and since value prediction has to be disallowed anyway, that's a feature, not a bug. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 02:55:07PM +0100, Michael Matz wrote: > Hi, > > On Fri, 21 Feb 2014, Paul E. McKenney wrote: > > > > And with conservative I mean "everything is a source of a dependency, and > > > hence can't be removed, reordered or otherwise fiddled with", and that > > > includes code sequences where no atomic objects are anywhere in sight [1]. > > > In the light of that the only realistic way (meaning to not have to > > > disable optimization everywhere) to implement consume as currently > > > specified is to map it to acquire. At which point it becomes pointless. > > > > No, only memory_order_consume loads and [[carries_dependency]] > > function arguments are sources of dependency chains. > > I don't see [[carries_dependency]] in the C11 final draft (yeah, should > get a real copy, I know, but let's assume it's the same language as the > standard). Therefore, yes, only consume loads are sources of > dependencies. The problem with the definition of the "carries a > dependency" relation is not the sources, but rather where it stops. > It's transitively closed over "value of evaluation A is used as operand in > evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations > can contain function calls, so if there's _any_ chance that an operand of > an evaluation might even indirectly use something resulting from a consume > load then that evaluation must be compiled in a way to not break > dependency chains. > > I don't see a way to generally assume that e.g. the value of a function > argument can impossibly result from a consume load, therefore the compiler > must assume that all function arguments _can_ result from such loads, and > so must disable all depchain breaking optimization (which are many). > > > > [1] Simple example of what type of transformations would be disallowed: > > > > > > int getzero (int i) { return i - i; } > > > > This needs to be as follows: > > > > [[carries_dependency]] int getzero(int i [[carries_dependency]]) > > { > > return i - i; > > } > > > > Otherwise dependencies won't get carried through it. > > So, with the above do you agree that in absense of any other magic (see > below) the compiler is not allowed to transform my initial getzero() > (without the carries_dependency markers) implementation into "return 0;" > because of the C11 rules for "carries-a-dependency"? > > If so, do you then also agree that the specification of "carries a > dependency" is somewhat, err, shall we say, overbroad? >From what I can see, overbroad. The problem is that the C++11 standard defines how carries-dependency interacts with function calls and returns in 7.6.4, which describes the [[carries_dependency]] attribute. For example, 7.6.4p6 says: Function g’s second parameter has a carries_dependency attribute, but its first parameter does not. Therefore, function h’s first call to g carries a dependency into g, but its second call does not. The implementation might need to insert a fence prior to the second call to g. When C11 declined to take attributes, they also left out the part saying how carries-dependency interacts with functions. :-/ Might be fixed by now, checking up on it. One could argue that the bit about emitting fence instructions at function calls and returns is implied by the as-if rule even without this wording, but... > > > depchains don't matter, could _then_ optmize it to zero. But that's > > > insane, especially considering that it's hard to detect if a given > > > context > > > doesn't care for depchains, after all the depchain relation is > > > constructed > > > exactly so that it bleeds into nearly everywhere. So we would most of > > > the time have to assume that the ultimate context will be depchain-aware > > > and therefore disable many transformations. > > > > Any function that does not contain a memory_order_consume load and that > > doesn't have any arguments marked [[carries_dependency]] can be > > optimized just as before. > > And as such marker doesn't exist we must conservatively assume that it's > on _all_ parameters, so I'll stand by my claim. Or that you have to emit a fence instruction when a dependency chain enters or leaves a function in cases where all callers/calles are not visible to the compiler. My preference is that the ordering properties of a carries-dependency chain is implementation defined at the point that it enters or leaves a function without the marker, but others strongly disagreed. ;-) > > > Then inlining getzero would merely add another "# j.dep = i.dep" > > > relation, so depchains are still there but the value optimization can > > > happen before inlining. Having to do something like that I'd find > > > disgusting, and rather rewrite consume into acquire :) Or make the > > > depchain relation somehow realistically implementable. > > > > I was actually OK with arithmetic cancellation breaking the dependency
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 09:28:56AM -0800, Paul E. McKenney wrote: > On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote: > > Hi, > > > > On Mon, 24 Feb 2014, Linus Torvalds wrote: > > > > > > To me that reads like > > > > > > > > int i; > > > > int *q = &i; > > > > int **p = &q; > > > > > > > > atomic_XXX (p, CONSUME); > > > > > > > > orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they > > > > want to say that it orders against aliased storage - but then go further > > > > and include "indirectly through a chain of pointers"?! Thus an > > > > atomic read of a int * orders against any 'int' memory operation but > > > > not against 'float' memory operations? > > > > > > No, it's not about type at all, and the "chain of pointers" can be > > > much more complex than that, since the "int *" can point to within an > > > object that contains other things than just that "int" (the "int" can > > > be part of a structure that then has pointers to other structures > > > etc). > > > > So, let me try to poke holes into your definition or increase my > > understanding :) . You said "chain of pointers"(dereferences I assume), > > e.g. if p is result of consume load, then access to > > p->here->there->next->prev->stuff is supposed to be ordered with that load > > (or only when that last load/store itself is also an atomic load or > > store?). > > > > So, what happens if the pointer deref chain is partly hidden in some > > functions: > > > > A * adjustptr (B *ptr) { return &ptr->here->there->next; } > > B * p = atomic_XXX (&somewhere, consume); > > adjustptr(p)->prev->stuff = bla; > > > > As far as I understood you, this whole ptrderef chain business would be > > only an optimization opportunity, right? So if the compiler can't be sure > > how p is actually used (as in my function-using case, assume adjustptr is > > defined in another unit), then the consume load would simply be > > transformed into an acquire (or whatever, with some barrier I mean)? Only > > _if_ the compiler sees all obvious uses of p (indirectly through pointer > > derefs) can it, yeah, do what with the consume load? > > Good point, I left that out of my list. Adding it: > > 13. By default, pointer chains do not propagate into or out of functions. > In implementations having attributes, a [[carries_dependency]] > may be used to mark a function argument or return as passing > a pointer chain into or out of that function. > > If a function does not contain memory_order_consume loads and > also does not contain [[carries_dependency]] attributes, then > that function may be compiled using any desired dependency-breaking > optimizations. > > The ordering effects are implementation defined when a given > pointer chain passes into or out of a function through a parameter > or return not marked with a [[carries_dependency]] attributed. > > Note that this last paragraph differs from the current standard, which > would require ordering regardless. And there is also kill_dependency(), which needs to be added to the list in #8 of operators that take a chained pointer and return something that is not a chained pointer. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 09:38:46AM -0800, Linus Torvalds wrote: > On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz wrote: > > > > So, let me try to poke holes into your definition or increase my > > understanding :) . You said "chain of pointers"(dereferences I assume), > > e.g. if p is result of consume load, then access to > > p->here->there->next->prev->stuff is supposed to be ordered with that load > > (or only when that last load/store itself is also an atomic load or > > store?). > > It's supposed to be ordered wrt the first load (the consuming one), yes. > > > So, what happens if the pointer deref chain is partly hidden in some > > functions: > > No problem. > > The thing is, the ordering is actually handled by the CPU in all > relevant cases. So the compiler doesn't actually need to *do* > anything. All this legalistic stuff is just to describe the semantics > and the guarantees. > > The problem is two cases: > > (a) alpha (which doesn't really order any accesses at all, not even > dependent loads), but for a compiler alpha is actually trivial: just > add a "rmb" instruction after the load, and you can't really do > anything else (there's a few optimizations you can do wrt the rmb, but > they are very specific and simple). > > So (a) is a "problem", but the solution is actually really simple, and > gives very *strong* guarantees: on alpha, a "consume" ends up being > basically the same as a read barrier after the load, with only very > minimal room for optimization. > > (b) ARM and powerpc and similar architectures, that guarantee the > data dependency as long as it is an *actual* data dependency, and > never becomes a control dependency. > > On ARM and powerpc, control dependencies do *not* order accesses (the > reasons boil down to essentially: branch prediction breaks the > dependency, and instructions that come after the branch can be happily > executed before the branch). But it's almost impossible to describe > that in the standard, since compilers can (and very much do) turn a > control dependency into a data dependency and vice versa. > > So the current standard tries to describe that "control vs data" > dependency, and tries to limit it to a data dependency. It fails. It > fails for multiple reasons - it doesn't allow for trivial > optimizations that just remove the data dependency, and it also > doesn't allow for various trivial cases where the compiler *does* turn > the data dependency into a control dependency. > > So I really really think that the current C standard language is > broken. Unfixably broken. > > I'm trying to change the "syntactic data dependency" that the current > standard uses into something that is clearer and correct. > > The "chain of pointers" thing is still obviously a data dependency, > but by limiting it to pointers, it simplifies the language, clarifies > the meaning, avoids all syntactic tricks (ie "p-p" is clearly a > syntactic dependency on "p", but does *not* involve in any way > following the pointer) and makes it basically impossible for the > compiler to break the dependency without doing value prediction, and > since value prediction has to be disallowed anyway, that's a feature, > not a bug. OK, good point, please ignore my added thirteenth item in the list. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney wrote: > > 4. Bitwise operators ("&", "|", "^", and I suppose also "~") > applied to a chained pointer and an integer results in another > chained pointer in that same pointer chain. No. You cannot define it this way. Taking the value of a pointer and doing a bitwise operation that throws away all the bits (or even *most* of the bits) results in the compiler easily being able to turn the "chain" into a non-chain. The obvious example being "val & 0", but things like "val & 1" are in practice also something that compilers easily turn into control dependencies instead of data dependencies. So you can talk about things like "aligning the pointer value to object boundaries" etc, but it really cannot and *must* not be about the syntactic operations. The same goes for "adding and subtracting an integer". The *syntax* doesn't matter. It's about remaining information. Doing "p-(int)p" or "p+(-(int)p)" doesn't leave any information despite being "subtracting and adding an integer" at a syntactic level. Syntax is meaningless. Really. > 8. Applying any of the following operators to a chained pointer > results in something that is not a chained pointer: > "()", sizeof, "!", "*", "/", "%", ">>", "<<", "<", ">", "<=", > ">=", "==", "!=", "&&", and "||". Parenthesis? I'm assuming that you mean calling through the chained pointer. Also, I think all of /, * and % are perfectly fine, and might be used for that "aligning the pointer" operation that is fine. Linus
Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Matthew Fortune writes: > Richard Sandiford writes >> I understand the need to deprecate the current -mgp32 -mfp64 behaviour. >> I don't think we should deprecate -mfp64 itself though. Instead, why >> not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for >> modeless? So rather than deprecating the -mgp32 -mfp64 combination and >> adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in >> which the odd-numbered registers are call-clobbered rather than the old >> form in which they were call-saved. > > Extreme caution is the only reason why the design avoided changing fp64 > behaviour (basically in case anyone had serious objection). If you would > be happy with a change of behaviour for -mgp32 -mfp64 then that is a > great start. Yeah, my first impression is that keeping the current interface would be much better than adding a new set of options. >> AIUI the old form never really worked reliably due to things like >> newlib's setjmp not preserving the odd-numbered registers, so it doesn't >> seem worth keeping around. Also, the old form is identified by the GNU >> attribute (4, 4) so it'd be easy for the linker to reject links between >> the old and the new form. > > That is true. You will have noticed a number of changes over recent > months to start fixing fp64 as currently defined but having found this > new solution then such fixes are no longer important. The lack of > support for gp32 fp64 in linux is further reason to permit redefining > it. Would you be happy to retain the same builtin defines for FP64 if > changing its behaviour (i.e. __mips_fpr=64)? I think that should be OK. I suppose a natural follow-on question is what __mips_fpr should be for -mfpxx. Maybe just 0? If we want to be extra cautious we could define a second set of macros alongside the old ones. >> The scheme allows an ifunc to request a mode and effectively gives the >> choice to the firstcomer. Every other ifunc then has to live with the >> choice. I don't think that's a good idea, since the order that ifuncs >> are called isn't well-defined or under direct user control. >> >> Since ifuncs would have to live with a choice made by other ifuncs, in >> practice they must all be prepared to deal with FR=1 if linked into a >> fully-modeless or FR1 program, and to deal with FR=0 if linked into a >> fully-modeless or FR0 program. So IMO the dynamic linker should simply >> set FR=1 for modeless programs if the hardware supports it and set it to >> 0 for modeless programs otherwise, like you say in the first paragraph >> of 9.4. > > The ifunc interaction should possibly be moved to a different > proposal. We could reduce this down to a simple statement that > interaction with ifunc needs to be considered when finalising MIPS ifunc > support in general. Sounds good. >> You allow the mode to be changed midexecution if a new FR0 or FR1 object >> is loaded. Is it really worth supporting that though? >> It has the same problem as the ifuncs: once you've dlopen()ed an object, >> you fix the mode for the whole program, even after the dlclose(). >> Unless we know of specific cases where this is needed, maybe it would be >> safer to fix the mode before execution based on DT_NEEDED libraries and >> allow the mode of modeless programs to be overridden by an environment >> variable. > > Scanning the entire set of DT_NEEDED libraries would achieve most of > what full dynamic mode switching gives us, it is essentially the first > stage of the dynamic mode switching described in the proposal > anyway. However, I am concerned about excluding dlopen()ed objects from > mode selection though (not so worried about excluding ifunc, that could > just fix the mode before resolving the first one). One specific concern > is for Android where I believe we have the situation where native > applications are loaded as (a form of) shared library. This means a mode > requirement can be introduced late on. In an Android environment it is > unlikely to be acceptable to have to do something special to load an > application that happens to have a specific mode requirement so dynamic > selection is useful. This is more of a transitional problem than > anything but making it a smooth process is quite important. I'm also not > sure that there is much more effort required for a dynamic linker to > take account of dlopen()ed objects in addition to DT_NEEDED, changes are > needed in this code regardless. As far as GNU/Linux goes, if we do end up with a function in something like a modeless libm that is implemented as an FR-aware ifunc, that would force the choice to be made early anyway. So we have this very specific case where everything in the initial process is modeless, no ifuncs take advantage of the FR setting, and a dlopen()ed object was compiled as fr0 rather than modeless. I agree it's possible but it seems unlikely. I know nothing about the way Android loading works though. :-) Could you describe it in more detail?
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 10:14:01AM -0800, Linus Torvalds wrote: > On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney > wrote: > > > > 4. Bitwise operators ("&", "|", "^", and I suppose also "~") > > applied to a chained pointer and an integer results in another > > chained pointer in that same pointer chain. > > No. You cannot define it this way. Taking the value of a pointer and > doing a bitwise operation that throws away all the bits (or even > *most* of the bits) results in the compiler easily being able to turn > the "chain" into a non-chain. > > The obvious example being "val & 0", but things like "val & 1" are in > practice also something that compilers easily turn into control > dependencies instead of data dependencies. Indeed, most of the bits need to remain for this to work. > So you can talk about things like "aligning the pointer value to > object boundaries" etc, but it really cannot and *must* not be about > the syntactic operations. > > The same goes for "adding and subtracting an integer". The *syntax* > doesn't matter. It's about remaining information. Doing "p-(int)p" or > "p+(-(int)p)" doesn't leave any information despite being "subtracting > and adding an integer" at a syntactic level. > > Syntax is meaningless. Really. Good points. How about the following replacements? 3. Adding or subtracting an integer to/from a chained pointer results in another chained pointer in that same pointer chain. The results of addition and subtraction operations that cancel the chained pointer's value (for example, "p-(long)p" where "p" is a pointer to char) are implementation defined. 4. Bitwise operators ("&", "|", "^", and I suppose also "~") applied to a chained pointer and an integer for the purposes of alignment and pointer translation results in another chained pointer in that same pointer chain. Other uses of bitwise operators on chained pointers (for example, "p|~0") are implementation defined. > > 8. Applying any of the following operators to a chained pointer > > results in something that is not a chained pointer: > > "()", sizeof, "!", "*", "/", "%", ">>", "<<", "<", ">", "<=", > > ">=", "==", "!=", "&&", and "||". > > Parenthesis? I'm assuming that you mean calling through the chained pointer. Yes, good point. Of course, parentheses for grouping just pass the value through without affecting the chained-ness. > Also, I think all of /, * and % are perfectly fine, and might be used > for that "aligning the pointer" operation that is fine. Something like this? char *p; p = p - (unsigned long)p % 8; I was thinking of this as subtraction -- the "p" gets moduloed by 8, which loses the chained-pointer designation. But that is OK because that designation gets folded back in by the subtraction. Am I missing a use case? That leaves things like this one: p = (p / 8) * 8; I cannot think of any other legitimate use for "/" and "*". Here is an updated #8 and a new 8a: 8. Applying any of the following operators to a chained pointer results in something that is not a chained pointer: function call "()", sizeof, "!", "%", ">>", "<<", "<", ">", "<=", ">=", "==", "!=", "&&", "||", and "kill_dependency()". 8a. Dividing a chained pointer by an integer and multiplying it by that same integer (for example, to align that pointer) results in a chained pointer in that same pointer chain. The ordering effects of other uses of infix "*" and "/" on chained pointers are implementation defined. Does that capture it? Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney wrote: > > Good points. How about the following replacements? > > 3. Adding or subtracting an integer to/from a chained pointer > results in another chained pointer in that same pointer chain. > The results of addition and subtraction operations that cancel > the chained pointer's value (for example, "p-(long)p" where "p" > is a pointer to char) are implementation defined. > > 4. Bitwise operators ("&", "|", "^", and I suppose also "~") > applied to a chained pointer and an integer for the purposes > of alignment and pointer translation results in another > chained pointer in that same pointer chain. Other uses > of bitwise operators on chained pointers (for example, > "p|~0") are implementation defined. Quite frankly, I think all of this language that is about the actual operations is irrelevant and wrong. It's not going to help compiler writers, and it sure isn't going to help users that read this. Why not just talk about "value chains" and that any operations that restrict the value range severely end up breaking the chain. There is no point in listing the operations individually, because every single operation *can* restrict things. Listing individual operations and depdendencies is just fundamentally wrong. For example, let's look at this obvious case: int q,*p = atomic_read(&pp, consume); .. nothing modifies 'p' .. q = *p; and there are literally *zero* operations that modify the value change, so obviously the two operations are ordered, right? Wrong. What if the "nothing modifies 'p'" part looks like this: if (p != &myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever So my earlier suggestion tried to avoid this by having the restrict thing, so the above wouldn't work. But your (and the current C standards) attempt to define this with some kind of syntactic dependency carrying chain will _inevitably_ get this wrong, and/or be too horribly complex to actually be useful. Seriously, don't do it. I claim that all your attempts to do this crazy syntactic "these operations maintain the chained pointers" is broken. The fact that you have changed "carries a dependency" to "chained pointer" changes NOTHING. So just give it up. It's a fundamentally broken model. It's *wrong*, but even more importantly, it's not even *useful*, since it ends up being too complicated for a compiler writer or a programmer to understand. I really really really think you need to do this at a higher conceptual level, get away from all these idiotic "these operations maintain the chain" crap. Because there *is* no such list. Quite frankly, any standards text that has that "[[carries_dependency]]" or "[[kill_dependency]]" or whatever attribute is broken. It's broken because the whole concept is TOTALLY ALIEN to the compiler writer or the programmer. It makes no sense. It's purely legalistic language that has zero reason to exist. It's non-intuitive for everybody. And *any* language that talks about the individual operations only encourages people to play legalistic games that actually defeat the whole purpose (namely that all relevant CPU's are going to implement that consume ordering guarantee natively, with no extra code generation rules AT ALL). So any time you talk about some random detail of some operation, somebody is going to come up with a "trick" that defeats things. So don't do it. There is absolutely ZERO difference between any of the arithmetic operations, be they bitwise, additive, multiplicative, shifts, whatever. The *only* thing that matters for all of them is whether they are "value-preserving", or whether they drop so much information that the compiler might decide to use a control dependency instead. That's true for every single one of them. Similarly, actual true control dependencies that limit the problem space sufficiently that the actual pointer value no longer has significant information in it (see the above example) are also things that remove information to the point that only a control dependency remains. Even when the value itself is not modified in any way at all. Linus
[GSoc 2014] OpenMP runtime improvements
Hi, I am interested in contributing to the OpenMP project as part of GSoC 2014. I am mailing to discuss ideas, to see if someone is willing to mentor me and also if possible, to get a test patch in before the end of application process so as to verify my qualification! Looking forward to your feedback! Thanks, Pranith
build breakage in libsanitizer
Hey all, I've just tried building Trunk from branches/google/main/ and got the following failure in libsanitizer: In file included from /mnt/test/rvbd-root-gcc49/usr/include/features.h:356:0, from /mnt/test/rvbd-root-gcc49/usr/include/arpa/inet.h:22, from ../../../../gcc49-google-main/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:20: /mnt/test/rvbd-root-gcc49/usr/include/sys/timex.h:145:31: error: expected initializer before 'throw' __asm__ ("ntp_gettimex") __THROW; ^ This is an intel-to-intel cross-compiler that is being built against linux-2.6.32.27 headers and glibc-2.12 (with a few redhat patches). The system header in question contains the following: #if defined __GNUC__ && __GNUC__ >= 2 extern int ntp_gettime (struct ntptimeval *__ntv) __asm__ ("ntp_gettimex") __THROW; #else extern int ntp_gettimex (struct ntptimeval *__ntv) __THROW; # define ntp_gettime ntp_gettimex #endif This same header works against gcc4.8 (or is not used). Could someone clarify what libsanitizer needs here and what gcc dislikes please? Which should I patch? Thanks in advance! Oleg.
Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
On 02/24/2014 10:42 AM, Richard Sandiford wrote: >... >> AIUI the old form never really worked reliably due to things like >> newlib's setjmp not preserving the odd-numbered registers, so it doesn't >>> seem worth keeping around. Also, the old form is identified by the GNU >>> attribute (4, 4) so it'd be easy for the linker to reject links between >>> the old and the new form. >> >> That is true. You will have noticed a number of changes over recent >> months to start fixing fp64 as currently defined but having found this >> new solution then such fixes are no longer important. The lack of >> support for gp32 fp64 in linux is further reason to permit redefining >> it. Would you be happy to retain the same builtin defines for FP64 if >> changing its behaviour (i.e. __mips_fpr=64)? > >I think that should be OK. I suppose a natural follow-on question >is what __mips_fpr should be for -mfpxx. Maybe just 0? I think we should think carefully about just making -mfp64 just disappear. The support has existed for bare iron for quite a while, and we do internal testing of MSA using -mfp64. I'd rather avoid a flag day. It would be good to continue recognizing that object files with attribute (4, 4) (-mfp64) are not compatible with other objects. Note that adding EF_MIPS_FP64 is a recent change, which was made to enable Linux kernel support, and there probably is little code out there that is dependent on it. I am more comfortable in changing the interpretation of EF_MIPS_FP64 to simply mean that the process is to be set with FR=1 mode as apposed to FR=0. This will allow us to ease our way off of -mfp64 to -mfr1 mode for those environments where FR=1 is really needed. Doug >...
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 11:54:46AM -0800, Linus Torvalds wrote: > On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney > wrote: > > > > Good points. How about the following replacements? > > > > 3. Adding or subtracting an integer to/from a chained pointer > > results in another chained pointer in that same pointer chain. > > The results of addition and subtraction operations that cancel > > the chained pointer's value (for example, "p-(long)p" where "p" > > is a pointer to char) are implementation defined. > > > > 4. Bitwise operators ("&", "|", "^", and I suppose also "~") > > applied to a chained pointer and an integer for the purposes > > of alignment and pointer translation results in another > > chained pointer in that same pointer chain. Other uses > > of bitwise operators on chained pointers (for example, > > "p|~0") are implementation defined. > > Quite frankly, I think all of this language that is about the actual > operations is irrelevant and wrong. > > It's not going to help compiler writers, and it sure isn't going to > help users that read this. > > Why not just talk about "value chains" and that any operations that > restrict the value range severely end up breaking the chain. There is > no point in listing the operations individually, because every single > operation *can* restrict things. Listing individual operations and > depdendencies is just fundamentally wrong. > > For example, let's look at this obvious case: > >int q,*p = atomic_read(&pp, consume); >.. nothing modifies 'p' .. >q = *p; > > and there are literally *zero* operations that modify the value > change, so obviously the two operations are ordered, right? > > Wrong. > > What if the "nothing modifies 'p'" part looks like this: > > if (p != &myvariable) > return; > > and now any sane compiler will happily optimize "q = *p" into "q = > myvariable", and we're all done - nothing invalid was ever Yes, the compiler could do that. But it would still be required to carry a dependency from the memory_order_consume read to the "*p", which it could do by compiling "q = *p" rather than "q = myvariable" on the one hand or by emitting a memory-barrier instruction on the other. This was the point of #12: 12. A memory_order_consume load carries a dependency to any dereference operator (unary "*", "[]", and "->") in the resulting pointer chain. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney wrote: >> >> What if the "nothing modifies 'p'" part looks like this: >> >> if (p != &myvariable) >> return; >> >> and now any sane compiler will happily optimize "q = *p" into "q = >> myvariable", and we're all done - nothing invalid was ever > > Yes, the compiler could do that. But it would still be required to > carry a dependency from the memory_order_consume read to the "*p", But that's *BS*. You didn't actually listen to the main issue. Paul, why do you insist on this carries-a-dependency crap? It's broken. If you don't believe me, then believe the compiler person who already piped up and told you so. The "carries a dependency" model is broken. Get over it. No sane compiler will ever distinguish two different registers that have the same value from each other. No sane compiler will ever say "ok, register r1 has the exact same value as register r2, but r2 carries the dependency, so I need to make sure to pass r2 to that function or use it as a base pointer". And nobody sane should *expect* a compiler to distinguish two registers with the same value that way. So the whole model is broken. I gave an alternate model (the "restrict"), and you didn't seem to understand the really fundamental difference. It's not a language difference, it's a conceptual difference. In the broken "carries a dependency" model, you have fight all those aliases that can have the same value, and it is not a fight you can win. We've had the "p-p" examples, we've had the "p&0" examples, but the fact is, that "p==&myvariable" example IS EXACTLY THE SAME THING. All three of those things: "p-p", "p&0", and "p==&myvariable" mean that any compiler worth its salt now know that "p" carries no information, and will optimize it away. So please stop arguing against that. Whenever you argue against that simple fact, you are arguing against sane compilers. So *accept* the fact that some operations (and I guarantee that there are more of those than you can think of, and you can create them with various tricks using pretty much *any* feature in the C language) essentially take the data information away. And just accept the fact that then the ordering goes away too. So give up on "carries a dependency". Because there will be cases where that dependency *isn't* carried. The language of the standard needs to get *away* from the broken model, because otherwise the standard is broken. I suggest we instead talk about "litmus tests" and why certain code sequences are ordered, and others are not. So the code sequence I already mentioned is *not* ordered: Litmus test 1: p = atomic_read(pp, consume); if (p == &variable) return p->val; is *NOT* ordered, because the compiler can trivially turn this into "return variable.val", and break the data dependency. This is true *regardless* of any "carries a dependency" language, because that language is insane, and doesn't work when the different pieces here may be in different compilation units. BUT: Litmus test 2: p = atomic_read(pp, consume); if (p != &variable) return p->val; *IS* ordered, because while it looks syntactically a lot like "Litmus test 1", there is no sane way a compiler can use the knowledge that "p is not a pointer to a particular location" to break the data dependency. There is no way in hell that any sane "carries a dependency" model can get the simple tests above right. So give up on it already. "Carries a dependency" cannot work. It's a bad model. You're trying to describe the problem using the wrong tools. Note that my "restrict+pointer to object" language actually got this *right*. The "restrict" part made Litmus test 1 not ordered, because that "p == &variable" success case means that the pointer wasn't restricted, so the pre-requisite for ordering didn't exist. See? The "carries a dependency" is a broken model for this, but there are _other_ models that can work. You tried to rewrite my model into "carries a dependency". That *CANNOT* work. It's like trying to rewrite quantum physics into the Greek model of the four elements. They are not compatible models, and one of them can be shown to not work. Linus
RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Richard Sandiford writes > >> AIUI the old form never really worked reliably due to things like > >> newlib's setjmp not preserving the odd-numbered registers, so it > >> doesn't seem worth keeping around. Also, the old form is identified > >> by the GNU attribute (4, 4) so it'd be easy for the linker to reject > >> links between the old and the new form. > > > > That is true. You will have noticed a number of changes over recent > > months to start fixing fp64 as currently defined but having found this > > new solution then such fixes are no longer important. The lack of > > support for gp32 fp64 in linux is further reason to permit redefining > > it. Would you be happy to retain the same builtin defines for FP64 if > > changing its behaviour (i.e. __mips_fpr=64)? > > I think that should be OK. I suppose a natural follow-on question is > what __mips_fpr should be for -mfpxx. Maybe just 0? I'm doing just that in my experimental implementation of all this. > If we want to be extra cautious we could define a second set of macros > alongside the old ones. > > >> You allow the mode to be changed midexecution if a new FR0 or FR1 > >> object is loaded. Is it really worth supporting that though? > >> It has the same problem as the ifuncs: once you've dlopen()ed an > >> object, you fix the mode for the whole program, even after the > dlclose(). > >> Unless we know of specific cases where this is needed, maybe it would > >> be safer to fix the mode before execution based on DT_NEEDED > >> libraries and allow the mode of modeless programs to be overridden by > >> an environment variable. > > > > Scanning the entire set of DT_NEEDED libraries would achieve most of > > what full dynamic mode switching gives us, it is essentially the first > > stage of the dynamic mode switching described in the proposal anyway. > > However, I am concerned about excluding dlopen()ed objects from mode > > selection though (not so worried about excluding ifunc, that could > > just fix the mode before resolving the first one). One specific > > concern is for Android where I believe we have the situation where > > native applications are loaded as (a form of) shared library. This > > means a mode requirement can be introduced late on. In an Android > > environment it is unlikely to be acceptable to have to do something > > special to load an application that happens to have a specific mode > > requirement so dynamic selection is useful. This is more of a > > transitional problem than anything but making it a smooth process is > > quite important. I'm also not sure that there is much more effort > > required for a dynamic linker to take account of dlopen()ed objects in > > addition to DT_NEEDED, changes are needed in this code regardless. > > As far as GNU/Linux goes, if we do end up with a function in something > like a modeless libm that is implemented as an FR-aware ifunc, that > would force the choice to be made early anyway. So we have this very > specific case where everything in the initial process is modeless, no > ifuncs take advantage of the FR setting, and a dlopen()ed object was > compiled as fr0 rather than modeless. I agree it's possible but it > seems unlikely. A reasonable point. > I know nothing about the way Android loading works though. :-) Could you > describe it in more detail? Is it significantly different from glibc's > dynamic loader running a PIE? I am working from fragments of information on this aspect still so I need to get more clarification from Android developers. My current understanding is that native parts of applications are actually shared libraries and form part of, but not necessarily the entry to, an application. Since such a shared library can't be 'required' by anything it must be loaded explicitly. I'll get clarification but the potential need for dynamic mode switching in Android need not affect the decision that GNU/Linux takes. > >> If we do end up using ELF flags then maybe adding two new EF_MIPS_ABI > >> enums would be better. It's more likely to be trapped by old loaders > >> and avoids eating up those precious remaining bits. > > > > Sound's reasonable but I'm still trying to determine how this > > information can be propagated from loader to dynamic loader. > > The dynamic loader has access to the ELF headers so I didn't think it > would need any help. As I understand it the dynamic loader only has specific access to the program headers of the executable not the ELF headers. There is no question that the dynamic loader has access to DSO ELF headers but we need the start point too. > >> You didn't say specifically how a static program's crt code would > >> know whether it was linked as modeless or in a specific FR mode. > >> Maybe the linker could define a special hidden symbol? > > > > Why do you say crt rather than dlopen? The mode requirement should > > only matter if you want to change it and dlopen should be able to > > access information in the same way
Re: [LM-32] Code generation for address loading
FX MOREL writes: > Hi everyone, > > I am developing on a custom design using the LatticeMico32 > architecture and I use gcc 4.5.1 to compile C code for this arch. > > In this architecture, the loading of an address 0x always > takes two assembly instructions to fetch the address because > immediates are on 16 bits : > mvhi r1, 0x > ori r1, r1, 0x > ... > lw r2, r1 > > In my situation, nearly all the symbols are located in the same 64kB > region and their address share the same hi-part, so I am trying to > minimize the overload of always using two instructions when only one > is needed. > [additional details deleted] > Because the symbol mapping phase is done during linking, I have little > chance to know the future symbol address at code generation but is > there some way I could make this work ? > If I affect the symbol to a dedicated section (with the __attribute__ > ((section())) directive ), is there a way to know its section during > code generation ? > > I understand that I am asking for a very 'dangerous' advice but again, > this will only be a custom optimization for a custom design. Have you considered doing this through custom GNU linker relaxation work? I would try this before hacking away at the compiler. AG > > Thank you. > > F-X Morel
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 03:35:04PM -0800, Linus Torvalds wrote: > On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney > wrote: > >> > >> What if the "nothing modifies 'p'" part looks like this: > >> > >> if (p != &myvariable) > >> return; > >> > >> and now any sane compiler will happily optimize "q = *p" into "q = > >> myvariable", and we're all done - nothing invalid was ever > > > > Yes, the compiler could do that. But it would still be required to > > carry a dependency from the memory_order_consume read to the "*p", > > But that's *BS*. You didn't actually listen to the main issue. > > Paul, why do you insist on this carries-a-dependency crap? Sigh. Read on... > It's broken. If you don't believe me, then believe the compiler person > who already piped up and told you so. > > The "carries a dependency" model is broken. Get over it. > > No sane compiler will ever distinguish two different registers that > have the same value from each other. No sane compiler will ever say > "ok, register r1 has the exact same value as register r2, but r2 > carries the dependency, so I need to make sure to pass r2 to that > function or use it as a base pointer". > > And nobody sane should *expect* a compiler to distinguish two > registers with the same value that way. > > So the whole model is broken. > > I gave an alternate model (the "restrict"), and you didn't seem to > understand the really fundamental difference. It's not a language > difference, it's a conceptual difference. > > In the broken "carries a dependency" model, you have fight all those > aliases that can have the same value, and it is not a fight you can > win. We've had the "p-p" examples, we've had the "p&0" examples, but > the fact is, that "p==&myvariable" example IS EXACTLY THE SAME THING. > > All three of those things: "p-p", "p&0", and "p==&myvariable" mean > that any compiler worth its salt now know that "p" carries no > information, and will optimize it away. > > So please stop arguing against that. Whenever you argue against that > simple fact, you are arguing against sane compilers. So let me see if I understand your reasoning. My best guess is that it goes something like this: 1. The Linux kernel contains code that passes pointers from rcu_dereference() through external functions. 2. Code in the Linux kernel expects the normal RCU ordering guarantees to be in effect even when external functions are involved. 3. When compiling one of these external functions, the C compiler has no way of knowing about these RCU ordering guarantees. 4. The C compiler might therefore apply any and all optimizations to these external functions. 5. This in turn implies that we the only way to prohibit any given optimization from being applied to the results obtained from rcu_dereference() is to prohibit that optimization globally. 6. We have to be very careful what optimizations are globally prohibited, because a poor choice could result in unacceptable performance degradation. 7. Therefore, the only operations that can be counted on to maintain the needed RCU orderings are those where the compiler really doesn't have any choice, in other words, where any reasonable way of computing the result will necessarily maintain the needed ordering. Did I get this right, or am I confused? > So *accept* the fact that some operations (and I guarantee that there > are more of those than you can think of, and you can create them with > various tricks using pretty much *any* feature in the C language) > essentially take the data information away. And just accept the fact > that then the ordering goes away too. Actually, the fact that there are more potential optimizations than I can think of is a big reason for my insistence on the carries-a-dependency crap. My lack of optimization omniscience makes me very nervous about relying on there never ever being a reasonable way of computing a given result without preserving the ordering. > So give up on "carries a dependency". Because there will be cases > where that dependency *isn't* carried. > > The language of the standard needs to get *away* from the broken > model, because otherwise the standard is broken. > > I suggest we instead talk about "litmus tests" and why certain code > sequences are ordered, and others are not. OK... > So the code sequence I already mentioned is *not* ordered: > > Litmus test 1: > > p = atomic_read(pp, consume); > if (p == &variable) > return p->val; > >is *NOT* ordered, because the compiler can trivially turn this into > "return variable.val", and break the data dependency. Right, given your model, the compiler is free to produce code that doesn't order the load from pp against the load from p->val. >This is true *regardless* of any "carries a dependency" language, > because that language
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 3:35 PM, Linus Torvalds wrote: > > Litmus test 1: > > p = atomic_read(pp, consume); > if (p == &variable) > return p->val; > >is *NOT* ordered Btw, don't get me wrong. I don't _like_ it not being ordered, and I actually did spend some time thinking about my earlier proposal on strengthening the 'consume' ordering. I have for the last several years been 100% convinced that the Intel memory ordering is the right thing, and that people who like weak memory ordering are wrong and should try to avoid reproducing if at all possible. But given that we have memory orderings like power and ARM, I don't actually see a sane way to get a good strong ordering. You can teach compilers about cases like the above when they actually see all the code and they could poison the value chain etc. But it would be fairly painful, and once you cross object files (or even just functions in the same compilation unit, for that matter), it goes from painful to just "ridiculously not worth it". So I think the C semantics should mirror what the hardware gives us - and do so even in the face of reasonable optimizations - not try to do something else that requires compilers to treat "consume" very differently. If people made me king of the world, I'd outlaw weak memory ordering. You can re-order as much as you want in hardware with speculation etc, but you should always *check* your speculation and make it *look* like you did everything in order. Which is pretty much the intel memory ordering (ignoring the write buffering). Linus