Re: asking your advice about bug

2014-02-24 Thread Tobias Grosser

On 02/17/2014 06:50 PM, Roman Gareev wrote:



Hi Tobias,


  thanks for the answer!


Hi Roman,

sorry for missing this mail.



  I think that the segfault is being caused by NULL arguments being passedto 
compute_deps
by loop_level_carries_dependences. *This is **causing **an* *assignment of**
NULL values to the following parameters of **compute_deps:* must_raw_no_source,
may_raw_no_source, must_war_no_source, may_war_no_source,
must_waw_no_source, may_waw_no_source. They are being passed to 
subtract_commutative_associative_deps
and dereferenced in the following statements:


  *must_raw_no_source = isl_union_map_subtract (*must_raw_no_source,
x_must_raw_no_source);


  *may_raw_no_source = isl_union_map_subtract (*may_raw_no_source,
x_may_raw_no_source);


  *must_war_no_source = isl_union_map_subtract (*must_war_no_source,
x_must_war_no_source);


  *may_war_no_source = isl_union_map_subtract (*may_war_no_source,
x_may_war_no_source);


  *must_waw_no_source = isl_union_map_subtract (*must_waw_no_source,
x_must_waw_no_source);


  *may_waw_no_source = isl_union_map_subtract (*may_waw_no_source,
x_may_waw_no_source);

  This is the reason of segfault. (All functions mentioned above are located
in gcc/graphite-dependences.c)



Interesting analysis.


  I think that this can be solved by the addition to
subtract_commutative_associative_deps of NULL checking of the following
variables: must_raw_no_source, may_raw_no_source, must_war_no_source,
may_war_no_source, must_waw_no_source, may_waw_no_source. I've implemented
this in the patch, which can be found below.


Yes, this would be a 'solution'. However, I am in fact surprised that 
those variables are NULL at all. Do you have an idea why this is the 
case? Understanding this would help to understand if the patch you 
propose is actually the right solution or if it is just hiding a 
previous bug.


Cheers,
Tobias


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Michael Matz
Hi,

On Fri, 21 Feb 2014, Paul E. McKenney wrote:

> > And with conservative I mean "everything is a source of a dependency, and 
> > hence can't be removed, reordered or otherwise fiddled with", and that 
> > includes code sequences where no atomic objects are anywhere in sight [1].
> > In the light of that the only realistic way (meaning to not have to 
> > disable optimization everywhere) to implement consume as currently 
> > specified is to map it to acquire.  At which point it becomes pointless.
> 
> No, only memory_order_consume loads and [[carries_dependency]]
> function arguments are sources of dependency chains.

I don't see [[carries_dependency]] in the C11 final draft (yeah, should 
get a real copy, I know, but let's assume it's the same language as the 
standard).  Therefore, yes, only consume loads are sources of 
dependencies.  The problem with the definition of the "carries a 
dependency" relation is not the sources, but rather where it stops.  
It's transitively closed over "value of evaluation A is used as operand in 
evaluation B", with very few exceptions as per 5.1.2.4#14.  Evaluations 
can contain function calls, so if there's _any_ chance that an operand of 
an evaluation might even indirectly use something resulting from a consume 
load then that evaluation must be compiled in a way to not break 
dependency chains.

I don't see a way to generally assume that e.g. the value of a function 
argument can impossibly result from a consume load, therefore the compiler 
must assume that all function arguments _can_ result from such loads, and 
so must disable all depchain breaking optimization (which are many).

> > [1] Simple example of what type of transformations would be disallowed:
> > 
> > int getzero (int i) { return i - i; }
> 
> This needs to be as follows:
> 
> [[carries_dependency]] int getzero(int i [[carries_dependency]])
> {
>   return i - i;
> }
> 
> Otherwise dependencies won't get carried through it.

So, with the above do you agree that in absense of any other magic (see 
below) the compiler is not allowed to transform my initial getzero() 
(without the carries_dependency markers) implementation into "return 0;" 
because of the C11 rules for "carries-a-dependency"?

If so, do you then also agree that the specification of "carries a 
dependency" is somewhat, err, shall we say, overbroad?

> > depchains don't matter, could _then_ optmize it to zero.  But that's 
> > insane, especially considering that it's hard to detect if a given context 
> > doesn't care for depchains, after all the depchain relation is constructed 
> > exactly so that it bleeds into nearly everywhere.  So we would most of 
> > the time have to assume that the ultimate context will be depchain-aware 
> > and therefore disable many transformations.
> 
> Any function that does not contain a memory_order_consume load and that 
> doesn't have any arguments marked [[carries_dependency]] can be 
> optimized just as before.

And as such marker doesn't exist we must conservatively assume that it's 
on _all_ parameters, so I'll stand by my claim.

> > Then inlining getzero would merely add another "# j.dep = i.dep" 
> > relation, so depchains are still there but the value optimization can 
> > happen before inlining.  Having to do something like that I'd find 
> > disgusting, and rather rewrite consume into acquire :)  Or make the 
> > depchain relation somehow realistically implementable.
> 
> I was actually OK with arithmetic cancellation breaking the dependency 
> chains.  Others on the committee felt otherwise, and I figured that (1) 
> I wouldn't be writing that kind of function anyway and (2) they knew 
> more about writing compilers than I.  I would still be OK saying that 
> things like "i-i", "i*0", "i%1", "i&0", "i|~0" and so on just break the 
> dependency chain.

Exactly.  I can see the problem that people had with that, though.  There 
are very many ways to write conceiled zeros (or generally neutral elements 
of the function in question).  My getzero() function is one (it could e.g. 
be an assembler implementation).  The allowance to break dependency chains 
would have to apply to such cancellation as well, and so can't simply 
itemize all cases in which cancellation is allowed.  Rather it would have 
had to argue about something like "value dependency", ala "evaluation B 
depends on A, if there exist at least two different values A1 and A2 
(results from A), for which evaluation B (with otherwise same operands) 
yields different values B1 and B2".

Alas, it doesn't, except if you want to understand the term "the value of 
A is used as an operand of B" in that way.  Even then you'd still have the 
second case of the depchain definition, via intermediate not even atomic 
memory stores and loads to make two evaluations be ordered per 
carries-a-dependency.

And even that understanding of "is used" wouldn't be enough, because there 
are cases where the cancellation happens in steps, and where it int

Non-temporal move

2014-02-24 Thread Gopalasubramanian, Ganesh
I could see "storent" pattern in x86 machine descriptions (in sse.md)., but 
internals doc don't mention it. Should we add a description about this in the 
internals doc?

Regards
Ganesh



RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-24 Thread Matthew Fortune
Richard Sandiford  writes
> Matthew Fortune  writes:
> > All,
> >
> > Imagination Technologies would like to introduce the design of an O32
> > ABI extension for MIPS to allow it to be used in conjunction with MIPS
> > FPUs having 64-bit floating-point registers. This is a wide-reaching
> > design that involves changes to all components of the MIPS toolchain
> > it is being posted to GCC first and will progress on to other tools.
> > This ABI extension is compatible with the existing O32 ABI definition
> > and will not require the introduction of new build variants
> (multilibs).
> >
> > The design document is relatively large and has been placed on the
> > MIPS Compiler Team wiki to facilitate review:
> >
> > http://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinkin
> > g
> 
> Looks good to me.  It'll be interesting to see whether making the odd-
> numbered call-saved-in-fr0 registers available for frx pays off or
> whether it ends up being better to avoid them.

Indeed, I suspect they should be avoided except for leaf functions. You would 
have to be pretty desperate for a register if you use the caller-and-callee 
save registers!

> I understand the need to deprecate the current -mgp32 -mfp64 behaviour.
> I don't think we should deprecate -mfp64 itself though.  Instead, why
> not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for
> modeless?  So rather than deprecating the -mgp32 -mfp64 combination and
> adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in
> which the odd-numbered registers are call-clobbered rather than the old
> form in which they were call-saved.

Extreme caution is the only reason why the design avoided changing fp64 
behaviour (basically in case anyone had serious objection). If you would be 
happy with a change of behaviour for -mgp32 -mfp64 then that is a great start.
 
> AIUI the old form never really worked reliably due to things like
> newlib's setjmp not preserving the odd-numbered registers, so it doesn't
> seem worth keeping around.  Also, the old form is identified by the GNU
> attribute (4, 4) so it'd be easy for the linker to reject links between
> the old and the new form.

That is true. You will have noticed a number of changes over recent months to 
start fixing fp64 as currently defined but having found this new solution then 
such fixes are no longer important. The lack of support for gp32 fp64 in linux 
is further reason to permit redefining it. Would you be happy to retain the 
same builtin defines for FP64 if changing its behaviour (i.e. __mips_fpr=64)?
 
> The corresponding asm would then be ".set fp=xx".
> 
> Either way, a new .set option would be better than a specific .fr
> directive because it gives you access to the option stack (".set
> push"/".set pop").
> 
> I'm not sure about:
> 
>   If an assembly directive is seen prior to the start of the text
>   section then this modifies the default mode for the module.
> 
> This isn't how any of the existing options work and I think the
> inconsistency would be confusing.  It also means that if the first
> function in a file happens to force a local mode (e.g.
> because it's an ifunc implementation) then you'd have to remember to
> write:
> 
>   .fr x
>   .fr 1
> 
> so that the first sets the mode for the module and the second sets it
> for the first function.  The different treatment of the two lines
> wouldn't be obvious at first glance.
> 
> How about instead having a separate directive that explicitly sets the
> global value of an option?  I.e. something like ".module", taking the
> same options as ".set".  Better names welcome. :-)

Use of a different directive to actually affect the overall mode of a module 
sounds like a good plan and it avoids the weird behaviour. The only thing 
specifically needed is that the assembly file records the mode it was written 
for. Getting the wrong command line option would otherwise lead to unusual 
runtime failures. We have been/are still discussing this point so it's no 
surprise you have commented on it too. I'll wait for any further comments on 
this area and update accordingly.
 
> The scheme allows an ifunc to request a mode and effectively gives the
> choice to the firstcomer.  Every other ifunc then has to live with the
> choice.  I don't think that's a good idea, since the order that ifuncs
> are called isn't well-defined or under direct user control.
> 
> Since ifuncs would have to live with a choice made by other ifuncs, in
> practice they must all be prepared to deal with FR=1 if linked into a
> fully-modeless or FR1 program, and to deal with FR=0 if linked into a
> fully-modeless or FR0 program.  So IMO the dynamic linker should simply
> set FR=1 for modeless programs if the hardware supports it and set it to
> 0 for modeless programs otherwise, like you say in the first paragraph
> of 9.4.

The ifunc interaction should possibly be moved to a different proposal. We 
could reduce this down to a simple statem

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds
 wrote:
>
> Let me think about it some more, but my gut feel is that just tweaking
> the definition of what "ordered" means is sufficient.
>
> So to go back to the suggested ordering rules (ignoring the "restrict"
> part, which is just to clarify that ordering through other means to
> get to the object doesn't matter), I suggested:
>
>  "the consume ordering guarantees the ordering between that
>   atomic read and the accesses to the object that the pointer
>   points to"
>
> and I think the solution is to just say that this ordering acts as a
> fence. It doesn't say exactly *where* the fence is, but it says that
> there is *some* fence between the load of the pointer and any/all
> accesses to the object through that pointer.

I'm wrong. That doesn't work. At all. There is no ordering except
through the pointer chain.

So I think saying just that, and nothing else (no magic fences, no
nothing) is the right thing:

 "the consume ordering guarantees the ordering between that
  atomic read and the accesses to the object that the pointer
  points to directly or indirectly through a chain of pointers"

The thing is, anything but a chain of pointers (and maybe relaxing it
to "indexes in tables" in addition to pointers) doesn't really work.

The current standard tries to break it at "obvious" points that can
lose the data dependency (either by turning it into a control
dependency, or by just dropping the value, like the left-hand side of
a comma-expression), but the fact is, it's broken.

It's broken not just because the value can be lost other ways (ie the
"p-p" example), it's broken because the value can be turned into a
control dependency so many other ways too.

Compilers regularly turn arithmetic ops with logical comparisons into
branches. So an expression like "a = !!ptr" carries a dependency in
the current C standard, but it's entirely possible that a compiler
ends up turning it into a compare-and-branch rather than a
compare-and-set-conditional, depending on just exactly how "a" ends up
being used. That's true even on an architecture like ARM that has a
lot of conditional instructions (there are way less if you compile for
Thumb, for example, but compilers also do things like "if there are
more than N predicated instructions I'll just turn it into a
branch-over instead").

So I think the C standard needs to just explicitly say that you can
walk a chain of pointers (with that possible "indexes in arrays"
extension), and nothing more.

Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Richard Biener
On Mon, Feb 24, 2014 at 4:57 PM, Linus Torvalds
 wrote:
> On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds
>  wrote:
>>
>> Let me think about it some more, but my gut feel is that just tweaking
>> the definition of what "ordered" means is sufficient.
>>
>> So to go back to the suggested ordering rules (ignoring the "restrict"
>> part, which is just to clarify that ordering through other means to
>> get to the object doesn't matter), I suggested:
>>
>>  "the consume ordering guarantees the ordering between that
>>   atomic read and the accesses to the object that the pointer
>>   points to"
>>
>> and I think the solution is to just say that this ordering acts as a
>> fence. It doesn't say exactly *where* the fence is, but it says that
>> there is *some* fence between the load of the pointer and any/all
>> accesses to the object through that pointer.
>
> I'm wrong. That doesn't work. At all. There is no ordering except
> through the pointer chain.
>
> So I think saying just that, and nothing else (no magic fences, no
> nothing) is the right thing:
>
>  "the consume ordering guarantees the ordering between that
>   atomic read and the accesses to the object that the pointer
>   points to directly or indirectly through a chain of pointers"

To me that reads like

  int i;
  int *q = &i;
  int **p = &q;

  atomic_XXX (p, CONSUME);

orders against accesses '*p', '**p', '*q' and 'i'.  Thus it seems they
want to say that it orders against aliased storage - but then go further
and include "indirectly through a chain of pointers"?!  Thus an
atomic read of a int * orders against any 'int' memory operation but
not against 'float' memory operations?

Eh ...

Just jumping in to throw in my weird-2-cents.

Richard.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 8:27 AM, Richard Biener
 wrote:
>
> To me that reads like
>
>   int i;
>   int *q = &i;
>   int **p = &q;
>
>   atomic_XXX (p, CONSUME);
>
> orders against accesses '*p', '**p', '*q' and 'i'.  Thus it seems they
> want to say that it orders against aliased storage - but then go further
> and include "indirectly through a chain of pointers"?!  Thus an
> atomic read of a int * orders against any 'int' memory operation but
> not against 'float' memory operations?

No, it's not about type at all, and the "chain of pointers" can be
much more complex than that, since the "int *" can point to within an
object that contains other things than just that "int" (the "int" can
be part of a structure that then has pointers to other structures
etc).

So in your example,

ptr = atomic_read(p, CONSUME);

would indeed order against the subsequent access of the chain through
*that* pointer (the whole "restrict" thing that I left out as a
separate thing, which was probably a mistake), but certainly not
against any integer pointer, and certainly not against any aliasing
pointer chains.

So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q')
_and_ '**ptr' (getting 'i'), but nothing else - including just the
aliasing access of dereferencing 'i' directly.

Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 8:37 AM, Linus Torvalds
 wrote:
>
> So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q')
> _and_ '**ptr' (getting 'i'), but nothing else - including just the
> aliasing access of dereferencing 'i' directly.

Btw, what CPU architects and memory ordering guys tend to do in
documentation is give a number of "litmus test" pseudo-code sequences
to show the effects and intent of the language.

I think giving those kinds of litmus tests for both "this is ordered"
and "this is not ordered" cases like the above is would be a great
clarification. Partly because the language is going to be somewhat
legalistic and thus hard to wrap your mind around, and partly to
really hit home the *intent* of the language, which I think is
actually fairly clear to both compiler writers and to programmers.

   Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Michael Matz
Hi,

On Mon, 24 Feb 2014, Linus Torvalds wrote:

> > To me that reads like
> >
> >   int i;
> >   int *q = &i;
> >   int **p = &q;
> >
> >   atomic_XXX (p, CONSUME);
> >
> > orders against accesses '*p', '**p', '*q' and 'i'.  Thus it seems they
> > want to say that it orders against aliased storage - but then go further
> > and include "indirectly through a chain of pointers"?!  Thus an
> > atomic read of a int * orders against any 'int' memory operation but
> > not against 'float' memory operations?
> 
> No, it's not about type at all, and the "chain of pointers" can be
> much more complex than that, since the "int *" can point to within an
> object that contains other things than just that "int" (the "int" can
> be part of a structure that then has pointers to other structures
> etc).

So, let me try to poke holes into your definition or increase my 
understanding :) .  You said "chain of pointers"(dereferences I assume), 
e.g. if p is result of consume load, then access to 
p->here->there->next->prev->stuff is supposed to be ordered with that load 
(or only when that last load/store itself is also an atomic load or 
store?).

So, what happens if the pointer deref chain is partly hidden in some 
functions:

A * adjustptr (B *ptr) { return &ptr->here->there->next; }
B * p = atomic_XXX (&somewhere, consume);
adjustptr(p)->prev->stuff = bla;

As far as I understood you, this whole ptrderef chain business would be 
only an optimization opportunity, right?  So if the compiler can't be sure 
how p is actually used (as in my function-using case, assume adjustptr is 
defined in another unit), then the consume load would simply be 
transformed into an acquire (or whatever, with some barrier I mean)?  Only 
_if_ the compiler sees all obvious uses of p (indirectly through pointer 
derefs) can it, yeah, do what with the consume load?


Ciao,
Michael.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 07:57:24AM -0800, Linus Torvalds wrote:
> On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds
>  wrote:
> >
> > Let me think about it some more, but my gut feel is that just tweaking
> > the definition of what "ordered" means is sufficient.
> >
> > So to go back to the suggested ordering rules (ignoring the "restrict"
> > part, which is just to clarify that ordering through other means to
> > get to the object doesn't matter), I suggested:
> >
> >  "the consume ordering guarantees the ordering between that
> >   atomic read and the accesses to the object that the pointer
> >   points to"
> >
> > and I think the solution is to just say that this ordering acts as a
> > fence. It doesn't say exactly *where* the fence is, but it says that
> > there is *some* fence between the load of the pointer and any/all
> > accesses to the object through that pointer.
> 
> I'm wrong. That doesn't work. At all. There is no ordering except
> through the pointer chain.
> 
> So I think saying just that, and nothing else (no magic fences, no
> nothing) is the right thing:
> 
>  "the consume ordering guarantees the ordering between that
>   atomic read and the accesses to the object that the pointer
>   points to directly or indirectly through a chain of pointers"
> 
> The thing is, anything but a chain of pointers (and maybe relaxing it
> to "indexes in tables" in addition to pointers) doesn't really work.
> 
> The current standard tries to break it at "obvious" points that can
> lose the data dependency (either by turning it into a control
> dependency, or by just dropping the value, like the left-hand side of
> a comma-expression), but the fact is, it's broken.
> 
> It's broken not just because the value can be lost other ways (ie the
> "p-p" example), it's broken because the value can be turned into a
> control dependency so many other ways too.
> 
> Compilers regularly turn arithmetic ops with logical comparisons into
> branches. So an expression like "a = !!ptr" carries a dependency in
> the current C standard, but it's entirely possible that a compiler
> ends up turning it into a compare-and-branch rather than a
> compare-and-set-conditional, depending on just exactly how "a" ends up
> being used. That's true even on an architecture like ARM that has a
> lot of conditional instructions (there are way less if you compile for
> Thumb, for example, but compilers also do things like "if there are
> more than N predicated instructions I'll just turn it into a
> branch-over instead").
> 
> So I think the C standard needs to just explicitly say that you can
> walk a chain of pointers (with that possible "indexes in arrays"
> extension), and nothing more.

I am comfortable with this.  My desire for also marking the later
pointers does not make sense without some automated way of validating
them, which I don't immediately see a way to do.

So let me try laying out the details.  Sticking with pointers for the
moment, if we reach agreement on these, I will try expanding to integers.

1.  A pointer value obtained from a memory_order_consume load is part
of a pointer chain.  I am calling the pointer itself a "chained
pointer" for the moment.

2.  Note that it is the value that qualifies as being chained, not
the variable.  For example, given pointer variable might hold
a chained pointer at one point in the code, then a non-chained
pointer later.  Therefore, "q = p", where "q" is a pointer and
"p" is a chained pointer results in "q" containing a chained
pointer.

3.  Adding or subtracting an integer to/from a chained pointer
results in another chained pointer in that same pointer chain.

4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
applied to a chained pointer and an integer results in another
chained pointer in that same pointer chain.

5.  Consider a sequence as follows: dereference operator (unary "*",
"[]", "->") optionally followed by a series of direct selection
operators ("."), finally (and unconditionally) followed by
a unary "&" operator.  Applying such a sequence to a chained
pointer results in another chained pointer in the same chain.

Given a chained pointer "p", examples include "&p[3]",
"&p->a.b.c.d.e.f.g", and "&*p".

6.  The expression "p->f", where "p" is a chained pointer and "f"
is a pointer, results in a chained pointer.

FWIW, this means that pointer chains can overlap as in this
example:

p = atomic_load_explicit(&gp, memory_order_consume);
q = atomic_load_explicit(&p->ap, memory_order_consume);
x = q->a;

This should be fine, I don't see any problems with this.

7.  Applying a pointer cast to a chained pointer results in a
chained pointer.

8.  Applying any of the following operators to a chained pointer
res

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote:
> Hi,
> 
> On Mon, 24 Feb 2014, Linus Torvalds wrote:
> 
> > > To me that reads like
> > >
> > >   int i;
> > >   int *q = &i;
> > >   int **p = &q;
> > >
> > >   atomic_XXX (p, CONSUME);
> > >
> > > orders against accesses '*p', '**p', '*q' and 'i'.  Thus it seems they
> > > want to say that it orders against aliased storage - but then go further
> > > and include "indirectly through a chain of pointers"?!  Thus an
> > > atomic read of a int * orders against any 'int' memory operation but
> > > not against 'float' memory operations?
> > 
> > No, it's not about type at all, and the "chain of pointers" can be
> > much more complex than that, since the "int *" can point to within an
> > object that contains other things than just that "int" (the "int" can
> > be part of a structure that then has pointers to other structures
> > etc).
> 
> So, let me try to poke holes into your definition or increase my 
> understanding :) .  You said "chain of pointers"(dereferences I assume), 
> e.g. if p is result of consume load, then access to 
> p->here->there->next->prev->stuff is supposed to be ordered with that load 
> (or only when that last load/store itself is also an atomic load or 
> store?).
> 
> So, what happens if the pointer deref chain is partly hidden in some 
> functions:
> 
> A * adjustptr (B *ptr) { return &ptr->here->there->next; }
> B * p = atomic_XXX (&somewhere, consume);
> adjustptr(p)->prev->stuff = bla;
> 
> As far as I understood you, this whole ptrderef chain business would be 
> only an optimization opportunity, right?  So if the compiler can't be sure 
> how p is actually used (as in my function-using case, assume adjustptr is 
> defined in another unit), then the consume load would simply be 
> transformed into an acquire (or whatever, with some barrier I mean)?  Only 
> _if_ the compiler sees all obvious uses of p (indirectly through pointer 
> derefs) can it, yeah, do what with the consume load?

Good point, I left that out of my list.  Adding it:

13. By default, pointer chains do not propagate into or out of functions.
In implementations having attributes, a [[carries_dependency]]
may be used to mark a function argument or return as passing
a pointer chain into or out of that function.

If a function does not contain memory_order_consume loads and
also does not contain [[carries_dependency]] attributes, then
that function may be compiled using any desired dependency-breaking
optimizations.

The ordering effects are implementation defined when a given
pointer chain passes into or out of a function through a parameter
or return not marked with a [[carries_dependency]] attributed.

Note that this last paragraph differs from the current standard, which
would require ordering regardless.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz  wrote:
>
> So, let me try to poke holes into your definition or increase my
> understanding :) .  You said "chain of pointers"(dereferences I assume),
> e.g. if p is result of consume load, then access to
> p->here->there->next->prev->stuff is supposed to be ordered with that load
> (or only when that last load/store itself is also an atomic load or
> store?).

It's supposed to be ordered wrt the first load (the consuming one), yes.

> So, what happens if the pointer deref chain is partly hidden in some
> functions:

No problem.

The thing is, the ordering is actually handled by the CPU in all
relevant cases.  So the compiler doesn't actually need to *do*
anything. All this legalistic stuff is just to describe the semantics
and the guarantees.

The problem is two cases:

 (a) alpha (which doesn't really order any accesses at all, not even
dependent loads), but for a compiler alpha is actually trivial: just
add a "rmb" instruction after the load, and you can't really do
anything else (there's a few optimizations you can do wrt the rmb, but
they are very specific and simple).

So (a) is a "problem", but the solution is actually really simple, and
gives very *strong* guarantees: on alpha, a "consume" ends up being
basically the same as a read barrier after the load, with only very
minimal room for optimization.

 (b) ARM and powerpc and similar architectures, that guarantee the
data dependency as long as it is an *actual* data dependency, and
never becomes a control dependency.

On ARM and powerpc, control dependencies do *not* order accesses (the
reasons boil down to essentially: branch prediction breaks the
dependency, and instructions that come after the branch can be happily
executed before the branch). But it's almost impossible to describe
that in the standard, since compilers can (and very much do) turn a
control dependency into a data dependency and vice versa.

So the current standard tries to describe that "control vs data"
dependency, and tries to limit it to a data dependency. It fails. It
fails for multiple reasons - it doesn't allow for trivial
optimizations that just remove the data dependency, and it also
doesn't allow for various trivial cases where the compiler *does* turn
the data dependency into a control dependency.

So I really really think that the current C standard language is
broken. Unfixably broken.

I'm trying to change the "syntactic data dependency" that the current
standard uses into something that is clearer and correct.

The "chain of pointers" thing is still obviously a data dependency,
but by limiting it to pointers, it simplifies the language, clarifies
the meaning, avoids all syntactic tricks (ie "p-p" is clearly a
syntactic dependency on "p", but does *not* involve in any way
following the pointer) and makes it basically impossible for the
compiler to break the dependency without doing value prediction, and
since value prediction has to be disallowed anyway, that's a feature,
not a bug.

  Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 02:55:07PM +0100, Michael Matz wrote:
> Hi,
> 
> On Fri, 21 Feb 2014, Paul E. McKenney wrote:
> 
> > > And with conservative I mean "everything is a source of a dependency, and 
> > > hence can't be removed, reordered or otherwise fiddled with", and that 
> > > includes code sequences where no atomic objects are anywhere in sight [1].
> > > In the light of that the only realistic way (meaning to not have to 
> > > disable optimization everywhere) to implement consume as currently 
> > > specified is to map it to acquire.  At which point it becomes pointless.
> > 
> > No, only memory_order_consume loads and [[carries_dependency]]
> > function arguments are sources of dependency chains.
> 
> I don't see [[carries_dependency]] in the C11 final draft (yeah, should 
> get a real copy, I know, but let's assume it's the same language as the 
> standard).  Therefore, yes, only consume loads are sources of 
> dependencies.  The problem with the definition of the "carries a 
> dependency" relation is not the sources, but rather where it stops.  
> It's transitively closed over "value of evaluation A is used as operand in 
> evaluation B", with very few exceptions as per 5.1.2.4#14.  Evaluations 
> can contain function calls, so if there's _any_ chance that an operand of 
> an evaluation might even indirectly use something resulting from a consume 
> load then that evaluation must be compiled in a way to not break 
> dependency chains.
> 
> I don't see a way to generally assume that e.g. the value of a function 
> argument can impossibly result from a consume load, therefore the compiler 
> must assume that all function arguments _can_ result from such loads, and 
> so must disable all depchain breaking optimization (which are many).
> 
> > > [1] Simple example of what type of transformations would be disallowed:
> > > 
> > > int getzero (int i) { return i - i; }
> > 
> > This needs to be as follows:
> > 
> > [[carries_dependency]] int getzero(int i [[carries_dependency]])
> > {
> > return i - i;
> > }
> > 
> > Otherwise dependencies won't get carried through it.
> 
> So, with the above do you agree that in absense of any other magic (see 
> below) the compiler is not allowed to transform my initial getzero() 
> (without the carries_dependency markers) implementation into "return 0;" 
> because of the C11 rules for "carries-a-dependency"?
> 
> If so, do you then also agree that the specification of "carries a 
> dependency" is somewhat, err, shall we say, overbroad?

>From what I can see, overbroad.  The problem is that the C++11 standard
defines how carries-dependency interacts with function calls and returns
in 7.6.4, which describes the [[carries_dependency]] attribute.  For example,
7.6.4p6 says:

Function g’s second parameter has a carries_dependency
attribute, but its first parameter does not. Therefore, function
h’s first call to g carries a dependency into g, but its second
call does not. The implementation might need to insert a fence
prior to the second call to g.

When C11 declined to take attributes, they also left out the part saying
how carries-dependency interacts with functions.  :-/

Might be fixed by now, checking up on it.

One could argue that the bit about emitting fence instructions at
function calls and returns is implied by the as-if rule even without
this wording, but...

> > > depchains don't matter, could _then_ optmize it to zero.  But that's 
> > > insane, especially considering that it's hard to detect if a given 
> > > context 
> > > doesn't care for depchains, after all the depchain relation is 
> > > constructed 
> > > exactly so that it bleeds into nearly everywhere.  So we would most of 
> > > the time have to assume that the ultimate context will be depchain-aware 
> > > and therefore disable many transformations.
> > 
> > Any function that does not contain a memory_order_consume load and that 
> > doesn't have any arguments marked [[carries_dependency]] can be 
> > optimized just as before.
> 
> And as such marker doesn't exist we must conservatively assume that it's 
> on _all_ parameters, so I'll stand by my claim.

Or that you have to emit a fence instruction when a dependency chain
enters or leaves a function in cases where all callers/calles are not
visible to the compiler.

My preference is that the ordering properties of a carries-dependency
chain is implementation defined at the point that it enters or leaves
a function without the marker, but others strongly disagreed.  ;-)

> > > Then inlining getzero would merely add another "# j.dep = i.dep" 
> > > relation, so depchains are still there but the value optimization can 
> > > happen before inlining.  Having to do something like that I'd find 
> > > disgusting, and rather rewrite consume into acquire :)  Or make the 
> > > depchain relation somehow realistically implementable.
> > 
> > I was actually OK with arithmetic cancellation breaking the dependency

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 09:28:56AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote:
> > Hi,
> > 
> > On Mon, 24 Feb 2014, Linus Torvalds wrote:
> > 
> > > > To me that reads like
> > > >
> > > >   int i;
> > > >   int *q = &i;
> > > >   int **p = &q;
> > > >
> > > >   atomic_XXX (p, CONSUME);
> > > >
> > > > orders against accesses '*p', '**p', '*q' and 'i'.  Thus it seems they
> > > > want to say that it orders against aliased storage - but then go further
> > > > and include "indirectly through a chain of pointers"?!  Thus an
> > > > atomic read of a int * orders against any 'int' memory operation but
> > > > not against 'float' memory operations?
> > > 
> > > No, it's not about type at all, and the "chain of pointers" can be
> > > much more complex than that, since the "int *" can point to within an
> > > object that contains other things than just that "int" (the "int" can
> > > be part of a structure that then has pointers to other structures
> > > etc).
> > 
> > So, let me try to poke holes into your definition or increase my 
> > understanding :) .  You said "chain of pointers"(dereferences I assume), 
> > e.g. if p is result of consume load, then access to 
> > p->here->there->next->prev->stuff is supposed to be ordered with that load 
> > (or only when that last load/store itself is also an atomic load or 
> > store?).
> > 
> > So, what happens if the pointer deref chain is partly hidden in some 
> > functions:
> > 
> > A * adjustptr (B *ptr) { return &ptr->here->there->next; }
> > B * p = atomic_XXX (&somewhere, consume);
> > adjustptr(p)->prev->stuff = bla;
> > 
> > As far as I understood you, this whole ptrderef chain business would be 
> > only an optimization opportunity, right?  So if the compiler can't be sure 
> > how p is actually used (as in my function-using case, assume adjustptr is 
> > defined in another unit), then the consume load would simply be 
> > transformed into an acquire (or whatever, with some barrier I mean)?  Only 
> > _if_ the compiler sees all obvious uses of p (indirectly through pointer 
> > derefs) can it, yeah, do what with the consume load?
> 
> Good point, I left that out of my list.  Adding it:
> 
> 13.   By default, pointer chains do not propagate into or out of functions.
>   In implementations having attributes, a [[carries_dependency]]
>   may be used to mark a function argument or return as passing
>   a pointer chain into or out of that function.
> 
>   If a function does not contain memory_order_consume loads and
>   also does not contain [[carries_dependency]] attributes, then
>   that function may be compiled using any desired dependency-breaking
>   optimizations.
> 
>   The ordering effects are implementation defined when a given
>   pointer chain passes into or out of a function through a parameter
>   or return not marked with a [[carries_dependency]] attributed.
> 
> Note that this last paragraph differs from the current standard, which
> would require ordering regardless.

And there is also kill_dependency(), which needs to be added to the list
in #8 of operators that take a chained pointer and return something that
is not a chained pointer.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 09:38:46AM -0800, Linus Torvalds wrote:
> On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz  wrote:
> >
> > So, let me try to poke holes into your definition or increase my
> > understanding :) .  You said "chain of pointers"(dereferences I assume),
> > e.g. if p is result of consume load, then access to
> > p->here->there->next->prev->stuff is supposed to be ordered with that load
> > (or only when that last load/store itself is also an atomic load or
> > store?).
> 
> It's supposed to be ordered wrt the first load (the consuming one), yes.
> 
> > So, what happens if the pointer deref chain is partly hidden in some
> > functions:
> 
> No problem.
> 
> The thing is, the ordering is actually handled by the CPU in all
> relevant cases.  So the compiler doesn't actually need to *do*
> anything. All this legalistic stuff is just to describe the semantics
> and the guarantees.
> 
> The problem is two cases:
> 
>  (a) alpha (which doesn't really order any accesses at all, not even
> dependent loads), but for a compiler alpha is actually trivial: just
> add a "rmb" instruction after the load, and you can't really do
> anything else (there's a few optimizations you can do wrt the rmb, but
> they are very specific and simple).
> 
> So (a) is a "problem", but the solution is actually really simple, and
> gives very *strong* guarantees: on alpha, a "consume" ends up being
> basically the same as a read barrier after the load, with only very
> minimal room for optimization.
> 
>  (b) ARM and powerpc and similar architectures, that guarantee the
> data dependency as long as it is an *actual* data dependency, and
> never becomes a control dependency.
> 
> On ARM and powerpc, control dependencies do *not* order accesses (the
> reasons boil down to essentially: branch prediction breaks the
> dependency, and instructions that come after the branch can be happily
> executed before the branch). But it's almost impossible to describe
> that in the standard, since compilers can (and very much do) turn a
> control dependency into a data dependency and vice versa.
> 
> So the current standard tries to describe that "control vs data"
> dependency, and tries to limit it to a data dependency. It fails. It
> fails for multiple reasons - it doesn't allow for trivial
> optimizations that just remove the data dependency, and it also
> doesn't allow for various trivial cases where the compiler *does* turn
> the data dependency into a control dependency.
> 
> So I really really think that the current C standard language is
> broken. Unfixably broken.
> 
> I'm trying to change the "syntactic data dependency" that the current
> standard uses into something that is clearer and correct.
> 
> The "chain of pointers" thing is still obviously a data dependency,
> but by limiting it to pointers, it simplifies the language, clarifies
> the meaning, avoids all syntactic tricks (ie "p-p" is clearly a
> syntactic dependency on "p", but does *not* involve in any way
> following the pointer) and makes it basically impossible for the
> compiler to break the dependency without doing value prediction, and
> since value prediction has to be disallowed anyway, that's a feature,
> not a bug.

OK, good point, please ignore my added thirteenth item in the list.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney
 wrote:
>
> 4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
> applied to a chained pointer and an integer results in another
> chained pointer in that same pointer chain.

No. You cannot define it this way. Taking the value of a pointer and
doing a bitwise operation that throws away all the bits (or even
*most* of the bits) results in the compiler easily being able to turn
the "chain" into a non-chain.

The obvious example being "val & 0", but things like "val & 1" are in
practice also something that compilers easily turn into control
dependencies instead of data dependencies.

So you can talk about things like "aligning the pointer value to
object boundaries" etc, but it really cannot and *must* not be about
the syntactic operations.

The same goes for "adding and subtracting an integer". The *syntax*
doesn't matter. It's about remaining information. Doing "p-(int)p" or
"p+(-(int)p)" doesn't leave any information despite being "subtracting
and adding an integer" at a syntactic level.

Syntax is meaningless. Really.

> 8.  Applying any of the following operators to a chained pointer
> results in something that is not a chained pointer:
> "()", sizeof, "!", "*", "/", "%", ">>", "<<", "<", ">", "<=",
> ">=", "==", "!=", "&&", and "||".

Parenthesis? I'm assuming that you mean calling through the chained pointer.

Also, I think all of /, * and % are perfectly fine, and might be used
for that "aligning the pointer" operation that is fine.

 Linus


Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-24 Thread Richard Sandiford
Matthew Fortune  writes:
> Richard Sandiford  writes
>> I understand the need to deprecate the current -mgp32 -mfp64 behaviour.
>> I don't think we should deprecate -mfp64 itself though.  Instead, why
>> not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for
>> modeless?  So rather than deprecating the -mgp32 -mfp64 combination and
>> adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in
>> which the odd-numbered registers are call-clobbered rather than the old
>> form in which they were call-saved.
>
> Extreme caution is the only reason why the design avoided changing fp64
> behaviour (basically in case anyone had serious objection). If you would
> be happy with a change of behaviour for -mgp32 -mfp64 then that is a
> great start.

Yeah, my first impression is that keeping the current interface would
be much better than adding a new set of options.

>> AIUI the old form never really worked reliably due to things like
>> newlib's setjmp not preserving the odd-numbered registers, so it doesn't
>> seem worth keeping around.  Also, the old form is identified by the GNU
>> attribute (4, 4) so it'd be easy for the linker to reject links between
>> the old and the new form.
>
> That is true. You will have noticed a number of changes over recent
> months to start fixing fp64 as currently defined but having found this
> new solution then such fixes are no longer important. The lack of
> support for gp32 fp64 in linux is further reason to permit redefining
> it. Would you be happy to retain the same builtin defines for FP64 if
> changing its behaviour (i.e. __mips_fpr=64)?

I think that should be OK.  I suppose a natural follow-on question
is what __mips_fpr should be for -mfpxx.  Maybe just 0?

If we want to be extra cautious we could define a second set of macros
alongside the old ones.

>> The scheme allows an ifunc to request a mode and effectively gives the
>> choice to the firstcomer.  Every other ifunc then has to live with the
>> choice.  I don't think that's a good idea, since the order that ifuncs
>> are called isn't well-defined or under direct user control.
>> 
>> Since ifuncs would have to live with a choice made by other ifuncs, in
>> practice they must all be prepared to deal with FR=1 if linked into a
>> fully-modeless or FR1 program, and to deal with FR=0 if linked into a
>> fully-modeless or FR0 program.  So IMO the dynamic linker should simply
>> set FR=1 for modeless programs if the hardware supports it and set it to
>> 0 for modeless programs otherwise, like you say in the first paragraph
>> of 9.4.
>
> The ifunc interaction should possibly be moved to a different
> proposal. We could reduce this down to a simple statement that
> interaction with ifunc needs to be considered when finalising MIPS ifunc
> support in general.

Sounds good.

>> You allow the mode to be changed midexecution if a new FR0 or FR1 object
>> is loaded.  Is it really worth supporting that though?
>> It has the same problem as the ifuncs: once you've dlopen()ed an object,
>> you fix the mode for the whole program, even after the dlclose().
>> Unless we know of specific cases where this is needed, maybe it would be
>> safer to fix the mode before execution based on DT_NEEDED libraries and
>> allow the mode of modeless programs to be overridden by an environment
>> variable.
>
> Scanning the entire set of DT_NEEDED libraries would achieve most of
> what full dynamic mode switching gives us, it is essentially the first
> stage of the dynamic mode switching described in the proposal
> anyway. However, I am concerned about excluding dlopen()ed objects from
> mode selection though (not so worried about excluding ifunc, that could
> just fix the mode before resolving the first one). One specific concern
> is for Android where I believe we have the situation where native
> applications are loaded as (a form of) shared library. This means a mode
> requirement can be introduced late on. In an Android environment it is
> unlikely to be acceptable to have to do something special to load an
> application that happens to have a specific mode requirement so dynamic
> selection is useful. This is more of a transitional problem than
> anything but making it a smooth process is quite important. I'm also not
> sure that there is much more effort required for a dynamic linker to
> take account of dlopen()ed objects in addition to DT_NEEDED, changes are
> needed in this code regardless.

As far as GNU/Linux goes, if we do end up with a function in something
like a modeless libm that is implemented as an FR-aware ifunc, that would
force the choice to be made early anyway.  So we have this very specific
case where everything in the initial process is modeless, no ifuncs take
advantage of the FR setting, and a dlopen()ed object was compiled as fr0
rather than modeless.  I agree it's possible but it seems unlikely.

I know nothing about the way Android loading works though. :-)
Could you describe it in more detail? 

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 10:14:01AM -0800, Linus Torvalds wrote:
> On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney
>  wrote:
> >
> > 4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
> > applied to a chained pointer and an integer results in another
> > chained pointer in that same pointer chain.
> 
> No. You cannot define it this way. Taking the value of a pointer and
> doing a bitwise operation that throws away all the bits (or even
> *most* of the bits) results in the compiler easily being able to turn
> the "chain" into a non-chain.
> 
> The obvious example being "val & 0", but things like "val & 1" are in
> practice also something that compilers easily turn into control
> dependencies instead of data dependencies.

Indeed, most of the bits need to remain for this to work.

> So you can talk about things like "aligning the pointer value to
> object boundaries" etc, but it really cannot and *must* not be about
> the syntactic operations.
> 
> The same goes for "adding and subtracting an integer". The *syntax*
> doesn't matter. It's about remaining information. Doing "p-(int)p" or
> "p+(-(int)p)" doesn't leave any information despite being "subtracting
> and adding an integer" at a syntactic level.
> 
> Syntax is meaningless. Really.

Good points.  How about the following replacements?

3.  Adding or subtracting an integer to/from a chained pointer
results in another chained pointer in that same pointer chain.
The results of addition and subtraction operations that cancel
the chained pointer's value (for example, "p-(long)p" where "p"
is a pointer to char) are implementation defined.

4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
applied to a chained pointer and an integer for the purposes
of alignment and pointer translation results in another
chained pointer in that same pointer chain.  Other uses
of bitwise operators on chained pointers (for example,
"p|~0") are implementation defined.

> > 8.  Applying any of the following operators to a chained pointer
> > results in something that is not a chained pointer:
> > "()", sizeof, "!", "*", "/", "%", ">>", "<<", "<", ">", "<=",
> > ">=", "==", "!=", "&&", and "||".
> 
> Parenthesis? I'm assuming that you mean calling through the chained pointer.

Yes, good point.  Of course, parentheses for grouping just pass the
value through without affecting the chained-ness.

> Also, I think all of /, * and % are perfectly fine, and might be used
> for that "aligning the pointer" operation that is fine.

Something like this?

char *p;

p = p - (unsigned long)p % 8;

I was thinking of this as subtraction -- the "p" gets moduloed by 8,
which loses the chained-pointer designation.  But that is OK because
that designation gets folded back in by the subtraction.  Am I missing
a use case?

That leaves things like this one:

p = (p / 8) * 8;

I cannot think of any other legitimate use for "/" and "*".

Here is an updated #8 and a new 8a:

8.  Applying any of the following operators to a chained pointer
results in something that is not a chained pointer: function call
"()", sizeof, "!", "%", ">>", "<<", "<", ">", "<=", ">=", "==",
"!=", "&&", "||", and "kill_dependency()".

8a. Dividing a chained pointer by an integer and multiplying it
by that same integer (for example, to align that pointer) results
in a chained pointer in that same pointer chain.  The ordering
effects of other uses of infix "*" and "/" on chained pointers
are implementation defined.

Does that capture it?

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney
 wrote:
>
> Good points.  How about the following replacements?
>
> 3.  Adding or subtracting an integer to/from a chained pointer
> results in another chained pointer in that same pointer chain.
> The results of addition and subtraction operations that cancel
> the chained pointer's value (for example, "p-(long)p" where "p"
> is a pointer to char) are implementation defined.
>
> 4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
> applied to a chained pointer and an integer for the purposes
> of alignment and pointer translation results in another
> chained pointer in that same pointer chain.  Other uses
> of bitwise operators on chained pointers (for example,
> "p|~0") are implementation defined.

Quite frankly, I think all of this language that is about the actual
operations is irrelevant and wrong.

It's not going to help compiler writers, and it sure isn't going to
help users that read this.

Why not just talk about "value chains" and that any operations that
restrict the value range severely end up breaking the chain. There is
no point in listing the operations individually, because every single
operation *can* restrict things. Listing individual operations and
depdendencies is just fundamentally wrong.

For example, let's look at this obvious case:

   int q,*p = atomic_read(&pp, consume);
   .. nothing modifies 'p' ..
   q = *p;

and there are literally *zero* operations that modify the value
change, so obviously the two operations are ordered, right?

Wrong.

What if the "nothing modifies 'p'" part looks like this:

if (p != &myvariable)
return;

and now any sane compiler will happily optimize "q = *p" into "q =
myvariable", and we're all done - nothing invalid was ever

So my earlier suggestion tried to avoid this by having the restrict
thing, so the above wouldn't work.

But your (and the current C standards) attempt to define this with
some kind of syntactic dependency carrying chain will _inevitably_ get
this wrong, and/or be too horribly complex to actually be useful.

Seriously, don't do it. I claim that all your attempts to do this
crazy syntactic "these operations maintain the chained pointers" is
broken. The fact that you have changed "carries a dependency" to
"chained pointer" changes NOTHING.

So just give it up. It's a fundamentally broken model. It's *wrong*,
but even more importantly, it's not even *useful*, since it ends up
being too complicated for a compiler writer or a programmer to
understand.

I really really really think you need to do this at a higher
conceptual level, get away from all these idiotic "these operations
maintain the chain" crap. Because there *is* no such list.

Quite frankly, any standards text that has that
"[[carries_dependency]]" or "[[kill_dependency]]" or whatever
attribute is broken. It's broken because the whole concept is TOTALLY
ALIEN to the compiler writer or the programmer. It makes no sense.
It's purely legalistic language that has zero reason to exist. It's
non-intuitive for everybody.

And *any* language that talks about the individual operations only
encourages people to play legalistic games that actually defeat the
whole purpose (namely that all relevant CPU's are going to implement
that consume ordering guarantee natively, with no extra code
generation rules AT ALL). So any time you talk about some random
detail of some operation, somebody is going to come up with a "trick"
that defeats things. So don't do it. There is absolutely ZERO
difference between any of the arithmetic operations, be they bitwise,
additive, multiplicative, shifts, whatever.

The *only* thing that matters for all of them is whether they are
"value-preserving", or whether they drop so much information that the
compiler might decide to use a control dependency instead. That's true
for every single one of them.

Similarly, actual true control dependencies that limit the problem
space sufficiently that the actual pointer value no longer has
significant information in it (see the above example) are also things
that remove information to the point that only a control dependency
remains. Even when the value itself is not modified in any way at all.

  Linus


[GSoc 2014] OpenMP runtime improvements

2014-02-24 Thread Pranith Kumar
Hi,

I am interested in contributing to the OpenMP project as part of GSoC 2014.

I am mailing to discuss ideas, to see if someone is willing to mentor
me and also if possible, to get a test patch in before the end  of
application process so as to verify my qualification!

Looking forward to your feedback!

Thanks,
Pranith


build breakage in libsanitizer

2014-02-24 Thread Oleg Smolsky
Hey all, I've just tried building Trunk from branches/google/main/ and 
got the following failure in libsanitizer:


In file included from 
/mnt/test/rvbd-root-gcc49/usr/include/features.h:356:0,

 from /mnt/test/rvbd-root-gcc49/usr/include/arpa/inet.h:22,
 from 
../../../../gcc49-google-main/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:20:
/mnt/test/rvbd-root-gcc49/usr/include/sys/timex.h:145:31: error: 
expected initializer before 'throw'

  __asm__ ("ntp_gettimex") __THROW;
   ^

This is an intel-to-intel cross-compiler that is being built against 
linux-2.6.32.27 headers and glibc-2.12 (with a few redhat patches). The 
system header in question contains the following:


#if defined __GNUC__ && __GNUC__ >= 2
extern int ntp_gettime (struct ntptimeval *__ntv)
 __asm__ ("ntp_gettimex") __THROW;
#else
extern int ntp_gettimex (struct ntptimeval *__ntv) __THROW;
# define ntp_gettime ntp_gettimex
#endif

This same header works against gcc4.8 (or is not used). Could someone 
clarify what libsanitizer needs here and what gcc dislikes please? Which 
should I patch?


Thanks in advance!
Oleg.


Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-24 Thread Doug Gilmore
On 02/24/2014 10:42 AM, Richard Sandiford wrote:
>...
>> AIUI the old form never really worked reliably due to things like
>> newlib's setjmp not preserving the odd-numbered registers, so it doesn't
>>> seem worth keeping around.  Also, the old form is identified by the GNU
>>> attribute (4, 4) so it'd be easy for the linker to reject links between
>>> the old and the new form.
>>
>> That is true. You will have noticed a number of changes over recent
>> months to start fixing fp64 as currently defined but having found this
>> new solution then such fixes are no longer important. The lack of
>> support for gp32 fp64 in linux is further reason to permit redefining
>> it. Would you be happy to retain the same builtin defines for FP64 if
>> changing its behaviour (i.e. __mips_fpr=64)?
>
>I think that should be OK.  I suppose a natural follow-on question
>is what __mips_fpr should be for -mfpxx.  Maybe just 0?
I think we should think carefully about just making -mfp64 just disappear.
The support has existed for bare iron for quite a while, and we do internal
testing of MSA using -mfp64.  I'd rather avoid a flag day.  It would be
good to continue recognizing that object files with attribute (4, 4)
(-mfp64) are not compatible with other objects.  Note that adding
EF_MIPS_FP64 is a recent change, which was made to enable Linux kernel
support, and there probably is little code out there that is dependent on
it.  I am more comfortable in changing the interpretation of EF_MIPS_FP64
to simply mean that the process is to be set with FR=1 mode as apposed to
FR=0.  This will allow us to ease our way off of -mfp64 to -mfr1 mode for
those environments where FR=1 is really needed.

Doug
>...




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 11:54:46AM -0800, Linus Torvalds wrote:
> On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney
>  wrote:
> >
> > Good points.  How about the following replacements?
> >
> > 3.  Adding or subtracting an integer to/from a chained pointer
> > results in another chained pointer in that same pointer chain.
> > The results of addition and subtraction operations that cancel
> > the chained pointer's value (for example, "p-(long)p" where "p"
> > is a pointer to char) are implementation defined.
> >
> > 4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
> > applied to a chained pointer and an integer for the purposes
> > of alignment and pointer translation results in another
> > chained pointer in that same pointer chain.  Other uses
> > of bitwise operators on chained pointers (for example,
> > "p|~0") are implementation defined.
> 
> Quite frankly, I think all of this language that is about the actual
> operations is irrelevant and wrong.
> 
> It's not going to help compiler writers, and it sure isn't going to
> help users that read this.
> 
> Why not just talk about "value chains" and that any operations that
> restrict the value range severely end up breaking the chain. There is
> no point in listing the operations individually, because every single
> operation *can* restrict things. Listing individual operations and
> depdendencies is just fundamentally wrong.
> 
> For example, let's look at this obvious case:
> 
>int q,*p = atomic_read(&pp, consume);
>.. nothing modifies 'p' ..
>q = *p;
> 
> and there are literally *zero* operations that modify the value
> change, so obviously the two operations are ordered, right?
> 
> Wrong.
> 
> What if the "nothing modifies 'p'" part looks like this:
> 
> if (p != &myvariable)
> return;
> 
> and now any sane compiler will happily optimize "q = *p" into "q =
> myvariable", and we're all done - nothing invalid was ever

Yes, the compiler could do that.  But it would still be required to
carry a dependency from the memory_order_consume read to the "*p",
which it could do by compiling "q = *p" rather than "q = myvariable"
on the one hand or by emitting a memory-barrier instruction on the other.

This was the point of #12:

12. A memory_order_consume load carries a dependency to any
dereference operator (unary "*", "[]", and "->") in the
resulting pointer chain.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney
 wrote:
>>
>> What if the "nothing modifies 'p'" part looks like this:
>>
>> if (p != &myvariable)
>> return;
>>
>> and now any sane compiler will happily optimize "q = *p" into "q =
>> myvariable", and we're all done - nothing invalid was ever
>
> Yes, the compiler could do that.  But it would still be required to
> carry a dependency from the memory_order_consume read to the "*p",

But that's *BS*. You didn't actually listen to the main issue.

Paul, why do you insist on this carries-a-dependency crap?

It's broken. If you don't believe me, then believe the compiler person
who already piped up and told you so.

The "carries a dependency" model is broken. Get over it.

No sane compiler will ever distinguish two different registers that
have the same value from each other. No sane compiler will ever say
"ok, register r1 has the exact same value as register r2, but r2
carries the dependency, so I need to make sure to pass r2 to that
function or use it as a base pointer".

And nobody sane should *expect* a compiler to distinguish two
registers with the same value that way.

So the whole model is broken.

I gave an alternate model (the "restrict"), and you didn't seem to
understand the really fundamental difference. It's not a language
difference, it's a conceptual difference.

In the broken "carries a dependency" model, you have fight all those
aliases that can have the same value, and it is not a fight you can
win. We've had the "p-p" examples, we've had the "p&0" examples, but
the fact is, that "p==&myvariable" example IS EXACTLY THE SAME THING.

All three of those things: "p-p", "p&0", and "p==&myvariable" mean
that any compiler worth its salt now know that "p" carries no
information, and will optimize it away.

So please stop arguing against that. Whenever you argue against that
simple fact, you are arguing against sane compilers.

So *accept* the fact that some operations (and I guarantee that there
are more of those than you can think of, and you can create them with
various tricks using pretty much *any* feature in the C language)
essentially take the data information away. And just accept the fact
that then the ordering goes away too.

So give up on "carries a dependency". Because there will be cases
where that dependency *isn't* carried.

The language of the standard needs to get *away* from the broken
model, because otherwise the standard is broken.

I suggest we instead talk about "litmus tests" and why certain code
sequences are ordered, and others are not.

So the code sequence I already mentioned is *not* ordered:

Litmus test 1:

p = atomic_read(pp, consume);
if (p == &variable)
return p->val;

   is *NOT* ordered, because the compiler can trivially turn this into
"return variable.val", and break the data dependency.

   This is true *regardless* of any "carries a dependency" language,
because that language is insane, and doesn't work when the different
pieces here may be in different compilation units.

BUT:

Litmus test 2:

p = atomic_read(pp, consume);
if (p != &variable)
return p->val;

  *IS* ordered, because while it looks syntactically a lot like
"Litmus test 1", there is no sane way a compiler can use the knowledge
that "p is not a pointer to a particular location" to break the data
dependency.

There is no way in hell that any sane "carries a dependency" model can
get the simple tests above right.

So give up on it already. "Carries a dependency" cannot work. It's a
bad model. You're trying to describe the problem using the wrong
tools.

Note that my "restrict+pointer to object" language actually got this
*right*. The "restrict" part made Litmus test 1 not ordered, because
that "p == &variable" success case means that the pointer wasn't
restricted, so the pre-requisite for ordering didn't exist.

See? The "carries a dependency" is a broken model for this, but there
are _other_ models that can work.

You tried to rewrite my model into "carries a dependency". That
*CANNOT* work. It's like trying to rewrite quantum physics into the
Greek model of the four elements. They are not compatible models, and
one of them can be shown to not work.

  Linus


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-24 Thread Matthew Fortune
Richard Sandiford  writes
> >> AIUI the old form never really worked reliably due to things like
> >> newlib's setjmp not preserving the odd-numbered registers, so it
> >> doesn't seem worth keeping around.  Also, the old form is identified
> >> by the GNU attribute (4, 4) so it'd be easy for the linker to reject
> >> links between the old and the new form.
> >
> > That is true. You will have noticed a number of changes over recent
> > months to start fixing fp64 as currently defined but having found this
> > new solution then such fixes are no longer important. The lack of
> > support for gp32 fp64 in linux is further reason to permit redefining
> > it. Would you be happy to retain the same builtin defines for FP64 if
> > changing its behaviour (i.e. __mips_fpr=64)?
> 
> I think that should be OK.  I suppose a natural follow-on question is
> what __mips_fpr should be for -mfpxx.  Maybe just 0?

I'm doing just that in my experimental implementation of all this.
 
> If we want to be extra cautious we could define a second set of macros
> alongside the old ones.
> 
> >> You allow the mode to be changed midexecution if a new FR0 or FR1
> >> object is loaded.  Is it really worth supporting that though?
> >> It has the same problem as the ifuncs: once you've dlopen()ed an
> >> object, you fix the mode for the whole program, even after the
> dlclose().
> >> Unless we know of specific cases where this is needed, maybe it would
> >> be safer to fix the mode before execution based on DT_NEEDED
> >> libraries and allow the mode of modeless programs to be overridden by
> >> an environment variable.
> >
> > Scanning the entire set of DT_NEEDED libraries would achieve most of
> > what full dynamic mode switching gives us, it is essentially the first
> > stage of the dynamic mode switching described in the proposal anyway.
> > However, I am concerned about excluding dlopen()ed objects from mode
> > selection though (not so worried about excluding ifunc, that could
> > just fix the mode before resolving the first one). One specific
> > concern is for Android where I believe we have the situation where
> > native applications are loaded as (a form of) shared library. This
> > means a mode requirement can be introduced late on. In an Android
> > environment it is unlikely to be acceptable to have to do something
> > special to load an application that happens to have a specific mode
> > requirement so dynamic selection is useful. This is more of a
> > transitional problem than anything but making it a smooth process is
> > quite important. I'm also not sure that there is much more effort
> > required for a dynamic linker to take account of dlopen()ed objects in
> > addition to DT_NEEDED, changes are needed in this code regardless.
> 
> As far as GNU/Linux goes, if we do end up with a function in something
> like a modeless libm that is implemented as an FR-aware ifunc, that
> would force the choice to be made early anyway.  So we have this very
> specific case where everything in the initial process is modeless, no
> ifuncs take advantage of the FR setting, and a dlopen()ed object was
> compiled as fr0 rather than modeless.  I agree it's possible but it
> seems unlikely.

A reasonable point.
 
> I know nothing about the way Android loading works though. :-) Could you
> describe it in more detail?  Is it significantly different from glibc's
> dynamic loader running a PIE?

I am working from fragments of information on this aspect still so I need to 
get more clarification from Android developers. My current understanding is 
that native parts of applications are actually shared libraries and form part 
of, but not necessarily the entry to, an application. Since such a shared 
library can't be 'required' by anything it must be loaded explicitly. I'll get 
clarification but the potential need for dynamic mode switching in Android need 
not affect the decision that GNU/Linux takes.
 
> >> If we do end up using ELF flags then maybe adding two new EF_MIPS_ABI
> >> enums would be better.  It's more likely to be trapped by old loaders
> >> and avoids eating up those precious remaining bits.
> >
> > Sound's reasonable but I'm still trying to determine how this
> > information can be propagated from loader to dynamic loader.
> 
> The dynamic loader has access to the ELF headers so I didn't think it
> would need any help.

As I understand it the dynamic loader only has specific access to the program 
headers of the executable not the ELF headers. There is no question that the 
dynamic loader has access to DSO ELF headers but we need the start point too.
 
> >> You didn't say specifically how a static program's crt code would
> >> know whether it was linked as modeless or in a specific FR mode.
> >> Maybe the linker could define a special hidden symbol?
> >
> > Why do you say crt rather than dlopen? The mode requirement should
> > only matter if you want to change it and dlopen should be able to
> > access information in the same way 

Re: [LM-32] Code generation for address loading

2014-02-24 Thread Anthony Green
FX MOREL  writes:

> Hi everyone,
>
> I am developing on a custom design using the LatticeMico32
> architecture and I use gcc 4.5.1 to compile C code for this arch.
>
> In this architecture, the loading of an address 0x always
> takes two assembly instructions to fetch the address because
> immediates are on 16 bits :
> mvhi r1, 0x
> ori r1, r1, 0x
> ...
> lw r2, r1
>
> In my situation, nearly all the symbols are located in the same 64kB
> region and their address share the same hi-part, so I am trying to
> minimize the overload of always using two instructions when only one
> is needed.
>

[additional details deleted]

> Because the symbol mapping phase is done during linking, I have little
> chance to know the future symbol address at code generation but is
> there some way I could make this work ?
> If I affect the symbol to a dedicated section (with the __attribute__
> ((section())) directive ), is there a way to know its section during
> code generation ?
>
> I understand that I am asking for a very 'dangerous' advice but again,
> this will only be a custom optimization for a custom design.

Have you considered doing this through custom GNU linker relaxation
work?  I would try this before hacking away at the compiler.

AG


>
> Thank you.
>
> F-X Morel


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Paul E. McKenney
On Mon, Feb 24, 2014 at 03:35:04PM -0800, Linus Torvalds wrote:
> On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney
>  wrote:
> >>
> >> What if the "nothing modifies 'p'" part looks like this:
> >>
> >> if (p != &myvariable)
> >> return;
> >>
> >> and now any sane compiler will happily optimize "q = *p" into "q =
> >> myvariable", and we're all done - nothing invalid was ever
> >
> > Yes, the compiler could do that.  But it would still be required to
> > carry a dependency from the memory_order_consume read to the "*p",
> 
> But that's *BS*. You didn't actually listen to the main issue.
> 
> Paul, why do you insist on this carries-a-dependency crap?

Sigh.  Read on...

> It's broken. If you don't believe me, then believe the compiler person
> who already piped up and told you so.
> 
> The "carries a dependency" model is broken. Get over it.
> 
> No sane compiler will ever distinguish two different registers that
> have the same value from each other. No sane compiler will ever say
> "ok, register r1 has the exact same value as register r2, but r2
> carries the dependency, so I need to make sure to pass r2 to that
> function or use it as a base pointer".
> 
> And nobody sane should *expect* a compiler to distinguish two
> registers with the same value that way.
> 
> So the whole model is broken.
> 
> I gave an alternate model (the "restrict"), and you didn't seem to
> understand the really fundamental difference. It's not a language
> difference, it's a conceptual difference.
> 
> In the broken "carries a dependency" model, you have fight all those
> aliases that can have the same value, and it is not a fight you can
> win. We've had the "p-p" examples, we've had the "p&0" examples, but
> the fact is, that "p==&myvariable" example IS EXACTLY THE SAME THING.
> 
> All three of those things: "p-p", "p&0", and "p==&myvariable" mean
> that any compiler worth its salt now know that "p" carries no
> information, and will optimize it away.
> 
> So please stop arguing against that. Whenever you argue against that
> simple fact, you are arguing against sane compilers.

So let me see if I understand your reasoning.  My best guess is that it
goes something like this:

1.  The Linux kernel contains code that passes pointers from
rcu_dereference() through external functions.

2.  Code in the Linux kernel expects the normal RCU ordering
guarantees to be in effect even when external functions are
involved.

3.  When compiling one of these external functions, the C compiler
has no way of knowing about these RCU ordering guarantees.

4.  The C compiler might therefore apply any and all optimizations
to these external functions.

5.  This in turn implies that we the only way to prohibit any given
optimization from being applied to the results obtained from
rcu_dereference() is to prohibit that optimization globally.

6.  We have to be very careful what optimizations are globally
prohibited, because a poor choice could result in unacceptable
performance degradation.

7.  Therefore, the only operations that can be counted on to
maintain the needed RCU orderings are those where the compiler
really doesn't have any choice, in other words, where any
reasonable way of computing the result will necessarily maintain
the needed ordering.

Did I get this right, or am I confused?

> So *accept* the fact that some operations (and I guarantee that there
> are more of those than you can think of, and you can create them with
> various tricks using pretty much *any* feature in the C language)
> essentially take the data information away. And just accept the fact
> that then the ordering goes away too.

Actually, the fact that there are more potential optimizations than I can
think of is a big reason for my insistence on the carries-a-dependency
crap.  My lack of optimization omniscience makes me very nervous about
relying on there never ever being a reasonable way of computing a given
result without preserving the ordering.

> So give up on "carries a dependency". Because there will be cases
> where that dependency *isn't* carried.
> 
> The language of the standard needs to get *away* from the broken
> model, because otherwise the standard is broken.
> 
> I suggest we instead talk about "litmus tests" and why certain code
> sequences are ordered, and others are not.

OK...

> So the code sequence I already mentioned is *not* ordered:
> 
> Litmus test 1:
> 
> p = atomic_read(pp, consume);
> if (p == &variable)
> return p->val;
> 
>is *NOT* ordered, because the compiler can trivially turn this into
> "return variable.val", and break the data dependency.

Right, given your model, the compiler is free to produce code that
doesn't order the load from pp against the load from p->val.

>This is true *regardless* of any "carries a dependency" language,
> because that language

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-24 Thread Linus Torvalds
On Mon, Feb 24, 2014 at 3:35 PM, Linus Torvalds
 wrote:
>
> Litmus test 1:
>
> p = atomic_read(pp, consume);
> if (p == &variable)
> return p->val;
>
>is *NOT* ordered

Btw, don't get me wrong. I don't _like_ it not being ordered, and I
actually did spend some time thinking about my earlier proposal on
strengthening the 'consume' ordering.

I have for the last several years been 100% convinced that the Intel
memory ordering is the right thing, and that people who like weak
memory ordering are wrong and should try to avoid reproducing if at
all possible. But given that we have memory orderings like power and
ARM, I don't actually see a sane way to get a good strong ordering.
You can teach compilers about cases like the above when they actually
see all the code and they could poison the value chain etc. But it
would be fairly painful, and once you cross object files (or even just
functions in the same compilation unit, for that matter), it goes from
painful to just "ridiculously not worth it".

So I think the C semantics should mirror what the hardware gives us -
and do so even in the face of reasonable optimizations - not try to do
something else that requires compilers to treat "consume" very
differently.

If people made me king of the world, I'd outlaw weak memory ordering.
You can re-order as much as you want in hardware with speculation etc,
but you should always *check* your speculation and make it *look* like
you did everything in order. Which is pretty much the intel memory
ordering (ignoring the write buffering).

   Linus