Re: [RFC] Meta-description for tree and gimple folding

2014-03-05 Thread Richard Biener
On Tue, 4 Mar 2014, Marc Glisse wrote:

> On Mon, 3 Mar 2014, Richard Biener wrote:
> 
> > > How do I restrict some subexpression to have
> > > a single use?
> > 
> > This kind of restriction comes via the valueize() hook - simply
> > valueize to NULL_TREE to make the match fail (for example
> > SSA_NAME_OCCURS_IN_ABNORMAL_PHI could be made to fail that way).
> 
> Shouldn't that single-use property depend more on the transformation and less
> on where it is called from? a+b-b -> a is always going to be a good idea
> (well, register pressure aside), even if a+b is used in many other places. But
> if you are using a*b elsewhere, turning a*b+c into FMA doesn't make so much
> sense.

Yeah, that's true.

> Well, we can always call has_single_use on some @i if it is an SSA_NAME.

Sure.  As I have to add the capability to guard patterns with flags,
doing an additional has_single_use test there is easy (but of course
that's something only available when in SSA form - something to keep
in mind if we also want to create a GENERIC variant of the transform).

Note that the concept of having a single use makes whether a pattern
matches possibly dependent on the order of processing uses and
dependent on dead code being removed.  Just something to keep in mind
if you want the maximum number of transforms rather than being cautious
by default.
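To make the single-use guard concrete, here is a minimal sketch in plain C. It is a toy model, not GCC's tree or gimple API: `struct toy_expr`, `toy_has_single_use` and `toy_fold_fma` are made-up names, and `num_uses` merely stands in for what has_single_use would report on an SSA_NAME.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node; this is NOT GCC's tree or gimple representation.  */
enum toy_code { TOY_LEAF, TOY_PLUS, TOY_MULT, TOY_FMA };

struct toy_expr {
    enum toy_code code;
    struct toy_expr *op[3];
    int num_uses;               /* how many statements use this value */
};

/* Hypothetical stand-in for has_single_use () on an SSA name.  */
static bool toy_has_single_use(const struct toy_expr *e)
{
    return e->num_uses == 1;
}

/* Try to rewrite (a * b) + c into fma (a, b, c) in place.
   Returns true only if the pattern matched and the guard allowed it.  */
static bool toy_fold_fma(struct toy_expr *plus)
{
    if (plus->code != TOY_PLUS)
        return false;

    struct toy_expr *mul = plus->op[0];
    if (mul == NULL || mul->code != TOY_MULT)
        return false;

    /* The guard: if a * b is also needed elsewhere, the multiplication
       stays around anyway, so forming an FMA saves nothing.  */
    if (!toy_has_single_use(mul))
        return false;

    struct toy_expr *c = plus->op[1];
    plus->code = TOY_FMA;
    plus->op[0] = mul->op[0];
    plus->op[1] = mul->op[1];
    plus->op[2] = c;
    return true;
}
```

With `num_uses == 2` on the multiplication the fold is refused; once the other use goes away (for instance because it was dead code) the same call succeeds, which is the order dependence mentioned above.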

> > > If I write a COND_EXPR matcher, could it generate code for phiopt as
> > > well?
> > 
> > Not sure, what do you have in mind specifically?
> 
> fold-const.c has the equivalent of:
> (define_match_and_simplify abs
>   (COND_EXPR (LT_EXPR @0 zero_p) (NEGATE_EXPR @0) @0)
>   (ABS_EXPR @0))
> 
> (it would help to be able to write LT_EXPR|LE_EXPR, maybe even to try
> automatically simplify(!a)?c:b for a?b:c)
> which works well on trees, but requires more complicated code in phiopt (same
> for min/max).

Yeah, inventing short-cuts for stuff like LT_EXPR|LE_EXPR or
(PLUS_EXPR sub-expr1 sub-expr2) vs. (PLUS_EXPR sub-expr2 sub-expr1)
is on my list.  In the end it will (internally) duplicate the
pattern to have one for each variant.

I'm still pondering over the exact syntax for both (I can easily
add builtin knowledge for commutative operators of course).  Eventually
the preprocessor comes to our rescue here ...

#define X(CMP) \
  (define_match_and_simplify \
    ((COND_EXPR (CMP @0 zero_p) (NEGATE_EXPR @0) @0) \
    (ABS_EXPR @0)))
X(LT_EXPR)
X(LE_EXPR)

which is uglier than something like

(define_op LT_OR_LE_EXPR LT_EXPR|LE_EXPR)
(define_match_and_simplify
  ((COND_EXPR (LT_OR_LE_EXPR @0 zero_p) ...

or what you proposed.  But how do you handle

 (BIT_IOR_EXPR (LT_EXPR|LE_EXPR @0 @1) (GE_EXPR|GT_EXPR @0 @1))

for example?  a) Match variants in lock-step?  b) Have the above
generate 4 variants?  c) Disallow it (looks error-prone)?

I'd rather keep it simple for now ;)
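For illustration only, the difference between options a) and b) can be sketched in plain C. Nothing here is part of the proposed pattern language: `struct variant`, `expand_lock_step` and `expand_cross_product` are hypothetical names, and the operator lists are just strings.

```c
#include <stddef.h>

/* Hypothetical record of one expanded pattern variant.  */
struct variant {
    const char *first;   /* operator chosen for LT_EXPR|LE_EXPR */
    const char *second;  /* operator chosen for GE_EXPR|GT_EXPR */
};

#define N_ALTS 2
static const char *const first_ops[N_ALTS]  = { "LT_EXPR", "LE_EXPR" };
static const char *const second_ops[N_ALTS] = { "GE_EXPR", "GT_EXPR" };

/* a) Lock-step: the i-th alternative of one list is paired only with
   the i-th alternative of the other, giving N_ALTS variants.  */
static size_t expand_lock_step(struct variant *out)
{
    size_t n = 0;
    for (size_t i = 0; i < N_ALTS; i++) {
        out[n].first = first_ops[i];
        out[n].second = second_ops[i];
        n++;
    }
    return n;
}

/* b) Cross product: every combination, giving N_ALTS * N_ALTS variants.  */
static size_t expand_cross_product(struct variant *out)
{
    size_t n = 0;
    for (size_t i = 0; i < N_ALTS; i++) {
        for (size_t j = 0; j < N_ALTS; j++) {
            out[n].first = first_ops[i];
            out[n].second = second_ops[j];
            n++;
        }
    }
    return n;
}
```

For the BIT_IOR_EXPR example, lock-step yields the two pairs (LT_EXPR, GE_EXPR) and (LE_EXPR, GT_EXPR), while the cross product yields all four combinations.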

> > > How do you handle a
> > > transformation that currently tries to recursively fold something else and
> > > does the main transformation only if that simplified?
> > 
> > And doesn't do the other folding (because it's not in the IL literally?)?
> > Similar to the cst without overflow case, by writing custom C code
> > and allowing that to signal failure.
> 
> I am not sure if it will be easy to write code that works for generic and
> gimple. I'll see...

That's true - though GIMPLE and GENERIC share trees which makes it
easy enough for most of the cases.  We'll see ...

For now I'm concentrating on fitting the framework into forwprop,
replacing manual patterns that occur there.

Richard.


Re: linux says it is a bug

2014-03-05 Thread Richard Henderson
On 03/04/2014 10:12 PM, Yury Gribov wrote:
>>> Asms without outputs are automatically volatile.  So there ought be zero
>>> change with and without the explicit use of the __volatile__ keyword.
>>
>> That’s what the documentation says but it wasn’t actually true
>> as of a couple of releases ago, as I recall.
> 
> Looks like 2005:
> 
> $ git annotate gcc/c/c-typeck.c
> ...
> 89552023(   bonzini 2005-10-05 12:17:16 +   9073) /* asm statements without outputs, including simple ones, are treated
> 89552023(   bonzini 2005-10-05 12:17:16 +   9074)   as volatile.  */
> 89552023(   bonzini 2005-10-05 12:17:16 +   9075) ASM_INPUT_P (args) = simple;
> 89552023(   bonzini 2005-10-05 12:17:16 +   9076) ASM_VOLATILE_P (args) = (noutputs == 0);

Yep, that's the one.  So, more than "a couple" of releases: gcc 4.2 and later.
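A small sketch of what that means in source terms, assuming GCC-style extended asm (the function names are made up for illustration):

```c
/* No output operands: GCC treats this asm as volatile implicitly, so it
   should behave the same as the explicitly volatile variant below.  */
static inline void touch_implicit(int x)
{
    __asm__("" : : "r"(x));
}

/* Same statement with the keyword spelled out.  */
static inline void touch_explicit(int x)
{
    __asm__ __volatile__("" : : "r"(x));
}

/* This asm HAS an output operand, so it is not implicitly volatile; the
   compiler may drop it if the result is unused.  The "0" constraint ties
   the input to the output register, so the empty template copies x to y.  */
static inline int copy_through_asm(int x)
{
    int y;
    __asm__("" : "=r"(y) : "0"(x));
    return y;
}
```

Only the third form is a candidate for removal when its result is dead; the first two are kept regardless.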


r~


Re: linux says it is a bug

2014-03-05 Thread Paul_Koning

On Mar 5, 2014, at 10:07 AM, Richard Henderson  wrote:

> On 03/04/2014 10:12 PM, Yury Gribov wrote:
>>>> Asms without outputs are automatically volatile.  So there ought be zero
>>>> change with and without the explicit use of the __volatile__ keyword.
>>> 
>>> That’s what the documentation says but it wasn’t actually true
>>> as of a couple of releases ago, as I recall.
>> 
>> Looks like 2005:
>> 
>> $ git annotate gcc/c/c-typeck.c
>> ...
>> 89552023(   bonzini 2005-10-05 12:17:16 +   9073) /* asm statements without outputs, including simple ones, are treated
>> 89552023(   bonzini 2005-10-05 12:17:16 +   9074)   as volatile.  */
>> 89552023(   bonzini 2005-10-05 12:17:16 +   9075) ASM_INPUT_P (args) = simple;
>> 89552023(   bonzini 2005-10-05 12:17:16 +   9076) ASM_VOLATILE_P (args) = (noutputs == 0);
> 
> Yep, that's the one.  So, more than "a couple" of releases: gcc 4.2 and later.

Thanks gentlemen.  That explains it — we ran into this in GCC 3.3.3 and then 
upgraded from there straight to V4.6 or so.

paul


RE: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-05 Thread Thomas Preud'homme
> From: Thomas Preud'homme
> [Since I can now send emails without disclaimers, I registered to the mailing
> list with my work email. Thus no need to CC me anymore.]

Failed in the previous 2 emails. Sorry about that.





RE: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-05 Thread Thomas Preud'homme
[Since I can now send emails without disclaimers, I registered to the mailing 
list with my work email. Thus no need to CC me anymore.]

My apologies for the line length; the MUA says it all, I think. It seems to 
ignore my word-wrap setting.

> From: Joseph Myers [mailto:jos...@codesourcery.com]
> Sent: Wednesday, March 05, 2014 2:13 AM
> 
> 
> If the function is only declared and not called or defined (in a system
> header etc.), of course you don't want that to affect the ABI (even in the
> case of an inline function in a system header, unless an out-of-line call
> is generated to it).  But a call to a function not defined in that unit
> does affect the ABI compatibility, if the call involves affected types.

This should be fine with the current implementation, as I did the check inside 
the hooks TARGET_FUNCTION_ARG, TARGET_FUNCTION_VALUE and INIT_CUMULATIVE_ARGS. 
The first two make it possible to test whether a parameter or a return value is 
a float, while the last is used to test whether the float ABI actually matters. 
INIT_CUMULATIVE_ARGS is called when knowing where a parameter ends up matters, 
that is, when a function is defined (not merely declared) or when it is called. 
Checking whether the function is static or variadic completes the check. 
Inlining does not need any additional check: if the function is inlined there 
is no call, and if it is not there will be a call and the current checks will 
catch it. However, one thing I did not do is check at function calls whether 
the return value is ignored. If it is ignored, the float ABI does not matter.
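As a rough sketch of the decision described above (a toy model in plain C, not the actual hook code; `struct toy_fn` and `toy_fn_depends_on_float_abi` are hypothetical names, and treating static functions as exempt is my reading of the paragraph above):

```c
#include <stdbool.h>

/* Hypothetical summary of one function as seen by the check.  */
struct toy_fn {
    bool defined_or_called;  /* merely declared functions are ignored */
    bool is_static;          /* not part of the unit's public interface */
    bool is_variadic;        /* on ARM, variadics use the base AAPCS */
    bool has_float_arg;
    bool returns_float;
};

/* Returns true if this function ties the compilation unit to one
   particular float calling convention.  */
static bool toy_fn_depends_on_float_abi(const struct toy_fn *fn)
{
    if (!fn->defined_or_called)
        return false;
    if (fn->is_static || fn->is_variadic)
        return false;
    return fn->has_float_arg || fn->returns_float;
}
```

A unit would then be compatible with both conventions only if this predicate is false for every function it defines or calls.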

> 
> Some libgcc functions on ARM have ABIs that depend on which AAPCS variant
> is in use - that is, libcalls, not just explicitly defined or called
> functions, can affect the ABI compatibility.  But the RTABI functions
> don't - if you allow for that, then you increase the number of cases that
> end up compatible with both ABI variants.

Do you have an example of such libgcc functions? Are there any of them with no 
link to the use of floats in a public interface? Not knowing any such case off 
the top of my head, I would say that the use of any of these functions makes 
the compilation unit not compatible with both calling conventions, since it 
requires libgcc for a specific calling convention, but maybe the runtime library 
can be treated differently from other libraries.

> 
> On ARM, variadic functions use only the base AAPCS and so don't affect
> compatibility even if they have floating-point (or vector) arguments.
> (This is something that's different on some other architectures with
> similar issues.)

Yep I took this into consideration. 

Best regards,

Thomas




RE: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-05 Thread Thomas Preud'homme
> From: Richard Sandiford [mailto:rdsandif...@googlemail.com]

> Yeah, that'd be great.  The checking that MIPS's -mno-float should do
> (but doesn't do) would be a superset of what you need, since the MIPS
> case would include internal uses of floats.  But it would definitely
> make sense to share the bits that can be shared and (for example) to get
> consistent error messages for them.

I agree on the error messages and on a common switch. I am not sure for the 
rest though (see below).

> How about warning for all float types (as the float option) and
> all vector types (as a separate option)?  I'm not sure there's
> as much value in warning specifically about "hardware" types
> since that can always change in future.  E.g. a while ago the
> only MIPS vector of interest was V2SF, but then Loongson added
> some integer ones, and now MSA is adding 128-bit vector types.
> There could be wider types in future, as happened for 512-bit AVX.

I do not think it is a good approach. First, it means that on ARM a user would 
need to use several switches to test compatibility, and at least one part would 
be ARM specific (vectors of fewer than 4 floats/doubles). Second, when the 
function is variadic, ARM uses the base calling convention, no matter which 
calling convention is normally used. So a float in a variadic function is not a 
problem for ARM but might be a problem for MIPS. I agree, though, that there is 
an overlap with the -mno-float switch of MIPS. On the other hand, checking for 
the dual compatibility of ARM code takes just about 15 lines of code, and doing 
this in the middle end with some hooks for architecture-specific bits would make 
the code much bigger, I think. I'll think some more about it and give you a more 
definite opinion. Right now my mind is not made up.

> -mno-float as it stands today is really just -msoft-float with some
> floating-point support removed from the library to save space.
> One of the important examples is that the floating-point printf
> and scanf formats are not supported, so printf and scanf do not
> pull in large parts of the software floating-point library.
> 
> But the restrictions that apply to -mno-float should make it
> link-compatible with -mhard-float too, as for your use case.

Right. With your explanation and the description of the -mno-float switch I now 
understand what it does. So as I said, there is an important overlap with my 
work, and if I can find an approach that manages to share some code I will do 
that. I am unsure at present whether it is a good thing to do. I believe 
a new switch is necessary since what I need for ARM is slightly different from 
what -mno-float means. The logic could be partially shared if it is not too 
complicated to do so.

Best regards,

Thomas




Re: Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...

2014-03-05 Thread Mohsin Khan
Hi,
 I am extremely sorry that I couldn't reply for many days. I was busy
with some personal work, so I didn't work on this for a while.
 I didn't use MELT because I didn't want to learn a new language, and
my professor also wanted me to code the plugin in C/C++.

On 2/18/14, Basile Starynkevitch  wrote:
> On Tue, 2014-02-18 at 11:17 +0530, Mohsin Khan wrote:
>> Hi,
>>
>>  I am developing plugins for the GCC-4.8.2. I am a newbie in plugins.
>> I wrote a plugin and tried to count and see the Goto Statements using
>> the gimple_stmt_iterator. I get gimple statements printed on my
>> stdout, but I am not able to find the line which has goto statements.
>
> I guess that most GOTOs are just becoming implicit as the link to the
> next basic block.
>
> Probably
>
>if (!cond) goto end;
>something;
>   end:;
>
> has nearly the same Gimple representation as
>while (cond) {
>  something;
>}
>
> BTW, did you consider using MELT http://gcc-melt.org/ to code your GCC
> extension?
>
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mine, sont seulement les miennes} ***
>
>
>


Can Some one please help me on this gcc plugin..

2014-03-05 Thread Mohsin Khan
Hi,

 I am developing plugins for the GCC-4.8.2. I am a newbie in plugins.
I wrote a plugin and tried to count and see the Goto Statements using
the gimple_stmt_iterator. I get gimple statements printed on my
stdout, but I am not able to find the line which has goto statements.
I only get other lines such as variable declaration and logic
statements, but no goto statements.
  When I open the Gimple/SSA/CFG file separately using the vim editor
I find the goto statements are actually present.
  So, can anyone help me? How can I actually get the count of goto
statements, or at least access these goto statements using some
iterator?
  I have used -fdump-tree-all, -fdump-tree-cfg as flags.

Here is the pseudocode:

struct register_pass_info pass_info = {
    &(pass_plugin.pass),    /* Address of the new pass: here, the 'struct
                               opt_pass' field of the 'gimple_opt_pass'
                               defined above.  */
    "ssa",                  /* Name of the reference pass for hooking up
                               the new pass.   ??? */
    0,                      /* Insert the pass at the specified instance
                               number of the reference pass.  Do it for
                               every instance if it is 0.  */
    PASS_POS_INSERT_AFTER   /* How to insert the new pass: before, after,
                               or replace.  Here we are inserting a pass
                               named 'plug' after the pass named 'ssa'.  */
};

.

static unsigned int dead_code_elimination (void)
{
  basic_block bb;
  gimple_stmt_iterator gsi, gsi2;
  gimple g;

  FOR_EACH_BB_FN (bb, cfun)
    {
      /* Print the first statement after the labels, if there is one.  */
      gsi2 = gsi_after_labels (bb);
      if (!gsi_end_p (gsi2))
        print_gimple_stmt (stdout, gsi_stmt (gsi2), 0, 0);

      /* Iterate over each gimple statement in the basic block.  */
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
        {
          g = gsi_stmt (gsi);
          print_gimple_stmt (stdout, g, 0, 0);

          if (gimple_code (g) == GIMPLE_GOTO)
            printf ("\nFound GOTO stmt\n");
        }
    }

  return 0;
}


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-05 Thread Torvald Riegel
On Tue, 2014-03-04 at 11:00 -0800, Paul E. McKenney wrote:
> On Mon, Mar 03, 2014 at 09:46:19PM +0100, Torvald Riegel wrote:
> > On Mon, 2014-03-03 at 11:20 -0800, Paul E. McKenney wrote:
> > > On Mon, Mar 03, 2014 at 07:55:08PM +0100, Torvald Riegel wrote:
> > > > On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> > > > > > +o  Do not use the results from the boolean "&&" and "||" when
> > > > > + dereferencing.  For example, the following (rather improbable)
> > > > > + code is buggy:
> > > > > +
> > > > > + int a[2];
> > > > > + int index;
> > > > > + int force_zero_index = 1;
> > > > > +
> > > > > + ...
> > > > > +
> > > > > + r1 = rcu_dereference(i1)
> > > > > + r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> > > > > +
> > > > > > + The reason this is buggy is that "&&" and "||" are often compiled
> > > > > > + using branches.  While weak-memory machines such as ARM or PowerPC
> > > > > + do order stores after such branches, they can speculate loads,
> > > > > + which can result in misordering bugs.
> > > > > +
> > > > > > +o  Do not use the results from relational operators ("==", "!=",
> > > > > + ">", ">=", "<", or "<=") when dereferencing.  For example,
> > > > > + the following (quite strange) code is buggy:
> > > > > +
> > > > > + int a[2];
> > > > > + int index;
> > > > > + int flip_index = 0;
> > > > > +
> > > > > + ...
> > > > > +
> > > > > + r1 = rcu_dereference(i1)
> > > > > + r2 = a[r1 != flip_index];  /* BUGGY!!! */
> > > > > +
> > > > > + As before, the reason this is buggy is that relational operators
> > > > > + are often compiled using branches.  And as before, although
> > > > > + weak-memory machines such as ARM or PowerPC do order stores
> > > > > + after such branches, but can speculate loads, which can again
> > > > > + result in misordering bugs.
> > > > 
> > > > Those two would be allowed by the wording I have recently proposed,
> > > > AFAICS.  r1 != flip_index would result in two possible values (unless
> > > > there are further constraints due to the type of r1 and the values that
> > > > flip_index can have).
> > > 
> > > And I am OK with the value_dep_preserving type providing more/better
> > > guarantees than we get by default from current compilers.
> > > 
> > > One question, though.  Suppose that the code did not want a value
> > > dependency to be tracked through a comparison operator.  What does
> > > the developer do in that case?  (The reason I ask is that I have
> > > not yet found a use case in the Linux kernel that expects a value
> > > dependency to be tracked through a comparison.)
> > 
> > Hmm.  I suppose use an explicit cast to non-vdp before or after the
> > comparison?
> 
> That should work well assuming that things like "if", "while", and "?:"
> conditions are happy to take a vdp.

I currently don't see a reason why that should be disallowed.  If we
have allowed an implicit conversion to non-vdp, I believe that should
follow.  ?: could be somewhat special, in that the type depends on the
2nd and 3rd operand.  Thus, "vdp x = non-vdp ? vdp : vdp;" should be
allowed, whereas "vdp x = non-vdp ? non-vdp : vdp;" probably should be
disallowed if we don't provide for implicit casts from non-vdp to vdp.

> This assumes that p->a only returns
> vdp if field "a" is declared vdp, otherwise we have vdps running wild
> through the program.  ;-)

That's a good question.  For the scheme I had in mind, I'm not concerned
about vdps running wild because one needs to assign to explicitly
vdp-typed variables (or function arguments, etc.) to let vdp extend to
beyond single expressions.

Nonetheless, I think it's a good question how -> should behave if the
field is not vdp; in particular, should vdp->non_vdp be automatically
vdp?  One concern might be that we know something about the non-vdp field
-- OTOH, we shouldn't be able to, because we are assumed not to know
anything about the vdp pointer, so we can't infer anything about what it
points to.
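For reference, this is roughly what the dependency in question looks like with plain C11 atomics. The vdp qualifier itself is only a proposal from this thread and is not used here; `struct foo` and `gp` follow the quoted example, while `publish` and `read_a` are names made up for this sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

struct foo {
    int a;
};

static struct foo the_foo;
static _Atomic(struct foo *) gp;   /* starts out NULL */

/* Writer: initialize the structure, then publish the pointer.  */
static void publish(int value)
{
    the_foo.a = value;
    atomic_store_explicit(&gp, &the_foo, memory_order_release);
}

/* Reader: the load of p->a carries a dependency on the consume load,
   because its address is computed from p.  */
static int read_a(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_consume);
    if (p == NULL)
        return -1;              /* nothing published yet */
    return p->a;
}
```

Whether `p->a` itself would be vdp-typed under the proposal is exactly the open question above; in C11 as it stands, only the ordering of the dependent load is at issue.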

> The other thing that can happen is that a vdp can get handed off to
> another synchronization mechanism, for example, to reference counting:
> 
>   p = atomic_load_explicit(&gp, memory_order_consume);
>   if (do_something_with(p->a)) {
>   /* fast path protected by RCU. */
>   return 0;
>   }
>   if (atomic_inc_not_zero(&p->refcnt)) {

Is the argument to atomic_inc_not_zero vdp or non-vdp?

>   /* slow path protected by reference counting. */
>   return do_something_else_with((struct foo *)p);  /* 

Re: Can Some one please help me on this gcc plugin..

2014-03-05 Thread David Malcolm
On Wed, 2014-03-05 at 21:58 +0530, Mohsin Khan wrote:
> Hi,
> 
>  I am developing plugins for the GCC-4.8.2. I am a newbie in plugins.
> I wrote a plugin and tried to count and see the Goto Statements using
> the gimple_stmt_iterator. I get gimple statements printed on my
> stdout, but I am not able to find the line which has goto statements.
> I only get other lines such as variable declaration and logic
> statements, but no goto statements.
>   When I open the Gimple/SSA/CFG file separately using the vim editor
> I find the goto statements are actually present.
>   So, can anyone help me? How can I actually get the count of goto
> statements, or at least access these goto statements using some
> iterator?
>   I have used -fdump-tree-all, -fdump-tree-cfg as flags.
> 
> Here is the pseudocode:
> 
> struct register_pass_info pass_info = {
> &(pass_plugin.pass), /* Address of new pass,
> here, the 'struct
>  opt_pass' field of
> 'gimple_opt_pass'
>  defined above */
> "ssa",   /* Name of the reference
> pass for hooking up
>  the new pass.   ??? */
> 0,   /* Insert the pass at the
> specified instance
>  number of the reference
> pass. Do it for
>  every instance if it is 0. */
> PASS_POS_INSERT_AFTER/* how to insert the new
> pass: before,

You're inserting your pass after the "ssa" pass, which converts the CFG
to SSA form.  This is run *after* the function has been converted from a
flat list of gimple statements into a CFG of basic blocks, and that CFG
conversion eliminates the goto statements in favor of edges within the
CFG.  If you see "goto" in the dump, that's presumably just a textual
way of expressing an edge in the CFG.

To see gimple goto statements, you need to run your pass *before* the
conversion to CFG, which happens fairly early on, in the "cfg" pass.
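Here's a toy model of why the gotos disappear, in plain C. It is only a sketch and does not use GCC's data structures: `toy_stmt`, `toy_block`, `toy_build_cfg` and `toy_count_gotos` are made-up names, and real CFG construction handles far more cases (conditionals, computed gotos, and so on).

```c
#include <stddef.h>

enum toy_kind { TOY_ASSIGN, TOY_LABEL, TOY_GOTO };

struct toy_stmt {
    enum toy_kind kind;
    int label;              /* label id (TOY_LABEL) or target (TOY_GOTO) */
};

#define TOY_MAX_STMTS 16

struct toy_block {
    struct toy_stmt stmts[TOY_MAX_STMTS];
    size_t n_stmts;
    int succ_label;         /* explicit successor label, -1 = fall through */
};

/* Split a flat statement list into blocks.  A label starts a new block;
   a goto ends the current block and survives only as an edge.  */
static size_t toy_build_cfg(const struct toy_stmt *in, size_t n,
                            struct toy_block *blocks, size_t max_blocks)
{
    size_t nb = 0;
    int open = 0;           /* does blocks[nb - 1] still accept statements? */

    for (size_t i = 0; i < n; i++) {
        if (!open || in[i].kind == TOY_LABEL) {
            if (nb == max_blocks)
                return nb;
            blocks[nb].n_stmts = 0;
            blocks[nb].succ_label = -1;
            nb++;
            open = 1;
        }
        struct toy_block *bb = &blocks[nb - 1];
        if (in[i].kind == TOY_GOTO) {
            bb->succ_label = in[i].label;   /* edge, not a statement */
            open = 0;
        } else if (bb->n_stmts < TOY_MAX_STMTS) {
            bb->stmts[bb->n_stmts++] = in[i];
        }
    }
    return nb;
}

/* Count goto statements left inside the blocks after lowering.  */
static size_t toy_count_gotos(const struct toy_block *blocks, size_t nb)
{
    size_t count = 0;
    for (size_t b = 0; b < nb; b++)
        for (size_t s = 0; s < blocks[b].n_stmts; s++)
            if (blocks[b].stmts[s].kind == TOY_GOTO)
                count++;
    return count;
}
```

Iterating over the statements of the resulting blocks never finds a goto, which matches what the plugin sees when it runs after the "cfg" pass.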

FWIW there's a diagram showing the passes here:
http://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html


Hope this is helpful
Dave



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-05 Thread Torvald Riegel
On Tue, 2014-03-04 at 13:35 -0800, Paul E. McKenney wrote:
> On Tue, Mar 04, 2014 at 11:00:32AM -0800, Paul E. McKenney wrote:
> > On Mon, Mar 03, 2014 at 09:46:19PM +0100, Torvald Riegel wrote:
> > > On Mon, 2014-03-03 at 11:20 -0800, Paul E. McKenney wrote:
> > > > On Mon, Mar 03, 2014 at 07:55:08PM +0100, Torvald Riegel wrote:
> > > > > On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> > > > > > +o  Do not use the results from the boolean "&&" and "||" when
> > > > > > +   dereferencing.  For example, the following (rather improbable)
> > > > > > +   code is buggy:
> > > > > > +
> > > > > > +   int a[2];
> > > > > > +   int index;
> > > > > > +   int force_zero_index = 1;
> > > > > > +
> > > > > > +   ...
> > > > > > +
> > > > > > +   r1 = rcu_dereference(i1)
> > > > > > +   r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> > > > > > +
> > > > > > +   The reason this is buggy is that "&&" and "||" are often 
> > > > > > compiled
> > > > > > +   using branches.  While weak-memory machines such as ARM or 
> > > > > > PowerPC
> > > > > > +   do order stores after such branches, they can speculate loads,
> > > > > > +   which can result in misordering bugs.
> > > > > > +
> > > > > > +o  Do not use the results from relational operators ("==", "!=",
> > > > > > +   ">", ">=", "<", or "<=") when dereferencing.  For example,
> > > > > > +   the following (quite strange) code is buggy:
> > > > > > +
> > > > > > +   int a[2];
> > > > > > +   int index;
> > > > > > +   int flip_index = 0;
> > > > > > +
> > > > > > +   ...
> > > > > > +
> > > > > > +   r1 = rcu_dereference(i1)
> > > > > > +   r2 = a[r1 != flip_index];  /* BUGGY!!! */
> > > > > > +
> > > > > > +   As before, the reason this is buggy is that relational operators
> > > > > > +   are often compiled using branches.  And as before, although
> > > > > > +   weak-memory machines such as ARM or PowerPC do order stores
> > > > > > +   after such branches, but can speculate loads, which can again
> > > > > > +   result in misordering bugs.
> > > > > 
> > > > > Those two would be allowed by the wording I have recently proposed,
> > > > > AFAICS.  r1 != flip_index would result in two possible values (unless
> > > > > > there are further constraints due to the type of r1 and the values
> > > > > > that flip_index can have).
> > > > 
> > > > And I am OK with the value_dep_preserving type providing more/better
> > > > guarantees than we get by default from current compilers.
> > > > 
> > > > One question, though.  Suppose that the code did not want a value
> > > > dependency to be tracked through a comparison operator.  What does
> > > > the developer do in that case?  (The reason I ask is that I have
> > > > not yet found a use case in the Linux kernel that expects a value
> > > > dependency to be tracked through a comparison.)
> > > 
> > > Hmm.  I suppose use an explicit cast to non-vdp before or after the
> > > comparison?
> > 
> > That should work well assuming that things like "if", "while", and "?:"
> > conditions are happy to take a vdp.  This assumes that p->a only returns
> > vdp if field "a" is declared vdp, otherwise we have vdps running wild
> > through the program.  ;-)
> > 
> > The other thing that can happen is that a vdp can get handed off to
> > another synchronization mechanism, for example, to reference counting:
> > 
> > p = atomic_load_explicit(&gp, memory_order_consume);
> > if (do_something_with(p->a)) {
> > /* fast path protected by RCU. */
> > return 0;
> > }
> > > if (atomic_inc_not_zero(&p->refcnt)) {
> > /* slow path protected by reference counting. */
> > return do_something_else_with((struct foo *)p);  /* CHANGE */
> > }
> > /* Needed slow path, but raced with deletion. */
> > return -EAGAIN;
> > 
> > I am guessing that the cast ends the vdp.  Is that the case?
> 
> And here is a more elaborate example from the Linux kernel:
> 
>   struct md_rdev value_dep_preserving *rdev;  /* CHANGE */
> 
>   rdev = rcu_dereference(conf->mirrors[disk].rdev);
>   if (r1_bio->bios[disk] == IO_BLOCKED
>   || rdev == NULL
>   || test_bit(Unmerged, &rdev->flags)
>   || test_bit(Faulty, &rdev->flags))
>   continue;
> 
> The fact that the "rdev == NULL" returns vdp does not force the "||"
> operators to be evaluated arithmetically because the entire function
> is an "if" condition, correct?

That's a good question, and one that as far as I understand currently,
essentially boils down to whether we want to have tight restrictions on
which operations are still vdp

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-05 Thread Torvald Riegel
On Tue, 2014-03-04 at 22:11 +, Peter Sewell wrote:
> On 3 March 2014 20:44, Torvald Riegel  wrote:
> > On Sun, 2014-03-02 at 04:05 -0600, Peter Sewell wrote:
> >> On 1 March 2014 08:03, Paul E. McKenney  wrote:
> >> > On Sat, Mar 01, 2014 at 04:06:34AM -0600, Peter Sewell wrote:
> >> >> Hi Paul,
> >> >>
> >> >> On 28 February 2014 18:50, Paul E. McKenney 
> >> >>  wrote:
> >> >> > On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
> >> >> >> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
> >> >> >> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
> >> >> >> >  wrote:
> >> >> >> > >
> >> >> >> > > 3.  The comparison was against another RCU-protected pointer,
> >> >> >> > > where that other pointer was properly fetched using one
> >> >> >> > > of the RCU primitives.  Here it doesn't matter which pointer
> >> >> >> > > you use.  At least as long as the rcu_assign_pointer() for
> >> >> >> > > that other pointer happened after the last update to the
> >> >> >> > > pointed-to structure.
> >> >> >> > >
> >> >> >> > > I am a bit nervous about #3.  Any thoughts on it?
> >> >> >> >
> >> >> >> > I think that it might be worth pointing out as an example, and saying
> >> >> >> > that code like
> >> >> >> >
> >> >> >> >p = atomic_read(consume);
> >> >> >> >X;
> >> >> >> >q = atomic_read(consume);
> >> >> >> >Y;
> >> >> >> >if (p == q)
> >> >> >> > data = p->val;
> >> >> >> >
> >> >> >> > then the access of "p->val" is constrained to be data-dependent on
> >> >> >> > *either* p or q, but you can't really tell which, since the compiler
> >> >> >> > can decide that the values are interchangeable.
> >> >> >> >
> >> >> >> > I cannot for the life of me come up with a situation where this would
> >> >> >> > matter, though. If "X" contains a fence, then that fence will be a
> >> >> >> > stronger ordering than anything the consume through "p" would
> >> >> >> > guarantee anyway. And if "X" does *not* contain a fence, then the
> >> >> >> > atomic reads of p and q are unordered *anyway*, so then whether the
> >> >> >> > ordering to the access through "p" is through p or q is kind of
> >> >> >> > irrelevant. No?
> >> >> >>
> >> >> >> I can make a contrived litmus test for it, but you are right, the only
> >> >> >> time you can see it happen is when X has no barriers, in which case
> >> >> >> you don't have any ordering anyway -- both the compiler and the CPU can
> >> >> >> reorder the loads into p and q, and the read from p->val can, as you say,
> >> >> >> come from either pointer.
> >> >> >>
> >> >> >> For whatever it is worth, here is the litmus test:
> >> >> >>
> >> >> >> T1:   p = kmalloc(...);
> >> >> >>   if (p == NULL)
> >> >> >>   deal_with_it();
> >> >> >>   p->a = 42;  /* Each field in its own cache line. */
> >> >> >>   p->b = 43;
> >> >> >>   p->c = 44;
> >> >> >>   atomic_store_explicit(&gp1, p, memory_order_release);
> >> >> >>   p->b = 143;
> >> >> >>   p->c = 144;
> >> >> >>   atomic_store_explicit(&gp2, p, memory_order_release);
> >> >> >>
> >> >> >> T2:   p = atomic_load_explicit(&gp2, memory_order_consume);
> >> >> >>   r1 = p->b;  /* Guaranteed to get 143. */
> >> >> >>   q = atomic_load_explicit(&gp1, memory_order_consume);
> >> >> >>   if (p == q) {
> >> >> >>   /* The compiler decides that q->c is same as p->c. */
> >> >> >>   r2 = p->c; /* Could get 44 on weakly ordered system. */
> >> >> >>   }
> >> >> >>
> >> >> >> The loads from gp1 and gp2 are, as you say, unordered, so you get what
> >> >> >> you get.
> >> >> >>
> >> >> >> And publishing a structure via one RCU-protected pointer, updating it,
> >> >> >> then publishing it via another pointer seems to me to be asking for
> >> >> >> trouble anyway.  If you really want to do something like that and still
> >> >> >> see consistency across all the fields in the structure, please put a lock
> >> >> >> in the structure and use it to guard updates and accesses to those fields.
> >> >> >
> >> >> > And here is a patch documenting the restrictions for the current Linux
> >> >> > kernel.  The rules change a bit due to rcu_dereference() acting a bit
> >> >> > differently than atomic_load_explicit(&p, memory_order_consume).
> >> >> >
> >> >> > Thoughts?
> >> >>
> >> >> That might serve as informal documentation for linux kernel
> >> >> programmers about the bounds on the optimisations that you expect
> >> >> compilers to do for common-case RCU code - and I guess that's what you
> >> >> intend it to be for.   But I don't see how one can make it precise
> >> >> enough to serve as a language definition, so that compiler people
> >> >> could confidently say "yes, we respect that", which I guess is what
> >> >> you

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-05 Thread Paul E. McKenney
On Wed, Mar 05, 2014 at 05:26:36PM +0100, Torvald Riegel wrote:
> On Tue, 2014-03-04 at 11:00 -0800, Paul E. McKenney wrote:
> > On Mon, Mar 03, 2014 at 09:46:19PM +0100, Torvald Riegel wrote:
> > > On Mon, 2014-03-03 at 11:20 -0800, Paul E. McKenney wrote:
> > > > On Mon, Mar 03, 2014 at 07:55:08PM +0100, Torvald Riegel wrote:
> > > > > On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> > > > > > +o  Do not use the results from the boolean "&&" and "||" when
> > > > > > +   dereferencing.  For example, the following (rather improbable)
> > > > > > +   code is buggy:
> > > > > > +
> > > > > > +   int a[2];
> > > > > > +   int index;
> > > > > > +   int force_zero_index = 1;
> > > > > > +
> > > > > > +   ...
> > > > > > +
> > > > > > > +   r1 = rcu_dereference(i1);
> > > > > > > +   r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> > > > > > > +
> > > > > > > +   The reason this is buggy is that "&&" and "||" are often compiled
> > > > > > > +   using branches.  While weak-memory machines such as ARM or PowerPC
> > > > > > > +   do order stores after such branches, they can speculate loads,
> > > > > > > +   which can result in misordering bugs.
> > > > > > +
> > > > > > +o  Do not use the results from relational operators ("==", "!=",
> > > > > > +   ">", ">=", "<", or "<=") when dereferencing.  For example,
> > > > > > +   the following (quite strange) code is buggy:
> > > > > > +
> > > > > > +   int a[2];
> > > > > > +   int index;
> > > > > > +   int flip_index = 0;
> > > > > > +
> > > > > > +   ...
> > > > > > +
> > > > > > > +   r1 = rcu_dereference(i1);
> > > > > > > +   r2 = a[r1 != flip_index];  /* BUGGY!!! */
> > > > > > > +
> > > > > > > +   As before, the reason this is buggy is that relational operators
> > > > > > > +   are often compiled using branches.  And as before, although
> > > > > > > +   weak-memory machines such as ARM or PowerPC do order stores
> > > > > > > +   after such branches, they can speculate loads, which can again
> > > > > > > +   result in misordering bugs.
> > > > > 
> > > > > > Those two would be allowed by the wording I have recently proposed,
> > > > > > AFAICS.  r1 != flip_index would result in two possible values (unless
> > > > > > there are further constraints due to the type of r1 and the values that
> > > > > > flip_index can have).
> > > > 
> > > > And I am OK with the value_dep_preserving type providing more/better
> > > > guarantees than we get by default from current compilers.
> > > > 
> > > > One question, though.  Suppose that the code did not want a value
> > > > dependency to be tracked through a comparison operator.  What does
> > > > the developer do in that case?  (The reason I ask is that I have
> > > > not yet found a use case in the Linux kernel that expects a value
> > > > dependency to be tracked through a comparison.)
> > > 
> > > Hmm.  I suppose use an explicit cast to non-vdp before or after the
> > > comparison?
> > 
> > That should work well assuming that things like "if", "while", and "?:"
> > conditions are happy to take a vdp.
> 
> I currently don't see a reason why that should be disallowed.  If we
> have allowed an implicit conversion to non-vdp, I believe that should
> follow.

I am a bit nervous about a silent implicit conversion from vdp to
non-vdp in the general case.  However, when the result is being used by
a conditional, the silent implicit conversion makes a lot of sense.
Is that distinction something that the compiler can handle easily?

On the other hand, silent implicit conversion from non-vdp to vdp
is very useful for common code that can be invoked both by RCU
readers and by updaters.

>  ?: could be somewhat special, in that the type depends on the
> 2nd and 3rd operand.  Thus, "vdp x = non-vdp ? vdp : vdp;" should be
> allowed, whereas "vdp x = non-vdp ? non-vdp : vdp;" probably should be
> disallowed if we don't provide for implicit casts from non-vdp to vdp.

Actually, from the Linux-kernel code that I am seeing, we want to be able
to silently convert from non-vdp to vdp in order to permit common code
that is invoked from both RCU readers (vdp) and updaters (often non-vdp).
This common code must be compiled conservatively to allow vdp, but should
be just fine with non-vdp.
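
To make the common-code case concrete, here is a minimal sketch in C11.
It is illustrative only: value_dep_preserving is the proposed qualifier,
defined here as an empty macro so that current compilers accept it, and
struct foo, gp, foo_sum(), reader() and updater() are hypothetical names
rather than anything from the kernel.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed qualifier.  It expands to
 * nothing, so this only shows where the annotation would go. */
#define value_dep_preserving

struct foo {
	int a;
	int b;
};

static struct foo *_Atomic gp;	/* RCU-protected pointer (illustrative). */

/* Common code.  It must be compiled conservatively, because p carries
 * a dependency when the caller is a reader. */
int foo_sum(const struct foo value_dep_preserving *p)
{
	return p->a + p->b;
}

/* Reader: p is vdp because it comes from a consume load. */
int reader(void)
{
	struct foo value_dep_preserving *p =
		atomic_load_explicit(&gp, memory_order_consume);

	return p ? foo_sum(p) : -1;
}

/* Updater: p is an ordinary (non-vdp) pointer.  The call to foo_sum()
 * is where the silent non-vdp to vdp conversion would happen. */
int updater(struct foo *p, int a, int b)
{
	int sum;

	p->a = a;
	p->b = b;
	sum = foo_sum(p);
	atomic_store_explicit(&gp, p, memory_order_release);
	return sum;
}
```

Without the silent conversion, foo_sum() would need either a cast at
the updater's call site or a second non-vdp copy of the function.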

Going through the combinations...

 0. vdp x = vdp ? vdp : vdp; /* OK, matches. */
 1. vdp x = vdp ? vdp : non-vdp; /* Silent conversion. */
 2. vdp x = vdp ? non-vdp : vdp; /* Silent conversion. */
 3. vdp 

RE: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-05 Thread Joseph S. Myers
On Wed, 5 Mar 2014, Thomas Preud'homme wrote:

> > Some libgcc functions on ARM have ABIs that depend on which AAPCS 
> > variant is in use - that is, libcalls, not just explicitly defined or 
> > called functions, can affect the ABI compatibility.  But the RTABI 
> > functions don't - if you allow for that, then you increase the number 
> > of cases that end up compatible with both ABI variants.
> 
> Do you have some example of such libgcc functions? Is there any of them 
> with no link to the use of float in public interface? Without knowing 
> any such case from the top of my head I would say that the use of any of 
> these functions make the compilation unit not compatible with both 
> calling conventions since it requires libgcc for a specific calling 
> convention but maybe the runtime library can be treated differently than 
> other libraries.

The functions affected use floating-point in their public interfaces - for 
example, __muldc3.  Note that libcalls have a different hook 
(TARGET_LIBCALL_VALUE, ending up using arm_libcall_uses_aapcs_base) from 
the ones you mentioned.  But if you use only functions that pass 
arm_libcall_uses_aapcs_base (i.e. the floating-point operations defined in 
RTABI) or don't involve floating point, then you can be compatible with 
both calling conventions.
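
As a rough illustration (a sketch, not taken from any real project),
the first function below is the kind of code that GCC typically lowers
to a __muldc3 call, while the second needs at most an RTABI helper such
as __aeabi_dmul on a soft-float ARM target.  The function names are
made up, and the exact lowering depends on the target and on options
such as -ffast-math.

```c
#include <complex.h>

/* Full-precision complex multiply.  GCC usually emits a call to the
 * libgcc routine __muldc3 here, whose ABI depends on the AAPCS
 * variant in use. */
double complex cmul(double complex a, double complex b)
{
	return a * b;
}

/* Plain double multiply.  On a soft-float ARM target this becomes an
 * RTABI helper call, which uses the base AAPCS in both variants. */
double dmul(double a, double b)
{
	return a * b;
}
```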

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-05 Thread Paul E. McKenney
On Wed, Mar 05, 2014 at 05:54:59PM +0100, Torvald Riegel wrote:
> On Tue, 2014-03-04 at 13:35 -0800, Paul E. McKenney wrote:
> > On Tue, Mar 04, 2014 at 11:00:32AM -0800, Paul E. McKenney wrote:
> > > On Mon, Mar 03, 2014 at 09:46:19PM +0100, Torvald Riegel wrote:
> > > > On Mon, 2014-03-03 at 11:20 -0800, Paul E. McKenney wrote:
> > > > > On Mon, Mar 03, 2014 at 07:55:08PM +0100, Torvald Riegel wrote:
> > > > > > On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> > > > > > > > +o  Do not use the results from the boolean "&&" and "||" when
> > > > > > > > +   dereferencing.  For example, the following (rather improbable)
> > > > > > > > +   code is buggy:
> > > > > > > > +
> > > > > > > > +   int a[2];
> > > > > > > > +   int index;
> > > > > > > > +   int force_zero_index = 1;
> > > > > > > > +
> > > > > > > > +   ...
> > > > > > > > +
> > > > > > > > +   r1 = rcu_dereference(i1);
> > > > > > > > +   r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> > > > > > > > +
> > > > > > > > +   The reason this is buggy is that "&&" and "||" are often compiled
> > > > > > > > +   using branches.  While weak-memory machines such as ARM or PowerPC
> > > > > > > > +   do order stores after such branches, they can speculate loads,
> > > > > > > > +   which can result in misordering bugs.
> > > > > > > +
> > > > > > > > +o  Do not use the results from relational operators ("==", "!=",
> > > > > > > > +   ">", ">=", "<", or "<=") when dereferencing.  For example,
> > > > > > > > +   the following (quite strange) code is buggy:
> > > > > > > > +
> > > > > > > > +   int a[2];
> > > > > > > > +   int index;
> > > > > > > > +   int flip_index = 0;
> > > > > > > > +
> > > > > > > > +   ...
> > > > > > > > +
> > > > > > > > +   r1 = rcu_dereference(i1);
> > > > > > > > +   r2 = a[r1 != flip_index];  /* BUGGY!!! */
> > > > > > > > +
> > > > > > > > +   As before, the reason this is buggy is that relational operators
> > > > > > > > +   are often compiled using branches.  And as before, although
> > > > > > > > +   weak-memory machines such as ARM or PowerPC do order stores
> > > > > > > > +   after such branches, they can speculate loads, which can again
> > > > > > > > +   result in misordering bugs.
> > > > > > 
> > > > > > > Those two would be allowed by the wording I have recently proposed,
> > > > > > > AFAICS.  r1 != flip_index would result in two possible values (unless
> > > > > > > there are further constraints due to the type of r1 and the values that
> > > > > > > flip_index can have).
> > > > > 
> > > > > And I am OK with the value_dep_preserving type providing more/better
> > > > > guarantees than we get by default from current compilers.
> > > > > 
> > > > > One question, though.  Suppose that the code did not want a value
> > > > > dependency to be tracked through a comparison operator.  What does
> > > > > the developer do in that case?  (The reason I ask is that I have
> > > > > not yet found a use case in the Linux kernel that expects a value
> > > > > dependency to be tracked through a comparison.)
> > > > 
> > > > Hmm.  I suppose use an explicit cast to non-vdp before or after the
> > > > comparison?
> > > 
> > > That should work well assuming that things like "if", "while", and "?:"
> > > conditions are happy to take a vdp.  This assumes that p->a only returns
> > > vdp if field "a" is declared vdp, otherwise we have vdps running wild
> > > through the program.  ;-)
> > > 
> > > The other thing that can happen is that a vdp can get handed off to
> > > another synchronization mechanism, for example, to reference counting:
> > > 
> > > >   p = atomic_load_explicit(&gp, memory_order_consume);
> > > >   if (do_something_with(p->a)) {
> > > >           /* fast path protected by RCU. */
> > > >           return 0;
> > > >   }
> > > >   if (atomic_inc_not_zero(&p->refcnt)) {
> > > >           /* slow path protected by reference counting. */
> > > >           return do_something_else_with((struct foo *)p);  /* CHANGE */
> > > >   }
> > > >   /* Needed slow path, but raced with deletion. */
> > > >   return -EAGAIN;
> > > 
> > > I am guessing that the cast ends the vdp.  Is that the case?
> > 
> > And here is a more elaborate example from the Linux kernel:
> > 
> > struct md_rdev value_dep_preserving *rdev;  /* CHANGE */
> > 
> > rdev = rcu_dereference(conf->mirrors[disk].rdev);
> > if (r1_bio->bios[disk] == IO_BLOCKED
> > || rdev == NULL
> > || test_bit(Unmerged, &rdev->flags)
> > || test_bit(Faulty, &rdev->flags))
> > continue;
> > 
> > The fact that the "rdev == NULL" returns vdp does not force the "||"
> > operators to be evaluated arithmeti

Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Richard Sandiford
Matthew Fortune  writes:
> Are you OK with automatically selecting fpxx if no -mfp option, no
> .module and no .gnu_attribute exists? Such code would currently end up
> as FP ABI Any even if FP code was present, I don't suppose anything
> would get worse if this existing behaviour simply continued though.

The -mfp setting is usually implied by the -mabi setting.  I don't
think we should change that.  Since this is a new mode, and since
the fpxx markup will be available from the start, everyone using
fpxx should say so explicitly.

E.g. maybe the rules should be:

(1) Any explicit .gnu_attribute 4 is always used, although we might
give a diagnostic if it's incompatible with the module-level setting.

(2) Otherwise, if the code does not use FP then the attribute is left
at the default of 0.

(3) Otherwise, a nonzero .gnu_attribute 4 is implied from the module-level
setting.

(4) For compatibility, -mabi=32 continues to imply -mfp32.  fpxx mode must
be selected explicitly.

Which was supposed to be simple, but maybe isn't so much.
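
For what it's worth, the four rules boil down to something like the
following C sketch.  It is only a sketch of the decision order:
resolve_fp_attr() and its parameters are hypothetical names, not
assembler code, and the module-level value is passed in rather than
computed because the thread does not give the number for the new fpxx
attribute.

```c
#include <stdbool.h>

/* .gnu_attribute 4 value meaning "no FP ABI recorded" (the default). */
#define FP_ATTR_ANY 0

/*
 * Hypothetical helper, for illustration only.
 *   has_explicit  - the source has an explicit .gnu_attribute 4
 *   explicit_attr - its value (only meaningful if has_explicit)
 *   uses_fp       - the assembled code uses FP
 *   module_attr   - nonzero value implied by the module-level setting;
 *                   per rule (4), plain -mabi=32 gives fp32 here
 */
int resolve_fp_attr(bool has_explicit, int explicit_attr,
		    bool uses_fp, int module_attr)
{
	if (has_explicit)
		return explicit_attr;	/* Rule (1); may deserve a diagnostic. */
	if (!uses_fp)
		return FP_ATTR_ANY;	/* Rule (2). */
	return module_attr;		/* Rules (3) and (4). */
}
```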

Thanks,
Richard


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-05 Thread Peter Sewell
On 5 March 2014 17:15, Torvald Riegel  wrote:
> On Tue, 2014-03-04 at 22:11 +, Peter Sewell wrote:
>> On 3 March 2014 20:44, Torvald Riegel  wrote:
>> > On Sun, 2014-03-02 at 04:05 -0600, Peter Sewell wrote:
>> >> On 1 March 2014 08:03, Paul E. McKenney wrote:
>> >> > On Sat, Mar 01, 2014 at 04:06:34AM -0600, Peter Sewell wrote:
>> >> >> Hi Paul,
>> >> >>
>> >> >> On 28 February 2014 18:50, Paul E. McKenney wrote:
>> >> >> > On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
>> >> >> >> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
>> >> >> >> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney wrote:
>> >> >> >> > >
>> >> >> >> > > 3.  The comparison was against another RCU-protected pointer,
>> >> >> >> > >     where that other pointer was properly fetched using one
>> >> >> >> > >     of the RCU primitives.  Here it doesn't matter which pointer
>> >> >> >> > >     you use.  At least as long as the rcu_assign_pointer() for
>> >> >> >> > >     that other pointer happened after the last update to the
>> >> >> >> > >     pointed-to structure.
>> >> >> >> > >
>> >> >> >> > > I am a bit nervous about #3.  Any thoughts on it?
>> >> >> >> >
>> >> >> >> > I think that it might be worth pointing out as an example, and saying
>> >> >> >> > that code like
>> >> >> >> >
>> >> >> >> >    p = atomic_read(consume);
>> >> >> >> >    X;
>> >> >> >> >    q = atomic_read(consume);
>> >> >> >> >    Y;
>> >> >> >> >    if (p == q)
>> >> >> >> >         data = p->val;
>> >> >> >> >
>> >> >> >> > then the access of "p->val" is constrained to be data-dependent on
>> >> >> >> > *either* p or q, but you can't really tell which, since the compiler
>> >> >> >> > can decide that the values are interchangeable.
>> >> >> >> >
>> >> >> >> > I cannot for the life of me come up with a situation where this would
>> >> >> >> > matter, though. If "X" contains a fence, then that fence will be a
>> >> >> >> > stronger ordering than anything the consume through "p" would
>> >> >> >> > guarantee anyway. And if "X" does *not* contain a fence, then the
>> >> >> >> > atomic reads of p and q are unordered *anyway*, so then whether the
>> >> >> >> > ordering to the access through "p" is through p or q is kind of
>> >> >> >> > irrelevant. No?
>> >> >> >>
>> >> >> >> I can make a contrived litmus test for it, but you are right, the only
>> >> >> >> time you can see it happen is when X has no barriers, in which case
>> >> >> >> you don't have any ordering anyway -- both the compiler and the CPU can
>> >> >> >> reorder the loads into p and q, and the read from p->val can, as you say,
>> >> >> >> come from either pointer.
>> >> >> >>
>> >> >> >> For whatever it is worth, here is the litmus test:
>> >> >> >>
>> >> >> >> T1:   p = kmalloc(...);
>> >> >> >>       if (p == NULL)
>> >> >> >>               deal_with_it();
>> >> >> >>       p->a = 42;  /* Each field in its own cache line. */
>> >> >> >>       p->b = 43;
>> >> >> >>       p->c = 44;
>> >> >> >>       atomic_store_explicit(&gp1, p, memory_order_release);
>> >> >> >>       p->b = 143;
>> >> >> >>       p->c = 144;
>> >> >> >>       atomic_store_explicit(&gp2, p, memory_order_release);
>> >> >> >>
>> >> >> >> T2:   p = atomic_load_explicit(&gp2, memory_order_consume);
>> >> >> >>       r1 = p->b;  /* Guaranteed to get 143. */
>> >> >> >>       q = atomic_load_explicit(&gp1, memory_order_consume);
>> >> >> >>       if (p == q) {
>> >> >> >>               /* The compiler decides that q->c is same as p->c. */
>> >> >> >>               r2 = p->c; /* Could get 44 on weakly ordered system. */
>> >> >> >>       }
>> >> >> >>
>> >> >> >> The loads from gp1 and gp2 are, as you say, unordered, so you get what
>> >> >> >> you get.
>> >> >> >>
>> >> >> >> And publishing a structure via one RCU-protected pointer, updating it,
>> >> >> >> then publishing it via another pointer seems to me to be asking for
>> >> >> >> trouble anyway.  If you really want to do something like that and still
>> >> >> >> see consistency across all the fields in the structure, please put a lock
>> >> >> >> in the structure and use it to guard updates and accesses to those fields.
>> >> >> >
>> >> >> > And here is a patch documenting the restrictions for the current Linux
>> >> >> > kernel.  The rules change a bit due to rcu_dereference() acting a bit
>> >> >> > differently than atomic_load_explicit(&p, memory_order_consume).
>> >> >> >
>> >> >> > Thoughts?
>> >> >>
>> >> >> That might serve as informal documentation for linux kernel
>> >> >> programmers about the bounds on the optimisations that you expect
>> >> >> compilers to do for common-case RCU code - and I guess that's what you
>> >> >> intend it to be for.   But

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Are you OK with automatically selecting fpxx if no -mfp option, no
> > .module and no .gnu_attribute exists? Such code would currently end up
> > as FP ABI Any even if FP code was present, I don't suppose anything
> > would get worse if this existing behaviour simply continued though.
> 
> The -mfp setting is usually implied by the -mabi setting.  I don't think
> we should change that.  Since this is a new mode, and since the fpxx
> markup will be available from the start, everyone using fpxx should say
> so explicitly.
> 
> E.g. maybe the rules should be:
> 
> (1) Any explicit .gnu_attribute 4 is always used, although we might
> give a diagnostic if it's incompatible with the module-level
> setting.
> 
> (2) Otherwise, if the code does not use FP then the attribute is left
> at the default of 0.
> 
> (3) Otherwise, a nonzero .gnu_attribute 4 is implied from the
> module-level setting.
> 
> (4) For compatibility, -mabi=32 continues to imply -mfp32.  fpxx mode
> must be selected explicitly.
> 
> Which was supposed to be simple, but maybe isn't so much.

This sounds OK. I'd rather (4) permitted transition to fpxx for 'safe' FP code 
but let's see if we can do without it. Setjmp/longjmp are the only obvious 
candidates for using FP code in assembly files and these need to transition to 
fpxx.

The glibc implementation of setjmp/longjmp is in C, so the new compiler
defaults will make it fpxx (-mips32r2 will imply -mfpxx).  That is OK:
these modules will be tagged as fpxx.

Currently newlib's implementation is assembly code with no .gnu_attributes. 
Under the rules above this would start to be implicitly tagged as gnu_attribute 
4,1 (fp32). Any thoughts on how we transition this to fpxx and still have the 
modules buildable with old tools as well? I'm not sure if it will be acceptable 
to say that it has to be rewritten in C.

There will also be uclibc and bionic to deal with for setjmp/longjmp, but I 
don't have their source to hand.

Regards,
Matthew


Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Richard Sandiford
Matthew Fortune  writes:
> Richard Sandiford  writes:
>> Matthew Fortune  writes:
>> > Are you OK with automatically selecting fpxx if no -mfp option, no
>> > .module and no .gnu_attribute exists? Such code would currently end up
>> > as FP ABI Any even if FP code was present, I don't suppose anything
>> > would get worse if this existing behaviour simply continued though.
>> 
>> The -mfp setting is usually implied by the -mabi setting.  I don't think
>> we should change that.  Since this is a new mode, and since the fpxx
>> markup will be available from the start, everyone using fpxx should say
>> so explicitly.
>> 
>> E.g. maybe the rules should be:
>> 
>> (1) Any explicit .gnu_attribute 4 is always used, although we might
>> give a diagnostic if it's incompatible with the module-level
>> setting.
>> 
>> (2) Otherwise, if the code does not use FP then the attribute is left
>> at the default of 0.
>> 
>> (3) Otherwise, a nonzero .gnu_attribute 4 is implied from the
>> module-level setting.
>> 
>> (4) For compatibility, -mabi=32 continues to imply -mfp32.  fpxx mode
>> must be selected explicitly.
>> 
>> Which was supposed to be simple, but maybe isn't so much.
>
> This sounds OK. I'd rather (4) permitted transition to fpxx for 'safe'
> FP code but let's see if we can do without it. Setjmp/longjmp are the
> only obvious candidates for using FP code in assembly files and these
> need to transition to fpxx.
>
> The glibc implementation of setjmp/longjmp is in C so the new defaults
> from the compiler will lead to this being fpxx as -mips32r2 will imply
> -mfpxx so that is OK, these modules will be tagged as fpxx.

Hmm, I don't think -mips32r2 should make any difference here.
You've specified it so that fpxx will work with MIPS II and above
and I'd prefer not to have an architecture option implicitly changing
the ABI.  (They sometimes did in the long-distant past but it just
led to confusion.)

I think instead we should have a configuration switch that allows
a particular -mfp option to be inserted alongside -mabi=32 if no explicit
-mfp is given.  This is how most --with options work.  Maybe
--with-fp-32={32|64|xx}?  Specific triples could set a default value if
they like.  E.g. the MTI, SDE and mipsisa* ones would probably want to
default to --with-fp-32=xx.  Triples aimed at MIPS IV and below would
stay as they are.  (MIPS IV is sometimes used with -mabi=32.)
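
In driver terms the idea is roughly the following C sketch.  It is not
GCC driver code: default_fp_option(), is_fp_option() and the
CONFIGURED_FP_FOR_O32 macro are made-up names, the configured value is
assumed to be "xx", and for simplicity the sketch only looks for an
explicit -mabi=32 on the command line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical configure-time default, e.g. from --with-fp-32=xx. */
#define CONFIGURED_FP_FOR_O32 "-mfpxx"

/* True if arg is one of the -mfp options discussed in this thread. */
static int is_fp_option(const char *arg)
{
	return strcmp(arg, "-mfp32") == 0 ||
	       strcmp(arg, "-mfp64") == 0 ||
	       strcmp(arg, "-mfpxx") == 0;
}

/* Returns the -mfp option to insert, or NULL if none is needed. */
const char *default_fp_option(int argc, const char *const *argv)
{
	int has_abi32 = 0;

	for (int i = 0; i < argc; i++) {
		if (is_fp_option(argv[i]))
			return NULL;	/* An explicit -mfp always wins. */
		if (strcmp(argv[i], "-mabi=32") == 0)
			has_abi32 = 1;
	}
	return has_abi32 ? CONFIGURED_FP_FOR_O32 : NULL;
}
```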

--with-fp-32 isn't the greatest name but is at least consistent with
--with-arch-32 and -mabi=32.  Maybe --with-fp-32=64 is so weird
that breaking consistency is better though.

> Currently newlib's implementation is assembly code with no
> .gnu_attributes. Under the rules above this would start to be implicitly
> tagged as gnu_attribute 4,1 (fp32). Any thoughts on how we transition
> this to fpxx and still have the modules buildable with old tools as
> well? I'm not sure if it will be acceptable to say that it has to be
> rewritten in C.

If it's assembled as -mfpxx then it'll be implicitly tagged with the
new .gnu_attribute rather than 4,1.  If it's not assembled as -mfpxx
then 4,1 would be the right choice.

Thanks,
Richard


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Richard Sandiford  writes:
> >> Matthew Fortune  writes:
> >> > Are you OK with automatically selecting fpxx if no -mfp option,
> >> > no .module and no .gnu_attribute exists? Such code would currently
> >> > end up as FP ABI Any even if FP code was present, I don't suppose
> >> > anything would get worse if this existing behaviour simply
> >> > continued though.
> >>
> >> The -mfp setting is usually implied by the -mabi setting.  I don't
> >> think we should change that.  Since this is a new mode, and since the
> >> fpxx markup will be available from the start, everyone using fpxx
> >> should say so explicitly.
> >>
> >> E.g. maybe the rules should be:
> >>
> >> (1) Any explicit .gnu_attribute 4 is always used, although we might
> >> give a diagnostic if it's incompatible with the module-level
> >> setting.
> >>
> >> (2) Otherwise, if the code does not use FP then the attribute is left
> >> at the default of 0.
> >>
> >> (3) Otherwise, a nonzero .gnu_attribute 4 is implied from the
> >> module-level setting.
> >>
> >> (4) For compatibility, -mabi=32 continues to imply -mfp32.  fpxx mode
> >> must be selected explicitly.
> >>
> >> Which was supposed to be simple, but maybe isn't so much.
> >
> > This sounds OK. I'd rather (4) permitted transition to fpxx for 'safe'
> > FP code but let's see if we can do without it. Setjmp/longjmp are the
> > only obvious candidates for using FP code in assembly files and these
> > need to transition to fpxx.
> >
> > The glibc implementation of setjmp/longjmp is in C so the new defaults
> > from the compiler will lead to this being fpxx as -mips32r2 will imply
> > -mfpxx so that is OK, these modules will be tagged as fpxx.
> 
> Hmm, I don't think -mips32r2 should make any difference here.
> You've specified it so that fpxx will work with MIPS II and above and
> I'd prefer not to have an architecture option implicitly changing the
> ABI.  (They sometimes did in the long-distant past but it just led to
> confusion.)

I didn't mean to single out mips32r2 here; it applies equally to anything
except mips1 with O32.

> I think instead we should have a configuration switch that allows a
> particular -mfp option to be inserted alongside -mabi=32 if no explicit
> -mfp is given.  This is how most --with options work.  Maybe
> --with-fp-32={32|64|xx}?  Specific triples could set a default value if
> they like.  E.g. the MTI, SDE and mipsisa* ones would probably want to
> default to --with-fp-32=xx.  Triples aimed at MIPS IV and below would
> stay as they are.  (MIPS IV is sometimes used with -mabi=32.)
> 
> --with-fp-32 isn't the greatest name but is at least consistent with
> --with-arch-32 and -mabi=32.  Maybe --with-fp-32=64 is so weird that
> breaking consistency is better though.

Tying the use of fpxx by default to a configure time setting is OK with me. 
When enabled it would still have to follow the rules as defined in the design 
in that it can only apply to architectures that can support the variant. 
Currently that means everything but mips1. I'm not sure this is the same as 
tying an ABI to an architecture as both fp32 and fpxx are O32 and link 
compatible. Perhaps the configure switch would be --with-o32-fp={32|64|xx}. 
This shows it is just an O32 related setting.

> > Currently newlib's implementation is assembly code with no
> > .gnu_attributes. Under the rules above this would start to be
> > implicitly tagged as gnu_attribute 4,1 (fp32). Any thoughts on how we
> > transition this to fpxx and still have the modules buildable with old
> > tools as well? I'm not sure if it will be acceptable to say that it
> > has to be rewritten in C.
> 
> If it's assembled as -mfpxx then it'll be implicitly tagged with the new
> .gnu_attribute rather than 4,1.  If it's not assembled as -mfpxx then
> 4,1 would be the right choice.

So this would depend on the build system passing -mfpxx as appropriate when
the toolchain supports it. There is some risk in this too: if the existing
code (which I know is not fpxx safe) gets built with a new toolchain, it will
be tagged as fpxx. I wonder if this tells us that command line options cannot
safely set the FP ABI away from the default. Instead, only .module and
.gnu_attribute could set it, since only the source code can know what FP mode
it was written for. The change to your 4 points above would be that the
module-level setting is not affected by the command line -mfp options.

This would then require us to have an explicit attribute in the source to 
select fpxx which would need to be optionally included dependent on assembler 
support for .module. (The relaxation would have helped here of course.)

> Thanks,
> Richard


Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Richard Sandiford
Matthew Fortune  writes:
>> I think instead we should have a configuration switch that allows a
>> particular -mfp option to be inserted alongside -mabi=32 if no explicit
>> -mfp is given.  This is how most --with options work.  Maybe
>> --with-fp-32={32|64|xx}?  Specific triples could set a default value if
>> they like.  E.g. the MTI, SDE and mipsisa* ones would probably want to
>> default to --with-fp-32=xx.  Triples aimed at MIPS IV and below would
>> stay as they are.  (MIPS IV is sometimes used with -mabi=32.)
>> 
>> --with-fp-32 isn't the greatest name but is at least consistent with
>> --with-arch-32 and -mabi=32.  Maybe --with-fp-32=64 is so weird that
>> breaking consistency is better though.
>
> Tying the use of fpxx by default to a configure time setting is OK with
> me. When enabled it would still have to follow the rules as defined in
> the design in that it can only apply to architectures that can support
> the variant.

Right.  It's really equivalent to putting the -mfp on every command line
that doesn't have one.

> Currently that means everything but mips1.

Yeah, using -mips1 on a --with-{o}32-fp=xx toolchain would be an error.

> I'm not sure this is the same as tying an ABI to an architecture as
> both fp32 and fpxx are O32 and link compatible. Perhaps the configure
> switch would be --with-o32-fp={32|64|xx}. This shows it is just an O32
> related setting.

What I meant is that -march= and -mips shouldn't imply a different
-mfp setting.  The -mfp setting should be self-contained and it should
be an error if the architecture isn't compatible.
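
Put as code, the check I have in mind is no more than this sketch.
check_fp_against_arch() is a hypothetical name, the strings are just a
convenient encoding, and the only restriction encoded is the one stated
in this thread (fpxx needs more than MIPS I).  The architecture is only
ever used to validate the FP mode, never to choose it.

```c
#include <string.h>

/*
 * Hypothetical validation step, for illustration only.
 *   arch - architecture name, e.g. "mips1" or "mips32r2"
 *   fp   - FP mode chosen independently of arch: "32", "64" or "xx"
 * Returns 0 if the combination is acceptable, -1 if it should be
 * reported as an error.
 */
int check_fp_against_arch(const char *arch, const char *fp)
{
	/* Per the thread, fpxx works on everything but MIPS I. */
	if (strcmp(fp, "xx") == 0 && strcmp(arch, "mips1") == 0)
		return -1;
	return 0;
}
```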

We might be in violent agreement here :-)  Like I say, I was just a
bit worried by the earlier -mips32r2 thing because there was a time
when a -mips option really could imply things like -mabi, -mgp and -mfp.

--with-o32-fp would be OK with me.  I'm just worried about the ABI
being spelt differently from -mabi=, but there's probably no perfect
alternative.

>> > Currently newlib's implementation is assembly code with no
>> > .gnu_attributes. Under the rules above this would start to be
>> > implicitly tagged as gnu_attribute 4,1 (fp32). Any thoughts on how we
>> > transition this to fpxx and still have the modules buildable with old
>> > tools as well? I'm not sure if it will be acceptable to say that it
>> > has to be rewritten in C.
>> 
>> If it's assembled as -mfpxx then it'll be implicitly tagged with the new
>> .gnu_attribute rather than 4,1.  If it's not assembled as -mfpxx then
>> 4,1 would be the right choice.
>
> So this would be dependent on the build system ensuring -mfpxx is passed
> as appropriate if the toolchain supports it. There is some risk in this
> too if the existing code (which I know is not fpxx safe) gets built with
> a new toolchain then it will be tagged as fpxx. I wonder if this tells
> us that command line options cannot safely set the FP ABI away from the
> default. Instead only the .module and .gnu_attribute can set it as only
> the source code can know what FP mode it was written for. The change to
> your 4 points above would be that the module level setting is not
> impacted by the command line -mfp options.

I don't think that's necessary.  For one thing, there's always the
problem with unannotated asm code that the command-line options might
be wrong.  There's often not much we can do about that.  E.g. we have to
assume that code assembled as -mabi=n32 really is n32 code and not n64
code (and produce a 32-bit rather than 64-bit ELF).  There's no way of
hedging our bets in that case: we have to pick an ABI.

For another, leaving the attribute as the default 0 makes the object
compatible with everything, so a file assembled with the wrong -mfpyy
could still be linked with other -mfpyy files.  I don't think it gives
us anything extra.
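
A small C sketch of that point, with hypothetical names (fp_attr_compat
and the enum are not linker code).  It deliberately models only the
"0 is compatible with everything" property; whether two different
nonzero values can be linked together is left to the ABI rules and is
not modeled here.

```c
/* Result of comparing the .gnu_attribute 4 values of two objects. */
enum fp_compat {
	FP_COMPAT_YES,		/* Definitely linkable. */
	FP_COMPAT_DEPENDS	/* Needs the ABI's pairwise rules. */
};

/* Hypothetical helper: a and b are the attribute values of two inputs. */
enum fp_compat fp_attr_compat(int a, int b)
{
	if (a == 0 || b == 0)
		return FP_COMPAT_YES;	/* 0 matches everything. */
	if (a == b)
		return FP_COMPAT_YES;	/* Same FP ABI on both sides. */
	return FP_COMPAT_DEPENDS;
}
```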

The interaction between .module and command-line options should be the
same for all .module/.set options.

Thanks,
Richard


exposed pipeline

2014-03-05 Thread shmeel gutl
For the 4.7 branch I only saw one architecture using exposed pipeline. 
Is there any documentation on the quality of exposed pipeline support? 
Does the back-end need to do anything special to deal with jumps and 
returns from calls?


Thanks
Shmeel



Re: Request for discussion: Rewrite of inline assembler docs

2014-03-05 Thread dw


On 3/3/2014 3:36 AM, Richard Sandiford wrote:

dw  writes:

On 2/27/2014 11:32 PM, Richard Sandiford wrote:

dw  writes:

On 2/27/2014 4:11 AM, Richard Sandiford wrote:

Andrew Haley  writes:

Over the years there has been a great deal of traffic on these lists
caused by misunderstandings of GCC's inline assembler.  That's partly
because it's inherently tricky, but the existing documentation needs
to be improved.

dw  has done a fairly thorough reworking of
the documentation.  I've helped a bit.

Section 6.41 of the GCC manual has been rewritten.  It has become:

6.41 How to Use Inline Assembly Language in C Code
6.41.1 Basic Asm - Assembler Instructions with No Operands
6.41.2 Extended Asm - Assembler Instructions with C Expression Operands

We could simply post the patch to GCC-patches and have at it, but I
think it's better to discuss the document here first.  You can read it
at

http://www.LimeGreenSocks.com/gcc/Basic-Asm.html
http://www.LimeGreenSocks.com/gcc/Extended-Asm.html
http://www.LimeGreenSocks.com/gcc/extend04.zip (contains .texi, .patch,
and affected html pages)

All comments are very welcome.

Thanks for doing this, looks like a big improvement.

Thanks, I did my best.  I appreciate you taking the time to review them.


A couple of comments:

The section on basic asms says:

 Do not expect a sequence of asm statements to remain perfectly
 consecutive after compilation. To ensure that assembler instructions
 maintain their order, use a single asm statement containing multiple
 instructions. Note that GCC's optimizer can move asm statements
 relative to other code, including across jumps.

The "maintain their order" might be a bit misleading, since volatile asms
(including basic asms) must always be executed in the original order.
Maybe this meant placement/address order instead?

This statement is based on this text from the existing docs:

"Similarly, you can't expect a sequence of volatile |asm| instructions
to remain perfectly consecutive. If you want consecutive output, use a
single |asm|."

I do not dispute what you are saying.  I just want to confirm that the
existing docs are incorrect before making a change.  Also, see Andi's
response re -fno-toplevel-reorder.

It seems to me that recommending "single statement" is both the
clearest, and the safest approach here.  But I'm prepared to change my
mind if there is consensus I should.

Right.  I agree with that part.  I just thought that the "maintain their
order" could be misunderstood as meaning execution order, whereas I think
both sentences of the original docs were talking about being "perfectly
consecutive" (which to me means "there are no other instructions in between").

Hmm.  I'm not seeing the differences here that you do.

Well, like you say, things can be moved across branches.  So, although
this is a very artificial example:

  asm ("x");
  asm ("y");

could become:

  goto bar;

foo:
  asm ("y");
  ...

bar:
  asm ("x");
  goto foo;

This has reordered the instructions in the sense that they have a
different order in memory.  But they are still _executed_ in the same
order.  Actually reordering the execution would be a serious bug.

So I just want to avoid anything that gives the impression that "y" can
be executed before "x" in this example.  I still think:


Since the existing docs say "GCC's optimizer can move asm statements
relative to other code", how would you feel about:

"Do not expect a sequence of |asm| statements to remain perfectly
consecutive after compilation. If you want to stop the compiler from
reordering or inserting anything into a sequence of assembler
instructions, use a single |asm| statement containing multiple
instructions. Note that GCC's optimizer can move |asm| statements
relative to other code, including across jumps."

...this gives the impression that we might try to execute volatiles
in a different order.


Ahh!  Ok, I see what you mean.  Hmm.  Based on the description of
-fno-toplevel-reorder, I assumed that it actually *might* reorder them.


So, more like:

"GCC's optimizer can move asm statements relative to other
code, including across jumps.  This has implications for code
that contains a sequence of asm statements.  While the execution
order of asm statements will be preserved, do not expect the sequence of asm
statements to remain perfectly consecutive in the compiler's output.
To ensure that assembler instructions maintain their order, use a
single asm statement containing multiple instructions."
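
To make the distinction concrete, here is a rough sketch of the two
styles being discussed.  It is only an illustration: the function names
are made up, and it assumes a target whose assembler accepts a plain
"nop" mnemonic (x86 and ARM do, but this is not guaranteed everywhere).
It uses the __asm__ spelling so that it also builds with -std=c99.

```c
#include <assert.h>

/* Hypothetical helper: two separate asm statements.  Both are volatile,
   so they are executed in source order, but the compiler may place other
   instructions (or a jump) between them in its output.  */
int
two_statements (int x)
{
  __asm__ volatile ("nop");
  x += 1;
  __asm__ volatile ("nop");
  return x;
}

/* Hypothetical helper: one asm statement holding both instructions.  The
   template is emitted as a single block of text, so the compiler cannot
   insert anything between the two nops.  */
int
one_statement (int x)
{
  __asm__ volatile ("nop\n\t"
                    "nop");
  return x + 1;
}

/* Both variants compute the same value; only the layout guarantees for
   the emitted instructions differ.  */
void
check_statement_styles (void)
{
  assert (two_statements (1) == 2);
  assert (one_statement (1) == 2);
}
```

The point of the second form is only the "perfectly consecutive"
guarantee; execution order is the same in both.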



It might also be
worth mentioning that the number of instances of an asm in the output
may be different from the input.  (Can it increase as well as decrease?
I'm not sure off-hand, but probably yes.)

So, in the volatile section, how about something like this for decrease:

"GCC does not delete a volatile |asm| if it is reachable, but may delete
it if it can prove that control flow never reaches the location of the
instruction."
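
A small sketch of what that wording would cover, assuming it is adopted
as proposed.  The function names are invented for the example, and it
again assumes the target assembler accepts "nop".

```c
#include <assert.h>

/* Hypothetical example: the volatile asm is on a path that can be
   reached at run time, so under the proposed wording GCC keeps it.  */
int
reachable_asm (int flag)
{
  if (flag)
    {
      __asm__ volatile ("nop");
      return 1;
    }
  return 0;
}

/* Hypothetical example: the condition is a compile-time constant, so
   control flow provably never reaches the asm and GCC may delete it,
   volatile or not.  */
int
unreachable_asm (void)
{
  if (0)
    __asm__ volatile ("nop");
  return 0;
}

/* The observable results are the same whether or not the dead asm is
   removed from the output.  */
void
check_volatile_asm (void)
{
  assert (reachable_asm (1) == 1);
  assert (reachable_asm (0) == 0);
  assert (unreachable_asm () == 0);
}
```

Whether the dead "nop" really disappears can be checked by comparing the
output of gcc -S at different optimization levels.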

It'