date:20140322

Re: Request for discussion: Rewrite of inline assembler docs

2014-03-22 Thread dw



On 3/21/2014 2:57 AM, James Greenhalgh wrote:

On Thu, Feb 27, 2014 at 11:07:21AM +, Andrew Haley wrote:

Over the years there has been a great deal of traffic on these lists
caused by misunderstandings of GCC's inline assembler.  That's partly
because it's inherently tricky, but the existing documentation needs
to be improved.

dw has done a fairly thorough reworking of
the documentation.  I've helped a bit.

Section 6.41 of the GCC manual has been rewritten.  It has become:

6.41 How to Use Inline Assembly Language in C Code
6.41.1 Basic Asm - Assembler Instructions with No Operands
6.41.2 Extended Asm - Assembler Instructions with C Expression Operands

We could simply post the patch to GCC-patches and have at it, but I
think it's better to discuss the document here first.  You can read it
at

This documentation looks like a huge improvement.


Thanks, I've worked hard to make it so.


As the discussion here seems to have stalled, perhaps it is time to propose
the patch to gcc-patches?


Sorry, I wanted to make sure the people who were discussing it had a 
chance to finish responding, then I got caught up in other projects.  
I'll try to roll the comments into the docs this weekend.



I'm certainly keen to see this make it to trunk, the increase in clarity
is substantial.


Exactly so.  Trying to figure this all out from the existing docs drove 
me crazy.  That's what motivated me to fix this.  My goal is to make it 
so the next guy who has to struggle with this has an easier time than I did.



Thanks,
James

Re: [RFC, MIPS] Relax NaN rules

2014-03-22 Thread Richard Sandiford

Matthew Fortune  writes:
> Thanks Joseph. I guess I'm not really pushing to have don't-care
> supported as it would take a lot of effort to determine when code does
> and does not care, you rightly point out more cases to deal with
> too. I'm not sure if the benefit would then be worth it or not as there
> would still be modules which do and do not care about old and new NaNs
> so it doesn't really relieve any pressure on toolchains or linux
> distributions. The second part of the proposal is more
> interesting/useful as it is saying I don't care about the impact of
> getting NaN encoding wrong and a tools vendor/linux distribution then
> gets to make that choice. Any comments on that aspect?

Maybe it's just me, but I don't understand your use case for (2).
If 99% of users don't care about the different NaN encodings then
why would they use a different -mnan setting from the default?
Are you worried about potential future processors that only support
2008 NaNs?  If so, then (a) you give good reasons why that seems like a
misfeature and (b) I think we should instead tackle it by continuing
to allow both -mnan settings for all processors.  I.e. we simply wouldn't
add any logic to check "is this processor/architecture compatible with this
encoding?".

Thanks,
Richard

Re: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-22 Thread Richard Sandiford

"Thomas Preud'homme"  writes:
>> From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
>> 
>> "Thomas Preud'homme"  writes:
>> 
>> -mno-float causes gcc to define the macro __mips_no_float, which the
>> implementation can use when deciding whether to bother handling %f, etc.
>> I'm afraid there's nothing more sophisticated to it than that.
>> 
>
> So the libc needs to be compiled with -mno-float as well so that printf and
> scanf drop the float handling? 

Yes.  The configurations that support -mno-float have separate -mno-float
multilibs.  In a sense, the point of -mno-float instead of -msoft-float is
to select these cut-down libraries.
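
As a rough illustration of what that looks like on the library side, here is a minimal sketch.  Only the macro name __mips_no_float comes from the discussion above; the helper format_value and its interface are made up for the example and are not how any real libc structures printf.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not taken from any real libc: formats one value
   according to a single conversion character.  Returns the number of
   characters written, or -1 if the conversion is not supported.  */
static int
format_value (char *out, size_t n, char conv, const void *value)
{
  switch (conv)
    {
    case 'd':
      return snprintf (out, n, "%d", *(const int *) value);
#ifndef __mips_no_float
    /* Compiled out when __mips_no_float is defined (i.e. -mno-float),
       so the cut-down library carries no float formatting code.  */
    case 'f':
      return snprintf (out, n, "%f", *(const double *) value);
#endif
    default:
      return -1;
    }
}
```

In a -mno-float build the 'f' case simply does not exist, so a "%f" conversion falls through to the unsupported path.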

>> A more general restriction would be "must pass arguments in the same
>> way for both option set A and option set B".  That too could be done
>> using existing hooks and SWITCHABLE_TARGET, although it might not be
>> hyper-efficient.
>
> I don't know how exactly that would work. You would switch twice for
> each function and ask the register used for each function call?

Yeah, that's the idea.  More generally, run initialize_argument_information
once for each ABI and compare the two arg_data arrays.  Similarly for
function.c.

Thanks,
Richard

Re: Request for discussion: Rewrite of inline assembler docs

2014-03-22 Thread Richard Sandiford

Sorry for the slow response.

dw  writes:
> On 3/3/2014 3:36 AM, Richard Sandiford wrote:
>> Well, like you say, things can be moved across branches.  So, although
>> this is a very artificial example:
>>
>>   asm ("x");
>>   asm ("y");
>>
>> could become:
>>
>>   goto bar;
>>
>> foo:
>>   asm ("y");
>>   ...
>>
>> bar:
>>   asm ("x");
>>   goto foo;
>>
>> This has reordered the instructions in the sense that they have a
>> different order in memory.  But they are still _executed_ in the same
>> order.  Actually reordering the execution would be a serious bug.
>>
>> So I just want to avoid anything that gives the impression that "y" can
>> be executed before "x" in this example.  I still think:
>>
>>> Since the existing docs say "GCC's optimizer can move asm statements
>>> relative to other code", how would you feel about:
>>>
>>> "Do not expect a sequence of |asm| statements to remain perfectly
>>> consecutive after compilation. If you want to stop the compiler from
>>> reordering or inserting anything into a sequence of assembler
>>> instructions, use a single |asm| statement containing multiple
>>> instructions. Note that GCC's optimizer can move |asm| statements
>>> relative to other code, including across jumps."
>> ...this gives the impression that we might try to execute volatiles
>> in a different order.
>
> Ahh!  Ok, I see what you mean.  Hmm.  Based on the description of 
> "no-toplevel-reorder", I assumed that it actually *might* re-order them.

Well, -fno-toplevel-reorder only applies to asms outside functions,
where there's no execution order as such.  Top-level asms are treated
more like function definitions.

> "GCC's optimizer can move asm statements relative to other
> code, including across jumps.  This has implications for code
> that contains a sequence of asm statements.  While the execution
> order of asm statements will be preserved, do not expect the sequence of asm
> statements to remain perfectly consecutive in the compiler's output.

This part sounds good to me FWIW.

> To ensure that assembler instructions maintain their order, use a
> single asm statement containing multiple instructions."

I'm still unsure about "maintain their order" here though.  How about
something like:

  If certain instructions need to remain consecutive in the output,
  put them in a single multi-instruction asm statement.
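
A small sketch of that advice, assuming x86-64 and GNU-style extended asm.  The function name inc_then_double is made up for illustration, and the fallback branch exists only so the example compiles on other targets.

```c
#include <stdint.h>

/* Adds 1 to x and then doubles it.  Both instructions live in ONE asm
   statement, so the compiler cannot schedule anything between them.  */
static inline uint64_t
inc_then_double (uint64_t x)
{
#if defined (__GNUC__) && defined (__x86_64__)
  __asm__ ("add $1, %0\n\t"   /* x = x + 1 */
           "add %0, %0"       /* x = x + x */
           : "+r" (x)         /* x is read and written */
           :
           : "cc");           /* add modifies the flags */
#else
  /* Portable fallback with the same result.  */
  x = (x + 1) * 2;
#endif
  return x;
}
```

Had the two instructions been written as two separate asm statements, they would still execute in order, but other code could end up between them in the output.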

>> In the extended section:
>>
>>  Unless an output operand has the '&' constraint modifier (see
>>  Modifiers), GCC may allocate it in the same register as an unrelated
>>  input operand, [...]
>>
>> It could also use it for addresses in other (memory) outputs.
> Ok.  But I'm not sure this really adds anything.  Having warned people
> that the register may be re-used unless '&' is used seems sufficient.
 It matters where it can be reused though.  If you talk about input
 operands only, people might think it is OK to write asms of the form:

  foo tmp,[input0]
  bar [output0],tmp
  frob [output1],tmp
 where output0 is a register and output1 is a memory.  This safely avoids
 using the input operand after assigning to output0, but the address in
 output1 is still live and could be changed by bar.
>>> I'm not sure we're talking about the same problem.  I'm borrowing this
>>> x86 example from someone else:
>>>
>>> static inline char *
>>> lcopy( char *dst, const char *src, long len )
>>> {
>>>  asm(
>>> "shr $3,%2; " /* how many qwords to copy */
>>> "rep movsq; " /* copy that many */
>>> "mov %3,%2; " /* how many bytes to copy */
>>> "rep movsb" /* copy that many */
>>>  : "+D" (dst),  "+S" (src),  "+c" (len)
>>>  :  "r" (len & 7)
>>>  :  "memory");
>>> return dst;
>>> }
>>>
>>> You might expect that  "len" and "len & 7" are two different things.
>>> However if the function is called with a constant less than 8, the
>>> compiler knows that they are actually the same and uses rcx for both,
>>> giving mov rcx,rcx for mov %3,%2 and of course by then rcx is zero.
>>> Using & on len forces the use of two separate registers.
>>>
>>> This seems to me to be a different kind of problem than:
>>>
>>> asm ("xxx": "=r" (x), "=m" (x));
>>>
>>> Or am I missing your point?
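
For reference, the fix described for the lcopy example could be sketched as below, assuming x86-64 and GNU-style extended asm.  The name lcopy_fixed, the saved return pointer, the volatile qualifier and the non-x86 fallback are additions for the sake of a self-contained example; only the '&' on len is the change discussed above.

```c
#include <string.h>

/* Illustrative variant of the lcopy example; copies len bytes and
   returns the original dst.  */
static inline char *
lcopy_fixed (char *dst, const char *src, long len)
{
  char *ret = dst;
#if defined (__GNUC__) && defined (__x86_64__)
  /* volatile: the outputs are not used afterwards, so without it the
     compiler would be free to drop the statement.  */
  __asm__ volatile (
    "shr $3,%2; "   /* how many qwords to copy */
    "rep movsq; "   /* copy that many */
    "mov %3,%2; "   /* how many bytes remain */
    "rep movsb"     /* copy that many */
    : "+D" (dst), "+S" (src), "+&c" (len)  /* '&': rcx is written early */
    : "r" (len & 7)
    : "memory", "cc");
#else
  /* Portable fallback so the sketch builds on other targets.  */
  memcpy (dst, src, (size_t) len);
#endif
  return ret;
}
```

With the early-clobber modifier, the register holding len & 7 can no longer be rcx, even when the compiler knows that len and len & 7 are equal.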
>> Well, that code is just one instance of (and a good example of)
>> the principle that GCC assumes all inputs are consumed before any
>> outputs are written.  And the point is that the "inputs" in that
>> description aren't restricted to input operands: they apply to any
>> rvalues in the output operands too.
>>
>> E.g. the same thing could occur for an artificial case like:
>>
>>  asm ("" : "+r" (ptr), "=m" (*x));
>>
>> if GCC realises that x==ptr.  Then the address in operand 1 might
>> be the same as operand 0.  The same goes for:
>>
>>  asm ("" : "=r" (ptr), "=m" (*x) : "0" (ptr));
>>
>> which is really just another way of wri

Re: [RL78] Questions about code-generation

2014-03-22 Thread Richard Hulme


On 22/03/14 01:47, Jeff Law wrote:

On 03/21/14 18:35, DJ Delorie wrote:


I've found that "removing unneeded moves through registers" is
something gcc does poorly in the post-reload optimizers.  I've written
my own on some occasions (for rl78 too).  Perhaps this is a good
starting point to look at?


much needless copying, which strengthens my suspicion that it's
something in the RL78 backend that needs 'tweaking'.


Of course it is, I've said that before I think.  The RL78 uses a
virtual model until reload, then converts each virtual instruction
into multiple real instructions, then optimizes the result.  This is
going to be worse than if the real model had been used throughout
(like arm or x86), but in this case, the real model *can't* be used
throughout, because gcc can't understand it well enough to get through
regalloc and reload.  The RL78 is just too "weird" to be modelled
as-is.

I keep hoping that gcc's own post-reload optimizers would do a better
job, though.  Combine should be able to combine, for example, the "mov
r8,ax; cmp r8,#4" types of insns together.

I can't recall the details, but when you described the situation to me,
the virtual register file was the only way I could see to make the
RL78 work in the IRA+reload world.

What would be quite interesting to try would be to continue to use the
virtualized register set, but instead use the IRA+LRA path.  Presumably
that wouldn't be terribly hard to try and there's a reasonable chance
that'll improve the code in a noticeable way.


Looking at how that's done by other backends, as far as I can tell, I 
just need to add something like:


#undef  TARGET_LRA_P
#define TARGET_LRA_P rl78_enable_lra

static bool
rl78_enable_lra (void)
{
  return true;
}

to rl78.c?  At least in theory, even if other work is needed elsewhere 
to make things run smoothly.


Unfortunately, that function never seems to be called.

How does TARGET_LRA_P get used, anyway?  I can't find anything that 
tries to use it, only places where it gets set.  Is there some funky 
preprocessor stuff going on that's stopping me grepping for it?
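
As far as I understand the target hook machinery, the macro is never used by name in generic code: it is only consumed when TARGET_INITIALIZER is expanded to fill in the targetm structure, and generic code then calls the hook through that structure.  The following is a heavily simplified stand-alone model of that pattern, not GCC source; struct target_hooks and use_lra_p are stand-ins invented for the sketch.

```c
#include <stdbool.h>

/* Stand-in for GCC's hook table; the real one has many more members.  */
struct target_hooks
{
  bool (*lra_p) (void);
};

/* Stand-in for the default hook and the default macro definitions.  */
bool
default_lra_p (void)
{
  return false;
}
#define TARGET_LRA_P default_lra_p
#define TARGET_INITIALIZER { TARGET_LRA_P }

/* What a backend file does: override the macro ...  */
static bool
rl78_enable_lra (void)
{
  return true;
}
#undef TARGET_LRA_P
#define TARGET_LRA_P rl78_enable_lra

/* ... and only afterwards expand TARGET_INITIALIZER.  This is the one
   place where TARGET_LRA_P is actually consumed.  */
struct target_hooks targetm = TARGET_INITIALIZER;

/* Generic code never names the macro; it calls through the table.  */
bool
use_lra_p (void)
{
  return targetm.lra_p ();
}
```

In this model, grepping for TARGET_LRA_P finds only definitions, while the use shows up as a call to targetm.lra_p.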



The next obvious thing to try, and it's probably a lot more work, would
be to see if IRA+LRA is smart enough (or can be made so with a
reasonable amount of work) to eliminate the virtual register file
completely.

Just to be clear, I'm not planning to work on this; my participation and
interest in the RL78 was limited to providing a few tips to DJ.


And from my side, I'm not trying to get anyone to work on it (though 
obviously I'm not averse to it).  I'm just looking for hints and tips so 
that I can try to understand the causes (and hopefully find some solutions).


Regards,

Richard.

Re: [RL78] Questions about code-generation

2014-03-22 Thread Richard Hulme


On 22/03/14 01:35, DJ Delorie wrote:

Is it possible that the virtual pass causes inefficiencies in some
cases by sticking with r8-r31 when one of the 'normal' registers
would be better?


That's not a fair question to ask, since the virtual pass can *only*
use r8-r31.  The first bank has to be left alone else the
devirtualizer becomes a few orders of magnitude harder, if not
impossible, to make work correctly.


What I meant was that because the virtual pass can only use r8-r31, it's 
causing unnecessary register moves to be generated because it chooses, 
say, r8 as the register for a byte compare.  Because r8 is a *valid* 
register to use with a byte compare, it sticks with it come what may and 
then causes additional instructions to be generated to make sure that 
the result to be compared definitely ends up in r8, even if the register 
the result was in is also valid for a byte compare operation.



much needless copying, which strengthens my suspicion that it's
something in the RL78 backend that needs 'tweaking'.


Of course it is, I've said that before I think.  The RL78 uses a
virtual model until reload, then converts each virtual instruction
into multiple real instructions, then optimizes the result.  This is


It may be obvious to you and everyone else on this list that it's the 
backend that needs tweaking but I've only been looking at the compiler 
internals for a couple of weeks, so unfortunately it's not obvious to me.


I'm not complaining or pointing fingers or anything like that.  I'm just 
trying to understand the reasons why things are the way they are - what 
things are happening in the backend, what's happening in the 'generic' 
part and the interactions between them.


I understand that it's easy to say 'This is what the compiler's 
generating.  That's stupid.  It should be generating this', which is why 
I'm trying to understand the reasons that cause the compiler to generate 
what it's generating.



going to be worse than if the real model had been used throughout
(like arm or x86), but in this case, the real model *can't* be used
throughout, because gcc can't understand it well enough to get through
regalloc and reload.  The RL78 is just too "weird" to be modelled
as-is.


Can you explain what is too weird about it in particular?  It certainly 
has restrictions on which registers can be used with various 
instructions, but I wouldn't have thought they were so complicated that 
they couldn't be described using the normal constraints?


Regards,

Richard.

Re: GCC 4.9.0 Status Report (2014-03-13)

2014-03-22 Thread Klaus Rudolph

I want to ask how I can find the bugs in bugzilla which are listed in
the "Quality Data" table. It seems that there are more bugs which are
not listed. For example:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57694
The compiler currently returns "not implemented" while compiling the
given example code, but it seems that this bug is not tracked in the
current quality data.

My questions are:
What are the selection criteria for deciding that a bug is related to the
release?
What information might be missing from a bug report that keeps it off
that list?
Can I help by giving some more example code or doing any other "simple"
things to support the process a bit?

Regards
 Klaus




> Status
> ======
> 
> The trunk is still in Stage 4, which means only patches fixing regressions
> and documentation issues are appropriate.
> 
> Compared to last year's status reports, we are somewhere between a
> fortnight and a month behind last year's schedule, but if enough attention
> is given to the remaining P1 blockers, we could still release around the
> beginning of April.
> 
> The list of secondary architectures has changed recently, so to remind people
> I'm including it here:
> 
> The primary platforms are:
> 
> arm-linux-gnueabi
> i386-unknown-freebsd
> i686-pc-linux-gnu
> mipsisa64-elf
> powerpc64-unknown-linux-gnu
> sparc-sun-solaris2.10
> x86_64-unknown-linux-gnu
> 
> The secondary platforms are:
> 
> aarch64-elf
> powerpc-ibm-aix7.1.0.0
> i686-apple-darwin
> i686-pc-cygwin
> i686-mingw32
> s390x-linux-gnu
> 
> 
> Quality Data
> ============
> 
> Priority          #   Change from last report
> --------        ---   -----------------------
> P1                9   -  23
> P2               75   -  12
> P3               15   -   6
> --------        ---   -----------------------
> Total            99   -  41
> 
> 
> 
> Previous Report
> ===============
> 
> http://gcc.gnu.org/ml/gcc/2014-02/msg00013.html
> 
> 
> The next report will be sent by me again, hopefully announcing
> the first GCC 4.9.0 release candidate soon.
>

Re: [RFC, MIPS] Relax NaN rules

2014-03-22 Thread Maciej W. Rozycki

On Sat, 22 Mar 2014, Richard Sandiford wrote:

> > Thanks Joseph. I guess I'm not really pushing to have don't-care
> > supported as it would take a lot of effort to determine when code does
> > and does not care, you rightly point out more cases to deal with
> > too. I'm not sure if the benefit would then be worth it or not as there
> > would still be modules which do and do not care about old and new NaNs
> > so it doesn't really relieve any pressure on toolchains or linux
> > distributions. The second part of the proposal is more
> > interesting/useful as it is saying I don't care about the impact of
> > getting NaN encoding wrong and a tools vendor/linux distribution then
> > gets to make that choice. Any comments on that aspect?
> 
> Maybe it's just me, but I don't understand your use case for (2).
> If 99% of users don't care about the different NaN encodings then
> why would they use a different -mnan setting from the default?
> Are you worried about potential future processors that only support
> 2008 NaNs?  If so, then (a) you give good reasons why that seems like a
> misfeature and (b) I think we should instead tackle it by continuing
> to allow both -mnan settings for all processors.  I.e. we simply wouldn't
> add any logic to check "is this processor/architecture compatible with this
> encoding?".

 Such processors already exist I believe.  Matthew will fill you in on the 
details, but IIRC they are strapped at boot to either NaN mode that cannot 
be switched at the run time, i.e. via CP1.FCSR (the NAN2008 bit is fixed 
at 0 or 1 depending on the boot mode selected).  Of course that means they 
can still run legacy NaN code if strapped for the legacy NaN mode, but 
it's up to the SOC/board hardware designer to decide which mode to choose 
and we have no control over that.

 I feel uneasy about silently producing wrong results, even if it's only 
such a border case as for many the NaN is.  Perhaps switching the kernel 
into a full FP emulation mode for programs that have an unsupported NaN 
requirement would be an option?  That should be doable with a reasonable 
effort and wouldn't affect programs that do not use the FPU at all, so no 
need to rush rebuilding everything, just the FP-intensive software 
packages.
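
 For anyone wondering why code built for one mode gives wrong answers in the other: the two encodings assign opposite meanings to the most significant bit of the significand field.  In the 2008 encoding a set bit marks a quiet NaN, while in the legacy MIPS encoding a set bit marks a signaling NaN.  A minimal sketch for single precision follows; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXP_MASK   0x7F800000u  /* exponent field, all ones for NaN/Inf */
#define FRAC_MASK  0x007FFFFFu  /* significand (fraction) field */
#define QUIET_BIT  0x00400000u  /* top bit of the fraction field */

/* True if the bit pattern is any kind of NaN.  */
static bool
is_nan_bits (uint32_t b)
{
  return (b & EXP_MASK) == EXP_MASK && (b & FRAC_MASK) != 0;
}

/* 2008 encoding: top fraction bit set means quiet.  */
static bool
is_quiet_nan_2008 (uint32_t b)
{
  return is_nan_bits (b) && (b & QUIET_BIT) != 0;
}

/* Legacy MIPS encoding: top fraction bit clear means quiet.  */
static bool
is_quiet_nan_legacy (uint32_t b)
{
  return is_nan_bits (b) && (b & QUIET_BIT) == 0;
}
```

 So the pattern 0x7FC00000 is a quiet NaN to 2008-mode hardware but a signaling NaN to legacy-mode hardware, and 0x7FBFFFFF is the other way round.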

  Maciej

gcc-4.7-20140322 is now available

2014-03-22 Thread gccadmin

Snapshot gcc-4.7-20140322 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.7-20140322/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_7-branch 
revision 208766

You'll find:

 gcc-4.7-20140322.tar.bz2 Complete GCC

  MD5=8db365aabc9d48daefae3dcda8bda51f
  SHA1=bc9a15dd262331e006547bb1861aca64ed7f205a

Diffs from 4.7-20140315 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
