Re: Request for discussion: Rewrite of inline assembler docs
On 3/21/2014 2:57 AM, James Greenhalgh wrote:
> On Thu, Feb 27, 2014 at 11:07:21AM +, Andrew Haley wrote:
>> Over the years there has been a great deal of traffic on these lists
>> caused by misunderstandings of GCC's inline assembler. That's partly
>> because it's inherently tricky, but the existing documentation needs
>> to be improved.
>>
>> dw has done a fairly thorough reworking of the documentation. I've
>> helped a bit.
>>
>> Section 6.41 of the GCC manual has been rewritten. It has become:
>>
>> 6.41 How to Use Inline Assembly Language in C Code
>> 6.41.1 Basic Asm - Assembler Instructions with No Operands
>> 6.41.2 Extended Asm - Assembler Instructions with C Expression Operands
>>
>> We could simply post the patch to GCC-patches and have at it, but I
>> think it's better to discuss the document here first. You can read
>> it at
>
> This documentation looks like a huge improvement.

Thanks, I've worked hard to make it so.

> As the discussion here seems to have stalled, perhaps it is time to
> propose the patch to gcc-patches?

Sorry, I wanted to make sure the people who were discussing it had a chance to finish responding, then I got caught up in other projects. I'll try to roll the comments into the docs this weekend.

> I'm certainly keen to see this make it to trunk, the increase in clarity
> is substantial.

Exactly so. Trying to figure this all out from the existing docs drove me crazy. That's what motivated me to fix this. My goal is to make it so the next guy who has to struggle with this has an easier time than I did.

> Thanks,
> James
Re: [RFC, MIPS] Relax NaN rules
Matthew Fortune writes: > Thanks Joseph. I guess I'm not really pushing to have don't-care > supported as it would take a lot of effort to determine when code does > and does not care, you rightly point out more cases to deal with > too. I'm not sure if the benefit would then be worth it or not as there > would still be modules which do and do not care about old and new NaNs > so it doesn't really relieve any pressure on toolchains or linux > distributions. The second part of the proposal is more > interesting/useful as it is saying I don't care about the impact of > getting NaN encoding wrong and a tools vendor/linux distribution then > gets to make that choice. Any comments on that aspect? Maybe it's just me, but I don't understand your use case for (2). If 99% of users don't care about the different NaN encodings then why would they use a different -mnan setting from the default? Are you worried about potential future processors that only support 2008 NaNs? If so, then (a) you give good reasons why that seems like a misfeature and (b) I think we should instead tackle it by continuing to allow both -mnan settings for all processors. I.e. we simply wouldn't add any logic to check "is this processor/architecture compatible with this encoding?". Thanks, Richard
Re: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat
"Thomas Preud'homme" writes: >> From: Richard Sandiford [mailto:rdsandif...@googlemail.com] >> >> "Thomas Preud'homme" writes: >> >> -mno-float causes gcc to define the macro __mips_no_float, which the >> implementation can use when deciding whether to bother handling %f, etc. >> I'm afraid there's nothing more sophisticated to it than that. >> > > So the libc needs to be compiled with -mno-float as well so that printf and > scanf drop the float handling? Yes. The configurations that support -mno-float have separate -mno-float multilibs. In a sense, the point of -mno-float instead of -msoft-float is to select these cut-down libraries. >> A more general restriction would be "must pass arguments in the same >> way for both option set A and option set B". That too could be done >> using existing hooks and SWITCHABLE_TARGET, although it might not be >> hyper-efficient. > > I don't know how exactly that would work. You would switch twice for > each function and ask the register used for each function call? Yeah, that's the idea. More generally, run initialize_argument_information once for each ABI and compare the two arg_data arrays. Similarly for function.c. Thanks, Richard
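To make the `__mips_no_float` point concrete, here is a minimal sketch of how a library routine might compile out its floating-point path when that macro is defined. It is a toy and not taken from any real libc: `mini_format`, `advance` and `MINI_HAVE_FLOAT` are hypothetical names, and only `__mips_no_float` comes from the discussion above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* __mips_no_float is the macro named in the thread; every other name
   in this sketch is made up for illustration.  */
#ifdef __mips_no_float
# define MINI_HAVE_FLOAT 0
#else
# define MINI_HAVE_FLOAT 1
#endif

/* Advance USED by an snprintf result N, clamping to the buffer SIZE.  */
static size_t
advance (size_t used, size_t size, int n)
{
  if (n < 0)
    return used;
  if ((size_t) n >= size - used)
    return size - 1;
  return used + (size_t) n;
}

/* Toy formatter that understands only %d, %f and %%.  Returns the
   number of characters stored (excluding the NUL), truncating to fit.  */
static int
mini_format (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  size_t used = 0;

  if (size == 0)
    return 0;
  va_start (ap, fmt);
  for (; *fmt != '\0' && used + 1 < size; fmt++)
    {
      if (*fmt != '%')
        {
          buf[used++] = *fmt;
          continue;
        }
      fmt++;
      if (*fmt == '\0')
        break;
      switch (*fmt)
        {
        case 'd':
          used = advance (used, size,
                          snprintf (buf + used, size - used, "%d",
                                    va_arg (ap, int)));
          break;
        case '%':
          buf[used++] = '%';
          break;
        case 'f':
#if MINI_HAVE_FLOAT
          used = advance (used, size,
                          snprintf (buf + used, size - used, "%f",
                                    va_arg (ap, double)));
          break;
#endif
          /* fall through: without float support, "%f" is copied as-is */
        default:
          buf[used++] = '%';
          if (used + 1 < size)
            buf[used++] = *fmt;
          break;
        }
    }
  va_end (ap);
  buf[used] = '\0';
  return (int) used;
}
```

In a build where the macro is defined, the `%f` branch and its `double` handling disappear entirely, which is the kind of cut-down behaviour the separate -mno-float multilibs are described as providing.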
Re: Request for discussion: Rewrite of inline assembler docs
Sorry for the slow response. dw writes: > On 3/3/2014 3:36 AM, Richard Sandiford wrote: >> Well, like you say, things can be moved across branches. So, although >> this is a very artificial example: >> >> asm ("x"); >> asm ("y"); >> >> could become: >> >> goto bar; >> >> foo: >> asm ("y"); >> ... >> >> bar: >> asm ("x"); >> goto foo; >> >> This has reordered the instructions in the sense that they have a >> different order in memory. But they are still _executed_ in the same >> order. Actually reordering the execution would be a serious bug. >> >> So I just want to avoid anything that gives the impression that "y" can >> be executed before "x" in this example. I still think: >> >>> Since the existing docs say "GCC's optimizer can move asm statements >>> relative to other code", how would you feel about: >>> >>> "Do not expect a sequence of |asm| statements to remain perfectly >>> consecutive after compilation. If you want to stop the compiler from >>> reordering or inserting anything into a sequence of assembler >>> instructions, use a single |asm| statement containing multiple >>> instructions. Note that GCC's optimizer can move |asm| statements >>> relative to other code, including across jumps." >> ...this gives the impression that we might try to execute volatiles >> in a different order. > > Ahh! Ok, I see what you mean. Hmm. Based on the description of > "no-toplevel-reorder", I assumed that it actually *might* re-order them. Well, -fno-toplevel-reorder only applies to asms outside functions, where there's no execution order as such. Top-level asms are treated more like function definitions. > "GCC's optimizer can move asm statements relative to other > code, including across jumps. This has implications for code > that contains a sequence of asm statements. While the execution > order of asm statements will be preserved, do not expect the sequence of asm > statements to remain perfectly consecutive in the compiler's output. This part sounds good to me FWIW. 
> To ensure that assembler instructions maintain their order, use a > single asm statement containing multiple instructions." I'm still unsure about "maintain their order" here though. How about something like: If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement. >> In the extended section: >> >> Unless an output operand has the '&' constraint modifier (see >> Modifiers), GCC may allocate it in the same register as an unrelated >> input operand, [...] >> >> It could also use it for addresses in other (memory) outputs. > Ok. But I'm not sure this really adds anything. Having warned people > that the register may be re-used unless '&' is used seems sufficient. It matters where it can be reused though. If you talk about input operands only, people might think it is OK to write asms of the form: foo tmp,[input0] bar [output0],tmp frob [output1],tmp where output0 is a register and output1 is a memory. This safely avoids using the input operand after assigning to output0, but the address in output1 is still live and could be changed by bar. >>> I'm not sure we're talking about the same problem. I'm borrowing this >>> x86 example from someone else: >>> >>> static inline char * >>> lcopy( char *dst, const char *src, long len ) >>> { >>> asm( >>> "shr $3,%2; " /* how many qwords to copy */ >>> "rep movsq; " /* copy that many */ >>> "mov %3,%2; " /* how many bytes to copy */ >>> "rep movsb" /* copy that many */ >>> : "+D" (dst), "+S" (src), "+c" (len) >>> : "r" (len & 7) >>> : "memory"); >>> return dst; >>> } >>> >>> You might expect that "len" and "len & 7" are two different things. >>> However if the function is called with a constant less than 8, the >>> compiler knows that they are actually the same and uses rcx for both, >>> giving mov rcx,rcx for mov %3,%2 and of course by then rcx is zero. >>> Using & on len forces the use of two separate registers. 
>>> >>> This seems to me to be a different kind of problem than: >>> >>> asm ("xxx": "=r" (x), "=m" (x)); >>> >>> Or am I missing your point? >> Well, that code is just one instance of (and a good example of) >> the principle that GCC assumes all inputs are consumed before any >> outputs are written. And the point is that the "inputs" in that >> description aren't restricted to input operands: they apply to any >> rvalues in the output operands too. >> >> E.g. the same thing could occur for an artificial case like: >> >> asm ("" : "+r" (ptr), "=m" (*x)); >> >> if GCC realises that x==ptr. Then the address in operand 1 might >> be the same as operand 0. The same goes for: >> >> asm ("" : "=r" (ptr), "=m" (*x) : "0" (ptr)); >> >> which is really just another way of wri
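The early-clobber point in the `lcopy` discussion can be sketched as follows. This is only an illustration under stated assumptions: the asm path applies to x86-64 with a GCC-compatible compiler, the `memcpy` fallback, the `tail` variable and the explicit "cc" clobber are additions for this sketch, and the only intended change to the quoted example is writing `"+&c"` instead of `"+c"`.

```c
#include <stddef.h>
#include <string.h>

/* Copy LEN bytes from SRC to DST and return DST advanced by LEN.  */
static inline char *
lcopy (char *dst, const char *src, long len)
{
#if defined(__GNUC__) && defined(__x86_64__)
  long tail = len & 7;          /* bytes left over after the qword copy */

  __asm__ volatile (
      "shr $3,%2\n\t"           /* how many qwords to copy */
      "rep movsq\n\t"           /* copy that many */
      "mov %3,%2\n\t"           /* how many bytes to copy */
      "rep movsb"               /* copy that many */
      /* '&' marks operand 2 as early-clobber: it is written before
         operand 3 has been read, so the two must not share a register
         even when the compiler can prove their values are equal.  */
      : "+D" (dst), "+S" (src), "+&c" (len)
      : "r" (tail)
      : "memory", "cc");
  return dst;
#else
  /* Portable stand-in so the sketch still builds on other targets.  */
  memcpy (dst, src, (size_t) len);
  return dst + len;
#endif
}
```

Calling it with a small constant length, for example `lcopy (buf, "abcde", 5)`, is the case described above in which `len` and `len & 7` have the same value; with the early-clobber modifier the byte count should still be read from a register other than rcx.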
Re: [RL78] Questions about code-generation
On 22/03/14 01:47, Jeff Law wrote: On 03/21/14 18:35, DJ Delorie wrote: I've found that "removing unneeded moves through registers" is something gcc does poorly in the post-reload optimizers. I've written my own on some occasions (for rl78 too). Perhaps this is a good starting point to look at? much needless copying, which strengthens my suspicion that it's something in the RL78 backend that needs 'tweaking'. Of course it is, I've said that before I think. The RL78 uses a virtual model until reload, then converts each virtual instruction into multiple real instructions, then optimizes the result. This is going to be worse than if the real model had been used throughout (like arm or x86), but in this case, the real model *can't* be used throughout, because gcc can't understand it well enough to get through regalloc and reload. The RL78 is just too "weird" to be modelled as-is. I keep hoping that gcc's own post-reload optimizers would do a better job, though. Combine should be able to combine, for example, the "mov r8,ax; cmp r8,#4" types of insns together. The virtual register file was the only way I could see to make RL78 work. I can't recall the details, but when you described the situation to me the virtual register file was the only way I could see to make the RL78 work in the IRA+reload world. What would be quite interesting to try would be to continue to use the virtualized register set, but instead use the IRA+LRA path. Presumably that wouldn't be terribly hard to try and there's a reasonable chance that'll improve the code in a noticeable way. Looking at how that's done by other backends, as far as I can tell, I just need to add something like:

    #undef TARGET_LRA_P
    #define TARGET_LRA_P rl78_enable_lra

    static bool
    rl78_enable_lra (void)
    {
      return true;
    }

to rl78.c? At least in theory, even if other work is needed elsewhere to make things run smoothly. Unfortunately, that function never seems to be called. How does TARGET_LRA_P get used, anyway?
I can't find anything that tries to use it, only places where it gets set. Is there some funky preprocessor stuff going on that's stopping me grepping for it? The next obvious thing to try, and it's probably a lot more work, would be to see if IRA+LRA is smart enough (or can be made so with a reasonable amount of work) to eliminate the virtual register file completely. Just to be clear, I'm not planning to work on this; my participation and interest in the RL78 was limited to providing a few tips to DJ. And from my side, I'm not trying to get anyone to work on it (though obviously I'm not averse to it). I'm just looking for hints and tips so that I can try to understand the causes (and hopefully find some solutions). Regards, Richard.
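On the question of why grepping for `TARGET_LRA_P` only turns up definitions: as far as I understand it, GCC's target hook macros are only ever expanded inside `TARGET_INITIALIZER`, which fills in the `targetm` structure, and generic code then calls the hook through that structure (`targetm.lra_p ()`) rather than by macro name. The following is a simplified, self-contained model of that pattern, not GCC's actual source; every `mock_`/`MOCK_` name is hypothetical.

```c
#include <stdbool.h>

/* Stand-in for GCC's target structure: a table of function pointers.  */
struct mock_target
{
  bool (*lra_p) (void);
};

/* Default hook, used by any backend that does not override it.  */
static bool
mock_default_lra_p (void)
{
  return false;
}

/* What the generic headers would provide: a default value for each
   hook macro, plus an initializer that mentions the macros.  */
#define MOCK_TARGET_LRA_P mock_default_lra_p
#define MOCK_TARGET_INITIALIZER { MOCK_TARGET_LRA_P }

/* What a backend file would do: define its hook and redirect the
   macro to it.  */
static bool
mock_rl78_enable_lra (void)
{
  return true;
}

#undef MOCK_TARGET_LRA_P
#define MOCK_TARGET_LRA_P mock_rl78_enable_lra

/* The macro is expanded here, and only here.  Because macros are
   expanded at the point of use, the initializer picks up the
   backend's override rather than the default.  */
static const struct mock_target mock_targetm = MOCK_TARGET_INITIALIZER;

/* Generic code never names the macro; it goes through the table.  */
static const char *
mock_pick_allocator (void)
{
  return mock_targetm.lra_p () ? "LRA" : "reload";
}
```

In this model a search for the macro name finds only the `#define` lines, while the call site mentions just the structure member, which matches the grep result described above.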
Re: [RL78] Questions about code-generation
On 22/03/14 01:35, DJ Delorie wrote:
>> Is it possible that the virtual pass causes inefficiencies in some
>> cases by sticking with r8-r31 when one of the 'normal' registers
>> would be better?
>
> That's not a fair question to ask, since the virtual pass can *only*
> use r8-r31. The first bank has to be left alone else the
> devirtualizer becomes a few orders of magnitude harder, if not
> impossible, to make work correctly.

What I meant was that because the virtual pass can only use r8-r31, it's causing unnecessary register moves to be generated because it chooses, say, r8 as the register for a byte compare. Because r8 is a *valid* register to use with a byte compare, it sticks with it come what may and then causes additional instructions to be generated to make sure that the result to be compared definitely ends up in r8, even if the register the result was in is also valid for a byte compare operation.

>> much needless copying, which strengthens my suspicion that it's
>> something in the RL78 backend that needs 'tweaking'.
>
> Of course it is, I've said that before I think. The RL78 uses a
> virtual model until reload, then converts each virtual instruction
> into multiple real instructions, then optimizes the result. This is

It may be obvious to you and everyone else on this list that it's the backend that needs tweaking but I've only been looking at the compiler internals for a couple of weeks, so unfortunately it's not obvious to me. I'm not complaining or pointing fingers or anything like that. I'm just trying to understand the reasons why things are the way they are - what things are happening in the backend, what's happening in the 'generic' part and the interactions between them. I understand that it's easy to say 'This is what the compiler's generating. That's stupid. It should be generating this', which is why I'm trying to understand the reasons that cause the compiler to generate what it's generating.

> going to be worse than if the real model had been used throughout
> (like arm or x86), but in this case, the real model *can't* be used
> throughout, because gcc can't understand it well enough to get
> through regalloc and reload. The RL78 is just too "weird" to be
> modelled as-is.

Can you explain what is too weird about it in particular? It certainly has restrictions on which registers can be used with various instructions but I wouldn't have thought they were so complicated that they couldn't be described using the normal constraints?

Regards, Richard.
Re: GCC 4.9.0 Status Report (2014-03-13)
I want to ask how I can find, in Bugzilla, the bugs that are counted in the "Quality Data" table. It seems that there are more bugs which are not listed. For example:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57694

The compiler currently returns "not implemented" when compiling the example code given there, but this bug does not seem to be tracked in the current quality data.

My questions are: What are the selection criteria that relate a bug to the release? What information might be missing from a bug report for it to be added to that list? Can I help by giving some more example code, or with any other "simple" things, to support the process a bit?

Regards
Klaus

> Status
> ======
>
> The trunk is still in Stage 4, which means only patches fixing regressions
> and documentation issues are appropriate.
>
> Comparing to last year's status reports, we are something in between a
> fortnight and month behind the last year's schedule, but if enough attention
> is given to the remaining P1 blockers, we could still release around the
> beginning of April.
>
> The list of secondary architectures has changed recently, so to remind people
> I'm including it here:
>
> The primary platforms are:
>
>     arm-linux-gnueabi
>     i386-unknown-freebsd
>     i686-pc-linux-gnu
>     mipsisa64-elf
>     powerpc64-unknown-linux-gnu
>     sparc-sun-solaris2.10
>     x86_64-unknown-linux-gnu
>
> The secondary platforms are:
>
>     aarch64-elf
>     powerpc-ibm-aix7.1.0.0
>     i686-apple-darwin
>     i686-pc-cygwin
>     i686-mingw32
>     s390x-linux-gnu
>
> Quality Data
> ============
>
> Priority          #   Change from last report
> --------        ---   -----------------------
> P1                9   - 23
> P2               75   - 12
> P3               15   -  6
> --------        ---   -----------------------
> Total            99   - 41
>
> Previous Report
> ===============
>
> http://gcc.gnu.org/ml/gcc/2014-02/msg00013.html
>
> The next report will be sent by me again, hopefully announcing
> the first GCC 4.9.0 release candidate soon.
Re: [RFC, MIPS] Relax NaN rules
On Sat, 22 Mar 2014, Richard Sandiford wrote: > > Thanks Joseph. I guess I'm not really pushing to have don't-care > > supported as it would take a lot of effort to determine when code does > > and does not care, you rightly point out more cases to deal with > > too. I'm not sure if the benefit would then be worth it or not as there > > would still be modules which do and do not care about old and new NaNs > > so it doesn't really relieve any pressure on toolchains or linux > > distributions. The second part of the proposal is more > > interesting/useful as it is saying I don't care about the impact of > > getting NaN encoding wrong and a tools vendor/linux distribution then > > gets to make that choice. Any comments on that aspect? > > Maybe it's just me, but I don't understand your use case for (2). > If 99% of users don't care about the different NaN encodings then > why would they use a different -mnan setting from the default? > Are you worried about potential future processors that only support > 2008 NaNs? If so, then (a) you give good reasons why that seems like a > misfeature and (b) I think we should instead tackle it by continuing > to allow both -mnan settings for all processors. I.e. we simply wouldn't > add any logic to check "is this processor/architecture compatible with this > encoding?". Such processors already exist I believe. Matthew will fill you in on the details, but IIRC they are strapped at boot to one NaN mode or the other, and the mode cannot be switched at run time, i.e. via CP1.FCSR (the NAN2008 bit is fixed at 0 or 1 depending on the boot mode selected). Of course that means they can still run legacy NaN code if strapped for the legacy NaN mode, but it's up to the SOC/board hardware designer to decide which mode to choose and we have no control over that. I feel uneasy about silently producing wrong results, even if NaN handling is only a border case for many users.
Perhaps switching the kernel into a full FP emulation mode for programs that have an unsupported NaN requirement would be an option? That should be doable with a reasonable effort and wouldn't affect programs that do not use the FPU at all, so no need to rush rebuilding everything, just the FP-intensive software packages. Maciej
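As background on why running code under the wrong NaN mode can silently give wrong results: the two encodings disagree about the most significant fraction bit. In the IEEE 754-2008 convention a set bit marks a quiet NaN, while in the legacy MIPS convention a set bit marks a signaling NaN. A minimal single-precision sketch, working on raw bit patterns so it does not depend on the host FPU; all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define F32_EXP_MASK  UINT32_C (0x7F800000)   /* exponent field */
#define F32_FRAC_MASK UINT32_C (0x007FFFFF)   /* fraction field */
#define F32_FRAC_MSB  UINT32_C (0x00400000)   /* top fraction bit */

enum nan_mode { NAN_LEGACY, NAN_2008 };

/* A NaN has an all-ones exponent and a non-zero fraction.  */
static bool
f32_is_nan (uint32_t bits)
{
  return (bits & F32_EXP_MASK) == F32_EXP_MASK
         && (bits & F32_FRAC_MASK) != 0;
}

/* Whether BITS is a quiet NaN depends on which convention applies:
   2008 mode treats a set top fraction bit as quiet, legacy MIPS mode
   treats it as signaling.  */
static bool
f32_is_quiet_nan (uint32_t bits, enum nan_mode mode)
{
  bool msb_set;

  if (!f32_is_nan (bits))
    return false;
  msb_set = (bits & F32_FRAC_MSB) != 0;
  return mode == NAN_2008 ? msb_set : !msb_set;
}
```

The same bit pattern, for example 0x7FC00000, is therefore a quiet NaN to hardware strapped for 2008 mode and a signaling NaN to hardware strapped for legacy mode.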
gcc-4.7-20140322 is now available
Snapshot gcc-4.7-20140322 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.7-20140322/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.7 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_7-branch revision 208766 You'll find: gcc-4.7-20140322.tar.bz2 Complete GCC MD5=8db365aabc9d48daefae3dcda8bda51f SHA1=bc9a15dd262331e006547bb1861aca64ed7f205a Diffs from 4.7-20140315 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.7 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.