rnreg and vliw
It seems that in gcc 4.7 the rnreg pass, which renames registers after reload, is not VLIW-aware. In particular, I saw it reassign a register that is already in use in the same VLIW bundle. To be more concrete, I saw it change the following pseudo-code: DI:a30 = v0 SI:a14 = -a14 into DI:a30 = v0 SI:a31 = -a14, since a31 was never referenced again. This won't work inside a VLIW bundle, since it causes two instructions to set a31 (the DImode set of a30 also writes a31). Even though rnreg runs before sched2, it runs after software pipelining, which creates its own VLIW bundles. Is there any easy fix for this? Thanks, Shmeel
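The sketch below shows the kind of check the rename would need. It uses a simplified model in which the `Insn` struct and `rename_conflicts_in_bundle` are hypothetical and are not GCC's regrename API. Before the destination of one instruction in a bundle is renamed, any candidate whose hard-register range overlaps a register written by another instruction in the same bundle is rejected. Register numbers stand in for a0, a1, ..., and a DImode write is modelled as two consecutive registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one instruction in a VLIW bundle: it writes
// `nregs` consecutive hard registers starting at `dest_regno`
// (e.g. a DImode set of a30 is modelled as writing a30 and a31).
struct Insn {
  unsigned dest_regno;
  unsigned nregs;
};

// Returns true if giving bundle[index] the destination `new_regno`
// (same width) would make it write a hard register that another
// instruction in the same bundle also writes.
bool rename_conflicts_in_bundle(const std::vector<Insn>& bundle,
                                std::size_t index, unsigned new_regno) {
  assert(index < bundle.size());
  const unsigned new_end = new_regno + bundle[index].nregs;  // one past the last reg
  for (std::size_t i = 0; i < bundle.size(); ++i) {
    if (i == index) continue;  // skip the instruction being renamed
    const unsigned lo = bundle[i].dest_regno;
    const unsigned hi = lo + bundle[i].nregs;
    // Do the half-open ranges [new_regno, new_end) and [lo, hi) overlap?
    if (new_regno < hi && lo < new_end) return true;
  }
  return false;
}
```

Take the bundle from the message, `{{30, 2}, {14, 1}}`. Renaming the second instruction to 31 reports a conflict, and renaming it to 29 does not. Only write/write conflicts are modelled here. A real fix would also have to respect the target's own rules for reads within a bundle.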
Contributing to GCC and question about PR64744
Dear GCC team, I would like to contribute to the project. I have a background in embedded systems programming, but little experience in compiler development. I'd like to try fixing PR64744. Could someone help me understand what the correct compiler behaviour should be for the example below: __attribute__((naked)) void foo() { char [2] = {0}; }; Currently gcc (trunk, aarch64 target) hits an ICE when compiling this code with: cc1 -O0 test.c But I think that it should report something like: "local frame unavailable (naked function?)" Thanks in advance -- Alexander
Help with reload bug, please
How does reload ensure that an SImode value (re)loaded into an FP register has a valid stack index? The FP load instruction allows a smaller index range than the integer equivalent, but nothing checks the destination register, only the source mode. I'm trying to solve a problem in which GCC 4.1 gets this wrong, but AFAICT this code works exactly the same now as then (although I don't have a testcase). IOW, unless I'm missing something, the only reason this doesn't fail all the time is that it's quite rare for register pressure to force exactly this value to spill, in a function whose stack frame is larger than 1KB, at an index beyond the limit. My target is ARMv7a with VFP. The code is trying to cast an int to a float. The machine description is such that the preferred route is to load the value first into a general register, transfer it to the VFP register, and then convert. It's only possible to get it to load directly to the VFP register if all the general registers are in use. This makes it very hard to write a synthetic testcase. I can "fix" the problem by rewriting arm_legitimate_index_p such that it assumes all SImode access might be VFP loads, but that seems suboptimal. Any suggestions would be appreciated! Thanks Andrew
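To make the range mismatch concrete, here is a sketch of an index check keyed on the destination register class instead of the access mode. The enum and function are hypothetical and are not `arm_legitimate_index_p`. The limits assume ARM state: a core-register LDR accepts an immediate offset up to ±4095, while VLDR accepts only a word-aligned offset up to ±1020. An offset just above 1KB is therefore fine for a general register but not for a VFP register. That is why a check keyed only on mode cannot tell the two SImode cases apart.

```cpp
// Hypothetical destination-register classes for this sketch.
enum class DestClass { Core, Vfp };

// Is `offset` usable as an immediate index for a load into a register of
// class `cls`? Limits assume ARM state:
//   LDR  Rt, [Rn, #imm]  -> imm in [-4095, 4095]
//   VLDR Sd, [Rn, #imm]  -> imm a multiple of 4 in [-1020, 1020]
constexpr bool index_ok_for(DestClass cls, long offset) {
  return cls == DestClass::Vfp
             ? (offset % 4 == 0 && offset >= -1020 && offset <= 1020)
             : (offset >= -4095 && offset <= 4095);
}
```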
Re: Help with reload bug, please
On 23 January 2015 at 13:46, Andrew Stubbs wrote: > How does reload ensure that an SImode value (re)loaded into an FP register > has a valid stack index? You could use CANNOT_CHANGE_MODE_CLASS, or request a secondary reload. For the latter, you can look at the memory/pseudo to decide whether the address requires a secondary reload for your register (class). Note, though, that we had a few bugs previously where reload wouldn't recalculate frame addresses when something relevant changed, so you might have to backport some fixes when working with an old compiler.
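The following is a standalone model of the secondary-reload decision suggested above. In GCC this decision lives in the target's TARGET_SECONDARY_RELOAD hook, and older ports use the SECONDARY_*_RELOAD_CLASS macros. The enum and function below are hypothetical: they model only the logic, not the hook's real signature. The idea is to look at the class of the register being reloaded into together with the stack offset, and to ask for a general-register scratch when the offset is out of range for the FP load. The limits assume ARM-state VLDR (word-aligned, within ±1020).

```cpp
// Hypothetical, simplified register classes for this sketch.
enum class RegClass { NoRegs, GeneralRegs, VfpRegs };

// Would loading a stack slot at `offset` into a register of class
// `rclass` need a scratch register to form the address first?
// Returns the scratch register's class, or NoRegs when none is needed.
// Only the VFP case is modelled (assumed ARM-state VLDR limits).
constexpr RegClass secondary_reload_class_for(RegClass rclass, long offset) {
  return (rclass == RegClass::VfpRegs &&
          !(offset % 4 == 0 && offset >= -1020 && offset <= 1020))
             ? RegClass::GeneralRegs  // build base+offset in a core register
             : RegClass::NoRegs;
}
```

The crucial input is `rclass`, the class of the destination register, rather than the mode of the value being loaded.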
Re: C++ Standard Question
On 22/01/15 16:07 -0600, Joel Sherrill wrote: > On 1/22/2015 3:44 PM, Marc Glisse wrote: >> On Thu, 22 Jan 2015, Joel Sherrill wrote: >>> I think this is a glibc issue but since this method is defined in the C++ standards, I thought there were plenty of language lawyers here. :) >> s/glibc/libstdc++/ and they have their own ML. > Thank you. >> That's deprecated, isn't it? > Yes. There is also a warning about that coming from the test code. > I don't know how long it has been deprecated since even with -std=c++03, the warning is present. Those types were deprecated even in the first C++ standard in 1998. They were born deprecated, and have remained so ever since. >>> class strstreambuf : public basic_streambuf > ISSUE > int pcount() const; <= ISSUE >>> My reading of the C++03 and draft C++14 says that the int pcount() method in this class is not const. glibc has it const in the glibc shipped with Fedora 20 and CentOS 6. >>> This is a simple test case: #include int main() { int (std::strstreambuf::*dummy)() = &std::strstreambuf::pcount; /*-- pcount is conformant --*/ return 0; } >>> What's the consensus? >> The exact signature of member functions is not mandated by the standard, implementations are allowed to make the function const if that works (or provide both a const and a non-const version). Your code is not guaranteed to work. Lambdas usually provide a fine workaround. > This code is actually from the Open Group FACE Conformance Test Suite. It uses code like this to check the presence of methods from the C Standard Library, POSIX APIs, and the C++ Standard Library. It would be really helpful if you could cite the place in the C++ standard so I can provide that as feedback to the authors of the test suite. 17.6.5.5 [member.functions] gives implementors certain freedoms to provide slightly different signatures to those described in the standard, including this: -2- It is unspecified whether any member functions in the C++ standard library are defined as inline (7.1.2). An implementation may declare additional non-virtual member function signatures within a class: — by adding arguments with default values to a member function signature;(187) — ... 187) Hence, the address of a member function of a class in the C++ standard library has an unspecified type. The footnote makes it clear that we could declare strstreambuf::pcount as a function with one or more parameters, as long as we give them default values, meaning that the type of &std::strstreambuf::pcount is not guaranteed to be int (strstreambuf::*)() in a conforming implementation, so the conformance test could fail on an implementation that conforms 100% to the standard. I'm not sure if that gives us the liberty to add 'const' where the standard doesn't have it, so this might be a real non-conformance issue, but even if we fixed it their test is not guaranteed to work for other reasons. > On a positive note, the test suite isn't flagging much using this technique. This may be the only method. But I would like to provide the correct feedback to them so no one else deals with this. IMHO the correct feedback for these deprecated types might be "Hmm, you're right. Oh well, nevermind, we're not going to bother fixing it now." Any real program using the pcount() member will work correctly with our implementation. In practice only conformance tests are likely to notice the difference.
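The footnote's consequence can be shown without touching libstdc++. In this sketch, `buf_with_default_arg` is a hypothetical stand-in for a library class that uses the [member.functions] freedom to add a defaulted parameter. Calling `pcount()` works exactly as the standard describes. However, the type of `&C::pcount` is `int (C::*)(int)`, so an initialization like the test suite's `dummy` variable would not compile against it.

```cpp
#include <type_traits>

// Hypothetical stand-in for a library class that adds a defaulted
// parameter to a member function, as [member.functions] permits.
struct buf_with_default_arg {
  int pcount(int extra = 0) { return extra; }
};

// The call syntax the standard describes still works...
inline int call_pcount(buf_with_default_arg& b) { return b.pcount(); }

// ...but the member pointer no longer has the type int (C::*)().
static_assert(!std::is_same<decltype(&buf_with_default_arg::pcount),
                            int (buf_with_default_arg::*)()>::value,
              "address of pcount is not int (C::*)()");
static_assert(std::is_same<decltype(&buf_with_default_arg::pcount),
                           int (buf_with_default_arg::*)(int)>::value,
              "address of pcount is int (C::*)(int)");
```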
Re: Problem with extremely large procedures and 64-bit code
Thanks Richard for your input, much appreciated. I followed up on your suggestions; unfortunately the -Wdisabled-optimization option you suggested did not produce any warnings. I am still trying the --param options one by one, without success so far. I got a new hint, though: running the same examples on a MacBook, I don't see the issue at all. The 64-bit and 32-bit times differ only slightly in both the optimized and debug builds, and 64-bit is always about 10% faster. I guess the compiler flags differ somehow; perhaps you or someone else knows which flags are set differently by default between the two, though it is hard to compare the actual speeds because the hardware is different. Here are the specs on the mac: gcc: Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn) - don't know what that means expected a number like 4.2.1 or something like that, 2.53 GHz Intel Core 2 Duo Does anything come to mind? Thanks again for your help, Ricardo On 1/20/15 1:21 AM, Richard Biener wrote: On Tue, Jan 20, 2015 at 4:57 AM, Ricardo Telichevesky wrote: Hi, I have a strange problem with extremely large procedures when generating 64-bit code. I am using gcc 4.9.2 on RHEL6.3 on a 64-thread 4-socket Xeon E7 4820 with 256GB of memory. No avx extensions, using sse option when building the compiler. This particular code is serial. I made measurements with 32- and 64-bit, both debug -g and optimize -O3, for two different examples (this is a circuit simulator and each example is a different circuit that uses different transistors). Example A is the one where the effect is more acute. I listed at the bottom of the e-mail the 3 procedures that consume 90% of the execution time: a) As a counter-example, the factor code listed is 300 lines of heavily optimized, hand-written C++ code that behaves as expected: 64-bit optimize is way faster than any other, up to 15x faster than 32-bit debug (btw great job on the compiler, it is really shining here).
b) evalTran has 18000 lines of auto-generated code and behaves very counter-intuitively: 64-bit optimize code is 3x slower than 32-bit optimize code. c) evalTranRhs has 5000 lines and is even worse: 64-bit is 4x slower than 32-bit. Notice that all the data structures in 32-bit code and 64-bit code are identical and most variables are identical; in fact all integers used are 64-bit, and most operations are floating-point ops. Initially I thought the 64-bit code was a lot bigger than 32-bit code and the cache was overwhelmed. In fact the difference in code sizes is not even 10% (at least for debug; note that I calculated the size of each procedure in bytes), so my trash-the-I-cache conjecture seems to be wrong. The overall execution time is causing us a lot of problems: 64-bit optimize takes 16 seconds, even more than 32-bit debug (10 seconds) and 32-bit optimize (4.8 seconds). Considering we only care about 64-bit optimize, we have a big problem here. Example B is not so bad, and in fact 64-bit code is slightly faster than 32-bit code; it would be nice if it went even faster, but if I got A to behave like that I'd be pretty happy already. I tried to look at the wide array of optimizing options for the code; it is a dizzying task and I could not get any kind of intuition besides the -O3 ... Would you have any suggestions for the proper flags for those ridiculously large auto-generated codes that might be able to alleviate this 32-bit vs 64-bit problem? Do you think the fact that this code is in a dynamically linked library (-fPIC) plays a role? It's hard to tell without a testcase, but GCC has various limits on the code sizes that passes deal with, so you might trip one of these, which would effectively disable optimizations. For example loop dependence analysis has a limit on the number of memory references it considers (--param loop-max-datarefs-for-datadeps, default 1000). Note that not all such limits are controlled by --params.
We have -Wdisabled-optimization that should warn if you run into any such case (but the warning is unfortunately not correctly implemented by all passes having such limits). Thanks, Richard. Thanks very much for your help, Ricardo All times are wall clock in microseconds; the main timer was checked against the reported UNIX time and is exact. Example A: evalTran has 18000 lines of C code (two for loops around 99% of the code); evalTranRhs has 5000 lines of C code (two for loops around 99% of the code).
32-bit debug: -g -m32 -fPIC -Wall -Winvalid-pch -msse2
%time    elapsed(us)   #calls   per call(us)   timer name
 2.503        254536     8335             30   numerical TRAN factor
56.01        5695065     8335            683   evalTran (bytes=231791)
35.41        3600646    13924            258   evalTranRhs (bytes=57501)
100         10168242        1       10168242   main
32-bit optimize: -O3 -m32 -fPIC -Wall -Winvalid-pch -msse2 %ti
Re: Problem with extremely large procedures and 64-bit code
On 23 January 2015 at 16:07, Ricardo Telichevesky wrote: > gcc: Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn) - don't > know what that means expected a number like 4.2.1 or something like that, > 2.53 GHz Intel Core 2 Duo Hi Ricardo, This is not gcc at all, it's Clang+LLVM. :/ I'm not sure why Apple decided to call the binary "gcc", which obviously causes more confusion than it solves, but that's beside the point. You should try Richard's suggestions again on the Linux/GCC setup that you originally started with. cheers, --renato
Re: Help with reload bug, please
On 01/23/15 06:46, Andrew Stubbs wrote: > How does reload ensure that an SImode value (re)loaded into an FP register has a valid stack index? The FP load instruction allows a smaller index range than the integer equivalent, but nothing checks the destination register, only the source mode. Unfortunately, GCC is designed with the assumption that the validity of an address is independent of whether it's used in a load or a store; for a load, that validity is also independent of the target register, and for a store, of the source register. This is deeply baked into the compiler in a variety of places. When I had responsibility for the PA I bumped up against this often. Just for reference, the PA allows a 14 bit displacement in memory addresses which use integer registers, but only a 5 bit displacement for FP registers. Other than the displacement amounts, I suspect this is the same core problem you have on your port. Ultimately all I could do was layer hack on hack. I can't remember them all. The most significant ones were to first reject the larger offsets for FP modes in GO_IF_LEGITIMATE_ADDRESS. While it's still valid (and relatively common on the PA) to access integer registers in FP modes or vice-versa, this change was a huge help. Secondary reloads are critical. When you detect a situation that won't work, you have to allocate a secondary reload register so that you can build up the address as well as all the reload_in/reload_out handling. This is how you ensure that if the compiler did something like try to load from memory using an integer mode into an FP register you've got a scratch register for reloading the address if it is an out-of-range reg+ address. We may have used special constraints as well to allow loads/stores of integer registers in FP modes to use the larger offset. Jeff
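For reference, the two PA displacement widths translate into very different ranges. The sketch below assumes both fields are signed two's-complement, which the message does not state. It includes a hypothetical mode-based legitimacy check in the spirit of the first hack described, rejecting larger offsets for FP modes in GO_IF_LEGITIMATE_ADDRESS. None of these names come from the PA port.

```cpp
// True if `disp` fits in a signed two's-complement field of `bits` bits.
constexpr bool fits_signed(long disp, int bits) {
  return disp >= -(1L << (bits - 1)) && disp <= (1L << (bits - 1)) - 1;
}

// Hypothetical classification of the access mode for this sketch.
enum class AccessMode { Integer, FloatingPoint };

// Mode-based check: 14-bit displacements for integer modes, 5-bit for FP
// modes (assumed signed). It cannot see the register class, so an
// integer-mode load into an FP register still passes with a 14-bit
// displacement; that is the case a secondary reload has to catch.
constexpr bool legitimate_displacement(AccessMode mode, long disp) {
  return fits_signed(disp, mode == AccessMode::FloatingPoint ? 5 : 14);
}
```

Under these assumptions, integer-mode accesses reach -8192..8191 while FP-mode accesses reach only -16..15.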
Re: C++ Standard Question
On 1/23/2015 9:55 AM, Jonathan Wakely wrote: > On 22/01/15 16:07 -0600, Joel Sherrill wrote: >> On 1/22/2015 3:44 PM, Marc Glisse wrote: >>> On Thu, 22 Jan 2015, Joel Sherrill wrote: >>> I think this is a glibc issue but since this method is defined in the C++ standards, I thought there were plenty of language lawyers here. :) >>> s/glibc/libstdc++/ and they have their own ML. >> Thank you. >>> That's deprecated, isn't it? >> Yes. There is also a warning about that coming from the test code. >> I don't know how long it has been deprecated since even with >> -std=c++03, the warning is present. > Those types were deprecated even in the first C++ standard in 1998. > They were born deprecated, and have remained so ever since. I don't mind the deprecated warning but at least I know now how long they have been that way. :) class strstreambuf : public basic_streambuf > ISSUE > int pcount() const; <= ISSUE My reading of the C++03 and draft C++14 says that the int pcount() method in this class is not const. glibc has it const in the glibc shipped with Fedora 20 and CentOS 6. This is a simple test case: #include int main() { int (std::strstreambuf::*dummy)() = &std::strstreambuf::pcount; /*-- pcount is conformant --*/ return 0; } What's the consensus? >>> The exact signature of member functions is not mandated by the standard, >>> implementations are allowed to make the function const if that works (or >>> provide both a const and a non-const version). Your code is not guaranteed >>> to work. Lambdas usually provide a fine workaround. >>> >> This code is actually from the Open Group FACE Conformance Test Suite. >> It uses code like this to check the presence of methods from the C Standard >> Library, POSIX APIs, and the C++ Standard Library. It would be really >> helpful >> if you could cite the place in the C++ standard so I can provide that as >> feedback >> to the authors of the test suite. 
> 17.6.5.5 [member.functions] gives implementors certain freedoms to > provide slightly different signatures to those described in the > standard, including this: > > -2- It is unspecified whether any member functions in the C++ standard > library are defined as inline (7.1.2). An implementation may > declare additional non-virtual member function signatures within a > class: > > — by adding arguments with default values to a member function > signature;(187) > — ... > > 187) Hence, the address of a member function of a class in the C++ > standard library has an unspecified type. > > The footnote makes it clear that we could declare strstreambuf::pcount > as a function with one or more parameters, as long as we give them > default values, meaning that the type of &std::strstreambuf::pcount is > not guaranteed to be int (strstreambuf::*)() in a conforming > implementation, so the conformance test could fail on an > implementation that conforms 100% to the standard. > > I'm not sure if that gives us the liberty to add 'const' where the > standard doesn't have it, so this might be a real non-conformance > issue, but even if we fixed it their test is not guaranteed to work > for other reasons. > Thank you for the explanation. It sounds like wherever this technique flags a method, it cannot be an automatic fail; it will need manual examination, at least to check for defaulted arguments. I am curious what the ruling is on the const, since the answer will impact FACE's guidance for its test suite. Is there a better way to automate a signature compliance? To tweak what they have done? >> On a positive note, the test suite isn't flagging much using this >> technique. This >> may be the only method. But I would like to provide the correct feedback to >> them so no one else deals with this. > IMHO the correct feedback for these deprecated types might be "Hmm, > you're right. Oh well, nevermind, we're not going to bother fixing it > now."
> > Any real program using the pcount() member will work correctly with > our implementation. In practice only conformance tests are likely to > notice the difference. > -- Joel Sherrill, Ph.D. Director of Research & Development joel.sherr...@oarcorp.com On-Line Applications Research Ask me about RTEMS: a free RTOS Huntsville AL 35805 Support Available (256) 722-9985
Re: C++ Standard Question
On 23/01/15 10:53 -0600, Joel Sherrill wrote: Is there a better way to automate a signature compliance? To tweak what they have done? Testing member function signatures for compliance is inherently flawed, they just shouldn't do it. I would say they should be testing that the function can be called on a non-const object and that it behaves as specified, rather than testing for a specific signature.
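Here is a compile-time sketch of that suggestion, assuming C++11. Instead of taking the member's address, it checks that `t.pcount()` is a valid expression on a non-const lvalue and yields something convertible to `int`. The trait name and the example classes are hypothetical. The trait accepts a const member, a non-const member, or one with extra defaulted parameters. It only covers callability; whether the function behaves as specified still needs a runtime test.

```cpp
#include <type_traits>
#include <utility>

// Primary template: chosen when t.pcount() is not a valid expression.
template <typename T, typename = void>
struct has_callable_pcount : std::false_type {};

// Chosen when t.pcount() compiles for a non-const lvalue t and yields
// something convertible to int. Because the check is on the call
// expression, a const member, a non-const member, or one with extra
// defaulted parameters are all accepted.
template <typename T>
struct has_callable_pcount<
    T, typename std::enable_if<std::is_convertible<
           decltype(std::declval<T&>().pcount()), int>::value>::type>
    : std::true_type {};

// Hypothetical classes standing in for different conforming declarations.
struct buf_const       { int pcount() const { return 0; } };
struct buf_default_arg { int pcount(int extra = 0) { return extra; } };
struct buf_missing     {};
```

To check the real library, one would instantiate `has_callable_pcount<std::strstreambuf>` after including the deprecated `<strstream>` header.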
Re: C++ Standard Question
On 1/23/2015 10:59 AM, Jonathan Wakely wrote: > On 23/01/15 10:53 -0600, Joel Sherrill wrote: >> Is there a better way to automate a signature compliance? To tweak >> what they have done? > Testing member function signatures for compliance is inherently > flawed, they just shouldn't do it. > > I would say they should be testing that the function can be called on > a non-const object and that it behaves as specified, rather than > testing for a specific signature. That's more or less how the RTEMS API signature tests work for C. We declare a variable of each type and pass it into the method including only the .h files POSIX says you should. Thanks.
Is there a way to dump LTO IR?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64754 is an LTO bug where the stage 1 and stage 2 compilers generate different LTO IR. Is there a way to dump LTO IR to see the actual difference in LTO IR? Thanks. -- H.J.
Re: Help with reload bug, please
On 23/01/15 16:34, Jeff Law wrote: > Just for reference, the PA allows a 14 bit displacement in memory addresses which use integer registers, but only a 5 bit displacement for FP registers. Other than the displacement amounts, I suspect this is the same core problem you have on your port. Yes, that seems similar. > Ultimately all I could do was layer hack on hack. I can't remember them all. The most significant ones were to first reject the larger offsets for FP modes in GO_IF_LEGITIMATE_ADDRESS. While it's still valid (and relatively common on the PA) to access integer registers in FP modes or vice-versa, this change was a huge help. This is already the case; it does the right thing when the mode is SFmode. > Secondary reloads are critical. When you detect a situation that won't work, you have to allocate a secondary reload register so that you can build up the address as well as all the reload_in/reload_out handling. This is how you ensure that if the compiler did something like try to load from memory using an integer mode into an FP register you've got a scratch register for reloading the address if it is an out-of-range reg+ address. SECONDARY_INPUT_RELOAD_CLASS is another missed opportunity. Just like the legitimate address stuff, this has checks for the various VFP classes, but reload detects the class in the same flawed way, so an integer reload gives GENERAL_REGS even when the destination is VFP. Within the macro there's no way to see the whole insn. > We may have used special constraints as well to allow loads/stores of integer registers in FP modes to use the larger offset. Do you have an example? Thanks Andrew