Re: Help with PPC Relocation options
On Mon, Aug 1, 2011 at 12:12 PM, Rohit Arul Raj wrote:
> Hello All,
>
> I compiled a simple 1.c file with the -mcpu=e500mc64 option, and while
> trying to create a relocatable object I am getting the following error:
>
> $ powerpc-elf-ld.exe -static -r 1.o
> powerpc-elf-ld.exe: Relocatable linking with relocations from format
> elf64-powerpc (1.o) to format elf32-powerpc (a.out) is not supported
>
> $ powerpc-elf-ld.exe -static -r 1.o --oformat elf64-powerpc
> powerpc-elf-ld.exe: Relocatable linking with relocations from format
> elf64-powerpc (1.o) to format elf64-powerpc (a.out) is not supported
>
> Is relocatable linking not allowed for 64bit PPC?
>
> Regards,
> Rohit

Yeah, got it: "-m elf64ppc". I was trying the '-m64' option, which works with the 'powerpc-linux' toolchain but doesn't work with the 'powerpc-elf' toolchain.

Thanks,
Rohit
Re: onlinedocs formated text too small to read
Jon Grant wrote:
> Hello
>
> Georg-Johann Lay wrote, On 08/07/11 19:08:
> [.]
>> I can confirm that it's hardly readable on some systems.
>> I use Opera and several FF versions, some worse, some a bit less so.
>>
>> IMO it's definitely too small, I already thought about complaining, too.
>>
>> Johann
>
> Could I ask, what would be the best way to progress this request? e.g.
> Should I create a bugzilla ticket.
>
> Best regards, Jon

http://gcc.gnu.org/ml/gcc/2011-07/msg00106.html

CCed Gerald, I think he takes care of that kind of thing. If he does not answer (it's vacation time), file a PR so that it won't be forgotten.

Johann
Re: [RFC] Add middle end hook for stack red zone size
On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote: > It's quite necessary to solve the general problem in middle-end rather than > in back-end. That's what we disagree on. All back-ends but ARM are able to handle it right, why can't ARM too? The ABI rules for stack handling in the epilogues are simply too diverse and complex to be handled easily in the scheduler. Jakub
Re: PATCH RFA: Build stages 2 and 3 with C++
2011/8/1 Marc Glisse : > On Fri, 15 Jul 2011, Ian Lance Taylor wrote: > >> I would like to propose this patch as a step toward building gcc using a >> C++ compiler. This patch builds stage1 with the C compiler as usual, >> and defaults to building stages 2 and 3 with a C++ compiler built during >> stage 1. This means that the gcc installed and used by most people will >> be built by a C++ compiler. This will ensure that gcc is fully >> buildable with C++, while retaining the ability to bootstrap with only a >> C compiler, not a C++ compiler. > > Nice step. Now that gcc can (mostly) build with g++, it would be great if it > could build with a non-gnu compiler. More precisely, with a compiler that > doesn't define __GNUC__. Indeed, the code is quite different in this case, > as can be seen trying to compile gcc with CC='gcc -U__GNUC__' and CXX='g++ > -U__GNUC__' (there are other reasons why this won't work, but at least it > shows some of the same issues I see with sunpro). > > > To start with, the obstack_free macro casts a pointer to an int -> error. > /data/repos/gcc/trunk/libcpp/directives.c:2048:7: error: cast from ‘char*’ > to ‘int’ loses precision [-fpermissive] > > > Then, ENUM_BITFIELD(cpp_ttype) is expanded to unsigned int instead of the > enum, and conversions from int to enum require an explicit cast in C++, > giving many errors like: > /data/repos/gcc/trunk/libcpp/charset.c:1615:79: error: invalid conversion > from ‘unsigned int’ to ‘cpp_ttype’ [-fpermissive] > /data/repos/gcc/trunk/libcpp/charset.c:1371:1: error: initializing > argument 5 of ‘bool cpp_interpret_string(cpp_reader*, const cpp_string*, > size_t, cpp_string*, cpp_ttype)’ [-fpermissive] > > Do we want to add a cast in almost every place a field declared with > ENUM_BITFIELD is used? That's quite a lot of places, everywhere in gcc... 
> The alternative would be to store the full enum instead of a bitfield (just > for stage1 so that's not too bad), but some comments in the code seem to > advise against it. I think it's the only viable solution (use the full enum for a non-GCC stage1 C++ compiler). We could help it somewhat by at least placing enum bitfields first/last in our bitfield groups. Any other opinions? Btw, thanks for trying non-GCC stage1 compilers ;) Richard.
Re: PATCH RFA: Build stages 2 and 3 with C++
On Mon, 1 Aug 2011, Richard Guenther wrote: > I think it's the only viable solution (use the full enum for a non-GCC stage1 > C++ compiler). We could help it somewhat by at least placing > enum bitfields first/last in our bitfield groups. Are GCC and other compilers declaring that they support the GNU C and C++ languages by defining __GNUC__ really the only compilers with this extension? Feature tests for particular features are generally better than testing for whether the compiler in use is GCC. (Using configure tests for things in ansidecl.h does require checking where in the gcc and src repositories those things are used, to make sure that the relevant configure tests are used everywhere necessary.) (Actually, C++03 appears to support enum bit-fields - it's only for C that they are a GNU extension - so can't we just enable them unconditionally when building as C++?) -- Joseph S. Myers jos...@codesourcery.com
Re: PATCH RFA: Build stages 2 and 3 with C++
On Mon, Aug 1, 2011 at 11:53 AM, Joseph S. Myers wrote: > On Mon, 1 Aug 2011, Richard Guenther wrote: > >> I think it's the only viable solution (use the full enum for a non-GCC stage1 >> C++ compiler). We could help it somewhat by at least placing >> enum bitfields first/last in our bitfield groups. > > Are GCC and other compilers declaring that they support the GNU C and C++ > languages by defining __GNUC__ really the only compilers with this > extension? Feature tests for particular features are generally better > than testing for whether the compiler in use is GCC. (Using configure > tests for things in ansidecl.h does require checking where in the gcc and > src repositories those things are used, to make sure that the relevant > configure tests are used everywhere necessary.) > > (Actually, C++03 appears to support enum bit-fields - it's only for C that > they are a GNU extension - so can't we just enable them unconditionally > when building as C++?) Oh, sure - that's even better. Richard.
Re: [RFC] Add middle end hook for stack red zone size
On Mon, 1 Aug 2011, Jakub Jelinek wrote: > On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote: > > It's quite necessary to solve the general problem in middle-end rather than > > in back-end. > > That's what we disagree on. All back-ends but ARM are able to handle it > right, why can't ARM too? The ABI rules for stack handling in the epilogues > are simply too diverse and complex to be handled easily in the scheduler. Given that the long-standing open bugs relating to scheduling and stack adjustments (30282, 38644) include issues for Power that do not yet appear to have been fixed, even if other back ends are able to handle it right they don't seem to do so at present. -- Joseph S. Myers jos...@codesourcery.com
Re: PATCH RFA: Build stages 2 and 3 with C++
On Mon, 1 Aug 2011, Joseph S. Myers wrote:
> On Mon, 1 Aug 2011, Richard Guenther wrote:
>
>> I think it's the only viable solution (use the full enum for a non-GCC stage1
>> C++ compiler). We could help it somewhat by at least placing
>> enum bitfields first/last in our bitfield groups.
>
> Are GCC and other compilers declaring that they support the GNU C and C++
> languages by defining __GNUC__ really the only compilers with this
> extension? Feature tests for particular features are generally better
> than testing for whether the compiler in use is GCC. (Using configure
> tests for things in ansidecl.h does require checking where in the gcc and
> src repositories those things are used, to make sure that the relevant
> configure tests are used everywhere necessary.)

I just checked, and indeed sunpro supports this extension as well in C.

> (Actually, C++03 appears to support enum bit-fields - it's only for C that
> they are a GNU extension - so can't we just enable them unconditionally
> when building as C++?)

Great, I didn't know that. That's a much better solution.

--
Marc Glisse
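Editorial note: the idea agreed on in this subthread can be sketched roughly as follows. This is only an illustration, not the actual ansidecl.h definition; the macro name ENUM_BITFIELD_SKETCH, the enum token_kind and the helper roundtrip_kind are all made up for the example. The macro uses the enum type for the bit-field when building as C++ or with a compiler that defines __GNUC__, and falls back to unsigned int otherwise.

```c
#include <assert.h>

/* Hypothetical variant of an ENUM_BITFIELD-style macro (not the real
   ansidecl.h text): use the enum type itself when building as C++, where
   enum bit-fields are standard, or with a GNU C compiler, where they are
   an extension; otherwise fall back to unsigned int.  */
#if defined(__cplusplus) || defined(__GNUC__)
#define ENUM_BITFIELD_SKETCH(TYPE) enum TYPE
#else
#define ENUM_BITFIELD_SKETCH(TYPE) unsigned int
#endif

/* Made-up enum standing in for something like cpp_ttype.  */
enum token_kind { TK_EOF, TK_NAME, TK_NUMBER, TK_STRING };

struct token {
    ENUM_BITFIELD_SKETCH(token_kind) kind : 8;  /* enum stored in 8 bits */
    unsigned int flags : 8;
};

/* Store a kind in the bit-field and read it back as an int.  */
static int roundtrip_kind(enum token_kind k)
{
    struct token t;
    t.kind = k;
    t.flags = 0;
    return (int) t.kind;
}
```

With the enum type in place, assignments such as t.kind = k need no cast in C++, which is the point Joseph and Marc make above. A configure-time feature test, as Joseph suggests, would be a further refinement for C compilers such as sunpro that support the extension without defining __GNUC__.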
RE: [RFC] Add middle end hook for stack red zone size
The answer is that ARM can. However, if you look into bugs PR30282, PR38644 and PR44199, you will find that several different ports, covering x86, PowerPC and ARM, have reported similar failures in the past.

You are right, they were all fixed in back-ends in the past, but we should fix the bug in a general way to make the GCC infrastructure stronger, rather than fixing the problem target-by-target and case-by-case! If you look further into the back-end fixes for x86 and PowerPC, you will find that they look quite similar.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jakub Jelinek
> Sent: Monday, August 01, 2011 5:12 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
>
> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> > It's quite necessary to solve the general problem in middle-end rather than in
> > back-end.
>
> That's what we disagree on. All back-ends but ARM are able to handle it
> right, why can't ARM too? The ABI rules for stack handling in the epilogues
> are simply too diverse and complex to be handled easily in the scheduler.
>
> Jakub
Re: [RFC] Add middle end hook for stack red zone size
On 01/08/11 10:11, Jakub Jelinek wrote: > On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote: >> It's quite necessary to solve the general problem in middle-end rather than >> in back-end. > > That's what we disagree on. All back-ends but ARM are able to handle it > right, why can't ARM too? The ABI rules for stack handling in the epilogues > are simply too diverse and complex to be handled easily in the scheduler. Because the vast majority of back-ends (ie those without red zones) shouldn't have to deal with this mess. This is something the MI code should be able to work out and deal with itself. Then the compiler will at least generate safe code, rather than randomly moving things about and allowing potentially unsafe writes/reads from beyond the allocated stack region. We should build the compiler defensively, but then allow for more aggressive optimizations to disable the defences when the back-end wants to take on the responsibility. Not the other way around. R.
Re: [RFC] Add middle end hook for stack red zone size
On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote: > ARM. You are right, they were all fixed in back-ends in the past, but we > should > fix the bug in a general way to make GCC infrastructure stronger, rather > than fixing the problem target-by-target and case-by-case! If you further > look into the back-end fixes in x86 and PowerPC, you may find they looks > quite similar in back-ends. > Red zone is only one difficulty, your patch is e.g. completely ignoring existence of biased stack pointers (e.g. SPARC -m64 has them). Some targets have stack growing in opposite direction, etc. We have really a huge amount of very diverse ABIs and making the middle-end grok what is an invalid stack access is difficult. Jakub
Re: Defining constraint for registers tuple
> Don't change the constraint, just add an alternative. Or use a
> different insn with an insn predicate.

This is a misunderstanding because of my great English :) I am not going to update an existing constraint. I am going to implement a new one. Actually, I am looking for an example where a similar constraint might already be implemented.

--
Thanks,
K
Re: Re: patch: don't issue -Wreorder warnings when order doesn't matter
> What if the object being constructed has only POD-type members with constant > initializers but is declared volatile I don't understand really... but it doesn't matter, I give up.
Re: Performance degradation on g++ 4.6
Hi Benjamin,

On 2011/7/30 06:22, Benjamin Redelings I wrote:
> I had some performance degradation with 4.6 as well.
>
> However, I was able to cure it by using -finline-limit=800 or 1000 I think.
> However, this led to a code size increase. Were the old higher-performance
> binaries larger?

Yes, the older binary for the degraded test was indeed larger: 107K vs 88K. However, I have just re-built and re-run the test and there was no significant difference in performance. I.e. the degradation in the "simple_types_constant_folding" test remains when building with -finline-limit=800 (or =1000).

> IIRC, setting -finline-limit=n actually sets two params to n/2, but I think
> you may only need to change 1 to get the old performance back.
> --param max-inline-insns-single defaults to 450, but --param
> max-inline-insns-auto defaults to 90. Perhaps you can get the old performance
> back by adjusting just one of these two parameters, or by setting them to
> different values, instead of the same value, as would be achieved by
> -finline-limit.

"--param max-inline-insns-auto=800" by itself does not help. The "--param max-inline-insns-single=800 --param max-inline-insns-auto=1000" combination makes no significant difference either.

BTW, some of these tweaks increase the binary size to 99K, yet there is no performance increase.

Oleg.
Re: Performance degradation on g++ 4.6
On Mon, 1 Aug 2011, Oleg Smolsky wrote:
> BTW, some of these tweaks increase the binary size to 99K, yet there is no
> performance increase.

I don't see this in the thread: did you use -march=native?

--
Marc Glisse
Do I need some Python stuff to build trunk as of 177065 ?
See: http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html Kind regards, -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: Do I need some Python stuff to build trunk as of 177065 ?
On Mon, 1 Aug 2011, Toon Moene wrote:
> See: http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html

Er, the python thing only tells you your system has a broken symlink but ignores it. Did you check in libgcc/config.log for the real error?

--
Marc Glisse
Re: Do I need some Python stuff to build trunk as of 177065 ?
On 08/01/2011 08:45 PM, Marc Glisse wrote:
> On Mon, 1 Aug 2011, Toon Moene wrote:
>
>> See: http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html
>
> Er, the python thing only tells you your system has a broken symlink but
> ignores it. Did you check in libgcc/config.log for the real error?

Oops - sorry for the noise. The relevant config.log says:

cc1: error: unrecognized command line option '-mnolzcnt'

Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: Do I need some Python stuff to build trunk as of 177065 ?
On 08/01/2011 08:56 PM, Toon Moene wrote:
> On 08/01/2011 08:45 PM, Marc Glisse wrote:
>
>> On Mon, 1 Aug 2011, Toon Moene wrote:
>>
>>> See: http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html
>>
>> Er, the python thing only tells you your system has a broken symlink but
>> ignores it. Did you check in libgcc/config.log for the real error?
>
> Oops - sorry for the noise. The relevant config.log says:
>
> cc1: error: unrecognized command line option '-mnolzcnt'

Would this, perchance, have been caused by revision 177034?

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?
On 07/29/2011 12:13 PM, Vladimir Makarov wrote:
> On 07/27/2011 05:59 PM, Sandra Loosemore wrote:
>> [snip]
>>
>> So, here's my question. Is it worthwhile for me to continue this
>> approach of trying to make the MIPS backend smarter? Or is the way IRA
>> deals with CANNOT_CHANGE_MODE_CLASS fundamentally broken and in need of
>> fixing in a target-inspecific way? And/or is there some other regression
>> in IRA on mainline that's causing it to spill to memory when it didn't
>> used to in 4.6?
>
> I think the second ("fixing in a target-inspecific way"). Instead of
> prohibiting a class for a pseudo (that is what is happening for class
> FP_REGS) because the pseudo can change its mode, the impossibility of
> changing mode should be reflected in the class cost (through some reload
> cost evaluation).
>
> I'll try to fix it. The only problem is that it will take some time,
> because the fix should be tested on a few platforms. It would be nice to
> file a PR so as not to forget about the problem.

Thanks for offering to look into this. I created

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936

with my test case and WIP patch for the MIPS backend.

At this point I'm thinking that the additional memory spills I'm seeing on mainline are not related to CANNOT_CHANGE_MODE_CLASS at all but are just some other regression in the register allocator compared to 4.6. It might be useful to try to confirm/isolate that problem first.

-Sandra
libgcc: strange optimization
Hi list,

consider the following test code:

    static void inline f1(int arg)
    {
      register int a1 asm("r8") = 10;
      register int a2 asm("r1") = arg;

      asm("scall" : : "r"(a1), "r"(a2));
    }

    void f2(int arg)
    {
      f1(arg >> 10);
    }

If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this email), the a1 = 10; assignment is optimized away.

According to my understanding the following happens:
 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as a function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes r8 assignment away because calli may clobber r1-r10 (callee saved registers on lm32).

If you use:

    void f2(int arg)
    {
      f1(__ashrsi3(arg, 10));
    }

everything works as expected, __ashrsi3 is evaluated before the body of f1.

According to wikipedia [1], function calls are sequence points and all side effects for the arguments are completed before entering the function. So in my understanding the deferred argument evaluation is wrong if that operation is emitted as a call to a libgcc helper.

I tried that on other architectures too (microblaze and avr). All show the same behaviour. If an integer arithmetic opcode is translated to a call to libgcc, every assignment to a register which is clobbered by the call is optimized away.

The GCC documentation mentions some caveats when using explicit register variables [2]:

  In the above example, beware that a register that is call-clobbered by the target ABI will be overwritten by any function call in the assignment, including library calls for arithmetic operators. Also a register may be clobbered when generating some operations, like variable shift, memory copy or memory move on x86. Assuming it is a call-clobbered register, this may happen to r0 above by the assignment to p2. If you have to use such a register, use temporary variables for expressions between the register assignment.

But I think this may not apply to the case above, where the arithmetic operator is an argument of the called function. I.e. there is a sequence point and the statements must not be reordered.

Assembler output (lm32-gcc -O1 -S -c test.c):

    f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

Assembler output with no DCE (lm32-gcc -O1 -S -fno-dce -c test.c):

    f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r8, r0, 10
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

[1] http://en.wikipedia.org/wiki/Sequence_point
[2] http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Example%20of%20asm%20with%20clobbered%20asm%20reg

--
Michael
Re: libgcc: strange optimization
Michael Walle wrote:
> Hi list,
>
> consider the following test code:
> static void inline f1(int arg)
> {
>   register int a1 asm("r8") = 10;
>   register int a2 asm("r1") = arg;
>
>   asm("scall" : : "r"(a1), "r"(a2));
> }
>
> void f2(int arg)
> {
>   f1(arg >> 10);
> }
>
> If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
> email), the a1 = 10; assignment is optimized away.

Your asm has no output operands and no side effects; with more aggressive optimization the whole asm would disappear.

What you want is maybe something like

    asm volatile ("scall" : : "r"(a1), "r"(a2));

Johann
Re: libgcc: strange optimization
Hi,

That was quick :)

> Your asm has no output operands and no side effects, with more
> aggressive optimization the whole asm would disappear.

Sorry, that was just a small test file; the original code has output operands.

The new test code:

    static int inline f1(int arg)
    {
      register int ret asm("r1");
      register int a1 asm("r8") = 10;
      register int a2 asm("r1") = arg;

      asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

      return ret;
    }

    int f2(int arg1, int arg2)
    {
      return f1(arg1 >> 10);
    }

translates to the same assembly:

    f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

PS. R1 is the return register in the target architecture ABI.

--
Michael
Re: libgcc: strange optimization
On 08/01/2011 01:30 PM, Michael Walle wrote:
> 1) function inlining
> 2) deferred argument evaluation
> 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> function call to libgcc's __ashrsi3 (_in place_!)
> 4) BAM! dead code elimination optimizes r8 assignment away because calli
> may clobber r1-r10 (callee saved registers on lm32).

I'm afraid the only solution I can think of is to force F1 out-of-line. That's the only safe way to make sure that arguments are completely evaluated before forcing them into hard register variables.

Alternately, expose new constraints such that you don't need the hard register variables at all. E.g.

    asm("scall" : : "R08"(a1), "R01"(a2));

where Rxx is defined in constraints.md for every relevant register. That'll prevent a reference to the hard register until register allocation, at which point we'll have done the right thing with the shift.

r~
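Editorial note: the first suggestion above (forcing f1 out of line) might look roughly like the sketch below. It reuses Michael's f1/f2 and adds GCC's noinline attribute. The __lm32__ guard is an assumption about the target's predefined macro, and the #else branch is a made-up host stand-in that simply echoes its argument so the sketch also compiles where the scall instruction and the r1/r8 registers do not exist.

```c
#include <assert.h>

/* Keeping f1 out of line means its argument, including any libgcc helper
   call such as __ashrsi3, is evaluated in the caller before f1's body and
   its hard register variables come into play.  */
__attribute__((noinline))
static int f1(int arg)
{
#if defined(__lm32__)   /* assumed name of the target's predefined macro */
    register int ret __asm__("r1");
    register int a1  __asm__("r8") = 10;
    register int a2  __asm__("r1") = arg;

    __asm__ volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");
    return ret;
#else
    /* Host stand-in (not part of the original code): echo the argument.  */
    return arg;
#endif
}

int f2(int arg)
{
    return f1(arg >> 10);
}
```

The cost is an extra call per system call. The constraint-based alternative needs changes to the port's constraints.md and so cannot be shown in plain C.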
Re: Line 0 Hack??
Re-sending as plain text for gcc@gcc.gnu.org ...

Hi,

I have a question about the line 0 hack on line 13232 of gcc/cp/decl.c (or just text search for "Hack", it's the only place it's found in that file...).

From my revision history, Steven introduced this in 2005, and Tom modified it in 2007 (probably when modifying the linemap).

The problem is that this very call to get the source_location of line 0 creates a NEW linemap entry in the line table, AFTER all of the LC_LEAVE have taken place (i.e. we were done parsing and now add a new linemap to the line_table)...

And hence, we finish the parsing with line_table->depth == 1.

In particular, I am building linemap serialization for pre-parsed headers, and added what I think to be a fair gcc_assert that when we serialize the line_table, its depth should be 0. However, if we happen to have "main" in the header (this is when we get in this block in decl.c from my understanding, as DECL_MAIN_P is then true), a new linemap is added at the end of the line table in the header after the LC_LEAVE... When merging this in the middle of a C file upon deserializing, this entry makes no sense, but I can't just ignore it either, as a source_location has been handed off for it...

My question is: what is this "special line 0"? Is this just a hack for this particular situation, or is "line 0" a much more important concept I can't mess around with?

I could potentially hack around it, but hacking around a hack can only make things worse in the long run.

I'm not familiar with the "middle end warning" referred to in the comment in decl.c; could we potentially get rid of this hack once and for all?

Best,
Gabriel
[x32] Allow R_X86_64_64
Hi,

It turns out that x32 needs R_X86_64_64. One major reason is that the displacement range of x32 is -2G to +2G. It isn't a problem for the compiler since only the small model is required for x32. However, to address 0 to 4G directly in assembly code, we have to use R_X86_64_64 with movabs. I am checking the following patch into the x32 psABI to allow R_X86_64_64.

--
H.J.

diff --git a/object-files.tex b/object-files.tex
index 3c9b9c6..7f0fd14 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -451,7 +451,7 @@ or \texttt{Elf32_Rel} relocation.
 \multicolumn{1}{c}{Calculation} \\
 \hline
 \texttt{R_X86_64_NONE} & 0 & none & none \\
-\texttt{R_X86_64_64} $^\dagger$ & 1 & \textit{word64} & \texttt{S + A} \\
+\texttt{R_X86_64_64} & 1 & \textit{word64} & \texttt{S + A} \\
 \texttt{R_X86_64_PC32} & 2 & \textit{word32} & \texttt{S + A - P} \\
 \texttt{R_X86_64_GOT32} & 3 & \textit{word32} & \texttt{G + A} \\
 \texttt{R_X86_64_PLT32} & 4 & \textit{word32} & \texttt{L + A - P} \\
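Editorial note: the range argument above can be made concrete with a little arithmetic. The sketch below is only an illustration; the helper names reloc_64 and fits_disp32 are made up. It computes the S + A value from the table and checks whether an absolute address fits a sign-extended 32-bit displacement, i.e. lies in [-2G, +2G).

```c
#include <assert.h>
#include <stdint.h>

/* R_X86_64_64: a word64 field holding S + A, as in the table above.  */
static uint64_t reloc_64(uint64_t s, int64_t a)
{
    return s + (uint64_t) a;
}

/* Nonzero if addr can be encoded as a sign-extended 32-bit displacement,
   i.e. it lies in [-2G, +2G).  */
static int fits_disp32(int64_t addr)
{
    return addr >= INT32_MIN && addr <= INT32_MAX;
}
```

For instance, fits_disp32(0x7fffffff) holds but fits_disp32(0xC0000000) does not, so an address in the upper half of the 0 to 4G range needs the full 64-bit immediate that movabs and R_X86_64_64 provide.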
A case that PRE optimization hurts performance
Hi,

For the following simple test case, PRE optimization hoists computation (s!=1) into the default branch of the switch statement, and finally causes very poor code generation. This problem occurs in both X86 and ARM, and I believe it is also a problem for other targets.

    int f(char *t) {
        int s=0;

        while (*t && s != 1) {
            switch (s) {
            case 0:
                s = 2;
                break;
            case 2:
                s = 1;
                break;
            default:
                if (*t == '-')
                    s = 1;
                break;
            }
            t++;
        }

        return s;
    }

Taking X86 as an example, with option "-O2" you may find 52 instructions generated like below:

       0:   55                      push   %ebp
       1:   31 c0                   xor    %eax,%eax
       3:   89 e5                   mov    %esp,%ebp
       5:   57                      push   %edi
       6:   56                      push   %esi
       7:   53                      push   %ebx
       8:   8b 55 08                mov    0x8(%ebp),%edx
       b:   0f b6 0a                movzbl (%edx),%ecx
       e:   84 c9                   test   %cl,%cl
      10:   74 50                   je     62
      12:   83 c2 01                add    $0x1,%edx
      15:   85 c0                   test   %eax,%eax
      17:   75 23                   jne    3c
      19:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
      20:   0f b6 0a                movzbl (%edx),%ecx
      23:   84 c9                   test   %cl,%cl
      25:   0f 95 c0                setne  %al
      28:   89 c7                   mov    %eax,%edi
      2a:   b8 02 00 00 00          mov    $0x2,%eax
      2f:   89 fb                   mov    %edi,%ebx
      31:   83 c2 01                add    $0x1,%edx
      34:   84 db                   test   %bl,%bl
      36:   74 2a                   je     62
      38:   85 c0                   test   %eax,%eax
      3a:   74 e4                   je     20
      3c:   83 f8 02                cmp    $0x2,%eax
      3f:   74 1f                   je     60
      41:   80 f9 2d                cmp    $0x2d,%cl
      44:   74 22                   je     68
      46:   0f b6 0a                movzbl (%edx),%ecx
      49:   83 f8 01                cmp    $0x1,%eax
      4c:   0f 95 c3                setne  %bl
      4f:   89 df                   mov    %ebx,%edi
      51:   84 c9                   test   %cl,%cl
      53:   0f 95 c3                setne  %bl
      56:   89 de                   mov    %ebx,%esi
      58:   21 f7                   and    %esi,%edi
      5a:   eb d3                   jmp    2f
      5c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
      60:   b0 01                   mov    $0x1,%al
      62:   5b                      pop    %ebx
      63:   5e                      pop    %esi
      64:   5f                      pop    %edi
      65:   5d                      pop    %ebp
      66:   c3                      ret
      67:   90                      nop
      68:   b8 01 00 00 00          mov    $0x1,%eax
      6d:   5b                      pop    %ebx
      6e:   5e                      pop    %esi
      6f:   5f                      pop    %edi
      70:   5d                      pop    %ebp
      71:   c3                      ret

But with command line option "-O2 -fno-tree-pre", there are only 12 instructions generated, and the code would be very clean like below:

       0:   55                      push   %ebp
       1:   31 c0                   xor    %eax,%eax
       3:   89 e5                   mov    %esp,%ebp
       5:   8b 55 08                mov    0x8(%ebp),%edx
       8:   80 3a 00                cmpb   $0x0,(%edx)
       b:   74 0e                   je     1b
       d:   80 7a 01 00             cmpb   $0x0,0x1(%edx)
      11:   b0 02                   mov    $0x2,%al
      13:   ba 01 00 00 00          mov    $0x1,%edx
      18:   0f 45 c2                cmovne %edx,%eax
      1b:   5d                      pop    %ebp
      1c:   c3                      ret

Do you have any idea about this?

Thanks,
-Jiangning
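Editorial note: it may help to see why 12 instructions are enough here. Tracing the loop by hand, s goes 0 -> 2 -> 1 in at most two iterations and the default branch is never reached, so the result depends only on whether the string has zero, one, or more characters. The sketch below pairs the original f with a hand-derived equivalent; f_reduced is a made-up name, and this is a reading of the -fno-tree-pre output rather than anything GCC emits as source.

```c
#include <assert.h>

/* The test case from the message above.  */
int f(char *t) {
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:
            s = 2;
            break;
        case 2:
            s = 1;
            break;
        default:
            if (*t == '-')
                s = 1;
            break;
        }
        t++;
    }

    return s;
}

/* Hand-derived equivalent: 0 for an empty string, 2 for a one-character
   string, 1 for anything longer.  This mirrors the two cmpb instructions
   and the cmovne in the short listing.  */
int f_reduced(const char *t)
{
    if (t[0] == '\0')
        return 0;
    return t[1] == '\0' ? 2 : 1;
}
```

Both functions agree on inputs such as "", "-", "ab" and "a-b", which is consistent with the short code being a full simplification of the loop rather than a lucky special case.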
Re: Line 0 Hack??
I have removed the hack and the test output is identical to the clean build test output.

See issue4835047 for the patch.

Gabriel

On Mon, Aug 1, 2011 at 2:56 PM, Gabriel Charette wrote:
> Re-sending as plain text for gcc@gcc.gnu.org ...
>
> Hi,
>
> I have a question about the line 0 hack on line 13232 of gcc/cp/decl.c
> (or just text search for "Hack", it's the only place it's found in
> that file...).
>
> From my revision history, Steven introduced this in 2005, and Tom
> modified it in 2007 (probably when modifying the linemap).
>
> The problem is that this very call to get the source_location of line
> 0 creates a NEW linemap entry in the line table, AFTER all of the
> LC_LEAVE have taken place (i.e. we were done parsing and now add a new
> linemap to the line_table)...
>
> And hence, we finish the parsing with line_table->depth == 1.
>
> In particular, I am building linemap serialization for pre-parsed
> headers, and added what I think to be a fair gcc_assert that when we
> serialize the line_table, its depth should be 0. However, if we
> happen to have "main" in the header (this is when we get in this block
> in decl.c from my understanding, as DECL_MAIN_P is then true), a new
> linemap is added at the end of the line table in the header after the
> LC_LEAVE... When merging this in the middle of a C file upon
> deserializing, this entry makes no sense, but I can't just ignore it
> either, as a source_location has been handed off for it...
>
> My question is: what is this "special line 0"? Is this just a hack for
> this particular situation, or is "line 0" a much more important concept
> I can't mess around with?
>
> I could potentially hack around it, but hacking around a hack can only
> make things worse in the long run.
>
> I'm not familiar with the "middle end warning" referred to in the
> comment in decl.c; could we potentially get rid of this hack once and
> for all?
>
> Best,
> Gabriel
RE: [RFC] Add middle end hook for stack red zone size
Hi Jakub,

I appreciate your valuable comments!

I think the SPARC V9 ABI doesn't have a red zone defined, right? So stack_red_zone_size should be defined as zero by default, and the scheduler would block moving memory accesses across a stack adjustment no matter what the offset is. I don't see any risk here. Also, in my patch the function *abs* is being used to avoid the opposite stack direction issue you mentioned.

Some people like you insist on the ABI diversity, and actually I agree with you on this. But part of the ABI definition is general for all targets. The point here is that memory access beyond the stack red zone should be avoided, which is the general part of the ABI that the compiler should guarantee. For this general part, the middle end should take the responsibility.

Thanks,
-Jiangning

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, August 01, 2011 6:31 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
>
> On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
> > ARM. You are right, they were all fixed in back-ends in the past, but we
> > should
> > fix the bug in a general way to make GCC infrastructure stronger, rather
> > than fixing the problem target-by-target and case-by-case! If you further
> > look into the back-end fixes in x86 and PowerPC, you may find they looks
> > quite similar in back-ends.
> >
>
> Red zone is only one difficulty, your patch is e.g. completely ignoring
> existence of biased stack pointers (e.g. SPARC -m64 has them).
> Some targets have stack growing in opposite direction, etc.
> We have really a huge amount of very diverse ABIs and making the
> middle-end
> grok what is an invalid stack access is difficult.
>
> Jakub
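Editorial note: the "general part" argued for above can be pictured with a toy model. The sketch below is not the proposed hook and not GCC code; access_is_safe is a made-up function that assumes a downward-growing, unbiased stack. It treats an access at SP + offset as safe if it is at or above SP, or no further below SP than the red zone size.

```c
#include <assert.h>

/* Toy model, downward-growing unbiased stack only: an access at
   SP + offset_from_sp is treated as safe if it lies at or above SP, or no
   further below SP than red_zone_size bytes.  */
static int access_is_safe(long offset_from_sp, long red_zone_size)
{
    if (offset_from_sp >= 0)
        return 1;
    return -offset_from_sp <= red_zone_size;
}
```

With the 128-byte red zone of the x86-64 System V ABI, access_is_safe(-128, 128) holds and access_is_safe(-129, 128) does not; with a red zone of zero, any access below SP is rejected. The model deliberately leaves out the biased stack pointers and upward-growing stacks that Jakub raises, which is exactly where the disagreement in this thread lies.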
Re: Performance degradation on g++ 4.6
Try isolating the int8_t constant folding test from the rest to see if the slowdown can be reproduced with the isolated case. If the problem disappears, it is likely due to the following inline parameters:

large-function-insns, large-function-growth, large-unit-insns, inline-unit-growth.

For instance, set

--param large-function-insns=1
--param large-unit-insns=2

David

On Mon, Aug 1, 2011 at 11:43 AM, Oleg Smolsky wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>>
>> Profiling tools are your best friend here. If you don't have access to
>> any, the least you can do is to build the program with -pg option and
>> use gprof tool to find out differences.
>
> The test suite has a bunch of very basic C++ tests that are executed an
> enormous number of times. I've built one with the obvious performance
> degradation and attached the source, output and reports.
>
> Here are some highlights:
>    v4.1:    Total absolute time for int8_t constant folding: 30.42 sec
>    v4.6:    Total absolute time for int8_t constant folding: 43.32 sec
>
> Every one of the tests in this section had degraded... the first half more
> than the second. I am not sure how much further I can take this - the
> benchmarked code is very short and plain. I can post disassembly for one
> (some?) of them if anyone is willing to take a look...
>
> Thanks,
> Oleg.
>
Re: libgcc: strange optimization
On Mon, 1 Aug 2011, Georg-Johann Lay wrote: > Michael Walle wrote: > > Hi list, > > > > consider the following test code: > > static void inline f1(int arg) > > { > >register int a1 asm("r8") = 10; > >register int a2 asm("r1") = arg; > > > >asm("scall" : : "r"(a1), "r"(a2)); > > } > > > > void f2(int arg) > > { > >f1(arg >> 10); > > } > > > > > > If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this > > email), the a1 = 10; assignment is optimized away. > > Your asm has no output operands and no side effects, with more aggressive > optimization the whole asm would disappear. No, for the record that's not supposed to happen for asms *without outputs*. "If an @code{asm} has output operands, GCC assumes for optimization purposes the instruction has no side effects except to change the output operands." > What you want is maybe something like > >asm volatile ("scall" : : "r"(a1), "r"(a2)); For the code at hand, the scall should be described as both having an output and being volatile, since the system call is a side effect that GCC can't see and might otherwise optimize away if the system call return value is unused. A plain volatile marking such as the above should not be necessary, modulo gcc bugs. The real problem is quite worrisome. I don't think a port (lm32) should have to solve it with constraints; the (inline) function parameter *should* cause a non-clobbering temporary to hold any intermediate operations, but it looks as if you'll otherwise have to debug it yourself. brgds, H-P
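H-P's suggestion (give the asm an output and mark it volatile) can be tried on any GCC target with the sketch below. The empty asm template stands in for the lm32 scall instruction, and the name fake_syscall is hypothetical; the matching constraint "0" simply makes the "result" equal to the argument so that the function has a defined value.

```c
#include <assert.h>

/* Hypothetical stand-in for a system call wrapper.  */
static inline int
fake_syscall (int number, int arg)
{
  int ret;

  assert (number >= 0);

  /* "=r" (ret) describes the result, "0" (arg) places ARG in the same
     register as RET, and volatile keeps the asm even if RET is unused.
     The "memory" clobber says the call may read or write memory.  */
  __asm__ volatile ("" : "=r" (ret) : "0" (arg), "r" (number) : "memory");
  return ret;
}
```

Because the template is empty, fake_syscall (10, 7) just returns 7; on a real port the template would contain the trap instruction and the operands would be bound to the registers required by the system call ABI.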
Re: libgcc: strange optimization
On Mon, 1 Aug 2011, Richard Henderson wrote: > On 08/01/2011 01:30 PM, Michael Walle wrote: > > 1) function inlining > > 2) deferred argument evaluation > > 3) because our target has no barrel shifter, (arg >> 10) is emitted as a > > function call to libgcc's __ashrsi3 (_in place_!) > > 4) BAM! dead code elimination optimizes r8 assignment away because calli > > may clobber r1-r10 (callee saved registers on lm32). > > I'm afraid the only solution I can think of is to force F1 out-of-line. Or another temporary - but the parameter should already have that effect. brgds, H-P
Re: libgcc: strange optimization
Michael Walle wrote: > Hi, > > That was quick :) > >> Your asm has no output operands and no side effects, with more >> aggressive optimization the whole asm would disappear. > Sorry, that was just a small test file, the original code has output operands. > > The new test code: > static int inline f1(int arg) > { >register int ret asm("r1"); >register int a1 asm("r8") = 10; >register int a2 asm("r1") = arg; > >asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory"); > >return ret; > } > > int f2(int arg1, int arg2) > { >return f1(arg1 >> 10); > } > > translates to the same assembly: > f2: > addi sp, sp, -4 > sw (sp+4), ra > addi r2, r0, 10 > calli __ashrsi3 > scall > lw ra, (sp+4) > addi sp, sp, 4 > b ra > > PS. R1 is the return register in the target architecture ABI.

I'd guess you ran into http://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html#Local-Reg-Vars

    A common pitfall is to initialize multiple call-clobbered registers
    with arbitrary expressions, where a function call or library call for
    an arithmetic operator will overwrite a register value from a previous
    assignment, for example r0 below:

        register int *p1 asm ("r0") = ...;
        register int *p2 asm ("r1") = ...;

    In those cases, a solution is to use a temporary variable for each
    arbitrary expression.

So I'd try to rewrite it as

    static int inline f1 (int arg0)
    {
        int arg = arg0;
        register int ret asm("r1");
        register int a1 asm("r8") = 10;
        register int a2 asm("r1") = arg;

        asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

        return ret;
    }

and if that does not help the rather hackish

    static int inline f1 (int arg0)
    {
        int arg = arg0;
        register int ret asm("r1");
        register int a1 asm("r8");
        register int a2 asm("r1");

        asm ("" : "+r" (arg));

        a1 = 10;
        a2 = arg;

        asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

        return ret;
    }