Re: libgcc: strange optimization
Hans-Peter Nilsson writes:
> On Mon, 1 Aug 2011, Richard Henderson wrote:
>
> > On 08/01/2011 01:30 PM, Michael Walle wrote:
> > > 1) function inlining
> > > 2) deferred argument evaluation
> > > 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> > > function call to libgcc's __ashrsi3 (_in place_!)
> > > 4) BAM! dead code elimination optimizes r8 assignment away because calli
> > > may clobber r1-r10 (callee saved registers on lm32).
> >
> > I'm afraid the only solution I can think of is to force F1 out-of-line.
>
> Or another temporary - but the parameter should already have
> that effect.

It should, but doesn't. See PR48863 for similar breakage on ARM.

/Mikael
Re: A case that PRE optimization hurts performance
On Tue, Aug 2, 2011 at 4:37 AM, Jiangning Liu wrote:
> Hi,
>
> For the following simple test case, PRE optimization hoists computation
> (s!=1) into the default branch of the switch statement, and finally causes
> very poor code generation. This problem occurs in both X86 and ARM, and I
> believe it is also a problem for other targets.
>
> int f(char *t) {
>     int s=0;
>
>     while (*t && s != 1) {
>         switch (s) {
>         case 0:
>             s = 2;
>             break;
>         case 2:
>             s = 1;
>             break;
>         default:
>             if (*t == '-')
>                 s = 1;
>             break;
>         }
>         t++;
>     }
>
>     return s;
> }
>
> Taking X86 as an example, with option "-O2" you may find 52 instructions
> generated like below,
>
> :
>    0: 55                      push %ebp
>    1: 31 c0                   xor %eax,%eax
>    3: 89 e5                   mov %esp,%ebp
>    5: 57                      push %edi
>    6: 56                      push %esi
>    7: 53                      push %ebx
>    8: 8b 55 08                mov 0x8(%ebp),%edx
>    b: 0f b6 0a                movzbl (%edx),%ecx
>    e: 84 c9                   test %cl,%cl
>   10: 74 50                   je 62
>   12: 83 c2 01                add $0x1,%edx
>   15: 85 c0                   test %eax,%eax
>   17: 75 23                   jne 3c
>   19: 8d b4 26 00 00 00 00    lea 0x0(%esi,%eiz,1),%esi
>   20: 0f b6 0a                movzbl (%edx),%ecx
>   23: 84 c9                   test %cl,%cl
>   25: 0f 95 c0                setne %al
>   28: 89 c7                   mov %eax,%edi
>   2a: b8 02 00 00 00          mov $0x2,%eax
>   2f: 89 fb                   mov %edi,%ebx
>   31: 83 c2 01                add $0x1,%edx
>   34: 84 db                   test %bl,%bl
>   36: 74 2a                   je 62
>   38: 85 c0                   test %eax,%eax
>   3a: 74 e4                   je 20
>   3c: 83 f8 02                cmp $0x2,%eax
>   3f: 74 1f                   je 60
>   41: 80 f9 2d                cmp $0x2d,%cl
>   44: 74 22                   je 68
>   46: 0f b6 0a                movzbl (%edx),%ecx
>   49: 83 f8 01                cmp $0x1,%eax
>   4c: 0f 95 c3                setne %bl
>   4f: 89 df                   mov %ebx,%edi
>   51: 84 c9                   test %cl,%cl
>   53: 0f 95 c3                setne %bl
>   56: 89 de                   mov %ebx,%esi
>   58: 21 f7                   and %esi,%edi
>   5a: eb d3                   jmp 2f
>   5c: 8d 74 26 00             lea 0x0(%esi,%eiz,1),%esi
>   60: b0 01                   mov $0x1,%al
>   62: 5b                      pop %ebx
>   63: 5e                      pop %esi
>   64: 5f                      pop %edi
>   65: 5d                      pop %ebp
>   66: c3                      ret
>   67: 90                      nop
>   68: b8 01 00 00 00          mov $0x1,%eax
>   6d: 5b                      pop %ebx
>   6e: 5e                      pop %esi
>   6f: 5f                      pop %edi
>   70: 5d                      pop %ebp
>   71: c3                      ret
>
> But with command line option "-O2 -fno-tree-pre", there are only 12
> instructions generated, and the code would be very clean like below,
>
> :
>    0: 55                      push %ebp
>    1: 31 c0                   xor %eax,%eax
>    3: 89 e5                   mov %esp,%ebp
>    5: 8b 55 08                mov 0x8(%ebp),%edx
>    8: 80 3a 00                cmpb $0x0,(%edx)
>    b: 74 0e                   je 1b
>    d: 80 7a 01 00             cmpb $0x0,0x1(%edx)
>   11: b0 02                   mov $0x2,%al
>   13: ba 01 00 00 00          mov $0x1,%edx
>   18: 0f 45 c2                cmovne %edx,%eax
>   1b: 5d                      pop %ebp
>   1c: c3                      ret
>
> Do you have any idea about this?

If you look at what PRE does it is clear that it's a profitable
transformation as it reduces the number of instructions computed on the
paths where s is set to a constant. If you ask for size optimization you
won't get this transformation.

Now - on the tree level we don't see the if-conversion opportunity (we
don't have a conditional move instruction), but still we could improve
phiopt to handle switch statements - not sure what we could do with the
case in question though which is

:
  switch (s_3) , case 0: , case 2: >

:
  goto ();

:
  if (D.2703_6 == 45)
    goto ;
  else
    goto ();

:
  # s_2 = PHI <2(4), 1(3), 1(6), s_3(5)>
:

as the condition in at L3 complicates things.

The code is also really really special (and written in an awful way ...).

Richard.

> Thanks,
> -Jiangning
>
>
>
>
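For readers following along: the "-fno-tree-pre" listing above amounts to a very small function. In the test case s only ever moves 0 -> 2 -> 1, so the default branch is never reached and the result depends only on the first two characters. The sketch below is an illustration, not part of the original report; f_simplified is a hypothetical name, and const is added to the parameter only so that string literals can be passed.

```c
/* The test case from the report, unchanged except for the added const. */
int f(const char *t)
{
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:
            s = 2;
            break;
        case 2:
            s = 1;
            break;
        default:
            /* Never reached: s is only ever 0, 2 or 1. */
            if (*t == '-')
                s = 1;
            break;
        }
        t++;
    }

    return s;
}

/* Hypothetical closed form matching the short listing:
   empty string -> 0, one character -> 2, anything longer -> 1. */
int f_simplified(const char *t)
{
    if (t[0] == '\0')
        return 0;
    return t[1] != '\0' ? 1 : 2;
}
```

Both functions should agree on any NUL-terminated input, which is one way to see why the long listing looks so wasteful for this particular loop.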
Re: Performance degradation on g++ 4.6
On Mon, Aug 1, 2011 at 8:43 PM, Oleg Smolsky wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>>
>> Profiling tools are your best friend here. If you don't have access to
>> any, the least you can do is to build the program with -pg option and
>> use gprof tool to find out differences.
>
> The test suite has a bunch of very basic C++ tests that are executed an
> enormous number of times. I've built one with the obvious performance
> degradation and attached the source, output and reports.
>
> Here are some highlights:
>     v4.1: Total absolute time for int8_t constant folding: 30.42 sec
>     v4.6: Total absolute time for int8_t constant folding: 43.32 sec
>
> Every one of the tests in this section had degraded... the first half more
> than the second. I am not sure how much further I can take this - the
> benchmarked code is very short and plain. I can post disassembly for one
> (some?) of them if anyone is willing to take a look...

I have a hard time actually seeing what expressions they try to fold
(argh, templates everywhere ...). One thing that changed between 4.1
and 4.6 is that we can no longer re-associate freely signed integers
because of undefined overflow concerns - which is a correctness issue.
Depending on the way the tests are written the folding in 4.1 was
probably a bug.

Richard.

> Thanks,
> Oleg.
>
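To make the re-association point concrete: for signed int, a + (b - c) and (a + b) - c are not interchangeable, because one grouping can overflow at an intermediate step while the other does not. The sketch below is only an illustration; the function names are hypothetical, and it relies on the __builtin_add_overflow / __builtin_sub_overflow builtins, which exist in GCC 5 and later (and in Clang), not in the 4.1 or 4.6 compilers discussed here.

```c
#include <limits.h>
#include <stdbool.h>

/* True if evaluating (a + b) - c overflows at some intermediate step. */
static bool left_grouping_overflows(int a, int b, int c)
{
    int t, r;
    return __builtin_add_overflow(a, b, &t) ||
           __builtin_sub_overflow(t, c, &r);
}

/* True if evaluating a + (b - c) overflows at some intermediate step. */
static bool right_grouping_overflows(int a, int b, int c)
{
    int t, r;
    return __builtin_sub_overflow(b, c, &t) ||
           __builtin_add_overflow(a, t, &r);
}
```

With a = INT_MAX, b = 1, c = 1 the left grouping overflows in a + b, while the right grouping computes INT_MAX + 0 without trouble. A compiler that rewrote one form into the other could therefore introduce undefined behaviour that the source did not have.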
Re: libgcc: strange optimization
On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson wrote:
> Hans-Peter Nilsson writes:
> > On Mon, 1 Aug 2011, Richard Henderson wrote:
> >
> > > On 08/01/2011 01:30 PM, Michael Walle wrote:
> > > > 1) function inlining
> > > > 2) deferred argument evaluation
> > > > 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> > > > function call to libgcc's __ashrsi3 (_in place_!)
> > > > 4) BAM! dead code elimination optimizes r8 assignment away because calli
> > > > may clobber r1-r10 (callee saved registers on lm32).
> > >
> > > I'm afraid the only solution I can think of is to force F1 out-of-line.
> >
> > Or another temporary - but the parameter should already have
> > that effect.
>
> It should, but doesn't. See PR48863 for similar breakage on ARM.

On GIMPLE we don't see either the libcall nor those "dependencies".

Don't use register vars.

Richard.

> /Mikael
>
Re: libgcc: strange optimization
Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson wrote:
>> Hans-Peter Nilsson writes:
>> > On Mon, 1 Aug 2011, Richard Henderson wrote:
>> >
>> > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>> > > > 1) function inlining
>> > > > 2) deferred argument evaluation
>> > > > 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
>> > > > function call to libgcc's __ashrsi3 (_in place_!)
>> > > > 4) BAM! dead code elimination optimizes r8 assignment away because calli
>> > > > may clobber r1-r10 (callee saved registers on lm32).
>> > >
>> > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>> >
>> > Or another temporary - but the parameter should already have
>> > that effect.
>>
>> It should, but doesn't. See PR48863 for similar breakage on ARM.
>
> On GIMPLE we don't see either the libcall nor those "dependencies".
>
> Don't use register vars.

IMO such code is supposed to work, e.g. in order to write an interface
to a non-ABI assembler function. In general this cannot be expressed
by means of constraints because that would imply plethora of different
constraints for each register/mode.

From the documentation a user will expect that he wrote correct code
and is not supposed to bother with GCC inerts like implicit library
calls or GIMPLE or whatever.

Johann

> Richard.
Re: libgcc: strange optimization
On Tue, Aug 2, 2011 at 12:01 PM, Georg-Johann Lay wrote:
> Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson wrote:
>>> Hans-Peter Nilsson writes:
>>> > On Mon, 1 Aug 2011, Richard Henderson wrote:
>>> >
>>> > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>>> > > > 1) function inlining
>>> > > > 2) deferred argument evaluation
>>> > > > 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
>>> > > > function call to libgcc's __ashrsi3 (_in place_!)
>>> > > > 4) BAM! dead code elimination optimizes r8 assignment away because calli
>>> > > > may clobber r1-r10 (callee saved registers on lm32).
>>> > >
>>> > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>>> >
>>> > Or another temporary - but the parameter should already have
>>> > that effect.
>>>
>>> It should, but doesn't. See PR48863 for similar breakage on ARM.
>>
>> On GIMPLE we don't see either the libcall nor those "dependencies".
>>
>> Don't use register vars.
>
> IMO such code is supposed to work, e.g. in order to write an interface
> to a non-ABI assembler function. In general this cannot be expressed
> by means of constraints because that would imply plethora of different
> constraints for each register/mode.
>
> From the documentation a user will expect that he wrote correct code
> and is not supposed to bother with GCC inerts like implicit library
> calls or GIMPLE or whatever.

Well. I suppose what is happening is that we expand from

  register int a1 __asm__ (*edi);
  register int a2 __asm__ (*eax);
  int D.2700;

:
  D.2700_2 = arg_1(D) >> 10;
  a1 = 10;
  a2 = D.2700_2;
  __asm__ __volatile__("scall" : : "r" a1, "r" a2);
  return;

and end up TERing D.2700_2 = arg_1(D) >> 10, materializing a
libcall after the setting of a1 and before the asm.

To confirm that try -fno-tree-ter.

I don't see how we can easily avoid this without exposing libcalls
at the gimple level. Maybe disable TER if we see any register variable
use.

Richard.

> Johann
>
>> Richard.
>
>
>
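The ordering problem described above can be sketched in source form. The example below is an assumption-laden illustration, not the lm32 code from the thread: it uses x86-64 register names (r8, r9) so that it builds on a common host, a harmless "lea" stands in for "scall", and fixed_reg_call is a hypothetical name. The idea is that anything which might turn into a call (on lm32, the shift) is computed into an ordinary temporary first, and the register variables are assigned immediately before the asm. Whether a given compiler preserves that order after TER is exactly what this thread is arguing about, so treat it as a pattern to aim for rather than a guarantee.

```c
/* Hypothetical stand-in for the syscall wrapper discussed in the thread. */
static long fixed_reg_call(long arg)
{
#if defined(__x86_64__)
    /* Evaluate the expression that may become a library call first... */
    long shifted = arg >> 10;

    /* ...then bind values to specific registers right before the asm. */
    register long a1 __asm__("r8") = 10;
    register long a2 __asm__("r9") = shifted;
    long sum;

    /* "lea" merely adds the two inputs; it replaces the real "scall". */
    __asm__ volatile("lea (%1,%2), %0" : "=r"(sum) : "r"(a1), "r"(a2));
    return sum;
#else
    /* Portable fallback so the sketch builds on other targets. */
    return 10 + (arg >> 10);
#endif
}
```

For example, fixed_reg_call(2048) adds 10 and 2048 >> 10 and returns 12.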
Re: libgcc: strange optimization
Hi,

> To confirm that try -fno-tree-ter.

"lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
assembly code:

f2:
    addi sp, sp, -4
    sw (sp+4), ra
    addi r2, r0, 10
    calli __ashrsi3
    addi r8, r0, 10
    scall
    lw ra, (sp+4)
    addi sp, sp, 4
    b ra

--
Michael
Re: libgcc: strange optimization
Michael Walle writes:
>
> Hi,
>
> > To confirm that try -fno-tree-ter.
>
> "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
> assembly code:
>
> f2:
>     addi sp, sp, -4
>     sw (sp+4), ra
>     addi r2, r0, 10
>     calli __ashrsi3
>     addi r8, r0, 10
>     scall
>     lw ra, (sp+4)
>     addi sp, sp, 4
>     b ra

-fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
Re: libgcc: strange optimization
On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson wrote:
> Michael Walle writes:
> >
> > Hi,
> >
> > > To confirm that try -fno-tree-ter.
> >
> > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
> > assembly code:
> >
> > f2:
> >     addi sp, sp, -4
> >     sw (sp+4), ra
> >     addi r2, r0, 10
> >     calli __ashrsi3
> >     addi r8, r0, 10
> >     scall
> >     lw ra, (sp+4)
> >     addi sp, sp, 4
> >     b ra
>
> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

It's of course only a workaround, not a real fix as nothing prevents
other optimizers from performing the re-scheduling TER does.

I suggest to amend the documentation for local call-clobbered register
variables to say that the only valid sequence using them is from a
non-inlinable function that contains only direct initializations of the
register variables from constants or parameters.

Or go one step further and deprecate local register variables altogether
(they IMHO don't make much sense, and rather the targets should provide
a way to properly constrain asm inputs and outputs).

Richard.
Re: libgcc: strange optimization
Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson wrote:
>> Michael Walle writes:
>> >
>> > Hi,
>> >
>> > > To confirm that try -fno-tree-ter.
>> >
>> > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>> > assembly code:
>> >
>> > f2:
>> >     addi sp, sp, -4
>> >     sw (sp+4), ra
>> >     addi r2, r0, 10
>> >     calli __ashrsi3
>> >     addi r8, r0, 10
>> >     scall
>> >     lw ra, (sp+4)
>> >     addi sp, sp, 4
>> >     b ra
>>
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
>
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
>
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

Strongly oppose.

Local register variables are very useful; maybe not on a linux machine
but on embedded systems there are situations you cannot do without.

You ever counted the constraint alternatives that would be needed?
You'd need different constraints for QI/HI/SI/DI for each register,
resulting in myriads of register classes, increasing register allocation
time, and dumps would become impossible to read. I once tried it for a
target...never again.

Besides that, with local register vars the developer can write code that
meets his requirements, whereas for constraints you will always have to
change the compiler and make existing sources incompatible.

If GCC provides such a feature it must work properly and not be
sacrificed for this or that optimization. Correct code is always
preferred over non-functional code.

Johann

>
> Richard.
>
Re: libgcc: strange optimization
On Tue, 2 Aug 2011, Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson wrote:
> > Michael Walle writes:
> > >
> > > Hi,
> > >
> > > > To confirm that try -fno-tree-ter.
> > >
> > > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
> > > assembly code:
> > >
> > > f2:
> > >     addi sp, sp, -4
> > >     sw (sp+4), ra
> > >     addi r2, r0, 10
> > >     calli __ashrsi3
> > >     addi r8, r0, 10
> > >     scall
> > >     lw ra, (sp+4)
> > >     addi sp, sp, 4
> > >     b ra
> >
> > -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
>
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.

I'd be ok with that, FWIW; I see the problem with keeping the
scheduling of operations in a working order (yuck) and I don't
see how else to keep it working ...except perhaps make gcc flag
functions with register asms as non-inlinable, maybe even flag
down any of the dangerous re-scheduling?

Maybe I can do that with some hand-holding?

> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

They do make sense when implementing e.g. system calls, and
they're documented to work as discussed. (I almost regret
making that happen, though.) Fortunately such functions are
small, and not relatively much helped by inlining (it's a
*syscall*; much more happening beyond the call than is affected
by inlining some parameter initialization). Sure, new targets
are much better off by implementing that through other means,
but preferably intrinsic functions to asms.

brgds, H-P
Revision 176335 (removal of #include in thr-posix.h) cause numerous compile failures
Revision 176335 removed the traditional "#include " from
gthr-posix.h. This breaks the build of many programs (Firefox, Chromium,
etc.) that implicitly rely on it.

I'm not sure that the gain is worth the pain in this case.

--
Markus
Re: libgcc: strange optimization
On Tue, Aug 2, 2011 at 2:53 PM, Hans-Peter Nilsson wrote:
> On Tue, 2 Aug 2011, Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson wrote:
>> > Michael Walle writes:
>> > >
>> > > Hi,
>> > >
>> > > > To confirm that try -fno-tree-ter.
>> > >
>> > > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>> > > assembly code:
>> > >
>> > > f2:
>> > >     addi sp, sp, -4
>> > >     sw (sp+4), ra
>> > >     addi r2, r0, 10
>> > >     calli __ashrsi3
>> > >     addi r8, r0, 10
>> > >     scall
>> > >     lw ra, (sp+4)
>> > >     addi sp, sp, 4
>> > >     b ra
>> >
>> > -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>>
>> It's of course only a workaround, not a real fix as nothing prevents
>> other optimizers from performing the re-scheduling TER does.
>>
>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>
> I'd be ok with that, FWIW; I see the problem with keeping the
> scheduling of operations in a working order (yuck) and I don't
> see how else to keep it working ...except perhaps make gcc flag
> functions with register asms as non-inlinable, maybe even flag
> down any of the dangerous re-scheduling?

But then can't people use a pure assembler stub instead? Without
inlining there isn't much benefit left from writing

void f1(int arg)
{
  register int a1 asm("r8") = 10;
  register int a2 asm("r1") = arg;

  asm("scall" : : "r"(a1), "r"(a2));
}

instead of

f1:
  mov r8, 10
  mov r1, rX
  scall
  ret

in a .s file no? I doubt much prologue/epilogue is needed.

Or even write

void f1(int arg)
{
  asm("mov r8, %0; mov r1 %1; scall;" : : "g"(a1), "g"(a2) : "r8", "r1");
}

which should be inlinable again (yes, in inlined form not optimally
register-allocated, but compared to the non-inline routine?).

Richard.

> Maybe I can do that with some hand-holding?
>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> They do make sense when implementing e.g. system calls, and
> they're documented to work as discussed. (I almost regret
> making that happen, though.) Fortunately such functions are
> small, and not relatively much helped by inlining (it's a
> *syscall*; much more happening beyond the call than is affected
> by inlining some parameter initialization). Sure, new targets
> are much better off by implementing that through other means,
> but preferably intrinsic functions to asms.
>
> brgds, H-P
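The last variant above, with the register moves written inside the asm and the fixed registers listed as clobbers, can be sketched in compilable form. This is an illustration under assumptions, not the lm32 code: it targets x86-64 so it builds on a common host, "lea" stands in for "scall", the name call_with_clobbers is hypothetical, and "rm" is used where the mail uses "g" so that no large immediate can end up in a mov.

```c
/* Hypothetical x86-64 analogue of the clobber-based f1 sketched above. */
static long call_with_clobbers(long arg)
{
#if defined(__x86_64__)
    long result;

    /* The asm itself loads r8 and r9; naming them as clobbers tells the
       compiler not to keep any operand or live value in those registers. */
    __asm__ volatile(
        "mov %1, %%r8\n\t"
        "mov %2, %%r9\n\t"
        "lea (%%r8,%%r9), %0"
        : "=r"(result)
        : "rm"(10L), "rm"(arg >> 10)
        : "r8", "r9");
    return result;
#else
    /* Portable fallback so the sketch builds on other targets. */
    return 10 + (arg >> 10);
#endif
}
```

The cost of this form is the one mentioned in the mail: the compiler has to place each input somewhere else first and the asm then copies it, so the inlined code carries extra moves compared with a well-allocated register variable.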
Re: Revision 176335 (removal of #include in thr-posix.h) cause numerous compile failures
On Tue, Aug 02, 2011 at 03:08:03PM +0200, Markus Trippelsdorf wrote:
> Revision 176335 removed the traditional "#include " from
> gthr-posix.h. This breaks the build of many programs (Firefox, Chromium,
> etc.) that implicitly rely on it.

This isn't the first time the libstdc++ headers were cleaned up, and each
time there are dozens of programs that need to be fixed up. Each time
they just were fixed.

> I'm not sure that the gain is worth the pain in this case.

It certainly is.

Jakub
Re: libgcc: strange optimization
On Tue, 2 Aug 2011, Richard Guenther wrote:
> > I'd be ok with that, FWIW; I see the problem with keeping the
> > scheduling of operations in a working order (yuck) and I don't
> > see how else to keep it working ...except perhaps make gcc flag
> > functions with register asms as non-inlinable, maybe even flag
> > down any of the dangerous re-scheduling?
>
> But then can't people use a pure assembler stub instead?

I see your point, but you're thinking new code, I'm thinking
let's keep existing code used by several targets and documented
as working the last seven years working. Maybe breakable with a
major release; in gcc-5. (Oh no, I see what's coming. :)

brgds, H-P
Re: libgcc: strange optimization
Richard Guenther writes:

> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

No, local register variables are documented as working and many programs
rely on them. They are a straightforward way to get an asm argument in
a specific register, and I don't see any reason to break that.

> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.

Let's just implement those requirements in the compiler itself.

Ian
Re: libgcc: strange optimization
On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor wrote:
> Richard Guenther writes:
>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> No, local register variables are documented as working and many programs
> rely on them. They are a straightforward way to get an asm argument in
> a specific register, and I don't see any reason to break that.

Well, maybe they look like so. But in reality there is _no_ connection
from the register setup to the actual asm. Which is the problem the
compiler faces here (apart from the libcall issue). If there should be
an implicit dependence of all asms to all local register var setters
and users then this isn't implemented on gimple (or rather it works
by chance there as we treat register vars as memory and do not
disambiguate anything across asms (yet)).

>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>
> Let's just implement those requirements in the compiler itself.

Doesn't work for existing code, no? And if thinking new code then
I'd rather have explicit dependences (and a way to represent them).
Thus, for example

  asm ("scall" : : "asm("r0")" (10), ...)

thus, why force new constraints when we already can figure out
local register vars by register name? Why not extend the constraint
syntax somehow to allow specifying the same effect?

Richard.

> Ian
>
Re: Revision 176335 (removal of #include in thr-posix.h) cause numerous compile failures
On 2 August 2011 14:08, Markus Trippelsdorf wrote:
> Revision 176335 removed the traditional "#include " from
> gthr-posix.h. This breaks the build of many programs (Firefox, Chromium,
> etc.) that implicitly rely on it.
> I'm not sure that the gain is worth the pain in this case.

The "pain" fixes a real bug in libstdc++ which caused g++ to reject
valid programs. Not rejecting valid programs is more important than
accepting invalid ones, especially ones that can be easily fixed by
adding a missing header.
Re: libgcc: strange optimization
Richard Guenther writes:

> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor wrote:
>> Richard Guenther writes:
>>
>>> Or go one step further and deprecate local register variables altogether
>>> (they IMHO don't make much sense, and rather the targets should provide
>>> a way to properly constrain asm inputs and outputs).
>>
>> No, local register variables are documented as working and many programs
>> rely on them. They are a straightforward way to get an asm argument in
>> a specific register, and I don't see any reason to break that.
>
> Well, maybe they look like so. But in reality there is _no_ connection
> from the register setup to the actual asm. Which is the problem the
> compiler faces here (apart from the libcall issue). If there should be
> an implicit dependence of all asms to all local register var setters
> and users then this isn't implemented on gimple (or rather it works
> by chance there as we treat register vars as memory and do not
> disambiguate anything across asms (yet)).

I'm not sure why we need to do anything at the GIMPLE level other than
disable some optimizations. There is a connection from the register
variable to the asm--the asm refers to the variable. There is nothing
specific about the register in there, but at the GIMPLE level there
doesn't have to be.

We should not break a useful existing feature because we find it
inconvenient. Let's just disable some optimizations so that it
continues to work.

>>> I suggest to amend the documentation for local call-clobbered register
>>> variables to say that the only valid sequence using them is from a
>>> non-inlinable function that contains only direct initializations of the
>>> register variables from constants or parameters.
>>
>> Let's just implement those requirements in the compiler itself.
>
> Doesn't work for existing code, no?

Why not?

> And if thinking new code then
> I'd rather have explicit dependences (and a way to represent them).
> Thus, for example
>
> asm ("scall" : : "asm("r0")" (10), ...)
>
> thus, why force new constraints when we already can figure out
> local register vars by register name? Why not extend the constraint
> syntax somehow to allow specifying the same effect?

I agree that it would be a good idea to permit asms to indicate the
specific register the operand should go into.

Ian
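As background to the idea of naming the register in the asm operand itself: some targets already allow this for a handful of registers through single-register machine constraints. On x86, for instance, the constraint "a" means the a register (rax/eax). The sketch below assumes x86-64 Linux and uses the hypothetical name raw_getpid; it issues the getpid system call with no local register variable at all. It is offered only as an illustration of what such constraints look like, not as the syntax proposed in the thread.

```c
#include <unistd.h>
#if defined(__x86_64__) && defined(__linux__)
#include <sys/syscall.h>
#endif

/* Hypothetical helper: getpid via a raw syscall on x86-64 Linux. */
static long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;

    /* "=a" places the result in rax; "0" puts the syscall number in the
       same register as operand 0. The syscall instruction itself
       overwrites rcx and r11, so they are listed as clobbers. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "0"((long)SYS_getpid)
                     : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback for other targets: use the library wrapper. */
    return (long)getpid();
#endif
}
```

Johann's objection elsewhere in the thread is that providing such a constraint for every register and mode on every target would multiply register classes, which is why register-rich embedded targets tend to rely on local register variables instead.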
Re: libgcc: strange optimization
On 08/02/2011 05:22 AM, Richard Guenther wrote:
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
>
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
>
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

Neither of these is a viable option.

What we might be able to do is throttle TER when the destination
is a local register variable. This should unbreak the common case
of local regs immediately surrounding an asm.

r~
Re: libgcc: strange optimization
Richard Guenther wrote:
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
>
> Richard.

That's completely counterproductive.

If a developer invents asm or local register variables he has a very good
reason for that choice like to meet hard (with hard as in HARD) real time
constraints.

Disabling inlining a function that uses local register vars would make many
places of local register vars unusable because there would no more be a way
to write down the exact register usage footprint of a piece of asm.

Typical use-cases are interfacing to a function that has a smaller register
footprint than an ordinary black-box function or doing some arithmetic that
needs special hard regs for which there are no fitting constraints.

All this will be impossible if inlining is disabled for such functions; then
it is no more possible to describe such a low-overhead piece of code without
calling a black box function, clobber all call-clobbered registers, render a
maybe tail-function into a non tail-call function etc.

Embedded systems like ARM or PowerPC based get more and more important over
the years and GCC should put more attention to them and their needs; not
only to bolide servers/PCs. This includes systems with hard real time
constraints.

Johann
Problem in bootstrapping trunk - error in stage 2 -mnolzcnt command line option.
What am I doing wrong:

configure:3247: checking for suffix of object files
configure:3269: /home/toon/compilers/obj-t/./gcc/xgcc
-B/home/toon/compilers/obj-t/./gcc/ -B/tmp/lto/x86_64-unknown-linux-gnu/bin/
-B/tmp/lto/x86_64-unknown-linux-gnu/lib/ -isystem
/tmp/lto/x86_64-unknown-linux-gnu/include -isystem
/tmp/lto/x86_64-unknown-linux-gnu/sys-include -c -g -O2 conftest.c >&5
cc1: error: unrecognized command line option '-mnolzcnt'

The configure line is:

../gcc/configure --prefix=/tmp/lto --enable-languages=c++
--with-build-config=bootstrap-lto --with-gnu-ld --disable-multilib
--disable-nls --with-arch=native --with-tune=native && \

Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: Problem in bootstrapping trunk - error in stage 2 -mnolzcnt command line option.
On Tue, Aug 2, 2011 at 11:24 AM, Toon Moene wrote:
> What am I doing wrong:
>
> configure:3247: checking for suffix of object files
> configure:3269: /home/toon/compilers/obj-t/./gcc/xgcc
> -B/home/toon/compilers/obj-t/./gcc/ -B/tmp/lto/x86_64-unknown-linux-gnu/bin/
> -B/tmp/lto/x86_64-unknown-linux-gnu/lib/ -isystem
> /tmp/lto/x86_64-unknown-linux-gnu/include -isystem
> /tmp/lto/x86_64-unknown-linux-gnu/sys-include -c -g -O2 conftest.c >&5
> cc1: error: unrecognized command line option '-mnolzcnt'
>
> The configure line is:
>
> ../gcc/configure --prefix=/tmp/lto --enable-languages=c++
> --with-build-config=bootstrap-lto --with-gnu-ld --disable-multilib
> --disable-nls --with-arch=native --with-tune=native && \
>

I checked in this as an obvious fix.

Thanks.

--
H.J.
---
Index: ChangeLog
===
--- ChangeLog (revision 177203)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2011-08-02 H.J. Lu
+
+ * config/i386/driver-i386.c (host_detect_local_cpu): Fix a typo.
+
 2011-08-02 Richard Henderson

 PR target/49878
Index: config/i386/driver-i386.c
===
--- config/i386/driver-i386.c (revision 177203)
+++ config/i386/driver-i386.c (working copy)
@@ -718,7 +718,7 @@ const char *host_detect_local_cpu (int a
   const char *avx = has_avx ? " -mavx" : " -mno-avx";
   const char *sse4_2 = has_sse4_2 ? " -msse4.2" : " -mno-sse4.2";
   const char *sse4_1 = has_sse4_1 ? " -msse4.1" : " -mno-sse4.1";
-  const char *lzcnt = has_lzcnt ? " -mlzcnt" : " -mnolzcnt";
+  const char *lzcnt = has_lzcnt ? " -mlzcnt" : " -mno-lzcnt";

   options = concat (options, cx16, sahf, movbe, ase, pclmul,
                     popcnt, abm, lwp, fma, fma4, xop, bmi, tbm,
Re: libgcc: strange optimization
On Tue, Aug 2, 2011 at 6:02 PM, Richard Henderson wrote:
> On 08/02/2011 05:22 AM, Richard Guenther wrote:
>>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>>
>> It's of course only a workaround, not a real fix as nothing prevents
>> other optimizers from performing the re-scheduling TER does.
>>
>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> Neither of these is a viable option.
>
> What we might be able to do is throttle TER when the destination
> is a local register variable. This should unbreak the common case
> of local regs immediately surrounding an asm.

Sure, similar to disabling TER for functions containing such vars. But
it isn't a solution for the general issue that nothing prevents
scheduling gimple statements between register variable def and use.

Richard.

>
> r~
>
ANN: gcc-python-plugin 0.6
gcc-python-plugin is a plugin for GCC 4.6 onwards which embeds the CPython
interpreter within GCC, allowing you to write new compiler warnings in
Python, generate code visualizations, etc.

Tarball releases are available at:
  https://fedorahosted.org/releases/g/c/gcc-python-plugin/

Prebuilt-documentation can be seen at:
  http://readthedocs.org/docs/gcc-python-plugin/en/latest/index.html

Project homepage:
  https://fedorahosted.org/gcc-python-plugin/

What's new in v0.6:

  - It's now possible to create new gcc passes in Python, rather than just
    wire up callbacks to existing passes. See:
    http://readthedocs.org/docs/gcc-python-plugin/en/latest/passes.html#creating-new-optimization-passes

  - GCC's callgraph information is now visible via a Python API. See:
    http://readthedocs.org/docs/gcc-python-plugin/en/latest/callgraph.html

  - GCC's Register Transfer Language is now visible via a Python API
    (albeit a rather primitive one so far).

  - Improvements to Python 3 support (the selftests are now built and run
    against the version of Python that the plugin was compiled against,
    rather than hardcoding Python 2.7; various Python 3 and --with-py-debug
    compatibility issues that this shook out are now fixed).

  - Various other minor fixes

Enjoy!
Dave
gcc-4.4-20110802 is now available
Snapshot gcc-4.4-20110802 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20110802/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 177219

You'll find:

 gcc-4.4-20110802.tar.bz2   Complete GCC

  MD5=5f0ee4e2d2e6d516442e1bf200694d38
  SHA1=d67e299b15edb6f7f90e95bdd172c83c269aa3bb

Diffs from 4.4-20110726 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: libgcc: strange optimization
Richard Guenther writes:
> But then can't people use a pure assembler stub instead? Without
> inlining there isn't much benefit left from writing
>
> void f1(int arg)
> {
>   register int a1 asm("r8") = 10;
>   register int a2 asm("r1") = arg;
>
>   asm("scall" : : "r"(a1), "r"(a2));
> }
>
> instead of
>
> f1:
>   mov r8, 10
>   mov r1, rX
>   scall
>   ret
>
> in a .s file no? I doubt much prologue/epilogue is needed.
>
> Or even write
>
> void f1(int arg)
> {
>   asm("mov r8, %0; mov r1 %1; scall;" : : "g"(a1), "g"(a2) : "r8", "r1");
> }

Of course in practice people _do_ want to use it with f1 inlined, where
using reg variables (or alternatively, some expanded constraint language
for the asm parameters) can really get rid of tons of unnecessary asm
moves, and they want the compiler to guard against conflicts.

-Miles

--
"Whatever you do will be insignificant, but it is very important that
you do it." Mahatma Gandhi