Re: [RFC] timers, pointers to functions and type safety
On Fri, 2006-12-01 at 17:21 +, Al Viro wrote: > Now, there's another question: how do we get there? Or, at least, from > current void (*)(unsigned long) to void (*)(void *)... I think the real solution should be void (*function)(struct timer_list *timer); and hand the timer itself to the callback. Most of the timers are embedded into data structures anyway and for the rest we can easily build one. > "A fscking huge patch flipping everything at once" is obviously not an > answer; too much PITA merging and impossible to review. There are ~ 500 files affected and this is in the range of cleanups we did recently at the end of the merge window already. I'd volunteer to hack this up and keep the patch up to date until the final merge. I have done that before and I'm not scared about it. The patches are a couple of lines per file and I do not agree that this is impossible to review. tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Thomas Gleixner wrote: Can the GCC folks please shed some light on this: standard function start: push %ebp mov%esp, %ebp call mcount modified function start on a handful of functions only seen with gcc 4.4.x on x86 32 bit: push %edi lea0x8(%esp),%edi and$0xfff0,%esp pushl -0x4(%edi) push %ebp mov%esp,%ebp ... call mcount This modification leads to a hard to solve problem in the kernel function graph tracer which assumes that the stack looks like: return address saved ebp With the modified function start sequence this is not longer true and the manipulation of the return address on the stack fails silently. Neither gcc 4.3 nor gcc 3.4 are generating such function frames, so it looks like a gcc 4.4.x feature. There is no real obvious reason why the edi magic needs to be done _before_ push %ebp mov%esp,%ebp Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Andrew Haley wrote: > Thomas Gleixner wrote: > > There is no real obvious reason why the edi magic needs to be done > > _before_ > > > > push %ebp > > mov%esp,%ebp > > Sure there is: unless you do the adjustment first %ebp won't be 16-aligned. And why is this not done in 99% of the functions in the kernel, just in this one and some random others ? > We're aligning the stack properly, as per the ABI requirements. Can't > you just fix the tracer? Where is that ABI requirement that push %ebp needs to happen on an aligned stack ? And why is this something GCC did not care about until GCC4.4 ? Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Andrew Haley wrote: > Thomas Gleixner wrote: > > On Thu, 19 Nov 2009, Andrew Haley wrote: > >> Thomas Gleixner wrote: > >>> There is no real obvious reason why the edi magic needs to be done > >>> _before_ > >>> > >>> push %ebp > >>> mov%esp,%ebp > >> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned. > > > > And why is this not done in 99% of the functions in the kernel, just > > in this one and some random others ? > > If I could see the function I might be able to tell you. It's either a > performance enhancement, something to do with SSE, or it's a bug. kernel/time/timer_stats.c timer_stats_update_stats() Here is the disassembly: 8107ad50 : 8107ad50: 57 push %edi 8107ad51: 8d 7c 24 08 lea0x8(%esp),%edi 8107ad55: 83 e4 f0and$0xfff0,%esp 8107ad58: ff 77 fcpushl -0x4(%edi) 8107ad5b: 55 push %ebp 8107ad5c: 89 e5 mov%esp,%ebp 8107ad5e: 57 push %edi 8107ad5f: 56 push %esi 8107ad60: 53 push %ebx 8107ad61: 83 ec 6csub$0x6c,%esp 8107ad64: e8 47 92 f8 ff call 81003fb0 8107ad69: 8b 77 04mov0x4(%edi),%esi 8107ad6c: 89 75 a4mov%esi,-0x5c(%ebp) 8107ad6f: 65 8b 35 14 00 00 00mov%gs:0x14,%esi 8107ad76: 89 75 e4mov%esi,-0x1c(%ebp) 8107ad79: 31 f6 xor%esi,%esi 8107ad7b: 8b 35 60 5a cd 81 mov0x81cd5a60,%esi 8107ad81: 8b 1f mov(%edi),%ebx 8107ad83: 85 f6 test %esi,%esi 8107ad85: 8b 7f 08mov0x8(%edi),%edi 8107ad88: 75 18 jne8107ada2 8107ad8a: 8b 45 e4mov-0x1c(%ebp),%eax 8107ad8d: 65 33 05 14 00 00 00xor%gs:0x14,%eax 8107ad94: 75 53 jne8107ade9 8107ad96: 83 c4 6cadd$0x6c,%esp 8107ad99: 5b pop%ebx 8107ad9a: 5e pop%esi 8107ad9b: 5f pop%edi 8107ad9c: 5d pop%ebp 8107ad9d: 8d 67 f8lea-0x8(%edi),%esp 8107ada0: 5f pop%edi 8107ada1: c3 ret 8107ada2: be 00 7a d6 81 mov$0x81d67a00,%esi 8107ada7: 89 45 acmov%eax,-0x54(%ebp) 8107adaa: 89 75 a0mov%esi,-0x60(%ebp) 8107adad: 89 5d b4mov%ebx,-0x4c(%ebp) 8107adb0: 64 8b 35 78 6a d6 81mov%fs:0x81d66a78,%esi 8107adb7: 8b 34 b5 20 50 cd 81mov-0x7e32afe0(,%esi,4),%esi 8107adbe: 89 4d b0mov%ecx,-0x50(%ebp) 8107adc1: 01 75 a0add%esi,-0x60(%ebp) 8107adc4: 89 55 b8mov%edx,-0x48(%ebp) 8107adc7: 8b 45 a0mov-0x60(%ebp),%eax 8107adca: 89 7d c0mov%edi,-0x40(%ebp) 8107adcd: e8 de f7 76 00 call 817ea5b0 <_spin_lock_irqsave> 8107add2: 83 3d 60 5a cd 81 00cmpl $0x0,0x81cd5a60 8107add9: 89 c3 mov%eax,%ebx 8107addb: 75 11 jne8107adee 8107addd: 89 da mov%ebx,%edx 8107addf: 8b 45 a0mov-0x60(%ebp),%eax 8107ade2: e8 79 fc 76 00 call 817eaa60 <_spin_unlock_irqrestore> 8107ade7: eb a1 jmp8107ad8a 8107ade9: e8 52 e4 fc ff call 81049240 <__stack_chk_fail> 8107adee: 8d 45 a8lea-0x58(%ebp),%eax 8107adf1: 8b 55 a4mov-0x5c(%ebp),%edx 8107adf4: e8 f7 fd ff ff call 8107abf0 8107adf9: 85 c0 test %eax,%eax 8107adfb: 74 05 je 8107ae02 8107adfd: ff 40 14incl 0x14(%eax) 8107ae00: eb db jmp8107addd 8107ae02: f0 ff 05 00 67 fd 81lock incl 0x81fd6700 8107ae09: eb d2 jmp8107addd 8107ae0b: 90 nop 8107ae0c: 90 nop 8107ae0d: 90 nop 8107ae0e: 90 nop 8107ae0f: 90 nop There is a dozen more of those. Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Linus Torvalds wrote: > Umm. But it still does, doesn't it? That > > pushl -0x4(%edi) > push %ebp > > should do it - the "-0x4(%edi)" thing seems to be trying to reload the > return address. No? > > Maybe I misread the code - but regardless, it does look like a gcc code > generation bug if only because we really don't want a 16-byte aligned > stack anyway, and have asked for it to not be done. > > So I agree that gcc shouldn't do that crazy prologue (and certainly _not_ > before calling mcount anyway), but I'm not sure I agree with that detail > of your analysis or explanation. Yes, it does store the return address before the pushed ebp, but this is a copy of the real stack entry which is before the pushed edi. The function graph tracer needs to redirect the return into the tracer and it therefor saves the real return address and modifies the stack so the return ends up in the tracer code which then goes back to the real return address. But in this prologue/aligment case we modify the copy and not the real return address on the stack, so we return without calling into the tracer which is causing the headache because the state of the tracer becomes confused. Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Richard Guenther wrote: > Note that I only can reproduce the issue with > -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2. > And > you didn't provide us with a testcase either ... so please open > a bugzilla and attach preprocessed source of a file that > shows the problem, note the function it happens in and provide > the command-line options you used for building. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109 Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Andrew Haley wrote: > OK, I found it. There is a struct defined as > > struct entry { > ... > } __attribute__((__aligned__((1 << (4); > > and then in timer_stats_update_stats you have a local variable of type > struct entry: > > void timer_stats_update_stats() > { > spinlock_t *lock; > struct entry *entry, input; > > So, gcc has to 16-align the stack pointer to satisfy the alignment > for struct entry. This does not explain why GCC < 4.4.x actually puts push %ebp mov %esp, %ebp first and why GCC 4.4.x decides to create an extra copy of the return address instead of just keeping the mcount stack magic right at the function entry. Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Linus Torvalds wrote: > > I bet other people than just the kernel use the mcount hook for subtler > > things than just doing profiles. And even if they don't, the quoted code > > generation is just crazy _crap_. > > For the kernel, if the only case is that timer_stat.c thing that Thomas > pointed at, I guess we can at least work around it with something like the > appended. The kernel code is certainly ugly too, no question about that. > > It's just that we'd like to be able to depend on mcount code generation > not being insane even in the presense of ugly code.. > > The alternative would be to have some warning when this happens, so that > we can at least see it. "mcount won't work reliably" There are at least 20 other random functions which have the same problem. Have not looked at the details yet. Just compiled with -mincoming-stack-boundary=4 and the problem goes away as gcc now thinks that the incoming stack is already 16 byte aligned. But that might break code which actually uses SSE Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Linus Torvalds wrote: > On Thu, 19 Nov 2009, Richard Guenther wrote: > > > > Note that I only can reproduce the issue with > > -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2. > > Since you can reproduce it with -mincoming-stack-boundary=2, I woul > suggest just fixing mcount handling that way regardless of anything else. > The current code generated by gcc is just insane - even for the case where > you _want_ 16-byte stack alignment. > > Instead crazy code like > > > push %edi > > lea0x8(%esp),%edi > > and$0xfff0,%esp > > pushl -0x4(%edi) > > push %ebp > > mov%esp,%ebp > > ... > > call mcount > > the sane thing to do would be to just do it as > > push %ebp > mov%esp,%ebp > call mcount > and$0xfff0,%esp which is what the 64bit compile does except that the mcount call happens a bit later which is fine. 8107cd34 : 8107cd34: 55 push %rbp 8107cd35: 48 89 e5mov%rsp,%rbp 8107cd38: 48 83 e4 c0 and$0xffc0,%rsp Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Jeff Law wrote: > On 11/19/09 15:43, Steven Rostedt wrote: > > On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote: > > > > > > > Having said all that, I don't expect to personally be looking at the > > > problem, given the list of other codegen issues that need to be looked > > > at (reload in particular), profiling/stack interactions would be around > > > 87 millionth on my list. > > > > > Is there someone else that can look at it? > > > > > Unsure at the moment... Like everyone else, GCC developers are busy and this > probably isn't going to be a high priority item for anyone. > > > > Or at the very least, could you point us to where that code is, and one > > of us tracing folks could take a crack at switching hats to be a > > compiler writer (with the obvious prerequisite of drinking a lot of beer > > first, or is there a better drug to cope with the pain of writing gcc?). > > > It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in > gcc-gcc/config/i386/linux.h & rebuilding GCC. > > Based on comments elsewhere, the sun386i support may have used > PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not need > further adjustment. That is obviously the ideal case. > > If that appears to work for your needs, I'll volunteer to test it more > thoroughly and assuming those tests look good shepherd it into the source > tree. We definitely want to see that ASAP. While testing various kernel configs we found out that the problem comes and goes. Finally I started to compare the gcc command line options and after some fiddling it turned out that the following minimal deltas change the code generator behaviour: Bad: -march=pentium-mmx-Wa,-mtune=generic32 Good: -march=i686-mtune=generic -Wa,-mtune=generic32 Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32 I'm not supposed to understand the logic behind that, right ? Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Thu, 19 Nov 2009, Linus Torvalds wrote: > On Fri, 20 Nov 2009, Thomas Gleixner wrote: > > > > While testing various kernel configs we found out that the problem > > comes and goes. Finally I started to compare the gcc command line > > options and after some fiddling it turned out that the following > > minimal deltas change the code generator behaviour: > > > > Bad: -march=pentium-mmx-Wa,-mtune=generic32 > > Good: -march=i686-mtune=generic -Wa,-mtune=generic32 > > Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32 > > > > I'm not supposed to understand the logic behind that, right ? > > Are you sure it's just the compiler flags? I first captured the command line with V=1 and created a script of it. Then I changed the -march -mtune options in that script and compiled just that single file manually w/o changing .config or invoking the kernel make magic. The good ones produce: 650: 55 push %ebp 651: 89 e5 mov%esp,%ebp 653: 83 e4 f0and$0xfff0,%esp The bad one: 05f0 : 5f0: 57 push %edi 5f1: 8d 7c 24 08 lea0x8(%esp),%edi 5f5: 83 e4 f0and$0xfff0,%esp 5f8: ff 77 fcpushl -0x4(%edi) 5fb: 55 push %ebp 5fc: 89 e5 mov%esp,%ebp > There's another configuration portion: the size of the alignment itself. > That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel > config CONFIG_X86_L1_CACHE_SHIFT. > > Maybe that value matters too - for example maybe gcc will not try to align > the stack if it's big? That does not change any of the compiler options, but yes it could have some effect via the various include magics, but all I have seen so far is linkage.h which should not affect the compiler. And the manual compile did not change any of this. > [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT > totally unrelated numbers? Very confusing. ] Agreed. > The compiler flags we use are tied to some of the same choices that choose > the cache shift, so the correlation you found while debugging this would > still hold. Digging further tomorrow when my brain is more awake. Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Fri, 20 Nov 2009, Thomas Gleixner wrote: > On Thu, 19 Nov 2009, Linus Torvalds wrote: > > On Fri, 20 Nov 2009, Thomas Gleixner wrote: > > > > > > While testing various kernel configs we found out that the problem > > > comes and goes. Finally I started to compare the gcc command line > > > options and after some fiddling it turned out that the following > > > minimal deltas change the code generator behaviour: > > > > > > Bad: -march=pentium-mmx-Wa,-mtune=generic32 > > > Good: -march=i686-mtune=generic -Wa,-mtune=generic32 > > > Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32 Found some more: Bad: -march=k6 -Wa,-mtune=generic32 Bad: -march=geode-Wa,-mtune=generic32 Bad: -march=c3 -Wa,-mtune=generic32 That seems every thing which has MMX support but no SSE and is somehow compatible to the pentium-mmx. Looks like the code generator optimization for those was done after consuming the secret gcc-shrooms. Thanks, tglx
Re: BUG: GCC-4.4.x changes the function frame on some functions
On Mon, 23 Nov 2009, Jakub Jelinek wrote: > On Thu, Nov 19, 2009 at 08:01:57PM +0100, Thomas Gleixner wrote: > > Just compiled with -mincoming-stack-boundary=4 and the problem goes > > away as gcc now thinks that the incoming stack is already 16 byte > > aligned. But that might break code which actually uses SSE > > Please don't do this, lying to the compiler is just going to result in > wrong-code sooner or later, with the above switch gcc will assume the > incoming stack is 16-byte aligned (which is not true in the ix86 kernel) > and could very well e.g. optimize away code that looks at > alignment of stack variables etc. Right. I gave up the idea pretty fast. But in the current situation we are forced to lie to the compiler in some way. Forcing -mtune=generic when the function graph tracer is enabled seems to be a halfways sane work around. Thanks, tglx
Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue
On Tue, 24 Nov 2009, Andrew Haley wrote: > H.J. Lu wrote: > > On Sun, Nov 22, 2009 at 9:20 AM, Andrew Haley wrote: > >> H.J. Lu wrote: > >>> On Fri, Nov 20, 2009 at 11:35 AM, Andrew Haley wrote: > Steven Rostedt wrote: > > Ingo, Thomas and Linus, > > > > I know Thomas did a patch to force the -mtune=generic, but just in case > > gcc decides to do something crazy again, this patch will catch it. > > > > Should we try to get this in now? > I'm sure this makes sense, but a gcc test case would be even better. > If this can be detected in the gcc test suite it'll be found and > fixed long before y'all in kernel land get to see it. That's the > only way to guarantee this never bothers you again. > > H.J., who wrote the code in question, is hopefully looking at why > this odd code is being generated. Once he's done I can put a > suitable test case in the gcc test suite. > > >>> See: > >>> > >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109#c7 > >> I saw that, but does it mean you're going to investigate? There is > >> no obvious reason why -mtune=generic should affect code generation > >> in this way, but it does. > > > > Why not, there is > > > > static const unsigned int x86_accumulate_outgoing_args > > = m_AMD_MULTIPLE | m_ATOM | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 > > | m_GENERIC; > > > > -mtune=generic turns on -maccumulate-outgoing-args. > > Alright, so let's at least try to give the kernel people the information > that they need. > > What you're saying is, to avoid this: > > 05f0 : > 5f0: 57 push %edi > 5f1: 8d 7c 24 08 lea0x8(%esp),%edi > 5f5: 83 e4 f0and$0xfff0,%esp > 5f8: ff 77 fcpushl -0x4(%edi) > 5fb: 55 push %ebp > 5fc: 89 e5 mov%esp,%ebp > > you should compile your code with -maccumulate-outgoing-args, and there's > no need to use -mtune=generic. Is that right? Seems to work. What other side effects has that ? Thanks, tglx
Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue
On Tue, 24 Nov 2009, Jakub Jelinek wrote: > On Tue, Nov 24, 2009 at 03:55:49PM +0100, Thomas Gleixner wrote: > > > you should compile your code with -maccumulate-outgoing-args, and there's > > > no need to use -mtune=generic. Is that right? > > > > Seems to work. What other side effects has that ? > > Faster code, significant increase in code size though. Note that on many > architectures it is the only supported model. Just checked on the affected -marchs. The increase in code size is about 3% which is not that bad and definitely acceptable for the tracing case. Will zap the -mtune=generic patch and use -maccumulate-outgoing-args instead. Thanks, tglx
Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue
On Wed, 25 Nov 2009, Ingo Molnar wrote: > * Thomas Gleixner wrote: > > > On Tue, 24 Nov 2009, Jakub Jelinek wrote: > > > > > On Tue, Nov 24, 2009 at 03:55:49PM +0100, Thomas Gleixner wrote: > > > > > you should compile your code with -maccumulate-outgoing-args, and > > > > > there's > > > > > no need to use -mtune=generic. Is that right? > > > > > > > > Seems to work. What other side effects has that ? > > > > > > Faster code, significant increase in code size though. Note that on many > > > architectures it is the only supported model. > > > > Just checked on the affected -marchs. The increase in code size is > > about 3% which is not that bad and definitely acceptable for the > > tracing case. Will zap the -mtune=generic patch and use > > -maccumulate-outgoing-args instead. > > hm, 3% sounds quite large :( dyn-ftrace is enabled in distro configs, so > 3% is a big deal IMO. Distro-configs have -mtune=generic anyway. So it's not changing anything for them. I'm talking about the -march flags which result in that weird code (pentium-mmx ). Thanks, tglx