Re: [RFC] timers, pointers to functions and type safety

2006-12-02 Thread Thomas Gleixner
On Fri, 2006-12-01 at 17:21 +, Al Viro wrote:
> Now, there's another question: how do we get there?  Or, at least, from
> current void (*)(unsigned long) to void (*)(void *)...

I think the real solution should be 

void (*function)(struct timer_list *timer);

and hand the timer itself to the callback. Most of the timers are
embedded into data structures anyway and for the rest we can easily
build one.

> "A fscking huge patch flipping everything at once" is obviously not an
> answer; too much PITA merging and impossible to review.

There are ~ 500 files affected and this is in the range of cleanups we
did recently at the end of the merge window already. I'd volunteer to
hack this up and keep the patch up to date until the final merge. I have
done that before and I'm not scared about it. The patches are a couple
of lines per file and I do not agree that this is impossible to review. 

tglx




Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Thomas Gleixner wrote:

Can the GCC folks please shed some light on this:

standard function start:

 push   %ebp
 mov%esp, %ebp
 
 call   mcount

modified function start on a handful of functions only seen with gcc
4.4.x on x86 32 bit:

push   %edi
lea0x8(%esp),%edi
and$0xfff0,%esp
pushl  -0x4(%edi)
push   %ebp
mov%esp,%ebp
...
call   mcount

This modification leads to a hard to solve problem in the kernel
function graph tracer which assumes that the stack looks like:

   return address
   saved  ebp

With the modified function start sequence this is not longer true and
the manipulation of the return address on the stack fails silently.

Neither gcc 4.3 nor gcc 3.4 are generating such function frames, so it
looks like a gcc 4.4.x feature.

There is no real obvious reason why the edi magic needs to be done
_before_ 

push   %ebp
mov%esp,%ebp

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Andrew Haley wrote:
> Thomas Gleixner wrote:
> > There is no real obvious reason why the edi magic needs to be done
> > _before_ 
> > 
> > push   %ebp
> > mov%esp,%ebp
> 
> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.

And why is this not done in 99% of the functions in the kernel, just
in this one and some random others ?
 
> We're aligning the stack properly, as per the ABI requirements.  Can't
> you just fix the tracer?

Where is that ABI requirement that

push   %ebp

needs to happen on an aligned stack ? 

And why is this something GCC did not care about until GCC4.4 ?

Thanks,

tglx




Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Andrew Haley wrote:

> Thomas Gleixner wrote:
> > On Thu, 19 Nov 2009, Andrew Haley wrote:
> >> Thomas Gleixner wrote:
> >>> There is no real obvious reason why the edi magic needs to be done
> >>> _before_ 
> >>>
> >>>   push   %ebp
> >>>   mov%esp,%ebp
> >> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
> > 
> > And why is this not done in 99% of the functions in the kernel, just
> > in this one and some random others ?
> 
> If I could see the function I might be able to tell you.  It's either a
> performance enhancement, something to do with SSE, or it's a bug.

kernel/time/timer_stats.c timer_stats_update_stats()

Here is the disassembly:

8107ad50 :
8107ad50:   57  push   %edi
8107ad51:   8d 7c 24 08 lea0x8(%esp),%edi
8107ad55:   83 e4 f0and$0xfff0,%esp
8107ad58:   ff 77 fcpushl  -0x4(%edi)
8107ad5b:   55  push   %ebp
8107ad5c:   89 e5   mov%esp,%ebp
8107ad5e:   57  push   %edi
8107ad5f:   56  push   %esi
8107ad60:   53  push   %ebx
8107ad61:   83 ec 6csub$0x6c,%esp
8107ad64:   e8 47 92 f8 ff  call   81003fb0 
8107ad69:   8b 77 04mov0x4(%edi),%esi
8107ad6c:   89 75 a4mov%esi,-0x5c(%ebp)
8107ad6f:   65 8b 35 14 00 00 00mov%gs:0x14,%esi
8107ad76:   89 75 e4mov%esi,-0x1c(%ebp)
8107ad79:   31 f6   xor%esi,%esi
8107ad7b:   8b 35 60 5a cd 81   mov0x81cd5a60,%esi
8107ad81:   8b 1f   mov(%edi),%ebx
8107ad83:   85 f6   test   %esi,%esi
8107ad85:   8b 7f 08mov0x8(%edi),%edi
8107ad88:   75 18   jne8107ada2 

8107ad8a:   8b 45 e4mov-0x1c(%ebp),%eax
8107ad8d:   65 33 05 14 00 00 00xor%gs:0x14,%eax
8107ad94:   75 53   jne8107ade9 

8107ad96:   83 c4 6cadd$0x6c,%esp
8107ad99:   5b  pop%ebx
8107ad9a:   5e  pop%esi
8107ad9b:   5f  pop%edi
8107ad9c:   5d  pop%ebp
8107ad9d:   8d 67 f8lea-0x8(%edi),%esp
8107ada0:   5f  pop%edi
8107ada1:   c3  ret
8107ada2:   be 00 7a d6 81  mov$0x81d67a00,%esi
8107ada7:   89 45 acmov%eax,-0x54(%ebp)
8107adaa:   89 75 a0mov%esi,-0x60(%ebp)
8107adad:   89 5d b4mov%ebx,-0x4c(%ebp)
8107adb0:   64 8b 35 78 6a d6 81mov%fs:0x81d66a78,%esi
8107adb7:   8b 34 b5 20 50 cd 81mov-0x7e32afe0(,%esi,4),%esi
8107adbe:   89 4d b0mov%ecx,-0x50(%ebp)
8107adc1:   01 75 a0add%esi,-0x60(%ebp)
8107adc4:   89 55 b8mov%edx,-0x48(%ebp)
8107adc7:   8b 45 a0mov-0x60(%ebp),%eax
8107adca:   89 7d c0mov%edi,-0x40(%ebp)
8107adcd:   e8 de f7 76 00  call   817ea5b0 <_spin_lock_irqsave>
8107add2:   83 3d 60 5a cd 81 00cmpl   $0x0,0x81cd5a60
8107add9:   89 c3   mov%eax,%ebx
8107addb:   75 11   jne8107adee 

8107addd:   89 da   mov%ebx,%edx
8107addf:   8b 45 a0mov-0x60(%ebp),%eax
8107ade2:   e8 79 fc 76 00  call   817eaa60 
<_spin_unlock_irqrestore>
8107ade7:   eb a1   jmp8107ad8a 

8107ade9:   e8 52 e4 fc ff  call   81049240 <__stack_chk_fail>
8107adee:   8d 45 a8lea-0x58(%ebp),%eax
8107adf1:   8b 55 a4mov-0x5c(%ebp),%edx
8107adf4:   e8 f7 fd ff ff  call   8107abf0 
8107adf9:   85 c0   test   %eax,%eax
8107adfb:   74 05   je 8107ae02 

8107adfd:   ff 40 14incl   0x14(%eax)
8107ae00:   eb db   jmp8107addd 

8107ae02:   f0 ff 05 00 67 fd 81lock incl 0x81fd6700
8107ae09:   eb d2   jmp8107addd 

8107ae0b:   90  nop
8107ae0c:   90  nop
8107ae0d:   90  nop
8107ae0e:   90  nop
8107ae0f:   90  nop


There is a dozen more of those.

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> Umm. But it still does, doesn't it? That
> 
>   pushl  -0x4(%edi)
>   push   %ebp
> 
> should do it - the "-0x4(%edi)" thing seems to be trying to reload the 
> return address. No? 
> 
> Maybe I misread the code - but regardless, it does look like a gcc code 
> generation bug if only because we really don't want a 16-byte aligned 
> stack anyway, and have asked for it to not be done.
> 
> So I agree that gcc shouldn't do that crazy prologue (and certainly _not_ 
> before calling mcount anyway), but I'm not sure I agree with that detail 
> of your analysis or explanation.

Yes, it does store the return address before the pushed ebp, but this
is a copy of the real stack entry which is before the pushed edi.

The function graph tracer needs to redirect the return into the tracer
and it therefor saves the real return address and modifies the stack
so the return ends up in the tracer code which then goes back to the
real return address.

But in this prologue/aligment case we modify the copy and not the real
return address on the stack, so we return without calling into the
tracer which is causing the headache because the state of the tracer
becomes confused.

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Richard Guenther wrote:
> Note that I only can reproduce the issue with
> -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
> And
> you didn't provide us with a testcase either ... so please open
> a bugzilla and attach preprocessed source of a file that
> shows the problem, note the function it happens in and provide
> the command-line options you used for building.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Andrew Haley wrote:
> OK, I found it.  There is a struct defined as
> 
> struct entry {
>  ...
> } __attribute__((__aligned__((1 << (4);
> 
> and then in timer_stats_update_stats you have a local variable of type
> struct entry:
> 
> void timer_stats_update_stats()
> {
>  spinlock_t *lock;
>  struct entry *entry, input;
> 
> So, gcc has to 16-align the stack pointer to satisfy the alignment
> for struct entry.

This does not explain why GCC < 4.4.x actually puts

 push %ebp
 mov  %esp, %ebp

first and why GCC 4.4.x decides to create an extra copy of the return
address instead of just keeping the mcount stack magic right at the
function entry.

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > I bet other people than just the kernel use the mcount hook for subtler 
> > things than just doing profiles. And even if they don't, the quoted code 
> > generation is just crazy _crap_.
> 
> For the kernel, if the only case is that timer_stat.c thing that Thomas 
> pointed at, I guess we can at least work around it with something like the 
> appended. The kernel code is certainly ugly too, no question about that. 
> 
> It's just that we'd like to be able to depend on mcount code generation 
> not being insane even in the presense of ugly code..
> 
> The alternative would be to have some warning when this happens, so that 
> we can at least see it. "mcount won't work reliably"

There are at least 20 other random functions which have the same
problem. Have not looked at the details yet.

Just compiled with -mincoming-stack-boundary=4 and the problem goes
away as gcc now thinks that the incoming stack is already 16 byte
aligned. But that might break code which actually uses SSE

Thanks,

tglx




Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Thu, 19 Nov 2009, Richard Guenther wrote:
> > 
> > Note that I only can reproduce the issue with
> > -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
> 
> Since you can reproduce it with -mincoming-stack-boundary=2, I woul 
> suggest just fixing mcount handling that way regardless of anything else. 
> The current code generated by gcc is just insane - even for the case where 
> you _want_ 16-byte stack alignment.
> 
> Instead crazy code like
> 
> >   push   %edi
> >   lea0x8(%esp),%edi
> >   and$0xfff0,%esp
> >   pushl  -0x4(%edi)
> >   push   %ebp
> >   mov%esp,%ebp
> >   ...
> >   call   mcount
> 
> the sane thing to do would be to just do it as
> 
>   push   %ebp
>   mov%esp,%ebp
>   call   mcount
>   and$0xfff0,%esp

which is what the 64bit compile does except that the mcount call
happens a bit later which is fine.

8107cd34 :
8107cd34:   55  push   %rbp
8107cd35:   48 89 e5mov%rsp,%rbp
8107cd38:   48 83 e4 c0 and$0xffc0,%rsp

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Jeff Law wrote:
> On 11/19/09 15:43, Steven Rostedt wrote:
> > On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> > 
> >
> > > Having said all that, I don't expect to personally be looking at the
> > > problem, given the list of other codegen issues that need to be looked
> > > at (reload in particular), profiling/stack interactions would be around
> > > 87 millionth on my list.
> > >  
> > Is there someone else that can look at it?
> > 
> >
> Unsure at the moment...  Like everyone else, GCC developers are busy and this
> probably isn't going to be a high priority item for anyone.
> 
> 
> > Or at the very least, could you point us to where that code is, and one
> > of us tracing folks could take a crack at switching hats to be a
> > compiler writer (with the obvious prerequisite of drinking a lot of beer
> > first, or is there a better drug to cope with the pain of writing gcc?).
> >
> It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
> gcc-gcc/config/i386/linux.h & rebuilding GCC.
> 
> Based on comments elsewhere, the sun386i support may have used
> PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not need
> further adjustment.  That is obviously the ideal case.
> 
> If that appears to work for your needs, I'll volunteer to test it more
> thoroughly and assuming those tests look good shepherd it into the source
> tree.

We definitely want to see that ASAP.

While testing various kernel configs we found out that the problem
comes and goes. Finally I started to compare the gcc command line
options and after some fiddling it turned out that the following
minimal deltas change the code generator behaviour:

Bad:  -march=pentium-mmx-Wa,-mtune=generic32
Good: -march=i686-mtune=generic -Wa,-mtune=generic32
Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32

I'm not supposed to understand the logic behind that, right ?

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> > 
> > While testing various kernel configs we found out that the problem
> > comes and goes. Finally I started to compare the gcc command line
> > options and after some fiddling it turned out that the following
> > minimal deltas change the code generator behaviour:
> > 
> > Bad:  -march=pentium-mmx-Wa,-mtune=generic32
> > Good: -march=i686-mtune=generic -Wa,-mtune=generic32
> > Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
> > 
> > I'm not supposed to understand the logic behind that, right ?
> 
> Are you sure it's just the compiler flags?

I first captured the command line with V=1 and created a script of
it. Then I changed the -march -mtune options in that script and
compiled just that single file manually w/o changing .config or
invoking the kernel make magic.

The good ones produce:

650:   55  push   %ebp
651:   89 e5   mov%esp,%ebp
653:   83 e4 f0and$0xfff0,%esp

The bad one:

05f0 :
 5f0:   57  push   %edi
 5f1:   8d 7c 24 08 lea0x8(%esp),%edi
 5f5:   83 e4 f0and$0xfff0,%esp
 5f8:   ff 77 fcpushl  -0x4(%edi)
 5fb:   55  push   %ebp
 5fc:   89 e5   mov%esp,%ebp
 
> There's another configuration portion: the size of the alignment itself. 
> That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel 
> config CONFIG_X86_L1_CACHE_SHIFT.
> 
> Maybe that value matters too - for example maybe gcc will not try to align 
> the stack if it's big?

That does not change any of the compiler options, but yes it could
have some effect via the various include magics, but all I have seen
so far is linkage.h which should not affect the compiler. And the
manual compile did not change any of this.
 
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT 
>   totally unrelated numbers? Very confusing. ]

Agreed.

> The compiler flags we use are tied to some of the same choices that choose 
> the cache shift, so the correlation you found while debugging this would 
> still hold.

Digging further tomorrow when my brain is more awake.

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner
On Fri, 20 Nov 2009, Thomas Gleixner wrote:

> On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> > > 
> > > While testing various kernel configs we found out that the problem
> > > comes and goes. Finally I started to compare the gcc command line
> > > options and after some fiddling it turned out that the following
> > > minimal deltas change the code generator behaviour:
> > > 
> > > Bad:  -march=pentium-mmx-Wa,-mtune=generic32
> > > Good: -march=i686-mtune=generic -Wa,-mtune=generic32
> > > Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32

Found some more:

Bad:  -march=k6   -Wa,-mtune=generic32
Bad:  -march=geode-Wa,-mtune=generic32
Bad:  -march=c3   -Wa,-mtune=generic32

That seems every thing which has MMX support but no SSE and is somehow
compatible to the pentium-mmx.

Looks like the code generator optimization for those was done after
consuming the secret gcc-shrooms.

Thanks,

tglx


Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-23 Thread Thomas Gleixner
On Mon, 23 Nov 2009, Jakub Jelinek wrote:

> On Thu, Nov 19, 2009 at 08:01:57PM +0100, Thomas Gleixner wrote:
> > Just compiled with -mincoming-stack-boundary=4 and the problem goes
> > away as gcc now thinks that the incoming stack is already 16 byte
> > aligned. But that might break code which actually uses SSE
> 
> Please don't do this, lying to the compiler is just going to result in
> wrong-code sooner or later, with the above switch gcc will assume the
> incoming stack is 16-byte aligned (which is not true in the ix86 kernel)
> and could very well e.g. optimize away code that looks at
> alignment of stack variables etc.

Right. I gave up the idea pretty fast. But in the current situation we
are forced to lie to the compiler in some way. Forcing -mtune=generic
when the function graph tracer is enabled seems to be a halfways sane
work around.

Thanks,

tglx


Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue

2009-11-24 Thread Thomas Gleixner
On Tue, 24 Nov 2009, Andrew Haley wrote:
> H.J. Lu wrote:
> > On Sun, Nov 22, 2009 at 9:20 AM, Andrew Haley  wrote:
> >> H.J. Lu wrote:
> >>> On Fri, Nov 20, 2009 at 11:35 AM, Andrew Haley  wrote:
>  Steven Rostedt wrote:
> > Ingo, Thomas and Linus,
> >
> > I know Thomas did a patch to force the -mtune=generic, but just in case
> > gcc decides to do something crazy again, this patch will catch it.
> >
> > Should we try to get this in now?
>  I'm sure this makes sense, but a gcc test case would be even better.
>  If this can be detected in the gcc test suite it'll be found and
>  fixed long before y'all in kernel land get to see it.  That's the
>  only way to guarantee this never bothers you again.
> 
>  H.J., who wrote the code in question, is hopefully looking at why
>  this odd code is being generated.  Once he's done I can put a
>  suitable test case in the gcc test suite.
> 
> >>> See:
> >>>
> >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109#c7
> >> I saw that, but does it mean you're going to investigate?  There is
> >> no obvious reason why -mtune=generic should affect code generation
> >> in this way, but it does.
> > 
> > Why not, there is
> > 
> > static const unsigned int x86_accumulate_outgoing_args
> >   = m_AMD_MULTIPLE | m_ATOM | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2
> > | m_GENERIC;
> > 
> > -mtune=generic turns on -maccumulate-outgoing-args.
> 
> Alright, so let's at least try to give the kernel people the information
> that they need.
> 
> What you're saying is, to avoid this:
> 
>   05f0 :
>  5f0:   57  push   %edi
>  5f1:   8d 7c 24 08 lea0x8(%esp),%edi
>  5f5:   83 e4 f0and$0xfff0,%esp
>  5f8:   ff 77 fcpushl  -0x4(%edi)
>  5fb:   55  push   %ebp
>  5fc:   89 e5   mov%esp,%ebp
> 
> you should compile your code with -maccumulate-outgoing-args, and there's
> no need to use -mtune=generic.  Is that right?

Seems to work. What other side effects has that ?

Thanks,

tglx


Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue

2009-11-25 Thread Thomas Gleixner
On Tue, 24 Nov 2009, Jakub Jelinek wrote:

> On Tue, Nov 24, 2009 at 03:55:49PM +0100, Thomas Gleixner wrote:
> > > you should compile your code with -maccumulate-outgoing-args, and there's
> > > no need to use -mtune=generic.  Is that right?
> > 
> > Seems to work. What other side effects has that ?
> 
> Faster code, significant increase in code size though.  Note that on many
> architectures it is the only supported model.

Just checked on the affected -marchs. The increase in code size is
about 3% which is not that bad and definitely acceptable for the
tracing case. Will zap the -mtune=generic patch and use
-maccumulate-outgoing-args instead.

Thanks,

tglx


Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue

2009-11-25 Thread Thomas Gleixner
On Wed, 25 Nov 2009, Ingo Molnar wrote:
> * Thomas Gleixner  wrote:
> 
> > On Tue, 24 Nov 2009, Jakub Jelinek wrote:
> > 
> > > On Tue, Nov 24, 2009 at 03:55:49PM +0100, Thomas Gleixner wrote:
> > > > > you should compile your code with -maccumulate-outgoing-args, and 
> > > > > there's
> > > > > no need to use -mtune=generic.  Is that right?
> > > > 
> > > > Seems to work. What other side effects has that ?
> > > 
> > > Faster code, significant increase in code size though.  Note that on many
> > > architectures it is the only supported model.
> > 
> > Just checked on the affected -marchs. The increase in code size is 
> > about 3% which is not that bad and definitely acceptable for the 
> > tracing case. Will zap the -mtune=generic patch and use 
> > -maccumulate-outgoing-args instead.
> 
> hm, 3% sounds quite large :( dyn-ftrace is enabled in distro configs, so 
> 3% is a big deal IMO.

Distro-configs have -mtune=generic anyway. So it's not changing
anything for them.

I'm talking about the -march flags which result in that weird code
(pentium-mmx ).

Thanks,

tglx