On Mon, 23 Nov 2009, Jakub Jelinek wrote:
> On Thu, Nov 19, 2009 at 08:01:57PM +0100, Thomas Gleixner wrote:
> > Just compiled with -mincoming-stack-boundary=4 and the problem goes
> > away as gcc now thinks that the incoming stack is already 16 byte
> > aligned. But that might break code which ac
On Thu, Nov 19, 2009 at 08:01:57PM +0100, Thomas Gleixner wrote:
> Just compiled with -mincoming-stack-boundary=4 and the problem goes
> away as gcc now thinks that the incoming stack is already 16 byte
> aligned. But that might break code which actually uses SSE
Please don't do this, lying to the
On 11/20/2009 04:34 AM, Steven Rostedt wrote:
>>
>> If there's interest I can polish it up and submit formally.
>
> I would definitely be interested, and I would also be willing to test
> it.
>
I don't think there is any question that interception at the
architectural entry point would be the ri
On Fri, 2009-11-20 at 10:57 +0100, Andi Kleen wrote:
> Steven Rostedt writes:
> >
> > And frame pointers do add a little overhead as well. Too bad the mcount
> > ABI wasn't something like this:
> >
> >
> > :
> > callmcount
> > [...]
> >
> > This way, the function ad
Andrew Haley wrote:
> Thomas Gleixner wrote:
>
>> While testing various kernel configs we found out that the problem
>> comes and goes. Finally I started to compare the gcc command line
>> options and after some fiddling it turned out that the following
>> minimal deltas change the code generator
Thomas Gleixner wrote:
> While testing various kernel configs we found out that the problem
> comes and goes. Finally I started to compare the gcc command line
> options and after some fiddling it turned out that the following
> minimal deltas change the code generator behaviour:
>
> Bad: -march
Steven Rostedt writes:
>
> And frame pointers do add a little overhead as well. Too bad the mcount
> ABI wasn't something like this:
>
>
> :
> callmcount
> [...]
>
> This way, the function address for mcount would have been (%esp) and the
> parent address woul
* Linus Torvalds wrote:
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]
incidentally (or maybe not so incidentally) that got fixed yesterday in
-tip - at around the time i triggered that crash:
350f8f5: x86: Elimi
On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> > >
> > > While testing various kernel configs we found out that the problem
> > > comes and goes. Finally I started to compare the gcc command line
> > > opt
On 11/19/2009 04:59 PM, Linus Torvalds wrote:
>
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]
>
Yes, there is another thread to clean up that particular mess; it is
already in -tip:
http://git.kernel.org/tip/350f8f5
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> >
> > While testing various kernel configs we found out that the problem
> > comes and goes. Finally I started to compare the gcc command line
> > options and after some fiddling it turned out that the follo
On Fri, 20 Nov 2009, Thomas Gleixner wrote:
>
> While testing various kernel configs we found out that the problem
> comes and goes. Finally I started to compare the gcc command line
> options and after some fiddling it turned out that the following
> minimal deltas change the code generator beh
On Thu, 19 Nov 2009, Jeff Law wrote:
> On 11/19/09 15:43, Steven Rostedt wrote:
> > On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> >
> >
> > > Having said all that, I don't expect to personally be looking at the
> > > problem, given the list of other codegen issues that need to be looked
On 11/19/09 15:43, Steven Rostedt wrote:
On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
Having said all that, I don't expect to personally be looking at the
problem, given the list of other codegen issues that need to be looked
at (reload in particular), profiling/stack interactions wou
On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> Having said all that, I don't expect to personally be looking at the
> problem, given the list of other codegen issues that need to be looked
> at (reload in particular), profiling/stack interactions would be around
> 87 millionth on my list.
On 11/19/09 14:14, H. Peter Anvin wrote:
Hence a new unconstrained option...
Not arguing against it, just noting there are targets where after the
prologue mcount is mandated. There's certainly hooks in GCC to do it
both ways and if there's no clear need to use after-prologue on
x86-linu
Hence a new unconstrained option...
"Jeff Law" wrote:
>On 11/19/09 12:50, H. Peter Anvin wrote:
>>
>> Calling the profiler immediately at the entry point is clearly the more
>> sane option. It means the ABI is well-defined, stable, and independent
>> of what the actual function contents are. I
On 11/19/09 13:06, Linus Torvalds wrote:
On Thu, 19 Nov 2009, H. Peter Anvin wrote:
Calling the profiler immediately at the entry point is clearly the more
sane option. It means the ABI is well-defined, stable, and independent
of what the actual function contents are. It means that ABI is
On 11/19/09 12:50, H. Peter Anvin wrote:
Calling the profiler immediately at the entry point is clearly the more
sane option. It means the ABI is well-defined, stable, and independent
of what the actual function contents are. It means that ABI isn't the
normal C ABI (the __fentry__ function wo
On i386, if we call __fentry__ immediately on entry the return address will be
in 4(%esp), so I fail to see how you could not reliably have the return
address. Other arches would have different constraints, of course.
"Frederic Weisbecker" wrote:
>On Thu, Nov 19, 2009 at 03:05:41PM -0500, Ste
On Thu, 2009-11-19 at 12:36 -0800, Linus Torvalds wrote:
>
> On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
> >
> > > That way the lr would have the current function, and the parent would
> > > still be at 8(%sp)
> >
> > Yeah right, we need at least such very tiny prologue for
> > archs that sto
On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
>
> > That way the lr would have the current function, and the parent would
> > still be at 8(%sp)
>
> Yeah right, we need at least such very tiny prologue for
> archs that store return addresses in a reg.
Well, it will be architecture-dependent.
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Thu, 19 Nov 2009, Richard Guenther wrote:
> >
> > Note that I only can reproduce the issue with
> > -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
>
> Since you can reproduce it with -mincoming-stack-boundary=2, I woul
> sugges
On Thu, Nov 19, 2009 at 03:17:16PM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
>
> > Well, other archs use a register to store the return address. But it
> > would also be easy to do (pseudo arch assembly):
> >
> > :
> > mov lr, (%sp)
>
On Thu, Nov 19, 2009 at 03:05:41PM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
> > On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
>
> > > :
> > > call __fentry__
> > > [...]
> > >
> > >
> > > -- Steve
> >
On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
> Well, other archs use a register to store the return address. But it
> would also be easy to do (pseudo arch assembly):
>
> :
> mov lr, (%sp)
> add 8, %sp
> blr __fentry__
Should be bl __fe
On Thu, 2009-11-19 at 11:50 -0800, H. Peter Anvin wrote:
> > Perhaps we could create another profiler? Instead of calling mcount,
> > call a new function: __fentry__ or something. Have it activated with
> > another switch. This could make the performance of the function tracer
> > even better with
On Thu, 19 Nov 2009, H. Peter Anvin wrote:
>
> Calling the profiler immediately at the entry point is clearly the more
> sane option. It means the ABI is well-defined, stable, and independent
> of what the actual function contents are. It means that ABI isn't the
> normal C ABI (the __fentry__
On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
> On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
> > :
> > call __fentry__
> > [...]
> >
> >
> > -- Steve
>
>
> I would really like this. So that we can forget about other possible
>
On Thu, Nov 19, 2009 at 08:54:56PM +0100, Kai Tietz wrote:
> 2009/11/19 Frederic Weisbecker :
> > I would really like this. So that we can forget about other possible
> > further suprises due to sophisticated function prologues beeing before
> > the mcount call.
> >
> > And I guess that would fix i
2009/11/19 Frederic Weisbecker :
> I would really like this. So that we can forget about other possible
> further suprises due to sophisticated function prologues beeing before
> the mcount call.
>
> And I guess that would fix it in every archs.
My 5 cent for this, too.
> That said, Linus had a g
On 11/19/2009 11:28 AM, Steven Rostedt wrote:
>
> Hehe, scratch register on i686 ;-)
>
> i686 has no extra regs. It just has:
>
> %eax, %ebx, %ecx, %edx - as the general purpose regs
> %esp - stack
> %ebp - frame pointer
> %edi, %esi - counter regs
>
> That's just 8 regs, and half of those are
* Steven Rostedt wrote:
> On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
> > * Linus Torvalds wrote:
> >
> > > Admittedly, anybody who compiles with -pg probably doesn't care deeply
> > > about smaller and more efficient code, since the mcount call overhead
> > > tends to make the thi
On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 11:10 -0800, David Daney wrote:
> > Linus Torvalds wrote:
>
> > For the MIPS port of GCC and Linux I recently added the
> > -mmcount-ra-address switch. It causes the location of the return
> > address (on th
On Thu, 2009-11-19 at 11:10 -0800, David Daney wrote:
> Linus Torvalds wrote:
> For the MIPS port of GCC and Linux I recently added the
> -mmcount-ra-address switch. It causes the location of the return
> address (on the stack) to be passed to mcount in a scratch register.
Hehe, scratch regist
Linus Torvalds wrote:
On Thu, 19 Nov 2009, Linus Torvalds wrote:
I bet other people than just the kernel use the mcount hook for subtler
things than just doing profiles. And even if they don't, the quoted code
generation is just crazy _crap_.
For the kernel, if the only case is that timer_st
On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
> * Linus Torvalds wrote:
>
> > Admittedly, anybody who compiles with -pg probably doesn't care deeply
> > about smaller and more efficient code, since the mcount call overhead
> > tends to make the thing moot anyway, but it really looks lik
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > I bet other people than just the kernel use the mcount hook for subtler
> > things than just doing profiles. And even if they don't, the quoted code
> > generation is just crazy _crap_.
>
> For the kernel, if the only case is that timer_stat.c thing
On Thu, 19 Nov 2009, Linus Torvalds wrote:
>
> I bet other people than just the kernel use the mcount hook for subtler
> things than just doing profiles. And even if they don't, the quoted code
> generation is just crazy _crap_.
For the kernel, if the only case is that timer_stat.c thing that
* Linus Torvalds wrote:
> Admittedly, anybody who compiles with -pg probably doesn't care deeply
> about smaller and more efficient code, since the mcount call overhead
> tends to make the thing moot anyway, but it really looks like a
> win-win situation to just fix the mcount call sequence r
On Thu, 19 Nov 2009, Linus Torvalds wrote:
>
> Oh Gods, are we back to gcc people saying "sure, we do stupid things, but
> it's allowed, so we don't consider it a bug because it doesn't matter that
> real people care about real life, we only care about some paper, and real
> life doesn't matt
On Thu, 19 Nov 2009, Andrew Haley wrote:
>
> I've got all that off-list. I found the cause, and replied in another
> email. It's not a bug.
Oh Gods, are we back to gcc people saying "sure, we do stupid things, but
it's allowed, so we don't consider it a bug because it doesn't matter that
re
On Thu, 19 Nov 2009, Richard Guenther wrote:
>
> Note that I only can reproduce the issue with
> -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
Since you can reproduce it with -mincoming-stack-boundary=2, I woul
suggest just fixing mcount handling that way regardless of an
On Thu, 19 Nov 2009, Andrew Haley wrote:
> OK, I found it. There is a struct defined as
>
> struct entry {
> ...
> } __attribute__((__aligned__((1 << (4);
>
> and then in timer_stats_update_stats you have a local variable of type
> struct entry:
>
> void timer_stats_update_stats()
> {
> s
On 11/19/2009 10:33 AM, Steven Rostedt wrote:
>
> It has to align the entire stack? Why not just the variable within the
> stack?
>
Because if the stack pointer isn't aligned, it won't know where it can
stuff the variable. It has to pad *somewhere*, and since you may have
more than one such var
Steven Rostedt wrote:
> On Thu, 2009-11-19 at 18:20 +, Andrew Haley wrote:
>
>> OK, I found it. There is a struct defined as
>>
>> struct entry {
>> ...
>> } __attribute__((__aligned__((1 << (4);
>>
>> and then in timer_stats_update_stats you have a local variable of type
>> struct entry
On Thu, Nov 19, 2009 at 10:33 AM, Steven Rostedt wrote:
> It has to align the entire stack? Why not just the variable within the
> stack?
I had proposed a patch which just aligns the variable but that patch
was never really commented on and HJL's patches to realign the whole
stack went in afterwa
On Thu, 19 Nov 2009, Richard Guenther wrote:
> Note that I only can reproduce the issue with
> -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
> And
> you didn't provide us with a testcase either ... so please open
> a bugzilla and attach preprocessed source of a file that
> sho
On Thu, 2009-11-19 at 18:20 +, Andrew Haley wrote:
> OK, I found it. There is a struct defined as
>
> struct entry {
> ...
> } __attribute__((__aligned__((1 << (4);
>
> and then in timer_stats_update_stats you have a local variable of type
> struct entry:
>
> void timer_stats_update_s
Richard Guenther wrote:
> And
> you didn't provide us with a testcase either ... so please open
> a bugzilla and attach preprocessed source of a file that
> shows the problem, note the function it happens in and provide
> the command-line options you used for building.
I've got all that off-list.
Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Thomas Gleixner wrote:
>
> Can the GCC folks please shed some light on this:
>
> standard function start:
>
>push %ebp
>mov%esp, %ebp
>
>call mcount
>
> modified function start on a handful of functions on
On Thu, Nov 19, 2009 at 6:59 PM, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:
>
>> > This modification leads to a hard to solve problem in the kernel
>> > function graph tracer which assumes that the stack looks like:
>> >
>> > return address
>> >
On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:
> > This modification leads to a hard to solve problem in the kernel
> > function graph tracer which assumes that the stack looks like:
> >
> >return address
> >saved ebp
>
> Umm. But it still does, doesn't it? That
>
>
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> Umm. But it still does, doesn't it? That
>
> pushl -0x4(%edi)
> push %ebp
>
> should do it - the "-0x4(%edi)" thing seems to be trying to reload the
> return address. No?
>
> Maybe I misread the code - but regardless, it does look lik
On Thu, 19 Nov 2009, Thomas Gleixner wrote:
>
> standard function start:
>
>push %ebp
>mov%esp, %ebp
>
>call mcount
>
> modified function start on a handful of functions only seen with gcc
> 4.4.x on x86 32 bit:
>
> push %edi
> lea
Richard Guenther writes:
>
> It's likely because you have long long vars on the stack which is
> faster when they are aligned.
It's not faster for 32bit.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
On Thu, 19 Nov 2009, Andrew Haley wrote:
> Thomas Gleixner wrote:
> > On Thu, 19 Nov 2009, Andrew Haley wrote:
> >> Thomas Gleixner wrote:
> >>> There is no real obvious reason why the edi magic needs to be done
> >>> _before_
> >>>
> >>> push %ebp
> >>> mov%esp,%ebp
> >> Sure there is:
On Thu, Nov 19, 2009 at 11:02:32AM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
> > Thomas Gleixner wrote:
>
> > We're aligning the stack properly, as per the ABI requirements. Can't
> > you just fix the tracer?
>
> And how do we do that? The hooks that a
Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Andrew Haley wrote:
>> Thomas Gleixner wrote:
>>> There is no real obvious reason why the edi magic needs to be done
>>> _before_
>>>
>>> push %ebp
>>> mov%esp,%ebp
>> Sure there is: unless you do the adjustment first %ebp won't be 16-ali
On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
Unfortunately, this is the only fix we have:
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b416512..cd39064 100644
--- a/ker
On 11/19/2009 08:02 AM, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
>> Thomas Gleixner wrote:
>
>> We're aligning the stack properly, as per the ABI requirements. Can't
>> you just fix the tracer?
>
> And how do we do that? The hooks that are in place have no i
On Thu, 19 Nov 2009, Andrew Haley wrote:
> Thomas Gleixner wrote:
> > There is no real obvious reason why the edi magic needs to be done
> > _before_
> >
> > push %ebp
> > mov%esp,%ebp
>
> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
And why is this
On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
> Thomas Gleixner wrote:
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
And how do we do that? The hooks that are in place have no idea of what
happened before they were called?
-- Steve
On Thu, Nov 19, 2009 at 4:54 PM, H. Peter Anvin wrote:
> On 11/19/2009 07:44 AM, Andrew Haley wrote:
>>
>> We're aligning the stack properly, as per the ABI requirements. Can't
>> you just fix the tracer?
>>
>
> "Per the ABI requirements?" We're talking 32 bits, here.
Hm, even with
void bar (i
On 11/19/2009 07:44 AM, Andrew Haley wrote:
>
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
>
"Per the ABI requirements?" We're talking 32 bits, here.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.
On Thu, Nov 19, 2009 at 4:49 PM, Richard Guenther
wrote:
> On Thu, Nov 19, 2009 at 4:45 PM, H. Peter Anvin wrote:
>> On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
>>>
>>> modified function start on a handful of functions only seen with gcc
>>> 4.4.x on x86 32 bit:
>>>
>>> push %edi
>>>
On Thu, Nov 19, 2009 at 4:45 PM, H. Peter Anvin wrote:
> On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
>>
>> modified function start on a handful of functions only seen with gcc
>> 4.4.x on x86 32 bit:
>>
>> push %edi
>> lea 0x8(%esp),%edi
>> and $0xfff0,%esp
>>
On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
>
> modified function start on a handful of functions only seen with gcc
> 4.4.x on x86 32 bit:
>
> push %edi
> lea0x8(%esp),%edi
> and$0xfff0,%esp
> pushl -0x4(%edi)
> push %ebp
> mov%esp,%ebp
Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Thomas Gleixner wrote:
>
> Can the GCC folks please shed some light on this:
>
> standard function start:
>
>push %ebp
>mov%esp, %ebp
>
>call mcount
>
> modified function start on a handful of functions on
On Thu, 19 Nov 2009, Thomas Gleixner wrote:
Can the GCC folks please shed some light on this:
standard function start:
push %ebp
mov%esp, %ebp
call mcount
modified function start on a handful of functions only seen with gcc
4.4.x on x86 32 bit:
70 matches
Mail list logo