[ This is FYI only. Documenting what I found with gcc 4.5.1 (but it is
fixed in 4.5.4) ]
Part of my test suite is to build the kernel with a compiler before asm
goto was supported (to test jump labels without it).
Recently I noticed that the kernel started to hang when building with
it. For a while
On Tue, 13 Aug 2013 07:46:46 -0700
"H. Peter Anvin" wrote:
> > On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> >> Since we really doesn't want to...
>
> Ow. Can't believe I wrote that.
>
All your base are belong to us!
-- Steve
On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote:
> You might want to try creating a global array of counters (accessible
> both from C for printout and assembly for update).
>
> Index the array from assembly using: (2f - 1f)
>
> 1:
> jmp ...;
> 2:
>
> And put an atomic incr
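[ Editorial sketch of the counter-array idea quoted above. It is a userspace stand-in written with C11 atomics, not the code from the thread: the names jump_counts, count_jump and jumps_of_len are hypothetical, and in the kernel the increment would be done from the assembly side, with the index taken from the (2f - 1f) length of the emitted jump. ]

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical: one counter slot per possible jump length in bytes (0..5). */
#define MAX_JUMP_LEN 5

/* Static storage, so every counter starts at zero. */
static atomic_ulong jump_counts[MAX_JUMP_LEN + 1];

/*
 * Record one executed jump whose encoding is `len` bytes long.
 * `len` plays the role of the (2f - 1f) label difference.
 * Out-of-range lengths are ignored instead of indexing past the array.
 */
static void count_jump(size_t len)
{
	if (len <= MAX_JUMP_LEN)
		atomic_fetch_add_explicit(&jump_counts[len], 1,
					  memory_order_relaxed);
}

/* Read back a counter from C, e.g. for a printout at boot. */
static unsigned long jumps_of_len(size_t len)
{
	if (len > MAX_JUMP_LEN)
		return 0;
	return atomic_load_explicit(&jump_counts[len], memory_order_relaxed);
}
```

With this layout, jumps_of_len(2) and jumps_of_len(5) would give the short and long totals directly.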
On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
> Add short_counter and long_counter, and increment the matching counter
> before each jump. That way we will know how many short/long jumps were taken.
That's not trivial at all. The jump is a single location (in an asm
goto() statement) that happens
On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote:
> [3.387362] short jumps: 106
> [3.390277] long jumps: 330
>
> Thus, approximately 25%. Not bad.
Also, where these happen to be is probably even more important than how
many. If all the short jumps happen in slow
On Tue, 2013-08-06 at 16:43 -0400, Steven Rostedt wrote:
> On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
>
> > Steve, perhaps you could add a mode to your binary rewriting program
> > that counts the number of 2-byte vs 5-byte jumps found, and if possible
>
On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
> Steve, perhaps you could add a mode to your binary rewriting program
> that counts the number of 2-byte vs 5-byte jumps found, and if possible
> get a breakdown of those per subsystem ?
I actually started doing that, as I was curious t
On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote:
> So I wonder if this is a "ok, let's not bother, it's not worth the
> pain" issue. 128 bytes of offset is very small, so there probably
> aren't all that many cases that would use it.
OK, I'll forward port the original patches for the hell
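[ Editorial sketch of the range limit mentioned above. A 2-byte x86 jmp (opcode 0xEB) carries a signed 8-bit displacement measured from the end of the instruction, so only targets within -128..+127 bytes of that point qualify. The function name fits_short_jmp and the address parameters are hypothetical, not taken from the patches. ]

```c
#include <stdbool.h>
#include <stdint.h>

#define SHORT_JMP_LEN 2	/* opcode 0xEB plus one rel8 byte */

/*
 * Return true if a jump placed at `jmp_addr` can reach `target`
 * with the 2-byte short form. The displacement is relative to the
 * address of the instruction that follows the jump.
 */
static bool fits_short_jmp(uint64_t jmp_addr, uint64_t target)
{
	int64_t disp = (int64_t)(target - (jmp_addr + SHORT_JMP_LEN));

	return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

Anything outside that window has to use the 5-byte form with a 32-bit displacement.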
On Tue, 2013-08-06 at 09:19 -0700, H. Peter Anvin wrote:
> On 08/06/2013 09:15 AM, Steven Rostedt wrote:
> > On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
> >
> >> For unconditional jmp that should be pretty safe barring any fundamental
> >> changes
On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
> For unconditional jmp that should be pretty safe barring any fundamental
> changes to the instruction set, in which case we can enable it as
> needed, but for extra robustness it probably should skip prefix bytes.
Would the assembler add
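[ Editorial sketch of decoding an unconditional jmp while skipping prefix bytes, as suggested above. It recognizes only the two encodings discussed in the thread, 0xEB (short, rel8) and 0xE9 (near, rel32). The helper names are hypothetical, and the sketch does not model how an operand-size prefix can change the width of the near displacement. ]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 legacy prefix bytes that may precede an opcode. */
static bool is_legacy_prefix(uint8_t b)
{
	switch (b) {
	case 0x26: case 0x2E: case 0x36: case 0x3E:	/* ES, CS, SS, DS */
	case 0x64: case 0x65:				/* FS, GS */
	case 0x66: case 0x67:				/* operand/address size */
	case 0xF0: case 0xF2: case 0xF3:		/* LOCK, REPNE, REP */
		return true;
	default:
		return false;
	}
}

/*
 * Classify the jump at `code`, looking at no more than `avail` bytes.
 * Returns 2 for a short jmp, 5 for a near jmp, 0 if the bytes are not
 * a recognized jmp. The returned length excludes prefixes; the number
 * of prefix bytes skipped is stored in *prefixes when it is non-NULL.
 */
static size_t jmp_len(const uint8_t *code, size_t avail, size_t *prefixes)
{
	size_t i = 0;

	while (i < avail && is_legacy_prefix(code[i]))
		i++;
	if (prefixes)
		*prefixes = i;
	if (i >= avail)
		return 0;
	if (code[i] == 0xEB && avail - i >= 2)
		return 2;
	if (code[i] == 0xE9 && avail - i >= 5)
		return 5;
	return 0;
}
```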
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
> Ugh. Why the crazy update_jump_label script stuff?
After playing with the patches again, I now understand why I did that.
It wasn't just for optimization.
Currently the way jump labels work is that we use asm goto() and place a
5 byte no
On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote:
> I think if the 'cold' attribute on the default disabled static_key
> branch moved the text completely out-of-line, it would satisfy your
> requirement here?
>
> If you like this approach, perhaps we can make something like this work
> wit
On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
> Another thing that bothers me with Steven's approach is that decoding
> jumps generated by the compiler seems fragile IMHO.
The encodings won't change. If they do, then old kernels will not run on
new hardware.
Now if it adds a third o
On Mon, 2013-08-05 at 12:57 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> wrote:
> >
> > I remember that choosing between 2 and 5 bytes nop in the asm goto was
> > tricky: it had something to do with the fact that gcc doesn't know the
> > exact size of each in
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt wrote:
> >
> > I had patches that did exactly this:
> >
> > https://lkml.org/lkml/2012/3/8/461
> >
> > But it got dropped for some reason. I don
On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
> Steven Rostedt writes:
>
> Can't you just use -freorder-blocks-and-partition?
Yeah, I'm familiar with this option.
>
> This should already partition unlikely blocks into a
> different section. Just a single
On Mon, 2013-08-05 at 11:51 -0700, H. Peter Anvin wrote:
> On 08/05/2013 11:49 AM, Steven Rostedt wrote:
> > On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> >
> >> Traps nest, that's why there is a stack. (OK, so you don't want to take
> >>
On Mon, 2013-08-05 at 11:34 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
> >
> > Ugh. I can see the attraction of your section thing for that case, I
> > just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, Steven, f
On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> Traps nest, that's why there is a stack. (OK, so you don't want to take
> the same trap inside the trap handler, but that code should be very
> limited.) The trap instruction just becomes very short, but rather
> slow, call-return.
>
>
On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
> Of course, it would be good to optimize static_key_false() itself -
> right now those static key jumps are always five bytes, and while they
> get nopped out, it would still be nice if there was some way to have
> just a two-byte nop (turn
On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
> On 08/05/2013 10:55 AM, Steven Rostedt wrote:
> >
> > Well, as tracepoints are being added quite a bit in Linux, my concern is
> > with the inlined functions that they bring. With jump labels they are
> > disa
On Mon, 2013-08-05 at 13:55 -0400, Steven Rostedt wrote:
> The difference between this and the
> "section" hack I suggested, is that this would use a "call"/"ret" when
> enabled instead of a "jmp"/"jmp".
I wonder if this is what Kris Kross meant in their song?
/me goes back to work...
-- Steve
On Mon, 2013-08-05 at 10:12 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt wrote:
> First off, we have very few things that are *so* unlikely that they
> never get executed. Putting things in a separate section would
> actually be really bad.
My mai
On Mon, 2013-08-05 at 10:02 -0700, H. Peter Anvin wrote:
> > if (x) __attribute__((section(".foo"))) {
> > /* do something */
> > }
> >
>
> One concern I have is how this kind of code would work when embedded
> inside a function which already has a section attribute. This could
> easily caus
[ sent to both Linux kernel mailing list and to gcc list ]
I was looking at some of the old code I still have marked in my TODO
list, that I never pushed to get mainlined. One of them is to move trace
point logic out of the fast path to get rid of the stress that it
imposes on the icache.
Almost
On Tue, 2009-11-24 at 17:12 +, Andrew Haley wrote:
> H. Peter Anvin wrote:
> > If we're changing gcc anyway, then let's add the option of intercepting
> > the function at the point where the machine state is well-defined by
> > ABI, which is before the function stack frame is set up.
>
> Hmm.
On Fri, 2009-11-20 at 19:35 +, Andrew Haley wrote:
> Steven Rostedt wrote:
> > Ingo, Thomas and Linus,
> >
> > I know Thomas did a patch to force the -mtune=generic, but just in case
> > gcc decides to do something crazy again, this patch will catch it.
> >
Ingo, Thomas and Linus,
I know Thomas did a patch to force the -mtune=generic, but just in case
gcc decides to do something crazy again, this patch will catch it.
Should we try to get this in now?
-- Steve
On Fri, 2009-11-20 at 00:23 -0500, Steven Rostedt wrote:
> com
On Fri, 2009-11-20 at 10:57 +0100, Andi Kleen wrote:
> Steven Rostedt writes:
> >
> > And frame pointers do add a little overhead as well. Too bad the mcount
> > ABI wasn't something like this:
> >
> >
> > :
> >   call mcount
>
This touches the Makefile scripts. I forgot to CC kbuild and Sam.
-- Steve
On Fri, 2009-11-20 at 00:23 -0500, Steven Rostedt wrote:
> Ingo,
>
> Not sure if this is too much for this late in the -rc game, but it finds
> the gcc bug at build time, and we don't need to disab
an be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/urgent-2
Steven Rostedt (1):
tracing/x86: Add check to detect GCC messing with mcount prologue
 kernel/trace/Kconfig   |  1 -
 scripts/Makefile.build | 25 +++-
sc
On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> Having said all that, I don't expect to personally be looking at the
> problem, given the list of other codegen issues that need to be looked
> at (reload in particular), profiling/stack interactions would be around
> 87 millionth on my list.
On Thu, 2009-11-19 at 12:36 -0800, Linus Torvalds wrote:
>
> On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
> >
> > > That way the lr would have the current function, and the parent would
> > > still be at 8(%sp)
> >
> > Yeah right, we need at least such very tiny prologue for
> > archs that sto
On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
> Well, other archs use a register to store the return address. But it
> would also be easy to do (pseudo arch assembly):
>
> :
> mov lr, (%sp)
> add 8, %sp
> blr __f
On Thu, 2009-11-19 at 11:50 -0800, H. Peter Anvin wrote:
> > Perhaps we could create another profiler? Instead of calling mcount,
> > call a new function: __fentry__ or something. Have it activated with
> > another switch. This could make the performance of the function tracer
> > even better with
On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
> On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
> > :
> > call __fentry__
> > [...]
> >
> >
> > -- Steve
>
>
> I would really like
On Thu, 2009-11-19 at 11:10 -0800, David Daney wrote:
> Linus Torvalds wrote:
> For the MIPS port of GCC and Linux I recently added the
> -mmcount-ra-address switch. It causes the location of the return
> address (on the stack) to be passed to mcount in a scratch register.
Hehe, scratch regist
On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
> * Linus Torvalds wrote:
>
> > Admittedly, anybody who compiles with -pg probably doesn't care deeply
> > about smaller and more efficient code, since the mcount call overhead
> > tends to make the thing moot anyway, but it really looks lik
On Thu, 2009-11-19 at 18:20 +, Andrew Haley wrote:
> OK, I found it. There is a struct defined as
>
> struct entry {
> ...
> } __attribute__((__aligned__((1 << (4)))));
>
> and then in timer_stats_update_stats you have a local variable of type
> struct entry:
>
> void timer_stats_update_s
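[ Editorial sketch of the effect described above: an aligned attribute of (1 << (4)) gives the type 16-byte alignment, so a local variable of that type needs a 16-byte-aligned stack slot. On targets whose ABI guarantees less, the compiler has to realign the stack in the function prologue. The member `count` and the helper names are hypothetical; the real struct's fields are elided in the quoted mail. Requires GCC or a compatible compiler. ]

```c
#include <stddef.h>
#include <stdint.h>

/* Same attribute spelling as in the quoted struct; fields are made up. */
struct entry {
	unsigned long count;	/* hypothetical member */
} __attribute__((__aligned__((1 << (4)))));

/* True if `p` sits on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}

/* A local of the over-aligned type must get an aligned stack slot. */
static int local_entry_is_aligned(void)
{
	struct entry e = { 0 };

	return is_aligned_16(&e);
}
```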
On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:
> > This modification leads to a hard to solve problem in the kernel
> > function graph tracer which assumes that the stack looks like:
> >
> >    return address
> >    saved ebp
>
> Umm. But it still does, doesn't it? That
>
>
On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
Unfortunately, this is the only fix we have:
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b416512..cd39064 100644
--- a/ker
On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
> Thomas Gleixner wrote:
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
And how do we do that? The hooks that are in place have no idea of what
happened before they were called?
-- Steve