Matthew Dillon <[EMAIL PROTECTED]> wrote:
>:I'm not sure there's any reason why you shouldn't. If you changed the
>:semantics of a stack segment so that memory addresses below the stack
>:pointer were irrelevant, you could implement a small, 0-cycle, on-chip
>:stack (that overflowed into memory).
>
> This would be relatively complex and also results in cache coherency
> problems.

I agree that there would be additional complexity. I believe that the
`on-chip stack cache' part has been implemented on some Forth chips
(where stack performance is rather critical), though I don't know
whether any of them were MP-capable.

My reason for suggesting the change to stack semantics was also to
allow cache line allocation without a memory fetch (i.e., if
SP=0x1000, a push would result in 0xff0..0xfff (or 0xfe0..0xfff)
being allocated as a cache line without bothering to fetch
0xff0..0xffb). I'm not sure whether this change would actually
provide a measurable improvement, though (I suspect that it wouldn't).
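
To make the SP=1000 (hex) example concrete, here is a minimal sketch in
C of how many bytes a push that misses in the cache would have to fetch.
It is only a model under stated assumptions: LINE_SIZE, line_base() and
fetch_bytes_on_push_miss() are hypothetical names, the line size is
assumed to be 16 bytes, and a push is assumed not to cross a line
boundary.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 16u /* assumed line size in bytes; 32 works the same way */

/* Base address of the cache line that contains addr. */
uint32_t line_base(uint32_t addr)
{
    return addr & ~(LINE_SIZE - 1u);
}

/*
 * Bytes that must be fetched from memory when a push of `size` bytes at
 * stack pointer `sp` misses in the cache.
 *
 * dead_below_sp == 0: ordinary write-allocate, the whole line is fetched.
 * dead_below_sp != 0: proposed semantics, bytes below the new sp are
 * dead, so only live bytes in the line that the push does not overwrite
 * (those at or above the old sp) need fetching.
 */
uint32_t fetch_bytes_on_push_miss(uint32_t sp, uint32_t size, int dead_below_sp)
{
    uint32_t new_sp = sp - size;
    uint32_t base = line_base(new_sp);
    uint32_t line_end = base + LINE_SIZE;

    /* Keep the model simple: the push must fit inside one line. */
    assert(size > 0 && size <= LINE_SIZE && line_base(sp - 1u) == base);

    if (!dead_below_sp)
        return LINE_SIZE;
    return line_end - sp; /* live bytes in [sp, line_end) */
}
```

With sp = 0x1000 and a 4-byte push, the ordinary case fetches the whole
16-byte line, while the proposed semantics fetch nothing, because
0xff0..0xffb are dead and 0xffc..0xfff are overwritten by the push.
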
In this case, I believe cache coherency can be bypassed. The stack
segment is only needed on one processor at a time. If there's an
interrupt on that CPU, the on-chip stack would be flushed to memory
so that the memory image is consistent.

At the minimal end, another way of looking at it would be as an
`invisible' branch-and-link register - capable of saving a single
return address as long as nothing else was pushed onto the stack.
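
The behaviour described above (a small on-chip stack that overflows
into memory and is flushed on an interrupt) could be modelled roughly
as follows. This is a software sketch only, not a description of any
real chip: struct stack_model, the sm_* functions, CHIP_SLOTS and
MEM_SLOTS are all hypothetical, and the memory stack is modelled as an
array growing upwards for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHIP_SLOTS 4  /* assumed on-chip capacity */
#define MEM_SLOTS  64 /* assumed size of the in-memory stack */

struct stack_model {
    uint32_t chip[CHIP_SLOTS]; /* newest entries; chip[0] is the oldest */
    size_t   chip_count;
    uint32_t mem[MEM_SLOTS];   /* memory image; mem[0] is the bottom */
    size_t   mem_count;
};

/* Move the oldest on-chip entry out to the memory image. */
void sm_spill_oldest(struct stack_model *s)
{
    assert(s->chip_count > 0 && s->mem_count < MEM_SLOTS);
    s->mem[s->mem_count++] = s->chip[0];
    memmove(&s->chip[0], &s->chip[1],
            (s->chip_count - 1) * sizeof s->chip[0]);
    s->chip_count--;
}

/* Push stays on chip; a full on-chip stack overflows into memory. */
void sm_push(struct stack_model *s, uint32_t v)
{
    if (s->chip_count == CHIP_SLOTS)
        sm_spill_oldest(s);
    s->chip[s->chip_count++] = v;
}

/* Pop from the chip if possible, otherwise from the memory image. */
uint32_t sm_pop(struct stack_model *s)
{
    if (s->chip_count > 0)
        return s->chip[--s->chip_count];
    assert(s->mem_count > 0);
    return s->mem[--s->mem_count];
}

/* On an interrupt: write everything out so memory is consistent. */
void sm_flush(struct stack_model *s)
{
    while (s->chip_count > 0)
        sm_spill_oldest(s);
}
```

Setting CHIP_SLOTS to 1 gives the minimal case: a single value (such as
one return address) is held on chip until something else is pushed.
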
> A solution already exists: It's called branch-and-link,

One case where the IBM/360 accidentally got it right :-).

> but Intel CPUs do not use it because Intel CPUs do not have enough
> registers (makes you just want to throw up -- all that MMX junk and they
> couldn't add a branch-and-link register!).

But all that MMX junk makes Doom (or whatever) look much better
and that's far more critical :-).

> The key with branch-and-link
> is that the lowest subroutine level does not have to save/restore the
> register, making entry and return two or three times faster than
> subroutine calls that make other subroutine calls.

I seem to recall reading somewhere that leaf subroutine performance
is also fairly important for overall performance (though that may
have been before C compilers learnt how to inline functions).
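
As a rough illustration of why leaf routines benefit from
branch-and-link, the sketch below counts only the memory accesses spent
on return addresses for a small call tree. It is a toy cost model, not a
timing claim: struct func and both counting functions are hypothetical,
and pipeline and cache effects are ignored.

```c
#include <assert.h>

#define MAX_CALLEES 4 /* assumed limit, just to keep the model static */

struct func {
    int ncallees;
    const struct func *callees[MAX_CALLEES];
};

/*
 * Push/pop convention: every call stores the return address on the
 * stack and every return loads it back, leaf or not.
 */
unsigned stack_call_mem_ops(const struct func *f)
{
    unsigned ops = 2; /* one store on call, one load on return */
    for (int i = 0; i < f->ncallees; i++)
        ops += stack_call_mem_ops(f->callees[i]);
    return ops;
}

/*
 * Branch-and-link convention: the return address lands in a register.
 * Only a routine that makes calls itself has to save and restore it.
 */
unsigned link_call_mem_ops(const struct func *f)
{
    unsigned ops = (f->ncallees > 0) ? 2 : 0; /* save + restore if non-leaf */
    for (int i = 0; i < f->ncallees; i++)
        ops += link_call_mem_ops(f->callees[i]);
    return ops;
}
```

For a routine that calls three leaf routines, this model counts eight
return-address memory accesses with push/pop and two with
branch-and-link; a lone leaf call costs two and zero respectively.
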
Peter