Re: [PATCH 2/4] x86: Add -mfunction-return=

H.J. Lu Sat, 13 Jan 2018 11:10:17 -0800

On Sat, Jan 13, 2018 at 9:31 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> On Sat, Jan 13, 2018 at 8:51 AM, Woodhouse, David <d...@amazon.co.uk> wrote:
>> > On Sat, 2018-01-13 at 08:09 -0800, H.J. Lu wrote:
>> >>
>> >> > Again please extend both documentation hunks so it is clear what is 
>> >> > purpose
>> >> > of this hack.
>> >>
>> >> David, can you help here?
>> >
>> > On most older CPUs the indirect branch issue is limited to actual
>> > indirect branches.
>> >
>> > On Skylake-era CPUs, however, an underflow of the RSB (return stack
>> > buffer) caused by a call/ret imbalance (such as on context switch) will
>> > cause predictions to come from the same problematic branch predictor —
>> > essentially, allowing 'ret' instructions to be targeted by an attacker
>> > in precisely the same way as indirect branches.
>> >
>> > Note that there are plenty of other causes for RSB underflow. Like
>> > taking an SMI, which clears the RSB completely. Or various other
>> > things. Including a call stack deeper than 16 function calls.
>> >
>> > The -mfunction-return option was an experiment to use the retpoline
>> > approach for 'ret' too. I forget the implementation (I could look
>> > upthread), but essentially it was equivalent to replacing ret with
>> > 'pop %r12; jmp __x86_indirect_thunk_r12' so that you *never* deplete
>> > the RSB because of the 'call;ret' trick in the retpoline itself. Hence
>> > your exposure on Skylake was reduced to the possibility of taking an
>> > SMI while *in* the retpoline.
>>
>> RCX/ECX is a scratch register for both 32-bit and 64-bit.  Is it OK
>> to use it for "ret":
>>
>> pop %rcx
>> jmp __x86_indirect_thunk_rcx


There is no need for that since the return address is already on top
of stack.  We can just do

   __x86_return_thunk:
            call L2
    L1:
            pause
            jmp L1
    L2:
            lea 8(%rsp), %rsp|lea 4(%esp), %esp
            ret

ret becomes: jmp __x86_return_thunk as my current patch does.

> Is it also safe for local functions and IPA-ra?

Yes.

> Also what will your patchset do with large code model? Perhaps we want
> to sorry there that it is not suported.

True.  I will prepare a separate patch for it.

> Honza
>>
>> > This would, of course, be forcing a mispredict/pipeline stall on every
>> > 'ret', rather than only on every indirect branch as in the original
>> > retpoline idea. HJ added the code, but I'm not sure anyone at Intel
>> > ever did actually do the *testing* to establish the performance
>> > characteristics. Dave/Arjan?
>> >
>> > For my part, right *now* the kernel doesn't use this option. But then,
>> > we don't have a comprehensive answer for Skylake yet other than "use
>> > the new microcode features". Which are slower than retpoline, but not
>> > as *much* slower on Skylake as they are on other CPUs.
>> >
>> > Amazon Web Services UK Limited. Registered in England and Wales with 
>> > registration number 08650665 and which has its registered office at 60 
>> > Holborn Viaduct, London EC1A 2FD, United Kingdom.
>>
>>
>>
>> --
>> H.J.



-- 
H.J.

Re: [PATCH 2/4] x86: Add -mfunction-return=

Reply via email to