On Sat, Jan 13, 2018 at 9:31 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >> On Sat, Jan 13, 2018 at 8:51 AM, Woodhouse, David <d...@amazon.co.uk> wrote: >> > On Sat, 2018-01-13 at 08:09 -0800, H.J. Lu wrote: >> >> >> >> > Again please extend both documentation hunks so it is clear what is >> >> > purpose >> >> > of this hack. >> >> >> >> David, can you help here? >> > >> > On most older CPUs the indirect branch issue is limited to actual >> > indirect branches. >> > >> > On Skylake-era CPUs, however, an underflow of the RSB (return stack >> > buffer) caused by a call/ret imbalance (such as on context switch) will >> > cause predictions to come from the same problematic branch predictor — >> > essentially, allowing 'ret' instructions to be targeted by an attacker >> > in precisely the same way as indirect branches. >> > >> > Note that there are plenty of other causes for RSB underflow. Like >> > taking an SMI, which clears the RSB completely. Or various other >> > things. Including a call stack deeper than 16 function calls. >> > >> > The -mfunction-return option was an experiment to use the retpoline >> > approach for 'ret' too. I forget the implementation (I could look >> > upthread), but essentially it was equivalent to replacing ret with >> > 'pop %r12; jmp __x86_indirect_thunk_r12' so that you *never* deplete >> > the RSB because of the 'call;ret' trick in the retpoline itself. Hence >> > your exposure on Skylake was reduced to the possibility of taking an >> > SMI while *in* the retpoline. >> >> RCX/ECX is a scratch register for both 32-bit and 64-bit. Is it OK >> to use it for "ret": >> >> pop %rcx >> jmp __x86_indirect_thunk_rcx
There is no need for that since the return address is already on top of stack. We can just do __x86_return_thunk: call L2 L1: pause jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret ret becomes: jmp __x86_return_thunk as my current patch does. > Is it also safe for local functions and IPA-ra? Yes. > Also what will your patchset do with large code model? Perhaps we want > to sorry there that it is not suported. True. I will prepare a separate patch for it. > Honza >> >> > This would, of course, be forcing a mispredict/pipeline stall on every >> > 'ret', rather than only on every indirect branch as in the original >> > retpoline idea. HJ added the code, but I'm not sure anyone at Intel >> > ever did actually do the *testing* to establish the performance >> > characteristics. Dave/Arjan? >> > >> > For my part, right *now* the kernel doesn't use this option. But then, >> > we don't have a comprehensive answer for Skylake yet other than "use >> > the new microcode features". Which are slower than retpoline, but not >> > as *much* slower on Skylake as they are on other CPUs. >> > >> > Amazon Web Services UK Limited. Registered in England and Wales with >> > registration number 08650665 and which has its registered office at 60 >> > Holborn Viaduct, London EC1A 2FD, United Kingdom. >> >> >> >> -- >> H.J. -- H.J.