On Fri, Mar 19, 2021 at 05:18:14PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 18 March 2021 17:11
> > 
> > Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> > in alternatives"), it is possible to simplify the retpolines.
> > 
> ...
> > Notice that since the longest alternative sequence is now:
> > 
> >    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
> >    5:   f3 90                   pause
> >    7:   0f ae e8                lfence
> >    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
> >    c:   48 89 04 24             mov    %rax,(%rsp)
> >   10:   c3                      retq
> > 
> > 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> > if we can shrink the retpoline by 1 byte we can pack it more dense)
> 
> I'm intrigued about the lfence after the pause.
> Clearly this is for very warped cpu behaviour.
> To get to the pause you have to be speculating past an
> unconditional call.

Please read up on retpoline... That's the speculation trap. The warped
CPU behaviour is called Spectre-v2.

For others, the obvious alternative is the below; and possibly we could
then also remove the loop.

The original retpoline, as per Paul's article has: 1: pause; jmp 1b;.
That is, it lacks the LFENCE we have and would also fit 16 bytes.



---
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -15,8 +15,7 @@
        call    .Ldo_rop_\@
 .Lspec_trap_\@:
        UNWIND_HINT_EMPTY
-       pause
-       lfence
+       int3
        jmp .Lspec_trap_\@
 .Ldo_rop_\@:
        mov     %\reg, (%_ASM_SP)
@@ -27,7 +26,7 @@
 .macro THUNK reg
        .section .text.__x86.indirect_thunk
 
-       .align 32
+       .align 16
 SYM_FUNC_START(__x86_indirect_thunk_\reg)
 
        ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \


Reply via email to