On Fri, Sep 05, 2025 at 09:19:29AM -0700, Kees Cook wrote:
> On Fri, Sep 05, 2025 at 10:51:03AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 04, 2025 at 05:24:10PM -0700, Kees Cook wrote:
> > > +- The check-call instruction sequence must be treated as a single unit: it
> > > +  cannot be rearranged or split or optimized. The pattern is that
> > > +  indirect calls, "call *$target", get converted into:
> > > +
> > > +    mov $target_expression, %target ; only present if the expression was
> > > +                                    ; not already %target register
> > > +    load -$offset(%target), %tmp    ; load the typeid hash at target
> > > +    cmp $hash, %tmp                 ; compare expected typeid with loaded
> > > +    je .Lcheck_passed               ; jump to the indirect call
> > > +  .Lkcfi_trap$N:                    ; label of trap insn
> > > +    trap                            ; trap on failure, but arranged so
> > > +                                    ; "permissive mode" falls through
> > > +  .Lkcfi_call$N:                    ; label of call insn
> > > +    call *%target                   ; actual indirect call
> > > +
> > > +  This pattern of call immediately after trap provides for the
> > > +  "permissive" checking mode automatically: the trap gets handled,
> > > +  a warning emitted, and then execution continues after the trap to
> > > +  the call.
> >
> > I know it is far too late to do anything here. But I've recently dug
> > through a bunch of optimization manuals and the like and that Jcc is
> > about as bad as it gets :/
> >
> > The old optimization manual states that forward jumps are assumed
> > not-taken; while backward jumps are assumed taken.
> >
> > The new wisdom is that any Jcc must be assumed not-taken; that is, the
> > fallthrough case has the best throughput.
>
> I would expect the cmp to be the slowest part of this sequence, and I
> figured both the trap and the call to be speculation barriers? I'm
> not sure, though. Is changing the sequence actually useful?

The load can miss, in which case it is definitely the most expensive
thing around.

> > Here we have a forward branch which is assumed taken :-(
>
> The constraints we have are:
>
> - Linux x86 KCFI trap handler decodes the instructions from the trap
>   backwards, but it uses exact offsets (-12 and -6).
> - Control flow following the trap must make the call (for warn-only mode)
>
> If we change this, we'd need to make the insn decoder smarter to likely
> look at the insn AFTER the trap ("is it a direct jump?")
>
> And then use this, which is ugly, but matches second constraint:
>
>         cmp $hash, %tmp
>         jne .Ltrap
> .Lcall:
>         call *%target
>         jmp .Ldone
> .Ltrap:
>         ud2
>         jmp .Lcall
> .Ldone:

Ah, you can do something like:

        cmp $hash, %tmp
        jne +3
        nopl -42(%rax)
        call *%target

which is only 2 bytes longer. Notably, that nopl is 4 bytes and the 4th
byte is 0xd6 (aka UDB). This is an effective UDcc instruction based
around a forward non-taken branch.

But yeah, I don't know if it is worth changing this. It's just that I've
been staring at these things far too much of late :-)
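
For anyone who wants to check the byte arithmetic of the jne/nopl trick,
here is a rough Python sketch. It is not taken from any patch: the helper
names are made up for illustration, and it assumes the usual short
encodings (jne rel8 as 75 <rel8>, nopl disp8(%rax) as 0f 1f 40 <disp8>,
and a 2-byte je plus 2-byte ud2 for the current sequence).

```python
import struct

# Hypothetical helpers; the encodings below are assumptions based on the
# standard x86-64 short forms, not something stated in the thread.

def encode_jne_rel8(rel):
    """jne rel8: opcode 0x75 followed by a signed 8-bit displacement."""
    return bytes([0x75]) + struct.pack("b", rel)

def encode_nopl_rax_disp8(disp):
    """nopl disp8(%rax): 0f 1f 40 followed by a signed 8-bit displacement."""
    return bytes([0x0F, 0x1F, 0x40]) + struct.pack("b", disp)

jne = encode_jne_rel8(3)
nopl = encode_nopl_rax_disp8(-42)
code = jne + nopl

# A rel8 branch is relative to the end of the branch instruction,
# so "jne +3" lands 3 bytes into the nopl.
target = len(jne) + 3

# Assumed size of the current check tail: je rel8 (2) + ud2 (2).
old_tail_len = 2 + 2
extra_bytes = len(code) - old_tail_len

print(code.hex(" "))      # raw bytes of jne +3; nopl -42(%rax)
print(hex(code[target]))  # byte the taken branch lands on
print(extra_bytes)        # size difference against je + ud2
```

The taken (mismatch) path lands on the last byte of the nopl, which is the
0xd6 displacement byte, and whatever follows that byte is the call, so the
"call immediately after trap" layout is preserved. The not-taken (match)
path just executes the nopl as a 4-byte NOP and falls through to the call.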