On March 31, 2026 6:59:06 PM PDT, Xin Li <[email protected]> wrote:
>
>> On Mar 30, 2026, at 11:03 PM, Xin Li <[email protected]> wrote:
>>
>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>
>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>> to fail.
>>>>>>
>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>> ERETU) preserves them.
>>>>>>
>>>>>
>>>>> I don't really like this. I think we have two credible choices:
>>>>>
>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>> R11 and RCX on entry and exit. And update the test to actually test
>>>>> this.
>>>>>
>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>> preserves all registers.
>>>>>
>>>>> I'm in favor of #2. People love making new programming languages and
>>>>> runtimes and inline asm and, these days, vibe coded crap. And it's
>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>> they're clobbered. And it's easy to test on FRED (well, not really,
>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>> crashes sometimes on non-FRED systems. And it will be miserable to
>>>>> debug.
>>>>>
>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>> function calls, so one can get into a situation in which one's
>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>> these registers until an inlining decision changes or some code gets
>>>>> reordered, and then it will start failing. And making the failure
>>>>> depend on hardware details is just nasty.)
>>>>>
>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>> on FRED to match non-FRED.
>>>>
>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>> FRED systems is by far the safest choice.
>>>>
>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>> userspace state between machines might be 'surprised'.
>>>
>>> Thanks Andy and Peter.
>>>
>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>> is not a good practice. The selftest should validate ABI consistency.
>>>
>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>> syscall entry implementation.
>>>
>>> Li Xin, does this direction look right to you? I can assist with
>>> validation and keep the selftest aligned with the agreed ABI.
>>>
>>
>> Yes, consistency should take precedence over hardware-specific variations.
>>
>> I would like to hear from Andrew Cooper and hpa before we do it.
>
>Per Andy’s suggestion, the change would be:
>
>diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>index 88c757ac8ccd..a19898747a2c 100644
>--- a/arch/x86/entry/entry_fred.c
>+++ b/arch/x86/entry/entry_fred.c
>@@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
> {
> 	/* The compiler can fold these conditions into a single test */
> 	if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>+		regs->cx = regs->ip;
>+		regs->r11 = regs->flags;
>+
> 		regs->orig_ax = regs->ax;
> 		regs->ax = -ENOSYS;
> 		do_syscall_64(regs, regs->orig_ax);
>
>It adds 4 extra MOVs on this hot path, but I don’t see it's a problem here.
>
We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
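
For anyone following along, the userspace hazard Andy describes above can be sketched as follows. This is only an illustration, assuming x86-64 Linux and GCC/Clang extended asm; raw_syscall0 is a hypothetical name and is not taken from the selftest or the kernel tree. The point is the clobber list: on non-FRED hardware the SYSCALL instruction overwrites RCX with the return RIP and R11 with RFLAGS, so code that omits these two clobbers can appear to work until the compiler happens to keep a live value in one of those registers.

```c
#include <sys/syscall.h>	/* SYS_getpid */
#include <unistd.h>		/* getpid() */

/*
 * Hypothetical helper (illustration only, x86-64 Linux): issue a
 * zero-argument system call with a raw SYSCALL instruction.
 */
static inline long raw_syscall0(long nr)
{
	long ret;

	__asm__ volatile("syscall"
			 : "=a"(ret)		/* return value comes back in RAX */
			 : "a"(nr)		/* syscall number goes in RAX */
			 : "rcx", "r11",	/* overwritten by SYSCALL on non-FRED */
			   "memory");		/* the kernel may read or write memory */
	return ret;
}
```

With the clobbers declared, a call such as raw_syscall0(SYS_getpid) should be correct under either ABI choice discussed in this thread. Dropping "rcx" and "r11" from the list is the mistake that would only show up on hardware or kernels where the registers are actually overwritten.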

