> On Oct 4, 2019, at 6:52 AM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>
> On Fri, 4 Oct 2019 at 15:42, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
>>
>>> On Thu, Oct 03, 2019 at 10:43:29AM +0200, Ard Biesheuvel wrote:
>>> On Wed, 2 Oct 2019 at 16:17, Ard Biesheuvel <ard.biesheu...@linaro.org>
>>> wrote:
>>>>
>>> ...
>>>>
>>>> In the future, I would like to extend these interfaces to use static calls,
>>>> so that the accelerated implementations can be [un]plugged at runtime. For
>>>> the time being, we rely on weak aliases and conditional exports so that the
>>>> users of the library interfaces link directly to the accelerated versions,
>>>> but without the ability to unplug them.
>>>>
>>>
>>> As it turns out, we don't actually need static calls for this.
>>> Instead, we can simply permit weak symbol references to go unresolved
>>> between modules (as we already do in the kernel itself, due to the
>>> fact that ELF permits it), and have the accelerated code live in
>>> separate modules that may not be loadable on certain systems, or be
>>> blacklisted by the user.
>>
>> You're saying that at module insertion time, the kernel will override
>> weak symbols with those provided by the module itself? At runtime?
>>
>
> Yes.
>
>> Do you know offhand how this patching works? Is there a PLT that gets
>> patched, and so the calls all go through a layer of function pointer
>> indirection? Or are all call sites fixed up at insertion time and the
>> call instructions rewritten with some runtime patching magic?
>>
>
> No magic. Take curve25519 for example, when built for ARM:
>
> 00000000 <curve25519>:
> 0: f240 0300 movw r3, #0
> 0: R_ARM_THM_MOVW_ABS_NC curve25519_arch
> 4: f2c0 0300 movt r3, #0
> 4: R_ARM_THM_MOVT_ABS curve25519_arch
> 8: b570 push {r4, r5, r6, lr}
> a: 4604 mov r4, r0
> c: 460d mov r5, r1
> e: 4616 mov r6, r2
> 10: b173 cbz r3, 30 <curve25519+0x30>
> 12: f7ff fffe bl 0 <curve25519_arch>
> 12: R_ARM_THM_CALL curve25519_arch
> 16: b158 cbz r0, 30 <curve25519+0x30>
> 18: 4620 mov r0, r4
> 1a: 2220 movs r2, #32
> 1c: f240 0100 movw r1, #0
> 1c: R_ARM_THM_MOVW_ABS_NC .LANCHOR0
> 20: f2c0 0100 movt r1, #0
> 20: R_ARM_THM_MOVT_ABS .LANCHOR0
> 24: f7ff fffe bl 0 <__crypto_memneq>
> 24: R_ARM_THM_CALL __crypto_memneq
> 28: 3000 adds r0, #0
> 2a: bf18 it ne
> 2c: 2001 movne r0, #1
> 2e: bd70 pop {r4, r5, r6, pc}
> 30: 4632 mov r2, r6
> 32: 4629 mov r1, r5
> 34: 4620 mov r0, r4
> 36: f7ff fffe bl 0 <curve25519_generic>
> 36: R_ARM_THM_CALL curve25519_generic
> 3a: e7ed b.n 18 <curve25519+0x18>
>
> curve25519_arch is a weak reference. It either gets satisfied at
> module load time, or it doesn't.
>
> If it does get satisfied, the relocations covering the movw/movt pair
> and the one covering the bl instruction get updated so that they point
> to the arch routine.
>
> If it does not get satisfied, the relocations are disregarded, in
> which case the cbz instruction at offset 0x10 jumps over the bl call.
>
> Note that this does not involve any memory accesses. It does involve
> some code patching, but only of the kind the module loader already
> does.
Won’t this have the counterintuitive property that, if you load the modules in
the opposite order, the reference won’t be re-resolved and performance will
silently regress?
I think it might be better to allow two different modules to export the same
symbol but only allow one of them to be loaded. Or use static calls.