> On Oct 5, 2019, at 12:24 AM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>
> On Fri, 4 Oct 2019 at 16:56, Ard Biesheuvel <ard.biesheu...@linaro.org>
> wrote:
>>
>>> On Fri, 4 Oct 2019 at 16:53, Andy Lutomirski <l...@amacapital.net> wrote:
>>>
>>>
>>>
>>>> On Oct 4, 2019, at 6:52 AM, Ard Biesheuvel <ard.biesheu...@linaro.org>
>>>> wrote:
>>>>
>>>> On Fri, 4 Oct 2019 at 15:42, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
>>>>>
>>>>>> On Thu, Oct 03, 2019 at 10:43:29AM +0200, Ard Biesheuvel wrote:
>>>>>> On Wed, 2 Oct 2019 at 16:17, Ard Biesheuvel <ard.biesheu...@linaro.org>
>>>>>> wrote:
>>>>>>>
>>>>>> ...
>>>>>>>
>>>>>>> In the future, I would like to extend these interfaces to use static
>>>>>>> calls,
>>>>>>> so that the accelerated implementations can be [un]plugged at runtime.
>>>>>>> For
>>>>>>> the time being, we rely on weak aliases and conditional exports so that
>>>>>>> the
>>>>>>> users of the library interfaces link directly to the accelerated
>>>>>>> versions,
>>>>>>> but without the ability to unplug them.
>>>>>>>
>>>>>>
>>>>>> As it turns out, we don't actually need static calls for this.
>>>>>> Instead, we can simply permit weak symbol references to go unresolved
>>>>>> between modules (as we already do in the kernel itself, due to the
>>>>>> fact that ELF permits it), and have the accelerated code live in
>>>>>> separate modules that may not be loadable on certain systems, or be
>>>>>> blacklisted by the user.
>>>>>
>>>>> You're saying that at module insertion time, the kernel will override
>>>>> weak symbols with those provided by the module itself? At runtime?
>>>>>
>>>>
>>>> Yes.
>>>>
>>>>> Do you know offhand how this patching works? Is there a PLT that gets
>>>>> patched, and so the calls all go through a layer of function pointer
>>>>> indirection? Or are all call sites fixed up at insertion time and the
>>>>> call instructions rewritten with some runtime patching magic?
>>>>>
>>>>
>>>> No magic. Take curve25519 for example, when built for ARM:
>>>>
>>>> 00000000 <curve25519>:
>>>> 0: f240 0300 movw r3, #0
>>>> 0: R_ARM_THM_MOVW_ABS_NC curve25519_arch
>>>> 4: f2c0 0300 movt r3, #0
>>>> 4: R_ARM_THM_MOVT_ABS curve25519_arch
>>>> 8: b570 push {r4, r5, r6, lr}
>>>> a: 4604 mov r4, r0
>>>> c: 460d mov r5, r1
>>>> e: 4616 mov r6, r2
>>>> 10: b173 cbz r3, 30 <curve25519+0x30>
>>>> 12: f7ff fffe bl 0 <curve25519_arch>
>>>> 12: R_ARM_THM_CALL curve25519_arch
>>>> 16: b158 cbz r0, 30 <curve25519+0x30>
>>>> 18: 4620 mov r0, r4
>>>> 1a: 2220 movs r2, #32
>>>> 1c: f240 0100 movw r1, #0
>>>> 1c: R_ARM_THM_MOVW_ABS_NC .LANCHOR0
>>>> 20: f2c0 0100 movt r1, #0
>>>> 20: R_ARM_THM_MOVT_ABS .LANCHOR0
>>>> 24: f7ff fffe bl 0 <__crypto_memneq>
>>>> 24: R_ARM_THM_CALL __crypto_memneq
>>>> 28: 3000 adds r0, #0
>>>> 2a: bf18 it ne
>>>> 2c: 2001 movne r0, #1
>>>> 2e: bd70 pop {r4, r5, r6, pc}
>>>> 30: 4632 mov r2, r6
>>>> 32: 4629 mov r1, r5
>>>> 34: 4620 mov r0, r4
>>>> 36: f7ff fffe bl 0 <curve25519_generic>
>>>> 36: R_ARM_THM_CALL curve25519_generic
>>>> 3a: e7ed b.n 18 <curve25519+0x18>
>>>>
>>>> curve25519_arch is a weak reference. It either gets satisfied at
>>>> module load time, or it doesn't.
>>>>
>>>> If it does get satisfied, the relocations covering the movw/movt pair
>>>> and the one covering the bl instruction get updated so that they point
>>>> to the arch routine.
>>>>
>>>> If it does not get satisfied, the relocations are disregarded, in
>>>> which case the cbz instruction at offset 0x10 jumps over the bl call.
>>>>
>>>> Note that this does not involve any memory accesses. It does involve
>>>> some code patching, but only of the kind the module loader already
>>>> does.
>>>
>>> Won’t this have the counterintuitive property that, if you load the modules
>>> in the opposite order, the reference won’t be re-resolved and performance
>>> will silently regress?
>>>
>>
>> Indeed, the arch module needs to be loaded first
>>
>
> Actually, this can be addressed by retaining the module dependencies
> as before, but permitting the arch module to be omitted at load time.
I think that, to avoid surprises, you should refuse to load the arch module if
the generic module is loaded, too.
>
>>> I think it might be better to allow two different modules to export the
>>> same symbol but only allow one of them to be loaded.
>>
>> That is what I am doing for chacha and poly
>>
>>> Or use static calls.
>
> Given that static calls don't actually exist yet, I propose to proceed
> with the approach above, and switch to static calls once all
> architectures where it matters have an implementation that does not
> use function pointers (which is how static calls will be implemented
> generically)