Eric, Jason,

Thanks for the fast replies!
On 30/11/2025, 00:22, "Eric Biggers" <[email protected]> wrote:

> I think you may be underestimating how much the requirements of the
> kernel differ from userspace.

There is no doubt this is the case -- I am not a kernel guy -- so the
points you raise are very valuable. Equally, you may be underestimating
how much work it takes to go from static, verification-only code to
something the community can work with and extend in the future. There
is clearly an opportunity to learn from each other here. If this patch
becomes the 'mldsa-v1' for the kernel, it would be great to work
together and see whether an 'mldsa-v2' could come from mldsa-native.

> In none of them has the kernel community been successful with
> integrating a project wholesale, vs. just taking individual files.

I take that as a challenge. With AWS-LC we were also told that
mlkem-native would never be integrated wholesale -- and now it is. It's
a matter of goodwill and collaboration, not a binary yes/no: even if a
small number of selected patches are needed, that is still better than
an entirely separate implementation, in my mind.

> - Kernel stack is 8 KB to 16 KB. ...

Yes, as mentioned we have started working on (a) bringing the memory
usage down and (b) making the use of heap/stack configurable.

> - Vector registers (e.g. AVX) can be used in the kernel only in some
>   contexts, and only when they are explicitly saved and restored. So
>   we have to do our own integration of any code that uses them anyway.
>   There is also more overhead to each vector-optimized function than
>   there is in userspace, so very fine-grained optimization (e.g. as is
>   used in the Dilithium reference code) doesn't work too well.

That's very useful -- can you say more? Would one want some sort of
configurable preamble/postamble in the top-level API that takes care of
the necessary save/restore logic? What is the per-function overhead?
> - The vector intrinsics like <immintrin.h> can't be used in the
>   kernel, as they depend on userspace headers. Thus, vector
>   instructions can generally be used only in assembly code. I believe
>   this problem is solvable with a combination of changes to GCC,
>   clang, and the kernel, and I'd like to see that happen. But someone
>   would need to do it.

The use of intrinsics is on the way out; the kernel isn't the only
project that can't use them. Assembly also suits the optimization and
verification approach of mlkem-native and mldsa-native better: we
superoptimize some assembly using SLOTHY
(https://github.com/slothy-optimizer/slothy/) and then do 'post-hoc'
verification of the final object code using the HOL-Light/s2n-bignum
(https://github.com/awslabs/s2n-bignum/) infrastructure. In
mlkem-native, all AArch64 assembly is developed and verified in this
way; in mldsa-native, we just completed the verification of the AVX2
assembly for the base multiplication and the NTT.

> Note that the kernel already has optimized Keccak code. That already
> covers the most performance-critical part of ML-DSA.

No, this would need _batched_ Keccak. An ML-DSA implementation using
only 1x-Keccak will never have competitive performance. See
https://github.com/pq-code-package/mldsa-native/pull/754 for the
performance loss from using unbatched Keccak only, on a variety of
platforms; it is more than 2x on some. In turn, if you want to
integrate batched Keccak -- but perhaps only on some platforms? -- you
need to restructure your entire code to make use of it. That is not a
simple change, and it is part of what I mean when I say that the
challenges are merely deferred. Note that the official reference and
AVX2 implementations duck this problem by duplicating the code and
adjusting it, rather than looking for a common structure that could
host both 'plain' and batched Keccak. I assume the amount of code
duplication this brings would be unacceptable.

On 30/11/2025, 01:06, "Jason A. Donenfeld" <[email protected]> wrote:

> I've added a bit of formally verified code to the kernel, and also
> ported some userspace crypto. In these cases, I wound up working with
> the authors of the code to make it more suitable to the requirements
> of kernel space -- even down to the formatting level. For example, the
> HACL* project needed some changes to KReMLin to make the variety of
> code fit into what the kernel expected. Andy Polyakov's code needed
> some internal functions exposed so that the kernel could do cpu
> capability based dispatch. And so on and so forth. There's always
> _something_.

100%. This is where we need support from someone in the kernel to even
know what needs doing. The caveat Eric mentioned regarding SIMD usage
is a good example. CPU-capability-based dispatch, for instance, was
something we flushed out during the AWS-LC integration: dispatch is now
configurable.

> If those are efforts you'd consider undertaking seriously, I'd be
> happy to assist or help guide the considerations.

We are taking mlkem/mldsa-native seriously and want to make them as
usable as possible. So, regardless of whether they ultimately end up in
the kernel, any support of the form "if you wanted to integrate this in
environment XXX [like the kernel], you would need ..." is very useful,
and we would be grateful for it. I don't expect this to be something we
can rush through in a couple of days, but rather something achieved
through steady progress and collaboration.

> Anyway, the bigger picture is that I'm very enthusiastic about getting
> formally verified crypto in the kernel, so these types of efforts are
> really very appreciated and welcomed. But it just takes a bit more
> work than usual.

Thank you, Jason, this is great to hear. If you had time to work with
us, we would really appreciate it.

Thanks,
Hanno & Matthias
