Eric, Jason,

Thanks for the fast replies!
On 30/11/2025, 00:22, "Eric Biggers" <[email protected]> wrote:

> I think you may be underestimating how much the requirements of the
> kernel differ from userspace.

There is no doubt this is the case -- I am not a kernel guy -- so the
points you raise are very valuable. Equally, you may be underestimating
how much work it takes to go from static, verification-only code to
something the community can work with and extend in the future. There
is clearly an opportunity to learn from each other here. If this patch
becomes the 'mldsa-v1' for the kernel, it would be great to work
together and see whether an 'mldsa-v2' could come from mldsa-native.

> In none of them has the kernel community been successful with
> integrating a project wholesale, vs. just taking individual files.

I take that as a challenge. With AWS-LC we were also told that
mlkem-native would never be integrated wholesale -- and now it is. It's
a matter of goodwill and collaboration, not a binary yes/no: even if a
small number of selected patches are needed, that is still better than
an entirely separate implementation, in my mind.

> - Kernel stack is 8 KB to 16 KB. ...

Yes, as mentioned we have started working on (a) bringing the memory
usage down and (b) making the use of heap/stack configurable.

> - Vector registers (e.g. AVX) can be used in the kernel only in some
>   contexts, and only when they are explicitly saved and restored. So
>   we have to do our own integration of any code that uses them anyway.
>   There is also more overhead to each vector-optimized function than
>   there is in userspace, so very fine-grained optimization (e.g. as is
>   used in the Dilithium reference code) doesn't work too well.

That's very useful -- can you say more? Would one want some sort of
configurable preamble/postamble in the top-level API that takes care of
the necessary save/restore logic? What is the per-function overhead?
> - The vector intrinsics like <immintrin.h> can't be used in the
>   kernel, as they depend on userspace headers. Thus, vector
>   instructions can generally be used only in assembly code. I believe
>   this problem is solvable with a combination of changes to GCC,
>   clang, and the kernel, and I'd like to see that happen. But someone
>   would need to do it.

The use of intrinsics is on the way out; the kernel isn't the only
project that can't use them. Assembly also suits the optimization and
verification approach of mlkem-native and mldsa-native better: we
superoptimize some assembly using SLOTHY
(https://github.com/slothy-optimizer/slothy/) and then do 'post-hoc'
verification of the final object code using the HOL-Light/s2n-bignum
(https://github.com/awslabs/s2n-bignum/) infrastructure. In
mlkem-native, all AArch64 assembly is developed and verified in this
way; in mldsa-native, we just completed the verification of the AVX2
assembly for the base multiplication and the NTT.

> Note that the kernel already has optimized Keccak code. That already
> covers the most performance-critical part of ML-DSA.

No, this would need _batched_ Keccak. An ML-DSA implementation using
only 1x-Keccak will never have competitive performance. See
https://github.com/pq-code-package/mldsa-native/pull/754 for the
performance loss from using unbatched Keccak only, on a variety of
platforms; it is more than 2x on some. In turn, if you want to
integrate batched Keccak -- but perhaps only on some platforms? -- you
need to restructure your entire code to make use of it. That is not a
simple change, and it is part of what I mean when I say that the
challenges are merely deferred. Note that the official reference and
AVX2 implementations duck this problem by duplicating the code and
adjusting it, rather than looking for a common structure that could
host both 'plain' and batched Keccak. I assume the amount of code
duplication this brings would be unacceptable.

On 30/11/2025, 01:06, "Jason A. Donenfeld" <[email protected]> wrote:

> I've added a bit of formally verified code to the kernel, and also
> ported some userspace crypto. In these cases, I wound up working with
> the authors of the code to make it more suitable to the requirements
> of kernel space -- even down to the formatting level. For example, the
> HACL* project needed some changes to KReMLin to make the variety of
> code fit into what the kernel expected. Andy Polyakov's code needed
> some internal functions exposed so that the kernel could do cpu
> capability based dispatch. And so on and so forth. There's always
> _something_.

100%. This is where we need support from someone in the kernel to even
know what needs doing. The caveat Eric mentioned regarding SIMD usage
is a good example. CPU-capability-based dispatch, for instance, was
something we flushed out during the AWS-LC integration: dispatch is now
configurable.

> If those are efforts you'd consider undertaking seriously, I'd be
> happy to assist or help guide the considerations.

We are taking mlkem/mldsa-native seriously and want to make them as
usable as possible. So, regardless of whether they ultimately end up in
the kernel, any support of the form "if you wanted to integrate this in
environment XXX [like the kernel], you would need ..." is very useful,
and we would be grateful for it. I don't expect this to be something we
can rush through in a couple of days, but rather something achieved
through steady progress and collaboration.

> Anyway, the bigger picture is that I'm very enthusiastic about getting
> formally verified crypto in the kernel, so these types of efforts are
> really very appreciated and welcomed. But it just takes a bit more
> work than usual.

Thank you, Jason, this is great to hear. If you had time to work with
us, we would really appreciate it.

Thanks,
Hanno & Matthias
