Hello OpenBSD developers,

Let me state in advance that I'm not entirely sure if I'm sending this
to the right mailing list. Based on the descriptions, however, I do
believe tech@ is the correct one instead of misc@. Furthermore, I'm not
subscribed to any of the lists, so please CC me if you should want to
reply. Finally, I've sent this message before from another account, but
it didn't appear in any of the archives. If it did reach the list, I
apologize. In the meantime, I did write some extra things that could be
interesting.

I've been looking at the way several (BSD) operating systems implement
random numbers, in the context of an online election system (what to do
with an equal number of votes per seat, etc.). My current implementation
reads a byte from /dev/random and converts it to the required range,
possibly reading more bytes to avoid the systematic bias a naive
implementation has for ranges that are not a power of 2. It works
correctly, but there are some considerations (transparency and chroot
jail compatibility) that have led me to look at alternative
implementations. This is where arc4random_uniform comes in. The way it
avoids biased results is different from mine (my implementation uses the
high-order bits instead), but the main idea of throwing away results
that are outside the range that is a whole multiple of the desired range
is exactly the same. It might be a viable alternative if I can convince
myself (and others) that the values it generates do not favor one party
over the other, no matter how slightly. Obviously, the criteria are
different from those applied to cryptography, but I don't consider
myself competent to express them formally.

This is why I compared the function in lib/libc/crypt/arc4random.c to
that in sys/dev/rnd.c. I was surprised to find that the 32-bit
arithmetic in the libc version differs from that in the kernel version.
Apparently, back in 2008 some change was only applied to the kernel, not
to libc.

This is what the kernel source code says:
min = ((0xffffffff - upper_bound) + 1) % upper_bound;

This is what the libc source code says:
min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;

Now, I'm not a math geek, but it seems pretty obvious to me that the
version in the kernel source is to be preferred, given that it's simpler
and does exactly what it has to do. I'm assuming this is also the reason
the change was implemented, and that's what the commit log seems to say
as well. So, I wonder why the libc source code was not changed. Was this
an oversight?

Secondly, the kernel code has the added advantage that it works for any
possible value of upper_bound (you can check this for yourself if you
don't believe me). This means that the upper_bound > 0x80000000 check
can be removed, making the common case faster (upper_bound <= 0x80000000
is definitely the common case in my book) and everything easier to
understand and cleaner. Personally I don't see any real drawbacks, but
feel free to disagree. ;)

There are also other possible implementations that are even shorter,
although they aren't necessarily easier to understand. In summary, the
part between #else and #endif could be replaced by any of these lines:
min = ((0xffffffff - upper_bound) + 1) % upper_bound;
min = (1 + ~upper_bound) % upper_bound;
min = (0 - upper_bound) % upper_bound;
min = -upper_bound % upper_bound;

The latter two implementations work correctly because the C standard
defines modulo arithmetic for unsigned integers. Replace 0 by 2**32 and
you'll see why it does what it's supposed to. I'm not entirely sure the
last implementation would work on a machine that doesn't use two's
complement. The standard is vague in its wording.

Another option still would be to replace the entire #if block by this:
min = (0x100000000 - upper_bound) % upper_bound;

This will continue to work correctly even if int ever becomes 64-bit.

Kind regards,

Jorden Verwer

Reply via email to