[+djm, who wrote this code] I agree that simply "min = -upper_bound % upper_bound" should be sufficient in all cases, since u_int32_t arithmetic is defined as modulo 2**32 by the C standard, at least as of C99 and I think C89 too. (Even if we supported any 1s-complement architectures, the compiler would still need to implement u_int32_t as modulo 2**32.)
I also think it makes sense to get rid of the LP64 test, because 64-bit division still takes more than twice as long as 32-bit division on most amd64 processors for example (according to http://gmplib.org/~tege/x86-timing.pdf). Of course, the potential benefit here isn't that great, so I don't know whether this really makes sense to worry about. On Fri, Jun 8, 2012 at 5:13 AM, Jorden Verwer <jor...@vssd.nl> wrote: > Hello OpenBSD developers, > > Let me state in advance that I'm not entirely sure if I'm sending this > to the right mailing list. Based on the descriptions, however, I do > believe tech@ is the correct one instead of misc@. Furthermore, I'm not > subscribed to any of the lists, so please CC me if you should want to > reply. Finally, I've sent this message before from another account, but > it didn't appear in any of the archives. If it did reach the list, I > apologize. In the meantime, I did write some extra things that could be > interesting. > > I've been looking at the way several (BSD) operating systems implement > random numbers, in the context of an online election system (what to do > with an equal number of votes per seat, etc.). My current implementation > reads a byte from /dev/random and converts it to the required range, > possibly reading more bytes to avoid the systematic bias a naive > implementation has for ranges that are not a power of 2. It works > correctly, but there are some considerations (transparency and chroot > jail compatibility) that have led me to look at alternative > implementations. This is where arc4random_uniform comes in. The way it > avoids biased results is different from mine (my implementation uses the > high-order bits instead), but the main idea of throwing away results > that are outside the range that is a whole multiple of the desired range > is exactly the same. It might be a viable alternative if I can convince > myself (and others) that the values it generates do not favor one party > over the other, no matter how slightly. Obviously, the criteria are > different from those applied to cryptography, but I don't consider > myself competent to express them formally. > > This is why I compared the function in lib/libc/crypt/arc4random.c to > that in sys/dev/rnd.c. I was surprised to find that the 32-bit > arithmetic in the libc version differs from that in the kernel version. > Apparently, back in 2008 some change was only applied to the kernel, not > to libc. > > This is what the kernel source code says: > min = ((0xffffffff - upper_bound) + 1) % upper_bound; > > This is what the libc source code says: > min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound; > > Now, I'm not a math geek, but it seems pretty obvious to me that the > version in the kernel source is to be preferred, given that it's simpler > and does exactly what it has to do. I'm assuming this is also the reason > the change was implemented, and that's what the commit log seems to say > as well. So, I wonder why the libc source code was not changed. Was this > an oversight? > > Secondly, the kernel code has the added advantage that it works for any > possible value of upper_bound (you can check this for yourself if you > don't believe me). This means that the upper_bound > 0x80000000 check > can be removed, making the common case faster (upper_bound <= 0x80000000 > is definitely the common case in my book) and everything easier to > understand and cleaner. Personally I don't see any real drawbacks, but > feel free to disagree. ;) > > There are also other possible implementations that are even shorter, > although they aren't necessarily easier to understand. In summary, the > part between #else and #endif could be replaced by any of these lines: > min = ((0xffffffff - upper_bound) + 1) % upper_bound; > min = (1 + ~upper_bound) % upper_bound; > min = (0 - upper_bound) % upper_bound; > min = -upper_bound % upper_bound; > > The latter two implementations work correctly because the C standard > defines modulo arithmetic for unsigned integers. Replace 0 by 2**32 and > you'll see why it does what it's supposed to. I'm not entirely sure the > last implementation would work on a machine that doesn't use two's > complement. The standard is vague in its wording. > > Another option still would be to replace the entire #if block by this: > min = (0x100000000 - upper_bound) % upper_bound; > > This will continue to work correctly even if int ever becomes 64-bit. > > Kind regards, > > Jorden Verwer