> I agree that simply "min = -upper_bound % upper_bound" should be
> sufficient in all cases, since u_int32_t arithmetic is defined as
> modulo 2**32 by the C standard, at least as of C99 and I think C89
> too.  (Even if we supported any 1s-complement architectures, the
> compiler would still need to implement u_int32_t as modulo 2**32.)

Indeed. I was looking at it from a correctness point of view instead of
trying to determine if it would work in practice.
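
For the record, a minimal sketch of why the two forms agree. For a
nonzero upper_bound, -upper_bound wraps to 2**32 - upper_bound, and
(2**32 - upper_bound) % upper_bound equals 2**32 % upper_bound. The
function names below are made up for illustration and are not from the
tree. The sketch assumes upper_bound is nonzero and that int is no
wider than 32 bits, so the operand is not promoted to a signed type
before the negation.

```c
#include <stdint.h>

/*
 * Reference value: 2**32 % upper_bound, computed with 64-bit
 * arithmetic. Hypothetical helper; upper_bound must be nonzero.
 */
uint32_t
min_via_64bit(uint32_t upper_bound)
{
	return (uint32_t)((UINT64_C(1) << 32) % upper_bound);
}

/*
 * Same value using only 32-bit arithmetic. Unsigned negation is
 * performed modulo 2**32, so -upper_bound is 2**32 - upper_bound.
 * Assumes int is at most 32 bits wide (no promotion to signed int).
 */
uint32_t
min_via_32bit(uint32_t upper_bound)
{
	return -upper_bound % upper_bound;
}
```

Both functions should return the same result for every nonzero
upper_bound, without needing a 64-bit division or an LP64 test.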

> I also think it makes sense to get rid of the LP64 test, because
> 64-bit division still takes more than twice as long as 32-bit division
> on most amd64 processors for example (according to
> http://gmplib.org/~tege/x86-timing.pdf).

And to reduce complexity, of course.

> Of course, the potential benefit here isn't that great, so I don't
> know whether this really makes sense to worry about.

Oh, there are certainly more important matters, but you know how these
things go. You see something that can be improved and it turns into an
itch that needs to be scratched. The quickest and best way to do so was
to send an email to this list. Then, when I wrote the message, I started
thinking about whether this really was the best implementation or
whether it could be improved further.

I freely admit that it doesn't make any difference in the grand scheme
of things, but there's also the minute scheme of things. ;)