Re: Picky, but much more efficient arc4random_uniform!

Luke Small Sat, 14 May 2022 23:14:44 -0700

This is version 1, which I was pretty sure was secure.

I revamped it with a few features and implanted the binary search for 'bit'


in most cases, which aren't intentionally worst-case, it's pretty darned
fast!

This is a sample run of my program with your piece of code included:


 1  99319 100239
 2 100353  99584
 3 100032  99879
 4  99704 100229
 5  99530  99796
 6  99617 100355
 7 100490 100319
 8  99793 100114
 9 100426  99744
10 100315 100558
11  99279  99766
12  99572  99750
13  99955 100017
14 100413 100005
15 100190 100052
16 101071 100195
17 100322 100224
18  99637  99540
19 100323  99251
20  99841 100177
21  99948  99504
22 100065 100031
23 100026  99827
24  99836  99818
25 100245  99822
26 100088  99678
27  99957  99993
28 100065  99961
29 100701 100679
30  99756  99587
31 100220 100076
32 100067 100005
33  99547  99984
34 100124 100031
35  99547 100661
36  99801  99963
37 100189 100230
38  99878  99579
39  99864  99442
40  99683 100004
41  99907 100094
42 100291  99817
43  99908  99984
44 100044 100606
45 100065 100120
46  99358 100141
47 100152 100442
48 100000 100279
49 100486  99848








newdata
time = 0.240530757 seconds

newdata_fast
time = 0.073868626 seconds

The original takes 325.620% of the runtime of newdata_fast






newdataTypableFilename
time = 0.236748989 seconds

newdataTypableFilename_fast
time = 0.115842333 seconds

The original takes 204.372% of the runtime of newdataTypableFilename_fast






rand_ideal
time = 0.235832820 seconds

rand_ideal_fast
time = 0.025300798 seconds

The original takes 932.116% of the runtime of rand_ideal_fast






rand_short_ideal
time = 0.236828684 seconds

rand_short_ideal_fast
time = 0.124025922 seconds

The original takes 190.951% of the runtime of rand_short_ideal_fast






rand_short_worst
time = 0.237142869 seconds

rand_short_worst_fast
time = 0.294156486 seconds

The original takes 80.618% of the runtime of rand_short_worst_fast






rand_worst
time = 0.237002775 seconds

rand_worst_fast
time = 0.377148420 seconds

The original takes 62.841% of the runtime of rand_worst_fast






shuffle
time = 0.003044472 seconds

shuffle_fast
time = 0.002590664 seconds

The original takes 117.517% of the runtime of shuffle_fast

If it crashed here, you are trying to profile.
Turn off pledge at the beginning of main().



On Sat, May 14, 2022 at 7:03 PM Philip Guenther <guent...@gmail.com> wrote:

> On Sun, 15 May 2022, Steffen Nurpmeso wrote:
> > Stuart Henderson wrote in
> ...
> >  |what's the perceived problem you're wanting to solve? and does that
> >  |problem actually exist in the first place?
> >
> > The problem is that if have a low upper bound then modulo will "remove a
> > lot of randomization".  For example if you have a program which
> > generates Lotto numbers (1..49), then using _uniform() as it is will
> > generate many duplicates.
>
> Wut.  The *WHOLE POINT* of arc4random_uniform() is that it has uniform
> distribution.  Says right so in the manpage.  If an implementation of that
> API fails to do that, it's a broken implementation.
>
> The OpenBSD implementation achieves that.  NetBSD's implementation has the
> same core logic.  Here's my quick lotto pick demo program, doing 4900000
> picks, so they should all be somewhere near 100000:
>
> #include <stdlib.h>
> #include <stdio.h>
> #define LIMIT   49
> int
> main(void)
> {
>         unsigned counts[LIMIT] = { 0 };
>         int i;
>         for (i = 0; i < 100000 * LIMIT; i++)
>                 counts[arc4random_uniform(LIMIT)]++;
>         for (i = 0; i < LIMIT; i++)
>                 printf("%2d\t%7u\n", i+1, counts[i]);
>         return 0;
> }
>
> And a sample run:
>
> : bleys; ./a.out
>
>  1       100100
>  2       100639
>  3        99965
>  4        99729
>  5        99641
>  6        99650
>  7       100299
>  8       100164
>  9        99791
> 10       100101
> 11       100657
> 12       100042
> 13        99661
> 14        99927
> 15        99426
> 16        99491
> 17        99646
> 18       100133
> 19       100013
> 20        99942
> 21        99873
> 22        99924
> 23        99567
> 24       100152
> 25       100688
> 26       100011
> 27       100481
> 28        99980
> 29       100406
> 30        99726
> 31        99808
> 32        99929
> 33       100050
> 34        99983
> 35       100048
> 36        99771
> 37        99906
> 38       100215
> 39       100261
> 40       100426
> 41        99847
> 42        99533
> 43       100368
> 44        99695
> 45       100041
> 46       100465
> 47        99875
> 48       100034
> 49        99920
> : bleys;
>
> Looks pretty good to me, with repeated runs showing the values bouncing
> around.
>
>
> ...
> > I was a bit interested when i saw Luke's message, but i am no
> > mathematician and, like i said, i in fact never needed the
> > functionality.  And the question i would have is how small
> > upper_bound has to be for the modulo problem to really kick in.
>
> If the implementation isn't broken, it's not a problem, period.
>
>
> > And even if, whether it matters.  For example, if you have
> > a bitset of 1024 "in-flight things", and the small upper bound
> > hits a used slot, you could simply search forward until you find
> > a free one.  I mean, with 1024 possible values the number of
> > possibilities is anyhow restricted.
>
> Well hey, let's give Luke's a try and see how uniform it is:
>
>
> #include <stdlib.h>
> #include <stdio.h>
> #define LIMIT   49
> int
> main(void)
> {
>         unsigned counts[LIMIT] = { 0 };
>         unsigned counts2[LIMIT] = { 0 };
>         int i;
>         for (i = 0; i < 100000 * LIMIT; i++) {
>                 counts[arc4random_uniform(LIMIT)]++;
>                 counts2[arc4random_uniform_fast_simple(LIMIT)]++;
>         }
>         for (i = 0; i < LIMIT; i++)
>                 printf("%2d\t%7u\t%7u\n", i+1, counts[i], counts2[i]);
>         return 0;
> }
>
> : bleys; ./a.out
>  1       100188   76502
>  2        99983   76602
>  3       100521   76522
>  4       100416   76682
>  5       100171   76387
>  6       100163   76759
>  7       100024   76336
>  8       100009   76703
>  9        99769   76237
> 10        99892   76532
> 11       100197   76730
> 12       100483   76398
> 13        99769   76310
> 14       100075   76474
> 15        99781   76599
> 16        99846   76439
> 17        99814   76430
> 18       100313   76648
> 19       100259   76813
> 20        99885   77068
> 21       100302   76546
> 22        99998   76698
> 23        99491   76678
> 24       100340   76324
> 25        99763  115263
> 26        99872  153008
> 27       100022  152979
> 28        99481  153793
> 29       100018  210714
> 30        99617  229286
> 31       100167  297003
> 32       100270  449664
> 33       100468   76790
> 34        99115   76452
> 35        99921   76392
> 36        99862   76140
> 37       100485   76607
> 38       100029   75885
> 39        99577   76498
> 40        99479   76727
> 41       100139   76746
> 42       100883   76698
> 43       100102   76474
> 44        99801   76592
> 45       100117   76124
> 46        99678   76417
> 47        99770   76639
> 48        99524   77034
> 49       100151   76658
> : bleys;
>
> Wow, that last column is *bad*.  Repeated runs show 25-32 are consistently
> high, 32 being *outrageously* off, and the others are low.
>
> Luke's implementation does not correctly implement the API.  Doesn't
> matter if it's a million times faster when it doesn't deliver the goal.
>
>
> Philip Guenther
>

arc4random_uniform_fast.c
Description: Binary data

Re: Picky, but much more efficient arc4random_uniform!

Reply via email to