[Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
I tried to implement a different version of the ziggurat method for
generating standard normal distributions that is about twice as fast and uses
2/3 of the memory of the old one.
I tested the implementation separately and am very confident it's correct, but
it does fail 28 tests in coverage testing.
Checking the testing code I found out that all the failed tests are inside
TestRandomDist, which has the goal of "mak[ing] sure the random distribution
returns the correct value for a given seed". Why would this be needed?
The only explanation I can come up with is that standard_normal is, with
regard to seeding, required to be backwards compatible. If that's the case,
how could one even implement a new algorithm?
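
For concreteness, the failing tests pin exact outputs to a fixed seed, along
the lines of this sketch (hedged; not the actual TestRandomDist code, and the
seed and sample count are made up):

import numpy as np

def test_standard_normal_given_seed():
    # TestRandomDist hard-codes expected literals captured from a trusted
    # run; regenerating the reference from a second, identically seeded
    # generator here just keeps this sketch self-contained and runnable.
    rng = np.random.Generator(np.random.PCG64(12345))
    reference = np.random.Generator(np.random.PCG64(12345)).standard_normal(4)
    np.testing.assert_allclose(rng.standard_normal(4), reference)
    # Any change to the sampling algorithm changes the drawn values, so a
    # hard-coded version of this test fails even if the new samples are
    # statistically just as good.

test_standard_normal_given_seed()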


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
> Well, I can tell you why it needs to be backward compatible! I use random 
> numbers fairly frequently, and to unit test them I set a specific seed and 
> then make sure I get the same answers.

Hmm, I guess that makes sense. I tried to adjust my algorithm to produce the
same bits as the original one, but I couldn't get it to work.

> Have you benchmarked it using the generator interface? The structure of this
> as a non-monolithic generator makes it a good deal slower than generating in
> straight C (with everything inlined). I'm not sure a factor of 2 is enough
> to justify a change (for me 10x would be, 1.2x would not, but I don't know
> where the cutoff is).

I originally benchmarked my implementation against a bunch of other ones in C
(because I was developing a C library:
https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
But I did run the built-in benchmark, ./runtests.py --bench
bench_random.RNG.time_normal_zig, and the results are:

          new           old
PCG64     589±3μs       1.06±0.03ms
MT19937   985±4μs       1.44±0.01ms
Philox    981±30μs      1.39±0.01ms
SFC64     508±4μs       900±4μs
numpy     2.99±0.06ms   2.98±0.01ms   # no change for /dev/urandom

I'm not yet 100% certain about the implementation, but I attached a diff of my
current progress.

diff --git a/numpy/random/src/distributions/distributions.c b/numpy/random/src/distributions/distributions.c
index 93e0bdc5f..447d5e161 100644
--- a/numpy/random/src/distributions/distributions.c
+++ b/numpy/random/src/distributions/distributions.c
@@ -46,9 +46,9 @@ void random_standard_uniform_fill_f(bitgen_t *bitgen_state, npy_intp cnt, float
 static double standard_exponential_unlikely(bitgen_t *bitgen_state,
 uint8_t idx, double x) {
   if (idx == 0) {
 /* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-return ziggurat_exp_r - log(1.0 - next_double(bitgen_state));
+return (double)ZIGGURAT_EXP_R - log(1.0 - next_double(bitgen_state));
   } else if ((fe_double[idx - 1] - fe_double[idx]) * next_double(bitgen_state) +
  fe_double[idx] <
  exp(-x)) {
 return x;
@@ -83,9 +83,9 @@ void random_standard_exponential_fill(bitgen_t * bitgen_state, npy_intp cnt, dou
 static float standard_exponential_unlikely_f(bitgen_t *bitgen_state,
  uint8_t idx, float x) {
   if (idx == 0) {
 /* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-return ziggurat_exp_r_f - logf(1.0f - next_float(bitgen_state));
+return (float)ZIGGURAT_EXP_R - logf(1.0f - next_float(bitgen_state));
   } else if ((fe_float[idx - 1] - fe_float[idx]) * next_float(bitgen_state) +
  fe_float[idx] <
  expf(-x)) {
 return x;
@@ -132,41 +132,36 @@ void random_standard_exponential_inv_fill_f(bitgen_t * bitgen_state, npy_intp cn
 out[i] = -log(1.0 - next_float(bitgen_state));
   }
 }
 
-
 double random_standard_normal(bitgen_t *bitgen_state) {
-  uint64_t r;
-  int sign;
-  uint64_t rabs;
-  int idx;
-  double x, xx, yy;
-  for (;;) {
-/* r = e3n52sb8 */
-r = next_uint64(bitgen_state);
-idx = r & 0xff;
-r >>= 8;
-sign = r & 0x1;
-rabs = (r >> 1) & 0x000fffffffffffff;
-x = rabs * wi_double[idx];
-if (sign & 0x1)
-  x = -x;
-if (rabs < ki_double[idx])
-  return x; /* 99.3% of the time return here */
+  while (1) {
+double x, y, f0, f1;
+union { uint64_t u; double f; } u;
+const uint64_t u64 = next_uint64(bitgen_state);
+const uint_fast8_t idx = (u64 >> 1) & 0xff;
+const double uf64 = ((u64 >> 11) * (1.0 / (UINT64_C(1) << 53))) *
+xtbl_double[idx];
+
+if (uf64 < xtbl_double[idx + 1])
+  return u.f = uf64, u.u |= (u64 & 1) << 63, u.f;
+
 if (idx == 0) {
-  for (;;) {
-/* Switch to 1.0 - U to avoid log(0.0), see GH 13361 */
-xx = -ziggurat_nor_inv_r * log(1.0 - next_double(bitgen_state));
-yy = -log(1.0 - next_double(bitgen_state));
-if (yy + yy > xx * xx)
-  return ((rabs >> 8) & 0x1) ? -(ziggurat_nor_r + xx)
- : ziggurat_nor_r + xx;
-  }
-} else {
-  if (((fi_double[idx - 1] - fi_double[idx]) * next_double(bitgen_state) +
-   fi_double[idx]) < exp(-0.5 * x * x))
-return x;
+  do {
+x = log(next_double(bitgen_state)) * 1.0 / ZIGGURAT_NORM_R;
+y = log(next_double(bitgen_state));
+  } while (-(y + y) < x * x);
+  if (u64 & 1)
+return x - (double)ZIGGURAT_NORM_R;
+  else
+return (double)ZIGGURAT_NORM_R - x;
 }
+
+y = uf64 * uf64;
+f0 = exp(-0.5 * (xtbl_double[idx] * xtbl_double[idx] - y));
+f1 = exp(-0.5 * (xtbl_double[idx + 1] * xtbl_double[idx + 1] - y));
+if (f1 + next_double(bitgen_state) * (f0 - f1) <
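
(The attached diff is cut off at this point in the archive; the missing lines
would finish the wedge comparison and close the loop. As a hedged
reconstruction for readability, here is the variant's control flow in pure
Python, pieced together from the visible lines and the textbook ziggurat
algorithm; the table layout and the final comparison are assumptions, not
verbatim from the patch:)

import numpy as np

ZIGGURAT_NORM_R = 3.6541528853610088  # tail cutoff of the 256-layer ziggurat

def standard_normal_ziggurat(xtbl, rng):
    # Assumed table layout: xtbl[0..256] decreasing, xtbl[0] the widened
    # base-strip width, xtbl[1] == ZIGGURAT_NORM_R, xtbl[256] == 0.
    # Building the table is omitted here.
    while True:
        u64 = int(rng.integers(0, 2**64 - 1, dtype=np.uint64, endpoint=True))
        sign = -1.0 if (u64 & 1) else 1.0    # bit 0: sign
        idx = (u64 >> 1) & 0xFF              # bits 1..8: layer index
        u = (u64 >> 11) * 2.0**-53           # top 53 bits: uniform in [0, 1)
        x = u * xtbl[idx]                    # candidate point in layer idx
        if x < xtbl[idx + 1]:                # fast path: strictly inside the
            return sign * x                  # next, narrower layer
        if idx == 0:                         # base layer: Marsaglia tail method
            while True:
                xt = -np.log1p(-rng.random()) / ZIGGURAT_NORM_R
                yt = -np.log1p(-rng.random())   # log1p(-U) avoids log(0.0)
                if 2.0 * yt > xt * xt:
                    return sign * (ZIGGURAT_NORM_R + xt)
        # wedge: accept x if a uniform vertical coordinate between the two
        # layer-edge densities falls below the true density at x
        f = np.exp(-0.5 * x * x)
        f0 = np.exp(-0.5 * xtbl[idx] ** 2)
        f1 = np.exp(-0.5 * xtbl[idx + 1] ** 2)
        if f1 + rng.random() * (f0 - f1) < f:
            return sign * x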

Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread camel-cdr
‐‐‐ Original Message ‐‐‐
On Saturday, February 6, 2021 3:29 PM, Robert Kern wrote:

> On Sat, Feb 6, 2021 at 7:27 AM  wrote:
>
>>> Well, I can tell you why it needs to be backward compatible! I use random 
>>> numbers fairly frequently, and to unit test them I set a specific seed and 
>>> then make sure I get the same answers.
>>
>> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the 
>> same thing with the same bit's as the original one, but I couldn't get it to 
>> work.
>
> To be clear, this is not our backwards compatibility policy for the methods 
> that you have modified. Our policy is spelled out here:
>
> https://numpy.org/neps/nep-0019-rng-policy.html
>
> The TestRandomDist suite of tests was adapted from the older RandomState 
> (which is indeed frozen and not allowed to change algorithms). It's a mix of 
> correctness tests that are valid regardless of the precise algorithm (does 
> this method reject invalid arguments? do degenerate arguments yield the 
> correct constant value?) and actual "has this algorithm changed 
> unexpectedly?" tests. The former are the most valuable, but the latter are 
> useful for testing in cross-platform contexts. Compilers and different CPUs 
> can do naughty things sometimes, and we want the cross-platform differences 
> to be minimal. When you do change an algorithm implementation for Generator, 
> as you have done, you are expected to do thorough tests (offline, not in the 
> unit tests) that it is correctly sampling from the target probability 
> distribution, then once satisfied, change the hard-coded values in 
> TestRandomDist to match whatever you are generating.
>
> --
> Robert Kern

Ok, cool, that basically explains a lot.

> When you do change an algorithm implementation for Generator, as you have 
> done, you are expected to do thorough tests (offline, not in the unit tests) 
> that it is correctly sampling from the target probability distribution, then 
> once satisfied, change the hard-coded values in TestRandomDist to match 
> whatever you are generating.

I'm probably not versed enough in statistics to do thorough testing. I used
the tests from https://www.seehuhn.de/pages/ziggurat and plotted histograms to
verify correctness, but that probably won't be sufficient.
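
For what it's worth, one offline check in the direction Robert describes could
be a goodness-of-fit test of a large sample against the standard normal CDF. A
hedged sketch (sample size and seed are illustrative; one would want to repeat
this across many seeds and combine it with moment checks):

import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
samples = rng.standard_normal(1_000_000)
# Kolmogorov-Smirnov test against the standard normal CDF. Under the
# null hypothesis (correct sampling) the p-value is uniform on [0, 1],
# so it should not be consistently tiny across independent seeds.
stat, pvalue = stats.kstest(samples, "norm")
print(f"KS statistic = {stat:.6g}, p-value = {pvalue:.3f}")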