> On Sat, Feb 11, 2017 at 09:58:10AM +0100, Richard Cochran wrote: > > If I am not mistaken, then you can skip the cases val==2 and val==3, > > because they are equivalent to val==4 and 6. > > I took a stab at this, and you can see the result, below. My version has > lower average error than yours in the interval 1 < ppb < 60000, and it uses > only 8 64-bit divisions. Outside of that interval, your version has lower > error. > > So, at the very least, you should introduce a threshold and use this > algorithm for adjustments under 60 ppm. Better yet, find a way to use fewer > divisions for adjustments greater and 60 ppm... > > Thanks, > Richard > > --- > #include <stdint.h> > > #define TEN9 (1000000000UL) > > unsigned int calc_min_integer(uint64_t ppb, uint64_t *M) { > uint64_t err, m, min, n, N, p2, reg; > > min = TEN9; > for (n = 4; n <= 7; n++) { > m = n * TEN9; > m = (m + ppb/2) / ppb; > > /*truncate to HW resolution*/ > reg = (m + 8) / 16; > m = reg * 16; > > p2 = (n * TEN9 + m/2) / m; > > err = ppb > p2 ? ppb - p2 : p2 - ppb; > > if (min >= err) { > min = err; > N = n; > *M = m; > } > } > return N; > }
Richard, there are quite a bit of inaccuracies in the calculation here. I think it's best I'll clarify some things about the used algorithm - Given a fraction represented by (ppb / 10^9), the purpose of the loop is to best approximate it by a (val / ((16 * period) + 8) fraction, where val belongs to {0,..,7} and period to {0,...,0xffffff}. We'd need 'val' and 'period' from the calculation to configure the HW. The '+8' is not some sort of rounding correction, but rather part of the required configuration. Notice that if we wouldn't have needed it, we could have reduced some of the iterations [since (val / 16 * period) can be reduced for powers of two]. But given that we do, skipping those iteration would cause us to lose accuracy. Your suggestion seems to: a. Assume that the required period should be in ns, not in 16*ns units. b. mishandles the +8/-8 in the calculation. c. Doesn't seem to consider the upper bound on period. In addition, the rounding-up via ppb is likely to harm accuracy for higher ppb values. One thing I still don't get is *why* we're trying to optimize this area of the code - I ran a variant of the original code in a loop [on requiring 24 divisions per Call] on various PPB values, and managed close to 1M/s [single CPU, 64bit, 2.50GHz]. Given that the real use-case is somewhere between 1-16/second, why do we treat it as if its performance is crucial? Thanks, Yuval