Here's another one with no commit message and no SOB :-/ (cleaning up my inbox, now that Subsurface-mobile has been released)
/D On Fri, Mar 04, 2016 at 11:56:31AM -0800, Linus Torvalds wrote: > On Fri, Mar 4, 2016 at 1:04 AM, Robert Helling <[email protected]> wrote: > > > > But if you really wanted to tweet things, there are two things that could be > > factored out: Don’t divide o2 and he by 1000.0, rather do that only once in > > the end. > > Yes. And make it a "multiply by 0.001" instead - that not only avoids > the divide, it makes it possible for the compiler to do more madd > instructions (not that we'll see those on x86 any time soon due to > compatibility issues, even though they are starting to exist) > > > And instead of doing the weighted average of the Z factors, you > > could equivalently do the weighted average of the coefficients of the > > polynomial. That you would need to do only once per gas and save those > > values as we compute Z for many pressures for the same gas. > > The testing overhead for "is it the same gas" would actually overwhelm > the calculations. Floating point isn't really *that* expensive in the > end. > > Now, the whole thing is actually amenable to be written in vector > form, but that's just too much. > > Anyway, having done a release build (in order to get reasonable > optimizations), the code generated is actually quite good.. It ends up > being just 52 instructions, of which 41 are floating point > instructions (multiply, add, convert from integer, load fp constant). > The remaining 11 instructions are just the trivial "load o2 and > helium, check for air, subtract 1000-o2-he". > > It would be even better if we had multiply-add instructions, but it's > really not bad at all. > > Even vectorizing it might actually just make things worse: turning on > the vector units can be much more expensive than the couple of > instructions we'd win. > > I'm attaching the trivial patch to get to that state here, but I'm not > even sure it's worth it: with optimizations, gcc actually turns our > *current* code into fine code. > > Yeah, without this patch there's a couple of "divsd" instructions, but > they don't dominate, so it's actually likely that they schedule fine. > The latency of a divsd is only 3-4 times the latency of a multiply on > modern cores - we're talking fewer cycles than a cache miss. > > Linus > subsurface-core/gas-model.c | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/subsurface-core/gas-model.c b/subsurface-core/gas-model.c > index 81765e003002..4612d99c86d8 100644 > --- a/subsurface-core/gas-model.c > +++ b/subsurface-core/gas-model.c > @@ -40,22 +40,25 @@ double gas_compressibility_factor(struct gasmix *gas, > double bar) > -0.00000004077670019935, > +0.00000000000077707035 > }; > - double o2, he; > + int o2, he; > double x1, x2, x3; > double Z; > > - o2 = get_o2(gas) / 1000.0; > - he = get_he(gas) / 1000.0; > + o2 = get_o2(gas); > + he = get_he(gas); > > x1 = bar; x2 = x1*x1; x3 = x2*x1; > > Z = virial_m1(o2_coefficients, x1, x2, x3) * o2 + > virial_m1(he_coefficients, x1, x2, x3) * he + > - virial_m1(n2_coefficients, x1, x2, x3) * (1.0 - o2 - he); > + virial_m1(n2_coefficients, x1, x2, x3) * (1000 - o2 - he); > > /* > * We add the 1.0 at the very end - the linear mixing of the > * three 1.0 terms is still 1.0 regardless of the gas mix. > + * > + * The "*0.001" is because we multiplied in the gas fractions > + * as whole permille. > */ > - return Z + 1.0; > + return Z * 0.001 + 1.0; > } _______________________________________________ subsurface mailing list [email protected] http://lists.subsurface-divelog.org/cgi-bin/mailman/listinfo/subsurface
