------- Comment #19 from rguenther at suse dot de 2007-06-10 21:39 -------
Subject: Re: Use reciprocal and reciprocal square root with -ffast-math
On Sun, 10 Jun 2007, ubizjak at gmail dot com wrote:

> ------- Comment #18 from ubizjak at gmail dot com 2007-06-10 17:34 -------
> (In reply to comment #14)
> > The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> > the former have throughput of 1/16 while the latter are 1/1 (latencies
> > compare 21 vs. 3). This is on K10. The optimization guide only mentions
> > calculating the reciprocal y = a/b via rcpss and the square root (!) via
> > rsqrtss
> > (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
> >
> > So the optimization would be mainly to improve instruction throughput, not
> > overall latency.
>
> If this is the case, then the middle-end will need to fold sqrtss in a
> different way for targets that prefer rsqrtss. According to Comment #16, it
> is better to fold to 1.0/sqrt(c/b) instead of sqrt(b/c) because this way, we
> will lose one multiplication during NR expansion by rsqrt [due to
> sqrt(x) <=> x * (1.0 / sqrt(x))].
>
> IMO we need a new tree code to handle reciprocal sqrt - RSQRT_EXPR, together
> with proper folding functionality that expands directly to (NR-enhanced)
> rsqrt optab. If we consider a*sqrt(b/c), then b/c will be expanded as
> b * NR-rcp(c) [where NR-rcp stands for NR enhanced rcp] and sqrt will be
> expanded as NR-rsqrt. In this case, I see no RTL pass that would be able to
> combine everything together in order to swap (b/c) operands to produce
> NR-enhanced a*rsqrt(c/b) equivalent.

We just need a new builtin function, __builtin_rsqrt, and at some stage
replace reciprocals of sqrt with the new builtin, for example in
tree-ssa-math-opts.c, which does the existing reciprocal transforms. A
target hook could be provided that would look like

  tree target_fn_for_expr (tree expr);

and return a target builtin decl for the given expression.

And we should start splitting this PR ;) One for a/sqrt(b/c) and one for
the above transformation.

Richard.
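
For reference, the Newton-Raphson (NR) expansions discussed above can be sketched in plain C. This is only an illustration under stated assumptions, not what GCC emits: approx_rsqrt() is a hypothetical, portable stand-in for rsqrtss that produces an estimate with roughly 12 significant bits (it uses a bit-level first guess rather than the hardware instruction), and the names nr_rsqrt, nr_sqrt and scaled_sqrt_ratio are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rsqrtss: returns 1/sqrt(x) with only about
   12 significant bits.  Not the real instruction; it exists so the
   sketch compiles on any target.  */
static float approx_rsqrt(float x)
{
    uint32_t bits;
    float r;

    assert(x > 0.0f);
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);      /* well-known bit-level first guess */
    memcpy(&r, &bits, sizeof r);
    r = r * (1.5f - 0.5f * x * r * r);     /* refine the guess twice ...       */
    r = r * (1.5f - 0.5f * x * r * r);
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;                   /* ... then drop the low 12 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* NR-enhanced rsqrt: one step, r' = 0.5 * r * (3 - a * r * r).  */
static float nr_rsqrt(float a)
{
    float r = approx_rsqrt(a);
    return 0.5f * r * (3.0f - a * r * r);
}

/* Square root via rsqrt, as quoted from the optimization guide:
   sqrt a = 0.5 * a * r * (3.0 - a * r * r), with r = rsqrt(a).  */
static float nr_sqrt(float a)
{
    float r = approx_rsqrt(a);
    return 0.5f * a * r * (3.0f - a * r * r);
}

/* a * sqrt(b/c) rewritten as a * rsqrt(c/b), which avoids the extra
   multiplication needed to turn rsqrt back into sqrt.  The division
   c/b is left as an ordinary division here.  */
static float scaled_sqrt_ratio(float a, float b, float c)
{
    return a * nr_rsqrt(c / b);
}
```

With a roughly 12-bit estimate, a single NR step about doubles the number of correct bits, so nr_sqrt(2.0f) comes out close to single-precision sqrt(2). Swapping the operands in scaled_sqrt_ratio() corresponds to the a*rsqrt(c/b) form mentioned in comment #18.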
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723