https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
Hongtao.liu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #30
--- Comment #29 from ubizjak at gmail dot com 2007-06-18 08:56 ---
Patch was committed to SVN, so closing as fixed.
--- Comment #28 from uros at gcc dot gnu dot org 2007-06-16 09:53 ---
Subject: Bug 31723
Author: uros
Date: Sat Jun 16 09:52:48 2007
New Revision: 125756
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125756
Log:
PR middle-end/31723
* hooks.c (hook_tree_tree_bool_null):
--- Comment #27 from burnus at gcc dot gnu dot org 2007-06-15 13:23 ---
Cross-reference: see also PR 32352 (Polyhedron aermod.f90 crashes due to
out-of-bounds problems caused by numerical differences using rsqrt/-mrecip).
--- Comment #26 from ubizjak at gmail dot com 2007-06-14 09:18 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00944.html
--- Comment #25 from ubizjak at gmail dot com 2007-06-13 20:20 ---
RFC patch at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00916.html
--- Comment #24 from tbptbp at gmail dot com 2007-06-11 05:58 ---
Yes, but there's some fuss at 0 when you pile up a NR round.
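
To illustrate the problem at 0, here is a minimal sketch of a single Newton-Raphson round applied to a reciprocal square root estimate. The function name nr_rsqrt_step is hypothetical, and the sketch assumes the estimate for an input of 0 is +Inf (as plain IEEE evaluation of 1/sqrt(0) gives). The unrefined estimate is a usable +Inf, but the refinement computes 0 * Inf and turns it into NaN.

```c
#include <math.h>

/* One Newton-Raphson round for 1/sqrt(a), starting from the estimate x0:
   x1 = 0.5 * x0 * (3 - a * x0 * x0).
   (Hypothetical helper, for illustration only.) */
float nr_rsqrt_step(float a, float x0)
{
    /* For a == 0 and x0 == +Inf, a * x0 is 0 * Inf, which is NaN,
       and the NaN propagates through the rest of the expression. */
    return 0.5f * x0 * (3.0f - a * x0 * x0);
}
```

Calling nr_rsqrt_step(0.0f, INFINITY) yields NaN, whereas 1.0f / sqrtf(0.0f) alone yields +Inf, so a transformation that adds the NR round needs separate handling for 0 if that case matters.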
--- Comment #23 from ubizjak at gmail dot com 2007-06-11 05:51 ---
(In reply to comment #22)
> At some point icc did such transformations (for 1/x and sqrt) but, apparently,
> they're now removed. It didn't bother to plug every hole (i.e. wrt infinities)
> but at least got the case of 0
--- Comment #22 from tbptbp at gmail dot com 2007-06-11 03:32 ---
I'm a bit late to the debate but...
At some point icc did such transformations (for 1/x and sqrt) but, apparently,
they're now removed. It didn't bother to plug every hole (i.e. wrt infinities)
but at least got the case of
--- Comment #21 from rguenth at gcc dot gnu dot org 2007-06-10 21:48 ---
The other issue is really about this bug, so not splitting.
--- Comment #20 from rguenth at gcc dot gnu dot org 2007-06-10 21:46 ---
PR32279 for 1/sqrt(x/y) to sqrt(y/x)
--- Comment #19 from rguenther at suse dot de 2007-06-10 21:39 ---
Subject: Re: Use reciprocal and reciprocal square root with -ffast-math
On Sun, 10 Jun 2007, ubizjak at gmail dot com wrote:
>
>
> --- Comment #18 from ubizjak at gmail dot com 2007-06-10 17:34 ---
> (In re
--- Comment #18 from ubizjak at gmail dot com 2007-06-10 17:34 ---
(In reply to comment #14)
> The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> 21 vs. 3). This is on K10. The o
--- Comment #17 from ubizjak at gmail dot com 2007-06-10 16:49 ---
(In reply to comment #0)
> /* Mathematically equivalent to 1/sqrt(b*(1/a)) */
> return sqrtf(a/b);
Whoa, this one is a little gem, but ATM in the opposite direction. At least for
-ffast-math we could optimize (a /
--- Comment #16 from ubizjak at gmail dot com 2007-06-10 16:24 ---
(In reply to comment #13)
> > x1 = 0.5 X0 (3.0 - A x0 x0 x0)
Whoops! One x0 too many above. The correct calculation reads:
rsqrt = 0.5 rsqrt(a) (3.0 - a rsqrt(a) rsqrt(a)).
> Well, I suppose it depends on the hardware. IIR
--- Comment #15 from rguenth at gcc dot gnu dot org 2007-06-10 12:09 ---
And of course optimizing division or square root this way violates IEEE 754,
which specifies these as intrinsic operations. So a flag separate from
-funsafe-math-optimizations should be used for this optimization.
--- Comment #14 from rguenth at gcc dot gnu dot org 2007-06-10 12:07 ---
The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
the former have throughput of 1/16 while the latter are 1/1 (latencies compare
21 vs. 3). This is on K10. The optimization guide only me
--- Comment #13 from jb at gcc dot gnu dot org 2007-06-10 11:06 ---
(In reply to comment #11)
Thanks for the work.
> First, please note that "divss" instruction is quite _fast_, clocking at 23
> cycles, where approximation with NR step would sum up to 20 cycles, not
> counting load of
--- Comment #12 from ubizjak at gmail dot com 2007-06-10 10:47 ---
Here are the results of mubench insn timings for various x86 processors:
http://mubench.sourceforge.net/results.html (target processor can be
benchmarked by downloading mubench from
http://mubench.sourceforge.net/index.ht
--- Comment #11 from ubizjak at gmail dot com 2007-06-10 08:28 ---
I have experimented a bit with rcpss, trying to measure the effect of
additional NR step to the performance. NR step was calculated based on
http://en.wikipedia.org/wiki/N-th_root_algorithm, and for N=-1 (1/A) we can
simp
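
The message is cut off before the simplified formula appears. As an assumption, the sketch below uses the standard Newton-Raphson step for the reciprocal, x1 = x0 * (2 - a * x0), which may or may not be written the same way in the original comment. rcp_estimate is a hypothetical stand-in for rcpss, made by perturbing the exact reciprocal by about 2^-12.

```c
/* Hypothetical stand-in for a ~12-bit hardware estimate of 1/a. */
float rcp_estimate(float a)
{
    return (1.0f / a) * (1.0f + 0.000244f);
}

/* One Newton-Raphson step for the reciprocal (assumed standard form):
   x1 = x0 * (2 - a * x0). It costs two multiplications and one
   subtraction on top of the estimate. */
float rcp_refined(float a)
{
    float x0 = rcp_estimate(a);
    return x0 * (2.0f - a * x0);
}
```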
--
pinskia at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
--- Comment #9 from rguenth at gcc dot gnu dot org 2007-04-27 22:03 ---
I looked at this at some time and in principle it doesn't require it. For the
vectorized call we'd need to support target dependent pattern vectorization,
for the scalar case we would need a new optab to handle 1/x e
--- Comment #8 from steven at gcc dot gnu dot org 2007-04-27 21:43 ---
I suppose this is something that requires new builtins?
--- Comment #7 from burnus at gcc dot gnu dot org 2007-04-27 12:41 ---
> (float) time for 1.0 / sqrt = 5.96 sec (res = 2.845058125000e+05)
> (float) time for rsqrt = 2.49 sec (res = 2.23602250e+05)
> (double) time for 1.0 / sqrt = 7.35 sec (res = 5.9926234364635509e+0
--- Comment #6 from rguenth at gcc dot gnu dot org 2007-04-27 12:09 ---
You are right, they are only available for float precision.
--- Comment #5 from jb at gcc dot gnu dot org 2007-04-27 12:01 ---
With the benchmarks at http://www.hlnum.org/english/doc/frsqrt/frsqrt.html
I get
~/src/benchmark/rsqrt% g++ -O3 -funroll-loops -ffast-math -funit-at-a-time
-march=k8 -mfpmath=sse frsqrt.cc
~/src/benchmark/rsqrt% ./a.out
--- Comment #4 from jb at gcc dot gnu dot org 2007-04-27 11:29 ---
(In reply to comment #3)
> 1. Convert to single precision
> 2. Calculate rcp(s|p)s or rsqrt(p|s)s
> 3. Refine with newton iteration
>
> vs. just using div(p|s)d or sqrt(p|s)d?
This should be
1. Convert to single precis
--- Comment #3 from jb at gcc dot gnu dot org 2007-04-27 11:27 ---
(In reply to comment #2)
> Note that SSE can vectorize only the float precision variant, not the double
> precision one. So one needs to carefully either disable vectorization for the
> double variant to get reciprocal co
--- Comment #2 from rguenth at gcc dot gnu dot org 2007-04-27 10:45 ---
Note that SSE can vectorize only the float precision variant, not the double
precision one. So one needs to carefully either disable vectorization for the
double variant to get reciprocal code or the other way around
--- Comment #1 from burnus at gcc dot gnu dot org 2007-04-27 10:16 ---
Comment by Richard Guenther in the same thread:
-
I think that even with -ffast-math 12 bits of accuracy is not ok. There is
the possibility of doing another Newton iteration step to improve
accuracy, th