On Jun 24, 2010, at 4:08 PM, Lasse Kliemann wrote:
> a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> c <- sum(1/(1:200000000))
> d <- sum(1/(200000000:1))
> order(c(a, b, c, d))
[1] 1 2 4 3
> b < c
[1] TRUE
> c == d
[1] FALSE
I'd expected b to be the largest, since it sums the smallest numbers
first. Can you explain? I would have assumed the floating-point
arithmetic would use a sensible rounding strategy that prevents the
small terms from being consistently rounded away. Instead, c is the
largest, and c is sum() applied to the vector ordered with the largest
numbers first. Can anyone shed some light on this?
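The effect is easiest to see with one large term and many small ones;
here is a minimal illustration (the constants are arbitrary, chosen
only to expose the rounding):

x <- c(1e16, rep(1, 100))

big_first <- 0
for (v in x) big_first <- big_first + v        # large term first

small_first <- 0
for (v in rev(x)) small_first <- small_first + v  # small terms first

big_first - 1e16     # 0: each +1 was rounded away against the large sum
small_first - 1e16   # 100: the ones accumulated before meeting the large term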
> sum(1/(200000000:1))
[1] 19.69104
> sum(1/(1:200000000))
[1] 19.69104
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> b
[1] 19.69104
> sum(1/(1:200000000)) - sum(1/(200000000:1))
[1] 7.105427e-15
> sum(1/(200000000:1)) - b
[1] 1.673328e-12
> sum(1/(1:200000000)) - b
[1] 1.680434e-12
> .Machine$double.eps
[1] 2.220446e-16
> .Machine$double.eps ^ 0.5
[1] 1.490116e-08
What is the best way in R to compute a sum while avoiding this kind of
rounding error? (All terms here are positive, so the issue is
accumulated rounding rather than true cancellation.) By the way, sum()
in the example above is much faster than the loops, so it would be nice
to use it.
Why wouldn't you? Differences as small as the ones you are observing
are generally either meaningless or unavoidable in double-precision
arithmetic. If you need exact arithmetic, pick a platform that
provides it.
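Two further points. On most platforms R's sum() accumulates doubles in
extended (long double) precision, which is why c and d agree with each
other far more closely than either agrees with the plain loops. And if
the residual error matters to you, compensated (Kahan) summation
carries a correction term alongside the running sum; a minimal sketch
in plain R (kahan_sum is not a base function, just an illustration):

kahan_sum <- function(x) {
  s <- 0     # running sum
  comp <- 0  # compensation: low-order bits lost in earlier additions
  for (v in x) {
    y <- v - comp        # apply the correction carried from the last step
    t <- s + y           # low-order bits of y may be rounded away here...
    comp <- (t - s) - y  # ...and are recovered here
    s <- t
  }
  s
}

In pure R the loop is of course far slower than sum(), but the result
is then insensitive, to first order in the machine epsilon, to the
ordering of the terms.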
--
David.