On Jun 24, 2010, at 4:08 PM, Lasse Kliemann wrote:

a <- 0 ; for(i in (1:200000000)) a <- a + 1/i   # loop, largest terms first
b <- 0 ; for(i in (200000000:1)) b <- b + 1/i   # loop, smallest terms first
c <- sum(1/(1:200000000))   # sum(), largest terms first
d <- sum(1/(200000000:1))   # sum(), smallest terms first
order(c(a,b,c,d))           # positions sorted ascending: a < b < d < c
 [1] 1 2 4 3
b < c
 [1] TRUE
c == d
 [1] FALSE

I'd have expected b to be the largest, since we sum the smallest
numbers first.

Can you explain? I would have assumed the floating-point algorithms would incorporate some sort of sensible rounding strategy that would prevent a consistent rounding bias.

Instead, c is the largest, which is sum() applied to the
vector ordered with the largest numbers first.

Can anyone shed some light on this?
> sum(1/(200000000:1))
[1] 19.69104
> sum(1/(1:200000000))
[1] 19.69104
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> b
[1] 19.69104
> sum(1/(1:200000000)) - sum(1/(200000000:1))
[1] 7.105427e-15
> sum(1/(200000000:1)) - b
[1] 1.673328e-12
> sum(1/(1:200000000)) - b
[1] 1.680434e-12
> .Machine$double.eps
[1] 2.220446e-16
> .Machine$double.eps ^ 0.5
[1] 1.490116e-08
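
For scale: the loop-vs-sum() gap of about 1.7e-12 on a total of about
19.69 is a relative error near 8.5e-14, i.e. a few hundred multiples of
machine epsilon accumulated over 2e8 additions. A quick check from the
figures printed above:

s <- 19.69104                              # the common total printed above
(1.680434e-12 / s) / .Machine$double.eps   # error in units of eps, roughly 384

Note also that the two sum() results agree to about 7e-15 regardless of
order; that is consistent with sum() accumulating in extended precision
(a long double accumulator, where the platform supports one; see ?sum),
while the explicit loops accumulate in plain doubles.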


What is the best way in R to compute a sum while avoiding
cancellation effects?
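
One standard remedy is Kahan (compensated) summation, which carries the
rounding error of each addition into the next step. A minimal R sketch
(an interpreted loop, so far slower than the built-in sum()):

kahan_sum <- function(x) {
  s <- 0        # running sum
  comp <- 0     # compensation: the low-order bits lost so far
  for (xi in x) {
    y <- xi - comp        # apply the correction from the previous step
    t <- s + y            # adding y to s may round away y's low bits
    comp <- (t - s) - y   # algebraically zero; numerically the lost bits
    s <- t
  }
  s
}
kahan_sum(1/(1:100000))   # compensated sum of the first 1e5 terms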

By the way, sum() in the above example is much faster than the
loops, so it would be nice if we could use it.

Why wouldn't you? Generally, differences as small as the ones you are observing are either meaningless or unavoidable in double-precision arithmetic. If you need exact arithmetic, pick an appropriate platform.
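
For instance, the Rmpfr package wraps the MPFR multiple-precision
library; a sketch, assuming Rmpfr is installed and using a smaller N so
the mpfr vector fits comfortably in memory:

library(Rmpfr)   # arbitrary-precision floating point via MPFR
n <- 100000
ref <- sum(1/mpfr(1:n, precBits = 128))   # 128-bit reference sum
ref - sum(1/(1:n))                        # error of the double-precision sum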

--
David.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
