FWIW, both Excel 2007 and LibreOffice 4.2 yield the correct variance for
the numbers in Ranjan Maitra's HW problem for incoming students in R.
Namely, both these programs yield a sample variance of 0.2777777778
(rounded to 10 decimal digits).
Ronald Wyllys
On 02/10/2015 05:00 AM, r-help-requ...@r-project.org wrote:
ate: Mon, 9 Feb 2015 17:39:14 -0600
From: Ranjan Maitra<maitra.mbox.igno...@inbox.com>
To:<r-h...@stat.math.ethz.ch>
Subject: Re: [R] Variance is different in R vs. Excel?
Message-ID:<20150209173914.bae4d99ebeadafed35153...@inbox.com>
Content-Type: text/plain; charset="us-ascii"
I suspect that this is the long-documented issue with indeed an entire industry -- and publications --
devoted to finding such errors in Excel. Till the 2013 version, it used to be a favorite HW problem of mine.
Basically, Excel uses the "short formula" to calculate the variance and the sd. This "short
formula" has numerical issues with larger numbers (though I am surprised at the OP's data because these
numbers were not that large). Anyway, the "long formula" which removes the mean from each
datapoint, squares and sums is preferred with large numbers.
Btw, my HW problem for incoming students in my R class would be this:
Consider the following numbers:
100000000000001, 100000000000002, 100000000000001, 100000000000002,
100000000000001,
100000000000002, 100000000000001, 100000000000002, 100000000000001,
100000000000002.
Calculate the variance in Excel (gives pure garbage) and in R.
I got this (or may have adapted it) from the book: Numerical Issues in
Statistical Computing for the Social Scientist by M. Altman, J. Gill and M. P.
McDonald.
After over 10 years, Excel finally appears to have fixed the issue. gnumeric
never had this problem.
Best wishes,
Ranjan
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.