On 12/5/2005 2:25 PM, (Ted Harding) wrote:
> On 05-Dec-05 Martin Maechler wrote:
>>     UweL> x <- c(1,2,3,4,5)
>>     UweL> n <- length(x)
>>     UweL> var(x)*(n-1)/n
>>
>>     UweL> if you really want it.
>>
>> It seems Insightful at some point in time have given in to
>> this user request, and S-plus nowadays has
>> an argument "unbiased = TRUE"
>> where the user can choose {to shoot (him/her)self in the leg and}
>> require 'unbiased = FALSE'.
>> {and there's also 'SumSquares = FALSE' which allows to not
>> require any division (by N or N-1)}
>>
>> Since in some ``schools of statistics'' people are really still
>> taught to use a 1/N variance, we could envisage to provide such an
>> argument to var() {and cov()} as well. Otherwise, people define
>> their own variance function such as
>>     VAR <- function(x,....) .. (N-1)/N*var(x,...)
>> Should we?
>
> If people need to do this, such an option would be a convenience,
> but I don't see that it has much further merit than that.
>
> My view of how to calculate a "variance" is based, not directly
> on the "unbiased" issue, but on the following.
>
> Suppose you define a RV X as a single value sampled from a finite
> population of values X1,...,XN.
>
> The variance of X is (or damn well should be) defined as
>
>   Var(X) = E(X^2) - (E(X))^2
>
> and this comes to (Sum(X^2) - (Sum(X)/N)^2)/(N-1).

I don't follow this. I agree with the first line (though I prefer to
write it differently), but I don't see how it leads to the second.

For example, consider a distribution which is equally likely to be
+/- 1, and a sample from it consisting of a single 1 and a single -1.
The first formula gives 1 (which is the variance), the second gives 2.

The second formula is unbiased, because a random sample is just as
likely to give 0 from it as 2, but I'm curious about what you mean
by "this comes to".

Duncan
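
For anyone who wants to check the +/- 1 example numerically, here is a
minimal R sketch. The variable names are made up for illustration, and
it assumes "the second formula" means the usual 1/(N-1) sample variance
that var() computes. It also assumes samples of size 2 are drawn with
replacement, so that all four samples are equally likely.

```r
# The +/- 1 example: a sample consisting of a single 1 and a single -1.
x <- c(1, -1)
n <- length(x)

# First formula, E(X^2) - (E(X))^2, with equal weights 1/n.
var_n <- mean(x^2) - mean(x)^2          # 1

# var() divides by n - 1 (assumed here to be the "second formula").
var_n1 <- var(x)                        # 2

# Rescaling var() by (n - 1)/n recovers the 1/n version.
var_rescaled <- var(x) * (n - 1) / n    # 1

# All four equally likely samples of size 2 (drawn with replacement).
samples <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))

# Sample variance of each row: the two mixed samples give 2,
# the two samples with equal values give 0.
sample_vars <- apply(samples, 1, var)

# Averaging over the samples gives the population variance.
mean_sample_var <- mean(sample_vars)    # 1

print(c(var_n = var_n, var_n1 = var_n1, var_rescaled = var_rescaled))
print(mean_sample_var)
```

So the 1/(N-1) form does not reproduce the variance on any single
sample here, but it does on average over the four samples.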