Dear Petr,
thank you for your input.
I tried to experiment with (probably somewhat biased) truncated means
like in the following code.
How I got the 225 as a truncation limit is a good question. :)
REPS1 <- REPS2 <- 1000
N1 <- 100000
N2 <- 30000
N <- N1+N2
x1 <- rep(0,N1)
x2 <- rnorm(N2,300,100)
x <- c(x1,x2)
n <- 1000
for (i in 1:REPS1){
x_sample <- sort(sample(x,n,replace=FALSE),TRUE)
x_trunc <- x_sample[1:225]
REPS1[i] <- mean(x_sample)*N
REPS2[i] <- sum(x_trunc)/n*N
}
sum(x2)
mean(REPS1)
mean(REPS2)
sd(REPS1)
sd(REPS2)
sd(REPS2)/sd(REPS1)
Best,
daniel
2012.05.03. 17:45 keltezéssel, Petr Savicky írta:
On Thu, May 03, 2012 at 03:08:00PM +0200, Kehl Dániel wrote:
Dear List-members,
I have a problem where I have to estimate a mean, or a sum of a
population but for some reason it contains a huge amount of zeros.
I cannot give real data but I constructed a toy example as follows
N1<- 100000
N2<- 3000
x1<- rep(0,N1)
x2<- rnorm(N2,300,100)
x<- c(x1,x2)
n<- 1000
x_sample<- sample(x,n,replace=FALSE)
I want to estimate the sum of x based on x_sample (not knowing N1 and N2
but their sum (N) only).
The sample mean has a huge standard deviation I am looking for a better
estimator.
Hi.
I do not know the exact answer, but let me formulate the following observation.
If the question is redefined to estimate the mean of nonzero numbers, then
an estimate is mean(x_sample[x_sample != 0]). Its standard deviation in your
situation may be estimated as
res<- rep(NA, times=1000)
for (i in seq.int(along=res)) {
x_sample<- sample(x,n,replace=FALSE)
res[i]<- mean(x_sample[x_sample != 0])
}
sd(res)
[1] 18.72677 # this varies with the seed a bit
The observation is that this cannot be improved much, since the estimate
is based on a very small sample. The average size of the sample of nonzero
values is N2/(N1+N2)*n = 29.1. So, the standard deviation should be
something close to 100/sqrt(29.1) = 18.5376.
Petr Savicky.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.