On May 4, 2012, at 4:22 PM, Petr Savicky wrote:

On Fri, May 04, 2012 at 07:43:32PM +0200, Kehl Dániel wrote:
Dear Petr,

thank you for your input.
I tried to experiment with (probably somewhat biased) truncated means
like in the following code.
How I got the 225 as a truncation limit is a good question. :)

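n_reps <- 1000                     # number of simulation replicates
REPS1 <- REPS2 <- numeric(n_reps)  # estimated totals: plain mean, truncated mean
N1 <- 100000                       # number of zero elements
N2 <- 30000                        # number of normally distributed elements
N <- N1+N2
x1 <- rep(0,N1)
x2 <- rnorm(N2,300,100)
x <- c(x1,x2)

n <- 1000                          # sample size

for (i in seq_len(n_reps)){
 # draw without replacement and sort in decreasing order
 x_sample <- sort(sample(x,n,replace=FALSE),decreasing=TRUE)
 x_trunc <- x_sample[1:225]        # keep only the 225 largest values
 REPS1[i] <- mean(x_sample)*N
 REPS2[i] <- sum(x_trunc)/n*N
 }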

sum(x2)
mean(REPS1)
mean(REPS2)
sd(REPS1)
sd(REPS2)
sd(REPS2)/sd(REPS1)

Dear Daniel.

Thank you for your reply.

In the original question, you used the parameters

 N1 <- 100000
 N2 <- 3000

and now the parameters

 N1 <- 100000
 N2 <- 30000

My remark was that with the original parameters, there are only 29.1
nonzero elements on average. Now, there are 230.8 nonzero elements on
average, which is significantly better.
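
The two averages quoted above can be checked with a short calculation. This is only a sketch: it assumes the sample of size n = 1000 is drawn without replacement, as in the posted code, so the number of nonzero elements in the sample follows a hypergeometric distribution with mean n * N2 / (N1 + N2). The function name expected_nonzero is made up for this illustration.

```r
# Expected number of nonzero elements in a sample of size n drawn without
# replacement from N1 zeros and N2 nonzero values (hypergeometric mean).
# expected_nonzero is a hypothetical helper, not part of the original post.
expected_nonzero <- function(N1, N2, n) n * N2 / (N1 + N2)

orig_par    <- expected_nonzero(N1 = 100000, N2 = 3000,  n = 1000)
current_par <- expected_nonzero(N1 = 100000, N2 = 30000, n = 1000)

# Rounded to one decimal place: 29.1 and 230.8
print(round(c(original = orig_par, current = current_par), 1))
```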

Discussion of the use of the truncated mean is probably a question for
other members of the list. I do not consider myself an expert on this.

Best, Petr.

My experience is that Petr is better than I am at much of R, but so far in this thread I have not seen any mention of methods designed for data with large numbers of zeros. There is a very informative review of R techniques and packages for such data by Achim Zeileis and others. The same material was published in the Journal of Statistical Software and as a vignette in one of the contributed packages:

www.jstatsoft.org/v27/i08/paper
cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

I don't have this information memorized, but I generally find a Google search for "count r zeileis" to be highly effective. I've just noticed that the second author, Kleiber, has also put up useful material on that topic for web searchers to use.
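
To give a rough idea of what a model "designed for" excess zeros looks like, here is a minimal sketch of a zero-inflated Poisson model fitted by maximum likelihood in base R. It is an illustration under stated assumptions, not the interface of the package whose vignette is linked above: the data are simulated counts (unlike the continuous values in this thread), and the names pi0, lambda, negll and est are invented for the example.

```r
# Simulate counts with excess zeros: with probability pi0 the value is a
# structural zero, otherwise it is Poisson(lambda). All names are hypothetical.
set.seed(1)
n <- 2000
pi0 <- 0.6
lambda <- 3
y <- ifelse(runif(n) < pi0, 0, rpois(n, lambda))

# Negative log-likelihood of the zero-inflated Poisson model.
# par[1] is logit(pi0) and par[2] is log(lambda), so optim() is unconstrained.
negll <- function(par) {
  p   <- plogis(par[1])
  lam <- exp(par[2])
  ll_zero <- log(p + (1 - p) * exp(-lam))                 # P(y = 0)
  ll_pos  <- log(1 - p) + dpois(y[y > 0], lam, log = TRUE) # P(y = k), k > 0
  -(sum(y == 0) * ll_zero + sum(ll_pos))
}

fit <- optim(c(0, 0), negll)   # default Nelder-Mead search
est <- c(pi0 = plogis(fit$par[1]), lambda = exp(fit$par[2]))
print(est)
```

Separating the zero-generating part from the count part is the design idea behind zero-inflated and hurdle models; the review linked above covers the regression versions with covariates.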

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
