There is also the chance that your sampling code is not correct. Have you tried it out on, say, 5-dimensional data with increasing numbers of samples?
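A minimal sketch of such a check, under stated assumptions: the 5 x 5 target `Sigma` is made up from random numbers, and `simulate()` is a hypothetical helper that repeats the Cholesky construction from the quoted message. It is an illustration, not a diagnosis of the original 8368-variable data.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

p <- 5
# Hypothetical target covariance: any symmetric positive definite matrix would do.
A <- matrix(rnorm(p * p), p)
Sigma <- crossprod(A) + diag(p)

# chol() returns an upper-triangular R with t(R) %*% R == Sigma,
# so the columns of t(R) %*% Z are draws with covariance Sigma.
simulate <- function(n, Sigma) {
  R <- chol(Sigma)
  Z <- matrix(rnorm(n * ncol(Sigma)), ncol(Sigma))
  t(t(R) %*% Z)  # n rows (samples) by p columns (variables)
}

# Largest absolute difference between the sample and target covariance.
for (n in c(100, 1000, 10000, 100000)) {
  X <- simulate(n, Sigma)
  err <- max(abs(cov(X) - Sigma))
  cat(sprintf("n = %6d  max |cov(X) - Sigma| = %.4f\n", n, err))
}

# Rank-based covariance for comparison: its diagonal depends only on n.
n <- 20351
rank_diag <- diag(cov(simulate(n, Sigma), method = "spearman"))
print(rank_diag)
print(n * (n + 1) / 12)
```

If the sampler is right, the error should shrink roughly like 1/sqrt(n). The last two lines are a separate observation: cov(..., method = "spearman") is computed on ranks, and the variance of n untied ranks is n(n+1)/12, which for n = 20351 is 34515296, the diagonal shown in the quoted cov8 output. It may therefore be worth comparing the plain cov(sample8) against CEUcovar instead.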
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Michael Dewey
> Sent: Sunday, August 12, 2012 6:54 AM
> To: Boel Brynedal; R-help Mailing List
> Subject: Re: [R] Problem when creating matrix of values based on covariance matrix
>
> At 15:17 11/08/2012, Boel Brynedal wrote:
> >Hi,
> >
> >I want to simulate a data set with similar covariance structure as my
> >observed data, and have calculated a covariance matrix (dimensions
> >8368*8368). So far I've tried two approaches to simulating data:
> >rmvnorm from the mvtnorm package, and by using the Cholesky
> >decomposition
> >(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
> >The problem is that the resulting covariance structure in my simulated
> >data is very different from the original supplied covariance vector.
>
> It is, of course, not guaranteed to be the same as you are only
> sampling from the distribution. In your example below you draw a
> sample of size 1000 from a 8368 variable distribution so I suspect it
> is almost sure to be different although I am surprised how different.
> What happens if you increase the sample size?
>
> >Lets just look at some of the values:
> >
> > > cov8[1:4,1:4] # covariance of simulated data
> >            X1          X2         X3         X4
> >X1 34515296.00    99956.69   369538.1  1749086.6
> >X2    99956.69 34515296.00  2145289.9  -624961.1
> >X3   369538.08  2145289.93 34515296.0  -163716.5
> >X4  1749086.62  -624961.09  -163716.5 34515296.0
> > > CEUcovar[1:4,1:4]
> >             [,1]         [,2]          [,3]         [,4]
> >[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
> >[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
> >[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
> >[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
> >
> >So the distribution of the observed covariance is very narrow compared
> >to the simulated data.
> >
> >None of the eigenvalues of the observed covariance matrix are
> >negative, and it appears to be a positive definite matrix. Here is
> >what I did to create the simulated data:
> >
> >Chol <- chol(CEUcovar)
> >Z <- matrix(rnorm(20351 * 8368), 8368)
> >X <- t(Chol) %*% Z
> >sample8 <- data.frame(as.matrix(t(X)))
> > > dim(sample8)
> >[1] 20351 8368
> >cov8=cov(sample8,method='spearman')
> >
> >[earlier I've also tried sample8 <- rmvnorm(1000,
> >mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with as
> >'bad' results, much larger covariance values in the simulated data ]
> >
> >Any ideas of WHY the simulated data have such a different covariance?
> >Any experience with similar issues? Would be happy to supply the
> >covariance matrix if anyone wants to give it a try.
> >Any suggestions? Anything apparent that I left out or neglected?
> >
> >Any advice would be highly appreciated.
> >Best,
> >Bo
>
> Michael Dewey
> i...@aghmed.fsnet.co.uk
> http://www.aghmed.fsnet.co.uk/home.html
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.