... actually, the rescaling of the weights was not required, as sample() normalises its prob argument anyway.
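A quick way to convince yourself of this, as a rough sketch only: the sample() documentation says the prob vector need not sum to one, so weights scaled by an arbitrary constant should give the same mixture. The names gmm10, x1 and x2 below are just illustrative, and the data frame is the one from the original question.

```r
# Mixture definition copied from the original question.
gmm <- data.frame(weight = c(0.3, 0.2, 0.4, 0.1),
                  mean   = c(0, -2, 4, 5),
                  sd     = c(1.0, 0.5, 0.7, 0.3))

# Simple version without explicit rescaling of the weights.
gmm_data <- function(n, data) {
    # Pick a mixture component (row) for each of the n draws ...
    rows <- sample(seq_len(nrow(data)), n, replace = TRUE, prob = data$weight)
    # ... then draw each point from its component's normal distribution.
    rnorm(n, data$mean[rows], data$sd[rows])
}

# Hypothetical copy of the mixture with weights that sum to 10, not 1.
gmm10 <- transform(gmm, weight = weight * 10)

set.seed(1)
x1 <- gmm_data(1000, gmm)     # weights sum to 1
set.seed(1)
x2 <- gmm_data(1000, gmm10)   # weights sum to 10

# The two samples should have very similar summaries.
print(summary(x1))
print(summary(x2))
```

If the two summaries agree, the explicit rescaling step in the longer version is only useful for the warning it emits.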
On Fri, Dec 19, 2008 at 5:16 PM, Simon Knapp <sleepingw...@gmail.com> wrote:
> Your code will always generate the same number of samples from each of
> the normals specified on every call, where the number of samples from
> each is (roughly) proportional to the weights column. If the weights
> column in your data frame represents probabilities of draws coming
> from each distribution, then this behaviour is not correct. Further,
> it does not guarantee that the sample size is actually n.
>
> This definition will work with arbitrary numbers of rows:
>
> gmm_data <- function(n, data){
>     rows <- sample(1:nrow(data), n, replace=TRUE, prob=data$weight)
>     rnorm(n, data$mean[rows], data$sd[rows])
> }
>
> and this one enforces a bit more sanity :-)
>
> gmm_data <- function(n, data, tol=1e-8){
>     if(any(data$sd < 0)) stop("all of data$sd must be >= 0")
>     if(any(data$weight < 0)) stop("all of data$weight must be >= 0")
>     wgts <- if(abs(sum(data$weight) - 1) > tol) {
>         warning("data$weight does not sum to 1 - rescaling")
>         data$weight/sum(data$weight)
>     } else data$weight
>     rows <- sample(1:nrow(data), n, replace=TRUE, prob=wgts)
>     rnorm(n, data$mean[rows], data$sd[rows])
> }
>
> Regards,
> Simon Knapp.
>
> On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW)
> <bill...@u.washington.edu> wrote:
>> I am trying to generate a set of data points from a Gaussian mixture
>> model. My mixture model is represented by a data frame that looks
>> like this:
>>
>> > gmm
>>   weight mean  sd
>> 1    0.3    0 1.0
>> 2    0.2   -2 0.5
>> 3    0.4    4 0.7
>> 4    0.1    5 0.3
>>
>> I have written the following function that generates the appropriate data:
>>
>> gmm_data <- function(n, gmm) {
>>     c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd),
>>       rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd),
>>       rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd),
>>       rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd))
>> }
>>
>> However, the fact that my mixture has four components is hard-coded
>> into this function.
>> A better implementation of gmm_data() would
>> generate data points for an arbitrary number of mixture components
>> (i.e. an arbitrary number of rows in the data frame).
>>
>> How do I do this? I'm sure it's simple, but I can't figure it out.
>>
>> Thanks.
>> --
>> Bill McNeill
>> http://staff.washington.edu/billmcn/index.shtml
>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.