> On Jan 23, 2017, at 11:47 PM, Thomas Petzoldt <t...@simecol.de> wrote: > > Dear Bert, > > thank you very much for your suggestion. You are right, ill-conditioning was > sometimes a problem for 3 components, but not in the two-component case. The > modes are well separated, and the sample size is high. > > My main problem is (1) the shape of the distributions and (2) the diversity > of available packages and approaches to this topic. > > In the mean time I made some progress in this matter by treating the data as > a mixture of gamma distributions (package mixdist, see below), so what I want > to know is the purely R technical question ;-) > > Has someone else has ever stumbled across something like this and can make a > suggestion which package to use?
In survival analysis of cancer cases, the question of cure comes up often. Physicians sometimes have a naive notion of survival to 5 years after definitive treatment with no evident recurrence being equivalent to 'cure', despite the fact that there is great heterogeneity in the recurrence and survival distribution of different cancer types. I have see papers in the medical statistical literature that used mixtures of Weibull variates to model this problem. The cancer-specific survival is often exponential (mu=1) or "sub-exponential" (shape < 1) whereas non-cancer survival times are "super-exponential" (shape >> 1). When I ran your second simulation with dist='weibull' I get: > library("mixdist") > set.seed(123) > lambda <- c(0.25, 0.75) > N <- 2000 > x <- c(rexp(lambda[1]*N, 1), rnorm(lambda[2]*N, 20, 4)) > xx <- mixgroup(x, breaks=0:40) > pp <- mixparam(mu=c(.5, 8), sigma=c(1, 10), pi=c(0.2, 0.5)) > mix <- mix(xx, pp, dist="weibull", emsteps=10) > > summary(mix) Parameters: pi mu sigma 1 0.2492 1.016 0.8702 2 0.7508 20.079 4.2328 Standard Errors: pi.se mu.se sigma.se 1 0.009683 0.04249 0.05137 2 0.009683 0.10972 0.06802 Analysis of Variance Table Df Chisq Pr(>Chisq) Residuals 29 80.407 1.01e-06 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > p <- coef(mix) > p pi mu sigma 1 0.2492477 1.016319 0.8702055 2 0.7507523 20.079093 4.2327769 So the exponential parameter at least is well-estimated. I believe that Weibull variates with shape >> 1 are approximately normal, but I know that your mathematical sophistication exceeds mine by quite a bit, so consider this only a dilettante's comment. -- David. > > Thanks, Thomas > > > ## Approximate an Exponential+Gaussian mixture with > ## a mixture of Gammas > > library("mixdist") > set.seed(123) > lambda <- c(0.25, 0.75) > N <- 2000 > x <- c(rexp(lambda[1]*N, 1), rnorm(lambda[2]*N, 20, 4)) > > xx <- mixgroup(x, breaks=0:40) > pp <- mixparam(mu=c(1, 8), sigma=c(1, 3), pi=c(0.2, 0.5)) > mix <- mix(xx, pp, dist="gamma", emsteps=10) > > summary(mix) > p <- coef(mix) > beta <- with(p, sigma^2/mu) > alpha <- with(p, mu /beta) > lambda <- p$pi > > plot(mix, xlim=c(0, 35)) > x1 <- seq(0, 35, 0.1) > lines(x1, lambda[1]*dgamma(x1, alpha[1], 1/beta[1]), > col="orange", lwd=2) > lines(x1, lambda[2]*dgamma(x1, alpha[2], 1/beta[2]), > col="magenta", lwd=2) > > > > Am 24.01.2017 um 00:34 schrieb Bert Gunter: >> Fitting multicomponent mixtures distributions -- and 3 is already a >> lot of components -- is inherently ill-conditioned. You may need to >> reassess your strategy. You might wish to post on stackexchange >> instead to discuss such statistical issues. >> >> Cheers, >> Bert >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along >> and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Mon, Jan 23, 2017 at 2:32 PM, Thomas Petzoldt <t...@simecol.de> wrote: >>> Dear friends, >>> >>> I am trying to separate bi- (and sometimes tri-) modal univariate mixtures >>> of biological data, where the first component is left bounded (e.g. >>> exponential or gamma) and the other(s) approximately Gaussian. >>> >>> After checking several packages, I'm not really clear what to do. Here is an >>> example with "mixtools" that already works quite good, however, the left >>> component is not Gaussian (and not symmetric). >>> >>> Any idea about a more adequate function or package for this problem? >>> >>> Thanks a lot! >>> >>> Thomas >>> >>> >>> >>> library(mixtools) >>> set.seed(123) >>> >>> lambda <- c(0.25, 0.75) >>> N <- 200 >>> >>> ## dist1 ~ gamma (or exponential as a special case) >>> #dist1 <- rexp(lambda[1]*N, 1) >>> dist1 <- rgamma(lambda[1]*N, 1, 1) >>> >>> ## dist2 ~ normal >>> dist2 <- rnorm(lambda[2]*N, 12, 2) >>> >>> ## mixture >>> x <- c(dist1, dist2) >>> >>> mix <- spEMsymloc(x, mu0=2, eps=1e-3, verbose=TRUE) >>> plot(mix, xlim=c(0, 25)) >>> summary(mix) >>> >>> >>> -- >>> Thomas Petzoldt >>> TU Dresden, Institute of Hydrobiology >>> http://www.tu-dresden.de/Members/thomas.petzoldt >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.